Brigham Young University (BYU) corpora.
Example: Wikipedia
This corpus contains the full text of English Wikipedia as of 2014: 1.9 billion words in more than 4.4 million articles.
http://corpus.byu.edu/wiki/
List of BYU corpora:
These are the most widely used online corpora, used by more than 130,000 distinct researchers, teachers, and students each month.
English
Corpus | # words | language/dialect | time period | compare |
NOW Corpus (new) | 2.8 billion+ | 20 countries / Web | 2010-present (updated daily) | |
Global Web-Based English (GloWbE) | 1.9 billion | 20 countries / Web | 2012-13 | |
Wikipedia Corpus | 1.9 billion | English | up to 2014 | Info |
Hansard Corpus (British Parliament) | 1.6 billion | British | 1803-2005 | Info |
Corpus of Contemporary American English (COCA) | 520 million | American | 1990-2015 | * * * * * |
Corpus of Historical American English (COHA) | 400 million | American | 1810-2009 | * * |
TIME Magazine Corpus | 100 million | American | 1923-2006 | |
Corpus of American Soap Operas | 100 million | American | 2001-2012 | * |
British National Corpus (BYU-BNC)* | 100 million | British | 1980s-1993 | * * |
Strathy Corpus (Canada) | 50 million | Canadian | 1970s-2000s | |
CORE Corpus (new) | 50 million | Web registers | up to 2014 | |

Other languages
Corpus | # words | language | time period | compare |
Corpus del Español (see also...) | 100 million | Spanish | 1200s-1900s | * |
Corpus do Português (see also...) | 45 million | Portuguese | 1300s-1900s | |

N-grams
Corpus | # words | language/dialect | time period | compare |
Google Books: American English | 155 billion | American | 1500s-2000s | * |
Google Books: British English | 34 billion | British | 1500s-2000s | |
Google Books: One Million Books | 89 billion | Am/Br | 1500s-2000s | |
Google Books: Spanish | 45 billion | Spanish | 1500s-2000s | |
-------------------
https://en.wikipedia.org/wiki/List_of_text_corpora