EDge21: Catch of The Day - The Corpus of Contemporary American English

Wednesday, August 17, 2011

Catch of The Day - The Corpus of Contemporary American English - 8/17/2011

The Corpus of Contemporary American English (COCA) is the largest freely available corpus* of English, and the only large, balanced corpus of American English.

COCA was created in 2008 at Brigham Young University, and it is now used by tens of thousands of users every month. It is also related to other large corpora, including the British National Corpus, the 100-million-word TIME Corpus, and the new 400-million-word Corpus of Historical American English.

COCA contains more than 425 million words of text, divided equally among five genres: spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words for each year since 1990 and is updated at least annually. Because of this design, it is well suited to studying current, ongoing changes in the language.

You can search for exact words or phrases, wildcards, lemmas, parts of speech, or any combination of these. You can also search for surrounding words within a 10-word window (e.g. all nouns somewhere near faint, all adjectives near woman, or all verbs near feelings), which often gives you good insight into the meaning and use of a word.
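
To make the idea of a "window" search concrete, here is a minimal Python sketch that counts the words appearing near a chosen word in a toy sentence. It is not how COCA itself is implemented: the function name `collocates`, the sample sentence, and the choice to count the window on both sides of the word are all assumptions made for illustration, and it does not filter by part of speech the way the COCA interface can.

```python
from collections import Counter

def collocates(tokens, node, window=10):
    """Count words found within `window` tokens on either side of `node`.

    Hypothetical helper for illustration; COCA's own search engine is
    not shown here and may define its window differently.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Toy data (invented for this example, not taken from the corpus).
text = "she felt faint and a faint smile crossed her face"
tokens = text.lower().split()

# Use a small window so the effect is visible on a short sentence.
near_faint = collocates(tokens, "faint", window=2)
print(near_faint.most_common())
```

On a real corpus you would run the same kind of count over millions of tokens and keep only the words tagged with the part of speech you care about.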

With COCA, you can also easily carry out semantically based queries of the corpus. For example, you can compare and contrast the words that appear near two related words (little/small, democrats/republicans, men/women) to determine the difference in meaning or use between them.
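
The comparison of two related words can be sketched the same way. The Python snippet below gathers the neighbors of each word in a toy sentence and reports the neighbors that are unique to one word or the other. The helper names `window_counts` and `contrast`, the sample sentence, and the simple set-difference approach are assumptions chosen for illustration; COCA's own comparison feature is not reproduced here.

```python
from collections import Counter

def window_counts(tokens, node, window=4):
    """Count words within `window` tokens before or after each `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(context)
    return counts

def contrast(tokens, word_a, word_b, window=4):
    """Return neighbors seen only with word_a and only with word_b."""
    a = window_counts(tokens, word_a, window)
    b = window_counts(tokens, word_b, window)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    return only_a, only_b

# Toy data (invented for this example, not taken from the corpus).
tokens = "a little boy ate a small apple while a little girl held a small coin".split()

only_little, only_small = contrast(tokens, "little", "small", window=1)
print("only near 'little':", only_little)
print("only near 'small':", only_small)
```

In this toy sentence, "little" sits next to people and "small" next to objects, which is the kind of contrast a side-by-side collocate comparison is meant to surface.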

*a collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures, frequencies, etc.

DISCLOSURE OF MATERIAL CONNECTION: http://cmp.ly/0
