A MASSIVE language research database responsible for bringing words such as "podcast" and "celebutante" to the pages of the Oxford dictionaries has officially hit a total of 1 billion words, researchers said.
Drawing on sources such as weblogs, chatrooms, newspapers, magazines and fiction, the Oxford English Corpus spots emerging trends in language usage to help guide lexicographers when composing the most recent editions of dictionaries.
The publisher of the Oxford English Dictionary, considered one of the most comprehensive dictionaries of the language, added words such as "supersize" and "wiki" to its pages in its most recent August 2005 edition.
Oxford University Press lexicographer Catherine Soanes said the database is not a collection of 1 billion different words, but of sentences and other examples of usage and spelling.
"The corpus is purely 21st century English," said Judy Pearsall, publishing manager of English dictionaries. "You're looking at current English and seeing what's happening right now. That's language at the cutting edge."
As hybrid words such as "geek-chic" or "inner-child" increase in usage, Pearsall said part of the research project's goal is to identify words that have lasting power.
"English gets really creative, really fun. What we're putting in dictionaries is words that will stick around," she said.
Launched in January 2000, the Oxford English Corpus is part of the world's largest-funded language research project, which costs US$90,000 to US$107,000 per year.
It has helped identify how the spellings of common phrases have changed, such as "fazed by" to "phased by" or "free rein" to "free reign." "Buck naked" has increasingly evolved to "butt naked."
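As a rough illustration of how competing spellings might be compared in a body of text, the following minimal Python sketch counts how often each variant of a phrase appears. It is not the method used by Oxford's lexicographers, which the article does not describe; the function name `count_variants` and the sample sentences are hypothetical and invented for this example.

```python
import re
from collections import Counter


def count_variants(texts, variants):
    """Count case-insensitive, whole-phrase occurrences of each variant.

    `texts` is an iterable of strings and `variants` a list of phrases.
    Both names are illustrative assumptions, not part of any Oxford tool.
    """
    counts = Counter({variant: 0 for variant in variants})
    for variant in variants:
        # Word boundaries stop "free rein" from matching inside "free reign".
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        for text in texts:
            counts[variant] += len(pattern.findall(text))
    return counts


# Hypothetical sample sentences, not taken from the Oxford English Corpus.
sample_texts = [
    "She was not fazed by the criticism.",
    "He seemed phased by the question.",
    "The editor gave her free rein over the layout.",
    "They were given free reign, and nobody was phased by it.",
]

counts = count_variants(
    sample_texts,
    ["fazed by", "phased by", "free rein", "free reign"],
)
for phrase, total in counts.items():
    print(f"{phrase}: {total}")
```

Run over a large, dated collection of text, tallies of this kind would show whether a newer spelling is gaining ground on an older one.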
The corpus collects evidence from all the places where English is spoken, whether from North America, Britain, the Caribbean, Australia or India, to reflect the most current and common usage of the English language.
The Oxford English Corpus is at the heart of dictionary-making in Oxford in the 21st century and ensures that the very latest developments in language can be tracked and recorded. The Corpus can be used in many different ways to study the English language and the cultures in which it is used. Because it is large, and because it is made up of text from many different subject areas and text types, it acts as a representative slice of contemporary English from which all aspects of written language, from vocabulary and lexis to punctuation, discourse, and register, can be studied.