With approved use of the two billion word Cambridge English Corpus and helpful input and guidance from Paul Nation, Dr. Charles Browne, Dr. Brent Culligan and Joseph Phillips have worked together to create a New General Service List (NGSL) of important vocabulary words for students of English as a second language. The first version of this interim list was published in early 2013 and provides over 90% coverage for most general English texts. Please feel free to browse around the site, download the list in various forms, read articles about how it was created, and try out the growing number of online learning tools, analytical and editing tools, and EFL textbooks which make use of the NGSL.
A New General Service List: Celebrating 60 years of Vocabulary Learning
In 1953, Michael West published a remarkable list of several thousand important vocabulary words known as the General Service List (GSL). Based on more than two decades of pre-computer corpus research, input from other famous early 20th century researchers such as Harold Palmer, and several vocabulary conferences sponsored by the Carnegie Foundation in the 1930s, the GSL was designed to be more than simply a list of high frequency words. Its primary purpose was to combine objective and subjective criteria to come up with a list of words that would be of “general service” to learners of English as a foreign language. However, as useful and helpful as this list has been to us over the decades, it has also been criticized for (1) being based on a corpus that is considered dated, (2) being based on a corpus that is too small by modern standards (the initial work on the GSL was based on a 2.5 million word corpus that was collected under a grant from the Rockefeller Foundation in 1938), and (3) not clearly defining what constitutes a “word”.
In March of 2013, on the 60th anniversary of West’s publication of the GSL, my colleagues (Dr. Brent Culligan & Joseph Phillips of Aoyama Gakuin Women’s Junior College) and I (Dr. Charles Browne, Meiji Gakuin University) announced the creation of a New General Service List (NGSL), one that is based on a carefully selected 273 million-word subsection of the 2 billion word Cambridge English Corpus (CEC). Following many of the same steps that West and his colleagues did (as well as the many useful suggestions of Professor Paul Nation, project advisor and vocabulary specialist extraordinaire), we have tried to combine the strong objective scientific principles of corpus and vocabulary list creation with useful pedagogic insights to create a list of approximately 2800 high frequency words which meet the following goals:
- to update and greatly expand the size of the corpus used (273 million words compared to the 2.5 million word corpus behind the original GSL), with the hope of increasing the generalizability and validity of the list
- to create a list of the most important high-frequency words useful for second language learners of English, ones which give the highest possible coverage of English texts with the fewest words possible
- to make an NGSL that is based on a clearer definition of what constitutes a word
- to be a starting point for discussion among interested scholars and teachers around the world, with the goal of updating and revising the list based on this input (in much the same way that West did with the original GSL)
The chart below gives an indication of the improvement in coverage that the NGSL has over the original when considering each of the words on the list with its associated inflected forms (lemmas):
We will be doing our best to make this list available in as many useable formats as possible, including providing definitions for all words in easy English and uploading the NGSL to free online learning tools such as Quizlet. Please look around the site and leave comments for us to help improve both the site and the list itself.
Dr. Browne has been trying to spread the word about the NGSL through academic presentations at conferences around the world, including the 2013 World Congress on Extensive Reading in Korea, a Featured Speaker address at the 2013 National KOTESOL Conference in October, the 2013 JALT National Conference in Kobe, a Keynote address at the 2013 Korean Association of Corpus Linguistics, the 2013 Vocab@Vic Conference in New Zealand, Cambridge Day conferences in Taiwan in April of 2014, a Plenary address at KOTESOL National in May of 2014, a vocabulary symposium at the AILA World Congress of Applied Linguistics in Australia in early August 2014, and a Featured Speaker address at JACET National in late August 2014.
Update (Dec 2, 2013)
Based on a request from a Japanese university, we are now providing frequency and coverage figures not only for the 273 million word corpus of general English (which is composed of 90% written and 10% spoken data), but also for the 27 million word spoken subsection of our corpus. As expected, important vocabulary thresholds are reached with far fewer words in spoken English. Our spoken subsection consisted of three main parts: spoken conversational English, TV and radio. What is interesting is how much lower the coverage figures are for TV and radio than for conversational English:
While we are not ready to make strong claims about the spoken coverage figures as the list has not yet gone through the same level of scrutiny as the NGSL list as a whole, we provide the figures (and list) in the spirit of this whole project, and that is to try to put out the NGSL in as many useable forms and with as many useable tools as possible.
Update (Feb 17, 2014)
While we are all tremendous fans of the Academic Word List (AWL) developed by Dr. Averil Coxhead, her AWL was made to fit together with, and be a next step for, students using West's 1953 GSL. With the publication of the 2013 NGSL, we knew that we needed to create a list of important high frequency academic words that fit tightly together with the NGSL, and we have just published a New Academic Word List (NAWL), based on a 288 million word academic corpus. You can read more about the corpus and download the list here.
Update (April 4, 2014)
Based on feedback given at conferences, by email and through this website, today we release version 1.01 of the NGSL. Both the 1.0 and 1.01 versions of the list are available for download from the pulldown menu on the left. The net result of the 1.01 changes is a decrease of 17 NGSL headwords, from 2818 to 2801. Below is a summary of the revisions.
TWO WORDS ADDED:
• Insertion of TOURNAMENT, which was accidentally deleted in the initial analysis
• YEAH, which was originally counted as a derived form of YES, is now counted under its own headword
NINETEEN WORDS DELETED:
Four numbers were deleted and moved to the supplemental list:
Inflected forms of pronouns were demoted and listed under their canonical base pronoun:
o HER was listed under SHE
o HIM and HIS were listed under HE
o ITS was listed under IT
o ME and MY were listed under I
o OUR and US were listed under WE
o THEIR and THEM were listed under THEY
o THESE was listed under THIS
o THOSE was listed under THAT
o WHOM and WHOSE were listed under WHO
o YOUR was listed under YOU
Please note that the Excel file has several tabs providing you with access to different bits of information regarding the NGSL 1.01:
- The first 3 tabs give a lemmatized version of the list in 3 bands (1-1000, 1001-2000 and 2001-2801)
- The 4th tab gives you frequency information including the SFI and adjusted frequency per million (please note that this tab gives only the headwords without all the associated lemmas, which may be useful for situations where the focus is more on teaching than on accurately calculating coverage)
- The 5th tab lists the 52 words from the categories of NUMBERS, MONTHS and DAYS OF THE WEEK which were removed from the NGSL but may be needed for pedagogic purposes.
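The difference between the lemmatized tabs and the headword-only tab matters most when calculating coverage. The following is only a minimal sketch of that idea in Python, not the procedure used to produce the NGSL figures. The function names (build_form_index, coverage), the simple tokenizer, and the tiny lemma table are all assumptions made for illustration; a real calculation would load the full list from the downloaded file.

```python
import re

def build_form_index(lemma_table):
    """Map each headword and each of its inflected forms to the headword."""
    index = {}
    for headword, forms in lemma_table.items():
        head = headword.lower()
        index[head] = head
        for form in forms:
            index[form.lower()] = head
    return index

def coverage(text, form_index):
    """Return the fraction of running words in `text` found in `form_index`."""
    # Assumption: a crude tokenizer that keeps only letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in form_index)
    return covered / len(tokens)

# Hypothetical excerpt of a lemmatized list (headword -> inflected forms).
lemma_table = {
    "she": ["her"],
    "he": ["him", "his"],
    "be": ["is", "was", "are"],
    "read": ["reads", "reading"],
    "book": ["books"],
}

sample = "She was reading his books quietly"

# With lemmas, 5 of the 6 running words are covered.
with_lemmas = coverage(sample, build_form_index(lemma_table))

# With headwords only, just "she" matches, so coverage is understated.
headwords_only = coverage(sample, {h: h for h in lemma_table})

print(f"with lemmas: {with_lemmas:.2f}, headwords only: {headwords_only:.2f}")
```

In this toy example the headword-only index misses inflected forms such as "was" and "books", which is why the lemmatized tabs are the better starting point for coverage calculations and the headword tab for teaching.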
NOTE: Development of the NGSL was made possible through approved access to the Cambridge English Corpus (CEC). The CEC is a multi-billion word computer database of contemporary spoken and written English. It includes British English, American English and other varieties of English. It also includes the Cambridge Learner Corpus, developed in collaboration with the University of Cambridge ESOL Examinations. Cambridge University Press has built up the CEC to provide evidence about language use that helps to produce better language teaching materials.