With approved use of the two billion word Cambridge English Corpus, Dr. Charles Browne, Dr. Brent Culligan and Joseph Phillips have created a New General Service List (NGSL) of important vocabulary words for students of English as a second language. The first version of this interim list was published in early 2013 and provides over 92% coverage for most general English texts (the highest of any published list of high frequency words to date). Please feel free to browse around the site, download the NGSL in various forms, read articles about how it was created, and try out the large and growing number of free online learning tools, teaching tools, analytical tools, editing tools and EFL textbooks which make use of the NGSL as well as our other special purpose word lists including the New Academic Word List (NAWL) and TOEIC Service List (TSL). Enjoy!
A New General Service List: Celebrating 60 years of Vocabulary Learning:
In 1953, Michael West published a remarkable list of several thousand important vocabulary words known as the General Service List (GSL). Based on more than two decades of pre-computer corpus research, input from other famous early 20th century researchers such as Harold Palmer, and several vocabulary conferences sponsored by the Carnegie Foundation in the 1930s, the GSL was designed to be more than simply a list of high frequency words. Its primary purpose was to combine both objective and subjective criteria to come up with a list of words that would be of “general service” to learners of English as a foreign language. However, as useful and helpful as this list has been to us over the decades, it has also been criticized for (1) being based on a corpus that is considered to be quite dated, (2) being based on a corpus that is too small by modern standards (the initial work on the GSL was based on a 2.5 million word corpus that was collected under a grant from the Rockefeller Foundation in 1938), and (3) not clearly defining what constitutes a “word”.
In March of 2013, on the 60th anniversary of West’s publication of the GSL, my colleagues (Dr. Brent Culligan & Joseph Phillips of Aoyama Gakuin Women’s Junior College) and I (Dr. Charles Browne, Meiji Gakuin University) announced the creation of a New General Service List (NGSL), one that is based on a carefully selected 273 million-word subsection of the 2 billion word Cambridge English Corpus (CEC). Following many of the same steps that West and his colleagues did, we have tried to combine the strong objective scientific principles of corpus and vocabulary list creation with useful pedagogic insights to create a list of approximately 2800 high frequency words which meet the following goals:
- to update and greatly expand the size of the corpus used (273 million words compared to the 2.5 million word corpus behind the original GSL), with the hope of increasing the generalizability and validity of the list
- to create a list of the most important high-frequency words useful for second language learners of English, ones which give the highest possible coverage of English texts with the fewest words possible
- to make an NGSL that is based on a clearer definition of what constitutes a word
- to be a starting point for discussion among interested scholars and teachers around the world, with the goal of updating and revising the list based on this input (in much the same way that West did with the original GSL)
The chart below gives an indication of the improvement in coverage that the NGSL 1.01 version has over the original GSL when considering each of the words on the list with its associated inflected forms (lemmas):
We will be doing our best to make this list available in as many useable formats as possible, including providing definitions for all words in easy English and uploading the NGSL to free online learning tools such as Quizlet. Please look around the site and leave comments for us to help improve both the site as well as the list itself.
Combining the NGSL Word List with other Special Purpose Word Lists:
The NGSL was designed to help learners attain the highest possible coverage of general English with the fewest possible words, but once these 2800 words are mastered, the question becomes: what words should learners study next? Although continuing up the ladder of general English is helpful and fine, there are 2 problems: (1) the number of words you need to learn to gain each additional 1% of coverage increases sharply after 92%, and (2) depending on the student's specialization, it is very likely that they will make more significant gains by learning special purpose vocabulary. To that end, we have created 3 additional special purpose vocabulary lists that fit together with the NGSL: the New Academic Word List (NAWL), the TOEIC Service List (TSL) and the Business Service List (BSL). Each offers extremely good coverage within that specific domain and may be useful for students with that goal:
One thing that needs to be noted is that when trying to read or listen to materials within a special purpose genre, the coverage offered by the NGSL will be a little different from what it offers for general English (sometimes lower and sometimes higher than 92%). Our special purpose word lists and associated free online learning and content creation tools are also available from this website on the pulldown menu on the left (NAWL and TSL up already with the BSL coming soon!).
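For readers who want to check what coverage a general list and a special purpose list give for their own materials, the calculation can be sketched as below. This is only a minimal illustration, not the procedure used to produce the figures on this site: the tokenizer is deliberately crude, the function names are our own, and the tiny word lists are toy data invented for the example (they are not real NGSL or TSL entries).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_form_index(lemma_table):
    """Map each headword and each of its inflected forms to the headword."""
    index = {}
    for headword, forms in lemma_table.items():
        index[headword] = headword
        for form in forms:
            index[form] = headword
    return index


def coverage(tokens, *form_indexes):
    """Return the fraction of tokens found in at least one form index."""
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if any(t in idx for idx in form_indexes))
    return known / len(tokens)


# Toy data for illustration only (hypothetical, not the published lists).
general_list = {
    "the": [],
    "a": [],
    "to": [],
    "company": ["companies"],
    "send": ["sends", "sent", "sending"],
}
special_list = {"invoice": ["invoices"]}

tokens = tokenize("The company sent invoices to a vendor")
general_idx = build_form_index(general_list)
special_idx = build_form_index(special_list)

base = coverage(tokens, general_idx)                   # general list only
combined = coverage(tokens, general_idx, special_idx)  # general + special list

print(f"general list only: {base:.1%}")
print(f"general + special: {combined:.1%}")
```

Counting a token as covered when any of its inflected forms appears under a headword mirrors the lemma-based counting described on this page. A real analysis would need a proper tokenizer and the full downloadable lists.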
A New Academic Word List (NAWL):
In the same way that Averil Coxhead's excellent Academic Word List (AWL) was designed to work as a seamless complement to the GSL for students wanting to quickly master both general and academic English, the authors of the NGSL worked to create a New Academic Word List (NAWL) to complement the NGSL for students and teachers who want to learn or teach academic English. As you can see from the chart below, the combined NGSL/NAWL gives about 5% more text coverage than the combined GSL/AWL. More information about the NAWL, including various downloadable versions of the list, can be found here.
The TOEIC Service List (TSL):
In May of 2016, Dr. Browne and Dr. Culligan published a 2nd word list that fits together with the New General Service List, known as the TSL or TOEIC Service List. Based on a 1.5 million word corpus of TOEIC materials, the 1200 words of the TSL (when combined with the 2800 words of the NGSL) provide up to 99% coverage of words that appear on the TOEIC test. More information on the TSL as well as a growing number of free TSL resources can be found on the TSL page from the pulldown on the left (or by clicking here).
The Business Service List (BSL):
(coming soon) Based on an approximately 100 million word corpus of business texts, newspapers, journals and websites, it is expected that the BSL and NGSL together will offer learners at least 94% coverage of the English they need in most general business situations and materials.
NGSL-S Update (June 3, 2016)
We finally got around to doing a long overdue cleanup and update of the frequencies for the spoken subsection of the NGSL, which will now be known as the NGSL-S 1.1. The frequencies and data can be downloaded from the "NGSL Lists" pulldown menu at the left. As you can see from the chart below, the NGSL-S 1.1 offers incremental improvements in coverage in all three categories (Unscripted Spoken, Radio, and TV) over the NGSL-S 1.0 that was released in 2013. At the bottom of the chart you can also see the improvement of the 1.01 version of the main NGSL list over the original 1.0 version:
NGSL Update (Feb 17, 2014)
While we are all tremendous fans of the Academic Word List (AWL) developed by Dr. Averil Coxhead, her AWL was made to fit together with, and be a next step for students and teachers using West's 1953 GSL. With the publication of the 2013 NGSL, we knew that we needed to create a list of important high frequency academic words that fit tightly together with the NGSL and have just published a New Academic Word List (NAWL), based on a 288 million word academic corpus. Like the NGSL, the NAWL provides higher coverage than the original AWL. You can read more about the corpus and download the list here.
NGSL Update (April 4, 2014)
Based on feedback given at conferences, by email and on this website, today we release version 1.01 of the NGSL. Both the 1.0 and 1.01 versions of the list are available for download from the pulldown menu on the left. The net result of the 1.01 changes is to decrease the number of NGSL headwords by 17, from 2818 to 2801. Below is a summary of the revisions.
TWO WORDS ADDED:
• Insertion of TOURNAMENT, which was accidentally deleted in the initial analysis
• YEAH, which was originally counted as a derived form of YES, is now counted under its own headword
NINETEEN WORDS DELETED:
Four numbers were deleted and moved to the supplemental list:
Inflected forms of pronouns were demoted and listed under their canonical base form:
• HER was listed under SHE
• HIM and HIS were listed under HE
• ITS was listed under IT
• ME and MY were listed under I
• OUR and US were listed under WE
• THEIR and THEM were listed under THEY
• THESE was listed under THIS
• THOSE was listed under THAT
• WHOM and WHOSE were listed under WHO
• YOUR was listed under YOU
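The effect of this regrouping on a frequency list can be sketched as follows. This is a minimal illustration only, not the actual processing behind the 1.01 release: the mapping restates the pronoun changes listed above, while the function name and the frequency counts are hypothetical and chosen just to show how the counts for demoted forms fold into their headword. The last lines simply recheck the headword arithmetic from the summary (2818 headwords, two added, nineteen deleted).

```python
from collections import defaultdict

# Pronoun forms demoted in version 1.01, mapped to their new headword.
REGROUPED = {
    "her": "she",
    "him": "he", "his": "he",
    "its": "it",
    "me": "i", "my": "i",
    "our": "we", "us": "we",
    "their": "they", "them": "they",
    "these": "this",
    "those": "that",
    "whom": "who", "whose": "who",
    "your": "you",
}


def merge_counts(counts, regrouped):
    """Add the count of each demoted form to the count of its headword."""
    merged = defaultdict(int)
    for word, n in counts.items():
        merged[regrouped.get(word, word)] += n
    return dict(merged)


# Hypothetical counts, invented purely for this example.
counts = {"she": 10, "her": 7, "he": 12, "him": 3, "his": 5, "yes": 4}
merged = merge_counts(counts, REGROUPED)
print(merged)

# Headword arithmetic from the revision summary.
added = 2                         # TOURNAMENT and YEAH
deleted = 4 + len(REGROUPED)      # four numbers plus the demoted pronoun forms
headwords_1_01 = 2818 + added - deleted
print(deleted, headwords_1_01)
```

Because demoted forms are merged into their headword rather than dropped, text coverage under lemma-based counting is unaffected by this change; only the number of headwords goes down.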
Please note that the Excel file has several tabs providing you with access to different bits of information regarding the NGSL 1.01:
- The first 3 tabs give a lemmatized version of the list in 3 bands (1-1000, 1001-2000 and 2001-2801)
- The 4th tab gives you frequency information including the SFI and adjusted frequency per million (please note that this tab gives only the headwords without all the associated lemmas, which may be useful for situations where the focus is more on teaching than on accurately calculating coverage)
- The 5th tab lists the 52 words from the categories of NUMBERS, MONTHS and DAYS OF THE WEEK which were removed from the NGSL but may be needed for pedagogic purposes.
NGSL-S 1.0 Update (Dec 2, 2013)
Based on a request from a Japanese university, we are now providing frequency and coverage figures not only for the 273 million word corpus of general English (which is composed of 90% written and 10% spoken data), but also for the 27 million word spoken subsection of our corpus, which we will now refer to as the NGSL-S 1.0 (S=Spoken). As expected, important vocabulary thresholds are reached with far fewer words in spoken English. Our spoken subsection consisted of 3 main parts: spoken conversational English, TV and radio. What is interesting is how much lower the coverage figures are for TV and radio than for conversational English:
While we are not ready to make strong claims about the spoken coverage figures as the list has not yet gone through the same level of scrutiny as the NGSL list as a whole, we provide the figures (and list) in the spirit of this whole project, and that is to try to put out the NGSL in as many useable forms and with as many useable tools as possible.
Spreading The Word...
Dr. Browne has been trying to disseminate information about the NGSL through a series of academic presentations at conferences around the world including the 2013 World Congress on Extensive Reading in Korea, a Featured Speaker address at the 2013 National KOTESOL Conference in October, the 2013 JALT National Conference in Kobe, a Keynote address at the 2013 Korean Association of Corpus Linguistics, the 2013 Vocab@Vic Conference in New Zealand, 2 Keynotes at Cambridge Day Conferences in Taiwan in April of 2014, a Plenary address at KOTESOL National in May of 2014, a Featured presentation at the Vocab SIG of JALT in June of 2014, a Plenary at the TRI-ELE Conference in Thailand, a Featured Vocabulary Symposium at the AILA World Congress of Applied Linguistics in Australia in early August 2014, a Featured speaker address at the JACET National Conference in late August 2014, a Keynote address at the JALT Extensive Reading Conference at Keisen University in September of 2014, a Keynote for Kansai TechDay Plus in Kobe in October of 2014, a Keynote at the ICEE Conference at Shin Chein University in Taiwan in April of 2015, and many more!
NOTE: Development of the NGSL was made possible through approved access to the Cambridge English Corpus (CEC). The CEC is a multi-billion word computer database of contemporary spoken and written English. It includes British English, American English and other varieties of English. It also includes the Cambridge Learner Corpus, developed in collaboration with the University of Cambridge ESOL Examinations. Cambridge University Press has built up the CEC to provide evidence about language use that helps to produce better language teaching materials.