With approved use of the two billion word Cambridge English Corpus, Dr. Charles Browne, Dr. Brent Culligan and Joseph Phillips have created a New General Service List (NGSL) of core high frequency vocabulary words for students of English as a second language. First published in early 2013, the NGSL provides over 92% coverage for most general English texts (the highest of any corpus-derived general English word list to date). Use the pulldown menu to download the NGSL in various forms, and to try out our large and growing number of free online tools for learning, teaching, testing, analysing and editing texts with the NGSL. Be sure to check out our other special purpose word lists which have been designed to work in a modular approach in conjunction with the NGSL - these include the New Academic Word List (NAWL), the TOEIC Service List (TSL) and the Business Service List (BSL). All of our corpus-derived word lists are public domain and available to you for free as long as you properly cite our work (how to cite is mentioned in the FAQ section). Enjoy!
New General Service List 1.01 (NGSL): Celebrating 60 years of Vocabulary Learning:
In 1953, Michael West published a remarkable list of several thousand important vocabulary words known as the General Service List (GSL). Based on more than two decades of pre-computer corpus research, input from other famous early 20th century researchers such as Harold Palmer, and several vocabulary conferences sponsored by the Carnegie Foundation in the 1930s, the GSL was designed to be more than simply a list of high frequency words; its primary purpose was to combine both objective and subjective criteria to come up with a list of words that would be of “general service” to learners of English as a foreign language. However, as useful and helpful as this list has been to us over the decades, it has also been criticized for (1) being based on a corpus that is considered to be quite dated, (2) being too small by modern standards (the initial work on the GSL was based on a 2.5 million word corpus that was collected under a grant from the Rockefeller Foundation in 1938), and (3) not clearly defining what constitutes a “word”.
In March of 2013, on the 60th anniversary of West’s publication of the GSL, my colleagues (Dr. Brent Culligan & Joseph Phillips of Aoyama Gakuin Women’s Junior College) and I (Dr. Charles Browne, Meiji Gakuin University) announced the creation of a New General Service List (NGSL), one that is based on a carefully selected 273 million-word subsection of the 2 billion word Cambridge English Corpus (CEC) as follows:
Following many of the same steps that West and his colleagues did, we have tried to combine the strong objective scientific principles of corpus and vocabulary list creation with useful pedagogic insights to create a list of approximately 2800 high frequency words which meet the following goals:
- to update and greatly expand the size of the corpus used (273 million words compared to the 2.5 million word corpus behind the original GSL), with the hope of increasing the generalizability and validity of the list
- to create a list of the most important high-frequency words useful for second language learners of English, ones which give the highest possible coverage of English texts with the fewest words possible
- to make an NGSL that is based on a clearer definition of what constitutes a word
- to be a starting point for discussion among interested scholars and teachers around the world, with the goal of updating and revising the list based on this input (in much the same way that West did with the original GSL)
The chart below gives an indication of the improvement in coverage that the NGSL 1.01 version has over the original GSL when considering each of the words on the list with its associated inflected forms (lemmas):
We will be doing our best to make this list available in as many useable formats as possible, including providing definitions for all words in easy English and uploading the NGSL to free online learning tools such as Quizlet. Please look around the site and leave comments for us to help improve both the site and the list itself.
Combining the NGSL Word List with other Special Purpose Word Lists:
Although the NGSL was designed to help learners attain the highest possible coverage of general English with the fewest possible words, an important pedagogic question to consider is: once these 2800 words are mastered, what words should learners study next?
While continuing to study the next most frequent general English words beyond the NGSL seems a logical next step, two issues which the learner faces are (1) the number of words they need to learn to make an additional 1% coverage gain increases sharply after 92%, and (2) depending on the student's specialization, it is very likely that they will make significantly faster gains by learning special purpose vocabulary.
To that end, we have created 3 additional special purpose vocabulary lists that fit together perfectly with the NGSL (i.e. no overlap or repeating words): the first is the New Academic Word List 1.0 (NAWL), the second is the TOEIC Service List 1.1 (TSL) and the third is the Business Service List 1.0 (BSL). We have also created a list of high frequency words of spoken English known as the NGSL-S (because it is a subset of the NGSL corpus). Each offers extremely good coverage within that specific domain and may be a useful next step for students with that goal. The efficiency of these lists can be seen in the chart below, which gives a rough estimate of coverage figures for each vocabulary list as well as how the size of these lists compares with the overall vocabulary size of native speakers of English and with the English language as a whole:
One thing that needs to be noted is that when trying to read or listen to materials within a special purpose genre, the coverage offered by the NGSL will be a little different from what it offers for general English (sometimes lower and sometimes higher than 92%). Downloads of all of our special purpose word lists and associated free online learning and content creation tools are available from this website via the pulldown menu on the left.
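For readers who want to check coverage on their own texts, the idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used to build the NGSL: the tokenizer is deliberately crude, the function names (tokenize, build_form_index, coverage, overlap) are made up for this example, and the two tiny word lists are toy data rather than real NGSL or NAWL entries. Coverage here simply means the share of running words in a text that match a headword or one of its inflected forms.

```python
import re

def tokenize(text):
    """Lowercase the text and return its word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def build_form_index(word_list):
    """Map each headword and each of its inflected forms to the headword."""
    index = {}
    for headword, forms in word_list.items():
        index[headword] = headword
        for form in forms:
            index[form] = headword
    return index

def coverage(tokens, *form_indexes):
    """Return the share of tokens found in at least one of the given lists."""
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if any(t in idx for idx in form_indexes))
    return known / len(tokens)

def overlap(list_a, list_b):
    """Return headwords shared by two lists (empty for modular lists)."""
    return set(list_a) & set(list_b)

# Toy data for illustration only: NOT the real NGSL or NAWL entries.
toy_general = {
    "the": [],
    "a": [],
    "new": [],
    "student": ["students"],
    "read": ["reads", "reading"],
}
toy_special = {"hypothesis": ["hypotheses"]}

text = "The students read a new hypothesis carefully"
tokens = tokenize(text)
general_idx = build_form_index(toy_general)
special_idx = build_form_index(toy_special)

general_only = coverage(tokens, general_idx)           # general list alone
combined = coverage(tokens, general_idx, special_idx)  # general + special list

print(f"general list only: {general_only:.1%}")
print(f"general + special: {combined:.1%}")
print("shared headwords:", overlap(toy_general, toy_special))
```

With real data you would load the downloadable lists in place of the toy dictionaries; the overlap check is one simple way to confirm that a special purpose list repeats no headword already in the general list.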
New Academic Word List 1.0 (NAWL):
While we are all tremendous fans of the original Academic Word List (AWL) developed by Dr. Averil Coxhead, her AWL was made to fit together with, and be a next step for students and teachers using, West's 1953 GSL, both of which were based on the concept of word families. With the publication of the 2013 NGSL, which is based on the concept of modified lexemes, we knew that we needed to create a new list of important high frequency academic words that would fit tightly together with the NGSL, and we therefore published the New Academic Word List (NAWL), which was based on a 288 million word academic corpus. Just as the NGSL improves on the GSL, the NAWL provides somewhat better coverage than the original AWL. As you can see from the chart below, the combined NGSL/NAWL gives about 5% more text coverage than the combined GSL/AWL. More information about the NAWL version 1.0, including various downloadable versions of the list, can be found using the pulldown menu on the left.
TOEIC Service List 1.1 (TSL):
In May of 2016, Dr Browne and Dr Culligan published a 2nd special purpose word list, the TSL or TOEIC Service List, that was designed to be used as the next step after learning the New General Service List for learners who need to understand the texts and listening materials that appear on the TOEIC exam. Based on a 1.5 million word corpus of TOEIC materials, the 1200 words of the TSL (when combined with the 2800 words of the NGSL) provide up to 99% coverage of words that appear on the TOEIC test. More information on the TSL version 1.1, as well as a growing number of free TSL resources, can be found on the TSL page from the pulldown on the left.
Business Service List 1.0 (BSL):
In July of 2016, Dr Browne and Dr Culligan published a 3rd special purpose word list, the BSL or Business Service List, that was designed to be used as the next step after learning the New General Service List for learners needing to comprehend general business English texts and materials. Based on an approximately 64 million word corpus of business texts, newspapers, journals and websites, the 1700 words of the BSL (when learned in combination with the 2800 words of the NGSL) provide learners with approximately 97% coverage of the English they need in most general business situations and materials. More information on the BSL version 1.0, as well as a growing number of free BSL resources, can be found on the BSL page from the pulldown on the left.
NGSL-S Update (June 3, 2016)
We finally got around to doing a long overdue cleanup and update of the frequencies for the spoken subsection of the NGSL, which will now be known as the NGSL-S 1.1. The frequencies and data can be downloaded from the "NGSL Lists" pulldown menu at the left. As you can see from the chart below, the NGSL-S 1.1 offers incremental improvements in coverage in all three categories (Unscripted Spoken, Radio, and TV) over the NGSL-S 1.0 that was released in 2013. At the bottom of the chart you can also see the improvement of the 1.01 version of the main NGSL list over the original 1.0 version:
NGSL Update (April 4, 2014)
Based on feedback given at conferences, by email and on this website, today we release version 1.01 of the NGSL. Both the 1.0 and 1.01 versions of the list are available for download from the pulldown menu on the left. The net result of the 1.01 changes is to decrease the number of NGSL headwords by 17, from 2818 to 2801. Below is a summary of the revisions.
TWO WORDS ADDED:
• Insertion of TOURNAMENT, which was accidentally deleted in the initial analysis
• YEAH, which was originally counted as a derived form of YES, is now counted under its own headword
NINETEEN WORDS DELETED:
Four numbers were deleted and moved to the supplemental list:
Inflected parts of speech of pronouns were demoted and listed under their canonical objective pronoun:
o HER was listed under SHE
o HIM and HIS were listed under HE
o ITS was listed under IT
o ME and MY were listed under I
o OUR and US were listed under WE
o THEIR and THEM were listed under THEY
o THESE was listed under THIS
o THOSE was listed under THAT
o WHOM and WHOSE were listed under WHO
o YOUR was listed under YOU
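The headword arithmetic behind this revision can be checked with a short Python sketch. It is only an illustration: the variable and function names are made up for this example, the pronoun mapping is copied from the list above, and the four deleted numbers are represented only as a count because they are not named here.

```python
# Demoted forms and the headword each is now listed under (from the list above).
DEMOTED_PRONOUNS = {
    "her": "she",
    "him": "he", "his": "he",
    "its": "it",
    "me": "i", "my": "i",
    "our": "we", "us": "we",
    "their": "they", "them": "they",
    "these": "this",
    "those": "that",
    "whom": "who", "whose": "who",
    "your": "you",
}

ADDED_HEADWORDS = ["tournament", "yeah"]
DELETED_NUMBER_COUNT = 4  # a count only; the four numbers are not named here

def headword_for(form):
    """Return the headword a form is counted under in this sketch."""
    form = form.lower()
    return DEMOTED_PRONOUNS.get(form, form)

def headwords_after_update(before, added, deleted):
    """Apply additions and deletions to a headword count."""
    return before + added - deleted

deleted_total = DELETED_NUMBER_COUNT + len(DEMOTED_PRONOUNS)
new_total = headwords_after_update(2818, len(ADDED_HEADWORDS), deleted_total)

print("words deleted:", deleted_total)
print("headwords in version 1.01:", new_total)
print("THEM is counted under:", headword_for("THEM").upper())
```

The fifteen demoted pronoun forms plus the four numbers account for the nineteen deletions, and adding the two new headwords gives the net decrease of 17.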
Please note that the Excel file has several tabs providing you with access to different bits of information regarding the NGSL 1.01:
- The first 3 tabs give a lemmatized version of the list in 3 bands (1-1000, 1001-2000 and 2001-2801)
- The 4th tab gives you frequency information including the SFI and adjusted frequency per million (please note that this tab gives only the headwords without all the associated lemmas, which may be useful for situations where the focus is more on teaching than on accurately calculating coverage)
- The 5th tab lists the 52 words from the categories of NUMBERS, MONTHS and DAYS OF THE WEEK which were removed from the NGSL but may be needed for pedagogic purposes.
NGSL-S 1.0 Update (Dec 2, 2013)
Based on a request from a Japanese university, we are now providing frequency and coverage figures not only for the 273 million word corpus of general English (which is comprised of 90% written and 10% spoken data), but also for the 27 million word spoken subsection of our corpus, which we will now refer to as the NGSL-S 1.0 (S=Spoken). As expected, important vocabulary thresholds are reached with far fewer words in spoken English. Our spoken subsection consisted of 3 main parts: spoken conversational English, TV and radio. What is interesting is how much lower the coverage figures are for TV and radio than for conversational English:
While we are not ready to make strong claims about the spoken coverage figures as the list has not yet gone through the same level of scrutiny as the NGSL list as a whole, we provide the figures (and list) in the spirit of this whole project, and that is to try to put out the NGSL in as many useable forms and with as many useable tools as possible.
Spreading The Word...
Dr. Browne has been trying to disseminate information about the NGSL, NAWL, TSL and BSL through a series of academic presentations at conferences around the world including the 2013 World Congress on Extensive Reading in Korea, a Featured Speaker address at the 2013 National KOTESOL Conference in October, the 2013 JALT National Conference in Kobe, a Keynote address at the 2013 Korean Association of Corpus Linguistics, the 2013 Vocab@Vic Conference in New Zealand, 2 Keynotes at Cambridge Day Conferences in Taiwan in April of 2014, a Plenary address at KOTESOL National in May of 2014, a Featured presentation at the Vocab SIG of JALT in June of 2014, a Plenary at the TRI-ELE Conference in Thailand, a Featured Vocabulary Symposium at the AILA World Congress of Applied Linguistics in Australia in early August 2014, a Featured Speaker address at the JACET National Conference in late August 2014, a Keynote address at the JALT Extensive Reading Conference at Keisen University in September of 2014, a Keynote for Kansai TechDay Plus in Kobe in October of 2014, a Keynote at the ICEE Conference at Shin Chein University in Taiwan in April of 2015, two Plenary addresses at a Professional Development Conference at Qatar University, Qatar, in June 2015, a Keynote address at the combined JALT, JASAL and Okayama University Language Education Centre Conference in July 2015, a Plenary talk at the TEFLIN Conference in Bali, Indonesia, in September 2015, a TEDx Talk given at Tokyo International School in Tokyo, Japan, in October of 2015, and many more.
New General Service List by Browne, C., Culligan, B., and Phillips, J. is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Permissions beyond the scope of this license may be available at http://www.charlie-browne.com.
NOTE: Development of the NGSL was made possible through approved access to the Cambridge English Corpus (CEC). The CEC is a multi-billion word computer database of contemporary spoken and written English. It includes British English, American English and other varieties of English. It also includes the Cambridge Learner Corpus, developed in collaboration with the University of Cambridge ESOL Examinations. Cambridge University Press has built up the CEC to provide evidence about language use that helps to produce better language teaching materials.