The title may well be someone's reaction on reading about the recent addition of words such as COQUI to the new official Scrabble word list from Collins, Collins Scrabble Words (CSW2015). Indeed, there are so many un-English-looking words that one is tempted to find out whether they are all really English words, and where they come from.
I decided to find the answer with the help of Google Trends. For those unfamiliar with it, Google Trends allows you to key in a word or term and see (in relative terms) how often it has been searched in the recent past. More importantly for my current purpose, it can also show a relative comparison of how frequently a term is searched across regions. Take for example this search for “Hillary Clinton” in Google Trends, which shows not only how interest in her has changed over time (spiking during her 2008 run for the presidential nomination), but also that most searches are, naturally, from the United States, reflecting the familiarity of the term “Hillary Clinton” in the US. (It is interesting to note that Kenya is the region with the third most searches on her, probably due to the contest against Obama, but that’s another story.)
Taking the same idea, I fed the new additions in CSW2015 into Google Trends to identify where the words are most commonly searched, as a proxy for where they are most commonly used.
Although there are supposedly 6,500 new words in CSW2015, the list is unfortunately not publicly available at the moment. As the next best source, I turned to the CSW 15 initiation kit prepared by WESPA to help Scrabble players transition to the new dictionary. From the 4171 words of 3 to 9 letters, the kit selects the most useful ones to know, to help Scrabblers prioritise their study (e.g. it omits the six-letter words, which are usually not as useful).
I took all the 3-5 letter words from the kit, the new 7s and 8s in the top 10000 by probability, and the 7s and 8s with four or more vowels. I then scrubbed the list further, removing words that are pure -S or -ED extensions of new words to reduce duplication, e.g. PWN is in but PWNED and PWNS are out. New extensions of old words (e.g. HOIED) are retained. The result is a more manageable 873 words, which also represent some 6-letter words via their plurals among the 7s.
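For readers who want to reproduce the scrubbing step, here is a minimal sketch in Python. It is not the script I actually used; the function name `drop_plain_extensions` and the tiny word list are made up for illustration. A word is dropped only if stripping a trailing -S or -ED leaves another word that is itself in the list of new words.

```python
def drop_plain_extensions(new_words, suffixes=("S", "ED")):
    """Remove words that are pure -S / -ED extensions of other NEW words.

    Extensions of old words (whose base is not in new_words) are kept.
    The function name and signature are hypothetical, for illustration only.
    """
    new = {w.upper() for w in new_words}
    kept = []
    for word in sorted(new):
        # A word is a plain extension if base + suffix == word
        # and the base is itself one of the new words.
        is_extension = any(
            word.endswith(suffix) and word[: -len(suffix)] in new
            for suffix in suffixes
        )
        if not is_extension:
            kept.append(word)
    return kept


# Illustrative input only, not the real kit.
print(drop_plain_extensions(["PWN", "PWNED", "PWNS", "HOIED"]))
```

Here PWNED and PWNS go because PWN is in the list, while HOIED stays because its base, HOI, is not a new word.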
These words were then all fed into Google Trends. Of these, 242 were not searched frequently enough for Google Trends to show any results, led by the likes of BOXLA, HOIED, etc. For each of the remaining 631 words, I extracted the regions where the word is searched the most.
The result is presented in the interactive chart below: the bigger a region's circle, the more words are searched the most there. Hover over the circles in a region to see the words associated with them.
(Note: I found that the chart may take some time to load fully. The regions with smaller counts will populate later; give it some time and explore the bigger circles first.)
A further 85 words have no data on the regions where they are most commonly searched (a sample: CAZH, AIYEE, EMICS, etc.). The remaining 546 words are all somewhere in the map above.
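The tallying behind the map can be sketched roughly as follows. This assumes the regional interest for each word has already been collected into a plain dictionary; the helper names and the scores in `sample` are invented for illustration and are not real Google Trends output. Words with no regional data are skipped, and every other word is credited to the region with its highest score.

```python
from collections import Counter, defaultdict


def top_region(interest):
    """Return the region with the highest score, or None if there is no data."""
    if not interest:
        return None
    return max(interest, key=interest.get)


def words_by_top_region(interest_by_word):
    """Group words under the region where each one is searched the most."""
    groups = defaultdict(list)
    for word, interest in interest_by_word.items():
        region = top_region(interest)
        if region is not None:  # skip words without regional data
            groups[region].append(word)
    return groups


# Made-up scores for illustration only; not real Google Trends output.
sample = {
    "COQUI": {"Puerto Rico": 100, "United States": 12},
    "EMOJI": {"Philippines": 100, "United States": 64},
    "FANGIRL": {"Philippines": 100, "Singapore": 40},
    "CAZH": {},  # no regional data, so it is left off the map
}

groups = words_by_top_region(sample)
counts = Counter({region: len(words) for region, words in groups.items()})
print(counts.most_common(10))  # regions ranked by number of words
```

The size of each circle corresponds to a region's count, and the hover text to its word list in `groups`.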
The USA is, as expected, the biggest contributor to the searches. But, rather surprisingly (for me), there are 83 other regions represented here. The top 10, with the number of words in each, are:
Admittedly, some of the words may be misleading because their local context differs from the meaning in the Collins dictionary: MMM, which is tops in Indonesia, happens to be the name of an organisation there, not an interjection; ditto YEOW, which is a common enough surname in Singapore, etc. However, there are several rather interesting and unexpected appearances, e.g. the Philippines seems to have a lot of FANGIRLs who love their EMOJIs. Puerto Rico indeed emerges tops for COQUI, but the other top word there (MONIC) hardly sounds Puerto Rican.
I have no doubt that the fine lexicographers who compiled the word list have vetted all the words to ensure that they, however foreign, fulfil some criteria to be recognised as words used in English. For me, this map simply reflects the rich range of sources the English language taps to absorb new words.
In the (hopefully not too distant) future, I will “mine” the Google Trends data for more information on how recent the new words are.