Letter positions in Scrabble

I recently came across a chart showing where letters tend to appear in English words. The author took a corpus of English works and used it to count how often each letter occurs in each position.

This inspired me to adapt it for Scrabble use, in particular to address one question: where should I put a tile as I shuffle it on my rack? For many players, myself included, shuffling the tiles on the rack helps trigger visual cues to discover a word lodged somewhere in the recesses of the brain. However, shuffling through all possible combinations simply takes too much time, and skipping combinations risks missing a word. An intuition about where certain tiles end up most of the time may help alleviate this issue.

The chart below shows a sort of “heat map”, where the reddest colour indicates the position where a letter is most likely to occur in a word (and hence, where you should put that Scrabble tile more often when shuffling).

Where is the hot location for this letter?

A few differences from the original chart on prooffreader.com:

– This chart uses the CSW12 word list, where each word occurs only once. The original chart uses actual English bodies of work, so more frequent words (e.g. “the”) skew the positions of certain letters. That is relevant for linguistic exploration, but not for Scrabble.

– I use percentages rather than absolute counts. I’m not interested in knowing how often V occurs overall in the dictionary. Instead, what I want to know is: if I have a V, what percentage of the time does it sit in a particular position of a word?

– I only include words that are 4 to 8 letters long. 2- and 3-letter words should be known cold by any serious Scrabble player, and longer words are far less useful unless you’re Nigel Richards (in which case you don’t need this heat map guide anyway).
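
For readers who want to try this themselves, here is a minimal sketch of the counting described in the three points above. It is not my original code (that was written in R); it is a Python illustration under a few assumptions: the function name `position_shares` is made up, the word list is any iterable of strings, and the way positions of words with different lengths are squeezed onto one common scale (8 bins, first letter in the first bin, last letter in the last bin) is just one plausible choice.

```python
from collections import defaultdict


def position_shares(words, min_len=4, max_len=8, bins=8):
    """Return {letter: [share per position bin]}; each letter's shares sum to 1.

    Assumption: positions are rescaled so that the first letter of every word
    lands in bin 0 and the last letter in bin `bins - 1`. min_len must be >= 2.
    """
    counts = defaultdict(lambda: [0] * bins)
    # A set makes sure each word is counted once, as in a word list.
    for word in {w.strip().upper() for w in words}:
        n = len(word)
        if not (min_len <= n <= max_len) or not word.isalpha():
            continue
        for i, letter in enumerate(word):
            # Rescale index 0..n-1 onto bins 0..bins-1.
            b = round(i * (bins - 1) / (n - 1))
            counts[letter][b] += 1
    # Convert counts to shares per letter (percentage rather than absolute count).
    return {
        letter: [c / sum(row) for c in row]
        for letter, row in counts.items()
    }
```

With a word list saved one word per line (say in a hypothetical file `csw12.txt`), something like `position_shares(open("csw12.txt"))` would give the share of each letter's occurrences per bin, and `shares["Y"][-1]` would be the share of Ys that sit at the end of a word.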

 

This chart confirms some well-known assumptions, e.g. S is very valuable as an ending due to its presence in plurals and third-person present tense forms. Ditto the ending-dominant D due to -ED, and G due to -ING.

That Y favours the end is not so surprising; its percentage is. 51% of the time Y is found at the end of a word, even more than S, which is at about 49%. No doubt -CY, -ITY, -LY etc. all add to it, but the high percentage is mainly due to the fact that Y is hardly present elsewhere. So, don’t put your Ys on the left of your rack as you shuffle.

S, on the other hand, is frequently found even in non-plurals; in particular, S as the first letter seems to be the most frequent among the one-pointers. This is something that many beginners seem to forget: try shuffling S to other parts of the rack too, and you’d be surprised how flexible it is.

The chart also confirms an intuition that already helped me find words much faster: mid-valued tiles (3-4 pointers) are mainly dominant as the first letter, with the exception being H, possibly due to its many digraphs (-CH, -PH, -SH, -GH).

Among the power tiles, Q and J are front-heavy, X is end-heavy, and Z is pretty flexible, highlighting why it’s the best of the power tiles. K, however, surprised me, as I have tended to put it at the front when shuffling; I guess now I know better.

 

I further split the heat map by word lengths, to see if there are any changes to the pattern within the same letter.

Letter positions for various word lengths

Where there is a grey spot in a square in the chart above, that means the letter never appears in that position (e.g. Q will never be found in the 6th position of a 7-letter word, but appears – albeit very infrequently – as the last in a 7-letter word).
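
The per-length split and the grey spots can be sketched in the same spirit. Again this is a Python illustration rather than my R code, and the names `shares_by_length` and `never_cells` are made up for the example. Here no rescaling is needed, since each word length gets its own row of exact positions.

```python
from collections import Counter


def shares_by_length(words, lengths=(4, 5, 6, 7, 8)):
    """Return {length: {letter: [share at each position]}} for the given lengths."""
    counts = {n: Counter() for n in lengths}  # (letter, position) -> count
    totals = {n: Counter() for n in lengths}  # letter -> count
    for word in {w.strip().upper() for w in words}:  # each word counted once
        n = len(word)
        if n not in counts or not word.isalpha():
            continue
        for pos, letter in enumerate(word):
            counts[n][letter, pos] += 1
            totals[n][letter] += 1
    return {
        n: {
            letter: [counts[n][letter, pos] / total for pos in range(n)]
            for letter, total in totals[n].items()
        }
        for n in lengths
    }


def never_cells(table):
    """List (length, letter, 1-based position) cells with a zero share: the grey spots."""
    return [
        (n, letter, pos + 1)
        for n, rows in table.items()
        for letter, row in rows.items()
        for pos, share in enumerate(row)
        if share == 0
    ]
```

One caveat of this sketch: a letter that never appears at all in words of a given length has no row for that length, so it will not show up in `never_cells`.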

A general observation is that for longer words, the colours for the same letter are less contrasting, i.e. the letters are more likely to be found all over the place.

Some letter-specific observations:

– From 5-letter words onwards, E occurs most frequently in the second-last position. Look one row above, and you’ll notice it moves in tandem with D’s hot spot in the last position, confirming the prevalence of -ED.

– B, F, P and W notably lose heat as the last letter as the words grow longer: they are still reasonably common end letters in 4-letter words, but practically non-existent in bingo-length words. (Though I gleefully remember an opponent playing a P to the top-right TWS to empty the bag, reasoning that there are hardly any long words ending in P; I hastily rearranged my homeless HEISTER to plonk down TREESHIP, bingo out and eke out a win.) W is the learning point for me, as I somehow intuited that there are reasonably many words ending in -OW; apparently not enough. Also worth noting that B is not even common in the second-last position.

– S, for all its flexibility, is pretty useless as a second letter.

– One-pointer consonants (L, N, R, T) are as expected very flexible and present everywhere. N does seem more prominent as the second last letter, presumably due to -ING.

– A, O, and U are particularly frequent in the second position; I guess second syllables onwards use more E and I.

– C ending is not as common as I thought – possibly a wrong intuition due to my tendency to look for -IC forms.

 

5 thoughts on “Letter positions in Scrabble”

  1. GH

    Interesting! If I can remember these, I’d be able to make better judgments about which letter to dangle in which position in the triple-triple lane. For example, W in the second position is threatening but W in the fourth or sixth positions are almost of no threat at all. All these are also useful to know when closing a board down. Thanks!

    1. Ricky Purnomo Post author

      Yup, exactly why I investigated the 8-letter words too, even if I will only shuffle 7 letters on the rack: triple-triple lane management.

      Though the TREESHI(P) example should serve as a cautionary tale, as should a recent bingo-out of AIRPROO(F) in the Australia Nationals.

  2. Steve Grob

    I would be interested to know whether the computer programs already use this information. Not that the bots need any more help!
    Many humans, if not most, can infer much of this information when they play. Still a very useful tool.

  3. Quinten

    Would you be willing to share your code? I would love to run this same analysis for OTCWL2014. I assume it would end up looking much the same, but I am curious

    1. Ricky Purnomo Post author

      Quinten, I’m having a problem with my GitHub account at the moment. I can email it to you if you want (it’s in R).
