## Saturday, May 17, 2008

### Four-Letter Words

The Change-a-Word is a staple of supermarket puzzle books. In Change-a-Words, you have to change a word like GULF into a word like YAWN by changing one letter at a time. (So you might say: GULF, GULL, BULL, BURL, BURN, BARN, YARN, YAWN.)

These puzzles are sort of fun, but they are mostly amusing for the variations they inspire (see e.g. my 75th Anniversary Lecture), and for the logical and linguistic questions they raise. For example,

1. Can most words be reached from most words?

2. Which words are "furthest" from each other? Specifically, if the shortest chain of words connecting A and B has length N, then which words A and B maximize N, and what is this maximal value of N?

I consulted my handy electronic dictionary to answer these questions.

As to question 1: For four-letter words, it appears to be the case that about 95% of the words are connected in a massive "supercontinent." This means most four-letter words can indeed be reached from most four-letter words - hence the steady stream of supermarket Change-a-Word books.

The 5% of four-letter-words lying "offshore" are either isolated "island words" (can you think of any?), or isolated pairs of words (such as IDOL-IDYL), or isolated triples such as OGEE-OGRE-OGLE or the distinct case IDLE-IDLY-ISLE, etc. The largest of the offshore islands is the eight-word "atoll" {QUAD, QUAI, QUAY, QUID, QUIP, QUIT, QUIZ, QUOD}. (See figure.)

As to question 2: In my dictionary at least, the Change-a-Word puzzle that requires the greatest number of steps to solve is this one:

INCH
.
.
.
OBIT

These two words lie at opposite ends of the dictionary, so to speak. I'll leave it to the reader to find a chain of words connecting them! A hint is available here, and the solution is available here.

P.S. One assumes that 15-letter words are not concentrated in a massive continent...but I haven't looked at those yet!

danimal said...

There are 26 to the power 4, or just under 457,000, possible combinations of four letters. How many real four-letter words does your electronic dictionary list? Let's call that number f.

I'm guessing f < 10000 -- in any case, the space of possible four-letter words is populated pretty sparsely by real English words.

If you distributed f points randomly in the 26x26x26x26 "space" of possible words, you'd get mainly isolated islands and a few small clusters of connected words. Maybe something as large as the eight word "atoll" in your post... The fact that 95% of real 4-letter words are connected I is due to the common patterns the words must have make them pronounceable?

In that 26x26x26x26 "space", the supercontinent would probably be a wispy-like fractal structure, since its total volume isn't that high but it would cover most of the space.

I guess you could actually make a plot of the space of possible and real three-letter words, since that would be in 3 dimensions. That would be fun to see, I might try it. Is there a list of all three-letter words somewhere on the Net?

danimal said...

Ooops, llight problem with the last post... in thinking about how many clusters would come from scattering f words randomly on the 26x26x26x26 "four-letter word" grid, I was thinking about words being connected if they are right next to each other in space, i.e. one *grid point* apart.

But two four-letter words are connected in the game if they differ by one *letter*. That corresponds to two grid points being connected by a line (of any length) along any of the four coordinate axes. This gives a much higher chance of two random four-letter strings being connected than if they have to land one grid unit apart.

So question seems to be: how does the expected cluster distribution from randomly throwing down f 4-letter words on the grid compare to what Jason found for actual four-letter words, i.e. the "supercontinent" and many small islands including the eight-word atoll?

But this also suggests a much tougher version of the supermarket puzzle! Find a route from one four-letter word to another, but making changes only between neighboring letters in the alphabet, i.e. "b" can only become "a" or "c".

JasonZimba said...

OK, let's try Danimal's game.

BANK...SHOT

Whew! We see one tough thing about the game is that vowels are not adjacent to one another. And "K" was a tough start, because it's next to "J". Let's try again.

BAND...CAMP
BANE
CANE
CAME
DAME
DANE

Stuck...! I can see that it's going to take some work just to find solvable instances of this game.