deepmoo

deepmoo was my entry in the programmer of the month competition.

The journal I kept while working on deepmoo.

The source code to deepmoo. The version submitted for the contest was automatically generated from this. If you want to compile it you'll also need the trigraph data it #includes.

My program's guesses on the contest words:

C:\sean\potm>deepmoo -v DUCK WITHOUT RESOURCES
Guessing 'DUCK'
 1 ETAO 0 4
 2 INSH 0 4
 3 RDLU 0 2
 4 MPCG 1 3
 5 WYBF 0 4
 6 JKQV 0 3
 7 DUCK 4 0
Guessing 'WITHOUT'
 1 ETTAAAA 1 5
 2 OIINNNN 1 4
 3 SHHRRRR 0 5
 4 DLLUUUU 1 3
 5 MPPCCCC 0 7
 6 GWYBFJK 0 6
 7 VQXXZZZ 0 7
 8 BITTHOU 2 1
 9 TITOUGH 2 1
10 FITUTHO 2 1
11 WITHOUT 7 0
Guessing 'RESOURCES'
 1 ETTAAAAAA 0 8
 2 OIINNNNNN 0 8
 3 SHHRRRRRR 1 2
 4 DLLUUUUUU 1 3
 5 MPPCCCCCC 1 3
 6 GWWYYYYBB 0 9
 7 FJJKKKQXZ 0 9
 8 SERECEOUS 2 0
 9 SOROSCUSE 1 0
10 CERCOUSER 2 0
11 COOCUEERS 2 0
12 REOURESCE 2 0
13 RESOURCES 9 0

By total coincidence, deepmoo solves 'SEAN' in 3 guesses!

On dictionaries, lossy compression, and wordspaces:

---------------------------------------------------

Feb 15, 1999

I spent a long time exploring the notion of "wordspace".

It's illegal for us to use a dictionary.  It's not
illegal for us to use digraphs or trigraphs or
quadgraphs, though, is it?  (Well, quadgraphs
for 4-letter words would be a dictionary!)

I don't know how to reconcile this, since any restrictions
on the "set of words the program might generate" could
be interpreted as a dictionary.  And the workaround
(explore the "dictionary" first, then fallback on the
full set of strings) prevents that definition from literally
applying.

Even just applying a little bit of smarts (you can't have
the same three letters appear in a row) restricts the
wordspace, so it doesn't seem fair to define a dictionary
as a restriction of a wordspace.

But what if I lossily compress a dictionary so that the
wordspace I pursue is the original dictionary plus
another 10% of garbage words or so?

What if it's 100% bigger?  What if it's 10 times
as big?  What if it's 100 times as big?

Here's information about some "restricted wordspaces"
or "synthetic dictionaries" for 9-letter words:

              N   
5,429,503,678,976  Strings from AAAAAAAAA...ZZZZZZZZZ
  184,511,547,557  Three adjacent trigs (using 5693 found in the dictionary)
   62,902,335,833  Three adjacent trigs (using 3977 found in 9 letter words)
    1,832,210,380  Three adjacent trigs (1289 1st, 1988 2nd, 715 3rd)
    1,191,748,033  All 7 trigraphs valid (of 3977)
      100,129,697  All 7 trigraphs valid for that position
       65,278,757  All 6 quadgraphs valid (of 12580)
        3,813,306  All 6 quads positionally valid from full dict
          449,505  All 6 quads positionally valid from 9-letter-word dict
            6,475  9-letter words in the computer dictionary I've constructed

Unfortunately, at this point POTM became a tedious, boring exercise in trying to compress those tables, since that seemed the best way of improving my program. I never got it past "All 7 trigraphs valid for that position", though.

A run of my program using the 3,813,306 word synthetic wordspace:

C:\sean\potm>deepmoo -q -v DUCK WITHOUT RESOURCES
Guessing 'DUCK'
 1 ETAO 0 4
 2 INSH 0 4
 3 RDLU 0 2
 4 MPCG 1 3
 5 WYBF 0 4
 6 JKQV 0 3
 7 DUCK 4 0
Guessing 'WITHOUT'
 1 ETTAAAA 1 5
 2 OIINNNN 1 4
 3 SHHRRRR 0 5
 4 DLLUUUU 1 3
 5 MPPCCCC 0 7
 6 GWYBFJK 0 6
 7 VQXXZZZ 0 7
 8 TITHOUG 5 1
 9 WITHOUT 7 0
Guessing 'RESOURCES'
 1 ETTAAAAAA 0 8
 2 OIINNNNNN 0 8
 3 SHHRRRRRR 1 2
 4 DLLUUUUUU 1 3
 5 MPPCCCCCC 1 3
 6 GWWYYYYBB 0 9
 7 FJJKKKQXZ 0 9
 8 SORCOUSES 2 0
 9 CORRESCUE 1 0
10 CROCURESS 3 0
11 RESOURCES 9 0

Not good enough to win, anyway!