Codebook representation
=======================
Some effort was exerted in balancing the representation of codebooks
to allow for both speed and size efficiency, particularly in the handling
of sparse codebooks. (Note: some sparse codebooks aren't that sparse,
and OUGHT to go the normal route, but this was easier and the difference
is minimla.)
Codebooks consist of two components:
huffman scalar decode (mapping variable-length numbers to small integers)
VQ vector decode (mapping small integers to known vectors)
Some codebooks are "sparse". For our size-aggressive decode strategy,
representing sparse codebooks the normal way explodes in size (one
codebook in a test file I looked at had only 81 of 6561 symbols
defined, and took 180KB of storage using a non-sparse representation).
For the huffman scalar decode, there are three possible integers:
codebook symbol -- the integer value represented
huffman symbol -- an arbitrary variable-bit-width integer
sorted index -- the index of the entry in the sorted_codewords array
We store the following fields for each codebook:
entries: the "number" of codebook symbols (i.e.. the largest one + 1)
sorted_entries: the number of entries in the 'sorted_' tables:
if !sparse, the number of symbols not determined by fast-huffman decode
if sparse, the number of present (non-sparse) symbols
codeword_lengths:
if !sparse, the huffman symbol lengths indexed by codebook symbol
if sparse, the huffman symbol lengths indexed by sorted index
codewords:
if !sparse, the huffman symbols indexed by codebook symbol
if sparse, NULL (during build, the huffman symbols indexed by build order)
fast_huffman:
if !sparse, the codebook symbol "indexed" by huffman symbol
if sparse, the sorted index "indexed" by huffman symbol
sorted_codewords:
if !sparse,
if STB_VORBIS_HUFFMAN_BINARY_SEARCH, huffman symbols in sorted order
else NULL
if sparse, all huffman symbols in sorted order
sorted_values:
if !sparse,
if STB_VORBIS_HUFFMAN_BINARY_SEARCH, codebook symbols indexed by sorted index
else NULL
if sparse, codebook symbols indexed by sorted index
Note that generally, all the tables are indexed by the same thing, _except_
the codeword_lengths table. Sparse codebooks have no arrays that are
indexed by codebook symbol (since codebook symbol is the thing that is
sparse).
If STB_VORBIS_NO_DIVIDES_IN_CODEBOOK is defined, then sparse vector codebook
accesses begin by determining the codebook symbol as normal (then precede to
mod/divide). If STB_VORBIS_NO_DIVIDES_IN_CODEBOOK is NOT defined, then we
are required to pre-expand the symbols into their VQ vectors; but we want to
do this "sparsely". So we "compact" the pre-expanded codebook so that it is
indexed by the 'sorted index' instead of by the 'codebook symbol'. To make
this work, the sparse vector codebook access must begin by determining the
_sorted index_ without converting it to the codebook symbol. Therefore
(a) it calls a different decode function that reports the sorted index; (b)
sparse lookups on the fast huffman need to return the sorted index, not the
codebook symbol, so that's what we store in the fast huffman; (c) we HAVE to
use the binary search to decode a sparse entry.