Vorbis Input Streaming
======================

Ogg Parsing

The most important design decision for how stb_vorbis loads data is that it does not have a separate Ogg parser. This was a snap design decision I made early in development, primarily for efficiency and clarity, based on my experience with jpeg and png: the clearest, best code just decodes everything as it comes in, in a linear stream.

In practice I doubt that my choice leads to an efficiency win _or_ a clarity win, and it does make the decoder less effective at recovering from certain unlikely kinds of corruption; but it does absolutely minimize the amount of read-ahead necessary to decode a frame. That last reason probably makes it the right choice, but it's debatable.

If we parsed Ogg separately, we could read a whole Ogg page and pass the individual packets to the Vorbis code. 99% of the time, we would not look at the Vorbis bytes and would not need to reassemble them (they would be intact in the Ogg page), and the code would run at the same speed. In the infrequent case that a packet crossed an Ogg page boundary, we could copy and reassemble the packet and keep that complexity out of the Vorbis decoder. Of course, in the current implementation, that complexity is buried even _further_ down, in get_bits and get8_packet, which track segments and process pages behind the decoder's back. The question is where that complexity is better placed.

A plausible Ogg-parsing implementation would read _all_ of each page a packet spans, whereas the current implementation reads no further than the end of each packet. Because in practice pages and packets are both small, this makes little difference, but in terms of what the _spec_ allows the difference is fairly huge, since two pages combined could be as big as 128KB.
The current implementation using pushdata might require that much memory (or more, should a single packet cross even more pages--allowed by the spec, as implausible as it might be), but the streaming readers can use an arbitrarily small buffer.

Overall this feels like the more flexible design (more likely to be useful on a variety of platforms), at some sacrifice of clarity and corruption detection. (See the bullet-point list below for the case of bad corruption detection.)

Streaming Input

Most of the stb_vorbis codebase is written as if we are _pulling_ data in from some source, not having the data pushed at it. Even so, unless you seek, stb_vorbis does not do any read-ahead or rewinding.

As noted, parsing of Ogg data is interleaved with parsing of Vorbis data automatically. For this reason the 'delete samples at the beginning' feature is not supported, since it's implicitly encoded in the ogg granule position describing the _last_ frame of a page, and we'd have to scan backwards and work out the size of all the frames in the page.

IMO, the whole thing with Ogg having all the timing data is an unfortunate design that makes decoding way more painful, compared to, say, 2 bytes in the header. (It's not like you can losslessly edit an ogg vorbis file by ONLY parsing Ogg, since you still need to know how long each packet is in samples, which you can't tell without decoding the Vorbis header and a little of each packet.)

The specification makes a note that encoders should only output two packets in the first frame if using a non-0 offset, but this is not guaranteed, and the consequences for this decoder would be far greater than the spec imagines. (I imagine the rationale is that they expected you'd load a whole page at a time, but since a page can be 64KB and that's 1/4 of the memory on some otherwise plausible platforms, that seems unwise.)
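
To make the interleaving concrete, here is a minimal sketch of a pull-style byte reader that walks Ogg lacing segments and parses the next page header only when the current page runs out. It is not stb_vorbis's actual code: every name in it (reader, raw8, start_page, next_segment, packet_get8, packet_length, put_page, demo_two_pages) is hypothetical, an in-memory buffer stands in for the real data source, and it skips the CRC, the page sequence number, and the continued-packet flag entirely. It also assumes the caller drains each packet before starting the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state; none of these names come from stb_vorbis. */
typedef struct {
    const unsigned char *data;   /* in-memory stand-in for a file or socket */
    size_t len, pos;
    unsigned char segs[255];     /* lacing values of the current page */
    int seg_count, next_seg;
    int bytes_in_seg;            /* unread bytes in the current segment */
    int last_seg;                /* set once a lacing value < 255 ends the packet */
} reader;

static int raw8(reader *r) { return r->pos < r->len ? r->data[r->pos++] : -1; }

/* Parse one 27-byte page header plus its segment table. No CRC check. */
static int start_page(reader *r) {
    unsigned char hdr[27];
    int i, c;
    for (i = 0; i < 27; ++i) { if ((c = raw8(r)) < 0) return 0; hdr[i] = (unsigned char)c; }
    if (memcmp(hdr, "OggS", 4) != 0) return 0;   /* missed the capture pattern */
    r->seg_count = hdr[26];
    for (i = 0; i < r->seg_count; ++i) { if ((c = raw8(r)) < 0) return 0; r->segs[i] = (unsigned char)c; }
    r->next_seg = 0;
    return 1;
}

/* Advance to the packet's next segment, crossing a page boundary if needed. */
static int next_segment(reader *r) {
    int len;
    if (r->last_seg) return 0;                   /* packet already ended */
    while (r->next_seg >= r->seg_count)
        if (!start_page(r)) return 0;            /* end of stream or bad page */
    len = r->segs[r->next_seg++];
    if (len < 255) r->last_seg = 1;              /* short segment terminates the packet */
    r->bytes_in_seg = len;
    return 1;
}

/* Next byte of the current packet, or -1 at end of packet or end of stream. */
static int packet_get8(reader *r) {
    while (r->bytes_in_seg == 0)
        if (!next_segment(r)) return -1;
    r->bytes_in_seg--;
    return raw8(r);
}

/* Start a new packet and count its bytes by draining it. */
static int packet_length(reader *r) {
    int n = 0;
    r->last_seg = 0;
    r->bytes_in_seg = 0;
    while (packet_get8(r) >= 0) ++n;
    return n;
}

/* Test helper: write a page whose body is filled with one byte value. */
static size_t put_page(unsigned char *out, int flags,
                       const unsigned char *lacing, int nsegs, unsigned char fill) {
    size_t n = 27, body = 0;
    int i;
    assert(nsegs >= 0 && nsegs <= 255);
    memset(out, 0, 27);              /* granule, serial, sequence, CRC left zero */
    memcpy(out, "OggS", 4);
    out[5] = (unsigned char)flags;
    out[26] = (unsigned char)nsegs;
    for (i = 0; i < nsegs; ++i) { out[n++] = lacing[i]; body += lacing[i]; }
    memset(out + n, fill, body);
    return n + body;
}

/* Two pages: packet A (255 + 45 bytes) spans the boundary, packet B is 3 bytes. */
void demo_two_pages(int *len_a, int *len_b) {
    static const unsigned char page1[] = { 255 };
    static const unsigned char page2[] = { 45, 3 };
    unsigned char stream[512];
    size_t n = 0;
    reader r;
    n += put_page(stream + n, 0x02, page1, 1, 0xAA);   /* 0x02: first page of stream */
    n += put_page(stream + n, 0x01, page2, 2, 0xBB);   /* 0x01: continued packet */
    memset(&r, 0, sizeof r);
    r.data = stream;
    r.len = n;
    *len_a = packet_length(&r);
    *len_b = packet_length(&r);
}
```

In this sketch, demo_two_pages reports 300 bytes for the first packet and 3 for the second, and the caller of packet_get8 never sees the page boundary. The only state held beyond the data source is one segment table of at most 255 bytes, which is the sense in which a reader built this way can get by with an arbitrarily small buffer.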

General Overview

As a result, you can see a few things going on:

-- pages and packets are decoded in a linear stream without prescanning / rewinding

-- CRCs are not checked except on seek/stream recovery, because by the time we'd notice they were wrong we'd have already output all the bad data anyway (and we're about to miss an ogg capture pattern and catch the problem anyway)

-- page numbers are not checked because if they're wrong, we'd like to check the CRC and keep going if the CRC is ok, but we have no way to do that. Better to hope it's a legit page with a wrong page number and keep going (if it's a bad page, we'll recover soon enough; 50% chance of a framing error on each packet) than it is to throw the page away!

-- the 'pushdata' interface creates a buffer and streams from it like normal. Some pre-scanning code checks to make sure we have enough data (assuming no corruption). A corrupted input stream might not have enough data, in which case the end of the input buffer will be treated as eof and cause an error. (In fact, I use this to simplify determining that we have enough pushdata for parsing the header.)

-- for this reason we _have_ to check that we don't go off the end of the pushdata buffer, even if it passes the ogg 'enough data for this frame' rule, so there's no real savings to be had by putting the pushdata through a different interface. Therefore, if you're using the pushdata path, its use of the get8_packet() interface might seem inefficient but it's really not so bad.

-- the pushdata buffer must have the ENTIRE header in it for us to open the file; and it must have an entire packet (and any immediately preceding Ogg page header) to decode a frame
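
The pushdata pre-scan in the last few bullets can be illustrated with a rough sketch. It is a guess at the shape of such a check, not the code stb_vorbis uses: first_packet_extent, NEED_MORE, BAD_STREAM and OGG_MAX_PAGE are hypothetical names, the buffer is assumed to start exactly at a page header with a packet beginning at the first segment, and neither the CRC nor the continued-packet flag is examined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the stb_vorbis pushdata code. */
enum { NEED_MORE = 0, BAD_STREAM = -1 };

/* Largest legal page: 27-byte header + 255 lacing values + 255*255 body bytes. */
enum { OGG_MAX_PAGE = 27 + 255 + 255 * 255 };   /* 65307 bytes, just under 64KB */

/* Assumes buf starts at a page header and a packet starts at its first segment.
 * Returns how many bytes of buf are needed to hold that whole packet, counting
 * every page header it passes through, NEED_MORE if avail is too small to say
 * or to hold it, or BAD_STREAM if a capture pattern is missing. */
long first_packet_extent(const unsigned char *buf, size_t avail) {
    size_t pos = 0;
    assert(buf != NULL);
    for (;;) {
        size_t nsegs, i, body = 0;
        int ended = 0;
        if (avail - pos < 27) return NEED_MORE;             /* header incomplete */
        if (memcmp(buf + pos, "OggS", 4) != 0) return BAD_STREAM;
        nsegs = buf[pos + 26];
        if (avail - pos < 27 + nsegs) return NEED_MORE;     /* segment table incomplete */
        for (i = 0; i < nsegs; ++i) {
            unsigned char lace = buf[pos + 27 + i];
            body += lace;
            if (lace < 255) { ended = 1; break; }           /* packet ends on this page */
        }
        pos += 27 + nsegs;                                  /* skip header and table */
        if (avail - pos < body) return NEED_MORE;           /* packet data incomplete */
        if (ended) return (long)(pos + body);
        pos += body;                                        /* continues on next page */
    }
}
```

A caller would accumulate pushed bytes until the result is positive and only then decode. Even with a check like this passing, the byte reader still has to bounds-check every read, because a corrupted segment table can describe data that never arrives; that is the point of the bullet about not getting any savings from a separate pushdata interface.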