I'm working on a project on the iPhone where I'm recording audio from the device mic using AVAudioRecorder, and then will be manipulating the recording.
To ensure that I'm reading in the samples from the file correctly, I'm using Python's wave module to see if it returns the same samples.
However, Python's wave module fails with "fmt chunk and/or data chunk missing" when trying to open the wav file that is saved by AVAudioRecorder.
These are the settings I am using to record the file:
[audioSettings setObject:[NSNumber numberWithInt:kAudioFormatLinearPCM] forKey:AVFormatIDKey];
[audioSettings setObject:[NSNumber numberWithInt:16] forKey:AVLinearPCMBitDepthKey];
[audioSettings setObject:[NSNumber numberWithBool:NO] forKey:AVLinearPCMIsBigEndianKey];
[audioSettings setObject:[NSNumber numberWithFloat:4096] forKey:AVSampleRateKey];
[audioSettings setObject:[NSNumber numberWithInt:1] forKey:AVNumberOfChannelsKey];
[audioSettings setObject:[NSNumber numberWithBool:YES] forKey:AVLinearPCMIsNonInterleaved];
[audioSettings setObject:[NSNumber numberWithBool:NO] forKey:AVLinearPCMIsFloatKey];
1215N:~/Downloads$ od -c --read-bytes 128 testFile.wav
0000000 R I F F x H 001 \0 W A V E f m t
0000020 020 \0 \0 \0 001 \0 001 \0 @ 037 \0 \0 200 > \0 \0
0000040 002 \0 020 \0 F L L R 314 017 \0 \0 \0 \0 \0 \0
0000060 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
Apple software often creates WAVE files with a non-standard (but spec-conformant) "FLLR" subchunk after the "fmt " subchunk and before the "data" subchunk. I assume "FLLR" stands for "filler", and I assume the purpose of the subchunk is to enable some sort of data alignment optimization. The subchunk is usually about 4000 bytes long, but its actual length can vary depending on the length of the data preceding it.
Adding arbitrary subchunks to WAVE files is generally considered spec-conformant because WAVE is a subset of RIFF, and the common practice in RIFF file processing is to ignore chunks and subchunks which have an unrecognized identifier. The identifier "FLLR" is "non-standard" and so should be ignored by any software which encounters it.
There is a fair amount of software out there that treats the WAVE format much more rigidly than it ought to, and I suspect the library you're using may be one of those pieces of software. For example, I have seen software that assumes that the audio bytes always begin at offset 44 -- this is an incorrect assumption.
In fact, the correct way to locate the audio bytes in a WAVE file is to find the location and size of the "data" subchunk within the RIFF.
Reading WAVE files properly must really begin as an exercise in locating and identifying RIFF subchunks. RIFF subchunks have an 8-byte header: 4 bytes for an identifier/name field which is traditionally filled with human-readable ASCII characters (e.g. "fmt "), and a 4-byte little-endian unsigned integer specifying the number of bytes in the subchunk's data payload -- the subchunk's data payload follows immediately after its 8-byte header.
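As a minimal sketch of that header layout, the 8 bytes can be unpacked in Python with the standard struct module. The function name `parse_chunk_header` is only an illustrative choice; the two sample headers are the "fmt " and "FLLR" headers visible in the od dump from the question.

```python
import struct

def parse_chunk_header(header: bytes):
    """Split an 8-byte RIFF subchunk header into (identifier, payload_size)."""
    if len(header) != 8:
        raise ValueError("a RIFF subchunk header is exactly 8 bytes")
    # "<4sI": 4 raw identifier bytes, then a little-endian unsigned 32-bit size.
    chunk_id, payload_size = struct.unpack("<4sI", header)
    return chunk_id, payload_size

# "fmt " followed by 020 \0 \0 \0 in the dump (0x10 = 16 bytes of payload).
print(parse_chunk_header(b"fmt \x10\x00\x00\x00"))   # (b'fmt ', 16)

# "FLLR" followed by 314 017 \0 \0 in the dump (0x0FCC = 4044 bytes of payload).
print(parse_chunk_header(b"FLLR\xcc\x0f\x00\x00"))   # (b'FLLR', 4044)
```

The 4044-byte filler size agrees with the "about 4000 bytes" figure mentioned earlier.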
The WAVE file format reserves certain subchunk identifiers (or "names") as being meaningful to the WAVE format. At minimum, two subchunks must appear in every WAVE file:
"fmt " - the subchunk with this identifier has a payload which describes the basic information about the audio's format: sample rate, bit depth, etc.
"data" - the subchunk with this identifier has the actual audio bytes in its payload.
"fact" is the next most common subchunk identifier. It is usually found in WAVE files that use a compressed codec, such as μ-law. See this enthusiast webpage for more information about some of the various subchunk identifiers in use today in the wild, and information about their payload structure.
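To make the "fmt " payload concrete, here is a rough Python sketch that decodes the 16-byte PCM form of it. The function name `parse_fmt_payload` and the dictionary keys are my own labels, not anything defined by a library; the input bytes are copied from the od dump in the question.

```python
import struct

def parse_fmt_payload(payload: bytes) -> dict:
    """Decode the first 16 bytes of a "fmt " payload (the plain PCM layout)."""
    if len(payload) < 16:
        raise ValueError('"fmt " payload must be at least 16 bytes')
    # Two 16-bit fields, two 32-bit fields, two 16-bit fields, all little-endian.
    (audio_format, channels, sample_rate,
     byte_rate, block_align, bits_per_sample) = struct.unpack("<HHIIHH", payload[:16])
    return {
        "audio_format": audio_format,        # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,              # bytes of audio per second
        "block_align": block_align,          # bytes per sample frame
        "bits_per_sample": bits_per_sample,
    }

# Bytes from the dump: 001 \0 001 \0 @ 037 \0 \0 200 > \0 \0 002 \0 020 \0
fmt_payload = bytes([0x01, 0x00, 0x01, 0x00,
                     0x40, 0x1F, 0x00, 0x00,
                     0x80, 0x3E, 0x00, 0x00,
                     0x02, 0x00, 0x10, 0x00])
info = parse_fmt_payload(fmt_payload)
print(info)
```

For those bytes the result is PCM (format 1), 1 channel, a sample rate of 8000, and 16 bits per sample.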
From a purely RIFF perspective, subchunks need not appear in any particular order in the file, or at any particular fixed offset. In practice however, almost all software expects the "fmt " subchunk to be the first subchunk. This is a concession to practicality: it is convenient to know early in the data stream what format of audio the WAVE contains -- this makes it easier to play a wave file from a network stream, for example. If the WAVE file uses a compressed format, such as μ-law, it is usually assumed that the "fact" subchunk will appear directly after "fmt ".
After the format-specifying chunks are out of the way, assumptions about the location, ordering, and naming of subchunks should be abandoned. At this point, the software should locate expected subchunks by name only (e.g. "data"). If subchunks are encountered that have unrecognized names (e.g. "FLLR"), those subchunks should simply be skipped over and ignored. Skipping a subchunk requires reading its length so that you can skip over the correct number of bytes.
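The scan-and-skip approach could be sketched in Python roughly as follows. `iter_chunks`, `chunk`, and the in-memory demo file are illustrative inventions, not part of any library; the demo mimics the layout in the question (a 16-byte "fmt ", a 4044-byte "FLLR", then "data"). One detail the sketch also handles: RIFF pads an odd-sized payload with one extra byte, so the skip has to account for it.

```python
import io
import struct

def iter_chunks(stream):
    """Yield (identifier, payload_offset, payload_size) for each subchunk."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # ran out of subchunks
        chunk_id, size = struct.unpack("<4sI", header)
        offset = stream.tell()
        yield chunk_id, offset, size
        # Skip the payload, plus the pad byte RIFF adds after odd-sized payloads.
        stream.seek(offset + size + (size & 1))

def chunk(chunk_id, payload):
    """Build one subchunk for the demo file (hypothetical helper)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad

body = (b"WAVE"
        + chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16))
        + chunk(b"FLLR", bytes(4044))
        + chunk(b"data", struct.pack("<4h", 0, 100, -100, 0)))
demo = b"RIFF" + struct.pack("<I", len(body)) + body

# Locate subchunks by name only; unknown ones such as FLLR are simply passed over.
found = {cid: (off, size) for cid, off, size in iter_chunks(io.BytesIO(demo))}
print(found[b"data"])   # (4096, 8)
```

With these sizes the audio payload starts at byte 4096 rather than 44, which is at least consistent with the guess that the filler exists for alignment.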
What Apple has done with the "FLLR" subchunk is slightly unusual, and I'm not surprised that some software is tripped up by it. I suspect that the library you are using is simply unprepared to deal with the presence of the "FLLR" subchunk. I would consider this a defect in the library. The mistake the library authors have made is probably something like:
They may be expecting the "data" subchunk to appear within the first N bytes of the beginning of the file, where N is something less than ~4kB. They may give up looking if they have to scan too far into the file. The Apple "FLLR" subchunk pushes the "data" subchunk to a position >~4kB into the file.
They may be expecting the "data" subchunk to have a specific ordinal subchunk position or byte offset within the RIFF. Perhaps they expect "data" to appear immediately after "fmt ". This is an incorrect way to process a RIFF file, though. The ordinal position and/or offset position of the "data" subchunk should not be assumed.
As long as we're talking about correct WAVE file processing, I might as well remind everyone that the audio bytes (the "data" subchunk's payload) may not run exactly to the end of the file. It is allowable to insert subchunks after the "data" payload. Some programs use this to store a textual "comment" field at the end of the file. If you read blindly from the start of the "data" payload until the EOF, you may pull in some metadata subchunks as audio, which will sound like a "click" at the end of playback. You need to honor the length field of the "data" subchunk and stop reading audio once you've consumed the entire data payload, rather than stopping when you hit EOF.
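A small Python sketch of the difference, under the assumption of a made-up trailing subchunk: `read_audio_bytes`, `chunk`, and the "note" identifier are all hypothetical names used only for this demonstration.

```python
import io
import struct

def read_audio_bytes(stream):
    """Return only the payload of the first "data" subchunk."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError('no "data" subchunk found')
        chunk_id, size = struct.unpack("<4sI", header)
        if chunk_id == b"data":
            # Read exactly `size` bytes; whatever follows is not audio.
            return stream.read(size)
        # Not the subchunk we want: skip its payload and any pad byte.
        stream.seek(size + (size & 1), io.SEEK_CUR)

def chunk(chunk_id, payload):
    """Build one subchunk for the demo file (hypothetical helper)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad

samples = struct.pack("<4h", 0, 100, -100, 0)
body = (b"WAVE"
        + chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16))
        + chunk(b"data", samples)
        + chunk(b"note", b"recorded on a phone"))  # made-up trailing metadata
demo = b"RIFF" + struct.pack("<I", len(body)) + body

audio = read_audio_bytes(io.BytesIO(demo))

# The naive approach: everything from the start of the data payload to EOF.
start = demo.index(b"data") + 8
to_eof = demo[start:]

print(len(audio), len(to_eof))   # 8 36
```

The extra 28 bytes in the read-to-EOF version are the trailing subchunk's header, text, and pad byte, which a player would interpret as samples.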