Radiodef - 4 months ago
Java Question

How do I use audio sample data from Java Sound?

This question is usually asked as a part of another question but it turns out that the answer is very long. I've decided to answer it here so I can link to it elsewhere.

Although I'm not aware that Java can produce audio samples for the programmer at this time, if that changes in the future, this can be a place for it. I know that JavaFX is starting to have features like this, for example AudioSpectrumListener.




I am using javax.sound.sampled for playback and/or recording, but I would like to do something with the audio.

Perhaps I'd like to display it visually or process it in some way.

How do I access audio sample data to do that with Java Sound?



Answer

Well, the simplest answer is that at the moment Java cannot produce sample data for the programmer. Playback with javax.sound.sampled largely acts as a bridge between the file and the sound device. The bytes are read in from the file and sent off.

Do not assume the bytes are meaningful audio samples! Unless you happen to have an 8-bit AIFF file, they are not. (On the other hand, if the samples are definitely 8-bit signed, you can do arithmetic with them.)

So instead, I will enumerate the types of AudioFormat.Encoding and describe how to decode them yourself. This answer will not cover how to encode them but it is included in the complete code example at the bottom. Encoding is mostly just the decoding process in reverse.

This is a very long answer, but I wanted to give as thorough an overview as I could.


A Little About Digital Audio

Generally, when digital audio is explained, the explanation refers to Linear Pulse-Code Modulation (LPCM).

A continuous sound wave is sampled at regular intervals and the amplitudes are quantized to integers of some scale.

Shown here is a sine wave sampled and quantized to 4 bits:

[Figure: lpcm_graph, a sine wave sampled and quantized to 4 bits]

Notice that in two's complement representation the magnitude of the most positive value is 1 less than the magnitude of the most negative value. With 4 bits, for example, the range is -8 to 7. This is a minor detail to be aware of: if you are clipping a waveform and forget this, the positive clips will overflow.
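To make the clipping pitfall concrete, here is a minimal sketch, assuming 16-bit output. ClipExample and toInt16 are hypothetical names used only for illustration; they are not part of the full code at the bottom.

```java
final class ClipExample {
    // Hypothetical helper: converts a float sample, nominally in [-1, 1],
    // to a 16-bit integer and clamps it to the asymmetric range -32768..32767.
    static short toInt16(float sample) {
        long scaled = Math.round(sample * 32768.0);
        if (scaled > Short.MAX_VALUE) scaled = Short.MAX_VALUE; //  32767
        if (scaled < Short.MIN_VALUE) scaled = Short.MIN_VALUE; // -32768
        return (short) scaled;
    }

    public static void main(String[] args) {
        // Clamped: the positive peak stays positive.
        System.out.println(toInt16(1.0f));                   // prints 32767
        // Not clamped: 32768 does not fit in 16 bits and wraps around.
        System.out.println((short) (long) (1.0f * 32768.0)); // prints -32768
    }
}
```

Without the clamp, a full-scale positive sample turns in to the most negative value, which is audible as a click.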

When we have audio on the computer, we have an array of these samples. This is what we want to turn the byte array into. To decode PCM we don't care too much about the sample rate or number of channels, so I won't be covering that here.


Some Assumptions

All of the code examples will assume the following declarations:

  • byte[] bytes; The byte array, read from the InputStream.
  • float sample; The sample we are working on.
  • long temp; An interim value used for general manipulation.
  • int i; The position in the byte array at each sample.

All encodings will be scaled in the float[] array to the range of -1f <= sample <= 1f. All of the floating-point formats I've seen come this way and it is also the most useful.

Scaling is simple, just:

sample = sample / fullScale(bitsPerSample);

Where fullScale is 2^(bitsPerSample - 1).
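As a quick worked example of the scaling, here is a small sketch. The class name ScaleExample and the helper scale are assumptions for illustration; fullScale mirrors the definition given above.

```java
final class ScaleExample {
    // 2^(bitsPerSample - 1), returned as a double as in the definition above.
    static double fullScale(int bitsPerSample) {
        return Math.pow(2.0, bitsPerSample - 1);
    }

    // Hypothetical helper: scales an already sign-corrected integer sample.
    static float scale(long value, int bitsPerSample) {
        return (float) (value / fullScale(bitsPerSample));
    }

    public static void main(String[] args) {
        System.out.println(scale(-32768, 16)); // prints -1.0
        System.out.println(scale(16384, 16));  // prints 0.5
        System.out.println(scale(7, 4));       // prints 0.875
    }
}
```

Because the most positive integer is one less than full scale, the largest positive result is slightly below 1.0 while the most negative result is exactly -1.0.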


How do I coerce the byte array into meaningful data?

The byte array contains the sample frames split up and all in a line. This is actually very straightforward except for something called endianness, which is the ordering of the bytes in each packet.

Here is a diagram. This packet holds the decimal value 9999:

  24-bit sample as big-endian:

 bytes[i]   bytes[i + 1] bytes[i + 2]
 ┌──────┐     ┌──────┐     ┌──────┐
 00000000     00100111     00001111

 24-bit sample as little-endian:

 bytes[i]   bytes[i + 1] bytes[i + 2]
 ┌──────┐     ┌──────┐     ┌──────┐
 00001111     00100111     00000000

They hold the same binary values; however, the byte orders are reversed.

  • In big-endian, the more significant bytes come before the less significant bytes.
  • In little-endian, the less significant bytes come before the more significant bytes.

WAV files are stored in little-endian byte order and AIFF files are stored in big-endian byte order. Endianness can be obtained from AudioFormat.
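For the common 16-bit case, one possible shortcut is to let java.nio.ByteBuffer handle the byte order. This is only a sketch of an alternative to the manual approach described in this answer; EndianExample and read16 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import javax.sound.sampled.AudioFormat;

final class EndianExample {
    // Hypothetical helper: reads one 16-bit sample at byte offset i,
    // using the byte order reported by the AudioFormat.
    static short read16(byte[] bytes, int i, AudioFormat fmt) {
        ByteOrder order = fmt.isBigEndian()
            ? ByteOrder.BIG_ENDIAN
            : ByteOrder.LITTLE_ENDIAN;
        return ByteBuffer.wrap(bytes).order(order).getShort(i);
    }

    public static void main(String[] args) {
        byte[] bytes = { 0x27, 0x0F };
        AudioFormat big    = new AudioFormat(44100f, 16, 1, true, true);
        AudioFormat little = new AudioFormat(44100f, 16, 1, true, false);
        System.out.println(read16(bytes, 0, big));    // prints 9999
        System.out.println(read16(bytes, 0, little)); // prints 3879
    }
}
```

The same two bytes give different values depending on the byte order, which is why the format has to be consulted. ByteBuffer has no 24-bit accessor, so the manual approach is still needed for that sample size.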

To concatenate the bytes and put them into our temp variable, we:

  • Bitwise AND each byte with the mask 0xFF (which is 0b1111_1111) to avoid sign extension when the byte is automatically promoted. (char, byte and short are promoted to int when arithmetic is performed on them.)
  • Bit shift each byte into position.
  • Bitwise OR the bytes together.
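To show why the 0xFF mask matters, here is a small sketch that concatenates two bytes with and without it. MaskExample and its two methods are hypothetical names used only for this illustration.

```java
final class MaskExample {
    // Wrong: the low byte is sign-extended when promoted to int,
    // so a low byte with its top bit set overwrites the high byte.
    static int withoutMask(byte hi, byte lo) {
        return (hi << 8) | lo;
    }

    // Right: masking with 0xff keeps only the low 8 bits of each byte.
    static int withMask(byte hi, byte lo) {
        return ((hi & 0xff) << 8) | (lo & 0xff);
    }

    public static void main(String[] args) {
        byte hi = 0x27;
        byte lo = (byte) 0x8F; // top bit set, so the byte value is negative
        System.out.println(withMask(hi, lo));    // prints 10127 (0x278F)
        System.out.println(withoutMask(hi, lo)); // prints -113
    }
}
```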

Here is a 24-bit example:

if(isBigEndian) {
    temp = (
          ((bytes[i    ] & 0xffL) << 16L)
        | ((bytes[i + 1] & 0xffL) <<  8L)
        |  (bytes[i + 2] & 0xffL)
    );
} else {
    temp = (
           (bytes[i    ] & 0xffL)
        | ((bytes[i + 1] & 0xffL) <<  8L)
        | ((bytes[i + 2] & 0xffL) << 16L)
    );
}

Notice that the shift order is reversed for endianness.

This process can also be generalized into a loop (which is included in the full code), though it is much more esoteric-looking.

Now that we have the bytes concatenated together, we can turn them into a sample.

How do I decode Encoding.PCM_SIGNED?

The two's complement sign must be extended. This means that if the most significant bit (MSB) is set to 1, we fill all the bits above it with 1. The arithmetic right-shift (>>) will do the filling for us automatically if the sign bit is set, so I usually do it this way:

int extensionBits = bitsPerLong - bitsPerSample;
sample = (temp << extensionBits) >> extensionBits;

(Where bitsPerLong is 64.)

To understand how this works, here is a diagram of sign extending 8-bit to 16-bit:

 This is the byte value -1 but the upper bits of the short are 0.
 Shift the byte's MSB into the MSB position of the short.

 0000 0000 1111 1111
 <<                8
 ───────────────────
 1111 1111 0000 0000

 Shift it back and the right-shift fills all the upper bits with a 1.
 We now have the short value of -1.

 1111 1111 0000 0000
 >>                8
 ───────────────────
 1111 1111 1111 1111

Positive values (that had a 0 in the MSB) are left unchanged. This is a nice property of the arithmetic right-shift.

Then scale it.
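Putting the concatenation, sign extension and scaling together, a 24-bit signed sample could be decoded roughly like this. It is a sketch under the assumptions listed earlier; SignedExample and decodeSigned24 are hypothetical names.

```java
final class SignedExample {
    // Hypothetical helper: decodes one 24-bit PCM_SIGNED sample at offset i.
    static float decodeSigned24(byte[] bytes, int i, boolean isBigEndian) {
        long temp;
        if (isBigEndian) {
            temp = ((bytes[i    ] & 0xffL) << 16)
                 | ((bytes[i + 1] & 0xffL) <<  8)
                 |  (bytes[i + 2] & 0xffL);
        } else {
            temp =  (bytes[i    ] & 0xffL)
                 | ((bytes[i + 1] & 0xffL) <<  8)
                 | ((bytes[i + 2] & 0xffL) << 16);
        }
        // Sign extension: move the sample's MSB to bit 63, then shift back.
        int extensionBits = 64 - 24;
        temp = (temp << extensionBits) >> extensionBits;
        // Scale by 2^23, the full scale of a 24-bit sample.
        return (float) (temp / 8388608.0);
    }

    public static void main(String[] args) {
        byte[] mostNegative = { (byte) 0x80, 0x00, 0x00 };
        byte[] half         = { 0x00, 0x00, 0x40 };
        System.out.println(decodeSigned24(mostNegative, 0, true)); // prints -1.0
        System.out.println(decodeSigned24(half, 0, false));        // prints 0.5
    }
}
```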

How do I decode Encoding.PCM_UNSIGNED?

We turn it into a signed number. Unsigned samples are simply offset so that, for example:

  • An unsigned value of 0 corresponds to the signed most negative value.
  • An unsigned value of 2^(bitsPerSample - 1) corresponds to the signed 0 value.
  • An unsigned value of 2^bitsPerSample - 1 corresponds to the signed most positive value.

So this turns out to be pretty simple, just subtract the offset:

sample = temp - fullScale(bitsPerSample);

Then scale it.
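For example, 8-bit unsigned samples could be decoded roughly like this. UnsignedExample and decodeUnsigned8 are hypothetical names used for this sketch.

```java
final class UnsignedExample {
    // Hypothetical helper: decodes one 8-bit PCM_UNSIGNED sample.
    static float decodeUnsigned8(byte b) {
        long temp = b & 0xffL;        // 0..255, no sign extension
        temp -= 128L;                 // subtract the offset: -128..127
        return (float) (temp / 128.0); // scale by 2^7
    }

    public static void main(String[] args) {
        System.out.println(decodeUnsigned8((byte) 0x00)); // prints -1.0
        System.out.println(decodeUnsigned8((byte) 0x80)); // prints 0.0
        System.out.println(decodeUnsigned8((byte) 0xFF)); // prints 0.9921875
    }
}
```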

How do I decode Encoding.PCM_FLOAT?

This is new since Java 7.

In practice, floating-point PCM is invariably either IEEE 32-bit or IEEE 64-bit and already scaled to the range of ±1.0. The samples can be obtained with the utility methods Float#intBitsToFloat and Double#longBitsToDouble.

// IEEE 32-bit
sample = Float.intBitsToFloat((int)temp);
// IEEE 64-bit
sample = (float)Double.longBitsToDouble(temp);
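As a worked example for the 32-bit case, the value 0.25f has the bit pattern 0x3E800000, so the sketch below should recover it from either byte order. FloatExample and decodeFloat32 are hypothetical names.

```java
final class FloatExample {
    // Hypothetical helper: decodes one IEEE 32-bit float sample at offset i.
    static float decodeFloat32(byte[] bytes, int i, boolean isBigEndian) {
        long temp = 0L;
        for (int b = 0; b < 4; b++) {
            // Big-endian: the first byte is the most significant.
            int shift = isBigEndian ? 8 * (3 - b) : 8 * b;
            temp |= (bytes[i + b] & 0xffL) << shift;
        }
        // Reinterpret the 32 bits as a float; no arithmetic conversion happens.
        return Float.intBitsToFloat((int) temp);
    }

    public static void main(String[] args) {
        byte[] big    = { 0x3E, (byte) 0x80, 0x00, 0x00 };
        byte[] little = { 0x00, 0x00, (byte) 0x80, 0x3E };
        System.out.println(decodeFloat32(big, 0, true));     // prints 0.25
        System.out.println(decodeFloat32(little, 0, false)); // prints 0.25
    }
}
```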

How do I decode Encoding.ULAW and Encoding.ALAW?

These are companding compression codecs that are more common in telephones and such. They are supported by javax.sound.sampled, I assume, because they are used by Sun's Au format. (They are not limited to this type of container, though; for example, WAV can contain these encodings.)

You can conceptualize A-law and μ-law as if they were a floating-point format. These are PCM formats but the range of values is non-linear.

There are two ways to decode them. I will show the mathematical equation. You can also decode them by manipulating the binary directly which is described in this blog post but is a bit more esoteric.

For both, the compressed data is 8-bit. Standardly A-law is 13-bit when decoded and μ-law is 14-bit when decoded; however, applying the equation yields a range of ±1.0.

Before you can apply the equation, there are three things to do:

  1. Some of the bits are standardly inverted for storage due to some archaic reason involving data integrity.
  2. They are stored as a sign and magnitude rather than a two's complement.
  3. The equation also expects a range of ±1.0 so the 8-bit value must be scaled.

For μ-law all the bits are inverted so:

temp = temp ^ 0xffL; // 0xff == 0b1111_1111

For A-law, every other bit is inverted so:

temp = temp ^ 0x55L; // 0x55 == 0b0101_0101

(XOR can be used to do inversion. See 'How do you set, clear and toggle a bit?')

To convert from sign and magnitude to two's complement, we:

  • Check to see if the sign bit is set.
  • If so, clear the sign bit and negate the number.

// 0x80 == 0b1000_0000
if((temp & 0x80L) == 0x80L) {
    temp = temp ^ 0x80L;
    temp = -temp;
}

Then scale the encoded numbers, the same way as described earlier:

sample = temp / fullScale(8);

Now we can apply the expansion.

The μ-law equation translated to Java is then:

sample = (float)(
    signum(sample)
        *
    (1.0 / 255.0)
        *
    (pow(256.0, abs(sample)) - 1.0)
);
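Collected in one place, the μ-law steps described so far (invert, convert from sign and magnitude, scale, expand) might look like the sketch below. MuLawExample and decodeMuLaw are hypothetical names, and the code simply follows the steps of this answer rather than the bit-manipulation method.

```java
final class MuLawExample {
    // Hypothetical helper: decodes one mu-law byte with the equation above.
    static float decodeMuLaw(byte b) {
        long temp = b & 0xffL;
        temp ^= 0xffL;                    // all bits are inverted for storage
        if ((temp & 0x80L) == 0x80L) {    // sign and magnitude to two's complement
            temp = -(temp ^ 0x80L);
        }
        float sample = (float) (temp / 128.0); // scale by fullScale(8)
        return (float) (
            Math.signum(sample)
                * (1.0 / 255.0)
                * (Math.pow(256.0, Math.abs(sample)) - 1.0)
        );
    }

    public static void main(String[] args) {
        System.out.println(decodeMuLaw((byte) 0xFF)); // prints 0.0
        System.out.println(decodeMuLaw((byte) 0x80)); // largest positive output
        System.out.println(decodeMuLaw((byte) 0x00)); // largest negative output
    }
}
```

Because the 7-bit magnitude tops out at 127/128, the largest output of this approach is a little below full scale (about 0.957) rather than exactly 1.0.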

The A-law equation translated to Java is then:

float signum = signum(sample);
sample = abs(sample);

if(sample < (1.0 / (1.0 + log(87.7)))) {
    sample = (float)(
        sample * ((1.0 + log(87.7)) / 87.7)
    );
} else {
    sample = (float)(
        exp((sample * (1.0 + log(87.7))) - 1.0) / 87.7
    );
}

sample = signum * sample;
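A quick way to sanity-check the A-law expansion is to look at its end points: an input of 0 should give 0 and an input of 1 should give (approximately) 1, since exp(ln A) / A is 1. The sketch below wraps the equation in a method for that purpose. ALawExample and expand are hypothetical names, the input is assumed to be already scaled to ±1.0, and A is the constant used in this answer.

```java
final class ALawExample {
    static final double A = 87.7; // the constant used in this answer

    // Hypothetical helper: applies the A-law expansion to a value in [-1, 1].
    static float expand(float sample) {
        float sign = Math.signum(sample);
        double x = Math.abs(sample);
        double k = 1.0 + Math.log(A);
        double y = (x < 1.0 / k)
            ? x * (k / A)                 // linear segment near zero
            : Math.exp(x * k - 1.0) / A;  // exponential segment
        return (float) (sign * y);
    }

    public static void main(String[] args) {
        System.out.println(expand(0.0f));
        System.out.println(expand(0.5f));
        System.out.println(expand(1.0f));
    }
}
```

An encoded value of 0.5 expands to roughly 0.065, which shows how much of the encoded range is spent on quiet signals.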

Here is the full code for the SimpleAudioConversion class. (It is not designed for performance.)

If you are using this and you find a bug, please let me know since I did not test it very rigorously.

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioFormat.Encoding;

import static java.lang.Math.ceil;
import static java.lang.Math.pow;
import static java.lang.Math.signum;
import static java.lang.Math.abs;
import static java.lang.Math.log;
import static java.lang.Math.exp;

/**
 * Performs rudimentary audio format conversion.
 * 
 * <p>
 * Example usage:
 * 
 * <pre>
 * {@code
 * 
 * AudioInputStream ais = ... ;
 * SourceDataLine  line = ... ;
 * AudioFormat      fmt = ... ;
 * 
 * // do prep
 * 
 * for(int blen = 0; (blen = ais.read(bytes)) > -1;) {
 *     int slen;
 *     slen = SimpleAudioConversion.unpack(bytes, samples, blen, fmt);
 * 
 *     // do something with samples
 * 
 *     blen = SimpleAudioConversion.pack(samples, bytes, slen, fmt);
 *     line.write(bytes, 0, blen);
 * }
 * 
 * }
 * </pre>
 * </p>
 * 
 * @author Radiodef
 * @see <a href="http://stackoverflow.com/a/26824664/2891664">Overview on StackOverflow.com</a>
 */
public final class SimpleAudioConversion {
    private SimpleAudioConversion() throws Exception {
        throw new Exception(getClass().getSimpleName());
    }

    /**
     * Converts
     *  from a byte array (<code>byte[]</code>)
     *  to an audio sample array (<code>float[]</code>).
     * 
     * @param bytes   the byte array, filled by the <code>InputStream</code>.
     * @param samples an array to fill up with audio samples.
     * @param blen    the return value of <code>InputStream#read</code>.
     * @param fmt     the source <code>AudioFormat</code>.
     * 
     * @return the number of valid audio samples converted.
     * 
     * @throws NullPointerException
     *  if <code>(bytes == null || samples == null || fmt == null)</code>.
     * @throws ArrayIndexOutOfBoundsException
     *  if <code>(bytes.length < blen)</code>
     *  or <code>(samples.length < blen / bytesPerSample(fmt.getSampleSizeInBits()))</code>.
     */
    public static int unpack(
        byte[] bytes, float[] samples, int blen, AudioFormat fmt
    ) {
        int   bitsPerSample = fmt.getSampleSizeInBits();
        int  bytesPerSample = bytesPerSample(bitsPerSample);
        boolean isBigEndian = fmt.isBigEndian();
        Encoding   encoding = fmt.getEncoding();
        double    fullScale = fullScale(bitsPerSample);

        int i = 0;
        int s = 0;
        while(i < blen) {
            long temp = unpackBits(bytes, i, isBigEndian, bytesPerSample);
            float sample = 0f;

            if(encoding == Encoding.PCM_SIGNED) {
                temp = extendSign(temp, bitsPerSample);
                sample = (float)(temp / fullScale);

            } else if(encoding == Encoding.PCM_UNSIGNED) {
                temp = signUnsigned(temp, bitsPerSample);
                sample = (float)(temp / fullScale);

            } else if(encoding == Encoding.PCM_FLOAT) {
                if(bitsPerSample == 32) {
                    sample = Float.intBitsToFloat((int)temp);
                } else if(bitsPerSample == 64) {
                    sample = (float)Double.longBitsToDouble(temp);
                }
            } else if(encoding == Encoding.ULAW) {
                sample = bitsToMuLaw(temp);

            } else if(encoding == Encoding.ALAW) {
                sample = bitsToALaw(temp);
            }

            samples[s] = sample;

            i += bytesPerSample;
            s++;
        }

        return s;
    }

    /**
     * Converts
     *  from an audio sample array (<code>float[]</code>)
     *  to a byte array (<code>byte[]</code>).
     * 
     * @param samples an array of audio samples to encode.
     * @param bytes   an array to fill up with bytes.
     * @param slen    the return value of <code>unpack</code>.
     * @param fmt     the destination <code>AudioFormat</code>.
     * 
     * @return the number of valid bytes converted.
     * 
     * @throws NullPointerException
     *  if <code>(samples == null || bytes == null || fmt == null)</code>.
     * @throws ArrayIndexOutOfBoundsException
     *  if <code>(samples.length < slen)</code>
     *  or <code>(bytes.length < slen * bytesPerSample(fmt.getSampleSizeInBits()))</code>.
     */
    public static int pack(
        float[] samples, byte[] bytes, int slen, AudioFormat fmt
    ) {
        int   bitsPerSample = fmt.getSampleSizeInBits();
        int  bytesPerSample = bytesPerSample(bitsPerSample);
        boolean isBigEndian = fmt.isBigEndian();
        Encoding   encoding = fmt.getEncoding();
        double    fullScale = fullScale(bitsPerSample);

        int i = 0;
        int s = 0;
        while(s < slen) {
            float sample = samples[s];
            long temp = 0L;

            if(encoding == Encoding.PCM_SIGNED) {
                temp = (long)(sample * fullScale);

            } else if(encoding == Encoding.PCM_UNSIGNED) {
                temp = (long)(sample * fullScale);
                temp = unsignSigned(temp, bitsPerSample);

            } else if(encoding == Encoding.PCM_FLOAT) {
                if(bitsPerSample == 32) {
                    temp = Float.floatToRawIntBits(sample);
                } else if(bitsPerSample == 64) {
                    temp = Double.doubleToRawLongBits(sample);
                }
            } else if(encoding == Encoding.ULAW) {
                temp = muLawToBits(sample);

            } else if(encoding == Encoding.ALAW) {
                temp = aLawToBits(sample);
            }

            packBits(bytes, i, temp, isBigEndian, bytesPerSample);

            i += bytesPerSample;
            s++;
        }

        return i;
    }

    /**
     * Computes the block-aligned bytes per sample of the audio format,
     * with <code>(int)ceil(bitsPerSample / 8.0)</code>.
     * 
     * <p>
     * This is generally equivalent to the optimization
     * <code>((bitsPerSample + 7) >>> 3)</code>. (Except for
     * the invalid argument <code>bitsPerSample <= 0</code>.)
     * </p>
     * <p>
     * Round towards the ceiling because formats that allow bit depths
     * in non-integral multiples of 8 typically pad up to the nearest
     * integral multiple of 8. So for example, a 31-bit AIFF file will
     * actually store 32-bit blocks.
     * </p>
     * 
     * @param bitsPerSample the return value of <code>AudioFormat#getSampleSizeInBits</code>.
     * @return The block-aligned bytes per sample of the audio format.
     */
    public static int bytesPerSample(int bitsPerSample) {
        return (int)ceil(bitsPerSample / 8.0);
    }

    /**
     * Computes the largest magnitude representable by the audio format,
     * with <code>pow(2.0, bitsPerSample - 1)</code>.
     * 
     * <p>
     * For <code>bitsPerSample < 64</code>, this is generally equivalent to
     * the optimization <code>(1L << (bitsPerSample - 1L))</code>. (Except for
     * the invalid argument <code>bitsPerSample <= 0</code>.)
     * </p>
     * <p>
     * The result is returned as a <code>double</code> because, in the case that
     * <code>bitsPerSample == 64</code>, a <code>long</code> would overflow.
     * </p>
     * 
     * @param bitsPerSample the return value of <code>AudioFormat#getSampleSizeInBits</code>.
     * @return the largest magnitude representable by the audio format.
     */
    public static double fullScale(int bitsPerSample) {
        return pow(2.0, bitsPerSample - 1);
    }

    private static long unpackBits(
        byte[] bytes, int i, boolean isBigEndian, int bytesPerSample
    ) {
        switch(bytesPerSample) {
            case 1:
                return unpack8Bit(bytes, i);
            case 2:
                return unpack16Bit(bytes, i, isBigEndian);
            case 3:
                return unpack24Bit(bytes, i, isBigEndian);
            default:
                return unpackAnyBit(bytes, i, isBigEndian, bytesPerSample);
        }
    }

    private static long unpack8Bit(byte[] bytes, int i) {
        return bytes[i] & 0xffL;
    }

    private static long unpack16Bit(byte[] bytes, int i, boolean isBigEndian) {
        if(isBigEndian) {
            return (
                  ((bytes[i    ] & 0xffL) << 8L)
                |  (bytes[i + 1] & 0xffL)
            );
        } else {
            return (
                   (bytes[i    ] & 0xffL)
                | ((bytes[i + 1] & 0xffL) << 8L)
            );
        }
    }

    private static long unpack24Bit(byte[] bytes, int i, boolean isBigEndian) {
        if(isBigEndian) {
            return (
                  ((bytes[i    ] & 0xffL) << 16L)
                | ((bytes[i + 1] & 0xffL) <<  8L)
                |  (bytes[i + 2] & 0xffL)
            );
        } else {
            return (
                   (bytes[i    ] & 0xffL)
                | ((bytes[i + 1] & 0xffL) <<  8L)
                | ((bytes[i + 2] & 0xffL) << 16L)
            );
        }
    }

    private static long unpackAnyBit(
        byte[] bytes, int i, boolean isBigEndian, int bytesPerSample
    ) {
        long temp = 0L;

        if(isBigEndian) {
            for(int b = 0; b < bytesPerSample; b++) {
                temp |= (bytes[i + b] & 0xffL) << (
                    8L * (bytesPerSample - b - 1L)
                );
            }
        } else {
            for(int b = 0; b < bytesPerSample; b++) {
                temp |= (bytes[i + b] & 0xffL) << (8L * b);
            }
        }

        return temp;
    }

    private static void packBits(
        byte[] bytes, int i, long temp, boolean isBigEndian, int bytesPerSample
    ) {
        switch(bytesPerSample) {
            case 1:
                pack8Bit(bytes, i, temp);
                break;
            case 2:
                pack16Bit(bytes, i, temp, isBigEndian);
                break;
            case 3:
                pack24Bit(bytes, i, temp, isBigEndian);
                break;
            default:
                packAnyBit(bytes, i, temp, isBigEndian, bytesPerSample);
                break;
        }
    }

    private static void pack8Bit(byte[] bytes, int i, long temp) {
        bytes[i] = (byte)(temp & 0xffL);
    }

    private static void pack16Bit(
        byte[] bytes, int i, long temp, boolean isBigEndian
    ) {
        if(isBigEndian) {
            bytes[i    ] = (byte)((temp >>> 8L) & 0xffL);
            bytes[i + 1] = (byte)( temp         & 0xffL);
        } else {
            bytes[i    ] = (byte)( temp         & 0xffL);
            bytes[i + 1] = (byte)((temp >>> 8L) & 0xffL);
        }
    }

    private static void pack24Bit(
        byte[] bytes, int i, long temp, boolean isBigEndian
    ) {
        if(isBigEndian) {
            bytes[i    ] = (byte)((temp >>> 16L) & 0xffL);
            bytes[i + 1] = (byte)((temp >>>  8L) & 0xffL);
            bytes[i + 2] = (byte)( temp          & 0xffL);
        } else {
            bytes[i    ] = (byte)( temp          & 0xffL);
            bytes[i + 1] = (byte)((temp >>>  8L) & 0xffL);
            bytes[i + 2] = (byte)((temp >>> 16L) & 0xffL);
        }
    }

    private static void packAnyBit(
        byte[] bytes, int i, long temp, boolean isBigEndian, int bytesPerSample
    ) {
        if(isBigEndian) {
            for(int b = 0; b < bytesPerSample; b++) {
                bytes[i + b] = (byte)(
                    (temp >>> (8L * (bytesPerSample - b - 1L))) & 0xffL
                );
            }
        } else {
            for(int b = 0; b < bytesPerSample; b++) {
                bytes[i + b] = (byte)((temp >>> (8L * b)) & 0xffL);
            }
        }
    }

    private static long extendSign(long temp, int bitsPerSample) {
        int extensionBits = 64 - bitsPerSample;
        return (temp << extensionBits) >> extensionBits;
    }

    private static long signUnsigned(long temp, int bitsPerSample) {
        return temp - (long)fullScale(bitsPerSample);
    }

    private static long unsignSigned(long temp, int bitsPerSample) {
        return temp + (long)fullScale(bitsPerSample);
    }

    // mu-law constant
    private static final double MU = 255.0;
    // A-law constant
    private static final double A = 87.7;
    // reciprocal of A
    private static final double RE_A = 1.0 / A;
    // natural logarithm of A
    private static final double LN_A = log(A);
    // if values are below this, the A-law exponent is 0
    private static final double EXP_0 = 1.0 / (1.0 + LN_A);

    private static float bitsToMuLaw(long temp) {
        temp ^= 0xffL;
        if((temp & 0x80L) == 0x80L) {
            temp = -(temp ^ 0x80L);
        }

        float sample = (float)(temp / fullScale(8));

        return (float)(
            signum(sample)
                *
            (1.0 / MU)
                *
            (pow(1.0 + MU, abs(sample)) - 1.0)
        );
    }

    private static long muLawToBits(float sample) {
        double sign = signum(sample);
        sample = abs(sample);

        sample = (float)(
            sign * (log(1.0 + (MU * sample)) / log(1.0 + MU))
        );

        long temp = (long)(sample * fullScale(8));

        if(temp < 0L) {
            temp = -temp ^ 0x80L;
        }

        return temp ^ 0xffL;
    }

    private static float bitsToALaw(long temp) {
        temp ^= 0x55L;
        if((temp & 0x80L) == 0x80L) {
            temp = -(temp ^ 0x80L);
        }

        float sample = (float)(temp / fullScale(8));

        float sign = signum(sample);
        sample = abs(sample);

        if(sample < EXP_0) {
            sample = (float)(sample * ((1.0 + LN_A) / A));
        } else {
            sample = (float)(exp((sample * (1.0 + LN_A)) - 1.0) / A);
        }

        return sign * sample;
    }

    private static long aLawToBits(float sample) {
        double sign = signum(sample);
        sample = abs(sample);

        if(sample < RE_A) {
            sample = (float)((A * sample) / (1.0 + LN_A));
        } else {
            sample = (float)((1.0 + log(A * sample)) / (1.0 + LN_A));
        }

        sample *= sign;

        long temp = (long)(sample * fullScale(8));

        if(temp < 0L) {
            temp = -temp ^ 0x80L;
        }

        return temp ^ 0x55L;
    }
}
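
To tie this back to the original question of doing something with the audio, here is a minimal self-contained sketch that finds the peak level of a stream. It handles only 16-bit signed PCM and does its own decoding so that it can run on its own; PeakExample and peak are hypothetical names, and the in-memory stream in main stands in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

final class PeakExample {
    // Hypothetical helper: returns the largest absolute sample value of a
    // 16-bit PCM_SIGNED stream. Other formats are not handled in this sketch.
    static float peak(AudioInputStream ais) {
        boolean isBigEndian = ais.getFormat().isBigEndian();
        byte[] bytes = new byte[4096];
        float peak = 0f;
        try {
            for (int blen; (blen = ais.read(bytes)) > -1;) {
                for (int i = 0; i + 1 < blen; i += 2) {
                    // The high byte keeps its sign; the low byte is masked.
                    int hi = bytes[isBigEndian ? i : i + 1];
                    int lo = bytes[isBigEndian ? i + 1 : i] & 0xff;
                    float sample = ((hi << 8) | lo) / 32768f;
                    peak = Math.max(peak, Math.abs(sample));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return peak;
    }

    public static void main(String[] args) {
        // Three little-endian mono frames: 0.5, -0.75 and 0.0.
        byte[] data = { 0x00, 0x40, 0x00, (byte) 0xA0, 0x00, 0x00 };
        AudioFormat fmt = new AudioFormat(44100f, 16, 1, true, false);
        AudioInputStream ais =
            new AudioInputStream(new ByteArrayInputStream(data), fmt, 3);
        System.out.println(peak(ais)); // prints 0.75
    }
}
```

With a real file, the stream would come from AudioSystem.getAudioInputStream instead, and the decoding step could be replaced by SimpleAudioConversion.unpack to support the other encodings.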