koiyu koiyu - 5 months ago 109
Swift Question

How to equalize stereo input and apply audio effect only to single channel on iOS?

I need to process a stereo audio file on iOS as follows:


  • Both channels should have equal intensity, i.e. the stereo input should sound like mono

  • Route the mono audio to both left and right channels

  • Apply effects only to the audio that is output to the right channel






What I currently have is:

          +-------------------+
          | AVAudioPlayerNode +------------------------+
          +--------^----------+                        |
                   |                                   |
          +--------+---------+                +--------v---------+
File ---> | AVAudioPCMBuffer |                | AVAudioMixerNode +---> Output
          +--------+---------+                +--------^---------+
                   |                                   |
          +--------v----------+  +-------------------+ |
          | AVAudioPlayerNode +--> AVAudioUnitEffect +-+
          +-------------------+  +-------------------+


The effect is a subclass of AVAudioUnitEffect.

I'm having trouble making the stereo input appear as mono and routing each AVAudioPlayerNode to a separate output channel.

I tried setting the PlayerNodes' volume to 0.5 and their pan to -1.0 and 1.0, but, as the input is stereo, this doesn't yield the desired result.

With AVFoundation, I figure I have at least two options: either I…

(1) Equalize the channels for the PlayerNodes so both PlayerNodes appear as mono. After that I could use the same logic as before: with equal volume on both PlayerNodes, one panned left and the other panned right, and the effect applied to only one PlayerNode, the output of the MixerNode would carry the effect only in the right channel.

(2) Keep the PlayerNodes as stereo (pan = 0.0), apply the effect to only one PlayerNode, and then tell the MixerNode to use both channels of one PlayerNode as the source for the left channel and both channels of the other for the right channel. I suppose the MixerNode would then effectively equalize the input channels, so the input would appear to be mono and the effect would be heard from only one output channel.

The question is: is either of these strategies possible, and if so, how? Is there another option that I've overlooked?

I'm using Swift for the project, but can cope with Objective-C.




Judging by the lack of responses and my own research, it seems to me that AVFoundation might not be the way to go. The simplicity of AVFoundation is tempting, but I'm open to alternatives. I'm currently researching the MTAudioProcessingTap classes, which might be of use. Help is still appreciated.

Answer

I managed to get the desired outcome by using two AVPlayers that I play simultaneously. One AVPlayer has input that has averaged audio data on left channel and silence on right; and vice versa in the other AVPlayer. Finally, the effect is applied only to one AVPlayer instance.
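
To illustrate why two half-silent players add up to the desired result, here is a minimal, portable C sketch using plain loops instead of AVFoundation. The names `apply_stand_in_effect` and `sum_player_outputs` are hypothetical, the "effect" is just a gain used as a placeholder for the proprietary one, and the sketch assumes the two players' interleaved stereo outputs are simply summed sample by sample, which simplifies what the system output actually does.

```c
#include <stddef.h>

/* Placeholder for the real effect: scales every sample of one player's
 * interleaved stereo stream (L R L R ...) by `gain`. */
void apply_stand_in_effect(float *stream, size_t frames, float gain)
{
    for (size_t i = 0; i < 2 * frames; ++i) {
        stream[i] *= gain;
    }
}

/* Simplified model of the output stage: the two players' interleaved
 * stereo buffers are added together sample by sample. */
void sum_player_outputs(const float *playerA, const float *playerB,
                        float *out, size_t frames)
{
    for (size_t i = 0; i < 2 * frames; ++i) {
        out[i] = playerA[i] + playerB[i];
    }
}
```

If player A carries the averaged signal on the left with silence on the right, and player B carries the same signal on the right only, then applying the effect to B and summing leaves the dry signal on the left channel and the processed signal on the right.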

As applying the proprietary effect on an AVPlayer instance turned out to be trivial, the biggest hurdle was how to equalize the stereo input.

I found a couple of related questions (Panning a mono signal with MultiChannelMixer & MTAudioProcessingTap; AVPlayer playback of single channel audio stereo→mono) and a tutorial (Processing AVPlayer’s audio with MTAudioProcessingTap, which was referenced in almost all the other tutorials I could find), all of which indicated that the solution probably lies in MTAudioProcessingTap.

Sadly, official documentation for MTAudioProcessingTap (or any other aspect of MediaToolbox) is more or less nil: I found only some sample code online and the header (MTAudioProcessingTap.h) through Xcode. But with the aforementioned tutorial I managed to get started.

To make things not too easy, I decided to use Swift rather than Objective-C, which the existing tutorials were written in. Converting the calls wasn’t that bad, and I even found an almost-ready example of creating an MTAudioProcessingTap in Swift 2. I did manage to hook up the processing taps and lightly manipulate audio with them (well, I could at least output the stream as-is or zero it out completely). Equalizing the channels, however, was a task for the Accelerate framework, namely its vDSP portion.

However, using C APIs that rely heavily on pointers (case in point: vDSP) from Swift gets cumbersome rather quickly, at least compared to how it’s done in Objective-C. This was also an issue when I initially wrote the MTAudioProcessingTap callbacks in Swift: I couldn’t pass AudioTapContext around without failures (in Obj-C getting the context is as easy as AudioTapContext *context = (AudioTapContext *)MTAudioProcessingTapGetStorage(tap);), and all the UnsafeMutablePointers made me think Swift isn’t the right tool for the job.

So, for the processing class, I ditched Swift and rewrote it in Objective-C.
And, as mentioned earlier, I'm using two AVPlayers; so in AudioPlayerController.swift I have:

var left = AudioTap.create(TapType.L)
var right = AudioTap.create(TapType.R)

asset = AVAsset(URL: audioList[index].assetURL!) // audioList is [MPMediaItem]. asset is class property

let leftItem = AVPlayerItem(asset: asset)
let rightItem = AVPlayerItem(asset: asset)

var leftTap: Unmanaged<MTAudioProcessingTapRef>?
var rightTap: Unmanaged<MTAudioProcessingTapRef>?

MTAudioProcessingTapCreate(kCFAllocatorDefault, &left, kMTAudioProcessingTapCreationFlag_PreEffects, &leftTap)
MTAudioProcessingTapCreate(kCFAllocatorDefault, &right, kMTAudioProcessingTapCreationFlag_PreEffects, &rightTap)

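// Use the asset's first audio track (tracks[0] is not guaranteed to be an audio track)
let audioTrack = asset.tracksWithMediaType(AVMediaTypeAudio)[0]
let leftParams = AVMutableAudioMixInputParameters(track: audioTrack)
let rightParams = AVMutableAudioMixInputParameters(track: audioTrack)
// MTAudioProcessingTapCreate follows the Create rule (+1), so take the retained value to avoid leaking the taps
leftParams.audioTapProcessor = leftTap?.takeRetainedValue()
rightParams.audioTapProcessor = rightTap?.takeRetainedValue()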

let leftAudioMix = AVMutableAudioMix()
let rightAudioMix = AVMutableAudioMix()
leftAudioMix.inputParameters = [leftParams]
rightAudioMix.inputParameters = [rightParams]
leftItem.audioMix = leftAudioMix
rightItem.audioMix = rightAudioMix

// leftPlayer & rightPlayer are class properties
leftPlayer = AVPlayer(playerItem: leftItem)
rightPlayer = AVPlayer(playerItem: rightItem)
leftPlayer.play()
rightPlayer.play()

I use "TapType" to distinguish the channels; it is defined (in Objective-C) simply as:

typedef NS_ENUM(NSUInteger, TapType) {
    TapTypeL = 0,
    TapTypeR = 1
};

The MTAudioProcessingTap callbacks are created in pretty much the same way as in the tutorial. On creation, though, I save the TapType to the context so I can check it in the process callback:

static void tap_InitLeftCallback(MTAudioProcessingTapRef tap, void *clientInfo, void **tapStorageOut) {
    // AudioTapContext is used as a typedef elsewhere, so refer to it the same way here
    AudioTapContext *context = calloc(1, sizeof(AudioTapContext));
    context->channel = TapTypeL;
    *tapStorageOut = context;
}
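
Since only the left init callback is shown, here is a hedged, portable C sketch of how both init callbacks could share one allocation helper and how the allocated storage could later be released. `tap_context_create` and `tap_context_destroy` are hypothetical names, the plain `enum` stands in for the `NS_ENUM` above, and the context struct is assumed to hold only the `channel` field that the answer mentions.

```c
#include <stdlib.h>

/* Stand-in for the NS_ENUM TapType so the sketch builds without Foundation. */
typedef enum { TapTypeL = 0, TapTypeR = 1 } TapType;

/* Assumed layout: the answer only mentions a `channel` field. */
typedef struct AudioTapContext {
    TapType channel;   /* which output channel this tap keeps */
} AudioTapContext;

/* Helper that a left or right init callback could call with its own
 * TapType. Returns 0 on success, -1 if the allocation fails. */
int tap_context_create(TapType channel, void **tapStorageOut)
{
    AudioTapContext *context = calloc(1, sizeof *context);
    if (context == NULL) {
        *tapStorageOut = NULL;
        return -1;
    }
    context->channel = channel;
    *tapStorageOut = context;
    return 0;
}

/* Counterpart for the finalize step: release what the init step allocated. */
void tap_context_destroy(void *tapStorage)
{
    free(tapStorage);
}
```

In the real tap, a finalize callback would first obtain the pointer with MTAudioProcessingTapGetStorage(tap) and then free it, so the context does not leak when the tap is torn down.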

And finally, the actual heavy lifting happens in the process callback with vDSP functions:

static void tap_ProcessCallback(MTAudioProcessingTapRef tap, CMItemCount numberFrames, MTAudioProcessingTapFlags flags, AudioBufferList *bufferListInOut, CMItemCount *numberFramesOut, MTAudioProcessingTapFlags *flagsOut) {
    // output channel is saved in context->channel
    AudioTapContext *context = (AudioTapContext *)MTAudioProcessingTapGetStorage(tap);

    // this fetches the audio for processing (and for output)
    OSStatus status = MTAudioProcessingTapGetSourceAudio(tap, numberFrames, bufferListInOut, flagsOut, NULL, numberFramesOut);
    if (status != noErr) {
        return; // no source audio to process
    }

    // NB: we assume 32-bit float stereo samples.
    // Non-interleaved audio has L in mBuffers[0] and R in mBuffers[1] (stride 1).
    // Interleaved audio has a single buffer in which the data alternates L, R, L, R, ... (stride 2).
    float *left, *right;
    vDSP_Stride stride;
    if (bufferListInOut->mNumberBuffers >= 2) {
        left = bufferListInOut->mBuffers[0].mData;
        right = bufferListInOut->mBuffers[1].mData;
        stride = 1;
    } else if (bufferListInOut->mBuffers[0].mNumberChannels == 2) {
        left = bufferListInOut->mBuffers[0].mData;
        right = left + 1;
        stride = 2;
    } else {
        return; // mono input: left unchanged
    }
    vDSP_Length frames = (vDSP_Length)*numberFramesOut;

    // this is where we equalize the stereo
    // basically: mono = (L + R) / 2, which is the same as (L + R) * 0.5
    // "vasm" = add two vectors (L & R), multiply by scalar (0.5)
    // we use distinct players for each channel, so the result is written only to
    // this tap's channel and the other channel is zeroed out ("vclr")
    float half = 0.5f;
    if (context->channel == TapTypeL) {
        vDSP_vasm(left, stride, right, stride, &half, left, stride, frames);
        vDSP_vclr(right, stride, frames);
    } else {
        vDSP_vasm(left, stride, right, stride, &half, right, stride, frames);
        vDSP_vclr(left, stride, frames);
    }
}
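
The interleaved versus non-interleaved distinction mentioned in the callback's comments can be checked off-device with a small portable C sketch that uses a plain loop instead of vDSP. The function name `downmix_and_isolate` and its parameters are hypothetical; it assumes 32-bit float samples and expresses the two layouts through a stride (1 for separate L and R buffers, 2 for a single L R L R buffer where the right pointer is the left pointer plus one).

```c
#include <stddef.h>

/* Average L and R into the kept channel and silence the other one.
 * `stride` is 1 for non-interleaved (separate) buffers and 2 for an
 * interleaved L R L R ... buffer, where right == left + 1.
 * `keepLeft` is nonzero for the left tap and zero for the right tap. */
void downmix_and_isolate(float *left, float *right, size_t stride,
                         size_t frames, int keepLeft)
{
    for (size_t f = 0; f < frames; ++f) {
        size_t i = f * stride;
        float mono = 0.5f * (left[i] + right[i]);   /* (L + R) / 2 */
        left[i]  = keepLeft ? mono : 0.0f;
        right[i] = keepLeft ? 0.0f : mono;
    }
}
```

Reading both samples before writing either one means the average is always computed from the original L and R values, regardless of which channel is kept.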