How to negotiate an audio format for a Windows Audio Session API (WASAPI) client

The Windows Audio Session API (WASAPI) provides a family of interfaces for playing or recording audio.

Chief among these are the IAudioClient, IAudioClient2, and IAudioClient3 interfaces.

There is a Windows audio session (WASAPI) sample on GitHub, but in this blog post I want to dive into the nitty-gritty of one particular question:

How do I decide what WAVEFORMATEX to pass to IAudioClient::Initialize*?
*Or equivalent

Before I answer this question, let's take a look at some of the relevant methods on these interfaces.

  1. IAudioClient2::SetClientProperties is a way for you to tell Windows some things about the audio stream before actually creating it (by passing an AudioClientProperties structure).
    The client properties you specify can affect the answers to some of the questions you ask Windows, so be sure to set this BEFORE calling any of the other methods.
  2. IAudioClient::GetMixFormat gives you the audio format that the audio engine will use for this client (with its given AudioClientProperties) to mix all the similar playback streams together, or to split all the similar recording streams apart.
    This format is guaranteed to work*, but sometimes there is a better format that also works.
    * Unless you use AUDCLNT_SHAREMODE_EXCLUSIVE, or AudioClientProperties.bIsOffload = TRUE.
  3. PKEY_AudioEngine_DeviceFormat gives you the audio format that the audio engine uses after the playback mix to talk to the audio driver for the speaker, or to talk to the audio driver for the microphone before splitting the recording streams apart.
    This format is guaranteed to work with AUDCLNT_SHAREMODE_EXCLUSIVE.
    If the audio device has not been used with AUDCLNT_SHAREMODE_SHARED yet, the format may not have been calculated, and the property will be empty.
    You can force the format to be calculated by calling IAudioClient::GetMixFormat.
  4. IAudioClient::IsFormatSupported lets you ask Windows whether the client (with its given AudioClientProperties) supports a given format in a given share mode.
    In certain cases (e.g., AUDCLNT_SHAREMODE_SHARED), if the client does not support the format in question, Windows may suggest a format which (Windows thinks) is close.
  5. IAudioClient::Initialize considers the previously given AudioClientProperties; takes the WAVEFORMATEX you have decided on; and takes a set of flags, including AUDCLNT_STREAMFLAGS_XXX flags.
    The two interesting flags for format negotiation are AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM and AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY, which tell Windows that you want the WASAPI audio engine to do any necessary conversions between the client format you are giving it and the playback mix format or recording split format.
    This will work for uncompressed integer PCM and uncompressed floating-point client formats, but will not work for compressed formats like AAC.
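
For the uncompressed formats mentioned in step 5, most of a WAVEFORMATEX follows from three numbers: channel count, sample rate, and bits per sample. The sketch below shows that arithmetic. It is only an illustration: WaveFormat is a hypothetical stand-in that mirrors the WAVEFORMATEX field names so the example compiles anywhere, and makeUncompressedFormat is a made-up helper, not part of WASAPI. In a real client you would fill in the actual WAVEFORMATEX from the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring the WAVEFORMATEX field names.
// A real client would use WAVEFORMATEX from the Windows headers instead.
struct WaveFormat {
    std::uint16_t wFormatTag;
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
    std::uint16_t cbSize;
};

constexpr std::uint16_t kFormatPcm = 1;        // numeric value of WAVE_FORMAT_PCM
constexpr std::uint16_t kFormatIeeeFloat = 3;  // numeric value of WAVE_FORMAT_IEEE_FLOAT

// Illustrative helper (not a WASAPI function): fills in the derived fields
// for an uncompressed integer PCM or floating-point format.
WaveFormat makeUncompressedFormat(std::uint16_t tag,
                                  std::uint16_t channels,
                                  std::uint32_t sampleRate,
                                  std::uint16_t bitsPerSample) {
    assert(channels > 0);
    assert(bitsPerSample > 0 && bitsPerSample % 8 == 0);

    WaveFormat f{};
    f.wFormatTag = tag;
    f.nChannels = channels;
    f.nSamplesPerSec = sampleRate;
    f.wBitsPerSample = bitsPerSample;
    // One block is one sample for every channel.
    f.nBlockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    // Uncompressed audio has a constant byte rate.
    f.nAvgBytesPerSec = sampleRate * f.nBlockAlign;
    f.cbSize = 0;  // no extra bytes follow the basic structure
    return f;
}
```

For example, makeUncompressedFormat(kFormatPcm, 2, 44100, 16) yields nBlockAlign 4 and nAvgBytesPerSec 176400. Formats with more than two channels, or integer samples wider than 16 bits, are normally described with WAVEFORMATEXTENSIBLE, which this sketch does not cover.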

OK, with all that background out of the way, let's try to answer the question. There are several approaches which will work.

  1. Use a higher level audio API instead of WASAPI
    This is the preferred approach. WASAPI is complicated. And even with all that complication, it's a very underpowered API - for example, it doesn't even do MP3 decoding.
    No matter what your application is, there is almost always a higher-level audio API which is better suited for you. If you are not sure what it is, send me an email and ask me; I might be able to recommend one for you, or I may be able to put you in touch with someone else who can.
    A few examples of higher-level audio APIs: MediaElement, MediaCapture, AudioGraph, XAudio2.
    If you've tried a higher-level API, but you've run into some problem or other and now you're resorting to WASAPI, email me and tell me about the problem; I want to fix it so we can get you back on the right API for you.
  2. If you don't care what format is used
    IAudioClient2::SetClientProperties(...);
    use IAudioClient::GetMixFormat // no need to call IsFormatSupported here
  3. If you have a format in hand
    Maybe you're playing audio from a file, or maybe you need to record from the microphone and hand off to a DSP library that insists on a particular input format.
    If that is the case, use the following pattern:
    IAudioClient2::SetClientProperties(...);
    if (IAudioClient::IsFormatSupported(formatInHand)) { use that }
    else { use IAudioClient::GetMixFormat, or the suggested closest supported format, and convert between that and formatInHand in the app code }

    Another option which will work, but which is less preferred, is to use this pattern:

    IAudioClient2::SetClientProperties(...);
    IAudioClient::Initialize(formatInHand, AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM | AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY)

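    Neither pattern helps if formatInHand is a compressed format, because the audio engine will not convert those. Since you're using WASAPI directly (see point 1 above!) you will need to compress/decompress the audio in app code.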
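
The preferred "format in hand" pattern for shared mode can be sketched as plain decision logic. Everything named here is hypothetical and exists only so the example compiles without the Windows headers: Format, Support, SupportAnswer, ClientQueries, Choice, and negotiate are not WASAPI types or functions. The three Support values are meant to mirror the outcomes of IAudioClient::IsFormatSupported in shared mode: S_OK, S_FALSE with a closest match, and AUDCLNT_E_UNSUPPORTED_FORMAT.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified format description (not a WASAPI type).
struct Format {
    unsigned sampleRate;
    unsigned channels;
    unsigned bitsPerSample;
    bool operator==(const Format& o) const {
        return sampleRate == o.sampleRate && channels == o.channels &&
               bitsPerSample == o.bitsPerSample;
    }
};

// Models the three shared-mode outcomes of IAudioClient::IsFormatSupported.
enum class Support { Supported, ClosestMatch, Unsupported };

struct SupportAnswer {
    Support result;
    bool hasClosest;  // true only when a closest match was suggested
    Format closest;   // meaningful only if hasClosest is true
};

// Hypothetical wrapper around the two questions the app asks the client.
// In a real app these would call IAudioClient::IsFormatSupported and
// IAudioClient::GetMixFormat after IAudioClient2::SetClientProperties.
struct ClientQueries {
    std::function<SupportAnswer(const Format&)> isFormatSupported;
    std::function<Format()> getMixFormat;
};

struct Choice {
    Format streamFormat;  // the format to pass to Initialize
    bool appMustConvert;  // true if app code converts to/from formatInHand
};

Choice negotiate(const ClientQueries& client, const Format& formatInHand) {
    assert(client.isFormatSupported && client.getMixFormat);

    SupportAnswer answer = client.isFormatSupported(formatInHand);
    if (answer.result == Support::Supported) {
        return {formatInHand, false};   // use the format in hand as-is
    }
    if (answer.result == Support::ClosestMatch && answer.hasClosest) {
        return {answer.closest, true};  // use the suggested closest format
    }
    return {client.getMixFormat(), true};  // fall back to the mix format
}
```

Whichever branch is taken, the format returned has been vouched for by one of the two queries, so the later Initialize call is never made blind.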

Or in tabular form (because people like tables)

Any format is fine

  AUDCLNT_SHAREMODE_SHARED
    IAudioClient2::SetClientProperties(...);
    use IAudioClient::GetMixFormat()

  AUDCLNT_SHAREMODE_EXCLUSIVE
    IAudioClient2::SetClientProperties(...);
    use PKEY_AudioEngine_DeviceFormat

I have a particular format I want to use

  AUDCLNT_SHAREMODE_SHARED
    IAudioClient2::SetClientProperties(...);
    if (IAudioClient::IsFormatSupported(yourFormat)) { use it }
    else { use the suggested closest-supported-format and convert between it and yourFormat in app code }

    or

    IAudioClient2::SetClientProperties(...);
    IAudioClient::Initialize(yourFormat, AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM | AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY)

  AUDCLNT_SHAREMODE_EXCLUSIVE
    IAudioClient2::SetClientProperties(...);
    if (IAudioClient::IsFormatSupported(yourFormat)) { use it }
    else { use PKEY_AudioEngine_DeviceFormat and convert between it and yourFormat in app code }

Regardless of which approach you use, you should always have some assurance that the format will work before calling IAudioClient::Initialize.

You could get this assurance in various ways - IAudioClient::GetMixFormat, IAudioClient::IsFormatSupported, or AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM | AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY.

It is an application bug to call IAudioClient::Initialize blind.