slab3d User Manual v6.7.4 Joel D. Miller
slab3d is a software-based real-time virtual acoustic environment (VAE) rendering system originally developed in the Spatial Auditory Displays Lab at NASA Ames Research Center. slab3d is now maintained externally as well.
Contents
Introduction
Architecture, Libraries, and APIs
Coordinate System
Render and Generator Plugins
Spatial Render Plugin
Mixer Render Plugin
HRTF Databases
Error Handling
Building slab3d
Tips and Known Issues
Applications
XScape Content and slabx Coordinates
Voice over IP (VoIP)
AvADE (Aviation Auditory Display Engine) Server Software
Appendix
o Latency Analysis
Unless otherwise noted: Copyright (C) 2001-2015 United States Government as represented by the Administrator of the National Aeronautics and Space Administration (NASA). All Rights Reserved. Some sections: Copyright (C) 2006-2015 Joel D. Miller. All Rights Reserved. Last Updated: 3/5/2015
Introduction The slab3d release is a collection of applications, libraries, documentation, MATLAB utilities, and source code for virtual acoustic environment rendering and real-time audio signal processing. The list below provides a quick reference to some of the slab3d components.
just curious about 3D-sound - see SLABScape
rendering your own HRTFs - see slabtools and SLABScape
writing experiment software - see SRAPI and slabcon
writing virtual environment software - see SRAPI and SLABScape
writing rendering algorithms - see Render Plugins and SLABScape
writing signal generators - see Generator Plugins, SLABScape, and SLABSurface
real-time audio signal processing - see slabwire and SLABWireDemo
frequency-driven OpenGL displays - see SLABWireDemo
Voice over IP - see SLABCall, SLABScape, and SRAPI
Installation
Please see slab3d.sonisphere.com for information regarding slab3d system requirements, download, and installation.

Windows Sound Schemes
When listening to slab3d with headphones it is probably best to disable system sounds by selecting the "No Sounds" sound scheme under Control Panel | Sounds and Multimedia | Sounds | Scheme. Otherwise, an uncomfortably loud system sound might occur while listening to slab3d.

Speakers Property
Some sound peripherals process the output signal based on the type of display attached to the output (e.g., headphones versus desktop speakers). When using slab3d, it is best to select an unprocessed output path under Control Panel | Sounds and Multimedia | Audio | Advanced | Speakers | Speaker Setup.

ASIO
ASIO latency will most likely be less than DirectSound latency (e.g., ~4ms vs. ~23ms for a similar configuration). If your sound peripheral does not have an ASIO driver, you might be able to use the slab3d ASIO features by installing ASIO4ALL (www.asio4all.com). ASIO4ALL provides an ASIO interface wrapper for the standard Windows Driver Model (WDM) sound driver interface. ASIO4ALL v2.6 was successfully tested with the "Avance AC97 Audio" sound device. The impact of ASIO4ALL on slab3d latency has not been examined.
Citing slab3d
As a courtesy, the use of slab3d in published research should be acknowledged in the publication by citing the two slab3d home pages:
[1] http://slab3d.sonisphere.com/, http://humansystems.arc.nasa.gov/SLAB/
The preferred paper reference describing the slab3d release and its implementation:
[2] Miller, J. D. and Wenzel, E. M., "Recent Developments in SLAB: A Software-Based System for Interactive Spatial Sound Synthesis," Proceedings of the International Conference on Auditory Display, ICAD 2002, Kyoto, Japan, pp. 403-408, 2002.
Additional Documentation
This document and others are available in the release installation directory:
slab3d User Manual - \doc\slab3d_user_manual.pdf
SRAPI Reference Manual - \doc\ref\index.html
slabwire Reference Manual - \doc\slabwire\index.html
slabtools User Manual - \doc\slabtools_user_manual.pdf
slabtools Reference Manual - \doc\slabtools\index.html

Websites:
slab3d home page - http://slab3d.sonisphere.com
slab3d SourceForge.net project page - http://sourceforge.net/projects/slab3d
NASA SLAB Home Page - http://humansystems.arc.nasa.gov/SLAB
NASA Open Source Software site - http://opensource.arc.nasa.gov
Papers: http://humansystems.arc.nasa.gov/groups/ACD/personnel_view.php?personnel_id=81
Acknowledgements I would like to thank Beth Wenzel for her support, feedback, and for making slab3d possible, Martine Godfroy for helping make XScape and slabx possible, Jonathan Abel for early DSP and physical modeling assistance, Mark Anderson for programming assistance, Durand Begault for initial use and suggestions, Marlene Hernan, Robert Padilla, Robin Orans, Dave Encisco, Pat Moran, and Vance Dubberly for NASA release, NOSA, and sys admin efforts, the Air Force Research Laboratory for use and support, and spatial audiophiles everywhere for all of the papers, talks, discussions, compositions... --joel

Developers
Joel Miller - lead designer and programmer
Jonathan Abel - provided physical modeling and signal processing MATLAB examples during the first year of development
Mark Anderson - developed or assisted with traklib Fastrak driver, SLABScape 3D View, BCF2000 interface, managed SRAPI
Mitch Clapp - provided SLABScape Direct3D head model and textures
Renee Goldschmid - developed five SLABWireDemo OpenGL displays, improved spectrum display
John Stewart - developed RT_DIS API and slabwire CSI_RT_DIS_Radio class
Open-Source Software The following open-source software is used in slab3d development. Many thanks to the authors!

Software | How Used | Source
rtdis | low-latency DIS support | slab3d usrlib directory (from John Stewart, AFRL)
slabDISinterface | DIS support library | slab3d usrlib directory (permission granted by AFRL)
JVOIP | VoIP sound sources | http://research.edm.uhasselt.be/jori/page/index.php
ASIO | low-latency audio interface | http://www.steinberg.de/
STK | gstk generator plugin | http://ccrma.stanford.edu/software/stk/
Auditory Toolbox | cbe.m critical band filters | http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/
fft.cpp | SLABWireDemo FFT | "A Programmer's Guide to Sound" by Tim Kientzle
GLUT | SLABWireDemo OpenGL displays | http://www.xmission.com/~nate/glut.html
Shortcut | SRAPI SetLinks() | http://www.codeguru.com
libresample | slabwcon, CResampler2 | http://wwwccrma.stanford.edu/~jos/resample/Available_Software.html
VBAP | incomplete, rvbap.cpp | http://www.acoustics.hut.fi/software/vbap/Pure_Data/
doog2.m, gaussian.m | dog.m, mdog.m DOG generation | http://www.cs.berkeley.edu/~stellayu/code.html
wave_matlab | wmdemo.m wavelet transform | http://paos.colorado.edu/research/wavelets/
WaveLab | cwt.m wavelet transform scaling | http://www-stat.stanford.edu/~wavelab/
mp_fm2d.m | dustimg.m low-vis image | http://technion.ac.il/~pavel/comphy/
doxygen | documentation | http://www.stack.nl/~dimitri/doxygen/
Graphviz | documentation | http://www.graphviz.org/
Dia for Windows | documentation | http://dia-installer.de
m2html | documentation | http://www.artefact.tk/software/matlab/m2html/
Architecture, Libraries, and APIs Architecture The slab3d architecture is illustrated in the figure below.
slab3d Architecture
SRAPI API and Library: The SLAB Render API (SRAPI) provides a platform for the development of virtual acoustic environment (VAE) applications. Using SRAPI, the user can control rendering, allocate sound sources, configure frame-accurate callbacks, select sound output destinations, and specify acoustic scene and renderer parameters (e.g., sound source location, number of HRIR FIR taps). SRAPI also supports a Script and Modifier mechanism for the automatic updating of acoustic scene parameters.
Documentation: SRAPI Reference Manual
Library: \lib\srapimr.lib (mr = multi-threaded, release)
Header Files: \include\
Source Code: \src\srapi\
slabwire API and Library: The slabwire library is used to configure a StreamIn-DSP-StreamOut sample processing chain. It is used internally by SRAPI but it can also be used as a stand-alone library. When using SRAPI, slabwire is used to control StreamIn sample streams and to develop Render Plugins and Generator Plugins. The SLABWireDemo application demonstrates the direct use of the slabwire API using the SoundIn-DSP-SoundOut framework. The slabwcon application demonstrates low-level slabwire use.
Documentation: slabwire Reference Manual
Library: \lib\slabwmr.lib (mr = multi-threaded, release)
Header Files: \include\
Source Code: \src\slabw\
Render Plugin API: The VAE rendering engine is encapsulated in the Render Plugin (rplugin) "Spatial". rplugins can be swapped in and out while rendering, allowing for the comparison of different rendering strategies. rplugins also allow for the construction of tailored displays for specific applications (e.g., a spatial communications system). See Included Plugins.
Documentation: slabwire Reference Manual
Library: Several render plugins are built into the srapi library. Additional render plugins can be placed in the SRAPI-based executable directory, e.g., \bin\r*.dll.
Header Files: \include\srapi.h (for built-in render plugin use) and \include\rplugin.h (for render plugin development)
Source Code: \src\r*\
Generator Plugin API: Signal generator sound sources are encapsulated in Generator Plugins (gplugins). This enables users to conveniently add additional signal generation algorithms to slab3d without modifying slab3d itself. See Included Plugins.
Documentation: slabwire Reference Manual
Library: Generator plugins are placed with the SRAPI-based executable, e.g., \bin\g*.dll.
Header File: \include\gplugin.h
Source Code: \src\g*\
Static-state and Render-time, Static Settings and Dynamic Settings:
static-state = state of SRAPI when not rendering
render-time = state of SRAPI when rendering
static settings = settings that must remain constant while rendering
dynamic settings = settings that can be set at any time
Some static settings are sensitive to setting sequence, e.g., SampleRate before StreamOuts before StreamIns: outputs and inputs depend on the sample rate, and ASIO-in depends on ASIO-out. The SRAPI Reference Manual states whether a function is a static-state function; all other functions should be assumed to be dynamic functions. This differentiation primarily impacts the calling sequence (e.g., static, dynamic, render start, dynamic, render stop, static, ...) and error handling. Similar constraints exist in slabwire.
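For example, a minimal sketch of the static/dynamic calling sequence under the ordering rule above (based on the console example later in this section; error handling is omitted, and the SetSampleRate() argument and exact signatures should be checked against the SRAPI Reference Manual):

// Static-state: configure in dependency order before rendering.
CSRAPI cSRAPI;
cSRAPI.SetSampleRate( 44100 );                         // sample rate first
cSRAPI.OutWave();                                      // then StreamOuts
cSRAPI.SiAllocFile( 0, "\\slab3d\\wavs\\voice.wav" );  // then StreamIns
cSRAPI.SrcAlloc( 0 );
cSRAPI.SrcStream( 0, 0, 0 );
cSRAPI.SrcEnable( 0, true );

// Render-time: dynamic settings can be changed while rendering.
cSRAPI.RenderStart();
cSRAPI.SrcLocate( 0, 1.0, 0.0, 0.0 );                  // dynamic scene update
cSRAPI.RenderStop();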
SRAPI Sound Sources and StreamIns: An SRAPI "sound source" contains state information for a virtual environment sound emitter (Spatial render plugin) and a mixer channel strip (Mixer render plugin). The SRAPI SrcAlloc() function allocates a sound source and assigns it a user-specified ID. This ID is used to set source attributes via the Src*() functions. All sound sources must be allocated before calling RenderStart() to initiate rendering. The samples attached to a sound source are termed StreamIns and are allocated using the SiAlloc*() functions. StreamIns are attached to sound sources using the SrcStream() function. The various StreamIn types are shown in the figure above.
SRAPI Outputs: The SRAPI Out*() functions are used to define the destination of sample output. For the Spatial renderer, this is a binaural signal corresponding to the sampled sound pressure at the openings of the listener's ear canals. The various stream output types (aka StreamOuts) are shown in the figure above.
Support Libraries
usrlib libraries: Download information is available under Acknowledgements.
JVOIP is used to support the VoIP (Voice over IP) sound source / StreamIn type. See VoIP.
slabDISInterface provides DIS (Distributed Interactive Simulation) packet support.
rtdis provides an alternate DIS interface. It also includes a DIS transmitter.
traklib is a Polhemus Fastrak electromagnetic tracker driver developed in the Spatial Auditory Displays Lab at NASA Ames Research Center. traklib is used by SLABScape to access the Fastrak.
download libraries: Information for the download libraries is provided under Building slab3d.
ASIO is a low-latency sound device interface.
STK is "The Synthesis ToolKit in C++" and is used by the gstk Generator Plugin. See Included Plugins.
Microsoft Visual C++ libraries: See the project settings for the various slab3d applications.
DirectX and XNA: See Building slab3d.
CSRAPI
The primary C++ interface used to access SRAPI is CSRAPI (srapi.h). CSRAPI is used to configure sound sources, input streams, and sound outputs and to control rendering. It is also used to specify the acoustic scene and renderer parameters. The console app below demonstrates the minimum API necessary to render a sound source.

#include <windows.h>
#include "srapi.h"

int main()
{
  CSRAPI cSRAPI;
  if( cSRAPI.ErrorState() )
    return 1;  // error

  // specify Waveform-Audio device output
  cSRAPI.OutWave();

  // allocate a sound source, ID = 0
  cSRAPI.SrcAlloc( 0 );

  // allocate a looped wave file StreamIn, ID = 0
  if( cSRAPI.SiAllocFile( 0, "\\slab3d\\wavs\\voice.wav" ) != ERR_SLAB_NONE )
    return 1;  // error

  // attach ch 0 (third param) of stream 0 (second param) to
  // source 0 (first param)
  cSRAPI.SrcStream( 0, 0, 0 );

  // set source 0 attributes
  cSRAPI.SrcLocate( 0, 0.5, 0.5, 0.5 );  // x,y,z in meters
  cSRAPI.SrcGain( 0, 0.0 );              // dB
  cSRAPI.SrcEnable( 0, true );           // enable rendering

  // render source for 5 seconds
  cSRAPI.RenderStart();
  Sleep( 5000 );
  cSRAPI.RenderStop();

  return 0;  // success
}
CSRAPI Defaults
Render Plugin: The default render plugin used for auditory display rendering is the binaural HRTF-based renderer "Spatial" (Render ID RENDER_SPATIAL defined in slabdefs.h).
Render Plugin Directory: The render plugin directory is the executable directory.
Generator Plugin Directory: The default generator plugin directory is the executable directory. A generator plugin directory can also be specified when allocating a signal generator.
HRTF Database: The default HRTF database is \hrtf\jdm.slh (sample rate = 44100 samples/s). This database is built into the srapi library (using hrtfhex.h generated by slabcon).

Setting | Default | Function
Sample Rate | 44100 samples/s | SetSampleRate()
FIR Taps | 128 taps for the direct path, 32 taps for reflections | SetFIRTaps()
Smooth Time | 15ms parameter tracking filter time constant | SetSmoothTime()
Non-spatial Source Gain Offset | -24dB | SetNSOffset()
Source Location | origin, 0 x,y,z | SrcLocate()
Source Enable | not enabled | SrcEnable()
Source Reflection Enable | not enabled | SrcRefEnable()
Source Reflection Offset | disabled | SrcRefOffset()
Source Gain | 0 scalar gain | SrcGainScalar()
Source Mute | not muted | SrcMute()
Source Radius | 10cm | SrcRadius()
Source Spread Exponent | 1 | SrcRadius()
Room Depth (X) | 4m | EnvPlanes()
Room Width (Y) | 4m | EnvPlanes()
Room Height (Z) | 4m, ceiling = 4m - dLstHeight, floor = dLstHeight, dLstHeight = 1.8288m = 6' | EnvPlanes()
Wall Materials | none, i.e., perfect reflector | EnvMaterial()
Listener Position | head at origin, 0 x,y,z,yaw,pitch,roll | LstPosition()
Listener Sensor Offset | 7.5" above interaural axis | LstSensorOffset()
Delay Line Length | 500ms | SetDelayIn()
Level Meter Attack Time Constant | 137ms | SetLevelTimes()
Level Meter Release Time Constant | 137ms | SetLevelTimes()
Skip Silence Threshold | 0.05 linear gain | SetSkipSilence()
Skip Silence Time | 0.5s | SetSkipSilence()
Skip Silence Fade-In | 68ms | SetSkipSilence()
Tick Updating | disabled, i.e., use frame updating | SetTick()
Some of the defaults in the table have corresponding defines in slabdefs.h. "Sample Rate" through "Listener Sensor Offset" are reset to their defaults via CSRAPI::SetDefaults(). Copyright (C) 2006-2014 Joel D. Miller. All Rights Reserved.
Coordinate System
slab3d uses a right-handed, FLT (Front-Left-Top) coordinate system, i.e., if the fingers curl from x (front) to y (left), the thumb points to z (top).

Location
+x front, through nose
+y left, through left ear
+z top, through top of head

Orientation (positive CCW looking down the axis towards the origin)
-yaw to right, +yaw to left
-pitch up, +pitch down
+roll right, -roll left

Polar
+azimuth to right, -azimuth to left
+elevation up, -elevation down
+range forward, -range backward
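To make the polar conventions concrete, here is a small sketch (not an SRAPI function; the helper name is hypothetical) converting slab3d polar coordinates to FLT Cartesian coordinates:

#include <cmath>

// Hypothetical helper: slab3d polar coordinates (azimuth, elevation in
// degrees; range in meters) to FLT Cartesian coordinates. +azimuth is to
// the listener's right, so y (left) decreases with positive azimuth.
void PolarToFLT( double azDeg, double elDeg, double range,
                 double& x, double& y, double& z )
{
  const double d2r = 3.14159265358979 / 180.0;
  double az = azDeg * d2r;
  double el = elDeg * d2r;
  x =  range * cos( el ) * cos( az );   // front
  y = -range * cos( el ) * sin( az );   // left
  z =  range * sin( el );               // top
}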
Convolvotron and Polhemus Azimuth and Elevation
These definitions are compatible with the Polhemus Isotrak and Fastrak head trackers and the Crystal River Engineering Convolvotron coordinate systems with the following exceptions:
slab3d azimuth = -(Convolvotron and Polhemus azimuth)
slab3d and Convolvotron elevation = -(Polhemus elevation)
Polhemus Transmitter and Receiver The physical placement of the Polhemus transmitter and receiver often results in the following configuration (as specified on the Fastrak transmitter): +X forward, +Y right, +Z down. Since slab3d uses +Z up and +Y left, the signs of Y, Z, Yaw, and Pitch must be changed if your tracker driver follows this convention.
Microsoft XNA XNA uses a right-handed coordinate system with +x right, +y top, +z back, and with +yaw left, +pitch up, +roll left. The X-prefixed Managed SRAPI functions are provided for use with XNA.
Render and Generator Plugins All audio rendering in slab3d is performed with render plugins (rplugins). The base class of all rplugins is CRPlugIn. The default rplugins are built into the srapi library. Additional rplugins can be built as DLLs and placed in the SRAPI-based executable directory. All rplugins found in this directory will be read into memory when a CSRAPI object is allocated. The SRAPI Render Functions exist for querying and selecting rplugins. Through the CSRAPI class, rplugins are given access to SRAPI's scene parameters, input delay lines, and output stream. The rplugin DLL and CSRAPI interface is only supported by SRAPI; rplugins can be created and used as CRPlugIn subclasses in slabwire. All signal generation in slab3d is performed with generator plugins (gplugins). Similar to rplugins, gplugins are subclassed from CGPlugIn, built as DLLs, and placed in the slab3d-based executable directory. A gplugin DLL directory can also be specified when allocating a generator. gplugins are completely implemented in slabwire and are thus accessible from both slabwire and SRAPI. See Also: CSRAPI, CRPlugIn, CGPlugIn, and gslab.h in the SRAPI Reference Manual.
Starter Projects The rmyplugin project is the starter project for creating user Render Plugins. The gslab project is the starter project for creating user Generator Plugins. Both projects contain functioning example code that illustrates where to place your new custom code. Follow the steps below to create your own plugin using Microsoft Visual C++:
1. Create a new Win32 DLL project.
2. Delete the generated .cpp from the project and from the directory.
3. Copy rmyplugin.cpp (Render Plugin) or gslab.cpp (Generator Plugin) from slab3d's src\ directory to your new project directory. Rename it if you wish. Add it to the project.
4. Add the slab3d include\ and include\slabw directories to the Additional Include Directories compiler property.
5. For slab3d to find your DLL, it must reside in the slab3d-based executable's directory. Set the linker Output File property as follows:
   1. For debug DLLs, enter the name of your DLL appended to the corresponding executable's path, e.g., "/bin/r*d.dll", where '*' denotes a name of your choosing, 'r' indicates a "render" plugin, and 'd' indicates the debug version of the DLL.
   2. The release DLL format for the example above is "/bin/r*.dll".
   3. For Generator Plugins, replace the 'r' prefix with a 'g' prefix.
Source Code: The source code for rmyplugin is in \src\rmyplugin\. The source code for gslab is in \src\gslab\.
slab3d Release Plugins The following Render Plugins and Generator Plugins are included in the slab3d release. Render Plugins are prefixed with an 'r' and Generator Plugins are prefixed with a 'g'.
rspatial - contains "Spatial", the slab3d auralization algorithm. The rspatial source code contains the CSpatial class that can be used as a base class for special-purpose spatial displays (e.g., rsar).
rdiotic - contains the following Render Plugins: "Diotic", "Left-Monotic", "Right-Monotic", and "Dichotic".
rmyplugin - contains the "Plugin Example" plugin, an extremely simple diotic display plugin that is useful as a Render Plugin starter project.
rmixer - contains the "Mixer" plugin for use with the CSRAPI::SrcMix*() functions.
rsar - was developed for a Search And Rescue communications application. It demonstrates how to extend "Spatial" for special-purpose applications. It also demonstrates the use of the SRAPI extension Plugin Variables. An rsar DLL is not included in the slab3d release and rsar is not supported by SLABScape.
gslab - contains "noise", "sine", "square", "triangle", "am" (amplitude modulation), and "impulse" generators. In addition to providing slab3d's default signal generators, gslab also serves as the Generator Plugin starter project.
gstk - wraps a Generator Plugin interface around the STK (Synthesis Toolkit) instruments. For more information regarding STK, see http://ccrma.stanford.edu/software/stk/.
Binaries:
Built-in Render Plugins: \lib\srapimr.lib
Additional Render Plugins: \bin\r*.dll
Generator Plugins: \bin\g*.dll
Source Code:
Built-in Render Plugins: \lib\src\srapi\
DLLs: \src\\
Spatial Render Plugin
Audio rendering in slab3d is divided between the SRAPI library and Render Plugins. The default Render Plugin is the HRTF-based "Spatial" plugin. The listener HRTF database used by Spatial contains minimum-phase head-related impulse response (HRIR) pairs and interaural time delays (ITDs) at fixed azimuth and elevation increments. The azimuth and elevation increments can vary from one database to another.
The slabwire frame size is 32 samples, meaning sound samples are routed through the signal processing chain 32 samples at a time. The sample data type is single-precision floating-point and all calculations are performed using single- or double-precision floating-point arithmetic. Every frame, the Render Plugin receives a frame of samples. For a sample rate of 44,100 samples/s, the frame rate is 1378 frames/s. Every frame the following processing occurs:

Script, Trajectory, and Callback Update - updates the script, trajectory, and callback mechanisms at update rates defined by the user. The callback feature allows user code (e.g., a custom source trajectory) to run inside the slabwire thread.

Acoustic Scene Update - converts scene parameters to DSP parameters. Spatial performs a scene update each time the user updates a scene API parameter (e.g., listener position). The maximum update rate depends on available CPU resources. Since the HRIR FIR filter coefficients are updated every other frame, the absolute maximum update rate is 690Hz. A typical scene update rate is 120Hz.
Acoustic Scene Update:
o Tracker Sensor Offset - compensates for the location of the head tracker sensor.
o Image Model - computes the location of sound source reflection images.
o 3D Projection - converts scene information into listener-relative geometric quantities:
  Image-Listener Range
  Image Arrival Angle

Signal Flow Translation - converts listener-relative geometric quantities into FIR coefficients and delay line indices (aka "DSP parameters") for each sound path and for each of the listener's ears, modeling:
o Propagation Delay
o Spherical Spreading Loss
o HRTF Database Interpolation (FIR Coefficients, ITD)
Process - processes each sound image, performing the following signal processing tasks:
Delay Line - models propagation delay and ITD.
o Delay Line Indices Parameter Tracking - bumps current delay line indices towards target values every sample.
o Provided by slabwire layer: Interpolated Delay Line - implements a 2x up-sampled, linearly interpolated, fractionally indexed delay line.
IIR Filter - models wall materials with a first-order IIR filter.
FIR Filter - models spherical spreading loss and head-related transfer functions.
o FIR Coefficient Parameter Tracking - bumps current FIR coefficients towards target values every other frame.
o FIR Filter Operation - implements an arbitrary-length FIR filter. Typically, the direct path is computed with 128 taps and each reflection with 32 taps.
Mix - mixes the direct path and six first-order reflections for an arbitrary number of sound sources.
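As a generic illustration of two techniques named above, parameter tracking and a linearly interpolated, fractionally indexed delay line read, here is a minimal sketch (not slab3d's actual implementation; slab3d's delay line is also 2x up-sampled, which is omitted here):

// Linearly interpolated read from a circular delay line. 'buf' holds 'len'
// past samples, 'writeIdx' is the most recent write position, and 'delay'
// is a fractional delay in samples.
float DelayRead( const float* buf, int len, int writeIdx, float delay )
{
  float readPos = (float)writeIdx - delay;
  while( readPos < 0.0f )
    readPos += (float)len;               // wrap into the buffer
  int   i0   = (int)readPos;
  int   i1   = ( i0 + 1 ) % len;
  float frac = readPos - (float)i0;
  // linear interpolation between the two neighboring samples
  return buf[i0] + frac * ( buf[i1] - buf[i0] );
}

// First-order parameter tracking: bump the current value towards the target.
void TrackParam( float& current, float target, float coeff )
{
  current += coeff * ( target - current );
}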
Modeling Notes
ITD is implemented by lagging the contralateral ear relative to the ipsilateral ear. In a real-world dynamic situation the ITD would be split between a lag and a lead, respectively.
When a sound source is moving relative to the listener, regardless of source-listener distance, the HRTF is updated immediately. In reality, due to propagation delay, there would be a slight delay before the sound wave incident angle changed. This is true of other source attributes as well, e.g., if one considered the source gain as a volume knob on the source.
Mixer Render Plugin The Mixer Render Plugin block diagram is shown below. Please see the SRAPI Reference Manual and the SrcMix functions for more information.
Mixer Render Plugin Block Diagram
[Block diagram: each source n has a source gain g(n,out) per output and two aux sends; each aux send feeds an HRTF processing block HRTF(aux,n) with enable, azimuth, and elevation parameters, whose L/R outputs are scaled by aux return gains r(aux,n,out); all signals are summed per output channel, out 1 through out 32.]
Definitions
n - source number in order allocated
HRTFaux#,src# - each aux send is routed to an HRTF az,el processing block; there are two independent HRTF processing blocks per source
enaux#,src# - enable processing block
azaux#,src# - azimuth
elaux#,src# - elevation
gsrc#,out# - source gain
raux#,src#,out# - aux return gain applied to the spatial signal; typically, odd/even out pairs should be identical, r1,n,1 = r1,n,2, r1,n,3 = r1,n,4, etc.
The sum blocks sum over all sources.
HRTF Databases slab3d HRTF (Head-Related Transfer Function) databases contain HRIR (Head-Related Impulse Response) and ITD (Interaural Time Delay) filter data for spatial rendering. Measured HRTFs can be obtained using insert-ear microphones and a spherical array of speakers to construct a spherically-sampled database. The HRIRs are converted to minimum-phase to reduce filter length and to simplify real-time interpolation. Biharmonic spline interpolation is used to resample the data to a uniform grid. The grid is specified by an azimuth increment and an elevation increment. The default slab3d HRTF database is "jdm.slh". The SLH suffix denotes a slab3d HRTF. "HRTF databases" are sometimes referred to as "HRTF maps". See CSRAPI::LstHRTF() in the SRAPI Reference Manual for SRAPI HRTF database selection.
slab3d HRTF database format
HRTF Database Header

struct HRTFHeader
{
  short nVersion;         // database format version (2)
  char  strName[32];      // subject's name
  char  strDate[8];       // date measured
  char  strComment[256];  // text comment
  short nAzInc;           // az increment, degrees
  short nElInc;           // el increment, degrees
  short nNumPts;          // number of HRIR points
  long  lSampleRate;      // sample rate
};
The HRIR and ITD data immediately follow the header (example below: nAzInc = 30, nElInc = 18, nNumPts = 128). The HRIR pairs are ordered by azimuth (180 down to -180), with the elevations grouped by azimuth (90 down to -90); for each azimuth/elevation pair, the left ear points are followed by the right ear points. The delays (ITDs) follow the HRIRs in the same azimuth/elevation order (+ delay = left ear lag, - delay = right ear lag):

  AZ    EL
  180,   90, left ear,  hrir pt 0 ... hrir pt 127
  180,   90, right ear, hrir pt 0 ... hrir pt 127
  180,   72, left ear,  hrir pt 0 ... hrir pt 127
  ...
  180,  -90, right ear, hrir pt 0 ... hrir pt 127
  150,   90, left ear,  hrir pt 0 ... hrir pt 127
  ...
 -180,  -90, right ear, hrir pt 0 ... hrir pt 127
  180,   90, delay
  180,   72, delay
  ...
 -180,  -90, delay
jdm.slh parameters
Number of HRIR pairs and ITDs: 143
Azimuth grid: 180 to -180 in -30 degree increments
Elevation grid: 90 to -90 in -18 degree increments
HRIR length: 128 points
Sample Rate: 44100 samples/s
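Given the grid and ordering described above, a small sketch of how an HRIR pair index could be computed (an illustration of the documented layout, not slab3d library code; the helper names are hypothetical):

// Grid dimensions implied by the header increments, e.g., jdm.slh:
// 360/30 + 1 = 13 azimuths, 180/18 + 1 = 11 elevations, 143 HRIR pairs.
int NumAz( int nAzInc ) { return 360 / nAzInc + 1; }   // 180 .. -180
int NumEl( int nElInc ) { return 180 / nElInc + 1; }   //  90 .. -90

// Index of the HRIR pair at grid point (az, el) using the ordering above:
// elevations grouped by azimuth, azimuth decreasing from 180, elevation
// decreasing from 90.
int PairIndex( int az, int el, int nAzInc, int nElInc )
{
  int azIdx = ( 180 - az ) / nAzInc;   // 180 -> 0, 150 -> 1, ...
  int elIdx = ( 90 - el ) / nElInc;    //  90 -> 0,  72 -> 1, ...
  return azIdx * NumEl( nElInc ) + elIdx;
}

For jdm.slh this gives PairIndex( 180, 90, 30, 18 ) = 0 and PairIndex( -180, -90, 30, 18 ) = 142, the last of the 143 pairs.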
Error Handling
There are two error checking modes corresponding to the two states of SRAPI, static-state and render-time. The two modes are required because of render-time multitasking. When rendering, SRAPI runs in two threads of execution, the API Thread (the same as the user's thread) and the Render Thread, a thread created during a call to RenderStart(). Errors can occur inside the Render Thread without an API call (e.g., a state error in a Render Plugin not directly related to an API call). The user cannot check the return value of an API function to catch this error (unless "polling", and then only during the next "error poll", see below).

Static-state Error Catching
Function Return Values: Many SRAPI functions return SLABError. Static-state errors are typically caught with function return values.

Render-time Error Catching
There are two methods for catching render-time errors:
Notification Callback: When using a notification callback, SRAPI notifies the user of an error by calling the user's callback function with SC::NotifyDspDone (i.e., DSP done due to error). Before calling RenderStart(), call SetNotify() specifying a notification callback function. If a render-time error occurs, the notification callback will be called. Use an error query function to verify the notification corresponds to an error (notifications can also be used to indicate render completion using the AutoStop feature).
Error Polling: Whenever an error occurs, SRAPI enters an "error state." In an error state, all error-returning SRAPI functions return the current error. Error Polling refers to catching existing errors via function return values or error status functions. In other words, the user is checking for an error not necessarily caused by the function itself (e.g., an error caused by a previous function where the return value wasn't checked).

Error State
SRAPI enters an "error state" whenever an error occurs. In an error state:
rendering is stopped
the sound output stream is stopped
script processing is stopped
all functions returning SLABError return the current error
error information can be queried via ErrorState(), Error(), ErrorString(), and ErrorStack()
error state is maintained until explicitly cleared with ErrorClear()
Error Functions Overview The following functions exist for querying error information and clearing error state:
ErrorState(): returns true if in error state, false if not
Error(): returns the current SLABError
ErrorString(): returns a string describing the current error
ErrorStack(): returns a string providing call stack error information for the current error
ErrorClear(): clears error state
SetNotify(): sets the notification callback function
LogName(): sets the name of the log file used for logging error information
Reset(): resets slab3d
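For example, a minimal error-polling sketch using the functions above (return types are not shown here; check the SRAPI Reference Manual):

// Poll for an error after a sequence of render-time calls.
if( cSRAPI.ErrorState() )
{
  SLABError err = cSRAPI.Error();   // current error code
  // ErrorString() and ErrorStack() provide a description and call stack
  // information; log or display them as appropriate for the application.
  cSRAPI.ErrorClear();              // clear the error state before continuing
}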
Building slab3d
Pre-built release-mode slab3d libraries, render plugins, generator plugins, and applications are included in the slab3d release. Building derivative works can often be accomplished using the pre-built components (e.g., see the sample applications slabcon and SLABWireDemo).

Note regarding static vs. shared linking: To avoid "DLL hell", slab3d-based apps are typically statically linked. Some technologies, such as the Common Language Runtime, require MFC to be accessed via a shared DLL. Using this configuration with slab3d often results in linking conflicts. The work-around is to use the Visual Studio options Ignore Specific Libraries in concert with Additional Dependencies. Libs ignored by the former can be added using the latter. It might take some permuting, but this should allow a mixed static/shared app to be built.

All slab3d libraries, plugins, and applications can be built using the slab3d release and a few publicly available components. The steps below assume slab3d was installed to the directory \slab3d. This is also the directory used to develop slab3d, so you might see references to this directory. In general, absolute directory paths are avoided, but a few may exist. The slab3d release is built using Microsoft Visual C++ 2010 (in Microsoft Visual Studio 2010 Professional).
Build Steps
The February 2010 release of DirectSound is used for low-latency sound device output. The DirectX SDK can be downloaded from the Microsoft Download Center: http://www.microsoft.com/en-us/download/default.aspx
The following Visual C++ settings (or similar) should be used for the headers and libraries. If set using Visual Studio Options (versus Project Settings), these directories will be used for all projects and solutions. They should be placed before the Windows SDK directories to avoid header clashes.
Visual C++ Include Files Directory: C:\Program Files\Microsoft DirectX SDK (February 2010)\Include
Visual C++ Library Files Directory: C:\Program Files\Microsoft DirectX SDK (February 2010)\Lib\x86
Other DirectX 9c and later SDKs should be compatible. Note, though, sometimes a release built on one machine will require a missing DLL when run on another. I tried to avoid this by using static builds, but somewhere along the DirectX 9 release cycle, Microsoft moved code from a static library to a DLL, forcing users to update their DirectX End-User Runtime.
For VoIP support, the slabwire library uses the open source JVOIP implementation. Slightly modified versions of the JVOIP releases are included in slab3d's usrlib\ directory. The modifications are documented in usrlib\readme.txt. Build the entire solution usrlib\jvoip.sln, debug and release. If several warnings occur during the build, they can be safely ignored.
For DIS support, the slabwire library uses the Air Force Research Library's slabDISInterface library. This library is released with slab3d in the usrlib\ directory. Build the slabDISInterface project, debug and release.
For Polhemus Fastrak support, SLABScape uses the NASA-developed traklib library. Build the traklib project in usrlib\traklib\traklib.sln, debug and release.
The ASIO sound device interface is used for low-latency sample I/O. slab3d is built using ASIO SDK version 2.2, but 2.0 and 2.1 should build as well. The ASIO SDK files are not included in the slab3d release because the ASIO license does not allow the re-release of ASIO files. The ASIO SDK is available from www.steinberg.net under Support, 3rd Party Developers. Once downloaded, place a copy of the SDK in slab3d's usrlib\ directory, e.g., slab3d\usrlib\asiosdk2.2\ containing directories common\ and host\.
For XML read/write support, SRAPI uses XmlLite. XmlLite appears to be installed with Visual Studio 2010, though I originally used Visual Studio 2005 and the "Microsoft Windows SDK Update for Windows Vista". The required files are xmllite.h and xmllite.lib. The version of the SDK I used is titled "Windows SDK for Windows Server 2008 and .NET Framework 3.5" (aka "Windows SDK v6.1"). The ISO file "6.0.6001.18000.367-KRMSDK_EN.iso" is available at the Microsoft Download Center: http://www.microsoft.com/downloads An ISO mounting program (e.g., MagicDisc) can be used to install directly from the ISO. The default SDK install dir is "C:\Program Files\Microsoft SDKs\Windows\v6.1". XP users might have to download "XMLLite for Windows XP (KB915865)" to obtain the xmllite.dll runtime file.
SLABScape includes a simple speech recognition demo. This demo was originally coded using Visual Studio 2005 and the Microsoft Speech SDK 5.1 (SpeechSDK51.exe at the Microsoft Download Center). The Speech SDK headers and libs appear to be installed with Visual Studio 2010. Note: Even though the Speech SDK headers and libs appear to be installed with Visual Studio 2010, the grammar compiler gc.exe does not appear to be installed. The Windows SDK v6.1 ISO discussed above contains gc.exe in the bin\ directory. That gc.exe was used to build SLABScape's grammar.cfg and grammar.h files. A pre-built version of grammar.cfg is distributed with slab3d, so v6.1 isn't necessarily required. However, if errors are reported building SLABScape under v6.0A, v6.1 might help resolve them. If you are not interested in speech recognition and Windows SDK v6.1, see the define _ENABLE_SAPI in SLABScape file sr.h.
The slabx assembly and the X-prefixed projects require XNA 4.0. XNA is available from the Microsoft Download Center.
Build all \slab3d\src\slab3d.sln, debug and release.
Building gstk gstk is a slab3d generator plugin that contains all of the STK v4.2.0 instruments (STK is "The Synthesis ToolKit in C++"). A more recent version of STK is now available. I do not know its impact on gstk.
Download STK from http://ccrma.stanford.edu/software/stk and install. The slab3d STK-based projects assume STK is installed to \Apps\stk-4.2.0. If installed to another directory, you will need to update the project settings.
Open the solution \slab3d\src\gstk\gstk.sln and build the project gstk, debug and release. This will also build stklib. If warnings occur during the build, they can be safely ignored. stk1 is an STK test application that is not required for gstk.
Cleaning Batch files are included for cleaning (deleting) files created during the build process:
cleanrel.bat - used to clean all non-usrlib files except those required for the release package, i.e., some exe's, lib's, and dll's are not deleted. See the cleanrel.bat batch file for details.
clean.bat - calls cleanrel.bat and then cleans the release exe's, lib's, dll's, and res's.
cleandev.bat - cleans all usrlib build files except those required to build slab3d components.
cleanusr.bat - cleans all usrlib build files.
Copyright (C) 2006-2014 Joel D. Miller. All Rights Reserved.
Tips and Known Issues
Tips
To disable source-listener range-dependent gain scaling, use SrcRadius( idSrc, dRadius, 0.0 ).
To place sound sources by specifying azimuth and elevation, use SrcLocateRel() or LstPosition( 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ) with SrcLocatePolar( idSrc, az, el, range ).
To specify sound source time delays, use LstPosition( 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ) and SrcLocatePolar( idSrc, az, el, dSoundSpeed * time_delay ).
To generate periodic sound sources not implemented in a signal generator, use SiAllocFile() with a wave file editing application. Since SiAllocFile() can loop wave files continuously, it can be used to generate fixed periodic sound sources. For real-time signal generation, one can use the Generator Plugin interface (see CGPlugIn).
There are three methods for muting a sound source (see the sketch after the list below):
1. Call SrcMute(). Internally, it is similar to the SrcGainScalar() method below. The difference is SrcMute() preserves the gain value.
2. Call SrcGainScalar() with a scalar gain of 0.0 or call SrcGain() with a dB gain value less than -97.0 dB (-96.3 dB is the dynamic range of a 16-bit integer). This does not affect the processing performed; the computational load remains the same. Since the gain of the reflections is dependent upon the gain of the source, this mutes the reflections as well.
3. Call SrcEnable() to disable the source. The source is no longer rendered, eliminating the source entirely. Disabling a source reduces computational load. Note, though, disabling is image-specific. If using reflections, SrcEnable() only impacts the direct path. SrcRefEnable() can be used to disable reflections.
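A short sketch of the three methods for an allocated source with ID 0 (the exact signatures should be verified against the SRAPI Reference Manual; SrcEnable() usage matches the earlier console example):

// 1. Mute: similar to zeroing the gain, but the gain value is preserved.
cSRAPI.SrcMute( 0, true );

// 2. Zero the gain: processing (and CPU load) continues; reflections are
//    muted too because their gain depends on the source gain.
cSRAPI.SrcGainScalar( 0, 0.0 );   // or cSRAPI.SrcGain( 0, -98.0 );

// 3. Disable: the direct-path image is no longer rendered at all.
cSRAPI.SrcEnable( 0, false );
cSRAPI.SrcRefEnable( 0, false );  // reflections are disabled separately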
DSP Parameter Caution
The state copy between SRAPI and slabwire is between two different threads of execution. Thus, there can be a slight delay (microseconds to milliseconds) between the setting of a scene parameter (e.g., source location) and the resultant slabwire DSP value (e.g., source azimuth). Source azimuth is calculated by the DSP based on acoustic scene geometry. In general, this delay is not an issue. But since it is not instantaneous, scene queries occurring immediately after a set (e.g., data binding) could report an incorrect value (the old value). Polling is the safest way to ensure the parameters set in one thread of execution have been captured by another. The primary scene parameters of concern are source azimuth, elevation, and range. The following SRAPI functions can be used to address this issue (see the HeadMatch app): Lock(), UpdateWaitReset(), Unlock(), UpdateWait().

Known Issues
slab3d does not detect all occurrences of DirectSound buffer underflow. Usually, underflow results in an underflow counter increment. But sometimes stuttering is heard without an underflow increment. This is most likely to occur when the CPU is at maximum usage or a CPU usage spike occurs (e.g., reading a file). A missed underflow can be caused by a late Windows timer update, allowing the DirectSound pointers to wrap around the buffer to a valid location.
Development has been paused on the SLABScape "hand". It is not documented.
Applications Several applications are included in the slab3d Release. The ones considered of general use are distributed in exe form. The others can be built using the Microsoft Visual Studio solution slab3d\src\slab3d.sln (slab3d\ refers to the slab3d release installation directory). All application source code is in slab3d\src. In general, the naming convention used is SLAB* for GUI applications and slab* for console applications. All applications are written in C++ or C# using Microsoft Visual Studio. SLABScape is a virtual environment rendering application. It provides a graphical user interface for several SLAB Render API audio-rendering parameters and Direct3D visual-rendering parameters. SLABScape can be used to create Start Menu links for a slab3d installation (see SLABScape | Help | About...). For more information, see the SLABScape Help Page. slabcon is a console application demonstrating SRAPI development.
The minimum API demo shows the minimum SRAPI use required to spatially render a sound source.
The general API demo demonstrates multiple sound outputs, signal generator allocation, User sound source sample streaming, and automatic scene updating using Modifiers.
The Spatial2 render plugin demo demonstrates sound source VU meters, instant replay, and silence skip.
The HRTF demo demonstrates HRTF database loading and selection. It also demos the use of a sound source azimuth modifier to create a circular trajectory.
SLABWireDemo demonstrates the use of the slabwire library. slabwire sits beneath SRAPI and is used to configure a low-level sound_in-to-DSP-to-sound_out audio rendering chain. SLABWireDemo contains an extremely simple audio effects render plugin that contains passthru, lowpass, and flanger effects and a sample grabber. The sample grabber is used to route samples to a visual waveform display and an FFT. The FFT provides frequency information for a spectrum display and frequency-triggered OpenGL displays.

SLABCall places a VoIP call to a slab3d VoIP sound source. There are two connection types, ID and Port. An ID connection is a VoIP connection where all connections occur through one VoIP session and one port. The connection is identified by an ID number. A Port connection refers to a VoIP connection where each connection uses its own session and port. The connection is identified by a port number. Multiple connections will be mixed together.

SLABSound is a sound I/O utility. It provides ASIO, DirectSound, Waveform API, and RIFF file information.

SLABSurface provides a MIDI and/or GUI control surface interface to slab3d. It is useful for controlling and developing slab3d-based synthesis patches and generator plugins.

src\archive\ contains past projects that are no longer maintained.
Copyright (C) 2006-2014 Joel D. Miller. All Rights Reserved.
XScape Content and slabx Coordinates
This discussion assumes some familiarity with XNA Game Studio 4.0.

The default XScape "Content" (models, textures, etc.) is built using the XScape project (see the slab3d\bin\Content\ directory). Additional content can be built using the cbuilder (content builder) project (or another XNA project). cbuilder outputs compiled content to slab3d\bin\ContentBuilder\. A content source item is referred to as an "asset". To be used by XScape, the asset must be compiled into a .xnb file. This limits the asset formats to those supported by the XNA importers (or third-party providers; this discussion is limited to the standard importers). Compiled model and texture assets can be moved to XScape's Content directory for rendering.

MSDN - XNA Game Studio 4.0 - Standard Importers and Processors
http://msdn.microsoft.com/en-us/library/bb447762.aspx

For models (Output Type NodeContent), two importers exist: Autodesk FBX and Microsoft X File. For textures, the supported types are .bmp, .dds, .dib, .hdr, .jpg, .pfm, .png, .ppm, and .tga.

An excellent source of model content is Google 3D Warehouse:
http://sketchup.google.com/3dwarehouse/

Models can be downloaded in the Google SketchUp format (.skp) (and sometimes Collada). The free version of SketchUp only exports to Collada (.dae) and Google Earth File (*.kmz). SketchUp Pro includes FBX (http://sketchup.google.com/intl/en/product/whygopro.html). The free version of SketchUp can be extended via Plugins (tests were originally conducted using SketchUp 7). Three plugins were found that export to the X File format:
1. xExporter.rb by Erwan de Cadoudal
   http://sites.google.com/site/edecadoudal/sketchupwithdirectx
2. ZbylsXExporter.rb by Zbigniew Skowron
   http://www.scriptspot.com/sketchup/scripts/zbylsxexporter
3. 3D Rad Exporter
   http://www.3drad.com/Google-SketchUp-To-DirectX-XNA-Exporter-Plug-in.htm
These will appear in the Google Plugins menu as "DirectX", "Zbyl's Exporter...", and "3D Rad". The first two were used to convert XScape's default content. If the DirectX SDK is installed, DirectX Viewer can be used to evaluate the generated X File. SketchUp Pro's FBX Export can also be used to prepare XNA model content. Different exporters yield different results, so sometimes multiple methods need to be tried before obtaining a usable result.

Once the X File or FBX assets have been created, they can be added to the cbuilder Content (right-click Add | Existing Item...). The model's textures will also need to be added. If it is not obvious which textures are necessary, the Content processor's debug output can be used to add the textures based on error messages. When adding the textures, the Content properties need to be changed from the defaults: Build Action: "Compile" to "None" and Content Processor: "Texture - XNA Framework" to "No Processing Required". The textures are processed with the model, so no additional processing is required. These steps place copies of the models and textures in the cbuilder project's Content directory.

When using a SketchUp-generated FBX, the FBX will probably contain path information in the "Filename" and "RelativeFilename" fields. An FBX is a text file, so it can be edited in Visual Studio to trim the Filename and RelativeFilename fields to include only the texture filenames.
In general, there are three steps one might want to consider before exporting the model:
1. remove any components that aren't desired
2. translate the model so that the origin is in the most meaningful location for virtual world translations and orientations
3. orient the model so that XNA yaw,pitch,roll 0,0,0 is in the desired facing direction

XScape and slabx use the right-handed XNA coordinate system:
+x right
+y up
+z back
XScape default camera facing: -z

SketchUp uses the following right-handed coordinate system (Simple Template - Meters, positive axis solid, negative axis dotted):
+x right (red)
+y forward (green)
+z up (blue)
SketchUp default camera facing: +y

The default camera facings for XScape, DirectX Viewer, and SketchUp are fairly similar, with the SketchUp to XNA axes mapped as follows:
SketchUp +x to XNA +x right
SketchUp +y to XNA -z forward
SketchUp +z to XNA +y up
All exporters discussed except xExporter use the mapping above. xExporter maps SketchUp +x to XNA -z.

Orientation angles are positive CCW as viewed from the positive axis towards the origin, e.g., for XNA, +yaw is CCW looking down +y. In XScape and slabx, 0 yaw is defined as facing forward towards -z.

Note, the slabx.dll graphics engine uses the XNA coordinate system defined above. The slab3d.dll (Managed SRAPI) sound engine uses an FLT coordinate system (for historical reasons). To simplify the simultaneous use of both, Managed SRAPI extends SRAPI with XNA coordinate functions.
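As an illustration of the SketchUp-to-XNA axis mapping above, a standalone sketch (not slabx or Managed SRAPI code; the function name is hypothetical):

// Hypothetical helper: map a SketchUp point (x right, y forward, z up)
// to XNA coordinates (x right, y up, z back). x is unchanged, SketchUp z
// (up) becomes XNA y, and SketchUp y (forward) becomes XNA -z.
void SketchUpToXNA( double xs, double ys, double zs,
                    double& xx, double& yx, double& zx )
{
  xx =  xs;   // right
  yx =  zs;   // up
  zx = -ys;   // back
}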
Voice over IP (VoIP)
Both the SLAB Render API (SRAPI) and slabwire support Voice over IP (VoIP). VoIP was implemented using the open-source JVOIP package:
JVOIP: http://research.edm.uhasselt.be/jori/page/index.php
JVOIP Thesis: http://research.edm.uhasselt.be/~jori/page/index.php?n=CS.Thesis
The specific JVOIP release used is included in the slab3d\usrlib directory. SRAPI and slabwire receive VoIP data but they do not send it. The SLABCall application demonstrates how to configure JVOIP to send sound samples to slab3d (see Applications and the source files SLABCallDlg.h/cpp). An executable version of SLABCall is included in the slab3d Release and SLABScape includes VoIP as a sound source allocation option. These two apps can be run simultaneously on the same computer (localhost) or on a network to establish a functional VoIP connection.

VoIP communication takes place between a sending and receiving IP address and port. The localhost loopback IP address 127.0.0.1 can be used to run the receiving and transmitting apps on the same computer. The port used to receive data is referred to as the portbase. The portbase should be an even number. Only one app can open the port used as the portbase at a time. Thus, multiple apps running on the same host must specify different portbases. The default JVOIP portbase is 5000. So, if SLABScape is used to receive VoIP data on port 5000 from a sending app on localhost, the sending app must be set to a different portbase (e.g., 5100). This is true even if the sending app is not receiving VoIP data packets.

Both SLABCall and the JVOIP test application JVOIPTestUtil can be used as the sending application. The SLABScape Source Allocation dialog VoIP "Port" setting is equivalent to the JVOIPTestUtil "RTP portbase". If both apps are running simultaneously on localhost, be sure to set these parameters to different ports. SLABCall's portbase is 5100. The SLABCall or JVOIPTestUtil "Destination Port" should match the SLABScape "Port" (or, equivalently, the SrcAllocVoIP(), CSIVoipPort(), or CSIVoipID() port).

Due to the JVOIP extension architecture, two different slab3d interfaces have been developed for receiving VoIP data: VoipPort and VoipID. These terms come from the slabwire classes used to implement the interfaces. VoipPort simply defines the JVOIP output to be redirected to slab3d. Prior to the output stage, the default behavior of the JVOIP mixer is to mix all connections to the same port together. A VoipPort will do this as well. The downside of VoipPort is that each individual unmixed connection requires its own JVOIP session. Each JVOIP session requires a separate port and thread of execution. The VoipID interface was developed to use the same port and thread for all connections. In this approach, a JVOIP variable is reinterpreted as a stream ID. The SLABCall source code demonstrates how to set the ID for the transmitter. In slab3d, the first VoipID allocated corresponds to ID 0, the second to ID 1, and so on. The JVOIPTestUtil app distributed with slab3d has been modified to set the ID to 0 (see "// JDM" comments in JVOIPTestUtilDlg.h/cpp). Thus, it will connect to either a VoipPort or an ID 0 VoipID. SLABCall requires the user to specify the type (ID or Port) and the ID.
SRAPI In SRAPI, VoIP is a sound source type that behaves just like the other sound source types (e.g., File, ASIO, etc.). The SrcAllocVoIP() function is used to allocate a sound source that can receive sound samples from a VoIP connection. Note: SrcAllocVoIP() nID >= 0 corresponds to the slabwire VoipID interface, nID = -1 to the VoipPort interface. More information on SrcAllocVoIP() can be found in the SRAPI Reference Manual.
slabwire In slabwire, the Sample StreamIn (SSI) classes CSIVoipPort and CSIVoipID implement the VoipPort and VoipID interfaces. Even though VoipID is the preferred interface, VoipPort is slightly easier to use. Also, at the source code level, VoipPort is much easier to understand than VoipID; VoipID requires a fairly complex interface to JVOIP. For one or two sound sources, the two will perform similarly. For several connections, VoipID should be more efficient and robust. To see how VoipID and VoipPort fit into the slabwire architecture, see the "slabwire Class Diagram" on the main page of the slabwire Reference Manual. The class docs for CSIVoipPort and CSIVoipID illustrate the JVOIP subclasses used for slab3d VoIP support in their collaboration diagrams.
Copyright (C) 2006-2014 Joel D. Miller. All Rights Reserved.
AvADE (Aviation Auditory Display Engine) Server Software The AvADE software comprises three applications built into the slab3d virtual acoustic environment rendering system: WinSpeak, AvADE Client, and AvADE Server (henceforth "Avade"). Avade was designed with maintainability and extensibility in mind. It can be easily modified to support a wide variety of spatial audio and communications server applications.
WinSpeak.exe WinSpeak allows the user to experiment with the SAPI 5.4 TTS (Text-To-Speech) voices installed on the user's Windows 7/8 system, including the specification of SSML and SAPI XML pronunciation tags. "Microsoft Anna" is the default SAPI 5.4 voice installed with Windows 7. This voice, or other installed voices, can be used to generate real-time TTS with Avade.
AvADE Client (AvadeClient.exe) AvadeClient is a test utility that serves as a sample client application. It allows the user to type and send arbitrary text commands to the Avade server while viewing server-to-client responses in a log window. The command entry droplist contains several test commands.
AvADE Server (Avade.exe) Avade supports a variety of features organized around the concept of sound "sources" and sample "streams". For the Spatial Render Plugin, a "source" is a virtual environment sound emitter, i.e., a virtual entity with an azimuth, elevation, and range relative to a listener. For the Mixer Render Plugin, a source is a mixer channel strip, i.e., output channel gains and a binaural HRTF effects processor. For both plugins, a stream is the sound sample stream that plays from the sound source, e.g., wave files, TTS speech, DIS Radio communications, signal generators, etc.
Supported Sound Source Parameters
Both Plugins
o Enable
o Gain (dB)
Spatial Render Plugin
o Spatialization (Spatial, Relative, Panned)
o Azimuth, Elevation, and Range (Spatial and Relative sources)
o Pan (Panned sources)
o Source Radius and Spread Rolloff (source-listener distance attenuation)
Mixer Render Plugin
o Binaural HRTF Processor Enable, Azimuth, and Elevation
o Output Channel Gain Scalars (ASIO device channels 1-8)

Supported Sample Stream Input Types
wave files (looping and one-shot)
real-time TTS speech
multichannel sound device input (ASIO driver required)
signal generators (stream type for "sonifiers")
DIS Radio communications

The underlying audio engine, slab3d, supports several additional source parameters and stream types. Support for these can be added as needed. All source, stream, and Avade parameters can be saved to and read from human-readable XML files.

A time-stamped log window provides high-level Avade status information (e.g., user activities, incoming messages). A polled audio engine status window provides low-level information (e.g., auditory scene state, error status). This allows the user to confirm that Avade GUI and client command settings have been instantiated by the audio engine.

A test set of Avade commands has been implemented. Additional commands can be easily added. Wave files can be played and stopped from the client. TTS speech can be generated on demand and placed in queues based on priority. All higher priority speech plays before lower priority speech. Lower priority speech is interrupted to allow for an incoming higher priority message. The lower priority speech is then played again from the beginning.

DIS (Distributed Interactive Simulation) Radio communications (IEEE standard 1278) can be streamed over network connections. Client communications can occur via the included DISTrans and DIS_Radio applications or other DIS radio transmitters. These communications include a frequency that can be used to specify individual communication channels. Thus, one can spatialize different communication channels to different positions in auditory space.

For audio output, Avade supports three device driver interfaces: Waveform-Audio, DirectSound, and ASIO. Waveform-Audio has the highest latency but is supported by all Windows audio hardware. DirectSound is similar to Waveform-Audio but can provide lower latency. ASIO has low latency and flexible configuration options, but is supported by a smaller set of semi-pro and pro audio hardware.
AvADE Commands
The AvADE server responds to the following string commands sent by a TCP/IP client. The AvadeClient utility can be used to test and demonstrate the use of these commands. These commands are also documented on the Avade Help tab.

play( sourceName )
play() rewinds and plays the stream attached to a sound source.

playstream( sourceName, streamName )
playstream() attaches, rewinds, and plays a sound source stream.

stop( sourceName )
stop() stops the stream attached to a sound source.
locate( sourceName, az_degrees, el_degrees, r_meters )
locate() positions a sound source using polar coordinates.
Note: The new position overrides the position specified on the Sources tab.

rel( sourceName, az_degrees, el_degrees, r_meters )
rel() positions a sound source relative to the listener. If the listener is moved, the source moves as well.
Notes: The new position overrides the position specified on the Sources tab. The listener is presently stationary; this command is for potential future functionality.

pan( sourceName, pan_value )
pan() positions a source via left-right gain scalars.
pan_value range: 0.0 = monotic-left, 0.5 = diotic, 1.0 = monotic-right
Note: The new position overrides the position specified on the Sources tab.

speak( sourceName, tts_text, priority )
speak() speaks the text specified in tts_text using the source specified by sourceName. tts_text quotation marks are optional. Priority is specified 1 (highest) to 5 (lowest).

sonify( sourceName, sonified_value )
sonify() sonifies the value passed in sonified_value (typically a float). The sonification depends on the Sonifier selected for the source (Sources tab). The included sonifiers are more for proof of concept than actual use. They both work with the Spatial Render Plugin. Sonifiers synchronize sound source and signal generator behavior based on sonification parameters. They provide a way to extend slab3d at the C# level (SlabSharp).

mixgain( sourceName, g1, g2, g3, g4 )
mixgain() adjusts the ASIO output channel 1-4 gain scalars for sourceName.

mixhrtf( sourceName, enable, az, el, gain )
mixhrtf() enables/disables HRTF processing using the azimuth, elevation, and gain specified for sourceName. The L/R binaural stream is output on channels 1 and 2 (mixed with mixgain() streams, if any).
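For example, a client session might send the following command strings (the source names "chime" and "tts1" are hypothetical; sources are defined on the Sources tab):

locate( chime, 45.0, 0.0, 2.0 )
play( chime )
speak( tts1, "Altitude one thousand", 1 )
stop( chime )

Here, locate() places the source 45 degrees to the right at ear level and 2 meters range, play() rewinds and plays its attached stream, speak() queues a highest-priority TTS message on a second source, and stop() stops the first source's stream.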
AvADE Software Architecture
Appendix

Latency Analysis
Several rounds of slab3d latency analysis have been performed:

2003
J.D. Miller, M.R. Anderson, E.M. Wenzel, and B.U. McClain, “Latency Measurement of a Real-Time Virtual Acoustic Environment Rendering System,” Proc. 2003 ICAD, Boston, MA, July 2003. Paper Poster
2005
The white paper included below describes an ASIO-based audio latency analysis and reduction effort performed in 2005.
2012
A C# audio-visual latency analysis was performed in 2012. The corresponding technote "tn_av_latency_2012.docx" is included with the slab3d documentation.
2014
A quick Windows 7 and XNA4 investigation was added to the 2012 technote.
2014
Two ASIO audio peripherals were tested for use with the AvADE auditory server. Their latency values are included in the table below.
2014
A Windows 7 and XNA4 latency analysis was performed using the Xbox controller. This effort also investigated XNA and Windows timing. The results are discussed in technote "tn_av_latency_2014.docx".
2015
DIS Radio and VoIP latencies were measured. The results are tabulated below.
AvADE ASIO Audio Peripheral Latency Comparison

Columns: ASIO driver buffer size setting*; AvADE "Get ASIO Details" values (samples unless stated): min, max, and preferred buffer size, granularity, selected buffer size, input latency, output latency; estimated API, mean API, and I/O latency (ms).

MOTU PCI ASIO
setting*     min   max    pref   gran  selected  in lat  out lat  est API (ms)  est mean API (ms)  est I/O (ms)
64           64    64     64     0     64        86      87       3-5           4                  5
128          128   128    128    0     128       150     151      6-9           8                  10
256          256   256    256    0     256       278     279      12-18         15                 18

Focusrite USB 2.0 Audio Driver
setting*     min   max    pref   gran  selected  in lat  out lat  est API (ms)  est mean API (ms)  est I/O (ms)
89 (2.0ms)   89    1024   89     1     96*       232     328      10-12         11                 15
133 (3.0ms)  133   1024   133    1     160*      360     520      15-19         17                 24
* For the MOTU PCIe/24I/O, buffer sizes are set via the MOTU PCI Audio Console app and the Samples Per Buffer setting. For the Focusrite Scarlett 18i20, buffer sizes are set via the MixControl app and the ASIO Buffer Size setting (in ms). slab3d requires the buffer size to be a multiple of the audio processing frame size (32 samples).
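For example, the selected buffer sizes marked with an asterisk above (96 and 160 samples) are the preferred driver sizes (89 and 133 samples) rounded up to satisfy the 32-sample frame-size constraint. The helper below is a hypothetical illustration of that rounding, not slab3d's actual buffer-selection code.

    // Hypothetical illustration of rounding a driver's preferred buffer size up
    // to a multiple of the 32-sample slab3d processing frame (89 -> 96, 133 -> 160).
    using System;

    class BufferSizeDemo
    {
        static int RoundUpToFrame(int preferredSize, int frameSize = 32)
        {
            return ((preferredSize + frameSize - 1) / frameSize) * frameSize;
        }

        static void Main()
        {
            Console.WriteLine(RoundUpToFrame(89));    // 96
            Console.WriteLine(RoundUpToFrame(133));   // 160
        }
    }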
DIS Radio and VoIP Latency (1/20/2015)

There are two SRAPI DIS Radio APIs:
1) 2009/JDM = Joel Miller's DIS API developed circa 2009 using the AFRL-provided slabDISInterface library. Stream Types: StreamInDISRadioRT, StreamInDISFreq
2) RT_DIS/JMS = John Stewart's Real-Time DIS API integrated late 2013, built using John's rtdis library. Stream Type: StreamInDISRT

Latency tests were performed with the apps DISTrans (ASIO) and SLABCall (Waveform-Audio) transmitting to Netcom (ASIO) on a single PC. Headset mic input was split to left-audio-in on a second PC. Netcom output was sent to right-audio-in on the second PC. All sample rates were 44,100 Hz. The DISTrans/SLABCall/Netcom PC used a Focusrite Scarlett 18i20 (2ms ASIO buffer size) as its audio I/O device. The measurement PC used a MOTU PCIe 24I/O. Stream types are shown as defined in the slabwire file streams.h and the slabsharp file SlabSharp.cs. In SRAPI, these map to corresponding SiAlloc*() functions and settings.
Stream Type              Latency      Settings
StreamInDISRadioRT       33 ms        10ms pre-buffer
StreamInDISFreq          34 ms        10ms pre-buffer
StreamInDISRT            27 ms        10ms pre-buffer
StreamInVoipPort         134 ms
StreamInVoipID           98 ms
Scarlett in-to-mix-out   28 samples
Note: The stream type StreamInDISRadio was removed from SRAPI in slab3d v6.7.4 to simplify the 2009 API. Its only advantage was non-real-time output support (e.g., output to a file alone). Sound device output can be split to a file to provide similar functionality. StreamInDISRadio latency was measured to be 636 ms using a 600 ms pre-buffer (it was a legacy algorithm that required relatively large buffers; StreamInDISRadio was replaced by StreamInDISRadioRT).
Latency Reduction Techniques for Software-Based Real-Time Auditory Displays

Abstract
Much of the latency and latency variability present in a software-based real-time auditory display is due to sound buffer management. Techniques for reducing this latency and variability are discussed for the following application scenarios: a dynamically updated auditory display, an event-triggered auditory cue, multi-modal display synchronization, and input-to-output audio processing. Steinberg’s ASIO (Audio Streaming Input Output) cross-platform sound driver model was used to tailor solutions for each of these scenarios.

1. Introduction
Two latency components of interest to the audio software developer are API latency and I/O (input-to-output) latency. An API (application programming interface) is the interface used to interact with a software component. Thus, API latency refers to the time lag between an API action and the audio output result of that action. For cases where the input is read, processed, and written to the output, I/O latency refers to the time lag between audio input and audio output. The minimization of API latency and I/O latency is the focus of the current work.

2. Latency Scenarios
A variety of latency constraints can be placed on the design of a software-based real-time auditory display. For example, one might require latency variability to be at an absolute minimum while allowing for a slight increase in overall latency. These constraints would be useful for synchronizing the audio and visual components of a multimodal display (provided the visual display met the same criteria). The following goals for four application scenarios were set:

1) Reduction of API latency and latency variability in a dynamically updated auditory display
2) Minimum API latency and latency variability for event-triggered auditory cue playback
3) Minimum API latency variability for multi-modal display synchronization
4) Minimum input-to-output sound processing latency and latency variability
As suggested above, these constraints are not always compatible, e.g., reduction of latency variability may lead to an increase in average latency. In fact, this was the case in meeting the criteria for (3). Thus, multiple modes of operation may have to be supported by the auditory display in order to meet different latency constraints. Below, the four latency scenarios are described in more detail.

2.1. Dynamically Updated Auditory Display
A head-tracked VAE simulation is an example of a dynamically updated auditory display. Acoustic scene parameters (e.g., listener position) are often dependent upon controllers (e.g., head trackers, mice) and/or GUI (graphical user interface) controls (e.g., sliders, buttons). The user is free to interact with the controls at any time. Both latency and latency variability significantly degrade the simulation, thus a reduction in both is desirable. Complicating this effort is the fact that the system is typically under a fairly heavy computational load while rendering the virtual display.

2.2. Event-Triggered Auditory Cue
The event-triggered auditory cue in this instance is a single static auditory cue that must follow a physical event as closely as possible (e.g., an auditory “click” cue corresponding to a keyboard key press). Average latency is the key parameter.

2.3. Multi-Modal Display Synchronization
Often a single virtual environment application will be responsible for synchronizing more than one type of display (e.g., visual and auditory). Ideally, the non-auditory display will have low latency variability and some known rendering latency. In this instance, latency variability is the key parameter, as the control software can lag the display with the shorter latency.
2.4. Input-to-Output Sound Processing
In communication applications and other applications where the sound source is external to the computer, I/O latency is crucial. For example, the sidetone (signal from the operator’s microphone) presented to the operator of a spatial auditory communication device should be closely synchronized to the operator’s speaking voice in order to avoid an annoying echo. The goal in this case was to enable the software implementation of hardware devices such as the Ames Spatial Auditory Display (ASAD) [1].

3. Latency Components
The API and I/O latency components are shown in Figure 1. The API component time intervals are labeled A1, A2, F4, and F5. The input-to-output time intervals are labeled F1 thru F5. When calculating latency for the in-to-out scenario, these time intervals are additive. In the API case, however, the time interval relationships are implementation dependent.

3.1. API Latency
Audio-processing API parameter modification often requires some form of translation before it can be applied to the sound sample stream. For example, API-level sound source movement in a VAE requires a new set of DSP-level HRIR (head-related impulse response) filter taps and time delays. The time required to do this translation is labeled A1 in Figure 1. Once this translation is performed, the new DSP parameters are applied to the sound sample stream. The number of samples processed at a time is referred to as a frame. The time required to process one frame of sample input is labeled A2 in Figure 1. After processing, the samples are transferred to an audio output buffer (length in ms = F4), which is, in turn, given to the audio output driver. The audio driver transfers the samples to the sound peripheral hardware that eventually generates the sound heard. The time required to do this is labeled F5 in Figure 1.

3.2. Input-to-Output Latency
Unlike the discrete nature of an API parameter change, in-to-out sound processing is a continuous process. A sound source external to the computer is connected to a sound peripheral that provides sound samples to software via an audio input driver (time required = F1). The samples are delivered in an audio input buffer that takes F2 milliseconds to fill. Once the application has access to the input samples, it applies a DSP algorithm. Depending on the algorithm, significant latency can be introduced. As a trivial example, consider a delay line that simply delays sample input by 100ms. The latency introduced by the DSP algorithm is labeled F3 in Figure 1. From this point on, the sound path is as described in section 3.1.
[Figure 1 block diagram. API Latency path: API parameter -> API-to-DSP parameter translation (A1) -> DSP frame processing (A2) -> audio output buffer (F4) -> audio output driver (F5) -> audio output. I/O Latency path: audio input -> audio input driver (F1) -> audio input buffer (F2) -> DSP algorithm (F3) -> audio output buffer (F4) -> audio output driver (F5) -> audio output.]

Figure 1. API and I/O Latency Components. A1, A2 and F1-F5 are time intervals impacting latency.
3.3. Audio Driver and Audio Buffer Latency
Although the audio driver latency itself can be minimal, the selected driver model dictates the size of the audio buffers and the type of buffer management possible. The two driver models examined were DirectSound and ASIO. In the previous work [2], DirectSound, a low-latency Microsoft Windows-specific audio driver API, was the interface chosen for audio output buffer management. This was based on the interface options available in 1998. Since then, Steinberg’s ASIO (Audio Streaming Input Output) driver interface has gained in popularity and support. ASIO is an open cross-platform interface that can support much lower latencies than DirectSound. Note, though, most Windows-compatible sound peripherals support DirectSound; typically, only higher-end sound peripherals support ASIO.

3.3.1. DirectSound Latency
Several output-only DirectSound buffer management algorithms were developed. The one with lowest latency employed a single buffer polled at regular time intervals to determine when space was available for sample output (algorithm details in [2]). The end result was a maximum output buffer latency equal to the size of the output buffer used. Unfortunately, the output buffers had to be relatively large to avoid audible artifacts. Buffers used in practice were two-channel interleaved and ranged from 2048 samples (23.2ms) to 4096 samples (46.4ms) (unless otherwise noted, all sample rates are 44,100 samples/s). The DirectSound driver latency was found to be either 2.1ms or 26.3ms depending on driver implementation. Due to the difficulty of implementation relative to ASIO, an input-to-output solution was not implemented using DirectSound (see below).

3.3.2. ASIO Latency
For each audio channel, ASIO uses a traditional double-buffer mechanism: while one buffer is locked for use by the sound peripheral, the other is free for use by the application. The application is notified of a buffer swap via a callback function. The ASIO interface is exceptionally simple and lean, allowing for much smaller buffer sizes than those possible with DirectSound. The selectable ASIO buffer sizes are device-dependent, but buffers as small as 64 samples (1.45ms) are possible. Unlike DirectSound, ASIO buffers are input/output by nature. Input and output buffers are the same size and are made available to the application simultaneously during the buffer swap callback. Thus, low-latency input-to-output solutions readily follow from an output-only implementation. DirectSound, on the other hand, requires much more implementation overhead to implement an in-to-out solution. Further, the larger buffer sizes required for both sound input and sound output result in prohibitively large I/O latencies. The ASIO driver latency is driver-dependent and can be queried from the driver. This allows for fairly predictable latency values given knowledge of the buffer management algorithm.

4. Latency Reduction Results
Given the clear latency advantages of ASIO over DirectSound, ASIO was the driver model selected for satisfying the latency scenarios discussed in section 2. All test results that follow were performed using the following workstation configuration: Windows 2000 (SP4), P4 2.20GHz, 512KB RAM, Delta66 sound card, 64 sample ASIO buffer size (F2 = F4 = 1.45ms). The application SLABLatency was used to measure API latency while rendering one sound source and one listener in an anechoic virtual acoustic environment (frame size = 32 samples, A1 ~= 0.03ms, A2 ~= 0.10ms). Internal timestamps were externalized to a digital storage oscilloscope using the serial port (details in [2]). Latency intervals were measured using oscilloscope cursors with a resolution of 0.04ms. The SLABSound application was used to query the Delta66 ASIO input and output latencies (driver+buffer) using the ASIO function ASIOGetLatencies(). The input latency was reported to be 96 samples (2.18ms, F1+F2, light "CB" bars in Fig.2) and the output latency, 94 samples (2.13ms, F4+F5, dark "CB" bars in Fig.2). All source code required for implementing and measuring the algorithms discussed below is available in the slab3d release [3].
Figure 2. Latency Timing Diagram. CB1 and CB2 represent the ASIO callbacks that occur at times 1.45ms and 2.9ms, respectively. The light bar to the left is ASIO input latency; the dark bar to the right is ASIO output latency. The Frame bars represent the time required to process one frame of input samples.

As described in section 3.3.2, the ASIO driver model uses a double-buffer callback mechanism for accessing sound sample input and output. Callbacks occur at an interval equal to the audio buffer size. The buffer management and frame processing algorithm is as follows: when a callback occurs, read samples from the input buffer, process them, and place the result in an output buffer; process frame size samples at a time and continue processing until the output buffer is full; submit the buffer for output during the next callback. Referring to Figure 2, the input buffer made available at time 1.45ms (CB1) is processed in two frames at time 1.45ms and submitted to the output driver at 2.9ms (CB2). Further details of each scenario’s implementation are discussed in the subsections that follow. Latency measurement results are summarized in Table 1. For each scenario, fifty measurements were taken.
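The following C# sketch models this buffer management and frame processing scheme. It is illustrative only: the class, method names, and float-buffer types are invented for the sketch and do not correspond to the ASIO SDK or the slab3d source.

    // Illustrative model of the double-buffer frame-processing algorithm described
    // above. The buffer processed during one callback is submitted at the next.
    using System;

    class FrameProcessor
    {
        const int FrameSize = 32;        // samples processed per frame
        float[] pending;                 // samples processed during the previous callback

        // Called by a (hypothetical) driver layer at every buffer swap.
        public float[] OnBufferSwap(float[] input)
        {
            // Submit the buffer processed during the previous callback (silence at start-up).
            float[] toOutput = pending ?? new float[input.Length];

            // Process the newly available input one frame at a time until an
            // entire output buffer has been produced; hold it for the next swap.
            float[] produced = new float[input.Length];
            for (int offset = 0; offset < input.Length; offset += FrameSize)
                ProcessFrame(input, produced, offset);
            pending = produced;

            return toOutput;
        }

        void ProcessFrame(float[] input, float[] output, int offset)
        {
            // Placeholder DSP: pass-through. A real renderer would apply HRIR
            // filters, delays, and gains here.
            for (int i = 0; i < FrameSize && offset + i < input.Length; i++)
                output[offset + i] = input[offset + i];
        }

        static void Main()
        {
            var fp = new FrameProcessor();
            float[] asioBuffer = new float[64];                // one 64-sample ASIO buffer
            fp.OnBufferSwap(asioBuffer);                       // CB1: outputs silence, processes input
            float[] outBuffer = fp.OnBufferSwap(asioBuffer);   // CB2: outputs CB1's processed samples
            Console.WriteLine("output samples: " + outBuffer.Length);
        }
    }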
4.1. Dynamically Updated Auditory Display
In general, the timing of a dynamic API update is unconstrained. In practice, however, timing constraints may exist due to thread priorities. Typically, the audio peripheral driver and the frame processing threads run at a higher priority than the API updating thread to ensure that the audio stream is continuous and free of audible artifacts. Several other operating system and application threads may also impact the servicing of the API updating thread. Referring to Figure 2, the frame processing thread constraint limits the region of possible API updates for CB2 output to the span between the API Min and Max square markers. This span is the predicted API latency variability = F4-2*A2 = 1.45ms-2*0.10ms = 1.25ms (Fig.2, "API Var"). This predicted value is a rough estimate since the actual total frame processing time might change over time. For example, when an API update occurs, the API-to-DSP parameter translation time (A1) (Fig.2, dark Frame bar) is added to the frame processing time (A2). An interesting result is that average latency and latency variability actually decrease as frame processing time increases. After the API update occurs, the new DSP parameters are applied to the sound samples made available at the next callback (Fig.2, 1.45ms, CB1). The processed samples are submitted to the ASIO driver at the following callback (Fig.2, 2.9ms, CB2). One ASIO output latency (F4+F5) later, the processed samples are displayed. Thus, the predicted minimum API latency is F4+(F4+F5) = 1.45ms+2.13ms = 3.58ms. The predicted maximum API latency is [F4-2*A2]+[F4+(F4+F5)] = 1.25ms+3.58ms = 4.83ms. As can be seen in Table 1, the measured values compare well with the predicted values.
4.2. Event-Triggered Auditory Cue
When limiting the display requirements to the playback of an event-triggered static auditory cue, one ASIO buffer of API latency can be eliminated. This is because the auditory cue can be pre-processed and copied to an ASIO output buffer immediately after the event occurs without waiting for a callback to initiate frame processing. Further, pre-processing can help enable smaller ASIO buffer sizes by eliminating a competing CPU load during cue playback. Referring to Figure 2, the data is submitted to the audio driver at 1.45ms (CB1) instead of 2.9ms (CB2), resulting in the minimum and maximum API latencies being truncated to the dashed vertical line (CB1 output latency). Since the frame processing no longer happens in real-time, this is removed from the API latency variability calculation, yielding a predicted API latency variability equal to the ASIO buffer size (1.45ms). The measured values compare well with the predicted values; both are shown in Table 1. It is important to note that the lowest possible latency is only possible if the ASIO stream is up and running (playing silence) before the event occurs. Stream initiation was found to add up to 10ms of additional latency.

4.3. Multi-Modal Display Synchronization
To synchronize an auditory display and a non-auditory display, the display update application requires knowledge of the buffer and frame times shown in Figure 2. This can be accomplished using thread synchronization functions (e.g., in Windows, SetEvent(), WaitForSingleObject()). These were used to write two API functions, WaitSuspend() that waits for the frame processing block to suspend waiting for output buffer space, and WaitSwap() that waits for the ASIO buffer swap callback. By placing the auditory display API update between WaitSuspend() and WaitSwap(), one ensures the update occurs during the API Variability region shown in Figure 2. By placing the non-auditory display update just after WaitSwap(), one ensures the audio will lag that update by the minimum API latency. The latency measurements taken using WaitSuspend() and WaitSwap() are shown in Table 1. A significant reduction in latency variability was noticed.
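A possible shape for the resulting update loop is sketched below in C#. WaitSuspend() and WaitSwap() correspond to the API functions described above, but the delegate-based wrapper, the update calls, and the loop structure are assumptions made for the sketch, not the actual SRAPI interface.

    // Illustrative per-frame update loop for audio/visual synchronization.
    // The waitSuspend/waitSwap delegates stand in for the WaitSuspend()/WaitSwap()
    // calls described above; all other names are hypothetical.
    using System;

    class DisplaySync
    {
        public static void Run(
            Action waitSuspend,    // blocks until frame processing suspends for buffer space
            Action waitSwap,       // blocks until the ASIO buffer-swap callback occurs
            Action updateAudio,    // auditory display API update (e.g., source position)
            Action updateVisual,   // non-auditory display update
            Func<bool> running)
        {
            while (running())
            {
                waitSuspend();     // enter the low-variability update window (Fig. 2, "API Var")
                updateAudio();     // update lands between the API Min and Max markers
                waitSwap();        // the buffer swap has just occurred
                updateVisual();    // audio will lag this update by the minimum API latency
            }
        }

        static void Main()
        {
            int frame = 0;
            Run(
                waitSuspend: () => { },    // stubs used only to demonstrate the call ordering
                waitSwap: () => { },
                updateAudio: () => Console.WriteLine("audio update " + frame),
                updateVisual: () => Console.WriteLine("visual update " + frame),
                running: () => ++frame <= 3);
        }
    }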
4.4. Input-to-Output Sound Processing
The predicted I/O latency is calculated by summing the latency components labeled F1-F5 in Figure 1. The VAE rendering scenario used for testing includes 0.19ms of algorithm latency (0.03ms due to source-listener propagation delay and 0.16ms due to the leading seven samples of a 16-pt FIR used for up-sampling audio input). Since the ASIO callback must execute quickly, the DSP Algorithm component is applied between two callbacks, resulting in an additional callback interval (F4) of latency within the DSP Algorithm component (F3 = 0.19ms+F4, DSP Alg in Fig.2). This results in the following predicted I/O latency: [F1+F2]+F3+[F4+F5] = 2.18ms+0.19ms+1.45ms+2.13ms = 5.95ms. A non-spatial Diotic test was also performed that omits the 0.19ms of algorithm latency. Since the components that affect I/O latency run in high-priority threads, no latency variability was noticed in the latency measurements. The measured and predicted latency values are shown in Table 1. This mode was put to the test in the development of HelmetCam [4], a search and rescue communication system that employs a spatial auditory display similar to the hardware-implemented ASAD.

Measured and Predicted Latencies (ms)

                     Dynamic Update         Event Triggered        Display Sync
                     measured   predicted   measured   predicted   measured   predicted
Latency (mean)       4.30       4.31        2.81       2.86        3.43       3.58
Variability (span)   1.20       1.25        1.40       1.45        0.08       0.00
max                  4.88       4.83        3.64       3.58        3.44       3.58
min                  3.68       3.58        2.24       2.13        3.36       3.58
standard dev         0.41       -           0.39       -           0.02       -

                     Input-to-Output
                     measured   predicted
Latency (spatial)    6.08       5.95
Latency (diotic)     5.88       5.76
Table 1. Latency Measurement Results. The first column under each heading lists the measured values in milliseconds, the second column, the predicted values in milliseconds. N = 50.
5. Conclusions and Future Work
Latency and latency variability reduction techniques were developed and described for the following four application scenarios: a dynamically updated auditory display, an event-triggered auditory cue, multi-modal display synchronization, and input-to-output audio processing. Latency measurements were performed for each technique and compared to predicted values, matching within a fraction of a millisecond in each case. To reduce latency even further, future work may include the support of sample rates above 44,100Hz. Preliminary tests revealed that ASIO buffer sizes as short as 0.67ms were possible using a sample rate of 96,000Hz.

6. References
[1] D.R. Begault, “3-D Sound for Virtual Reality and Multimedia,” pp. 230-234.
[2] J.D. Miller, M.R. Anderson, E.M. Wenzel, and B.U. McClain, “Latency Measurement of a Real-Time Virtual Acoustic Environment Rendering System,” Proc. 2003 ICAD, Boston, MA, July 2003.
[3] http://slab3d.sonisphere.com
[4] D.R. Begault, M.R. Anderson, B.U. McClain, J.D. Miller, and E.M. Wenzel, “Audio-Visual Communication Monitoring System for Enhanced Situational Awareness,” Working Together: R&D Partnerships in Homeland Security Conference, Boston, MA, 27-28 April 2005.
Acknowledgements: Thanks to Mark Anderson for his suggestions and feedback regarding the techniques discussed in this paper.