An empirical evaluation of a two-dimensional second-order sound field recording and reproduction system
Abhaya Parthy, Craig Jin and André van Schaik
School of Information Technologies The University of Sydney, Australia
[email protected]
School of Electrical and Information Engineering The University of Sydney, Australia {craig,andre}@ee.usyd.edu.au
ABSTRACT
We present an empirical evaluation of a two-dimensional second-order sound field recording and reproduction system which has been designed to operate over a frequency range of 50 Hz to 16 kHz. The system has been built and tested, and we present measurements comparing its performance to that of an ideal two-dimensional second-order sound field recording and reproduction system.

1. INTRODUCTION
A system for recording and reproducing a real sound field is useful in many applications, including audio displays, virtual reality, audio-only gaming, cinema, and auditory research. The most widely used system for recording and reproducing a sound field is the first-order Ambisonics system originally developed by Gerzon [1]. The first-order Ambisonics system uses a sound field microphone to record a sound field to a first-order spherical harmonic representation. This representation is then used to drive an array of loudspeakers to recreate the original sound field, correct to a first-order approximation. The first-order Ambisonics system is in wide use and performs quite well in most applications. However, there is a significant benefit in using higher-order systems, as they have increased spatial fidelity, especially at higher frequencies, and a larger listening area at the centre of the loudspeaker array [2]. Much research has been done on higher-order sound field recording and reproduction systems, including work by Poletti [3], Li et al. [4], Abhayapala and Ward [2, 5], Daniel et al. [6], and Bertet et al. [7].

We have built a second-order sound field recording and reproduction system for sound field research and evaluation, and psychoacoustic testing. The system uses an 8-loudspeaker circular loudspeaker array for reproduction and a 24-microphone circular microphone array for recording.
The circular microphone array has been previously characterised for use as a beamformer, and a detailed analysis of its design and performance is presented in [8]. We present an empirical evaluation of the combined circular microphone array and loudspeaker array system when it is used as a sound field recording and reproduction system. The performance of this system is evaluated by measuring its ability to record and reproduce a sound field generated by a single plane-wave sound source. The circular microphone array is placed at the centre of the circular loudspeaker array and is used to measure a plane-wave sound field generated by the loudspeaker array. The centre of the loudspeaker array is termed the "sweet spot" because it is the position at which the recreated sound field is most accurate. The measured plane-wave sound field is compared to an ideal second-order plane-wave sound field, and the difference is the combined error in the circular microphone array and loudspeaker array, and hence the total error of the sound field recording and reproduction system. This error is equivalent to the error which would be obtained by recording an ideal plane-wave sound field, generated by a single ideal plane-wave source, using the circular microphone array, and then measuring the error in the sound field recreated by the circular loudspeaker array using an ideal sound field measurement device. In other words, since the system is linear, the system error due to recording and then recreating a sound field is identical to the system error if we reverse the process and first recreate and then record the sound field.

The main contribution of this paper is that we show that the existing theoretical approaches can be translated into the realisation of a practical two-dimensional second-order sound field recording and reproduction system, and we present empirical measurements comparing the performance of the system to an ideal second-order sound field recording and reproduction system.

978-1-4244-1724-7/08/$25.00 © 2008 IEEE

2. METHODS
The performance of the sound field recording and reproduction system we have constructed was evaluated by measuring its ability to record and reproduce a sound field generated by
Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on January 28, 2010 at 00:01 from IEEE Xplore. Restrictions apply.
ICALIP2008
a single plane-wave sound source. The circular microphone array was placed at the centre of the circular loudspeaker array, and impulse responses were measured from each loudspeaker on the circular loudspeaker array to each microphone on the circular microphone array. The recorded impulse responses were then used to simulate the signals which would have been recorded by the circular microphone array when the circular loudspeaker array is recreating an ideal plane-wave sound field generated by one source. The error calculated from this analysis is the combined error of the circular loudspeaker array and the circular microphone array, and hence the error of the entire sound field recording and reproduction system.

2.1. System and Equipment Setup
The circular microphone array is shown in Figure 1, and the experimental setup is shown in Figure 2. The circular microphone array is baffled by a rigid cylinder which has a radius of 5.70 cm and a length of 30.0 cm. The array has 24 DPA type 4060-BM omnidirectional microphones mounted equally spaced around its central circumference, and when used to perform a second-order decomposition of the sound field, it has a signal-to-noise ratio greater than 30.0 dB over its entire operating frequency range of 50 Hz to 16 kHz. We did not apply additional filtering to compensate or flatten the response of the 24 microphones in the circular microphone array; however, they are all of an identical model and of high quality, so they have very similar frequency responses. The signals from the 24 microphones are amplified by three 8-channel Digidesign PRE preamplifiers, which have digitally controlled gains so that all channels are uniformly amplified. The outputs of the preamplifiers are fed to two 16-channel Apogee AD-16X analogue-to-digital converters, which sample the signals at 48 kHz with 24-bit resolution. The output from the analogue-to-digital converters is in ADAT format and is sent to an RME ADI-648 MADI-to-ADAT converter, which converts the three 8-channel ADAT signals to one MADI-format signal that carries all 24 channels of audio. The 24 microphone signals, in MADI format, are recorded on a standard personal computer (PC) using an RME Hammerfall DSP MADI PCI sound card.

The PC is able to process the 24 microphone signals in real time to create 8 loudspeaker feeds using a custom VST plug-in that has been optimised for fast and efficient filtering of multiple signals in real time. The 8 loudspeaker signals are output from the PC's RME Hammerfall DSP MADI PCI sound card in MADI format, and are converted into ADAT format using an RME ADI-648 MADI-to-ADAT converter. The ADAT-format loudspeaker signals have a sample rate of 48 kHz with 24-bit resolution, and are fed to a 16-channel Apogee DA-16X digital-to-analogue converter. The 8 analogue signals from the digital-to-analogue converter are fed to two 6-channel Ashley Powerlex 6250 power amplifiers, which power 8 uncalibrated Tannoy V6 loudspeakers.

The 8 loudspeakers are mounted equally spaced around a rigid circular aluminium ring with a radius of 2.8 m, supported 1.0 m above the ground, to form the circular loudspeaker array. The circular loudspeaker array is situated inside a hemi-anechoic room, which has walls and a ceiling that are anechoic down to 100 Hz, and a floor which has thick carpet and underlay to minimise reflections from it. The aluminium ring reflects sound and does cause some audible distortion to the sound field within the circular loudspeaker array. However, this distortion is minimal, and will be avoided once the ring is padded with sound-absorbent material.

Fig. 1. This figure shows a photo of the 24-microphone circular microphone array.

2.2. Sound Field Signal Processing
We assume that the circular microphone array records a sound field in which all the sound sources are in the far field, and we assume that the loudspeakers are ideal plane-wave sources. To recreate the sound field recorded by the circular microphone array, the loudspeakers are fed signals which are obtained by processing the signals recorded by the microphones on the circular microphone array. This can be written as a matrix equation and is given by [2, 4, 8]

$$P a = c\, b, \tag{1}$$

where $c$ is a constant, $a$ is a vector of unknown weights to be assigned to each loudspeaker, and $P$ is the loudspeaker decoding matrix, which for a second-order system is given by

$$P = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
\sqrt{2}\cos(\theta_1) & \sqrt{2}\cos(\theta_2) & \cdots & \sqrt{2}\cos(\theta_L) \\
\sqrt{2}\sin(\theta_1) & \sqrt{2}\sin(\theta_2) & \cdots & \sqrt{2}\sin(\theta_L) \\
\sqrt{2}\cos(2\theta_1) & \sqrt{2}\cos(2\theta_2) & \cdots & \sqrt{2}\cos(2\theta_L) \\
\sqrt{2}\sin(2\theta_1) & \sqrt{2}\sin(2\theta_2) & \cdots & \sqrt{2}\sin(2\theta_L)
\end{bmatrix}, \tag{2}$$

where $\theta_l = \frac{2\pi}{L} l$ is the angular position of loudspeaker $l$ in the circular loudspeaker array, $L$ is the number of loudspeakers in the circular loudspeaker array, and $b$ is the circular harmonic decomposition of the sound field recorded by the circular microphone array, which for a second-order system is given by

$$b = \begin{bmatrix}
\frac{2\pi}{M}\sum_{j=1}^{M} p(kr,\phi_j)\,\frac{1}{2\pi B_0(kr)} \\
\frac{2\pi}{M}\sum_{j=1}^{M} p(kr,\phi_j)\,\frac{\cos(\phi_j)}{\pi B_1(kr)} \\
\frac{2\pi}{M}\sum_{j=1}^{M} p(kr,\phi_j)\,\frac{\sin(\phi_j)}{\pi B_1(kr)} \\
\frac{2\pi}{M}\sum_{j=1}^{M} p(kr,\phi_j)\,\frac{\cos(2\phi_j)}{\pi B_2(kr)} \\
\frac{2\pi}{M}\sum_{j=1}^{M} p(kr,\phi_j)\,\frac{\sin(2\phi_j)}{\pi B_2(kr)}
\end{bmatrix}, \tag{3}$$

where $k = \frac{2\pi}{\lambda}$ is the wave number, $\lambda$ is the wavelength, $r$ is the radius of the circular microphone array, $\phi_j = \frac{2\pi}{M} j$ is the angular position of microphone $j$ on the circular microphone array, $M$ is the number of microphones in the circular microphone array, $p(kr, \phi_j)$ is the pressure recorded by the microphone at position $\phi_j$ on the circular microphone array, and $B_n(kr)$ is defined as

$$B_n(kr) = \epsilon_n\, i^n \left( J_n(kr) - \frac{J_n'(kr)}{H_n'(kr)}\, H_n(kr) \right), \tag{4}$$

where $J_n(x)$, $J_n'(x)$, $H_n(x)$, and $H_n'(x)$ are the Bessel and Hankel functions of $x$ and their derivatives, respectively, $i = \sqrt{-1}$, and $\epsilon_n$ is a Neumann symbol defined as $\epsilon_0 = 1$ and $\epsilon_n = 2$ for $n \geq 1$.

Fig. 2. This figure shows a photo of the experimental setup. The circular microphone array is at the centre of the circular loudspeaker array, inside the hemi-anechoic room.

The sound field reproduced by the loudspeaker array can be optimised depending on the listening requirements [6]. For small-area listening at the centre of the loudspeaker array, frequencies below 700 Hz will be processed using (1) to preserve the phase information of the sound field. Frequencies above 700 Hz will be processed so that the magnitude of the energy
vector is maximised by modifying (1) so that

$$P a = c\, b \odot g, \tag{5}$$

where $\odot$ represents an element-by-element multiplication and

$$g = \begin{bmatrix} 1 & \frac{\sqrt{3}}{2} & \frac{\sqrt{3}}{2} & \frac{1}{2} & \frac{1}{2} \end{bmatrix}^T. \tag{6}$$

We define this decoding scheme as the standard decoding scheme for our system. For large-area listening, where listeners may no longer be located at the centre of the loudspeaker array, in-phase decoding will be used for all frequencies so that no loudspeakers are driven out of phase. This is represented mathematically using (5), where

$$g = \begin{bmatrix} 1 & \frac{2}{3} & \frac{2}{3} & \frac{1}{6} & \frac{1}{6} \end{bmatrix}^T. \tag{7}$$
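The decoding equations (2) and (5) to (7) can be illustrated with a short numerical sketch. It is not the authors' implementation. It assumes that the underdetermined system is solved for its minimum-norm solution, which the text does not state, and it uses the fact that for equally spaced loudspeakers $P P^T = L I$, so the pseudo-inverse reduces to $P^T / L$. The function names, the unweighted gain vector, and the example vector `b` (including its normalisation) are hypothetical and chosen only for illustration.

```python
import math

NUM_SPEAKERS = 8  # loudspeakers in the circular array, as stated in the text


def decoding_matrix(num_speakers):
    """Build the 5 x L second-order decoding matrix P of equation (2)."""
    thetas = [2.0 * math.pi * l / num_speakers for l in range(1, num_speakers + 1)]
    root2 = math.sqrt(2.0)
    return [
        [1.0 for _ in thetas],
        [root2 * math.cos(t) for t in thetas],
        [root2 * math.sin(t) for t in thetas],
        [root2 * math.cos(2.0 * t) for t in thetas],
        [root2 * math.sin(2.0 * t) for t in thetas],
    ]


# Per-harmonic gains g: no weighting, equation (6), and equation (7).
G_UNWEIGHTED = [1.0, 1.0, 1.0, 1.0, 1.0]
G_ENERGY_VECTOR = [1.0, math.sqrt(3.0) / 2.0, math.sqrt(3.0) / 2.0, 0.5, 0.5]
G_IN_PHASE = [1.0, 2.0 / 3.0, 2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0]


def loudspeaker_weights(P, b, g, c=1.0):
    """Minimum-norm solution a of P a = c (b * g), element-wise product.

    Assumption (not stated in the paper): the loudspeakers are equally
    spaced, so P P^T = L I and the pseudo-inverse of P is P^T / L.
    """
    num_speakers = len(P[0])
    rhs = [c * b_n * g_n for b_n, g_n in zip(b, g)]
    return [
        sum(P[row][col] * rhs[row] for row in range(len(P))) / num_speakers
        for col in range(num_speakers)
    ]


def apply_matrix(P, a):
    """Compute P a, used here only to check the solution."""
    return [sum(p * w for p, w in zip(row, a)) for row in P]


# Hypothetical harmonic coefficients for a unit source at 10 degrees azimuth.
# The normalisation of b is an assumption made for illustration only.
azimuth = math.radians(10.0)
b = [1.0, math.cos(azimuth), math.sin(azimuth),
     math.cos(2.0 * azimuth), math.sin(2.0 * azimuth)]

P = decoding_matrix(NUM_SPEAKERS)
a_unweighted = loudspeaker_weights(P, b, G_UNWEIGHTED)
a_energy = loudspeaker_weights(P, b, G_ENERGY_VECTOR)
a_in_phase = loudspeaker_weights(P, b, G_IN_PHASE)

for name, weights in (("unweighted", a_unweighted),
                      ("energy-vector", a_energy),
                      ("in-phase", a_in_phase)):
    print(name, [round(w, 3) for w in weights])
```

Under these assumptions, the in-phase gains of (7) give loudspeaker weights that all share the same sign, whereas the unweighted solution drives some loudspeakers with negative weights. This matches the stated purpose of in-phase decoding.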
f (Hz)    Radius (m)    Error (%)
50        14            17.5
150       4.6           14.2
400       1.7           5.54
700       0.98          5.13
1500      0.46          2.62
4000      0.17          7.16
9000      0.076         4.69
16000     0.043         25.5

Table 1. This table shows the RMS error between the measured sound field and the theoretical second-order sound field, expressed as a percentage of the RMS intensity of the theoretical sound field. The error is calculated at each frequency indicated, for a circular area with the given radius, centred at the centre of the loudspeaker array.
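The radii in Table 1 can be cross-checked against the statement in the results that the evaluation area has a radius of two wavelengths. The sketch below assumes a speed of sound of 343 m/s, which the paper does not state, and the names `SPEED_OF_SOUND`, `TABLE_1`, and `radius_two_wavelengths` are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the paper

# (frequency in Hz, radius in m, error in %) copied from Table 1.
TABLE_1 = [
    (50, 14, 17.5),
    (150, 4.6, 14.2),
    (400, 1.7, 5.54),
    (700, 0.98, 5.13),
    (1500, 0.46, 2.62),
    (4000, 0.17, 7.16),
    (9000, 0.076, 4.69),
    (16000, 0.043, 25.5),
]


def radius_two_wavelengths(frequency_hz):
    """Radius of a circular area spanning two wavelengths from its centre."""
    return 2.0 * SPEED_OF_SOUND / frequency_hz


for frequency, table_radius, _error in TABLE_1:
    computed = radius_two_wavelengths(frequency)
    print(f"{frequency:>6} Hz  table {table_radius:<6} m  computed {computed:.3g} m")
```

With this assumed speed of sound, each computed radius agrees with the tabulated value to within a few per cent, which is consistent with the table entries being rounded to two significant figures.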
3. RESULTS
The sound field recording and reproduction system has been tested at a number of frequencies of interest, and the recorded sound field is compared to that of an ideal two-dimensional second-order sound field recording and reproduction system which has no error. The incoming wave direction for the tests is at an azimuth of 10°. Due to the nature of the loudspeaker matrix decoding employed, if the incoming wave direction corresponds to the direction of a loudspeaker then only that loudspeaker will have a non-zero weight, i.e., only one loudspeaker will be producing a signal to generate the sound field, which is, theoretically and intuitively, the optimal decoding. An incoming wave direction of 10° does not correspond to a loudspeaker direction, and thus all the loudspeakers will have non-zero weights, i.e., all the loudspeakers will be producing a signal and contributing to the generated sound field. This is the typical case when a real recorded sound field is being recreated, and the error in a recreated sound field with all the loudspeakers contributing will be higher than that in a sound field generated by a single loudspeaker.

Table 1 shows the error in the measured sound field against frequency for a circular area, with a radius of 2 wavelengths, centred at the centre of the circular loudspeaker array. The error in the sound field has been calculated for the standard decoding scheme as an RMS error between the ideal second-order sound field and the measured sound field, and has been expressed as a percentage of the RMS intensity of the ideal second-order sound field. Figure 3 shows the ideal and recorded sound fields for the sound field recording and reproduction system at the frequencies indicated when both the standard decoding scheme and the in-phase decoding scheme are used to recreate the sound field. It should be noted that the term ideal is used to refer to an ideal second-order standard or in-phase decoding of a plane-wave sound field and not an ideal plane-wave sound field.

The results show that the error in the measured sound field is less than approximately 7 per cent for frequencies between 400 Hz and 9.0 kHz, and that at the lower and upper limits of the frequency range the error is approximately 18 per cent and 26 per cent, respectively. These results are in agreement with informal listening tests which were performed by the authors and other subjects. The consensus among expert listeners who have used this system to listen to various sound field recordings, including a concert hall recording and a meeting room recording, is that the system sounds and performs better than a two-dimensional first-order Ambisonics system because it has noticeably greater spatial fidelity.
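The error measure described in this section can be sketched as follows. This is one plausible reading, not the authors' code: it assumes the RMS is taken over complex pressure samples on a regular grid inside the circular evaluation area. The helper names, the grid spacing, the speed of sound, the use of a plain plane wave as the reference field, and the 5 per cent gain error are all hypothetical; no measured data are used.

```python
import cmath
import math


def rms(values):
    """Root-mean-square magnitude of a list of (complex) pressure samples."""
    return math.sqrt(sum(abs(v) ** 2 for v in values) / len(values))


def rms_error_percent(measured, ideal):
    """RMS of (measured - ideal) as a percentage of the RMS of the ideal field."""
    difference = [m - i for m, i in zip(measured, ideal)]
    return 100.0 * rms(difference) / rms(ideal)


def points_in_disc(radius, step):
    """Regular grid points (x, y) lying inside a disc centred on the origin."""
    n = int(radius / step)
    return [
        (ix * step, iy * step)
        for ix in range(-n, n + 1)
        for iy in range(-n, n + 1)
        if math.hypot(ix * step, iy * step) <= radius
    ]


def plane_wave(points, frequency_hz, azimuth_rad, speed_of_sound=343.0):
    """Unit-amplitude plane-wave pressure at each point (sign convention assumed)."""
    k = 2.0 * math.pi * frequency_hz / speed_of_sound
    return [
        cmath.exp(1j * k * (x * math.cos(azimuth_rad) + y * math.sin(azimuth_rad)))
        for x, y in points
    ]


# Evaluation area for 700 Hz from Table 1 (radius 0.98 m); 5 cm grid is arbitrary.
grid = points_in_disc(radius=0.98, step=0.05)
ideal = plane_wave(grid, 700.0, math.radians(10.0))
measured = [1.05 * p for p in ideal]  # hypothetical 5 % gain error, not measured data

error = rms_error_percent(measured, ideal)
print(f"RMS error: {error:.2f} %")
```

Because the synthetic "measured" field differs from the reference only by a uniform gain, the metric reduces to that gain error, which is a quick sanity check on the definition before applying it to real measurements.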
4. DISCUSSION
Past research into higher-order sound field recording and reproduction, including work by Poletti, Li et al., Abhayapala and Ward, and Daniel et al., has focused on theoretical evaluations of such systems and has not provided empirical evaluations. Bertet et al. presented a perceptual evaluation of a two-dimensional sound field recording; however, no measurements were made of the sound field reproduced by the system. The results presented here form an initial empirical evaluation of a sound field recording and reproduction system. A comparison of standard and in-phase decoding is presented, and it can be seen that standard decoding approximates a plane-wave sound field more accurately at the centre of the loudspeaker array. Although these results are initial, and improvements to the system will be made, the performance of the system is quite promising, and already quite close to that of an ideal second-order sound field recording and reproduction system. This evaluation shows that the existing theory can be used to build a practical second-order sound field recording and reproduction system that is suitable for use in most audio applications.
[Fig. 3 image: sixteen sound field plots, panels (a) to (p), with both axes in metres. Axis ticks span approximately ±4 m at 150 Hz, ±0.5 m at 700 Hz, ±0.15 m at 4000 Hz, and ±0.04 m at 16000 Hz.]
(a) 150 Hz (Measured)  (b) 150 Hz (Ideal)  (c) 150 Hz (Measured, In-phase)  (d) 150 Hz (Ideal, In-phase)
(e) 700 Hz (Measured)  (f) 700 Hz (Ideal)  (g) 700 Hz (Measured, In-phase)  (h) 700 Hz (Ideal, In-phase)
(i) 4000 Hz (Measured)  (j) 4000 Hz (Ideal)  (k) 4000 Hz (Measured, In-phase)  (l) 4000 Hz (Ideal, In-phase)
(m) 16000 Hz (Measured)  (n) 16000 Hz (Ideal)  (o) 16000 Hz (Measured, In-phase)  (p) 16000 Hz (Ideal, In-phase)
Fig. 3. This figure shows, for the indicated frequencies, a comparison of the measured sound field and the ideal second-order sound field for both the standard decoding scheme and the in-phase decoding scheme. The lighter colour denotes a high pressure and the darker colour denotes a lower pressure. Note that the size of the area plotted is inversely proportional to the frequency, and thus becomes smaller with increasing frequency.
5. CONCLUSION
We have presented an initial empirical evaluation of a two-dimensional second-order sound field recording and reproduction system which has been designed to operate over a frequency range of 50 Hz to 16 kHz. The performance of the sound field recording and reproduction system at the central listening point, from our initial findings, is very close to that of an ideal two-dimensional second-order sound field recording and reproduction system. Future work is planned to calibrate the loudspeakers in the circular loudspeaker array and the microphones in the circular microphone array, to measure the sound field at a number of locations within the circular loudspeaker array, and to perform perceptual listening tests.

6. REFERENCES
[1] M.A. Gerzon, "Periphony: With-height sound reproduction," J. Audio Eng. Soc., vol. 21, no. 1, pp. 2–10, 1973.
[2] D.B. Ward and T.D. Abhayapala, "Reproduction of a plane-wave sound field using an array of loudspeakers," IEEE Trans. on Speech and Audio Proc., vol. 9, no. 6, pp. 697–707, 2001.
[3] M.A. Poletti, "A unified theory of horizontal holographic sound systems," J. Audio Eng. Soc., vol. 48, no. 12, pp. 1155–1182, Dec. 2000.
[4] Z. Li, R. Duraiswami, and L.S. Davis, "Recording and reproducing high order surround auditory scenes for mixed and augmented reality," in Proc. Third IEEE and ACM Int. Symposium on Mixed and Augmented Reality, 2004, pp. 240–249.
[5] T.D. Abhayapala and D.B. Ward, "Theory and design of high order sound field microphones using spherical microphone array," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Proc., 2002, vol. 2, pp. 1949–1952.
[6] J. Daniel, J. Rault, and J. Polack, "Ambisonics encoding of other audio formats for multiple listening conditions," in 105th AES Convention, San Francisco, CA, USA, 1998.
[7] S. Bertet, J. Daniel, E. Parizet, L. Gros, and O. Warusfel, "Investigation of the perceived spatial resolution of higher order ambisonic sound fields: A subjective evaluation involving virtual and real 3D microphones," in Proc. 30th AES Int. Conf., Saariselkä, Finland, 2007.
[8] A. Parthy, C. Jin, and A. van Schaik, "Measured and theoretical performance comparison of a broadband circular microphone array," in Proc. 31st AES Int. Conf., London, UK, 2007.