Noise Cancelling Microphones For Automatic Speech Recognition

Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

Noise cancelling microphones for automatic speech recognition
H. Sohlström

journal: STL-QPSR
volume: 19
number: 4
year: 1978
pages: 030-038
http://www.speech.kth.se/qpsr

STL-QPSR 4/1978

III. SPEECH RECOGNITION

A. NOISE CANCELLING MICROPHONES FOR AUTOMATIC SPEECH RECOGNITION *

H. Sohlström

Abstract

Automatic speech recognition as well as man-to-man communications in noisy environments require noise cancelling microphones. A number of such microphones are studied. Special attention is given to a contact microphone. The test procedure is described and the results are discussed. The contact microphone is found to give better sound quality than expected.

Introduction

Automatic Speech Recognition systems are leaving the laboratory stage. Several systems are today commercially available (6). If a system is to be useful its operation must be unaffected by background noise. This can be achieved in three ways. The first, and best, way is to reduce the noise level. The second is to use a noise cancelling microphone. The third is to extract the phonetic information from the waveform in a way that makes the system immune to noise (2).

Noise cancelling microphones have been used for a long time in man-to-man communications. The situation is, however, somewhat different in speech recognition systems depending on the special set of parameters adopted (7).

In the present study several noise cancelling microphones were tried both for man-to-man communication and speech recognition. One of the microphones was of the contact type, i.e. it was to be fixed upon the speaker in order to pick up vibrations rather than sound. This type of microphone is used in very noisy environments, for example in aircraft. Because of the different principle used in this microphone, special interest was paid to it.
The speech recognition system was a phonetically oriented system developed by Mats Blomberg and Kjell Elenius. A short description of the system is given in Blomberg and Elenius, 1978 (1).

* Thesis work 1977 under supervision of Mats Blomberg and Kjell Elenius.

[...] of measurements the microphones were tested in the way they are designed to work: at the right distance and with speech.

Speech is by no means a regular source of sound. To allow correct comparisons we made simultaneous recordings from two microphones of a person reading a short text. The recordings were made in an anechoic room. One of the microphones was a pressure sensitive dynamic microphone from Sennheiser, MD 211. This microphone was the "reference" high quality microphone with which all the others were compared. The other recording was made with the microphone under test. The average amplitude distribution as a function of frequency for the two recordings was then computed. This was done with our CD 1700 computer and a 51-channel spectrum analyzer. The differences between the distributions could then be interpreted as deviations from ideal responses.

To permit analysis of the separate speech sounds, as they were transduced by the microphones, recordings of VCV and CVC words were made. Also in this case each microphone was compared to the reference microphone.

For the contact microphone the measurements in the second group proved much more relevant. Its response was very much dependent upon its position on the speaker. Several positions were tried. Two positions were found to be representative, each in its own way. The two positions were on the forehead and on the neck just under the chin and halfway towards the ear, Fig. III-A-1. If the microphone had been positioned closer to the larynx it would only have picked up a signal dominated by the fundamental and much of the formant pattern would have been lost.
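The comparison of average spectra described above for the simultaneous recordings can be illustrated with a small sketch. It is only an illustration under stated assumptions: the study used a 51-channel spectrum analyzer, whereas this sketch substitutes a naive DFT on short frames, and the function names, frame length, sampling rate and synthetic test signals are all hypothetical.

```python
import cmath
import math


def average_spectrum_db(signal, frame_len=64):
    """Average power per DFT bin (in dB) over non-overlapping frames.

    A naive DFT stands in for the filter bank of a spectrum analyzer;
    this is an assumption, not the analysis used in the study.
    """
    n_bins = frame_len // 2 + 1
    n_frames = len(signal) // frame_len
    power = [0.0] * n_bins
    for f in range(n_frames):
        frame = signal[f * frame_len:(f + 1) * frame_len]
        for k in range(n_bins):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            power[k] += abs(acc) ** 2
    eps = 1e-12  # avoids log of zero in empty bins
    return [10.0 * math.log10(p / n_frames + eps) for p in power]


def response_deviation_db(test, reference, frame_len=64):
    """Per-bin level difference (test minus reference) in dB."""
    t = average_spectrum_db(test, frame_len)
    r = average_spectrum_db(reference, frame_len)
    return [a - b for a, b in zip(t, r)]


# Synthetic stand-ins for the two simultaneous recordings (hypothetical data):
# the "test microphone" delivers half the amplitude of the reference.
fs = 8000  # assumed sampling rate in Hz
reference = [math.sin(2 * math.pi * 500 * n / fs)
             + 0.5 * math.sin(2 * math.pi * 2000 * n / fs)
             for n in range(1024)]
test = [0.5 * x for x in reference]

deviation = response_deviation_db(test, reference)
print(round(deviation[4], 2), round(deviation[16], 2))  # bins at 500 Hz and 2000 Hz
```

Because both recordings capture the same utterance, the speech spectrum itself cancels in the difference, and what remains approximates the deviation of the test microphone from the reference microphone.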
The third group of the measurements were performance tests using the speech recognition system. The recognition system works with isolated words. A standard vocabulary of 41 words was chosen. The words in this vocabulary are the words used in Swedish when spelling out words over the telephone (Adam, Bertil, Cesar, etc.), the numbers 0 - 9 in Swedish and the words "miss" and "mellanslag" (space). This vocabulary was spoken five times with each microphone. A reference recording was made simultaneously with the MD 211 microphone mentioned earlier. Recordings were made both in an anechoic room and in a normal room where tapes with different kinds of noise were being played back. The noise level was up to 90 dB (lin) (5). It should be mentioned that the words were read by the author, unfortunately in a rather hoarse voice. This accounts for the overall low recognition rates.

Before the actual recognition test with each microphone, the system had to "listen" to a number of repetitions of the vocabulary in order to extract statistical information about formant frequencies, sound duration etc. This information is used in the recognition process. This procedure will be referred to as "learning". When the system was to operate in noise, the learning could be done either in silence or with the noise used. Both cases were studied.

Results and discussion

The measurement on the Sennheiser MD 421 did not give any surprising results. The frequency response for different directions to the sound source is shown in Fig. III-A-2. The rejection of sounds from the rear of the microphone is a bit uneven over the frequency range, but as this microphone is not designed for use in noisy environments this is of little importance. As can be seen from Fig. III-A-2 the response rises some 10 dB from the low end of the spectrum to the high.
This changes as the microphone is moved closer to the sound source. At a distance of 1 dm the response is well balanced. The tests with speech do not give any more information about the microphone. The microphone is a good example of a dynamic cardioid microphone.

Fig. III-A-1. Microphone position on the neck.

Fig. III-A-2. Frequency response, long distance. MD 421.

The Sennheiser headset microphone is a differential microphone, designed to be used very near the speaker's mouth. If it is used far from the sound source it has a very uneven frequency response, rising steeply with frequency. Sennheiser has published diagrams showing a rather flat response at a distance of 1 cm. For our microphone this could not be duplicated with the test setup used. The sound source was a Brüel & Kjær Artificial Voice 4215 modified to make the "mouth opening" smaller, more like the recent 4219. The sound pressure was held constant with the aid of a measuring microphone, controlling the output from the generator. The result obtained can be seen in Fig. III-A-3. The response is far from flat. There is apparently a strong resonance in the microphone at about 8 kHz. The rejection of sounds from distant sources is good. It varies from 10 to 30 dB through the audible range.

The average spectrum for the short text confirms the impression from Fig. III-A-3, see Fig. III-A-4. A closer examination of the different speech sounds revealed some breath noises but this is almost unavoidable with close talking microphones. The breath noises showed up as an increased low frequency level.
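The distance dependence noted above for the differential headset microphone (a rather flat response close to the mouth, a response rising steeply with frequency far from the source) is what an idealized first-order pressure-difference model would predict. The sketch below is a textbook simplification, not a model of the microphone tested: it assumes a point source in free field, two sound ports on the axis towards the source, a hypothetical port spacing of 5 mm and a sound speed of 343 m/s.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def differential_output_db(freq_hz, distance_m, port_spacing_m=0.005):
    """Level (dB, arbitrary reference) of the pressure difference between
    two ports of an idealized differential microphone.

    The source is modelled as a point source with pressure exp(-jkr)/r.
    The port spacing is a hypothetical value chosen for illustration.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND

    def pressure(r):
        return cmath.exp(-1j * k * r) / r

    diff = pressure(distance_m) - pressure(distance_m + port_spacing_m)
    return 20.0 * math.log10(abs(diff))


# Level change per octave (1 kHz to 2 kHz) at two source distances.
far_slope = differential_output_db(2000, 1.0) - differential_output_db(1000, 1.0)
near_slope = differential_output_db(2000, 0.01) - differential_output_db(1000, 0.01)

print(f"1 m from the source:  {far_slope:.1f} dB per octave")
print(f"1 cm from the source: {near_slope:.1f} dB per octave")
```

In this model the output for a distant source rises by about 6 dB per octave, while at 1 cm the change over the same octave stays below 1 dB. The measured microphone deviates from such an ideal, for example through the resonance at about 8 kHz.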
The KUC 7001 microphone has a frequency response that looks far better than that of the Sennheiser microphone. Fig. III-A-5 shows the response 1.0 cm from the sound source. The rejection of distant sound sources is about the same as that of the Sennheiser headset microphone. More breath noises could be heard with this microphone than with the preceding one. There are two possible explanations for this. The first is that this microphone has a better low frequency response and the noises can therefore more easily be heard. The second possible reason is that it is in fact only a "naked" microphone capsule without any protective screening against the airstream from the mouth. The average spectrum confirms that this microphone gives a good reproduction of speech.

As mentioned above, the position of the contact microphone has a great influence on the results. The response for speech transmitted through the tissues of the neck or face is selectively frequency dependent. The damping seems to be greatest in the soft tissues, especially for high frequencies. The bone structure of the face seems to have a much lower damping. This agrees well with what has been reported by others.

In Fig. III-A-7 the average speech spectrum with the microphone on the forehead is compared with the reference condition. The