5 Audio
Introduction
Sound is a relatively new capability for PCs
Hardware requirements
Software can provide the functions of a recording studio, including multi-track recording, mixing and effects, on a desktop computer
Audio is an important element in most multimedia applications
SWE 423 - MULTIMEDIA SYSTEMS Dr. Abdallah Al-Sukairi - KFUPM
Digital Audio
Sampled sound
every nth fraction of a second, a sample of the sound is taken and stored digitally (as bits and bytes)
it was proven mathematically in 1948 that an analog signal can be represented accurately by digital samples taken at a rate of at least twice the maximum frequency contained in the source
Analog-to-digital (ADC) converters
appeared in the early 1980s in the telephone industry
Waveform (continuous) vs digital form (discrete)
Sampling rate (frequency)
sampling rates of 11.025, 22.05, and 44.1 kHz (thousands of samples per second) are standard in the audio industry
the higher the sampling rate, the better the fidelity
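The relationship between sampling rate and the highest frequency that can be captured can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, and the aliasing formula assumes an ideal sampler with no anti-aliasing filter in front of the ADC.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency (Hz) that a given sampling rate can represent."""
    return sample_rate_hz / 2


def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a tone appears after ideal sampling (no filter)."""
    # Fold the tone around the nearest multiple of the sampling rate.
    nearest_multiple = sample_rate_hz * round(tone_hz / sample_rate_hz)
    return abs(tone_hz - nearest_multiple)


if __name__ == "__main__":
    for rate in (11025, 22050, 44100):
        print(rate, "Hz sampling captures up to", nyquist_frequency(rate), "Hz")
    # A 15 kHz tone is above the limit for 22.05 kHz sampling and folds down.
    print(alias_frequency(15000, 22050))
```

A tone below half the sampling rate comes back unchanged, while a tone above it shows up at a lower, wrong frequency, which is why higher sampling rates give better fidelity.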
… Digital Audio
Sampling time
the speed at which the ADC converts the amplitude to a numeric sample value
Sample resolution
the number of bits used to represent the amplitude value
8-bit sampling provides only 256 levels
16-bit sampling provides 65,536 levels
Pulse Code Modulation (PCM)
sample values are stored sequentially in a file
… Digital Audio
Rounding-off (quantization)
produces unwanted background hissing noise
Clipping
occurs when the amplitude exceeds the range of available levels; produces distortion
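Quantization and clipping as described above can be illustrated with a small sketch. It assumes input amplitudes normalized to the range -1.0 to 1.0 and signed integer codes; the function name and the scaling convention are choices made for this example, not taken from the slides.

```python
def quantize(amplitude, bits):
    """Round a normalized amplitude to a signed integer code, clipping overflow."""
    levels = 2 ** bits              # 256 for 8-bit, 65,536 for 16-bit
    max_code = levels // 2 - 1      # e.g. 127 for 8-bit
    min_code = -(levels // 2)       # e.g. -128 for 8-bit
    code = round(amplitude * max_code)        # rounding-off (quantization)
    return max(min_code, min(max_code, code)) # clipping


if __name__ == "__main__":
    print(quantize(0.0, 8))    # silence maps to code 0
    print(quantize(1.0, 8))    # full scale maps to the top code
    print(quantize(1.5, 8))    # too loud: clipped to the same top code
```

The difference between the original amplitude and the rounded code is the quantization error heard as hiss; any amplitude beyond full scale collapses onto the top or bottom code, which is the distortion caused by clipping.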
Digital-to-analog (DAC) converters
the output of a DAC is a stepped wave (a "staircase")
filters are used to smooth the wave
Optimum combination of sampling rate and resolution
Digital recording and playback
8-bit ADC & DAC at 11.025 kHz: telephone-like quality
8-bit @ 22.05 kHz: AM radio quality
16-bit @ 44.1 kHz: CD-audio quality
… Digital Audio
Mono Audio
Stereo Audio
two channels double the space requirements
Quality of digital audio
Storage requirements (mono)
bytes per second = (sample rate * bits per sample) / 8
for stereo, multiply by 2
Size of 1 minute recording
44.1 kHz, 16-bit, stereo: 10.58 MB
22.05 kHz, 16-bit, stereo: 5.29 MB
22.05 kHz, 8-bit, mono: 1.32 MB
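The storage formula can be checked with a short calculation. This is a sketch with hypothetical function names; it assumes 1 MB = 1,000,000 bytes, which is the convention that reproduces the figures listed for a 1 minute recording.

```python
def bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


def recording_size_mb(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of an uncompressed recording in MB (assuming 1 MB = 10**6 bytes)."""
    total_bytes = bytes_per_second(sample_rate_hz, bits_per_sample, channels) * seconds
    return total_bytes / 1_000_000


if __name__ == "__main__":
    for rate, bits, channels in ((44100, 16, 2), (22050, 16, 2), (22050, 8, 1)):
        size = recording_size_mb(rate, bits, channels, 60)
        print(f"{rate} Hz, {bits}-bit, {channels} channel(s): {size:.2f} MB per minute")
```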
Sound Boards
Sample size
8-bit, 16-bit, 24-bit
Sample rate
8, 11.025, 22.05, 44.1, 48, 96 kHz
Amplifiers (watts per channel)
2, 4
Digital signal processor (DSP)
On-Board Connectors
Hardware full-duplex support enables simultaneous record and playback
… Sound Boards
MIDI Synthesis
FM synthesis: sounds are generated from mathematical formulas
Wavetable synthesis: a stored bank of sampled notes recorded from actual instruments
Effects Engine
Environmental 3D Positional Audio
Synthesized Audio
Synthesized Music
Creating sounds that resemble those of conventional musical instruments
Musical Instrument Digital Interface (MIDI)
a standard that allows music synthesizers from different manufacturers to communicate with each other
the MIDI interface was first adopted in 1983
MIDI is both a hardware and a software specification
what is performed on one instrument can be played by any other instrument
standard set of 128 sounds (patches)
MIDI files are significantly smaller than wave files
MIDI
MIDI port
MIDI messages
status bytes or data bytes
MIDI channels
MIDI kit
MIDI mapper
MIDI sequencer
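The distinction between status bytes and data bytes in MIDI messages can be shown with a Note On message. This is a minimal sketch with hypothetical helper names; it relies on the MIDI 1.0 convention that a status byte has its top bit set, that Note On uses status 0x90 plus the channel number, and that data bytes stay in the range 0 to 127.

```python
def is_status_byte(value):
    """Status bytes have the most significant bit set; data bytes do not."""
    return (value & 0x80) != 0


def note_on(channel, note, velocity):
    """Build a 3-byte Note On message: one status byte, two data bytes."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("note and velocity must be 0..127")
    return bytes([0x90 | channel, note, velocity])


if __name__ == "__main__":
    message = note_on(channel=0, note=60, velocity=100)  # middle C
    print([hex(b) for b in message])
    print([is_status_byte(b) for b in message])
```

A whole note takes three bytes here, which helps explain why MIDI files are so much smaller than wave files that store every sample.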
… MIDI
Playing MIDI file
MIDI synthesis
FM synthesis
sounds are generated from mathematical formulas (algorithms)
Yamaha OPL-III chipset
Wavetable synthesis
a stored bank of sampled notes recorded from actual instruments
Digital Signal Processor (DSP)
First appeared in Turtle Beach boards
Off-loads the CPU
Programmable
Hardware compression and decompression
Sound effects
Multi-function boards
Speakers
Built-in amplifier that delivers 10-20 watts per channel
Two cones: Tweeter and Woofer
Headphone jacks
Separate bass and treble controls
Magnetically shielded
Three-piece speakers
a stand-alone subwoofer gives depth to low-frequency sounds
Dolby Digital 5.1 surround sound
Microphones
Operating principle
Dynamic, Condenser
Directionality
Omnidirectional, Unidirectional, Bidirectional
Specifications
Sensitivity
Overload characteristics
Linearity, or Distortion
Frequency response
Noise
Microphone placement
Studio techniques
Mixer
Oversampling
Improve the apparent quality (fidelity) of sound
The DAC runs at a higher rate than the original ADC sampling rate
Technique
4x, 8x, and 16x oversampling
in the case of 4x oversampling, three 0-value samples are inserted between each pair of original samples
an interpolation filter calculates appropriate values and replaces the 0s
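The two steps of the oversampling technique, zero insertion followed by interpolation, can be sketched as follows. The function names are hypothetical, and the interpolation filter here is a simple triangular kernel (equivalent to linear interpolation) chosen for clarity; real DACs are assumed to use longer low-pass filters.

```python
def zero_stuff(samples, factor):
    """Insert (factor - 1) zero-value samples after each original sample."""
    stuffed = []
    for sample in samples:
        stuffed.append(sample)
        stuffed.extend([0] * (factor - 1))
    return stuffed


def interpolate(stuffed, factor):
    """Replace the inserted zeros using a triangular (linear) interpolation filter."""
    half = factor - 1
    # Symmetric kernel, e.g. for 4x: 0.25, 0.5, 0.75, 1, 0.75, 0.5, 0.25
    kernel = [1 - abs(k) / factor for k in range(-half, half + 1)]
    output = []
    for n in range(len(stuffed)):
        total = 0.0
        for j, weight in enumerate(kernel):
            index = n + j - half
            if 0 <= index < len(stuffed):
                total += weight * stuffed[index]
        output.append(total)
    return output


if __name__ == "__main__":
    stuffed = zero_stuff([0, 4], 4)
    print(stuffed)
    print(interpolate(stuffed, 4))
```

In the example, the three zeros between the samples 0 and 4 are replaced by evenly spaced intermediate values; samples past the last original value decay toward zero because the sketch treats missing neighbors as silence.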
DirectX
A set of APIs that allow applications to access multimedia hardware
Components
DirectDraw, DirectSound, Direct3D, DirectPlay, DirectInput, DirectAnimation, DirectShow
DirectSound
DirectSound provides mixing of audio streams, hardware acceleration, and direct access to the sound device
Enables application developers to take advantage of extended services offered by sound cards and their associated drivers
The Red Book Standard
CD-audio music market
IEC 60908 (ISO/IEC 10149 is the related CD-ROM data standard)
16-bit @ 44.1 kHz allows accurate reproduction of all sounds that humans can hear
File Format
Each of the three major platforms has its own sound file format: AIFF for MacOS, WAV for Windows, and AU for Unix
WAVE (.WAV) files
Microsoft standard
a header with information about the sampling process used to create the file, followed by a stream of digital sound data
in a stereo 16-bit wave file, every other pair of bytes contains the data for one channel
Several formats
.voc, .au, .aif, .snd, …
Many utilities convert other formats to .WAV
WAV ripper
Streambox Ripper
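The interleaved layout of a stereo 16-bit wave file, where every other pair of bytes belongs to one channel, can be illustrated by splitting raw sample data into its two channels. This is a sketch with a hypothetical function name; it assumes little-endian signed 16-bit samples with the left channel first, as in standard PCM WAV data.

```python
import struct


def split_stereo_16bit(data):
    """Split interleaved 16-bit stereo PCM bytes into (left, right) sample lists."""
    count = len(data) // 2                       # number of 16-bit samples
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    left = list(samples[0::2])                   # 1st, 3rd, 5th ... byte pairs
    right = list(samples[1::2])                  # 2nd, 4th, 6th ... byte pairs
    return left, right


if __name__ == "__main__":
    # Two stereo frames: (left=1, right=-1) and (left=2, right=-2)
    raw = struct.pack("<4h", 1, -1, 2, -2)
    print(split_stereo_16bit(raw))
```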
The Structure of a Wave File
WAV files are RIFF files
Resource Interchange File Format (RIFF)
used by Microsoft to store many types of multimedia resource files
tagged file format
each chunk consists of a chunk ID (4 bytes), a size (4 bytes), and the data
Sample Wave File
Position (Dec)   Size (bytes)   Content   Comment
0                4              ‘RIFF’
4                4              27796     file size - 8
8                4              ‘WAVE’
12               4              ‘fmt ’    next chunk id
16               4              16        format chunk size
20               2              1         PCM format
22               2              1         no of channels
24               4              22050     sampling rate
28               4              22050     bytes per second
32               2              1         bytes per sample
34               2              8         bits per sample
36               4              ‘data’    next chunk id
40               4              27760     size of wave data
44               x              x         digitized audio data
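The header layout of the sample wave file can be read back with a fixed-size binary format. The sketch below builds a 44-byte header from the sample values and parses it; the function and field names are hypothetical, and it assumes the simplest case where the ‘fmt ’ chunk is immediately followed by the ‘data’ chunk (real files may contain additional chunks).

```python
import struct

# Little-endian layout of the 44-byte header, field by field:
# 'RIFF', size, 'WAVE', 'fmt ', fmt size, format, channels,
# sample rate, bytes/sec, bytes/sample, bits/sample, 'data', data size
HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"
FIELD_NAMES = (
    "riff_id", "riff_size", "wave_id", "fmt_id", "fmt_size", "format_tag",
    "channels", "sample_rate", "bytes_per_second", "bytes_per_sample",
    "bits_per_sample", "data_id", "data_size",
)


def parse_wav_header(data):
    """Parse the first 44 bytes of a simple PCM wave file into a dictionary."""
    size = struct.calcsize(HEADER_FORMAT)
    if len(data) < size:
        raise ValueError("header too short")
    header = dict(zip(FIELD_NAMES, struct.unpack(HEADER_FORMAT, data[:size])))
    if header["riff_id"] != b"RIFF" or header["wave_id"] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    return header


if __name__ == "__main__":
    # Header values taken from the sample wave file.
    sample = struct.pack(HEADER_FORMAT, b"RIFF", 27796, b"WAVE", b"fmt ", 16,
                         1, 1, 22050, 22050, 1, 8, b"data", 27760)
    for name, value in parse_wav_header(sample).items():
        print(name, value)
```

The numbers are self-consistent: the 27,760 bytes of wave data plus the 44-byte header give a 27,804-byte file, and the RIFF size field holds that total minus 8.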
Waveform Audio Recording Techniques
Audio noise
mute all unused inputs
move the audio card to the last slot in the bus, away from the video card
Use a good analog audio mixer
Use the best microphone you can afford
The acoustic environment
Microphone placement
Capture with high quality
Analog recording
Cassette
DAT
Digital Audio Editing
Mixing sound from more than one source
Adding effects such as: echo, reverberation, chorus, etc.
Looping to extend duration
Commercial waveform editing tools
Sound Forge
Wave Studio
… Digital Audio Editing
Manipulate the audio in your media files
Applying processes and effects
Audio filters are used to remove noise and unwanted frequency components
Effects, such as reverb and envelope shaping, are used to alter the quality of sounds
Digital technology permits new kinds of alteration, including time stretching and pitch alteration
Text-to-Speech
Table-based
dictionary storing text and audio (PCM) for every word
Rule-based
no recording
rules to convert text to a set of ‘sound descriptors’
sound descriptors are converted to digital audio signals
exceptions
AT&T Labs’ Natural Voices Text-to-Speech Engine
AT&T Natural Voices - the WAV File edition
Voice Recognition
Talking to the Web
Problems
High hardware requirements
Accuracy
Disruptive nature of speech
Major Players
Dragon Systems: Naturally Speaking
IBM: ViaVoice
Lernout & Hauspie (L&H): Voice Xpress
5 Audio
5.2 Audio Compression
Audio Compression
No standard technique for compressing and decompressing waveform audio files
Hardware implementation vs. software
Encoders and Decoders (Codec)
Compression ratio
Bitrate: the average number of bits that one second of audio data will consume
For a digital audio signal from a CD, the bitrate is 1411.2 kbps
With MPEG-2 AAC, CD-like sound quality is achieved at 96 kbps
Lossless vs. Lossy
Proprietary vs. open codecs
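The CD bitrate and the resulting compression ratio can be reproduced with a short calculation. This is a sketch with hypothetical function names; it assumes CD audio is 44.1 kHz, 16-bit, stereo PCM and that 1 kbps = 1000 bits per second.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bits/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(original_kbps, compressed_kbps):
    """How many times smaller the compressed stream is than the original."""
    return original_kbps / compressed_kbps


if __name__ == "__main__":
    cd_rate = pcm_bitrate_kbps(44100, 16, 2)
    print(cd_rate, "kbps for CD audio")
    print(round(compression_ratio(cd_rate, 96), 1), ": 1 at 96 kbps")
```

Under these assumptions, a 96 kbps stream is roughly 14.7 times smaller than the CD signal it approximates.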
… Audio Compression
Differential Pulse Code Modulation (Delta Modulation)
if the sampling rate is fast enough, the difference between two successive values might be no more than one step, so it can be described with a single bit (1 for an increase, 0 for a decrease)
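The one-bit scheme can be sketched as an encoder that tracks a running estimate and a decoder that rebuilds it. The function names, the fixed step size of 1, and the starting estimate of 0 are assumptions made for this example.

```python
def delta_encode(samples, step=1):
    """Emit 1 when the signal is above the running estimate, otherwise 0."""
    estimate = 0
    bits = []
    for sample in samples:
        bit = 1 if sample > estimate else 0
        estimate += step if bit else -step   # track the signal one step at a time
        bits.append(bit)
    return bits


def delta_decode(bits, step=1):
    """Rebuild the signal by stepping up for each 1 and down for each 0."""
    estimate = 0
    output = []
    for bit in bits:
        estimate += step if bit else -step
        output.append(estimate)
    return output


if __name__ == "__main__":
    signal = [1, 2, 3, 2, 1]
    bits = delta_encode(signal)
    print(bits)
    print(delta_decode(bits))
```

The reconstruction is exact only while successive samples differ by one step; faster changes cannot be followed with a single bit, which is the limitation that ADPCM addresses by using more bits for the difference.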
Adaptive Differential Pulse Code Modulation (ADPCM)
extension of Delta Modulation
more than one bit is used to describe the difference: 4 or 8 bits
4-bit ADPCM can provide the equivalent of about 12-bit PCM
8-bit ADPCM rivals 16-bit PCM
MPEG
MPEG (pronounced M-peg) stands for Moving Picture Experts Group
MPEG is a working group in a subcommittee of ISO/IEC in charge of developing international standards for compression, decompression, processing, and coded representation of moving pictures, audio, and their combination
ISO-MPEG Audio Layer-3 (IS 11172-3 and IS 13818-3)
MP3
Audio coding project done by the Fraunhofer IIS-A starting 1987
Using MPEG audio, one may achieve a typical data reduction of
1:4 by Layer 1 (384 kbps for a stereo signal)
1:6...1:8 by Layer 2 (256..192 kbps for a stereo signal)
1:10...1:12 by Layer 3 (128..112 kbps for a stereo signal)
still maintaining the original CD sound quality
MPEG Audio Layers
Sound Quality
sound quality           bandwidth   mode     bitrate          reduction ratio
telephone sound         2.5 kHz     mono     8 kbps           96:1
better than shortwave   4.5 kHz     mono     16 kbps          48:1
better than AM radio    7.5 kHz     mono     32 kbps          24:1
similar to FM radio     11 kHz      stereo   56...64 kbps     26...24:1
near-CD                 15 kHz      stereo   96 kbps          16:1
CD                      >15 kHz     stereo   112..128 kbps    14..12:1
MPEG Details
Other MPEG Standards
MPEG-2 AAC (ISO/IEC 13818-7) provides
a very high-quality audio coding standard for 1 to 48 channels at sampling rates of 8 to 96 kHz, with multichannel, multilingual, and multiprogram capabilities
AAC works at bitrates from 8 kbit/s for a monophonic speech signal up to in excess of 160 kbit/s/channel for very-high-quality coding that permits multiple encode/decode cycles
MPEG-4 (ISO/IEC 14496-3) provides
coding and composition of natural and synthetic audio objects
scalability of the bitrate of an audio bitstream
scalability of encoder or decoder complexity
Structured Audio: a universal language for score-driven sound synthesis
TTSI: an interface for text-to-speech conversion systems
MPEG-7 (ISO/IEC 15938) will provide
standardized descriptions and description schemes of audio structures and sound content
a language to specify such descriptions and description schemes
MPEG-2 AAC Details
5 Audio
5.3 Audio on the Web
Technologies & Tools
Create
Distribute
Manage
Playback
Limited Bandwidth
Codecs to compress or encode audio for real-time or local playback over the Internet and corporate intranets
Audio Streaming
Streamed data is transmitted by a server application and received and played in real time by client applications
These applications can start playing back audio as soon as enough data has been received and stored in the receiving station's buffer
A streamed file is simultaneously downloaded and played, but leaves behind no physical file on the viewer's machine
Unicast, broadcast, multicast
… Technologies & Tools
HOW IP MULTICASTING WORKS
MBone has been in place since 1992 and has grown to more than 2000 subnets
The user instructs the computer's network card to listen to a particular IP address for the multicast
The computer originating the multicast does not need to know who has decided to receive it
The bulk of the work that needs to be done to enable multicasting is performed by the network's routers and the protocols they run
To signal that they want to receive a multicast, clients join the group to which the multicast is directed (groups are dynamic)
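Joining a multicast group from the client side can be sketched with the standard socket API. The helper names are hypothetical, and the group address 224.1.1.1 and port 5004 are made-up values for illustration; whether any data actually arrives depends on the routers and on a sender existing for that group.

```python
import socket
import struct


def membership_request(group_ip, interface_ip="0.0.0.0"):
    """Pack the group address and local interface into an IP_ADD_MEMBERSHIP request."""
    return struct.pack("4s4s", socket.inet_aton(group_ip), socket.inet_aton(interface_ip))


def open_multicast_listener(group_ip, port):
    """Create a UDP socket and join the given multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is what tells the network that this host wants the traffic.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group_ip))
    return sock


if __name__ == "__main__":
    listener = open_multicast_listener("224.1.1.1", 5004)  # hypothetical group and port
    listener.settimeout(5.0)
    try:
        packet, sender = listener.recvfrom(2048)
        print(len(packet), "bytes from", sender)
    except socket.timeout:
        print("no multicast data received")
    finally:
        listener.close()
```

Note that the client never contacts the originating computer: it only joins the group, which matches the point that the sender does not need to know who is receiving.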
… Technologies & Tools
Major Players (proprietary solutions)
RealNetworks
Helix Universal Server
RealOne player
Apple
QuickTime Streaming Server, QuickTime Broadcaster
QuickTime Player
Microsoft
Windows Media Services
Supports and delivers live broadcasts and stored streaming multimedia content
Bit rates from 28 kbps to 10 Mbps
Intelligent Streaming: ensures that users receive the highest quality possible for their connection speed and network congestion
Windows Media Rights Manager (DRM)
Windows Media Audio 8 (near-CD quality at just 48 kbps)
Windows Media Audio 9
Windows Media Encoder
Windows Media Player
File Extensions
.WMA for files that include audio compressed with the Windows Media Audio codec
Content compressed with other codecs should be stored in a file that uses the .ASF extension
… Technologies & Tools
Moving Picture Experts Group (MPEG)
MP3 (MPEG-1, Layer 3)
Open audio compression codec
MP3 players
Windows Media-based content can be streamed over a network in two ways
Using a Windows Media server
The ideal way to stream content is from Windows 2000 Server running Microsoft Windows Media Services
Provides features such as live broadcasting and intelligent streaming, which automatically adjusts the bit rate of each client stream according to the currently available bandwidth
You can stream using the Microsoft Media Server (MMS) protocol or the Hypertext Transfer Protocol (HTTP)
Using a Web server
may be the best option if you plan to offer only a few audio clips
Embedding Audio in Web Pages
Helper applications
Browser plug-ins
Downloading and playing audio
LiveAudio