Transcript
Introduction to Audio Processing
Student ID: 2013329620026. Name: 张珂杰. Class: 2013 Computer Science, Class 1.

How to make sense of almost any effects processor: know the basic types and find the bare-bones parameters.

Controls that most processes will have:
Bypass. This switch lets the original signal through so you can quickly compare the processed version to the original.
Wet/dry mix. “Wet” means the processed signal; “dry” means the unprocessed original signal. The mix control sets the balance between the two.
Presets. Factory-suggested settings offer good starting points, but if you always rely on presets without tweaking them at all, then you’re not going to have an original sound and you might not really understand what the process is actually doing to your sound.
Output gain. Some processes make a sound a lot louder or quieter as a side effect. This control allows you to adjust for that and keep your sound at a usable loudness.
Input gain. Some processes have different results when they operate on a loud sound or a quiet sound. This lets you “tune” your input to get the desired response from the process. You’ll probably need to adjust the output gain in the opposite direction.

Time Domain processes manipulate the sound by working only with what we see in a waveform view: amplitude changing over time. Not surprisingly, many time domain processes affect the amplitude or the timing of a sound. However, with some cleverness, the spectrum of a sound can be affected, too. The most commonly known and used processes are time domain processes.

Delays simply copy a sound and play it again later. You can get a wide variety of effects from them, though, if you’re clever.
Delay period or time. This is how long you want to delay a sound, in seconds or milliseconds. From 0–30ms, we don’t hear separate copies of a sound, just slightly different (usually anemic) timbres, due to phase cancellation. Up to 100ms, a sound might sound thicker or “doubled.” Over 100ms, the delayed sound will start to sound separate from the original.
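To make the delay time and wet/dry mix concrete, here is a minimal sketch of a single delayed copy mixed with the original (no feedback). The function name `delay_mix`, its parameters, and the default sample rate of 44100 Hz are assumptions for illustration, not taken from any particular processor.

```python
def delay_mix(dry, delay_ms, sample_rate=44100, mix=0.5):
    """Mix a signal with one delayed copy of itself (hypothetical helper).

    dry: list of float samples; mix: 0.0 = all dry, 1.0 = all wet.
    Here "wet" is assumed to mean the delayed copy only.
    """
    # Convert the delay time from milliseconds to a whole number of samples.
    delay_samples = int(round(delay_ms * sample_rate / 1000.0))
    out = []
    # The output is longer than the input so the delayed copy can finish.
    for n in range(len(dry) + delay_samples):
        d = dry[n] if n < len(dry) else 0.0
        k = n - delay_samples
        w = dry[k] if 0 <= k < len(dry) else 0.0
        out.append((1.0 - mix) * d + mix * w)
    return out
```

With a single click as input, the output contains the click and one copy `delay_ms` later; try 20, 80, and 300 ms to hear the three ranges described above.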
Natural acoustic echoes are usually 100–500ms. Anything longer starts to sound like a “call and response” echo of a second musical voice.
Feedback. One delayed copy can be nice, but if you feed the output of the delay process back into its own input, you get an endless stream of delayed copies (e.g., “HELLO-ELLO-Ello-ello…”). The feedback control determines how much of the delayed output to let back in (i.e., how loud). It’s usually shown as a percentage, but sometimes appears in decibels. One hundred percent or 0dB means the full output is copied (and will never die away). You can make a looper with a delay time of a few seconds or more and feedback at or just below 100%.
Special trick: Comb Filters. We mentioned phase cancellation with short delay periods. Add some very high feedback (almost but not quite 100%), and several frequencies will be cut out and others will resonate, leaving a metallic, ringing harmonic series with its fundamental frequency based on the delay period. The resulting spectrum looks like a comb. Now you can start to see how we can get other effects like reverbs and filters by starting with a simple time-domain delay.

Reverberation is basically an infinite mass of delays bouncing all around a room. It is often
used to suggest physical surroundings and distance of a sound, to smooth out roughness in a sound (like a timid, shaky vocalist), or to fill silences to keep a mix from feeling too sparse.
Decay time dictates how long it takes for a sound to die away (a fall of 60dB is a standard point to measure). Longer decay times suggest more reflective surfaces in a space.
Pre-delay or early reflections. In natural reverberation, you will hear a few very quick “slap” echoes (from nearby surfaces) before the smooth rush of reverberation arrives. You might be able to set the pre-delay, the time before the smooth reverberation comes in. Longer pre-delay suggests larger rooms. Alternatively, you might be offered a control for the duration or loudness of early reflections.
Diffusion. Since reverb is just a mess of individual echoes, a room with a few flat surfaces may have a “lumpier” reverb and a room with many irregular surfaces may have a “smoother” reverb. Diffusion makes it smoother.
Damping. It’s natural for walls, floors, etc. to absorb high frequencies faster than low frequencies, so it’s common to simulate this with a filter that reduces high frequencies (see filters below). More damping suggests more absorbent materials in a room (like carpet or people).

Dynamics effects change the loudness of a sound, or change based on the loudness of their input. The two basic types are gates and limiters. An expander is a more forgiving form of gate, and a compressor is a more forgiving form of limiter. These are often used as utilities, e.g., to fix problems.
Threshold. This is the loudness above/below which the process jumps into action. A gate will silence any sound when it falls below the threshold; this is useful for cutting out unwanted background noise or tightening up sloppy attacks or releases.
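The gate's threshold behavior can be sketched in a few lines. This is a deliberately crude, sample-by-sample version: real gates follow the loudness over time rather than reacting to each sample. The names `db_to_linear` and `hard_gate` and the -40 dB default are assumptions for illustration.

```python
def db_to_linear(db):
    """Convert a level in decibels to a linear amplitude (0 dB -> 1.0)."""
    return 10.0 ** (db / 20.0)


def hard_gate(samples, threshold_db=-40.0):
    """Silence every sample whose magnitude is below the threshold.

    Hypothetical helper: an all-or-nothing gate with no smoothing,
    so it only illustrates the threshold idea.
    """
    threshold = db_to_linear(threshold_db)
    return [s if abs(s) >= threshold else 0.0 for s in samples]
```

For example, with a -40 dB threshold (linear amplitude 0.01), quiet samples such as 0.001 are zeroed while louder ones pass through unchanged.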
A limiter will try to turn down a sound once its loudness reaches a threshold in order to keep it from getting any louder; this is useful for adding sustain (by smashing the attack), keeping a signal from clipping (going over 0dB and getting cut off more harshly), or keeping a voice in the foreground (by reducing the dynamic range and making it all louder with output gain).
Attack, Release, and Hold. These control how quickly the processes take effect and go back to normal. Hold allows you to avoid overly fidgety effects by making the process stay active for a certain amount of time before it can change back.
Ratio. Gates and limiters are not forgiving: all the sound or none (gates), or nothing any louder than the given threshold (limiters). This is all-or-nothing with no grey in the middle. A compressor is like a limiter, but it allows some grey area above the threshold by attenuating sound (reducing its loudness) gradually. A 2:1 compression ratio means that for any sound louder than the threshold, it attenuates the sound to bring it halfway back to the threshold; a 10:1 compression ratio is more extreme; a limiter is basically a compressor with a very high ratio. Conversely, an expander is a gate with some grey area: an expansion ratio of 1:2 means that instead of silencing a sound below the threshold, it just cuts the volume in half.
Key or Sidechain. This is really cool. You can often make dynamics processes listen to a different sound from the one they’re actually processing. If your bassist has sloppy attacks but your bass drum is solid, put a gate on the bass and set the key or sidechain to listen to the drum; the gate will use the bass drum’s sound to shape the bass guitar’s sound. This is a great way to “carve” rhythms out of sustained sounds, too. A ducker is just a compressor with a sidechain: automate the balance between lead guitar and vocalist by putting a compressor on the guitar with a
sidechain listening to the vocal. When the vocalist enters, the compressor will make the guitar quieter and let it return to full volume when the vocalist takes a breath between phrases.
Knee. When you graph out input loudness versus output loudness, there’s always going to be a bend at the threshold that looks like a knee. You usually get to pick between a “hard” and “soft” knee, with a soft knee fudging the behavior around the threshold a bit to avoid suddenly starting or stopping changes in volume.

Filters/Equalizers (EQ) shape the spectrum of a sound by emphasizing or deemphasizing certain frequency bands. Think of a running average: it sums and divides the last few numbers (samples) in order to smooth out any abrupt changes (i.e., reduce high frequencies). Filters can be used to reduce unwanted noise, help different sounds blend together, or adjust their character by emphasizing different frequency ranges.
Filter type: what range of frequencies will it affect, and what will it do? High-pass, low-pass, and band-pass filters leave a certain range of frequencies (called a band) unchanged, and they eliminate everything else. High shelving, low shelving, and peak/notch filters boost or reduce a range of frequencies and leave others unchanged.
Cutoff or center frequency: set the upper/lower limit or center of the frequencies affected.
Q, bandwidth, slope, roll-off, or resonance: Think of a band-pass filter. You might know the center frequency, but you don’t yet know what range of frequencies above and below the center to pass. There are side effects to using time domain processes to affect frequency, though: there’s a gradual transition between the affected frequencies and the others. The slope or roll-off might let you control the steepness of the filter directly. A narrower bandwidth is going to have a steeper slope. The Q (quality) factor is inversely related to bandwidth (it is the center frequency divided by the bandwidth), so high Q means a narrow bandwidth.
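The running average mentioned above is about the simplest low-pass filter there is, and it can be sketched directly. The function name `running_average` and the `width` parameter are assumptions for illustration; practical filters use more refined formulas.

```python
def running_average(samples, width=4):
    """Smooth a signal by averaging each sample with the ones before it.

    Hypothetical helper: at the start of the signal, where fewer than
    `width` samples exist, it averages whatever is available.
    """
    out = []
    for n in range(len(samples)):
        start = max(0, n - width + 1)
        window = samples[start:n + 1]
        out.append(sum(window) / len(window))
    return out
```

A signal that flips between +1 and -1 on every sample (the highest frequency a digital signal can hold) averages out to silence with `width=2`, while a constant signal passes through unchanged.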
Finally, some filter formulas will cause a sound to resonate or ring at the cutoff/center frequency. High Q or steep slope will result in greater resonance.

Other common time domain processes you may see:
Distortion, overdrive, fuzz, amp modeler, saturation. These all chop off the tops of your waveforms in natural, gritty, lovable ways that physical objects like guitar amplifiers or analog magnetic tape do. That is, they leave rough edges instead of harshly chopping off the peaks. That results in a less harsh sound with some interesting character, character that responds differently to loud and soft sounds, sounds with wide dynamic ranges or narrow ones. Sometimes you’re offered controls for an amount of overdrive (amplification) and a filter to reduce harsh high frequencies (as in nature). Other processes will let you pretend to choose guitar amp models, mic types, and mic positions.
Amplitude Modulation and Ring Modulation. Now this is interesting. It turns out that when you fade a sound in and out very fast (more than 20 times per second; sound familiar?) then it no longer sounds like changes in volume or pulsing rhythm. It changes the timbre in a surprising way. Two copies of the original partials appear: one set higher than the original, and one set lower, like two frequency shifts (see below). If the fade pattern swings between positive and negative values instead of between zero and one, the original partials are canceled out entirely. But that’s just when you change the volume in a sine wave pattern. Ring modulation does more. Remember that changing volume is just multiplying: times zero = silence, times one = full volume. So, what I just described is like multiplying a sound by a single sine wave (called amplitude modulation). Ring modulation (named for the shape of the analog electrical circuit that does this) multiplies a signal by another full signal, which can be far more complex than a sine wave (even a copy of the signal itself), creating a very rich and complicated web of inharmonic partials that all change in natural ways in response to the original sound. That said, it’s commonly used for aliens, robots, and radio signals
with a lot of interference.
Flanger, chorus, phaser. No big whoop. See Modulation below.
Bit munging or degrading. It’s a digital signal, so why not mess directly with its digitalness? These processes will let you simulate lower sample rates (making muddier sounds and aliasing) or lower bit depths (resulting in harsh background noise). With extreme settings, it sounds like a cheap toy or losing at Space Invaders. Subtle amounts of aliasing, however, can add a delicate shimmer to a sound, and gradually reducing bit depth (especially with a reverb to smooth out sudden changes in bit depth) can make a sound “morph” into or out from pure noise.

Frequency Domain processes can see and manipulate the partials in a sound’s spectrum, because the sound has gone through an FFT (Fast Fourier Transform) to break the sound into sine waves with different frequencies and amplitudes. This is like boiling a sound down to its “recipe” (analysis); then you can tweak that recipe and remake the sound differently (resynthesis). This allows for some very creative and unexpected results.
Time vs frequency, pitch versus rhythm, FFT size, window size, or frequency resolution. When you perform an FFT, you must accept a tradeoff between time resolution and frequency resolution: one or the other must be blurred (much like Heisenberg’s Uncertainty Principle, if you know what that is). You might be asked to favor pitch accuracy or rhythmic accuracy, or you might be asked for a window size in samples (larger windows blur time more but allow more refined pitch processing) or frequency resolution (finer frequency resolution means more time blurring). Either your attacks or your timbre has to get fuzzy. That’s the trade-off for this kind of control over your sound, so use frequency domain effects strategically, or use them intentionally for the artifacts they produce.
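The window-size tradeoff comes down to simple arithmetic, sketched here. The function name `fft_resolution` and the 44100 Hz default sample rate are assumptions for illustration.

```python
def fft_resolution(window_size, sample_rate=44100):
    """Return (frequency spacing in Hz, window duration in ms).

    Hypothetical helper: the spacing between analysis frequencies is
    sample_rate / window_size, and the window lasts
    window_size / sample_rate seconds, so improving one worsens the other.
    """
    bin_hz = sample_rate / window_size
    window_ms = 1000.0 * window_size / sample_rate
    return bin_hz, window_ms
```

At 44100 Hz, a 4096-sample window separates frequencies about 10.8 Hz apart but smears events over roughly 93 ms; a 512-sample window lasts only about 11.6 ms but can only tell apart frequencies about 86 Hz apart.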
If you’re offered an overlap factor (meaning how many analysis windows overlap), you might be able to make up for some time blurring and get away with larger windows.
Time stretch or pitch shift/transpose. Once you have the recipe of the sound, it’s simple to make it all just slower or just higher without changing the other. Extreme time stretch values will make the windows audible: you’ll hear a stuttering or stair-stepped effect. Pitch shifters might allow for harmonic correction, meaning they will try to impose the original spectral envelope onto the new shifted partials in an attempt to avoid the “chipmunk” effect and offer a more realistic illusion of the original sound source simply singing higher.
Frequency shift is like pitch shifting with one critical difference: whereas in pitch shifting you multiply all frequencies by a given number (e.g., doubling them = transposing up an octave), frequency shifting means adding the same value to every partial. This means the farther you frequency shift a harmonic series, the less it will be a harmonic series (integer multiples of the fundamental, or evenly spaced including 0Hz; e.g., 100, 200, 300… becomes 110, 210, 310…). This means it won’t sound as strongly pitched, and it will have strong inharmonic partials, making a sound more metallic or synthetic sounding.
Pitch correction. This is a flaky process with well-known artifacts. The software tries to detect harmonic series in a sound to determine what pitches are present, then rounds all partials in that series to the nearest pitch on a piano keyboard (the tuning system called 12-tone equal temperament). Software like this is often wrong or fidgety, resulting in quirky jumps (to be avoided or embraced strategically). Fully correcting pitches results in unnaturally static sounds (voices sound robotic instead of human), so there is often a control for how much to change a pitch (e.g., splitting the difference) or an acceptable margin before the process takes control.
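The rounding step in pitch correction can be sketched for a single detected frequency. This leaves out the hard part (detecting the pitch in the first place) and assumes a reference of A = 440 Hz; the function name `correct_pitch` and the `amount` parameter are hypothetical.

```python
import math


def correct_pitch(freq_hz, amount=1.0, a4_hz=440.0):
    """Pull a frequency toward the nearest 12-tone equal temperament pitch.

    Hypothetical helper. amount=1.0 snaps fully to the nearest pitch,
    0.5 splits the difference (measured in semitones), 0.0 does nothing.
    """
    # Distance from the reference pitch in (fractional) semitones.
    semitones = 12.0 * math.log2(freq_hz / a4_hz)
    target = round(semitones)
    corrected = semitones + amount * (target - semitones)
    return a4_hz * 2.0 ** (corrected / 12.0)
```

For instance, a slightly sharp 450 Hz snaps to 440 Hz at `amount=1.0`, and lands between the two at `amount=0.5`, which keeps some of the natural wobble.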
Spectral filters have no slope (as time domain filters do), just razor-precision. They’re often not worth the artifacts unless that’s what you’re after.
Spectral gates can be refined noise reduction tools. They apply a loudness threshold (like time domain gates) to each frequency band separately, not the whole signal, so you can eliminate or reduce all the weak partials (which are likely to be unwanted noise). Higher thresholds reduce your sound to a kind of pointillistic “spectral splatter,” an interesting way to extract pitched motives from noisy sounds. Some gates let you keep what’s below the threshold and silence the strong partials, leaving a fascinating residue of the sound and any reverb tails.
Convolution is like multiplying the spectrum of one sound by the spectrum of another, like imposing the spectral envelope of one sound on the partials of another. It could be done instant by instant, resulting in a cross-synthesis hybrid of the two sounds. Convolution reverb (or sometimes just moving convolution) allows you to use one sound as the reverb tail for every FFT frame of a sound, allowing you to capture the reverberation of one space and apply it to another sound. Alternatively, if you use a rhythmic burst of noise instead of a reverb tail, the input sound will be copied in the same rhythm!

Variations Are Inevitable. Every process is going to give you slightly different controls.
Physical metaphors. A reverb might give you an imaginary room to mold instead of controlling the parameters listed above directly.
Stereo pairs. Some delays operate in stereo and define one channel’s parameters as percentages of the other channel’s parameters.
Combinations. Many elaborate processes will offer useful combinations of common processes. For example, it is common for a reverb to have a damping function (really a filter to reduce high frequencies) or a gate (to cut off long tails).
A multiband compressor uses filters to divide your sound into separate frequency bands so it can apply different compression settings to each band. An aural exciter is usually just a high-pass filter run through distortion (to add odd harmonics) and combined with the original signal.
“Modulation” (which just means “changing”). Some processes will offer to give your sound life by “wiggling knobs” for you automatically. They might let you pick a simple waveform as a pattern (e.g., sine, square) or random numbers. You’re likely to be offered controls for the rate of change and the modulation depth (how much to change, usually the amplitude of the wave you picked; but this time “amplitude” means the difference between maximum and minimum values and doesn’t necessarily mean loudness). A flanger is just a delay modulated in a sine wave pattern. A chorus is just a delay modulated randomly. A phaser is a very steep notch filter modulated in a sine wave pattern (just to get the phase distortion effects of steep filter slopes). WARNING: Use repetitive modulation patterns (like sine waves) sparingly, or your music will sound nauseating, predictable, unnatural, or dated (70s or 80s). Keep modulation depths subtle, rates slow and out of sync with any metric or rhythmic pattern, and try breaking up predictable patterns with a gate.
Envelope. A process might let you sculpt parameters over time, but it will need to know when to start, e.g., when it detects an attack of a new sound, or when the loudness of a sound crosses a given threshold. See below for a more specific and common example.
Envelope Follower or just Envelope or just Follower. “Envelope” really just means “shape” in the abstract sense, but it is often used to refer to an amplitude envelope. An amplitude envelope follower will listen to the loudness of a sound and offer to use that to control some parameter of
a processor. You’ll usually need to specify the range of values to use in mapping input loudness to the parameter at hand. When available, this is a terrific way to get some natural, musical variety in your sound automatically (in contrast to modulating parameters by sine waves, mentioned above).
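As a rough illustration of an envelope follower, here is a minimal sketch that tracks loudness and maps it onto a parameter range. The smoothing approach (separate attack and release times), the function names `envelope_follower` and `map_range`, and all default values are assumptions; real processors differ in detail.

```python
import math


def envelope_follower(samples, sample_rate=44100, attack_ms=5.0, release_ms=100.0):
    """Track the loudness of a signal as a smooth curve (hypothetical helper).

    The envelope rises quickly (attack) when the input gets louder and
    falls slowly (release) when it gets quieter.
    """
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = 0.0
    out = []
    for s in samples:
        level = abs(s)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


def map_range(env, lo, hi):
    """Map an envelope value in 0..1 onto a parameter range lo..hi."""
    return lo + (hi - lo) * min(max(env, 0.0), 1.0)
```

For example, `map_range(env, 200.0, 2000.0)` could drive a filter cutoff in Hz so that louder playing opens the filter, giving variation that follows the performance instead of a fixed sine wave pattern.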