Choosing suitable audio equipment for oral history interviews: perspectives of an audio engineer

Peter Kolomitsev is an audio engineer working at the State Library of South Australia whom we met in previous posts, talking about his job and providing his top tips for making excellent quality oral history recordings. In this post, I include Peter’s advice about what equipment is suitable for oral history interviews and why, and what equipment should be avoided at all costs.

The human voice and hearing

Before choosing equipment, we need to understand a bit about what sounds we can hear, and the range of the human voice.

Peter explained that the human voice has a range from approximately 100 – 150 Hz (the lower notes) up to 4 000 Hz (the higher sounds). But some of the sounds we make, such as the ‘s’ sounds, have a much higher frequency of up to 10 000 Hz. To capture the sound of the human voice faithfully, we therefore need to record this full range of frequencies. This means that the microphones and the recorders themselves used for oral history need to be sensitive to this range of frequencies.

As for human hearing, we hear in the range from 20 Hz to 20 000 Hz. The human voice sits well within this range, as shown in the figure below.

The range of human hearing from low frequencies (pitches) on the left to high frequencies on the right. Our sensitivity to different pitches varies with the intensity of the sound (on the vertical axis). The outer curve shows the limits of audible sounds, whereas the inner two curves show the range of frequencies and intensities for music and human speech. Courtesy of University of Illinois at Urbana-Champaign.

The two important measures in digital audio recording: bit-depth and sampling rate

Peter then explained two of the key measures of digital audio recording equipment.

“The way that digital audio works is you have two measures. You have bit depth and sampling rate.

Digital audio is, as it implies, audio that uses digits, and the digits are a series of switches – ons and offs, or one and zero, so one represents on, zero represents off. The sound is collected by a series of digital words, which is a packet of ones and zeroes. The more ones and zeroes used to describe the audio, the higher the dynamic range. What we use for compact discs is a 16-bit word. They are a 16-bit digital audio, so that is each digital word has 16 ones and zeroes. We have higher resolution now, so we use 24-bit in our computers and our sound recorders – a lot of the really good ones. Most of the decent quality ones have up to 24-bit audio. When digital audio started coming about, it was often 12-bit or even lower, 8-bit. There is a lot more distortion in that, because there’s not as many ones and zeroes to describe the audio…

The other thing is our sampling rate, which is the number of times each one of those words is captured per second. So each second, there’s a certain number of audio digital snapshots that get taken… the sampling rate of compact disc is 44.1 kHz, or 44 100 times a second a 16-bit word is captured to describe the audio of that moment.”
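
To put some numbers on Peter's description, here is a minimal Python sketch (my own illustration, not something Peter provided). It counts how many distinct values one digital word can hold and how much raw data an uncompressed recording produces each second. The function names are made up for this example, and the data-rate function assumes a bit depth that is a whole number of bytes (16 or 24) and a two-channel (stereo) recording.

```python
def quantisation_levels(bit_depth: int) -> int:
    """Number of distinct values a single sample (one 'digital word') can take."""
    return 2 ** bit_depth


def pcm_bytes_per_second(bit_depth: int, sample_rate_hz: int, channels: int = 2) -> int:
    """Raw data rate of uncompressed audio in bytes per second.

    Assumes the bit depth is a multiple of 8 (e.g. 16 or 24).
    """
    bytes_per_sample = bit_depth // 8
    return bytes_per_sample * sample_rate_hz * channels


print(quantisation_levels(16))           # 65536
print(quantisation_levels(24))           # 16777216
print(pcm_bytes_per_second(16, 44_100))  # 176400 (compact disc)
print(pcm_bytes_per_second(24, 48_000))  # 288000
```

Moving from 16-bit to 24-bit adds only half as much data again per sample, but gives 256 times as many values to describe each snapshot.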

How do these measures relate to our hearing?

Digital audio recorders such as this Sound Devices MixPre-3 enable you to record at 24-bit and 48 kHz, as suggested for oral history interviews.

“Our hearing range is from about 20 Hz, which is very, very low – in fact we probably don’t even hear it, it’s more of a felt frequency most of the time unless it’s very loud – up to 20 000 Hz (or 20 kHz). And that’s kind of a nominal thing because we’re not machines, we’re humans, so everyone’s a little bit different, and as we age the higher frequencies diminish.

So you go, ‘OK, then why are we having a sampling rate that’s twice, in fact more than twice as high as our hearing range?’ There’s an effect called the Nyquist frequency… The Nyquist frequency states that the highest audible frequency that you can record is half of the sampling rate. So now, that starts to make sense; compact discs – their sampling frequency is just a little bit above twice our audio range. So that means that the frequency response of compact discs are just above where we hear up to. So it captures all of the frequencies.”

So now we know that it is important to record using a sampling frequency of at least twice the highest frequency you want to capture. The upper limit of human hearing is about 20 000 Hz, which means a sampling frequency of at least 40 000 Hz.
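
The arithmetic is simple enough to write down as a short Python sketch. The function names are my own, chosen for illustration; the figures are the ones mentioned in this post.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency that a given sampling rate can represent."""
    return sample_rate_hz / 2


def min_sample_rate_hz(highest_frequency_hz: float) -> float:
    """Lowest sampling rate needed to represent a given frequency."""
    return 2 * highest_frequency_hz


print(min_sample_rate_hz(20_000))  # 40000
print(nyquist_limit_hz(44_100))    # 22050.0 (compact disc)
print(nyquist_limit_hz(48_000))    # 24000.0
```

Both the compact disc rate and the 48 kHz rate suggested for oral history leave a small margin above the 20 000 Hz limit of our hearing.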

How does bit-depth relate to what we hear?

“The bit-depth will give us our dynamic range, so the difference between the very lowest sound pressure we can record – our noise floor – and our absolute maximum loudest sound. So with compact discs, at 16-bit, that equates to a dynamic range of about 96 dB or thereabouts. That’s roughly enough to be able to capture a really good dynamic amount of music – like a big dynamic orchestra you’d be able to capture without the inherent noise of the system being louder than the quietest space in the room.”
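
The figure of about 96 dB can be checked with the usual rule of thumb that each bit adds roughly 6 dB. The sketch below is my own and assumes ideal linear PCM, ignoring real-world electronics, which never quite reach the theoretical figure. The function name is hypothetical.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of ideal linear PCM at a given bit depth."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (8, 12, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# 8-bit: 48.2 dB
# 12-bit: 72.2 dB
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```

This also shows why the early 8-bit and 12-bit systems Peter mentioned were so much noisier than today's recorders.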

What about cheap digital recorders, such as office recorders or smart phones? What are they like in terms of sampling frequency and bit-depth?

Example of a cheap office recorder, not suitable for oral history interviews

“With the lower quality recorders, they might have a sampling rate of – a common one is…22 kHz. Some of them even have a sampling rate of about 4 kHz, which means that the highest frequency that it can record at is 2 000 Hz, and I said before about our voice range, being in that 100-150 Hz up to 4000 Hz range. That’s the body of it. That’s the main bit which we really need to hear for clarity. We still generate sounds up above that – those sizzly s’s – the sibilance of those s’s – that’s up to 10 000 Hz. So the recorder inherently is not recording anything above 2 000 Hz, which really makes it sound muffled.”
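
As a rough illustration, the Python sketch below compares a recorder's sampling rate with the two voice figures quoted in this post (4 000 Hz for the body of the voice and 10 000 Hz for sibilance). The names and wording of the results are my own, and the check only looks at the sampling rate: it says nothing about microphone quality, filtering or bit-depth, which matter just as much.

```python
VOICE_BODY_TOP_HZ = 4_000   # upper end of the main voice range quoted above
SIBILANCE_TOP_HZ = 10_000   # upper end of the 's' sounds quoted above


def describe_capture(sample_rate_hz: float) -> str:
    """Say which parts of the voice fit below half the sampling rate."""
    limit_hz = sample_rate_hz / 2
    if limit_hz >= SIBILANCE_TOP_HZ:
        return "body of the voice and sibilance captured"
    if limit_hz >= VOICE_BODY_TOP_HZ:
        return "body of the voice captured, sibilance cut off"
    return "part of the body of the voice cut off"


for rate_hz in (4_000, 8_000, 22_000, 48_000):
    print(f"{rate_hz} Hz: {describe_capture(rate_hz)}")
```

At a 4 kHz sampling rate the limit is 2 000 Hz, so even part of the main body of the voice is missing, which matches the muffled sound Peter describes.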

What problems are caused by low sampling frequencies?

Peter went on to explain that a low sampling frequency:

“…also has another effect of making the audio sound really, really crunchy because the number of slices are much fewer, so it adds a real nasty, ringy sound. Also, there’s a lot of distortion that’s caused around that Nyquist frequency, and that distortion, if it’s sitting right in the middle of the sound that we want to record, i.e. our voices, you really hear that nasty, ringy, crunchy distortion. So that’s why those poorer quality recorders might sound muffled or crunchy.”
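
One well-known mechanism behind this kind of distortion is aliasing, where a tone above half the sampling rate "folds back" and appears as a false, lower tone. Peter did not use that term, so treat the sketch below as my own illustration rather than his explanation. It assumes a pure tone and a recorder with no anti-aliasing filter at all; real devices filter the input to some degree, so the effect is usually less stark. The function name is hypothetical.

```python
def alias_frequency_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling with no input filter."""
    folded = tone_hz % sample_rate_hz
    # Anything above half the sampling rate is mirrored back below it.
    return min(folded, sample_rate_hz - folded)


print(alias_frequency_hz(1_500, 4_000))   # 1500  (below the limit, unchanged)
print(alias_frequency_hz(3_000, 4_000))   # 1000  (folds back into the voice range)
print(alias_frequency_hz(3_000, 48_000))  # 3000  (unchanged at 48 kHz)
```

With a 4 kHz sampling rate, a 3 000 Hz component of the voice would land at 1 000 Hz, right in the middle of the sounds we want to hear clearly.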

Peter made the following very important point about recording at lower sampling frequencies:

“If you’ve not captured anything above 2 000 Hz, you can’t get it back. What you don’t capture you can’t get back. There’s no magic software that will recreate the audio that returns what’s been missed out.”

What are some of the other limitations of cheap recorders?

Aside from their lower sampling frequencies and bit-depth, Peter explained some of the other limitations of cheap recorders that result in poor quality recordings.

1. Microphone quality

“Those little office dictators, or the phone microphones, are made to a budget. So your phone might be a $500 item that’s got to make phone calls, do text, get the internet, have a touch screen so you can play games, have a case so they don’t break, and they need a microphone. And the reason you’ve got the microphone in there is to make telephone calls.

So it’s just one part of the cost of the whole thing. It’s a budget microphone and because they’re designed to make telephone calls, which is voice, they use all of those filtering techniques to filter out background noise, and they’re designed to be quite close to the mouth. So the filtering techniques are filtering out the very low frequencies that we don’t need to hear for telephony, and the very high frequencies which we don’t need to hear, because we know that between 100 Hz and 4 kHz is what we need to hear for basic clarity. Beyond that, the phone companies can throw that stuff away because we don’t really super-need to hear it. So they’re often electronically filtered, and may be designed to not be responsive in those areas. And the same with those little [office] note takers. Once again, the quality of the microphone is just one factor in the whole build of the item. And then they’re also designed as a dictation thing, so they may have some filtering in there.”

2. Built-in microphone

“They might have a built-in microphone and you start to have an issue about where to place it, so if you want to capture both the interviewer and the interviewee the thing to do is plonk it down in between the two of you as opposed to getting right close to the person’s mouth. As the sound-capturing device moves further away from the sound source… the sound that we want to capture gets proportionally quieter compared to the sounds around. So the background sounds get captured louder, effectively.”

Using a single, centrally placed microphone reduces the signal-to-noise ratio, reducing the quality of the recordings. It is much better to use two external microphones placed close to each speaker.
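
To get a feel for how much difference distance makes, here is a small Python sketch based on the inverse-square law. It is my own illustration and rests on simplifying assumptions: the speaker behaves like a point source in an open space with no reflections, and the background noise reaching the microphone stays the same wherever the microphone is placed. Real rooms will differ, and the distances used are only examples.

```python
import math


def level_change_db(near_m: float, far_m: float) -> float:
    """Change in voice level when a microphone moves from near_m to far_m metres."""
    return -20 * math.log10(far_m / near_m)


# Lapel microphone at about 0.2 m versus a recorder on a table 1 m away.
print(f"{level_change_db(0.2, 1.0):.1f} dB")  # -14.0 dB
# Every doubling of distance costs about 6 dB.
print(f"{level_change_db(0.5, 1.0):.1f} dB")  # -6.0 dB
```

If the background noise is unchanged, a voice that arrives 14 dB quieter means a signal-to-noise ratio that is 14 dB worse.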

3. Filtering effects and dynamic range limiters

“Another thing that those sorts of [recorders] will do, is they’ll use dynamic range limiters that will squish the sound into a very small dynamic range. I talked before about the noise floor and the ceiling. So basically they’ve got a very poor signal-to-noise ratio, with the signal being the thing that we want to capture and the noise being the thing that’s inherent in any of these [devices]. So they will use electronics to squish the sound into a smaller dynamic range, from the quietest sound that the sound that we want to capture is making, to the loudest, so that it doesn’t cause additional electronic distortion from the unit – because they make enough distortion as it is. But the effect of that is, that also makes the really quiet sounds louder as well, compared to the sound we want to have. As it’s squishing the signal that we want down, it’s also effectively turning up all the background sound. So once that’s happened, it’s hard to get rid of those sounds because they’re already in the recording.”
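
The effect Peter describes can be sketched with a very simple model of a dynamic range compressor. This is my own simplified illustration, not a description of how any particular recorder works: the threshold, ratio and make-up gain values are hypothetical, and real devices also react over time rather than level by level.

```python
def compress_db(level_db: float, threshold_db: float = -30.0,
                ratio: float = 4.0, makeup_db: float = 15.0) -> float:
    """Output level of a simple compressor for a given input level (in dB)."""
    if level_db > threshold_db:
        # Levels above the threshold are squeezed by the ratio.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    # Make-up gain then lifts everything, including the background noise.
    return level_db + makeup_db


loud_voice_db, background_db = -6.0, -50.0
print(loud_voice_db - background_db)                            # 44.0 dB gap before
print(compress_db(loud_voice_db) - compress_db(background_db))  # 26.0 dB gap after
```

In this example the gap between the loudest speech and the background shrinks from 44 dB to 26 dB, so the background is much more noticeable in the finished recording.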

Key points to know about choosing digital audio equipment for interviewing

Example of a lapel microphone suitable for oral history recordings. This one is made by Rode.

If you are looking for equipment to use for oral history interviews, the key points are:

Choose a high quality digital recorder that enables you to record at a minimum of 24-bit bit-depth and 48 kHz sampling frequency.
Choose a recorder that enables you to plug in two external microphones separately (with separate gain controls) to make a stereo recording.
Make sure the recorder can record as uncompressed WAV files.
Use high quality external microphones that have a wide frequency response range to record the whole range of the human voice.
Do not use cheap office recorders or smart phones, for all the reasons explained above. If the promotional material for the device does not mention any of the technical details (such as bit-depth or sampling frequency), beware! Information about good quality recorders of the type used in oral history will include all the necessary technical information.
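
If you already have a recording and want to check that it meets the minimum of 24-bit and 48 kHz suggested above, Python's standard `wave` module can read the details from a WAV file. The sketch below is my own; the function and constant names are made up for illustration, and the small helper that builds a silent test file exists only so the example can run without a real recording. Note that the `wave` module reads plain PCM WAV files, and some recorders write header variants that it may reject with an error.

```python
import io
import wave

MIN_BIT_DEPTH = 24            # minimum suggested in this post
MIN_SAMPLE_RATE_HZ = 48_000   # minimum suggested in this post


def wav_meets_minimum(source) -> bool:
    """Check a WAV file (a path or a binary file object) against the minimums."""
    with wave.open(source, "rb") as wav:
        bit_depth = wav.getsampwidth() * 8
        return bit_depth >= MIN_BIT_DEPTH and wav.getframerate() >= MIN_SAMPLE_RATE_HZ


def make_silent_wav(bit_depth: int, sample_rate_hz: int,
                    channels: int = 2, frames: int = 100) -> io.BytesIO:
    """Build a short silent WAV in memory, purely for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bit_depth // 8)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(bytes(frames * channels * (bit_depth // 8)))
    buffer.seek(0)
    return buffer


print(wav_meets_minimum(make_silent_wav(24, 48_000)))  # True
print(wav_meets_minimum(make_silent_wav(16, 44_100)))  # False
```

With a real recording you would pass the file name instead, for example `wav_meets_minimum("interview.wav")`, where the file name is only a placeholder.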

With thanks to Peter Kolomitsev for his insights into digital audio equipment!

For more help on choosing digital audio equipment for oral history interviews, check this post.

Quotes from the oral history interview with Peter Kolomitsev used with permission of Peter Kolomitsev.