What is the quality threshold for recordings sounding better/truer?
I have a couple of "audiophile" friends who either obsess over vinyl or over lossless FLAC files, invest a lot of money in special headphones and audio systems, and swear by the massive difference it makes when listening. There's an obvious difference between cheap earbuds and quality speakers, but when it comes to recording quality I've seen it claimed that, for MP3, no one can tell the difference above about 192 kbps.
That experiment is a little informal. Have there been proper studies to determine how much detail humans can distinguish in audio and, more specifically, what is the quality threshold at which people can no longer distinguish two recordings of the same music? As Franck Dernoncourt pointed out, bitrate corresponds to different quality levels depending on the file format and encoder, but I'd be happy with any info using a clearly defined format or quality metric.
This is related to one of our definition questions. The pros and cons of vinyl have been discussed on the Skeptics site and the differences between some file formats have been discussed on the Music site, so I'd like to focus more on bitrate and how the human ear works.
Interesting question. I cannot tell the difference once the bitrate is higher than 192 kbps. But I'm not sure this is a music question. Are the workings of the human ear not more suited to a site about biology?
@CarlH Maybe my title was still misleading. I'm not looking for a biological explanation, or any explanation really, just a statistical observation. As a Music Fan I'd like to know whether I'd be wasting my time seeking out super high quality files, or if it would improve my experience, etc.
@AndréStannek Please post your answers below. Comments are not designed for this, thanks.
@RobertCartaino thought it was more suitable for a comment since it was just my opinion. Anyway it's kind of hard to post it as an answer now, since it was deleted ;-)
As some folk have already explained, the answer to this question depends first of all on the encoding format and the equipment used to reproduce the sound. But beyond that, it also depends on the individual's ability to perceive the sound. Hearing degrades with age; sounds you could hear at 15 may be totally inaudible at 50. And of course there is the music itself. I listen mostly to progressive rock, and the difference from 192 kb/s MP3 to CD is pretty evident on the HiFi. Were I to fancy another music genre, it could be different.
@LuísdeSousa All true, what you say about the source material. I think the presumption is that we're considering cases where the reproduction equipment and the listener's hearing are not limiting factors. I was interested to read, though, that there are differences in the quality of decoders: http://mp3decoders.mp3-tech.org/objective.html
Unfortunately there seem to have been a large number of informal experiments, with various shortcomings such as using a limited range of source material, focusing on preference without also testing the ability to tell a difference, or making the files available to test subjects beforehand (removing some or all of the 'blind' quality from the test) - but relatively few published scientific studies.
Most lossy compression algorithms I know of are based on perceptual coding - a nicely detailed paper describing some of the principles (in the context of the development of PASC, an early technique) is at http://www.minidisc.org/MaskingPaper.html.
A few points from that paper:
- The ear divides the sounds it receives into frequency bands, called 'critical bands'. Quiet sounds can be 'masked' by other, louder sounds in the same critical band. (This is the opportunity for data reduction: if we can't hear a quieter sound, we don't need to encode it.)
- "There is no agreement as to the specific number of critical bands active simultaneously. Critical bands and their center frequencies are continuous, as opposed to having strict boundaries at specific frequency locations."
- We have many more critical bands in the lower frequency range.
- In general, low sounds mask higher sounds.
- Sometimes a signal can be masked by a sound preceding it, called forward masking, or even by a sound following it, called backward masking.
- The masking threshold of a signal entering one ear can be raised by a masker entering the other ear.
- Spatial location can have a negative effect on masking.
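To get a feel for that critical-band scale, here's a small sketch using Zwicker's well-known Bark approximation (one of several models; as the paper notes, there's no single agreed set of band edges, so treat this as illustrative only):

```python
import math

def bark(f_hz):
    """Zwicker's approximation of the Bark critical-band scale.

    Equal steps on the Bark scale correspond roughly to one critical band,
    so the same span in Hz crosses many more bands at low frequencies
    than at high ones.
    """
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)

# A 100 Hz span near the bottom of the range crosses roughly a full band...
low_span = bark(200) - bark(100)
# ...while the same 100 Hz span around 10 kHz crosses only a tiny fraction of one.
high_span = bark(10100) - bark(10000)
print(low_span > 10 * high_span)  # True
```

That asymmetry is exactly the "many more bands in the lower frequency range" point above, and it's part of why encoders can discard so much high-frequency detail.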
Thinking about all those in combination, it's clear that the characteristics of the audio we're encoding are going to have a huge influence on the amount of data we can throw away and still keep the result subjectively indistinguishable from the original (or at least, not subjectively worse). So, even taking into account file format and compression algorithm, another huge variable is the song we're encoding.
This means that as a general rule, taking a rough average of the results from the 'unscientific' tests we've read about may give as good an answer to your question as we can hope for without considering a particular song.
For what it's worth, my personal impressions, for music, on average:
- 128 kbps is usually just about good enough to enjoy the song, but is almost always noticeably worse than the original
- 192 kbps is always good enough to enjoy the song, and sometimes indistinguishable from the original (or at least, not subjectively worse)
- 320 kbps is close enough that I wouldn't bet money on being able to distinguish it from the original in most cases (though I would bet I could find some particular tracks where I can)
On yet another personal note: I listen to music through playlists, so I'm not usually aware what file format or bitrate I'm hearing. Sometimes I catch a bit of a song and think "wow, that sounds good!" and check the file - very often (85% of the time?) it's lossless or 320 kbps; just occasionally it's 192. It's never less.
Those caveats regarding masking seem very important to factor in. Thinking about it, I would tend to agree that rough results may be the best we can get. Thanks for the information!
The "best"/"optimal" bitrate threshold depends on the audio coding format, since the bitrate is simply the number of bits conveyed or processed per unit of time (typically per second). E.g. if you take a FLAC file and convert it into a 44.1 kHz 16-bit two-channel WAV file, the latter will have a higher bitrate but will sound the same. It also depends on the loudspeakers, as you mentioned, and on other factors such as the song itself (e.g. rock vs. EDM).
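As a quick sanity check on that WAV example: uncompressed PCM bitrate is just sample rate × bit depth × channels, so CD-quality stereo works out to about 1411 kbps - far above any common MP3 bitrate, yet carrying no more audible information than the lossless FLAC it came from:

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

# CD-quality WAV: 44.1 kHz, 16-bit, stereo
print(pcm_bitrate_kbps(44100, 16, 2))  # 1411.2
```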
Here are some examples of frequency analysis using Audacity that show the impact of the bitrate:
MP3 320 kbps from Beatport:
MP3 128 kbps from YouTube:
AAC 256 kbps from iTunes (they used to be 128 kbps!):
Frequency analysis is just one of many aspects to investigate the sound quality.
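If you want to reproduce this kind of check without Audacity, here's a rough sketch with NumPy (not the actual analysis above - it uses synthetic white noise and a crude brick-wall filter standing in for an encoder, since low-bitrate MP3 encoders typically discard content above roughly 16 kHz):

```python
import numpy as np

fs = 44100                                   # sample rate, Hz
rng = np.random.default_rng(0)
full = rng.standard_normal(fs)               # 1 second of white noise
freqs = np.fft.rfftfreq(fs, d=1 / fs)        # frequency bins, 0..fs/2 Hz

# Crude stand-in for a 128 kbps encoder: zero everything above 16 kHz
spec = np.fft.rfft(full)
spec[freqs > 16000] = 0
lowpassed = np.fft.irfft(spec, n=fs)

def energy_above(signal, cutoff_hz):
    """Spectral energy above a cutoff frequency."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    return power[freqs > cutoff_hz].sum()

# The "encoded" signal has essentially no energy left above the cutoff -
# which is exactly the sharp shelf you see in the frequency plots.
print(energy_above(lowpassed, 16000) < 1e-6 * energy_above(full, 16000))  # True
```

Real encoders use perceptual models rather than a hard cutoff, but the visible shelf in the plots is a quick fingerprint of the bitrate.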
From personal experience discussing with several EDM DJs, for club use the consensus was MP3 320 kbps or above. If need be, 192 and 256 kbps are usable too, but avoid them if possible. Don't use anything below.
Some other ideas: When does playing 320 Kbps MP3 instead of FLAC matter?
This is excellent info and I obviously need to tweak my question to account for bitrate across different file formats. However, I don't think this actually provides an answer to the question -- the threshold where increased quality makes no difference to the listener. The answers to your related question don't really seem to have any science unfortunately.
There's some weird stuff going on below 100 Hz in all those examples - makes me doubt the accuracy of the rest of the data.
I think these graphs need to be compared to a plot of frequency response for a typical human ear.