Spotify Audio Quality: A Scientificish Analysis

Preface

There’s a lot of noise out there about the audio quality of the music provided through the Spotify streaming service. This “article” aims to clarify some concepts and common misconceptions, and show a cold hard comparison of the raw audio data from a CD and from Spotify. Listening tests have also been performed and will be discussed. Questions, suggestions and corrections are very welcome.

Background

There is some controversy about which audio format is “the best”, but a great majority of the sane subset of the civilized population of the earth argues that it’s either a digital data source with sufficient resolution, or some other crazy shit like those old darn vinyls. Often, it boils down to a differing definition of “good sound”. Usually, people want the sound from the original source(s) to be captured, stored and replayed as correctly as possible. If you, on the other hand, like music with crackling or pleasant distortion, then maybe you should go for vinyl. With vinyl, you’ll also end up with an orgy of retro design, and the opportunity to endlessly show off to your friends. Digital sources are generally more stable and predictable, and although streaming your favorite music through your internet connection might seem unromantic, it is both convenient and advantageous.

Audio on ordinary CDs is encoded in conformity with the Red Book standard. It has two channels (stereo), a sample rate of 44.1 kHz, and a 16-bit sample resolution. In short: Some really smart guys designed the Red Book standard, and they were not fucking around. I’m not saying the CD is perfect, but it’s damn good.

Spotify.com claims to be serving audio in compressed Ogg Vorbis format at a bitrate of 320kbit/s (premium subscription). Much can be said about this format, and compression in general, but for now I’ll just promise you it’s pretty great. An all too common mistake is to confuse DRC (Dynamic Range Compression, aka “compression”) with audio data compression (also aka “compression”). In most hi-fi settings, the former is usually a bitch. The latter is applied to increase convenience of transmission and/or storage, and often does compromise quality. Although DRC is sometimes used in music production and broadcasting, it has little to do with hi-fi and conservation of audio quality, because it rarely arises unless someone intentionally applies it. Please note that data compression does not always mean loss of quality (it’s scary how many people do not know, or refuse to believe, this). When compressing with a lossless codec such as FLAC, all the audio samples are preserved perfectly.
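To make the lossless point concrete, here is a minimal Python sketch. Mind the assumption: it uses zlib, a general-purpose lossless compressor from Python's standard library, as a stand-in for FLAC (which is tuned for audio and is not used here). The test tone and the variable names are made up for illustration. The only point is that what comes out is bit-for-bit what went in.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44100

# One second of a 440 Hz tone as 16-bit samples (an illustrative test signal).
samples = [int(20000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
           for i in range(SAMPLE_RATE)]

# Pack the samples as little-endian signed 16-bit PCM, like the data in a WAV file.
pcm = struct.pack("<%dh" % len(samples), *samples)

# zlib is NOT FLAC; it is only a stand-in to demonstrate lossless behaviour.
compressed = zlib.compress(pcm, 9)
restored = zlib.decompress(compressed)

identical = restored == pcm                  # True for any lossless codec
size_ratio = len(compressed) / len(pcm)      # how much smaller the data got
```

A lossy codec like Ogg Vorbis would fail the `identical` check by design, which is exactly the trade it makes for a lower bitrate.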

My experience is that some people complain about the utterly “flat” or “horrible” sound from Spotify. Why would the people behind Spotify choose to degrade the quality when it’s so easy to preserve? We all know that they are compressing it for convenience, but are there any reasons to believe that any potential audible imperfections in the sound from Spotify are anything other than the rather well-known compression artifacts of Ogg Vorbis?

Test setup

I ripped the 8th track from Kari Bremnes’ album Svarta Bjørn to uncompressed WAV, using Exact Audio Copy. If you have any objections to my approach to obtaining a clean copy, please read about Cyclic Redundancy Check before opening your pie hole.
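For the curious, the idea behind a CRC check can be sketched in a few lines of Python. This only illustrates the principle and is not Exact Audio Copy's actual implementation; the byte strings are hypothetical stand-ins for several reads of the same track.

```python
import zlib

# Hypothetical stand-ins for the raw audio bytes from two reads of one track.
first_read = bytes(range(256)) * 64
second_read = bytes(first_read)

# A third read with a single flipped bit, simulating a read error.
damaged = bytearray(first_read)
damaged[1000] ^= 0x01
damaged_read = bytes(damaged)

crc_first = zlib.crc32(first_read)
crc_second = zlib.crc32(second_read)
crc_damaged = zlib.crc32(damaged_read)

reads_agree = crc_first == crc_second       # identical data, identical checksum
error_detected = crc_first != crc_damaged   # a single-bit error changes the CRC
```

When independent reads of the same track produce the same checksum, it is very unlikely that the rip contains read errors.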

I used the official Spotify client for Windows XP, and a third-party application called Replay Music to capture the audio data from Spotify. This application allegedly captures the audio digitally, so it should be rather safe to assume that it is a lossless process. If you have a premium subscription and want to try out high bitrate mode yourself, remember to enable it in the preferences in the Spotify client.

Because the CD contains uncompressed audio data, and Spotify is delivering compressed audio, significant differences in the waveforms were expected. As an experiment, I also compressed the CD audio myself to compare it with Spotify’s compressed version.

Comparison

Once the raw data from the sources is conveniently stored in WAV files, the comparison can begin. GNU Octave and Steinberg Nuendo were used to compare, analyze and manipulate the data. A custom application was also written to browse and analyze the waveforms.

A first glance – Standard bitrate

After plotting the left channel from the two sources, significant differences in amplitude and timing became visible (Fig 1). There could be many reasons for the timing difference, but I did not find it relevant to investigate it. The Spotify track could easily be distinguished from the original when doing plain listening tests, because it sounded much louder. I found this loudness difference rather strange. After matching the timing, I computed the RMS level of each channel to determine the difference, and adjusted the gain accordingly. This method is not the perfect way of doing loudness matching, but it produced a good result (See: http://en.wikipedia.org/wiki/Loudness).

Fig 1: Different timing and amplitude found. Blue = CD, Green = Spotify (standard quality)
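The timing and level matching described above can be sketched roughly like this in Python. To be clear about the assumptions: the article does not say how the timing was matched, so the brute-force cross-correlation below is just one plausible way to do it, and all function names and the synthetic test signal are hypothetical.

```python
import math
import random

def rms(samples):
    """Root mean square of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def best_lag(reference, other, max_lag):
    """Find the lag where other[i + lag] lines up best with reference[i]."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]   # plain cross-correlation
        if score > best_score:
            best, best_score = lag, score
    return best

def align_and_match(reference, other, max_lag):
    """Crop both signals to their overlap and scale `other` to the same RMS."""
    lag = best_lag(reference, other, max_lag)
    start_ref, start_other = max(0, -lag), max(0, lag)
    length = min(len(reference) - start_ref, len(other) - start_other)
    ref_cut = reference[start_ref:start_ref + length]
    other_cut = other[start_other:start_other + length]
    gain = rms(ref_cut) / rms(other_cut)
    return ref_cut, [s * gain for s in other_cut], lag, gain

# Synthetic stand-ins: "cd" is noise, "spotify" is the same signal,
# delayed by 5 samples and 1.2 times louder.
rng = random.Random(42)
cd = [rng.uniform(-1.0, 1.0) for _ in range(200)]
spotify = [0.0] * 5 + [1.2 * s for s in cd]

ref_cut, matched, lag, gain = align_and_match(cd, spotify, max_lag=20)
```

RMS matching only equalizes average signal power, which is why it is a rough stand-in for perceived loudness rather than a proper loudness measure.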

After cropping and scaling, the waveforms could be compared. An evident difference was expected, since compression significantly changes the waveform. It is, however, hard to tell whether the differences visible in the plot are audible. Remember, the effects of lossy compression are an old story. If you want to compare compressed/uncompressed audio yourself, you can check out the audio samples on Xiph.Org Foundation’s webpage. Fig 2 shows a comparison of a few audio waves from the CD and from Spotify. The figure hints at a loss of high-frequency content, which is not uncommon as a result of compression. However, I can only speculate about its audible severity.

Fig 2: Samples 100000 to 100500. Blue = CD, Green = Spotify

High bitrate

In terms of loudness, the result was very similar to the standard definition Spotify audio. When loading the files in Nuendo, the differences in the waveforms became even more apparent. The Spotify tracks were visibly louder, and when reduced to the CD level, they looked clipped/limited. The average RMS for the Spotify 320kbit track was -14.02/-14.18dB, while the CD had an average RMS of -15.66/-15.79dB.

Fig 3: Nuendo showing the waveforms of the different audio tracks. From top to bottom: CD, Ogg320kbit, OggQ5, SpotifyHQ, Spotify
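To put the RMS figures above in perspective, here is a small Python sketch that turns them into a level difference and a linear gain factor. The helper names are hypothetical, and `rms_dbfs` assumes samples scaled so that full scale is 1.0; I can't promise that Nuendo uses exactly the same reference.

```python
import math

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (assumes full scale = 1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def db_to_gain(db):
    """Convert a level difference in dB to a linear amplitude factor."""
    return 10 ** (db / 20)

# The two per-channel RMS values reported in the text, in dB.
cd_rms_db = (-15.66, -15.79)
spotify_rms_db = (-14.02, -14.18)

# How much louder the Spotify capture is, per channel.
differences_db = [s - c for s, c in zip(spotify_rms_db, cd_rms_db)]
gain_factors = [db_to_gain(d) for d in differences_db]
```

The differences come out at roughly 1.6 dB per channel, or an amplitude factor of about 1.2, which is small on paper but enough to bias a listening test toward the louder source.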

It turns out Spotify has a volume normalization feature that tries to keep a consistent volume between tracks. This is actually a form of DRC. Understandably, this rather intrusive feature was not well received by some users, and was made optional in February 2009. To improve my comparison I decided to disable volume normalization.

Assuming Spotify and I are getting our audio from the same clean source, using the same compression codec should produce the same, or very similar, output. I therefore compressed the audio data from the CD with Ogg Vorbis 320kbit and compared the output to the audio obtained from Spotify. Fig 4 shows a segment of the waveforms of the same track obtained in five different ways. With volume normalization disabled, I found that the waveforms from my compressed file, the CD and Spotify are very similar with regard to both shape and amplitude.

Fig 4: Waveforms from five different sources. Red=CD, Green=Spotify, Yellow=SpotifyHQ, White=SpotifyHQ(normalization disabled), Blue=Ogg Vorbis 320kbit

Frequency spectrum analysis

When compressing audio, it is not uncommon to discard inaudible details of the wave, not necessarily preserving all frequencies equally. Because of this, it would come as no surprise to me if the frequency spectrum differed slightly between my sources. However, some of the results were a bit alarming. Figures 5 and 6 show the frequency analysis of a track obtained from a CD and from Spotify. They show a knife-sharp frequency cutoff in the Spotify audio. When I compressed the CD audio myself, the frequency spectrum hardly changed at all. However, when doing the same thing with two other tracks, the graphs looked just fine. Figures 7 and 8 show the analysis of one of these tracks. When comparing the waveforms of those last two tracks with the CD audio, the results were even more similar than I had seen previously (see Figure 9). I do not have any good theories for why the first track gave a strange result, and not the other two.

Fig 5: Frequency spectrum analysis of Kari Bremnes, Svarta Bjørn, track 8, CD audio

Fig 6: Frequency spectrum analysis of Kari Bremnes, Svarta Bjørn, track 8, Spotify high bitrate mode, no normalization

Fig 7: Frequency analysis of A-ha - Foot Of The Mountain captured from the CD

Fig 8: Frequency analysis of A-ha - Foot Of The Mountain captured from Spotify

Fig 9: A-ha - Foot Of The Mountain, waveform Red=CD, Green=Spotify
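A frequency cutoff like the one discussed in this section can be looked for with a short Python sketch. This is not the analysis tool I used for the figures; it is a minimal illustration with a hand-rolled radix-2 FFT, a Hann window, and synthetic test tones, and every name and parameter in it (including the -60 dB floor) is an assumption.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def magnitude_spectrum(samples):
    """Magnitudes of the bins from 0 Hz up to just below half the sample rate."""
    n = len(samples)
    # Hann window to limit spectral leakage.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    return [abs(c) for c in fft(windowed)[: n // 2]]

def highest_active_frequency(samples, sample_rate, floor_db=-60.0):
    """Highest frequency whose level is within floor_db of the spectral peak."""
    mags = magnitude_spectrum(samples)
    threshold = max(mags) * 10 ** (floor_db / 20)
    top_bin = max(k for k, m in enumerate(mags) if m >= threshold)
    return top_bin * sample_rate / len(samples)

SAMPLE_RATE = 44100
N = 1024
LOW_BIN, HIGH_BIN = 23, 418   # roughly 990 Hz and 18 kHz

def tone(bin_index, amplitude):
    return [amplitude * math.sin(2 * math.pi * bin_index * i / N)
            for i in range(N)]

# "full_band" keeps its high-frequency content; "low_passed" has lost it.
full_band = [a + b for a, b in zip(tone(LOW_BIN, 0.5), tone(HIGH_BIN, 0.1))]
low_passed = tone(LOW_BIN, 0.5)

full_top = highest_active_frequency(full_band, SAMPLE_RATE)
cut_top = highest_active_frequency(low_passed, SAMPLE_RATE)
```

For real music you would average the spectrum over many frames before reading off a cutoff, since a single 1024-sample frame says very little about a whole track.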

Blind listening tests

As a final test, I wanted to see if I could hear the differences between the sources. When doing listening tests, high-end equipment is a must. Lucky me was allowed to use my uncle’s top notch hardware. I used a pair of Sony MDR 7506 and a pair of Sennheiser HD 650 headphones, and a Benchmark DAC1 digitally connected to a PC with a coaxial cable. We did blind tests where we switched between two sources whenever we wanted, not knowing which was which. Since this is one setup, two guys, one track, four ears, I’ll subjectively and unscientifically sum it up:

To hear the difference between the sources was:

  • CD versus Spotify: very hard, but doable
  • CD versus Spotify high bitrate: we gave up

Conclusion (Speculations and opinions)

Spotify’s standard definition audio sounds good, really good, but if you are into hi-fi I really think high bitrate is the way to go. I’m a sucker for good sound, but I think I’ll always be happy with 320kbit Ogg Vorbis. I strongly doubt that I’m able to hear the difference from a CD, and if you think that you can, you’re probably not very familiar with the placebo effect (do BLIND listening tests before you decide to disagree!).

CDs have a bitrate of 1,411.2kbit/s. That’s 4.4 times more than 320kbit/s. With FLAC lossless compression, the data rate can usually be reduced to 50-60% of the original, which would make CD quality audio end up somewhere around 706-847kbit/s. That is just 2-3 times more than the current bitrate. I think Spotify could easily make that small stretch to please our worried ears. Personally, I don’t care that much because I’m a big fan of Ogg Vorbis.
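The arithmetic behind those numbers is easy to check with a few lines of Python. The CD parameters are the Red Book values mentioned earlier; the 50-60% FLAC range is my rough estimate from above, not a measured figure, and the variable names are just for illustration.

```python
# Red Book CD audio parameters.
CHANNELS = 2
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16

# Uncompressed CD bitrate in kbit/s: channels * samples per second * bits per sample.
cd_kbps = CHANNELS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000   # 1411.2

spotify_kbps = 320
cd_vs_spotify = cd_kbps / spotify_kbps                         # about 4.4

# Assumed FLAC compression range of 50-60% of the original size.
flac_low_kbps = 0.5 * cd_kbps                                  # about 706
flac_high_kbps = 0.6 * cd_kbps                                 # about 847

# How far lossless streaming would be from the current 320 kbit/s.
flac_vs_spotify = (flac_low_kbps / spotify_kbps,
                   flac_high_kbps / spotify_kbps)              # about 2.2 to 2.6
```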

And remember: consider turning off volume normalization!


A.wav
B.wav