Mean Opinion Score
In multimedia (audio, voice telephony, or video), especially when codecs are used to reduce the bandwidth a signal requires (for example, a digitized voice connection that would otherwise use standard 64 kbit/s PCM), the mean opinion score (MOS) provides a numerical indication of the perceived quality of received media after compression and/or transmission. The MOS is expressed as a single number in the range 1 to 5, where 1 is the lowest perceived quality and 5 is the highest.
MOS tests for voice are specified by ITU-T recommendation P.800.
The MOS is generated by averaging the results of a set of standard, subjective tests in which a number of listeners rate the audio quality of test sentences read aloud by both male and female speakers over the communications medium being tested. Each listener gives each sentence a rating using the following scheme:
MOS | Quality | Impairment |
---|---|---|
5 | Excellent | Imperceptible |
4 | Good | Perceptible but not annoying |
3 | Fair | Slightly annoying |
2 | Poor | Annoying |
1 | Bad | Very annoying |
The MOS is the arithmetic mean of all the individual scores, and can range from 1 (worst) to 5 (best).
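The averaging step can be illustrated with a minimal sketch. The function name `mean_opinion_score` and the sample ratings are hypothetical and chosen only for illustration; P.800 itself specifies the test procedure, not any code.

```python
# Category labels from the five-point scale in the table above.
QUALITY_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Return the arithmetic mean of individual ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in QUALITY_LABELS:
            raise ValueError(f"rating {r!r} is not on the 1-5 scale")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from eight listeners for one test condition.
ratings = [4, 5, 3, 4, 4, 3, 5, 4]
print(f"MOS = {mean_opinion_score(ratings):.2f}")  # prints "MOS = 4.00"
```

Because each rating is an integer category, the MOS is usually fractional, which is why values such as 4.14 or 3.92 appear in practice.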
Compressor/decompressor (codec) systems and digital signal processing (DSP) are commonly used in voice communications, and can be configured to conserve bandwidth, but there is a trade-off between voice quality and bandwidth conservation. The best codecs provide the most bandwidth conservation while producing the least degradation of voice quality. Bandwidth can be measured quantitatively, but voice quality requires human interpretation, although estimates of voice quality can be made by automatic test systems.
A similar process can be used to evaluate subjective video quality.
As an example, the following are mean opinion scores for one implementation of different codecs:
Codec | Data rate [kbit/s] | Mean opinion score (MOS) |
---|---|---|
G.711 (ISDN) | 64 | 4.3 |
iLBC | 15.2 | 4.14 |
AMR | 12.2 | 4.14 |
G.729 | 8 | 3.92 |
G.723.1 r63 | 6.3 | 3.9 |
GSM EFR | 12.2 | 3.8 |
G.726 ADPCM | 32 | 3.8 |
G.729a | 8 | 3.7 |
G.723.1 r53 | 5.3 | 3.65 |
GSM FR | 13 | 3.5 |
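One way to read the table in terms of the quality/bandwidth trade-off is to look for codecs that no other codec beats on both axes. The following is a rough sketch using a subset of the rows above; the helper names `dominates` and `pareto_front` are hypothetical, and MOS values from a single implementation should not be treated as a definitive ranking.

```python
# (codec, data rate in kbit/s, MOS) for a subset of rows from the table above.
CODECS = [
    ("G.711 (ISDN)", 64.0, 4.3),
    ("iLBC", 15.2, 4.14),
    ("AMR", 12.2, 4.14),
    ("G.729", 8.0, 3.92),
    ("G.723.1 r63", 6.3, 3.9),
    ("G.726 ADPCM", 32.0, 3.8),
    ("G.723.1 r53", 5.3, 3.65),
]


def dominates(a, b):
    """True if a needs no more bandwidth and scores no lower than b,
    and is strictly better on at least one of the two."""
    _, rate_a, mos_a = a
    _, rate_b, mos_b = b
    return (rate_a <= rate_b and mos_a >= mos_b
            and (rate_a < rate_b or mos_a > mos_b))


def pareto_front(codecs):
    """Keep only the codecs that no other codec dominates."""
    return [c for c in codecs
            if not any(dominates(other, c) for other in codecs)]


for name, rate, mos in pareto_front(CODECS):
    print(f"{name}: {rate} kbit/s, MOS {mos}")
```

With these rows, iLBC and G.726 ADPCM drop out because AMR reaches an equal or higher score at a lower data rate; the remaining codecs each trade some quality for bandwidth.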
A drawback of obtaining MOS ratings is that the process can be time-consuming and expensive, because it requires recruiting people to carry out the assessments. When a voice coding system is under development, or a developer has to test and compare several audio systems, a method for making a quick check is very useful.
Some suitable English-language phrases used for determining a MOS as suggested by ITU-T recommendation P.800 are:
- You will have to be very quiet.
- There was nothing to be seen.
- They worshipped wooden idols.
- I want a minute with the inspector.
- Did he need any money?
See also
- Subjective video quality
- MUSHRA, ITU-R recommendation BS.1534
- PSQM Perceptual Speech Quality Measure (ITU-T P.861, withdrawn and replaced by PESQ, ITU-T P.862)
- PESQ Perceptual Evaluation of Speech Quality, a mechanism for the automated assessment of the speech quality experienced by the user of a telephone system. It is standardised as ITU-T recommendation P.862 (02/01).
- PEVQ Perceptual Evaluation of Video Quality, a measurement algorithm for the automated assessment of video quality.
- PEAQ Perceptual Evaluation of Audio Quality, a measurement algorithm for the automated assessment of audio quality.