If you look around one of these facilities and try to find an audio fader or a video switcher, you will be disappointed because there aren't any. Nor is there anyone there to control a switcher or fader. In modern facilities, staff manage data stored on massive servers with many terabytes of hard drive capacity. Programmed schedulers run the playout for hundreds of TV channels. The monitoring that is carried out primarily checks the basic condition of the running video and audio. There isn't any quality monitoring in place.
In such an environment, how does one ensure that certain quality standards for audio and video are being maintained for each dedicated transmission or distribution network? This is a delicate issue because, in an automated facility, there is a need to achieve optimum results while keeping human effort to a minimum.
In terms of picture content, this isn't too difficult because throughout the production chain, the video signal is subject to many dedicated controls that keep it within legal transmission requirements. Video format conversion is commonplace, and various levels of performance are available. In the digital domain, video is delivered as compressed data, and the compression devices apply a number of filters and algorithms to guarantee the best possible signal quality within an acceptable bandwidth for the transport stream.
However, when it comes to audio, things are different. Of course, there is technology on the market that is designed to convert the audio into the format required by the transport stream and to meet the technical specifications needed to guarantee the best sonic performance. But that is where the similarity stops because, with audio, there is no common overall technical specification that is designed to check or legalize the content.
Audio engineers do have the benefit of some technical recommendations, but sometimes these don't solve the problem. For example, if audio is coded into the digital domain, the highest possible level is 0dBFS. But that basic recommendation doesn't really reflect the wide variety of practical ways in which one can deal with digital audio. Take, for instance, CD audio. At 16-bit resolution, CD audio uses almost all of the available coding space (a theoretical dynamic range of about 96dB). This means that if a broadcaster is getting music from a CD, it will typically have been mastered to reach 0dBFS as its maximum value. That's very different from a typical broadcast signal, which now uses -18dBFS or -20dBFS as its alignment level in order to keep it in line with the recommendations issued by international bodies such as the ITU, the EBU and ATSC. What this means for broadcasters is that audio content coming from different sources can arrive at very different levels. Content from CD audio can be more than twice as loud as audio coming from a standard TV broadcast.
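The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the rule that a 10dB increase is heard as roughly a doubling of loudness is a common rule of thumb assumed here, not a measurement method.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


def amplitude_ratio(level_difference_db: float) -> float:
    """Convert a level difference in dB into a linear amplitude ratio."""
    return 10 ** (level_difference_db / 20)


def approx_loudness_ratio(level_difference_db: float) -> float:
    """Rough perceived-loudness ratio.

    Assumption: every 10 dB is heard as about a doubling of loudness.
    """
    return 2 ** (level_difference_db / 10)


if __name__ == "__main__":
    # 16-bit audio: about 96 dB of theoretical dynamic range.
    print(f"16-bit dynamic range: {dynamic_range_db(16):.1f} dB")

    # Gap between full scale (0 dBFS) and the two alignment levels in the text.
    for alignment_dbfs in (-18.0, -20.0):
        gap_db = 0.0 - alignment_dbfs
        print(
            f"gap to {alignment_dbfs:.0f} dBFS: {gap_db:.0f} dB, "
            f"amplitude x{amplitude_ratio(gap_db):.1f}, "
            f"loudness roughly x{approx_loudness_ratio(gap_db):.1f}"
        )
```

Under these assumptions, an 18dB to 20dB gap corresponds to roughly a three- to fourfold difference in perceived loudness, which is consistent with the "more than twice as loud" figure above.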
Differences in program loudness are painful
One thing we are all realizing as an industry is that the level of audio sources used in broadcast transmission can vary wildly. Of course, nothing is transmitted that hasn't already been processed and quality-controlled, but we still have a number of issues to contend with relating to audio levels and audio control.
Over the last 10 years, broadcasters have begun to understand that technically oriented level control doesn't necessarily solve their problems when it comes to delivering better-quality audio. Audio loudness is now a hot topic, and many articles have been published that discuss the issue and give the background to it. In simple terms, all audio sources that have been processed to control loudness should deliver the same overall loudness impression. And proper loudness control is definitely improving the quality of audio in digital broadcasting systems.
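The idea of giving every source the same overall loudness impression can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the source names and measured loudness values are invented, the loudness figures are assumed to come from some separate measurement step that is not shown, and the target of -23 is simply passed in as a parameter (it matches the target level of EBU R 128, but any target could be used).

```python
def normalization_gain_db(measured_loudness: float, target_loudness: float) -> float:
    """Static gain offset (in dB) that moves a source onto the target loudness."""
    return target_loudness - measured_loudness


def apply_gain(samples: list[float], gain_db: float) -> list[float]:
    """Scale linear samples by a gain expressed in dB."""
    factor = 10 ** (gain_db / 20)
    return [sample * factor for sample in samples]


if __name__ == "__main__":
    # Hypothetical program loudness per source, measured elsewhere.
    measured = {"cd_music": -9.0, "tv_drama": -24.0, "commercial": -18.0}
    target = -23.0  # assumed target loudness

    for name, loudness in measured.items():
        gain = normalization_gain_db(loudness, target)
        print(f"{name}: measured {loudness:+.1f}, gain {gain:+.1f} dB")
```

A static offset like this only aligns the average loudness of whole programs. It does nothing about variations within a program, which is where real-time processing comes in.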
But what remains an issue is how loudness control is applied in today's world of automated broadcasting. Broadcasters have no choice but to use integrated or external audio processing to perform this control. I can already hear the complaints that some people, most notably skilled audio engineers, will make in response to that statement. I know they will be asking how automated online loudness control can be pleasant to the ears and be done in a way that isn't detrimental to the audio.
I can understand their concerns, but in a world where automation is king, there is no other choice because broadcasters are not likely to install an audio booth with proper monitoring and fader control where someone can sit and perform the task manually. We have to recognize that some kind of automated audio control is required if broadcasters are to comply with the new loudness standards and recommendations and maintain the highest level of quality for their audiences.
Given that there is no other choice, all that remains to be discussed is what characteristics this online automated loudness control system should have.