
Dialog Detection

The First Step Toward Measuring Intelligibility

Detecting Dialog

Before diving into the various measurement techniques, consider one key requirement: many technologies used to assess dialog intelligibility rely on a method that determines whether dialog is present in the audio signal at any given moment. This step is commonly referred to as “dialog separation.”

These methods are known by different names but refer to the same basic concept: detecting whether speech is present or absent in the signal.


Voice Activity Detection (VAD)

Speech Activity Detection (SAD)

Speech-gating

How it works

As you may know, Integrated Loudness (I), expressed in LUFS, is a loudness measurement based on the entire program, regardless of whether dialog is present.

By applying a Voice Activity Detector (VAD), this measurement can be separated into two distinct components:

A) A measurement that includes only segments containing dialog

B) A measurement that includes only segments without dialog

The VAD determines whether dialog is present in the signal and, based on this detection, generates two measurements: ID and IND.

ID: Dialog Loudness

This value represents the integrated loudness of all segments where dialog is detected.

IND: Non-dialog Loudness

This value represents the integrated loudness of all segments where no dialog is detected.
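As a rough illustration of how the two values relate, the sketch below splits a series of per-block loudness readings by a VAD flag and energy-averages each group. It is a simplification under stated assumptions: the block readings are taken to be already K-weighted values in LUFS, the absolute and relative gates of ITU-R BS.1770 are left out, and the function names are hypothetical. It is not RTW's implementation.

```python
import math


def energy_mean_lufs(block_loudness):
    """Energy-average block loudness values given in LUFS.

    Returns None when there are no blocks to average.
    """
    if not block_loudness:
        return None
    mean_power = sum(10 ** (value / 10) for value in block_loudness) / len(block_loudness)
    return 10 * math.log10(mean_power)


def dialog_split_loudness(block_loudness, dialog_flags):
    """Return (ID, IND): loudness of dialog blocks and of non-dialog blocks."""
    dialog = [v for v, is_dialog in zip(block_loudness, dialog_flags) if is_dialog]
    non_dialog = [v for v, is_dialog in zip(block_loudness, dialog_flags) if not is_dialog]
    return energy_mean_lufs(dialog), energy_mean_lufs(non_dialog)


# Hypothetical readings: two dialog blocks followed by two background blocks.
loudness = [-23.0, -23.0, -35.0, -35.0]
flags = [True, True, False, False]
i_d, i_nd = dialog_split_loudness(loudness, flags)
print(f"ID = {i_d:.1f} LUFS, IND = {i_nd:.1f} LUFS")
```

The averaging is done on power rather than on the LUFS values directly, which mirrors how integrated loudness is accumulated over blocks.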


 
The VAD typically analyzes only the channels where dialog is likely to occur, usually the Left, Center and Right channels.
With RTW instruments, the user can change this setting.




EBU Tech 3343

The Manual Method

While automatic voice detection technologies are commonly used, dialog separation can be performed manually. However, this approach is both time-consuming and less consistent.

EBU Tech 3343 (chapter 9.4.1 Measurement of the Loudness-to-Dialog Ratio, LDR) outlines a manual procedure for dialog separation. 


Manual separation naturally leads to a less consistent result as it depends on the experience and diligence of individuals. Analysing the whole film to extract wide-range dialogue manually is an exhausting, time-consuming process. Therefore, spot-checking narrow-range dialogue is more feasible and should be performed in the following fashion.


  • Choose three on-screen dialogue sequences from the beginning, the middle and the end of the film (omit off-screen narration, radio voices and similar signals)
  • The selections should not contain music and/or competing sound effects
  • Each of the three sequences should be about 30 seconds long
  • Measure the individual loudness of the three sequences with ITU-R BS.1770
  • Take the average of the three loudness values to obtain narrow-range Dialogue Loudness
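The arithmetic in the last two steps can be sketched as follows. All readings are hypothetical, and the example assumes that "average" means the plain arithmetic mean of the three LUFS values. The final step assumes the LDR is the programme loudness minus the dialogue loudness, expressed in LU.

```python
from statistics import mean

# Hypothetical BS.1770 loudness readings (LUFS) of three ~30 s dialogue
# sequences taken from the beginning, middle and end of the film.
sequence_loudness = [-24.1, -22.8, -23.6]

# Assumption: "average" is the arithmetic mean of the three values.
dialogue_loudness = mean(sequence_loudness)

# Assumption: LDR = programme loudness - dialogue loudness, in LU.
programme_loudness = -20.5  # hypothetical integrated loudness of the whole film
ldr = programme_loudness - dialogue_loudness

print(f"Narrow-range Dialogue Loudness: {dialogue_loudness:.1f} LUFS")
print(f"LDR: {ldr:.1f} LU")
```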

This process can be performed manually using an RTW TouchMonitor. However, in practice, very few engineers are likely to follow this method due to its time-consuming nature. Instead, automatic detection and measurement are typically preferred.

EBU Tech 3343 acknowledges this trend:


"Automatic separation gives more consistency and repeatability but is dependent on the separation algorithm. It is anticipated that algorithms using Artificial Intelligence (AI) can lead to increasingly precise results, reflecting the actual wide-range Dialogue Loudness."


EBU Tech 3343

Technologies for Detecting Dialog

If you prefer not to perform dialog separation manually, a range of technologies is available for automatic dialog detection.

Dolby Dialogue Intelligence™

The most established VAD technology is Dolby’s Dialogue Intelligence™. Widely adopted by Dolby and other industry players, it has become an industry standard and forms the basis of the VAD approach used by RTW.

The algorithm was eventually made publicly available.

A key drawback of Dolby’s Dialogue Intelligence™ is that it introduces a noticeable latency of around 2 seconds. To compensate for this, RTW instruments provide a delay compensation feature.
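To illustrate why compensation is needed, the sketch below pairs each loudness reading with the detector decision that describes the same moment in the audio. It assumes a fixed latency of 2 seconds and a hypothetical update interval of 0.5 seconds. How RTW instruments implement their delay compensation is not described here, so this is only one plausible approach.

```python
def align_flags(loudness, flags, latency_s=2.0, hop_s=0.5):
    """Pair each loudness reading with the VAD flag that describes it.

    Assumes flags[i] is emitted at the same time as loudness[i] but
    describes the audio from latency_s seconds earlier.
    """
    shift = round(latency_s / hop_s)  # latency expressed in readings
    # Drop the first `shift` flags: they describe audio before the start.
    return list(zip(loudness, flags[shift:]))


# Hypothetical readings: dialog occurs in the second and third reading.
loudness = [-30.0, -22.0, -21.0, -31.0, -30.0, -29.0, -30.0, -30.0]
# The detector reports the same dialog 4 readings (2 s) late.
flags = [False, False, False, False, False, True, True, False]

for value, is_dialog in align_flags(loudness, flags):
    print(value, "dialog" if is_dialog else "background")
```

Without the shift, the two dialog flags would be matched against the quiet readings at positions six and seven instead of the louder readings they belong to.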


RTW Implementation

RTW meters use the Dolby Dialogue Intelligence™ speech-gating technology to determine dialog presence for the Dialog Gated Loudness Measurements.

Measurements based on Dolby Dialogue Intelligence™ are available in several RTW instruments, such as the Dialog Detector, the Numeric instruments, and the bar graphs and charts.



Loudness Numerics

Loudness Numerics can display a range of measurements based on Dolby Dialogue Intelligence™:
  • Short-term Loudness, Dialog Gated
  • Integrated Loudness, Dialog Gated
  • Loudness Range, Dialog Gated
  • Loudness to Dialog Ratio
  • Background to Dialog Ratio

Dialog Detector

The Dialog Detector is a simple instrument that lights up when dialog is detected.

It can be used to automate other devices by transmitting OSC events when dialog is detected.
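A receiving device has to decode those OSC messages. The sketch below parses a single OSC 1.0 message using only Python's standard library. The address `/dialog/detected` and the single integer argument are hypothetical; the messages RTW instruments actually send are not documented here.

```python
import struct


def read_osc_string(data, offset):
    """Read a null-terminated OSC string padded to a 4-byte boundary."""
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("ascii")
    return text, (end + 4) & ~3  # offset of the next 4-byte aligned field


def parse_osc_message(data):
    """Return (address, arguments) for a message with int/float/bool args."""
    address, offset = read_osc_string(data, 0)
    type_tags, offset = read_osc_string(data, offset)
    arguments = []
    for tag in type_tags[1:]:  # skip the leading ","
        if tag == "i":
            arguments.append(struct.unpack_from(">i", data, offset)[0])
            offset += 4
        elif tag == "f":
            arguments.append(struct.unpack_from(">f", data, offset)[0])
            offset += 4
        elif tag in ("T", "F"):
            arguments.append(tag == "T")  # booleans carry no payload bytes
        else:
            raise ValueError(f"unsupported OSC type tag: {tag}")
    return address, arguments


# Hypothetical packet: address "/dialog/detected" with one int32 argument.
packet = b"/dialog/detected\x00\x00\x00\x00" + b",i\x00\x00" + struct.pack(">i", 1)
address, arguments = parse_osc_message(packet)
if address == "/dialog/detected" and arguments == [1]:
    print("dialog detected: trigger the downstream device")
```

In a real setup the packet would arrive over UDP, and the address pattern would have to match whatever the instrument is configured to send.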



BDR

Background to Dialog Ratio.
  • Background Loudness
  • Dialog Loudness

Loudness Chart

Background to Dialog Ratio.
  • Short-term Loudness, Dialog Gated

Fraunhofer SAD

A more advanced method is integrated into Fraunhofer’s Listening Effort technology. RTW uses Fraunhofer’s technology in the Dialog Intelligence Instrument, Bargraphs and Numeric instruments.


Other technologies

Other solutions, both commercial and noncommercial, are also on the market. One of the best-performing options today is Silero VAD.
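A detector of this kind typically reports speech as a list of time segments, which then have to be mapped onto a meter's measurement blocks. The sketch below does that mapping. It assumes non-overlapping segments given as dictionaries with `start` and `end` sample indices (the shape returned by Silero VAD's `get_speech_timestamps` helper) and hypothetical, non-overlapping measurement blocks. A block counts as dialog when speech covers at least half of it, a threshold chosen purely for illustration.

```python
def flags_from_segments(segments, total_samples, block_samples):
    """Convert speech segments into one dialog flag per measurement block.

    segments: list of {"start": int, "end": int} in samples, non-overlapping.
    """
    flags = []
    for block_index in range(total_samples // block_samples):
        block_start = block_index * block_samples
        block_end = block_start + block_samples
        # Number of samples in this block that fall inside any speech segment.
        covered = sum(
            max(0, min(block_end, seg["end"]) - max(block_start, seg["start"]))
            for seg in segments
        )
        flags.append(covered * 2 >= block_samples)
    return flags


# Hypothetical detector output at 16 kHz, mapped onto 400 ms blocks.
speech_segments = [{"start": 0, "end": 12800}, {"start": 16000, "end": 19200}]
dialog_flags = flags_from_segments(speech_segments, total_samples=25600, block_samples=6400)
print(dialog_flags)
```

The resulting flags can then drive a dialog-gated measurement in the same way as the flags from any other detector.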




