Downmixing surround audio to stereo

The standard way

Downmixing a 5.1 surround track to stereo with FFmpeg is quite easy. There is even a best-practice guide for this that recommends the simple command:

ffmpeg -i input.ext -ac 2 output.ext

Of course you can also downmix the audio of a video file directly, for example with ffmpeg -i input.mkv -c:v copy -ac 2 -c:a aac output.mkv. This copies the video stream unchanged and downmixes and encodes the audio to a stereo AAC track.
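As a rough sketch of what -ac 2 computes for each output sample of a 5.1 source: this assumes libswresample's default center and surround mix levels of -3 dB (a gain of 1/√2 ≈ 0.707) and its default LFE mix level of 0 (the resampler options center_mix_level, surround_mix_level and lfe_mix_level). Those defaults are an assumption on my part, not something the guide states. Depending on the sample format, FFmpeg may also scale the whole matrix down so that the summed gains per output (1 + √2 ≈ 2.41) cannot exceed full scale, which gives the normalized line and a noticeably quieter result.

```latex
\begin{aligned}
L &= \mathrm{FL} + \tfrac{1}{\sqrt{2}}\,\mathrm{FC} + \tfrac{1}{\sqrt{2}}\,\mathrm{BL},
\qquad
R = \mathrm{FR} + \tfrac{1}{\sqrt{2}}\,\mathrm{FC} + \tfrac{1}{\sqrt{2}}\,\mathrm{BR} \\[4pt]
L_{\text{norm}} &= \frac{L}{1 + \sqrt{2}} \approx 0.414\,\mathrm{FL} + 0.293\,\mathrm{FC} + 0.293\,\mathrm{BL}
\end{aligned}
```

Under these assumptions the front channels sit about 3 dB above the center in either case, and the LFE channel is left out of the stereo mix.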

The complicated way

Still, there is some debate about whether this is the best way to go. Some users argue that it makes the audio too quiet, and a doom9 thread suggests the following command instead:

ffmpeg -i input.ext -c:v copy -ac 2 -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR" output.ext

FL=FC+0.30*FL+0.30*BL means that the left output channel is built from the full center channel (FC) plus 30% of the front left (FL) and 30% of the back left (BL) channel. In other words, FL and BL are not boosted by 30% but scaled down to 30% of their level, roughly -10.5 dB. FR=FC+0.30*FR+0.30*BR does the same for the right side. So is this a better way to go? In theory it puts the center channel at full level in both outputs while pulling the front and back channels down considerably. Note that in movies the dialogue is very often located in the center, so this mix should make the dialogue stand out more, at the cost of left/right separation and surround information. But enough theory… let our ears decide…
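Written out as numbers, the pan expression from the command above gives the following gains. Only the 0.30 factors and the full-level FC come from the command; the dB values are plain arithmetic (20·log10 of the gain).

```latex
\begin{aligned}
L &= 1.0\,\mathrm{FC} + 0.30\,\mathrm{FL} + 0.30\,\mathrm{BL},
\qquad
R = 1.0\,\mathrm{FC} + 0.30\,\mathrm{FR} + 0.30\,\mathrm{BR} \\[4pt]
20\log_{10}(0.30) &\approx -10.5\ \text{dB} \quad \text{(FL, BL, FR, BR relative to FC)} \\
20\log_{10}(1.0 + 0.30 + 0.30) &= 20\log_{10}(1.6) \approx +4.1\ \text{dB} \quad \text{(worst-case in-phase peak)}
\end{aligned}
```

Because FC goes into L and R with the same gain, the center content is effectively mono and dominates each side by about 10.5 dB. If all three inputs peak in phase at the same time, the sum can exceed full scale by about 4 dB and may clip; the pan filter's "<" form (for example FL<FC+0.30*FL+0.30*BL) renormalizes the gains so they sum to 1 if that is a concern.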


I tried both commands and compared the results in Cubase, soloing each track at random to see whether I could hear any difference. The track converted the “complicated” way was a little quieter, so I raised its level slightly. I could immediately hear that the “standard way” sounded better…

To me it seems that some users confuse loudness with better audio quality. That is understandable, since a louder signal does reveal more detail in the music, as long as it does not distort. I know from music production that “louder is better” is still an argument for a lot of people.

Anyway, you can listen for yourself here. I switch between the tracks quite randomly, but you can see which track is selected by watching the [S] (solo) buttons.


To me the “complicated way” sounds flatter and lacks the spatial balance of the “standard way”, so I will stick with the standard way. Since it produces exactly the same result as, for example, VLC Player or Cubase, it seems more likely to be the “correct way”, whatever that means.