Difference between revisions of "User:Mjb/FFmpeg"

From Offset
For example, if the picture is still slightly too dim, try a nonlinear correction with the curves filter that pushes 90% intensity up to 100%, e.g. curves=all='0/0 0.9/1 1/1'. I've tried this, though, and find it insufficient on its own: bright white looks better, but the rest of the picture still needs some work.

Downmix 7.1 to 5.1 and stereo

Standard 7.1 audio has 4 surround speakers (side left, rear left, side right, rear right). The side speakers are optimally at 90° and no more than 110°; the rear speakers are way in back, at 135° to 150°.

FFmpeg will convert 7.1 to "5.1(side)" automatically just by setting the number of output channels to 6:

 -c:a eac3 -b:a 480k -ac 6 -metadata:s:a title="Dolby Digital Plus 5.1"

As for stereo, the default settings apply the official Dolby downmix algorithm, which discards the LFE channel:

 -c:a aac -b:a 256k -ac 2

It works well enough, but I am never totally happy with it, as compared to an official stereo mix.

Another option some people like is to convert to Dolby Pro Logic II, which is stereo, but processed in a way that makes it possible to recover 5.1 from it:

 -c:a aac -b:a 256k -filter:a "aresample=matrix_encoding=dplii:ochl=stereo" -metadata:s:a title="Dolby Pro Logic II"

My limited experience with Pro Logic is that while the 5.1 recovery is decent, the stereo listening experience is occasionally suboptimal, depending on what is going on in the stereo field. For example, at about 30:15 into Tron Legacy, the drums during the crowd noise as Sam enters the game arena sound absolutely dreadful. It is also questionable why you'd want Pro Logic II when a receiver that can decode it likely also handles Dolby Digital Plus.

TODO: command line to do both at the same time from one source
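
For the TODO above, here is an untested sketch of a single command that maps the source's first audio stream twice. It encodes one copy as 5.1 E-AC-3 and the other as stereo AAC, using per-output-stream specifiers (-c:a:0, -ac:a:1, and so on). It assumes the first audio stream is the 7.1 (or 5.1) track and that the video can be copied as-is. The function name, the "Stereo" title, and the FFMPEG variable (set FFMPEG=echo for a dry run) are all hypothetical.

```shell
# Sketch: one output file with a 5.1 E-AC-3 track and a stereo AAC track,
# both made from the source's first audio stream (0:a:0 is mapped twice).
# FFMPEG is a hypothetical override; FFMPEG=echo just prints the arguments.
encode_51_and_stereo() {
  local src=$1 dst=$2
  "${FFMPEG:-ffmpeg}" -i "$src" \
    -map 0:v:0 -map 0:a:0 -map 0:a:0 \
    -c:v copy \
    -c:a:0 eac3 -b:a:0 480k -ac:a:0 6 -metadata:s:a:0 title="Dolby Digital Plus 5.1" \
    -c:a:1 aac -b:a:1 256k -ac:a:1 2 -metadata:s:a:1 title="Stereo" \
    "$dst"
}
```

Usage: encode_51_and_stereo input.mkv output.mkv. To make the second track Pro Logic II instead, replacing -ac:a:1 2 with -filter:a:1 and the aresample string above should work, but I have not tested it.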
 
[[Category:Video]]
 

Revision as of 01:41, 27 December 2023

Installation on Linux via Snap

Install the latest dev version:

 sudo snap install ffmpeg --edge

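I initially got "permission denied" errors when trying to use ffmpeg to read any files from a shared folder in a Lubuntu VM hosted on Windows. It turns out I had to do this first, even though there was no removable media involved. It's a quirk of Snap, I guess; presumably VM shared folders are mounted under /media or /mnt, which a confined snap can only read through the removable-media interface: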

 snap connect ffmpeg:removable-media

General syntax

Input files are generally going to be audio and/or video, sometimes text. Audio and video files are often in "container" formats, e.g. an MP4 container file might have one H.264 video stream in it and several AAC audio streams. Common containers are MP4, MKV, AVI, VOB.

Always specify an input file:

 ffmpeg -i inputfile

If there are spaces in the file path, put it in quotes:

 ffmpeg -i "inputfile"

You can specify multiple input files to make use of streams from them (e.g. video from one, audio from another):

 ffmpeg -i inputfile0 -i inputfile1

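If the input files have multiple streams to choose from, by default you will get only one stream of each type, picked from all the files:

  1. the one video stream with the highest resolution,
  2. the one audio stream with the most channels, and
  3. the first subtitle stream, if the output format supports subtitles.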

You can override this and specify which streams you want by using -map n:x options, where n is the input file number and x is the stream number (both start at zero). The order of the options is mirrored in the output, and any other streams in the input file are ignored. For example, this makes the first 3 output streams be based on file 0 stream 1, file 0 stream 5, and file 1 stream 4, in that order:

 ffmpeg -i inputfile0 -i inputfile1 -map 0:1 -map 0:5 -map 1:4

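You can also select streams by type (v, a, s), optionally followed by a zero-based index within that type. Here, 0:v is every video stream in file 0, 1:a:2 is the third audio stream in file 1, and -map_chapters 0 and -map_metadata 0 copy the chapters and global metadata from file 0: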

 ffmpeg -i inputfile0 -i inputfile1 -map 0:v -map 1:a:2 -map_chapters 0 -map_metadata 0

To get all streams (not limiting the number of audio and video streams), you must explicitly specify them, or just specify which file number to get them all from:

 ffmpeg -i inputfile0 -map 0

More options for mapping are explained very well in the FFmpeg wiki.

You can set new stream title metadata:

 ... -metadata:s:v:0 title="movie title here" -metadata:s:a:0 title="5.1 Surround"

For batch processing, you can use features of the Windows command shell (e.g. convert all AVIs to MP4s). This is for the command prompt; in a .bat file, write %%f and %%~nf instead of %f and %~nf:

 for /f "usebackq delims=" %f in (`dir /b *.avi`) do ffmpeg -i "%f" other ffmpeg options here "%~nf.mp4"
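
On Linux or macOS, roughly the same batch conversion can be written as a POSIX shell loop. This is a sketch that assumes the .avi files are in the current directory. convert_all_avi and the FFMPEG override are hypothetical names, and any extra arguments are passed to ffmpeg as output options.

```shell
# Convert every .avi in the current directory to an .mp4 with the same base name.
# Extra arguments go between the input and output file names (output options).
# FFMPEG is a hypothetical override; FFMPEG=echo prints the commands instead.
convert_all_avi() {
  local f
  for f in ./*.avi; do
    [ -e "$f" ] || continue   # no .avi files: the pattern stays unexpanded
    "${FFMPEG:-ffmpeg}" -i "$f" "$@" "${f%.avi}.mp4"
  done
}
```

Usage: convert_all_avi -c:v libx264 -crf 21 -c:a aac. The ./ prefix keeps a file name that starts with a dash from being read as an option.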

When you specify only input files, you get info about the file contents, and an admonishment to specify an output file.

When you specify an output file and don't use any other options to specify the format you want, FFmpeg converts the input file(s) to one output file, using a format based on the output file's filename extension:

 ffmpeg -i inputfile.wav outputfile.mp3

Default behavior is transcoding the streams to something probably of much lower quality than you want.

You can specify the format and parameters for the transcoding:

 ffmpeg -i inputfile.wav -b:a 320k outputfile.mp3

You can add filters when transcoding:

 ffmpeg -i inputfile.mp4 -vf "filter1,filter2" outputfile.mp4

You can remux streams (put them in a different container) without transcoding them:

 ffmpeg -i inputfile.mp4 -vcodec copy -acodec copy -scodec copy outputfile.mkv

When specifying codecs for video, audio and subtitles, you can use a shorter syntax:

 ffmpeg -i inputfile.mp4 -c:v copy -c:a copy -c:s copy outputfile.mkv

Here is an example of a remux which preserves all metadata (including chapters) as it copies streams 0, 3, 5 and 7:

 ffmpeg -i inputfile.mkv -map_metadata 0 -map_chapters 0 -map 0:0 -map 0:3 -map 0:5 -map 0:7 -c:v copy -c:a copy -c:s copy outputfile.mkv

You can force video or audio to be omitted by using -map and not including any streams of that type, or you can use -vn or -an:

 ffmpeg -i inputfile.mkv -an silentoutputfile.mkv

If you get "Timestamps are unset in a packet" warnings, tell FFmpeg to generate new timestamps:

 ffmpeg -fflags +genpts -i ...

If you get "Starting new cluster due to timestamp" warnings, work around it with this:

 ffmpeg -max_interleave_delta 0 ...

Minimize output:

 ffmpeg -hide_banner ...
 ffmpeg -loglevel panic ...

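Remove closed captions from H.264 video. This drops the SEI NAL units (type 6) that carry the captions, so any other SEI data is dropped too: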

 ffmpeg -i input.mkv -codec copy -bsf:v "filter_units=remove_types=6" output.mkv

On Windows, encoding can really hog CPU, disrupting the functionality of other running tasks. If you don't mind it running slower, you can change the priority of an already-running task to Low via the details tab of the Task Manager, or you can initially launch it with low priority in its own window. It can also help to confine it to a single thread:

 start /low ffmpeg -threads 1 ...
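
On Linux or macOS, nice does roughly the same job as start /low. This is a small sketch; ffmpeg_low and the FFMPEG override are hypothetical names. As in the Windows example, -threads 1 comes before the input file, so it applies to that input's decoder. Repeating it after the input file should limit the encoder as well.

```shell
# Run ffmpeg at the lowest CPU priority (niceness 19), single-threaded decoding.
# FFMPEG is a hypothetical override (e.g. FFMPEG=echo for a dry run).
ffmpeg_low() {
  nice -n 19 "${FFMPEG:-ffmpeg}" -threads 1 "$@"
}
```

Usage: ffmpeg_low -i input.mkv -threads 1 -c:v libx264 output.mp4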

Containers and codecs

Common a/v containers are AVI, MKV, MP4, MPEG-1, VOB, MPEG-TS, MPEG-PS, WAV. Inside the container you can have audio streams, video streams, and subtitles. WAV is audio only. Common audio stream codecs are MP2, AAC, AC-3, PCM. Common video stream codecs are MPEG-2, H.264 (also known as AVC), and H.265 (HEVC).

Only certain combinations are widely supported. For example, I have better luck using MKV instead of MP4 as the container for H.264 video with multichannel AC-3 (Dolby Digital) audio. But if using multichannel AAC audio, the MP4 container is preferred.

Losslessly join videos

Let's say you want to losslessly concatenate two or more videos end-to-end, and the video & audio codecs and attributes (e.g. frame size) are all the same.

Preferred method for MPEG-1, MPEG-2 PS, or DV

You can use the concat protocol, which simply joins the input files byte for byte. Ensure the files are in the current directory and the names do not contain spaces. Specify the input frame rate to prevent timestamp confusion:

 ffmpeg -r ntsc -i "concat:VTS_01_1.VOB|VTS_01_2.VOB|VTS_01_3.VOB" -c copy outputfile

Unfortunately this does not support wildcards. Here is a workaround which assumes you have PowerShell installed (it comes with Windows 7 and newer):

 for /f %x in ('PowerShell -Command "[char]0x0022 + 'concat:' + ((gci -include *.VOB -name) -join '|') + [char]0x0022"') do ffmpeg -r ntsc -i %x -c copy joined.avi

I have not tested this with filenames containing spaces or apostrophes.
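
In a POSIX shell, the wildcard problem goes away because the glob can build the "concat:" URL directly. This is a sketch; concat_protocol_arg is a hypothetical helper. It uses the shell's sorted glob order, and a file name containing | would break the URL.

```shell
# Build "concat:A|B|C" from every .VOB in the current directory,
# in the shell's (sorted) glob order.
concat_protocol_arg() {
  local f arg=
  for f in *.VOB; do
    [ -e "$f" ] || continue
    if [ -n "$arg" ]; then arg="$arg|$f"; else arg=$f; fi
  done
  printf 'concat:%s\n' "$arg"
}
```

Usage: ffmpeg -r ntsc -i "$(concat_protocol_arg)" -c copy joined.vob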

Generic method for any format

 (echo file input1.m4v & echo file input2.m4v) > "%temp%\flist.txt" & ffmpeg -safe 0 -f concat -i "%temp%\flist.txt" -c copy -y outputfile.mp4 & del "%temp%\flist.txt"

or (seems to be failing in recent versions of FFmpeg):

 (echo file input1.m4v & echo file input2.m4v) | ffmpeg -safe 0 -f concat -protocol_whitelist file,pipe,crypto -i - -c copy -y outputfile.mp4

In the file list, you must put the file names in single quotes if they contain spaces or weird characters.

This info is adapted from an answer at StackOverflow and a tip on the ffmpeg-users list.
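
For the single-quote rule above, a name that itself contains a single quote has to be written as '\'' (close the quote, add an escaped quote, reopen). That is FFmpeg's general quoting convention. Here is a sketch of a list writer that does this for every name. write_concat_list is a hypothetical helper.

```shell
# Write an ffmpeg concat-demuxer list with one "file '...'" line per argument.
# Each name is single-quoted; an embedded ' becomes '\'' so it survives quoting.
write_concat_list() {
  local list=$1 f q
  shift
  : > "$list"
  for f in "$@"; do
    q=$(printf '%s' "$f" | sed "s/'/'\\\\''/g")
    printf "file '%s'\n" "$q" >> "$list"
  done
}
```

Usage: write_concat_list flist.txt input1.m4v "input 2.m4v" && ffmpeg -safe 0 -f concat -i flist.txt -c copy -y outputfile.mp4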

Here is a variation with an automatically generated file list on Windows, using PowerShell:

 PowerShell -Command "set-location -LiteralPath '"%cd%"'; gci -include *.avi -Name | foreach {""""file '$_'"""}"" > flist.txt & ffmpeg -safe 0 -f concat -i flist.txt -c copy -y outputfile.avi & del flist.txt

or (seems to be failing in recent versions of FFmpeg):

 PowerShell -Command "set-location -LiteralPath '"%cd%"'; gci -include *.avi -Name | foreach {""""file '$_'"""}"" | ffmpeg -safe 0 -f concat -protocol_whitelist file,pipe,crypto -i - -c copy -y outputfile.avi

When the videos have different codecs, you need to do the concatenation as part of filter chain, which is more complicated. I suggest starting here: https://trac.ffmpeg.org/wiki/Concatenate#differentcodec
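
As a starting point for that filter-chain approach, the concat filter's graph for N inputs (each with one video and one audio stream) follows a fixed pattern. This is a sketch; concat_filter_graph is a hypothetical helper. The concat filter picks a common pixel format and sample format on its own, but differing resolutions have to be scaled to match first.

```shell
# Build a concat filter graph for N inputs that each have one video and one
# audio stream, e.g. N=2 -> [0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a]
concat_filter_graph() {
  local n=$1 i=0 graph=
  while [ "$i" -lt "$n" ]; do
    graph="${graph}[$i:v][$i:a]"
    i=$((i + 1))
  done
  printf '%sconcat=n=%d:v=1:a=1[v][a]\n' "$graph" "$n"
}
```

Usage: ffmpeg -i part1.mp4 -i part2.mkv -filter_complex "$(concat_filter_graph 2)" -map "[v]" -map "[a]" -c:v libx264 -c:a aac joined.mp4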

Extract a temporal portion of a video

Let's say you just want to take the portion from 37:07.5 to 41:30 (a duration of 04:22.5). Either of these will work:

 ffmpeg -i inputfile -c copy -ss 37:07.5 -to 41:30 outputfile
 ffmpeg -i inputfile -c copy -ss 37:07.5 -t 4:22.5 outputfile
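
The -t value is just end minus start (41:30 − 37:07.5 = 262.5 seconds, i.e. 4:22.5). Here is a small sketch that does the subtraction. duration_between is a hypothetical helper, and it assumes timestamps in [[HH:]MM:]SS[.frac] form.

```shell
# Print END - START in seconds; both are [[HH:]MM:]SS[.frac] timestamps.
# ffmpeg accepts a plain number of seconds for -t.
duration_between() {
  awk -v a="$1" -v b="$2" '
    function secs(t,    p, n, i, s) {
      n = split(t, p, ":"); s = 0
      for (i = 1; i <= n; i++) s = s * 60 + p[i]
      return s
    }
    BEGIN { printf "%g\n", secs(b) - secs(a) }'
}
```

Usage: ffmpeg -i inputfile -c copy -ss 37:07.5 -t "$(duration_between 37:07.5 41:30)" outputfile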

In theory it should not matter if the -ss # is before the input file or after. When it is after, it means to start at the beginning of the file, and ignore the input until the designated point is reached. This can be very slow with large files. When the -ss is before the input file, it means to quickly seek to that point and pretend that point is 0:00 or slightly before (if there's no keyframe at that spot); this resetting of timestamps affects the subsequent -to:

 ffmpeg -ss 37:07.5 -i inputfile -c copy -to 4:22.5 outputfile

The video will not start until the first key frame after the cut point, so you may well end up with audio cut right where you want it, but video not starting until a second or two later.

I have had bad luck with this method. It seems the parameters get mixed up sometimes? I don't know. Just stick with the slow method.

If you will be using the clip with the concat demuxer, add -avoid_negative_ts 1.

Further explanation (sorta): https://trac.ffmpeg.org/wiki/Seeking

Remove a temporal portion of a video

You have to create a file for each segment to keep, then concatenate the segments. For example:

 ffmpeg -to 0:17:33 -i inputfile -map 0 -c copy s1.mkv
 ffmpeg -ss 0:17:36 -i inputfile -map 0 -avoid_negative_ts make_zero -c copy s2.mkv
 echo file s1.mkv > flist.txt && echo file s2.mkv >> flist.txt
 ffmpeg -f concat -safe 0 -i flist.txt -map 0 -c copy outputfile

As mentioned previously, seeking may not be accurate with some codecs; experiment with later timestamps to get the best cut points.
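
The three steps above can be wrapped into one function. This is a sketch that has not been tested against real files. cut_out, seg1.mkv, seg2.mkv and the FFMPEG override are hypothetical names, and the same seeking caveats apply.

```shell
# Remove CUT_START..CUT_END from SRC: copy the part before and the part after,
# then join them with the concat demuxer. Intermediate file names are illustrative.
# FFMPEG is a hypothetical override (e.g. FFMPEG=echo for a dry run).
cut_out() {
  local src=$1 cut_start=$2 cut_end=$3 dst=$4
  "${FFMPEG:-ffmpeg}" -to "$cut_start" -i "$src" -map 0 -c copy seg1.mkv &&
  "${FFMPEG:-ffmpeg}" -ss "$cut_end" -i "$src" -map 0 -avoid_negative_ts make_zero -c copy seg2.mkv &&
  printf 'file %s\n' seg1.mkv seg2.mkv > flist.txt &&
  "${FFMPEG:-ffmpeg}" -f concat -safe 0 -i flist.txt -map 0 -c copy "$dst"
}
```

Usage: cut_out inputfile 0:17:33 0:17:36 outputfile.mkv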

Rotate 180 degrees

When you hold a camera phone the wrong way, it will just put a 180° rotation flag in the metadata, which not all players will support.

To change just the rotation flag, e.g. to 0°:

ffmpeg -i inputfile -map_metadata 0 -metadata:s:v rotate="0" -codec copy outputfile

To rotate the actual video, chain the hflip and vflip filters:

ffmpeg -i inputfile -vf "vflip,hflip" outputfile

The rotation flag will not be changed when you do this, so you can set it afterward (assumes video is stream # 0):

ffmpeg -i inputfile -c copy -metadata:s:v:0 rotate=0 outputfile

Or you can do both at the same time (untested):

ffmpeg -i inputfile -vf "vflip,hflip" -metadata:s:v:0 rotate=0 outputfile

Here's a more robust example (worked for me):

 ffmpeg -i input.mp4 -metadata:s:v rotate="0" -vf "hflip,vflip" -c:v libx264 -acodec copy output.mp4

The -c:v libx264 means to output H.264 video, which is what an iPhone records when its camera format is set to "Most Compatible" (newer iPhones default to HEVC). For more H.264 ffmpeg tips, see https://trac.ffmpeg.org/wiki/Encode/H.264

References:

Fix aspect ratio

Lossless options just use metadata to tell the player to horizontally shrink or stretch as needed.

Aspect ratio issues

For maximum compatibility, you must set the DAR in the container (MP4 or MKV) metadata, and you must set the DAR or PAR in the video stream metadata, if the stream format supports it. Some players (like my Roku) only use what is in the video stream; others (like VLC, but maybe not always) prefer what is in the container.

Instead of PAR, FFmpeg and H.264 call the pixel aspect ratio SAR—the S standing for Sample, not Storage—so SAR is what I will say here.

FFmpeg reports the stream SAR & DAR in square brackets, and if the container does not match the stream, this will be followed by the container's values, unbracketed.

If you want to go down the rabbit hole of precision...

PAL is based on a 625-line analog signal in which there is a 576-line "active" picture (actually a pair of 288-line fields); the rest of the lines, as well as about 15% of the left side of the line and about 1.5% of the right side, are for control signals and are not supposed to ever be visible. The first line of one field and the last line of the other are only half-lines of picture data. An analog CRT would have displayed only an "action safe" area of the active picture area: probably the centermost 90%.

Digital video capture devices, as used when authoring PAL DVDs, sample the full active picture such that it is 702×576, plus 1 pixel of horizontal blanking on each side, for a resulting frame size of 704×576, since both dimensions are neatly divisible by 16 that way. This is typically output in a 720×576 frame by just adding 8 more pixels of black pillarboxing to the left and right sides. Or, it might have been captured at 720×576 in the first place, with the blanking signal filling up the sides rather than artificial pillarboxing.

The standard DAR of 4:3 or 16:9 is supposed to apply to the 702×576 active area, but due to different conversion techniques, it may apply to the 704×576 area or the full 720×576 area.

Most people and playback devices don't worry about it. They just stretch the full 720×576 to 768×576 (4:3) or 1024×576 (16:9). However, some people crop the 8px blanking-interval pillarbox off of each side, and then stretch the remaining 704×576 to 768×576 (4:3) or 1024×576 (16:9). For example, with FFmpeg, -vf crop=704:576:8:0 along with -aspect 4/3. Or, instead of cropping, they tweak the SAR & DAR so that the output is stretched a little more: the active area becomes 4:3, and the left & right edges are treated like overscan, outside the bounds of a 4:3 analog CRT display (e.g., -aspect 15/11).

NTSC has the same situation, but the numbers are different. It is based on a 525-line signal in which the active picture is nominally 486 lines (two 243-line fields), but in reality can be anything from 480 to 486 lines. The first line of one field and the last line of the other are only half-lines of picture data. The first full video line is normally reserved for Closed Captioning data. Several lines at the bottom are often just VCR head noise. Digital video capture devices sample a 486-line analog active picture area such that it is 710.85×486, but output a "digital active area" which is normally 720×480, with some horizontal blanking signal included on the left & right sides, and 6 lines simply being removed (encoders vary in how they go about it) so that the result is divisible by 16.

The standard DAR of 4:3 or 16:9 is supposed to apply to the original 710.85×486 active area, but due to different conversion techniques, it may apply to a width of 711, 712, or 720, by either 480 or 486 lines. Most people and playback devices just shrink or stretch the full 720×480 to 640×480 (4:3) or 854×480 (16:9). Some instead crop 8px of blanking from each side to get a 704px width, then resize or tag as 4:3 for a final display of 640×480; or leave it at 720px uncropped, and resize or tag as 15:11 for 654×480 display, accepting that whatever is in the leftmost & rightmost 8px will be beyond the edges of a true 4:3 CRT screen.

The problem with blindly following anyone's advice in this regard is that the result sometimes still comes out slightly wrong, due to differences and quirks of capturing and conversion on the way from the original source into the 720-wide frame you're starting with. In my opinion, simply tagging the 720×480 frame for 4:3 display (which is what DVDs do) is the simplest solution. It gets within 3% of perfect, which is better than the 9%–12% error of the raw 720-wide picture.

A couple of excellent references:

The bitstream filters for H.264 stream metadata (h264_metadata) and HEVC (H.265) stream metadata (hevc_metadata) allow you to rewrite the SAR.

Some common values:

  • Given NTSC DVD 720×480 frame, for 4:3 DAR, use 8:9 SAR. Output will be 640×480.
  • Given NTSC DVD 720×480 frame, for 16:9 DAR, use 32:27 SAR. Output will be 853⅓×480 (probably 854×480).
  • Given PAL DVD 720×576 frame, for 4:3 DAR, use 16:15 SAR. Output will be 768×576.
  • Given PAL DVD 720×576 frame, for 16:9 DAR, use 64:45 SAR. Output will be 1024×576.
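
The SAR values in the list above all follow from SAR = (DAR × H) / W, reduced to lowest terms. Here is a small sketch of that arithmetic. It assumes integer frame sizes and an integer DAR ratio. sar_for_dar and gcd are illustrative helpers, not FFmpeg features.

```shell
# Greatest common divisor of two positive integers.
gcd() {
  local a=$1 b=$2 t
  while [ "$b" -ne 0 ]; do
    t=$((a % b)); a=$b; b=$t
  done
  echo "$a"
}

# SAR that makes a W×H frame display at DAR_N:DAR_D.
# SAR = (DAR_N × H) : (DAR_D × W), reduced.
sar_for_dar() {
  local w=$1 h=$2 dn=$3 dd=$4 n d g
  n=$((dn * h)); d=$((dd * w)); g=$(gcd "$n" "$d")
  echo "$((n / g)):$((d / g))"
}
```

For example, sar_for_dar 720 480 4 3 prints 8:9, and sar_for_dar 704 480 4 3 prints 10:11.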

From what I've seen, these 4:3 and 16:9 DAR values are always set in the MPEG-2 video streams on DVDs.

For what it's worth, Roku Media Player on a Roku Stick, connected to a 1080p TV, likes these values. A 720×480 or 720×576 H.264 MP4 tagged in the stream as 4:3 is scaled such that the entire frame displays at 4:3 (1440×1080), and circles are perfectly round.

Despite the BT.601-compliant active image being 710¹⁷⁄₂₀ pixels wide (NTSC) or 702 pixels wide (PAL), an active-picture width of 704 pixels is nearly ubiquitous in the ATSC, DV, MPEG, and DVD standards, as well as the output of many video capture cards. (That said, capture card firmware or drivers can be programmed with different image scaling settings, so there can be variation from device to device.)

DVD content is therefore sometimes in a fully-filled 704×480 or 704×576 frame, and you can use the same DAR values as above. For a 704px width, the corresponding SAR works out to 10:11 (4:3) or 40:33 (16:9) for NTSC, and 12:11 or 16:11 for PAL. Similarly, if a 720px-wide capture has ~8px of blank space on the left and right sides, it may be prudent to crop to a width of 704px first, and then use those values. Some DVD players are said to handle 4:3-tagged content this way: they scale the entire 720px-wide image to slightly wider than 4:3, so that the inner 704px-wide portion is 4:3 and the blank sides disappear beyond the edges of an SD display.

If you don't want to crop 720 to 704, you can leave it at 720, and use the values below:

  • Given 704×480 active picture in a 720×480 frame (8 pixels blank on each side), for 4:3 active picture area DAR, use 10:11 SAR and 15:11 DAR. Output will be 654⁶⁄₁₁×480, with 640×480 active picture area.
  • Given 704×480 active picture in a 720×480 frame (8 pixels blank on each side), for 16:9 active picture area DAR, use 40:33 SAR and 20:11 DAR. Output will be 872⁸⁄₁₁×480, with 853⅓×480 active picture area.
  • Given 702×576 active picture in a 720×576 frame (9 pixels blank on each side), for ~4:3 active picture area DAR, use 59:54 SAR and 295:216 DAR. Output will be 786⅔×576, with 769⁵⁄₂₇×576 active picture area.
    • Or, use the common approximation 12:11 SAR and 15:11 DAR. Output will be 785⁵⁄₁₁×576, with 768×576 active picture area.
  • Given 702×576 active picture in a 720×576 frame (9 pixels blank on each side), for ~16:9 active picture area DAR, use 118:81 SAR and 295:162 DAR. Output will be 1048⁸⁄₉×576, with 1025⁴⁷⁄₈₁×576 active picture area.
    • Or, use the common approximation 16:11 SAR and 20:11 DAR. Output will be 1047³⁄₁₁×576, with 1024×576 active picture area.

(PAL is 702 wide but padded with 1px on each side to make it 704, and literally no one tries to scale it exactly right.)
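
Going the other way, the full frame's DAR is (W × SAR) : H, and its displayed width is W × SAR. This sketch reproduces several of the values above; for example, 10:11 on a 720×480 frame gives 15:11 and about 654.55 pixels. frame_dar, display_width and gcd are hypothetical helper names.

```shell
# Greatest common divisor of two positive integers.
gcd() {
  local a=$1 b=$2 t
  while [ "$b" -ne 0 ]; do
    t=$((a % b)); a=$b; b=$t
  done
  echo "$a"
}

# DAR of a W×H frame tagged with SAR SN:SD: (W × SN) : (H × SD), reduced.
frame_dar() {
  local w=$1 h=$2 sn=$3 sd=$4 n d g
  n=$((w * sn)); d=$((h * sd)); g=$(gcd "$n" "$d")
  echo "$((n / g)):$((d / g))"
}

# Displayed width in pixels (W × SN / SD), to two decimal places.
display_width() {
  awk -v w="$1" -v n="$2" -v d="$3" 'BEGIN { printf "%.2f\n", w * n / d }'
}
```

The same arithmetic covers the cropped head-switching examples further down: frame_dar 704 476 10 11 prints 160:119.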

Full-frame images

When a 4:3 image fills the full 720px-wide frame (no blank space on the left & right), it is unclear what to do. Often, such as for Digital8-sourced DV video, the full frame represents the active picture area, and this is assumed by some players (like my Roku). Or, it could be because the active picture area was erroneously stretched from 704 to 720 already, in which case all of the above will be wrong until you first rescale (not crop) the source to be 704px wide. So if the final result is still stretched too wide, consider this method. Really, though, for a video that doesn't have any known shapes (perfect circles and squares) to calibrate against, there is no way to know for sure what aspect ratio is correct. See my explanation in the sidebar.

Pan-and-scan tagging

Sometimes a 4:3-tagged MPEG-2 stream will contain, on a frame-by-frame basis, pan-and-scan info in the form of sequence display extension headers designating a horizontal display size of 540. What this actually means is that the full 720x480 frame is actually 16:9, and a 540x480 region within it is the 4:3 portion (540/720 = 3/4, which is 4:3 divided by 16:9). By default, the 4:3 portion is centered relative to the full frame, but there can be other flags saying where exactly it is. The idea is that when outputting to a 4:3 screen, the player is expected to crop and possibly zoom as needed so that only the 4:3 portion of the frame is displayed.

VCR head-switching noise and vertical cropping

There is often VCR head-switching noise at the bottom of the frame of a VHS transfer. It warps the lines somewhat randomly, so it is not really repairable. Most people just ignore it. You could cover it up with a black bar, or you could crop it out. If you crop it, you no longer have exactly 480 or 576 lines, so it throws off the aspect ratio calculations. However, it will be fine as long as you remember 704 becomes 640, and keep the line count the same. Example:

  • If 704×480 is to be 4:3, then the SAR is 10:11 and output is 640×480 (4:3). If cropped to 704×476, it will need to be output as 640×476, so you can tag it for 640:476 DAR, which reduces to 160:119. Likewise, 704×472 will need to be output as 640×472, which is 80:59.

Lossless options for MPEG-4 video

Examples:

Set H.264 bitstream metadata to display 4:3 content as 16:9:
ffmpeg -i input.mp4 -map 0 -c copy -bsf:v h264_metadata=sample_aspect_ratio=4/3 -aspect 16/9 output.mp4
Set H.264 bitstream metadata to declare 8:9 SAR:
ffmpeg -i input.mp4 -map 0 -c copy -bsf:v h264_metadata=sample_aspect_ratio=8/9 -aspect 4/3 output.mp4

This may result in a harmless(?) warning: "4 bytes left at end of AVCC header."

Set H.265 bitstream metadata to declare 8:9 SAR:
ffmpeg -i input.mp4 -map 0 -c copy -bsf:v hevc_metadata=sample_aspect_ratio=8/9 -aspect 4/3 output.mp4

Even if you crop vertically to remove letterboxing (but still leave the 720 width), the SAR remains the same. The DAR given to -aspect is then the displayed size: the width after applying the SAR (720 × 8/9 = 640) over the cropped height. For example:

ffmpeg -i input.mp4 -map 0 -c copy -bsf:v h264_metadata=sample_aspect_ratio=8/9 -aspect 640/368 output.mp4
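
To keep the container's DAR consistent with the stream's SAR, the -aspect value can be derived from the frame size and SAR as (W × SN) : (H × SD), reduced. This is a sketch under that assumption. h264_retag_sar, gcd, and the FFMPEG override (FFMPEG=echo for a dry run) are hypothetical names.

```shell
# Greatest common divisor of two positive integers.
gcd() {
  local a=$1 b=$2 t
  while [ "$b" -ne 0 ]; do
    t=$((a % b)); a=$b; b=$t
  done
  echo "$a"
}

# Losslessly retag an H.264 stream's SAR and set a matching container DAR.
h264_retag_sar() {
  local src=$1 dst=$2 w=$3 h=$4 sn=$5 sd=$6 n d g
  n=$((w * sn)); d=$((h * sd)); g=$(gcd "$n" "$d")
  "${FFMPEG:-ffmpeg}" -i "$src" -map 0 -c copy \
    -bsf:v "h264_metadata=sample_aspect_ratio=$sn/$sd" \
    -aspect "$((n / g))/$((d / g))" "$dst"
}
```

Usage: h264_retag_sar input.mp4 output.mp4 720 368 8 9 (this uses -aspect 40/23, the reduced form of 640/368).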

See https://superuser.com/questions/907933/correct-aspect-ratio-without-re-encoding-video-file for basically the same info, and User:Mjb/MP4Box for examples using MP4Box, which may or may not provide better results.

Lossless option for MPEG-2 video

Setting the aspect ratio in the container does not work very well, I have found. You have to set it in the video stream's metadata.

FFmpeg has a bitstream filter for MPEG-2 metadata which includes the ability to rewrite the DAR. Valid values are 4/3, 16/9, or 221/100.

Set MPEG-2 video bitstream metadata in a VOB to declare 16:9 DAR:
ffmpeg -i input.vob -map 0 -c copy -bsf:v mpeg2_metadata=display_aspect_ratio=16/9 output.vob
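
Since only those three values are valid, a wrapper can refuse anything else instead of silently writing an unintended aspect ratio. FFmpeg's documentation says other values signal square pixels. This is a sketch; mpeg2_retag_dar and the FFMPEG override are hypothetical names.

```shell
# Retag an MPEG-2 stream's DAR, accepting only the three values the
# mpeg2_metadata bitstream filter supports.
# FFMPEG is a hypothetical override (e.g. FFMPEG=echo for a dry run).
mpeg2_retag_dar() {
  local src=$1 dst=$2 dar=$3
  case $dar in
    4/3|16/9|221/100) ;;
    *) echo "unsupported DAR: $dar (use 4/3, 16/9 or 221/100)" >&2; return 1 ;;
  esac
  "${FFMPEG:-ffmpeg}" -i "$src" -map 0 -c copy -bsf:v "mpeg2_metadata=display_aspect_ratio=$dar" "$dst"
}
```

Usage: mpeg2_retag_dar input.vob output.vob 16/9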

Separate from the SAR and DAR metadata, MPEG-2 video sometimes has a "Sequence Display Extension" flag which contains pan-and-scan data for DVD players. This can have an impact on the actual DAR. I had some videos with this flag set and which would display or convert to 16:9 no matter what method I used with FFmpeg to try to get it to be 4:3. The solution was to use FFmpeg to demux the video stream to a .m2v file, then load that into ReStream to remove the Sequence Display Extension data, then use FFmpeg to remux it back to a VOB:

  ffmpeg -i input.vob -c:v copy -an input.m2v
  [load input.m2v in ReStream, remove Sequence Display Extension, write input.0.m2v]
  ffmpeg -fflags +genpts -i input.0.m2v -i input.vob -map 0:0 -map 1:2 -c copy output.vob

Unfortunately the demux/remux process can throw the audio sync off a little. If the audio needs to be delayed, sometimes it can be fixed in the ReStream step by making the frames field of the first GOP timestamp be something a little higher than zero (and taking into account that each frame in the timestamps is 1/64th of a second). For some reason, it doesn't always work, so I recommend using FFmpeg's -itsoffset option instead; see the fix audio sync section below.

Lossy options

If you are OK with transcoding the video, the setsar & setdar filters work, at least for H.264 video. They do not change the stored image dimensions, just the declared shape and interpretation of the pixels. You are only supposed to set one of them (SAR or DAR), not both; the other will be calculated automatically.

For example, if you are starting with anamorphic (horizontally squished) 4:3 or 5:4 video stored in the VOBs of a widescreen (16:9) DVD, and you are cropping the original image to something other than its original 720x480 (NTSC) or 720x576 (PAL), and you want to keep it anamorphic, then you will need to use setsar=32/27 for NTSC or setsar=64/45 for PAL. If you are not cropping, then you can either use those setsar values or you can just use setdar=16/9. This will give you 16:9 DAR, same as what anyone playing the DVD would get from a real DVD player and TV, but there is still a possibility that the source material was not properly converted to 32/27 or 64/45 SAR in the first place, in which case you have to figure out better SAR values on your own (discussion).

Another option is to scale (stretch/squish) the stored video, or pad it with black bars. Here are some filter recipes to do that (source):

Given desired SAR as w/h, shrink/stretch to fit
scale="trunc(iw*sar/([w/h])/hsub)*hsub:trunc(ih/vsub)*vsub",setsar="[w/h]"
Given desired DAR as w/h, pad to fit
pad="trunc(if(lt(dar\,[w/h])\,ih*[w/h]/sar\,iw)/hsub)*hsub:trunc(if(lt(dar\,[w/h])\,ih\,iw/([w/h])*sar)/vsub)*vsub:(ow-iw)/2\:(oh-ih)/2:black",setdar="[w/h]"
Given desired max. width in pixels, shrink if needed
scale="trunc([width]/hsub)*hsub:trunc(ow*sar/dar/vsub)*vsub"
Given desired max. height in pixels, shrink if needed
scale="trunc(oh/sar*dar/hsub)*hsub:trunc(if(gt(ih\,[max_height])\,[max_height]\,ih)/vsub)*vsub"
Crop to match content size (POSIX shell command line; needs adjustment to work on Windows)
`ffmpeg -ss 60 -i SOURCE.EXT -f matroska -t 10 -an -vf cropdetect=24:16:0 -y -crf 51 -preset ultrafast /dev/null 2>&1 | grep -o crop=.* | sort -bh | uniq -c | sort -bh | tail -n1 | grep -o crop=.*`,scale="trunc(iw/hsub)*hsub:trunc(ih/vsub)*vsub"
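
The crop-picking part of that recipe can be pulled out into a reusable function that reads cropdetect's log lines on stdin and prints the most frequent crop=W:H:X:Y. This is a sketch; most_common_crop is a hypothetical helper. The usage line below swaps in FFmpeg's null muxer (-f null -) for the Matroska-to-/dev/null output, which should skip the x264 encode; I have not tested it against the original.

```shell
# Print the most frequent "crop=W:H:X:Y" found in cropdetect log lines on stdin.
most_common_crop() {
  grep -o 'crop=[0-9]*:[0-9]*:[0-9]*:[0-9]*' | sort | uniq -c | sort -rn | awk 'NR == 1 { print $2 }'
}
```

Usage: ffmpeg -ss 60 -i SOURCE.EXT -t 10 -an -vf cropdetect=24:16:0 -f null - 2>&1 | most_common_crop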

Reformat video for iPod Classic

The iPod classic has a 320x240 display and can only play H.264 Baseline Level 3.0 videos with a 4:3 aspect ratio. You can give it up to 640x480 and let it resize for you, but it's best to just make your own videos which are exactly 320x240, letterboxed or pillarboxed as needed.

  ffmpeg -i input.mp4 -vf "scale=320:240:force_original_aspect_ratio=decrease:flags=lanczos,pad=320:240:(ow-iw)/2:(oh-ih)/2" -profile:v baseline -level 3 -movflags +faststart -preset veryslow -c:a copy output.mp4

Unfortunately this is still not good enough to guarantee playback on my iPod. I'm still trying to figure it out.

Embed subtitles

Subtitle fonts

I recommend installing these fonts:

  • Liberation Sans Narrow Bold is based on Arial and looks pretty good when stretched horizontally and given an outline.
  • Tuffy Bold also looks good stretched and with an outline. It has a "young readers" version of the letter g, but a standard a.

In the distant past, I used Tiresias Infofont, which is free and close enough to the non-free Tiresias Screenfont for my eyes. Not the greatest coverage though.

Here are some more candidates with young readers' a & g glyphs:

  • Advantage (Bold aka Demi)
  • Andika Eur
  • Helvetica Textbook LT Roman (Bold)
  • TuffyInfant – this one is styled a little too "Comic Sans" for me

There are many others which only have the young readers' g, or which have the young readers' a only in their italic variants.

Hardsub examples

Here's an example of turning fully formatted softsubs (in a .ass file) into hardsubs:

ffmpeg -threads 2 -i foo.mkv -c:v libx264 -pix_fmt yuv420p -profile:v high -level:v 4 -preset:v veryslow -crf 21 -vf "subtitles=filename=foo.ass:charenc=utf-8" -map 0:v -map 0:a -sn -c:a copy -movflags faststart -f mp4 -y foo_with_hardsubs.mp4

Similarly, here's what I did to transcode an animated film and turn softsubs (in a .srt file) into hardsubs, including adjustments to the subtitle font:

ffmpeg -threads 2 -i foo.mkv -c:v libx264 -pix_fmt yuv420p -profile:v high -level:v 4 -preset:v veryslow -crf 21 -tune animation -vf "subtitles=original_size=film:filename=foo.srt:charenc=cp1252:force_style=FontName='Liberation Sans Narrow Bold,PrimaryColour=&H3322FFFF,Outline=1'" -r 24000/1001 -g 15 -flags -global_header -map 0:0 -map 0:1 -sn -c:a ac3 -b:a 448k -movflags faststart -y foo_with_hardsubs.mkv

The subtitles parameter original_size=film applied to a 1920x1080 video results in the font being horizontally stretched, which looks good (in my opinion) when applied to a bold narrow font; the result is very tightly kerned.

I had trouble getting spaces and directory names to work in the subtitle file path (foo.srt here), so it's best to just use a very simply named file in the current directory. Any help with getting the file from a subdirectory on Windows would be appreciated.

Basically the same but with brightness and contrast adjustments (e.g. for those weirdly dim Studio Ghibli transfers):

ffmpeg -threads 2 -i foo.mp4 -vf "eq=brightness=0.04:contrast=1.2,subtitles=filename=foo.srt:charenc=utf-8:force_style=FontName='Liberation Sans Narrow Bold,PrimaryColour=&H3322FFFF,Outline=2'" -c:v libx264 -pix_fmt yuv420p -profile:v high -level:v 4 -preset veryslow -crf 21 -tune animation -flags -global_header -c:a copy -movflags faststart -y foo_with_hardsubs.mp4

Here we are doing more complex adjustments for yellowish, dim Studio Ghibli transfers, but only after the first 12 seconds (e.g. to preserve the correctly colored Studio Ghibli intro):

ffmpeg -threads 2 -i foo.mkv -filter_complex "[0:v]trim=start=0:duration=12[a];[0:v]trim=start=12,setpts=PTS-STARTPTS[b];[b]eq=brightness=0.13:contrast=1.48:saturation=1.23:gamma=1.5:gamma_r=0.71:gamma_g=0.68:gamma_b=0.76:gamma_weight=0.7,hqdn3d[c];[a][c]concat,subtitles=original_size=film:filename=foo.srt:charenc=utf-8:force_style=FontName='Liberation Sans Narrow Bold,PrimaryColour=&H3322FFFF,Outline=2'[d]" -map [d] -map 0:a -c:v libx264 -pix_fmt yuv420p -profile:v high -level:v 4 -preset veryslow -crf 21 -tune animation -c:a copy -y foo_with_hardsubs.mkv

(Every film's a little different, so adjust brightness, contrast, gamma to taste.)

Here's one I used to transcode and deinterlace DVD video, turning softsubs into hardsubs:

ffmpeg -fflags +genpts -i foo.vob -c:v libx264 -pix_fmt yuv420p -profile:v high -level:v 4 -preset:v veryslow -crf 21 -vf "yadif=mode=3:parity=0,mcdeint=mode=medium,subtitles=filename=foo.srt:charenc=cp1252:force_style=FontName='Liberation Sans Narrow Bold,PrimaryColour=&H3322FFFF,Outline=1'" -map 0:1 -map 0:a -g 15 -flags -global_header -c:a copy -movflags faststart -f mp4 -y foo_with_hardsubs.mp4

Here's what I did to attempt to deinterlace, overlay DVDsubs ("picture"-based subtitles), and transcode a VOB file...but I am overlooking something because it ends up producing 2 video streams in the output, one deinterlaced and the other with hardsubs...help!:

ffmpeg -fflags +genpts -i foo.vob -filter_complex "[0:v]yadif=mode=3:parity=0,mcdeint=mode=medium[1v];[1v][0:3]overlay[v]" -map "[v]" -map 0:a -c:v libx264 -pix_fmt yuv420p -profile:v high -level:v 4 -preset:v veryslow -crf 21 -g 15 -flags -global_header -c:a ac3 -b:a 448k -movflags faststart -f mp4 -y foo_with_hardsubs.mp4

In the filter_complex syntax, filters within a chain are separated with commas and separate chains are separated with semicolons; each chain's inputs and outputs are defined with bracketed labels on its left and right sides, respectively. [0:v] means file #0's video stream; [1v] and [v] are just arbitrary names for outputs so they can be used as inputs to other filters or in the -map option.
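To make the comma/semicolon rule concrete, here is a small Python sketch that assembles a -filter_complex string from chains. The helper name build_filter_complex is hypothetical (it is not part of FFmpeg); it only joins strings, and the example rebuilds the graph from the DVD command above.

```python
def build_filter_complex(chains):
    """Join (inputs, filters, outputs) tuples into one -filter_complex string.

    Filters inside a chain are joined with commas; chains with semicolons.
    """
    parts = []
    for inputs, filters, outputs in chains:
        in_labels = "".join(f"[{label}]" for label in inputs)
        out_labels = "".join(f"[{label}]" for label in outputs)
        parts.append(in_labels + ",".join(filters) + out_labels)
    return ";".join(parts)


graph = build_filter_complex([
    (["0:v"], ["yadif=mode=3:parity=0", "mcdeint=mode=medium"], ["1v"]),
    (["1v", "0:3"], ["overlay"], ["v"]),
])
print(graph)
# [0:v]yadif=mode=3:parity=0,mcdeint=mode=medium[1v];[1v][0:3]overlay[v]
```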

Transcode to a specific frame size & bitrate

My iPhone records video at 1920×1080. The audio doesn't take up much space at all, but the video is H.264 at about 17 Mbps, so it uses up 125 MB per minute. Here is a way to get it down to a more manageable size and more portable container format, along with the 180° rotation ("vflip,hflip") mentioned above:

ffmpeg.exe -i inputfile.mov -acodec copy -b:v 2000k -vf "vflip,hflip,scale=1024:-1" outputfile.mkv

This brings it down to about 15 MB/minute: 2 Mbps video at 1024 pixels wide, where -1 means whatever height will preserve the aspect ratio.
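The arithmetic behind those numbers, as a quick Python sketch (function names are hypothetical; audio and container overhead are ignored, and the height is only approximately what the scale filter computes):

```python
def megabytes_per_minute(mbps):
    """Convert a bitrate in megabits per second to megabytes per minute."""
    return mbps * 60 / 8


def scaled_height(src_w, src_h, dst_w):
    """Roughly the height scale=dst_w:-1 picks: same aspect ratio."""
    return round(src_h * dst_w / src_w)


print(megabytes_per_minute(17))         # 127.5  (original iPhone video)
print(megabytes_per_minute(2))          # 15.0   (after -b:v 2000k)
print(scaled_height(1920, 1080, 1024))  # 576
```

If the computed height could come out odd (which libx264 with yuv420p rejects), scale=1024:-2 keeps the aspect ratio and rounds the height to a multiple of 2.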


Fix audio sync

Sometimes I need to delay the audio a slight amount. I use VLC media player to figure out the amount (play the file, Ctrl+E to open effects window, go to Synchronization tab, experiment with different values).

For some reason, FFmpeg's MKV support is still (as of early 2019) not as robust as it should be. If the source containers are MKV, and you are only remuxing, then I suggest using MKVToolNix GUI instead of FFmpeg. You just add the files, choose the streams you want, select the audio streams and enter the delay (in ms, not s) in the Timestamps section, and start the multiplexing task.

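For other types of containers, FFmpeg offers a couple of options to delay the audio. One is -itsoffset #, an input option that shifts the timestamps of every stream in the input file that follows it. Here it is applied to a demuxed video stream, so a negative offset effectively delays the audio: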

   ffmpeg -fflags +genpts -itsoffset -0.15 -i input.0.m2v -i input.vob -map 0:0 -map 1:2 -c copy output.vob

Another option is to use the -ss # option to simply skip the first however-many seconds of whichever stream you don't want to delay:

   ffmpeg -fflags +genpts -ss 0.15 -i input.0.m2v -i input.vob -map 0:0 -map 1:2 -c copy output.vob

Sometimes these don't work very well.

If the audio is PCM, just demux it, pad the intro with the desired amount of silence, and remux (keeping in mind MPEG PCM must be 16-bit Big Endian):

   ffmpeg -i input.vob out.wav
   sox out.wav out-padded.wav pad 0.18
   ffmpeg -loglevel panic -i input.vob -i out-padded.wav -map 0:1 -map 1:0 -c:v copy -c:a pcm_s16be -y out.vob
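For reference, the padding step itself is simple. Here is a minimal Python sketch of what sox's pad does for leading silence, assuming an uncompressed PCM WAV like the one FFmpeg writes in the first step. The function names are hypothetical, this is not how SoX implements it, and the demo uses an in-memory stand-in for out.wav.

```python
import io
import wave


def pad_wav_start(src, dst, seconds):
    """Write dst as src with `seconds` of digital silence prepended.

    Works on uncompressed PCM WAV only (what the wave module supports).
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        audio = reader.readframes(reader.getnframes())
    pad_frames = round(seconds * params.framerate)
    # 8-bit WAV is unsigned (silence = 0x80); wider PCM is signed (silence = 0).
    silent_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silent_byte * (pad_frames * params.nchannels * params.sampwidth)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # frame count in the header is fixed up on close
        writer.writeframes(silence + audio)


def make_test_wav(seconds=1.0, rate=48000):
    """In-memory 16-bit stereo WAV, standing in for out.wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x01\x00" * 2 * int(seconds * rate))
    buf.seek(0)
    return buf


src = make_test_wav()
dst = io.BytesIO()
pad_wav_start(src, dst, 0.18)
dst.seek(0)
with wave.open(dst, "rb") as padded:
    print(padded.getnframes())  # 56640 = 48000 + 0.18 * 48000
```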

Audio drift

Sometimes audio and video might be out of sync in a way that's more complicated than a simple, constant delay. For example, maybe the longer the video plays, the further behind the video is.

One way to fix this is to speed up or slow down the audio by reinterpreting the sample rate. This will change the pitch but will not alter the audio samples, so it is the least destructive solution. However, this is only useful if the audio drift is constant across the whole video. On one particular DV capture of mine, the audio speeds up and slows down seemingly at random, independently of the video. The only way to fix that is to divide it into segments and adjust each one independently.

Anyway, to reinterpret the sample rate, first calculate the new sample rate, e.g. if the drift in a 32 kHz audio stream is 0.01 s per minute, then 0.01×32000 = 320 samples per minute = 5.333 samples per second. So new sample rate can be about 31995 to slow down or 32005 to speed up. Now you can demux the audio, then use SoX to read it raw, specifying the format with your custom sample rate, and write a new WAV which you can then remux with the video.

  • ffmpeg -i input.avi -c:a copy fast.wav
  • sox -r 31995 -e signed -c 2 -b 16 fast.wav ok.wav
  • ffmpeg -i input.avi -i ok.wav -map 0:0 -map 1:0 -c copy output.avi
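The sample-rate calculation above, as a small Python function (the name is hypothetical). Like the text, it assumes the drift is constant over the whole video; round the result to a whole number before handing it to SoX, as done above.

```python
def drift_corrected_rates(sample_rate, drift_s_per_min):
    """Return (slower, faster) sample rates that cancel a constant drift.

    A drift of d seconds per minute is sample_rate * d / 60 samples per second.
    """
    delta = sample_rate * drift_s_per_min / 60
    return sample_rate - delta, sample_rate + delta


slow, fast = drift_corrected_rates(32000, 0.01)
print(round(slow, 3), round(fast, 3))  # 31994.667 32005.333
```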

The downside of this is the resulting video has an unusual sample rate for the audio. It is possible some players won't like this. In this case I suggest resampling again, e.g. change the sox command above to:

  • sox -r 31995 -e signed -c 2 -b 16 fast.wav -r 32k ok.wav

FFmpeg also offers an option which keeps pitch the same, but it did not work for my situation. I think this is more for files whose timestamps are irregular: it stretches or squeezes the audio so that it matches the audio stream's own timestamps (which is what keeps it in sync with the video), in this example by up to 1000 samples per second:

  • ffmpeg -i input.avi -c:v copy -af "aresample=async=1000" -c:a pcm_s16le output.avi

Processing interlaced content

About interlacing

Interlaced video uses two "fields" of alternating lines to compose one frame. The fields normally represent slightly different moments in time, so they are shown sequentially, with the assumption that an old CRT screen's fading phosphors and your eyes' persistence of vision will allow you to perceive the current and previous field even though they were never completely on-screen at any moment in time. Modern computers and handheld devices instead draw the whole frame so both fields are on-screen at the same time, and that image is held without fading. This tends to result in very obvious "comb" artifacts whenever there's motion, especially horizontal motion. Modern digital TVs usually handle interlaced content specially, either deinterlacing it or otherwise displaying it in a way that looks reasonably good, although you probably will still see the artifacts. (I think interlacing was also less noticeable on old CRT-based TVs because the phosphor screen arranged its "pixels" such that alternating rows were offset like a brick wall. Not sure if that really affected things though.)

Normally when people use the word interlaced, it means each field represents a sequential window of time: basically 1/60th of a second in the NTSC countries, 1/50th in PAL. On old tube cameras, the first part of the field's first row is what was happening in that part of the camera's view at the beginning of that time window, and the last part of the last row is what was happening in that part of the camera's view at the end of that time window, so each successive point in the scan represents a later moment in time. This changed in the late 1980s and beyond, as high-end, CCD-based cameras began to "take a picture" (like a film camera would) 60 times per second and then scan each of those still images, meaning each field is still sequential but represents (essentially) an instant, very much like when film is scanned for the telecine process. There are also cameras or broadcast systems which produce both fields from the same picture, so they output 60 fields per second but are sourced from half that many unique images per second. So interlaced really only refers to a frame being split into fields which may be (and usually are) designated for sequential display.

MPEG-2 and MPEG-4 content is flagged as interlaced or progressive on a per-frame basis. MPEG-4 H.264 (but not H.265) supports two types of interlacing: PAFF and MBAFF. PAFF works the same as in MPEG-2; entire frames are interlaced. MBAFF, on the other hand, allows for interlacing to be tagged and specified on a per-macroblock (partial picture) basis. It is mainly used by broadcasters, e.g. to have the fast-moving parts of the frame be interlaced, and the rest progressive. libx264, as used by FFmpeg, only supports MBAFF. This shouldn't be any different in quality or efficiency for fully interlaced frames, but it is slower to encode than PAFF would be.

Detecting interlaced video

Strangely, it is difficult to know whether video is interlaced. The conventional wisdom is "you can only know by looking at it", which I interpret to mean "no one has yet figured out how to algorithmically detect it with 100% reliability."

MPEG-2 and MPEG-4 containers can provide some hints, but they often don't, or they get it wrong.

If the container says that the content is interlaced, then ffmpeg -i might say "interlaced" or "(tv)" or "top field first" or "bottom field first".

In reality, some content may be only partially interlaced, such as when an interlaced video scene is spliced in the middle of a progressive film. This is more likely to read as progressive at the container level; you don't find out some of it is interlaced until you play it. Another thing that happens sometimes is progressive content gets encoded as interlaced for broadcast such that each field is sourced from the same picture and thus represents the same moment in time. Telecined content is kind of a hybrid of these two scenarios. I have also seen weird combinations, like interlaced animations over a progressive source and encoded as interlaced (in the Duran Duran "Come Undone" video), or interlaced video sourced from zoomed or composited interlaced material (so there are comb effects you can't get rid of) (in the Duran Duran "Rio" video's letterboxed scenes).

FFmpeg has a frame info filter called showinfo. It displays info about each frame, one per line. The interlace mode is indicated by i:P, i:B, or i:T, for progressive, interlaced bottom field first, or interlaced top field first. Here's how to show the info for the first 10 frames:

  ffmpeg -hide_banner -i "input.avi" -an -vf showinfo -frames:v 10 -f null -

FFmpeg also has an interlace detection filter called idet. It looks at the actual images and tags suspected interlaced frames as interlaced, for the benefit of a deinterlace filter like yadif. It can also be used by itself to get statistics which can help you decide whether and how the video is interlaced.

  ffmpeg -hide_banner -i "input.avi" -an -vf idet -frames:v 100 -f null -
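At the end of a run, idet prints summary lines ("Repeated Fields", "Single frame detection", "Multi frame detection"). Here is a rough Python sketch for turning one detection line into a verdict. The 80% threshold is an arbitrary choice of mine, not anything idet defines, and the sample line below uses made-up counts. A "mixed" result is exactly the case described next, where telecine or partial interlacing needs to be checked by eye.

```python
import re


def classify_idet(summary_line, threshold=0.8):
    """Rough verdict from one idet summary line.

    Counts TFF/BFF/Progressive frames; Undetermined frames are ignored.
    """
    counts = {name: int(n) for name, n in
              re.findall(r"(TFF|BFF|Progressive):\s*(\d+)", summary_line)}
    tff = counts.get("TFF", 0)
    bff = counts.get("BFF", 0)
    prog = counts.get("Progressive", 0)
    total = tff + bff + prog
    if total == 0:
        return "undetermined"
    interlaced_share = (tff + bff) / total
    if interlaced_share >= threshold:
        return "tff" if tff >= bff else "bff"
    if interlaced_share <= 1 - threshold:
        return "progressive"
    return "mixed"  # e.g. telecined or partly interlaced; check by eye


# Made-up counts in the layout idet prints at the end of a run:
line = "Multi frame detection: TFF:     0 BFF:    97 Progressive:     2 Undetermined:     1"
print(classify_idet(line))  # bff
```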

There's no flag indicating telecine, but if you step through the video frame by frame in VLC with deinterlacing off, you'll see it in action as (e.g.) 2 obviously interlaced frames followed by 3 that look fine, over and over, possibly with occasional irregularities due to sloppy editing. Likewise, telecined content will be read by idet as having many interlaced and progressive frames mixed together, when in fact it is all interlaced, just such that some fields compose a frame representing the same source picture (because the picture was duplicated across two adjacent frames). FFmpeg also has a (sort of) telecine detection filter which looks for duplicate fields; see the inverse telecine (IVTC) section below.

All DVDs store video as SD and interlaced (slightly under 30 fps for NTSC, or exactly 25 for PAL), so just assume it is interlaced. You can deinterlace content from an NTSC DVD to 480p60 for great results, or 480p30 for OK results. However, content sourced from film might be flagged for the player to IVTC to 480p24. If you do the IVTC yourself, you'll get the very best results, it will be encoded very efficiently, and motion blur will be exactly as the filmmaker intended.

Keep transcoded content interlaced

By default (-top -1, meaning "auto"), FFmpeg goes by whatever field-order information the input carries, and treats the video as progressive if there is none. Sometimes it will know the input is interlaced, but it's better to tell it explicitly by using -top 0 for bottom-field first (standard for DV) or -top 1 for top-field first:

  • -top 0

If the output is an MPEG-2 or MPEG-4 video format, add these flags:

  • -flags +ilme+ildct

If outputting H.264, you need to again specify the field order, typically with bff=1 for bottom-field first (typical for DV) or tff=1 for top-field first (typical for most other content):

  • -x264opts bff=1
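Since the field order has to be stated twice and must agree, a small Python helper can keep the settings consistent. This is a hypothetical sketch that only builds an argument list (it does not run ffmpeg); the file names are placeholders, and the options are placed among the output options.

```python
def keep_interlaced_args(bottom_field_first, h264=True):
    """Output options for keeping interlaced video interlaced.

    The field order has to agree between -top and the x264 option.
    """
    args = ["-top", "0" if bottom_field_first else "1",
            "-flags", "+ilme+ildct"]
    if h264:
        args += ["-x264opts", "bff=1" if bottom_field_first else "tff=1"]
    return args


# DV is bottom-field first; input/output names are placeholders.
cmd = ["ffmpeg", "-i", "input.avi", "-c:v", "libx264",
       *keep_interlaced_args(bottom_field_first=True), "output.mkv"]
print(" ".join(cmd))
# ffmpeg -i input.avi -c:v libx264 -top 0 -flags +ilme+ildct -x264opts bff=1 output.mkv
```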

Oddly, I noticed with these options, FFmpeg produces an H.264 file that confuses the "Yadif 2x" filter in VLC media player. The result is every other frame is repeated, instead of interpolated. I don't really understand it. I can feed the same output file to FFmpeg and its yadif filter handles it just fine. I'm uncertain whether FFmpeg has messed something up or if it's just a bug in VLC.

There is another caveat: H.262 (MPEG-2 video) encodes each field separately, so it is fine to use 4:2:0 (yuv420) color subsampling. H.264, though, apparently encodes the full frame, so its interlacing support is only guaranteed to work with the High 4:2:2 profile or better; you can force it to use 4:2:0 but the chroma will be smeared vertically, and thus temporally. Many playback devices don't support 4:2:2 (even Blu-Ray is 4:2:0), so most people do not bother trying to use H.264 with interlaced material.

Fix interlaced field order

If an MPEG-2 video file's metadata says it is interlaced with bottom field first, when in fact it is encoded top field first (or vice-versa, or progressive), then you have to edit the flags in the video stream itself:

  1. Demux the video to an elementary stream, e.g.: ffmpeg -i input.mpg -c:v copy tmp.m2v
  2. Load the stream in Restream and edit its metadata (tick or untick the "top field first" box); click Write. It will write a new file with ".0" appended to the main part of the filename.
  3. Remux the video with FFmpeg (or MP4Box or whatever), e.g.: ffmpeg -i tmp.0.m2v -i input.mpg -map 0:0 -map 1:1 -c:v copy -c:a copy fixedoutput.mpg

FFmpeg will complain about the elementary stream not having timestamps. This is normal and should be OK to ignore. In theory, specifying the framerate, e.g. -r 30000/1001, should eliminate the warning, but it does not, last I checked.

Restream does not work for MPEG-4, e.g. H.264 video. To do it right, you have to re-encode the source video correctly. If you transcode it and try to apply the appropriate deinterlacing, it might be mostly OK, but probably will have some glitches, and possibly some gnarly "ghost" effects in scenes with motion. For example, a progressive clip made from an interlaced source without any deinterlacing will have spatial and temporal smearing, especially in the chroma.

Basic deinterlace

In FFmpeg, there are several deinterlacers to choose from:

  • yadif (Yet Another De-Interlace Filter) is the most popular.
  • yadif combined with mcdeint (motion-compensating de-interlacer) can provide better results than yadif alone.
  • bwdif (Bob Weaver De-Interlace Filter) supposedly improves upon yadif.
  • nnedi uses slow neural networks, but is supposedly excellent for live-action.

Each has pros and cons. I usually use yadif, or yadif combined with mcdeint. According to the Avisynth wiki, yadif "checks pixels of previous, current and next frames to re-create the missed field by edge-directed interpolation and uses a spatial check to prevent most artifacts."

In FFmpeg, -deinterlace works, but is deprecated and is now just an alias for -vf yadif. This produces deinterlaced output at the input framerate, i.e. 1 frame for every 2 fields. Fast motion will be blurry but smooth. Bitrate will be reasonable.

You get better quality if you use -vf yadif=mode=1, which outputs 1 frame for each field, so scenes with fast motion will look more like the original, but this also bloats the bitrate.

I find that for DV content, which is interlaced bottom-field first, I need to explicitly tell the yadif filter the field order, so ultimately I'm doing something more like this: -vf "yadif=mode=1:parity=1"

If there is a mix of interlaced and progressive frames, e.g. an interlaced scene in the middle of a progressive film or vice-versa (happens on DVDs sometimes), you can also add the deint=1 option. It tells the filter to only process frames which are tagged as interlaced. This is commonly used after the idet or fieldmatch filters, e.g. -vf idet,yadif=mode=1:deint=1.
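The numeric yadif options are easy to mix up, so here is a hypothetical Python helper that maps readable names onto them (mode 0 to 3 = frame, field, frame without spatial check, field without spatial check; parity 0 = top field first, 1 = bottom field first, -1 = auto). It just builds the filter string, reproducing the two examples above.

```python
YADIF_MODES = {"frame": 0, "field": 1, "frame_nospatial": 2, "field_nospatial": 3}
YADIF_PARITY = {"auto": -1, "tff": 0, "bff": 1}


def yadif(mode="field", parity="auto", only_flagged=False):
    """Build a yadif filter string from readable option names."""
    opts = [f"mode={YADIF_MODES[mode]}"]
    if parity != "auto":  # auto (-1) is yadif's default, so leave it out
        opts.append(f"parity={YADIF_PARITY[parity]}")
    if only_flagged:  # deint=1: only touch frames tagged as interlaced
        opts.append("deint=1")
    return "yadif=" + ":".join(opts)


print(yadif(parity="bff"))                 # yadif=mode=1:parity=1  (DV)
print("idet," + yadif(only_flagged=True))  # idet,yadif=mode=1:deint=1
```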

Advanced

For my conversions from DV, I am going to be using this in the future:

-vf "yadif=mode=3:parity=1,mcdeint=mode=medium"

YADIF mode 3 is like mode 1, but skips the spatial interlacing check, and we let the motion-compensating MCDeint filter clean up the result instead. My testing indicates this results in a slightly crisper image and better handling of fast horizontal motion. Omit mode=medium for a slightly faster encode. I did not notice a difference quality-wise (as compared to the default "fast" mode), but medium is not that slow, so why not use it?

I tried using extra_slow instead of medium, and found that aside from being intolerably slow, there is a tradeoff: this mode gives you smoother gradients and less aliasing/shimmer (aliasing is when diagonal lines or close-together horizontal lines seem to flicker or look stair-steppy), but you also get a slight loss of detail, as fine details get mistaken for things that need to be blurred.

mcdeint's qp=10 helps improve the aliasing even further, but does not really improve on the loss of detail.

With libx264's -preset veryslow as opposed to -preset veryfast, there is a slight improvement in detail with MCDeint's medium mode. Weirdly, the detail gets more error-prone in MCDeint's extra_slow mode. So maxing out the settings maybe isn't ideal.

QTGMC

Another deinterlace option is using AviSynth with QTGMC. In my own experiments, it does as good a job as the pure FFmpeg options above. Its main advantage seems to be that it detects and eliminates "shimmering" and other post-deinterlace artifacts. It is also a bit slow.

I recommend QTGMC for the deinterlace if you are already going to be using AviSynth for other processing of the video, or when you are getting shimmering with a pure FFmpeg solution.

Inverse telecine

Telecine content is typically 24 fps film which has been "pulled down" to 29.97 fps (59.94 "fields" of alternating lines per second) in a process that slows the film down by 0.1% and then combines interlacing and duplication: every other frame becomes 2 fields, and every frame in between becomes 3 fields. Technically it is all interlaced, but it works out such that (for NTSC at least) a "3:2" pattern repeats: 3 frames which might casually be described as progressive or non-interlaced (because both fields are from the same moment in time) followed by 2 frames which are each very obviously interlaced (combining two different moments in time). That's "hard telecine"; there is also "soft telecine" which is 23.976 fps progressive frames, but encoded with flags that tell the player to do the 3:2 pulldown on the fly. ffmpeg -i does not look deeply enough into the file to detect telecine; soft telecine gets reported as progressive, hard telecine is reported the same as interlaced (top or bottom).
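To make the cadence concrete, here is a toy Python model of hard telecine. It is a simplification of my own: each letter stands for one film frame, fields are just paired up in order (real material also alternates top/bottom parity, and the cadence can start at any phase), and the 0.1% slowdown is ignored. Four film frames become five video frames, two of which mix two film frames.

```python
def pulldown(film_frames, cadence=(2, 3)):
    """Toy hard telecine: repeat each film frame for 2 or 3 fields,
    then pair consecutive fields into interlaced video frames."""
    fields = []
    for i, frame in enumerate(film_frames):
        fields.extend([frame] * cadence[i % len(cadence)])
    return [tuple(fields[j:j + 2]) for j in range(0, len(fields) - 1, 2)]


video = pulldown("ABCD")
combed = [pair for pair in video if pair[0] != pair[1]]
print(video)  # [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
print(f"{len(video)} video frames, {len(combed)} combed")  # 5 video frames, 2 combed
```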

Undoing this process is "inverse telecine" (IVTC) and it can be tricky. The question is, does it really matter? Well I would say yes, if something was shot on film at 24 fps, fast motion may blur in ways that don't match the way it blurs when you watch the film, and this may bug you. Or not. But then you also have to consider that after you have applied the IVTC filters, you now probably have to compress that video again, so you may want to keep a copy around of your "lossless" original content if you can play it as-is.

Here is a filter chain that I have somewhat successfully used for IVTC:

  • ffmpeg -i input.vob -vf fieldmatch,yadif=deint=1,decimate -b:v 6000k -maxrate 7000k -bufsize 1835k -c:v mpeg2video -c:a copy out.vob

or you can use -vf fieldmatch,bwdif=deint=1,decimate if you want to use bwdif instead of yadif. For MPEG-2 in MKV input, use the fps filter first to set the input framerate to whatever it actually is (e.g. 30000/1001); otherwise, FFmpeg gets confused and thinks the field rate is the framerate.

fieldmatch looks for combing, then assumes or decides which fields belong together, and it tags the relevant frames as interlaced. yadif with the deint=1 parameter deinterlaces the tagged frames; 1 in 5 frames will then be a duplicate of the one before it. decimate compares frames in groups of 5 (by default) and removes the 1 frame it thinks is most likely a duplicate. Thus from every 30 frames of video input you get 24 frames of output (or you can think of it as 4 frames out for every 5 in). More precisely, given 29.97 fps NTSC input, you get the ever-so-slightly slow rate of 23.976 fps. If you want a perfect 24 fps, you would have to retime the frames (e.g. with setpts=PTS*1000/1001) and speed up the audio by the same amount to keep it in sync; the fps=24 filter alone would just insert an occasional duplicate frame. A speed difference of 0.1% is undetectable by human eyes and certainly ears as well; e.g. it is the difference between 440 Hz and 439.56 Hz. So is it worth the fuss to make it a perfect 24? No.
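The rate arithmetic above can be checked exactly with Python's fractions module; this sketch just restates the numbers from the text.

```python
from fractions import Fraction

ntsc = Fraction(30000, 1001)        # "29.97" fps NTSC video
after_ivtc = ntsc * Fraction(4, 5)  # decimate keeps 4 of every 5 frames
slowdown = 1 - after_ivtc / 24      # relative to true 24 fps film

print(after_ivtc, round(float(after_ivtc), 3))    # 24000/1001 23.976
print(slowdown, round(float(slowdown) * 100, 3))  # 1/1001 0.1  (percent)
print(round(float(440 * (1 - slowdown)), 2))      # 439.56 (Hz)
```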

Well, this fieldmatch/yadif/decimate chain either works flawlessly, or it fails and causes jumps or leaves behind some interlaced frames.

When the decimate filter reports that some of the non-discarded frames are still interlaced, it means the 3:2 pulldown pattern was not consistent. If this happens for one or two frames in the whole file, especially if they occur at scene changes, then you can probably ignore it and say the IVTC worked well enough. But when there are a bunch of these warnings, my advice is to try to do it with AviSynth first, or just do a regular deinterlace and give up on a proper IVTC for now. With FFmpeg, it's too much work to determine exactly what went wrong and how to fix it.

Nevertheless, if you really want to go into it...

It could be that something went wrong before or during the telecine process, or the content was edited improperly afterward, or there's a deliberate mix of telecine and progressive or fully interlaced content.

When telecine content has been edited after the telecine process, sometimes frames are missing and it might not be possible to find a matching field for some of the interlaced frames. For this I've seen a recommendation of using dejudder,fps=30000/1001 before the fieldmatch. The dejudder filter I think just drops the incomplete interlaced frames, although I don't know how it really works. The fps filter inserts duplicate frames to make up for the dropped ones.

When transcoding hard-telecine content from old NTSC DVDs which were apparently converted from PAL sources, I noticed that the fieldmatch filter with default settings lets some combed frames slip through, because there is a "blended field" in some of the frames that are supposed to be progressive—i.e. one field is like an interlaced frame, itself, and is probably safe to discard. Taking care of this is very difficult and I have yet to figure out a way to do it. One thing that definitely does not work is using fieldmatch=combmatch=full. The documentation even gives this as an example of "advanced" IVTC, but it did not work well for me. Normally fieldmatch assumes which fields belong together based on field order, except during scene changes. combmatch=full tells it to never assume, and instead try to figure out which fields belong together based on whether there's a combination that doesn't look interlaced.

On top of this, I'm seeing on some DVDs that chroma is interlaced separately from luma. Diagnosing and fixing this is a lot of effort and I have not yet figured out how to do it. Some good forum discussion about this exists.

There is another filter, pullup, which can be used in combination with the framerate (-r) option to achieve a similar effect, e.g. -vf pullup -r 24000/1001. It seems to be an older, less capable filter, so I recommend sticking to fieldmatch.

I have some telecine content on DVD (music videos by the French band Air) which looks like crap no matter which method I use (pullup or fieldmatch,yadif,decimate). I think it is just not a good transfer in the first place.

Reportedly, FFmpeg is good, but not quite as good at IVTC as AviSynth. See my AviSynth info for more on that.

Convert DV AVI to H.264 MP4

Since 2016 I've been digitizing the analog video signal from a VCR by running it through a camcorder which outputs DV format over a FireWire cable. I'm using Sony's PlayMemories Home (version 5.4.02.06120, the last version to support tape-based camcorders; do not upgrade to 6.0!) to capture the camera's data stream (720x480, 29.97 Hz, interlaced DVCPRO—a.k.a. DVCPRO25, dvsd, dvvideo, or consumer DV) and put it into AVI containers.

The camera has a choice of audio during the transfer: 12-bit or 16-bit. Either way, it's stereo. If you choose 12-bit, the output is actually 16-bit 32 kHz PCM, but from a 12-bit source. There's broadband hiss associated with this, but it is not distracting. If you choose 16-bit, it's true 16-bit 48 kHz PCM. The 12-bit mode is fine for typical home-movie audio, e.g. speech and background sounds, but for tapes with Hi-Fi music on them, I only use 16-bit mode. If I end up needing to downsample to 32 kHz mono, I do this afterward, deleting tmp.bat when done:

 echo ffmpeg -i "%~1" -c:v copy -c:a pcm_s16le -ac 1 -af aresample=resampler=soxr -ar 32000 -y out.avi ^&^& mv out.avi "%~1" > tmp.bat & start /low for /f "delims=" %x in ('dir /b *.avi') do tmp.bat "%x"

Anyway, there are problems with the resulting DV-AVI files:

  • Huge files: about 13 GB per hour.
  • DV's 4:1:1 YUV subsampling results in desaturated, fuzzy color (color is sampled at ¼ horizontal resolution—i.e., on each line, each set of 4 pixels gets their average color).
  • Interlaced output looks bad when viewed on computers.

The file size can be reduced by transcoding to a more efficient format like H.264 (MPEG-4 AVC) for the video and AAC-LC for the audio. I can't undo the damage caused by the chroma subsampling, but I can make it look less washed-out (but somewhat cartoony) by applying a saturation filter (hue=s=#). The annoying "comb" effect from interlacing can be mitigated, at a cost, by using a deinterlace filter (yadif). The H.264 codec can also be optimized for grainy video, which my old SLP-mode VHS clips tend to have. Here is an example:

 ffmpeg -i inputfile.avi -vf "yadif,hue=s=1.25" -c:v libx264 -preset veryslow -crf 20 -tune grain -c:a aac -b:a 160k outputfile.mp4

With these settings, the output MP4 is about 17% the size of the input AVI, so about 2.2 GB per hour. The video data rate is about 5 Mbps. I think it looks pretty good.

-crf 20 sets quality level 20 (23 is default, lower is better but has diminishing returns, I sometimes use 15). aac is the native AAC encoder, which is better than libfaac but not as good as libfdk_aac (which isn't in my build of FFmpeg).

Here's an example of using 2-pass encoding, which requires specifying a target bitrate rather than quality:

 ffmpeg -i input.avi -vf "yadif,hue=s=1.25" -c:v libx264 -preset veryslow -pass 1 -b:v 11000k -an -f null -y NUL
 ffmpeg -i input.avi -vf "yadif,hue=s=1.25" -c:v libx264 -preset veryslow -pass 2 -b:v 11000k -y output.mp4

2-pass is only for getting a higher quality when using a target bitrate, i.e. it does not help if you are using -crf.

I was using a 1.4 saturation filter but found it was a little too unnatural.

Starting mid-2018, this is what I plan to use for the home movies, optionally with -ac 1 when the audio is mono, and omitting or toning down the saturation boost when the content was recorded directly with the DV camera:

 ffmpeg -i input.avi -vf "yadif=mode=3:parity=1,mcdeint=mode=medium:qp=10,hue=s=1.25" -maxrate 11M -bufsize 14M -c:v libx264 -preset medium -crf 20 -c:a aac -b:a 128k -pix_fmt yuv420p -profile:v main -level:v 3.1 output.mp4

The bitrate caps are to ensure streaming to my old Blu-Ray player will work, but with CRF 20 I doubt the bitrate will ever reach the limit. Quality doesn't really get any better without adjusting the CRF further downward, which bloats the bitrate. Changing the profile to High 4.1, or using -preset slow or veryslow will reduce the bitrate and file size further (maybe 10%) but encoding speed drops by half and quality is virtually unaffected for this type of video.

If you have ideas on better settings to use, please let me know!

If you want to output to .VOB files for use with any DVD player, you have to use the older H.262 format (MPEG-2 Video) with 4:2:0 subsampling, and MP2 or AC-3 audio. The video quality can still be very good overall; the format is just not as efficient.

H.264 capabilities

H.264 profiles and levels help optimize the encoded video for different classes of playback devices.

  • Profiles are basically feature sets for different targets:
    • Baseline (BP) = most compatible with cheapest, slowest devices
    • Main (MP) = standard for mainstream/consumer devices, DVD grade
    • Extended (XP) = Main, plus better support for streaming (but not supported in FFmpeg)
    • High (HP or HiP) = for basic broadcast and other HD devices, Blu-Ray grade
    • High 10 (Hi10P) = High, plus support for 10 bpp
    • High 4:2:2 (Hi422P) = High 10, plus support for 4:2:2 chroma subsampling
    • High 4:4:4 Predictive (Hi444PP) = adds support for 4:4:4, 14 bpp, lossless, etc.
  • Levels mandate maximum video bitrates and macroblock rates (which imply reasonable frame sizes & rates for high quality):
    • Level 3 max bitrate = 10 Mbps (BP/MP/XP), 12.5 Mbps (HiP), 40 Mbps (Hi422P/Hi444PP)
    • Level 3.1 max bitrate = 14 Mbps (BP/MP/XP), 17.5 Mbps (HiP), 56 Mbps (Hi422P/Hi444PP)
    • Level 3 max frame = ~ 720×576 @ 25 fps or 720×480 @ 30 fps or 352×480 @ 60 fps
    • Level 3.1 max frame = ~ 1280×720 @ 30 fps or 720×576 @ 60 fps
    • See more details at http://blog.mediacoderhq.com/h264-profiles-and-levels/
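To sanity-check a frame size, frame rate, and bitrate against a level, here is a rough Python sketch. The limits are as I understand them from Table A-1 of the H.264 specification (maximum macroblocks per second, maximum frame size in macroblocks, maximum bitrate), with the usual profile multipliers; double-check before relying on them. It ignores the extra per-dimension constraints the spec also imposes.

```python
# Per level: (max macroblocks per second, max frame size in macroblocks,
#             max bitrate in kbps for Baseline/Main/Extended)
LEVELS = {
    "3.0": (40500, 1620, 10000),
    "3.1": (108000, 3600, 14000),
    "3.2": (216000, 5120, 20000),
    "4.0": (245760, 8192, 20000),
    "4.1": (245760, 8192, 50000),
    "4.2": (522240, 8704, 50000),
}
# Max-bitrate multiplier per profile, relative to Main
PROFILE_FACTOR = {"baseline": 1.0, "main": 1.0, "high": 1.25,
                  "high10": 3.0, "high422": 4.0, "high444": 4.0}


def macroblocks(width, height):
    """16x16 macroblocks per frame, rounding partial blocks up."""
    return -(-width // 16) * -(-height // 16)


def max_fps(width, height, level):
    """Highest frame rate the level allows at this size (0 if the frame is too big)."""
    mb_per_s, max_frame_mbs, _ = LEVELS[level]
    frame_mbs = macroblocks(width, height)
    return 0.0 if frame_mbs > max_frame_mbs else mb_per_s / frame_mbs


def max_bitrate_kbps(level, profile="main"):
    return LEVELS[level][2] * PROFILE_FACTOR[profile]


print(max_fps(720, 480, "3.0"), max_fps(720, 576, "3.0"))  # 30.0 25.0
print(max_fps(1280, 720, "3.1"), max_fps(720, 480, "3.1"))  # 30.0 80.0
print(max_bitrate_kbps("3.1", "high"))                      # 17500.0
```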

To use these features, add to your command line:

  • -profile:v profile, where profile is one of baseline, main, high, high10, high422, or high444.
  • -level:v level, where level is one of 3.0, 3.1, 3.2, 4.0, 4.1, 4.2, 5.0, or 5.1.

If you force the bitrate to be higher than the Profile & Level combo supports, then the file will probably only work in software players.

For example, in one case I was encoding for 720x480 60 fps 4:2:2 devices at 15 Mbps. FFmpeg automatically selected High 4:2:2 level 4, but I could force it to be higher if I wanted.

To play 4:2:2 content on my 4:2:2-incapable devices, I have to configure my media server to transcode it; see User:Mjb/Serviio.

If I want a format I can serve natively to all of my devices, I need to encode 4:2:0 at max 11 Mbps, and profile Main level 3.1. Level 4.0 is possible too, but riskier.

Some recommendations

  • SD NTSC native 480i30: -profile:v main -level:v 3.0
  • SD NTSC IVTC'd to 480p24: -profile:v main -level:v 3.0
  • SD NTSC deinterlaced to 480p30: -profile:v main -level:v 3.0
  • SD NTSC deinterlaced to 480p60: -profile:v main -level:v 3.1

When the peak bitrate of all streams combined (not just the video) approaches the nominal limits of 10 Mbps (Main 3.0) or 14 Mbps (Main 3.1), I suggest going one higher on the level—e.g. 3.2 instead of 3.1.

The Roku Stick (2017 model) can supposedly handle H.264 at 10 Mbps average, 15 Mbps peak, either Main or High 4.0 or 4.1.

Chroma subsampling

FFmpeg's libx264 encoder uses 4:2:2 subsampling (color at ½ horizontal, full vertical resolution) by default when the input is 4:1:1 (as with NTSC DV), presumably because x264 can't encode 4:1:1 and 4:2:2 is the closest format that keeps all of the color detail. 4:2:2 requires the High 4:2:2 profile, though, so if you ask for Baseline, Main, or High, you also have to convert to 4:2:0. Of course, the result can never be better than the 4:1:1 input. So for greater compatibility with playback devices and apps, I use -pix_fmt yuv420p to have the codec use 4:2:0 (color at ½ horizontal and ½ vertical resolution). This will naturally be worse than 4:2:2, but the difference really is not that visually significant on deinterlaced material; see https://www.red.com/red-101/video-chroma-subsampling for examples.
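To make the subsampling fractions concrete, this small Python sketch (my own illustration, not anything FFmpeg exposes) computes the chroma plane sizes for a 720×480 frame:

```python
# Divisors applied to the chroma planes: (horizontal, vertical)
SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
}


def chroma_plane(width, height, scheme):
    """Width and height of each of the two chroma planes (Cb and Cr)."""
    h_div, v_div = SUBSAMPLING[scheme]
    return width // h_div, height // v_div


def samples_per_frame(width, height, scheme):
    """Luma samples plus both chroma planes for one frame."""
    cw, ch = chroma_plane(width, height, scheme)
    return width * height + 2 * cw * ch


for scheme in ("4:1:1", "4:2:0", "4:2:2"):
    print(scheme, chroma_plane(720, 480, scheme), samples_per_frame(720, 480, scheme))
```

4:1:1 (180×480 chroma) and 4:2:0 (360×240 chroma) carry the same number of samples per frame, just arranged differently, so going from 4:1:1 to 4:2:0 halves the vertical color resolution. 4:2:2 (360×480 chroma) is the only one of the three that keeps every 4:1:1 chroma sample.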

Avidemux

I experimented with using Avidemux, which is a free video editor like VirtualDub. It can use FFmpeg's libraries, among others, when converting output. It can also do lossless editing and splitting. I may use it to split some huge DV AVIs into DVD-R sized pieces. Unfortunately, it is very crash-prone, so I can't recommend it. It also may not do the splits correctly (or at least not to FFmpeg's liking).

In order to improve the look of DV captures of LP-mode VHS recordings of analog cable broadcasts, I tried the following filter chain:

  • ChromaShift (U: -5, V: -4) to get the color fields in sync; this may vary by tape and source.
  • dgbob (mode 1, order 0, threshold 0) for bob deinterlacing (doubles the framerate, but motion is smooth).
  • MPlayer hue (hue -4 to -15, sat 1.0) to make blues blue instead of purple, etc.
  • MPlayer eq2 (cont 1.04, brigh 0.02, sat 1.37) to boost brightness/contrast/gamma/saturation and make neutral colors neutral; figuring out ideal settings is difficult!
  • blacken borders (left: 22, right: 22) to simplify the left and right edges, and to crop out the pixels made colorless by ChromaShift.

x264 encoder settings (General):

  • Preset: veryslow
  • Tuning: grain
  • Profile: baseline
  • Fast First Pass [off]
  • Encoding Mode: Video Size (Two Pass)
  • Target Video Size: depends on destination. To fill up a single-layer DVD-R, I think 4300 MB should be OK. It seems the doubled framerate from dgbob throws the estimate off by roughly half, so I have to double the target size I enter here.

x264 encoder settings (Output 1):

  • Predefined aspect ratio: 8:9 (NTSC 4:3) - This is the pixel aspect ratio (PAR) to tag in the output file, and setting it to 8:9 makes it match the DV input. The Avidemux wiki says not to change this, but if I don't, the output defaults to (I think) PAR 1:1, so it has display aspect ratio (DAR) 3:2 (because of DV's 720x480 storage) and looks slightly stretched horizontally when played. Setting PAR 8:9 tells the player that each pixel is slightly narrower than it is tall.
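The relationship between storage size, PAR, and DAR is just a multiplication. A quick Python check with exact fractions (the function names are my own):

```python
from fractions import Fraction


def display_aspect(width, height, par):
    """DAR = storage aspect ratio (width/height) times pixel aspect ratio."""
    return Fraction(width, height) * par


def pixel_aspect(width, height, dar):
    """PAR needed to get a given DAR from a given storage size."""
    return dar / Fraction(width, height)


print(display_aspect(720, 480, Fraction(1, 1)))  # 3/2: stretched
print(display_aspect(720, 480, Fraction(8, 9)))  # 4/3
print(pixel_aspect(720, 480, Fraction(4, 3)))    # 8/9
```

FFmpeg's -aspect 4:3 option works from the other direction: it sets the DAR, and for 720x480 storage that implies the same 8:9 PAR.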

Avidemux to filter, FFmpeg to encode

It seems to be impossible to stop Avidemux from creating 4:2:0 output. In order to get 4:2:2, I can use Avidemux just for filtering, outputting a lossless video file (which makes this impractical for long clips). Then I can do the FFmpeg encoding from the command line. What a mess!

Example workflow:

  • Prep the audio
    • ffmpeg -i "input.avi" -vn -c:a copy "tmp.wav"
    • Process tmp.wav in an audio editor to adjust channels, EQ, resample, normalize to -3 dB peaks, reduce noise, etc.
  • Prep the filtered video
    • In Avidemux, load input.avi and set up the video filter chain as desired
    • Set the Video Output to (FF)HuffYUV - this is a lossless format using about 1 GB per minute!
    • Set Audio Output to Copy
    • In Audio > Select Track, disable Track 1. Don't set it to use tmp.wav because it will add a glitch at the end.
    • Set the Output Format to AVI Muxer
    • See below for how to save the current settings for use with other video files
    • Save to tmp.avi - this will be 3:2 (default for 720x480 with 1:1 pixels) but we'll fix it during compression
  • Calculate the target bitrate
    • An online bitrate calculator (there are many) can help. My Blu-Ray player can only handle ~17 Mbps video and I find 15 Mbps (15000 kbps) is usually plenty.
  • Compress the video and audio into one H.264 MP4 in two passes (the first pass's output goes to NUL, the Windows null device):
    • ffmpeg -y -i tmp.avi -i tmp.wav -map 0:0 -map 1:0 -c:v libx264 -preset veryslow -tune grain -pass 1 -b:v 15000k -aspect 4:3 -c:a aac -b:a 128k -shortest -f mp4 NUL
    • ffmpeg -y -i tmp.avi -i tmp.wav -map 0:0 -map 1:0 -c:v libx264 -preset veryslow -tune grain -pass 2 -b:v 15000k -aspect 4:3 -c:a aac -b:a 128k -shortest output.mp4
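If you'd rather not rely on an online calculator, the arithmetic is simple. A minimal Python sketch, assuming decimal megabytes and ignoring container overhead (so leave some slack); the function names are hypothetical:

```python
def video_kbps_for_size(target_mb, duration_s, audio_kbps=128):
    """Average video bitrate (kbps) that fills target_mb, minus the audio share.

    Assumes decimal megabytes (1 MB = 8000 kilobits) and ignores container
    overhead, so aim a little below the result.
    """
    return target_mb * 8000 / duration_s - audio_kbps


def size_mb_for_bitrate(video_kbps, duration_s, audio_kbps=128):
    """Approximate output size in decimal MB for the given bitrates."""
    return (video_kbps + audio_kbps) * duration_s / 8000


# 15000k video + 128k audio, one minute
print(round(size_mb_for_bitrate(15000, 60), 1))
# 4300 MB over a two-hour running time
print(round(video_kbps_for_size(4300, 2 * 60 * 60)))
```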

For applying the same filters to multiple files, save a project with the filters you want, then edit the file and remove or comment out everything you don't need. For example, in the following project script, I replaced the comment at the top and commented out the portion specific to a particular video file, so that I can now run it after loading any other video:

   # this script sets the following Avidemux options:
   #
   # HuffYUV (lossless) video compression
   # no audio
   # output to AVI container w/OpenDML extension (allows files over 4 GB)
   # filters for processing my VHS rips
   #
   adm = Avidemux()
   #adm.loadVideo("C:/path/to/some/video.avi")
   #adm.clearSegments()
   #adm.addSegment(0, 0, 28895531)
   #adm.markerA = 0
   #adm.markerB = 28895531
   adm.videoCodec("HUFFYUV", "encoderType=0")
   adm.addVideoFilter("chromashift", "u=-5", "v=-4")
   adm.addVideoFilter("dgbob", "thresh=0", "order=False", "mode=1", "ap=False")
   adm.addVideoFilter("hue", "hue=-4.000000", "saturation=1.000000")
   adm.addVideoFilter("eq2", "contrast=1.040000", "brightness=0.020000", "saturation=1.370000", "gamma=1.270000", "gamma_weight=1.620000", "rgamma=0.990000", "bgamma=1.080000", "ggamma=1.010000")
   adm.addVideoFilter("blackenBorder", "left=22", "right=22", "top=0", "bottom=0")
   adm.audioClearTracks()
   adm.setSourceTrackLanguage(0,"eng")
   adm.setContainer("AVI", "odmlType=1")

Save this code into the avidemux "custom" directory, with whatever filename you want, e.g. %appdata%\avidemux\custom\VHS rip filters.py. Restart Avidemux, load a video, and then choose that script from the Custom menu, and the settings should all take effect.

Replace audio in a DVD file

Music videos in MPEG-2 format (.vob or .mpg files) sometimes come with bad source audio and I will want to replace the audio with a good copy of my own. Of course, I have to pay close attention and make sure that the audio in the video is the same; videos often use custom edits or they overdub other sounds on top. Assuming I have suitable audio in a lossless format like FLAC, here's what I do to replace it:

First, extract the source audio to a WAV file:

  • ffmpeg -i input.vob output.wav

Note: if the original audio was lossy, the resulting WAV will probably be longer than the original recording because it includes encoder delay & padding, possibly also decoder delay (i.e. a bunch of silence at the beginning, and a little bit at the end). It's best if you can figure out how much there is and trim it. However, I don't know a good way to do that!

Next, use a wave editor (I use Audition) to create a new WAV that is perfectly time-aligned with the old. There are different ways of doing this. Here's one way:

  • Convert the replacement to the desired output sample rate.
  • Pick a non-silent spot at the beginning and end of the original file to be the anchor points. You are looking for spots that you can find in both the old and new files. Set a marker at each spot. (In the original, markers at samples 28936 and 8591175; in the replacement, at samples 4568 and 8560470).
  • How many samples are in between the markers? (8591175-28936=8562239 and 8560470-4568=8555902) Your goal is to change the replacement to match the original.
  • What's the original:replacement ratio? (8562239/8555902=1.0007406583198358279466034089685)
  • Do a pitch shift (i.e. a speed change, so the duration changes too) on the replacement, with a target duration equal to its current duration in samples multiplied by that ratio. (The replacement was 8659287 samples long: 8659287*1.0007406583198358279466034089685=8665700)
  • Check how many samples are in between the markers. (8566809-4571=8562238) It should be really close to the original now. If not, figure out what you did wrong and try again.
  • Pad or trim silence from the beginning so that the first marker is at the same location as the first marker in the original. (28936-4571=24365 padding to add). If the beginning is offset or not silent, apply a fade-in beforehand.
  • Fade and/or pad the end, so that the total duration is the same as the original.
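The arithmetic in the steps above can be scripted. This Python sketch reuses the example's marker positions; stretch_plan and leading_pad are hypothetical helper names, and the first-marker position passed to leading_pad (4571) is the one measured after stretching, as in the example:

```python
def stretch_plan(orig_marks, repl_marks, repl_length):
    """Ratio and target length for stretching a replacement track.

    orig_marks, repl_marks: (first, second) marker positions in samples.
    repl_length: current total length of the replacement, in samples.
    """
    orig_span = orig_marks[1] - orig_marks[0]
    repl_span = repl_marks[1] - repl_marks[0]
    ratio = orig_span / repl_span
    # Integer math avoids float error; // truncates, matching the worked example.
    new_length = repl_length * orig_span // repl_span
    return ratio, new_length


def leading_pad(orig_first_mark, stretched_first_mark):
    """Samples of silence to add at the start (negative means trim)."""
    return orig_first_mark - stretched_first_mark


ratio, new_length = stretch_plan((28936, 8591175), (4568, 8560470), 8659287)
print(ratio, new_length)
# 4571 is where the first marker landed after stretching
print(leading_pad(28936, 4571))
```

Truncating reproduces the 8665700 target from the example; rounding would give 8665701 instead, which is a one-sample difference either way.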

Now you need to mux them together. What is supposed to work is this:

  • ffmpeg -i input.vob -i new.wav -map 0:1 -map 1:0 -vcodec copy -acodec copy out.vob

-map 0:1 means use stream #1 from file #0 (input.vob), and -map 1:0 means use stream #0 from file #1 (new.wav).

Unfortunately, FFmpeg currently doesn't like to mux PCM audio into a VOB or MPEG-2 container without giving all kinds of packet too large / buffer underflow errors. Supposedly this was fixed, but it's not working for me, so...

The solution is to use MPEG Video Wizard DVD. In that app: Drag the video into the timeline's video bar, right-click on it and choose to mute its audio (otherwise it will mix them together). Drag the audio into the timeline's music bar. Click on Export (looks like a videotape) in the main toolbar, and make sure it's going to do a stream copy.

A more advanced example:

  • downloaded clip from YouTube as .mp4 (AVC video + AAC audio)
  • demuxed and converted AAC to WAV:
    • ffmpeg -i input.mp4 output.wav
  • noted that output.wav had 18345984 samples (6:56.009)
  • sized replacement audio to 18340836 samples (6:55.892) (a rough guess as to ideal size)
  • used fhgaacenc via foobar2000 to encode replacement audio as .m4a
  • re-encoded the original video (bitrate ~2 Mbps, saturation 1.7x) and muxed it with the replacement audio:
    • ffmpeg -i input.mp4 -i new_audio.m4a -map 0:0 -map 1:0 -vcodec libx264 -acodec copy -b:v 2000000 -vf "hue=s=1.7" out.mp4

Resulting video seems synced with its audio, but just to see how bad my guess was:

  • demuxed and converted AAC to WAV:
    • ffmpeg -i out.mp4 out.wav
  • noted that out.wav had 18342912 samples (6:55.939) ... 2076 samples more than I input, but 3072 less than needed!
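The sample counts and timecodes above are consistent with 44.1 kHz audio, so here is a Python sketch that converts between them and recomputes the discrepancies. The 44.1 kHz rate is my inference from the numbers, not something stated:

```python
SAMPLE_RATE = 44100  # assumption: inferred from the timecodes in the notes


def timecode(samples, rate=SAMPLE_RATE):
    """Format a sample count as m:ss.mmm, rounded to the nearest millisecond."""
    ms = (samples * 1000 + rate // 2) // rate
    minutes, ms = divmod(ms, 60_000)
    seconds, ms = divmod(ms, 1000)
    return f"{minutes}:{seconds:02d}.{ms:03d}"


original, requested, result = 18345984, 18340836, 18342912
for label, n in (("original", original), ("requested", requested), ("result", result)):
    print(label, n, timecode(n))
print("more than requested:", result - requested)
print("short of original:", original - result, "=", (original - result) / 1024, "AAC frames")
```

The 3072-sample shortfall is exactly three 1024-sample AAC frames, which hints (without proving) that the encoder's frame-granular delay and padding handling is involved.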

Oh well.

Make a slideshow for YouTube

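In the examples below, I use -loop 1, which is a very poorly documented feature of FFmpeg; as far as I can tell, it is an option of the image2 demuxer (the still-image reader) that tells it to loop over the input image. I've also seen examples using -loop -1. I need to research this more.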

The most basic slideshow is just one still image that stays on screen for as long as the audio plays.

Assuming you have already encoded the audio to AAC-LC at 48 kHz:

  • ffmpeg -framerate ntsc -loop 1 -i image.jpeg -i audio.m4a -c:a copy -shortest -pix_fmt yuv420p out.mp4

I tried experimenting with different framerates to reduce the file size. FFmpeg can produce a usable video down to about 0.02 fps (1 frame every 50 seconds). Below that, the image does not show up in VLC.

However, I also found that using low framerates throws off the -shortest calculation. At 0.02 fps, a 3:18 video elongates to 5:50, but it is unplayable after 3:18. You can mitigate this somewhat by replacing -shortest with -t #### where #### is the exact duration of the audio file. But even then, the video duration gets rounded up to a whole number of frames, so at 0.02 fps it is forced to be a multiple of 50 seconds. Not ideal.

So for now, I think it is best to set -t explicitly, and set -framerate to 1 divided by the approximate precision you want in the duration—e.g., 2 will make the video be no more than a half-second too long (and probably not even that much).
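The rounding behavior described above can be modeled as: take the requested duration, round it up to a whole number of frames, and convert back to seconds. This is a Python sketch of that model, based on my observations rather than on FFmpeg's source:

```python
import math
from fractions import Fraction


def encoded_duration(requested_s, fps):
    """Requested duration rounded up to a whole number of frames, in seconds."""
    fps = Fraction(fps)
    frames = math.ceil(Fraction(requested_s) * fps)
    return frames / fps


def worst_overshoot(fps):
    """Upper bound on the extra length: one frame's duration."""
    return 1 / Fraction(fps)


print(encoded_duration(198, Fraction(1, 50)))   # 3:18 at 0.02 fps
print(encoded_duration(Fraction("198.37"), 2))  # 2 fps
print(worst_overshoot(2))
```

Exact fractions are used so that rates like 0.02 fps don't pick up floating-point error.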

The video picture size will be exactly the same as the source image size, which may not be what you want, for YouTube. Recommended sizes include 854x480, 1280x720, and 1920x1080. In FFmpeg you can rescale and add the necessary padding by setting environment variables to your desired width and height, then using the scale filter with a fancy formula:

  • set "w=1920" & set "h=1080" (Windows cmd; the quotes keep the space before & out of the value of w)
  • ffmpeg -framerate 2 -loop 1 -i image.jpeg -i audio.m4a -c:a copy -t duration -vf "scale=min(iw*%h%/ih\,%w%):min(%h%\,ih*%w%/iw),pad=%w%:%h%:(%w%-iw)/2:(%h%-ih)/2" -pix_fmt yuv420p out.mp4
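To preview what the scale/pad formula will do to a given image, here is a Python sketch that evaluates the same expressions. It is my own translation of the filter string: I believe FFmpeg truncates the scale results to integers, and it may also round sizes or offsets to even values for 4:2:0 output, which this ignores.

```python
def fit_and_pad(iw, ih, w=1920, h=1080):
    """Evaluate scale=min(iw*h/ih,w):min(h,ih*w/iw) then pad=w:h:(w-iw)/2:(h-ih)/2.

    Returns (scaled_w, scaled_h, pad_x, pad_y). In the pad expression, iw/ih
    refer to the already-scaled size, so the offsets use scaled_w/scaled_h.
    """
    scaled_w = int(min(iw * h / ih, w))
    scaled_h = int(min(h, ih * w / iw))
    return scaled_w, scaled_h, (w - scaled_w) // 2, (h - scaled_h) // 2


for size in ((1000, 800), (3000, 1000), (640, 480)):
    print(size, "->", fit_and_pad(*size))
```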

After upload, it seems YouTube will then suggest an "enhancement" of cropping the padding out. That will surely involve transcoding, so don't do it.

For a silent slideshow lasting 120 seconds, here is a more general pattern:

  • ffmpeg -framerate film -loop 1 -t 120 -i image.png -pix_fmt yuv420p -crf 0 image.mp4

Convert 4K UHD to 1080p HD

Sometimes I encounter 4K (2160p) HDR Blu-Ray rips which I need to convert to a less CPU/bandwidth-intensive format that my Roku, TV, and sound system are happier with. So instead of a 3840px-wide frame with 10-bit BT.2020 color encoded in H.265 format with 7.1 audio, I want a 1920px-wide frame with SDR (8-bit) BT.709 color encoded in H.264 format with 5.1 audio.

The simplest way, ideally, would be to just tell FFmpeg to convert the input to 8-bit BT.709 color via the -pix_fmt yuv420p and -vf colorspace=all=bt709 options:

   ffmpeg -i input-4k-hdr.mkv -map_chapters 0 -c:a eac3 -b:a 480k -ac 6 -c:s copy -c:v libx264 -preset veryslow -x264-params b-adapt=2:me=umh:trellis=1 -pix_fmt yuv420p -vf "colorspace=all=bt709:fast=0,scale=1920:-2:flags=lanczos" output-1080p.mkv

Surprisingly, as of December 2023, this does not work, because FFmpeg's colorspace filter still does not support smpte2084 as an input color transfer value. SMPTE ST 2084, also known as PQ (Perceptual Quantizer), is the HDR transfer function (the "gamma curve") commonly used in UHD Blu-Ray videos.

One alternative is to use the GUI app Handbrake; it utilizes a custom build of FFmpeg which has been patched to handle smpte2084 input, and it seems to work well.

However, a common workaround, assuming FFmpeg was built with --enable-libzimg (the z.lib/zimg color & resizing library, not zlib!), is to chain the zscale and tonemap filters, which has the added benefit of letting you choose how tone mapping (dynamic range conversion) is done:

   ffmpeg -i input-4k-hdr.mkv -map_chapters 0 -c:a eac3 -b:a 480k -ac 6 -c:s copy -c:v libx264 -preset veryslow -x264-params b-adapt=2:me=umh:trellis=1 -vf "zscale=t=linear:npl=175,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=mobius:param=0.1:desat=0,zscale=t=bt709:m=bt709:r=tv:w=1920:h=-2:f=lanczos,format=yuv420p" output-1080p.mkv

I'm not really sure whether it's ideal to resize first or change the color first. I've also seen conflicting info about which tone map to use (hable, mobius, or reinhard) and what npl value to use. A lower npl value brightens things up, but obliterates too much detail, so I experimented a bit and settled on 175 as a tolerable level. hable preserves the brightest & darkest details but might come out slightly dark, whereas mobius prioritizes color accuracy but loses detail at the brightest & darkest extremes. So far, I've been OK with mobius with param=0.1, which looks almost the same as reinhard. I had to take the param down from the default of 0.3 because I kept getting oversaturated reds & oranges, and the desat option was no help. I'm still not entirely happy with it, though.
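To get a feel for why a lower npl brightens things and what mobius does to highlights, here is a simplified per-value Python model. The PQ constants come from SMPTE ST 2084. Two parts are my assumptions to verify: that zscale's linear output is luminance divided by npl, and the exact mobius formula (the one mpv uses, which I believe FFmpeg's tonemap filter copies). It ignores gamut conversion, desat, and peak detection, and the 1000-nit peak is just an example value.

```python
# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_to_nits(e):
    """PQ EOTF: code value e in [0, 1] -> luminance in cd/m^2 (nits)."""
    p = e ** (1 / M2)
    return 10000 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def to_linear(e, npl):
    """Assumed zscale behavior: linear light where 1.0 means npl nits."""
    return pq_to_nits(e) / npl


def mobius(x, j, peak):
    """Identity up to j, then a Mobius curve that rolls off so peak maps to 1.0."""
    if x <= j:
        return x
    a = -j * j * (peak - 1) / (j * j - 2 * j + peak)
    b = (j * j - 2 * j * peak + peak) / max(peak - 1, 1e-6)
    return (b * b + 2 * b * j + j * j) / (b - a) * (x + a) / (x + b)


peak_nits = 1000  # example mastering peak
for npl in (100, 175, 300):
    mid = to_linear(0.5, npl)  # PQ code value 0.5
    peak = peak_nits / npl
    print(npl, round(mid, 3), round(mobius(mid, 0.1, peak), 3))
```

In this model, dividing by a smaller npl makes every linear value larger before tone mapping, which is consistent with a lower npl looking brighter.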

Regardless, gamut & gamma conversion is notoriously difficult to get right, and it's quite typical for the results of transcoding to be unsatisfying. Even the original Blu-Ray cannot be trusted to have been encoded with properly calibrated colors. Nor can the original digital transfer it was made from. Nor even the original celluloid film negative or print! In fact, very few analog films have an official or ideal color gamut, other than just whatever ended up on the original negative. A transfer made decades later inevitably looks worse, so it gets digitally color-corrected and artistically color-graded in post, with a result that will be different each time this is done. And then, a Blu-Ray version derived from such a transfer is almost certainly further remastered to be one engineer's idea of an aesthetically pleasing release for the then-contemporary mass market. Therefore, every release looks a bit different, color-wise, and this is normal. If you fail to get the colors to match as closely as possible when transcoding, or you're unhappy with how the transcode renders on different devices, you can take comfort in knowing that it's not necessarily any less correct than any other version. With this in mind, don't be afraid to do some light remastering of your own, in order to get the colors to be what you think they should be, according to whatever standard you desire.

For example, maybe if the picture is still slightly too dim, try a nonlinear gamma correction by using the curves filter to make midtones at 90% intensity become 100%—e.g. curves=all='0/0 0.9/1 1/1'. I've tried this, though, and feel it's insufficient on its own...bright white looks better, but the rest of the picture still needs some work.

Downmix 7.1 to 5.1 and stereo

Standard 7.1 audio has 4 surround speakers (side left, rear left, side right, rear right), where side is optimally 90° and no more than 110°. Rear is way in back, 135° to 150°.

Standard 5.1 audio has 2 surround speakers (side left, side right), where side is optimally 110°, i.e. slightly to the rear. This layout is called "5.1(side)" in FFmpeg. Plain "5.1" is wrong; it's like 7.1 without the side speakers.

FFmpeg will convert 7.1 to "5.1(side)" automatically just by setting the number of output channels to 6:

   -c:a eac3 -b:a 480k -ac 6 -metadata:s:a title="Dolby Digital Plus 5.1"
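To see which channels the automatic conversion actually has to fold, compare FFmpeg's channel orders for these layouts. They are listed from memory; ffmpeg -layouts prints the authoritative list.

```python
# Channel order of these layouts as FFmpeg names them (compare: ffmpeg -layouts)
LAYOUTS = {
    "7.1": ["FL", "FR", "FC", "LFE", "BL", "BR", "SL", "SR"],
    "5.1(side)": ["FL", "FR", "FC", "LFE", "SL", "SR"],
    "5.1": ["FL", "FR", "FC", "LFE", "BL", "BR"],
}


def folded_channels(src, dst):
    """Channels in src with no counterpart in dst; a downmix must mix them elsewhere."""
    return [ch for ch in LAYOUTS[src] if ch not in LAYOUTS[dst]]


print(folded_channels("7.1", "5.1(side)"))  # ['BL', 'BR']
print(folded_channels("7.1", "5.1"))        # ['SL', 'SR']
```

So going to 5.1(side), only the rear pair has to be mixed into the other channels, while plain 5.1 would instead discard the side slots that 5.1 content is normally meant for.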

As for stereo, the default settings apply the official Dolby downmix algorithm, which discards the LFE channel:

   -c:a aac -b:a 256k -ac 2

It works well enough, but I am never totally happy with it, as compared to an official stereo mix.
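For reference, the textbook Lo/Ro fold-down looks like this in Python. The -3 dB center and surround gains and the dropped LFE match what I understand swresample's defaults (center_mix_level, surround_mix_level, lfe_mix_level) to be, but FFmpeg's exact normalization may differ, and this sketch handles 5.1(side) only, not the extra back channels of 7.1:

```python
import math

MINUS_3DB = 1 / math.sqrt(2)  # about 0.7071


def lo_ro(fl, fr, fc, lfe, sl, sr,
          clev=MINUS_3DB, slev=MINUS_3DB, lfe_lev=0.0, normalize=True):
    """Fold one 5.1(side) sample frame down to a (left, right) stereo pair."""
    left = fl + clev * fc + slev * sl + lfe_lev * lfe
    right = fr + clev * fc + slev * sr + lfe_lev * lfe
    if normalize:
        # Scale so full-scale input in every channel can't exceed 1.0
        scale = 1 / (1 + clev + slev + lfe_lev)
        left, right = left * scale, right * scale
    return left, right


print(lo_ro(0, 0, 1.0, 0, 0, 0, normalize=False))  # center only
print(lo_ro(0, 0, 0, 1.0, 0, 0))                    # LFE only: dropped
```

Setting lfe_lev above zero would keep some bass from the LFE channel, which is one knob to experiment with if the default stereo mix sounds thin.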

Another option some people like is to convert to Dolby Pro Logic II, which is stereo, but processed in a way that makes it possible to recover 5.1 from it:

   -c:a aac -b:a 256k -filter:a "aresample=matrix_encoding=dplii:ochl=stereo" -metadata:s:a title="Dolby Pro Logic II"

My limited experience with Pro Logic is that while the 5.1 recovery is decent, the stereo listening experience occasionally ends up suboptimal, depending on what all is going on in the stereo field. For example, at about 30:15 into Tron Legacy, the drums during the crowd noise as Sam enters the game arena sound absolutely dreadful. And it is questionable why you'd want Pro Logic II at all, since nearly any receiver that can decode it can also decode regular Dolby Digital 5.1, and newer ones also handle Dolby Digital Plus.

TODO: command line to do both at the same time from one source
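An untested sketch toward the TODO: map the same source audio track twice and give each output copy its own per-stream codec, bitrate, and channel count. It is written as a small Python helper (the same language as the Avidemux script) so the argument list can be reused in a batch. The helper name both_mixes_cmd, the stream titles, and copying the video and subtitles unchanged are my assumptions, so swap in the video options from the commands above as needed. For Pro Logic II instead of plain stereo, the second stream's -ac:a:1 2 could presumably be replaced with -filter:a:1 "aresample=matrix_encoding=dplii:ochl=stereo".

```python
import shlex


def both_mixes_cmd(src, dst):
    """ffmpeg arguments that encode the first audio track twice: 5.1 E-AC-3 and stereo AAC."""
    return [
        "ffmpeg", "-i", src,
        # video, the first audio track twice, and any subtitles
        "-map", "0:v", "-map", "0:a:0", "-map", "0:a:0", "-map", "0:s?",
        "-map_chapters", "0",
        "-c:v", "copy", "-c:s", "copy",
        # output audio stream 0: 5.1(side) Dolby Digital Plus
        "-c:a:0", "eac3", "-b:a:0", "480k", "-ac:a:0", "6",
        "-metadata:s:a:0", "title=Dolby Digital Plus 5.1",
        # output audio stream 1: stereo AAC with the default downmix
        "-c:a:1", "aac", "-b:a:1", "256k", "-ac:a:1", "2",
        "-metadata:s:a:1", "title=Stereo",
        dst,
    ]


# POSIX-style quoting for display; on Windows, pass the list to subprocess.run()
print(shlex.join(both_mixes_cmd("input.mkv", "output.mkv")))
```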