User:Mjb/FFmpeg

General syntax

Input files are generally going to be audio and/or video, sometimes text. Audio and video files are often in "container" formats, e.g. an MP4 container file might have one H.264 video stream in it and several AAC audio streams. Common containers are MP4, MKV, AVI, VOB.

Always specify an input file:

 ffmpeg -i inputfile

If there are spaces in the file path, put it in quotes:

 ffmpeg -i "inputfile"

You can specify multiple input files to make use of streams from them (e.g. video from one, audio from another):

 ffmpeg -i inputfile0 -i inputfile1

If a file has multiple streams to choose from, by default you will get the highest quality video stream and the highest quality audio from among all the files. You can specify which streams you want by using -map n:x options, where n is the input file number and x is the stream number (both start at zero). The order of the options is mirrored in the output, and any other streams in the input file are ignored. For example, this makes the first 3 output streams be based on file 0 stream 1, file 0 stream 5, and file 1 stream 4, in that order:

 ffmpeg -i inputfile0 -i inputfile1 -map 0:1 -map 0:5 -map 1:4

For batch processing, you can use features of the Windows command shell (e.g. convert all AVIs to MP4s):

 for /f "usebackq delims==" %f in (`dir /b *.avi`) do ffmpeg -i "%f" other ffmpeg options here "%~nf.mp4"

When you specify only input files, you get info about the file contents.

When you specify an output file and don't use any other options to specify the format you want, FFmpeg converts the input file(s) to one output file, using a format based on the output file's filename extension:

 ffmpeg -i inputfile.wav outputfile.mp3

The default behavior is to transcode the streams using the default codec settings for the output format, which will probably give much lower quality than you want.

You can specify the format and parameters for the transcoding:

 ffmpeg -i inputfile.wav -b:a 320k outputfile.mp3

You can add filters when transcoding:

 ffmpeg -i inputfile.mp4 -vf "filter1,filter2" outputfile.mp4

You can remux streams (put them in a different container) without transcoding them:

 ffmpeg -i inputfile.mp4 -vcodec copy -acodec copy -scodec copy outputfile.mkv

When specifying codecs for video, audio and subtitles, you can use a shorter syntax:

 ffmpeg -i inputfile.mp4 -c:v copy -c:a copy -c:s copy outputfile.mkv

Here is an example of a remux which preserves all metadata (including chapters) as it copies streams 0, 3, 5 and 7:

 ffmpeg -i inputfile.mkv -map_metadata 0 -map 0:0 -map 0:3 -map 0:5 -map 0:7 -c:v copy -c:a copy -c:s copy outputfile.mkv

You can force video or audio to be omitted by using -map and not including any streams of that type, or you can use -vn or -an:

 ffmpeg -i inputfile.mkv -an silentoutputfile.mkv
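
For example, this (untested) variant keeps only the video streams by mapping just them; note that unlike -an it also drops any subtitle streams:

 ffmpeg -i inputfile.mkv -map 0:v -c copy videoonly.mkv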

Containers and codecs

Common a/v containers are AVI, MKV, MP4, MPEG-1, VOB, MPEG-TS, MPEG-PS, WAV. Inside the container you can have audio streams, video streams, and subtitles. WAV is audio only. Common audio stream codecs are MP2, AAC, AC-3, PCM. Common video stream codecs are MPEG-2 and H.264 (also known as AVC). Only certain combinations are widely supported.

Losslessly join videos

Let's say you want to losslessly concatenate two or more videos end-to-end, and the video & audio codecs and attributes (e.g. frame size) are all the same.

Preferred method for MPEG-1, MPEG-2 PS, or DV

You can use the concat protocol to concatenate input files. Specify the input framerate to prevent timestamp confusion:

 ffmpeg -r ntsc -i 'concat:VTS_01_1.VOB|VTS_01_2.VOB|VTS_01_3.VOB' -codec copy outputfile

Generic method for any format

 (echo file input1.m4v & echo file input2.m4v) > "%temp%\flist.txt" & ffmpeg -safe 0 -f concat -i "%temp%\flist.txt" -c copy outputfile.mp4 & del "%temp%\flist.txt"

In the file list, put the file names in single quotes if they contain spaces or weird characters.
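
For reference, the generated list file just contains one "file" directive per line; with quoting for a name that contains a space, hypothetical contents would look like this:

 file 'input 1.m4v'
 file 'input2.m4v'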

This info is adapted from an answer at StackOverflow.

When the videos have different codecs, you need to do the concatenation as part of filter chain, which is more complicated. I suggest starting here: https://trac.ffmpeg.org/wiki/Concatenate#differentcodec
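
As a starting point, here is an untested sketch of the filter-chain approach for two inputs that each have one video and one audio stream (everything gets re-encoded, so add codec and quality options as needed; filenames are placeholders):

 ffmpeg -i input1.mp4 -i input2.mp4 -filter_complex "[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a]" -map "[v]" -map "[a]" output.mp4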

Extract a temporal portion of a video

Let's say you just want to take the portion from 37:07.5 to 41:30.

 ffmpeg -i inputfile -vcodec copy -acodec copy -ss 37:07.5 -to 41:30 outputfile

Rotate 180 degrees

When you hold a camera phone the wrong way, it will just put a 180° rotation flag in the metadata, which not all players will support. To rotate the actual video, chain the hflip and vflip filters:

ffmpeg -i inputfile -vf "vflip,hflip" outputfile

The rotation flag will not be changed when you do this, so you can set it afterward (assumes video is stream # 0):

ffmpeg -i inputfile -c copy -metadata:s:v:0 rotate=0 outputfile

Or you can do both at the same time (untested):

ffmpeg -i inputfile -vf "vflip,hflip" -metadata:s:v:0 rotate=0 outputfile

Here's a more robust example (worked for me):

 ffmpeg -i input.mp4 -metadata:s:v rotate="0" -vf "hflip,vflip" -c:v libx264 -acodec copy output.mp4

The -c:v libx264 means to output H.264 video, which is what the input will be if it is from an iPhone. For more H.264 ffmpeg tips, see https://trac.ffmpeg.org/wiki/Encode/H.264


Fix aspect ratio

Given desired display aspect ratio (DAR) as w:h (e.g. 4:3), set container metadata to match
ffmpeg -i input.mp4 -c:v copy -c:a copy -aspect 4:3 output.mp4

This tells the player to shrink or stretch as needed. It does not work with MPEG-2 video (e.g. in a VOB file) and it also may not work in all players. The only other solution is to transcode with -vf scale=whatever as shown below.

See https://superuser.com/questions/907933/correct-aspect-ratio-without-re-encoding-video-file for basically the same info, and User:Mjb/MP4Box for examples using MP4Box, which may or may not provide better results.

I have not tested them, but here are some filter recipes for fixing various aspect ratio issues via transcoding (source):

Given desired SAR as w/h, shrink/stretch to fit
scale="trunc(iw*sar/([w/h])/hsub)*hsub:trunc(ih/vsub)*vsub",setsar="[w/h]"
Given desired DAR as w/h, pad to fit
pad="trunc(if(lt(dar\,[w/h])\,ih*[w/h]/sar\,iw)/hsub)*hsub:trunc(if(lt(dar\,[w/h])\,ih\,iw/([w/h])*sar)/vsub)*vsub:(ow-iw)/2\:(oh-ih)/2:black",setdar="[w/h]"
Given desired max. width in pixels, shrink if needed
scale="trunc([width]/hsub)*hsub:trunc(ow*sar/dar/vsub)*vsub"
Given desired max. height in pixels, shrink if needed
scale="trunc(oh/sar*dar/hsub)*hsub:trunc(if(gt(ih\,[max_height])\,[max_height]\,ih)/vsub)*vsub"
Crop to match content size (POSIX shell command line; needs adjustment to work on Windows)
`ffmpeg -ss 60 -i SOURCE.EXT -f matroska -t 10 -an -vf cropdetect=24:16:0 -y -crf 51 -preset ultrafast /dev/null 2>&1 | grep -o crop=.* | sort -bh | uniq -c | sort -bh | tail -n1 | grep -o crop=.*`,scale="trunc(iw/hsub)*hsub:trunc(ih/vsub)*vsub"
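
As a simpler concrete example (untested), you can bake a 4:3 display aspect ratio into the picture itself by rescaling to a square-pixel frame size and resetting the sample aspect ratio; the size and codec settings here are just placeholders:

 ffmpeg -i input.vob -vf "scale=640:480,setsar=1" -c:v mpeg2video -b:v 6000k -c:a copy output.mpg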

Transcode to a specific frame size & bitrate

My iPhone records video at 1920x1080. The audio doesn't take up much space at all, but the video is H.264 at about 17 Mbps, so it uses up 125 MB per minute. Here is a way to get it down to a more manageable size and more portable container format, along with the 180° rotation ("vflip,hflip") mentioned above:

ffmpeg.exe -i inputfile.mov -acodec copy -b:v 2000k -vf "vflip,hflip,scale=1024:-1" outputfile.mkv

This makes it about 15 MB/minute: 2 Mbps video, 1024 pixels wide; the -1 means whatever height will preserve the aspect ratio.


Fix audio drift

Sometimes audio and video might be out of sync in a way that's more complicated than a simple, constant delay. For example, maybe the longer the video plays, the further behind the video is.

One way to fix this is to speed up or slow down the audio by reinterpreting the sample rate. This will change the pitch but will not alter the audio samples, so it is the least destructive solution. However, this is only useful if the audio drift is constant across the whole video. On one particular DV capture of mine, the audio speeds up and slows down seemingly at random, independently of the video. The only way to fix that is to divide it into segments and adjust each one independently.

Anyway, to reinterpret the sample rate, first calculate the new sample rate, e.g. if the drift in a 32 kHz audio stream is 0.01 s per minute, then 0.01×32000 = 320 samples per minute = 5.333 samples per second. So new sample rate can be about 31995 to slow down or 32005 to speed up. Now you can demux the audio, then use SoX to read it raw, specifying the format with your custom sample rate, and write a new WAV which you can then remux with the video.

  • ffmpeg -i input.avi -c:a copy fast.wav
  • sox -r 31995 -e signed -c 2 -b 16 fast.wav ok.wav
  • ffmpeg -i input.avi -i ok.wav -map 0:0 -map 1:0 -c copy output.avi

The downside of this is the resulting video has an unusual sample rate for the audio. It is possible some players won't like this. In this case I suggest resampling again, e.g. change the sox command above to:

  • sox -r 31995 -e signed -c 2 -b 16 fast.wav -r 32k ok.wav

FFmpeg also offers an option which keeps pitch the same, but it did not work for my situation. I think this is more for videos where the video timestamps are not at regular intervals. It resamples the audio so that it stays in sync with the timestamps in the video, in this example speeding up or slowing down by up to 1000 samples per second:

  • ffmpeg -i input.avi -c:v copy -af "aresample=async=1000" -c:a pcm_s16le output.avi

Processing interlaced content

About interlacing

Interlaced video uses two "fields" of alternating lines to compose one frame. The fields normally represent slightly different moments in time, so they are shown sequentially, with the assumption that an old CRT screen's fading phosphors and your eyes' persistence of vision will allow you to perceive the current and previous field even though they were never completely on-screen at any moment in time. Modern computers and handheld devices instead draw the whole frame so both fields are on-screen at the same time, and that image is held without fading. This tends to result in very obvious "comb" artifacts whenever there's motion, especially horizontal motion. Modern digital TVs usually handle interlaced content specially, either deinterlacing it or otherwise displaying it in a way that looks reasonably good, although you probably will still see the artifacts. (I think interlacing was also less noticeable on old CRT-based TVs because the phosphor screen arranged its "pixels" such that alternating rows were offset like a brick wall. Not sure if that really affected things though.)

Normally when people use the word interlaced, it means each field represents a sequential window of time: basically 1/60th of a second in the NTSC countries, 1/50th in PAL. The first part of the field's first row is what was happening in that part of the camera's view at the beginning of that time window, and the last part of the last row is what was happening in that part of the camera's view at the end of that time window. My impression is that this changed in the late 1980s and beyond, as high-end cameras began to "take a picture" (like a film camera would) 60 times per second and then scan each of those still images, meaning each field is still sequential but represents a much narrower window of time (basically an instant), very much like when film is scanned for the telecine process. There are also cameras or broadcast systems which produce both fields from the same picture, so they output 60 fields per second but are sourced from half that many unique images per second. (Someone correct me if I am wrong about any of this.) So interlaced really only refers to a frame being split into fields which may be (and usually are) designated for sequential display.

Detecting interlaced video

Strangely, it is difficult to know whether video is interlaced.

MPEG-2 and MPEG-4 containers can provide some hints, but they often don't, or they get it wrong.

If the container says that the content is interlaced, then ffmpeg -i might say "interlaced" or "(tv)" or "top field first" or "bottom field first".

In reality, some content may be only partially interlaced, such as when an interlaced video scene is spliced in the middle of a progressive film. This is more likely to read as progressive at the container level; you don't find out some of it is interlaced until you play it. Another thing that happens sometimes is progressive content gets encoded as interlaced for broadcast such that each field is sourced from the same picture and thus represents the same moment in time. Telecined content is kind of a hybrid of these two scenarios. I have also seen weird combinations, like interlaced animations over a progressive source and encoded as interlaced (in the Duran Duran "Come Undone" video), or interlaced video sourced from zoomed or composited interlaced material (so there's comb effects you can't get rid of) (in the Duran Duran "Rio" video's letterboxed scenes).

FFmpeg has an interlace detection filter called idet. It tags suspected interlaced frames as interlaced, for the benefit of a deinterlace filter like yadif. It can also be used by itself to get statistics which can help you decide whether and how the video is interlaced.
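
For example, you can get idet's statistics without writing an output file by decoding some frames to the null muxer and reading the summary FFmpeg prints at the end (the frame count here is arbitrary):

  • ffmpeg -i input.vob -vf idet -frames:v 500 -an -f null -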

There's no flag indicating telecine, but if you step through the video frame by frame in VLC with deinterlacing off, you'll see it in action as (e.g.) 2 obviously interlaced frames followed by 3 that look fine, over and over, possibly with occasional irregularities due to sloppy editing. Likewise, telecined content will be read by idet as having many interlaced and progressive frames mixed together, when in fact it is all interlaced, just such that some fields compose a frame representing the same source picture (because the picture was duplicated across two adjacent frames). FFmpeg also has a (sort of) telecine detection filter which looks for duplicate fields; see the inverse telecine section below.

Keep transcoded content interlaced

In FFmpeg, make sure your command line includes this, and your output is in an MPEG-2 or MPEG-4 video format:

  • -flags +ilme+ildct
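
For example, an (untested) interlace-preserving MPEG-2 transcode might look like this; the bitrate and filenames are just placeholders:

  • ffmpeg -i input.vob -c:v mpeg2video -flags +ilme+ildct -b:v 7000k -c:a copy output.vob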

Fix interlaced field order

If a video file's metadata says it is interlaced with bottom field first, when in fact it is encoded top field first (or vice-versa), you have to edit the container:

  1. Demux the video to an elementary stream, e.g.: ffmpeg -i input.mpg -c:v copy tmp.m2v
  2. Load the stream in Restream and edit its metadata (tick or untick the "top field first" box); click Write. It will write a new file with ".0" appended to the main part of the filename.
  3. Remux the video with FFmpeg (or MP4Box or whatever), e.g.: ffmpeg -i tmp.0.m2v -i input.mpg -map 0:0 -map 1:1 -c:v copy -c:a copy fixedoutput.mpg

FFmpeg will complain about the elementary stream not having timestamps. This is normal and should be OK to ignore. In theory, specifying the framerate, e.g. -r 30000/1001, should eliminate the warning, but it does not, last I checked.

Basic deinterlace

The best deinterlace filter is YADIF (Yet Another De-Interlace Filter). According to the Avisynth wiki, "it checks pixels of previous, current and next frames to re-create the missed field by edge-directed interpolation and uses a spatial check to prevent most artifacts."

In FFmpeg, -deinterlace works, but is deprecated and is now just an alias for -vf yadif. This produces deinterlaced output at the input framerate, i.e. 1 frame for every 2 fields. Fast motion will be blurry but smooth. Bitrate will be reasonable.

You get better quality if you use -vf yadif=mode=1, which outputs 1 frame for each field, so scenes with fast motion will look more like the original, but this also bloats the bitrate.

If there is a mix of interlaced and progressive frames, you can also add the deint=1 option, e.g. -vf yadif=mode=1:deint=1. It tells the filter to only process frames which are tagged as interlaced. This is commonly used after the idet or fieldmatch filters.
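
Putting that together, here is a hedged example of a double-rate deinterlace into H.264; the codec settings are just placeholders:

  • ffmpeg -i input.vob -vf yadif=mode=1 -c:v libx264 -crf 18 -c:a copy output.mkv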

Inverse telecine

Telecine content is typically 24 fps film which has been "pulled down" to 29.97 fps (59.94 "fields" of alternating lines per second) in a process that slows the film down by 0.1% and then combines interlacing and duplication: every other frame becomes 2 fields, and every frame in between becomes 3 fields. Technically it is all interlaced, but it works out such that (for NTSC at least) a pattern repeats: 3 frames which might casually be described as progressive or non-interlaced (because both fields are from the same moment in time) followed by 2 frames which are each very obviously interlaced (combining two different moments in time). Undoing this is "inverse telecine" (IVTC) and it can be tricky. The question is, does it really matter? Well I would say yes, if something was shot on film at 24 fps, fast motion may blur in ways that don't match the way it blurs when you watch the film, and this may bug you. Or not. But then you also have to consider that after you have applied the IVTC filters, you now probably have to compress that video again, so you may want to keep a copy around of your "lossless" original content if you can play it as-is.

Here is a filter chain that I have successfully used for IVTC:

  • ffmpeg -i input.vob -vf fieldmatch,yadif=deint=1,decimate -b:v 7000k -c:v mpeg2video -c:a copy out.vob

If I understand correctly, fieldmatch tries to reconstruct the original progressive frames by matching up fields from adjacent frames, and tags the frames it cannot cleanly match (i.e. the still-combed ones) as interlaced. yadif with the deint=1 parameter deinterlaces the tagged frames; 1 in 5 frames will then be a duplicate of the one before it. decimate compares frames in groups of 5 (by default) and removes the 1 frame it thinks is most likely a duplicate.

Thus from every 30 frames of video input you get 24 frames of output (or you can think of it as 4 frames out for every 5 in).

It plays at the ever-so-slightly slow rate of 23.976 fps. If you want to speed it up, I'm sure you could force a perfect 24 fps rate, but I think you then need to also speed up the audio by the same amount. A speed difference of 0.1% is undetectable by human eyes and certainly ears as well; e.g. it is the difference between 440 Hz and 439.56 Hz. So is it worth the fuss? No.

There is another filter, pullup, which can be used in combination with the framerate (-r) option to achieve a similar effect, e.g. -vf pullup -r 24000/1001. I don't know what the differences between the two methods are.
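
For example, an untested sketch of the pullup approach, using the same output settings as the fieldmatch example above:

  • ffmpeg -i input.vob -vf pullup -r 24000/1001 -b:v 7000k -c:v mpeg2video -c:a copy out.vob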

I have some telecine content on DVD (music videos by the French band AIR) which looks like crap no matter which method I use (pullup or fieldmatch,yadif,decimate). I think it is just not a good transfer in the first place.

Convert DV AVI to H.264 MP4

It's 2016/2017 and I'm now digitizing the analog video signal from a VCR by running it through a camcorder which outputs DV format over a FireWire cable. I'm using Sony's PlayMemories Home software to capture the camera's data stream (720x480, 29.97 Hz, interlaced DVCPRO—a.k.a. DVCPRO25, dvsd, dvvideo, or consumer DV) and put it into AVI containers.

The camera has a choice of audio during the transfer: 12-bit or 16-bit. If you choose 12-bit, the output is actually 16-bit 32 kHz PCM (but I assume the bottom 4 bits of each sample are all zeroes). If you choose 16-bit, it's true 16-bit 48 kHz PCM. The 12-bit mode is fine for typical home-movie audio, e.g. speech and background noise, but for tapes with Hi-Fi music on them, I use 16-bit mode.

Anyway, there are problems with the resulting DV-AVI files:

  • Huge files: about 13 GB per hour.
  • DV's 4:1:1 YUV subsampling results in desaturated, fuzzy color (color is sampled at ¼ horizontal resolution—i.e., on each line, each group of 4 pixels shares a single color sample).
  • Interlaced output looks bad when viewed on computers.

The file size can be reduced by transcoding to a more efficient format like H.264 (MPEG-4 AVC) for the video and AAC-LC for the audio. I can't undo the damage caused by the chroma subsampling, but I can make it look less washed-out (but somewhat cartoony) by applying a saturation filter (hue=s=#). The annoying "comb" effect from interlacing can be mitigated, at a cost, by using a deinterlace filter (yadif). The H.264 codec can also be optimized for grainy video, which my old SLP-mode VHS clips tend to have. Here is an example:

 ffmpeg -i inputfile.avi -vf "yadif,hue=s=1.4" -c:v libx264 -preset veryslow -crf 20 -tune grain -c:a aac -b:a 160k outputfile.mp4

With these settings, the output MP4 is about 17% the size of the input AVI, so about 2.2 GB per hour. The video data rate is about 5 Mbps. I think it looks pretty good.

-crf 20 sets quality level 20 (23 is default, lower is better but has diminishing returns, I sometimes use 15). aac is the native AAC encoder, which is better than libfaac but not as good as libfdk_aac (which isn't in my build of FFmpeg).

Here's an example of using 2-pass encoding, which requires specifying a target bitrate rather than quality:

 ffmpeg -i inputfile.avi -vf "yadif,hue=s=1.4" -c:v libx264 -preset veryslow -pass 1 -b:v 11000k -f mp4 -y NUL
 ffmpeg -i inputfile.avi -vf "yadif,hue=s=1.4" -c:v libx264 -preset veryslow -pass 2 -b:v 11000k -y outputfile.mp4

If you have ideas on better settings to use, please let me know!

If you want to output to .VOB files for use with any DVD player, you have to use the older H.262 format (MPEG-2 Video) with 4:2:0 subsampling, and MP2 or AC-3 audio. The video quality can still be very good overall; the format is just not as efficient.
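
For example, here is an untested sketch using FFmpeg's -target presets, which select DVD-compliant MPEG-2 video and AC-3 audio; the filters are carried over from above, and the bitrate override is just a placeholder:

 ffmpeg -i inputfile.avi -vf "yadif,hue=s=1.4" -target ntsc-dvd -b:v 6000k outputfile.vob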

H.264 capabilities

H.264 profiles and levels help optimize the encoded video for different classes of playback devices.

  • Profiles are basically feature sets for different targets:
    • Baseline (BP) = most compatible with cheapest, slowest devices
    • Main (MP) = standard for mainstream/consumer devices, DVD grade
    • Extended (XP) = Main, plus better support for streaming (but not supported in FFmpeg)
    • High (HP or HiP) = for basic broadcast and other HD devices, Blu-Ray grade
    • High 10 (Hi10P) = High, plus support for 10 bpp
    • High 4:2:2 (Hi422P) = High 10, plus support for 4:2:2 chroma subsampling
    • High 4:4:4 Predictive (Hi444PP) = adds support for 4:4:4, 14 bpp, lossless, etc.
  • Levels mandate maximum video bitrates and macroblock rates (which imply reasonable frame sizes & rates for high quality):
    • Level 3 max bitrate = 10 Mbps (BP/MP/XP), 12.5 Mbps (HiP), 40 Mbps (Hi422P/Hi444PP)
    • Level 3.1 max bitrate = 14 Mbps (BP/MP/XP), 17.5 Mbps (HiP), 56 Mbps (Hi422P/Hi444PP)
    • Level 3 max frame = ~ 720×576 @ 25 fps or 720×480 @ 30 fps or 352×480 @ 60 fps
    • Level 3.1 max frame = ~ 1280×720 @ 30 fps or 720×576 @ 60 fps
    • See more details at http://blog.mediacoderhq.com/h264-profiles-and-levels/

To use these features, add to your command line:

  • -profile:v profile where profile is one of baseline, main, high, high10, high422, or high444.
  • -level:v level, where level is one of 3.0, 3.1, 4.0, 4.1, 4.2, 5.0, or 5.1.

If you force the bitrate to be higher than the Profile & Level combo supports, then the file will probably only work in software players.

Since I'm encoding for 720x480 60 fps 4:2:2 devices at 15 Mbps, FFmpeg selects High 4:2:2 level 4. To play this content on my 4:2:2-incapable devices, I have to configure my media server to transcode it; see User:Mjb/Serviio.

If I want a format I can serve natively to my devices, I need to encode 4:2:0 at 12 Mbps, and profile Main level 3.1. Level 4.0 is possible too, but riskier. Example of encoding roughly 5 to 10 Mbps, Main level 3.1, 4:2:0 color, best deinterlace (double framerate), downmix to mono:

  • ffmpeg -i input.avi -vf "yadif=mode=1,hue=s=1.4" -c:v libx264 -preset veryslow -crf 16 -pix_fmt yuv420p -profile:v main -level:v 3.1 -c:a aac -b:a 128k -ac 1 out.mp4

For DV video, all frames are going to be interlaced, but for other sources, maybe only some of them are (e.g. a DVD might use interlaced video scenes in the middle of a film). In this case, you want "idet,yadif=mode=1" instead of just "yadif=mode=1". The idet filter detects interlacing and tags the frames accordingly, so that the yadif filter can act on the appropriate ones.
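
For instance, here is an untested variant of the command above with idet added, plus deint=1 so that yadif leaves the untagged (progressive) frames alone:

  • ffmpeg -i input.vob -vf "idet,yadif=mode=1:deint=1,hue=s=1.4" -c:v libx264 -preset veryslow -crf 16 -pix_fmt yuv420p -profile:v main -level:v 3.1 -c:a aac -b:a 128k -ac 1 out.mp4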

Chroma subsampling

When the input is 4:1:1 DV, FFmpeg's libx264 encoder uses 4:2:2 subsampling (color at ½ horizontal, full vertical resolution) by default—assuming you didn't force a profile like Baseline or Main, since 4:2:2 requires the High 4:2:2 profile—but of course the result can never be better than the 4:1:1 input. Another option, for greater compatibility with playback devices/apps, is to use -pix_fmt yuv420p to have the codec use 4:2:0 (color at ½ horizontal and ½ vertical resolution). 4:2:0 will naturally be worse than 4:2:2, but the difference really is not that visually significant on deinterlaced material; see http://www.red.com/learn/red-101/video-chroma-subsampling for examples.

Avidemux

Rather than using FFmpeg directly, I am also experimenting with Avidemux, which is a free video editor like VirtualDub. It can use FFmpeg libraries, among others, when converting output. It can also do lossless editing and splitting. I may use it to split some huge DV AVIs into DVD-R sized pieces.

In order to improve the look of DV captures of LP-mode VHS recordings of analog cable broadcasts, I am trying the following filter chain:

  • ChromaShift (U: -5, V: -4) to get the color fields in sync; this may vary by tape and source.
  • dgbob (mode 1, order 0, threshold 0) for bob deinterlacing (doubles the framerate, but motion is smooth).
  • Mplayer hue (hue -4 to -15, sat 1.0) to make blues blue instead of purple, etc.
  • MPlayer eq2 (cont 1.04, brigh 0.02, sat 1.37) to boost brightness/contrast/gamma/saturation and make neutral colors neutral; figuring out ideal settings is difficult!
  • blacken borders (left: 22, right: 22) to simplify the left and right edges, and to crop out the pixels made colorless by ChromaShift.

x264 encoder settings (General):

  • Preset: veryslow
  • Tuning: grain
  • Profile: baseline
  • Fast First Pass [off]
  • Encoding Mode: Video Size (Two Pass)
  • Target Video Size: depends on destination. To fill up a single-layer DVD-R, I think 4300 MB should be OK. It seems the doubled framerate from dgbob throws the estimate off by roughly half, so I have to double the target size I enter here.

x264 encoder settings (Output 1):

  • Predefined aspect ratio: 8:9 (NTSC 4:3) - This is the pixel aspect ratio (PAR) to tag in the output file, and setting it to 8:9 makes it be the same as the DV input. The Avidemux wiki says not to change this, but if I don't, the output defaults to (I think) PAR 1:1, thus it has display aspect ratio (DAR) 3:2 (because of DV's 720x480 storage); this is slightly horizontally elongated when played. By setting PAR 8:9, it is saying to pretend the pixels are slightly narrower than they are tall.

Avidemux to filter, FFmpeg to encode

It seems to be impossible to stop Avidemux from creating 4:2:0 output (changing the chroma channels to half their vertical resolution), so in order to get 4:2:2, I am resorting to a workaround where I just use Avidemux for filtering, and then do the FFmpeg encoding from the command line. What a mess!

Example workflow:

  • Prep the audio
    • ffmpeg -i "input.avi" -vn -c:a copy "tmp.wav"
    • Process tmp.wav in an audio editor to adjust channels, EQ, resample, normalize to -3 dB peaks, reduce noise, etc.
  • Prep the filtered video
    • In Avidemux, load input.avi and set up the video filter chain as desired
    • Set the Video Output to (FF)HuffYUV - this is a lossless format using about 1 GB per minute!
    • Set Audio Output to Copy
    • In Audio > Select Track, disable Track 1. Don't set it to use tmp.wav because it will add a glitch at the end.
    • Set the Output Format to AVI Muxer
    • See below for how to save the current settings for use with other video files
    • Save to tmp.avi - this will be 3:2 (default for 720x480 with 1:1 pixels) but we'll fix it during compression
  • Calculate the target bitrate
    • This calculator (one of many online) can help. My Blu-Ray player can only handle ~17 Mbps video and I find 15 Mbps (15000 kbps) is usually plenty.
  • Compress the video and audio into one H.264 MP4
  • ffmpeg -y -i tmp.avi -i tmp.wav -map 0:0 -map 1:0 -c:v libx264 -preset veryslow -tune grain -pass 1 -b:v 15000k -aspect 4:3 -c:a aac -b:a 128k -shortest -f mp4 -y NUL
  • ffmpeg -y -i tmp.avi -i tmp.wav -map 0:0 -map 1:0 -c:v libx264 -preset veryslow -tune grain -pass 2 -b:v 15000k -aspect 4:3 -c:a aac -b:a 128k -shortest -y output.mp4

For applying the same filters to multiple files, save a project with the filters you want, then edit the file and remove or comment out everything you don't need. For example, in the following project script, I replaced the comment at the top and commented out the portion specific to a particular video file, so that I can now run it after loading any other video:

# this script sets the following Avidemux options:
#
# HuffYUV (lossless) video compression
# no audio
# output to AVI container w/OpenDML extension (allows files over 4 GB)
# filters for processing my VHS rips
#
adm = Avidemux()
#adm.loadVideo("C:/path/to/some/video.avi")
#adm.clearSegments()
#adm.addSegment(0, 0, 28895531)
#adm.markerA = 0
#adm.markerB = 28895531
adm.videoCodec("HUFFYUV", "encoderType=0")
adm.addVideoFilter("chromashift", "u=-5", "v=-4")
adm.addVideoFilter("dgbob", "thresh=0", "order=False", "mode=1", "ap=False")
adm.addVideoFilter("hue", "hue=-4.000000", "saturation=1.000000")
adm.addVideoFilter("eq2", "contrast=1.040000", "brightness=0.020000", "saturation=1.370000", "gamma=1.270000", "gamma_weight=1.620000", "rgamma=0.990000", "bgamma=1.080000", "ggamma=1.010000")
adm.addVideoFilter("blackenBorder", "left=22", "right=22", "top=0", "bottom=0")
adm.audioClearTracks()
adm.setSourceTrackLanguage(0,"eng")
adm.setContainer("AVI", "odmlType=1")

Save this code into the avidemux "custom" directory, with whatever filename you want, e.g. %appdata%\avidemux\custom\VHS rip filters.py. Restart Avidemux, load a video, and then choose that script from the Custom menu, and the settings should all take effect.

Replace audio in a DVD file

Music videos in MPEG-2 format (.vob or .mpg files) sometimes come with bad source audio and I will want to replace the audio with a good copy of my own. Of course, I have to pay close attention and make sure that the audio in the video is the same; videos often use custom edits or they overdub other sounds on top. Assuming I have suitable audio in a lossless format like FLAC, here's what I do to replace it:

First, extract the source audio to a WAV file:

  • ffmpeg -i input.vob output.wav

Note: if the original audio was lossy, the resulting WAV will probably be bigger because it includes encoder delay & padding, possibly also decoder delay (i.e. a bunch of silence at the beginning, and a little bit at the end). It's best if you can figure out how much there is and trim it. However, I don't know a good way to do that!

Next, use a wave editor (I use Audition) to create a new WAV that is perfectly time-aligned with the old. There are different ways of doing this. Here's one way:

  • Convert the replacement to the desired output sample rate.
  • Pick a non-silent spot at the beginning and end of the original file to be the anchor points. You are looking for spots that you can find in both the old and new files. Set a marker at each spot. (In the original, markers at samples 28936 and 8591175; in the replacement, at samples 4568 and 8560470).
  • How many samples are in between the markers? (8591175-28936=8562239 and 8560470-4568=8555902) Your goal is to change the replacement to match the original.
  • What's the original:replacement ratio? (8562239/8555902=1.0007406583198358279466034089685)
  • Do a pitch shift on the replacement with a target duration of that number multiplied by the current duration, in samples. (8659287*1.0007406583198358279466034089685=8665700)
  • Check how many samples are in between the markers. (8566809-4571=8562238) It should be really close to the original now. If not, figure out what you did wrong and try again.
  • Pad or trim silence from the beginning so that the first marker is at the same location as the first marker in the original. (28936-4571=24365 padding to add). If the beginning is offset or not silent, apply a fade-in beforehand.
  • Fade and/or pad the end, so that the total duration is the same as the original.
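
In principle the stretch step could also be done from the command line with SoX's speed effect (untested in this workflow; like the pitch shift above, it changes pitch along with duration). The factor is the replacement/original ratio of the sample counts between the markers, and the filenames are hypothetical:

  • sox replacement.wav stretched.wav speed 0.99926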

Now you need to mux them together. What is supposed to work is this:

  • ffmpeg -i input.vob -i new.wav -map 0:1 -map 1:0 -vcodec copy -acodec copy out.vob

-map 0:1 means use stream #1 from file #0 (input.vob), and -map 1:0 means use stream #0 from file #1 (new.wav).

Unfortunately, FFmpeg currently doesn't like to mux PCM audio into a VOB or MPEG-2 container without giving all kinds of packet too large / buffer underflow errors. Supposedly this was fixed, but it's not working for me, so...

The solution is to use MPEG Video Wizard DVD. In that app: Drag the video into the timeline's video bar, right-click on it and choose to mute its audio (otherwise it will mix them together). Drag the audio into the timeline's music bar. Click on Export (looks like a videotape) in the main toolbar, and make sure it's going to do a stream copy.

A more advanced example:

  • downloaded clip from YouTube as .mp4 (AVC video + AAC audio)
  • demuxed and converted AAC to WAV:
    • ffmpeg -i input.mp4 output.wav
  • noted that output.wav had 18345984 samples (6:56.009)
  • sized replacement audio to 18340836 samples (6:55.892) (a rough guess as to ideal size)
  • used fhgaacenc via foobar2000 to encode replacement audio as .m4a
  • muxed original video with replacement audio, bitrate ~2 Mbps, saturation 1.7x:
    • ffmpeg -i input.mp4 -i new_audio.m4a -map 0:0 -map 1:0 -vcodec libx264 -acodec copy -b:v 2000000 -vf "hue=s=1.7" out.mp4

Resulting video seems synced with its audio, but just to see how bad my guess was:

  • demuxed and converted AAC to WAV:
    • ffmpeg -i out.mp4 out.wav
  • noted that out.wav had 18342912 samples (6:55.939) ... 2076 samples more than I input, but 3072 less than needed!

Oh well.

Make a slideshow for YouTube

The most basic slideshow is just one still image that stays on screen for as long as the audio plays.

Assuming you have already encoded the audio to AAC-LC at 48 kHz:

  • ffmpeg -framerate 30 -loop 1 -i image.jpeg -i audio.m4a -c:a copy -shortest -pix_fmt yuv420p out.mp4

I tried experimenting with different framerates to reduce the file size. FFmpeg can produce a usable video down to about 0.02 (1 frame every 50 seconds). Below that, the image does not show up in VLC.

However, I also found that using low framerates throws off the -shortest calculation. At 0.02 fps, a 3:18 video elongates to 5:50, but it is unplayable after 3:18. You can mitigate this somewhat by replacing -shortest with -t #### where #### is the exact duration of the audio file. But even then, the video duration will be the minimum possible with the framerate, in order to get the duration you requested. So at 0.02 fps, you are forcing it to be a multiple of about 50 seconds! Not ideal.

So for now, I think it is best to set -t explicitly, and set -framerate to 1 divided by the approximate precision you want in the duration—e.g., 2 will make the video be no more than a half-second too long (and probably not even that much).

The video picture size will be exactly the same as the source image size, which may not be what you want, for YouTube. Recommended sizes include 854x480, 1280x720, and 1920x1080. In FFmpeg you can rescale and add the necessary padding by setting environment variables to your desired width and height, then using the scale filter with a fancy formula:

  • set w=1920 & set h=1080
  • ffmpeg -framerate 2 -loop 1 -i image.jpeg -i audio.m4a -c:a copy -t duration -vf "scale=min(iw*%h%/ih\,%w%):min(%h%\,ih*%w%/iw),pad=%w%:%h%:(%w%-iw)/2:(%h%-ih)/2" -pix_fmt yuv420p out.mp4

After upload, it seems YouTube will then suggest an "enhancement" of cropping the padding out. That will surely involve transcoding, so don't do it.