How to fill the beginning of an audio track?

I have a video file (.mp4) with 3 minutes of content.

stream 0:0 [Video] H.264  Duration : 3 min 0 s
stream 0:1 [Audio] EC-3   Duration : 2 min 46 s
stream 0:2 [Audio] AAC    Duration : 2 min 59 s

However, when I convert it to a TS file with FFmpeg and play it in PotPlayer, something strange happens:

FFmpeg log for the TS conversion:

ffmpeg started on 2021-12-22 at 17:02:11
Report written to "ffmpeg-20211222-170211.log"
Log level: 48
Command line:
ffmpeg -i file_in.mp4 -map 0:0 -c:v copy -map 0:1 -map 0:2 -c:a aac -ac 2 -ar 48000 -b:a 128k -f mpegts file_out.ts -report
ffmpeg version 2021-12-20-git-631e31773b-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11.2.0 (Rev2, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-lib
  libavutil      57. 11.100 / 57. 11.100
  libavcodec     59. 14.100 / 59. 14.100
  libavformat    59. 10.100 / 59. 10.100
  libavdevice    59.  0.101 / 59.  0.101
  libavfilter     8. 20.100 /  8. 20.100
  libswscale      6.  1.101 /  6.  1.101
  libswresample   4.  0.100 /  4.  0.100
  libpostproc    56.  0.100 / 56.  0.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument 'file_in.mp4'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:0'.
Reading option '-c:v' ... matched as option 'c' (codec name) with argument 'copy'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:1'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:2'.
Reading option '-c:a' ... matched as option 'c' (codec name) with argument 'aac'.
Reading option '-ac' ... matched as option 'ac' (set number of audio channels) with argument '2'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '48000'.
Reading option '-b:a' ... matched as option 'b' (video bitrate (please use -b:v)) with argument '128k'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'mpegts'.
Reading option 'file_out.ts' ... matched as output url.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url file_in.mp4.
Successfully parsed a group of options.
Opening an input file: file_in.mp4.
[NULL @ 000002139462d600] Opening 'file_in.mp4' for reading
[file @ 000002139462db40] Setting default whitelist 'file,crypto,data'
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] ISO: File Type Major Brand: isom
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Processing st: 0, edit list 0 - media time: -1, duration: 688
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Processing st: 0, edit list 1 - media time: 1335, duration: 2880880
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] drop a frame at curr_cts: 2882871 @ 4315
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Offset DTS by 1335 to make first pts zero.
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Setting codecpar->delay to 2 for stream st: 0
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Processing st: 1, edit list 0 - media time: -1, duration: 628800
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Processing st: 1, edit list 1 - media time: 0, duration: 8011776
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Processing st: 2, edit list 0 - media time: -1, duration: 8208
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Processing st: 2, edit list 1 - media time: 0, duration: 8632272
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] rfps: 23.916667 0.004923
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] rfps: 23.916667 0.004923
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] rfps: 24.000000 0.000861
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] rfps: 24.083333 0.016134
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] rfps: 48.000000 0.003444
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] rfps: 23.976024 0.000048
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] rfps: 47.952048 0.000194
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] Before avformat_find_stream_info() pos: 57062140 bytes read:264333 seeks:1 nb_streams:3
[h264 @ 000002139462f040] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 000002139462f040] nal_unit_type: 8(PPS), nal_ref_idc: 3
[h264 @ 000002139462f040] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 000002139462f040] nal_unit_type: 8(PPS), nal_ref_idc: 3
[h264 @ 000002139462f040] nal_unit_type: 6(SEI), nal_ref_idc: 0
[h264 @ 000002139462f040] nal_unit_type: 5(IDR), nal_ref_idc: 3
[h264 @ 000002139462f040] Format yuv420p chosen by get_format().
[h264 @ 000002139462f040] Reinit context to 1920x1024, pix_fmt: yuv420p
[h264 @ 000002139462f040] no picture 
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] max_analyze_duration 5000000 reached at 5004988 microseconds st:0
[mov,mp4,m4a,3gp,3g2,mj2 @ 000002139462d600] After avformat_find_stream_info() pos: 349875 bytes read:624781 seeks:2 frames:345
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'file_in.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.12.100
  Duration: 00:03:00.06, start: 0.043000, bitrate: 2535 kb/s
  Stream #0:0[0x1](und), 120, 1/16000: Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1916x1012 [SAR 1:1 DAR 479:253], 1945 kb/s, 23.98 fps, 23.98 tbr, 16k tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](por), 0, 1/48000: Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 384 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
    Side data:
      audio service type: main
  Stream #0:2[0x3](eng), 225, 1/48000: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 5.1, fltp, 224 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
Successfully opened the file.
Parsing a group of options: output url file_out.ts.
Applying option map (set input stream mapping) with argument 0:0.
Applying option c:v (codec name) with argument copy.
Applying option map (set input stream mapping) with argument 0:1.
Applying option map (set input stream mapping) with argument 0:2.
Applying option c:a (codec name) with argument aac.
Applying option ac (set number of audio channels) with argument 2.
Applying option ar (set audio sampling rate (in Hz)) with argument 48000.
Applying option b:a (video bitrate (please use -b:v)) with argument 128k.
Applying option f (force format) with argument mpegts.
Successfully parsed a group of options.
Opening an output file: file_out.ts.
[file @ 0000021395074580] Setting default whitelist 'file,crypto,data'
Successfully opened the file.
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (eac3 (native) -> aac (native))
  Stream #0:2 -> #0:2 (aac (native) -> aac (native))
Press [q] to stop, [?] for help
cur_dts is invalid st:0 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
!!!!!!! A lot of repeated lines !!!!!!!
cur_dts is invalid st:1 (0) [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
detected 12 logical cores
[graph_1_in_0_2 @ 0000021394fe9140] Setting 'time_base' to value '1/48000'
[graph_1_in_0_2 @ 0000021394fe9140] Setting 'sample_rate' to value '48000'
[graph_1_in_0_2 @ 0000021394fe9140] Setting 'sample_fmt' to value 'fltp'
[graph_1_in_0_2 @ 0000021394fe9140] Setting 'channel_layout' to value '0x3f'
[graph_1_in_0_2 @ 0000021394fe9140] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3f
[format_out_0_2 @ 0000021394fe9340] Setting 'sample_fmts' to value 'fltp'
[format_out_0_2 @ 0000021394fe9340] Setting 'sample_rates' to value '48000'
[format_out_0_2 @ 0000021394fe9340] Setting 'channel_layouts' to value '0x3'
[format_out_0_2 @ 0000021394fe9340] auto-inserting filter 'auto_aresample_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_2'
[AVFilterGraph @ 000002139473a5c0] query_formats: 4 queried, 7 merged, 3 already done, 0 delayed
[auto_aresample_0 @ 0000021394fe97c0] [SWR @ 0000021394ce9200] Using fltp internally between filters
[auto_aresample_0 @ 0000021394fe97c0] [SWR @ 0000021394ce9200] Matrix coefficients:
[auto_aresample_0 @ 0000021394fe97c0] [SWR @ 0000021394ce9200] FL: FL:1.000000 FR:0.000000 FC:0.707107 LFE:0.000000 BL:0.707107 BR:0.000000 
[auto_aresample_0 @ 0000021394fe97c0] [SWR @ 0000021394ce9200] FR: FL:0.000000 FR:1.000000 FC:0.707107 LFE:0.000000 BL:0.000000 BR:0.707107 
[auto_aresample_0 @ 0000021394fe97c0] ch:6 chl:5.1 fmt:fltp r:48000Hz -> ch:2 chl:stereo fmt:fltp r:48000Hz
cur_dts is invalid st:0 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
!!!!!!! A lot of repeated lines !!!!!!!
cur_dts is invalid st:1 (0) [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
[graph_0_in_0_1 @ 0000021394f88540] Setting 'time_base' to value '1/48000'
[graph_0_in_0_1 @ 0000021394f88540] Setting 'sample_rate' to value '48000'
[graph_0_in_0_1 @ 0000021394f88540] Setting 'sample_fmt' to value 'fltp'
[graph_0_in_0_1 @ 0000021394f88540] Setting 'channel_layout' to value '0x60f'
[graph_0_in_0_1 @ 0000021394f88540] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x60f
[format_out_0_1 @ 0000021394f88c40] Setting 'sample_fmts' to value 'fltp'
[format_out_0_1 @ 0000021394f88c40] Setting 'sample_rates' to value '48000'
[format_out_0_1 @ 0000021394f88c40] Setting 'channel_layouts' to value '0x3'
[format_out_0_1 @ 0000021394f88c40] auto-inserting filter 'auto_aresample_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_1'
[AVFilterGraph @ 0000021394739980] query_formats: 4 queried, 7 merged, 3 already done, 0 delayed
[auto_aresample_0 @ 0000021394f89c40] [SWR @ 0000021394995200] Using fltp internally between filters
[auto_aresample_0 @ 0000021394f89c40] [SWR @ 0000021394995200] Matrix coefficients:
[auto_aresample_0 @ 0000021394f89c40] [SWR @ 0000021394995200] FL: FL:1.000000 FR:0.000000 FC:0.707107 LFE:0.000000 SL:0.707107 SR:0.000000 
[auto_aresample_0 @ 0000021394f89c40] [SWR @ 0000021394995200] FR: FL:0.000000 FR:1.000000 FC:0.707107 LFE:0.000000 SL:0.000000 SR:0.707107 
[auto_aresample_0 @ 0000021394f89c40] ch:6 chl:5.1(side) fmt:fltp r:48000Hz -> ch:2 chl:stereo fmt:fltp r:48000Hz
[mpegts @ 000002139505b440] service 1 using PCR in pid=256, pcr_period=83ms
[mpegts @ 000002139505b440] muxrate VBR, sdt every 500 ms, pat/pmt every 100 ms
Output #0, mpegts, to 'file_out.ts':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf59.10.100
  Stream #0:0(und), 0, 1/90000: Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1916x1012 [SAR 1:1 DAR 479:253], q=2-31, 1945 kb/s, 23.98 fps, 23.98 tbr, 90k tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1(por), 0, 1/90000: Audio: aac (LC), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc59.14.100 aac
    Side data:
      audio service type: main
  Stream #0:2(eng), 0, 1/90000: Audio: aac (LC), 48000 Hz, stereo, fltp, 128 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc59.14.100 aac
Automatically inserted bitstream filter 'h264_mp4toannexb'; args=''
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10009433 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10010433 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10010000 > 10000000: forcing output
!!!!!!! A lot of repeated lines !!!!!!!
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10153500 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10132500 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10111167 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10089833 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10068500 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10047500 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10025500 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10004167 > 10000000: forcing output
frame=  316 fps=0.0 q=-1.0 size=     226kB time=00:00:13.05 bitrate= 141.8kbits/s speed=24.3x    
cur_dts is invalid st:1 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10024833 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10010000 > 10000000: forcing output
[mpegts @ 000002139505b440] Delay between the first packet and last packet in the muxing queue is 10003500 > 10000000: forcing output
cur_dts is invalid st:1 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:1 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:1 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
frame=  557 fps=537 q=-1.0 size=    1280kB time=00:00:23.10 bitrate= 453.8kbits/s speed=22.3x    
frame=  824 fps=536 q=-1.0 size=    3840kB time=00:00:34.26 bitrate= 918.2kbits/s speed=22.3x    
frame= 1117 fps=548 q=-1.0 size=   10496kB time=00:00:46.46 bitrate=1850.6kbits/s speed=22.8x    
frame= 1401 fps=552 q=-1.0 size=   16384kB time=00:00:58.30 bitrate=2301.9kbits/s speed=  23x    
frame= 1619 fps=533 q=-1.0 size=   19456kB time=00:01:07.41 bitrate=2364.3kbits/s speed=22.2x    
frame= 1922 fps=543 q=-1.0 size=   23296kB time=00:01:20.03 bitrate=2384.4kbits/s speed=22.6x    
frame= 2174 fps=538 q=-1.0 size=   26368kB time=00:01:30.54 bitrate=2385.5kbits/s speed=22.4x    
frame= 2414 fps=532 q=-1.0 size=   30464kB time=00:01:40.56 bitrate=2481.6kbits/s speed=22.2x    
frame= 2702 fps=536 q=-1.0 size=   34048kB time=00:01:52.57 bitrate=2477.7kbits/s speed=22.3x    
frame= 3007 fps=543 q=-1.0 size=   38912kB time=00:02:05.29 bitrate=2544.2kbits/s speed=22.6x    
frame= 3279 fps=543 q=-1.0 size=   42496kB time=00:02:16.63 bitrate=2547.8kbits/s speed=22.6x    
frame= 3515 fps=537 q=-1.0 size=   44800kB time=00:02:26.47 bitrate=2505.5kbits/s speed=22.4x    
frame= 3693 fps=525 q=-1.0 size=   46080kB time=00:02:33.90 bitrate=2452.7kbits/s speed=21.9x    
frame= 3893 fps=516 q=-1.0 size=   47616kB time=00:02:42.24 bitrate=2404.2kbits/s speed=21.5x    
frame= 4060 fps=505 q=-1.0 size=   49152kB time=00:02:49.21 bitrate=2379.5kbits/s speed=  21x    
frame= 4230 fps=495 q=-1.0 size=   50176kB time=00:02:56.31 bitrate=2331.2kbits/s speed=20.6x    
[out_0_2 @ 0000021394fe9240] EOF on sink link out_0_2:default.
[out_0_1 @ 0000021394f88340] EOF on sink link out_0_1:default.
No more output streams to write to, finishing.
frame= 4317 fps=485 q=-1.0 Lsize=   50763kB time=00:02:59.96 bitrate=2310.7kbits/s speed=20.2x    
video:42755kB audio:5483kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 5.235127%
Input file #0 (file_in.mp4):
  Input stream #0:0 (video): 4317 packets read (43781511 bytes); 
  Input stream #0:1 (audio): 5216 packets read (8011776 bytes); 5216 frames decoded (8011776 samples); 
  Input stream #0:2 (audio): 8430 packets read (5037240 bytes); 8430 frames decoded (8632320 samples); 
  Total: 17963 packets (56830527 bytes) demuxed
Output file #0 (file_out.ts):
  Output stream #0:0 (video): 4317 packets muxed (43781511 bytes); 
  Output stream #0:1 (audio): 7824 frames encoded (8011776 samples); 7825 packets muxed (2712232 bytes); 
  Output stream #0:2 (audio): 8430 frames encoded (8632320 samples); 8431 packets muxed (2902132 bytes); 
  Total: 20573 packets (49395875 bytes) muxed
[AVIOContext @ 0000021394635180] Statistics: 51981812 bytes written, 0 seeks, 199 writeouts
13646 frames successfully decoded, 0 decoding errors
[aac @ 000002139507a800] Qavg: 852.752
[aac @ 00000213949000c0] Qavg: 830.111
[AVIOContext @ 0000021394636000] Statistics: 57112522 bytes read, 2 seeks

Checking the properties of the output file shows:

Audio 1 has 0 channels!

[mpegts @ 0000029a189781c0] After avformat_find_stream_info() pos: 0 bytes read:1036432 seeks:3 frames:492
Input #0, mpegts, from 'file_out.ts':
  Duration: 00:03:00.14, start: 1.483433, bitrate: 2308 kb/s
  Program 1 
    Metadata:
      service_provider: FFmpeg
  Stream #0:0[0x100], 170, 1/90000: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(progressive), 1916x1012 [SAR 1:1 DAR 479:253], 23.98 fps, 23.98 tbr, 90k tbn
  Stream #0:1[0x101](por), 0, 1/90000: Audio: aac ([15][0][0][0] / 0x000F), 0 channels
  Stream #0:2[0x102](eng), 322, 1/90000: Audio: aac (LC) ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 130 kb/s

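I looked around, and apparently I can choose between these solutions:

  1. Use concat in filter_complex to join a 13-second silent clip in front of Audio 1. Result: the length of the silent clip cannot be controlled precisely, picture and sound go out of sync, and playback stutters noticeably, so this does not look like a good idea.

  2. Use amix in filter_complex: create a silent clip, delay Audio 1, and mix the two. Result: the properties look fine, but the "adelay" filter seems to have no effect on Audio 1.
    The command I used: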

    ffmpeg 
    -f lavfi -i anullsrc 
    -i "file_in.mp4" 
    -filter_complex "[1:1]adelay=13s|13s[short];[0:0][short]amix[aout]" 
    -map 1:0 -c:v copy 
    -map "[aout]" -map 1:2 -c:a aac -ac 2 -ar 48000 -b:a 128k 
    -shortest -f mpegts "file_out.ts"
    

So, how should I correctly fill the beginning of the track?

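You only need to pad that audio track from the start. You can use the aresample filter for this: with first_pts=0, the filter treats timestamp 0 as the expected start of the stream and inserts silence before the first sample.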

ffmpeg 
-i "file_in.mp4" 
-filter:a:0 "aresample=48000:ocl=stereo:async=3072:first_pts=0" 
-map 0 -c copy -c:a:0 aac -b:a 128k -f mpegts "file_out.ts"
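
As a rough check on how much silence is involved, the leading empty edits reported in the log (the `edit list 0 - media time: -1` lines) can be converted to seconds. This is only a sketch: the helper name `edit_gap_seconds` is hypothetical, and it assumes the logged durations are expressed in each stream's own time base (1/16000 for the video, 1/48000 for both audio streams), which is consistent with the stream durations listed in the question.

```shell
#!/bin/sh
# Hypothetical helper: convert an empty-edit duration to seconds.
#   $1 = duration in ticks (from "edit list 0 - media time: -1, duration: N")
#   $2 = ticks per second (assumed to be the stream's own time base)
edit_gap_seconds() {
    awk -v d="$1" -v r="$2" 'BEGIN { printf "%.3f\n", d / r }'
}

# Values copied from the log in the question.
echo "st:0 (video)   starts at $(edit_gap_seconds 688 16000) s"
echo "st:1 (audio 1) starts at $(edit_gap_seconds 628800 48000) s"
echo "st:2 (audio 2) starts at $(edit_gap_seconds 8208 48000) s"
```

Under that assumption, Audio 1 begins about 13.1 s after timestamp 0, which matches the 13-second gap mentioned in the question. That is the span aresample fills with silence when first_pts=0 is set.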