
Recherche avancée
Médias (91)
-
Chuck D with Fine Arts Militia - No Meaning No
15 septembre 2011, par
Mis à jour : Septembre 2011
Langue : English
Type : Audio
-
Paul Westerberg - Looking Up in Heaven
15 septembre 2011, par
Mis à jour : Septembre 2011
Langue : English
Type : Audio
-
Le Tigre - Fake French
15 septembre 2011, par
Mis à jour : Septembre 2011
Langue : English
Type : Audio
-
Thievery Corporation - DC 3000
15 septembre 2011, par
Mis à jour : Septembre 2011
Langue : English
Type : Audio
-
Dan the Automator - Relaxation Spa Treatment
15 septembre 2011, par
Mis à jour : Septembre 2011
Langue : English
Type : Audio
-
Gilberto Gil - Oslodum
15 septembre 2011, par
Mis à jour : Septembre 2011
Langue : English
Type : Audio
Autres articles (70)
-
Mise à jour de la version 0.1 vers 0.2
24 juin 2013, parExplications des différents changements notables lors du passage de la version 0.1 de MediaSPIP à la version 0.3. Quelles sont les nouveautés
Au niveau des dépendances logicielles Utilisation des dernières versions de FFMpeg (>= v1.2.1) ; Installation des dépendances pour Smush ; Installation de MediaInfo et FFprobe pour la récupération des métadonnées ; On n’utilise plus ffmpeg2theora ; On n’installe plus flvtool2 au profit de flvtool++ ; On n’installe plus ffmpeg-php qui n’est plus maintenu au (...) -
Le profil des utilisateurs
12 avril 2011, parChaque utilisateur dispose d’une page de profil lui permettant de modifier ses informations personnelle. Dans le menu de haut de page par défaut, un élément de menu est automatiquement créé à l’initialisation de MediaSPIP, visible uniquement si le visiteur est identifié sur le site.
L’utilisateur a accès à la modification de profil depuis sa page auteur, un lien dans la navigation "Modifier votre profil" est (...) -
Configurer la prise en compte des langues
15 novembre 2010, parAccéder à la configuration et ajouter des langues prises en compte
Afin de configurer la prise en compte de nouvelles langues, il est nécessaire de se rendre dans la partie "Administrer" du site.
De là, dans le menu de navigation, vous pouvez accéder à une partie "Gestion des langues" permettant d’activer la prise en compte de nouvelles langues.
Chaque nouvelle langue ajoutée reste désactivable tant qu’aucun objet n’est créé dans cette langue. Dans ce cas, elle devient grisée dans la configuration et (...)
Sur d’autres sites (8040)
-
FFmpeg media info incorrect when encoding aac audio track
11 janvier 2023, par user2028936I recently dipped my feet into encoding with FFmpeg. It's been working fine for me apart form 1 issues I have noticed recently. When I try and re-encode an audio track to acc the media info tags are somewhat wrong. Bitrate, streamsize etc seems to be copied form the source as opposed to reading it from the new track.



Essentially I want to copy the source video, audio and subtitles and also create an AAC version of the soundtrack for playback on devices that don't support HD audio.



Below is an example of what I am doing :



ffmpeg -i source.mkv -map_chapters 0 -map 0:v -c:v copy -c:s copy -map 0:a -c:a:0 copy -map 0:a c:a:1 aac destination.mkv




This results in a media info of :



Audio #1
ID : 2
Format : MLP FBA 16-ch
Format/Info : Meridian Lossless Packing FBA with 16-channel presentation
Commercial name : Dolby TrueHD with Dolby Atmos
Codec ID : A_TRUEHD
Duration : 1 h 50 min
Bit rate mode : Variable
Bit rate : 3 545 kb/s
Maximum bit rate : 5 826 kb/s
Channel(s) : 8 channels
Channel layout : L R C LFE Ls Rs Lb Rb
Sampling rate : 48.0 kHz
Frame rate : 1 200.000 FPS (40 SPF)
Bit depth : 24 bits
Compression mode : Lossless
Delay relative to video : 21 ms
Stream size : 2.73 GiB (29%)
Title : Dolby TrueHD 7.1 (Atmos)
Language : English
Default : Yes
Forced : No
Number of dynamic objects : 11
Bed channel count : 1 channel
Bed channel configuration : LFE

Audio #2
ID : 3
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : A_AAC-2
Duration : 1 h 50 min
Bit rate : 3 545 kb/s
Channel(s) : 8 channels
Channel layout : C L R Ls Rs Lb Rb LFE
Sampling rate : 48.0 kHz
Frame rate : 46.875 FPS (1024 SPF)
Compression mode : Lossy
Stream size : 2.73 GiB (29%)
Title : AAC 5.1
Writing library : Lavc58.62.100 aac
Language : English
Default : Yes
Forced : No




As you can see the bit rate, stream size, etc is copied from the source track which is incorrect. I've tried the -map_metadata -1 but that removes the metadata without creating a new one and also removed it from all video and audio tracks which I don't want.



Any other ideas what I am doing wrong or is this just a quirk of the software ?



Thanks in advance.



Adding -report output trimming the lines "Writing block of size..." as the file is huge :



ffmpeg started on 2019-12-04 at 14:31:25
Report written to "ffmpeg-20191204-143125.log"
Log level: 48
Command line:
"C:\\ffmpeg\\ffmpeg-20191118-d831edc-win64-static\\bin\\ffmpeg" -i input.mkv -report -map_chapters 0 -map 0:v -c:v copy -c:s copy -map 0:a -c:a:0 copy -map 0:a -c:a:1 aac destination.mkv
ffmpeg version git-2019-11-18-d831edc Copyright (c) 2000-2019 the FFmpeg developers
 built with gcc 9.2.1 (GCC) 20191010
 configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
 libavutil 56. 36.100 / 56. 36.100
 libavcodec 58. 62.100 / 58. 62.100
 libavformat 58. 35.100 / 58. 35.100
 libavdevice 58. 9.101 / 58. 9.101
 libavfilter 7. 66.100 / 7. 66.100
 libswscale 5. 6.100 / 5. 6.100
 libswresample 3. 6.100 / 3. 6.100
 libpostproc 55. 6.100 / 55. 6.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument 'input.mkv'.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Reading option '-map_chapters' ... matched as option 'map_chapters' (set chapters mapping) with argument '0'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:v'.
Reading option '-c:v' ... matched as option 'c' (codec name) with argument 'copy'.
Reading option '-c:s' ... matched as option 'c' (codec name) with argument 'copy'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:a'.
Reading option '-c:a:0' ... matched as option 'c' (codec name) with argument 'copy'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:a'.
Reading option '-c:a:1' ... matched as option 'c' (codec name) with argument 'aac'.
Reading option 'destination.mkv' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url input.mkv.
Successfully parsed a group of options.
Opening an input file: input.mkv.
[NULL @ 000001b290ddad00] Opening 'input.mkv' for reading
[file @ 000001b290ddbe00] Setting default whitelist 'file,crypto'
[matroska,webm @ 000001b290ddad00] Format matroska,webm probed with size=2048 and score=100
st:0 removing common factor 1000000 from timebase
st:1 removing common factor 1000000 from timebase
[matroska,webm @ 000001b290ddad00] Before avformat_find_stream_info() pos: 5080 bytes read:32768 seeks:0 nb_streams:2
[hevc @ 000001b290ddeb80] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 000001b290ddeb80] Decoding VPS
[hevc @ 000001b290ddeb80] Main 10 profile bitstream
[hevc @ 000001b290ddeb80] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 000001b290ddeb80] Decoding SPS
[hevc @ 000001b290ddeb80] Main 10 profile bitstream
[hevc @ 000001b290ddeb80] Decoding VUI
[hevc @ 000001b290ddeb80] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 000001b290ddeb80] Decoding PPS
[hevc @ 000001b290ddeb80] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 000001b290ddeb80] Decoding SEI
[hevc @ 000001b290ddeb80] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 000001b290ddeb80] Decoding SEI
[hevc @ 000001b290ddeb80] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 000001b290ddeb80] Decoding SEI
[hevc @ 000001b290ddeb80] Skipped PREFIX SEI 5
[matroska,webm @ 000001b290ddad00] All info found
[matroska,webm @ 000001b290ddad00] After avformat_find_stream_info() pos: 10219 bytes read:32768 seeks:0 frames:4
Input #0, matroska,webm, from 'input.mkv':
 Metadata:
 title : input
 ENCODER : Lavf58.35.100
 Duration: 01:50:06.89, start: 0.000000, bitrate: 11569 kb/s
 Chapter #0:0: start 0.000000, end 308.766792
 Metadata:
 title : Chapter 01
 Chapter #0:1: start 308.766792, end 679.720708
 Metadata:
 title : Chapter 02
 Chapter #0:2: start 679.720708, end 904.820583
 Metadata:
 title : Chapter 03
 Chapter #0:3: start 904.820583, end 1282.781500
 Metadata:
 title : Chapter 04
 Chapter #0:4: start 1282.781500, end 1487.736250
 Metadata:
 title : Chapter 05
 Chapter #0:5: start 1487.736250, end 1846.177667
 Metadata:
 title : Chapter 06
 Chapter #0:6: start 1846.177667, end 2116.531083
 Metadata:
 title : Chapter 07
 Chapter #0:7: start 2116.531083, end 2443.858083
 Metadata:
 title : Chapter 08
 Chapter #0:8: start 2443.858083, end 2731.979250
 Metadata:
 title : Chapter 09
 Chapter #0:9: start 2731.979250, end 3178.383542
 Metadata:
 title : Chapter 10
 Chapter #0:10: start 3178.383542, end 3636.507875
 Metadata:
 title : Chapter 11
 Chapter #0:11: start 3636.507875, end 3867.488625
 Metadata:
 title : Chapter 12
 Chapter #0:12: start 3867.488625, end 4208.037167
 Metadata:
 title : Chapter 13
 Chapter #0:13: start 4208.037167, end 4502.372875
 Metadata:
 title : Chapter 14
 Chapter #0:14: start 4502.372875, end 4853.431917
 Metadata:
 title : Chapter 15
 Chapter #0:15: start 4853.431917, end 5267.470542
 Metadata:
 title : Chapter 16
 Chapter #0:16: start 5267.470542, end 5543.329458
 Metadata:
 title : Chapter 17
 Chapter #0:17: start 5543.329458, end 5847.132958
 Metadata:
 title : Chapter 18
 Chapter #0:18: start 5847.132958, end 6150.269125
 Metadata:
 title : Chapter 19
 Chapter #0:19: start 6150.269125, end 6606.891958
 Metadata:
 title : Chapter 20
 Stream #0:0, 3, 1/1000: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x1600 [SAR 1:1 DAR 12:5], 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc
 Metadata:
 title : input
 ENCODER : Lavc58.62.100 libx265
 DURATION : 01:50:06.891000000
 Stream #0:1(eng), 1, 1/1000: Audio: truehd, 48000 Hz, 7.1, s32 (24 bit) (default)
 Metadata:
 title : input
 _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
 BPS-eng : 3544728
 DURATION-eng : 01:50:06.893000000
 NUMBER_OF_FRAMES-eng: 7928271
 NUMBER_OF_BYTES-eng: 2927455224
 _STATISTICS_WRITING_APP-eng: mkvmerge v40.0.0 ('Old Town Road + Pony') 64-bit
 _STATISTICS_WRITING_DATE_UTC-eng: 2019-11-30 10:59:25
 DURATION : 01:50:06.893000000
Successfully opened the file.
Parsing a group of options: output url destination.mkv.
Applying option map_chapters (set chapters mapping) with argument 0.
Applying option map (set input stream mapping) with argument 0:v.
Applying option c:v (codec name) with argument copy.
Applying option c:s (codec name) with argument copy.
Applying option map (set input stream mapping) with argument 0:a.
Applying option c:a:0 (codec name) with argument copy.
Applying option map (set input stream mapping) with argument 0:a.
Applying option c:a:1 (codec name) with argument aac.
Successfully parsed a group of options.
Opening an output file: destination.mkv.
[file @ 000001b290ea7e40] Setting default whitelist 'file,crypto'
Successfully opened the file.
Stream mapping:
 Stream #0:0 -> #0:0 (copy)
 Stream #0:1 -> #0:1 (copy)
 Stream #0:1 -> #0:2 (truehd (native) -> aac (native))
Press [q] to stop, [?] for help
cur_dts is invalid st:0 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:1 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:2 (0) [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:0 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:1 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:2 (0) [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:0 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:1 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:2 (0) [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:0 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:1 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
cur_dts is invalid st:2 (0) [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
detected 12 logical cores
[graph_0_in_0_1 @ 000001b290dda840] Setting 'time_base' to value '1/48000'
[graph_0_in_0_1 @ 000001b290dda840] Setting 'sample_rate' to value '48000'
[graph_0_in_0_1 @ 000001b290dda840] Setting 'sample_fmt' to value 's32'
[graph_0_in_0_1 @ 000001b290dda840] Setting 'channel_layout' to value '0x63f'
[graph_0_in_0_1 @ 000001b290dda840] tb:1/48000 samplefmt:s32 samplerate:48000 chlayout:0x63f
[format_out_0_2 @ 000001b290e0ea00] Setting 'sample_fmts' to value 'fltp'
[format_out_0_2 @ 000001b290e0ea00] Setting 'sample_rates' to value '96000|88200|64000|48000|44100|32000|24000|22050|16000|12000|11025|8000|7350'
[format_out_0_2 @ 000001b290e0ea00] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_2'
[AVFilterGraph @ 000001b290e1b3c0] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed
[auto_resampler_0 @ 000001b290e0fe80] [SWR @ 000001b291232000] Using fltp internally between filters
[auto_resampler_0 @ 000001b290e0fe80] ch:8 chl:7.1 fmt:s32 r:48000Hz -> ch:8 chl:7.1 fmt:fltp r:48000Hz
[matroska @ 000001b290dfba40] get_metadata_duration returned: 6606893000
[matroska @ 000001b290dfba40] Write early duration from metadata = 6606893
Output #0, matroska, to 'destination.mkv':
 Metadata:
 title : input
 encoder : Lavf58.35.100
 Chapter #0:0: start 0.000000, end 308.766792
 Metadata:
 title : Chapter 01
 Chapter #0:1: start 308.766792, end 679.720708
 Metadata:
 title : Chapter 02
 Chapter #0:2: start 679.720708, end 904.820583
 Metadata:
 title : Chapter 03
 Chapter #0:3: start 904.820583, end 1282.781500
 Metadata:
 title : Chapter 04
 Chapter #0:4: start 1282.781500, end 1487.736250
 Metadata:
 title : Chapter 05
 Chapter #0:5: start 1487.736250, end 1846.177667
 Metadata:
 title : Chapter 06
 Chapter #0:6: start 1846.177667, end 2116.531083
 Metadata:
 title : Chapter 07
 Chapter #0:7: start 2116.531083, end 2443.858083
 Metadata:
 title : Chapter 08
 Chapter #0:8: start 2443.858083, end 2731.979250
 Metadata:
 title : Chapter 09
 Chapter #0:9: start 2731.979250, end 3178.383542
 Metadata:
 title : Chapter 10
 Chapter #0:10: start 3178.383542, end 3636.507875
 Metadata:
 title : Chapter 11
 Chapter #0:11: start 3636.507875, end 3867.488625
 Metadata:
 title : Chapter 12
 Chapter #0:12: start 3867.488625, end 4208.037167
 Metadata:
 title : Chapter 13
 Chapter #0:13: start 4208.037167, end 4502.372875
 Metadata:
 title : Chapter 14
 Chapter #0:14: start 4502.372875, end 4853.431917
 Metadata:
 title : Chapter 15
 Chapter #0:15: start 4853.431917, end 5267.470542
 Metadata:
 title : Chapter 16
 Chapter #0:16: start 5267.470542, end 5543.329458
 Metadata:
 title : Chapter 17
 Chapter #0:17: start 5543.329458, end 5847.132958
 Metadata:
 title : Chapter 18
 Chapter #0:18: start 5847.132958, end 6150.269125
 Metadata:
 title : Chapter 19
 Chapter #0:19: start 6150.269125, end 6606.891958
 Metadata:
 title : Chapter 20
 Stream #0:0, 0, 1/1000: Video: hevc (Main 10), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x1600 [SAR 1:1 DAR 12:5], q=2-31, 23.98 fps, 23.98 tbr, 1k tbn, 1k tbc
 Metadata:
 title : input
 ENCODER : Lavc58.62.100 libx265
 DURATION : 01:50:06.891000000
 Stream #0:1(eng), 0, 1/1000: Audio: truehd ([255][255][255][255] / 0xFFFFFFFF), 48000 Hz, 7.1, s32 (24 bit) (default)
 Metadata:
 title : input
 _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
 BPS-eng : 3544728
 DURATION-eng : 01:50:06.893000000
 NUMBER_OF_FRAMES-eng: 7928271
 NUMBER_OF_BYTES-eng: 2927455224
 _STATISTICS_WRITING_APP-eng: mkvmerge v40.0.0 ('Old Town Road + Pony') 64-bit
 _STATISTICS_WRITING_DATE_UTC-eng: 2019-11-30 10:59:25
 DURATION : 01:50:06.893000000
 Stream #0:2(eng), 0, 1/1000: Audio: aac (LC) ([255][0][0][0] / 0x00FF), 48000 Hz, 7.1, fltp (24 bit), 469 kb/s (default)
 Metadata:
 title : input
 _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
 BPS-eng : 3544728
 DURATION-eng : 01:50:06.893000000
 NUMBER_OF_FRAMES-eng: 7928271
 NUMBER_OF_BYTES-eng: 2927455224
 _STATISTICS_WRITING_APP-eng: mkvmerge v40.0.0 ('Old Town Road + Pony') 64-bit
 _STATISTICS_WRITING_DATE_UTC-eng: 2019-11-30 10:59:25
 DURATION : 01:50:06.893000000
 encoder : Lavc58.62.100 aac
cur_dts is invalid st:2 (0) [init:1 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)

....................
....................
[matroska @ 000001b290dfba40] Writing block of size 168 with pts 6606913, dts 6606913, duration 1 at relative offset 550755 in cluster at offset 9940898622. TrackNumber 2, keyframe 1
[matroska @ 000001b290dfba40] end duration = 6606914
[matroska @ 000001b290dfba40] stream 0 end duration = 6606912
[matroska @ 000001b290dfba40] stream 1 end duration = 6606914
[matroska @ 000001b290dfba40] stream 2 end duration = 6606914
frame=158407 fps=357 q=-1.0 Lsize= 9708475kB time=01:50:06.91 bitrate=12037.7kbits/s speed=14.9x 
video:6416483kB audio:3234397kB subtitle:0kB other streams:0kB global headers:2kB muxing overhead: 0.596787%
Input file #0 (input.mkv):
 Input stream #0:0 (video): 158407 packets read (6570478730 bytes); 
 Input stream #0:1 (audio): 7928271 packets read (2927455224 bytes); 7928271 frames decoded (317130840 samples); 
 Total: 8086678 packets (9497933954 bytes) demuxed
Output file #0 (destination.mkv):
 Output stream #0:0 (video): 158407 packets muxed (6570478730 bytes); 
 Output stream #0:1 (audio): 7928271 packets muxed (2927455224 bytes); 
 Output stream #0:2 (audio): 309699 frames encoded (317130840 samples); 309700 packets muxed (384567176 bytes); 
 Total: 8396378 packets (9882501130 bytes) muxed
7928271 frames successfully decoded, 0 decoding errors
[AVIOContext @ 000001b291230100] Statistics: 8 seeks, 39676 writeouts
[aac @ 000001b290dff3c0] Qavg: 3839.668
[AVIOContext @ 000001b290de4080] Statistics: 9554742230 bytes read, 0 seeks



-
Subtitling Sierra VMD Files
1er juin 2016, par Multimedia Mike — Game HackingI was contacted by a game translation hobbyist from Spain (henceforth known as The Translator). He had set his sights on Sierra’s 7-CD Phantasmagoria. This mammoth game was driven by a lot of FMV files and animations that have speech. These require language translation in the form of video subtitling. He’s lucky that he found possibly the one person on the whole internet who has just the right combination of skill, time, and interest to pull this off. And why would I care about helping ? I guess I share a certain camaraderie with game hackers. Don’t act so surprised. You know what kind of stuff I like to work on.
The FMV format used in this game is VMD, which makes an appearance in numerous Sierra titles. FFmpeg already supports decoding this format. FFmpeg also supports subtitling video. So, ideally, all that’s necessary to support this goal is to add a muxer for the VMD format which can encode raw video and audio, which the format supports. Implement video compression as extra credit.
The pipeline that I envisioned looks like this :
VMD Subtitling Process
“Trivial !” I surmised. I just never learn, do I ?
The Plan
So here’s my initial pitch, outlining the work I estimated that I would need to do towards the stated goal :- Create a new file muxer that produces a syntactically valid VMD file with bogus video and audio data. Make sure it works with both FFmpeg’s playback system as well as the proper Phantasmagoria engine.
- Create a new video encoder that essentially operates in pass-through mode while correctly building a palette.
- Create a new basic encoder for the video frames.
A big unknown for me was exactly how subtitle handling operates in FFmpeg. Thanks to this project, I now know. I was concerned because I was pretty sure that font rendering entails anti-aliasing which bodes poorly for keeping the palette count under 256 unique colors.
Computer Science Puzzle
When pondering how to process the palette, I was excited for the opportunity to exercise actual computer science. FFmpeg converts frames from paletted frames to full RGB frames. Then it needs to convert them back to paletted frames. I had a vague recollection of solving this problem once before when I was experimenting with a new paletted video codec. I seem to recall that I did the palette conversion in a very naive manner. I just used a static 256-element array and processed each RGB pixel of the frame, seeing if the value already occurred in the table (O(n) lookup) and adding it otherwise.
There are more efficient algorithms, however, such as hash tables and trees. Somewhere along the line, FFmpeg helpfully acquired a rarely-used tree data structure, which was perfect for this project.
So I was pretty pleased with this optimization. Too bad this wouldn’t survive to the end of the effort.
Another palette-related challenge was the fact that a group of pictures would be accumulating a new palette but that palette needed to be recorded before the group. Thus, the muxer needed to have extra logic to rewind the file when the video encoder transmitted a palette change.
Video Compression
VMD has a few methods in its compression toolbox. It can use interframe differencing, it has some RLE, or it can code a frame raw. It can also use a custom LZ-like format on top of these. For early prototypes, I elected to leave each frame coded raw. After the concept was proved, I implemented the frame differencing.
Top frame compared with the middle frame yields the bottom frame : red pixels indicate changesEncoding only those red dots in between vast runs of unchanged pixels yielded a vast measurable improvement. The next step was to try wiring up FFmpeg’s existing LZ compression facilities to the encoder. This turned out to be implausible since VMD’s LZ variant has nothing to do with anything FFmpeg already provides. Fortunately, the LZ piece is not absolutely required and the frame differencing + RLE provides plenty of compression.
Subtitling
I’ve never done anything, multimedia programming-wise, concerning subtitles. I guess all the entertainment I care about has always been in my native tongue. What a good excuse to program outside of my comfort zone !First, I needed to know how to access FFmpeg’s subtitling facilities. Fortunately, The Translator did the legwork on this matter so I didn’t have to figure it out.
However, I intuitively had misgivings about this phase. I had heard that the subtitling process performs anti-aliasing. That means that the image would need to be promoted to a higher colorspace for this phase and that the anti-aliasing process would likely push the color count way past 256. Some quick tests revealed this to be the case, as the running color count would leap by several hundred colors as soon as the palette accounting algorithm encountered a subtitle.
So I dug into the subtitle subsystem. I discovered that the subtitle library operates by creating a linked list of subtitle bitmaps that the client app must render. The bitmaps are comprised of 8-bit alpha transparency values that must be composited onto the target frame (i.e., 0 = transparent, 255 = 100% opaque). For example, the letter ‘H’ :
(with 00s removed) 13 F8 41 00 00 00 00 68 E4 | 13 F8 41 68 E4 14 FF 44 00 00 00 00 6C EC | 14 FF 44 6C EC 14 FF 44 00 00 00 00 6C EC | 14 FF 44 6C EC 14 FF 44 00 00 00 00 6C EC | 14 FF 44 6C EC 14 FF DC D0 D0 D0 D0 E4 EC | 14 FF DC D0 D0 D0 D0 E4 EC 14 FF 7E 50 50 50 50 9A EC | 14 FF 7E 50 50 50 50 9A EC 14 FF 44 00 00 00 00 6C EC | 14 FF 44 6C EC 14 FF 44 00 00 00 00 6C EC | 14 FF 44 6C EC 14 FF 44 00 00 00 00 6C EC | 14 FF 44 6C EC 11 E0 3B 00 00 00 00 5E CE | 11 E0 3B 5E CE
To get around the color explosion problem, I chose a threshold value and quantized values above and below to 255 and 0, respectively. Further, the process chooses an appropriate color from the existing palette rather than introducing any new colors.
Muxing Matters
In order to force VMD into a general purpose media framework, a lot of special information needs to be passed around. Like many paletted codecs, the palette needs to be transmitted from the file demuxer to the video decoder via some side channel. For re-encoding, this also implies that the palette needs to make the trip from the video encoder to the file muxer. As if this wasn’t enough, individual VMD frames have even more data that needs to be ferried between the muxer and codec levels, including frame change boundaries. FFmpeg provides methods to do these things, but I could not always rely on the systems to relay the data in all cases. I was probably doing something wrong ; I accept that. Instead, I just packed all the information at the front of an encoded frame and split it apart in the muxer.I could not quite figure out how to get the audio and video muxed correctly. As a result, neither FFmpeg nor the Phantasmagoria engine could replay the files correctly.
Plan B
Since I was having so much trouble creating an entirely new VMD file, likely due to numerous unknown bits of the file format, I thought of another angle : re-use the existing VMD file. For this approach, I kept the video encoder and file muxer that I created in the initial phase, but modified the file muxer to emit a special intermediate file. Then, I created a Python tool to repackage the original VMD file using compressed video data in the intermediate file.For this phase, I also implemented a command line switch for FFmpeg to disable subtitle blending, to make the feature feel like less of an unofficial hack, as though this nonsense would ever have a chance of being incorporated upstream.
At this point, I was seeing some success with the complete, albeit roundabout, subtitling process. I constructed a subtitle file using “Spanish I Learned From Mexican Telenovelas” and the frames turned out fairly readable :
“she cheated on him”
“he’s a scumbag” … these random subtitles could fit surprisingly well !
The few files that I tested appeared to work fine. But then I handed off my work to The Translator and he immediately found a bunch of problems. According to my notes, the problems mostly took the form of flashing, solid color frames. Further, I found tiny, mostly imperceptible flaws in my RLE compressor, usually only detectable by running strict comparison tools ; but I wasn’t satisfied.
At this point, I think I attempted to just encode the entire palette at the front of each frame, as allowed by the format, but that did not seem to fix any problems. My notes are not completely clear on this matter (likely because I was still trying to figure out the exact problem), but I think it had to do with FFmpeg inserting extra video frames in order to even out gaps in the video framerate.
Sigh, Plan C
At this point, I was getting tired of trying to force FFmpeg to do this. So I decided to minimize its involvement using lessons learned up to this point.The next pitch :
- Create a new C program that can open an existing VMD file and output an identical VMD file. I know this sounds easy, but the specific method of copying entails interpreting individual parts of the file and writing those individual parts to the new file. This is in preparation for…
- Import the VMD video decoder functions directly into the program to decode the individual video frames and re-encode them, replacing the video frames as the file is rewritten.
- Wire up the subtitle system. During the adventure to disable subtitle blending, I accidentally learned enough about interfacing to the subtitle library to just invoke it directly.
- Rewrite the RLE method so that it is 100% correct.
Off to work I went. That part about lifting the existing VMD decoder functions out of their libavcodec nest turned out to not be that straightforward. As an alternative, I modified the decoder to dump the raw frames to an intermediate file. In doing so, I think I was able to avoid the issue of the duplicated frames that plagued the previous efforts.
Also, remember how I was really pleased with the palette conversion technique in which I was able to leverage computer science big-O theory ? By this stage, I had no reason to convert the paletted video to RGB in the first place ; all of the decoding, subtitling and re-encoding operates in the paletted colorspace.
This approach seemed to work pretty well. The final program is subtitle-vmd.c. The process is still a little weird. The modifications in my own FFmpeg fork are necessary to create an intermediate file that the new C tool can operate with.
Next Steps
The Translator has found some assorted bugs and corner cases that still need to be ironed out. Further, for extra credit, I need find the change windows for each frame to improve compression just a little more. I don’t think I will be trying for LZ compression, though.However, almost as soon as I had this whole system working, The Translator informed me that there is another, different movie format in play in the Phantasmagoria engine called ROBOT, with an extension of RBT. Fortunately, enough of the algorithms have been reverse engineered and re-implemented in ScummVM that I was able to sort out enough details for another subtitling project. That will be the subject of a future post.
See Also :
- Subtitling Sierra RBT Files : The followup in which I discuss how to scribble text on the other animation format
-
Revision 31963 : Si on a le plugin yaml activé, on peut alors exporter chaque menu dans un ...
7 octobre 2009, par rastapopoulos@… — LogSi on a le plugin yaml activé, on peut alors exporter chaque menu dans un fichier yaml.
Reste à proposer l’import de ce même fichier.