
Media (1)
-
Map of Schillerkiez
13 May 2011, by
Updated: September 2011
Language: English
Type: Text
Other articles (60)
-
The farm's regular Cron tasks
1 December 2010, by
Managing the farm relies on running several repetitive tasks, known as Cron tasks, at regular intervals.
The super Cron (gestion_mutu_super_cron)
This task, scheduled every minute, simply calls the Cron of every instance in the shared farm on a regular basis. Combined with a system Cron on the central site of the farm, it generates regular visits to the various sites and prevents the tasks of rarely visited sites from being too (...)
-
Supporting all media types
13 April 2011, by
Unlike most software and media-sharing platforms, MediaSPIP aims to manage as many different media types as possible. The following are just a few examples from an ever-expanding list of supported formats:
images: png, gif, jpg, bmp and more
audio: MP3, Ogg, Wav and more
video: AVI, MP4, OGV, mpg, mov, wmv and more
text, code and other data: OpenOffice, Microsoft Office (Word, PowerPoint, Excel), web (html, CSS), LaTeX, Google Earth and (...)
-
Keeping control of your media in your hands
13 April 2011, by
The vocabulary used on this site and around MediaSPIP in general aims to avoid reference to Web 2.0 and the companies that profit from media-sharing.
While using MediaSPIP, you are invited to avoid using words like "Brand", "Cloud" and "Market".
MediaSPIP is designed to facilitate the sharing of creative media online, while allowing authors to retain complete control of their work.
MediaSPIP aims to be accessible to as many people as possible and development is based on expanding the (...)
On other sites (7354)
-
Audio loading error when running the OpenAI Whisper model
9 June 2024, by John mick
The problem I'm trying to solve is that I can't run the Whisper model on some audio files; it fails with an error related to audio decoding.


payload.wav: Invalid data found when processing input.
raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e



I tried micro-machines.wav and it works fine, but when I use other audio files I get an error.

import whisper

model = whisper.load_model("base")
text=model.transcribe('micro-machines.wav',fp16=False)
print(text)
text=model.transcribe('payload.wav',fp16=False)
print(text)



Error I'm getting for payload.wav:


d:\...\venv\lib\site-packages\whisper\transcribe.py:79: UserWarning: FP16 is not supported on CPU; using FP32 instead
 warnings.warn("FP16 is not supported on CPU; using FP32 instead") 
Traceback (most recent call last):
 File "d:\...\venv\lib\site-packages\whisper\audio.py", line 42, in load_audio
 ffmpeg.input(file, threads=0) 
 File "d:\...\venv\lib\site-packages\ffmpeg\_run.py", line 325, in run 
 raise Error('ffmpeg', out, err) 
ffmpeg._run.Error: ffmpeg error (see stderr output for detail) 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "C:\....\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
 return _run_code(code, main_globals, None,
 File "C:\.....\Python\Python39\lib\runpy.py", line 87, in _run_code
 exec(code, run_globals)
 File "D:\...\venv\Scripts\whisper.exe\__main__.py", line 7, in <module>
 File "d:\...\venv\lib\site-packages\whisper\transcribe.py", line 314, in cli
 result = transcribe(model, audio_path, temperature=temperature, **args)
 File "d:\...\venv\lib\site-packages\whisper\transcribe.py", line 85, in transcribe
 mel = log_mel_spectrogram(audio)
 File "d:\...\venv\lib\site-packages\whisper\audio.py", line 111, in log_mel_spectrogram
 audio = load_audio(audio)
 File "d:\...\venv\lib\site-packages\whisper\audio.py", line 47, in load_audio
 raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 6.0-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
 built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
 configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
 libavutil 58. 2.100 / 58. 2.100
 libavcodec 60. 3.100 / 60. 3.100
 libavformat 60. 3.100 / 60. 3.100
 libavdevice 60. 1.100 / 60. 1.100
 libavfilter 9. 3.100 / 9. 3.100
 libswscale 7. 1.100 / 7. 1.100
 libswresample 4. 10.100 / 4. 10.100
 libpostproc 57. 1.100 / 57. 1.100
payload.wav: Invalid data found when processing input


I searched for solutions and found one answer that says: "It appears that the code failed to load the audio file for some reason and even failed to display that error because e.stderr did not contain a valid UTF-8 string."
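Since ffmpeg reports "Invalid data found when processing input", one thing worth checking is whether payload.wav is really a WAV file at all. The following is a minimal sketch using only the standard library, not part of Whisper. The helper names (sniff_container, ffmpeg_check_command, check_decodable) are hypothetical, the list of signatures is deliberately incomplete, and the second check assumes ffmpeg is on the PATH.

```python
import subprocess
import sys


def sniff_container(path):
    """Guess a file's container from its first 12 bytes ("unknown" if no match)."""
    with open(path, "rb") as f:
        head = f.read(12)
    # A WAV file starts with "RIFF", a 4-byte chunk size, then "WAVE".
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    # MP4/M4A files carry "ftyp" at byte offset 4.
    if head[4:8] == b"ftyp":
        return "mp4/m4a"
    # A few other common signatures; this list is illustrative, not exhaustive.
    for magic, label in ((b"ID3", "mp3"), (b"OggS", "ogg"), (b"fLaC", "flac")):
        if head.startswith(magic):
            return label
    return "unknown"


def ffmpeg_check_command(path):
    """Build an ffmpeg command that decodes the input and discards the output."""
    return ["ffmpeg", "-nostdin", "-v", "error", "-i", str(path), "-f", "null", "-"]


def check_decodable(path):
    """Run ffmpeg on the file and return (ok, stderr_text)."""
    proc = subprocess.run(ffmpeg_check_command(path), capture_output=True)
    # errors="replace" avoids a UnicodeDecodeError when stderr is not valid UTF-8.
    return proc.returncode == 0, proc.stderr.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "->", sniff_container(name))
```

If sniff_container returns something other than "wav" for payload.wav, the file probably has the wrong extension or is not audio at all (for example a text or JSON payload saved under a .wav name), which would be consistent with the ffmpeg message above. check_decodable can then be run on both files to compare what ffmpeg itself reports, independently of Whisper.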


-
FFmpeg C API WMAV2 AVCodecParserContext not found even though CLI can parse WMAs on MacOS
3 October 2023, by grendell
I am following the decode_audio.c example from FFmpeg, but I am unable to initialize a parser for AV_CODEC_ID_WMAV2.

Test code:


#include <stdio.h>
#include <libavcodec/avcodec.h>

int main() {
    // codec is found successfully
    const AVCodec * codec = avcodec_find_decoder(AV_CODEC_ID_WMAV2);
    if (!codec) {
        fprintf(stderr, "codec not found\n");
        return 1;
    }

    // parser is always NULL
    AVCodecParserContext * parser = av_parser_init(codec->id);
    if (!parser) {
        fprintf(stderr, "parser not found\n");
        return 1;
    }

    av_parser_close(parser);
    return 0;
}



Build commands:


clang -c -I/opt/homebrew/Cellar/ffmpeg/6.0_1/include wma2mp3.c -o obj/wma2mp3.o
clang -L/opt/homebrew/Cellar/ffmpeg/6.0_1/lib -lavcodec obj/wma2mp3.o -o wma2mp3



I'm surprised that the FFmpeg CLI can perform this operation on the same machine:


% ffmpeg -i test.wma test.mp3
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
 built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)
 configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/6.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox --enable-neon
 libavutil 58. 2.100 / 58. 2.100
 libavcodec 60. 3.100 / 60. 3.100
 libavformat 60. 3.100 / 60. 3.100
 libavdevice 60. 1.100 / 60. 1.100
 libavfilter 9. 3.100 / 9. 3.100
 libswscale 7. 1.100 / 7. 1.100
 libswresample 4. 10.100 / 4. 10.100
 libpostproc 57. 1.100 / 57. 1.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, asf, from 'test.wma':
 Metadata:
 ToolName : Windows Media Encoding Utility
 ToolVersion : 8.00.00.0343
 Duration: 00:00:00.74, start: 0.000000, bitrate: 80 kb/s
 Stream #0:0: Audio: wmav2 (a[1][0][0] / 0x0161), 44100 Hz, 1 channels, fltp, 48 kb/s
Stream mapping:
 Stream #0:0 -> #0:0 (wmav2 (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'test.mp3':
 Metadata:
 ToolName : Windows Media Encoding Utility
 ToolVersion : 8.00.00.0343
 TSSE : Lavf60.3.100
 Stream #0:0: Audio: mp3, 44100 Hz, mono, fltp
 Metadata:
 encoder : Lavc60.3.100 libmp3lame
[libmp3lame @ 0x130706320] Queue input is backward in time
[mp3 @ 0x1307056e0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 15668 >= 14764
size= 8kB time=00:00:00.97 bitrate= 65.8kbits/s speed= 103x 
video:0kB audio:8kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 4.048112%



I am using an Apple M1 machine running macOS 13.5.2 (22G91).


Is the CLI using a different mechanism than av_parser_parse2 to perform this conversion, and is there a better way to accomplish this via the C API?

-
Revision 30966: avoid the ugly ’doctype_ecrire’ during the upgrade
17 August 2009, by fil@…, Log
Avoid the ugly ’doctype_ecrire’ during the upgrade