
Media (91)
-
Corona Radiata
26 September 2011
Updated: September 2011
Language: English
Type: Audio
-
Lights in the Sky
26 September 2011
Updated: September 2011
Language: English
Type: Audio
-
Head Down
26 September 2011
Updated: September 2011
Language: English
Type: Audio
-
Echoplex
26 September 2011
Updated: September 2011
Language: English
Type: Audio
-
Discipline
26 September 2011
Updated: September 2011
Language: English
Type: Audio
-
Letting You
26 September 2011
Updated: September 2011
Language: English
Type: Audio
Other articles (13)
-
Accepted formats
28 January 2010
The following commands provide information about the formats and codecs supported by the local ffmpeg installation:
ffmpeg -codecs
ffmpeg -formats
Accepted input video formats
This list is not exhaustive; it highlights the main formats in use:
h264: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
m4v: raw MPEG-4 video format
flv: Flash Video (FLV) / Sorenson Spark / Sorenson H.263
Theora
wmv:
Possible output video formats
As a first step, we (...)
-
Supporting all media types
13 April 2011
Unlike most software and media-sharing platforms, MediaSPIP aims to manage as many different media types as possible. The following are just a few examples from an ever-expanding list of supported formats:
images: png, gif, jpg, bmp and more
audio: MP3, Ogg, Wav and more
video: AVI, MP4, OGV, mpg, mov, wmv and more
text, code and other data: OpenOffice, Microsoft Office (Word, PowerPoint, Excel), web (html, CSS), LaTeX, Google Earth and (...)
-
Adding notes and captions to images
7 February 2011
To add notes and captions to images, the first step is to install the "Légendes" plugin.
Once the plugin is activated, you can configure it in the configuration area to change the permissions for creating, modifying and deleting notes. By default, only site administrators can add notes to images.
Changes when adding a media item
When adding a media item of type "image", a new button appears above the preview (...)
On other sites (3346)
-
Android + ffmpeg + AudioTrack produces bad audio output
12 September 2014, by Goddchen
Here is what I am trying to do: use an AudioRecord and "pipe" the output of AudioRecord.read(byte[],...) to an ffmpeg process's stdin, which will convert it to a 3gp (AAC) file.
The ffmpeg call is as follows:
ProcessBuilder processBuilder = new ProcessBuilder(BINARY.getAbsolutePath(),
        "-y",
        "-ar", "44100", "-c:a", "pcm_s16le", "-ac", "1", "-f", "s16le",
        "-i", "-",
        "-strict", "-2", "-c:a", "aac",
        outFile.getAbsolutePath());
The AudioRecord is set up as follows:
AudioRecord record = new AudioRecord(/*AudioSource.VOICE_RECOGNITION,*/ AudioSource.MIC,
        SAMPLING_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufferSize);
SAMPLING_RATE = 44100 and bufferSize is the one returned by AudioRecord.getMinBufferSize(...).
I am writing the data to ffmpeg like this:
try {
    IOUtils.write(data, getFFmpegHelper().getCurrentProcessOutputStream());
} catch (Exception e) {
    Log.e(Application.LOG_TAG, "Error writing data to ffmpeg process", e);
    //TODO notify user, stop the recording, etc...
}
So far so good: ffmpeg runs and creates a proper 3gp file. But the audio in the file is totally off. It sounds "choppy" (not sure if this is the correct English word ;) ) and the pace is also wrong: it plays too fast.
Check out this sample : http://goddchen.de/android/tmp/tmp.3gp
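As a sanity check on the pace, the data rate of the raw stream follows directly from the AudioRecord settings above. Raw s16le PCM carries no timestamps, so the duration ffmpeg sees is simply the number of bytes received divided by the byte rate. The sketch below is plain arithmetic in Python, not Android code; the function names are hypothetical and the byte count in the example is made up for illustration.

```python
SAMPLE_RATE = 44100    # Hz, SAMPLING_RATE in the post
BYTES_PER_SAMPLE = 2   # ENCODING_PCM_16BIT / s16le
CHANNELS = 1           # CHANNEL_IN_MONO / "-ac 1"


def pcm_bytes_per_second(rate: int, bytes_per_sample: int, channels: int) -> int:
    """Byte rate of an uncompressed PCM stream."""
    return rate * bytes_per_sample * channels


def pcm_duration_seconds(num_bytes: int, bytes_per_second: int) -> float:
    """Playback duration implied by a given number of raw PCM bytes."""
    return num_bytes / bytes_per_second


byte_rate = pcm_bytes_per_second(SAMPLE_RATE, BYTES_PER_SAMPLE, CHANNELS)
print(byte_rate)        # bytes per second
print(byte_rate * 8)    # bits per second

# Hypothetical figure: total bytes written to ffmpeg's stdin during a recording.
written = 441000
print(pcm_duration_seconds(written, byte_rate))
```

The result is 88200 bytes (705600 bits) per second, which agrees with the 705 kb/s ffmpeg reports for the input stream. Comparing the duration implied by the bytes actually written with the wall-clock length of the recording shows whether data is being lost before it reaches ffmpeg.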
This is the output of the ffmpeg process :
[s16le @ 0x23634d0] Estimating duration from bitrate, this may be inaccurate
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, s16le, from 'pipe:':
Duration: N/A, start: 0.000000, bitrate: 705 kb/s
Stream #0:0: Audio: pcm_s16le, 44100 Hz, mono, s16, 705 kb/s
[aformat @ 0x2363100] auto-inserting filter 'auto-inserted resampler 0' between the filter 'src' and the filter 'aformat'
[aresample @ 0x235b0a0] chl:mono fmt:s16 r:44100Hz -> chl:mono fmt:flt r:44100Hz
Output #0, 3gp, to '/data/data/com.test.audio/files/tmp.3gp':
Metadata:
encoder : Lavf54.6.100
Stream #0:0: Audio: aac (mp4a / 0x6134706D), 44100 Hz, mono, flt, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> aac)
size= 3kB time=00:00:00.18 bitrate= 132.5kbits/s
size= 8kB time=00:00:00.55 bitrate= 120.9kbits/s
size= 12kB time=00:00:00.83 bitrate= 121.8kbits/s
size= 16kB time=00:00:01.04 bitrate= 122.8kbits/s
size= 20kB time=00:00:01.32 bitrate= 122.5kbits/s
size= 23kB time=00:00:01.53 bitrate= 121.6kbits/s
size= 27kB time=00:00:01.81 bitrate= 121.0kbits/s
size= 31kB time=00:00:02.11 bitrate= 120.7kbits/s
size= 35kB time=00:00:02.32 bitrate= 123.4kbits/s
video:0kB audio:34kB global headers:0kB muxing overhead 3.031610%
-
How to set pts, dts and duration in ffmpeg library ?
24 March, by hslee
I want to pack some compressed video packets (H.264) into an ".mp4" container.
In a word: muxing, with no decoding and no encoding.
And I have no idea how to set pts, dts and duration.

1. I get the packets with the "pcap" library.
2. I removed the headers that precede the compressed video data, e.g. Ethernet, VLAN.
3. I collected data until I had one frame and decoded it to get information about the data, e.g. width, height. (I am not sure that this is necessary.)
4. I initialized the output context, stream and codec context.
5. I started to receive packets with the "pcap" library again (now for muxing).
6. I assembled one frame and put that data in an AVPacket structure.
7. I try to set PTS, DTS and duration. (I think this is the wrong part, though I am not sure.)

7-1. At the first frame, I saved the time (msec) from the packet header structure.
7-2. Whenever I assemble a frame, I set the parameters like this: PTS (current time - start time), DTS (same value as PTS), duration (current PTS - previous PTS).

I think there is an error here because:

- I don't know how far the DTS may be from the PTS.
- As I understand it, duration means how long this frame is shown, from now until the next frame, so its value should be (next PTS - current PTS), but I cannot know the next PTS at that time.

The stream contains I-frames only.
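
On the duration point, one common approach is to hold each frame back by one step, so that its duration (next PTS minus current PTS) is already known when it is written. With an I-frame-only stream there is no frame reordering, so DTS can simply equal PTS. The sketch below only illustrates the bookkeeping, in Python rather than with the libav API; `stamp_frames` is a hypothetical name, and reusing the previous duration for the last frame is an assumption.

```python
from typing import Iterable, Iterator, Tuple


def stamp_frames(pts_values: Iterable[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield (pts, dts, duration) for each frame, emitted one frame late."""
    prev = None
    duration = 0
    for pts in pts_values:
        if prev is not None:
            # The next PTS has arrived, so the held-back frame's duration is known.
            duration = pts - prev
            yield prev, prev, duration
        prev = pts
    if prev is not None:
        # Last frame: no successor, so reuse the previous duration (assumption).
        yield prev, prev, duration


# Example PTS values in stream time-base ticks (made up for illustration).
for stamped in stamp_frames([0, 3600, 7200, 11250]):
    print(stamped)
```

The cost is one frame of latency: a frame is only written once its successor has been assembled.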



// make input context for decoding
AVFormatContext *&ic = gInputContext;
ic = avformat_alloc_context();
AVCodec *cd = avcodec_find_decoder(AV_CODEC_ID_H264);
AVStream *st = avformat_new_stream(ic, cd);
AVCodecContext *cc = st->codec;
avcodec_open2(cc, cd, NULL);

// make a packet and decode it once the collected packets form one frame
gPacket.stream_index = 0;
gPacket.size = gPacketLength[0];
gPacket.data = gPacketData[0];
gPacket.pts = AV_NOPTS_VALUE;
gPacket.dts = AV_NOPTS_VALUE;
gPacket.flags = AV_PKT_FLAG_KEY;
avcodec_decode_video2(cc, gFrame, &got_picture, &gPacket);

// I checked that the context is initialized automatically after "avcodec_decode_video2"
// fill in some fields that I know are not initialized
cc->time_base.den = 90000;
cc->time_base.num = 1;
cc->bit_rate = 2500000;
cc->gop_size = 1;

// make output context from the input context
AVFormatContext *&oc = gOutputContext;
avformat_alloc_output_context2(&oc, NULL, NULL, filename);
AVFormatContext *&ic = gInputContext;
AVStream *ist = ic->streams[0];
AVCodecContext *&icc = ist->codec;
AVStream *ost = avformat_new_stream(oc, icc->codec);
AVCodecContext *occ = ost->codec;
avcodec_copy_context(occ, icc);
occ->flags |= CODEC_FLAG_GLOBAL_HEADER;
avio_open(&(oc->pb), filename, AVIO_FLAG_WRITE);

// repeated part for muxing
AVRational Millisecond = { 1, 1000 };
gPacket.stream_index = 0;
gPacket.data = gPacketData[0];
gPacket.size = gPacketLength[0];
gPacket.pts = av_rescale_rnd(pkthdr->ts.tv_sec * 1000
        + pkthdr->ts.tv_usec / 1000
        - gStartTime, Millisecond.den, ost->time_base.den,
        (AVRounding)(AV_ROUND_NEAR_INF | AV_ROUND_PASS_MINMAX));
gPacket.dts = gPacket.pts;
gPacket.duration = gPacket.pts - gPrev;
gPacket.flags = AV_PKT_FLAG_KEY;
gPrev = gPacket.pts;
av_interleaved_write_frame(gOutputContext, &gPacket);




The expected result is a playable .mp4 video file.
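
One detail worth checking in the muxing code is the direction of the rescale. `av_rescale_rnd(a, b, c, rnd)` computes a * b / c, so converting milliseconds into ticks of a 1/den time base means multiplying by den and dividing by 1000. The call in the post passes the two denominators the other way round, which may be part of the problem. The sketch below shows the arithmetic in Python; `rescale` is a hypothetical helper, and the 1/90000 time base is an assumption taken from the value set on the codec context (the stream's actual time base may differ).

```python
from fractions import Fraction


def rescale(value: int, src_time_base: Fraction, dst_time_base: Fraction) -> int:
    """Convert a timestamp from src_time_base units to dst_time_base units."""
    # Python's round() breaks exact ties to the nearest even integer.
    return round(value * src_time_base / dst_time_base)


MILLISECOND = Fraction(1, 1000)
STREAM_TIME_BASE = Fraction(1, 90000)  # assumed, see above

elapsed_ms = 40  # one frame at 25 fps, for illustration

# Milliseconds to stream ticks: 40 * 90000 / 1000
print(rescale(elapsed_ms, MILLISECOND, STREAM_TIME_BASE))

# The argument order used in the post: 40 * 1000 / 90000
print(round(Fraction(elapsed_ms * 1000, 90000)))
```

With these numbers the first form gives 3600 ticks, while the second collapses to 0, so consecutive frames would receive nearly identical timestamps.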


-
Batch splitting large audio files into small fixed-length audio files in moments of silence
26 July 2023, by Haldjärvi
To train the SO-VITS-SVC neural network, we need 10-14 second voice files. As source material, let's say I use phrases from some game. I have already made a batch script that converts different files into one working format, another that removes silence, and a third that combines small audio files into files of 13-14 seconds (I used Python, pydub and FFmpeg). To create a training dataset automatically, only one batch script remains to be written: one that cuts audio files longer than 14 seconds into separate files of 10-14 seconds, preferably cutting at silent or near-silent points.

So, large audio files (20 seconds, 70 seconds, possibly several hundred seconds) need to be batch cut into segments of approximately 10-14 seconds. The main task is to find the quietest place within each cut area, so that phrases are not cut in the middle of a word (which is bad for model training). Is it possible to do this efficiently, so that processing a 30-second file is fast rather than taking 15 seconds? Quiet-zone detection is only required in the cut area, that is, between 10 and 14 seconds counted from the start of the current segment.

I would be very grateful for any help.

I tried to write a script together with ChatGPT, but every version gave completely unpredictable results and was not even close to what I needed. I had to settle for a hard cut at exactly 14000 milliseconds. However, I hope there is a way to make a version that cuts in quiet areas.

import os
from pydub import AudioSegment

input_directory = ".../RemSilence/"
output_directory = ".../Split/"
max_duration = 14000  # maximum segment length in milliseconds


def split_audio_by_duration(input_file, duration):
    # Cut the file into consecutive chunks of at most `duration` milliseconds.
    audio = AudioSegment.from_file(input_file)
    segments = []
    for i in range(0, len(audio), duration):
        segment = audio[i:i + duration]
        segments.append(segment)
    return segments


if __name__ == "__main__":
    os.makedirs(output_directory, exist_ok=True)
    audio_files = [os.path.join(input_directory, file) for file in os.listdir(input_directory) if file.endswith(".wav")]
    # Process the shortest files first.
    audio_files.sort(key=lambda file: len(AudioSegment.from_file(file)))
    for file in audio_files:
        audio = AudioSegment.from_file(file)
        if len(audio) > max_duration:
            segments = split_audio_by_duration(file, max_duration)
            for i, segment in enumerate(segments):
                # Number each output after the files already in the output directory.
                output_filename = f"output_{len(os.listdir(output_directory))+1}.wav"
                output_file_path = os.path.join(output_directory, output_filename)
                segment.export(output_file_path, format="wav")
        else:
            # Short enough already: export unchanged.
            output_filename = f"output_{len(os.listdir(output_directory))+1}.wav"
            output_file_path = os.path.join(output_directory, output_filename)
            audio.export(output_file_path, format="wav")
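
A possible starting point for cutting in quiet areas is sketched below. It scans only the 10-14 second window after the previous cut, measures the mean-square energy of short frames there, and cuts at the centre of the quietest one. This is a minimal sketch using the standard library on a plain sequence of mono sample values, not a tested pydub solution; `quietest_cut`, `split_points` and the 50 ms frame size are assumptions made for illustration.

```python
from typing import List, Sequence


def quietest_cut(samples: Sequence[int], start: int, stop: int, frame: int) -> int:
    """Return the sample index at the centre of the lowest-energy frame in [start, stop)."""
    best_index, best_energy = start, None
    for pos in range(start, max(start + 1, stop - frame + 1), frame):
        window = samples[pos:pos + frame]
        energy = sum(s * s for s in window) / len(window)  # mean square
        if best_energy is None or energy < best_energy:
            best_index, best_energy = pos + len(window) // 2, energy
    return best_index


def split_points(samples: Sequence[int], rate: int,
                 min_s: float = 10.0, max_s: float = 14.0,
                 frame_ms: int = 50) -> List[int]:
    """Cut positions (sample indices) giving pieces of roughly min_s to max_s seconds."""
    frame = max(1, rate * frame_ms // 1000)
    cuts: List[int] = []
    origin = 0
    # Keep cutting while the remainder is longer than the maximum length.
    while len(samples) - origin > int(max_s * rate):
        lo = origin + int(min_s * rate)
        hi = origin + int(max_s * rate)
        origin = quietest_cut(samples, lo, hi, frame)
        cuts.append(origin)
    return cuts
```

With pydub, the samples and rate for a mono file could come from `audio.get_array_of_samples()` and `audio.frame_rate`, and since `AudioSegment` slices are indexed in milliseconds, a cut at sample `c` corresponds to `c * 1000 // rate`. Note that the final piece may be shorter than 10 seconds; merging it with its neighbour is left open here.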