
Other articles (11)
-
Publishing on MediaSPIP
13 June 2013 — Can I post content from an iPad tablet?
Yes, if your MediaSPIP installation is at version 0.2 or higher. If needed, contact your MediaSPIP administrator to find out.
-
Supporting all media types
13 April 2011 — Unlike most software and media-sharing platforms, MediaSPIP aims to manage as many different media types as possible. The following are just a few examples from an ever-expanding list of supported formats: images: png, gif, jpg, bmp and more; audio: MP3, Ogg, Wav and more; video: AVI, MP4, OGV, mpg, mov, wmv and more; text, code and other data: OpenOffice, Microsoft Office (Word, PowerPoint, Excel), web (html, CSS), LaTeX, Google Earth and (...)
-
Selection of projects using MediaSPIP
2 May 2011 — The examples below are representative of specific uses of MediaSPIP in specific projects.
MediaSPIP farm @ Infini
The non-profit organization Infini develops hospitality activities, an internet access point, training, innovative projects in the field of information and communication technologies, and website hosting. It plays a unique and prominent role in the Brest (France) area and at the national level, among the half-dozen such associations. Its members (...)
On other sites (6030)
-
Android + ffmpeg + AudioTrack produces bad audio output
12 September 2014, by Goddchen — Here is what I am trying to do: use an AudioRecord and "pipe" the output of AudioRecord.read(byte[], ...) into an ffmpeg process' stdin, which converts it to a 3gp (AAC) file. The ffmpeg call is as follows:

ProcessBuilder processBuilder = new ProcessBuilder(BINARY.getAbsolutePath(),
        "-y",
        "-ar", "44100", "-c:a", "pcm_s16le", "-ac", "1", "-f", "s16le",
        "-i", "-",
        "-strict", "-2", "-c:a", "aac",
        outFile.getAbsolutePath());

The AudioRecord is set up as follows:
AudioRecord record = new AudioRecord(/*AudioSource.VOICE_RECOGNITION,*/ AudioSource.MIC,
        SAMPLING_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufferSize);

SAMPLING_RATE = 44100, and bufferSize is the value returned by AudioRecord.getMinBufferSize(...).
I am writing the data to ffmpeg like this :
try {
    IOUtils.write(data, getFFmpegHelper().getCurrentProcessOutputStream());
} catch (Exception e) {
    Log.e(Application.LOG_TAG, "Error writing data to ffmpeg process", e);
    // TODO notify user, stop the recording, etc...
}

So far so good: ffmpeg runs and creates a proper 3gp file. But the audio in the file is totally off. It sounds "choppy", and the pace is wrong: it plays too fast.
Check out this sample : http://goddchen.de/android/tmp/tmp.3gp
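Since the output plays too fast, one quick sanity check is the raw byte-rate arithmetic: 16-bit mono PCM at 44100 Hz is exactly 88200 bytes per second, so the number of bytes piped into ffmpeg should match the wall-clock recording time; a large shortfall points to buffers being dropped between AudioRecord.read() and the pipe. A minimal sketch of that check (plain Python, independent of the Android code; the function name is mine):

```python
def pcm_duration_seconds(n_bytes, sample_rate=44100, channels=1, bytes_per_sample=2):
    """Duration of raw s16le PCM data: bytes / (rate * channels * bytes per sample)."""
    return n_bytes / (sample_rate * channels * bytes_per_sample)

# One second of 16-bit mono audio at 44.1 kHz is 88200 bytes:
print(pcm_duration_seconds(88200))  # 1.0
```

Compare this computed duration against how long you actually recorded; if ffmpeg's reported `time=` is much shorter, data is being lost before it reaches stdin.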
This is the output of the ffmpeg process :
[s16le @ 0x23634d0] Estimating duration from bitrate, this may be inaccurate
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, s16le, from 'pipe:':
Duration: N/A, start: 0.000000, bitrate: 705 kb/s
Stream #0:0: Audio: pcm_s16le, 44100 Hz, mono, s16, 705 kb/s
[aformat @ 0x2363100] auto-inserting filter 'auto-inserted resampler 0' between the filter 'src' and the filter 'aformat'
[aresample @ 0x235b0a0] chl:mono fmt:s16 r:44100Hz -> chl:mono fmt:flt r:44100Hz
Output #0, 3gp, to '/data/data/com.test.audio/files/tmp.3gp':
Metadata:
encoder : Lavf54.6.100
Stream #0:0: Audio: aac (mp4a / 0x6134706D), 44100 Hz, mono, flt, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> aac)
size= 3kB time=00:00:00.18 bitrate= 132.5kbits/s
size= 8kB time=00:00:00.55 bitrate= 120.9kbits/s
size= 12kB time=00:00:00.83 bitrate= 121.8kbits/s
size= 16kB time=00:00:01.04 bitrate= 122.8kbits/s
size= 20kB time=00:00:01.32 bitrate= 122.5kbits/s
size= 23kB time=00:00:01.53 bitrate= 121.6kbits/s
size= 27kB time=00:00:01.81 bitrate= 121.0kbits/s
size= 31kB time=00:00:02.11 bitrate= 120.7kbits/s
size= 35kB time=00:00:02.32 bitrate= 123.4kbits/s
video:0kB audio:34kB global headers:0kB muxing overhead 3.031610%
-
How to set pts, dts and duration in ffmpeg library ?
24 March, by hslee — I want to pack some compressed video packets (H.264) into an ".mp4" container.
In a word: muxing. No decoding and no encoding.
I have no idea how to set pts, dts and duration.



1. I get the packets with the "pcap" library.
2. I remove the headers that precede the compressed video data (e.g. Ethernet, VLAN).
3. I collect the data for one frame and decode it to obtain information such as width and height. (I am not sure this step is necessary.)
4. I initialize the output context, stream and codec context.
5. I start receiving packets with the "pcap" library again (this time for muxing).
6. I assemble one frame and put the data into an AVPacket structure.
7. I try to set PTS, DTS and duration. (I think this is the wrong part, though I am not sure.)
   7-1. For the first frame, I save the time (in msec) from the packet header structure.
   7-2. Whenever I assemble a frame, I set the parameters like this: PTS = current time - start time, DTS = same value as PTS, duration = current PTS - previous PTS.
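The millisecond-to-time-base conversion in step 7-2 can be checked in isolation. Below is a sketch of the rounding arithmetic that av_rescale_rnd(a, b, c, AV_ROUND_NEAR_INF) performs (a * b / c, rounded to nearest), in Python for a 90 kHz MPEG time base; it assumes nonnegative inputs (FFmpeg's version also handles negatives and 64-bit overflow). Note that going from milliseconds (1/1000) to a 1/90000 time base means multiplying by 90000 and dividing by 1000, which is worth double-checking against the argument order in the av_rescale_rnd call in the code above.

```python
def rescale_nearest(a, b, c):
    # a * b / c rounded to nearest, like av_rescale_rnd with AV_ROUND_NEAR_INF
    # (sketch for a >= 0 only)
    return (a * b + c // 2) // c

# A capture timestamp 1234 ms after the first frame, rescaled from
# milliseconds into a 90000 Hz stream time base:
pts = rescale_nearest(1234, 90000, 1000)
print(pts)  # 111060
```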



I think there is an error here because:

- I don't know how far DTS should be from PTS.
- Duration, as I understand it, means how long this frame is shown before the next one, so it should be (next PTS - current PTS); but I cannot know the next PTS at the time I write the current packet.

The stream contains I-frames only.

// make input context for decoding
AVFormatContext *&ic = gInputContext;
ic = avformat_alloc_context();
AVCodec *cd = avcodec_find_decoder(AV_CODEC_ID_H264);
AVStream *st = avformat_new_stream(ic, cd);
AVCodecContext *cc = st->codec;
avcodec_open2(cc, cd, NULL);

// make a packet and decode it once the collected packets form one frame
gPacket.stream_index = 0;
gPacket.size = gPacketLength[0];
gPacket.data = gPacketData[0];
gPacket.pts = AV_NOPTS_VALUE;
gPacket.dts = AV_NOPTS_VALUE;
gPacket.flags = AV_PKT_FLAG_KEY;
avcodec_decode_video2(cc, gFrame, &got_picture, &gPacket);

// I checked that the codec context is initialized automatically by "avcodec_decode_video2";
// fill in some fields that I know are not initialized
cc->time_base.den = 90000;
cc->time_base.num = 1;
cc->bit_rate = 2500000;
cc->gop_size = 1;

// make the output context from the input context
AVFormatContext *&oc = gOutputContext;
avformat_alloc_output_context2(&oc, NULL, NULL, filename);
AVStream *ist = ic->streams[0];
AVCodecContext *&icc = ist->codec;
AVStream *ost = avformat_new_stream(oc, icc->codec);
AVCodecContext *occ = ost->codec;
avcodec_copy_context(occ, icc);
occ->flags |= CODEC_FLAG_GLOBAL_HEADER;
avio_open(&(oc->pb), filename, AVIO_FLAG_WRITE);

// repeated part for muxing
AVRational Millisecond = { 1, 1000 };
gPacket.stream_index = 0;
gPacket.data = gPacketData[0];
gPacket.size = gPacketLength[0];
gPacket.pts = av_rescale_rnd(pkthdr->ts.tv_sec * 1000
                             + pkthdr->ts.tv_usec / 1000
                             - gStartTime,
                             Millisecond.den, ost->time_base.den,
                             (AVRounding)(AV_ROUND_NEAR_INF | AV_ROUND_PASS_MINMAX));
gPacket.dts = gPacket.pts;
gPacket.duration = gPacket.pts - gPrev;
gPacket.flags = AV_PKT_FLAG_KEY;
gPrev = gPacket.pts;
av_interleaved_write_frame(gOutputContext, &gPacket);




The expected result is a .mp4 video file that can be played; the actual output does not play correctly.
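Regarding the second doubt — duration = (next PTS - current PTS) needing a PTS that is not known yet — a common pattern is to hold each packet back until its successor arrives, set the held packet's duration from the newcomer's PTS, and only then write it; the final packet gets a fallback duration such as a typical frame interval. A minimal sketch of that one-packet-delay logic (plain Python, names are mine):

```python
def assign_durations(pts_list, last_duration=0):
    """Pair each PTS with duration = next_pts - pts; the final packet
    gets last_duration, since it has no successor."""
    out = []
    prev = None
    for pts in pts_list:
        if prev is not None:
            out.append((prev, pts - prev))  # write the held-back packet
        prev = pts  # hold the current packet until its successor arrives
    if prev is not None:
        out.append((prev, last_duration))
    return out

# Three frames 3000 ticks apart in a 90 kHz time base:
print(assign_durations([0, 3000, 6000], last_duration=3000))
# [(0, 3000), (3000, 3000), (6000, 3000)]
```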


-
Batch splitting large audio files into small fixed-length audio files in moments of silence
26 July 2023, by Haldjärvi — To train the SO-VITS-SVC neural network, we need 10-14 second voice files. As material, let's say I use phrases from some game. I have already made a batch script for decoding different files into one working format, another for removing silence, and another for combining small audio files into files of 13-14 seconds (I used Python, pydub and FFmpeg). To finish automatically creating a training dataset, only one batch script remains: cutting audio files longer than 14 seconds into separate files of 10-14 seconds, preferably cutting at moments of silence or close to silence.


So, it is necessary to batch-cut large audio files (20 seconds, 70 seconds, possibly several hundred seconds) into segments of approximately 10-14 seconds. The main task is to find the quietest place within each cut area, so as not to cut a phrase in the middle of a word (which is bad for model training). Can this be done efficiently, so that processing a 30-second file does not take 15 seconds? Silence detection is only needed around each cut, i.e. in the 10-14 second window counted from the start of the current segment.


I would be very grateful for any help.


I tried to write a script together with ChatGPT, but all variants gave completely unpredictable results and were not even close to what I needed, so I had to settle for cutting files at exactly 14000 milliseconds. However, I hope there is a way to cut exactly in quiet areas.


import os
from pydub import AudioSegment

input_directory = ".../RemSilence/"
output_directory = ".../Split/"
max_duration = 14000

def split_audio_by_duration(input_file, duration):
    audio = AudioSegment.from_file(input_file)
    segments = []
    for i in range(0, len(audio), duration):
        segment = audio[i:i + duration]
        segments.append(segment)
    return segments

if __name__ == "__main__":
    os.makedirs(output_directory, exist_ok=True)
    audio_files = [os.path.join(input_directory, file) for file in os.listdir(input_directory) if file.endswith(".wav")]
    audio_files.sort(key=lambda file: len(AudioSegment.from_file(file)))
    for file in audio_files:
        audio = AudioSegment.from_file(file)
        if len(audio) > max_duration:
            segments = split_audio_by_duration(file, max_duration)
            for i, segment in enumerate(segments):
                output_filename = f"output_{len(os.listdir(output_directory))+1}.wav"
                output_file_path = os.path.join(output_directory, output_filename)
                segment.export(output_file_path, format="wav")
        else:
            output_filename = f"output_{len(os.listdir(output_directory))+1}.wav"
            output_file_path = os.path.join(output_directory, output_filename)
            audio.export(output_file_path, format="wav")
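For the "cut at the quietest spot" requirement, the search itself is cheap when restricted to the 10-14 second window described above. A sketch of that window search, operating on per-millisecond loudness values (how you obtain them — for example, the dBFS of 1 ms pydub slices — is up to you; the function name is mine and no pydub is used here):

```python
def quietest_cut_ms(loudness, min_ms=10000, max_ms=14000):
    """Index of the quietest millisecond in [min_ms, max_ms);
    loudness[i] is the loudness of millisecond i (lower = quieter).
    Falls back to the clip end when it is shorter than the window."""
    window = loudness[min_ms:max_ms]
    if not window:
        return min(len(loudness), max_ms)
    return min_ms + window.index(min(window))

# A 20-second clip that is uniformly loud except for a dip at 12.345 s:
levels = [-20.0] * 20000
levels[12345] = -60.0
print(quietest_cut_ms(levels))  # 12345
```

Cutting at the returned index, then repeating the search on the remainder, keeps every segment in the 10-14 second range while scanning only 4 seconds of audio per cut.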