Other articles (99)

  • Apache-specific configuration

    4 February 2011, by

    Specific modules
    When configuring Apache, it is advisable to enable certain modules that are not specific to MediaSPIP but help improve performance: mod_deflate and mod_headers, to have Apache compress pages automatically (see this tutorial); mod_expires, to handle cache expiry correctly (see this tutorial). A rough illustration of such directives follows this item.
    It is also advisable to add Apache support for the MIME type of WebM files, as described in this tutorial.
    Creating a (...)
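
    As a rough illustration of the kind of directives those tutorials cover (generic Apache examples, not MediaSPIP's exact recommended settings):

    # Compress text responses on the fly (mod_deflate)
    <IfModule mod_deflate.c>
        AddOutputFilterByType DEFLATE text/html text/css application/javascript
    </IfModule>

    # Let browsers cache static assets (mod_expires)
    <IfModule mod_expires.c>
        ExpiresActive On
        ExpiresByType image/png "access plus 1 month"
    </IfModule>

    # Serve WebM files with the correct MIME type
    AddType video/webm .webm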

  • Installation in farm mode

    4 February 2011, by

    Farm mode makes it possible to host several MediaSPIP sites while installing their functional core only once.
    This is the method we use on this very platform.
    Using farm mode requires some familiarity with how SPIP works, unlike the standalone version, which does not really require specific knowledge since SPIP's usual private area is no longer used.
    First of all, you must have installed the same files as for the installation (...)

  • Configurable image and logo sizes

    9 February 2011, by

    In many places on the site, logos and images are resized to fit the slots defined by the themes. Since all of these sizes can change from one theme to another, they can be defined directly in the theme, sparing the user from having to configure them manually after changing the site's appearance.
    These image sizes are also available in the MediaSPIP Core specific configuration. The maximum size of the site logo in pixels (...)

On other sites (5237)

  • Android + ffmpeg + AudioTrack produces bad audio output

    12 September 2014, by Goddchen

    Here is what I am trying to do: use an AudioRecord and "pipe" the output of AudioRecord.read(byte[], ...) to an ffmpeg process's stdin, which will convert it to a 3gp (AAC) file.

    The ffmpeg call is as follows:

    ProcessBuilder processBuilder = new ProcessBuilder(BINARY.getAbsolutePath(),
            "-y",
            "-ar", "44100", "-c:a", "pcm_s16le", "-ac", "1", "-f", "s16le",
            "-i", "-",
            "-strict", "-2", "-c:a", "aac",
            outFile.getAbsolutePath());

    The AudioRecord is set up as follows:

    AudioRecord record = new AudioRecord(/*AudioSource.VOICE_RECOGNITION,*/ AudioSource.MIC,
            SAMPLING_RATE,
            AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT,
            bufferSize);

    SAMPLING_RATE = 44100, and bufferSize is the value returned by AudioRecord.getMinBufferSize(...).

    I am writing the data to ffmpeg like this:

    try {
        IOUtils.write(data, getFFmpegHelper().getCurrentProcessOutputStream());
    } catch (Exception e) {
        Log.e(Application.LOG_TAG, "Error writing data to ffmpeg process", e);
        // TODO notify user, stop the recording, etc...
    }

    So far so good: ffmpeg runs and creates a proper 3gp file. But the audio in the file is totally off. It sounds "choppy" (not sure if that is the correct English word ;)) and the pace is also wrong: it plays too fast. (A sketch of one possible cause follows the ffmpeg output below.)

    Check out this sample : http://goddchen.de/android/tmp/tmp.3gp

    This is the output of the ffmpeg process:

    [s16le @ 0x23634d0] Estimating duration from bitrate, this may be inaccurate
    Guessed Channel Layout for Input Stream #0.0 : mono
    Input #0, s16le, from 'pipe:':
      Duration: N/A, start: 0.000000, bitrate: 705 kb/s
        Stream #0:0: Audio: pcm_s16le, 44100 Hz, mono, s16, 705 kb/s
    [aformat @ 0x2363100] auto-inserting filter 'auto-inserted resampler 0' between the filter 'src' and the filter 'aformat'
    [aresample @ 0x235b0a0] chl:mono fmt:s16 r:44100Hz -> chl:mono fmt:flt r:44100Hz
    Output #0, 3gp, to '/data/data/com.test.audio/files/tmp.3gp':
      Metadata:
        encoder         : Lavf54.6.100
        Stream #0:0: Audio: aac (mp4a / 0x6134706D), 44100 Hz, mono, flt, 128 kb/s
    Stream mapping:
      Stream #0:0 -> #0:0 (pcm_s16le -> aac)
    size=       3kB time=00:00:00.18 bitrate= 132.5kbits/s
    size=       8kB time=00:00:00.55 bitrate= 120.9kbits/s
    size=      12kB time=00:00:00.83 bitrate= 121.8kbits/s
    size=      16kB time=00:00:01.04 bitrate= 122.8kbits/s
    size=      20kB time=00:00:01.32 bitrate= 122.5kbits/s
    size=      23kB time=00:00:01.53 bitrate= 121.6kbits/s
    size=      27kB time=00:00:01.81 bitrate= 121.0kbits/s
    size=      31kB time=00:00:02.11 bitrate= 120.7kbits/s
    size=      35kB time=00:00:02.32 bitrate= 123.4kbits/s
    video:0kB audio:34kB global headers:0kB muxing overhead 3.031610%
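
    A likely cause of this symptom: if the thread calling AudioRecord.read() also performs the blocking write into the ffmpeg pipe, the AudioRecord buffer can overflow and samples get dropped, which sounds choppy and plays too fast. A minimal sketch of a dedicated capture thread that also forwards only the bytes actually read (isRecording and the loop structure are assumptions; the helper names follow the question's code):

    Thread captureThread = new Thread(new Runnable() {
        @Override
        public void run() {
            byte[] data = new byte[bufferSize];
            record.startRecording();
            try {
                OutputStream ffmpegIn = getFFmpegHelper().getCurrentProcessOutputStream();
                while (isRecording) {
                    int bytesRead = record.read(data, 0, data.length);
                    if (bytesRead > 0) {
                        // forward exactly bytesRead bytes, never the whole buffer
                        ffmpegIn.write(data, 0, bytesRead);
                    }
                }
                ffmpegIn.close(); // EOF lets ffmpeg finalize the 3gp file
            } catch (IOException e) {
                Log.e(Application.LOG_TAG, "Error writing data to ffmpeg process", e);
            } finally {
                record.stop();
            }
        }
    });
    captureThread.start();
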
  • How to set pts, dts and duration in the ffmpeg library?

    24 March, by hslee

    I want to pack some compressed video packets (h.264) into an ".mp4" container.
    In one word: muxing. No decoding and no encoding.
    And I have no idea how to set pts, dts and duration.

    Here is what I do:

    1. I get the packets with the "pcap" library.
    2. I remove the headers that come before the compressed video data, e.g. Ethernet, VLAN.
    3. I collect data until it makes up one frame, and decode it to get information about the data, e.g. width and height. (I am not sure this is necessary.)
    4. I initialize the output context, the stream and the codec context.
    5. I start receiving packets with the "pcap" library again (now for muxing).
    6. I assemble one frame and put that data in an AVPacket structure.
    7. I try to set PTS, DTS and duration. (I think this is the wrong part, though I'm not sure.)

    7-1. At the first frame, I save the time (msec) from the packet header structure.

    7-2. Whenever I assemble one frame, I set the parameters like this: PTS = (current time - start time), DTS = the same value as PTS, duration = (current PTS - previous PTS).

    I think this has some errors, because:

    1. I don't know how far DTS can reasonably be from PTS.
    2. At least, I think duration means how long this frame is shown, from now until the next frame, so it should be (next PTS - current PTS); but I cannot know the next PTS at that point. (A sketch of one workaround follows the code below.)

    The stream contains I-frames only.

// make the input context for decoding
AVFormatContext *&ic = gInputContext;
ic = avformat_alloc_context();
AVCodec *cd = avcodec_find_decoder(AV_CODEC_ID_H264);
AVStream *st = avformat_new_stream(ic, cd);
AVCodecContext *cc = st->codec;
avcodec_open2(cc, cd, NULL);

// build a packet and decode it once the collected packets make up one frame
gPacket.stream_index = 0;
gPacket.size  = gPacketLength[0];
gPacket.data  = gPacketData[0];
gPacket.pts   = AV_NOPTS_VALUE;
gPacket.dts   = AV_NOPTS_VALUE;
gPacket.flags = AV_PKT_FLAG_KEY;
avcodec_decode_video2(cc, gFrame, &got_picture, &gPacket);

// I checked that the context is filled in automatically by "avcodec_decode_video2";
// set the fields I know are not initialized
cc->time_base.den = 90000;
cc->time_base.num = 1;
cc->bit_rate      = 2500000;
cc->gop_size      = 1;

// build the output context from the input context
AVFormatContext *&oc = gOutputContext;
avformat_alloc_output_context2(&oc, NULL, NULL, filename);
AVStream *ist = ic->streams[0];
AVCodecContext *&icc = ist->codec;
AVStream *ost = avformat_new_stream(oc, icc->codec);
AVCodecContext *occ = ost->codec;
avcodec_copy_context(occ, icc);
occ->flags |= CODEC_FLAG_GLOBAL_HEADER;
avio_open(&(oc->pb), filename, AVIO_FLAG_WRITE);

// repeated part for muxing
AVRational Millisecond = { 1, 1000 };
gPacket.stream_index = 0;
gPacket.data = gPacketData[0];
gPacket.size = gPacketLength[0];
gPacket.pts = av_rescale_rnd(pkthdr->ts.tv_sec * 1000
                             + pkthdr->ts.tv_usec / 1000
                             - gStartTime,
                             Millisecond.den, ost->time_base.den,
                             (AVRounding)(AV_ROUND_NEAR_INF | AV_ROUND_PASS_MINMAX));
gPacket.dts = gPacket.pts;
gPacket.duration = gPacket.pts - gPrev;
gPacket.flags = AV_PKT_FLAG_KEY;
gPrev = gPacket.pts;
av_interleaved_write_frame(gOutputContext, &gPacket);
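
    Note that av_rescale_rnd(a, b, c, rnd) computes a * b / c, so the call above multiplies the millisecond value by 1000 and divides by the time-base denominator, which scales in the wrong direction for a 1/90000 time base. A minimal sketch of the same timestamp math using av_rescale_q, which takes the source and destination time bases directly (MICROSECONDS, gStartTimeUs and gPrevPts are hypothetical names replacing the question's Millisecond, gStartTime and gPrev):

    // convert the pcap capture timestamp into the output stream's time base
    static const AVRational MICROSECONDS = { 1, 1000000 };

    int64_t now_us = (int64_t)pkthdr->ts.tv_sec * 1000000 + pkthdr->ts.tv_usec;
    int64_t pts    = av_rescale_q(now_us - gStartTimeUs, MICROSECONDS, ost->time_base);

    gPacket.pts      = pts;
    gPacket.dts      = pts;            // I-frame-only stream: dts == pts is valid
    gPacket.duration = pts - gPrevPts; // previous gap approximates the duration
    gPrevPts         = pts;

    For an I-frame-only stream there is no frame reordering, so dts equal to pts answers point 1 above.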

    The expected result is a .mp4 video file that can be played; the actual file does not play properly.
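
    Regarding point 2 above (duration needs the next PTS): one common workaround is to hold each packet back until its successor arrives, so the exact difference is known before writing. A rough sketch under that assumption, with mux_packet as a hypothetical helper:

    // delay every packet by one, so duration = next_pts - current_pts
    // is exact by the time av_interleaved_write_frame() is called
    static AVPacket gPending;      // zero-initialized at program start
    static int gHavePending = 0;

    static void mux_packet(AVFormatContext *oc, AVPacket *pkt)
    {
        if (gHavePending) {
            gPending.duration = pkt->pts - gPending.pts;
            av_interleaved_write_frame(oc, &gPending);
            av_packet_unref(&gPending);
        }
        av_packet_ref(&gPending, pkt); // keep the packet until its successor arrives
        gHavePending = 1;
    }

    Before av_write_trailer(), the last pending packet still has to be flushed, reusing its previous duration as an estimate.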

    


  • Batch splitting large audio files into small fixed-length audio files at moments of silence

    26 July 2023, by Haldjärvi

    To train the SO-VITS-SVC neural network, we need voice files of 10-14 seconds. As source material, let's say I use phrases from some game. I have already made a batch script for decoding different files into one working format, another batch script for removing silence, and a batch script for combining small audio files into files of 13-14 seconds (I used Python, pydub and FFmpeg). To automate the creation of a training dataset, only one batch script remains to be made: cutting audio files longer than 14 seconds into separate files of 10-14 seconds, preferably cutting at moments of silence or near-silence.

    So, large audio files (20 seconds, 70 seconds, possibly several hundred seconds) need to be batch-cut into segments of approximately 10-14 seconds; the main task, however, is to find the quietest spot within the cut area so that phrases are not cut in the middle of a word (that is not good for model training). Is it possible to do this efficiently, so that processing a 30-second file does not take 15 seconds but is fast? Silence detection is only needed in the cut area, that is, around 10-14 seconds counted from the very beginning of the file.

    I would be very grateful for any help.

    


    I tried to write a script together with ChatGPT, but every variant gave completely unpredictable results and was not even close to what I needed... I had to settle for the variant that simply cuts files at exactly 14000 milliseconds. However, I hope there is a way to make a variant that cuts exactly in quiet areas. (A sketch of one possible approach follows the script below.)

import os
from pydub import AudioSegment

input_directory = ".../RemSilence/"
output_directory = ".../Split/"
max_duration = 14000  # maximum segment length in milliseconds

def split_audio_by_duration(input_file, duration):
    """Cut the file into back-to-back segments of exactly `duration` ms."""
    audio = AudioSegment.from_file(input_file)
    segments = []
    for i in range(0, len(audio), duration):
        segments.append(audio[i:i + duration])
    return segments

if __name__ == "__main__":
    os.makedirs(output_directory, exist_ok=True)
    audio_files = [os.path.join(input_directory, file)
                   for file in os.listdir(input_directory) if file.endswith(".wav")]
    # process the shortest files first
    audio_files.sort(key=lambda file: len(AudioSegment.from_file(file)))
    for file in audio_files:
        audio = AudioSegment.from_file(file)
        if len(audio) > max_duration:
            # too long: hard-cut into 14-second pieces
            for segment in split_audio_by_duration(file, max_duration):
                output_filename = f"output_{len(os.listdir(output_directory))+1}.wav"
                segment.export(os.path.join(output_directory, output_filename), format="wav")
        else:
            # short enough: export unchanged
            output_filename = f"output_{len(os.listdir(output_directory))+1}.wav"
            audio.export(os.path.join(output_directory, output_filename), format="wav")
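
    For the quiet-spot search itself, pydub's silence helpers can be restricted to the 10-14 second window of each prospective cut, which keeps the scan fast because only that window is analyzed. A rough sketch of this idea, not a drop-in replacement; the min_silence_len, silence_thresh and seek_step values are assumptions to tune:

    from pydub import AudioSegment
    from pydub.silence import detect_silence

    MIN_MS = 10000  # earliest allowed cut, in milliseconds
    MAX_MS = 14000  # latest allowed cut

    def find_cut_point(audio, start):
        """Search for silence only inside the allowed 10-14 s window."""
        window = audio[start + MIN_MS:start + MAX_MS]
        quiet = detect_silence(window, min_silence_len=100,
                               silence_thresh=audio.dBFS - 16, seek_step=10)
        if quiet:
            s, e = quiet[0]
            return start + MIN_MS + (s + e) // 2  # cut in the middle of the silence
        return start + MAX_MS                     # fallback: hard cut at 14 s

    def split_at_quiet_spots(audio):
        segments, pos = [], 0
        while len(audio) - pos > MAX_MS:
            cut = find_cut_point(audio, pos)
            segments.append(audio[pos:cut])
            pos = cut
        segments.append(audio[pos:])  # the remainder may be shorter than 10 s
        return segments

    The final segment can come out shorter than 10 seconds; merging it with its neighbor is left to the combining script the question already mentions.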