Recherche avancée

Médias (1)

Mot : - Tags -/net art

Autres articles (49)

  • Use, discuss, criticize

    13 avril 2011, par

    Talk to people directly involved in MediaSPIP’s development, or to people around you who could use MediaSPIP to share, enhance or develop their creative projects.
    The bigger the community, the more MediaSPIP’s potential will be explored and the faster the software will evolve.
    A discussion list is available for all exchanges between users.

  • Des sites réalisés avec MediaSPIP

    2 mai 2011, par

    Cette page présente quelques-uns des sites fonctionnant sous MediaSPIP.
    Vous pouvez bien entendu ajouter le votre grâce au formulaire en bas de page.

  • MediaSPIP Player : problèmes potentiels

    22 février 2011, par

    Le lecteur ne fonctionne pas sur Internet Explorer
    Sur Internet Explorer (8 et 7 au moins), le plugin utilise le lecteur Flash flowplayer pour lire vidéos et son. Si le lecteur ne semble pas fonctionner, cela peut venir de la configuration du mod_deflate d’Apache.
    Si dans la configuration de ce module Apache vous avez une ligne qui ressemble à la suivante, essayez de la supprimer ou de la commenter pour voir si le lecteur fonctionne correctement : /** * GeSHi (C) 2004 - 2007 Nigel McNie, (...)

Sur d’autres sites (9515)

  • ffmpeg to count words in audio text

    17 juillet 2020, par Joel Parker

    I am new to signal processing but wanted to take an audio file and determine how many words are spoken in one minute. I was thinking I could use the top of the loudness peaks to count the words but do not quite understand how to achieve this.

    


    First I used ffmpeg to remove the audio from the mp4 file I am using :

    


    ffmpeg -i courtcase.mp4 audiofile.mp4

    


    Then I tried to detect the loudness :

    


    ffmpeg -t 10 -i audiofile.mp4 -af "volumedetect" -f null /dev/null

    


    This produced some statistical information :

    


    video:157kB audio:1723kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] n_samples: 882000
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] mean_volume: -20.6 dB
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] max_volume: -4.0 dB
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] histogram_4db: 64
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] histogram_5db: 88
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] histogram_6db: 220
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] histogram_7db: 843



    


    I am not sure why it still shows 157kB of video, maybe my first command is wrong ?

    


    Anyway, assuming the file is just audio I found this command, which I believe shows dbm slices for 10 seconds :

    


    ffmpeg -i audiofile.mp4 -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=- -f null -


    


    and it produced a bunch of output :

    


    video:5782kB audio:63504kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_astats_0 @ 0x7ff74c004bc0] Channel: 1
[Parsed_astats_0 @ 0x7ff74c004bc0] DC offset: 0.000240
[Parsed_astats_0 @ 0x7ff74c004bc0] Min level: -0.166239
[Parsed_astats_0 @ 0x7ff74c004bc0] Max level: 0.127112
[Parsed_astats_0 @ 0x7ff74c004bc0] Min difference: 0.000003
[Parsed_astats_0 @ 0x7ff74c004bc0] Max difference: 0.025335
[Parsed_astats_0 @ 0x7ff74c004bc0] Mean difference: 0.004455
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS difference: 0.006165
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak level dB: -15.585332
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS level dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS peak dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS trough dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] Crest factor: 3.414311
[Parsed_astats_0 @ 0x7ff74c004bc0] Flat factor: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak count: 2
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor dB: nan
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor count: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Bit depth: 32/32
[Parsed_astats_0 @ 0x7ff74c004bc0] Dynamic range: 72.297593
[Parsed_astats_0 @ 0x7ff74c004bc0] Zero crossings: 74
[Parsed_astats_0 @ 0x7ff74c004bc0] Zero crossings rate: 0.072266
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of NaNs: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of Infs: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of denormals: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Channel: 2
[Parsed_astats_0 @ 0x7ff74c004bc0] DC offset: 0.000240
[Parsed_astats_0 @ 0x7ff74c004bc0] Min level: -0.166239
[Parsed_astats_0 @ 0x7ff74c004bc0] Max level: 0.127112
[Parsed_astats_0 @ 0x7ff74c004bc0] Min difference: 0.000003
[Parsed_astats_0 @ 0x7ff74c004bc0] Max difference: 0.025335
[Parsed_astats_0 @ 0x7ff74c004bc0] Mean difference: 0.004455
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS difference: 0.006165
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak level dB: -15.585332
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS level dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS peak dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS trough dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] Crest factor: 3.414311
[Parsed_astats_0 @ 0x7ff74c004bc0] Flat factor: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak count: 2
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor dB: nan
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor count: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Bit depth: 32/32
[Parsed_astats_0 @ 0x7ff74c004bc0] Dynamic range: 72.297593
[Parsed_astats_0 @ 0x7ff74c004bc0] Zero crossings: 74
[Parsed_astats_0 @ 0x7ff74c004bc0] Zero crossings rate: 0.072266
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of NaNs: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of Infs: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of denormals: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Overall
[Parsed_astats_0 @ 0x7ff74c004bc0] DC offset: 0.000240
[Parsed_astats_0 @ 0x7ff74c004bc0] Min level: -0.166239
[Parsed_astats_0 @ 0x7ff74c004bc0] Max level: 0.127112
[Parsed_astats_0 @ 0x7ff74c004bc0] Min difference: 0.000003
[Parsed_astats_0 @ 0x7ff74c004bc0] Max difference: 0.025335
[Parsed_astats_0 @ 0x7ff74c004bc0] Mean difference: 0.004455
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS difference: 0.006165
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak level dB: -15.585332
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS level dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS peak dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS trough dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] Flat factor: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak count: 2.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor dB: nan
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor count: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Bit depth: 32/32
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of samples: 1024
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of NaNs: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of Infs: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of denormals: 0.000000
ts_time:368.268
lavfi.astats.Overall.RMS_level=-29.670653
frame:15861 pts:16241664 pts_time:368.292
lavfi.astats.Overall.RMS_level=-30.851195
frame:15862 pts:16242688 pts_time:368.315
lavfi.astats.Overall.RMS_level=-30.700943
frame:15863 pts:16243712 pts_time:368.338
lavfi.astats.Overall.RMS_level=-33.638604
frame:15864 pts:16244736 pts_time:368.361
lavfi.astats.Overall.RMS_level=-21.873170
frame:15865 pts:16245760 pts_time:368.385
lavfi.astats.Overall.RMS_level=-20.001936
frame:15866 pts:16246784 pts_time:368.408
lavfi.astats.Overall.RMS_level=-18.571318
frame:15867 pts:16247808 pts_time:368.431
lavfi.astats.Overall.RMS_level=-18.470749
frame:15868 pts:16248832 pts_time:368.454
lavfi.astats.Overall.RMS_level=-19.506688
frame:15869 pts:16249856 pts_time:368.477
lavfi.astats.Overall.RMS_level=-21.270579
frame:15870 pts:16250880 pts_time:368.501
lavfi.astats.Overall.RMS_level=-25.007862
frame:15871 pts:16251904 pts_time:368.524
lavfi.astats.Overall.RMS_level=-25.654372
frame:15872 pts:16252928 pts_time:368.547
lavfi.astats.Overall.RMS_level=-24.948357
frame:15873 pts:16253952 pts_time:368.57
lavfi.astats.Overall.RMS_level=-30.523540
frame:15874 pts:16254976 pts_time:368.594
....


    


    This is where I'm stuck. I think I have the information I need to determine the number of words spoken in a minute, except I don't know how to put all together. Also the last command just measures 10s slices, would I need to change that to 60s ? Does anyone know how to do this or if there is a better approach ?

    


  • Get waveform data from audio file using FFMPEG

    16 septembre 2020, par Lutando

    I am writing an application that needs to get the raw waveform data of an audio file so I can render it in an application (C#/.NET). I am using ffmpeg to offload this task but it looks like ffmpeg can only output the waveform data as a png or as a stream to gnuplot.

    



    I have looked at other libraries to do this (NAudio/CSCore) however they require windows/microsoft media foundation and since this app is going to be deployed to azure as a web app I can not use them.

    



    My strategy was to just read the waveform data from the png itself but this seems hacky and over the top. The ideal output would be a fix sampled series of peaks in an array where each value in the array is the peak value (ranging from 1-100 or something, like this for example).

    


  • FFMPEG loudnorm ruins audio dynamics

    5 décembre 2020, par storror

    I've been trying to write a small python script to batch normalize
audio streams by LUFS. I'm using the two pass version of FFMPEGs
loudnorm filter, however no matter how I set up the parameters for
it, in the end result there's a drastic difference in loudness
between the beginning and the end of the audio stream. In linear
mode there is no problem, but i think the dynamic mode should be the
core of the whole filter.
    
My parameters :
    
IL_TARGET = -18dB
    
LRA_TARGET = 8dB
    
TP_TARGET = -2dB

    


    Input Integrated : -35.7 LUFS
    
Input True Peak : -12.9 dBTP
    
Input LRA : 4.5 LU
    
Input Threshold : -46.2 LUFS
    
Output Integrated : -18.0 LUFS
    
Output True Peak : -2.0 dBTP
    
Output LRA : 3.8 LU
    
Output Threshold : -28.6 LUFS

    


    Normalization Type : Dynamic
    
snip of the end result

    


    What causes this problem ? Is there anyway to solve it ?