Advanced search

Media (29)

Keyword: - Tags -/Musique

#7 Ambience

16 October 2011, by kent1

Updated: June 2015

Language: English

Type: Audio

Tags: creative commons, Musique, mp3, Elephant dreams, soundtrack

#6 Teaser Music

16 October 2011, by kent1

Updated: February 2013

Language: English

Type: Audio

Tags: creative commons, Musique, mp3, Elephant dreams, soundtrack

#5 End Title

16 October 2011, by kent1

Updated: February 2013

Language: English

Type: Audio

Tags: creative commons, Musique, mp3, Elephant dreams, soundtrack

#3 The Safest Place

16 October 2011, by kent1

Updated: February 2013

Language: English

Type: Audio

Tags: creative commons, Musique, mp3, Elephant dreams, soundtrack

#4 Emo Creates

15 October 2011, by kent1

Updated: February 2013

Language: English

Type: Audio

Tags: creative commons, Musique, mp3, Elephant dreams, soundtrack

#2 Typewriter Dance

15 October 2011, by kent1

Updated: February 2013

Language: English

Type: Audio

Tags: creative commons, Musique, mp3, Elephant dreams, soundtrack

Other articles (77)

Permissions overridden by plugins

27 April 2010, by kent1

MediaSPIP core
autoriser_auteur_modifier(), so that visitors can edit their own information on the authors page
Publishing on MediaSPIP

13 June 2013

Can I post content from an iPad tablet?
Yes, if your installed MediaSPIP is version 0.2 or later. If needed, contact the administrator of your MediaSPIP to find out.
List of compatible distributions

26 April 2011, by kent1

The table below lists the Linux distributions compatible with the automated installation script of MediaSPIP.

| Distribution name | Version name | Version number |
|---|---|---|
| Debian | Squeeze | 6.x.x |
| Debian | Wheezy | 7.x.x |
| Debian | Jessie | 8.x.x |
| Ubuntu | The Precise Pangolin | 12.04 LTS |
| Ubuntu | The Trusty Tahr | 14.04 |

If you want to help us improve this list, you can provide us access to a machine whose distribution is not mentioned above or send the necessary fixes to add (...)


On other sites (7742)

ffmpeg to count words in audio text

17 July 2020, by Joel Parker

I am new to signal processing but wanted to take an audio file and determine how many words are spoken in one minute. I was thinking I could use the top of the loudness peaks to count the words but do not quite understand how to achieve this.

First I used ffmpeg to extract the audio from the mp4 file I am using:

ffmpeg -i courtcase.mp4 audiofile.mp4

Then I tried to detect the loudness:

ffmpeg -t 10 -i audiofile.mp4 -af "volumedetect" -f null /dev/null

This produced some statistical information:

video:157kB audio:1723kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] n_samples: 882000
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] mean_volume: -20.6 dB
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] max_volume: -4.0 dB
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] histogram_4db: 64
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] histogram_5db: 88
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] histogram_6db: 220
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] histogram_7db: 843
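
If these numbers are needed in a script rather than read by eye, the log can be parsed. The following is a minimal sketch, assuming the ffmpeg log (which ffmpeg writes to stderr) has already been captured as text; the function name `parse_volumedetect` is made up for this example.

```python
import re

# Matches log lines such as
# "[Parsed_volumedetect_0 @ 0x...] mean_volume: -20.6 dB".
_VOLUME_RE = re.compile(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?) dB")


def parse_volumedetect(log_text: str) -> dict[str, float]:
    """Return the mean_volume and max_volume values (in dB) found in log_text.

    Hypothetical helper: the name and return shape are choices made for
    this sketch, not part of ffmpeg.
    """
    return {name: float(value) for name, value in _VOLUME_RE.findall(log_text)}


# Sample taken from the output quoted above.
sample = """\
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] n_samples: 882000
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] mean_volume: -20.6 dB
[Parsed_volumedetect_0 @ 0x7fa6b26068c0] max_volume: -4.0 dB
"""
volumes = parse_volumedetect(sample)
print(volumes)
```

Note that these are single values for the whole analysed stretch (the first 10 seconds, because of `-t 10`), so on their own they say nothing about where individual words start or end.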

I am not sure why it still shows 157kB of video; maybe my first command is wrong?
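
One likely explanation is that the first command copies every stream into the new file, video included, because nothing tells ffmpeg to drop it. ffmpeg's `-vn` option disables video in the output. A minimal sketch of building such a command from Python follows; the helper name `extract_audio_command` is hypothetical, and the file names are the ones used in the question.

```python
def extract_audio_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg argument list that writes only the audio of src to dst.

    -vn is ffmpeg's "no video" output option; everything else about the
    encoding is left to ffmpeg's defaults in this sketch.
    """
    return ["ffmpeg", "-i", src, "-vn", dst]


cmd = extract_audio_command("courtcase.mp4", "audiofile.mp4")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to actually run it. After that, the summary line of later commands should no longer report any video bytes.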

Anyway, assuming the file is just audio, I found this command, which I believe shows dB slices for 10 seconds:

ffmpeg -i audiofile.mp4 -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=- -f null -

and it produced a bunch of output:

video:5782kB audio:63504kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_astats_0 @ 0x7ff74c004bc0] Channel: 1
[Parsed_astats_0 @ 0x7ff74c004bc0] DC offset: 0.000240
[Parsed_astats_0 @ 0x7ff74c004bc0] Min level: -0.166239
[Parsed_astats_0 @ 0x7ff74c004bc0] Max level: 0.127112
[Parsed_astats_0 @ 0x7ff74c004bc0] Min difference: 0.000003
[Parsed_astats_0 @ 0x7ff74c004bc0] Max difference: 0.025335
[Parsed_astats_0 @ 0x7ff74c004bc0] Mean difference: 0.004455
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS difference: 0.006165
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak level dB: -15.585332
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS level dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS peak dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS trough dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] Crest factor: 3.414311
[Parsed_astats_0 @ 0x7ff74c004bc0] Flat factor: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak count: 2
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor dB: nan
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor count: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Bit depth: 32/32
[Parsed_astats_0 @ 0x7ff74c004bc0] Dynamic range: 72.297593
[Parsed_astats_0 @ 0x7ff74c004bc0] Zero crossings: 74
[Parsed_astats_0 @ 0x7ff74c004bc0] Zero crossings rate: 0.072266
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of NaNs: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of Infs: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of denormals: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Channel: 2
[Parsed_astats_0 @ 0x7ff74c004bc0] DC offset: 0.000240
[Parsed_astats_0 @ 0x7ff74c004bc0] Min level: -0.166239
[Parsed_astats_0 @ 0x7ff74c004bc0] Max level: 0.127112
[Parsed_astats_0 @ 0x7ff74c004bc0] Min difference: 0.000003
[Parsed_astats_0 @ 0x7ff74c004bc0] Max difference: 0.025335
[Parsed_astats_0 @ 0x7ff74c004bc0] Mean difference: 0.004455
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS difference: 0.006165
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak level dB: -15.585332
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS level dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS peak dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS trough dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] Crest factor: 3.414311
[Parsed_astats_0 @ 0x7ff74c004bc0] Flat factor: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak count: 2
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor dB: nan
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor count: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Bit depth: 32/32
[Parsed_astats_0 @ 0x7ff74c004bc0] Dynamic range: 72.297593
[Parsed_astats_0 @ 0x7ff74c004bc0] Zero crossings: 74
[Parsed_astats_0 @ 0x7ff74c004bc0] Zero crossings rate: 0.072266
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of NaNs: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of Infs: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of denormals: 0
[Parsed_astats_0 @ 0x7ff74c004bc0] Overall
[Parsed_astats_0 @ 0x7ff74c004bc0] DC offset: 0.000240
[Parsed_astats_0 @ 0x7ff74c004bc0] Min level: -0.166239
[Parsed_astats_0 @ 0x7ff74c004bc0] Max level: 0.127112
[Parsed_astats_0 @ 0x7ff74c004bc0] Min difference: 0.000003
[Parsed_astats_0 @ 0x7ff74c004bc0] Max difference: 0.025335
[Parsed_astats_0 @ 0x7ff74c004bc0] Mean difference: 0.004455
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS difference: 0.006165
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak level dB: -15.585332
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS level dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS peak dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] RMS trough dB: -26.251394
[Parsed_astats_0 @ 0x7ff74c004bc0] Flat factor: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Peak count: 2.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor dB: nan
[Parsed_astats_0 @ 0x7ff74c004bc0] Noise floor count: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Bit depth: 32/32
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of samples: 1024
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of NaNs: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of Infs: 0.000000
[Parsed_astats_0 @ 0x7ff74c004bc0] Number of denormals: 0.000000
ts_time:368.268
lavfi.astats.Overall.RMS_level=-29.670653
frame:15861 pts:16241664 pts_time:368.292
lavfi.astats.Overall.RMS_level=-30.851195
frame:15862 pts:16242688 pts_time:368.315
lavfi.astats.Overall.RMS_level=-30.700943
frame:15863 pts:16243712 pts_time:368.338
lavfi.astats.Overall.RMS_level=-33.638604
frame:15864 pts:16244736 pts_time:368.361
lavfi.astats.Overall.RMS_level=-21.873170
frame:15865 pts:16245760 pts_time:368.385
lavfi.astats.Overall.RMS_level=-20.001936
frame:15866 pts:16246784 pts_time:368.408
lavfi.astats.Overall.RMS_level=-18.571318
frame:15867 pts:16247808 pts_time:368.431
lavfi.astats.Overall.RMS_level=-18.470749
frame:15868 pts:16248832 pts_time:368.454
lavfi.astats.Overall.RMS_level=-19.506688
frame:15869 pts:16249856 pts_time:368.477
lavfi.astats.Overall.RMS_level=-21.270579
frame:15870 pts:16250880 pts_time:368.501
lavfi.astats.Overall.RMS_level=-25.007862
frame:15871 pts:16251904 pts_time:368.524
lavfi.astats.Overall.RMS_level=-25.654372
frame:15872 pts:16252928 pts_time:368.547
lavfi.astats.Overall.RMS_level=-24.948357
frame:15873 pts:16253952 pts_time:368.57
lavfi.astats.Overall.RMS_level=-30.523540
frame:15874 pts:16254976 pts_time:368.594
....
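
In the `frame:` lines, `pts` advances by 1024 and `pts_time` by roughly 0.023 s from one entry to the next. That suggests each RMS value covers one 1024-sample frame, not a 10-second slice. To work with these values, the `frame:`/`RMS_level` pairs can be turned into a list of (time, level) tuples. The sketch below assumes the text has the two-line layout shown above; `parse_rms_levels` is a name invented for this example.

```python
import re

_TIME_RE = re.compile(r"pts_time:(\d+(?:\.\d+)?)")
# Also accept "-inf", in case silent frames are reported that way
# (an assumption, not something shown in the output above).
_LEVEL_RE = re.compile(
    r"lavfi\.astats\.Overall\.RMS_level=(-?(?:\d+(?:\.\d+)?|inf))"
)


def parse_rms_levels(text: str) -> list[tuple[float, float]]:
    """Pair each pts_time with the RMS_level printed on the line after it."""
    pairs = []
    current_time = None
    for line in text.splitlines():
        time_match = _TIME_RE.search(line)
        if time_match:
            current_time = float(time_match.group(1))
            continue
        level_match = _LEVEL_RE.search(line)
        if level_match and current_time is not None:
            pairs.append((current_time, float(level_match.group(1))))
            current_time = None
    return pairs


# Two frames copied from the output above.
sample = """\
frame:15861 pts:16241664 pts_time:368.292
lavfi.astats.Overall.RMS_level=-30.851195
frame:15862 pts:16242688 pts_time:368.315
lavfi.astats.Overall.RMS_level=-30.700943
"""
pairs = parse_rms_levels(sample)
print(pairs)
```

A level line with no preceding `frame:` line is skipped, because there is no timestamp to attach it to.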

This is where I'm stuck. I think I have the information I need to determine the number of words spoken in a minute, but I don't know how to put it all together. Also, the last command just measures 10 s slices; would I need to change that to 60 s? Does anyone know how to do this, or is there a better approach?
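
One way to put it together is to count how often the RMS level rises above a threshold, then scale that count to one minute. This is only a rough sketch: a burst of energy is closer to a syllable or a phrase than to a word, and the threshold of -25 dB is an arbitrary assumption, so a speech recognizer is likely to give far more reliable word counts. The function names are hypothetical.

```python
def count_bursts(levels, threshold_db=-25.0):
    """Count upward crossings of threshold_db in (time, level_db) pairs."""
    bursts = 0
    above = False
    for _, level_db in levels:
        is_above = level_db >= threshold_db
        if is_above and not above:
            bursts += 1  # the level just rose above the threshold
        above = is_above
    return bursts


def bursts_per_minute(levels, threshold_db=-25.0):
    """Scale the burst count to a per-minute rate over the covered time span."""
    if len(levels) < 2:
        return 0.0
    duration = levels[-1][0] - levels[0][0]
    if duration <= 0:
        return 0.0
    return count_bursts(levels, threshold_db) * 60.0 / duration


# The 13 complete frames quoted in the output above.
levels = [
    (368.292, -30.851195), (368.315, -30.700943), (368.338, -33.638604),
    (368.361, -21.873170), (368.385, -20.001936), (368.408, -18.571318),
    (368.431, -18.470749), (368.454, -19.506688), (368.477, -21.270579),
    (368.501, -25.007862), (368.524, -25.654372), (368.547, -24.948357),
    (368.570, -30.523540),
]
print(count_bursts(levels, -25.0), count_bursts(levels, -23.0))
```

On these 13 frames the count is 2 with a threshold of -25 dB and 1 with -23 dB. That shows how sensitive the result is to the threshold, so some smoothing or a hysteresis band would probably be needed before trusting the numbers.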

How the ffmpeg astats crest factor is calculated

30 August 2017, by FranGar
I'm scripting an ffmpeg processing chain for my work. The aim is to normalize and compress a lot of audio files (MP3s).
It's done in Python, and the critical part is this line:
```
ffmpeg -y -i "Input.mp3" -codec:a libmp3lame -b:a 96k -af acompressor=threshold=-15dB:ratio=5:attack=0.01:release=1000:knee=2,dynaudnorm=g=3:m=2:p=0.95 "Output.mp3"
```
The Python script is complete and working, but the recordings (voice recordings) vary widely, so I can't use the same parameters for all of them.

I experimented with the values reported by the ffmpeg astats filter and found that the crest factor (the standard ratio of peak to RMS level) gives a good reference for choosing better parameters programmatically.

In fact, I saw that recordings with a nice dynamic range and a smooth shape get crest values around 9-15 (so the compression/normalization parameters can be fairly conservative), while recordings with crest values around 22-30 need more aggressive processing.
(All of this is empirical.)
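
The selection step described above could be sketched as follows. The cutoff of 18 and both presets are hypothetical placeholders chosen for illustration, since the post gives only the two crest ranges and one example command; the filter names and option keys are the ones from that command.

```python
def build_filter(crest: float, cutoff: float = 18.0) -> str:
    """Return an -af filter string chosen from the measured crest factor.

    The cutoff and the two presets below are made-up values for this
    sketch; they would have to be tuned on real recordings.
    """
    if crest >= cutoff:
        threshold, ratio = "-20dB", 8  # hypothetical "aggressive" preset
    else:
        threshold, ratio = "-12dB", 3  # hypothetical "conservative" preset
    return (
        f"acompressor=threshold={threshold}:ratio={ratio}"
        ":attack=0.01:release=1000:knee=2,"
        "dynaudnorm=g=3:m=2:p=0.95"
    )


print(build_filter(12.0))
print(build_filter(25.0))
```

The returned string would take the place of the `-af` argument in the command above. Keeping the presets in one function makes it easy to add a middle tier later for crest values between the two observed ranges.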

Can somebody clarify how the crest values are actually calculated? Which peaks are taken into account? (And why is the flat factor always 0?)
Or, if somebody knows how to get a value representing the "smoothness" of the sound, that would be nice too.

Thanks for any ideas.
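
For reference, the textbook definition of crest factor is the peak absolute sample value divided by the RMS value, as a linear ratio. The sketch below implements that definition. It is not taken from ffmpeg's source, and it does not settle which window astats uses for the peak. As a sanity check, the astats output quoted earlier on this page (peak level -15.585332 dB, RMS level -26.251394 dB, crest factor 3.414311) is consistent with this linear ratio.

```python
import math


def crest_factor(samples):
    """Textbook crest factor: peak absolute value divided by RMS."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        raise ValueError("crest factor is undefined for an all-zero signal")
    return peak / rms


def crest_from_db(peak_db, rms_db):
    """Same ratio, computed from peak and RMS levels given in dB."""
    return 10 ** ((peak_db - rms_db) / 20)


# One full period of a sine wave: the crest factor should be sqrt(2).
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
print(crest_factor(sine))

# Peak and RMS levels from the astats output quoted earlier on this page.
print(crest_from_db(-15.585332, -26.251394))
```

A crest factor near 1.4 means a signal as "smooth" as a pure tone, while speech with sharp transients scores much higher, which matches the empirical observation in the post.
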
How the ffmpeg astats crest factor value of an audio track is calculated

29 August 2017, by FranGar
