Recherche avancée

Médias (91)

Autres articles (105)

  • Publier sur MédiaSpip

    13 juin 2013

    Puis-je poster des contenus à partir d’une tablette Ipad ?
    Oui, si votre Médiaspip installé est à la version 0.2 ou supérieure. Contacter au besoin l’administrateur de votre MédiaSpip pour le savoir

  • Use, discuss, criticize

    13 avril 2011, par

    Talk to people directly involved in MediaSPIP’s development, or to people around you who could use MediaSPIP to share, enhance or develop their creative projects.
    The bigger the community, the more MediaSPIP’s potential will be explored and the faster the software will evolve.
    A discussion list is available for all exchanges between users.

  • Encoding and processing into web-friendly formats

    13 avril 2011, par

    MediaSPIP automatically converts uploaded files to internet-compatible formats.
    Video files are encoded in MP4, Ogv and WebM (supported by HTML5) and MP4 (supported by Flash).
    Audio files are encoded in MP3 and Ogg (supported by HTML5) and MP3 (supported by Flash).
    Where possible, text is analyzed in order to retrieve the data needed for search engine detection, and then exported as a series of image files.
    All uploaded files are stored online in their original format, so you can (...)

Sur d’autres sites (8611)

  • Revision 5704 : On force l’api google v3 pour l’instant pour GIS 2

    16 août 2011, par kent1 — Log

    On force l’api google v3 pour l’instant pour GIS 2

  • Revision 1d8526d0cc : Merge "Add more coding staticstics tracker" into sandbox/jingning@google.com/dec

    31 juillet 2015, par Jingning Han

    Merge "Add more coding staticstics tracker" into
    sandbox/jingning@google.com/decoder_test_suite

  • How to use Google's Cloud Speech-to-Text API to transcribe a video using the REST API

    8 juin 2018, par mrb

    I’d like to have the transcript of 2 people speaking in a video, but I get an empty response from the Cloud Speech-to-Text API

    Approach :

    I have a 56 minute video file containing a conversation between two people. I would like to have the transcript of that conversation, and I would like to use Google’s Cloud Speech-to-Text API to get that.

    To save a little on my Google Cloud Storage I converted to video to audio first by using mmpeg.

    First I’d tried to figure out the audio codec by using the command below, and it looks like AAC.
    ffmpeg -i video.mp4

    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'videoplayback.mp4':
     Metadata:
       major_brand     : mp42
       minor_version   : 0
       compatible_brands: isommp42
       creation_time   : 2015-12-30T08:17:14.000000Z
     Duration: 00:56:03.99, start: 0.000000, bitrate: 362 kb/s
       Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 490x360 [SAR 1:1 DAR 49:36], 264 kb/s,     29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
       Metadata:
         handler_name    : VideoHandler
       Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 96 kb/s (default)
       Metadata:
         creation_time   : 2015-12-30T08:17:31.000000Z
         handler_name    : IsoMedia File Produced by Google, 5-11-2011    

    So I took that from the video by using :
    ffmpeg -i video.mp4 -vn -acodec copy myaudio.aac

    Details so far :
    ffmpeg -i myaudio.aac
    Outputs :

    Input #0, aac, from 'myaudio.aac':
     Duration: 00:56:47.49, bitrate: 97 kb/s
       Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 97 kb/s

    After that I converted it to opus because I’m told that opus is better
    ffmpeg -i myaudio.aac -acodec libopus -b:a 97k -vbr on -compression_level 10 myaudio.opus

    Info so far :
    opusinfo myaudio.opus

    User comments section follows...
       encoder=Lavc58.18.100 libopus
    Opus stream 1:
       Pre-skip: 312
       Playback gain: 0 dB
       Channels: 2
       Original sample rate: 48000Hz
       Packet duration:   20.0ms (max),   20.0ms (avg),   20.0ms (min)
       Page duration:   1000.0ms (max), 1000.0ms (avg), 1000.0ms (min)
       Total data length: 29956714 bytes (overhead: 0.872%)
       Playback length: 56m:03.990s
       Average bitrate: 71.24 kb/s, w/o overhead: 70.62 kb/s

    I this point I uploaded the myaudio.opus to the Google Cloud Storage.

    curl POST 1
    I started the speech recognition by doing a POST with curl :

    curl --request POST  --header "Content-Type: application/json" --url 'https://speech.googleapis.com/v1/speech:longrunningrecognize?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}' --data '{"audio": {"uri": "gs://{MY_BUCKET}/myaudio.opus"},"config": {"encoding": "OGG_OPUS", "sampleRateHertz": 48000, "languageCode": "en-US"}}'

    Response : {"name": "123456789"}
    123456789 was not the actual value.

    curl GET 1
    Now I wanted to have the results :

    curl --request GET --url 'https://speech.googleapis.com/v1/operations/123456789?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}'

    This gave me the error : Error : Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.

    So I updated the encoding configuration from OGG_OPUS to LINEAR16.

    curl POST 2
    Did the post again :

    curl --request POST  --header "Content-Type: application/json" --url 'https://speech.googleapis.com/v1/speech:longrunningrecognize?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}' --data '{"audio": {"uri": "gs://{MY_BUCKET}/myaudio.opus"},"config": {"encoding": "LINEAR16", "sampleRateHertz": 48000, "languageCode": "en-US"}}'

    Response : {"name": "987654321"}

    curl GET 2

    curl --request GET --url 'https://speech.googleapis.com/v1/operations/987654321?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}'

    Response :

    {
     "name": "987654321",
     "metadata": {
       "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
       "progressPercent": 100,
       "startTime": "2018-06-08T11:01:24.596504Z",
       "lastUpdateTime": "2018-06-08T11:01:51.825882Z"
     },
     "done": true
    }

    The problem is that I don’t get the actual transcription. According the the documentation there should be a response key in the response containing the data.

    Since I’m kinda stuck here I’d like to know if I’m doing something completely wrong. I don’t have any technical or resource limitation so all suggestions are very welcome ! Also happy to change my approach.

    Thanks in advance ! Cheers