Recherche avancée

Médias (1)

Mot : - Tags -/ticket

Autres articles (71)

  • MediaSPIP 0.1 Beta version

    25 avril 2011, par

    MediaSPIP 0.1 beta is the first version of MediaSPIP proclaimed as "usable".
    The zip file provided here only contains the sources of MediaSPIP in its standalone version.
    To get a working installation, you must manually install all-software dependencies on the server.
    If you want to use this archive for an installation in "farm mode", you will also need to proceed to other manual (...)

  • MediaSPIP version 0.1 Beta

    16 avril 2011, par

    MediaSPIP 0.1 beta est la première version de MediaSPIP décrétée comme "utilisable".
    Le fichier zip ici présent contient uniquement les sources de MediaSPIP en version standalone.
    Pour avoir une installation fonctionnelle, il est nécessaire d’installer manuellement l’ensemble des dépendances logicielles sur le serveur.
    Si vous souhaitez utiliser cette archive pour une installation en mode ferme, il vous faudra également procéder à d’autres modifications (...)

  • La file d’attente de SPIPmotion

    28 novembre 2010, par

    Une file d’attente stockée dans la base de donnée
    Lors de son installation, SPIPmotion crée une nouvelle table dans la base de donnée intitulée spip_spipmotion_attentes.
    Cette nouvelle table est constituée des champs suivants : id_spipmotion_attente, l’identifiant numérique unique de la tâche à traiter ; id_document, l’identifiant numérique du document original à encoder ; id_objet l’identifiant unique de l’objet auquel le document encodé devra être attaché automatiquement ; objet, le type d’objet auquel (...)

Sur d’autres sites (9675)

  • Google’s YouTube Uses FFmpeg

    9 février 2011, par Multimedia Mike — General

    Controversy arose last week when Google accused Microsoft of stealing search engine results for their Bing search engine. It was a pretty novel sting operation and Google did a good job of visually illustrating their side of the story on their official blog.

    This reminds me of the fact that Google’s YouTube video hosting site uses FFmpeg for converting videos. Not that this is in the same league as the search engine shenanigans (it’s perfectly legit to use FFmpeg in this capacity, but to my knowledge, Google/YouTube has never confirmed FFmpeg usage), but I thought I would revisit this item and illustrate it with screenshots. This is not new information— I first empirically tested this fact 4 years ago. However, a lot of people wonder how exactly I can identify FFmpeg on the backend when I claim that I’ve written code that helps power YouTube.

    Short Answer
    How do I know YouTube uses FFmpeg to convert multimedia ? Because :

    1. FFmpeg can decode a number of impossibly obscure multimedia formats using code I wrote
    2. YouTube can transcode many of the same formats
    3. I screwed up when I wrote the code to support some of these weird formats
    4. My mistakes are still present when YouTube transcodes certain fringe formats

    Longer Answer (With Pictures !)
    Let’s take a video format named RoQ, developed by noted game designer Graeme Devine. Originated for use in the FMV-heavy game The 11th Hour, the format eventually found its way into the Quake 3 engine as well as many games derived from the same technology.

    Dr. Tim Ferguson reverse engineered the format (though it would later be open sourced along with the rest of the Q3 engine). I wrote a RoQ playback system for FFmpeg, and I messed up in doing so. I believe my coding error helps demonstrate the case I’m trying to make here.

    Observe what happened when I pushed the jk02.roq sample through YouTube in my original experiment 4 years ago :



    Do you see how the canyon walls bleed into the sky ? That’s not supposed to happen. FFmpeg doesn’t do that anymore but I was able to go back into the source code history to find when it did do that :



    Academic Answer
    FFmpeg fixed this bug in June of 2007 (thanks to Eric Lasota). The problem had to do with premature colorspace conversion in my original decoder.

    Leftovers
    I tried uploading the video again to see if the problem persists in YouTube’s transcoder. First bit of trivia : YouTube detects when you have uploaded the same video twice and rejects the subsequent attempts. So I created a double concatenation of the video and uploaded it. The problem is gone, illustrating that the backend is actually using a newer version of FFmpeg. This surprises me for somewhat esoteric reasons.

    Here’s another interesting bit of trivia for those who don’t do a lot of YouTube uploading— YouTube reports format details when you upload a video :



    So, yep, RoQ format. And you can wager that this will prompt me to go back through the litany of unusual formats that FFmpeg supports to see how YouTube responds.

  • Computer crashing when using python tools in same script

    5 février 2023, par SL1997

    I am attempting to use the speech recognition toolkit VOSK and the speech diarization package Resemblyzer to transcibe audio and then identify the speakers in the audio.

    


    Tools :

    


    https://github.com/alphacep/vosk-api
    
https://github.com/resemble-ai/Resemblyzer

    


    I can do both things individually but run into issues when trying to do them when running the one python script.

    


    I used the following guide when setting up the diarization system :

    


    https://medium.com/saarthi-ai/who-spoke-when-build-your-own-speaker-diarization-module-from-scratch-e7d725ee279

    


    Computer specs are as follows :

    


    Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz, 3912 Mhz, 2 Core(s), 4 Logical Processor(s)
    
32GB RAM

    


    The following is my code, I am not to sure if using threading is appropriate or if I even implemented it correctly, how can I best optimize this code as to achieve the results I am looking for and not crash.

    


    from vosk import Model, KaldiRecognizer
from pydub import AudioSegment
import json
import sys
import os
import subprocess
import datetime
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from resemblyzer.hparams import sampling_rate
from spectralcluster import SpectralClusterer
import threading
import queue
import gc



def recognition(queue, audio, FRAME_RATE):

    model = Model("Vosk_Models/vosk-model-small-en-us-0.15")

    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)

    rec.AcceptWaveform(audio.raw_data)
    result = rec.Result()

    transcript = json.loads(result)#["text"]

    #return transcript
    queue.put(transcript)



def diarization(queue, audio):

    wav = preprocess_wav(audio)
    encoder = VoiceEncoder("cpu")
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
    print(cont_embeds.shape)

    clusterer = SpectralClusterer(
        min_clusters=2,
        max_clusters=100,
        p_percentile=0.90,
        gaussian_blur_sigma=1)

    labels = clusterer.predict(cont_embeds)

    def create_labelling(labels, wav_splits):

        times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
        labelling = []
        start_time = 0

        for i, time in enumerate(times):
            if i > 0 and labels[i] != labels[i - 1]:
                temp = [str(labels[i - 1]), start_time, time]
                labelling.append(tuple(temp))
                start_time = time
            if i == len(times) - 1:
                temp = [str(labels[i]), start_time, time]
                labelling.append(tuple(temp))

        return labelling

    #return
    labelling = create_labelling(labels, wav_splits)
    queue.put(labelling)



def identify_speaker(queue1, queue2):

    transcript = queue1.get()
    labelling = queue2.get()

    for speaker in labelling:

        speakerID = speaker[0]
        speakerStart = speaker[1]
        speakerEnd = speaker[2]

        result = transcript['result']
        words = [r['word'] for r in result if speakerStart < r['start'] < speakerEnd]
        #return
        print("Speaker",speakerID,":",' '.join(words), "\n")





def main():

    queue1 = queue.Queue()
    queue2 = queue.Queue()

    FRAME_RATE = 16000
    CHANNELS = 1

    podcast = AudioSegment.from_mp3("Podcast_Audio/Film-Release-Clip.mp3")
    podcast = podcast.set_channels(CHANNELS)
    podcast = podcast.set_frame_rate(FRAME_RATE)

    first_thread = threading.Thread(target=recognition, args=(queue1, podcast, FRAME_RATE))
    second_thread = threading.Thread(target=diarization, args=(queue2, podcast))
    third_thread = threading.Thread(target=identify_speaker, args=(queue1, queue2))

    first_thread.start()
    first_thread.join()
    gc.collect()

    second_thread.start()
    second_thread.join()
    gc.collect()

    third_thread.start()
    third_thread.join()
    gc.collect()

    # transcript = recognition(podcast,FRAME_RATE)
    #
    # labelling = diarization(podcast)
    #
    # print(identify_speaker(transcript, labelling))


if __name__ == '__main__':
    main()


    


    When I say crash I mean everything freezes, I have to hold down the power button on the desktop and turn it back on again. No blue/blank screen, just frozen in my IDE looking at my code. Any help in resolving this issue would be greatly appreciated.

    


  • ARM inline asm secrets

    6 juillet 2010, par Mans — ARM, Compilers

    Although I generally recommend against using GCC inline assembly, preferring instead pure assembly code in separate files, there are occasions where inline is the appropriate solution. Should one, at a time like this, turn to the GCC documentation for guidance, one must be prepared for a degree of disappointment. As it happens, much of the inline asm syntax is left entirely undocumented. This article attempts to fill in some of the blanks for the ARM target.

    Constraints

    Each operand of an inline asm block is described by a constraint string encoding the valid representations of the operand in the generated assembly. For example the “r” code denotes a general-purpose register. In addition to the standard constraints, ARM allows a number of special codes, only some of which are documented. The full list, including a brief description, is available in the constraints.md file in the GCC source tree. The following table is an extract from this file consisting of the codes which are meaningful in an inline asm block (a few are only useful in the machine description itself).

    f Legacy FPA registers f0-f7.
    t The VFP registers s0-s31.
    v The Cirrus Maverick co-processor registers.
    w The VFP registers d0-d15, or d0-d31 for VFPv3.
    x The VFP registers d0-d7.
    y The Intel iWMMX co-processor registers.
    z The Intel iWMMX GR registers.
    l In Thumb state the core registers r0-r7.
    h In Thumb state the core registers r8-r15.
    j A constant suitable for a MOVW instruction. (ARM/Thumb-2)
    b Thumb only. The union of the low registers and the stack register.
    I In ARM/Thumb-2 state a constant that can be used as an immediate value in a Data Processing instruction. In Thumb-1 state a constant in the range 0 to 255.
    J In ARM/Thumb-2 state a constant in the range -4095 to 4095. In Thumb-1 state a constant in the range -255 to -1.
    K In ARM/Thumb-2 state a constant that satisfies the I constraint if inverted. In Thumb-1 state a constant that satisfies the I constraint multiplied by any power of 2.
    L In ARM/Thumb-2 state a constant that satisfies the I constraint if negated. In Thumb-1 state a constant in the range -7 to 7.
    M In Thumb-1 state a constant that is a multiple of 4 in the range 0 to 1020.
    N Thumb-1 state a constant in the range 0 to 31.
    O In Thumb-1 state a constant that is a multiple of 4 in the range -508 to 508.
    Pa In Thumb-1 state a constant in the range -510 to +510
    Pb In Thumb-1 state a constant in the range -262 to +262
    Ps In Thumb-2 state a constant in the range -255 to +255
    Pt In Thumb-2 state a constant in the range -7 to +7
    G In ARM/Thumb-2 state a valid FPA immediate constant.
    H In ARM/Thumb-2 state a valid FPA immediate constant when negated.
    Da In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with two Data Processing insns.
    Db In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with three Data Processing insns.
    Dc In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with four Data Processing insns. This pattern is disabled if optimizing for space or when we have load-delay slots to fill.
    Dn In ARM/Thumb-2 state a const_vector which can be loaded with a Neon vmov immediate instruction.
    Dl In ARM/Thumb-2 state a const_vector which can be used with a Neon vorr or vbic instruction.
    DL In ARM/Thumb-2 state a const_vector which can be used with a Neon vorn or vand instruction.
    Dv In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts instruction.
    Dy In ARM/Thumb-2 state a const_double which can be used with a VFP fconstd instruction.
    Ut In ARM/Thumb-2 state an address valid for loading/storing opaque structure types wider than TImode.
    Uv In ARM/Thumb-2 state a valid VFP load/store address.
    Uy In ARM/Thumb-2 state a valid iWMMX load/store address.
    Un In ARM/Thumb-2 state a valid address for Neon doubleword vector load/store instructions.
    Um In ARM/Thumb-2 state a valid address for Neon element and structure load/store instructions.
    Us In ARM/Thumb-2 state a valid address for non-offset loads/stores of quad-word values in four ARM registers.
    Uq In ARM state an address valid in ldrsb instructions.
    Q In ARM/Thumb-2 state an address that is a single base register.

    Operand codes

    Within the text of an inline asm block, operands are referenced as %0, %1 etc. Register operands are printed as rN, memory operands as [rN, #offset], and so forth. In some situations, for example with operands occupying multiple registers, more detailed control of the output may be required, and once again, an undocumented feature comes to our rescue.

    Special code letters inserted between the % and the operand number alter the output from the default for each type of operand. The table below lists the more useful ones.

    c An integer or symbol address without a preceding # sign
    B Bitwise inverse of integer or symbol without a preceding #
    L The low 16 bits of an immediate constant
    m The base register of a memory operand
    M A register range suitable for LDM/STM
    H The highest-numbered register of a pair
    Q The least significant register of a pair
    R The most significant register of a pair
    P A double-precision VFP register
    p The high single-precision register of a VFP double-precision register
    q A NEON quad register
    e The low doubleword register of a NEON quad register
    f The high doubleword register of a NEON quad register
    h A range of VFP/NEON registers suitable for VLD1/VST1
    A A memory operand for a VLD1/VST1 instruction
    y S register as indexed D register, e.g. s5 becomes d2[1]