Recherche avancée

Médias (0)

Mot : - Tags -/logo

Aucun média correspondant à vos critères n’est disponible sur le site.

Autres articles (57)

  • Websites made ​​with MediaSPIP

    2 mai 2011, par

    This page lists some websites based on MediaSPIP.

  • Possibilité de déploiement en ferme

    12 avril 2011, par

    MediaSPIP peut être installé comme une ferme, avec un seul "noyau" hébergé sur un serveur dédié et utilisé par une multitude de sites différents.
    Cela permet, par exemple : de pouvoir partager les frais de mise en œuvre entre plusieurs projets / individus ; de pouvoir déployer rapidement une multitude de sites uniques ; d’éviter d’avoir à mettre l’ensemble des créations dans un fourre-tout numérique comme c’est le cas pour les grandes plate-formes tout public disséminées sur le (...)

  • Ajouter des informations spécifiques aux utilisateurs et autres modifications de comportement liées aux auteurs

    12 avril 2011, par

    La manière la plus simple d’ajouter des informations aux auteurs est d’installer le plugin Inscription3. Il permet également de modifier certains comportements liés aux utilisateurs (référez-vous à sa documentation pour plus d’informations).
    Il est également possible d’ajouter des champs aux auteurs en installant les plugins champs extras 2 et Interface pour champs extras.

Sur d’autres sites (11560)

  • Computer crashing when using python tools in same script

    5 février 2023, par SL1997

    I am attempting to use the speech recognition toolkit VOSK and the speech diarization package Resemblyzer to transcibe audio and then identify the speakers in the audio.

    


    Tools :

    


    https://github.com/alphacep/vosk-api
    
https://github.com/resemble-ai/Resemblyzer

    


    I can do both things individually but run into issues when trying to do them when running the one python script.

    


    I used the following guide when setting up the diarization system :

    


    https://medium.com/saarthi-ai/who-spoke-when-build-your-own-speaker-diarization-module-from-scratch-e7d725ee279

    


    Computer specs are as follows :

    


    Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz, 3912 Mhz, 2 Core(s), 4 Logical Processor(s)
    
32GB RAM

    


    The following is my code, I am not to sure if using threading is appropriate or if I even implemented it correctly, how can I best optimize this code as to achieve the results I am looking for and not crash.

    


    from vosk import Model, KaldiRecognizer
from pydub import AudioSegment
import json
import sys
import os
import subprocess
import datetime
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from resemblyzer.hparams import sampling_rate
from spectralcluster import SpectralClusterer
import threading
import queue
import gc



def recognition(queue, audio, FRAME_RATE):

    model = Model("Vosk_Models/vosk-model-small-en-us-0.15")

    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)

    rec.AcceptWaveform(audio.raw_data)
    result = rec.Result()

    transcript = json.loads(result)#["text"]

    #return transcript
    queue.put(transcript)



def diarization(queue, audio):

    wav = preprocess_wav(audio)
    encoder = VoiceEncoder("cpu")
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
    print(cont_embeds.shape)

    clusterer = SpectralClusterer(
        min_clusters=2,
        max_clusters=100,
        p_percentile=0.90,
        gaussian_blur_sigma=1)

    labels = clusterer.predict(cont_embeds)

    def create_labelling(labels, wav_splits):

        times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
        labelling = []
        start_time = 0

        for i, time in enumerate(times):
            if i > 0 and labels[i] != labels[i - 1]:
                temp = [str(labels[i - 1]), start_time, time]
                labelling.append(tuple(temp))
                start_time = time
            if i == len(times) - 1:
                temp = [str(labels[i]), start_time, time]
                labelling.append(tuple(temp))

        return labelling

    #return
    labelling = create_labelling(labels, wav_splits)
    queue.put(labelling)



def identify_speaker(queue1, queue2):

    transcript = queue1.get()
    labelling = queue2.get()

    for speaker in labelling:

        speakerID = speaker[0]
        speakerStart = speaker[1]
        speakerEnd = speaker[2]

        result = transcript['result']
        words = [r['word'] for r in result if speakerStart < r['start'] < speakerEnd]
        #return
        print("Speaker",speakerID,":",' '.join(words), "\n")





def main():

    queue1 = queue.Queue()
    queue2 = queue.Queue()

    FRAME_RATE = 16000
    CHANNELS = 1

    podcast = AudioSegment.from_mp3("Podcast_Audio/Film-Release-Clip.mp3")
    podcast = podcast.set_channels(CHANNELS)
    podcast = podcast.set_frame_rate(FRAME_RATE)

    first_thread = threading.Thread(target=recognition, args=(queue1, podcast, FRAME_RATE))
    second_thread = threading.Thread(target=diarization, args=(queue2, podcast))
    third_thread = threading.Thread(target=identify_speaker, args=(queue1, queue2))

    first_thread.start()
    first_thread.join()
    gc.collect()

    second_thread.start()
    second_thread.join()
    gc.collect()

    third_thread.start()
    third_thread.join()
    gc.collect()

    # transcript = recognition(podcast,FRAME_RATE)
    #
    # labelling = diarization(podcast)
    #
    # print(identify_speaker(transcript, labelling))


if __name__ == '__main__':
    main()


    


    When I say crash I mean everything freezes, I have to hold down the power button on the desktop and turn it back on again. No blue/blank screen, just frozen in my IDE looking at my code. Any help in resolving this issue would be greatly appreciated.

    


  • ARM inline asm secrets

    6 juillet 2010, par Mans — ARM, Compilers

    Although I generally recommend against using GCC inline assembly, preferring instead pure assembly code in separate files, there are occasions where inline is the appropriate solution. Should one, at a time like this, turn to the GCC documentation for guidance, one must be prepared for a degree of disappointment. As it happens, much of the inline asm syntax is left entirely undocumented. This article attempts to fill in some of the blanks for the ARM target.

    Constraints

    Each operand of an inline asm block is described by a constraint string encoding the valid representations of the operand in the generated assembly. For example the “r” code denotes a general-purpose register. In addition to the standard constraints, ARM allows a number of special codes, only some of which are documented. The full list, including a brief description, is available in the constraints.md file in the GCC source tree. The following table is an extract from this file consisting of the codes which are meaningful in an inline asm block (a few are only useful in the machine description itself).

    f Legacy FPA registers f0-f7.
    t The VFP registers s0-s31.
    v The Cirrus Maverick co-processor registers.
    w The VFP registers d0-d15, or d0-d31 for VFPv3.
    x The VFP registers d0-d7.
    y The Intel iWMMX co-processor registers.
    z The Intel iWMMX GR registers.
    l In Thumb state the core registers r0-r7.
    h In Thumb state the core registers r8-r15.
    j A constant suitable for a MOVW instruction. (ARM/Thumb-2)
    b Thumb only. The union of the low registers and the stack register.
    I In ARM/Thumb-2 state a constant that can be used as an immediate value in a Data Processing instruction. In Thumb-1 state a constant in the range 0 to 255.
    J In ARM/Thumb-2 state a constant in the range -4095 to 4095. In Thumb-1 state a constant in the range -255 to -1.
    K In ARM/Thumb-2 state a constant that satisfies the I constraint if inverted. In Thumb-1 state a constant that satisfies the I constraint multiplied by any power of 2.
    L In ARM/Thumb-2 state a constant that satisfies the I constraint if negated. In Thumb-1 state a constant in the range -7 to 7.
    M In Thumb-1 state a constant that is a multiple of 4 in the range 0 to 1020.
    N Thumb-1 state a constant in the range 0 to 31.
    O In Thumb-1 state a constant that is a multiple of 4 in the range -508 to 508.
    Pa In Thumb-1 state a constant in the range -510 to +510
    Pb In Thumb-1 state a constant in the range -262 to +262
    Ps In Thumb-2 state a constant in the range -255 to +255
    Pt In Thumb-2 state a constant in the range -7 to +7
    G In ARM/Thumb-2 state a valid FPA immediate constant.
    H In ARM/Thumb-2 state a valid FPA immediate constant when negated.
    Da In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with two Data Processing insns.
    Db In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with three Data Processing insns.
    Dc In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with four Data Processing insns. This pattern is disabled if optimizing for space or when we have load-delay slots to fill.
    Dn In ARM/Thumb-2 state a const_vector which can be loaded with a Neon vmov immediate instruction.
    Dl In ARM/Thumb-2 state a const_vector which can be used with a Neon vorr or vbic instruction.
    DL In ARM/Thumb-2 state a const_vector which can be used with a Neon vorn or vand instruction.
    Dv In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts instruction.
    Dy In ARM/Thumb-2 state a const_double which can be used with a VFP fconstd instruction.
    Ut In ARM/Thumb-2 state an address valid for loading/storing opaque structure types wider than TImode.
    Uv In ARM/Thumb-2 state a valid VFP load/store address.
    Uy In ARM/Thumb-2 state a valid iWMMX load/store address.
    Un In ARM/Thumb-2 state a valid address for Neon doubleword vector load/store instructions.
    Um In ARM/Thumb-2 state a valid address for Neon element and structure load/store instructions.
    Us In ARM/Thumb-2 state a valid address for non-offset loads/stores of quad-word values in four ARM registers.
    Uq In ARM state an address valid in ldrsb instructions.
    Q In ARM/Thumb-2 state an address that is a single base register.

    Operand codes

    Within the text of an inline asm block, operands are referenced as %0, %1 etc. Register operands are printed as rN, memory operands as [rN, #offset], and so forth. In some situations, for example with operands occupying multiple registers, more detailed control of the output may be required, and once again, an undocumented feature comes to our rescue.

    Special code letters inserted between the % and the operand number alter the output from the default for each type of operand. The table below lists the more useful ones.

    c An integer or symbol address without a preceding # sign
    B Bitwise inverse of integer or symbol without a preceding #
    L The low 16 bits of an immediate constant
    m The base register of a memory operand
    M A register range suitable for LDM/STM
    H The highest-numbered register of a pair
    Q The least significant register of a pair
    R The most significant register of a pair
    P A double-precision VFP register
    p The high single-precision register of a VFP double-precision register
    q A NEON quad register
    e The low doubleword register of a NEON quad register
    f The high doubleword register of a NEON quad register
    h A range of VFP/NEON registers suitable for VLD1/VST1
    A A memory operand for a VLD1/VST1 instruction
    y S register as indexed D register, e.g. s5 becomes d2[1]
  • ffmpeg : stream copy from .mxf into NLE-compatible format

    9 juin 2013, par David

    Because my NLE software does not support the .mxf-files from Canon XF100 I need to convert them into a supported format.

    As far as I know, mxf-files are just another container format for mpeg2 streams, so it would be really nice to extract the streams and place them into another container (without reencoding).

    I think ffmpeg can do this – correct me if I'm wrong – by running the following command :

    ffmpeg -i in.mxf -vcodec copy out.m2ts (or .ts, .mts, ...)

    ffmpeg finishes without errors after about 2 seconds (in.mxf is abut 170mb) :

    c:\video>c:\ffmpeg\bin\ffmpeg -i in.MXF -vcodec copy out.m2ts
    ffmpeg version N-53680-g0ab9362 Copyright (c) 2000-2013 the FFmpeg developers
     built on May 30 2013 12:14:03 with gcc 4.7.3 (GCC)
     configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-av
    isynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab
    le-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetyp
    e --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --ena
    ble-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-l
    ibopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libsp
    eex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-
    amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --
    enable-libxvid --enable-zlib
     libavutil      52. 34.100 / 52. 34.100
     libavcodec     55. 12.102 / 55. 12.102
     libavformat    55.  8.100 / 55.  8.100
     libavdevice    55.  2.100 / 55.  2.100
     libavfilter     3. 73.100 /  3. 73.100
     libswscale      2.  3.100 /  2.  3.100
     libswresample   0. 17.102 /  0. 17.102
     libpostproc    52.  3.100 / 52.  3.100
    Guessed Channel Layout for  Input Stream #0.1 : mono
    Guessed Channel Layout for  Input Stream #0.2 : mono
    Input #0, mxf, from 'in.MXF':
     Metadata:
       uid             : 1bb23c97-6205-4800-80a2-e00002244ba7
       generation_uid  : 1bb23c97-6205-4800-8122-e00002244ba7
       company_name    : CANON
       product_name    : XF100
       product_version : 1.00
       product_uid     : 060e2b34-0401-010d-0e15-005658460100
       modification_date: 2013-01-06 11:05:02
       timecode        : 01:42:14:22
     Duration: 00:00:28.32, start: 0.000000, bitrate: 51811 kb/s
       Stream #0:0: Video: mpeg2video (4:2:2), yuv422p, 1920x1080 [SAR 1:1 DAR 16:9
    ], 25 fps, 25 tbr, 25 tbn, 50 tbc
       Stream #0:1: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
       Stream #0:2: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
    Output #0, mpegts, to 'out.m2ts':
     Metadata:
       uid             : 1bb23c97-6205-4800-80a2-e00002244ba7
       generation_uid  : 1bb23c97-6205-4800-8122-e00002244ba7
       company_name    : CANON
       product_name    : XF100
       product_version : 1.00
       product_uid     : 060e2b34-0401-010d-0e15-005658460100
       modification_date: 2013-01-06 11:05:02
       timecode        : 01:42:14:22
       encoder         : Lavf55.8.100
       Stream #0:0: Video: mpeg2video, yuv422p, 1920x1080 [SAR 1:1 DAR 16:9], q=2-3
    1, 25 fps, 90k tbn, 25 tbc
       Stream #0:1: Audio: mp2, 48000 Hz, mono, s16, 128 kb/s
    Stream mapping:
     Stream #0:0 -> #0:0 (copy)
     Stream #0:1 -> #0:1 (pcm_s16le -> mp2)
    Press [q] to stop, [?] for help
    frame=  532 fps=0.0 q=-1.0 size=  143511kB time=00:00:21.25 bitrate=55314.1kbits
    frame=  561 fps=435 q=-1.0 size=  151254kB time=00:00:22.42 bitrate=55242.0kbits
    frame=  586 fps=314 q=-1.0 size=  158021kB time=00:00:23.41 bitrate=55288.0kbits
    frame=  609 fps=255 q=-1.0 size=  164182kB time=00:00:24.34 bitrate=55235.4kbits
    frame=  636 fps=217 q=-1.0 size=  171463kB time=00:00:25.42 bitrate=55235.1kbits
    frame=  669 fps=194 q=-1.0 size=  180133kB time=00:00:26.72 bitrate=55226.3kbits
    frame=  699 fps=173 q=-1.0 size=  188326kB time=00:00:27.92 bitrate=55256.6kbits
    frame=  708 fps=169 q=-1.0 Lsize=  190877kB time=00:00:28.30 bitrate=55233.6kbit
    s/s
    video:172852kB audio:442kB subtitle:0 global headers:0kB muxing overhead 10.1461
    18%

    Unfortunately the output file turns out to be displayed correctly only by vlc player.
    My NLE-software (Cyberlink Power Director) is able to open the file but most of the picture is green. Only a few pixels on the left edge show the original video :

    output file

    Any ideas how to solve that problem ? Is there a better way to use .mxf-files in NLE-software without native support ?

    thanks in advance