Recherche avancée

Médias (91)

Autres articles (103)

  • Websites made ​​with MediaSPIP

    2 mai 2011, par

    This page lists some websites based on MediaSPIP.

  • Creating farms of unique websites

    13 avril 2011, par

    MediaSPIP platforms can be installed as a farm, with a single "core" hosted on a dedicated server and used by multiple websites.
    This allows (among other things) : implementation costs to be shared between several different projects / individuals rapid deployment of multiple unique sites creation of groups of like-minded sites, making it possible to browse media in a more controlled and selective environment than the major "open" (...)

  • Contribute to a better visual interface

    13 avril 2011

    MediaSPIP is based on a system of themes and templates. Templates define the placement of information on the page, and can be adapted to a wide range of uses. Themes define the overall graphic appearance of the site.
    Anyone can submit a new graphic theme or template and make it available to the MediaSPIP community.

Sur d’autres sites (12989)

  • Computer crashing when using python tools in same script

    5 février 2023, par SL1997

    I am attempting to use the speech recognition toolkit VOSK and the speech diarization package Resemblyzer to transcibe audio and then identify the speakers in the audio.

    


    Tools :

    


    https://github.com/alphacep/vosk-api
    
https://github.com/resemble-ai/Resemblyzer

    


    I can do both things individually but run into issues when trying to do them when running the one python script.

    


    I used the following guide when setting up the diarization system :

    


    https://medium.com/saarthi-ai/who-spoke-when-build-your-own-speaker-diarization-module-from-scratch-e7d725ee279

    


    Computer specs are as follows :

    


    Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz, 3912 Mhz, 2 Core(s), 4 Logical Processor(s)
    
32GB RAM

    


    The following is my code, I am not to sure if using threading is appropriate or if I even implemented it correctly, how can I best optimize this code as to achieve the results I am looking for and not crash.

    


    from vosk import Model, KaldiRecognizer
from pydub import AudioSegment
import json
import sys
import os
import subprocess
import datetime
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from resemblyzer.hparams import sampling_rate
from spectralcluster import SpectralClusterer
import threading
import queue
import gc



def recognition(queue, audio, FRAME_RATE):

    model = Model("Vosk_Models/vosk-model-small-en-us-0.15")

    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)

    rec.AcceptWaveform(audio.raw_data)
    result = rec.Result()

    transcript = json.loads(result)#["text"]

    #return transcript
    queue.put(transcript)



def diarization(queue, audio):

    wav = preprocess_wav(audio)
    encoder = VoiceEncoder("cpu")
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
    print(cont_embeds.shape)

    clusterer = SpectralClusterer(
        min_clusters=2,
        max_clusters=100,
        p_percentile=0.90,
        gaussian_blur_sigma=1)

    labels = clusterer.predict(cont_embeds)

    def create_labelling(labels, wav_splits):

        times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
        labelling = []
        start_time = 0

        for i, time in enumerate(times):
            if i > 0 and labels[i] != labels[i - 1]:
                temp = [str(labels[i - 1]), start_time, time]
                labelling.append(tuple(temp))
                start_time = time
            if i == len(times) - 1:
                temp = [str(labels[i]), start_time, time]
                labelling.append(tuple(temp))

        return labelling

    #return
    labelling = create_labelling(labels, wav_splits)
    queue.put(labelling)



def identify_speaker(queue1, queue2):

    transcript = queue1.get()
    labelling = queue2.get()

    for speaker in labelling:

        speakerID = speaker[0]
        speakerStart = speaker[1]
        speakerEnd = speaker[2]

        result = transcript['result']
        words = [r['word'] for r in result if speakerStart < r['start'] < speakerEnd]
        #return
        print("Speaker",speakerID,":",' '.join(words), "\n")





def main():

    queue1 = queue.Queue()
    queue2 = queue.Queue()

    FRAME_RATE = 16000
    CHANNELS = 1

    podcast = AudioSegment.from_mp3("Podcast_Audio/Film-Release-Clip.mp3")
    podcast = podcast.set_channels(CHANNELS)
    podcast = podcast.set_frame_rate(FRAME_RATE)

    first_thread = threading.Thread(target=recognition, args=(queue1, podcast, FRAME_RATE))
    second_thread = threading.Thread(target=diarization, args=(queue2, podcast))
    third_thread = threading.Thread(target=identify_speaker, args=(queue1, queue2))

    first_thread.start()
    first_thread.join()
    gc.collect()

    second_thread.start()
    second_thread.join()
    gc.collect()

    third_thread.start()
    third_thread.join()
    gc.collect()

    # transcript = recognition(podcast,FRAME_RATE)
    #
    # labelling = diarization(podcast)
    #
    # print(identify_speaker(transcript, labelling))


if __name__ == '__main__':
    main()


    


    When I say crash I mean everything freezes, I have to hold down the power button on the desktop and turn it back on again. No blue/blank screen, just frozen in my IDE looking at my code. Any help in resolving this issue would be greatly appreciated.

    


  • ARM inline asm secrets

    6 juillet 2010, par Mans — ARM, Compilers

    Although I generally recommend against using GCC inline assembly, preferring instead pure assembly code in separate files, there are occasions where inline is the appropriate solution. Should one, at a time like this, turn to the GCC documentation for guidance, one must be prepared for a degree of disappointment. As it happens, much of the inline asm syntax is left entirely undocumented. This article attempts to fill in some of the blanks for the ARM target.

    Constraints

    Each operand of an inline asm block is described by a constraint string encoding the valid representations of the operand in the generated assembly. For example the “r” code denotes a general-purpose register. In addition to the standard constraints, ARM allows a number of special codes, only some of which are documented. The full list, including a brief description, is available in the constraints.md file in the GCC source tree. The following table is an extract from this file consisting of the codes which are meaningful in an inline asm block (a few are only useful in the machine description itself).

    f Legacy FPA registers f0-f7.
    t The VFP registers s0-s31.
    v The Cirrus Maverick co-processor registers.
    w The VFP registers d0-d15, or d0-d31 for VFPv3.
    x The VFP registers d0-d7.
    y The Intel iWMMX co-processor registers.
    z The Intel iWMMX GR registers.
    l In Thumb state the core registers r0-r7.
    h In Thumb state the core registers r8-r15.
    j A constant suitable for a MOVW instruction. (ARM/Thumb-2)
    b Thumb only. The union of the low registers and the stack register.
    I In ARM/Thumb-2 state a constant that can be used as an immediate value in a Data Processing instruction. In Thumb-1 state a constant in the range 0 to 255.
    J In ARM/Thumb-2 state a constant in the range -4095 to 4095. In Thumb-1 state a constant in the range -255 to -1.
    K In ARM/Thumb-2 state a constant that satisfies the I constraint if inverted. In Thumb-1 state a constant that satisfies the I constraint multiplied by any power of 2.
    L In ARM/Thumb-2 state a constant that satisfies the I constraint if negated. In Thumb-1 state a constant in the range -7 to 7.
    M In Thumb-1 state a constant that is a multiple of 4 in the range 0 to 1020.
    N Thumb-1 state a constant in the range 0 to 31.
    O In Thumb-1 state a constant that is a multiple of 4 in the range -508 to 508.
    Pa In Thumb-1 state a constant in the range -510 to +510
    Pb In Thumb-1 state a constant in the range -262 to +262
    Ps In Thumb-2 state a constant in the range -255 to +255
    Pt In Thumb-2 state a constant in the range -7 to +7
    G In ARM/Thumb-2 state a valid FPA immediate constant.
    H In ARM/Thumb-2 state a valid FPA immediate constant when negated.
    Da In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with two Data Processing insns.
    Db In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with three Data Processing insns.
    Dc In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with four Data Processing insns. This pattern is disabled if optimizing for space or when we have load-delay slots to fill.
    Dn In ARM/Thumb-2 state a const_vector which can be loaded with a Neon vmov immediate instruction.
    Dl In ARM/Thumb-2 state a const_vector which can be used with a Neon vorr or vbic instruction.
    DL In ARM/Thumb-2 state a const_vector which can be used with a Neon vorn or vand instruction.
    Dv In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts instruction.
    Dy In ARM/Thumb-2 state a const_double which can be used with a VFP fconstd instruction.
    Ut In ARM/Thumb-2 state an address valid for loading/storing opaque structure types wider than TImode.
    Uv In ARM/Thumb-2 state a valid VFP load/store address.
    Uy In ARM/Thumb-2 state a valid iWMMX load/store address.
    Un In ARM/Thumb-2 state a valid address for Neon doubleword vector load/store instructions.
    Um In ARM/Thumb-2 state a valid address for Neon element and structure load/store instructions.
    Us In ARM/Thumb-2 state a valid address for non-offset loads/stores of quad-word values in four ARM registers.
    Uq In ARM state an address valid in ldrsb instructions.
    Q In ARM/Thumb-2 state an address that is a single base register.

    Operand codes

    Within the text of an inline asm block, operands are referenced as %0, %1 etc. Register operands are printed as rN, memory operands as [rN, #offset], and so forth. In some situations, for example with operands occupying multiple registers, more detailed control of the output may be required, and once again, an undocumented feature comes to our rescue.

    Special code letters inserted between the % and the operand number alter the output from the default for each type of operand. The table below lists the more useful ones.

    c An integer or symbol address without a preceding # sign
    B Bitwise inverse of integer or symbol without a preceding #
    L The low 16 bits of an immediate constant
    m The base register of a memory operand
    M A register range suitable for LDM/STM
    H The highest-numbered register of a pair
    Q The least significant register of a pair
    R The most significant register of a pair
    P A double-precision VFP register
    p The high single-precision register of a VFP double-precision register
    q A NEON quad register
    e The low doubleword register of a NEON quad register
    f The high doubleword register of a NEON quad register
    h A range of VFP/NEON registers suitable for VLD1/VST1
    A A memory operand for a VLD1/VST1 instruction
    y S register as indexed D register, e.g. s5 becomes d2[1]
  • Need help configuring FFMPEG to work with a webcams h264 stream

    9 août 2020, par The Welsh Dragon

    I have been trying to get a H264 stream from a H264 usb webcam working but I am not making much progress so I'm hoping someone knows FFMPEG better than me !

    


    There are dozens of questions/answers on SO but none solve my problem.

    


    In short, I get a very pixelated (or sometimes mostly green) screen. I am using VLC to test the stream which is coming via an RTSP server. I am using FFMPEG to copy the webcam stream to the local RTSP server.

    


    The webcam also supports YUYV which I can get working - it is just the h264 stream causing me problems.

    


    So this is how the device is presented :

    


    H264 USB Camera: USB Camera (usb-20980000.usb-1):
        /dev/video0
        /dev/video1
        /dev/video2
        /dev/video3


    


    /dev/video0 is the YUYV and MPEG stream
/dev/video2 is the h264 stream that has the following capabilities :

    


    ioctl: VIDIOC_ENUM_FMT
        Type: Video Capture

        [0]: 'H264' (H.264, compressed)
                Size: Discrete 1920x1080
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)
                Size: Discrete 1280x720
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)
                Size: Discrete 800x600
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)
                Size: Discrete 640x480
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)
                Size: Discrete 640x360
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)
                Size: Discrete 352x288
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)
                Size: Discrete 320x240
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)
                Size: Discrete 1920x1080
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)
                        Interval: Discrete 0.033s (30.000 fps)
                        Interval: Discrete 0.040s (25.000 fps)
                        Interval: Discrete 0.067s (15.000 fps)


    


    I have tried various resolutions, the smaller giving slightly less pixelated images but none are usable and definitely dont compare to the YUYV high resolution results.

    


    This (YUYV) command works :

    


    ffmpeg -input_format yuyv422 -f video4linux2 -s 1280x720 -r 10 -i /dev/video0 -c:v h264_omx -r 10 -b:v 2M -an -f rtsp rtsp://localhost:80/live/stream


    


    These two h264 options dont work :

    


    ffmpeg -input_format h264 -f video4linux2 -video_size 1920x1080 -framerate 30 -i /dev/video0 -c:v copy -an -f rtsp rtsp://localhost:80/live/stream


    


    ffmpeg -re -i /dev/video2 -video_size 800x600 -framerate 15 -pix_fmt yuv420p -tune zerolatency -c:v copy -an -f rtsp rtsp://localhost:80/live/stream


    


    For that last command the FFMPEG output looks like this :

    


    ffmpeg version git-2020-08-07-6fdf3cc Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 8 (Raspbian 8.3.0-6+rpi1)
  configuration: --extra-ldflags=-latomic --arch=armel --target-os=linux --enable-gpl --enable-omx --enable-omx-rpi --enable-nonfree --enable-libfreetype --enable-libx264 --enable-libmp3lame --enable-mmal --enable-indev=alsa --enable-outdev=alsa
  libavutil      56. 58.100 / 56. 58.100
  libavcodec     58.100.100 / 58.100.100
  libavformat    58. 50.100 / 58. 50.100
  libavdevice    58. 11.101 / 58. 11.101
  libavfilter     7. 87.100 /  7. 87.100
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
Input #0, video4linux2,v4l2, from '/dev/video2':
  Duration: N/A, start: 1353.265049, bitrate: N/A
    Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, 30 fps, 30 tbr, 1000k tbn, 2000k tbc
[udp @ 0x38c29f0] attempted to set receive buffer to size 393216 but it only ended up set as 360448
[udp @ 0x38d7b50] attempted to set receive buffer to size 393216 but it only ended up set as 360448
Output #0, rtsp, to 'rtsp://localhost:80/live/stream':
  Metadata:
    encoder         : Lavf58.50.100
    Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, q=2-31, 30 fps, 30 tbr, 90k tbn, 1000k tbc
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
[rtsp @ 0x38fd890] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[rtsp @ 0x38fd890] Non-monotonous DTS in output stream 0:0; previous: 0, current: 0; changing to 1. This may result in incorrect timestamps in the output file.
frame=  348 fps= 18 q=-1.0 size=N/A time=00:00:21.03 bitrate=N/A speed=1.09x


    


    The issue looks like it is bandwidth related or the lack of processing power in the device being used BUT the YUYV works at a high resolution and (taking a completely different approach i.e. not using FFMPEG) I can get a very decent MPEG stream working on the same device.

    


    So any FFMPEG experts out there who can help me with getting the correct parameters for a h264 stream ?