Media (0)

Keyword: - Tags -/performance

No media matching your criteria is available on the site.

Other articles (55)

  • Support for all types of media

    10 April 2011

    Unlike many modern document-sharing programs and platforms, MediaSPIP aims to handle as many document formats as possible, whether they are: images (png, gif, jpg, bmp and others...); audio (MP3, Ogg, Wav and others...); video (Avi, MP4, Ogv, mpg, mov, wmv and others...); textual content, code or other (open office, microsoft office (spreadsheet, presentation), web (html, css), LaTeX, Google Earth) (...)

  • Selection of projects using MediaSPIP

    2 May 2011

    The examples below are representative of specific uses of MediaSPIP for specific projects.
    MediaSPIP farm @ Infini
    The non-profit organization Infini develops hospitality activities, an internet access point, training, innovative projects in the field of information and communication technologies, and website hosting. It plays a unique and prominent role in the Brest (France) area and, at the national level, is one of only half a dozen such associations. Its members (...)

  • Selection of projects using MediaSPIP

    29 April 2011

    The examples below are representative of specific uses of MediaSPIP for particular projects.
    Do you think you have a "remarkable" site built with MediaSPIP? Let us know here.
    MediaSPIP farm @ Infini
    The Infini association develops hospitality activities, an internet access point, training, the running of innovative projects in the field of information and communication technologies, and website hosting. In this area it plays a unique role (...)

On other sites (4662)

  • How to make your plugin multilingual – Introducing the Piwik Platform

    29 October 2014, by Thomas Steur - Development

    This is the next post in our blog series introducing the capabilities of the Piwik platform (our previous post was Generating test data – Introducing the Piwik Platform). This time you’ll learn how to equip your plugin with translations. Users of your plugin will be very thankful that they can use and translate the plugin in their language!

    Getting started

    In this post, we assume that you have already set up your development environment and created a plugin. If not, visit the Piwik Developer Zone where you’ll find the tutorial Setting up Piwik and other Guides that help you to develop a plugin.

    Managing translations

    Piwik is available in over 50 languages and comes with many translations. The core itself provides some basic translations for words like “Visitor” and “Help”. They are stored in the directory /lang. In addition, each plugin can provide its own translations for wordings that are used in this plugin. They are located in /plugins/*/lang. In those directories you’ll find one JSON file for each language. Each language file consists in turn of tokens that belong to a group.

    {
       "MyPlugin":{
           "BlogPost": "Blog post",
           "MyToken": "My translation",
           "InteractionRate": "Interaction Rate"
       }
    }

    A group usually represents the name of a plugin, in this case “MyPlugin”. Within this group, all the tokens are listed on the left side and the related translations on the right side.

    Building a translation key

    As you will see later, to actually translate a word or a sentence you need to know the corresponding translation key. This key is built by combining a group and a token, separated by an underscore. For instance, you can use the key MyPlugin_BlogPost to get a translation of “Blog post”. Defining a new key is as easy as adding a new entry to the “MyPlugin” group.
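
    The key-building rule can be sketched in a few lines of Python. This is only an illustration of the Group_Token naming scheme applied to the JSON shown earlier; the helper name flatten_translations is hypothetical and this is not how Piwik itself loads its language files.

```python
import json

# The same language file content as in the example above.
LANG_FILE = """
{
   "MyPlugin": {
       "BlogPost": "Blog post",
       "MyToken": "My translation",
       "InteractionRate": "Interaction Rate"
   }
}
"""


def flatten_translations(groups: dict) -> dict:
    """Map every 'Group_Token' key to its translated text (hypothetical helper)."""
    return {
        f"{group}_{token}": text
        for group, tokens in groups.items()
        for token, text in tokens.items()
    }


translations = flatten_translations(json.loads(LANG_FILE))
print(translations["MyPlugin_BlogPost"])  # prints "Blog post"
```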

    Providing default translations

    If a translation cannot be found, the English translation is used as a default. You should therefore always provide a default English translation for every key in the file en.json (i.e. /plugins/MyPlugin/lang/en.json).
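
    The fallback behaviour described here can be sketched as follows. This is a minimal Python illustration, not Piwik's implementation: the function lookup, the German strings, and the choice to return the key itself when even English has no entry are all assumptions made for the example.

```python
def lookup(key: str, language: str, catalogs: dict) -> str:
    """Return the text for `key` in `language`, falling back to English."""
    for lang in (language, "en"):
        text = catalogs.get(lang, {}).get(key)
        if text:
            return text
    # Assumption for this sketch: an unknown key is returned unchanged.
    return key


# Hypothetical catalogs: the German file is missing one key.
catalogs = {
    "en": {"MyPlugin_BlogPost": "Blog post", "MyPlugin_MyToken": "My translation"},
    "de": {"MyPlugin_BlogPost": "Blogbeitrag"},
}

print(lookup("MyPlugin_BlogPost", "de", catalogs))  # prints "Blogbeitrag"
print(lookup("MyPlugin_MyToken", "de", catalogs))   # prints "My translation"
```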

    Adding translations for other languages

    This is as easy as creating new files in the lang subdirectory of your plugin. The filename consists of a two-letter ISO 639-1 language code followed by the extension .json. This means German translations go into a file named de.json, and French ones into a file named fr.json. To see a list of the languages you can use, have a look at the /lang directory.
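
    When adding a language file, it can be handy to check which keys it still lacks compared with en.json. The sketch below is one possible way to do that in Python; load_language and missing_tokens are hypothetical helpers written for this example, not part of Piwik, and the German text is made up.

```python
import json
from pathlib import Path
from typing import List


def load_language(lang_dir: Path, code: str) -> dict:
    """Read <lang_dir>/<code>.json, e.g. plugins/MyPlugin/lang/de.json."""
    with open(lang_dir / f"{code}.json", encoding="utf-8") as handle:
        return json.load(handle)


def missing_tokens(default: dict, other: dict) -> List[str]:
    """List the 'Group_Token' keys present in `default` but absent from `other`."""
    missing = []
    for group, tokens in default.items():
        for token in tokens:
            if token not in other.get(group, {}):
                missing.append(f"{group}_{token}")
    return sorted(missing)


# In-memory stand-ins for en.json and de.json.
en = {"MyPlugin": {"BlogPost": "Blog post", "MyToken": "My translation"}}
de = {"MyPlugin": {"BlogPost": "Blogbeitrag"}}

print(missing_tokens(en, de))  # prints ['MyPlugin_MyToken']
```

    With real files, the two dictionaries would come from load_language(Path("plugins/MyPlugin/lang"), "en") and the same call with "de".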

    Reusing translations

    As mentioned, Piwik comes with quite a lot of translations. You can and should reuse them, but be aware that a translation key might be removed or renamed in the future. It is also possible that a translation key was added in a recent version and is therefore not available in older versions of Piwik. We do not currently announce any such changes. Still, 99% of the translation keys do not change, so it is usually a good idea to reuse existing translations, especially when you or your company would otherwise not be able to provide them. To find existing translation keys, go to Settings => Translation search in your Piwik installation. The menu item only appears if development mode is enabled.

    Translations in PHP

    Use the Piwik::translate() function to translate any text in PHP. Simply pass any existing translation key and you will get the translated text in the language of the current user in return. The English translation will be returned in case none for the current language exists.

    $translatedText = Piwik::translate('MyPlugin_BlogPost');

    Translations in Twig Templates

    To translate text in Twig templates, use the translate filter.

    {{ 'MyPlugin_BlogPost'|translate }}

    Contributing translations to Piwik

    Did you know you can contribute translations to Piwik? If you want to improve an existing translation, translate a missing one or add a new language, go to Piwik Translations and sign up for an account. You won’t need any development knowledge to do this.

    Advanced features

    Of course there are more useful things you can do with translations. For instance, you can use placeholders like %s in your translations, and you can use translations in JavaScript as well. If you want to know more about those topics, check out our Internationalization guide. Currently, this guide only covers translations, but we will cover more topics like formatting numbers and handling currencies in the future.
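
    To give a feel for %s placeholders, here is a small Python analogue using Python's own %-formatting. The token MyPlugin_NVisits, its text and the helper translate_with_args are hypothetical; the Internationalization guide remains the reference for how Piwik actually passes placeholder values.

```python
# Hypothetical catalog entry containing one %s placeholder.
TRANSLATIONS = {"MyPlugin_NVisits": "%s visits"}


def translate_with_args(key: str, args=()) -> str:
    """Look up `key` and substitute `args` into its %s placeholders."""
    text = TRANSLATIONS.get(key, key)
    return text % tuple(args) if args else text


print(translate_with_args("MyPlugin_NVisits", [5]))  # prints "5 visits"
```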

    Congratulations, you have learnt how to make your plugin multilingual!

    If you have any feedback regarding our APIs or our guides in the Developer Zone, feel free to send it to us.

  • Computer crashing when using python tools in same script

    5 February 2023, by SL1997

    I am attempting to use the speech recognition toolkit VOSK and the speaker diarization package Resemblyzer to transcribe audio and then identify the speakers in the audio.

    Tools:

    https://github.com/alphacep/vosk-api
    https://github.com/resemble-ai/Resemblyzer

    I can do both things individually, but I run into issues when I try to do them in the same Python script.

    I used the following guide when setting up the diarization system:

    https://medium.com/saarthi-ai/who-spoke-when-build-your-own-speaker-diarization-module-from-scratch-e7d725ee279

    Computer specs are as follows:

    Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz, 3912 Mhz, 2 Core(s), 4 Logical Processor(s)
    32GB RAM

    The following is my code. I am not too sure whether threading is appropriate here, or whether I even implemented it correctly. How can I best optimize this code to achieve the results I am looking for without crashing?

from vosk import Model, KaldiRecognizer
from pydub import AudioSegment
import json
import sys
import os
import subprocess
import datetime
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from resemblyzer.hparams import sampling_rate
from spectralcluster import SpectralClusterer
import threading
import queue
import gc



def recognition(queue, audio, FRAME_RATE):

    model = Model("Vosk_Models/vosk-model-small-en-us-0.15")

    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)

    rec.AcceptWaveform(audio.raw_data)
    result = rec.Result()

    transcript = json.loads(result)#["text"]

    #return transcript
    queue.put(transcript)



def diarization(queue, audio):

    wav = preprocess_wav(audio)
    encoder = VoiceEncoder("cpu")
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
    print(cont_embeds.shape)

    clusterer = SpectralClusterer(
        min_clusters=2,
        max_clusters=100,
        p_percentile=0.90,
        gaussian_blur_sigma=1)

    labels = clusterer.predict(cont_embeds)

    def create_labelling(labels, wav_splits):

        times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
        labelling = []
        start_time = 0

        for i, time in enumerate(times):
            if i > 0 and labels[i] != labels[i - 1]:
                temp = [str(labels[i - 1]), start_time, time]
                labelling.append(tuple(temp))
                start_time = time
            if i == len(times) - 1:
                temp = [str(labels[i]), start_time, time]
                labelling.append(tuple(temp))

        return labelling

    #return
    labelling = create_labelling(labels, wav_splits)
    queue.put(labelling)



def identify_speaker(queue1, queue2):

    transcript = queue1.get()
    labelling = queue2.get()

    for speaker in labelling:

        speakerID = speaker[0]
        speakerStart = speaker[1]
        speakerEnd = speaker[2]

        result = transcript['result']
        words = [r['word'] for r in result if speakerStart < r['start'] < speakerEnd]
        #return
        print("Speaker",speakerID,":",' '.join(words), "\n")





def main():

    queue1 = queue.Queue()
    queue2 = queue.Queue()

    FRAME_RATE = 16000
    CHANNELS = 1

    podcast = AudioSegment.from_mp3("Podcast_Audio/Film-Release-Clip.mp3")
    podcast = podcast.set_channels(CHANNELS)
    podcast = podcast.set_frame_rate(FRAME_RATE)

    first_thread = threading.Thread(target=recognition, args=(queue1, podcast, FRAME_RATE))
    second_thread = threading.Thread(target=diarization, args=(queue2, podcast))
    third_thread = threading.Thread(target=identify_speaker, args=(queue1, queue2))

    first_thread.start()
    first_thread.join()
    gc.collect()

    second_thread.start()
    second_thread.join()
    gc.collect()

    third_thread.start()
    third_thread.join()
    gc.collect()

    # transcript = recognition(podcast,FRAME_RATE)
    #
    # labelling = diarization(podcast)
    #
    # print(identify_speaker(transcript, labelling))


if __name__ == '__main__':
    main()


    


    When I say crash, I mean everything freezes: I have to hold down the power button on the desktop and turn it back on again. There is no blue or blank screen; the machine just freezes while I am in my IDE looking at my code. Any help in resolving this issue would be greatly appreciated.
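
    One thing that might be worth trying, offered as a sketch and not as a confirmed diagnosis of the freeze, is to feed the recognizer in small chunks instead of passing audio.raw_data in a single AcceptWaveform call, which is the pattern used in the vosk-api example scripts. The helpers iter_chunks and transcribe_in_chunks below are hypothetical; the Recognizer protocol only names the KaldiRecognizer methods that the sketch relies on, and the chunk size is an assumption.

```python
import json
from typing import Dict, Iterator, List, Protocol


class Recognizer(Protocol):
    """The subset of vosk.KaldiRecognizer methods used in this sketch."""

    def AcceptWaveform(self, data: bytes) -> bool: ...
    def Result(self) -> str: ...
    def FinalResult(self) -> str: ...


def iter_chunks(raw: bytes, chunk_bytes: int = 8000) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM data.

    chunk_bytes should be even so that 16-bit samples are never split.
    """
    for start in range(0, len(raw), chunk_bytes):
        yield raw[start:start + chunk_bytes]


def transcribe_in_chunks(rec: Recognizer, raw: bytes,
                         chunk_bytes: int = 8000) -> List[Dict]:
    """Collect the word entries from every finished utterance."""
    words: List[Dict] = []
    for chunk in iter_chunks(raw, chunk_bytes):
        # A truthy return value means an utterance has just ended.
        if rec.AcceptWaveform(chunk):
            words.extend(json.loads(rec.Result()).get("result", []))
    # Flush whatever is still buffered after the last chunk.
    words.extend(json.loads(rec.FinalResult()).get("result", []))
    return words
```

    In the script above, this would be called as transcribe_in_chunks(rec, audio.raw_data) after rec.SetWords(True); the returned list holds the same word dictionaries that identify_speaker reads from transcript['result'].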

    


  • What is Google Analytics data sampling and what’s so bad about it?

    16 August 2019, by Joselyn Khor - Analytics Tips, Development

    What is Google Analytics data sampling, and what’s so bad about it?

    Google (2019) explains what data sampling is:

    “In data analysis, sampling is the practice of analysing a subset of all data in order to uncover the meaningful information in the larger data set.”[1]

    In other words, instead of analysing all of the data, GA analyses data only up to a threshold; anything beyond that threshold is an estimate based on patterns in the sample.

    Google’s (2019) data sampling thresholds:

    Ad-hoc queries of your data are subject to the following general thresholds for sampling:
    [Google] Analytics Standard: 500k sessions at the property level for the date range you are using
    [Google] Analytics 360: 100M sessions at the view level for the date range you are using (para. 3) [2]

    This threshold is limiting because your data in GA may become more inaccurate as the traffic to your website increases.

    Say you’re looking through all your traffic data from the last year and find you have 5 million page views. Only 500K of that 5 million is actually analysed! The figures for the remaining 4.5 million (90%) are estimates based on the 500K sample.
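
    The arithmetic behind that example can be checked with a few lines of Python. The function name sampled_share is hypothetical, and the sketch simply treats the threshold as a cap on how many records are analysed, as the example above does.

```python
def sampled_share(total: int, threshold: int) -> float:
    """Fraction of the records that fits inside the sampling threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return min(threshold, total) / total


share = sampled_share(5_000_000, 500_000)
print(f"analysed: {share:.0%}, estimated: {1 - share:.0%}")
# prints "analysed: 10%, estimated: 90%"
```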

    This is a key weapon Google uses to sell to large businesses. In order to increase that threshold for more accurate reporting, upgrading to premium Google Analytics 360 for approximately US$150,000 per year seems to be the only choice.

    What’s so bad about data sampling?

    It’s unfair to say sampled data is to be disregarded completely. There is a calculation ensuring it is representative and can allow you to get good enough insights. However, we don’t encourage it as we don’t just want “good enough” data. We want the actual facts.

    In a recent survey sent to Matomo customers, we found a large proportion of users switched from GA to Matomo due to the data sampling issue.

    There are two reasons why data sampling isn’t preferable:

    1. If the selected sample size is too small, the sample won’t be representative of all the data.
    2. The bigger your website grows, the more inaccurate your reports will become.

    Here is an example of why we don’t fully trust sampled data. Say you have an ecommerce store and your GA revenue reports don’t match your actual sales data because of data sampling. GA may show revenue for the month as $1 million when actual sales were $800K.

    The sampling here has caused an inaccuracy that could have negative financial implications. What you get in the GA report is an estimated dollar figure rather than the actual sales. Making decisions based on inaccurate data can be costly in this case. 
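
    A toy calculation shows how scaling a sample up to the whole can overshoot. This Python sketch uses made-up order values and a naive "sample mean times population size" estimate; it is not Google's sampling algorithm, only an illustration of why an estimate and the actual total can differ.

```python
def estimate_total(sample: list, population_size: int) -> float:
    """Scale the sample mean up to the full population."""
    return sum(sample) / len(sample) * population_size


orders = [10.0, 10.0, 10.0, 10.0, 200.0]  # made-up order values
sample = orders[3:]                       # a sample that happens to include the large order

actual = sum(orders)                             # 240.0
estimated = estimate_total(sample, len(orders))  # (10 + 200) / 2 * 5 = 525.0
print(f"actual: {actual}, estimated from sample: {estimated}")
```

    A sample that happened to miss the large order would undershoot instead, which is the sense in which a small sample may not be representative.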

    Another disadvantage of sampled data is that you might miss opportunities you would have noticed with a view of the whole. For example, you may be unable to see real patterns because part of the data has already been estimated.

    Being unable to see things as they are, and having to rely on the conclusions and assumptions made by GA, is risky. The bigger your business grows, the less you can afford to make business decisions based on assumptions that could be inaccurate.

    If you feel you could be missing out on opportunities because your GA data is sampled data, get 100% accurately reported data. 

    The benefits of 100% accurate data

    Matomo doesn’t use data sampling on any of our products or plans. You get to see all of your data and not a sampled data set.

    Data quality is necessary for high impact decision-making. It’s hard to make strategic changes if you don’t have confidence that your data is reliable and accurate.

    Learn about how Matomo is a serious contender to Google Analytics 360. 

    Now you can import your Google Analytics data directly into Matomo

    If you want to switch to Matomo but are worried about losing all your historic Google Analytics data, you can now import it directly into Matomo with the Google Analytics Importer tool.

    Take the challenge!

    Compare your Google Analytics data (sampled data) against your Matomo data or, if you don’t have Matomo data yet, sign up for our 30-day free trial and start tracking!

    References :

    [1, 2] Google. (2019). About data sampling. Analytics Help. Retrieved August 14, 2019, from https://support.google.com/analytics/answer/2637192