Other articles (30)

  • Submit bugs and patches

    13 April 2011

    Unfortunately, software is never perfect.
    If you think you have found a bug, report it using our ticket system. Please help us fix it by providing the following information:
    • the browser you are using, including the exact version
    • as precise an explanation of the problem as possible
    • if possible, the steps taken that result in the problem
    • a link to the site / page in question
    If you think you have solved the bug, fill in a ticket and attach a corrective patch to it.
    You may also (...)

  • List of compatible distributions

    26 April 2011

    The table below lists the Linux distributions compatible with the automated installation script of MediaSPIP.

    Distribution name   Version name           Version number
    Debian              Squeeze                6.x.x
    Debian              Wheezy                 7.x.x
    Debian              Jessie                 8.x.x
    Ubuntu              The Precise Pangolin   12.04 LTS
    Ubuntu              The Trusty Tahr        14.04

    If you want to help us improve this list, you can give us access to a machine whose distribution is not mentioned above or send the necessary fixes to add (...)

  • Organize by category

    17 May 2013

    In MediaSPIP, a section goes by two names: category and section.
    The documents stored in MediaSPIP can be filed under different categories. To create a category, click "publish a category" in the publish menu at the top right (after logging in). A category can itself be placed inside another category, which makes it possible to build a tree of categories.
    The next time a document is published, the newly created category will be offered (...)

On other sites (6069)

  • New proposed ePrivacy Regulation and why Piwik might not need tracking consent compared to Google Analytics & co

    11 January 2017, by InnoCraft (Community)

    The EU is proposing new ePrivacy Regulations. The proposed Regulation on Privacy and Electronic Communications will increase the protection of people’s private life and open up new opportunities for business.

    The new ePrivacy Regulation proposal

    The proposal mentions several changes, for example to the “Cookie Law”: cookie consent will no longer be needed when the cookies improve the user’s internet experience, for example to remember the shopping cart history or when completing a form over several pages.

    However, consent to track a user’s behaviour may be needed in the future, unless the analytics data collection is hosted on the first-party website.

    From The Register: O’Neil noted a minor change under which tracking visitors to a website for analytics purposes does not require consent, as long as any personal data collected is only processed by the first party.

    First party Analytics respecting privacy

    Piwik is an open-source analytics platform that is used on more than 1 million websites and apps in over 150 countries, and available in more than 50 languages. The difference from other analytics solutions is that you can download and install Piwik on your own infrastructure. Websites and mobile apps tracking users with their own Piwik very likely won’t require consent from their users if these regulations become reality.

    We have regularly written about why privacy matters, and more recently about 11 ways Piwik Analytics helps you to protect your visitors’ privacy.

    Besides the standard Piwik features, there are Premium Features that let businesses and organizations further maximize their success based on the tracked data. Need help hosting Piwik on-premise? InnoCraft is the company of the makers of Piwik, so its team are the Piwik experts and know it best. InnoCraft provides support subscriptions and enterprise packages to help you set up, configure and maintain Piwik on your infrastructure, and also offers training and custom development.

    We’re excited to be building the best digital analytics platform which respects our privacy on the Internet.

    Thank you for being a valued member of the Piwik community!

  • On-premise analytics demand grows as Google Analytics GDPR uncertainties continue

    7 January 2020, by Jake Thornton (Privacy)

    The relationship between Google Analytics and GDPR is a complicated one. Website owners in states like Berlin in Germany are now required to ask users for consent to collect their data. This doesn’t make for the friendliest user experience, and often the website visitor will simply click “no.”

    The problem Google Analytics now presents to website owners in the EU is that the more visitors click “no”, the less accurate your data becomes.

    Why do you need to ask your visitors for consent?

    At this stage it’s simply because Google Analytics collects data for its own purposes. An example of this is using your visitors’ personal data for retargeting purposes across its advertising platforms like Google Ads and YouTube.

    Google’s Privacy & Terms states: “when you visit a website that uses advertising services like AdSense, including analytics tools like Google Analytics, or embeds video content from YouTube, your web browser automatically sends certain information to Google. This includes the URL of the page you’re visiting and your IP address. We may also set cookies on your browser or read cookies that are already there. Apps that use Google advertising services also share information with Google, such as the name of the app and a unique identifier for advertising.”

    The rise of hosting web analytics on-premise

    Managing Google Analytics and GDPR can quickly become complicated, so there’s been an increase in website owners switching from cloud-hosted web analytics platforms, like Google Analytics, to more GDPR compliant alternatives, where you can host web analytics software on your own servers. This is called hosting web analytics on-premise.

    Hosting web analytics on your own servers means:

    No third-parties are involved

    The visitor data your website collects is stored on your own internal infrastructure. This means no third-parties are involved and there’s no risk of personal data being used in the way Google Analytics uses it e.g. sending personal data to its advertising platforms. 

    When you sign up with Google Analytics you sign away control of your user’s personal data. With on-premise website analytics, you own your data and are in full control.

    NOTE: Although Google Analytics uses personal data for its own purposes, not all cloud-hosted web analytics platforms do. For example, Matomo Analytics states that its Cloud-hosted solution does not use any collected personal data for its own purposes and that Matomo has no rights to access or use this personal data.

    You control where in the world your personal data is stored

    Google Analytics servers are based in the USA, Europe and Asia, so it is uncertain where your personal data will end up, and with free Google Analytics you cannot choose which location it goes to.

    Different countries have different laws when it comes to accessing personal data. When you choose to host your web analytics on-premise, you can choose the location of your servers and where the personal data is stored.

    More flexibility

    With self-hosted web analytics platforms like Matomo On-Premise, you can extend the platform to do anything you want without the restrictions that cloud hosted platforms impose.

    You can:

    • Get full access to the source code of open-source solutions, like Matomo
    • Extend the platform however you want for your business
    • Get access to APIs
    • Have no data limitations or restrictions
    • Get RAW data access
    • Have control over security


    So what does the future look like for Google Analytics and GDPR?

    It’s difficult to assess this right now. How exactly GDPR is enforced is still quite unclear.

    What is clear, however, is that website owners in Berlin using Google Analytics are now legally required to ask their visitors for consent to collect personal data. It has been reported that Google Analytics has already received 200,000 complaints in Germany alone, and this trend appears likely to continue across much of the EU.

    When using Google Analytics in the EU you must also ensure your privacy policy is updated so website visitors are aware that data is being collected through Google Analytics for its own purposes.

    Moving to a web analytics on-premise platform

    Matomo Analytics is the #1 open-source web analytics platform in the world and has been rated as an exceptional alternative to Google Analytics. Check the reviews on Capterra.

    Choosing Matomo On-Premise means you can control exactly where your data is stored, you have full flexibility to customise the platform to do what you want and it’s FREE.

    Matomo’s mission is to give control back to website owners and the team has designed the platform so that moving away from Google Analytics is seamless. Matomo offers most of your favourite Google Analytics features, a leaner interface to navigate, and the option to add free and paid premium features that Google Analytics can’t even offer you.

    And now you can import your historical Google Analytics data directly into your Matomo with the Google Analytics Importer plugin.

    And if you can’t host web analytics on your own servers ...

    Hosting web analytics on-premise is not an option for all businesses as you do need the internal infrastructure and technical knowledge to host your own platform.

    If you can’t self-host, then Matomo has a Cloud hosted solution you can easily install and operate like Google Analytics, which is hosted on Matomo’s servers in the EU. 

    The GDPR advantages of choosing Matomo Cloud over Google Analytics are:

    • Servers are secure and based in the EU (strict laws forbid outside access)
    • 100% data ownership – we never use data for our own purposes
    • You can export your data anytime and switch to Matomo On-Premise whenever you like
    • User-privacy protection
    • Advanced GDPR Manager and data anonymisation features which GA doesn’t offer

    Interested in learning more?

    If you want to learn more about why users are making the move from Google Analytics to Matomo, check out our Matomo Analytics vs Google Analytics comparison page.


  • How to Stream Audio from Google Cloud Storage in Chunks and Convert Each Chunk to WAV for Whisper Transcription

    25 July, by Douglas Landvik

    I'm working on a project where I need to transcribe audio stored in a Google Cloud Storage bucket using OpenAI's Whisper model. The audio is stored in WebM format with Opus encoding, and due to the file size, I'm streaming the audio in 30-second chunks.

    


    To convert each chunk to WAV (16 kHz, mono, 16-bit PCM) compatible with Whisper, I'm using FFmpeg. The first chunk converts successfully, but subsequent chunks fail to convert. I suspect this is because each chunk lacks the WebM container's header, which FFmpeg needs to interpret the Opus codec correctly.
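    One way to check the suspicion above is to test whether each downloaded byte range begins with the EBML magic number that opens a well-formed WebM (Matroska) file. This is only a minimal sketch: the function name is hypothetical, and the expectation is simply that the first range of a blob passes the check while ranges cut from the middle of the file do not.

```python
# The four-byte EBML header ID that a well-formed Matroska/WebM file starts with.
EBML_MAGIC = bytes([0x1A, 0x45, 0xDF, 0xA3])


def starts_with_ebml_header(chunk: bytes) -> bool:
    """Return True if the chunk begins with the EBML magic number."""
    return chunk[:4] == EBML_MAGIC


# Hypothetical byte ranges: one taken from the start of a file, one from the middle.
first_range = EBML_MAGIC + b"\x00" * 16
later_range = b"\x00" * 20
print(starts_with_ebml_header(first_range))  # True
print(starts_with_ebml_header(later_range))  # False
```

    A range that fails this check carries no container header of its own, so FFmpeg has nothing to identify the format from when that range is passed in by itself.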

    


    Here’s a simplified version of my approach:

    • Download chunk: I download each chunk from GCS as bytes.
    • Convert with FFmpeg: I pass the bytes to FFmpeg to convert each chunk from WebM/Opus to WAV.
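    The download step relies on byte ranges whose end offset is inclusive, which is how the `start` and `end` arguments of `Blob.download_as_bytes` are understood here. The range arithmetic can be sketched on its own, without any GCS access; the helper name is hypothetical.

```python
def inclusive_byte_ranges(total_size: int, chunk_size: int):
    """Yield (start, end) pairs that cover total_size bytes, with inclusive ends."""
    start = 0
    while start < total_size:
        # The last range is clamped so it never runs past the final byte.
        end = min(start + chunk_size - 1, total_size - 1)
        yield start, end
        start += chunk_size


print(list(inclusive_byte_ranges(10, 4)))  # [(0, 3), (4, 7), (8, 9)]
```

    These ranges cut the compressed file at arbitrary byte offsets, not at audio frame or container boundaries.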

    


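# Standard-library and Google client imports used below. logger, send_discord_alert,
# notification_manager, detect_audio_format and the Consultation*/TemplateService
# classes are assumed to be defined elsewhere in the project (not shown here).
import asyncio
import io
import json
import os
import subprocess

from google.cloud import storage
from google.oauth2 import service_account


async def handle_transcription_and_notify(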
    consultation_service: ConsultationService,
    consultation_processor: ConsultationProcessor,
    consultation: Consultation,
    language: str,
    notes: str,
    clinic_id: str,
    vet_email: str,
    trace_id: str,
    blob_path: str,
    max_retries: int = 3,
    retry_delay: int = 5,
    max_concurrent_tasks: int = 3
):
    """
    Handles the transcription process by streaming the file from GCS, converting to a compatible format, 
    and notifying the client via WebSocket.
    """
    chunk_duration_sec = 30  # 30 seconds per chunk
    logger.info(f"Starting transcription process for consultation {consultation.consultation_id}",
                extra={'trace_id': trace_id})

    # Initialize GCS client
    service_account_key = os.environ.get('SERVICE_ACCOUNT_KEY_BACKEND')
    if not service_account_key:
        logger.error("Service account key not found in environment variables", extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Service account key not found for consultation {consultation.consultation_id}.\nTrace ID: {trace_id}"
        )
        return

    try:
        service_account_info = json.loads(service_account_key)
        credentials = service_account.Credentials.from_service_account_info(service_account_info)
    except Exception as e:
        logger.error(f"Error loading service account credentials: {str(e)}", extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error loading service account credentials for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        return

    storage_client = storage.Client(credentials=credentials)
    bucket_name = 'vetz_consultations'
    blob = storage_client.bucket(bucket_name).get_blob(blob_path)
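    # NOTE: 16000 * 2 is the byte rate of 16 kHz, 16-bit mono PCM. The blob holds
    # compressed WebM/Opus, so a range of this size is not 30 seconds of audio.
    bytes_per_second = 16000 * 2  # 32,000 bytes per second
    chunk_size_bytes = 30 * bytes_per_second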
    size = blob.size

    async def stream_blob_in_chunks(blob, chunk_size):
        loop = asyncio.get_running_loop()
        start = 0
        size = blob.size
        while start < size:
            end = min(start + chunk_size - 1, size - 1)
            try:
                logger.info(f"Requesting chunk from {start} to {end}", extra={'trace_id': trace_id})
                chunk = await loop.run_in_executor(
                    None, lambda: blob.download_as_bytes(start=start, end=end)
                )
                if not chunk:
                    break
                logger.info(f"Yielding chunk from {start} to {end}, size: {len(chunk)} bytes",
                            extra={'trace_id': trace_id})
                yield chunk
                start += chunk_size
            except Exception as e:
                logger.error(f"Error downloading chunk from {start} to {end}: {str(e)}", exc_info=True,
                             extra={'trace_id': trace_id})
                raise

    async def convert_to_wav(chunk_bytes, chunk_idx):
        """
        Convert audio chunk to WAV format compatible with Whisper, ensuring it's 16 kHz, mono, and 16-bit PCM.
        """
        try:
            logger.debug(f"Processing chunk {chunk_idx}: size = {len(chunk_bytes)} bytes")

            detected_format = await detect_audio_format(chunk_bytes)
            logger.info(f"Detected audio format for chunk {chunk_idx}: {detected_format}")
            input_io = io.BytesIO(chunk_bytes)
            output_io = io.BytesIO()

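            # FFmpeg command intended to produce 16 kHz, mono, 16-bit PCM WAV.
            # NOTE: the input flags below make FFmpeg read stdin as raw s16le PCM
            # (48 kHz, mono) rather than demuxing it as WebM/Opus.
            # "-loglevel debug" adds verbose diagnostics on stderr.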
            ffmpeg_command = [
                "ffmpeg",
                "-loglevel", "debug",
                "-f", "s16le",            # Treat input as raw PCM data
                "-ar", "48000",           # Set input sample rate
                "-ac", "1",               # Set input to mono
                "-i", "pipe:0",
                "-ar", "16000",           # Set output sample rate to 16 kHz
                "-ac", "1",               # Ensure mono output
                "-sample_fmt", "s16",     # Set output format to 16-bit PCM
                "-f", "wav",              # Output as WAV format
                "pipe:1"
            ]

            process = subprocess.Popen(
                ffmpeg_command,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE
            )

            stdout, stderr = process.communicate(input=input_io.read())

            if process.returncode == 0:
                logger.info(f"FFmpeg conversion completed successfully for chunk {chunk_idx}")
                output_io.write(stdout)
                output_io.seek(0)

                # Save the WAV file locally for listening
                output_dir = "converted_chunks"
                os.makedirs(output_dir, exist_ok=True)
                file_path = os.path.join(output_dir, f"chunk_{chunk_idx}.wav")

                with open(file_path, "wb") as f:
                    f.write(stdout)
                logger.info(f"Chunk {chunk_idx} saved to {file_path}")

                return output_io
            else:
                logger.error(f"FFmpeg failed for chunk {chunk_idx} with return code {process.returncode}")
                logger.error(f"Chunk {chunk_idx} - FFmpeg stderr: {stderr.decode()}")
                return None

        except Exception as e:
            logger.error(f"Unexpected error in FFmpeg conversion for chunk {chunk_idx}: {str(e)}")
            return None

    async def transcribe_chunk(idx, chunk_bytes):
        for attempt in range(1, max_retries + 1):
            try:
                logger.info(f"Transcribing chunk {idx + 1} (attempt {attempt}).", extra={'trace_id': trace_id})

                # Convert to WAV format
                wav_io = await convert_to_wav(chunk_bytes, idx)
                if not wav_io:
                    logger.error(f"Failed to convert chunk {idx + 1} to WAV format.")
                    return ""

                wav_io.name = "chunk.wav"
                chunk_transcription = await consultation_processor.transcribe_audio_whisper(wav_io)
                logger.info(f"Chunk {idx + 1} transcribed successfully.", extra={'trace_id': trace_id})
                return chunk_transcription
            except Exception as e:
                logger.error(f"Error transcribing chunk {idx + 1} (attempt {attempt}): {str(e)}", exc_info=True,
                             extra={'trace_id': trace_id})
                if attempt < max_retries:
                    await asyncio.sleep(retry_delay)
                else:
                    await send_discord_alert(
                        f"Max retries reached for chunk {idx + 1} in consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
                    )
                    return ""  # Return empty string for failed chunk

    await notification_manager.send_personal_message(
        f"Consultation {consultation.consultation_id} is being transcribed.", vet_email
    )

    try:
        idx = 0
        full_transcription = []
        async for chunk in stream_blob_in_chunks(blob, chunk_size_bytes):
            transcription = await transcribe_chunk(idx, chunk)
            if transcription:
                full_transcription.append(transcription)
            idx += 1

        combined_transcription = " ".join(full_transcription)
        consultation.full_transcript = (consultation.full_transcript or "") + " " + combined_transcription
        consultation_service.save_consultation(clinic_id, vet_email, consultation)
        logger.info(f"Transcription saved for consultation {consultation.consultation_id}.",
                    extra={'trace_id': trace_id})

    except Exception as e:
        logger.error(f"Error during transcription process: {str(e)}", exc_info=True, extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error during transcription process for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        return

    await notification_manager.send_personal_message(
        f"Consultation {consultation.consultation_id} has been transcribed.", vet_email
    )

    try:
        template_service = TemplateService()
        medical_record_template = template_service.get_template_by_name(
            consultation.medical_record_template_id).sections

        sections = await consultation_processor.extract_structured_sections(
            transcription=consultation.full_transcript,
            notes=notes,
            language=language,
            template=medical_record_template,
        )
        consultation.sections = sections
        consultation_service.save_consultation(clinic_id, vet_email, consultation)
        logger.info(f"Sections processed for consultation {consultation.consultation_id}.",
                    extra={'trace_id': trace_id})
    except Exception as e:
        logger.error(f"Error processing sections for consultation {consultation.consultation_id}: {str(e)}",
                     exc_info=True, extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error processing sections for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        raise

    await notification_manager.send_personal_message(
        f"Consultation {consultation.consultation_id} is fully processed.", vet_email
    )
    logger.info(f"Successfully processed consultation {consultation.consultation_id}.",
                extra={'trace_id': trace_id})
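
    One possible way around the missing-header problem, sketched under assumptions rather than taken from the code above: let a single FFmpeg process decode the complete WebM/Opus stream to raw PCM, then cut the decoded PCM into 30-second pieces. On decoded PCM the 16000 * 2 bytes-per-second arithmetic does hold. The names `decode_command`, `pcm_to_wav` and `split_pcm` are hypothetical, and the sketch assumes the whole blob is fed to FFmpeg's stdin in order.

```python
import io
import wave

SAMPLE_RATE = 16000   # Hz, the target rate from the question
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)
CHANNELS = 1          # mono


def decode_command() -> list:
    """FFmpeg arguments: read a complete WebM/Opus stream on stdin, write raw PCM to stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", "pipe:0",             # let FFmpeg probe the container from the stream start
        "-ar", str(SAMPLE_RATE),    # resample to 16 kHz
        "-ac", str(CHANNELS),       # downmix to mono
        "-f", "s16le",              # headerless 16-bit little-endian PCM output
        "pipe:1",
    ]


def pcm_to_wav(pcm: bytes) -> io.BytesIO:
    """Wrap raw PCM bytes in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    buf.seek(0)
    return buf


def split_pcm(pcm: bytes, seconds: int = 30):
    """Yield in-memory WAV files, each holding at most `seconds` of audio."""
    step = seconds * SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS
    for start in range(0, len(pcm), step):
        yield pcm_to_wav(pcm[start:start + step])


# Usage with 65 seconds of silence standing in for decoded audio.
silence = bytes(SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * 65)
print(len(list(split_pcm(silence))))  # 3
```

    Each yielded buffer is a complete WAV file with its own header, so it can be handed to the transcription step without depending on any earlier chunk. Whether decoding the whole file in one pass is acceptable depends on the file sizes involved; that trade-off is not settled here.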