Recherche avancée

Médias (1)

Mot : - Tags -/berlin

Autres articles (84)

  • Pas question de marché, de cloud etc...

    10 avril 2011

    Le vocabulaire utilisé sur ce site essaie d’éviter toute référence à la mode qui fleurit allègrement
    sur le web 2.0 et dans les entreprises qui en vivent.
    Vous êtes donc invité à bannir l’utilisation des termes "Brand", "Cloud", "Marché" etc...
    Notre motivation est avant tout de créer un outil simple, accessible à pour tout le monde, favorisant
    le partage de créations sur Internet et permettant aux auteurs de garder une autonomie optimale.
    Aucun "contrat Gold ou Premium" n’est donc prévu, aucun (...)

  • Activation de l’inscription des visiteurs

    12 avril 2011, par

    Il est également possible d’activer l’inscription des visiteurs ce qui permettra à tout un chacun d’ouvrir soit même un compte sur le canal en question dans le cadre de projets ouverts par exemple.
    Pour ce faire, il suffit d’aller dans l’espace de configuration du site en choisissant le sous menus "Gestion des utilisateurs". Le premier formulaire visible correspond à cette fonctionnalité.
    Par défaut, MediaSPIP a créé lors de son initialisation un élément de menu dans le menu du haut de la page menant (...)

  • Ecrire une actualité

    21 juin 2013, par

    Présentez les changements dans votre MédiaSPIP ou les actualités de vos projets sur votre MédiaSPIP grâce à la rubrique actualités.
    Dans le thème par défaut spipeo de MédiaSPIP, les actualités sont affichées en bas de la page principale sous les éditoriaux.
    Vous pouvez personnaliser le formulaire de création d’une actualité.
    Formulaire de création d’une actualité Dans le cas d’un document de type actualité, les champs proposés par défaut sont : Date de publication ( personnaliser la date de publication ) (...)

Sur d’autres sites (9772)

  • Google Analytics Privacy Issues : Is It Really That Bad ?

    2 juin 2022, par Erin

    If you find yourself asking : “What’s the deal with Google Analytics privacy ?”, you probably have some second thoughts. 

    Your hunch is right. Google Analytics (GA) is a popular web analytics tool, but it’s far from being perfect when it comes to respecting users’ privacy. 

    This post helps you understand tremendous Google Analytics privacy concerns users, consumers and regulators expressed over the years.

    In this blog, we’ll cover :

    What Does Google Analytics Collect About Users ? 

    To understand Google Analytics privacy issues, you need to know how Google treats web users’ data. 

    By default, Google Analytics collects the following information : 

    • Session statistics — duration, page(s) viewed, etc. 
    • Referring website details — a link you came through or keyword used. 
    • Approximate geolocation — country, city. 
    • Browser and device information — mobile vs desktop, OS usage, etc. 

    Google obtains web analytics data about users via two means : an on-site Google Analytics tracking code and cookies.

    A cookie is a unique identifier (ID) assigned to each user visiting a web property. Each cookie stores two data items : unique user ID and website name. 

    With the help of cookies, web analytics solutions can recognise returning visitors and track their actions across the website(s).

    First-party vs third-party cookies
    • First party cookies are generated by one website and collect user behaviour data from said website only. 
    • Third-party cookies are generated by a third-party website object (for example, an ad) and can track user behaviour data across multiple websites. 

    As it’s easy to imagine, third-party cookies are a goldmine for companies selling online ads. Essentially, they allow ad platforms to continue watching how the user navigates the web after clicking a certain link. 

    Yet, people have little clue as to which data they are sharing and how it is being used. Also, user consent to tracking across websites is only marginally guaranteed by existing Google Analytics controls. 

    Why Third-Party Cookie Data Collection By GA Is Problematic 

    Cookies can transmit personally identifiable information (PII) such as name, log in details, IP address, saved payment method and so on. Some of these details can end up with advertisers without consumers’ direct knowledge or consent.

    Regulatory frameworks such as General Data Protection Regulation (GDPR) in Europe and California Consumer Privacy Act (CCPA) emerged as a response to uncontrolled user behaviour tracking.

    Under regulatory pressure, Big Tech companies had to adapt their data collection process.

    Apple was the first to implement by-default third-party blocking in the Safari browser. Then added a tracking consent mechanism for iPhone users starting from iOS 15.2 and later. 

    Google, too, said it would drop third-party cookie usage after The European Commission and UK’s Competition and Markets Authority (CMA) launched antitrust investigations into its activity. 

    To shake off the data watchdogs, Google released a Privacy Sandbox — a set of progressive tech, operational and compliance changes for ensuring greater consumer privacy. 

    Google’s biggest promise : deprecate third-party cookies usage for all web and mobile products. 

    Originally, Google promised to drop third-party cookies by 2022, but that didn’t happen. Instead, Google delayed cookie tracking depreciation for Chrome until the second half of 2023

    Why did they push back on this despite hefty fines from regulators ?

    Because online ads make Google a lot of money.

    In 2021, Alphabet Inc (parent company of Google), made $256.7 billion in revenue, of which $209.49 billion came from selling advertising. 

    Lax Google Analytics privacy enforcement — and its wide usage by website owners — help Google make those billions from collecting and selling user data. 

    How Google Uses Collected Google Analytics Data for Advertising 

    Over 28 million websites (or roughly 85% of the Internet) have Google Analytics tracking codes installed. 

    Even if one day we get a Google Analytics version without cookies, it still won’t address all the privacy concerns regulators and consumers have. 

    Over the years, Google has accumulated an extensive collection of user data. The company’s engineers used it to build state-of-the-art deep learning models, now employed to build advanced user profiles. 

    Deep learning is the process of training a machine to recognise data patterns. Then this “knowledge” is used to produce highly-accurate predictive insights. The more data you have for model training — the better its future accuracy will be. 

    Google has amassed huge deposits of data from its collection of products — GA, YouTube, Gmail, Google Docs and Google Maps among others. Now they are using this data to build a third-party cookies-less alternative mechanism for modelling people’s preferences, habits, lifestyles, etc. 

    Their latest model is called Google Topics. 

    This comes only after Google’s failed attempt to replace cookie-based training with Federated Learning of Cohorts (FLoC) model. But the solution wasn’t offering enough user transparency and user controls among other issues.

    Google Topics
    Source : Google Blog

    Google Topics promises to limit the granularity of data advertisers get about users. 

    But it’s still a web user surveillance method. With Google Topics, the company will continue collecting user data via Chrome (and likely other Google products) — and share it with advertisers. 

    Because as we said before : Google is in the business of profiting off consumers’ data. 

    Two Major Ways Google Takes Advantage of Customer Data

    Every bit of data Google collects across its ecosystem of products can be used in two ways :

    • For ad targeting and personalisation 
    • To improve Google’s products 

    The latter also helps the former. 

    Advanced Ad Personalisation and Targeting

    GA provides the company with ample data on users’ 

    • Recent and frequent searches 
    • Location history
    • Visited websites
    • Used apps 
    • Videos and ads viewed 
    • Personal data like age or gender 

    The company’s privacy policy explicitly states that :

    Google Analytics Privacy Policy
    Source : Google

    Google also admits to using collected data to “measure the effectiveness of advertising” and “personalise content and ads you see on Google.” 

    But there are no further elaborations on how exactly customers’ data is used — and what you can do to prevent it from being shared with third parties. 

    In some cases, Google also “forgets” to inform users about its in-product tracking.

    Journalists from CNBC and The New York Times independently concluded that Google monitors users’ Gmail activity. In particular, the company scans your inbox for recent purchases, trips, flights and bills notifications. 

    While Google says that this information isn’t sold to advertisers (directly), they still may use the “saved information about your orders in other Google services”. 

    Once again, this means you have little control or knowledge of subsequent data usage. 

    Improving Product Usability 

    Google has many “arms” to collect different data points — from user’s search history to frequently-travelled physical routes. 

    They also reserve the right to use these insights for improving existing products. 

    Here’s what it means : by combining different types of data points obtained from various products, Google can pierce a detailed picture of a person’s life. Even if such user profile data is anonymised, it is still alarmingly accurate. 

    Douglas Schmidt, a computer science researcher at Vanderbilt University, well summarised the matter : 

    “[Google’s] business model is to collect as much data about you as possible and cross-correlate it so they can try to link your online persona with your offline persona. This tracking is just absolutely essential to their business. ‘Surveillance capitalism’ is a perfect phrase for it.”

    Google Data Collection Obsession Is Backed Into Its Business Model 

    OK, but Google offers some privacy controls to users ? Yes. Google only sees and uses the information you voluntarily enter or permit them to access. 

    But as the Washington Post correspondent points out :

    “[Big Tech] companies get to set all the rules, as long as they run those rules by consumers in convoluted terms of service that even those capable of decoding the legalistic language rarely bother to read. Other mechanisms for notice and consent, such as opt-outs and opt-ins, create similar problems. Control for the consumer is mostly an illusion.”

    Google openly claims to be “one of many ad networks that personalise ads based on your activity online”. 

    The wrinkle is that they have more data than all other advertising networks (arguably combined). This helps Google sell high-precision targeting and contextually personalised ads for billions of dollars annually.

    Given that Google has stakes in so many products — it’s really hard to de-Google your business and minimise tracking and data collection from the company.

    They are also creating a monopoly on data collection and ownership. This fact makes regulators concerned. The 2021 antitrust lawsuit from the European Commission says : 

    “The formal investigation will notably examine whether Google is distorting competition by restricting access by third parties to user data for advertising purposes on websites and apps while reserving such data for its own use.”

    In other words : By using consumer data to its unfair advantage, Google allegedly shuts off competition.

    But that’s not the only matter worrying regulators and consumers alike. Over the years, Google also received numerous other lawsuits for breaching people’s privacy, over and over again. 

    Here’s a timeline : 

    Separately, Google has a very complex history with GDPR compliance

    How Google Analytics Contributes to the Web Privacy Problem 

    Google Analytics is the key puzzle piece that supports Google’s data-driven business model. 

    If Google was to release a privacy-focused Google Analytics alternative, it’d lose access to valuable web users’ data and a big portion of digital ad revenues. 

    Remember : Google collects more data than it shares with web analytics users and advertisers. But they keep a lot of it for personal usage — and keep looking for ways to share this intel with advertisers (in a way that keeps regulators off their tail).

    For Google Analytics to become truly ethical and privacy-focused, Google would need to change their entire revenue model — which is something they are unlikely to do.

    Where does this leave Google Analytics users ? 

    In a slippery territory. By proxy, companies using GA are complicit with Google’s shady data collection and usage practice. They become part of the problem.

    In fact, Google Analytics usage opens a business to two types of risks : 

    • Reputational. 77% of global consumers say that transparency around how data is collected and used is important to them when interacting with different brands. That’s why data breaches and data misuse by brands lead to major public outrages on social media and boycotts in some cases. 
    • Legal. EU regulators are on a continuous crusade against Google Analytics 4 (GA4) as it is in breach of GDPR. French and Austrian watchdogs ruled the “service” illegal. Since Google Analytics is not GDPR compliant, it opens any business using it to lawsuits (which is already happening).

    But there’s a way out.

    Choose a Privacy-Friendly Google Analytics Alternative 

    Google Analytics is a popular web analytics service, but not the only one available. You have alternatives such as Matomo. 

    Our guiding principle is : respecting privacy.

    Unlike Google Analytics, we leave data ownership 100% in users’ hands. Matomo lets you implement privacy-centred controls for user data collection.

    Plus, you can self-host Matomo On-Premise or choose Matomo Cloud with data securely stored in the EU and in compliance with GDPR.

    The best part ? You can try our ethical alternative to Google Analytics for free. No credit card required ! Start your free 21-day trial now

  • Google Optimize vs Matomo A/B Testing : Everything You Need to Know

    17 mars 2023, par Erin — Analytics Tips

    Google Optimize is a popular A/B testing tool marketers use to validate the performance of different marketing assets, website design elements and promotional offers. 

    But by September 2023, Google will sunset both free and paid versions of the Optimize product. 

    If you’re searching for an equally robust, but GDPR compliant, privacy-friendly alternative to Google Optimize, have a look at Matomo A/B Testing

    Integrated with our analytics platform and conversion rate optimisation (CRO) tools, Matomo allows you to run A/B and A/B/n tests without any usage caps or compromises in user privacy.

    Disclaimer : Please note that the information provided in this blog post is for general informational purposes only and is not intended to provide legal advice. Every situation is unique and requires a specific legal analysis. If you have any questions regarding the legal implications of any matter, please consult with your legal team or seek advice from a qualified legal professional.

    Google Optimize vs Matomo : Key Capabilities Compared 

    This guide shows how Matomo A/B testing stacks against Google Optimize in terms of features, reporting, integrations and pricing.

    Supported Platforms 

    Google Optimize supports experiments for dynamic websites and single-page mobile apps only. 

    If you want to run split tests in mobile apps, you’ll have to do so via Firebase — Google’s app development platform. It also has a free tier but paid usage-based subscription kicks in after your product(s) reaches a certain usage threshold. 

    Google Optimize also doesn’t support CRO experiments for web or desktop applications, email campaigns or paid ad campaigns.Matomo A/B Testing, in contrast, allows you to run experiments in virtually every channel. We have three installation options — using JavaScript, server-side technology, or our mobile tracking SDK. These allow you to run split tests in any type of web or mobile app (including games), a desktop product, or on your website. Also, you can do different email marketing tests (e.g., compare subject line variants).

    A/B Testing 

    A/B testing (split testing) is the core feature of both products. Marketers use A/B testing to determine which creative elements such as website microcopy, button placements and banner versions, resonate better with target audiences. 

    You can benchmark different versions against one another to determine which variation resonates more with users. Or you can test an A version against B, C, D and beyond. This is called A/B/n testing. 

    Both Matomo A/B testing and Google Optimize let you test either separate page elements or two completely different landing page designs, using redirect tests. You can show different variants to different user groups (aka apply targeting criteria). For example, activate tests only for certain device types, locations or types of on-site behaviour. 

    The advantage of Matomo is that we don’t limit the number of concurrent experiments you can run. With Google Optimize, you’re limited to 5 simultaneous experiments. Likewise, 

    Matomo lets you select an unlimited number of experiment objectives, whereas Google caps the maximum choice to 3 predefined options per experiment. 

    Objectives are criteria the underlying statistical model will use to determine the best-performing version. Typically, marketers use metrics such as page views, session duration, bounce rate or generated revenue as conversion goals

    Conversions Report Matomo

    Multivariate testing (MVT)

    Multivariate testing (MVT) allows you to “pack” several A/B tests into one active experiment. In other words : You create a stack of variants to determine which combination drives the best marketing outcomes. 

    For example, an MVT experiment can include five versions of a web page, where each has a different slogan, product image, call-to-action, etc. Visitors are then served with a different variation. The tracking code collects data on their behaviours and desired outcomes (objectives) and reports the results.

    MVT saves marketers time as it’s a great alternative to doing separate A/B tests for each variable. Both Matomo and Google Optimize support this feature. However, Google Optimize caps the number of possible combinations at 16, whereas Matomo has no limits. 

    Redirect Tests

    Redirect tests, also known as split URL tests, allow you to serve two entirely different web page versions to users and compare their performance. This option comes in handy when you’re redesigning your website or want to test a localised page version in a new market. 

    Also, redirect tests are a great way to validate the performance of bottom-of-the-funnel (BoFU) pages as a checkout page (for eCommerce websites), a pricing page (for SaaS apps) or a contact/booking form (for a B2B service businesses). 

    You can do split URL tests with Google Optimize and Matomo A/B Testing. 

    Experiment Design 

    Google Optimize provides a visual editor for making simple page changes to your website (e.g., changing button colour or adding several headline variations). You can then preview the changes before publishing an experiment. For more complex experiments (e.g., testing different page block sequences), you’ll have to codify experiments using custom JavaScript, HTML and CSS.

    In Matomo, all A/B tests are configured on the server-side (i.e., by editing your website’s raw HTML) or client-side via JavaScript. Afterwards, you use the Matomo interface to start or schedule an experiment, set objectives and view reports. 

    Experiment Configuration 

    Marketers know how complex customer journeys can be. Multiple factors — from location and device to time of the day and discount size — can impact your conversion rates. That’s why a great CRO app allows you to configure multiple tracking conditions. 

    Matomo A/B testing comes with granular controls. First of all, you can decide which percentage of total web visitors participate in any given experiment. By default, the number is set to 100%, but you can change it to any other option. 

    Likewise, you can change which percentage of traffic each variant gets in an experiment. For example, your original version can get 30% of traffic, while options A and B receive 40% each. We also allow users to specify custom parameters for experiment participation. You can only show your variants to people in specific geo-location or returning visitors only. 

    Finally, you can select any type of meaningful objective to evaluate each variant’s performance. With Matomo, you can either use standard website analytics metrics (e.g., total page views, bounce rate, CTR, visit direction, etc) or custom goals (e.g., form click, asset download, eCommerce order, etc). 

    In other words : You’re in charge of deciding on your campaign targeting criteria, duration and evaluation objectives.

    A free Google Optimize account comes with three main types of user targeting options : 

    • Geo-targeting at city, region, metro and country levels. 
    • Technology targeting  by browser, OS or device type, first-party cookie, etc. 
    • Behavioural targeting based on metrics like “time since first arrival” and “page referrer” (referral traffic source). 

    Users can also configure other types of tracking scenarios (for example to only serve tests to signed-in users), using condition-based rules

    Reporting 

    Both Matomo and Google Optimize use different statistical models to evaluate which variation performs best. 

    Matomo relies on statistical hypothesis testing, which we use to count unique visitors and report on conversion rates. We analyse all user data (with no data sampling applied), meaning you get accurate reporting, based on first-hand data, rather than deductions. For that reason, we ask users to avoid drawing conclusions before their experiment participation numbers reach a statistically significant result. Typically, we recommend running an experiment for at least several business cycles to get a comprehensive report. 

    Google Optimize, in turn, uses Bayesian inference — a statistical method, which relies on a random sample of users to compare the performance rates of each creative against one another. While a Bayesian model generates CRO reports faster and at a bigger scale, it’s based on inferences.

    Model developers need to have the necessary skills to translate subjective prior beliefs about the probability of a certain event into a mathematical formula. Since Google Optimize is a proprietary tool, you cannot audit the underlying model design and verify its accuracy. In other words, you trust that it was created with the right judgement. 

    In comparison, Matomo started as an open-source project, and our source code can be audited independently by anyone at any time. 

    Another reporting difference to mind is the reporting delays. Matomo Cloud generates A/B reports within 6 hours and in only 1 hour for Matomo On-Premise. Google Optimize, in turn, requires 12 hours from the first experiment setup to start reporting on results. 

    When you configure a test experiment and want to quickly verify that everything is set up correctly, this can be an inconvenience.

    User Privacy & GDPR Compliance 

    Google Optimize works in conjunction with Google Analytics, which isn’t GDPR compliant

    For all website traffic from the EU, you’re therefore obliged to show a cookie consent banner. The kicker, however, is that you can only show an Optimize experiment after the user gives consent to tracking. If the user doesn’t, they will only see an original page version. Considering that almost 40% of global consumers reject cookie consent banners, this can significantly affect your results.

    This renders Google Optimize mostly useless in the EU since it would only allow you to run tests with a fraction ( 60%) of EU traffic — and even less if you apply any extra targeting criteria. 

    In comparison, Matomo is fully GDPR compliant. Therefore, our users are legally exempt from displaying cookie-consent banners in most EU markets (with Germany and the UK being an exception). Since Matomo A/B testing is part of Matomo web analytics, you don’t have to worry about GDPR compliance or breaches in user privacy. 

    Digital Experience Intelligence 

    You can get comprehensive statistical data on variants’ performance with Google Optimize. But you don’t get further insights on why some tests are more successful than others. 

    Matomo enables you to collect more insights with two extra features :

    • User session recordings : Monitor how users behave on different page versions. Observe clicks, mouse movements, scrolls, page changes, and form interactions to better understand the users’ cumulative digital experience. 
    • Heatmaps : Determine which elements attract the most users’ attention to fine-tune your split tests. With a standard CRO tool, you only assume that a certain page element does matter for most users. A heatmap can help you determine for sure. 

    Both of these features are bundled into your Matomo Cloud subscription

    Integrations 

    Both Matomo and Google Optimize integrate with multiple other tools. 

    Google Optimize has native integrations with other products in the marketing family — GA, Google Ads, Google Tag Manager, Google BigQuery, Accelerated Mobile Pages (AMP), and Firebase. Separately, other popular marketing apps have created custom connectors for integrating Google Optimize data. 

    Matomo A/B Testing, in turn, can be combined with other web analytics and CRO features such as Funnels, Multi-Channel Attribution, Tag Manager, Form Analytics, Heatmaps, Session Recording, and more ! 

    You can also conveniently export your website analytics or CRO data using Matomo Analytics API to analyse it in another app. 

    Pricing 

    Google Optimize is a free tool but has usage caps. If you want to schedule more than 5 concurrent experiments or test more than 16 variants at once, you’ll have to upgrade to Optimize 360. Optimize 360 prices aren’t listed publicly but are said to be closer to six figures per year. 

    Matomo A/B Testing is available with every Cloud subscription (starting from €19) and Matomo On-Premise users can also get A/B Testing as a plugin (starting from €199/year). In each case, there are no caps or data limits. 

    Google Optimize vs Matomo A/B Testing : Comparison Table

    Features/capabilitiesGoogle OptimizeMatomo A/B test
    Supported channelsWebWeb, mobile, email, digital campaigns
    A/B testingcheck mark iconcheck mark icon
    Multivariate testing (MVT)check mark iconcheck mark icon
    Split URL testscheck mark iconcheck mark icon
    Web analytics integration Native with UA/GA4 Native with Matomo

    You can also migrate historical UA (GA3) data to Matomo
    Audience segmentation BasicAdvanced
    Geo-targetingcheck mark iconX
    Technology targetingcheck mark iconX
    Behavioural targetingBasicAdvanced
    Reporting modelBayesian analysisStatistical hypothesis testing
    Report availability Within 12 hours after setup 6 hours for Matomo Cloud

    1 hour for Matomo On-Premise
    HeatmapsXcheck mark icon

    Included with Matomo Cloud
    Session recordingsXcheck mark icon

    Included with Matomo Cloud
    GDPR complianceXcheck mark icon
    Support Self-help desk on a free tierSelf-help guides, user forum, email
    PriceFree limited tier From €19 for Cloud subscription

    From €199/year as plugin for On-Premise

    Final Thoughts : Who Benefits the Most From an A/B Testing Tool ?

    Split testing is an excellent method for validating various assumptions about your target customers. 

    With A/B testing tools you get a data-backed answer to research hypotheses such as “How different pricing affects purchases ?”, “What contact button placement generates more clicks ?”, “Which registration form performs best with new app subscribers ?” and more. 

    Such insights can be game-changing when you’re trying to improve your demand-generation efforts or conversion rates at the BoFu stage. But to get meaningful results from CRO tests, you need to select measurable, representative objectives.

    For example, split testing different pricing strategies for low-priced, frequently purchased products makes sense as you can run an experiment for a couple of weeks to get a statistically relevant sample. 

    But if you’re in a B2B SaaS product, where the average sales cycle takes weeks (or months) to finalise and things like “time-sensitive discounts” or “one-time promos” don’t really work, getting adequate CRO data will be harder. 

    To see tangible results from CRO, you’ll need to spend more time on test ideation than implementation. Your team needs to figure out : which elements to test, in what order, and why. 

    Effective CRO tests are designed for a specific part of the funnel and assume that you’re capable of effectively identifying and tracking conversions (goals) at the selected stage. This alone can be a complex task since not all customer journeys are alike. For SaaS websites, using a goal like “free trial account registration” can be a good starting point.

    A good test also produces a meaningful difference between the proposed variant and the original version. As Nima Yassini, Partner at Deloitte Digital, rightfully argues :

    “I see people experimenting with the goal of creating an uplift. There’s nothing wrong with that, but if you’re only looking to get wins you will be crushed when the first few tests fail. The industry average says that only one in five to seven tests win, so you need to be prepared to lose most of the time”.

    In many cases, CRO tests don’t provide the data you expected (e.g., people equally click the blue and green buttons). In this case, you need to start building your hypothesis from scratch. 

    At the same time, it’s easy to get caught up in optimising for “vanity metrics” — such that look good in the report, but don’t quite match your marketing objectives. For example, better email headline variations can improve your email open rates. But if users don’t proceed to engage with the email content (e.g. click-through to your website or use a provided discount code), your efforts are still falling short. 

    That’s why developing a baseline strategy is important before committing to an A/B testing tool. Google Optimize appealed to many users because it’s free and allows you to test your split test strategy cost-effectively. 

    With its upcoming depreciation, many marketers are very committed to a more expensive A/B tool (especially when they’re not fully sure about their CRO strategy and its results). 

    Matomo A/B testing is a cost-effective, GDPR-compliant alternative to Google Optimize with a low learning curve and extra competitive features. 

    Discover if Matomo A/B Testing is the ideal Google Optimize alternative for your organization with our free 21-day trial. No credit card required.

  • How to Stream Audio from Google Cloud Storage in Chunks and Convert Each Chunk to WAV for Whisper Transcription

    25 juillet, par Douglas Landvik

    I'm working on a project where I need to transcribe audio stored in a Google Cloud Storage bucket using OpenAI's Whisper model. The audio is stored in WebM format with Opus encoding, and due to the file size, I'm streaming the audio in 30-second chunks.

    


    To convert each chunk to WAV (16 kHz, mono, 16-bit PCM) compatible with Whisper, I'm using FFmpeg. The first chunk converts successfully, but subsequent chunks fail to convert. I suspect this is because each chunk lacks the WebM container's header, which FFmpeg needs to interpret the Opus codec correctly.

    


    Here’s a simplified version of my approach :

    


    Download Chunk : I download each chunk from GCS as bytes.
Convert with FFmpeg : I pass the bytes to FFmpeg to convert each chunk from WebM/Opus to WAV.

    


    async def handle_transcription_and_notify(
    consultation_service: ConsultationService,
    consultation_processor: ConsultationProcessor,
    consultation: Consultation,
    language: str,
    notes: str,
    clinic_id: str,
    vet_email: str,
    trace_id: str,
    blob_path: str,
    max_retries: int = 3,
    retry_delay: int = 5,
    max_concurrent_tasks: int = 3
):
    """
    Handles the transcription process by streaming the file from GCS, converting to a compatible format, 
    and notifying the client via WebSocket.
    """
    chunk_duration_sec = 30  # 30 seconds per chunk
    logger.info(f"Starting transcription process for consultation {consultation.consultation_id}",
                extra={'trace_id': trace_id})

    # Initialize GCS client
    service_account_key = os.environ.get('SERVICE_ACCOUNT_KEY_BACKEND')
    if not service_account_key:
        logger.error("Service account key not found in environment variables", extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Service account key not found for consultation {consultation.consultation_id}.\nTrace ID: {trace_id}"
        )
        return

    try:
        service_account_info = json.loads(service_account_key)
        credentials = service_account.Credentials.from_service_account_info(service_account_info)
    except Exception as e:
        logger.error(f"Error loading service account credentials: {str(e)}", extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error loading service account credentials for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        return

    # Initialize GCS client
    service_account_key = os.environ.get('SERVICE_ACCOUNT_KEY_BACKEND')
    if not service_account_key:
        logger.error("Service account key not found in environment variables", extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Service account key not found for consultation {consultation.consultation_id}.\nTrace ID: {trace_id}"
        )
        return

    try:
        service_account_info = json.loads(service_account_key)
        credentials = service_account.Credentials.from_service_account_info(service_account_info)
    except Exception as e:
        logger.error(f"Error loading service account credentials: {str(e)}", extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error loading service account credentials for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        return

    storage_client = storage.Client(credentials=credentials)
    bucket_name = 'vetz_consultations'
    blob = storage_client.bucket(bucket_name).get_blob(blob_path)
    bytes_per_second = 16000 * 2  # 32,000 bytes per second
    chunk_size_bytes = 30 * bytes_per_second
    size = blob.size

    async def stream_blob_in_chunks(blob, chunk_size):
        loop = asyncio.get_running_loop()
        start = 0
        size = blob.size
        while start < size:
            end = min(start + chunk_size - 1, size - 1)
            try:
                logger.info(f"Requesting chunk from {start} to {end}", extra={'trace_id': trace_id})
                chunk = await loop.run_in_executor(
                    None, lambda: blob.download_as_bytes(start=start, end=end)
                )
                if not chunk:
                    break
                logger.info(f"Yielding chunk from {start} to {end}, size: {len(chunk)} bytes",
                            extra={'trace_id': trace_id})
                yield chunk
                start += chunk_size
            except Exception as e:
                logger.error(f"Error downloading chunk from {start} to {end}: {str(e)}", exc_info=True,
                             extra={'trace_id': trace_id})
                raise e

    async def convert_to_wav(chunk_bytes, chunk_idx):
        """
        Convert audio chunk to WAV format compatible with Whisper, ensuring it's 16 kHz, mono, and 16-bit PCM.
        """
        try:
            logger.debug(f"Processing chunk {chunk_idx}: size = {len(chunk_bytes)} bytes")

            detected_format = await detect_audio_format(chunk_bytes)
            logger.info(f"Detected audio format for chunk {chunk_idx}: {detected_format}")
            input_io = io.BytesIO(chunk_bytes)
            output_io = io.BytesIO()

            # ffmpeg command to convert webm/opus to WAV with 16 kHz, mono, and 16-bit PCM

            # ffmpeg command with debug information
            ffmpeg_command = [
                "ffmpeg",
                "-loglevel", "debug",
                "-f", "s16le",            # Treat input as raw PCM data
                "-ar", "48000",           # Set input sample rate
                "-ac", "1",               # Set input to mono
                "-i", "pipe:0",
                "-ar", "16000",           # Set output sample rate to 16 kHz
                "-ac", "1",               # Ensure mono output
                "-sample_fmt", "s16",     # Set output format to 16-bit PCM
                "-f", "wav",              # Output as WAV format
                "pipe:1"
            ]

            process = subprocess.Popen(
                ffmpeg_command,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE
            )

            stdout, stderr = process.communicate(input=input_io.read())

            if process.returncode == 0:
                logger.info(f"FFmpeg conversion completed successfully for chunk {chunk_idx}")
                output_io.write(stdout)
                output_io.seek(0)

                # Save the WAV file locally for listening
                output_dir = "converted_chunks"
                os.makedirs(output_dir, exist_ok=True)
                file_path = os.path.join(output_dir, f"chunk_{chunk_idx}.wav")

                with open(file_path, "wb") as f:
                    f.write(stdout)
                logger.info(f"Chunk {chunk_idx} saved to {file_path}")

                return output_io
            else:
                logger.error(f"FFmpeg failed for chunk {chunk_idx} with return code {process.returncode}")
                logger.error(f"Chunk {chunk_idx} - FFmpeg stderr: {stderr.decode()}")
                return None

        except Exception as e:
            logger.error(f"Unexpected error in FFmpeg conversion for chunk {chunk_idx}: {str(e)}")
            return None

    async def transcribe_chunk(idx, chunk_bytes):
        for attempt in range(1, max_retries + 1):
            try:
                logger.info(f"Transcribing chunk {idx + 1} (attempt {attempt}).", extra={'trace_id': trace_id})

                # Convert to WAV format
                wav_io = await convert_to_wav(chunk_bytes, idx)
                if not wav_io:
                    logger.error(f"Failed to convert chunk {idx + 1} to WAV format.")
                    return ""

                wav_io.name = "chunk.wav"
                chunk_transcription = await consultation_processor.transcribe_audio_whisper(wav_io)
                logger.info(f"Chunk {idx + 1} transcribed successfully.", extra={'trace_id': trace_id})
                return chunk_transcription
            except Exception as e:
                logger.error(f"Error transcribing chunk {idx + 1} (attempt {attempt}): {str(e)}", exc_info=True,
                             extra={'trace_id': trace_id})
                if attempt < max_retries:
                    await asyncio.sleep(retry_delay)
                else:
                    await send_discord_alert(
                        f"Max retries reached for chunk {idx + 1} in consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
                    )
                    return ""  # Return empty string for failed chunk

    await notification_manager.send_personal_message(
        f"Consultation {consultation.consultation_id} is being transcribed.", vet_email
    )

    try:
        idx = 0
        full_transcription = []
        async for chunk in stream_blob_in_chunks(blob, chunk_size_bytes):
            transcription = await transcribe_chunk(idx, chunk)
            if transcription:
                full_transcription.append(transcription)
            idx += 1

        combined_transcription = " ".join(full_transcription)
        consultation.full_transcript = (consultation.full_transcript or "") + " " + combined_transcription
        consultation_service.save_consultation(clinic_id, vet_email, consultation)
        logger.info(f"Transcription saved for consultation {consultation.consultation_id}.",
                    extra={'trace_id': trace_id})

    except Exception as e:
        logger.error(f"Error during transcription process: {str(e)}", exc_info=True, extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error during transcription process for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        return

    await notification_manager.send_personal_message(
        f"Consultation {consultation.consultation_id} has been transcribed.", vet_email
    )

    try:
        template_service = TemplateService()
        medical_record_template = template_service.get_template_by_name(
            consultation.medical_record_template_id).sections

        sections = await consultation_processor.extract_structured_sections(
            transcription=consultation.full_transcript,
            notes=notes,
            language=language,
            template=medical_record_template,
        )
        consultation.sections = sections
        consultation_service.save_consultation(clinic_id, vet_email, consultation)
        logger.info(f"Sections processed for consultation {consultation.consultation_id}.",
                    extra={'trace_id': trace_id})
    except Exception as e:
        logger.error(f"Error processing sections for consultation {consultation.consultation_id}: {str(e)}",
                     exc_info=True, extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error processing sections for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        raise e

    await notification_manager.send_personal_message(
        f"Consultation {consultation.consultation_id} is fully processed.", vet_email
    )
    logger.info(f"Successfully processed consultation {consultation.consultation_id}.",
                extra={'trace_id': trace_id})