
Other articles (101)
-
APPENDIX: Plugins used specifically for the farm
5 March 2010
The central/master site of the farm needs several additional plugins, beyond those used by the channels, to work properly: the Gestion de la mutualisation plugin; the inscription3 plugin, which handles registrations and requests to create a mutualisation instance as soon as users sign up; the verifier plugin, which provides a field-validation API (used by inscription3); the champs extras v2 plugin, required by inscription3 (...)
-
Permissions overridden by the plugins
27 April 2010
Mediaspip core
autoriser_auteur_modifier(), so that visitors can edit their own information on the authors page
-
What is an editorial?
21 June 2013
Write your point of view in an article. It will be filed in a section set aside for this purpose.
An editorial is a text-only article. Its purpose is to gather points of view in a dedicated section. Only one editorial is featured on the home page at a time. To read earlier ones, browse the dedicated section.
You can customise the editorial creation form.
Editorial creation form: in the case of a document of type editorial, the (...)
On other sites (11631)
-
Choosing the best self-hosted open-source analytics platform
16 July, by Joe
Google Analytics (GA) is the most widely used analytics platform, with 50.3% of the top 1 million active websites using it today. You’re probably using it right now.
But despite being a free tool, Google Analytics is proprietary software, which means you’re handing over your browsing data, metadata and search history to a third party.
Do you trust them? We sure don’t.
This lack of control can lead to potential privacy risks and compliance issues. These issues have so far resulted in fines under the EU’s General Data Protection Regulation (GDPR) of an average of €2.5 million each, for a total of almost €6.6 billion since 2018.
Open-source analytics platforms offer a solution. They’re a safer and more transparent alternative that lets you retain full control over how you collect and store your customers’ data. But what are these tools? Where do you find them? And, most importantly, how do you choose the best one for your needs?
This guide explores the benefits and features of open-source analytics platforms and compares popular options, including Matomo, a leading self-hosted, open-source Google Analytics alternative.
What is an open-source analytics platform?
An analytics platform is software that collects, processes and analyses data to gain insights, identify trends, and make informed decisions. It helps users understand past performance, monitor current activities and predict future outcomes.
An open-source analytics platform is a type of analytics suite in which anyone can view, modify and distribute the underlying source code.
In contrast to proprietary analytics platforms, where a single entity owns and controls the code, open-source analytics platforms adhere to the principles of free and open-source software (FOSS). This allows everyone to use, study, share, and customise the software to meet their needs, fostering collaboration and transparency.
Open-source analytics and the Free Software Foundation
The concept of FOSS is rooted in the idea of software freedom. According to the Free Software Foundation (FSF), this idea is defined by four fundamental freedoms granted to the user. These are the freedoms to:
- Use or run the program as they wish, for any purpose.
- Study how the program works and change it as they wish.
- Redistribute copies to help others.
- Improve the code and distribute copies of their improved versions to others.
Open access to the source code is a precondition for guaranteeing these freedoms.
The importance of FOSS licensing
The FSF has been instrumental in the free software movement, which serves as the foundation for open-source analytics platforms. Among other things, it created the GNU General Public Licence (GPL), which guarantees that all software distributions include the source code and are distributed under the same licence.
However, other licences, including several copyleft and permissive licences, have been developed to address certain legal issues and loopholes in the GPL. Analytics platforms distributed under any of these licences are considered open-source since they are FSF-compliant.
Benefits and drawbacks of open-source analytics platforms
Open-source analytics platforms offer a compelling alternative to their proprietary counterparts, but they also have a few challenges.
Benefits of open-source analytics
- Full data ownership: Many open-source solutions let you host the analytics platform yourself. This gives you complete control over your customers’ data, ensuring privacy and security.
- Customisable solution: With access to the source code, you can tailor the platform to your specific needs.
- Full transparency: You can inspect the code to see exactly how data is collected, processed and stored, helping you ensure compliance with privacy regulations.
- Community-driven development: Open-source projects benefit from the contributions of a global community of developers. This leads to faster innovation, quicker bug fixes and, in some cases, a wider range of features.
- No predefined limits: Self-hosted open-source analytics platforms don’t impose arbitrary limits on data storage or processing. You’re only limited by your own server resources.
Cons of open-source analytics
- Technical expertise required: Setting up and maintaining a self-hosted open-source platform often requires technical knowledge.
- No live/dedicated support team: While many projects have active communities, dedicated support might be limited compared to commercial offerings.
- Integration challenges: Integrating with other tools in your stack might require custom development, especially if pre-built integrations aren’t available.
- Feature gaps: Depending on the specific platform, there might be gaps in functionality compared to mature proprietary solutions.
Why open-source is better than proprietary analytics
Proprietary analytics platforms, like Google Analytics, have long been the go-to choice for many businesses. However, growing concerns around data privacy, vendor lock-in and limited customisation are driving a shift towards open-source alternatives.
No vendor lock-in
Proprietary platforms lock you into their ecosystem, controlling terms, pricing and future development. Migrating data can be costly, and you’re dependent on the vendor for updates.
Open-source platforms allow users to switch providers, modify software and contribute to development. Contributors can also create dedicated migration tools to import data from GA and other proprietary platforms.
Data privacy concerns
Proprietary analytics platforms can heighten the risk of data privacy violations and subsequent fines under regulations like the GDPR and the California Consumer Privacy Act (CCPA). This is because their opaque ‘black box’ design often obscures how they collect, process and use data.
Businesses often have limited visibility and even less control over a vendor’s data handling. They don’t know whether these vendors are using it for their own benefit or sharing it more widely, which can lead to privacy breaches and other data protection violations.
These penalties can reach into the millions and even billions. For example, Zoom agreed to pay $85 million in 2021 to settle a US privacy class action, while the largest GDPR fine to date is the €1.2 billion imposed on Meta by the Irish Data Protection Commission (DPC).
Customisation
Proprietary platforms often offer a one-size-fits-all approach. While they might have some customisation options, you’re ultimately limited by what the vendor provides. Open-source platforms, on the other hand, offer unparalleled flexibility.
Unlimited data processing
Proprietary analytics platforms often restrict the amount of data you can collect and process, especially on free plans. Going over these limits usually requires upgrading to a paid plan, which can be a problem for high-traffic websites or businesses with large datasets.
Self-hosted tools only limit data processing based on your server resources, allowing you to collect and analyse as much data as you need at no extra cost.
No black box effect
Since proprietary tools are closed-source, they often lack transparency in their data processing methods. It’s difficult to understand and validate how their algorithms work or how they calculate specific metrics. This “black box” effect can lead to trust issues and make it challenging to validate your data’s accuracy.
11 Key features to look for in an open-source analytics platform
Choosing the right open-source analytics platform is crucial for unlocking actionable insights from your customers’ data. Here are 11 key features to consider:
#1. Extensive support documentation and resource libraries
Even with technical expertise, you might encounter challenges or have questions about the platform. A strong support system is essential. Look for platforms with comprehensive documentation, active community forums and the option for professional support for mission-critical deployments.
#2. Live analytics
Having access to live data and reports is crucial for making timely and informed decisions. A live analytics feature allows you to:
- Monitor website traffic as it happens.
- Track and optimise campaign performance.
- Identify and respond to issues like traffic spikes, drops or errors quickly, allowing for rapid troubleshooting.
For example, Matomo updates tracking data every 10 seconds, which is more than enough to give you a live view of your website performance.
#3. Personal data tracking
Understanding user behaviour is at the heart of effective analytics. Look for a platform that allows you to track personal data while respecting privacy. This might include features that let you:
- Create detailed profiles of individual users and track their interactions across multiple sessions.
- Track user-specific attributes like demographics, interests or purchase history.
- Track a user ID across different devices and platforms to understand the user experience.
#4. Conversion tracking
Ultimately, you want to measure how effective your website is in achieving your business goals. Conversion tracking allows you to:
- Define and track key performance indicators (KPIs) like purchases, sign-ups or downloads.
- Identify bottlenecks in the user journey that prevent conversions.
- Measure the ROI of your marketing campaigns.
#5. Session recordings
Session recordings give your development team a qualitative understanding of user behaviour by letting you watch replays of individual user sessions. This can help you:
- Identify usability issues.
- Understand how users navigate your site and interact with different elements.
- Uncover bugs or errors.
#6. A/B testing
Experimentation is key to optimising your website and improving conversion rates. Look for an integrated A/B testing feature that allows you to:
- Test different variations of your website in terms of headlines, images, calls to action or page layouts.
- Measure the impact on key metrics.
- Implement changes based on statistically significant differences in user behaviour patterns, rather than guesswork.
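To make “statistically significant” concrete, here is a minimal sketch of a two-proportion z-test, one common way to compare the conversion rates of two variants. It is an illustration only: the function name and the sample numbers are made up for this example, and it is not a description of how Matomo or any other platform computes its A/B testing results.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: variant A converts 200 of 5,000 visitors, B 260 of 5,000.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (a conventional threshold is 0.05) suggests the difference is unlikely to be pure chance. The test assumes reasonably large samples and a sample size decided before the experiment starts.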
#7. Custom reporting and dashboards
Every business has unique reporting needs. Look for a flexible platform that allows you to:
- Build custom reports that focus on the metrics that matter most to you.
- Create personalised dashboards that provide a quick overview of those KPIs.
- Automate report generation to save your team valuable time.
#8. No data sampling
Data sampling can save time and processing power, but it can also lead to inaccurate insights if the sample isn’t representative of the entire dataset. The solution is to avoid data sampling entirely.
Processing 100% of your customers’ data ensures that your reports are accurate and unbiased, providing a true picture of customer behaviour.
#9. Google Analytics migration tools
If you’re migrating from Google Analytics, a data export/import tool can save you time and effort. Some open-source analytics projects offer dedicated data importers to transfer historical data from GA into the new platform, preserving valuable insights. These tools help maintain data continuity and simplify the transition, reducing the manual effort involved in setting up a new analytics platform.
#10. A broad customer base
The breadth and diversity of an analytics platform’s customer base can be a strong indicator of its trustworthiness and capabilities. Consider the following:
- Verticals served
- The size of the companies that use it
- Whether it’s trusted in highly-regulated industries
If a platform is trusted by a large entity with stringent security and privacy requirements, such as governments or military branches, it speaks volumes about its security and data protection capabilities.
#11. Self-hosting
Self-hosting offers unparalleled control over your customers’ data and infrastructure.
Unlike cloud-based solutions, where your customers’ data resides on third-party servers, self-hosting means you manage your own servers and databases. This approach ensures that your customers’ data remains within your own infrastructure, enhancing privacy and security.
There are other features, like analytics for mobile apps, but these 11 will help shortlist your options to find the ideal tool.
Choosing your self-hosted open-source analytics platform: A step-by-step guide
The right self-hosted open-source analytics platform can significantly impact your data strategy. Follow these steps to make the best choice:
Step #1. Define your needs and objectives
Begin by clearly outlining what you want to achieve with your analytics platform:
- Identify relevant KPIs.
- Determine what type of reports to generate, their frequency and distribution.
- Consider your privacy and compliance needs, like GDPR and CCPA.
Step #2. Define your budget
While self-hosted open-source platforms are usually free to use, there are still costs associated with self-hosting, including:
- Server hardware and infrastructure.
- Ongoing maintenance, updates and potential support fees.
- Development resources if you plan to customise the platform.
Step #3. Consider scalability and performance
Scaling your analytics can be an issue with self-hosted platforms since it means scaling your server infrastructure as well. Before choosing a platform, consider:
- Current traffic volume and projected growth.
- Your current capacity to handle traffic.
- The platform’s scalability options.
Step #4. Research and evaluate potential solutions
Shortlist a few different open-source analytics platforms that align with your requirements. In addition to the features outlined above, also consider factors like:
- Ease of use.
- Community and support.
- Comprehensive documentation.
- The platform’s security track record.
Step #5. Sign up for a free trial and conduct thorough testing
Many platforms offer free trials or demos. Take advantage of these opportunities to test the platform’s features, evaluate the user interface and more.
You can embed multiple independent tracking codes on your website, which means you can test multiple analytics platforms simultaneously. Doing so helps you compare and validate results based on the same data, making comparisons more objective and reliable.
Step #6. Plan for implementation and ongoing management
After choosing a platform, follow the documentation to install and configure the software. Plan how you’ll migrate existing data if you’re switching from another platform.
Ensure your team is trained on the platform, and establish a plan for updates, security patches and backups. Then, you’ll be ready to migrate to the new platform while minimising downtime.
Top self-hosted open-source analytics tools
Let’s examine three prominent self-hosted open-source analytics tools.
Matomo
Main features: Analytics updated every 10 seconds, custom reports, dashboards, user segmentation, goal tracking, e-commerce tracking, funnels, heatmaps, session recordings, A/B testing, SEO tools and more advanced features.
Best for: Businesses of all sizes and from all verticals. Advanced users.
Licensing: GPLv3 (core platform). Various commercial licences for plugins.
Pricing: Self-hosted: free (excluding paid plugins). Cloud version: starts at $21.67/mo for 50K website hits when paid annually.
[Image: Matomo Analytics dashboard]
Matomo is a powerful web analytics platform that prioritises data privacy and user control. It offers a comprehensive suite of features, including live analytics updated every 10 seconds, custom reporting, e-commerce tracking and more. You can choose between a full-featured open-source, self-hosted platform free of charge or a cloud-based, fully managed paid analytics service.
Matomo also offers 100% data ownership and has a user base of over 1 million websites, including heavyweights like NASA, the European Commission, ahrefs and the United Nations.
Plausible Analytics
Main features: Basic website analytics (page views, visitors, referrers, etc.), custom events, goal tracking and some campaign tracking features.
Best for: Website owners, bloggers and small businesses. Non-technical users.
Licensing: AGPLv3.
Pricing: Self-hosted: free. Cloud version: starts at $7.50/mo for 10K website hits when paid annually.
[Image: Plausible Analytics]
Plausible Analytics is a lightweight, privacy-focused analytics tool designed to be simple and easy to use. It provides essential website traffic data without complex configurations or intrusive tracking.
Fathom Lite & Fathom Analytics
Main features: Basic website analytics (page views, visitors, referrers, etc.), custom events and goal tracking.
Best for: Website owners and small businesses. Non-technical users.
Licensing: Fathom Lite: MIT Licence (self-hosted). Fathom Analytics: proprietary.
Pricing: Fathom Lite: free but currently unsupported. Cloud version: starts at $12.50/month for up to 50 sites when paid annually.
[Image: Fathom Analytics]
Fathom started as an open-source platform in 2018. But after the founders released V1.0.1, they switched to a closed-source, paid, proprietary model called Fathom Analytics, which has remained closed-source ever since.
However, the open-source version, Fathom Lite, is still available. It has very limited functionality, uses cookies and is currently unsupported by the company. No new features are under development and uptime isn’t guaranteed.
Matomo vs. Plausible vs. Fathom
Matomo, Plausible and Fathom are all privacy-focused alternatives to Google Analytics with open-source versions, although Fathom’s open-source edition (Fathom Lite) is limited and unsupported. They offer features like no data sampling, data ownership and EU-based cloud hosting.
Here’s a head-to-head comparison of the three:
Feature | Matomo | Plausible | Fathom
Focus | Comprehensive, feature-rich, customisable | Simple, lightweight, beginner-friendly | Simple, lightweight, privacy-focused
Target user | Businesses, marketers and analysts seeking depth | Beginners, bloggers and small businesses | Website owners and users prioritising simplicity
Open source | Fully open-source | Fully open-source | Limited open-source version
Advanced analytics | Extensive | Very limited | Very limited
Integrations | 100+ | Limited | Fewer than 15
Customisation | High | Low | Low
Data management | Granular control, raw data access, complex queries | Simplified, no raw data access | Simplified, no raw data access
GDPR features | Compliant by design, plus GDPR Manager | Guides only | Compliant by design
Pricing | Generally higher | Generally lower | Intermediate
Learning curve | Steeper | Gentle | Gentle
The open-core dilemma
Open-source platforms are widely seen as beneficial and trustworthy, which leads some companies to falsely market themselves as open-source.
Some were once open-source but later became commercial, a move criticised as “bait-and-switch.” Others offer a limited open-source “core” alongside proprietary features, known as the “open core” model. While this dual licensing can be ethical and sustainable, some abuse it by offering a low-value open-source version and hiding valuable features behind a paywall.
However, other companies have embraced the dual-licensing model in a more ethical way, providing a valuable solution with a wide range of tools under the open-source licence and only leaving premium, non-essential add-ons as paid features.
Matomo is a prime example of this practice, championing the principles of open-source analytics while developing a sustainable business model for its users’ benefit.
Choose Matomo as your open-source data analytics tool
Open-source analytics platforms offer compelling advantages over proprietary solutions like Google Analytics. They provide greater transparency, data ownership and customisation. Choosing an open-source analytics platform over a proprietary one gives you more control over your customers’ data and supports compliance with user privacy regulations.
With its comprehensive features, powerful tools, commitment to privacy and active community, Matomo stands out as a leading choice. Make the switch to Matomo for ethical, user-focused analytics.
-
Reverse Engineering Italian Literature
1 July 2014, by Multimedia Mike (Reverse Engineering)
Some time ago, Diego “Flameeyes” Pettenò tried his hand at reverse engineering a set of really old CD-ROMs containing even older Italian literature. The goal of this RE endeavor would be to extract the useful literature along with any structural metadata (chapters, etc.) and convert it to a more open format suitable for publication at, e.g., Project Gutenberg or Archive.org.
Unfortunately, the structure of the data thwarted the more simplistic analysis attempts (like inspecting for blocks of textual data). This will require deeper RE techniques. Further frustrating the effort, however, is the fact that the binaries that implement the reading program are written for the now-archaic Windows 3.1 operating system.
In pursuit of this RE goal, I recently thought of a way to glean more intelligence using DOSBox.
Prior Work
There are 6 discs in the full set (distributed along with 6 sequential issues of a print magazine named L’Espresso). Analysis of the contents of the various discs reveals that many of the files are the same on each disc. It was straightforward to identify the set of files which are unique on each disc. These files all end with the extension “LZn”, where n = 1..6 depending on the disc number. Further, the root directory of each disc has a file indicating the sequence number (1..6) of the CD. Obviously, these are the interesting targets.
The LZ file extensions stand out to an individual skilled in the art of compression: could it be a variation of the venerable LZ compression? That’s actually unlikely because LZ (also seen as LIZ) stands for Letteratura Italiana Zanichelli (Zanichelli’s Italian Literature).
The Unix ‘file’ command was of limited utility, unable to plausibly identify any of the files.
Progress was stalled.
Saying Hello To An Old Frenemy
I have been showing this screenshot to younger coworkers to see if any of them recognize it:
Not a single one has seen it before. Senior computer citizen status: Confirmed.
I recently watched an Ancient DOS Games video about Windows 3.1 games. This episode showed Windows 3.1 running under DOSBox. I had heard this was possible but that it took a little work to get running. I had a hunch that someone else had probably already done the hard stuff so I took to the BitTorrent networks and quickly found a download that had the goods ready to go: a directory of Windows 3.1 files that just had to be dropped into a DOSBox directory and they would be ready to run.
Aside: Running OS software procured from a BitTorrent network? Isn’t that an insane security nightmare? I’m not too worried since it effectively runs under a sandboxed virtual machine, courtesy of DOSBox. I suppose there’s the risk of trojan’d OS software infecting binaries that eventually leave the sandbox.
Using DOSBox Like ‘strace’
strace is a tool available on some Unix systems, including Linux, which is able to monitor the system calls that a program makes. In reverse engineering contexts, it can be useful to monitor an opaque, binary program to see the names of the files it opens and how many bytes it reads, and from which locations. I have written examples of this before (wow, almost 10 years ago to the day; now I feel old for the second time in this post).
Here’s the pitch: Make DOSBox perform as strace in order to serve as a platform for reverse engineering Windows 3.1 applications. I formed a mental model about how DOSBox operates (abstracted file system classes with methods for opening and reading files) and then jumped into the source code. Sure enough, the code was exactly as I suspected and a few strategic print statements gave me the data I was looking for.
Eventually, I even took to running DOSBox under the GNU Debugger (GDB). This hasn’t proven especially useful yet, but it has led to an absurd level of nesting:
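The idea of instrumenting a file abstraction with print statements can be sketched in a few lines. The following is not DOSBox code (DOSBox is written in C++ and its real class and method names are not shown in this post); it is a hypothetical Python wrapper, TracedFile, that logs each read and seek in roughly the format of the traces in this post.

```python
import io


class TracedFile:
    """Wrap a binary file object and log every read and seek (hypothetical helper)."""

    def __init__(self, fileobj, name, log=print):
        self._f = fileobj
        self._name = name
        self._log = log  # any callable that accepts one string

    def read(self, size):
        pos = self._f.tell()  # position before the read, as in the trace
        data = self._f.read(size)
        self._log(
            f"=== Read('{self._name}'): req {size} bytes; "
            f"read {len(data)} bytes from pos 0x{pos:X}"
        )
        return data

    def seek(self, offset, whence=io.SEEK_SET):
        self._log(
            f"=== Seek('{self._name}'): seek {offset} (0x{offset:X}) bytes, "
            f"type {whence}"
        )
        return self._f.seek(offset, whence)


# Demo on an in-memory stand-in for a data file.
traced = TracedFile(io.BytesIO(bytes(1000)), "DEMO.BIN")
traced.read(337)
traced.read(337)
traced.seek(384)
```

The point of the wrapper is that the program under study needs no changes: every access passes through one choke point, so a single log statement per method captures the whole access pattern.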
The target application runs under Windows 3.1, which is running under DOSBox, which is running under GDB. This led to a crazy situation in which DOSBox had the mouse focus when a GDB breakpoint was triggered. At this point, DOSBox had all desktop input focus and couldn’t surrender it because it wasn’t running. I had no way to interact with the Linux desktop and had to reboot the computer. The next time, I took care to only use the keyboard to navigate the application and trigger the breakpoint and not allow DOSBox to consume the mouse focus.
New Intelligence
By instrumenting the local file class (virtual HD files) and the ISO file class (CD-ROM files), I was able to watch which programs and dynamic libraries are loaded and which data files the code cares about. I was able to narrow down the fact that the most interesting programs are called LEGGENDO.EXE (‘reading’) and LEGGENDA.EXE (‘legend’; this has been a great Italian lesson as well as an RE puzzle). The first calls the latter, which displays this view of the data we are trying to get at:
When first run, the program takes an interest in a file called DBBIBLIO (‘database library’, I suspect):
=== Read('LIZ98\DBBIBLIO.LZ1'): req 337 bytes; read 337 bytes from pos 0x0
=== Read('LIZ98\DBBIBLIO.LZ1'): req 337 bytes; read 337 bytes from pos 0x151
=== Read('LIZ98\DBBIBLIO.LZ1'): req 337 bytes; read 337 bytes from pos 0x2A2
[...]
While we were unable to sort out all of the data files in our cursory investigation, a few things were obvious. The structure of this file looked to contain 336-byte records. Turns out I was off by 1: the records are actually 337 bytes each. The count of records read from disc is equal to the number of items shown in the UI.
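The read positions in the trace (0x0, 0x151, 0x2A2) step by exactly 337 bytes, which is what a fixed-size record file looks like. A minimal sketch for walking such a file follows. It assumes the file is nothing but back-to-back 337-byte records with no header, which the trace suggests but does not prove, and it makes no claim about the fields inside each record. The function name is made up for this example.

```python
import io

RECORD_SIZE = 337  # record length observed in the trace above


def iter_records(stream, record_size=RECORD_SIZE):
    """Yield (offset, record_bytes) for each complete fixed-size record."""
    offset = 0
    while True:
        chunk = stream.read(record_size)
        if len(chunk) < record_size:
            break  # end of file, or a trailing partial record that we ignore
        yield offset, chunk
        offset += record_size


# Demo on an in-memory stand-in: three full records plus 10 stray bytes.
fake_file = io.BytesIO(bytes(RECORD_SIZE * 3 + 10))
for offset, record in iter_records(fake_file):
    print(f"record at 0x{offset:X}, {len(record)} bytes")
```

On a real disc image, the stream would be the DBBIBLIO file opened in binary mode, and the record count could be compared against the number of items shown in the UI.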
Next, the program is interested in a few more files:
*** isoFile(): 'DEPOSITO\BLOKCTC.LZ1', offset 0x27D6000, 2911488 bytes large
=== Read('DEPOSITO\BLOKCTC.LZ1'): req 96 bytes; read 96 bytes from pos 0x0
*** isoFile(): 'DEPOSITO\BLOKCTX0.LZ1', offset 0x2A9D000, 17152 bytes large
=== Read('DEPOSITO\BLOKCTX0.LZ1'): req 128 bytes; read 128 bytes from pos 0x0
=== Seek('DEPOSITO\BLOKCTX0.LZ1'): seek 384 (0x180) bytes, type 0
=== Read('DEPOSITO\BLOKCTX0.LZ1'): req 256 bytes; read 256 bytes from pos 0x180
=== Seek('DEPOSITO\BLOKCTC.LZ1'): seek 1152 (0x480) bytes, type 0
=== Read('DEPOSITO\BLOKCTC.LZ1'): req 32 bytes; read 32 bytes from pos 0x480
=== Read('DEPOSITO\BLOKCTC.LZ1'): req 1504 bytes; read 1504 bytes from pos 0x4A0
[...]
Eventually, it becomes obvious that BLOKCTC has the juicy meat. There are 32-byte records followed by variable-length encoded text sections. Since there is no text to be found in these files, the text is either compressed, encrypted, or both. Some rough counting (the program seems to disable copy/paste, which thwarts more precise counting) indicates that the text size is larger than the data chunks being read from disc, so compression seems likely. Encryption isn’t out of the question (especially since the program deems it necessary to disable copy and pasting of this public domain literary data), and if it’s in use, that means the key is being read from one of these files.
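With longer traces, it helps to group the reads per file so that patterns such as “a 32-byte record, then a variable-length section” stand out. Here is a small sketch that parses the Read lines of a trace like the one above. The regular expression is written against the log format as printed in this post and tolerates both straight and typographic quotes; the function name is made up for this example.

```python
import re
from collections import defaultdict

# Matches lines such as:
# === Read('DEPOSITO\BLOKCTC.LZ1'): req 32 bytes; read 32 bytes from pos 0x480
READ_RE = re.compile(
    r"=== Read\(['’](?P<name>[^'’]+)['’]\)\s*:\s*req (?P<req>\d+) bytes\s*;\s*"
    r"read (?P<got>\d+) bytes from pos 0x(?P<pos>[0-9A-Fa-f]+)"
)


def summarize_reads(log_text):
    """Return {filename: [(position, bytes_read), ...]} for every Read line."""
    reads = defaultdict(list)
    for match in READ_RE.finditer(log_text):
        position = int(match.group("pos"), 16)
        reads[match.group("name")].append((position, int(match.group("got"))))
    return dict(reads)


sample = (
    "=== Read('DEPOSITO\\BLOKCTC.LZ1'): req 32 bytes; read 32 bytes from pos 0x480\n"
    "=== Read('DEPOSITO\\BLOKCTC.LZ1'): req 1504 bytes; read 1504 bytes from pos 0x4A0\n"
)
for name, chunks in summarize_reads(sample).items():
    for position, size in chunks:
        print(f"{name}: {size} bytes at 0x{position:X}")
```

In the sample, the second read starts at 0x4A0, which is exactly 0x480 plus 32, so the 1504-byte section sits immediately after its 32-byte record.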
Blocked On Disassembly
So I’m a bit blocked right now. I know exactly where the data lives, but it’s clear that I need to reverse engineer some binary code. The big problem is that I have no idea how to disassemble Windows 3.1 binaries. These are NE-type executable files. Disassemblers abound for MZ files (MS-DOS executables) and PE files (executables for Windows 95 and beyond). NE files get no respect. It’s difficult (but not impossible) to even find data about the format anymore, and details are incomplete. It should be noted, however, the DOSBox-as-strace method described here lends insight into how Windows 3.1 processes NE-type EXEs. You can’t get any more authoritative than that.
So far, I have tried the freeware version of IDA Pro. Unfortunately, I haven’t been able to get the program to work on my Windows machine for a long time. Even if I could, I can’t find any evidence that it actually supports NE files (the free version specifically mentions MZ and PE, but does not mention NE or LE).
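For reference, telling these formats apart is the easy part. NE, LE and PE files all begin with an MZ stub whose 32-bit little-endian field at offset 0x3C points to the “new” header, and that header starts with a two- or four-byte signature. A minimal sketch follows; the function name is made up, and the check is only a heuristic, since a plain MS-DOS executable can hold arbitrary bytes at 0x3C.

```python
import struct


def exe_kind(data):
    """Classify an executable image by its header signatures (heuristic)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "unknown"
    # Offset of the new-style header, stored at 0x3C in the MZ stub.
    (new_header_offset,) = struct.unpack_from("<I", data, 0x3C)
    signature = data[new_header_offset:new_header_offset + 4]
    if signature == b"PE\0\0":
        return "PE"
    if signature[:2] == b"NE":
        return "NE"
    if signature[:2] in (b"LE", b"LX"):
        return "LE/LX"
    return "MZ"  # no recognised new-style header: treat as plain MS-DOS


# Demo: build a tiny fake image with an NE signature at offset 0x40.
image = bytearray(0x80)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x40)
image[0x40:0x42] = b"NE"
print(exe_kind(bytes(image)))
```

This only identifies the container. Decoding the NE segment table and relocations, which a disassembler needs, is the hard part the post is describing.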
I found an old copy of Borland’s beloved Turbo Assembler and Debugger package. It has Turbo Debugger for Windows, both regular and 32-bit versions. Unfortunately, the normal version just hangs Windows 3.1 in DOSBox. The 32-bit Turbo Debugger loads just fine but can’t load the NE file.
I’ve also wondered if DOSBox contains any advanced features for trapping program execution and disassembling. I haven’t looked too deeply into this yet.
Future Work
NE files seem to be the executable format that time forgot. I have a crazy brainstorm about repacking NE files as MZ executables so that they could be taken apart with an MZ disassembler. But this will take some experimenting.
If anyone else has any ideas about ripping open these binaries, I would appreciate hearing them.
And I guess I shouldn’t be too surprised to learn that all the literature in this corpus is already freely available and easily downloadable anyway. But you shouldn’t be too surprised if that doesn’t discourage me from trying to crack the format that’s keeping this particular copy of the data locked up.
-
How to not process any personal data with Matomo and what it means for you
22 April 2018, by InnoCraft
Disclaimer: this blog post has been written by digital analysts, not lawyers. Its purpose is to explain how to avoid processing any personal data with Matomo, so that you do not have to go through the GDPR compliance process for Matomo analytics. It reflects our interpretation of different sources: the official GDPR text and resources from the UK privacy regulator, the ICO. It cannot be considered professional legal advice. Like the GDPR itself, this information is subject to change. The GDPR is also known as RGPD in French, Spanish and Portuguese, Datenschutz-Grundverordnung (DS-GVO) in German, Algemene verordening gegevensbescherming in Dutch, and Regolamento generale sulla protezione dei dati in Italian.
Are you looking for a way to not process any personal data with Matomo? If the answer is yes, you are in the right place. From our understanding, if you are not processing personal data, then you shouldn’t be concerned by GDPR. Our inspiration came from this official reference:
“The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.”
In this blog post we are going to see how you can configure Matomo in order to not process any personal data and what the consequences are.
Which data is considered as personal according to GDPR ?
From : eur-lex.europa.eu
Article 4 (1): “‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;”
Recital (30): “Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.”
So, depending on your Matomo configuration, traces may be left in the following data:
- IP addresses
- Cookie identifiers
- Page URL or page titles
- User ID and Custom “personal” data
- Ecommerce order IDs
- Location
- Heatmaps & Session Recordings
Let’s see each of them in more detail.
1. IP addresses
IP addresses can indirectly identify an individual. They can also give a good approximation of an individual’s location.
IP addresses are therefore considered personal data, which means you need to anonymize them. Matomo provides a feature for this, and we recommend anonymizing at least the last two bytes.
See our configuration guide for more information
What are the consequences of using this feature ?
When applying IP anonymization on two bytes, you will no longer be able to see the full IP in the UI.
Moreover, there is a small chance that 2 different visitors with the same device and software configuration will be identified as the same visitor if the anonymised IP address is the same for both.
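To make the effect of two-byte anonymization concrete, here is a minimal Python sketch. It is not Matomo’s own implementation; the function name `anonymize_ipv4` is hypothetical and only IPv4 is handled. It shows how two different addresses on the same network collapse to the same anonymized value, which is why two visitors can end up looking like one.

```python
import ipaddress


def anonymize_ipv4(ip: str, masked_bytes: int = 2) -> str:
    """Zero out the last `masked_bytes` bytes of an IPv4 address.

    Illustrative helper only (hypothetical name, not part of Matomo).
    """
    if not 0 <= masked_bytes <= 4:
        raise ValueError("masked_bytes must be between 0 and 4")
    address = ipaddress.IPv4Address(ip)
    # Keep the leading bytes, clear the trailing ones.
    mask = (0xFFFFFFFF << (8 * masked_bytes)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(address) & mask))


# Two distinct visitors on the same /16 network become indistinguishable.
print(anonymize_ipv4("192.168.100.42"))  # 192.168.0.0
print(anonymize_ipv4("192.168.200.7"))   # 192.168.0.0
```

The more bytes are masked, the larger the group of addresses that share one anonymized value.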
2. Cookies
It is not yet clear to us whether all cookies are treated equally under GDPR. At this stage it is too early to make a definite decision.
Did you know? Matomo lets you optionally disable the creation of cookies by adding an extra line of code to your tracking code.
See our configuration guide for more information
What are the consequences of using this feature ?
Matomo uses a few first-party cookies, and the following cookies may hold personal data:
- _pk_id: contains a visitor ID used to identify unique visitors
- _pk_ref: identifies where visitors came from
If Matomo cannot set cookies, it will use a technique called fingerprinting. It relies on several pieces of metadata, such as the operating system, browser, browser plugins, IP address and browser language, to name a few, to identify a unique visitor. As this technique is less accurate than the one using cookies, the number of visitors and visits will be affected.
3. Page URLs and page titles
URLs are not mentioned within the official GDPR text. However, depending on the CMS you use, some URLs may include personal identifiers.
For example :
As a result, you need to find a way to anonymize this data.
There are several ways to do this, depending on your website. If your website adds the personal data through query parameters, you can define a rule to exclude them from Matomo.
If the personal data is not included within query parameters, you can use the “setCustomUrl” feature to report a cleaned URL instead.
See our developer documentation for more information
If you are also processing personal data within the title tag, you can use the following function : “setDocumentTitle”.
What are the consequences of using this feature ?
By anonymizing the URLs containing personal data, some of your URLs will be grouped together.
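The URL clean-up described in this section can be sketched in Python as follows. This is an illustration of the idea rather than Matomo’s exclusion rule or tracker code: the function `anonymize_url`, the parameter names in `PERSONAL_PARAMS` and the example URLs are all assumptions made for the example, and the email pattern is deliberately rough.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameter names that carry personal data on an example site.
PERSONAL_PARAMS = {"email", "name", "user"}
# Rough pattern for an email address embedded in a path segment.
EMAIL_SEGMENT = re.compile(r"[^/@\s]+@[^/@\s]+\.[^/@\s]+")


def anonymize_url(url: str) -> str:
    """Drop personal query parameters and mask email-like path segments."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PERSONAL_PARAMS
    ]
    path = EMAIL_SEGMENT.sub("anonymized", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment)
    )


print(anonymize_url("https://example.com/profile?email=jane@example.com&page=2"))
print(anonymize_url("https://example.com/users/jane@example.com/settings"))
print(anonymize_url("https://example.com/users/john@example.com/settings"))
```

The last two calls return the same URL, which illustrates the grouping effect mentioned above: once the personal part is removed, pages that differed only by that part are reported as one.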
4. User ID and custom personal data
User ID is a feature (it requires adding tracking code) which allows you to identify the same user across different devices.
A User ID needs a corresponding database in order to link a user across different devices. It can be an email, a username, a name, a random number… All of these are either direct or indirect online identifiers and are therefore within the scope of GDPR.
The situation is the same if you are using custom variables and/or custom dimensions to push personal data to the system.
To continue using the User ID feature without recording personal data, you can consider using a hash function which will anonymize/convert your actual User ID into something like “3jrj3j34434834urj33j3”.
Alternatively, you can enable the feature “Anonymise User IDs”. This feature will be available starting in Matomo 3.5.0.
What are the consequences of using this feature ?
Under GDPR, a User ID is personal data. Anonymizing the User ID with a hash function or our built-in functionality makes it pseudonymous, which means it can’t easily be linked to a specific user. As a result, you will still get accurate visit and unique visitor metrics, and the Visitor Profile, but without tracking the original User ID, which is personal data.
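As a rough illustration of the hash-function approach, the Python sketch below converts a User ID before it would be sent to the tracker. The choice of a keyed HMAC-SHA-256 is our assumption for the example (the text above only says “a hash function”), it is not Matomo’s built-in “Anonymise User IDs” feature, and `pseudonymize_user_id` and the sample key are hypothetical names and values.

```python
import hashlib
import hmac


def pseudonymize_user_id(user_id: str, secret_key: bytes) -> str:
    """Return a stable, keyed hash of a User ID (illustrative helper).

    The same input always gives the same output, so visits by one user
    can still be linked, but the original ID is not stored.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Assumption: the key is kept secret on your own servers and never sent to the tracker.
SECRET_KEY = b"replace-with-a-long-random-secret"

hashed = pseudonymize_user_id("jane@example.com", SECRET_KEY)
print(len(hashed))  # 64 hexadecimal characters
print(hashed == pseudonymize_user_id("jane@example.com", SECRET_KEY))  # True
```

Using a secret key rather than a plain hash makes it harder to recover an email address or username by simply hashing a list of likely values and comparing the results.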
5. Ecommerce order IDs
Order IDs are the reference numbers assigned to the products/services bought by your customers. As this information can be cross-referenced with your internal database, it is considered an online identifier and is therefore within the scope of GDPR. As with User IDs, you can anonymize order IDs using our built-in functionality to Anonymise Order IDs (see section 4 about User ID).
What are the consequences of anonymizing order IDs?
It really depends on your former use of order IDs. If you were not using them in the past then you should not see any difference.
6. Location
Based on the IP address of a visitor, Matomo can detect the visitor’s location. Location data is problematic for privacy as this technology has become quite accurate and can detect not only the city a visitor is from, but sometimes an even more precise position.
In order to not leave any accurate traces, we strongly recommend enabling the IP anonymization feature. Next, you need to enable the setting “Also use the anonymized IP address when enriching visits”. You will find this setting directly below the IP anonymization. This is important, as otherwise the full IP address will be used to geolocate a visitor.
What are the consequences of anonymizing location data ?
The more bytes you anonymize from the IP, the more anonymized your location will be. When you remove two bytes as suggested, the city and region location reports will not be as accurate. In some cases even the country may not be detected correctly anymore.
7. Heatmaps & Session Recordings
Heatmaps & Session Recording is a premium feature in Matomo allowing you to see where users click, hover, type and scroll. With session recordings you can then replay their actions in a video.
Heatmaps & Session Recordings are within the scope of GDPR as they can, in some specific cases (for example, when a visitor fills in a contact form), disclose personal data.
To avoid this, Matomo will anonymize all keystrokes which a user enters into a form field unless you specifically whitelist a field. Many fields that could contain personal data, such as a credit card, phone number, email address, password, social security number, and more are always anonymized and not recorded.
See our configuration guide for more information
Note that a page may still show personal information within the page as part of regular content (not a form element). For example an address, or the profile page of a forum user. We have added a feature which allows you to set an HTML attribute “data-matomo-mask” to anonymize any personal content shown in the UI.
What are the consequences of using this feature ?
Mainly, you will not be able to see in plain text what people are entering into your forms.
What should you do with past data ?
Once more, we have to say that we are not lawyers. So do not take our answers as legal advice. From: ec.europa.eu/newsroom/article29/document.cfm?doc_id=50053
“For example, as the GDPR requires that a controller must be able to demonstrate that valid consent was obtained, all presumed consents of which no references are kept will automatically be below the consent standard of the GDPR and will need to be renewed.”
Our interpretation is that, if you were previously relying on consent, then unless you can demonstrate that valid consent was obtained, you need to obtain consent again (which is almost impossible) or anonymize or remove that data.
To anonymize previously tracked data, we are actively working on a feature to do just that directly within Matomo. Alternatively, you may also set up the deletion of logs after a certain amount of time.
We really hope you enjoyed reading this article. GDPR is still evolving and we are pretty sure you have a lot of questions about it. You may also want to know more about our view of it. So do not hesitate to ask us through our contact form to see how we are interpreting GDPR at Matomo and InnoCraft.
The post How to not process any personal data with Matomo and what it means for you appeared first on Analytics Platform - Matomo.