Advanced search

Media (91)

Other articles (47)

  • Sites built with MediaSPIP

    2 May 2011

    This page presents some of the sites running MediaSPIP.
    You can of course add your own using the form at the bottom of the page.

  • Support for all media types

    10 April 2011

    Unlike many modern document-sharing programs and platforms, MediaSPIP aims to handle as many different document formats as possible, whether they are : images (png, gif, jpg, bmp and others...) ; audio (MP3, Ogg, Wav and others...) ; video (Avi, MP4, Ogv, mpg, mov, wmv and others...) ; textual content, code or other (Open Office, Microsoft Office (spreadsheets, presentations), web (html, css), LaTeX, Google Earth) (...)

  • Media-specific libraries and software

    10 December 2010

    For correct and optimal operation, several things need to be taken into consideration.
    After installing apache2, mysql and php5, it is important to install the other required software, whose installation is described in the related links. A set of multimedia libraries (x264, libtheora, libvpx) is used for encoding and decoding video and audio, in order to support as many file types as possible. See : this tutorial ; FFMpeg with the maximum number of decoders and (...)

On other sites (9437)

  • What is Audience Segmentation ? The 5 Main Types & Examples

    16 November 2023, by Erin — Analytics Tips

    The days of mass marketing with the same message for millions are long gone. Today, savvy marketers instead focus on delivering the most relevant message to the right person at the right time.

    They do this at scale by segmenting their audiences based on various data points. This isn’t an easy process because there are many types of audience segmentation. If you take the wrong approach, you risk delivering irrelevant messages to your audience — or breaking their trust with poor data management.

    In this article, we’ll break down the most common types of audience segmentation, share examples highlighting their usefulness and cover how you can segment campaigns without breaking data regulations.

    What is audience segmentation ?

    Audience segmentation is when you divide your audience into multiple smaller specific audiences based on various factors. The goal is to deliver a more targeted marketing message or to glean unique insights from analytics.

    It can be as broad as dividing a marketing campaign by location or as specific as separating audiences by their interests, hobbies and behaviour.

    Illustration of basic audience segmentation

    Audience segmentation inherently makes a lot of sense. Consider this : an urban office worker and a rural farmer have vastly different needs. By targeting your marketing efforts towards agricultural workers in rural areas, you’re homing in on a group more likely to be interested in farm equipment.

    Audience segmentation has existed since the beginning of marketing. Advertisers used to select magazines and placements based on who typically read them. They would run a golf club ad in a golf magazine, not in the national newspaper.

    What has changed is how narrow you can make your audience segments by combining multiple data points.

    Why audience segmentation matters

    In a survey by McKinsey, 71% of consumers said they expected personalisation, and 76% said they get frustrated when a vendor doesn’t deliver it.

    Illustrated statistics that show the importance of personalisation

    These numbers reflect expectations from consumers who have actively engaged with a brand — created an account, signed up for an email list or purchased a product.

    They expect you to take that data and give them relevant product recommendations — like a shoe polishing kit if you bought nice leather loafers.

    If you don’t do any sort of audience segmentation, you’re likely to frustrate your customers with post-sale campaigns. If, for example, you just send the same follow-up email to all customers, you’d damage many relationships. Some might ask : “What ? Why would you think I need that ?” Then they’d promptly opt out of your email marketing campaigns.

    To avoid that, you need to segment your audience so you can deliver relevant content at all stages of the customer journey.

    5 key types of audience segmentation

    To help you deliver the right content to the right person or identify crucial insights in analytics, you can use five types of audience segmentation : demographic, behavioural, psychographic, technographic and transactional.

    Diagram of the main types of audience segmentation

    Demographic segmentation 

    Demographic segmentation is when you segment a larger audience based on demographic data points like location, age or other factors.

    The most basic demographic segmentation factor is location, which is easy to leverage in marketing efforts. For example, geographic segmentation can use IP addresses and separate marketing efforts by country. 
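
    As an illustration, here is a minimal sketch of country-level segmentation from an IP address. It assumes MaxMind’s geoip2 Python package and a GeoLite2 database file ; the article prescribes neither, so treat the tooling and names as placeholders :

    import geoip2.database
    import geoip2.errors

    # Path to a GeoLite2 country database, downloaded separately from MaxMind.
    reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

    def country_segment(ip_address):
        # Return an ISO country code used to bucket a visitor into a campaign.
        try:
            return reader.country(ip_address).country.iso_code or "UNKNOWN"
        except geoip2.errors.AddressNotFoundError:
            return "UNKNOWN"

    # Documentation-range IP ; real visitor IPs resolve to codes like "FR" or "US".
    print(country_segment("203.0.113.7"))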

    But more advanced demographic data points are becoming increasingly sensitive to handle, especially in Europe, where the GDPR makes advanced demographics a delicate subject. Using age, education level and employment to target marketing campaigns is still possible, but you need to navigate this terrain thoughtfully and responsibly, ensuring meticulous adherence to privacy regulations.

    Potential data points :

    • Location
    • Age
    • Marital status
    • Income
    • Employment 
    • Education

    Example of effective demographic segmentation :

    A clothing brand targeting diverse locations needs to account for the varying weather conditions. In colder regions, showcasing winter collections or insulated clothing might resonate more with the audience. Conversely, in warmer climates, promoting lightweight or summer attire could be more effective. 

    Here are two ads run by North Face on Facebook and Instagram to different audiences to highlight different collections :

    Each collection is featured differently, with its own approach to the copy and even the media. With social media ads, targeting people based on advanced demographics is simple enough : you can just select the relevant factors when setting up your campaign. But if you don’t want to rely on these data-mining platforms, you still have options for segmentation.

    Consider allowing people to self-select their interests or preferences by incorporating a short survey within your email sign-up form. This simple addition can enhance engagement, decrease bounce rates, and ultimately improve conversion rates, offering valuable insights into audience preferences.

    This is a great way to segment ethically, without relying on data-mining companies.

    Behavioural segmentation

    Behavioural segmentation segments audiences based on their interaction with your website or app.

    You use various data points to segment your target audience based on their actions.

    Potential data points :

    • Page visits
    • Referral source
    • Clicks
    • Downloads
    • Video plays
    • Goal completion (e.g., signing up for a newsletter or purchasing a product)

    Example of using behavioural segmentation to improve campaign efficiency :

    One effective method involves using a web analytics tool such as Matomo to uncover patterns. By segmenting on actions like specific clicks and downloads, you can pinpoint valuable trends and identify the actions that significantly enhance visitor conversions.

    Example of a segmented behavioral analysis in Matomo
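
    For illustration, here is a minimal sketch of querying such a behavioural segment through Matomo’s HTTP Reporting API. The instance URL, site ID, token and segment definition below are placeholders, not from the article :

    import requests

    params = {
        "module": "API",
        "method": "VisitsSummary.get",   # visits, actions, bounce rate, etc.
        "idSite": 1,
        "period": "month",
        "date": "today",
        "format": "JSON",
        # Visits that triggered a "Videos" event AND converted a goal.
        "segment": "eventCategory==Videos;visitConverted==1",
        "token_auth": "YOUR_TOKEN",      # replace with a real API token
    }

    # Hypothetical Matomo instance ; replace with your own URL.
    response = requests.get("https://analytics.example.com/index.php",
                            params=params, timeout=30)
    response.raise_for_status()
    print(response.json())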

    For instance, if a case study video substantially boosts conversion rates, elevate its prominence to capitalise on this success.

    Then, you can set up a conditional CTA within the video player. Make it pop up after the user has watched the entire video. Use a specific form and sign them up to a specific segment for each case study. This way, you know the prospect’s ideal use case without surveying them.

    This is an example of behavioural segmentation that doesn’t rely on third-party cookies.

    Psychographic segmentation

    Psychographic segmentation is when you segment audiences based on your interpretation of their personality or preferences.

    Potential data points :

    • Social media patterns
    • Follows
    • Hobbies
    • Interests

    Example of effective psychographic segmentation :

    Here, Adidas segments its audience based on whether they like cycling or rugby. It makes no sense to show a rugby ad to someone who’s into cycling and vice versa. But to rugby athletes, the ad is very relevant.

    If you want to avoid social platforms, you can use surveys about hobbies and interests to segment your target audience in an ethical way.

    Technographic segmentation

    Technographic segmentation is when you single out specific parts of your audience based on which hardware or software they use.

    Potential data points :

    • Type of device used
    • Device model or brand
    • Browser used

    Example of segmenting by device type to improve user experience :

    Upon noticing a considerable influx of tablet users accessing their platform, a leading news outlet decided to optimise their tablet browsing experience. They overhauled the website interface, focusing on smoother navigation and better readability for tablet users. These changes offered tablet users a seamless and enjoyable reading experience tailored precisely to their device.
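
    A server-side sketch of that device-type bucketing, assuming the third-party user-agents package (the article names no specific tooling) :

    from user_agents import parse  # pip install user-agents

    def device_segment(user_agent_header):
        # Bucket a visitor by the device type parsed from the User-Agent header.
        ua = parse(user_agent_header)
        if ua.is_tablet:
            return "tablet"
        if ua.is_mobile:
            return "mobile"
        if ua.is_pc:
            return "desktop"
        return "other"

    # An iPad User-Agent string resolves to the "tablet" segment.
    print(device_segment(
        "Mozilla/5.0 (iPad; CPU OS 16_0 like Mac OS X) AppleWebKit/605.1.15"))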

    Transactional segmentation

    Transactional segmentation is when you use your customers’ purchase history to better target your marketing message to their needs.

    When consumers say they prefer personalisation, they typically mean personalisation based on their actual transactions, not their social media profiles.

    Potential data points :

    • Average order value
    • Product categories purchased within X months
    • X days since the last purchase of a consumable product

    Example of effective transactional segmentation :

    A pet supply store identifies a segment of customers consistently purchasing cat food but not other pet products. They create targeted email campaigns offering discounts or loyalty rewards specifically for cat-related items to encourage repeat purchases within this segment.
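
    A toy sketch of how that cat-food segment could be computed from an order table, assuming pandas and invented column names :

    import pandas as pd

    orders = pd.DataFrame({
        "customer_id": [1, 1, 2, 3, 3],
        "category": ["cat_food", "cat_food", "dog_food", "cat_food", "toys"],
        "date": pd.to_datetime(["2023-09-01", "2023-10-15", "2023-10-02",
                                "2023-08-20", "2023-09-30"]),
    })

    # Keep the last six months of orders, then collect each customer's categories.
    cutoff = pd.Timestamp("2023-11-16") - pd.DateOffset(months=6)
    recent = orders[orders["date"] >= cutoff]
    by_customer = recent.groupby("customer_id")["category"].agg(set)

    # Customers whose only recent purchases are cat food -> cat-item campaign.
    cat_only = by_customer[by_customer.apply(lambda c: c == {"cat_food"})].index
    print(list(cat_only))  # [1] in this toy data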

    If you want to improve customer loyalty and increase revenue, the last thing you should do is send generic marketing emails. Relevant product recommendations or coupons are the best way to use transactional segmentation.

    B2B-specific : Firmographic segmentation

    Beyond the five main segmentation types, B2B marketers often use “firmographic” factors when segmenting their campaigns. It’s a way to segment campaigns that goes beyond the considerations of the individual.

    Potential data points :

    • Company size
    • Number of employees
    • Company industry
    • Geographic location (office)

    Example of effective firmographic segmentation :

    Companies of different sizes won’t need the same solution — so segmenting leads by company size is one of the most common and effective examples of B2B audience segmentation.

    The difference here is that B2B campaigns are often segmented through manual research. With an account-based marketing approach, you start by researching your potential customers. You then separate the target audience into smaller segments (or even a one-to-one campaign).

    Start segmenting and analysing your audience more deeply with Matomo

    Segmentation is a great place to start if you want to level up your marketing efforts. Modern consumers expect to get relevant content, and you must give it to them.

    But doing so in a privacy-sensitive way is not always easy. You need the right approach to segment your customer base without alienating them or breaking regulations.

    That’s where Matomo comes in. Matomo champions privacy compliance while offering comprehensive insights and segmentation capabilities. With robust privacy controls and cookieless configuration, it ensures GDPR and other regulations are met, empowering data-driven decisions without compromising user privacy.

    Take advantage of our 21-day free trial to get insights that can help you improve your marketing strategy and better reach your target audience. No credit card required.

  • Developing MobyCAIRO

    26 May 2021, by Multimedia Mike — General

    I recently published a tool called MobyCAIRO. The ‘CAIRO’ part stands for Computer-Assisted Image ROtation, while the ‘Moby’ prefix refers to its role in helping process artifact image scans to submit to the MobyGames database. The tool is meant to provide an accelerated workflow for rotating and cropping image scans. It works on both Windows and Linux. Hopefully, it can solve similar workflow problems for other people.

    As of this writing, MobyCAIRO has not been tested on Mac OS X yet– I expect some issues there that should be easily solvable if someone cares to test it.

    The rest of this post describes my motivations and how I arrived at the solution.

    Background
    I have scanned well in excess of 2100 images for MobyGames and other purposes in the past 16 years or so. The workflow looks like this :


    Workflow diagram

    Image workflow


    It should be noted that my original workflow featured me manually rotating the artifact on the scanner bed in order to ensure straightness, because I guess I thought that rotate functions in image editing programs constituted dark, unholy magic or something. So my workflow used to be even more arduous :


    Longer workflow diagram

    I can’t believe I had the patience to do this for hundreds of scans


    Sometime last year, I was sitting down to perform some more scanning and found myself dreading the oncoming tedium of straightening and cropping the images. This prompted a pivotal question :


    Why can’t a computer do this for me ?

    After all, I have always been a huge proponent of making computers handle the most tedious, repetitive, mind-numbing, and error-prone tasks. So I did some web searching to find if there were any solutions that dealt with this. I also consulted with some like-minded folks who have to cope with the same tedious workflow.

    I came up empty-handed. So I endeavored to develop my own solution.

    Problem Statement and Prior Work

    I want to develop a workflow that can automatically rotate an image so that it is straight, and also find the most likely crop rectangle, uniformly whitening the area outside of the crop area (in the case of circles).

    As mentioned, I checked to see if any other programs can handle this, starting with my usual workhorse, Photoshop Elements. But I can’t expect the trimmed down version to do everything. I tried to find out if its big brother could handle the task, but couldn’t find a definitive answer on that. Nor could I find any other tools that seem to take an interest in optimizing this particular workflow.

    When I brought this up to some peers, I received some suggestions, including an idea that the venerable GIMP had a feature like this, but I could not find any evidence. Further, I would get responses of “Program XYZ can do image rotation and cropping.” I had to tamp down on the snark to avoid saying “Wow ! An image editor that can perform rotation AND cropping ? What a game-changer !” Rotation and cropping features are table stakes for any halfway competent image editor for the last 25 or so years at least. I am hoping to find or create a program which can lend a bit of programmatic assistance to the task.

    Why can’t other programs handle this ? The answer seems fairly obvious : Image editing tools are general tools and I want a highly customized workflow. It’s not reasonable to expect a turnkey solution to do this.

    Brainstorming An Approach
    I started with the happiest of happy cases— A disc that needed archiving (a marketing/press assets CD-ROM from a video game company, contents described here) which appeared to have some pretty clear straight lines :


    Ubisoft 2004 Product Catalog CD-ROM

    My idea was to try to find straight lines in the image and then rotate the image so that the longest single straight line detected is parallel to the horizontal.

    I just needed to figure out how to find a straight line inside of an image. Fortunately, I quickly learned that this is very much a solved problem thanks to something called the Hough transform. As a bonus, I read that this is also the tool I would want to use for finding circles, when I got to that part. The nice thing about knowing the formal algorithm to use is being able to find efficient, optimized libraries which already implement it.

    Early Prototype
    A little searching for how to perform a Hough transform in Python led me first to scikit-image. I was able to rapidly produce a prototype that did some basic image processing. However, running the Hough transform directly on the image and rotating according to the longest line segment discovered turned out not to yield the expected results.


    Sub-optimal rotation

    It also took a very long time to chew on the 3300×3300 raw image– certainly longer than I care to wait for an accelerated workflow concept. The key, however, is that you are apparently not supposed to run the Hough transform on a raw image– you need to compute the edges first, and then attempt to determine which edges are ‘straight’. The recommended algorithm for this step is the Canny edge detector. After applying this, I get the expected rotation :


    Perfect rotation

    The algorithm also completes in a few seconds. So this is a good early result and I was feeling pretty confident. But, again– happiest of happy cases. I should also mention at this point that I had originally envisioned a tool that I would simply run against a scanned image and it would automatically/magically make the image straight, followed by a perfect crop.
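
    Pieced together from the description above, the early prototype logic would look roughly like this (a reconstruction using scikit-image, not the post’s actual code ; the file names are placeholders) :

    import numpy as np
    from skimage import color, feature, io, transform

    image = io.imread("scan.png")          # hypothetical input scan
    gray = color.rgb2gray(image)

    # Compute edges first ; running Hough on the raw image gave bad results.
    edges = feature.canny(gray, sigma=2)
    lines = transform.probabilistic_hough_line(
        edges, threshold=10, line_length=200, line_gap=5)

    # Rotate so the longest detected segment becomes horizontal.
    def length(line):
        (x0, y0), (x1, y1) = line
        return np.hypot(x1 - x0, y1 - y0)

    (x0, y0), (x1, y1) = max(lines, key=length)
    angle = np.degrees(np.arctan2(y1 - y0, x1 - x0))
    straightened = transform.rotate(image, angle, resize=True, cval=1.0)
    io.imsave("scan-straight.png", (straightened * 255).astype(np.uint8))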

    Along came my MobyGames comrade Foxhack to disabuse me of the hope of ever developing a fully automated tool. Just try and find a usefully long straight line in this :


    Nascar 07 Xbox Scan, incorrectly rotated

    Darn it, Foxhack…

    There are straight edges, to be sure. But my initial brainstorm of rotating according to the longest straight edge looks infeasible. Further, it’s at this point that we start brainstorming that perhaps we could match on ratings badges such as the standard ESRB badges omnipresent on U.S. video games. This gets into feature detection and complicates things.

    This Needs To Be Interactive
    At this point in the effort, I came to terms with the fact that the solution will need to have some element of interactivity. I will also need to get out of my safe Linux haven and figure out how to develop this on a Windows desktop, something I am not experienced with.

    I initially dreamed up an impressive beast of a program written in C++ that leverages Windows desktop GUI frameworks, OpenGL for display and real-time rotation, GPU acceleration for image analysis and processing tricks, and some novel input concepts. I thought GPU acceleration would be crucial since I have a fairly good GPU on my main Windows desktop and I hear that these things are pretty good at image processing.

    I created a list of prototyping tasks on a Trello board and made a decent amount of headway on prototyping all the various pieces that I would need to tie together in order to make this a reality. But it was ultimately slow going when I could only grab an hour or two here and there to try to get anything done.

    Settling On A Solution
    Recently, I was determined to get a set of old shareware discs archived. I ripped the data a year ago but I was blocked on the scanning task because I knew that would also involve tedious straightening and cropping. So I finally got all the scans done, which was reasonably quick. But I was determined to not manually post-process them.

    This was fairly recent, but I can’t quite recall how I managed to come across the OpenCV library and its Python bindings. OpenCV is an amazing library that provides a significant toolbox for performing image processing tasks. Not only that, it provides “just enough” UI primitives to be able to quickly create a basic GUI for your program, including image display via multiple windows, buttons, and keyboard/mouse input. Furthermore, OpenCV seems to be plenty fast enough to do everything I need in real time, just with (accelerated where appropriate) CPU processing.

    So I went to work porting the ideas from the simple standalone Python/scikit-image tool. I thought of a refinement to the straight line detector– instead of just finding the longest straight edge, it creates a histogram of 360 rotation angles, and builds a list of lines corresponding to each angle. Then it sorts the angles by cumulative line length and allows the user to iterate through this list, which will hopefully provide the most likely straightened angle up front. Further, the tool allows making fine adjustments of 1/10 of a degree via the keyboard, not the mouse. It does all this while highlighting in red the straight line segments that are parallel to the horizontal axis, per the current candidate angle.
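
    A condensed sketch of that angle histogram (my own approximation, not MobyCAIRO’s actual code), using OpenCV’s probabilistic Hough transform :

    from collections import defaultdict

    import cv2
    import numpy as np

    gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                               minLineLength=100, maxLineGap=10)

    # Bucket each segment by its angle ; rank angles by cumulative length.
    # (Assumes at least one segment was found.)
    length_by_angle = defaultdict(float)
    lines_by_angle = defaultdict(list)
    for x0, y0, x1, y1 in segments[:, 0]:
        angle = int(round(np.degrees(np.arctan2(y1 - y0, x1 - x0)))) % 360
        length_by_angle[angle] += np.hypot(x1 - x0, y1 - y0)
        lines_by_angle[angle].append((x0, y0, x1, y1))

    # Most likely straightening candidates first ; the user cycles through these.
    candidates = sorted(length_by_angle, key=length_by_angle.get, reverse=True)
    print(candidates[:5])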


    MobyCAIRO - rotation interface

    The tool draws a light-colored grid over the frame to aid the user in visually verifying the straightness of the image. Further, the program has a mode that allows the user to see the algorithm’s detected edges :


    MobyCAIRO - show detected lines

    For the cropping phase, the program uses the Hough circle transform in a similar manner, finding the most likely circles (if the image to be processed is supposed to be a circle) and allowing the user to cycle among them while making precise adjustments via the keyboard, again, rather than the mouse.


    MobyCAIRO - assisted circle crop

    Running the Hough circle transform is a significantly more intensive operation than the line transform. When I ran it on a full 3300×3300 image, it ran for a long time. I didn’t let it run longer than a minute before forcibly ending the program. Is this approach unworkable ? Not quite– It turns out that the transform is just as effective when shrinking the image to 400×400, and completes in under 2 seconds on my Core i5 CPU.
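
    A sketch of that downscaled circle search (the parameter values and file name are my guesses, not the tool’s) :

    import cv2

    gray = cv2.imread("disc-scan.png", cv2.IMREAD_GRAYSCALE)
    scale = gray.shape[0] / 400.0          # assume a roughly square scan
    small = cv2.resize(gray, (400, 400))

    # The Hough circle transform on a 400x400 image finishes in seconds.
    circles = cv2.HoughCircles(small, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
                               param1=100, param2=60,
                               minRadius=150, maxRadius=200)

    if circles is not None:
        # Map candidate circles back to full-resolution coordinates.
        for cx, cy, r in circles[0]:
            print(int(cx * scale), int(cy * scale), int(r * scale))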

    For rectangular cropping, I just settled on using OpenCV’s built-in region-of-interest (ROI) facility. I tried to intelligently find the best candidate rectangle and allow fine adjustments via the keyboard, but I wasn’t having much success, so I took a path of lesser resistance.
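
    For reference, the built-in facility is cv2.selectROI ; a minimal use looks like this (file names are placeholders) :

    import cv2

    img = cv2.imread("scan-straight.png")
    # Drag a rectangle, then press ENTER or SPACE to confirm the selection.
    x, y, w, h = cv2.selectROI("crop", img, showCrosshair=True)
    cv2.destroyAllWindows()
    cropped = img[y:y + h, x:x + w]
    cv2.imwrite("scan-cropped.png", cropped)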

    Packaging and Residual Weirdness
    I realized that this tool would be more useful to a broader Windows-using base of digital preservationists if they didn’t have to install Python, establish a virtual environment, and install the prerequisite dependencies. Thus, I made the effort to figure out how to wrap the entire thing up into a monolithic Windows EXE binary. It is available from the project’s Github release page (another thing I figured out for the sake of this project !).

    The binary is pretty heavy, weighing in at a bit over 50 megabytes. You might advise using compression– it IS compressed ! Before I figured out the --onefile command for pyinstaller.exe, the generated dist/ subdirectory was 150 MB. Among other things, there’s a 30 MB FORTRAN BLAS library packaged in !

    Conclusion and Future Directions
    Once I got it all working with a simple tkinter UI up front in order to select between circle and rectangle crop modes, I unleashed the tool on 60 or so scans in bulk, using the Windows forfiles command (another learning experience). I didn’t put a clock on the effort, but it felt faster. Of course, I was bursting with pride the whole time because I was using my own tool. I just wish I had thought of it sooner. But, really, with 2100+ scans under my belt, I’m just getting started– I literally have thousands more artifacts to scan for preservation.

    The tool isn’t perfect, of course. Just tonight, I threw another scan at MobyCAIRO. Just go ahead and try to find straight lines in this specimen :


    Reading Who? Reading You! CD-ROM

    I eventually had to use the text left and right of center to line up against the grid with the manual keyboard adjustments. Still, I’m impressed by how these computer vision algorithms can see patterns I can’t, highlighting lines I never would have guessed at.

    I’m eager to play with OpenCV some more, particularly the video processing functions, perhaps even some GPU-accelerated versions.

    The post Developing MobyCAIRO first appeared on Breaking Eggs And Making Omelettes.

  • Open Media Developers Track at OVC 2011

    11 October 2011, by silvia

    The Open Video Conference that took place on 10-12 September was so overwhelming, I’ve still not been able to catch my breath ! It was a dense three days for me, even though I only focused on the technology sessions of the conference and utterly missed out on all the policy and content discussions.

    Roughly 60 people participated in the Open Media Software (OMS) developers track. This was an amazing group of people capable and willing to shape the future of video technology on the Web :

    • HTML5 video developers from Apple, Google, Opera, and Mozilla (though we missed the NZ folks),
    • codec developers from WebM, Xiph, and MPEG,
    • Web video developers from YouTube, JWPlayer, Kaltura, VideoJS, PopcornJS, etc.,
    • content publishers from Wikipedia, Internet Archive, YouTube, Netflix, etc.,
    • open source tool developers from FFmpeg, gstreamer, flumotion, VideoLAN, PiTiVi, etc.,
    • and many more.

    To provide a summary of all the discussions would be impossible, so I just want to share the key take-aways that I had from the main sessions.

    WebRTC : Realtime Communications and HTML5

    Tim Terriberry (Mozilla), Serge Lachapelle (Google) and Ethan Hugg (CISCO) moderated this session together (slides). There are activities both at the W3C and at IETF – the ones at IETF are supposed to focus on protocols, while the W3C ones on HTML5 extensions.

    The current proposal of a PeerConnection API has been implemented in WebKit/Chrome as open source. It is expected that Firefox will have an add-on by Q1 next year. It enables video conferencing, including media capture, media encoding, signal processing (echo cancellation etc), secure transmission, and a data stream exchange.

    Current discussions are around the signalling protocol and whether SIP needs to be required by the standard. Further, the codec question is under discussion, in particular whether to mandate VP8 and Opus, since transcoding gateways are not desirable. Another question is how to measure the quality of the connection and how to report errors so as to allow adaptation.

    What always amazes me around RTC is the sheer number of specialised protocols that seem to be required to implement this. WebRTC does not disappoint : in fact, the question was asked whether there could be a lighter alternative than to re-use dozens of years of protocol development – is it over-engineered ? Can desktop players connect to a WebRTC session ?

    We are already in a second or third revision of this part of the HTML5 specification and yet it seems the requirements are still being collected. I’m quietly confident that everything is done to make the lives of the Web developer easier, but it sure looks like a huge task.

    The Missing Link : Flash to HTML5

    Zohar Babin (Kaltura) and myself moderated this session and I must admit that this session was the biggest eye-opener for me amongst all the sessions. There was a large number of Flash developers present in the room and that was great, because sometimes we just don’t listen enough to lessons learnt in the past.

    This session gave me one of those aha-moments : in the form of the Flash appendBytes() API function.

    The appendBytes() function allows a Flash developer to take a byteArray out of a connected video resource and do something with it – such as feed it to a video for display. When I heard that Web developers want that functionality for JavaScript and the video element, too, I instinctively rejected the idea wondering why on earth would a Web developer want to touch encoded video bytes – why not leave that to the browser.

    But as it turns out, this is actually a really powerful enabler of functionality. For example, you can use it to :

    • display mid-roll video ads as part of the same video element,
    • sequence playlists of videos into the same video element,
    • implement DVR functionality (high-speed seeking),
    • do mash-ups,
    • do video editing,
    • implement adaptive streaming.

    This totally blew my mind and I am now completely supportive of having such a function in HTML5. Together with media fragment URIs you could even leave all the header download management for resources to the Web browser and just request time ranges from a video through an appendBytes() function. This would be easier on the Web developer than having to deal with byte ranges and making sure that appropriate decoding pipelines are set up.

    Standards for Video Accessibility

    Philip Jagenstedt (Opera) and myself moderated this session. We focused on the HTML5 track element and the WebVTT file format. Many issues were identified that will still require work.

    One particular topic was to find a standard means of rendering the UI for caption, subtitle, and description selection. For example, what icons should be used to indicate that subtitles or captions are available. While this is not part of the HTML5 specification, it’s still important to get this right across browsers since otherwise users will get confused by diverging interfaces.

    Chaptering was discussed and a particular need to allow URLs to directly point at chapters was expressed. I suggested the use of named Media Fragment URLs.

    The use of WebVTT for descriptions for the blind was also discussed. A suggestion was made to use the voice tag <v> to allow for “styling” (i.e. selection) of the screen reader voice.

    Finally, multitrack audio or video resources were also discussed and the @mediagroup attribute was explained. A question about how to identify the language used in different alternative dubs was asked. This is an issue because @srclang is not on audio or video, only on text, so it’s a missing feature for the multitrack API.

    Beyond this session, there was also a breakout session on WebVTT and the track element. As a consequence, a number of bugs were registered in the W3C bug tracker.

    WebM : Testing, Metrics and New features

    This session was moderated by John Luther and John Koleszar, both of the WebM Project. They started off with a presentation on current work on WebM, which includes quality testing and improvements, and encoder speed improvement. Then they moved on to questions about how to involve the community more.

    The community criticised that communication of what is happening around WebM is very scarce. More sharing of information was requested, including a move to using open Google+ hangouts instead of Google internal video conferences. More use of the public bug tracker can also help include the community better.

    Another pain point of the community was that code is introduced and removed without much feedback. It was requested to introduce a peer review process. Also it was requested that example code snippets are published when new features are announced so others can replicate the claims.

    This all indicates to me that the WebM project is becoming more open, but that there is still a lot to learn.

    Standards for HTTP Adaptive Streaming

    This session was moderated by Frank Galligan and Aaron Colwell (Google), and Mark Watson (Netflix).

    Mark started off by giving us an introduction to MPEG DASH, the MPEG file format for HTTP adaptive streaming. MPEG has just finalized the format and he was able to show us some examples. DASH is XML-based and thus rather verbose. It is covering all eventualities of what parameters could be switched during transmissions, which makes it very broad. These include trick modes e.g. for fast forwarding, 3D, multi-view and multitrack content.

    MPEG have defined profiles – one for live streaming which requires chunking of the files on the server, and one for on-demand which requires keyframe alignment of the files. There are clear specifications for how to do these with MPEG. Such profiles would need to be created for WebM and Ogg Theora, too, to make DASH universally applicable.

    Further, the Web case needs a more restrictive adaptation approach, since the video element’s API is already accounting for some of the features that DASH provides for desktop applications. So, a Web-specific profile of DASH would be required.

    Then Aaron introduced us to the MediaSource API and in particular the webkitSourceAppend() extension that he has been experimenting with. It is essentially an implementation of the appendBytes() function of Flash, which the Web developers had been asking for just a few sessions earlier. This was likely the biggest announcement of OVC, albeit a quiet and technically-focused one.

    Aaron explained that he had been trying to find a way to implement HTTP adaptive streaming into WebKit in a way in which it could be standardised. While doing so, he also came across other requirements around such chunked video handling, in particular around dynamic ad insertion, live streaming, DVR functionality (fast forward), constraint video editing, and mashups. While trying to sort out all these requirements, it became clear that it would be very difficult to implement strategies for stream switching, buffering and delivery of video chunks into the browser when so many different and likely contradictory requirements exist. Also, once an approach is implemented and specified for the browser, it becomes very difficult to innovate on it.

    Instead, the easiest way to solve it right now and learn about what would be necessary to implement into the browser would be to actually allow Web developers to queue up a chunk of encoded video into a video element for decoding and display. Thus, the webkitSourceAppend() function was born (specification).

    The proposed extension to the HTMLMediaElement is as follows :

    partial interface HTMLMediaElement {
      // URL passed to src attribute to enable the media source logic.
      readonly attribute [URL] DOMString webkitMediaSourceURL;

      bool webkitSourceAppend(in Uint8Array data);

      // end of stream status codes.
      const unsigned short EOS_NO_ERROR = 0;
      const unsigned short EOS_NETWORK_ERR = 1;
      const unsigned short EOS_DECODE_ERR = 2;

      void webkitSourceEndOfStream(in unsigned short status);

      // states
      const unsigned short SOURCE_CLOSED = 0;
      const unsigned short SOURCE_OPEN = 1;
      const unsigned short SOURCE_ENDED = 2;

      readonly attribute unsigned short webkitSourceState;
    };

    The code is already checked into WebKit, but commented out behind a command-line compiler flag.

    Frank then stepped forward to show how webkitSourceAppend() can be used to implement HTTP adaptive streaming. His example uses WebM – there are no examples with MPEG or Ogg yet.

    The chunks that Frank’s demo used were 150 video frames (6.25s) long, with 5s-long audio chunks. Stream switching only switched video, since audio data requires much lower bandwidth and is more important to retain at high quality. Switching was done on multiplexed files.

    Every chunk requires an XHR range request – this could be optimised if the connections were kept open per adaptation. Seeking works, too, but since decoding requires download of a whole chunk, seeking latency is determined by the time it takes to download and decode that chunk.

    Similar to DASH, when using this approach for live streaming, the server has to produce one file per chunk, since byte range requests are not possible on a continuously growing file.

    Frank did not use DASH as the manifest format for his HTTP adaptive streaming demo, but instead used a hacked-up custom XML format. It would be possible to use JSON or any other format, too.

    After this session, I was actually completely blown away by the possibilities that such a simple API extension allows. If I wasn’t sold on the idea of an appendBytes() function in the earlier session, this one completely changed my mind. While I still believe we need to standardise an HTTP adaptive streaming file format that all browsers will support for all codecs, and I still believe that a native implementation for support of such a file format is necessary, I also believe that this approach of webkitSourceAppend() is what HTML needs – and maybe it needs it faster than native HTTP adaptive streaming support.

    Standards for Browser Video Playback Metrics

    This session was moderated by Zachary Ozer and Pablo Schklowsky (JWPlayer). Their motivation for the topic was, in fact, also HTTP adaptive streaming. Once you leave the decisions about when to do stream switching to JavaScript (through a function such as webkitSourceAppend()), you have to expose stream metrics to the JS developer so they can make informed decisions. The other use case is, of course, monitoring the quality of video delivery for reporting to the provider, who may then decide to change their delivery environment.

    The discussion found that we really care about metrics on three different levels :

    • measuring the network performance (bandwidth)
    • measuring the decoding pipeline performance
    • measuring the display quality

    In the end, it seemed that work previously done by Steve Lacey on a proposal for video metrics was generally acceptable, except for the playbackJitter metric, which may be too aggregate to mean much.

    Device Inputs / A/V in the Browser

    I didn’t actually attend this session held by Anant Narayanan (Mozilla), but from what I heard, the discussion focused on how to manage permission of access to video camera, microphone and screen, e.g. when multiple applications (tabs) want access or when the same site wants access in a different session. This may apply to real-time communication with screen sharing, but also to photo sharing, video upload, or canvas access to devices e.g. for time lapse photography.

    Open Video Editors

    This was another session that I wasn’t able to attend, but I believe the creation of good open source video editing software and similar video creation software is really crucial to giving video a broader user appeal.

    Jeff Fortin (PiTiVi) moderated this session and I was fascinated to later see his analysis of the lifecycle of open source video editors. It is shocking to see how many people/projects have tried to create an open source video editor and how many have stopped their project. It is likely that the creation of a video editor is such a complex challenge that it requires a larger and more committed open source project – single people will just run out of steam too quickly. This may be comparable to the creation of a Web browser (see the size of the Mozilla project) or a text processing system (see the size of the OpenOffice project).

    Jeff also mentioned the need to create open video editor standards around playlist file formats etc. Possibly the Open Video Alliance could help. In any case, something has to be done in this space – maybe this would be a good topic to focus next year’s OVC on ?

    Monday’s Breakout Groups

    The conference ended officially on Sunday night, but we had a third day of discussions / hackday at the wonderful New York Lawschool venue. We had collected issues of interest during the two previous days and organised the breakout groups on the morning (Schedule).

    In the Content Protection/DRM session, Mark Watson from Netflix explained how their API works and that they believe all we need in browsers is a secure way to exchange keys and an indicator of which protection scheme is used – the actual protection scheme would not be implemented by the browser, but provided by the underlying system (media framework/operating system). I think that until somebody actually implements something in a browser fork and shows how this can be done, we won’t have much progress. In my understanding, we may also need to disable part of the video API for encrypted content, because otherwise you can always e.g. grab frames from the video element into a canvas and save them from there.

    In the Playlists and Gapless Playback session, there was massive brainstorming about what new cool things can be done with the video element in browsers if playback between snippets can be made seamless. Further discussions were about standard playlist file formats (such as XSPF, MRSS or M3U), media fragment URIs in playlists for mashups, and the need to expose track metadata for HTML5 media elements.

    What more can I say ? It was an amazing three days and the complexity of problems that we’re dealing with is a tribute to how far HTML5 and open video has already come and exciting news for the kind of applications that will be possible (both professional and community) once we’ve solved the problems of today. It will be exciting to see what progress we will have made by next year’s conference.

    Thanks go to Google for sponsoring my trip to OVC.

    UPDATE : We actually have a mailing list for open media developers who are interested in these and similar topics – do join at http://lists.annodex.net/cgi-bin/mailman/listinfo/foms.