ginger’s thoughts

http://blog.gingertech.net/

Articles published on this site

  • The use cases for a <main> element in HTML

    1 January 2014, by silvia

    The W3C HTML WG and the WHATWG are currently discussing the introduction of a <main> element into HTML.

    The <main> element has been proposed by Steve Faulkner and is specified in a draft extension spec which is about to be accepted as a FPWD (first public working draft) by the W3C HTML WG. This implies that the W3C HTML WG will be looking for implementations and for feedback from implementers on this spec.

    I am supportive of the introduction of a <main> element into HTML. However, I believe that the current spec and use case list don’t make a good enough case for its introduction. Here are my thoughts.

    Main use case: accessibility

    In my opinion, the main use case for the introduction of <main> is accessibility.

    Like any other user, a blind user who wants to perceive a Web page/application needs a quick means of grasping its content. Since they cannot visually scan the layout to determine where the main content is, they use assistive technology (AT) to find what are known as “landmarks”.

    “Landmarks” tell the user what semantic content is on a page: a header (such as a banner), a search box, a navigation menu, some asides (also called complementary content), a footer, and, most importantly, the main content of the page. It is this main content that a blind user most often wants to skip to directly.

    In the days of HTML4, a hidden “skip to content” link at the beginning of the Web page was used as a means to help blind users access the main content.
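
    In HTML4 this pattern looked roughly like the following (a minimal sketch; the id and class names are illustrative):

        <body>
          <a href="#content" class="skip-link">Skip to content</a>
          <!-- banner, navigation, search box ... -->
          <div id="content">
            <!-- main content starts here -->
          </div>
        </body>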

    In the days of ARIA, the @role=main attribute enables authors to avoid a hidden link and instead mark the element where the main content begins, allowing direct access to it. This attribute is supported by AT – in particular screen readers – which expose it as one of the landmarks that users can skip to directly.
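
    Marked up, that is simply (a sketch; the choice of wrapper element is arbitrary):

        <div role="main">
          <!-- main content starts here -->
        </div>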

    Both the hidden link and the ARIA @role=main approaches are, however, band-aids: they are used by those of us who make “finished” Web pages accessible by adding specific extra markup.

    A world where ARIA is not necessary – where accessibility developers would be out of a job because the markup that everyone writes naturally creates accessible Web sites/applications – would be much preferable to the current world of band-aids.

    Therefore, to me, the primary use case for a <main> element is to achieve exactly this better world and not require specialized markup to tell a user (or a tool) where the main content on a page starts.

    An immediate effect would be that pages with a <main> element expose a “main” landmark to blind and vision-impaired users, enabling them to access the main content directly without having to wade through other text on the page. Without a <main> element, this functionality can currently only be provided through heuristics that skip other semantic and structural elements, and is for this reason not typically implemented in AT.
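
    The page skeleton then carries the landmark natively (a hedged sketch of such a page):

        <body>
          <header><!-- banner, search box --></header>
          <nav><!-- navigation menu --></nav>
          <main>
            <!-- the content AT can jump to directly -->
          </main>
          <aside><!-- complementary content --></aside>
          <footer><!-- footer links --></footer>
        </body>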

    Other use cases

    The <main> element is a semantic element not unlike other new semantic elements such as <header>, <footer>, <aside>, <article>, <nav>, or <section>. Thus, it can also serve other uses where the main content on a Web page/Web application needs to be identified.

    Data mining

    For data mining of Web content, the identification of the main content is one of the key challenges. Many scholarly articles have been published on this topic. This stackoverflow article references and suggests a multitude of approaches, but the accepted answer says “there’s no way to do this that’s guaranteed to work”. This is because Web pages are inherently complex, and many <div>, <p>, <iframe> and other elements are used to provide markup for styling, notifications, ads, analytics and other purposes that are necessary to make a Web page complete but don’t contribute to what a user consumes as semantically rich content. A <main> element will allow authors to proactively direct data mining tools to the main content.
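
    A data mining tool could then prefer the author’s hint and only fall back to heuristics where it is absent. A sketch (heuristicExtraction is a hypothetical stand-in for whatever heuristic the tool already uses):

        // Prefer the author-declared main content over heuristic guessing.
        function extractMainContent(doc) {
          const main = doc.querySelector('main, [role="main"]');
          return main ? main.textContent : heuristicExtraction(doc); // hypothetical fallback
        }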

    Search engines

    One particularly important class of “data mining” tools is search engines. They, too, have a hard time identifying which sections of a Web page are more important than others and employ many heuristics to do so, see e.g. this ACM article. Yet they still disappoint with poor results, pointing to keyword matches in barely relevant sections of a page rather than ranking pages higher when the keywords turn up in the main content area. A <main> element would help search engines give text in main content areas a higher weight and prefer it over other areas of the Web page. It would also allow ranking Web pages differently depending on where on the page the search words are found. The <main> element will be an additional hint that search engines will digest.

    Visual focus

    On small devices, the display of Web pages designed for the desktop often causes confusion as to where the main content can be found and read, in particular when the text ends up too small to be readable. It would be nice if browsers on small devices had a function (maybe a default setting) to start displaying Web pages zoomed in on the main content. This could alleviate some of the headaches of responsive Web design, where the recommendation is to show high-priority content first. Right now this problem is addressed through stylesheets that re-layout the page differently depending on the device, but again this is a band-aid solution. Explicit semantic markup of the main content can solve this problem more elegantly.
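
    In script, such a browser function could be as small as this (a hedged sketch):

        // Zoom/scroll the viewport to the author-declared main content, if present.
        const main = document.querySelector('main');
        if (main) main.scrollIntoView({ block: 'start' });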

    Styling

    Finally, naturally, <main> would also be used to style the main content differently from the rest. You can e.g. replace a semantically meaningless <div id="main"> with a semantically meaningful <main> where their positions are identical. My analysis below shows that this is not always the case, since oftentimes <div id="main"> is used to group together everything that is not the header – in particular where there are multiple columns. Thus, the ease of styling a <main> element is only a positive side effect and not actually a real use case. It does, however, make it easier to adapt the style of the main content, e.g. with media queries.
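
    For instance (an illustrative stylesheet, not taken from any particular site):

        main { width: 70%; margin: 0 auto; }
        @media (max-width: 600px) {
          /* give the main content the full width on small screens */
          main { width: 100%; }
        }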

    Proposed alternative solutions

    It has been proposed that existing markup satisfies the use cases that <main> has been proposed for. Let’s analyse this claim on some of the most popular Web sites. First, let’s list the proposed algorithms.

    Proposed solution No 1: Scooby-Doo

    On Sat, Nov 17, 2012 at 11:01 AM, Ian Hickson <ian@hixie.ch> wrote:
    | The main content is whatever content isn't
    | marked up as not being main content (anything not marked up with <header>,
    | <aside>, <nav>, etc).
    

    This implies that the first element that is not a <header>, <aside>, <nav>, or <footer> will be the element that we want to give to a blind user as the location where they should start reading. The algorithm is implemented in https://gist.github.com/4032962.
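
    In essence, the idea boils down to something like this (a condensed sketch, not the gist’s exact code):

        // "Scooby-Doo": the main content is the first element that is not
        // marked up as not being main content.
        function findMainScoobyDoo(root) {
          const nonMain = ['HEADER', 'ASIDE', 'NAV', 'FOOTER'];
          for (const el of root.children) {
            if (!nonMain.includes(el.tagName)) return el;
          }
          return null;
        }
        // e.g. findMainScoobyDoo(document.body)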

    Proposed solution No 2: First article element

    On Sat, Nov 17, 2012 at 8:01 AM, Ian Hickson wrote:
    | On Thu, 15 Nov 2012, Ian Yang wrote:
    | >
    | > That's a good idea. We really need an element to wrap all the <p>s,
    | > <ul>s, <ol>s, <figure>s, <table>s ... etc of a blog post.
    |
    | That's called <article>.
    

    This approach identifies the first <article> element on the page as containing the main content. Here’s the algorithm for this approach.
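
    In essence (a sketch):

        // "First Article": the first <article> in document order is the main content.
        function findMainFirstArticle(doc) {
          return doc.getElementsByTagName('article')[0] || null;
        }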

    Proposed solution No 3: An example heuristic approach

    The readability plugin has been developed to make Web pages readable by essentially removing all the non-main content from a page. An early version of the readability source code is available. It demonstrates what a heuristic approach can achieve.

    Analysing alternative solutions

    Comparison

    I’ve picked four typical Web sites (top-ranked on Alexa) to analyse how these three approaches fare. Ideally, I’d like to simply apply the above three scripts and compare the resulting pictures. However, since the semantic HTML5 elements <header>, <aside>, <nav>, and <footer> are not actually used by any of these Web sites, I don’t have that choice.

    So, instead, I decided to make some assumptions about where these semantic elements would be used and what the outcome of applying the first two algorithms would be. I can then compare this to the third, which is a working product, so we can take screenshots.

    Google.com

    http://google.com – search for “Scooby Doo”.

    The search results page would likely be built with:

    • a <nav> menu for the Google bar
    • a <header> for the search bar
    • another <header> for the login section
    • another <nav> menu for the search types
    • a <div> to contain the rest of the page
    • a <div> for the app bar with the search number
    • a few <aside>s for the left and right column
    • a set of <article>s for the search results

    “Scooby Doo” would find the first element after the headers as the “main content”. This is the element before the app bar in this case. Interestingly, there is a <div @id=main> already in the current Google results page, which “Scooby Doo” would likely also pick. However, there are a nav bar and two asides in this div, which clearly should not be part of the “main content”. Google actually placed a @role=main on a different element, namely the one that encapsulates all the search results.

    “First Article” would find the first search result as the “main content”. While not quite the same as what Google intended – namely all search results – it is close enough to be useful.

    The “readability” result is interesting, since it is not able to identify the main text on the page. It is actually aware of this problem and shows a warning before displaying the page:

    Readability of google.com

    Facebook.com

    https://facebook.com

    A user page would likely be built with:

    • a <header> bar for the search and login bar
    • a <div> to contain the rest of the page
    • an <aside> for the left column
    • a <div> to contain the center and right column
    • an <aside> for the right column
    • a <header> to contain the center column “megaphone”
    • a <div> for the status posting
    • a set of <article>s for the home stream

    “Scooby Doo” would find the first element after the headers as the “main content”. This is the element that contains all three columns. It’s actually a <div @id=content> already in the current Facebook user page, which “Scooby Doo” would likely also pick. However, Facebook selected a different element to place the @role=main: the center column.

    “First Article” would find the first news item in the home stream. This is clearly not what Facebook intended, since they placed the @role=main on the center column, above the first blog post’s title. “First Article” would miss that title and the status posting.

    The “readability” result again disappoints but warns that it failed:

    YouTube.com

    http://youtube.com

    A video page would likely be built with:

    • a <header> bar for the search and login bar
    • a <nav> for the menu
    • a <div> to contain the rest of the page
    • a <header> for the video title and channel links
    • a <div> to contain the video with controls
    • a <div> to contain the center and right column
    • an <aside> for the right column with an <article> per related video
    • an <aside> for the information below the video
    • an <article> per comment below the video

    “Scooby Doo” would find the first element after the headers as the “main content”. This is the element that contains the rest of the page. It’s actually a <div @id=content> already in the current YouTube video page, which “Scooby Doo” would likely also pick. However, YouTube’s related videos and comments are unlikely to be what the user would regard as “main content” – it’s the video they are after, which conveniently has a <div id=watch-player>.

    “First Article” would find the first related video or comment in the home stream. This is clearly not what YouTube intends.

    The “readability” result is not quite as unusable, but still very bare:

    Wikipedia.com

    http://wikipedia.com (“Overscan” page)

    A Wikipedia page would likely be built with:

    • a <header> bar for the search, login and menu items
    • a <div> to contain the rest of the page
    • an <article> with the title and lots of text
    • inside the <article>, an <aside> with the table of contents
    • several <aside>s for the left column

    Good news: “Scooby Doo” would find the first element after the headers as the “main content”. This is the element that contains the rest of the page. It’s actually a <div id="content" role="main"> element on Wikipedia, which “Scooby Doo” would likely also pick.

    “First Article” would find the title and text of the main element on the page, but it would also include an <aside>.

    The “readability” result is also in agreement.

    Results

    In the following table we have summarised the results for the experiments:

    Site            Scooby-Doo   First article   Readability
    Google.com      FAIL         SUCCESS         FAIL
    Facebook.com    FAIL         FAIL            FAIL
    YouTube.com     FAIL         FAIL            FAIL
    Wikipedia.com   SUCCESS      SUCCESS         SUCCESS

    Clearly, Wikipedia is the prime example of a site where even the simple approaches find it easy to determine the main content on the page. WordPress blogs are similarly successful. Almost any other site, including news sites, social networks and search engines, is pretty hopeless with the proposed approaches, because too many elements are used for layout or other purposes (notifications, hidden areas), such that the predetermined list of available semantic elements simply doesn’t suffice to mark up a Web page/application completely.

    Conclusion

    It seems that, in general, it is impossible to determine which element(s) on a Web page should be the “main” piece of content that accessibility tools jump to when requested, that a search engine should focus on, or that should be highlighted for a general user to read. It would be very useful if the author of the Web page provided a hint, through a <main> element, as to where that main content is to be found.

    I think that the <main> element becomes particularly useful when combined with a default keyboard shortcut in browsers as proposed by Steve: we may actually find that non-accessibility users will also start making use of this shortcut, e.g. to get to videos on YouTube pages directly without having to tab over search boxes and other interactive elements, etc. Worthwhile markup indeed.

  • What is “interoperable TTML”?

    1 January 2014, by silvia

    I’ve just tried to come to terms with the latest state of TTML, the Timed Text Markup Language.

    TTML has been specified by the W3C Timed Text Working Group and released as a Recommendation (v1.0) in November 2010. Since then, several organisations have tried to adopt it as their caption file format, including SMPTE, the EBU (European Broadcasting Union), and Microsoft.

    Both Microsoft and the EBU looked at TTML in detail and decided that, in order to make it usable for their use cases, a restriction of its functionality was needed.

    EBU-TT

    The EBU released EBU-TT, which restricts the set of valid attributes and features. “The EBU-TT format is intended to constrain the features provided by TTML, especially to make EBU-TT more suitable for the use with broadcast video and web video applications.” (see EBU-TT).

    In addition, EBU-specific namespaces were introduced to extend TTML with EBU-specific data types, e.g. ebuttdt:frameRateMultiplierType or ebuttdt:smpteTimingType. Similarly, a number of metadata elements were introduced, e.g. ebuttm:documentMetadata, ebuttm:documentEbuttVersion, or ebuttm:documentIdentifier.
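
    In a document, such extensions look roughly like this (an illustrative fragment; the metadata namespace URN is my assumption of EBU-TT’s, while the element names are those quoted above):

        <tt xmlns="http://www.w3.org/ns/ttml"
            xmlns:ebuttm="urn:ebu:tt:metadata">
          <head>
            <metadata>
              <ebuttm:documentMetadata>
                <ebuttm:documentEbuttVersion>v1.0</ebuttm:documentEbuttVersion>
                <ebuttm:documentIdentifier>example-0001</ebuttm:documentIdentifier>
              </ebuttm:documentMetadata>
            </metadata>
          </head>
          <body><!-- subtitles as usual --></body>
        </tt>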

    The use of namespaces as an extensibility mechanism ensures that EBU-TT files remain valid TTML files. However, a vanilla TTML parser will not know what to do with these custom extensions and will drop them on the floor.

    Simple Delivery Profile

    With the intention of making TTML ready for “internet delivery of Captions originated in the United States”, Microsoft proposed a “Simple Delivery Profile for Closed Captions (US)” (see Simple Profile). The Simple Profile is also a restriction of TTML.

    Unfortunately, the Microsoft profile is not the same as the EBU-TT profile: for example, it contains the “set” element, which is not conformant in EBU-TT. Similarly, the supported style features are different, e.g. Simple Profile supports “display-region”, while EBU-TT does not. On the other hand, EBU-TT supports monospace, sans-serif and serif fonts, while the Simple profile does not.

    Thus, files created for the Simple Delivery Profile will not work on players that expect EBU-TT, and vice versa.

    Fortunately, the Simple Delivery Profile does not introduce any new namespaces or features, so at least it is a strict subset of TTML and not both a restriction and an extension like EBU-TT.

    SMPTE-TT

    SMPTE also created a version of the TTML standard, called SMPTE-TT. SMPTE did not settle on a subset of TTML for its purposes – the complete feature set was simply adopted. “This Standard provides a framework for timed text to be supported for content delivered via broadband means,…” (see SMPTE-TT).

    However, SMPTE extended TTML in SMPTE-TT with an ability to store a binary blob with captions in another format. This allows using SMPTE-TT as a transport format for any caption format and is deemed to help with “backwards compatibility”.

    Now, instead of specifying a profile, SMPTE decided to define how to convert CEA-608 captions to SMPTE-TT. Even if it’s not called a “profile”, that’s actually what it is. It even has its own namespace: “m608:”.

    Conclusion

    With all these different versions of TTML, I ask myself what a video player that claims support for TTML will do to get something working. The only chance it has is to implement all the extensions defined in all the different profiles. I pity the player that has to deal with a SMPTE-TT file that has a binary blob in it and is expected to be able to decode this.

    Now, what is a caption author supposed to do when creating TTML? They obviously cannot expect all players to be able to play back all TTML versions. Should they create different files depending on the platform they are targeting, i.e. an EBU-TT version, a SMPTE-TT version, a vanilla TTML version, and a Simple Delivery Profile version? Or should they throw all the features of all the versions into one TTML file and hope that players will pick out the things they require and drop the rest on the floor?

    Maybe the best way to progress would be to make a list of the “safe” features: those features that every TTML profile supports. That may be the best way to get an “interoperable TTML” file. Here’s me hoping that this minimal set of features doesn’t just end up being the usual (starttime, endtime, text) triple.
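
    In vanilla TTML, that triple looks like this (a minimal sketch):

        <tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
          <body>
            <div>
              <p begin="00:00:01.000" end="00:00:03.500">Hello, world.</p>
            </div>
          </body>
        </tt>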

    UPDATE:

    I just found out that UltraViolet has its own profile of SMPTE-TT called CFF-TT (see UltraViolet FAQ and spec). It makes some SMPTE-TT fields optional, but introduces a new @forcedDisplayMode attribute under its own namespace, “cff:”.

  • Why I became an HTML5 co-editor

    1 January 2014, by silvia

    A few weeks ago, I had the honor to be appointed as part of the editorial team of the W3C HTML5 specification.

    Since Ian Hickson had recently decided to focus solely on editing the WHATWG HTML living standard specification, the W3C started looking for other editors to take the existing HTML5 specification to REC level. REC level is what other standards organizations call a “ratified standard”.

    But what does REC level really mean for HTML?

    In my probably somewhat subjective view, Recommendation level means that a snapshot is taken of the continuously evolving HTML spec – one that has a comprehensive feature set, is implemented in a cross-browser interoperable way, has a complete test suite for its features, and has received wide review. The latter implies that other groups in the W3C have had a chance to look at the specification and make sure it satisfies their basic requirements, which include e.g. applicability to all users (accessibility, internationalization), platforms, and devices (mobile, TV).

    Basically it means that we stop for a “moment”, take a deep breath, polish the feature set that we’ve been working on this far, and make sure we all agree on it, before we get back to changing the world with cool new stuff. In a software project we would call it a release branch with feature freeze.

    Now, as productive as that may sound for software, it’s not actually that exciting for a specification. Firstly, the most exciting things happen when writing new features. Secondly, development of browsers doesn’t just magically stop to let the release (REC) happen. And lastly, if we’ve done our specification work well, there should be only a little work left to do. Basically, it’s the thankless work of tidying up that we’re looking at here. :-)

    So, why am I doing it? I am not doing this for money – I’m currently part-time contracting to Google’s accessibility team working on video accessibility and this editor work is not covered by my contract. It wasn’t possible to reconcile polishing work on a specification with the goals of my contract, which include pushing new accessibility features forward. Therefore, when invited, I decided to offer my spare time to the W3C.

    I’m giving this time under the condition that I only look at accessibility- and video-related sections. This is where my interest and expertise lie, and where I’m passionate about getting things right. I want to make sure that we create accessibility features that will be implemented, and that we polish existing video features. I want to make sure we don’t diverge from implementations, which continue to be updated and may follow the WHATWG spec, HTML.next, or other needs.

    I am not yet completely sure what the editorship will entail. Will we look at tests, too? Will we get involved in HTML.next? So far we’ve been preparing for our work by setting up adequate version control repositories, building a spec creation process, discussing how to bridge to the WHATWG commits, and analysing the long list of bugs to see how to cope with them. There’s plenty of actual text editing work ahead and the team is shaping up well! I look forward to the new experiences.

  • Video Conferencing in HTML5: WebRTC via Web Sockets

    1 January 2014, by silvia

    A bit over a week ago I gave a presentation at Web Directions Code 2012 in Melbourne. Maxine and John asked me to speak about something related to HTML5 video, so I went for the new shiny: WebRTC – real-time communication in the browser.

    Presentation slides

    I only had 20 min, so I had to make it tight. I wanted to show off video conferencing without special plugins in Google Chrome in just a few lines of code, as is the promise of WebRTC. To a large extent, I achieved this. But I made some interesting discoveries along the way. Demos are in the slide deck.

    UPDATE: Opera 12 has been released with WebRTC support.

    Housekeeping: if you want to replicate what I have done, you need to install Google Chrome 19+. Then go to chrome://flags and activate the MediaStream and PeerConnection experiments. Restart your browser, and you can now experiment with this feature. Big warning up-front: it’s not production-ready, since there are still changes happening to the spec and there is no compatible implementation by another browser yet.

    Here is a brief summary of the steps involved in setting up video conferencing in your browser (a sketch of the first two steps follows the list):

    1. Set up a video element each for the local and the remote video stream.
    2. Grab the local camera and stream it to the first video element.
    3. (*) Establish a connection to another person running the same Web page.
    4. Send the local camera stream on that peer connection.
    5. Accept the remote camera stream into the second video element.
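
    Steps 1 and 2 look roughly like this (a sketch using the now-standardised API; at the time of the demo the prefixed webkitGetUserMedia variant was needed, and the element id is made up):

        // Grab the local camera and stream it into the first <video> element.
        const localVideo = document.getElementById('localVideo');
        navigator.mediaDevices.getUserMedia({ video: true, audio: true })
          .then((stream) => { localVideo.srcObject = stream; })
          .catch((err) => console.error('camera access failed:', err));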

    Now, the most difficult part of all of this – believe it or not – is the signalling required to build the peer connection (marked with (*)). Initially I wanted to run completely without a server and just enter the remote peer’s IP address to establish the connection. This is, however, not functionality that the PeerConnection object provides [might this be something to add to the spec?].

    So, you need a server known to both parties that can provide for the handshake to set up the connection. All the examples that I have seen, such as https://apprtc.appspot.com/, use a channel management server on Google’s appengine. I wanted it all working with HTML5 technology, so I decided to use a Web Socket server instead.

    I implemented my Web Socket server using node.js (code of websocket server). The video conferencing demo is in the slide deck in an iframe – you can also use the stand-alone html page. Works like a treat.
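
    For those who want to replicate the setup without digging through the linked code: a minimal signalling relay can be written with the “ws” npm package in a few lines (a hedged sketch, not the original server’s code):

        // Minimal node.js signalling relay: every message received from one
        // client (the SDP offer/answer packets) is forwarded to all other clients.
        const WebSocket = require('ws');
        const wss = new WebSocket.Server({ port: 8080 });

        wss.on('connection', (socket) => {
          socket.on('message', (message) => {
            for (const client of wss.clients) {
              if (client !== socket && client.readyState === WebSocket.OPEN) {
                client.send(message.toString());
              }
            }
          });
        });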

    While it is still using Google’s STUN server to get through NAT, the messaging for setting up the connection is running completely through the Web Socket server. The messages that get exchanged are plain SDP message packets with a session ID. There are OFFER, ANSWER, and OK packets exchanged for each streaming direction. You can see some of it in the below image:

    WebRTC demo

    I’m not running a public WebSocket server, so you won’t be able to see this part of the presentation working. But the local loopback video should work.

    At the conference, it all went without a hitch (while the wireless played along). I believe you have to host the WebSocket server on the same machine as the Web page, otherwise it won’t work for security reasons.

    A whole new world of opportunities lies out there when we get the ability to set up video conferencing on every Web page – scary and exciting at the same time!

  • New Challenges

    1 January 2014, by silvia

    I finished up at Google last week and am now working at NICTA, an Australian ICT research institute.

    My work with Google was exciting and I learned a lot. I like to think that Google also got a lot out of me – I coded and contributed to some YouTube caption features, I worked on Chrome captions and video controls, and above all I worked on video accessibility for HTML at the W3C.

    I was one of the key authors of the W3C Media Accessibility Requirements document that we created in the Media Accessibility Task Force of the W3C HTML WG. I then went on to help make video accessibility a reality. We created WebVTT and the <track> element and applied it to captions, subtitles, chapters (navigation), video descriptions, and metadata. To satisfy the need for synchronisation of video with other media resources such as sign language video or audio descriptions, we got the MediaController object and the @mediagroup attribute.
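
    For those who haven’t seen it, the combination is pleasantly simple (an illustrative example; the file names are made up):

        <video src="talk.webm" controls>
          <track kind="captions" src="talk.en.vtt" srclang="en" label="English">
        </video>

    with a WebVTT file along the lines of:

        WEBVTT

        00:00:01.000 --> 00:00:04.000
        Hello, and welcome to this talk.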

    I must say it was a most rewarding time. I learned a lot about being productive at Google, about collaborating successfully across distance, about how the WebKit community works, and about the new way of writing W3C standards (which is more like pseudo-code). As one consequence, I am now a co-editor of the W3C HTML spec, and it seems I am also about to become the editor of the WebVTT spec.

    At NICTA my new focus of work is WebRTC. There is both a bit of research and a whole bunch of application development involved. I may even get to do some WebKit development, if we identify any issues with the current implementation. I started a week ago and am already amazed by the amount of work going on in the WebRTC space and the amazing number of open source projects playing around with it. Video conferencing is a new challenge and I look forward to it.