
  • WebVTT as a W3C Recommendation

    2 December 2013, by silvia

    Three weeks ago I attended TPAC, the annual meeting of W3C Working Groups. One of the meetings was of the Timed Text Working Group (TT-WG), which has been specifying TTML, the Timed Text Markup Language. It is now proposed that WebVTT also be standardised through the same Working Group.

    How did that happen, you may ask, in particular since WebVTT and TTML have in the past been portrayed as rival caption formats? How will the WebVTT spec that is currently under development in the Text Track Community Group (TT-CG) move through a Working Group process?

    I’ll explain first why there is a need for WebVTT to become a W3C Recommendation, and then how this is proposed to be part of the Timed Text Working Group deliverables, and finally how I can see this working between the TT-CG and the TT-WG.

    Advantages of a W3C Recommendation

    TTML is an XML-based markup format for captions developed during the time that XML was all the hotness. It has become a W3C standard (a so-called “Recommendation”) despite not having been implemented in any browsers (if you ask me, that’s actually a flaw of the W3C standardisation process: it requires only two interoperable implementations of any kind – and that could be anyone’s JavaScript library or Flash demonstrator – it doesn’t actually require browser implementations. But I digress…). To be fair, a subpart of TTML is by now implemented in Internet Explorer, but all the other major browsers have thus far rejected proposals of implementation.

    Because of its Recommendation status, TTML has become the basis for several other caption standards that other SDOs have picked up: the SMPTE’s SMPTE-TT format, the EBU’s EBU-TT format, and the DASH Industry Forum’s use of SMPTE-TT. SMPTE-TT has also become the “safe harbour” format for the US legislation on captioning as decided by the FCC. (Note that the FCC requirements for captions on the Web are actually based on a list of features rather than requiring a specific format. But that will be the topic of a different blog post…)

    WebVTT is much younger than TTML. TTML was developed as an interchange format among caption authoring systems. WebVTT was built for rendering in Web browsers and with HTML5 in mind. It meets the requirements of the <track> element and supports more than just captions/subtitles. WebVTT is popular with browser developers and has already been implemented in all major browsers (Firefox Nightly is the last to implement it – all others have support already released).
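
    To make this concrete, a minimal WebVTT file (my own toy example, not one taken from the spec) is simply a signature line followed by timed cues:

    WEBVTT

    00:00:01.000 --> 00:00:04.000
    Hello, I am the first caption cue.

    00:00:05.000 --> 00:00:09.000
    Cues can carry <i>simple inline markup</i>, too.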

    As we can see and as has been proven by the HTML spec and multiple other specs: browsers don’t wait for specifications to have W3C Recommendation status before they implement them. Nor do they really care about the status of a spec – what they care about is whether a spec makes sense for the Web developer and user communities and whether it fits in the Web platform. WebVTT has obviously achieved this status, even with an evolving spec. (Note that the spec tries very hard not to break backwards compatibility, thus all past implementations will at least be compatible with the more basic features of the spec.)

    Given that Web browsers don’t need WebVTT to become a W3C standard, why then should we spend effort in moving the spec through the W3C process to become a W3C Recommendation?

    The modern Web is now much bigger than just Web browsers. Web specifications are being used in all kinds of devices including TV set-top boxes, phone and tablet apps, and even unexpected devices such as white goods. Videos are increasingly omnipresent, thus exposing deaf and hard-of-hearing users to ever-growing challenges in interacting with content on diverse devices. Some of these devices will not use auto-updating software but fixed versions, so they can’t easily adapt to new features. Thus, caption producers (both commercial and community) need to be able to author captions (and other video accessibility content as defined by the HTML5 <track> element) towards a feature set that is clearly defined to be supported by such non-updating devices.

    Understandably, device vendors in this space have a need to build their technology on standardised specifications. SDOs for such device technologies like to reference fixed specifications so the feature set is not continually updating. To reference WebVTT, they could use a snapshot of the specification at any time and reference that, but that’s not how SDOs work. They prefer referencing an officially sanctioned and tested version of a specification – for a W3C specification that means creating a W3C Recommendation of the WebVTT spec.

    Taking WebVTT on a W3C recommendation track is actually advantageous for browsers, too, because a test suite will have to be developed that proves that features are implemented in an interoperable manner. In summary, I can see the advantages and personally support the effort to take WebVTT through to a W3C Recommendation.

    Choice of Working Group

    AFAIK this is the first time that a specification developed in a Community Group is being moved onto the Recommendation track. This is something that was expected when the W3C created CGs, but not something that has an established process yet.

    The first question, of course, is which WG would take it through to Recommendation? Would we create a new Working Group or find an existing one to move the specification through? Since WGs involve a lot of overhead, the preference was to add WebVTT to the charter of an existing WG. The two obvious candidates were the HTML WG and the TT-WG – the former because it’s where WebVTT originated and the latter because it’s the closest thematically.

    Adding a deliverable to a WG is a major undertaking. The TT-WG is currently in the process of re-chartering and thus a suggestion was made to add WebVTT to the milestones of this WG. TBH that was not my first choice. Since I’m already an editor in the HTML WG and WebVTT is very closely related to HTML and can be tested extensively as part of HTML, I preferred the HTML WG. However, adding WebVTT to the TT-WG has some advantages, too.

    Since TTML is an exchange format, lots of captions that will be created (at least professionally) will be in TTML and TTML-related formats. It makes sense to create a mapping from TTML to WebVTT for rendering in browsers. The expertise of both TTML and WebVTT experts is required to develop a good mapping – as was shown when we developed the mapping from CEA608/708 to WebVTT. Also, captioning experts are already in the TT-WG, so it helps to get a second set of eyes onto WebVTT.

    A disadvantage of moving a specification out of a CG into a WG is, however, that you potentially lose a lot of the expertise that is already involved in the development of the spec. People don’t easily re-subscribe to additional mailing lists or want the additional complexity of involving another community (see e.g. this email).

    So, a good process needs to be developed to allow everyone to contribute to the spec in the best way possible without requiring duplicate work. How can we do that?

    The forthcoming process

    At TPAC the TT-WG discussed for several hours what the next steps are in taking WebVTT through the TT-WG to recommendation status (agenda with slides). I won’t bore you with the different views – if you are keen, you can read the minutes.

    What I came away with is the following process:

    1. Fix a few more bugs in the CG until we’re happy with the feature set. This should match the feature set that we realistically expect devices to implement for a first version of the WebVTT spec.
    2. Make an FSA (Final Specification Agreement) in the CG to create a stable reference and a clean IPR position.
    3. Assuming that the TT-WG’s charter has been approved with WebVTT as a milestone, we would next bring the FSA specification into the TT-WG as FPWD (First Public Working Draft) and immediately do a Last Call, which effectively freezes the feature set (this is possible because there has already been wide community review of the WebVTT spec); in parallel, the CG can continue to develop the next version of the WebVTT spec with new features (just like it is happening with the HTML5 and HTML5.1 specifications).
    4. Develop a test suite and address any issues in the Last Call document (of course, also fix these issues in the CG version of the spec).
    5. As per W3C process, substantive and minor changes to Last Call documents have to be reported and raised issues addressed before the spec can progress to the next level: Candidate Recommendation status.
    6. For the next step – Proposed Recommendation status – an implementation report is necessary, and thus the test suite needs to be finalized for the given feature set. The feature set may also be reduced at this stage to just the ones implemented interoperably, leaving any other features for the next version of the spec.
    7. The final step is Recommendation status, which simply requires sufficient support and endorsement by W3C members.

    The first version of the WebVTT spec naturally has a focus on captioning (and subtitling), since this has been the dominant use case that we have focused on thus far, and it’s the feature set of WebVTT that is most compatibly implemented across browsers. It’s my expectation that the next version of WebVTT will have a lot more features related to audio descriptions, chapters and metadata. Thus, this seems a good time for a first-version feature freeze.

    There are still several obstacles towards progressing WebVTT as a milestone of the TT-WG. Apart from the need to get buy-in from the TT-WG, the TT-CG, and the AC (the Advisory Committee, which has to approve the new charter), we’re also looking at the license of the specification document.

    The CG specification has an open license that allows creating derivative work as long as there is attribution, while the W3C document license for documents on the Recommendation track does not allow the creation of derivative work unless given explicit exceptions. This is an issue that is currently being discussed in the W3C with a proposal for a CC-BY license on the Recommendation track. However, my view is that it’s probably OK to use the different document licenses: the TT-WG will work on WebVTT 1.0 and give it a W3C document license, while the CG starts working on the next WebVTT version under the open CG license. It probably actually makes sense to have a less open license on a frozen spec.

    Making the best of a complicated world

    WebVTT is now proposed as part of the recharter of the TT-WG. I have no idea how complicated the process will become to achieve a W3C WebVTT 1.0 Recommendation, but I am hoping that what is outlined above will be workable in such a way that all of us get to focus on progressing the technology.

    At TPAC I got the impression that the TT-WG is committed to progressing WebVTT to Recommendation status. I know that the TT-CG is committed to continuing to develop WebVTT to its full potential for all kinds of media-time-aligned content, with new kinds already discussed at FOMS. Let’s enable both groups to achieve their goals. As a consequence, we will allow the two formats to excel where they do: TTML as an interchange format and WebVTT as a browser rendering format.

  • Investigating Steam for Linux

    1 March 2013, by Multimedia Mike — Game Hacking

    Valve recently released the final, public version of their Steam client for Linux, and the Linux world rejoiced. At least, it probably did. The announcement was 2 weeks ago on Valentine’s Day and I had other things on my mind, so I missed any fanfare. When framed in this manner, the announcement timing becomes suspect – it’s as though Linux enthusiasts would have plenty of time that day or something.


    Valve Steam logo

    Taming the Frontier
    Speculation about a Linux Steam client had been kicking around for nearly as long as Steam has existed. However, sometime last year, the rumors became more substantive.

    I naturally wondered how to port something like Steam to Linux. I have some experience with trying to make a necessarily binary-only program that runs on Linux. I’m fairly well-versed in the assorted technical challenges that one might face when attempting such a feat. Because of this, whenever I hear rumors that a company might be entertaining the notion of porting a major piece of proprietary software to Linux, my instinctive reflex is, “What?! Why, you fools?! Save yourselves!”

    At least, that’s how it used to be. The proposal of developing a proprietary binary for Linux has been rendered considerably less insane by a few developments, for example:

    1. The rise of Ubuntu Linux as a quasi de facto standard for desktop Linux computing
    2. The increasing homogeneity in personal desktop computing technology

    What I would like to know is how the Steam client runs on Linux. Does it rely on any libraries being present on the system? Or does it bring its own? The latter is a trick that proprietary programs can use – transport all of the shared libraries that the main program binary depends upon, install them someplace out of the way on the filesystem, probably in /opt, and then make the main program a shell script which sets a preload path to rely on the known-quantity libraries instead of the copies already on the system.
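
    In case the idea is unclear, here is a minimal sketch of that launcher trick in Python (the install path, binary name, and layout are all made up for illustration; real launchers like Steam’s are plain shell scripts):

    #!/usr/bin/env python
    # Hypothetical launcher demonstrating the bundled-library trick described
    # above: point the dynamic linker at the vendor's bundled copies before
    # replacing this process with the real binary.
    import os
    import sys

    INSTALL_DIR = "/opt/example-app"               # out-of-the-way install spot
    BUNDLED_LIBS = os.path.join(INSTALL_DIR, "lib")

    env = dict(os.environ)
    # Prepend the bundled libraries so the linker finds the known-quantity
    # copies before any system-wide versions.
    env["LD_LIBRARY_PATH"] = BUNDLED_LIBS + ":" + env.get("LD_LIBRARY_PATH", "")

    real_binary = os.path.join(INSTALL_DIR, "bin", "example-app")
    os.execve(real_binary, [real_binary] + sys.argv[1:], env)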

    Downloading and Installing the Client
    For this exercise, I installed x86_64 desktop Ubuntu 12.04 Linux on a l33t gaming rig that was totally top of the line about 5 years ago, and that someone didn’t want anymore and handed down to me recently. So it should be ideal for this project.

    At first, I was blown away – the Linux client comes in a .deb package that is less than 2 MB in size. I unpacked the steam.deb file and found a bunch of support libraries — mostly X11 and standard C/C++ runtimes. Just as I suspected. Still, I can’t believe how small the thing is. However, my amazement quickly abated when I actually ran Steam and saw this:


    Steam Linux Client -- initial update

    So it turns out steam.deb is just the installer program, which immediately proceeds to download an additional 160+ MB of data. So there’s actually a lot more information to possibly sift through.

    Another component of the installation is to basically run a big ‘apt-get install’ command to make sure a bunch of required packages are installed:


    Steam Linux Client -- install system packages

    After all these installation steps, the client was ready to run. However, whenever I tried to do so, I got this dialog, and Steam would close as soon as the dialog was dismissed.


    Steam Linux Client -- Upgrade NVIDIA drivers

    Not a huge deal; newer NVIDIA drivers are fairly straightforward to install on Ubuntu Linux. After a few minutes of downloading, installing and restarting X, Steam ran with minimal complaint (it still had some issue regarding the video drivers but didn’t seem to consider it a deal-breaker).

    Using Steam on Linux

    So here’s Steam running on Linux:


    Steam Linux Client -- main screen

    If you have experience with using Steam on Windows or Mac, you might observe that it looks exactly the same. I don’t have a very expansive library of games (I only started using Steam because purchasing a few computer components a few years ago entitled me to some free Steam downloads of some of the games on the list in the screenshot). I didn’t really expect any of the games to have Linux versions yet, but it turns out that the indie darling FTL: Faster Than Light has been ported to Linux. FTL was a much-heralded Kickstarter success story and sounded like something I wanted to support. I purchased this from Steam shortly after its release last year and was able to download the Linux version at no additional cost with a single click.

    It runs natively on Linux (note the Ubuntu desktop window decorations):


    FTL game running on Linux through Steam

    You might notice from the main Steam client that, despite purchasing FTL about half a year ago and starting it up at least a half-dozen times, I haven’t really invested a whole lot of time into it. I only managed to get about 2 minutes further this time:


    A few more minutes in FTL

    What can I say? This game just bores me to tears. It’s frustrating because I know that this is one of the cool games that all real gamers are supposed to like, but I practically catch myself nodding off every time I try to run through the tutorial. It’s strange to think that I’ve invested far more time into games that offer considerably less stimulation. That’s probably because I had far more free time than gaming options back then.

    But that’s neither here nor there. We’ll file this under “games that aren’t for me.” I’m glad that people like FTL and that a little indie underdog has met with such success. And I’m pleased that Steam on Linux works. It’s native and the games are also native, which is all quite laudable (there was speculation that everything would just be running on top of a Wine layer).

    Deeper Analysis
    So I set out wondering how Steam was able to create a proprietary program that would satisfy a large enough cross-section of Linux users (i.e., on different platforms and distros). Answer: well, they didn’t, per the stated requirements. The installation is only tuned to work on Ubuntu 12.04. However, it works on both 32- and 64-bit platforms, the only 2 desktop CPU platforms that matter these days (unless ARM somehow makes inroads on the desktop). The Steam client is quite clearly an x86_32 binary – look at the terminal screenshot above and observe that it’s downloading all :i386 support libraries.

    The file /usr/bin/steam isn’t a binary but a launcher shell script (something you’ll also see if you investigate /usr/bin/firefox on a Linux system). Here’s an interesting tidbit:

    function detect_platform()
    {
      # Maybe be smarter someday
      # Right now this is the only platform we have a bootstrap for, so hard-code it.
      echo ubuntu12_32
    }

    I wager that it’s possible to get Steam running on other distributions; it probably just takes a little more effort (assuming that Steam doesn’t put too much effort into thwarting such attempts).

    As for the FTL game, it comes with binaries and libraries for both x86_32 and x86_64. So, good work to the dev team for creating and testing both versions. FTL also distributes versions of the libraries it expects to work with.

    I suspect that the Steam client overall is largely a WWW rendering engine underneath the covers. That would help explain how Valve is able to achieve such a consistent look and feel, not only across OS platforms, but also through a web browser. When I browse the Steam store through Google Chrome, it looks and feels exactly like the native desktop client. When I first thought of how someone could port Steam to Linux, I immediately wondered about how they would do the UI.

    A little Googling for “steam uses webkit” (just a hunch) confirms my hypothesis.

  • Adjusting The Timetable and SQL Shame

    16 August 2012, by Multimedia Mike — General, Python, sql

    My Game Music Appreciation website has a big problem that many visitors quickly notice and comment upon. The problem looks like this:



    The problem is that all of these songs are listed as 2m30s in length. During the initial import process, unless a chiptune file already had curated length metadata attached, my metadata utility emitted a default play length of 150 seconds. This is not good if you want to listen to all the songs in a soundtrack without interacting with the player page, but the soundtrack has various short songs (think “game over” or other quick jingles) that are over in a few seconds. Such songs are still padded out to 150 seconds with silence.

    So I needed to correct this. Possible solutions:

    1. Manually: At first, I figured I could ask the database which songs needed fixing and listen to them to determine the proper lengths. Then I realized that there were well over 1400 games affected by this problem. This just screams “automated solution”.
    2. Automatically: Ask the database which songs need fixing and then somehow ask the computer to listen to the songs and decide their proper lengths. This sounds like a winner, provided that I can figure out how to programmatically determine if a song has “finished”.

    SQL Shame
    This play adjustment task has been on my plate for a long time. A key factor that has blocked me is that I couldn’t figure out a single SQL query to feed to the SQLite database underlying the site which would give me all the songs I needed. To be clear, it was very simple and obvious to me how to write a program that would query the database in phases to get all the information. However, I felt that it would be impure to proceed with the task unless I could figure out one giant query to get all the information.

    This always seems to come up whenever I start interacting with a database in any serious way. I call it SQL shame. This task got some traction when I got over this nagging doubt and told myself that there’s nothing wrong with the multi-step query program if it solves the problem at hand.

    Suddenly, I had a flash of inspiration about why the so-called NoSQL movement exists. Maybe there are a lot more people who don’t like trying to derive such long queries and are happy to allow other languages to pick up the slack.

    Estimating Lengths
    Anyway, my solution involved writing a Python script to iterate through all the games whose metadata was output by a certain engine (the one that emits the default play length of 150 seconds). For each of those games, the script queries the song table and determines if each song is exactly 150 seconds. If it is, the script goes to work trying to estimate the true length.

    The foregoing paragraph describes what I figured was possible with only a single (possibly large) SQL query.

    For each song represented in the chiptune file, I ran it through a custom length estimator program. My brilliant (err, naïve) solution to the length estimation problem was to synthesize audio a second at a time, up to a maximum of 120 seconds (tightening up the default length just a bit), and count how many of those seconds had all-0 samples. If the count reached 5 consecutive seconds of silence, the estimator rewound the running length by 5 seconds, declared that to be the proper length, and updated the database.
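
    A condensed sketch of that estimator logic (synthesize_second() is a hypothetical stand-in for the actual chiptune synthesis call):

    MAX_SECONDS = 120   # tightened default maximum
    SILENCE_RUN = 5     # consecutive silent seconds that end a song

    def estimate_length(song):
        silent = 0
        for second in range(1, MAX_SECONDS + 1):
            samples = synthesize_second(song, second)  # hypothetical engine call
            # All-zero samples count as silence; a small amplitude threshold
            # here would also catch low-level noise instead of exact zeros.
            if all(s == 0 for s in samples):
                silent += 1
                if silent == SILENCE_RUN:
                    # Rewind past the silent run and call that the proper length.
                    return second - SILENCE_RUN
            else:
                silent = 0
        return MAX_SECONDS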

    There were about 1430 chiptune files whose songs needed updates. Some files had 1 single song. Some files had over 100. When I let the script run, it took nearly 65 minutes to process all the files. That was a single-threaded solution, of course. Even though I already had the data I needed, I wanted to try my hand at parallelizing the script. So I went to work with Python’s multiprocessing module and quickly refactored it to use all 4 CPU threads on the machine where the files live (a sketch of the refactor follows the results). Results:

    • Single-threaded solution: 64m42s to process corpus (22 games/minute)
    • Multi-threaded solution: 18m48s with 4 CPU threads (75 games/minute)

    More than a 3x speedup across 4 CPU threads, which is decent for a primarily CPU-bound operation.
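
    The refactor itself can be quite small; here is a minimal sketch, assuming a process_game() worker and a hypothetical helper that returns the affected game ids:

    from multiprocessing import Pool

    def process_game(game_id):
        # Open a per-process SQLite connection, run the estimator on each of
        # the game's 150-second songs, and write back corrected play lengths.
        ...

    if __name__ == '__main__':
        game_ids = games_needing_fixes()   # hypothetical query helper
        pool = Pool(processes=4)           # one worker per CPU thread
        pool.map(process_game, game_ids)
        pool.close()
        pool.join()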

    Epilogue
    I suspect that this task will require some refinement or manual intervention. Maybe there are songs which actually have more than 5 legitimate seconds of silence. Also, I entertained the possibility that some songs would generate very low amplitude noise rather than being perfectly silent. In that case, I could refine the script to stipulate that amplitudes below a certain threshold count as 0. Fortunately, I marked which games were modified by this method, so I can run a new script as necessary.

    SQL Schema
    Here is the schema of my SQLite3 database, for those who want to try their hand at a proper query (one candidate appears after the schema). I am confident that it’s possible; I just didn’t have the patience to work it out. The task is to retrieve all the rows from the games table where all of the corresponding songs in the songs table have a play_length of 150000 milliseconds.

    CREATE TABLE games
      (
       id INTEGER PRIMARY KEY AUTOINCREMENT,
       uncompressed_sha1 TEXT,
       uncompressed_size INTEGER,
       compressed_sha1 TEXT,
       compressed_size INTEGER,
       system TEXT,
       game TEXT,
       gme_system TEXT default NULL,
       canonical_url TEXT default NULL,
       extension TEXT default "gamemusicxz",
       enabled INTEGER default 1,
       redirect_to_id INT DEFAULT -1,
       play_lengths_modified INT DEFAULT NULL);
    CREATE TABLE songs
      (
       game_id INTEGER,
       song_number INTEGER NOT NULL,
       song TEXT,
       author TEXT,
       copyright TEXT,
       dumper TEXT,
       length INTEGER,
       intro_length INTEGER,
       loop_length INTEGER,
       play_length INTEGER,
       play_order INTEGER default -1);
    CREATE TABLE tags
      (
       game_id INTEGER,
       tag TEXT NOT NULL,
       tag_type TEXT default "filename");
    CREATE INDEX gameid_index_songs ON songs(game_id);
    CREATE INDEX gameid_index_tag ON tags(game_id);
    CREATE UNIQUE INDEX sha1_index ON games(uncompressed_sha1);
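
    And here is one candidate query, wrapped in the obvious sqlite3 boilerplate. It is untested against the real data, so treat it as a sketch (the database filename is made up): select the games that have at least one song and whose songs all have a play_length of exactly 150000 ms.

    import sqlite3

    # Games where every corresponding song has the default 150000 ms length.
    QUERY = """
    SELECT g.*
      FROM games g
     WHERE EXISTS (SELECT 1 FROM songs s WHERE s.game_id = g.id)
       AND NOT EXISTS (SELECT 1 FROM songs s
                        WHERE s.game_id = g.id
                          AND s.play_length != 150000)
    """

    conn = sqlite3.connect("gamemusic.db")  # hypothetical database filename
    for row in conn.execute(QUERY):
        print(row)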