
The Pirate Bay from Belgium
1 April 2013
Updated: April 2013
Language: French
Type: Image
Other articles (94)
-
Specific configuration for PHP5
4 February 2011
PHP5 is required; you can install it by following this dedicated tutorial.
It is recommended to disable safe_mode at first. However, if it is correctly configured and the necessary binaries are accessible, MediaSPIP should work correctly with safe_mode enabled.
Specific modules
Some specific PHP modules must be installed, either through your distribution's package manager or manually: php5-mysql for connectivity with the (...)
-
Accepted formats
28 January 2010
The following commands provide information about the formats and codecs supported by the local ffmpeg installation:
ffmpeg -codecs
ffmpeg -formats
Accepted input video formats
This list is not exhaustive; it highlights the main formats in use: h264: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10; m4v: raw MPEG-4 video format; flv: Flash Video (FLV) / Sorenson Spark / Sorenson H.263; Theora; wmv:
Possible output video formats
At first, we (...)
-
Permissions overridden by plugins
27 April 2010
MediaSPIP core
autoriser_auteur_modifier() so that visitors are able to edit their information on the authors page
On other sites (7252)
-
How to set up crontab to run multiple Python scripts and a shell script?
5 January 2021, by Alexander Mitsou
I need to start three python3 scripts and a shell script using crontab. These scripts should run at the same time without any delay. Each script runs for exactly one minute. For instance, I have scheduled crontab to run these scripts every 5 minutes.


My problem is that if I run each script individually from a terminal, it executes with no errors, but when I use crontab nothing happens.


DISCLAIMER: If I set up the Python3 scripts individually in crontab, they work fine!


Here's my crontab setup:


*/5 * * * * cd /home/user/Desktop/ && /usr/bin/python3 script1.py >> report1.log

*/5 * * * * cd /home/user/Desktop/ && /usr/bin/python3 script2.py >> report2.log

*/5 * * * * cd /home/user/Desktop/ && /usr/bin/python3 script3.py >> report3.log

*/5 * * * * cd /home/user/Desktop/ && /usr/bin/sh script4.sh >> report4.log 



In addition, I need to mention that the shell script contains this FFmpeg command:


#!/bin/bash

parent_dir=`dirname \`pwd\`` 
folder_name="/Data/Webcam" 
new_path=$parent_dir$folder_name 


if [ -d "$new_path" ]; then
 echo "video_audio folder exists..."
else
 echo "Creating video_audio folder in the current directory..."
 mkdir -p -- "$new_path"
 sudo chmod 777 "$new_path"
 echo "Folder created"
 echo
fi

now=$(date +%F) 
now="$( echo -e "$now" | tr '-' '_' )"
sub_dir=$new_path'/'$now 

if [ -d "$sub_dir" ]; then
 echo "Date Sub-directory exists..."
 echo
else
 echo "Error: ${sub_dir} not found..."
 echo "Creating date sub-directory..."
 mkdir -p -- "$sub_dir"
 sudo chmod 777 "$sub_dir"
 echo "Date sub-directory created..."
 echo
fi

fname=$(date +%H_%M_%S)".avi"
video_dir=$sub_dir'/'$fname
ffmpeg -f pulse -ac 1 -i default -f v4l2 -i /dev/video0 -vcodec libx264 -t 00:01:00 $video_dir 



The log file of that script contains the following:


video_audio folder exists...
Date Sub-directory exists...

Package ffmpeg is already installed...
Package v4l-utils is already installed...

Package: ffmpeg
Status: install ok installed
Priority: optional
Section: video
Installed-Size: 2010
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: foreign
Version: 7:4.2.4-1ubuntu0.1
Replaces: libav-tools (<< 6:12~~), qt-faststart (<< 7:2.7.1-3~)
Depends: libavcodec58 (= 7:4.2.4-1ubuntu0.1), libavdevice58 (= 7:4.2.4-1ubuntu0.1), libavfilter7 (= 7:4.2.4-1ubuntu0.1), libavformat58 (= 7:4.2.4-1ubuntu0.1), libavresample4 (= 7:4.2.4-1ubuntu0.1), libavutil56 (= 7:4.2.4-1ubuntu0.1), libc6 (>= 2.29), libpostproc55 (= 7:4.2.4-1ubuntu0.1), libsdl2-2.0-0 (>= 2.0.10), libswresample3 (= 7:4.2.4-1ubuntu0.1), libswscale5 (= 7:4.2.4-1ubuntu0.1)
Suggests: ffmpeg-doc
Breaks: libav-tools (<< 6:12~~), qt-faststart (<< 7:2.7.1-3~), winff (<< 1.5.5-5~)
Description: Tools for transcoding, streaming and playing of multimedia files
 FFmpeg is the leading multimedia framework, able to decode, encode, transcode,
 mux, demux, stream, filter and play pretty much anything that humans and
 machines have created. It supports the most obscure ancient formats up to the
 cutting edge.
 .
 This package contains:
 * ffmpeg: a command line tool to convert multimedia files between formats
 * ffplay: a simple media player based on SDL and the FFmpeg libraries
 * ffprobe: a simple multimedia stream analyzer
 * qt-faststart: a utility to rearrange Quicktime files
Homepage: https://ffmpeg.org/
Original-Maintainer: Debian Multimedia Maintainers <debian-multimedia@lists.debian.org>
Package: v4l-utils
Status: install ok installed
Priority: optional
Section: utils
Installed-Size: 2104
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Version: 1.18.0-2build1
Replaces: ivtv-utils (<< 1.4.1-2), media-ctl
Depends: libv4l-0 (= 1.18.0-2build1), libv4l2rds0 (= 1.18.0-2build1), libc6 (>= 2.17), libgcc-s1 (>= 3.0), libstdc++6 (>= 5.2), libudev1 (>= 183)
Breaks: ivtv-utils (<< 1.4.1-2), media-ctl
Description: Collection of command line video4linux utilities
 v4l-utils contains the following video4linux command line utilities:
 .
 decode_tm6000: decodes tm6000 proprietary format streams
 rds-ctl: tool to receive and decode Radio Data System (RDS) streams
 v4l2-compliance: tool to test v4l2 API compliance of drivers
 v4l2-ctl, cx18-ctl, ivtv-ctl: tools to control v4l2 controls from the cmdline
 v4l2-dbg: tool to directly get and set registers of v4l2 devices
 v4l2-sysfs-path: sysfs helper tool
Original-Maintainer: Gregor Jasny <gjasny@googlemail.com>
Homepage: https://linuxtv.org/downloads/v4l-utils/



Because the Python files all share the same structure, I'm uploading a sample file here:


# -*- coding: utf-8 -*-
from threading import Timer
from pynput.mouse import Listener
import logging
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(
    os.path.realpath(__file__)), "../"))

from Functions import utils as ut

if __name__=='__main__':

    ut.initialize_dirs()
    rec_file = ''.join(('mouse_',ut.get_date(),'.txt'))
    raw_data = ut.get_name('Mouse')
    rec_file = os.path.join(raw_data,rec_file)
    logging.basicConfig(filename=rec_file,level=logging.DEBUG,format="%(asctime)s %(message)s")

    try:
        with Listener(on_move=ut.on_move, on_click=ut.on_click,on_scroll=ut.on_scroll) as listener:
            Timer(60, listener.stop).start()
            listener.join()
    except KeyboardInterrupt as err:
        print(err)
        sys.exit(0)

    print('Exiting logger...')




I'm also uploading the functions that I use:


# -*- coding: utf-8 -*-
from serial import Serial
from datetime import datetime, timedelta
import pandas as pd
import collections
import logging
import shutil
import serial
import time
import sys
import os

click_held = False
button = None


def on_move(x,y):
    """The callback to call when mouse move events occur

    Args:
        x (float): The new pointer position
        y (float): The new pointer position
    """
    if click_held:
        logging.info("MV {0:>8} {1:>8} {2:>8}:".format(x,y,str(None)))
    else:
        logging.info("MV {0:>8} {1:>8} {2:>8}:".format(x,y,str(None)))


def on_click(x,y,button,pressed):
    """The callback to call when a mouse button is clicked

    Args:
        x (float): Mouse coordinates on screen
        y (float): Mouse coordinates on screen
        button (str): one of the Button values
        pressed (bool): Pressed is whether the button was pressed
    """
    global click_held
    if pressed:
        click_held = True
        logging.info("CLK {0:>7} {1:>6} {2:>13}".format(x,y,button))
    else:
        click_held = False
        logging.info("RLS {0:>7} {1:>6} {2:>13}".format(x,y,button))


def on_scroll(x,y,dx,dy):
    """The callback to call when mouse scroll events occur

    Args:
        x (float): The new pointer position on screen
        y (float): The new pointer position on screen
        dx (int): The horizontal scroll. The units of scrolling is undefined
        dy (int): The vertical scroll. The units of scrolling is undefined
    """
    if dy == -1:
        logging.info("SCRD {0:>6} {1:>6} {2:>6}".format(x,y,str(None)))
    elif dy == 1:
        logging.info("SCRU {0:>6} {1:>6} {2:>6}".format(x,y,str(None)))
    else:
        pass


def on_press_keys(key):
    """The callback to call when a button is pressed.

    Args:
        key (str): A KeyCode,a Key or None if the key is unknown
    """
    subkeys = [
        'Key.alt','Key.alt_gr','Key.alt_r','Key.backspace',
        'Key.space','Key.ctrl','Key.ctrl_r','Key.down',
        'Key.up','Key.left','Key.right','Key.page_down',
        'Key.page_up','Key.enter','Key.shift','Key.shift_r'
    ]

    key = str(key).strip('\'')
    if(key in subkeys):
        #print(key)
        logging.info(key)
    else:
        pass


def record_chair(output_file):
    """Read the data stream coming from the serial monitor
    in order to get the sensor readings

    Args:
        output_file (str): The file name, where the data stream will be stored
    """
    serial_port = "/dev/ttyACM0"
    baud_rate = 9600
    ser = serial.Serial(serial_port,baud_rate)
    logging.basicConfig(filename=output_file,level=logging.DEBUG,format="%(asctime)s %(message)s")
    flag = False
    start = time.time()
    while time.time() - start < 60.0:
        try:
            serial_data = str(ser.readline().decode().strip('\r\n'))
            time.sleep(0.2)
            tmp = serial_data.split(' ')[0] #Getting Sensor Id
            if(tmp == 'A0'):
                flag = True
            if (flag and tmp != 'A4'):
                #print(serial_data)
                logging.info(serial_data)
            if(flag and tmp == 'A4'):
                flag = False
                #print(serial_data)
                logging.info(serial_data)
        except (UnicodeDecodeError, KeyboardInterrupt) as err:
            print(err)
            print(err.args)
            sys.exit(0)


def initialize_dirs():
    """Create the appropriate directories in order to save
    and process the collected data
    """
    current_path = os.path.abspath(os.getcwd())
    os.chdir('..')
    current_path = (os.path.abspath(os.curdir)) #/Multimodal_User_Monitoring
    current_path = os.path.join(current_path,'Data')
    create_subdirs([current_path])

    #Create mouse log folder
    mouse = os.path.join(current_path,'Mouse')
    create_subdirs([mouse])
    #Create mouse subfolders
    names = concat_names(mouse)
    create_subdirs(names)

    #Create keyboard log folder
    keyboard = os.path.join(current_path,'Keyboard')
    create_subdirs([keyboard])
    #Create keyboard subfolders
    names = concat_names(keyboard)
    create_subdirs(names)

    #Create the chair log folder
    chair = os.path.join(current_path,'Chair')
    create_subdirs([chair])
    #Create chair subfolders
    names = concat_names(chair)
    create_subdirs(names)

    #Create webcam log folder
    webcam = os.path.join(current_path,'Webcam')
    create_subdirs([webcam])


def concat_names(dir) -> list:
    """Concatenate the given folder names
    with the appropriate path

    Args:
        dir (str): The directory to create the subfolders

    Returns:
        list: The new absolute paths
    """
    raw_data = os.path.join(dir,'Raw')
    edited_data = os.path.join(dir,'Edited_logs')
    csv_data = os.path.join(dir,'CSV')
    features = os.path.join(dir,'Features')
    dirs = [raw_data,edited_data,csv_data,features]
    return dirs


def create_subdirs(paths):
    """Create sub directories given some absolute paths

    Args:
        paths (list): A list containing the paths to be created
    """
    for index,path in enumerate(paths):
        if(os.path.isdir(paths[index])):
            pass
        else:
            os.mkdir(paths[index])


def round_down(num,divisor) -> int:
    """Round the number of lines contained into the recording file,
    down to the nearest multiple of the given divisor

    Args:
        num (int): The number of lines contained into the given log file
        divisor (int): The divisor in order to get tuples of divisor

    Returns:
        int: The nearest multiple of the divisor
    """
    return num-(num%divisor)


def get_date() -> str:
    """Get the current date in order to properly name
    the recorded log files

    Returns:
        str: The current date in: YYYY_MM_DD format
    """
    return datetime.now().strftime('%Y_%m_%d')


def get_name(modality) -> str:
    """Save the recorded log into /Data/<modality>/Raw

    Args:
        modality (str): The log data source

    Returns:
        str: The absolute path where each recording is saved
    """
    current_path = os.path.abspath(os.getcwd())
    current_path = os.path.join(current_path,'Data')

    if modality == 'Chair':
        chair_path = os.path.join(current_path,modality,'Raw')
        return chair_path

    elif modality == 'Mouse':
        mouse_path = os.path.join(current_path,modality,'Raw')
        return mouse_path

    elif modality == 'Keyboard':
        keyboard_path = os.path.join(current_path,modality,'Raw')
        return keyboard_path


def crawl_dir(target,folder) -> list:
    """Enumerate all the given files in a directory
    based on the given file extension

    Args:
        target (str): The file extension to search for
        folder (str): The folder to search

    Returns:
        list: A list containing the file names
    """
    current_path = os.path.abspath(os.getcwd())
    path = os.path.join(current_path,folder)
    file_names =[]
    for f in os.listdir(path):
        if(f.endswith(target)):
            fname=os.path.join(path,f)
            file_names.append(fname)
    return file_names


def convert_keys2_csv(input_file,output_file):
    """Convert the data stream file(keylogger recording) from .txt to .csv format

    Args:
        input_file (str): The data stream file in .txt format
        output_file (str): The csv extension file name
    """
    df = pd.read_fwf(input_file)
    col_names = ['Date','Time','Key']
    df.to_csv(output_file,header=col_names,encoding='utf-8',index=False)


def convert_mouse2_csv(input_file,output_file):
    """Convert the data stream file(mouselogger recording) from .txt to .csv format

    Args:
        input_file (str): The data stream file in .txt format
        output_file (str): The csv extension file name
    """
    df = pd.read_fwf(input_file)
    col_names = ['Date','Time','Action','PosX','PosY','Button']
    df.to_csv(output_file,header=col_names,encoding='utf-8',index=False)


def convert_chair_2_csv(input_file,output_file):
    """Convert the data stream file(chair recording)
    from .txt to .csv format

    Args:
        input_file (str): The data stream file in .txt format
        output_file (str): The csv extension file name
    """
    if(os.path.isfile(input_file)):
        pass
    else:
        print('Invalid working directory...')
        print('Aborting...')
        sys.exit(0)

    tmp0,tmp1,tmp2,tmp3,tmp4 = 0,1,2,3,4

    line_number = 0
    for line in open(input_file).readlines():
        line_number += 1

    rounded_line = round_down(line_number,5)
    d = collections.defaultdict(list)

    with open(input_file,'r') as f1:
        lines = f1.readlines()
        for i in range(rounded_line // 5):
            #Sensor:Analog input 0 values
            Sid0 = lines[i+tmp0]
            temp = Sid0.split()
            d['Sid0'].append([temp[0],temp[1],temp[2],temp[3]])
            #Sensor:Analog input 1 values
            Sid1 = lines[i+tmp1]
            temp = Sid1.split()
            d['Sid1'].append([temp[0],temp[1],temp[2],temp[3]])
            #Sensor:Analog input 2 values
            Sid2 = lines[i+tmp2]
            temp = Sid2.split()
            d['Sid2'].append([temp[0],temp[1],temp[2],temp[3]])
            #Sensor:Analog input 3 values
            Sid3 = lines[i+tmp3]
            temp = Sid3.split()
            d['Sid3'].append([temp[0],temp[1],temp[2],temp[3]])
            #Sensor:Analog input 4 values
            Sid4 = lines[i+tmp4]
            temp = Sid4.split()
            d['Sid4'].append([temp[0],temp[1],temp[2],temp[3]])

            tmp0 += 4
            tmp1 += 4
            tmp2 += 4
            tmp3 += 4
            tmp4 += 4

    l = []
    for i in range(rounded_line // 5):
        date = d['Sid0'][i][0]
        time = d['Sid0'][i][1]
        A0_val = d['Sid0'][i][3]
        A1_val = d['Sid1'][i][3]
        A2_val = d['Sid2'][i][3]
        A3_val = d['Sid3'][i][3]
        A4_val = d['Sid4'][i][3]
        l.append([date,time,A0_val,A1_val,A2_val,A3_val,A4_val])

    sensor_readings_df = pd.DataFrame.from_records(l)
    sensor_readings_df.columns = ['Date','Time','A0','A1','A2','A3','A4']
    sensor_readings_df.to_csv(output_file, encoding='utf-8', index=False)
    del l


def parse_raw_data(modality):
    """Convert each modality's raw data into csv format and move
    the edited raw data into the appropriate Edited_logs folder

    Args:
        modality (str): The data source
    """
    #Change directories
    current_path = os.path.abspath(os.getcwd()) #/Functions
    os.chdir('..')
    current_path = (os.path.abspath(os.curdir)) #/Multimodal_User_Monitoring
    os.chdir('./Data')#/Multimodal_User_Monitoring/Data
    current_path = (os.path.abspath(os.curdir)) #/Multimodal_User_Monitoring/Data
    current_path = os.path.join(current_path,modality) #example: /Multimodal_User_Monitoring/Data/<modality>
    raw_data_path = os.path.join(current_path,'Raw')
    csv_data_path = os.path.join(current_path,'CSV')
    edited_logs_path = os.path.join(current_path,'Edited_logs')

    txt_names = crawl_dir('.txt',raw_data_path)
    csv_names = []
    for elem in txt_names:
        name = elem.split('/')[-1].split('.')[0]
        csv_name = name+'.csv'
        tmp = os.path.join(csv_data_path,csv_name)
        csv_names.append(tmp)

    if modality == 'Mouse':
        if len(txt_names) == len(csv_names):
            for i, elem in enumerate(txt_names):
                #for i in range(len(txt_names)):
                convert_mouse2_csv(txt_names[i],csv_names[i])
                shutil.move(txt_names[i],edited_logs_path)

    elif modality == 'Keyboard':
        if len(txt_names) == len(csv_names):
            for i, elem in enumerate(txt_names):
                #for i in range(len(txt_names)):
                convert_keys2_csv(txt_names[i],csv_names[i])
                shutil.move(txt_names[i],edited_logs_path)

    elif modality == 'Chair':
        if len(txt_names) == len(csv_names):
            for i, elem in enumerate(txt_names):
                #for i in range(len(txt_names)):
                convert_chair_2_csv(txt_names[i],csv_names[i])
                shutil.move(txt_names[i],edited_logs_path)



I need to mention that the logs of the python3 scripts are empty.


-
Things I Have Learned About Emscripten
1 September 2015, by Multimedia Mike (Cirrus Retro)
3 years ago, I released my Game Music Appreciation project, a website with a ludicrously uninspired title which allowed users a relatively frictionless method to experience a range of specialized music files related to old video games. However, the site required use of a special Chrome plugin. Ever since that initial release, my #1 most requested feature has been for a pure JavaScript version of the music player.
“Impossible!” I exclaimed. “There’s no way JS could ever run fast enough to run these CPU emulators and audio synthesizers in real time, and allow for the visualization that I demand!” Well, I’m pleased to report that I have proved myself wrong. I recently quietly launched a new site with what I hope is a catchier title, meant to evoke a cloud-based retro-music-as-a-service product: Cirrus Retro. Right now, it’s basically the same as the old site, but without the wonky Chrome-specific technology.
Along the way, I’ve learned a few things about using Emscripten that I thought might be useful to share with other people who wish to embark on a similar journey. This is geared more towards someone who has a stronger low-level background (such as C/C++) vs. high-level (like JavaScript).
General Goals
Do you want to cross-compile an entire desktop application, one that relies on an extensive GUI toolkit? That might be difficult (though I believe there is a path for porting Qt code directly with Emscripten). Your better wager might be to abstract out the core logic and processes of the program and then create a new web UI to access them.
Do you want to compile a game that basically just paints stuff to a 2D canvas? You’re in luck! Emscripten has a porting path for SDL. Make a version of your C/C++ software that targets SDL (generally not a tall order) and then compile that with Emscripten.
Do you just want to cross-compile some functionality that lives in a library? That’s what I’ve done with the Cirrus Retro project. For this, plan to compile the library into a JS file that exports some public functions that other, higher-level, native JS (i.e., JS written by a human and not a computer) will invoke.
Memory Levels
When porting C/C++ software to JavaScript using Emscripten, you have to think on 2 different levels. Or perhaps you need to force JavaScript into a low level C lens, especially if you want to write native JS code that will interact with Emscripten-compiled code. This often means somehow allocating chunks of memory via JS and passing them to the Emscripten-compiled functions. And you wouldn’t believe the type of gymnastics you need to execute to get native JS and Emscripten-compiled JS to cooperate.
“Emscripten: Pointers and Pointers” is the best (and, really, ONLY) explanation I could find for understanding the basic mechanics of this process, at least when I started this journey. However, there’s a mistake in the explanation that left me confused for a little while, and I’m at a loss to contact the author (doesn’t anyone post a simple email address anymore?).
Per the best of my understanding, Emscripten allocates a large JS array and calls that the memory space that the compiled C/C++ code is allowed to operate in. A pointer in C/C++ code will just be an index into that mighty array. Really, that’s not too far off from how a low-level program process is supposed to view memory: as a flat array.
Eventually, I just learned to cargo-cult my way through the memory allocation process. Here’s the JS code for allocating an Emscripten-compatible byte buffer, taken from my test harness (more on that later):
var musicBuffer = fs.readFileSync(testSpec['filename']);
var musicBufferBytes = new Uint8Array(musicBuffer);
var bytesMalloc = player._malloc(musicBufferBytes.length);
var bytes = new Uint8Array(player.HEAPU8.buffer, bytesMalloc, musicBufferBytes.length);
bytes.set(new Uint8Array(musicBufferBytes.buffer));
So, read the array of bytes from some input source, create a Uint8Array from the bytes, use the Emscripten _malloc() function to allocate enough bytes from the Emscripten memory array for the input bytes, then create a new array… then copy the bytes…
You know what? It’s late and I can’t remember how it works exactly, but it does. It has been a few months since I touched that code (been fighting with front-end website tech since then). You write that memory allocation code enough times and it begins to make sense, and then you hope you don’t have to write it too many more times.
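To make the flat-array idea concrete, here is a toy model written in Python rather than real Emscripten code. The class name ToyHeap, the bump allocator and the 1024-byte size are all assumptions invented for illustration; the real _malloc() is a full allocator and the real heap is exposed through typed-array views such as HEAPU8. The only point is that a "pointer" is an integer index into one big byte array.

```python
class ToyHeap:
    """A toy stand-in for the Emscripten heap: one flat byte array."""

    def __init__(self, size):
        self.memory = bytearray(size)  # plays the role of HEAPU8
        self.next_free = 8             # skip address 0 so it can mean NULL

    def malloc(self, nbytes):
        """Bump allocator: return an integer 'pointer' (an index into memory)."""
        if self.next_free + nbytes > len(self.memory):
            raise MemoryError("toy heap exhausted")
        ptr = self.next_free
        self.next_free += nbytes
        return ptr

    def write(self, ptr, data):
        """Copy bytes into the heap starting at index ptr."""
        self.memory[ptr:ptr + len(data)] = data

    def read(self, ptr, nbytes):
        """Return nbytes bytes starting at index ptr."""
        return bytes(self.memory[ptr:ptr + nbytes])


heap = ToyHeap(1024)
music = b"\x01\x02\x03\x04"    # stands in for the bytes read from a music file
ptr = heap.malloc(len(music))  # roughly the role of player._malloc(length)
heap.write(ptr, music)         # roughly the role of bytes.set(...)
```

In this model the compiled code would receive ptr and a length, then read the same bytes back out of the shared array, which is what the JS snippet above arranges for the real player.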
Multithreading
You can’t port multithreaded code to JS via Emscripten. JavaScript has no notion of threads! If you don’t understand the computer science behind this limitation, a more thorough explanation is beyond the scope of this post. But trust me, I’ve thought about it a lot. In fact, the official Emscripten literature states that you should be able to port most any C/C++ code as long as 1) none of the code is proprietary (i.e., all the raw source is available); and 2) there are no threads.
Yes, I read about the experimental pthreads support added to Emscripten recently. Don’t get too excited; that won’t be ready and widespread for a long time to come as it relies on a new browser API. In the meantime, figure out how to make your multithreaded C/C++ code run in a single thread if you want it to run in a browser.
Printing Facility
Eventually, getting software to work boils down to debugging, and the most primitive tool in many a programmer’s toolbox is the humble print statement. A print statement allows you to inspect a piece of a program’s state at key junctures. Eventually, when you try to cross-compile C/C++ code to JS using Emscripten, something is not going to work correctly in the generated JS “object code” and you need to understand what. You’ll be pleading for a method of just inspecting one variable deep in the original C/C++ code.
I came up with this simple printf-workalike called emprintf():
#ifndef EMPRINTF_H
#define EMPRINTF_H

#include <stdio.h>
#include <stdarg.h>
#include <emscripten.h>

#define MAX_MSG_LEN 1000

/* NOTE: Don't pass format strings that contain single quote (') or newline
 * characters. */
static void emprintf(const char *format, ...)
{
    char msg[MAX_MSG_LEN];
    char consoleMsg[MAX_MSG_LEN + 16];
    va_list args;

    /* create the string */
    va_start(args, format);
    vsnprintf(msg, MAX_MSG_LEN, format, args);
    va_end(args);

    /* wrap the string in a console.log('') statement */
    snprintf(consoleMsg, MAX_MSG_LEN + 16, "console.log('%s')", msg);

    /* send the final string to the JavaScript console */
    emscripten_run_script(consoleMsg);
}

#endif /* EMPRINTF_H */
Put it in a file called “emprint.h”. Include it into any C/C++ file where you need debugging visibility, use emprintf() as a replacement for printf() and the output will magically show up on the browser’s JavaScript debug console. Heed the comments and don’t put any single quotes or newlines in strings, and keep it under 1000 characters. I didn’t say it was perfect, but it has helped me a lot in my Emscripten adventures.
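The quote and newline restriction comes from pasting the message straight into a single-quoted JavaScript string. The sketch below models that string handling in Python; it is not part of the C header, and both function names are hypothetical. It compares the naive wrapping with one possible workaround, encoding the message as a JSON string, which is also a valid JavaScript string literal.

```python
import json


def naive_statement(msg):
    """Mimic the header's approach: wrap the raw message in single quotes."""
    return "console.log('%s')" % msg


def console_log_statement(msg):
    """Wrap the message as an escaped, double-quoted string literal.

    json.dumps escapes double quotes, backslashes and newlines, so the
    resulting statement stays syntactically valid for any message.
    """
    return "console.log(" + json.dumps(msg) + ")"


broken = naive_statement("it's")       # the quote ends the JS string early
safe = console_log_statement("it's\n")
```

A C version of the same idea would have to escape the buffer by hand before calling emscripten_run_script(), so the simple rule in the comment may well be the more practical choice for a debugging aid.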
Optimization Levels
Remember to turn on optimization when compiling. I have empirically found that optimizing for size (-Os) leads to the best performance all around, in addition to having the smallest size. Just be sure to specify some optimization level. If you don’t, the default is -O0 which offers horrible performance when running in JS.
Static Compression For HTTP Delivery
JavaScript code compresses pretty efficiently, even after it has been optimized for size using -Os. I routinely see compression ratios between 3.5:1 and 5:1 using gzip.
Web servers in this day and age are supposed to be smart enough to detect when a requesting web browser can accept gzip-compressed data and do the compression on the fly. They’re even supposed to be smart enough to cache compressed output so the same content is not recompressed for each request. I would have to set up a series of tests to establish whether either of the foregoing assertions is correct and I can’t be bothered. Instead, I took it into my own hands. The trick is to pre-compress the JS files and then instruct the webserver to serve these files with a ‘Content-Type’ of ‘application/javascript’ and a ‘Content-Encoding’ of ‘gzip’.
- Compress your large Emscripten-built JS files with gzip: ‘gzip compiled-code.js’
- Rename them from extension .js.gz to .jsgz
- Tell the webserver to deliver .jsgz files with the correct Content-Type and Content-Encoding headers
To do that last step with Apache, specify these lines:
AddType application/javascript jsgz
AddEncoding gzip jsgz
They belong in either a directory’s .htaccess file or in the sitewide configuration (/etc/apache2/mods-available/mime.conf works on my setup).
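The compress-and-rename steps could also be scripted. What follows is a minimal sketch in Python, assuming a hypothetical helper named precompress(); unlike running gzip on the command line, it leaves the original .js file in place and writes the .jsgz copy next to it.

```python
import gzip
import shutil
from pathlib import Path


def precompress(js_path):
    """Write a gzip-compressed copy of js_path with a .jsgz extension."""
    src = Path(js_path)
    dst = src.with_suffix(".jsgz")  # compiled-code.js -> compiled-code.jsgz
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=9) as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling precompress("compiled-code.js") would produce compiled-code.jsgz, ready to be served with the headers configured above.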
Build System and Build Time Optimization
Oh goodie, build systems! I had a very specific manner in which I wanted to build my JS modules using Emscripten. Can I possibly coerce any of the many popular build systems to do this? It has been a few months since I worked on this problem specifically but I seem to recall that the build systems I tried to use would freak out at the prospect of compiling stuff to a final binary target of .js.
I had high hopes for Bazel, which Google released while I was developing Cirrus Retro. Surely, this is software that has been battle-tested in the harshest conditions of one of the most prominent software-developing companies in the world, needing to take into account the most bizarre corner cases and still build efficiently and correctly every time. And I have little doubt that it fulfills the order. Similarly, I’m confident that Google also has a team of no fewer than 100 or so people dedicated to developing and supporting the project within the organization. When you only have, at best, 1-2 hours per night to work on projects like this, you prefer not to fight with such cutting edge technology and after losing 2 or 3 nights trying to make a go of Bazel, I eventually put it aside.
I also tried to use Autotools. It failed horribly for me, mostly for my own carelessness and lack of early-project source control.
After that, it was strictly vanilla makefiles with no real dependency management. But you know what helps in these cases? ccache! Or at least, it would if it didn’t fail with Emscripten.
Quick tip: ccache has trouble with LLVM unless you set the CCACHE_CPP2 environment variable (e.g., “export CCACHE_CPP2=1”). I don’t remember the specifics, but it magically fixes things. Then, the lazy build process becomes “make clean && make”.
Testing
If you have never used Node.js, testing Emscripten-compiled JS code might be a good opportunity to start. I was able to use Node.js to great effect for testing the individually-compiled music player modules, wiring up a series of invocations using Python for a broader test suite (wouldn’t want to go too deep down the JS rabbit hole, after all).
Be advised that Node.js doesn’t enjoy the same kind of JIT optimizations that the browser engines leverage. Thus, in the case of time critical code like, say, an audio synthesis library, the code might not run in real time. But as long as it produces the correct bitwise waveform, that’s good enough for continuous integration.
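As an illustration of that kind of Python-driven test, here is a minimal sketch; it is not the actual Cirrus Retro test suite. It assumes each test case is a command that writes its rendered output to stdout, and it compares a SHA-256 digest of those bytes against a known-good value. The function names are hypothetical.

```python
import hashlib
import subprocess
import sys


def output_digest(command):
    """Run a command and return the SHA-256 hex digest of its stdout bytes."""
    result = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def check_case(command, expected_digest):
    """Return True when the command's output matches the reference digest."""
    return output_digest(command) == expected_digest


# Stand-in command so the sketch runs anywhere; a real suite would invoke
# node with a test harness script and an input file instead.
demo_command = [sys.executable, "-c",
                "import sys; sys.stdout.buffer.write(b'abc')"]
demo_ok = check_case(demo_command, hashlib.sha256(b"abc").hexdigest())
```

Comparing digests rather than timing fits the point above: the check passes as long as the bytes are identical, however slowly Node.js produced them.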
Also, if you have largely been a low-level programmer for your whole career and are generally unfamiliar with the world of single-threaded, event-driven, callback-oriented programming, you might be in for a bit of a shock. When I wanted to learn how to read the contents of a file in Node.js, this is the first tutorial I found on the matter. I thought the code presented was a parody of bad coding style:
var fs = require("fs");
var fileName = "foo.txt";

fs.exists(fileName, function(exists) {
  if (exists) {
    fs.stat(fileName, function(error, stats) {
      fs.open(fileName, "r", function(error, fd) {
        var buffer = new Buffer(stats.size);

        fs.read(fd, buffer, 0, buffer.length, null, function(error, bytesRead, buffer) {
          var data = buffer.toString("utf8", 0, buffer.length);

          console.log(data);
          fs.close(fd);
        });
      });
    });
  }
});
Apparently, this kind of thing doesn’t raise an eyebrow in the JS world.
Now, I understand and respect the JS programming model. But this was seriously frustrating when I first encountered it because a simple script like the one I was trying to write just has an ordered list of tasks to complete. When it asks for bytes from a file, it really has nothing better to do than to wait for the answer.
Thankfully, it turns out that Node’s fs module includes synchronous versions of the various file access functions. So it’s all good.
Conclusion
I’m sure I missed or underexplained some things. But if other brave souls are interested in dipping their toes in the waters of Emscripten, I hope these tips will come in handy.
The post Things I Have Learned About Emscripten first appeared on Breaking Eggs And Making Omelettes.
-
The first in-depth technical analysis of VP8
Back in my original post about Internet video, I made some initial comments on the hope that VP8 would solve the problems of web video by providing a supposed patent-free video format with significantly better compression than the current options of Theora and Dirac. Fortunately, it seems I was able to acquire access to the VP8 spec, software, and source a good few days before the official release and so was able to perform a detailed technical analysis in time for the official release.
The questions I will try to answer here are:
1. How good is VP8? Is the file format actually better than H.264 in terms of compression, and could a good VP8 encoder beat x264? On2 claimed 50% better than H.264, but On2 has always made absurd claims that they were never able to back up with results, so such a number is almost surely wrong. VP7, for example, was claimed to be 15% better than H.264 while being much faster, but was in reality neither faster nor higher quality.
2. How good is On2’s VP8 implementation? Irrespective of how good the spec is, is the implementation good, or is this going to be just like VP3, where On2 releases an unusably bad implementation with the hope that the community will fix it for them? Let’s hope not; it took 6 years to fix Theora!
3. How likely is VP8 to actually be free of patents? Even if VP8 is worse than H.264, being patent-free is still a useful attribute for obvious reasons. But as noted in my previous post, merely being published by Google doesn’t guarantee that it is. Microsoft did something similar a few years ago with the release of VC-1, which was claimed to be patent-free, but within mere months after release, a whole bunch of companies claimed patents on it and soon enough a patent pool was formed.
We’ll start by going through the core features of VP8. We’ll primarily analyze them by comparing to existing video formats. Keep in mind that an encoder and a spec are two different things: it’s possible for a good encoder to be written for a bad spec or vice versa! This is why a really good MPEG-1 encoder can beat a horrific H.264 encoder.
But first, a comment on the spec itself.
AAAAAAAGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH !
The spec consists largely of C code copy-pasted from the VP8 source code, up to and including TODOs, “optimizations”, and even C-specific hacks, such as workarounds for the implementation-defined behavior of signed right shift on negative numbers. In many places it is simply outright opaque. Copy-pasted C code is not a spec. I may have complained about the H.264 spec being overly verbose, but at least it’s precise. The VP8 spec, by comparison, is imprecise, unclear, and overly short, leaving many portions of the format very vaguely explained. Some parts even explicitly refuse to fully explain a particular feature, pointing to highly-optimized, nigh-impossible-to-understand reference code for an explanation. There’s no way in hell anyone could write a decoder from this spec alone.
Now that I’ve gotten that out of my system, let’s get back to VP8 itself. To begin with, to get a general sense for where all this fits in, basically all modern video formats work via some variation on the following chain of steps :
Encode: Predict -> Transform + Quant -> Entropy Code -> Loopfilter
Decode: Entropy Decode -> Predict -> Dequant + Inverse Transform -> Loopfilter
If you’re looking to just get to the results and skip the gritty technical details, make sure to check out the “overall verdict” section and the “visual results” section. Or at least skip to the “summary for the lazy”.
Prediction
Prediction is any step which attempts to guess the content of an area of the frame. This could include functions based on already-known pixels in the same frame (e.g. inpainting) or motion compensation from a previous frame. Prediction usually involves side data, such as a signal telling the decoder a motion vector to use for said motion compensation.
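As a rough illustration of the motion compensation case, here is a minimal sketch assuming integer-pel motion vectors and a toy 8×8 frame. The helper names (predict_block, residual) are made up for this example and do not come from any codec. The point is only that a good motion vector leaves a residual of zeros, which is what makes the later stages cheap.

```python
def predict_block(ref, x, y, mv, size=4):
    """Copy a size x size block from the reference frame, displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[y + dy + r][x + dx + c] for c in range(size)]
            for r in range(size)]

def residual(cur_block, pred_block):
    """Difference between the real pixels and the prediction."""
    return [[a - b for a, b in zip(cr, pr)]
            for cr, pr in zip(cur_block, pred_block)]

# Toy data: the current frame is the reference frame shifted right by one pixel.
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [[ref[r][max(c - 1, 0)] for c in range(8)] for r in range(8)]
cur_block = [row[2:6] for row in cur[2:6]]

good = residual(cur_block, predict_block(ref, 2, 2, (-1, 0)))  # correct vector
bad = residual(cur_block, predict_block(ref, 2, 2, (0, 0)))    # no motion assumed
```

With the correct vector the residual is all zeros; with a zero vector every residual sample is nonzero and has to be transformed, quantized, and coded.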
Intra Prediction
Intra prediction is used to guess the content of a block without referring to other frames. VP8′s intra prediction is basically ripped off wholesale from H.264 : the “subblock” prediction modes are almost exactly identical (they even have the same names !) to H.264′s i4x4 mode, and the whole block prediction mode is basically identical to i16x16. Chroma prediction modes are practically identical as well. i8x8, from H.264 High Profile, is not present. An additional difference is that the planar prediction mode has been replaced with TM_PRED, a very vaguely similar analogue. The specific prediction modes are internally slightly different, but have the same names as in H.264.
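For reference, TM_PRED can be sketched in a few lines. This follows my reading of the VP8 decoding guide as later published (RFC 6386), where each predicted pixel is the left neighbor plus the above neighbor minus the top-left corner, clamped to 8 bits. The function and argument names are illustrative, and block-edge handling is left out.

```python
def clamp255(v):
    """Clamp to the 8-bit pixel range."""
    return max(0, min(255, v))

def tm_pred(above, left, top_left):
    """Predict a block from the row above, the column to the left, and the corner pixel."""
    return [[clamp255(l + a - top_left) for a in above] for l in left]

block = tm_pred(above=[10, 20, 30, 40], left=[12, 14, 16, 18], top_left=10)
```

The gradient along the top row is simply propagated down the block, offset by the left column, which is why it is only a vague analogue of H.264's planar mode.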
Honestly, I’m very disappointed here. While H.264′s intra prediction is good, it has certainly been improved on quite a bit over the past 7 years, and I thought that blatantly ripping it off was the domain of companies like Real (see RV40). I expected at least something slightly more creative out of On2. But more important than any of that : this is a patent time-bomb waiting to happen. H.264′s spatial intra prediction is covered in patents and I don’t think that On2 will be able to just get away with changing the rounding in the prediction modes. I’d like to see Google’s justification for this — they must have a good explanation for why they think there won’t be any patent issues.
Update : spatial intra prediction apparently dates back to Nokia’s MVC H.26L proposal, from around 2000. It’s possible that Google believes that this is sufficient prior art to invalidate existing patents — which is not at all unreasonable !
Verdict on Intra Prediction : Slightly modified ripoff of H.264. Somewhat worse than H.264 due to omission of i8x8.
Inter Prediction
Inter prediction is used to guess the content of a block by referring to past frames. There are two primary components to inter prediction: reference frames and motion vectors. The reference frame is a past frame from which to grab pixels, and the motion vectors index an offset into that frame. VP8 supports a total of 3 reference frames: the previous frame, the “alt ref” frame, and the “golden frame”. For motion vectors, VP8 supports variable-size partitions much like H.264. For subpixel precision, it supports quarter-pel motion vectors with a 6-tap interpolation filter. In short:
VP8 reference frames: up to 3
H.264 reference frames: up to 16
VP8 partition types: 16×16, 16×8, 8×16, 8×8, 4×4
H.264 partition types: 16×16, 16×8, 8×16, flexible subpartitions (each 8×8 can be 8×8, 8×4, 4×8, or 4×4).
VP8 chroma MV derivation: each 4×4 chroma block uses the average of colocated luma MVs (same as MPEG-4 ASP)
H.264 chroma MV derivation: chroma uses luma MVs directly
VP8 interpolation filter: qpel, 6-tap luma, mixed 4/6-tap chroma
H.264 interpolation filter: qpel, 6-tap luma (staged filter), bilinear chroma
H.264 has but VP8 doesn’t: B-frames, weighted prediction
H.264 has a significantly better and more flexible referencing structure. Sub-8×8 partitions are mostly unnecessary, so VP8’s omission of the H.264-style subpartitions has little consequence. The chroma MV derivation is more accurate in H.264 but slightly slower; in practice the difference is probably near-zero both speed and compression-wise, since sub-8×8 luma partitions are rarely used (and I would suspect the same carries over to VP8).
The VP8 interpolation filter is likely slightly better, but will definitely be slower to implement, both encoder and decoder-side. A staged filter allows the encoder to precalculate all possible halfpel positions and then quickly calculate qpel positions when necessary : an unstaged filter does not, making subpel motion estimation much slower. Not that unstaged filters are bad — staged filters have basically been abandoned for all of the H.265 proposals — it’s just an inherent disadvantage performance-wise. Additionally, having as high as 6 taps on chroma is, IMO, completely unnecessary and wasteful.
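To make the “staged” idea concrete, here is a one-dimensional sketch in the style of H.264's luma filter: half-pel samples come from the 6-tap kernel (1, -5, 20, 20, -5, 1) with rounding and a shift by 5, and quarter-pel samples are just rounded averages of the nearest integer and half-pel samples. The 1-D simplification, the edge replication, and the function names are assumptions for illustration; the real filter is two-dimensional and has extra cases for diagonal positions.

```python
TAPS = (1, -5, 20, 20, -5, 1)

def clip255(v):
    return max(0, min(255, v))

def halfpel_row(row):
    """Stage 1: the half-pel sample between row[i] and row[i + 1], for every i."""
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = sum(t * row[min(max(i - 2 + k, 0), n - 1)]
                  for k, t in enumerate(TAPS))
        out.append(clip255((acc + 16) >> 5))
    return out

def qpel_sample(row, half, i, frac):
    """Stage 2: sample at position i + frac/4, reusing the precomputed half-pels."""
    if frac == 0:
        return row[i]
    if frac == 2:
        return half[i]
    nearest_int = row[i] if frac == 1 else row[i + 1]
    return (nearest_int + half[i] + 1) >> 1

row = [0, 10, 20, 30, 40, 50, 60, 70]
half = halfpel_row(row)  # computed once, reused by every quarter-pel lookup
```

An encoder can compute the half-pel plane once per frame and then evaluate any quarter-pel candidate with a single average. An unstaged filter needs the full multi-tap convolution for every candidate position, which is the speed disadvantage described above.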
The lack of B-frames in VP8 is a killer. B-frames can give 10-20% (or more) compression benefit for minimal speed cost ; their omission in VP8 probably costs more compression than all other problems noted in this post combined. This was not unexpected, however ; On2 has never used B-frames in any of their video formats. They also likely present serious patent problems, which probably explains their omission. Lack of weighted prediction is also going to hurt a bit, especially in fades.
Update: Alt-ref frames can apparently be used to partially compensate for the lack of B-frames. It’s not nearly as good, but it can get at least some of the benefit without actual B-frames.
Verdict on Inter Prediction : Similar partitioning structure to H.264. Much weaker referencing structure. More complex, slightly better interpolation filter. Mostly a wash — except for the lack of B-frames, which is seriously going to hurt compression.
Transform and Quantization
After prediction, the encoder takes the difference between the prediction and the actual source pixels (the residual), transforms it, and quantizes it. The transform step is designed to make the data more amenable to compression by decorrelating it. The quantization step is the actual information-losing step where compression occurs ; the output values of transform are rounded, mostly to zero, leaving only a few integer coefficients.
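The quantization step can be sketched as plain rounding to a multiple of a step size. This is a generic illustration, not VP8's or H.264's exact arithmetic, and the coefficient values are invented for the example.

```python
def quantize(coefs, step):
    """Round each coefficient to the nearest multiple of step (stored as a level)."""
    levels = []
    for c in coefs:
        sign = -1 if c < 0 else 1
        levels.append(sign * ((abs(c) + step // 2) // step))
    return levels

def dequantize(levels, step):
    """What the decoder reconstructs: the information lost in rounding is gone for good."""
    return [lvl * step for lvl in levels]

coefs = [620, -41, 12, -7, 3, 0, -2, 1]  # made-up transform output
levels = quantize(coefs, 16)
recon = dequantize(levels, 16)
```

Five of the eight coefficients become zero, and the three survivors are small integers, which is exactly the kind of data the entropy coder handles well.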
Transform
For transform, VP8 again uses a very H.264-reminiscent scheme. Each 16×16 macroblock is divided into 16 4×4 DCT blocks, each of which is transformed by a bit-exact DCT approximation. Then, the DC coefficients of each block are collected into another 4×4 group, which is then Hadamard-transformed. OK, so this isn’t reminiscent of H.264, this is H.264. There are, however, 3 differences between VP8’s scheme and H.264’s.
The first is that the 8×8 transform is omitted entirely (fitting with the omission of the i8x8 intra mode). The second is the specifics of the transform itself. H.264 uses an extremely simplified “DCT” which is so un-DCT-like that it is often referred to as the HCT (H.264 Cosine Transform) instead. This simplified transform results in roughly 1% worse compression, but greatly simplifies the transform itself, which can be implemented entirely with adds, subtracts, and right shifts by 1. VC-1 uses a more accurate version that relies on a few small multiplies (numbers like 17, 22, 10, etc). VP8 uses an extremely, needlessly accurate version that uses very large multiplies (20091 and 35468). This in retrospect is not surprising, as it is very similar to what VP3 used.
The third difference is that the Hadamard hierarchical transform is applied for some inter blocks, not merely i16x16. In particular, it also runs for p16x16 blocks. While this is definitely a good idea, especially given the small transform size (and the need to decorrelate the DC value between the small transforms), I’m not quite sure I agree with the decision to limit it to p16x16 blocks ; it seems that perhaps with a small amount of modification this could also be useful for other motion partitions. Also, note that unlike H.264, the hierarchical transform is luma-only and not applied to chroma.
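The hierarchical step can be sketched as an ordinary 4×4 Hadamard transform applied to the 16 DC values of a macroblock. This is the generic, unnormalized form; the exact scaling and rounding used by VP8 or H.264 are not reproduced here, and the names are illustrative.

```python
H = (
    (1,  1,  1,  1),
    (1,  1, -1, -1),
    (1, -1, -1,  1),
    (1, -1,  1, -1),
)

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def hadamard4x4(dc):
    """Compute H * dc * H^T for a 4x4 grid of DC coefficients."""
    h_t = [list(col) for col in zip(*H)]
    return matmul(matmul(H, dc), h_t)

flat = [[5] * 4 for _ in range(4)]  # all 16 sub-blocks share the same DC value
out = hadamard4x4(flat)
```

When neighboring 4×4 blocks have similar DC values, as they do in smooth areas, nearly all of the energy collapses into one coefficient. That is the decorrelation that a small transform size otherwise leaves on the table.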
Overall, the transform scheme in VP8 is definitely weaker than in H.264. The lack of an 8×8 transform is going to have a significant impact on detail retention, especially at high resolutions. The transform is also slower than it needs to be, though a shift-based transform might be out of the question due to patents. The one good new idea here is applying the hierarchical DC transform to inter blocks.
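The difference in arithmetic cost is easy to see side by side. The first function below is the one-dimensional butterfly of the H.264 inverse transform, which needs only adds, subtracts, and right shifts by 1. The other two show how constants of the size quoted above are typically applied in 16-bit fixed point; the constant names are my own, and the full VP8 data flow is deliberately not reproduced.

```python
def h264_inverse_1d(x0, x1, x2, x3):
    """One pass of the H.264 4-point inverse transform."""
    a = x0 + x2
    b = x0 - x2
    c = (x1 >> 1) - x3
    d = x1 + (x3 >> 1)
    return (a + d, b + c, b - c, a - d)

SIN_CONST = 35468   # about sqrt(2) * sin(pi / 8) in 16-bit fixed point
COS_CONST = 20091   # about sqrt(2) * cos(pi / 8) - 1 in 16-bit fixed point

def mul_sin(x):
    """One full multiply and a shift per use."""
    return (x * SIN_CONST) >> 16

def mul_cos(x):
    """The factor exceeds 1, so the integer part is added back separately."""
    return x + ((x * COS_CONST) >> 16)
```

Neither approach is slow on a modern CPU, but the multiply-based version needs wider intermediates, which matters for SIMD code working on 16-bit lanes.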
Verdict on Transform : Similar to H.264. Slower, slightly more accurate 4×4 transform. Improved DC transform for luma (but not on chroma). No 8×8 transform. Overall, worse.
Quantization
For quantization, the core process is basically the same among all MPEG-like video formats, and VP8 is no exception. The primary way that video formats tend to differentiate themselves here is by varying quantization scaling factors. There are two ways in which this is primarily done: frame-based offsets that apply to all coefficients or just some portion of them, and macroblock-level offsets. VP8 primarily uses the former; in a scheme much less flexible than H.264’s custom quantization matrices, it allows for adjusting the quantizer of luma DC, luma AC, chroma DC, and so forth, separately. The latter (macroblock-level quantizer choice) can, in theory, be done using its “segmentation map” features, albeit very hackily and not very efficiently.
The killer mistake that VP8 has made here is not making macroblock-level quantization a core feature of the format. Algorithms that take advantage of macroblock-level quantization are known as “adaptive quantization” and are absolutely critical to competitive visual quality. My implementation of variance-based adaptive quantization (before, after) in x264 still stands to this day as the single largest visual quality gain in x264 history. Encoder comparisons have shown over and over that encoders without adaptive quantization simply cannot compete.
Thus, while adaptive quantization is possible in VP8, the only way to implement it is to define one segment map for every single quantizer that one wants and to code the segment map index for every macroblock. This is inefficient and cumbersome ; even the relatively suboptimal MPEG-style delta quantizer system would be a better option. Furthermore, only 4 segment maps are allowed, for a maximum of 4 quantizers per frame.
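To show what the 4-segment limit means in practice, here is a hypothetical sketch of squeezing variance-based adaptive quantization into a segment map. It is not x264's algorithm and not anything from the VP8 encoder: the rank-based bucketing, the offsets, and all names are assumptions chosen to keep the example short.

```python
import math

def segment_map(variances, num_segments=4):
    """Bucket macroblocks into num_segments groups by rank of log-variance."""
    energy = [math.log2(v + 1) for v in variances]
    order = sorted(range(len(energy)), key=lambda i: energy[i])
    segments = [0] * len(energy)
    for rank, i in enumerate(order):
        segments[i] = rank * num_segments // len(energy)
    return segments

def segment_quantizers(base_q, offsets=(-4, -2, 2, 4)):
    """One quantizer per segment: flat blocks (segment 0) get the finest one."""
    return [base_q + o for o in offsets]

variances = [1, 1000, 50, 5, 300, 10, 2, 700]  # made-up per-macroblock variances
segments = segment_map(variances)
quantizers = segment_quantizers(30)
```

A delta-quantizer scheme could give each of these eight macroblocks its own value. Here they are forced into four shared ones, and the segment index still has to be coded for every macroblock.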
Verdict on Quantization : Lack of well-integrated adaptive quantization is going to be a killer when the time comes to implement psy optimizations. Overall, much worse.
Entropy Coding
Entropy coding is the process of taking all the information from all the other processes (DCT coefficients, prediction modes, motion vectors, and so forth) and compressing it losslessly into the final output file. VP8 uses an arithmetic coder somewhat similar to H.264’s, but with a few critical differences. First, it omits the range/probability table in favor of a multiplication. Second, it is entirely non-adaptive: unlike H.264’s, which adapts after every bit decoded, probability values are constant over the course of the frame. Accordingly, the encoder may periodically send updated probability values in frame headers for some syntax elements. Keyframes reset the probability values to the defaults.
This approach isn’t surprising ; VP5 and VP6 (and probably VP7) also used non-adaptive arithmetic coders. How much of a penalty this actually means compression-wise is unknown ; it’s not easy to measure given the design of either H.264 or VP8. More importantly, I question the reason for this : making it adaptive would add just one single table lookup to the arithmetic decoding function — hardly a very large performance impact.
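Both points can be illustrated without writing a full arithmetic coder. The split function below is the multiplication that replaces the lookup table, following my reading of the boolean decoder in the VP8 decoding guide as later published (RFC 6386). The two cost functions compare ideal code lengths for a fixed probability against a simple exponentially decaying estimator; that estimator is a stand-in of my own and is not CABAC's actual state machine.

```python
import math

def split(range_, prob):
    """Divide the coder's current range using an 8-bit probability of a zero bit."""
    return 1 + (((range_ - 1) * prob) >> 8)

def static_cost(bits, p_one):
    """Ideal code length in bits when one probability is used for the whole frame."""
    return sum(-math.log2(p_one if b else 1 - p_one) for b in bits)

def adaptive_cost(bits, rate=1 / 16, p_one=0.5):
    """Ideal code length when the estimate is nudged toward each bit after coding it."""
    total = 0.0
    for b in bits:
        total += -math.log2(p_one if b else 1 - p_one)
        p_one += ((1.0 if b else 0.0) - p_one) * rate
        p_one = min(max(p_one, 1 / 256), 255 / 256)
    return total

# A symbol whose statistics change halfway through the frame.
bits = [0] * 200 + [1] * 200
best_static = static_cost(bits, sum(bits) / len(bits))
adaptive = adaptive_cost(bits)
```

On this deliberately extreme input the best single probability is 0.5, so the static coder spends a full bit per symbol, while the adaptive estimate needs a small fraction of that. Real video data is far less one-sided, so this shows the direction of the penalty, not its size.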
Of course, the arithmetic coder is not the only part of entropy coding : an arithmetic coder merely turns 0s and 1s into an output bitstream. The process of creating those 0s and 1s and selecting the probabilities for the encoder to use is an equally interesting problem. Since this is a very complicated part of the video format, I’ll just comment on the parts that I found particularly notable.
Motion vector coding consists of two parts : prediction based on neighboring motion vectors and the actual compression of the resulting delta between that and the actual motion vector. The prediction scheme in VP8 is a bit odd — worse, the section of the spec covering this contains no English explanation, just confusingly-written C code. As far as I can tell, it chooses an arithmetic coding context based on the neighboring MVs, then decides which of the predicted motion vectors to use, or whether to code a delta instead.
The downside of this scheme is that, like in VP3/Theora (though not nearly as badly), it biases heavily towards the re-use of previous motion vectors. This is dangerous because, as the Theora devs have recently found (and fixed to some extent in Theora 1.2 aka Ptalarbvorm), any situation in which the encoder picks a motion vector which isn’t the “real” motion vector in order to save bits can potentially have negative visual consequences. In terms of raw efficiency, I’m not sure whether VP8’s or H.264’s prediction is better here.
The compression of the resulting delta is similar to H.264, except for the coding of very large deltas, which is slightly better (similar to FFV1′s Golomb-like arithmetic codes).
Intra prediction mode coding is done using arithmetic coding contexts based on the modes of the neighboring blocks. This is probably a good bit better than the hackneyed method that H.264 uses, which always struck me as being poorly designed.
Residual coding is even more difficult to understand than motion vector coding, as the only full reference is a bunch of highly optimized, highly obfuscated C code. Like H.264′s CAVLC, it bases contexts on the number of nonzero coefficients in the top and left blocks relative to the current block. In addition, it also considers the magnitude of those coefficients and, like H.264′s CABAC, updates as coefficients are decoded.
One more thing to note is the data partitioning scheme used by VP8. This scheme is much like VP3/Theora’s and involves putting each syntax element in its own component of the bitstream. The unfortunate problem with this is that it’s a nightmare for hardware implementations, greatly increasing memory bandwidth requirements. I have already received a complaint from a hardware developer about this specific feature with regard to VP8.
Verdict on Entropy Coding : I’m not quite sure here. It’s better in some ways, worse in some ways, and just plain weird in others. My hunch is that it’s probably a very slight win for H.264 ; non-adaptive arithmetic coding has to have some serious penalties. It may also be a hardware implementation problem.
Loop Filter
The loop filter is run after decoding or encoding a frame and serves to perform extra processing on a frame, usually to remove blockiness in DCT-based video formats. Unlike postprocessing, this is not only for visual reasons, but also to improve prediction for future frames. Thus, it has to be done identically in both the encoder and decoder. VP8’s loop filter is vaguely similar to H.264’s, but with a few differences. First, it has two modes (which can be chosen by the encoder): a fast mode and a normal mode. The fast mode is somewhat simpler than H.264’s, while the normal mode is somewhat more complex. Second, when filtering between macroblocks, VP8’s filter has a wider range than the in-macroblock filter; H.264 does this as well, but only for intra edges.
Third, VP8′s filter omits most of the adaptive strength mechanics inherent in H.264′s filter. Its only adaptation is that it skips filtering on p16x16 blocks with no coefficients. This may be responsible for the high blurriness of VP8′s loop filter : it will run over and over and over again on all parts of a macroblock even if they are unchanged between frames (as long as some other part of the macroblock is changed). H.264′s, by comparison, is strength-adaptive based on whether DCT coefficients exist on either side of a given edge and based on the motion vector delta and reference frame delta across said edge. Of course, skipping this strength calculation saves some decoding time as well.
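The contrast can be sketched as two decision functions. The first follows the usual H.264 boundary-strength rules for progressive frames with a single reference per block, which is a simplification. The second encodes only the VP8 rule as described in the paragraph above. The dictionary layout and names are illustrative.

```python
def h264_boundary_strength(p, q, mb_edge):
    """Filter strength 0..4 for the edge between blocks p and q (simplified)."""
    if p["intra"] or q["intra"]:
        return 4 if mb_edge else 3
    if p["coded"] or q["coded"]:      # either side has nonzero coefficients
        return 2
    if p["ref"] != q["ref"]:          # predicted from different reference frames
        return 1
    if (abs(p["mv"][0] - q["mv"][0]) >= 4 or
            abs(p["mv"][1] - q["mv"][1]) >= 4):  # MVs differ by a full pixel or more
        return 1
    return 0                          # edge left untouched

def vp8_skips_filter(mb):
    """The only adaptation described above: a whole p16x16 macroblock with no coefficients."""
    return mb["p16x16"] and not mb["coded"]

still = {"intra": False, "coded": False, "ref": 0, "mv": (0, 0)}
moved = {"intra": False, "coded": False, "ref": 0, "mv": (8, 0)}
coded = {"intra": False, "coded": True, "ref": 0, "mv": (0, 0)}
```

In H.264 two unchanged neighboring blocks get strength 0 even inside a macroblock that changed elsewhere. Under the VP8 rule the whole macroblock is either filtered or not.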
Update:
05:28 < derf> Gumboot: You’ll be disappointed to know they got the loop filter ordering wrong again.
05:29 < derf> Dark_Shikari: They ordered it such that you have to process each macroblock in full before processing the next one.
Verdict on Loop Filter: Definitely worse compression-wise than H.264’s due to the lack of adaptive strength. Especially with the “fast” mode, might be significantly faster. I worry about it being too blurry.
Overall verdict on the VP8 video format
Overall, VP8 appears to be significantly weaker than H.264 compression-wise. The primary weaknesses mentioned above are the lack of proper adaptive quantization, lack of B-frames, lack of an 8×8 transform, and non-adaptive loop filter. With this in mind, I expect VP8 to be more comparable to VC-1 or H.264 Baseline Profile than to H.264 Main or High Profile. Of course, this is still significantly better than Theora, and in my tests it beats Dirac quite handily as well.
Supposedly Google is open to improving the bitstream format — but this seems to conflict with the fact that they got so many different companies to announce VP8 support. The more software that supports a file format, the harder it is to change said format, so I’m dubious of any claim that we will be able to spend the next 6-12 months revising VP8. In short, it seems to have been released too early : it would have been better off to have an initial period during which revisions could be submitted and then a big announcement later when it’s completed.
Update : it seems that Google is not open to changing the spec : it is apparently “final”, complete with all its flaws.
In terms of decoding speed I’m not quite sure ; the current implementation appears to be about 16% slower than ffmpeg’s H.264 decoder (and thus probably about 25-35% slower than state-of-the-art decoders like CoreAVC). Of course, this doesn’t necessarily say too much about what a fully optimized implementation will reach, but the current one seems to be reasonably well-optimized and has SIMD assembly code for almost all major DSP functions, so I doubt it will get that much faster.
I would expect, with equally optimized implementations, VP8 and H.264 to be relatively comparable in terms of decoding speed. This, of course, is not really a plus for VP8 : H.264 has a great deal of hardware support, while VP8 largely has to rely on software decoders, so being “just as fast” is in many ways not good enough. By comparison, Theora decodes almost 35% faster than H.264 using ffmpeg’s decoder.
Finally, the problem of patents appears to be rearing its ugly head again. VP8 is simply way too similar to H.264 : a pithy, if slightly inaccurate, description of VP8 would be “H.264 Baseline Profile with a better entropy coder”. Even VC-1 differed more from H.264 than VP8 does, and even VC-1 didn’t manage to escape the clutches of software patents. It’s quite possible that VP8 has no patent issues, but until we get some hard evidence that VP8 is safe, I would be cautious. Since Google is not indemnifying users of VP8 from patent lawsuits, this is even more of a potential problem. Most importantly, Google has not released any justifications for why the various parts of VP8 do not violate patents, as Sun did with their OMS standard : such information would certainly cut down on speculation and make it more clear what their position actually is.
But if luck is on Google’s side and VP8 does pass through the patent gauntlet unscathed, it will undoubtedly be a major upgrade as compared to Theora.
Addendum A : On2′s VP8 Encoder and Decoder
This post is primarily aimed at discussing issues relating to the VP8 video format. But from a practical perspective, while software can be rewritten and improved, to someone looking to use VP8 in the near future, the quality (code-wise, compression-wise, and speed-wise) of the official VP8 encoder and decoder is more important than anything I’ve said above. Thus, after reading through most of the code, here are my thoughts on the software.
Initially I was intending to go easy on On2 here ; I assumed that this encoder was in fact new for VP8 and thus they wouldn’t necessarily have time to make the code high-quality and improve its algorithms. However, as I read through the encoder, it became clear that this was not at all true ; there were comments describing bugfixes dating as far back as early 2004. That’s right : this software is even older than x264 ! I’m guessing that the current VP8 software simply evolved from the original VP7 software. Anyways, this means that I’m not going to go easy on On2 ; they’ve had (at least) 6 years to work on VP8, and a much larger dev team than x264′s to boot.
Before I tear the encoder apart, keep in mind that it isn’t bad. In fact, compression-wise, I don’t think they’re going to be able to get it that much better using standard methods. I would guess that the encoder, on slowest settings, is within 5-10% of the maximum PSNR that they’ll ever get out of it. There’s definitely a whole lot more to be had using unusual algorithms like MB-tree, not to mention the complete lack of psy optimizations — but at what it tries to do, it does pretty decently. This is in contrast to the VP3 encoder, which was a pile of garbage (just ask any Theora dev).
Before I go into specific components, a general note on code quality. The code quality is much better than VP3, though there’s still tons of typos in the comments. They also appear to be using comments as a form of version control system, which is a bit bizarre. The assembly code is much worse, with staggering levels of copy-paste coding, some completely useless instructions that do nothing at all, unaligned loads/stores to what-should-be aligned data structures, and a few functions that are simply written in unfathomably roundabout (and slower) ways. While the C code isn’t half bad, the assembly is clearly written by retarded monkeys. But I’m being unfair : this is way better than with VP3.
Motion estimation : Diamond, hex, and exhaustive (full) searches available. All are pretty naively implemented : hexagon, for example, performs a staggering amount of redundant work (almost half of the locations it searches are repeated !). Full is even worse in terms of inefficiency, but it’s useless for all but placebo-level speeds, so I’m not really going to complain about that.
Subpixel motion estimation : Straightforward iterative diamond and square searches. Nothing particularly interesting here.
Quantization : Primary quantization has two modes : a fast mode and a slightly slower mode. The former is just straightforward deadzone quant, while the latter has a bias based on zero-run length (not quite sure how much this helps, but I like the idea). After this they have “coefficient optimization” with two modes. One mode simply tries moving each nonzero coefficient towards zero ; the slow mode tries all 2^16 possible DCT coefficient rounding permutations. Whoever wrote this needs to learn what trellis quantization (the dynamic programming solution to the problem) is and stop using exponential-time algorithms in encoders.
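To show why dynamic programming is the right tool, here is a toy comparison. Each coefficient has two candidate levels (round toward zero or one step further out), and the rate of a level depends only on whether the previous level was zero. The rate model, the lambda value, and every name below are assumptions made for the example; real trellis quantization uses the codec's actual entropy-coder costs.

```python
from itertools import product

def candidates(coef, step):
    """Two candidate levels: round toward zero, or one step further out."""
    sign = -1 if coef < 0 else 1
    low = abs(coef) // step
    return (sign * low, sign * (low + 1))

def rate(level, prev_level):
    """Toy bit cost (an assumption): zeros are cheap, cheaper still after a zero."""
    if level == 0:
        return 1 if prev_level == 0 else 2
    return 2 + 2 * abs(level)

def term(coef, level, prev_level, step, lam):
    err = coef - level * step
    return err * err + lam * rate(level, prev_level)

def exhaustive(coefs, step, lam):
    """Try every combination: 2^n evaluations."""
    best_cost, tried = None, 0
    for levels in product(*(candidates(c, step) for c in coefs)):
        tried += 1
        total, prev = 0, 0
        for c, lvl in zip(coefs, levels):
            total += term(c, lvl, prev, step, lam)
            prev = lvl
        if best_cost is None or total < best_cost:
            best_cost = total
    return best_cost, tried

def trellis(coefs, step, lam):
    """Keep only the cheapest path ending in each candidate level: about 4n evaluations."""
    paths = {0: (0, [])}
    for c in coefs:
        nxt = {}
        for lvl in candidates(c, step):
            nxt[lvl] = min(
                ((cost + term(c, lvl, prev, step, lam), path + [lvl])
                 for prev, (cost, path) in paths.items()),
                key=lambda t: t[0],
            )
        paths = nxt
    return min(paths.values(), key=lambda t: t[0])

coefs = [37, -22, 9, -6, 5, 3, -2, 1]
slow_cost, tried = exhaustive(coefs, step=8, lam=10)
fast_cost, fast_levels = trellis(coefs, step=8, lam=10)
```

Both searches reach the same minimum cost, but the exhaustive one evaluates 256 combinations for 8 coefficients and would need 65536 for 16, while the trellis grows linearly with the number of coefficients.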
Ratecontrol (frame type handling) : Relies on “boosting” the quality of golden frames and “alt-ref” frames — a concept I find extraordinarily dubious because it means that the video will periodically “jump” to a higher quality level, which looks utterly terrible in practice. You can see the effect in this graph of PSNR ; every dozen frames or so, the quality “jumps”. This cannot possibly look good in motion.
Ratecontrol (overall) : Relies on a purely reactive ratecontrol algorithm, which probably will not do very well in difficult situations such as hard-CBR and tight buffer constraints. Furthermore, it does no adaptation of the quantizer within the frame (e.g. in the case that the frame overshot the size limitations ratecontrol put on it). Instead, it relies on re-encoding the frame repeatedly to reach the target size — which in practice is simply not a usable option for two reasons. In low-latency situations where one can’t have a large delay, re-encoding repeatedly may send the encoder way behind time-wise. In any other situation, one can afford to use frame-based threading, a much faster algorithm for multithreaded encoding than the typical slice-based threading — which makes re-encoding impossible.
Loop filter : The encoder attempts to optimize the loop filter parameters for maximum PSNR. I’m not quite sure how good an idea this is ; every example I’ve seen of this with H.264 ends up creating very bad (often blurry) visual results.
Overall performance: Even on the absolute fastest settings with multithreading, their encoder is slow. On my 1.6GHz Core i7 it gets barely 26fps encoding 1080p; not even enough to reliably do real-time compression. x264, by comparison, gets 101fps at its fastest preset “ultrafast”. Now, sure, I don’t expect On2’s encoder to be anywhere near as fast as x264, but being unable to stream HD video on a modern quad-core system is simply not reasonable in 2010. Additionally, the speed options are extraordinarily confusing and counterintuitive and don’t always seem to work properly; for example, fast encoding mode (--rt) seems to be ignored completely in 2-pass.
Overall compression : As said before, compression-wise the encoder does a pretty good job with the spec that it’s given. The slower algorithms in the encoder are clearly horrifically unoptimized (see the comments on motion search and quantization in particular), but they still work.
Decoder : Seems to be straightforward enough. Nothing jumped out at me as particularly bad, slow, or otherwise, besides the code quality issues mentioned above.
Practical problems : The encoder and decoder share a staggering amount of code. This means that any bug in the common code will affect both, and thus won’t be spotted because it will affect them both in a matching fashion. This is the inherent problem with any file format that doesn’t have independent implementations and is defined by a piece of software instead of a spec : there are always bugs. RV40 had a hilarious example of this, where a typo of “22″ instead of “33″ resulted in quarter-pixel motion compensation being broken. Accordingly, I am very dubious of any file format defined by software instead of a specification. Google should wait until independent implementations have been created before setting the spec in stone.
Update: it seems that what I foresaw is already coming true:
<derf> gmaxwell: It survives it with a patch that causes artifacts because their encoder doesn’t clamp MVs properly.
<gmaxwell> ::cries::
<derf> So they reverted my decoder patch, instead of fixing the encoder.
<gmaxwell> “but we have many files encoded with this!”
<gmaxwell> so great.. single implementation and it depends on its own bugs.
This is just like Internet Explorer 6 all over again: bugs in the software become part of the “spec”!
Hard PSNR numbers:
(Source/target bitrate are the same as in my upcoming comparison.)
x264, slowest mode, High Profile: 29.76103 dB (28% better than VP8)
VP8, slowest mode: 28.37708 dB (8.5% better than x264 baseline)
x264, slowest mode, Baseline Profile: 27.95594 dB
Note that these numbers are a “best-case” situation: we’re testing all three optimized for PSNR, which is what the current VP8 encoder specializes in as well. This is not too different from my expectations above as estimated from the spec itself; it’s relatively close to x264’s Baseline Profile.
Keep in mind that this is not representative of what you can get out of VP8 now, but rather what could be gotten out of VP8. PSNR is meaningless for real-world encoding — what matters is visual quality — so hopefully if problems like the adaptive quantization issue mentioned previously can be overcome, the VP8 encoder could be improved to have x264-level psy optimizations. However, as things stand…
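For readers unfamiliar with the metric, PSNR for 8-bit video is just a logarithmic rescaling of mean squared error. A minimal sketch over flat lists of pixel values, with illustrative names and sample data:

```python
import math

def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak * peak / mse)

off_by_one = psnr([100, 101, 102, 103], [101, 102, 103, 104])
```

Because it only measures squared error per pixel, an encoder can raise PSNR by blurring away detail that viewers would have preferred to keep, which is the gap that psy optimizations exist to close.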
Visual results: Unfortunately, since the current VP8 encoder optimizes entirely for PSNR, the visual results are less than impressive. Here’s a sampling of how it compares with some other encoders. Source and bitrate are the same as above; all encoders are tuned for the best visual quality wherever possible. And apparently given some of the responses to this part, many people cannot actually read; the bitrate is (as close as possible to) the same on all of these files.
Update : I got completely slashdotted and my few hundred gigs of bandwidth ran out in mere hours. The images below have been rehosted, so if you’ve pasted the link somewhere else, check below for the new one.
VP8 (On2 VP8 rc8) (source) (Note : I recently realized that the official encoder doesn’t output MKV, so despite the name, this file is actually a VP8 bitstream wrapped in IVF, as generated by ivfenc. Decode it with ivfdec.)
H.264 (Recent x264) (source)
H.264 Baseline Profile (Recent x264) (source)
Theora (Recent Ptalarbvorm nightly) (source)
Dirac (Schroedinger 1.0.9) (source)
VC-1 (Microsoft VC-1 SDK) (source)
MPEG-4 ASP (Xvid 1.2.2) (source)
The quality generated by On2’s VP8 encoder will probably not improve significantly without serious psy optimizations.
One further note about the encoder: currently it will drop frames by default, which is incredibly aggravating and may cause serious problems. I strongly suggest that anyone using it turn the frame-dropping feature off in the options.
Addendum B : Google’s choice of container and audio format for HTML5
Google has chosen Matroska for their container format. This isn’t particularly surprising : Matroska is one of the most widely used “modern” container formats and is in many ways best-suited to the task. MP4 (aka ISOmedia) is probably a better-designed format, but is not very flexible ; while in theory it can stick anything in a private stream, a standardization process is technically necessary to “officially” support any new video or audio formats. Patents are probably a non-issue ; the MP4 patent pool was recently disbanded, largely because nobody used any of the features that were patented.
Another advantage of Matroska is that it can be used for streaming video: while it typically isn’t, the spec allows it. Note that I do not mean progressive download (à la YouTube), but rather actual streaming, where the encoder is working in real-time. The only way to do this with MP4 is by sending “segments” of video, a very hacky approach in which one is effectively sending a bunch of small MP4 files in sequence. This approach is used by Microsoft’s Silverlight “Smooth Streaming”. Not only is this an ugly hack, but it’s unsuitable for low-latency video. This kind of hack is unnecessary for Matroska. One possible problem is that since almost nobody currently uses Matroska for live streaming purposes, very few existing Matroska implementations support what is necessary to play streamed Matroska files.
I’m not quite sure why Google chose to rebrand Matroska; “WebM” is a silly name and Matroska is already pretty well-recognized as a brand.
The choice of Vorbis for audio is practically a no-brainer. Even ignoring the issue of patents, libvorbis is still the best general-purpose open source audio encoder. While AAC is generally better at very low bitrates, there aren’t any good open source AAC encoders: faac is worse than LAME and ffmpeg’s AAC encoder is even worse. Furthermore, faac is not free software; it contains code from the non-free reference encoder. Combined with the patent issue, nobody expected Google to pick anything else.
Addendum C: Summary for the lazy
VP8, as a spec, should be a bit better than H.264 Baseline Profile and VC-1. It’s not even close to competitive with H.264 Main or High Profile. If Google is willing to revise the spec, this can probably be improved.
VP8, as an encoder, is somewhere between Xvid and Microsoft’s VC-1 in terms of visual quality. This can definitely be improved a lot.
VP8, as a decoder, decodes even slower than ffmpeg’s H.264. This probably can’t be improved that much; VP8 as a whole is similar in complexity to H.264.
With regard to patents, VP8 copies too much from H.264 for comfort, no matter whose word is behind the claim of being patent-free. This doesn’t mean that it’s sure to be covered by patents, but until Google can give us evidence as to why it isn’t, I would be cautious.
VP8 is definitely better compression-wise than Theora and Dirac, so if its claim to being patent-free does stand up, it’s a big upgrade with regard to patent-free video formats.
VP8 is not ready for prime time; the spec is a pile of copy-pasted C code and the encoder’s interface is lacking in features and buggy. They aren’t even ready to finalize the bitstream format, let alone switch the world over to VP8.
With the lack of a real spec, the VP8 software basically is the spec, and with the spec being “final”, any bugs are now set in stone. Such bugs have already been found and Google has rejected fixes.
Google made the right decision to pick Matroska and Vorbis for its HTML5 video proposal.