
Other articles (78)

  • Websites made with MediaSPIP

    2 May 2011

    This page lists some websites based on MediaSPIP.

  • MediaSPIP v0.2

    21 June 2013

    MediaSPIP 0.2 is the first stable version of MediaSPIP.
    Its official release date is 21 June 2013, as announced here.
    The zip file provided here contains only the MediaSPIP sources, as a standalone version.
    As with the previous version, all of the software dependencies must be installed manually on the server.
    If you want to use this archive for a farm-mode installation, you will also need to make further modifications (...)

  • Creating farms of unique websites

    13 April 2011

    MediaSPIP platforms can be installed as a farm, with a single "core" hosted on a dedicated server and used by multiple websites.
    This allows (among other things): implementation costs to be shared between several different projects / individuals; rapid deployment of multiple unique sites; creation of groups of like-minded sites, making it possible to browse media in a more controlled and selective environment than the major "open" (...)

On other sites (13657)

  • Decoding audio via Android using FFMpeg

    31 October 2013, by Rob Haupt

    I can play Wav files using the below code without issues. When trying to play the exact same media in Mp3 format I only get garbled junk. I believe I am fundamentally misunderstanding how the avcodec_decode_audio3 function works.

    Since the Wav file contains PCM data when it is decoded it can go straight to the AudioTrack.write function. There must be some additional step to get Mp3s to work like this. I don't know what I'm missing, but I've been pulling my hair out for a week now.

    Java Code

    package com.rohaupt.RRD2;

    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;

    import android.app.Activity;
    import android.media.AudioFormat;
    import android.media.AudioManager;
    import android.media.AudioTrack;
    import android.media.MediaPlayer;
    import android.os.Bundle;
    import android.os.SystemClock;

    public class player extends Activity
    {
       private AudioTrack track;
       private FileOutputStream os;
       /** Called when the activity is first created. */
       @Override
       public void onCreate(Bundle savedInstanceState)
       {
           super.onCreate(savedInstanceState);
           setContentView(R.layout.main);
           createEngine();

           MediaPlayer mp = new MediaPlayer();
           mp.start();

           int bufSize = AudioTrack.getMinBufferSize(32000,
                                                     AudioFormat.CHANNEL_CONFIGURATION_STEREO,
                                                     AudioFormat.ENCODING_PCM_16BIT);


           track = new AudioTrack(AudioManager.STREAM_MUSIC,
                                  32000,
                                  AudioFormat.CHANNEL_CONFIGURATION_STEREO,
                                  AudioFormat.ENCODING_PCM_16BIT,
                                  bufSize,
                                  AudioTrack.MODE_STREAM);

           byte[] bytes = new byte[bufSize];

           try {
               os = new FileOutputStream("/sdcard/a.out",false);
           } catch (FileNotFoundException e) {
               // TODO Auto-generated catch block
               e.printStackTrace();
           }

           String result = loadFile("/sdcard/a.mp3",bytes);

           try {
               os.close();
           } catch (IOException e) {
               // TODO Auto-generated catch block
               e.printStackTrace();
           }
       }

       void playSound(byte[] buf, int size) {  
           //android.util.Log.v("ROHAUPT", "RAH Playing");
           if(track.getPlayState()!=AudioTrack.PLAYSTATE_PLAYING)
               track.play();
           track.write(buf, 0, size);

           try {
               os.write(buf,0,size);
           } catch (IOException e) {
               // TODO Auto-generated catch block
               e.printStackTrace();
           }
       }


       private native void createEngine();
       private native String loadFile(String file, byte[] array);

       /** Load jni .so on initialization*/
       static {
            System.loadLibrary("avutil");
            System.loadLibrary("avcore");
            System.loadLibrary("avcodec");
            System.loadLibrary("avformat");
            System.loadLibrary("avdevice");
            System.loadLibrary("swscale");
            System.loadLibrary("avfilter");
            System.loadLibrary("ffmpeg");
       }
    }

    C Code

    #include <jni.h>
    #include <string.h>
    #include <stdlib.h>
    #include <android/log.h>

    #include "libavcodec/avcodec.h"
    #include "libavformat/avformat.h"

    #define DEBUG_TAG "ROHAUPT"  

    void Java_com_rohaupt_RRD2_player_createEngine(JNIEnv* env, jclass clazz)
       {
           avcodec_init();

           av_register_all();


       }

       jstring Java_com_rohaupt_RRD2_player_loadFile(JNIEnv* env, jobject obj,jstring file,jbyteArray array)
       {  
           jboolean            isCopy;  
           int                 i;
           int                 audioStream=-1;
           int                 res;
           int                 decoded = 0;
           int                 out_size;
           AVFormatContext     *pFormatCtx;
           AVCodecContext      *aCodecCtx;
           AVCodecContext      *c= NULL;
           AVCodec             *aCodec;
           AVPacket            packet;
           jclass              cls = (*env)->GetObjectClass(env, obj);
           jmethodID           play = (*env)->GetMethodID(env, cls, "playSound", "([BI)V"); // at the beginning of your main function
           const char *        szfile = (*env)->GetStringUTFChars(env, file, &isCopy);
           int16_t *           pAudioBuffer = (int16_t *) av_malloc (AVCODEC_MAX_AUDIO_FRAME_SIZE*2+FF_INPUT_BUFFER_PADDING_SIZE);
           int16_t *           outBuffer = (int16_t *) av_malloc (AVCODEC_MAX_AUDIO_FRAME_SIZE*2+FF_INPUT_BUFFER_PADDING_SIZE);


           __android_log_print(ANDROID_LOG_INFO, DEBUG_TAG, "RAH28 Starting");
           res = av_open_input_file(&pFormatCtx, szfile, NULL, 0, NULL);
           if(res!=0)
           {
               __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH opening input failed with result: [%d]", res);
               return file;
           }

           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH getting stream info");
           res = av_find_stream_info(pFormatCtx);
           if(res < 0)
           {
               __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH getting stream info failed with result: [%d]", res);
               return file;
           }

           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH getting audio stream");
           for(i=0; i < pFormatCtx->nb_streams; i++) {
             if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_AUDIO &&
                audioStream < 0) {
               audioStream=i;
             }
           }


           if(audioStream==-1)
           {
               __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH couldn&#39;t find audio stream");
               return file;
           }
           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH audio stream found with result: [%d]", res);


           aCodecCtx=pFormatCtx->streams[audioStream]->codec;
           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH audio codec info loaded");

           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH audio codec info [%d]", aCodecCtx->codec_id);

           aCodec = avcodec_find_decoder(aCodecCtx->codec_id);
           if(!aCodec) {
              __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH audio codec unsupported");
              return file;
           }
           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH audio codec info found");


           res = avcodec_open(aCodecCtx, aCodec);
           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH audio codec loaded [%d] [%d]",aCodecCtx->sample_fmt,res);

           //c=avcodec_alloc_context();
           av_init_packet(&packet);


           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH channels [%d] sample rate [%d] sample format [%d]",aCodecCtx->channels,aCodecCtx->sample_rate,aCodecCtx->sample_fmt);


           int x,y;
           x=0;y=0;
           while (av_read_frame(pFormatCtx, &packet) >= 0) {
               __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH frame read: [%d] [%d]",x++,y);

               if (aCodecCtx->codec_type == AVMEDIA_TYPE_AUDIO) {
                           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH audio ready");
                           int data_size = AVCODEC_MAX_AUDIO_FRAME_SIZE*2+FF_INPUT_BUFFER_PADDING_SIZE;
                           int size=packet.size;
                           y=0;
                           decoded = 0;
                           __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH packet size: [%d]", size);
                           while(size > 0) {

                                   __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH decoding: [%d] [%d]",x,y++);
                                   int len = avcodec_decode_audio3(aCodecCtx, pAudioBuffer, &data_size, &packet);



                                   __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH 1 size [%d] len [%d] data_size [%d] out_size [%d]",size,len,data_size,out_size);
                                   jbyte *bytes = (*env)->GetByteArrayElements(env, array, NULL);

                                   memcpy(bytes + decoded, pAudioBuffer, len);


                                   __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH 2");

                                   (*env)->ReleaseByteArrayElements(env, array, bytes, 0);
                                   __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH 3");

                                   (*env)->CallVoidMethod(env, obj, play, array, len);

                                   __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH 4");


                                   size -= len;
                                   decoded += len;

                          }
                          av_free_packet(&packet);
               }

        }

             // Close the video file
           av_close_input_file(pFormatCtx);

           //__android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "RAH Finished Running result: [%d]", res);
           (*env)->ReleaseStringUTFChars(env, file, szfile);  
           return file;  
       }

    To add some detail: when calling this function with a Wav file, I get the following log data.

    I/ROHAUPT (  227): RAH28 Starting
    D/ROHAUPT (  227): RAH getting stream info
    D/ROHAUPT (  227): RAH getting audio stream
    D/ROHAUPT (  227): RAH audio stream found with result: [0]
    D/ROHAUPT (  227): RAH audio codec info loaded
    D/ROHAUPT (  227): RAH audio codec info [65536]
    D/ROHAUPT (  227): RAH audio codec info found
    D/ROHAUPT (  227): RAH audio codec loaded [1] [0]
    D/ROHAUPT (  227): RAH channels [2] sample rate [32000] sample format [1]
    D/ROHAUPT (  227): RAH frame read: [0] [0]
    D/ROHAUPT (  227): RAH audio ready
    D/ROHAUPT (  227): RAH packet size: [4096]
    D/ROHAUPT (  227): RAH decoding: [1] [0]
    D/ROHAUPT (  227): RAH 1 size [4096] len [4096] data_size [4096] out_size [0]
    D/ROHAUPT (  227): RAH 2
    D/ROHAUPT (  227): RAH 3
    D/ROHAUPT (  227): RAH 4
    D/ROHAUPT (  227): RAH frame read: [1] [1]
    D/ROHAUPT (  227): RAH audio ready
    ...
    D/ROHAUPT (  227): RAH frame read: [924] [1]
    D/ROHAUPT (  227): RAH audio ready
    D/ROHAUPT (  227): RAH packet size: [4096]
    D/ROHAUPT (  227): RAH decoding: [925] [0]
    D/ROHAUPT (  227): RAH 1 size [4096] len [4096] data_size [4096] out_size [0]
    D/ROHAUPT (  227): RAH 2
    D/ROHAUPT (  227): RAH 3
    D/ROHAUPT (  227): RAH 4
    D/ROHAUPT (  227): RAH frame read: [925] [1]
    D/ROHAUPT (  227): RAH audio ready
    D/ROHAUPT (  227): RAH packet size: [3584]
    D/ROHAUPT (  227): RAH decoding: [926] [0]
    D/ROHAUPT (  227): RAH 1 size [3584] len [3584] data_size [3584] out_size [0]
    D/ROHAUPT (  227): RAH 2
    D/ROHAUPT (  227): RAH 3
    D/ROHAUPT (  227): RAH 4

    When calling with an Mp3 file I get the following

    I/ROHAUPT (  280): RAH28 Starting
    D/ROHAUPT (  280): RAH getting stream info
    D/ROHAUPT (  280): RAH getting audio stream
    D/ROHAUPT (  280): RAH audio stream found with result: [0]
    D/ROHAUPT (  280): RAH audio codec info loaded
    D/ROHAUPT (  280): RAH audio codec info [86017]
    D/ROHAUPT (  280): RAH audio codec info found
    D/ROHAUPT (  280): RAH audio codec loaded [1] [0]
    D/ROHAUPT (  280): RAH channels [2] sample rate [32000] sample format [1]
    D/ROHAUPT (  280): RAH frame read: [0] [0]
    D/ROHAUPT (  280): RAH audio ready
    D/ROHAUPT (  280): RAH packet size: [432]
    D/ROHAUPT (  280): RAH decoding: [1] [0]
    D/ROHAUPT (  280): RAH 1 size [432] len [432] data_size [4608] out_size [0]
    D/ROHAUPT (  280): RAH 2
    ...
    D/ROHAUPT (  280): RAH frame read: [822] [1]
    D/ROHAUPT (  280): RAH audio ready
    D/ROHAUPT (  280): RAH packet size: [432]
    D/ROHAUPT (  280): RAH decoding: [823] [0]
    D/ROHAUPT (  280): RAH 1 size [432] len [432] data_size [4608] out_size [0]
    D/ROHAUPT (  280): RAH 2
    D/ROHAUPT (  280): RAH 3
    D/ROHAUPT (  280): RAH 4
    D/ROHAUPT (  280): RAH frame read: [823] [1]
    D/ROHAUPT (  280): RAH audio ready
    D/ROHAUPT (  280): RAH packet size: [432]
    D/ROHAUPT (  280): RAH decoding: [824] [0]
    D/ROHAUPT (  280): RAH 1 size [432] len [432] data_size [4608] out_size [0]
    D/ROHAUPT (  280): RAH 2
    D/ROHAUPT (  280): RAH 3
    D/ROHAUPT (  280): RAH 4
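
    The two logs above point at the mismatch: with the Wav file, len always equals data_size, but with the Mp3, avcodec_decode_audio3 consumes len = 432 compressed bytes while writing data_size = 4608 bytes of PCM into pAudioBuffer. The code copies and plays len bytes, so most of every decoded frame never reaches AudioTrack, hence the garbled output. Here is a sketch of a corrected inner loop (same deprecated FFmpeg API as above; note that the Java byte[] must also hold at least data_size bytes, which the minimum AudioTrack buffer size does not guarantee):

    uint8_t *pkt_data = packet.data;   /* keep the original pointer for av_free_packet */
    int      pkt_size = packet.size;
    while (pkt_size > 0) {
        AVPacket pkt_tmp = packet;     /* shallow copy pointing at the unconsumed bytes */
        pkt_tmp.data = pkt_data;
        pkt_tmp.size = pkt_size;
        /* data_size is in/out: buffer capacity on entry, decoded bytes on
           return, so it must be reset before every call */
        int data_size = AVCODEC_MAX_AUDIO_FRAME_SIZE * 2 + FF_INPUT_BUFFER_PADDING_SIZE;
        int len = avcodec_decode_audio3(aCodecCtx, pAudioBuffer, &data_size, &pkt_tmp);
        if (len < 0)
            break;                     /* decode error: drop the rest of this packet */
        if (data_size > 0) {
            /* hand data_size bytes of decoded PCM to Java, not len input bytes */
            jbyte *bytes = (*env)->GetByteArrayElements(env, array, NULL);
            memcpy(bytes, pAudioBuffer, data_size);
            (*env)->ReleaseByteArrayElements(env, array, bytes, 0);
            (*env)->CallVoidMethod(env, obj, play, array, data_size);
        }
        pkt_data += len;               /* advance past the consumed input */
        pkt_size -= len;
    }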
  • Tour of Part of the VP8 Process

    18 November 2010, by Multimedia Mike — VP8

    My toy VP8 encoder outputs a lot of textual data to illustrate exactly what it’s doing. For those who may not be exactly clear on how this or related algorithms operate, this may prove illuminating.

    Let’s look at subblock 0 of macroblock 0 of a luma plane:

     subblock 0 (original)
      92  91  89  86
      91  90  88  86
      89  89  89  88
      89  87  88  93
    

    Since it’s in the top-left corner of the image to be encoded, the phantom samples above and to the left are implicitly 128 for the purpose of intra prediction (in the VP8 algorithm).

     subblock 0 (original, with phantom samples)
         128 128 128 128
     128  92  91  89  86
     128  91  90  88  86
     128  89  89  89  88
     128  89  87  88  93
    


    Using the 4×4 DC prediction mode means averaging the 4 top predictors and 4 left predictors. So, the predictor is 128. Subtract this from each element of the subblock:

     subblock 0, predictor removed
     -36 -37 -39 -42
     -37 -38 -40 -42
     -39 -39 -39 -40
     -39 -41 -40 -35
    

    Next, run the subblock through the forward transform:

     subblock 0, transformed
     -312   7   1   0
        1  12  -5   2
        2  -3   3  -1
        1   0  -2   1
    

    Quantize (integer divide) each element; the DC (first element) and AC (rest of the elements) quantizers are both 4:

     subblock 0, quantized
     -78   1   0   0
       0   3  -1   0
       0   0   0   0
       0   0   0   0
    

    The above block contains the coefficients that are actually transmitted (zigzagged and entropy-encoded) through the bitstream and decoded on the other end.

    The decoding process looks something like this– after the same coefficients are decoded and rearranged, they are dequantized (multiplied) by the original quantizers:

     subblock 0, dequantized
     -312   4   0   0
        0  12  -4   0
        0   0   0   0
        0   0   0   0
    

    Note that these coefficients are not exactly the same as the original, pre-quantized coefficients. This is a large part of where the “lossy” in “lossy video compression” comes from.
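
    In code, the round trip is just an integer divide followed by a multiply. Here is a sketch matching the arithmetic above (the real codec's quantizer selection and rounding are more involved):

     /* Quantize then dequantize a 4x4 block of transform coefficients.
        Element 0 is the DC coefficient, the rest are AC. C's division
        truncates toward zero, which is how 7/4 = 1 and -5/4 = -1 above. */
     void quant_dequant_4x4(int coeff[16], int dc_quant, int ac_quant)
     {
         int i;
         for (i = 0; i < 16; i++) {
             int q     = (i == 0) ? dc_quant : ac_quant;
             int level = coeff[i] / q;   /* what gets entropy-coded */
             coeff[i]  = level * q;      /* what the decoder recovers */
         }
     }

    With both quantizers at 4, -312 quantizes to -78 and is recovered exactly, while 7 quantizes to 1 and comes back as 4; that drift is the loss visible above.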

    Next, the decoder generates a base predictor subblock. In this case, it’s all 128 (DC prediction for the top-left subblock):

     subblock 0, predictor
      128 128 128 128
      128 128 128 128
      128 128 128 128
      128 128 128 128
    

    Finally, the dequantized coefficients are shoved through the inverse transform and added to the base predictor block:

     subblock 0, reconstructed
      91  91  89  85
      90  90  89  87
      89  88  89  90
      88  88  89  92
    

    Again, not exactly the same as the original block, but an incredible facsimile thereof.
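
    That final step is simple enough to show in full. A sketch, assuming residual[] holds the 16 inverse-transformed values in raster order:

     /* Reconstruct a 4x4 subblock: add the residual to the predictor and
        clamp to the 8-bit pixel range. */
     void reconstruct_4x4(unsigned char recon[16], const unsigned char pred[16],
                          const int residual[16])
     {
         int i;
         for (i = 0; i < 16; i++) {
             int v = pred[i] + residual[i];
             recon[i] = v < 0 ? 0 : (v > 255 ? 255 : v);
         }
     }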

    Note that this decoding-after-encoding demonstration is not merely pedagogical– the encoder has to decode the subblock because the encoding of successive subblocks may depend on this subblock. The encoder can’t rely on the original representation of the subblock because the decoder won’t have that– it will have the reconstructed block.

    For example, here’s the next subblock:

     subblock 1 (original)
      84  84  87  90
      85  85  86  93
      86  83  83  89
      91  85  84  87
    

    Let’s assume DC prediction once more. The 4 top predictors are still all 128 since this subblock lies along the top row. However, the 4 left predictors are the right edge of the subblock reconstructed in the previous example:

     subblock 1 (original, with predictors)
        128 128 128 128
     85  84  84  87  90
     87  85  85  86  93
     90  86  83  83  89
     92  91  85  84  87
    

    The DC predictor is computed as (128 + 128 + 128 + 128 + 85 + 87 + 90 + 92 + 4) / 8 = 108 (the extra +4 is for rounding considerations). (Note that in this case, using the original subblock’s right edge would also have resulted in 108, but that’s beside the point.)
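
    That computation is compact in code. Here is a sketch of the case where both edges are available (VP8 handles missing edges separately, e.g. the all-128 phantom samples used for subblock 0):

     /* DC predictor for a 4x4 subblock: average the 4 samples above and the
        4 reconstructed samples to the left, with +4 for rounding. */
     int dc_predict_4x4(const unsigned char top[4], const unsigned char left[4])
     {
         int i, sum = 4;               /* rounding term */
         for (i = 0; i < 4; i++)
             sum += top[i] + left[i];
         return sum >> 3;              /* divide by 8 */
     }

    Plugging in the numbers above: (512 + 354 + 4) >> 3 = 870 >> 3 = 108.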

    Continuing through the same process as in subblock 0:

     subblock 1, predictor removed
     -24 -24 -21 -18
     -23 -23 -22 -15
     -22 -25 -25 -19
     -17 -23 -24 -21
    

     subblock 1, transformed
     -173  -9  14  -1
        2 -11  -4   0
        1   6  -2   3
       -5   1   0   1

     subblock 1, quantized
      -43  -2   3   0
        0  -2  -1   0
        0   1   0   0
       -1   0   0   0

     subblock 1, dequantized
     -172  -8  12   0
        0  -8  -4   0
        0   4   0   0
       -4   0   0   0

     subblock 1, predictor
      108 108 108 108
      108 108 108 108
      108 108 108 108
      108 108 108 108

     subblock 1, reconstructed
       84  84  87  89
       86  85  87  91
       86  83  84  89
       90  85  84  88

    I hope this concrete example (straight from a working codec) clarifies this part of the VP8 process.

  • Anatomy of an optimization: H.264 deblocking

    26 May 2010, by Dark Shikari — H.264, assembly, development, speed, x264

    As mentioned in the previous post, H.264 has an adaptive deblocking filter. But what exactly does that mean — and more importantly, what does it mean for performance? And how can we make it as fast as possible? In this post I’ll try to answer these questions, particularly in relation to my recent deblocking optimizations in x264.

    H.264’s deblocking filter has two steps: strength calculation and the actual filter. The first step calculates the parameters for the second step. The filter runs on all the edges in each macroblock. That’s 4 vertical edges of length 16 pixels and 4 horizontal edges of length 16 pixels. The vertical edges are filtered first, from left to right, then the horizontal edges, from top to bottom (order matters!). The leftmost edge is the one between the current macroblock and the left macroblock, while the topmost edge is the one between the current macroblock and the top macroblock.

    Here’s the formula for the strength calculation in progressive mode. The highest strength that applies is always selected (a code sketch follows the list).

    If we’re on the edge between an intra macroblock and any other macroblock: Strength 4
    If we’re on an internal edge of an intra macroblock: Strength 3
    If either side of a 4-pixel-long edge has residual data: Strength 2
    If the motion vectors on opposite sides of a 4-pixel-long edge are at least a pixel apart (in either x or y direction) or the reference frames aren’t the same: Strength 1
    Otherwise: Strength 0 (no deblocking)
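
    In code form, the cascade looks something like this; the predicate arguments are hypothetical stand-ins for the real checks, not x264’s actual implementation:

    /* Strength for one 4-pixel edge segment. */
    int edge_strength(int either_mb_intra, int on_mb_boundary,
                      int has_residual,    /* nonzero coefficients on either side */
                      int mv_diff_large,   /* MVs >= 1 pixel apart in x or y */
                      int refs_differ)     /* different reference frames */
    {
        if (either_mb_intra && on_mb_boundary) return 4;
        if (either_mb_intra)                   return 3; /* internal intra edge */
        if (has_residual)                      return 2;
        if (mv_diff_large || refs_differ)      return 1;
        return 0;                                        /* no deblocking */
    }

    The early returns also show the overriding at work: once residual data forces Strength 2, the motion vector comparison never runs.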

    These values are then thrown into a lookup table depending on the quantizer: higher quantizers have stronger deblocking. Then the actual filter is run with the appropriate parameters. Note that Strength 4 is actually a special deblocking mode that performs a much stronger filter and affects more pixels.

    One can see somewhat intuitively why these strengths are chosen. The deblocker exists to get rid of sharp edges caused by the block-based nature of H.264, and so the strength depends on what exists that might cause such sharp edges. The strength calculation is a way to use existing data from the video stream to make better decisions during the deblocking process, improving compression and quality.

    Both the strength calculation and the actual filter (not described here) are very complex if naively implemented. The latter can be SIMD’d without too much difficulty; no H.264 decoder can achieve reasonable performance without that. But what about optimizing the strength calculation? A quick analysis shows that this can be beneficial as well.

    Since we have to check both horizontal and vertical edges, we have to check up to 32 pairs of coefficient counts (for residual), 16 pairs of reference frame indices, and 128 motion vector values (counting x and y as separate values). This is a lot of calculation; a naive implementation can take 500-1000 clock cycles on a modern CPU. Of course, there are a lot of shortcuts we can take. Here are some examples:

    • If the macroblock uses the 8×8 transform, we only need to check 2 edges in each direction instead of 4, because we don’t deblock inside of the 8×8 blocks.
    • If the macroblock is a P-skip, we only have to check the first edge in each direction, since there’s guaranteed to be no motion vector differences, reference frame differences, or residual inside of the macroblock.
    • If the macroblock has no residual at all, we can skip that check.
    • If we know the partition type of the macroblock, we can do motion vector checks only along the edges of the partitions.
    • If the effective quantizer is so low that no deblocking would be performed no matter what, don’t bother calculating the strength.

    But even all of this doesn’t save us from ourselves. We still have to iterate over a ton of edges, checking each one. Stuff like the partition-checking logic greatly complicates the code and adds overhead even as it reduces the number of checks. And in many cases decoupling the checks to add such logic will make it slower: if the checks are coupled, we can avoid doing a motion vector check if there’s residual, since Strength 2 overrides Strength 1.

    But wait. What if we could do this in SIMD, just like the actual loopfilter itself? Sure, it seems more of a problem for C code than assembly, but there aren’t any obvious things in the way. Many years ago, Loren Merritt (pengvado) wrote the first SIMD implementation that I know of (for ffmpeg’s decoder); it is quite fast, so I decided to work on porting the idea to x264 to see if we could eke out a bit more speed here as well.

    Before I go over what I had to do to make this change, let me first describe how deblocking is implemented in x264. Since the filter is a loopfilter, it acts “in loop” and must be done in both the encoder and decoder — hence why x264 has it too, not just decoders. At the end of encoding one row of macroblocks, x264 goes back and deblocks the row, then performs half-pixel interpolation for use in encoding the next frame.

    We do it per-row for reasons of cache coherency: deblocking accesses a lot of pixels and a lot of code that wouldn’t otherwise be used, so it’s more efficient to do it in a single pass as opposed to deblocking each macroblock immediately after encoding. Then half-pixel interpolation can immediately re-use the resulting data.

    Now to the change. First, I modified deblocking to implement a subset of the macroblock_cache_load function: spend an extra bit of effort loading the necessary data into a data structure which is much simpler to address — as an assembly implementation would need (x264_macroblock_cache_load_deblock). Then I massively cleaned up deblocking to move all of the core strength-calculation logic into a single, small function that could be converted to assembly (deblock_strength_c). Finally, I wrote the assembly functions and worked with Loren to optimize them. Here’s the result.

    And the timings for the resulting assembly function on my Core i7, in cycles:

    deblock_strength_c: 309
    deblock_strength_mmx: 79
    deblock_strength_sse2: 37
    deblock_strength_ssse3: 33

    Now that is a seriously nice improvement. 33 cycles on average to perform that many comparisons–that’s absurdly low, especially considering the SIMD takes no branchy shortcuts: it always checks every single edge! I walked over to my performance chart and happily crossed off a box.

    But I had a hunch that I could do better. Remember, as mentioned earlier, we’re reloading all that data back into our data structures in order to address it. This isn’t that slow, but takes enough time to significantly cut down on the gain of the assembly code. And worse, less than a row ago, all this data was in the correct place to be used (when we just finished encoding the macroblock)! But if we did the deblocking right after encoding each macroblock, the cache issues would make it too slow to be worth it (yes, I tested this). So I went back to other things, a bit annoyed that I couldn’t get the full benefit of the changes.

    Then, yesterday, I was talking with Pascal, a former Xvid dev and current video hacker over at Google, about various possible x264 optimizations. He had seen my deblocking changes and we discussed that a bit as well. Then two lines hit me like a pile of bricks:

    <_skal_> tried computing the strength at least?
    <_skal_> while it’s fresh

    Why hadn’t I thought of that? Do the strength calculation immediately after encoding each macroblock, save the result, and then go pick it up later for the main deblocking filter. Then we can use the data right there and then for strength calculation, but we don’t have to do the whole deblock process until later.
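
    Sketched out, the arrangement looks like this (all names hypothetical, not x264’s actual API):

    extern int  mbs_per_row;
    extern void encode_macroblock(int mb);
    extern int  calc_edge_strengths(int mb);  /* cheap while MB data is still in cache */
    extern void deblock_macroblock(int mb, int strengths);
    extern void interpolate_halfpel_row(void);

    void encode_and_deblock_row(int *strength_cache)
    {
        int mb;
        /* Phase 1: encode, computing strengths while the data is hot. */
        for (mb = 0; mb < mbs_per_row; mb++) {
            encode_macroblock(mb);
            strength_cache[mb] = calc_edge_strengths(mb);
        }
        /* Phase 2: one cache-friendly deblocking pass over the row, then
           half-pixel interpolation can reuse the filtered pixels. */
        for (mb = 0; mb < mbs_per_row; mb++)
            deblock_macroblock(mb, strength_cache[mb]);
        interpolate_halfpel_row();
    }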

    I went and implemented it and, after working my way through a horde of bugs, eventually got a working implementation. A big catch was that of slices: deblocking normally acts between slices even though normal encoding does not, so I had to perform extra munging to get that to work. By midday today I was able to cross yet another box off on the performance chart. And now it’s committed.

    Sometimes chatting for 10 minutes with another developer is enough to spot the idea that your brain somehow managed to miss for nearly a straight week.

    NB: the performance chart is for a specific test clip at a specific set of settings (super fast settings) relevant to the company I work at, so it is neither accurate nor complete for, say, default settings.

    Update: Here’s a higher resolution version of the current chart, as requested in the comments.