Recherche avancée

Médias (1)

Mot : - Tags -/école

Autres articles (42)

  • Personnaliser en ajoutant son logo, sa bannière ou son image de fond

    5 septembre 2013, par

    Certains thèmes prennent en compte trois éléments de personnalisation : l’ajout d’un logo ; l’ajout d’une bannière l’ajout d’une image de fond ;

  • Les tâches Cron régulières de la ferme

    1er décembre 2010, par

    La gestion de la ferme passe par l’exécution à intervalle régulier de plusieurs tâches répétitives dites Cron.
    Le super Cron (gestion_mutu_super_cron)
    Cette tâche, planifiée chaque minute, a pour simple effet d’appeler le Cron de l’ensemble des instances de la mutualisation régulièrement. Couplée avec un Cron système sur le site central de la mutualisation, cela permet de simplement générer des visites régulières sur les différents sites et éviter que les tâches des sites peu visités soient trop (...)

  • Ecrire une actualité

    21 juin 2013, par

    Présentez les changements dans votre MédiaSPIP ou les actualités de vos projets sur votre MédiaSPIP grâce à la rubrique actualités.
    Dans le thème par défaut spipeo de MédiaSPIP, les actualités sont affichées en bas de la page principale sous les éditoriaux.
    Vous pouvez personnaliser le formulaire de création d’une actualité.
    Formulaire de création d’une actualité Dans le cas d’un document de type actualité, les champs proposés par défaut sont : Date de publication ( personnaliser la date de publication ) (...)

Sur d’autres sites (5526)

  • Revision 16738 : Patch by Chris Pearce to tweak indexing in ffmpeg2theora to make it more ...

    27 novembre 2009, par j — Log

    Patch by Chris Pearce to tweak indexing in ffmpeg2theora to make it more accurate. Fixes : 1. Use ogg_int64_t to calculate and store presentation times of frames, rather than transitioning through double just to cast back to ogg_int64_t. Prevents floating point rounding errors from making frames (...)

  • Bit-field badness

    30 janvier 2010, par Mans — Compilers, Optimisation

    Consider the following C code which is based on an real-world situation.

    struct bf1_31 
        unsigned a:1 ;
        unsigned b:31 ;
     ;
    

    void func(struct bf1_31 *p, int n, int a)

    int i = 0 ;
    do
    if (p[i].a)
    p[i].b += a ;
    while (++i < n) ;

    How would we best write this in ARM assembler ? This is how I would do it :

    func :
            ldr     r3,  [r0], #4
            tst     r3,  #1
            add     r3,  r3,  r2,  lsl #1
            strne   r3,  [r0, #-4]
            subs    r1,  r1,  #1
            bgt     func
            bx      lr
    

    The add instruction is unconditional to avoid a dependency on the comparison. Unrolling the loop would mask the latency of the ldr instruction as well, but that is outside the scope of this experiment.

    Now compile this code with gcc -march=armv5te -O3 and watch in horror :

    func :
            push    r4
            mov     ip, #0
            mov     r4, r2
    loop :
            ldrb    r3, [r0]
            add     ip, ip, #1
            tst     r3, #1
            ldrne   r3, [r0]
            andne   r2, r3, #1
            addne   r3, r4, r3, lsr #1
            orrne   r2, r2, r3, lsl #1
            strne   r2, [r0]
            cmp     ip, r1
            add     r0, r0, #4
            blt     loop
            pop     r4
            bx      lr
    

    This is nothing short of awful :

    • The same value is loaded from memory twice.
    • A complicated mask/shift/or operation is used where a simple shifted add would suffice.
    • Write-back addressing is not used.
    • The loop control counts up and compares instead of counting down.
    • Useless mov in the prologue ; swapping the roles or r2 and r4 would avoid this.
    • Using lr in place of r4 would allow the return to be done with pop {pc}, saving one instruction (ignoring for the moment that no callee-saved registers are needed at all).

    Even for this trivial function the gcc-generated code is more than twice the optimal size and slower by approximately the same factor.

    The main issue I wanted to illustrate is the poor handling of bit-fields by gcc. When accessing bitfields from memory, gcc issues a separate load for each field even when they are contained in the same aligned memory word. Although each load after the first will most likely hit L1 cache, this is still bad for several reasons :

    • Loads have typically two or three cycles result latency compared to one cycle for data processing instructions. Any bit-field can be extracted from a register with two shifts, and on ARM the second of these can generally be achieved using a shifted second operand to a following instruction. The ARMv6T2 instruction set also adds the SBFX and UBFX instructions for extracting any signed or unsigned bit-field in one cycle.
    • Most CPUs have more data processing units than load/store units. It is thus more likely for an ALU instruction than a load/store to issue without delay on a superscalar processor.
    • Redundant memory accesses can trigger early flushing of store buffers rendering these less efficient.

    No gcc bashing is complete without a comparison with another compiler, so without further ado, here is the ARM RVCT output (armcc --cpu 5te -O3) :

    func :
            mov     r3, #0
            push    r4, lr
    loop :
            ldr     ip, [r0, r3, lsl #2]
            tst     ip, #1
            addne   ip, ip, r2, lsl #1
            strne   ip, [r0, r3, lsl #2]
            add     r3, r3, #1
            cmp     r3, r1
            blt     loop
            pop     r4, pc
    

    This is much better, the core loop using only one instruction more than my version. The loop control is counting up, but at least this register is reused as offset for the memory accesses. More remarkable is the push/pop of two registers that are never used. I had not expected to see this from RVCT.

    Even the best compilers are still no match for a human.

  • ARM inline asm secrets

    6 juillet 2010, par Mans — ARM, Compilers

    Although I generally recommend against using GCC inline assembly, preferring instead pure assembly code in separate files, there are occasions where inline is the appropriate solution. Should one, at a time like this, turn to the GCC documentation for guidance, one must be prepared for a degree of disappointment. As it happens, much of the inline asm syntax is left entirely undocumented. This article attempts to fill in some of the blanks for the ARM target.

    Constraints

    Each operand of an inline asm block is described by a constraint string encoding the valid representations of the operand in the generated assembly. For example the “r” code denotes a general-purpose register. In addition to the standard constraints, ARM allows a number of special codes, only some of which are documented. The full list, including a brief description, is available in the constraints.md file in the GCC source tree. The following table is an extract from this file consisting of the codes which are meaningful in an inline asm block (a few are only useful in the machine description itself).

    f Legacy FPA registers f0-f7.
    t The VFP registers s0-s31.
    v The Cirrus Maverick co-processor registers.
    w The VFP registers d0-d15, or d0-d31 for VFPv3.
    x The VFP registers d0-d7.
    y The Intel iWMMX co-processor registers.
    z The Intel iWMMX GR registers.
    l In Thumb state the core registers r0-r7.
    h In Thumb state the core registers r8-r15.
    j A constant suitable for a MOVW instruction. (ARM/Thumb-2)
    b Thumb only. The union of the low registers and the stack register.
    I In ARM/Thumb-2 state a constant that can be used as an immediate value in a Data Processing instruction. In Thumb-1 state a constant in the range 0 to 255.
    J In ARM/Thumb-2 state a constant in the range -4095 to 4095. In Thumb-1 state a constant in the range -255 to -1.
    K In ARM/Thumb-2 state a constant that satisfies the I constraint if inverted. In Thumb-1 state a constant that satisfies the I constraint multiplied by any power of 2.
    L In ARM/Thumb-2 state a constant that satisfies the I constraint if negated. In Thumb-1 state a constant in the range -7 to 7.
    M In Thumb-1 state a constant that is a multiple of 4 in the range 0 to 1020.
    N Thumb-1 state a constant in the range 0 to 31.
    O In Thumb-1 state a constant that is a multiple of 4 in the range -508 to 508.
    Pa In Thumb-1 state a constant in the range -510 to +510
    Pb In Thumb-1 state a constant in the range -262 to +262
    Ps In Thumb-2 state a constant in the range -255 to +255
    Pt In Thumb-2 state a constant in the range -7 to +7
    G In ARM/Thumb-2 state a valid FPA immediate constant.
    H In ARM/Thumb-2 state a valid FPA immediate constant when negated.
    Da In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with two Data Processing insns.
    Db In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with three Data Processing insns.
    Dc In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with four Data Processing insns. This pattern is disabled if optimizing for space or when we have load-delay slots to fill.
    Dn In ARM/Thumb-2 state a const_vector which can be loaded with a Neon vmov immediate instruction.
    Dl In ARM/Thumb-2 state a const_vector which can be used with a Neon vorr or vbic instruction.
    DL In ARM/Thumb-2 state a const_vector which can be used with a Neon vorn or vand instruction.
    Dv In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts instruction.
    Dy In ARM/Thumb-2 state a const_double which can be used with a VFP fconstd instruction.
    Ut In ARM/Thumb-2 state an address valid for loading/storing opaque structure types wider than TImode.
    Uv In ARM/Thumb-2 state a valid VFP load/store address.
    Uy In ARM/Thumb-2 state a valid iWMMX load/store address.
    Un In ARM/Thumb-2 state a valid address for Neon doubleword vector load/store instructions.
    Um In ARM/Thumb-2 state a valid address for Neon element and structure load/store instructions.
    Us In ARM/Thumb-2 state a valid address for non-offset loads/stores of quad-word values in four ARM registers.
    Uq In ARM state an address valid in ldrsb instructions.
    Q In ARM/Thumb-2 state an address that is a single base register.

    Operand codes

    Within the text of an inline asm block, operands are referenced as %0, %1 etc. Register operands are printed as rN, memory operands as [rN, #offset], and so forth. In some situations, for example with operands occupying multiple registers, more detailed control of the output may be required, and once again, an undocumented feature comes to our rescue.

    Special code letters inserted between the % and the operand number alter the output from the default for each type of operand. The table below lists the more useful ones.

    c An integer or symbol address without a preceding # sign
    B Bitwise inverse of integer or symbol without a preceding #
    L The low 16 bits of an immediate constant
    m The base register of a memory operand
    M A register range suitable for LDM/STM
    H The highest-numbered register of a pair
    Q The least significant register of a pair
    R The most significant register of a pair
    P A double-precision VFP register
    p The high single-precision register of a VFP double-precision register
    q A NEON quad register
    e The low doubleword register of a NEON quad register
    f The high doubleword register of a NEON quad register
    h A range of VFP/NEON registers suitable for VLD1/VST1
    A A memory operand for a VLD1/VST1 instruction
    y S register as indexed D register, e.g. s5 becomes d2[1]