Hardwarebug : Everything is broken

http://hardwarebug.org/

Les articles publiés sur le site

  • Church of JPEG

    25 mai 2014, par MansRandom ramblings
    Some time has passed since I last wrote about the antics of the IJG, purveyor of the well-known JPEG image manipulation library. With the major Linux distributions as well as the Firefox and Chrome browsers having switched to the libjpeg-turbo fork, there has been little reason to pay attention to … Continue reading
  • Church of JPEG

    25 mai 2014, par MansRandom ramblings
    Some time has passed since I last wrote about the antics of the IJG, purveyor of the well-known JPEG image manipulation library. With the major Linux distributions as well as the Firefox and Chrome browsers having switched to the libjpeg-turbo fork, there has been little reason to pay attention to … Continue reading
  • Cortex-A7 instruction cycle timings

    15 mai 2014, par MansARM

    The Cortex-A7 ARM core is a popular choice in low-power and low-cost designs. Unfortunately, the public TRM does not include instruction timing information. It does reveal that execution is in-order which makes measuring the throughput and latency for individual instructions relatively straight-forward.

    The table below lists the measured issue cycles (inverse throughput) and result latency of some commonly used instructions.

    It should be noted that in some cases, the perceived latency depends on the instruction consuming the result. Most of the values were measured with the result used as input to the same instruction. For instructions with multiple outputs, the latencies of the result registers may also differ.

    Finally, although instruction issue is in-order, completion is out of order, allowing independent instructions to issue and complete unimpeded while a multi-cycle instruction is executing in another unit. For example, a 3-cycle MUL instruction does not block ADD instructions following it in program order.

    ALU instructions Issue cycles Result latency
    MOV Rd, Rm 1/2 1
    ADD Rd, Rn, #imm 1/2 1
    ADD Rd, Rn, Rm 1 1
    ADD Rd, Rn, Rm, LSL #imm 1 1
    ADD Rd, Rn, Rm, LSL Rs 1 1
    LSL Rd, Rn, #imm 1 2
    LSL Rd, Rn, Rs 1 2
    QADD Rd, Rn, Rm 1 2
    QADD8 Rd, Rn, Rm 1 2
    QADD16 Rd, Rn, Rm 1 2
    CLZ Rd, Rm 1 1
    RBIT Rd, Rm 1 2
    REV Rd, Rm 1 2
    SBFX Rd, Rn 1 2
    BFC Rd, #lsb, #width 1 2
    BFI Rd, Rn, #lsb, #width 1 2
    NOTE: Shifted operands and shift amounts needed one cycle early.
    Multiply instructions Issue cycles Result latency
    MUL Rd, Rn, Rm 1 3
    MLA Rd, Rn, Rm, Ra 1 31
    SMULL Rd, RdHi, Rn, Rm 1 3
    SMLAL Rd, RdHi, Rn, Rm 1 31
    SMMUL Rd, Rn, Rm 1 3
    SMMLA Rd, Rn, Rm, Ra 1 31
    SMULBB Rd, Rn, Rm 1 3
    SMLABB Rd, Rn, Rm, Ra 1 31
    SMULWB Rd, Rn, Rm 1 3
    SMLAWB Rd, Rn, Rm, Ra 1 31
    SMUAD Rd, Rn, Rm 1 3
    1 Accumulator forwarding allows back to back MLA instructions without delay.
    Divide instructions Issue cycles Result latency
    SDIV Rd, Rn, Rm 4-20 6-22
    UDIV Rd, Rn, Rm 3-19 5-21
    Load/store instructions Issue cycles Result latency
    LDR Rt, [Rn] 1 3
    LDR Rt, [Rn, #imm] 1 3
    LDR Rt, [Rn, Rm] 1 3
    LDR Rt, [Rn, Rm, lsl #imm] 1 3
    LDRD Rt, Rt2, [Rn] 1 3-4
    LDM Rn, {regs} 1-8 3-10
    STR Rt, [Rn] 1 2
    STRD Rt, Rt2, [Rn] 1 2
    STM Rn, {regs} 1-10 2-12
    NOTE: Load results are forwarded to dependent stores without delay.
    VFP instructions Issue cycles Result latency
    VMOV.F32 Sd, Sm 1 4
    VMOV.F64 Dd, Dm 1 4
    VNEG.F32 Sd, Sm 1 4
    VNEG.F64 Dd, Dm 1 4
    VABS.F32 Sd, Sm 1 4
    VABS.F64 Dd, Dm 1 4
    VADD.F32 Sd, Sn, Sm 1 4
    VADD.F64 Dd, Dn, Dm 1 4
    VMUL.F32 Sd, Sn, Sm 1 4
    VMUL.F64 Dd, Dn, Dm 4 7
    VMLA.F32 Sd, Sn, Sm 1 81
    VMLA.F64 Dd, Dn, Dm 4 112
    VFMA.F32 Sd, Sn, Sm 1 81
    VFMA.F64 Dd, Dn, Dm 5 82
    VDIV.F32 Sd, Sn, Sm 15 18
    VDIV.F64 Dd, Dn, Dm 29 32
    VSQRT.F32 Sd, Sm 14 17
    VSQRT.F64 Dd, Dm 28 31
    VCVT.F32.F64 Sd, Dm 1 4
    VCVT.F64.F32 Dd, Sm 1 4
    VCVT.F32.S32 Sd, Sm 1 4
    VCVT.F64.S32 Dd, Sm 1 4
    VCVT.S32.F32 Sd, Sm 1 4
    VCVT.S32.F64 Sd, Dm 1 4
    VCVT.F32.S32 Sd, Sd, #fbits 1 4
    VCVT.F64.S32 Dd, Dd, #fbits 1 4
    VCVT.S32.F32 Sd, Sd, #fbits 1 4
    VCVT.S32.F64 Dd, Dd, #fbits 1 4
    1 5 cycles with dependency only on accumulator.
    2 8 cycles with dependency only on accumulator.
    NEON integer instructions Issue cycles Result latency
    VADD.I8 Dd, Dn, Dm 1 4
    VADDL.S8 Qd, Dn, Dm 2 4
    VADD.I8 Qd, Qn, Qm 2 4
    VMUL.I8 Dd, Dn, Dm 2 4
    VMULL.S8 Qd, Dn, Dm 2 4
    VMUL.I8 Qd, Qn, Qm 4 4
    VMLA.I8 Dd, Dn, Dm 2 4
    VMLAL.S8 Qd, Dn, Dm 2 4
    VMLA.I8 Qd, Qn, Qm 4 4
    VADD.I16 Dd, Dn, Dm 1 4
    VADDL.S16 Qd, Dn, Dm 2 4
    VADD.I16 Qd, Qn, Qm 2 4
    VMUL.I16 Dd, Dn, Dm 1 4
    VMULL.S16 Qd, Dn, Dm 2 4
    VMUL.I16 Qd, Qn, Qm 2 4
    VMLA.I16 Dd, Dn, Dm 1 4
    VMLAL.S16 Qd, Dn, Dm 2 4
    VMLA.I16 Qd, Qn, Qm 2 4
    VADD.I32 Dd, Dn, Dm 1 4
    VADDL.S32 Qd, Dn, Dm 2 4
    VADD.I32 Qd, Qn, Qm 2 4
    VMUL.I32 Dd, Dn, Dm 2 4
    VMULL.S32 Qd, Dn, Dm 2 4
    VMUL.I32 Qd, Qn, Qm 4 4
    VMLA.I32 Dd, Dn, Dm 2 4
    VMLAL.S32 Qd, Dn, Dm 2 4
    VMLA.I32 Qd, Qn, Qm 4 4
    NEON floating-point instructions Issue cycles Result latency
    VADD.F32 Dd, Dn, Dm 2 4
    VADD.F32 Qd, Qn, Qm 4 4
    VMUL.F32 Dd, Dn, Dm 2 4
    VMUL.F32 Qd, Qn, Qm 4 4
    VMLA.F32 Dd, Dn, Dm 2 81
    VMLA.F32 Qd, Qn, Qm 4 81
    1 5 cycles with dependency only on accumulator.
    NEON permute instructions Issue cycles Result latency
    VEXT.n Dd, Dn, Dm, #imm 1 4
    VEXT.n Qd, Qn, Qm, #imm 2 5
    VTRN.n Dd, Dn, Dm 2 5
    VTRN.n Qd, Qn, Qm 4 5
    VUZP.n Dd, Dn, Dm 2 5
    VUZP.n Qd, Qn, Qm 4 6
    VZIP.n Dd, Dn, Dm 2 5
    VZIP.n Qd, Qn, Qm 4 6
    VTBL.8 Dd, {Dn}, Dm 1 4
    VTBL.8 Dd, {Dn-Dn+1}, Dm 1 4
    VTBL.8 Dd, {Dn-Dn+2}, Dm 2 5
    VTBL.8 Dd, {Dn-Dn+3}, Dm 2 5
  • Cortex-A7 instruction cycle timings

    15 mai 2014, par MansARM
    The Cortex-A7 ARM core is a popular choice in low-power and low-cost designs. Unfortunately, the public TRM does not include instruction timing information. It does reveal that execution is in-order which makes measuring the throughput and latency for individual instructions relatively straight-forward. The table below lists the measured issue cycles … Continue reading
  • Cortex-A7 instruction cycle timings

    15 mai 2014, par MansARM
    The Cortex-A7 ARM core is a popular choice in low-power and low-cost designs. Unfortunately, the public TRM does not include instruction timing information. It does reveal that execution is in-order which makes measuring the throughput and latency for individual instructions relatively straight-forward. The table below lists the measured issue cycles … Continue reading