ARMv6 vs ARMv7

ARMv6 and ARMv7 are currently the most common ARM architectures, especially in the world of smartphones and single-board computers (which are usually derived from smartphone or tablet chipsets). A logical assumption is that ARMv6 (ARM version 6) is the older architecture and ARMv7 (ARM version 7) the newer one. Because the ARM core is modular, however, there is no easy way to state the differences between the two precisely: the most essential differences lie in system components that are optional and need not be implemented by the manufacturer at all. So what are the differences?

First of all, the differences in the basic ARM instruction set are negligible. The ARMv6 and ARMv7 core registers are the same. ARMv7 is backward compatible with ARMv6, so binaries compiled for ARMv6 should also work on ARMv7. ARM aims to be a strictly RISC architecture, so the basic ARM instruction set can still perform only very simple operations. Integer division is still missing from the base instruction set (ARMv7-A offers a hardware divide only as an optional extension), and so are all floating-point operations. For more complex tasks, ARM relies on coprocessors (extensions), which sit on the same silicon as the ARM core. These extensions are optional, however, and the chip manufacturer may simply decide not to implement them at all. The major differences between ARMv6 and ARMv7 lie precisely in these processor extensions (although many extensions typical of ARMv7 were actually introduced during the life of ARMv6, in some of its subversions).
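
Because the base instruction set has no divide instruction, compilers typically fall back on a runtime helper routine for integer division (on ARM EABI toolchains, `__aeabi_uidiv` for the unsigned case). The sketch below, written in Python purely for illustration, shows the classic shift-and-subtract approach such a helper could take. It is not the code of any particular runtime library, and the name `udiv32` is made up for this example.

```python
def udiv32(numerator, denominator):
    """Unsigned 32-bit division using only shifts, compares and subtracts.

    Hypothetical helper for illustration; returns (quotient, remainder).
    """
    if denominator == 0:
        raise ZeroDivisionError("division by zero")
    if not (0 <= numerator < 2**32 and 0 < denominator < 2**32):
        raise ValueError("operands must fit in 32 unsigned bits")

    quotient = 0
    remainder = 0
    # Walk the numerator from its most significant bit to its least.
    for bit in range(31, -1, -1):
        # Bring down the next bit of the numerator.
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        # If the divisor fits, subtract it and set this quotient bit.
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
    return quotient, remainder


q, r = udiv32(100, 7)
print(q, r)
```

Every step in the loop maps to operations the base instruction set does provide, which is why a division costs many instructions on a core without a hardware divider.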

ARMv7 is newly split into three profiles: ARMv7-A, ARMv7-R and ARMv7-M. The profiles differ in memory access, latency and interrupt handling. The M profile is closer to microcontroller behavior, with time-deterministic processing (typically without an OS), while the R profile targets real-time systems. The A profile, on the other hand, is closer to a classical PC system with an OS and applications on top of it. The rest of this article deals with ARMv7-A, because this profile is used on the majority of single-board computers.

ARMv7 introduces some new extensions and new versions of existing ones.

The most important are:

VFP
Vector Floating Point is a coprocessor for vector (and scalar) floating-point operations. Although optional, it is almost always implemented. For ARMv6 and ARMv7 it is the de facto standard floating-point coprocessor, providing hardware addition, subtraction, division and multiplication in both single and double precision. It also lets the CPU perform a batch of arithmetic calculations (one type of operation applied to a batch of operands) with a single instruction. The calculations are still performed sequentially, one by one, so this is not the data-level parallelism one would expect from a true SIMD instruction set (such as MMX/SSE). Rather than accelerating the calculation itself, using VFP this way saves machine code, which in turn also speeds up the process as a whole.

  • ARMv6 architecture
    Optionally includes VFPv2 (usually implemented).
  • ARMv7 architecture
    Optionally includes VFPv3 (usually implemented). VFPv3 brings several minor improvements, mainly new capabilities for the VCVT and VMOV instructions, which let some floating-point operations run more efficiently.
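
The point that VFP vector operations save code rather than computation time can be made concrete with a toy cost model. The sketch below only counts issued instructions and arithmetic steps; it says nothing about real cycle counts, and both function names as well as the default vector length of 4 are assumptions chosen for illustration.

```python
def scalar_loop_cost(n_elements):
    """Scalar code: one instruction per element, one arithmetic step each."""
    return {"instructions": n_elements, "sequential_ops": n_elements}


def vfp_vector_cost(n_elements, vector_len=4):
    """VFP-style vector operation (toy model).

    One instruction covers up to `vector_len` operands, but the hardware
    still works through those operands one after another.
    """
    if vector_len < 1:
        raise ValueError("vector_len must be at least 1")
    instructions = -(-n_elements // vector_len)  # ceiling division
    return {"instructions": instructions, "sequential_ops": n_elements}


print(scalar_loop_cost(16))
print(vfp_vector_cost(16))
```

In this model the vector form issues a quarter of the instructions, yet the number of sequential arithmetic steps is unchanged, which matches the description above: less machine code to fetch and decode, but no data-level parallelism.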

Thumb technology
The ARM instruction set contains only 32-bit instructions. Not every instruction really needs that much space, however, which can lead to inefficient instruction fetches from memory and, of course, to unnecessarily bulky machine code. Thumb technology is intended to reduce the size of machine code by replacing some of the most commonly used 32-bit instructions with 16-bit alternatives.

  • ARMv6 architecture
    Includes the first version of Thumb technology. This version could be problematic: when processing a mix of 32-bit ARM instructions and 16-bit Thumb instructions, either the processor had to switch between ARM and Thumb mode (which takes some time), or a more complex 32-bit instruction had to be replaced by several 16-bit Thumb instructions. This led to inefficiencies, particularly in programs that use many floating-point instructions (which had no Thumb alternative). As a result, it became a habit to disable Thumb when compiling for ARMv6, especially for programs heavy on floating-point math (a typical issue when compiling 3D engines).
  • ARMv7 architecture
    Includes Thumb-2 technology, which adds 32-bit instructions to Thumb mode. Note that a “compressed” Thumb instruction does the same thing as the corresponding “full” ARM instruction; only its encoding in memory differs. It is now possible to mix 16-bit and 32-bit instructions freely without loss of performance. ARMv7 can therefore benefit from Thumb technology and gain a considerable performance boost.
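
The code-size effect of mixing 16-bit and 32-bit encodings is simple arithmetic. The following sketch compares a pure ARM encoding with a Thumb-2 style mix; the instruction counts are made-up numbers, and the model assumes that both encodings need the same number of instructions, which real compilers do not guarantee.

```python
def arm_code_size(n_instructions):
    """Every ARM instruction occupies 4 bytes."""
    return 4 * n_instructions


def thumb2_code_size(n_16bit, n_32bit):
    """Thumb-2 style mix: 2 bytes per 16-bit and 4 bytes per 32-bit instruction."""
    return 2 * n_16bit + 4 * n_32bit


# Hypothetical program: 1000 instructions, 700 of which have a 16-bit encoding.
arm_bytes = arm_code_size(1000)
thumb2_bytes = thumb2_code_size(700, 300)
saving = 1 - thumb2_bytes / arm_bytes
print(arm_bytes, thumb2_bytes, f"{saving:.0%}")
```

With these assumed numbers the mixed encoding needs 2600 bytes instead of 4000, so fewer bytes have to be fetched from memory for the same work.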

NEON
The NEON extension is a SIMD engine intended for processing arithmetic operations in batches.
As with VFP, the operands are supplied as a vector in a single instruction, which results in a considerable speed-up. Unlike VFP, though, NEON processes the whole batch in parallel, so it not only saves code size but also greatly accelerates the calculation itself. NEON is especially convenient for encoding and decoding multimedia, 2D/3D graphics and similar workloads. Note, however, that NEON handles floating-point values in single precision only.

  • ARMv6 architecture
    Does not include the NEON engine.
  • ARMv7 architecture
    Optionally includes NEON (usually implemented).
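
To illustrate the two properties of NEON described above (lane-wise processing and single precision only), here is a small Python model. It assumes four 32-bit lanes per step, which corresponds to a 128-bit register holding four single-precision values; the function names are invented for this sketch, and Python naturally still evaluates the lanes one after another.

```python
import struct


def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def simd_add_f32(a, b, lanes=4):
    """Add two equal-length lists, `lanes` elements per step.

    Returns (results, steps). Each step stands for one SIMD instruction whose
    lanes would be computed in parallel by the hardware.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out, steps = [], 0
    for i in range(0, len(a), lanes):
        chunk = zip(a[i:i + lanes], b[i:i + lanes])
        # Inputs and result are rounded to single precision, as on NEON.
        out.extend(to_f32(to_f32(x) + to_f32(y)) for x, y in chunk)
        steps += 1
    return out, steps


result, steps = simd_add_f32([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [0.5] * 8)
print(result, steps)
print(to_f32(0.1) == 0.1)  # 0.1 is not exactly representable in single precision
```

Eight additions take two steps in this model, and the last line shows why code that depends on double precision cannot simply be moved from VFP to NEON.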

Conclusion
ARMv7 can be a faster, more efficient platform, but not unconditionally. For example, the NEON engine (ARMv7) can take over the work of the slower VFP, but only for single-precision floating-point values, so it does not help in every case. Some compilers can also vectorize certain arithmetic operations to use NEON as much as possible, but again this does not work perfectly in every case. When it comes to raw performance (MIPS per MHz per core), the specific Cortex core (ARM Cortex-A5, Cortex-A7, Cortex-A9, Cortex-A15, etc.) matters more than the architecture version.

The architecture and CPU extensions are very important, however, when you configure a compiler for a particular hardware target or choose a Linux port for particular hardware. Take the Debian armhf (ARM hard float) port as an example. It targets ARM platforms with a hardware FPU, but it also requires Thumb-2 and VFPv3-D16. This port therefore cannot run on ARMv6, which most likely has only the original Thumb and VFPv2. In that case we must use the Debian armel port, or compile Debian for our specific hardware configuration.
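
The port decision described above can be sketched as a small helper. This is a simplified illustration, not an official Debian check: the function name is hypothetical, the machine strings follow the style of `uname -m` output, and the feature flags are written the way they commonly appear in the `Features` line of `/proc/cpuinfo` on Linux, so treat the exact flag names as assumptions. Thumb-2 is not tested separately because the ARMv7-A profile includes it.

```python
def suggest_debian_port(machine, features):
    """Suggest 'armhf' or 'armel' for a 32-bit ARM board (hypothetical helper).

    machine  -- machine string, e.g. 'armv6l' or 'armv7l'
    features -- space-separated CPU feature flags
    """
    flags = set(features.split())
    is_armv7 = machine.startswith("armv7")
    # armhf needs at least VFPv3-D16; full VFPv3 satisfies that as well.
    has_vfpv3 = bool(flags & {"vfpv3", "vfpv3d16"})
    return "armhf" if is_armv7 and has_vfpv3 else "armel"


# Illustrative inputs, not captured from real boards.
print(suggest_debian_port("armv6l", "half thumb fastmult vfp edsp java tls"))
print(suggest_debian_port("armv7l", "half thumb fastmult vfp edsp neon vfpv3 tls"))
```

On a running Linux system the two inputs could be taken from `platform.machine()` and from `/proc/cpuinfo`; they are passed in as plain strings here so that the logic can be tried on any machine.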
