The ARM architecture (previously the Advanced RISC Machine, and prior to that the Acorn RISC Machine) is a 32-bit RISC processor architecture developed by ARM Limited that is widely used in embedded designs. Because of their power-saving features, ARM CPUs are dominant in the mobile electronics market, where low power consumption is a critical design goal. As of 2007, about 98 percent of the more than a billion mobile phones sold each year use at least one ARM CPU.
Today, the ARM family accounts for approximately 75% of all embedded 32-bit RISC CPUs, making it the most widely used 32-bit architecture. ARM CPUs are found in most corners of consumer electronics, from portable devices (PDAs, mobile phones, iPods and other digital media and music players, handheld gaming units, and calculators) to computer peripherals (hard drives, desktop routers). However, since the decline of its former parent company Acorn Computers, ARM Ltd no longer designs chips orientated towards desktop or main processor functions, and the architecture has never been used in a supercomputer or cluster. Prominent branches in this family include Marvell's (formerly Intel's) XScale and the Texas Instruments OMAP series.
History
The ARM design was started in 1983 as a development project at Acorn Computers Ltd to build a compact RISC CPU. The team was led by Sophie Wilson and Steve Furber, and a key design goal was achieving low-latency input/output (interrupt) handling like that of the MOS Technology 6502 used in Acorn's existing computer designs. The 6502's memory access architecture allowed developers to produce fast machines without the use of costly direct memory access hardware. The team completed development samples called ARM1 by April 1985, and the first "real" production systems as ARM2 the following year.
The ARM2 featured a 32-bit data bus, a 26-bit (64 Mbyte) address space and sixteen 32-bit registers. Program code had to lie within the first 64 Mbyte of the memory, as the program counter was limited to 26 bits because the top 6 bits of the 32-bit register served as status flags. The ARM2 was possibly the simplest useful 32-bit microprocessor in the world, with only 30,000 transistors (compare with Motorola's 68000, six years older, with around 70,000 transistors). Much of this simplicity came from not having microcode (which represents about one-quarter to one-third of the 68000) and, like most CPUs of the day, not including any cache. This simplicity led to its low power usage, while it still performed better than the Intel 80286. A successor, ARM3, was produced with a 4 KB cache, which further improved performance.
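The shared register layout described above can be illustrated with a minimal Python sketch. It is not taken from any ARM tool. The function name `split_r15` is hypothetical, and the flag order (N, Z, C, V, I, F from bit 31 down to bit 26) and the use of the two lowest bits for the processor mode are assumptions drawn from the 26-bit ARM programming model rather than from the text above.

```python
ADDRESS_BITS = 26
ADDRESS_SPACE_BYTES = 1 << ADDRESS_BITS      # 67,108,864 bytes, i.e. 64 Mbyte

# Assumed flag order from bit 31 down to bit 26 (not given in the article).
FLAG_NAMES = ("N", "Z", "C", "V", "I", "F")


def split_r15(r15):
    """Split a 32-bit R15 value into (program_address, flags)."""
    r15 &= 0xFFFFFFFF
    flags = {
        name: bool((r15 >> (31 - i)) & 1)
        for i, name in enumerate(FLAG_NAMES)
    }
    # Keep the low 26 bits, then clear bits 1:0. Instructions are
    # word-aligned, so those two bits are assumed to hold the processor
    # mode instead of address bits.
    program_address = r15 & (ADDRESS_SPACE_BYTES - 1) & ~0b11
    return program_address, flags


address, flags = split_r15(0x80000008)
print(hex(address), flags)
```

Because the flags and the address share one register, no value of R15 can name a location beyond the first 64 Mbyte, which is the limit the paragraph describes.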
In the late 1980s Apple Computer and VLSI Technology started working with Acorn on newer versions of the ARM core. The work was so important that Acorn spun off the design team in 1990 into a new company called Advanced RISC Machines Ltd. For this reason, ARM is sometimes expanded as Advanced RISC Machine instead of Acorn RISC Machine. Advanced RISC Machines became ARM Ltd when its parent company, ARM Holdings plc, floated on the London Stock Exchange and NASDAQ in 1998.
The new Apple-ARM work would eventually turn into the ARM6, first released in 1991. Apple used the ARM6-based ARM610 as the basis for their Apple Newton PDA. In 1994, Acorn used the ARM610 as the main CPU in their Risc PC computers. DEC licensed the ARM6 architecture (which caused some confusion because they also produced the DEC Alpha) and produced the StrongARM. At 233 MHz this CPU drew only 1 watt of power (more recent versions draw far less). This work was later passed to Intel as a part of a lawsuit settlement, and Intel took the opportunity to supplement their aging i960 line with the StrongARM. Intel later developed its own high-performance implementation known as XScale, which it has since sold to Marvell.
The ARM core has remained largely the same size throughout these changes. ARM2 had 30,000 transistors, while the ARM6 grew to only 35,000. ARM's business has always been to sell IP cores, which licensees use to create microcontrollers and CPUs based on those cores. The most successful implementation has been the ARM7TDMI, with hundreds of millions sold in almost every kind of microcontroller-equipped device. The idea is that the Original Design Manufacturer combines the ARM core with a number of optional parts to produce a complete CPU, one that can be built on old semiconductor fabs and still deliver substantial performance at a low cost. As of January 2008, over 10 billion ARM cores have been built, and iSuppli predicts that 5 billion a year will ship in 2011.
The common architecture supported on smartphones, Personal Digital Assistants and other handheld devices is ARMv4. XScale and ARM926 processors are ARMv5TE, and are now more numerous in high-end devices than the StrongARM, ARM925T and ARM7TDMI based ARMv4 processors.
| Family | Architecture Version | Core | Feature | Cache (I/D)/MMU | Typical MIPS @ MHz | In application |
| ARM1 | ARMv1 | ARM1 | | None | | ARM Evaluation System second processor for BBC Micro |
| ARM2 | ARMv2 | ARM2 | Architecture 2 added the MUL (multiply) instruction | None | 4 MIPS @ 8 MHz, 0.33 DMIPS/MHz | Acorn Archimedes, Chessmachine |
| ARM2 | ARMv2a | ARM250 | Integrated MEMC (MMU), Graphics and IO processor. Architecture 2a added the SWP and SWPB (swap) instructions. | None, MEMC1a | 7 MIPS @ 12 MHz | Acorn Archimedes |
| ARM3 | ARMv2a | ARM2a | First use of a processor cache on the ARM | 4 KB unified | 12 MIPS @ 25 MHz, 0.50 DMIPS/MHz | Acorn Archimedes |
| ARM6 | ARMv3 | ARM60 | v3 architecture first to support addressing 32 bits of memory (as opposed to 26 bits) | None | 10 MIPS @ 12 MHz | 3DO Interactive Multiplayer, Zarlink GPS Receiver |
| ARM6 | ARMv3 | ARM600 | Cache and coprocessor bus (for FPA10 floating-point unit). | 4 KB unified | 28 MIPS @ 33 MHz | |
| ARM6 | ARMv3 | ARM610 | Cache, no coprocessor bus | 4 KB unified | 17 MIPS @ 20 MHz, 0.65 DMIPS/MHz | Acorn Risc PC 600, Apple Newton 100 series |
| ARM7 | ARMv3 | ARM700 | | 8 KB unified | 40 MHz | Acorn Risc PC prototype CPU card |
| ARM7 | ARMv3 | ARM710 | | 8 KB unified | 40 MHz | Acorn Risc PC 700 |
| ARM7 | ARMv3 | ARM710a | | 8 KB unified | 40 MHz, 0.68 DMIPS/MHz | Acorn Risc PC 700, Apple eMate 300 |
| ARM7 | ARMv3 | ARM7100 | Integrated SoC | 8 KB unified | 18 MHz | Psion Series 5 |
| ARM7 | ARMv3 | ARM7500 | Integrated SoC | 4 KB unified | 40 MHz | Acorn A7000 |
| ARM7 | ARMv3 | ARM7500FE | Integrated SoC. "FE" added FPA and EDO memory controller. | 4 KB unified | 56 MHz, 0.73 DMIPS/MHz | Acorn A7000+ |
| ARM7TDMI | ARMv4T | ARM7TDMI(-S) | 3-stage pipeline, Thumb | none | 15 MIPS @ 16.8 MHz | Game Boy Advance, Nintendo DS, iPod, Lego NXT, Atmel AT91SAM7, Juice Box, NXP Semiconductors LPC2000 and LH754xx |
| ARM7TDMI | ARMv4T | ARM710T | | 8 KB unified, MMU | 36 MIPS @ 40 MHz | Psion Series 5mx, Psion Revo/Revo Plus/Diamond Mako |
| ARM7TDMI | ARMv4T | ARM720T | | 8 KB unified, MMU | 60 MIPS @ 59.8 MHz | Zipit Wireless Messenger, NXP Semiconductors LH7952x |
| ARM7TDMI | ARMv4T | ARM740T | | MPU | | |
| ARM7TDMI | ARMv5TEJ | ARM7EJ-S | Jazelle DBX, Enhanced DSP instructions, 5-stage pipeline | none | | |
| StrongARM | ARMv4 | SA-110 | | 16 KB/16 KB, MMU | 203 MHz, 1.0 DMIPS/MHz | Apple Newton 2x00 series, Acorn Risc PC, Rebel/Corel Netwinder, Chalice CATS, Psion Netbook |
| StrongARM | ARMv4 | SA-1110 | | 16 KB/16 KB, MMU | 233 MHz | LART, Intel Assabet, iPAQ H36x0, Balloon2, Zaurus SL-5x00, HP Jornada 7xx, Jornada 560 series, Palm Zire 31 |
| ARM8 | ARMv4 | ARM810 | 5-stage pipeline, static branch prediction, double-bandwidth memory | 8 KB unified, MMU | 84 MIPS @ 72 MHz, 1.16 DMIPS/MHz | Acorn Risc PC prototype CPU card |
| ARM9TDMI | ARMv4T | ARM9TDMI | 5-stage pipeline | none | | |
| ARM9TDMI | ARMv4T | ARM920T | | 16 KB/16 KB, MMU | 200 MIPS @ 180 MHz | Armadillo, GP32, GP2X (first core), Tapwave Zodiac (Motorola i.MX1), Hewlett-Packard HP-49/50 Calculators, Sun SPOT, Cirrus Logic EP9302, EP9307, EP9312, EP9315, Samsung s3c2442 (HTC TyTN, FIC Neo FreeRunner) |
| ARM9TDMI | ARMv4T | ARM922T | | 8 KB/8 KB, MMU | | NXP Semiconductors LH7A40x |
| ARM9TDMI | ARMv4T | ARM940T | | 4 KB/4 KB, MPU | | GP2X (second core), Meizu M6 Mini Player |
| ARM9E | ARMv5TE | ARM946E-S | Enhanced DSP instructions | variable, tightly coupled memories, MPU | | Nintendo DS, Nokia N-Gage, Conexant 802.11 chips |
| ARM9E | ARMv5TE | ARM966E-S | | no cache, TCMs | | ST Micro STR91xF, includes Ethernet |
| ARM9E | ARMv5TE | ARM968E-S | | no cache, TCMs | | NXP Semiconductors LPC2900 |
| ARM9E | ARMv5TEJ | ARM926EJ-S | Jazelle DBX, Enhanced DSP instructions, variants may include ARM Thumb support | variable, TCMs, MMU | 220 MIPS @ 200 MHz | Mobile phones: Sony Ericsson (K, W series); Siemens and Benq (x65 series and newer); Texas Instruments OMAP1710, OMAP1610, OMAP1611, OMAP1612, OMAP-L137; Qualcomm MSM6100, MSM6125, MSM6225, MSM6245, MSM6250, MSM6255A, MSM6260, MSM6275, MSM6280, MSM6300, MSM6500, MSM6800; Freescale i.MX21, i.MX27, Atmel AT91SAM9, NXP Semiconductors LPC3000, GPH Wiz, Marvell Feroceon, NEC C10046F5-211-PN2-A SoC - undocumented core in the ATi Hollywood graphics chip used in the Wii. |
| ARM9E | ARMv5TE | ARM996HS | Clockless processor, Enhanced DSP instructions | no caches, TCMs, MPU | | |
| ARM10E | ARMv5TE | ARM1020E | (VFP), 6-stage pipeline, Enhanced DSP instructions | 32 KB/32 KB, MMU | | |
| ARM10E | ARMv5TE | ARM1022E | (VFP) | 16 KB/16 KB, MMU | | |
| ARM10E | ARMv5TEJ | ARM1026EJ-S | Jazelle DBX, Enhanced DSP instructions | variable, MMU or MPU | | |
| XScale | ARMv5TE | 80200/IOP310/IOP315 | I/O Processor, Enhanced DSP instructions | | | |
| XScale | ARMv5TE | 80219 | | | 400/600 MHz | Thecus N2100 |
| XScale | ARMv5TE | IOP321 | | | 600 BogoMips @ 600 MHz | Iyonix |
| XScale | ARMv5TE | IOP33x | | | | |
| XScale | ARMv5TE | IOP34x | 1-2 core, RAID Acceleration | 32 KB/32 KB L1, 512 KB L2, MMU | | |
| XScale | ARMv5TE | PXA210/PXA250 | Applications processor, 7-stage pipeline | | PXA210: 133 and 200 MHz; PXA250: 200, 300, and 400 MHz | Zaurus SL-5600, iPAQ H3900, Sony CLIÉ NX60, NX70V, NZ90 |
| XScale | ARMv5TE | PXA255 | | 32 KB/32 KB, MMU | 400 BogoMips @ 400 MHz | Gumstix basix & connex, Palm Tungsten E2, Mentor Ranger & Stryder, iRex iLiad |
| XScale | ARMv5TE | PXA263 | | | 200, 300 and 400 MHz | Sony CLIÉ NX73V, NX80V |
| XScale | ARMv5TE | PXA26x | | | default 400 MHz, up to 624 MHz | Palm Tungsten T3 |
| XScale | ARMv5TE | PXA27x | Applications processor | 32 KB/32 KB, MMU | 800 MIPS @ 624 MHz | Gumstix verdex, HTC Universal, HP hx4700, Zaurus SL-C1000, 3000, 3100, 3200, Dell Axim x30, x50, and x51 series, Motorola Q, Balloon3, Trolltech Greenphone, Palm TX, Motorola Ezx Platform A728, A780, A910, A1200, E680, E680i, E680g, E690, E895, Rokr E2, Rokr E6, Fujitsu Siemens LOOX N560, Toshiba Portégé G500, Treo 650-755p, Zipit Z2 |
| XScale | ARMv5TE | PXA800(E)F | | | | |
| XScale | ARMv5TE | Monahans | | 32 KB/32 KB L1, TCM, MMU | 1000 MIPS @ 1.25 GHz | |
| XScale | ARMv5TE | PXA900 | | | | BlackBerry 8700, BlackBerry Pearl (8100) |
| XScale | ARMv5TE | IXC1100 | Control Plane Processor | | | |
| XScale | ARMv5TE | IXP2400/IXP2800 | | | | |
| XScale | ARMv5TE | IXP2850 | | | | |
| XScale | ARMv5TE | IXP2325/IXP2350 | | | | |
| XScale | ARMv5TE | IXP42x | | | | NSLU2 |
| XScale | ARMv5TE | IXP460/IXP465 | | | | |
| ARM11 | ARMv6 | ARM1136J(F)-S | SIMD, Jazelle DBX, (VFP), 8-stage pipeline | variable, MMU | 740 @ 532-665 MHz (i.MX31 SoC), 400-528 MHz | Texas Instruments OMAP2420 (Nokia E90, Nokia N93, Nokia N95, Nokia N82), Zune, BUGbase, Nokia N800, Nokia N810, Qualcomm MSM7200 (with integrated ARM926EJ-S coprocessor @ 274 MHz, used in Eten Glofiish, HTC TyTN II, HTC Nike), Freescale i.MX31 (which was used in the original Zune 30 GB and Toshiba Gigabeat S). |
| ARM11 | ARMv6T2 | ARM1156T2(F)-S | SIMD, Thumb-2, (VFP), 9-stage pipeline | variable, MPU | | |
| ARM11 | ARMv6KZ | ARM1176JZ(F)-S | SIMD, Jazelle DBX, (VFP) | variable, MMU+TrustZone | | Apple iPhone, Apple iPod touch, Conexant CX2427X, Motorola RIZR Z8, Motorola RIZR Z10 |
| ARM11 | ARMv6K | ARM11 MPCore | 1-4 core SMP, SIMD, Jazelle DBX, (VFP) | variable, MMU | | Nvidia APX 2500 |
| Cortex | ARMv7-A | Cortex-A8 | Application profile, VFP, NEON, Jazelle RCT, Thumb-2, 13-stage superscalar pipeline | variable (L1+L2), MMU+TrustZone | up to 2000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz) | Texas Instruments OMAP3430, SBM7000, Oregon State University OSWALD, Gumstix Overo Earth, Pandora, Archos 5, FreeScale i.MX51-SOC, BeagleBoard, Palm Pre |
| Cortex | ARMv7-A | Cortex-A9 | Application profile, (VFP), (NEON), Jazelle RCT and DBX, Thumb-2, Out-of-order speculative issue superscalar | MMU+TrustZone | 2.0 DMIPS/MHz | |
| Cortex | ARMv7-A | Cortex-A9 MPCore | As Cortex-A9, 1-4 core SMP | MMU+TrustZone | 2.0 DMIPS/MHz | |
| Cortex | ARMv7-R | Cortex-R4(F) | Embedded profile, (FPU) | variable cache, MPU optional | 600 DMIPS @ ~375 MHz | Broadcom is a user, TMS570 from Texas Instruments |
| Cortex | ARMv7-M | Cortex-M3 | Microcontroller profile, Thumb-2 only | no cache, (MPU) | 125 DMIPS @ 100 MHz | Energy Micro's EFM32, Luminary Micro microcontroller family, ST Microelectronics STM32, NXP Semiconductors LPC1700 |
| Cortex | ARMv6-M | Cortex-M1 | FPGA targeted, Microcontroller profile, Thumb-2 (BL, MRS, MSR, ISB, DSB, and DMB). | None, tightly coupled memory optional. | Up to 136 DMIPS @ 170 MHz (0.8 DMIPS/MHz, MHz achievable FPGA-dependent) | "Actel ProASIC3 and Actel Fusion PSC devices will sample in Q3 2007". |
Thumb-2 technology made its debut in the ARM1156 core, announced in 2003. Thumb-2 extends the limited 16-bit instruction set of Thumb with additional 32-bit instructions to give the instruction set more breadth. The resulting stated aim for Thumb-2 is to achieve code density similar to Thumb with performance similar to the ARM instruction set on 32-bit memory.
Thumb-2 also extends both the ARM and Thumb instruction sets with further instructions, including bit-field manipulation, table branches, and conditional execution.
All ARMv7 chips support the Thumb-2 instruction set. Some chips, such as the Cortex-M3, support only the Thumb-2 instruction set. Other chips in the Cortex and ARM11 series support both "ARM instruction set mode" and "Thumb-2 instruction set mode".
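Because Thumb-2 mixes 16-bit and 32-bit instructions in one stream, a decoder has to tell the two apart from the first halfword alone. The following Python sketch shows one way to do that, assuming the encoding rule from the ARM architecture manuals that a halfword whose top five bits are 11101, 11110 or 11111 begins a 32-bit instruction. The function names are hypothetical, and the sketch only measures instruction lengths; it does not decode operations.

```python
def thumb2_length(first_halfword):
    """Return the instruction length in bytes (2 or 4)."""
    top5 = (first_halfword >> 11) & 0b11111
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_instructions(halfwords):
    """Group a list of 16-bit halfwords into one tuple per instruction."""
    instructions = []
    i = 0
    while i < len(halfwords):
        size = thumb2_length(halfwords[i]) // 2   # size in halfwords
        if i + size > len(halfwords):
            raise ValueError("stream ends in the middle of a 32-bit instruction")
        instructions.append(tuple(halfwords[i:i + size]))
        i += size
    return instructions


# A 16-bit instruction, one 32-bit instruction, then another 16-bit one.
stream = [0x4770, 0xF000, 0xF800, 0xBF00]
for instruction in split_instructions(stream):
    print([hex(h) for h in instruction])
```

Code made mostly of 16-bit encodings keeps the density of Thumb, while the 32-bit encodings are available where a wider instruction is needed.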
Thumb Execution Environment (ThumbEE)
ThumbEE, also known as Thumb-2EE, and marketed as Jazelle RCT (Runtime Compilation Target), was announced in 2005, first appearing in the Cortex-A8 processor. ThumbEE provides a small extension to the Thumb-2 extended Thumb instruction set, making the instruction set particularly suited to code generated at runtime (e.g. by JIT compilation) in managed Execution Environments. ThumbEE is a target for languages such as Limbo, Java, C#, Perl and Python, and allows JIT compilers to output smaller compiled code without impacting performance.
New features provided by ThumbEE include automatic null pointer checks on every load and store instruction, an instruction to perform an array bounds check, access to registers r8-r15 (where the Jazelle/DBX Java VM state is held), and the ability to branch to handlers. Handlers are small sections of frequently called code, commonly used to implement a feature of a high-level language, such as allocating memory for a new object.
Advanced SIMD (NEON)
The Advanced SIMD extension, marketed as NEON technology, is a combined 64- and 128-bit SIMD (Single Instruction Multiple Data) instruction set that provides standardized acceleration for media and signal processing applications. NEON can execute MP3 audio decoding on CPUs running at 10 MHz and can run the GSM AMR (Adaptive Multi-Rate) speech codec at no more than 13 MHz. It features a comprehensive instruction set, separate register files and independent execution hardware. NEON supports 8-, 16-, 32- and 64-bit integer and single-precision floating-point data, and applies each operation to all elements of a register at once, which suits audio and video processing as well as graphics and gaming. A single NEON instruction can perform up to 16 operations at the same time (for example, on sixteen 8-bit elements held in one 128-bit register).
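The figure of 16 simultaneous operations follows from dividing the register width by the element width. The Python sketch below works through that arithmetic and models a lane-wise integer add. It is a plain software model with hypothetical function names, not NEON code, and it assumes wrap-around (modular) addition.

```python
REGISTER_WIDTHS = (64, 128)        # register sizes named in the text, in bits
ELEMENT_WIDTHS = (8, 16, 32, 64)   # integer element sizes named in the text


def lane_count(register_bits, element_bits):
    """Number of elements that fit in one register."""
    if register_bits not in REGISTER_WIDTHS or element_bits not in ELEMENT_WIDTHS:
        raise ValueError("unsupported register or element width")
    return register_bits // element_bits


def lanewise_add(a, b, element_bits):
    """Add two equal-length vectors lane by lane, wrapping on overflow."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    mask = (1 << element_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


lanes = lane_count(128, 8)
print(lanes)
print(lanewise_add([250] * lanes, [10] * lanes, 8))
```

Wider elements mean fewer lanes per instruction: the same 128-bit register holds only four 32-bit values.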
VFP
VFP technology is a coprocessor extension to the ARM architecture. It provides low-cost single-precision and double-precision floating-point computation fully compliant with the ANSI/IEEE Std 754-1985 Standard for Binary Floating-Point Arithmetic. VFP provides floating-point computation suitable for a wide spectrum of applications such as PDAs, smartphones, voice compression and decompression, three-dimensional graphics and digital audio, printers, set-top boxes, and automotive applications. The VFP architecture also supports execution of short vector instructions allowing SIMD (Single Instruction Multiple Data) parallelism. This is useful in graphics and signal-processing applications by reducing code size and increasing throughput.
Other floating-point and/or SIMD coprocessors found in ARM-based processors include FPA, FPE, and iwMMXt. They provide some of the same functionality as VFP but are not opcode-compatible with it.
Security Extensions (TrustZone)
The Security Extensions, marketed as TrustZone(TM) Technology, are found in ARMv6KZ and later application profile architectures. They provide a low-cost alternative to adding a dedicated security core to a SoC by providing two virtual processors backed by hardware-based access control. This enables the application core to switch between two states, referred to as worlds (to reduce confusion with other names for capability domains), in such a manner that information can be prevented from leaking from the more trusted world to the less trusted world. This world switch is generally orthogonal to all other capabilities of the processor, so each world can operate independently of the other while using the same core. Memory and peripherals are then made aware of the operating world of the core and may use this to provide access control to secrets and code on the device. A typical application of TrustZone Technology is to run a rich operating system in the less trusted world, and smaller security-specialized code in the more trusted world (known as TrustZone Software, a TrustZone optimized version of the Trusted Foundations(TM) Software developed by Trusted Logic).
In practice, since the specific implementation details of TrustZone are proprietary and have not been publicly disclosed for review, it is unclear what level of assurance is provided for a given threat model.
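The two-world access rule can be pictured with a toy Python model. This is only a conceptual sketch: the names `World`, `Region` and `may_access` are invented for illustration and do not correspond to any ARM interface, and real hardware enforces the rule in the memory system rather than in software.

```python
from dataclasses import dataclass
from enum import Enum


class World(Enum):
    NORMAL = "less trusted"
    SECURE = "more trusted"


@dataclass(frozen=True)
class Region:
    """A memory region or peripheral, tagged with who may reach it."""
    name: str
    secure_only: bool


def may_access(world, region):
    """The more trusted world reaches everything; the other only open regions."""
    return world is World.SECURE or not region.secure_only


key_store = Region("key store", secure_only=True)
frame_buffer = Region("frame buffer", secure_only=False)

for world in World:
    for region in (key_store, frame_buffer):
        print(world.name, region.name, may_access(world, region))
```

In this model a rich operating system running in the less trusted world can use the frame buffer but never sees the key store, which matches the typical application described above.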
ARM licensees
ARM Ltd does not manufacture and sell CPU devices based on its own designs, but rather licenses the processor architecture to interested parties. ARM offers a variety of licensing terms, varying in cost and deliverables. To all licensees, ARM provides an integratable hardware description of the ARM core, as well as a complete software development toolset (compiler, debugger, SDK), and the right to sell manufactured silicon containing the ARM CPU. Fabless licensees, who wish to integrate an ARM core into their own chip design, are usually only interested in acquiring a ready-to-manufacture verified IP core. For these customers, ARM delivers a gate netlist description of the chosen ARM core, along with an abstracted simulation model and test programs to aid design integration and verification. More ambitious customers, including integrated device manufacturers (IDM) and foundry operators, choose to acquire the processor IP in synthesizable RTL (Verilog) form. With the synthesizable RTL, the customer has the ability to perform architectural-level optimizations and extensions. This allows the designer to achieve exotic design goals not otherwise possible with an unmodified netlist (high clock speed, very low power consumption, instruction set extensions, etc.). While ARM does not grant the licensee the right to resell the ARM architecture itself, licensees may freely sell manufactured products (chip devices, evaluation boards, complete systems, etc.). Merchant foundries can be a special case; not only are they allowed to sell finished silicon containing ARM cores, they generally hold the right to remanufacture ARM cores for other customers.
Like most IP vendors, ARM prices its IP based on perceived value. In architectural terms, the lower-performance ARM cores command a lower license cost than the higher-performance cores. In terms of silicon implementation, a synthesizable core is more expensive than a hard macro (blackbox) core. Complicating price matters, a merchant foundry that holds an ARM license (such as Samsung and Fujitsu) can offer reduced licensing costs to its fab customers. In exchange for acquiring the ARM core through the foundry's in-house design services, the customer can reduce or eliminate payment of ARM's upfront license fee. Compared to dedicated semiconductor foundries (such as TSMC and UMC) without in-house design services, Fujitsu/Samsung charge 2 to 3 times more per manufactured wafer. For low- to mid-volume applications, a design service foundry offers lower overall pricing (through subsidization of the license fee). For high-volume, mass-produced parts, the long-term cost reduction achievable through lower wafer pricing reduces the impact of ARM's NRE (Non-Recurring Engineering) costs, making the dedicated foundry a better choice.
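The trade-off between the two kinds of foundry reduces to a break-even calculation. The Python sketch below works through it under simplifying assumptions: the design service foundry waives the upfront fee entirely, wafer prices are constant, and nothing else differs between the two routes. The US$200,000 fee is the one-time figure quoted elsewhere in this article; the US$2,000 base wafer price is an invented number used only to make the arithmetic concrete.

```python
def total_cost(wafers, wafer_price, upfront_fee=0.0):
    """Upfront license fee plus the cost of the manufactured wafers."""
    return upfront_fee + wafers * wafer_price


def break_even_wafers(upfront_fee, base_wafer_price, price_multiplier):
    """Wafer count at which both routes cost the same.

    The design service foundry charges price_multiplier times the base
    wafer price and no fee; the dedicated foundry charges the base price
    plus the upfront fee.
    """
    if price_multiplier <= 1:
        raise ValueError("the design service foundry must charge more per wafer")
    return upfront_fee / ((price_multiplier - 1) * base_wafer_price)


FEE = 200_000          # one-time license fee quoted in the article (USD)
BASE_PRICE = 2_000     # hypothetical dedicated-foundry wafer price (USD)
MULTIPLIER = 2.5       # within the "2 to 3 times" range given above

print(round(break_even_wafers(FEE, BASE_PRICE, MULTIPLIER), 1))
for wafers in (50, 100):
    service = total_cost(wafers, BASE_PRICE * MULTIPLIER)
    dedicated = total_cost(wafers, BASE_PRICE, FEE)
    print(wafers, service, dedicated)
```

Below the break-even volume the waived fee outweighs the higher wafer price; above it the dedicated foundry is cheaper, which is the pattern the paragraph describes.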
Many semiconductor or IC design firms hold ARM licenses. Analog Devices, Atmel, Broadcom, Cirrus Logic, Energy Micro, Faraday Technology, Freescale, Fujitsu, Intel (through its settlement with Digital Equipment Corporation), IBM, Infineon Technologies, Nintendo, NXP Semiconductors, OKI, Qualcomm, Samsung, Sharp, STMicroelectronics, Texas Instruments and VLSI are some of the many companies that have licensed ARM in one form or another. Although ARM's license terms are covered by NDA, within the IP industry ARM is widely known to be among the most expensive CPU cores. A single customer product containing a basic ARM core can incur a one-time license fee in excess of US$200,000. Where significant quantity and architectural modification are involved, the license fee can exceed US$10 million.
ARM believes that its base of 200+ semiconductor licensees gives it a chance to succeed in the ongoing controversies regarding the use of ARM or Intel architectures in mobile computers.
Source: Wikipedia

