Intel Nehalem (microarchitecture)
Initial Nehalem processors use the same 45 nm manufacturing methods as Penryn. A working system with two Nehalem processors was shown at Intel Developer Forum Fall 2007, and a large number of Nehalem systems were shown at Computex in June 2008.
The microarchitecture is named after the Nehalem Native American nation in Oregon. The codename originally referred to what was supposed to be the latest evolution of the NetBurst microarchitecture. Since the abandonment of NetBurst, the codename has been recycled and refers to a completely different project, although Nehalem still has some things in common with NetBurst. Nehalem-based microprocessors use higher clock speeds and are more energy-efficient than Penryn microprocessors. Hyper-Threading is reintroduced, along with an L3 cache, which was missing from most Core-based microprocessors.
The first computer to use Nehalem-based Xeon processors was the Apple Mac Pro workstation announced on March 3, 2009. Nehalem-based Xeon EX processors for larger servers were expected in Q4 2009 based on initial announcements from Intel, but in November 2009 the launch of these processors was pushed back to the first half of 2010.
Mobile Nehalem-based processors were introduced in September 2009.
Technology
Various sources have stated the specifications of processors in the Nehalem family:
- Two or four cores
- 731 million transistors for the quad core variant
- 45 nm manufacturing process
- Integrated memory controller supporting two or three memory channels of DDR3 SDRAM or four FB-DIMM channels
- Integrated graphics processor (IGP) located off-die, but in the same CPU package
- A new point-to-point processor interconnect, the Intel QuickPath Interconnect, in high-end models, replacing the legacy front side bus
- Integration of PCI Express and Direct Media Interface into the processor in mid-range models, replacing the northbridge
- Simultaneous multithreading (SMT), which enables two threads per core; Intel calls this Hyper-Threading. SMT had not been present on a consumer desktop Intel processor since 2006, with the Pentium 4 and Pentium XE, although Intel had reintroduced it with the Atom architecture.
- Native (monolithic, i.e. all processor cores on a single die) quad- and octal-core processors
- The following caches:
  - 32 KB L1 instruction and 32 KB L1 data cache per core
  - 256 KB L2 cache per core
  - 4–8 MB L3 cache shared by all cores
- 33% more in-flight micro-ops than Conroe
- Second-level branch predictor and second-level translation lookaside buffer
- Modular blocks of components such as cores that can be added and subtracted for varying market segments
Performance and power improvements
It has been reported that Nehalem focuses on performance, which accounts for the increased core size. Compared to Penryn, Nehalem was reported to offer:
- 1.1x to 1.25x the single-threaded performance or 1.2x to 2x the multithreaded performance at the same power level
- 30% lower power usage for the same performance
- According to a preview from AnandTech, "expect a 20–30% overall advantage over Penryn with a 10% increase in power usage."
- Per core and clock for clock, a 15–20% increase in performance compared to Penryn.
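The power and performance claims above can be put on a common footing by converting each into a performance-per-watt ratio relative to Penryn. The following is a minimal sketch of that arithmetic; the function name `perf_per_watt_gain` is illustrative and does not come from any of the cited sources.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt, given performance and power ratios
    (both expressed relative to the baseline, here Penryn)."""
    return perf_ratio / power_ratio


# "30% lower power usage for the same performance"
same_perf_less_power = perf_per_watt_gain(1.00, 0.70)

# AnandTech preview: 20-30% faster with a 10% increase in power usage
anandtech_low = perf_per_watt_gain(1.20, 1.10)
anandtech_high = perf_per_watt_gain(1.30, 1.10)

print(f"{same_perf_less_power:.2f}x")                 # 1.43x
print(f"{anandtech_low:.2f}x to {anandtech_high:.2f}x")  # 1.09x to 1.18x
```

The two sets of figures describe different operating points, so they need not agree; the sketch only makes the implied efficiency ratios explicit.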
PC Watch found that a Nehalem "Gainestown" processor has 1.6x the SPECint_rate2006 integer performance and 2.4x the SPECfp_rate2006 floating-point performance of a 3.0 GHz Xeon X5365 "Clovertown" quad-core processor.
A 2.93 GHz Nehalem "Bloomfield" system has been used to run a 3DMark Vantage benchmark and gave a CPU score of 17,966. The 2.66 GHz variant scores 16,294. A 2.4 GHz Core 2 Duo E6600 scores 4,300.
AnandTech tested the Intel QuickPath Interconnect ("QPI", 4.8 GT/s version) and found the copy bandwidth using triple-channel 1066 MHz DDR3 was 12.0 GB/s. A 3.0 GHz Core 2 Quad system using dual-channel 1066 MHz DDR3 achieved 6.9 GB/s.
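For context, the measured copy bandwidths can be compared with the theoretical peak of the memory configuration, which is channels × transfer rate × bus width. The sketch below assumes a 64-bit (8-byte) data bus per DDR3 channel and reads the quoted "1066 MHz" as 1066 million transfers per second; the function name is hypothetical.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: float,
                        bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s.

    Assumes each channel moves `bus_bytes` bytes per transfer
    (8 bytes for a 64-bit DDR3 channel).
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


triple = peak_bandwidth_gb_s(3, 1066)  # Nehalem, triple-channel DDR3-1066
dual = peak_bandwidth_gb_s(2, 1066)    # Core 2 Quad system, dual-channel DDR3-1066

print(f"triple-channel peak: {triple:.1f} GB/s")           # 25.6 GB/s
print(f"dual-channel peak:   {dual:.1f} GB/s")             # 17.1 GB/s
print(f"measured / peak (Nehalem): {12.0 / triple:.0%}")   # 47%
print(f"measured / peak (Core 2):  {6.9 / dual:.0%}")      # 40%
print(f"measured speed-up: {12.0 / 6.9:.2f}x")             # 1.74x
```

A copy benchmark reads and writes memory, so measured figures well below the theoretical peak are expected; the comparison is only a rough guide.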
Overclocking is possible with Bloomfield processors and the X58 chipset. The Lynnfield processor uses a Platform Controller Hub (PCH), removing the need for a northbridge chipset.
The Nehalem processors are the first to incorporate the SSE 4.2 SIMD instructions, adding 7 new instructions to the SSE 4.1 set available in the Core 2 series.
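One of the SSE 4.2 additions is the CRC32 instruction, which accumulates a CRC-32C (Castagnoli polynomial) checksum in hardware. As an illustration of what that checksum computes, the following is a bit-at-a-time software sketch. It is not how the hardware works internally, and the function name `crc32c` and the initial and final inversion (a common software convention applied around the instruction) are assumptions of this example.

```python
# Bit-reflected form of the Castagnoli polynomial 0x1EDC6F41.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C of `data`; pass a previous result as `crc` to continue."""
    crc ^= 0xFFFFFFFF                  # initial inversion (undoes a prior final inversion)
    for byte in data:
        crc ^= byte
        for _ in range(8):             # process one bit at a time, LSB first
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF            # final inversion


# Standard check value for CRC-32C over the ASCII string "123456789".
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Because the function undoes its own final inversion on entry, a long buffer can be checksummed in pieces, for example `crc32c(b"6789", crc32c(b"12345"))`.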
Code names
Each combination of a Nehalem/Westmere processor die and package has both a separate codename and a product code. Typically, the same dies are used for uniprocessor (UP) and dual-processor (DP) servers, with the DP server variant using an extra QuickPath link for inter-processor communication. Where the Core microarchitecture used four different processor sockets, one for each market segment, Nehalem uses Socket 1366 for the high end of both UP and DP machines, and Socket 1156 for the low-end UP machines. The name for the UP version of Gulftown is not yet known; its product code is 80613 and can be found in Intel's product database.
Configuration | Mobile | Desktop, UP Server | DP Server | MP Server |
---|---|---|---|---|
Dual-Core 45 nm, Dual-Channel, PCIe, Graphics Core | Auburndale (canceled) | Havendale (canceled) | | |
Dual-Core 32 nm, Dual-Channel, PCIe, Graphics Core | Arrandale 80617 | Clarkdale 80616 | | |
Quad-Core 45 nm, Dual-Channel, PCIe | Clarksfield 80607 | Lynnfield 80605 | Jasper Forest 80612 | |
Quad-Core 45 nm, Triple-Channel | | Bloomfield 80601 | Gainestown 80602 | |
Six-Core 32 nm, Triple-Channel | | Gulftown 80613 | Gulftown 80614 | |
Eight-Core 45 nm, Triple-Channel | | | | Beckton 80604 |
Variants
These tables list all processors of the Nehalem microarchitecture that have been leaked so far. The tables are ordered roughly by performance, which usually correlates with price and power. Released processors are set in bold.
Notes:
- "Extreme" processors have an unlocked clock multiplier.
- Thermal design power (TDP) values for CPUs with integrated GPUs include the GPU.
- All variants have 64 KiB L1 cache per core, and 256 KiB L2 cache per core.
45 nm processors
Codename | Market | Cores (Threads) | Socket | Brand | Processor No. | Base clock | Core clock | Uncore clock | Turbo | TDP | Chipset interface | Memory | PCIe | L3 cache | Release | 1k Unit Price
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Beckton | MP Server | 8 (16) | LGA-1567 | 130/105/90 W | 4x QPI | 4x [DDR3 with SMB motherboard] | n/a | 24 MB | Q1 2010 | |||||||
Gainestown | DP Server | 4 (8) | LGA-1366 | Xeon | W5580 | 133 MHz | 3.2 GHz | Yes | 130 W | 2x QPI 6.4 GT/s | 3x DDR3-1333 [1] | n/a | 8 MB | March 29, 2009 | $1500 | |
X5570 | 2.93 GHz | 95 W | $1286 | |||||||||||||
X5560 | 2.8 GHz | $1072 | ||||||||||||||
X5550 | 2.66 GHz | $858 | ||||||||||||||
E5540 | 2.53 GHz | 80 W | 2x 5.86 GT/s | 3x DDR3-1066 [1] | $744 | |||||||||||
E5530 | 2.4 GHz | $530 | ||||||||||||||
E5520 | 2.26 GHz | $373 | ||||||||||||||
L5520 | 2.26 GHz | 60 W | $530 | |||||||||||||
4 (4) | E5506 | 2.13 GHz | No | 80 W | 2x 4.8 GT/s | 3x DDR3-800 [1] | 4 MB | $266 | ||||||||
E5504 | 2.0 GHz | $224 | ||||||||||||||
L5506 | 2.13 GHz | 60 W | $423 | |||||||||||||
2 (2) | E5502 | 1.86 GHz | 80 W | $188 | ||||||||||||
Bloomfield | UP Server | 4 (8) | LGA-1366 | Xeon | W3580 | 133 MHz | 3.33 GHz | Yes | 130 W | 1x QPI 6.4 GT/s | 3x DDR3-1333 | n/a | 8 MB | August 9, 2009 | $999 | |
W3570 | 3.2 GHz | March 29, 2009 | $999 | |||||||||||||
W3550 | 3.06 GHz | 1x QPI 4.8 GT/s | 3x DDR3-1066 | August 9, 2009 | $562 | |||||||||||
W3540 | 2.93 GHz | March 29, 2009 | $562 | |||||||||||||
W3520 | 2.66 GHz | $284 | ||||||||||||||
Lynnfield | LGA 1156 | X3470 | 2.93 GHz | 95 W | DMI | 2x DDR3-1333 | September 8, 2009 | $589 | ||||||||
X3460 | 2.8 GHz | $316 | ||||||||||||||
X3450 | 2.66 GHz | $241 | ||||||||||||||
X3440 | 2.53 GHz | $215 | ||||||||||||||
L3426 | 1.86 GHz | 45 W | $284 | |||||||||||||
4 (4) | X3430 | 2.4 GHz | 95 W | $189 | ||||||||||||
Bloomfield | Extreme/Performance Desktop | 4 (8) | LGA-1366 | Core i7 Extreme | 975 | 133 MHz | 3.33 GHz | 2.66 GHz | Yes | 130 W | 1x QPI 6.4 GT/s | 3x DDR3-1066 | n/a | 8 MB | May 31, 2009 | $999 |
965 | 3.2 GHz | November 17, 2008 | $999 | |||||||||||||
Core i7 | 960 | 3.2 GHz | 2.13 GHz | 1x QPI 4.8 GT/s | October 20, 2009 | $562 | ||||||||||
950 | 3.06 GHz | May 31, 2009 | $562 | |||||||||||||
940 | 2.93 GHz | November 17, 2008 | $562 | |||||||||||||
930 | 2.8 GHz | February 28, 2010 | $284 | |||||||||||||
920 | 2.66 GHz | November 17, 2008 | $284 | |||||||||||||
Lynnfield | Performance/Mainstream Desktop | 4 (8) | LGA 1156 | 870 | 133 MHz | 2.93 GHz | 2.4 GHz | Yes | 95 W | DMI | 2x DDR3-1333 | 1x16 / 2x8 | 8 MB | September 8, 2009 | $562 | |
860 | 2.8 GHz | $284 | ||||||||||||||
860S | 2.53 GHz | 82 W | January 7, 2010 | $337 | ||||||||||||
4 (4) | Core i5 | 750 | 2.66 GHz | 2.13 GHz | 95 W | September 8, 2009 | $196 | |||||||||
750S | 2.4 GHz | 82 W | January 7, 2010 | $259 | ||||||||||||
Clarksfield | Extreme/Performance Mobile | 4 (8) | mPGA-989 | Core i7 Extreme | 920XM | 2.0 GHz | 55 W | September 23, 2009 | $1054 | |||||||
Core i7 | 820QM | 1.73 GHz | 45 W | $546 | ||||||||||||
720QM | 1.6 GHz | 6 MB | $364 |
- [1] Although there is only one memory controller, with only three channels, Intel states that the Gainestown processors have six memory channels. Gainestown processors have dual QPI links and a separate set of memory registers for each link, which in effect gives a multiplexed six-channel system.
The Havendale and Auburndale variants (which contained Gilo and Ironlake) have been cancelled.
32 nm processors
Codename | Market | Cores (Threads) | Socket | Brand | Processor No. | Base clock | Core clock | GPU clock | Turbo | TDP | Chipset interface | Memory | PCIe | L3 cache | Release | 1k Unit Price
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gulftown | Extreme/Performance Desktop | 6 (12) | LGA-1366 | Core i7 Extreme | 980 XE | 133 MHz | 3.33 GHz | n/a | Yes | 130 W | 2x QPI | 3x DDR3-1066 | n/a | 12 MB | March 16, 2010 | $999 |
Clarkdale | Mainstream/Value Desktop | 2 (4) | LGA-1156 | Core i5 | 670 | 133 MHz | 3.46 GHz | 733 MHz | Yes | 73 W | DMI | 2x DDR3-1333 | 1 x16 | 4 MB | January 7, 2010 | $284 |
661 | 3.33 GHz | 900 MHz | 87 W | $196 | ||||||||||||
660 | 733 MHz | 73 W | ||||||||||||||
650 | 3.2 GHz | $176 | ||||||||||||||
Core i3 | 540 | 3.06 GHz | No | $133 | ||||||||||||
530 | 2.93 GHz | $113 | ||||||||||||||
2 (2) | Pentium | G6950 | 2.8 GHz | 533 MHz | 2x DDR3-1066 | 3 MB | $87 | |||||||||
Arrandale | Mainstream/Value Mobile | 2 (4) | mPGA-989 | Core i7 | 620M | 133 MHz | 2.66 GHz | 766 MHz | Yes | 35 W | DMI | 2x DDR3-1066 | 1 x16 | 4 MB | January 7, 2010 | $332 |
640LM | 2.13 GHz | 566 MHz | 25 W | $332 | ||||||||||||
620LM | 2.0 GHz | $300 | ||||||||||||||
640UM | 1.2 GHz | 500 MHz | 18 W | 2x DDR3-800 | $305 | |||||||||||
620UM | 1.06 GHz | $278 | ||||||||||||||
Core i5 | 540M | 2.53 GHz | 766 MHz | 35 W | 2x DDR3-1066 | 3 MB | $257 | |||||||||
520M | 2.4 GHz | $225 | ||||||||||||||
520UM | 1.06 GHz | 500 MHz | 18 W | 2x DDR3-800 | $241 | |||||||||||
430M | 2.26 GHz | 766 MHz | 35 W | 2x DDR3-1066 | OEM | |||||||||||
Core i3 | 350M | 2.26 GHz | 667 MHz | No | ||||||||||||
330M | 2.13 GHz | |||||||||||||||
2 (2) | Celeron | P4500 | 1.86 GHz | 500 MHz | 2 MB | Q2 2010 |
For the desktop, Gulftown is to be an "Extreme Edition" CPU and so will coexist with Bloomfield. It will have Turbo Boost and similar clock speeds to Bloomfield.
Lynnfield and Clarksfield may make the 32 nm transition in the middle of 2010, sometime after Q2, while Beckton will move to 32 nm at the end of 2010. The 32 nm CPUs will not have significantly different clock speeds compared to 45 nm CPUs. Clarkdale and Arrandale contain the 32 nm dual core processor Hillel and the 45 nm integrated graphics device Ironlake, and support switchable graphics. The lowest-power variant of Arrandale may have a 10 W CPU TDP, and a maximum clock speed of 1.6 GHz. A successor to Bloomfield and entry level server chips are also expected in Q2 2010.
Westmere
Westmere (formerly Nehalem-C) is the name given to the 32 nm die shrink of Nehalem. Westmere was to be ready for a Q4 2009 release provided that Intel stayed on target with their roadmap. However, the first Westmere-based processors were launched on January 7, 2010 as the Core i3, Core i5, and dual-core mobile Core i7.
Westmere's features and improvements from Nehalem have been reported as follows:
- Native six-core, and possibly dual-die hex-core (12 cores), processors.
- The successor to Bloomfield and Gainestown is six-core.
- A new set of instructions that gives over 3x the encryption and decryption rate of Advanced Encryption Standard (AES) processes compared to before.
- Delivers seven new instructions: six that implement steps of the AES algorithm (the AES instruction set, or AES-NI) and PCLMULQDQ (see CLMUL instruction set), which performs carry-less multiplication for use in cryptography. These instructions allow the processor to perform hardware-accelerated encryption, which not only executes faster but also protects against software-targeted attacks.
- AES-NI may be included in the integrated graphics of Westmere.
- Integrated graphics, released at the same time as the processor.
- Improved virtualization latency.
- New virtualization capability: "VMX Unrestricted mode support," which allows 16-bit guests to run (real mode and big real mode).
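The carry-less multiplication performed by PCLMULQDQ, mentioned in the list above, is ordinary binary long multiplication in which the partial products are combined with XOR instead of addition, so no carries propagate between bit positions. Equivalently, it multiplies two polynomials with coefficients in GF(2). The following is a minimal software sketch; the function name `clmul` is illustrative, and the arbitrary-precision Python integers stand in for the instruction's 64-bit operands and 128-bit result.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers.

    Each set bit of `b` selects a shifted copy of `a`; the copies are
    combined with XOR rather than added, so carries never occur.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


print(clmul(0b11, 0b11))    # 5  (x+1)^2 = x^2 + 1, whereas 3 * 3 = 9
print(clmul(0b111, 0b111))  # 21 (x^2+x+1)^2 = x^4 + x^2 + 1

# Two 64-bit operands always fit in a 128-bit result.
all_ones = (1 << 64) - 1
print(clmul(all_ones, all_ones).bit_length())  # 127
```

Carry-less products of this kind are the building block of GF(2^n) arithmetic, which is why the instruction is grouped with the cryptographic additions.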
Successor
The successor to Nehalem and Westmere will be Sandy Bridge, scheduled for release in 2011, according to Intel roadmaps. The successor to Sandy Bridge will be Haswell, scheduled for release in 2012. It will come with a new cache subsystem, an FMA (fused multiply-add) unit, and a vector coprocessor.
See also
- Core i7
- x86 instruction set architecture
- x86-64
- Xeon
- Pentium
- P6
- NetBurst
- Core
- Sandy Bridge
- Haswell
- List of Intel CPU microarchitectures
External links
- Nehalem processor at Intel.com