
Monday, November 14, 2011

Intel Core i7 3960X (Sandy Bridge E) Review: Keeping the High End Alive

If you look carefully enough, you may notice that things are changing. It first became apparent shortly after the release of Nehalem. Intel bifurcated the performance desktop space by embracing a two-socket strategy, something we'd never seen from Intel and only once from AMD in the early Athlon 64 days (Socket-940 and Socket-754).
LGA-1366 came first, but by the time LGA-1156 arrived a year later it no longer made sense to recommend Intel's high-end Nehalem platform. Lynnfield was nearly as fast and the entire platform was more affordable.
When Sandy Bridge launched earlier this year, all we got was the mainstream desktop version. No one complained because it was fast enough, but we all knew an ultra high-end desktop part was in the works. A true successor to Nehalem's LGA-1366 platform for those who waited all this time.

Left to right: Sandy Bridge E, Gulftown, Sandy Bridge
After some delays, Sandy Bridge E is finally here. The platform is actually pretty simple to talk about. There's a new socket (LGA-2011), a new chipset (Intel's X79) and of course the Sandy Bridge E CPU itself. We'll start with the CPU.


LGA-2011, the new socket
For the desktop, Sandy Bridge E is only available in 6-core configurations at launch. Early next year we'll see a quad-core version. I mention the desktop qualification because Sandy Bridge E is really a die-harvested Sandy Bridge EP, Intel's next-generation Xeon part:

Sandy Bridge E die
If you look carefully at the die shot above, you'll notice that there are actually eight Sandy Bridge cores. The Xeon version will have all eight enabled, but the last two are fused off for SNB-E. The 32nm die is absolutely gigantic by desktop standards. Measuring 20.8 mm x 20.9 mm (~435mm^2), Sandy Bridge E is bigger than most GPUs. It also has a ridiculous number of transistors: 2.27 billion.
Around a quarter of the die is dedicated just to the chip's massive L3 cache. Each cache slice has increased in size compared to Sandy Bridge. Instead of 2MB, Sandy Bridge E boasts 2.5MB cache slices. In its Xeon configuration that works out to 20MB of L3 cache, but for desktops it's only 15MB. That's just 1MB shy of how much system memory my old upgraded 386-SX/20 had.
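The cache and die figures above are simple products, so they are easy to sanity check. The following is a minimal Python sketch that assumes only the numbers quoted in the text (2.5MB per slice, a 20.8 mm x 20.9 mm die); the function names are illustrative and not taken from any Intel documentation.

```python
# Sanity check of the L3 and die-size figures quoted in the text.
# All names here are hypothetical helpers for illustration.

SLICE_MB = 2.5  # L3 slice size on Sandy Bridge E, per the review


def l3_total_mb(active_slices: int, slice_mb: float = SLICE_MB) -> float:
    """Total L3 capacity when every active slice contributes slice_mb."""
    return active_slices * slice_mb


def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular die in square millimetres."""
    return width_mm * height_mm


if __name__ == "__main__":
    print(f"Xeon, 8 slices:    {l3_total_mb(8):.1f} MB")
    print(f"Desktop, 6 slices: {l3_total_mb(6):.1f} MB")
    print(f"Die area:          {die_area_mm2(20.8, 20.9):.1f} mm^2")
```

The computed area comes out just under the rounded ~435mm^2 figure.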
CPU Specification Comparison
CPU | Manufacturing Process | Cores | Transistor Count | Die Size
--- | --- | --- | --- | ---
AMD Bulldozer 8C | 32nm | 8 | ~2B | 315mm^2
AMD Thuban 6C | 45nm | 6 | 904M | 346mm^2
AMD Deneb 4C | 45nm | 4 | 758M | 258mm^2
Intel Gulftown 6C | 32nm | 6 | 1.17B | 240mm^2
Intel Sandy Bridge E (6C) | 32nm | 6 | 2.27B | 435mm^2
Intel Nehalem/Bloomfield 4C | 45nm | 4 | 731M | 263mm^2
Intel Sandy Bridge 4C | 32nm | 4 | 995M | 216mm^2
Intel Lynnfield 4C | 45nm | 4 | 774M | 296mm^2
Intel Clarkdale 2C | 32nm | 2 | 384M | 81mm^2
Intel Sandy Bridge 2C (GT1) | 32nm | 2 | 504M | 131mm^2
Intel Sandy Bridge 2C (GT2) | 32nm | 2 | 624M | 149mm^2
At the core level, Sandy Bridge E is no different than Sandy Bridge. It doesn't clock any higher, L1/L2 caches remain unchanged and per-core performance is identical to what Intel launched earlier this year.

The Lineup
 

Processor | Core Clock | Cores / Threads | L3 Cache | Max Turbo | Max Overclock Multiplier | TDP | Price
--- | --- | --- | --- | --- | --- | --- | ---
Intel Core i7 3960X | 3.3GHz | 6 / 12 | 15MB | 3.9GHz | 57x | 130W | $990
Intel Core i7 3930K | 3.2GHz | 6 / 12 | 12MB | 3.8GHz | 57x | 130W | $555
Intel Core i7 3820 | 3.6GHz | 4 / 8 | 10MB | 3.9GHz | 43x | 130W | TBD
Intel Core i7 2700K | 3.5GHz | 4 / 8 | 8MB | 3.9GHz | 57x | 95W | $332
Intel Core i7 2600K | 3.4GHz | 4 / 8 | 8MB | 3.8GHz | 57x | 95W | $317
Intel Core i7 2600 | 3.4GHz | 4 / 8 | 8MB | 3.8GHz | 42x | 95W | $294
Intel Core i5 2500K | 3.3GHz | 4 / 4 | 6MB | 3.7GHz | 57x | 95W | $216
Intel Core i5 2500 | 3.3GHz | 4 / 4 | 6MB | 3.7GHz | 41x | 95W | $205
Those of you buying today only have two options: the Core i7-3960X and the Core i7-3930K. Both have six fully unlocked cores, but the 3960X gives you a 15MB L3 cache vs. 12MB with the 3930K. You pay handsomely for that extra 3MB of L3. The 3960X goes for $990 in 1K unit quantities, while the 3930K sells for $555.
The 3960X has the same 3.9GHz max turbo frequency as the Core i7 2700K; that's with 1 - 2 cores active. With 5 - 6 cores active the max turbo drops to a respectable 3.6GHz. Unlike the old days of many vs. few core CPUs, there are no tradeoffs for performance when you buy a SNB-E. Thanks to power gating and turbo, you get pretty much the fastest possible clock speeds regardless of workload.
Early next year we'll see a Core i7 3820, priced around $300, with only 4 cores and a 10MB L3. The 3820 will only be partially unlocked (max OC multiplier = 4 bins above max turbo).
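The "4 bins above max turbo" rule for partially unlocked parts can be checked against the lineup table. This is a small Python sketch under two assumptions: one bin equals one multiplier step, and the bus clock is the stock 100MHz. The helper name is made up for illustration.

```python
# Max overclock multiplier for a partially unlocked CPU, assuming:
#   - one "bin" is one multiplier step
#   - the bus clock is the stock 100MHz
# max_oc_multiplier is a hypothetical helper, not an Intel tool.

BCLK_MHZ = 100
EXTRA_BINS = 4  # partially unlocked parts: 4 bins above max turbo


def max_oc_multiplier(max_turbo_ghz: float,
                      bclk_mhz: int = BCLK_MHZ,
                      extra_bins: int = EXTRA_BINS) -> int:
    """Return the highest selectable multiplier for a partially unlocked part."""
    turbo_multiplier = round(max_turbo_ghz * 1000 / bclk_mhz)
    return turbo_multiplier + extra_bins


if __name__ == "__main__":
    for name, turbo_ghz in [("Core i7 3820", 3.9),
                            ("Core i7 2600", 3.8),
                            ("Core i5 2500", 3.7)]:
        multiplier = max_oc_multiplier(turbo_ghz)
        clock_ghz = multiplier * BCLK_MHZ / 1000
        print(f"{name}: {multiplier}x ({clock_ghz:.1f}GHz at {BCLK_MHZ}MHz)")
```

The results line up with the 43x, 42x and 41x entries listed for the 3820, 2600 and 2500.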

No Integrated Graphics, No Quick Sync

All of this growth in die area comes at the expense of one of Sandy Bridge's greatest assets: its integrated graphics core. SNB-E features no on-die GPU, and as a result it does not feature Quick Sync either. Remember that Quick Sync leverages the GPU's shader array to accelerate some of the transcode pipe; without the GPU on SNB-E, there's no Quick Sync.
Given the target market for SNB-E's die donor (Xeon servers), further increasing the die area by including an on-die GPU doesn't seem to make sense. Unfortunately, desktop users suffer as they lose a very efficient way to transcode videos. Intel argues that you do have more cores to chew through frames with, but the fact remains that Quick Sync frees up your cores to do other things while SNB-E requires that they're all tied up in (quickly) transcoding video. If you don't run any Quick Sync enabled transcoding applications, you won't miss the feature on SNB-E. If you do, however, this will be a tradeoff you'll have to come to terms with.

Tons of PCIe and Memory Bandwidth

Occupying the die area where the GPU would normally be is SNB-E's new memory controller. While its predecessor featured a fairly standard dual-channel DDR3 memory controller, SNB-E features four 64-bit DDR3 memory channels. With a single DDR3 DIMM per channel, Intel officially supports speeds of up to DDR3-1600; with two DIMMs per channel, the max official speed drops to DDR3-1333.
With a quad-channel memory controller you'll have to install DIMMs four at a time to take full advantage of the bandwidth. In response, memory vendors are selling 4 and 8 DIMM kits specifically for SNB-E systems. Most high-end X79 motherboards feature 8 DIMM slots (2 per channel). Just as with previous architectures, installing fewer DIMMs is possible; it simply reduces the peak available memory bandwidth.
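To see how much peak bandwidth is at stake when channels are left empty, here is a rough Python sketch. It assumes the usual back-of-the-envelope formula (channels x 8 bytes per transfer x transfers per second) and decimal gigabytes; the function name is illustrative.

```python
# Theoretical peak DDR3 bandwidth versus populated channels.
# Assumes each 64-bit channel moves 8 bytes per transfer and
# 1 GB = 10^9 bytes. peak_bandwidth_gbs is a hypothetical helper.

BYTES_PER_TRANSFER = 8  # 64-bit channel


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s for the given configuration."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s / 1000


if __name__ == "__main__":
    # One DIMM per channel at the officially supported DDR3-1600.
    for channels in range(1, 5):
        gbs = peak_bandwidth_gbs(channels, 1600)
        print(f"{channels} channel(s), DDR3-1600: {gbs:.1f} GB/s")

    # Two DIMMs per channel: official speed drops to DDR3-1333.
    print(f"4 channels, DDR3-1333: {peak_bandwidth_gbs(4, 1333):.1f} GB/s")
```

These are ceilings only; real-world throughput is lower.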
Intel increased bandwidth on the other side of the chip as well. A single SNB-E CPU features 40 PCIe lanes that are compliant with rev 3.0 of the PCI Express Base Specification (aka PCIe 3.0). With no PCIe 3.0 GPUs available (yet) to test and validate the interface, Intel lists PCIe 3.0 support in the chip's datasheet but is publicly guaranteeing PCIe 2.0 speeds. Intel does add that some PCIe devices may be able to operate at Gen 3 speeds, but we'll have to wait and see once those devices hit the market.
The PCIe lanes off the CPU are quite configurable as you can see from the diagram above. Users running dual-GPU setups can enjoy the fact that both GPUs will have a full x16 interface to SNB-E (vs x8 in SNB). If you're looking for this to deliver a tangible performance increase, you'll be disappointed:
Multi GPU Scaling - Radeon HD 5870 CF
Max Quality, 4X AA/16X AF | Metro 2033 (19x12) | Crysis: Warhead (19x12) | Crysis: Warhead (25x16)
--- | --- | --- | ---
Intel Core i7 3960X (2 x16) | 1.87x | 1.80x | 1.90x
Intel Core i7 2600K (2 x8) | 1.94x | 1.80x | 1.88x
Modern GPUs don't lose much performance in games, even at high quality settings, when going from a x16 to a x8 slot.
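For context on what x16 versus x8 and Gen 2 versus Gen 3 mean in raw link bandwidth, here is a short Python sketch. The signalling rates and line encodings (5 GT/s with 8b/10b for PCIe 2.0, 8 GT/s with 128b/130b for PCIe 3.0) come from the PCI Express specifications rather than from this review, and the function names are illustrative. Protocol overhead above the line encoding is ignored.

```python
# Per-direction PCIe link bandwidth from signalling rate and encoding.
# Rates/encodings are taken from the PCIe specs (an assumption relative
# to this article); helper names are hypothetical.

from fractions import Fraction

# generation -> (GT/s per lane, line-encoding efficiency)
PCIE_GENERATIONS = {
    "2.0": (5, Fraction(8, 10)),     # 8b/10b encoding
    "3.0": (8, Fraction(128, 130)),  # 128b/130b encoding
}


def lane_bandwidth_mbs(generation: str) -> float:
    """Usable bandwidth of one lane, one direction, in MB/s."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    # GT/s * efficiency = Gbit/s of payload; /8 gives GB/s, *1000 MB/s.
    return float(gt_per_s * efficiency * 1000 / 8)


def link_bandwidth_gbs(generation: str, lanes: int) -> float:
    """Usable bandwidth of a whole link, one direction, in GB/s."""
    return lane_bandwidth_mbs(generation) * lanes / 1000


if __name__ == "__main__":
    for generation in ("2.0", "3.0"):
        for lanes in (8, 16):
            gbs = link_bandwidth_gbs(generation, lanes)
            print(f"PCIe {generation} x{lanes}: {gbs:.2f} GB/s per direction")
```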
I tested PCIe performance with an OCZ Z-Drive R4 PCIe SSD to ensure nothing was lost in the move to the new architecture. Compared to X58, I saw no real deltas in transfers to/from the Z-Drive R4:
PCI Express Performance - OCZ Z-Drive R4, Large Block Sequential Speed - ATTO
Operation | Intel X58 | Intel X79
--- | --- | ---
Read | 2.62 GB/s | 2.66 GB/s
Write | 2.49 GB/s | 2.50 GB/s

The Letdown: No SAS, No Native USB 3.0

Intel's current RST (Rapid Storage Technology) drivers don't support X79; however, Intel's RSTe (for enterprise) 3.0 will support the platform once available. We got our hands on an engineering build of the software, which identifies the X79's SATA controller as an Intel C600:
Intel's enterprise chipsets use the Cxxx nomenclature, so this label makes sense. A quick look at Intel's RSTe readme tells us a little more about Intel's C600 controller:
SCU Controllers:
- Intel(R) C600 series chipset SAS RAID (SATA mode) Controller
- Intel C600 series chipset SAS RAID Controller
SATA RAID Controllers:
- Intel(R) C600 series chipset SATA RAID Controller
SATA AHCI Controllers:
- Intel(R) C600 series chipset SATA AHCI Controller
As was originally rumored, X79 was supposed to support both SATA and SAS. Issues with the implementation of the latter forced Intel to kill SAS support and go with the same 4+2 3Gbps/6Gbps SATA implementation 6-series chipset users get. I would've at least liked to have had more 6Gbps SATA ports. It's quite disappointing to see Intel's flagship chipset lacking feature parity with AMD's year-old 8-series chipsets.
I ran a sanity test on Intel's X79 against some of our H67 data for SATA performance with a Crucial m4 SSD. It looks like 6Gbps SATA performance is identical to the mainstream Sandy Bridge platform:
6Gbps SATA Performance - Crucial m4 256GB (FW0009)
Chipset | 4KB Random Write (8GB LBA, QD32) | 4KB Random Read (100% LBA, QD3) | 128KB Sequential Write | 128KB Sequential Read
--- | --- | --- | --- | ---
Intel X79 | 231.4 MB/s | 57.6 MB/s | 273.3 MB/s | 381.7 MB/s
Intel Z68 | 234.0 MB/s | 59.0 MB/s | 269.7 MB/s | 372.1 MB/s
Intel still hasn't delivered an integrated USB 3.0 controller in X79. Motherboard manufacturers will continue to use 3rd party solutions to enable USB 3.0 support.

Overclocking

Sandy Bridge brought the motherboard's clock generator onto the 6-series chipset die. In doing so, Intel also locked its operation to 100MHz. While there was a bit of wiggle room, when combined with a locked processor, Intel effectively killed overclocking with most lower end Sandy Bridge chips.
For its more expensive CPUs, Intel offered either partially or fully unlocked (K-series) CPUs. The bus clock was still fixed at 100MHz, but you could overclock your processor by increasing its clock multiplier just like you could in the early days of overclocking.
With Sandy Bridge E, overclocking changes a bit. The clock generator is still mostly impervious to significant bus clock changes; however, you're now able to send a multiple of its frequency to the CPU if you so desire. The options available are 100MHz, 125MHz, 166MHz and 250MHz.
Once again, wiggle room at any of these frequencies is limited so don't think we've moved back to the days of bus overclocking. You do get a little more flexibility, particularly with partially unlocked CPUs, but otherwise SNB-E overclocking is hardly any different from its predecessor.
Note that even if you select any of these options, the rest of the system still operates within spec. The multiplied bus clock is only fed to the CPU.
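To illustrate why the extra reference clock options matter most for partially unlocked CPUs, here is a hedged Python sketch. It assumes core clock = selected reference clock x multiplier, uses the four options listed above, and ignores whether a given combination is actually stable on real hardware. The function names and the 0.05GHz tolerance are arbitrary choices for the example.

```python
# Which (reference clock, multiplier) pairs land near a target frequency?
# Assumes core clock = reference clock * multiplier. Stability, voltage
# and board support are not modelled. All names are hypothetical.

REFERENCE_CLOCKS_MHZ = (100, 125, 166, 250)  # options listed in the text


def core_clock_ghz(reference_mhz: int, multiplier: int) -> float:
    """Core frequency in GHz for a reference clock and multiplier."""
    return reference_mhz * multiplier / 1000


def options_for_target(target_ghz: float, max_multiplier: int,
                       tolerance_ghz: float = 0.05):
    """List (reference_mhz, multiplier, clock_ghz) within tolerance of target."""
    options = []
    for reference_mhz in REFERENCE_CLOCKS_MHZ:
        multiplier = round(target_ghz * 1000 / reference_mhz)
        if not 1 <= multiplier <= max_multiplier:
            continue  # multiplier not selectable on this CPU
        clock = core_clock_ghz(reference_mhz, multiplier)
        if abs(clock - target_ghz) <= tolerance_ghz:
            options.append((reference_mhz, multiplier, clock))
    return options


if __name__ == "__main__":
    # Partially unlocked part capped at 43x vs a fully unlocked 57x part.
    for label, cap in [("43x cap", 43), ("57x cap", 57)]:
        print(label)
        for reference_mhz, multiplier, clock in options_for_target(4.6, cap):
            print(f"  {reference_mhz}MHz x {multiplier} = {clock:.3f}GHz")
```

With a 43x cap the 100MHz setting cannot reach 4.6GHz at all, while the 125MHz setting gets there with a 37x multiplier.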
With a bit of effort I had no problems hitting 4.6GHz on my Core i7 3960X review sample. I had to increase core voltage from 1.104V to 1.44V, but the system was stable. While I could get into Windows at 4.8GHz and run a few benchmarks, the system wasn't completely stable.

No Cooler Included

None of the retail or OEM SNB-E parts include an Intel cooler in the bundle, a significant departure from previous CPUs. Presumably the cost of bundling a beefy cooler with these parts would've driven prices higher than Intel would've liked (remember you are getting a much larger die for roughly the same price as the outgoing Core i7 990X). Intel can also rationalize its decision against including any sort of cooler in the retail box by looking at the fact that many enthusiasts at this level opt for aftermarket cooling regardless.
Intel hasn't completely left SNB-E cooling up to 3rd party vendors however. There are two official Intel coolers available for use with SNB-E. The first is a < $20 heatsink that looks a lot like Intel's current coolers but with a couple of modifications (clear fan/shroud, retention screws instead of pegs). Intel states that this cooler is designed for operation within spec, meaning it could possibly limit overclocking attempts.
If you want an Intel branded overclocking solution, there's the RTS2011LC:
This is a closed loop liquid cooling solution similar to what AMD introduced alongside its Bulldozer CPU and similar to what many 3rd party cooling companies already offer. Intel expects its liquid cooling solution to be priced somewhere in the $85 - $100 range.
These closed loop liquid coolers are great primarily for getting away from the tower-of-metal heatsinks that have grown in popularity over the past several years. The radiator is too small to compete with more traditional water cooling systems, but it can be a good gateway drug for the risk averse.

The Test

To keep the review length manageable we're presenting a subset of our results here. For all benchmark results and even more comparisons be sure to use our performance comparison tool: Bench.
Motherboard: ASUS P8Z68-V Pro (Intel Z68)
ASUS Crosshair V Formula (AMD 990FX)
Intel DX79SI (Intel X79)
Hard Disk: Intel X25-M SSD (80GB)
Crucial RealSSD C300
Memory: 4 x 4GB G.Skill Ripjaws X DDR3-1600 9-9-9-20
Video Card: ATI Radeon HD 5870 (Windows 7)
Video Drivers: AMD Catalyst 11.10 Beta (Windows 7)
Desktop Resolution: 1920 x 1200
OS: Windows 7 x64

Cache and Memory Bandwidth Performance

The biggest changes from the original Sandy Bridge are the increased L3 cache size and the quad-channel memory interface. We'll first look at the impact a 15MB L3 has on latency:
Cache/Memory Latency Comparison (latency in clock cycles)
CPU | L1 | L2 | L3 | Main Memory
--- | --- | --- | --- | ---
AMD FX-8150 (3.6GHz) | 4 | 21 | 65 | 195
AMD Phenom II X4 975 BE (3.6GHz) | 3 | 15 | 59 | 182
AMD Phenom II X6 1100T (3.3GHz) | 3 | 14 | 55 | 157
Intel Core i5 2500K (3.3GHz) | 4 | 11 | 25 | 148
Intel Core i7 3960X (3.3GHz) | 4 | 11 | 30 | 167
Cachemem shows us a 5 cycle increase in L3 latency. If this is correct, hits in L3 can take 20% longer to get to the core that requested the data. For small, lightly threaded applications, you may see a slight regression in performance compared to Sandy Bridge. More likely than not, however, the ~2 - 2.5x increase in L3 cache size will more than make up for the added latency. Also note that despite the large cache, and thanks to its ring bus, Sandy Bridge E's L3 is still lower latency than Gulftown's.
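The 20% figure is easy to reproduce, and the cycle counts can also be turned into rough wall-clock latency. The Python sketch below assumes the reported cycles are counted at the nominal 3.3GHz core clock, which the review does not state explicitly; the helper names are illustrative.

```python
# Relative L3 latency increase and a rough cycles-to-nanoseconds
# conversion. Assumes cycles are counted at the nominal core clock,
# which is an assumption, not something the measurements confirm.


def percent_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) * 100 / old


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles / clock_ghz


if __name__ == "__main__":
    l3_2500k, l3_3960x = 25, 30  # L3 latency in cycles, from the table
    clock_ghz = 3.3              # both chips listed at 3.3GHz

    print(f"L3 latency increase: {percent_increase(l3_2500k, l3_3960x):.0f}%")
    print(f"2500K L3: {cycles_to_ns(l3_2500k, clock_ghz):.2f} ns")
    print(f"3960X L3: {cycles_to_ns(l3_3960x, clock_ghz):.2f} ns")
```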
Memory Bandwidth Comparison - Sandra 2012.01.18.10
CPU | Memory Configuration | Aggregate Memory Bandwidth
--- | --- | ---
Intel Core i7 3960X | Quad Channel, DDR3-1600 | 37.0 GB/s
Intel Core i7 2600K | Dual Channel, DDR3-1600 | 21.2 GB/s
Intel Core i7 990X | Triple Channel, DDR3-1333 | 19.9 GB/s
Memory bandwidth is also up significantly. Populating all four channels with DDR3-1600 memory, Sandy Bridge E delivered 37GB/s of bandwidth in Sandra's memory bandwidth test. Given the 51GB/s theoretical max of this configuration and a fairly standard 20% overhead, 37GB/s is just about what we want to see here.
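As a rough cross-check of the measured figures against their theoretical ceilings, the Python sketch below computes the fraction of peak bandwidth each platform achieved in Sandra. It assumes 8 bytes per transfer per channel and decimal gigabytes; the measured values are the ones reported in the table, and the names are illustrative.

```python
# Measured Sandra bandwidth as a fraction of theoretical peak.
# Assumes 64-bit channels (8 bytes/transfer) and 1 GB = 10^9 bytes.
# Names are hypothetical; measured numbers come from the table.

BYTES_PER_TRANSFER = 8

# (CPU, channels, DDR3 speed in MT/s, measured GB/s)
RESULTS = [
    ("Core i7 3960X", 4, 1600, 37.0),
    ("Core i7 2600K", 2, 1600, 21.2),
    ("Core i7 990X", 3, 1333, 19.9),
]


def theoretical_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s / 1000


def efficiency(measured_gbs: float, channels: int,
               mega_transfers_per_s: int) -> float:
    """Measured bandwidth divided by the theoretical peak."""
    return measured_gbs / theoretical_gbs(channels, mega_transfers_per_s)


if __name__ == "__main__":
    for name, channels, speed, measured in RESULTS:
        peak = theoretical_gbs(channels, speed)
        share = efficiency(measured, channels, speed)
        print(f"{name}: {measured:.1f} of {peak:.1f} GB/s ({share:.0%})")
```

For the 3960X this works out to roughly 72% of the 51.2GB/s peak.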



Power Consumption

At idle, the 3960X's power consumption is barely discernible from the 2600K's. Under load, however, Sandy Bridge E can draw significantly more power. We measured 35% more power draw than a 2600K. The added power consumption makes sense: the chip has more cores and a larger cache, without introducing a more power efficient architecture or a new manufacturing process.
Power Consumption - Idle
Power Consumption - Load (x264 HD 3.03 2nd Pass)


Overclocked Performance

I mentioned earlier that I hit 4.6GHz on my 3960X sample. If you're curious about just how fast that makes the system, have a look at this:
Overclocked: x264 HD Benchmark - 2nd pass - v3.03
The 3960X at 4.6GHz is almost twice as fast as the Core i7 2600K! The added performance does come at the expense of power consumption:
Overclocked Power Consumption - Load (x264 HD 3.03 2nd Pass)


Final Words

There are two aspects of today's launch that bother me: the lack of Quick Sync and the chipset. The former is easy to understand. Sandy Bridge E is supposed to be a no-compromise, ultra high-end desktop solution. The lack of an on-die GPU with Quick Sync support means you have to inherently compromise in adopting the platform. I'm not sure what sort of a solution Intel could've come to (I wouldn't want to give up a pair of cores for a GPU+QuickSync) but I don't like performance/functionality tradeoffs with this class of product. Secondly, while I'm not a SAS user, I would've at least appreciated some more 6Gbps SATA ports on the chipset. Native USB 3.0 support would've been nice as well. Instead what we got was effectively a 6-series chipset with a new name. As Intel's flagship chipset, the X79 falls short.

From left to right: Intel Core i7 (SNB-E), Core i7 (Gulftown), Core i5 (SNB), Core i5 (Clarkdale), Core 2 Duo
LGA-2011, 1366, 1155, 1156, 775
The vast majority of desktop users, even enthusiast-class users, will likely have no need for Sandy Bridge E. The Core i7 3960X may be the world's fastest desktop CPU, but it really requires a heavily threaded workload to prove it. What the 3960X doesn't do is make your gaming experience any better or speed up the majority of desktop applications. The 3960X won't be any slower than the fastest Sandy Bridge CPUs, but it won't be tremendously faster either. The desktop market is clearly well served by Intel's LGA-1155 platform (and its lineage); LGA-2011 is simply a platform for users who need a true powerhouse.
There are no surprises there; we came to the same conclusion when we reviewed Intel's first 6-core CPU last year. If you do happen to have a heavily threaded workload that needs the absolute best performance, the Core i7 3960X can deliver. In our most thread heavy tests the 3960X had no problems outpacing the Core i7 2600K by over 50%. If your livelihood depends on it, the 3960X is worth its entry fee. I suspect for those same workloads, the 3930K will be a good balance of price/performance despite having a smaller L3 cache. I'm not terribly interested in next year's Core i7 3820. Its point is obviously for those users who need the memory bandwidth or PCIe lanes of SNB-E, but don't need more than four cores. I would've liked to have seen a value 6-core offering instead, but I guess with a 435mm^2 die size it's a tough sell for Intel management.
Of course compute isn't the only advantage of the Sandy Bridge E platform. With eight DIMM slots on most high end LGA-2011 motherboards you'll be able to throw tons of memory at your system if you need it without having to shop for workstation motherboards with fewer frills.
As for the future of the platform, Intel has already begun talking about Ivy Bridge E. If it follows the pattern set for Ivy Bridge on LGA-1155, IVB-E should be a drop-in replacement for LGA-2011 motherboards. The biggest issue there is timing. Ivy will arrive for the mainstream LGA-1155 platforms around the middle of 2012. At earliest, I don't know that we'd see it for LGA-2011 until the end of next year, or perhaps even early 2013 given the late launch of SNB-E. This seems to be the long-term downside to these ultra high-end desktop platforms these days: you end up on a delayed release cadence for each tick/tock on the roadmap. If you've always got to have the latest and greatest, this may prove to be frustrating. Based on what we know of Ivy Bridge, however, I suspect that if you're using all six of these cores in SNB-E you'll wish you had IVB-E sooner, but won't be tempted away from the platform by a quad-core Ivy Bridge on LGA-1155.
I do worry about the long term viability of the ultra high-end desktop platform. As we showed here, some of the gains in threaded apps exceed 50% over a standard Sandy Bridge. That's tangible performance to those who can use it. With the growth in cloud computing it's clear there's demand for these types of chips in servers. I just hope Intel continues to offer a version for desktop users as well.



Source: AnandTech
