Opinions
Last updated: August 10, 2009
A friend once said that opinions are like keisters: everyone has one. Below are some from our overeducated staff. If you have an opinion or question and wish to share it, send an email to: admin@10gbe.net. We'll ignore it for a week or two, then ponder over it for another, then just when you've forgotten what you've written we'll consider adding your thoughts with some credit, provided it's a reasonably decent idea.
The 10GbE NIC Herd is Thinning
In 2007 over one million 10GbE network ports were purchased. Many of those were for switch-to-switch interconnects, but some were to connect servers to networks via 10GbE. Natural selection is now taking effect in the 10GbE NIC market as the big dogs, Intel & Broadcom, start thrashing around in an effort to secure market share as 10GbE matures. Both want to dominate the 10GbE LAN on Motherboard (LoM) market. In the NIC market four companies likely supply over 80% of the 10GbE NICs purchased: Chelsio, Myricom, Intel and Neterion. The remaining 20% of NIC sales fall to companies like Broadcom, SMC, NetXen, ServerEngines, Tehuti, AdvancedIO, Endace, Napatech, etc. One might wonder why Broadcom is in the second group. It's because Broadcom's focus is on selling 10GbE silicon to OEMs like IBM and HP for LoM projects, positioning its silicon on high-end server motherboards, and not on retailing NICs.
Officially the first documented victim was NetEffect in August of 2008. NetEffect was the leader in iWarp (Infiniband for 10GbE) NICs. NetEffect rose from the ashes of a failed Infiniband company, Banderacom, earlier this decade to apply their silicon development skills and Infiniband algorithms to the more stable Ethernet market as a new feature called iWarp. NetEffect in fact led the iWarp charge; it was the self-proclaimed leader in low-latency iWarp 10GbE NICs. In August NetEffect filed for reorganization in US Bankruptcy Court. With the failure of NetEffect the market has cast its vote and driven a stake through the heart of iWarp, hopefully terminating this unnecessary feature.
Teak Technologies, we believe, was the second. A maker of 10GbE NICs and a switch, Teak has for some time appeared to have winked off-line. It appears that Teak has not weathered the storm and has since faded away; their domain name no longer resolves to an IP address. The domain was never transferred from the founder, and the founder announced last spring on LinkedIn that he had moved on some time ago. Is it conclusive evidence? No, but would you buy technology from a tech company whose URL won't resolve to a server?
The third fatality was NetXen, which was acquired on April 30th 2009 by QLogic for $21M. Keep in mind SEC filings show that NetXen had raised at least $57M in capital from Benchmark and several other firms. NetXen was one of the first 10GbE NIC startups and had some initial success securing OEM wins with IBM and HP. Clearly with nearly $60 million in and roughly $20 million out, Benchmark and others burned roughly $2 of every $3 invested. NetXen was also a founding board member of Blade.org. Even with all this marketing they couldn't reach profitability.
Number four has yet to be announced, but our money is on a company whose name starts with 'N' and who turned down a substantial offer earlier last year.
It is a tough economic climate for startup NIC companies, particularly those in the bottom 20% as they have likely never had a quarter in the black. Now is a challenging time to be out there seeking another round of capital from one's VCs. Several have been without an injection of new funding for over two years and lack the sales volume required to sustain their own existence much beyond year end. As such we've directly questioned one firm to see if they are alive, and another that is widely rumored in the industry to be in trouble, but their marketing departments are still bailing. Once something is published validating this rumor we'll update this post.
Hold up there sonny, you're doing 9.958Gbps in a 10G zone, what's the hurry?
Someone said the other day that we were wrong and that the speed limit for 10GbE was really 10.3125Gbps, so dumbstruck we needed to validate our claim of 9.95Gbps.
Most NICs talk XAUI to a PHY driver chip that then controls the media. XAUI has four pairs of receive/transmit lanes that each operate at 3.125Gbps and utilize 8b/10b encoding. That is 12.5Gbps of signaling, so after decoding you have 10.000Gbps of actual usable media bandwidth, hence 10G. Ethernet adds its own shipping and handling fee though, so people who measure using the operating system (OS) on their server will see what Ethernet allows them.
Ethernet is then traveling over XAUI and it requires an interframe gap (IFG) spacing of 96 bits, more precisely the time it would take to transmit 96 bits (actually 12 octets) which on 10GbE is 9.6ns. So between every packet there is 9.6ns of following distance, essentially that three second following rule you taught your kids when they learned to drive.
Ethernet also requires a seven byte preamble and a one byte start of frame delimiter; that's 64 more bits or another 6.4ns of air in front of your packet. So between any two actual "packets" on a 10GbE wire there is a total of 16.0ns of dead air. Well, how does that impact the actual OS measurable bandwidth? But wait there's more... the actual 802.3 frame has the destination address (48 bits), the source address (another 48 bits), the ethertype (another 16 bits) and, after your precious data, a 32 bit CRC. So when you take all this overhead into account, to send a simple 46 byte payload (your data) there are 38 total bytes of overhead. How does this impact the actual space on the wire for your stuff:
- The classical 64 byte ethernet packet carries 46 bytes of user data, add back in the 38 bytes of overhead mentioned above and the efficiency of the wire becomes 46/84 or 54.76%. So if you're putting small ninja bike sized packets on a 10GbE wire you will NEVER get more than 5.476 Gbps of unidirectional data out, it's the law!
- For the more traditional 1,500 byte packet it turns out the 1,500 byte number does NOT include the source/destination/type & CRC bits so the ratio of actual data to the whole is 1500/1538. This yields a more reasonable 97.53% efficiency or 9.753 Gbps for the more typical minivan sized packet.
- Finally there is the big daddy, the tractor-trailer of ethernet packets or the 9,000 byte jumbo. It's treated like the 1,500 so the efficiency here is 9000/9038, 99.58% or 9.958 Gbps. Some switches don't support jumbos so be careful when you purchase and do your bake-offs.
So when someone asks why they can only get 5.5Gbps out of their fancy new 10GbE card you can ask with confidence, "Are you using 64-byte ethernet frames? If so, did you know you'll never get more than 5.5Gbps?" In reality most traffic utilizes 1,500 byte frames so the maximum user available bandwidth is typically 9.753 Gbps. Save that one though for dinner party conversation.
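For anyone who would rather check our math than take our word for it, here is a minimal Python sketch of the arithmetic above. The function and constant names are ours, made up for illustration; the byte counts are the ones itemized in the text, and the 10.0 Gbps line rate is the post-decoding XAUI figure.

```python
"""Back-of-the-envelope 10GbE payload efficiency, using the byte counts from the text."""

# Bytes of wire time spent on every frame that are not user payload.
OVERHEAD_BYTES = {
    "interframe gap": 12,
    "preamble + start-of-frame delimiter": 8,
    "destination address": 6,
    "source address": 6,
    "ethertype": 2,
    "CRC": 4,
}
TOTAL_OVERHEAD = sum(OVERHEAD_BYTES.values())  # 38 bytes


def xaui_data_rate_gbps(lanes=4, lane_signal_gbps=3.125):
    """Usable XAUI bandwidth after 8b/10b decoding (8 data bits per 10 signal bits)."""
    return lanes * lane_signal_gbps * 8 / 10


def wire_efficiency(payload_bytes):
    """Fraction of wire time that carries user payload."""
    return payload_bytes / (payload_bytes + TOTAL_OVERHEAD)


def max_payload_gbps(payload_bytes, line_rate_gbps=10.0):
    """Upper bound on unidirectional user data for a given payload size."""
    return line_rate_gbps * wire_efficiency(payload_bytes)


if __name__ == "__main__":
    print(f"XAUI after decoding: {xaui_data_rate_gbps():.3f} Gbps")
    for payload in (46, 1500, 9000):
        print(f"{payload:>5} byte payload: {wire_efficiency(payload):.2%} efficient, "
              f"{max_payload_gbps(payload):.3f} Gbps max")
```

Swap in your own payload sizes to see where your traffic mix lands. Keep in mind this only counts Ethernet framing; IP and TCP headers eat into the payload too, and we have left them out here.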
Dualies aren't just for trucks or what is a Dual-port NIC?
One would think that after 30 years our industry would have developed a NIC naming convention for "dual-port". Does a dual-port NIC mean your OS sees one or two interfaces? Do dual-port NICs mean that one port is active and the other is for fail-over? Can a dual-port NIC run traffic through both ports simultaneously? It all depends on who you talk to, and the product they're selling.
With 10GbE we've seen three main approaches for building dual-port NICs:
- Active/Active: this is what most people expect, a single OS interface with a driver that sprays traffic fairly evenly across both network ports and if one port fails the other picks up the slack until it can handle no more.
- Dual-NIC: two OS interfaces are presented to the OS and both interfaces run independently. This typically affords the best performance and the most flexibility:
- Myricom's 10G-PCIE2-8B2-2S+E for $995 appears to be the only example of this approach. Myricom utilizes two unique 10GbE controllers on the same PCI Express Gen2 NIC and a PCI Express bridge chip to break the slot into two unique NIC devices.
- Active/Passive or Active/Fail-over: a single OS interface with a driver that monitors connectivity on the active port and if the connection fails the driver migrates traffic rapidly over to the second port:
- Myricom's 10G-PCIE-8B-2S+E for $795 is an example of this type of card. The fail over time is under 10 microseconds.
- Chelsio's B320E Bypass adapter for $3,483 is similar but it can detect an OS/BIOS/System failure and make a hard switch over to the second port.
Do the above categories cover it, or do we need more lingo? When looking for a dual-port NIC, what features do you require, and what do you expect? Please let us know: admin@10gbe.net.
The 800Lbs Infiniband Gorilla has Left the Zoo
Well as predicted earlier this year Infiniband is now beginning to fade away. Two new trends validate this assertion: Cisco's actions over the summer and Voltaire's revenue announcement earlier today.
Cisco is the 800lb network gorilla. Early this spring the gorilla introduced their Nexus series of high performance 10GbE switches, then began to dismantle their Infiniband business. Over the past three decades Ethernet has crushed every protocol in its path and Infiniband is no different. The Ethernet Gorilla is shoveling coal as fast as it can into the 10G Ethernet revenue train and it's gaining speed fast. Soon Infiniband will be just another footnote in networking history. Throughout the summer and early fall Cisco has quietly been publishing End-of-Life and End-of-Sale notices for all their Infiniband (SFS) products. Their sales team has also been telling Infiniband customers to consider moving to 10GbE. The financial market is one of Cisco's biggest revenue sources, and few if any financial customers have ever invested significantly in Infiniband. Externally, a novice looking in would see this market as being in trouble, but in fact the converse is true. To remain competitive these financial houses, and the markets they trade in, must periodically upgrade their enormous infrastructures. Gigabit Ethernet (GbE) is long in the tooth, having been the network for the last few cycles. ALL these financial houses and markets are testing, deciding and planning on what they are going to use later this year and next. The migration from GbE has begun and the destination is 10GbE, not Infiniband.
Earlier today, October 8th, Voltaire announced that revenues missed projections by nearly 20% in the third quarter. Why? The world financial markets were the reason offered up. Now by itself this may make sense, but coupled with Cisco's actions as outlined above it's validation that without the 800lb gorilla pushing Infiniband many have lost faith in its future.
Mellanox, the leading supplier of Infiniband silicon, is due to announce their results later this month, October 23. This event should validate that Infiniband's days are numbered, but is it a two or three digit number?
The Mummy Roaming Your Data Center
While Brendan Fraser travels China in his latest quest to terminate yet another mummy, IT leaders are starting to wonder if they've got a mummy of their own haunting their raised floor. This mummy is easy to find: he's wrapped in thick black copper cables, and his long fingers may be attached to many of your servers. It is Infiniband!
Once praised as the next generation networking technology, having conquered High Performance Computing, it continued its battle for world networking domination by attacking storage and now the data center. It promised you 20Gbps, hinted that it would soon offer 40Gbps and shared with you its plans for 160Gbps! It claimed full bi-section, the ability to use all the network capacity available, and low latency (the time it takes to actually move a packet of data around). It's democratic, the software stack was developed by an "open" committee of great technological leaders so it MUST be good for us. Everyone from HP to SGI has sung its praises whenever they've come by to peddle the latest in server technology. A corpse wrapped in rags, a centuries old immortal Dragon Emperor or a black cable bandit, they all can be eradicated.
We will tear this black cable bandit down to size one claim at a time. First they assert that it's 20Gbps. How about 12Gbps on its best day with all the electrons flowing in the same direction? Infiniband employs what is known as 8b/10b encoding to put the bits on the wire. For every 10 signal bits there are 8 useful data bits. Ethernet uses the same method; the difference is that Ethernet for the past 30 years has advertised the actual data rate while Infiniband promotes the 25% larger and useless signal rate. Using Infiniband math Ethernet would then be 12.5Gbps instead of the 10Gbps it actually is. So using Ethernet math Infiniband's Double Data Rate (DDR) is actually only 16Gbps and not the 20Gbps they claim. But wait there's more! I said earlier that you will only get 12Gbps under ideal conditions, so where did the other 4Gbps go? Today most servers use PCIe 1.1 8-lane I/O slots. Ideally these are 16Gbps slots, but once you add in PCIe overhead you only get about 12Gbps on the best of systems. So with a straight face they sell you 20Gbps knowing in their heart you'll never get more than 12Gbps, and that is rarely ever possible once you use more than 12 servers.
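To make the "Infiniband math" versus "Ethernet math" distinction concrete, here is a small Python sketch. The helper names are ours, invented for illustration; the 20Gbps, 10Gbps and 12Gbps figures come straight from the paragraph above, and the 12Gbps number in particular is our field estimate, not something the code derives.

```python
"""Signal rate versus data rate under 8b/10b encoding."""


def data_rate_gbps(signal_rate_gbps):
    """'Ethernet math': 8 useful data bits for every 10 signal bits."""
    return signal_rate_gbps * 8 / 10


def signal_rate_gbps(data_rate):
    """'Infiniband math': the raw signaling needed to carry a given data rate."""
    return data_rate * 10 / 8


def delivered_fraction(advertised_gbps, observed_gbps):
    """How much of the advertised number actually reaches the application."""
    return observed_gbps / advertised_gbps


if __name__ == "__main__":
    ddr_advertised = 20.0  # Gbps, the marketed DDR signal rate
    observed = 12.0        # Gbps, best case quoted above for a PCIe 1.1 x8 slot
    print(f"DDR data rate:            {data_rate_gbps(ddr_advertised):.1f} Gbps")
    print(f"10GbE as a signal rate:   {signal_rate_gbps(10.0):.1f} Gbps")
    print(f"Delivered vs. advertised: {delivered_fraction(ddr_advertised, observed):.0%}")
```

The last line is the one to bring to the vendor meeting: under these assumptions the best case is 60 cents of bandwidth on the advertised dollar.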
Full bi-section is the ability for a network of servers to use all the network fabric available. Infiniband claims that using their architecture and switches you can leverage the ENTIRE network fabric under the right circumstances. On slides this might be true, but in the real world it's impossible. Infiniband is statically routed, meaning that packets from server A to server X have only one fixed predetermined path they can travel. One of the nation's largest labs proved that on a 1,152 server Infiniband network static routing was only 21% efficient and delivered on average 263MB/sec (2.1Gbps of the theoretical 10Gbps possible). So when they tell you full bisection, ask them why LLNL only saw 21%. In an IEEE paper presented last week it was proven that statically routed systems cannot achieve greater than 38% efficiency. Now some of the really savvy Mummy supporters will say that the latest incantation of Infiniband has adaptive routing. They do this by using yet another shell game: they redefine the term adaptive routing to mean more than one static route. Real adaptive routing and using a pair of static routes are vastly different things. Real adaptive routing can deliver 77% efficiency on 512 nodes and nearly 100% efficiency on clusters smaller than 512 nodes. If you want full bisection for more than a 16 node cluster talk with Myricom or Quadrics, they do real adaptive routing.
Latency is the time it takes to move a packet from one application on one network server to another application on a different server on the same network. Infiniband has always positioned itself as being low latency. Typically Infiniband advertises a latency of roughly three microseconds between two NICs, using zero-byte packets. Well, in the past year 10GbE NICs and switches have come onto the market that can achieve similar performance. If we look at Arastra's switches they measure latency in a few hundred nanoseconds while Cisco's latest 10GbE switches are sub four microseconds, compared to the prior generations that were measured in the 10's of microseconds or more. Now when the Infiniband crowd crows about using low latency switching, ask them about Arastra's or BLADE Network Technologies' 10GbE switches.
Infiniband claims 20Gbps and delivers less than 12Gbps. Infiniband claims full bisection and beyond a small network they can't exceed 38% efficiency. Infiniband claims low latency and now 10GbE can match it. Where is their value proposition in the data center?
Last week Cisco jumped behind something called Twinax. Why? Four likely reasons:
- Flexibility - Cisco and Juniper both selected SFP+ as the PHY for their new line of 10GbE switches. Offering an SFP+ cable with a connector on the end that enables you to use a single SFP+ port for all your connection needs is a stroke of genius. Say you need a short run from one switch to a server: plug in a Twinax cable with SFP+ connectors on each end and you're good to go, up to 10 meters. Suppose later you need to move that server another 50 meters away: pop in SR optics on both ends and use fiber. No changes to the servers or switches, just swap in optics.
- Cost - there has been a run up recently in the price for copper, while the cost of Twinax coax cable remains fairly fixed.
- Power - SFP+ is rated at 1W/port, the Twinax solution typically draws 1/4W. CX4 is similar but compared to 10GBase-T at 10W (current generation) or even 2W for the next generation (under 30M) this is a huge power savings.
- Latency over 10GBase-T - Current 10GBase-T uses a DSP at each end to separate the signal from the noise. This DSP adds roughly 2 microseconds on each end of the connection, compared to under 200ns for the Twinax conversion.
We are closely watching how Twinax plays out over the next few months, and we'll let you know what we learn.
Conduct a 10GbE Bake-Off
Have you ever held a Bake-Off to select a core technology for a project? Not an RFI, but an actual honest-to-god series of "real world" tests. Few things are as exciting as setting up a technology obstacle course that is somewhat indicative of what your business environment is like, then having various vendors run through it. Several times in my past I've conducted these when emerging technologies like server UPS systems and VOIP telephony were new in order to shake out the posers from the players, evaluate "real-world" performance, then determine value.
Few vendors post actual price and performance data on the web, let alone the methodology they used to arrive at those performance numbers. If only there were an independent third party that actually ran Netperf, Iperf, ntttcps, ntttcpr and other tools on all the available 10GbE NICs using the same test systems then posted the results for everyone to see. Some companies would never recover. For legal reasons the vendors won't, and in most cases do not want to, do it because the results would only help one or two companies and likely not theirs. Today all most consumers have to go on is the cost of the adapter. Wouldn't it be great if you knew the cost/Mbps of the adapter prior to buying it so you could easily compare between adapters? Some would argue that features like iWARP and TOE should be factored in, but today they are just marketing fluff and rarely deliver any significant end user value.
So how do you determine which NIC will perform the best and deliver the most value for your company? Do a bake-off! If you can make the time and the project is big enough, the cost to conduct the bake-off should easily be offset by the savings and performance gains you reap over time. Also a well constructed and executed bake-off will demonstrate not only to you, but your management, that you're an effective individual and a good steward of the company's resources.
Finally, share the full set of results with the vendors that participated, some will moan and groan, while others will kindly thank you for the opportunity to compete and move on. If the race was close their reactions at this point might be your deciding factor. So pull on your oven mitts and start baking...
Hidden Costs and Benefits of 10GbE
When embarking on a new IT project one rarely considers the network, unless of course the network is the project. Data networks in many cases get the same level of attention as the AC power. You expect plenty to be available, all the time and without interruption. Rarely is the network considered a performance bottleneck.
One time I assumed responsibility for improving the performance of an MS SQL server that was vital to our business. The primary job this server ran took 75 minutes and it was scheduled to run, how many of you see this coming, every hour! This server was tracking and reporting on tens of millions of dollars in new business every month. At first glance I noticed several back-to-back-to-back bottlenecks. The system was memory starved, the drives were in a near constant state of thrashing and all SQL I/O from the system went through a $10 NIC. Although the NIC functioned, it was forcing the switch to drop far too many packets. At lunch that day we picked up a newer server-class NIC for $40 and immediately recorded a substantial performance improvement. The job would finish in just under the 60 minutes allowed. We could have spent the next week chasing performance curves; instead we installed a new server, a dual processor single core box, and the job now completed in well under a minute. So a $40 NIC improved performance by 20% while replacing the whole server for roughly $5,000 improved performance by 98%. Clearly the NIC delivered the biggest bang for the buck, but it just brought the network performance curve in line with that of the CPU, memory & disk.
How many dual-socket quad-core servers were installed today, May 13th 2008, with GbE? These servers have 4X the horsepower of my $5,000 server from 2002, but they both share the same GbE. Furthermore, today we use VMware and Xen to pack several logical servers into a single physical server in an effort to more efficiently utilize our hardware resources. We don't hesitate to add more memory or disk, but adding a 10GbE board requires substantially more effort and planning.
When making the jump from GbE to 10GbE one needs to select not only a NIC, but the media (CX4 or fiber) and a new switch infrastructure. High performance NICs run $700-$2,000 each, depending on the media and vendor. If you go fiber the optics run $500-$3,000 each and you need one on each end of the cable. Finally there's the switch. Stackable layer-2 switches run in the $400-$1,200/port range while enterprise layer-3 switches often run several thousand dollars/port.
If your server is I/O bound a good 10GbE NIC and switch can enable 5-10X the output of the "free" GbE port that comes with your server. Suppose you purchase a new server for $5,000, then you add a high performance 10GbE CX4 copper NIC and use a low cost layer two switch so the upgrade to 10GbE costs roughly $1,200 for this server. That is 24% of the server's price, so you need only measure a 25% gain in overall performance to realize a positive return on your investment! There is a new breed of hybrid switches that now offer 24 GbE ports and four 10GbE ports so one can easily make the shift from GbE for servers to 10GbE. Consider giving 10GbE a try.
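The break-even reasoning above fits in a few lines of Python. This is only a sketch, and it leans on one big assumption of ours: that the value you get from a server scales linearly with its throughput, so an upgrade pays for itself once the measured gain exceeds the upgrade's share of the server price. The function names are made up for illustration.

```python
"""Break-even performance gain for a 10GbE upgrade (assumes value scales with throughput)."""


def break_even_gain(server_cost, upgrade_cost):
    """Fractional performance gain at which the upgrade pays for itself."""
    if server_cost <= 0:
        raise ValueError("server_cost must be positive")
    return upgrade_cost / server_cost


def upgrade_pays_off(server_cost, upgrade_cost, measured_gain):
    """True when the measured fractional gain beats the break-even point."""
    return measured_gain > break_even_gain(server_cost, upgrade_cost)


if __name__ == "__main__":
    server, upgrade = 5000.0, 1200.0  # dollars, the example from the text
    print(f"Break-even gain: {break_even_gain(server, upgrade):.0%}")
    print(f"Worth it at a 25% gain? {upgrade_pays_off(server, upgrade, 0.25)}")
```

Plug in the numbers from your own bake-off; if your server really is I/O bound, the measured gain should clear the bar with room to spare.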
The Age of Infiniband is Over
The core market for Infiniband has always been High Performance Computing (HPC). It has seen some adoption as a bus fabric by some OEMs, while others have attempted to market it as a storage solution. In the end though, it is still billed as a high bandwidth, low latency interconnect for building computing clusters. Unlike all the other IT markets the HPC market is very clearly defined and publicly monitored. Many inside HPC use the Top500 list as a very accurate barometer to measure market success. Twice each year for the past 15 years, thirty data points in total, the 500 fastest supercomputers in the world are recorded. In each record are things like processor type, interconnect family, location, etc... All of this data is freely available on the web.
If we focus specifically on Interconnects we can easily gather the following information:
- June 2003, the first Infiniband cluster enters the Top 500.
- November 2005, Infiniband captures more than 5% of the Top 500 systems.
- June 2007, at 25.6% of the Top 500 systems Infiniband peaks.
- November 2007, Infiniband starts its downhill slide with 24.2%.
- June 2008, Infiniband adoption levels off at 24.0%.
Of the 24 different Interconnect families found in the Top 500 over the past six years, only Ethernet has had more than one peak. Myrinet, Quadrics, NUMALink, NUMAFlex, SP Switch, etc... have all peaked then decayed.
Infiniband marketers actually compressed their product cycle from SDR (10Gbps) to DDR (20Gbps) in an effort to further increase the slope of adoption. This was successful, but it came at a price. The follow on, QDR (40Gbps), is just now reaching the market and to keep the hype machine going they've disclosed EDR (Eight Data Rate - 80Gbps) and HDR (Hex Data Rate - 160Gbps). Here we must take a moment and separate marketing from delivery. QDR is marketed as 40Gbps, but this is the signal rate; the actual data rate is 32Gbps. For every 10 bits of signaling there are 8 actual bits of data. So to be clear QDR is really 32Gbps. Now a PCIe Gen2 8-lane slot is theoretically capable of 32Gbps, but this includes the PCIe headers, so in order to sustain 32Gbps one would actually need a PCIe Gen2 16-lane slot and an adapter to match. All available QDR adapters today are 8-lane Gen2 cards and therefore will likely never achieve more than 24-28Gbps of the 40Gbps they promise.
Why then are the Infiniband marketers so anxious to push QDR if they don't have adapters that can deliver 32Gbps, and systems aren't generally available that have 16-lane PCIe Gen 2 slots? Simple: 40GbE is preparing to roll over them next year. Broadcom and others are expecting to see initial 40GbE NICs in 2009. What the Ethernet crowd doesn't crow about is signal rate. If they did they would be calling it a 50Gbps network, but in the Ethernet world they talk data rate, not signal rate. So where the IB crowd says they're 40Gbps, what they really mean is that they are 32Gbps. With 40GbE we find that Ethernet will actually out-deliver Infiniband.
This explains why Mellanox, the Infiniband company, has recently focused on moving towards 10GbE with their new line of ConnectX silicon & NICs. No general purpose network has withstood the continuous competition posed by Ethernet. IBM's Token ring was the most competitive; it was early and before Ethernet caught on. Token ring went through two generations, 4Mbps and 16Mbps, only to eventually lose to Ethernet which was 10Mbps at the time. Mellanox is sharpening its skills with 10GbE in an effort to prepare itself for 40GbE in 2009.
During a roundtable discussion earlier this year a senior marketing executive from Mellanox claimed they would have 160Gbps (HDR) NICs in 2009. That translates to an actual unidirectional data rate of 128Gbps or 16GB/sec of required system bus capacity. By comparison the current front side bus on the Intel 7300 series of Xeon Quad-Cores, Intel's state-of-the-art, is only capable of 8.5GB/sec. Clearly the Infiniband crowd is feeling the pinch of Ethernet.
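Here is a rough Python sketch of the slot-versus-link arithmetic used in this piece. The names are ours, chosen for illustration. The per-lane PCIe signal rates (2.5GT/s for Gen1, 5GT/s for Gen2, both 8b/10b encoded) are the published figures; the slot numbers are ceilings before PCIe packet headers, which the sketch does not attempt to model.

```python
"""Can the host slot feed the link? Ceilings only, before PCIe header overhead."""

# Per-lane signal rates in GT/s; PCIe Gen1 and Gen2 both use 8b/10b encoding.
PCIE_LANE_GTPS = {1: 2.5, 2: 5.0}


def after_8b10b(signal_gbps):
    """Data rate left once 8b/10b encoding is removed."""
    return signal_gbps * 8 / 10


def pcie_slot_gbps(generation, lanes):
    """Post-encoding slot bandwidth for a given PCIe generation and lane count."""
    return after_8b10b(PCIE_LANE_GTPS[generation] * lanes)


def gbps_to_gbytes_per_sec(gbps):
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


if __name__ == "__main__":
    qdr, hdr = after_8b10b(40.0), after_8b10b(160.0)
    print(f"QDR data rate: {qdr:.0f} Gbps")
    print(f"Gen2 x8 slot ceiling:  {pcie_slot_gbps(2, 8):.0f} Gbps")
    print(f"Gen2 x16 slot ceiling: {pcie_slot_gbps(2, 16):.0f} Gbps")
    print(f"HDR data rate: {hdr:.0f} Gbps = {gbps_to_gbytes_per_sec(hdr):.0f} GB/sec")
```

Since the 8-lane Gen2 ceiling equals the QDR data rate before any PCIe headers are counted, there is no headroom left, which is why we expect real adapters to land below 32Gbps.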
On June 16th the Infiniband Trade Association took the wraps off their roadmap. Nothing is coming out in 2009, and EDR (80Gbps) is slated for 2011. If that's truly the case 40GbE will roll over QDR in 2010 and EDR will never make it to market.
Two final bells must toll before we write the epitaph for Infiniband and position it as a footnote alongside Token ring: Infiniband falls to roughly 18% of the Top500 list, and Cisco kills off their line of Infiniband products (formerly TopSpin). In November we'll see if Infiniband does in fact drop in popularity when the Top500 list is published. These two events should validate both the peak and permanent decline of Infiniband. So mark your calendar to update the Wikipedia pages for Infiniband and Ethernet in July.
TOEs are now last season's Manolo Blahniks, only worse?
For those not into style, Manolo Blahnik is one of the leading designers of women's shoes, and Blahniks often start at $700/pair, the price of a good 10GbE NIC. As most servers have moved to dual socket quad-core processors the value proposition for TCP Offload Engine (TOE) 10GbE NICs has quickly eroded. In the spring of 2006 a good non-TOE 10GbE NIC consumed 40% of the host CPU in a dual-socket dual-core server and provided >6Gbps of performance, while a similar TOE did the same job using only 10% of the host CPU. So with a 30% savings in host CPU there was some value in using a TOE.
With two years of improvements in silicon, stateless off-loads, and servers moving to dual-socket quad cores we now have 10GbE NICs capable of near-wire rate (>9.5Gbps) that consume only 10% of the host CPU. Similarly, TOE NICs in the same environment consume roughly 5% of the host CPU. By most estimates servers are typically running at 20% CPU utilization, as a result of application load. So will a 5% savings in host CPU be noticed, let alone worth the added purchase price of a TOE? No.
Add to that the Linux Foundation's 14-point argument against using TOEs, written by the Linux Kernel developers themselves, and one would wonder why people still consider TOEs in style. Here are the 14 reasons cited by the Linux Foundation on their Net:TOE page:
1 Security updates
2 Point-in-time solution
3 Different network behavior
4 Performance
5 Hardware-specific limits
6 Resource-based denial-of-service attacks
7 RFC compliance
8 Linux features
9 Requires vendor-specific tools
10 Poor user support
11 Short term kernel maintenance
12 Long term user support
13 Long term kernel maintenance
14 Eliminates global system view
If you are seriously interested in buying a TOE you should read their Net:TOE page.
Is 10GBase-T ready today for consideration?
No. In March and April several companies began marketing 10GBase-T NICs: Chelsio, Neterion, Tehuti, and even Mellanox (the Infiniband company). Only one switch company, SMC, has dipped their toe in the 10GBase-T market. Why? Power. All of these products are based on first generation 10GBase-T silicon which is very thirsty for power.
In the 10GbE world all the NIC vendors separate their 10GbE chip from the physical (PHY) interface chip so they can be more responsive and flexible in creating NIC products and easily support several PHYs with a single 10GbE NIC chip. Today only three companies make 10GBase-T PHY chips: Solarflare, Teranetics & Aquantia. Teranetics is having the most success, signing Chelsio, Tehuti & Mellanox, while Solarflare picked up SMC. What most avoid telling you is how much power these 10GBase-T PHY chips require: 8-12W. The vast majority of this power is used for only one purpose, separating the signal from the noise, the needle from the haystack.
What does this mean to you? Below is a simple example with 50 servers, focused only on PHY power for each of the three currently available media formats. It lists the power budget for the PHY on each end (NIC or switch), the total to support a server (both NIC and switch PHY power) and the total for a 50 server project:
10GBase-CX4 PHY 0.5W/end, 1W/server, 50W for the project
10GBase-R XFP PHY 3W/end, 6W/server, 300W for the project
10GBase-T PHY 10W/end, 20W/server, 1000W for the project
This is the power needed to support just the 10GBase-T cabling and it is consuming enough energy to power two of the servers in your project! This is a cost you will carry for the life of the project and we all know conditioned data center power and cooling is not cheap.
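The power budget above is easy to rerun for your own server count. Here is a minimal Python sketch with names of our own choosing; the per-end wattages are the ones from the list above, and the annual figure simply assumes the links stay powered around the clock (8,760 hours a year), before any cooling overhead.

```python
"""PHY power budget for a 10GbE project, using the per-end wattages listed above."""

PHY_WATTS_PER_END = {
    "10GBase-CX4": 0.5,
    "10GBase-R XFP": 3.0,
    "10GBase-T": 10.0,
}
HOURS_PER_YEAR = 24 * 365  # assumes links are powered 24x7


def project_phy_watts(watts_per_end, servers, ends_per_link=2):
    """Total PHY power: one PHY on the NIC plus one on the switch port, per server."""
    return watts_per_end * ends_per_link * servers


def annual_kwh(watts):
    """Energy drawn in a year at a constant load, in kilowatt-hours."""
    return watts * HOURS_PER_YEAR / 1000


if __name__ == "__main__":
    servers = 50
    for media, watts in PHY_WATTS_PER_END.items():
        total = project_phy_watts(watts, servers)
        print(f"{media:<14} {total:>6.0f} W  ({annual_kwh(total):,.0f} kWh/year)")
```

Multiply the kWh column by whatever your data center pays per kilowatt-hour, cooling included, and you have the carrying cost for the life of the project.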
When GbE first came out the initial round of PHY chips were also power hungry; of course that is no longer the case. With 10GbE the physics are significantly more challenging and it may take another year or two before 10GBase-T solutions have power consumption similar to CX4. If you are doing a project today that can benefit from 10GbE use CX4 if possible, or fiber if you need more than 15M. For fiber consider using SFP+ or XFPs as they are the most current optics, are the least expensive and consume far less power than XENPAK or X2.
Note on April 14th Solarflare announced their new PHY silicon, 10Xpress SFT9001, which consumes between 2.2 and 6W depending on cable length. This brings 10GBase-T into parity with fiber. This PHY chip will be available in sample lots to 10GbE OEMs in May. Even more recently, on April 21, Aquantia announced a 10GBase-T PHY chip that is sampling in May and which claims to bring the power down to 5.5W for Cat6A cable up to 100M long. The delay from samples to OEM NIC vendors and completed NIC samples for customers is often in the neighborhood of 3-6 months. So in our opinion 10GBase-T NICs with a reasonable power envelope, 10-15W for the entire NIC, should be something that is available for consideration in the fall of 2008, just in time for those with year end budgets.
For more information and another perspective consider checking out what the Linley Group has to say on 10GBase-T.
Optics & Adoption
Today for short runs under 15 meters there are two common options: CX4 copper and SR (short range) fiber. The difference between them is essentially the cost of the fiber optic module. Today the most common module for 10GbE is the XFP, soon it will be SFP+. There are three sources for SR optics under $700 listed on our optics page. Optics are required on both ends so this makes fiber typically $1,400 more expensive than CX4 copper. Also copper adapters require less support logic and as such are often less expensive. Single port copper NICs run in the $700-$1,000 range while similar fiber NICs are $800-$1,200. The expectation is that SFP+ fiber modules will be roughly 25% less expensive than XFPs so this will make fiber more affordable in the second half of 2008 as SFP+ gains traction.
The real knee in the 10GbE adoption curve though will occur when the next generation of 10GBase-T products hit the market in early 2009. The current generation of 10GBase-T silicon requires far too much power to make it practical. This second generation of 10GBase-T will allow people to use cables and connectors they are familiar with, e.g. Cat6E & RJ45, to attach servers and switches within 100M without the expensive CX4 cables or the optics required today for fiber.
Finally most of the 10GbE NIC vendors are on their second or third generation silicon. By early 2009 most will have trimmed and tuned things to the point that they will have, or will soon support, LAN on Motherboard solutions. When this happens we will see high end servers with 10GBase-T support built in and 10G will then truly begin to replace GbE in the enterprise. We expect that this will likely begin to become common as we enter 2009.
