Whitebox Switches in the Enterprise?

In my previous role working for a service provider, one of the last pieces of work I did before leaving to set up Cloudflux was to put a couple of whitebox switches running Cumulus Linux into production.

I have to admit that even in an organisation that regularly operates around the bleeding edges of technology, this felt slightly out-there. (Although, that feeling quickly disappeared once I'd rolled up my sleeves and got my hands dirty.)

When I spoke to friends of mine working in the traditional enterprise world about it though, their reactions were somewhat more extreme - a mixture of astonishment and bemusement, as though I had committed a crime against networking by indulging such folly.

Whilst I accept that enterprises are generally more risk averse than service providers when it comes to new technology, these conversations really got me thinking about whether they will ever really buy into the ethos that underpins the whitebox switching movement. And if they will, what are the significant barriers to entry for them?

For years, enterprises of all sizes have purchased commodity servers and installed their own operating systems or hypervisors on them, enjoying the freedom that comes with detaching the software from the underlying hardware.

When it comes to networking though, these same enterprises have been buying software-locked hardware at premium prices, happy to let the big vendors dictate to them what they get and how they can use it. (The same has been true of storage, but let's stick to networking for now.)

So, apart from aversion to risk (and the frustrating apathy that often exists at management level when it comes to investing in networks), what's stopping enterprises from taking what seems to be the next logical step, and applying this server strategy to the network? (and potentially saving a bundle of cash in the process)

  1. Relative criticality.
    It's generally accepted (and planned for) that servers will go down. Server clustering technologies like MS Failover Clustering and vSphere HA are designed specifically with this in mind. If a server goes down, it's considered a critical issue, but we all carry on with our lives whilst steps are taken to resolve the outage in the background. As a result, we are more likely to take risks with server tech.
    Conversely, we expect network devices never to go down. And whilst we have been building resiliency into our networks for years with spanning-tree (STP), we didn't necessarily trust that it was going to behave as expected at any given moment.
    More recently, enterprises have phased out STP where possible and adopted MC-LAG. It's not without its flaws, but for the most part this has made us more comfortable with the prospect of losing a network device than under STP's reign of terror.
    And like most of the big vendors, Cumulus has supported MC-LAG for some time.

  2. The 'one throat to choke' argument. (This issue alone could make an entire blog post, but I'll try and keep it brief for now)
    Whilst I can see there is some merit to this argument, as a technical guy, I have always disliked it. In my experience, the requirement to have 'one throat to choke' is usually sold to the enterprise based on flawed assumptions and spuriously calculated TCO savings. (The extension of which is hideously overpriced preconfigured racks of kit, like vBlock.)
    Even if you do buy into this argument though, you needn't worry, since Cumulus will support the hardware too, providing it's on their hardware compatibility list.

  3. The skills gap.
    One of the arguments often put forward against moving to any other switch vendor (whitebox or otherwise) is that technicians only know how to configure and manage devices from the incumbent vendor. Hence, it is argued, the cost and disruption of retraining staff makes the new solution more, not less, expensive.
    I'm sorry, I don't buy it. If your tech guys understand the fundamentals (and there's a good argument that you shouldn't be letting them loose on your network if they don't), then they can apply that knowledge regardless of the vendor. Sure, it will take them a little longer to begin with whilst they adapt to the differences in command syntax, but a switch is a switch.

  4. Lack of major vendor buy-in.
    OK, so starting with the hardware side: HPE, Dell, Mellanox, Supermicro, Quanta and others are already selling whitebox switches. So clearly we have significant organisations taking this very seriously.
    Looking at the software side, right now the two major players are Cumulus and Big Switch. (I've concentrated on Cumulus in this post as I've had zero exposure to Big Switch)
    Playing catch up to these is the HPE led OpenSwitch project, which, given time to mature, should provide competition.
    Then hovering around the periphery with interest, waiting for the party to get into full swing, we have Cisco, Arista and Juniper.
    Cisco's NX-OS and Arista's EOS are Linux-based switch operating systems, so both are fully primed to get involved as soon as they feel the commercial timing is right.
    Juniper, meanwhile, are already selling a switch that uses hardware from the Open Compute Project but runs their FreeBSD-based Junos, so they're right in there too.
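To make the MC-LAG point from item 1 a little more concrete: on Cumulus Linux, MLAG is driven by the `clagd` daemon and configured in `/etc/network/interfaces` like any other Linux networking. A minimal sketch is below; the port names, link-local addresses and system MAC are illustrative values, not a recommendation.

```
# Bond the two inter-switch links into the MLAG peer link
auto peerlink
iface peerlink
    bond-slaves swp49 swp50

# A VLAN subinterface on the peer link carries clagd's keepalives
auto peerlink.4094
iface peerlink.4094
    address 169.254.1.1/30           # the peer switch would use 169.254.1.2
    clagd-enable yes
    clagd-peer-ip 169.254.1.2
    clagd-sys-mac 44:38:39:ff:40:94  # must match on both peers
```

A dual-connected bond towards a host or downstream switch then just needs a matching `clag-id` set on both peers.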
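And to illustrate the "a switch is a switch" point from item 3: here's roughly how a familiar access-port configuration translates into Cumulus Linux's native `/etc/network/interfaces` syntax. The interface names and VLAN number are made up for the example.

```
# Incumbent-vendor equivalent, for comparison:
#   interface GigabitEthernet0/1
#     switchport mode access
#     switchport access vlan 20

# Cumulus Linux (ifupdown2): a VLAN-aware bridge...
auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports swp1
    bridge-vids 20

# ...and the access port itself
auto swp1
iface swp1
    bridge-access 20
```

Different syntax, same fundamentals: a bridge, a member port, a VLAN. Anyone who understands the concepts will adapt quickly.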

So, getting back to my original question: whitebox switches in the enterprise?

I think it's just a matter of time.