As applications and services move into virtualized environments, and network speeds leap from 10Gbps (yesterday) to 25/40Gbps (today) to 100Gbps in coming years, the legacy methods of monitoring application flows don’t scale. At the same time, with overlays and underlays and associated complexity, there is a bigger need to monitor every application and every application flow.
CIOs and Network Operations staff need real-time and historical data to be able to see instantly:
- How many active servers and VMs the business has at the moment, and historically
- What kinds of applications and services are active, and what specific servers and VMs they are tied to at any given instant
- Network load as aggregated views, based on application, service, server, or a VM, with drill-downs to individual connections in real time or historically, for compliance, debugging, SLA monitoring, and security
We also need quicker time-to-resolution for application performance degradation, without putting agents on servers and VMs, and without using TAPs, mirror/span ports, or sampling, all of which add to CAPEX and OPEX (for building and maintaining a separate monitoring network).
One of the fundamental changes in networking in recent years is the arrival of commodity switch chips from Broadcom and Intel, with a PCI-Express Control Plane (as opposed to the older, low-bandwidth MDIO/I2C control plane), as well as faster and cheaper CPUs from Intel (Xeon-D class processors) that make the top-of-rack switch a very potent platform for full application visibility.
The Netvisor Virtualization Centric Fabric (VCF) leverages the modern top-of-rack switch architecture to use the switches as a pristine data source. Its associated insight analytics (VCF-IA, one of the applications in VCFcenter) helps answer application-specific questions every day. The Netvisor architecture tracks every application flow in real time, and also records it so users can go back in time and look at every flow that was happening when a problem arose. Using the commodity switch as a data source for tracking every application flow down to each TCP connection was the founding principle of the Netvisor architecture.
Issues with Application Analytics today
Typical switches and routers are dumb packet-switching devices. They switch billions of packets per second between servers and clients at sub-microsecond latencies using very fast ASICs, but have no ability to record anything. External optical TAPs and monitoring networks have to be built to get a sense of what is actually going on in the infrastructure. The figure below shows what monitoring today looks like:
You can see the challenges. The typical enterprise and datacenter network connecting the servers is running at 10/25/40Gbps today, moving to 100Gbps tomorrow. These switches typically have 40-50 servers connected to them, each pumping traffic at 10Gbps.
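As a rough back-of-the-envelope check on the load one such switch generates, the sketch below computes aggregate bandwidth and worst-case packet rate. The 48-port count and the minimum-size 64-byte frames are illustrative assumptions, not figures for any specific switch; the 20 bytes of preamble and inter-frame gap per frame are standard Ethernet overhead.

```python
# Back-of-the-envelope load for one top-of-rack switch.
# Assumptions (illustrative): 48 server ports, 10 Gbps each,
# worst case of minimum-size 64-byte Ethernet frames.

PORTS = 48
LINK_BPS = 10e9            # 10 Gbps per server port
FRAME_BYTES = 64           # minimum Ethernet frame
WIRE_OVERHEAD_BYTES = 20   # 8-byte preamble + 12-byte inter-frame gap


def aggregate_bps(ports: int, link_bps: float, full_duplex: bool = True) -> float:
    """Total bits per second across all server ports."""
    directions = 2 if full_duplex else 1
    return ports * link_bps * directions


def max_pps(link_bps: float, frame_bytes: int) -> float:
    """Worst-case packets per second on one link, in one direction."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


total_bps = aggregate_bps(PORTS, LINK_BPS)
total_pps = max_pps(LINK_BPS, FRAME_BYTES) * PORTS * 2  # both directions

print(f"aggregate bandwidth: {total_bps / 1e12:.2f} Tbps")
print(f"worst-case packet rate: {total_pps / 1e9:.2f} billion pps")
```

Under these assumptions the totals come to roughly 1 Tbps and well over a billion packets per second for a single switch, which is the scale any copy-every-packet monitoring approach has to absorb.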
Traditionally, there are three ways to see everything that is going on in your network:
- Provision a fiber optic tap at every link, and divert a copy of every packet to the monitoring tools. Because the fiber optic taps are passive, you have to copy every packet, so the monitoring tools need to deal with Tbits/sec or 1B+ packets per second from each switch, which requires a massive amount of compute, memory, and storage.
- You could instead selectively place fiber optic taps at uplinks or edge ports, with the result that the inner network is a black hole - you'd have no visibility into what is going on. One thing we have learned by experience is that, without 100% visibility, you can’t fix anything.
- Use the switches themselves to selectively mirror traffic to monitoring tools. This is a more popular approach these days, but it relies on sampling, where the sampling rates are typically 1 in 5,000 to 10,000 packets. That's better than nothing, but still gives us nowhere close to meaningful visibility, and the cost goes up exponentially as more switches are monitored, because the monitoring fabric needs more capacity, and the monitoring software gets more complex and needs more hardware resources.
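To see why sampling at these rates falls short of meaningful visibility, here is a minimal sketch that estimates how often a flow goes completely unseen. It assumes each packet is sampled independently with probability 1/N, which is a simplification; real switches may sample deterministically or per port.

```python
# Chance that a flow is never seen under 1-in-N packet sampling.
# Assumption (illustrative): each packet is sampled independently
# with probability 1/N; real switches may sample differently.


def miss_probability(flow_packets: int, sample_one_in: int) -> float:
    """Probability that none of a flow's packets is sampled."""
    if flow_packets < 0 or sample_one_in < 1:
        raise ValueError("flow_packets must be >= 0 and sample_one_in >= 1")
    return (1.0 - 1.0 / sample_one_in) ** flow_packets


for packets in (10, 100, 1_000, 10_000):
    for rate in (5_000, 10_000):
        p = miss_probability(packets, rate)
        print(f"{packets:>6} packets, 1 in {rate:>6}: missed with p = {p:.3f}")
```

Under this model, a 100-packet flow goes entirely unseen about 98% of the time at a 1-in-5,000 sampling rate, so short connections are effectively invisible.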
Switch OS provides a distributed monitoring architecture
Netvisor Virtualization Centric Fabric and its associated Insight Analytics track and present all application flows, with drill-downs into each flow, service, VM, or bare-metal server.
The data source can be:
- Metadata from the pristine data source: Netvisor running on a top-of-rack switch, where each switch collects the metadata for every flow and sends it to the Insight Analytics application over a REST API. The Netvisor Fabric uses its multi-threaded, high-performance control plane, running over the switch chip's PCI-Express interface, to pull in the relevant traffic and process it. The bulk of the processing happens on the local Netvisor instance, and only metadata goes to the IA application, allowing this solution to scale to billions of flows.
- Netvisor VMs running on individual servers can also serve as a data source. This works similarly to the Netvisor Fabric: each instance does the packet processing and sends the metadata to the IA application.
- Third-party switch mirror/span ports. You can get analytics for your legacy deployments simply by pointing the mirror/span ports at the VCF-IA appliance, which will use the Netvisor vflow capability on the appliance hardware to collect relevant packets. (This approach does have some scaling limitations.)
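The idea common to these data sources, processing packets locally and forwarding only per-flow metadata, can be sketched as follows. This is an illustration only: `FlowKey`, `FlowStats`, and the JSON field names are hypothetical and do not reflect Netvisor's actual data model or REST API.

```python
# Sketch: reduce per-packet observations to per-flow metadata records.
# All names here (FlowKey, FlowStats, the JSON field names) are hypothetical
# and do not reflect Netvisor's actual data model or REST API.
import json
from dataclasses import dataclass, asdict
from typing import Dict, Iterable, NamedTuple, Tuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str


@dataclass
class FlowStats:
    packets: int = 0
    byte_count: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0


def aggregate(observations: Iterable[Tuple[float, FlowKey, int]]) -> Dict[FlowKey, FlowStats]:
    """Fold (timestamp, key, length) observations into per-flow counters."""
    flows: Dict[FlowKey, FlowStats] = {}
    for ts, key, length in observations:
        stats = flows.get(key)
        if stats is None:
            stats = flows[key] = FlowStats(first_seen=ts)
        stats.packets += 1
        stats.byte_count += length
        stats.last_seen = ts
    return flows


def to_json(flows: Dict[FlowKey, FlowStats]) -> str:
    """Serialize flow records as a JSON body that an exporter could POST."""
    records = [{**key._asdict(), **asdict(stats)} for key, stats in flows.items()]
    return json.dumps(records)


web = FlowKey("10.0.0.5", "10.0.1.9", 49152, 443, "tcp")
dns = FlowKey("10.0.0.5", "10.0.2.2", 53124, 53, "udp")
packets = [(0.00, web, 1500), (0.01, web, 1500), (0.02, dns, 80)]
print(to_json(aggregate(packets)))
```

However many packets a flow carries, it collapses into a single small record, which is why shipping metadata rather than packet copies keeps the load on the analytics application manageable.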
Integrated with Ericsson’s Hyperscale Datacenter System 8000 and OCP-style top-of-rack switches
Ericsson has built its next-generation hyperscale datacenter systems based on Intel® Rack Scale Design. The networking blades in the rack are OCP-style server-switches that connect the servers and storage using advanced optics. Netvisor provides Layer 2/3 connectivity with fully integrated application visibility and fabric-based, hardware-accelerated VXLAN tunnels. The Hyperscale Datacenter System racks are orchestrated by a Command Center that provides seamless orchestration, multi-tenancy, and application visibility, allowing the racks to operate with much lower OPEX, with very fast deployment of network functions that take advantage of switch-based hardware acceleration.
Netvisor is also available on a variety of OCP-style top-of-rack switches and server-switches.
As applications become more virtualized and move to public and private clouds, the ability to monitor each and every connection is of paramount importance. Without it, it becomes impossible to answer simple questions such as: "Which VM or container is causing the problem?" "Why do some applications see higher latency?" or even more basic questions like: "What is the average latency for a service and which clients are seeing higher latency than average?" If a public or private cloud operator can't answer these questions every day, their SLAs are affected and customers are very unhappy. A modern switch OS on OCP-style top-of-rack switches and server-switches enables a new generation of application visibility, independent of physical and virtual environments.
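Once per-connection records are available, a question like the latency one above reduces to a small aggregation. The sketch below is a minimal illustration; the record layout of (service, client, latency in milliseconds) is a hypothetical example, not the format used by VCF-IA.

```python
# Sketch: answer "what is the average latency for a service, and which
# clients are above it?" from per-connection records.
# The record layout (service, client, latency_ms) is a hypothetical example.
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, float]  # (service, client, latency_ms)


def slow_clients(records: Iterable[Record], service: str) -> Tuple[float, List[str]]:
    """Return the service-wide mean latency and the clients whose own mean exceeds it."""
    per_client: Dict[str, List[float]] = defaultdict(list)
    for svc, client, latency_ms in records:
        if svc == service:
            per_client[client].append(latency_ms)
    if not per_client:
        raise ValueError(f"no records for service {service!r}")
    all_samples = [v for samples in per_client.values() for v in samples]
    service_mean = mean(all_samples)
    slow = sorted(c for c, samples in per_client.items() if mean(samples) > service_mean)
    return service_mean, slow


records = [
    ("https", "10.0.0.5", 2.0),
    ("https", "10.0.0.5", 4.0),
    ("https", "10.0.0.6", 12.0),
    ("dns", "10.0.0.7", 1.0),
]
avg, slow = slow_clients(records, "https")
print(f"https mean latency: {avg:.1f} ms; above average: {slow}")
```

The calculation itself is trivial; what matters is having a record for every connection, since with sampled or partial data the per-client means would not be trustworthy.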
VCF – Insight Analytics - http://www.pluribusnetworks.com/resources/pluribus-vcf-insight-analytics/