One of the most important aspects of the Total Uptime Cloud Platform is the underlying IP Anycast architecture. Without it, we would not be able to deliver a 100% uptime SLA and the level of performance our customers demand. In this article, we’ll explain what IP Anycast (referred to from here on simply as “Anycast”) is and how we use it to deliver performance, reliability and uptime to our customers.
What Anycast is Not
Anycast is not a protocol or proprietary technology requiring special capabilities in servers, clients or networks. It is simply a routing configuration methodology, typically implemented with BGP, whose underlying service model is described briefly in RFC 1546. It has been the basis of large-scale (and mostly static) content distribution networks since at least 1995, and today it is being used more widely by large organizations for global redundancy in other areas as well.
Confusing Anycast with Multicast
Anycast is often confused with Multicast, and for good reason. From an IP standpoint, Anycast might look like Multicast until the connection stage. Multicast is one-to-many: a single stream is delivered simultaneously to every node that has joined the multicast group. Naturally, both the network and the application must support multicasting, so a typical use for multicast is streaming audio or video. Peer-to-peer file sharing networks such as BitTorrent, which let a client download a file in chunks from multiple hosts simultaneously, are sometimes mentioned in the same breath, but they rely on many ordinary unicast connections rather than IP multicast. While the Total Uptime cloud fully supports multicast for applications that require it, we won’t go into detail in this article.
Anycast is similar to Multicast, except that the client connects to a single node, even though multiple nodes may advertise their availability to deliver the service. It is important to note that the client is typically unaware that multiple nodes exist and assumes there is only one, which is by design.
While we’re at it, we might as well mention Unicast. It is, of course, the most standard client/host configuration. A single source announces its availability to provide the service, and the client has no option but to connect to that single host or to no host at all. That “host” could certainly be a cluster of devices, but they are all at the same location.
So what is Anycast and how does it work?
At its core, Anycast is actually quite a simple concept if you set aside the ‘behind-the-scenes’ tunnels and monitoring, which we will discuss shortly. Essentially, multiple cloud nodes or instances of a service announce and share the same publicly accessible IP address. So, for example, the IP address 184.108.40.206 would be advertised by the cloud node in Singapore at the same time as it is being advertised by the nodes in London, New York and elsewhere.
The routing infrastructure directs any packet to the topologically nearest instance of the service based on BGP paths, which, from a router’s perspective, look no different from the paths of any other network. When the router near the client looks up the path to the IP, it finds various advertised routes and simply chooses the one with the shortest path. In traditional networks all paths lead to the same destination. In an Anycast topology the paths may lead to different destinations, but the router doesn’t care and technically has no knowledge of that fact. It simply and consistently chooses the best path each and every time, unless that path disappears, in which case another path becomes the best one.
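The selection behaviour described above can be sketched in a few lines of Python. This is a deliberately simplified model, not how any particular router implements it: real BGP best-path selection weighs several attributes (local preference, origin, MED and others) before and after AS-path length. The `Route` class, the node labels, the documentation prefix and the AS numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # Anycast prefix being advertised
    as_path: tuple   # AS numbers between this router and the origin
    node: str        # hypothetical label for the node behind this path


def best_route(routes):
    """Return the route with the shortest AS path, or None if none remain."""
    if not routes:
        return None
    # Ties are broken on the AS path itself so the choice is repeatable.
    return min(routes, key=lambda r: (len(r.as_path), r.as_path))


PREFIX = "203.0.113.0/24"  # documentation prefix, not a real deployment
routes = [
    Route(PREFIX, (64500, 64510), "London"),
    Route(PREFIX, (64501, 64502, 64510), "New York"),
    Route(PREFIX, (64503, 64504, 64505, 64510), "Singapore"),
]

print(best_route(routes).node)     # London

# London withdraws its announcement; the next-shortest path takes over.
remaining = [r for r in routes if r.node != "London"]
print(best_route(remaining).node)  # New York
```

Note that the router never learns that the three paths end at different physical nodes. It only compares paths to the same prefix, which is exactly why Anycast needs no special support in the network.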
What are the benefits of Anycast?
There are quite a number of benefits to implementing an Anycast network.
- Increased Reliability: Anycast improves the reliability of a network-based service by placing multiple, geographically dispersed servers or clusters behind the same IP address. In the event one server or server cluster fails, traffic is simply redirected to another node without having to change IP addresses.
- Load Balancing: Dynamic layer 3 routing of Anycast IP addresses nicely load balances traffic over different nodes based on geography. If equal-cost paths to several nodes are visible from one geography, all of those nodes can be used.
- Increased Performance: Traffic destined for an Anycast node will be routed to the topologically “nearest” node, thus reducing latency between the client and the node. This ensures that client traffic uses the server cluster closest to the client, wherever in the world that client is.
- Attack Mitigation: Geographically dispersed server clusters sharing the same publicly announced IP address naturally draw attack traffic to the node nearest its source, thus sinking it close to the origin. This spreads the load across nodes, which significantly improves capacity, and it also masks the true location of any “real server” proxied by the Anycast address and hidden behind it.
- Enhanced Availability: In the event that an Anycast node becomes unavailable, traffic can simply shift to an alternate node as soon as the routes are withdrawn from the routing table without the need for the client to communicate with a new IP address. With proper back-end route configurations, tunnels and connection state management, there is no degradation of service even while waiting for routes to be withdrawn.
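The equal-cost case mentioned under Load Balancing can be illustrated with a small sketch. Routers commonly hash a flow's addresses and ports to pick among equal-cost next hops so that every packet of one connection reaches the same node; the exact hash is vendor-specific, so the SHA-256 approach below is only a stand-in. The function name `pick_next_hop`, the node labels and the addresses are assumptions for illustration.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Map a flow to one of several equal-cost next hops.

    flow is (src_ip, src_port, dst_ip, dst_port, protocol).
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


next_hops = ["node-a", "node-b"]  # hypothetical equal-cost Anycast nodes
flow = ("198.51.100.7", 40000, "203.0.113.10", 443, "tcp")

# The same flow always maps to the same node, so a TCP session stays put.
assert pick_next_hop(flow, next_hops) == pick_next_hop(flow, next_hops)

# Different flows (here, different source ports) spread across the nodes.
counts = {hop: 0 for hop in next_hops}
for port in range(40000, 41000):
    hop = pick_next_hop(("198.51.100.7", port, "203.0.113.10", 443, "tcp"), next_hops)
    counts[hop] += 1
print(counts)
```

Hashing per flow rather than per packet is the important design choice here: it keeps load spread across nodes without splitting a single connection between two nodes that do not share its state.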
So what is Required Behind the Scenes?
The behind-the-scenes implementation is where IP Anycast becomes a little more complex. In a stateless configuration it isn’t as critical, but where state is essential, content synchronization becomes the principal engineering concern. Total Uptime Technologies’ Anycast network comprises not only public-facing networks but also back-end private-line tunnels designed to route traffic from node to node in the event of failure or to maintain connection state between client and server.
Typically, cloud nodes or server clusters within a node share a common virtual interface attached to their loopback devices and speak an IGP routing protocol to an adjacent BGP-speaking border router. Monitoring of the service ensures that in the event of a failure, routes can be withdrawn immediately to re-route traffic. Once a cluster architecture has been established, additional clusters can be added to gain performance, implement load distribution or failover between them either locally, regionally or globally.
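The monitoring loop described above might be sketched as follows. This is a minimal illustration and not Total Uptime's implementation: the `RouteController` class is hypothetical, and the step that would actually tell a routing daemon to announce or withdraw the prefix is represented only by an entry in a log, since that interface depends on the routing software in use.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RouteController:
    """Tracks whether this node should be announcing the Anycast prefix."""

    def __init__(self, prefix, check):
        self.prefix = prefix
        self.check = check       # callable returning True when the service is healthy
        self.announced = False
        self.log = []            # stand-in for commands sent to a routing daemon

    def tick(self):
        """Run one health check and announce or withdraw if the state changed."""
        healthy = self.check()
        if healthy and not self.announced:
            self.announced = True
            self.log.append(("announce", self.prefix))
        elif not healthy and self.announced:
            self.announced = False
            self.log.append(("withdraw", self.prefix))
        return self.announced


# Simulated health results: up, up, down, up again.
results = iter([True, True, False, True])
controller = RouteController("203.0.113.0/24", lambda: next(results))
for _ in range(4):
    controller.tick()
print(controller.log)
```

In a real deployment the `check` callable could be something like `lambda: tcp_check("127.0.0.1", 443)`, run every second or so. A single failed check triggers a withdrawal here, which matches the "withdrawn immediately" behaviour described above; some operators prefer to require several consecutive failures to avoid route flapping.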
The Caveats to Anycast
The biggest caveat with implementing an Anycast network properly is the complexity of managing route announcements. You must ensure that announcements are evenly spread over equal-quality providers, and you should use BGP communities and other traffic engineering techniques to maintain proper traffic routing. You must also avoid static routes that could create black holes during a failure, and focus on more automated approaches with IGP and BGP.
Secondly, you must make certain that whenever a customer-impacting event occurs, no matter how short, route announcements are withdrawn automatically. Until that withdrawal propagates, behind-the-scenes routing must divert traffic to alternate nodes, maintaining availability and, if necessary, session state.
The bigger the network, the better it becomes, but the more critical automation is to its success.
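The interim diversion described in this section, where a node that still receives traffic after a local failure hands it to a healthy peer, can be sketched as a simple lookup. The function `serving_node`, the tunnel map and the node names are hypothetical, chosen only to illustrate the idea.

```python
def serving_node(ingress, healthy, tunnels):
    """Return the node that actually serves traffic arriving at `ingress`.

    If the local service is healthy it serves the traffic itself; otherwise
    the traffic is diverted over a back-end tunnel to the first healthy peer.
    """
    if healthy[ingress]:
        return ingress
    for peer in tunnels.get(ingress, []):
        if healthy[peer]:
            return peer
    return None  # no healthy node reachable from this ingress


# Hypothetical back-end tunnels, listed in order of preference.
tunnels = {
    "London": ["New York", "Singapore"],
    "New York": ["London", "Singapore"],
    "Singapore": ["London", "New York"],
}
healthy = {"London": False, "New York": True, "Singapore": True}

# London's service is down but its route has not been withdrawn everywhere yet.
print(serving_node("London", healthy, tunnels))     # New York
print(serving_node("Singapore", healthy, tunnels))  # Singapore
```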
Why not a Hybrid Architecture?
Some organizations believe that a hybrid approach (Anycast and Unicast together) is the best way to deal with Anycast complexities, but we strongly disagree. The inherent problem of an IP address being tied to a physical location (Unicast) does not disappear when the two are combined. Yes, it may create some level of redundancy when all systems are online, but during an outage it has the potential to make things worse. Total Uptime Technologies’ dual-stack Anycast network provides a quadruple level of redundancy that also makes outages completely transparent to the end user.
The primary reason for avoiding Unicast altogether is that any long-lived, persistent TCP transaction would not be re-routed in the event of an outage because the IP address would be inaccessible. Even in simpler applications such as DNS, where transactions (in this case queries) are very short-lived and resolvers generally try additional name servers if the first one fails, a downed Unicast node causes ‘time-outs’ of up to 5 seconds while resolvers ‘rotate’ through the list. This does not stop until the Unicast node is back online or the authoritative name server is changed or removed from the root servers, a change that can take up to 48 hours to propagate across the Internet. Anycast completely solves this problem by ensuring that the IP address given for a name server always routes and resolves to a functioning server or cluster.
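To put rough numbers on the DNS case, the sketch below models a resolver that tries name servers in order and waits out a timeout on each one that does not answer. The 5-second timeout comes from the figure above; the 30 ms round-trip time and the function name `lookup_ms` are assumptions for illustration, and real resolvers vary in their timeout and rotation behaviour.

```python
def lookup_ms(servers_up, timeout_ms=5000, rtt_ms=30):
    """Milliseconds until the first answer when servers are tried in order.

    servers_up is a list of booleans, one per name server, True if it answers.
    Returns None if no server answers.
    """
    elapsed = 0
    for up in servers_up:
        if up:
            return elapsed + rtt_ms
        elapsed += timeout_ms
    return None


# Unicast: the first name server on the list is down, the second is up.
print(lookup_ms([False, True]))  # 5030

# Anycast: each address keeps routing to a functioning node.
print(lookup_ms([True, True]))   # 30
```

The gap between the two results is the user-visible delay that repeats for every lookup that happens to try the downed Unicast server first.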
A properly built, well-maintained and proactively monitored Anycast network is the only way to go. Designed and operated that way, it greatly improves the performance and resiliency of a cloud network.