August 15, 2018

Instantly Scalable Load Balancing

  Announcements, Load Balancing, Networking, Technology


Today we've added support for multiple origin servers with load balancing. We balance the load at the edge of our CDN by extending our existing routing software, which means we can do it with no additional resources and no warm-up or adjustment period when a zone's traffic changes. In other words: instantly scalable load balancing.

This feature is included with all accounts at no additional charge.

Under zone settings you'll now see a list of the origin servers you've added, along with the option to add more.

The order of the servers in the list isn't important: requests will be evenly balanced between the servers. In this post, I'm going to describe our (very simple) load balancing algorithm and why we're using it.

IP Addresses

First let's talk about what an IPv4 (or IPv6) address is. The familiar dotted decimal format (e.g., 192.0.2.1) is a human-friendly representation of a four-byte unsigned integer (an IPv6 address is, likewise, a 16-byte integer). Each of the four numbers corresponds to one of the four bytes in the integer. (This also explains why each number is between 0 and 255: 255 is the maximum value of a single unsigned byte.) With a bit of simple math, we can convert the IPv4 address from dotted decimal into an integer:

192.0.2.1 = (first octet * 256³) + (second octet * 256²) + (third octet * 256) + (fourth octet)
192.0.2.1 = (first octet * 16777216) + (second octet * 65536) + (third octet * 256) + (fourth octet)
192.0.2.1 = (192 * 16777216) + (0 * 65536) + (2 * 256) + (1)
192.0.2.1 = 3221225985

The address 192.0.2.1 is 3,221,225,985.
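If it helps to see that conversion in code, here is a minimal Go sketch (illustrative only, not our routing software) that mirrors the octet arithmetic above:

package main

import (
    "fmt"
    "strconv"
    "strings"
)

// ipv4ToUint32 converts a dotted decimal IPv4 string into the 32-bit
// unsigned integer it represents, one octet (byte) at a time.
func ipv4ToUint32(addr string) (uint32, error) {
    parts := strings.Split(addr, ".")
    if len(parts) != 4 {
        return 0, fmt.Errorf("not an IPv4 address: %q", addr)
    }
    var n uint32
    for _, p := range parts {
        octet, err := strconv.ParseUint(p, 10, 8) // each octet must fit in 0-255
        if err != nil {
            return 0, err
        }
        n = n<<8 | uint32(octet) // shift left one byte, then append the octet
    }
    return n, nil
}

func main() {
    n, _ := ipv4ToUint32("192.0.2.1")
    fmt.Println(n) // prints 3221225985
}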

Load Balancing

To balance requests, take the remainder (r) of the IP number divided by the number of servers (n):

r = 3221225985 % n
1 = 3221225985 % 4

Because the remainder ranges from 0 to n - 1, we add one to map it onto the 1-indexed server list. With a remainder of 1, this traffic is assigned to the 2nd server in the list.
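In code, the whole routing decision is one line of arithmetic. A sketch in Go (again, illustrative rather than our production code):

package main

import "fmt"

// pickServer maps an IPv4 address (already converted to a 32-bit integer)
// to a 1-indexed position in the origin server list.
func pickServer(ipNumber, serverCount uint32) uint32 {
    r := ipNumber % serverCount // remainder in the range 0 to serverCount-1
    return r + 1                // add one so the result is 1-indexed
}

func main() {
    fmt.Println(pickServer(3221225985, 4)) // prints 2: 192.0.2.1 goes to the 2nd server
}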

Advantages

So why did we choose this over one of the other load balancing algorithms?

  • Consistency: Our research indicated many users were using their load balancer to segment their traffic, and they needed the same client to always go to the same server. Using the client IP ensures that occurs.
  • No state: This requires us to store no information about the client, their previous connections, or previous routing information.
  • Resource usage: The routing decision is just a few integer operations, which the CPU can execute quickly and cheaply.
  • Simplicity: The math can be understood easily. Complexity should be avoided wherever possible.
  • Determinism: The result of the operation is always the same, no matter which edge PoP the client connects to. In a distributed system, every node must reach the same decision independently, so randomness is not an option.

Simpler than Round Robin?

Those of you who know your load balancing algorithms are probably asking: wouldn't round robin be even simpler? After all, it doesn't require even the simple math described above. If NuevoCloud were a traditional CDN, you would be absolutely correct.

However NuevoCloud isn't a traditional CDN. It's a distributed system. When a connection is made between the edge PoP and your server, at least one additional edge PoP is recruited to participate in the connection. Additional PoPs may also participate, depending on network conditions.

While the first PoP makes the initial decision to route the connection to server N, it cannot communicate that decision upstream. It has information about the state of the network, but because the network is spread across the world, that information is always out of date to some degree. As the request travels to PoPs closer to the origin server, more up-to-date information becomes available, so upstream PoPs must be free to change the routing decision. That is fine as long as every PoP would come to the same decision given the same information. This deterministic decision making is what ensures the connection makes it to the destination.

The problem with round robin is that, as far as any other PoP is concerned, it is random. The result depends on the number of requests the PoP has already routed to the origin, a number that changes rapidly and would be impossible to communicate to every other PoP. To the rest of the network, it is indistinguishable from a random number, so no other PoP can determine how a request will be routed. If the initial PoP decides to route the connection to server A, the next PoP in the route may decide to route it to server B, the next back to A, and so on. With round robin, it's pure luck whether the request makes it to the destination.
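To make that concrete, here is a toy Go comparison (the per-PoP counters are hypothetical, purely for illustration): two PoPs that have each handled a different number of requests will disagree under round robin, while the IP modulo always produces the same answer.

package main

import "fmt"

const serverCount = 4

// ipModulo is the deterministic choice: any PoP can compute it from the
// client IP alone and will get the same answer.
func ipModulo(ipNumber uint32) uint32 {
    return ipNumber%serverCount + 1
}

// roundRobin depends on a per-PoP request counter that no other PoP can see.
func roundRobin(counter *uint32) uint32 {
    *counter++
    return (*counter-1)%serverCount + 1
}

func main() {
    ip := uint32(3221225985) // 192.0.2.1

    // Two PoPs that have routed different numbers of requests so far.
    popA, popB := uint32(17), uint32(42)

    fmt.Println(ipModulo(ip), ipModulo(ip))           // 2 2: the PoPs agree
    fmt.Println(roundRobin(&popA), roundRobin(&popB)) // 2 3: the PoPs disagree
}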

So while round robin is simpler for a traditional CDN to provide, it is deceptively complex to implement in a distributed system.

The Future

We view this as a starting point for our load balancing software. There are many improvements we could add, including geolocation, balanced queuing, weighting, and, yes, even round robin. If your use case requires one of these, we would love to hear from you.

