Luca Salvatore
Last month, we proudly launched our 11th datacenter in Toronto, Canada. Building new datacenters is becoming a pretty common occurrence for us; we launched three last year, two this year, and there are plenty more coming in the near future.
This means that building a DC has become a repeatable task, and repeatable tasks are begging to be automated. This post is the story of how we've built our last few datacenters without needing to manually log in to the majority of our devices.
There is a fair amount of effort that goes into building a network in a brand new location, not to mention the tight timeline. It’s critical that the network is up and running early in the build process; without it, there’s no connectivity and our platform engineers can’t come in and build the hypervisors.
In any new deployment, there are typically around 50 new switches to configure. Most switches have an identical configuration (except for some unique things, like the management IP address), and the new switches will almost always need their software updated to our standard version.
In our early days, deploying a new network meant logging into every switch via the console port, pasting a config from a template, and then upgrading the software. With so many switches to build, it was time consuming and — let’s face it — pretty boring. The whole process was in need of a total overhaul.
For our automated network deployment to work, we have to address a chicken-and-egg problem: there needs to be some form of networking already in place so the new switches can download their updated code and grab their configuration templates.
As a result, a small part of the network still does need to be built by hand. This is typically a small-ish firewall connected to what we call our “out of band” (OOB) internet link, plus a few switches to provide connectivity to the management ports of our switches. These devices have a very basic configuration, so it’s easy to copy and paste it and get some initial connectivity.
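To give a sense of how small that hand-built footprint is, the bootstrap config is only a handful of lines per device. A Junos-style sketch, with a made-up host name and illustrative addresses:

set system host-name oob-mgmt-sw1
set interfaces vlan unit 0 family inet address 10.200.72.130/26
set routing-options static route 0.0.0.0/0 next-hop 10.200.72.129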
Additionally, we need to know the MAC address of each switch, which is printed on the side of the chassis. Fortunately, we have a fantastic datacenter team that flies all over the world to do all the physical labor involved in deploying a new location. These folks have racking and stacking down to a fine art, and part of their process is to record the MAC address of each switch as they rack it, saving the list to a file for use later on.
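The exact format of that file doesn't matter much. For illustration, picture a simple CSV with one row per switch; the first row below matches the dhcpd.conf example later in this post, and the second is made up:

hostname,mac,mgmt_ip
tor1-spine1,5C:45:27:23:2F:01,10.200.72.138
tor1-spine2,5C:45:27:23:2F:02,10.200.72.139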
The actual automation of the building process is known as Zero Touch Provisioning (ZTP). Most major networking vendors have some form of ZTP support, and the process is pretty simple. There are a few specific configurations needed on the ZTP server to make everything work.
First, we need a DHCP server. We use good old ISC DHCP running on an Ubuntu server, and configure it to give each switch the information it needs once it boots up. This is the top of our `dhcpd.conf` file:
# Option 150 carries the file server address; the ZTP option space
# carries the image name, config name, and transfer mode (option 43).
option ztp-file-server code 150 = { ip-address };
option space ZTP;
option ZTP.image-file-name code 0 = text;
option ZTP.config-file-name code 1 = text;
option ZTP.image-file-type code 2 = text;
option ZTP.transfer-mode code 3 = text;
option ZTP-encap code 43 = encapsulate ZTP;

# Global values shared by every switch: the file server, the software
# image to install, and the protocol used to fetch it.
option ztp-file-server 10.126.1.1;
option ZTP.image-file-name "/software/switch-image-file.tgz";
option ZTP.transfer-mode "http";
This basically tells a switch what it needs to know: where the file server lives, which software image to download, and which protocol to use for the transfer.
The next bit of the `dhcpd.conf` file looks similar to this:
group {
    # One host block per switch: the MAC address from the chassis label
    # identifies the switch, and ZTP.config-file-name points it at its
    # own configuration template.
    host tor1-spine1 {
        hardware ethernet 5C:45:27:23:2F:01;
        fixed-address 10.200.72.138;
        option routers 10.200.72.129;
        option subnet-mask 255.255.255.192;
        option ZTP.config-file-name "/tor1-spine1.config";
    }
}
This is where the MAC address from the side of each switch's chassis comes into play. We need each switch to pull down the correct configuration template, so the MAC address is used to identify it. The `dhcpd.conf` file has an entry like the one above for every single switch that we want to ZTP.
Because creating an entry for 50 or so switches by hand would be pretty annoying, we automate this too, using a simple Python script that spits out the appropriate `dhcpd.conf` file containing all the correct MAC addresses and IP addresses.
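That script is nothing fancy. A minimal sketch of the idea, assuming the hypothetical CSV format shown earlier (the gateway and netmask are the illustrative values from the snippet above):

import csv
import sys

# One host block per switch, mirroring the dhcpd.conf entry shown above.
HOST_TEMPLATE = """\
    host {hostname} {{
        hardware ethernet {mac};
        fixed-address {mgmt_ip};
        option routers 10.200.72.129;
        option subnet-mask 255.255.255.192;
        option ZTP.config-file-name "/{hostname}.config";
    }}
"""

def main(csv_path):
    with open(csv_path) as f:
        hosts = [HOST_TEMPLATE.format(**row) for row in csv.DictReader(f)]
    # Wrap every host entry in a single group, as in the snippet above.
    print("group {")
    print("".join(hosts), end="")
    print("}")

if __name__ == "__main__":
    main(sys.argv[1])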
For this process to be fully automated, each new switch needs to have a configuration template ready to go. To make this happen, we use the Jinja2 templating engine and some Python, which makes it easy to create a whole bunch of templates quickly. We create a template for every device that is going to be deployed and upload the templates to the ZTP server.
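As a sketch of that templating step (the `switch.j2` file name and its variables are hypothetical, not our actual templates):

import csv
import sys

from jinja2 import Environment, FileSystemLoader

def main(csv_path):
    # Load Jinja2 templates from a local "templates" directory.
    env = Environment(loader=FileSystemLoader("templates"))
    template = env.get_template("switch.j2")  # hypothetical template file
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            # Render one config per device, named to match the
            # ZTP.config-file-name entry in dhcpd.conf.
            config = template.render(hostname=row["hostname"],
                                     mgmt_ip=row["mgmt_ip"])
            with open(f"{row['hostname']}.config", "w") as out:
                out.write(config)

if __name__ == "__main__":
    main(sys.argv[1])

A template like `switch.j2` holds the configuration common to every switch, with placeholders such as `{{ hostname }}` and `{{ mgmt_ip }}` standing in for the per-device values.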
The switch boots up and sends out a DHCP request, which the OOB firewall relays to the ZTP server. The switch then grabs its config template, downloads its software, and that’s it!
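On the firewall side, that relay is only a line or two of configuration. A hedged sketch, assuming a Junos-based firewall (the interface name is illustrative; 10.126.1.1 is the ZTP server from the `dhcpd.conf` snippet above):

set forwarding-options helpers bootp server 10.126.1.1
set forwarding-options helpers bootp interface ge-0/0/1.0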
Here is the console output from a real Juniper QFX switch going through the process:
root>
Auto Image Upgrade: DHCP Client Bound interfaces:
Auto Image Upgrade: DHCP Client Unbound interfaces: irb.0 vme.0 et-0/0/0.0 et-0/0/1.0 et-0/0/2.0 et-0/0/3.0 et-0/0/4.0 et-0/0/5.0 et-0/0/6.0 et-0/0/7.0 et-0/0/8.0 et-0/0/9.0 et-0/0/10.0 et-0/0/11.0 et-0/0/12.0 et-0/0/13.0 et-0/0/14.0 et-0/0/15.0 et-0/0/16.0 et-0/0/17.0 et-0/0/18.0 et-0/0/19.0 et-0/0/20.0 et-0/0/21.0 et-0/0/22.0 et-0/0/23.0 et-0/1/0.0 et-0/1/1.0 et-0/1/2.0 et-0/1/3.0 et-0/2/0.0 et-0/2/1.0
Auto Image Upgrade: No DHCP Client in bound state, reset all enabled DHCP clients
Auto Image Upgrade: DHCP Options for client interface vme.0:
    ConfigFile: /nyc3-spine3.config
    ImageFile: /jinstall-qfx-5-13.2X51-D35.3-domestic-signed.tgz
    Gateway: 10.198.73.129
    File Server: 10.1.2.3
    Options state: All options set
Auto Image Upgrade: DHCP Client Bound interfaces: vme.0
Auto Image Upgrade: Active on client interface: vme.0
Auto Image Upgrade: Interface:: "vme"
Auto Image Upgrade: Server:: "10.1.2.3"
Auto Image Upgrade: Image File:: "jinstall-qfx-5-13.2X51-D35.3-domestic-signed.tgz"
Auto Image Upgrade: Config File:: "nyc3-spine3.config"
Auto Image Upgrade: Gateway:: "10.198.73.129"
Auto Image Upgrade: Protocol:: "http"
Auto Image Upgrade: Start fetching nyc3-spine3.config file from server 10.1.2.3 through vme using http
Auto Image Upgrade: File nyc3-spine3.config fetched from server 10.1.2.3 through vme
Auto Image Upgrade: Start fetching jinstall-qfx-5-13.2X51-D35.3-domestic-signed.tgz file from server 10.1.2.3 through vme using http
WARNING!!! On successful image installation, system will reboot automatically
With the old process, it would take a full day of work to build 50 switches. With the new process, it takes 5 minutes, and the longest part is just waiting for the switch to reboot for its software update.
Instead of manually logging into each device, we now set up a ZTP server, upload the configuration templates, then sit back and watch the network build itself.