Multisite Deployment of NSX-T Data Center Part 1 | LAB2PROD
NSX-T Active-Active Multisite in a Single Region and Failover to a Secondary Region Part 1.
A client required a multisite deployment of NSX-T Data Center in an Active-Active configuration within a single rack: two top-of-rack (ToR) switches, each configured with BGP and peering with the NSX-T Edge Nodes. They also required a backup/DR rack to fail over to with minimal intervention and minimal disruption to the dataplane. The setup required local egress within the active rack, so that as little traffic as possible is sent across racks.
I explored three possible ways to configure multisite:
- Active-Standby
- Active-Active
- Active-Active in a single site or region, with failover to a secondary region
This post is split into two parts. This part covers the topologies that were considered; the second part will focus on the configuration of the chosen topology.
Below are quick summaries of the above topologies.
1. Active-Standby NSX-T Data Center Multisite Deployment;
This multisite deployment of NSX-T Data Center uses a T0 gateway in Active-Standby mode, which effectively places the Edge VMs in Active-Standby as well. Refer to the image below.
In this NSX-T multisite deployment scenario, should the primary site fail, the NSX-T Edge Cluster and dataplane will fail over to the secondary site, and the standby Edge VM will become active. Workload failover is beyond the scope of this article, but it must still be planned for.
2. Active-Active NSX-T Data Center Multisite Deployment with no secondary site;
In this multisite deployment of NSX-T Data Center, we consider an Active-Active topology. However, this isn’t how one would traditionally envisage an Active-Active site functioning. There would be two T0s, each with its own Edge Cluster, and both with segments either attached to them directly or attached to a T1 that is then linked to the T0. Each site advertises different subnets via eBGP, or the subnets are made reachable through static routing. Above this, clients may choose some form of application-layer load balancing using a GSLB or any other mechanism they deem appropriate. If you are interested in configuring a GSLB, have a look at this article that I wrote: Multi-Cloud GSLB using NSX Advanced Load Balancer. During a site failure, depending on which site fails, the active node of that site’s NSX-T Edge Cluster would fail over to the other site. Until the failed site is brought back online, all traffic for the segments that were in the failed site will pass through the second site. Refer to the image below.
3. Active-Active NSX-T Data Center Multisite Deployment with DR in a secondary rack;
In this multisite deployment of NSX-T Data Center, we look at the configuration needed to enable an Active-Active T0 gateway while controlling where the Edge VMs are placed and where dataplane traffic ingresses and egresses. Generally, a single rack/site deployment is easy: either there is a single rack for all appliances, or there is more than one rack and no need to control where traffic ingresses and egresses.
However, for this NSX-T Edge Cluster failure scenario, there were two racks in a single site, each with its own ToRs with routing enabled and uplinks to the network core. Dataplane traffic had to ingress and egress through the active rack and fail over to the backup rack only if the active rack failed, with as little manual intervention as possible to minimize dataplane downtime.
The T0 will peer upstream with the ToRs. Whilst this satisfies the minimal-downtime requirement, it does not on its own keep dataplane traffic egressing locally in the active rack. This is because the ToRs, and through dynamic routing the core, have learnt the routes from the peers in both ‘sites’. The physical fabric sees the paths as being the same length and will therefore balance across all of them. We now need to make the active site the preferred route. This can be done by prepending the AS path in a route map and attaching it as the out filter on the BGP neighbors pointing to the second rack’s ToRs. We will also need to configure local preference using route maps and attach these to the peers as well; this tells NSX-T which paths to prefer for its own outbound traffic.
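To make the effect of these two knobs concrete, here is a minimal Python sketch of a heavily simplified BGP best-path decision. It models only two tie-breakers (highest local preference, then shortest AS path) and ignores weight, origin, MED and the other steps a real implementation applies. It also assumes the fabric is allowed to load-balance across paths of equal length from different neighbor ASes. The AS numbers, prefixes and peer labels are hypothetical and are not taken from the client environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str          # hypothetical label for the peer the route came from
    as_path: tuple         # AS numbers, nearest AS first
    local_pref: int = 100  # commonly used default local preference


def best_paths(routes):
    """Return the routes that tie for best after two simplified steps:
    1) keep the highest local preference, 2) keep the shortest AS path."""
    top_pref = max(r.local_pref for r in routes)
    candidates = [r for r in routes if r.local_pref == top_pref]
    shortest = min(len(r.as_path) for r in candidates)
    return [r for r in candidates if len(r.as_path) == shortest]


NSX_AS = 65000  # hypothetical AS number for the T0 gateway

# The core's view of an NSX-T segment, learnt via each rack's ToR.
without_prepend = [
    Route("10.0.10.0/24", "rack-a-tor", (65001, NSX_AS)),
    Route("10.0.10.0/24", "rack-b-tor", (65002, NSX_AS)),
]

# Same view after the T0 prepends its own AS twice towards rack B's ToRs.
with_prepend = [
    Route("10.0.10.0/24", "rack-a-tor", (65001, NSX_AS)),
    Route("10.0.10.0/24", "rack-b-tor", (65002, NSX_AS, NSX_AS, NSX_AS)),
]

# The T0's view of a default route, with an inbound route map raising the
# local preference of routes learnt from the active rack (rack A).
t0_view = [
    Route("0.0.0.0/0", "rack-a-tor", (65001,), local_pref=200),
    Route("0.0.0.0/0", "rack-b-tor", (65002,), local_pref=100),
]

# If rack A fails, its routes are withdrawn and rack B is all that remains.
after_rack_a_failure = [r for r in with_prepend if r.next_hop != "rack-a-tor"]

print([r.next_hop for r in best_paths(without_prepend)])       # both racks tie
print([r.next_hop for r in best_paths(with_prepend)])          # rack A only
print([r.next_hop for r in best_paths(t0_view)])               # rack A only
print([r.next_hop for r in best_paths(after_rack_a_failure)])  # rack B only
```

In this model the prepend steers traffic coming in from the fabric, while local preference steers traffic leaving the T0. In both directions the backup rack's path stays in the table and takes over once the active rack's routes are withdrawn, which is what keeps manual intervention out of the failover.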
Below is a diagram of this topology. Keep in mind that I replicated this environment in my lab; a production environment would generally have redundancy built in at each layer.
This concludes the architecture and topology discussion for NSX-T multisite. In the next part we will walk through the configuration for the third scenario.