Segregated Routing

For any network that provides routing services to customers, it is important to segregate those customers into separate virtual topologies that don’t interfere with each other.

Network Virtualization

This post is not about NFV, but it is important to understand the place of VRFs in the wonderful world of network virtualization and overlays.

Network virtualization implies resource separation. There are several ways to achieve this:

  1. Virtual router as a guest OS on a hypervisor, with its own kernel. This is commonly used to integrate vendor products (routers, firewalls, load balancers etc) into cloud networks. Nowadays pretty much every networking vendor has a virtual version of their product, and there are also open source virtual routers (e.g. VyOS).
  2. Use Linux namespaces to create separate logical resource spaces for networking/PID/mount points etc on a shared kernel. This is how virtual routers in Openstack work for example.
  3. Containers – these also use namespaces to separate resources, but allow installing custom software in different container instances while reusing the kernel of the host OS. Containers are widely used to run all sorts of applications; it is not uncommon to see a container on a hardware router. Some vendors release their OS as a container (e.g. Arista cEOS), which is used on whitebox switches and for network simulation/labs. Containerized open source BGP implementations (e.g. Bird) are also used for public services like Looking Glasses.
  4. Vendor-specific technologies like Cisco VDC on Nexus 7k or SDR on CRS. Those are less common nowadays (although VDC used to be quite popular).

What we network engineers usually have to deal with is Virtual Routing and Forwarding (VRF): a virtual routing table that is separated from other VRFs, although it is still possible to leak routes between VRFs when needed. It is different from all the virtualization technologies described above, as separation happens only at the routing level [1].

While reading this article before publishing I noticed it’s a bit chaotic as it jumps between different technologies and features, but I still decided to keep it as is, to have a good reference for various use cases of Route Distinguishers and Route Targets. I mostly use Cisco IOS for ISP scenarios (MPLS, MVPN etc) and Arista EOS for DC scenarios (VXLAN, EVPN).

VRF-lite and leaking

VRFs were originally designed for MPLS L3VPN [RFC4364], so they always required L3VPN-related config (RD, RT…see below in this post), and leaking routes between different VRFs on a standalone router required MP-BGP and import/export policies.

Nowadays there are many applications of VRF, sometimes without any L3VPN or EVPN – so-called VRF-lite. For instance, I have one router with VRFs “one” and “two” and want to leak static routes between them without enabling BGP or adding any L3VPN config. Example on EOS:

R1#show ip route vrf one static
 S        100.1.1.1/32 [1/0] via 10.0.0.1, Ethernet1

R1#show ip route vrf two static
 S        200.2.2.2/32 [1/0] via 10.1.1.3, Ethernet2

Adding config:

route-map PASS permit 10
!
router general
   vrf one
      leak routes source-vrf two subscribe-policy PASS
   !
   vrf two
      leak routes source-vrf one subscribe-policy PASS

Now routes are leaked between VRFs.

R1#show ip route vrf one static
 S        100.1.1.1/32 [1/0] via 10.0.0.1, Ethernet1
 S L      200.2.2.2/32 [1/0] (source VRF two) via 10.1.1.3, Ethernet2 (egress VRF two)

R1#show ip route vrf two static
 S L      100.1.1.1/32 [1/0] (source VRF one) via 10.0.0.1, Ethernet1 (egress VRF one)
 S        200.2.2.2/32 [1/0] via 10.1.1.3, Ethernet2

Overlay networks

It is possible to connect VRFs on different routers using an overlay across a core network that is not VRF-aware. This has been used for many years in ISP networks (MPLS L3VPN) and, more recently, in DC networks (EVPN – with either MPLS or VXLAN encapsulation). Basic VRF config on IOS looks something like:

vrf definition one
 rd 1:1
 route-target export 1:1
 route-target import 1:1
 !
 address-family ipv4
 exit-address-family

This is a typical example you find in most config guides. It has those strange “Route Distinguisher” and “Route Target” settings, often set to the same value, which doesn’t make much sense to many people. There are in fact a lot of things you can do with those.

Route Distinguisher

RD is prepended to the prefix in BGP updates, so that each BGP NLRI can be uniquely identified by the RD:prefix combination. Since different VRFs have different RDs, they can reuse the same IP ranges without any conflict.

RFC4364#section-4.2 defines 3 RD types:

  • Type 0: 2-byte ASN:Admin assigned number
  • Type 1: IP address:Admin assigned number
  • Type 2: 4-byte ASN:Admin assigned number

Admin assigned number uniquely identifies the VRF. While the ASN or IP address value can be set to any value as it is not verified anywhere, the main idea is that type 0 or type 2 RD for a given VRF will be the same for all PE routers in the same AS while type 1 RD will be unique per PE.
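To make the three layouts more tangible, here is a minimal Python sketch that packs an RD string into its 8-byte form (2-byte type followed by a 6-byte value). The function name `encode_rd` is my own, and the sketch assumes plain-decimal ASNs; it also treats any ASN that fits in 2 bytes as type 0, which is a simplification.

```python
import ipaddress
import struct


def encode_rd(rd: str) -> bytes:
    """Pack an RD string into 8 bytes: 2-byte type + 6-byte value."""
    admin, _, number = rd.rpartition(":")
    assigned = int(number)
    if "." in admin:
        # Type 1: 4-byte IPv4 address + 2-byte assigned number
        ip = int(ipaddress.IPv4Address(admin))
        return struct.pack("!HIH", 1, ip, assigned)
    asn = int(admin)
    if asn <= 0xFFFF:
        # Type 0: 2-byte ASN + 4-byte assigned number
        return struct.pack("!HHI", 0, asn, assigned)
    # Type 2: 4-byte ASN + 2-byte assigned number
    return struct.pack("!HIH", 2, asn, assigned)


print(encode_rd("1:1").hex())        # 0000000100000001
print(encode_rd("1.1.1.1:1").hex())  # 0001010101010001
```

Note that the admin assigned number gets 4 bytes only in type 0; types 1 and 2 leave 2 bytes for it.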

In topologies without multi-homed CE, RD type doesn’t matter at all. With multi-homed CE, not using a unique RD per PE will lead to all sorts of problems. What is written below applies to L3VPN as well as L2 or L3 EVPN.

Dual-homed CE

Consider the following topology:

Fig. 1

CE1 is multihomed to PE1 and PE2. Both PE use the same RD, and a route reflector is on the path. Following the usual BGP rules, RR will prefer one route and advertise it to PE3.

RR#sh bgp vpnv4 unicast all 5.5.5.5/32
BGP routing table entry for 1:1:5.5.5.5/32, version 684
Paths: (2 available, best #2, no table)
  Advertised to update-groups:
     1         
  Refresh Epoch 1
  200, (Received from a RR-client)
    2.2.2.2 (metric 10) (via default) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, valid, internal
      Extended Community: SoO:200:1 RT:1:1
      mpls labels in/out nolabel/63
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  200, (Received from a RR-client)
    1.1.1.1 (metric 10) (via default) from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:1:1
      mpls labels in/out nolabel/62
      rx pathid: 0, tx pathid: 0x0

There are multiple problems now:

  1. No ECMP load sharing
  2. Slow convergence after PE failure
  3. Suboptimal routing in some topologies

PE3 won’t be able to do BGP multipath, since it receives only one route from RR. Even if no multipath is intended and one PE is supposed to forward all traffic while the second is standby, convergence with type 0 RD will be slower: should PE1 fail, it will take some time for RR to advertise the new best route via PE2 to PE3. In scaled topologies with a lot of routes convergence can take up to several minutes.

Suboptimal routing won’t occur on figure 1, but consider the following topology (red numbers are IGP costs):

Fig. 2

If BGP best path selection relies on IGP cost, RR will advertise only the route via PE1 while for PE3 the best route would be via PE2.

Okay, back to figure 1 – now I use RD type 1 – 1.1.1.1:1 on PE1 and 2.2.2.2:1 on PE2.

RR#sh bgp vpnv4 unicast all 5.5.5.5/32
BGP routing table entry for 1.1.1.1:1:5.5.5.5/32, version 690
Paths: (1 available, best #1, no table)
  Advertised to update-groups:
     1         
  Refresh Epoch 1
  200, (Received from a RR-client)
    1.1.1.1 (metric 10) (via default) from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Extended Community: SoO:200:1 RT:1:1
      mpls labels in/out nolabel/63
      rx pathid: 0, tx pathid: 0x0
BGP routing table entry for 2.2.2.2:1:5.5.5.5/32, version 691
Paths: (1 available, best #1, no table)
  Advertised to update-groups:
     1         
  Refresh Epoch 1
  200, (Received from a RR-client)
    2.2.2.2 (metric 10) (via default) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Extended Community: SoO:200:1 RT:1:1
      mpls labels in/out nolabel/65
      rx pathid: 0, tx pathid: 0x0

Since from the RR perspective those are 2 different NLRI, each can be selected as best and propagated to PE3. Now PE3 can do multipath, and if either PE1 or PE2 fails, convergence will depend on the core IGP withdrawing the route to the BGP next hop – which is much faster (sub-second) if the IGP is tuned properly, and does not depend on the number of BGP routes.
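The difference between the two outputs can be reduced to a toy model: the RR keeps one best path per NLRI, and the NLRI key includes the RD. The Python sketch below is only an illustration; `reflect` is a hypothetical helper, and "lowest next hop wins" stands in for the real BGP decision process.

```python
from collections import defaultdict


def reflect(paths):
    """Return what a route reflector would advertise: one best path per NLRI.

    Each path is a tuple (rd, prefix, next_hop). Best-path selection is
    simplified to "lowest next hop string wins", which is enough for toy data.
    """
    by_nlri = defaultdict(list)
    for rd, prefix, next_hop in paths:
        by_nlri[(rd, prefix)].append(next_hop)
    return {nlri: min(hops) for nlri, hops in by_nlri.items()}


# Same RD on both PE: one NLRI, so PE3 receives a single path.
shared_rd = [("1:1", "5.5.5.5/32", "1.1.1.1"), ("1:1", "5.5.5.5/32", "2.2.2.2")]
# Unique RD per PE: two NLRI, so PE3 receives both paths.
unique_rd = [("1.1.1.1:1", "5.5.5.5/32", "1.1.1.1"), ("2.2.2.2:1", "5.5.5.5/32", "2.2.2.2")]

print(len(reflect(shared_rd)))  # 1
print(len(reflect(unique_rd)))  # 2
```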

In order to achieve similar behaviour in regular IPv4 BGP, they had to invent add path, or use other tricks like shadow RR; and to solve the suboptimal routing problem – use either add path or Optimal Route Reflection (ORR), with RR running SPF on behalf of the remote PE in order to determine which route to advertise.

In L3VPN, this problem has been solved from day one with unique RD per PE, and no extra features are needed. Add path is sometimes useful for L3VPN nevertheless, but in different scenarios – e.g. BGP PIC edge.

The same considerations also apply to L3 EVPN, or L2 VXLAN-EVPN with MLAG, and since typical topologies where EVPN is deployed rely heavily on ECMP, type 1 RD has become mandatory in EVPN designs.

Strictly speaking, in pure L2 EVPN with multihoming (with type 1 and 4 routes, not MLAG), until recently there was no condition when multiple PE advertise the same NLRI, so theoretically RD uniqueness didn’t matter. draft-rbickhart-evpn-ip-mac-proxy-adv improves EVPN convergence [2] by making multiple PE advertise the same MAC-IP route, so for this to actually work, the PE must use different RDs.

Auto RD

Just a minor config optimization. Since pretty much every router has a unique router ID, it makes sense to automatically derive RD from it. Vendors do this with type 1 RD. Take the following EVPN MAC-VRF config on EOS:

router bgp 65001
   router-id 1.1.1.1
   !
   vlan 100
      rd auto

This will assign type 1 RD 1.1.1.1:100 to vlan 100. For VRFs, an internal VRF ID can be used for admin assigned number – its value doesn’t really matter, as long as it is unique.
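The derivation itself is trivial, which is the whole point. A minimal Python sketch, assuming the admin assigned number is simply the vlan ID as in the EOS example (`auto_rd` is a hypothetical name, not a vendor API):

```python
import ipaddress


def auto_rd(router_id: str, admin_number: int) -> str:
    """Build a type 1 RD string from a router ID and an assigned number."""
    ipaddress.IPv4Address(router_id)  # raises ValueError if not a valid IPv4 address
    if not 0 <= admin_number <= 0xFFFF:
        # Type 1 RD leaves only 2 bytes for the assigned number
        raise ValueError("assigned number must fit in 2 bytes")
    return f"{router_id}:{admin_number}"


print(auto_rd("1.1.1.1", 100))  # 1.1.1.1:100
```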

Route Target

Route target is an extended BGP community which in theory can be appended to any NLRI in any address family. In L3VPN, EVPN, MVPN and other similar services it is used to control which routes should be imported into a VRF. In the simplest use case, an RT is configured per VRF. For example, on figure 3, routes of the customer in VRF “one” are exported with RT 100:1 whereas routes of the customer in VRF “two” are marked with RT 100:2. With this config, the remote PE will import those routes into the respective VRF.

Fig. 3

Sample config on IOS:

vrf definition one
 rd 1.1.1.1:1
 route-target export 100:1
 route-target import 100:1
 !
 address-family ipv4
 exit-address-family
!
vrf definition two
 rd 1.1.1.1:2
 route-target export 100:2
 route-target import 100:2
 !
 address-family ipv4
 exit-address-family
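The import decision on the receiving PE boils down to a set intersection: a route is imported into a VRF if at least one of its RTs matches one of the VRF's import RTs. A minimal Python sketch of that rule, using the two VRFs from the config above (the function names are mine, for illustration only):

```python
# Import RTs per VRF, taken from the sample config above.
VRFS = {"one": {"100:1"}, "two": {"100:2"}}


def imports(vrf_import_rts, route_rts):
    """A VRF imports a route if the two RT sets share at least one value."""
    return bool(set(vrf_import_rts) & set(route_rts))


def target_vrfs(route_rts):
    """Return the names of all local VRFs that would import the route."""
    return sorted(name for name, rts in VRFS.items() if imports(rts, route_rts))


print(target_vrfs({"100:1"}))           # ['one']
print(target_vrfs({"100:1", "100:2"}))  # ['one', 'two']
print(target_vrfs({"100:3"}))           # []
```

A route can carry several RTs and a VRF can import several RTs, which is what the more interesting designs below rely on.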

Similar to RD, RT can also be encoded in 3 ways:

  • Type 0: 2-byte ASN:Admin assigned number
  • Type 1: IP address:Admin assigned number
  • Type 2: 4-byte ASN:Admin assigned number

Type 0 is the most common.

It is possible to use route targets to achieve various routing policies.

Hub and spoke routing

Fig. 4

On figure 4, all CE belong to the same customer, but I want to route traffic between any CE via the Hub router. There can be various reasons to do this, such as:

  • Underlay topology with high-bandwidth links only between Hub and Spoke
  • Spoke routers have low memory capacity and can’t store many routes
  • Firewall or NAT functionality enabled on Hub router

RT import/export policies on Hub:

vrf definition HUB
 rd 4.4.4.4:1
 route-target export 100:100
 route-target import 100:1
 !
 address-family ipv4
 exit-address-family

On Spokes:

vrf definition SPOKE
 rd 1.1.1.1:1
 route-target export 100:1
 route-target import 100:100
 !
 address-family ipv4
 exit-address-family

With this config, all routes advertised by Spoke routers will be exported with RT 100:1 so that only Hub will import them. And the other way around – routes advertised by Hub are exported with RT 100:100 and imported by Spokes. At the very minimum, this can be only one (default) route.

Spoke#sh ip ro vrf SPOKE | in 0.0.0.0/0
B*    0.0.0.0/0 [200/0] via 4.4.4.4, 00:10:25
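To double-check the asymmetric policy, it helps to model who imports whose routes. The Python sketch below uses the RT values from the Hub and Spoke configs; the names SPOKE1 and SPOKE2 are hypothetical labels for two spoke PE that both run the VRF “SPOKE” config shown above.

```python
# Export/import RTs from the Hub and Spoke configs above.
# SPOKE1 and SPOKE2 are made-up labels for two PE with identical spoke policy.
VRFS = {
    "HUB": {"export": {"100:100"}, "import": {"100:1"}},
    "SPOKE1": {"export": {"100:1"}, "import": {"100:100"}},
    "SPOKE2": {"export": {"100:1"}, "import": {"100:100"}},
}


def importers(origin):
    """Return the VRFs that import routes exported by the origin VRF."""
    exported = VRFS[origin]["export"]
    return sorted(
        name
        for name, vrf in VRFS.items()
        if name != origin and vrf["import"] & exported
    )


print(importers("SPOKE1"))  # ['HUB']
print(importers("HUB"))     # ['SPOKE1', 'SPOKE2']
```

Spoke routes never land in another Spoke VRF, so spoke-to-spoke traffic can only follow what the Hub advertises.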

It is important that the default route on Hub is advertised into MPLS VPN with per-VRF label:

Hub#sh mpls for | in 0.0.0.0/0
28         No Label   0.0.0.0/0[V]     0             aggregate/HUB

If this route is originated on the Hub router itself, a per-VRF label will be used regardless of what label allocation mode is active. But if there is another CE router advertising default route, per-prefix or per-CE label mode in such a Hub and Spoke topology can result in suboptimal forwarding or blackholes. I wrote about label allocation modes in https://routingcraft.net/close-to-the-edge/#Label_allocation_modes

Internet access in L3VPN

What if I want traffic between CE to flow directly (not through the hub), but also want to advertise a default route from the ISP into the L3VPN?

Fig. 5

On figure 5, all CE routers are connected to VRF “one”, and use RT 100:1 in both import and export policies. But also on PE4 I have created VRF “internet”:

vrf definition internet
 rd 4.4.4.4:666
 route-target export 100:666
 route-target import 100:1
 !
 address-family ipv4
 exit-address-family
!
vrf definition one
 rd 4.4.4.4:1
 route-target export 100:1
 route-target import 100:1
 route-target import 100:666
 !
 address-family ipv4
 exit-address-family

On all PE (including PE4) I have added import RT 100:666 so that they all import the default route advertised by PE4.

PE1# sh ip ro vrf one
B*    0.0.0.0/0 [200/0] via 4.4.4.4, 00:27:05
      6.0.0.0/32 is subnetted, 1 subnets
B        6.6.6.6 [20/0] via 172.16.0.6, 01:17:11
      7.0.0.0/32 is subnetted, 1 subnets
B        7.7.7.7 [200/0] via 2.2.2.2, 00:29:54
      8.0.0.0/32 is subnetted, 1 subnets
B        8.8.8.8 [200/0] via 3.3.3.3, 00:29:54
      9.0.0.0/32 is subnetted, 1 subnets
B        9.9.9.9 [200/0] via 4.4.4.4, 00:27:05
      11.0.0.0/32 is subnetted, 1 subnets
B        11.11.11.11 [200/0] via 4.4.4.4, 00:27:05
      172.16.0.0/16 is variably subnetted, 6 subnets, 2 masks
C        172.16.0.0/24 is directly connected, Ethernet1/0
L        172.16.0.1/32 is directly connected, Ethernet1/0
B        172.16.1.0/24 [200/0] via 2.2.2.2, 00:29:54
B        172.16.2.0/24 [200/0] via 3.3.3.3, 00:29:54
B        172.16.4.0/24 [200/0] via 4.4.4.4, 00:27:05
B        172.16.5.0/24 [200/0] via 4.4.4.4, 00:27:05

If no route filtering or summarization is used, so that all CE routes are known to all PE in VRF “one”, it is safe to use any label allocation mode on PE4.

Firewall in L3VPN

Fig. 6

What if traffic between CE1 and CE2 must go through a firewall? On PE3 I create 2 VRFs fw_int and fw_ext where the respective firewall interfaces in internal and external security zones are connected (can be different subinterfaces on the same link).

vrf definition fw_int
 rd 3.3.3.3:13
 route-target export 100:13
 route-target import 100:13
 !
 address-family ipv4
 exit-address-family
!
vrf definition fw_ext
 rd 3.3.3.3:37
 route-target export 100:37
 route-target import 100:37
 !
 address-family ipv4
 exit-address-family

Now on PE1 I import and export the RT corresponding to the internal security zone:

vrf definition one
 rd 1.1.1.1:1
 route-target export 100:13
 route-target import 100:13
 !
 address-family ipv4
 exit-address-family

And on PE2 – to the external zone:

vrf definition one
 rd 2.2.2.2:1
 route-target export 100:37
 route-target import 100:37
 !
 address-family ipv4
 exit-address-family

Traceroute from CE1 to CE2:

CE1#traceroute 7.7.7.7
Type escape sequence to abort.
Tracing the route to 7.7.7.7
VRF info: (vrf in name/id, vrf out name/id)
  1 172.16.0.1 [AS 100] 0 msec 0 msec 0 msec
  2 10.0.0.5 [MPLS: Labels 17/27 Exp 0] 0 msec 1 msec 1 msec
  3 172.16.13.3 [MPLS: Label 27 Exp 0] 0 msec 1 msec 0 msec
  4 172.16.13.13 1 msec 1 msec 1 msec
  5 172.16.37.3 1 msec 1 msec 0 msec
  6 10.2.2.5 [MPLS: Labels 19/23 Exp 0] 2 msec 2 msec 1 msec
  7 172.16.1.2 [AS 1337] [MPLS: Label 23 Exp 0] 1 msec 1 msec 1 msec
  8 172.16.1.7 [AS 1337] 2 msec *  4 msec

Traffic flow is shown on figure 6. Hop 3 is PE3 (VRF fw_int), hop 4 – firewall internal interface, hop 5 – PE3 (VRF fw_ext) and then all the way through core and PE2 to CE2. By the way, in real networks ICMP processing on firewalls is often disabled or broken, so traceroute will probably show just stars after hop 3; and with per-CE and per-prefix label allocation modes even after hop 1 (https://routingcraft.net/close-to-the-edge/#ICMP_tunneling explains why).

A similar design with firewall can be used in DC networks with VXLAN-EVPN. In that case, a pair of switches is designated as a “service leaf”, so firewalls and other service appliances are connected to it. Then traffic can be steered through the firewall either using RT configuration like shown above, or with other technologies like Macro Segmentation. The latter involves some API interaction between the firewall and switches so that only traffic flows matching certain security rules are redirected to the firewall.

E-Tree

This is similar to hub and spoke routing shown above, but now for L2VPN. RFC7796 describes E-tree for VPLS and RFC8317 – for EVPN.

Fig. 7

In this example, PE1 is Root PE; PE2 and PE3 are Leaf PE routers. Config examples on EOS.

PE1:

router bgp 100
   router-id 1.1.1.1
   !
   vlan 100
      rd 1.1.1.1:1
      route-target import 100:100
      route-target import 100:666
      route-target export 100:666
      redistribute learned

PE2 and PE3:

vlan 100
   e-tree role leaf
!      
router bgp 100
   router-id 2.2.2.2
   !
   vlan 100
      rd 2.2.2.2:1
      route-target import 100:666
      route-target export 100:100
      redistribute learned

Leaf PE export leaf RT 100:100 and import root RT 100:666. Root PE exports 100:666 and imports both leaf and root RT. Besides, local bridging of packets arriving from MPLS core back to MPLS core is disabled on Leaf PE to avoid L2 loops. On Root PE arriving packets are bridged and can be sent back to MPLS, as shown on figure 7.

RT constraint

Consider the following topology:

Fig. 8

PE1 and PE2 have only VRF “one” with RT 100:1, PE4 has only VRF “two” with RT 100:2, PE3 has both VRFs configured. All those PE peer with a route reflector to distribute routing information. What happens is that the RR advertises routes for both VRFs to all PE, regardless of whether they need those routes or not. Therefore, PE1 and PE2 will just discard received routes with RT 100:2 while PE4 discards routes with RT 100:1, because they don’t have any VRFs importing those RT. Sending extra updates to routers that don’t need them negatively affects BGP scalability.

We can check what routes RR sends to each PE. For example, what RR sends to PE1:

RR#sh bgp vpnv4 unicast all neighbors 1.1.1.1 advertised-routes
BGP table version is 141, local router ID is 5.5.5.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1.1.1.1:1
 *>i  6.6.6.6/32       1.1.1.1                  0    100      0 200 i
 *>i  172.16.0.0/24    1.1.1.1                  0    100      0 ?
Route Distinguisher: 2.2.2.2:1
 *>i  7.7.7.7/32       2.2.2.2                  0    100      0 200 i
 *>i  172.16.1.0/24    2.2.2.2                  0    100      0 ?
Route Distinguisher: 3.3.3.3:1
 *>i  8.8.8.8/32       3.3.3.3                  0    100      0 200 i
 *>i  172.16.2.0/24    3.3.3.3                  0    100      0 ?
Route Distinguisher: 3.3.3.3:2
 *>i  10.10.10.10/32   3.3.3.3                  0    100      0 300 i
 *>i  172.16.3.0/24    3.3.3.3                  0    100      0 ?
Route Distinguisher: 4.4.4.4:2
 *>i  11.11.11.11/32   4.4.4.4                  0    100      0 300 i
 *>i  172.16.5.0/24    4.4.4.4                  0    100      0 ?

Note updates from VRF “two” – PE1 will just discard those.

With Route Target Constraint [RFC4684], a PE can advertise to RR what RT it wants to receive. Then RR will send only the routes with desired RT to the PE.

Config added to RR:

router bgp 100
 bgp router-id 5.5.5.5
!
 address-family rtfilter unicast
  neighbor 1.1.1.1 activate
  neighbor 1.1.1.1 send-community extended
  neighbor 1.1.1.1 route-reflector-client
  neighbor 1.1.1.1 default-originate
  neighbor 2.2.2.2 activate
  neighbor 2.2.2.2 send-community extended
  neighbor 2.2.2.2 route-reflector-client
  neighbor 2.2.2.2 default-originate
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 send-community extended
  neighbor 3.3.3.3 route-reflector-client
  neighbor 3.3.3.3 default-originate
  neighbor 4.4.4.4 activate
  neighbor 4.4.4.4 send-community extended
  neighbor 4.4.4.4 route-reflector-client
  neighbor 4.4.4.4 default-originate

New config on PE1 (similar on all other PE):

router bgp 100
 bgp router-id 1.1.1.1
 !
 address-family rtfilter unicast
  neighbor 5.5.5.5 activate
  neighbor 5.5.5.5 send-community extended

Note “default-originate” on RR – this is because RR wants to receive routes with all RT from PE. It is also possible to filter routes that PE sends to RR, and use different RR to reflect routes of different VRFs.

Now PE1, PE2 and PE3 advertise that they want to receive routes with RT 100:1; PE3 and PE4 advertise that they want to receive routes with RT 100:2. RR wants to receive routes with any RT.

RR#show bgp rtfilter  unicast rt 100:1
BGP routing table entry for 100:2:100:1, version 3
Paths: (3 available, best #2)
  Advertised to update-groups:
     1         
  Refresh Epoch 1
  Local, (Received from a RR-client)
    2.2.2.2 (metric 10) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  Local, (Received from a RR-client)
    1.1.1.1 (metric 10) from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  Local, (Received from a RR-client)
    3.3.3.3 (metric 10) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, internal
      rx pathid: 0, tx pathid: 0


RR#show bgp rtfilter  unicast rt 100:2
BGP routing table entry for 100:2:100:2, version 4
Paths: (2 available, best #2)
  Advertised to update-groups:
     1         
  Refresh Epoch 1
  Local, (Received from a RR-client)
    4.4.4.4 (metric 10) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  Local, (Received from a RR-client)
    3.3.3.3 (metric 10) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, internal, best
      rx pathid: 0, tx pathid: 0x0


RR#show bgp rtfilter  unicast default
BGP routing table entry for 0:0:0:0, version 2
Paths: (1 available, no best path)
  Advertised to update-groups:
     1         
  Refresh Epoch 1
  Local, (default-originate)
    0.0.0.0 from 0.0.0.0 (5.5.5.5)
      Origin IGP, localpref 100, external
      Community: no-export
      rx pathid: 0, tx pathid: 0x0

Therefore, RR advertises fewer prefixes to PE1 now:

RR#sh bgp vpnv4 unicast all neighbors 1.1.1.1 advertised-routes
BGP table version is 161, local router ID is 5.5.5.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1.1.1.1:1
 *>i  6.6.6.6/32       1.1.1.1                  0    100      0 200 i
 *>i  172.16.0.0/24    1.1.1.1                  0    100      0 ?
Route Distinguisher: 2.2.2.2:1
 *>i  7.7.7.7/32       2.2.2.2                  0    100      0 200 i
 *>i  172.16.1.0/24    2.2.2.2                  0    100      0 ?
Route Distinguisher: 3.3.3.3:1
 *>i  8.8.8.8/32       3.3.3.3                  0    100      0 200 i
 *>i  172.16.2.0/24    3.3.3.3                  0    100      0 ?
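The filtering logic on the RR can be sketched in a few lines of Python. The RT membership per PE comes from the text above; the RT attached to each route is my assumption based on the RD (x.x.x.x:1 for VRF “one” with RT 100:1, x.x.x.x:2 for VRF “two” with RT 100:2), since the advertised-routes output does not show communities. The sketch does not model split horizon, update groups or the default RT membership sent by the RR.

```python
# RT membership each PE advertises to the RR.
MEMBERSHIP = {
    "PE1": {"100:1"},
    "PE2": {"100:1"},
    "PE3": {"100:1", "100:2"},
    "PE4": {"100:2"},
}

# (RD, prefix, RT) for the VPNv4 routes in the RR table; RT per route is assumed.
ROUTES = [
    ("1.1.1.1:1", "6.6.6.6/32", "100:1"),
    ("1.1.1.1:1", "172.16.0.0/24", "100:1"),
    ("2.2.2.2:1", "7.7.7.7/32", "100:1"),
    ("2.2.2.2:1", "172.16.1.0/24", "100:1"),
    ("3.3.3.3:1", "8.8.8.8/32", "100:1"),
    ("3.3.3.3:1", "172.16.2.0/24", "100:1"),
    ("3.3.3.3:2", "10.10.10.10/32", "100:2"),
    ("3.3.3.3:2", "172.16.3.0/24", "100:2"),
    ("4.4.4.4:2", "11.11.11.11/32", "100:2"),
    ("4.4.4.4:2", "172.16.5.0/24", "100:2"),
]


def advertised_to(pe):
    """Return the (RD, prefix) pairs whose RT the given PE asked for."""
    wanted = MEMBERSHIP[pe]
    return [(rd, prefix) for rd, prefix, rt in ROUTES if rt in wanted]


print(len(advertised_to("PE1")))  # 6
print(len(advertised_to("PE3")))  # 10
print(len(advertised_to("PE4")))  # 4
```

The six routes for PE1 are the same six prefixes as in the output above.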

RTC and VPNv4 updates illustrated (VPNv4 updates from PE to RR not included for brevity):

Fig. 9

RTC for legacy PE

If a PE does not support RTC but RR does, it is still possible to use it. For example, PE1 does not support RTC. RR (in this example IOS) allows interworking with legacy PE. Topology is the same as on figure 9, but RTC has been disabled between RR and PE1. Config added to RR:

router bgp 100
!
 address-family vpnv4
  neighbor 1.1.1.1 accept-route-legacy-rt

Now configuring legacy RT on PE1 gets a bit complicated. There are multiple steps here:

  1. Create a dummy “RT filter” VRF – it can be without any interfaces
  2. Export at least one route from that VRF with a special standard community 4294901762 (for VPNv4) or 4294901764 (for VPNv6) and the list of RT we want the RR to advertise to us.
    1. Since this is a VPNv4/VPNv6 route and we don’t want other routers to import our “dummy” route into any of their VRFs, the RT will be translated – see below
  3. Set communities “no-export” and “no-advertise” when advertising the dummy route to the RR
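The two numeric values in step 2 are easier to recognize in hexadecimal. They are 32-bit values, which is the size of a standard BGP community, and both sit in the 65535:x range. A small Python helper (hypothetical name `community_parts`) shows the conversion:

```python
def community_parts(value: int) -> str:
    """Show a 32-bit community value in hex and in high:low notation."""
    return f"0x{value:08X} = {value >> 16}:{value & 0xFFFF}"


print(community_parts(4294901762))  # 0xFFFF0002 = 65535:2
print(community_parts(4294901764))  # 0xFFFF0004 = 65535:4
```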

Excerpt from PE1 config:

vrf definition RTFILTER
 rd 1.1.1.1:666
 !
 address-family ipv4
  export map RTFILTER
  exit-address-family
!
ip route vrf RTFILTER 6.6.6.6 255.255.255.255 Null0
!
ip prefix-list RTFILTER seq 5 permit 6.6.6.6/32
!
route-map RTFILTER permit 10
 set community 4294901762 additive
 set extcommunity rt 0.100.0.0:1 additive
!
route-map RR_OUT permit 10
 match ip address prefix-list RTFILTER
 set community no-export no-advertise additive
!
route-map RR_OUT permit 20
!
router bgp 100
 bgp router-id 1.1.1.1
 neighbor 5.5.5.5 remote-as 100
 neighbor 5.5.5.5 update-source Loopback0
 !
 address-family vpnv4
  neighbor 5.5.5.5 activate
  neighbor 5.5.5.5 send-community both
  neighbor 5.5.5.5 route-map RR_OUT out
 exit-address-family
 !
 address-family ipv4 vrf RTFILTER
  redistribute static
 exit-address-family

Note how type 0 RT 100:1 became type 1 RT 0.100.0.0:1 – this translation is required to make sure no other router imports 6.6.6.6 into their VRF. Check out draft-ietf-idr-legacy-rtc-08#section-3.1 for translation rules.
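In this particular example, the translated RT carries exactly the same six value bytes as the original one; only the type changes, so the bytes are read as "IP address:number" instead of "ASN:number". The Python sketch below verifies that for 100:1 and 0.100.0.0:1. It is just an observation on this example, with helper names of my own; the draft remains the authoritative source for the translation rules.

```python
import ipaddress
import struct


def rt_type0_value(asn: int, number: int) -> bytes:
    """6-byte value of a type 0 RT: 2-byte ASN + 4-byte assigned number."""
    return struct.pack("!HI", asn, number)


def rt_type1_value(ip: str, number: int) -> bytes:
    """6-byte value of a type 1 RT: 4-byte IPv4 address + 2-byte assigned number."""
    return struct.pack("!IH", int(ipaddress.IPv4Address(ip)), number)


def value_as_type1(value: bytes) -> str:
    """Read a 6-byte RT value using the type 1 layout."""
    ip, number = struct.unpack("!IH", value)
    return f"{ipaddress.IPv4Address(ip)}:{number}"


print(rt_type0_value(100, 1).hex())             # 006400000001
print(rt_type1_value("0.100.0.0", 1).hex())     # 006400000001
print(value_as_type1(rt_type0_value(100, 1)))   # 0.100.0.0:1
```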

Upon receiving the route for 6.6.6.6/32 from the legacy PE, RR decodes the route target and understands what RTs it should advertise to that PE.

RR#sh bgp vpnv4 unicast rd 1.1.1.1:666 detail                  

Route Distinguisher: 1.1.1.1:666
BGP routing table entry for 1.1.1.1:666:6.6.6.6/32, version 190
  Paths: (1 available, best #1, no table, not advertised to any peer)
  Not advertised to any peer
  Refresh Epoch 1
  Local, (Received from a RR-client)
    1.1.1.1 (metric 10) (via default) from 1.1.1.1 (1.1.1.1)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Community: 4294901762 no-export no-advertise
      Extended Community: RT:0.100.0.0:1
      mpls labels in/out nolabel/24
      rx pathid: 0, tx pathid: 0x0

In this case, routes with RT 100:1 are sent to PE1.

RR#sh bgp vpnv4 unicast all neighbors 1.1.1.1 advertised-routes
BGP table version is 190, local router ID is 5.5.5.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 2.2.2.2:1
 *>i  7.7.7.7/32       2.2.2.2                  0    100      0 200 i
 *>i  172.16.1.0/24    2.2.2.2                  0    100      0 ?
Route Distinguisher: 3.3.3.3:1
 *>i  8.8.8.8/32       3.3.3.3                  0    100      0 200 i
 *>i  172.16.2.0/24    3.3.3.3                  0    100      0 ?

Remark: the reason why fewer routes are advertised to PE1 than in the previous example with proper RTC is that PE1 is now in a different update group than PE2 and PE3. This is very specific to Cisco IOS; other BGP implementations implement outbound update optimizations in different ways. This also means RTC is not taken into account when forming update groups on IOS.

Auto RT in EVPN

While auto RD is an easy config optimization as RD values don’t matter as long as they are unique, RT values are actually important for export/import policies.

Auto RT works only for L2 EVPN. The first part of auto RT equals the ASN; the second part is generated differently depending on what encapsulation is used. RFC7432#section-7.10.1 defines automatic derivation of RT from vlan ID (for MPLS-EVPN), and RFC8365#section-5.1.2.1 defines auto RT derived from VNI (for VXLAN-EVPN).

As long as the PE on different sites use the same vlan (in MPLS-EVPN) or the same VNI (in VXLAN-EVPN), they will generate the same RT. Example on Arista EOS:

router bgp 65001
   !
   vlan 100
      rd auto
      route-target export auto
      route-target import auto 65001
      redistribute learned


Leaf1#show bgp evpn route-type imet rd 1.1.1.1:100 de
BGP routing table information for VRF default
Router identifier 1.1.1.1, local AS number 65001
BGP routing table entry for imet 1.1.1.1, Route Distinguisher: 1.1.1.1:100
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community: Route-Target-AS:65001:268535556 TunnelEncap:tunnelTypeVxlan
      VNI: 100100

That big number after ASN is 268435456 (0x10000000 – 4th bit set to 1 means it is VXLAN) + 100100 (VNI).
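The arithmetic can be reproduced with a short Python sketch. The field layout used here (1-bit A flag set to 0 for auto-derived, 3-bit Type, 4-bit domain ID, 24-bit service ID) is my reading of RFC8365#section-5.1.2.1, so treat the constants as assumptions and check them against the RFC; the function name is made up.

```python
VXLAN_TYPE = 1  # Type field value for VXLAN, as I read RFC 8365 section 5.1.2.1


def auto_rt_local_admin(service_id: int, rt_type: int = VXLAN_TYPE, domain_id: int = 0) -> int:
    """Build the 4-byte local administrator field of an auto-derived RT.

    Layout (most significant bit first): A flag (1 bit, 0 = auto-derived),
    Type (3 bits), domain ID (4 bits), service ID (24 bits, e.g. the VNI).
    """
    if not 0 <= service_id < 1 << 24:
        raise ValueError("service ID must fit in 24 bits")
    if not 0 <= rt_type < 1 << 3:
        raise ValueError("type must fit in 3 bits")
    if not 0 <= domain_id < 1 << 4:
        raise ValueError("domain ID must fit in 4 bits")
    return (rt_type << 28) | (domain_id << 24) | service_id


asn, vni = 65001, 100100
print(f"{asn}:{auto_rt_local_admin(vni)}")  # 65001:268535556
```

The result matches the Route-Target-AS value in the output above.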

Different PE can use different ASN: either in Inter-AS deployments, or in DC networks with unique private ASN per Leaf (RFC7938#section-5.2.1) – in that case you might have to configure multiple auto RT import statements.

router bgp 65001
   vlan 100
      rd auto
      route-target export auto
      route-target import auto 65004
      route-target import auto 65002
      route-target import auto 65001
      route-target import auto 65003
      redistribute learned

In the example above MAC-VRF corresponds to vlan – and the vlan ID (or VNI ID mapped to vlan) is used to generate auto RT. There are other MAC-VRF types in EVPN, such as Vlan Bundle and Vlan Aware Bundle. In that case, one MAC-VRF includes a lot of vlans. EVPN RFCs don’t specify how to generate auto RT specifically in those cases, but some implementations (e.g. JUNOS) still do it per vlan or per VNI – therefore, EVPN routes from different vlans in the same MAC-VRF will have different RT.

It is also possible to integrate auto RT with RT constraint.

Fig. 10

All PE have the same Vlan Aware Bundle MAC-VRF, but different vlans enabled. With a unique auto RT per vlan and RT constraint enabled, each PE can tell the RR which vlans it wants to receive routes for. This design makes it possible to gain the benefits of both Vlan Aware Bundle (configuration simplicity) and Vlan MAC-VRF (control-plane scalability).

RT propagation from CE to PE

Normally all configuration related to RT is done on PE. There is a common misconception among network engineers that RT is somehow specific to MP-BGP. In fact it is just an extended community which can be advertised with any BGP route, in any address family. When BGP is used as PE-CE routing protocol (which is the most common), in theory CE can pass route targets to PE. RFC4364 section 7.4(d) describes some ideas on how this can be implemented.

Since L3VPN is typically used to provide services to different customers, and CE and PE are often managed by different parties, it is easy to dismiss the idea of CE influencing PE VRF import/export policies as a big security risk. Therefore, in most (all?) L3VPN implementations PE discard all RT received from CE.

Figure 11 shows a scenario with L3 EVPN used for DCI. It is something like Inter-AS VPN option A but with only one BGP session and one VRF between the DCI routers. This assumes no overlapping IP ranges in VRFs, and everything controlled by one party.

Fig. 11

On Arista EOS it is possible to disable filtering of RT received from CE. In each DC there are a lot of VRFs, but instead of running a BGP session per VRF on DCI link, only one BGP session in one VRF is used. DC2-DCI passes all C-routes with respective RT to DC1-DCI, which then exports them into EVPN so that on DC1-Leaf each route will be imported into its own VRF. I use VXLAN and EVPN in this example, but it works in the same way with MPLS-EVPN or L3VPN (or a mix of all of the above).

Excerpt from DC1-DCI config:

interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vrf DCI vni 1337
!   
router bgp 100
   !
   vrf DCI
      rd 2.2.2.2:100
      route-target import evpn 1:1
      route-target import evpn 2:2
      route-target export evpn 1337:1337
      neighbor 172.16.100.3 route-target export evpn ipv4 filter disabled
      neighbor 172.16.100.3 remote-as 200
      neighbor 172.16.100.3 send-community extended

RTs 1:1 and 2:2 are from VRFs which are not configured on the DCI switches; there can be thousands of those VRFs in each DC. DC2-DCI imports all those routes into VRF DCI and advertises them to DC1-DCI, which, due to disabled RT filtering (not to be confused with RT constraint reviewed above), exports them into EVPN. Other switches in DC1, which have VRFs with the respective RT, will import those routes.

DC1-DCI#sh ip bgp 172.16.3.0/24 vrf DCI
BGP routing table information for VRF DCI
Router identifier 172.16.100.2, local AS number 100
BGP routing table entry for 172.16.3.0/24
 Paths: 1 available
  200
    172.16.100.3 from 172.16.100.3 (172.16.100.3)
      Origin IGP, metric 0, localpref 100, IGP metric 0, weight 0, received 01:49:14 ago, valid, external, best
      Extended Community: Route-Target-AS:1:1 TunnelEncap:tunnelTypeVxlan EvpnRouterMac:50:13:00:17:27:cc
      Rx SAFI: Unicast


DC1-Leaf#sh ip ro vrf one 172.16.3.0/24
 B I      172.16.3.0/24 [200/0] via VTEP 2.2.2.2 VNI 1337 router-mac 50:13:00:fa:37:ae
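The effect of disabling the RT filter can be modelled as a toy sketch. All names here are hypothetical, VRF `two` is invented for illustration, and the assumption that received RTs are kept alongside the VRF's own export RT is mine; the outputs above only confirm that RT 1:1 survives on DC1-DCI.

```python
# Toy model of RT handling on a DCI switch; not EOS behaviour in detail.
def rts_on_export(received_rts, vrf_export_rts, keep_received):
    """RTs attached when a route learned from a neighbor is exported into EVPN."""
    exported = set(vrf_export_rts)
    if keep_received:
        # Assumption: RTs received from the neighbor are preserved on export.
        exported |= set(received_rts)
    return exported


def importing_vrfs(route_rts, vrf_imports):
    """Names of VRFs whose import RTs match at least one RT on the route."""
    return sorted(name for name, rts in vrf_imports.items() if rts & set(route_rts))


leaf_vrfs = {"one": {"1:1"}, "two": {"2:2"}}  # import RTs on DC1-Leaf (assumed)
received = {"1:1"}                            # RT carried by 172.16.3.0/24 from DC2-DCI

print(importing_vrfs(rts_on_export(received, {"1337:1337"}, True), leaf_vrfs))   # ['one']
print(importing_vrfs(rts_on_export(received, {"1337:1337"}, False), leaf_vrfs))  # []
```

With filtering left enabled, only 1337:1337 would reach the leaf and no tenant VRF would import the route.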

Traffic flow:

Fig. 12

There can be multiple reasons to run this design instead of end-to-end VXLAN or MPLS: for instance, between DC1 and DC2 there can be an IP network which doesn’t support MTU above 1500 so running extra encapsulation over it becomes impractical. The obvious caveat here is that IP address spaces across VRF cannot overlap, but in DC networks this is seldom a problem.

Of course, apart from this DCI scenario, in usual EVPN or L3VPN designs it is also possible to configure an extcommunity list (see note 3) on CE so that it will pass a specific set of RT to the PE.

CE(config-route-map-SET_RT)#set extcommunity rt 100:100

This way it is possible to influence VRF import policies from CE. An alternative (and much more cumbersome) configuration would be advertising standard communities from CE and letting the PE map them to RTs as per configured policy.

EVPN ES import Route Target

Despite the name, it has nothing to do with RT reviewed above. In EVPN multihoming, PE attached to the same Ethernet Segment (ES) use this extcommunity in type 4 routes to figure out they are indeed on the same ES.

Fig. 13

Host-facing interface config on both Leaf-1 and Leaf-2:

interface Port-Channel10
   switchport trunk allowed vlan 10
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 00de:adbe:efca:fe00:0000
      route-target import de:ad:be:ef:ca:fe
   lacp system-id dead.beef.1337


Leaf-1#show bgp evpn route-type ethernet-segment esi 00de:adbe:efca:fe00:0000 detail
BGP routing table information for VRF default
Router identifier 1.1.1.1, local AS number 65001
BGP routing table entry for ethernet-segment 00de:adbe:efca:fe00:0000 1.1.1.1, Route Distinguisher: 1.1.1.1:1
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community: TunnelEncap:tunnelTypeVxlan EvpnEsImportRt:de:ad:be:ef:ca:fe
BGP routing table entry for ethernet-segment 00de:adbe:efca:fe00:0000 2.2.2.2, Route Distinguisher: 2.2.2.2:1
 Paths: 1 available
  65101 65002
    2.2.2.2 from 5.5.5.5 (5.5.5.5)
      Origin IGP, metric -, localpref 100, weight 0, valid, external, best
      Extended Community: TunnelEncap:tunnelTypeVxlan EvpnEsImportRt:de:ad:be:ef:ca:fe

Based on ES Import RT, both PE can elect the DF and also apply the split-horizon rule for replicated BUM packets.

Leaf-1#show bgp evpn instance vlan 10
EVPN instance: VLAN 10
  Route distinguisher: 0:0
  Service interface: VLAN-based
  Local IP address: 1.1.1.1
  Encapsulation type: VXLAN
  Local ethernet segment:
    ESI: 00de:adbe:efca:fe00:0000
      Interface: Port-Channel10
      Mode: all-active
      State: up
      ESI label:
      ES-Import RT: de:ad:be:ef:ca:fe
      Designated forwarder: 1.1.1.1
      Non-Designated forwarder: 2.2.2.2
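As a rough sketch of how the election could produce this result, assume the default modulo-based procedure from RFC7432#section-8.5: candidate PE addresses are sorted in increasing order and the DF for a vlan is the one at index (vlan mod N). The function and variable names below are hypothetical, matching on the ES Import RT alone is a simplification, and implementations may use other election algorithms.

```python
import ipaddress


def elect_df(es_routes, es_import_rt, vlan_id):
    """Pick the DF among PEs whose type 4 routes carry our ES Import RT."""
    candidates = sorted(
        {ip for ip, rt in es_routes if rt == es_import_rt},
        key=lambda ip: int(ipaddress.ip_address(ip)),
    )
    if not candidates:
        raise ValueError("no PE found for this Ethernet Segment")
    # Default procedure (assumed): ordinal = vlan mod number of PEs on the ES.
    return candidates[vlan_id % len(candidates)]


routes = [
    ("2.2.2.2", "de:ad:be:ef:ca:fe"),
    ("1.1.1.1", "de:ad:be:ef:ca:fe"),
    ("3.3.3.3", "00:00:00:00:00:01"),  # a PE on some other ES, ignored
]
print(elect_df(routes, "de:ad:be:ef:ca:fe", 10))  # 1.1.1.1
print(elect_df(routes, "de:ad:be:ef:ca:fe", 11))  # 2.2.2.2
```

For vlan 10 and two PE, 10 mod 2 = 0 selects the lowest address, which agrees with the designated forwarder shown above.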

See also RFC7432#section-7.6

MVPN

Multicast VPN uses MP-BGP for auto-discovery and C-multicast signaling. If BGP does only auto-discovery, the role of RD and RT is pretty much the same as in regular L3VPN.

With C-multicast signaling, it gets more complicated. Just like regular PIM joins, MVPN joins are sent towards the RPF neighbour, which is determined by the unicast routing table.

RD and BGP C-multicast signaling

The basic purpose of RD in MVPN is the same as in L3VPN – to uniquely identify NLRI in BGP, in case C-mroute addressing overlaps. Besides, the RD value can be used in Upstream Multicast Hop (UMH) selection – used to prevent duplicate traffic.

What is very peculiar to MVPN is that when an R-PE originates type 6 – (*,G) join or type 7 – (S,G) join routes, it prepends them not with its own RD, but with the RD of S-PE that advertised unicast VPN routes.

Consider the following topology:

Fig. 14

Assuming the RP registration and SPT switchover work fine, PE2 wants to send a (S,G) join towards its RPF neighbour (which is PE1).

PE2#sh ip mro vrf one 233.1.1.1

(*, 233.1.1.1), 00:42:38/stopped, RP 7.7.7.7, flags: SJCg
  Incoming interface: Lspvif0, RPF nbr 1.1.1.1
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:42:37/00:02:31
(172.16.4.8, 233.1.1.1), 00:05:19/00:01:32, flags: JTgQ
  Incoming interface: Lspvif0, RPF nbr 1.1.1.1
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:05:19/00:02:31

This is the RPF route – a VPNv4 route received from PE1:

PE2#sh bgp vpnv4 unicast rd 1.1.1.1:1 172.16.4.8                                 
BGP routing table entry for 1.1.1.1:1:172.16.4.0/24, version 5
Paths: (1 available, best #1, no table)
Flag: 0x100
  Not advertised to any peer
  Refresh Epoch 1
  Local
    1.1.1.1 (metric 20) (via default) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:1:1 MVPN AS:100:0.0.0.0 MVPN VRF:1.1.1.1:1
      Originator: 1.1.1.1, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/27
      rx pathid: 0, tx pathid: 0x0

PE2 generates MVPN type 7 route with the RD of PE1:

PE2#sh bgp ipv4 mvpn rd 1.1.1.1:1 route-type 7 1.1.1.1:1 100 172.16.4.8 233.1.1.1
BGP routing table entry for [7][1.1.1.1:1][100][172.16.4.8/32][233.1.1.1/32]/22, version 64
Paths: (1 available, best #1, table MVPNv4-BGP-Table)
  Advertised to update-groups:
     1         
  Refresh Epoch 1
  Local
    0.0.0.0 from 0.0.0.0 (2.2.2.2)
      Origin incomplete, localpref 100, weight 32768, valid, sourced, local, best
      Extended Community: RT:1.1.1.1:1
      rx pathid: 1, tx pathid: 0x0

In a more complex topology with source or RP multihomed, the same considerations of RD assignment as in L3VPN apply. Without a unique RD per PE, if a RR is used, routing can be suboptimal and convergence can be slow.

Fig. 15

If PE1 and PE2 on figure 15 use the same RD, then besides the VPNv4 suboptimal routing issues (like on figures 1 and 2), MVPN joins from PE3 and PE4 will also arrive on only one S-PE. In practice this is less of a problem in MVPN (compared to L3VPN), because doing load balancing between S-PE in this topology creates a risk of duplicate traffic (see https://routingcraft.net/on-duplicates/#Upstream_multicast_hop_selection). Therefore, even with unique RD per PE, UMH most likely will choose only one S-PE.

What if the R-PE used their own RD in MVPN type 6 and 7 routes, instead of the RD of the S-PE? In the same scenario with non-unique RD, there would be a risk of either Receiver1 or Receiver2 not receiving any traffic, because the RR would propagate only one MVPN join (from either PE3 or PE4, but not both) to S-PE. Therefore, MVPN would be broken. This is why RD of S-PE are used in type 6 and 7 routes. See also RFC6513 for more details.

VRF route import

This is another extended community. It works in a way very similar to route targets, but is usually generated automatically. PE routers participating in MVPN attach this community to unicast VPN routes, so that if another PE uses that route for RPF check, it will set its Route Target for the MVPN routes to the VRF route import value of the unicast routes it uses for RPF. This also means that in order for a route to be eligible to be used for RPF, it must carry the VRF route import extended community.

Consider the same topology as on figure 14, and exactly the same outputs:

PE2#sh bgp vpnv4 unicast rd 1.1.1.1:1 172.16.4.8                                 
BGP routing table entry for 1.1.1.1:1:172.16.4.0/24, version 5
Paths: (1 available, best #1, no table)
Flag: 0x100
  Not advertised to any peer
  Refresh Epoch 1
  Local
    1.1.1.1 (metric 20) (via default) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:1:1 MVPN AS:100:0.0.0.0 MVPN VRF:1.1.1.1:1
      Originator: 1.1.1.1, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/27
      rx pathid: 0, tx pathid: 0x0

PE2#sh bgp ipv4 mvpn rd 1.1.1.1:1 route-type 7 1.1.1.1:1 100 172.16.4.8 233.1.1.1
BGP routing table entry for [7][1.1.1.1:1][100][172.16.4.8/32][233.1.1.1/32]/22, version 64
Paths: (1 available, best #1, table MVPNv4-BGP-Table)
  Advertised to update-groups:
     1         
  Refresh Epoch 1
  Local
    0.0.0.0 from 0.0.0.0 (2.2.2.2)
      Origin incomplete, localpref 100, weight 32768, valid, sourced, local, best
      Extended Community: RT:1.1.1.1:1
      rx pathid: 1, tx pathid: 0x0

The VRF route import value in this example is the same as the RD value just by coincidence. In fact it is automatically generated on the S-PE based on its loopback IP used for MVPN, and the internal VRF ID.

The idea here is to make sure only the S-PE (chosen by UMH) will import the MVPN route. This is why a unique RT value per [PE, VRF] is used here.
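The relationship between the two outputs can be summarised in a small Python sketch. It is not how any router implements this; the dictionary keys and the function name are invented for illustration, and the NLRI string simply mimics the IOS display format shown above.

```python
# Sketch: derive an MVPN type 7 (S,G) join from the unicast VPN route used for RPF.
def build_source_tree_join(rpf_route, source_as, source, group):
    vrf_route_import = rpf_route.get("vrf_route_import")
    if vrf_route_import is None:
        # Without this community the route is not eligible for RPF.
        raise ValueError("RPF route carries no VRF route import community")
    # The join takes the RD of the S-PE (copied from the unicast route), not the local RD.
    nlri = f"[7][{rpf_route['rd']}][{source_as}][{source}/32][{group}/32]/22"
    # The RT of the join is the VRF route import value, so only the chosen S-PE imports it.
    return {"nlri": nlri, "rt": vrf_route_import}


rpf_route = {
    "rd": "1.1.1.1:1",                # RD of PE1 on the VPNv4 route
    "vrf_route_import": "1.1.1.1:1",  # "MVPN VRF:1.1.1.1:1" in the output
}
join = build_source_tree_join(rpf_route, 100, "172.16.4.8", "233.1.1.1")
print(join["nlri"])  # [7][1.1.1.1:1][100][172.16.4.8/32][233.1.1.1/32]/22
print(join["rt"])    # 1.1.1.1:1
```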

It is also possible to use RT constraint (reviewed above) with VRF route import – to propagate MVPN routes only to the PE that actually need them. Unlike RTC in L3VPN and EVPN where it makes sense only when different PE have different sets of VRFs or vlans, in MVPN it can be useful even with only one VRF.

Fig. 16

On figure 16, both PE3 and PE4 have UMH configured so that they always prefer PE2. With both MVPN and RTC enabled, all PE advertise not only their RT (explicitly configured under VRF), but also VRF route import values.

RR receives VRF route import values from PE1 and PE2:

RR#sh bgp rtfilter  unicast all | sec 1.1.1.1:1|2.2.2.2:1
 *>i  100:258:1.1.1.1:1
                       1.1.1.1                  0    100  32768 i
 *>i  100:258:2.2.2.2:1
                       2.2.2.2                  0    100  32768 i

Now upon receiving MVPN type 6 and 7 routes with RT 2.2.2.2:1, RR will advertise them only to PE2.

RR#sh bgp ipv4 mvpn all neighbors 2.2.2.2 advertised-routes
---
 *>i  [6][2.2.2.2:1][100][7.7.7.7/32][233.1.1.1/32]/22
                       3.3.3.3                  0    100      0 ?
 *>i  [7][2.2.2.2:1][100][172.16.4.8/32][233.1.1.1/32]/22
                       3.3.3.3                  0    100      0 ?

These routes are advertised to PE2 but not to PE1.
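The RR's decision can be pictured as a set intersection between the RTs on a route and the RT membership each PE has advertised. The sketch below is a simplification with hypothetical names; the membership table is an assumption based on the configuration and outputs in this section (RT 1:1 under the VRF on every PE, plus a VRF route import value on each S-PE).

```python
# Sketch of RT constraint filtering on a route reflector.
def peers_for_route(route_rts, rt_membership):
    """Peers that advertised interest in at least one RT carried by the route."""
    return sorted(peer for peer, rts in rt_membership.items() if rts & set(route_rts))


membership = {
    "PE1": {"1:1", "1.1.1.1:1"},  # VRF RT + VRF route import
    "PE2": {"1:1", "2.2.2.2:1"},
    "PE3": {"1:1"},
    "PE4": {"1:1"},
}

print(peers_for_route({"2.2.2.2:1"}, membership))  # ['PE2']
print(peers_for_route({"1:1"}, membership))        # ['PE1', 'PE2', 'PE3', 'PE4']
```

An MVPN join with RT 2.2.2.2:1 goes only to PE2, while a unicast VPN route with RT 1:1 is still sent to every PE.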

VRF route import is also used in inter-AS and extranet scenarios. More details can be found in RFC6513.

VPN ID

A long time ago, RFC2685 defined VPN ID to uniquely identify a VPN (it didn’t even mention VRF). It seems to be more of an administrative value, not really related to any routing policies.

There is one peculiar scenario when VPN ID is actually used for routing. It is MVPN with mLDP as core protocol and inclusive PMSI (default MDT).

PE config:

vrf definition one
 rd 1.1.1.1:1
 vpn id 100:100
 !
 address-family ipv4
  mdt auto-discovery mldp
  mdt default mpls mldp 1.1.1.1
  mdt data mpls mldp 200
  mdt overlay use-bgp
  route-target export 1:1
  route-target import 1:1
 exit-address-family

The VPN ID value must be the same on all PE that have this VRF. Then they signal an MP2MP LSP using mLDP, and the VPN ID identifies the VRF.

PE1#sh mpls mldp database
  * Indicates MLDP recursive forwarding is enabled

LSM ID : 4 (RNR LSM ID: 5)   Type: MP2MP   Uptime : 00:09:03
  FEC Root           : 1.1.1.1 (we are the root)
  Opaque decoded     : [mdt 100:100 0]
  Opaque length      : 11 bytes
  Opaque value       : 02 000B 0001000000010000000000
  RNR active LSP     : (this entry)
  Upstream client(s) :
    None
      Expires        : N/A           Path Set ID  : 4
  Replication client(s):
    MDT  (VRF one)
      Uptime         : 00:09:03      Path Set ID  : 5
      Interface      : Lspvif0       
    5.5.5.5:0
      Uptime         : 00:08:58      Path Set ID  : 6
      Out label (D)  : 27            Interface    : Ethernet0/1*
      Local label (U): 29            Next Hop     : 10.0.0.5

This is opaque type 2 – other mLDP flavours (such as partitioned MDT or in-band signaling) use different opaque types which don’t use VPN ID. See also draft-bishnoi-mpls-mldp-opaque-types.
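To see where the VPN ID sits in the opaque value from the output above, here is a small decoder. It is a sketch under assumptions: the layout (1-byte type, 2-byte length, then a 7-byte VPN ID made of a 3-byte OUI and 4-byte index as in RFC2685, followed by a 4-byte MDT number) is inferred from this output, and so is the fact that IOS displays the VPN ID fields in hex. The function name is made up.

```python
# Sketch: decode the "Opaque value" line for mLDP opaque type 2 (layout assumed).
def decode_mdt_opaque(text):
    raw = bytes.fromhex(text.replace(" ", ""))
    opaque_type = raw[0]
    length = int.from_bytes(raw[1:3], "big")
    value = raw[3:3 + length]
    if opaque_type != 2 or len(value) != 11:
        raise ValueError("not an 11-byte type 2 opaque value")
    oui = int.from_bytes(value[0:3], "big")          # VPN ID, first 3 bytes
    index = int.from_bytes(value[3:7], "big")        # VPN ID, next 4 bytes
    mdt_number = int.from_bytes(value[7:11], "big")  # 0 in the output above
    return f"{oui:x}:{index:x}", mdt_number


print(decode_mdt_opaque("02 000B 0001000000010000000000"))  # ('100:100', 0)
```

The result lines up with the decoded form `[mdt 100:100 0]` printed by the router.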

Type 2 RD and legacy MVPN

Type 2 RD is used almost nowhere. There is one peculiar use case of type 2 RD in legacy MVPN, and I mean not just draft Rosen [RFC6037] (which is also legacy): even earlier there was a Cisco-proprietary solution. It carried the MDT source in L3VPN updates with type 2 RD and MDT extended community (also non-standard; see note 4).

Later IANA assigned SAFI 66 for MDT signaling, so MDT is not advertised anymore in L3VPN updates. But on IOS there is still functionality to translate MDT updates between SAFI 66 and L3VPN with type 2 RD and MDT extcommunity.

Right now even SAFI 66 is obsolete; in newer deployments all MVPN signaling (either C-mroute or just MDT autodiscovery) is done by SAFI 129.

Conclusion

VRFs are a very powerful mechanism that allows configuring fairly complex routing policies. A network designer with a sound understanding of VRFs and associated optimizations can achieve very high network scalability and perhaps save money on new equipment with more memory and TCAM.

With the rising popularity of EVPN, all this becomes more widespread and relevant not only for ISP, but also DC and even enterprise networks. I hope this article brings more clarity to the subject and will help some folks to prevent at least the most common design mistakes like using non-unique RD with any kind of multihoming.

References

  1. BGP/MPLS IP Virtual Private Networks (VPNs) https://tools.ietf.org/html/rfc4364
  2. Proxy IP->MAC Advertisement in EVPNs https://tools.ietf.org/html/draft-rbickhart-evpn-ip-mac-proxy-adv-01
  3. Ethernet-Tree (E-Tree) Support in Virtual Private LAN Service (VPLS) https://tools.ietf.org/html/rfc7796
  4. Ethernet-Tree (E-Tree) Support in Ethernet VPN (EVPN) and Provider Backbone Bridging EVPN (PBB-EVPN) https://tools.ietf.org/html/rfc8317
  5. Constrained Route Distribution for Border Gateway Protocol/MultiProtocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs) https://tools.ietf.org/html/rfc4684
  6. Automatic Route Target Filtering for legacy PEs https://tools.ietf.org/html/draft-ietf-idr-legacy-rtc-08
  7. BGP MPLS-Based Ethernet VPN https://tools.ietf.org/html/rfc7432
  8. A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN) https://tools.ietf.org/html/rfc8365
  9. Use of BGP for Routing in Large-Scale Data Centers https://tools.ietf.org/html/rfc7938
  10. Multicast in MPLS/BGP IP VPNs https://tools.ietf.org/html/rfc6513
  11. Virtual Private Networks Identifier https://tools.ietf.org/html/rfc2685
  12. LDP Multipoint Opaque Value Element Types https://tools.ietf.org/html/draft-bishnoi-mpls-mldp-opaque-types-01
  13. Cisco Systems’ Solution for Multicast in BGP/MPLS IP VPNs https://tools.ietf.org/html/rfc6037
  14. Close to the Edge https://routingcraft.net/close-to-the-edge/
  15. On Duplicates https://routingcraft.net/on-duplicates/

Notes

  1. ^In some network OS, VRFs are internally implemented based on Linux namespaces, with extra functionality to enable leaking, import/export etc. Still from the feature standpoint, this is very different from e.g. virtual routers in Openstack
  2. ^Also enables load balancing towards MAC-IP routes with L3 VNI (known as symmetric IRB) in Active/Active multihomed deployments
  3. ^Every time I configure an “extcommunity list” I think of excommunication, like if you upset the Pope in Medieval 2 Total War and he declares that you are a heretic
  4. ^For those interested in history, some details on how this used to work can be found in MPLS and VPN Architectures, Volume II: Vol 2 (ISBN-13: 978-1587051128) and in the old pre-RFC versions of draft Rosen

3 thoughts on “Segregated Routing”

  1. Thank you for the valuable information! I really enjoyed reading your posts.

    Concerning the fw_int and fw_ext technique, I’ve identified a few implications:
    – All traffic enters through the ingress interface and exits via the egress interface.
    – Ensuring proper functionality of stateful firewalls, IPS, or other stateful inspection devices in scenarios where traffic, including return traffic, flows through distinct interfaces requires specific adjustments (such as disabling antispoofing or adapting the in/out interfaces within firewall rules, will the stateful device handle that correctly… do you see others?).

    Have you encountered this type of setup in real production environments? I’m considering implementing something similar but have some reservations due to its unconventional nature.

    1. Not sure if I understood the question correctly.

      But most (all?) firewalls want you to create zones and then assign interfaces to the zones. With just 2 interfaces, you’ll have one internal and one external interface, with appropriate firewall/IPS/NAT rules on the firewall. This setup is pretty standard.
