Egress Peer Engineering: building blocks

Having explored the EPE basics, it's time to understand the building blocks of the solution.

Summary

Overall, there are 3 elements of the EPE solution:

  1. Egress routers allocate MPLS labels per egress peer and advertise them
  2. Ingress routers or a controller calculate an LSP with the EPE labels
  3. Ingress routers map traffic to that LSP

Taking the simple EPE example without a controller from the previous article: the egress routers allocate EPE labels and advertise them as BGP-LU routes to the ingress routers. The ingress routers recursively resolve those BGP-LU routes over the LDP/RSVP/SR LSP and map Internet routes to the EPE LSP using BGP communities.

However, many other combinations are possible.

Allocating EPE labels

This can be done in 3 ways: static MPLS routes, BGP-LU and BGP Peer SID.

Static routing

Also known as Software Defined Networking (SDN). Pretty much every router that supports MPLS allows you to configure some sort of static LSP. For EPE, the label action must be pop and the destination the relevant BGP peer.

Then the SDN controller must somehow inspect the router config or LFIB to learn the label, and also monitor the state of the BGP session so it can recalculate EPE policies when the peer goes down.
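A minimal Python sketch of that controller-side bookkeeping (all peer addresses and label values are hypothetical): each statically configured EPE label is paired with the BGP session it depends on, and a label is considered unusable once its session goes down.

```python
# Hypothetical sketch: controller-side view of statically configured EPE labels.
# In reality the controller would scrape the router config/LFIB and poll BGP
# session state; here both are represented as simple dicts.

static_epe_labels = {       # peer IP -> statically configured MPLS label
    "203.0.113.10": 16101,
    "203.0.113.20": 16102,
}

bgp_session_state = {       # peer IP -> is the BGP session established?
    "203.0.113.10": True,
    "203.0.113.20": False,  # this peer went down
}

def usable_epe_labels(labels, sessions):
    """Return only the EPE labels whose underlying BGP session is up."""
    return {peer: label for peer, label in labels.items() if sessions.get(peer)}

print(usable_epe_labels(static_epe_labels, bgp_session_state))
# {'203.0.113.10': 16101}
```

The fragile part is exactly what the sketch glosses over: keeping `static_epe_labels` and `bgp_session_state` in sync with the router requires polling a proprietary interface.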

While this can work, even I, as an SR-EPE controller developer, am quite skeptical about this approach and prefer solutions where an open, standard protocol is used for router<>controller communication.

BGP-LU

The solution is described in [draft-gredler-idr-bgplu-epe]. The egress router allocates a /32 (for IPv4) or /128 (for IPv6) BGP-LU route per egress peer and advertises it either to the ingress router or to the controller.

On an IXP LAN, the router can allocate BGP-LU routes not per BGP session but per BGP nexthop. So it can have just one BGP session with the route server yet create many BGP-LU routes, because the route server advertises routes with nexthops of different IXP members.
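The per-nexthop allocation can be sketched as follows (prefixes, nexthops and the label range are hypothetical): many routes arrive over one route-server session, but a label is allocated only once per distinct nexthop.

```python
# Hypothetical sketch: allocating BGP-LU EPE labels per BGP nexthop rather
# than per BGP session. One route-server session can yield many nexthops
# (one per IXP member), and each distinct nexthop gets its own label.

from itertools import count

def allocate_per_nexthop(routes):
    """routes: iterable of (prefix, nexthop) learned from the route server.
    Returns a mapping nexthop -> allocated EPE label."""
    labels = count(16201)               # hypothetical label range
    allocation = {}
    for _prefix, nexthop in routes:
        if nexthop not in allocation:   # one label per distinct nexthop
            allocation[nexthop] = next(labels)
    return allocation

routes_from_rs = [
    ("198.51.100.0/24", "192.0.2.11"),  # IXP member A
    ("203.0.113.0/24",  "192.0.2.12"),  # IXP member B
    ("198.18.0.0/15",   "192.0.2.11"),  # member A again, reuses its label
]
print(allocate_per_nexthop(routes_from_rs))
# {'192.0.2.11': 16201, '192.0.2.12': 16202}
```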

The advantage of this solution is its simplicity: BGP-LU is an old and widely supported protocol.

But there are some caveats:

  1. If using an EPE controller, it needs a BGP-LU session with each egress router. It's also important to apply BGP filters carefully so that only the BGP-LU routes used for EPE are accepted; otherwise it's easy to confuse the controller with a variety of unrelated BGP-LU routes.
  2. The lack of a BGP router-id in the BGP-LU NLRI also causes a problem when multiple egress routers sit on the same LAN segment (e.g. at an IXP). Consider the topology:

ASBR1 and ASBR2 are on the same IXP LAN, so they advertise the same BGP-LU NLRI to the controller. The controller can map the router ID from the BGP session to the BGP-LU route in order to use it in EPE policy calculation, but because the NLRI is identical, only one route (the BGP best path) survives and can actually be used for EPE.
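A small sketch of the collision (router names and labels hypothetical): because BGP-LU routes are keyed by NLRI alone, the second advertisement of the shared IXP nexthop simply displaces the first.

```python
# Hypothetical sketch of the NLRI collision: BGP-LU routes are keyed by the
# NLRI only, so when ASBR1 and ASBR2 advertise the same /32 for a shared IXP
# nexthop, best-path selection keeps only one of them.

bgplu_rib = {}                    # NLRI prefix -> (advertising ASBR, label)

def receive_bgplu(rib, nlri, asbr, label):
    rib[nlri] = (asbr, label)     # best path: a single route per NLRI survives

receive_bgplu(bgplu_rib, "192.0.2.11/32", "ASBR1", 16301)
receive_bgplu(bgplu_rib, "192.0.2.11/32", "ASBR2", 16302)  # same NLRI!

print(bgplu_rib)   # only ASBR2's route is left; ASBR1 is unusable for EPE
```

A key of `(router_id, nlri)` would avoid the collision, but that information is simply not part of the BGP-LU NLRI, which is exactly the limitation BGP Peer SID addresses.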

BGP Peer SID

Per [RFC9086], there are 3 types of BGP Peer SID:

  1. Peer Node SID: allocated per BGP peer
  2. Peer Adjacency SID: allocated per member link, when the BGP peer is reachable via ECMP
  3. Peer Set SID: allocated per set of BGP peers

The BGP peering segment is the best option for EPE: it doesn't have the limitations described above and is the most scalable. Any number of egress routers can allocate BGP Peer SIDs and advertise them via BGP-LS to a route reflector, and the RR will advertise all of them to the EPE controller over just one BGP session.
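The three RFC 9086 segment types can be modelled as follows (peer addresses, link names and SID values are hypothetical); note that unlike a BGP-LU NLRI, each entry unambiguously identifies the peer (and, for the adjacency variant, the link) it belongs to:

```python
# Hypothetical sketch: the three RFC 9086 BGP peering segment types as a
# minimal data model.

from dataclasses import dataclass

@dataclass
class PeerNodeSID:       # one SID per BGP peer (balanced over all its links)
    peer: str
    sid: int

@dataclass
class PeerAdjSID:        # one SID per member link towards a multi-link peer
    peer: str
    link: str
    sid: int

@dataclass
class PeerSetSID:        # one SID shared by a set of peers
    peers: tuple
    sid: int

sids = [
    PeerNodeSID("203.0.113.10", 16401),
    PeerAdjSID("203.0.113.10", "et-0/0/1", 16402),
    PeerSetSID(("203.0.113.10", "203.0.113.20"), 16403),
]
print(sids)
```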

How is this related to Segment Routing?

Some readers might get the wrong impression that BGP Peer SID somehow requires Segment Routing to be enabled in the network, but this is absolutely not the case.

As I wrote in the previous article, EPE works best with SR, but can also work without SR. It is possible to use any type of EPE labels allocation described above, with Segment Routing or without it. You can build SR-TE EPE policies using BGP-LU EPE routes, or non-SR EPE policies using BGP Peer SID.

Calculating and advertising EPE policies

In the most basic form, the operator manually configures the EPE policy endpoint to match the IP address of an egress peer. The controller must find the relevant EPE label and, should the peer go down, recalculate the policy accordingly.

A good controller can also support custom constraints, such as affinity, bandwidth or disjoint paths. There is no standard for advertising link affinity or bandwidth with BGP Peer SID, but the controller can implement this logic internally. Traffic Dictator supports affinity and bandwidth constraints for EPE policies.

If this is an SR-EPE policy, the controller can run CSPF from the headend to the egress ASBR using the IGP topology, and then attach the relevant EPE label. Likewise, it can apply the same affinity and bandwidth constraints to both CSPF and EPE.
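Structurally the result is just a label stack: the CSPF-computed segment list to the ASBR, with the EPE label appended. A sketch under hypothetical names and label values (`cspf` here is a stand-in returning a fixed segment list; a real controller runs constrained SPF over the IGP topology):

```python
# Hypothetical sketch: an SR-TE EPE policy is a CSPF-computed segment list to
# the egress ASBR plus the EPE label for the chosen egress peer at the end.

def cspf(topology, headend, asbr, constraints):
    # Stand-in: pretend constrained SPF returned these prefix/adjacency SIDs.
    # Affinity/bandwidth pruning would happen inside a real implementation.
    return [16005, 16006]

def build_epe_policy(topology, headend, asbr, epe_label, constraints):
    segment_list = cspf(topology, headend, asbr, constraints)
    return segment_list + [epe_label]   # EPE label steers traffic out of the ASBR

policy = build_epe_policy({}, "R1", "R6", 16501,
                          {"affinity": "blue", "bandwidth_gbps": 10})
print(policy)   # [16005, 16006, 16501]
```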

Consider the topology:

The operator wants to steer traffic from R1 to AS100 using only blue links, and reserve 10 Gbps of bandwidth.

The controller checks the configured policy endpoint against the IGP and EPE topologies.

If the endpoint IP is found to be a loopback on one of the routers, this will be a regular SR-TE policy.

If the endpoint IP is an egress peer, the controller will check the BGP router-id of the egress ASBR to which this peer is connected, find the matching TE router-id in the IGP topology, then build an SR-TE policy to that router and attach the EPE label. The same affinity and bandwidth constraints can be used for both the IGP and EPE parts.

Null endpoint

SR-TE has a concept of a Null endpoint (0.0.0.0 or ::). It's used for automated steering, but the RFCs don't specify what the actual destination of a Null-endpoint policy should be.

My interpretation of Null endpoint, which is also implemented in Traffic Dictator, is that it’s an EPE policy to the closest (by lowest IGP or TE metric) egress peer with matching affinity and bandwidth constraints. It makes the most sense and allows for some neat network design options.
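This interpretation can be sketched as a selection function (peer addresses, metrics and constraint values are hypothetical): among the egress peers that satisfy the policy's affinity and bandwidth constraints, pick the one with the lowest metric from the headend.

```python
# Hypothetical sketch of the Null-endpoint interpretation described above:
# among egress peers that satisfy the constraints, pick the one closest to
# the headend by IGP (or TE) metric.

def closest_egress_peer(candidates, constraints):
    """candidates: dicts with peer IP, metric from the headend to its ASBR,
    link affinities and available bandwidth."""
    eligible = [
        c for c in candidates
        if constraints["affinity"] in c["affinities"]
        and c["avail_bw_gbps"] >= constraints["bandwidth_gbps"]
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["igp_metric"])["peer"]

peers = [
    {"peer": "203.0.113.10", "igp_metric": 30, "affinities": {"blue"}, "avail_bw_gbps": 40},
    {"peer": "203.0.113.20", "igp_metric": 10, "affinities": {"red"},  "avail_bw_gbps": 40},
    {"peer": "203.0.113.30", "igp_metric": 20, "affinities": {"blue"}, "avail_bw_gbps": 5},
]
print(closest_egress_peer(peers, {"affinity": "blue", "bandwidth_gbps": 10}))
# 203.0.113.10 — the nearest peer is red, the next one lacks bandwidth
```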

Installing EPE policy

There are multiple ways the controller can install the calculated EPE policy on the headend.

CLI / API / Netconf / GNMI

Or static routing (also known as SDN). All these methods have the same flaws:

  1. Vendor-proprietary interface, so the controller must be extended to support each specific router vendor
  2. Using a management protocol for control-plane functionality, so convergence time is unpredictable and failover is not guaranteed

PCEP

Described in [RFC5440] and [RFC8664]. Initially developed for RSVP-TE, later extended to support SR-TE.

The advantage of PCEP is its support for On Demand Nexthop (ODN), whereby a router requests the controller to calculate a policy when required. So there is no need to preconfigure a lot of policies on the controller, which simplifies configuration.

The disadvantage is the requirement for a PCEP session between the controller and each headend router. With 2 or more controllers, even more PCEP sessions are required.

BGP-SRTE

Described in [draft-ietf-idr-segment-routing-te-policy]. Unlike PCEP, BGP-SRTE makes it possible to reuse the existing BGP infrastructure to distribute SR-TE and EPE policies: the controller just needs a BGP session with a route reflector, which propagates the policies to all routers that need them.

BGP-LU

Truly a universal protocol used for everything, BGP-LU can also be used to distribute SR-TE or EPE policies to routers that don't support any other method. There are some limitations, but generally it works fine.

Mapping traffic to the EPE policy

Once the policy has been installed, we can map the actual data traffic to it.

Binding SID

When a router installs an SR-TE policy, it also installs a Binding SID in LFIB. When it receives MPLS traffic with a top label equal to Binding SID, it maps that traffic to the SR-TE policy.
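The LFIB behavior can be sketched as a label-stack transformation (BSID and label values hypothetical): if the top label is a known Binding SID, pop it and push the policy's segment list in its place.

```python
# Hypothetical sketch: Binding SID handling in the LFIB. If the top label of
# an incoming MPLS packet matches a policy's BSID, pop it and push that
# policy's segment list; otherwise forward the stack unchanged.

policies = {
    40101: [16005, 16006, 16501],   # BSID -> segment list of the SR-TE policy
}

def process_mpls(label_stack, bsid_table):
    top, rest = label_stack[0], label_stack[1:]
    if top in bsid_table:
        return bsid_table[top] + rest   # pop BSID, push policy's label stack
    return label_stack                  # not a BSID: normal MPLS forwarding

print(process_mpls([40101, 24000], policies))
# [16005, 16006, 16501, 24000]
```

This is also why Binding SIDs are useful for stitching: an upstream node only needs to impose one label to invoke the whole policy.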

Automated steering

The recommended way to map traffic to SR-TE policies is Automated Steering (AS). The policy endpoint (or the egress ASBR in case of EPE) advertises a BGP route with a specific color extended community, and if the headend has an SR-TE policy whose endpoint matches the route's nexthop and whose color matches, the BGP route is mapped to that policy.

In the example below, R6 advertises a BGP route with color 101 and R1 has an SR-TE policy with endpoint 200.2.2.200 and color 101, so traffic towards 192.0.2.0/24 is mapped to that policy.
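The matching rule itself is just a lookup on the (endpoint, color) pair, sketched below with the same example values:

```python
# Hypothetical sketch of Automated Steering: a BGP route is mapped to an
# SR-TE policy when the policy's (endpoint, color) equals the route's
# (nexthop, color extended community).

policies = {("200.2.2.200", 101): "POLICY-R1-TO-R6-COLOR-101"}

def steer(route_nexthop, route_color, policy_table):
    # None means no policy matched: the route follows the regular BGP path.
    return policy_table.get((route_nexthop, route_color))

print(steer("200.2.2.200", 101, policies))  # matched: traffic uses the policy
print(steer("200.2.2.200", 102, policies))  # no color match: regular path
```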

Color-only steering

Besides the color value, it is also possible to set the color-only (CO) bits in the color extended community. The default CO bits are 00, which means the route must match both color AND endpoint. However, there are 2 other possible CO settings:

CO 01 steering (null endpoint) – select the SR-TE policy in the following order of preference:

  1. Policy matching endpoint and color
  2. Policy with null endpoint, matching color, the same address family
  3. Policy with null endpoint, matching color, any address family

CO 10 steering (any endpoint) – select the SR-TE policy in the following order of preference:

  1. Policy matching endpoint and color
  2. Policy with null endpoint, matching color, the same address family
  3. Policy with null endpoint, matching color, any address family
  4. Policy with any endpoint, matching color, the same address family
  5. Policy with any endpoint, matching color, any address family

Color-only steering allows for very flexible traffic engineering designs and is especially useful with EPE and Null endpoint policies.
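The two preference orders can be expressed as an ordered list of matching rules, tried most-preferred first. A sketch with hypothetical policies, where each policy is an (endpoint, color, address family) tuple:

```python
# Hypothetical sketch of CO-bit policy selection. CO 00 requires an exact
# (endpoint, color) match; CO 01 adds null-endpoint fallbacks; CO 10 further
# adds any-endpoint fallbacks, each preferring the same address family.

NULL_ENDPOINTS = {"0.0.0.0", "::"}

def select_policy(policies, nexthop, color, af, co):
    rules = [lambda p: p[0] == nexthop and p[1] == color]            # CO 00+
    if co in ("01", "10"):
        rules += [lambda p: p[0] in NULL_ENDPOINTS and p[1] == color and p[2] == af,
                  lambda p: p[0] in NULL_ENDPOINTS and p[1] == color]
    if co == "10":
        rules += [lambda p: p[1] == color and p[2] == af,            # any endpoint
                  lambda p: p[1] == color]
    for rule in rules:                  # most-preferred rule first
        for policy in policies:
            if rule(policy):
                return policy
    return None

pols = [("198.51.100.1", 101, "ipv4"), ("0.0.0.0", 101, "ipv4")]
print(select_policy(pols, "192.0.2.99", 101, "ipv4", "00"))  # no exact match
print(select_policy(pols, "192.0.2.99", 101, "ipv4", "01"))  # null-endpoint wins
```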

Service loopbacks

If the router supports neither BGP-SRTE nor PCEP and receives policies via BGP-LU, automated steering is not possible because there is no way to advertise a color together with BGP-LU. Traffic Dictator allows the use of "service loopbacks", which deliver a result similar to CO 10 steering (any endpoint). The egress router must have the relevant loopback configured and set the nexthop of advertised BGP routes to that loopback IP.


In the next article I will show some examples of EPE with Segment Routing.
