Segment Routing allows the network operator to deploy Traffic Engineering even with the most basic routers that support the bare minimum of features.
What is traffic engineering?
Traffic engineering is a set of techniques to influence the path a particular type of traffic should take – usually in order to optimize performance and avoid congestion. Under this definition, any modification of routing policy can count as traffic engineering.
In practice, when talking about traffic engineering, people usually mean source routing. A headend router adds some sort of instruction to the packet that tells the network how the packet should be forwarded. So instead of following the IGP-calculated shortest path, the packet is sent via a specific path – for example, the lowest-latency path.
Why use traffic engineering?
Typical use cases include:
- Send some traffic via a low latency path
- Send A/B streams via different paths avoiding shared risk links
- Load-balance traffic across a custom network topology
- Ensure traffic is rerouted without congestion after link failures
…and others
Another common reason for deploying RSVP-TE specifically is fast reroute (FRR), but I didn’t include it because it doesn’t require traffic engineering as such; besides, with Segment Routing it’s possible to enable TI-LFA for fast reroute without any TE.
Entry barrier for TE
Currently there are 2 competing technologies used for traffic engineering: RSVP-TE (with the MPLS data plane) and SR-TE (with the MPLS or SRv6 data plane). Let’s look at the difficulties of deploying each.
RSVP-TE
Or, as it’s often called, MPLS-TE. A mature, well-known technology, but the routers must support a lot of features:
- TE extensions for IGP
- CSPF to calculate the traffic engineering path
- RSVP to signal the path
- Some method of steering traffic into the TE tunnel: IGP shortcuts, RIB groups, etc. (details can be vendor-specific)
Furthermore, every router in the network must support every feature. So in practice, most networks running RSVP-TE use expensive hardware from one of the few big vendors.
SR-TE
Traffic Engineering with Segment Routing can be distributed or centralized.
In distributed SR-TE, the set of features the implementation is required to support is similar to that of RSVP-TE; but instead of signaling the path with RSVP, the router must calculate the segment list to be pushed onto the packet.
In centralized TE, path calculation and segment list generation logic is outsourced to a controller. What the router must do is:
- Advertise the IGP topology to the controller
- Receive calculated SR-TE policies from the controller
- Steer traffic into the SR-TE policy
While this is simpler than RSVP-TE, the router must still support a number of features: BGP-LS to advertise the topology, BGP-SRTE or PCEP to receive policies, and automated steering with the color extended community to steer traffic into a policy. SR-TE has advantages over RSVP-TE, but not every SR implementation is fully capable of SR-TE.
How to make SR-TE work with any implementation
Let’s say we have a very basic SR-MPLS router that can do basic ISIS-SR and BGP, but doesn’t support any of the advanced features like BGP-LS, BGP-SRTE, PCEP, automated steering or CSPF.
It could be a router from one of the non-mainstream vendors, or a whitebox router with an open-source routing stack.
The goal is to deploy traffic engineering using affinity and bandwidth constraints, map different types of traffic to different policies, and ensure predictable failure behaviour if some links or routers fail, or if the controller fails completely.
Traffic Dictator is used as a controller. It is easy to deploy and configure, supports the most basic SR implementations and is available for free evaluation, PoC testing and studying.
Advertising IGP topology to controller
BGP-LS is now quite widely supported, even by whitebox software vendors like IP Infusion. Still, there are vendors with basic SR support that can’t do BGP-LS, and FRR can’t do it either.
In practice, this is not a big problem, because it’s sufficient for one router in the network to support BGP-LS and export the topology information to the SR-TE controller. The rest of the network doesn’t need to support BGP-LS.
It’s not even necessary to buy a hardware platform from a big vendor: you can deploy a Docker container with a network OS from Cisco, Juniper, Arista, etc. on a server, next to the SR-TE controller, which also runs in a container. Traffic Dictator runs just fine in a container with modest resource requirements of 2 CPU cores and 4 GB RAM.
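As a rough sketch of what this could look like on the one BGP-LS-capable router, here is IOS XR-style syntax that distributes the ISIS topology into BGP-LS and sends it to the controller. The ISIS process name, the ASN 65000 and the controller address 192.168.0.100 are illustrative assumptions; other vendors have equivalent knobs with different syntax.

router isis CORE
 distribute link-state
!
router bgp 65000
 address-family link-state link-state
 !
 neighbor 192.168.0.100
  remote-as 65000
  address-family link-state link-state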
Calculating SR-TE policies
This seems very easy in theory: apply CSPF constraints to exclude certain links and then run Dijkstra on the resulting topology. Shouldn’t be very difficult to implement.
However, the segment list generation algorithm is not standardized anywhere, so it’s up to the implementation. The SID list needs to be minimal (so as not to exceed headend hardware limitations) while still steering traffic over the desired path. There are a lot of tricky scenarios where this can go wrong, so a good SID list generation algorithm is essential.
Also, remember that Segment Routing works natively with IP and supports things like ECMP and anycast routing. Anycast SIDs help to simplify operations, save hardware resources and provide redundancy. The SID list generation algorithm must try to use anycast SIDs whenever possible, but it must also perform a lot of checks to ensure they don’t break routing – e.g. mismatched SRGBs, the need to use a local adjacency SID after an anycast SID, and other possible problems.
What all this means is that outsourcing path computation to the controller might not just be a poor man’s solution, but actually a wise decision that ensures the calculated policies are correct and optimal.
Sending calculated policies to the network
So the controller has received the IGP topology and calculated policies. How can it send them to the network, if none of the routers support BGP-SRTE or PCEP?
BGP-LU can replace those to a great extent. [RFC3107] predates Segment Routing by some 15 years, so it wasn’t designed for any of this, but it works fine nevertheless.
For instance, in this topology we want to steer traffic from R1 to R5 via blue links. The controller sends a BGP-LU update to R1 with a label stack of R4’s and R5’s node SIDs, and R3 as the next hop. R1 can now steer traffic as required.
Note that [RFC3107] does not specify how many labels can be in a BGP-LU update, but in practice most implementations can receive multiple labels. [RFC8277] clarifies this behaviour and adds a Multiple Labels BGP capability that can be negotiated when establishing a BGP session. The RFC doesn’t relate this capability to the maximum SID depth in any way, but I think it would be logical for routers to advertise the same value in the Multiple Labels capability as their MSD, since both reflect a hardware limitation on how many labels the platform can push.
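On the headend side, all that is needed is a labeled-unicast BGP session to the controller. As an illustration only, in IOS XR-style syntax this could look roughly as follows (the ASN 65000 and the controller address 192.168.0.100 are assumptions; most BGP implementations have an equivalent labeled-unicast address family):

router bgp 65000
 address-family ipv4 unicast
 !
 neighbor 192.168.0.100
  remote-as 65000
  address-family ipv4 labeled-unicast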
Difference in label stack from BGP-SRTE or PCEP
If the policy in the example above was advertised via BGP-SRTE or PCEP, the SID list would be [R3, R4, R5], not [R4, R5]. Or possibly the first SID would be R1’s adjacency SID towards R3. Upon receiving the policy, R1 would resolve it towards the R3 next hop, and the actual label stack on the wire would be [R4, R5].
BGP-LU requires a different logic, whereby the SID of the directly connected router is never advertised; instead, the controller sets the next hop in the BGP-LU update.
BGP-LU and ECMP
Nothing prevents the SR-TE controller from using ECMP and anycast SID with BGP-LU. However, there is a problem when there is ECMP on the first hop. Consider the topology:
The goal is to steer traffic from L1 to L6 via the blue path. What next hop should the controller set in the BGP-LU update? With BGP-SRTE or PCEP, it would just send the anycast SID shared between S1 and S2 or, in the absence of an anycast SID, send two segment lists.
BGP-LU as a policy installation protocol has a limitation: it can’t use ECMP on the first hop. It’s possible to work around this by enabling add-path between the headend router and the controller, but I didn’t implement add-path in Traffic Dictator, since this is a very specific corner case. Keep this limitation in mind.
Mapping traffic to BGP-LU policies
SR-TE policies have this great thing called automated steering, which can be used to map different traffic types to different SR-TE policies. Each SR-TE policy has a color, and BGP routes with matching color extended community are mapped to the respective policies. This works for regular BGP (IP transit), L3VPN, EVPN etc.
If in this topology the operator wants to steer traffic to 10.0.0.0/24 via blue links and traffic to 192.168.0.0/24 via yellow links, they can attach different color extended communities to those routes and then map them to the respective SR-TE policies on R1.
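For reference, with native SR-TE the color is typically attached with a routing policy on the router advertising the routes. A minimal IOS XR-style sketch, where the color value 100 and the set/policy names are illustrative assumptions:

extcommunity-set opaque COLOR_BLUE
 100
end-set
!
route-policy SET_COLOR_BLUE
 set extcommunity color COLOR_BLUE
 pass
end-policy

An SR-TE policy on R1 with color 100 and the matching endpoint would then automatically attract traffic for the routes carrying this community.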
When using BGP-LU to send policies, it’s not possible to attach a color to them, so how can we map traffic to those policies?
If the BGP-LU route sent by the controller has a prefix equal to the loopback of R8 (e.g. 8.8.8.8/32), then depending on the router configuration, one of two things will happen:
- The BGP-LU route will be ignored because the IGP route has a better preference (admin distance)
- The BGP-LU route will be installed, and all traffic with next hop 8.8.8.8 will be mapped to it
Either way, this is neither acceptable nor flexible if we want to map different traffic types to different policies.
Service loopbacks
Traffic Dictator introduces the concept of service loopbacks. They serve the same function as color in SR-TE: map different types of traffic to different policies.
When configuring a policy, the operator can specify either color or service-loopback (mutually exclusive):
TD1#conf
TD1(config)#traffic-eng policies
TD1(config-traffic-eng-policies)#policy R1_R8_BLUE_ONLY_IPV4
TD1(config-traffic-eng-policies-policy)#endpoint 8.8.8.8 ?
  color             Color for SRTE policy
  service-loopback  Service loopback for LU policy
  <cr>
To achieve the same behaviour as in the previous example, but with BGP-LU, configure multiple loopbacks on R8 and, instead of setting a color extended community, change the next hop of the BGP routes to the respective loopback.
The service loopbacks MUST NOT be advertised into IGP.
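A sketch of what this could look like on R8, in IOS XR-style syntax. The loopback number, prefix set, policy name and the neighbor address 1.1.1.1 are illustrative assumptions, and the prefix 10.0.0.0/24 follows the earlier example; the service loopback is deliberately left out of the ISIS configuration:

interface Loopback801
 description Service loopback for the BLUE policy
 ipv4 address 8.1.0.1 255.255.255.255
!
prefix-set BLUE_PREFIXES
 10.0.0.0/24
end-set
!
route-policy SET_NH_BLUE
 if destination in BLUE_PREFIXES then
  set next-hop 8.1.0.1
 endif
 pass
end-policy
!
router bgp 65000
 neighbor 1.1.1.1
  remote-as 65000
  update-source Loopback0
  address-family ipv4 unicast
   route-policy SET_NH_BLUE out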
Then on the controller, configure the policy with endpoint 8.8.8.8 (which is advertised into IGP), and the respective service loopback.
traffic-eng policies
   !
   policy R1_R8_BLUE_ONLY_IPV4
      headend 1.1.1.1 topology-id 101
      endpoint 8.8.8.8 service-loopback 8.1.0.1
      priority 7 7
      install direct labeled-unicast 192.168.0.101
      !
      candidate-path preference 100
         metric igp
         affinity-set BLUE_ONLY
         bandwidth 100 mbps
Traffic Dictator will calculate the policy towards 8.8.8.8, but the BGP-LU prefix will be that of service loopback (8.1.0.1).
TD1#show traffic-eng policy R1_R8_BLUE_ONLY_IPV4 detail

Detailed traffic-eng policy information:

Traffic engineering policy "R1_R8_BLUE_ONLY_IPV4"
  Valid config, Active
  Headend 1.1.1.1, topology-id 101, Maximum SID depth: 6
  Endpoint 8.8.8.8, service-loopback 8.1.0.1
    Endpoint type: Node, Topology-id: 101, Protocol: isis, Router-id: 0008.0008.0008.00
  Setup priority: 7, Hold priority: 7
  Reserved bandwidth bps: 100000000
  Install direct, protocol labeled-unicast, peer 192.168.0.101
  Candidate paths:
    Candidate-path preference 100
      Path config valid
      Metric: igp
      Path-option: dynamic
      Affinity-set: BLUE_ONLY
        Constraint: include-all
        List: ['BLUE']
        Value: 0x1
      This path is currently active
      Calculation results:
        Aggregate metric: 400
        Topologies: ['101']
        Segment lists: [900008]
        BGP-LU next-hop: 10.100.0.2
  Policy statistics:
    Last config update: 2024-09-01 09:29:55,203
    Last recalculation: 2024-09-01 09:32:05.247
    Policy calculation took 0 miliseconds
This way it is possible to achieve a behaviour similar to automated steering by using BGP-LU.
To be more specific, this is most similar to any-endpoint color-only steering (CO10), and the same service loopback can be configured on multiple routers, so it is possible to use this technique with anycast routing, egress peer engineering and other more complex use cases.
Redundancy
Now, what if the controller fails? Sure, you can have 2 or more controllers, but what if all of them fail at the same time?
In regular SR-TE, the color will simply be ignored, and BGP routes will resolve via the IGP instead of the SR-TE policy and follow the shortest path.
BGP-LU requires some extra config to ensure redundancy.
R8 needs to advertise a BGP-LU route for each service loopback. The next hop should be the R8 loopback that is advertised into the IGP (8.8.8.8), and the local preference should be lower than that of the routes sent by the controller.
Should the controller fail, BGP routes will resolve via this BGP-LU route, which in turn resolves via the IGP, and traffic will follow the IGP shortest path.
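A rough sketch of this fallback on R8, again in IOS XR-style syntax. The ASN, the neighbor 1.1.1.1 and the local preference 50 are illustrative assumptions; the only requirements are that the local preference is lower than whatever the controller uses, and that the session is sourced from the loopback advertised into the IGP so that the next hop becomes 8.8.8.8:

route-policy LU_FALLBACK
 set local-preference 50
end-policy
!
router bgp 65000
 address-family ipv4 unicast
  network 8.1.0.1/32 route-policy LU_FALLBACK
  allocate-label all
 !
 neighbor 1.1.1.1
  remote-as 65000
  update-source Loopback0
  address-family ipv4 labeled-unicast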
Conclusion
With Traffic Dictator, BGP-LU and service loopbacks, it is possible to deploy Traffic Engineering using the most minimalistic Segment Routing implementations, of which there are plenty on the market, and get the same benefits as with expensive routers from big vendors.
For further information, check out the Traffic Dictator White Paper and the BGP-LU documentation, or try one of the prepared containerlab topologies.