Fast convergence after failures has always been an important part of ISP network design.
When a failure is detected, it takes a while until the routing protocol propagates new information throughout the network and all routers update their FIB. Some timer tuning is possible here, but global convergence (or global repair) will never be instant, and updating a lot of prefixes in FIB can take a lot of time as well. What if the router local to the protected link could pre-compute a backup path and install it in FIB in advance, so that upon failure traffic can be immediately switched over? Local repair can reduce traffic loss time after failure to <50 ms [1].
MPLS Fast Reroute
Facility backup, to be precise (RFC4090#section-3.2). This is the most widely deployed flavour of MPLS FRR, so I will not mention the others. It was a game changer for many telecoms in the 2000s, as it allowed them to move from bandwidth-wasting circuit-switched SDH networks with protection provided by APS to packet-switched MPLS networks where backup paths don’t use any bandwidth while on standby.
When the RSVP-TE headend signals a tunnel, it can request either link or node protection by setting appropriate flags in RSVP path messages. Every transit router with FRR active will try to provide the requested protection by setting up FRR tunnels either:
- to nexthop, avoiding the protected link (link protection)
- to next-nexthop, avoiding the protected link and nexthop node (node protection)
If link failure is detected, the Point of Local Repair (PLR) reroutes traffic over the FRR tunnel, until the headend learns about failure and resignals its tunnel with new constraints or tears it down.
Link protection example:
Fig. 1
Signaling a backup tunnel for each transit tunnel would be a waste of resources, so FRR tunnels are reused for transit tunnels with the same nexthop or next-nexthop.
Consider the following topology (mind the link costs):
Fig. 2
The tunnels R1-R6 and R1-R7 both requested node protection so R2 signaled a backup tunnel to R4, excluding R3. For R1-R7, traffic flow in case of failure will be suboptimal: it will hop between R4 and R5 without any purpose, wasting bandwidth and increasing latency.
Each transit tunnel gets either link or node protection (depending on configuration and topology constraints). When the link goes down, or BFD echoes stop coming, the PLR doesn’t know whether the node or the link has failed, so whatever protection is assigned to the primary tunnel will be used.
To summarize, MPLS FRR is a great technology, but it has problems:
- All traffic that needs protection must flow via RSVP-TE tunnels, even if no real traffic engineering is used. RSVP-TE is notorious for poor scalability and operational complexity. The scalability problem can be partially solved by using one-hop tunnels with LDP over them, or a mesh of RSVP tunnels in the core, protecting only important links. All those designs further increase complexity and have their own drawbacks.
- Link protection never protects against node failure. If the nexthop node fails, traffic will be blackholed until global repair happens.
- In a lot of situations, the FRR path is not the post-convergence path. This can lead to increased delay, or to traffic being forwarded over congested links, causing packet drops before global repair happens.
Loop Free Alternate
Even in IP networks without MPLS, it is sometimes possible to provide local repair. Since all routers in a link-state IGP know the entire topology, any router can run SPF with its neighbour as root and check where that neighbour’s route to the given prefix points. If it doesn’t point back to us, that neighbour is considered loop-free, so it can be used as a backup path.
Fig. 3
In figure 3, from the perspective of S, the condition for N to be loop-free in case of S-E link failure is Distance (N,D) < Distance (N,S) + Distance (S,D). Since 10 < 25, N is loop-free and can be used immediately after the link failure. This is, by the way, similar to feasible successors in EIGRP, just calculated differently (EIGRP is not link-state).
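The loop-free check is easy to reproduce offline. The following is a minimal Python sketch, not taken from any router implementation: it runs Dijkstra from N and from S and applies the inequality above. The link costs are assumptions, picked only so that the distances come out as 10 and 25 as in the text; the actual figure may use different costs.

```python
import heapq


def spf(graph, root):
    """Dijkstra: shortest distance from root to every reachable node."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neigh, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(queue, (nd, neigh))
    return dist


def is_loop_free(graph, s, n, d):
    """Basic loop-free condition for neighbour n of s and destination d."""
    dist_n = spf(graph, n)
    dist_s = spf(graph, s)
    return dist_n[d] < dist_n[s] + dist_s[d]


# Assumed symmetric costs: S-E 5, E-D 5, S-N 15, N-D 10.
GRAPH = {
    "S": {"E": 5, "N": 15},
    "E": {"S": 5, "D": 5},
    "N": {"S": 15, "D": 10},
    "D": {"E": 5, "N": 10},
}

print(is_loop_free(GRAPH, "S", "N", "D"))  # 10 < 15 + 10 -> True
```

Raising the assumed N-D cost to 40 makes the shortest path from N to D go back through S, and the same check then returns False.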
The problem with LFA is poor coverage – in many topologies it is simply not possible to do local repair without source routing.
See also RFC5286
Remote LFA
Remote LFA (RFC7490) improves the coverage of LFA. It requires MPLS (LDP), but no RSVP-TE [2]. The PLR calculates P space (routers it can reach without traversing the protected link) and Q space (routers that can reach the protected destination without traversing the protected link). Routers that belong to both spaces are called PQ routers. The PLR chooses one of them by some tie-breaker and sets up a targeted LDP session to it.
Fig. 4
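To make the P and Q spaces more concrete, here is a small Python sketch that computes them for link protection, where the protected destination is the far end of the link. It is an illustration under assumptions, not the algorithm of any particular vendor: the five-node ring with equal costs and the node names are made up, and the graph is assumed to be connected.

```python
import heapq


def spf(graph, root):
    """Dijkstra: shortest distance from root to every reachable node."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neigh, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(queue, (nd, neigh))
    return dist


def pq_nodes(graph, s, e):
    """Return (P space, Q space, PQ nodes) for protecting the link s->e."""
    link_cost = graph[s][e]
    dist = {node: spf(graph, node) for node in graph}
    candidates = set(graph) - {s, e}
    # P: no shortest path from s to x starts with the protected link.
    p_space = {x for x in candidates
               if dist[s][x] < link_cost + dist[e][x]}
    # Q: no shortest path from x to e ends with the protected link.
    q_space = {x for x in candidates
               if dist[x][e] < dist[x][s] + link_cost}
    return p_space, q_space, p_space & q_space


def ring(nodes, cost):
    """Build a symmetric ring topology (hypothetical example network)."""
    graph = {n: {} for n in nodes}
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        graph[a][b] = graph[b][a] = cost
    return graph


RING = ring(["S", "E", "R3", "R4", "R5"], 10)
p, q, pq = pq_nodes(RING, "S", "E")
print(sorted(p), sorted(q), sorted(pq))
```

In this ring, S has no plain LFA for E (its other neighbour R5 would send the traffic straight back), but R4 sits in both spaces, so a tunnel to R4 repairs the failure.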
RLFA is a really great technology which made it possible to simplify and scale many MPLS networks. In simple topologies without imbalanced metrics, it provides good coverage, and doesn’t require RSVP-TE with all its complexity and scalability problems. Also, even though RLFA is not guaranteed to use the post-convergence path, it is more likely to be on it, compared with MPLS FRR.
TI-LFA
The natural limitation of LFA/RLFA is the lack of source routing in LDP. If the P and Q spaces don’t intersect, there is no PQ router, and no protection.
Segment Routing can do source routing. Thus we now have Topology Independent LFA, that is, LFA that can work in any topology, with no PQ router, or even with P and Q routers more than one hop away. Moreover, TI-LFA doesn’t blindly choose PQ routers based on some tie-breakers like RLFA does, but steers traffic over the post-convergence path. Sounds like a genius idea which can solve the problem of local protection once and for all:
- Always follows the optimal post-convergence path (no transient delay/congestion during failover).
- Very scalable, doesn’t signal any tunnels.
- Provides protection in any topology (hence topology independent).
While “Topology Independent” nicely highlights the superiority of TI-LFA over its predecessors, the truth is that…
Nothing in networking is topology independent.
Label stack
In order to steer traffic on post-convergence path before global repair happens, the PLR in TI-LFA pushes extra labels on packets. How many labels exactly? It depends on topology.
In scenarios similar to IP LFA (Fig. 3), no repair labels are needed, since the first router on post-convergence path is loop-free.
In scenarios similar to remote LFA (Fig. 4), one repair label is required.
draft-ietf-rtgwg-segment-routing-ti-lfa section 8 claims 99% coverage with zero or one segment in real world networks. Great success indeed (this also means RLFA with LDP would provide the same coverage).
Still, in some cases, if link costs differ a lot, two repair labels are needed. Consider the topology on figure 5 (mind the link costs):
Fig. 5
There is no PQ router, so if R1 wants to protect traffic to R3 from the R1-R2 link failure, it needs 2 extra labels – to deliver traffic to P router, and then to Q router. This scenario is identical to protection tunnel with direct forwarding described in the PhD thesis of Pierre Francois where he has proven that it will be sufficient to provide link protection in any topology with symmetric link costs.
If, due to misconfiguration or weird design, there are asymmetric link costs, P and Q routers might not be directly connected, so more labels will be needed. Anyway, this is a corner case which should not appear in well-designed networks.
Node protection
When node protection is requested, the number of repair labels the PLR must push is theoretically unbounded, even without asymmetric link costs.
Fig. 6
In this topology, P and Q routers are 2 hops away, so 3 repair labels are needed. Fortunately, R3 also happens to be the Q router, so the PLR doesn’t have to push SID 3 because of PHP (only repair labels are pushed).
Sample outputs (Arista EOS):
R1#show isis ti-lfa path R3 detail
TI-LFA paths for IPv4 address family
Topo-id: Level-2
Destination: R3
Path constraint: exclude Ethernet1
Request sequence number: 1
Response sequence number: 1
Number of times path updated: 2
Last updated: 0:52:49 ago
ID: 0x2
Path: R4 [PQ-node]
Path constraint: exclude node 0002.0002.0002
Request sequence number: 1
Response sequence number: 1
Number of times path updated: 4
Last updated: 0:50:24 ago
ID: 0x3
Path: R4 [P-node] R5 R6 R3 [Q-node]

R1#show mpls lfib route 900003
IP 900003 [1], 3.3.3.3/32
via TI-LFA tunnel index 3, pop
payload autoDecide, ttlMode uniform, apply egress-acl
via 10.0.0.2, Ethernet1, label 900003
backup via 10.1.1.4, Ethernet2, label 100002 100002 100002
If I extend this “chain” of routers connected to R2 with low-cost links but between each other with high-cost links, there will be even more hops between P and Q routers, and even more repair labels will be needed.
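The label counts discussed in this section can be summarised in a short Python sketch. It is a simplification under stated assumptions, not the encoding any vendor actually uses: the caller supplies the post-convergence path (starting at the first hop after the PLR) together with the P and Q spaces, the destination is expected to be in the Q space, and a P router that is a direct neighbour of the PLR is assumed to need no node segment. All router names are hypothetical.

```python
def repair_segments(path, p_space, q_space):
    """Return the repair segments the PLR would push (simplified model).

    path: post-convergence path from the PLR's first hop to the destination.
    Raises ValueError if no router on the path is in p_space.
    """
    if path[0] in q_space:
        return []  # first hop is already loop-free, as with plain LFA
    p_idx = max(i for i, n in enumerate(path) if n in p_space)
    q_idx = min(i for i, n in enumerate(path)
                if n in q_space and i >= p_idx)
    segments = []
    if p_idx > 0:
        # Remote P router: reach it with its node segment.
        segments.append(("node", path[p_idx]))
    # Cross from P to Q hop by hop with adjacency segments.
    for i in range(p_idx, q_idx):
        segments.append(("adj", path[i], path[i + 1]))
    return segments


# LFA-like case: no repair labels.
print(repair_segments(["N", "D"], {"N"}, {"N", "D"}))
# Remote-LFA-like case: one label for the PQ router.
print(repair_segments(["R5", "R4", "R3", "E"], {"R5", "R4"}, {"R4", "R3", "E"}))
# P and Q adjacent but distinct: two labels.
print(repair_segments(["A", "B", "C", "D"], {"A", "B"}, {"C", "D"}))
# P is the first hop and Q is three hops further: three adjacency labels.
print(repair_segments(["R4", "R5", "R6", "R3"], {"R4"}, {"R3"}))
```

Every extra hop between the P and Q routers adds one more adjacency segment, which is why the stack can keep growing as the chain of routers gets longer.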
If the PLR has to push a lot of labels, it might not be able to do so due to platform limitations. Of course, pushing 2-3 or even a few more labels is not a problem for modern routers (okay, let’s forget about those made-up scenarios with an unbounded number of repair labels, they won’t occur in real networks). But! The PLR is not necessarily a transit router that just swaps labels. It can also be a tunnel headend. In the usual ISP design, the PLR might have to push: 1 or 2 transport labels, 1 service label, ELI + entropy label, and for certain packets something like an EVPN ESI label. This already gets us to 4-6 labels. SR is marketed as an “SDN” technology, so vendors encourage people to deploy all kinds of controllers with traffic engineering pushing more transport labels than common sense dictates [3].
Even if you don’t hit the platform limit on the number of labels pushed at once, there are other potential problems triggered by huge label stacks, such as ECMP and MTU. Routers can look only a few labels deep to find L3/L4 headers or the entropy label for ECMP hashing [4]. MTU doesn’t seem to be a big deal, since each MPLS label is only 4 bytes. But there is also SRv6 with huge extension headers, and TI-LFA is supposed to work in SRv6 as well!
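The arithmetic is simple enough to put into a few lines of Python. The label counts below follow the example in the text; the maximum push depth of 8 is a made-up platform limit used only for illustration, not a figure for any real router.

```python
from dataclasses import dataclass

MPLS_LABEL_BYTES = 4  # size of one MPLS label stack entry


@dataclass
class LabelStack:
    transport: int  # transport labels (1 or 2 in the example above)
    service: int    # service label
    entropy: int    # ELI + entropy label count as 2
    other: int      # e.g. EVPN ESI label
    repair: int     # TI-LFA repair labels, present only during local repair

    def depth(self):
        return (self.transport + self.service + self.entropy
                + self.other + self.repair)

    def overhead_bytes(self):
        return self.depth() * MPLS_LABEL_BYTES

    def fits(self, max_push):
        return self.depth() <= max_push


HYPOTHETICAL_MAX_PUSH = 8  # assumed platform limit, for illustration only

normal = LabelStack(transport=2, service=1, entropy=2, other=1, repair=0)
failover = LabelStack(transport=2, service=1, entropy=2, other=1, repair=3)

print(normal.depth(), normal.fits(HYPOTHETICAL_MAX_PUSH))
print(failover.depth(), failover.overhead_bytes(),
      failover.fits(HYPOTHETICAL_MAX_PUSH))
```

With these numbers the stack is 6 labels deep in steady state and 9 labels (36 bytes) during local repair, which no longer fits the assumed limit.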
Link vs node protection
One of the advantages of TI-LFA is that it always steers traffic via the post-convergence path. That path can be calculated with various constraints, such as link, node or SRLG protection. While it makes sense to enable the SRLG constraint for links that actually belong to an SRLG (if they fail, they fail together), link vs node protection is an interesting dilemma. Link failures are more common, but it would be nice to have node protection as well.
MPLS FRR requests node protection by default, as the stronger form of protection. This makes sense, because neither link nor node protection is likely to be on the post-convergence path, and a link protection tunnel would be signaled to the nexthop node, so it never protects against node failure. If no node protection is available in the given topology, a link protection tunnel is signaled. An example of such a topology is figure 1: if R3 fails, there is no other way to reroute traffic.
De Facto node protection
How does the MPLS FRR link/node protection logic fit into TI-LFA? In a topology like the one in figure 3, the post-convergence path for link and node protection is the same. The same will be true for most Ring and Clos topologies. So regardless of which mode is enabled, it will protect from both link and node failures.
Fig. 7
On figure 7, post-convergence paths for link and node protection are different. With node protection enabled, if the link fails, traffic will be routed over a suboptimal path, and after global repair switch to the actual post-convergence path. With link protection enabled, if the node fails, is traffic going to be blackholed?
Fig. 8
Node failure results in the failure of all links connected to the given node. It will be detected simultaneously by all directly connected routers. If link protection is enabled on both R1 and R4, node protection will be provided by a cascading effect of multiple link protections. This effect has been known for quite a while, since IP LFA. See RFC6571#section-2
The advantage of this effect, known as De Facto node protection, is that an optimal repair path will be provided for both link failures and node failures.
But of course, not everything is so great.
Fig. 9
In this topology (mind the link costs), if R2 fails, R1 and R4 will activate their link protection LFAs, which point to each other, thus triggering a (micro) loop. It’s not too bad: as soon as global repair happens, they will route traffic properly, even if convergence delays are configured. But if <50 ms repair is needed in this scenario, the node protection constraint must be enabled in the TI-LFA config of R1.
Some time ago I wrote a script that connects to your PLR, checks prefixes with link protection enabled and tells whether de facto node protection will work for them or not. Check this out:
https://github.com/routingcraft/node-protection-checker
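Independently of the script linked above, the cascading effect can be illustrated with a toy hop-by-hop model in Python. This is only a sketch under assumptions: each router has one primary and at most one link-protection backup next hop toward the destination, repair labels are ignored, and the two small topologies are invented to show one case that loops and one that is delivered.

```python
def forward(src, dst, primary, backup, failed_node):
    """Trace one packet hop by hop after failed_node has died.

    primary/backup map each router to its next hop toward dst.
    Routers adjacent to the failed node use their backup next hop;
    everyone else still forwards according to the old FIB.
    """
    node, path = src, [src]
    while node != dst:
        next_hop = primary[node]
        if next_hop == failed_node:
            next_hop = backup.get(node)
            if next_hop is None or next_hop == failed_node:
                return path, "dropped"
        if next_hop in path:
            return path + [next_hop], "loop"
        node = next_hop
        path.append(node)
    return path, "delivered"


PRIMARY = {"R1": "R2", "R4": "R2", "R2": "R3", "R5": "R3"}

# Backups of R1 and R4 point to each other: link protection alone loops.
looping = {"R1": "R4", "R4": "R1"}
print(forward("R1", "R3", PRIMARY, looping, failed_node="R2"))

# R4's backup avoids R2 entirely: de facto node protection works.
cascading = {"R1": "R4", "R4": "R5"}
print(forward("R1", "R3", PRIMARY, cascading, failed_node="R2"))
```

The first call ends in a loop between R1 and R4, the second one is delivered via R4 and R5, which is the cascading behaviour described above.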
Vendor implementations
While TI-LFA is an open standard, vendors provide very different configuration structures for node protection. Interpreting the resulting outputs is not trivial either.
Arista EOS
This is probably the simplest and most logical structure. But of course I’m biased, since I work for Arista as of the time of this writing.
router isis 1
   address-family ipv4 unicast
      fast-reroute ti-lfa mode node-protection level-2
This enables node protection. If node protection is not possible for the given SID in the given topology, it provides link protection.
R1#show isis ti-lfa path R2 det
TI-LFA paths for IPv4 address family
Topo-id: Level-2
Destination: R2
Path constraint: exclude Ethernet1
Request sequence number: 1
Response sequence number: 1
Number of times path updated: 1
Last updated: 3:20:42 ago
ID: 0x6
Path: R3 [PQ-node]
Path constraint: exclude node 0002.0002.0002
Request sequence number: 1
Response sequence number: 1
Number of times path updated: 1
Last updated: 0:02:10 ago
ID: 0x8
Path: Path not found
In this example, SPF with “exclude node” constraint failed, so TI-LFA provides at least link protection.
Cisco IOS-XR
The outputs on IOS-XR show when node protection is active, even when it is not explicitly enabled (e.g. topology on Fig. 3):
RP/0/RP0/CPU0:R6#show isis fast-reroute sr-only det 4.4.4.4/32
Sat Mar 14 21:40:34.332 UTC
L2 4.4.4.4/32 [30/115] medium priority
via 10.9.9.5, GigabitEthernet0/0/0/3, R5, SRGB Base: 900000, Weight: 0
FRR backup via 10.6.6.2, GigabitEthernet0/0/0/6, R2, SRGB Base: 900000, Weight: 0, Metric: 30
P: Yes, TM: 30, LC: No, NP: Yes, D: Yes, SRLG: Yes
via 10.6.6.2, GigabitEthernet0/0/0/6, R2, SRGB Base: 900000, Weight: 0
FRR backup via 10.9.9.5, GigabitEthernet0/0/0/3, R5, SRGB Base: 900000, Weight: 0, Metric: 30
P: Yes, TM: 30, LC: No, NP: Yes, D: Yes, SRLG: Yes
src R4.00-00, 10.4.4.4, prefix-SID index 4, R:0 N:1 P:0 E:0 V:0 L:0, Alg:0
TI-LFA on IOS-XR is enabled at the interface level and there are no separate options for link/node/SRLG protection. Instead, those things can be configured as tie-breakers to choose the best post-convergence path. The config below requests node protection, but if it’s not possible in the given topology, at least link protection will be provided.
router isis 1
 address-family ipv4 unicast
  fast-reroute per-prefix tiebreaker node-protecting index 100
 !
 !
 interface GigabitEthernet0/0/0/3
  point-to-point
  address-family ipv4 unicast
   fast-reroute per-prefix
   fast-reroute per-prefix ti-lfa
If node protection is not possible due to topology limitations, the output looks the same as if node protection wasn’t requested:
RP/0/RP0/CPU0:R6#show isis fast-reroute sr-only det 2.2.2.2/32
Sat Mar 14 21:46:33.727 UTC
L2 2.2.2.2/32 [20/115] medium priority
via 10.6.6.2, GigabitEthernet0/0/0/6, R2, SRGB Base: 900000, Weight: 0
FRR backup via 10.9.9.5, GigabitEthernet0/0/0/3, R5, SRGB Base: 900000, Weight: 0, Metric: 30
P: No, TM: 30, LC: No, NP: No, D: No, SRLG: Yes
src R2.00-00, 10.0.0.2, prefix-SID index Exp-Null-v6, R:0 N:1 P:0 E:0 V:0
L:0, Alg:0
Juniper JUNOS
Juniper uses the concept of “strict” and “loose” node protection. In strict mode (which is the default), if no node protection is possible, no protection is provided at all. Sample config:
[edit protocols isis]
root@R7# show
backup-spf-options {
    use-post-convergence-lfa;
    use-source-packet-routing;
}
interface ge-0/0/2.0 {
    point-to-point;
    level 2 {
        post-convergence-lfa {
            node-protection;
        }
    }
}
Prefix with active node protection (higher weight means backup path):
root@R7> show route 3.3.3.3/32 table inet.3 detail | match "entry|ISIS|weight|oper"
3.3.3.3/32 (1 entry, 1 announced)
*L-ISIS Preference: 14
Next hop: 10.11.11.2 via ge-0/0/2.0 weight 0x1, selected
Label operation: Push 900003
Next hop: 10.12.12.6 via ge-0/0/4.0 weight 0xf000
Label operation: Push 900003, Push 900004, Push 900005(top)
In strict mode, if no node protection is possible for the given prefix, the router will not provide any protection!
root@R7# run show route 1.1.1.1/32 table inet.3 detail | match "entry|ISIS|weight|oper"
1.1.1.1/32 (1 entry, 1 announced)
*L-ISIS Preference: 14
Next hop: 10.11.11.2 via ge-0/0/2.0 weight 0x1, selected
Label operation: Push 900001
To get the same behaviour that EOS and IOS-XR provide by default with node protection, you have to enable loose mode by adding cost [max IS-IS metric - 1] to the config:
[edit protocols isis interface ge-0/0/2.0]
root@R7# show
point-to-point;
level 2 {
    post-convergence-lfa {
        node-protection cost 16777214;
    }
}
Now when no node protection is possible, JUNOS falls back to link protection:
root@R7# run show route 1.1.1.1/32 table inet.3 detail | match "entry|ISIS|weight|oper"
1.1.1.1/32 (1 entry, 1 announced)
*L-ISIS Preference: 14
Next hop: 10.11.11.2 via ge-0/0/2.0 weight 0x1, selected
Label operation: Push 900001
Next hop: 10.12.12.6 via ge-0/0/4.0 weight 0xf000
Label operation: Push 900001
Microloops
Global repair doesn’t happen on all routers at exactly the same time. If global repair on the PLR happens before every router on the post-convergence path has converged, then, depending on the topology, a microloop might form.
Fig. 10
If in this topology global repair on R2 happens before global repair on R3, there will be a microloop.
Fig. 11
Microloops commonly occur in link-state IGPs without local repair techniques, but there they are indistinguishable from usual traffic blackholing after failure. It wouldn’t make any difference if R3 converged before R2 – in either case they all must converge in order for traffic forwarding to be restored.
If TI-LFA is enabled on R2, it will reroute traffic very fast, but once global repair happens, R2 updates its FIB and stops using repair labels. But if R3 still hasn’t converged, it will loop traffic back to R2.
Fig. 12
This doesn’t happen in all topologies, but is quite common. The solution is to delay FIB update on PLR so that it will wait, hoping that all other routers will converge in the meantime. E.g. on EOS:
R1(config-router-isis)#timers local-convergence-delay 10000
Fig. 13
See also RFC8333
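The sequence of events can be replayed with a small Python model. It is a sketch under assumptions rather than a description of any implementation: the four-router square (R2-D cost 10, R2-R3 cost 10, R3-R4 cost 10, R4-D cost 20), the FIB contents derived from it, and the single repair segment via R4 are all invented for illustration, and a packet is declared looping once it exceeds a hop limit.

```python
def trace(fibs, repair, src, dst, max_hops=8):
    """Follow one packet; return (path, outcome).

    fibs[node][target] is the next hop toward a target (node segment).
    repair[node][dst] lists the segments a PLR pushes during local repair.
    """
    node, stack, path = src, [dst], [src]
    for _ in range(max_hops):
        if node in repair and stack[0] in repair[node]:
            stack = repair[node][stack[0]] + stack  # push repair segments
        node = fibs[node][stack[0]]
        path.append(node)
        if stack[0] == node:
            stack.pop(0)  # active segment reached
            if not stack:
                return path, "delivered"
    return path, "loop"


# FIBs before and after convergence, with the R2-D link failed.
R2_OLD = {"D": "D", "R4": "R3"}
R2_NEW = {"D": "R3", "R4": "R3"}
R3_OLD = {"D": "R2", "R4": "R4"}
R3_NEW = {"D": "R4", "R4": "R4"}
R4_FIB = {"D": "D"}
TI_LFA = {"R2": {"D": ["R4"]}}  # R2 steers repaired traffic via R4


def scenario(r2_fib, r3_fib, repair):
    fibs = {"R2": r2_fib, "R3": r3_fib, "R4": R4_FIB}
    return trace(fibs, repair, "R2", "D")


print(scenario(R2_OLD, R3_OLD, TI_LFA))  # local repair active on R2
print(scenario(R2_NEW, R3_OLD, {}))      # R2 converged first: microloop
print(scenario(R2_OLD, R3_NEW, TI_LFA))  # R2 delayed, R3 converged
print(scenario(R2_NEW, R3_NEW, {}))      # everyone converged
```

Only the second scenario loops: R2 has dropped its repair segment while R3 still points back at it. Keeping R2 on the repair path until R3 has converged, which is what the local convergence delay does, avoids that state.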
While convergence delay is a simple and efficient solution for microloops after link/node failure and local repair activation, microloops on link-up events are a totally different topic. Depending on topology and timers, some improvement can be achieved by delaying IS-IS LSP propagation, but it will not work in every scenario. The ultimate solution for microloops on link-up is to use source routing (e.g. SR-TE policies) during network convergence when a newly brought up link triggers a microloop. draft-bashandy-rtgwg-segment-routing-uloop and https://www.segment-routing.net/conferences/2016-mpls-sdn-world-congress-2016-paris/#avoiding-micro-loops-in-mpls-networks-using-segment-routing contain some details.
SR-TE policies
While MPLS-TE and MPLS-FRR were so much intertwined that one had to enable TE to use FRR, even when no real traffic engineering was required, SR-TE and TI-LFA are not only unrelated to each other, but actually don’t work well together.
The two approaches to control plane in telecommunications are circuit switching and packet switching. The former was used in old telephone networks. It is prone to wasting a lot of bandwidth, so most modern networks (even telephone) are moving away from it. But virtual circuit switching is still used a lot, and RSVP-TE is one such technology. Before traffic can be forwarded, an end-to-end LSP must be signaled. That LSP can be signaled with various constraints related to bandwidth, administrative groups etc. Even if it just follows the IGP shortest path, the signaling process is still the same. When MPLS FRR is used, it protects the LSP, no matter with what constraints it was signaled.
Segment routing is a pure packet switching technology, like IP. It allows for better scalability, simplicity, anycast routing and all the other cool stuff which made packet switching win over circuit switching in the first place.
Despite having an extensive framework for traffic engineering, even SR authors advise that shortest-path routing should be used in virtually all cases, and that SR-TE actually makes sense only in a very limited set of scenarios [5]. And this is not without good reasons.
In SR-TE, there is no concept of “LSP”. If I want the headend to steer traffic via a path different from the shortest path, it must do source routing by stacking multiple labels that indicate through which routers the packet is supposed to travel. Usually it doesn’t have to push many labels, but that depends on topology and TE requirements. Transit routers are unaware of any traffic engineering happening, they just look up the top label to make a forwarding decision. The most obvious consequence is that bandwidth cannot be reserved without a controller [6].
How does this work with TI-LFA? It is not aware of the LSP, so it will just protect the top segment.
Fig. 14
If R1 has an SR-TE policy to steer traffic to R4 via R2-R3, it will push SID 3 on top. R2 can’t provide node protection for a directly connected SID, but the real destination is R4, which can be protected from nexthop node failure. In order to do this, R2 would have to create a context-specific LFIB for each protected node and do a double label lookup in case of node failure, to make the forwarding decision based on the second label.
It can get more complex than that. SR-TE policies can use adjacency SIDs and anycast SIDs (protecting those properly brings its own caveats). Routers can also use different SRGBs.
Fig. 15
The Adj SID of R3 pointing to R4 is local to R3. If R3 fails, there is no router that can process that SID. Therefore, in order to protect the SR-TE policy, the PLR must include the local SIDs of directly connected nodes in those context-specific LFIBs.
As of the time of this writing, TI-LFA for SR-TE remains a highly theoretical topic. It is possible to make it work, but it feels wrong: the two things were simply not designed to work together.
draft-hegde-spring-node-protection-for-sr-te-paths describes all the scenarios of SR-TE node protection and related challenges. And draft-hu-spring-segment-routing-proxy-forwarding suggests an idea of “proxy forwarder” – a node to which other routers can reroute SR-TE traffic with the SID of protected node on top.
By the way, all those scenarios don’t even consider post-convergence path for SR-TE anymore. They try to provide at least any repair path…
Perhaps a better option would be to just let the SR-TE headend set up a disjoint backup path and run Seamless BFD over the primary path to detect failures. This actually might work better than end-to-end protection in MPLS FRR, since failure detection relies on BFD rather than on the IGP propagating the failure all the way to the TE headend.
Flex Algo
draft-ietf-lsr-flex-algo is a revival of Multi-Topology Routing (MTR) for SR. In theory, it has a stronger position than MTR (which sort of failed), because in data plane, topologies are separated by different labels, while in pure IP routing this was not that simple. The rule for TI-LFA in Flex Algo is that the backup path should be in the same Flexible topology as the primary path. This potentially allows some sort of traffic engineering without hitting the limitations of SR-TE policies (depending on topology).
Preferred Path Routing
draft-chunduri-lsr-isis-preferred-path-routing attempts to solve the problem of large label stacks in SR-TE by replacing them with a local label derived from the PPR-ID advertised by IS-IS. While it is not yet clear how TI-LFA will work with PPR, it should not suffer from the same problems as in SR-TE. It will take a while until we see production implementations and adoption of PPR, but potentially this is an interesting technology which might solve some problems of SR-TE.
Conclusion
The main question is: is local repair needed at all?
A long time ago, global repair was very slow, so FRR made it possible to reduce convergence time from the order of seconds to tens of milliseconds. According to Alia Atlas, complaints of WoW players about unstable Internet during raids were one of the major business drivers for FRR.
On modern hardware, with properly tuned IGP timers, global repair can be as fast as a few hundred ms. This is negligible for most applications. Most modern cloud networks don’t have any sort of local repair. Networks that can’t afford a single packet loss (e.g. financial), design double redundant streams of data.
But still, in ISP networks there is demand for local repair technologies, due to unreliable long-distance links, real-time traffic, and network scalability requirements which don’t always allow aggressive tuning of IGP timers.
TI-LFA is the most advanced local repair technology today. But it is important to understand its limitations and caveats to properly design a scalable and fast converging network.
References
- Fast Reroute Extensions to RSVP-TE for LSP Tunnels https://tools.ietf.org/html/rfc4090
- IP Fast Reroute Framework https://tools.ietf.org/html/rfc5714
- Basic Specification for IP Fast Reroute: Loop-Free Alternates https://tools.ietf.org/html/rfc5286
- Loop-Free Alternate (LFA) Applicability in Service Provider (SP) Networks https://tools.ietf.org/html/rfc6571
- Remote Loop-Free Alternate (LFA) Fast Reroute (FRR) https://tools.ietf.org/html/rfc7490
- Topology Independent Fast Reroute using Segment Routing https://tools.ietf.org/html/draft-ietf-rtgwg-segment-routing-ti-lfa-03
- Operational Management of Loop-Free Alternates https://tools.ietf.org/html/rfc7916
- Improving the Convergence of IP Routing Protocols https://inl.info.ucl.ac.be/system/files/pierre-francois-phd-thesis_0.pdf
- Micro-loop Prevention by Introducing a Local Convergence Delay https://tools.ietf.org/html/rfc8333
- Loop avoidance using Segment Routing https://tools.ietf.org/html/draft-bashandy-rtgwg-segment-routing-uloop-08
- Avoiding micro-loops using Segment Routing https://www.segment-routing.net/conferences/2016-mpls-sdn-world-congress-2016-paris/#avoiding-micro-loops-in-mpls-networks-using-segment-routing
- Node Protection for SR-TE Paths https://tools.ietf.org/html/draft-hegde-spring-node-protection-for-sr-te-paths-05
- SR-TE Path Midpoint Protection https://tools.ietf.org/html/draft-hu-spring-segment-routing-proxy-forwarding-07
- IGP Flexible Algorithm https://tools.ietf.org/html/draft-ietf-lsr-flex-algo-06
- Preferred Path Routing (PPR) in IS-IS https://tools.ietf.org/html/draft-chunduri-lsr-isis-preferred-path-routing-05
- History Of Networking – Alia Atlas – Fast Reroute https://networkcollective.com/2018/02/hon-fastreroute/
- Node-protection-checker https://github.com/routingcraft/node-protection-checker
Notes
[1] Marketing number inherited from SDH, yet people keep putting it everywhere, and so do I.
[2] Although it is possible to use RSVP-TE or SR tunnels with remote LFA when protection with LDP is not possible.
[3] Which turned out to be a serious problem in SR-TE, so they had to come up with new IGP and BGP extensions to advertise Maximum SID Depth so that the controller knows how many labels each router can push at once. See RFC8476, RFC8491, draft-ietf-idr-bgp-ls-segment-routing-msd
[4] RFC8662 defines Entropy Readable Label Depth (ERLD): how deep a transit LSR can look into the label stack to find the entropy label.
[5] Segment Routing, Part II: Traffic Engineering (ISBN-13: 978-1095963135), appendix B.
[6] Which is an amazing sales opportunity for all kinds of software vendors (even those without any routing/SDN background). This is why SR-TE is such a big hype.
Thank you so much for such a great post!
I found your blog on r/networking and surely will visit it regularly.
Hey, do you know if they have ever added fast-reroute/ti-lfa support to SR-TE?
TI-LFA link protection for SR-TE works pretty much out of the box, without any special features. The caveat is that it’s not always on post-convergence path, as it protects only the top segment, not the whole SR-TE policy.
Node protection for SR-TE policies is a weird topic – see https://tools.ietf.org/html/draft-hegde-spring-node-protection-for-sr-te-paths-05. I’m not aware of any vendor’s implementation as of today.
Appreciate the reply, yeah I know about that document. Their proposed context-specific LFIB idea, where an SR node knows about the labels that another SR node uses, reminds me of BGP Optimal Route Reflection.
Perhaps more like interface-specific label spaces in LDP for ATM. I look forward to vendors trying to do that (but probably everyone will just give up and use RSVP-TE or PPR in such scenarios).
I can’t seem to find any info on any vendor that has already implemented pLFA?
But although pLFA is more difficult to understand, I agree that it would probably be less CPU intensive than SR context-specific LFIB, and also easier to follow in, for example, Wireshark when troubleshooting.
None I’m aware of. I look forward to playing with PPR / pLFA myself!