What happens if you redistribute BGP full view into OSPF

Yes, this has already happened to many networks across the globe, and it will certainly happen again. Maybe to your network?

TL;DR:

The network will go down. OSPF sessions will start flapping, and some routers might run out of memory and crash. Be careful with redistribution and use safeguards (e.g. max-lsa) to limit the number of LSA in the OSPF LSDB.

Alright, but what actually happens? Once the administrator has realised his mistake and rolled back the config, won’t the network recover? Welcome to the pilot post of routingcraft.

Back then I worked in Cisco TAC. It was a sunny day; I had just eaten lunch and was browsing reddit when a P1 case came in. The customer was complaining about high CPU utilization on a router. After a few basic checks it became obvious that OSPF was the top CPU consumer, all adjacencies were flapping due to retransmits, and the LSDB contained hundreds of thousands of LSA (the IPv4 BGP full view back in the day was about 500k prefixes). And this was not on one router, but on every router we checked.

So I asked the customer – did he, by any chance, redistribute BGP into OSPF?

Then he admitted that one engineer indeed enabled BGP->OSPF redistribution without a route-map by mistake, but quickly corrected it, and that had happened a few hours before they called us. OSPF LSA MaxAge is 1 hour, so all of them should have been aged out and removed by that time?

With a low number of routes – perhaps yes, but not with BGP full view!

Overview of OSPF flooding

As a link-state routing protocol, OSPF is designed to ensure that all routers in the area have exactly the same copy of the LSDB (see note 1). This makes sense, as inconsistencies in the LSDB can cause traffic blackholing and routing loops.

Sequence number

Each LSA has a sequence number which is incremented by 1 every time the LSA is re-originated. Whenever the router receives an LSA it already has, but with a more recent sequence number, it discards the old LSA and installs the new one in the LSDB.

LS age

Each LSA is originated with LS age set to 0. Routers increment LS age of each LSA in the LSDB by 1 every second. Once the LS age reaches 3600 seconds (1 hour), the LSA is marked for deletion (but not deleted immediately!). The LSA originator can flush it by incrementing the sequence number and setting the LS age to MaxAge (3600), thereby informing all routers they should delete that LSA. Normally, the LSA originator refreshes each LSA every 30 min, offset by LSA pacing timer (240 s) so that all LSA are not refreshed at the same second.
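The sequence number and LS age rules can be sketched in a few lines of Python. This is a simplified illustration that follows the comparison order described in RFC 2328 section 13.1 (sequence number, then checksum, then MaxAge, then age difference); the class and function names are hypothetical and not taken from any router implementation.

```python
from dataclasses import dataclass

MAX_AGE = 3600       # seconds; an LSA at this age is marked for deletion
MAX_AGE_DIFF = 900   # seconds; RFC 2328 MaxAgeDiff


@dataclass(frozen=True)
class LsaInstance:
    seq: int        # LS sequence number (a signed 32-bit value on the wire)
    checksum: int   # LS checksum
    age: int        # LS age in seconds


def is_newer(a: LsaInstance, b: LsaInstance) -> bool:
    """Return True if instance a is more recent than instance b."""
    if a.seq != b.seq:
        return a.seq > b.seq                 # higher sequence number wins
    if a.checksum != b.checksum:
        return a.checksum > b.checksum       # then the larger checksum
    if (a.age == MAX_AGE) != (b.age == MAX_AGE):
        return a.age == MAX_AGE              # a flushed (MaxAge) copy wins
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return a.age < b.age                 # otherwise the much younger copy
    return False                             # treated as the same instance
```

With the same sequence number and checksum, a copy at MaxAge is treated as more recent than a younger copy, which is how a flush overrides what the neighbours already hold.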

Retransmissions

When the router accepts an LSA, it floods it to all other neighbours, and expects them to acknowledge it. The concept of acknowledging received data is similar to TCP, except there is no dynamic flow control in OSPF.

Instead, a few statically configured timers control flooding and retransmission:

  1. The LSA is added to the interface flood list.
  2. LSAs from the flood list are packed into an LS update (see note 2) and sent to the OSPF multicast address, so that fewer packets have to be generated on multiaccess networks.
  3. The flood pacing timer (33 ms) adds an interval between consecutive flooded LS updates.
  4. If any neighbour on the interface has not acknowledged the flooded LSA within the retransmit interval (5 s), the LSA is added to the retransmission list of each neighbour that failed to acknowledge it.
  5. Retransmitted LS updates are sent to the neighbour’s unicast address, and the retransmission pacing timer (66 ms) adds an interval between consecutive retransmitted LS updates.
  6. If the retransmitted LSA is acknowledged within the retransmit interval, it is removed from the neighbour’s retransmission list; otherwise it is retransmitted again.
  7. If at least one LSA has been retransmitted unsuccessfully the retransmission limit (see note 3) number of times, the OSPF adjacency goes down (see note 4).

Figure 1 illustrates the timers described above. Imagine that LSA #1 and LSA #2 are transmitted in different LS updates, and the neighbour never acks LSA #1. It will be retransmitted first time 5 seconds after flooding, and then every 5 seconds. After 10 unsuccessful retransmits, the OSPF session goes down.

Fig. 1

In practice, this can happen if the LS update with LSA #1 is too large and there is an MTU blackhole in between.
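To get a feel for these timers, here is a back-of-the-envelope sketch in Python. It assumes a 1500-byte IP MTU, no OSPF authentication trailer, 36-byte type-5 (external) LSAs, and that flood pacing is the only limiting factor; all function names are made up for this illustration.

```python
from typing import List

IP_HEADER = 20         # bytes, IPv4 header without options
OSPF_HEADER = 24       # bytes, common OSPF packet header
LSU_COUNT_FIELD = 4    # bytes, "# LSAs" field of an LS update
EXTERNAL_LSA = 36      # bytes, 20-byte LSA header + 16-byte type-5 body


def lsas_per_update(ip_mtu: int) -> int:
    """How many external LSAs fit in one unfragmented LS update."""
    return (ip_mtu - IP_HEADER - OSPF_HEADER - LSU_COUNT_FIELD) // EXTERNAL_LSA


def retransmit_times(retransmit_interval_s: float, retransmit_limit: int) -> List[float]:
    """Seconds after the initial flood at which an unacked LSA is resent."""
    return [retransmit_interval_s * n for n in range(1, retransmit_limit + 1)]


def flood_duration_s(lsa_count: int, per_update: int, flood_pacing_s: float) -> float:
    """Lower bound on the time to flood lsa_count LSAs out of one interface."""
    updates = -(-lsa_count // per_update)   # ceiling division
    return updates * flood_pacing_s


if __name__ == "__main__":
    per_update = lsas_per_update(1500)
    print(per_update)
    print(retransmit_times(5, 10))
    print(flood_duration_s(500_000, per_update, 0.033))
```

Under these assumptions one LS update carries 40 external LSAs, so 500k LSAs need 12,500 LS updates, and at 33 ms pacing the initial flood alone takes about 412 seconds per interface. That is far longer than the 5-second retransmit interval, and it ignores CPU load, acknowledgements and the retransmissions themselves.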

On the receiver side, the LSA first has to pass interface queueing and CoPP. Then it’s added to the OSPF update-queue (200 packets). The router does some checks (see note 5) and, if it decides to accept the LSA, sends an LS Ack to the unicast address of the neighbour from which the LSA was received.

Usually, this all works fine and ensures reliable flooding of LSA while not consuming too much bandwidth or CPU cycles. However, if the router is overloaded for some reason and can’t quickly flood or accept LS updates, this can lead to adjacency flaps – unlike BGP and other protocols that rely on TCP, which can become very slow but still keep the session alive in such situations.

Other OSPF timers (LSA generation, arrival and SPF delay) control the impact of the same LSA being re-originated and flooded. I will not describe them in detail, as they are not directly applicable to the scenario reviewed in this post. Petr Lapukhov wrote a good article about them (see References); like most fundamentals, it is still relevant in 2020.

Session establishment

During session establishment, neighbours exchange DBD packets. In those packets, routers list all LSA they have in the LSDB, except MaxAge LSA. By comparing the sequence numbers of the LSA received in the DBD packets to sequence numbers of the LSA in the local LSDB, the router can determine whether it has the most recent version of the LSA. If not, it sends an LS request, asking the neighbour to send the LS update with a more recent version of that LSA.

The MaxAge LSA are not part of DBD, and they are not requested. Instead, they are added to the neighbour’s retransmission list, and the usual retransmission rules apply.
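A minimal Python sketch of this exchange, assuming an LSDB modelled as a dictionary from an LSA key to a (sequence number, age) tuple. It compares sequence numbers only, as described above, and the names are hypothetical.

```python
MAX_AGE = 3600  # seconds


def dbd_summary(lsdb: dict) -> dict:
    """LSAs listed in DBD packets: everything except MaxAge LSAs.

    lsdb maps an LSA key to a (sequence_number, age) tuple.
    Returns a mapping of LSA key to sequence number.
    """
    return {key: seq for key, (seq, age) in lsdb.items() if age < MAX_AGE}


def ls_request_list(local_lsdb: dict, neighbour_summary: dict) -> set:
    """Keys of LSAs the neighbour listed that we lack or hold an older copy of."""
    wanted = set()
    for key, their_seq in neighbour_summary.items():
        ours = local_lsdb.get(key)
        if ours is None or ours[0] < their_seq:
            wanted.add(key)
    return wanted


if __name__ == "__main__":
    r2 = {"rtr-1": (10, 50), "ext-a": (3, MAX_AGE), "ext-b": (4, 200)}
    r4 = {"rtr-1": (9, 300)}
    print(sorted(ls_request_list(r4, dbd_summary(r2))))
```

In this example the flushed LSA "ext-a" never appears in the summary, so the neighbour does not request it. It reaches the neighbour only through the retransmission list.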

Neighbour state

RFC 2328, section 10.3, explains the details, but what matters for this post is that in order for a newly formed adjacency to become FULL, the neighbour’s LS request list must be empty. In other words, the neighbour has received everything it asked for.

Deleting LSA

LSA with MaxAge (either aged out or flushed) are not removed immediately. In fact, they are still transmitted to neighbours, so that they also get to know about the fact the LSA is being flushed!

In order to delete a MaxAge LSA, it must not be on any neighbour’s retransmission list, and none of the neighbours may be in the EXCHANGE or LOADING state.
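The deletion rule can be written as a small predicate. This is a sketch in Python with a hypothetical Neighbour class, not code from any implementation.

```python
from dataclasses import dataclass, field

MAX_AGE = 3600  # seconds


@dataclass
class Neighbour:
    state: str                                   # e.g. "FULL", "EXCHANGE", "LOADING"
    retransmission_list: set = field(default_factory=set)


def can_delete(lsa_key: str, lsa_age: int, neighbours: list) -> bool:
    """True if a MaxAge LSA may be removed from the local LSDB."""
    if lsa_age < MAX_AGE:
        return False                             # only MaxAge LSAs are candidates
    if any(n.state in ("EXCHANGE", "LOADING") for n in neighbours):
        return False                             # a database exchange is in progress
    # The LSA must have been acknowledged by every neighbour.
    return all(lsa_key not in n.retransmission_list for n in neighbours)
```

Note that a single neighbour in EXCHANGE or LOADING blocks the deletion of every MaxAge LSA on the router, not just the LSAs that neighbour has yet to acknowledge.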

Once BGP full view is redistributed

Abandon all hope. 

First, the routers will flood LSA, until there are too many of them and sessions start flapping. That will make the situation even worse, since now even more LSA will be flooded and retransmitted.

Eventually the operator will realize the mistake and stop redistribution. What happens next depends on how many LSA have already been redistributed, how many non-MaxAge LSA are present in the routing domain, the number of routers in the topology and their computing power.

I will not focus on the latter – as the problem is fundamental and ~800k prefixes (full view at the time of this writing) in OSPF are unlikely to ever finish loading even on modern hardware. The difference is that legacy routers might also run out of memory and crash.

The key points are:

  1. Attempting to flood too many LSA will sooner or later lead to OSPF session flaps due to retransmits.
  2. Newly established sessions will never reach the FULL state: they will be stuck in EXCHANGE or LOADING until they flap again.
  3. If at any given moment at least one router in the OSPF domain has at least one neighbour in the EXCHANGE or LOADING state, none of the LSA will ever be deleted from the OSPF domain. Even if some routers manage to delete some aged-out LSA, they will receive them again from their neighbours.

The requirement that an LSA must not be on any neighbour’s retransmission list in order to be deleted does not play a big role here, as at some point at least some LSA will be acked by the neighbours and removed from the retransmission lists. This also means that if the number of non-MaxAge LSA in the OSPF domain is very low, the adjacencies will eventually come up (since MaxAge LSA are not advertised in DBD and not requested by neighbours). However, if there is a significant number of non-MaxAge LSA in the OSPF domain (just enough that they cannot be loaded quickly), neighbours will never reach the FULL state.

Lab example

The exact dynamics depend on many things, such as the number of routers, hardware configuration and network topology. I ran the simulation in a very minimalistic virtual topology, based on the premise that a high number of redistributed LSA will cause some OSPF sessions to flap continuously.

 

Fig. 2

R1, R2 and R3 run OSPF with default settings. All links are P2P. Grey routers are legacy devices which are slow at consuming updates and flap OSPF sessions from time to time (for simulation purposes I just set them to restart OSPF every minute, at different times).

R1 receives 200k routes via BGP. This is a relatively low number which modern routers can handle in OSPF without session flaps. A BGP full view, however, certainly would cause them.

Very small LSDB

In the most minimalistic config, there are only 7 router LSA:

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         7        0        0

R1 receives 200k routes from BGP and redistributes them into OSPF

R1#sh ip bgp su

Neighbor        V           AS MsgRcvd MsgSent   TblVer InQ OutQ Up/Down  State/PfxRcd
172.17.0.2      4        65002     369      79   200001    0    0 01:09:11   200000


R1(config)#router ospf 1
R1(config-router)#redistribute bgp 65001

R2 and R3 received the redistributed routes.

R3#sh ip ospf database database-summary | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         200007   0        0

Now stopping redistribution on R1:

R1(config-router)#no redistribute bgp 65001

Soon, R1 will transmit new LS updates with MaxAge LSA to R2 and R3, so they will mark those for deletion. Since there are very few (only 7) non-MaxAge LSA, R2 and R3 will be able to quickly bring up sessions with unstable routers.

R2#sh ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address     Interface
5.5.5.5           0   FULL/  -     00:00:39    10.5.5.5        Ethernet1/1
4.4.4.4           0   FULL/  -     00:00:39    10.4.4.4        Ethernet1/0
3.3.3.3           0   FULL/  -     00:00:38    10.2.2.3        Ethernet0/2
1.1.1.1           0   FULL/  -     00:00:34    10.0.0.1        Ethernet0/0

This means each MaxAge LSA can be deleted, as long as that particular LSA is not on any neighbour’s retransmission list. There will be lots of unacknowledged retransmits, but eventually some LSA will get acknowledged and deleted. As the number of LSA decreases, the network recovers faster and faster.

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         180297   180290   180290

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         148941   148934   148934

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         128585   128578   128578

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         74271    74264    74264

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         20119    20112    20112

So it wasn’t that bad.

Slightly bigger LSDB

Now consider a higher number of LSA in the network, for example 10k. Many networks have many more.

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         10462    0        0

R1 redistributes 200k BGP routes:

R1(config-router)#redistribute bgp 65001

R2 and R3 received 200k routes (in addition to existing 10k):

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         210462   0        0

Stop redistribution:

R1(config-router)#no redistribute bgp 65001

200k LSA are now marked for deletion.

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         210462   200000   200000

But not deleted. Why? It’s because our grey routers are still in EXCHANGE.

R2#sh ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address     Interface
5.5.5.5           0   EXCHANGE/  -    00:00:39    10.5.5.5        Ethernet1/1
4.4.4.4           0   EXCHANGE/  -    00:00:39    10.4.4.4        Ethernet1/0
3.3.3.3           0   FULL/  -     00:00:36    10.2.2.3        Ethernet0/2
1.1.1.1           0   FULL/  -     00:00:32    10.0.0.1        Ethernet0/0

As in the previous example, they don’t have to receive all those MaxAge LSA in order for the sessions to become FULL. But when the session is in EXCHANGE, routers do not only exchange DBD and LS requests/LS updates in response to requests, they also send the usual LS updates – and that includes LSA from the retransmission list.

RFC 2328:

            All adjacencies in Exchange state or greater are used by the
            flooding procedure.  In fact, these adjacencies are fully
            capable of transmitting and receiving all types of OSPF
            routing protocol packets.

So in the EXCHANGE state, R2 and R3 will transmit all kinds of packets to their slow neighbours, not only what is essential to bring the sessions up. Most of the LS updates consist of MaxAge LSA:

Fig. 3

And as long as at least one session is stuck in EXCHANGE, no LSA will be deleted.

How do I fix it?

The most natural thing to do here is to clear the OSPF process. It will indeed force the router to delete all LSA.

R2#clear ip ospf process
Reset ALL OSPF processes? [no]: yes

Clean LSDB on R2!

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         10462    0        0

But R3 still has flapping neighbours. Once the session between R2 and R3 comes back up, R3 will transmit all its MaxAge LSA to R2 again. Soon R2 will be back with the same LSDB full of MaxAge LSA.

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         111758   101296   101296
…

R2#sh ip ospf 1 database database-summary   | in LSA Type|Total
  LSA Type      Count    Delete   Maxage
  Total         210462   200000   200000

In this small topology it doesn’t matter, but in a real network restarting the OSPF process will make things even worse by forcing more LSA to be flooded.

However, clearing the process simultaneously on all routers will probably help (if you have an out-of-band management network).

What will work for sure is isolating all routers by shutting down all OSPF sessions, and then bringing everything back up. This is very disruptive and requires out-of-band management, so in many cases it will simply not be feasible.

Configuring max-lsa (see below) would prevent this problem…had it been configured before BGP was redistributed. Enabling it on a network with a huge number of LSA will have the same effect as shutting down all sessions; it will just take half an hour or so (depending on the ignore-count setting).

Another option which might work, depending on the number of routers in the topology, is to reduce the number of non-MaxAge LSA. If there are only a few hundred of those, OSPF sessions can come up, and the routers will start deleting MaxAge LSA.

Max LSA safeguards

Vendors provide safeguards to limit the number of LSA. Since all routers in the OSPF area must have exactly the same LSDB, those knobs work by shutting down all OSPF sessions after the configured limit is exceeded. On Cisco IOS, IOS-XR and Arista EOS this is called “max-lsa”; on JUNOS it is called “database-protection”. It works the same way.

For instance:

router ospf 1
 max-lsa 12000 75 ignore-time 5 reset-time 10 ignore-count 5

This sets the max-lsa limit to 12000 and prints a syslog warning when the LSDB reaches 75% of that number. Upon reaching 12000, the router shuts down all OSPF sessions and ignores all neighbours for 5 minutes (ignore-time). Each time this happens, the ignore count increments by 1. Once 10 minutes (reset-time) have passed since the last transition to the ignore state, the ignore count resets to 0. If the ignore count reaches 5 (ignore-count), all OSPF sessions are shut down permanently.
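The behaviour just described can be modelled as a small state machine. The Python sketch below follows the wording of this post literally and is not vendor code; the class, its fields and the returned state names are all hypothetical, and real implementations may differ in details such as whether the limit must be reached or exceeded.

```python
from dataclasses import dataclass


@dataclass
class MaxLsaGuard:
    limit: int = 12000
    warn_percent: int = 75
    ignore_time_min: float = 5
    reset_time_min: float = 10
    max_ignore_count: int = 5
    ignore_count: int = 0
    ignoring_until: float = -1.0    # minute at which the current ignore period ends
    last_ignore_at: float = -1.0    # minute of the last transition to ignore state
    permanently_down: bool = False

    def observe(self, now_min: float, lsa_count: int) -> str:
        """Feed the current LSA count; returns 'ok', 'warning', 'ignoring' or 'down'."""
        if self.permanently_down:
            return "down"
        if now_min < self.ignoring_until:
            return "ignoring"
        # Forget old offences once reset-time has passed since the last one.
        if self.ignore_count and now_min - self.last_ignore_at >= self.reset_time_min:
            self.ignore_count = 0
        if lsa_count >= self.limit:
            self.ignore_count += 1
            self.last_ignore_at = now_min
            if self.ignore_count >= self.max_ignore_count:
                self.permanently_down = True
                return "down"
            self.ignoring_until = now_min + self.ignore_time_min
            return "ignoring"
        if lsa_count * 100 >= self.limit * self.warn_percent:
            return "warning"
        return "ok"
```

In this model, a router whose neighbours refill the LSDB past the limit as soon as each ignore period ends goes through four ignore periods and is shut down permanently on the fifth violation.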

On EOS, max-lsa is set to 12000 by default once OSPF is configured. If you have more LSA, you might want to increase this number, but don’t set it to 0. On other vendors’ operating systems, enable it and set it to some value, especially if you carry a BGP full view. It might save your network one day.

What about IS-IS?

IS-IS is the most common IGP in large ISP networks. Overall it is very similar to OSPF, but there are differences, and handling redistributed routes is one of them.

While OSPF creates an LSA for each external route, IS-IS appends all external routes to the LSP of the redistributing router. If that LSP exceeds the CLNS MTU (1492 bytes by default), it gets fragmented. CLNS fragmentation uses a one-byte field, which means there can be a maximum of 256 fragments. IP routes are usually carried in TLV #135, which has a variable size depending on prefix length and sub-TLVs. So the exact number of routes that can fit in one LSP depends on several things, but it will be around 40-50k, which is unlikely to cause a network-wide outage (see note 6).
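The 40-50k figure can be reproduced with a rough calculation. The Python sketch below assumes a 27-byte LSP header, TLV #135 entries without sub-TLVs (4 bytes of metric, 1 control byte, then only the significant prefix octets), all prefixes being /24, and fragments that carry nothing but TLV #135. Real LSPs also carry other TLVs, so treat the result as an upper bound. Function names are made up for this example.

```python
LSP_HEADER = 27        # bytes: common IS-IS header plus LSP-specific fixed fields
TLV_HEADER = 2         # bytes: type + length
TLV_MAX_VALUE = 255    # the TLV length field is one byte
MAX_FRAGMENTS = 256    # the fragment number is one byte


def prefix_entry_size(prefix_len: int) -> int:
    """Bytes used by one TLV #135 entry without sub-TLVs."""
    return 4 + 1 + (prefix_len + 7) // 8


def prefixes_per_fragment(lsp_mtu: int, prefix_len: int) -> int:
    """How many prefixes of the given length fit in one LSP fragment."""
    entry = prefix_entry_size(prefix_len)
    per_tlv = TLV_MAX_VALUE // entry                 # entries in a full TLV
    tlv_size = TLV_HEADER + per_tlv * entry          # bytes used by a full TLV
    full_tlvs, rest = divmod(lsp_mtu - LSP_HEADER, tlv_size)
    in_rest = max(rest - TLV_HEADER, 0) // entry     # entries in a last, partial TLV
    return full_tlvs * per_tlv + in_rest


def max_prefixes(lsp_mtu: int = 1492, prefix_len: int = 24) -> int:
    """Upper bound on prefixes one router can advertise across all fragments."""
    return MAX_FRAGMENTS * prefixes_per_fragment(lsp_mtu, prefix_len)


if __name__ == "__main__":
    print(prefixes_per_fragment(1492, 24), max_prefixes())
```

With these assumptions one 1492-byte fragment holds 181 /24 prefixes, which gives 46,336 prefixes across 256 fragments. Shorter prefixes raise the number and sub-TLVs lower it.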

If an IS-IS router tries to insert so many routes into its LSP that it exceeds the maximum LSP size, it raises the LSPFULL condition and starts suppressing some routes.

On IOS, it is possible to control which routes are suppressed:

router isis 1
 lsp-full suppress external

With this config (default on IOS), a router in the LSPFULL condition will stop advertising any external routes in its LSP, but will still advertise internal routes. Note that this distinction between internal and external routes in IS-IS is only local, they are all advertised in TLV #135 in the same format.

EOS, IOS-XR and JUNOS don’t provide such a granular control. Not like it’s needed that much anyway.

Conclusion

  1. Don’t redistribute BGP full view into OSPF.
  2. Use IS-IS as the IGP for ISP networks.
  3. If you still decide to use OSPF, set max-lsa limits.

References

  1. OSPF Version 2 https://tools.ietf.org/html/rfc2328
  2. OSPF Fast Convergence https://blog.ine.com/2010/06/02/ospf-fast-convergenc

Notes

  1. Some LSA are flooded only within an area, others throughout the whole OSPF domain. For simplicity this post assumes single-area OSPF.
  2. In good implementations an LS update should not exceed the interface MTU, to avoid fragmentation.
  3. By default 10 on Arista EOS, 24 on Cisco IOS/IOS-XR.
  4. Some implementations allow disabling the retransmission limit, and some amateur implementations of OSPF (unfortunately often found in enterprise-grade equipment) don’t have it at all, leading to traffic blackholing for up to 1 hour in case of problems with transport (e.g. MTU blackholes).
  5. For details see https://tools.ietf.org/html/rfc2328#section-13
  6. IS-IS uses 802.3 encapsulation when it runs over Ethernet, so it doesn’t support Jumbo frames. There is an industry consensus of using ethertype 0x8870 to pad IS-IS packets up to Jumbo MTU size, which is not standardized, but all vendors do that. Some implementations allow setting the LSP-MTU size to a bit more than 4k, which brings the number of IPv4 prefixes that can potentially be advertised by one IS-IS router to about 150k. That is still not a very high number, and modern hardware can process it.

6 thoughts on “What happens if you redistribute BGP full view into OSPF”

  1. Hi Dmytro,
    Thank you for such a detailed article, but I have a question regarding your lab setup. As you mentioned before, “Grey routers are legacy devices which are slow at consuming updates and flap OSPF sessions from time to time (for simulation purpose I just set them to restart OSPF every minute, at different times).”
    My question is: how do you program a router to restart the OSPF process automatically every minute or so? Is this done using an EEM script?
    Thank you,

    1. Hi Mohammed,

      Yes, pretty much EEM with cron.

      event manager applet RESTART_OSPF
       event timer cron cron-entry "* * * * *"
       action 2.0 cli command "enable"
       action 2.1 cli command "conf t"
       action 3.0 cli command "router ospf 1"
       action 4.0 cli command "shutdown"
       action 5.0 cli command "no shutdown"
       action 6.0 cli command "end"

  2. If I got everything right, the (major) problem is that the MaxAge LSAs never get deleted due to the neighbours not in FULL state, and those neighbours never reach FULL kinda because of the MaxAges…
    Would re-enabling the redistribution potentially reestablish the stability (or at least improve somehow) since it would replace the MaxAges with normal LSAs?
    If yes, would blocking a small count of prefixes at a time on the redistribution point be a way out of the panic?

    1. > If I got everything right, the (major) problem is that the MaxAge LSAs never get deleted due to the neighbours not in FULL state, and those neighbours never reach FULL kinda because of the MaxAges…

      Correct.

      > Would re-enabling the redistribution potentially reestablish the stability (or at least improve somehow) since it would replace the MaxAges with normal LSAs?

      No, it will make everything worse. Non-MaxAge LSA will still be flooded, but they will also be advertised in DBD during session establishment, and neighbours will send LS requests for those LSA.
