A quick public service announcement for anyone implementing BGP-LU or deploying it in a multivendor environment.
What is BGP Labeled Unicast
Originally, BGP was designed to advertise IP prefixes only. Then [RFC2283] (obsoleted by [RFC2858] and later by [RFC4760]) defined Multiprotocol Extensions, which enabled BGP to advertise any information, turning it into the universal messaging bus it is today.
[RFC3107] is one of the applications of MP-BGP. It’s very simple: instead of sending just an IP prefix, send a prefix + MPLS label(s).
This small modification turns out to be very useful and makes a lot of network designs possible:
- BGP-only MPLS transport (e.g. for large scale Data Centers)
- Inter-AS VPN option C [RFC4364#section-10]
- Carrier’s Carrier [RFC4364#section-9]
- Seamless MPLS [draft-ietf-mpls-seamless-mpls]
- 6PE [RFC4798]
- Egress Peer Engineering [draft-gredler-idr-bgplu-epe]
- Installing SR-TE policies to routers not supporting BGP-SRTE or PCEP
I have probably missed a few, as this list is off the top of my head, so please add more in the comments.
Generating and parsing BGP-LU Updates
[RFC3107] is very clear on this.
The NLRI carried in a BGP-LU MP_REACH_NLRI attribute looks like this:
    +---------------------------+
    |   Length (1 octet)        |
    +---------------------------+
    |   Label (3 octets)        |
    +---------------------------+
    .............................
    +---------------------------+
    |   Prefix (variable)       |
    +---------------------------+
The Label field carries one or more labels (that corresponds to the stack of labels [MPLS-ENCAPS]). Each label is encoded as 3 octets, where the high-order 20 bits contain the label value, and the low order bit contains "Bottom of Stack" (as defined in [MPLS-ENCAPS]).
Therefore, the sender sets the BoS bit on the bottom label only. The receiver parses the first label (3 bytes); if BoS is not set, it parses the next 3 bytes, and so on until it finds a label with BoS set. At that point it knows this was the last label and that the next field is the IP prefix.
In pseudocode, the logic is as follows:
    last_label = False
    WHILE NOT last_label DO
        mpls_label = parse_3_bytes()
        IF mpls_label.bos THEN
            last_label = True
        ENDIF
    ENDWHILE
    parse_ip_prefix()
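For anyone who prefers something runnable, here is a minimal Python sketch of the same loop. It is only an illustration under a few assumptions of mine: IPv4 prefixes only, a single NLRI per buffer, no ADD-PATH path identifier, and function names (encode_label, encode_nlri, parse_nlri_rfc3107) that are made up for this post rather than taken from any real BGP implementation.

```python
import ipaddress


def encode_label(label: int, bos: bool) -> bytes:
    """Pack a 20-bit label value and the Bottom of Stack bit into 3 octets."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    # High-order 20 bits: label value. Low-order bit: Bottom of Stack.
    return ((label << 4) | int(bos)).to_bytes(3, "big")


def encode_nlri(labels, prefix: str) -> bytes:
    """Build one labeled IPv4 NLRI: length, label stack, prefix."""
    if not labels:
        raise ValueError("at least one label is required")
    net = ipaddress.IPv4Network(prefix)
    stack = b"".join(
        encode_label(lbl, i == len(labels) - 1) for i, lbl in enumerate(labels)
    )
    prefix_octets = net.network_address.packed[: (net.prefixlen + 7) // 8]
    # The Length field counts bits of labels plus bits of prefix.
    length_bits = 24 * len(labels) + net.prefixlen
    return bytes([length_bits]) + stack + prefix_octets


def parse_nlri_rfc3107(data: bytes):
    """Parse one labeled IPv4 NLRI by walking labels until BoS is set."""
    remaining_bits = data[0]
    offset = 1
    labels = []
    while True:
        if offset + 3 > len(data):
            raise ValueError("truncated label stack")
        value = int.from_bytes(data[offset : offset + 3], "big")
        offset += 3
        remaining_bits -= 24
        labels.append(value >> 4)
        if value & 1:  # Bottom of Stack: this was the last label
            break
    if not 0 <= remaining_bits <= 32:
        raise ValueError("invalid IPv4 prefix length")
    nbytes = (remaining_bits + 7) // 8
    raw = data[offset : offset + nbytes].ljust(4, b"\x00")
    prefix = ipaddress.IPv4Network(f"{ipaddress.IPv4Address(raw)}/{remaining_bits}")
    return labels, prefix


wire = encode_nlri([100, 200], "10.1.2.0/24")
print(wire.hex())                # 48000640000c810a0102
print(parse_nlri_rfc3107(wire))  # ([100, 200], IPv4Network('10.1.2.0/24'))
```

Note that the Length octet covers the labels as well as the prefix, so the parser has to subtract 24 bits per label before it knows how long the prefix is.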
What is the problem with RFC3107
Generally speaking, there is no problem, as it works fine in many deployments. However, [RFC8277#section-1] mentions multiple issues, mostly ambiguity about how many labels can be in the stack and about how prefixes should be withdrawn.
Behold the new RFC8277
So the new RFC clarifies some ambiguities and introduces:
- New BGP capability so that speakers can negotiate how many labels they can receive
- Clarified rules on withdrawing BGP-LU routes
- Clarified rules on comparing BGP-LU routes with different label stacks
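To give an idea of what the new capability looks like on the wire, here is a small Python sketch. It reflects my reading of [RFC8277#section-2.1]: capability code 8, with a value made of one or more <AFI, SAFI, Count> entries of 2, 1 and 1 octets. The helper name encode_multiple_labels_capability is hypothetical, and the result is only the capability TLV itself, not a complete OPEN message.

```python
import struct

# Capability code for the Multiple Labels Capability (my reading of RFC8277).
MULTIPLE_LABELS_CAPABILITY = 8


def encode_multiple_labels_capability(entries) -> bytes:
    """Encode (afi, safi, count) entries as one capability TLV."""
    value = b"".join(
        struct.pack("!HBB", afi, safi, count) for afi, safi, count in entries
    )
    # Capability TLV: code (1 octet), length (1 octet), value.
    return struct.pack("!BB", MULTIPLE_LABELS_CAPABILITY, len(value)) + value


# AFI 1 (IPv4), SAFI 4 (labeled unicast), willing to receive up to 2 labels.
cap = encode_multiple_labels_capability([(1, 4, 2)])
print(cap.hex())  # 080400010402
```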
This would be a great informational RFC to help developers write more compatible BGP implementations, except it’s actually a standard which obsoletes RFC3107 and gives no transition mechanism for existing deployments and upgrades.
Backwards compatibility is important
Like, how else are network operators supposed to upgrade networks or deploy a new protocol?
[RFC8277#section-2.2] instructs BGP speakers to ignore the bottom of stack bit when the multiple labels capability is not negotiated. The justification is:
[RFC3107] specifies that additional labels will appear in the NLRI. However, some implementations assume that the NLRI will contain only a single label and thus do not check the setting of the S bit. The procedures specified in the current document will interwork with such implementations.
So in other words, even though most BGP implementations correctly support RFC3107 and have no problems with multiple labels, just because some implementations don’t, the new RFC suggests breaking it for everyone.
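To make the breakage concrete, here is a small Python sketch that splits the same NLRI in two ways: the RFC3107 way (walk the stack until BoS) and the way I understand [RFC8277#section-2.2] (exactly one label, BoS ignored). The function names are made up for this illustration, and what a real implementation does with the resulting garbage (session reset, treat-as-withdraw, something else) is not modelled here.

```python
def split_rfc3107(nlri: bytes):
    """Walk the label stack until the Bottom of Stack bit is set."""
    offset = 1
    labels = []
    while True:
        if offset + 3 > len(nlri):
            raise ValueError("truncated label stack")
        value = int.from_bytes(nlri[offset : offset + 3], "big")
        offset += 3
        labels.append(value >> 4)
        if value & 1:
            break
    prefix_bits = nlri[0] - 24 * len(labels)
    return labels, prefix_bits, nlri[offset:]


def split_rfc8277_single_label(nlri: bytes):
    """Assume exactly one label and ignore the Bottom of Stack bit."""
    value = int.from_bytes(nlri[1:4], "big")
    return [value >> 4], nlri[0] - 24, nlri[4:]


# 10.1.2.0/24 advertised with the label stack [100, 200] by an RFC3107 sender.
nlri = bytes.fromhex("48" "000640" "000c81" "0a0102")

print(split_rfc3107(nlri))
# ([100, 200], 24, b'\n\x01\x02')  -> 10.1.2.0/24 with two labels

print(split_rfc8277_single_label(nlri))
# ([100], 48, b'\x00\x0c\x81\n\x01\x02')  -> a 48-bit "IPv4 prefix"
```

The second receiver treats the second label as the start of the prefix and ends up with a 48-bit IPv4 prefix, which cannot be valid.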
The good news is that most vendors are smarter than that and even when implementing the multiple labels capability per RFC8277, they don’t break the old RFC behaviour.
Conclusion
I think it’s a good reminder that RFCs are more like guidelines than strict rules, because if everyone implemented them exactly as specified, many things wouldn’t work. On the other hand, this is why we keep having all sorts of compatibility issues in multi-vendor deployments.