Network Working Group                                     Gargi Nalawade
Internet Draft                                         Pradosh Mohapatra
June 2006                                           Francois Le Faucheur
                                                            Ruchi Kapoor
                                                            Pranav Mehta
                                                              David Ward
                                                            Simon Barber
                                                           Cisco Systems
                                                                   J. Wu
                                                                  Y. Cui
                                                                   X. Li
                                                     Tsinghua University


                     BGP Softwire Nexthop Attribute

                  draft-nalawade-softwire-nhop-00.txt


1. Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


2. Copyright Notice

   Copyright (C) The Internet Society (2006).  All Rights Reserved.


3. Abstract

   The current [MP-BGP] extensions carry routing information by
   associating the same network layer protocol with both the NLRI and
   the next hop.  However, in certain scenarios, it is desirable or
   required to advertise next hop information associated with a
   different network layer protocol to the one associated with the
   NLRI.  Similarly, when data traffic for a network layer protocol's
   NLRIs needs to be forwarded over an underlying tunnel, it is
   required to indicate the tunnel endpoint to be used for forwarding
   with the BGP update for a given NLRI.

   This document specifies a new BGP attribute, called the SW_NEXT_HOP
   attribute, which can optionally be used in a BGP Update message to
   advertise next hop information associated with a different network
   layer protocol than that of the NLRIs, or to convey information
   about the tunnel endpoint.


4. Introduction

   [MP-BGP] defines extensions to BGP-4 to enable it to carry routing
   information for multiple network layer protocols (e.g. IPv4-VPN,
   IPv6, IPv6-VPN).  This is achieved by encoding the next hop and the
   NLRI as defined by the network layer protocol in an MP_REACH_NLRI
   attribute and including the network layer protocol identifiers.
   Since the same network layer protocol is associated with both the
   next hop information and the NLRI, [MP-BGP] extensions do not allow
   advertisement of next hop information from a different network
   layer protocol to the one of the NLRI.

   However, there are situations where the next hop information to be
   advertised is indeed from a different network layer protocol to the
   one of the NLRI.  In a number of such situations, the [MP-BGP]
   limitation has been circumvented by mapping the actual nexthop to an
   encoded value so as to match the network layer protocol format of
   the NLRIs.

   [MPLS-VPN] is an example of this since it calls for advertisement of
   IPv4 next hop information with IPv4-VPN NLRI.  This is achieved by
   prepending a Null Route Distinguisher to the IPv4 Next Hop address.

   [BGP-V6-TUNN] is another example that requires advertisement of IPv4
   next hop information along with IPv6 NLRI.  The next hop is encoded
   as an IPv4-mapped IPv6 address.

   [IPv6-VPN] is yet another example that requires advertisement of
   IPv4 or IPv6 next hop information along with IPv6-VPN NLRI, which is
   achieved by prepending a Null Route Distinguisher to the next hop
   address and, when the meaningful next hop is IPv4, by encoding it as
   an IPv4-mapped IPv6 address.

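   The existing workarounds described above can be illustrated with a
   minimal sketch.  It assumes an 8-octet Route Distinguisher, as used
   by RFC 4364, and the standard ::ffff:a.b.c.d form for IPv4-mapped
   IPv6 addresses.  The function names are hypothetical and the
   address 192.0.2.1 is a documentation address chosen for
   illustration only.

```python
import ipaddress

# Assumption: a "Null" Route Distinguisher is 8 octets of zero.
NULL_RD = bytes(8)


def ipv4_mapped_ipv6(v4: str) -> bytes:
    """Return the 16-octet IPv4-mapped IPv6 form (::ffff:a.b.c.d)."""
    return bytes(10) + b"\xff\xff" + ipaddress.IPv4Address(v4).packed


def vpnv4_next_hop(v4: str) -> bytes:
    """IPv4 next hop for IPv4-VPN NLRI: Null RD + IPv4 address."""
    return NULL_RD + ipaddress.IPv4Address(v4).packed


def vpnv6_next_hop_from_v4(v4: str) -> bytes:
    """IPv4 next hop for IPv6-VPN NLRI: Null RD + IPv4-mapped IPv6."""
    return NULL_RD + ipv4_mapped_ipv6(v4)


if __name__ == "__main__":
    print(vpnv4_next_hop("192.0.2.1").hex())
    print(vpnv6_next_hop_from_v4("192.0.2.1").hex())
```

   Each workaround reshapes the next hop to fit the address format of
   the NLRI.  This works only when the real next hop is no larger than
   that format allows.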
   These workarounds do not suffice if the actual next hop address
   cannot be embedded in the next hop information field as defined by
   the network layer protocol of the NLRIs.  One such example is the
   need to advertise IPv6 next hop information with IPv4 NLRIs in order
   to carry routing information for IPv4 islands over a native IPv6
   core.

   Also, in cases where the network layer protocols of the next hop and
   NLRI are different, the transport protocol is different from the
   payload.  This calls for the payload to be tunneled through the ISP
   core.  The establishment of the tunnels, as well as the selection of
   the tunnel type(s) to be used from an ingress router to a given
   egress router, can be statically controlled by configuration.
   Alternatively, the tunneling capabilities and preferences as well as
   the individual tunnel attributes [BGP-TUN] can be dynamically
   established via various mechanisms such as the BGP IPv4/IPv6 Tunnel
   SAFI [BGP-TUN-SAFI] or IGP based discovery of TE tunnels [IGP-TE].

   In some cases, the same tunnel can be used for all NLRIs advertised
   by the egress router.  The tunnel can then be selected by the
   ingress router based on its local configuration as well as the
   information that may have been advertised by the egress router about
   tunneling capabilities and preferences, for example via
   [BGP-TUN-SAFI].

   In other cases, different NLRIs may need to be carried over
   different tunnels.  For example, some NLRIs may require transport
   over IPsec tunnels while other NLRIs may be more efficiently
   transported without IPsec protection over MPLS LSPs.  In these cases
   there is a requirement for the egress router to advertise which
   tunnel ought to be used for a particular set of NLRIs.  The ingress
   router needs an indication, in the BGP update for these NLRIs, of
   which tunnel to use to reach the egress router.

   This document describes a new BGP attribute, called the SW_NEXT_HOP
   attribute, that can be optionally carried in BGP Update messages to
   signal the actual next hop independently of the network layer
   protocol of the NLRIs, and also to signal which tunnel to use for a
   given set of NLRIs.


5. Softwire Nexthop Attribute

   This document defines an optional transitive attribute.  The
   attribute is meant to carry the nexthop address of the remote peer
   and the Tunnel information needed to reach this nexthop address.

   The attribute is encoded as shown below:

   +---------------------------------------------------------+
   | Address Family Identifier (2 octets)                    |
   +---------------------------------------------------------+
   | Subsequent Address Family Identifier (1 octet)          |
   +---------------------------------------------------------+
   | Length of Next Hop Network Address (1 octet)            |
   +---------------------------------------------------------+
   | Network Address of Next Hop (variable)                  |
   +---------------------------------------------------------+
   | TLVs (variable length)                                  |
   +---------------------------------------------------------+

   The use and meaning of these fields are as follows:

   Address Family Identifier (AFI):

      This field in combination with the Subsequent Address Family
      Identifier field identifies the Network Layer protocol associated
      with the Next Hop address.  Presently defined values for the
      Address Family Identifier field are specified in RFC1700 (see the
      Address Family Numbers section).

   Subsequent Address Family Identifier (SAFI):

      This field in combination with the Address Family Identifier
      field identifies the Network Layer protocol associated with the
      Next Hop address.

   Length of Next Hop Network Address:

      A 1 octet field whose value expresses the length of the "Network
      Address of Next Hop" field in octets.

   Network Address of Next Hop:

      A variable length field that contains the Network Address of the
      Next Hop.

   TLV:

      The variable length TLV field of the Softwire Nexthop attribute
      contains one or more tuples of the form:

      +----------------+------------------+
      | Type (1 octet) | Length (1 octet) |
      +----------------+------------------+
      | Value (as specified by Type)      |
      +-----------------------------------+

      where,

      Type: This field specifies the 'Type' of the data contained in
      the Value field.

      Length: This field specifies the length of the 'Value' field.

      Value: The contents and format of the Value field are defined by
      the Type field.

   The following 'Types' are defined:

      Type 1: indicates that the Value field in the TLV contains a
      2-octet Tunnel Identifier which uniquely identifies a Tunnel on
      the egress BGP router.

      Type 2: indicates that the Value field in the TLV contains a
      2-octet Multicast Tree Identifier which uniquely identifies a
      Multicast Tree on the advertising BGP router.


6. Operation

   A BGP Speaker may want to advertise itself as the router that should
   be used as the next hop to the destinations advertised in the NLRI
   field, or in the MP_NLRI field of the MP_REACH_NLRI attribute, and
   may want to advertise one of its Network Layer addresses for a
   Network Layer protocol which is different to the Network Layer
   protocol of the NLRI destinations.

   Alternatively, a BGP Speaker may also want to explicitly advertise
   which tunnel to itself ought to be used for particular NLRI
   destinations.

   In both of the above cases, a BGP Speaker supporting the SW_NEXT_HOP
   attribute SHOULD include the SW_NEXT_HOP attribute to convey this
   Network Layer address.

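   As an illustration of the encoding defined in Section 5, the
   following minimal sketch packs and parses the value of an
   SW_NEXT_HOP attribute.  It assumes network byte order, as is usual
   for BGP, and covers only the attribute value, not the path
   attribute header, whose type code is still to be assigned.  The
   function names and the Tunnel Identifier value 7 are hypothetical.

```python
import ipaddress
import struct

TLV_TUNNEL_ID = 1      # Type 1: 2-octet Tunnel Identifier
TLV_MCAST_TREE_ID = 2  # Type 2: 2-octet Multicast Tree Identifier


def encode_sw_next_hop(afi, safi, next_hop, tlvs):
    """Pack AFI, SAFI, next hop length, next hop and TLVs."""
    if len(next_hop) > 255:
        raise ValueError("next hop does not fit a 1-octet length")
    out = struct.pack("!HBB", afi, safi, len(next_hop)) + next_hop
    for tlv_type, value in tlvs:
        if len(value) > 255:
            raise ValueError("TLV value does not fit a 1-octet length")
        out += struct.pack("!BB", tlv_type, len(value)) + value
    return out


def decode_sw_next_hop(data):
    """Return (afi, safi, next_hop, tlvs); reject truncated input."""
    if len(data) < 4:
        raise ValueError("truncated fixed header")
    afi, safi, nh_len = struct.unpack_from("!HBB", data, 0)
    pos = 4
    if pos + nh_len > len(data):
        raise ValueError("truncated next hop")
    next_hop = data[pos:pos + nh_len]
    pos += nh_len
    tlvs = []
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BB", data, pos)
        pos += 2
        if pos + length > len(data):
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, data[pos:pos + length]))
        pos += length
    return afi, safi, next_hop, tlvs


# IPv4 next hop (AFI 1, SAFI 1) with a Type 1 TLV carrying Tunnel ID 7.
example = encode_sw_next_hop(
    1, 1,
    ipaddress.IPv4Address("192.0.2.1").packed,
    [(TLV_TUNNEL_ID, struct.pack("!H", 7))],
)
```

   The decoder checks every length field against the remaining input
   before slicing, so that a malformed attribute is rejected rather
   than silently misparsed.  How a real implementation reports such
   errors is outside the scope of this sketch.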
   A BGP Speaker supporting the SW_NEXT_HOP attribute which receives a
   BGP advertisement containing a SW_NEXT_HOP attribute and which does
   not modify the next hop information SHOULD propagate the received
   SW_NEXT_HOP attribute unchanged.

   A BGP Speaker supporting the SW_NEXT_HOP attribute which receives a
   BGP advertisement containing a SW_NEXT_HOP attribute and which
   modifies next hop information MAY include an SW_NEXT_HOP attribute
   in the generated advertisement.  When it does so, the Network Layer
   address contained inside the SW_NEXT_HOP attribute MUST be one of
   its own addresses.  In other words, in the case where the BGP
   speaker modifies next hop information, it MUST NOT simply propagate
   the received SW_NEXT_HOP unchanged.

   When a BGP speaker supporting the SW_NEXT_HOP attribute receives a
   BGP advertisement with next hop information encoded both in the
   MP_REACH_NLRI and in the SW_NEXT_HOP, the BGP speaker SHOULD use the
   next hop information encoded in the SW_NEXT_HOP, unless configured
   to do otherwise.

   When a BGP Speaker sets itself as the nexthop and advertises
   optional Tunnel TLVs using the SW_NEXT_HOP attribute, it means that
   the BGP Speaker is terminating the Tunnels and is advertising itself
   as a Tunnel endpoint.

   Consider the case where an ingress router receives a BGP update for
   NLRIs which will receive data traffic (e.g. IPv4/6 unicast/
   multicast, VPNv4/6, etc.).  If this update contains an SW_NEXT_HOP
   attribute carrying a Type, Tunnel-ID and Tunnel endpoint address,
   the ingress router will use this information in the following manner
   (the Tunnel endpoint address is the address contained in the
   'Network Address of Next Hop' field in the SW_NEXT_HOP attribute):

      The Tunnel/Tree-ID and the Tunnel endpoint address will be used
      to look up the appropriate tunnel in the Tunnel database to
      establish data forwarding through this Tunnel.

      Data traffic for the NLRIs carried in this BGP update will now be
      forwarded through this Tunnel.

   Note that the Tunnels themselves are established by the respective
   Tunnel protocols (e.g. mGRE, IPsec, L2TP, etc.).

   As an example, if the BGP Tunnel SAFI is the mechanism used to
   discover the Tunnels, then the Tunnel-ID:Tunnel endpoint address
   will be the NLRI carried by the BGP Tunnel SAFI [BGP-TUN-SAFI]
   updates.  The Tunnel encapsulations will be carried in the BGP
   Tunnel attribute [BGP-TUN] accompanying the BGP Tunnel SAFI update.

   On the other hand, if IGP-based discovery of TE tunnels [IGP-TE] is
   the mechanism used to discover the tunnels, then the Tunnel-ID and
   Tunnel endpoint address will identify the TE tunnel discovered
   through this mechanism.

   The same applies to other out-of-band Tunnel discovery mechanisms,
   including static configuration.


7. Capability advertisement

   A new capability [BGP-CAP] code (TBD) is defined for the BGP
   SW_NEXT_HOP attribute.  The Capability Length is set to zero.

   The SW_NEXT_HOP attribute can only be sent to peers that have
   advertised this capability.


8. Applicability Statement

8.1. VPNv4 unicast traffic over a Tunnel

   If VPNv4 unicast traffic has to be tunneled through an ISP core
   instead of being MPLS switched as per RFC 4364, then the ingress PE
   needs to know which Tunnel to connect to.  The egress PE may use the
   SW_NEXT_HOP attribute to signal this information.  The Tunnel
   encapsulation itself could be statically configured or discovered
   through various mechanisms such as IGP based discovery of TE tunnels
   [IGP-TE] or a BGP Tunnel SAFI [BGP-TUN-SAFI].

   If an ingress PE receives a BGP update for the VPNv4 prefix with a
   SW_NEXT_HOP attribute, it would be able to connect to the
   appropriate Tunnel.
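
   The lookup that an ingress router performs on the Tunnel-ID and
   Tunnel endpoint address (Section 6) can be sketched as follows.
   This is an illustration under stated assumptions, not a prescribed
   implementation.  The names Tunnel, TunnelDatabase and bind_nlris
   are hypothetical, the forwarding table is modeled as a plain
   dictionary, and leaving the prefixes unbound when no tunnel matches
   is an assumption, since this document does not specify that case.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Tunnel:
    tunnel_id: int
    endpoint: str        # Tunnel endpoint address (IPv4 or IPv6)
    encapsulation: str   # e.g. "mGRE", "IPsec", "L2TP"


class TunnelDatabase:
    """Tunnels learned out of band (static config, Tunnel SAFI, IGP)."""

    def __init__(self) -> None:
        self._tunnels: Dict[Tuple[int, str], Tunnel] = {}

    @staticmethod
    def _key(tunnel_id: int, endpoint: str) -> Tuple[int, str]:
        # Normalize the address so equivalent spellings compare equal.
        return tunnel_id, str(ipaddress.ip_address(endpoint))

    def learn(self, tunnel: Tunnel) -> None:
        self._tunnels[self._key(tunnel.tunnel_id, tunnel.endpoint)] = tunnel

    def lookup(self, tunnel_id: int, endpoint: str) -> Optional[Tunnel]:
        return self._tunnels.get(self._key(tunnel_id, endpoint))


def bind_nlris(db: TunnelDatabase, fib: Dict[str, Tunnel],
               nlris: Iterable[str], tunnel_id: int, endpoint: str) -> bool:
    """Point each NLRI at the tunnel named by (Tunnel-ID, endpoint).

    Returns False and leaves the table untouched when no tunnel
    matches (an assumption of this sketch).
    """
    tunnel = db.lookup(tunnel_id, endpoint)
    if tunnel is None:
        return False
    for prefix in nlris:
        fib[prefix] = tunnel
    return True
```

   Keying the database on the pair rather than on the Tunnel-ID alone
   reflects that a Tunnel Identifier is unique only on the egress
   router that created it.
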
   Using the Tunnel-ID and Tunnel endpoint address, the SW_NEXT_HOP
   attribute will indicate which Tunnel is to be used to reach the
   VPNv4 destination.

   For an IPv4 core, the contents of the SW_NEXT_HOP attribute can be
   expressed as follows:

      Address Family Identifier: 1 (IP version 4)
      Subsequent Address Family Identifier: 1 (Unicast)
      Length of Next Hop Network Address: 4
      Network Address of Next Hop: IPv4 address of the egress PE
      TLVs:
         Type: 1
         Length: 2
         Value: Tunnel Identifier value as created by the egress PE

8.2. MVPN traffic over a default MDT Tunnel

   A Multicast tunnel is set up between the PEs in one or more VPN
   provider networks.  Over the Multicast tunnel we create PIM
   neighbors.  The IP address of the PIM neighbor that is seen over the
   Multicast tunnel depends on the configured address of the Tunnel
   endpoint.  This can either be an unnumbered address from a different
   interface or a configured address on the Tunnel itself.  The PE
   router that does an RPF check on a VPN source can find which Tunnel
   the source is on, but may not know which PIM neighbor to target on
   that tunnel.  Therefore we need a way to connect the BGP VPNv4
   prefix to the PIM neighbor on the tunnel to allow the RPF check to
   succeed.

   Suppose PIM wants to join to a source that is behind another VPN
   site.  We do an RPF lookup on the source address in the VPNv4
   unicast table on this PE.  The RPF lookup will return a connected
   next-hop and interface to use to reach the source.  The returned
   next-hop may not be the neighbor on the Multicast tunnel.  This can
   be due to the next-hop being rewritten by BGP Route Reflectors (RR)
   or by crossing AS boundaries.  Therefore we do not know which PIM
   neighbor to target as an upstream neighbor in the PIM join.  This
   problem can be solved by using the SW_NEXT_HOP attribute to carry
   that information.
   The SW_NEXT_HOP attribute, when carried with Type 2, indicates the
   IP address of the default MDT tunnel endpoint.

8.3. Multicast VPN traffic over Label-switched or other Multicast
     Tunnels

   If a BGP Multicast Overlay SAFI [BGP-MOS] is used for signalling
   Multicast Join/Prune Binding information, the downstream PE needs to
   know which Multicast tree built by MLDP, or which Tunnel, to bind
   to.  The Tunnel encapsulation information itself could be provided
   by MLDP when Multipoint LSPs are used in the core.  Alternatively,
   the Tunnel encapsulation could be provided by TE, or through the BGP
   Tunnel SAFI [BGP-TUN-SAFI].  Either way, the downstream PE needs to
   know which Tunnel to connect to in order to receive a Multicast
   stream corresponding to a given PIM Join.  This can be achieved by
   the Upstream PE sending the Tunnel/P-MP LSP binding information
   through the SW_NEXT_HOP attribute.

8.4. IPv4 Forwarding over IPv6 Networks

   With the rapid deployment of IPv6 networks, there is a requirement
   for IPv6 backbones to provide packet transport service to existing
   IPv4 access networks.  One part of the control plane mechanism
   involves carrying IPv4 NLRIs with the IPv6 network layer address as
   the next hop.  This can be achieved by the egress PE sending a BGP
   Update message, with the IPv4 NLRIs carried either in an
   MP_REACH_NLRI attribute or in the BGP-4 NLRI field, that also
   carries an SW_NEXT_HOP attribute containing the IPv6 next hop.  The
   contents of the SW_NEXT_HOP attribute can be expressed as follows:

      Address Family Identifier: 2 (IP version 6)
      Subsequent Address Family Identifier: 1 (Unicast)
      Length of Next Hop Network Address: 16
      Network Address of Next Hop: IPv6 address of the egress PE
      TLVs:
         Type: 1
         Length: 2
         Value: Tunnel Identifier value as created by the egress PE


9. Route Reflector Considerations

   If Route Reflectors (RR) reflect routes from BGP speakers supporting
   the SW_NEXT_HOP attribute, they MUST support this new capability to
   be able to validate the nexthop.
   If the Route Reflectors are not in the forwarding path, they do not
   need to perform a nexthop resolution, and so validating just the
   network address portion of the SW_NEXT_HOP attribute is sufficient.
   Route Reflectors not in the forwarding path may therefore choose not
   to validate the TLV fields carried inside the SW_NEXT_HOP attribute,
   which provide additional information to resolve the nexthop.

   When data traffic for a network layer protocol's NLRIs needs to be
   forwarded over an underlying tunnel, there are two possible ways to
   carry nexthop information inside the SW_NEXT_HOP attribute.  The
   nexthop can be carried directly in the form of [BGP-TUN-SAFI], where
   the tunnel end-point information and the nexthop address are
   embedded together in the network address portion of the nexthop.
   Alternatively, the nexthop can be carried as an IPv4/IPv6 network
   address with additional tunnel end-point information carried inside
   a TLV field.

   In the case of Route Reflector partitioning, it is possible that
   tunnel end-point information [BGP-TUN-SAFI] is exchanged via a
   different Route Reflector from the one carrying the NLRIs forwarded
   over those tunnels.  To facilitate nexthop validation on such Route
   Reflectors, it is recommended to carry nexthop information in the
   alternative TLV format.


10. IANA Considerations

   A BGP attribute code [BGP-4] and a Capability code [BGP-CAP] need to
   be obtained from IANA.


11. Security Considerations

   This extension to BGP does not change the underlying security
   issues.


12. Acknowledgements

   This specification combines and extends prior work on "BGP-4
   NEXTHOP-v2 Attribute" by Francois Le Faucheur, Dan Tappan, and Gargi
   Nalawade, with prior work on "BGP Connector Attribute" by Gargi
   Nalawade, Ruchi Kapoor, and David Ward.  The current authors wish to
   thank all these authors for their contribution.

   The authors would like to thank Dan Tappan, Chris Metz, Eric Rosen,
   Christian Cassar and Scott Wainner for their feedback, review and
   comments.


13. Normative References

   [BGP-4]        Rekhter, Y. and T. Li (editors), "A Border Gateway
                  Protocol 4 (BGP-4)", Internet Draft
                  draft-ietf-idr-bgp4-26.txt, April 2005.

   [BGP-CAP]      Chandra, R., Scudder, J., "Capabilities Advertisement
                  with BGP-4", draft-ietf-idr-rfc2842bis-02.txt, April
                  2002.

   [BGP-V6-TUNN]  Ooms et al., "Connecting IPv6 Islands across IPv4
                  Clouds with BGP", draft-ooms-v6ops-bgp-tunnel-00.txt,
                  Work in Progress.

   [BGP-TUN]      Kapoor R., Nalawade G., "BGPv4 Tunnel Encapsulation
                  Attribute", June 2008, Work in Progress.

   [BGP-TUN-SAFI] Nalawade G., Kapoor R., Tappan T., Wainner S., "BGPv4
                  Tunnel SAFI", June 2006, Work in Progress.

   [SW-MESH-FMWK] Metz, C. et al., "A Framework for Softwire Mesh
                  Signaling, Routing and Encapsulation across IPv4 and
                  IPv6 Backbone Networks",
                  draft-wu-softwire-mesh-framework-00, June 2006.

   [BGP-MOS]      Nalawade G., Bhaskar N., Mehta P., "Multicast PE-PE
                  Signaling using BGP", June 2006,
                  draft-nalawade-bgp-mcast-signaling-001.txt, Work in
                  Progress.

   [MP-BGP]       Bates et al., "Multiprotocol Extensions for BGP-4",
                  draft-ietf-idr-rfc2858bis-02.txt, Work in Progress.

   [IGP-TE]       Vasseur J., Psenak P., Yasukawa S., "OSPF MPLS Traffic
                  Engineering Capabilities", Feb 2004, Work in Progress.

   [MPLS-VPN]     Rosen E., Rekhter Y., "BGP/MPLS IP Virtual Private
                  Networks (VPNs)", RFC 4364.


14. Author's Addresses

   Gargi Nalawade
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: gargi@cisco.com

   Pradosh Mohapatra
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: pmohapat@cisco.com

   Francois Le Faucheur
   Cisco Systems, Inc.
   Village d'Entreprise Green Side - Batiment T3
   400, Avenue de Roumanille
   06410 Biot-Sophia Antipolis
   France
   E-mail: flefauch@cisco.com

   Ruchi Kapoor
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: ruchi@cisco.com

   Pranav Mehta
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: ruchi@cisco.com

   David Ward
   408 St Peter Street, Hamm Bldg
   St Paul, MN, 55102
   E-mail: wardd@cisco.com

   Simon Barber
   Cisco Systems, Inc
   Email: sbarber@cisco.com

   Jianping Wu
   Tsinghua University
   Department of Computer Science, Tsinghua University
   Beijing 100084
   P.R.China
   Phone: +86-10-6278-5983
   Email: jianping@cernet.edu.cn

   Yong Cui
   Tsinghua University
   Department of Computer Science, Tsinghua University
   Beijing 100084
   P.R.China
   Phone: +86-10-6278-5822
   Email: cuiyong@tsinghua.edu.cn

   Xing Li
   Tsinghua University
   Department of Electronic Engineering, Tsinghua University
   Beijing 100084
   P.R.China
   Phone: +86-10-6278-5983
   Email: xing@cernet.edu.cn


15. Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


16. Full Copyright Statement

   "Copyright (C) The Internet Society (2006).  This document is
   subject to the rights, licenses and restrictions contained in BCP
   78, and except as set forth therein, the authors retain all their
   rights."

   Additional copyright notices are not permitted in IETF Documents
   except in the case where such document is the product of a joint
   development effort between the IETF and another standards
   development organization or the document is a republication of the
   work of another standards organization.  Such exceptions must be
   approved on an individual basis by the IAB.

   "This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."


17. Expiration Date

   This memo is filed as , and expires December, 2006.