Network Working Group                                  Clarence Filsfils
Internet Draft                                       Cisco Systems, Inc.
Category: Standards Track
Expiration Date: August 2008                             Stefano Previdi
                                                     Cisco Systems, Inc.

                                                          George Swallow
                                                     Cisco Systems, Inc.

                                                           February 2008


               IS-IS Detailed IP Reachability Extension

               draft-swallow-isis-detailed-reach-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This document defines a means for IS-IS to carry detailed host
   reachability information along with summarized IP reachability.  In
   particular it defines a new sub-TLV of the extended IP reachability
   TLV.

Swallow, et al.             Standards Track                     [Page 1]

Internet Draft  draft-swallow-isis-detailed-reach-01.txt   February 2008

Contents

    1      Introduction
    1.1    Conventions
    1.2    Terminology
    2      Background
    3      Overview
    4      Detailed Reachability Sub-TLV
    4.1    Backward Compatibility
    5      Domain Partitioning
    6      Semantics of detailed reachability
    7      Applicability
    8      Security Considerations
    9      IANA Considerations
   10      References
   10.1    Normative References
   10.2    Informative References
   11      Authors' Addresses

1. Introduction

   The IS-IS protocol is specified in ISO-10589 [1], with extensions
   for supporting IPv4 specified in RFC1195 [2].  The extended IP
   reachability TLV is specified in RFC3784 [3].  This document defines
   a sub-TLV of that TLV to allow detailed host reachability
   information to be carried along with summarized IP reachability.

1.1. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [4].

1.2. Terminology

   ASBR   Autonomous system border router
   BGP    Border Gateway Protocol
   FIB    Forwarding information base
   IGP    Interior gateway protocol
   L1L2   Level 1 and level 2
   LSDB   Link-state data base
   PE     Provider edge
   PIM    Protocol Independent Multicast
   RIB    Routing information base
   RPF    Reverse path forwarding

2. Background

   IS-IS advertises routing/reachability information in link-state
   packets within a domain.  Currently no distinction is made between
   routing and reachability.  In the case of a host-route (/32
   addresses in the case of IPv4) this is not a problem as there can be
   no ambiguity between routing and reachability.
   If a host is advertised as reachable, then there is (except during a
   convergence period or in very unusual circumstances) a routed path
   to that address.  However, when shorter prefixes are advertised as
   reachable, reachability to a specific host address is hidden.

   When reachability is summarized, as it often is between levels,
   detailed reachability information is lost.  Such summarization is
   critical to the scaling and convergence of the forwarding plane.
   However, various control plane elements require host reachability
   information (usually to PE or ASBR loopback addresses) either for
   correct action or to speed convergence.  This level of detail very
   often is not needed in the forwarding plane.  But the current
   all-or-nothing behavior of IS-IS leaves a network operator with a
   choice of missing the benefits of summarization for scalability or
   losing the benefits of detailed reachability information.

   Among the control plane elements that could benefit from detailed
   host-reachability information are BGP next-hop tracking and PIM.

   The Border Gateway Protocol (BGP) advertises routes that are
   external to the domain by associating them with a BGP next-hop
   address that is known within the domain.  Often multiple next-hops
   are available to reach a particular prefix.  If a prefix becomes
   unreachable, then BGP will withdraw the route.  Such withdrawals
   take time.  In particular, if the advertising router goes down, the
   withdrawal may be delayed until the BGP TCP session times out.

   In order to speed convergence, routers employ a technique called
   next-hop tracking.  In next-hop tracking the reachability of the BGP
   next-hop is tracked.  If a next-hop becomes unreachable, BGP route
   selection is run.  External routes that are reachable through a
   known alternative next-hop are then installed.
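
   As an illustration only, next-hop tracking can be sketched in a few
   lines of Python.  The table layout (a prefix mapped to an ordered
   list of candidate next-hops, most preferred first) and the function
   name select_paths are assumptions made for this sketch; they are not
   taken from BGP or from any implementation.

```python
from ipaddress import IPv4Address


def select_paths(bgp_paths, reachable_next_hops):
    """Pick, per prefix, the most preferred path whose next-hop is reachable.

    bgp_paths is a hypothetical, simplified BGP table: prefix string ->
    list of candidate next-hops in order of preference.
    """
    selected = {}
    for prefix, next_hops in bgp_paths.items():
        for next_hop in next_hops:
            if next_hop in reachable_next_hops:
                selected[prefix] = next_hop
                break  # keep the first (most preferred) reachable next-hop
    return selected


pe1, pe2 = IPv4Address("10.0.1.1"), IPv4Address("10.0.1.46")
paths = {"192.0.2.0/24": [pe1, pe2]}

before = select_paths(paths, {pe1, pe2})  # both next-hops reachable
after = select_paths(paths, {pe2})        # pe1 reported unreachable by the IGP
print(before, after)
```

   When the IGP reports that the first next-hop is gone, rerunning the
   selection moves the external route to the alternative next-hop
   without waiting for a BGP withdrawal.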
   Currently, if next-hop tracking is to be performed, the
   above-mentioned host-routes cannot be summarized.  The proposed
   extension allows the IGP routes to be summarized while distributing
   the detailed reachability information needed for next-hop tracking.

   PIM depends on the IGP reachability to the source of an (S, G) state
   to determine its RPF interface.  When PIM installs an (S, G) state
   for the first time, it registers with the RIB to be notified of any
   route change to S.  Later on, if the route to S changes, the RIB
   immediately sends a notification to PIM.

3. Overview

   In IS-IS, IP reachability information may be carried in the extended
   IP reachability TLV.  The TLV carries an IP prefix and a prefix
   length.  This enables routes to be summarized to cover 2^n routes
   where n is the difference between 32 and the prefix length.  A
   consequence of this summarization is that detailed reachability is
   hidden.

   This document defines a means to carry detailed reachability
   information along with a summarized IP prefix.  Host reachability
   information is carried via a bit vector of 2^n bits.  For example,
   if an area had 10.0.1.0/25 assigned as its address range and had
   routers with loopbacks as follows

      10.0.1.1 - 10.0.1.27
      10.0.1.46
      10.0.1.74 - 10.0.1.87

   then the bit mask encoding would advertise a summary route to
   10.0.1.0/25 with an associated 128-bit vector (shown in network
   order) like this:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4. Detailed Reachability Sub-TLV

   The detailed reachability sub-TLV is defined as a sub-TLV of the
   extended IP reachability TLV.  Its type is sub-TLV type [to be
   assigned].  Each bit represents the reachability to one host address
   of the host addresses covered by the prefix.

   The sub-TLV length is the minimum number of octets required to
   contain a bit vector with a length equal to the number of IP
   addresses covered by the prefix contained in the parent extended IP
   reachability TLV.  If L stands for the sub-TLV length and P stands
   for the prefix length then L = ceiling(2^(32-P)/8).

   The maximum length of the value field of any sub-TLV is 247 octets.
   Since the bit-vectors are always powers of 2 in length, the maximum
   bit-vector that will fit is 1024 bits in 128 octets.  This is
   sufficient to handle a prefix of 22 bits.  Shorter prefixes cannot
   be expressed directly.
   Instead they may be expressed by advertising as many 22-bit prefixes
   as are contained within the shorter prefix.

   The value field encodes the bit vector.  The bits are numbered as
   follows: the high-order bit of the first octet corresponds to zero,
   the low-order bit to seven, the high-order bit of the second octet
   to eight and so forth.

   Each bit represents reachability to one host address, that address
   being equal to the value of the position as numbered above taken as
   a binary number and used as the low-order bits of the IPv4 address
   formed with the prefix as the high-order bits.  A bit value of one
   indicates that the associated host is reachable.  A bit value of
   zero indicates that the associated host is not reachable.

4.1. Backward Compatibility

   As defined in RFC 3784 [3], a sub-TLV that is not understood is to
   be ignored.  Thus a router that does not understand the new sub-TLV
   will behave as if it had simply received the summary route.

5. Domain Partitioning

   An L1L2 router (e.g., A) summarizing a set of L1 routes as a single
   L2 route (e.g., 1.1.1/24) monitors whether any other L1L2 router
   (e.g., B) advertises the same summary route to the L2 domain.  When
   this occurs, it checks the consistency of the detailed reachability
   sub-TLV.  On top of the summary, the L1L2 router advertises a host
   route for any host to which it has reachability but to which the
   other L1L2 router advertises no reachability (if A advertises
   1.1.1/24 with the .1 bit set while B advertises 1.1.1/24 with the
   bit clear, then A advertises 1.1.1.1/32 on top of 1.1.1/24).  This
   handles any partitioning of an L1 domain.

   The same behavior is applied for summarization from the L2 to the L1
   domain.  In both cases, appropriate hold-down timers should be
   applied to cover timing differences in LSP generation between the
   routers.
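
   The bit numbering of Section 4 and the comparison rule above can be
   sketched together in Python.  This is a minimal illustration, not a
   reference implementation: the function names and the constant
   MAX_HOST_BITS are hypothetical, and only the value field is built
   (the sub-TLV type is still to be assigned).

```python
from ipaddress import IPv4Address, IPv4Network

MAX_HOST_BITS = 10  # a /22 gives 1024 bits (128 octets), the largest that fits


def encode_bit_vector(prefix, hosts):
    """Build the sub-TLV value field for `prefix` from reachable `hosts`."""
    net = IPv4Network(prefix)
    host_bits = 32 - net.prefixlen
    if host_bits > MAX_HOST_BITS:
        raise ValueError("prefixes shorter than /22 must be split into /22s")
    length = -(-(1 << host_bits) // 8)  # L = ceiling(2^(32-P) / 8)
    value = bytearray(length)
    for host in hosts:
        addr = IPv4Address(host)
        if addr not in net:
            raise ValueError(f"{addr} is not covered by {net}")
        index = int(addr) - int(net.network_address)
        # Bit 0 is the high-order bit of the first octet.
        value[index // 8] |= 0x80 >> (index % 8)
    return bytes(value)


def decode_bit_vector(prefix, value):
    """Return the host addresses whose bit is set in `value`."""
    net = IPv4Network(prefix)
    base = int(net.network_address)
    return [
        IPv4Address(base + i)
        for i in range(net.num_addresses)
        if value[i // 8] & (0x80 >> (i % 8))
    ]


def extra_host_routes(prefix, own_value, peer_value):
    """Host routes to add: bit set in our vector, clear in the peer's."""
    peer = set(decode_bit_vector(prefix, peer_value))
    own = decode_bit_vector(prefix, own_value)
    return [f"{host}/32" for host in own if host not in peer]


# The 10.0.1.0/25 example from Section 3.
last_octets = list(range(1, 28)) + [46] + list(range(74, 88))
hosts = [f"10.0.1.{i}" for i in last_octets]
vector = encode_bit_vector("10.0.1.0/25", hosts)
print(len(vector), vector.hex())

# The example above: A has the .1 bit set, B has it clear.
a = encode_bit_vector("1.1.1.0/24", ["1.1.1.1"])
b = encode_bit_vector("1.1.1.0/24", [])
print(extra_host_routes("1.1.1.0/24", a, b))
```

   For the /25 example the value field is 16 octets long, and the hex
   dump reproduces the four 32-bit rows of the Section 3 diagram.  The
   hold-down timers mentioned above are outside the scope of the
   sketch.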
   Partitioning of a domain is very unlikely in practice, as the
   following design rule generally prevails: any L1L2 router must have
   two disjoint paths to any node in any domain it connects to.  This
   design rule is common as it is inspired by high-availability and
   safety/containment objectives: any failure within a domain is
   entirely dealt with (all flows are rerouted on working alternate
   paths) as soon as IS-IS convergence is completed in the domain where
   the failure occurred.

6. Semantics of detailed reachability

   As stated above, detailed reachability is determined by the setting
   of the bit associated with a specific host.

   The information present in the detailed reachability sub-TLV should
   not be used to generate any dataplane forwarding entry.  It is only
   intended to be used by the control plane to validate/invalidate the
   reachability of, for example, BGP next-hops and PIM sources.

   The absence of the detailed reachability sub-TLV is equivalent to
   the presence of a detailed reachability sub-TLV with all bits set.
   This is backward compatible with the definition of a classical
   summary route.

   Provided the domain partitioning behavior described previously is
   applied, the inconsistency of the detailed reachability of two
   equivalent summary routes is resolved by the presence of
   more-specific routes.

7. Applicability

   The following case study is proposed as an example of application.

   A single AS needs to interconnect 30000 PE's.  Fast convergence upon
   any core link/node failure is required.  As IS-IS convergence is
   essentially dependent upon the dataplane FIB update rate [5], it is
   required to limit the number of IS-IS routes installed in the
   dataplane to a few hundred.  This would be easily achieved through
   classical summarization.  However, there is also a requirement to
   provide fast convergence upon any loss of a BGP nhop (PE node
   failure).
   BGP nhop reachability is commonly provided by the IGP as this is
   scalable (no n^2 mesh of liveness sessions) and it is known to
   converge fast (<200msec [5]).  Classically, this means that the PE
   loopbacks are not summarized.

   The method described in this document solves the dilemma:

   (1) it drastically reduces the number of IS-IS routes and hence the
       number of related dataplane entries, thereby meeting the scaling
       and fast convergence requirements

   (2) it maintains the scalable and fast reachability detection for
       PE's

   Let us further illustrate the case study to show the magnitude of
   the scaling benefit.  Assuming this AS is structured along 75
   regions, we assume that 75 L1 domains would be created, each with
   400 PE's.  Each L1 domain would be connected to L2 via two L1L2
   routers.  Within each L1 domain we would assume that 40 non-PE
   devices interconnect the PE's to the L1L2 routers.  We would assume
   that 200 P devices interconnect the L1L2 routers within the L2
   domain.  Assuming an average number of 5 neighbors per router, this
   leads to 1000 router-to-router subnets in each domain.

   We would for example allocate 10.0/13 for numbering router-to-router
   subnets and would divide this block into 128 /20 blocks.  We would
   allocate one /20 block to each L1 domain (52 blocks are spare, as
   one /20 is also given for the router-to-router subnets of the L2
   domain).  Using /31's, this allows for 2048 subnets (factor two for
   further future growth) per L1 domain.  Note that we have a factor 4
   of further growth possible with this illustrative numbering plan.

   We would allocate 10.8/15 for numbering router loopbacks and divide
   this block into 512 /24 blocks.  We would allocate 6 /24's per L1
   domain (62 are spare).  Using /32 and assuming 30% efficiency for
   administrative reasons, each L1 domain would consume 5 /24's (one is
   spare).  This allows for a factor 3 future growth.
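
   The block counts in this numbering plan can be checked with Python's
   standard ipaddress module.  The variable names below are
   illustrative only; the figures themselves are the ones quoted in the
   text.

```python
from ipaddress import IPv4Network

REGIONS = 75  # one L1 domain per region, as assumed in the case study

# Router-to-router subnets: 10.0/13 divided into /20 blocks.
link_block = IPv4Network("10.0.0.0/13")
link_blocks = list(link_block.subnets(new_prefix=20))
spare_link_blocks = len(link_blocks) - REGIONS - 1  # one /20 goes to L2
subnets_per_block = link_blocks[0].num_addresses // 2  # a /31 holds 2 addresses

# Router loopbacks: 10.8/15 divided into /24 blocks, 6 per L1 domain.
loopback_block = IPv4Network("10.8.0.0/15")
loopback_blocks = list(loopback_block.subnets(new_prefix=24))
spare_loopback_blocks = len(loopback_blocks) - 6 * REGIONS

print(len(link_blocks), spare_link_blocks, subnets_per_block)
print(len(loopback_blocks), spare_loopback_blocks)
```

   The two parent blocks do not overlap, so the link and loopback
   summaries can be advertised independently.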
   The following IGP summarization scheme would be adopted:

   Each L1L2 router only advertises the summary /20 for
   router-to-router subnets in its L1 domain (the detailed reachability
   sub-TLV is NOT needed for this block and hence classical
   summarization is used).

   Each L1L2 router only advertises the 5 summary /24's for router
   loopbacks in its L1 domain.  These 5 /24's are complemented with the
   detailed-reachability sub-TLV.

   In conclusion, each router in the L2 domain knows about 75 /20's,
   375 /24's, 1000 /31's and 350 /32's.  In total, 1800 routes among
   which 375 are important.  Each router in an L1 domain knows about
   1000 /31's, 400 /32's, 370 /24's and 74 /20's for a total of 1844
   subnets among which 770 are important.

   If a classical design had been used, then IS-IS would have had to
   support a total of 106000 routes among which 30000 were important.

   The method described in this document allows for fast IS-IS
   convergence upon any intra-AS failure by decreasing the number of
   dataplane entries by a factor 50.  It also allows for fast
   convergence upon inter-AS failure as the reachability to PE is
   preserved in IS-IS (with its scaling benefit) without any impact on
   the number of dataplane entries in the AS.

   Furthermore, the cost of this method is negligible as the detailed
   reachability sub-TLV is not used for the summary of router-to-router
   subnets.  It is only used for the summary of PE loopbacks.  Each /24
   summary would require a modest 32-byte detailed-reachability
   sub-TLV.  The IS-IS LSDB of a router in the L2 domain would thus
   only grow by 375 * 32 bytes, which is insignificant.
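
   The totals above can be reproduced with a short calculation.  The
   dictionaries simply restate the route counts quoted in the text,
   keyed by prefix length; the variable names are illustrative.

```python
# Route counts quoted in the case study, keyed by prefix length.
l2_view = {20: 75, 24: 375, 31: 1000, 32: 350}
l1_view = {31: 1000, 32: 400, 24: 370, 20: 74}
classical_total = 106000  # figure quoted in the text for the classical design

l2_total = sum(l2_view.values())
l1_total = sum(l1_view.values())
reduction = classical_total / l1_total  # the text rounds this to "a factor 50"

subtlv_octets = 2 ** (32 - 24) // 8  # bit vector for one /24 summary
lsdb_growth = l2_view[24] * subtlv_octets  # octets added to the L2 LSDB

print(l2_total, l1_total, subtlv_octets, lsdb_growth)
```

   The 375 sub-TLVs add 12000 octets of value field in total, which is
   the growth that the text calls insignificant.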
   Finally, we note that this case study could easily be applied to an
   IPv6 network assuming well-known numbering techniques are used:

   (1) each L1 domain would receive 6 /120 blocks (the equivalent of a
       /24) and PE's in the domain would be numbered as /128 from these
       blocks, allowing for efficient summarization

   (2) each L1 domain would receive 6 /56 blocks (the equivalent of a
       /24) and each PE in the domain would receive a dedicated /64.
       In this case, each bit in the detailed-reachability sub-TLV
       indicates the reachability of an entire /64 block (the PE in
       question).

   Aside from highlighting the significant scaling advantage of the
   proposal and the insignificant increase of the LSDB, the purpose of
   the case study is also to remind the reader that the basis for
   efficient routing is efficient address allocation.  It is clear that
   the method described in this document would not be applicable if
   exotic numbering plans were used.

   It is unlikely that a numbering plan would allocate a /16 to number
   BGP nhops within an L1 domain.  Hence, while the proposal is limited
   to encoding the detailed-reachability sub-TLV for summary routes of
   /22 or longer, this limitation should not be a constraint in
   practice.

   It is unlikely that a numbering plan would allocate /56 IPv6 blocks
   to an L1 domain and would then randomly (and hence very sparsely)
   allocate /128 addresses to PE devices in that domain.

8. Security Considerations

   The detailed reachability sub-TLV does not change the information
   that IS-IS can share with other routers, nor does it change the set
   of routers to which the information is sent.  It does RECOMMEND that
   a router treat the information differently, delivering the detailed
   reachability to the control plane while using the summary to scale
   the forwarding plane.  These changes however are not mandated.  Thus
   this extension to IS-IS poses no new security threats.

9. IANA Considerations

   [to be written]

10. References

10.1. Normative References

   [1] ISO, "Intermediate System to Intermediate System Intra-Domain
       Routeing Exchange Protocol for use in Conjunction with the
       Protocol for Providing the Connectionless-mode Network Service
       (ISO 8473)", International Standard 10589:2002, Second Edition.

   [2] Callon, R.W., "Use of OSI IS-IS for routing in TCP/IP and dual
       environments", RFC 1195, December 1990.

   [3] Smit, H. and T. Li, "Intermediate System to Intermediate System
       (IS-IS) Extensions for Traffic Engineering (TE)", RFC 3784, June
       2004.

   [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

10.2. Informative References

   [5] P. Francois, C. Filsfils, J. Evans, and O. Bonaventure,
       "Achieving sub-second IGP convergence in large IP networks",
       SIGCOMM Computer Communications Review, 35(3):35-44, 2005.

11. Authors' Addresses

   Clarence Filsfils
   Cisco Systems, Inc.
   Email: cfilsfil@cisco.com

   Stefano Previdi
   Cisco Systems, Inc.
   Email: sprevidi@cisco.com

   George Swallow
   Cisco Systems, Inc.
   Email: swallow@cisco.com

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Full Copyright Notice

   Copyright (C) The IETF Trust (2008).  This document is subject to
   the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.