Network Working Group                                  Clarence Filsfils
Internet Draft                                       Cisco Systems, Inc.
Category: Standards Track
Expiration Date: August 2008
                                                         Stefano Previdi
                                                     Cisco Systems, Inc.

                                                          George Swallow
                                                     Cisco Systems, Inc.

                                                           February 2008


                IS-IS Detailed IP Reachability Extension


                draft-swallow-isis-detailed-reach-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


   Abstract

      This document defines a means for IS-IS to carry detailed host
      reachability information along with summarized IP reachability.
      In particular it defines a new sub-TLV of the extended IP
      reachability TLV.


Swallow, et al.              Standards Track                    [Page 1]


Internet Draft  draft-swallow-isis-detailed-reach-01.txt   February 2008


Contents

 1      Introduction  ..............................................   3
 1.1    Conventions  ...............................................   3
 1.2    Terminology  ...............................................   3
 2      Background  ................................................   3
 3      Overview  ..................................................   4
 4      Detailed Reachability Sub-TLV  .............................   5
 4.1    Backward Compatibility  ....................................   6
 5      Domain Partitioning  .......................................   6
 6      Semantics of Detailed Reachability  ........................   7
 7      Applicability  .............................................   7
 8      Security Considerations  ...................................   9
 9      IANA Considerations  .......................................  10
10      References  ................................................  10
10.1    Normative References  ......................................  10
10.2    Informative References  ....................................  10
11      Authors' Addresses  ........................................  11


1. Introduction

   The IS-IS protocol is specified in ISO-10589 [1], with extensions for
   supporting IPv4 specified in RFC1195 [2].  The extended IP
   reachability TLV is specified in RFC3784 [3].  This document defines
   a sub-TLV of that TLV to allow detailed host reachability information
   to be carried along with summarized IP reachability.


1.1. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [4].


1.2. Terminology

     ASBR      Autonomous system border router
     BGP       Border Gateway Protocol
     FIB       Forwarding information base
     IGP       Interior gateway protocol
     L1L2      Level 1 and level 2
     LSDB      Link-state data base
     PE        Provider edge
     PIM       Protocol Independent Multicast
     RIB       Routing information base
     RPF       Reverse path forwarding


2. Background

   IS-IS advertises routing/reachability information in link-state
   packets within a domain. Currently no distinction is made between
   routing and reachability. In the case of a host-route (/32 addresses
   in the case of IPv4) this is not a problem as there can be no
   ambiguity between routing and reachability. If a host is advertised
   as reachable, then there is (except during a convergence period or in
   very unusual circumstances) a routed path to that address.  However,
   when shorter prefixes are advertised as reachable, reachability to a
   specific host address is hidden.

   When reachability is summarized, as it often is between levels,
   detailed reachability information is lost.  Such summarization is
   critical to the scaling and convergence of the forwarding plane.

   However, various control plane elements require host reachability
   information (usually to PE or ASBR loopback addresses) either for
   correct action or to speed convergence.  This level of detail very
   often is not needed in the forwarding plane.  But the current
   all-or-nothing behavior of IS-IS leaves a network operator with a
   choice between missing the benefits of summarization for scalability
   and losing the benefits of detailed reachability information.

   Among the control plane elements that could benefit from detailed
   host-reachability information are BGP next-hop tracking and PIM.

   The Border Gateway Protocol (BGP) advertises routes that are external
   to the domain by associating them with a BGP next-hop address that is
   known within the domain. Often multiple next-hops are available to
   reach a particular prefix. If a prefix becomes unreachable, then BGP
   will withdraw the route. Such withdrawals take time.  In particular,
   if the advertising router goes down, the withdrawal may be delayed
   until the BGP TCP session times out.

   To speed convergence, routers employ a technique called next-hop
   tracking, in which the reachability of the BGP next-hop is tracked.
   If a next-hop becomes unreachable, BGP route selection is run and
   external routes that are reachable through a known alternative
   next-hop are installed.

   Currently, if next-hop tracking is to be performed, the
   above-mentioned host routes cannot be summarized.  The proposed
   extension allows the IGP routes to be summarized while distributing
   the detailed reachability information needed for next-hop tracking.

   PIM depends on the IGP reachability to the source of an (S, G) state
   to determine its RPF interface.  When PIM installs an (S, G) state
   for the first time, it registers with the RIB to be notified of any
   route change to S.  Later on, if the route to S changes, the RIB
   immediately sends a notification to PIM.  When S is covered only by
   a summary route, the loss of S does not change that route, so no
   such notification is generated.


3. Overview

   In IS-IS, IP reachability information may be carried in the extended
   IP reachability TLV.  The TLV carries an IP prefix and a prefix
   length.  This enables routes to be summarized so that a single prefix
   covers 2^n host addresses, where n is the difference between 32 and
   the prefix length.  A consequence of this summarization is that
   detailed reachability is hidden.

   This document defines a means to carry detailed reachability
   information along with a summarized IP prefix.  Host reachability
   information is carried via a bit vector of 2^n bits.  For example, if
   an area had 10.0.1.0/25 assigned as its address range and had routers
   with loopbacks as follows

     10.0.1.1 - 10.0.1.27
     10.0.1.46
     10.0.1.74 - 10.0.1.87

   then the bit mask encoding would advertise a summary route to
   10.0.1.0/25 with an associated 128-bit vector (shown in network
   order) like this:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
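
   The encoding of this example can be reproduced with a short sketch.
   It is illustrative only: the function name encode_reachability is
   hypothetical, and the bit numbering assumed is the one given in
   Section 4 (bit zero is the high-order bit of the first octet).

```python
import ipaddress


def encode_reachability(prefix, reachable_hosts):
    """Return the bit vector for `prefix`, one bit per covered address."""
    net = ipaddress.ip_network(prefix)
    # Minimum number of octets that holds one bit per covered address.
    vector = bytearray((net.num_addresses + 7) // 8)
    base = int(net.network_address)
    for host in reachable_hosts:
        addr = ipaddress.ip_address(host)
        if addr not in net:
            raise ValueError(f"{host} is not covered by {prefix}")
        index = int(addr) - base
        # Bit 0 is the high-order bit of the first octet.
        vector[index // 8] |= 0x80 >> (index % 8)
    return bytes(vector)


# Loopbacks from the example above.
loopbacks = (
    [f"10.0.1.{i}" for i in range(1, 28)]
    + ["10.0.1.46"]
    + [f"10.0.1.{i}" for i in range(74, 88)]
)
vector = encode_reachability("10.0.1.0/25", loopbacks)
print(vector.hex())  # 7ffffff000020000003fff0000000000
```

   Each group of eight hexadecimal digits in the output corresponds to
   one 32-bit row of the figure above.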


4. Detailed Reachability Sub-TLV

   The detailed reachability sub-TLV is defined as a sub-TLV of the
   extended IP reachability TLV.  Its type is sub-TLV type [to be
   assigned].  Each bit represents the reachability to one host address
   of the host addresses covered by the prefix.

   The sub-TLV length is the minimum number of octets required to
   contain a bit vector with a length equal to the number of IP
   addresses covered by the prefix contained in the parent extended IP
   reachability TLV.  If L stands for the sub-TLV length and P stands
   for the prefix length, then L = ceiling(2^(32-P)/8).  The maximum
   length of the value field of any sub-TLV is 247 octets.  Since the
   bit vectors are always powers of 2 in length, the largest bit vector
   that will fit is 1024 bits in 128 octets.  This is sufficient to
   handle a prefix of 22 bits.  Shorter prefixes cannot be expressed
   directly.  Instead, they may be expressed by advertising as many
   22-bit prefixes as are contained within the shorter prefix.
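
   The length rule and the /22 limit can be checked with a minimal
   sketch.  The names sub_tlv_length and split_for_advertisement are
   hypothetical; the constants are taken from the paragraph above.

```python
import ipaddress

MAX_VALUE_OCTETS = 247      # maximum sub-TLV value length, per the text
SHORTEST_PREFIX_LEN = 22    # shortest prefix whose vector still fits


def sub_tlv_length(prefix_len):
    """L = ceiling(2^(32-P)/8) for an IPv4 prefix length P."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    bits = 1 << (32 - prefix_len)
    return (bits + 7) // 8


def split_for_advertisement(prefix):
    """Split a prefix shorter than /22 into the /22s it contains."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen >= SHORTEST_PREFIX_LEN:
        return [net]
    return list(net.subnets(new_prefix=SHORTEST_PREFIX_LEN))


for p in (32, 25, 24, 22, 21):
    fits = sub_tlv_length(p) <= MAX_VALUE_OCTETS
    print(p, sub_tlv_length(p), fits)

print(split_for_advertisement("10.8.0.0/20"))
```

   A /21 would need 256 octets, which exceeds the 247-octet limit, so
   the sketch splits anything shorter than /22 into its component /22
   prefixes.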

   The value field encodes the bit vector.  The bits are numbered as
   follows: the high-order bit of the first octet corresponds to zero,
   the low-order bit to seven, the high-order bit of the second octet to
   eight and so forth.


   Each bit represents reachability to one host address, that address
   being equal to the value of the position as numbered above taken as a
   binary number and used as the low-order bits of the IPv4 address
   formed with the prefix as the high-order bits.  A bit value of one
   indicates that the associated host is reachable.  A bit value of zero
   indicates that the associated host is not reachable.
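
   The mapping from bit positions back to host addresses might be
   sketched as follows.  The function name reachable_hosts is
   hypothetical, and the sample vector is the 128-bit vector of the
   Section 3 example written as hexadecimal.

```python
import ipaddress


def reachable_hosts(prefix, vector):
    """List the host addresses whose bit is set in `vector`."""
    net = ipaddress.ip_network(prefix)
    if len(vector) != (net.num_addresses + 7) // 8:
        raise ValueError("vector length does not match the prefix")
    base = int(net.network_address)
    hosts = []
    for index in range(net.num_addresses):
        # Bit 0 is the high-order bit of the first octet.
        if vector[index // 8] & (0x80 >> (index % 8)):
            # The bit position supplies the low-order address bits.
            hosts.append(ipaddress.ip_address(base + index))
    return hosts


vector = bytes.fromhex("7ffffff000020000003fff0000000000")
hosts = reachable_hosts("10.0.1.0/25", vector)
print(len(hosts), hosts[0], hosts[-1])
```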


4.1. Backward Compatibility

   As defined in RFC 3784 [3], a sub-TLV which is not understood is to
   be ignored.  Thus a router which does not understand the new sub-TLV
   will behave as if it had simply received the summary route.


5. Domain Partitioning

   An L1L2 router (e.g., A) summarizing a set of L1 routes as a single
   L2 route (e.g., 1.1.1/24) monitors whether any other L1L2 router
   (e.g., B) advertises the same summary route to the L2 domain.  When
   this occurs, it checks the consistency of the detailed reachability
   sub-TLVs.  In addition to the summary, the L1L2 router advertises a
   host route for any host to which it has reachability but to which
   the other L1L2 router advertises no reachability (if A advertises
   1.1.1/24 with the .1 bit set while B advertises 1.1.1/24 with that
   bit clear, then A advertises 1.1.1.1/32 in addition to 1.1.1/24).
   This handles any partitioning of an L1 domain.  The same behavior
   applies to summarization from the L2 to the L1 domain.  In both
   cases, appropriate hold-down timers should be applied to cover
   timing differences in LSP generation between the routers.
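
   The consistency check can be sketched as a comparison of two bit
   vectors.  This is an illustration under simplifying assumptions: the
   function name extra_host_routes is hypothetical, both vectors are
   assumed to belong to the same summary prefix, and the hold-down
   timers mentioned above are not modeled.

```python
import ipaddress


def extra_host_routes(prefix, own_vector, peer_vector):
    """Host routes to add: bits set locally but clear at the peer."""
    net = ipaddress.ip_network(prefix)
    base = int(net.network_address)
    routes = []
    for index in range(net.num_addresses):
        mask = 0x80 >> (index % 8)
        own = own_vector[index // 8] & mask
        peer = peer_vector[index // 8] & mask
        if own and not peer:
            host = ipaddress.ip_address(base + index)
            routes.append(ipaddress.ip_network(f"{host}/32"))
    return routes


# A sees hosts .1 and .2; B sees only host .2.
vector_a = bytes([0x60]) + bytes(31)
vector_b = bytes([0x20]) + bytes(31)
routes = extra_host_routes("1.1.1.0/24", vector_a, vector_b)
print(routes)
```

   Here A would advertise 1.1.1.1/32 alongside the summary, while B,
   running the same comparison in the other direction, would add
   nothing.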

   Partitioning of a domain is very unlikely in practice because the
   following design rule prevails: any L1L2 router must have
   two disjoint paths to any node in any domain it connects to. This
   design rule is common as it is inspired by high-availability and
   safety/containment objectives: any failure within a domain is
   entirely dealt with (all flows are rerouted on working alternate
   paths) as soon as IS-IS convergence is completed in the domain where
   the failure occurred.


6. Semantics of Detailed Reachability

   As stated above, detailed reachability is determined by the setting
   of the bit associated with a specific host.

   The information present in the detailed reachability sub-TLV should
   not be used to generate any dataplane forwarding entry. It is only
   intended to be used by the control plane to validate/invalidate the
   reachability of, for example, BGP next-hops and PIM sources.

   The absence of the detailed reachability sub-TLV is equivalent to
   the presence of a detailed reachability sub-TLV with all bits set.
   This is backward compatible with the definition of a classical
   summary route.

   Provided the domain partitioning behavior described previously is
   applied, the inconsistency of the detailed reachability of two
   equivalent summary routes is resolved by the presence of more-
   specific routes.
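
   A control-plane check that follows these semantics might look like
   the sketch below.  The function name next_hop_reachable is
   hypothetical; the sketch considers a single summary route only and
   does not model more-specific routes or any dataplane state.

```python
import ipaddress
from typing import Optional


def next_hop_reachable(prefix: str, vector: Optional[bytes],
                       next_hop: str) -> bool:
    """Validate a next-hop against one summary route and its sub-TLV."""
    net = ipaddress.ip_network(prefix)
    addr = ipaddress.ip_address(next_hop)
    if addr not in net:
        return False  # this summary route does not cover the address
    if vector is None:
        # No sub-TLV: equivalent to a sub-TLV with all bits set.
        return True
    index = int(addr) - int(net.network_address)
    # Bit 0 is the high-order bit of the first octet (Section 4).
    return bool(vector[index // 8] & (0x80 >> (index % 8)))


vector = bytes.fromhex("7ffffff000020000003fff0000000000")
print(next_hop_reachable("10.0.1.0/25", vector, "10.0.1.46"))
print(next_hop_reachable("10.0.1.0/25", vector, "10.0.1.47"))
print(next_hop_reachable("10.0.1.0/25", None, "10.0.1.47"))
```

   The result is meant to feed validation of, for example, BGP next-hops
   or PIM sources; it is not meant to create forwarding entries.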


7. Applicability

   The following case study is proposed as an example of application.

   A single AS needs to interconnect 30000 PE's.  Fast convergence upon
   any core link/node failure is required.  As IS-IS convergence is
   essentially dependent upon the dataplane FIB update rate [5], the
   number of IS-IS routes installed in the dataplane must be limited to
   a few hundred.  This would be easily achieved through classical
   summarization.  However, there is also a requirement to provide fast
   convergence upon any loss of a BGP next-hop (PE node failure).  BGP
   next-hop reachability is commonly provided by the IGP as this is
   scalable (no n^2 mesh of liveness sessions) and is known to converge
   fast (<200 msec [5]).  Classically, this leads operators not to
   summarize the PE loopbacks.

   The method described in this document solves the dilemma:

   (1) it drastically reduces the number of IS-IS routes and hence the
       number of related dataplane entries, hence achieving the scaling
       and fast convergence requirement

   (2) it maintains the scalable and fast reachability detection for
       PE's

   Let us further illustrate the case study to show the magnitude of the
   scaling benefit.  Assuming this AS is structured along 75 regions, we
   assume that 75 L1 domains would be created, each with 400 PE's.  Each
   L1 domain would be connected to L2 via two L1L2 routers.  Within each
   L1 domain we would assume that 40 non-PE devices interconnect the
   PE's to the L1L2 routers.  We would assume that 200 P devices
   interconnect the L1L2 routers within the L2 domain.  Assuming an
   average number of 5 neighbors per router, this leads to 1000
   router-to-router subnets in each domain.

   We would for example allocate 10.0/13 for numbering router-to-router
   subnets and would divide this block into 128 /20 blocks.  We would
   allocate one /20 block to each L1 domain (52 blocks are spare, as one
   /20 is also given to the router-to-router subnets of the L2 domain).
   Using /31's, this allows for 2048 subnets per L1 domain (a factor of
   two for future growth).  Note that a factor of 4 of further growth is
   possible with this illustrative numbering plan.

   We would allocate 10.8/15 for numbering router loopbacks and divide
   this block into 512 /24 blocks.  We would allocate 6 /24's per L1
   domain (62 are spare).  Using /32's and assuming 30% efficiency for
   administrative reasons, each L1 domain would consume 5 /24's (one is
   spare).  This allows for a factor of 3 of future growth.
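
   The block counts in this numbering plan can be verified with a short
   sketch.  The variable names are illustrative; the prefixes and
   counts are those of the case study.

```python
import ipaddress

L1_DOMAINS = 75

# Router-to-router subnets: 10.0/13 divided into /20 blocks.
link_block = ipaddress.ip_network("10.0.0.0/13")
link_blocks = list(link_block.subnets(new_prefix=20))
# One /20 per L1 domain plus one for the L2 domain.
spare_link_blocks = len(link_blocks) - (L1_DOMAINS + 1)
# Each /31 consumes two addresses of its /20 block.
subnets_per_block = link_blocks[0].num_addresses // 2

# Router loopbacks: 10.8/15 divided into /24 blocks, 6 per L1 domain.
loopback_block = ipaddress.ip_network("10.8.0.0/15")
loopback_blocks = list(loopback_block.subnets(new_prefix=24))
spare_loopback_blocks = len(loopback_blocks) - 6 * L1_DOMAINS

print(len(link_blocks), spare_link_blocks, subnets_per_block)
print(len(loopback_blocks), spare_loopback_blocks)
print(link_block.overlaps(loopback_block))
```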

   The following IGP summarization scheme would be adopted: Each L1L2
   router only advertises the summary /20 for router-to-router subnets
   in its L1 domain (the detailed reachability sub-TLV is NOT needed for
   this block and hence classical summarization is used).  Each L1L2
   router only advertises the 5 summary /24's for router loopbacks in
   its L1 domain.  Each of these 5 /24's is complemented with a detailed
   reachability sub-TLV.

   In conclusion, each router in the L2 domain knows about 75 /20's, 375
   /24's, 1000 /31's and 350 /32's. In total, 1800 routes among which
   375 are important.  Each router in an L1 domain knows about 1000
   /31's, 400 /32's, 370 /24's and 74 /20's for a total of 1844 subnets
   among which 770 are important.

   If a classical design had been used, then IS-IS would have had to
   support a total of 106000 routes among which 30000 were important.
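
   The totals above can be reproduced with the following arithmetic.
   The breakdown is one reading that matches the stated figures: the
   350 /32's are taken to be the 200 P routers plus the 150 L1L2
   routers, and the classical total is taken to be the /31 subnets of
   all 76 domains plus the 30000 PE loopbacks.  Both are assumptions,
   as the text does not spell them out.

```python
L1_DOMAINS = 75
PES_PER_L1_DOMAIN = 400
LINK_SUBNETS_PER_DOMAIN = 1000     # /31 router-to-router subnets
LOOPBACK_SUMMARIES_PER_L1 = 5      # /24 summaries carrying the sub-TLV
P_ROUTERS = 200
L1L2_ROUTERS = 2 * L1_DOMAINS

# Routes known to a router in the L2 domain.
l2_routes = {
    "/20": L1_DOMAINS,
    "/24": LOOPBACK_SUMMARIES_PER_L1 * L1_DOMAINS,
    "/31": LINK_SUBNETS_PER_DOMAIN,
    "/32": P_ROUTERS + L1L2_ROUTERS,            # assumed breakdown
}

# Routes known to a router in an L1 domain (summaries come from the
# 74 other L1 domains).
l1_routes = {
    "/31": LINK_SUBNETS_PER_DOMAIN,
    "/32": PES_PER_L1_DOMAIN,
    "/24": LOOPBACK_SUMMARIES_PER_L1 * (L1_DOMAINS - 1),
    "/20": L1_DOMAINS - 1,
}

l2_total = sum(l2_routes.values())
l1_total = sum(l1_routes.values())

# Classical design without summarization (assumed breakdown).
classical_total = ((L1_DOMAINS + 1) * LINK_SUBNETS_PER_DOMAIN
                   + L1_DOMAINS * PES_PER_L1_DOMAIN)

print(l2_total, l1_total, classical_total)
print(round(classical_total / l1_total, 1))
```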

   The method described in this document allows for fast IS-IS
   convergence upon any intra-AS failure by decreasing the number of
   dataplane entries by a factor of 50.  It also allows for fast
   convergence upon inter-AS failure as the reachability to PE's is
   preserved in IS-IS (with its scaling benefit) without any impact on
   the number of dataplane entries in the AS.

   Furthermore, the cost of this method is negligible as the detailed
   reachability sub-TLV is not used for the summary of router-to-router
   subnets.  It is only used for the summary of PE loopbacks.  Each /24
   summary would require a modest 32-byte detailed reachability sub-TLV.
   The IS-IS LSDB of a router in the L2 domain would thus only grow by
   375 * 32 bytes (12000 bytes), which is insignificant.

   Finally, we note that this case study could easily be applied to an
   IPv6 network assuming well-known numbering techniques are used:

   (1) each L1 domain would receive 6 /120 blocks (the equivalent of a
       /24) and PE's in the domain would be numbered as /128 from these
       blocks, allowing for efficient summarization

   (2) each L1 domain would receive 6 /56 blocks (the equivalent of a
       /24) and each PE in the domain would receive a dedicated /64. In
       this case, each bit in the detailed-reachability sub-TLV
       indicates the reachability of an entire /64 block (the PE in
       question).
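
   For both IPv6 numbering options, the mapping from a bit position to
   the block it represents can be sketched as follows.  This is
   hypothetical: this document defines the sub-TLV only for the
   extended IP reachability TLV, the function name covered_block is an
   assumption, and the prefixes come from the documentation range
   2001:db8::/32.

```python
import ipaddress


def covered_block(summary, unit_prefix_len, index):
    """Return the block that bit `index` stands for under `summary`."""
    net = ipaddress.IPv6Network(summary)
    span = unit_prefix_len - net.prefixlen
    if span < 0 or not 0 <= index < (1 << span):
        raise ValueError("index outside the summarized range")
    # Size, in addresses, of the unit that each bit represents.
    unit_size = 1 << (net.max_prefixlen - unit_prefix_len)
    base = int(net.network_address) + index * unit_size
    return ipaddress.IPv6Network((base, unit_prefix_len))


# Option (1): a /120 summary with one bit per /128 PE address.
print(covered_block("2001:db8::100/120", 128, 5))
# Option (2): a /56 summary with one bit per /64 PE block.
print(covered_block("2001:db8:0:100::/56", 64, 3))
```

   In both options the summary leaves 8 bits of detail, so the bit
   vector is 256 bits (32 octets), as for an IPv4 /24.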

   Aside from highlighting the significant scaling advantage of the
   proposal and the insignificant increase of the LSDB, the purpose of
   the case study is also to serve as a reminder that the basis for
   efficient routing is efficient address allocation.  It is clear that
   the method described in this document would not be applicable if
   exotic numbering plans were used.

   It is unlikely that a numbering plan would allocate a /16 to number
   BGP next-hops within an L1 domain.  Hence, while the proposal is
   limited to encoding detailed reachability sub-TLVs for summary
   routes of /22 or longer, this limitation should not be a constraint
   in practice.

   It is unlikely that a numbering plan would allocate /56 IPv6 blocks
   to an L1 domain and would then randomly (and hence very sparsely)
   allocate /128 addresses to PE devices in that domain.


8. Security Considerations

   The detailed reachability sub-TLV does not change the information
   that IS-IS can share with other routers, nor does it change the set
   of routers to which the information is sent.  It does RECOMMEND that
   a router treat the information differently, delivering the detailed
   reachability to the control plane while using the summary to scale
   the forwarding plane.  These changes however are not mandated.  Thus
   this extension to IS-IS poses no new security threats.


9. IANA Considerations

   [to be written]


10. References

10.1. Normative References

   [1] ISO, "Intermediate System to Intermediate System Intra-Domain
       Routeing Exchange Protocol for use in Conjunction with the
       Protocol for Providing the Connectionless-mode Network Service
       (ISO 8473)", International Standard 10589:2002, Second Edition.

   [2] Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual
       environments", RFC 1195, December 1990.

   [3] Smit, H. and T. Li, "Intermediate System to Intermediate
       System (IS-IS) Extensions for Traffic Engineering (TE)", RFC
       3784, June 2004.

   [4] Bradner, S., "Key words for use in RFCs to Indicate
       Requirement Levels", BCP 14, RFC 2119, March 1997.


10.2. Informative References

   [5] Francois, P., Filsfils, C., Evans, J., and O. Bonaventure,
       "Achieving sub-second IGP convergence in large IP
       networks", ACM SIGCOMM Computer Communication Review,
       35(3):35-44, 2005.


11. Authors' Addresses

      Clarence Filsfils
      Cisco Systems, Inc.

      Email:  cfilsfil@cisco.com

      Stefano Previdi
      Cisco Systems, Inc.

      Email:  sprevidi@cisco.com

      George Swallow
      Cisco Systems, Inc.

      Email:  swallow@cisco.com


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at ietf-
   ipr@ietf.org.


Full Copyright Notice

   Copyright (C) The IETF Trust (2008).  This document is subject to the
   rights, licenses and restrictions contained in BCP 78, and except as
   set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

