508 lines
17 KiB
Plaintext
508 lines
17 KiB
Plaintext
|
||
|
||
|
||
|
||
|
||
|
||
Network Working Group D. Thaler
|
||
Request for Comments: 2991 Microsoft
|
||
Category: Informational C. Hopps
|
||
NextHop Technologies
|
||
November 2000
|
||
|
||
|
||
Multipath Issues in Unicast and Multicast Next-Hop Selection
|
||
|
||
Status of this Memo
|
||
|
||
This memo provides information for the Internet community. It does
|
||
not specify an Internet standard of any kind. Distribution of this
|
||
memo is unlimited.
|
||
|
||
Copyright Notice
|
||
|
||
Copyright (C) The Internet Society (2000). All Rights Reserved.
|
||
|
||
Abstract
|
||
|
||
Various routing protocols, including Open Shortest Path First (OSPF)
|
||
and Intermediate System to Intermediate System (ISIS), explicitly
|
||
allow "Equal-Cost Multipath" (ECMP) routing. Some router
|
||
implementations also allow equal-cost multipath usage with RIP and
|
||
other routing protocols. The effect of multipath routing on a
|
||
forwarder is that the forwarder potentially has several next-hops for
|
||
any given destination and must use some method to choose which next-
|
||
hop should be used for a given data packet.
|
||
|
||
1. Introduction
|
||
|
||
Various routing protocols, including OSPF and ISIS, explicitly allow
|
||
"Equal-Cost Multipath" routing. Some router implementations also
|
||
allow equal-cost multipath usage with RIP and other routing
|
||
protocols. Using equal-cost multipath means that if multiple equal-
|
||
cost routes to the same destination exist, they can be discovered and
|
||
used to provide load balancing among redundant paths.
|
||
|
||
The effect of multipath routing on a forwarder is that the forwarder
|
||
potentially has several next-hops for any given destination and must
|
||
use some method to choose which next-hop should be used for a given
|
||
data packet. This memo summarizes current practices, problems, and
|
||
solutions.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Thaler & Hopps Informational [Page 1]
|
||
|
||
RFC 2991 Multipath Issues November 2000
|
||
|
||
|
||
2. Concerns
|
||
|
||
Several router implementations allow multipath forwarding. This is
|
||
sometimes done naively via round-robin, where each packet matching a
|
||
given destination route is forwarded using the subsequent next-hop,
|
||
in a round-robin fashion. This does provide a form of load
|
||
balancing, but there are several problems with approaches such as
|
||
round-robin or random:
|
||
|
||
Variable Path MTU
|
||
Since each of the redundant paths may have a different MTU,
|
||
this means that the overall path MTU can change on a packet-
|
||
by-packet basis, negating the usefulness of path MTU discovery.
|
||
|
||
Variable Latencies
|
||
Since each of the redundant paths may have a different latency
|
||
involved, having packets take separate paths can cause packets
|
||
to always arrive out of order, increasing delivery latency and
|
||
buffering requirements.
|
||
|
||
Packet reordering causes TCP to believe that loss has taken
|
||
place when packets with higher sequence numbers arrive before
|
||
an earlier one. When three or more packets are received before
|
||
a "late" packet, TCP enters a mode called "fast-retransmit" [6]
|
||
which consumes extra bandwidth (which could potentially cause
|
||
more loss, decreasing throughput) as it attempts to
|
||
unnecessarily retransmit the delayed packet(s). Hence,
|
||
reordering can be detrimental to network performance.
|
||
|
||
Debugging
|
||
Common debugging utilities such as ping and traceroute are much
|
||
less reliable in the presence of multiple paths and may even
|
||
present completely wrong results.
|
||
|
||
In multicast routing, the problem with multiple paths is that
|
||
multicast routing protocols prevent loops and duplicates by
|
||
constructing a single tree to all receivers of the same group
|
||
address. Multicast routing protocols deployed today (DVMRP, PIM-DM,
|
||
PIM-SM) [2] construct shortest-path trees rooted at either the
|
||
source, or another router known as a Core or Rendezvous Point.
|
||
Hence, the way they ensure that duplicates will not arise is that a
|
||
given tree must use only a single next-hop towards the root of the
|
||
tree.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Thaler & Hopps Informational [Page 2]
|
||
|
||
RFC 2991 Multipath Issues November 2000
|
||
|
||
|
||
3. Requirements
|
||
|
||
In the remainder of this document, we will use the term "flow" to
|
||
represent the granularity at which the router keeps state (if at all)
|
||
for classes of traffic. The exact definition of a flow may depend on
|
||
the actual implementation. For example, a flow might be identified
|
||
solely by destination address, or it might be identified by (source
|
||
address, destination address, protocol id) triplet. Hence "flow" is
|
||
not necessarily synonymous with the term "microflow" as used in RFC
|
||
2474 [7], which also includes port numbers. Indeed, including
|
||
transport-layer information in the next-hop selection process can
|
||
actually be problematic. For example, if packets are fragmented, the
|
||
transport-layer information may not be available in every packet.
|
||
Furthermore, having the choice of path depend on transport-layer
|
||
fields may negate the benefit of caching information such as MTU for
|
||
use in subsequent connections between the same endpoints.
|
||
|
||
All of the problems outlined in the previous section arise when
|
||
packets in the same unicast or multicast "flow" are split among
|
||
multiple paths. The natural solution is therefore to ensure that
|
||
packets for the same flow always use the same path.
|
||
|
||
Two additional features are desirable:
|
||
|
||
Minimal disruption
|
||
When multipath is used, meaning that multiple routes contribute
|
||
valid next-hops, the chances are higher of routes being added
|
||
and deleted from consideration than when only the "best" route
|
||
is used (in which case metric changes in alternate routes have
|
||
no effect on traffic paths). Since a higher number of routes
|
||
may actually be used for forwarding when multipath is in use,
|
||
the potential for packet reordering and packet loss due to
|
||
route flaps can be much greater than when not using multipath.
|
||
Hence, it is desirable to minimize the number of active flows
|
||
affected by the addition or deletion of another next-hop.
|
||
|
||
Fast implementation
|
||
The amount of additional computation required to forward a
|
||
packet should be small. For example, when doing round-robin,
|
||
this computation might consist of incrementing (modulo the
|
||
number of next-hops) a next-hop index.
|
||
|
||
4. Solutions
|
||
|
||
We now provide three possible methods for improving the performance
|
||
of multipath and then discuss their applicability to unicast and
|
||
multicast forwarding.
|
||
|
||
|
||
|
||
|
||
Thaler & Hopps Informational [Page 3]
|
||
|
||
RFC 2991 Multipath Issues November 2000
|
||
|
||
|
||
Modulo-N Hash
|
||
To select a next-hop from the list of N next-hops, the router
|
||
performs a modulo-N hash over the packet header fields that
|
||
identify a flow. This has the advantage of being fast, at the
|
||
expense of (N-1)/N of all flows changing paths whenever a
|
||
next-hop is added or removed.
|
||
|
||
Hash-Threshold
|
||
The router first selects a key by performing a hash over the
|
||
packet header fields that identify the flow. The N next-hops
|
||
have been assigned unique regions in the hash function's output
|
||
space. By comparing the hash value against region boundaries
|
||
the router can determine which region the hash value belongs to
|
||
and thus which next-hop to use. This method has the advantage
|
||
of only affecting flows near the region boundaries (or
|
||
thresholds) when next-hops are added or removed. For ECMP
|
||
hash-threshold's lookup can be done with a simple division
|
||
(hash_value / fixed_region_size). When a next-hop is added or
|
||
removed, between 1/4 and 1/2 of all flows change paths. An
|
||
analysis of this method can be found in [3].
|
||
|
||
Highest Random Weight (HRW)
|
||
The router computes a key for EACH next-hop by performing a
|
||
hash over the packet header fields that identify the flow, as
|
||
well as over the address of the next-hop. The router then
|
||
chooses the next-hop with the highest resulting key value [4].
|
||
This has the advantage of minimizing the number of flows
|
||
affected by a next-hop addition or deletion (only 1/N of them),
|
||
but is approximately N times as expensive as a modulo-N hash.
|
||
|
||
The applicability of these three alternatives depends on (at least)
|
||
two factors: whether the forwarder maintains per-flow state, and how
|
||
precious CPU is to a multipath forwarder.
|
||
|
||
Some routers may maintain per-flow state for reasons other than for
|
||
supporting multipath. For example, routers typically keep per-flow
|
||
state for multicast flows so that they can maintain the list of
|
||
interfaces to which packets in the flow should be copied.
|
||
|
||
If per-flow state is maintained in a multipath forwarder, then
|
||
computation of the next-hop can be done by the router at state
|
||
creation time. This entails no additional computations at packet
|
||
forwarding time compared with normal forwarding to a single next-hop,
|
||
since the next-hop is precomputed. In this case, any method can be
|
||
used, including round-robin, random, modulo-N, hash-threshold or HRW.
|
||
Hash functions such as modulo-N, hash-threshold and HRW are better if
|
||
the forwarder state may be deleted for any reason during the lifetime
|
||
of a flow since subsequent next-hop computations by the router will
|
||
|
||
|
||
|
||
Thaler & Hopps Informational [Page 4]
|
||
|
||
RFC 2991 Multipath Issues November 2000
|
||
|
||
|
||
always select the same path. This also improves the usefulness of
|
||
debugging utilities such as traceroute. Finally, to maximize the
|
||
stability of paths (and hence the usefulness of traceroute, etc.),
|
||
the use of HRW is recommended over the other methods mentioned
|
||
herein.
|
||
|
||
If per-flow state is not maintained by the forwarder, then using
|
||
multiple next-hops requires that the next-hop be calculated at packet
|
||
arrival time. When CPU is more precious than stability of flow
|
||
paths, hash-threshold is recommended over the other methods mentioned
|
||
herein.
|
||
|
||
4.1. Unicast Forwarding
|
||
|
||
Depending on the implementation, unicast forwarding may or may not
|
||
keep per-flow state. We recommend that where forwarder
|
||
implementations keep flow state, routers should use HRW at state
|
||
creation time (and next-hop deletion time) to select the next-hop,
|
||
and that forwarders without per-flow state use hash-threshold.
|
||
|
||
4.2. Multicast Forwarding
|
||
|
||
Today's multicast forwarding engines use a cache of forwarding
|
||
entries indexed by group (or group prefix) and source (or source
|
||
prefix). This means that today's multicast forwarder's always keep
|
||
per-flow state, although for some multicast routing protocols, the
|
||
"flow" may be fairly coarse (e.g., traffic from all sources to the
|
||
same destination). Since per-flow state is kept by the forwarder, it
|
||
is recommended that the router always use HRW to select the next-hop.
|
||
|
||
Routers using explicit-joining protocols such as PIM-SM [5] should
|
||
thus use the multipath information when determining to which neighbor
|
||
a join message should be sent. For example, when multiple next-hops
|
||
exist for a given Rendezvous Point (RP) toward which a (*,G) Join
|
||
should be sent, it is recommended that HRW be used to select the
|
||
next-hop to use for each group.
|
||
|
||
5. Applicability
|
||
|
||
The algorithms discussed above (except round-robin) all rely on some
|
||
form of hash function. Equal flow distribution is achieved when the
|
||
hash function is uniformly distributed. Since the commonly used hash
|
||
functions only become uniformly distributed when the number of inputs
|
||
is relatively large, these algorithms are more applicable to routers
|
||
used to route many flows, than in, for example, a small business
|
||
setting.
|
||
|
||
|
||
|
||
|
||
|
||
Thaler & Hopps Informational [Page 5]
|
||
|
||
RFC 2991 Multipath Issues November 2000
|
||
|
||
|
||
6. Redundant Parallel Links
|
||
|
||
A related problem occurs when multiple parallel links are used
|
||
between the same pair of routers. A common solution is to bundle the
|
||
two links together into a "super"-link when is then used for routing.
|
||
For multicast forwarding, this results in the two links being reduced
|
||
to a single next-hop (over the combined link) which can be used to
|
||
prevent duplicates. When a unicast or multicast packet is queued to
|
||
the combined link, some method, such as those discussed earlier, is
|
||
still required to determine the physical link on which to transmit
|
||
the packet. If the parallel links are identical, then most of the
|
||
concerns discussed in this document are avoided with the combined
|
||
link. The exception is packet reordering, which can still occur with
|
||
round-robin, adversely affecting TCP.
|
||
|
||
7. Security Considerations
|
||
|
||
This document discusses issues with various methods of choosing a
|
||
next-hop from among multiple valid next-hops. As such, it does not
|
||
directly impact the security of the Internet infrastructure or its
|
||
applications.
|
||
|
||
One issue that is worth mentioning, however, is that when next-hop
|
||
selection is predictable, an attacker can synthesize traffic that
|
||
will all hash the same, making it possible to launch a denial-of-
|
||
service attack that overloads a particular path. Since a special
|
||
case of this is when the same (single) next-hop is always selected,
|
||
such an attack is easiest when multipath is not being used.
|
||
Introducing multipath routing can make such an attack more difficult;
|
||
the more unpredictable the hash is, the harder it becomes to conduct
|
||
a denial-of-service attack against any single link.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Thaler & Hopps Informational [Page 6]
|
||
|
||
RFC 2991 Multipath Issues November 2000
|
||
|
||
|
||
8. References
|
||
|
||
[1] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
|
||
|
||
[2] Maufer, T., "Deploying IP Multicast in the Enterprise",
|
||
Prentice-Hall, 1998.
|
||
|
||
[3] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", RFC
|
||
2992, November 2000.
|
||
|
||
[4] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
|
||
Increase Hit Rates", IEEE/ACM Transactions on Networking,
|
||
February 1998.
|
||
|
||
[5] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
|
||
Handley, M., Jacobson, V., Liu, C., Sharma, P. and L. Wei,
|
||
"Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
|
||
Specification", RFC 2362, June 1998.
|
||
|
||
[6] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion Control",
|
||
RFC 2581, April 1999.
|
||
|
||
[7] Nichols, K., Blake, S., Baker, F. and D. Black., "Definition of
|
||
the Differentiated Services Field (DS Field) in the IPv4 and
|
||
IPv6 Headers", RFC 2474, December 1998.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Thaler & Hopps Informational [Page 7]
|
||
|
||
RFC 2991 Multipath Issues November 2000
|
||
|
||
|
||
9. Authors' Addresses
|
||
|
||
Dave Thaler
|
||
Microsoft
|
||
One Microsoft Way
|
||
Redmond, WA 98052
|
||
|
||
Phone: +1 425 703 8835
|
||
EMail: dthaler@dthaler.microsoft.com
|
||
|
||
|
||
Christian E. Hopps
|
||
NextHop Technologies, Inc.
|
||
517 W. William Street
|
||
Ann Arbor, MI 48103-4943
|
||
U.S.A
|
||
|
||
Phone: +1 734 936 0291
|
||
EMail: chopps@nexthop.com
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Thaler & Hopps Informational [Page 8]
|
||
|
||
RFC 2991 Multipath Issues November 2000
|
||
|
||
|
||
10. Full Copyright Statement
|
||
|
||
Copyright (C) The Internet Society (2000). All Rights Reserved.
|
||
|
||
This document and translations of it may be copied and furnished to
|
||
others, and derivative works that comment on or otherwise explain it
|
||
or assist in its implementation may be prepared, copied, published
|
||
and distributed, in whole or in part, without restriction of any
|
||
kind, provided that the above copyright notice and this paragraph are
|
||
included on all such copies and derivative works. However, this
|
||
document itself may not be modified in any way, such as by removing
|
||
the copyright notice or references to the Internet Society or other
|
||
Internet organizations, except as needed for the purpose of
|
||
developing Internet standards in which case the procedures for
|
||
copyrights defined in the Internet Standards process must be
|
||
followed, or as required to translate it into languages other than
|
||
English.
|
||
|
||
The limited permissions granted above are perpetual and will not be
|
||
revoked by the Internet Society or its successors or assigns.
|
||
|
||
This document and the information contained herein is provided on an
|
||
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
|
||
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
|
||
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
|
||
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
|
||
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
|
||
|
||
Acknowledgement
|
||
|
||
Funding for the RFC Editor function is currently provided by the
|
||
Internet Society.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Thaler & Hopps Informational [Page 9]
|
||
|