Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)


Introduction to Multi-Protocol Label Switching (MPLS)
Multi-Protocol Label Switching (MPLS) is rapidly becoming a key technology for
use in core networks, including converged data and voice networks. MPLS does
not replace IP routing, but works alongside existing and future routing
technologies to provide very high-speed data forwarding between Label-Switched
Routers (LSRs) together with reservation of bandwidth for traffic flows with
differing Quality of Service (QoS) requirements. MPLS enhances the services that
can be provided by IP networks, offering scope for Traffic Engineering, guaranteed
QoS and Virtual Private Networks (VPNs).
MPLS uses a technique known as label switching to forward data through
the network. A small, fixed-format label is inserted in front of each data packet on
entry into the MPLS network. At each hop across the network, the packet is routed
based on the value of the incoming interface and label, and dispatched to an
outgoing interface with a new label value. The path that data follows through a
network is defined by the transition in label values, as the label is swapped at
each LSR. Since the mapping between labels is constant at each LSR, the path is
determined by the initial label value. Such a path is called a Label Switched Path
(LSP).
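
The label swapping described above can be illustrated with a minimal Python sketch. The LSR names, interface names, label values and the TABLES and LINKS structures are hypothetical and chosen only for illustration; real LSRs build these mappings through label distribution.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]   # (incoming interface, incoming label)
Hop = Tuple[str, int]   # (outgoing interface, outgoing label)

# Hypothetical per-LSR label tables: the mapping is constant at each LSR.
TABLES: Dict[str, Dict[Key, Hop]] = {
    "A": {("if0", 17): ("if1", 42)},
    "B": {("if0", 42): ("if2", 99)},
    "C": {("if0", 99): ("if1", 5)},
}

# Hypothetical wiring: (LSR, outgoing interface) -> (next LSR, its incoming interface).
LINKS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("A", "if1"): ("B", "if0"),
    ("B", "if2"): ("C", "if0"),
}


def trace_lsp(lsr: str, in_if: str, label: int) -> List[Tuple[str, int, int]]:
    """Follow a packet hop by hop, returning (LSR, label in, label out) per hop."""
    path = []
    while True:
        # Forwarding decision uses only the incoming interface and label.
        out_if, out_label = TABLES[lsr][(in_if, label)]
        path.append((lsr, label, out_label))
        next_hop = LINKS.get((lsr, out_if))
        if next_hop is None:          # no further LSR: packet leaves the sketch
            return path
        lsr, in_if = next_hop
        label = out_label


print(trace_lsp("A", "if0", 17))
# [('A', 17, 42), ('B', 42, 99), ('C', 99, 5)]
```

Because every table entry is fixed, the whole path A, B, C follows from the initial label value 17 alone, which is the property the text uses to define a Label Switched Path.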
MPLS Header Format

Label Value : 20-bit label value
Exp : experimental use – can indicate class of service
S : bottom of stack indicator – 1 for the bottom label, 0 otherwise
TTL : Time To Live

Figure 5 : MPLS label format
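
As an illustration of the label format, the following Python sketch packs and unpacks one 32-bit label stack entry. The text above gives only the 20-bit label width; the remaining widths (3-bit Exp, 1-bit S, 8-bit TTL) follow the standard MPLS label stack encoding, and the function names pack_entry and unpack_entry are hypothetical.

```python
import struct
from typing import Dict


def pack_entry(label: int, exp: int, s: int, ttl: int) -> bytes:
    """Encode one label stack entry as 4 bytes in network byte order."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    # Layout, most significant bit first: label(20) | exp(3) | s(1) | ttl(8)
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)


def unpack_entry(data: bytes) -> Dict[str, int]:
    """Decode the first 4 bytes of data into the four header fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


entry = pack_entry(label=16, exp=0, s=1, ttl=64)
print(entry.hex())          # 00010140
print(unpack_entry(entry))  # {'label': 16, 'exp': 0, 's': 1, 'ttl': 64}
```

The S bit is 1 here because the example entry is the only (and therefore bottom) label on the stack.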
MPLS Operation

An MPLS network or internet consists of a set of nodes, called Label Switched Routers (LSRs), that are capable of switching and routing packets on the basis of a label which has been appended to each packet. Labels define a flow of packets between two endpoints or, in the case of multicast, between a source endpoint and a multicast group of destination endpoints. For each distinct flow, called a Forwarding Equivalence Class (FEC), a specific path through the network of LSRs is defined. Thus, MPLS is a connection-oriented technology. Associated with each FEC is a traffic characterization that defines the QoS requirements for that flow. The LSRs do not need to examine or process the IP header, but rather simply forward each packet based on its label value. Therefore, the forwarding process is simpler than with an IP router.

Figure 6: MPLS Operation

Figure 6, based on one in [4], depicts the operation of MPLS within a domain of MPLS-enabled routers. The following are key elements of the operation.

1. Prior to the routing and delivery of packets in a given FEC, a path through the network, known as a Label Switched Path (LSP), must be defined and the QoS parameters along that path must be established. The QoS parameters determine (1) how many resources to commit to the path, and (2) what queuing and discarding policy to establish at each LSR for packets in this FEC. To accomplish these tasks, two protocols are used to exchange the necessary information among routers:
   a An interior routing protocol, such as OSPF, is used to exchange reachability and routing information.
   b Labels must be assigned to the packets for a particular FEC. Because the use of globally unique labels would impose a management burden and limit the number of usable labels, labels have local significance only, as discussed subsequently. A network operator can specify explicit routes manually and assign the appropriate label values. Alternatively, a protocol is used to determine the route and establish label values between adjacent LSRs. Either of two protocols can be used for this purpose: the Label Distribution Protocol (LDP) or an enhanced version of RSVP.
2. A packet enters an MPLS domain through an ingress edge LSR where it is processed to determine which network-layer services it requires, defining its QoS. The LSR assigns this packet to a particular FEC, and therefore a particular LSP, appends the appropriate label to the packet, and forwards the packet. If no LSP yet exists for this FEC, the edge LSR must cooperate with the other LSRs in defining a new LSP.
3. Within the MPLS domain, as each LSR receives a labelled packet, it:
   a Removes the incoming label and attaches the appropriate outgoing label to the packet.
   b Forwards the packet to the next LSR along the LSP.

4. The egress edge LSR strips the label, reads the IP packet header, and forwards the packet to its final destination.

Network recovery
If a failure occurs in a network, recovery is achieved by moving traffic from the failed part of the network to another portion of the network. It is important that this recovery operation is performed as fast as possible to prevent too many packets from being dropped at the failure point. If recovery is fast enough, the failure can be unnoticeable (resilient) or have minimal impact for end-users. Recovery techniques can be used in both circuit switched and packet switched networks.

When a link or node in a network fails, traffic that was using the failed component must change the path used to reach the destination. This is done by the routers adjacent to the failure, which update their forwarding tables to forward packets on a different path that avoids the failed component. The path that the traffic was using before the failure is called the primary path or the working path, and the new path is called the backup path.

Recovery techniques often consist of four steps. First, the network must be able to detect the failure. Second, nodes that detect the failure must notify certain nodes in the network of the failure. Which nodes are notified depends on the recovery technique used. Third, a backup path must be computed. Fourth, instead of sending traffic on the primary path, a node called the Path Switching Node must send traffic on the backup path. This step is called switchover and completes the repair of the network after a failure.

Consider a unicast communication. When a link fails on the path between the sender and the receiver, users experience service disruption until these four steps are completed. The length of the service disruption is the time between the instant when the last bit sent before the failure is received and the instant when the first bit of data that uses the backup path arrives at the receiver. The total time of service disruption is:

Service Disruption = Time to detect failure + Time to notify + Time to compute backup + Time to switchover

Types of errors in networks
Any of the resources within a network might fail. The traditional error in a network is a link failure caused by a link getting cut or unplugged by mistake. The following table shows the distribution of outage events in that study.

Table 1 : Failure types

Planned events can be upgrades or configuration of the router OS. This includes patches, new software releases, routing/policy changes etc. It can also be replacement or reconfiguration of hardware. These outages are planned in time, so they do not need to result in any downtime for traffic, as traffic can be moved to another segment of the network before the outage occurs.

The unplanned events are the failures that do affect network traffic, as they are not planned and therefore have to be reacted upon when they occur. Hardware failures are categorized as failures that happen in any control or system component of a router, such as the router processor, switch fabric, fans etc., and failures of a particular line card. Software failures in the control plane are failures in any of the routing control plane protocols, like the IP routing protocol. Data plane software failures are failures in the forwarding software.

Link failures are failures of links, such as link cuts and failures in transmission equipment. No device can be more reliable than its power supply; it has therefore been common to install extra power supplies to eliminate the risk of power outage. As can be seen in the table, power outage is only 1% of the observed causes for outage in a network. As we can also see from this table, link failures and control plane failures are the most common kinds of failure.

Network layer recovery
In packet switched networks like the Internet, the recovery mechanisms rely on the capabilities of the routing protocols. If a failure occurs, each router in the network has to be informed about the failure and the routing tables in each router have to be recomputed using a shortest path first algorithm. When the routing tables have been recomputed and have converged, all paths that were using a failed link or node are rerouted through other links.

To detect when a failure has occurred, routing protocols use timers. The routing protocol on a node periodically sends hello messages to its neighbor nodes. If a neighbor doesn't receive a preset number of hello messages in a time interval, the routing protocol interprets this as a failure on the link to that node. When a network failure has been detected, all the nodes in the network have to be notified about the failure. This is done by the node that detected the failure, which sends an LSA (Link State Advertisement) to all other nodes in the network. Then each node has to perform a shortest path first calculation with the failed link pruned from the network. When all nodes have finished the calculation, the routing tables are updated and traffic can be rerouted in the network.
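
The recomputation step described above, a shortest path first calculation with the failed link pruned from the network, can be sketched in Python as follows. This is a minimal illustration using Dijkstra's algorithm, not the implementation of any particular routing protocol; the topology GRAPH, its link costs and the function name shortest_path are hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical symmetric topology: node -> {neighbor: link cost}
GRAPH: Dict[str, Dict[str, int]] = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}


def shortest_path(graph: Dict[str, Dict[str, int]], src: str, dst: str,
                  failed_links: Optional[Set[FrozenSet[str]]] = None) -> Optional[List[str]]:
    """Dijkstra's algorithm that ignores every link listed in failed_links."""
    failed = failed_links or set()
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, cost in graph.get(u, {}).items():
            if frozenset((u, v)) in failed:
                continue                  # link pruned after the failure report
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None                       # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


print(shortest_path(GRAPH, "A", "D"))                            # ['A', 'B', 'C', 'D']
print(shortest_path(GRAPH, "A", "D", {frozenset(("B", "C"))}))   # ['A', 'C', 'D']
```

In the second call the link B-C is treated as failed, so the calculation settles on the next cheapest route, which is what every router does independently once it has received the LSA.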

Routing protocols make the network able to survive one or multiple link or node failures, but they do not guarantee the recovery time. The recovery time strongly depends on the dimension of the network and on the routing protocol used. With the current default settings in routing protocols such as OSPF, the network takes several tens of seconds to recover from a failure. The main reason for this is the time it takes before a failure is detected using the hello protocol.

There have been some proposals to reduce the recovery time by reducing the value of the hello message interval. But it has been shown that a reduction of this interval can cause problems. If the interval is too small, there is an increased chance that network congestion causes the loss of several consecutive hellos, so a node may conclude that a link is down when it is only congested. It will then flood the network with LSAs, causing all the routers to make shortest path first calculations. Then it will start to receive hello messages again and conclude that the link is up, thus flooding the network with new LSAs. This will not only lead to unnecessary routing changes but also increase the processing load on the nodes.

The IP layer relies on the physical layer that provides transport of IP packets between two points in the network. There is no standard mechanism for exchange of information about network status between the IP layer and the lower layers; therefore routing protocols in IP run independently of the type and structure of the physical layer.

MPLS Network Recovery Mechanisms
As in recovery mechanisms for other layers, MPLS recovery operates by forwarding traffic on a new path around the point of failure in the network. Where this path is placed, when it is calculated and how it is set up depend on the recovery mechanism used.

If no recovery mechanism is used in an MPLS domain, recovery is performed by the default action of the signalling protocol used to set up and maintain an LSP; this is called best-effort restoration. In the case of RSVP-TE, a failure can be detected by hello messages or by the RSVP-TE soft state mechanism. If a failure is detected by the hello message state, then a PathErr will be sent back to the ingress LSR. As described earlier, the soft state mechanism in RSVP-TE keeps the LSP reserved by periodically sending PATH and RESV messages. If a failure occurs on the LSP, the soft state will fail to update the reservation and, depending on whether it was the PATH or the RESV message that failed, a PathErr or ResvErr message will be sent back upstream to the ingress LSR of that LSP.

When the failure has been detected and notified to the ingress LSR, it will try to set up the LSP again. If the LSP was strictly routed, the ingress will periodically try to set up the path again, until it becomes available. If the LSP was loosely routed, the ingress can try to find a link and node disjoint path to the egress and try to set up the LSP on the new path instead. Or it can try to set up the same LSP again and let the node adjacent to the failure compute a new path from the point of failure to the egress. Once the path is established, the traffic can start to flow on the new LSP. The restoration time of this scheme is typically large and is only acceptable for best-effort traffic. If recovery is to be performed faster, other MPLS mechanisms need to be used that try to minimize the failure notification time or the backup path computation time.

The path recovery mechanisms in MPLS can be classified according to two criteria: recovery by rerouting vs. protection switching, and local repair vs. global repair.

Recovery Path Placement
There are two main categories of where a recovery path is placed. These are called global repair and local repair.

• Global Repair: The intent of global repair is to protect against any link or node failure on a path or on a segment of a path, with the obvious exception of the fault occurring at the ingress or egress LSR of the protected path segment. In global repair, the point of repair (POR) is usually distant from the failure and needs to be notified by a fault indication signal (FIS). In global repair, end-to-end path recovery can be applied, where the working path is completely link and node disjoint from the protection path. This has the advantage that all links and nodes on the working path are protected by a single recovery path. But it has the disadvantage that a FIS has to be propagated all the way back to the ingress LSR before recovery can start.

Figure 7 : Global Recovery Path

• Local Repair: The intent of local repair is to protect against a link failure or neighbour node failure and to minimize the amount of time required for failure propagation. If a repair can be performed local to the device that detects the failure, restoration can be achieved faster. In local repair (also known as local recovery), the node immediately upstream of the failure is the node that initiates the recovery operation, the path switch LSR (PSL). Local repair can be set up in two different ways.

• Link Recovery: In link recovery the goal is to protect a link in the working path from failure. If a failure occurs on the protected link in the working path, the protection path connects the PSL and the PML with a path that is disjoint from the working path with regard to the failed link.

Figure 8: Link Recovery

• Node Recovery: In node recovery the goal is to protect a node in the working path from failure. In node protection the recovery path is disjoint from the working path with regard to the protected node and the links from the protected node. In node protection there will be one or more hops between the PSL and the PML, in contrast to link protection, where only the link between two adjacent nodes is protected.

Figure 9 : Node Protection
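
The difference between the two local repair variants can be sketched in Python as a path search that excludes either the protected link or the protected node. This is a minimal illustration under stated assumptions: the topology ADJ, the working path and the function names find_path, link_bypass and node_bypass are hypothetical, and a plain breadth-first search (fewest hops) stands in for whatever constraint-based computation a real LSR would use.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical topology: node -> list of neighbors (links are bidirectional).
ADJ: Dict[str, List[str]] = {
    "R1": ["R2", "R5"],
    "R2": ["R1", "R3", "R5"],
    "R3": ["R2", "R4", "R6"],
    "R4": ["R3", "R6"],
    "R5": ["R1", "R2", "R6"],
    "R6": ["R5", "R3", "R4"],
}
WORKING_PATH = ["R1", "R2", "R3", "R4"]


def find_path(adj: Dict[str, List[str]], src: str, dst: str,
              excluded_links: Iterable[Tuple[str, str]] = (),
              excluded_nodes: Iterable[str] = ()) -> Optional[List[str]]:
    """Breadth-first search that never uses an excluded link or node."""
    banned_links = {frozenset(link) for link in excluded_links}
    banned_nodes = set(excluded_nodes)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for v in adj[u]:
            if v in prev or v in banned_nodes or frozenset((u, v)) in banned_links:
                continue
            prev[v] = u
            queue.append(v)
    return None


def link_bypass(adj, working_path, i):
    """Protect link working_path[i] - working_path[i+1]; PSL and PML are adjacent."""
    psl, pml = working_path[i], working_path[i + 1]
    return find_path(adj, psl, pml, excluded_links=[(psl, pml)])


def node_bypass(adj, working_path, i):
    """Protect node working_path[i]; excluding the node also excludes its links."""
    psl, node, pml = working_path[i - 1], working_path[i], working_path[i + 1]
    return find_path(adj, psl, pml, excluded_nodes=[node])


print(link_bypass(ADJ, WORKING_PATH, 1))  # ['R2', 'R5', 'R6', 'R3']
print(node_bypass(ADJ, WORKING_PATH, 2))  # ['R2', 'R5', 'R6', 'R4']
```

In the link case the bypass merges back at R3, the node directly behind the protected link, while in the node case it must merge one hop further downstream at R4 because R3 itself is assumed to be unavailable.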

Protection and Restoration

Centralized model
In this model, an Ingress Node is responsible for resolving the restoration as the FIS arrives. This method needs an alternate disjoint backup path for each active path (working path). Protection is always activated at the Ingress Node, irrespective of where along the working path a failure occurs. This means that failure information has to be propagated all the way back to the source node before a protection switch is activated. If no reverse LSP is created, the fault indication can only be activated as a Path Continuity Test. This method has the advantage of setting up only one backup path per working path, and is a centralized protection method, which means only one LSR has to be provided with PSL functions. On the other hand, this method has a high cost (in terms of time), especially if a Path Continuity Test is used as the fault indication method. If we want to use an RNT as the fault indication method, we have to provide a new LSP to reverse the signal back to the Ingress Node.

Figure 10 : Centralized model

LSP segment restoration (local repair)
With this method restoration starts from the point of the failure. It is a local method and is transparent to the Ingress Node. The main advantage is that it offers lower restoration time than the centralized model. With this method, an added difficulty arises in that every LSR where protection is required has to be provided with a switchover function (PSL). A PML should be provided too. Another drawback is the maintenance and creation of multiple LSP backups (one per protected domain). This could result in low resource utilization and a high development complexity. On the other hand, this method offers transparency to the Ingress Node and faster restoration time than centralized mechanisms. An intermediate solution could be the establishment of local backup, but only for protection segments where a high degree of reliability is required, supplying only protected path segments.

Figure 11 : Local restoration

Reverse backup
The main idea of this method is to reverse traffic at the point of failure of the protected LSP back to the source switch of the protected path (Ingress Node) via a Reverse Backup LSP. As soon as a failure along the protected path is detected, the LSR at the ingress of the failed link reroutes incoming traffic by redirecting this traffic into the alternative LSP, traversing the path in the opposite direction to the primary LSP. This method is especially good in network scenarios where the traffic streams are very sensitive to packet losses. Another advantage is that it simplifies fault indication, since the reverse backup offers, at the same time, a way of transmitting the FIS to the Ingress Node and to the recovery traffic path. One disadvantage could be poor resource utilization: two backups per protected domain are needed. Another drawback is the time taken to reverse the fault indication to the Ingress Node, as with the Centralized model.

Figure 12 : Reverse backup utilization

Protection
In network scenarios with a high degree of protection requirements, the possibility of a multilevel fault management application could improve performance, compared to the application of a single method. Nonetheless, complete scenario construction is highly costly (in terms of time and resources), so intermediate scenarios could be built instead. For example, our protected domain could start with just a centralized method and, as the protection requirements grow (a node fails repeatedly), a new local backup could be provided, thus making a new protection mechanism available. These two methods can be activated at the same time. If a fault is located at node 4 or link 3-4, the local method will be applied, transparent to the Ingress Node (due to the local notification method).

Another advantage of using multilevel protection domains occurs in scenarios with multiple faults. For example (fig. 13-a), if node 4 fails (or LSPs 3-4 or 4-5 fail) and only a centralized backup LSP 1-2-7-5 is used, and node 6 or links 1-6, 6-7 fail (during restoration), traffic could be routed to 1-2-3-7-5, avoiding the link and node faults. Another example (fig. 13-b) occurs when applying local restoration and link 3-7 fails. In this case, if another backup mechanism (centralized model) is applied, the faults are avoided.

Figures 13 (a), (b) : Multilevel protection application.

Complete scenario construction could be complex and could result in low resource utilization. The development of this method could be highly costly (in terms of time and resources). We propose to analyse network survivability requirements (QoS requirements) and establish different protection levels. Depending on the protection level for a specific MPLS backbone, a more or less complex scenario is constructed. LSP backup creation, bandwidth reservation, fault indication, method activation, and PML/PSL function assignment could be carried out explicitly, via a network administrator, or could be done automatically, via agent application. These agents could be placed on every Ingress Node, developing a centralized policy whereby these agents could analyse LSP statistics and network behaviours, and apply defined protection actions. The specific development, creation and application of agents are beyond the scope of this paper, yet certain proposals could be taken into account when elaborating upon more specific agent development.

Network recovery. they may be not used Disadvantages  Complex  Slow: require extra process time to setup path and reserve resource Characteristic: the resource are reserved and used after the failure Route : can be dynamically computed Resource Efficiency : High Time used : Long Reliability: can survive under multiplex faults Implementation: Complex Route: predetermined Resource Efficiency: Low Time used: Short Reliability: mainly for single fault Implementation: Simple Page 16 . except to signal (set up) the switches along  the pre-determined backup path (1:1or 1:N) Disadvantages • Usually can only recover from single link fault (what if the precomputed path fails?) • Inefficient usage of resource Path Characteristic: the resource are reserved before the failure. protection and restoration for Multi-Protocol Label Switching (MPLS) Figure 14 : Agent application to a dynamic multilevel MPLS protection domain Comparison between Protection and Restoration Restoration Protection Advantages  Usually can recover from multiplex element faults  More efficient usage of resource Advantages  Simple and Quick: especially if it uses 1+1  Do not require much extra process time.