Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)


Introduction to Multi-Protocol Label Switching (MPLS)
Multi-Protocol Label Switching (MPLS) is rapidly becoming a key technology for
use in core networks, including converged data and voice networks. MPLS does
not replace IP routing, but works alongside existing and future routing
technologies to provide very high-speed data forwarding between Label-Switched
Routers (LSRs) together with reservation of bandwidth for traffic flows with
differing Quality of Service (QoS) requirements. MPLS enhances the services that
can be provided by IP networks, offering scope for Traffic Engineering, guaranteed
QoS and Virtual Private Networks (VPNs).
MPLS uses a technique known as label switching to forward data through
the network. A small, fixed-format label is inserted in front of each data packet on
entry into the MPLS network. At each hop across the network, the packet is routed
based on the value of the incoming interface and label, and dispatched to an
outgoing interface with a new label value. The path that data follows through a
network is defined by the transition in label values, as the label is swapped at
each LSR. Since the mapping between labels is constant at each LSR, the path is
determined by the initial label value. Such a path is called a Label Switched Path
(LSP).
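The label swapping described above can be sketched as a per-LSR lookup keyed on the incoming interface and label. This is a minimal illustration only: the LSR names, interface names, label values and the `trace_lsp` helper are hypothetical and do not come from any particular router implementation.

```python
from typing import Dict, List, Tuple

# Per-LSR table: (incoming interface, incoming label) -> (outgoing interface, outgoing label).
LabelTable = Dict[Tuple[str, int], Tuple[str, int]]

# Hypothetical label tables for three LSRs; labels only have meaning on one link.
TABLES: Dict[str, LabelTable] = {
    "A": {("if0", 17): ("if1", 22)},
    "B": {("if0", 22): ("if2", 35)},
    "C": {("if0", 35): ("if1", 41)},
}

# Hypothetical wiring: (LSR, outgoing interface) -> (next LSR, its incoming interface).
LINKS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("A", "if1"): ("B", "if0"),
    ("B", "if2"): ("C", "if0"),
}


def trace_lsp(lsr: str, interface: str, label: int) -> List[Tuple[str, int, int]]:
    """Follow the label swaps hop by hop; return (LSR, label in, label out) per hop."""
    hops: List[Tuple[str, int, int]] = []
    while lsr in TABLES and (interface, label) in TABLES[lsr]:
        out_if, out_label = TABLES[lsr][(interface, label)]
        hops.append((lsr, label, out_label))
        next_hop = LINKS.get((lsr, out_if))
        if next_hop is None:
            break  # the packet leaves the modelled part of the network here
        lsr, interface = next_hop
        label = out_label
    return hops


if __name__ == "__main__":
    for hop in trace_lsp("A", "if0", 17):
        print(hop)
```

Because each table entry is fixed, the initial label (17 on interface if0 of LSR A in this sketch) determines the whole path, which is the sense in which the path is a Label Switched Path.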
MPLS Header Format

Label Value : 20-bit label value
Exp : experimental use – can indicate class of service
S : bottom of stack indicator – 1 for the bottom label, 0 otherwise
TTL : Time To Live

Figure 5 : MPLS label format
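As an illustration of the header fields listed above, the sketch below packs and unpacks one 32-bit label stack entry. The text only gives the width of the label (20 bits); the widths used here for Exp (3 bits), S (1 bit) and TTL (8 bits) are taken from the standard MPLS label stack encoding and should be read as an assumption relative to this document. The function names are hypothetical.

```python
import struct
from typing import Dict


def pack_entry(label: int, exp: int, s: int, ttl: int) -> bytes:
    """Pack one label stack entry as 4 bytes in network byte order."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    # Assumed layout: label in bits 31..12, Exp in 11..9, S in bit 8, TTL in 7..0.
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)


def unpack_entry(data: bytes) -> Dict[str, int]:
    """Decode the first 4 bytes of data into the four header fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


if __name__ == "__main__":
    raw = pack_entry(label=16, exp=0, s=1, ttl=64)
    print(raw.hex(), unpack_entry(raw))
```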
MPLS Operation

An MPLS network or internet consists of a set of nodes, called Label Switched Routers (LSRs), that are capable of switching and routing packets on the basis of a label which has been appended to each packet. Labels define a flow of packets between two endpoints or, in the case of multicast, between a source endpoint and a multicast group of destination endpoints. For each distinct flow, called a Forwarding Equivalence Class (FEC), a specific path through the network of LSRs is defined. Thus, MPLS is a connection-oriented technology. Associated with each FEC is a traffic characterization that defines the QoS requirements for that flow. The LSRs do not need to examine or process the IP header, but rather simply forward each packet based on its label value. Therefore, the forwarding process is simpler than with an IP router.

Figure 6: MPLS Operation

Figure 6, based on one in [4], depicts the operation of MPLS within a domain of MPLS-enabled routers. The following are key elements of the operation.

1. Prior to the routing and delivery of packets in a given FEC, a path through the network, known as a Label Switched Path (LSP), must be defined and the QoS parameters along that path must be established. The QoS parameters determine (1) how many resources to commit to the path, and (2) what queuing and discarding policy to establish at each LSR for packets in this FEC. To accomplish these tasks, two protocols are used to exchange the necessary information among routers:
   a. An interior routing protocol, such as OSPF, is used to exchange reachability and routing information.
   b. Labels must be assigned to the packets for a particular FEC. Because the use of globally unique labels would impose a management burden and limit the number of usable labels, labels have local significance only, as discussed subsequently. A network operator can specify explicit routes manually and assign the appropriate label values. Alternatively, a protocol is used to determine the route and establish label values between adjacent LSRs. Either of two protocols can be used for this purpose: the Label Distribution Protocol (LDP) or an enhanced version of RSVP.
2. A packet enters an MPLS domain through an ingress edge LSR where it is processed to determine which network-layer services it requires, defining its QoS. The LSR assigns this packet to a particular FEC, and therefore a particular LSP, appends the appropriate label to the packet, and forwards the packet. If no LSP yet exists for this FEC, the edge LSR must cooperate with the other LSRs in defining a new LSP.
3. Within the MPLS domain, as each LSR receives a labelled packet, it:
   a. Removes the incoming label and attaches the appropriate outgoing label to the packet.
   b. Forwards the packet to the next LSR along the LSP.

4. The egress edge LSR strips the label, reads the IP packet header, and forwards the packet to its final destination.

Network recovery
If a failure occurs in a network, recovery is achieved by moving traffic from the failed part of the network to another portion of the network. It is important that this recovery operation is performed as fast as possible to prevent too many packets from being dropped at the failure point. If recovery is fast enough, the failure can be unnoticeable (resilient) or have minimal impact for end-users. Recovery techniques can be used in both circuit-switched and packet-switched networks. When a link or node in a network fails, traffic that was using the failed component must change the path used to reach the destination. This is done by the routers adjacent to the failure, which update their forwarding tables to forward packets on a different path that avoids the failing component. The path that the traffic was using before the failure is called the primary path or the working path, and the new path is called the backup path.

Recovery techniques often consist of four steps. First, the network must be able to detect the failure. Second, nodes that detect the failure must notify certain nodes in the network of the failure. Which nodes are notified depends on the recovery technique used. Third, a backup path must be computed. Fourth, a node called the Path Switching Node must send traffic on the backup path instead of the primary path. This step is called switchover and completes the repair of the network after a failure.

Consider a unicast communication. When a link fails on the path between the sender and the receiver, users experience service disruption until these four steps are completed. The length of the service disruption is the time between the instant when the last bit sent before the failure is received and the instant when the first bit of data that uses the backup path arrives at the receiver. The total time of service disruption is:

Service Disruption = Time to detect failure + Time to notify + Time to compute backup + Time to switchover

Types of errors in networks
Any of the resources within a network might fail. The traditional error in a network is a link failure caused by a link getting cut or unplugged by mistake. The following table shows the distribution of outage events in that study.

Table 1 : Failure types

Planned events can be upgrades or configuration of the router OS. This includes patches, new software releases, routing/policy changes etc. It can also be replacement or reconfiguration of hardware. These outages are planned in time, so they do not need to result in any downtime for traffic, as traffic can be moved to another segment of the network before the outage occurs.

The unplanned events are the failures that do affect network traffic, as they are not planned and therefore have to be reacted upon when they occur. Software failures in the control plane are failures in any of the routing control plane protocols, such as the IP routing protocol. Data plane software failures are failures in the forwarding software and failures of a particular line card. Hardware failures are categorized as failures that happen in any control or system component of a router, such as the router processor, switch fabric, fans etc.

Link failures are failures of links, such as link cuts and failures in transmission equipment. No device can be more reliable than its power supply; it has therefore been common to install extra power supplies to eliminate the risk of power outage. As can be seen in the table, power outage is only 1% of the observed causes for outage in a network. The table also shows that link failures and control plane failures are the most common kinds of failure.

Network layer recovery
In packet-switched networks like the Internet, the recovery mechanisms rely on the capabilities of the routing protocols. If a failure occurs, each router in the network has to be informed about the failure and the routing tables in each router have to be recomputed using a shortest path first algorithm. When the routing tables have been recomputed and have converged, all paths that were using a failed link or node are rerouted through other links.

To detect when a failure has occurred, routing protocols use timers. The routing protocol on a node periodically sends hello messages to its neighbor nodes. If a neighbor does not receive a preset number of hello messages in a time interval, the routing protocol interprets this as a failure on the link to that node. When a network failure has been detected, all the nodes in the network have to be notified about the failure. This is done by the node that detected the failure, which sends an LSA (Link State Advertisement) to all other nodes in the network. Each node then has to perform a shortest path first calculation with the failed link pruned from the network. When all nodes have finished the calculation, the routing tables are updated and traffic can be rerouted in the network.
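The shortest path first calculation with the failed link pruned can be sketched with Dijkstra's algorithm. This is a minimal illustration, not the procedure of any specific routing protocol; the topology, the link costs and the `shortest_path` helper are hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional

Graph = Dict[str, Dict[str, int]]

# Hypothetical topology with symmetric link costs.
GRAPH: Graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}


def shortest_path(graph: Graph, src: str, dst: str,
                  failed_links: FrozenSet[FrozenSet[str]] = frozenset()) -> Optional[List[str]]:
    """Dijkstra's algorithm that skips every link listed in failed_links."""
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            if frozenset((node, nbr)) in failed_links:
                continue  # pruned: the link was reported as failed
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None  # destination unreachable after pruning
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(shortest_path(GRAPH, "A", "D"))
    print(shortest_path(GRAPH, "A", "D", frozenset({frozenset(("B", "C"))})))
```

In this sketch every node would run the same calculation on its own copy of the topology once the LSA announcing the failed link has reached it.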

Routing protocols make the network able to survive one or multiple link or node failures, but they do not guarantee the recovery time. The recovery time strongly depends on the dimension of the network and on the routing protocol used. With the current default settings in routing protocols such as OSPF, the network takes several tens of seconds to recover from a failure. The main reason for this is the time it takes before a failure is detected using the hello protocol.

There have been some proposals to reduce the recovery time by reducing the hello message interval, but it has been shown that a reduction of this interval can cause problems. If the interval is too small, there is an increased chance that network congestion causes the loss of several consecutive hellos, so a node may think that a link is down when it is only congested. The node will then flood the network with LSAs, causing all the routers to make shortest path first calculations. It will then start to receive hello messages again and conclude that the link is up, thus flooding the network with new LSAs. This not only leads to unnecessary routing changes but also increases the processing load on the nodes.

The IP layer relies on the physical layer, which provides transport of IP packets between two points in the network. There is no standard mechanism for exchange of information about network status between the IP layer and the lower layers; routing protocols in IP therefore run transparently with regard to the type and structure of the physical layer.

MPLS Network Recovery Mechanisms
As in recovery mechanisms for other layers, MPLS recovery operates by forwarding traffic on a new path around the point of failure in the network. Where this path is placed, when it is calculated and how it is set up depend on the recovery mechanism used.
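The four-step breakdown given earlier (Service Disruption = detect + notify + compute + switchover) can be sketched in code to show how the choice of recovery mechanism changes individual terms. All numbers below are illustrative assumptions, not measurements: the hello interval of 10 s with four missed hellos is chosen to resemble commonly used OSPF defaults, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTimes:
    """The four terms of the service disruption formula, in milliseconds."""
    detect_ms: int
    notify_ms: int
    compute_ms: int
    switchover_ms: int

    def disruption_ms(self) -> int:
        return self.detect_ms + self.notify_ms + self.compute_ms + self.switchover_ms


def hello_detection_ms(hello_interval_ms: int, missed_hellos: int) -> int:
    """Detection time when a failure is declared after a number of missed hellos."""
    return hello_interval_ms * missed_hellos


# Assumed values: backup path computed after the failure (rerouting).
rerouting = RecoveryTimes(
    detect_ms=hello_detection_ms(10_000, 4),
    notify_ms=500,
    compute_ms=1_000,
    switchover_ms=100,
)

# Assumed values: backup path computed in advance, so the compute term vanishes.
precomputed = RecoveryTimes(
    detect_ms=hello_detection_ms(10_000, 4),
    notify_ms=500,
    compute_ms=0,
    switchover_ms=100,
)

if __name__ == "__main__":
    print("rerouting:", rerouting.disruption_ms(), "ms")
    print("precomputed backup:", precomputed.disruption_ms(), "ms")
```

With these assumed values the detection term dominates both totals, which is consistent with the observation above that hello-based detection is the main reason for long recovery times.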

If no recovery mechanism is used in an MPLS domain, recovery is performed by the default action of the signalling protocol used to set up and maintain an LSP. In the case of RSVP-TE, a failure can be detected by hello messages or by the RSVP-TE soft state mechanism. If a failure is detected by hello messages, a PathErr will be sent back to the ingress LSR. As described earlier, the soft state mechanism in RSVP-TE keeps the LSP reserved by periodically sending PATH and RESV messages. If a failure occurs on the LSP, the soft state will fail to update the reservation and, depending on whether it was the PATH or the RESV message that failed, a PathErr or ResvErr message will be sent back upstream to the ingress LSR of that LSP.

When the failure has been detected and notified to the ingress LSR, it will try to set up the LSP again. If the LSP was strictly routed, the ingress will periodically try to set up the path again until it becomes available. If the LSP was loosely routed, the ingress can try to find a link and node disjoint path to the egress and set up the LSP on the new path instead. Alternatively, it can try to set up the same LSP again and let the node adjacent to the failure compute a new path from the point of failure to the egress. Once the path is established, traffic can start to flow on the new LSP. This is called best-effort restoration. The restoration time of this scheme is typically large and is only acceptable for best-effort traffic. If recovery is to be performed faster, other MPLS mechanisms that minimize the failure notification time or the backup path computation time need to be used.

The path recovery mechanisms in MPLS can be classified according to two criteria: recovery by rerouting vs. protection switching, and local repair vs. global repair.

Recovery Path Placement
There are two main categories of where a recovery path is placed. These are called global repair and local repair.

• Global Repair: The intent of global repair is to protect against any link or node failure on a path or on a segment of a path, with the obvious exception of the fault occurring at the ingress or egress LSR of the protected path segment. In global repair, the point of repair (POR) is usually distant from the failure and needs to be notified by a fault indication signal (FIS). In global repair, end-to-end path recovery can be applied, where the working path is completely link and node disjoint from the protection path. This has the advantage that all links and nodes on the working path are protected by a single recovery path. It has the disadvantage that a FIS has to be propagated all the way back to the ingress LSR before recovery can start.

Figure 7 : Global Recovery Path

• Local Repair: The intent of local repair is to protect against a link failure or neighbour node failure and to minimize the amount of time required for failure propagation. If a repair can be performed local to the device that detects the failure, restoration can be achieved faster. In local repair (also known as local recovery), the node immediately upstream of the failure is the node that initiates the recovery operation, the path switch LSR (PSL). Local repair can be set up in two different ways.

• Link Recovery: In link recovery the goal is to protect a link in the working path from failure. If a failure occurs on the protected link in the working path, the protection path connects the PSL and the path merge LSR (PML) with a path that is disjoint from the working path with regard to the failed link.

• Node Recovery: In node recovery the goal is to protect a node in the working path from failure. In node protection the recovery path is disjoint from the working path with regard to the protected node and the links from the protected node. In node protection there will be one or more hops between PSL and PML, in contrast to link protection, where only the link between two adjacent nodes is protected.

Figure 8: Link Recovery

Figure 9 : Node Protection
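The difference between link recovery and node recovery can be sketched as two bypass searches on a small graph: one that avoids only the protected link, and one that avoids the protected node together with its links. This is an illustrative sketch; the topology, the working path and the helper names are hypothetical, and a plain breadth-first search stands in for whatever path computation a real LSR would use.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional

Graph = Dict[int, List[int]]

# Hypothetical topology; the working path is assumed to be 1-2-3-4.
GRAPH: Graph = {
    1: [2, 5],
    2: [1, 3, 5],
    3: [2, 4, 6],
    4: [3, 6],
    5: [1, 2, 6],
    6: [3, 4, 5],
}
WORKING = [1, 2, 3, 4]


def bfs_path(graph: Graph, src: int, dst: int,
             banned_nodes: FrozenSet[int] = frozenset(),
             banned_links: FrozenSet[FrozenSet[int]] = frozenset()) -> Optional[List[int]]:
    """Fewest-hop path from src to dst avoiding the banned nodes and links."""
    if src in banned_nodes or dst in banned_nodes:
        return None
    prev: Dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            cur: Optional[int] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nbr in graph.get(node, []):
            if nbr in prev or nbr in banned_nodes:
                continue
            if frozenset((node, nbr)) in banned_links:
                continue
            prev[nbr] = node
            queue.append(nbr)
    return None


def link_bypass(graph: Graph, working: List[int], i: int) -> Optional[List[int]]:
    """Protect link working[i]-working[i+1]; PSL and PML are its two end nodes."""
    psl, pml = working[i], working[i + 1]
    return bfs_path(graph, psl, pml, banned_links=frozenset({frozenset((psl, pml))}))


def node_bypass(graph: Graph, working: List[int], i: int) -> Optional[List[int]]:
    """Protect node working[i]; PSL is the node before it and PML the node after it."""
    psl, protected, pml = working[i - 1], working[i], working[i + 1]
    return bfs_path(graph, psl, pml, banned_nodes=frozenset({protected}))


if __name__ == "__main__":
    print("bypass for link 2-3:", link_bypass(GRAPH, WORKING, 1))
    print("bypass for node 3:", node_bypass(GRAPH, WORKING, 2))
```

In the node case the PSL and PML are two hops apart on the working path, whereas in the link case they are adjacent, matching the distinction drawn above.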

Protection and Restoration

Centralized model
In this model, an Ingress Node is responsible for resolving the restoration as the FIS arrives. This method needs an alternate disjoint backup path for each active path (working path). Protection is always activated at the Ingress Node, irrespective of where along the working path a failure occurs. This means that failure information has to be propagated all the way back to the source node before a protection switch is activated. If no reverse LSP is created, the fault indication can only be activated as a Path Continuity Test. This method has the advantage of setting up only one backup path per working path and, being a centralized protection method, only one LSR has to be provided with PSL functions. On the other hand, this method has an elevated cost (in terms of time), especially if a Path Continuity Test is used as the fault indication method. If we want to use an RNT as the fault indication method, we have to provide a new LSP to carry the signal back to the Ingress Node.

Figure 10 : Centralized model

LSP segment restoration (local repair)
With this method, restoration starts from the point of the failure. It is a local method and is transparent to the Ingress Node. The main advantage is that it offers lower restoration time than the centralized model. With this method, an added difficulty arises in that every LSR where protection is required has to be provided with the switchover function (PSL). A PML should be provided too. Another drawback is the maintenance and creation of multiple LSP backups (one per protected domain). This could result in low resource utilization and a high development complexity. On the other hand, this method offers transparency to the Ingress Node and faster restoration time than centralized mechanisms. An intermediate solution could be the establishment of local backups, but only for segments where a high degree of reliability is required, supplying only protected path segments.

Figure 11 : Local restoration

Reverse backup
The main idea of this method is to reverse traffic at the point of failure of the protected LSP back to the source switch of the protected path (Ingress Node) via a Reverse Backup LSP. As soon as a failure along the protected path is detected, the LSR at the ingress of the failed link reroutes incoming traffic by redirecting it into the alternative LSP, which traverses the path in the opposite direction to the primary LSP. This method is especially good in network scenarios where the traffic streams are very sensitive to packet losses. Another advantage is that it simplifies fault indication, since the reverse backup offers, at the same time, a way of transmitting the FIS to the Ingress Node and to the recovery traffic path. One disadvantage could be poor resource utilization: two backups per protected domain are needed. Another drawback is the time taken to reverse the fault indication to the Ingress Node, as with the centralized model.

Figure 12 : Reverse backup utilization

Protection
In network scenarios with a high degree of protection requirements, the possibility of a multilevel fault management application could improve performance compared to the application of a single method. Nonetheless, complete scenario construction is highly costly (in terms of time and resources), so intermediate scenarios could be built instead. For example, our protected domain could start with just a centralized method and, as the protection requirements grow (a node fails repeatedly), a new local backup could be provided, thus making a new protection mechanism available. These two methods can be activated at the same time. If a fault is located at node 4 or link 3-4, the local method will be applied, transparent to the Ingress Node (due to the local notification method).
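The trade-off between the centralized and the local model can be sketched by asking which node acts as PSL and how far the fault indication has to travel upstream. This is a simplified illustration under stated assumptions: the working path 1-3-4-5, the set of locally protected links and the function names are hypothetical, and the hop count is only a stand-in for notification delay.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]

# Hypothetical working path and the links assumed to have a local backup.
WORKING_PATH: List[int] = [1, 3, 4, 5]
LOCALLY_PROTECTED: Set[Link] = {(3, 4)}


def switching_node(working_path: List[int], failed_link: Link, model: str) -> Tuple[int, int]:
    """Return (PSL, hops the fault indication travels upstream) for a failed link."""
    upstream, downstream = failed_link
    idx = working_path.index(upstream)
    if idx + 1 >= len(working_path) or working_path[idx + 1] != downstream:
        raise ValueError("failed link is not on the working path")
    if model == "centralized":
        psl = working_path[0]  # protection is always activated at the Ingress Node
    elif model == "local":
        psl = upstream  # the node immediately upstream of the failure switches
    else:
        raise ValueError("unknown model: " + model)
    return psl, idx - working_path.index(psl)


def multilevel_choice(failed_link: Link, locally_protected: Set[Link]) -> str:
    """Use the local method where a local backup exists, else fall back to centralized."""
    return "local" if failed_link in locally_protected else "centralized"


if __name__ == "__main__":
    for link in [(3, 4), (4, 5)]:
        model = multilevel_choice(link, LOCALLY_PROTECTED)
        print(link, model, switching_node(WORKING_PATH, link, model))
```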

Another advantage of using multilevel protection domains arises in scenarios with multiple faults. For example (fig. 13-a), if node 4 fails (or LSPs 3-4 or 4-5 fail) and only a centralized backup LSP 1-2-7-5 is used, and node 6 or links 1-6, 6-7 fail during restoration, traffic could be routed to 1-2-3-7-5, avoiding the failed links and node. Another example (fig. 13-b) occurs when applying local restoration and link 3-7 fails. In this case, if another backup mechanism (centralized model) is applied, the faults are avoided.

Figures 13 (a), (b) : Multilevel protection application.

Depending on the protection level for a specific MPLS backbone, a more or less complex scenario is constructed. Complete scenario construction could be complex and could result in low resource utilization. The development of this method could be highly costly (in terms of time and resources). We propose to analyse network survivability requirements (QoS requirements) and establish different protection levels. LSP backup creation, bandwidth reservation, fault indication, method activation, and assignment of PML/PSL functions could be carried out explicitly, via a network administrator, or could be done automatically, via agent application. These agents could be placed on every Ingress Node, developing a centralized policy whereby these agents could analyse LSP statistics and network behaviours, and apply defined protection actions. The specific development, creation and application of agents are beyond the scope of this paper, yet certain proposals could be taken into account when elaborating upon more specific agent development.

Figure 14 : Agent application to a dynamic multilevel MPLS protection domain

Comparison between Protection and Restoration

Restoration
Advantages
• Usually can recover from multiple element faults
• More efficient usage of resources
Disadvantages
• Complex
• Slow: requires extra processing time to set up the path and reserve resources
Characteristic: resources are reserved and used after the failure
Route: can be dynamically computed
Resource Efficiency: High
Time used: Long
Reliability: can survive multiple faults
Implementation: Complex

Protection
Advantages
• Simple and quick, especially if it uses 1+1
• Does not require much extra processing time, except to signal (set up) the switches along the pre-determined backup path (1:1 or 1:N)
Disadvantages
• Usually can only recover from a single link fault (what if the precomputed path fails?)
• Inefficient usage of resources
Characteristic: resources are reserved before the failure and may go unused
Route: predetermined
Resource Efficiency: Low
Time used: Short
Reliability: mainly for a single fault
Implementation: Simple