
Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Introduction to Multi-Protocol Label Switching (MPLS)
Multi-Protocol Label Switching (MPLS) is rapidly becoming a key technology for
use in core networks, including converged data and voice networks. MPLS does
not replace IP routing, but works alongside existing and future routing
technologies to provide very high-speed data forwarding between Label-Switched
Routers (LSRs) together with reservation of bandwidth for traffic flows with
differing Quality of Service (QoS) requirements. MPLS enhances the services that
can be provided by IP networks, offering scope for Traffic Engineering, guaranteed
QoS and Virtual Private Networks (VPNs).
MPLS uses a technique known as label switching to forward data through
the network. A small, fixed-format label is inserted in front of each data packet on
entry into the MPLS network. At each hop across the network, the packet is forwarded
based on its incoming interface and label value, and dispatched to an
outgoing interface with a new label value. The path that data follows through a
network is defined by the transition in label values, as the label is swapped at
each LSR. Since the mapping between labels is constant at each LSR, the path is
determined by the initial label value. Such a path is called a Label Switched Path
(LSP).
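The label-swapping behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not an implementation of any real router: the node names, interface names, label values, and the `LFIB` and `LINKS` tables are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical wiring: (node, outgoing interface) -> (next node, incoming interface).
LINKS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("A", "if1"): ("B", "if0"),
    ("B", "if2"): ("C", "if0"),
}

# Hypothetical per-LSR label tables:
# (incoming interface, incoming label) -> (outgoing interface, outgoing label).
LFIB: Dict[str, Dict[Tuple[str, int], Tuple[str, int]]] = {
    "A": {("if0", 17): ("if1", 22)},
    "B": {("if0", 22): ("if2", 35)},
    "C": {("if0", 35): ("if1", 48)},
}


def trace_lsp(node: str, in_if: str, label: int,
              max_hops: int = 32) -> List[Tuple[str, int, int]]:
    """Follow the label swaps from one LSR onward.

    Returns a list of (node, incoming label, outgoing label) tuples.
    The walk stops when no table entry matches or the link leaves
    the modelled network.
    """
    hops: List[Tuple[str, int, int]] = []
    for _ in range(max_hops):
        entry = LFIB.get(node, {}).get((in_if, label))
        if entry is None:
            break  # no mapping: the packet cannot be label-switched here
        out_if, out_label = entry
        hops.append((node, label, out_label))
        nxt = LINKS.get((node, out_if))
        if nxt is None:
            break  # outgoing interface leads outside the modelled domain
        node, in_if = nxt
        label = out_label
    return hops
```

Because each table is fixed, calling `trace_lsp("A", "if0", 17)` always yields the same sequence of swaps (17 to 22 at A, 22 to 35 at B, 35 to 48 at C), which is the sense in which the initial label value determines the path.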
MPLS Header Format



Label Value : 20-bit label value
Exp : 3-bit field for experimental use; can indicate class of service
S : 1-bit bottom of stack indicator; 1 for the bottom label, 0 otherwise
TTL : 8-bit Time To Live

Figure 5 : MPLS label format
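As a rough illustration of the header fields listed above, the following Python sketch packs and unpacks one 32-bit label entry. It assumes the usual field order and widths (20-bit label, 3-bit Exp, 1-bit S, 8-bit TTL, most significant bits first); the function names are made up for this example.

```python
import struct
from typing import Dict


def pack_entry(label: int, exp: int, s: int, ttl: int) -> bytes:
    """Encode one label entry as 4 bytes in network byte order."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= exp < 2 ** 3:
        raise ValueError("exp must fit in 3 bits")
    if s not in (0, 1):
        raise ValueError("s must be 0 or 1")
    if not 0 <= ttl < 2 ** 8:
        raise ValueError("ttl must fit in 8 bits")
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)


def unpack_entry(data: bytes) -> Dict[str, int]:
    """Decode the first 4 bytes of data into the four header fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }
```

For example, `pack_entry(16, 0, 1, 64)` produces the bytes `00 01 01 40`, and `unpack_entry` on those bytes returns the original four values.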
MPLS Operation

An MPLS network or internet consists of a set of nodes, called Label Switched Routers (LSRs), that are capable of switching and routing packets on the basis of a label which has been appended to each packet. Labels define a flow of packets between two endpoints or, in the case of multicast, between a source endpoint and a multicast group of destination endpoints. For each distinct flow, called a Forwarding Equivalence Class (FEC), a specific path through the network of LSRs is defined. Thus, MPLS is a connection-oriented technology. Associated with each FEC is a traffic characterization that defines the QoS requirements for that flow. The LSRs do not need to examine or process the IP header, but rather simply forward each packet based on its label value. Therefore, the forwarding process is simpler than with an IP router.

Figure 6 : MPLS Operation

Figure 6, based on one in [4], depicts the operation of MPLS within a domain of MPLS-enabled routers. The following are key elements of the operation.

1. Prior to the routing and delivery of packets in a given FEC, a path through the network, known as a Label Switched Path (LSP), must be defined and the QoS parameters along that path must be established. The QoS parameters determine (1) how many resources to commit to the path, and (2) what queuing and discarding policy to establish at each LSR for packets in this FEC. To accomplish these tasks, two protocols are used to exchange the necessary information among routers:
   a. An interior routing protocol, such as OSPF, is used to exchange reachability and routing information.
   b. Labels must be assigned to the packets for a particular FEC. Because the use of globally unique labels would impose a management burden and limit the number of usable labels, labels have local significance only, as discussed subsequently. A network operator can specify explicit routes manually and assign the appropriate label values. Alternatively, a protocol is used to determine the route and establish label values between adjacent LSRs. Either of two protocols can be used for this purpose: the Label Distribution Protocol (LDP) or an enhanced version of RSVP.
2. A packet enters an MPLS domain through an ingress edge LSR where it is processed to determine which network-layer services it requires, defining its QoS. The LSR assigns this packet to a particular FEC, and therefore a particular LSP, appends the appropriate label to the packet, and forwards the packet. If no LSP yet exists for this FEC, the edge LSR must cooperate with the other LSRs in defining a new LSP.
3. Within the MPLS domain, as each LSR receives a labelled packet, it:
   a. Removes the incoming label and attaches the appropriate outgoing label to the packet.
   b. Forwards the packet to the next LSR along the LSP.
4. The egress edge LSR strips the label, reads the IP packet header, and forwards the packet to its final destination.

Network recovery
If a failure occurs in a network, recovery is achieved by moving traffic from the failed part of the network to another portion of the network. It is important that this recovery operation is performed as fast as possible to prevent too many packets from being dropped at the failure point. If recovery is fast enough, the failure can be unnoticeable (resilient) or have minimal impact for end-users. Recovery techniques can be used in both circuit-switched and packet-switched networks.
When a link or node in a network fails, traffic that was using the failed component must change the path used to reach the destination. This is done by the routers adjacent to the failure, which update their forwarding tables to forward packets on a different path that avoids the failing component. The path that the traffic was using before the failure is called the primary path or the working path, and the new path is called the backup path.
Recovery techniques often consist of four steps. First, the network must be able to detect the failure. Second, nodes that detect the failure must notify certain nodes in the network of the failure. Which nodes are notified depends on the recovery technique used. Third, a backup path must be computed. Fourth, a node called the Path Switching Node must send traffic on the backup path instead of the primary path. This step is called switchover and completes the repair of the network after a failure.
Consider a unicast communication. When a link fails on the path between the sender and the receiver, users experience service disruption until these four steps are completed. The length of the service disruption is the time between the instant when the last bit sent before the failure is received and the instant when the first bit of data that uses the backup path arrives at the receiver. The total time of service disruption is:
Service Disruption = Time to detect failure + Time to notify + Time to compute backup + Time to switchover

Types of errors in networks
Any of the resources within a network might fail. The traditional error in a network is a link failure caused by a link getting cut or unplugged by mistake. The following table shows the distribution of outage events in that study.

Table 1 : Failure types

Planned events can be upgrades or configuration of the router OS. This includes patches, new software releases, routing/policy changes etc. It can also be replacement or reconfiguration of hardware. These outages are planned in time so they do not need to result in any downtime for traffic, as traffic can be moved to another segment of the network before the outage occurs. The unplanned events are the failures that do affect network traffic, as they are not planned and therefore have to be reacted upon when they occur. Hardware failures are categorized as failures that happen in any control or system component of a router, such as the router processor, switch fabric, fans etc., and failures of a particular line card. Data plane software failures are failures in the forwarding software. Software failures in the control plane are failures in any of the routing control plane protocols, like the IP routing protocol.

Link failures are failures of links, such as link cuts and failures in transmission equipment. No device can be more reliable than its power supply, so it has been common to install extra power supplies to eliminate the risk of power outage. As can be seen in the table, power outage accounts for only 1% of the observed causes of outage in a network. The table also shows that link failures and control plane failures are the most common kinds of failure.

Network layer recovery
In packet-switched networks like the Internet, the recovery mechanisms rely on the capabilities of the routing protocols. If a failure occurs, each router in the network has to be informed about the failure and the routing tables in each router have to be recomputed using a shortest path first algorithm. When the routing tables have been recomputed and have converged, all paths that were using a failed link or node are rerouted through other links.
To detect when a failure has occurred, routing protocols use timers. The routing protocol on a node periodically sends hello messages to its neighbor nodes. If a neighbor does not receive a preset number of hello messages in a time interval, the routing protocol interprets this as a failure on the link to that node. When a network failure has been detected, all the nodes in the network have to be notified about the failure. This is done by the node that detected the failure, which sends an LSA (Link State Advertisement) to all other nodes in the network. Then each node has to perform a shortest path first calculation with the failed link pruned from the network. When all nodes have finished the calculation, the routing tables are updated and traffic can be rerouted in the network.
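The two mechanisms in the paragraph above, hello-based failure detection and a shortest path first calculation with the failed link pruned, can be sketched as follows. This is a simplified model under stated assumptions: the topology, link costs, and timer values are hypothetical, links are treated as bidirectional, and the SPF routine is a plain Dijkstra search rather than the code of any particular routing protocol.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical topology: node -> {neighbor: link cost}.
TOPOLOGY: Dict[str, Dict[str, int]] = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}


def neighbor_is_down(last_hello: float, now: float,
                     hello_interval: float, missed_allowed: int) -> bool:
    """Declare the link down once more than missed_allowed hello intervals pass silently."""
    return now - last_hello > hello_interval * missed_allowed


def shortest_path(graph: Dict[str, Dict[str, int]], src: str, dst: str,
                  failed_links: Iterable[Tuple[str, str]] = ()) -> Optional[List[str]]:
    """Dijkstra search that skips every link listed in failed_links."""
    failed = {frozenset(link) for link in failed_links}
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        if u == dst:
            break
        for v, cost in graph.get(u, {}).items():
            if frozenset((u, v)) in failed:
                continue  # pruned link
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None  # destination unreachable after pruning
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

With this topology, `shortest_path(TOPOLOGY, "A", "D")` returns `["A", "B", "C", "D"]`; after pruning the B-C link with `failed_links=[("B", "C")]` it returns `["A", "C", "D"]`. The detection function shows why the hello interval dominates recovery time: with a 10-second interval and 4 allowed misses, a failure is only declared after more than 40 seconds of silence.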

Routing protocols make the network able to survive one or multiple link or node failures, but they do not guarantee the recovery time. The recovery time strongly depends on the dimension of the network and on the routing protocol used. With the current default settings in routing protocols such as OSPF, the network takes several tens of seconds to recover from a failure. The main reason for this is the time it takes before a failure is detected using the hello protocol. There have been some proposals to reduce the recovery time by reducing the hello message interval, but it has been shown that reducing this interval can cause problems. If the interval is too small, there is an increased chance that network congestion causes the loss of several consecutive hellos, so a node may think that a link is down when it is only congested. It will then flood the network with LSAs, causing all the routers to make shortest path first calculations. Then it will start to receive hello messages again and conclude that the link is up, thus flooding the network with new LSAs. This not only leads to unnecessary routing changes but also increases the processing load on the nodes.
The IP layer relies on the physical layer, which provides transport of IP packets between two points in the network. There is no standard mechanism for exchanging information about network status between the IP layer and the lower layers; routing protocols in IP therefore run independently of the type and structure of the physical layer.

MPLS Network Recovery Mechanisms
As in recovery mechanisms for other layers, MPLS recovery operates by forwarding traffic on a new path around the point of failure in the network. Where this path is placed, when it is calculated, and how it is set up depend on the recovery mechanism used.

If no recovery mechanism is used in an MPLS domain, recovery is performed by the default action of the signalling protocol used to set up and maintain an LSP. This is called best-effort restoration. In the case of RSVP-TE, a failure can be detected by hello messages or by the RSVP-TE soft state mechanism. If a failure is detected by the hello message state, a PathErr will be sent back to the ingress LSR. As described earlier, the soft state mechanism in RSVP-TE keeps the LSP reserved by periodically sending PATH and RESV messages. If a failure occurs on the LSP, the soft state will fail to update the reservation and, depending on whether it was the PATH or the RESV message that failed, a PathErr or ResvErr message will be sent back upstream to the ingress LSR of that LSP.
When the failure has been detected and the ingress LSR has been notified, the ingress LSR will try to set up the LSP again. If the LSP was strictly routed, the ingress will periodically try to set up the path again until it becomes available. If the LSP was loosely routed, the ingress can try to find a link and node disjoint path to the egress and set up the LSP on the new path instead. Alternatively, it can try to set up the same LSP again and let the node adjacent to the failure compute a new path from the point of failure to the egress. Once the path is established, traffic can start to flow on the new LSP. The restoration time of this scheme is typically large and is only acceptable for best-effort traffic. If recovery must be performed faster, other MPLS mechanisms need to be used that minimize the failure notification time or the backup path computation time.
The path recovery mechanisms in MPLS can be classified according to two criteria: recovery by rerouting vs. protection switching, and local repair vs. global repair.

Recovery Path Placement
There are two main categories of where a recovery path is placed. These are called global repair and local repair.

• Global Repair: The intent of global repair is to protect against any link or node failure on a path or on a segment of a path, with the obvious exception of a fault occurring at the ingress or egress LSR of the protected path segment. In global repair, the Point of Repair (POR) is usually distant from the failure and needs to be notified by a Fault Indication Signal (FIS). In global repair, end-to-end path recovery can be applied, where the working path is completely link and node disjoint from the protection path. This has the advantage that all links and nodes on the working path are protected by a single recovery path. It has the disadvantage that a FIS has to be propagated all the way back to the ingress LSR before recovery can start.

Figure 7 : Global Recovery Path

• Local Repair: The intent of local repair is to protect against a link failure or neighbour node failure and to minimize the amount of time required for failure propagation. If a repair can be performed local to the device that detects the failure, restoration can be achieved faster. In local repair (also known as local recovery), the node immediately upstream of the failure is the node that initiates the recovery operation, the Path Switch LSR (PSL). Local repair can be set up in two different ways.

• Link Recovery: In link recovery the goal is to protect a link in the working path from failure. If a failure occurs on the protected link in the working path, the protection path connects the PSL and the Path Merge LSR (PML) with a path that is disjoint from the working path with regard to the failed link.
• Node Recovery: In node recovery the goal is to protect a node in the working path from failure. In node protection the recovery path is disjoint from the working path with regard to the protected node and the links from the protected node. In node protection there will be one or more hops between PSL and PML, in contrast to link protection, where only the link between two adjacent nodes is protected.

Figure 8 : Link Recovery
Figure 9 : Node Protection
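The placement rules for global repair, link recovery, and node recovery can be compared with a small path-search sketch. Everything here is illustrative: the topology and working path are hypothetical, the search is a plain breadth-first search that ignores bandwidth and cost, and the helper names are invented for this example.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical undirected topology as adjacency lists.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "E"],
    "B": ["A", "C", "E"],
    "C": ["B", "D", "F"],
    "D": ["C", "F"],
    "E": ["A", "B", "F"],
    "F": ["C", "D", "E"],
}
WORKING = ["A", "B", "C", "D"]  # hypothetical working path, ingress A to egress D


def detour(graph: Dict[str, List[str]], src: str, dst: str,
           avoid_links: Iterable[Tuple[str, str]] = (),
           avoid_nodes: Iterable[str] = ()) -> Optional[List[str]]:
    """Fewest-hop path from src to dst that avoids the given links and nodes."""
    banned_links = {frozenset(link) for link in avoid_links}
    banned_nodes = set(avoid_nodes)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        u = path[-1]
        if u == dst:
            return path
        for v in graph.get(u, ()):
            if v in seen or v in banned_nodes or frozenset((u, v)) in banned_links:
                continue
            seen.add(v)
            queue.append(path + [v])
    return None


def link_recovery_path(graph, working, i):
    """Protect link working[i]-working[i+1]; PSL and PML are its two end nodes."""
    psl, pml = working[i], working[i + 1]
    return detour(graph, psl, pml, avoid_links=[(psl, pml)])


def node_recovery_path(graph, working, i):
    """Protect node working[i]; PSL and PML are its neighbours on the working path."""
    psl, pml = working[i - 1], working[i + 1]
    return detour(graph, psl, pml, avoid_nodes=[working[i]])


def global_recovery_path(graph, working):
    """Ingress-to-egress path that is link and node disjoint from the working path."""
    links = list(zip(working, working[1:]))
    return detour(graph, working[0], working[-1],
                  avoid_links=links, avoid_nodes=working[1:-1])
```

On this topology, protecting link B-C gives the detour B, E, F, C; protecting node C gives B, E, F, D; and the global recovery path is A, E, F, D. Avoiding a node implicitly avoids all of its links, which is why the node case needs no separate link list.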

Protection and Restoration

Centralized model
In this model, an Ingress Node is responsible for resolving the restoration when the FIS arrives. This method needs an alternate disjoint backup path for each active path (working path). Protection is always activated at the Ingress Node, irrespective of where along the working path a failure occurs. This means that failure information has to be propagated all the way back to the source node before a protection switch is activated. This method has the advantage of setting up only one backup path per working path, and it is a centralized protection method, which means only one LSR has to be provided with PSL functions. On the other hand, this method has an elevated cost (in terms of time), especially if a Path Continuity Test is used as a fault indication method. If we want to use a Reverse Notification Tree (RNT) as a fault indication method, we have to provide a new LSP to carry the signal back to the Ingress Node. If no reverse LSP is created, the fault indication can only be activated as a Path Continuity Test.

Figure 10 : Centralized model

LSP segment restoration (local repair)
With this method, restoration starts from the point of the failure. It is a local method and is transparent to the Ingress Node. The main advantage is that it offers lower restoration time than the centralized model. On the other hand, an added difficulty arises in that every LSR where protection is required has to be provided with the switchover function (PSL). A PML should be provided too. Another drawback is the maintenance and creation of multiple LSP backups (one per protected domain). This could result in low resource utilization and a high development complexity. In return, this method offers transparency to the Ingress Node and faster restoration time than centralized mechanisms. An intermediate solution could be the establishment of local backups only for segments where a high degree of reliability is required, so that only those path segments are protected.

Figure 11 : Local restoration

Reverse backup
The main idea of this method is to reverse traffic at the point of failure of the protected LSP back to the source switch of the protected path (Ingress Node) via a Reverse Backup LSP. As soon as a failure along the protected path is detected, the LSR at the ingress of the failed link reroutes incoming traffic by redirecting it into the alternative LSP, which traverses the path in the opposite direction to the primary LSP. This method is especially good in network scenarios where the traffic streams are very sensitive to packet losses. Another advantage is that it simplifies fault indication, since the reverse backup offers, at the same time, a way of transmitting the FIS to the Ingress Node and a path for the recovery traffic. One disadvantage could be poor resource utilization: two backups per protected domain are needed. Another drawback is the time taken to send the fault indication back to the Ingress Node, as with the Centralized model.

Figure 12 : Reverse backup utilization

Protection
In network scenarios with a high degree of protection requirements, a multilevel fault management application could improve performance compared to the application of a single method. Nonetheless, constructing the complete scenario is highly costly (in terms of time and resources), so intermediate scenarios could be built instead. For example, our protected domain could start with just a centralized method and, as the protection requirements grow (a node fails repeatedly), a new local backup could be provided, thus making a new protection mechanism available. These two methods can be activated at the same time. If a fault is located at node 4 or link 3-4, the local method will be applied, transparent to the Ingress Node (due to the local notification method).

Another advantage of using multilevel protection domains arises in scenarios with multiple faults. For example (fig. 13-a), if node 4 fails (or LSPs 3-4 or 4-5 fail) and only a centralized backup LSP 1-2-7-5 is used, and node 6 or links 1-6, 6-7 fail during restoration, traffic could be routed to 1-2-3-7-5, avoiding the link and node faults. Another example (fig. 13-b) occurs when local restoration is applied and link 3-7 fails. In this case, if another backup mechanism (centralized model) is applied, the faults are avoided.

Figures 13 (a), (b) : Multilevel protection application.

Complete scenario construction could be complex and could result in low resource utilization. The development of this method could be highly costly (in terms of time and resources). Depending on the protection level required for a specific MPLS backbone, a more or less complex scenario is constructed. We propose to analyse network survivability requirements (QoS requirements) and establish different protection levels. LSP backup creation, bandwidth reservation, fault indication, method activation, and PML/PSL function assignment could be carried out explicitly, by a network administrator, or automatically, via agent application. These agents could be placed on every Ingress Node, developing a centralized policy whereby the agents analyse LSP statistics and network behaviour and apply defined protection actions. The specific development, creation, and application of agents are beyond the scope of this paper, yet certain proposals could be taken into account when elaborating upon more specific agent development.

Figure 14 : Agent application to a dynamic multilevel MPLS protection domain

Comparison between Protection and Restoration

Restoration
Advantages
• Usually can recover from multiple element faults
• More efficient usage of resources
Disadvantages
• Complex
• Slow: requires extra processing time to set up the path and reserve resources
Path Characteristic: the resources are reserved and used after the failure
Route: can be dynamically computed
Resource Efficiency: High
Time used: Long
Reliability: can survive under multiple faults
Implementation: Complex

Protection
Advantages
• Simple and quick, especially if it uses 1+1
• Does not require much extra processing time, except to signal (set up) the switches along the pre-determined backup path (1:1 or 1:N)
Disadvantages
• Usually can only recover from a single link fault (what if the precomputed path fails?)
• Inefficient usage of resources
Path Characteristic: the resources are reserved before the failure; they may not be used
Route: predetermined
Resource Efficiency: Low
Time used: Short
Reliability: mainly for single fault
Implementation: Simple