
Airports Authority Of India Achieves Continuous Operations Despite Natural Calamity


Dependence On Technology Is Growing, Yet Few Firms Are Resilient

Global weather patterns are more unpredictable than ever. Firms continue to experience costly technology outages due to catastrophic weather events like cyclones, hurricanes, floods, and tornadoes. Forward-thinking infrastructure and operations (I&O) pros need to account for such disasters in their planning and proactively prepare a comprehensive disaster recovery (DR) plan, yet most do not. I&O pros lag because they don’t document a recovery plan for each disaster that could hit them. Two-thirds of I&O pros test their recovery plans annually or less frequently. Even though businesses increasingly rely on technology and know that long outages can cost them dearly, they don’t leverage automation tools and procedures for a speedier recovery. As a result, only a slim majority feels prepared for major disruptions.

Situation: Chennai Airport Was Underwater And Quickly Losing Power

On December 3, 2015, Chennai experienced its heaviest rainstorm in a century. Massive flooding submerged the entire city, incapacitating its infrastructure. At Chennai International Airport, water engulfed not only the tarmac and bays but the entire facility, blocking any movement in or out; the airport quickly lost power. In the Chennai data center, the Airports Authority of India (AAI) runs a highly critical application, the airport management system (AMS), for functions including airport operations, passenger service, airline communication, and revenue generation. If this data center loses power and connectivity, AMS goes down, and airport operations must resort to daunting manual processes to manage their facilities. Hence, recovering this business-critical application is imperative; all operations teams must race against the clock to deliver seamless service to all stakeholders. AMS is regarded as the lifeline of airport operations, as it:

Serves multiple airports. Hosted in a multitenant fashion, AMS, which delivers services to 10 airports via its centralized facility in Chennai, is the nerve center of the airport operations control center (AOCC) at all 10 airports. The Chennai data center hosts the primary instance of AMS; the recovery home is in Kolkata.

Is the key to operational safety. Air traffic control (ATC) communicates with pilots to inform them about what gates and runways inbound and outbound flights should use. ATC allocates gates to aircraft depending on factors like their type, size, arrival or departure time, and whether the gate is equipped to handle the aircraft. The system typically automates this resource allocation to avoid mistakes.

Provides passengers with real-time updates. All passenger communication, such as information regarding flight times, departure gates, and baggage carousels, depends on the data reported through AMS.

Accurately reports resource usage data for airlines to settle payments with AAI. Airlines are charged for the resources they use. AAI tracks the utilization of resources like aerobridges, bays, carousels, check-in counters, gates, hangars, and lounges for billing purposes. AMS holds this data and is used for communication with the airport operations team. Airlines also use AMS to review operational key performance indicators (KPIs) like on-time performance, ramp congestion, rescheduled or swapped flights, and arrival and departure information.

Approach: Comprehensive Planning, Automation, And Frequent Testing

In June 2013, AAI issued a request for proposal to build and operate the airport operations service for seven years. It selected systems integrator Coforge, formerly NIIT Technologies (NIIT), to implement AMS and establish an AOCC at all 10 airports. AAI decided to run AMS in multitenant mode at its Chennai data center and chose Kolkata as the recovery site, with NIIT operating the data center.

AAI Needed Availability, Disaster Preparedness, And Quick Recovery

The AAI team worked with NIIT to develop a comprehensive DR and business continuity plan, ensure recovery readiness, and prepare to handle any outage. This plan had several components; to achieve its goals, AAI started by:

Clearly defining service-level agreements (SLAs) for availability. AAI clearly articulated its expectations for availability KPIs, both at the application level and for every infrastructure component. It worked with the provider team to identify possible failure scenarios, response plans, and testing frequency. For example, it stipulated that the AMS application should have 99% uptime, meaning that it could be down for a total of about 15 minutes every day.

Figure: The Comprehensive Planning, Recovery Automation, And Frequent Testing Of AAI And Coforge (Formerly NIIT Technologies) Deliver Results

Enforcing the execution of quarterly test drills and saving the results for audit. AAI demanded that the systems integrator test its DR preparedness on a quarterly basis, document every DR test, and retain DR drill results. These drill results had to be available for audit purposes and to measure sustainable operations.

Improving its recovery readiness. AAI stipulated that DR tests must succeed 100% of the time. Every time the NIIT operations team runs a DR test, the DR site must run the active instance for 48 hours before the primary instance swings back into service. This forces the systems integrator to update the DR setup every time the production environment changes.
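The 99% uptime target mentioned above can be turned into a downtime budget with simple arithmetic. The following is a minimal sketch; the function name and the choice of a 24-hour measurement window are assumptions made for illustration and do not come from AAI's actual SLA documents.

```python
def allowed_downtime_minutes(uptime_percent: float, period_hours: float = 24.0) -> float:
    """Return the downtime budget, in minutes, implied by an uptime target."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The unavailable fraction of the period, converted from hours to minutes.
    return (1.0 - uptime_percent / 100.0) * period_hours * 60.0


if __name__ == "__main__":
    daily = allowed_downtime_minutes(99.0)
    yearly_hours = allowed_downtime_minutes(99.0, period_hours=24 * 365) / 60
    print(f"99% uptime allows about {daily:.1f} minutes of downtime per day")
    print(f"That is about {yearly_hours:.1f} hours over a 365-day year")
```

A 99% target over one day works out to 14.4 minutes, which matches the "about 15 minutes" figure quoted above.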

The Systems Integrator Upped The Ante By Preparing For Even More Stringent Timelines

The solution included building redundancies to handle local failures. To address the recovery scenarios, the NIIT project team prepared for even more stringent recovery targets than AAI specified. It developed the processes to align to AAI’s demands, and its operations team prepared to handle any recovery operation by making sure it was able to:

Beat the recovery time SLA by automating the recovery workflows. The operations team had to comply with recovery times ranging from 15 minutes to 24 hours depending upon incident severity. Even a 60- to 90-minute outage would cause an inconsistent experience at various levels for all stakeholders, so the team decided to automate recovery procedures regardless of incident severity. Automation ensured that a site failure could easily be recovered in less than 15 minutes.
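One way to picture an automated recovery workflow is as an ordered list of scripted steps whose total duration is checked against the recovery time objective (RTO). The sketch below is purely illustrative: the step names, the `RecoveryStep` and `run_recovery` helpers, and the placeholder actions are hypothetical and do not describe the tooling the operations team actually used.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RecoveryStep:
    """One automated action in the runbook (names here are hypothetical)."""
    name: str
    action: Callable[[], None]


def run_recovery(
    steps: List[RecoveryStep],
    rto_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], float, bool]:
    """Run steps in order; return (completed names, elapsed seconds, met RTO?).

    An exception raised by any step propagates and stops the workflow,
    so a failed step is never silently skipped.
    """
    start = clock()
    completed: List[str] = []
    for step in steps:
        step.action()
        completed.append(step.name)
    elapsed = clock() - start
    return completed, elapsed, elapsed <= rto_seconds


if __name__ == "__main__":
    # Placeholder actions; a real runbook would call site-specific tooling.
    steps = [
        RecoveryStep("promote standby database", lambda: None),
        RecoveryStep("start application at recovery site", lambda: None),
        RecoveryStep("redirect client traffic", lambda: None),
    ]
    done, elapsed, within_rto = run_recovery(steps, rto_seconds=15 * 60)
    print(done, f"{elapsed:.3f}s", "within RTO" if within_rto else "RTO missed")
```

Passing the clock in as a parameter keeps the timing logic testable: a drill can replay recorded timestamps instead of waiting in real time.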

Triggered the recovery workflow. The tech operations team geared up to execute its best practices and pull the trigger for the final run; each team member was on a hotline, keeping a close watch on the progress. They triggered the workflow for an orchestrated recovery effort. The recovery instance of the AMS application went live in the Kolkata data center within 11 minutes. While the Chennai airport was out of commission, the fully operational AMS at the Kolkata recovery data center continued to serve the other nine airports. “The Chennai deluge was sudden, but none of the airports lost AMS services. This ensured uninterrupted airport services — as if nothing had changed — while the Chennai airport took time to resume normal operations.” (S.V. Satish, executive director of IT, Airports Authority of India)

Were ready for surprises and applied out-of-the-box thinking. Every incident offers some lessons, and this one was no exception. On the day of the event, the MPLS networks of both the primary and secondary network providers went down. In the absence of enterprise network connectivity, the teams looked for an innovative solution and ultimately made the connection via the mobile phone network. Keeping its sights on a quick recovery, the network operations team successfully connected the primary and recovery sites over the mobile network and failed over the entire application to that network.
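The network fallback described above, in which both MPLS providers fail and traffic moves to a mobile link, amounts to choosing the first healthy link from a priority-ordered list. Here is a minimal sketch of that idea; the link names, the `pick_link` helper, and the status table are hypothetical stand-ins, not the team's actual configuration.

```python
from typing import Callable, Optional, Sequence


def pick_link(links: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first link, in priority order, whose health check passes."""
    for link in links:
        if is_up(link):
            return link
    return None  # No link is available; the caller must escalate.


if __name__ == "__main__":
    # Hypothetical link names and statuses mirroring the incident described above.
    priority = ["mpls-primary", "mpls-secondary", "mobile"]
    status = {"mpls-primary": False, "mpls-secondary": False, "mobile": True}
    print(pick_link(priority, lambda name: status[name]))
```

Listing the last-resort path explicitly in the priority order means the fallback can be rehearsed in drills rather than improvised during an incident.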

What It Means

Disasters Can Happen Anywhere: You Must Be Ready

Disasters come in many forms, their impact can vary widely in scope, and they happen everywhere. A disaster could be natural, like the Chennai floods, or it could be the result of human (in)action, whether malicious or accidental. As we were preparing this report, news headlines announced a fire that took Atlanta’s Hartsfield-Jackson International Airport (the busiest in the world) offline for a full day and an Amtrak train derailment south of Seattle that resulted in multiple fatalities and business disruption. Unlike the Chennai incident, the Atlanta and Amtrak events were preventable. As the Boy Scout motto says, “Be prepared.”

 
