Transforming Operational Acceptance Testing

Abstract

How often have we heard that a system is not working properly even after delivery? What are we leaving out? Did we prepare our people well enough in advance to ensure that our systems work? These questions seem simple, but answering them evidently is not. We need to track these issues before we take our systems live. Systems usually fail at the final checkpoint: operational acceptance tests. IT managers must adopt a metrics-driven approach to Operational Acceptance Testing (OAT) to ensure that the system operates the way it was designed without disrupting the installation, network, or business that uses it.

Recognizing the Face of Risk

A rapidly changing IT landscape has increased infrastructure complexity and cost because of the growing number of applications to be rationalized. Applications may work well in isolation, but when they are integrated to work as a service, the risk of unplanned downtime increases. Such downtime reduces overall revenue, causes non-conformance with regulatory requirements, damages reputation, and makes customers unhappy.

Applications must be carefully tested before a system is released. However, the traditional testing lifecycle deals only with functional and performance aspects. This leaves gaps in operational testing, which widen as requirements grow more complex in a changing landscape. To deal with these challenges, firms need to test infrastructure and applications together to uncover issues such as unplanned outages and recovery problems prior to go-live. According to an Information Technology and Intelligence Corporation survey, companies cannot achieve zero downtime; one out of 10 companies needs greater than 99.999% availability. Operational Acceptance Testing reduces downtime and supports the business goals of faster system delivery at a lower cost.

This white paper discusses how companies can deal with downtime and achieve high availability through a metrics-driven approach to Operational Acceptance Testing.

Early Bird Catches the Worm

Operational Acceptance Testing is done before going live to confirm that the entire configuration of the production system is accurate. The database, servers, and code are deployed, and a batch program is then run in the pre-production environment to avoid any hazards in the live environment. Application functionality tests on the base infrastructure are also performed before the system go-live stage.

Exhibit 1: Operational Acceptance Tests

OAT as an Operational Metrics-Driven Methodology

According to the IT Process Institute's Visible Ops Handbook, 80% of unplanned outages are due to changes, configurations, and release integration issues. The Enterprise Management Association also reports that 60% of the problems result from poor configurations. Exhibit 2 shows some of the factors that result in revenue loss due to unplanned downtime.

Exhibit 2: Factors Influencing Revenue Loss Due to Unplanned Downtime

Mean Time Between Failure (MTBF) = Total # of uptime hours ÷ Total # of failures

MTBF = 694 ÷ 6 = 115.66 hours

Mean Time to Repair (MTTR) = Total # of downtime hours ÷ Total # of failures

MTTR = 26 ÷ 6 = 4.33 hours
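The two calculations above can be sketched in Python. This is a minimal illustration rather than part of any OAT tool: the `OutageLog` container and the function names are assumptions made for this example, while the figures (694 uptime hours, 26 downtime hours, 6 failures) are the ones used in the formulas above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutageLog:
    """Hypothetical summary of one observation window."""
    uptime_hours: float
    downtime_hours: float
    failures: int


def mtbf(log: OutageLog) -> float:
    """Mean Time Between Failure: uptime hours per failure."""
    if log.failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return log.uptime_hours / log.failures


def mttr(log: OutageLog) -> float:
    """Mean Time to Repair: downtime hours per failure."""
    if log.failures <= 0:
        raise ValueError("MTTR is undefined without at least one failure")
    return log.downtime_hours / log.failures


# Figures taken from the worked example in this paper.
log = OutageLog(uptime_hours=694, downtime_hours=26, failures=6)
print(f"MTBF = {mtbf(log):.2f} hours")
print(f"MTTR = {mttr(log):.2f} hours")
```

Rounded to two decimals, the MTBF comes out as 115.67 hours; the worked figure above truncates it to 115.66. The MTTR is 4.33 hours either way.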

IT Operational Metrics

Operational Acceptance Testing helps IT managers baseline operational metrics by examining systems. Improving on these baseline measures helps to avoid unplanned downtime, reduce the downtime window, and minimize the downtime cost.

Exhibit 3: IT Operational Metrics

Measuring Availability

Availability is the proportion of time that a system is functioning. It is often described in terms of nines, with a greater number of nines indicating a higher availability rate (see Table 1 below).

Table 1: Availability in Terms of Nines
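To make the "nines" concrete, the following sketch converts a number of nines into an availability percentage and the downtime it allows per year. It is an illustration only: the function names are hypothetical, and a 365-day year (8,760 hours) is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8,760 hours


def availability_for_nines(nines: int) -> float:
    """Return availability as a percentage, e.g. 3 nines -> 99.9."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return 100.0 * (1 - 10 ** -nines)


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    unavailable_fraction = 1 - availability_pct / 100.0
    return unavailable_fraction * HOURS_PER_YEAR * 60


for n in range(1, 6):
    pct = availability_for_nines(n)
    minutes = allowed_downtime_minutes(pct)
    print(f"{n} nine(s): {pct:.3f}% -> {minutes:,.1f} minutes of downtime per year")
```

Under these assumptions, five nines (99.999%) leaves roughly 5.3 minutes of downtime per year, which shows how demanding that target is.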

The key metrics involved in measuring the availability of the system are MTBF and MTTR.

Availability = [MTBF ÷ (MTBF + MTTR)] × 100

Availability = [115.66 ÷ (115.66 + 4.33)] × 100 = 96.39%

So, the availability of the database system in this scenario is approximately 96%.
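As a cross-check, the availability can be computed two ways: from MTBF and MTTR, and directly from total uptime and downtime. The sketch below assumes the same figures as the worked example (694 uptime hours, 26 downtime hours, 6 failures); the function name `availability_pct` is illustrative.

```python
def availability_pct(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as a percentage: MTBF / (MTBF + MTTR) * 100."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


# Figures from the worked example in this paper.
uptime_hours, downtime_hours, failures = 694, 26, 6
mtbf = uptime_hours / failures
mttr = downtime_hours / failures

from_means = availability_pct(mtbf, mttr)
from_totals = uptime_hours / (uptime_hours + downtime_hours) * 100

print(f"From MTBF and MTTR: {from_means:.2f}%")
print(f"From total hours:   {from_totals:.2f}%")
```

Both routes give 96.39%, because the number of failures cancels out of the ratio. The metric therefore depends only on how the observation window splits between uptime and downtime.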

Measuring Reliability

Reliability is the probability that the system works properly within a defined time period. Common reliability metrics include:

Probability of failure: the likelihood that a transaction request will fail
Rate of fault tolerance, which corresponds to failure intensity
Maximum downtime
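The metrics listed above could be derived from monitoring data along the following lines. This is a sketch under stated assumptions: the function names and all sample figures are invented for illustration, and the `reliability` function assumes a constant failure rate (an exponential model), which the paper itself does not prescribe.

```python
import math


def probability_of_failure(failed_requests: int, total_requests: int) -> float:
    """Fraction of transaction requests that failed."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return failed_requests / total_requests


def failure_intensity(failures: int, operating_hours: float) -> float:
    """Failures per operating hour."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


def reliability(period_hours: float, mtbf_hours: float) -> float:
    """Probability of running failure-free for period_hours.

    Assumption: constant failure rate, so R(t) = exp(-t / MTBF).
    """
    return math.exp(-period_hours / mtbf_hours)


# Hypothetical observations, for illustration only.
outage_hours = [2.0, 7.5, 4.0, 1.5, 6.0, 5.0]

print(f"Probability of failure: {probability_of_failure(5, 1000):.3%}")
print(f"Failure intensity:      {failure_intensity(6, 694):.4f} failures/hour")
print(f"Maximum downtime:       {max(outage_hours)} hours")
print(f"24-hour reliability:    {reliability(24, 115.66):.1%}")
```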

Measuring Disaster Recovery

Recovery Time Objective (RTO): How quickly should critical services be restored?
Recovery Point Objective (RPO): To what point in time before the failure must data be recoverable? How much data loss can be accommodated?
Recovery Cost Objective (RCO): What is your budget and can it meet your RTO and RPO? How much are you willing to spend on disaster recovery?
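One way to use these three objectives during OAT is to compare the outcome of a disaster recovery drill against them. The sketch below is hypothetical: the `RecoveryObjectives` and `DrillResult` types, the `evaluate` function, and every timestamp and amount are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of lost data
    rco: float      # maximum acceptable recovery spend


@dataclass(frozen=True)
class DrillResult:
    failure_at: datetime
    last_recoverable_data_at: datetime
    service_restored_at: datetime
    cost: float


def evaluate(drill: DrillResult, objectives: RecoveryObjectives) -> dict[str, bool]:
    """Report whether each recovery objective was met in the drill."""
    recovery_time = drill.service_restored_at - drill.failure_at
    data_loss_window = drill.failure_at - drill.last_recoverable_data_at
    return {
        "RTO met": recovery_time <= objectives.rto,
        "RPO met": data_loss_window <= objectives.rpo,
        "RCO met": drill.cost <= objectives.rco,
    }


# Hypothetical objectives and drill outcome.
objectives = RecoveryObjectives(
    rto=timedelta(hours=4), rpo=timedelta(minutes=30), rco=50_000
)
drill = DrillResult(
    failure_at=datetime(2024, 1, 10, 2, 0),
    last_recoverable_data_at=datetime(2024, 1, 10, 1, 45),
    service_restored_at=datetime(2024, 1, 10, 5, 30),
    cost=42_000,
)

for objective, met in evaluate(drill, objectives).items():
    print(f"{objective}: {met}")
```

In this invented drill, service returns after 3.5 hours with 15 minutes of data lost, so all three objectives are met.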

Get Granular with a V-Model Approach for OAT

The V-Model, with its elasticity, condenses the total duration of the project lifecycle because many activities run in parallel. This gives it an edge over other models and enables IT managers to tackle operational requirements from the initiation phase through the test completion phase. The key phases of the V-Model, verification and validation, execute simultaneously. The verification phase uses documents such as the operational requirements, infrastructure design, and OAT test plan. This phase also keeps a check on progress through inspections, formal reviews, and walkthroughs. In the validation phase, the plans produced during verification are put into practice: practical tests are performed on system components, applications, and data. The key deliverables of this phase include the OAT defects summary and the OAT test completion report.

Exhibit 4: V-Model Approach

Connecting the Dots

To perform Operational Acceptance Testing, it is important to know which OAT components need to be tested. Once this is determined, components and applications together must meet high quality standards, which is essential for successful service delivery.

The key prerequisites of OAT include:

Live and Production environments
Presence of live applications on the production environment while carrying out OAT
Latest release versions of infrastructure components, hardware, operating system, database, and software patches
Similar hardware devices and network conditions in live and production environments
Skilled and experienced technical staff working on various technologies

Apart from these key prerequisites, careful selection of environment, test strategy, and tools ensures cost-effective OAT. Factors such as support for applicable platforms, script reusability, and total cost of ownership should also be taken into account. An optimal selection of target infrastructure can maximize test coverage.

How to Achieve Distinction with OAT

Below are some methods that help businesses achieve operational value with OAT:

Operations Focus: Thorough testing of operational aspects enables the OAT team to easily detect critical business operational defects.

Business Readiness Perspective: Testing applications and infrastructure with business readiness in mind helps the OAT team deliver an enhanced user experience after system rollout.

Map Operational Impact: The OAT team should design tests based on the criticality of operational requirements. Such tests ensure that optimal QA coverage is achieved with minimal risk.

Measure Operational Metrics: It is extremely important to track operational metrics rather than test program metrics, because doing so establishes credibility by quantifying and communicating the value derived from OAT.

Deliver Reliable and Stable Systems with OAT

According to estimates from studies and surveys performed by IT industry analysts, businesses lose on average between $84,000 and $108,000 for every hour of IT system downtime. Industries such as banking and financial services, telecommunications, manufacturing, and energy experience the highest revenue loss during IT downtime. Operational Acceptance Testing ensures that service is delivered with appropriate and proven maintenance and housekeeping processes and procedures. This enables IT managers to meet SLAs/OLAs, deliver reliable and stable systems, increase the customer base, build brand image, reduce revenue loss, and fulfill compliance requirements.
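The hourly loss range quoted above can be turned into a rough cost estimate for a given amount of downtime. The sketch below is illustrative only: the function name is hypothetical, and the 26 hours of downtime is borrowed from the MTTR example earlier in this paper purely to show the arithmetic.

```python
# Hourly loss range quoted in this paper (analyst estimates, US dollars).
HOURLY_LOSS_LOW = 84_000
HOURLY_LOSS_HIGH = 108_000


def downtime_cost_range(downtime_hours: float) -> tuple[float, float]:
    """Return the (low, high) estimated revenue loss for the given downtime."""
    if downtime_hours < 0:
        raise ValueError("downtime_hours must be non-negative")
    return downtime_hours * HOURLY_LOSS_LOW, downtime_hours * HOURLY_LOSS_HIGH


# Assumption: reuse the 26 downtime hours from the MTTR example.
low, high = downtime_cost_range(26)
print(f"Estimated loss: ${low:,.0f} to ${high:,.0f}")
```

At these rates, 26 hours of downtime corresponds to an estimated loss of between $2,184,000 and $2,808,000.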

About the Author

Vittal Jadhav is a Certified Agile Tester with CSQA, ISTQB, and ITIL certifications, and an experienced IT professional with more than 12 years of rich and insightful experience in Infrastructure and Application Availability Services Testing. He has wide experience in Airlines Technology, Media Broadcasting, and Banking, Financial Services, and Insurance (BFSI). In addition, he has experience in ETL, Data Warehouse, Non-Functional Testing, Performance Engineering, System Integration Testing, IT Automation, and Functional and Regression Testing. He has extensive knowledge of BFSI testing, with more than six years of experience working with a multinational European bank.