Proper Alarm Management Helps Prevent Unplanned Downtime

It is a typical start to the day for the facility operator. A few alarms need handling, but all systems appear to be running well on site and out in the field. Suddenly, an alarm pops up on the human-machine interface (HMI) display – the compressor is down! Wait, what? It was up and running just a second ago. Then another alarm pops up signaling low pressure due to the compressor not running. And then another for high discharge pressure as there is no suction. And like a row of cascading dominoes, more system-related failures occur one after the other, lighting the screen up with a multitude of alarms. The operator becomes increasingly overwhelmed as what started out as just a few alarms on the screen has now become well over 100 alarms and climbing! With less than 5 minutes to resolve the issue, what does the operator do?

In this all-too-common real-world scenario, the situation rapidly escalated, leaving the operator with no choice but to shut down the system and costing the company millions of dollars in unplanned downtime.

Many operators face similar situations daily with some having upward of 300,000 alarms in a 24-hour period. That is a LOT. Anything more than 300 alarms in a 24-hour period begs the question, “What was missed?” The overwhelmed operator cannot possibly see all those alarms and react within a short timeframe. They often just throw up their hands and say, “What the heck?” They have no chance to react quickly and press the right button in time.

When too many alarms are annunciated to an operator during a facility upset, it is known as an alarm flood. Oftentimes, these alarm issues stem from an alarm system that has poor prioritization, improperly set alarm points, ineffective annunciation, unclear HMI graphics, unclear alarm meanings and so on.
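As a rough illustration, an alarm log can be screened for floods by counting alarms in a sliding time window. The sketch below assumes a flood begins when more than 10 alarms fall within 10 minutes, a commonly cited rule of thumb used here as an assumption; the function name and parameters are hypothetical and not taken from any particular control system.

```python
from collections import deque
from datetime import datetime, timedelta


def find_flood_starts(timestamps, window=timedelta(minutes=10), threshold=10):
    """Return the times at which the alarm count in the trailing window
    first exceeds the threshold (one entry per flood episode)."""
    recent = deque()   # alarms inside the trailing window
    in_flood = False
    starts = []
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop alarms that have aged out of the trailing window.
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) > threshold:
            if not in_flood:
                starts.append(ts)   # first alarm that pushes the rate over
                in_flood = True
        else:
            in_flood = False
    return starts


# Example: 15 alarms arriving 5 seconds apart (made-up data).
upset = datetime(2024, 1, 1, 12, 0, 0)
burst = [upset + timedelta(seconds=5 * i) for i in range(15)]
print(find_flood_starts(burst))
```

With the assumed threshold, the flood is flagged at the eleventh alarm of the burst, 50 seconds after the first one.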

Improper alarm management leads to unplanned downtime, contributing to $10 billion to $20 billion in lost production every year, and raises the potential for a major industrial incident. Facilities facing these types of real-world alarm issues could benefit from a properly functioning alarm management system, which all starts with an alarm philosophy and alarm rationalization techniques.

Compliance and the Alarm Psyche

An alarm philosophy – essentially a set of guidelines on how to do alarms correctly – provides the basis for a properly functioning alarm management system. To get started, the ISA-18.2 Management of Alarm Systems for the Process Industries standard provides general principles and processes for managing the lifecycle of alarm systems. The alarm management lifecycle stages include the alarm philosophy, identification, rationalization, detailed design, implementation, monitoring and maintenance, and an audit of the control system.

The ISA-18.2 standard first requires facilities to create an alarm philosophy document that defines the criteria for rating an alarm’s severity and urgency. With an alarm philosophy in place, facilities can follow the set of criteria to design, develop, implement, modify, manage, and continuously improve and maintain alarms. Alarm response procedures can also be developed and specific information on each alarm can be embedded within an HMI to help operators respond quickly and mitigate abnormal situations effectively and safely. For facilities looking to migrate legacy systems, a best practice would be to incorporate an alarm philosophy and alarm response procedures early on as part of the migration to minimize costs and increase operator buy-in – making the transition seamless.

For regulatory compliance purposes, auditing is part of the ISA-18.2 standard’s lifecycle requirement: a comprehensive assessment of the alarm system is required, including evaluation of alarm system performance and the work practices used to administer the alarm system. Periodic reviews reveal gaps not apparent from routine monitoring and identify system improvements, including modifications to the alarm philosophy. Audits also help enforce adherence to the alarm philosophy. To stay compliant, alarm management should be part of a facility’s continuous improvement program and incorporated into any equipment updates or legacy system migration projects.

Alarm Rationale

Once alarm philosophy criteria are established, the alarm rationalization process helps to minimize the number of alarms required to keep operating conditions efficient and safe. An alarm rationalization team reviews, justifies, validates and documents each alarm based on the alarm philosophy criteria. The goal is to evaluate the alarm system to identify root causes and determine what alarms should be included in the system.

To illustrate the alarm philosophy and rationalization concept even further, let’s revisit our real-world compressor scenario with a few qualifying questions:

  • What was the one critical alarm the operator needed to know?
    • That the compressor went down.
  • What are the key factors that caused this compressor to go down and how is that determined?
    • Some good indicators must be found to help identify what possibly caused it, such as:
      • A trip
      • Pressure
      • Temperature variation
      • Amps
      • Something else?

Operator Response Time

Once an alarm’s level of criticality is validated and good indicators are determined, the alarm rationalization process looks at operator response time, consequence and effect, as in:

  • Does the operator have to act on the alarm? Yes or No?
    • If yes, how much time does the operator have to take care of this alarm before a consequence occurs? Less than 5 minutes, 10 to 20 minutes, or more than 30 minutes?
    • If no, does it even need to be an alarm?
      • Just because an alarm is ‘good to know’ doesn’t mean there is a need for an alarm. It must call for an operator action, even if that action is only initiating a work repair.
    • What is the consequence if the operator doesn’t respond to the alarm in, say, 5 minutes – what else might occur in the system?
      • Compressor stops running
      • Low pressure
      • High discharge pressure
      • And so on
    • What happens if an operator does nothing about this issue – what is the severity of impact?
      • Unplanned downtime
      • Cost of shutdown
      • Personnel safety
      • Environmental damage
      • And so on
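The questions above can be read as a small decision procedure: no operator action means no alarm, and otherwise priority follows from how much time is available and how severe the consequence is. The sketch below is one hypothetical way to encode that; the priority grid, the 5- and 30-minute cut-offs and all names are assumptions for illustration, and a real facility would take them from its own alarm philosophy document.

```python
from typing import Optional

# Hypothetical grid: consequence severity -> urgency class -> priority.
PRIORITY_GRID = {
    "severe": {"immediate": "critical", "prompt": "high", "extended": "medium"},
    "major": {"immediate": "high", "prompt": "medium", "extended": "low"},
    "minor": {"immediate": "medium", "prompt": "low", "extended": "low"},
}


def urgency_class(minutes_to_respond: float) -> str:
    """Bucket the available response time (cut-offs are assumed values)."""
    if minutes_to_respond <= 5:
        return "immediate"
    if minutes_to_respond <= 30:
        return "prompt"
    return "extended"


def rationalize(operator_action: bool, minutes_to_respond: float,
                severity: str) -> Optional[str]:
    """Return a priority, or None when the point should not be an alarm."""
    if not operator_action:
        return None  # 'good to know' only: no operator action, so no alarm
    return PRIORITY_GRID[severity][urgency_class(minutes_to_respond)]


# Compressor trip: operator must act within 5 minutes, severe consequence.
print(rationalize(True, 5, "severe"))    # critical
# Informational point with no operator action.
print(rationalize(False, 5, "severe"))   # None
```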

As we continue to drill down, the operator will still have several alarms to handle, but the difference is that all the alarms are now based on the one condition – the first critical alarm telling the operator the compressor went down.
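One way to picture this is to fold consequential alarms under the root alarm that explains them, so the compressor alarm stands out first. The following is a minimal sketch under assumed names: the cause map and the alarm tags are hypothetical and would come out of the rationalization team's cause-and-effect review.

```python
# Hypothetical cause map: root alarm -> alarms expected as its consequence.
CONSEQUENCES = {
    "COMPRESSOR_DOWN": {"LOW_PRESSURE", "HIGH_DISCHARGE_PRESSURE"},
}


def group_by_root(active_alarms):
    """Split active alarms into those shown on their own and those
    folded under a root alarm that is also active."""
    active = list(active_alarms)
    active_set = set(active)
    folded = {}
    for root, children in CONSEQUENCES.items():
        if root in active_set:
            folded[root] = [a for a in active if a in children]
    hidden = {a for kids in folded.values() for a in kids}
    shown = [a for a in active if a not in hidden]
    return shown, folded


shown, folded = group_by_root(
    ["COMPRESSOR_DOWN", "LOW_PRESSURE", "HIGH_DISCHARGE_PRESSURE", "TANK_LEVEL_HIGH"]
)
print(shown)   # ['COMPRESSOR_DOWN', 'TANK_LEVEL_HIGH']
print(folded)  # {'COMPRESSOR_DOWN': ['LOW_PRESSURE', 'HIGH_DISCHARGE_PRESSURE']}
```

If the root alarm is not active, nothing is folded, so a low-pressure alarm from some other cause still reaches the operator.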

Dynamic Alarming: No More Floods

Now, what about the alarm floods or nuisance alarms (e.g., three alarms in under a minute)? State-based or dynamic alarming utilizes various techniques to eliminate these issues. To illustrate, let’s turn to our compressor scenario again, but this time, let’s look at which conditions apply when the operator starts up the compressor, as in:

  • What if, when starting up the compressor, the operator wants to load it to 50% instead of all the way up to 100% capacity?
    • If the load is set to only 50%, only the alarms that apply to that condition are turned on. Then, once the operator wants the compressor in total operation, every alarm pertaining to that condition can be turned on.

Thus, dynamic alarming ensures that no matter what condition the process is in – shutdown, startup or some other state – the alarms can be managed ahead of time. In other words, an alarm rationalization process helps deliver the right alarm to the right operator at the right time with the right priority level and with the right guidance to correct or mitigate an undesirable situation.
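State-based alarming can be sketched as a lookup from the current process state to the set of alarms enabled in that state. The example below is illustrative only: the state names, alarm tags and the choice of which alarms apply in each state are assumptions, not settings from a real compressor.

```python
from enum import Enum


class CompressorState(Enum):
    SHUTDOWN = "shutdown"
    STARTUP_50 = "startup at 50% load"
    FULL_LOAD = "full load"


# Hypothetical alarm tags enabled in each state.
ENABLED_ALARMS = {
    CompressorState.SHUTDOWN: {"GAS_LEAK"},
    CompressorState.STARTUP_50: {"GAS_LEAK", "HIGH_VIBRATION", "HIGH_MOTOR_AMPS"},
    CompressorState.FULL_LOAD: {
        "GAS_LEAK", "HIGH_VIBRATION", "HIGH_MOTOR_AMPS",
        "LOW_PRESSURE", "HIGH_DISCHARGE_PRESSURE",
    },
}


def annunciate(state, raised):
    """Return only the raised alarms enabled for the current state,
    preserving the order in which they were raised."""
    enabled = ENABLED_ALARMS[state]
    return [tag for tag in raised if tag in enabled]


raised = ["LOW_PRESSURE", "HIGH_VIBRATION"]
print(annunciate(CompressorState.SHUTDOWN, raised))    # []
print(annunciate(CompressorState.STARTUP_50, raised))  # ['HIGH_VIBRATION']
print(annunciate(CompressorState.FULL_LOAD, raised))   # ['LOW_PRESSURE', 'HIGH_VIBRATION']
```

Keeping the table separate from the logic means the alarm sets can be reviewed and changed state by state without touching the code that applies them.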

This may sound simple, but the sheer scale and complexity of the challenge can quickly overwhelm and intimidate an alarm rationalization team. The team is supposed to review and evaluate the operation of the plant or unit, determine the undesirable things that might occur (e.g., how many nuisance alarms are in the system), and then determine which associated alarms are appropriate based on the alarm philosophy criteria.

Alarm Priorities

The team may also create a preliminary design that includes the current alarm priorities in the control system, set points and other alarm attributes. For instance:

  • What are the colors used on an HMI screen to indicate low-priority, high-priority or critical alarms?
  • How much time does the operator have to take care of an alarm before it is too late?
  • What is the consequence if an operator doesn’t respond to an alarm in, say, 10 or 15 minutes (e.g., what else might occur in the system)?
  • What is the severity of the impact if no response is given on time (e.g., unscheduled downtime, cost of shutdown, environmental damage, personnel safety)?
  • And so on

A Qualified Team

As an effective alarm rationalization team evaluates the operation of the facility or unit, they might process more than a hundred alarms each day from a total of about 10,000. Even at that pace, a full review can take up to roughly 100 working days, so the process is often long and tedious. Where lack of resource bandwidth is a concern, a third-party team that is knowledgeable in alarm management best practices can help collaborate and carefully evaluate existing alarm configurations, apply the latest alarm management principles and streamline operational safety. A qualified team of experts can help perform a thorough review of existing alarms, analyze system data, evaluate and document cause and effect, and determine which alarms exist and where there is potential for any new alarms based on the alarm philosophy criteria.

A properly rationalized alarm system allows operators to quickly identify root causes and effectively resolve issues. When facilities consider the potential safety risk to personnel and billions of dollars in lost production, incorporating alarm management with an alarm philosophy and rationalization process is well worth the effort.