Plant significantly reduces nuisance alarms
Best Practices Award
Challenge. The plant installed a Siemens T3000 control system in fall 2010 to replace an aging WDPF system. The old system was still generally reliable, but it had become harder to support, especially with regard to expansion and to interfacing with new sub-control systems. A significant effort went into the conversion, including incorporating the “lessons learned” of others who had gone through similar upgrades, and the conversion went relatively smoothly.
Immediately after the T3000 was commissioned, plant staff set about getting acquainted with the new control system. It was relatively intuitive, but somewhat complex because of the great flexibility and the many options built into the T3000 interface. This may not seem like a big issue, but it is increasingly common with modern technology and software.
Documentation is generally lacking, and most learning and knowledge is gained by word of mouth and experimentation. The result is that each of us develops our own way of interfacing with the DCS and, in particular, of managing and filtering alarms.
In addition to getting oriented to a new control system, we had the usual punch-list items to address, so we focused first on those with the greatest potential impact on operations.
One thing we hadn’t anticipated or accounted for was the overall increase in the number of alarms. For quite a while we didn’t really understand—nor were we able to quantify—the extent of the changes to our alarm system, and alarm management was inadvertently pushed to a back burner.
During 2012, some serious incidents associated with missed alarms occurred at other plants with the same system, and we became aware of just how overwhelming the excessive alarming could be and the challenging position in which we had inadvertently put our technicians and operators. At times, especially during plant upsets, it was difficult at best to effectively understand and respond to critical alarms.
Our regular incident investigations revealed that key alarms were occasionally missed during plant upsets because of the overwhelming volume of alarms. Only once we had become comfortable with the system did we begin to recognize, and turn our attention to, excessive alarming.
Solution. An ad hoc team, spearheaded by lead O&M Technician Joe Lynde, I&C Technician Ed Heroux, and Systems Superintendent Tim Sheehan, was established to begin addressing tuning, general control issues, and alarming. A lot of this effort focused on the issues that frustrated the operating engineers using the system day in and day out.
About the same time, a T3000 Users Group was formed to provide a forum for users to collaborate and share concerns and issues that could be presented to Siemens for resolution. At the inaugural meeting in May 2012, users learned just how many issues they had in common and how important sharing lessons learned is to avoiding problems, or at least to ensuring their timely resolution. We became aware of how common excessive and ineffective alarming was, and the users asked Siemens to look into these concerns and help them understand the problem and develop a means to effect a remedy.
The efforts of our ad hoc team continued, and while numerous control issues were addressed, our progress was difficult to quantify. We also realized that some of our operators were becoming numb, almost indifferent, to the more frequent and common alarms.
We began to realize that, in many cases, the conversion process of adapting our original WDPF code to the T3000 resulted in significantly more potential alarms, in part because of alarm philosophy, or more precisely, the lack of an alarm philosophy. Many of these alarms might have been added simply because of the idea that “more information (alarms) is better.” Yet, if an alarm doesn’t guide an operator to a specific response, more than likely it will distract them from an effective and proper response.
Many of the additional alarms were related to control hardware (I&C alarms)—boards, processors, interfaces, and field devices—and had not existed in the previous system. While these alarms may be helpful from a troubleshooting perspective, they can have a compounding effect when the alarms for all the components or devices in a single leg of a control loop come in concurrently and effectively flood an operator with unhelpful information.
The team developed new ASD screens to improve the way alarms are displayed so that priority alarms are displayed on the top half of the screen to focus the operator there, with system and status condition alarms displayed in the lower half of the screen (Fig 1). This helped greatly, yet we found that we were still overwhelmed by the volume of alarms, particularly during plant upsets or startups.
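The two-pane layout described above amounts to a partition-and-sort step. The sketch below is a minimal illustration under stated assumptions, not the plant's ASD configuration or a T3000 feature: the `Alarm` record, its fields, the `"process"`/`"system"`/`"status"` categories, and the convention that priority 1 is most urgent are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical alarm record; these field names are illustrative and are
# not T3000 attributes.
@dataclass
class Alarm:
    tag: str
    priority: int      # assumed convention: 1 = most urgent
    kind: str          # assumed categories: "process", "system", "status"
    time: datetime

LOWER_PANE_KINDS = {"system", "status"}

def split_for_display(alarms):
    """Split alarms into (top_pane, bottom_pane) for a two-pane alarm screen.

    Top pane: priority (process) alarms, most urgent first and newest first
    within each priority level. Bottom pane: system and status condition
    alarms, newest first.
    """
    top = [a for a in alarms if a.kind not in LOWER_PANE_KINDS]
    bottom = [a for a in alarms if a.kind in LOWER_PANE_KINDS]
    # Two stable sorts: newest first, then by priority, so alarms that share
    # a priority level keep their newest-first order.
    top.sort(key=lambda a: a.time, reverse=True)
    top.sort(key=lambda a: a.priority)
    bottom.sort(key=lambda a: a.time, reverse=True)
    return top, bottom

if __name__ == "__main__":
    alarms = [
        Alarm("FW PUMP A TRIP", 1, "process", datetime(2013, 5, 1, 8, 0)),
        Alarm("IO CARD 12 FAULT", 3, "system", datetime(2013, 5, 1, 8, 1)),
        Alarm("DRUM LEVEL HI", 2, "process", datetime(2013, 5, 1, 8, 2)),
    ]
    top, bottom = split_for_display(alarms)
    print([a.tag for a in top])     # ['FW PUMP A TRIP', 'DRUM LEVEL HI']
    print([a.tag for a in bottom])  # ['IO CARD 12 FAULT']
```

Keeping the I&C hardware alarms in their own pane does not reduce their number; it only keeps them from pushing process alarms off the operator's primary view, which is consistent with the observation that volume remained a problem.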
Our plant staff takes pride in its ability to resolve issues in-house, so it took us a while to realize that we needed outside help. Perhaps a key lesson learned was figuring out when to ask for help. As a result of improved relationships and bonds created at the T3000 Users Group, we reached out to Siemens.
Sheehan and Plant Manager Mark Winne met with Siemens staff, including Bill Thackston, the eventual project lead, to explain our situation and get guidance on how best to proceed. After a bit of back-and-forth conversation, a joint project team was put together to assess the extent of the problem and develop suggestions for resolution.
Data were collected and evaluated on all alarms that occurred between May 1 and August 11, 2013. Focusing on two primary areas, alarm event analysis and existing alarm settings, we discovered that:
- The 10 most frequent alarms accounted for 40.8% of approximately 133,000 alarms.
- Serial-link alarm floods accounted for 20,600 duplicate alarms, 18% of the total.
- Duplicate alarms based on legacy WDPF analog interval alarms contributed an additional 4%.
- BOP area alarms assigned a type of “status” accounted for 4%.
- Duplicate device alarms accounted for another 3%.
- Eliminating this combination of alarms (serial link, BOP status, and duplicate legacy and device alarms) could reduce the overall alarm volume by nearly 69%. The average 10-min alarm rate was approximately 10.4, well above the ISA-18.2 recommended limit of one alarm per operator per 10-min period; removing these nuisance alarms would reduce the average 10-min alarm rate to approximately 3.1.
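The two headline metrics above, the share contributed by the most frequent alarms and the average 10-min alarm rate, can be computed from a plain alarm log. The sketch below assumes a hypothetical log of (tag, timestamp) pairs; it is not the study team's tooling, and the tag names are made up.

```python
from collections import Counter
from datetime import datetime, timedelta

def top_n_share(tags, n=10):
    """Fraction of all alarm annunciations contributed by the n most frequent tags."""
    if not tags:
        return 0.0
    counts = Counter(tags)
    return sum(c for _, c in counts.most_common(n)) / len(tags)

def avg_rate_per_window(times, start, end, window=timedelta(minutes=10)):
    """Average number of alarms per fixed-length window over [start, end)."""
    n_windows = (end - start) / window  # timedelta / timedelta gives a float
    in_range = sum(1 for t in times if start <= t < end)
    return in_range / n_windows

if __name__ == "__main__":
    # Hypothetical log: one tag repeats every 5 min for an hour,
    # plus two single alarms.
    start = datetime(2013, 5, 1, 0, 0)
    end = start + timedelta(hours=1)
    log = [("SERIAL LINK 4 FAIL", start + timedelta(minutes=m)) for m in range(0, 60, 5)]
    log += [("DRUM LEVEL HI", start + timedelta(minutes=7)),
            ("GT EXHAUST TEMP HI", start + timedelta(minutes=42))]
    tags = [tag for tag, _ in log]
    times = [t for _, t in log]
    print(round(top_n_share(tags, n=1), 3))                  # 0.857 (12 of 14)
    print(round(avg_rate_per_window(times, start, end), 2))  # 2.33 (14 alarms / 6 windows)
```

Re-running `avg_rate_per_window` on a log with the candidate nuisance tags filtered out gives the "after" rate to compare against the "before" rate.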
Our study also considered the alarm configuration, focusing on identifying alarms that could be removed. Currently, there are 12,334 alarms configured in the entire system. Of these, 5393 can be removed without loss of functionality, a reduction of 44%. The table shows the quantity and types of alarms that could be removed.
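The 44% figure follows directly from the two configuration counts given above:

```latex
\text{reduction} = \frac{5393}{12\,334} \approx 0.437 \approx 44\%,
\qquad 12\,334 - 5393 = 6941 \text{ configured alarms remaining}
```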
A chart submitted as part of the plant’s Best Practices entry showed the alarm rate per hour during the study period and revealed how overwhelming the rates can be during certain periods. The average alarm rate was 58 alarms per hour, nearly 10 times the recommended limit for a single operator.
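An hourly view like the one in the chart can be sketched by bucketing alarm timestamps by clock hour and flagging hours above a limit. Here the limit is assumed to be six per hour, simply the one-alarm-per-10-min benchmark cited earlier scaled to an hour; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed benchmark: 1 alarm per operator per 10 min, i.e. 6 per hour,
# as cited in the text for ISA-18.2.
LIMIT_PER_HOUR = 6

def hourly_counts(times):
    """Count alarms per clock hour. Hours with no alarms are absent."""
    return Counter(t.replace(minute=0, second=0, microsecond=0) for t in times)

def hours_over_limit(times, limit=LIMIT_PER_HOUR):
    """Clock hours whose alarm count exceeds the limit, in chronological order."""
    return sorted(h for h, c in hourly_counts(times).items() if c > limit)

if __name__ == "__main__":
    print(round(58 / LIMIT_PER_HOUR, 1))  # 9.7: the study average vs. the limit
```

Because `hourly_counts` omits hours with no alarms, a period average should divide the total alarm count by the number of hours in the study period, not average the values in the counter.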
In addition to frequency, alarms were categorized by source, area, and priority, which helped us to understand where to focus our efforts first. We learned that the vast majority of alarms were the result of serial-link communication failures associated with the T3000 interface and standalone control and monitoring systems (vibration monitors and PLCs, for example).
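Categorizing alarms by source, area, and priority is a group-and-count over the alarm records. The sketch below assumes each alarm is a dict with hypothetical `"source"`, `"area"`, and `"priority"` keys; the values shown are illustrative, not the plant's data.

```python
from collections import Counter
from operator import itemgetter

def breakdown(alarms, *fields):
    """Count alarms by one or more attributes, most frequent first.

    With one field the keys are plain values; with several they are tuples.
    """
    key = itemgetter(*fields)
    return Counter(key(a) for a in alarms).most_common()

if __name__ == "__main__":
    # Hypothetical records; the source/area/priority values are illustrative.
    alarms = [
        {"source": "serial link", "area": "BOP", "priority": 3},
        {"source": "serial link", "area": "HRSG", "priority": 3},
        {"source": "serial link", "area": "BOP", "priority": 3},
        {"source": "process", "area": "GT", "priority": 1},
    ]
    print(breakdown(alarms, "source"))          # [('serial link', 3), ('process', 1)]
    print(breakdown(alarms, "source", "area"))  # [(('serial link', 'BOP'), 2), (('serial link', 'HRSG'), 1), (('process', 'GT'), 1)]
```

Cross-tabulating two attributes at once, as in the second call, shows where a dominant source such as serial-link failures is concentrated, which helps decide where to start.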
Results. Armed with this information, we developed a plan to initiate our first round of changes, focusing on the top 10 alarms and eliminating thousands of nuisance alarms in a collaborative, thoughtful, and well-engineered process. Though our analysis and discussions took five times longer than the system conversion, we are very pleased with the results.
We are still working to resolve the alarms associated with the serial-link interfaces, but the improvement is dramatic (Fig 2). Once we finish reducing steady-state (normal operations) alarms, we intend to turn our focus to startup and shutdown alarms.
It is difficult to quantify the benefits or savings realized by this effort, but most insurers can quickly point to a claim wherein a missed or inaccurate alarm resulted in millions of dollars in losses. We have little doubt that we paid off our investment before year-end. Alarm management has been a fast-growing and high-profile topic in process industries, and it is quickly coming to the forefront in the power industry as more and more plants implement the latest generation of advanced distributed control systems.
Project participants:
Joe Lynde, lead O&M technician
Ed Heroux, I&C technician
Tim Sheehan, systems superintendent
Steve Snopkowski, O&M manager
Bill Thackston, project lead, Siemens Energy Inc
Millennium Power
Owned by MACH Gen LLC
Operated by NAES Corp
360-MW, gas-fired, 1 × 1 combined cycle located in Charlton, Mass
Plant manager: Mark Winne