Why monitor your standby battery

Why monitor your battery?

To ensure that it will function immediately and reliably when it is required
To obtain the maximum service life possible from the battery
To save on time, cost and resource
To support warranty claims

Continuous battery monitoring can help the user achieve all these things!

The battery is the primary cause of UPS failure

40 years ago the MTBF (Mean Time Between Failures) of electronic UPS (Uninterruptible Power Supply) systems could be estimated using a long and complex calculation involving the failure rates of all the system components, including every chip, resistor and capacitor. This calculation would hopefully come out in tens of years and was at best a ‘guesstimate’. Even then, when complex electronics were a lot younger and less predictable than they are now, the standby battery was recognised to be the weakest part of the system, particularly if the battery was of the then new VRLA type. MTBF cannot be calculated for lead-acid batteries.

Today, many years of historical data on the electronic sections of the UPS have borne out those MTBF calculations, apart from the odd failure, and the failure rate of the electronics is even lower than it was then. The same can’t be said of the battery; while VRLA failure rates have gone down slightly, the battery is still the most common point of failure in the UPS, and always will be. In fact, in the 2013 Ponemon Institute survey of over 584 managers who have responsibility for one or more data centers, 85% had experienced at least one power outage per year in the previous two years, and of those 91% had an unplanned outage to the load.

All standby batteries are composed of battery cells in series or series/parallel to achieve the necessary voltage and power for the system. This means that if one cell fails open-circuit the whole battery will fail. If, on the other hand, a cell fails by shorting, the battery will have a much shorter hold-up time and a very much reduced service life than the battery was designed for.

The Ponemon survey showed that despite infrastructure having become much more complicated than a few years ago, the major cause (an average of 60% per year over two years) of power outages affecting the critical load, was the failure of the UPS battery.

The above statements are recognised throughout the battery industry, so it makes complete sense to monitor the battery to ensure it is in good condition and will respond when needed, doesn’t it? Nobody wants the disruption, the data and financial losses, and the sometimes life-threatening problems of a supply failure to the critical load, do they? Not to mention the high cost, confusion and loss of business during disaster recovery.

Well, no; no-one does want these things, so the number of UPS users who don’t pro-actively monitor their standby batteries is constantly surprising.

The Ponemon survey showed that every outage (one or more a year for the majority of the companies polled) averaged a loss of over $200,000 USD, which makes the investment in a monitoring system look pretty small by comparison.

It is the proud boast of more than one battery monitoring company that there has never been a supply failure to the critical load of a battery monitored by their systems.

Providing that intelligent attention is paid to the data collected, this claim is believable.

Why then do the majority of critical system users not monitor their battery systems as a matter of course?

Why aren’t all standby batteries monitored?

There are several reasons for not monitoring standby batteries, but the primary reason would seem to be cost. A premier battery monitoring system can cost 50-60% of the cost of the battery; added to that is the cost of installation, perhaps another 20-40% of the battery cost. This makes it prohibitive for many users, quite a few of whom decide instead to change out the battery earlier than would otherwise be indicated.

Changing-out batteries more often than necessary is very short-sighted; it not only means that over time the system is far more expensive than it needs to be, but there are often burn-in failures of new cells which put the critical load in even more danger than normal, and the user still doesn’t know when his battery will fail.

Given all the problems that can arise if the battery fails, it is very short-sighted not to monitor your battery because of cost. Everyone takes out insurance on their car, even though it’s expensive and only a low percentage of drivers have an accident in any year. You may have an accident in your driving life; many people don’t. Your standby battery, on the other hand, will fail; it’s a fact of life. Batteries are guaranteed to fail at some time, you just don’t know when…

A second major cause of not monitoring the battery is that the battery in a UPS system is normally supplied by the UPS manufacturer, and UPS sales people are always bidding against each other. To suggest that the battery they are supplying may not be totally reliable, and to propose an additional cost, is something that none of the UPS companies want to do. In fact they will only quote a battery monitoring system if the customer (or their consultant) requires it. In this they do their customer a disservice and, if the battery fails, it will often cost them a great deal of money. If there were a history of heart disease in your family and your doctor said that checking for heart disease wasn’t important, you wouldn’t be very impressed, would you? Don’t forget, 60% of data center UPS failures are caused by the battery!

Monthly, quarterly and annual maintenance

Quite simply, a lot of maintenance just doesn’t get done. The IEEE considers it essential that monthly inspections of the individual cells of standby batteries are carried out and that, as a minimum, parameters such as temperature and resistance or impedance are measured and recorded quarterly. The battery should also be discharge tested once per year or, as a minimum, once every two years.

It often happens, however, that due to commercial pressures trained technical personnel are not available, or are too hard pressed to carry out inspections and measurements. Resource cutbacks mean that the maintenance is outsourced, but this doesn’t mean that all is necessarily well. A major UPS manufacturer recently admitted privately that only 70% of its contracted maintenance visits are carried out because, due to lack of personnel, they can’t get round to them all in time.

Since it is quite possible that the battery can fail between one maintenance visit and the next, quarterly maintenance only begins to be reliable if the battery has a redundant string.

Although manual maintenance is apparently less expensive than a full continuous monitoring system, and may be so if the user has on-site technical labor, many companies have cut back their technical staff and have to outsource technical tasks. The cost of outsourcing and the risk of a failure between visits mean the cost-effective (and technically more effective) option is to install a continuous monitoring system.

Last, but not least, the best indicator of incipient failure for standby batteries is the ‘Ohmic’ test: the measurement of the DC resistance or AC impedance of the cell (conductance, advocated by at least one major instrument manufacturer, is defined as 1 divided by resistance). Ohmic testing is undoubtedly the best indicator available at this time; however, the largest change in resistance, whether from old age or from a failure mode such as sulphation or dry-out, comes at the last stage of cell failure. Even if it is detected, this doesn’t leave a great deal of time to change out the cell before it fails, and the cell can fail between one maintenance visit and the next.

The preferred option therefore is continuous monitoring and trending of measured parameters.
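As an illustration of what trending an Ohmic measurement might look like, the sketch below compares each cell's latest resistance reading with its own commissioning baseline and flags cells that have risen beyond a limit. The function name and the 25% alarm limit are assumptions made for this example, not figures from the IEEE or from any battery manufacturer; the appropriate limit depends on the cell type and the instrument used.

```python
def flag_rising_cells(baseline_mohm, latest_mohm, alarm_rise_pct=25.0):
    """Return (cell_number, rise_pct) for cells whose resistance has risen too far.

    baseline_mohm and latest_mohm are per-cell readings in milliohms, in cell
    order. alarm_rise_pct is an assumed, illustrative threshold.
    """
    flagged = []
    for cell_no, (base, now) in enumerate(zip(baseline_mohm, latest_mohm), start=1):
        rise_pct = (now - base) / base * 100.0
        if rise_pct >= alarm_rise_pct:
            flagged.append((cell_no, round(rise_pct, 1)))
    return flagged


if __name__ == "__main__":
    baseline = [0.50, 0.52, 0.49]   # readings taken at commissioning
    latest = [0.51, 0.70, 0.50]     # most recent readings
    for cell_no, rise in flag_rising_cells(baseline, latest):
        print(f"Cell {cell_no}: resistance up {rise}% on baseline")
```

Because each cell is compared with its own history rather than with a fixed absolute value, a continuous monitor can pick up the start of a rise long before the next scheduled maintenance visit would.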

Autonomy (discharge) testing

Capacity testing is the only way to be sure the battery has the capability to hold up the critical load for the time specified.
With that said, there are some problems with discharge testing:

It is expensive & disruptive
You have to take the battery off-line to do it
With recharge the battery may be off-line for more than 24-36 hours
In some cases the recharge has exacerbated undetected problems, so that the next time the battery is required, it fails
The test is only valid for the day of testing; a couple of weeks later it could fail

Continuous battery monitoring

IEEE recommendations for the maintenance of the battery by continuous monitoring suggest several parameters which should be measured and stored.

These include:

Terminal voltage

Not very useful on its own (the terminal voltage is fixed by the charger and often doesn’t change until the battery has completely failed), but essential to be measured and recorded during a discharge test.

Ambient temperature

Useful for detecting adverse temperature conditions and for determining the required charger voltage. The service life of a VRLA battery reduces by 50% for every rise of 10°C; however, the effects of adverse temperature can largely be offset by dynamic adjustment of the float voltage at the charger.
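The halving rule can be written as a simple derating formula: expected life = design life x 0.5 ^ (temperature rise / 10). The sketch below is illustrative only. The 20°C reference temperature and the 10-year design life are assumptions, not figures from this paper; use the rating temperature and design life given on the battery manufacturer's data sheet.

```python
def expected_service_life(design_life_years, battery_temp_c, reference_temp_c=20.0):
    """Estimate service life using the 'life halves per 10 degC rise' rule.

    reference_temp_c is an assumed rating temperature (hypothetical default).
    Temperatures at or below the reference return the design life unchanged.
    """
    rise = max(0.0, battery_temp_c - reference_temp_c)
    return design_life_years * 0.5 ** (rise / 10.0)


if __name__ == "__main__":
    for temp in (20, 25, 30, 40):
        life = expected_service_life(10, temp)
        print(f"{temp} degC: about {life:.1f} years")
```

On these assumptions a battery with a 10-year design life would last about 5 years at 30°C and about 2.5 years at 40°C, which is why recording the temperature is worthwhile even though it does not indicate cell condition directly.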

Cell resistance or impedance

Ohmic testing is a critical component in the detection of incipient battery failure and is now recognised throughout the industry as the most effective method to date for the non-invasive identification of poor cells.

Cell temperature

Monitoring the temperature of every cell is essential for the detection of thermal runaway conditions.

Thermal runaway is an exothermic reaction; as the resistance of a cell increases, the ever-present float current causes its temperature to increase, and so on in a spiral, until the temperature rises in an exponential fashion, ending in an explosive situation.

However, individual cells can be in the early and mid-stages of the exothermic reaction spiral without affecting their neighbours, and the explosive last stage can take place very quickly. It is therefore not sensible to rely on an ambient temperature measurement of the battery room, or a pilot temperature measurement of, say, cell number 3, when the cell in thermal runaway condition is number 17, perhaps even on a different rack!

US IFC608.3; 2010 requires that any battery with a substantial amount of acid electrolyte must be protected against fire and explosion. This means that thermal runaway conditions must be detected before the critical temperature is reached, and this means monitoring the temperature of individual cells.
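A minimal sketch of why per-cell measurement matters is given below. It flags any cell running hotter than ambient by more than a set margin, and shows that a single pilot-cell reading can look normal while another cell is overheating. The function name and the 3°C margin are assumptions for illustration only, not values taken from the fire code or from any manufacturer.

```python
def cells_running_hot(cell_temps_c, ambient_c, max_rise_c=3.0):
    """Return the numbers of cells more than max_rise_c above ambient.

    cell_temps_c holds one reading per cell, in cell order (cell 1 first).
    max_rise_c is an assumed, illustrative alarm margin.
    """
    return [
        cell_no
        for cell_no, temp in enumerate(cell_temps_c, start=1)
        if temp - ambient_c > max_rise_c
    ]


if __name__ == "__main__":
    ambient = 24.0
    temps = [25.0] * 16 + [38.0]   # cell 17 is heating up
    pilot_cell = 3
    print("Pilot cell reading:", temps[pilot_cell - 1], "degC")
    print("Cells in alarm:", cells_running_hot(temps, ambient))
```

In this example the pilot cell reads a normal 25°C and only the per-cell scan reports cell 17.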

System noise and ripple

UPS system noise and ripple currents can be very destructive to battery systems, particularly VRLA types. All batteries have a resistance to electrical current and, if a significant amount of noise and ripple current is passed through the batteries it will cause the cells to heat up more than normal, shortening the life of the battery.

Additionally, system noise and ripple currents higher than normal can indicate problems in the UPS system.

Float current

What we should measure, but can’t yet with standard technology, is float current. Float current is dictated by the voltage fixed by the charger, the battery electrochemistry and the condition of the cell; it is usually 0.5 to 1.5 milliamps per ampere-hour of battery capacity.

This will rise significantly under fault conditions and is an important parameter to measure and trend; however, this is not possible with currently available Hall-effect technology. Hall-effect sensors that are sensitive enough to measure a float current of a few tens of milliamps (say a 0-3 amp transducer) are badly affected by the relatively high battery discharge and subsequent recharge currents.

High currents will saturate the transducer core, leaving an unpredictable offset of several amps remaining after a discharge event, which destroys the trending pattern. In addition, Hall-effect sensors are affected by temperature, which is particularly significant in the milliamp range.

Beware of suppliers who claim to measure float current with standard Hall-effect sensors. A sensor rated to measure currents of 10 Amps or over cannot achieve this. If a sensor is rated to measure say 10A at 1% accuracy it will only be accurate to 0.1 Amp, or 100 milliamps; a sensor measuring 100 milliamps should be accurate to better than 5 milliamps.

A battery composed of 100 Ah cells will only have a float current of 50-150 milliamps, depending on age and chemistry. The 10A current transducer therefore cannot measure the float current of that battery. If the Hall-effect sensor is rated for battery charge/discharge currents, say 200 Amps, the sensor readings can wander by 10 times the float current due to system noise currents.
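The arithmetic behind this comparison can be checked with a few lines of code. The sketch below uses the figures quoted in the text (0.5 to 1.5 milliamps per ampere-hour, a 100 ampere-hour cell and a 10 Amp sensor at 1% accuracy) and assumes, as the text does, that the accuracy is a percentage of the sensor's full-scale rating. The function names are hypothetical.

```python
def float_current_range_ma(capacity_ah, low_ma_per_ah=0.5, high_ma_per_ah=1.5):
    """Expected float current range in mA for a given capacity in Ah."""
    return capacity_ah * low_ma_per_ah, capacity_ah * high_ma_per_ah


def sensor_error_ma(full_scale_a, accuracy_pct):
    """Worst-case sensor error in mA, taking accuracy as % of full scale."""
    return full_scale_a * 1000.0 * accuracy_pct / 100.0


if __name__ == "__main__":
    low, high = float_current_range_ma(100)
    error = sensor_error_ma(10, 1)
    print(f"Float current: {low:.0f} to {high:.0f} mA")
    print(f"Sensor error:  +/- {error:.0f} mA")
    # If the error is as large as the signal, the reading cannot be trended.
    print("Usable for float current trending:", error < low / 10)
```

The sensor's 100 milliamp uncertainty is as large as the quantity being measured, so a change in float current could not be distinguished from measurement error.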

How does a battery monitoring system help to prevent unexpected failure?

The primary aim of a monitoring system is to prevent any of the problems laid out at the start of this paper. If attention is paid to what the monitor is telling you, faults can be identified before the battery can fail, the service life can be extended by detecting and changing out faulty cells and the costs of manual maintenance can be significantly reduced.

Even the costs of capacity testing can be reduced; the monitor will record the voltages of the individual cells and the string currents, saving time and resource. Additionally, in any discharge and subsequent recharge the behaviour of the string currents in a multi-string battery is an important indicator. Many failure modes can be detected by observing how closely the individual string currents track each other, and this is a basic function of a monitoring system.
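One simple way to express how closely string currents track each other is to compare each string with the median of all strings at the same instant. The sketch below is an assumption about how such a check might be written, not a description of any particular monitoring product; the function names and the 10% deviation limit are illustrative only.

```python
import statistics


def string_deviation_pct(string_currents_a):
    """Percentage deviation of each string current from the median string."""
    median = statistics.median(string_currents_a)
    if median == 0:
        return [0.0 for _ in string_currents_a]
    return [(amps - median) / median * 100.0 for amps in string_currents_a]


def strings_out_of_step(string_currents_a, max_deviation_pct=10.0):
    """Return the numbers of strings deviating by more than the assumed limit."""
    deviations = string_deviation_pct(string_currents_a)
    return [
        string_no
        for string_no, dev in enumerate(deviations, start=1)
        if abs(dev) > max_deviation_pct
    ]


if __name__ == "__main__":
    healthy = [50.0, 49.0, 51.0]    # discharge currents in amps, one per string
    suspect = [60.0, 60.0, 30.0]    # string 3 is not sharing the load
    print("Healthy battery:", strings_out_of_step(healthy))
    print("Suspect battery:", strings_out_of_step(suspect))
```

The median is used rather than the mean so that one weak string does not shift the reference enough to make the healthy strings look faulty. With only two strings the two measures are the same, and both strings would be flagged.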

Safety

A continuous monitoring system contributes to the safety of technical personnel. It is a statutory requirement that two persons must be present in a higher voltage battery room at all times. By decreasing the requirement for manual maintenance intervention, a battery monitor reduces resource costs and also reduces the opportunity for accidents.