We test our household smoke alarms every month and do an office fire alarm drill every week. We rely on such protective devices and need them to be effective when they are called into use. Your complex equipment will have many such protective devices, and they need to be tested. In fact, such tests may make up the bulk of your scheduled maintenance. But how often do we test?
Redundancy, availability, reliability and the cost of failure
When calculating the failure-finding interval, you need to consider how the protective devices and protected functions are configured. With household smoke alarms, we are encouraged to have at least two, to provide redundancy. Multiple safety valves on a pressure vessel provide the same kind of redundancy.
Alternatively, one device can protect many things. A single electrical surge protector may protect many items of electrical equipment: one protective device covering many protected functions.
So to calculate the interval, we need to consider:
- Redundancy: How many redundant protective devices do we have in the system (e.g. two smoke alarms)?
- Availability: What level of availability of protection do we demand? This is essentially stating what level of risk you will accept. For the surge protector, is 5% downtime an acceptable risk?
- Reliability: We need to consider the reliability of both the protective device and the protected function. For my car's dashboard low-oil light, how often does the light fail, and how often does the engine run low on oil? For both, we use the Mean Time Between Failures (MTBF) as the measure of reliability.
- Cost of failure: Finally, we often need to consider the cost of failure. If the protective device is unavailable and the protected function fails, what is the cost? Consider a quality control unit on a production line. If that unit fails, the production line could continue operating and everything would look fine, but every widget produced could be faulty.
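To make the redundancy and availability factors concrete, here is a minimal Python sketch. It assumes the redundant devices fail independently of each other (an assumption of this example, not something stated above), and the function names are hypothetical. Under that assumption, protection is lost only when every device is unavailable at the same time, so the combined unavailability is the single-device figure raised to the power of the number of devices.

```python
def combined_unavailability(u_single: float, n: int) -> float:
    """Fraction of time that all n devices are unavailable at once.

    Assumes the devices fail independently (illustrative assumption).
    """
    if not 0.0 <= u_single <= 1.0:
        raise ValueError("u_single must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return u_single ** n


def downtime_days_per_year(u: float, days_per_year: float = 365.0) -> float:
    """Convert an unavailability fraction into days of lost protection per year."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must be between 0 and 1")
    return u * days_per_year


if __name__ == "__main__":
    # One surge protector that is unavailable 5% of the time.
    print(combined_unavailability(0.05, 1))
    # Two independent smoke alarms, each unavailable 5% of the time.
    print(combined_unavailability(0.05, 2))
    # What 5% unavailability means in days per year.
    print(downtime_days_per_year(0.05))
```

With 5% unavailability per device, a second independent device brings the combined figure down to roughly 0.25%. On its own, 5% unavailability corresponds to a little over 18 days a year without protection.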
Some Common Approaches
Terms
| Term | Meaning |
| --- | --- |
| Mtive | Mean time between failures of the protective device |
| Mted | Mean time between failures of the protected function |
| Mmf | Acceptable mean time between multiple failures (both protective and protected functions fail) |
| U | Acceptable fraction of time that protection is unavailable (expressed as a decimal, e.g. 0.05 for 5%) |
| n | Number of redundant protective devices |
| Cmf | Cost of a multiple failure (both protective and protected functions fail) |
| Cff | Cost of performing failure finding |
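The terms Mted, U and Mmf are linked by a simple relationship if the protective device and the protected function fail independently (an assumption of this sketch). A multiple failure occurs only when the protected function fails while protection happens to be unavailable, so Mmf is approximately Mted divided by U. The Python below illustrates this with hypothetical function names and made-up example figures.

```python
def mean_time_between_multiple_failures(m_ted: float, u: float) -> float:
    """Approximate Mmf from Mted and U, assuming independent failures."""
    if m_ted <= 0:
        raise ValueError("m_ted must be positive")
    if not 0.0 < u <= 1.0:
        raise ValueError("u must be greater than 0 and at most 1")
    return m_ted / u


def required_unavailability(m_ted: float, m_mf: float) -> float:
    """Largest U that still meets a target Mmf (same independence assumption)."""
    if m_ted <= 0:
        raise ValueError("m_ted must be positive")
    if m_mf < m_ted:
        raise ValueError("m_mf cannot be shorter than m_ted")
    return m_ted / m_mf


if __name__ == "__main__":
    # Made-up figures: the protected function fails once every 4 years
    # and protection is unavailable 5% of the time.
    print(mean_time_between_multiple_failures(m_ted=4.0, u=0.05))
    # Made-up target: at most one multiple failure per 1000 years.
    print(required_unavailability(m_ted=4.0, m_mf=1000.0))
```

In this example, 5% unavailability gives a multiple failure about once every 80 years. Tightening the target to once per 1000 years requires the unavailability to fall to about 0.4%.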