In the technique known as "failure modes and effects analysis", an engineer starts with a block diagram of a system and considers what happens if each block of the diagram fails. The engineer then draws up a table in which failures are paired with their effects and an evaluation of those effects. The design of the system is then corrected, and the table adjusted, until the system is not known to have unacceptable problems. Of course, the engineers could make mistakes, so it is very helpful to have several engineers review the failure modes and effects analysis.
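The tabulation step described above can be sketched as a simple data structure. This is only an illustration: the blocks, failure modes, and the 1–10 severity scale here are invented for the example, not taken from any particular FMEA standard.

```python
# Minimal sketch of a failure modes and effects analysis (FMEA) table.
# The components, failure modes, and severity rankings are illustrative only.

fmea = [
    # (block, failure mode, effect, severity 1-10, proposed correction)
    ("pump",   "bearing seizure", "loss of coolant flow", 9, "add standby pump"),
    ("sensor", "stuck reading",   "undetected overheat",  7, "add second sensor"),
    ("valve",  "fails open",      "uncontrolled flow",    5, "add downstream shutoff"),
]

def worst_failures(table, threshold):
    """Return entries whose severity meets or exceeds the threshold,
    worst first, so they get corrected before less severe ones."""
    return sorted((row for row in table if row[3] >= threshold),
                  key=lambda row: row[3], reverse=True)

for block, mode, effect, severity, fix in worst_failures(fmea, 7):
    print(f"{block}: {mode} -> {effect} (severity {severity}); {fix}")
```

In practice the table would also carry likelihood and detectability rankings, and the review loop would repeat until no entry above the acceptable threshold remains.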
In the technique known as "fault tree analysis", an undesired effect is taken as the root of a tree of logic. Each situation that could cause that effect is then added to the tree as a series of logic expressions. When fault trees are labeled with real failure-probability numbers (often unavailable because of the expense of testing), computer programs can calculate failure probabilities from them. The classic computer program is the Idaho National Engineering and Environmental Laboratory's SAPHIRE, which is used by the U.S. government to evaluate the safety and reliability of nuclear reactors, the Space Shuttle, and the International Space Station.
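The probability calculation can be illustrated with the standard gate rules for independent basic events: an AND gate multiplies its input probabilities, and an OR gate is one minus the product of the complements. The tree and the numbers below are made up for the sketch; a real tool such as SAPHIRE handles far larger trees, dependencies, and uncertainty.

```python
# Minimal fault-tree evaluation, assuming independent basic events.
# Gate math: AND = product of inputs; OR = 1 - product of complements.
# The tree structure and probabilities below are illustrative, not real data.

def and_gate(*ps):
    out = 1.0
    for p in ps:
        out *= p
    return out

def or_gate(*ps):
    out = 1.0
    for p in ps:
        out *= (1.0 - p)
    return 1.0 - out

# Hypothetical top event: "coolant flow lost". It occurs if power fails,
# OR both of two redundant pumps fail.
p_power  = 1e-4
p_pump_a = 1e-3
p_pump_b = 1e-3

p_top = or_gate(p_power, and_gate(p_pump_a, p_pump_b))
print(f"P(top event) = {p_top:.2e}")
```

Note how the AND gate rewards redundancy: two 10⁻³ pumps contribute only about 10⁻⁶ to the top event, so the single power supply dominates the result.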
Usually a failure in safety-certified systems is acceptable if less than one life per 30 years of operation (10⁹ seconds) is lost to mechanical failure. Most Western nuclear reactors, medical equipment and commercial aircraft are certified to this level.
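A quick check of the figure above (the 365.25-day year is the only assumption):

```python
# 30 years of continuous operation, expressed in seconds.
seconds = 30 * 365.25 * 24 * 3600
print(f"{seconds:.2e}")  # about 9.5e8, i.e. on the order of 1e9 seconds
```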
Once a failure mode is identified, it can usually be corrected by adding equipment to the system. For example, nuclear reactors emit dangerous radiation and contain dangerous poisons, and nuclear reactions can generate so much heat that no substance can contain them. Therefore, reactors have emergency core cooling systems to keep the heat down, shielding to contain the radiation, and containments (usually several, nested) to prevent leakage.
For any given failure, a fail-over or redundancy can almost always be designed and incorporated into a system.
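One classic redundancy scheme is triple modular redundancy, in which three independent channels vote and any single channel failure is outvoted. This sketch assumes boolean channel outputs; real systems also need independent channels so the redundant copies do not share a common failure cause.

```python
# Triple modular redundancy: the majority of three independent channels
# masks any single channel failure. Channel values here are illustrative.

def majority_vote(a, b, c):
    """Return the value that at least two of the three channels agree on."""
    return (a and b) or (a and c) or (b and c)

# One faulty channel is outvoted by the two healthy ones.
assert majority_vote(True, True, False) is True
assert majority_vote(False, False, True) is False
```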
When adding equipment is impractical (usually because of expense), the design has to be made inherently safe, or "fail safe". The typical approach is to arrange the system so that ordinary single failures cause the mechanism to shut down in a safe way. For example, in an elevator the cable supporting the car holds spring-loaded brakes open. If the cable breaks, the brakes grab the rails and the car does not fall. Another common fail-safe system is the pilot-light sensor in most gas furnaces. If the pilot light is cold, a mechanical arrangement disengages the gas valve, so that the house cannot fill with unburned gas. Fail-safes are common in medical equipment, traffic and railway signals, communications equipment and safety equipment.
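The pilot-light interlock above can be expressed in software terms as a valve that defaults to closed and opens only while a fresh flame signal is present. The function names and the 2-second staleness limit are illustrative assumptions, not taken from any furnace standard; the point is that every failure path (no reading, stale reading) lands on the safe state.

```python
import time

# Fail-safe sketch: the gas valve stays closed unless a recent "flame seen"
# signal exists, so a dead sensor or stale data shuts the gas off.
# The 2-second staleness limit is an illustrative assumption.

FLAME_SIGNAL_TIMEOUT = 2.0  # seconds

def valve_should_open(last_flame_seen, now=None):
    """Open the valve only if the flame was confirmed recently.
    A missing or stale reading fails toward the safe (closed) state."""
    if last_flame_seen is None:
        return False
    if now is None:
        now = time.monotonic()
    return (now - last_flame_seen) < FLAME_SIGNAL_TIMEOUT

# A sensor that never reported, or reported too long ago, keeps the valve shut.
assert valve_should_open(None) is False
assert valve_should_open(last_flame_seen=0.0, now=10.0) is False
assert valve_should_open(last_flame_seen=9.5, now=10.0) is True
```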
Oddly enough, personality issues can be paramount for a safety engineer. They must be personally pleasant, intelligent and ruthless with themselves and their organization. In particular, they have to be able to "sell" the failures that they discover, along with the attendant expense and time needed to correct them.
Safety engineers have to be ruthless about getting facts from other engineers. It is common for a safety engineer to consider software, chemical, electronic, electrical, mechanical, procedural and training problems in the same day. Often the facts can be very uncomfortable.
It is important to make the safety engineers part of a team, so that safety problems cannot be discounted as the safety engineers' personality problems, or ignored by firing a single engineer.
It is a severe safety problem if an engineering team or management discredits a safety engineer: either the manager appointed a poor engineer to the position, indicating that there may be numerous undiscovered safety issues, or the team has inverted its development priorities and considers safety less important than upper management or government does.