Benefits And Shortfalls Of 5 Reliability Techniques For Analyzing Fault Tolerance

May 14, 2023

Critical manufacturing systems need reliability more than ever. When systems fail, the consequences can be disastrous — from financial losses and injuries to loss of life.

That is why reliability techniques for analyzing fault tolerance are essential. With so many reliability techniques available, it can be difficult to determine which one is best suited for your needs.

In this article, we'll examine five different reliability techniques, each with its benefits and shortfalls. Let's explore these techniques to improve your understanding of fault tolerance analysis.

FMEA (Failure mode and effect analysis)

Failure mode and effect analysis (FMEA) is a technique for exploring how systems fail and how those failures affect processes or other systems. The methodology uses past process data and the operating conditions of manufacturing systems to uncover deficiencies or conditions that cause failures. Companies use FMEA to reduce the probability of system failures and, where possible, eliminate their causes.

Depending on the company's goals, FMEA can take many forms. A company may adopt:

  • design FMEA — identifies how different product designs fail throughout their life cycles
  • process FMEA — explores the probability of failures and their effects at each manufacturing stage.
  • functional FMEA — evaluates all the potential process failures in the production system and identifies ways to avoid them.

FMEA is a critical technique for:

  • maximizing the reliability of manufacturing systems, improving the quality of products and process safety
  • preventing failures and errors in operation — this reduces the impact of risks associated with system failures
  • enhancing customer satisfaction due to high-quality products
  • lowering operational costs since failures occur rarely.

There are a few challenges associated with the FMEA technique. It is a resource- and time-intensive process. The results may suffer from bias, depending on the expertise and skills of the analysis team. Making assumptions, overlooking failures, and relying on outdated process data can all reduce the accuracy of the analysis.
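One common way FMEA teams decide which failure modes to address first is a risk priority number (RPN), the product of severity, occurrence, and detection scores. The sketch below is illustrative only: the article does not prescribe a scoring scheme, so the 1 to 10 scales, the `FailureMode` class, and the conveyor-line failure modes are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (negligible) to 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (rare) to 10 (frequent)
    detection: int   # assumed scale: 1 (easily detected) to 10 (rarely detected)

    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_failure_modes(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical failure modes for a conveyor line (scores are made up).
modes = [
    FailureMode("belt misalignment", severity=4, occurrence=6, detection=3),
    FailureMode("motor overheating", severity=8, occurrence=3, detection=5),
    FailureMode("sensor drift", severity=5, occurrence=4, detection=8),
]

for mode in rank_failure_modes(modes):
    print(f"{mode.name}: RPN = {mode.rpn()}")
```

Because the scores come from the analysis team's judgment, the ranking inherits the bias and data-quality issues described above; the RPN is a way to order the discussion, not an objective measurement.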

Reliability block diagram

Reliability block diagrams (RBD) employ graphical elements and mathematical models to calculate the reliability of production systems. The analysis breaks down complex systems into subsystems or components to simplify the calculation. Each block in the diagram represents a component or subsystem of the manufacturing process.

Reliability block diagrams highlight the relationships between systems (in the form of blocks), and their contribution to the system’s overall reliability. RBD is beneficial for analyzing parallel and series systems.
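For series and parallel arrangements, the calculation behind an RBD can be sketched in a few lines. This is a minimal example that assumes blocks fail independently; the function names and the reliability figures for the feeder, pumps, and packer are hypothetical.

```python
from math import prod


def series(reliabilities):
    """Series blocks: the system works only if every block works."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Parallel (redundant) blocks: the system works if at least one block works."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical line: a feeder, two redundant pumps, and a packer in series.
feeder, pump, packer = 0.98, 0.90, 0.95

pump_stage = parallel([pump, pump])           # redundancy raises 0.90 to about 0.99
line = series([feeder, pump_stage, packer])   # the weakest stages dominate the product

print(f"Pump stage reliability: {pump_stage:.4f}")
print(f"Line reliability: {line:.4f}")
```

In a series chain every added block lowers overall reliability, while a parallel block raises the reliability of its stage, which is why an RBD makes weak points and the value of redundancy easy to see.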

Companies can use RBD results to project the performance of manufacturing systems. They can also identify high-risk production assets and where they sit in a process. The benefits of RBD are:

  • better understanding of system reliability — companies can identify opportunities to improve processes by identifying weak points due to dependencies between components or subsystems.
  • improved decision-making — the quantitative analysis provides decision-makers with sufficient information for reorganizing production floors, investing in redundant production systems, and streamlining maintenance programs.
  • better communication — manufacturers can easily explain system reliabilities and the impacts of different components to employees, stakeholders, and outsourced service providers. Improved communication ensures seamless production.

There are several shortcomings associated with RBD analysis. The reliability technique depends heavily on process data. Companies must guarantee the accuracy of process data, or the analysis is inconclusive. RBD rarely explores the impacts of external factors like human error or environmental factors on system reliability. This fault analysis technique also disregards failure modes of different components and their effects on system reliability.

Fault tree analysis

Fault tree analysis (FTA) is another mathematical and graphical reliability technique widely used by manufacturers. It is a top-down analytical process evaluating how failures move through the system.

The FTA technique aims to identify component-specific failures so that engineers can improve system designs and prevent those failures from escalating into system-wide breakdowns. Reliability engineers work backward from a system failure to its root causes to estimate how likely a piece of production equipment is to fail. Fault tree analysis uses logic gates and event symbols to indicate the relationships between component failures and their causes.
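The logic gates mentioned above can be evaluated numerically once each basic event has a probability. The following sketch assumes the basic events are independent and uses a made-up tree (a cooling system with two pumps and a controller); the gate functions and probabilities are illustrative, not taken from any standard tool.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree. Top event: "cooling lost".
# Cooling is lost if both pumps fail (AND gate) or the controller fails (OR gate).
p_pump_a = 0.05
p_pump_b = 0.05
p_controller = 0.01

p_both_pumps = and_gate([p_pump_a, p_pump_b])
p_top = or_gate([p_both_pumps, p_controller])

print(f"P(both pumps fail) = {p_both_pumps:.4f}")
print(f"P(cooling lost)    = {p_top:.6f}")
```

In this made-up tree the single controller contributes more to the top event than the redundant pump pair, which is the kind of insight that guides where to add redundancy.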

Figure: FTA symbols for events

The benefits of FTA are:

  • improved design of fault-tolerant production systems — FTA helps engineers identify and select the most appropriate redundant systems to prevent production breakdowns, which cost manufacturers over $1.5 trillion annually
  • enhanced development and improvement of risk mitigation measures in manufacturing systems
  • more systematic and structured approach to identifying the root causes of system failures.

FTA is time-consuming, especially when dealing with large, complex systems, and it requires experienced analysts to ensure the analysis captures all the potential causes of system failures. FTA also examines one specific top event at a time and may not consider the impacts of external factors on system reliability.

Event tree analysis

Event tree analysis (ETA) uses the same logic and the same mathematical and graphical representations as FTA. The difference is the direction of the analysis: FTA works backward from a failure to its causes, while ETA works forward from an initiating event to its consequences for system reliability. ETA is a vital tool for risk management and safety improvements in manufacturing facilities.

The technique identifies potential hazards or events that might trigger failures in production systems. The graphical model then highlights multiple outcomes accompanying these hazards. Each potential outcome is assigned a severity factor to streamline the implementation of the appropriate risk mitigation measures. ETA uses Boolean logic to assign probabilities and severity scores to consequences following the trigger of a particular event or hazard.
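A simple event tree can be enumerated by multiplying probabilities along each branch. The sketch below assumes independent protective barriers, each of which either works or fails; the `event_tree` helper, the initiating-event probability, and the barrier success rates are hypothetical. Severity scoring is left out to keep the example short.

```python
from itertools import product


def event_tree(initiating_probability, barriers):
    """Enumerate every branch of an event tree.

    barriers: list of (name, probability that the barrier works).
    Returns a dict mapping a tuple of True/False results (one per barrier,
    in order) to the probability of reaching that outcome.
    """
    outcomes = {}
    for branch in product([True, False], repeat=len(barriers)):
        probability = initiating_probability
        for (_name, p_success), worked in zip(barriers, branch):
            probability *= p_success if worked else 1.0 - p_success
        outcomes[branch] = probability
    return outcomes


# Hypothetical initiating event: a machine overheats (probability 0.02 per year).
barriers = [("alarm", 0.95), ("automatic shutdown", 0.90)]
outcomes = event_tree(0.02, barriers)

for branch, probability in outcomes.items():
    labels = ", ".join(
        f"{name} {'works' if worked else 'fails'}"
        for (name, _), worked in zip(barriers, branch)
    )
    print(f"{labels}: {probability:.4f}")
```

The outcome probabilities always sum to the probability of the initiating event, which is a quick consistency check on any hand-built tree.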

The benefits of ETA include:

  • not limited to equipment-related events — this makes it beneficial for evaluating the impacts of external factors on the reliability of manufacturing systems
  • provides a logical means to evaluate hazards arising from sequential system or component failures
  • enables companies to evaluate the effectiveness and shortcomings of existing protective and risk mitigation systems.

ETA might be ineffective for systems where multiple events occur at once. Analyzing such systems requires several overlapping event trees, and the combined results might be inconclusive. ETA also assumes all events are independent and disregards common-cause failures.

Markov analysis

Markov analysis employs a statistical approach to evaluate system reliability. The technique relies on the current state of a system to predict its future states. The statistical model analyzes the performance of individual components and predicts system transitions from one state to another over time. Reliability analysis using the Markov technique takes into account the fact that systems degrade and fail over time.

Markov analysis defines the state of different production systems and their operating characteristics. It uses specific transition matrices, representing the probability of a system changing states. Calculating the steady-state probabilities of each system allows companies to develop effective maintenance programs for improved system reliability.
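As a rough illustration of transition matrices and steady-state probabilities, the sketch below steps a small discrete-time chain until the state distribution stops changing. The three states, the transition probabilities, and the helper names are assumptions chosen for the example, not data from a real machine.

```python
def step(distribution, matrix):
    """Advance the state distribution by one time step."""
    size = len(matrix)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(size))
        for j in range(size)
    ]


def steady_state(matrix, tolerance=1e-12, max_steps=100_000):
    """Approximate the steady-state distribution by repeated stepping."""
    distribution = [1.0] + [0.0] * (len(matrix) - 1)  # start in state 0
    for _ in range(max_steps):
        updated = step(distribution, matrix)
        if max(abs(a - b) for a, b in zip(updated, distribution)) < tolerance:
            return updated
        distribution = updated
    return distribution


# Hypothetical machine. Row i holds the probabilities of moving from state i
# to each state in one time step, so every row sums to 1.
states = ["operating", "degraded", "failed"]
transition = [
    [0.90, 0.08, 0.02],  # operating: mostly stays up, sometimes degrades or fails
    [0.00, 0.85, 0.15],  # degraded: never recovers on its own, may fail
    [0.60, 0.00, 0.40],  # failed: repair returns it to operating
]

long_run = steady_state(transition)
for name, share in zip(states, long_run):
    print(f"{name}: {share:.3f}")
```

With these made-up numbers the machine spends roughly 59% of its time operating, 31% degraded, and 10% failed in the long run. Changing the repair or degradation probabilities shows how a different maintenance policy would shift those shares.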

The advantages of the Markov analysis technique are:

  • vital for modeling the reliability of complex systems with multiple failure modes
  • provides accurate reliability analysis of systems over time
  • analysis results are crucial for establishing and improving maintenance programs as equipment ages.

Markov analysis has its own shortcomings. The approach assumes the probability of a component changing state is constant over time, which does not hold for all systems. The technique also does not capture every complexity of a production system; for example, it may disregard inter-component dependencies or interactions.

How to choose the proper technique

Before choosing a reliability technique, companies should identify specific operational requirements and the associated production goals. Some factors to consider are:

  • The type of product and manufacturing process — the type of product manufactured dictates the manufacturing process and the desired reliability standards. Manufacturing processes with plenty of human interventions will benefit from the FMEA technique rather than a statistical approach, such as the Markov analysis.
  • Data availability — reliability analysis is data-driven. Each method requires adequate access to process data. Reliability analysis evaluates historical maintenance data, previous failures, and equipment usage patterns. Some techniques require more than one type of process data. Verify the quantity and quality of available data before choosing a reliability technique. Companies can optimize data management by leveraging digital solutions like computerized maintenance management systems (CMMS).
  • Time, resource, and financial constraints — reliability analysis should yield actionable results. Some techniques are more resource-intensive than others, while some take time. Ensure the company has adequate financial and technical resources to complete the analysis.
  • Regulatory requirements — the choice of reliability technique is influenced by industry standards and requirements.

Selecting the correct reliability analysis technique helps manufacturers complete fault analysis faster and at lower cost. It also avoids unnecessary complexity that may increase faults and process stops. This gives the company adequate time to implement changes and prevent system failures.


Reliability analysis is essential for designing fault-tolerant systems and eliminating bottlenecks that lower the productivity of manufacturing systems. Manufacturers can choose from different reliability analysis techniques depending on which goals they intend to achieve. Getting the fault tolerance analysis right informs operational changes to streamline manufacturing.

Leveraging digital solutions, including the ever-growing CMMS solutions market segment, enhances data management. Access to relevant process data improves the accuracy and effectiveness of reliability analysis programs.

Engage an experienced fault analysis specialist to audit manufacturing processes, conduct system analysis, and recommend process changes. Such specialists can complete fault tolerance analysis faster and draw on their past experience to overcome challenges that arise during the analysis.

Author: Bryan Christiansen

Bryan Christiansen is the founder and CEO of Limble CMMS. Limble is a modern, easy-to-use mobile CMMS software that takes the stress and chaos out of maintenance by helping managers organize, automate, and streamline their maintenance operations.



Copyright 2006-2024 by Modern Analyst Media LLC