Thursday, June 11, 2009

Using Root Cause Analysis for Process Improvement

A few weeks ago, I discussed the idea of Root Cause Analysis. The approach there works well when you want to analyze each individual issue and mitigate it. If you start to do this often, you'll find that to be somewhat unweildy. Further, you may want to see trends in common root causes over a time period.

I'll discuss how to take this information and do some basic analysis to find trends. That trend information can help identify the most common causes and most common areas that cause issues.

In order to do this tracking we need to generalize and standardize some of the fields already tracked. For example, the specification of "Function" and "Cause" should come from their own respective lists. This will allow these to be grouped and each occurance counted for analysis. Additionally, you may want to add other aspects such as version, or sub-function to do analysis on.

If you plan on identifying multiple causes for a given issue, one way to identify that is to replicate the row in the spreadsheet for each subsequent cause so that one individual issue will have one row for each identified cause.

You can open a sample spreadsheet in OpenOffice format and follow along if you would like to see how to take this spreadsheet and do some basic analysis.

This is a sample of not just a single application, but multiple applications. The Root Cause Analysis has already been performed and we're ready for the analysis.

Let's assume we want to count the causes by application feature (called Issue Area in the spreadsheet). In Open Office, you want to use the "Data Pilot" feature. (In Excel, it's called Pivot Table and Pivot Chart).

You'll get the configuration for the Data Pilot (This is nearly exactly how Excel does it as well)

Drag the "Root Cause" button to the "Row fields" area, the "Issue Area" button to the "Column fields" area and Issue ID to the "Data Fields" area.

By default, Open Office wants to sum the data field, we'll want to change that to count, so when we click on the "data field" area, we are prompted to select the function we want to perform. Let's select 'count'.

I don't prefer the default graph, so I delete it, select the data fields without the totals and create a new graph of type 'stacked'.

I can repeat that for cause count by version, cause count by application, overall cause count, etc.

Once I have all my graphs, then I can look for the things that are in most need of change. From the graph above (and reading the issue spreadsheet attached), you'll see that there was a huge issue for login that caused all kinds of havoc. The login issue was related to the main domain server going down due to a faulty network card. Depending on your situation, this may or may not be something that can be mitigated by process change. However, it does give a sense of where the most issues lie and justifies the need for specific changes.

Good luck!

No comments: