Problem Management
Objective: Problem Management aims to manage the lifecycle of all Problems. The primary objectives of this ITIL process are to prevent Incidents from happening, and to minimize the impact of Incidents that cannot be prevented. 'Proactive Problem Management' analyzes Incident Records and uses data collected by other IT Service Management processes to identify trends or significant Problems.
Contents
- 1 ITIL 4 Problem Management
- 2 Process Description
- 3 Sub-Processes
- 4 Definitions
- 5 Templates | KPIs
- 6 Roles | Responsibilities
- 7 Notes
ITIL 4 Problem Management
The Problem Management process described here (fig. 1) follows the specifications of ITIL V3, where Problem Management is a process in the service lifecycle stage of Service Operation.
ITIL V4 is no longer prescriptive about processes but shifts the focus to 34 'practices', giving organizations more freedom to define tailor-made processes. ITIL 4 therefore refers to Problem Management as a service management practice, describing the key activities, inputs, outputs and roles. Based on this guidance, organizations are advised to design a process for managing Problems in line with their specific requirements.
Since the processes defined in ITIL V3 have not been invalidated with the introduction of ITIL V4, organizations can still use the ITIL V3 process of Problem Management as a template.
Note:
In our YaSM Service Management Wiki we describe a leaner set of 19 service management processes that are more in tune with ITIL 4 and its focus on simplicity and "just enough process".
The YaSM service management model includes a process for managing problems that is a good starting point for organizations that wish to adopt ITIL 4.
Process Description
Problem Management seeks to minimize the adverse impact of Incidents by preventing Incidents from happening. For Incidents that have already occurred, Problem Management tries to prevent these Incidents from happening again.
ITIL defines a "Problem" as "the underlying cause of one or more Incidents".
Problem Management works closely with Incident Management, but it is not the same:
- Incident Management is about restoring services as quickly as possible, often by applying temporary solutions.
- Problem Management is tasked with analyzing root causes and preventing Incidents from happening in the future.
All Problems should be logged as Problem Records, where their status can be tracked, and a complete historical record maintained. The categorization and prioritization of Problems should be harmonized with the approach used in Incident Management, to facilitate matching between Incidents and Problems.
The Problem Management process uses reactive as well as proactive approaches:
- Reactive Problem Management is triggered if issues are identified that require analysis and the deployment of a longer-term solution. For example, Problem Management may pick up an Incident, or a set of related Incidents, whose root cause could not be resolved during Incident Management, to prevent similar Incidents from recurring.
- Proactive Problem Management is an ongoing activity that tries to identify issues to prevent resulting Incidents from happening. For example, Problem Management will analyze Incident Records, operational logs etc. to find patterns and trends that may indicate the presence of underlying errors.
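The proactive approach can be illustrated with a minimal sketch that scans Incident Records for recurring patterns. The `IncidentRecord` fields, the `suggest_problems` function and the threshold of three Incidents are assumptions made for illustration; ITIL does not prescribe any particular data model or detection rule.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentRecord:
    incident_id: str
    category: str      # hypothetical category scheme shared with Problem Management
    config_item: str   # Configuration Item the Incident was logged against


def suggest_problems(incidents, threshold=3):
    """Return (category, config_item) pairs seen in at least `threshold` Incidents."""
    counts = Counter((i.category, i.config_item) for i in incidents)
    return sorted(key for key, n in counts.items() if n >= threshold)


incidents = [
    IncidentRecord("INC-1", "email", "mail-server-01"),
    IncidentRecord("INC-2", "email", "mail-server-01"),
    IncidentRecord("INC-3", "network", "switch-07"),
    IncidentRecord("INC-4", "email", "mail-server-01"),
]

candidates = suggest_problems(incidents)
print(candidates)  # [('email', 'mail-server-01')]
```

Each pair returned here would only be a suggested new Problem; whether it is formally logged remains a decision for Problem Management.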
Once a Problem has been identified and diagnosed, it becomes a "Known Error". If possible, Problem Management will provide a Workaround - a temporary solution that can be used for dealing with related Incidents while a permanent solution for the Problem is being developed.
When a final solution has been deployed, the Problem Record should be formally closed. This ensures that the Problem Record contains a full historical description and that all relevant records are updated.
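The lifecycle described above (a Problem is logged, becomes a Known Error once diagnosed, and is closed after a permanent solution is deployed) could be modelled as a small state machine. The status names, the `ALLOWED` transition table and the `ProblemRecord` fields below are assumptions for illustration only; a real tool would typically use more statuses.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    LOGGED = "logged"
    KNOWN_ERROR = "known error"
    CLOSED = "closed"


# Hypothetical transition rules: a Problem must be diagnosed before closure.
ALLOWED = {
    Status.LOGGED: {Status.KNOWN_ERROR},
    Status.KNOWN_ERROR: {Status.CLOSED},
    Status.CLOSED: set(),
}


@dataclass
class ProblemRecord:
    problem_id: str
    summary: str
    status: Status = Status.LOGGED
    workaround: Optional[str] = None
    history: list = field(default_factory=list)  # (old status, new status, note)

    def move_to(self, new_status, note):
        """Change status if the transition is allowed, keeping a historical trail."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot go from {self.status.value} to {new_status.value}"
            )
        self.history.append((self.status, new_status, note))
        self.status = new_status


record = ProblemRecord("PRB-1", "Recurring mail queue stalls")
record.move_to(Status.KNOWN_ERROR, "Root cause diagnosed")
record.workaround = "Restart the queue service"
record.move_to(Status.CLOSED, "Permanent fix deployed")
```

Recording every transition in `history` is one simple way to keep the full historical description that formal closure is meant to guarantee.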
Problem Management interfaces with a number of other ITIL processes:
- Problem Management provides information to the Incident Management process, such as Workarounds and Known Errors. Problem Management uses data collected during Incident resolution for Problem identification.
- Change Management may be invoked from Problem Management if a Change is needed to resolve a Problem.
- Configuration Management provides data used to identify Problems and link them to particular Configuration Items.
The process overview of 'ITIL Problem Management' (fig. 1) shows the key information flows and process interfaces.
Sub-Processes
These are the ITIL Problem Management sub-processes and their process objectives:
Proactive Problem Identification
- Process Objective: To improve overall availability of services by proactively identifying Problems. Proactive Problem Management aims to identify and solve Problems and/or provide suitable Workarounds before (further) Incidents occur.
Problem Categorization and Prioritization
- Process Objective: To record and prioritize the Problem with appropriate diligence, in order to facilitate a swift and effective resolution.
Problem Diagnosis and Resolution
- Process Objective: To identify the underlying root cause of a Problem and initiate the most appropriate and economical Problem solution. If possible, a temporary Workaround is supplied.
Problem and Error Control
- Process Objective: To constantly monitor outstanding Problems with regard to their processing status, so that corrective measures may be introduced where necessary.
Problem Closure and Evaluation
- Process Objective: To ensure that - after a successful Problem solution - the Problem Record contains a full historical description, and that related Known Error Records are updated.
Major Problem Review
- Process Objective: To review the resolution of a Problem in order to prevent recurrence and learn any lessons for the future. In addition, the review verifies whether Problems marked as closed have actually been eliminated.
Problem Management Reporting
- Process Objective: ITIL Problem Management Reporting aims to ensure that the other Service Management processes as well as IT Management are informed of outstanding Problems, their processing status and existing Workarounds (see "Problem Management Report").
Definitions
The following ITIL terms and acronyms (information objects) are used in the ITIL Problem Management process to represent process outputs and inputs:
Known Error
- A Known Error is a problem that has a documented root cause and a Workaround. Known Errors are managed throughout their lifecycle by the Problem Management process. The details of each Known Error are recorded in a Known Error Record stored in the Known Error Database (KEDB). As a rule, Known Errors are identified by Problem Management, but Known Errors may also be suggested by other Service Management disciplines, e.g. Incident Management, or by suppliers.
Known Error Database (KEDB)
- The Known Error Database (KEDB) is created by Problem Management and used by Incident and Problem Management to manage all Known Error Records.
Problem
- A cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created.
Problem Management Report
- A report supplying Problem-related information to the other Service Management processes.
Problem Record
- The Problem Record contains all details of a Problem, documenting the history of the Problem from detection to closure (see: ITIL Checklist Problem Record).
Suggested new Known Error
- A suggestion to create a new entry in the Known Error Database, for example raised by the Service Desk or by Release Management. Known Errors are managed throughout their lifecycle by Problem Management.
Suggested new Problem
- A notification about a suspected Problem, handed over to Problem Management for further investigation, possibly leading to the formal logging of a Problem.
Suggested new Workaround
- A suggestion to enter a new Workaround in the Known Error Database, for example raised by the Service Desk or by Release Management. Workarounds are managed throughout their lifecycle by Problem Management.
Workaround
- Workarounds are temporary solutions aimed at reducing or eliminating the impact of Known Errors (and thus Problems) for which a full resolution is not yet available. As such, Workarounds are often applied to reduce the impact of Incidents or Problems if their underlying causes cannot be readily identified or removed.
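As an illustration of how the definitions above fit together, the following sketch shows a tiny in-memory Known Error Database that Incident Management could query for a matching Known Error and its Workaround. The class names, the keyword-based `symptoms` field and the matching rule are assumptions, not part of ITIL; real KEDB tools usually offer richer search.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownErrorRecord:
    known_error_id: str
    symptoms: frozenset          # hypothetical: lowercase keywords describing the error
    root_cause: str
    workaround: Optional[str] = None


class KnownErrorDatabase:
    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.known_error_id] = record

    def find_matches(self, incident_description):
        """Return records whose symptom keywords all appear in the description."""
        words = set(incident_description.lower().split())
        return [r for r in self._records.values() if r.symptoms <= words]


kedb = KnownErrorDatabase()
kedb.add(KnownErrorRecord(
    "KE-1",
    frozenset({"mail", "queue"}),
    root_cause="Disk quota exceeded on spool volume",
    workaround="Restart the queue service",
))

matches = kedb.find_matches("Mail queue stalled again on Monday")
for m in matches:
    print(m.known_error_id, "->", m.workaround)
```

A match gives the Service Desk a Workaround to apply immediately, while the permanent solution stays with Problem Management.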
Templates | KPIs
- Key Performance Indicators (KPIs) Problem Management
- Problem Management templates and checklists:
  - Checklist Problem Record
  - Checklist Closure of a Problem
  - Problem Report template
Roles | Responsibilities
Problem Manager - Process Owner
- The Problem Manager is responsible for managing the lifecycle of all Problems.
- The Problem Manager's primary objectives are to prevent Incidents from happening, and to minimize the impact of Incidents that cannot be prevented.
- To this end, the Problem Manager maintains information about Known Errors and Workarounds.
Responsibility Matrix: ITIL Problem Management
ITIL Role / Sub-Process | Problem Manager | Applications Analyst[3] | Technical Analyst[3] |
Proactive Problem Identification | A[1]R[2] | - | - |
Problem Categorization and Prioritization | AR | - | - |
Problem Diagnosis and Resolution | AR | R | R |
Problem and Error Control | AR | - | - |
Problem Closure and Evaluation | AR | - | - |
Major Problem Review | AR | - | - |
Problem Management Reporting | AR | - | - |
[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Problem Management process.
[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Problem Management.
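The responsibility matrix above can also be kept as data, which makes it easy to query and to check for consistency. The sketch below transcribes the matrix into a dictionary; the helper names `roles_with` and `has_single_accountable` are hypothetical, and the "exactly one Accountable role per sub-process" check reflects a common RACI convention rather than anything stated in this article.

```python
# Transcription of the responsibility matrix; roles marked "-" are omitted.
RACI = {
    "Proactive Problem Identification": {"Problem Manager": "AR"},
    "Problem Categorization and Prioritization": {"Problem Manager": "AR"},
    "Problem Diagnosis and Resolution": {
        "Problem Manager": "AR",
        "Applications Analyst": "R",
        "Technical Analyst": "R",
    },
    "Problem and Error Control": {"Problem Manager": "AR"},
    "Problem Closure and Evaluation": {"Problem Manager": "AR"},
    "Major Problem Review": {"Problem Manager": "AR"},
    "Problem Management Reporting": {"Problem Manager": "AR"},
}


def roles_with(sub_process, letter):
    """Return the roles holding the given RACI letter for a sub-process."""
    return sorted(
        role for role, codes in RACI[sub_process].items() if letter in codes
    )


def has_single_accountable(matrix):
    """Check that every sub-process has exactly one Accountable role."""
    return all(
        sum("A" in codes for codes in row.values()) == 1
        for row in matrix.values()
    )


print(roles_with("Problem Diagnosis and Resolution", "R"))
print(has_single_accountable(RACI))
```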
Notes
By: Stefan Kempter, IT Process Maps.