Deconstructing ITSM

19 February 2009

Terminology and Taxonomies

This is fun: Vinod Agrasala is refining terminology such as Purpose, Goal and objective, Policy, Process & Procedure, Standards & Guidelines, and Assessment, Gap analysis and Audit; and the IT Skeptic is looking at the taxonomy of ITIL V3 Incidents and a list of Request Classes (caution: with all the comments those pages are around 8,000 and 3,000 words).

Vinod:

I am in a focused drive of differentiating between confused terms

The Skeptic:

I think ITIL V3 muddies the definition of Incident, and of Incident Management.

As I commented on Vinod’s blog, I think these discussions are unlikely to end with everyone agreeing, and it may be pointless to expect agreement. But it’s still important to have the discussion – and vital for any given team or organisation to have a clear common understanding. There are dangers in labelling two distinct concepts with one word: it becomes harder to work with them (e.g. to define workflow) if you’re not aware of the multiple meanings or if different team members interpret them differently. Equally, if some important concepts are overlooked because there’s no term for them, the ITSM capability will not be engineered to cope with them.

People hope that ITIL, or other applicable sources of best practice, have already provided clear unarguable definitions. But no matter how confidently you restate those definitions, they are simply not precise enough to classify and distinguish the important concepts. See the above links for examples, or look at your favourite ITSM forum for questions like “is this an incident or a service request?”

Deconstructing Terminology

The “Deconstructing ITSM” approach would be this: first, try to define what the important concepts are. The concepts are more real than the words used by various speakers, however authoritative. Humpty-Dumpty fashion, words mean only what they are used to mean, and not all speakers are able to say what they mean by a word as precisely as Mr Dumpty. Also, the concepts as they exist in your organisation, for your business needs, are more real than those in some global consensus.

  • Make sure you know your purpose. Things can only be important in a context, like drafting a specific process, customising a tool for a workflow, etc.
  • Identify what things can happen, or exist, or need to be managed, or recorded, etc.
    • Since we’re talking about abstract concepts not physical objects you can drop on your toe, deciding what is one thing – when is a thing the same as another thing – isn’t easy. An incident logged on a Tuesday vs an incident logged on a Wednesday – there’s no significant difference. What is important is to distinguish details that matter, that make a difference for our purpose. Things that have affected user services are clearly different from things that haven’t (yet).
    • Some distinctions or ways of dividing up concepts seem universal and powerful: the Skeptic calls these natural fault-lines. Some won’t seem so natural, but don’t lump them together until you’ve properly tested them from various angles. It will be easy to group them together later.
  • Define the things as well as you can, without of course using the words we’re trying to define. This can get hairy. We are making a list of concepts that have lost their names (didn’t that happen to Alice, too?) and may not ever have had good names. The important thing is to capture the significant differences between your concepts, and if possible to capture everything that matters about them for your current purpose.
    • In case this is making no sense at all, here is the list of things that might be incidents or problems or something close, which I posted on the Skeptic’s blog:
      • Disruption of the current value of service to user (this is the undisputed part of ITIL’s Incident)
      • An abnormal condition in some part of the managed environment (infrastructure, applications, their configuration, …) (I think this is close to “Known Error”)
      • An abnormal symptom not yet identified (fuzzily worded, but I think it’s a concrete concept – close to “Problem”)
      • An identified risk of an abnormal condition emerging in the future (a “Problem”? or is this where ITIL fails to cover Risk? It doesn’t properly match ITIL’s definition “cause of one or more incidents”)
      • An identified risk to the future value of service to the user (a disputed part of “Incident”? a “Problem”?)
      • [By abnormal I mean different from what it's supposed to be.]
    • In fact I think even this list isn’t fine-grained enough. There’s a significant difference between “an abnormal condition” that has only been reported (perhaps an “Event”) and one which we’ve decided has a real risk or impact.
    • It’s not shown in this list, but you can group the various concepts into a hierarchy or taxonomy, again using the significant distinctions between concepts.
  • Now, map your concepts to useful terms. The more confusion there is in external sources, the more careful you will have to be. And you might have to change it later, or explain it at length to colleagues. Do not throw away your scrap paper. Show your working. Write your name at the top of every page. Do not attempt to write on more than both sides of the paper.
  • Publish this to your team and make sure the definitions are clear enough that people can understand them while working.
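The mapping step lends itself to a simple mechanical check. The following is a minimal sketch, not a prescribed tool: the function name `check_glossary` and the draft mapping are hypothetical, and the mapping deliberately reproduces one disputed usage from the list above purely to show what the check reports. It flags the two dangers mentioned earlier: a concept with no term, and one term covering several distinct concepts.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def check_glossary(
    concept_to_term: Dict[str, Optional[str]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (concepts with no term, terms that label more than one concept)."""
    unnamed: List[str] = []
    by_term: Dict[str, List[str]] = defaultdict(list)
    for concept, term in concept_to_term.items():
        if term is None:
            unnamed.append(concept)
        else:
            by_term[term].append(concept)
    overloaded = {t: cs for t, cs in by_term.items() if len(cs) > 1}
    return unnamed, overloaded


# Hypothetical draft mapping, for illustration only.
draft = {
    "disruption of the current value of service": "Incident",
    "identified risk to the future value of service": "Incident",  # disputed usage
    "abnormal condition in the managed environment": "Known Error",
    "abnormal symptom not yet identified": "Problem",
    "identified risk of an abnormal condition emerging": None,  # no term yet
}

if __name__ == "__main__":
    unnamed, overloaded = check_glossary(draft)
    print("No term:", unnamed)
    print("Overloaded:", overloaded)
```

A check like this only finds gaps and collisions; deciding which concepts matter is still the hard part.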

Incident / Problem Definitions

Right, I suppose I ought to show what I’d do with the above concepts.

The important distinctions identified are: whether it affects user service or not (and that can be ‘no’, ‘yes but still within SLA’, ‘SLA breached’, but let’s ignore that for now); whether it has occurred or is a risk for the future; whether the effect (symptom, I wrote above) is linked to a known cause (condition). The risk fault-line is the one that ITIL terminology doesn’t help us with.

These are my definitions, for my current purpose (which is writing a blog article!):

  • Incident: a disruption of the current value of the service to the user, breaching SLA
  • Potential Incident: an identified risk of specific disruption to value of service to user
  • Problem: a group of one or more Incidents or Potential Incidents with measurable symptoms, but without a Known Error causing it
  • Event: an abnormal condition in the managed environment (infrastructure, applications, configuration), i.e. a value of some attribute of some component or components is outside a normal range (including failure!)
  • Known Error: an abnormal condition that has been identified as the cause of one or more Incidents and/or the potential cause of one or more Potential Incidents (wordy, but actually I think this is pretty precise. Fault could work as the term for this)
  • Potential Error: an identified risk of a Known Error emerging in the future (where there is no “abnormal condition” at the moment, but perhaps a trend indicates it, or a technical risk analysis indicates it)

Having identified the “fault-lines” we can see that some concepts don’t have terms: potential problem? Between potential incident and potential error, I don’t think there is a use for a distinct “potential x” concept. And “potential event” – it would have a clear meaning, any attribute of any component going outside normal range – but these are not manageable entities. You could list them by listing the components, the attributes, and the normal ranges, but you couldn’t do anything with the list.
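One way to test definitions like these is to write the fault-lines down as fields and see whether each combination lands on exactly one term. This is only a sketch under stated assumptions: the names `Scope`, `Timing`, `Concept`, `term_for` and `is_problem` are invented for illustration, the SLA refinement is ignored (as it is set aside above), and the “measurable symptoms” part of Problem is not modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Scope(Enum):
    SERVICE = "value of service to the user"
    ENVIRONMENT = "managed environment"  # infrastructure, applications, configuration


class Timing(Enum):
    OCCURRED = "has occurred"
    RISK = "identified risk for the future"


@dataclass(frozen=True)
class Concept:
    scope: Scope
    timing: Timing
    # Only meaningful for an abnormal condition that has occurred: has it been
    # identified as the cause of an Incident or Potential Incident?
    identified_as_cause: bool = False


def term_for(c: Concept) -> str:
    """Map a position on the fault-lines to one of the terms defined above."""
    if c.scope is Scope.SERVICE:
        return "Incident" if c.timing is Timing.OCCURRED else "Potential Incident"
    if c.timing is Timing.RISK:
        return "Potential Error"
    return "Known Error" if c.identified_as_cause else "Event"


def is_problem(members: List[Concept], known_error_found: bool) -> bool:
    """A Problem groups one or more (Potential) Incidents with no Known Error."""
    return (
        len(members) > 0
        and all(m.scope is Scope.SERVICE for m in members)
        and not known_error_found
    )


if __name__ == "__main__":
    outage = Concept(Scope.SERVICE, Timing.OCCURRED)
    disk_full = Concept(Scope.ENVIRONMENT, Timing.OCCURRED, identified_as_cause=True)
    print(term_for(outage))                               # Incident
    print(term_for(disk_full))                            # Known Error
    print(is_problem([outage], known_error_found=False))  # True
```

Note that Problem does not sit on the same axes as the other terms: in this sketch it is a grouping of service-level concepts, which is why it gets its own function instead of a branch in `term_for`.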

Process / Procedure

To shift ground to Vinod’s work, I refuse to get drawn into the “goals” vs “objectives” debate, but I have a few fault-lines in the process/procedure discussion that I feel strongly about.

  • Processes must be capable of being broken down (decomposed) to any number of levels, and you can’t have a different term for each. So the terms “macro process” and “sub-process” only describe relationships of a process to a higher-level or lower-level process.
  • A “value chain” is the top-level thing. It may or may not be a “process”.
  • There is a concept of event-driven, single input, single output things with defined flows within them. In BPM talk, these are necessary characteristics of a “process”. Anything that doesn’t have these characteristics isn’t process – it could be a “process area” or a “function” – but if an organisation wants to use the term process more loosely it can do. It just has to make the definition clear.
  • If both terms “process” and “procedure” are used, a procedure is lower level than a process.
  • There is a concept of a process that falls entirely within one department (many valuable processes are cross-departmental). You could call this a “procedure”.
  • There is a concept of a process performed entirely by one person, without time delays or message passing during the process, and described prescriptively. You could call this a “work instruction”.
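These fault-lines can also be sketched as a small data model. This is an illustrative sketch only: the class `Process`, its field names, the functions `label` and `levels`, and the onboarding example are all hypothetical. It shows decomposition to any number of levels through nesting, with the label derived from the characteristics listed above and not from the level.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Process:
    name: str
    departments: Set[str]
    performers: int
    event_driven_single_io: bool = True   # event-driven, single input, single output
    has_delays_or_messages: bool = True   # time delays or message passing inside it
    prescriptive: bool = False            # described step by step
    sub_processes: List["Process"] = field(default_factory=list)


def label(p: Process) -> str:
    """Suggest a term from the characteristics, not from the nesting level."""
    if not p.event_driven_single_io:
        return "process area or function"
    if p.performers == 1 and not p.has_delays_or_messages and p.prescriptive:
        return "work instruction"
    if len(p.departments) == 1:
        return "procedure"
    return "process"


def levels(p: Process) -> int:
    """Number of levels of decomposition below and including this process."""
    return 1 + max((levels(s) for s in p.sub_processes), default=0)


# Hypothetical example: three nested levels.
reset_password = Process("reset password", {"Service Desk"}, performers=1,
                         has_delays_or_messages=False, prescriptive=True)
access_request = Process("handle access request", {"Service Desk"}, performers=3,
                         sub_processes=[reset_password])
onboarding = Process("onboard employee", {"HR", "Service Desk"}, performers=5,
                     sub_processes=[access_request])

if __name__ == "__main__":
    for p in (onboarding, access_request, reset_password):
        print(p.name, "->", label(p))
    print("levels:", levels(onboarding))
```

“Macro process” and “sub-process” need no fields of their own here: they are just the parent and child ends of the `sub_processes` relationship.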

In summary, existing definitions, ITIL or other, are not perfect, not robust and not comprehensive. You have to be prepared to build your own terminology and taxonomies.

1 Comment »

  1. Looking over this I think the writing could be improved for clarity. Perhaps the list of incident/problem concepts would be better written as a set of distinctions or fault-lines.

    But my aim in blogging is not perfect writing first time round, otherwise I’d never be satisfied enough with anything to post.

    Comment by Joe Pearson — 19 February 2009 @ 10:39 | Reply

