Tag Archives: CEITPS

Climbing the DIKW Pyramid: Applying Data, Information, Knowledge, Wisdom principles at the University of Leeds

Tim Banks
Faculty IT Manager
University of Leeds

One of the many areas of knowledge that the EDUCAUSE conference helped me to develop was the importance of metrics and monitoring. All good metrics are based upon accurate data, but data isn't useful in isolation. Here is one concrete example of how my attendance at EDUCAUSE 2015 has helped to shape my professional development and bring benefits to my institution.

The Information Technology Infrastructure Library (ITIL) framework makes reference to the DIKW pyramid (Data, Information, Knowledge, Wisdom) as can be seen below. Wisdom is based on sound knowledge, which in turn comes from useful information, which is based on accurate data.

[Image: the DIKW pyramid, with Data at the base, then Information, Knowledge and Wisdom at the apex]

Let's take a typical automated monitoring system as an example. Each level of the DIKW pyramid maps onto it as follows:

Data
09/01 18:29:45: Message from InterMapper 5.8.1

Event: Critical
Name: website-host.leeds.ac.uk Nagios Plugin
Document: Unix: Webhosting
Address: 129.11.1.1
Probe Type: Nagios Plugin
Condition: CRITICAL – Socket timeout after 10 seconds

Time since last reported down: 39 days, 3 hours, 12 minutes, 47 seconds
Device’s up time: N/A

Information
This alert relates to one of our website servers.
This is not normal behaviour.

Knowledge
There is a planned network upgrade in one of our datacentres between 18:00 – 19:00 which is expected to cause network outages.
The server is part of a clustered pair with only one node affected, so service to end users will not be interrupted.

Wisdom
No action is required.

Most systems will generate endless data records. With some careful filtering of the data, it is possible to automatically generate ‘Information’. However, in most cases, ‘Knowledge’ (and in all cases ‘Wisdom’) will need some level of human intervention.

My team have recently started using the University of Leeds IT Service Management system (ServiceNow), and as part of this move we have updated all of our automated monitoring systems so that they now report into one shared email account. Previously, they reported to various individual and shared email accounts, so we didn't have a single view of everything. This single shared email account is our data store in the DIKW model. We have then applied a number of rules to identify, from the general notifications, the subset of alerts, which we have defined as anything requiring human intervention. This takes us to the information level. These alerts are automatically entered into our Service Management system as incidents, where they are reviewed by a human and acted on as appropriate.
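As a rough illustration of the kind of rule we mean (not our actual ServiceNow or InterMapper configuration – the keywords, field names and function names below are hypothetical), a Python sketch of the filtering step might look like this:

import re

# Hypothetical keywords that promote a raw notification ('Data') to an
# alert ('Information') requiring human intervention.
ALERT_CONDITIONS = ("CRITICAL", "DOWN", "UNREACHABLE")

def classify_notification(subject, body):
    """Classify a raw monitoring email as 'data' or 'alert'."""
    text = (subject + "\n" + body).upper()
    if any(keyword in text for keyword in ALERT_CONDITIONS):
        return "alert"   # raise an incident in the Service Management system
    return "data"        # keep for trend analysis only

def affected_host(body):
    """Pull the affected host name out of the alert body, if present."""
    match = re.search(r"Name:\s*(\S+)", body)
    return match.group(1) if match else None

Anything classified as an alert is what gets raised automatically as an incident; everything else stays in the shared mailbox as raw data.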

The ultimate goal is to use the configuration management database (CMDB) and change management records to try to automate some of the 'Knowledge' layer. For example: approved change X will affect the network between 07:00 and 07:30 on 5th May in Data Centre 1, where server Y is located, so ignore any warnings from this server between these times on that date.
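A minimal sketch of what that automation could look like, assuming a much-simplified change record drawn from the CMDB (the class, field names, host name and dates below are purely illustrative):

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ChangeWindow:
    """A simplified, hypothetical view of an approved change record."""
    change_id: str
    affected_hosts: set
    start: datetime
    end: datetime

def covered_by_planned_change(host, alert_time, changes):
    """True if the alert falls inside an approved change window that
    lists the affected host, so it can be suppressed automatically."""
    return any(c.start <= alert_time <= c.end and host in c.affected_hosts
               for c in changes)

# Illustrative example: ignore warnings from server Y during the approved
# 07:00-07:30 window on 5th May (the year is arbitrary).
changes = [ChangeWindow("CHG0001", {"server-y.leeds.ac.uk"},
                        datetime(2016, 5, 5, 7, 0), datetime(2016, 5, 5, 7, 30))]
print(covered_by_planned_change("server-y.leeds.ac.uk",
                                datetime(2016, 5, 5, 7, 12), changes))  # True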

Accurate monitoring is the basis of building meaningful metrics. You cannot generate a useful metric on the ‘number of unplanned service outages in the last six months’ based on data alone. By ensuring that we have a model which allows us to record useful knowledge based on the raw data, we will be able to build some accurate and meaningful metrics.
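To make that concrete, here is a deliberately simple, hypothetical sketch: once each outage record has been annotated at the 'Knowledge' stage with whether it was covered by a planned change, the metric reduces to a straightforward count (the host names and dates are invented):

from datetime import datetime, timedelta

now = datetime.now()

# Hypothetical outage records, annotated with whether they were planned.
outages = [
    {"host": "website-host.leeds.ac.uk", "when": now - timedelta(days=40),  "planned": True},
    {"host": "mail-host.leeds.ac.uk",    "when": now - timedelta(days=10),  "planned": False},
    {"host": "web-host-2.leeds.ac.uk",   "when": now - timedelta(days=300), "planned": False},
]

def unplanned_outages(records, months=6):
    """Count outages in roughly the last `months` months that were not planned."""
    cutoff = datetime.now() - timedelta(days=30 * months)
    return sum(1 for r in records if r["when"] >= cutoff and not r["planned"])

print(unplanned_outages(outages))  # 1 – only the recent unplanned outage counts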

The sessions I attended on data monitoring and metrics, in particular the one led by the Consortium for the Establishment of Information Technology Performance Standards (CEITPS), really helped to define this approach and stopped me from falling into the trap of generating endless metrics (of little value) based on data alone. Hearing from other institutions that are further ahead on this journey than us, and having the benefit of their advice on what approach to take and what pitfalls to avoid, has been invaluable. I am also part of a small group at the University responsible for defining the institution-wide IT configuration management standards for recording and managing IT assets. Again, I will be bringing information and knowledge from EDUCAUSE sessions to these discussions.

Developing metrics and measures for IT

Tim Banks
Faculty IT Manager
University of Leeds

This morning I attended a session run by Martin Klubeck from the Consortium for the Establishment of Information Technology Performance Standards (CEITPS).

This group is working to establish a common set of measures and metrics across education IT. CEITPS volunteers have spent some time over the EDUCAUSE 2015 conference writing the first 21 metrics, in between attending sessions.

CEITPS have a refreshingly common-sense approach to developing standards, as follows:

  • Get some interested and enthusiastic people in a room
  • Write some standards, plagiarising as much as possible from other sources
  • Review within the group and amend as necessary
  • Don’t worry if you don’t get everything perfect first time
  • Send out to the wider CEITPS group for comment, but give them a limited time to respond (e.g. seven days). If you give them six weeks, they will take that long.

What is the difference between a measure and a metric?

This was a question asked by a member of the audience. Martin answered in the form of a tree analogy:

  1. The leaves are like data – there are a lot of them and a lot can be thrown away. Data are typically just raw numbers.
    1. NB: Never give data to a manager! Business Intelligence (BI) tools are particularly bad because not only do they give data to managers but they also make it look pretty…
  2. The twigs can be thought of as measures (e.g. ‘50%’ or ‘20 out of 30’) – a number with some context.
  3. The branches are like information, which has more context around it.
  4. The trunk of the tree is your metrics, which have sufficient contextual and trend-over-time information to make them suitable for presentation to senior managers (a rough sketch follows this list).
  5. It is vital to find out the root (i.e. underlying) question that the person asking wants answering before you provide any metrics.
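To show the analogy rather than just describe it, here is a rough sketch with entirely made-up incident figures: the raw flags are the data, '4 out of 5' is a measure, and the month-by-month trend ready for a senior manager is the metric.

# Data (leaves): raw per-incident flags for "resolved within SLA" (made up).
monthly_data = {
    "Jan": [True, True, False, True, True],
    "Feb": [True, False, False, True, True, True],
    "Mar": [True, True, True, True, False, True, True],
}

# Measure (twigs): a number with a little context, e.g. "4 out of 5".
def measure(flags):
    return f"{sum(flags)} out of {len(flags)}"

# Metric (trunk): context plus trend over time, ready for presentation.
metric = {month: round(100 * sum(flags) / len(flags), 1)
          for month, flags in monthly_data.items()}

print(measure(monthly_data["Jan"]))  # 4 out of 5
print(metric)                        # {'Jan': 80.0, 'Feb': 66.7, 'Mar': 85.7}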

Martin gave us an example of one of the metrics that they have developed this week:

Description: Rework [re-opening] service desk incidents.
Definition: Any occasion when an incident requires further effort after it was considered resolved, because it was incorrectly or only partially resolved.
Presentation: Usually presented as a percentage of total incidents re-worked [re-opened] in a given timeframe.
Note: The definition needs to cover the case where a member of IT staff opens a new incident rather than reopening the old one.
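As a hedged sketch of how this metric might be computed (the incident fields below are hypothetical, not ServiceNow's actual schema), covering both a genuine reopen and the new-incident-instead-of-reopen case from the note:

# Hypothetical incident records: 'reopened' marks a true reopen, while
# 'superseded_by' links to a new incident raised instead of reopening.
incidents = [
    {"id": "INC001", "reopened": False, "superseded_by": None},
    {"id": "INC002", "reopened": True,  "superseded_by": None},
    {"id": "INC003", "reopened": False, "superseded_by": "INC004"},
    {"id": "INC004", "reopened": False, "superseded_by": None},
]

def rework_percentage(records):
    """Percentage of incidents that needed further work after being resolved."""
    reworked = sum(1 for r in records
                   if r["reopened"] or r["superseded_by"] is not None)
    return 100 * reworked / len(records)

print(f"{rework_percentage(incidents):.1f}%")  # 50.0%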

Other examples of metrics which the group have developed this week are as follows:

  • Defects found during development
  • Defects found during testing
  • Top 10 categories for incidents over given time period
  • Mean time to resolve (MTTR)
  • MTTR minus customer wait time
  • Adoption Rate
  • Call Abandon rate
  • On-time delivery

In total they have developed 21 of the 42 IT service management metrics identified: 37 of these came from the ITIL framework and a further five were added by the group.

The EDUCAUSE Core Data Service (CDS) was mentioned several times by both Martin and those attending the session. The CDS carries out surveys of standard benchmark data across US institutions, and there was much discussion about making sure that the CEITPS metrics could be combined with CDS information to provide an even richer information source.

The CEITPS has several member institutions from outside the USA, and they are keen to get some more involvement from UK Universities, especially those who are currently implementing the ITIL framework and/or developing service metrics and measures.

Additional resource:

The University of North Carolina Greensboro metrics page