Just recently, I've been working with Service Level Agreements.
And, in trying to set up the right performance indicators and measures that will allow everyone a fair deal, the problem of how you handle latent defects resurfaced.
Most surprising was that neither the client nor the supplier really understood what latent defects are. So here is my definition, as well as some of the downstream implications, and issues that arise.
Let's start with a definition:
Latent Defects: Systemic Flaws that are hidden in the preexisting and current production system which will manifest at some unforeseen time in the future.
The idea of latent defects comes from the building industry. There's a slew of laws which protect the buyer of a property or home from dishonest sellers and developers. Essentially, a latent defect is a problem in the property or home that no reasonable person would have been able to find, even with the most careful inspection. This idea is carried over into software development.
Latent defects are therefore defects that still remain after the software product has been placed into production, and which pass the normal tests of reasonability both in terms of pre-production testing and extended use. They lurk there, hidden deep in the woodwork where nobody would think to look.
Typically, the triggering of the defect is the result of an unusual or rare set of conditions, or an outcome of usage over an extended period of time. Latent defects will manifest a considerable period of time after the software has been placed into production.
How do you know it's a Latent defect?
The easiest way is to replay the set of triggering events against older versions of the software. If the bug shows up there too, it has been hanging around since some distant time in the past.
In the building industry, the rule of thumb is: if you could have found it by inspection, then it's not a latent defect. You just didn't put enough energy into looking for the problems. The same rule of thumb applies in the software industry. You don't get to call a bug a latent defect if you haven't paid enough attention to testing the system properly. That's just negligence, no matter how long the bug has been hanging around!
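The replay check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each release is modelled as a hypothetical callable that takes the triggering events and returns True when the defect manifests, and the release names, the `make_release` helper and the leap-day trigger are all invented for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical model: a release is a callable that processes the triggering
# events and returns True when the defect manifests.
Release = Callable[[Sequence[str]], bool]


def earliest_affected_release(
    releases: List[Tuple[str, Release]], events: Sequence[str]
) -> Optional[str]:
    """Return the name of the oldest release in which the events trigger the defect.

    `releases` must be ordered oldest to newest. Returns None when no release
    shows the defect for these events.
    """
    for name, run in releases:
        if run(events):
            return name
    return None


def make_release(has_bug: bool) -> Release:
    """Build a toy release for the example (an assumption, not a real system)."""
    def run(events: Sequence[str]) -> bool:
        # Toy trigger: the defect only shows when a leap-day event is processed.
        return has_bug and "2024-02-29" in events
    return run


releases = [
    ("1.0", make_release(False)),
    ("1.1", make_release(True)),
    ("2.0", make_release(True)),
]
events = ["2024-02-28", "2024-02-29"]
print(earliest_affected_release(releases, events))
```

If the earliest affected release is well behind the current one, the bug has been around for a long time. That alone does not make it a latent defect: the inspection rule of thumb still has to be satisfied.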
Compounding Issues
Because they have lain undetected for a long period of time, there is a risk that the cumulative effect of Latent defects is to cause data corruption and mis-reporting. To correct this it is probable that there will have to be data fixes to the System's databases, and adjustment entries and explanations to Stakeholders.
Downstream Implications
Sometimes the cost of making good is too high to justify full systemic repair, in which case the errors are left unrepaired, and the organisation makes do with manual adjustments and explanations to Stakeholders. This can (and almost invariably does) cause a high rate of failure when the system is retired and replaced by a new System. The data errors present unpredictably to the normal conversion programs, causing them to fail, and this leads to substantial delays and project cost overruns. Routinely, each data error must be examined manually using tracing processes that involve a disproportionate amount of cross-checking: line by line and record by record.
The lesson is: as far as possible, fix the problems as early as you can, and don't leave them for the next generation!
Latent Defects and their placement in an SLA
Amongst many other things, an SLA determines the performance profile, quality measurements and Service checkpoints for the associated contracts. It governs the behavior of the people tasked to do the work.
The intention is as much to encourage good work as it is to punish bad performance, and it does this by ensuring that the proper expectations of the service are recorded, monitored and acted upon.
Latent defects skew the relationship. This is because, being unusual, the time to repair almost always breaks the SLA conditions. Until they are resolved, they impact on the remediation work everywhere else. Latent defects come in from left field, and can and do confuse managers and workers; so fixing them might well impact everywhere.
That's why we put a clause into the SLA to ensure that Latent Defects are accounted for and handled separately.
They are the Black Swans of the Software Services world, and as such we expect them to be rare, and to handle them and move on.
If they move out of the rare space, then there is something fundamentally flawed with the system, and that demands much more than the SLA is designed to handle.
Illustrative SLA Entry:
| Failure arising from Defects in the Configuration | Where the number of defects in Production arising from bad workmanship in the Service team results in re-work, the following shall apply: | |
| --- | --- | --- |
| Latent Defects: | The cumulative work effort arising from these defects as measured over the 3-month measuring period exceeds 5% of the total | After due consideration and review by the GM (operations-Client) and the Service Delivery Manager (Supplier), Supplier will either refund or provide a credit for an amount calculated by the product of (the sum of the hours spent by the Service team) and (the prevailing Blended Rate) of the excess over the percentage. |
| Arising from Production defects: | The cumulative work effort arising from the defects as measured over the 3-month measuring period exceeds 10% of the total | After due consideration and review by the GM (operations, MIH) and the Portfolio Executive (SDT), SDT will either refund or provide a credit for an amount calculated by the product of (the sum of the hours spent by the Service team) and (the prevailing Blended Rate) of the excess over the percentage. |
| Review of limits: | The limits imposed above shall be reviewed annually with an expectation of steady improvement to the escalation limits (Latent Defects: 1%; Service Team Defects: 5%). The outcomes of an improvement or regression shall be a material input to the re-negotiation of the contract. | |
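To make the refund or credit in the entry above concrete, here is a minimal Python sketch of the arithmetic. It rests on one reading of the clause, which is an assumption: only the hours in excess of the agreed percentage of total effort are charged back, at the Blended Rate. The function name, parameter names and the figures in the example are all invented for illustration.

```python
def sla_credit(
    defect_hours: float,
    total_hours: float,
    threshold_percent: float,
    blended_rate: float,
) -> float:
    """Credit due for defect rework above the agreed share of total effort.

    Assumed reading of the clause: credit = (hours in excess of the
    threshold percentage of total effort) x (Blended Rate).
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    # Hours of rework the SLA tolerates in the measuring period.
    allowed_hours = total_hours * threshold_percent / 100
    # Nothing is owed while rework stays at or below the limit.
    excess_hours = max(0.0, defect_hours - allowed_hours)
    return excess_hours * blended_rate


# Hypothetical 3-month period: 1000 hours of total effort, of which 80 went
# on latent defects, against a 5% limit and a Blended Rate of 100 per hour.
print(sla_credit(80, 1000, 5, 100))  # 3000.0
```

With the same figures against the 10% limit for Production defects, the rework stays under the threshold and no credit is due.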
Other views of Latent Defects:
Some authors and authorities define Latent Defects as any defect that occurs once the system is placed into production. This is a way of distinguishing between defects discovered during construction and those discovered in production. This moves somewhat away from the original intention, so be careful to ask the question and set the tone when discussing with peers and academics.
An example of this approach is found at the Open Process Framework site (OPF) here:
http://www.opfro.org/index.html?Components/WorkProducts/RequirementsSet/Requirements/LatentDefectRequirements.html~Contents