All images © 2008-2019 Cyril Souchon (All rights reserved) unless expressly noted otherwise

Wednesday, September 15, 2010

Latent Defects: When software has something to hide

Just recently, I've been working with Service Level Agreements.

And, in trying to set up the right performance indicators and measures that will allow everyone a fair deal, the problem of how you handle latent defects resurfaced.

Most surprising was that neither the client nor the supplier really understood what latent defects are. So here is my definition, as well as some of the downstream implications, and issues that arise.


Let's start with a definition:

Latent Defects: Systemic Flaws that are hidden in the preexisting and current production system which will manifest at some unforeseen time in the future.

The idea of latent defects comes from the building industry. There's a slew of laws which protect the buyer of a property or home from dishonest sellers and developers. Essentially, a latent defect is a problem in the property or home that no reasonable person would have been able to find there, even with the most careful inspection. This idea is carried over into software development.

Latent defects are therefore defects that still remain after the software product has been placed into production, and which pass the normal tests of reasonability both in terms of pre-production testing and extended use. They lurk there, hidden deep in the woodwork where nobody would think to look.

Typically, the triggering of the defect is the result of an unusual or rare set of conditions, or an outcome of usage over an extended period of time. Latent defects will manifest a considerable period of time after being placed into production.

How do you know it's a Latent defect?

The easiest way is to process the set of triggering events against older versions of the software and find that the bug has been hanging around from some distant time in the past.

In the building industry, the rule of thumb is: if you could have found it by inspection, then it's not a latent defect. You just didn't put enough energy into looking for the problems. The same rule of thumb applies in the software industry. You don't get to call a bug a latent defect if you haven't paid enough attention to testing the system properly. That's just negligence, no matter how long the bug has been hanging around!
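As a rough illustration of the replay check described above, here is a minimal Python sketch. It assumes you can run the triggering scenario against each archived release; the function name `releases_affected_before`, the `reproduces` callback and the version labels are all hypothetical. It only answers "how long has the bug been there?" and cannot judge whether reasonable testing should have caught it.

```python
from typing import Callable, List, Sequence


def releases_affected_before(
    versions: Sequence[str],
    reproduces: Callable[[str], bool],
    reported_in: str,
) -> List[str]:
    """Return the releases older than `reported_in` in which the trigger reproduces.

    `versions` must be ordered oldest to newest. `reproduces` is a
    hypothetical callback that replays the triggering events against one
    release and returns True if the defect shows up.
    """
    cutoff = versions.index(reported_in)
    # Check every older release: a defect may come and go between releases,
    # so a plain scan is safer than assuming it persists once introduced.
    return [v for v in versions[:cutoff] if reproduces(v)]


if __name__ == "__main__":
    # Made-up release history and a stand-in for a real replay harness.
    history = ["1.0", "1.5", "2.0", "2.1", "3.0"]
    known_bad = {"2.0", "2.1", "3.0"}
    affected = releases_affected_before(history, lambda v: v in known_bad, "3.0")
    print("Defect also present in older releases:", affected)
```

A non-empty result says the bug has been hanging around since before the release in which it was reported. The rule of thumb about reasonable inspection still has to be applied by people.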

Compounding Issues

Because they have lain undetected for a long period of time, there is a risk that the cumulative effect of Latent defects is to cause data corruption and mis-reporting. To correct this it is probable that there will have to be data fixes to the System's databases, and adjustment entries and explanations to Stakeholders.

Downstream Implications

Sometimes the cost of making good is too high to justify full systemic repair, in which case the errors are left unrepaired, and the organisation makes do with manual adjustments and explanations to Stakeholders. This can (and almost invariably does) cause a high percentage of failure when the system is retired and replaced by a new System. The data errors present unpredictably to the normal conversion programs causing them to fail, and this leads to substantial delays and project cost overruns. Routinely, each data error must be examined manually using tracing processes that involve a disproportionate amount of cross-checking: line by line and record by record.

The lesson is: as far as possible fix the problems as early as you can, and don't leave them for the next generation!

Latent Defects and their placement in an SLA

Amongst many other things, an SLA determines the performance profile, quality measurements and Service checkpoints for the associated contracts. It governs the behavior of the people tasked to do the work.

The intention is as much to encourage good work as it is to punish bad performance, and it does this by ensuring that the proper expectations of the service are recorded, monitored and acted upon.

Latent defects skew the relationship. Because they are unusual, the time to repair them almost always breaks the SLA conditions. Until they are resolved, they impact on the remediation work everywhere else. Latent defects come in from left field, and can and do confuse managers and workers; so fixing them might well impact everywhere.

That's why we put a clause into the SLA that ensures Latent Defects are accounted for and handled separately.

They are the Black Swans of the Software Services world, and as such we expect them to be rare, and to handle them and move on.

If they move out of the rare space - then there is something fundamentally flawed with the system - and that demands much more than the SLA is designed to handle.


Illustrative SLA Entry:

Failure arising from Defects in the Configuration
Where the number of defects in Production arising from bad workmanship in the Service team results in re-work, the following shall apply:
Latent Defects:
Where the cumulative work effort arising from these defects, as measured over the 3-month measuring period, exceeds 5% of the total:
After due consideration and review by the GM (operations-Client) and the Service Delivery Manager (Supplier), Supplier will either refund or provide a credit for an amount calculated by the product of (the sum of the hours spent by the Service team) and (the prevailing Blended Rate) of the excess over the percentage.
Arising from Production defects:
Where the cumulative work effort arising from the defects, as measured over the 3-month measuring period, exceeds 10% of the total:
After due consideration and review by the GM (operations, MIH) and the Portfolio Executive (SDT), SDT will either refund or provide a credit for an amount calculated by the product of (the sum of the hours spent by the Service team) and (the prevailing Blended Rate) of the excess over the percentage.
Review of limits:
The limits imposed above shall be reviewed annually with an expectation of steady improvement to the escalation limits:
·           Latent Defects: 1%
·           Service Team Defects: 5%
The outcomes of an improvement or regression shall be a material input to the re-negotiation of the contract
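
The credit arithmetic in the entry above can be sketched in a few lines of Python. This is one reading of the clause, not a definitive one. It assumes that "the total" means the total Service team hours in the measuring period, and that the credit applies only to the defect hours above the threshold. The function name, parameter names and figures are hypothetical.

```python
def sla_credit(
    defect_hours: float,
    total_hours: float,
    threshold: float,
    blended_rate: float,
) -> float:
    """Credit owed for defect re-work above the agreed threshold.

    Assumed reading of the clause:
      allowed hours = threshold * total hours in the measuring period
      excess hours  = defect hours - allowed hours (never below zero)
      credit        = excess hours * blended rate
    """
    if total_hours <= 0:
        return 0.0  # nothing was measured, so nothing can be owed
    allowed_hours = threshold * total_hours
    excess_hours = max(0.0, defect_hours - allowed_hours)
    return excess_hours * blended_rate


if __name__ == "__main__":
    # Made-up quarter: 2000 hours in total, blended rate of 80 per hour.
    latent = sla_credit(defect_hours=130, total_hours=2000,
                        threshold=0.05, blended_rate=80)
    service = sla_credit(defect_hours=150, total_hours=2000,
                         threshold=0.10, blended_rate=80)
    print(f"Latent defect credit: {latent:.2f}")
    print(f"Service team defect credit: {service:.2f}")
```

With these made-up figures the latent defect work runs 30 hours over its 5% allowance and attracts a credit, while the Service team defect work stays inside its 10% allowance and attracts none. Tightening the limits at the annual review only means passing a smaller `threshold`.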


Other views of Latent Defects:

Some authors and authorities define Latent Defects as any defect that occurs once the system is placed into production. This is a way of distinguishing between defects discovered during construction and those discovered in production. This moves somewhat away from the original intention, so be careful to ask the question and set the tone when discussing with peers and academics.
An example of this approach is found at the Open Process Framework site (OPF) here:
 http://www.opfro.org/index.html?Components/WorkProducts/RequirementsSet/Requirements/LatentDefectRequirements.html~Contents 

Wednesday, September 1, 2010

It's a new Spring for me, Same old winter for them


If you don't know what you don't know ~ Honour those who do: 
A Season's lesson in looking after core values,
honouring and respecting knowledge, 
and rewarding the people who work in the Engine Room

A little over a year ago I had breakfast with a stranger in a coffee shop not too far from where I am typing this. Chance had brought us together: I had resigned from my previous job because my daughter was coming back from far-away lands and I wanted to free myself up to be with her. The question of gainful employment was furthest from my mind. He was looking for someone to fill a hole, help to move his ship from its present rocky course back into navigable waters and safe passage.
..
It was a cold winter's morning in July, and we had been brought together by a mutual third party. As we sat and drank steaming hot coffee and spoke about the challenges he had, and his vision for the journey, we both felt a sense of a common destiny and purpose. He had a clear understanding of the issues, and I could see clearly where I could make a contribution. We shook hands on it, a mutual agreement that once the space opened I would Lead that part of his convoy. Today marks the end of that venture, and it ends, bittersweet, with both success and failure.
..
The ship has changed course, and sails on a safe passage. There lies the success.
But the crew is decimated, the best have left and callow youths stand in their stead.
The ship's owners, knowing little of the work at hand, what it takes to deliver, or the carrying capacity of their crews, have responded to the successes by raising unrealisable expectations and punishing the shortfalls.
..
This is an analogy of course: in reality we are talking about COTS Software and IT systems:
the "crew" delivered a 51% increase in turnover, 47% increase in profit, the clients were universally satisfied and new, more equitable and sustainable deals were on the table. The owners responded by cancelling all bonuses and cutting back increases: excellence is not sufficient when there are shareholder pockets to be lined.
..
The collapse has followed quickly. How do people respond when they find their rewards yanked, and their targets raised still higher? The best go immediately, the second tier follow in the months to come. Delivery is curtailed, Sales dwindle, existing work can no longer be resourced, and quality drains away as juniors replace seniors. Expectations from existing clients were raised: the new course is working, but alas! The crews do not have the know-how or experience anymore, and those few who remain are worked to the bone: one by one they slip away. Without the leadership and navigational skills, the ship strays once more towards the rocks and shallows.
..
Deming said that you need a deep appreciation of your systems in order to create sustainable processes. If you lack that appreciation, then employ people who do, or who can realise it if there's time. Our owners did the reverse: With the best gone already, and income shrinking, they took to the fire pumps: sideline Management and start a round of retrenchments.
..
Let us change analogies for a moment:
I am reminded of a passage in The Call of the Wild, Jack London's story about sled dog teams in the deep of Alaska ... we are at that point where the team, dogs, harnesses and sled have been sold to a family of Southerners, who know nothing of travelling (or indeed, life!) in the near-Arctic. They know only of the gold they are hoping to prospect, and the smell of it is in their nostrils. They make slow time, they overburden the team, they quarrel and bicker amongst themselves. Time passes and they reach a moment where the lead dog, knowing what is coming on the trail, exhausted (as is his team) by overloading, lack of sleep and poor food, refuses to go further. They beat it half to death, then are forced to cut it loose and leave it on the trail.
..
As they move off, the dog lifts its head and watches as his team staggers out onto the ice, watches as the ice cracks and gives way, watches as the whole lot of them, men, woman, dogs and sled slide to their doom.
..
To reality:
The Leader is gone.
The lead dogs cut loose, and
the best of the pack have long since fallen by the wayside.
Those who remain answer to a Management of lawyers, and chartered accountants, and a sprinkling of old timers who have doubled their salaries by returning as consultants.
But since that was never my way, I refused that offer and will watch them from afar.
..
As for the Stranger and me: we are strangers no more, and the long journey has moulded us both. It's a new Spring for me, and I expect the same for him: but it's the same cold winter that waits on them.
__________________

Jack London's "Call of the Wild" is generally taken to be a rather dated children's story.
That's a great pity.
It tells us a lot about life and what we do to cope and survive.
You can read it on-line here :)

__________________
The images are from the original publication back in 1903, and can be found at the link above. To the best of my knowledge, they are in the public domain