Wednesday, January 18, 2006
Failures count more than Successes - www.stuartcheshire.org/rants/Errors.html
FAILURES COUNT MORE THAN SUCCESSES
Stuart Cheshire, August 1996.
User-experience is defined by the times when a computer doesn't work, not by the times when it does.
When a device is functioning correctly, the details of its operation are almost invisible to the user, and that is the way it should be. It's only when it goes wrong that the device forces itself strongly into the user's awareness.
This means that the quality of the software and user interface for handling failures is just as important as, if not more important than, the code for the normal (successful) case.
Most software authors concentrate on the behaviour of the program when it is working properly, because the software is supposed to perform some task, and the programmer is concerned with making it perform that task quickly, efficiently, elegantly. What happens when something goes wrong (like the disk filling up) is rarely a priority. Once something has gone wrong we're looking at a case where, in a sense, the program has already failed. It's not going to be able to complete its task now (through no fault of its own), so the exact manner in which it fails hardly matters.
The programmer is still busy working on improving the performance and getting the bugs out of the correctly-functioning cases. Why devote time to working on a case where we _know_ there's no way for the program to finish the task? In computer game terms, that case is already "game over", and no amount of programming work is going to change that and seize victory out of defeat. In an ideal world, that case should _never_ happen anyway, so it would be stupid to waste time working on it, wouldn't it? What really matters is all that elegant efficient code that's executing in the common case.
The problem is, the user doesn't see it that way. In normal use, the computer is correctly executing millions of instructions every second. Disks seek, interrupts interrupt, network packets fly across the world, and progress continues smoothly. The user is no more impressed by all these little successes than they are impressed every time a spark plug in their car engine fires correctly. The user is not even aware of all these little successes. They are completely invisible, which is the way it should be.
The only time a typical motorist takes a good detailed look at their car engine is when it's not working, and it's the same with computer software. The moment something goes wrong is precisely when a piece of software leaps out of its invisible place in the background and forces the user to pay close attention to it. That's the moment when the software is under the closest scrutiny, and it's usually the part of the user interface where the least effort has been spent.
One image that comes to mind is that completing a real-world task is kind of like having to get from one side of a swamp to the other. Lots of hazards and pitfalls lurk in the swamp that separates us from our goal. Software is the bridge that gets us from where we are to where we want to be.
A lot of software is like a six-inch-wide polished chrome beam across the swamp. It snakes smoothly across the swamp, taking the shortest possible path around the rocks and trees and other obstacles, gleaming impressively in the sun. It has no direction signs or crash barriers to mar its elegant simplicity or distract the user as they speed across on their motorbike.
The author of the software proudly demonstrates how, on his motorbike with its 2.5 litre engine, he can cross the swamp in 17 seconds. That's great. All of the software has been developed and tested to achieve that goal quickly, efficiently, elegantly.
The problem is, when a new user first gets hold of this software, they make some mistake. They type some incorrect command, or they fail to configure some setting correctly before executing some other command, and they miss a turn and fail to keep the motorcycle on the beam. The beautiful chrome beam is still gleaming impressively in the sun, but the user can't even see it because they're lying face down in the mud. The fact that they could have completed the task in 17 seconds is little consolation if it takes them several hours to get out of the mud. Sure, after a few times the user might learn to make every turn perfectly, but the first few times they use the software they have a very unpleasant experience with it.
We need a bridge across the swamp that's a little bit wider and has crash barriers, so that even when you make a mistake you are guided back onto the right path instead of being allowed to plunge face-first into the swamp.
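As a minimal sketch of what such crash barriers might look like in code, the Python below handles two of the mistakes mentioned above: a mistyped command, and running a command before a required setting has been configured. The command names ('connect', 'status', 'configure') and the 'phone_number' setting are hypothetical, made up for illustration; the suggestion uses the standard library's difflib.get_close_matches.

```python
import difflib
from typing import Callable, Dict

# Hypothetical settings store: setting name -> value.
Settings = Dict[str, str]


def cmd_connect(settings: Settings) -> str:
    # Crash barrier: check the precondition up front and explain how to fix it,
    # instead of failing later with a generic "Connection Failed".
    if not settings.get("phone_number"):
        return ("Cannot connect: no phone number is configured. "
                "Set 'phone_number' with the 'configure' command first.")
    return f"Dialling {settings['phone_number']}..."


def cmd_status(settings: Settings) -> str:
    return f"Current settings: {settings}"


def cmd_configure(settings: Settings) -> str:
    return "Configuration mode (not implemented in this sketch)."


# Hypothetical command table, for illustration only.
COMMANDS: Dict[str, Callable[[Settings], str]] = {
    "connect": cmd_connect,
    "status": cmd_status,
    "configure": cmd_configure,
}


def run(command: str, settings: Settings) -> str:
    handler = COMMANDS.get(command)
    if handler is not None:
        return handler(settings)
    # Crash barrier: point to the nearest known command rather than just "Error".
    guesses = difflib.get_close_matches(command, list(COMMANDS), n=1)
    if guesses:
        return f"Unknown command '{command}'. Did you mean '{guesses[0]}'?"
    known = ", ".join(sorted(COMMANDS))
    return f"Unknown command '{command}'. Known commands: {known}."


if __name__ == "__main__":
    print(run("conect", {}))
    print(run("connect", {}))
    print(run("connect", {"phone_number": "555-0100"}))
```

With this sketch, a typo such as "conect" gets a suggestion to try 'connect' instead of a bare error, and "connect" with nothing configured says exactly which setting is missing and how to supply it.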
We recently had an experience like this setting up an ISDN bridge. After several hours of troubleshooting on the telephone with SUN's network administrators, we finally got it working. It turned out that there had been five or six different things wrong, but for every one the message was the same: "Connection Failed". First, the phone number it was programmed to dial was wrong. SUN's network administrators could see that there was no call coming in, but all we saw was "Connection Failed". After we corrected the phone number, SUN's network administrators could see that there was now a call coming in, but all we saw was "Connection Failed". SUN's network administrators discovered that they had made a typo in the list of usernames at their end, so they corrected that. Now they could see that a call was coming in and the username was being recognised, but all we saw was "Connection Failed". This went on for several hours until each individual problem had been fixed, and finally we were able to connect.
At every stage the ISDN bridge told us only that it wasn't working (which we could tell ourselves pretty easily anyway). It didn't say why it wasn't working. It didn't say what parts of the connection process _had_ worked correctly. It didn't tell us what we might do to fix the problem. We couldn't even tell if the ISDN line that Pacific Bell had installed was connected properly, because the ISDN bridge didn't give any indication of whether or not it was detecting the ISDN equivalent of a "dial tone".
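As a minimal sketch of what the bridge could have reported instead, the Python below runs a connection attempt as a sequence of named steps and, on failure, says which step failed, which steps had worked, and what to try next. The Step and Report classes, the step names, the checks and the advice strings are all hypothetical stand-ins loosely modelled on the problems described above; they are not the behaviour or interface of any real ISDN device.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One named step of a connection attempt (hypothetical)."""
    name: str                   # what the user sees, e.g. "detect line signal"
    check: Callable[[], bool]   # returns True if this step succeeded
    advice: str                 # what the user might try if this step fails


@dataclass
class Report:
    succeeded: List[str] = field(default_factory=list)
    failed: Optional[str] = None
    advice: Optional[str] = None

    def message(self) -> str:
        if self.failed is None:
            return "Connected."
        worked = ", ".join(self.succeeded) if self.succeeded else "none"
        return (f"Connection failed at step '{self.failed}'. "
                f"Steps that worked: {worked}. "
                f"Suggestion: {self.advice}")


def connect(steps: List[Step]) -> Report:
    """Run steps in order, stopping at the first failure and recording
    which steps succeeded before it."""
    report = Report()
    for step in steps:
        if not step.check():
            report.failed = step.name
            report.advice = step.advice
            return report
        report.succeeded.append(step.name)
    return report


if __name__ == "__main__":
    # Hypothetical checks: the line and dialling work, but the remote end
    # rejects the username.
    steps = [
        Step("detect line signal", lambda: True,
             "Check the cable between the bridge and the wall socket."),
        Step("dial remote site", lambda: True,
             "Verify the configured phone number."),
        Step("authenticate", lambda: False,
             "Ask the remote administrator to check the username list."),
    ]
    print(connect(steps).message())
```

The point of the design is that each failure message carries the three things the bridge left out: where it stopped, how far it got, and what to do next. Even a simple "detect line signal" step would have answered the question about the dial tone.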
Now that it's finally working, I'm sure it will continue to work fine and we'll not give it a second thought, but those hours struggling to get it set up were a nightmare. There's no way we could have done it without outside help, and we're networking experts.
All this is not the programmers' fault alone. Programmers implement what their written specifications say they should implement, and software specifications always go into great detail about what the software is supposed to _do_, but rarely make any mention of how it should fail. Failures are regarded as, well, failures, so what more is there to say about them, except that they shouldn't happen?
Well, failures do happen, and how they are handled may be the most important aspect of defining the quality of a human being's interaction with a computer.
-------------------------