Soft arithmetic in evaluation

Ordinary arithmetic takes numbers as input, and produces other numbers, usually fewer of them or expressed more simply.

We need soft arithmetic to be able to deal with data which is not clearly defined - in particular, data which is available only as comparisons rather than as numbers.

Most often, soft arithmetic takes comparisons as input, and produces other comparisons, usually fewer of them or expressed more simply.

Ordinary arithmetic is often a very important part of evaluation. But Soft Arithmetic is even more important and used even more frequently.

We use Soft Arithmetic all the time, for example when we make comparative judgements in evaluation. Comparative judgements - saying that one thing is better (or bigger or more long-lasting) than another - are central to evaluation, both as its raw material and as its output.

For example:

If

the situation in province A is better than the situation in province B,

and

the situation in province B is better than the situation in province C,

then

the situation in province A is better than the situation in province C.

This is a kind of calculation, a kind of syllogism, which we can do without numbers. It’s ancient, and we do it all the time. Some calculations are possible with this Soft Arithmetic without numbers; others are not. For example:

If

the situation in province A is better than the situation in province B,

and

the situation in province A is better than the situation in province C,

then

without a numerical score, we can’t compare B with C.

So just as with numbers, calculations with comparisons can help us some of the time, but not all of the time.

Just look at the table of contents for this Chapter (in the web version, this is in the menu at the left) to see some of the wide range of problems for which we can and do utilise Soft Arithmetic.

“Who refuses to do arithmetic is doomed to talk nonsense.” - John McCarthy

Well, yes and no. As we shall see.

Wait, surely there are two kinds of Variables - quantitative, i.e. numerical, Variables and the rest? With the first kind, you can do calculations, and with the second - whether you call them qualitative or whatever - you can’t. Right?

Wrong. In evaluation, we often do calculations not with individual Variables but with comparisons between them. Comparisons are the raw material of evaluation; often they are its final product too. Good evaluators are good at this.

Soft arithmetic is just a way of understanding the many kinds of reasoning which we often do when we do evaluation. Valid reasoning using Soft Arithmetic - reasoning which roughly follows rules like those set out below - is just as valid as ordinary arithmetic, i.e. reasoning using numerical Variables. It is not a poor cousin. It is somewhat less powerful but substantially more general, more widely applicable, than ordinary arithmetic.

Scriven’s methods like Qualitative Valuing and Synthesis (Scriven 1981a) are mostly examples of Soft Arithmetic.

Goodbye Galileo?

Pearl points out (Pearl 2000, appendix) that Galileo is most often credited with the numerical revolution in science:

  • The job of science is first to describe
  • That description is to be done using numbers

… according to which, comparison data is not scientific. But for Theorymaker native speakers, comparisons are a big part of science.

Where do comparisons come from?

These comparisons could be about anything - the happiness or health status of a patient before and after a treatment, the employment situation in a country, anything. Where do they come from? Sometimes a comparison arises simply from comparing underlying numbers.

They might come

  • from counting the number of people entering a self-help centre before and after improving the wheelchair access
  • from a randomised controlled trial!
  • from comparing the results of two randomised controlled trials.

But sometimes the comparisons are just raw material - a comparison provided by expert judgement, for example. Sure, these can be subject to biases, but so can comparisons based on numerical reports. I know that it might seem like sleight of hand, but I claim that many kinds of comparison belong to the fundamental data of evaluation. For example, the supposed 800-pound gorilla of evaluation, the randomised controlled trial, might ultimately rely on questionnaire data in which many respondents score something on a 1-5 scale - say, their conviction that a child seems happy or that a village is resilient. The impact calculation then relies upon some tricky statistics conducted upon these scores. But the original scores are in this case essentially only subjective judgements, made additionally awkward by the conversion to a numerical 1-5 scale - a conversion which, if you have actually conducted such work, you will know is often the cause of respondent complaints. Now, if respondents can validly attend to something like a child’s happiness and convert it into a number, they can certainly compare this child’s happiness with another child’s happiness (or the same child’s happiness before or after an intervention, or in a different context) and at least say whether it is greater or less. In fact, I would argue (and I think Daniel Kahneman does too, in Thinking, fast and slow) that we are only able to give a 1-5 score because we are able to make comparisons with different possibilities.

This goes back to psychophysics xx.

Hence, though I haven’t shown how we actually generate comparisons in any particular evaluation context (there are many ways, but that is another story), I have at least argued that subjective comparison data can, in some cases, be just as valid as numerical data based on subjective assessment.

Sometimes, subjective comparisons are valid because they rely on underlying numbers - for example, you can ask people how large they think a crowd is, but if you have the time it is better just to count the people. Other times, I claim, our ability to judge a child’s happiness or our own satisfaction with something is fundamentally based on an ability to compare, and our ability to convert this into a number is secondary.

Another kind of Variable?

You could say that we have identified a kind of Variable, the kind for which we can make comparisons but which are not originally available as numbers, and we could call these Variables “soft Variables” or perhaps “qualitative Variables”. And perhaps we have. But it would be wrong to think that Soft Arithmetic only has to do with these kinds of Variables, because below we will see that sometimes we have to do Soft Arithmetic with numerical Variables too.

Let’s do some Soft Arithmetic

Anyway, let’s speed things up a little and proceed with some Soft Arithmetic.

These first few examples might seem really trivial, just a way of stating the obvious with funny symbols. In a way they are, but please be patient, we will get to some more interesting stuff further down …

Let’s just express our comparisons using greater-than (>) and less-than (<) symbols. We learned this in secondary school.

Transitive rule

It goes like this:

if A>B and B>C then A>C.

This is what we looked at above.
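If you like to see such rules mechanically, here is a minimal sketch in Python (the set `known` and the function `greater` are hypothetical, invented purely for illustration): it derives A>C from A>B and B>C simply by chaining the comparisons we already have, with no numbers anywhere.

```python
# Hypothetical sketch: chaining known comparisons with the transitive rule.
known = {("A", "B"), ("B", "C")}   # read ("A", "B") as "province A > province B"

def greater(known, x, y):
    """True if x > y can be derived from the known comparisons by transitivity."""
    to_visit, seen = [x], set()
    while to_visit:
        current = to_visit.pop()
        if current in seen:
            continue
        seen.add(current)
        for a, b in known:
            if a == current:
                if b == y:
                    return True
                to_visit.append(b)
    return False

print(greater(known, "A", "C"))  # True: A > B and B > C, so A > C
print(greater(known, "C", "A"))  # False: not derivable from what we know
```

Of course nobody needs a computer for three provinces; the point is just that the rule is completely mechanical.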

Differences

Now a staple of evaluation is the “calculation” of Differences - most often, the Difference on a Variable with and without a project intervention. Most often, this is the Difference between factual and counterfactual scenarios, even though we don’t usually use those words. We can write a Difference using a minus sign:

A-B

For example,

in project 1, F1 is the factual situation and C1 is the counterfactual

in project 2, F2 is the factual situation and C2 is the counterfactual

So … if

F1<F2

but

C1>C2

then

F1-C1 < F2-C2

(Note for nerds: we could actually treat this as a definition of how to use the minus sign in Soft Arithmetic.)

Example: If the children in the project school were less happy than those in the comparison school before the intervention, but happier than their peers afterwards, then, other things being equal, we can say that the Difference in the project school is greater than that in the comparison school.
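Here is a minimal Python sketch of the same rule (the function name and the '<'/'>' encoding are invented for illustration). It answers “how does F1-C1 compare with F2-C2?” using only the comparison of the F’s and the comparison of the C’s, and admits defeat when soft arithmetic cannot decide:

```python
# Hypothetical sketch: comparing two Differences, F1-C1 versus F2-C2,
# using only how the F's compare and how the C's compare.
def compare_differences(f_sign, c_sign):
    """f_sign: how F1 compares with F2 ('<' or '>').
    c_sign: how C1 compares with C2 ('<' or '>').
    Returns how F1-C1 compares with F2-C2, or None if it cannot be decided."""
    # (F1-C1) - (F2-C2) = (F1-F2) - (C1-C2), so only the two directions matter.
    if f_sign == "<" and c_sign == ">":
        return "<"   # the rule stated above
    if f_sign == ">" and c_sign == "<":
        return ">"   # the mirror image, as in the school example
    return None      # e.g. F1<F2 and C1<C2: not decidable without numbers

print(compare_differences("<", ">"))  # '<'
print(compare_differences("<", "<"))  # None
```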

Judgements with equality

Perhaps we can assume that our raw material, the comparisons, also includes judgements of equality: the ability to say not only that one thing is greater than another but also that two things are equal. We can use an equals sign for this kind of case.

So we can have versions of the transitive rule and the Differences rule above which use equals signs too; for example, we can replace one of the comparisons in the Differences rule with an equality and say:

If

F1>F2

but

C1=C2

then

F1-C1 > F2-C2

because, again, the Difference F1-C1 is bigger than the Difference F2-C2.

Example: if the children in the project school were as happy as those in the comparison school before the intervention, but happier than their peers afterwards, then, other things being equal, we can say that the Difference in the project school is greater than that in the comparison school.
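If you want to see why this version is safe, imagine for a moment that the Levels were numbers; then a line of ordinary algebra (not part of the original argument, but it is the standard justification) does the job. Since F1>F2 and C1=C2,

\[(F1 - C1) - (F2 - C2) = (F1 - F2) - (C1 - C2) = (F1 - F2) - 0 > 0\]

so F1-C1 is indeed greater than F2-C2.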

Soft addition

Suppose for example we know:

F1>F2

then we know

F1 + G > F2 + G

In other words, if the situation in which F1 holds is better than the situation in which F2 holds, then the first situation is still better than the second situation even taking into account some other Variable whose score does not change.

Similarly, suppose we are working with two sets of comparison judgements.

If

F1>F2

and

G1>G2

then

F1 + G1 > F2 + G2

Example: if the children are happier after the intervention and so are the parents, we can say that the overall situation after is better than the overall situation before.
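A minimal Python sketch of soft addition (again with invented names): it combines the two comparison judgements when they point the same way, or when one of them is an equality, and returns None when they pull in opposite directions.

```python
# Hypothetical sketch: "soft addition" of two comparison judgements.
def compare_sums(f_sign, g_sign):
    """f_sign: how F1 compares with F2; g_sign: how G1 compares with G2
    (each '<', '=' or '>'). Returns how F1+G1 compares with F2+G2,
    or None if soft arithmetic cannot decide."""
    if f_sign == g_sign:
        return f_sign                               # both point the same way
    if "=" in (f_sign, g_sign):
        return f_sign if g_sign == "=" else g_sign  # the strict comparison wins
    return None                                     # '<' against '>': undecidable

print(compare_sums(">", ">"))  # '>' : children happier AND parents happier
print(compare_sums(">", "<"))  # None: one better, one worse - can't be added up softly
```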

Soft division

Many key evaluation judgements are expressed in the form of ratios, and especially soft ratios. Efficiency, for example, is usually defined as something like the results in relation to - or divided by - the inputs. We can construct rules for soft division similar to those above, for example

From

F1>F2

and

G1<G2

we know

F1/G1 > F2/G2

So - as long as all the quantities are positive, as inputs and outputs usually are - if project A produced more outputs (F1>F2) with fewer inputs (G1<G2) than project B, it obviously gives better value for money. This is what we do, usually informally, when we compare the value-for-money of two or more projects or programmes. I wrote at a little more length about that here. There I suggest referring to “cost-efficiency” as “outputs per input”, cost-effectiveness as “outcomes per input”, and what DfID calls “Effectiveness” as “outcomes per output” - in order to make clear that these are ratios, and to make clear exactly which ratios we mean.
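A minimal Python sketch of soft division (invented names again; all quantities are assumed to be positive, as inputs and outputs usually are):

```python
# Hypothetical sketch: "soft division" for value-for-money comparisons.
def compare_ratios(outputs_sign, inputs_sign):
    """outputs_sign: how outputs F1 compare with F2; inputs_sign: how inputs
    G1 compare with G2 (each '<', '=' or '>'). All quantities assumed positive.
    Returns how F1/G1 compares with F2/G2, or None if undecidable."""
    if outputs_sign == "=" and inputs_sign == "=":
        return "="
    if outputs_sign in (">", "=") and inputs_sign in ("<", "="):
        return ">"     # more (or equal) outputs for fewer (or equal) inputs
    if outputs_sign in ("<", "=") and inputs_sign in (">", "="):
        return "<"
    return None        # e.g. more outputs but also more inputs: undecidable

print(compare_ratios(">", "<"))  # '>' : project A clearly gives better value
print(compare_ratios(">", ">"))  # None: better results but also higher costs
```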

Just to take stock - when we report a programme’s outcomes per input, aka cost-effectiveness, we are doing a lot of Soft Arithmetic. How do we arrive at the outcomes per input for our programme?

The costs or inputs are:

Factual Level of input - counterfactual Level of input (what was spent minus what would have been spent on the programme if the programme had never happened, which is nearly always zero).

The outcomes are:

Factual Level of outcome Variables - counterfactual Level of outcome Variables.

If you “forget” to subtract the counterfactual, implicitly recording only change compared to the baseline, you are assuming that the Level of the outcome Variable(s) would have remained frozen if the programme had not happened. This is sometimes a valid assumption, and sometimes a silly one. Either way, “the outcome(s)” have to be understood as a Difference - whether you are implicitly just subtracting the baseline score or have some more sophisticated counterfactual in mind. More about this here.

So outcomes per input aka cost-effectiveness, where we have, say, two input Variables, perhaps money and time, and one outcome Variable, is this:

\[{(Factual\,outcome\,Q - Counterfactual\,outcome\,Q) \over (Factual\,input\,A - Counterfactual\,input\,A) + (Factual\,input\,B - Counterfactual\,input\,B)} \]

Now this might look frightening but, if you think about it, this is pretty much what any of us would do to arrive at cost-effectiveness if these were numerical Variables, whether we realised it explicitly or not. If all the quantities were numerical, we could do it with a pocket calculator or a couple of rows of Excel. My point here is that the same formula holds even when some (or all) of the quantities are not numerical, as long as they can be compared. And we do this in our heads!
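Here is that “pocket calculator” version, with entirely made-up numbers, just to show the structure of the calculation (in practice you could not simply add money to staff time without converting them to a common unit, but the arithmetic shape is the point):

```python
# Hypothetical illustration with made-up numbers: outcomes per input as a
# ratio of Differences (factual minus counterfactual).
factual_input_a, counterfactual_input_a = 1000, 0      # money spent vs none
factual_input_b, counterfactual_input_b = 40, 0        # staff hours vs none
factual_outcome_q, counterfactual_outcome_q = 75, 60   # outcome score with / without

outcomes_per_input = (
    (factual_outcome_q - counterfactual_outcome_q)
    / ((factual_input_a - counterfactual_input_a)
       + (factual_input_b - counterfactual_input_b))
)
print(outcomes_per_input)  # 15 / 1040, i.e. about 0.014 outcome points per unit of input
```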

Example: “well, the kids were obviously happier than they would have been, and the cost was relatively small - Auntie Leyla did some great conjuring tricks, and it wasn’t any effort for her, and she did it for free, and we hardly spent anything on materials - so the party was very cost-effective”.

Note that here we are doing even more arithmetic when thinking about the kids’ happiness, because we are instinctively not only adding up factual-counterfactual Differences for a dozen different kids but also implicitly following some special rules which probably give more (negative) weight to one child who was in floods of tears than to eleven others who were just pretty cheerful.

Comparing ratios

We can even compare these ratios.

Example: “well, last year the kids were obviously happier than they would have been, and the cost was relatively small - Auntie Leyla did some great conjuring tricks, and it wasn’t any effort for her, and she did it for free, and we hardly spent anything on materials - so it was very cost-effective. But this year we spent all that money on MacDonalds and the bowling alley and half of the kids ended up in tears, so last year’s party was much more cost-effective”.

Here we are actually doing this kind of Soft Arithmetic in our heads:

\[{(Factual\,outcome\,Q1 - Counterfactual\,outcome\,Q1) \over (Factual\,input\,A1 - Counterfactual\,input\,A1) + (Factual\,input\,B1 - Counterfactual\,input\,B1)} \]

is greater than

\[{(Factual\,outcome\,Q2 - Counterfactual\,outcome\,Q2) \over (Factual\,input\,A2 - Counterfactual\,input\,A2) + (Factual\,input\,B2 - Counterfactual\,input\,B2)} \]

… where A1 is money, B1 is effort, Q1 is the kids’ happiness last year and A2 is money, B2 is effort, etc, this year. So in this case, the Variables are essentially the same in the two interventions under comparison. But we often compare apples with oranges too - for example, maybe this year there was another cost Variable (a priceless vase got broken) or there were fewer kids (so we have to compare a certain effect on five kids with a different effect last year when we were foolish enough to invite twenty).
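Done softly, the whole two-party comparison collapses into a couple of judgements and one application of the soft division rule. A hypothetical sketch:

```python
# Hypothetical sketch: the two-party comparison done softly, with no numbers.
# Judgements: last year's outcome Difference was bigger than this year's ('>'),
# and last year's input Difference was smaller ('<'). All quantities positive.
outcome_sign, input_sign = ">", "<"

if outcome_sign == ">" and input_sign == "<":
    verdict = "last year's party was more cost-effective"
elif outcome_sign == "<" and input_sign == ">":
    verdict = "this year's party was more cost-effective"
else:
    verdict = "soft arithmetic alone cannot decide"

print(verdict)
```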

Comparison with standards

In particular we often want to say that one thing is better (or bigger, etc.) than a certain standard. So for example we might want to compare one programme’s outcomes per input with some kind of minimum acceptable level. This standard, some kind of norm, can itself also be seen as a ratio, a soft division, though this might not be immediately clear from the way it is presented.
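One way to picture this (a hypothetical sketch, not a method from the text): treat the standard as just another “project”, with its own notional outcomes and inputs, and compare our programme against it with the same soft division rule.

```python
# Hypothetical sketch: a standard (minimum acceptable ratio) treated as just
# another "project" to compare against.
programme_vs_standard_outcomes = ">"   # judged: our outcomes beat the benchmark's
programme_vs_standard_inputs = "="     # judged: our inputs are about the same

# Our ratio is at least as good as the standard's if outcomes are no worse
# and inputs are no greater. (A False here means "not established by soft
# arithmetic alone", not necessarily "fails the standard".)
meets_standard = (programme_vs_standard_outcomes in (">", "=")
                  and programme_vs_standard_inputs in ("<", "="))
print(meets_standard)  # True
```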

Comparison with rubrics

A variant of comparing a ratio with a standard is comparing it against a set of rubrics.

In general you can think of rubrics as a set of rich descriptions of actual projects with different value-for-money ratios, perhaps five of them, which are themselves ordered so that example number 5 is better than number 4, which is better than 3, etc. Now although these rubrics can be referred to by their number, these numbers 1-5 are just useful nicknames for the five standard ratios, nothing more. The reason I am slightly bothered by the use of numbers in this way is that they imply a little more than they should. As 2-1 = 5-4, we might expect that the Difference between a project which gets a 5 and a project which gets a 4 is the same as the Difference between one which gets a 2 and one which gets a 1. But we don’t necessarily know that. In classical statistics we would say that these are just “ordinal numbers” which express an ordering 5 > 4 > 3 > 2 > 1, nothing else.
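A small hypothetical sketch of the point about ordinal numbers: rubric levels support comparisons, but not subtraction.

```python
# Hypothetical sketch: rubric levels as ordinal labels, not true numbers.
rubric_order = ["1", "2", "3", "4", "5"]   # "5" describes a better project than "4", etc.

def better(level_a, level_b):
    """A valid use of the rubric: compare two projects' levels."""
    return rubric_order.index(level_a) > rubric_order.index(level_b)

print(better("5", "4"))   # True - this comparison is meaningful
# NOT meaningful: assuming the gap between "5" and "4" equals the gap between
# "2" and "1". The labels could just as well be "E" > "D" > "C" > "B" > "A".
```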

See also rubrics

“Mere connections” in Judea Pearl’s causal diagrams

Pearl shows that you can use causal graphs, without any information at all about the rules linking the Variables, to calculate some independences in the causal network - independences which would be disproved if dependencies were found in the actual data. This is a good example of the kind of “calculations” we need to be able to do as evaluators with limited information. However, we need to be able to do a lot more. See the sub-chapter on monotonicity.

Soft arithmetic and “balanced” dashboards

(Ramalingen 2009)

References

Scriven, Michael. 1981a. Evaluation Thesaurus.

Pearl, Judea. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press. http://journals.cambridge.org/production/action/cjoGetFulltext?fulltextid=153246.