Image, Counterfactual, Difference and Effect

We learn to express Difference, Effect and Effectiveness in Theorymaker based on the concepts of Factual and Counterfactual.

*250--0* teachers were trained; *increased* ministry engagement 

 !do The project was funded with *1--0* million USD ((continuous number))

In this example, the Intervention is to fund the project with 1 million USD (rather than none). 250 teachers are trained (rather than none) and ministry engagement is increased (but by some unknown amount).

We can say, for example, that the Effectiveness of the Intervention, the Difference it makes, on the number of teachers trained is *250--0*, i.e. 250 rather than zero. This case is typical in that the funding of the project only incompletely determines the number of teachers trained. Nevertheless, *250--0* is judged the most likely Difference, other things being equal. “Calculating” the Difference is a task for Soft Arithmetic. In some cases it will not be able to consolidate the various possible outcomes (depending on the Levels of other important Variables) into a single Difference like *250--0*, and the expression of the Difference will be a (possibly nested) series of conditional statements.

When the Theory also involves one or more noise Variables, and the Levels of those noise Variables are fixed (usually because the Intervention has already happened and we have recorded their factual Levels), we can calculate the Effectiveness of the Intervention given that those noise Variables take that specific set of Levels; we can call this the Effect of the Intervention.

However, the Effect is not so interesting in evaluation because it depends too much on coincidence. Effectiveness is immune to coincidence.

The meanings of Factual and Counterfactual are conditional statements already given by the Rules

Arguably, marking a Variable as an intervention Variable doesn’t add anything new. We use such Variables in conditional Statements of the form “If Variable X (the intervention Variable) takes Level F, downstream Variable Y will take Level y1; and if it takes Level C, downstream Variable Y will take Level y2”. But we already know this, because that is precisely what the Rules tell us: given such-and-such Levels of the upstream Variables, expect these-and-those Levels of the downstream Variables.

The counterfactual Level of a Variable (given a Theory with at least one Intervention Variable), or just “the Counterfactual”, is the Level it would have taken without the Intervention. The Counterfactual is not defined if there is no Intervention, and is pretty meaningless if the Variable concerned is not downstream of the Intervention.

The factual Level of a Variable (given a Theory with at least one Intervention Variable), or just “the Factual”, is the Level it takes under the Intervention; whereas if there is no Intervention, the factual Level is just whatever Level the Variable takes in a true Statement, see earlier xx.

Similarly, we can refer to all the Variables of a Theory taking their factual Levels as the Factual of the whole Theory, and to all its counterfactual Levels as its Counterfactual.

Difference

This …

Current weight is *80--85* kg ((lo-hi))

… can be replaced with this …

The Difference across the Variable "Current weight" ((lo-hi)) is *80--85* kg

So a Difference across a Variable is essentially an ordered pair of its Levels, written between asterisks and separated by a double minus sign, like this:

*F--C*

where C is the Counterfactual and F is the Factual.

Translation:

We can think of “Difference” as a kind of generalised subtraction which contrasts the Level under an Intervention with the counterfactual Level.
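
Here is a minimal sketch in Python (not Theorymaker; the function name and the examples are just ours for illustration) of this generalised subtraction: cash the Difference out into an ordinary subtraction where the Levels are numeric, and otherwise keep the ordered pair as it is.

def soft_subtract(factual, counterfactual):
    """Generalised subtraction: a numeric difference where possible,
    otherwise just the ordered pair *factual--counterfactual*."""
    if isinstance(factual, (int, float)) and isinstance(counterfactual, (int, float)):
        return factual - counterfactual
    return (factual, counterfactual)

print(soft_subtract(80, 85))           # -5: 80 kg rather than 85 kg
print(soft_subtract("orange", "red"))  # ('orange', 'red'): keep the contrast as a pair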

Image

The Image of a Level (or Levels) of an upstream Variable on a downstream Variable, given some Theory which links them, is the Level (or Levels) of the downstream Variable dictated by the intervening Rules.

The Image of a Difference across an upstream Variable on a downstream Variable, given some Theory which links them, is the Difference on the downstream Variable between the Image of the Factual and the Image of the Counterfactual; in other words, it is the Difference between the Images of the corresponding Levels.

In general, when there is more than one parent Variable, the Image of a Level is a set of conditional statements, one for each possible combination of the Levels of the other influence Variables, each giving the Level of the consequence Variable under that combination; each statement is weighted, where available, by the probability of that combination. So in some cases, where the Variables are numerical and the probabilities are known, the Image can be collapsed right down to a single quantity using ordinary statistical calculations (but see the Chapter on Soft Arithmetic). In many other cases it can only be partially collapsed and (again using Soft Arithmetic) will remain a set of conditional statements.

So, as the Image of a Difference is the Difference between the Images of the corresponding Levels, in the general case its components will themselves be sets of conditional Statements.
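
As a sketch of the simplest case (one parent Variable and a known Rule; the Rule below is invented purely for illustration), the Image of a Difference is just the Difference between the Images of its two Levels:

def rule(teachers_trained):
    # invented Rule linking the upstream Variable (teachers trained)
    # to the downstream Variable (ministry engagement)
    return "increased" if teachers_trained > 0 else "unchanged"

def image_of_level(level):
    return rule(level)

def image_of_difference(difference):
    factual, counterfactual = difference
    # the Image of a Difference = the Difference between the Images of its Levels
    return (image_of_level(factual), image_of_level(counterfactual))

print(image_of_difference((250, 0)))   # ('increased', 'unchanged')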

Noise Variables

In a Theory, any root Variable which is not marked as under the control of an Actor is called a noise Variable. Noise Variables may or may not have a probability distribution associated with them.

In the diagram above, “Good rains” is a noise Variable. Unfortunately, we can’t influence the rains.
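
Combining noise Variables with the Effect / Effectiveness distinction drawn at the start of this chapter, here is a hedged sketch (the Rule, the probability of good rains and all the numbers are invented): Effectiveness averages the Difference over our best guess about the noise Variable, whereas the Effect fixes the noise Variable at its recorded Level.

def harvest(irrigation, good_rains):
    # invented Rule: a good harvest needs irrigation or good rains
    return 100 if (irrigation or good_rains) else 40

P_GOOD_RAINS = 0.3   # assumed probability of the noise Variable

def expected_harvest(irrigation):
    return (P_GOOD_RAINS * harvest(irrigation, True)
            + (1 - P_GOOD_RAINS) * harvest(irrigation, False))

# Effectiveness: best-guess Difference, averaging over the noise Variable
effectiveness = expected_harvest(True) - expected_harvest(False)        # 42.0
# Effect: the Difference given the recorded Level of the noise Variable
effect_given_good_rains = harvest(True, True) - harvest(False, True)    # 0

With good rains recorded, the Effect happens to be zero; this is why the Effect depends on coincidence while Effectiveness does not.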

Effect

The Image of a Difference across a Variable V on a Variable U downstream of it, given a certain Theory, is called the Effect of V on U.

Note for Nerds18.

Calculating the Image

Causal, not statistical

Pearl (2000) points out that the calculation we need to make in order to work out an Image is a causal one, not a statistical / probabilistic one. So for example, given the Theory

The lawn is wet ((no,yes))

 It has been raining  ((no, yes))

the Image of

It has been raining *yes--no* ((no, yes))

on the other Variable is

The lawn is wet *yes--no* ((no,yes))

but the reverse is not the case, even though the two Variables are correlated. The calculation we need to conduct is, as Pearl puts it, “surgery” on the causal diagram. We cut the upstream Variable off from its ancestors and see what happens to the downstream Variable when we force the upstream Variable to take a particular Level. We can see this easily by looking at Theorymaker diagrams, which encode precisely this information.
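
A minimal sketch of this asymmetry in Python (two Variables only, structured after the diagram above; the probability and helper names are ours):

import random
random.seed(0)

P_RAIN = 0.3   # assumed probability for the root Variable

def do_raining(value):
    # "Surgery": cut "It has been raining" off from its ancestors, force its Level,
    # then let the Rules run downstream as usual.
    raining = value
    lawn_wet = raining           # Rule: the lawn is wet if it has been raining
    return raining, lawn_wet

def do_lawn_wet(value):
    # Forcing the downstream Variable leaves the upstream one untouched:
    # the Rule runs from rain to lawn, not the other way round.
    raining = random.random() < P_RAIN
    lawn_wet = value
    return raining, lawn_wet

print(do_raining(True))    # (True, True): forcing rain wets the lawn
print(do_lawn_wet(True))   # (False, True) with this seed: wetting the lawn does not make it rain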

Aggregation

Note for geeks: “best guess” is not only a nod to Bayes xx but is also more suited to the general functional approach we are taking in this book, following Pearl (2000), rather than the more narrowly probabilistic approach (“expected value”) which is more common in evaluation texts and which assumes that all the Variables are numerical, etc.

Updating the Theory

So our calculation depends on our Theory about the Mechanism. When we are doing evaluation, before trying to determine the effect of a Variable, we will nearly always improve any initial Theory about how the project works with additional data based on what actually happened during an intervention, and why. See xx.

Dealing with the contribution of other upstream Variables

In practice, calculating / understanding the Image of a Variable’s Level on some downstream Variable can be difficult, because many other Variables usually also contribute.

Pearl and others often speak about examining the effect of one Variable on another “while holding all third Variables constant”, but this only works if we assume all the Influences on the downstream Variable combine linearly (see xx).

In the most general case, all we can ever do is describe the Difference made by an intervention (its contribution while holding the other Variables constant) for each and every possible combination of the Levels of all the other Variables.19

Look at this case (the downstairs and upstairs light-switches are the kind which work independently in the sense that if either is on, the light is on, otherwise it is off):

The stair lighting is on !Rule: AND ((no,yes))

 The circuit is closed  !Rule: OR ((no,yes))

  The downstairs light-switch is on ((no,yes))

  The upstairs light-switch is on ((no,yes))

 There is currently no power cut ((no,yes))

If we want to calculate the Effect on the stair lighting of switching on the downstairs light-switch, i.e. of the downstairs light-switch being *on--off*, then in the worst case all we can do is list all the possible combinations of the other root Variables:

                          Power cut    No power cut
Upstairs switch is ON     off--off     on--on
Upstairs switch is OFF    off--off     on--off

(each cell shows the Difference *factual--counterfactual* on “The stair lighting is on”)

In this case there are two other root Variables with discrete Levels so we have made a two-dimensional table. With continuous Variables, we will usually have to try to find some way to aggregate the results.
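
The table above can be generated mechanically. A small sketch in Python (the helper names are ours), enumerating the other root Variables and applying the Rules from the diagram:

from itertools import product

def stair_lighting(downstairs_on, upstairs_on, no_power_cut):
    circuit_closed = downstairs_on or upstairs_on   # !Rule: OR
    return circuit_closed and no_power_cut          # !Rule: AND

def level(is_on):
    return "on" if is_on else "off"

for upstairs_on, no_power_cut in product([True, False], [False, True]):
    factual = stair_lighting(True, upstairs_on, no_power_cut)          # downstairs switch on...
    counterfactual = stair_lighting(False, upstairs_on, no_power_cut)  # ...rather than off
    print(f"upstairs {'ON ' if upstairs_on else 'OFF'}, "
          f"power cut {'yes' if not no_power_cut else 'no '}: "
          f"*{level(factual)}--{level(counterfactual)}*")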

Differences as raw data

We saw earlier the following strange thing:

  • we allow Statements into Theorymaker as assertions that a certain Variable takes a certain Level.
  • we also allow expressions of inequalities between Variables as a new kind of Statement.
  • in the case of fuzzy Variables, we find that we sometimes have useful expressions of inequalities which we cannot, however, cash out into expressions that particular Variables take particular Levels.

In the same way, we often have knowledge about a Difference across a Variable which cannot be cashed out into specific factual or counterfactual Statements because they are under-determined. So we might know that there is a Difference across a Variable of 2000 EUR or 15 lives or 20 children, without knowing exactly the Factual or Counterfactual.

Difference and subtraction

So if a person who has been using the ABC Diet Programme for the recommended number of weeks weighs 80kg at the end, and our best estimate of what their weight would have been without the Programme is 85kg, we can say the Difference made by the Programme is 80-85 = -5kg.

Current weight is *80--85* kg ((lo-hi))

Our best estimate of the Counterfactual is probably based mainly on their weight at the beginning of the programme, but we might adjust it according to various other factors, e.g. season, personal history, etc. We look more closely at how to estimate Counterfactuals in chapter xx.

But in other cases we can’t just subtract. So we can say the President’s rapid intervention helped reduce international tension, as a result of which the National Alert Level was dropped from red to orange.

National Alert Level is *orange--red* ((green < orange < red))

You can read this in English as “Because of the intervention, the National Alert Level is orange rather than red, (it would have been red without the intervention); and the possible Levels are green, which is less than orange, which is less than red.”

Sometimes a Difference cannot actually be reduced to a single number using subtraction. But we can still do some useful reasoning with it. It is a big part of the evaluator’s job to reason with and combine Differences. For this, we need Soft Arithmetic.
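
For example, with an ordered (ordinal) Variable like the National Alert Level we cannot subtract, but we can still check whether the Factual is lower on the scale than the Counterfactual. A tiny sketch (the scale and helper name are just ours):

ALERT_SCALE = ["green", "orange", "red"]   # ((green < orange < red))

def is_a_reduction(difference, scale=ALERT_SCALE):
    factual, counterfactual = difference
    return scale.index(factual) < scale.index(counterfactual)

print(is_a_reduction(("orange", "red")))   # True: orange rather than red is a reduction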

Soft Arithmetic with Differences

Differences are themselves quantities which in turn are amenable to Soft Arithmetic.

So if the average weight loss using Diet Programme A is 3kg and the average weight loss using Programme B is 2kg, we can say the Difference between the two is 1kg.

For example, comparing the Differences X and Y, each written with the Factual first as in *F--C*:

X = (x1, x2)
Y = (y1, y2)

If

x1 >= y1 and y2 >= x2

then y2 - y1 >= x2 - x1, so we can say that Y >= X: continuing the weight-loss example, Y represents at least as great a loss (Counterfactual above Factual) as X.
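
A hedged sketch of this comparison in Python (the function name is ours; “at least as great” is read here, as in the weight-loss example, as “at least as great a reduction”):

def at_least_as_great(d_y, d_x):
    """Can we conclude that Difference Y is at least as great as Difference X?
    Both are written (factual, counterfactual), as in *F--C*.
    A False result means 'cannot conclude', not 'X is greater'."""
    (y1, y2), (x1, x2) = d_y, d_x
    return x1 >= y1 and y2 >= x2

diet_a = (80, 85)   # *80--85* kg: 5 kg below the Counterfactual
diet_b = (82, 84)   # *82--84* kg: 2 kg below the Counterfactual
print(at_least_as_great(diet_a, diet_b))   # True: A's Difference is at least as great as B's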

Expressing Difference with asterisks helps to focus on the actual Variables

Differences written between asterisks are printed in italic in Theorymaker Diagrams:

National Alert Level is *orange--red* ((green < orange < red))

More generally, any global expression of a Difference can be highlighted with enclosing asterisks even if it is not written with the soft subtraction symbol --.

*Improved* teaching quality

 *Improved* teaching skills

  *More* teacher training




This kind of formulation is very popular amongst Theorymaker native speakers, especially those lucky enough to be chosen to be allowed to actually do project planning and reporting.

In ordinary Logframes and Theories of Change, we very often see Differences as the labels of Variables. But of course we don’t really mean that the Variables are actually Differences. We wouldn’t for example observe the Variable “Improved teaching quality” at the beginning and end of the programme or in comparison groups. We would measure “Teaching quality” and hope to notice an improvement. Asterisks are a useful piece of Slang which help us both name the Variables correctly and also highlight the Differences we expect to make.

So the above diagram is more informative and motivating than this one, in which only the names of the Variables are given:

Teaching quality

 Teaching skills

  Teacher training




Using asterisks like this can help us tighten up our Theories of Change. For example, suppose we Earthlings had a vague plan which included “Improved teaching” as an outcome. We know that is not really the name of a Variable so we try to express it like this:

*Improved* teaching

… but that makes us realise that “teaching” isn’t much good as a Variable name and we might want to specify a bit more clearly what we mean.

Differences and “targets”

Theorymaker native speakers have no need for the concept of a “target” in project planning. Remember, Theorymaker native speakers cannot tell lies, so their best guess at what the project will achieve is their target.

You can think of a “target” in two ways: as the expression of a Difference, or as the expression just of the Factual Level with an assumed Counterfactual.

The currency of evaluation is Differences

The currency of evaluation is Differences - between how things are and how things could have been. We intervene to make a Difference. Our intervention is a Difference between doing and not doing something, and the result is a Difference too. In each case, the Difference compares the Factual Level of some Variable with the Counterfactual Level. In a sense, this comparison is a subtraction, though of course actual arithmetic is only possible in a limited number of cases.

Perhaps it is hard for us to distinguish between Facts (e.g. the Factual Level of a Variable, or the Counterfactual Level) and Differences between them, because when thinking about these things we try to simplify by only considering binary yes/no propositions (the whole discussion around INUS causes, xx, does this). The Fact that the project takes place sounds pretty much like the Difference between the project taking place and its not taking place. But they aren’t the same in general, as we can see more clearly when considering non-binary Variables. So the evaluation question “what effect did raising the funding from 700K to 1 million EUR have on the outcome Variables?” is obviously not the same question as “what was the effect of the 1-million-EUR funded project compared to no project at all?”

So, for Theorymaker native speakers, Factual/Counterfactual thinking is inevitable and inescapable in any Theory (and in any logframe, as logframes are Theories of Change with bells and whistles). All Theories imply Differences between factual and counterfactual states: they say, for all or some of the Levels of influence Variable A, what are the likely Levels of Consequence B? In any given scenario, Variable A will have only one factual Level, and Variable B will (go on to) have one factual Level. But if Variable A had had some different Level, Variable B would have had some (probably) different Level; otherwise we wouldn’t say that Variable A can affect or causally influence Variable B20.

Law on new flag design is passed ((no, yes)); Rule = 50% threshold

 Percentage of those voting in referendum who approve of new flag ((0-100))

So in the example above, even if referendum day has already passed, we say that these Variables could have taken a different Level to the one they did, historically, take. The influence of the vote on the law is probably a very strong one; if we assume that Parliament is legally bound to follow the referendum result, the passing of the law is almost inevitable, but it doesn’t follow logically - there could be, say, an asteroid strike between the referendum and the passing of the law. So we can say: if the majority vote for the flag, the corresponding law will be passed; and if the majority don’t vote for the flag, the law won’t be passed. So we are thinking in counterfactuals. A Variable essentially involves counterfactuals, because although only one of its Levels can happen at one time, it gets its whole meaning from the fact that the other Levels could happen / could have happened.
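
A minimal sketch of the Rule in this example (assuming “50% threshold” means a simple majority, and ignoring the caveats above; the percentages are invented):

def law_passed(percent_approving):
    # Rule = 50% threshold: the law is passed iff more than half approve
    return "yes" if percent_approving > 50 else "no"

factual = law_passed(62)         # say 62% actually approved
counterfactual = law_passed(43)  # had only 43% approved
print(f"*{factual}--{counterfactual}*")   # *yes--no*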

Where do alternative possibilities come from?

Given that the world is just exactly as it is, what holds open the idea of counterfactuality? How do we come to say, more plausibly, that the train might have arrived late, in which case we would have missed the plane, rather than that the solar system might suddenly have filled with cheese? Is it the similarity of nearby possible worlds to this one (Lewis 1979)? No: Pearl suggests that it is the relative autonomy of the constituent Variables and Mechanisms which provides the hinges of counterfactuality and makes explanation possible.

Evaluators can’t escape thinking in Differences

(I am grateful here to colleagues on XCEval for their ideas and contributions) …

If an “outcome” is not caused by the intervention, it is NOT an outcome; it’s merely a coincidence (Davidson 2010)

A Counterfactual is not what happened elsewhere to a matched control group. That is putting the cart before the horse and confusing a (possibly) good source of data for estimating the Counterfactual with the Counterfactual itself (what would have happened). It also suggests that we could never learn to make causal attributions and predictions without having access to controlled trials; but in fact we do learn, from an early age, on the basis of the information actually present to us, not just from trials, and not just controlled ones. What is so exciting about Judea Pearl’s approach (Pearl 2000), not only as a philosopher but also as a pioneer in AI, is that he actually constructs algorithms which can work out causal rules on the basis of what we have been taught to think of as “merely correlational” evidence. These algorithms actually work - they can help a robot guide itself through a complex causal world. Just as we do, perhaps.

A Counterfactual is not merely what might have happened. Counterfactuals are not mysterious other worlds we never see but rather one half of how we make sense of this world, which we constantly contrast against one or more alternative possibilities.

A Counterfactual is not another, hard-to-imagine world just like this one, within which events happen, things cause other things, etc. This can’t be, because our understanding of events happening involves our (albeit patchy) causal schemas, and causal explanatory schemas in turn need a Counterfactual. How can we use the idea of Counterfactual worlds to explain causation if those worlds themselves contain Counterfactuals?

The Factual on its own - that is the sound of one hand clapping.

The Factual and the Counterfactual are each, on their own, just one hand clapping; events and causal chains appear when they are contrasted with one another; and there is nothing that forces us to choose one possible Counterfactual over another. This helps explain why the world is really different - with different but compatible causal chains at work - for different people from different perspectives.

You are saying that our very understanding of history, of our current world and of the events in it, essentially takes a Factual/Counterfactual form. But which Counterfactual you pick is sometimes arbitrary.

Yes, though do note that if you and I are thinking of different counterfactuals, our views of the world will be different but compatible.

Isn’t it remarkable that there is such a variety of opinions and understandings amongst evaluators of such a central concept as Counterfactual?

Ricardo Wilson-Grau quoted Jim Rugh and Michael Bamberger who, in the second edition of their book RealWorld Evaluation, estimate that the experimental method is applicable in perhaps 5% of evaluations and the quasi-experimental in between 10% and 25%. That leaves the remaining 70%, which contains the evaluations that most of us actually do. He seemed to suggest that those of us working on that 70% of evaluations, who can’t apply the proper methods for estimating Counterfactuals, are stuck with looking at the Factual, so let’s be proud of that. But I believe any interesting evaluation statement has to consider Differences (a Variable having this Level rather than another), and that is already a Counterfactual idea, before we even consider the Factual-Counterfactual Difference this is going to make on another Variable. Isn’t meaning to do with contrasting how things are with how they might have been? As soon as you ask “what happened because of the project?” you have a causal and a Counterfactual angle. And different people can think of “the same” intervention as different, contrasting Differences (e.g. versus the intervention not taking place at all, rather than versus the intervention having a different Level, e.g. less funds than last year) and give different answers because of this.

We ordinary homo sapiens (and nowadays some automatons too) seem to gather loads of valuable causal heuristics - with Counterfactual implications - just by “looking at the Factual”. But getting meaning from the Factual, seeing it as something, already implies Counterfactuals. Controlled trials are not the only, or even the Royal, route to causal knowledge.

Yes, Difference is being used in a technical sense

I’m a little confused here. “Difference” doesn’t require a reliance on a Counterfactual. I felt lousy yesterday having developed a tummy bug, but today I feel better. Those are both based on (admittedly impressionistic) Facts. Now whether my various interventions (water, continuing eating, going to bed early and having a good night’s sleep, an assortment of drugs, or generating sympathy from friends) made any difference individually or in combination is anyone’s guess. I certainly wasn’t going to wait around trying one thing at a time21.

Thanks, yes that is exactly it … I’d like to use Factual/Counterfactual for any contrasted pair of Levels of some Variable where one actually happens and the other might have happened, but didn’t. I’d like to use these words whether or not this contrast is a cause or result, or both, in one or many causal processes. I’d like to use the word “Difference” for just that contrast. Strikes me that we really need this terminology because these kinds of contrasts are central to evaluation, because they are connected to meaning and because they are the right way to express the explanans and explanandum in statements about attribution and contribution.

So you noticed a Difference in your general sense of health today, some kind of fuzzy “pretty good” rather than “lousy” (presumably an expectation set up by the previous day’s suffering; the very same set of quite-OK feelings might not be a source of surprise or relief set against a different expectation). I’d like to say, the Factual state was “pretty good”, the Counterfactual was “lousy” and the Difference is a source of celebration. It may or may not require a causal explanation or go on to explain other Differences.

So I agree that “Difference doesn’t require a reliance on a Counterfactual” in the sense that it doesn’t have to be the effect or cause of anything in particular. But I’d still like to call the expected lousy feelings “the Counterfactual”, for want of a better word and also because it makes things so linguistically tidy: the effect of one Difference (between Factual and Counterfactual states) is another Difference (between Factual and Counterfactual states).

Also, following your example: in ordinary speech we use “change” and “Difference” fairly interchangeably, and both can be applied either to different Levels of a Variable in a time-series, e.g. your temperature dropping or your mood improving, or to instantaneous Factual/Counterfactual contrasts, e.g. your feeling right now compared to some contrasting lousy expectation. I’d like to reserve “Difference” just for such instantaneous contrasts. Even in project documentation, “change” and “Difference” are both used interchangeably, often leaving it ambiguous whether we mean merely a shift in a Level over time (which might have been due to happen anyway) or a Difference made.

A bit more precisely:

  • I’d define a Difference, more generally, as being between any two subsets of the possible Levels of a Variable, not just between two single Levels (see the sketch after this list).
  • The Variables and their Levels and the current state of the Variable can all be fuzzy, imperfectly defined and imperfectly known; and the Rules which connect these Variables and others can be chaotic, complex, knowable, unknowable and subject to all kinds of framing issues.
  • In this sense, a “Variable” is a single, unique thing which can be different, rather than any set of such Variables, noting that statisticians use the word “Variable” for the set. So your feeling this morning was a single Variable, whereas the time-series of your feelings over the last week, or the feelings of all your peers this morning, we’d call sets of Variables.
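
A tiny sketch of that more general notion (the class and names are just ours): a Difference as an ordered pair of subsets of a Variable’s possible Levels, which leaves room for an under-determined Factual or Counterfactual.

from dataclasses import dataclass

@dataclass(frozen=True)
class Difference:
    factual: frozenset         # the Level(s) the Variable (may) actually take
    counterfactual: frozenset  # the Level(s) it would, or might, have taken

alert = Difference(factual=frozenset({"orange"}),
                   counterfactual=frozenset({"orange", "red"}))  # Counterfactual only partly known
print(alert)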

References

Davidson, E. Jane. 2010. “Outcomes, impacts & causal attribution.”

Pearl, Judea. 2000. Causality: Models, Reasoning and Inference. Cambridge University Press. http://journals.cambridge.org/production/action/cjoGetFulltext?fulltextid=153246.


  1. More generally, we can understand a Difference as an ordered pair of sets of Levels, or of probability distributions over Levels

  2. Grateful to Denis Bours xxx for his contribution to this on XCEval

  3. Philosophers will note that we are at the edge of some very tricky questions here about the precise “theory of causation” we are going to follow. More about that in chapter xx.

  4. This objection is actually due not to the Devil but to Bob Williams.