Evidence for a Theory of Change

Establish the best Theory of Change using the best evidence you have.

Combine multiple sources of evidence.

An Evaluation design requires a Best Adequate Theory of Change (“BAT”)

This is hard work

“Contribution Analysis” is a set of rules for assembling evidence for (part or all of) a Theory of Change. One big advantage is that the full procedure is transparent and perhaps even reproducible.

Mayne and especially Delahais and Toulemonde (2012) emphasise that this is a lot of work.

Adequate Theories; the 30-30 principle

How incomplete may a good Theory be?

I’d like to say that the evaluator’s job is to establish a true Theory of the project in question. But this sounds like a perfectly correct Theory, heaven on earth, and the very idea will make philosophers of science collapse in fits of laughter. We can, however, seek an adequate Theory: adequate both in the sense of “only good enough”, i.e. it doesn’t have to be perfect, and in the sense of “really good enough”, i.e. it has to be a usable approximation to the truth.

In fact, a Theory with too many links and Variables is often not as useful as a simpler one, and it is harder to evaluate too.

Theorymaker native speakers usually insist that an adequate Theory has to reach some kind of standard, something like this “30-30 principle”:

30-30, good enough for me.

If a Theory is adequate, there is no more than a 30% chance that it is more than 30% wrong in predicting the outcomes that matter.

So for example, it is not too tragic if a Variable is left out of our Theory if there is no more than say a 30% chance that it will have a significant (say 30%) impact on valued outcomes.
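One way to make the 30-30 principle concrete is to check it against a track record. The following is only a minimal sketch: it assumes we have relative prediction errors from comparable past cases, and it reads “chance” as the observed share of those cases. The function name `is_adequate` and the example figures are hypothetical.

```python
def is_adequate(relative_errors, max_error=0.30, max_chance=0.30):
    """Check the 30-30 principle against a list of relative prediction errors.

    Assumption: the "chance" of being badly wrong is approximated by the
    share of past cases in which the Theory was off by more than `max_error`.
    """
    if not relative_errors:
        raise ValueError("need at least one prediction error")
    misses = sum(1 for e in relative_errors if abs(e) > max_error)
    return misses / len(relative_errors) <= max_chance


# Hypothetical relative errors (predicted vs. observed outcome) for ten cases.
errors = [0.05, -0.10, 0.40, 0.20, -0.35, 0.15, 0.00, 0.25, -0.05, 0.10]

# Share of cases in which the Theory was more than 30% wrong.
share_badly_wrong = sum(abs(e) > 0.30 for e in errors) / len(errors)
adequate = is_adequate(errors)
```

With these made-up figures, two of the ten cases are more than 30% wrong, so the Theory would count as adequate under this reading.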

It is only because we are already good at working with merely adequate Theories that we manage to have real-life Theories at all: any true Theory of real social phenomena would probably have an unending number of Variables in unendingly complex combinations.

Or: an arrow in a diagram like this implies that some change in the influence Variable is associated with a substantial change in the consequence Variable for at least some combination of Levels of the other influence Variables of the consequence Variable. (How big does “substantial” have to be? That can differ from context to context.)

Adequate theories: missing Variables?

Do we also assume that the expression of a Theory includes the claim that no other important Variables or links are missing?

So the generic expression of a Mechanism in its native, broad, Context, will include just those Variables it needs to be a BAT. But in a more specific Context, other local Variables and Mechanisms may become important, in which case they need to be included in a Theory before it can be legitimately stated.

See chapter on open and emergent Mechanisms xx.

Pearl notes (Pearl 2000) that it is not possible in traditional correlational approaches even to express that two Variables have no causal connection, i.e. that they are causally independent. In Theorymaker, there is a claim implicit in every Theory that all other connections are small.

Is there actually such a thing as an incomplete Mechanism, or are there only incomplete Theories?

Noise Variables

In the correlational paradigm, incomplete relationships are thought of as being completed by the influence of an additional “noise” Variable, usually governed by a probability distribution.

So if we assert that C = f(A) in a particular context, we are also asserting that no other Variables have an important influence on C.

But most often while the influence Variables determine a lot or most of the variability of the consequence Variable, they do not control everything. One way to make the Mechanism complete is just to draw an additional “noise arrow”.

The noise arrow serves a similar purpose to error variance arrows in structural equation models (Bollen & Long, 1993). What this arrow (and the implied but invisible Variable at its root) does is provide precisely the influence needed to ensure that the Level of the consequence Variable is always completely determined.

The kind of gap-filling provided by this noise arrow can be quite complicated. Suppose we have a well-tested 2-hour workshop model for children on social skills whose outcome can be pretty well predicted in most cases, provided the groups are single-sex. With mixed groups, however, the results vary wildly in a way we are not yet able to predict.

Outcome (average behaviour improvement) (Levels: hi,lo) 

 Single-sex group (Levels: yes,no)

 Training (Levels: yes,no)

So in the single-sex case, the noise arrow doesn’t have much work to do. But in the mixed case, it has a lot to explain.

Asserting a Theory is usually taken to imply that this Theory is adequate, is the best guess, i.e. that the influence of the noise arrow is small. Unfortunately, we often make do with just a plausible connection. It was plausible that Libya would develop as a happy democracy after allied bombing, i.e. you could have drawn an optimistic Theory showing these plausible links, but it wasn’t the best guess of most people who knew anything about the region, because many other pretty obvious and likely factors were left out of the equation. Much the same is true of plenty of optimistic Theories of Change in the development world: this training might lead to changes of teaching behaviour amongst teachers, which might plausibly lead to changes of behaviour amongst their pupils … They trace one plausible link through the Variables which are explicitly mentioned, but everyone knows there are other Variables which have an important influence on the Variables of interest.

We could label the noise arrow with e.g. “40%” to show that we believe the other influence Variables explain maybe 60% of the variation in the consequence Variable. There are two important points here:

  • Judgements of the quality of a project should take this “remainder” into account.

  • It matters whether this remainder most likely consists of a lot of independent random influences or whether there might be one or two major Variables which have not been included. In the latter case, they should certainly be included in the model.
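Both points can be illustrated with a small calculation. This is only a sketch under stated assumptions: the label on the noise arrow is taken to be the share of the variation in the consequence Variable which the Theory's predictions leave unexplained, and the scores below are invented for the workshop example (the first four groups single-sex, the last four mixed). The helper names are hypothetical.

```python
from statistics import pvariance


def mean_squared_residual(observed, predicted):
    """Average squared gap between what happened and what the Theory predicted."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return sum(r * r for r in residuals) / len(residuals)


def noise_share(observed, predicted):
    """Share of the variation in `observed` left to the noise arrow."""
    return mean_squared_residual(observed, predicted) / pvariance(observed)


# Invented improvement scores for eight groups; the Theory predicts 3.0
# without Training and 7.0 with Training.
predicted = [3.0, 3.0, 7.0, 7.0, 3.0, 3.0, 7.0, 7.0]
observed = [2.5, 3.5, 6.5, 7.5, 1.0, 5.0, 4.0, 10.0]

# Label for the noise arrow across all groups.
overall = noise_share(observed, predicted)

# Where does the remainder sit? Compare single-sex groups with mixed groups.
single_sex = mean_squared_residual(observed[:4], predicted[:4])
mixed = mean_squared_residual(observed[4:], predicted[4:])
```

Here the remainder is concentrated almost entirely in the mixed groups, which is the pattern we would expect if one major Variable is missing from the model rather than many small independent influences.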

See also what we said about Emergent and Open Theories in chapter xx.

Is evidence essentially correlational?

For around a century, we have been used to the idea in social science that evidence is essentially correlational. Recently, Pearl and others have been reviving what was almost a taboo, namely that under certain circumstances it is possible to derive causal statements on the basis of correlational data.

Some (e.g. Scriven) even dare to claim that there can be direct observation of causation. This would make causal data a new category of data, one not reducible to correlations.

This possibility is easy to model in Theorymaker:

Measurement, data, evidence

Sometimes we end up talking about measurement when all we want to do is distinguish between what we have called data Variables and, say, Variables defined in terms of others (or also to distinguish them from latent Variables, see xx).

Don’t say “This Variable is measured at two time points” when all you mean is “This Variable is defined at two time points in this context”

We could also say “This Variable is measurable at two time points”, but (see xx) in Theorymaker “Measurement” is reserved just for numerical Variables, and there is a wider range of Variables we might want to report on.

Sources of evidence

Updating a Theory of Change based on what actually happened

The best, but not the only, source of evidence relevant to a Theory of Change is how things actually worked out in practice. In order to update a Theory of Change with this evidence, we have to have a clear idea of the factual/counterfactual distinction and how we are going to estimate the counterfactual Level of key Variables. Without this, we are unable to recognise coincidences and are therefore unable to conduct an evaluation.
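To see why the counterfactual matters, here is a minimal sketch with invented numbers. It assumes one particular way of estimating the counterfactual Level, namely that without the project the Variable would have changed by the same amount as in a comparison group; this is just one possible estimation strategy, not one prescribed here, and the function names are hypothetical.

```python
def counterfactual_from_comparison(baseline, comparison_baseline, comparison_endline):
    """Estimate the Level the Variable would have reached without the project.

    Assumption: the project group would have changed by the same amount
    as the comparison group did over the same period.
    """
    return baseline + (comparison_endline - comparison_baseline)


def estimated_contribution(factual_level, counterfactual_level):
    """Difference between what happened and what would have happened anyway."""
    return factual_level - counterfactual_level


# Invented scores: the project group moved from 40 to 55,
# while a comparison group moved from 42 to 50.
baseline, factual = 40.0, 55.0
counterfactual = counterfactual_from_comparison(baseline, 42.0, 50.0)

naive_change = factual - baseline  # before/after change, ignoring the counterfactual
contribution = estimated_contribution(factual, counterfactual)
```

With these figures the naive before/after change is 15 points, but 8 of them would plausibly have happened anyway, leaving an estimated contribution of 7. Without the counterfactual estimate, that coincidence would be credited to the project.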

Actually, “how things worked out in practice” is itself not a dumb set of scores but may involve for example direct information on causal connections.


Delahais, Thomas, and Jacques Toulemonde. 2012. “Applying contribution analysis: Lessons from five years of practice.” Evaluation 18 (3): 281–93. doi:10.1177/1356389012450810.

Pearl, Judea. 2000. Causality: Models, reasoning and inference. Cambridge University Press. http://journals.cambridge.org/production/action/cjoGetFulltext?fulltextid=153246.