Different kinds of Variables: rich Variables

In this chapter, we look at “rich” Variables.

The first Variable in this sentence is a rich Variable:

“Because of the contents of the report, the Board increased project funding.”

We can imagine various versions of the report, and tiny variations on those different versions, all of which would almost certainly have led to the Board making the same or a similar decision; and we can imagine many other versions of the report which would have led to a different decision. So “the contents of the report” is behaving just like any other Variable: it can take different Levels, with corresponding consequences.

But we can hardly ever specify in advance all the Levels of these kinds of Variables. There is a bewildering variety of possible versions of the report, and we wouldn’t even know how to start listing or categorising them all, and they certainly don’t have a single obvious “higher” or “lower”, “better” or “worse”. You can always try to break down a rich Variable into many subsidiary Variables representing for example the length of the report, how positive the language is, and so forth. But these are not the same as, or better than, the original rich Variable, and they would seem to lose almost all of the relevant information in the report.

Nevertheless, we use rich Variables all the time to help tell causal stories.

Rich Variables are packed with information: they have high information entropy.


Perhaps the best examples of how we reason with rich Variables in real life and social science come from claims of causation or at least of influence.

A “rich Variable” is still a Variable, still something which can be one way or another, something which enters into causal explanations, but one which might need megabytes or gigabytes of storage to capture. In terms of information theory, it has a high entropy.
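Strictly speaking, entropy is a property of a probability distribution over a Variable's Levels: H = −Σ p log₂ p bits, which for N equally likely Levels is log₂ N. The short Python sketch below compares a simple two-Level Variable with the upper bound for a report stored in a million bytes. The probabilities are illustrative assumptions, and a real report is far from uniformly distributed (text is very compressible), so its actual entropy would be much lower than that bound.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits, of a distribution over Levels."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A simple Variable: did the Board increase funding? Two equally likely Levels.
decision = entropy_bits([0.5, 0.5])

# A three-Level rating with assumed (purely illustrative) probabilities.
rating = entropy_bits([0.7, 0.2, 0.1])

# With N equally likely Levels the entropy is log2(N). A report stored in
# 1,000,000 bytes has 2 ** 8_000_000 possible bit strings, so its entropy is
# at most 8,000,000 bits (reached only if every bit string were equally likely,
# which real reports are not). We use the closed form rather than enumerate Levels.
report_bits_upper_bound = 8 * 1_000_000

print(f"decision: {decision:.2f} bits")
print(f"rating:   {rating:.2f} bits")
print(f"report:   at most {report_bits_upper_bound:,} bits")
```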

There is no way of delineating in advance all the possible documents which would count as versions of “this evaluation report”; we wouldn’t and couldn’t know where to start. Yet it is not difficult to recognise one, and there would also be some consensus about whether or not a given report is a good one.

You can think of the different Levels of the Variable “the report” in terms of its digital incarnation: a long string of binary bits, 1s or 0s. As we already saw, we can think of that string either as a very long set of binary Variables, one per bit, or as their Cartesian product, a single Variable with very many Levels. Thinking of the report this way can be helpful, but I am not sure it is really appropriate. It might be better to ask how we could represent the logical structure of all possible contents of this report. If you give a neural net the task of recognising 2D pictures of different cats, I guess the resulting network would not be arbitrary but would somehow contain visual knowledge about how cats appear, perhaps with some kind of rough hierarchy (perhaps not that different from that assumed in human brains) in which low-level visual cues like edges feed into subsystems that identify shape, and so on. While no-one would want to say “you can just read off the structure and that is the universal definition of 2D-cat-ness”, perhaps you could look at all reasonably efficient such networks and find some common features and structure.

In the same way, perhaps you could feed in tens of millions of possible versions of the same report, i.e. addressing the same issue but with different contents, in pairs, into a neural net along with an estimate of the similarity of the two (maybe a score between 0 and 99) and just tell the net to learn to guess the similarity of an arbitrary pair of reports. I guess the resulting net would have to arrive at some internal representation of the key features of “this report” and perhaps we could think of this as “a Variable”.
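The pairwise training idea can be sketched, very loosely, with something far simpler than a neural net. In the sketch below each version of the report is reduced to three hand-made features (tone, length, whether it recommends funding), a stand-in `judge` function plays the part of the human similarity scores on the 0 to 99 scale, and a weighted-distance model is fitted to all pairs by ordinary least squares. Everything here (the features, the judge, the linear model itself) is a hypothetical substitute for the net described above. Hand-picking features is exactly the kind of reduction this chapter is wary of; a real net would have to discover its own representation from the raw text. The point is only that the learned weights end up encoding which features of the report matter to the judge.

```python
from itertools import combinations

# Hypothetical features for five versions of "the report":
# (tone, length, recommends_funding), each scaled to 0..1.
REPORTS = [
    (0.9, 0.2, 1.0),
    (0.8, 0.9, 1.0),
    (0.3, 0.5, 0.0),
    (0.1, 0.1, 0.0),
    (0.6, 0.7, 1.0),
]

def judge(a, b):
    """Stand-in for a human similarity score (0-99); assumed to ignore length."""
    return 99 - 60 * abs(a[2] - b[2]) - 5 * abs(a[0] - b[0])

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def learn_weights(reports):
    """Fit score = 99 - sum(w_k * |a_k - b_k|) to every pair by least squares."""
    rows, targets = [], []
    for a, b in combinations(reports, 2):
        rows.append([abs(x - y) for x, y in zip(a, b)])
        targets.append(99 - judge(a, b))
    k = len(rows[0])
    # Normal equations: (D^T D) w = D^T t
    dtd = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    dtt = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(dtd, dtt)

weights = learn_weights(REPORTS)
for name, w in zip(("tone", "length", "recommends_funding"), weights):
    print(f"{name:>20}: {w:6.2f}")
```

Because this toy data is noise-free and the pairs vary enough, the fitted weights recover the judge's rule: about 5 for tone, 0 for length and 60 for recommending funding.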

Other examples of possibly rich Variables:

The overall mood of the focus-group participants

The sustainability of this project

The quality of management in this project

The teacher's dance demonstration

The student's reading problem

The dominant form of psychological problems since the earthquake

The contents of tomorrow's Presidential Address

The whole of Episode 7 of “The Big Bang Theory”

So Episode 7 of “The Big Bang Theory” is rich. But it is just a lot of pixels spread out over 25 minutes. It can be digitised, if it is not digitised already.

Sure. But there is a question of what philosophers call “identity conditions” xx - you could easily recognise “the same episode” even if it was on old-fashioned analogue video tape and this tape was a bit blurry and had a reddish tint.
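The point about identity conditions can be made concrete with a toy comparison. Under a strict, bit-level identity condition, two copies of the episode are the same only if their bytes match exactly, which a cryptographic hash such as SHA-256 checks; a tinted copy fails that test immediately. A looser identity condition, closer to how a viewer recognises the episode, only asks that the copies be close. In this sketch the "frames" are tiny hypothetical grids of grayscale values, the tint is modelled as a small brightness shift, and the tolerance of 10 in `looks_like_same_episode` is an arbitrary assumption, not a real video-matching method.

```python
import hashlib

def pixels(frames):
    """Flatten a list of frames (lists of 0-255 grayscale values) into one list."""
    return [p for frame in frames for p in frame]

def fingerprint(frames):
    """Strict identity: SHA-256 digest of the raw pixel bytes."""
    return hashlib.sha256(bytes(pixels(frames))).hexdigest()

def looks_like_same_episode(a, b, tolerance=10.0):
    """Loose identity: mean absolute pixel difference below an arbitrary tolerance."""
    pa, pb = pixels(a), pixels(b)
    if len(pa) != len(pb):
        return False
    mean_diff = sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)
    return mean_diff < tolerance

# Toy "episode": 10 frames of 16 pixels each (hypothetical data).
episode = [[(37 * i + 11 * j) % 256 for j in range(16)] for i in range(10)]
# The same episode from an old tape with a slight tint: every pixel a little brighter.
tinted_copy = [[min(255, p + 5) for p in frame] for frame in episode]
# A genuinely different recording (here simply the inverted image).
other_episode = [[255 - p for p in frame] for frame in episode]

print(fingerprint(episode) == fingerprint(tinted_copy))
print(looks_like_same_episode(episode, tinted_copy))
print(looks_like_same_episode(episode, other_episode))
```

The first check prints False and the second True: the bit-exact and the tolerant identity conditions disagree about whether the tinted copy is "the same episode", while both reject the different recording.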

A whole TV episode as a single Variable? What on earth are you talking about? There is just too much information.

Well, the sheer size of a Variable shouldn’t really concern us. We already saw xx that for any bunch of Variables, however many, we can always define a single Variable whose Levels are all the possible combinations of their Levels (their Cartesian product), and which therefore encodes the original bunch without loss of information.
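This lossless collapsing can be sketched as mixed-radix encoding: number each Variable's Levels from 0, and the combined Variable has as many Levels as the product of their sizes. The three example Variables and their sizes below are hypothetical. When every size is 2, the encoding is just reading a string of bits as one binary number, which is the sense in which a digitised report is a single Variable with very many Levels.

```python
from itertools import product
from math import prod

def encode(levels, sizes):
    """Collapse a tuple of Level indices into one Level of the combined Variable."""
    index = 0
    for level, size in zip(levels, sizes):
        if not 0 <= level < size:
            raise ValueError(f"Level {level} is out of range for a Variable with {size} Levels")
        index = index * size + level
    return index

def decode(index, sizes):
    """Recover the original Level indices from the combined Level, losing nothing."""
    levels = []
    for size in reversed(sizes):
        index, level = divmod(index, size)
        levels.append(level)
    return tuple(reversed(levels))

# Hypothetical Variables: funding increased (2 Levels), training quality
# (3 Levels), region (4 Levels).
sizes = (2, 3, 4)
combined_levels = prod(sizes)  # 24 Levels in the single combined Variable

# Every combination maps to a distinct combined Level and back again.
for combo in product(*(range(s) for s in sizes)):
    assert decode(encode(combo, sizes), sizes) == combo

print(combined_levels, encode((1, 2, 3), sizes))
print(encode((1, 0, 1, 1), (2, 2, 2, 2)))  # → 11, the bits 1011 read as a binary number
```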

Rich Variables (different from other rich Variables?)

Rich Variables can play a very important role in theories of change, but their possibility is barely mentioned. A rich Variable might be something like “project implementation”; this would be a very broad example, as it includes a Mechanism.

Attempts could be made to break down a rich Variable into many subsidiary Variables representing, for example, the number of hours of training delivered, the quality of the training delivered, the number of miles driven, and so on. Many real rich Variables could never actually be broken down into just a few components, but it is sometimes advisable, and interesting, to break out at least a few of them. In fact, in the above example, the “quality” of training is in turn another rich Variable, which could be broken down into many different aspects.

Formal manuals for log frame or theory of change construction really discourage us from using these kinds of Variables. On the other hand, we quite often see rich Variables in real-life theories of change, put there, so to speak, in spite of the manual.

There are good reasons for wanting to use rich Variables, but they are also powerful and complicated things which need careful consideration. For example, if our implementation Variable is the delivery of funds to an NGO to carry out the actual project, the NGO might go on to do many different things with this money in complex and varied ways. All of these activities and side-effects should in turn contribute to valued changes further down the line. It is legitimate to treat all of this as a single rich Variable, “the project”, in a high-level theory of change, even though this intermediate rich Variable has not been broken up, even superficially, into other Variables; perhaps it need not be.

One could even claim that many supposedly simple Variables can in fact be revealed as rich Variables if you want to.

Why collapsing and factorising might not be valid

Collapsing and factorising might not be valid because there may well be hidden Mechanisms which are formulated in terms of one of the constituent Variables rather than in terms of the composite Variable. So if the Mechanism is edited in an unforeseen way (see xx, open systems), an emerging Mechanism might be able to engage with one but not the other.


Rich? You just mean qualitative?

Yes, kind of. But the quantitative/qualitative distinction has so many different uses that I prefer to adopt the somewhat more precisely defined non-rich/rich distinction as Theorymaker native speakers use it.

Rich Variables? There’s no such thing! If you can’t list all the possible Levels of a Variable, you don’t have a Variable.

You can’t list all the possible Levels even of a “count” Variable either. Granted, it is usually easier to deal with Variables that have a narrow and clearly defined range of Levels.

You won’t find a setting for “qualitative” Variables in SPSS. But my goodness, how many real Variables we have in social science which are not primarily reducible to single or multiple numerical dimensions! When a quantitative social scientist is confronted with a rich Variable, you will see a look of panic briefly flit across her face, and then she says, “Oh, I am sure we can reduce this to its component numerical dimensions.”


Can we explicitly list all the completions?

Failing that, can we iteratively list all the completions?

Failing that, can we mechanically tell if any given sentence is a Level of the Variable?

Failing that, can we intersubjectively agree if any given sentence is a Level of the Variable?

Is there at least agreement about how long/detailed the Statements are allowed to be?

If there are multiple parts to the Statement, do they possess a structure and hierarchy?
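These questions run roughly from the most to the least tightly defined kind of Variable. The first four can be sketched in Python as an explicit list, a generator that lists Levels one by one (possibly forever, as with a count Variable), a predicate that mechanically decides membership, and a measure of how far several judges agree. Every rule, name and judge below is a hypothetical illustration, not a proposed definition of any real Variable.

```python
from itertools import count, islice

# 1. Explicit list: every Level can be written down in advance.
RATING_LEVELS = ["poor", "adequate", "good", "excellent"]

# 2. Iterative listing: no finite list, but a procedure that reaches every Level eventually.
def count_levels():
    """Levels of a count Variable: 0, 1, 2, ... without end."""
    yield from count(0)

# 3. Mechanical test: no listing at all, only a decision procedure for membership.
def is_report_level(text):
    """Hypothetical rule: a version of the report must be non-empty and mention funding."""
    return bool(text.strip()) and "funding" in text.lower()

# 4. Intersubjective agreement: several judges decide, and we measure how far they agree.
def agreement(text, judges):
    """Fraction of (hypothetical) judges who accept `text` as a Level of the Variable."""
    votes = [judge(text) for judge in judges]
    return sum(votes) / len(votes)

JUDGES = [
    is_report_level,
    lambda text: len(text.split()) >= 5,   # a judge who insists on some detail
    lambda text: "board" in text.lower(),  # a judge who wants the audience named
]

print(list(islice(count_levels(), 5)))
print(agreement("The Board should increase funding now.", JUDGES))
print(agreement("Increase funding.", JUDGES))
```

Only the first regime lets us write down every Level; the generator never finishes, the predicate never lists anything, and the last regime gives a degree of agreement rather than a yes or no.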

Iteratively defined Variables (or sets of Variables):

An example of a positive impact on a person

What happens when a Variable is so difficult to define that it needs its own possibly rich algorithm?