Evaluation: Appraisal of projects and programmes

We define Evaluations as Reports on the value of Theories of Change, i.e. Assessments of the valuable Difference made by a Project.

So Evaluation in this sense necessarily involves assessing to what extent an Intervention can actually add Value, i.e. maximise valued Variables.

So designing an evaluation process essentially involves designing a complex Mechanism which takes a Project as its input and outputs a Report on the quality of the Project: did it do something valuable?
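In primitive terms, such a Mechanism can be sketched as a function from a Project to a Report. A minimal illustration, where the class names and the judging threshold are illustrative assumptions rather than Theorymaker syntax:

```python
# Sketch of an evaluation as a Mechanism: a Project goes in, a
# valuative Report comes out. The names and the 50% threshold
# are invented for illustration only.

from dataclasses import dataclass

@dataclass
class Project:
    name: str
    outcomes_achieved: int
    outcomes_planned: int

@dataclass
class Report:
    subject: str
    judgement: str  # a Statement about value, not a mere description

def evaluate(project: Project) -> Report:
    """The evaluation Mechanism: Project in, valuative Report out."""
    ratio = project.outcomes_achieved / project.outcomes_planned
    judgement = ("did something valuable" if ratio >= 0.5
                 else "added little value")
    return Report(subject=project.name, judgement=judgement)

print(evaluate(Project("Water project", 4, 5)).judgement)
```

The point of the sketch is only the shape of the Mechanism; any real evaluation Rule would of course be far richer than a single ratio.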

We look at evaluations as performing different kinds of reporting, from those with more clearly defined outputs (“summative evaluation”, Scriven xx) to those providing narrative judgements (closer to Scriven’s “formative evaluation”).

Reporting and value

Evaluation isn’t about producing just any Statement; it is about producing Statements which are intrinsically about the worth or value of something.

Michael Scriven famously claimed that evaluation is a “meta-discipline” which involves finding the worth or value of anything, from fine wines to, presumably, playground jokes and football cup finals.

Essentially, that means generating a Statement on what we have called a V-scale.

This table shows Scriven’s and the Theorymaker view of what Evaluation is:

Subject of Report | Not valued (Descriptions)                          | Valued (Appraisals)
Anything          | Not evaluation according to Scriven or Theorymaker | Scriven’s evaluation
Projects          | Not evaluation according to Scriven or Theorymaker | Scriven’s evaluation; Theorymaker Evaluation

Note that even when the reported Variable is valued, the reporting Variable will not usually be valued, at least not necessarily in the same way by the same people.

Theorymaker native speakers agree with Scriven that Evaluations essentially involve reporting Value.

Reporting Variable      | Valued (Appraisals)
Numerical               | Scriven: summative evaluation?
Comparative (intensity) | Scriven: summative evaluation?
Fuzzy                   | Scriven: formative evaluation?

About the bottom right-hand cell: of course a Report about the Value or worth of something can be multi-dimensional, i.e. can consist of several valuing reporting Variables, not necessarily summarised by some Rule into an overall score. But what about a theatre review? Is a theatre review essentially a score on a few scales plus a non-valuative essay? Or is there a way of writing narrative which is essentially valuative without being completely reducible to scores on scales?

The evaluation Rule

A key task of the evaluation, and/or of the evaluation Terms of Reference (ToR), is to agree on a Theory (the Evaluation Theory, which is usually composite) which generates a Statement of the reporting Variable(s) (the Evaluation Statement, which may itself be arbitrarily composite).

The Evaluation Theory often forms a big part of the evaluation ToR and the evaluation inception report.

(The same applies to any kind of Report, not just evaluation Reports.)

The job of the evaluator is to follow the Rule set out in the Evaluation Theory (Wittgenstein xx).

This Rule-following may also be wicked in various ways, for example in the sense that part of the Rule itself (the Rule which defines the Evaluation Theory) may be changed iteratively as part of the continuing evaluation process, see section xx. In a real-life evaluation, this can happen in different ways, for example when identifying emergent Variables, unexpected results, etc.
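One way to picture this iterative Rule-change is a rule set that the evaluation process itself extends when emergent Variables appear in the data. A toy sketch, in which all names and the "emergent" marker are assumptions invented for illustration:

```python
# Toy sketch of 'wicked' Rule-following: the Rule defining the
# Evaluation Theory is revised mid-evaluation, here by adding
# sub-rules for emergent Variables discovered in the data.

def run_evaluation(data, rules):
    # First pass: apply the agreed Rule as written in the ToR.
    statement = {name: rule(data) for name, rule in rules.items()}
    # Emergent Variables found in the data change the Rule itself,
    # and the revised Rule then contributes to the Statement.
    for var in data.get("emergent", []):
        if var not in rules:
            rules[var] = lambda d, v=var: "significant change observed"
            statement[var] = rules[var](data)
    return statement

rules = {"relevance": lambda d: "hi" if d["needs_match"] else "lo"}
data = {"needs_match": True, "emergent": ["community_cohesion"]}
print(run_evaluation(data, rules))
```

The final Statement covers a Variable ("community_cohesion") that appeared in no one's original Rule, which is exactly what makes the Rule-following wicked rather than mechanical.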

As usual, we can flip our perspective between normative (seeing the Evaluation Theory as a Theory) and descriptive (seeing it as an actual composite Mechanism which includes the actual evaluation team, various partners, pieces of evidence, etc.). The actual Mechanism might well deviate from those principles, given fallible evaluators and so on. As in this case both the Mechanism and its Theory share the same Rule, we might find it more convenient to refer to the Evaluation Rule than to either the Evaluation Theory or the Evaluation Mechanism.

So in primitive terms we can see the whole evaluation as another Mechanism with the Evaluand as input and the evaluation Statement as output.

The reporting Statement

Some of the most familiar kinds of Evaluation Statement are ordered Variables with just a few Levels, for example:

The project management was ((poor < adequate < good < excellent))

… and we are also familiar with ratings expressed as percentages and with combinations of such Variables. For example, it is common for an evaluation ToR to require 1-4 ratings for each of several evaluation criteria, say “Relevance”, “Effectiveness”, etc. Sometimes the Levels may be expressed in a standard way across all criteria, sometimes not. Sometimes the ToR may specify that these sub-Statements are to be synthesised into a global Statement; sometimes the method for doing this is specified and sometimes not.

Global evaluation statement ((lo-hi)) !Rule: some kind of average

 Relevance ((lo-hi)) (!Rule: what is the Rule?)

  Evaluand ((unlimited)) 

 Efficiency ((lo-hi)) (!Rule: what is the Rule?)

  Evaluand ((unlimited)) 

 Effectiveness ((lo-hi)) (!Rule: what is the Rule?)

  Evaluand ((unlimited))

 Sustainability ((lo-hi)) (!Rule: what is the Rule?)

  Evaluand ((unlimited))
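Taking the diagram above at face value, and assuming the synthesising Rule really is "some kind of average", the structure could be sketched as follows. The numeric ratings, the 1-4 Level labels and the mean-then-round Rule are all illustrative assumptions; a real ToR specifies (or fails to specify) its own Rule:

```python
# Sketch of the diagram above: four criterion Variables, each an
# ordered Variable on a 1-4 scale, synthesised into a global
# Statement by an assumed Rule (a plain mean, rounded to a Level).
# The ratings themselves are invented examples.

from statistics import mean

LEVELS = {1: "poor", 2: "adequate", 3: "good", 4: "excellent"}

criterion_ratings = {
    "Relevance": 3,
    "Efficiency": 2,
    "Effectiveness": 4,
    "Sustainability": 3,
}

# The global Rule: "some kind of average", then round to a Level.
global_score = round(mean(criterion_ratings.values()))
print(f"Global evaluation statement: {LEVELS[global_score]}")
```

Even this trivial Rule hides judgement calls (equal weights, how to round, whether a mean of ordered Levels is meaningful at all), which is why leaving the Rule unspecified, as the diagram's "what is the Rule?" annotations suggest, is a real problem.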

On the other hand, we also recognise evaluations in which the evaluation Statement is not a limited Variable. A good example would be the Most Significant Changes (MSC) evaluation process (Dart and Davies 2003), which is evaluative in nature (? question to self: why?) but whose output is in essence the identification of a narrative around a particular change, attributed to the implementation, which is seen (normally by the participants themselves) as being of particular significance. So there is no way of delineating in advance all the possible outcomes of the process; yet it is not difficult to recognise an MSC report, and there would also be some consensus about whether or not a given report is a good MSC report.


Dart, J., and R. Davies. 2003. “A Dialogical, Story-Based Evaluation Tool: The Most Significant Change Technique.” American Journal of Evaluation 24 (2): 137–55.