Quality of evaluation evidence

A Statement should accurately report what it is intended to report and understood to be reporting.

Quantitative social science has many tools and procedures for numerically assessing the reliability (and to a lesser extent the validity) of what Theorymaker native speakers call reporting Mechanisms.

The most important defence for working evaluators against lack of reliability and validity, and in particular against bias and illusion, is however not to worry about statistics but to be systematically aware of threats, especially those to which we are particularly vulnerable or blind as individuals or as members of a certain class, profession, gender, race etc., and to know how to mitigate them.

I’d like to see more ink spent in evaluation journals and blogs on the donkey-work of increasing the quality of evidence in evaluation, i.e. increasing reliability and validity and in particular avoiding bias and illusion, and less ink spent on “my new patent evaluation method”.

We need not only to address these issues at the evaluation planning stage but also to be prepared to respond to emergent and unpredictable challenges to quality of evidence.

Reliability and Validity


If the readers of (or listeners to) the Statement understand it to be reporting the correct Variable, we can say the Statement and the reporting Mechanism are valid. So Validity is the extent to which a report Variable actually reports the Variable its audience understand it to be reporting, and not some other Variable.

So for example if a questionnaire deals only with tolerance towards a particular ethnic group, “Tolerance” would not be a valid title for it and we mustn’t report high scores on it as reflecting high levels of tolerance in general.

Internal Validity

Reliability and Validity cover how well Statements are reported, but they also cover how well Mechanisms are reported. This is Internal Validity. Actually both Reliability and Validity (in the Theorymaker sense) are involved.

External Validity

Construct Validity


If the reporting Mechanism is correctly set up, the Statement will be made if and only if the Fact it reports is the case, because the reporting Variable is causally controlled by the Variable it reports; so we can say the Statement is reliable.

So the Reliability of a reporting Mechanism is the extent to which it ensures that each Level of the report Variable - each Statement - actually appears when, and only when, the Fact it reports is indeed the case.

So for example, imagine that our questionnaire has been extended to deal with tolerance towards many different groups and in many different situations. It might still not be reliable if, for example, the participants found the questions difficult to understand, or were distracted by noises outside while they were filling it in, or if the data had been entered the wrong way round in an Excel sheet (e.g. “yes” had been coded as “no”), etc.
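The coding error in that example can be made concrete with a toy sketch. It assumes, unrealistically, that we could know each Fact independently of the report, which a real evaluation usually cannot. All the names here (`correct_coding`, `inverted_coding`, `agreement`) are hypothetical illustrations and not part of Theorymaker. The sketch simply counts how often “yes” is recorded if and only if the Fact is the case.

```python
from typing import Callable, Iterable

# Hypothetical types for illustration only.
Fact = bool        # whether the thing being reported is actually the case
Statement = str    # what ends up recorded: "yes" or "no"


def correct_coding(fact: Fact) -> Statement:
    """A reporting Mechanism that records "yes" exactly when the Fact holds."""
    return "yes" if fact else "no"


def inverted_coding(fact: Fact) -> Statement:
    """The same Mechanism with the data-entry error: "yes" coded as "no"."""
    return "no" if fact else "yes"


def agreement(mechanism: Callable[[Fact], Statement], facts: Iterable[Fact]) -> float:
    """Share of cases in which "yes" is recorded if and only if the Fact holds."""
    facts = list(facts)
    hits = sum((mechanism(f) == "yes") == f for f in facts)
    return hits / len(facts)


# Made-up Facts for five respondents (assumed known here, for the sketch only).
facts = [True, True, False, True, False]

print(agreement(correct_coding, facts))   # 1.0
print(agreement(inverted_coding, facts))  # 0.0
```

A Mechanism disturbed by noise or by confusing questions would land somewhere between these two extremes. The inverted case is unusual in that it is perfectly consistent and still entirely wrong.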

Reliability and Validity in these Theorymaker senses are something like reliability and validity in ordinary statistics, except that these are usually defined only for numerical Variables with quite special statistical assumptions xx.

Bias and illusion

For a somewhat more substantial treatment of bias, see xx