Outcome Harvesting is not harvesting but hunter-gathering

Outcome Harvesting proponents argue persuasively that what count as positive or negative outcomes of a programme cannot always be specified in advance. They are right. Outcome assessment approaches which deal only with pre-specified outcomes will usually miss important successes (and perhaps important failures too).

Dear Bob, here is my response to your responses.

Yes, I get that your claim is a strong one: that we can't know where the boundaries are. I just think that blanket statements like that don't help, in Outcome Harvesting or anywhere else, because what are we to do? It's really important to be able to delineate the areas where these kinds of problems are likely to be most acute, and I think you do that in your Wicked Systems, Bob.

I think it’s more useful to ask whether we can list, or agree on, the likely values of a variable in advance. If so, these problems will be less acute. If we have a question or a variable where we can’t specify all the possible values in advance but are able to recognise them after the fact (with more or less dispute, and this is one place where political and ethical issues appear), then we’ve got what I would call an “open” variable, and problems like boundaries and “who decides” become much more acute. Outcome Harvesting has a key open question at its core, namely: what are all the relevant outcomes? Correct me if I’m wrong, but the central reason to use Outcome Harvesting is that we suspect we can’t delineate all the outcomes in advance but will be able to recognise one when we see one. So by this argument, OH is, by design, particularly open to these boundary questions.

So, and I think we discussed this before on the Outcome Mapping discussion list, how can we tell whether some set of results is typical or somehow representative, as opposed to, in some sense, biased? There are two ways to address this:

  • When we do actually know in advance what the universe of (potential) outcomes looks like, it is not so difficult to decide whether a report captures a good, or representative, selection of the outcomes.
  • However, of course, this approach doesn’t work in classic Outcome Harvesting situations. So the other approach is to look at the tools we use (including ourselves, the researchers) rather than the results. Does something about our approach mean some kinds of outcomes are less likely to be identified? Am I aware of my typical blind spots, hobby horses and preferences (cognitive, political, personal, etc.)? Do I take them into account, or do I at least tell my audience about them? This is where hands-on researcher coaching and supervision are so important. I don’t know what OH says about this.
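To make the first bullet concrete, here is a minimal sketch (with entirely hypothetical numbers and categories) of what checking representativeness can look like when the universe of potential outcomes is known in advance, using a plain chi-square goodness-of-fit comparison:

```python
# Illustrative sketch, hypothetical data: when the universe of potential
# outcomes IS known in advance, we can at least test whether a harvest's
# mix of outcome categories mirrors the universe's proportions.

# Hypothetical known universe: 200 potential outcomes in three categories.
universe = {"behaviour": 100, "relationships": 60, "policy": 40}

# Hypothetical harvest of 50 outcomes.
harvest = {"behaviour": 35, "relationships": 10, "policy": 5}

total_u = sum(universe.values())
total_h = sum(harvest.values())

# Expected counts if the harvest mirrored the universe's proportions.
expected = {k: total_h * v / total_u for k, v in universe.items()}

# Chi-square goodness-of-fit statistic.
chi2 = sum((harvest[k] - expected[k]) ** 2 / expected[k] for k in universe)

# Critical value for chi-square with 2 degrees of freedom at alpha = 0.05.
CRITICAL_2DF_05 = 5.991

print(f"chi-square statistic: {chi2:.2f}")
print("looks representative" if chi2 < CRITICAL_2DF_05
      else "evidence of selection bias")
```

The point of the sketch is only that such a test is *possible* here: it depends entirely on knowing the universe's category counts in advance, which is exactly what classic Outcome Harvesting situations lack.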

Third, an open question is whether Outcome Harvesting procedures tend to help us answer the open question at the heart of Outcome Harvesting, or whether the technique itself, and the spreadsheet-based approach (which is so attractive), might actually tend to reduce our sensitivity and openness to new and emerging phenomena.

Here are two phrases to simplify my argument: Let’s say that a “before-the-fact” outcome is one in which, given a description (like, “increased weight of potatoes produced in this plot”), there is rarely any relevant dispute amongst experts about how to gather evidence for it. Whereas an “after-the-fact” outcome (like “lives are improved”) is one for which experts cannot provide anything close to an exhaustive list of potential evidence beforehand (or even a definitive recipe for finding it), but can recognise relevant evidence after it has been found: “look, this child’s life has improved because the programme inspired her to start visiting her elderly neighbours, and she loves it and they do too”.

(As it happens, Outcome Harvesting, like Outcome Mapping, restricts itself primarily to changes in human behaviour, relationships etc., which is fine. But the points I am making here apply to any kind of outcomes.)

So, the argument goes, “let’s treat an outcomes assessment like a harvest - let’s just go and see what we can find (following a sensible, partially closed but essentially open search strategy)! Experts (peers, stakeholders, etc.) will probably be able to agree, once we find something, if it is good news or not.”

We can provide ourselves with some kind of recipe for helping locate relevant evidence, even something as obvious as, “look first near the school where the programme was actually implemented” or “ask the children first”. We can also prescribe the kind of outcomes we are looking for - just changes in children, or changes in life outlook, or whatever. (Any such prescription is of course a limited sacrifice of the principle that we are keeping our eyes open for just anything.) But whatever recipe we can provide will always leave something open (otherwise, these would just be “before-the-fact” outcomes and this wouldn’t be Outcome Harvesting). So we might for example decide to ask the children in the school first, but when talking to them, we wouldn’t restrict ourselves to a closed list of questions. Any recipe we can provide will always be potentially unbounded: we will never be certain that we have really found all the relevant evidence (though we can provide ourselves with practical rules for when to stop). It is this unbounded element which is central to Outcome Harvesting.

All of this is well and good. But is it “harvesting”? Surely, harvesting is a procedure of gathering the (more or less expected) produce from a known area at a known time. Outcome Harvesting (or any other partially-structured, open-ended search for after-the-fact outcomes) can provide real evidence of desirable and undesirable outcomes, in particular the unexpected. Not including after-the-fact evidence is just negligent in most kinds of evaluation. But, as there is no way to delimit what other outcomes might still be out there, there is no way for us to say that what we have found is most of the relevant evidence. It might only be 30%, or 3%. We can’t provide any lower bound, whether with numbers or hand gestures or anything else. How very unlike harvesting, in which we don’t say, dropping exhausted from a day in the fields, “I’d better go back out again - there might be a harvest ten times the size of what we gathered today, still hidden somewhere in the same field, or the next field, or …”. This sounds to me more like hunter-gathering than harvesting.
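The “no lower bound” point can be illustrated with a toy simulation (entirely hypothetical numbers, not a model of any real OH exercise): an open-ended search with a fixed effort budget yields a similar count of outcomes whether the unknown universe holds a hundred outcomes or ten thousand, so the yield alone tells us nothing about coverage.

```python
import random

# Toy illustration: a fixed-effort, open-ended search cannot bound its own
# coverage. Each unit of effort probes one lead; a probe succeeds with
# probability p_find and turns up a random outcome from the universe.

random.seed(1)  # deterministic for reproducibility

def open_search(universe_size, effort=60, p_find=0.5):
    """Return the number of DISTINCT outcomes found by `effort` probes."""
    found = set()
    for _ in range(effort):
        if random.random() < p_find:
            found.add(random.randrange(universe_size))
    return len(found)

small = open_search(100)      # universe of 100 potential outcomes
large = open_search(10_000)   # universe of 10,000 potential outcomes

print("yields:", small, "vs", large)              # similar counts...
print("coverage:", small / 100, "vs", large / 10_000)  # ...wildly different coverage
```

The two searches report similar-sized harvests, yet one has covered a large fraction of its universe and the other a tiny sliver; nothing in the harvest itself distinguishes the cases, which is the sense in which no lower bound on coverage can be given.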

So when we read, in an OH report (Rassmann et al. 2013): “[…] the outcomes harvested during the evaluation were seen as representative for the accomplishments […] during the evaluation period”, how does anyone know they are representative, without a before-the-fact (or even after-the-fact) bound on the actual totality of outcomes? And if we had such knowledge of the actual universe of outcomes, why did we bother with an open-ended approach?

All evaluation approaches have their specific limitations, and it seems to me that this is the central and specific limitation of OH. It is mentioned, somewhat obliquely, in (Wilson-Grau and Britt 2012): “Only those outcomes that the informant is aware of are captured”; and it ought, in my humble opinion, to be addressed in the “Analyze and Interpret” section of that otherwise excellent handbook.

What response can we give to a member of the evaluation audience who exclaims, “What, you mean that this evidence, compelling though it is, might only be a drop in the ocean of positive, or even negative, possible findings?”


Footnote - I did just find a USAID report involving OH (MarketShare Associates 2016) which does mention that “OH is not a sufficiently rigorous tool to make conclusive statements about the specific extent to which its findings are representative of an entire population”. But the problem here is not lack of rigour. OH is perfectly rigorous on its own terms. And the problem is whether findings are typical of a population of outcomes, not (just) of people.


Rassmann, Kornelia, Richard Smith, John Mauremootoo, and Ricardo Wilson-Grau. 2013. “Retrospective ‘Outcome Harvesting’.” BetterEvaluation, April.

Wilson-Grau, Ricardo, and Heather Britt. 2012. “Outcome Harvesting.” Ford Foundation. Accessed 2012. http://www.managingforimpact.org/sites/default/files/resource/outome_harvesting_brief_final_2012-05-2-1.pdf.

MarketShare Associates. 2016. “Testing Tools for Assessing Systemic Change: Outcome Harvesting.” LEO Report #43, September. https://www.microlinks.org/sites/default/files/resource/files/Report_No._43_-_SC_Tool_Trial_Outcome_Harvesting_-_508_compliant2.pdf.