What’s wrong with how we do impact evaluation?

World Bank Blog. Submitted by Markus Goldstein on Thu, 02/11/2016.

Neil Shah, Paul Wang, Andrew Fraker and Daniel Gastfriend of IDinsight make a case for what they call decision focused impact evaluation. What, you may ask, is a decision focused impact evaluation? Shah and co. define it as one which “prioritises the implementer’s decision-making needs over potential contributions to global knowledge.” They contrast these to what they call knowledge focused evaluations, which are “those primarily designed to build global knowledge about development interventions and theory.”

Shah and co. acknowledge that the distinction across these two types is not binary and, to me, it’s not really helpful. What’s interesting in their paper is that they raise a number of critiques of how a lot of impact evaluations are done that I keep hearing. So I thought it might be time to revisit some of these issues.

First up: impact evaluations are asking questions that aren’t directly relevant to what policymakers want to learn. They have a nice quote from Martin Ravallion, where he says: “academic research draws its motivation from academic concerns that overlap imperfectly with the issues that matter to development practitioners.” For sure, some of this lack of overlap comes from questions policymakers want to evaluate but academics won’t usually touch (e.g. single replication studies of things that have been well published before). However, this gap is being somewhat (and hopefully increasingly) filled by non-academic evaluation work (and by the new spate of multi-country replications where academics play a lead role).

In terms of the questions academics are asking, I think there is a significant degree of overlap with the questions that are relevant to policymakers. Let’s take a deeper look at the kind of questions impact evaluations are asking. First, there is uptake (e.g. evaluations on getting businesses to formalize). Second, there is the what: the impact of the intervention on outcomes (e.g. do cash transfers increase the chance that kids go to school). Third, there is the why: what mechanisms lead the intervention to cause these changes in outcomes? This is the area where an economic theory-informed approach is more likely to be applied. I spend a fair bit of time with policymakers of various stripes and I see folks interested in answers to all of these questions – be it in testing their priors on a theory of change (the why), getting a sense of returns to investment (the what), or understanding why people are or are not coming for their program (uptake). Answering these in terms of actual programs that happen somewhere helps make this evidence particularly salient. (For some further complexity on thinking about the distinction among these three and what kind of programs they are working with, it is worth looking at a nice recent post by David).

This is not to say that every academic evaluation is of immediate or medium-term use for policy makers. Some are not, and this is particularly true for “why” type experiments that are carefully constructed and controlled in order to isolate one aspect of economic theory (David’s post has an example of this, and this is partly behind my quote in Shah and co.’s paper). But on the whole, there is a significant overlap for policymakers who are curious about what their programs are doing and how this might happen.

One thing that is critical to this point is that knowledge is portable. Which brings us to a second critique: external validity. Eliana Carranza and I blogged about this earlier as a starting point for thinking about this. The basic point here is that policy makers are generally not morons. They are acutely aware of the contexts in which they operate and they generally don’t copy a program verbatim. Instead, they usually take lessons about what worked and how it worked and adapt them to their situation. Every policymaker I’ve talked to has raised this issue when I bring some impact evaluation evidence to the table. Sometimes this leads to the conclusion that we’ve got to try something different and sometimes it leads to a conversation on how to adapt the lessons to her/his particular context. And bringing the folks who implemented the original (evaluated) program into the discussion can help facilitate this process.

Moving on to critiques 3 and 4: Impact evaluations take too long and are too expensive. In support of this, Shah and co. cite some statistics on the lag between endline data collection and publication. They also (in a footnote) note that some folks do share results before they are published. This is precisely the answer and why this is not really a problem. I find in my work that sharing the results with the program implementers, usually before I even start writing, gets me deeper and/or different insights into what they mean. And from talking with others, I’m not alone in this. Of course, there will be some lag between endline and results, which will be driven by how long it takes to enter and clean the data and how complex the analysis is. Another dimension on which Shah and co. raise the issue of taking too long is longer data collection periods to collect downstream indicators. David has a nice post on this as well which explains why there are very, very good reasons to wait a bit.

On the expense (and maybe length of analysis) side, Shah and co. raise the issue of survey length. There are a couple of responses to this. First, in order to get a complete cost-benefit, we need a fairly robust spectrum of outcome indicators. Missing outcomes with a high return are going to give us an underestimate of program impact (e.g. understanding the labor supply responses of health interventions). Second, when I work with program implementers to design an evaluation, they usually come up with a fairly long list of indicators they think the program might impact. Third (and somewhat related to the first), as I’ve argued in a previous post, focusing only on the outcomes within the sector for which the program was built (e.g. only looking at school enrollment impacts from conditional cash transfers) introduces a risk that we miss a program that is quite effective at addressing an “out-of-the-box” outcome, and it perpetuates the silos of government and donor programming.

Critiques 5 and 6 relate to getting evidence used effectively and efficiently. Here, I think there’s a fertile ferment of new ideas coming up, but there’s a way to go yet. And that’s a topic for a later post.