Ending poverty is not only one of the twin goals of the World Bank, but also one of the Sustainable Development Goals. To design and optimize projects for poverty reduction, we need to measure their impact on poverty. This is quite difficult because changes in the poverty rate might take some time, and it is usually hard to attribute the impact to a particular project, especially without conducting a randomized controlled trial (RCT). But even if we manage to overcome these challenges, we need to measure poverty before the start of the project – as a baseline and to understand whether the project adequately targets the poor – and at the end of the project to assess its impact. And that is also not easy.
Poverty measures are derived from indicators that are captured at the household level (which comes with its own problems). Usually, we administer household questionnaires asking how much of which items the household members consumed in the past week(s) or month(s). Such an interview usually takes several hours, which is often not feasible for assessing the impact of a project. Alternative methods have therefore been developed that use just a handful of questions to estimate poverty. The Poverty Probability Index (PPI), invented by Grameen and taken forward by Innovations for Poverty Action (IPA), is a good example. It uses the following scorecard in Kenya:
Based on a statistical model (calibrated with data from the last consumption survey), a score is assigned to each answer. Once the scores from all questions are summed up, a predefined table is used to look up the probability that a specific household is poor. While this sounds easy and is promised to be very cost-efficient, the scorecard suffers from several shortcomings that will critically bias the results, even after the scorecard is tested and optimized. Here, I will focus on two of them.
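To make the mechanics concrete, here is a minimal sketch of how such a scorecard can be evaluated. The questions, point values and look-up table below are invented for illustration and are not the actual Kenya PPI values; averaging household probabilities into a group poverty rate is also an assumption of this sketch.

```python
from bisect import bisect_right

# Hypothetical points per answer (NOT the real Kenya PPI scorecard).
SCORECARD = {
    "ate_ripe_bananas_last_7_days": {"yes": 8, "no": 0},
    "owns_towels": {"yes": 9, "no": 0},
    "household_size": {"1-3": 20, "4-6": 10, "7+": 0},
}

# Hypothetical look-up table: (minimum total score, probability of being poor).
LOOKUP = [(0, 0.80), (10, 0.55), (20, 0.30), (30, 0.10)]


def total_score(answers):
    """Sum the points assigned to each answer of one household."""
    return sum(SCORECARD[question][answer] for question, answer in answers.items())


def poverty_probability(score):
    """Look up the probability of being poor for a total score."""
    thresholds = [minimum for minimum, _ in LOOKUP]
    index = max(bisect_right(thresholds, score) - 1, 0)
    return LOOKUP[index][1]


def estimated_poverty_rate(households):
    """Average the household probabilities to get a group-level estimate."""
    probabilities = [poverty_probability(total_score(h)) for h in households]
    return sum(probabilities) / len(probabilities)


sample = [
    {"ate_ripe_bananas_last_7_days": "yes", "owns_towels": "yes", "household_size": "1-3"},
    {"ate_ripe_bananas_last_7_days": "no", "owns_towels": "no", "household_size": "7+"},
]
print(estimated_poverty_rate(sample))
```

Note that every household with the same answers gets the same probability, regardless of where or when it was interviewed. That is the source of the two problems discussed next.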
The first problem is that the answers to some of those questions usually depend more on the location of the household or the time of the interview than on whether the household is actually poor. Simply put: it is hard to come up with a short list of proxy poverty indicators that work well everywhere in a country at any time of the year. Let us look at question 6, which asks whether the household purchased, consumed or acquired any ripe bananas in the past 7 days. Consumption of this item varies considerably within Kenya. For example, almost 75 percent of households in Mombasa consume ripe bananas, compared to fewer than 40 percent in Narok, even though both counties have a comparable poverty rate of about 17 percent:
Based on the scoring system, households consuming ripe bananas will have a reduced probability of being identified as poor. Thus, the PPI-estimated poverty rate does not always match the real poverty rate (measured with a representative household consumption survey). The population in Narok consumes fewer ripe bananas than is expected for households at a poverty rate of 17 percent. Therefore, the PPI-estimated poverty rate of 27 percent overstates the real or official poverty rate by almost 10 percentage points. Another example is Busia, where the PPI estimate of 45 percent understates the official poverty rate of 60 percent by 15 percentage points. To put this into perspective, Kenya needed 10 years to reduce its poverty rate by 7 percentage points. Thus, such a strong bias will make it impossible to measure any realistic changes in poverty.
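The size of these biases relative to realistic changes in poverty can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted above; the variable and function names are my own.

```python
# Rounded figures quoted in the text, in percent.
COUNTIES = {
    "Narok": {"official": 17, "ppi": 27},
    "Busia": {"official": 60, "ppi": 45},
}

# Reduction in Kenya's poverty rate over 10 years, in percentage points.
TEN_YEAR_REDUCTION_PP = 7


def bias_pp(official, ppi):
    """Bias of the PPI estimate in percentage points (positive = overstated)."""
    return ppi - official


for county, rates in COUNTIES.items():
    bias = bias_pp(rates["official"], rates["ppi"])
    exceeds = abs(bias) > TEN_YEAR_REDUCTION_PP
    print(f"{county}: bias {bias:+d} pp, larger than 10-year change: {exceeds}")
```

In both counties the absolute bias is larger than the change Kenya achieved in a decade, so a real improvement of a few percentage points would be indistinguishable from measurement error.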
These are conservative examples because, in addition, the scorecard system increases or reduces the score of households by a certain amount according to their location or county of interview. Thus, the PPI could have – theoretically – been calibrated to perfectly reproduce poverty estimates at the county level (but for statistical reasons related to overfitting this is not necessarily desirable). While I use the county level as an example here, the problem with the system is more general: there will always be subpopulations for which the poverty estimate is going to be biased – with potentially critical consequences from excluding groups of people or locations from programs.
However, the scorecard (like other methods based on this principle) suffers from an even more fundamental problem, especially if used to evaluate the impact of a program on poverty. It is based on a so-called structural model, which relates responses to the questionnaire to poverty based on linkages observed at the time of the last household consumption survey, for example between ripe bananas and poverty. These relationships, however, can change over time, especially if the population is subject to shocks or benefits from projects.
Kenya exports a good proportion of its bananas. If South Africa, for example, suffers a drought reducing its own banana production, it can compensate for this shock by importing more bananas from Kenya. This will increase the price of bananas in Kenya. In turn, producers of bananas will make higher profits, while consumers might replace bananas in their diet with another fruit or product. This will reduce consumption of ripe bananas in Kenya. If – as an extreme example – no more bananas were to be consumed, the PPI would estimate an increase in poverty of up to 13 percentage points. On average, the estimated increase would be 8 percentage points, which is more than the 10-year progress in poverty observed between 2005/6 and 2015/16. However, it’s unlikely that such a positive economic shock would actually increase poverty at all.
Here is another example: Imagine we are using the PPI to estimate the impact of an agricultural project. Assume the project works well and improves agricultural productivity for staple foods that are more desirable than ripe bananas. Prices of these staple foods will therefore drop, and households will replace ripe bananas with the now affordable and more desirable staple foods. Thus, consumption of bananas will drop such that the PPI will estimate an increase in poverty as in the previous example, while in reality poverty is likely to have dropped due to the higher incomes of farmers and the decrease in prices.
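The mechanism behind both examples can be reproduced with a toy calculation. The two-question scorecard, its points and its probabilities are invented for illustration, and the four households are hypothetical; only the direction of the effect matters here, not its size.

```python
# Hypothetical two-question scorecard (NOT the real Kenya PPI values).
POINTS = {
    "bananas": {"yes": 10, "no": 0},
    "towels": {"yes": 10, "no": 0},
}

# Hypothetical probability of being poor for each possible total score.
PROB_POOR_BY_SCORE = {0: 0.7, 10: 0.4, 20: 0.2}


def estimated_rate(households):
    """Average scorecard-based probability of being poor."""
    probabilities = []
    for answers in households:
        score = sum(POINTS[question][answer] for question, answer in answers.items())
        probabilities.append(PROB_POOR_BY_SCORE[score])
    return sum(probabilities) / len(probabilities)


before = [
    {"bananas": "yes", "towels": "yes"},
    {"bananas": "yes", "towels": "no"},
    {"bananas": "no", "towels": "no"},
    {"bananas": "yes", "towels": "yes"},
]

# Extreme case: every household stops eating bananas; nothing else changes,
# so actual living standards are assumed to be unchanged (or even better).
after = [{**answers, "bananas": "no"} for answers in before]

change_pp = 100 * (estimated_rate(after) - estimated_rate(before))
print(f"Estimated poverty rate changes by {change_pp:+.1f} percentage points")
```

The estimated poverty rate rises even though no household has become poorer, because the fixed point values still encode the relationship between bananas and poverty that held at calibration time.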
This is not specific to the question on bananas. One can see similar biases for almost all other questions. A project distributing towels to households will lead to a PPI-estimated poverty rate that is lower by up to 14 percentage points (on average: 9 percentage points), even though the real reduction in poverty – if any – is surely less impressive. Similarly, a project that trains beneficiaries to become tailors while distributing sewing machines will increase the number of towels in an area, but probably with a much smaller reduction in poverty than estimated by the PPI.
Of course, the PPI scorecard can be adjusted for specific groups or locations, as well as for specific projects, to ensure that it is less susceptible to such spurious effects. Given the appeal of the general scorecard to non-technical users, however, many users will not know about the caveats and, thus, may not bother to adjust the card. Even if adapted, though, the system will always be subject to biases, which often cannot be foreseen, especially if a project is evaluated over a longer time. Instead of a scorecard-like methodology, new consumption measurement approaches (e.g. the rapid consumption methodology) should be considered, as they produce more accurate estimates. We can then trust the measured poverty impact of a project and correctly decide, for example, whether it should be scaled up.