MONITORING AND EVALUATION CERTIFICATE
Module 3: Purposes of Evaluations (Plausibility, Probability, Adequacy)
In general, evaluations are conducted for one of three purposes: to assess adequacy, plausibility, or probability. As discussed in the Constraints on Evaluations module, resources for evaluations are limited, and determining the purpose of an evaluation before it begins can save both time and money in program budgets.
Adequacy
An adequacy assessment is conducted when stakeholders and evaluators are interested only in whether the goals set by program developers were met. For example, if a child health program seeks to reduce child mortality by 25% in selected villages, an adequacy assessment will attempt to show whether or not this 25% target was reached. The benefit of an adequacy assessment is that it does not require a control group, which can significantly cut the budget of an evaluation as well as the time and effort involved. However, without randomization or a control group, changes in indicators cannot be appropriately attributed to program activities. Although limited in what may be inferred from them, adequacy assessments do show progress toward pre-determined targets, which may be sufficient to argue for increased or continued funding.(1)
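To make the logic concrete, the following sketch (in Python, with entirely hypothetical targets and figures, since the module supplies no data) shows how an adequacy assessment reduces to comparing observed indicator values against pre-set targets, with no control group involved.

```python
# Adequacy assessment sketch: compare observed indicator values against
# pre-determined program targets. All figures are hypothetical.

# Targets set by program developers (hypothetical values).
targets = {
    "child_mortality_reduction_pct": 25.0,  # e.g., reduce mortality by 25%
    "immunization_coverage_pct": 80.0,
}

# Values observed at the end of the program (hypothetical values).
observed = {
    "child_mortality_reduction_pct": 21.4,
    "immunization_coverage_pct": 83.5,
}

for indicator, target in targets.items():
    met = observed[indicator] >= target
    print(f"{indicator}: observed {observed[indicator]:.1f} "
          f"vs target {target:.1f} -> {'met' if met else 'not met'}")

# Note: even when a target is met, this design cannot attribute the
# change to program activities; it only documents progress toward
# the pre-determined goals.
```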
Case Study: Adequacy assessment of a community nutrition program in Senegal(2)
When planning the evaluation of a Community Nutrition Project (CNP) in Senegal, the evaluators' goal was to identify the failures and successes of program activities in order to strengthen the program's implementation. It was therefore decided that only an adequacy assessment was required: determining whether expected process and outcome indicators were met. Process indicators, developed by program implementers and stakeholders, included recruiting at least 90% of all underweight children and reaching 80% attendance among mothers. Monitoring tools had been developed so that, for each child, evaluators could determine attendance at monthly weigh-ins, attendance of the mother, and whether or not the food supplement was distributed. Although not all process targets were met, the evaluators found that nearly all indicators improved in the expected direction. The results of this adequacy assessment were used to improve the delivery of services and recruitment strategies during the next phases of the project.
Plausibility
A plausibility assessment, like an adequacy assessment, determines whether a program has attained its expected goals, but it also attempts to identify observed changes as likely effects of program activities rather than of external or confounding sources. This is made possible by the use of a control group.(3) Ideally, a plausibility assessment will also incorporate baseline and post-intervention data points to explicitly show improvements in target indicators.(4) The benefit of having a control group is the ability to link program activities to program outcomes: without measuring identical indicators in control villages, there is no way to link a decrease in child mortality, for example, to program activities, because other external and confounding factors may have contributed. In plausibility assessments, the control groups are not required to be truly randomized; they can be chosen from historical epidemiological databases or internally (i.e., individuals who were selected for an intervention but chose not to participate in program activities).(5) This, however, allows for selection bias and confounding that may not be accounted for in the analysis. The results of a plausibility assessment therefore establish only that there was a difference between the control and intervention groups that can most likely, but not conclusively, be attributed to the intervention.
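As a rough illustration of why the control group matters, the sketch below (using hypothetical mortality figures; the module provides none) computes a simple difference-in-differences: the change in an indicator in intervention villages minus the change in control villages, which nets out trends shared by both groups.

```python
# Plausibility assessment sketch: compare the change in an indicator
# between intervention and (non-randomized) control groups.
# All figures are hypothetical.

# Under-five mortality per 1,000 live births (hypothetical values).
intervention = {"baseline": 95.0, "endline": 70.0}
control = {"baseline": 93.0, "endline": 84.0}

change_intervention = intervention["endline"] - intervention["baseline"]
change_control = control["endline"] - control["baseline"]

# The difference-in-differences nets out changes (e.g., secular trends)
# that affected both groups, leaving the change plausibly attributable
# to the program.
did = change_intervention - change_control
print(f"Intervention change: {change_intervention:+.1f}")
print(f"Control change:      {change_control:+.1f}")
print(f"Difference-in-differences: {did:+.1f} per 1,000 live births")

# Because the control group was not randomized, selection bias may
# remain, so this supports a plausibility statement, not proof.
```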
Case Study: Plausibility assessment of a microfinance program in South Africa(6)
The Intervention with Microfinance for AIDS and Gender Equity (IMAGE) program was designed to reduce the prevalence of HIV and intimate partner violence in South Africa. The program set goals to improve certain economic household variables, such as the ability to pay back debt and to meet basic household needs. These indicators were measured, as in an adequacy assessment, to identify whether the pre-determined targets were met. However, the evaluation of IMAGE was better able to attribute improvements in variables such as the ability to pay back debt to the program activities because its study design included a control group. Evaluators found that certain economic variables improved in villages where women participated in the microfinance program, as compared with the control villages.
Probability
Like both plausibility and adequacy assessments, probability assessments look to determine the success of a program's activities and outcomes. Unlike the two previously discussed assessments, probability assessments use the most robust study design, the randomized controlled trial (RCT), to determine the true effect of the intervention on the indicators of interest.(7) Due to the complexity of determining causal relationships in public health, this type of assessment is the most expensive and time-consuming of the three, so it should be used only when evaluators and stakeholders have found it necessary for funding or research purposes. RCTs involve more data collection and place greater emphasis on compliance, which increases the costs of personnel to collect data and of incentives or vouchers to improve participation and compliance rates.(8) In some cases, due to the nature of the project or intervention, it may be impossible or unethical to conduct a true RCT.(9) This strategy may also not be feasible if the evaluation is not discussed in the initial phases of program planning, as a randomized control group is required and is difficult to construct mid-intervention. An RCT involves complete randomization when selecting the intervention and control groups in order to reduce the influence of bias on the data. For example, in a plausibility assessment, if the evaluators choose an internally-created control group, there is a risk of the control group being influenced by others in the household or village who are participating in the program (also called spillover effects). In a probability assessment, evaluators take this into consideration when choosing their control groups (ensuring physical and social distance between the groups, for example).
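The following minimal sketch (with hypothetical village names, not drawn from the module) shows the kind of complete randomization an RCT requires: units are assigned to intervention or control by chance alone, before the program begins.

```python
# Probability assessment sketch: randomly assign units (here, villages)
# to intervention and control arms before the program starts.
# Village names are hypothetical.
import random

villages = [f"village_{i:02d}" for i in range(1, 21)]  # 20 hypothetical units

rng = random.Random(42)  # fixed seed so the assignment is reproducible
shuffled = villages[:]
rng.shuffle(shuffled)

intervention_arm = sorted(shuffled[:10])
control_arm = sorted(shuffled[10:])

print("Intervention:", intervention_arm)
print("Control:     ", control_arm)

# Randomizing whole villages (cluster randomization) also helps limit
# spillover: keeping the arms physically and socially separate reduces
# the chance that control households are influenced by intervention
# activities.
```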
Case Study: Probability assessment of a breastfeeding promotion initiative in Belarus(10)
Although it is commonly believed that breastfeeding is beneficial for newborns and young children, especially in the prevention of infections, most of the research and programmatic evaluation that has been conducted produces plausibility, not probability, statements of these benefits. To minimize the bias and confounding commonly found in plausibility evaluations, there was a call for more substantial, unbiased evidence of the association between breastfeeding and decreased risk of infection in infants. Because it is unethical to prevent a woman from breastfeeding her child, a program was developed to promote exclusive and prolonged breastfeeding among women who had already shown interest in breastfeeding. The evaluators were therefore able to randomize women into an intervention group (women in chosen clinics who received the promotion initiative) and a control group (women in clinics chosen to continue efforts as usual, with no intervention) without ethical concerns surrounding the control group. Although randomization took place at the clinic level, the groups of women within the clinics were generally similar in terms of age, number of children, smoking habits, etc. This gave researchers greater confidence that differences in results between the groups were due to the intervention and not to underlying sociodemographic characteristics. A process monitoring system was put in place to ensure that the program was properly implemented. Data, including demographic information, breastfeeding schedules, and medical history, were collected at all well-child visits over the first 12 months of life. Comparisons were made based on the incidence of respiratory and gastrointestinal infections and the consistency of breastfeeding habits. By measuring each of these indicators in the randomly selected control and intervention clinics, evaluators could determine the association between breastfeeding and infant infection risk. Evaluators found a reduction in the risk of gastrointestinal infections in the intervention group (as much as 40% lower) but no reduction in the risk of respiratory infections.
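As a back-of-the-envelope illustration of the kind of comparison the trial made, the sketch below computes a risk ratio from hypothetical counts (the trial's actual counts are not reproduced in this module); a risk ratio of 0.6 corresponds to the roughly 40% lower risk reported for gastrointestinal infections.

```python
# Risk ratio sketch from hypothetical counts (not the trial's actual data):
# risk ratio = (cases/total in intervention) / (cases/total in control).

cases_intervention, n_intervention = 90, 1000   # hypothetical
cases_control, n_control = 150, 1000            # hypothetical

risk_intervention = cases_intervention / n_intervention  # 0.09
risk_control = cases_control / n_control                 # 0.15

risk_ratio = risk_intervention / risk_control            # 0.6
reduction_pct = (1 - risk_ratio) * 100                   # 40%

print(f"Risk (intervention): {risk_intervention:.3f}")
print(f"Risk (control):      {risk_control:.3f}")
print(f"Risk ratio: {risk_ratio:.2f} -> {reduction_pct:.0f}% lower risk")
```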
Summary
This module highlighted the considerations that evaluators must attend to prior to developing a research plan. Each of these categories of evaluation has benefits and drawbacks, and choosing the correct one depends on multiple factors: How much time and money is allotted for the evaluation? What kinds of decisions, whether policy or funding, will be made based on the evaluation results? Is it ethical or appropriate to have a control group or to randomize the intervention? What are the goals of the client, the stakeholders, and the evaluators? After these questions have been answered, the proper evaluation type can be chosen.
Footnotes
(1) Habicht, J.P., Victora, C.G., and Vaughan, J.P. (1999). Evaluation designs for adequacy, plausibility, and probability of public health programme performance and impact. International Journal of Epidemiology, 28:10-18.
(2) Gartner, A., Maire, B., Kameli, Y., Traissac, P., and Delpeuch, F. (2006). Process evaluation of the Senegal-community nutrition project: An adequacy assessment of a large scale urban project. Tropical Medicine and International Health, 11(6):955-966.
(3) Habicht, J.P., Victora, C.G., and Vaughan, J.P. (1999).
(4) Global HIV M&E Information. (2008). Glossary: Plausibility evaluation.
(5) Habicht, J.P., Victora, C.G., and Vaughan, J.P. (1999).
(6) Kim, J., Ferrari, G., Abramsky, T., Watts, C., Hargreaves, J., Morison, L., Phetla, G., Porter, J., and Pronyk, P. (2009). Assessing the incremental effects of combining economic and health interventions: The IMAGE study in South Africa. Bulletin of the World Health Organization, 87:824-832.
(7) Victora, C.G., Habicht, J.P., and Bryce, J. (2004). Evidence-based public health: Moving beyond randomized trials. American Journal of Public Health, 94(3):400-405.
(8) Victora, C.G., Habicht, J.P., and Bryce, J. (2004).
(9) Black, N. (1996). Why we need observational studies to evaluate the effectiveness of health care. BMJ, 312:1215-1218.
(10) Kramer, M.S., Fombonne, E., Igumnov, S., Vanilovich, I., Matush, L., Mironova, E., Bogdanovich, N., Tremblay, R.E., Chalmers, B., Zhang, X., and Platt, R.W., for the Promotion of Breastfeeding Intervention Trial (PROBIT) Study Group. (2008). Effects of prolonged and exclusive breastfeeding on child behavior and maternal adjustment: Evidence from a large, randomized trial. Pediatrics, 121(3):e435-e440.