How Do We Determine Programme Effectiveness?

Pat Bullen, Kane Meissel & Kelsey Deane

Over the last decade, there has been increasing discussion at government, policy and funder levels in favour of requiring educational, social and community programmes and interventions to provide evidence of their effectiveness. This evidence is then used to determine whether a programme is deemed worthy of new or continued funding. In some countries, such as the United States, government mandates already state that sound evidence of effectiveness is necessary for any programme that receives government funding. At face value, this seems appealing – after all, why would we want to fund something that is ineffective? We certainly do not advocate funding ineffective programmes, but we do believe that determining what counts as ‘effective’ is far from straightforward, and we argue that interpretations of ‘evidence’ and ‘effectiveness’ currently differ widely. In this Briefing Paper we discuss the potential implications of such a mandate for the social, community and educational sectors within Aotearoa New Zealand.

Many of the challenges we currently face within New Zealand mirror international trends and mandates. In the United States and United Kingdom, for example, evidence-based registries, in which programmes are publicly rated on their effectiveness based on the quality of the evidence, are commonplace. Such registries, although not yet common in New Zealand, are beginning to appear and gain support. For example, the Iterative Best Evidence Synthesis collates the reported ‘effect’ of educational programmes. While the synthesis is not intended to inform funding decisions, programmes deemed to be more effective are inevitably privileged when future funding is allocated. Such syntheses have a seductive appeal: a single rating of effect allows us to readily determine what works and what doesn’t. Unfortunately, reality is not so simple. These single ratings can be highly misleading, because programmes differ markedly in the nature and quality of their implementation and in the information that feeds into the effectiveness ratings themselves.

Each programme occurs in a different context, often utilising different tools to measure outcomes. A programme designed for a particular context may be highly effective within that context, yet have no effect at all when applied in a different one. Registries therefore risk labelling such a programme ineffective, unless the rating of effectiveness is based on evaluations undertaken across a representative selection of sites. However, accurate estimates of multi-site programme effectiveness require complex statistical modelling, and many organisations within the educational, social or community sectors do not have the resources to fund such expertise. With respect to the tools used to evaluate outcomes, programmes may refer to the same outcome yet assess it quite differently. For example, two programmes may report on ‘maths achievement’, with one drawing on a standardised test score on a continuous scale and the other using NCEA unit standards, which typically have only two possible outcomes (achieved and not achieved). While both programmes can calculate a ‘maths achievement effect’ to be used in a registry, the differences in the properties of the assessments preclude direct comparison, as the sketch below illustrates. Finally, registries often fail to capture differences in effectiveness that arise from differences in implementation, such as the degree to which a programme adheres to the original delivery model.
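To make the measurement problem concrete, consider a minimal illustrative calculation; the numbers below are invented for illustration only. A continuous standardised test yields a standardised mean difference (Cohen’s d), whereas a pass/fail NCEA outcome yields proportions, typically summarised as an odds ratio; converting the latter into a d-type effect relies on distributional assumptions (here, the common logistic-distribution conversion) that rarely hold exactly in practice.

% Continuous outcome (standardised test): standardised mean difference
d = \frac{\bar{X}_{T} - \bar{X}_{C}}{s_{pooled}} = \frac{68 - 62}{12} = 0.50

% Binary outcome (NCEA unit standard): odds ratio, then an approximate
% conversion to a d-type effect assuming an underlying logistic distribution
OR = \frac{0.70/0.30}{0.55/0.45} \approx 1.91, \qquad d_{OR} = \frac{\ln(OR)\sqrt{3}}{\pi} \approx 0.36

Because the second figure depends on an assumed underlying distribution, the two values of 0.50 and 0.36 cannot be treated as measuring the same thing, even though both could appear in a registry as a ‘maths achievement effect’.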

The notion of providing an evidence base has its roots in the medical model: under controlled conditions, interventions are statistically tested to ensure that the treatment is in fact responsible for the intended change. A recent article in the international literature comments on the challenges associated with translating this medical model to evidence-based practices or programmes. In particular, globalised notions of evidence can be problematic for programmes operating in countries with small, culturally diverse populations and limited resources. New Zealand’s small, vibrant and culturally diverse society fits this description, requiring a culturally responsive approach to evaluation. This includes consideration of multiple stakeholder views with regard to both the kinds of outcomes and the kinds of evidence that should be valued. At present, New Zealand has a unique opportunity to determine what should be considered evidence of programme effectiveness within Aotearoa, including the role (if any) and structure of programme registries. We believe that there needs to be a deeper understanding of the challenges associated with unduly privileging quantitative over qualitative evidence. The previously mentioned Iterative Best Evidence Synthesis is a good example of a registry that takes a more holistic view, in that both qualitative and quantitative evidence are collectively used as indicators of effectiveness. However, the authors’ provision of ranked tables of effectiveness risks promoting a reductionist view that encourages an uncritical focus on rankings.

Historically, New Zealand has tended to adopt reforms already implemented overseas, particularly in other English-speaking countries, and the current movement toward quantitatively determined programme effectiveness is no exception. Ironically, current discourses in the United States and the United Kingdom are beginning to question the utility of such a reductionist approach. As a country that has not yet ventured far down this path, we are in a position to learn from these experiences rather than adopting the approach blindly. We believe it is imperative that we engage in critical debate to determine our own notions of what constitutes quality evidence. We need to weigh the pros and cons of global notions of evidence, decide whether there should be a universal framework of quality evidence, and consider how culturally responsive any such framework could be.
