CERTIFICATE IN GLOBAL HEALTH RESEARCH
Course 7: Validity of Research
It is often assumed that a study’s results are valid or conclusive simply because the study is scientific, but unfortunately this is not the case. Researchers who conduct scientific studies are often motivated by external factors, such as the desire to get published, advance their careers, receive funding, or obtain particular results. As a consequence, a significant number of scientific studies are biased and unreliable. Dr. John Ioannidis, a meta-researcher and one of the world’s leading experts on the credibility of medical research, found that the studies most likely to get published are those with eye-catching findings, and that many of these findings collapse under the weight of contradictory data when they are studied rigorously. As Freedman describes the problem in The Atlantic: “Imagine, though, that five different research teams test an interesting theory that’s making the rounds, and four of the groups correctly prove the idea false, while the one less cautious group incorrectly ‘proves’ it true through some combination of error, fluke, and clever selection of data. Guess whose findings your doctor ends up reading about in the journal, and you end up hearing about on the evening news?”(1) Ioannidis also found that “researchers headed into their studies wanting certain results - and, lo and behold, they were getting them. We think of the scientific process as being objective, rigorous, and even ruthless in separating out what is true from what we merely wish to be true, but in fact it’s easy to manipulate results, even unintentionally or unconsciously.”(2)
Because external factors have such a large influence on researchers and their studies, Ioannidis has dedicated much of his career to exposing the unreliability of scientific studies. As Freedman summarizes his position, “much of what biomedical researchers conclude in published studies - conclusions that doctors keep in mind when they prescribe antibiotics or blood-pressure medication, or when they advise us to consume more fiber or less meat, or when they recommend surgery for heart disease or back pain - is misleading, exaggerated, and often flat-out wrong. He charges that as much as 90% of the published medical information that doctors rely on is flawed.”(3) Ioannidis went on to develop a mathematical model showing why many published research findings are likely to be false. The rates of wrongness his model predicted “roughly corresponded to the observed rates at which findings were later convincingly refuted: 80% of non-randomized studies (by far the most common type) turn out to be wrong, as do 25% of supposedly gold-standard randomized trials.”(4) To see how well even the most acclaimed findings held up, Ioannidis examined 49 of the most highly regarded research findings in medicine: those that had appeared in the journals most widely cited in research articles, and those that had themselves been most widely cited. Of these 49 articles, 45 claimed to have uncovered effective interventions. Thirty-four of those claims had since been retested, and 41% of the retested claims (14 of the 34) had been shown to be wrong. “If between a third and a half of the most acclaimed research in medicine was proving untrustworthy, the scope and impact of the problem were undeniable.”(5)
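Ioannidis set out his model in “Why Most Published Research Findings are False” (footnote 9). The minimal Python sketch below computes only the model’s core quantity, the positive predictive value (PPV): the probability that a statistically significant finding reflects a real relationship, given the pre-study odds R that a tested relationship is true, the significance threshold alpha, and the statistical power. It deliberately omits the additional bias terms of the full model, and the prior odds in the usage example are hypothetical values, so it is not intended to reproduce the 80% and 25% figures quoted above.

```python
def positive_predictive_value(prior_odds: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Probability that a statistically significant finding is true (no bias term).

    prior_odds: R, the ratio of true to false relationships among those tested.
    alpha: type I error rate, i.e. the false-positive rate of each test.
    power: 1 - beta, the chance of detecting a relationship that is real.
    """
    true_positives = power * prior_odds  # (1 - beta) * R
    false_positives = alpha              # alpha, on the same scale
    # PPV = (1 - beta) R / ((1 - beta) R + alpha)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical research fields that differ only in how plausible their hypotheses are.
    scenarios = [("1 true per 1 false", 1.0), ("1 true per 10 false", 0.1), ("1 true per 100 false", 0.01)]
    for label, r in scenarios:
        print(f"{label:>22}: PPV = {positive_predictive_value(r):.2f}")
```

Even before any bias is added, the sketch shows PPV falling as the tested hypotheses become less plausible, and lowering power (for example, `positive_predictive_value(0.1, power=0.2)`) pushes it lower still.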
Ensuring the Validity of Research
Because a large number of scientific studies are unreliable, it is important to be able to distinguish which studies are in fact conclusive and reliable. Reliable studies use random sampling or random assignment whenever possible, use appropriate sample sizes, avoid biases, and are conducted by researchers who are not influenced by their funding or by a desire to obtain particular results.
Randomization
Randomization in studies is critical to ensuring the validity of research. In clinical randomized trials, participants are generally assigned at random either to receive a treatment or to receive a placebo (or no treatment). Group assignments are determined by chance, using a computer or random number generator, before the trial begins, so that there is no systematic bias in either group. “The goal of randomization is to produce comparable groups in terms of participant characteristics, such as age or gender, and other key factors ... In this way, the two groups are as similar as possible at the start of the study. At the end of the study, if one group has a better outcome than the other, the investigators will be able to conclude with some confidence that one intervention is better than the other.”(6)
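The assignment step can be sketched in a few lines of Python. This is an illustrative shuffle-and-split allocation, assuming a hypothetical list of enrolled participant IDs; it guarantees equal (or nearly equal) group sizes. Real trials typically use more elaborate schemes, such as block or stratified randomization with concealed allocation, so treat this as a sketch rather than a trial-ready procedure.

```python
import random


def randomize(participant_ids, seed=None):
    """Shuffle-and-split 1:1 allocation.

    Returns (treatment, control) lists. With an odd number of participants,
    the control group receives the extra person.
    """
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)      # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    ids = [f"P{n:03d}" for n in range(1, 21)]  # 20 hypothetical enrolled participants
    treatment, control = randomize(ids, seed=2024)
    print("Treatment:", treatment)
    print("Control:  ", control)
```

Fixing the `seed` lets an independent party regenerate exactly the same allocation, which is one way to make the assignment auditable.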
Randomization is an important first step toward ensuring research validity, but randomized studies are not always unbiased or completely reliable. Ioannidis’s research showed that even though “ ‘randomized controlled trials,’ which compare how one group responds to a treatment against how an identical group fares without the treatment, had long been considered nearly unshakable evidence… they, too, ended up being wrong some of the time.”(7) Randomized trials are still subject to errors and biases: the wording of questions, the design of the study, the choice of which measurements to analyze, and the way results are presented can all influence a study’s validity.
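Selecting which measurements to analyze after seeing the data inflates the chance of a spurious "significant" result, and the same arithmetic applies to the five research teams in Freedman’s scenario at the start of this course. The sketch below assumes the tests are independent and each uses a significance threshold of 0.05; outcomes measured within one trial are usually correlated, so the numbers are illustrative rather than exact rates.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests of a
    false hypothesis comes out 'significant' purely by chance."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid it with probability (1 - alpha) ** num_tests.
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    # Five independent teams testing the same false idea.
    print(f"5 teams:     {chance_of_false_positive(5):.1%}")
    # One trial that measures 20 independent outcomes and reports whichever is significant.
    print(f"20 outcomes: {chance_of_false_positive(20):.1%}")
```

A single test carries a 5% risk of a false positive, but with five teams the chance that at least one "proves" a false idea is about 23%, and with 20 outcomes it is about 64%.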
Sample Size
In the medical setting, research is done to find a solution to a particular problem or to assess the impact of a treatment. Ideally, the entire target population would be studied in order to reach a conclusion. However, surveying or studying an entire population is usually impractical and very costly. Instead, researchers study a sample that is representative of the population, analyze the data, and extrapolate their conclusions to the population under study. It is important to have an appropriately sized sample to achieve reliable results and high statistical power (the ability to detect a difference between study groups when a difference truly exists). An insufficient sample size is more likely to produce false negatives and inconsistent results. On the other hand, an excessively large sample is not recommended: it can be unwieldy to manage, and it wastes time and money if the question can be answered accurately with a smaller sample.(8)
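A common normal-approximation formula for the number of participants needed in each group when comparing two means is n = 2 (z_a + z_b)² σ² / δ², where δ is the smallest difference worth detecting, σ is the standard deviation of the outcome, z_a is the standard normal quantile at 1 − α/2 for significance level α, and z_b is the quantile at the desired power 1 − β. The Python sketch below implements that formula; it is a generic textbook calculation, not the specific method in Nayak’s article, and it assumes two equal-sized groups and a continuous outcome.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Participants needed in EACH of two equal groups to detect a difference
    in means of `delta`, assuming a common standard deviation `sd`, a two-sided
    test at level `alpha`, and the normal approximation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    n = 2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)  # round up: a fractional participant cannot be recruited


if __name__ == "__main__":
    print(sample_size_per_group(delta=5, sd=10))             # detect a 5-point difference
    print(sample_size_per_group(delta=2.5, sd=10))           # detect a 2.5-point difference
    print(sample_size_per_group(delta=5, sd=10, power=0.9))  # same difference, 90% power
```

Halving the detectable difference from 5 to 2.5 points (with σ = 10) raises the requirement from 63 to 252 per group, a fourfold increase, which is why the smallest effect worth detecting drives both the cost of a study and the risk of running an underpowered one.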
Bias in Studies
Bias is defined as “the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced.”(9) “Bias is a form of systematic error, and there are innumerable causes. The causes of bias can be related to the manner in which study subjects are chosen, the method in which study variables are collected or measured, the attitudes or preferences of an investigator, and the lack of control of confounding variables… in epidemiologic terms bias can lead to incorrect estimates of association, or, more simply, the observed study results will tend to be in error and different from the true results.”(10) Some bias arises unintentionally from flaws in how a study is designed or carried out; bias also arises when researchers select subjects purposefully or choose to analyze only the data most likely to produce the results they desire. Although bias in research can never be completely eliminated, it can be greatly reduced by carefully considering the factors that could influence results during both the design and analysis phases of a study.
The most common types of bias in research studies are selection bias, measurement bias, and intervention bias.(11) Selection bias occurs when certain groups of people are purposely omitted from a sample, or when samples are selected for convenience. Selection bias may also occur when a study compares treatment and control groups that are inherently different. If selection bias is present in a study, it is likely to influence the outcome and conclusions of the study.(12) Measurement bias involves errors that occur in collecting relevant data. It can result from leading questions, which unduly favor one response over another, or from social desirability: because most people like to present themselves in a favorable light, some respondents will not answer honestly.(13) Intervention bias occurs when there are differences in how a treatment or intervention is carried out in the two groups, or in how subjects were exposed to the factor of interest.
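Selection bias from convenience sampling can be made concrete with a small simulation. Everything in it is hypothetical: a population of 10,000 people whose health score declines with distance from a clinic, and a convenience sample made up of the 500 people who live closest to the clinic, compared with a random sample of the same size. Only the Python standard library is used.

```python
import random
import statistics


def simulate_selection_bias(pop_size=10_000, sample_size=500, seed=7):
    """Return (population mean, random-sample mean, convenience-sample mean)
    for a hypothetical population in which health worsens with distance from a clinic."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        distance_km = rng.uniform(0, 50)  # hypothetical distance to the nearest clinic
        # Hypothetical health score: falls 0.5 points per km, plus individual variation.
        score = 70 - 0.5 * distance_km + rng.gauss(0, 5)
        population.append((distance_km, score))

    true_mean = statistics.fmean(score for _, score in population)

    # Random sample: every person has the same chance of being chosen.
    random_sample = rng.sample(population, sample_size)
    random_mean = statistics.fmean(score for _, score in random_sample)

    # Convenience sample: only the people easiest to reach (closest to the clinic).
    convenience_sample = sorted(population)[:sample_size]
    convenience_mean = statistics.fmean(score for _, score in convenience_sample)

    return true_mean, random_mean, convenience_mean


if __name__ == "__main__":
    true_mean, random_mean, convenience_mean = simulate_selection_bias()
    print(f"Population mean:         {true_mean:.1f}")
    print(f"Random sample mean:      {random_mean:.1f}")
    print(f"Convenience sample mean: {convenience_mean:.1f}")
```

In this setup the random sample lands close to the population mean, while the convenience sample overstates it by more than 10 points, because the people easiest to reach are systematically healthier. The error comes from how people were selected rather than from how many were selected.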
Assessing Behavioral Changes: The Importance of Having a Baseline for Comparison
When assessing behavioral changes, it is essential to have both baseline data and a control or comparison group. Evaluators need to determine whether a program actually had an impact, or whether the observed changes would have occurred even without the program. “Evaluating the effects of a teacher training program on students’ test scores could be done, for example, simply by comparing test scores before and after the program. Such an evaluation could yield useful information to program implementers. But it could not be considered a rigorous evaluation of the effects of the program if there are good reasons to believe that scores might have changed even without the program.”(14) For this reason, a rigorous evaluation needs a control or comparison group, not only a baseline measurement taken before the program.
For example, many programs and organizations have emerged in recent years to make cell phones and mobile technology available in rural areas. An evaluation might report that a program increased cell phone ownership in a village over a 3-year period. However, if the evaluation only compares the village before and after the intervention, the program’s impact cannot be determined conclusively. As Clemens and Demombynes observe of such before-and-after comparisons, “This has the advantage of simplicity but the major disadvantage of leaving unknown what might have happened in the villages if the project had not occurred.”(15) The program’s true effect on cell phone use may therefore be smaller than a before-and-after evaluation suggests. A credible comparison group is needed to determine the program’s actual impact. For example, one could identify other rural areas in the same country whose cell phone usage rates were comparable to those of the intervention site before the program began. After the same period (e.g., 2007-2010), what are the cell phone usage rates in those other rural areas, and how do they compare with the rates at the intervention site? This type of comparison group is essential when evaluating behavioral changes.
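The comparison described above is commonly formalized as a difference-in-differences estimate: the change at the intervention site minus the change in the comparison areas over the same period. The Python sketch below uses invented ownership rates purely for illustration, and it rests on the assumption that, without the program, the intervention site would have followed the same trend as the comparison areas.

```python
def difference_in_differences(treated_before, treated_after, comparison_before, comparison_after):
    """Change at the intervention site minus change in the comparison areas.

    The comparison areas' change stands in for what would have happened at the
    intervention site without the program (assumes both share the same trend).
    """
    treated_change = treated_after - treated_before
    comparison_change = comparison_after - comparison_before
    return treated_change - comparison_change


if __name__ == "__main__":
    # Hypothetical cell phone ownership rates (% of households), 2007 vs. 2010.
    naive_change = 60 - 20
    did = difference_in_differences(20, 60, 21, 48)
    print(f"Naive before-after change: {naive_change} percentage points")
    print(f"Difference-in-differences: {did} percentage points")
```

With these hypothetical figures, a before-and-after evaluation would credit the program with a 40-point increase, whereas the difference-in-differences estimate attributes only 13 points to it; the remaining 27 points reflect change that also occurred where there was no program.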
Footnotes
(1) Freedman, D. “Lies, Damned Lies, and Medical Science.” The Atlantic. (Nov. 2010). https://www.theatlantic.com/magazine/archive/2010/11/lies-damned-lies-and-medical-science/308269/.
(2) Ibid.
(3) Ibid.
(4) Ibid.
(5) Ibid.
(6) “Understanding Research - Preventing Bias.”
(7) Freedman, D. “Lies, Damned Lies, and Medical Science.” The Atlantic. (Nov. 2010). https://www.theatlantic.com/magazine/archive/2010/11/lies-damned-lies-and-medical-science/308269/.
(8) Nayak, B. “Understanding the relevance of sample size calculation.” Indian J Ophthalmol. 58. (2010): 469-470.
(9) Ioannidis, J. “Why Most Published Research Findings Are False.” PLoS Medicine. 2.8 (2005).
(10) Sica, G. “Bias in Research Studies.” Radiology. 238. (2006): 780-789.
(11) “Major Sources of Bias in Research Studies.”
(12) “Research Bias.”
(13) “Statistics Tutorial: Bias in Survey Sampling.”
(14) Clemens, M. A., and Demombynes, G. “When Does Rigorous Impact Evaluation Make a Difference? The Case of the Millennium Villages.” The World Bank. (2010).
(15) Ibid.