Statistical Significance vs Meaningfulness
Are Statistically Significant Results also Meaningful?
This is the first of a two-part post discussing the controversial decision by the editors of the scientific journal Basic and Applied Social Psychology to ban the use of p-values in its publications.
I was recently reminded of a debate between colleagues a few years ago about the meaningfulness of scientific results. The editors’ decision to ban p-values in a scientific journal has sparked a debate among scientists regarding the usefulness of p-values in scientific research. In most fields, using p-values to determine whether results are statistically significant is commonplace. But many argue that just because results are statistically significant doesn’t mean the findings are meaningful. If that’s the case, it leads to a question: How do you know that statistically significant results are actually meaningful to the community?
It is important to recognize that statistical significance only indicates that an observed difference or relationship is unlikely to be due to chance alone. It does not give us any indicator of how useful or meaningful that difference is. We are used to seeing studies that have meaningful, statistically significant results. For example, correlating study time with exam scores could yield a statistically significant relationship, and the result is also meaningful. It makes sense with these results to conclude that students who spend time studying for an exam are more likely to do better than those who study very little, because the process of reviewing can better prepare students to take an exam by reminding them of relevant information and orienting them to focus on the topic.
The problem (and source of much debate) comes when statistical significance is overused and conclusions are drawn without reason or meaning. For example, you could perform a study that correlates foot size with exam scores and find statistically significant results (say, at the 95% confidence level) which indicate that people with larger feet score 10% better on an introduction to atmospheric science exam. However, despite the high confidence of the results, this is not useful information because nothing can be said or done with it. Why do people with bigger feet do better on this exam? Does it have to do with body type? Is brain function connected to foot size? Do people with big feet have reason to study more than those with small feet?
This problem is something to be cautious of when dealing with new scientific information. We come across claims like this a lot in the news, especially related to health and fitness. It only takes one study that reports that coffee drinkers are statistically more likely to live a long life for people to start drinking more coffee every day. However, without consideration of why those results may (or may not) be true, the results don’t carry any meaning.
It is also important to consider whether results can be meaningful even though they aren’t statistically significant. Assume, for example, that one question on the introduction to atmospheric science exam asks students to identify their desired career goals in the field. The responses are then compared between two exams across several years of students, one given before students were exposed to career counseling and one given after they participated in career counseling. The results show no statistically significant increase in any specific field of atmospheric science after students took career counseling, so the researchers will likely not report any findings from this study.
However, the fact that this information did not yield statistically significant results does not mean it is not useful and should be thrown out. For example, perhaps the dataset looks something like this:
Career Aspiration | # students interested (before career counseling) | # students interested (after career counseling)
Teacher | 150 | 155
Researcher | 75 | 85
Private Industry | 200 | 195
Television | 200 | 195
The values in each group did not change significantly. However, there was some change, because a minimum of 20 students (possibly even more!) changed their career aspiration as a result of career counseling. It is reasonable to say that although career counseling did not convince students to consider one particular field more than others, some students changed their minds given the opportunity to consider other options. This is very meaningful information because it indicates that the career counseling is having some sort of effect on the students’ career aspirations, which can be very important for their futures. With these results, future studies can be set up to further evaluate the usefulness of career counseling.
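To make the "not statistically significant" part of this example concrete, here is a minimal sketch that runs a chi-square test of independence on the counts in the table above. The choice of test is an assumption made for illustration, as is treating the two exams as independent samples; the post does not say how the hypothetical researchers would analyze the data. The critical value 7.815 is the standard chi-square cutoff for 3 degrees of freedom at the 0.05 level.

```python
# Counts from the table above: (before counseling, after counseling).
counts = {
    "Teacher": (150, 155),
    "Researcher": (75, 85),
    "Private Industry": (200, 195),
    "Television": (200, 195),
}


def chi_square_statistic(table):
    """Pearson chi-square statistic for a rows-by-columns table of counts."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            # Count expected if career choice were unrelated to counseling.
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Degrees of freedom = (4 rows - 1) * (2 columns - 1) = 3.
# Standard chi-square critical value for 3 degrees of freedom, alpha = 0.05.
CRITICAL_VALUE_DF3 = 7.815

chi2 = chi_square_statistic(counts)
is_significant = chi2 > CRITICAL_VALUE_DF3
print(f"chi-square = {chi2:.2f}, significant at 0.05: {is_significant}")
```

Under these assumptions the statistic comes out below 1, far under the 7.815 cutoff, so the test agrees that the shift is not statistically significant. That says nothing about whether the shift matters to the students who changed their minds.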
The point of this post is to begin to think about the usefulness of p-values and statistical significance with a critical eye. While they can be useful tools for identifying significant changes in magnitude, the results need to be considered for their meaning as well; otherwise we risk jumping to conclusions. Furthermore, it is important not to put too much stock in p-values to determine meaningful results. Throwing data out because it is not statistically significant may cause us to overlook meaningful findings that happen to have small changes in magnitude. In fact, the tendency to jump to conclusions using p-values in biomedical research has become so problematic that one scientific journal has banned all use of p-values in its publications. We will continue this discussion next week, in light of this journal’s controversial decision.
-Morgan