Social science research, the kind we would use to study training effectiveness, for example, is in big trouble. Psychology in particular, whose methods are widely adopted in education research, is undergoing a severe replication crisis. That is to say, many published research papers have results that cannot be replicated, methodologies that are full of holes, and conclusions that are not justified.
There are many reasons for this, far too many to outline here, but much of the fault lies at the feet of academic journals that do not publish according to scientific principles. Researchers are incentivized to publish (hence the saying "publish or perish"), but if your research does not show "statistically significant" results, good luck getting it published.
The problem is that an academic journal should not publish research according to how "newsworthy" it is. It should publish great research. Great research sometimes fails to find positive results, but that does nothing to diminish the contribution of the study. Quality replication studies are the backbone of good science, but in the social sciences there is little incentive to undertake them, since journals are unlikely to publish them. This is why we have so many social science papers that are one-offs: they are published, the results are accepted, and the topic is never touched again. Likewise, "statistical significance" says nothing about the practical significance of a study; it is a widely misused term that social science researchers can abuse, some of whom at times seem almost statistically illiterate.
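To see why "statistically significant" is not the same as "significant", here is a minimal sketch (an illustration of the general point, not an analysis from any study mentioned here). With a large enough sample, an effect far too small to matter in practice still produces a tiny p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two groups whose true difference is only 0.05 standard deviations
# (practically negligible), but measured on a very large sample.
n = 100_000
control = rng.normal(loc=0.00, scale=1.0, size=n)
treated = rng.normal(loc=0.05, scale=1.0, size=n)

t_stat, p_value = stats.ttest_ind(treated, control)

# Cohen's d: the effect size in standard-deviation units.
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
cohens_d = (treated.mean() - control.mean()) / pooled_sd

print(f"p-value:   {p_value:.2e}")   # far below 0.05: "significant"
print(f"Cohen's d: {cohens_d:.3f}")  # yet the effect is tiny
```

A journal filtering on p < 0.05 would happily publish this result, even though the effect is too small to be of any practical use, which is exactly why the two notions of significance must not be conflated.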
Happily, this is changing dramatically. Journals are revisiting their publishing standards and old studies are now ripe for replication and refutation. Social science can only benefit from this and it will show up many misconceptions that we’ve gained over the years from faulty research. One (of many) problems is that the things social scientists are trying to measure are usually abstract and influenced by a large number of other factors that are almost impossible to control.
Researchers have to “operationalize” the things they are trying to measure by finding some measurable things that are supposed to stand in for the abstract construct. If we get this part wrong it means what we think we’re measuring and what we are actually measuring are two different things and we end up making faulty conclusions.
This has many implications for training evaluation. We measure a lot of things in many different ways with the idea that we want to make sure our training works. We want to take the information we get from our evaluations and improve our training in some way. Some of these measurements are reasonable. They have, at the very least, something we refer to as “face validity”, which means that on the face of it the thing we want to measure and the method we use to do it both seem reasonable, which, most of the time, is true.
Others turn out to be misleading, not measuring the thing we thought we were measuring. Something similar happened to psychology in the heyday of attitude research. The working assumption was that people's attitudes would predict their behaviour. It turned out that attitude alone does not predict what people will actually do, rendering much attitude research pretty much useless.
So it should not have come as a major surprise when a new study by Uttl, White & Gonzalez found no correlation between students' learning and their evaluations of their teachers. The received wisdom for almost 30 years has been that teachers whom students say are better also teach better, so we expect that when a teacher gets really good student evaluations, it will also mean great outcomes on more direct measurements of learning.
After carefully poring over the many studies that have claimed this link exists, the research team came up with nothing: once you control for poor research design, all of the claimed correlation between teacher evaluations and student learning evaporates.
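How can decades of studies report a correlation that is not really there? One well-known mechanism is the combination of small samples and selective publication. The sketch below (a hypothetical simulation, not data from the study) draws many small "studies" in which the true correlation between evaluations and learning is exactly zero, then keeps only the ones with an eye-catching correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate many small studies where the TRUE correlation between
# teacher evaluations and student learning is exactly zero.
n_studies, n_per_study = 1000, 20
observed_r = []
for _ in range(n_studies):
    evals = rng.normal(size=n_per_study)
    learning = rng.normal(size=n_per_study)  # independent of evals
    observed_r.append(np.corrcoef(evals, learning)[0, 1])

observed_r = np.array(observed_r)

# "Publication filter": only studies with |r| > 0.3 look newsworthy.
published = observed_r[np.abs(observed_r) > 0.3]

print(f"true correlation:          0.0")
print(f"mean r across all studies: {observed_r.mean():+.3f}")
print(f"share 'published':         {len(published) / n_studies:.1%}")
print(f"mean |r| among published:  {np.abs(published).mean():.3f}")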
This is actually a pretty big deal given that student evaluations are a key part of things like performance bonuses for teachers, but for those of us in professional training the implications go beyond that.
To understand why student evaluations of teachers are actually poor sources of information, we have to understand that the skill of a given teacher is only one factor that influences how well a given student will learn. On top of that, there are a number of additional factors that influence a student's perception of a given teacher, many of which have nothing to do with actual learning or teaching effectiveness.
Not all students are made equal: people differ in intelligence, prior knowledge, cultural values and motivation, and these are just some of the factors that make different students perform differently under the same teacher. Even more worrying, student biases, even ones they are unaware of, can influence evaluations. One study found clear indications of gender bias: students evaluated a teacher more highly simply if they believed the teacher was male. Depending on the students' context there may also be racial bias and any number of highly personal biases. None of these reflect the competence of the teacher, and now we know they also don't predict how well students will learn. So what should we do about it?
Should we do away with instructor evaluations completely? While many are suggesting this, there is no need to throw out a tool that may still be of value to us. Although measuring student perceptions of teacher performance is not an accurate reflection of actual performance, it is an accurate reflection of student perceptions.
This may sound obvious, but it is an important distinction to understand. Students (like all human beings) understand the world through their own perceptual filters. While we cannot devote time and effort to every highly specific individual perception, trainer evaluations that show consistent themes are a good guide to how teaching practice might be changed to make students more positive. After all, a negative mindset can only hinder learning. We can work to change perceptions without making the mistake of thinking that students' perceptions of a trainer or teacher truly reflect their competence as educators.
So training evaluations are more about finding room to improve a student's or trainee's subjective experience than about evaluating the trainer. Just remember that our main goal is to effectively transfer skills and knowledge, so while it would be nice to spend time and energy until every trainee's perceptions are positive, we have to balance that against measurements of actual effectiveness in the workplace. Measurements that directly assess student learning should therefore take priority over superficial first-level perceptions.
What about evaluating trainers for real? It is probably better to use peer evaluation when assessing trainers. Having two or three experienced trainers observe and rate a colleague is far more likely to give you an accurate picture of trainer effectiveness and performance than asking students for their opinions.
In the end we can still make good use of data that show us how students perceive their trainers, but we have to be smart about how we use that information and clearly understand what that information does and does not tell us.
The most important thing is to never rely on a single measure, but to have several that can be evaluated in the context of each other. For example, if the student evaluations AND the peer reviews both indicate that a trainer is not very effective, you may be onto something; but if only the student reviews are negative, remember what that data is worth when it comes to true effectiveness. Likewise, if a trainer gets poor student evaluations but the students that trainer taught actually learn well and apply that knowledge effectively, something else may be going on. It could be a sign of bias on the part of the trainees, or it could be that the trainer's methods are unpopular but effective. The two are not mutually exclusive.
The final and most important lesson we should take from this is to never become too complacent. We should never believe in a number just because it is a number, but question the foundation of that number. We have to constantly ask ourselves whether what we do makes sense and how we can do it better in the future. It should never have taken 30 years to discover that one of our prime evaluation tools has been lying to us all along.
If you would like to increase the effectiveness of your safety training, download our free eBook below.