My institution just created a data science major. But it doesn’t care about using data in honest and robust ways any more than other institutions.
It’s gotten to the point that it’s intellectually embarrassing and ethically troubling that we are still using student evaluations of teaching (SET) in their current form for assessing instructor job performance. It is laughable that we do so with numbers computed to two decimal places. It is scandalous that we ignore the documented biases (most especially gender-based). But we do.
Why isn’t this an active conversation between faculty and administrators? I certainly find teaching evaluations helpful – trying to understand why I got a 3.91 on course organization but a 4.32 on inspiring interest is a useful meditation on my teaching practice. I have to remind myself that the numbers themselves do not mean much.
Telling me where my numbers stand vis-à-vis my colleagues or the college as a whole FEELS useful and informative, but is it? I THINK I must be doing a better job than a colleague whose scores are in the 2.0–3.0 range. But doing a better job at what? If you think hard about it, all you can probably take to the bank is that I am better at getting more people to say “Excellent” in response to a particular question. The connection between THAT student behavior and the quality of my work is a loose one.
Maybe I am on solid ground when I compare my course organization score to my inspires interest score. MAYBE I am on solid ground when I compare my course organization score in one class to the same score in another class the same semester, or in the same class in another year. I might, for example, think about changes I could make in how I organize a course and then see if that score moves next semester.
But getting seduced by the second decimal place is ludicrous and mad. Even fetishizing the first decimal place is folly. For that matter, even treating this as an average to begin with is bogus.
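To make the point concrete, here is a minimal sketch with made-up ratings (the data and class sizes are hypothetical, not from any real evaluation): two classes can post the identical average while telling completely different stories, and with a typical class size the second decimal place is swamped by sampling noise anyway.

```python
# Why averaging Likert-scale ratings misleads: a sketch on invented data.
from statistics import stdev
from math import sqrt

# Class A: everyone clusters at "Good" (4 on a 1-5 scale).
class_a = [4] * 30
# Class B: polarized -- many "Excellent", a chunk of "Fair"/"Poor".
class_b = [5] * 18 + [3] * 6 + [2] * 6

mean_a = sum(class_a) / len(class_a)
mean_b = sum(class_b) / len(class_b)
print(mean_a, mean_b)  # identical means, very different classrooms

# With n = 30 students and a typical spread, the standard error of
# the mean is on the order of 0.2 -- so a reported 3.91 vs 4.32 is
# barely distinguishable, and the second decimal is pure noise.
se_b = stdev(class_b) / sqrt(len(class_b))
print(round(se_b, 2))
```

The average throws away exactly the distributional information (consensus versus polarization) that might actually say something about a course.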
If you also use these numbers to decide whether to promote me, you’ve gone off into the twilight zone where the presence of numbers gives the illusion of facticity and objectivity. Might as well utter some incantations while you are at it.
Some new research adds another piece of evidence to the claim that the validity of the numbers in student evaluations of teachers is probably pretty low. Validity means “do they measure what you think they measure?” The answer here is that they do not. Instead, they measure things like “what gender is your instructor?” and “what kind of grade do you expect in this course?”
These researchers even found gender differences in ratings of objective practices, such as how promptly assignments were graded, and the differences persisted when students were misinformed about the gender of their instructors.
Let’s start implementing a policy we can have some respect for. No more averaging. No more use of numerical scores in personnel review. No more batteries of questions that ask more or less the same thing (thus distorting the positivity or negativity of the overall impression).
As John Oliver asks, “why is this still a thing?”
- Flaherty, Colleen. “Bias Against Female Instructors: New Analysis Offers More Evidence Against the Reliability of Student Evaluations of Teaching, at Least for Their Use in Personnel Decisions.” Inside Higher Ed, January 11, 2016.
- Boring, Anne, Kellie Ottoboni, and Philip B. Stark. “Student Evaluations of Teaching (Mostly) Do Not Measure Teaching Effectiveness.” ScienceOpen Research, January 7, 2016.
- Stark, Philip B. “An Evaluation of Course Evaluations.” ScienceOpen Research.