To Grade or Not to Grade

My musings when a friend asked about my thoughts on whether we ought to switch to credit/no-credit for this semester.

At the law school here we are having an advisory faculty vote on this – as I understand it, some law students petitioned for a change to credit/no-credit. They tend to be a very anxious bunch. At the Faculty of Information, my home faculty, there have been a few discussions about doing this in particular classes, but otherwise the current plan is business as usual.

When I think about it, much lands on the side of switching to pass/fail. There’s the fairness of changing horses midstream when students had been marshalling their resources and work habits in light of what they know as their strengths and weaknesses. Maybe I was banking on acing the final after a mediocre midterm, but now I might be thrown for a loop. Or maybe I was struggling to participate, but now I find participation by chat suddenly frees my voice. In short, there is so much unintended movement, out of people’s control, in the factors that affect the thing we think we are measuring with grades, and so much measurement error introduced by the untested methods we are about to use, that the error bars on the grades we come up with will be so large that an A might overlap with a C.

Counter to this, especially in the case of first-year law students, is that no small number of external things are keyed to first-year grades. It could be argued that credit/no-credit would be a real disservice to the students who were destined to get higher marks. ON THE OTHER HAND, if this crisis forced employers and scholarship providers to look beyond the GPA, that might represent a sizable social good.

And then there is the mindset we are all in which is “adapt and be pro-social.” It’d be nice not to be contradicting that with incentives to maximize one’s own grades in the midst of turmoil.

Selfishly, perhaps, I find myself thinking what a nightmare it is going to be as I try to be fair at semester’s end. I will be adjudicating a whole bunch of personal-situation information that’s below the threshold of official requests for accommodation. There’s always a bit of this, but I expect a lot this round. Is that really what the school wants me to be spending my energy and time on next month?

The employer issue raised above relates to the question of the external utility of grades: are any employers or graduate schools going to apply strict scrutiny to grades earned this semester? Or are they going to know full well that extraordinary circumstances render those grades a less reliable signal than they are usually taken to be? In a sense, both students and professors would be investing a whole bunch of time to come up with fine-grained measurements that nobody is ever going to pay attention to. (WE certainly shouldn’t when it comes to cumulative GPAs, if we are honest – course-to-course variations are wide in the best of times.)

Maybe I can get a handle on this with some analogies: sometimes in a restaurant when they mess up your order or have to substitute things they just comp you the meal or part of it.

In the Tour de France, if there’s an accident in the last 3 km of a stage, they just give everyone involved the finishing time of those in the group who make it to the finish line. The point is to avoid dangerous behaviour at the finish AND not to give people time bonuses simply because they avoided getting wiped out by the falling cyclist on the last turn.

Then there is a part of me that thinks we should always be “pass/fail” with a bar that’s a lot higher than D. I’m inherently skeptical that there are meaningful small differences that we can well characterize with things like B+ vs A-. My current grading practice is something like this: 85-87 (the lower boundary of an A – this is Canada) means competently completed as assigned; numbers above that signal impressive extras and numbers below signal deficiencies and things missing, but I don’t manage much more fine-grainedness than that. Since I’ve been grading assignments like that all semester, I’ve done all the “this is great, do more of it!” and “this is short of expectations” formative assessing along the way. A summative “done well enough, let’s wrap this up and move forward” would probably be a smart and responsible move for my professional students in the current circumstances.

Assessing Assessment?

The appalling legacy of “assessment” goes on and on and on. This “frank discussion” at a recent WASC conference is a classic bit of “too little, too late.”

I’m someone keenly interested in the organizational aspects of higher education, especially in questions of how we know we are being as effective and as productive as the world needs us to be. But for most of my career I’ve watched millions of person-dollars squandered on misguided efforts to “document” and “measure” learning. Alongside that I’ve watched the erosion of the intellectual integrity of institutions and individuals as they winked and went through the motions of methods they knew (or should have known) were bogus and would never produce actionable, valid knowledge. We watched as individual faculty members sold their souls for small stipends or to keep on the good side of a dean who might have input into their tenure or promotion case. And those of us who dared to apply our professional training to point out the inanity of the methodological manure being sold to us endured being dressed down for not being team players or having our commitment to students questioned by arrogant small-minded assessment consultants.

A real underlying pathology exposed by the ongoing assessment debacle is the monopoly power of the accreditation agencies. For the last two decades they ranted about accountability in higher education – the one standard they would never have to meet. The hypocrisy of agencies like WASC being immune to serious criticism should be an embarrassment to people who care about higher education.

The simple move of forcing national education accreditation agencies to compete rather than allowing them to enjoy geography-based monopolies would do more for higher education than a thousand conference presentations from people who live off the problem rather than for its solution.

Bad Methods Yield Non-Actionable Answers

Originally published June 2017

Having drunk the Kool-Aid of rubrics and assessment, many an untrained academic administrator epitomizes that old saw about knowing just enough to be dangerous. Suppose a manager wants to make a decision based on multiple criteria. An academic manager, for example, might consider:

  • Employee Type
  • Organization Needs and Employee Expertise
  • Employee Productivity
  • Employee Versatility
  • Engagement in Critical Roles

The plan is to rate each employee on each dimension and then add up the ratings to yield a score that will permit comparison between employees for the purpose of decisions about whether to retain the employee or not.

The individual ratings will be some variation on High, Medium, Low.

The use of rubrics such as this is all the rage in higher education. Unfortunately, they are frequently deployed in a manner that reduces rather than improves the quality of the decisions they are meant to support. Consider a few of the problems.

Ratings Are Not Normalized

By having the top rating count 3 points in some categories and only 2 in others, we introduce a distortion into the final score. Type, match, and productivity “count” more than versatility and critical role. If that’s intended, fine, but if not, it skews the results.
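A tiny sketch of the problem, with invented criteria and numbers: two employees with perfectly symmetric profiles (each has one High and one Low), yet the raw sum favors whoever happened to be High on the criterion with the bigger point ceiling. Rescaling each criterion to a common range removes the artifact:

```python
# All names and numbers are invented for illustration.
# "productivity" is rated on a 1-3 scale; "versatility" on a 1-2 scale.
maxima = {"productivity": 3, "versatility": 2}
minima = {"productivity": 1, "versatility": 1}

# A: High on productivity, Low on versatility.
# B: Low on productivity, High on versatility.
a = {"productivity": 3, "versatility": 1}
b = {"productivity": 1, "versatility": 2}

def raw(r):
    return sum(r.values())

print(raw(a), raw(b))  # 4 vs 3 -- A "wins" only because productivity's High is worth more points

def normalized(r):
    """Rescale each rating to [0, 1] so every criterion's High counts equally."""
    return sum((r[c] - minima[c]) / (maxima[c] - minima[c]) for c in r)

print(normalized(a), normalized(b))  # 1.0 vs 1.0 -- the symmetric profiles now tie
```

Note that normalizing is itself a weighting choice – it asserts that every criterion should count equally – but at least it is a choice made deliberately rather than an accident of how many boxes each scale has.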

Ordinal Scales Do Not Contain Distance Information

Any fool, as they say, knows that “high” is more than “medium” which is more than “low” and “low” is more than “none.”  When we have a scale that has this property we call it an “ordinal” scale; the elements of the scale can unambiguously be ordered from low to high.

What we do NOT know, though, is whether the “distance” between a high rating and a medium rating is equal to the distance between a medium rating and a low rating.

Although it is extremely common to look at an ordinal scale like “high, medium, and low” and assign 3 to high, 2 to medium, and 1 to low, this is a serious methodological error.  It invents information out of thin air and inserts it into the assessment. The ways in which this distorts the answers that emerge from the measurement cannot be determined without careful analysis. Just writing 3, 2, 1 next to words is not careful analysis.
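To see how much invented information the 3-2-1 assignment smuggles in, here is a toy example (ratings made up for illustration) in which two codings that both respect the High > Medium > Low ordering disagree about which employee comes out ahead:

```python
# Two employees' ratings on three criteria (invented data).
a = ["High", "Low", "Low"]
b = ["Medium", "Medium", "Medium"]

# Both codings respect the ordering High > Medium > Low;
# the ordinal scale itself says nothing about which is "right".
even_steps  = {"High": 3,  "Medium": 2, "Low": 1}
big_gap_top = {"High": 10, "Medium": 2, "Low": 1}

def total(ratings, coding):
    return sum(coding[r] for r in ratings)

print(total(a, even_steps), total(b, even_steps))    # 5 vs 6 -- B ahead
print(total(a, big_gap_top), total(b, big_gap_top))  # 12 vs 6 -- A ahead
```

The data never changed; only the arbitrary numbers written next to the words did. Any summed score built this way is hostage to that choice.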

Criteria Overlap Double Counts Things

Suppose some of the same underlying traits and behaviors contribute to both a needs/expertise match and an employee’s versatility, and that this trait is one of many we would like to consider in deciding whether to retain the employee. Since it has an impact on both factors, its presence effectively gets counted twice (as would its absence). Unless we are very careful to be sure that each rating category is separate and distinct, a rubric like this introduces distortion into the final score by unintentionally overweighting some factors and underweighting others.
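One way to see the double counting is to model each criterion as a sum of underlying traits (a purely hypothetical model with invented trait names): a trait that feeds two criteria ends up carrying twice the weight in the total, whether anyone intended that or not.

```python
# Hypothetical model: each criterion is a sum of underlying traits.
# "communication" feeds BOTH criteria; "expertise" and "stamina" feed one each.
def match(traits):
    return traits["communication"] + traits["expertise"]

def versatility(traits):
    return traits["communication"] + traits["stamina"]

def total(traits):
    return match(traits) + versatility(traits)

base    = {"communication": 1, "expertise": 1, "stamina": 1}
up_comm = {**base, "communication": 2}  # raise communication by 1
up_stam = {**base, "stamina": 2}        # raise stamina by 1

print(total(up_comm) - total(base))  # 2 -- communication moves the total twice as far
print(total(up_stam) - total(base))  # 1
```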

Sequence Matters

When using rubrics like this we sometimes hear that one or another criterion is only used after the others, or is used as a screen before the others. This too needs to be done thoughtfully and deliberately. It is not hard to show how different sequences of applying criteria can result in different outcomes.
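A minimal illustration with made-up candidates: screening on one criterion first and then the other, versus the reverse order, keeps different people.

```python
# Invented candidates rated on two criteria.
candidates = {
    "X": {"productivity": 3, "versatility": 1},
    "Y": {"productivity": 3, "versatility": 2},
    "Z": {"productivity": 1, "versatility": 3},
}

def screen(pool, order):
    """At each step, keep only the candidates tied for the best rating on that criterion."""
    names = list(pool)
    for crit in order:
        best = max(pool[n][crit] for n in names)
        names = [n for n in names if pool[n][crit] == best]
    return names

print(screen(candidates, ["productivity", "versatility"]))  # ['Y']
print(screen(candidates, ["versatility", "productivity"]))  # ['Z']
```

Same candidates, same ratings, opposite decisions – purely a consequence of which filter happened to run first.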

Zero is Not Nothing

A final problem with scales like these is that even if the distances between the ratings were meaningful, it is not always the case that we have a well-defined “zero” rating. Assigning zero to the lowest rating category is not the same as saying that those assigned to this category have none of whatever is being measured.
The problem this introduces is that a scale without a well-understood zero yields measurements that cannot meaningfully be multiplied and divided. This means that we cannot think in terms of average ratings as we often do.
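A quick sketch, with invented data, of why ratio claims misbehave without a true zero: shifting every code by a constant preserves the ordering (and the distances) but changes any “X times as high” conclusion drawn from averages.

```python
# Invented ratings: A is all High, B is all Low.
a = ["High", "High"]
b = ["Low", "Low"]

codes   = {"High": 3, "Low": 1}
shifted = {k: v + 2 for k, v in codes.items()}  # same ordering, zero point moved

def mean(ratings, coding):
    return sum(coding[r] for r in ratings) / len(ratings)

print(mean(a, codes) / mean(b, codes))      # 3.0  -- "A rates three times as high as B"
print(mean(a, shifted) / mean(b, shifted))  # 1.66... -- same data, different story
```

The 3x claim was never about the employees; it was about where the coder happened to put zero.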

Rankings are Just Rankings

The upshot is that ordinal scales are just rankings, just orderings, and without a more well established underlying numerical scale rankings are very hard to compare and combine in a manner that does not obscure more than it illuminates. Decisions based on naive uses of quantification are as likely as not to be wrong and influenced by extraneous and unacknowledged factors or just be the result of random consequences of choices made along the way.

"But even if they are not valid, they do tell you something…."

Remember, “validity” means “they measure what you think they measure.” “Data driven” can also mean driven right off the side of the road.

From Inside Higher Ed

Zero Correlation Between Evaluations and Learning

New study adds to evidence that student reviews of professors have limited validity.
September 21, 2016 By Colleen Flaherty


A number of studies suggest that student evaluations of teaching are unreliable due to various kinds of biases against instructors. (Here’s one addressing gender.) Yet conventional wisdom remains that students learn best from highly rated instructors; tenure cases have even hinged on it.
What if the data backing up conventional wisdom were off? A new study suggests that past analyses linking student achievement to high student teaching evaluation ratings are flawed, a mere “artifact of small sample sized studies and publication bias.”
“Whereas the small sample sized studies showed large and moderate correlation, the large sample sized studies showed no or only minimal correlation between [student evaluations of teaching, or SET] ratings and learning,” reads the study, in press with Studies in Educational Evaluation. “Our up-to-date meta-analysis of all multi-section studies revealed no significant correlations between [evaluation] ratings and learning.”

Real "Competencies" for the 21st Century

Music to my ears. Sarah Lawrence, long known for its innovative approach to liberal arts education (still using narrative evaluations – something we could adopt at Mills to great effect IMHO), crafts a simple response to assessment madness and places it where it should be: at the student-advisor nexus.

Imagine: six goals that are about skill not ideological content; evaluated every semester in every course; tracked over time by student and advisor. Throw all the rest of the baroque apparatus away and get on with educating.

H/T to Mark Henderson

Play audio at MarketPlace Education

At Sarah Lawrence College in Bronxville, N.Y., about ten students — all women but one — sit at a round table discussing Jane Austen’s “Northanger Abbey.”

The 88-year-old college has a reputation for doing things differently. Most classes are small seminars like this one. There are no majors. Students do a lot of independent projects. And grades aren’t as important as the long written evaluations professors give every student at the end of every semester. It’s no surprise, then, that professor James Horowitz is skeptical of any uniform college rating system, like the one being proposed by the Obama administration.

“The goals that we are trying to achieve in instructing our students might be very different from what the University of Chicago or many other schools or a state school or a community college might be striving to achieve,” Horowitz says.

The Obama administration is due out this spring with details of its controversial plan to rate colleges on measures like value and affordability. The idea is that if students can compare schools on cost, graduation rates and even how much money students earn after they graduate — colleges might have to step up their game. Especially if, as proposed, poor performers risk losing access to federal financial aid.

All that, naturally, makes colleges just a bit nervous. Sarah Lawrence is fighting back with its own way of measuring value. The faculty came up with six abilities they think every Sarah Lawrence graduate should have. They include the ability to write and communicate effectively, to think analytically, and to accept and act on critique.

“We don’t believe that there’s like 100 things you should know when you graduate,” says computer science professor Michael Siff, who helped develop the tool. “It’s much more about are you a good learner? Do you know how to enter into a new domain and attack it with an open mind, but also an organized mind?”

Faculty advisors can use the results to track students’ progress over time and help them address any weaknesses. A student who’s struggling with communication could take a class with a lot of oral presentations, for example, or make an appointment at the campus writing center.

But Siff says the tool is also about figuring out what the college can do better.
“This tool will allow us to assess ourselves as an institution,” he says. “Are we imparting what we believe to be these critical abilities?”

So how is the school doing? So far there are only data for two semesters, but on every measure seniors do better than juniors, and sophomores do better than freshmen.

Starting next fall, advisors will meet with their students at the beginning of each semester to talk over their progress. In sort of a trial run, Siff goes over the results so far with one of his advisees, junior Zachary Doege.

On a scale from “not yet developed” to “excellent,” he’s mostly at the top end. Doege says he likes seeing his own growth.

“I think the thing I like the most about this is just the fact that I can look back at how I was doing in previous semesters and sort of chart my own progress,” he says. “Not comparing me towards other students—just me to myself.”

That’s a different measure of the value of an education than, say, student loan debt or earnings after graduation — the sorts of things the Obama administration is considering as part of its ratings plan. Students and parents are right to ask if they’re getting their money’s worth, says the college’s president, Karen Lawrence. After financial aid, the average cost of a Sarah Lawrence education is almost $43,000 a year.

“People are worried about cost,” Lawrence says. “We understand that.”

And they’re worried about getting jobs after graduation. But she says the abilities that the new assessment measures—critical thinking and innovation and collaboration—are the same ones employers say they’re looking for.

“We think these are abilities that students are going to need both right after graduation and in the future, and so it could be an interesting model.”

One she hopes other schools will take a look at as they figure out how to answer the national debate about the value of college.

What If Administrator Pay Were Tied to Student Learning Outcomes?

The recent negotiation in Chicago (“Performance Pay for College Faculty“) of a tie between student performance and college instructor pay brought this accolade from an administrator:  it gets faculty “to take a financial stake in student success.”

It got me wondering why we don’t hear more about directly tying administrator pay to student success.  If we did, I’ll bet the students would have a lot more success.  At least, that’s what the data released to the public (and Board of Trustees) would show.  There’d be far less of a crisis in higher education.

Thought experiment: what would happen if we were to tie administrator pay to student success, much the way corporate CEOs have their pay packages designed – especially for administrators of large multi-campus systems?

Prediction 1.  The immediate response to the very proposal would be “oh, no, you can’t do that because we do not have the same kind of authority to hire and fire and reward and punish that a corporate CEO has.”  But think about this…

  1. Private sector management has a lot less flexibility than those looking in from the outside think.  Almost all of the organizational impediments to simple, rational management are endemic to all organizations.
  2. Leadership is not primarily about picking the members of your team. It’s about what you manage to get the team you have to accomplish.
  3. Educational administrators do not start the job ignorant of how these educational institutions work. It is tremendously disingenuous to say “if only I had a different set of tools.”  People who do not think they can manage with the tools available and within the culture as it exists should not take these jobs in the first place.
  4. This, it turns out, is what some people mean when they say that schools should be run like a business. The first impulse of unsuccessful leaders is to blame the led. The second one is to engage in organizational sector envy: “if I had the tools they have over in X industry….”  What this ignores is the obvious evidence that others DO succeed in your industry with your tools.  And plenty of leaders “over there” fail too.  It is not the tools’ fault.

Prediction 2.  Learning would be redefined in terms of things produced by inputs administrators had more control over.  And resources would flow in that direction too.

Prediction 3. Administrators would get panicky when they looked at the rubrics in the assessment plans they exhort faculty to participate in and that are included in reports they have signed off on for accreditation agencies.  They’d suddenly start hearing the critics who raise questions about methodologies.  They would start to demand that smart ideas should drive the process and that computer systems should accommodate good ideas rather than being a reason for implementing bad ones.

Prediction 4. In some cases it would motivate individuals to start really thinking “will this promote real learning for students” each time they make a decision.  And they’ll look carefully at all that assessment data they’ve had the faculty produce and mutter, “damned if I know.”

Prediction 5. Someone will argue that the question is moot because administrators are already held responsible for institutional learning outcomes.   Someone else will say “Plus ça change, plus c’est la même chose.”