The Old Alignment Problem

Google the phrase “alignment problem” and the first result might well be Brian Christian’s excellent book of that title. The next might be a definition such as “the challenge of steering AI systems toward human goals and values.”

AI alignment is one facet of an emerging field called “AI safety.” The field is controversial in some quarters, but it is arguably among the most important issues at the nexus of technology and society today.

If you read a little further you might encounter a thinker who seems to want to derail the conversation by asking which humans’ values we have in mind. Talk of generic human values, the suggestion goes, usually assumes that some particular group’s values are everyone’s values, so maybe we have to talk about that first.

But even before getting to that conundrum, there is plenty to chew on. AI can be thought of as goal-seeking or objective-maximizing automation, but even if it could seek goals and maximize objectives perfectly, specifying those goals and objectives turns out to be a tricky business. You might know what you want and what you care about, yet find it hard to translate that for the machine. Harder still, it turns out you often do not fully know what you want and care about in the first place. And if there is more than one of you, or if you bear some responsibility for other people, ascertaining what all of you want and care about is harder yet, so getting the machine on board with “human values” (no matter whose they are) is anything but straightforward.
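To make the specification difficulty concrete, here is a minimal sketch (the cleaning-robot scenario, function names, and numbers are all invented for illustration): we intend “make the room clean,” but the objective we actually manage to write down is “maximize dirt collected,” and a sufficiently thorough optimizer exploits the gap between the two.

```python
from itertools import product

# A toy illustration of objective misspecification. We *intend* "make the
# room clean," but we *specify* "reward = dirt collected per step," and a
# perfect optimizer exploits the gap between the two.

def proxy_reward(action: str, dirt: float) -> tuple[float, float]:
    """Return (reward, new_dirt_level) under the proxy objective."""
    if action == "clean":
        collected = min(dirt, 1.0)
        return collected, dirt - collected
    if action == "dump":       # the loophole: making a mess costs nothing
        return 0.0, dirt + 1.0
    return 0.0, dirt           # "idle"

def best_plan(steps: int) -> tuple[list[str], float]:
    """Exhaustively search fixed action sequences for maximum proxy reward."""
    best, best_total = [], float("-inf")
    for plan in product(["clean", "dump", "idle"], repeat=steps):
        dirt, total = 1.0, 0.0
        for action in plan:
            reward, dirt = proxy_reward(action, dirt)
            total += reward
        if total > best_total:
            best, best_total = list(plan), total
    return best, best_total

plan, score = best_plan(4)
print(plan, score)  # the winning plan includes "dump": dirt is manufactured
                    # just so it can be re-collected for reward
```

Run it and the best-scoring plans include “dump” steps: the proxy reward is maximized by manufacturing the very mess we wanted gone. Nothing in the code is broken; the objective simply is not the goal.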

But while AI safety is a relatively new field, the alignment problem is anything but new. Humans have been struggling with value alignment for pretty much as long as they have been around; the problem of human social organization is the problem of value alignment. From two hunter-gatherers struggling to cooperate in the search for food, to one spouse sending the other to do the shopping, to a corporation trying to get marketing and product to coordinate around the company’s mission, the challenge of steering other agents toward our goals and values has bedeviled us forever.

And that problem of “whose values” has been here the whole time. Families, friendships, communities, clans, companies, and countries always face the dual challenge of steering agents toward a set of goals and values AND figuring out what those goals and values are.