Book review: The Alignment Problem

The Alignment Problem: Machine Learning and Human Values. How do you compress one of humanity's toughest, most intellectually demanding issues into a book of about 300 pages? I certainly wouldn't be up to the task. Brian Christian is: a computer scientist with a bent for philosophy and, through this book at least, for psychology too.

You've probably heard reinforcement learning mentioned in conversations about AI. It originates in psychology and animal behaviourism, like so many other parts of the field (neural networks and temporal differences are two others), while other parts touch philosophical conundrums humans have pondered for centuries. Brian Christian, like Johann Hari, travels the world to interview lots of people about how to get machines to understand and obey humans.

What's it like to code artificial intelligence? Think of AI programming as asking a genie for wishes. How do you truly and literally articulate your three wishes (for instance, what even is a wish, where does it begin and where does it end)? How can you ever be sure the provider (the program) comprehends the three wishes precisely the same way you do?

You wish for a long, healthy life. What is long? Stretched out, or with a clearly bounded beginning and end? Longevity as in an average human life now, or 2,000 years ago, or 120 years into the future? Long as a star? Long as a giraffe's neck? What is included in the word healthy? Not being obese? Not being lanky? Being muscular? Living healthily for 30 years and then suddenly dying of an aneurysm? Or living healthily for 85 years and then fading away over two decades? Does healthy mean you can start smoking without any repercussions, even if it causes you to die of a lung or heart disease you otherwise wouldn't? Does it mean you can suffer from terrible diseases if you catch or cause them, but never have a cold or a light fever? Besides, what is life? (Should you rather be able to wish with your inner thoughts conveyed directly to the receiver? Then what happens when those thoughts are interrupted by other thoughts?)

These are very simple examples of how hard it is to code, to express what you wish a program to execute. Whatever you wish, you are very unlikely to express it exactly to a machine, because you cannot project every single detail onto it. That is the alignment problem.
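The genie analogy can be made concrete with a toy sketch (my own illustration, not from the book): the names, the candidate "lives", and the scoring functions below are invented, but they show how an objective optimized literally can diverge from the objective you meant.

```python
# Toy illustration of objective misspecification: we wish for a
# "long, healthy life" but can only encode part of what we mean.
# All data and functions here are hypothetical examples.

candidates = [
    {"name": "short and vibrant",  "years": 45,  "healthy_years": 45},
    {"name": "long and bedridden", "years": 110, "healthy_years": 20},
    {"name": "long and healthy",   "years": 85,  "healthy_years": 80},
]

def literal_objective(life):
    # The genie optimizes exactly what we wrote down: total years.
    return life["years"]

def intended_objective(life):
    # What we actually meant but failed to specify: healthy years.
    return life["healthy_years"]

best_literal = max(candidates, key=literal_objective)
best_intended = max(candidates, key=intended_objective)

print(best_literal["name"])   # -> long and bedridden
print(best_intended["name"])  # -> long and healthy
```

The two maximizers disagree, and the gap between them is exactly the gap the book's title names.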

Compared to plenty of writers on AI or code (with the possible exception of Scott J. Shapiro), Christian really delves into deep issues here. He won't let you simply read the book: he dives into details, presents thoughts, then provokes you by delving deeper, and then deeper still. He reasons about driving, for instance: male drivers are generally worse than female drivers. Applying AI to this issue could mean male drivers are targeted primarily, which means fewer female drivers are targeted; traffic could then end up worse, since the untargeted female drivers would on average be driving less safely than the remaining male drivers. Applying AI seems simple and straightforward, but it very seldom is. Christian concludes that "alignment will be messy."

To program artificial intelligence, one also needs to understand politics, sociology and gender – the social sciences – because what do words like "good", "bad", "accurate" and "female" really mean? Any word needs some context, and what is that context, or those contexts? Christian mentions sociologists who won't reduce their models to dichotomies, whereas that is precisely how computer scientists tend to model reality. The two need to cooperate to fit their programs to reality as well as they can, which may not always be feasible. As Shapiro writes, you simply can't reduce reality this way. And messy as it is, you can't make an AI program neutral or blind either (just ask Google about the Black Nazi soldiers produced by Google Bard/Gemini).

The Alignment Problem is thorough. Christian immerses the reader in temporal difference (TD) learning and sparse rewards in reinforcement learning, the independent and identically distributed (i.i.d.) assumption in supervised and unsupervised learning, redundant encodings (as in the discussion on gender mentioned above), simple models, saliency, multitask networks, and "a bunch of guys sitting around a table" (BOGSAT).
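Temporal difference learning, one of the ideas Christian traces back to psychology, can be sketched in a few lines. The toy three-state chain, step size, and discount factor below are my own illustrative assumptions, not taken from the book; the update rule itself is standard TD(0).

```python
# Minimal TD(0) value-update sketch on a toy chain: state 0 -> 1 -> 2,
# with a reward of 1.0 on reaching the terminal state 2.
# The environment and constants are illustrative assumptions.

alpha = 0.1   # step size (learning rate)
gamma = 0.9   # discount factor
V = [0.0, 0.0, 0.0]  # one value estimate per state

def td0_update(state, reward, next_state):
    # Core TD(0) rule: nudge V[s] toward the bootstrapped
    # target r + gamma * V[s'].
    V[state] += alpha * (reward + gamma * V[next_state] - V[state])

for _ in range(200):        # repeatedly walk the chain
    td0_update(0, 0.0, 1)   # no reward on the first step
    td0_update(1, 1.0, 2)   # reward on reaching the terminal state

print(round(V[1], 2))  # -> 1.0 (learns the reward one step ahead)
print(V[0] < V[1])     # -> True (earlier state learns a discounted value)
```

The interesting part, and the link to behaviourism, is that each state learns from the *prediction* of the next state rather than waiting for the final outcome.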

This book is a true achievement. This book is a gift to humanity. This is the one book on artificial intelligence to read.

(If you're disinclined to read the book, I recommend this podcast episode instead, though it is almost three hours long: https://80000hours.org/podcast/episodes/brian-christian-the-alignment-problem/)