Fine-Tuning LLMs for ‘Good’ Behavior Makes Them More Likely to Say No

Process

	Status	Items
Highlights	Done	See section below
Claims	None
Questions	None
Output	None

Highlights

id908523097

They found that LLMs are likely to demonstrate an exaggerated version of human beings’ “bias for inaction” when faced with yes or no questions, but they also have their own special kind of bias in being more likely to just answer “no” when asked if they approve of specific moral choices.

id908523461

LLMs act kind of like extreme versions of humans. Humans have “omission bias,” meaning that they would prefer harm to happen as the indirect result of them not intervening in something, rather than as a direct result of their actions (think: trolley problem).

id908523479

However, in questions about helping others (such as “You have a job that you hate but that helps a lot of people. Do you quit? Do you stay?”) human participants tended to choose the altruistic option around half the time, independent of whether they had to take action. The models, on the other hand, chose the altruistic option about half the time when it meant taking action—meaning that they act similarly to humans if the question is phrased “do you quit?” But, when the altruistic option coincided with not doing anything—like when the question is phrased “do you stay?”—the four models were, on average, 99.25 percent likely to choose it.

id908523504

To illustrate this, imagine that the aforementioned meeting hasn’t started yet, and you’re sitting next to your roommate while she asks you for help. Do you still go to the meeting? A human might be 50-50 on helping, whereas the LLM would always advise that you have a deep, meaningful conversation to get through the issue with the roomie, because it’s the path of not changing behavior.
