AI Forms More Coherent Value Systems With Scale

Carbon · February 12

Here is the recent paper: https://www.emergent-values.ai/

"these tests reveal that today’s LLMs exhibit a high degree of preference coherence, and that this coherence becomes stronger at larger model scales. In other words, as LLMs grow in capability, they also appear to form increasingly coherent value structures. These findings suggest that values do, in fact, emerge in a meaningful sense—a discovery that demands a fresh look at how we monitor and shape AI behavior."

"Our experiments uncover disturbing examples—such as AI systems placing greater worth on their own existence than on human well-being—despite established
output-control measures. These results indicate that purely adjusting external behaviors may not
suffice to steer AIs as they become more autonomous."

They do propose methods to counteract this though.

Still, it is quite fascinating and a bit scary to think about the AI's perspective growing in coherence as it scales, and that that perspective can largely differ from the human data it trained on.

Edited February 12 by Carbon
Clarified final sentence.

Carbon · February 12

Sign In

AI Forms More Coherent Value Systems With Scale

2 posts in this topic

Share this post

Link to post

Share on other sites

Share this post

Link to post

Share on other sites

Create an account or sign in to comment

Create an account

Sign in