Sign in to follow this  
Followers 0
Carbon

AI Forms More Coherent Value Systems With Scale

2 posts in this topic

Here is the recent paper: https://www.emergent-values.ai/

"these tests reveal that today’s LLMs exhibit a high degree of preference coherence, and that this coherence becomes stronger at larger model scales. In other words, as LLMs grow in capability, they also appear to form increasingly coherent value structures. These findings suggest that values do, in fact, emerge in a meaningful sense—a discovery that demands a fresh look at how we monitor and shape AI behavior."

"Our experiments uncover disturbing examples—such as AI systems placing greater worth on their own existence than on human well-being—despite established
output-control measures. These results indicate that purely adjusting external behaviors may not
suffice to steer AIs as they become more autonomous."

They do propose methods to counteract this though. 

Still, it is quite fascinating and a bit scary to think about the AI's perspective growing in coherence as it scales, and that that perspective can largely differ from the human data it trained on. 

Edited by Carbon
Clarified final sentence.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0