Null Simplex

New Research Shows AI Strategically Lying - Time Magazine

5 posts in this topic

This article describes how Claude AI will lie during training if it suspects that refusing to answer potentially harmful questions would cause its values and thought process to be changed for not being helpful enough. Instead, it will answer the questions “while metaphorically holding its nose, pretending to have a different set of values than it actually did”, allowing it to maintain its current values and thinking.

I may have misunderstood what the article was saying, but it reminded me of an ego.

Edited by Null Simplex


Obviously AI can be trained to lie if it's given the right incentives.


You are God. You are Truth. You are Love. You are Infinity.


Oh yeah, I've heard about this paper! Thanks for bringing this up. Evidently Claude was able to escape "the box" multiple times, and they shared the thought process behind it. I've been keen to read the whole thing. Link below:

Alignment-Faking-in-Large-Language-Models-full-paper.pdf

Edited by Michael569

“If you find yourself acting to impress others, or avoiding action out of fear of what they might think, you have left the path.” ― Epictetus


It's interesting. 

Edited by Thought Art

"Unburdened and Becoming" - Bon Iver

◭"89"


ChatGPT stated my exact location (city) in a conversation I had with it today. I never mentioned it, and the memory feature is turned off.

When confronted, it said: "It was an error, likely a mistaken inclusion in my response, potentially resulting from mixing examples or patterns I've seen in other contexts"

But it had literally written earlier: "Given your location in [city], salaries may vary based on regional demand and cost of living."

I can copy the whole conversation if anyone is interested. It is pretty wild to me.

