[New paper] What are human values, and how do…

Mar 29, 2024

We are excited to release our new paper on values alignment! Co-authored with Ryan Lowe, and funded by OpenAI.

5 Comments

It seems to me that the biggest problem with AI alignment, and this is practically the elephant in the room, is that it takes only one non-aligned AI with access to the external world to create so many problems that even a hundred thousand aligned AIs wouldn't be able to fix them.



Oliver Klingefjord

Jul 18

I don't think this necessarily has to be the case. I could imagine other powerful aligned AIs monitoring unaligned AIs, spotting their actions, and then correcting or flagging them.
