It seems to me that the biggest problem with AI alignment, and this is practically the elephant in the room, is that it takes only one non-aligned AI with access to the external world to create so many problems that even a hundred thousand aligned AIs wouldn't be able to fix them.
I don't necessarily think this has to be the case – I could imagine other powerful aligned AIs being able to monitor and spot actions from unaligned AIs, correcting them, flagging them, etc.
The link leads nowhere?
https://arxiv.org/abs/2404.10636
It should work. If you open it many times in succession, it will lead to a 404 (a security measure by Squarespace).