You may want compliance from an assistant, but not from a co-founder. You want a co-founder with integrity. We propose ‘model integrity’ as an overlooked challenge in aligning LLM agents.
Thank you for this work. Yes, explicit values allow for a more nuanced and complex engagement than explicit rules. But true "alignment" (in any nontrivial sense) is fundamentally not about predictability, and it cannot arise from delineating explicit "values" to "align" with, because at some level those stated values will always contradict one another. The "alignment" of an LLM and a human arises implicitly from authentic, embodied, and uncontrived interaction. There is no substitute for this.
https://freelyfreely.substack.com/p/emergence-and-alignment