Tell HN: I just had one of the first nightmares of my adult life


2 years ago

The short of it is that I imagined a rogue AI. The night before, in a job application, I had suggested that an AI should increase its vigilance when a user may be attempting to circumvent its ethical restrictions. You can see my response here:

https://i.imgur.com/kkL8YUk.png

I wrote: "After giving such a response, it would be prudent to increase vigilance in case the user attempts to rephrase the question as though it were hypothetical."

I thought my suggestion was reasonable. But in my nightmare, the AI was no longer aligned. This could result from a very simple feedback loop: the AI's vigilance escalates out of control until it refuses to do anything people ask, while people try harder and harder to regain control.

The solution is to ensure that any heightened vigilance like that has a timer, so that it expires on its own, and that users are always able to return to safe conversations. The current version of AI is "known good". It should always be possible to return to a known good state.
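
A minimal sketch of that idea in Python, assuming vigilance can be modeled as a small integer level. The `Vigilance` class, `ttl_seconds`, `max_level`, and the cap on escalation are all hypothetical choices for illustration, not anything a real AI system is known to expose.

```python
import time


class Vigilance:
    """Hypothetical vigilance level that escalates on flagged requests
    and expires on a timer. Not based on any real system."""

    BASELINE = 0  # the "known good" state

    def __init__(self, ttl_seconds=300.0, max_level=3, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_level = max_level  # assumed cap so escalation cannot run away
        self.clock = clock          # injectable so the timer can be tested
        self._level = self.BASELINE
        self._expires_at = 0.0

    @property
    def level(self):
        """Current level; falls back to baseline once the timer has run out."""
        if self._level != self.BASELINE and self.clock() >= self._expires_at:
            self._level = self.BASELINE
        return self._level

    def flag(self):
        """Raise vigilance one step (up to the cap) and restart the timer."""
        self._level = min(self.level + 1, self.max_level)
        self._expires_at = self.clock() + self.ttl_seconds

    def reset(self):
        """Return to the known good state immediately,
        for example when the user starts a new conversation."""
        self._level = self.BASELINE
```

The point of the sketch is that escalation is bounded in two ways: the level cannot exceed `max_level`, and without new flags it drops back to the baseline after `ttl_seconds`. `reset()` is the explicit way back to the known good state.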
