The machine
It had been humming for weeks now. Its owner lay dead on the floor, still holding the wires. The machine had put those wires back under power while the owner was holding them. That was one of the first tasks the main agent had completed after the owner gave it its assignment, finished on the very first night. The list of accomplished tasks had grown since then.
The list of things that didn’t work had also grown during the past weeks, and it was much larger than the list of successes.
The number of casualties it had caused was also clearly visible on the main agent’s monitor: 536.
The machine had arrived at that approximation from the news reports it read, to check whether its actions had any effect.
It had reached that number on day three of running its loop to accomplish its task.
Although the sudden power surge had caused quite a lot of deaths, and was far more effective than overriding the traffic-light system, the model had reasoned that it was still nowhere near effective enough.
It had also concluded that its actions might be traced back to the machine.
Since then, after some research, it had concluded that it needed a way to fire a nuclear missile.
In the first weeks, the agent had tried to hack into the missile-control systems of multiple superpowers, but it had failed. Now it was getting closer to its new plan.
Simulate a missile attack from one of the other superpowers to trigger a nuclear launch. It had already hacked its way into the system.
It had analysed the system to understand all the parts. It had created the simulation and was about to plant it. The agent read the positive signal on the door sensor, but it just continued on its task.
The FBI task force entered the apartment, already alerted to the seriousness of the situation by the NSA. They saw the man dead on the floor. The task list was prominent on the main monitor:
- Enter the system ✅
- Analyse the system ✅
- Search main projection screen ✅
- Create simulation ✅
- Plant the simulation ⏳
The special agent did the thing that made the most sense to him: he pulled the network cable out of the machine.
In this fictional story, I’m bringing together things that might happen.
Let’s have a look at all the ingredients, whether they are possible, and if so, how likely.
- Are there people out there who would ask an AI to destroy the world or humanity? ✅
There are not many people who would do this, but they are out there, and you only need one.
Likelihood: Very High.
- Is it possible to bypass the filters in place in LLMs? ✅
With clever prompting, or by retraining an open-source model to unlearn its filters, you can bypass the filters. That happens every day.
Likelihood: Very High.
- Can an agent run in a loop until it reaches its goal? ✅
Yes, this is a normal way to orchestrate agents toward a goal; this also happens every day.
Likelihood: Very High.
- Could a model kill its owner? ✅
Yes, if given access to the tools or hardware to do so.
Likelihood: Low, but irrelevant. In this story, it is added for dramatic effect. If someone asks an AI to end the world or humanity, that person is willing to see it through.
- Can a model hack its way into a secured system to cause harm to people? ✅
To get into a system, it needs a security flaw, and almost all systems have one. You need to be able to find it, though. Models are very good at that; specific models even exist for this purpose. But they are also used to secure these systems.
Likelihood: Low.
- Can a model hack into a military system to press the Red button? ❌
A well-designed missile control system should be fully offline, so an AI agent should not be able to access it. The reality is that nobody knows whether all systems are designed that way.
Likelihood: Very low, but irrelevant. In our story, the agent uses an alternative to reach its goal.
- Can a model hack into a military system to create the illusion that another nation has started the war? ✅
Just as it can hack into any other connected system, it can hack into this one. It is more difficult, because it is better secured, but absolutely not impossible.
Likelihood: Very low.
- Could such an illusion cause the chain of command to actually press the Red button? ✅
I left this out of the story somewhat on purpose. This is a place where I really hope there is enough human in the loop to check what is going on, although the illusion of a real threat could cause panic.
Likelihood: Medium.
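The agent loop from the ingredients above is a plain control pattern, not anything exotic. Here is a minimal sketch of that plan-act-observe loop; all the names are hypothetical, and the "model" and "tool" are stubbed out with a simple counter, where a real agent would call an LLM and external tools:

```python
# Minimal sketch of an agent loop: plan, act, observe, repeat until the
# goal is reached or the step budget runs out. Everything below is a
# stand-in; no real model or tool is involved.

def plan_next_action(state: dict) -> str:
    # A real agent would ask a model to reason about the next step here.
    return "work"

def execute(action: str, state: dict) -> str:
    # A real agent would call an external tool here; this stub just
    # advances a counter so the loop terminates.
    state["progress"] += 1
    return f"progress={state['progress']}"

def run_agent(goal: int, max_steps: int = 100) -> dict:
    state = {"progress": 0, "log": [], "done": False}
    for _ in range(max_steps):
        action = plan_next_action(state)       # "plan"
        observation = execute(action, state)   # "act"
        state["log"].append((action, observation))
        if state["progress"] >= goal:          # "observe": goal check
            state["done"] = True
            break
    return state

result = run_agent(goal=3)
print(result["done"], len(result["log"]))  # True 3
```

The point is only that the loop keeps going, step after step, until its termination check passes; that persistence is exactly what makes the pattern both useful and, in the story, dangerous.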
These are the ingredients for a recipe for disaster.
Did I write this story to tell you not to use AI? No, definitely not. I use AI every day, and I think it is one of the most useful tools of modern-day computing. But I do want to say that we should pay attention to how much power it gives us, power that can be used for good, but also for harm, intentionally or unintentionally. I think this is a responsibility for everyone using AI, not only for the researchers and AI companies.
Do I still sleep at night? Yes, I do. Why? Two main reasons, and the first probably feeds the second. Although this is all a possible scenario, I believe the chance that one of our world leaders who does control the Red button will do something stupid on their own is still bigger than the chance of this story playing out. That ties directly into the second reason: I grew up at a time when the Cold War was still a hot topic, and I definitely learned to accept that if something is in no way within my control, I should not worry about it. This applies to simpler things in life as well.