Some of the older models did do this (like 3.5-era ish I think), and the harness would parse the results.
The newer way frontier has setup is structured tool calls. `tool_use` or `tool_calls`. The response is then received as a different tool_result rather than a regular message. That's a bit of the newer way of doing it.
The failure mode in question is more the model mixing the two: "Sure, I'll read the file: {"tool": "read", "args": {"path": "foo"}}" - that'll break stuff. Other failure modes are the json not parsing when sent it as a structured call, and in some cases the model just emitting text and forgetting the tool call.
Random.
Given how windows is going what’s described in the article doesn’t seem so shocking either. Even though they need not be correlated products, I can’t help but seeing a similar shortsightedness in the playbooks they are adopting.