Devil machine
The famous Soviet comedian Mikhail Zhvanetsky loved to tell a joke about an American sawmill bought for a Siberian logging camp. Delighted by the capabilities of the overseas machine, the Soviet workers fed it everything from thin branches to enormous logs, and the miracle of technology handled every task. But the persistent and proud lumberjacks didn’t stop there and eventually shoved a rail into the sawmill. The machine broke down, and the workers returned to their usual tools, noting the pathological imperfection of Western technology.
I won’t hide it: my first acquaintance with LLMs followed roughly the same pattern. With the persistence of a savage, I loaded the devil machine with “human” tasks, drawing far-reaching conclusions from its apparent inability to solve them the way I wanted. Only after many months, having strained my brain considerably and experimented with software APIs, did I arrive at the generally obvious understanding that every tool must be used for its intended purpose.
Alas, many of my acquaintances have not arrived at this understanding. They laugh when the neural network draws a sixth finger or two tails, writes a recipe for cooking pork wings, or makes mistakes in simple arithmetic. Yet this is about the same as laughing at an orangutan that forgot to tighten a nut while assembling a jet engine, or, more precisely still, at a genius violinist unable to take a double integral. It would be far more appropriate to be impressed that the neural network can draw at all, or give coherent answers to complex questions. Especially since most of the critics themselves are incapable of drawing anything comparable, whether with six fingers or with five.
Here we have another example of a paradox of the human psyche: when interacting with something through a simple interface, we tend to forget how complex the thing we are actually operating is. Just as when driving a car we automatically turn the steering wheel and press the pedals without thinking about the intricate engineering embedded in the engine, transmission, and suspension. We remember it only when a breakdown, malfunction, or tricky situation arises in which our driving skills no longer suffice, and then the ability to operate is not enough: we have to understand the machine.
This is exactly what happens when interacting with ChatGPT. The “messenger” interface is so simple, and the illusion of talking to a live interlocutor so convincing, that we almost forget whom we are dealing with. We demand from the interlocutor the reactions and responses we would expect from a living person. Moreover, from a person of sky-high intelligence, with access to all the knowledge in the world and able to process it at the speed characteristic of computers.
In most cases, our expectations are justified: these tools were created for exactly this purpose. But expectations are one thing; in reality, a chatbot based on a large language model is a tool with its own properties, characteristics, and limitations. When the task set before it goes beyond acceptable boundaries, we get amusing, idiotic, or simply incorrect answers. That does not mean the tool itself is bad; rather, it is bad for the one who stubbornly tries to shove a rail into the sawmill instead of figuring out how the sawmill works.
In my observation, the most adequate users of neural networks are managers. Good managers, I mean: they professionally view subordinates as resources and tools, understand the limits of their capabilities, take responsibility for setting tasks correctly, and know how not merely to expect results but to obtain them. For them, communicating with an LLM is a fairly familiar process, and they are less likely than members of most other professions to make pointless complaints and fall victim to unfounded expectations. They are also less prone than impressionable people from other industries to ridiculous enthusiasm or, conversely, to dramatic horror at a future takeover of the world. After all, any manager has had to deal with people far surpassing them in intelligence, and they know that besides intellectual ability there are many other strengths that allow one to direct, subordinate, and put to use.
But let’s return to the infernal machine. Here and below, we will talk about text tools based on LLMs, large language models. We will not delve into the details of their structure; there are plenty of thick monographs, lengthy educational videos, and university courses for that. The most basic understanding will do.
A large language model should not be imagined as a horned devil, a huge brain with an incredible number of convolutions, or a complex mechanism with billions of gears. Personally, I recommend not imagining it at all, even if you are an expert in the field, simply because it is distracting. When driving a car, we do not think about how the flywheel rotates, the pistons move back and forth, and the connecting rods swing. Likewise, there is no point in digging too deep here. It is enough to imagine a box; IT specialists in such cases say “black box.” Through this box, billions of texts in different languages have been passed. You did not witness this “passing”: it is called “training” and was carried out by OpenAI, Google, or whoever else kindly provided you with the model. Millions of dollars, megawatts of electricity, and tens of thousands of hours of work by powerful GPUs, which we still call graphics cards, were spent on it. We have practically nothing to do with that process.
What matters to us is that the neural network, that is, what sits inside the box, does not remember all these texts but is able to determine the probability with which a given sequence of words will be followed by one next word or another. If you are not sure you understand that sentence, reread it until you do: without it, everything that follows will be empty page-turning. So, the trained neural network knows nothing, understands nothing, draws no conclusions, and has no idea about causes and effects. It is only able to produce the most probable sequences of words following those you submit as your queries. All its “knowledge” and “understanding” are embedded in these very probabilities, nothing more. The main component, the “engine” of any neural network tool, is extremely simple in its logic. Its intelligence comes from the vast number of learned connections between words, which allow it to give relevant answers to most real queries.
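To make “the most probable next word” tangible, here is a deliberately toy sketch in Python. The probability table is invented for illustration; a real LLM computes such distributions over tens of thousands of sub-word tokens using billions of learned parameters, and it conditions on the whole preceding text rather than one word. But the generation loop has exactly this shape: pick a likely continuation, append it, repeat.

```python
import random

# Toy "model": for each context word, the probabilities of the next word.
# The numbers are invented; a real LLM derives such distributions from
# billions of learned parameters and the entire preceding sequence.
NEXT_WORD_PROBS = {
    "twice": {"two": 0.9, "as": 0.1},
    "two":   {"is": 0.8, "plus": 0.2},
    "is":    {"four": 0.95, "five": 0.05},
}

def generate(context: str, steps: int) -> str:
    words = context.split()
    for _ in range(steps):
        dist = NEXT_WORD_PROBS.get(words[-1])
        if dist is None:  # no known continuation
            break
        # Sample the next word in proportion to its probability.
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("twice two", steps=3))  # e.g. "twice two is four"
```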
However, this basic structure of an LLM is also the source of its limitations.
First of all, the neural network is trained on a specific dataset, and the connections built inside it reflect that training sample. For general-purpose networks, the sample is drawn from the commonly available information environment, the internet and books, and it reflects the knowledge circulating there. The Earth is round, America was discovered by Columbus, killing people is bad, the Sun is a yellow dwarf, and Plato is a Greek philosopher. Whether you agree with this or not, the connections between words are formed this way, and unless special effort is applied, the corresponding questions will yield the corresponding answers.
From the nature of neural networks follows their most famous, unpleasant, and most ridiculed property: hallucination. In response to our queries, an LLM provides the most probable, “best” sequence of words (it cannot fail to provide one), but there is no guarantee that this sequence corresponds to anything that actually exists. If the body of training text gives no grounds for judging an answer irrelevant, the neural network has no other way to evaluate it: it has no sense organs and no “life experience.” “Pork wings” mean nothing to it; it has never seen a pig, does not know what one is, and is not even aware of the reality in which pigs exist. An LLM operates with words, and this phrase simply turns out to be the most relevant of all possible ones, that’s all. We will talk later about how to deal with this; for now, we have to accept hallucinations as inevitable.
For the same reason, neural networks themselves cannot count, or more precisely, cannot calculate. There are plenty of texts saying that twice two is four, so to the question “what is twice two” the answer “four” is the most likely. But there are vanishingly few texts stating what the square root of six hundred eighty-five multiplied by one and a half equals, and so the chances of getting the correct result are equally slim. There are ways to teach not the neural networks themselves but the tools built on them to count, as sketched below; by their nature, however, LLMs cannot calculate.
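The usual trick is to let the model do what it is good at, producing text, and to hand the arithmetic to ordinary deterministic code: the model emits an expression, and a calculator evaluates it. A minimal sketch of that division of labor, with a hypothetical llm_emit_expression standing in for a real model call:

```python
import math

def llm_emit_expression(question: str) -> str:
    # Stub in place of a model call: in a real tool, the LLM would be
    # prompted to translate the question into an arithmetic expression.
    return "math.sqrt(685) * 1.5"

def answer_with_calculator(question: str) -> float:
    expression = llm_emit_expression(question)
    # The arithmetic itself is done by ordinary deterministic code,
    # not by the probabilistic language model.
    return eval(expression, {"math": math, "__builtins__": {}})

print(answer_with_calculator(
    "What is the square root of 685 multiplied by 1.5?"
))  # ≈ 39.26
```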
Nor is it a universal knowledge base, although that is exactly how people very often try to use it, sometimes successfully, sometimes not. When it comes to commonly cited facts, the connections between the words describing them will most likely be “strong” enough, and the answer will be semantically correct. But it is quite possible that the training texts contained similar, “distracting” fragments that drag in unsuitable, inadequate, irrelevant word sequences, and the meaning of the result gets distorted. That is how I once tormented ChatGPT with questions about Cardinal Richelieu’s brothers.
Using language models to manage knowledge bases and to search through them is not only possible but necessary; the neural network itself, however, is no such base, and expecting universal erudition from it is quite reckless.
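In practice, this pairing usually takes the form known as retrieval-augmented generation: relevant passages are first fetched from an external knowledge base and placed into the prompt, so the model rephrases retrieved facts instead of leaning on its learned word probabilities. A schematic sketch, with naive keyword retrieval, an illustrative three-entry knowledge base, and a placeholder ask_llm in place of a real model call:

```python
KNOWLEDGE_BASE = [
    "Cardinal Richelieu (1585-1642) was chief minister to Louis XIII.",
    "Richelieu had an elder brother, Alphonse, who became a cardinal himself.",
    "Plato was a philosopher in Classical Greece.",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    # Naive keyword overlap; real systems use vector embeddings
    # and similarity search instead.
    q_words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def ask_llm(prompt: str) -> str:
    # Placeholder for an actual model call (OpenAI, Google, etc.).
    return f"[model answers using only the passages in the prompt]\n{prompt}"

def answer(question: str) -> str:
    passages = "\n".join(retrieve(question))
    prompt = (
        "Answer strictly from these passages; say 'unknown' otherwise.\n"
        f"Passages:\n{passages}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

print(answer("Who were Cardinal Richelieu's brothers?"))
```

The shape is what matters here: the knowledge lives outside the model, and the model only puts the retrieved facts into words.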