‘Bots Are Simply Imitators, Not Artists’: How to Distinguish Artificial Intelligence from a Real Author
Today, text bots such as ChatGPT perform many tasks that used to be done only by humans. On our behalf, they can rewrite ‘War and Peace’ in a Shakespearean style, write a thesis on Ancient Mesopotamia, or compose a Valentine’s Day card. But is there any way to identify an AI-generated text and distinguish it from one written by a human being? Can we catch out a bot? Vasilii Gromov, Deputy Head of the HSE School of Data Analysis and Artificial Intelligence and Professor at the HSE Faculty of Computer Science, addressed this question in his lecture ‘Catch out a Bot, or the Large-Scale Structure of Natural Intelligence’ for the Znanie intellectual society.
‘Why are modern texts created, and who writes them?’ asked Vasilii Gromov. His generation, and that of his audience, grew up on works written by people for people, the professor notes: whether the book was ‘Sleeping Beauty,’ ‘War and Peace,’ or a textbook of mathematical analysis, its author put a certain meaning into it and had a certain goal. Nowadays, however, children are surrounded from a very early age by texts written by an unknown author, with an unclear purpose, for an undefined audience. Vasilii Gromov and his colleagues wondered whether such a child would grow up the same way previous generations did.
This change is neither good nor bad; it is simply how the world is transforming. Humankind is now experiencing a ‘co-evolution of artificial intelligence and humans’: as AI develops rapidly, it adapts to humans, but humans are also beginning to adapt to AI. To secure our future, or at least for the sake of ‘basic information hygiene,’ we need to learn to distinguish texts written by people from those generated by bots (artificial intelligence systems that generate texts in natural languages such as Russian, Chinese, etc.).
Given a collection of texts generated by a specific bot, it would not be difficult to determine whether a new text was written by that bot or by a human: we would simply train a neural network on a large number of such texts, and the mission would be accomplished. However, after that, no one would continue using that particular bot, and it would simply be replaced by another AI. Scientists therefore need a mechanism capable of distinguishing any bot from any human. To build one, we need to look at the structure of language itself, which brings us to research that describes natural languages from a mathematical point of view. Let’s take a look at the necessary steps.
The field of natural language processing deals, in particular, with representing words and word sequences (n-grams, where n is the number of words in the sequence) as vectors, i.e., ordered lists of numbers. Together, these vectors form a vector space.
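A minimal Python sketch of this representation, assuming that an n-gram vector is built by concatenating the vectors of its words (one common choice; the article does not say how the HSE team builds them). The three-word vocabulary and its 2-dimensional vectors are hypothetical toy values; real systems use learned word vectors with many more dimensions.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous sequences of n words, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_vector(gram: Tuple[str, ...], embeddings: Dict[str, Vector]) -> Vector:
    """Represent an n-gram by concatenating its word vectors (an assumed scheme)."""
    vec: Vector = []
    for word in gram:
        vec.extend(embeddings[word])
    return vec


# Hypothetical 2-dimensional word vectors, for illustration only.
toy_embeddings: Dict[str, Vector] = {
    "the": [0.1, 0.0],
    "cat": [0.9, 0.3],
    "sat": [0.2, 0.8],
}

tokens = ["the", "cat", "sat"]
bigrams = ngrams(tokens, 2)  # [('the', 'cat'), ('cat', 'sat')]
vectors = [ngram_vector(g, toy_embeddings) for g in bigrams]
# Each bigram becomes a point in a 4-dimensional space:
# [[0.1, 0.0, 0.9, 0.3], [0.9, 0.3, 0.2, 0.8]]
```

Under this scheme, n-grams live in a space whose dimension is n times the dimension of the word vectors.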
Working with individual words shows that the vocabulary of bots is no different from that of an ordinary person. However, as soon as we look at sequences of two or three words, it turns out that the sequences generated by bots are significantly more predictable and linguistically much poorer than those produced by even the least educated person (for example, a bot is more likely to repeat patterns). The difference between the n-gram sequences of bots and people is statistically significant even for large models such as ChatGPT, and this is what helps catch them.
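One simple way to quantify how predictable a text's n-grams are is to measure the share of distinct n-grams and the Shannon entropy of their distribution: repetitive text scores lower on both. This sketch is illustrative only and is not the statistical test used in the study; the toy token lists are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def _ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-word sequences."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_ratio(tokens: List[str], n: int) -> float:
    """Share of n-grams that are distinct (1.0 means no n-gram repeats)."""
    grams = _ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0


def ngram_entropy(tokens: List[str], n: int) -> float:
    """Shannon entropy, in bits, of the empirical n-gram distribution."""
    grams = _ngrams(tokens, n)
    if not grams:
        return 0.0
    total = len(grams)
    counts = Counter(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical token streams: one repetitive, one varied.
repetitive = "a b a b a b a b".split()
varied = "a b c d e f g h".split()

rep_ratio = distinct_ratio(repetitive, 2)    # 2 distinct out of 7 bigrams
rep_entropy = ngram_entropy(repetitive, 2)   # about 0.985 bits
var_ratio = distinct_ratio(varied, 2)        # 1.0: every bigram is new
var_entropy = ngram_entropy(varied, 2)       # log2(7), about 2.807 bits
```

Lower values on both measures indicate a more predictable, pattern-repeating text; comparing bots and people would require many texts and a proper significance test.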
Studying natural language further from a mathematical point of view leads researchers to conclusions about where these word vectors are located in space. There are regions of the vector space (especially for word sequences) that only bots visit, and others that only people visit. Most regions (90–95%) are used by both, but the existence of separate bot-only areas offers another way to catch bots out.
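To make ‘regions visited only by bots’ concrete, one can, for example, split the vector space into a grid of cells and compare which cells each group's vectors fall into. The grid discretisation, the `cell_size` parameter, and the sample points below are assumptions for illustration; the article does not describe how the regions were actually defined.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Cell = Tuple[int, ...]


def occupied_cells(vectors: Iterable[List[float]], cell_size: float) -> Set[Cell]:
    """Map each vector to the grid cell that contains it (hypothetical discretisation)."""
    return {tuple(math.floor(x / cell_size) for x in v) for v in vectors}


def region_overlap(bot_vecs: Iterable[List[float]],
                   human_vecs: Iterable[List[float]],
                   cell_size: float) -> Dict[str, int]:
    """Count grid cells visited by both groups, by bots only, and by humans only."""
    bot = occupied_cells(bot_vecs, cell_size)
    human = occupied_cells(human_vecs, cell_size)
    return {
        "shared": len(bot & human),
        "bot_only": len(bot - human),
        "human_only": len(human - bot),
    }


# Hypothetical 2-D n-gram vectors.
bot_points = [[0.1, 0.1], [0.2, 0.3], [2.5, 2.5]]
human_points = [[0.4, 0.2], [1.5, 0.1], [3.9, 3.9]]
overlap = region_overlap(bot_points, human_points, cell_size=1.0)
# overlap == {"shared": 1, "bot_only": 1, "human_only": 2}
```

In high-dimensional spaces a fixed grid quickly becomes too sparse to be useful, so a real analysis would need another way of defining regions; the grid only illustrates the counting.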
If we cluster bot-generated sequences (clustering is a mathematical operation that combines sets of similar elements into groups called clusters), the resulting clusters turn out to be rigid and compact, with little variation. When the word sequences of people of different genders, ages, education levels, and backgrounds are clustered, the result is blurrier, less distinct clusters. In this sense, human thinking is significantly less clear-cut than that of bots, and this is another way to catch them.
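The difference in cluster compactness can be illustrated with a basic k-means clustering followed by the mean distance of points to their cluster centre. k-means with farthest-point initialisation is a stand-in chosen for simplicity; the article does not say which clustering method the researchers used, and the point sets are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty set of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> Tuple[List[int], List[Point]]:
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centre...
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # ...then move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels, centers


def mean_spread(points: Sequence[Point], labels: List[int], centers: List[Point]) -> float:
    """Average distance from each point to its cluster centre (lower means more compact)."""
    return sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)


# Hypothetical data: the same two groups, tight ("bot-like") versus loose ("human-like").
tight = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
loose = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]

tight_labels, tight_centers = kmeans(tight, k=2)
loose_labels, loose_centers = kmeans(loose, k=2)
tight_spread = mean_spread(tight, tight_labels, tight_centers)
loose_spread = mean_spread(loose, loose_labels, loose_centers)
# Both runs give labels [0, 0, 0, 1, 1, 1]; loose_spread is larger than tight_spread.
```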
If we represent each word or each n-gram as a vector, then their entire collection can be represented as a geometric object, a kind of surface in a multidimensional space. If, for example, we take all possible word sequences in Russian, we may find that they do not fill the entire semantic space, but only part of it. Scientists can study and measure this set as a surface, and even compare it with other surfaces (for example, that of the English language). Every surface in space has a dimension, i.e., the number of independent parameters needed to describe a point on it (for points on a sphere, for example, these are two values: longitude and latitude).
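The dimension of a point cloud can be estimated from the data alone. One standard estimator is the correlation dimension (Grassberger–Procaccia): the fraction C(r) of point pairs closer than r grows roughly like r^D, so D is the slope of log C(r) against log r. The sketch below checks this on points sampled from a sphere (expected D ≈ 2, matching longitude and latitude) and from a circle (expected D ≈ 1). The estimator, radii, and sample size are assumptions; the article does not specify how the dimension of language was measured.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def correlation_dimension(points: List[Point], r_small: float, r_large: float) -> float:
    """Slope of log C(r) between two radii, where C(r) is the share of pairs closer than r."""
    n_small = n_large = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d < r_small:
                n_small += 1
            if d < r_large:
                n_large += 1
    if n_small == 0:
        raise ValueError("no pairs closer than r_small; use more points or a larger radius")
    # C(r_large) / C(r_small) == n_large / n_small, since both share the same pair count.
    return math.log(n_large / n_small) / math.log(r_large / r_small)


def sample_sphere(n: int, rng: random.Random) -> List[Point]:
    """Uniform points on the unit sphere (normalised Gaussian vectors): a 2-D surface in 3-D."""
    pts = []
    for _ in range(n):
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        pts.append((x / norm, y / norm, z / norm))
    return pts


def sample_circle(n: int, rng: random.Random) -> List[Point]:
    """Uniform points on a unit circle in 3-D: a 1-D curve."""
    return [(math.cos(t), math.sin(t), 0.0)
            for t in (rng.uniform(0, 2 * math.pi) for _ in range(n))]


rng = random.Random(0)
sphere_dim = correlation_dimension(sample_sphere(800, rng), 0.2, 0.8)  # close to 2
circle_dim = correlation_dimension(sample_circle(800, rng), 0.2, 0.8)  # close to 1
```

Both clouds sit in the same 3-dimensional space, yet the estimator recovers the number of parameters needed to describe a point on each. For language data the points would be n-gram vectors, and the estimate would depend strongly on the chosen radii and the amount of data.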
Studying the dimension of natural language, Vasilii Gromov expected it to be infinite, but in the end the analysts concluded that language has a dimension of about 9–10. This figure varies slightly from language to language, but one thing is certain: human language occupies a space of higher dimension than bot language.
Finally, a recent 2023 study showed that this surface has ‘holes’ in it, like Swiss cheese. The holes are areas of semantic space that our language has not yet reached. Although analysts cannot yet say clearly what lies behind them, they can detect them. Different languages have different holes, also referred to as ‘blind spots.’ When catching bots, it is important to remember that people are drawn to the boundaries of these holes, because they use language to create new meanings and ideas. Bots, on the other hand, being trained programs, move away from the holes, which for now makes catching them easier. Surprisingly, it is humour that most often appears at the boundaries of such holes.
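Holes of this kind are generally studied with topological data analysis. As a much cruder stand-in for the persistent homology commonly used for this, the sketch builds a Vietoris–Rips complex at a single scale r and counts one-dimensional holes as b0 − χ (connected components minus the Euler characteristic). This shortcut assumes the complex has no higher-dimensional voids, which holds for the flat toy grids here but not in general. The grids, the scale, and all function names are hypothetical; the article does not describe the 2023 study's method.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def rips_cliques(points: List[Point], r: float) -> List[List[int]]:
    """All simplices of the Vietoris-Rips complex at scale r (cliques of the r-neighbourhood graph)."""
    n = len(points)
    nbrs = [{j for j in range(n) if j != i and math.dist(points[i], points[j]) <= r}
            for i in range(n)]
    cliques: List[List[int]] = []

    def extend(clique: List[int], candidates: Set[int]) -> None:
        cliques.append(clique)
        for v in sorted(candidates):
            if v > clique[-1]:  # grow in increasing index order so each clique appears once
                extend(clique + [v], candidates & nbrs[v])

    for i in range(n):
        extend([i], nbrs[i])
    return cliques


def count_components(n: int, cliques: List[List[int]]) -> int:
    """Number of connected components (b0), via union-find over the edges."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for c in cliques:
        if len(c) == 2:
            parent[find(c[0])] = find(c[1])
    return len({find(i) for i in range(n)})


def estimate_holes(points: List[Point], r: float) -> int:
    """b1 estimated as b0 minus the Euler characteristic, ASSUMING no higher-dimensional voids."""
    cliques = rips_cliques(points, r)
    euler = sum((-1) ** (len(c) - 1) for c in cliques)
    return count_components(len(points), cliques) - euler


# Hypothetical toy data: a 5x5 grid, and the same grid with its centre point removed.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
holed = [p for p in grid if p != (2.0, 2.0)]
print(estimate_holes(grid, 1.5))   # 0: the filled grid has no hole
print(estimate_holes(holed, 1.5))  # 1: removing the centre leaves one hole
```

Persistent homology would instead track holes across all scales r and keep those that persist; a single-scale count like this one is sensitive to the choice of r.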
‘Bots are simply imitators, not artists. Technology does not stand still, so we must try to solve this “bot-catching” problem and understand what a language is from a mathematical point of view,’ summarised Vasilii Gromov.