‘Data Mining Can Help Forecast the Pandemic Situation with an Accuracy Within 2.5%’
A mathematical model of the spread of Covid-19 in the Nizhny Novgorod Region, created by the Big Data Laboratory at the Nizhny Novgorod Development Strategy Project Office, has been widely discussed in the media and on social networks. The research was led by Anastasia Popova, a master’s student at HSE University in Nizhny Novgorod, a repeat winner of machine learning competitions, and a winner of the Ilya Segalovich Award from Yandex. In the following interview, given on April 15, Anastasia speaks about how the model was developed, the data it uses, and its potential long-term applications.
— What data is the coronavirus spread model based on? And how well equipped is the Big Data Laboratory to undertake such studies?
— The Big Data Laboratory, which I head, works on a variety of development programmes for the region, from transport models to research and education centres.
The Nizhny Novgorod Development Strategy Project Office tasked the Laboratory with forecasting how the epidemic situation would develop in the region. We were to use mathematical calculations to predict how the spread of Covid-19 in the Nizhny Novgorod Region would be affected by people behaving more responsibly or, on the contrary, by weaker isolation.
To work on the project, we brought in not only programmers and analysts from our own team, but also other experts, such as epidemiologists from the Volga Region Medical Research University.
— Is the situation developing in line with your forecasts?
— Our calculations were based on data as of April 6 and 7, when there were about 80 detected Covid-19 cases in Nizhny Novgorod, with a daily increase of about 20. For today, we predicted 204 detected cases in the city, while the actual number is 224 (data as of April 13, 2020). By Friday, we forecast about 500 cases, and 1,600 by April 24.
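The gap between a forecast and the observed count can be expressed as a relative error. The sketch below assumes a simple absolute percentage error, since the interview does not say exactly which error metric the Laboratory reports; `relative_error` is a hypothetical helper. With the figures above, 204 forecast cases against 224 observed gives an error of about 8.9%, in line with the error of up to 10% quoted for Nizhny Novgorod later in the interview.

```python
def relative_error(predicted: float, observed: float) -> float:
    """Absolute forecast error as a fraction of the observed value (assumed metric)."""
    if observed == 0:
        raise ValueError("observed value must be non-zero")
    return abs(predicted - observed) / observed


# Figures quoted in the interview: 204 cases forecast, 224 observed (April 13, 2020).
error = relative_error(204, 224)
print(f"{error:.1%}")  # prints 8.9%
```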
Today, Yandex’s self-isolation index has fallen: it used to be above 4, and today it is 2. Moreover, the index does not capture people who move around without using Yandex services. We will see the effect of these violations of self-isolation as a jump in cases about ten days from now.
— What is the main difficulty in building such a model?
— The main difficulty is that the course of the epidemic is influenced not only by frequently changing policies, but also by people’s level of responsibility. That’s why all forecasts are conditional and seek to answer the question: ‘What will happen if such and such measures are implemented?’ There are other factors as well, such as the share of asymptomatic carriers and the level of immunity, which are really hard to estimate. As of today, we are following the scenario with incomplete isolation, which leads to almost 12,000 cases in the near future. In addition, we have a model that predicts the number of cases 5–7 days ahead with an error of 1–2% for Russia and of up to 10% for Nizhny Novgorod.
— What data have you used? Have you analysed the cities that have already passed the peak phase (such as Wuhan)?
— In terms of data analysis, the Covid-19 pandemic is a unique opportunity to work not in a laboratory, but ‘in real life.’ It would be unprofessional to ignore global experience. We have used several sources for our model.
First, we used data on most of the countries and regions that have published Covid-19 statistics, including 297 regions worldwide and 21 provinces in Italy. Second, we are constantly monitoring Russian and international research on Covid-19. And, as I mentioned above, we are in continuous contact with epidemiologists from the Nizhny Novgorod Region.
This means that we have collected data for our model from all over the world over the entire period of the outbreak, both aggregated by country and broken down by regions and smaller territorial units. The analysis covered several dozen cities and regions, in order to identify those whose epidemiological parameters (policies, population size and density) are closest to ours.
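One simple way to rank candidate regions by similarity to a target region is to put every characteristic on a common scale and sort by distance. The sketch below assumes z-score standardisation and Euclidean distance; the interview does not say which similarity measure the Laboratory used. The helper `rank_analogues`, the feature names, and all numbers in the example are hypothetical and serve only as illustration.

```python
import math
from statistics import mean, pstdev


def rank_analogues(target: dict[str, float],
                   candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank candidate regions by Euclidean distance to the target region,
    after z-score standardisation of each feature (assumed method)."""
    features = list(target)
    rows = list(candidates.values()) + [target]
    # Per-feature mean and standard deviation over all regions, target included.
    stats = {}
    for f in features:
        values = [row[f] for row in rows]
        sd = pstdev(values)
        stats[f] = (mean(values), sd if sd > 0 else 1.0)

    def standardise(row: dict[str, float]) -> list[float]:
        return [(row[f] - stats[f][0]) / stats[f][1] for f in features]

    target_z = standardise(target)
    scored = [(name, math.dist(target_z, standardise(row)))
              for name, row in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])  # closest region first


# Hypothetical regions and made-up values, for illustration only.
target = {"population_m": 1.2, "density": 2800.0, "policy_score": 3.0}
candidates = {
    "region_a": {"population_m": 1.1, "density": 2500.0, "policy_score": 3.2},
    "region_b": {"population_m": 8.0, "density": 9000.0, "policy_score": 1.5},
    "region_c": {"population_m": 0.4, "density": 600.0, "policy_score": 4.0},
}
print(rank_analogues(target, candidates)[0][0])  # region_a
```

Standardising first matters because raw density values are thousands of times larger than the other features and would otherwise dominate the distance.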
— Is your current mathematical forecast short-term or long-term in nature?
— The time frame is crucial to our research, because it determines which methods we choose. For a short-term model, we extrapolate the time series with an exponential function. This gives high accuracy for forecasts up to 7 days ahead, as long as the epidemic has not yet reached its plateau. The error of the 7-day short-term forecast for Russia as a whole is less than 2.5%.
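Exponential extrapolation of this kind amounts to fitting the logarithm of cumulative cases as a straight line in time and projecting that line forward. A minimal sketch, assuming an ordinary least-squares fit on log cases; the Laboratory’s exact fitting procedure is not described in the interview, and the function names are hypothetical.

```python
import math


def fit_exponential(cases: list[float]) -> tuple[float, float]:
    """Fit cases[t] ≈ exp(a + b * t) by least squares on log(cases).

    Returns (a, b), where b is the daily exponential growth rate.
    """
    if len(cases) < 2 or any(c <= 0 for c in cases):
        raise ValueError("need at least two positive observations")
    n = len(cases)
    ys = [math.log(c) for c in cases]
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    # Slope and intercept of the ordinary least-squares line through (t, log cases).
    b = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    a = y_mean - b * t_mean
    return a, b


def extrapolate(cases: list[float], days_ahead: int = 7) -> list[float]:
    """Project the fitted exponential `days_ahead` days past the last observation."""
    a, b = fit_exponential(cases)
    n = len(cases)
    return [math.exp(a + b * t) for t in range(n, n + days_ahead)]


# Synthetic series growing exactly 10% per day (illustrative, not real data).
series = [100 * 1.1 ** t for t in range(10)]
forecast = extrapolate(series, days_ahead=7)
print(round(forecast[0]))  # 259
```

Because the fitted curve keeps growing without bound, such a projection is only useful while growth is still close to exponential; near the plateau it will overshoot.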
When we model the entire course of the epidemic, we use a more complex SEIR model, which includes 11 differential equations with 14 variables describing the virus’s epidemiological characteristics, the policies introduced, the specific characteristics of the location, and the preparedness of the local healthcare system.
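For orientation, the basic four-compartment SEIR structure (Susceptible, Exposed, Infectious, Removed) that extended models of this kind start from can be simulated in a few lines. This is a deliberately minimal sketch with a constant transmission rate and forward Euler steps; it is not the Laboratory’s 11-equation model, and the parameter values in the example are illustrative assumptions rather than estimates for Covid-19 or Nizhny Novgorod.

```python
from dataclasses import dataclass


@dataclass
class SEIRParams:
    beta: float   # transmission rate, per day
    sigma: float  # 1 / mean incubation period, per day
    gamma: float  # 1 / mean infectious period, per day


def simulate_seir(params: SEIRParams, population: float, exposed0: float,
                  infectious0: float, days: int,
                  steps_per_day: int = 10) -> list[tuple[float, float, float, float]]:
    """Integrate the basic SEIR equations with forward Euler steps.

    dS/dt = -beta * S * I / N
    dE/dt =  beta * S * I / N - sigma * E
    dI/dt =  sigma * E - gamma * I
    dR/dt =  gamma * I

    Returns one (S, E, I, R) tuple per day, from day 0 to day `days`.
    """
    dt = 1.0 / steps_per_day
    s = population - exposed0 - infectious0
    e, i, r = float(exposed0), float(infectious0), 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # Flows between compartments over one small time step.
            infections = params.beta * s * i / population * dt
            onsets = params.sigma * e * dt
            removals = params.gamma * i * dt
            s -= infections
            e += infections - onsets
            i += onsets - removals
            r += removals
        history.append((s, e, i, r))
    return history


# Illustrative values only: R0 = beta / gamma = 2.5, 5-day incubation, 7-day infectious period.
params = SEIRParams(beta=2.5 / 7, sigma=1 / 5, gamma=1 / 7)
trajectory = simulate_seir(params, population=1_000_000, exposed0=0,
                           infectious0=10, days=180)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print(f"Infectious compartment peaks around day {peak_day}")
```

Policies such as self-isolation would typically enter a model like this by lowering `beta` from a given date; the extra equations and variables in a full model add further compartments and location-specific effects.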
Data on Nizhny Novgorod are being actively accumulated, and the situation is changing by the hour. Even so, we still lack the material needed to build precise models.
That’s why we are focusing on modelling the entire course of the epidemic in the Nizhny Novgorod Region. The coefficients for this model were chosen using data from China (excluding Hubei), since China has already defeated the epidemic and we can observe all its stages. Some of the parameters were chosen statistically and on the basis of epidemiologists’ opinions, while the rest are based on the time series of the cumulative number of cases in Nizhny Novgorod (at the time the model was built, there were 80 confirmed cases in Nizhny Novgorod, with a daily increase of 24). The model error per 11,500 people is 9% for a 7-day forecast.
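The split described here, with some parameters fixed from statistics and expert opinion and the rest fitted to the cumulative case series, can be illustrated by holding the incubation and infectious periods fixed and fitting only the transmission rate. This is a minimal sketch under stated assumptions: a basic SEIR model rather than the Laboratory’s extended one, a grid search over a single parameter, a squared error on log cumulative cases, and the simplification that every new infectious case is detected. The function names are hypothetical, and the example uses synthetic data, not real case counts.

```python
import math


def cumulative_cases(beta: float, sigma: float, gamma: float, population: float,
                     infectious0: float, days: int, steps_per_day: int = 10) -> list[float]:
    """Basic SEIR run (forward Euler); returns cumulative infectious cases per day.

    Assumes every case entering the infectious compartment is detected.
    """
    dt = 1.0 / steps_per_day
    s, e, i = population - infectious0, 0.0, float(infectious0)
    cumulative = float(infectious0)
    series = [cumulative]
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * s * i / population * dt
            onsets = sigma * e * dt
            removals = gamma * i * dt
            s -= infections
            e += infections - onsets
            i += onsets - removals
            cumulative += onsets  # newly infectious cases are counted once
        series.append(cumulative)
    return series


def fit_beta(observed: list[float], sigma: float, gamma: float, population: float,
             grid: list[float]) -> float:
    """Pick the beta on `grid` whose simulated curve best matches `observed`
    (sum of squared differences of log cumulative cases).
    sigma and gamma stay fixed, e.g. taken from expert estimates."""
    days = len(observed) - 1

    def loss(beta: float) -> float:
        simulated = cumulative_cases(beta, sigma, gamma, population, observed[0], days)
        return sum((math.log(sim) - math.log(obs)) ** 2
                   for sim, obs in zip(simulated, observed))

    return min(grid, key=loss)


# Synthetic check: generate a curve with a known beta, then recover it.
true_curve = cumulative_cases(0.5, 1 / 5, 1 / 7, 1_000_000, 80, 14)
grid = [round(0.1 + 0.01 * k, 2) for k in range(91)]  # 0.10 ... 1.00
print(fit_beta(true_curve, 1 / 5, 1 / 7, 1_000_000, grid))  # 0.5
```

Refitting daily as new counts arrive simply means calling `fit_beta` again on the longer series.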
— Will you be improving your calculations? If so, how often?
— We are working on the model and updating it as new data come in. This is very important, because Nizhny Novgorod is only beginning to see a sharp growth in cases. We update our coefficients and forecasts daily. We are now making the model more detailed, so that it accounts for more epidemic-prevention measures and the degree of compliance with them, as well as for factors related to healthcare system preparedness, such as the number of equipped hospital beds and available ventilators.
— Do you believe it’s necessary to self-isolate with the Covid-19 pandemic spreading?
— For me, it is completely obvious that the main factors of an optimistic scenario would have been the timely introduction of almost complete home isolation on March 28, a Yandex self-isolation index of 4.5, and maintaining home isolation until the end of the epidemic. The factors of a realistic scenario would be the partial lifting of home isolation on April 6, a Yandex self-isolation index of 3.8, and maintaining home isolation until the end of the epidemic.
I believe that self-isolation should be as strict as possible; otherwise, the epidemic will become uncontrollable, and many more people will suffer.
During the Covid-19 epidemic, it is essential to act preventively, since the lag between introducing a policy and seeing its effect is about two weeks.
And thousands of people may fall ill during those two weeks. That’s why I believe the Nizhny Novgorod authorities were wise to take preventive measures when there were only 11 confirmed cases. This will help us avoid a huge number of victims, but only if all city residents act consciously and responsibly.
Unfortunately, the self-isolation index is gradually falling. But I very much hope that Nizhny Novgorod residents will prove to be responsible. Each of us should understand that when we violate the self-isolation regime, we endanger not only our own health, but also the health and lives of other people.
— You are finishing your studies in the Master’s programme in Data Mining, aren’t you?
— Yes, this year I’m graduating from HSE University. The tasks set by our teachers have been very interesting and, importantly, applied. Initially, my research project was dedicated to recognizing human emotions in speech, which could help improve the quality of security systems. My graduation thesis is about image recognition: increasing the information capacity of feature vectors extracted from images by high-precision neural networks, using a person re-identification approach. I love participating in projects that can optimize certain processes or prevent negative scenarios from occurring. This is my way of changing the world for the better.
Interview by Yulia Guseva