Something to Chew On: Modeling a pandemic - How the analysis of big data joined with biological and social scientific research helps in understanding a pandemic spread, with Dr. Caterina Scoglio, professor in electrical and computer engineering

Apr 29, 2020

This episode brings another timely discussion about the challenges caused by the current worldwide pandemic. Dr. Caterina Scoglio, Paslay chair professor in the Mike Wiegers Department of Electrical and Computer Engineering at Kansas State University, explains the use of modeling in predicting the spread of epidemics. Dr. Scoglio specializes in developing theoretical models for the spread of disease. By using generalized epidemic modeling framework software for the simulation of spreading, she applies models developed by her team to human and animal infectious diseases. Scoglio has developed models for the movement of Ebola in Africa and for protein corona formation in nanoparticles, which have been validated by experimental data. She has also developed network architectures and protocols for secure communication in smart grids.

Transcript:

How the Analysis of Big Data Joined with Biological and Social Scientific Research Helps in Understanding a Pandemic Spread, with Dr. Caterina Scoglio, Professor in Electrical and Computer Engineering

Now, in my experience, predictions are very hard. So predictions, I think, are only valid in the short term. So maybe it is possible to predict how many cases there will be in New York tomorrow and the day after tomorrow. But I think it is very hard to say how many deaths there will be in total during the epidemic right now. So I think making the comparison between the different scenarios, that is the best use of the model.

Something to Chew On is a podcast devoted to the exploration and discussion of Global Food Systems produced by the Office of Research Development at Kansas State University. I'm Maureen Olewnik, coordinator of Global Food Systems.

I’m Scott Tanona. I'm a Philosopher of Science.

And I'm Jon Faubion. I'm a Food Scientist.

Hello, everybody, and welcome back to Something to Chew On. Managing life within a worldwide epidemic that necessitates both isolation and outreach provides an underlying platform to focus on what we can do in the moment to help. Today's guest, Dr. Caterina Scoglio, Professor of Electrical and Computer Engineering at Kansas State University, has been instrumental in developing big data based methods to understand how to help in the moment. An interdisciplinary approach to her work has provided Dr. Scoglio with the ability, through modeling systems, to tackle questions of disease outbreak by developing new tools to provide a risk assessment before infection happens and guidance in implementing preventative measures. Dr. Scoglio’s research focuses on developing network based technology and tools in several fields. She conducts research in network theory problems and develops solutions to real world problems in the fields of computer networks and infectious disease modeling. She has developed theoretical models, and has applied models and tools developed by her team to human and animal infectious diseases. Today's podcast is again being recorded through Zoom. Jon, Scott, Courtney, our guest, Dr. Caterina Scoglio, and I are all practicing social distancing as we converse through the phone or our computer connections. The current challenges with COVID-19 in the US and around the world are on the upswing, and we're facing ordeals that many of us never considered possible here. However, there are those that have considered this possibility and focus a good portion of their life's work on just this type of situation. Here in Manhattan, Kansas, the National Bio and Agro-Defense Facility is nearing completion. This facility will focus on protecting the national food system against threats of potential impact of serious animal diseases.
Additionally, K-State is home to the Biosecurity Research Institute, where comprehensive infectious disease research and teaching on threats to plant, animal, and human health is carried out. Research in the fundamental science of infectious outbreaks is critical, but understanding the social science side of how people manage these situations, and the way in which these activities promote the spread of infection, is equally critical. K-State does have the ability to marry the social aspects with big data analysis and identify the potential for the spread of these diseases. Today, I would like to welcome Dr. Caterina Scoglio to the podcast. Caterina, your training is in electronic engineering, which on the face of it, to me, seems a long way from research carried out at the Biosecurity Research Institute or other science based activities.

Yes, that is true. That is true. As a matter of fact, many people are surprised by my research topic. There is a reason for that. My research work when I joined K-State in 2005 was on computer networks. And one of the biggest challenges in computer networks was, and is still, malware propagation. So the fact that there are some viruses that can spread and knock down all our computers. And so I started working and studying that problem. And I understood that the methods that were used in epidemiology were also used for studying computer networks. And so from that, and understanding that K-State had a strong emphasis on infectious diseases, I decided in 2007 to start working on the spreading of viruses among people and animals.

Wow, that's very interesting. As we get started in the discussion and get into more detail on this, I think that there are going to be a lot of questions that all three of us have for you. Could we step back just a little bit and maybe get a little background on you who you are, what your history is, what brought you to, to the kind of work that you're enjoying doing?

Yes. So I was born in Italy. And I studied and worked the first period of my life in Italy, in a National Research Center. And in 2000, my husband had the opportunity of moving to the US, because he was working for IBM Italy and he got a job from IBM US. So we moved to Atlanta. And at that time, my research work was on computer networks. And so I worked for five years at Georgia Tech, the Georgia Institute of Technology. After those five years, I applied to many places. And I got five interviews. Among those, I liked K-State the best, and I moved to Manhattan, Kansas.

Wonderful.

So in your work, you're using the same ideas that you've applied to malware propagation that you're applying to biological virus propagation. Is that correct?

So it is just the spreading process. Obviously, the details are completely different. As a matter of fact, I'm not an epidemiologist, I'm not a biologist, and my work is always multidisciplinary. So I need the guidance of biologists, virologists, people that are experts on the specific pathogen. But then the modeling approach has one unifying theme, which is related to how to simulate stochastic processes of spreading.

So how would, say, an immunologist, or a virologist, or an epidemiologist take the results that you have and add their own particular value to them? I assume that they're most useful in the context of the larger field of science?

Yes. So we receive the input in the modeling phase. The initial phase is the modeling: how do we model this process? What can we use as the infection rate? How aggressive is this disease? How quickly can people recover? So we get from them a lot of preliminary information, then we do our simulations. And then we provide them with results, normally in different scenarios. So if we do this, for example, then this will happen.

So Caterina, could you say whether these models are for application, to be used to deliver information to policymakers to help in their decision making, or are you developing new models and doing more theoretical work?

So we are doing work in both directions. For example, we are trying to improve the models all the time. There are some assumptions in the models that are not supported by empirical evidence, so we try to modify the model in order to follow that empirical evidence. On the other side, we also try the models in different scenarios. So we can tell, for example, what is useful and what is not, and what to do, and this is something that has been discussed a lot, to reduce R naught, which is the reproductive ratio, to a number less than one, which will guarantee that an epidemic will die out.

So some of the modeling work that you're doing, it doesn't just input the R naught or zero number, but actually helps determine that?

So, for example, in a very recent work we have done for COVID-19 in China, because we had the data, we were trying to couple the policies implemented at a given point in China by the Chinese government with the data that we were seeing, and saying that there are different strategies that can replicate the behavior. And, for example, if you just do social distancing, but you don't use masks, you reduce the number of cases, but maybe you don't get to R naught less than one.

Or if you include the fact that you've got up to four days of asymptomatic infectivity, that changes as well.

Exactly the type of different scenarios.

So for the people who have been following this and have looked at some models, right, there are these things we were talking about separately, right? There are compartmental models, right? Where we've got the susceptible and the infectious and the recovered people, and you treat them as all different groups, right? And then you look at the transmission rate between them. Right? So is this you doing additional modeling to then determine the figures that go into that? Or is this all sort of one big package, you know, you're working within a model to figure out what happens? And then you're including more variables or more factors, right, besides just some simple assumption about, you know, a standard transmission rate from one group to another?

Yes, yes. So we try to include more and more information. The more data we can include in the model, the more accurate the model is going to be. And one specific viewpoint of our modeling is based on understanding the impact of the contact network on spreading. And the concept is simple: in a spreading scenario, not every person has the same role. It depends on the connections and the movement of each person. So if you think about a network of nodes and links, there are people that have a lot of links. Those are the so-called super spreaders. And it's very important to understand their role in the epidemic. So vaccinating them, or at least educating them, is extremely important.
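
The idea of nodes, links, and highly connected people can be sketched in a few lines of Python. This is only an illustration, not the software used by Dr. Scoglio's team: the function names and the toy contact list are hypothetical, and ranking people by their number of contacts is just one simple way to look for potential super spreaders.

```python
from collections import defaultdict

def build_contact_network(contacts):
    """Build an undirected contact network from (person, person) pairs."""
    network = defaultdict(set)
    for a, b in contacts:
        network[a].add(b)
        network[b].add(a)
    return dict(network)

def most_connected(network, top_n=1):
    """Return the top_n people ranked by number of contacts (ties broken by name)."""
    ranked = sorted(network, key=lambda person: (-len(network[person]), person))
    return ranked[:top_n]

# Hypothetical toy data: "hub" is in contact with everyone else.
contacts = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
network = build_contact_network(contacts)
degrees = {person: len(neighbors) for person, neighbors in network.items()}
top = most_connected(network)
```

In a sketch like this, the people returned by `most_connected` would be the first candidates for vaccination or education, in the spirit of the point made above.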

Cool. And so when you mention nodes, this is going back to where you came from, right? This is a network analysis type of thing, where you treat each person as a node with different connections between them, right? And the human connections are, I take it, individual contact of some sort, or proximity, for spreading.

Yes, exactly.

Interesting.

And so what kind of other factors are in there that you get from data, you know, from the social sciences, in terms of how people actually interact and behave? How varied are these nodes you were looking at? I mean, I'm curious, just in general, what kinds of things you have to say here, but you were just looking at data from China, you were saying, and I'm curious about what other factors you bring in there in terms of these networks, right, and how to model them, and how complex it is, and how that translates to other areas.

Yes, exactly. So I think one important experience we had a few years ago was an NSF project. We were awarded it in order to study the shape of these networks in rural regions. And so we went to survey people in Clay Center and Chanute, two towns in Kansas. And we had interesting results, saying that the contact networks of rural regions are very different from the contact networks of urban areas. And we got a very interesting result due to that.

Interesting.

Before COVID-19 came crashing through the door, were you looking at other specific diseases or other different pathogens?

Yeah, so we were looking at influenza, because that was the time when we had the H1N1 outbreak, which at that time was a big deal. Now we see that it's nothing compared with this one. Yes, but it was an important thing. So we went there, and we asked them about their contacts, their movement, and their willingness to follow the directions of radio, TV, and governmental agencies. And in the end, we got interesting results that I can share with you if you are interested. Yes, so what we saw is that, obviously, in rural regions the contacts are much fewer, so people have fewer contacts, but there are very strong ties among people. And what we saw is that 49% of the respondents said that they would still visit at least one or two households outside their home, even if there was a serious epidemic and we told them to remain at home. I think this is still valid.

Absolutely, you can see that happening. This question may be something that you've already discussed. But there were two models in papers that I was reading through: one of them is SIS, and one of them is SAIS. Can you explain a little bit about what those two are, and how they differ from one another? How might one be more appropriate in one situation, and the other in another?

Yes, yes. This is a very interesting question. So one topic that was at the center of attention a few years ago was how to model the behavioral response of people, because the classic models, for example SIS or SIR, do not include anything of that. So people are susceptible, then become infectious. And at the end, if they receive immunity, they are recovered. So they are outside of the game. But in the case of some diseases, for example sexually transmitted diseases, they recover and they are again susceptible. So that is the difference between SIS and SIR: at the end, whether you are immune or not. However, they don't consider anything about the response of people. And for this reason, what we did was to include another compartment, which was the compartment of the susceptible but alert person. So a person is still susceptible, but takes some preventive measures to reduce the risk of becoming infected. And so with this model we saw how the epidemic threshold moves, given a contact network. And so that is a way to quantify the benefit of alertness.
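
The difference between SIS and a model with an extra "susceptible but alert" compartment can be illustrated with a small sketch. It is not the network model described in the conversation: it assumes a well-mixed population instead of a contact network, it assumes that susceptible people become alert at a rate proportional to the current fraction of infectious people, and all names and parameter values (`beta`, `delta`, `kappa`, `beta_alert`) are hypothetical choices made only for the example.

```python
def simulate_sais(beta, delta, kappa, beta_alert, i0=0.01, t_end=200.0, dt=0.01):
    """Euler integration of a well-mixed SIS model with an optional alert compartment.

    s, a, i are population fractions: susceptible, alert (still susceptible,
    but infected at the lower rate beta_alert), and infectious.
    Setting kappa = 0 recovers the plain SIS model.
    """
    s, a, i = 1.0 - i0, 0.0, i0
    for _ in range(round(t_end / dt)):
        infected_from_s = beta * s * i        # S -> I
        infected_from_a = beta_alert * a * i  # A -> I (reduced risk)
        become_alert = kappa * s * i          # S -> A (assumed alerting rule)
        recover = delta * i                   # I -> S (no immunity, as in SIS)
        s += dt * (recover - infected_from_s - become_alert)
        a += dt * (become_alert - infected_from_a)
        i += dt * (infected_from_s + infected_from_a - recover)
    return s, a, i

# Hypothetical parameters: same disease, with and without alertness.
_, _, infectious_sis = simulate_sais(beta=0.3, delta=0.1, kappa=0.0, beta_alert=0.0)
_, alert_sais, infectious_sais = simulate_sais(beta=0.3, delta=0.1, kappa=0.5, beta_alert=0.05)
```

Comparing `infectious_sis` with `infectious_sais` gives one rough way to quantify the benefit of alertness under these assumptions; an SIR variant would instead move recovered people into a separate immune compartment rather than back to susceptible.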

Well, I've seen more and more posts, just in the last few days, of people or organizations listing things that you might expect susceptible but alert individuals to do, and specifically saying, don't do these things: even if it is your family, don't go visit them. They're no different than anyone else, etc. So I think it's really the first time I've seen this since we started worrying about it so much.

Yeah, so this is a way to model that behavior.

So I don't want to have us dig down too much into the different models. But I find all this kind of stuff fascinating. And I'm curious. So you introduce this other compartment, and that's one way of thinking about how to make models more realistic, right? Not all susceptible people are the same, right? There's a difference. But this model, I take it, starts from the two compartment model, right? So it's got the individuals. Does it treat all the individuals in the compartments the same? Or do you model different kinds of alertness, or spatial variations in alertness across regions, or over time? Could you say more about this?

Yes, so our models are network models. So all the nodes that are susceptible are not in the same condition. Their risk of becoming infected depends on their position in the network. So our model is an individual based model. The most classic models consider an aggregate of homogeneous susceptible people, then homogeneously distributed infected people, and so on. So one characteristic of our approach is that it is individual based. So each node is in a different position, because of the role it plays in the network.

Interesting, too. And this also indicates one of the reasons somebody like you is involved in doing this modeling, right? Because that is a much more complex analysis, right? Because you're treating each person as an individual, and you have many, many, many connections, you know, within the model, whereas it's much simpler to do much simpler models, right? I can do some back of the envelope kind of calculations looking at information I've got, but that's nothing. And then, you know, an epidemiologist might be able to do some basic stuff. And then there are computational epidemiologists who do more work. And then the more complex the model gets, the more you really have to be thinking about how you're doing the computing work. So I wonder if you could say something about the range of that complexity, and the different kinds of ways, from simple to complex, you can think about modeling the real world question, you know, that we're dealing with now with COVID, and what that whole range looks like.

And I think that includes the jump between data and big data and walking into that realm of how you handle big data.

Right, exactly. So there is this whole range of models of increasing complexity, starting from the homogeneous differential equation based models, where you have a system of three differential equations and you can easily solve them numerically, up to what are called agent based simulations. The agent based simulations are very complex and very expensive: you simulate each individual with an agent, which is a simple piece of code. So you need to input the schedule of each individual. And then in that type of simulation, you can have all the details about the community. And you can input in those models the behavior that you want to represent, down to the detail of the single individual. Obviously, these are huge simulations and require big computers, high performance computers, in order to be run. But it's also true that these complex models also require a lot of data. In order to be meaningful, you need to know really what people are going to do. And sometimes we do not have that level of data. The more big data sources are available, the more those agent based simulations can be realized.
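
At the simple end of that range, the three differential equations of the classic homogeneous SIR model can be solved numerically in a few lines. The sketch below uses plain Euler integration and hypothetical parameter values; `beta` and `gamma` are illustrative transmission and recovery rates, and in this simple model the reproductive ratio is taken as `beta / gamma`.

```python
def simulate_sir(beta, gamma, i0=0.001, t_end=300.0, dt=0.01):
    """Euler integration of the homogeneous SIR equations:

        ds/dt = -beta * s * i
        di/dt =  beta * s * i - gamma * i
        dr/dt =  gamma * i

    s, i, r are fractions of the population. Returns the final s, i, r
    and the highest infectious fraction seen during the run.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(round(t_end / dt)):
        infections = beta * s * i
        recoveries = gamma * i
        s -= dt * infections
        i += dt * (infections - recoveries)
        r += dt * recoveries
        peak = max(peak, i)
    return s, i, r, peak

# Two hypothetical scenarios: reproductive ratio above and below one.
growing = simulate_sir(beta=0.25, gamma=0.1)  # beta / gamma = 2.5
fading = simulate_sir(beta=0.05, gamma=0.1)   # beta / gamma = 0.5
```

With the reproductive ratio below one, the infectious fraction in this sketch never rises above its starting value, which is the "epidemic will die out" behavior mentioned earlier in the conversation. An agent based simulation replaces these three aggregate numbers with one simulated state per individual.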

In what form do you get your data, for example, from China?

Oh, so we got that from the publicly available data. They were, I think, text files published daily for each location, I believe. Something very interesting at this point will be for those people who can obtain the movement tracing of people. So if we have all the data, either through cell phones or through other sources, this kind of internet data of how people move, that will be very, very useful and very interesting.

Now, so one of the projects that you worked on in the past was on Ebola and the movement of that outbreak. Can you talk a little bit about that research?

Yes, we were in that research. There, also, we were supported by NSF to establish the efficiency and the efficacy of contact tracing, because at the beginning of an epidemic, contact tracing is very effective. So you can really try to detect a person, and if that person is infected, isolate them immediately and trace all the people that were in contact with that person. And so we were analyzing the conditions for how fast we need to detect an infected person, and how fast we need to follow up with contact tracing. And the result was that the earlier you do all these steps, the better it is. So in the end, I think, and unfortunately we haven't done a good job of this for COVID-19, early mitigation is the most effective policy, because, I think, you can be ahead of the epidemic rather than trying to follow it. I think in one of these meetings, one of the White House doctors was saying we are kind of trying to follow the disease. We should anticipate it, we should be ahead of it rather than following it.

So right now with COVID-19, we're in a situation where testing is much more limited than what epidemiologists want. Certainly, it's been pretty hard to do the contact tracing, because information sometimes doesn't come back in enough time, and at this point we're pretty far spread out. Right, are there lessons we can draw from some of the modeling about where somebody might still be trying to do that? So there are some areas where you're way beyond contact tracing, right, it's just so endemic in the community that it seems there's probably no point, you know, but at the edge of some outbreaks, you might still be able to catch it and stop it in that community. Are there lessons from modeling to predict, here's where you should put your contact tracing efforts, and here's where you should be doing other mitigation type efforts?

Yes, I think this is a very good point. So contact tracing is going to be valid where the epidemic is still not in the community in a broad way. So some of the rural communities, I think, are in this condition, not only because they are at the beginning of the epidemic, but also because the contact network is kind of structured. So you have few contacts, or at least many fewer contacts.

Yeah, right, the public transit in a big city or something like that, right?

Exactly, the public transportation. But also, since you have few of these contacts, you have some locations that are very critical, because many communities will have just a few locations where they go for shopping or for the pharmacy, where people can get infected, but where they can also be tested and educated. And maybe masks can be distributed. So those are, I think, feasible strategies that, if there are resources, could be implemented effectively. And they should also give results.

Yeah, really interesting. Truly, I hate to ask this, but is there any indication that this is going to be happening with COVID-19?

In many places, politicians, I think our governor, have been very active. So I think if she can get some of those tests or masks, maybe that could be good news, because maybe even a small number of those tests will be enough to do a good job in some rural communities.

In one of the studies that I also noted you working on, there was a mention of the development of models of protein corona formation in nanoparticles. What's that one about?

So that was a project we did a few years ago when Jim Riviere started a very interesting and exciting Institute for computational medicine. And so that was a completely different topic. It is the following: when you use nanoparticles, for example, to deliver some medicine or something in the body of a person, while those nanoparticles, which you can imagine as being like some spheres, go through the blood, some proteins will attach on top of them, making what is called a corona. So in this case, a corona of protein around a nanoparticle. And understanding in which proportion different types of proteins will attach on top of the nanoparticles was our research topic.

I think, obviously, the word corona caught my attention as I was reading through that. So different fields, but in the modeling that you're working on, the interest is the same.

You are right, because we call them coronaviruses because you have an internal part, which is made of RNA, I believe. And then you have all these proteins on top of that, around the sphere, forming the corona. So it's similar.

So one of the things that's interesting about this, was this a network analysis tool for that?

No, it was not. It was just based on the different proportions of proteins that were attaching on top of the nanoparticles. It was, if I can say, still the study of a dynamic process, because the amount of protein changes with time and you need to study how that happens. But it was not a network based model.

Interesting. You've also modeled cattle movement, is that right?

Yes.

Could you say something about that?

Yes, yes, because another topic of our research is cattle diseases. And so we have been using similar models for cattle, and for cattle in southwest Kansas we have also been developing synthetic data for cattle movement, because currently there is no mandatory requirement for the cattle industry to provide the movement of their cattle. That would be very useful in the case of an epidemic, because obviously you want to know where infected animals go and to understand which farms are infected from those movements. And so what we did was to create, based on some data we had, some synthetic movements of cattle that people can use freely because they are synthetic. They are not the true cattle movements, they are just simulated.

That's interesting. The podcast that we most recently published was with Dr. Megan Niederwerder, in the veterinary medicine school. I don't know if you're familiar with her at all. One of the things that she talked quite a bit about was the movement of contaminated feed, and the places that it goes, and how difficult it is to manage understanding the feed coming into the country, or where those pigs are, or where the feed is worldwide. And it sounds to me like what you were talking about just now would fit right into that kind of a study in figuring out how you manage something like that.

Yes, that is similar.

Just stepping back to the Ebola work that you had done, are those modeling systems in use today? Are they being used in any part of the world, looking at things that are happening? Or are they on the shelf now, to be pulled off at the first sign of something starting?

A similar model, and we participated in that, was used last year when the Democratic Republic of Congo had an Ebola outbreak that was at risk of spreading into Uganda. Since we had collaborators in Uganda, we visited Uganda and did some work there with our model. And our model was used by the health officials in Uganda to understand the risk of each of their counties getting Ebola, so for a kind of risk assessment. We published the paper, and we also had a good result on which county was going to be the most at risk, because a few cases appeared there, actually. And we published that result before those cases appeared.

So, if you were to be working on what's happening here today, how would you approach that question? So we've got an outbreak here, and somebody comes to Caterina and says, could you please help us with modeling what's going on here? How would you approach that problem?

Yes, using my expertise, the first thing I would do is to try to create, based on data, the contact network of the community. So understanding how people move and how they get in contact. And then from that, I would start building the network based models, considering which compartments I need to include. So for this case, I would definitely include the susceptible, but also the exposed, because we have the exposed compartment where you don't show signs. And then I would include the two types of infectious people: infectious symptomatic and infectious asymptomatic. And so any exposed people can transition to one of these two, and then I would let them transition into either recovered or hospitalized, and from hospitalized to either recovered or dead. And with this model, I would try to do simulations, trying to change something that could, for example, represent social distancing. And how to do that? By changing the network. So normally, I have 15 contacts, my graduate students, but now I don't. So I have now only two contacts, my daughter and my husband. And so adapting the contact network can give us different results. And that is one way, but I can also reduce the infection rate. Because I put on a mask, I wash my hands. So my beta, the probability of becoming infected, is a smaller number. So things like that.
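
The approach just described can be sketched as a small stochastic simulation on a contact network. This is a toy illustration under loudly stated assumptions, not Dr. Scoglio's model: the ring-shaped network, the daily transition probabilities in `PARAMS`, the rule that only symptomatic people are hospitalized, and all function names are hypothetical. Social distancing is represented by giving each person fewer contacts (2 instead of 14, close to the 15 mentioned above), and masks and hand washing by a smaller `beta`.

```python
import random

# Hypothetical daily transition probabilities, chosen only for illustration.
PARAMS = {
    "beta": 0.05,             # chance per infectious contact per day that S -> E
    "exposed_to_inf": 0.2,    # E -> infectious (symptomatic or asymptomatic)
    "frac_symptomatic": 0.6,  # share of new infectious people who show symptoms
    "sympt_to_hosp": 0.05,    # Is -> H
    "recover": 0.1,           # Is -> R or Ia -> R
    "hosp_to_rec": 0.1,       # H -> R
    "hosp_to_dead": 0.02,     # H -> D
}

def ring_network(n, half_degree):
    """Each person is linked to half_degree neighbors on each side of a ring."""
    network = {person: set() for person in range(n)}
    for person in range(n):
        for offset in range(1, half_degree + 1):
            other = (person + offset) % n
            network[person].add(other)
            network[other].add(person)
    return network

def step(states, network, p, rng):
    """Advance every person by one day; states are S, E, Is, Ia, H, R, D."""
    new_states = dict(states)
    for person, state in states.items():
        if state == "S":
            infectious = sum(states[n] in ("Is", "Ia") for n in network[person])
            if rng.random() < 1 - (1 - p["beta"]) ** infectious:
                new_states[person] = "E"
        elif state == "E":
            if rng.random() < p["exposed_to_inf"]:
                symptomatic = rng.random() < p["frac_symptomatic"]
                new_states[person] = "Is" if symptomatic else "Ia"
        elif state == "Is":
            u = rng.random()
            if u < p["sympt_to_hosp"]:
                new_states[person] = "H"
            elif u < p["sympt_to_hosp"] + p["recover"]:
                new_states[person] = "R"
        elif state == "Ia":
            if rng.random() < p["recover"]:
                new_states[person] = "R"
        elif state == "H":
            u = rng.random()
            if u < p["hosp_to_dead"]:
                new_states[person] = "D"
            elif u < p["hosp_to_dead"] + p["hosp_to_rec"]:
                new_states[person] = "R"
    return new_states

def run(network, p, days=120, seeds=5, rng_seed=1):
    """Seed a few symptomatic cases, simulate, and count people per state."""
    rng = random.Random(rng_seed)
    states = {person: "S" for person in network}
    for person in rng.sample(sorted(network), seeds):
        states[person] = "Is"
    for _ in range(days):
        states = step(states, network, p, rng)
    counts = {}
    for state in states.values():
        counts[state] = counts.get(state, 0) + 1
    return counts

normal = ring_network(200, 7)     # 14 contacts per person
distanced = ring_network(200, 1)  # 2 contacts per person
baseline = run(normal, PARAMS)
with_distancing = run(distanced, PARAMS)
with_masks = run(normal, dict(PARAMS, beta=PARAMS["beta"] / 2))
```

Comparing the counts in `baseline`, `with_distancing`, and `with_masks` is the kind of scenario comparison discussed in the conversation. A single random run proves little, so in practice one would average many runs with different values of `rng_seed`.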

One of the things that we haven't quite said, but you mentioned in the beginning, is using these models for decision making. When we look at models and the predictions that come out of a model, out in the public there are very different views about what these things mean. And so I think sometimes when people hear predictions from a model, they think that this is, you know, well, it's a prediction about what's going to happen. And there are models that make predictions, but you just raised there one of the most important things, I think, to come out of models, and this is at least the way I think about it, so I'm curious about what you say, right, which is that they're there to help us make decisions, right? And in particular, for something like this, what kinds of things can we do that are going to make the most difference? And so what you would learn from something like this would be, all right, what should we be putting our energies into? Right? You know, how much is social distancing going to make a difference? And how much is washing our hands going to make a difference? Or let people get the infection, like some people were talking about for COVID-19 at one point, let people get infected, you know, and raise up herd immunity, but then treat them better, or whatever. But then you look at all the different options, and then you look to see, well, how many people are going to be in the hospitalized bin if we did that? And oh, well, that won't work, right? Because we don't have enough beds or something. And so then you look at different kinds of entry points for interventions. So I mean, that's at least the way I think about the usefulness of models. Is that how you think about what the modeling that you do should be used for, or are there other aspects to it, too?

I agree one hundred percent. Models have a big responsibility. There was, a few days ago, an article in The New York Times. So I think we have these two possibilities for using a model. One is for forecasts, and one is for decision making, looking at different scenarios and deciding the mitigation strategy. Now, in my experience, predictions are very hard. So predictions, I think, are only valid in the short term. So maybe it is possible to predict how many cases there will be in New York tomorrow and the day after tomorrow. But I think it is very hard to say how many deaths there will be in total during the epidemic right now. So I think making the comparison between the different scenarios, that is the best use of the model.

As you said, it's a hard thing, right? Because, I mean, we need to forecast too. This is one of these things that I find complicated and interesting: interventions take time and effort and money, and they have other effects, obviously, right? So we're looking at effects on the economy of the lockdown and social distancing. And so you have to look at the forecasts from different scenarios and take them seriously, right, because how many deaths might we prevent if we did this? How many might we prevent if we did something different? So you have to take pieces of the forecasting seriously, but then, as you say, the forecast overall is a different thing. Right? Because the forecast overall is so highly dependent on all these other variables, and on all the different actions that we take, and on so many other things that we can't predict, right, that it becomes hard. But we do still use the forecasting part in small pieces.

Yes, exactly. Exactly. As a comparison, especially in a comparative way.

Fascinating, and absolutely timely to what we're looking at today. It's interesting to get a better understanding of how those models are developed. And just listening to what's on the news every night, understanding what those workers are doing in putting those models together, and understanding a little bit more about what their limitations are as well. Well, this has been a great discussion. Caterina, I very much appreciate your time and willingness to come on for a chat. Hopefully, we can do a follow up one of these days.

Thank you, I mean, for making what could have been an arcane topic very understandable and very interesting.

Yeah, absolutely. Thank you.

Thanks so much for talking with us.

If you have any questions or comments you would like to share check out our website at https://www.k-state.edu/research/global-food/ and drop us an email.

Our music was adapted from Dr. Wayne Goins’s album Chronicles of Carmela. Special thanks to him for providing that to us. Something to Chew On is produced by the Office of Research Development at Kansas State University.