DATA STORIES | COVID-19 | KNIME ANALYTICS PLATFORM

Data Science With KNIME, Jupyter, and Tableau Using COVID-19 Projections as an Example

Combine the power of KNIME, Jupyter and Tableau to get a better understanding of the question: How will COVID-19 spread in different countries of the world?

Dennis Ganzaroli
Low Code for Data Science
5 min read · Apr 8, 2021

Covid-19 Dashboard with expected cases

After publishing two articles on how to make Covid-19 projections (see the references below), I decided to produce this short video to summarize my experience and give a brief insight into my methods.

The Motivation

Since the beginning of the spread in China, my greatest concern has been to answer one crucial question: “When will this pandemic be over?”
When I saw the first videos on YouTube showing long lines of people in front of hospitals in China, I asked myself: Is this true? Are we in real danger?

So I set out to answer these questions by doing what I do best: analyzing the data.

Data and Tools

I found the official data from Johns Hopkins University here (the ones who made the famous dashboard with the red bubbles on a black background).
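As a rough illustration of the first data-blending step, here is a minimal Python sketch that reshapes a table with one column per date into one row per country and date. It assumes the wide layout of the Johns Hopkins time series files (Province/State, Country/Region, Lat, Long, then one column per day). The sample rows, the country names and the helper name to_long_format are made up for this example and are not taken from my workflow.

```python
import io
import pandas as pd

# Tiny made-up sample in the assumed wide layout (one column per date).
# The numbers are purely illustrative, not real case counts.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Country A,15.0,101.0,1,2,4
North,Country B,30.9,112.2,10,20,30
South,Country B,40.1,116.4,5,6,7
"""

def to_long_format(wide: pd.DataFrame) -> pd.DataFrame:
    """Turn the wide table into one row per country and date (hypothetical helper)."""
    id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
    long = wide.melt(id_vars=id_cols, var_name="date", value_name="cases")
    long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")
    # Sum provinces so that every country has exactly one series.
    return (
        long.groupby(["Country/Region", "date"], as_index=False)["cases"]
        .sum()
        .sort_values(["Country/Region", "date"])
        .reset_index(drop=True)
    )

wide = pd.read_csv(io.StringIO(SAMPLE_CSV))
cases = to_long_format(wide)
print(cases)
```

In KNIME the same reshaping can be done visually with nodes. The code only shows what the step amounts to.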

For data blending, my choice was immediately clear: KNIME.

First, because it’s a powerful visual programming tool that helps you focus on the solution instead of on the software.
Second, because it’s extensible, with plugins for R, Python, JavaScript, Tableau, Weka and many more.
And third, because it’s open source.

Then I had to find a way to calculate the forecasts.
Python has many libraries, such as SciPy, that you can use to solve optimization problems or to perform curve fitting. Working with Python in a Jupyter Notebook is very well suited to data tasks. Another advantage of KNIME is that you can call both Python scripts and Jupyter Notebooks through so-called nodes inside a KNIME workflow. This simplifies the work process enormously.

Python-Node in KNIME calls Jupyter Notebook script

The final step is the visualization of the predictions for every country in the world. Tableau is always my first choice here. The Public version has some restrictions, but if you know the workarounds, you can reach your goal with it as well.
The biggest challenge was to update the data in Tableau Public automatically on a daily basis, as only files from Google Drive can be updated.
But with KNIME this task was also quickly done, because it provides dedicated nodes for this as well. You don’t have to create any keys or API credentials. Just use your Google login and go.

Upload Excel to Google Drive automatically

Prediction Model

At the beginning of the spread there was just one wave, so a logistic function was good enough to make forecasts. The evolution of a pandemic resembles a growth process. At the beginning, it looks like an exponential function.

Exponential growth

Then it turns into a sigmoidal curve, which is best described by a logistic function.

Logistic Growth
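To make the difference between the two growth phases concrete, here is a small Python sketch comparing an exponential curve with a logistic curve that share the same early behavior. The parameter names (K for the saturation level, r for the growth rate, t0 for the midpoint) and all numeric values are assumptions chosen for illustration only.

```python
import numpy as np

def exponential(t, a, r):
    """Unbounded exponential growth: a * exp(r * t)."""
    return a * np.exp(r * t)

def logistic(t, K, r, t0):
    """Logistic growth: saturates at K, grows at rate r, midpoint at t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

# Illustrative parameters (assumptions, not fitted to any real data).
K, r, t0 = 10_000.0, 0.2, 50.0
# Choose the exponential so that it matches the logistic far before t0,
# where K / (1 + exp(-r (t - t0))) is approximately K * exp(r (t - t0)).
a = K * np.exp(-r * t0)

for day in (0, 10, 20, 50, 80, 120):
    print(f"day {day:3d}: exponential={exponential(day, a, r):14.1f}  "
          f"logistic={logistic(day, K, r, t0):10.1f}")
```

Early on, the two curves are almost identical. Around the midpoint t0 the logistic curve reaches half of K and then flattens, while the exponential keeps growing without limit.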

Therefore, in my first attempt, I tried to predict the Covid-19 cases by fitting a logistic function to the data of every country in the world. I described this method in the following article:

Covid 19-Projections with Knime, Jupyter and Tableau
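The single-wave approach can be sketched with SciPy's curve_fit as follows. This is a minimal example on synthetic data, not the code from the article: the parameter values, the noise level, the starting values and the bounds are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Logistic growth: saturation level K, growth rate r, midpoint t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

# Synthetic "observed" cumulative cases (illustrative values, not real data).
rng = np.random.default_rng(0)
t_obs = np.arange(0, 60, dtype=float)
y_obs = logistic(t_obs, 50_000.0, 0.15, 45.0) * rng.normal(1.0, 0.01, size=t_obs.size)

# Rough starting values and bounds (assumptions): the final level is at
# least what has been observed so far, and the growth rate is positive.
p0 = (2.0 * y_obs.max(), 0.1, t_obs.mean())
bounds = ([y_obs.max(), 0.0, 0.0], [np.inf, 1.0, 365.0])
popt, pcov = curve_fit(logistic, t_obs, y_obs, p0=p0, bounds=bounds)
K_fit, r_fit, t0_fit = popt

# Extrapolate the fitted curve to obtain a forecast.
t_future = np.arange(60, 120, dtype=float)
forecast = logistic(t_future, *popt)
print(f"K={K_fit:.0f}, r={r_fit:.3f}, t0={t0_fit:.1f}")
```

In practice the same fit is repeated for each country, and the quality of the forecast depends heavily on how much of the curve has already been observed.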

The model worked well until several waves followed, and then I had to change my approach. I found out that Rockefeller University had already used a special method in the late 1990s, the so-called Loglet Analysis, to forecast the evolution of processes made up of multiple overlapping logistic functions, referred to here as wavelets.

The idea behind it is quite simple. Take the following curve as an example:

Bi-Logistic growth

This curve could represent the number of Covid infections over several waves (in this case two). If you decompose such a curve into its subprocesses, you get the following wavelets:

By fitting these functions and adding the wavelets together, you get a good fit of the curve. And not only that: you are also able to make forecasts.
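The decomposition into two waves can be sketched in Python as a fit of the sum of two logistic functions. This is a simplified illustration on synthetic data, not the Loglet Analysis implementation itself. The function names, parameter values, starting values and bounds are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Single logistic wave: saturation K, growth rate r, midpoint t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

def bi_logistic(t, K1, r1, t01, K2, r2, t02):
    """Sum of two overlapping logistic waves."""
    return logistic(t, K1, r1, t01) + logistic(t, K2, r2, t02)

# Synthetic two-wave data (illustrative values, not real case counts).
rng = np.random.default_rng(1)
t_obs = np.arange(0, 110, dtype=float)
clean = bi_logistic(t_obs, 20_000.0, 0.20, 30.0, 40_000.0, 0.15, 100.0)
y_obs = clean * rng.normal(1.0, 0.01, size=t_obs.size)

# Rough starting values (assumptions): split the observed level between
# the two waves and put their midpoints in the first and second half.
half = y_obs.max() / 2.0
p0 = (half, 0.1, 0.25 * t_obs.max(), half, 0.1, 0.75 * t_obs.max())
lower = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
upper = [np.inf, 1.0, 365.0, np.inf, 1.0, 365.0]
popt, _ = curve_fit(bi_logistic, t_obs, y_obs, p0=p0,
                    bounds=(lower, upper), max_nfev=10_000)

# Decompose the fit into its two waves and extrapolate for a forecast.
wave1 = logistic(t_obs, *popt[:3])
wave2 = logistic(t_obs, *popt[3:])
fitted = wave1 + wave2
t_future = np.arange(110, 200, dtype=float)
forecast = bi_logistic(t_future, *popt)
print("fitted parameters:", np.round(popt, 3))
```

With six free parameters the result depends on sensible starting values, so in a real workflow it is worth checking the fitted waves visually before trusting the forecast.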

These predictions are often quite accurate. For example, on January 28, I posted on Twitter that cases in Thailand would rise very quickly but then level off (left image below). A month later, I posted this tweet, which confirmed the prediction quite well:

https://twitter.com/DennisGanzaroli/status/1363757399270690820

Tweet on my Covid 19 predictions for Thailand

This approach attracted a lot of interest, and I was invited to give an interview about it at the last KNIME Spring Data Talks.
The following video is a recording of my talk and covers the topic described above.

Thanks for reading!
Please feel free to share your thoughts or reading tips in the comments.

References
- Covid 19-Projections with Knime, Jupyter and Tableau
- Loglet analysis-revisiting Covid-19 Projections

Material for this project:
- KNIME workflow: KNIME Hub
- Jupyter code: GitHub
- Tableau dashboard: Tableau Public

Follow me on Medium, Facebook, LinkedIn or Twitter.

Dennis Ganzaroli
Low Code for Data Science

Data Scientist with over 20 years of experience. Degree in Psychology and Computer Science. KNIME COTM 2021 and Winner of KNIME Best blog post 2020.