
What are these graphs?

Using the Johns Hopkins/WHO data, I graph the raw data (number of infections, number of deaths and number of recoveries) for a number of countries. I also use a simple seven-day moving average of each of those (infections/deaths/recoveries) to project where I think each country will be in the future, BASED ON THE CURRENT RATES.

Describe the individual graphs

For each country I produce two graphs: the "Base" graph and the "Projection" graph.

The base graph simply shows the number of REPORTED infections (black line). I also compute the "resolution" rate of the reported infections. An infection is considered resolved when the infected patient either recovers or dies. The resolution rate is plotted as a dashed green line and shows the percentage of reported infections that have resolved, i.e. the patient has either died or recovered. For instance, today (March 17th, 2020), of China's 81,000 reported infections, 71,000 have either recovered or died. That is an 88% resolution rate (again, green dashed line).
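The resolution rate is simple arithmetic. Here is a minimal Python sketch using the rounded China figures above. It is an illustration, not the Perl code behind the graphs, and the function name is hypothetical.

```python
def resolution_rate(reported_infections: int, resolved: int) -> float:
    """Percentage of reported infections that have resolved.

    `resolved` is the number of deaths plus the number of recoveries.
    """
    if reported_infections <= 0:
        raise ValueError("need at least one reported infection")
    return 100.0 * resolved / reported_infections


# China, March 17th, 2020 (rounded figures from the text above)
rate = resolution_rate(81_000, 71_000)
print(f"{rate:.0f}%")  # prints "88%"
```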

One point of clarification: on the base graph the "Death Rate" is the ratio between the number who die and the total number whose infections resolve (either by dying or recovering). On the projection graph (more below) the "Death Rate" is the day-over-day change: how much the number of deaths on a given day has grown relative to the number the day before, expressed as a percentage. These are two different measurements.

For instance, today the US has an official death rate of 85% on the base graph: with 94 dead and 17 recovered, 94 is 85% of (94+17). By contrast, the "Death Rate" on the projection graph is 30%, meaning that about 30% more people died today than yesterday.
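Here are the two "Death Rate" measurements as a minimal Python sketch. The 94 and 17 come from the US example above. The 100 and 130 deaths in the second call are made-up numbers chosen to produce a 30% increase. The function names are hypothetical. The sketch also takes no position on whether the projection graph compares daily or cumulative death counts, since it works the same either way.

```python
def base_death_rate(deaths: int, recoveries: int) -> float:
    """Base graph: deaths as a percentage of resolved cases (deaths + recoveries)."""
    resolved = deaths + recoveries
    if resolved == 0:
        raise ValueError("no resolved cases yet")
    return 100.0 * deaths / resolved


def day_over_day_change(today: int, yesterday: int) -> float:
    """Projection graph: percentage change from yesterday's count to today's."""
    if yesterday == 0:
        raise ValueError("no deaths the day before")
    return 100.0 * (today - yesterday) / yesterday


print(f"{base_death_rate(94, 17):.0f}%")  # prints "85%"
# Hypothetical counts, chosen only to illustrate a 30% increase
print(f"{day_over_day_change(130, 100):.0f}%")  # prints "30%"
```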

There is no future component in the base graphs. They just represent the past.

The red dashed line shows the percentage of resolved infections (71,000 in China's case, above) that resolved by the patient dying. Using China's data again, that number is 5%. In other words, according to the official data (and I cannot stress this enough), 5% of the people in China who contracted COVID-19 and whose infections have resolved died from it.

The projection graph shows some of the same data. It plots the number of infections, deaths, and recoveries (people who "got over" the disease) as raw numbers (solid lines). It also plots the rate of change of each of those numbers as dashed lines. Lines to the left of the "Today" marker represent historical data, again taken from the official sources. Lines to the right of the "Today" marker represent projections of where things will be, based on a simple seven-day moving average (I am looking at changing that to an exponential moving average).
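One way to read "project based on a simple seven-day moving average" is sketched below in Python. It averages the last seven day-over-day growth rates and compounds the latest cumulative count at that average rate. This is an assumed interpretation for illustration, not the Perl code behind the graphs. The function names and the 10%-a-day sample series are hypothetical.

```python
def daily_growth_rates(series: list[float]) -> list[float]:
    """Day-over-day fractional change (0.32 means a 32% increase).

    Days where the previous count was zero are skipped, since the change is undefined.
    """
    return [
        (today - yesterday) / yesterday
        for yesterday, today in zip(series, series[1:])
        if yesterday > 0
    ]


def seven_day_average_growth(series: list[float]) -> float:
    """Simple (unweighted) average of the last seven daily growth rates."""
    rates = daily_growth_rates(series)
    if len(rates) < 7:
        raise ValueError("need at least seven daily growth rates")
    last_week = rates[-7:]
    return sum(last_week) / len(last_week)


def project(series: list[float], days_ahead: int) -> list[float]:
    """Extend a cumulative series by compounding the seven-day average growth rate."""
    rate = seven_day_average_growth(series)
    projected = []
    value = series[-1]
    for _ in range(days_ahead):
        value *= 1 + rate
        projected.append(value)
    return projected


# Hypothetical cumulative infection counts, growing 10% a day
history = [100 * 1.1**day for day in range(10)]
print([round(x) for x in project(history, 3)])  # prints [259, 285, 314]
```

Switching to an exponential moving average would only change `seven_day_average_growth`. It would weight recent days more heavily instead of averaging the last seven equally.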

The projection graphs are likely to be the most useful to most of you. They contain both past data and a crude projection into the future. If you want "one stop shopping," use the projection graphs. But please do not put much stock in projections more than about a week into the future, and consider the projections to represent absolute worst-case scenarios. They are what we should be planning for, but not what we should expect to happen, if we do the right things now.

Some of you may notice that in some of the projection graphs the number of recovered and/or dead exceeds the number of infections on the "future" side of the graph. Yes, I know. It is obviously "impossible" for the number of dead plus recovered to exceed the number of infected. This is just an artifact of how I currently do things. I have put some controls in to try to prevent this. You will see those controls kick in for a few countries where the future recovery and/or death rates get "squirrelly." Trying to keep an exponential explosion contained is hard, even in Perl code.
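The controls themselves aren't described here, so the following is only one hypothetical way to enforce the constraint, written in Python. When projected deaths plus recoveries exceed projected infections, it scales both down proportionally so that their sum equals infections. The function name and the proportional-scaling rule are assumptions.

```python
def clamp_resolved(infections: float, deaths: float, recoveries: float) -> tuple[float, float]:
    """Scale projected deaths and recoveries down so their sum never exceeds infections.

    Hypothetical control: keeps the death/recovery split, shrinks both equally.
    Assumes non-negative inputs.
    """
    resolved = deaths + recoveries
    if resolved <= infections or resolved <= 0:
        return deaths, recoveries  # already consistent, nothing to do
    scale = infections / resolved
    return deaths * scale, recoveries * scale


# A "squirrelly" projected day: 150 resolved out of 100 infected
print(clamp_resolved(100, 60, 90))
```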

Describe the underlying data and your methodology

I am not a statistician and I am bad at math. I am a historian. My philosophy has been to rely on the official data and to plot it in as unadulterated a way as possible, i.e. with no transformations, no assumptions, etc. I am uncomfortable with my method of projecting into the future.

That said, I have done a lot of thinking and writing about the nature of exponential growth (see, for example, a column I wrote a decade ago on the subject) and I believe the future projections to be accurate assuming no change in human behavior. That's the big question.

The data comes directly from the Johns Hopkins GitHub project. It's the same data that their LIVE MAP uses. I update the graphs whenever they update the data, which is generally once a day in the evening (US Eastern time).

What do you think about that data?

From what I have seen, the underlying (official) data is extremely suspect. First, you need to consider what it takes for an infected individual to "get on the radar." In an ideal world, every country would test every one of its inhabitants every day, record the results and track each case to its resolution: the individual either died or recovered. That would give extremely accurate numbers in terms of total infected, total dead and total recovered.

In the real world, it takes quite a bit for someone to get on the radar, and what it takes varies wildly from country to country. At the top of the list are countries like South Korea, which is engaging in widespread testing, although the testing is still opt-in. At the bottom of the list are countries like the United States, which has taken a "see no evil" approach to testing, i.e. none.

That said, from what I have seen, in most Western countries you need to be pretty sick to "get on the radar," i.e. sick enough to show up at a health provider. What that does to the data, again in the West, is bias the number of infected DOWN and the death rate UP. My feeling (and that's all that it is) is that the reported number of infected (and thus the number who recover without dying) is one to two orders of magnitude lower than actual. In other words, if the United States is reporting 5,000 infected (as it is doing today), the actual number is between 50,000 and 500,000.

When looking at the projection graphs, please do not put much stock in the projections for dead and recovered, especially more than a week into the future, or for any country with less than two months' worth of data. For today, March 17th, that means the data should start no later than January 17th, 2020. As of today, no country has that much data, although both China and the US are getting close. The US data has other problems related to testing, however.
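The two-month rule of thumb can be checked mechanically. The Python sketch below finds the same calendar day two months earlier and compares it with the first date in a country's series. The function names are hypothetical. Clipping to the end of shorter months (April 30th becomes February 29th in 2020) is an assumption about how to handle those dates.

```python
import calendar
from datetime import date


def two_months_before(day: date) -> date:
    """Same calendar day two months earlier, clipped to the end of shorter months."""
    year, month = day.year, day.month - 2
    if month < 1:
        month += 12
        year -= 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, min(day.day, last_day))


def has_enough_history(first_report: date, today: date) -> bool:
    """True if a country's series starts at least two months before `today`."""
    return first_report <= two_months_before(today)


print(two_months_before(date(2020, 3, 17)))  # prints 2020-01-17
```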

A little bit about the nature of exponential growth

We are not very well equipped, cognitively, to deal with the concept of exponential growth. We are linear thinkers. I will try to explain it as best I can, because understanding it is crucial to understanding how much danger we are in.

At its core, exponential growth is about doubling time: how long will it take for a given number to double? That's actually easy to estimate, using something called the "rule of 70."

To estimate how long it takes for something to double, first work out its growth rate: compare the number today with the same number at some time in the past, and express the change as a percentage per period (per year, per day, etc.). Then divide seventy by that percentage. The result is the doubling time, measured in the same periods. Since we're all worried about our money these days, let's use an example from our money.

As the pandemic spread, you decided to sell all of your stocks and dump the money into your savings account at your local bank. If you put $10 into that account today, came back to it in two years and found that you now had $10.20, how fast was it growing?

$10.20 - $10.00 is $0.20 (twenty cents). That's ten cents a year. Ten cents is 1% of the original $10, so the growth rate is 1% a year. How long would it take for your $10 to double? Divide 70 by 1: 70 / 1 = 70 years. It is going to take 70 years for your $10 to grow to $20.
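Here is the same arithmetic as a short Python sketch. `percent_growth_per_period` splits the total change evenly across periods, exactly as the example above does, rather than compounding. The function names are illustrative.

```python
def percent_growth_per_period(start: float, end: float, periods: float) -> float:
    """Simple (non-compounded) percentage growth per period."""
    return 100.0 * (end - start) / start / periods


def rule_of_70(percent_growth: float) -> float:
    """Approximate number of periods it takes a quantity to double."""
    return 70.0 / percent_growth


rate = percent_growth_per_period(10.00, 10.20, periods=2)  # about 1% a year
print(round(rate, 2), round(rule_of_70(rate)))  # prints 1.0 70
```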

Let's do the same thing with SARS-CoV-2 infection rates. My data says that infections in the US are growing at 32% a day, as of today (March 17, 2020). 70 / 32 ≈ 2.2. That means we can expect, at current rates, infections in the US to double roughly every two days. If we have 5,000 infections today (Tuesday), expect 10,000 by Thursday.
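The rule of 70 is an approximation (100 × ln 2 ≈ 69.3) that works best for small growth rates. The Python sketch below compares it with the exact doubling time for a rate compounded once a day, ln 2 / ln(1 + r), assuming the 32% figure is a daily compounded rate. The function names are hypothetical.

```python
import math


def rule_of_70(percent_growth: float) -> float:
    """Approximate doubling time, in periods."""
    return 70.0 / percent_growth


def exact_doubling_time(percent_growth: float) -> float:
    """Doubling time, in periods, for growth compounded once per period."""
    return math.log(2) / math.log(1 + percent_growth / 100.0)


def project_infections(today: float, percent_growth: float, days: int) -> float:
    """Infections after `days` days of compounding at a constant daily rate."""
    return today * (1 + percent_growth / 100.0) ** days


print(round(rule_of_70(32), 1))                 # prints 2.2
print(round(exact_doubling_time(32), 1))        # prints 2.5
print(round(project_infections(5_000, 32, 2)))  # prints 8712
```

At 32% a day, the exact figure is closer to two and a half days than to 2.2. Either way, infections double roughly every two to two and a half days.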

Why are your death rates so high?

Great question. There are two reasons, I believe:

  • Deaths are the most accurately collected of the statistics. If you die, you automatically get on the radar. The other two statistics, number infected and number who recover, are much less accurately collected.
    • This means that both the number infected and the number who recover are much lower than reality. Since the number dead is closer to reality, it looks much higher relative to the other two, which inflates calculations of % dead.
  • Deaths lead recoveries. By this I mean that if you are going to die from COVID-19, you are going to die within two weeks of symptoms developing (on average). If you are going to get better after contracting COVID-19, it is going to take you a month, on average. People die from COVID-19 faster than they get better, so early in an outbreak it will look like people are dying at a much higher rate than they are getting better.
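The "deaths lead recoveries" effect shows up even in a toy model. The Python sketch below makes several hypothetical assumptions: a steady 100 new cases a day, a fixed 5% of cases ending in death, every death arriving exactly 14 days after a case appears, and every recovery arriving exactly 30 days after. The 14 and 30 stand in for the "two weeks" and "a month" above. Real lags vary from person to person, so only the shape of the result matters.

```python
from typing import Optional


def apparent_death_rate(
    day: int,
    daily_new_cases: list[float],
    fatality: float = 0.05,  # hypothetical share of cases that end in death
    death_lag: int = 14,     # hypothetical days from case to death
    recovery_lag: int = 30,  # hypothetical days from case to recovery
) -> Optional[float]:
    """Deaths / (deaths + recoveries) as it would look on `day`.

    daily_new_cases[i] is the number of cases that appear on day i.
    Returns None while nothing has resolved yet.
    """
    deaths = recoveries = 0.0
    for start_day, cases in enumerate(daily_new_cases):
        if start_day + death_lag <= day:
            deaths += cases * fatality
        if start_day + recovery_lag <= day:
            recoveries += cases * (1 - fatality)
    resolved = deaths + recoveries
    return deaths / resolved if resolved > 0 else None


cases = [100] * 90  # hypothetical: a steady 100 new cases a day
for day in (10, 20, 60):
    print(day, apparent_death_rate(day, cases))
```

In this model nothing has resolved by day 10, so the function returns `None`. On day 20 every resolved case is a death, giving a 100% apparent rate. On day 60 the apparent rate is still above the 5% built into the model, because recoveries have not yet caught up with deaths.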

The methodology that I use to compute death rates is simple: Death Rate = # of deaths / (# of those who have recovered ("got better") + # of those who died). This is a standard fatality calculation, and why you should use this method is well covered in this NYT article:

For the above reasons I object strongly to those who want to divide the # dead by the current number of cases. This gives a wildly DEFLATED sense of the actual death rate and is dangerous in that it may lull the population into believing the threat is less than it actually is. This is being used, for example, to show that South Korea has a < 1% death rate. It does not.
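To see how far apart the two methods land, here is a Python sketch that applies both to the US figures quoted earlier: 94 dead, 17 recovered, and roughly 5,000 reported infections. The function names are illustrative.

```python
def resolved_death_rate(deaths: int, recoveries: int) -> float:
    """Deaths as a percentage of resolved cases: deaths / (recovered + dead)."""
    return 100.0 * deaths / (deaths + recoveries)


def naive_death_rate(deaths: int, reported_cases: int) -> float:
    """Deaths as a percentage of all reported cases, resolved or not."""
    return 100.0 * deaths / reported_cases


us_deaths, us_recovered, us_cases = 94, 17, 5_000
print(f"resolved: {resolved_death_rate(us_deaths, us_recovered):.1f}%")  # prints "resolved: 84.7%"
print(f"naive:    {naive_death_rate(us_deaths, us_cases):.1f}%")       # prints "naive:    1.9%"
```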

What should I be looking for in the graphs?

The number one thing you should be looking for is a slowing or (even better) a downward trend in infection rates. That's the dashed black line on the projection charts. We all need to do EVERYTHING we can to drive that black dashed line DOWN DOWN DOWN DOWN or we will not survive this.

But please be careful. Right now my USA projection chart is showing a downward trend in the dashed black line. I do not believe, at all, that this is a true representation of what is going on. Rather, it is a manifestation of the unreliability of the underlying data, particularly in the US. I am using a seven-day moving average to try to smooth out some of that unreliability, but even so, the unreliability is profound.

On top of that, the official data (and thus my charts) are what economists term "lagging indicators." On the projection charts, what you see at the "TODAY" marker is actually what was going on two to four weeks ago.

Because it takes between one and two weeks for someone to show signs of infection, and because someone does not get onto the "infection radar" until they start showing signs, today's number of infected is actually the number of infected as of one to two weeks ago. So on today's USA chart, the downward trend in infections was actually happening two weeks ago. We will need to wait another two weeks to see what is happening today.

Note that the recovered rate lags even more. On average it takes over a month to "get over" coronavirus, so the number of recovered and the recovery rates today reflect what was happening back in mid-February. There is more about infection, incubation, and death and recovery periods in the FAQ.

What should I take away from these graphs?

Number one: STAY HOME

Number two: How important it is to stay home, based on the success of the countries that did and the failures of the countries that didn't. Oh, and the importance of being unbelievably careful if you cannot for some reason STAY HOME

Number three: STAY HOME

Number four: What will happen if you don't stay home (future infection & death tolls)

Number five: STAY HOME