Forecasting BlueKeep

Jun 3, 2019

I captured forecasts about BlueKeep from a group of security professionals.

This panel forecasted the probability of certain outcomes related to the following scenario:

Will exploitation of CVE-2019-0708 (BlueKeep) be observed by the security community “in the wild”?

This is not a technical analysis of BlueKeep. Instead, it focuses on how security experts perceive the risk. There is plenty of technical analysis here: 1, 2, 3, 4, 5, 6, 7, 8.

Some previous forecasts I’ve organized: 1, 2, 3, 4, 5
Feedback is welcome! I’m trying to improve this method and am always looking for opinions. My DMs are open on Twitter.

Takeaway

Here’s a good summary output of this effort:

Chances are about even (47.62%) for “in the wild” BlueKeep exploitation to be observed between now and end of June.

Comparison

I use the NetSpectre forecast as a reference example of a vulnerability that had a lower probability of being exploited in the wild. This is opposed to the Struts forecast which had a higher probability of being exploited in the wild.

The sentiment gathered for BlueKeep was highly uncertain and better resembled the Struts forecast. The Struts vulnerability was exploited shortly after it was published.

Process

If you’re not familiar with my work, I do a lot of writing on the subject of probabilistic risk measurement with experts. This is strictly for situations when historical data isn’t widely available or useful, and is replaced when useful data comes online.

This might be a good place to start to catch up. Here’s much more formal documentation on the approach I use.

Outcomes for the BlueKeep forecast were mutually exclusive options representing all possible outcomes.

Observed between now and 6/8/2019 (“Immediately”)
Observed between 6/9/2019 and 6/30/2019 (“By end June”)
Observed between 7/1/2019 and 7/30/2019 (“In July”)
Observed on 8/1/2019 or after, or never (“In August or later, if at all”)
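Because the outcomes are mutually exclusive and meant to cover every possibility, each panelist's four probabilities should sum to 100%. Here is a minimal sketch of that sanity check. The function name `is_valid_forecast` and the example numbers are hypothetical, and this is not the tool I use to collect forecasts.

```python
import math

# Outcome labels as defined in the list above.
OUTCOMES = [
    "Immediately",
    "By end June",
    "In July",
    "In August or later, if at all",
]

def is_valid_forecast(forecast: dict) -> bool:
    """Return True if the forecast covers every outcome and sums to 100%."""
    if set(forecast) != set(OUTCOMES):
        return False
    if any(p < 0 or p > 100 for p in forecast.values()):
        return False
    return math.isclose(sum(forecast.values()), 100.0, abs_tol=0.01)

# HYPOTHETICAL panelist forecast (percent); not taken from the panel data.
example = {
    "Immediately": 10,
    "By end June": 30,
    "In July": 25,
    "In August or later, if at all": 35,
}
print(is_valid_forecast(example))
```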

Results

Here’s the raw output from a tool I use to collect forecasts from a large group of security professionals. Thirteen panelists joined over the weekend.

Analysis (Whole Panel)

The “Whole Panel” measurement output a 16% belief of BlueKeep being seen in the wild in the very short term (by the 8th).

That is about the same odds as losing at Russian roulette (1 in 6, or ~17%).

Combining (simple addition) the first two options gives us a 47.62% belief that there will be exploitation by the end of June. That’s where the takeaway at the top of this article comes from. Similarly, adding the third option gives a 72.15% belief that this will be exploited by the end of July.

That leaves a 27.85% belief that it will be exploited in August or after, if ever.
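The addition above can be sketched as a running total over the four outcomes. Only the cumulative figures (16%, 47.62%, 72.15%) and the 27.85% remainder appear in the text, and the 16% is rounded, so the first two per-outcome values below are approximate back-calculations rather than the tool's exact output.

```python
from itertools import accumulate

# Per-outcome whole-panel beliefs in percent.
# ASSUMPTION: the first two values are back-calculated from the rounded 16%
# and the 47.62% cumulative figure; they are approximate.
buckets = [
    ("Immediately", 16.00),
    ("By end June", 31.62),                    # 47.62 - 16.00
    ("In July", 24.53),                        # 72.15 - 47.62
    ("In August or later, if at all", 27.85),
]

labels = [label for label, _ in buckets]
# Running total; rounding hides floating-point noise from the additions.
cumulative = [round(total, 2) for total in accumulate(p for _, p in buckets)]

for label, total in zip(labels, cumulative):
    print(f"{label}: {total:.2f}% cumulative")
```

The running total only makes sense because the outcomes are mutually exclusive. If they overlapped, simple addition would double count.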

You’ll notice that the panelist forecasts are all over the map. This is very different from the NetSpectre forecast, which had stronger consensus on certain outcomes.

There will be some pretty large “busts” when this is over, and many of the participants are guaranteed to see significant error afterwards. This is the first time I’ve seen this in my (limited) active research on the subject.

Analysis (Calibrated Panel)

This forecast introduced some panelists who are new to this process. Additionally, there was less discussion between panelists than I normally see.

I can’t control this except by encouraging more discussion, and the lack of it may have contributed to some of the noise.

In any case, forecast noise is undesirable for security professionals. A blue team does not want uncertainty amongst experts and desires a lower standard deviation.

For this reason I am going to publish another forecast, which is a calibrated panel output. These are individuals who have been active participants in multiple forecasts, have been trained by me personally, have gone through calibration training somewhere else, or have gone through Good Judgment’s calibration training.

Research suggests that individuals who have undergone calibration training perform better as forecasters. That suggests we should trust a calibrated panel more than an uncalibrated one. It’s important to note that this measurement was strictly a filter on whether I know the panelist has been calibrated or not. This has removed some people who I would consider experts. That may be OK.

Here’s the difference that results if I filter by “calibrated” experts.

With this in mind, the situation is a bit more urgent. The calibrated panel gives a 26% probability of in the wild exploitation by the end of the week, 55% by the end of June, and 78% by the end of July.
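The filtering step could be sketched roughly as follows. Everything here is illustrative: the panelists, their `calibrated` flags, and their numbers are invented, and the unweighted mean is an assumed aggregation rule, not necessarily what my tool does. The standard deviation per outcome is included because it is the "noise" a blue team would like to see shrink.

```python
from statistics import mean, pstdev

# HYPOTHETICAL panelists: names, flags, and forecasts are invented.
# Each forecast lists percent beliefs for the four outcomes, in order.
panel = [
    {"name": "A", "calibrated": True,  "forecast": [30, 30, 20, 20]},
    {"name": "B", "calibrated": True,  "forecast": [20, 30, 30, 20]},
    {"name": "C", "calibrated": False, "forecast": [5, 15, 20, 60]},
    {"name": "D", "calibrated": False, "forecast": [5, 45, 30, 20]},
]

def aggregate(panelists):
    """Return (means, std devs) per outcome, using an unweighted mean (assumed)."""
    columns = list(zip(*(p["forecast"] for p in panelists)))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]

whole_mean, whole_sd = aggregate(panel)
calibrated_mean, calibrated_sd = aggregate(
    [p for p in panel if p["calibrated"]]
)

print("Whole panel:     ", whole_mean, [round(s, 1) for s in whole_sd])
print("Calibrated panel:", calibrated_mean, [round(s, 1) for s in calibrated_sd])
```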

Conclusion

The panel outputs suggest that BlueKeep has about even odds of in the wild exploitation in the short term (“By end June”). This is undesirable as far as vulns go.

The “patch now” platitude holds up for BlueKeep. Hopefully a method like this can support clearer voices in organizations that are having trouble lining up resources or attention for patching.

Good luck!

Ryan McGeehan writes about security on scrty.io