Describing Vulnerability Risks

Improving vulnerability communication with quantitative risk forecasts

Ryan McGeehan
Apr 23, 2019

Vulnerabilities can be accompanied by clever risk measurements that clearly explain why one particular vulnerability is worse than another.

This helps when traditional classification would consider two vulnerabilities to be equivalent.

This essay discusses the potential benefits of measuring vulnerabilities with forecasts. It proposes that forecasts give us infinite flexibility to describe the risks a vulnerability poses in the environment it exists in.

Explicitly state a future outcome the vulnerability contributes to.

We might express the risk of a vulnerability with a CVSS score, a label like “critical”, or maybe a color code. We might support this with a written or verbal argument, a proof of concept, or other evidence. None of these are bad. We’ll likely use them forever, as they are informative.

An additional measurement approach can describe the risks we associate with a vulnerability at a higher fidelity. It does so by forecasting a scenario. People are naturally familiar with the components of this sort of measurement. It just needs to be practiced and used in a specific way.

For this essay: Let’s describe the risks involved with a vulnerability as a handful of scenarios. We’ll accompany the vuln with forecasts of whether these scenarios will occur.

Some scenarios may be (Given: we do not fix this bug…):

  • This vuln will be disclosed via bug bounty within one month of release. (60%)
  • We will publicly apologize for this vuln within one month of release. (35%)
  • This vuln will be fixed by on-call or incident response within one year. (45%)

The percentage is the belief we assign to each scenario.

This allows for empirical approaches that score our forecasting skill over time, while the scenarios themselves capture the highly contextual risks we need the vulnerability to convey.

The authors (us) can be held accountable for the risks we’ve identified. Each forecast can be checked against whether we were ultimately right or wrong. Once we publish these forecasts, we have to score them after the month or year has passed, and account for the error.

This is a feature of forecasting that is impossible with “critical”, color codes, or verbal debate. We can’t account for our judgement over time with these methods. CVSS carries very little context about the boutique risks of our environment and can’t reach far enough to communicate them.
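
One common way to account for that error is the Brier score: the mean squared difference between the probability we stated and what actually happened (1 if the scenario occurred, 0 if it did not). This essay does not prescribe a particular scoring rule, so treat the following as a minimal sketch under that assumption. The `Forecast` class, the `brier_score` helper, and the resolved outcomes are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    scenario: str
    probability: float  # stated belief, between 0.0 and 1.0
    occurred: bool      # resolved outcome once the deadline has passed


def brier_score(forecasts):
    """Mean squared difference between stated belief and outcome (0 is perfect)."""
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    total = sum((f.probability - float(f.occurred)) ** 2 for f in forecasts)
    return total / len(forecasts)


# The three example scenarios, with made-up outcomes (assumption, not real data).
resolved = [
    Forecast("Disclosed via bug bounty within one month", 0.60, True),
    Forecast("Public apology within one month", 0.35, False),
    Forecast("Fixed by on-call or incident response within one year", 0.45, False),
]

score = brier_score(resolved)
print(f"Brier score: {score:.4f}")
```

Lower is better. Someone who always answers 50% scores exactly 0.25, so a long-run score below that suggests the forecasts carry real information.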

Why is this important?

It is possible to operate in professional environments as a calibrated forecaster.

This means, for instance, that if we state a 90% belief in future outcomes, we turn out to be correct as close as possible to 90% of the time across all of our historical claims. This is an important concept in predictive measurement, along with the concept of prediction error.

There is extensive research on expert prediction calibration dating back to the 1950s. It’s also the backbone of modern meteorology, which we benefit from on a daily basis. In more uncertain settings, forecasting groups have demonstrated calibration on a number of topics.

The 538 blog has been forecasting operationally for over a decade with the same measurements.

From the 538 blog

The above charts from 538 hug a calibration line (1:1). This means that the 538 blog expresses greater degrees of certainty only when it goes on to be correct at the same rate (50% confident claims should be 50% correct over time). This is empirical evidence that suggests their methods are more reliable than “wild ass guesses”.
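
A calibration check like this can be sketched by grouping our own resolved forecasts by stated confidence and comparing each group with how often it came true. This is only an illustration: `calibration_table` is a hypothetical helper, rounding to the nearest 10% is an arbitrary choice, and the track record below is invented.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Return (stated belief, observed frequency, count) per confidence bucket.

    `forecasts` is an iterable of (probability, occurred) pairs. Probabilities
    are rounded to the nearest 10% to form buckets.
    """
    buckets = defaultdict(list)
    for probability, occurred in forecasts:
        bucket = round(probability * 10) / 10
        buckets[bucket].append(occurred)
    return [
        (bucket, sum(outcomes) / len(outcomes), len(outcomes))
        for bucket, outcomes in sorted(buckets.items())
    ]


# Invented track record: ten 90% claims (nine came true), four 50% claims (two came true).
history = [(0.9, True)] * 9 + [(0.9, False)] + [(0.5, True)] * 2 + [(0.5, False)] * 2

table = calibration_table(history)
for stated, observed, count in table:
    print(f"stated {stated:.0%}  observed {observed:.0%}  (n={count})")
```

A calibrated forecaster’s rows sit close to the 1:1 line: the stated and observed columns roughly match. Buckets with only a handful of forecasts say very little, which is one reason a long track record matters.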

This cannot be achieved unless we keep track of our forecasts. And that is exactly where the opportunity lies in vulnerability reporting.

Can we build evidence that we reliably measure risk?

Let’s describe the possibilities of a vulnerability.

Imagine that we have discovered a severe vulnerability.

Let’s say it’s on a production server with sensitive data. It’s Struts. It’s the same vulnerability being exploited by numerous internet scanners. It was just exposed when the business made some changes to a network configuration.

Our intuition is screaming that this needs a response right now.

Our “critical” classification doesn’t quite do it justice. A “10” CVSS score prioritizes it evenly with the other 10s currently known at the company. This isn’t really in the same ballpark as kicking a door down.

All we have is our own expert belief that this is a big deal.

This context-rich sense of urgency can be captured with measurable statements. We can express the severe urgency of this issue while also showing that we are a reliable judge of severity.

Let’s use the following scenarios, which might be crafted for the specific risk we are expressing.

We believe if no mitigation occurs:

  • This vuln will be exploited within 24 hours of this email. (60%)
  • This vuln will be exploited within 72 hours of this email. (80%)
  • This vuln will be exploited within 7 days. (90%)
  • This vuln will be exploited within 30 days. (98%)
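
The four forecasts above are cumulative: each longer window includes the shorter ones, so the percentages should never go down as the window grows. A small sketch (the `interval_probabilities` helper is hypothetical, not part of any tool) checks that and shows the belief implied for each interval on its own:

```python
# (window in hours, belief that exploitation has happened by then), taken from the list above
cumulative_forecast = [(24, 0.60), (72, 0.80), (7 * 24, 0.90), (30 * 24, 0.98)]


def interval_probabilities(cumulative):
    """Convert cumulative 'exploited by hour T' beliefs into per-interval beliefs."""
    intervals = []
    previous_hours, previous_p = 0, 0.0
    for hours, p in cumulative:
        if hours <= previous_hours or p < previous_p:
            raise ValueError("windows must grow and beliefs must not decrease")
        intervals.append((previous_hours, hours, round(p - previous_p, 4)))
        previous_hours, previous_p = hours, p
    return intervals


intervals = interval_probabilities(cumulative_forecast)
for start, end, p in intervals:
    print(f"hours {start:>3} to {end:>3}: {p:.0%}")
```

Read this way, most of the belief (60%) sits in the first day, which is the urgency the forecast is trying to communicate. The remaining 2% is the belief that no exploitation happens within 30 days.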

And here’s the uppercut… We can express that we are already owned.

  • A P0 incident will escalate from a forensic review of this server. (50%)

If we are calibrated and show a low historical prediction error, then the source of this risk measurement (us!) can be shown to be reliable.

How do we demonstrate that we are reliable in measuring risk?

We must make a lot of predictions to be able to show calibration. There are many opportunities to make these predictions over time, such as:

  • How much will we pay out in bug bounties?
  • Will a red team capture the flag they’re targeting?
  • How many vulns will be reported in the week after our launch?
  • How many advisories will an upstream dependency publish?

Prediction opportunities appear frequently in any environment. We just have to keep our eyes peeled for good ones.

We can also forecast all sorts of business metrics that have nothing to do with security. Most importantly, prediction skill is not entirely subject matter specific. We can regularly practice prediction on other subjects and see the benefits carry over. For instance, participating in the Good Judgment Open will help keep skills sharp.

It’s preferable to find opportunities that are related to our subject matter, but it’s also useful to widen this skill set with training.

High potential for automation

Lastly, it’s possible to train classic statistical models to apply forecasts to vulnerabilities at scale. I’ve done something similar with a proof of concept here. While that PoC is focused on AWS configuration security, it’s easily adaptable to vulnerability risk. An automated process would simply need insight into categorical data about the vulnerability, and it can be customized to build on data that experts have already assessed. I think there’s interesting work to be done there.
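
To make the idea concrete, here is a minimal sketch of one such classic model: a small logistic regression, written in plain Python, fitted to forecasts that experts have already assigned. It is not the PoC mentioned above. The field names (`exposure`, `data`), the expert-assigned probabilities, and the `train` and `one_hot` helpers are all assumptions invented for illustration.

```python
import math


def one_hot(record, vocabulary):
    """Encode a dict of categorical fields as 0/1 features."""
    return [1.0 if record.get(field) == value else 0.0 for field, value in vocabulary]


def train(records, targets, epochs=2000, learning_rate=0.5):
    """Fit logistic regression by gradient descent; targets are expert forecasts in [0, 1]."""
    vocabulary = sorted({(f, v) for r in records for f, v in r.items()})
    rows = [one_hot(r, vocabulary) for r in records]
    weights = [0.0] * len(vocabulary)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1 / (1 + math.exp(-z)) - y  # gradient of cross-entropy loss
            bias -= learning_rate * error
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]

    def predict(record):
        z = bias + sum(w * xi for w, xi in zip(weights, one_hot(record, vocabulary)))
        return 1 / (1 + math.exp(-z))

    return predict


# Hypothetical expert-assessed vulnerabilities: categorical traits plus an assigned forecast.
assessed = [
    ({"exposure": "internet", "data": "sensitive"}, 0.90),
    ({"exposure": "internet", "data": "public"}, 0.60),
    ({"exposure": "internal", "data": "sensitive"}, 0.30),
    ({"exposure": "internal", "data": "public"}, 0.10),
]

predict = train([r for r, _ in assessed], [p for _, p in assessed])
high = predict({"exposure": "internet", "data": "sensitive"})
low = predict({"exposure": "internal", "data": "public"})
print(f"internet + sensitive: {high:.2f}   internal + public: {low:.2f}")
```

A model like this only echoes the judgement it was trained on, so its output should be scored against real outcomes the same way a human forecaster’s would be.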

Ryan McGeehan writes about security on