Lessons learned in risk measurement

Ryan McGeehan
Aug 7, 2019

This is a bit of a status report on my progress toward a better industrial approach to cyber security risk.

Contents

  • Work I’ve done!
  • Experiences so far and opinions I’ve formed.
  • Where I might go from here.

1. Work done so far

Panel Forecasts: I’ve organized multiple panel forecasts and have several more pending. This includes Struts, NetSpectre, Chrome, that Bloomberg article, Fortune 500 data breaches, NPM Compromises, Firefox, BlueKeep.

Breach Impact: I’ve battled the “hardest” problem: the expected bottom line hit from a breach.

Tool assisted forecasting: I worked with a few others to develop a proof of concept approach to scaled forecasting with the “lens” model and AWS security.

Interviews: A large amount of my time has been spent outside the security industry, developing my opinions on this subject. I’ve been interviewing professionals who evaluate risk in the aerospace, meteorology, and nuclear industries. I’ve taken a shot at centralizing recurring concepts in expert elicitation as Simple Risk Measurement documentation.

Consulting: I’ve used forecasting approaches in my consulting day-to-day to add rigor to attribution, IR root cause analysis, red teams, and prioritizing mitigation work.

(Small) Community Building: I’ve been lucky enough to tag along with a few companies that are introducing forecasting concepts into day to day security work and be in the loop on their observations (1, 2). We meet up every month or two as a small, highly focused group.

2. Experiences so far and opinions I’ve formed

These are points in risk measurement that I struggle with and that cause me grief, hope, or ideas for future development.

Bay Area Talent: Quantification (really, any risk model) is more difficult to introduce to a typical engineering organization in the Bay Area. Imposing any sort of additional rigor on product development only succeeds when the reward is clear.

Many of us here are familiar with a common eng organization strategy:

Attract the best talent with high impact roles in a fast moving environment.

This is achieved by imposing low bureaucratic overhead and building a perceived closeness between your engineers and the technology they’re building. A feeling of eng-driven decision making empowers an engineer. Engineers may feel that they aren’t appreciated at their current organizations because they aren’t given any authority over their space. The recruiting lure is that their decision making and judgement will be respected. An organization is built around these empowered engineers leading the way.

This results in some eng organizations that go without much management overhead. That is often OK if there are clear objectives being worked towards: a launch date, revenue, growth, etc., symbolically projected over the engineers on a 70-inch dashboard hanging near their desks.

This hiring pattern is often pursued by security engineering organizations.

They often don’t have these obvious, easily measured goals.

Security talent is commonly hired into roughly defined subject matter areas. You know, the network / endpoint / cloud / application person. In this eng org approach, the engineer is empowered to choose their own risk mitigation direction if they’re not following a prescribed compliance goal or maturity model.

This seems to put a “filter” on the long-game potential for a security organization. Larger organizations that cannot find a unified approach to risk will have disparate teams of subject matter experts looking to maximize their own narrow approach to risk.

What is interesting is how an incident can provide the push through this filter towards unification: everyone comes together for an incident. An incident is a unified risk that everyone can agree on. At least, temporarily.

Introducing a prescribed risk measurement direction for security engineers builds some of the bureaucratic overhead that we were trying to avoid. This suggests that the only way a security engineering organization can truly operate with a systemic approach to risk is if the engineers themselves are proposing and organizing these risks while they manipulate them. It has to be easy enough for them to do this.

That may sound obvious.

But we are surrounded by anti-patterns. I can observe external consultants, penetration tests, vulnerability management programs, red teams, etc. that feel as if they are the producers of risks to be mitigated. As if they are somehow producing the roadmap to mitigation that their organization is supposed to follow up on. Like they are directing an assembly line of engineers to mitigate the risks they throw over. This is exactly the type of engineering organization that talented engineers want to leave: the decision making of what to work on is pulled away from them.

I find that the strongest suggestion I can pull from my observations is that security engineers must be capable of structuring and organizing the risks they want to mitigate collaboratively with a large group. We did this with source code (CVS, SVN, Git, GitHub) and we probably need this attitude towards codification and collaboration on the subjective quantification of risk.

Imposed Risk: It is difficult to decide on a target threshold a company should aim for as acceptable risk when it relates to harming another person.

The company is not footing the bill when it comes to imposed risk.

(Note: We sometimes call this third party risk. Our industry’s use of that phrase doesn’t seem to capture my concern as well as the academic phrasing of imposed risk does.)

The FAA has clearly standardized criteria on how much risk the government is willing to impose on another person. Here’s an FAA example about flight risk:

shall not exceed an expected average number of 0.00003 casualties per mission

The military has decision frameworks for experimental vehicle testing and space launches when risk exceeds this threshold. Cybersecurity needs efficient, decision-science-based methods to figure out what it is willing to accept. For instance…

When is our product safe enough for launch?
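
To make that concrete, here is a minimal sketch of what an analogous acceptance check could look like in a launch review. It assumes a per-customer framing and a made-up threshold; nothing in it comes from the FAA or any real standard.

```python
# Hypothetical launch acceptance check, loosely inspired by the FAA-style
# "expected casualties per mission" criterion quoted above.
# Every name and number below is an illustrative assumption.

ACCEPTABLE_HARM_PER_CUSTOMER = 0.00003  # assumed acceptable imposed risk per customer


def per_customer_harm(p_breach: float, p_customer_harmed_given_breach: float) -> float:
    """Chance that any one customer is harmed during the launch window."""
    return p_breach * p_customer_harmed_given_breach


# Panel estimates (made up): 2% chance of a breach before the next review,
# and a 0.05% chance that any given customer is harmed if that breach happens.
risk = per_customer_harm(p_breach=0.02, p_customer_harmed_given_breach=0.0005)

print(f"imposed risk per customer: {risk:.6f}")
print("within threshold, launch" if risk <= ACCEPTABLE_HARM_PER_CUSTOMER
      else "exceeds threshold, hold")
```

The arithmetic is trivial; the hard part is agreeing on the threshold at all.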

Measuring effectiveness of forecasting: Forecasting is easy to test and score. The challenge is deciding whether something is good enough or reliable enough to make a decision with. It’s important to have a method to describe how good the measurements my panels produce actually are.

A simple one that I’ve been wondering about is how much better a panel is compared to an apathy approach. A completely apathetic forecaster would codify the 🤷 emoji as a principle-of-indifference strategy: spread probability evenly across every option.

Depending on the number of choices available, an apathetic forecaster would have guaranteed Brier scores calculated as:

  • Two Option or Credible Interval: 0.5
  • Three Option: 0.667
  • Four Option: 0.75
  • Five Option: 0.8
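
Here’s a quick sketch of where those baseline numbers come from: under the multi-category Brier score, a uniform “shrug” forecast over N options always scores 1 - 1/N, no matter which option actually happens. (The code is just mine for illustration, not part of any panel tooling.)

```python
def brier_score(forecast, outcome_index):
    """Multi-category Brier score: sum of squared differences between the
    forecast probabilities and the one-hot outcome. Lower is better."""
    return sum((p - (1.0 if i == outcome_index else 0.0)) ** 2
               for i, p in enumerate(forecast))


# An apathetic (principle of indifference) forecaster spreads probability
# evenly across the options, so the score is the same whatever happens.
for n in range(2, 6):
    shrug = [1.0 / n] * n
    print(n, "options:", round(brier_score(shrug, outcome_index=0), 3))
# 2 options: 0.5, 3 options: 0.667, 4 options: 0.75, 5 options: 0.8
```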

The panelists I’ve organized have an average Brier score of about 0.32. Across all of the option counts above, this suggests that we’re tracking positively and are better than guessing.

But, I struggle with statements like this. Better than guessing does not necessarily mean we’re better than current approaches.

I think this is a useful starting point but it needs to improve. Forecast methods have a few dimensions of measurement and there’s work to be done understanding how they relate to cybersecurity.

Need to better express when a source of predictions is useful for a given situation.

Calibration requires volume: Calibration using non-simulated forecasts (perhaps like 538) with my panelists is difficult.

I’ll need to find frequent forecasting use cases if I ever want to make progress on this. It’s still useful in training.
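
To make the volume problem concrete, here is a small sketch of the kind of calibration check I’d like to run against resolved panel forecasts. The records in it are invented; the point is that with only a handful of forecasts per probability bucket, the observed rates are too noisy to say whether anyone is calibrated.

```python
from collections import defaultdict

# Each resolved forecast: (stated probability, whether the event happened).
# These records are made up for illustration.
resolved = [(0.10, False), (0.10, False), (0.20, True), (0.70, True),
            (0.70, False), (0.90, True), (0.90, True), (0.90, True)]

# Group forecasts into probability buckets.
buckets = defaultdict(list)
for p, happened in resolved:
    buckets[round(p, 1)].append(happened)

# Compare stated probability against the observed frequency in each bucket.
for p in sorted(buckets):
    outcomes = buckets[p]
    print(f"forecast {p:.0%}: observed {sum(outcomes) / len(outcomes):.0%} "
          f"over {len(outcomes)} forecasts")
```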

Additionally, there is a dependence on a longer term understanding of how forecasting skill generalizes. For instance, whether a forecaster who is skilled in one subject is also improved in other subjects. This seems to be the case from my own observations, but academic guidance would be helpful.

Need to better express “skill” when there are infrequent, directly applicable forecast opportunities.

Comparison to Established Industry: Cyber security is behind on porting well established risk and safety culture concepts over to our space. We’d be smart to regulate and centralize data breach and incident notification into a smaller group of repositories of causal data. This would reduce the need for forecasting at all.

This is a divisive issue for me: improve the expert as an approximation tool, or improve the collection of data as a statistical one? Both will ultimately be useful and bolster the other; they go hand in hand in other fields.

Need to figure out which side, data gathering efforts or uncertainty efforts, is better for me to pursue.

Cost of Measurement: In many of my practical use cases, it’s overkill to use every known method to defend against the dangers of expert opinion. We ultimately work in fast moving environments that leave limited time to implement mitigations in day to day security engineering.

For instance, take the Benjamin Franklin quote:

“For every minute spent organizing, an hour is earned.”

This wonderful quote assumes that our organization efforts can be done efficiently enough to earn us an hour. It maybe suggests that two minutes of organizing wasn’t worth it?

Which approximation methods are efficient enough to hit this ratio of measurement effort? We must aim to have leftover resources and patience to execute on the decisions made from the data we’ve gathered in risk assessment and analysis.

We can’t be bogged down with endless meetings, consultants, certifications and training in an effort to somehow make ourselves more efficient.

Quantitative approaches sometimes have higher costs, sometimes lower. Having clear expectations of the cost for each use case is important.

Quantitative feels heavyweight: It’s been very important for me to attack the overall burden of forecasting. It needs to be reduced to a very small cognitive effort to become a more continuously leveraged practice. These methods quickly get out of reach if an engineer is required to over-model a problem, or to select and train a panel. It’s easier to just start putting out fires however possible, using a checklist and whatever intuition is available.

Kelly Shortridge sums this attitude up nicely in a discussion: “that sounds hard, let’s go shopping”. Risk measurement has to sound like an achievable method if we’re to prevent engineers from going shopping.

Risk measurement needs to be possible in a Slack channel if it is going to be available in fast moving decision environments.

Operational Forecasting: Many engineers become very excited when they go down the risk quantification road. The fact that a risk assessment can be empirically tested is something engineers aren’t always aware of. They become very excited about its potential, but it is a rough road to operationalize a risk forecast over time.

Once the novelty wears off, forecasting starts to feel like a distraction from “real work.” We have to forecast… again? We just did one!

We should assume that engineers just want to write code with the least amount of overhead possible.

Quantitative approaches need to be repeatable, simple, and have a feeling of usefulness for those involved.

The journey is the destination: I’ve found that panel forecasts are really valuable for the discussion itself. Some participants don’t care about the numeric output; they care about the decomposition process. Group discussion often sees outlandish opinions or misunderstandings reined in. A scenario gives everyone a target for conversation that doesn’t force agreement, and everyone’s cards are shown.

The unified “risk” direction for a group may be more useful than the measurement.

Weaknesses in Quantitative Methods: I’ve found strong research directions that are wholly anti-quantitative and have reasonable justifications for that position. I’ve avoided being overly quantitative with my approaches, but this industry is over-indexed on qualitative methods and needs us to push its limits into new horizons.

The strongest arguments against quantifying risks are rooted in cognitive and systems-complexity (book) critiques.

The cognitive arguments defend certain aspects of intuitive decision making and encourage investment in them. For instance, well designed checklists and smart heuristics.

The systems-complexity debates point out that risk is so inherently complicated in any large system that it can’t ever be captured by a probabilistic risk assessment.

The problem I have is that I agree with both of these debate areas. I absolutely see them as limiting the usefulness of quantitative models. My main agreement is with how far down a probabilistic rabbit hole some risk assessments will go. They eventually lose sight of the fact that whatever subjective hypothesis was chosen for analysis will not capture enough causal factors to begin with. Eventually we become fooled by randomness, with Nassim Taleb ushering us into a hell of /dev/urandom. Ultimately, quantitative vs qualitative is a problem with the capability of models, all of which were wrong anyway.

Anti-quantitative perspectives don’t eliminate the usefulness of quantification. They do pull back heavily on the “when?” and the “how?”. I need to figure out guidance for those questions.

Quantitative follows Qualitative: No amount of my research so far can eliminate the usefulness of cyber security’s qualitative approaches.

The big problem is that checklists are high value and efficient… until they’re not.

It’s a bit of a Ship of Theseus problem. At some point, the models we use no longer capture risk and guide mitigation like they once did. This is seen very clearly at large and complex organizations. It is difficult to answer… when did they get there?

The risk between the checks can be so large that it makes checklists look ridiculous. Even now, a risk like Account Takeover isn’t addressed in any maturity model, best practices document, or regulatory or compliance standard that I’m familiar with. I’m sure this will change. These risks mostly lived without directly applicable best practices for a long time. They lived between the checkmarks. Teams need some way of reasoning about them, which makes quantitative approaches very attractive.

Qualitative approaches eventually bolster quantitative ones, and vice versa.

Measurement Education: I really wish my education had included a foundation in what measurement fundamentally is. I really wish everyone I spoke to had it, too. I spend more time evangelizing basic measurement concepts than I do getting into the opportunities of measuring risk.

The concept of measurement is really useful for the uncertain topics we work with where recorded data is absent. If I could wave a magic wand, I would give everyone in our industry a primer on metrology. I approach this subject here. It’s been the most important foundation for my work so far.

ToDo: Study and teach how uncertainty applies to cybersecurity

Problems Come First: The most effective and purposeful risk measurement exercises I’ve been involved with seem to come from people who have skin-in-the-game and are actively mitigating a risk themselves.

The opposite is the case when people are excited to try a hot new risk measurement approach: they bumble around looking for a reason (a risk) to play with their toy. I’ve fallen into this repeatedly while searching for a problem to explore.

This is a big reason why I aim to work directly with engineers, and not compliance or risk managers.

Engineers prefer measuring risks they actively mitigate. Not the ones they’re asked to measure by some outside bureaucracy.

Tabletops were hiding something: The components of quantitative risk measurement are already in practice with security teams that tabletop. They propose a scenario and role play it. It’s a natural next step to learn the practice of decomposition and quantification of their scenario.

Tabletop exercises are my preferred quantitative risk measurement gateway drug, and I will probably use them at the beginning of any risk quantification project going forward.

3. What is next?

I view this as a systemic problem with many opportunities for improvement. Here is the best summary of my goals in contributing to a 50 year solution:

  1. Observe a security engineer, tech lead, manager, director, and CISO represent risks as the probability that the bad event(s) they care about will occur in a given timeframe (see the sketch after this list).
  2. Encourage industrial insurers, regulators, auditors, or decision platforms to publish probabilistic opinions based on incident analysis. This will help hold us accountable for being wildly wrong or fraudulent in risk predictions. This provides trust in #1.
  3. Encourage an increased trend in disclosure as a safety culture value. Encourage the development and enforcement of breach notification laws. Better regulation will centralize incidents and root causes. This provides trust in #2.
  4. Build imposed risk models that can be efficiently included in a company’s quantification of risk. Companies will properly budget and prioritize a security organization and its potential harm to the world as they become inclusive of societal risk instead of just their own losses. Help a company measurably show that it works to protect its customers’ interests, not just its own.
  5. Show that we can request risk measurements from organizations we’d normally envelop in a qualitative compliance process. “What’s the likelihood that you’ll lose this data in 5 years?” can be immediately responded to without resorting to a normative theater of checklists and certifications.
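
To make #1 concrete, here is a minimal sketch of one way (among many) to write such a risk down: a plainly worded bad event, a timeframe, and a probability. The scenario, date, and number are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RiskForecast:
    scenario: str        # the bad event, worded so it can later be judged true or false
    by: date             # the timeframe the probability applies to
    probability: float   # the panel's or individual's estimate that it occurs by that date


# Illustrative only: the scenario, date, and probability are made up.
example = RiskForecast(
    scenario="An attacker gains write access to our production deploy pipeline",
    by=date(2020, 8, 1),
    probability=0.05,
)

print(f"{example.probability:.0%} chance by {example.by}: {example.scenario}")
```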

Conclusion

I’ve got seemingly endless work to do showing proofs of concept with these methods in real world settings. It’s idealistic to expect cybersecurity to hit all of these points. Other industries haven’t hit all of them either. In any case, you gotta dress for the job you want. I think these goals are OK to think about in the meantime.

Ryan McGeehan writes about security on scrty.io
