Hypothesis, Risk, and Science

Ryan McGeehan
Apr 6, 2020

My hope is that the cyber security community will develop as a risk science.

Science starts with correctable claims. Progress towards more useful knowledge comes from continuous corrections.

However, a risk hypothesis may represent future events that have never previously happened, might not ever happen, or may not be observed when they do happen. How can we pursue science with an unstable vantage point?

A risk hypothesis conflicts with common notions of science when compared to our expectations of the more mature sciences. The following essay hopes to lay out some of these difficult concepts and describe a place for us to become a risk science.

A risk hypothesis complicates scientific methods.

An example hypothesis is easy to formulate and consider experiments for:

This new fertilizer will make the plant grow faster.

Experimentation may take place in the greenhouse. Soils, fertilizers, and plants are readily available. We can simultaneously run thousands of experiments. We carefully control light, air quality, and watering. We repeat these tests, only limited by cash, time, and effort. Then we measure, and interpret those measurements to decide whether the hypothesis can be confidently confirmed or rejected. New soils and old soils. One might grow faster!

This example can deceive us into thinking all science must meet a similar, straightforward standard of control. Not so. An arbitrary level of control of an uncertain problem is not the demarcation of science. Some increasing amount of control over our previous uncertainties is a victory of long practiced science, but not the gatekeeper to starting towards it.

Let’s consider that risk is the probability and impact of future events, represented by a probability distribution or an expected value. All risk is some expression of uncertainty. As we are measuring things that haven’t happened yet, or might not ever happen, and we may not even see if they happened… we are stuck with a highly uncertain and problematic scientific realm. Fine.

Uncertainty does not evade science.

1. The forecast as a risk hypothesis.
Defining clear targets for research, mitigation, and correction.

Someone might inspect engines if they are isolating what makes a car go faster.

Miles-per-hour may be selected as the target for measurement if we want to understand whether the engine makes a difference in how fast a car goes. In risk, we need reasonable values to target that represent the reduction of undesirable future outcomes.

When I patch, an unwanted outcome is reduced.

A forecast is the best available term to represent a belief that a future scenario or impact may occur. A security engineer wants to avoid a breach. In a sense, their work is reducing probabilistic uncertainty just as others are making cars faster by increasing miles-per-hour. Forecasts are compatible with probability or impact in risk = probability * impact. Forecast values come from rich data (good), simple models (good), experts (meh), or some hybrid of these (ideal).
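As a minimal sketch of risk = probability * impact, the Python below treats a forecast as a small probability distribution over outcomes and reduces it to an expected value. The outcome names, probabilities, and dollar impacts are invented for illustration and do not come from any real data.

```python
# Hypothetical forecast: each outcome has a probability and an impact (loss in USD).
# All numbers are illustrative assumptions, not measurements.
outcomes = [
    ("no incident", 0.90, 0),
    ("minor incident", 0.08, 50_000),
    ("major breach", 0.02, 2_000_000),
]


def expected_loss(outcomes):
    """Expected value of the loss distribution: sum of probability * impact."""
    total_probability = sum(p for _, p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * impact for _, p, impact in outcomes)


risk = expected_loss(outcomes)
print(f"expected loss: ${risk:,.0f}")
```

A single expected value hides the shape of the distribution, so it may be worth keeping the full list of outcomes alongside it.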

Example: We will disclose an incident to a regulator (90% / 4 Years) and suffer monetary losses.
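One way to read a forecast such as 90% / 4 Years is to convert it to an equivalent per-year probability. The sketch below assumes the chance is the same and independent in every year, which is a simplifying assumption of this example rather than something the forecast itself states. The function names are hypothetical.

```python
def annualized_probability(p_horizon: float, years: float) -> float:
    """Per-year probability implied by a multi-year forecast,
    assuming each year is independent with the same chance of the event."""
    if not 0.0 <= p_horizon <= 1.0 or years <= 0:
        raise ValueError("need 0 <= p_horizon <= 1 and years > 0")
    # P(no event in N years) = (1 - p_year) ** N, solved for p_year.
    return 1.0 - (1.0 - p_horizon) ** (1.0 / years)


def horizon_probability(p_year: float, years: float) -> float:
    """Inverse: probability of at least one event within `years`."""
    return 1.0 - (1.0 - p_year) ** years


p_year = annualized_probability(0.90, 4)
print(f"{p_year:.1%} per year")
```

Under that assumption, 90% over four years corresponds to roughly 44% per year, which makes forecasts with different horizons easier to compare.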

But… probability is a complicated topic.

Risk has hard conceptual limitations when compared with typical scientific environments. Risk is unwieldy in the face of measurement, but not invulnerable to it. That doesn’t mean we’ve lost the opportunity of scientific methods altogether. Rather, we lean on scientific principles in the face of uncertainty, lack of data, and severe errors.

Let’s talk about these limitations and troubles in treating forecasts as scientific hypotheses in risk.

2. The difficulty of future events as testable measurements.
Science is resistant and fearless in the face of uncertainty in any form.

There are strict limitations and inconveniences while measuring risk.

A risk hypothesis targets an event occurring within a designated time. There may be only a single opportunity to confirm that measurement, and it will come and go. Unlike our greenhouse, we cannot test repeatedly without also having a time machine.

Note: This essay is concerned with rare, high impact events.

The risks we care about are only realized in real time: an event will only ever have happened right at that moment. It won’t happen at that time or under those conditions ever again. It was a point in time, one that will never be repeated. That means one opportunity is all that is available to observe a forecast and measure its error. Only a time machine would allow us to grow the plants under different conditions. What once was conveniently repeatable is now highly restrictive.

This limitation means we must be able to “observe” a scenario with our precious opportunity to test a forecast. It raises a question:

Were we hacked, but didn’t know?

Issues of observability persist in our experiments: Did the scenario occur outside of our visibility? Was our data breached while logs were sidestepped or deleted? We might not trust our risk measurements due to any distrust of our observational capability.

There is also the case that we are impacted by scenarios that were never hypothesized at all. Should a science of risk pursue resilience against unknown, unforeseen threats and vulnerabilities? Certainly, we should at least be in the business of preventing things we didn’t outright hypothesize. Are we in this business too?

Lastly — what use could there be in measuring future bad things if our adversary is intelligent? What if our predictions leak? Will all adversaries perpetually and instantly change directions entirely to avoid our best shot at mitigating them?

Observability issues, measurement challenges, and failures of imagination are typical frustrations in any science. Let’s keep pushing.

3. Uncertainty attracts data, scrutiny, and tests.
A risk science won’t resemble a laboratory.

The early beginnings of a science never resemble its modern form. Any comparison surfaces deep flaws: overly qualitative analysis, poor repeatability, and claims that are often refuted or corrected in hindsight. We’re only OK with this because correctable claims need to start somewhere. We celebrate scientists that have allowed us to journey down paths that begin with their discoveries.

The discovery of oxygen (gutenberg) is considered historically important scientific progress, though the original writings read as laughably non-scientific in comparison to modern sciences. So… why aren’t we laughing at their work? The discovery is supported by almost entirely qualitative and untestable measurements:

  • But the most remarkable of all the kinds of air that I have produced by this process is, one that is five or six times better than common air.(*)

It goes without saying that this sort of scientific work would not meet today's standards. But we don’t ridicule this progress. We celebrate it!

A pursuit of risk sciences in cyber security must start with very rough edges. The bar always starts low. This is especially true of science in hindsight. As subjectivity, repeatability, and qualitative hurdles appear, they become scientific opportunities, not failures.

Fortunately, we won’t have to start from scratch. We have industries as reference that are farther along this path.

Let’s talk about repeatability.

Einstein’s theory of general relativity invited a hypothesis that light bends around the sun. This was considered an untested hypothesis. It was left unconfirmed for years.

Why was it seemingly untestable? Well, how can we see the sun bending light if we are simultaneously blinded by it (daytime), or if the sun is behind the planet (night time)?

Remember, this is the early 1900s. No spaceships! No satellites!

Arthur Stanley Eddington was the first to say “we can test this, ya dummies!” and he waited for a solar eclipse. He set off on an expedition to measure the positions of stars near the sun from optimal locations during an eclipse. This was the only known opportunity to see stars near the sun. Two teams collected measurements from the West African island of Príncipe and the Brazilian town of Sobral.

Those measurements became the first confirmations of general relativity.

Horribly inconvenient! Only available when an eclipse occurs. Requires us to leave the laboratory. There’s no greenhouse for this science!

This suggests that the demarcation of science does not require easily tested hypotheses. Einstein's work was considered scientific even before it was tested. Similarly, a risk hypothesis does not require convenient testing to be scientific, so long as test conditions are plausibly able to attract error.

Difficulty of a test does not exclude risk as a science. It welcomes it. Uncertain problems invite science. Risk measurement is simply done with a forecast, but our ability to collect error and perform calibration is inconvenient.
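Collecting error on forecasts can be as simple as scoring each one once its time window closes. The sketch below uses the Brier score, a standard scoring rule that this essay does not itself name. The forecasts and outcomes are made up for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and what happened
    (1 = the event occurred, 0 = it did not). Lower is better; 0.0 is perfect."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally sized, non-empty lists")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical forecasts whose windows have closed, with what was observed.
forecasts = [0.9, 0.2, 0.1, 0.6]
outcomes = [1, 0, 0, 1]

score = brier_score(forecasts, outcomes)
baseline = brier_score([0.5] * len(outcomes), outcomes)  # always saying "50%"
print(f"forecaster: {score:.3f}, coin-flip baseline: {baseline:.3f}")
```

Each rare event contributes only one data point, so a score like this becomes meaningful slowly, across many forecasts rather than any single one.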

This becomes more complicated when we measure the risks of things we might not see taking place. Scientists familiar with dark matter are familiar with this headache.

Dark Matter
The study of dark matter is a science with broad observational limitations. We are not sure what dark matter is, whether it exists, or even how to properly measure it. Even the name hardly describes anything in particular at all.

Scientists are certain there is knowledge to be gained in the pursuit of such an evasive topic and they’re likely to be wrong about a lot of it. Dark matter and all scientific activity associated with it are great examples of hard to observe science. At the moment, they’re not really sure if they’re observing anything at all. But, scientists still pursue an understanding of dark matter with science.

Dark matter is a science of unobservable behavior that must be reasoned about through indirect, observable evidence. The resulting evidence is only a rough approximation of the actual problem to be understood. The evidence doesn’t result in direct confirmation or refutation of dark matter. However, it helps scientific conjecture about the nature of dark matter.

Risk is often similar. We consider scenarios that have never occurred before (What if: Efficient integer factoring?) but we often have useful reference data nearby, such as from cryptography research (All the failed attempts to factor integers efficiently). A risk science will sometimes assume unobserved scenarios (Did we miss a breach?), but often not (Darn. Yep. We were breached!), whereas some sciences have never seen their scenario or object of study, ever. We have a growing amount of retrospective observations in cyber security. A qualm about possibly-not-observed adversaries does not strike risk out of scientific demarcation. It invites it!

Another big question is whether a science of risk is a non sequitur, due to it being a study of future things we can’t possibly predict.

4. Broad hypotheses are inclusive of “unknown unknowns”.
A risk hypothesis may include more causes than others.

Some amount of regret exists in our hindsight of bad events. You know… I should have thought of that. Stated differently, we may look back at our previous work and wish we had prioritized risk differently to avoid an event. Maybe we wish we had worked on mitigations with different priority.

Or, perhaps we miss something altogether. Total failure of imagination.

Black Swan Theory
One may consider these failures of imagination to be a limit of a risk science. How could one mitigate, or even study, the inherently unknown? This draws upon Black Swan Theory. Black Swans can only be reasoned about in retrospect, as they weren’t predicted to begin with.

The existence of events that are only tractable in hindsight is troubling. Are we only able to reason from things that have happened before?

Our job is quite the opposite: mitigate future suffering. However, there is hope, as risk involves both the prediction of events and their impacts (R = P * I). Different perspectives on how risk can be decomposed help insulate us from any infatuation with fragile hypotheses. That is, a focus on impacts and outcomes, not exclusively their causes.

Example: Instead of predicting how the stock market could crash (COVID-19), we are better off holding the assumption that any number of events may cause a crash, including, but not limited to, an infectious disease. This perspective avoids the enumeration of initiating events. It is far more effective to develop a hypothesis that is resistant to large failures of imagination by avoiding overly specific hypotheses.
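To make the difference between a narrow and a broad hypothesis concrete, the sketch below compares a forecast tied to one named cause with a forecast for "a crash from any cause". Every probability here is a made-up yearly value, the causes are assumed to be independent, and the allowance for unlisted causes is a hypothetical placeholder for failures of imagination.

```python
import math


def probability_of_any(cause_probabilities):
    """P(at least one cause occurs), assuming the causes are independent."""
    return 1.0 - math.prod(1.0 - p for p in cause_probabilities)


# Hypothetical yearly probabilities for enumerated causes of a market crash.
named_causes = {"infectious disease": 0.02, "credit crisis": 0.05, "war": 0.01}
# Hypothetical allowance for causes nobody thought to list.
unlisted_causes = 0.03

narrow = named_causes["infectious disease"]
broad = probability_of_any([*named_causes.values(), unlisted_causes])
print(f"narrow (one cause): {narrow:.1%}, broad (any cause): {broad:.1%}")
```

The broad forecast stays useful even if the list of named causes turns out to be wrong, because it is scored on the outcome rather than on the initiating event.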

Most cyber security failures are not in black swan territory. We repeatedly suffer from familiar causes. The lessons we draw from black swan theory are still very important!

It’s our ability to formulate hypotheses that allow consideration of unforeseen events that allows us to open up the scope of our science towards usefulness when our creativity is limited.

This is surely related to defense in depth. However, defense in depth has principles that offer resilience in the face of multiple failures. A principle of broad hypothesis highlights resiliency in the face of unknowns. This may just be a rousing cheer for defense in depth, but the width (broad v. narrow) of a hypothesis has other important considerations.

Knowing when to develop broad or narrow hypotheses is surely a burden of expert knowledge. And therein lies the value of knowledge work, the most important principle of a risk science. This comes back to resemble what our world already looks like today: security engineers doing our best.

5. Blue teams pursue risk science at high speed.
Practicing security engineers generally study narrow hypotheses.

Science is often described as a slow process.

“When I think of formal scientific method an image sometimes comes to mind of an enormous juggernaut, a huge bulldozer - slow, tedious, lumbering, laborious, but invincible.” (Robert M. Pirsig)

The scientific method is iterative. Science may be as simple as careful troubleshooting, or as complex as long term academic marathons. The length of these iterations is not relevant in examination of scientific demarcation.

Fluctuation in an area of knowledge will reduce the lifetime of a useful hypothesis and how much research is exerted against it. Our field fluctuates often in some ways, but not in others.

Malware, phishing, software exploitation, and authentication weaknesses are fruitful areas of long term study, for instance. We’ll discuss these as broad hypotheses:

What is modern malware capable of? How often are companies breached? How would I exploit x86 software? How strong is cryptography to known attacks? Has 2FA been effective for customers?

A broad hypothesis can sustain and support research over a long period. These studies may produce a well informed best practice. Or, attract data for widely seen events. Entire conferences devoted to a subject like malware research or exploit development have survived for decades in their spaces.

A broad area of study has very few conditions associated: They are resilient topics in the face of new adversaries, methods, technology, and victims.

However, broad hypotheses are not really in a blue team’s interests to dedicate time towards. A blue team’s environment or potential impacts add numerous conditions that shorten the duration of study and breadth of interest in their hypothesis. It makes their hypothesis narrower.

A blue team is generally concerned with topics like:

What vulnerabilities are exposed to this network from this other network? Should I work on corporate authentication or client-side hardening for the next month? Does this new product introduce vulnerabilities in this other product? Were there any persistent adversaries on this host that may have since eliminated their presence and evidence?

Typical blue team work is a study of these narrow hypotheses to inform decision making on a short life cycle. They can only exist with rapid hypothesis formulation, measurement, and communication of research findings. These limitations force us to rely on highly approximate and expert elicited measurements, very distant reference class data, and other approaches we would consider less desirable than what is available to broad hypotheses. It’s still science.

An example of a narrow hypothesis may include a vulnerability finding in a specific product. A finding may be rapidly discovered, associated with a hypothesis like “will this be exploited?”, and measured with expert judgement (or, if applicable, data) with a forecast roughly based on Exploited / Time.
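One possible reading of Exploited / Time is as a rate that can be turned into a forecast for a chosen window. The sketch below treats exploitation as a constant-rate Poisson process, which is an assumption of this example and not a method the essay prescribes. The function name and the reference counts are hypothetical.

```python
import math


def exploitation_forecast(exploited: int, observed_years: float, horizon_years: float) -> float:
    """Probability of at least one exploitation within the horizon,
    treating Exploited / Time as a constant Poisson rate (an assumption)."""
    if exploited < 0 or observed_years <= 0 or horizon_years < 0:
        raise ValueError("counts and durations must be non-negative, observed_years > 0")
    rate = exploited / observed_years  # events per year
    return 1.0 - math.exp(-rate * horizon_years)


# Hypothetical reference data: 2 similar findings were exploited over 5 years.
p = exploitation_forecast(exploited=2, observed_years=5.0, horizon_years=1.0)
print(f"forecast for the next year: {p:.0%}")
```

With so few observations, a number like this is best treated as a starting point that expert judgement then adjusts for the specific product and environment.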

This suggests there is absolutely no replacement for expert knowledge in day to day, narrowly focused blue team work. As risk measurements become more nuanced and conditional, we come to rely on expertise and general knowledge in the absence of empirical data, statistics, or tooling.

The value of a broad hypothesis is also clear. With broad hypotheses, we allow for developed strategies that are inclusive of unknown risks and allow us to confirm their worth with a collaborative, scientific community.

As experts we will need to argue about, and measure, which ones are better in which situation. This will surely be a staple of our practice.

Any given science looks very different from the next. What would make a risk science different? We are similar to other sciences in terms of observability, difficult testing, and uncertainty of future events. We develop broad hypotheses that may create reference classes, best practices, and data. However, we also have narrow hypotheses that move quicker and are more approximate, expert based, and tactical. We’re able to make correctable statements with a forecast just like any well formulated hypothesis.

Risk makes our work look different, but we can still pursue it as a science.

Ryan McGeehan writes about security on scrty.io