A recent MIT press release proclaimed that a “System predicts 85 percent of cyber-attacks using input from human experts”. But why is that significant? The release touts a new technology, developed by MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) together with PatternEx, and details some interesting results. The system is called AI2, and the release states the following:

“The team showed that AI2 can detect 85 percent of attacks, which is roughly three times better than previous benchmarks, while also reducing the number of false positives by a factor of 5. The system was tested on 3.6 billion pieces of data known as “log lines,” which were generated by millions of users over a period of three months.”

Before we talk about the work’s significance, let’s first take a quick look at how threat analysis is typically done today:

  1. Traditional threat analysis systems rely on humans to make decisions, often with minimal or no contextual information, or with a sea of it, which can impair the decision-making process.
  2. When threat behaviors can be easily identified, humans put rules in place (think if-then-else logic) to deal automatically with low- and sometimes high-severity threats. See an article on CEP for some insights on this.
  3. Approaches 1 and 2 break down in today’s dynamic environment, where threat actors from all walks of life do nothing but hone their skills on a daily basis. This translates to shifting tactics, techniques and procedures (TTPs), which can be difficult for the unaided human mind and static rules to keep up with.
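
To make the limitation of static rules concrete, here is a minimal sketch of the if-then-else style described in item 2. The `LoginEvent` type, the threshold, and the country list are hypothetical values chosen only for illustration; they do not come from the MIT release or from any real product.

```python
from dataclasses import dataclass


@dataclass
class LoginEvent:
    user: str
    failed_attempts: int
    country: str


# Hypothetical threshold and country list, for illustration only.
MAX_FAILED_ATTEMPTS = 5
WATCHED_COUNTRIES = {"XX"}


def static_rule(event: LoginEvent) -> str:
    """Classic if-then-else rule: fixed conditions, fixed responses."""
    if event.failed_attempts > MAX_FAILED_ATTEMPTS:
        return "block"
    elif event.country in WATCHED_COUNTRIES:
        return "alert"
    else:
        return "allow"


noisy = LoginEvent("alice", failed_attempts=9, country="US")
# An actor who shifts tactics to stay at the threshold slips past the rule.
low_and_slow = LoginEvent("bob", failed_attempts=5, country="US")

print(static_rule(noisy))
print(static_rule(low_and_slow))
```

The rule catches the noisy attempt but lets the second one through, and it will keep doing so until a human notices and rewrites the threshold by hand.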

You might be thinking, “let’s get the human out of the loop and put smart systems in place!” But this isn’t how the future of Cyber Threat Analysis is going to be fought or won. People have to be in the loop. Humans bring experience, education, insight, judgment, and observation skills to the cyber knife fight. Where humans can fall short is in speed and time to discovery/resolution, consistency, accuracy, and, for less seasoned analysts, experience. Machine learning can help bridge the gap in these areas.

The CSAIL project is interesting in that it integrates machine learning in the loop with human analysts. This is significant for several reasons:

  1. There are typically two types of machine learning algorithms used in, among other arenas, cyber: unsupervised learning and supervised learning.
    1. Unsupervised learning takes a set of data and tries to find interesting sequences or patterns within it. These patterns, which are found using statistics and other mathematical constructs, then have to be reviewed by a human for accuracy. Think about it like this: unsupervised learning can tell you that something happened, but not why it happened or whether it is even significant. That is where the human comes in to validate the results.
    2. Supervised learning, as the name suggests, relies on humans to classify a data set ahead of time. These classifications are then applied to never-before-seen data in an attempt to determine what that data is. Think about it like this: a human can classify a mammal as giving live birth, producing milk to feed its young, etc. Then, when a system is presented with previously unseen data, it can look at the pre-created classifications to determine whether the data closely resembles a mammal, a reptile, or something not yet classified. Depending on whether the system gets it right, the human can correct the prediction so that classification gets better over time. This is an ongoing and iterative process.
  2. Keeping the human in the loop allows the output of the unsupervised part of the system, which needs investigation and verification, to be promoted to the next stage of the system or discarded if need be.
  3. The supervised part of the system gets better / more intelligent at classification because it feeds directly off of how humans classify threats in the running system. As analysts get better and more accurate, so do the supervised classifications.
  4. A nice benefit of a platform like this is that it can be used to train new analysts. Senior analysts can review how a junior analyst resolved a cyber threat and compare it to the results of the learning system. If the senior analysts deem that the system resolved a similar, or identical, threat correctly and the junior analyst did not, they can show the junior analyst what went wrong.
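
The loop described in the list above (unsupervised detection, analyst review, supervised classification, analyst correction) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the AI2 algorithm, which the release does not describe in detail. The single feature (failed logins per hour), the z-score outlier step, the nearest-centroid classifier, and the `analyst_label` stand-in for a human analyst are all hypothetical choices.

```python
from statistics import mean, pstdev


def outlier_scores(values):
    """Unsupervised step: distance of each value from the mean, in standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [abs(v - mu) / sigma for v in values]


def top_k_outliers(values, k):
    """Indices of the k most unusual values, to be shown to an analyst."""
    scores = outlier_scores(values)
    return sorted(range(len(values)), key=lambda i: scores[i], reverse=True)[:k]


def train_centroids(labeled):
    """Supervised step: average feature value per analyst-assigned label."""
    by_label = {}
    for value, label in labeled:
        by_label.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in by_label.items()}


def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def analyst_label(value):
    """Hypothetical stand-in for a human analyst's judgment."""
    return "attack" if value >= 50 else "benign"


# Hypothetical feature: failed logins per hour, one entry per user.
observed = [2, 3, 1, 2, 4, 3, 2, 95, 3, 60, 2, 1]

# 1. The unsupervised step surfaces a short list of unusual events.
flagged = top_k_outliers(observed, k=3)

# 2. The analyst reviews only the flagged events and labels them.
labeled = [(observed[i], analyst_label(observed[i])) for i in flagged]

# 3. The supervised step learns from those labels.
centroids = train_centroids(labeled)
first_guess = predict(centroids, 45)

# 4. The analyst corrects a wrong call (say, a known batch job at 40/hour),
#    and the model is retrained with the correction included.
labeled.append((40, "benign"))
updated = train_centroids(labeled)
second_guess = predict(updated, 45)

print(first_guess, second_guess)
```

In this sketch the analyst only ever looks at three events rather than all twelve, and a single correction shifts how the classifier treats similar events afterward. That is the division of labor the list describes: the machine narrows the field, and the human supplies the judgment that improves the next round.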

Given this new and exciting technology, here are some things to think about:

  1. How can the Navy or DoD make use of a platform like this?
  2. Can you think of ways such a platform could be used beyond defensive cyber operations?
  3. Do you agree that human-centered machine learning is the future for cyber threat analysis?