Once upon a time I wrote an article about log data and threat analysis. Let’s see how much has changed in 10 years.
Ten years ago I authored an article about using log data (no, not the wooden kind of logs) for threat analysis; it’s on pages 17 through 26 if you care to read it. As I am thinking about writing another book (actually several), I have been dusting off some old material. I thought it would be interesting to review this older work and see what has changed, what still holds, and what is no longer in vogue. Let’s take a look.
First off, it’s not just about log data anymore
These days it is about collecting anything and everything you can get your hands on. This includes flow data, endpoint data like process call trees, memory dumps, and many other sources of data. A lot of these systems don’t generate log data the way us older folks are used to. Syslog is still the predominant protocol, but of course there are still vendor-specific collection mechanisms like Check Point’s OPSEC SDK.
Threat analysis using log data requires 5 steps: Configure, Understand, Collect, Prepare and Analyze
Most of this is true today. In order to receive data from a source, you have to configure it correctly. While normalization is still done today as part of structuring and understanding your sources of data, it is more often the case that data received from a source is not normalized at all, or that algorithms are used to determine its type (i.e., its source). The word event still denotes a log message or piece of data which has been normalized into a form a system can more easily reason about. Events are still used most of the time, but raw log messages and alerts are now analyzed alongside them.
Log Data Transmission Types
I haven’t seen either SNMP or SMTP used in a LONG time. And the internal format and storage of the Windows Event Log has changed several times in 10 years, to the point that it no longer resembles what it was in years past.
Log Gathering Architecture
Not a lot has changed here. For distributed environments, you still need collection points for your sources of data. The trend these days is to forgo normalization on these collection devices in favor of more complex analysis in a cloud-based cluster of servers running Hadoop, Accumulo, Spark, [insert current buzzword tech here], etc. Effectively, your distributed (or remote) collection points become dumb store-and-forward boxes.
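To make "dumb store-and-forward box" a little more concrete, here is a minimal sketch in Python. Everything in it is my own illustration and not taken from the original article: the `StoreAndForwardBuffer` class, the `spool` table, and the `send` callback are hypothetical names, and SQLite simply stands in for whatever local spool a real collector would use. The point is only that the collection point keeps each record exactly as received and ships it upstream when it can.

```python
import json
import sqlite3
import time


class StoreAndForwardBuffer:
    """Hypothetical durable FIFO spool: store raw records locally, forward later."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to survive restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS spool ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "received REAL, source TEXT, raw TEXT)"
        )

    def store(self, source, raw):
        # No parsing or normalization here: the record is kept as received.
        with self.db:
            self.db.execute(
                "INSERT INTO spool (received, source, raw) VALUES (?, ?, ?)",
                (time.time(), source, raw),
            )

    def forward(self, send, batch_size=100):
        """Ship the oldest records via send(); delete only those that were sent."""
        rows = self.db.execute(
            "SELECT id, received, source, raw FROM spool ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        sent = 0
        for row_id, received, source, raw in rows:
            payload = json.dumps(
                {"received": received, "source": source, "raw": raw}
            )
            try:
                send(payload)
            except OSError:
                break  # upstream unreachable: leave the rest spooled, retry later
            with self.db:
                self.db.execute("DELETE FROM spool WHERE id = ?", (row_id,))
            sent += 1
        return sent

    def pending(self):
        """Number of records still waiting to be forwarded."""
        return self.db.execute("SELECT COUNT(*) FROM spool").fetchone()[0]
```

In use, `send` would be whatever actually moves bytes upstream (a socket write, an HTTP POST, a message-queue producer). If it raises an `OSError`, the records stay in the spool until the next call to `forward`.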
Another area which has changed is that of data and event transmission. In years past it was cool to write your own protocol on top of TCP/IP, which afforded you strict control over things like authentication, security, encryption, etc. These are all good things. But these days there are many off-the-shelf tools and libraries that make this much easier. One of the more attractive technologies today is Kafka, which is a high-performance message-handling system.
Preparing Log Data for Analysis
This section of the article talks about event creation (think normalization) and various other attributes that aid further analysis and reasoning about things happening in and out of your network. Suffice it to say that much of this section is still applicable today.
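For readers who have not done event creation by hand, here is a small sketch of what normalization can look like. It assumes a classic BSD-style syslog line (`<PRI>Mmm dd hh:mm:ss host app[pid]: message`); the `Event` fields and the `normalize` function are names I made up for this example, not the schema from the original article. Real sources are far messier than one regular expression can handle.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed input shape: "<PRI>Mmm dd hh:mm:ss host app[pid]: message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<app>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


@dataclass
class Event:
    """Hypothetical normalized event; field names are illustrative only."""
    timestamp: str
    host: str
    app: str
    pid: Optional[int]
    facility: int
    severity: int
    message: str
    raw: str  # keep the original line so nothing is lost in normalization


def normalize(raw: str) -> Optional[Event]:
    """Turn one raw syslog line into an Event, or None if it does not parse."""
    m = SYSLOG_RE.match(raw.strip())
    if m is None:
        return None
    pri = int(m["pri"])
    return Event(
        timestamp=m["timestamp"],
        host=m["host"],
        app=m["app"],
        pid=int(m["pid"]) if m["pid"] else None,
        facility=pri // 8,  # syslog PRI encodes facility * 8 + severity
        severity=pri % 8,
        message=m["message"],
        raw=raw,
    )
```

Keeping the `raw` line next to the parsed fields is a deliberate choice here: it lets later analysis fall back to the original message when the normalized form turns out to be too lossy.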
Analyzing Events for Threats
Okay, this is kind of where the rubber meets the road. Ten years ago rule-based systems ruled. For example, “if you see X or Y or Z, do something.” Statistical correlation, using the most basic of statistical methods, was available and baked into rule engines, but often didn’t provide much value. Almost no one in the commercial or service provider space was using machine learning. Fast-forward several years and we start to see the advent of “big data” and problems of a scale under which modern systems (storage- and analytics-wise) started to crumble. The cyber security space went through a transformation as traditional SIEM platforms became too expensive and too bulky, and plain no longer worked.
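As a rough illustration of the two older styles mentioned above, here is a sketch of an “if you see X or Y or Z” rule next to one of the most basic statistical checks, a z-score on event counts. The function names, the watched application list, and the threshold of 3 standard deviations are all assumptions of mine for the example, not something from the original article or from any particular product.

```python
from statistics import mean, stdev


def rule_match(event, watched_apps=("sshd", "sudo", "su")):
    """Classic rule: 'if you see X or Y or Z, do something.'

    `event` is assumed to be a dict with 'app' and 'message' keys.
    """
    return event["app"] in watched_apps and "Failed" in event["message"]


def is_outlier(history, current, threshold=3.0):
    """Basic statistical check: is the current count far above the usual level?

    `history` is a list of past counts per interval (for example, failed
    logins per hour); `threshold` is in standard deviations.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu  # perfectly flat history: any increase stands out
    return (current - mu) / sigma > threshold
```

The rule fires on a single matching event, while the statistical check only speaks up when the volume departs from its own history. That is roughly why the latter, in its simplest form, often added little on top of a well-tuned rule set.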
Between researchers at major commercial companies, MSSPs, universities, DoD agencies, and others, we started to see the emergence of new and sometimes novel techniques for analyzing huge volumes of data. These techniques employed some well-known and well-trusted machine learning algorithms, but sometimes in interesting and fresh ways.
We’ve already talked about some of these machine learning techniques, as they apply to cyber, on this blog. We will continue to post about these and other developments in cyber, with an eye toward raising awareness and facilitating thoughtful discussion.
Questions to consider:
1. Do you agree with my assessment?
2. What did I miss?
3. How has your experience with cyber threat analysis changed over the past 1, 5, 10 or more years?