Wednesday, August 18, 2010

Scalability: A Big Headache

by Dominik Weber

About the Author

Dominik Weber is a Senior Software Architect for Guidance Software, Inc.

In this month's installment, I will take a break from a specific problem and talk about a fundamental issue with deep forensics: Scalability.

Scalability is simply the ability of our forensic tools and processes to perform on larger data sets. We have all witnessed the power of Moore's law. Hard drives are getting bigger and bigger. A 2 TB SATA hard drive can be had for well under $100. With massive storage space being the norm, operating systems and software are leveraging it more and more. For instance, my installation of Windows 7 with Office is ~50 GB. Browsers cache more data, and many temporary files are being created. Since Windows Vista introduced the TxF layer for NTFS, transactional file systems have become the norm, and the operating system keeps restore points, Volume Shadow Copies and previous versions. Furthermore, a lot of the old, deleted file data no longer gets overwritten.

This "wastefulness" is a boon to forensic investigators. Many more operating system and file system artifacts are being created. Data is being spread out across L1, L2 and L3 caches, RAM, Flash storage, SSDs and hard drive caches. For instance, the thumbnail cache now stores data from many volumes, and Windows Search happily indexes a lot of user data, creating artifacts and allowing analysis of its data files.

That was the good news. The bad news is that most of this data is in more complex, new and evolving formats, requiring more developer effort to stay current. For instance, I am not aware of any forensic tool that analyzes Windows Search databases - not that I have had time to look (if you know of such a tool, please post in the forum topic - see below). Worse than that is the need to thoroughly analyze the data. Traditionally, the first step is to acquire the data to an evidence file (or a set thereof). The data must be read, hashed, compressed and possibly encrypted. All this takes time, despite new multi-threaded and pipelined acquisition engines appearing (for instance in EnCase V6.16). High-speed hardware solutions are also more prevalent. Luckily, this step is linear in time, meaning that acquiring a full 2 TB hard drive will take twice as long as acquiring a full 1 TB drive. Note that unwritten areas of hard drives are usually filled with the same byte pattern (generally 00 or FF), and these areas compress highly, yielding faster acquisition rates.
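To make the read, hash and compress steps concrete, here is a minimal Python sketch of a single-pass acquisition loop. It is only an illustration, not how EnCase or any other tool works: the `acquire` function, the 64 KB block size, and the choice of SHA-256 and zlib are all assumptions made for the example. It also shows why a blank (00-filled) region stores in far less space than a region full of varied data.

```python
import hashlib
import io
import os
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical block size; real tools pick their own


def acquire(source, chunk_size=CHUNK_SIZE):
    """Read a stream block by block, hashing and compressing as we go.

    Returns (hex digest of the raw data, bytes read, compressed bytes).
    The work per block is constant, so total time grows linearly with size.
    """
    digest = hashlib.sha256()          # assumption: any streaming hash would do
    compressor = zlib.compressobj()    # assumption: stand-in for the real codec
    bytes_read = 0
    bytes_stored = 0
    while True:
        block = source.read(chunk_size)
        if not block:
            break
        digest.update(block)                               # hash the raw data
        bytes_stored += len(compressor.compress(block))    # compress for storage
        bytes_read += len(block)
    bytes_stored += len(compressor.flush())                # emit buffered output
    return digest.hexdigest(), bytes_read, bytes_stored


if __name__ == "__main__":
    size = 4 * 1024 * 1024
    blank = io.BytesIO(b"\x00" * size)     # simulates an unwritten disk area
    used = io.BytesIO(os.urandom(size))    # simulates incompressible data
    for name, stream in (("blank", blank), ("used", used)):
        _, read, stored = acquire(stream)
        print(f"{name}: read {read} bytes, stored {stored} bytes")
```

In a real acquisition the source would be a device or image file opened in binary mode rather than an in-memory buffer, and encryption would be one more stage applied to each compressed block.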

Read more at or discuss here.
