Friday, September 24, 2010

Programming for Digital Forensics

by Forensic Focus columnist, Chris Hargreaves

About the Author

Dr Chris Hargreaves is a lecturer at the Centre for Forensic Computing at Cranfield University in Shrivenham, UK.

This month I want to discuss programming, specifically whether learning a programming language is useful for a digital forensic practitioner. I have been unable to find any surveys or polls capturing the proportion of practitioners who can program, or the language of choice for those who do. Anecdotally, however, I have been surprised by the low proportion of practitioners who have programming experience. By ‘programming’ I am not suggesting that all practitioners should be re-implementing EnCase, FTK etc., but that, when appropriate, being able to write short, simple scripts could be useful in a digital forensics context.

One motivation for being able to write bespoke code for digital forensics is the ability to extract data from binary file formats. It is fairly uncontroversial to say that digital forensics is a fast-moving field. New versions of operating systems, new applications and new uses of existing applications emerge frequently, and this results in new digital objects that need to be understood. The digital forensics research community can identify relevant digital evidence using a variety of reverse engineering techniques. However, the output of this research may be no more than a schema describing the patterns in the raw data and the rules for interpreting them. If no extraction tool was developed as part of the research, there are several ways to make use of such cutting-edge advancements: manually extract information using the published schema; wait for it to be implemented as a feature in a mainstream forensic package; or develop custom code that extracts information according to the identified data structures. Manual extraction does not scale well to large volumes of data, and waiting for implementation in a commercial package leaves it uncertain when results will be available. Therefore, the ability to write simple, custom code means that data structures that would otherwise be ignored can be interpreted, and more digital evidence can be recovered...
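To make the third option concrete, here is a minimal Python sketch of the kind of short script this could involve. The record layout is entirely hypothetical and invented for illustration (a 4-byte marker `REC1`, a little-endian 32-bit Unix timestamp, a 16-bit name length, then a UTF-8 name); it does not describe any real file format, and the function names `parse_records` and `build_record` are likewise assumptions. A real script would substitute the offsets and types given in the published schema.

```python
import struct
from datetime import datetime, timezone

# Hypothetical record layout (an assumption for illustration, not a real format):
#   offset 0:  4 bytes    marker b"REC1"
#   offset 4:  uint32 LE  Unix timestamp (seconds, UTC)
#   offset 8:  uint16 LE  length of the name in bytes
#   offset 10: name, UTF-8 encoded
HEADER = struct.Struct("<4sIH")
MAGIC = b"REC1"


def parse_records(data):
    """Scan raw bytes for the marker and return (offset, timestamp, name) tuples."""
    results = []
    pos = data.find(MAGIC)
    while pos != -1:
        # Only interpret the header if it fits entirely inside the data.
        if pos + HEADER.size <= len(data):
            _, ts, name_len = HEADER.unpack_from(data, pos)
            start = pos + HEADER.size
            end = start + name_len
            # Skip candidates whose name would run past the end of the data.
            if end <= len(data):
                name = data[start:end].decode("utf-8", errors="replace")
                when = datetime.fromtimestamp(ts, tz=timezone.utc)
                results.append((pos, when, name))
        pos = data.find(MAGIC, pos + 1)
    return results


def build_record(ts, name):
    """Build one record in the hypothetical layout (used only to make test data)."""
    raw = name.encode("utf-8")
    return HEADER.pack(MAGIC, ts, len(raw)) + raw


# Synthetic test data: two records separated by unrelated bytes.
sample = (
    b"\x00" * 3
    + build_record(1285286400, "report.doc")
    + b"\xff\xff"
    + build_record(0, "a.txt")
)

for offset, when, name in parse_records(sample):
    print(offset, when.isoformat(), name)
```

Reporting the byte offset of each hit alongside the decoded values lets the result be checked by hand in a hex viewer, which matters when the output of a bespoke script may need to be explained or verified later.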

