Friday, August 24, 2007

Persistence of documents on file systems

Persistence of data on storage media is always an interesting topic. No one can predict how long a deleted file will remain on a hard disk.
Sometimes during an investigation it is necessary to present in court the history of existing or deleted documents. Today I would like to discuss the behavior of Microsoft Word and how it affects building a timeline of a document's history. Some of the behaviors described here also apply to other applications, such as AutoCAD. I will focus on a method for building a timeline of documents that users edited in the past. In this article I used the NTFS file system, but similar behavior can be observed on FATx file systems. Analysis of document metadata is outside the scope of this article.

1. General behavior of Microsoft Word

Let’s say that we already have a file on the local file system. This means that several clusters are allocated to store the content of the file. To make the explanation concrete, I will use specific cluster numbers. Our file has at least one run list, which starts at cluster number 0x15ca0 (89248).
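To find that first cluster in a raw image of the volume, the cluster number has to be converted to a byte offset. The lines below are a minimal sketch, assuming 4096-byte clusters (a common NTFS default, not a value stated in this article; the real one should be read from the boot sector) and an image that begins at the start of the volume. The function name is only illustrative.

```python
CLUSTER_SIZE = 4096  # assumption: bytes per cluster; verify against the boot sector


def cluster_to_offset(cluster: int, cluster_size: int = CLUSTER_SIZE) -> int:
    """Return the byte offset of a cluster, counted from the start of the volume."""
    return cluster * cluster_size


first_cluster = 0x15CA0
print(first_cluster)                          # 89248
print(hex(cluster_to_offset(first_cluster)))  # 0x15ca0000
```

With a different cluster size, only the second argument changes; the cluster numbers quoted in this article stay the same.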
When we add or remove at least one character in the .doc file and save the change, a new MFT entry and new clusters are allocated to store the metadata and content of the updated file. The newly allocated FILE entry carries the original name of the file. The FILE entry of the previous version is also updated, because that file is renamed to ~WRL????.tmp. At this point we have 2 allocated FILE entries which point to different clusters. (There are more changes in the FILE entry, but they are not so important at this stage; of course, MAC times are always useful ;)).
If we close the file, the MFT entry and clusters of the ~WRL????.tmp file are freed. This means that the operating system can overwrite the content of the entry and the clusters at any time. The picture below shows the clusters previously reserved for the first version of our file, which are now marked as unused. As mentioned above, the first cluster number is 0x15ca0.

The content of the updated file is now stored in new clusters. The first cluster of the first run list is 0x15cef (89327).
When we repeat the above steps (1. open the file, 2. change something and, finally, 3. close it), the result is the same. A new entry in the MFT is allocated (very often the previously freed MFT entry is allocated again, so usually only 2 FILE entries are in use concurrently; I observed this reuse only for MFT entries, not for clusters). New clusters are also allocated for the updated file and the old clusters are freed. In this case we can still recover previous versions, even after hours or days, but as always there is a risk that the free clusters holding previous versions of the file will simply be reallocated by the operating system.
Anyhow, we have to use data carving techniques to find all .doc files on the file system (the header of a .doc file is well known ;)).
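As an illustration of the carving step, here is a minimal sketch that looks for the 8-byte signature of the OLE2 compound file format used by .doc files (D0 CF 11 E0 A1 B1 1A E1) at the start of each cluster. It assumes 4096-byte clusters and an image small enough to hold in memory; the function name and the synthetic test image are hypothetical.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # compound file signature


def find_doc_headers(image: bytes, cluster_size: int = 4096) -> list[int]:
    """Return the numbers of clusters that begin with the OLE2 signature."""
    hits = []
    for offset in range(0, len(image), cluster_size):
        if image[offset:offset + len(OLE2_MAGIC)] == OLE2_MAGIC:
            hits.append(offset // cluster_size)
    return hits


# Synthetic 4-cluster "image" with the signature planted in cluster 2.
image = bytearray(4 * 4096)
image[2 * 4096:2 * 4096 + len(OLE2_MAGIC)] = OLE2_MAGIC
print(find_doc_headers(bytes(image)))  # [2]
```

The same signature is shared by other compound files (for example .xls and .ppt), so every hit still has to be checked by hand. A hit also marks only the start of a file, which may be fragmented.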

2. The Save button (Ctrl + S) while editing documents

Every “save process” invoked by the user creates a new file on the file system, and the previous one is renamed with the ~WRL prefix. For example: the dokument.doc file was edited for some period of time and the user saved changes to the content 4 times. The result is presented below:

The above statements are true only when the user changed the content of the document before invoking the “save process” (save process = pressing the Save button or CTRL + S).
As we can see, all created files are visible during the “editing session”. The content of each file is stored in different allocated run lists. This also means that each file has its own (allocated) FILE entry in the MFT.

Part of the MFT is presented below:

The dokument.doc file is stored in clusters starting at cluster 0x15c3b (89147). ~WRL0003.tmp starts at 0xfaa8 (64168). ~WRL0005.tmp starts at 0x15b06 (88838). ~WRL0656.tmp starts at 0x15bee (89070). The last one, ~WRL1188.tmp, starts at 0x15ba1 (88993).

When the user closes the file, only one file stays visible: dokument.doc. The rest of the files are deleted automatically. Deletion here means that their entries in the MFT and their clusters are freed.
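Whether a FILE entry is still allocated can be read from the flags field in its header. The sketch below relies on the commonly documented NTFS record layout (the "FILE" signature at offset 0 and a 2-byte flags field at offset 0x16, where bit 0x0001 means "in use"); these offsets are not taken from this article. The 1024-byte record size and the helper that builds synthetic records are assumptions made for the demonstration only.

```python
import struct

FLAGS_OFFSET = 0x16   # 2-byte flags field in the FILE record header
FLAG_IN_USE = 0x0001  # set while the entry is allocated


def record_state(record: bytes) -> str:
    """Classify one raw MFT record as allocated, free, or not a FILE record."""
    if record[:4] != b"FILE":
        return "not a FILE record"
    (flags,) = struct.unpack_from("<H", record, FLAGS_OFFSET)
    return "allocated" if flags & FLAG_IN_USE else "free"


def make_record(flags: int, size: int = 1024) -> bytes:
    """Build a synthetic, otherwise empty record (demonstration only)."""
    record = bytearray(size)
    record[:4] = b"FILE"
    struct.pack_into("<H", record, FLAGS_OFFSET, flags)
    return bytes(record)


print(record_state(make_record(0x0001)))  # allocated
print(record_state(make_record(0x0000)))  # free
```

A freed entry keeps its remaining content until the entry is reused, which is why the deleted ~WRL files can still be identified.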

Such behavior allows us to trace the document history. We can easily recover each file because we can identify the FILE entries in the MFT. We can also create the timeline by analyzing the MAC times written inside the FILE entries. It is worth mentioning that these entries and clusters can be reused by other users or processes (because they are no longer allocated).
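NTFS stores timestamps as 64-bit counts of 100-nanosecond intervals since 1601-01-01 UTC, so the raw values have to be converted before they can be ordered into a timeline. A minimal sketch follows; the function names are illustrative, and the file names and timestamps in the sample are invented, not taken from the case described here.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH_TICKS = 116444736000000000  # 1970-01-01 00:00:00 UTC as a FILETIME
TEN_MINUTES = 10 * 60 * 10_000_000     # ten minutes in 100-ns ticks


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert 100-ns ticks since 1601-01-01 UTC into an aware datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def build_timeline(entries):
    """Sort (name, raw_filetime) pairs by time, oldest first."""
    return sorted((filetime_to_datetime(ft), name) for name, ft in entries)


print(filetime_to_datetime(UNIX_EPOCH_TICKS))  # 1970-01-01 00:00:00+00:00

# Hypothetical sample: invented names and timestamps, for illustration only.
sample = [
    ("example.doc", UNIX_EPOCH_TICKS + 2 * TEN_MINUTES),
    ("example_1.tmp", UNIX_EPOCH_TICKS),
    ("example_2.tmp", UNIX_EPOCH_TICKS + TEN_MINUTES),
]
for when, name in build_timeline(sample):
    print(when.isoformat(), name)
```

In practice each FILE entry carries several timestamps, so the same conversion would be applied to each of them before sorting.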

3. Auto-save option

There is one more place from which documents edited by users in the past can be recovered. Microsoft Word has an auto-save feature enabled by default. This feature creates a copy of the document being edited. The default settings are presented below:

When the file is open and its content has been modified, Microsoft Word creates a copy in the “safe location” defined in the “File Locations” tab. The name of the file is “AutoRecovery save of .asd”.
When the user modifies the content of the file, after some period of time (10 minutes by default) Microsoft Word automatically saves the changes to a new file with the same name (“AutoRecovery save of .asd”) and frees the clusters holding the content of the old file. The FILE entry in the MFT is freed as well. In brief, the behavior is similar to what is described in the first part of this article (General behavior of Microsoft Word). The only difference is that Microsoft Word closes and opens the .asd file in the background. It is worth mentioning that new clusters are allocated each time, so the same content exists in at least 2 different locations on the file system (the original and the backup location).
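When going through names recovered from freed FILE entries, it can help to tag the two kinds of working copies discussed in this article. The sketch below matches names against the ~WRL????.tmp pattern and the “AutoRecovery save of ... .asd” pattern. The labels and the function name are hypothetical, and matching is done case-insensitively as an assumption.

```python
from fnmatch import fnmatchcase

# Lower-case wildcard patterns based on the names described in this article.
PATTERNS = {
    "word temp copy": "~wrl????.tmp",
    "autorecovery copy": "autorecovery save of*.asd",
}


def classify(name: str) -> str:
    """Return a label for a recovered file name, or 'other' if nothing matches."""
    lowered = name.lower()
    for label, pattern in PATTERNS.items():
        if fnmatchcase(lowered, pattern):
            return label
    return "other"


for name in ["~WRL0003.tmp", "AutoRecovery save of .asd", "dokument.doc"]:
    print(name, "->", classify(name))
```

A name match is only a hint; the content of each recovered file still has to be examined.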