Regardless of the operating system, every file has a specific structure that arranges its components, such as its name, size, signature, and contents. This structure is determined by the file format, so it is the same on any operating system.
Metadata, in general, is defined as data that describes other data. File metadata is the data that describes the file itself; the operating system and applications use it to make opening, recognizing, and processing the file easier.
Metadata can be found in different locations, but as a starting point, the three places to look when analyzing a file are:
- MFT Record
- File Header
- Magic Number
The NTFS file system contains a file called the master file table, or MFT. There is at least one entry in the MFT for every file on an NTFS file system volume, including the MFT itself. All information about a file, including its size, time and date stamps, permissions, and data content, is stored either in MFT entries or in space outside the MFT that is described by MFT entries. As files are added to an NTFS file system volume, more entries are added to the MFT and the MFT increases in size. Each file has one or more MFT records.
MFT records can be used when searching for files within the file system. A record can also serve as evidence that a lost or deleted file once existed. Directory Snoop is a useful tool for searching MFT records and related tasks. Another is DiskExplorer for NTFS from Runtime Software.
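To illustrate how an MFT record can hint at a deleted file, here is a minimal Python sketch that reads a few fields from the fixed header of a single record. It assumes the commonly documented NTFS layout: a "FILE" signature at offset 0 and a 16-bit flags field at offset 22, where bit 0x0001 means the record is in use and bit 0x0002 means it describes a directory. The function name parse_mft_record_header is hypothetical, and real records also need update-sequence fixups and attribute parsing, which are left out here.

```python
import struct

MFT_SIGNATURE = b"FILE"   # assumed signature of a valid MFT record
FLAG_IN_USE = 0x0001      # cleared when the file has been deleted
FLAG_DIRECTORY = 0x0002   # set when the record describes a directory


def parse_mft_record_header(record: bytes) -> dict:
    """Read a few fixed-offset fields from one raw MFT record (hypothetical helper)."""
    if len(record) < 32:
        raise ValueError("record is too short to hold an MFT header")
    if record[:4] != MFT_SIGNATURE:
        raise ValueError("missing FILE signature")
    # Offsets 16..31: sequence number, hard-link count, offset of the first
    # attribute, flags, used size, allocated size (all little-endian).
    seq, links, attr_offset, flags, used, allocated = struct.unpack_from(
        "<HHHHII", record, 16
    )
    return {
        "sequence_number": seq,
        "hard_links": links,
        "first_attribute_offset": attr_offset,
        "in_use": bool(flags & FLAG_IN_USE),
        "is_directory": bool(flags & FLAG_DIRECTORY),
        "used_size": used,
        "allocated_size": allocated,
    }
```

A record whose signature is intact but whose in_use flag is cleared is the kind of trace that suggests a file existed and was later deleted.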
A file header is an identification section found at the beginning (head) of most files. The header usually contains data used by the application that opens the file, such as the name, author, creation date, and size.
Different file types have different headers. It is important to remember that some headers follow a known standard while others are proprietary, and some files have no header at all. For example, txt files don't have a header.
As a final note, most file formats have both a header and a trailer. For example, PDF files have a cross-reference (xref) section in addition to the header and trailer. We can check the header and trailer of a file with a hex editor.
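A hex editor is the usual tool for this, but the same check can be sketched in a few lines of Python. The helper names hex_bytes and head_and_tail and the default of 16 bytes are hypothetical choices for illustration.

```python
import os


def hex_bytes(data: bytes) -> str:
    """Format bytes the way a hex editor shows them, e.g. '25 50 44 46'."""
    return " ".join(f"{b:02X}" for b in data)


def head_and_tail(path, count=16):
    """Return the first and last `count` bytes of a file as hex strings."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(count)
        # Jump to `count` bytes before the end (or to the start of small files).
        f.seek(max(size - count, 0))
        tail = f.read(count)
    return hex_bytes(head), hex_bytes(tail)
```

For a PDF, the head would be expected to begin with 25 50 44 46 ("%PDF") and the tail to contain the bytes of "%%EOF".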
The magic number is another method used by applications (mostly on Unix/Linux). Magic numbers are the first few bytes of a file that identify its type.
The magic number is a distinctive byte sequence, usually at the beginning of the file. On Linux, the file command can be used to identify the type of a file from it.
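The idea behind magic-number detection can be sketched as a simple prefix lookup. The table below is a tiny hand-picked sample, not the much larger database the file command relies on, and the names MAGIC_NUMBERS, identify, and identify_file are hypothetical.

```python
# A few well-known signatures; a real tool knows thousands of them.
MAGIC_NUMBERS = [
    (b"\xFF\xD8\xFF", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also used by DOCX)"),
]


def identify(data: bytes) -> str:
    """Return a type name if the data starts with a known signature."""
    for signature, name in MAGIC_NUMBERS:
        if data.startswith(signature):
            return name
    return "unknown"


def identify_file(path) -> str:
    """Read only the first bytes of a file and identify them."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

Because the lookup ignores the file extension, a JPEG renamed to .txt would still be reported as a JPEG image.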
In this story, I analyze the DOCX and JPEG file formats to illustrate the metadata topic.
- DOCX Files Analysis
A WordprocessingML or docx file is a zip file (a package) containing a number of “parts” — typically UTF-8 or UTF-16 encoded XML files, though strictly defined, a part is a stream of bytes.
In other words, docx files are compressed archives that contain XML files and binary files. You can verify this by opening any docx file in WinRAR or WinZip.
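The same check can be done without an archive tool. Here is a minimal sketch using Python's standard zipfile module; the function name list_docx_parts is hypothetical, and the argument can be a path or an open binary file object.

```python
import zipfile


def list_docx_parts(docx):
    """Return (part name, uncompressed size) for every entry in the package."""
    with zipfile.ZipFile(docx) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]
```

On a typical Word document the result would be expected to include parts such as [Content_Types].xml, word/document.xml, and the files under docProps/.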
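docx files typically start with the bytes 50 4B 03 04 14 00 06 00 08 00 00 00 21. The first four bytes, 50 4B 03 04 ("PK" followed by 03 04), are the standard ZIP local file header signature.
The last section of the file holds the ZIP central directory, so part names such as docProps/app.xml usually appear near the end, followed by the end-of-central-directory record, which begins with 50 4B 05 06.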
The docx metadata is usually found in the docProps folder within the compressed document, in the form of XML files. The docProps folder usually contains two files: core.xml and app.xml.
The core.xml file contains fields that describe the origin of the document, such as the author's name, the last editor, and the creation and modification dates.
The app.xml file usually contains data describing the content of the document: the number of words, characters, and lines, and the application that was used to create the document.
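As a rough sketch of how these fields can be pulled out programmatically, the code below reads docProps/core.xml and docProps/app.xml with Python's standard library. The namespace URIs and element names are the ones normally used by Office Open XML packages; the function name read_docx_metadata and the output keys (author, last_editor, and so on) are hypothetical. Fields that are missing from a document are simply skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Output key -> element name inside each part.
CORE_FIELDS = {
    "author": "dc:creator",
    "last_editor": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}
APP_FIELDS = {
    "application": "ep:Application",
    "words": "ep:Words",
    "characters": "ep:Characters",
    "lines": "ep:Lines",
}


def _read_fields(package, part_name, fields):
    """Parse one XML part and collect the requested child elements."""
    try:
        root = ET.fromstring(package.read(part_name))
    except KeyError:
        return {}  # the part is not present in this package
    found = {}
    for key, tag in fields.items():
        element = root.find(tag, NS)
        if element is not None:
            found[key] = element.text
    return found


def read_docx_metadata(docx):
    """Return a flat dict of core and application properties (as strings)."""
    with zipfile.ZipFile(docx) as package:
        metadata = _read_fields(package, "docProps/core.xml", CORE_FIELDS)
        metadata.update(_read_fields(package, "docProps/app.xml", APP_FIELDS))
    return metadata
```

All values come back as the raw strings stored in the XML, so counts such as words would need converting before any arithmetic.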
- JPEG Files Analysis
JPEG has a pre-defined file structure: a header, metadata, and a footer. JPEG files can be found with several extensions, such as .JPG, .jpg, and .jfif. Sometimes the terms JFIF, TIFF, EXIF, and JPEG are used interchangeably. A JPEG file usually:
- starts with the bytes FF D8.
- consists of sections, where the byte FF is typically used as a delimiter to indicate the start of a new section.
In many cases (though not always), the FF D8 at the beginning of the image is followed by FF E0 (in some cases FF E1). The file does not have its total length embedded, so we need to find the JPEG trailer, which is FF D9.
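The structure described above can be walked with a short Python sketch. It assumes that every section before the image data carries a two-byte big-endian length right after its marker, and it stops at the start-of-scan marker (FF DA), after which compressed data follows. The function names list_jpeg_segments and find_trailer are hypothetical, and padding bytes between sections are not handled.

```python
import struct

SOI = b"\xFF\xD8"  # header: start of image
EOI = b"\xFF\xD9"  # trailer: end of image
SOS = 0xDA         # start of scan; compressed image data follows


def list_jpeg_segments(data: bytes):
    """Return (offset, marker byte, length) for each section up to FF DA."""
    if not data.startswith(SOI):
        raise ValueError("missing FF D8 header")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected FF delimiter at offset {pos}")
        marker = data[pos + 1]
        # The length counts its own two bytes but not the two marker bytes.
        (length,) = struct.unpack_from(">H", data, pos + 2)
        segments.append((pos, marker, length))
        if marker == SOS:
            break
        pos += 2 + length
    return segments


def find_trailer(data: bytes) -> int:
    """Return the offset of the last FF D9 in the data, or -1 if absent."""
    return data.rfind(EOI)
```

Taking the last FF D9 is a simplification: files with embedded thumbnails or appended data can contain more than one, so the result should be treated as a candidate rather than a guaranteed end of the image.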
Metadata relevant to a forensic investigation can be categorized into three main types:
- System metadata
- Substantive metadata
- Embedded metadata and External metadata