NTFS - MFT - deleted files






Sample code for reading deleted files in the NTFS Master File Table

Introduction
Some time ago I needed to write code that scans the NTFS Master File Table (MFT) to find the files that have been deleted from the disk (marked as deleted) and list them.
There is little information available about the MFT structure and data runs, and I could not find any article with .NET code for this, so I decided to write this one.
Because documentation is scarce, it took me a long time to get this code working. Some pieces could have been done differently, so feel free to make suggestions.
Background
You can find more information about the NTFS structure at www.NTFS.com. This article (http://comunidad.dragonjar.org/f157/taller-forensic-ii-ntfs-7688/) also helped me understand some other details.
1-NTFS Basics
NTFS is the proprietary file system of Windows NT, 2000, XP, Server 2003, Vista and Windows 7. It supports file-level security, compression and auditing, as well as large volumes and storage solutions such as RAID.
One of its most important features is the ability to encrypt files and folders to protect your sensitive data.
I won't go very deep into the NTFS structure; I'll just explain the topics that are used in this example.
2-Partition Boot Sector
(some description taken from NTFS.com)
When you format an NTFS volume, the format program allocates the first 16 sectors for the boot sector and the bootstrap code.
Here is an example of a boot sector of an NTFS volume formatted while running Windows 2000.
Physical Sector:Cyl 0, Side 1, Sector 1
00000000:EB 52 90 4E 54 46 53 20 -20 20 20 00 02 08 00 00 .R.NTFS ........
00000010:00 00 00 00 00 F8 00 00 -3F 00 FF 00 3F 00 00 00 ........?...?...
00000020:00 00 00 00 80 00 80 00 -4A F5 7F 00 00 00 00 00 ........J.......
00000030:04 00 00 00 00 00 00 00 -54 FF 07 00 00 00 00 00 ........T.......
00000040:F6 00 00 00 01 00 00 00 -14 A5 1B 74 C9 1B 74 1C ...........t..t.
00000050:00 00 00 00 FA 33 C0 8E -D0 BC 00 7C FB B8 C0 07 .....3.....|....
00000060:8E D8 E8 16 00 B8 00 0D -8E C0 33 DB C6 06 0E 00 ..........3.....
00000070:10 E8 53 00 68 00 0D 68 -6A 02 CB 8A 16 24 00 B4 ..S.h..hj....$..
00000080:08 CD 13 73 05 B9 FF FF -8A F1 66 0F B6 C6 40 66 ...s......f...@f
00000090:0F B6 D1 80 E2 3F F7 E2 -86 CD C0 ED 06 41 66 0F .....?.......Af.
000000A0:B7 C9 66 F7 E1 66 A3 20 -00 C3 B4 41 BB AA 55 8A ..f..f....A..U.
000000B0:16 24 00 CD 13 72 0F 81 -FB 55 AA 75 09 F6 C1 01 .$...r...U.u....
000000C0:74 04 FE 06 14 00 C3 66 -60 1E 06 66 A1 10 00 66 t......f`..f...f
000000D0:03 06 1C 00 66 3B 06 20 -00 0F 82 3A 00 1E 66 6A ....f;....:..fj
000000E0:00 66 50 06 53 66 68 10 -00 01 00 80 3E 14 00 00 .fP.Sfh.....>...
000000F0:0F 85 0C 00 E8 B3 FF 80 -3E 14 00 00 0F 84 61 00 ........>.....a.
00000100:B4 42 8A 16 24 00 16 1F -8B F4 CD 13 66 58 5B 07 .B..$......fX [..
00000110:66 58 66 58 1F EB 2D 66 -33 D2 66 0F B7 0E 18 00 fXfX.-f3.f......
00000120:66 F7 F1 FE C2 8A CA 66 -8B D0 66 C1 EA 10 F7 36 f......f..f....6
00000130:1A 00 86 D6 8A 16 24 00 -8A E8 C0 E4 06 0A CC B8 ......$.........
00000140:01 02 CD 13 0F 82 19 00 -8C C0 05 20 00 8E C0 66 ..............f
00000150:FF 06 10 00 FF 0E 0E 00 -0F 85 6F FF 07 1F 66 61 ..........o...fa
00000160:C3 A0 F8 01 E8 09 00 A0 -FB 01 E8 03 00 FB EB FE ................
00000170:B4 01 8B F0 AC 3C 00 74 -09 B4 0E BB 07 00 CD 10 .....<.t........
00000180:EB F2 C3 0D 0A 41 20 64 -69 73 6B 20 72 65 61 64 .....A disk read
00000190:20 65 72 72 6F 72 20 6F -63 63 75 72 72 65 64 00 error occurred.
000001A0:0D 0A 4E 54 4C 44 52 20 -69 73 20 6D 69 73 73 69 ..NTLDR is missi
000001B0:6E 67 00 0D 0A 4E 54 4C -44 52 20 69 73 20 63 6F ng...NTLDR is co
000001C0:6D 70 72 65 73 73 65 64 -00 0D 0A 50 72 65 73 73 mpressed...Press
000001D0:20 43 74 72 6C 2B 41 6C -74 2B 44 65 6C 20 74 6F Ctrl+Alt+Del to
000001E0:20 72 65 73 74 61 72 74 -0D 0A 00 00 00 00 00 00 restart........
000001F0:00 00 00 00 00 00 00 00 -83 A0 B3 C9 00 00 55 AA ..............U.
Bytes 3 to 6 (counting from 0) hold the ASCII characters "NTFS" (4E 54 46 53), which identify the volume as NTFS.
The following function reads the boot sector and determines whether the volume is NTFS.
Public Function IsNFTSDrive(ByVal strDrive As String) As Boolean
    Dim Hnd As Integer, nRead As Integer
    Dim ret As UInt32
    Dim Buffer(1024) As Byte
    ' Open the volume itself (for example "C:") to read its raw bytes
    Hnd = CreateFile(Mid(strDrive, 1, 2), GENERIC_READ Or GENERIC_WRITE, FILE_SHARE_READ Or FILE_SHARE_WRITE, _
        Nothing, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL Or FILE_FLAG_OVERLAPPED, IntPtr.Zero)
    If (Hnd <> INVALID_HANDLE_VALUE) Then
        ' Start an overlapped read of the first 1024 bytes of the volume
        ret = ReadFile(Hnd, Buffer, 1024, nRead, New System.Threading.NativeOverlapped)
    Else
        Return False
    End If
    If ret = 0 Then
        ' The read is still pending: wait until it completes
        ret = WaitForSingleObject(Hnd, INFINITE)
        Select Case ret
            Case WAIT_OBJECT_0
            Case WAIT_TIMEOUT
        End Select
    Else
        ' Release the handle before leaving early
        CloseHandle(Hnd)
        Return False
    End If
    CloseHandle(Hnd)
    ' 78, 84, 70 and 83 are the ASCII codes of "N", "T", "F" and "S"
    Return Buffer(3) = 78 And Buffer(4) = 84 And Buffer(5) = 70 And Buffer(6) = 83
End Function
The API function CreateFile opens or creates a file; the other parameters define the access mode and sharing permissions. Because the file name passed in is the volume name, the handle gives access to the raw bytes of the volume. I read just 1024 bytes, since I only need the first few bytes to determine the volume type (I could read fewer).
If the bytes at positions 3, 4, 5 and 6 are 78, 84, 70 and 83 (the ASCII codes for "NTFS"), the volume is NTFS.
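As a rough illustration of the same check outside .NET, here is a minimal Python sketch (not the article's VB.NET code). It only inspects bytes that have already been read from the volume. The function name is_ntfs_boot_sector is made up for this example, and the sample is the first 16 bytes of the boot sector dump shown earlier.

```python
def is_ntfs_boot_sector(sector: bytes) -> bool:
    """Return True if bytes 3..6 of the boot sector spell 'NTFS'."""
    return sector[3:7] == b"NTFS"


# First 16 bytes of the sample boot sector from the hex dump.
sample = bytes.fromhex("EB 52 90 4E 54 46 53 20 20 20 20 00 02 08 00 00")

# The same four values the VB.NET function compares against.
print(list(sample[3:7]))
print(is_ntfs_boot_sector(sample))
```

Comparing a slice against b"NTFS" is equivalent to testing the four bytes 78, 84, 70 and 83 one by one.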
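These sectors also hold other information about the volume, such as bytes per sector, sectors per track, etc. Here's a table with all the data stored there. The sample values are shown in on-disk byte order, so 0x0002 at offset 0x0B stands for 0x0200, that is, 512 bytes per sector.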
Byte Offset | Field Length | Sample Value | Field Name |
---|---|---|---|
0x0B | WORD | 0x0002 | Bytes Per Sector |
0x0D | BYTE | 0x08 | Sectors Per Cluster |
0x0E | WORD | 0x0000 | Reserved Sectors |
0x10 | 3 BYTES | 0x000000 | always 0 |
0x13 | WORD | 0x0000 | not used by NTFS |
0x15 | BYTE | 0xF8 | Media Descriptor |
0x16 | WORD | 0x0000 | always 0 |
0x18 | WORD | 0x3F00 | Sectors Per Track |
0x1A | WORD | 0xFF00 | Number Of Heads |
0x1C | DWORD | 0x3F000000 | Hidden Sectors |
0x20 | DWORD | 0x00000000 | not used by NTFS |
0x24 | DWORD | 0x80008000 | not used by NTFS |
0x28 | LONGLONG | 0x4AF57F0000000000 | Total Sectors |
0x30 | LONGLONG | 0x0400000000000000 | Logical Cluster Number for the file $MFT |
0x38 | LONGLONG | 0x54FF070000000000 | Logical Cluster Number for the file $MFTMirr |
0x40 | DWORD | 0xF6000000 | Clusters Per File Record Segment |
0x44 | DWORD | 0x01000000 | Clusters Per Index Block |
0x48 | LONGLONG | 0x14A51B74C91B741C | Volume Serial Number |
0x50 | DWORD | 0x00000000 | Checksum |
Then, with this code, I read some values from the boot sector that I need later.
Dim Hnd As Integer, nRead As Integer
Dim ret As UInt32
Dim Buffer(1024) As Byte
Dim BytesPerSect As Long
Dim SectperCluster As Long
Dim MFTCluster As Long
Dim NO As System.Threading.NativeOverlapped
'Read Partition Info
Hnd = CreateFile(Mid(strDrive, 1, 2), GENERIC_READ Or GENERIC_WRITE, FILE_SHARE_READ Or FILE_SHARE_WRITE, _
    Nothing, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL Or FILE_FLAG_OVERLAPPED, IntPtr.Zero)
If (Hnd <> INVALID_HANDLE_VALUE) Then
    ret = ReadFile(Hnd, Buffer, 1024, nRead, NO)
Else
    Return Nothing
End If
If ret = 0 Then
    ret = WaitForSingleObject(Hnd, INFINITE)
    Select Case ret
        Case WAIT_OBJECT_0
        Case WAIT_TIMEOUT
    End Select
Else
    Return Nothing
End If
BytesPerSect = LittleEndianHEXToDecimal(Buffer, &HB, 2)
SectperCluster = LittleEndianHEXToDecimal(Buffer, &HD, 1)
MFTCluster = LittleEndianHEXToDecimal(Buffer, &H30, 8)
I also use CreateFile to read the boot sector. As shown in the table above, I read the bytes per sector, the sectors per cluster and the MFT cluster.
What are these?
As you will see later, positions on disk (the location of data runs, of a file, of the MFT itself, etc.) are stored as cluster numbers.
Information is stored in bits, and each group of eight bits makes 1 byte. Bytes are grouped into sectors; on a volume formatted to NTFS with default settings, each sector holds 512 bytes.
Sectors are in turn grouped into another logical unit, the cluster; on a default volume, 8 sectors form 1 cluster. There are other concepts, like heads, but I don't use them here.
The last value I obtain is the MFT cluster, which is the cluster where the MFT starts. Multiplying it by the sectors per cluster and the bytes per sector gives the byte position of the MFT on the volume.
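To make the arithmetic concrete, the following Python sketch (an illustration, not the article's VB.NET code) decodes the same three fields from the first 64 bytes of the sample boot sector and derives the byte offset of the MFT. The function name parse_boot_sector and the dictionary keys are invented for this example.

```python
import struct


def parse_boot_sector(sector: bytes) -> dict:
    """Decode the three boot sector fields used in this article."""
    # "<" means little endian without padding: WORD at 0x0B, BYTE at 0x0D.
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
    # LONGLONG at 0x30: logical cluster number of $MFT.
    (mft_cluster,) = struct.unpack_from("<q", sector, 0x30)
    cluster_size = bytes_per_sector * sectors_per_cluster
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "mft_cluster": mft_cluster,
        "mft_byte_offset": mft_cluster * cluster_size,
    }


# First 64 bytes of the sample boot sector from the hex dump.
boot = bytes.fromhex(
    "EB 52 90 4E 54 46 53 20 20 20 20 00 02 08 00 00"
    "00 00 00 00 00 F8 00 00 3F 00 FF 00 3F 00 00 00"
    "00 00 00 00 80 00 80 00 4A F5 7F 00 00 00 00 00"
    "04 00 00 00 00 00 00 00 54 FF 07 00 00 00 00 00"
)
info = parse_boot_sector(boot)
print(info)
```

For the sample volume this gives 512 bytes per sector, 8 sectors per cluster and MFT cluster 4, so the MFT starts 4 * 8 * 512 = 16384 bytes into the volume.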
3-Master File Table (MFT)
The MFT is also a file, but a special one. Every file and folder on the disk has a record in it, including deleted ones until their record is reused by a new file.
When the OS needs a file, it first looks up its entry in the MFT, which holds all the information about its name, size, properties, attributes, location, etc.
The first MFT record describes the MFT itself, since it is also a file. As you will see later, each record has an attribute holding the file name; the name in the first record is $MFT.
After this first record comes a second one called $MFTMirr, which stores information about a copy of the MFT. Other system records follow.
All the records up to 24 are reserved for system functions. After record 24 come the records for every file and folder on the disk.
4-MFT Records
Each MFT record has a length of 1024 bytes.
Each MFT record is formed by:
A Header
Attributes
The Header
The header has a length of 56 bytes in Windows XP/2003 and later, and 48 bytes in Windows 2000.
Of the information stored in the header, I only read bytes 22 and 23, which mean:
01 00 existing file
00 00 deleted file
03 00 existing folder
02 00 deleted folder
This is the test I use (taken from the While loop that reads each MFT entry):
If bnAllFiles Or (Not bnAllFiles And BitConverter.ToInt16(Buffer, 22) = 0) Then
I read the 16-bit value at byte 22. If it is 0, the record belongs to a deleted file. Note that this test does not match deleted folders, whose value is 2. The other Boolean, bnAllFiles, corresponds to a checkbox that determines whether I list all files or only deleted ones.
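The four flag values can be handled in one place, as in this minimal Python sketch (not the article's VB.NET code). Treating the value as two independent bits, 1 for "in use" and 2 for "folder", is an interpretation that matches the four combinations listed above. The constant and function names are made up for this example.

```python
IN_USE = 0x0001        # assumed meaning of bit 0: record belongs to an existing entry
IS_DIRECTORY = 0x0002  # assumed meaning of bit 1: record describes a folder


def describe_record_flags(record: bytes) -> str:
    """Classify an MFT record from the 16-bit flags at bytes 22-23."""
    flags = int.from_bytes(record[22:24], "little")
    kind = "folder" if flags & IS_DIRECTORY else "file"
    state = "existing" if flags & IN_USE else "deleted"
    return f"{state} {kind}"


def make_header(flag_bytes: bytes) -> bytes:
    """Build a dummy 24-byte header that only carries the flag bytes."""
    header = bytearray(24)
    header[22:24] = flag_bytes
    return bytes(header)


for raw in (b"\x01\x00", b"\x00\x00", b"\x03\x00", b"\x02\x00"):
    print(raw.hex(" "), "->", describe_record_flags(make_header(raw)))
```

Testing the "in use" bit instead of comparing the whole value with 0 would also catch deleted folders.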
The Attributes
Attributes are specific areas in each MFT entry, located after the header. Each attribute holds a specific kind of information about the file: there is one attribute for its location, one for the file name, one for its properties, etc.
An MFT record can contain several kinds of attributes, for example $Standard_Information (type 10 hex), $FILE_NAME (30 hex) and $DATA (80 hex).
Each attribute starts with a code that identifies its kind, since not every attribute is present in every record.
So, when you find the bytes 10 00 00 00, the data that follows belongs to the $Standard_Information attribute, which, as you'll see later, holds information about the file such as its creation date, modification date, etc.
The next four bytes of each attribute, after the four that identify its kind, hold the size of the attribute: its length in bytes, counted from its very first byte (the first byte of the type code).
So, if you see:
10 00 00 00 60 00 00 00
it means that a $Standard_Information attribute starts there and that its length is 96 bytes (counted from the beginning, the 10 byte).
The length, 60 00 00 00, is stored in little-endian order: the least significant byte comes first, so the bytes have to be read from right to left. Reversed, they give 00 00 00 60, that is 60 in hex, or 96 in decimal.
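A short Python sketch (an illustration only, not the article's VB.NET code) shows the same decoding: the eight bytes 10 00 00 00 60 00 00 00 split into a type code and a little-endian length. The function name read_attribute_header is hypothetical.

```python
def read_attribute_header(buf: bytes, offset: int = 0) -> tuple:
    """Return (type code, length in bytes) of the attribute starting at offset."""
    attr_type = int.from_bytes(buf[offset:offset + 4], "little")
    attr_length = int.from_bytes(buf[offset + 4:offset + 8], "little")
    return attr_type, attr_length


header = bytes.fromhex("10 00 00 00 60 00 00 00")
attr_type, attr_length = read_attribute_header(header)
print(hex(attr_type), attr_length)

# Reading right to left by hand gives the same number.
assert int(header[4:8][::-1].hex(), 16) == attr_length
```

Passing "little" to int.from_bytes does the right-to-left reading, so the length bytes 60 00 00 00 come out as 96.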
As you will see in the code, I use two different ways of converting little-endian values to decimal. The first uses the BitConverter class. The second is a function I wrote myself, since BitConverter only converts values of fixed lengths (Int16, Int32, Int64):
This is the one using BitConverter:
Private Function LittleEndianHEXToDecimal(ByRef Buffer As Byte(), ByVal offset As Long, ByVal Length As Long) As Long
    If Length = 1 Then
        ' A single byte needs no conversion (ToInt16 would also read the next byte)
        Return Buffer(offset)
    End If
    If Length = 2 Then
        Return BitConverter.ToInt16(Buffer, offset)
    End If
    If Length = 4 Then
        Return BitConverter.ToInt32(Buffer, offset)
    End If
    If Length = 8 Then
        Return BitConverter.ToInt64(Buffer, offset)
    End If
    ' Other lengths are not supported
    Return 0
End Function
And this is the one I wrote myself:
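Private Function ReadNum(ByVal buffer() As Byte, ByVal bnCheckNegative As Boolean) As Integer
    Dim strHex As String = ""
    ' Build the hex string from the last byte to the first (little endian).
    ' The callers leave one unused slot at the end of the array, hence UBound - 1.
    For i As Integer = UBound(buffer) - 1 To 0 Step -1
        strHex &= Strings.Right("00" & Hex(buffer(i)), 2)
    Next
    ' A most significant byte from 80 to FF marks a negative value
    If ("&H" & strHex(0) & strHex(1)) >= 128 And bnCheckNegative Then
        _IsNegative = True
        ' Two's complement: invert every bit and add 1 to get the magnitude
        Return CInt(Val("&H" & strHex) Xor Val("&H" & Strings.StrDup(strHex.Length, "F"))) + 1
    Else
        Return Val("&H" & strHex)
    End If
End Function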
After the attribute kind and its length comes some more information about the attribute, such as whether it is resident (that is, whether the attribute data is stored inside the MFT record or not). After those 24 bytes, the attribute data starts.
So, each attribute holds particular information about the file. I won't explain all the attributes, just the ones I use in the app.
One attribute I use is $FILE_NAME. I apply the filter shown before to list all files or only deleted ones, and I show their names. After the attribute header, byte 63 stores the length of the file name and the name itself starts at byte 65.
As you see here, I first call a function that returns a collection of all the attributes of the record (inside it, I copy the bytes from the read buffer into a structure that represents an attribute header, and return those structures). If the attribute type is 30 hex, it is the $FILE_NAME attribute, so I take the file name length from byte 63 and then the name starting at byte 65. The name is stored in Unicode, two bytes per character, which is why the length is multiplied by 2.
Attributes = GetAttributes(Buffer)
offset = LittleEndianHEXToDecimal(Buffer, 20, 2) + 1
For Each Attribute In Attributes
    'FileName
    If Attribute.AttributeType = &H30 Then
        FileNameLenght = Buffer(offset + Marshal.SizeOf(Attribute) + 63)
        If strShortFileName = "" Then
            strShortFileName = System.Text.UnicodeEncoding.Unicode.GetString(Buffer, offset + Len(Attribute) + 65, FileNameLenght * 2)
        Else
            strLongFileName = System.Text.UnicodeEncoding.Unicode.GetString(Buffer, offset + Len(Attribute) + 65, FileNameLenght * 2)
        End If
    End If
    offset += Attribute.Length
Next
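The GetAttributes function is not shown in the article, so the following Python sketch is only a guess at the general idea, not the author's implementation. It walks the attributes of a record by following each attribute's type and length fields, starting at the offset stored in bytes 20-21 of the header. The end marker FF FF FF FF is an assumption that the article does not mention, and the demo record is synthetic data built just for this example.

```python
END_MARKER = 0xFFFFFFFF  # assumption: end-of-attributes marker, not stated in the article


def iter_attributes(record: bytes):
    """Yield (offset, type, length, non_resident) for each attribute in a record."""
    # Bytes 20-21 of the record header hold the offset of the first attribute.
    offset = int.from_bytes(record[20:22], "little")
    while offset + 9 <= len(record):
        attr_type = int.from_bytes(record[offset:offset + 4], "little")
        if attr_type == END_MARKER:
            break
        attr_length = int.from_bytes(record[offset + 4:offset + 8], "little")
        if attr_length == 0:
            break  # malformed record: stop instead of looping forever
        # The byte after type and length is 1 when the data is not resident.
        non_resident = record[offset + 8] == 1
        yield offset, attr_type, attr_length, non_resident
        offset += attr_length


def build_demo_record() -> bytes:
    """Build a synthetic 1024-byte record with two attributes (made-up data)."""
    record = bytearray(1024)
    record[20:22] = (56).to_bytes(2, "little")                   # first attribute at 56
    record[56:64] = bytes.fromhex("10 00 00 00 60 00 00 00")     # type 10 hex, 96 bytes
    record[152:160] = bytes.fromhex("80 00 00 00 48 00 00 00")   # type 80 hex, 72 bytes
    record[160] = 1                                              # data is not resident
    record[224:228] = END_MARKER.to_bytes(4, "little")
    return bytes(record)


attributes = list(iter_attributes(build_demo_record()))
for entry in attributes:
    print(entry)
```

Each step simply adds the attribute length to the current offset, which is what the line offset += Attribute.Length does in the loop above.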
The $DATA attribute.
One of the attributes is $DATA, the one starting with the code 80 00 00 00. As with all attributes, the next 4 bytes hold its length. The byte that follows these 8 bytes tells us whether the attribute is resident or not.
This attribute stores information about the file content. A small file can have all its data stored inside this attribute, but it has to be very small, since an MFT entry is only 1024 bytes. Most files have their data somewhere else on the disk, and in that case the attribute stores the places where the file is saved on the volume. So, the ninth byte tells us whether the file data is resident (all stored in the attribute) or not: it is 1 when the data is not resident and 0 when it is.
Not all files have their bytes stored one after the other starting from a given cluster, because files can be fragmented. A fragmented file has pieces of its data stored at different places on the disk.
When the data is not stored in the $DATA attribute, the attribute describes where the file data is, in groups called "data runs". Each data run gives a cluster number and a length, and one $DATA attribute can hold multiple data runs. The cluster number is where this piece of the file starts, and the length is how many clusters, counted from that starting cluster, belong to this piece.
How to read data runs and read file's contents?
The byte at offset 32 from the start of the attribute holds the offset at which the data runs begin.
The first byte of a data run describes its length.
Its two hex digits have separate meanings: the first digit tells us how many bytes hold the starting cluster of the run, and the second digit tells us how many bytes hold its length in clusters. The length bytes come right after this first byte, followed by the cluster number bytes. For example, let's consider this data run (values in hex):
32 E7 14 FE ED 02
32 means the data run is 5 bytes long (3 + 2), not counting the 32 byte itself.
It also means:
3 -> the last 3 bytes hold the cluster where this piece of the file is stored.
2 -> the first 2 bytes hold the length of this piece of the file, in clusters.
So, E7 14,
read in little endian (from right to left) as 14 E7, means 5351 clusters.
And FE ED 02,
read in little endian as 02 ED FE, means cluster 191998.
So, if we read 5351 clusters starting at cluster 191998, we read one portion of the file. If there are more data runs, more data of this file is stored somewhere else on the disk. A data run whose first byte is 00 means there are no more data runs.
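The example run can be decoded with a few lines of Python. This is a minimal sketch, not the article's VB.NET code; it handles a single run and leaves out the sign handling that later runs need. The function name decode_run is made up for this example.

```python
def decode_run(run: bytes) -> tuple:
    """Decode one data run into (length in clusters, cluster number)."""
    header = run[0]
    length_size = header & 0x0F   # second hex digit: bytes used by the length
    cluster_size = header >> 4    # first hex digit: bytes used by the cluster number
    length = int.from_bytes(run[1:1 + length_size], "little")
    start = 1 + length_size
    cluster = int.from_bytes(run[start:start + cluster_size], "little")
    return length, cluster


length, cluster = decode_run(bytes.fromhex("32 E7 14 FE ED 02"))
print(length, cluster)
```

Masking and shifting the header byte avoids converting it to a hex string first.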
But it is a little more complex.
When there is more than one data run, the following ones are slightly different. In the first one, the location is the cluster where the block of data is stored, as in the example. In the following ones, the location is how many clusters we have to move from the previous run's starting cluster to reach the cluster where this piece starts.
For example, suppose that after the data run in the example there is one more with the same contents. Its location value is again 191998, so we have to move 191998 clusters past the previous starting cluster, and this block starts at cluster 191998 + 191998 = 383996. The same applies to every later run: the locations are cumulative.
So now you know that data runs tell us where to find all the blocks of a file. But there's something else: so far we have assumed that the second block comes after the first, the third after the second, and so on, since we always move forward from the previous one. This is often the case, but not always. Sometimes a block is stored before the previous one.
In those cases the location value is negative: instead of moving forward, we have to move that many clusters backward.
So, how do we know the location is negative? Negative values are those whose most significant byte (the first byte once the value is put in reading order) is between 80 and FF. Positive ones have a most significant byte between 00 and 7F.
If the location is negative, we need the decimal value to subtract from the previous location to get the starting cluster. To obtain it, we invert every bit of the number (a XOR with all ones) and add 1:
- Read the value in little-endian order.
- Check whether its most significant byte is greater than 7F.
- If so, convert the value to binary, replace each 1 with 0 and each 0 with 1, and then add 1. The result, in decimal, is the number of clusters to subtract from the previous location to find the start cluster of this data block.
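For example, take F3 4D (a negative value, since F3 is greater than 7F).
F: 1111
3: 0011
4: 0100
D: 1101
1111 0011 0100 1101, we change each 0 to 1 and each 1 to 0:
0000 1100 1011 0010
+ 1
================
0000 1100 1011 0011
The result is 0CB3 in hex, or 3251 in decimal, so this block starts 3251 clusters before the previous one.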
We can also do this in .NET by XORing each byte of the value with FF and then adding 1.
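The invert-and-add-one rule can be checked with a small Python sketch (an illustration, not the article's VB.NET code). It assumes the value F3 4D from the example is stored on disk in little-endian order as 4D F3. The function name to_signed is hypothetical.

```python
def to_signed(raw: bytes) -> int:
    """Interpret little-endian bytes as a signed number using invert-and-add-one."""
    value = int.from_bytes(raw, "little")
    if raw[-1] >= 0x80:                      # most significant byte is 80..FF
        mask = (1 << (8 * len(raw))) - 1     # all ones, e.g. FFFF for two bytes
        return -((value ^ mask) + 1)         # invert every bit, add 1, negate
    return value


on_disk = bytes.fromhex("4D F3")             # the value F3 4D in on-disk order
print(to_signed(on_disk))

# Python's built-in signed conversion gives the same answer.
assert to_signed(on_disk) == int.from_bytes(on_disk, "little", signed=True)
```

In practice the built-in signed=True conversion is enough; the manual version is only there to mirror the steps described above.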
The $MFT Data Runs
But why did I explain data runs if the app only lists file names, not their contents? Because the value we read from the boot sector, the cluster number of the MFT, only tells us where the MFT begins. The MFT is also a file, and the first $FILE block describes the MFT itself, as I explained at the beginning.
If we read the MFT's own data runs, we get every place where the MFT is stored, since it's a file too. The $FILE records are not always stored one after the other: like any file, the MFT can be fragmented. If it has a lot of entries and the space reserved for it was not enough, it continues somewhere else. So if you only read the blocks starting from the MFT start cluster, you may or may not get the whole file list.
So, as you'll see here, I have an MFTDataRuns class that analyzes the $DATA attribute of a file record and returns the locations and sizes of all its data blocks.
I only use it for the $MFT record, to find all the blocks where the MFT is stored.
Public Class MFTDataRuns
    Dim _buffer() As Byte
    Dim _AttributeOffset As Integer

    Public Sub New(ByVal buffer() As Byte, ByVal AttributeOffset As Integer)
        _buffer = buffer
        _AttributeOffset = AttributeOffset
    End Sub

    Public ReadOnly Property DataRuns() As DataRun()
        Get
            Dim _DataRuns(0) As DataRun
            Dim datarunstartoffset As Integer = _buffer(_AttributeOffset + 32)
            '_buffer(_attributeoffset+datarunstartoffset) = first dr byte
            While _buffer(_AttributeOffset + datarunstartoffset) <> 0
                Dim drLenght As Integer = Val(Strings.Right("00" & Hex(_buffer(_AttributeOffset + datarunstartoffset)), 2)(0)) + Val(Strings.Right("00" & Hex(_buffer(_AttributeOffset + datarunstartoffset)), 2)(1))
                Dim drBytes(drLenght) As Byte
                Array.Copy(_buffer, _AttributeOffset + datarunstartoffset, drBytes, 0, drLenght + 1)
                _DataRuns(UBound(_DataRuns)) = New DataRun(drBytes)
                ReDim Preserve _DataRuns(UBound(_DataRuns) + 1)
                datarunstartoffset += drLenght + 1
            End While
            'delete last pos
            ReDim Preserve _DataRuns(UBound(_DataRuns) - 1)
            'get real location
            Dim lastpos As Integer = 0
            For Each dr As DataRun In _DataRuns
                If lastpos <> 0 Then
                    dr.Place = lastpos + If(dr.IsNegative, dr.Place * -1, dr.Place)
                End If
                'remember this run's location so the next one is relative to it
                lastpos = dr.Place
            Next
            Return _DataRuns
        End Get
    End Property

    Public Class DataRun
        Dim _Place As Integer
        Dim _Length As Integer
        Dim _IsNegative As Boolean

        Public Sub New(ByVal buffer() As Byte)
            'first datarun byte is lenght of data run, lenght was already filtered before
            'but also tells me how many bytes are for localization and for lenght
            'so we evaluate it
            Dim strHexDRLenght As String = Hex(buffer(0)) 'e.g. 32: last 3 bytes = location, first 2 bytes = length
            Dim _LengthCount As Integer = Val(strHexDRLenght(1)) '2
            Dim _PlaceCount As Integer = Val(strHexDRLenght(0)) '3
            Dim length(0) As Byte
            For i As Integer = 0 To UBound(buffer)
                Dim count As Integer = 0
                If i > 0 Then
                    If i <= _LengthCount Then
                        length(UBound(length)) = buffer(i)
                        ReDim Preserve length(UBound(length) + 1)
                    Else
                        Exit For
                    End If
                End If
            Next
            Dim place(0) As Byte
            For i As Integer = 0 To UBound(buffer)
                Dim count As Integer = 0
                If i > _LengthCount Then
                    If i <= _PlaceCount + _LengthCount Then
                        place(UBound(place)) = buffer(i)
                        ReDim Preserve place(UBound(place) + 1)
                    Else
                        Exit For
                    End If
                End If
            Next
            _Length = ReadNum(length, False)
            _Place = ReadNum(place, True)
        End Sub

        Public Property Place() As Integer
            Get
                Return _Place
            End Get
            Set(ByVal value As Integer)
                _Place = value
            End Set
        End Property

        Public ReadOnly Property Length() As Integer
            Get
                Return _Length
            End Get
        End Property

        Public ReadOnly Property IsNegative() As Boolean
            Get
                Return _IsNegative
            End Get
        End Property

        Private Function ReadNum(ByVal buffer() As Byte, ByVal bnCheckNegative As Boolean) As Integer
            Dim strHex As String = ""
            For i As Integer = UBound(buffer) - 1 To 0 Step -1
                strHex &= Strings.Right("00" & Hex(buffer(i)), 2)
            Next
            'a most significant byte from 80 to FF marks a negative value
            If ("&H" & strHex(0) & strHex(1)) >= 128 And bnCheckNegative Then
                _IsNegative = True
                Return CInt(Val("&H" & strHex) Xor Val("&H" & Strings.StrDup(strHex.Length, "F"))) + 1
            Else
                Return Val("&H" & strHex)
            End If
        End Function
    End Class
End Class
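For comparison, the whole procedure (header byte, length, signed location, cumulative start cluster) fits in a short Python function. This is a sketch under the rules described in this article, not a port of the class above. The name decode_run_list and the (start cluster, length) tuple layout are choices made for this example, and the third run in the sample data is invented to show a backward move.

```python
def decode_run_list(data: bytes) -> list:
    """Decode consecutive data runs into (start cluster, length in clusters) pairs."""
    runs = []
    pos = 0
    current_cluster = 0
    while pos < len(data) and data[pos] != 0:   # a 00 header byte ends the list
        header = data[pos]
        length_size = header & 0x0F             # second hex digit
        location_size = header >> 4             # first hex digit
        pos += 1
        length = int.from_bytes(data[pos:pos + length_size], "little")
        pos += length_size
        # Locations are signed and relative to the previous run's start cluster.
        delta = int.from_bytes(data[pos:pos + location_size], "little", signed=True)
        pos += location_size
        current_cluster += delta
        runs.append((current_cluster, length))
    return runs


sample_runs = bytes.fromhex(
    "32 E7 14 FE ED 02"   # 5351 clusters at cluster 191998
    "32 E7 14 FE ED 02"   # 5351 clusters, 191998 clusters further on
    "21 10 4D F3"         # 16 clusters, 3251 clusters back (made-up run)
    "00"
)
runs = decode_run_list(sample_runs)
print(runs)
```

Keeping a running total in current_cluster is what makes each location relative to the previous run rather than to the first one.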
Then I read each $FILE block from each block of data returned by the DataRuns property of the MFTDataRuns object.
For Each dr In drs.DataRuns
    Dim Pos As Integer = 0
    Do
        Dim strShortFileName As String = ""
        Dim strLongFileName As String = ""
        SetReadFileOffset(NO, (dr.Place * SectperCluster * BytesPerSect) + Pos)
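The offset passed to SetReadFileOffset is the start of the run in bytes plus the position inside the run. The Python sketch below (not the article's code, whose loop body is not shown in full) lists the byte offsets at which records would be read, assuming that Pos advances by one 1024-byte record at a time. The function name record_offsets and its parameters are hypothetical.

```python
def record_offsets(runs, sectors_per_cluster, bytes_per_sector, record_size=1024):
    """Yield the volume byte offset of every MFT record in the given runs.

    runs is a list of (start cluster, length in clusters) pairs.
    """
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    for start_cluster, length_clusters in runs:
        run_start = start_cluster * cluster_bytes
        run_bytes = length_clusters * cluster_bytes
        # Assumption: the position inside a run grows by one record per iteration.
        for pos in range(0, run_bytes, record_size):
            yield run_start + pos


# One run of a single cluster starting at cluster 4, with 8 sectors of 512 bytes.
offsets = list(record_offsets([(4, 1)], 8, 512))
print(offsets)
```

With 4096-byte clusters, each cluster holds four 1024-byte records, so a one-cluster run yields four offsets.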
Conclusion
Well, I hope you find this article useful. Some explanations may not be very good, as this is the first article I have posted.
Also, some code blocks could be optimized, since I wrote this code while I was learning how to do it.
Feel free to optimize it and suggest anything you want.