
Determining File Type: A Demonstration of Different Techniques

2 Dec 2014, CPOL
An analysis of various methods for determining the MIME type and executable status of a file

Introduction

There are several popular ways to determine whether a file is executable, and other techniques for determining its MIME type. This article is based on my C# class library that encapsulates some of these methods. In the demo solution, I provide Windows Forms and WPF applications that demonstrate using the library. While not specifically addressed here, the library should also work with Windows-based web servers.

Background

In this article, I describe the main functions and how they work and give a couple of short representative code examples where appropriate. Please refer to the code available at the links above for the remainder of the source code.

Note: The class methods return string values containing human-readable information. If you wish to use the library in production, it would be wise to return enum values instead of strings from GetExecutableType, GetBinaryType, and GetPEInformation.
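One way the suggestion in the note could look is sketched below. This is only an illustration: the ExecutableKind enum, its members, and the ExecutableKindText.Describe helper are hypothetical names and are not part of the library.

```csharp
using System;

// Hypothetical result type; the library itself returns strings.
public enum ExecutableKind
{
    NotExecutable,
    Dos,
    Windows16OrOS2,
    WindowsGui,
    WindowsConsole
}

public static class ExecutableKindText
{
    // Converts the enum to display text only where the UI needs it,
    // so callers can branch on the enum instead of comparing strings.
    public static string Describe(ExecutableKind kind)
    {
        switch (kind)
        {
            case ExecutableKind.Dos: return "MS-DOS executable";
            case ExecutableKind.Windows16OrOS2: return "16-bit Windows or OS/2 executable";
            case ExecutableKind.WindowsGui: return "Windows application";
            case ExecutableKind.WindowsConsole: return "Windows console application";
            default: return "Not an executable";
        }
    }
}
```

Callers can then write a switch on ExecutableKind and call Describe only when text has to be shown to the user.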

The solution file was created in Visual Studio 2013 but should be usable with versions 2010 and above. If not, you can either change the versioning information at the top of the solution file to your specific version or import the three project files into a new solution using your version of Visual Studio.

The three project files and their minimum requirements are listed below. I kept the platform needs as minimal as possible and avoided C# features above version 2.0 wherever possible.

  1. FileTypesLib class library (.NET 2.0)
  2. FileTypesDemoWinForms (.NET 2.0)
  3. FileTypesDemoWPF (.NET 3.5)

Win32 Functions

The class named NativeMethods houses several Win32 function calls and their associated structures and enumerations. I've used best practices on all of the methods regarding datatypes, calling conventions, and function activation. Where appropriate, as with the SHGetFileInfo function, threaded and non-threaded examples are provided in the demo applications.

I. SHGetFileInfo

Gathers information about file system objects. It requires a SHFILEINFO structure and flags that tell the function which operation is desired. Note the use of CharSet.Auto with the DllImport attribute, which lets the runtime pick the ANSI or Unicode version of the function to match the platform.

[DllImport("shell32.dll", CharSet = CharSet.Auto)]
internal static extern IntPtr SHGetFileInfo(
	// No MarshalAs override here: CharSet.Auto already marshals the string
	// to match the entry point (ANSI or Unicode) that the runtime selects
	string pszPath,
	uint dwFileAttributes,
	ref SHFILEINFO psfi,
	uint cbSizeFileInfo,
	uint uFlags);

II. GetBinaryType

Detects whether a file is executable. Unfortunately, this function is unreliable: it does not correctly identify 32-bit and 64-bit applications when called on a 64-bit computer. For 64-bit apps, the function returns false and leaves the binary type at its default value (SCS_32BIT_BINARY); for 32-bit apps, it returns true but reports the binary type as 64-bit. See the demo image for an example of this aberrant behavior.

III. FindMimeFromData

Detects whether a file is one of 26 different MIME types by examining the first 256 bytes of the data provided. This function is sometimes combined with other MIME-detection techniques, as shown in the FileTypes class library's GetContentType method, to allow detection of more types.

FileTypes Class Library Methods

The FileTypes class library contains several methods that can be used to determine either the MIME type of a file or whether a file is executable. Various limitations of each approach are discussed in the method descriptions below.

I. GetShellInfo

Uses shell32.dll's SHGetFileInfo function to obtain information about the submitted file. SHGetFileInfo returns the file type description. In File Explorer (formerly known as Windows Explorer), this is the string value seen under the Type column such as Application.

The GetShellInfo library method creates a SHFILEINFO structure, sets necessary flags, and then calls SHGetFileInfo from NativeMethods. The friendly type name is returned as a string.

Note: Since the SHGFI_USEFILEATTRIBUTES flag is set, the actual file is not read. Instead, the values that would be available if the file existed are retrieved based on the file extension and the attributes supplied when the function is called. This is, of course, faster than reading the actual file.

public string GetShellInfo(string fileName)
{
	if (String.IsNullOrEmpty(fileName)) return "Invalid file name";

	// Create the SHFILEINFO structure and assign strings to the string pointers
	SHFILEINFO info = new SHFILEINFO();
	info.szDisplayName = String.Empty;
	info.szTypeName = String.Empty;

	// Set the file attribute flags
	uint dwFileAttributes = (uint)SHGetFileInfoAttributes.FILE_ATTRIBUTE_NORMAL;

	// Set the operation flags for retrieving the desired information
	uint uFlags = (uint)(SHGetFileInfoFlags.SHGFI_USEFILEATTRIBUTES |
				 SHGetFileInfoFlags.SHGFI_DISPLAYNAME |
				 SHGetFileInfoFlags.SHGFI_TYPENAME);

	// Call the Win32 function
	NativeMethods.SHGetFileInfo(fileName, dwFileAttributes, ref info, (uint)Marshal.SizeOf(info), uFlags);

	// Return the type name as a string
	return info.szTypeName;
}

II. GetExecutableType

Also uses shell32.dll's SHGetFileInfo function, but with the SHGFI_EXETYPE flag set so that it determines the type of executable. This detection is fairly primitive and is limited to DOS, NT, and OS/2 (whose NE executable format was also used by early 16-bit Windows). I added two values to the enumeration to indicate when the file is a WinNT console application and when the file is not an executable at all.

In addition, the return value contains the operating system version when the detected file type is WinNT executable.
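As a rough illustration of how such a return value can be taken apart, the sketch below decodes the value SHGetFileInfo returns when SHGFI_EXETYPE is set: the low word holds the executable signature (MZ, NE, or PE) and the high word holds the Windows version, or zero. The ExeTypeDecoder class and its result strings are hypothetical and are not the library's code. The sketch also assumes that the version word packs the major version in its high byte and the minor version in its low byte.

```csharp
using System;

public static class ExeTypeDecoder
{
    // Two-character signatures read as little-endian 16-bit words.
    private const int SignatureMZ = 0x5A4D; // "MZ"
    private const int SignatureNE = 0x454E; // "NE"
    private const int SignaturePE = 0x4550; // "PE"

    // Hypothetical helper: decodes the value returned by SHGetFileInfo
    // when it is called with the SHGFI_EXETYPE flag.
    public static string Decode(IntPtr result)
    {
        long value = result.ToInt64();
        if (value == 0) return "Not an executable";

        int signature = (int)(value & 0xFFFF);        // low word
        int version = (int)((value >> 16) & 0xFFFF);  // high word

        // Assumption: major version in the high byte, minor in the low byte.
        string versionText = (version >> 8) + "." + (version & 0xFF);

        switch (signature)
        {
            case SignatureMZ:
                return "MS-DOS executable";
            case SignatureNE:
                return "16-bit Windows or OS/2 executable (version " + versionText + ")";
            case SignaturePE:
                if (version == 0) return "WinNT console application";
                return "WinNT executable (version " + versionText + ")";
            default:
                return "Unknown executable type";
        }
    }
}
```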

III. GetBinaryType

See the discussion above under Win32 Functions. I added one value to the enumeration so that non-executables are reported as such.

IV. GetContentType

Combines a dictionary lookup of an embedded resource text file containing several hundred MIME types with a call to the urlmon.dll function FindMimeFromData to determine a file's MIME type. This code is an enhanced version of Tim Schmelter's code from Stack Overflow. I moved the MIME type lists out of the source code, where they were hardcoded, and into embedded resource text files as name-value pairs (example: afl,video/animaflex) so that types can be easily updated. I also tweaked the code a bit here and there.
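The dictionary half of this approach can be sketched as follows, assuming one extension,type pair per line as in the example above. The MimeTable class and its Parse and Lookup methods are hypothetical names, not the library's. Here a null result stands for "not in the table", the point at which a caller might fall back to FindMimeFromData.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MimeTable
{
    // Builds an extension-to-MIME-type map from lines such as "afl,video/animaflex".
    public static Dictionary<string, string> Parse(TextReader reader)
    {
        Dictionary<string, string> map =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int comma = line.IndexOf(',');
            if (comma <= 0) continue; // skip blank or malformed lines

            string extension = line.Substring(0, comma).Trim().TrimStart('.');
            string mimeType = line.Substring(comma + 1).Trim();
            if (extension.Length > 0 && mimeType.Length > 0)
                map[extension] = mimeType; // a later entry overrides an earlier one
        }
        return map;
    }

    // Returns the MIME type for the file's extension, or null when it is unknown.
    public static string Lookup(Dictionary<string, string> map, string fileName)
    {
        string extension = Path.GetExtension(fileName); // includes the leading dot
        if (String.IsNullOrEmpty(extension)) return null;

        string mimeType;
        return map.TryGetValue(extension.TrimStart('.'), out mimeType) ? mimeType : null;
    }
}
```

In the library the text comes from an embedded resource. In this sketch any TextReader will do, such as a StringReader over the resource contents.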

V. GetMimeType

References a set of readonly byte arrays containing the "magic" identifying bytes found in the file headers of various file types to determine a MIME type. This method is adapted from a posting by ROFLwTIME at Stack Overflow. The routine detects 18 different MIME types and can be extended by adding more magic bytes and MIME types, although I would suggest moving from hardcoded values to some form of lookup technique.
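A minimal sketch of the magic-byte technique, arranged as the kind of lookup table suggested above, might look like this. The MagicBytes class is hypothetical and covers only five well-known signatures, not the 18 types the library detects.

```csharp
using System;

public static class MagicBytes
{
    // Well-known file signatures (a small illustrative subset).
    private static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    private static readonly byte[] Pdf = { 0x25, 0x50, 0x44, 0x46 }; // "%PDF"
    private static readonly byte[] Gif = { 0x47, 0x49, 0x46, 0x38 }; // "GIF8"
    private static readonly byte[] Jpeg = { 0xFF, 0xD8, 0xFF };
    private static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 }; // "PK" 03 04

    // Parallel tables: Signatures[i] identifies MimeTypes[i].
    private static readonly byte[][] Signatures = { Png, Pdf, Gif, Jpeg, Zip };
    private static readonly string[] MimeTypes =
        { "image/png", "application/pdf", "image/gif", "image/jpeg", "application/zip" };

    private const string DefaultType = "application/octet-stream";

    // Returns the MIME type whose signature matches the start of the header.
    public static string Detect(byte[] header)
    {
        if (header == null) return DefaultType;
        for (int i = 0; i < Signatures.Length; i++)
        {
            if (StartsWith(header, Signatures[i])) return MimeTypes[i];
        }
        return DefaultType;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Adding a type then means adding one signature and one string to the tables rather than another branch in the detection code.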

VI. GetPEInformation

Uses a slightly modified version of the PEReader class from Mono to gather facts about an executable-style file submitted for inspection. The method reads the Portable Executable (PE) bytes from a file header and makes various information available. As seen in the demo image above, this information includes the machine type a file runs on, whether the file is an executable or a DLL, whether it is a 32-bit application, and the file creation datetime.

My modifications were minimal, mostly concerned with preventing crashing when a non-executable file is passed in. The GetPEInformation method in FileTypes uses a switch and some if statements to parse the data available from PEReader into human readable form.
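To give a feel for the kind of parsing involved, here is a much smaller sketch than PEReader. It checks the DOS and PE signatures and reads three header fields from a byte array. The PeHeaderSketch class and its output format are hypothetical. The offsets follow the published PE layout: e_lfanew at 0x3C, then the "PE\0\0" signature, the COFF header, and the optional-header magic.

```csharp
using System;

public static class PeHeaderSketch
{
    // Little-endian reads, written out so the result does not depend on host byte order.
    private static int ReadUInt16(byte[] d, int o) { return d[o] | (d[o + 1] << 8); }
    private static int ReadInt32(byte[] d, int o) { return ReadUInt16(d, o) | (ReadUInt16(d, o + 2) << 16); }

    public static string Describe(byte[] data)
    {
        // DOS header: "MZ" at offset 0; offset of the PE header (e_lfanew) at 0x3C.
        if (data == null || data.Length < 0x40 || data[0] != 0x4D || data[1] != 0x5A)
            return "Not an executable";

        int pe = ReadInt32(data, 0x3C);
        // Need the signature (4 bytes), COFF header (20), and optional-header magic (2).
        if (pe < 0 || pe > data.Length - 26) return "Not a PE file";
        if (data[pe] != 0x50 || data[pe + 1] != 0x45 || data[pe + 2] != 0 || data[pe + 3] != 0)
            return "Not a PE file";

        int machine = ReadUInt16(data, pe + 4);          // COFF Machine
        int characteristics = ReadUInt16(data, pe + 22); // COFF Characteristics
        int magic = ReadUInt16(data, pe + 24);           // optional-header Magic

        string machineName;
        switch (machine)
        {
            case 0x014C: machineName = "x86"; break;
            case 0x8664: machineName = "x64"; break;
            case 0xAA64: machineName = "ARM64"; break;
            default: machineName = "machine 0x" + machine.ToString("X4"); break;
        }

        string kind = (characteristics & 0x2000) != 0 ? "DLL" : "EXE"; // IMAGE_FILE_DLL
        string format = magic == 0x20B ? "PE32+" : (magic == 0x10B ? "PE32" : "unknown format");
        return machineName + ", " + kind + ", " + format;
    }
}
```

Non-executable or truncated input produces a message rather than an exception, in keeping with the crash-prevention changes described above.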

Points of Interest

According to the SHGetFileInfo documentation, "You should call this function from a background thread. Failure to do so could cause the UI to stop responding." Because of that warning, I provided both threaded and non-threaded versions of the methods that call it (for example, GetShellInfo and GetShellInfoThread). The threaded versions are used in the WPF demo, while the Windows Forms demo uses the non-threaded versions. For readily accessible local files, the non-threaded versions should be fine, but use the threaded versions (and perhaps add more robust error checking) for networked or other slow-to-access files.
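The general shape of a threaded version can be sketched as below. This is not the library's GetShellInfoThread. The BackgroundLookup class and the two delegate types are hypothetical, and the worker delegate stands in for whatever method performs the slow call.

```csharp
using System;
using System.Threading;

// Hypothetical delegate types for this sketch.
public delegate string FileInfoWorker(string fileName);
public delegate void FileInfoCallback(string fileName, string result);

public static class BackgroundLookup
{
    // Runs the worker on a background thread and passes its result to the callback.
    // The callback executes on the worker thread, not on the UI thread.
    public static Thread Start(string fileName, FileInfoWorker worker, FileInfoCallback callback)
    {
        Thread thread = new Thread(delegate()
        {
            string result;
            try { result = worker(fileName); }
            catch (Exception ex) { result = "Error: " + ex.Message; }
            callback(fileName, result);
        });
        thread.IsBackground = true; // do not keep the process alive for a pending lookup
        thread.Start();
        return thread;
    }
}
```

Because the callback arrives on the worker thread, any control update inside it still has to be marshaled to the UI thread, which is what the InvokeRequired and Dispatcher.CheckAccess examples later in this section do.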

The icon that is shown on the demo forms is obtained by a call to SHGetFileInfo with the appropriate flag. As mentioned, there are threaded and non-threaded versions supplied.

It is not exceedingly difficult to convert a Windows Forms application into a WPF application. For my demo applications, the major changes were in how images are handled and how control invocations are performed. Here are brief examples showing the difference between loading an icon into a WinForms PictureBox and a WPF Image, and showing how to update WinForms and WPF controls from another thread.

WinForms PictureBox (non-threaded)

IntPtr handle = fileUtilities.GetLargeIcon(file);
if (handle != IntPtr.Zero)
{
	// Convert the icon to a bitmap and load it to the control
	pictureBox1.Image = Bitmap.FromHicon(handle);
}

WPF Image (non-threaded)

IntPtr handle = fileUtilities.GetLargeIcon(file);
if (handle != IntPtr.Zero)
{
	// Convert the icon to a bitmap and load it to the control
	image1.Source = Imaging.CreateBitmapSourceFromHIcon(
		handle, Int32Rect.Empty, BitmapSizeOptions.FromEmptyOptions());
}

WinForms TextBox Threaded Callback

if (textBox1.InvokeRequired)
{
	SetTextBoxCallback d = new SetTextBoxCallback(TextBoxCallback);
	this.Invoke(d, new object[] { sender, eventArgs });
}
else { textBox1.Text = eventArgs.SHFileInfo; }

WPF TextBox Threaded Callback

if (!textBox1.Dispatcher.CheckAccess())
{
	SetTextBoxCallback d = new SetTextBoxCallback(TextBoxCallback);
	Dispatcher.Invoke(d, new object[] { sender, eventArgs });
}
else { textBox1.Text = eventArgs.SHFileInfo; }

History

Version 1.0 released 12/1/2014. CPOL license.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Ed Gadziemski
Founder, Choycer
United States
Ed has over 40 years experience in computer technology and a bachelor's degree in Business Administration. He's currently a marketing technology consultant. During his career, he's led software development departments and created software still in use in the communications and healthcare industries. Ed is a veteran of the United States Army. He lives in Arizona in the United States.

Find Ed on LinkedIn.

This material is copyright 2019 by Ed Gadziemski. Unauthorized use is strictly prohibited. All rights reserved.
