I would like to write code that gets the URL and file type when a download occurs. I would like to get this info from the OSI Session/Presentation layer. Does anyone know where I should begin? Thank you.
DS
Comments
Dave Kreskowiak 27-Mar-11 18:13pm    
You haven't defined what you mean by "download" in your post, nor what you want to do with it. If you're looking for any file at all, you'd have to write a proxy server to do this.
Member 7766180 27-Mar-11 19:02pm    
Hi Dave, by download, I mean a file coming off the internet, either a torrent or regular files. I'm trying to do this without a proxy server. I would like to run the applet as a Windows service. I just need a message box to pop up saying that a file came from a URL and what type of file it is. Thank you. DS

Firstly, start simple, then go big. Make a console application that will do what you want and print heaps of status messages to the screen, then put it into a service.

If you download over HTTP, one of the response headers is Content-Type, which is often helpful for things like images. Other than that, the file extension and the file magic are the only ways to tell the file type.

File magic is (usually) the first X bytes of the file, some examples are:

EXT	HEX		ASCII
avi	52 49 46 46	RIFF
mp3	49 44 33	ID3
zip	50 4B 03 04	PK..


filext.com is a good source for the magic ("identifying characters", as it calls it); just search for the extension.

There is an open source library for downloading torrents called libtorrent. It is a pretty good library that supports most features and has a great command line example that isn't too hard to follow. It doesn't compile with Visual Studio too easily, though.

HTTP is a really simple text-based protocol that you can implement yourself if you know how to use Winsock; otherwise, use search.
 
Comments
Member 7766180 28-Mar-11 11:39am    
Thank you so much Andrew. This is very helpful. I will tryout your suggestions and return feedback later.
DS
Member 7766180 17-Apr-11 11:41am    
Andrew what exactly is file magic? Is there code that I can incorporate into my code?
Thanks
DS
Andrew Brock 18-Apr-11 2:05am    
File magic is also commonly referred to as file signature.
A lot of the common file formats have a sequence of bytes known as the magic, generally at the very start of the file.
For the example I provided in the answer, a .AVI file starts with the characters "RIFF" (so does .WAV). This means that a media player, such as VLC, Media Player Classic, in some cases Windows Media Player, and a lot of other players, can determine what the file is without knowing the extension. Try it: rename a .avi file to something like abc.mkv and play it. It will still play fine, even though it has the wrong extension and Windows may not know what it is. The reason is the magic, which identifies the file as a .AVI (or .WAV). Check out http://en.wikipedia.org/wiki/File_format#Magic_number for more information. For a list of some of the common magics, check out http://en.wikipedia.org/wiki/List_of_file_signatures.
Member 7766180 18-Apr-11 7:10am    
Thank you very much! Is there code to read this "Magic"? I think I know what it is, thanks to your help! But how does one access or read the "Magic"?
DS
Member 7766180 wrote:
Is there code to read this "Magic"? I think I know what it is, thanks to your help! But how does one access or read the "Magic"?

Well, it is generally just the start of the file, so read in say the first 8 bytes and check them.

The following is complete code for a C/C++ Windows console application that will tell you the file type based on the contents of the file. It hasn't been well tested, but you should get the idea.

This is a simple one, that you can hopefully follow easily:
#include <Windows.h>
#include <stdio.h>
#include <string.h> //memcmp
typedef struct {
	DWORD nLength; //The size of the magic. Must be <= 8, or increase the size of BYTE pFileStart[8]; at the top of GetFileType
	const char *pMagic;  //The exact bytes that must match
} FileMagic;
FileMagic g_fmFileTypes[] = {
	{ 4, "RIFF" },      //index 0
	{ 3, "ID3" },       //index 1
	{ 4, "PK\x03\x04" } //\xNN gives the character with the hex code NN.
};
LPCSTR GetFileExtensions(int nMagicID) {
	switch (nMagicID) {
		case 0: //RIFF
			return "avi;wav";
		case 1: //ID3
			return "mp3";
		case 2: //PK\x03\x04
			return "zip";
	}
	return ""; //No file extension / unknown
}
int GetFileType(LPCSTR szFileName) {
	BYTE pFileStart[8];
	DWORD nBytesRead;
	int nFileType;
	BYTE *pData = NULL;
	int nMagicID = -1;
	//Open the file for reading
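	HANDLE hFile = CreateFileA(szFileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); //CreateFileA because szFileName is an ANSI string
	if (hFile == INVALID_HANDLE_VALUE) { //CreateFile returns INVALID_HANDLE_VALUE on failure, not NULL
		return -2;
	}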
	if (!ReadFile(hFile, pFileStart, sizeof(pFileStart), &nBytesRead, NULL)) {
		CloseHandle(hFile);
		return -2;
	}
	//Loop through file magics defined in g_fmFileTypes to find a match
	for (nFileType = 0; nFileType < (int)ARRAYSIZE(g_fmFileTypes); ++nFileType) {
		if (g_fmFileTypes[nFileType].nLength <= nBytesRead && //A short file may hold fewer than 8 bytes
			memcmp(g_fmFileTypes[nFileType].pMagic, pFileStart, g_fmFileTypes[nFileType].nLength) == 0) { //Does the magic match?
			nMagicID = nFileType; //Save the ID of this file type
			break; //Stop searching
		}
	}
	CloseHandle(hFile);
	return nMagicID;
}
int main(int argc, char *argv[]) {
	if (argc != 2) {
		printf("Usage: %s <file to test>\n", argv[0]);
		return 1; //Nothing to test without a file name
	}
	int nMagicID = GetFileType(argv[1]); //Get the file magic
	if (nMagicID == -2) {
		puts("An error occurred.");
	} else if (nMagicID == -1) {
		puts("The type of this file is unknown.");
	} else {
		printf("The file type(s) are: %s\n", GetFileExtensions(nMagicID)); //Now print the possible file extensions
	}
	return 0;
}

And a more complex one that works if the magic isn't at the start of the file:
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h> //malloc, free
#include <string.h> //memcmp

typedef enum {
	FMID_ERROR,   //File not found, permission denied, ...
	FMID_UNKNOWN, //The file format is unknown
	FMID_ID3,     //MP3
	FMID_RIFF,    //AVI or WAV
	FMID_PK,      //ZIP
} FileMagicID;

typedef struct {
	DWORD nOffset;         //The position from the start of the file of the magic. Usually 0.
	DWORD nLength;         //The size of the magic
	const char *pMagic;    //The exact bytes that must match
	FileMagicID eFileType; //A unique number to identify the file in this program. Has no meaning outside of this code.
} FileMagic;

FileMagic g_fmFileTypes[] = {
	{ 0, 4, "RIFF", FMID_RIFF },
	{ 0, 3, "ID3", FMID_ID3 },
	{ 0, 4, "PK\x03\x04", FMID_PK } //\xNN gives the character with the hex code NN.
	//Add your file formats in here in the format
	//{ [offset of magic], [length of magic], [magic as either a string "abc" or array { 'a', 'b', 'c' }], [the internal identifier, FMID_???] }
};

FileMagicID GetFileType(LPCSTR szFileName) {
	BYTE pFileStart[8];
	DWORD nBytesRead;
	DWORD nFileType;
	BYTE *pData = NULL;
	FileMagicID idMagic = FMID_UNKNOWN;
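	HANDLE hFile = CreateFileA(szFileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); //CreateFileA because szFileName is an ANSI string
	if (hFile == INVALID_HANDLE_VALUE) { //CreateFile returns INVALID_HANDLE_VALUE on failure, not NULL
		return FMID_ERROR;
	}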
	if (!ReadFile(hFile, pFileStart, sizeof(pFileStart), &nBytesRead, NULL)) {
		CloseHandle(hFile);
		return FMID_ERROR;
	}
	const DWORD nHeaderBytes = nBytesRead; //Bytes actually buffered (a short file may hold fewer than 8); nBytesRead is reused below
	for (nFileType = 0; nFileType < ARRAYSIZE(g_fmFileTypes); ++nFileType) {
		if (g_fmFileTypes[nFileType].nOffset + g_fmFileTypes[nFileType].nLength <= nHeaderBytes) { //Do we already have enough data read in?
			if (memcmp(g_fmFileTypes[nFileType].pMagic, pFileStart + g_fmFileTypes[nFileType].nOffset, g_fmFileTypes[nFileType].nLength) == 0) { //Does the magic match?
				idMagic = g_fmFileTypes[nFileType].eFileType; //Save the ID of this file type
				break; //Stop searching
			} //else file is not this type, keep searching
		} else { //No, we have to read it
			pData = (BYTE *)malloc(g_fmFileTypes[nFileType].nLength);
			if (pData == NULL) { //Not enough memory
				idMagic = FMID_ERROR;
				break;
			}
			nBytesRead = SetFilePointer(hFile, g_fmFileTypes[nFileType].nOffset, NULL, FILE_BEGIN); //Seek to where the magic should be
			if (nBytesRead == g_fmFileTypes[nFileType].nOffset) { //Is the file big enough?
				if (ReadFile(hFile, pData, g_fmFileTypes[nFileType].nLength, &nBytesRead, NULL) && nBytesRead == g_fmFileTypes[nFileType].nLength) { //Was the file big enough to read all the data?
					if (memcmp(g_fmFileTypes[nFileType].pMagic, pData, g_fmFileTypes[nFileType].nLength) == 0) { //Does the magic match?
						idMagic = g_fmFileTypes[nFileType].eFileType; //Save the ID of this file type
						free(pData);
						break; //Stop searching
					}
				} //else file is not this type, keep searching
			} //else file is not this type, keep searching
			free(pData);
		}
	}
	CloseHandle(hFile);
	return idMagic;
}

//A simple function for getting a file extension from the internal identifier, FMID_???
LPCSTR GetFileExtensions(FileMagicID fmMagic) {
	switch (fmMagic) {
		case FMID_ID3:
			return "mp3";
		case FMID_RIFF:
			return "avi;wav";
		case FMID_PK:
			return "zip";
	}
	return ""; //No file extension / unknown
}

int main(int argc, char *argv[]) {
	if (argc != 2) {
		printf("Usage: %s <file to test>\n", argv[0]);
		return 1; //Nothing to test without a file name
	}
	FileMagicID fmMagic = GetFileType(argv[1]); //Get the file magic
	if (fmMagic == FMID_ERROR) {
		puts("An error occurred.");
	} else if (fmMagic == FMID_UNKNOWN) {
		puts("The type of this file is unknown.");
	} else {
		printf("The file type(s) are: %s\n", GetFileExtensions(fmMagic)); //Now print the possible file extensions
	}
	return 0;
}

And a sample run of the program:
>Magic.exe test.zip
The file type(s) are: zip
>move test.zip test.mp3
>Magic.exe test.mp3
The file type(s) are: zip

Here I ran it on a valid .zip file, then renamed the .zip to .mp3 and ran it again, and it still detected it as a .zip.

I would suggest that you get a hex editor and take a look at the binary data of the files to see what I am talking about. I use Breakpoint Software's Hex Workshop; it's not free, but it has a trial if you only want to use it for a few days.
 
Comments
Member 7766180 18-Apr-11 19:38pm    
WOW! Andrew! I would give you a 10 for that answer! This is really nice of you! I am going to work on this and in a few days let you know what I've turned up. Thank you so much for being so informative!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


