Sir, I want to find duplicate files using C++. To achieve this,
I iterate over all files on the drive, get each file's size, and then look for duplicate keys using a map.
So I've created a map in which the key is the size of the file and the value is the path of the file. Here is my member function:
C++
bool duplicateFinder::processDrive(const wchar_t* sDir)
{
	// referred http://www.stackoverflow.com/questions/2314542/listing-directory-contents-using-c-and-windows
	//Map creation and usage
	map<int, wchar_t*> duplicate;
	map<int, wchar_t*>::iterator iterate;
	WIN32_FIND_DATA fdFile;
	HANDLE hFind = NULL;

	wchar_t sPath[2048];
	wsprintf(sPath, L"%s\\*.*", sDir);

	if ((hFind = FindFirstFile(sPath, &fdFile)) == INVALID_HANDLE_VALUE)
	{
		wprintf(L"Path not found: [%s]\n", sDir);
		return false;
	}

	do
	{
		
		if (wcscmp(fdFile.cFileName, L".") != 0
			&& wcscmp(fdFile.cFileName, L"..") != 0)
		{
			
			wsprintf(sPath, L"%s\\%s", sDir, fdFile.cFileName);
			if (fdFile.dwFileAttributes &FILE_ATTRIBUTE_DIRECTORY)
			{
				wprintf(L"Directory: %s\n", sPath);
				processDrive(sPath); 
			}
			else
			{
				//wprintf(L"File: %s\n", sPath);
				char** arr;
				char* hash = new char[MAX_PATH];
				memset(hash, 0, MAX_PATH);
				int correction;
				correction = wcstombs(hash, sPath, MAX_PATH);
				//arr = CALL_MD5_Function(hash);
				iterate = duplicate.find(getFileSize(hash));
				if (iterate != duplicate.end())
				{
					cout << "\n\n FOUND THE VALUE " << iterate->second;
				}
				else
				{
					duplicate.insert(pair<int, wchar_t*>(getFileSize(hash), sPath));
				}
			}
		}
	} while (FindNextFile(hFind, &fdFile));
	FindClose(hFind);
	return isDuplcateFound;
}

In the above code, whenever a file is found, its size is calculated using the getFileSize function (this is not a Win32 API function; it is a user-defined native C++ function that returns the size of the file in bytes, e.g. 4278). Before inserting into the map, the presence of the key is checked using the map's "find" function. If the key is not present, it is inserted into the map. But if the key is found, the path stored under the "iterate" iterator should be displayed.

But whenever a duplicate file is found, the output is an address like this: FOUND THE VALUE A012556. I don't know why this happens. I tried iterate->first and the size of the file was displayed correctly, but those two files are not duplicates.

Kindly help me sir with this.
Thank you for your time sir.

What I have tried:

I have tried:

1. Before using maps, I first made sure that the program iterates over all files on the drive by printing the names of the files.

2. Then I made sure that the "getFileSize" function works correctly by printing the sizes of all files on the drive.

3. After that, I tested whether the MD5 hash is computed correctly by printing the hashes of all files in the directory. [Since I'm getting errors when adding file sizes to the map, I didn't develop this further, because it is the next step once I find files of the same size.]

After several tries and modifications, the above three worked fine. Then I moved on to the next step, which is adding the values to the map.
I referred to my school notes [this is not my homework] and then the internet about adding values to a map and found that I did it correctly, so I'm unsure of the error.

Posted
Updated 30-Apr-16 22:39pm

1 solution

std::cout treats a variable of type 'wchar_t*' as a pointer, not a string. The easiest way to fix your code would be to send output to std::wcout, which is defined in the same header (iostream).

Another possible solution would be to modify your std::map<int,wchar_t*> to std::map<int,wstring>, but this would require modifying other parts of your program as well.
   
v2
Comments
[no name] 1-May-16 8:22am
   
Thank you sir, for your kind help and time; it worked. But is it possible to get the duplicate pairs? I can only get the location of the second duplicate file. To my knowledge there is no built-in functionality for this in maps, so could you please suggest the fastest way to do this?
Once again, thank you sir for your time.
Daniel Pfeffer 1-May-16 8:56am
   
One way would be to use std::multimap, which allows multiple instances of the same key. You would then have to iterate over all the instances of the key in order to check for the presence of the duplicate.
Another way would be to use a std::map<int, std::set<wstring> >, and then insert each file name into the set stored under its size. You would then have a two-phase lookup: find files of the appropriate size, and then check the set for the presence of the file name.
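Both suggestions can be sketched roughly as follows. This is only an outline under assumptions: SizeIndex, addFile, candidateGroups and pathsWithSize are hypothetical names, and std::uintmax_t is used for the size instead of the int in the original code. Files that share a size are only candidates; a content check such as the MD5 step mentioned in the question would still be needed to confirm that they are real duplicates.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical alias: file size in bytes -> every path seen with that size.
using SizeIndex = std::map<std::uintmax_t, std::set<std::wstring>>;

// Adds a path under its size; the set ignores a path that is already present.
void addFile(SizeIndex& index, std::uintmax_t size, const std::wstring& path)
{
    index[size].insert(path);
}

// Returns only the groups that hold more than one path,
// i.e. the files that share a size and may be duplicates.
std::vector<std::set<std::wstring>> candidateGroups(const SizeIndex& index)
{
    std::vector<std::set<std::wstring>> groups;
    for (const auto& entry : index)
        if (entry.second.size() > 1)
            groups.push_back(entry.second);
    return groups;
}

// The std::multimap variant: equal_range yields every entry stored under one key.
std::vector<std::wstring> pathsWithSize(
    const std::multimap<std::uintmax_t, std::wstring>& files, std::uintmax_t size)
{
    std::vector<std::wstring> paths;
    auto range = files.equal_range(size);
    for (auto it = range.first; it != range.second; ++it)
        paths.push_back(it->second);
    return paths;
}
```

With the map of sets, all paths of one size sit together, so every member of a group can be printed, not only the second file found. The multimap keeps a flat list instead and needs equal_range for each size that is looked up.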

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)