
Yet Another Duplicate File Detector: MVVM Pattern

11 Jun 2016 | CPOL | 14 min read
A simple utility to find duplicate files in a folder (including sub-directories).

Github: https://github.com/kaviteshsingh/DuplicateFileDetectorMVVM

Please vote and provide feedback.

Introduction

I recently wrote an article (http://www.codeproject.com/Tips/1073760/Yet-Another-Duplicate-File-Detector) on how to find duplicate files in a folder.

I got good feedback from the community on improving performance, and at the same time I took the initiative to learn the Model-View-ViewModel (MVVM) pattern. The concept remains the same: two files are considered identical if their MD5 hashes are the same.

I will use this article to describe what I learned about the MVVM pattern. A few key differences from the previous version:

  1. Performance improvement: This application only calculates the hash of a file when a potential duplicate is detected, i.e. when another file of the same size is found. In the previous application, a hash was calculated for every file and then SQLite would figure out the duplicates. You can see a large performance boost, especially if the files in the directory are big.
  2. MVVM pattern support
  3. Removed the SQLite dependency; the application now uses the Dictionary and ObservableCollection classes provided by .NET.
  4. More information on the UI screen about scan progress.

MVVM Pattern

The Internet is filled with MVVM tutorials and design discussions. Beyond basic WPF programming knowledge, these two articles helped me a lot in understanding the MVVM pattern.

  1. https://msdn.microsoft.com/en-us/magazine/dd419663.aspx
  2. http://www.markwithall.com/programming/2013/03/01/worlds-simplest-csharp-wpf-mvvm-example.html

The first article can be a little daunting to begin with. I spent a good amount of time going through the code to understand what the article was trying to explain. It is pretty dense for a beginner, but going over it again and again helps in understanding what the author is doing in the code.

The second article is, as the name says, 'the simplest MVVM example', and it truly is. It focuses only on the core concept of what makes an application MVVM.

Since data binding is an important concept in MVVM, the tutorials below might be helpful if you want a good overview of data binding concepts:

  1. https://msdn.microsoft.com/en-us/library/ms750612(v=vs.100).aspx
  2. https://msdn.microsoft.com/en-us/library/aa480224.aspx

I have tried to base my application on the principles described in these articles, and you will see that RelayCommand.cs from the first article is used to relay commands from the UI (user interface) to the ViewModel.

Some MVVM Concepts

I would like to describe my learnings about MVVM pattern here.

The MVVM pattern implies that the UI should be completely separated from the actual application logic. This means that the application logic (referred to as the View-Model) should have no reference to, or direct coupling with, the application UI (referred to as the View). It also implies that there should not be any business logic in the XAML code. XAML code should only be used for UI-related work.

The Model is basically the set of classes that hold the data the View-Model needs to know about and act on. The View (UI) never talks to the Model directly; information flows from Model -> View-Model -> View, or vice versa.

To provide communication between the UI and the View-Model, WPF provides the data binding feature (refer to the links above). This means that you can bind the content of any user control to some entity (normally a property) on the ViewModel. For example, if you have a list of names shown on the UI by a ListBox, the ItemsSource property of the ListBox control can be bound to a regular List (more on ObservableCollection later) in the ViewModel.

One fundamental thing to note is that even when you bind a control to a View-Model via data binding, you may need to subscribe to change notifications so that the View and the View-Model stay in sync. Imagine a View which shows a list of names and provides an option to add new ones. In this case, the user fills in the information about a new user and presses submit. At this point, the new user should be shown in the list and should also be synced back to the View-Model list.

To solve this problem, .NET provides an interface called INotifyPropertyChanged. Any class that implements this interface can notify the UI whenever one of its properties changes. Normally, public properties exposed by the View-Model can be bound to any UI control in the View. This means the View-Model should implement INotifyPropertyChanged and raise the PropertyChanged event from these properties whenever a value changes, so that the UI can update accordingly. You can choose one-way or two-way binding between source and target in the XAML code.
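
To make the notification mechanism concrete, here is a minimal sketch of a class that raises PropertyChanged from a property setter. The class names NotifyBaseSketch and ScanStatusSketch are hypothetical and chosen only for illustration; the application's own base class may be written differently. The sketch assumes .NET 4.5 or later for the CallerMemberName attribute.

```csharp
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical base class for illustration; not the article's INotifyBase.cs.
public abstract class NotifyBaseSketch : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // CallerMemberName fills in the name of the calling property automatically.
    protected void OnPropertyChanged([CallerMemberName] string propertyName = null)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}

// Hypothetical view-model exposing one bindable property.
public class ScanStatusSketch : NotifyBaseSketch
{
    private string _folderPath;

    public string FolderPath
    {
        get { return _folderPath; }
        set
        {
            if (_folderPath == value) return; // skip redundant notifications
            _folderPath = value;
            OnPropertyChanged();              // raises PropertyChanged for "FolderPath"
        }
    }
}
```

A control bound with Text="{Binding FolderPath}" would refresh each time the setter raises the event. Comparing the old and new value first is a design choice that avoids needless UI updates.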

Secondly, I found that some people on the Internet say INotifyPropertyChanged should only be implemented in the View-Model and that the Model should be a simple class describing the data. In my application, I went ahead and implemented the INotifyPropertyChanged interface for my Model as well. This ensures that if anything in the Model changes, it will also raise an event.

Since I mentioned above that there should not be any direct coupling between the View and the View-Model, we need some mechanism to notify the View-Model when an event happens on the View. For example, in my application we have a 'Begin Scan' button. When the user clicks the button, the View-Model should be notified that it should begin scanning a directory. This can be implemented using WPF's ICommand interface. Buttons and some other user controls provide a Command option to talk to the View-Model. I am using the RelayCommand.cs file from Josh Smith's article referenced above to implement the ICommand interface for communication between the View and the View-Model.
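
As an illustration of what an ICommand implementation involves, here is a minimal delegate-based command. This is a sketch under my own assumptions, not the RelayCommand class from Josh Smith's article; the name DelegateCommandSketch and the RaiseCanExecuteChanged helper are hypothetical.

```csharp
using System;
using System.Windows.Input;

// Hypothetical minimal command; NOT the RelayCommand from the referenced article.
public class DelegateCommandSketch : ICommand
{
    private readonly Action<object> _execute;
    private readonly Func<object, bool> _canExecute;

    public DelegateCommandSketch(Action<object> execute, Func<object, bool> canExecute = null)
    {
        if (execute == null) throw new ArgumentNullException("execute");
        _execute = execute;
        _canExecute = canExecute;
    }

    public event EventHandler CanExecuteChanged;

    // With no predicate supplied, the command is always executable.
    public bool CanExecute(object parameter)
    {
        return _canExecute == null || _canExecute(parameter);
    }

    public void Execute(object parameter)
    {
        _execute(parameter);
    }

    // Call this when the result of CanExecute may have changed.
    public void RaiseCanExecuteChanged()
    {
        EventHandler handler = CanExecuteChanged;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}
```

A View-Model could expose such a command as a property and a Button could bind its Command attribute to it. WPF calls CanExecute to decide whether the control is enabled and Execute when the user activates it.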

Another concept to discuss is ValueConverters. In my application, when I call the .NET API to get file details, it provides the file size in bytes. I use the size in bytes as the key of a dictionary. However, I would like to display the size on the UI screen in kilobytes. One easy way is to divide the value from the .NET API by 1024 and save the result in my ViewModel as a second file size property. I could then bind this value to a TextBlock to show the file size in kilobytes. That means keeping two properties, one in bytes and one in kilobytes, because I need both values in my application.

But there is a problem with this approach. We created a new property just to hold the size in kilobytes because the View expects that value. If I later decide to show the value in megabytes, I would again have to change the ViewModel and store the value in a new property (or reuse the kilobyte property). But the requirement came from the View, so ideally I should not have to touch the ViewModel.

ValueConverters come to the rescue here. As the name suggests, a ValueConverter is used by the XAML code, i.e. the View: it takes the value in bytes and changes it to kilobytes by dividing it by 1024. Tomorrow, if I decide to show the size in megabytes, I would change the ValueConverter in the View instead of the ViewModel. In short, I removed the kilobyte property from the ViewModel and moved the formatting of the file size entirely into the View. Refer to the file FileSizeInBytesToKiloBytes.cs in the code.

Now let's dive into the code and see how things are implemented.
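
For readers who have not written a converter before, the following sketch shows the general shape of an IValueConverter that turns a byte count into kilobytes. It is an illustration only: the class name BytesToKiloBytesSketch and the two-decimal "N2" formatting are my assumptions, and the actual FileSizeInBytesToKiloBytes.cs in the repository may differ. It assumes the bound value is numeric.

```csharp
using System;
using System.Globalization;
using System.Windows.Data;

// Hypothetical converter for illustration; the repository's converter may differ.
public class BytesToKiloBytesSketch : IValueConverter
{
    // Called by the binding engine when moving a value from the ViewModel to the View.
    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        if (value == null) return null;

        // System.Convert accepts UInt64, Int64 and other numeric types.
        double bytes = System.Convert.ToDouble(value, CultureInfo.InvariantCulture);
        return (bytes / 1024.0).ToString("N2", culture);
    }

    // The size is display-only, so converting back is not supported.
    public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
    {
        throw new NotSupportedException();
    }
}
```

Switching the display to megabytes would only require changing the divisor in Convert (or swapping in another converter in the XAML); the ViewModel would keep exposing the size in bytes.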

View (XAML)

Before I get into the code details let me show a snapshot of the application in action and then I will explain things step by step.

Image 1

I mentioned earlier that the simplest way to connect the ViewModel with the View is to expose properties on the ViewModel and then use data binding. From the above UI we can see that we need to expose every value that can change while a directory is being scanned. Some of them are:

  1. Last Found File
  2. Last Found File Size (KB) - remember ValueConverters
  3. Total Files Found
  4. Total File
  5. For Hash Calculation
    1. Last File
    2. Last File Hash
    3. Last File Size (KB)
    4. Size Buckets
    5. Potential Duplicates

Also, I am using a DataGrid to show all the duplicates along with some additional properties of each file: file name, full file path, size, and hash.

I am breaking up the XAML code in MainWindow.xaml (View) to show how data binding to the ViewModel works. The first step is to create an instance of the ViewModel and let the View (the XAML) know about it. One way of doing this is to set the DataContext property in the XAML. This ensures that when the View is loaded, the XAML creates an instance of the ViewModel. All the controls can then bind to the public properties exposed by the ViewModel.

XML
<Window.DataContext> 
    <ViewModel:MainViewModel/> 
</Window.DataContext> 

I mentioned earlier that we will use a ValueConverter to display the file size in kilobytes. I have another ValueConverter in the code: InScanProgressToBrowseButton. While a scan is in progress, the Browse button should be disabled. My ViewModel has a property called IsScanInProgress which is set to true while a scan is in progress. At that point, the IsEnabled property of the Browse button should be false. Since the button state is the inverse of the scan state, a ValueConverter is needed to translate one into the other.

Since these ValueConverters are used by the XAML, we declare them in the resources section of the XAML as below:

XML
<Window.Resources> 
    <ValueConverters:InScanProgressToBrowseButton x:Key="InScanProgressToBrowseButton"/> 
    <ValueConverters:FileSizeInBytesToKiloBytes x:Key="FileSizeInBytesToKiloBytes"/>
</Window.Resources>

I also added a KeyBinding to provide a keyboard shortcut for scanning. Please note that you need to browse to a directory first; then you can use Ctrl+S to trigger a scan. If you use the keyboard shortcut without selecting a directory, it will not work. I have yet to hook up the browse functionality for the case where the user presses the shortcut without setting the directory path.

XML
<Window.InputBindings>
    <KeyBinding Key="S" Modifiers="Control" Command="{Binding CmdBeginScan}" /> 
</Window.InputBindings>

You will have noticed that I am using the Command option to send a command from the View to the ViewModel. I mentioned earlier that the ViewModel exposes ICommand properties which the View executes. For this command, I am not passing any argument to the ViewModel.

Now let us see some of the properties and how they are bound to the controls. All the controls, such as labels and buttons, are bound to the public properties that the ViewModel exposes.

You will see in the XAML below that DirScanButton uses the same command as the KeyBinding to trigger a scan, i.e. CmdBeginScan. This is helpful if I ever want to change the implementation of CmdBeginScan: both the KeyBinding and the Button pick up the change automatically.

XML
<TextBlock x:Name="TextBlockFolderPath" Text="{Binding FolderPath}" /> 

<Button x:Name="BrowseButton" Content="Browse" IsEnabled="{Binding IsScanInProgress, Converter={StaticResource InScanProgressToBrowseButton}}" /> 

<Button x:Name="DirScanButton" Command="{Binding CmdBeginScan}" IsEnabled="{Binding IsScanInProgress, Converter={StaticResource InScanProgressToBrowseButton}}" Content="{Binding ScanButtonCaption}"/> 

<Label Content="{Binding FileEnumCurrentFile.FileName}"/> 

<Label Content="{Binding FileEnumCurrentFile.Size, Converter={StaticResource FileSizeInBytesToKiloBytes}}" /> 

<Label Content="{Binding FileEnumTotalFilesScanned}" /> 

<Label Content="{Binding FileEnumTotalTimeForScan}"/> 

<Label Content="{Binding HashCurrentFile.FileName}"/> 

<Label Content="{Binding HashCurrentFile.Hash}" /> 

<Label Content="{Binding HashCurrentFile.Size, Converter={StaticResource FileSizeInBytesToKiloBytes}}"/> 

<Label Content="{Binding NumberOfBuckets}"/> 

<Label Content="{Binding DuplicateFileList.Count}"/>

Let us now discuss the DataGrid which manages the list of duplicate files. The DataGrid should support a few things:

  1. Should support naming the Column header 
    • AutoGenerateColumns="False"
    • <DataGrid.Columns>
  2. Should allow selection of multiple rows
    • SelectionMode="Extended"
  3. When selected, full row is selected and highlighted.
    • SelectionUnit="FullRow"
  4. Rows should have alternating colors for easy readability.
    • AlternatingRowBackground="Gainsboro"
    • AlternationCount="1"
  5. If after selection, user decides to unselect the row, he/she can use Ctrl + row click to unselect. This is default WPF behavior
  6. If user selects in empty area of DataGrid, all items should be unselected.
    • MouseLeftButtonDown="ResultDataGrid_MouseLeftButtonDown"
  7. Full file path name can be long and may not fit visible area. So when user hovers mouse over a row, small tooltip will show full file path. You can see that in the above screenshot. 
    • I use <DataGrid.RowStyle> and then <Style.Triggers> property to display a tooltip when IsMouseOver property is true.
  8. The ItemsSource should be bound to a collection exposed by the ViewModel
    • ItemsSource="{Binding DuplicateFileList}"
  9. Selected items should be sent back to the ViewModel for deletion; the ViewModel should delete the items from the collection, and the change should be reflected in the View, i.e. the DataGrid.
    • More on this when I discuss delete button handling.

XML
<DataGrid x:Name="ResultDataGrid" SelectionMode="Extended" SelectionUnit="FullRow" 
          IsReadOnly="True" AlternatingRowBackground="Gainsboro" AlternationCount="1" 
          ItemsSource="{Binding DuplicateFileList}" AutoGenerateColumns="False" 
          MouseLeftButtonDown="ResultDataGrid_MouseLeftButtonDown">

    <DataGrid.RowStyle>
        <Style TargetType="DataGridRow">
           <Style.Triggers>
             <Trigger Property="IsMouseOver" Value="True">
               <Setter Property="ToolTip" Value="{Binding FullFilePath}" />
             </Trigger>
           </Style.Triggers>
        </Style>
    </DataGrid.RowStyle>

    <DataGrid.Columns>
      <DataGridTextColumn Header="Hash" Width="2.5*" Binding="{Binding Hash}"/>
      <DataGridTextColumn Header="File Name" Width="2*" Binding="{Binding FileName}"/>
      <DataGridTextColumn Header="Full File Path" Width="4.5*" Binding="{Binding FullFilePath}" />
      <DataGridTextColumn Header="Size (bytes)" Width="1*" Binding="{Binding Size}" />
    </DataGrid.Columns> 
</DataGrid>

Once the directory is scanned, the Delete Selected Items button comes into play. Firstly, this button should only be enabled when at least one item is selected. The selected items are part of a list exposed by the DataGrid as its SelectedItems property. When no item is selected, SelectedItems.Count is 0; when items are selected, it is a positive value. So the IsEnabled property of the button is updated whenever the count changes.

Now, when items are selected, we need to send the SelectedItems list back to the ViewModel. Since the UI does not run any application logic such as deleting files, we send the selected files via a command exposed by the ViewModel. In this case we send a command parameter (SelectedItems) along with the command to the ViewModel.

XML
<Button x:Name="DeleteItemsButton" Content="Delete Selected Items"
Command="{Binding CmdDeleteSelectedList, Mode=OneWay}" CommandParameter="{Binding SelectedItems, ElementName=ResultDataGrid}"
IsEnabled="{Binding SelectedItems.Count, ElementName=ResultDataGrid, Mode=OneWay}"/>

Last but not least, we need to display errors which might happen when deleting files. The most common one is when you try to delete a file that is in use. So I created a ListBox component which will display items in Red and bind to ErrorList of ViewModel. 

XML
<ListBox Foreground="Red" ItemsSource="{Binding Path=ErrorList}" />

As an example, I opened the exe I was trying to delete; the delete threw an error, which was captured in the ListBox.

Image 2

ViewModel

In the earlier section we saw a bunch of View controls bound to public properties of the ViewModel. The classes exposing these properties implement the INotifyPropertyChanged interface. To keep things simple and reuse code, I created an abstract class, INotifyBase (INotifyBase.cs), which implements this interface. Both the ViewModel and the Model can derive from this class and simply call the function it exposes to raise the property change event.

In the code you will see a base view-model class, ViewModelBase.cs. This class derives from INotifyBase, so it provides support for raising property change events.

MainViewModel.cs derives from ViewModelBase.cs, so it has all the above features. Here is one example of how a property raises the property change event:

C#
private ObservableCollection<FileDetail> _DuplicateFileList; 
public ObservableCollection<FileDetail> DuplicateFileList 
{ 
    get { return _DuplicateFileList; } 
    set { _DuplicateFileList = value; base.OnPropertyChanged("DuplicateFileList"); } 
}

You will notice that DuplicateFileList is bound to the ItemsSource of the DataGrid that shows duplicate files. Note also that it is not an ordinary generic List but an ObservableCollection. I chose ObservableCollection because it has built-in change notification: it implements INotifyCollectionChanged as well as INotifyPropertyChanged. Whenever an item is added, removed, or replaced in the collection, it raises a notification and the bound UI control updates itself accordingly.

One thing to keep in mind is that since the ObservableCollection is bound to a UI component, it should be modified only on the UI thread and not on a worker thread. If it is modified from outside the UI thread, the application will throw an exception. You will notice that whenever I access this collection, I use the Dispatcher:

C#
Application.Current.Dispatcher.Invoke((Action)(() => 
{ 
   // access in UI thread. 
   DuplicateFileList.Add(item); 
}));

You also saw that I exposed a bunch of commands for button operations. I am using Josh Smith's RelayCommand to implement the ICommand interface.

C#
        private ICommand _CmdBeginScan;
        public ICommand CmdBeginScan
        {
            get
            {
                if (_CmdBeginScan == null)
                {
                    _CmdBeginScan = new RelayCommand(
                        param => this.CmdBeginScanHandler()
                        );
                }
                System.Diagnostics.Debug.WriteLine("_CmdBeginScan..");
                return _CmdBeginScan;
            }
        }

You will notice I provided a function, CmdBeginScanHandler, which is called when the command is received. Since scanning a folder and then calculating hashes can take quite some time, it would not be a good idea to block the UI thread. As a result, both CmdBeginScanHandler and CmdDeleteSelectedListHandler do their work asynchronously.

Just to show two approaches, I am using async/await and threadpool to handle these commands. 

C#
async void CmdBeginScanHandler()
{
    System.Diagnostics.Debug.WriteLine("CmdBeginScanHandler..");
    await Task.Run(() => ThreadPoolWorkerFileEnumeration(this));
}

void CmdDeleteSelectedListHandler(object param)
{
    System.Diagnostics.Debug.WriteLine("CmdDeleteSelectedListHandler..");
    ThreadPool.QueueUserWorkItem(ThreadPoolWorkerDeleteSelectedItems, param);

}

If you recall, when user selects some files to delete and presses the Delete button, CmdDeleteSelectedList Command is triggered. We pass the SelectedItems as argument to the Command so that ViewModel can handle them.

Since the list is passed as an object, it can be retrieved from the command parameter as shown below:

C#
System.Collections.IList list = (System.Collections.IList)state;

Basically, the object passed in SelectedItems supports IList interface and we can use that interface to iterate through the list. 
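
One way to work with the untyped list is to copy it into a typed list before processing it. The helper below is a hedged sketch, not code from the application; the names SelectionSketch and Snapshot are hypothetical.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the article's source code.
public static class SelectionSketch
{
    // Copies an untyped command parameter into a typed list.
    // Items that are not of type T are skipped; a non-list parameter yields an empty list.
    public static List<T> Snapshot<T>(object commandParameter)
    {
        IList list = commandParameter as IList;
        if (list == null) return new List<T>();
        return list.OfType<T>().ToList();
    }
}
```

For example, a handler could call SelectionSketch.Snapshot<FileDetail>(param) before queuing the delete work. The reasoning behind copying is that the grid's SelectedItems is a live collection: removing rows from the bound collection during deletion can change the selection while it is being iterated, so working on a copy may be the safer choice.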

I also expose an ErrorList as an ObservableCollection so that the error ListBox on the UI is updated if any error is encountered while deleting files.

Logic to Figure Out Duplicate Files

I want to take a moment to describe the logic used to figure out duplicate files. I am using two Dictionary collections to find duplicates:

C#
Dictionary<UInt64, FileDetail> _FileSizeInfo = new Dictionary<UInt64, FileDetail>();

Dictionary<string, List<FileDetail>> _HashInfoFiles = new Dictionary<string, List<FileDetail>>();

_FileSizeInfo uses the file size as the key and helps detect when two or more files have the same size. When a file is found, it is looked up in _FileSizeInfo by its size. If there is no file with that size, a new entry is added to this dictionary. At this point, no hash is calculated.

If, during the scan, another file with the same size is found, we calculate the hash of the file already in _FileSizeInfo and of the new file, and add both to the _HashInfoFiles dictionary.

_HashInfoFiles uses the hash as the key and maintains a list of files as the value. If a file has the same hash as an existing entry, it is added to that entry's list of files.

Once the full scan is done, _FileSizeInfo is cleared and _HashInfoFiles is iterated to remove entries whose list has a count of 1. The final result is the list of files that share a hash with at least one other file.
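
The steps above can be sketched in a few lines of standalone C#. This is a simplified illustration under my own assumptions, not the application's code: it works on plain file paths instead of FileDetail objects, uses long for the size, and adds a HashSet (sizeAlreadyHashed) so the first file of a given size is hashed only once. All names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical standalone sketch of the size-first, hash-second approach.
public static class DuplicateFinderSketch
{
    static string Md5Of(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }

    static void AddHash(Dictionary<string, List<string>> byHash, string path)
    {
        string hash = Md5Of(path);
        List<string> files;
        if (!byHash.TryGetValue(hash, out files))
        {
            files = new List<string>();
            byHash[hash] = files;
        }
        files.Add(path);
    }

    // Returns hash -> paths, keeping only hashes shared by two or more files.
    public static Dictionary<string, List<string>> Find(IEnumerable<string> paths)
    {
        var firstBySize = new Dictionary<long, string>();   // size -> first file seen
        var sizeAlreadyHashed = new HashSet<long>();        // sizes whose first file is hashed
        var byHash = new Dictionary<string, List<string>>();

        foreach (string path in paths)
        {
            long size = new FileInfo(path).Length;
            string first;
            if (!firstBySize.TryGetValue(size, out first))
            {
                firstBySize[size] = path;                    // unique size so far: no hash needed
                continue;
            }
            if (sizeAlreadyHashed.Add(size)) AddHash(byHash, first); // hash the earlier file once
            AddHash(byHash, path);
        }

        // Drop hashes that ended up with a single file.
        return byHash.Where(kv => kv.Value.Count > 1)
                     .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

Calling DuplicateFinderSketch.Find(Directory.EnumerateFiles(folder, "*", SearchOption.AllDirectories)) would return the groups of identical files under a folder. Files whose size is unique are never opened, which is where the performance gain over hashing everything comes from.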

Model

The model for this application is a simple class that holds file details. FileDetail.cs derives from the INotifyBase class so that it supports the INotifyPropertyChanged interface. The properties exposed by the model are:

  1.  public string FileName
  2.  public string FullFilePath
  3.  public UInt64 Size
  4.  public string Hash

Feedback

This is my first attempt at MVVM programming, so I welcome suggestions on best practices and bug reports. One thing I have yet to implement is stopping a scan: while a directory is being scanned, the user cannot stop it. The only way to stop the scan is to close the app and re-open it. Hopefully, I will add that in the next version.

History

First release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

