Click here to Skip to main content
Click here to Skip to main content

Checksum Verification

By , 22 Sep 2011
 

Introduction

More and more we are seeing the importance of storage, archiving, and conservation of digital assets, potentially for decades or centuries. Digital Asset Management systems are mostly focused on ingest and user access, but what about the assets themselves? People are spending a lot of time creating videos, images, and other media. It makes sense that we should verify the data integrity of digital assets on a regular basis and alert when there are discrepancies. The program below does that using SHA-256 checksum. This is a console application written for .NET 4.0.

Background

Prerequisites

  • MySQL Server (and GUI Tools)* available at www.mysql.com
  • MySQL Connector/NET
  • *The database platform is MySQL but it could easily function with SQL Server. One of the things I like about MySQL Connector/NET is that the syntax for working with the database is almost the exact same as MS SQL Server (i.e., SqlCommand becomes MySqlCommand).

  • .NET 4.0/Visual Studio 2010.
  • Gmail account or other for authenticating and sending e-mail via SMTP.

Use Cases

The core Use Case is to check all files in a folder/subfolders and verify that the checksum value for a file has not changed. If it has, then a tech or business owner should be notified by e-mail. The program below does this; it also reports the total number of files, the number of unchanged files, and the number of new files.

The program writes a summary report into the body of the e-mail and attaches two spreadsheets (.csv) with detailed information.

Certain parts of the program are configurable through the app.config file, namely:

  • The folder/subfolders to verify
  • E-mail send to address
  • E-mail sent from address
  • Names and location of file attachments
  • Name and location of log files

The example uses a folder structure that looks like this:

folders.jpg

Using the code

First we will need a database. The attached file named DRMC.sql can be restored to MySQL or you can open the file in Notepad++ and see 1 table, 1 view, and 5 Stored Procedures. The program assumes that the database is called "drmc".

The entire program is in the download file. A few areas that are useful to point out are below:

Connecting and working with the MySQL database via a Stored Procedure is not always straightforward. The Stored Procedures contain some OUT parameters in order to get total counts.

The code below shows how to enumerate all the files in a folder and subfolders, run the checksum, and insert the results into a table:

// Connect to database
string m_conn = 
  ConfigurationManager.ConnectionStrings["MySqlConnectionString"].ConnectionString;
MySqlConnection conn = new MySqlConnection(m_conn);
conn.Open();

try
{
    // Get the checksum and path for all files in directory.
    // Use stored proc to insert data. Log results and errors.
    MySqlCommand cmd = new MySqlCommand("drmc.proc_checksum", conn);
    cmd.CommandType = CommandType.StoredProcedure;

    string[] m_files = Directory.GetFiles(m_path, "*.*", 
                                 SearchOption.AllDirectories);

    foreach (string m_file in m_files)
    {
        m_filename = Path.GetFileName(m_file);
        cmd.Parameters.Clear();
        cmd.Parameters.Add(new MySqlParameter(@"M_FILEPATH", 
            MySqlDbType.VarChar) { Value = m_file.Replace("\\", "\\\\") });
        cmd.Parameters.Add(new MySqlParameter(@"M_FILENAME", 
            MySqlDbType.VarChar) { Value = m_filename });
        cmd.Parameters.Add(new MySqlParameter(@"M_SHA256", 
            MySqlDbType.VarChar) { Value = GetChecksum(m_file).ToString() });
        cmd.ExecuteNonQuery();

        using (StreamWriter sw = File.AppendText(m_results))
        {
            Logger.LogMessage("File " + m_file + 
                   " inserted in database.", sw);
            sw.Close();
        }
    }
  ...

It is also good to know how to work with a MySqlDataAdapter as well as how to get an OUT parameter from ExecuteNonQuery. The snippet below shows how to do this.

// Get number and listing of new files
MySqlCommand cmd_newfiles = new MySqlCommand("drmc.proc_newfiles", conn);
cmd_newfiles.CommandType = CommandType.StoredProcedure;

cmd_newfiles.Parameters.AddWithValue("@M_NEWCOUNT", MySqlDbType.Int32);
cmd_newfiles.Parameters["@M_NEWCOUNT"].Direction = ParameterDirection.Output;
cmd_newfiles.ExecuteNonQuery();

Console.WriteLine("New files: " + cmd_newfiles.Parameters["@M_NEWCOUNT"].Value);
Console.WriteLine("\r\n");

string str_newfiles = cmd_newfiles.Parameters["@M_NEWCOUNT"].Value.ToString();

MySqlDataAdapter sda_newfiles = new MySqlDataAdapter(cmd_newfiles);
DataSet ds_newfiles = new DataSet();
ds_newfiles.DataSetName = "New Files";
sda_newfiles.Fill(ds_newfiles);
sda_newfiles.Dispose();

Finally, we start to build an e-mail with the results and send that, including two attachments created with the FileGenerator class.

// Begin creating email content and attachments
DataTable dt_newfiles = ds_newfiles.Tables[0];
DataTable dt_changedfiles = ds_changedfiles.Tables[0];
DataTable dt_changedfiles1 = ds_changedfiles.Tables[1];

FileGenerator.CreateFile(dt_changedfiles, m_changedfiles).ToString();

string m_emailbody = "This e-mail is a summary of checksum file integrity " + 
                     "for files located here: \r\n\r\n" + m_path + "\r\n\r\n";
m_emailbody = m_emailbody + "There are a total of " + 
              str_totalfiles + " files. \r\n\r\n";
m_emailbody = m_emailbody + 
              "The file location, file name, and checksum are the same for " + 
              str_samefiles + " files. \r\n\r\n";
m_emailbody = m_emailbody + "There are " + str_newfiles + 
              " new files. These are listed below, if any.  " + 
              "Detailed information is in the attached " + 
              "file checksum_new_files.csv\r\n\r\n";
m_emailbody = m_emailbody + "There are " + str_changedfiles + 
              " files where the CHECKSUM HAS BEEN CHANGED. " + 
              "The integrity of the file is in doubt, " + 
              "or it has been changed by a user.  " + 
              "The files are listed below, if any.  Detailed information " + 
              "is in the attached file checksum_changed_files.csv\r\n\r\n";

string m_emailnewbody = "New Files: \r\n" + 
       FileGenerator.CreateFile(dt_newfiles, m_newfiles).ToString();
string m_emailchangedbody = "Changed Files: \r\n" + 
       FileGenerator.CreateFile(dt_changedfiles1).ToString();

m_emailbody = m_emailbody + m_emailnewbody + "\r\n" + m_emailchangedbody;

#endregion

#region Email configuration and send

var client = new SmtpClient("smtp.gmail.com", 587)
{
	Credentials = new NetworkCredential("gmailuserhere", 
                      "gmailpasswordhere"),
	EnableSsl = true
};

MailMessage m_message = new MailMessage(
m_notification_to,
m_notification_from,
m_notification_title + DateTime.Today.ToShortDateString(),
m_emailbody);

//...see downloads for code on creating full attchments

m_message.Attachments.Add(m_new_attachment);
m_message.Attachments.Add(m_changed_attachment);

client.Send(m_message);

The real work in the code is performed by the function below. This piece comes almost verbatim from Jeff Barnes, MS MVP, and his blog article here: http://jeffbarnes.net/blog/post/2007/01/12/File-Checksum-using-NET.aspx.

private static string GetChecksum(string m_fileinput)
{
    try
    {
        string m_checksum;
        using (FileStream stream = File.OpenRead(m_fileinput))
        {
            SHA256Managed sha = new SHA256Managed();
            byte[] checksum = sha.ComputeHash(stream);
            m_checksum = 
              BitConverter.ToString(checksum).Replace("-", String.Empty);
        }

        return m_checksum;

    }
    catch (Exception ex)
    {
        using (StreamWriter swerr = File.AppendText(m_errors))
        {
            Logger.LogMessage(ex.Message.ToString(), swerr);
            swerr.Close();
        }

        return "unable to retrieve checksum";

    }
}

One of the issues you may run into with SHA-256 is that, because of its extended algorithm, it takes longer to create the checksum than say MD5. MD5 will run in less than half the time, which is important if you have terabytes to scan and you are more concerned with the checksum value and not the encryption. The snippet below shows how to use MD5 instead:

MD5 m_md5 = new MD5CryptoServiceProvider();
//SHA256Managed sha = new SHA256Managed();
byte[] checksum = m_md5.ComputeHash(stream);
m_checksum = BitConverter.ToString(checksum).Replace("-", String.Empty);

History

  • 23 September 2011 - Added the MD5 comment to allow for quicker scanning of files.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

smoore4
Database Developer
United States United States
Member
I am an MBA, MCSA, MCDBA, and recently MCITP: Database Administrator 2008. Although employed as a DBA, I do a good deal of sys admin work and development using .NET. Technically, I like to focus on business intelligence, database design, messaging architectures and web development/design.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
QuestionMySQL ConnectormemberDave Cross21 Sep '11 - 22:37 
Thank you!
 
I'm sure that you've just saved me many weeks of work on a project I have been putting off for ages.
 
To ease me in gently, can you recommend a quick primer on MySQL using MySQL/Connector/NET? The MySQL pages seem to assume a unix environment and the Connector/NET pages assume familiarity with MySQL. I'm floundering a bit trying to identify the minimum I need to know to set up MySQL for your project.
 
Regards,
 
Dave Cross
AnswerRe: MySQL Connectormembersmoore422 Sep '11 - 21:50 
Ah. Yes, there are certainly a few tricks that are useful. You want the MySQL Community Server (ie, the free version). The setup is really easy on Windows. Just accept all of the defaults, except I usually change the collation to UTF-8 and check the box to add a PATH variable. All of this is through a wizard btw.
 
The real trick is in the tools to connect. The current set is called MySQL Workbench. This is the "Enterprise Manager" of MySQL. I tried it, but ended up going back to an earlier version of MySQL GUI Tools which you can download here:
 
MySQL GUI Tools
 
People also use Toad.
 
Two simple things to know: (1) MySQL Administrator is what you use for users, backup, restore. (2) MySQL Query Browser is what you use for everything else.
 
The .NET DLL is installed with the link below. Just run setup and that is it.
 
http://www.mysql.com/downloads/connector/net/
 
Hope that helps.
 
Steve
GeneralRe: MySQL ConnectormemberDave Cross22 Sep '11 - 23:20 
Thanks again,
I'll find some time over the next week or so to try it out!
 
Dave Cross
GeneralMy vote of 5memberMember 432084421 Sep '11 - 9:35 
Thanks for SHA algorithm creating a constant Hash size
which is the base for integrity checking.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web03 | 2.6.130523.1 | Last Updated 23 Sep 2011
Article Copyright 2011 by smoore4
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid