Posted 9 Nov 2015

Cryptographic Hashes: What They Are, and Why You Should be Friends

Jerome Vonk, 28 Jan 2016
Description of cryptographic hashes and practical examples of how to calculate them

Introduction - What are Hashes?

A hash function is defined as a function that maps data of arbitrary size to data of fixed size. Take the following picture as an example:

That is a hash function that maps names of arbitrary length to an integer. This hash function simply counts how many letters are in the name to find the corresponding integer. Note that, for this function, it is not hard to find examples of different keys that are mapped to the same integer.
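The letter-counting function described above can be sketched in a few lines. This is only an illustration: the name nameLengthHash is made up for this example, and it counts bytes, which equals the number of letters only for plain ASCII names.

```cpp
#include <cstddef>
#include <string>

// Toy hash from the example above: maps a name of any length to an integer
// by counting its characters (bytes). Hypothetical helper, for illustration only.
std::size_t nameLengthHash(const std::string& name)
{
    return name.size();
}
```

Calling nameLengthHash("John") and nameLengthHash("Mary") yields the same value, which is exactly the kind of collision mentioned above.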

Some modern programming languages, like Ruby, have data structures called hash tables, which are used to implement dictionaries that map keys to values. These tables use hash functions to compute an index into an array of slots. Hash tables can be very efficient when used with a good hash function.
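As a rough sketch of how a hash table turns a key into a slot index, the class below uses std::hash and the modulo operator, and stores colliding keys in the same slot as a small list. TinyHashTable and its member names are assumptions made for this example, not how any particular language implements its dictionaries.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal hash table sketch: string keys, int values, fixed number of slots.
// slotCount must be greater than zero.
class TinyHashTable
{
public:
    explicit TinyHashTable(std::size_t slotCount) : slots_(slotCount) {}

    // Inserts the key or overwrites the value if the key already exists.
    void put(const std::string& key, int value)
    {
        auto& bucket = slots_[indexFor(key)];
        for (auto& entry : bucket)
        {
            if (entry.first == key) { entry.second = value; return; }
        }
        bucket.emplace_back(key, value);
    }

    // Returns true and writes the value to valueOut if the key is present.
    bool get(const std::string& key, int& valueOut) const
    {
        const auto& bucket = slots_[indexFor(key)];
        for (const auto& entry : bucket)
        {
            if (entry.first == key) { valueOut = entry.second; return true; }
        }
        return false;
    }

private:
    // The hash function picks the slot: hash(key) modulo the number of slots.
    std::size_t indexFor(const std::string& key) const
    {
        return std::hash<std::string>{}(key) % slots_.size();
    }

    std::vector<std::vector<std::pair<std::string, int>>> slots_;
};
```

A good hash function spreads keys evenly over the slots, so each list stays short and lookups stay fast.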

A cryptographic hash function is non-invertible, or, in other words, is a one-way function. This means that it is practically impossible to recreate the input of the function (normally called the message) by looking only at the output of the function (called the message digest).

Unlike regular hash functions, which can also be somewhat difficult to reverse, cryptographic hash functions are very hard to invert even if the attacker knows the theory and the algorithm used. Given only the hash, an attacker should have no clue about the original message, not even the size of the message (which, obviously, is not the case in the first example).

Besides that, cryptographic hash functions have the following characteristics:

  • It is computationally easy to compute the hash value of any given message;
  • Message integrity: it is infeasible to modify the message without also changing the message digest;
  • Collision resistance: it is infeasible to find two distinct messages that generate the same digest.
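To see what the last two properties rule out, it can help to look at a checksum that does not have them. The sketch below is a deliberately weak, non-cryptographic checksum (the sum of all bytes modulo 256); additiveChecksum is a hypothetical name used only here.

```cpp
#include <cstdint>
#include <string>

// Deliberately weak checksum: the sum of all bytes, modulo 256.
// It is NOT a cryptographic hash; it only illustrates what can go wrong.
std::uint8_t additiveChecksum(const std::string& message)
{
    std::uint8_t sum = 0;
    for (unsigned char byte : message)
    {
        sum = static_cast<std::uint8_t>(sum + byte);
    }
    return sum;
}
```

Because addition ignores byte order, "pay 12 to Bob" and "pay 21 to Bob" produce the same checksum, so the message can be modified without changing the result. A cryptographic hash function is designed so that finding such a pair is infeasible.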

The most common hash functions are listed below:

MD5: announced in 1992, with a 128-bit digest size. Nowadays, it is considered cryptographically broken.

SHA-1: developed by the NSA and standardized in 1995. The digest size is 160 bits. It is no longer considered secure; new developments should implement SHA-2 or SHA-3. Internet Explorer, Chrome and Firefox browsers have all announced they will stop accepting SHA-1 SSL certificates by 2017.

SHA-2: a family of six cryptographic hash functions with four different digest sizes: 224, 256, 384 and 512 bits. It was also designed by the NSA and was published in 2001 as a U.S. federal standard (FIPS).

SHA-3: the standard was released by NIST in August 2015.

Cryptographic hash functions play a major role in public-key cryptography, but in this article we will examine their use in computing and verifying checksums.

Use of Cryptographic Hashes as Checksums

Downloading a File on the Internet

Many websites offering downloads provide cryptographic hashes along with the downloadable files. For example, on the Notepad++ download page, the user will find:

The author also jokes that anyone who wants to check the digests is paranoid. Let's explain a little and let you decide whether or not it is paranoia to verify your downloads.

Listing the digests of the files serves two purposes:

  1. Security. If you download a file over the internet, hash it, and verify that the digest you calculated matches the published one, you can be confident that the file you just downloaded is authentic, i.e., has not been tampered with. One could argue, though, that obtaining hashes from the same website that serves the files is not especially secure, because an attacker who has tampered with the file would probably also be capable of modifying the hash listing. Websites with a secured connection (HTTPS) and PGP-signed email from mailing list announcements are good places to get hash listings.
  2. Integrity. Hash values are good for detecting errors because the slightest modification to the original file during transmission generates a totally different digest. From the digest alone, though, it is not possible to tell exactly what has changed, so the right action is to discard the downloaded file and start the download again.
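The comparison step itself is simple, but digests copied from a web page often differ from the computed one in letter case or carry stray whitespace. A minimal sketch of a tolerant comparison follows; normalizeDigest and digestsMatch are hypothetical helper names, and the digests are assumed to be hexadecimal text.

```cpp
#include <cctype>
#include <string>

// Removes all whitespace and lower-cases the hex digits, so that digests
// copied from different sources can be compared character by character.
std::string normalizeDigest(const std::string& digest)
{
    std::string out;
    for (unsigned char ch : digest)
    {
        if (!std::isspace(ch))
        {
            out.push_back(static_cast<char>(std::tolower(ch)));
        }
    }
    return out;
}

// Returns true only if both digests are non-empty and equal after normalization.
bool digestsMatch(const std::string& published, const std::string& computed)
{
    const std::string a = normalizeDigest(published);
    const std::string b = normalizeDigest(computed);
    return !a.empty() && a == b;
}
```

Treating two empty strings as a mismatch is a deliberate choice here: a failed copy and paste should never count as a successful verification.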

This is what a list of checksums looks like:

Another Example - Downloading an Operating System

Perhaps you've decided that verifying every download you do is overzealous. If this is the case, no problem, it's up to you. But what about a very large file, or a very important one?

This is a screenshot of Ubuntu Vivid Vervet's download page. Along with various options of file formats for the download, there is always a list of the digests of such files.

Note that they are also offering SHA-256 sums, in accordance with the recommendation that new developments should use modern hash algorithms.

Transferring Files Via a Local Network or to External Media

If you have transferred files over a local network (maybe your company's intranet), you might be familiar with the following message:

The first piece of advice here is: do not cut & paste files over the network. Please, don't do that. I have seen several cases where, due to transmission errors, the file is cut from one machine and simply does not appear on the other end.

Instead, I advise you to copy the file to the remote machine and, after the transfer ends, hash the file on both the local and the remote machine. If the digests match, you can now safely delete the original file. At work, we simply can't risk losing important files.
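The copy, verify, then delete sequence can be sketched as below. This is only an outline under a few assumptions: both paths are reachable from the same machine (for example, a mounted network share), and the digest calculation is passed in as a function, since any of the hash routines discussed in this article could fill that role. verifiedMove and DigestFn are names invented for this example.

```cpp
#include <cstdio>
#include <fstream>
#include <functional>
#include <string>

// Placeholder type: any function that returns the digest of a file as text.
using DigestFn = std::function<std::string(const std::string& path)>;

// Copies source to destination, then deletes the source only if the digests
// of both files match. Returns true when the move was verified and completed.
bool verifiedMove(const std::string& source,
                  const std::string& destination,
                  const DigestFn& digestOf)
{
    {
        std::ifstream in(source, std::ios::binary);
        std::ofstream out(destination, std::ios::binary | std::ios::trunc);
        if (!in || !out)
        {
            return false;
        }

        // Copy in chunks; the last chunk may be shorter than the buffer.
        char buffer[4096];
        while (in.read(buffer, sizeof(buffer)) || in.gcount() > 0)
        {
            out.write(buffer, in.gcount());
        }

        out.flush();
        if (!out)
        {
            return false;
        }
    } // both streams are closed here, before the files are hashed

    if (digestOf(source) != digestOf(destination))
    {
        return false; // keep the original: the copy cannot be trusted
    }

    return std::remove(source.c_str()) == 0;
}
```

The original is removed only as the very last step, so a failed or corrupted copy never costs you the file.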

The same logic applies to your personal files. External hard drives are great for backing up your music, photos, videos, etc., but failures can occur, especially when we're talking about USB-powered devices. So, when in doubt about whether the transfer completed successfully, play it safe and verify the digests.

A Practical Example

A little hands-on: as an example, let's download HxD, a really nice Hex Editor, and verify its digest. We are going to use the freeware #ashing, which has a simple graphical user interface on Windows.

This is what we find on the HxD download page:

After downloading the file, open your Downloads folder and find HxDSetupEN.zip.

Then, open #ashing and drag & drop the file into the program window. Alternatively, browse the file via the graphical interface.

Click on the SHA-1 button to perform the hash operation:

Click 'Verify', then copy the digest found on the HxD website and paste it into the new dialog that appears. Take care not to copy extra spaces after the end of the digest.

If everything went right, you should see a confirmation that the digests match:

Utilities Available

On Linux (and many other Unix-like operating systems), there is a set of programs installed by default that compute cryptographic hashes: md5sum, sha1sum, sha224sum, sha256sum, sha384sum, sha512sum.
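These tools print one line per file, with the hexadecimal digest first and the file name after it, separated by two spaces (or by a space and an asterisk in binary mode). If you want to process such a listing in your own program, a minimal parser could look like the sketch below. ChecksumEntry and parseChecksumLine are hypothetical names, and the line is assumed to have no trailing newline.

```cpp
#include <string>

// One parsed line of a checksum listing.
struct ChecksumEntry
{
    std::string digest;
    std::string fileName;
};

// Parses a line in the form "<hex digest>  <file name>".
// Returns false if the line does not contain both parts.
bool parseChecksumLine(const std::string& line, ChecksumEntry& entry)
{
    const std::string::size_type separator = line.find(' ');
    if (separator == std::string::npos || separator == 0)
    {
        return false;
    }

    // Skip the second separator character: ' ' (text mode) or '*' (binary mode).
    std::string::size_type nameStart = separator + 1;
    if (nameStart < line.size() && (line[nameStart] == ' ' || line[nameStart] == '*'))
    {
        ++nameStart;
    }
    if (nameStart >= line.size())
    {
        return false;
    }

    entry.digest   = line.substr(0, separator);
    entry.fileName = line.substr(nameStart);
    return true;
}
```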

On Windows, the built-in options are command-line only: certutil -hashfile and, in recent PowerShell versions, the Get-FileHash cmdlet. You can also install OpenSSL for terminal-level operation or pick a graphical utility. Besides #ashing, there are many other options for calculating hashes. Gizmo's freeware has a detailed list of pros and cons of each.

On OS X, there is the built-in shasum command available in the terminal. To generate a SHA-512 hash of a file, for example:

shasum -a 512 /path/to/file

Let me know in the comments which utility you like the best, and why.

 

Code snippets (C++)

Using OpenSSL


Pros: open source, cross-platform, full-featured toolkit.

Cons: you must make sure that OpenSSL's crypto library is available on the target computer, or the program won't run.


As stated above, you need to link your program against the crypto library.

Many Linux distros provide easy installation of OpenSSL via package manager (RPM, aptitude, YaST). After installation, simply add "-lcrypto" to your program's makefile.

An installer for Windows can be found here. The crypto library for Windows is named libeay32.dll.

 

Start by including the necessary headers (here is some documentation on SHA functions and MD5) and declaring a function to print the digest on the screen:

#include <stdio.h>
#include <string.h>

#include <openssl/sha.h>
#include <openssl/md5.h>

void printDigest(unsigned char* auchDigest, int iDigestSize)
{
   for (int i = 0; i < iDigestSize; i++ )
   {
      printf("%02X", auchDigest[i]);
   }
   
   printf("\n");
}
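If you would rather keep the digest as text (for example, to compare it against a published value) than print it right away, a variation along these lines may be useful. digestToHex is a hypothetical helper, not part of OpenSSL; it produces the same upper-case format as the "%02X" loop above.

```cpp
#include <cstddef>
#include <string>

// Converts raw digest bytes to an upper-case hexadecimal string,
// two characters per byte.
std::string digestToHex(const unsigned char* digest, std::size_t digestSize)
{
    static const char kHexDigits[] = "0123456789ABCDEF";

    std::string hex;
    hex.reserve(digestSize * 2);
    for (std::size_t i = 0; i < digestSize; ++i)
    {
        hex.push_back(kHexDigits[digest[i] >> 4]);   // high nibble
        hex.push_back(kHexDigits[digest[i] & 0x0F]); // low nibble
    }
    return hex;
}
```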

When you already know the size of the data you're working with, it's quite simple:

void HashFromMemory( unsigned char* auchBufferToHash, int iBufferLen)
{
   unsigned char auchDigest[64] = {0};
   
   MD5(auchBufferToHash, iBufferLen, auchDigest);
   printDigest(auchDigest, 16);
   
   SHA1(auchBufferToHash, iBufferLen, auchDigest);
   printDigest(auchDigest, 20);

   SHA224(auchBufferToHash, iBufferLen, auchDigest);
   printDigest(auchDigest, 28);

   SHA256(auchBufferToHash, iBufferLen, auchDigest);
   printDigest(auchDigest, 32);

   SHA384(auchBufferToHash, iBufferLen, auchDigest);
   printDigest(auchDigest, 48);

   SHA512(auchBufferToHash, iBufferLen, auchDigest);
   printDigest(auchDigest, 64);
}

When you do not know the size, or when the input might be very large, a good approach is to read the data in chunks:

void HashFromFile(const char* szFileToHash)
{
   FILE* fp = NULL;
   fp = fopen( szFileToHash, "rb");
   if ( NULL == fp )
   {
      printf("Error opening file\n");
      return;
   }

   // No need to query the file size up front: the file is hashed chunk by chunk below

   unsigned char btWorkBuffer[1024] = {0}; // read file in 1024-byte chunks
   unsigned char auchDigest[64]     = {0};

   // Hash contexts
   MD5_CTX     ctx_md5;
   SHA_CTX     ctx_sha1;
   SHA256_CTX  ctx_sha224;
   SHA256_CTX  ctx_sha256;
   SHA512_CTX  ctx_sha384;
   SHA512_CTX  ctx_sha512;

   if ( 1 != MD5_Init(&ctx_md5)       || 1 != SHA1_Init(&ctx_sha1)       || 1 != SHA224_Init(&ctx_sha224) ||
        1 != SHA256_Init(&ctx_sha256) || 1 != SHA384_Init(&ctx_sha384)   || 1 != SHA512_Init(&ctx_sha512)  )
   {
      // Unexpected error
      printf("Error initializing at least one context\n");
      fclose(fp);
      return;
   }
   
   // Update the hash
   size_t iRead = 0; // fread returns the number of bytes actually read
   while ( (iRead = fread(btWorkBuffer, 1, sizeof(btWorkBuffer), fp)) > 0 )
   {
      if ( 1 != MD5_Update(&ctx_md5,       btWorkBuffer, iRead) || 1 != SHA1_Update(&ctx_sha1,     btWorkBuffer, iRead) ||
           1 != SHA224_Update(&ctx_sha224, btWorkBuffer, iRead) || 1 != SHA256_Update(&ctx_sha256, btWorkBuffer, iRead) ||
           1 != SHA384_Update(&ctx_sha384, btWorkBuffer, iRead) || 1 != SHA512_Update(&ctx_sha512, btWorkBuffer, iRead) )
      {
         // Unexpected error
         printf("Error updating at least one hash\n");
         fclose(fp);
         return;
      }
   }

   // Finalize the hash
   if ( 1 == MD5_Final(auchDigest, &ctx_md5) )
   {
      printf("MD5:"); printDigest(auchDigest, 16);
   }

   if ( 1 == SHA1_Final(auchDigest, &ctx_sha1) )
   {
      printf("SHA1:"); printDigest(auchDigest, 20);
   }

   if ( 1 == SHA224_Final(auchDigest, &ctx_sha224) )
   {
      printf("SHA224:"); printDigest(auchDigest, 28);
   }

   if ( 1 == SHA256_Final(auchDigest, &ctx_sha256) )
   {
      printf("SHA256:"); printDigest(auchDigest, 32);
   }

   if ( 1 == SHA384_Final(auchDigest, &ctx_sha384) )
   {
      printf("SHA384:"); printDigest(auchDigest, 48);
   }

   if ( 1 == SHA512_Final(auchDigest, &ctx_sha512) )
   {
      printf("SHA512:"); printDigest(auchDigest, 64);
   }

   fclose(fp); // release the file handle once all digests are printed
}

And this is how it should be called:

int main(int argc, char* argv[])
{
   // Hash the contents of a buffer in memory
   unsigned char myBuffer[10] = {0}; // buffer with 10 zero bytes (0x00)
   HashFromMemory(myBuffer, sizeof(myBuffer));

   // Hash the contents of a file
   HashFromFile("data.txt");
   
   return 0;
}

 

There is also a higher-level approach, not only to perform hash operations but for cryptography in general, that uses an input-output abstraction called BIO. Here's an example of SHA-1 calculated that way.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Jerome Vonk
Systems Engineer
Brazil
C/C++ developer for Windows and Linux. Cryptography enthusiast.

Article Copyright 2015 by Jerome Vonk