Click here to Skip to main content
14,598,921 members

Delete Blank Scanned Image File

Rate this:
4.18 (4 votes)
Please Sign up or sign in to vote.
4.18 (4 votes)
26 Oct 2014CPOL
This application will tell you if a scanned image file is blank

Image 1


This application will tell you if a scanned image file (JPG, GIF, PNG or TIF) is blank.

One you detected a blank file, you can:

  1. Add _deleted to the filename of a blank image
  2. Place blank images to to_be_deleted folder
  3. Delete blank images

At first, I thought that this was a simple problem and all I had to do is look at each pixel in the image and see how many of them are white. If the number of white pixels exceeded some specified “Content Density” threshold (like 99%) I would consider the image blank. However, this approach did not work because most of the pixels in a scanned image of a blank page are not white but a mix of various colors.

To address this problem, I identified the most common color. I did this by creating the histogram for each color. The histogram is basically a table that gives you the number of times each color is used (like red-5, green – 7). And I picked the most common color from the table. So if the blank sheet was pink, I will still be able to see that it is blank because most of the pixels were of the most common color: pink.

There still was a problem because two pixels that looked exactly the same had different color numbers. System.Drawing.Color gives you over 16 million colors (255^3). So I converted the color to one of 216 common colors I found here. This gave me the ability to group color that looked more or less similar. The color conversion function looked for the most similar common color by calculating the color distance between the color and a common color. The color distance is the sum of absolute values of the differences between reds, greens and blues.

Yet, again, I ran into a problem because white pixels did not look white once I zoomed in. The white “pixels” was a cluster of pixels made up of pixels of various colors. Each one was not white but the cluster itself looked white. To deal with this problem, I expended the definition of pixels to a box 5x5 of 25 pixels.

Another problem I ran into had to do with margins. A scanned white sheet of paper can have a blank margin around. This can make the program think that it was looking at a non-white sheet of paper. To address this problem, I set to ignore 200 pixels margin.

At this point, program runs satisfactorily but very slow. I solved by saving the file in grayscale and reducing the file size to 200 px.

Using the Code

Here is the code:

Imports System.IO
Imports System.Drawing.Imaging

Public Class frmMain

    Dim iPixelSize As Integer = 3

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim oAppRegistry As New AppSetting

        cbFileType.SelectedIndex = oAppRegistry.GetValueDef("FileType", "0")
        cbAction.SelectedIndex = oAppRegistry.GetValueDef("Action", "0")

        For i As Integer = 100 To 1 Step -1
            Dim sVal As String = i & "%"

            If i = 1 Or i = 25 Or i = 50 Or i = 75 Then
                sVal = i & "% or more of all pixels have to be white for the file to be deleted"

            ElseIf i = 100 Then
                sVal = "100% of all pixels have to be white for the file to be deleted"
            End If


        cbDensity.SelectedIndex = oAppRegistry.GetValueDef("Density", "1")
        txtIgnoreMargin.Text = oAppRegistry.GetValueDef("IgnoreMargin", "200")
        txtPixelSize.Text = oAppRegistry.GetValueDef("PixelSize", "5")
        txtFrom.Text = oAppRegistry.GetValue("From")
        chkSubfolders.Checked = oAppRegistry.GetValue("Subfolders") = "1"
        chkResize.Checked = oAppRegistry.GetValue("Resize") = "1"
        chkShowHistogram.Checked = oAppRegistry.GetValue("Histogram") = "1"

    End Sub

    Private Sub frmMain_FormClosing(ByVal sender As Object, ByVal e As System.Windows.Forms.FormClosingEventArgs) Handles Me.FormClosing
        Dim oAppRegistry As New AppSetting

        oAppRegistry.SetValue("From", txtFrom.Text)
        oAppRegistry.SetValue("FileType", cbFileType.SelectedIndex)
        oAppRegistry.SetValue("Action", cbAction.SelectedIndex)
        oAppRegistry.SetValue("Density", cbDensity.SelectedIndex)
        oAppRegistry.SetValue("IgnoreMargin", txtIgnoreMargin.Text)
        oAppRegistry.SetValue("PixelSize", txtPixelSize.Text) 'Pixel size
        oAppRegistry.SetValue("Subfolders", IIf(chkSubfolders.Checked, 1, 0))
        oAppRegistry.SetValue("Resize", IIf(chkResize.Checked, 1, 0))
        oAppRegistry.SetValue("Histogram", IIf(chkShowHistogram.Checked, 1, 0))
    End Sub

    Function GetRequiredDensity() As Integer
        Dim sDensity As String = Split(cbDensity.SelectedItem, "%")(0)
        Return CInt(sDensity)
    End Function

    Private Sub btnProcess_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnProcess.Click

        Dim sFromPath As String = txtFrom.Text
        If Not System.IO.Directory.Exists(sFromPath) Then
            MsgBox("Folder does not exist")
            Exit Sub
        End If

        iPixelSize = txtPixelSize.Text

        sHistogramHtml = ""
        btnProcess.Enabled = False
        txtOutput.Text = ""
        txtOutput.Text += "Starting..." & vbCrLf
        txtOutput.Text += "Done!"
        btnProcess.Enabled = True

        If sHistogramHtml <> "" Then
            Dim sAssPath As String = System.Reflection.Assembly.GetExecutingAssembly().Location
            Dim sPath As String = System.IO.Path.GetDirectoryName(sAssPath)
            Dim sFilePath As String = System.IO.Path.Combine(sPath, "Histogram.htm")
            If IO.File.Exists(sFilePath) Then
            End If

            Dim oStreamWriter As New System.IO.StreamWriter(sFilePath, True)
        End If

    End Sub

    Sub ProccessFolder(ByVal sFolderPath As String)

        Dim sFileType As String = cbFileType.SelectedItem
        Dim iRequiredDensity As Integer = GetRequiredDensity()
        Dim sAction As String = cbAction.SelectedItem
        Dim iMargin As Integer = 0

            iMargin = IIf(IsNumeric(txtIgnoreMargin.Text), txtIgnoreMargin.Text, 0)
        Catch ex As Exception
            'Ignore bad input
        End Try

        Dim oFiles As String() = Directory.GetFiles(sFolderPath)
        ProgressBar1.Maximum = oFiles.Length

        For i As Integer = 0 To oFiles.Length - 1
            Dim sFromFilePath As String = oFiles(i)
            Dim oFileInfo As New FileInfo(sFromFilePath)
            Dim sExt As String = PadExt(oFileInfo.Extension)
            If sFileType = "All" OrElse sExt = sFileType Then

                Dim dTime As DateTime = DateTime.Now

                Dim iPercent As Integer = GetWhitePecent(sFromFilePath, iMargin)
                Dim iSecondsToReadFile As Integer = (DateTime.Now).Subtract(dTime).TotalSeconds()

                txtOutput.Text += sFromFilePath & vbTab & " has " & iPercent.ToString("0.00") & "% of white pixels. File was read in " & iSecondsToReadFile & " seconds." & vbCrLf

                If iPercent >= iRequiredDensity Then
                    'Found white page

                    Select Case sAction
                        Case "Delete blank images"

                        Case "Add _deleted to the filename of a blank image"
                            Dim sNewFileName As String = IO.Path.GetFileNameWithoutExtension(sFromFilePath) & "_deleted." & sExt
                            Dim sNewFilePath As String = sFolderPath & "\" & sNewFileName

                            If IO.File.Exists(sNewFilePath) Then
                                txtOutput.Text += "File " & oFileInfo.Name & " cannot be renamed to " & sNewFileName & " because file with this name already exists." & vbCrLf
                            End If

                        Case "Place blank images to to_be_deleted folder"
                            Dim sTempFolderPath As String = sFolderPath & "\to_be_deleted"

                            If IO.Directory.Exists(sTempFolderPath) = False Then
                            End If

                            If IO.File.Exists(sTempFolderPath & "\" & oFileInfo.Name) Then
                                txtOutput.Text += "File " & oFileInfo.Name & " cannot be moved to " & sTempFolderPath & " because file with this name already exists there." & vbCrLf
                                oFileInfo.MoveTo(sTempFolderPath & "\" & oFileInfo.Name)
                            End If
                    End Select

                End If
                'Catch ex As Exception
                '    txtOutput.Text += sFromFilePath & vbTab & ex.Message & vbCrLf
                'End Try
            End If

            ProgressBar1.Value = i

        ProgressBar1.Value = 0

        If chkSubfolders.Checked Then
            Dim oFolders As String() = Directory.GetDirectories(sFolderPath)
            For i As Integer = 0 To oFolders.Length - 1
                Dim sChildFolder As String = oFolders(i)
                Dim iPos As Integer = sChildFolder.LastIndexOf("\")
                Dim sFolderName As String = sChildFolder.Substring(iPos + 1)
                If sFolderName <> "to_be_deleted" Then
                End If
        End If

    End Sub

    Private Function PadExt(ByVal s As String) As String
        s = UCase(s)
        If s.Length > 3 Then
            s = s.Substring(1, 3)
        End If
        Return s
    End Function

    Dim oKnownColorLookup As New Hashtable

    Private Function GetColor2(ByVal oColor As Color) As String

        If oKnownColorLookup.ContainsKey(oColor) Then
            'Get from cache
            Return oKnownColorLookup(oColor)
        End If

        Dim iColorDistance As Long = 10000000
        Dim oColors As ArrayList = GetWebSafeColors()

        For Each sColor As String In oColors
            Dim iDistance As Integer = CompareColors(ColorTranslator.FromHtml("#" & sColor), oColor)
            If iDistance < iColorDistance Then iColorDistance = iDistance

        For Each sColor As String In oColors
            Dim iDistance As Integer = CompareColors(ColorTranslator.FromHtml("#" & sColor), oColor)
            If iDistance = iColorDistance Then
                oKnownColorLookup.Add(oColor, sColor)
                Return sColor
            End If

        Return ""
    End Function

    Public Shared Function CompareColors(ByVal a As Color, ByVal b As Color) As Integer
        Return Math.Abs(CInt(a.R) - CInt(b.R)) + Math.Abs(CInt(a.G) - CInt(b.G)) + Math.Abs(CInt(a.B) - CInt(b.B))
    End Function

    Dim sHistogramHtml As String = ""

    Private Function GetMostCommonColor(ByRef oBitmap As Bitmap, ByVal sFilePath As String) As String

        Dim oColorHistogram As New Hashtable
        Dim iTotalCount As Integer = 0

        For x As Integer = 0 To oBitmap.Width - 1 - iPixelSize Step iPixelSize
            For y As Integer = 0 To oBitmap.Height - 1 - iPixelSize Step iPixelSize
                Dim oColor As Color = GetColorPixel(oBitmap, x, y)
                Dim sColor As String = GetColor2(oColor)

                If oColorHistogram.ContainsKey(sColor) Then
                    oColorHistogram(sColor) = CInt(oColorHistogram(sColor)) + 1
                    oColorHistogram.Add(sColor, 1)
                End If

                iTotalCount += 1

        Dim oDataView As DataView = SortHashtable(oColorHistogram)

        If chkShowHistogram.Checked Then
            sHistogramHtml += "<h2>Histogram for " & sFilePath & "</h2>" & vbCrLf
            sHistogramHtml += "<table border=1>" & vbCrLf
            For iRow As Long = 0 To oDataView.Count - 1
                Dim sKey As String = oDataView(iRow)("key")
                Dim sValue As String = oDataView(iRow)("value")
                Dim iPecent As Double = (CDbl(sValue) / iTotalCount) * 100

                If iPecent > 1 Then
                    sHistogramHtml += vbTab & "<tr><td bgcolor='" & sKey & "'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td><td>" & sKey & "</td><td>"
                    sHistogramHtml += Math.Round(iPecent, 2) & "%</td></tr>" & vbCrLf
                End If
            sHistogramHtml += "</table>" & vbCrLf

        End If

        If oColorHistogram.Count > 0 Then
            Dim sKey As String = oDataView(0)("key")
            Dim sValue As String = oDataView(0)("value")
            Return sKey
        End If

        Return ""
    End Function

    Public Function MakeGrayscale(ByRef original As Bitmap) As Bitmap

        Dim destWidth As Integer = original.Width
        Dim destHeight As Integer = original.Height

        If chkResize.Checked Then
            Dim Width As Integer = 200
            Dim Height As Integer = 200

            Dim sourceWidth As Integer = destWidth
            Dim sourceHeight As Integer = destHeight

            Dim nPercent As Single = 0
            Dim nPercentW As Single = 0
            Dim nPercentH As Single = 0

            nPercentW = (CSng(Width) / CSng(sourceWidth))
            nPercentH = (CSng(Height) / CSng(sourceHeight))

            If nPercentH < nPercentW Then
                nPercent = nPercentH
                nPercent = nPercentW
            End If

            destWidth = CInt((sourceWidth * nPercent))
            destHeight = CInt((sourceHeight * nPercent))
        End If

        Dim newBitmap As Bitmap = New Bitmap(destWidth, destHeight)
        Dim g As Graphics = Graphics.FromImage(newBitmap)

        Dim colorMatrix As ColorMatrix = New ColorMatrix(New Single()() {New Single() {0.3F, 0.3F, 0.3F, 0, 0}, New Single() {0.59F, 0.59F, 0.59F, 0, 0}, New Single() {0.11F, 0.11F, 0.11F, 0, 0}, New Single() {0, 0, 0, 1, 0}, New Single() {0, 0, 0, 0, 1}})
        Dim attributes As ImageAttributes = New ImageAttributes()
        g.DrawImage(original, New Rectangle(0, 0, destWidth, destHeight), 0, 0, original.Width, original.Height, GraphicsUnit.Pixel, attributes)

        Return newBitmap
    End Function

    Private Function GetWhitePecent(ByVal sFilePath As String, ByVal iMargin As Integer) As Integer

        Dim oBitmap As Bitmap = New Bitmap(sFilePath)
        oBitmap = MakeGrayscale(oBitmap)
        'oBitmap.Save(sFilePath & "_gray.bmp", ImageFormat.Bmp)

        Dim iTotalCount As Long = 0
        Dim iCount As Long = 0
        Dim sMostCommonColor As String = GetMostCommonColor(oBitmap, sFilePath)

        Dim iWidth As Integer = oBitmap.Width
        Dim iHeight As Integer = oBitmap.Height

        For x As Integer = iMargin To iWidth - 1 - iPixelSize - iMargin Step iPixelSize
            For y As Integer = iMargin To iHeight - 1 - iPixelSize - iMargin Step iPixelSize
                Dim oColor As Color = GetColorPixel(oBitmap, x, y)
                If GetColor2(oColor) = sMostCommonColor Then
                    iCount += 1
                End If
                iTotalCount += 1

        If iTotalCount = 0 Then
            MsgBox("Magin is too large " & txtIgnoreMargin.Text)
            txtIgnoreMargin.Text = "0"
            Return 0
        End If

        Return (CDbl(iCount) / iTotalCount) * 100
    End Function

    Private Function GetColorPixel(ByRef oBitmap As Bitmap, ByVal x As Integer, ByVal y As Integer) As Color
        If iPixelSize = 1 Then Return oBitmap.GetPixel(x, y)

        Dim iA As Integer = 0
        Dim iR As Integer = 0
        Dim iG As Integer = 0
        Dim iB As Integer = 0
        Dim iCount As Integer = 0

        For iX As Integer = 0 To iPixelSize
            For iY As Integer = 0 To iPixelSize
                Dim oColor As Color = oBitmap.GetPixel(x + iX, y + iY)

                iA += CInt(oColor.A)
                iR += CInt(oColor.R)
                iG += CInt(oColor.G)
                iB += CInt(oColor.B)

                iCount += 1

        Return Color.FromArgb((iA / iCount), (iR / iCount), (iG / iCount), (iB / iCount))
    End Function

    Private Function SortHashtable(ByVal oHash As Hashtable) As DataView
        Dim oTable As New Data.DataTable
        oTable.Columns.Add(New Data.DataColumn("key", System.Type.GetType("System.String")))
        oTable.Columns.Add(New Data.DataColumn("value", System.Type.GetType("System.Int32")))

        For Each oEntry As Collections.DictionaryEntry In oHash
            Dim oDataRow As DataRow = oTable.NewRow()
            oDataRow("key") = oEntry.Key
            oDataRow("value") = oEntry.Value

        Dim oDataView As DataView = New DataView(oTable)
        oDataView.Sort = "value DESC"

        Return oDataView
    End Function

    Private Sub btnFromFolder_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnFromFolder.Click
        txtFrom.Text = fldFrom.SelectedPath
    End Sub

    Private Function GetWebSafeColors() As ArrayList
        Dim o As New ArrayList





































        Return o
    End Function

    Private Sub chkResize_CheckedChanged(sender As Object, e As EventArgs) Handles chkResize.CheckedChanged
    End Sub

    Private Sub chkGrayscale_CheckedChanged(sender As Object, e As EventArgs)
    End Sub

    Private Sub HideSampleMargin()
        If chkResize.Checked = False = False Then
            txtIgnoreMargin.Text = "200"
            txtIgnoreMargin.Text = "3"
        End If
    End Sub

End Class

Public Class AppSetting

    Private oDS As New Data.DataSet
    Private oTable As New Data.DataTable
    Private sFilePath As String = ""

    Public Sub New()
        Dim sAssPath As String = System.Reflection.Assembly.GetExecutingAssembly().Location
        Dim sPath As String = System.IO.Path.GetDirectoryName(sAssPath)
        sFilePath = System.IO.Path.Combine(sPath, "Settings.xml")

    End Sub

    Private Sub LoadData()
        oDS = New Data.DataSet()

        If System.IO.File.Exists(sFilePath) Then
            If oDS.Tables.Count > 0 Then
                oTable = oDS.Tables(0)
                Exit Sub
            End If
        End If

        'setup New
        oTable = New Data.DataTable()
        oTable.Columns.Add(New Data.DataColumn("key"))
        oTable.Columns.Add(New Data.DataColumn("value"))
    End Sub

    Public Sub SaveData()
        'If System.IO.File.Exists(sFilePath) Then
        '    System.IO.File.Delete(sFilePath)
        'End If

        oTable.DataSet.WriteXml(sFilePath, XmlWriteMode.WriteSchema)
    End Sub

    Public Sub SetValue(ByVal sKey As String, ByVal sValue As String)

        Dim oDataRow As DataRow
        Dim oDataRows As DataRow() = oTable.Select("key = '" & Replace(sKey, "'", "''") & "'")
        If oDataRows.Length > 0 Then
            oDataRows(0)("value") = sValue
            oDataRow = oTable.NewRow()
            oDataRow("key") = sKey
            oDataRow("value") = sValue
        End If

    End Sub

    Public Function GetValue(ByVal sKey As String) As String

        Dim oDataRows As DataRow() = oTable.Select("key = '" & Replace(sKey, "'", "''") & "'")
        If oDataRows.Length > 0 Then
            Return oDataRows(0)("value") & ""
        End If

        Return ""
    End Function

    Public Function GetValueDef(ByVal sKey As String, ByVal sDefVal As String) As String
        Dim sValue As String = GetValue(sKey)
        If sValue <> "" Then
            Return sValue
        End If

        Return sDefVal
    End Function
End Class



  • 27th October, 2014: Initial version


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Igor Krupitsky
Web Developer
United States United States
Igor is a business intelligence consultant working in Tampa, Florida. He has a BS in Finance from University of South Carolina and Masters in Information Management System from University of South Florida. He also has following professional certifications: MCSD, MCDBA, MCAD.

Comments and Discussions

QuestionTime Delay Pin
EBalaji1-Mar-15 22:37
MemberEBalaji1-Mar-15 22:37 
AnswerRe: Time Delay Pin
Igor Krupitsky4-Mar-15 1:55
mvaIgor Krupitsky4-Mar-15 1:55 
I plan to improve this by residing the image to a smaller image instead of doing sampling.
QuestionFormat? Pin
Nelek8-Nov-14 13:42
protectorNelek8-Nov-14 13:42 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Posted 26 Oct 2014


10 bookmarked