Abstract
This article presents a simple class for comparing the raw speed of two alternative code designs. It is not a scientific benchmarking exercise; it is a quick way of running two coded processes to gauge which is probably faster, and by approximately how much. The class allows the comparison with a minimum of infrastructure coding. It is not rocket science.

Introduction
While writing some imaging filter methods, I repeatedly found myself having to optimise my code for raw speed. Although my approach was fundamentally quite efficient, the need to regularly process more than six million pixels made speed paramount, sometimes at the expense of transparency and good programming practice. Simply converting a 24-bit image to 8-bit indexed grayscale and then finding edges meant 20 million pixel manipulations, so one microsecond gained per pixel saved 20 seconds of processing time.

Routine practices soon became critical decisions. For example: do I protect a class member/field (like a pixel array) and access it via a read-write property, or do I make it public and read and write the data directly? Until I knew the time penalty for access via a property, I had no more than a strong suspicion that direct access would be slightly faster, and even less idea of how much faster it might be. Eventually I had such a backlog of comparisons to make that I wrote a simple abstract class that let me get through them as quickly as possible, with results sufficiently reliable for choosing between the options.

Some Surprising Results
Well, they were to me anyway... I was not interested in a rigorous scientific comparison that would withstand serious academic scrutiny; I was merely interested in making design choices that would shorten the wait for an imaging filter to execute. Some of the differences in the tests I ran were more pronounced than I expected, and some were quite surprising.

Example - property versus direct member access:
Direct, unprotected read access to class members is about five times as efficient as the same access via a property. I did not test write access; the test outcome was sufficient evidence for me to make several class members public.
Later I discovered, to my surprise, that if the "Release" version of the executable is run the outcome is indeterminate: one pass gives property access the advantage, another gives it to direct access. My conclusion is that either will do, and my assumption is that the optimiser builds similar code for both in the release binary. I have included two executables in the download,

Example - recursion versus stack-and-loop:
Another example: a simple recursive method surprisingly seemed ten times as efficient as pushing a value on a dotNet

Class SpeedTestsAB
The speed test runs a rudimentary control method (SpeedTestControl), which is simply an empty method, to provide an overhead metric. This overhead is subtracted from the total time of each test to give a net running time. The abstract class SpeedTestsAB defines four abstract/MustOverride methods:
In addition there are methods and properties that enable simple reporting of the results.
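The description above can be sketched as a small base class. This is a minimal illustration only, not the class shipped in the download: the names SpeedTestsABSketch, TimeLoop and AddVersusMultiply, the Stopwatch-based timing and the private field names are all assumptions, and SetUpTestA is inferred from the SetUpTestB override shown in this article.

```vbnet
Imports System
Imports System.Diagnostics

' Hypothetical sketch of an A/B timing base class (not the downloadable one).
Public MustInherit Class SpeedTestsABSketch
    Protected f_DescribeA As String = "Test A"
    Protected f_DescribeB As String = "Test B"
    Private ReadOnly f_Repetitions As Integer
    Private f_NetTicksA As Long
    Private f_NetTicksB As Long

    Protected Sub New(ByVal repetitions As Integer)
        Me.f_Repetitions = repetitions
    End Sub

    ' The four methods a derived class must supply.
    Protected MustOverride Sub SetUpTestA()
    Protected MustOverride Sub SetUpTestB()
    Protected MustOverride Sub SpeedTestA()
    Protected MustOverride Sub SpeedTestB()

    ' Empty control method: timing it measures loop and call overhead only.
    Protected Sub SpeedTestControl()
    End Sub

    ' Assumed helper: runs one method f_Repetitions times, returns elapsed ticks.
    Private Function TimeLoop(ByVal test As Action) As Long
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To Me.f_Repetitions
            test.Invoke()
        Next
        watch.Stop()
        Return watch.ElapsedTicks
    End Function

    Public Sub RunTest()
        Me.SetUpTestA()
        Me.SetUpTestB()
        Dim overhead As Long = Me.TimeLoop(AddressOf Me.SpeedTestControl)
        ' Net time = total time minus the overhead of the empty control.
        Me.f_NetTicksA = Me.TimeLoop(AddressOf Me.SpeedTestA) - overhead
        Me.f_NetTicksB = Me.TimeLoop(AddressOf Me.SpeedTestB) - overhead
    End Sub

    Public Sub ShowResults()
        Console.WriteLine("TEST RESULTS: {0:N0} repetitions.", Me.f_Repetitions)
        Console.WriteLine("A ({0}): {1} net ticks", Me.f_DescribeA, Me.f_NetTicksA)
        Console.WriteLine("B ({0}): {1} net ticks", Me.f_DescribeB, Me.f_NetTicksB)
    End Sub
End Class

' Hypothetical derived class so the sketch runs end to end.
Public Class AddVersusMultiply
    Inherits SpeedTestsABSketch
    Private f_Value As Integer = 21

    Public Sub New(ByVal repetitions As Integer)
        MyBase.New(repetitions)
    End Sub

    Protected Overrides Sub SetUpTestA()
        Me.f_DescribeA = "Double a value by addition."
    End Sub
    Protected Overrides Sub SetUpTestB()
        Me.f_DescribeB = "Double a value by multiplication."
    End Sub
    Protected Overrides Sub SpeedTestA()
        Dim xInt As Integer = Me.f_Value + Me.f_Value
    End Sub
    Protected Overrides Sub SpeedTestB()
        Dim xInt As Integer = Me.f_Value * 2
    End Sub
End Class

Module Program
    Sub Main()
        Dim xTest As New AddVersusMultiply(1000000)
        xTest.RunTest()
        xTest.ShowResults()
    End Sub
End Module
```

Calling each test through a delegate adds its own cost, but the control method is called the same way, so that cost is largely removed when the overhead is subtracted. For very short tests the net figure can still come out near zero or even negative, which is one reason to treat such results as indicative only.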
Results output looks like:
TEST RESULTS: 10,000,000 repetitions.

Using SpeedTestsAB
Create a class that inherits from the base class:
Public Class clsSpeedTestAB_Properties
    Inherits clsBaseSpeedTestAB

Create a code section that defines the properties, methods and data required to run the tests, e.g.:
#Region "[=== SPEED TEST COMPONENTS ===]"
    Protected f_SomeInteger As Int32 = 123456
    Public Property SomeInteger() As Int32
        Get
            Return Me.f_SomeInteger
        End Get
        Set(ByVal value As Int32)
            Me.f_SomeInteger = value
        End Set
    End Property
#End Region

Override the setup methods:
    ''' <summary>
    ''' Overridden speed test B setup.
    ''' </summary>
    ''' <remarks></remarks>
    Protected Overrides Sub SetUpTestB()
        Me.f_DescribeB = "Access a field via a property."
    End Sub

Override the test methods:
    ''' <summary>
    ''' Overridden Speed Test A.
    ''' </summary>
    ''' <remarks></remarks>
    Protected Overrides Sub SpeedTestA()
        Dim xInt As Int32
        xInt = Me.f_SomeInteger
    End Sub
    ''' <summary>
    ''' Overridden Speed Test B.
    ''' </summary>
    ''' <remarks></remarks>
    Protected Overrides Sub SpeedTestB()
        Dim xInt As Int32
        xInt = Me.SomeInteger
    End Sub

Create an instance of the class somewhere and run the test, e.g.:
    Private Sub Button1_Click(ByVal sender As System.Object _
            , ByVal e As System.EventArgs) Handles Button3.Click
        Dim xTest As New clsSpeedTestAB_Properties(10000000)
        xTest.RunTest()
        xTest.ShowResults()
    End Sub

The Downloads
The downloads are in four packs:

Points to Note
This is not an example of proper scientific speed benchmarking; it is only a mechanism for choosing which code design to use when raw speed is of primary importance.

There is often quite a difference between the speed of code compiled in debug mode and the same code compiled in release mode. Assuming that the final version of your software will be compiled in release mode, I recommend speed testing in that mode. It is, however, of interest to compare execution speed in the two modes.

Sample Test Results
(In debug mode.) These results are from an earlier incarnation of the class and were actively used in making image coding design decisions. The differences may disappear when tested with a release-compiled version.

Speed Test: If, ElseIf compared with Select Case.
Public Overrides Sub SpeedTestA()
    If Me.f_Int32 = 0 Then
        Me.f_Int32 = 1
    ElseIf Me.f_Int32 = 1 Then
        Me.f_Int32 = 1
    ElseIf Me.f_Int32 = 2 Then
        Me.f_Int32 = 2
    ElseIf Me.f_Int32 = 3 Then
        Me.f_Int32 = 3
    ElseIf Me.f_Int32 = 4 Then
        Me.f_Int32 = 4
    ElseIf Me.f_Int32 = 5 Then
        Me.f_Int32 = 5
    ElseIf Me.f_Int32 = 6 Then
        Me.f_Int32 = 6
    ElseIf Me.f_Int32 = 7 Then
        Me.f_Int32 = 7
    ElseIf Me.f_Int32 = 8 Then
        Me.f_Int32 = 8
    ElseIf Me.f_Int32 = 9 Then
        Me.f_Int32 = 9
    End If
End Sub
Public Overrides Sub SpeedTestB()
    Select Case Me.f_Int32
        Case 0
            Me.f_Int32 = 0
        Case 1
            Me.f_Int32 = 1
        Case 2
            Me.f_Int32 = 2
        Case 3
            Me.f_Int32 = 3
        Case 4
            Me.f_Int32 = 4
        Case 5
            Me.f_Int32 = 5
        Case 6
            Me.f_Int32 = 6
        Case 7
            Me.f_Int32 = 7
        Case 8
            Me.f_Int32 = 8
        Case 9
            Me.f_Int32 = 9
    End Select
End Sub
Protected f_Int32 As Int32
TEST RESULTS: 600,000,000 repetitions.
CONCLUSION: Select Case may be approximately 40% faster than If, ElseIf.

Speed Test: If, ElseIf compared with nested IIF.
Public Overrides Sub SpeedTestA()
    If Me.f_Int32 = 0 Then
        Me.f_Int32 = 1
    ElseIf Me.f_Int32 = 1 Then
        Me.f_Int32 = 1
    ElseIf Me.f_Int32 = 2 Then
        Me.f_Int32 = 2
    ElseIf Me.f_Int32 = 3 Then
        Me.f_Int32 = 3
    ElseIf Me.f_Int32 = 4 Then
        Me.f_Int32 = 4
    ElseIf Me.f_Int32 = 5 Then
        Me.f_Int32 = 5
    ElseIf Me.f_Int32 = 6 Then
        Me.f_Int32 = 6
    ElseIf Me.f_Int32 = 7 Then
        Me.f_Int32 = 7
    ElseIf Me.f_Int32 = 8 Then
        Me.f_Int32 = 8
    ElseIf Me.f_Int32 = 9 Then
        Me.f_Int32 = 9
    End If
End Sub
Public Overrides Sub SpeedTestB()
    IIf(Me.f_Int32 = 0, Me.f_Int32 = 0 _
    , IIf(Me.f_Int32 = 1, Me.f_Int32 = 1 _
    , IIf(Me.f_Int32 = 2, Me.f_Int32 = 2 _
    , IIf(Me.f_Int32 = 3, Me.f_Int32 = 3 _
    , IIf(Me.f_Int32 = 4, Me.f_Int32 = 4 _
    , IIf(Me.f_Int32 = 5, Me.f_Int32 = 5 _
    , IIf(Me.f_Int32 = 6, Me.f_Int32 = 6 _
    , IIf(Me.f_Int32 = 7, Me.f_Int32 = 7 _
    , IIf(Me.f_Int32 = 8, Me.f_Int32 = 8 _
    , IIf(Me.f_Int32 = 9, Me.f_Int32 = 9 _
    , Me.f_Int32 = 9))))))))))
End Sub
Protected f_Int32 As Int32 = -1
TEST RESULTS: 100,000,000 repetitions.
Do not use IIF.

Speed Test: Compare Shift-Right 4 (X >> 4) with Divide by 16.
Public Overrides Sub SpeedTestA()
    Dim xInt As Int32 = CInt((((((((Me.f_Int >> 4) >> 4) >> 4) >> 4) >> 4) >> 4) >> 4) >> 4)
End Sub
Public Overrides Sub SpeedTestB()
    Dim xInt As Int32 = CInt(Me.f_Int / 16 / 16 / 16 / 16 / 16 / 16 / 16 / 16)
End Sub
Protected f_Int As Int32 = 123456
TEST RESULTS: 100,000,000 repetitions.
Shifting is close to 200 times as fast as dividing.

History
2008-05-26 Created.
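
A note on the shift versus divide test above: in VB.NET the / operator performs floating-point division even on Int32 operands, and CInt then rounds the Double result back to an integer. Integer division is written with the \ operator. That conversion work may account for part of the gap measured, although this was not timed here. The sketch below uses hypothetical module and function names and only shows that the three forms are different operations which can give different answers.

```vbnet
Imports System

' Hypothetical helper module; names are illustrative, not from the download.
Module DivideVersusShiftSketch

    ' "/" widens to Double; CInt rounds halves to the nearest even integer.
    Function ByFloatDivide(ByVal value As Int32) As Int32
        Return CInt(value / 16)
    End Function

    ' "\" is integer division and truncates toward zero.
    Function ByIntegerDivide(ByVal value As Int32) As Int32
        Return value \ 16
    End Function

    ' ">>" on Int32 is an arithmetic shift and rounds toward negative infinity.
    Function ByShift(ByVal value As Int32) As Int32
        Return value >> 4
    End Function

    Sub Main()
        For Each v As Int32 In New Int32() {123456, 123480, -17}
            Console.WriteLine("{0}: / gives {1}, \ gives {2}, >> gives {3}", _
                v, ByFloatDivide(v), ByIntegerDivide(v), ByShift(v))
        Next
        ' 123456: / gives 7716, \ gives 7716, >> gives 7716
        ' 123480: / gives 7718, \ gives 7717, >> gives 7717
        ' -17: / gives -1, \ gives -1, >> gives -2
    End Sub
End Module
```

A closer like-for-like comparison with the shift would therefore use \ 16 in the test method, bearing in mind that the two still differ for negative values.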