Click here to Skip to main content
Licence Zlib
First Posted 29 Nov 2005
Views 346,308
Bookmarked 78 times

Generating Fractals with SSE/SSE2

By | 29 Nov 2005 | Article
An article on generating Mandelbrot and Julia sets using Intel's Streaming SIMD Extensions (SSE, SSE2).
 

License

This article, along with any associated source code and files, is licensed under The zlib/libpng License

About the Author

Peter Kankowski

Software Developer

Russian Federation Russian Federation

Member

Peter lives in Siberia, the land of sleeping sun, beautiful mountains, and infinitely deep snow. He recently started a wiki about algorithms and code optimization, where people could share their ideas, learn, and teach others.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board. (secure sign-in)
 
Search this forum  
 FAQ
    Noise  Layout  Per page   
  Refresh
Generalsource contains a virus PinmemberCem Usta21:57 3 May '09  
GeneralRe: source contains a virus Pinmembersergeykkk15:29 25 Aug '11  
GeneralRe: source contains a virus PinmemberPeter Kankowski16:31 25 Aug '11  
GeneralRe: source contains a virus Pinmembersergeykkk18:28 25 Aug '11  
GeneralFractal Dimension Indicator Formula Code PinmemberYIYI2223:50 30 Jul '08  
Generaltypo PinmemberSavageJ20:11 3 Jun '06  
GeneralSimplified Program PinmemberPeter Kankowski18:30 26 May '06  
GeneralIterations greater than 64 [modified] Pinmemberzenzero11:46 26 May '06  
GeneralRe: Iterations greater than 64 Pinmemberzenzero13:22 26 May '06  
GeneralRe: Iterations greater than 64 PinmemberPeter Kankowski18:23 26 May '06  
GeneralRe: Iterations greater than 64 Pinmemberzenzero1:08 27 May '06  
GeneralCould not compile on VS 2005 Pinmemberzenzero7:54 26 May '06  
GeneralRe: Could not compile on VS 2005 Pinmemberzenzero10:20 26 May '06  
GeneralBenchmark PinmemberxKuemmelx12:42 8 May '06  
GeneralRe: Benchmark PinmemberPeter Kankowski14:45 9 May '06  
GeneralRe: Benchmark PinmemberxKuemmelx21:54 10 May '06  
GeneralRe: Benchmark PinmemberxKuemmelx11:34 15 May '06  
GeneralAnother approach, removing branches PinmemberArne Thormodsen12:24 10 Apr '06  
Interesting article, thanks.
 
I've played with this problem a bit and came up with the approach below.   The main thing I concentrated on was removing all branching from the code.   By using "cmove" and calculating the iterations in an XMM register it can all be done with only once branch decision, the main iteration loop that checks on the counters.
 
The only penalty is that I used up all the XMM registers and so need to keep one contant ("4.0") im memory.   It seems that there is little or no penalty for referencing constant source operands from memory in SSE instructions once they are in L0 cache.   Does this match anyone elses experience?
 
Any comments are appreciated, I'm still learning tricks for optimising on the Pentium.
 
--arne
 
Hope it's OK to post in "ATT" format, I use GCC.
 
####
# Mandelbrot inner loop
# Calculates two points at once with SSE2
# calling from C (GCC) as "int *counters=iterPoint(iterLim,xmmBuffer);"
# Free for any use as long as this comment is included
# Arne Thormodsen - April 2006
####
 
.data
.p2align 6,0 #align at 2^6 for SSE2 and cache line
COUNTER: .double 0.0,0.0   # return location for counters, actually used for 64 bit ints, not doubles
FOUR: .double 4.0,4.0
 

.globl _iterPoint
     .def     _iterPoint;     .scl     2;     .type     32;     .endef
  
.text
_iterPoint:
   pushl %ebx # used to hold sse2 result compare
   pushl %ecx # holds the constant 1
   movl 16(%esp), %eax # pointer to unaligned buffer passed from C that contains
                                 # Cim and Cre for two adjacent points
   movupd   (%eax),%xmm2 # CRe
   movupd   0x10(%eax),%xmm3 # CIm
   #
   movl     12(%esp), %eax # get iteration limit
   movl $1, %ecx # constant 1, used a couple of places
   # initialize XMM registers
   pxor %xmm7, %xmm7 # counter for iterations = 0
   movapd %xmm2, %xmm0 # =Cre
   movapd %xmm3, %xmm1 # =Cim
   call MAIN_LOOP
   movapd %xmm7, COUNTER # save iteration counters
   movl $COUNTER,%eax # return pointer to counters
   popl %ecx
   popl %ebx
   ret
 
MAIN_LOOP:
  
   ### A^2 -> A, MAG^2(A)->M^2 ###
   movapd %xmm0, %xmm4 # t1=re
   movapd %xmm1, %xmm5 # t2=im
   mulpd %xmm0, %xmm0 # re(partial)=re*re
   mulpd %xmm1, %xmm5 # t2=im*im
   movapd %xmm0, %xmm6 # M^2(partial)=re*re
   addpd %xmm1, %xmm1 # im=2*im
   addpd %xmm5, %xmm6 # M^2=(re*re)+(im*im)
   mulpd %xmm4, %xmm1 # im=2*im*re
   subpd %xmm5, %xmm0 # re=(re*re)-(im*im)
   ###
  
   ###
   cmpltpd FOUR, %xmm6 # compare magnitude squared to 4.0
   psubq %xmm6, %xmm7 # increment counters (note that -1 is put in for
                              # "true" so sub increments counters)
   movmskpd   %xmm6, %ebx # get compare flags
   andl %ebx, %ebx # =0? (i.e. both counters have stopped incrementing)
   cmove %ecx, %eax # if =0 set iter counter to 1, will bail at next decrement
   ###
  
   ###
   addpd     %xmm2, %xmm0 # zre=(zre*zre)-(zim*zim)+cre
   addpd     %xmm3, %xmm1 # zim=(2*zim*zre)+cim
   ###
  
   subl   %ecx,%eax #decrement counter that is tracking both calcs
   ja     MAIN_LOOP
   ret
GeneralRe: Another approach, removing branches PinmemberPeter Kankowski0:25 11 Apr '06  
GeneralRe: Another approach, removing branches PinmemberArne Thormodsen13:33 11 Apr '06  
GeneralCode Questions/Comments PinmemberxKuemmelx2:40 8 Mar '06  
GeneralRe: Code Questions/Comments PinmemberPeter Kankowski15:51 8 Mar '06  
GeneralRe: Code Questions/Comments PinmemberxKuemmelx20:24 8 Mar '06  
GeneralRe: Code Questions/Comments PinmemberPeter Kankowski1:29 9 Mar '06  
GeneralFFFF: Open Source Pinmembermaihem7:49 15 Jan '06  

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Mobile
Web04 | 2.5.120529.1 | Last Updated 29 Nov 2005
Article Copyright 2005 by Peter Kankowski
Everything else Copyright © CodeProject, 1999-2012
Terms of Use
Layout: fixed | fluid