Licence Zlib
First Posted 29 Nov 2005

Generating Fractals with SSE/SSE2

By Peter Kankowski | 29 Nov 2005 | Article
An article on generating Mandelbrot and Julia sets using Intel's Streaming SIMD Extensions (SSE, SSE2).
 

License

This article, along with any associated source code and files, is licensed under The zlib/libpng License

About the Author

Peter Kankowski

Software Developer

Russian Federation

Member

Peter lives in Siberia, the land of sleeping sun, beautiful mountains, and infinitely deep snow. He recently started a wiki about algorithms and code optimization, where people could share their ideas, learn, and teach others.


Comments and Discussions

 
source contains a virus | Cem Usta | 21:57 3 May '09
Re: source contains a virus | sergeykkk | 15:29 25 Aug '11
Re: source contains a virus | Peter Kankowski | 16:31 25 Aug '11
Re: source contains a virus | sergeykkk | 18:28 25 Aug '11
Fractal Dimension Indicator Formula Code | YIYI2223:50 30 Jul '08
typo | SavageJ | 20:11 3 Jun '06
Simplified Program | Peter Kankowski | 18:30 26 May '06
Iterations greater than 64 [modified] | zenzero | 11:46 26 May '06
Re: Iterations greater than 64 | zenzero | 13:22 26 May '06
Re: Iterations greater than 64 | Peter Kankowski | 18:23 26 May '06
Re: Iterations greater than 64 | zenzero | 1:08 27 May '06
Could not compile on VS 2005 | zenzero | 7:54 26 May '06
Re: Could not compile on VS 2005 | zenzero | 10:20 26 May '06
Benchmark | xKuemmelx | 12:42 8 May '06
Re: Benchmark | Peter Kankowski | 14:45 9 May '06
Re: Benchmark | xKuemmelx | 21:54 10 May '06
Re: Benchmark | xKuemmelx | 11:34 15 May '06
Another approach, removing branches | Arne Thormodsen | 12:24 10 Apr '06
Re: Another approach, removing branches | Peter Kankowski | 0:25 11 Apr '06
Re: Another approach, removing branches | Arne Thormodsen | 13:33 11 Apr '06
Code Questions/Comments | xKuemmelx | 2:40 8 Mar '06
Re: Code Questions/Comments | Peter Kankowski | 15:51 8 Mar '06
Re: Code Questions/Comments | xKuemmelx | 20:24 8 Mar '06
Re: Code Questions/Comments | Peter Kankowski | 1:29 9 Mar '06
In general, SSE is faster than the x87 FPU:

- SSE uses flat registers instead of a register stack, which is much easier for both the programmer and the processor.

- Most SSE arithmetic instructions have lower latencies than their x87 counterparts. For example, FADD takes 6 cycles to execute on Pentium 4 model 3, while ADDPD or ADDSD takes only 5.

- However, FADD has higher throughput than ADDPD on Pentium 4 (the throughputs for multiplication are equal). That's why Intel's Optimization Reference Manual says: "For applications with a large number of adds relative to the number of multiplies, x87 FPU may be a better choice". The inner loop of the Mandelbrot algorithm contains 3 multiplications and 5 additions per iteration, so, in theory, that could apply here. In practice, though, SSE code for calculating the Mandelbrot set is faster than FPU code, mostly because it processes several pixels in parallel.
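
To make the operation count and the "pixels in parallel" point concrete, here is a minimal sketch in C with SSE2 intrinsics. It is not the article's code: the function names mandel_scalar and mandel_sse2 are made up for illustration, and the iteration-count convention (counting the steps for which |z|^2 <= 4) is an assumption.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Scalar inner loop: 3 multiplications and 5 additions/subtractions
   per iteration. Returns how many iterations stayed within |z|^2 <= 4. */
static int mandel_scalar(double cx, double cy, int max_iter)
{
    double x = 0.0, y = 0.0;
    int i;
    for (i = 0; i < max_iter; ++i) {
        double xx = x * x;        /* mul 1 */
        double yy = y * y;        /* mul 2 */
        double xy = x * y;        /* mul 3 */
        if (xx + yy > 4.0)        /* add 1: escape test */
            break;
        x = xx - yy + cx;         /* add 2, add 3 */
        y = xy + xy + cy;         /* add 4, add 5 (2xy computed as xy + xy) */
    }
    return i;
}

/* Same arithmetic on two pixels at once: every packed instruction
   (MULPD, ADDPD, SUBPD, CMPLEPD) works on both lanes. */
static void mandel_sse2(const double cx[2], const double cy[2],
                        int max_iter, int out[2])
{
    const __m128d vcx  = _mm_loadu_pd(cx);
    const __m128d vcy  = _mm_loadu_pd(cy);
    const __m128d four = _mm_set1_pd(4.0);
    __m128d x = _mm_setzero_pd();
    __m128d y = _mm_setzero_pd();
    int active = 3;               /* bit 0 = pixel 0, bit 1 = pixel 1 */
    int i;

    out[0] = 0;
    out[1] = 0;
    for (i = 0; i < max_iter; ++i) {
        __m128d xx = _mm_mul_pd(x, x);
        __m128d yy = _mm_mul_pd(y, y);
        __m128d xy = _mm_mul_pd(x, y);
        /* A lane stays active only while |z|^2 <= 4; once it escapes,
           its bit is cleared for good. */
        active &= _mm_movemask_pd(_mm_cmple_pd(_mm_add_pd(xx, yy), four));
        if (active == 0)
            break;                /* both pixels have escaped */
        out[0] += active & 1;
        out[1] += (active >> 1) & 1;
        x = _mm_add_pd(_mm_sub_pd(xx, yy), vcx);
        y = _mm_add_pd(_mm_add_pd(xy, xy), vcy);
    }
}
```

The packed loop keeps iterating until both pixels have escaped, so an escaped lane does some wasted work; the mask only stops it from being counted. With single-precision ADDPS/MULPS the same pattern would handle four pixels per instruction.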
 
To cite Intel's manual again: "Use scalar SSE/SSE2 unless you need an x87 feature. Most scalar SSE2 arithmetic operations have shorter latency than their x87 counterpart and they eliminate the overhead associated with the management of the x87 register stack". So SSE is usually faster than the FPU even with scalar instructions (ADDSD or ADDSS, as opposed to the vector instructions ADDPD or ADDPS). That's why Microsoft included an option to generate scalar SSE code in the Visual C++ compiler (/arch:SSE2).
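
The difference between the scalar and the vector forms can be shown with a small sketch using SSE2 intrinsics. The helper names add_scalar_sse2 and add_packed_sse2 are hypothetical; a compiler targeting SSE2 would normally emit the scalar form for a plain `a + b` on doubles without any intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Scalar SSE2: ADDSD adds only the low 64-bit lane of each register. */
static double add_scalar_sse2(double a, double b)
{
    double r;
    _mm_store_sd(&r, _mm_add_sd(_mm_set_sd(a), _mm_set_sd(b)));
    return r;
}

/* Packed SSE2: ADDPD adds both 64-bit lanes with one instruction. */
static void add_packed_sse2(const double a[2], const double b[2],
                            double out[2])
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

Both forms operate on flat XMM registers, so neither needs the stack juggling (FXCH and friends) that x87 code does.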
 
Also, there are rumors that FPU, MMX, and 3DNow! will not be supported in 64-bit Windows, because Windows will not save the FPU stack on a task switch (see http://en.wikipedia.org/wiki/Talk:AMD64#FPU.2FMMX_Registers). Note that I have only heard the rumor; I have neither a 64-bit CPU nor 64-bit Windows, so I cannot confirm or disprove it. The latest information from Wikipedians is that the instructions will work, but none of Microsoft's compilers will generate them. Either way, optimizing FPU code to death may not be the best way to spend your time: FPU instructions will work in 64-bit Windows, but they will be considered legacy and will not be officially supported.

So, thank you for the interesting considerations, but the overall performance of FPU code will never be higher than that of SSE code.
 
Peter
FFFF: Open Source | maihem | 7:49 15 Jan '06


Last Updated 29 Nov 2005
Article Copyright 2005 by Peter Kankowski
Everything else Copyright © CodeProject, 1999-2012