
MarkBrock wrote: To be fair, my bet is 99% of college students don't know (or care) how to do this, along with 99% of the rest of the world.
You must have been taught how at junior school, though
I know, since my son has this in his Math book.





ChandraRam wrote: You must have been taught how at junior school, though
Yeah, I was.
Sit down and watch the TV show "Are You Smarter Than a 5th Grader?". Unless you've got a better memory than an elephant, I bet you can't answer most of the questions.





You do not need to divide by all numbers, only by primes. If the number is evenly divisible by a prime, keep dividing by that prime until the result leaves a remainder, then move on to the next prime. There are sites (google "primes") listing the primes among the first million numbers, the second million numbers, and so on up to the first 15 million numbers.
Dave Augustine.





Member 4194593 wrote: You do not need to divide by all numbers, only by primes.
... and, if we are optimizing, your for loop does not need to run all the way up to InputCase . You only need to check for divisibility by numbers up to the square root of InputCase .
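Both suggestions together can be sketched like this (a hedged illustration, not tested against any judge; the function name is mine). Dividing each factor out completely means a composite candidate can never divide the remaining value, so no explicit primality test is needed:

```cpp
#include <vector>

// Trial division with two optimizations: repeated division by each
// candidate, and stopping once i*i > n (the sqrt bound).
std::vector<unsigned> factorize(unsigned n)
{
    std::vector<unsigned> factors;
    for (unsigned i = 2; i * i <= n; ++i)
    {
        while (n % i == 0)      // keep dividing until a remainder appears
        {
            factors.push_back(i);
            n /= i;
        }
    }
    if (n > 1)                  // whatever is left is itself prime
        factors.push_back(n);
    return factors;
}
```

Note the `n > 1` check at the end: without it a large leftover prime factor would be silently dropped.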





I see. I'll apply both suggestions.
That's why it's so slow at 20!.
It is said that the most complex structures built by mankind are software systems. This is not generally appreciated because most people cannot see them. Maybe that's a good thing because if we saw them as buildings, we'd deem many of them unsafe.





Ian Uy wrote: That's why it's so slow at 20!.
There are easier ways of factorizing 20!
Peter
"Until the invention of the computer, the machine gun was the device that enabled humans to make the most mistakes in the smallest amount of time."





A possible solution would be as follows:
void prime_factors(unsigned int n, std::deque<std::pair<unsigned int, unsigned int>>& factor_list)
{
    factor_list.clear();
    unsigned int i = 2;
    while (i * i <= n)   // only need to test up to sqrt of the remaining value
    {
        std::pair<unsigned int, unsigned int> current_factor(i, 0);
        while (0 == (n % i))
        {
            n /= i;
            ++current_factor.second;
        }
        if (current_factor.second > 0)
        {
            factor_list.push_back(current_factor);
        }
        ++i;
    }
    if (n > 1)   // whatever remains at the end is itself a prime factor
    {
        factor_list.push_back(std::make_pair(n, 1u));
    }
}
The prime factors will be in the deque; the first of each element is the factor and the second is the number of times that factor occurs.
[updated]





[edit] since Arash has completely changed the code in his previous post, this appears somewhat out of context [/edit]
I think this is an example of misguided optimization. Whilst Ian Uy's example looks inefficient
int main()
{
    int InputCase;
    cin >> InputCase;
    for (int i = 2; i <= InputCase; i++)
    {
        if (InputCase % i == 0)
        {
            cout << i;
            InputCase /= i;
            i--;   // step back so the same divisor is tested again
        }
    }
    return 0;
}
all he is doing each iteration is testing
if(InputCase%i==0)
In order to avoid these "unnecessary" tests, you are testing each n as
if (n % 2 == 0) n++;
while(!is_prime(n)) { ++n; }
The function is_prime(n) will involve much more overhead than the original code, and it doesn't matter that 'i' may not be prime.
Peter
modified on Saturday, October 18, 2008 11:35 PM





Hi Peter,
I agree.
Some improvements are possible, but they do not involve adding method calls:
1. replace if(){} with while(){} and drop the i--;
then change the for loop so it starts at 3 and increments by 2, or by 2 and 4 alternating
(make sure to test 2 itself as well)
2. use multiple (say two) threads, assuming a Core Duo or something similar.
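Suggestion 1 might look roughly like this (my own sketch; names are mine). After taking out 2 and 3 explicitly, the remaining candidates 5, 7, 11, 13, 17, 19, ... skip every multiple of 2 and 3, which is exactly the "increment by 2 and 4 alternating" step sequence:

```cpp
#include <vector>

// Trial division that tests 2 and 3 explicitly, then steps through
// candidates with alternating increments of +2 and +4 (a mod-6 wheel).
std::vector<unsigned> wheel_factorize(unsigned n)
{
    std::vector<unsigned> f;
    for (unsigned p : {2u, 3u})
        while (n % p == 0) { f.push_back(p); n /= p; }
    unsigned step = 2;
    for (unsigned i = 5; i * i <= n; i += step, step = 6 - step)
        while (n % i == 0) { f.push_back(i); n /= i; }
    if (n > 1) f.push_back(n);  // the leftover is prime
    return f;
}
```

This tests a third of the numbers up to sqrt(N) instead of all of them, for two extra lines of code.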





Hi Luc,
True. The most important thing he should do is limit the testing to sqrt(N).
To factorise a number N, you need to test up to sqrt(N), and there are approximately
sqrt(N)/log(sqrt(N)) primes to test. For example, to factorise 1,000,000 you only need to test the 168 primes less than 1000, or 16.8% of the numbers less than 1000.
Using the simple algorithm you test 100% of the numbers less than sqrt(N):
if you eliminate the multiples of 2 you are down to testing about 50%
then eliminating the multiples of 3 drops you to testing 33.3%
then eliminating the multiples of 5 drops you to testing 26.7%
and so on.
the first few are probably worthwhile, as you point out, but you then rapidly get into diminishing returns, and may easily reach the stage where it is not worth the effort to save the time of the line
if(InputCase%i==0)
I don't think multiple threads are worthwhile until you go past 32 bit arithmetic (even the simplest algorithm only has 2^16 tests to do).
Peter
modified on Sunday, June 29, 2008 10:10 PM





cp9876 wrote: The most important thing he should do is limit the testing to sqrt(N).
I believe this is not correct. Assume you have an x composed of 2p where p is prime; in certain circumstances sqrt(x) can be less than p (e.g. 15838 = 2 * 7919, where 7919 > sqrt(15838)). For primality testing the sqrt(n) bound is fine, but it is not suitable as the upper bound in this problem.





Arash Partow wrote: I believe this is not correct. Assume you have an x composed of 2p where p is prime; in certain circumstances sqrt(x) can be less than p (e.g. 15838 = 2 * 7919, where 7919 > sqrt(15838)). For primality testing the sqrt(n) bound is fine, but it is not suitable as the upper bound in this problem.
He's studying for the ACM competition; I don't want to spoon-feed him! He can limit his testing to sqrt(N), but he has to modify his algorithm slightly for when InputCase is not 1 at the end.
In the example you give, he only needs to test until floor(sqrt(15838)) = 125; he will have found only 2 as a factor, and InputCase will be 7919 after the 124 tests. He can then conclude that the remaining value in InputCase is the remaining prime factor. This is much more efficient than testing the remaining 15,703 factors.
Peter





cp9876 wrote: This is much more efficient than testing the remaining 15,703 factors.
true.





You're right; as long as one divides by the current number until the remainder is nonzero, the multiples of earlier factors will have been taken out as well, so there is no need to explicitly test/divide by primes only.
good catch!





Hello,
I have a set of 2D points forming a polyline and I want to offset it by a given distance (like the offset command in AutoCAD).
I do not know how to deal with the corners. I have searched on the Internet; there are advanced methods like straight skeletons etc., but my polyline is not self-crossing and has no holes in it.
Is there any simple method? Any references?
Best Regards.
Bekir





Why can't you just iterate through the points and add your offset to each one? It seems like this should translate the corners along with everything else.






beko wrote: Just translating the edges with the offset distance will not work.
Why not?
beko wrote: I have found an algorithm not tested thoroughly, but it is working at least for my very simple 3 points polyline test
What kind of test did you do?
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
- Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
- Iain Clarke





Hello,
In the end it is a translation, but it must be implemented with respect to the normals; as far as I have read, if it is a parametric curve then it gets rather complicated. If you would like to see how much more complicated it can get, have a look at the link below.
http://timeguy.com/cradekfiles/emc/liu2007_line_arc_offset.pdf[^]
For the test, I just created a polyline with three points, used the algorithm from Eduardo in my earlier reply, and compared the result with AutoCAD's output.
Best Regards.





That paper seems to give a good description of how to do it. It's complicated and you will have lots of cases; you'll just have to get used to it.
Peter





There is no simple solution to this problem. The best known general approach is to create capsules between consecutive point pairs of your polyline, where the capsule width is the offset, and then union the capsule outlines. The capsule may be centered on the edge or biased towards one side of it; that depends on the end requirement. For the union operation use something like Alan Murta's Generic Polygon Clipper library.
If you know your polyline is closed and represents a convex polygon, then simply calculate the centroid, then offset each edge by the product of the desired offset amount and the vector from the centroid to the midpoint of the edge at hand.
A simple example can be found here:
http://www.codeproject.com/KB/recipes/Wykobi.aspx
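For what it's worth, here is a rough sketch of the convex case described above (types and names are my own, and it pushes vertices rather than edges away from the centroid, which only approximates a true edge offset; the linked article does the exact version):

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Offset a convex polygon outward by distance d: move each vertex
// along the normalized direction from the centroid to that vertex.
// Assumes no vertex coincides with the centroid (len would be zero).
std::vector<Point> offset_convex(const std::vector<Point>& poly, double d)
{
    Point c{0, 0};
    for (const Point& p : poly) { c.x += p.x; c.y += p.y; }
    c.x /= poly.size(); c.y /= poly.size();

    std::vector<Point> out;
    for (const Point& p : poly)
    {
        double dx = p.x - c.x, dy = p.y - c.y;
        double len = std::sqrt(dx * dx + dy * dy);
        out.push_back({p.x + d * dx / len, p.y + d * dy / len});
    }
    return out;
}
```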





Thank you for all the answers,
I will have a look at them.
Best Regards.





Hi all. I've been working on a model in Excel that I'm considering porting into C/C++. I've been using the "Goal Seek" function in Excel to find a value in one cell that causes the results in another cell to equal some target value.
Is there an existing numerical recipe/algorithm to perform a similar function in C without reinventing the wheel?
Thanks for any help!





Not in C/C++, but this is a custom implementation in VB:
http://www.bmsltd.ie/DLCount/DLCount.asp?file=GoalSeek.zip[^]
Goal seeking is based on linear interpolation; perhaps you can adapt the VB code to C++?
I'm not too familiar with COM, etc., so I don't know the details, but I do know that you can make calls to Excel from C++. Maybe you can call the Excel Goal Seek function using COM (by loading the Excel DLL)?
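If adapting the VB code is too painful, a goal seek based on linear interpolation is essentially the secant method, which is only a few lines in C++ (a sketch; names, tolerances, and the interface are mine, not taken from the VB implementation):

```cpp
#include <cmath>
#include <functional>

// Find x such that f(x) == target: repeatedly fit a line through the
// two most recent guesses and solve for where that line hits the target.
double goal_seek(const std::function<double(double)>& f,
                 double target, double x0, double x1,
                 double tol = 1e-9, int max_iter = 100)
{
    double f0 = f(x0) - target, f1 = f(x1) - target;
    for (int i = 0; i < max_iter && std::fabs(f1) > tol; ++i)
    {
        if (f1 == f0) break;                          // flat line, give up
        double x2 = x1 - f1 * (x1 - x0) / (f1 - f0);  // secant step
        x0 = x1; f0 = f1;
        x1 = x2; f1 = f(x1) - target;
    }
    return x1;
}
```

Like Excel's Goal Seek, this needs two reasonable starting guesses and can fail on badly behaved functions, but for smooth model cells it converges quickly.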





"...but I do know that you can make calls to Excel from C++..."
Well, THAT would solve my problem! I've been doing a lot of processing in Excel (nicer, because I can see the numbers all laid out), then saving data to a text file, reading it into my C program and continuing my model that way. If I could just make calls right into a spreadsheet I could save myself a lot of dumb grunt labor. I'll check into that. Thanks!





Here[^] is a Microsoft knowledgebase article that provides some explanation. I think you have to use MFC to do it.
If you want to use Managed C++, here's a nice CP article:
Link[^].
I don't know a whole lot about COM, etc...so you might want to try asking for more details in the C++ forums here. Somebody can probably give you more assistance there.





KimN wrote: Is there an existing numerical recipe
There is a famous book called "Numerical Recipes"; the original editions are available online for free. See here[^]. I think you need some Adobe plugin to view the books; it is well worthwhile.
What you are interested in is optimisation of functions. The way this is typically done is you define a function that you want to minimize, e.g.
double func(double* x)
which takes N variables in the array 'x' and returns a single value. An optimization routine will attempt to find the values of 'x' that minimize the function. A very simple workhorse is called the downhill simplex method in multidimensions, but to you it is simply a function Amoeba to which you pass mainly a pointer to your function, the number of variables and starting values.
If you want to solve f(x) = c, make func(x) = (f(x) - c)^2
You will find a listing for amoeba in section 10.4 of Numerical Recipes in C.
If you know more about the function you are trying to minimize then there are probably better routines available, but I'd suspect that amoeba will work as well as excel (and be much faster in native code).
Peter





Good answer, nicely explained.





I am trying to write some graphics code using the web development tool Adobe ActionScript (version 3). AS3 has many sophisticated graphics capabilities, but what it apparently does not have is any sort of BitBlt capability with raster operation (ROP) codes. This may have something to do with it being based on 'Sprites', which, from my brief perusal of some Wikipedia info, is a different philosophy altogether from BitBlt.
I was trying to emulate BitBlt from scratch in ActionScript, by merely dumping three bitmaps into arrays, and then going through them pixel by pixel doing the logical ops on each pixel. As could probably be expected, this is way too slow. It was taking a full second to do ROP bitmap combinations that BitBlt would do instantaneously. (This has nothing to do with any inherent limitation in Actionscript, which does plenty of things graphical instantaneously.)
So I need to find out what algorithms BitBlt employs to do what it does, and whether it's possible for me to emulate that (assuming BitBlt isn't relying primarily on graphics hardware that can't be accessed from a web application).
Important point:
There is only one ROP code I need to use, so possibly it could be optimized on that basis. (It's one of the more complex ROP codes, involving XOR and AND of three different bitmaps.)
 modified at 14:44 Thursday 15th November, 2007





Hi, image operations in professional packages are implemented using highly optimized code, typically a combination of C and assembly code; where possible, SIMD techniques are applied (that's MMX/SSE on Intel Pentium). It is the result of extensive research, so you won't match its performance easily.
Luc Pattyn [Forum Guidelines] [My Articles]
this month's tips:
 use PRE tags to preserve formatting when showing multiline code snippets
 before you ask a question here, search CodeProject, then Google





"Hi, image operations in professional packages are implemented using highly optimized code, typically a combination of C and assembly code; where possible SIMD techniques are applied (that's MMX/SSE on Intel Pentium). It is the result of extensive research, so you won't match its performance easily."
If it were documented anywhere, of course I would be able to emulate it (with the exception of hardware capabilities). There's no apparent performance hit merely from using ActionScript either; it does some things much faster (e.g. antialiasing) than can be done with Windows code, although I have no idea how it's done.





Windows apps can call on the built-in BitBlt functions[^].
Google (BitBlt, rasterop, theory, internals, ...) will lead you to some junk and some code snippets; there must be some C/C++/Java implementations around. But again, performance-wise none of those will match specialized products.
Luc Pattyn [Forum Guidelines] [My Articles]





Luc Pattyn wrote: Windows apps can call on the builtin BitBlt functions[^].
The Windows BitBlt was the one I was referring to, and I am familiar with the documented interface for it.
Luc Pattyn wrote: Google (BitBlt, rasterop, theory, internals, ...) will lead you to some junk and some code snippets; there must be some C/C++/Java implementations around. But again, performance-wise none of those will match specialized products
You could save yourself some typing by just responding to every post with "Google it". (Incidentally, why does this forum even exist when a link to Google is all that's needed?)
There are several people in this forum with "Microsoft MVP" in their signature. I don't know if that's literal or some code unique to this forum, but I was thinking someone like that might run across this thread.
As far as my problem goes, I have three bitmaps; two of them never change position in relation to the client area, and the third scrolls up and down. With each scroll the BitBlt w/ ROP code has to be done. I'm thinking now, since the first two bitmaps don't move, I can precalculate values for them and store them in another table, maybe 10-15 megs in size. So part of the ROP calculation could be replaced by an indexed table lookup, though given the ROP code, I don't know how to do this yet, and furthermore don't know whether there's any time saving in this method.





Most of the people active on this forum use Windows; when they need BitBlt, they call the function without worrying too much about how it is implemented (that was Microsoft's job).
If you want to implement your own BitBlt function, you will run into performance issues. The general guidelines for maximizing performance on pixel operations are: avoid testing and branching, avoid function calls, avoid unnecessary moves, and try to handle as many pixels as possible in a single operation. The details of all this are way beyond the scope of a forum such as this, as I already indicated.
With the partial description you provided, I would judge your table lookup is unlikely to provide any speedup unless it replaces two of the three images, which it probably doesn't. A Pentium-class CPU is better at reading a couple of pixel streams (a very predictable process, for which a prefetch can be organized) than at looking up tables; and it is unlikely the table replaces more than a couple of (fast) instructions anyway.
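As an illustration of the "handle as many pixels as possible in a single operation" point: since a ROP is pure bitwise logic, it can be applied a 32-bit word at a time, processing four 8-bit channels per operation instead of one. (The ROP below is an arbitrary AND/XOR mix standing in for the poster's unspecified ROP code; names are mine.)

```cpp
#include <cstddef>
#include <cstdint>

// Apply a three-input raster operation word-by-word across three source
// buffers. Bitwise ops act on all 32 bits at once, so each iteration
// combines four bytes' worth of pixel data.
void rop_blit(uint32_t* dst, const uint32_t* a, const uint32_t* b,
              const uint32_t* c, size_t words)
{
    for (size_t i = 0; i < words; ++i)
        dst[i] = (a[i] & b[i]) ^ c[i];
}
```

The same idea extends to 64-bit words or SIMD registers; the loop body stays a handful of instructions either way.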
Luc Pattyn [Forum Guidelines] [My Articles]





Luc Pattyn wrote: With the partial description you provided, I would judge your table lookup is unlikely to provide any speedup unless it replaces two of the three images, which probably it isn't.
Something just occurred to me:
BYTE and[256][256];
BYTE or[256][256];
BYTE not[256];
That's less than 150K.
The question is, could and[x][y] be retrieved quicker than x&y is calculated. I suppose some table like this might already be used behind the scenes if it is actually quicker though.

Sorry, that was stupid.
But here's what I might try:
BYTE ROP[256][256][256];
This would store every possible byte value for the raster op I'm doing. I think one table lookup would be quicker than three bitwise logical ops. But how much quicker, and is it worth a 16MB table? Guess I'll find out.
 modified at 17:21 Thursday 15th November, 2007





Hi,
The optimization of image processing code is handled at the cycle level; that is why C/assembly/MMX/SSE are required here. If you want to explain an approach at a higher level, that is fine for expressing the concept, but for performance you must visualize the basic CPU operations that will correspond.
If BYTE ROP[x][y][z] contains a random value, of course you need a table lookup. At best, these are the basic operations that would occur (assume x, y, z in CPU registers): copy x, shift it by 8 positions, add y to it, shift it by 8 positions, add z to it, add the address of ROP[0][0][0] to it, do an indirect load; that amounts to 6 simple arithmetic operations and a load.
As long as the value of BYTE ROP[x][y][z] can be expressed as a function of x, y, z containing fewer than, say, 6 logical operators, the direct code would be faster, since it does not need the load at all.
So what I am saying is this: on modern CPUs (Pentium class, RISC, and many others)
u = (x & y) | (x ^ z) | (!y & z) would be faster than u = ROP[x][y][z]
without any doubt (say by a factor of 1.33 to 2).
From what you have shown so far, I can only conclude that you are a novice at optimization.
If you want to continue in this field, I recommend you study some code outside the .NET Framework, and look at the assembly code generated (a) by compilers, (b) by manual optimization (e.g. look at the disassembled code of the strlen function; you will find over 100 lines for something that most people would do, somewhat slower, in less than 10 lines! The difference is not much, but that's mainly because strlen is hard to parallelize, although there are MMX versions of it too).
Luc Pattyn [Forum Guidelines] [My Articles]





Luc Pattyn wrote: From what you have shown so far, I can only conclude that you are a novice in optimization stuff.
For the record, I've had two courses in digital logic at the graduate level (pursuant to my master's degree), but that type of optimization isn't really an option for me here, is it? I certainly know assembler as well. Remember my original problem, however: I am trying to emulate BitBlt within the ActionScript/Flash environment, which doesn't have it. My goal is to eke out as much speed for the operation in that environment as I can, and see what I come up with. In my previous post I was thinking out loud, essentially in an attempt to get some meaningful feedback towards my desired goal, which, BTW, I have yet to receive. Essentially what you're implying is that there is NO WAY to achieve a faster emulation of BitBlt in a Flash environment other than a pixel-by-pixel operation. I appreciate your explanation as to why a table lookup would not be faster than the bitwise logical operations, but once again, how have you helped me towards solving my original problem at all?
Respectfully,
Force Code

"if BYTE ROP[x][y][z] contains a random value, of course you need a table lookup. At best, these are the basic operations that would occur (assume x,y,z in CPU registers): copy x, shift it by 8 positions, add y to it, shift it by 8 positions, add z to it, add the address of ROP[0][0][0] to it, do an indirect load; so that amounts to 6 simple arithmetic operations and a load."
I am not at all convinced that all the copying and shifting and adding you describe are in fact the operations involved in a table lookup.
My thinking, first of all, is that a compiler would convert x, y, z into a single value, store it in one register, and use it as an offset from a base address; there goes your litany of copies and such.
(But you said it with such authority that I almost assumed for a minute you knew what you were talking about.)
 modified at 19:39 Thursday 15th November, 2007





Force Code wrote: I am not at all convinced that all that copying and shifting and adding you describe is in fact the operations involved in a table lookup
Well, that too proves my point. How do you think a CPU gets to an element of a three-dimensional array?
Force Code wrote: that there is NO WAY to achieve a faster emulation of BitBlt in a flash environment other than a pixel by pixel operation
That's not what I said; on the contrary, I told you that looking for parallelism is a major way to speed up image processing. To what degree it is possible in your case I can't tell.
I am not going to spend the time to solve your problem; I don't even plan on investigating it any further. I have told you how complex your goal may be and what is important for image processing performance.
I think you are looking at a major undertaking and will probably be disappointed by the results you obtain, but hey, don't let anyone discourage you...
Luc Pattyn [Forum Guidelines] [My Articles]







Force Code wrote: There are several people in this forum with "Microsoft MVP" in their signature. Don't know if that's literal or some code unique to this forum, but maybe I was thinking someone like that might run across this thread.
It's an award handed out by MS to people who are exceptionally helpful on software forums. It uses peer nomination (other users have to submit someone's name to be considered), and MS then looks over the nominees to see if they're up to their standards. MVPs aren't MS employees, nor do they have access to the MS codebase.

Help Stamp Out and Abolish Redundancy
The preceding is courtesy of the Department of Unnecessarily Redundant Repetition Department.





This was unnecessarily harsh. Luc was being very professional and trying to help.
From reading the rest of the discourse, it is very obvious to me who knows what he is talking about, and who does not.
David

Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code.
http://yosefk.com/c++fqa/picture.html#fqa6.6







Luc Pattyn wrote: So what I am saying is this: on modern CPUs (Pentium class, RISC, and many others) u = (x & y) | (x ^ z) | (!y & z) would be faster than u = ROP[x][y][z] without any doubt (say by a factor of 1.33 to 2)
#include <iostream>
#include <windows.h>
using namespace std;

BYTE ROP[256][256][256];

int main(int argc, char *argv[])
{
    DWORD time = GetCurrentTime();
    for (int x = 0; x < 256; x++)
        for (int y = 0; y < 256; y++)
            for (int z = 0; z < 256; z++)
                ROP[x][y][z] = (x & y) | (x ^ z) | (!y & z);
    cout << "test 1: " << GetCurrentTime() - time << endl;
    BYTE n;
    time = GetCurrentTime();
    for (int x = 0; x < 256; x++)
        for (int y = 0; y < 256; y++)
            for (int z = 0; z < 256; z++)
                n = ROP[x][y][z];
    cout << "test 2: " << GetCurrentTime() - time << endl;
}
And the results:
C:\devcpp\examples\test>test
test 1: 240
test 2: 60





Force Code wrote: test 1: 240
test 2: 60
and a release build gives:
test 1: 125
test 2: 0
proving the test program is worth zilch.
My final statement to you is: try to figure what the corresponding assembly code is.
Luc Pattyn [Forum Guidelines] [My Articles]





What are you trying to prove with that benchmark? You are comparing apples to oranges. The first test measures how long it takes to WRITE a COMPUTED value to a COMPUTED OFFSET. The second measures how long it takes to READ a STORED value from the same COMPUTED OFFSET. In addition, the compiler may be moving x, y, or z out of the register space to allow for the arithmetic in the first example, so you may have additional memory reading/writing going on (not to mention that reads are far faster than writes). I bet that if you store the result of the first loop to a single int variable, the first will be faster. If this was an attempt to prove that table lookups are faster than actual computation, then you should know that I am ABSOLUTELY not convinced.
As far as your imaging is concerned, if you are using C#, then you should use either int or long pointers to the data, and perform your operations on the pointer to a linear array. I just did some image processing work, and this proved to be the most efficient approach in C# that I found.
Jeff





OK, it looks like maybe I was mistaken:
#include <iostream>
#include <windows.h>
using namespace std;

BYTE ROP[256][256][256];
BYTE ROP2[256][256][256];

int main(int argc, char *argv[])
{
    DWORD time = GetCurrentTime();
    for (int x = 0; x < 256; x++)
        for (int y = 0; y < 256; y++)
            for (int z = 0; z < 256; z++)
                ROP[x][y][z] = (x & y) | (x ^ z) | (!y & z);
    cout << "test 1: " << GetCurrentTime() - time << endl;
    time = GetCurrentTime();
    for (int x = 0; x < 256; x++)
        for (int y = 0; y < 256; y++)
            for (int z = 0; z < 256; z++)
                ROP2[x][y][z] = ROP[x][y][z];
    cout << "test 2: " << GetCurrentTime() - time << endl;
    time = GetCurrentTime();
    memcpy((void*)ROP2, (void*)ROP, 16777216);
    cout << "test 3: " << GetCurrentTime() - time << endl;
}
and the results:
C:\devcpp\examples\test>test
test 1: 191
test 2: 310
test 3: 250
But the really strange thing is that the third case is just a block transfer of data from ROP to ROP2, yet even that's slower than iterating through the entire array and calculating each value one at a time. I'll admit that makes no sense to me.





memcpy is not /that/ fast in most implementations (you can almost always use MMX/SSE2 to write a faster one if you want).
Consider for one thing that the details of the copy are passed as parameters; it must at least do some work to use them, which is a waste if you already know what they are (the for loop tells the compiler for you).
Also, when operating on memory, some instructions do not need an additional mov, which means they can be as fast as, or faster than, a memory copy; others force you to do two movs (mul and div, for instance).





I noticed several optimizations that may speed up your algorithm:
1. In the innermost loop, you're recomputing (x&y) even though x and y don't change for different z values. If you assign (x&y) to a temporary variable before the z loop, it will save time.
2. Computing the address of ROP[x][y][z] every iteration wastes time. If you set a pointer to the first element (&ROP[0][0][0]), it can be incremented in one instruction to point to the next element to store your result. This is faster than the several arithmetic operations needed to access an element of a 3D array.
Your second case does this 3D element-address calculation twice each iteration, and takes the most time.
3. You might try making z a register variable.
4. It's been claimed that ++x is faster than x++ because x++ has to preserve the original value.
5. The slow performance in the third case is surprising. One thing you could try, to bypass memcpy, is inline assembly to copy the memory block.
Hope this helps...





IIRC ActionScript is basically JavaScript, and like any interpreted language it's very slow. Sure, if you call a built-in antialiasing function or draw-line routine, it will execute fast, but those are not written in ActionScript, just made available to ActionScript.
There would be no hope of matching a native-code implementation with pure ActionScript.
I would start looking for a workaround; does AS provide, say, a copy-paste between canvases, for example?




