|
His name is GOTO so he just has to optimize!
... such stuff as dreams are made on
|
|
|
|
|
I try occasionally. I've tried matrix multiplication too but I couldn't get further than around 28 flop/cycle (on a Haswell so the target is 32 flop/cycle), that already took some weird trickery.
I'm learning AVX512 but so far just theoretically since I don't have the hardware (I could just buy it of course, but I want VBMI2 and some other extensions too and they're not out yet).
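For anyone wondering where the 32 flop/cycle target comes from, here is a rough sketch of the arithmetic. It assumes single precision and the commonly quoted Haswell figure of two 256-bit FMA units per core, with each fused multiply-add counted as two floating-point operations. The helper name `peak_flops_per_cycle` is made up for illustration.

```cpp
// Theoretical peak FLOP/cycle for one core:
//   FMA units * SIMD lanes per vector * 2 (an FMA is a multiply plus an add).
// Illustrative helper only; the parameters are assumptions, not measurements.
constexpr int peak_flops_per_cycle(int fma_units, int vector_bits, int element_bits) {
    return fma_units * (vector_bits / element_bits) * 2;
}

// Haswell, single precision: 2 units * (256 / 32) lanes * 2 = 32.
static_assert(peak_flops_per_cycle(2, 256, 32) == 32, "Haswell SP peak");
```

By that measure, 28 flop/cycle is 87.5% of the theoretical peak. By the same arithmetic, a core with two 512-bit FMA units would come out at 64.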
|
|
|
|
|
Chris Maunder wrote: Anyone here writing truly performant code
Heck, I'm always complaining about the lack of performance, documentation and consistency of all the ridiculous ways things "have to" get done where I work. A website for requesting access, hardware, software, etc., that's so obtuse nobody actually knows how to use it. Spreadsheets used as forms to submit releases from development to QA to production. A job scheduler that is only smart enough to launch .bat files that then trigger SQL SP's or application EXE's. An SDLC (Systems Development Life Cycle) app that takes more time to figure out how to submit a release than to write the damn code. Corporate guidelines on the configuration of a development computer - W7, 4GB RAM, archaic 5400 RPM drive. Oh, and giving up after 15 minutes on hold at the corporate help desk because I still can't connect to my box using remote desktop because the request got lost in the aforementioned "automated" request website.
So, performant code, IMHO, starts with performant people and processes, then performant tools, then performant designs, and at some point you might need to actually look at loop order and memory utilization. I've written complex multithreaded analysis tools in C# that ran circles around the previous C++ versions because I worked on optimizing the design of the code. Long gone are the days when I had to count CPU cycles and worry about odd/even byte addresses to see what I could fit into the 1.6ms or so (IIRC) of the vertical blanking interval of a video display.
Sorry, didn't mean to turn this into a curmudgeon's rant. I spent this week dealing with these things, computer reboots due to security updates, and other non-performant issues.
Latest Article - Contextual Data Explorer
Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
Artificial intelligence is the only remedy for natural stupidity. - CDP1802
modified 1-Mar-18 11:10am.
|
|
|
|
|
Let me get you a whisky
cheers
Chris Maunder
|
|
|
|
|
Here I am, and in single thread to boot.
I had to document, debug and complete in Assembler: a library for fast conservative rotations on 8 bit images; build the same library for 16 bit images; design and develop a normalizing / LUT applicator library.
The last time I used inline assembler was to do fast integrations across columns of a 16 bit image and to reduce the 3x3 Sobel algorithm from a two-dimensional algorithm to a single-pass linear one. Using assembler alone I brought the execution times down by 66%, after having already halved them with proper loop ordering and pointer arithmetic.
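One common way to turn the 3x3 Sobel into linear passes is to exploit the fact that the kernel is separable: the horizontal gradient is a vertical [1 2 1] smoothing followed by a horizontal [-1 0 1] difference. The sketch below is plain C++ rather than assembler, and it is not necessarily the approach described above. The function name and the choice to leave border pixels at zero are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Horizontal Sobel gradient (Gx) of a 16-bit grayscale image, row-major.
// Uses the separable form: vertical smoothing [1 2 1], then horizontal
// difference [-1 0 1]. Border pixels are left at zero (an assumption).
std::vector<int32_t> sobel_gx_separable(const std::vector<uint16_t>& img,
                                        int width, int height) {
    std::vector<int32_t> out(static_cast<std::size_t>(width) * height, 0);
    if (width < 3 || height < 3) return out;

    std::vector<int32_t> col(width);  // vertically smoothed row, reused per y
    for (int y = 1; y < height - 1; ++y) {
        const uint16_t* up  = &img[static_cast<std::size_t>(y - 1) * width];
        const uint16_t* mid = &img[static_cast<std::size_t>(y) * width];
        const uint16_t* dn  = &img[static_cast<std::size_t>(y + 1) * width];

        // Pass 1: smooth three rows into one (max value 4 * 65535 fits int32).
        for (int x = 0; x < width; ++x)
            col[x] = up[x] + 2 * mid[x] + dn[x];

        // Pass 2: central difference along the smoothed row.
        int32_t* dst = &out[static_cast<std::size_t>(y) * width];
        for (int x = 1; x < width - 1; ++x)
            dst[x] = col[x + 1] - col[x - 1];
    }
    return out;
}
```

Both inner loops walk memory linearly, which is what makes this form friendly to the cache and to later hand-tuning. The vertical gradient follows the same pattern with the two 1D kernels swapped.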
GCS d-- s-/++ a- C++++ U+++ P- L+@ E-- W++ N+ o+ K- w+++ O? M-- V? PS+ PE- Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- ++>+++ y+++* Weapons extension: ma- k++ F+2 X
|
|
|
|
|
Hi, Chris.
I've read through your post with a certain interest. Normally, in my opinion, neither C# nor JavaScript (including Node.js) is considered to be among the best platforms for high-performance computing, specifically matrix operations:
1. C# and the .NET Framework have the System.Threading and System.Threading.Tasks namespaces, which implement such things as the Parallel.For and Parallel.ForEach methods that run loop iterations in parallel on the managed thread pool. Unfortunately, the thread pool is mainly based on a work-stealing scheduling mechanism, and thus the loop execution cannot be perfectly parallelized or scaled across multiple CPU cores.
2. JavaScript has a Worker object that allows spawning particular fragments of code to run in parallel. The same goes for the Parallel.js library. Unfortunately, both the Worker object and Parallel.js have a number of issues that, for many programming tasks, are an obstacle to delivering proper parallel code that scales perfectly across all CPU cores. For example, multithreading in JavaScript does not yet provide a synchronization mechanism for the threads spawned during code execution.
3. Node.js, also known as server-side JavaScript, has a number of issues, such as memory leaks in the node.exe executable and various bugs and errors related to the 64-bit platform. Since the Node.js era began, there have still been no such things as tight loop parallelization, synchronization of threads, etc.
What I'd recommend:
To be more specific about the matrix multiplication task, I would recommend writing the code in C++ and using OpenMP to deliver modern code that executes in parallel and scales across all cores of your CPU. Here's an example of parallel C++ code using OpenMP that implements matrix multiplication:
#include <iostream>

// Compile with OpenMP enabled (e.g. g++ -fopenmp); otherwise the pragma
// is ignored and the loops simply run serially.
const int N = 3;

int A[N][N] = { { 2, 9, 7 },
                { 1, 6, 4 },
                { 9, 1, 8 } };
int B[N][N] = { { 8, 1, 3 },
                { 6, 5, 2 },
                { 4, 7, 9 } };
int C[N][N] = { { 0 } };

int main()
{
    // collapse(2) merges the i and j loops into one iteration space;
    // each C[i][j] is written by exactly one thread, so no locking is needed.
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int t = 0; t < N; t++)
                C[i][j] += A[i][t] * B[t][j];

    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
            std::cout << C[i][j] << " ";
        std::cout << "\n";
    }
    return 0;
}
Ref: cpp.sh/2lsa3
|
|
|
|
|
|
Sometimes.
Embedded world needs performance, sometimes.
|
|
|
|
|
I don't work on that sort of software (Bob be praised), but I try to pay attention to things like how I loop and cache stuff and how to use multiple threads and database connections effectively.
|
|
|
|
|
I was writing
var dualScreenTop = window.screenTop ?? screen.top;
And wondering why it wasn't working.
Maybe because I'm using Javascript.
I :heart: ??
cheers
Chris Maunder
|
|
|
|
|
var dualScreenTop = window.screenTop || screen.top;
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Yeah, I know, but it just feels...wrong. It's like adding two bitmask flags instead of putting on a pair of pants like a civilised person and ORing them.
(And what I really want is for ?? in Javascript to do
var dualScreenTop = window.screenTop !== undefined ? window.screenTop : screen.top;
i.e. a little bit of type checking. And yes, I know that's an oxymoron in Javascript.)
cheers
Chris Maunder
|
|
|
|
|
It certainly feels wrong. But then, it's Javascript; the whole language is one big steaming pile of nope.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
cheers
Chris Maunder
|
|
|
|
|
Don't you listen to him!
Everyone whines about stuff like pizza or french fries being horrid junk food.
But then they all line up to buy and eat them.
And so it is with javaScript.
Perhaps an allegorical junk-food language, but deep down, don't we all love junk food?
Ravings en masse^ |
---|
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein | "If you are searching for perfection in others, then you seek disappointment. If you are seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010 |
|
|
|
|
|
I found one of these[^] under the Christmas tree (put it there myself).
It's all very well as it is, as long as I fly it as a plain 'broomstick'. Eventually I might put it into a scale fuselage, like a Sikorsky S-58 or a Wessex[^]. The problem is that I need a torque tube for the tail rotor, not a belt drive.
Now look what I just found: Align Torque Tube Drive Upgrade Set (T-Rex 500X)[^]
Edit: Animated video how to assemble that thing[^]
Edit^2: And I will need one of these.[^]
I have lived with several Zen masters - all of them were cats.
modified 1-Mar-18 9:08am.
|
|
|
|
|
|
RickZeeland wrote: That was a very nice act of yourself
What? Act? Are you a painter? Or a pervert? Both?
I have lived with several Zen masters - all of them were cats.
|
|
|
|
|
I'm a painting, perverse, populistic, programming pirate
|
|
|
|
|
A pirate? Do you program in Arrr ?
|
|
|
|
|
|
And I thought that pirates run on rum.[^]
Quote: Functional language, easily extensible and possible (Lua features with LISP syntax and functional) to be embarked on software Go!
The words, I hear them, but what do they want to tell us with them? LISP is the software equivalent of waterboarding.
I have lived with several Zen masters - all of them were cats.
|
|
|
|
|
Long ago I had to do some AutoLisp in AutoCAD 10, most dreadful experience ever !
But rum appeals to me somehow, why would that be, harrr ?
|
|
|
|
|
well you know, some days you don't 'feel yourself.' so I guess the other days you do.
Signature ready for installation. Please Reboot now.
|
|
|
|
|
That's no excuse for commenting on the act paintings he's got of me. Where did he get them anyway? Shouldn't I have been there when they were made?
I have lived with several Zen masters - all of them were cats.
|
|
|
|