|
Can you help me get the code to work?
|
|
|
|
|
Isn't it working for you?
---
|
|
|
|
|
None of these uploaded programs works. It's nothing but a waste of time.
|
|
|
|
|
Very interesting article. In the future I want to try experimenting with the GPU. Well done.
|
|
|
|
|
How exactly did you obtain the weight files using Mike's code?
|
|
|
|
|
Hi.
I analyzed the execution of the program and found that computing the fourth layer (the function executeThirdLayer) is the program's bottleneck.
I'm curious why the implementation above does not use multiple threads in each block when computing the fourth and fifth layers; because of that, the benefit of parallelism is not fully exposed.
I'm not familiar with neural networks, so I'm not sure whether this unparallelized code reflects an inherent limit of the NN problem or not.
Could you give me any comment?
Thank you.
|
|
|
|
|
When paint2.exe calls NN.exe, the application crashes...
The same happens when NN.exe is run on its own.
|
|
|
|
|
I can build everything, but every time I run it, it gives the answer 2 no matter what number I draw. It even does that if I delete all the data files. I'll try rebooting in case the weights have been loaded into the graphics card and are not being refreshed.
|
|
|
|
|
I like the program, but the CUDA code is not optimized - no optimization strategies were applied. I optimized the executeThirdLayer kernel to see the difference between optimized and unoptimized code. What I changed:
- Weights in each block (neuron) are padded to 1280 bytes - this makes the reads coalesced.
- Partial sums are copied to shared memory and then summed by one thread in each block (neuron).
In cudaprof, the standard method took 3 ms to complete. My optimized method took 0.3 ms!
It is easy to write a program for CUDA, but it's difficult to make it run really fast.
My project and code are here:
http://rapidshare.com/files/295861492/NeuralNetworks_CUDA.zip.html
You can switch between the standard and optimized versions by commenting/uncommenting "#define ROMANOWSKI_THIRD_LAYER".
|
|
|
|
|
Error
To download this file, the uploader either needs to transfer this file into his/her Collector's Account, or upload the file again. The file can later be moved to a Collector's Account. The uploader just needs to click the delete link of the file to get further information.
|
|
|
|
|
The link you gave us cannot be accessed. Could you send the code to my email? winter.liuzy@gmail.com
|
|
|
|
|
Please, would you help me prepare my project on handwritten recognition of words or letters with a TDNN (time-delay neural network)? The language is C# or...
Thank you very much.
Contact me at:
Doc_dh23000@hotmail.com
|
|
|
|
|
Hello,
Just to let you know, we have published a paper on our GPGPU performance simulator, called GPGPU-Sim. You can find it available for download at www.gpgpu-sim.org .
It has become quite popular in the last little while, and a lot of people have been asking for the applications that we used as benchmarks in our paper, "Analyzing CUDA Workloads using a Detailed GPU Simulator" (http://www.ece.ubc.ca/~aamodt/papers/gpgpusim.ispass09.pdf), where we used your Neural Network application as one of our benchmarks.
We are currently asking the CUDA application authors whether we could package their source code together to make available for download along with our GPU simulator. Would it be OK to do so?
Just to let you know, in our paper, we slightly modified your code to increase the amount of thread-level parallelism by allowing concurrent execution on multiple digits.
George
|
|
|
|
|
Great, thanks to you.
You may be interested in another neural network application: see Sharky Neural Network.
This is free software for playing with neural network classification (for Windows XP/Vista).
You can watch the network's results during learning like a movie - a live view.
You may also be interested in another CodeProject article: Neural Network Classifier
Regards,
SharkTime.com
|
|
|
|
|
Hello!
I implemented your method on MNIST with Mr. Mike O'Neil's weights, but I cannot get the accuracy that Mr. Mike O'Neil reports; only about one in ten digits is recognized correctly.
Could you email me your recognition results on the MNIST test set?
Thanks!
Email: hanxiaoxue724@hotmail.com
|
|
|
|
|
I think it is possible to train the convolutional neural network in CUDA; it only needs more shared memory.
We can compute every delta value separately and finally sum them up in the kernel function, with one kernel function per layer, or some other way. Does anyone else agree with me?
As for performance, I think CUDA is the better architecture for neural network computation - more like a brain than a PC.
|
|
|
|
|
Where can I get Layer_1.neu, Layer_2.neu, ...?
I can't get it to work. I use a 9600 GT and the SDK has been set up.
Would someone kindly tell me how I can get it to work?
My English is poor; I hope my words will be understood.
modified on Wednesday, December 3, 2008 8:40 PM
|
|
|
|
|
I got it.
Layer_1.neu, Layer_2.neu, ... are created by nn.exe, but nn.exe couldn't be executed on my system.
I modified the main function in nn.cu and relinked nn.cu, and then it worked - but I don't know why:
int main(int argc, char** argv)
{
    NeuralNetwork();
    //CUT_EXIT(argc, argv);
    return 0;
}
|
|
|
|
|
I hope it can be used to train the network.
|
|
|
|
|
You are missing the cudart.dll file. Download it from... Google.
|
|
|
|
|
Hi,
I downloaded NN.cu and NN_kernel.cu.
I modified NN.cu so that it prints out the output as follows:
for (int a = 0; a < 10; a++)
{
    printf("output[%d]=%f\n", a, Layer5_Neurons_CPU[a]);
    outputLayer[a] = (double)Layer5_Neurons_CPU[a];
}
output(outputLayer);
However, upon execution (I'm using an 8600GTS), I get the following results:
output[0]=nan
output[1]=nan
output[2]=nan
output[3]=nan
output[4]=nan
output[5]=nan
output[6]=nan
output[7]=nan
output[8]=nan
output[9]=nan
I'm not sure what the correct results are. Are they what the previous poster (AIgpu) got?
I noticed you provided what I assume to be intermediate outputs, e.g. layer_1.neu, layer_2.neu, layer_3.neu, layer_4.neu.
Would it be possible to upload your copies of those files so I can diff against what I got and debug them as necessary?
Thank you,
George
|
|
|
|
|
My bad, turns out I needed the files included with the GUI source code (e.g. lw1.wei to lw4.wei and in.neu). Now I get the same results as AIGPU.
I noticed that in your implementation you used at most 1250 threads. Since the 8800GTX can support over 12k and the GTX260 and GTX280 many more, would it be possible to further parallelize your implementation? Perhaps an easy way would be to recognize multiple digits/characters at once. I was wondering if you've done any further work on that front?
I would be very interested if you had such an implementation, since it would directly help with the research that I am doing.
|
|
|
|
|
It seems that each neural node is handled by a single thread in the program, so for the digit recognition program I guess there is no need to use more threads, and I don't know how you would.
Neural network computing is not a completely parallel procedure: you have to feed the result of the first layer into the second layer as its input.
If you can think of a way to finish all the computation within a single layer, I think maybe you can make use of more threads.
Or maybe you can try a more complex neural network, i.e. more layers and larger feature maps - but will the accuracy increase? I doubt it.
|
|
|
|
|
I would imagine one possible way to do it is to replicate all the layers so that it works on multiple digits at once.
This leads to my next question:
I'm not quite sure how to interpret the output of the NN kernel.
I assume the input digit is specified by Layer1_Neurons_CPU[]:
float Layer1_Neurons_CPU[29*29]={
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,0,0,0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
I assume this is a '2'. The outputs of the program (e.g. the output[] array) all seem to be close to either positive or negative one.
Does this have something to do with the program's 'certainty' in recognizing the digit?
In that case, how can I tell what the program actually thinks the number is?
|
|
|
|
|
The output is a set of 10 numbers; the index of the largest value indicates the result.
You should read the reference article that I mentioned at the top.
|
|
|
|
|