I have a signal-processing algorithm that uses the FFT.
The algorithm is implemented as a single module (a class), and it can be parallelized: up to n instances of it can be created and run in parallel. So far, I have implemented it on the CPU.
Following this approach, I want to write a kernel function that runs the n module instances on the GPU, but I ran into a problem.
After some searching and looking through a number of example projects, I realized that the FFT functions in CUDA (cuFFT) apparently can only be called from the host side; they cannot be called from device code, and therefore not from inside a kernel function.
This does not work for my algorithm, which needs to compute FFTs many times and use their results.
So my question is: what is the point of having the cuFFT library in CUDA if it cannot be used inside a kernel?
If no solution is found, the algorithm may have to be changed, and then it may no longer be possible to parallelize it this way.
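For comparison, here is a minimal sketch of the host-side calling pattern described above, assuming each of the n instances works on its own contiguous, equal-length complex signal. The cuFFT calls stay on the host, but a single batched plan (`cufftPlanMany` with `batch = n`) transforms all n signals on the GPU in one call, and an ordinary kernel runs between the forward and inverse transforms. The names `processSpectra` and `roundTripMaxError`, the sizes `len` and `n`, the test signal, and the scaling step are all hypothetical stand-ins, not part of the original algorithm.

```cuda
// Sketch only: n independent FFTs of length len, all launched from the host
// through one batched cuFFT plan. len, n, the test signal and processSpectra
// are hypothetical stand-ins for the real algorithm.
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

#define CUDA_CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_), __FILE__, __LINE__); \
    std::exit(1); } } while (0)
#define CUFFT_CHECK(call) do { cufftResult r_ = (call); if (r_ != CUFFT_SUCCESS) { \
    std::fprintf(stderr, "cuFFT error %d (%s:%d)\n", (int)r_, __FILE__, __LINE__); \
    std::exit(1); } } while (0)

// Hypothetical per-instance step, run as an ordinary kernel between FFTs.
// One thread per complex sample; sample i belongs to instance i / len.
// Here it only applies the 1/len normalisation that cuFFT leaves out.
__global__ void processSpectra(cufftComplex* data, int len, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len * n) {
        float s = 1.0f / len;
        data[i].x *= s;
        data[i].y *= s;
    }
}

// Forward FFT -> kernel -> inverse FFT for n instances; returns the largest
// deviation from the input (should be tiny, since the round trip is identity).
float roundTripMaxError(int len, int n) {
    const size_t total = (size_t)len * n;
    const size_t bytes = total * sizeof(cufftComplex);

    // n signals stored back to back: instance k occupies [k*len, (k+1)*len).
    std::vector<cufftComplex> input(total), output(total);
    for (size_t i = 0; i < total; ++i) {
        input[i].x = (float)(i % 17);
        input[i].y = 0.0f;
    }

    cufftComplex* dev = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&dev, bytes));
    CUDA_CHECK(cudaMemcpy(dev, input.data(), bytes, cudaMemcpyHostToDevice));

    // One 1-D plan covering all n transforms (batch = n). NULL embed
    // pointers select the basic contiguous layout, len samples apart.
    cufftHandle plan;
    int dims[1] = {len};
    CUFFT_CHECK(cufftPlanMany(&plan, 1, dims, nullptr, 1, len,
                              nullptr, 1, len, CUFFT_C2C, n));

    // All three steps are issued from the host on the default stream,
    // so they execute on the GPU in order.
    CUFFT_CHECK(cufftExecC2C(plan, dev, dev, CUFFT_FORWARD));
    const int threads = 256;
    const int blocks = (int)((total + threads - 1) / threads);
    processSpectra<<<blocks, threads>>>(dev, len, n);
    CUDA_CHECK(cudaGetLastError());
    CUFFT_CHECK(cufftExecC2C(plan, dev, dev, CUFFT_INVERSE));

    // cudaMemcpy on the default stream waits for the work above to finish.
    CUDA_CHECK(cudaMemcpy(output.data(), dev, bytes, cudaMemcpyDeviceToHost));
    CUFFT_CHECK(cufftDestroy(plan));
    CUDA_CHECK(cudaFree(dev));

    float maxErr = 0.0f;
    for (size_t i = 0; i < total; ++i) {
        maxErr = std::fmax(maxErr, std::fabs(output[i].x - input[i].x));
        maxErr = std::fmax(maxErr, std::fabs(output[i].y - input[i].y));
    }
    return maxErr;
}

int main() {
    float err = roundTripMaxError(1024, 8);  // len = 1024, n = 8 (assumed)
    std::printf("max round-trip error: %g\n", err);
    assert(err < 1e-3f);
    return 0;
}
```

This would be built with something like `nvcc batched_fft.cu -lcufft` (the file name is arbitrary). Because cuFFT's inverse transform is unnormalized, scaling by 1/len in the kernel makes the round trip reproduce the input, so the printed error should be close to zero. The design choice here is to move the "n instances in parallel" dimension into the batch count of one host-side FFT call rather than into a single kernel.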
Thank you for your attention.
What I have tried:
I have implemented a CPU-based version, and it works well.
Now I want to run this module on the GPU.