Click here to Skip to main content
15,893,381 members
Articles / Programming Languages / CUDA

Base64 Encoding on a GPU

Rate me:
Please Sign up or sign in to vote.
4.89/5 (26 votes)
16 Sep 2013LGPL310 min read 64K   1.8K   56  
Performing base64 encoding on a graphics processing unit using CUDAfy.NET (CUDA in .NET).
CUDAfy .NET SDK V1.26 - General Purpose GPU programming for .NET
Copyright 2011-13 Hybrid DSP Systems
www.hybriddsp.com

Description
-----------
CUDAfy.NET is a .NET 4.0 library that allows writing of NVIDIA CUDA and (Intel/AMD/NVIDIA/Altera, etc) OpenCL applications from with .NET. There are no separate CUDA cu files or complex set-up procedures to launch GPU device functions. It follows the CUDA programming model and any knowledge gained from tutorials or books on CUDA can be easily transferred to CUDAfy, only in a clean .NET fashion.

Licenses
--------
The LGPL v2.1 License applies to CUDAfy .NET. If you wish to modify the code then changes should be re-submitted to Hybrid DSP.  If you wish to incorporate Cudafy.NET into your own application instead of redistributing the dll's then please consider a commerical license. Visit www.hybriddsp.com. This will also provide you with priority support and contribute to on-going development.

The MIT license applies to ILSpy, NRefactory and ICSharpCode.Decompiler (Copyright (c) 2011 AlphaSierraPapa for the SharpDevelop team).
Mono.Cecil also uses the MIT license (Copyright JB Evain).
CUDA.NET is a free for use license (Copyright Company for Advanced Supercomputing Solutions Ltd)
The BSD v2 License applies to SimpleBarrier. Author: Thomas W. Christopher

The MIT license applies to Cristi Potlog's Controls for .NET used in installer.


Installation Instructions
-------------------------

Download and install in default locations the CUDA Toolkit 5.5 from http://developer.nvidia.com/cuda-toolkit-sdk

Demo Visual Studo projects are provided.  
Cudafy by Example is based on the book CUDA By Example by Sanders and Kandrot and is copyright NVIDIA. Reading this book is 
highly recommended.

CudafyModuleViewer can be used to examine Cudafy modules (*.cdfy)


Releases
--------
V1.26 16-09-13
	Fix: Select correct ProgramFiles for CUDA for 32-bit app on 64-bit OS
	Fix: Checksum not compared correctly when platform set to Auto.
	Fix: Deserialize from stream.

V1.25 02-08-13
	Fix: Typo in xml comment of GThread.
	Fix: Use of eCudafyAddressSpace attribute on parameters (needed for OpenCL).
	Fix: Shared memory arrays of structs not translated correct for OpenCL.
	Fix: Cudafying reported by some users to appear to freeze when calling nvcc. 
	Chg: Make HostAllocate for OpenCL 4096 byte aligned.
	Chg: OpenCL Linux so name
	Add: LinuxDebug configuration to Cloo.VS2010.csproj
	Add: More flexible Cudafy method using CompileProperties (testing to be 		finalized).
	Add: Dynamic Parallelism support (alpha).
	Add: Ability to GetDevice based on architecture
	Add: Binary (cubin) support.
	Chg: Cudafy.Host.Unit tests.
	Chg: Update cudafybyexample
	Chg: If no arch specified fall back on CudafyModes.Lanaguage prop
	Add: CUDA 5.5 (required for RT API)
	Add: New *.cdfy files (not backward compatible)
	Add: Clone method to CudafyModule to allow easier loading of same module to 		multiple GPUs.
	Add: Support GetDeviceProperties advanced for both current and previous CUDA 		release.
	Add: .NET 3.5 support.
	Fix: Delete generated code.
	Add: CUDA 5.5 support.

V1.22 07-05-13
	Fix: Handle non-public members
	Add: OpenCL as an architecture
	Add: Support for Infinite and NaN

V1.21 Alpha - 10-04-13
	Fix: Handing of structs in OpenCL
	Fix: Make GMath.PI and GMath.E const
	Fix: Formatting of floating points

V1.20 Beta - 02-03-13
	Add: OpenCL support - target AMD GPUs and Intel CPUs
	Add: Integer intrinsics
	Add: Can now specify a null in list of types to cudafy.
	Fix: GMath.E and GMath.PI did not translate to floating point versions
	Fix: Unicode handling for CUDA when writing a string on device
	Fix: Nested structs and structs referenced within a struct broken

V1.12 - 08-11-12
	Fix: CudafyTranslator support for 2.1 and 3.5 architectures.
	Fix: Some device properties incorrect for CUDA 5.0.
	Fix: CudafyHost.GetDeviceProperties failed if device was not already instantiated.
	Add: Strongly typed launches (thanks to P. Geerkens).
	Add: GetArchitecture method to GPGPU. 

V1.11 - 30-10-12
	Add: Support for CUDA 5 Production Release.
	Fix: Compilation could enter infinite loop for some versions of CUDA.
	Fix: warpSize not properly translated.

V1.10 - 26-08-12
	Add: Support for CUDA 5 RC (required if using Maths libraries).
	Fix: Lock method when multi-threading enabled could dead-lock.
	Add: Architecture sm_35.
	Add: Support for context switching.
	Fix: Translation of PI and E must be done using InvariantCulture.
	Add: tcc driver property (HighPerformanceDriver).
	Add: GetDevice always sets the current context to the device context that was got.
	Add: Device to device copies.
	Add: Async on device copies.
	Fix: Device properties were not always correctly returned.


V1.9 - 23-04-12
	Add: New copy to device and copy from device overloads.
        Add: CUDA 4.2 support (Maths libraries must use CUDA 4.2).
	Add: Architectures sm_21 and sm_30 (Kepler).

V1.8 - 15-02-12
	Fix: All relevant CUDA.NET functions now use SizeT.
	Fix: CUSPARSE wrapper and unit tests.
	Fix: GMathUnitTests could access uninitialised memory.
	Add: Disallow passing arrays by ref.
	Add: Support for CUDA 4.1 Maths libraries (4.0 support dropped).
	Fix: Dummy struct with parameterless constructor.
	Fix: Check in CudafyTranslator caused cudafycl to fail on some structs.
	Add: GThread Ballot, SyncThreadsCount, Any, All.
	Add: Examples for SyncThreadCount and Ballot.
	Add: Debug.WriteLine and Console.WriteLine support.
	Add: Debug.Assert support.
	Add: New CUDA 4.1 error codes.	
	Fix: CudafyLanguageException inherits from CudafyException.
	Fix: GPGPU destructor should not set IsDisposed flag and should not be removed from CudafyHost store.	
	Add: Allow CudafyIgnore to be placed on struct constructors.	

V1.7 - 13-12-11
	Fix: Compiler options for specific CUDA version not handled correctly.
	Fix: Context switching to use SetCurrentContext and GetCurrentContext.

V1.6 - 09-12-11
	Chg: Use DefaultEncoding for generated CUDA C files.
	Add: Basic support for Unicode strings.
	Fix: In unit test TestASUMInVectorStep, N should be divided by 2. 
	Fix: Wrong code generated for 2D indexing when an indexer is a statement (e.g. [x+1,y+1]).
	Fix: CUDARuntime class now explicitly supports 64-bit only (cudart64_40_17.dll). Support for 32-bit can be given by 	manually altering DLL_NAME in source file. 
	Fix: Checksum check should also ensure that a suitable PTX exists. 
	Add: SPARSE Matrix library Level 1,2,3 for real numbers.
	Add: Conjugate Gradient solver.
	Add: Biconjugate gradient stabilized method (BiCGSTAB) for sparse linear system solver.
	Add: cudafy command line tool (translates and embeds CudafyModule in .NET assembly)
	Add: cudafycl unit tests.
	Fix: StringTests unit test did not run if CudafyModule was unchanged after Deserialize.
	Fix: Struct methods that return void were not translated correctly, resulting in nvcc compile error.
	Add: Deserialize CudafyModule from Stream.
	Add: SuppressWindow property in CudafyModule (stops command windows showing)
	Add: Embedded cudafy module support in CudafyModule and helper extensions for Assembly class. 	
	Add: Transparently handle checked/unchecked expressions - allows VB projects to run without Remove integer overflow 	checks option being enabled. 

V1.5 - 25-10-11
	Fix: Destination offset was set to source offset value in CopyOnHost.
	Fix: CudaGPU Free methods did not correctly switch context.
	Fix: In generated CUDA C code do not print @ sign infront of reserved words.
	Fix: Emulator now behaves like CudaGPU for illegal multithreaded use.
	Add: IsMultithreadingEnabled property.
	Add: Smart copy - easy asynchronous copying to and from GPU. Call EnableSmartCopy then use overloads of 		CopyXXXDeviceAsync().

V1.4 - 11-07-11
	Add: GenerateDebug flag to CudafyTranslator and CudafyModule - allows Parallel NSight Debugging.
	Add: Ability to pass GThread between device functions.
	Add: Max and Min support to Math and GMath.
	Fix: Math.Abs translation.
	Fix: Int64 and UInt64 not translated correctly.
	Fix: Catch kernels without any members in TryVerifyChecksums and VerifyChecksums.
	Add: Enum support (not flags).
	Fix: 32-bit and 64-bit cross compiler options.
	Add: Constant memory offset and size when transferring from host to device.
	Fix: Create default constructor for structs.
	Add: New icon to CudafyModuleViewer.
	Add: CudafyIgnore attribute to allow ignoring of properties and methods of structs.
	Add: Support for .NET pointers ('fixed' statements).
	Add: Support for fixed pointers within structs.
	

V1.3 - 21-06-11
	Change	: Separate ILSpy and Mono.Cecil dlls removed - functionality placed in Cudafy.Translator
	Fix	: DeviceProperties struct was incorrect layout results in some parameters being filled wrongly

V1.2 - 15-06-11
	Change	: CudafyByExample code now uses dynamic run-time to launch functions
	Add	: Extension methods to IntPtr to allow reading and writing of managed array
	Add	: Default parameter to delete generated code set to false
	Fix	: Dynamic launcher with zero arguments must set stream id to -1
	Add	: CUBLAS 4.0 Wrapper for Level 1 (beta)
	Add	: CUFFT 4.0 Wrapper (beta)
	Add	: CURAND 4.0 host and kernel Wrapper (beta)
	Add	: Auto resolving of 32-bit or 64-bit CUBLAS, CURAND, CUFFT and CUDART (no need to copy and rename dll's)
	Fix	: Emulated FFT destructor could cause crash
	Add	: Casting one array to another in terms of offset, size, rank, type (not fully supported for emulation)
	Fix	: Handling of the zero stream
	Add	: Additional overloads to the CudafyTranslator.Cudafy method


V1.1 - 23-05-11
	Add	: CUDA 4.0 RC Support
	Add	: CudafyTranslator (built on ILSpy)
	Removed	: .NET Reflector requirements
	Add	: CUBLAS Wrapper (Alpha)
	Add	: CUFFT Wrapper
	Change	: CudafyByExample and CudafyExamples updated to use CudafyTranslator.
	Add	: Char (Unicode) + String copy to/from device support
	Add	: Multiple kernel module support
	Add	: Casting between 1D, 2D and 3D arrays
	Add	: Foreach support
	Change	: Support Any CPU
	Add	: Dynamic Launcher (DLR)
 

V1.0 - 30-03-11
	Add	: 64-bit support to CUDA.NET and for cudafy module compilation.
	Add	: Support multiple platform targets.
	Add	: CudafyReflectorWrapper to enable cudafying from directly within user application.
	Add	: Multithreading support (call GPGPU.EnableMultithreading()).
	Add	: Overloaded Cudafy, Serialize and Deserialize methods that use name of calling type.
	Change	: Simpilify design of GPGPU, CudaGPU and EmulatedGPU.
	Add	: GPGPU supports Dispose.
	Add	: CudafyHost.CreateGPU method.
	Add	: WarpSize property to GThread.
	Change	: CudafyByExample and CudafyExamples updated to support CudafyReflectorWrapper use.
	

V0.3 - 17-03-11	
	Fix	: .NET Reflector Add-in could put output files in wrong folder when calling nvcc.exe
	Fix	: Cudafy Module Viewer - resizing window reveals bad anchoring of sub components.
	Fix	: CUDA.NET XML documentation was missing.
	Change	: Make x86 default configuration for example solutions.
	Change	: All examples except hist_gpu_shmem_atomics are pre-compiled for compute capability 1.1.
	Add	: Check for compute capability 1.2 or higher in hist_gpu_shmem_atomics. 
	Add	: Link in documents folder to CUDAfy web page.

V0.3 - 13-03-11	
	Add	: Support for .NET Reflector version 7


Known Issues and Limitations
----------------------------

Large numbers of threads per block can result in very slow emulation - for debugging on CPU temporarily minimize this.

Out parameters are not supported - instead pass an array of length 1.

The following constructs are not currently supported for translation:

	Try-Catch-Finally
	Events
	Delegates
	Pinned types
	Optional modifiers
	Volatile
	Modifiers
	Function pointers
	Properties
	Abstract methods
	Virtual methods
	Params attribute
	Jagged arrays
	Generic parameter constraints
	Default expressions
	Type of typed reference expressions
	Value of typed reference expressions
	Typed reference create expressions
	Method Of expressions
	Base reference expressions
	Try cast expressions (as)
	Can cast expressions (is)
	Null coalescing expressions
	Delegate create expressions
	Property indexer expressions
	Event reference expressions
	Delegate invoke expressions
	Stack allocate expressions
	Lambda expressions
	Query expressions
	Snippet expressions
	Memory copy statements
	Memory initialize statements
	Debug break statements
	Lock statements
	Using statements
	Fixed statements
	Non variable declarations in a block expression
	Attach event statements
	Remove event statements
	Resources
	Arrays of more than 3 dimensions
	Enumerations
	Interfaces
	Nested types
	Operator overloading
	Overloaded Methods
	Classes (individual methods, constants and blittable structs are)

Furthermore only blittable and value types are supported. For example:
	Byte, SByte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Char, Single, Double

For convenience the following complex types are used.  They are directly equivalent to the CUDA types:
	ComplexF, ComplexD (cuFloatComplex and cuDoubleComplex)

Most Math methods are supported. Some .NET Math methods only support doubles. 
If you are using floats then this means an unnecessary cast will be inserted in the generated CUDA code. 
In these cases use GMath.  

Array: Length, GetLength and Rank members are supported.

Be sure to read the output of the NVIDIA compiler since the CUDAfy language will not catch all illegal
settings.

Targeting OpenCL is more restrictive than CUDA. Constant strings, methods in structs,
the Cudafy.Maths libraries, etc are not supported. Consult the unit tests and CudafyExamples
on cudafy.codeplex.com for more information on what is and is not supported.



Acknowledgements
----------------

Hybrid DSP Systems would like to acknowledge the following:

GASS for CUDA.NET (http://www.hoopoe-cloud.com/Solutions/CUDA.NET/Default.aspx)

CUDA, NSight and 'CUDA By Example' (Sanders and Kandrot) are copyright NVIDIA Corporation (http://www.nvidia.com)

Daniel Grunwald and SharpDevelop for ILSpy (http://wiki.sharpdevelop.net/DanielGrunwald.ashx)

JB Evian for Mono.Cecil (http://www.mono-project.com/Cecil)

Tamas Szalay for Managed FFTW Wrapper - Copyright 2006 - Tamas Szalay (http://www.sdss.jhu.edu/~tamas/bytes/fftwcsharp.html)

Thomas W. Christopher for SimpleBarrier

By viewing downloads associated with this article you agree to the Terms of Service and the article's licence.

If a file you wish to view isn't highlighted, and is a text file (not binary), please let us know and we'll add colourisation support for it.

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)


Written By
Systems Engineer Hybrid DSP Systems
Netherlands Netherlands
Nick is co owner of Hybrid DSP, a company specialized in high speed data acquisition, processing and storage.

CUDAfy.NET took considerable effort to develop and we ask nothing in return from users of the LGPL library other than that you please consider donating to Harmony through Education. This small charity helps handicapped children in developing countries by providing suitable schooling.

Comments and Discussions