Nvidia PInvoke F#Win32 Intermediate Dev Windows Visual Basic C#

F# Versus Mathematics: Part One - Getting Started with BLAS and LAPACK

Dan Buskirk

4.89/5 (12 votes)

Mar 24, 2011

CPOL

11 min read

30749

409

Accessing Industry-Standard Linear Algebra Libraries from .NET Code

Download fundamentals - 1.99 MB

Introduction

There are hundreds of reasons to become interested in linear algebra, ranging from basic research to engineering to econometrics and financial analysis. Whatever reason draws a .NET developer to this fascinating branch of mathematics, there are many possible paths to follow. Unfortunately, the decision must often be made at the beginning of the journey, when one may not have the experience to choose appropriately. While many linear algebra libraries have been developed in pure .NET code, we shall examine the use of the well known BLAS and LINPACK libraries. These libraries have a long history with an established record of reliability and performance. Unfortunately, the libraries are large, and their nomenclature might seem obscure to the beginner who might then be put off. We shall attempt to clarify the organization and use of these libraries in general as we develop our .NET interface library.

Although we will be using F# in our example code, our goal will be to create an interface to BLAS and LINPACK which can be used by C#, Visual Basic, or any other .NET language one might prefer.

Background

BLAS, which stands for Basic Linear Algebra Subroutines, was first developed in 1979 and was programmed in Fortran. Today, the acronym "BLAS" sometimes refers to this library, sometimes to similar libraries developed in other languages, and sometimes to an ad-hoc 'standard' which alternative libraries attempt to implement. In this article, we shall use BLAS in its original sense, to refer to the Fortran library which is still an active project and can be accessed at www.NetLib.org. Originally, BLAS was often used in conjunction with LINPACK. The organization was straightforward; core functionality, like the multiplication of matrices and vectors, is found in BLAS. More complex functionality, like computing eigenvalues, was implemented in LINPACK.

You may have noticed that improved computer hardware has become available since 1979. LAPACK was developed to take advantage of new hardware in a way that LINPACK did not, particularly in parallelism. Today, BLAS/LAPACK sets the standards for numerical linear algebra, with other packages supplementing them and optimizing them, but not replacing them. In general, whether one says "BLAS" or "LAPACK", one is often talking about the pair. As its name suggests, BLAS is fundamental; LAPACK builds on the functionality of BLAS.

This project was stimulated by some F# code in the unit tests provided with the F# Powerpack source, which can be downloaded from www.codeplex.com. Not surprisingly, the organization of a test does not lend itself to a library which will grow and require maintenance. Unfortunately some of the test code also appears to be wrong, but I assume this went unnoticed since the code was used to test native pointer functionality, not itself. Portions of the code presented here, notably the wrappers found in NativeUtilities.fs, are virtually unchanged and are marked with a Microsoft copyright notice to indicate that they fall under Microsoft's CodePlex licensing policy.

Using the Code

In Part One, we will implement the multiplication of real matrices and vectors, which means we will really only need BLAS. This will permit us to focus on the library organization and on the structure we will need to call a Fortran library from .NET. In Part Two, we will extend the library a bit to include complex matrices and vectors, as well as some more sophisticated calculations, including the calculation of eigenvalues and eigenvectors, and the solution of systems of linear equations. The organization of the library will allow us to switch libraries in the future if needed. For example, if we may wish to move the BLAS functions to CUBLAS, which is the library provided by NVidia to do super fast matrix operations on its video cards or on one of its highly parallel floating point processor devices such as Tesla.

We will need a copy of the BLAS and LINPACK libraries, of course. Netlib has downloadable Windows executables. If you wish, you can compile the source yourself, but this involves installing and using the gnu compiler suite on Windows and is not for the faint of heart. Compiled Windows DLLs (32-bit) for LAPACK 3.3.0 are included in the zip file for this project. For convenience (mine, not yours), I did not include the most recent LAPACK extensions in this compilation, but by the time you need them, you will be an expert anyway.

Our library will, to a great degree, follow the BLAS and LAPACK nomenclature. Most .NET developers would rather call a function "DoubleMatrixMultiply" rather than DGEMM, and rightfully so. But then, you need a procedure for multiplying tridiagonal matrices, and another for triangular matrices. Ironically, good function names can become poor function names if you have to think of over a thousand. BLAS and LINPACK have a well thought-out convention, but one which begins to seem natural only after a bit of struggle. We will use the standard names to indicate which F# functions are ultimately relying on which BLAS/LAPACK functions. In the code here in Part One, we will use only DGEMM and DGEMV. The "D" means double floating point. The "G" means general. In other words, the matrices which are not assumed to have any special characteristic, perhaps tridiagonal for instance. "MM" means matrix-by-matrix and "MV" means matrix-by-vector, but you figured that out already.

In Fortran, all function arguments are passed by reference. This means we shall get quite our fill of pointers, even for the integers. Another complexity is that, in contrast with the .NET Framework and the C language, Fortran arrays are column-major. This means that when a two-dimensional array is laid out in RAM, the value for row 0 column 0 is followed by row 1 column 0. In .NET, an array in RAM goes row 0 column 0, row 0 column 1, and so forth. We will need to address this issue. When we create our declaration for DGEMM and DGEMV, we notice that there seem to be lots and lots of potentially redundant arguments. These are provided to give the functions great flexibility, but we shall try to keep things simple here. The good news is that once you become familiar with the meaning of the declared parameters, the same ones turn up in lots of other functions as well and the job gets easier.

// dgemm ****************************************************************************
// LAPACK/BLAS primitive matrix/matrix multiply routine 
[<DllImport(@"blas.dll", EntryPoint="dgemm_", 
		CallingConvention=CallingConvention.Cdecl)>] 
extern void DGEMM(char* transa, char* transb,
                  int* m, int* n, int* k, 
                  double* alpha, double* A, int* lda,double* B, 
			int* ldb, double* beta, 
                  double* C, int* ldc);

The GNU toolchain likes to put the underscore at the end of the function name when compiling an unmanaged Windows DLL and our declaration must match exactly. The function name is, of course, case sensitive. In this declaration, we specify the calling convention, but since Cdecl is the default, this is more for self-documentation than a requirement. Alternative calling conventions cease to exist in 64-bit Windows, for what it's worth.

The first two arguments, transa and transb are characters and have the values 't' or 'n' depending on whether you want to use matrices A and B transposed or not. Though not appropriate here, 'c' is also a possible value which can be used to request the conjugate transpose of a complex matrix. These values are not case sensitive.

'm' is the number of rows of matrix A, and hence also of the result C. 'n' is the number of columns of matrix B and hence of C. 'k' is the number of columns of matrix A and so it must also be the number of rows of B.

We'll come back to alpha and beta in a short while. "double* A" is the first double floating-point matrix and 'lda' is the leading dimension of A. In our work at present, lda will always equal m. Similarly, 'double* B' is the second matrix and 'ldb' is the leading dimension of B, once again in our work here equal to k.

'double* C' is the matrix in which we will find our result matrix and 'ldc' is its leading dimension. C can also provide input, however. This is where alpha and beta come in. DGEMM calculates the formula (alpha * A * B + beta * C). In other words, it multiplies the product of A and B by the scalar alpha. Generally, we want alpha to be 1.0. This result is added to the scalar beta times whatever contents of C are provided as input... Usually, we want beta to be 0.0. The results are placed in C.

Now that we have our declaration, we are ready to call it from F#. The organization of our library will differ a bit from what you are used to. There will be one layer, represented by LapackAPI.fs, which will contain all the declarations. At the other end, there will be two separate files for our library interface. One, fsLapack.fs, will contain the convenient wrapper functions for use of the F# programmer. Its sister file, clsLapack.fs, will contain an interface which can be called without any use of F#-specific types. This will be usable directly by the C# or Visual Basic programmer. Note the unfortunate prefix. Though lots of folks use "cls" as a Hungarian prefix for "class", here I am using it to indicate the functions accessible by any CLS language.

Between these two layers, there will be a separate file for each BLAS/LAPACK function. Though this may seem excessive, as more functions are added, it becomes more difficult to follow the call sequence from interface to API call. Having a separate file for each BLAS/LAPACK function greatly eases debugging and maintenance.

The following code illustrates a standard pattern for the project. The function DGEMM directly calls the function declaration in LapackAPI with the same name. Notice that this function takes column-major Fortran matrices as input. The second function, decorated with _C, accepts F# C-style matrices and prepares them for export to Fortran. Since all of these matrices serve as both input and output, the marshalling mechanisms will set everything right before we use the results from our .NET code.

Note the plethora of "mutable" declarations in DGEMM. We have said that everything is passed to Fortran by reference, and F# will not let you obtain a pointer to a type unless the type is mutable. Even if you don't need to "mute" it.

let DGEMM trans alpha (A: FortranMatrix<double>) 
	(B: FortranMatrix<double>) beta (C: FortranMatrix<double>) = 
  let mutable trans = trans   
  Debug.Assert(trans.ToString().ToLower() = "n") 
  Debug.Assert(A.NumRows = B.NumCols)
  let mutable beta = beta 
  let mutable alpha = alpha 
  let mutable m = A.NumRows 
  let mutable n = B.NumCols 
  let mutable k = A.NumCols 
  let mutable lda = m
  let mutable ldb = k 
  let mutable ldc = m
  LapackAPI.DGEMM(&&trans, &&trans, &&m, &&n, &&k, &&alpha, 
	A.Ptr, &&lda, B.Ptr, &&ldb, &&beta, C.Ptr, &&ldc)

   /// Here we build a version that operates on row-major data
let DGEMM_C alpha (A: CMatrix<double>) (B: CMatrix<double>) 
	beta (C: CMatrix<double>) = 
   Debug.Assert(A.NumCols = B.NumRows);
   DGEMM 'n' alpha B.NativeTranspose A.NativeTranspose beta C.NativeTranspose

If you are new to F#, you might be surprised to see two ampersands, not one, when we send value types by reference. In F#, one ampersand is for use within .NET and means "ref" or "ByRef" depending on whether you speak C# or VB. The second ampersand turns the .NET pointer into a native pointer for use with pInvoke.

We are now in a position to look into fsLapack.fs, the file which provides the interface between our BLAS/LAPACK code and the rest of the F# world. These functions are prefixed with 'fs' to indicate that. Obviously, you could change that if you like. The critical feature of these functions is that they pin the .NET matrices in RAM so that the garbage collector won't move them at an inopportune time. This is an absolute requirement for passing pointers to unmanaged code. Similar pinning is done for .NET arrays. (Matrices are defined in F#; Arrays are defined in the Common Language Specification.)

The pinning functions, found in NativeUtilities.fs, are just wrappers for the F# types PinnedArray and PinnedArray2, which themselves are just wrappers for GCHandle. If you want to master pointers and marshalling, you will become familiar with GCHandle. NativeUtitilies.fs contains code virtually unchanged from Microsoft's CodePlex test and sample code and is copyrighted accordingly.

let fsDGEMM (A: matrix) (B: matrix) = 
     let C = Matrix.zero A.NumRows B.NumCols
     let (pA,pB,pC) as ptrs = pinMMM A B C
     try dgemm.DGEMM_C 1.0 pA.NativeArray pB.NativeArray 0.0 pC.NativeArray
     finally
         freeMMM ptrs
     C

Clearly, pinMMM is the convenient wrapper that pins three matrices in a single call, and freeMMM frees the three native pointers to those matrices when we are done.

The same function can be provided with arrays as parameters, rather than matrices, so that the library may be used directly from C# or Visual Basic with no additional work.

let clsDGEMM (A:double[,]) (B:double[,]) = 
    let C = Array2D.zeroCreate (Array2D.length1 A) (Array2D.length2 B)
    let (pA,pB,pC) as ptrs = pinA2A2A2 A B C
    try dgemm.DGEMM_C 1.0 pA.NativeArray pB.NativeArray 0.0 pC.NativeArray
    finally
        freeMMM ptrs
    C

Note that a second set of pinning wrappers are provided for objects of array type.

A Warning, Perhaps an Omen

When running in the Visual Studio design environment, the Visual Studio warns of a stack imbalance. Such a warning is to be taken very seriously. If you step past the warning, the code runs normally (apparently, anyway). If you run outside the Visual Studio, script code running in fsi and compiled code execute without warning or error. I have been unable to determine how or why Visual Studio sees a stack imbalance. I have multiplied ten million pairs of matrices within a loop using this code in an attempt to manifest some stack problem, but have not uncovered any.

The Fundamentals Source Project Files

The zip file associated with this article contains Visual Studio solution files for both Visual Studio 2008 and Visual Studio 2010. There are also some batch files which will start the F# interpreted environment, fsi and load F# test scripts. Also included are some PDF files which describe some of the details you might need to attend to.

History

24^th March, 2011: Initial post