RPGM 2.0 Docs

C scripts, .c and .dll files

R is a very powerful language, but its main weakness is execution time, especially for operations such as loops.
It is, however, possible to write packages in C for R, so that an algorithm or a piece of code runs as fast as possible.
This page shows how to create a C package and how RPGM can help you.

RPGM is not needed to create and compile C packages, but doing it by hand is tedious because you constantly have to switch between your code and the command-line console for testing.
In RPGM, a Compile button automatically compiles the .c file opened in the Editor and generates the DLL file if no error occurs.

To use a C function in R code, the DLL file must first be loaded with the R function dyn.load("my_C_code.dll").
It must also be unloaded with dyn.unload("my_C_code.dll") before it is recompiled.
Otherwise, an error occurs because the file is in use and can’t be overwritten.

There are two ways to write C functions for R.

Without R objects, using .C()

This is the easiest method. The function in C will look like this:

void myCfunction(int *arg1, double *arg2, double *result)
{
}

The asterisk means that the variable is a pointer. The keyword void means that the function returns nothing. Because every argument is passed as a pointer, the function hands back its result by writing into one of its arguments.
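As a minimal illustration of why such functions take pointers, the sketch below contrasts an argument received by value with one received by pointer. The two function names are hypothetical and are not part of R or RPGM.

```c
/* Receives a copy of the caller's value: the caller's variable is untouched. */
double addOneByValue(double v)
{
    v = v + 1.0;
    return v;
}

/* Receives the address of the caller's variable: writing through the
   pointer changes the variable the caller owns. */
void addOneByPointer(double *v)
{
    *v = *v + 1.0;
}
```

Only the second form lets a void function hand a result back, which is the mechanism used by the result argument in the prototype above.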

Here is a simple C function which returns the highest value between x and y by storing it in m:

void myMax(double *x, double *y, double *m)
{
    if(*x > *y)
        *m = *x;
    else
        *m = *y;
}

Of course, R already has max(); this is only for demonstration. Now the DLL must be created.
Just press the Compile button or the F6 key and check in the output console that everything went well.
A myfunction.dll file is now in the same folder as your C source file.

To use it in R, first load the DLL with dyn.load("myfunction.dll"), then call the function with .C() like this:

.C("myMax", x=2, y=5, m=numeric(1))
$x
[1] 2

$y
[1] 5

$m
[1] 5

The first argument is the name of the C function in the DLL, followed by the arguments the function needs (x, y and m).
m is numeric(1), which creates a variable of type double and length 1 that will hold the result.
All arguments are named, e.g. x=2, because .C() returns a list of all the arguments it was given, and the names in that list are the ones used in the call.
Since the function returns a list, the m variable must be read from it:

m <- .C("myMax", x=2, y=5, m=numeric(1))$m

However, calling C functions this way can cause problems. For example, 5L means 5 stored as an integer in R, and if the myMax function is called with 2 and 5L:

.C("myMax", x=2, y=5L, m=numeric(1))$m
[1] 2

The result is wrong because the C code expects a double for y but receives an integer, so the value is misread due to the difference in number type/format. Sometimes this can even crash the R session.
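To make the number-format problem concrete, here is a small sketch in plain C (no R needed) of what presumably happens when the bytes of the integer 5 are read as a double. The helper intBytesAsDouble is hypothetical, and the sketch assumes an int is no larger than a double (typically 4 and 8 bytes). In a real R session the bytes after the integer are whatever happens to follow in memory, so the exact value can differ.

```c
#include <string.h>

/* Same function as in the text. */
void myMax(double *x, double *y, double *m)
{
    if (*x > *y)
        *m = *x;
    else
        *m = *y;
}

/* Hypothetical helper: builds the double you get when the bytes of an
   int are read as if a double started at that address. The bytes not
   covered by the int are zero here (an assumption made for the demo). */
double intBytesAsDouble(int value)
{
    double out = 0.0;                     /* all bits zero */
    memcpy(&out, &value, sizeof value);   /* overwrite only the int-sized part */
    return out;
}
```

Under these assumptions, intBytesAsDouble(5) is a tiny non-negative number far below 1, not 5.0, so myMax called with 2.0 and that value stores 2.0 in m.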
For these reasons, we need to write an R function that calls the C function:

myMax <- function(x, y)
    return(.C("myMax", x=as.double(x), y=as.double(y), m=numeric(1))$m)

myMax(2, 5L)
[1] 5

And now it works. If you want to edit or add a function in your C file, don’t forget to run dyn.unload("myfunction.dll") first, or you won’t be able to compile it again.
A vector can also be used in the C function; each element is accessible with the syntax m[0]. But some special R objects like a matrix can’t be used.
You can return a vector but you cannot return a matrix. Fortunately, matrices can be handled with the second way of writing C functions.
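For the vector case mentioned above, a minimal sketch of a function written in the same style could look like this. The name cumulativeMax and its arguments are hypothetical. Since the C code only receives a pointer, the length of the vector has to be passed as its own argument, itself a pointer to int.

```c
/* Every argument is a pointer, including the length n.
   out must have room for *n doubles; out[i] receives the largest
   value found in x[0..i]. */
void cumulativeMax(double *x, int *n, double *out)
{
    int i;

    if (*n <= 0)
        return;                 /* nothing to do for an empty vector */

    out[0] = x[0];
    for (i = 1; i < *n; i++)
        out[i] = (x[i] > out[i - 1]) ? x[i] : out[i - 1];
}
```

From R, such a function would presumably be called with x=as.double(v), n=as.integer(length(v)) and out=numeric(length(v)), and the result read from $out, for the same type reasons as with myMax.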

With R objects, using .Call()

An R object, which is actually a structure, is a SEXP in the C code:

SEXP myCfunction(SEXP arg1, SEXP arg2)
{
}

The function only takes and returns SEXP objects.
Here is an example which behaves like colSums() but, for each column of an n x m matrix, returns the max value instead of the sum:

#include <R.h>
#include <Rinternals.h>

SEXP colMaxs(SEXP M)
{
    /*  1 */ int i, j, nrow, ncol;
    /*  2 */ double *pM, *pcMax;
    /*  3 */ SEXP dimMatrix, cMax;
    /*  4 */ pM = REAL(M);
    /*  5 */ dimMatrix = getAttrib(M, R_DimSymbol);
    /*  6 */ nrow = INTEGER(dimMatrix)[0];
    /*  7 */ ncol = INTEGER(dimMatrix)[1];

    /*  8 */ PROTECT(cMax = allocVector(REALSXP, ncol));
    /*  9 */ pcMax = REAL(cMax);

    /* 10 */ for(j=0; j<ncol; j++)
             {
    /* 11 */     pcMax[j] = pM[nrow*j];
    /* 12 */     for(i=1; i<nrow; i++)
    /* 13 */         if(pM[i + nrow*j] > pcMax[j])
    /* 14 */             pcMax[j] = pM[i + nrow*j];
             }
    /* 15 */ UNPROTECT(1);
    /* 16 */ return(cMax);
}

It may look complicated at first but it’s simple:

  1. We declare the int variables. i and j are for the loops; nrow and ncol are the dimensions of the matrix parameter M.
  2. We declare two pointers. M is an object with several attributes, and pM will point to the values of the matrix. We will have to return a SEXP object of type vector; pcMax will point to the values of this vector.
  3. dimMatrix will hold the dimensions of M. It is a SEXP object. cMax will be the vector we return at the end of the function.
  4. We now make pM point to the values of M. Since M is a matrix of doubles, the function to access the values is REAL().
  5. We access the dimensions of M using getAttrib() and the value R_DimSymbol. dimMatrix is a SEXP object which contains two integers.
  6. Since dimMatrix is a SEXP object of integers, we access its values with INTEGER(). It returns a pointer, and with [0] we access the first value, the number of rows.
  7. Same as above, with [1] for the number of columns.
  8. Here we create the vector that we will return. allocVector() is the function that creates a SEXP of type vector. REALSXP gives it the type double, and ncol is its length. The function PROTECT() protects the variable from the garbage collector for the time needed; always use it.
  9. Here, as in line 4, we make pcMax point to the values of the vector just created.
  10. We start the loop over the columns.
  11. Since we look for the max of the column, we first store the first value of the column, pM[nrow*j].
  12. The loop over the remaining rows of the column.
  13. If the value of the next element in the column is greater than the highest saved…
  14. … we put it in our vector.
  15. We remove the protection of our variable.
  16. We return the variable.

Note that we access the values of the matrix with a single index, pM[i + nrow*j].
In C, both the matrix and the vector are flat arrays; remember that R stores a matrix column by column: the first column, then the second, and so on.
It is the same here: pM[0] is M[1,1], pM[1] is M[2,1], and pM[nrow] is M[1,2]. This explains the index pM[i + nrow*j].
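The indexing can be checked without R. The sketch below is a hypothetical plain-C counterpart of the loop in colMaxs (the name colMaxsPlain and its signature are assumptions for illustration): it takes the flat array, the dimensions, and an output array instead of SEXP objects, and it assumes the matrix has at least one row.

```c
/* pM holds an nrow x ncol matrix stored column by column, as R does.
   pcMax must have room for ncol doubles. */
void colMaxsPlain(const double *pM, int nrow, int ncol, double *pcMax)
{
    int i, j;

    for (j = 0; j < ncol; j++) {
        pcMax[j] = pM[nrow * j];              /* first element of column j */
        for (i = 1; i < nrow; i++)            /* remaining rows of column j */
            if (pM[i + nrow * j] > pcMax[j])
                pcMax[j] = pM[i + nrow * j];
    }
}
```

For the 2 x 3 matrix that R would build with matrix(c(1, 4, 2, 5, 3, 0), nrow=2), the flat array is {1, 4, 2, 5, 3, 0}: the element at index 1 is 4, that is M[2,1], and the column maxima are 4, 5 and 3.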