17. The Foreign Function Interface

This chapter describes the Foreign Function Interface (FFI) classes and methods, and how you can use them to build an interface to an existing C library.

Overview of the Foreign Function Interface
The purpose and use of the FFI.

FFI Core Classes
Describes the FFI-related classes and data types.

FFI Wrapper Utilities
Instructions for using FFI utilities to define FFI classes for your library.

17.1 Overview of the Foreign Function Interface

For certain applications, you may need to provide functionality that is not readily available within GemStone Smalltalk. Such functionality might include interactions with third-party products such as these:

Access to hardware, such as a bar code reader
Access to software that provides a service, such as the zlib compression library
Data encryption
Screen graphics
Interaction with Oracle, MySQL, or other databases

To interact with third-party products such as these, you can use the FFI to make C library calls from within GemStone Smalltalk. Using the FFI, you can access C functions in external libraries without the need to write UserActions.

NOTE
With UserActions, your code is checked against function prototypes of the external library that you’re calling. With the FFI, no such checking takes place.

17.2 FFI Core Classes

The core FFI defines six classes: CLibrary, CFunction, CPointer, CByteArray, CCallout, and CCallin.

CLibrary

An instance of CLibrary corresponds to a compiled C library. Instances of CLibrary are created using:

CLibrary class >> named: libraryName

passing in the path and name of the C shared library to be loaded. The platform-specific extension (such as .so) is optional.

CCallout

Individual functions within a CLibrary are represented by instances of CCallout. To create a CCallout, the following class methods are available:

library: aCLibrary name: aName result: resType args: argumentTypes

library: aCLibrary name: aName result: resType args: argumentTypes varArgsAfter: varArgsAfter

name: aName result: resType args: argumentTypes

name: aName result: resType args: argumentTypes varArgsAfter: varArgsAfter

aCLibrary may be an instance of CLibrary, an Array of CLibraries, or nil. Passing nil for aCLibrary causes a search of the loaded libraries for a function of this name. aName is a String providing the name of the specific function. resType is the return type of the function, and argumentTypes is an Array of zero or more symbols describing the types of the arguments for this function.

varArgsAfter is -1 if the number of arguments to the function is fixed. If the function prototype ends with an ellipsis (‘...’), indicating that the function takes a variable number of arguments, then varArgsAfter indicates the one-based index of the last fixed argument. (If varArgsAfter is 0, there are no fixed arguments.)
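As an illustration of how varArgsAfter is counted, the sketch below defines a CCallout for the C library function snprintf(), whose prototype is int snprintf(char *str, size_t size, const char *format, ...). Three fixed arguments precede the ellipsis, so varArgsAfter is 3. The choice of snprintf(), the mapping of size_t to #uint64 (as on 64-bit Linux), and passing nil to search the already-loaded libraries are all assumptions made for this example. Only the definition is shown; supplying the variable arguments at call time is outside the scope of this sketch.

```smalltalk
| snprintfCallout |
"int snprintf(char *str, size_t size, const char *format, ...)"
"Three fixed arguments precede the ellipsis, so varArgsAfter is 3."
snprintfCallout := CCallout
	library: nil
	name: 'snprintf'
	result: #'int32'
	args: #(#'char*' #'uint64' #'const char*')
	varArgsAfter: 3.
snprintfCallout
```

A function with a fixed argument list would instead use -1, or one of the creation methods that omit varArgsAfter:.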

The following instance method is used to invoke the function described by the instance of CCallout:

callWith: argsArray

To get the value of the C global variable errno that was saved by the most recent call to callWith:, use the CCallout class method:

errno
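A minimal sketch of reading errno after a failed call. It assumes that the libc function chdir() can be found in the already-loaded libraries (no library is named) and that the path '/no/such/directory' does not exist; both are assumptions made for illustration.

```smalltalk
| chdirCallout result savedErrno |
"int chdir(const char *path); returns 0 on success, -1 on failure."
chdirCallout := CCallout
	name: 'chdir'
	result: #'int32'
	args: #(#'const char*').
result := chdirCallout callWith: (Array with: '/no/such/directory').
"On failure, fetch the errno value saved by the call."
savedErrno := result = -1
	ifTrue: [CCallout errno]
	ifFalse: [0].
Array with: result with: savedErrno
```

On Linux, a missing directory would be expected to report ENOENT, which has the value 2.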

C type symbols

Table 17.1 lists the symbols used for creating resType (result type) and argumentTypes arguments when creating CCallouts.

Table 17.1 C type symbols
C type	Return type	Argument type
#int64	Integer. The C function returns an int64.	Integer
#uint64	Integer. The C function returns a uint64.	Integer
#int32	Integer. The C function returns a signed C integer, 32 bits.	Integer
#uint32	Integer. The C function returns an unsigned C integer, 32 bits or smaller.	Integer
#int16	Integer	Integer
#uint16	Integer	Integer
#int8	Integer	Integer
#uint8	Integer	Integer
#double	SmallDouble or Float. The C function returns a C double.	SmallDouble or Float; the function is limited to a maximum of four arguments.
#float	SmallDouble or Float. The C function returns a C float.	SmallDouble or Float
#'char*'	nil or a String	The corresponding arg must be a String. The body is copied to C memory before the call and copied from C memory (and possibly grown or shrunk) after the call. C memory will not be valid after the call finishes.
#void	nil
#ptr	nil or a CPointer	The corresponding arg must be nil, a CByteArray, or a CPointer. If nil, a C NULL is passed. If a CByteArray, the address of its body is passed. If a CPointer, the encapsulated pointer is passed.
#'&ptr'		The corresponding arg must be a CPointer. The CPointer’s value will be passed and updated on return.
#'&int64'		The corresponding arg must be a CByteArray of size 8. A pointer to the body will be passed.
#'&double'		The corresponding arg must be a CByteArray of size 8. A pointer to the body will be passed.
#'const char*'		The corresponding arg must be nil (to pass NULL) or a String (the body is copied to C memory before the call). C memory will not be valid after the call finishes.

Functions using varArgs normally may have a maximum of 20 variable arguments. This limit is lower if native code is disabled for this session, as described in the following section.

Limitations with native code disabled

If the generation of native code is disabled, there are further limitations:

Functions using varArgs may have a maximum of four fixed and 10 total arguments.
Functions not using varArgs are limited to a maximum of 15 total arguments.
Arguments and results of C type float are not supported.
Functions with one or more args of C type double are limited to a maximum of four arguments.
CCallin cannot be used.

Native code generation is on by default, but it can be disabled by configuration, and it becomes disabled when breakpoints are set. See the System Administration Guide for more information on native code generation.

CCallin

A CCallin represents a signature for a C function to be called by C code. The resulting CCallin may be used as a type within the argumentTypes array when defining a CCallout.

CByteArray

A CByteArray represents an allocation of C memory. When objects such as pointers or strings are passed to or from C functions, using a CByteArray, whose memory is malloc’ed, ensures that the memory remains valid after the call.

CFunction

CFunction is an abstract superclass representing the type signature of a C function. It has two subclasses, CCallout and CCallin.

CPointer

CPointer encapsulates a C pointer that does not have auto-free semantics. New instances are created by CFunction calls with result type #ptr, and are also used for certain arguments of CFunctions.

17.3 FFI Wrapper Utilities

While it is possible to manually construct FFI calls using the core classes described above in FFI Core Classes, it involves analysis of the various header files and may be tedious and error-prone. The typical header file includes many other header files, and the typical C program involves many defines, typedefs, and other definitions.

To help in the process of constructing FFI calls, GemStone includes a class, CHeader, that does the required analysis of a header file. You can parse a header file by using the method CHeader class >> path:. This will return an object containing an analysis of the header file.

The following example analyzes a header file and stores the result in a variable in UserGlobals:

Example 17.1 Create a CHeader for zlib.h

topaz 1> doit
UserGlobals at: #'ZLibHeader' put:
	(CHeader path: '/usr/include/zlib.h').

NOTE
Many of the following examples use zlib, a software library for data compression that is available on many platforms. Documentation on the library is available at http://zlib.net/manual.html. These zlib examples are on Linux; library details are platform-specific. If you are trying these examples on another platform, you may need to experiment.

Once you have a CHeader object, you can get information about the various things defined in the header file and those it includes.

Example 17.2 CDeclaration for compress()

topaz 1> printit
(ZLibHeader functions at: 'compress')
a CDeclaration
  header              a CHeader
  name                compress
  storage             extern
  type                int32
  count               nil
  pointer             0
  fields              nil
  parameters          a Array
  enumTag             nil
  isStorage           false
  isConstant          false
  includesCode        false
  isVaryingArgCount   false
  isTransparentUnion  false
  bitCount            nil
  source              \n/* Return flags indicating compile-time options.\n\n    Type ...
  file                /usr/include/zlib.h
  line                1042

While the compress() function is directly in zlib.h, this isn’t necessarily the case. Functions that are defined in any header file that is #included in the parsed header file also will have definitions in the instance of CHeader.

For example, on Linux the zlib.h file #includes unistd.h, so functions such as getcwd() also have definitions in the instance of CHeader:

topaz 1> run
(ZLibHeader functions at: 'getcwd') file.
/usr/include/unistd.h

On other platforms, zlib.h may not #include unistd.h, in which case the definition is not included in ZLibHeader. If you want to access these functions from GemStone on such a platform, you can create a separate instance of CHeader for unistd.h:

topaz 1> doit
UserGlobals at: #'UnistdLibHeader'
	put: (CHeader path: '/usr/include/unistd.h').
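One way to handle this platform difference is to look for the function in the header that is already parsed and fall back to unistd.h only if it is missing. This is a sketch: it assumes that ZLibHeader has been stored in UserGlobals as in Example 17.1, and that the collection returned by functions responds to at:ifAbsent: as standard keyed collections do, which this chapter does not confirm.

```smalltalk
| declaration |
"Look for getcwd() in the parsed zlib header first."
declaration := ZLibHeader functions at: 'getcwd' ifAbsent: [nil].
declaration isNil ifTrue: [
	"Not pulled in by zlib.h on this platform; parse unistd.h directly."
	declaration := (CHeader path: '/usr/include/unistd.h')
		functions at: 'getcwd'].
declaration file
```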

Note that parsing the header file does not give you the location of the actual C library file that you will be calling. Normally, when you write an interface to specific libraries, you are provided with the library names and locations as well as the header files.

Simple function call – getcwd()

To take an example from unistd.h, viewing the source for the getcwd() function declaration shows us the argument declarations.

topaz 1> run
(ZLibHeader functions at: 'getcwd') source
/* Get the pathname of the current working directory,
   and put it in SIZE bytes of BUF.  Returns NULL if the
   directory couldn't be determined or SIZE was too small.
   If successful, returns BUF.  In GNU, if BUF is NULL,
   an array is allocated with `malloc'; the array is SIZE
   bytes long, unless SIZE == 0, in which case it is as
   big as necessary.  */
extern char *getcwd (char *__buf, size_t __size) __THROW __wur;

This tells us that the function takes two arguments, a pointer to a string and an integer, and returns a pointer to a string. Knowing that the function defined by this header is in libc, and that the actual library path and filename are /lib/libc.so.6, we can manually create a call to this function:

Example 17.3 CCallout to invoke getcwd()

| string ccallout_getcwd |
string := String new: 200.
ccallout_getcwd := CCallout
	library: (CLibrary named: '/lib/libc.so.6')
	name: 'getcwd'
	result: #'char*'
	args: #(#'char*' #'uint64').
string := ccallout_getcwd callWith:
	(Array with: string with: string size).

It’s important to note how the arguments are defined, since C handles memory differently from Smalltalk. The temporary string passed as the first argument must be created with a size larger than the expected result, because that size determines how much space is allocated for the C function to write into; if it is not large enough, the function will fail. Equally important, the size given in the second argument must not be larger than the actual size of the string, since the C function writes its results to memory up to the limit set by the second argument.

getcwd() updates the argument as well as returning a value; both contain the same string, but they are different instances. In both cases, the String’s size is now the actual size of the returned string, truncated from the original size of 200.

More complex function call – compress()

A more complex example is the zlib function compress(). This is defined in zlib.h as follows:

ZEXTERN int ZEXPORT compress OF((Bytef *dest, uLongf *destLen,
	const Bytef *source, uLong sourceLen));

You can view a simplified definition using the printString of the CDeclaration:

topaz 1> printit
(ZLibHeader functions at: 'compress') printString
extern "C" int32 compress(uint8 *dest, uint64 *destLen,
	const uint8 *source, uint64 sourceLen)

This tells us that compress() takes four arguments:

a pointer to a destination buffer
a pointer to the length of the destination buffer
a pointer to the source data
the length of the source data

The function compresses the source data and places the result in the destination buffer. The destination length is updated with the space actually used. The function returns a flag indicating success or the type of error experienced.

We can manually create a call to this function using the core classes described in FFI Core Classes:

CCallout
	library: (CLibrary named: '/lib/libz.so.1.2.3.3')
	name: 'compress'
	result: #'int32'
	args: #(#'ptr' #'ptr' #'const char*' #'uint64').

This creates an object that can be used to call the compress() function in the library. The constructor takes four arguments: (1) an instance of CLibrary; (2) the name of the function; (3) the result type; and (4) a list of the types of the arguments.

In order to call the function from Smalltalk we need to create the arguments. The source string and the source length are easy: they are just instances of a Smalltalk String and Integer. The destination and destination length are a bit more complex. They are both pointers to memory locations where the function will retrieve information (destLen starts as the available length of the destination buffer) as well as return information (dest, where the result is placed, and destLen, the amount of dest actually used).

In general, C libraries cannot deal directly with Smalltalk objects since the format is different and objects can move in memory with various garbage collection operations. As part of making the C function call, the virtual machine converts the Smalltalk objects to C data and constructs a C stack before making the C library call. For many objects this works fine; as we saw in the getcwd() example above, simple String and Integer objects are handled properly. But when an argument is a pointer to a chunk of memory in which the C library will place arbitrary data, we need to explicitly allocate that space and pass a pointer to it.

The class CByteArray represents a chunk of memory that is outside the Smalltalk object space (it is on the "heap"), and when an instance of CByteArray is passed as a #'ptr' type, the virtual machine puts a pointer to the space on the stack before making the function call. There are methods in CByteArray to place various Smalltalk objects in the allocated memory and to retrieve Smalltalk objects from the memory.

To allocate memory for the destination buffer, we can do the following:

dest := CByteArray gcMalloc: 100.

The gcMalloc constructor says to create space on the heap (outside of Smalltalk's object memory) and create a Smalltalk object (in object memory) that references the external memory. The heap memory will be automatically freed when the Smalltalk object is garbage collected. We don't need to put anything into the memory since the compress() function will not retrieve anything from the buffer. We pick a size that is enough to hold the expected result (we made an educated guess for this example; in real use we could get a better estimate by calling compressBound() with the source length).
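The compressBound() estimate mentioned above could be obtained with a CCallout of its own. The sketch below assumes the zlib prototype uLong compressBound(uLong sourceLen), maps uLong to #uint64 in the same way the simplified compress() definition does, and reuses the Linux library path from this chapter's examples; adjust these for your platform.

```smalltalk
| compressBoundCallout source bound dest |
compressBoundCallout := CCallout
	library: (CLibrary named: '/lib/libz.so.1.2.3.3')
	name: 'compressBound'
	result: #'uint64'
	args: #(#'uint64').
source := 'The quick brown fox jumped over the lazy dog'.
"Ask zlib for an upper bound on the compressed size of the source."
bound := compressBoundCallout callWith: (Array with: source size).
"Allocate a destination buffer of exactly that size."
dest := CByteArray gcMalloc: bound.
dest size
```

Sizing the buffer this way avoids guessing, at the cost of one extra C call per compression.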

To allocate memory for the destination size, and put a value in the location, we can do the following:

dest_size := CByteArray gcMalloc: 8.
dest_size uint64At: 0 put: dest size.

This allocates 8 bytes in the heap and puts the integer 100 (or whatever size we have allocated for the destination buffer) in that memory location (starting at a zero-based offset of 0). When we call the function we will pass a pointer to the number, not the number itself. This is so we provide a place for the function to tell us the amount of the destination buffer actually used (reusing the memory we allocated). After we make the call we can get the size back from the memory location:

used := dest_size uint64At: 0.

Once we know the amount of the destination actually used, we can extract the compressed data. This is generic binary data, not a string, and it may include bytes with a value of 0 (so it cannot be treated as a C string). We are again dealing with zero-based offsets, since the underlying structures are C memory:

compressed := dest byteArrayFrom: 0 to: used - 1.

We can put this all together and pass a source string to be compressed:

Example 17.4 CCallout to invoke compress()

| ccallout_compress source dest dest_size result used compressed |
ccallout_compress := CCallout
	library: (CLibrary named: '/lib/libz.so.1.2.3.3')
	name: 'compress'
	result: #'int32'
	args: #(#'ptr' #'ptr' #'const char*' #'uint64').
source := 'The quick brown fox jumped over the lazy dog'.
dest := CByteArray gcMalloc: 100.
dest_size := CByteArray gcMalloc: 8.
dest_size uint64At: 0 put: dest size.
result := ccallout_compress callWith:
	(Array with: dest with: dest_size with: source with: source size).
used := dest_size uint64At: 0.
compressed := dest byteArrayFrom: 0 to: used - 1.

If the result is zero (Z_OK), then the function executed successfully, and compressed will reference a ByteArray that contains the compressed data.
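A small sketch of interpreting the return code. The numeric values are the ones zlib defines for Z_OK (0), Z_MEM_ERROR (-4), and Z_BUF_ERROR (-5), which are the codes zlib documents for compress(); the block name describeResult is hypothetical.

```smalltalk
| describeResult |
"Map a compress() return code to a short description."
describeResult := [:code |
	code = 0
		ifTrue: ['Z_OK: compression succeeded']
		ifFalse: [
			code = -5
				ifTrue: ['Z_BUF_ERROR: destination buffer too small']
				ifFalse: [
					code = -4
						ifTrue: ['Z_MEM_ERROR: not enough memory']
						ifFalse: ['unexpected result: ', code printString]]]].
"Example: the code zlib returns when the destination buffer is too small."
describeResult value: -5
```

In practice you would pass the result of callWith: to the block and read the used size and the compressed bytes only when the code is 0.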

Creating a Smalltalk class

The CHeader object can also be used to create a new Smalltalk class and automatically generate methods to invoke the C functions.

The method CHeader >> wrapperForLibraryAt: can be used to create a Smalltalk class with a default name and methods for each function. The default name is the library name without the ‘lib’ prefix, so for the zlib library (libz), the resulting class name is simply “Z”.

In the generated interface methods, each argument of the C function is represented by a “_:” keyword. For example, for the getcwd() function, which has two arguments, the equivalent Smalltalk method is:

getcwd_: buffer _: size

To generate a wrapper class for the zlib library, in the simplest case you could use the following code:

Example 17.5 Create wrapper class using default

| header wrapperClass wrapper |
header := CHeader path: '/usr/include/zlib.h'.
wrapperClass := header wrapperForLibraryAt:
	'/lib/libz.so.1.2.3.3'.
wrapperClass initializeFunctions.
UserGlobals at: wrapperClass name put: wrapperClass.

After this is executed, you can use a code browser to view the class-side methods that create the CCallout instances, and the instance-side methods that call the functions.

As mentioned earlier, the header file may include many functions beyond those provided in the library: all the functions that are defined in the referenced include files. We can call any of these functions through this library, due to the way the C function lookup occurs.

For example, the function getpid() is defined to take no arguments and return a 32-bit number. This makes it very easy to call once we have defined a wrapper class:

Example 17.6 Invoke Z function getpid

topaz 1> run
Z new getpid
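Functions with arguments follow the “_:” naming convention described above. As a sketch, assuming the Z wrapper from Example 17.5 exists, that zlib.h pulls in unistd.h on your platform, and that the generated method accepts a String for the char* buffer in the same way the manual CCallout in Example 17.3 does, getcwd() could be invoked like this:

```smalltalk
| buffer result |
"The buffer must be larger than the expected path."
buffer := String new: 200.
"Never pass a size larger than the buffer actually holds."
result := Z new getcwd_: buffer _: buffer size.
result
```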

We probably don’t want the Z class to have access to every function that is included; for example, it might be better not to have access to sethostid(), which changes the current machine's Internet number. It’s better to be more selective about which functions to include in the wrapper. It’s also desirable to have a more descriptive name for the library wrapper class.

The method CHeader >> wrapperNamed:forLibraryAt:select: allows you to specify the class name and a select block that determines which functions to include. The select block should evaluate to a Boolean that indicates whether or not to include the particular function.

For example, to create a wrapper for various compress functions, you could do the following:

Example 17.7 Create wrapper class specifying name and functions

| header class |
UserGlobals removeKey: #'ZLib' ifAbsent: [].
header := CHeader path: '/usr/include/zlib.h'.
class := header
	wrapperNamed: 'ZLib'
	forLibraryAt: '/lib/libz.so.1.2.3.3'
	select: [:each |
		each name includesString: 'compress'].
class initializeFunctions.
UserGlobals at: class name put: class.

This code creates a wrapper class, ZLib, that contains only four functions: compress(), uncompress(), compress2(), and compressBound(), all the ones that happen to include the string “compress”. The select block may be considerably more complex, depending on which specific functions you want to include.
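As an example of a more selective block, the sketch below keeps the compression functions but leaves out compressBound(), and also admits functions whose names contain crc32. It uses only includesString:, as in Example 17.7; the class name ZLibSubset is hypothetical, and the exact set of functions matched depends on the zlib version installed.

```smalltalk
| header wrapperClass |
header := CHeader path: '/usr/include/zlib.h'.
wrapperClass := header
	wrapperNamed: 'ZLibSubset'
	forLibraryAt: '/lib/libz.so.1.2.3.3'
	select: [:each | | isCompress isBound isChecksum |
		isCompress := each name includesString: 'compress'.
		isBound := each name includesString: 'Bound'.
		isChecksum := each name includesString: 'crc32'.
		"Keep compress-related functions except compressBound(), plus crc32."
		(isCompress and: [isBound not]) or: [isChecksum]].
wrapperClass initializeFunctions.
UserGlobals at: wrapperClass name put: wrapperClass.
```

Naming each condition in a block temporary keeps the final Boolean expression readable as the selection rules grow.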

To invoke compress() using the ZLib class rather than manually creating a CCallout:

Example 17.8 Invoke ZLib function compress()

topaz 1> printit
| source destination dest_size result used compressed |
source := 'The quick brown fox jumped over the lazy dog'.
destination := CByteArray gcMalloc: 100.
dest_size := CByteArray gcMalloc: 8.
dest_size uint64At: 0 put: destination size.
result := ZLib new
      compress_: destination
      _: dest_size
      _: source
      _: source size.
used := dest_size uint64At: 0.
compressed := destination byteArrayFrom: 0 to: used - 1.
compressed
x.^K.HU(,.L.VH*./.SH..P.*.-HMQ./K-R(^A..$VU*...^C.k.^P0