This chapter describes the Foreign Function Interface (FFI) classes and methods, and how you can use them to build an interface to an existing C library.
Overview of the Foreign Function Interface
The purpose and use of the FFI.
FFI Core Classes
Describes the FFI related classes and data types.
FFI Wrapper Utilities
Instructions for using FFI utilities to define FFI classes for your library.
For certain applications, you may need to provide functionality that is not readily available within GemStone Smalltalk, such as interaction with third-party products.
To interact with such products, you can use the FFI to make C library calls from within GemStone Smalltalk. Using the FFI, you can access C functions in external libraries without the need to write UserActions.
NOTE
With UserActions, your code is checked against function prototypes of the external library that you’re calling. With the FFI, no such checking takes place.
The core FFI defines six classes: CLibrary, CFunction, CPointer, CByteArray, CCallout, and CCallin.
An instance of CLibrary corresponds to a C compiled library. Instances of CLibrary are created using:
CLibrary class >> named:libraryName
passing in the path and name of the C shared library to be loaded. The platform-specific extension (such as .so) is optional.
Individual functions within a CLibrary are represented by instances of CCallout. To create a CCallout, the following class methods are available:
library: aCLibrary name: aName result: resType args: argumentTypes
library: aCLibrary name: aName result: resType args: argumentTypes varArgsAfter: varArgsAfter
name: aName result: resType args: argumentTypes
name: aName result: resType args: argumentTypes varArgsAfter: varArgsAfter
aCLibrary may be an instance of CLibrary, an Array of CLibraries, or nil. Passing nil for aCLibrary causes a search of the loaded libraries for a function of this name. aName is a String providing the name of the specific function. resType is the return type of the function, and argumentTypes is an Array of zero or more Symbols describing the types of the arguments to this function.
varArgsAfter is -1 if the number of arguments to the function is fixed. If the function prototype ends with an ellipsis (‘...’), indicating that the function takes a variable number of arguments, then varArgsAfter indicates the one-based index of the last fixed argument. (If varArgsAfter is 0, there are no fixed arguments.)
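As a sketch of how varArgsAfter relates to a C prototype, consider the standard C function printf(), declared as int printf(const char *format, ...). It has one fixed argument followed by an ellipsis, so varArgsAfter would be 1. The variable name ccallout_printf is illustrative, and the libc path is an assumption taken from the Linux examples later in this chapter. Only the definition is shown here, not a call.
topaz 1> run
| ccallout_printf |
"int printf(const char *format, ...): one fixed argument, then varargs"
ccallout_printf := CCallout
    library: (CLibrary named: '/lib/x86_64-linux-gnu/libc.so.6')
    name: 'printf'
    result: #'int32'
    args: #(#'const char*')
    varArgsAfter: 1.
ccallout_printf
%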
The following instance method is used to invoke the function described by the instance of CCallout:
callWith: argsArray
To get the value of the C global variable errno that was saved by the most recent call to callWith:, use the CCallout class method:
errno
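As a minimal sketch of reading errno after a failed call, the following uses the POSIX function access(), declared as int access(const char *pathname, int mode). The libc path is an assumption taken from the Linux examples later in this chapter; the file name '/no/such/file' and the variable names are illustrative. A mode of 0 corresponds to F_OK on Linux, which tests only for existence.
topaz 1> run
| ccallout_access result |
"int access(const char *pathname, int mode)"
ccallout_access := CCallout
    library: (CLibrary named: '/lib/x86_64-linux-gnu/libc.so.6')
    name: 'access'
    result: #'int32'
    args: #(#'const char*' #'int32').
result := ccallout_access callWith:
    (Array with: '/no/such/file' with: 0).
"Read errno right away, before another callWith: replaces the saved value"
Array with: result with: CCallout errno
%
If the file does not exist, access() returns -1 in C and sets errno to ENOENT, so the answer should pair a failing result with a nonzero errno value.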
Table 18.1 lists the symbols used for creating resType (result type) and argumentTypes arguments when creating CCallouts.
Functions using varArgs normally may have a maximum of 20 variable arguments. This limit is lower if native code is disabled for this session, as described in the following section.
If the generation of native code is disabled, there are further limitations:
Native code generation is on by default, but it can be disabled by configuration, and it becomes disabled when breakpoints are set. See the System Administration Guide for more information on native code generation.
A CCallin represents a signature for a C function to be called by C code. The resulting CCallin may be used as a type within the argumentTypes array when defining a CCallout.
A CByteArray represents an allocation of C memory. When objects such as pointers or strings are passed to or from C functions, creating a CByteArray, with memory malloc’ed, ensures that the memory will be valid following the call.
While it is possible to manually construct FFI calls using the core classes described above in FFI Core Classes, it involves analysis of the various header files and is usually tedious and error-prone. The typical header file includes many other header files, and the typical C program involves many defines, typedefs, and other definitions.
NOTE
The following examples use zlib, a commonly available software library for data compression that is available on many platforms. The examples are based on zlib v1.2.8 on Linux; with other versions of zlib.h or on other platforms, you may need to experiment. Documentation on zlib is available at http://zlib.net/manual.html.
To help in the process of constructing FFI calls, GemStone includes a class, CHeader, that does the required analysis of a header file. You can parse a header file by using the methods:
CHeader class >> path: headerFileOrPath
You may pass in the full path and name of a header file, or just the header file name. If headerFileOrPath does not fully specify a file, CHeader looks in the system search path, including the current directory, /usr/include/, and /usr/local/include/. These paths are also used to locate any files that are parsed due to #include statements.
CHeader class >> path: headerFileOrPath searchPath: aPath
Search for headerFileOrPath and its include files by looking first in aPath, and then in the current directory and the system search path.
CHeader class >> path: headerFileOrPath searchPaths: collOfPaths
Search for headerFileOrPath and its include files by searching first in the directories in collOfPaths in order, then in the current directory and the system search path.
You may look up the search path using:
CPreprocessor new allSearchPaths
The following example analyzes a header file and stores the result in a variable in UserGlobals:
topaz 1> run
UserGlobals at: #'ZLibHeader' put:
(CHeader path: '/usr/include/zlib.h').
%
Since zlib.h is in /usr/include/, which is on the system search path, the following expressions also locate the header file:
(CHeader path: 'zlib.h')
(CHeader path: 'zlib.h' searchPath: '/usr/include/')
(CHeader path: 'zlib.h' searchPaths: {'/usr/include/'})
(CHeader path: 'include/zlib.h' searchPath: '/usr/')
(CHeader path: 'include/zlib.h' searchPaths: {'/usr/'})
Once you have a CHeader object, you can get information about the various things defined in the header file and those it includes.
topaz 1> run
(ZLibHeader functions at: 'compress')
%
a CDeclaration
header a CHeader
name compress
storage extern
linkageSpec nil
type int32
count nil
pointer 0
fields nil
parameters a Array
enumTag nil
isConstant false
includesCode false
isVaryingArgCount false
bitCount nil
source extern int compress (Bytef *dest, uLongf
*destLen, const Bytef *source, uLong sourceLen)
file /usr/include/zlib.h
line 1060
While the compress() function is directly in zlib.h, this isn’t necessarily the case. Functions that are defined in any header file that is #included in the parsed header file also will have definitions in the instance of CHeader.
For example, on Linux the zlib.h file #includes unistd.h, so functions such as getcwd() also have definitions in the instance of CHeader:
topaz 1> run
(ZLibHeader functions at: 'getcwd') file.
%
/usr/include/unistd.h
On other platforms, zlib.h may not #include unistd.h, in which case the definition is not included in ZLibHeader. If you wanted to access these functions from GemStone, you could create a separate instance of CHeader for unistd.h:
topaz 1> run
UserGlobals at: #'UnistdLibHeader'
put: (CHeader path: '/usr/include/unistd.h').
%
Note that parsing the header file does not give you the location of the actual C library file that you will be calling. Normally, when you write an interface to specific libraries, you are provided the library names and locations as well as the header files.
To take an example that is declared in unistd.h, viewing the source for the getcwd() function declaration shows us the argument declarations.
topaz 1> run
(ZLibHeader functions at: 'getcwd') source
%
extern char *getcwd (char *__buf, size_t __size) __attribute__
((__nothrow__ , __leaf__)) ;
This tells us that the function takes two arguments, a pointer to a string and an integer, and returns a pointer to a string. Knowing that the function declared by this header is in libc, and that the library path and filename are /lib/x86_64-linux-gnu/libc.so.6, we can manually create a call to this function:
topaz 1> run
| string ccallout_getcwd |
string := String new: 200.
ccallout_getcwd := CCallout library:
(CLibrary named: '/lib/x86_64-linux-gnu/libc.so.6')
name: 'getcwd'
result: #'char*'
args: #(#'char*' #'uint64').
string := ccallout_getcwd callWith:
(Array with: string with: string size).
%
It’s important to note the way arguments are defined, since C handles memory differently from Smalltalk. The temporary string that is created as an argument to the function must be created with a size larger than the expected result. This is required for heap space to be allocated for the C function; if it is not large enough, the function will error. Also keep in mind that it’s very important that the specified size of the string in the second argument not be larger than the actual size of the string. The C function will write results to memory limited by the second argument.
getcwd() updates the argument as well as returning a value; both contain the same string, but they are different instances. In both cases, the String’s size is now the actual size of the returned string, truncated from the original size of 200.
A more complex example is the ZLib function compress(). This is defined in zlib.h as follows:
ZEXTERN int ZEXPORT compress OF((Bytef *dest, uLongf *destLen,
const Bytef *source, uLong sourceLen));
You can view a simplified definition using the CHeader printString:
topaz 1> run
(ZLibHeader functions at: 'compress') printString
%
extern int32 compress(uint8 *dest, uint64 *destLen, const uint8 *source, uint64 sourceLen)
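This tells us that compress() takes four arguments: a pointer to the destination buffer (dest), a pointer to the destination length (destLen), a pointer to the source data (source), and the source length (sourceLen).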
The function compresses the source data and places the result in the destination buffer. The destination length is updated with the space actually used. The function returns a flag indicating success or the type of error experienced.
We can manually create a call to this function using the core classes described in FFI Core Classes:
CCallout
library: (CLibrary
named: '/lib/x86_64-linux-gnu/libz.so.1')
name: 'compress'
result: #'int32'
args: #(#'ptr' #'ptr' #'const char*' #'uint64').
This creates an object that can be used to call the compress() function in the library. The constructor takes four arguments: (1) an instance of CLibrary; (2) the name of the function; (3) the result type; and (4) a list of the types of the arguments.
To call the function from Smalltalk, we need to create the arguments. The source string and the source length are easy: they are just instances of a Smalltalk String and Integer. The destination and destination length are a bit more complex. Both are pointers to memory locations that the function reads from (destLen starts as the available length of the destination buffer) and writes to (dest receives the result, and destLen receives the amount of dest actually used).
In general, C libraries cannot deal directly with Smalltalk objects since the format is different and objects can move in memory with various garbage collection operations. As part of making the C function call, the virtual machine converts the Smalltalk objects to C data and constructs a C stack before making the C library call. For many objects this works fine; as we saw in the getcwd() example above, simple String and Integer objects are handled properly. But when an argument is a pointer to a chunk of memory in which the C library will place arbitrary data, we need to explicitly allocate that space and pass a pointer to it.
The class CByteArray represents a chunk of memory that is outside the Smalltalk object space (it is on the "heap"), and when an instance of CByteArray is passed as a #'ptr' type, the virtual machine puts a pointer to the space on the stack before making the function call. There are methods in CByteArray to place various Smalltalk objects in the allocated memory and to retrieve Smalltalk objects from the memory.
To allocate memory for the destination buffer, we can do the following:
dest := CByteArray gcMalloc: 100.
The gcMalloc constructor says to create space on the heap (outside of Smalltalk's object memory) and create a Smalltalk object (in object memory) that references the external memory. The heap memory will be automatically freed when the Smalltalk object is garbage collected. We don't need to put anything into the memory since the compress() function will not retrieve anything from the buffer. We pick a size that is enough to hold the expected result (we made an educated guess for this example; in real use we could get a better estimate by calling compressBound() with the source length).
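As a sketch of that better estimate, assuming the same libz path used in this chapter's other examples: zlib declares uLong compressBound(uLong sourceLen), which returns an upper bound on the compressed size for a given source length. The variable names ccallout_bound and bound are illustrative.
topaz 1> run
| ccallout_bound source bound dest |
"uLong compressBound(uLong sourceLen)"
ccallout_bound := CCallout
    library: (CLibrary named: '/lib/x86_64-linux-gnu/libz.so.1')
    name: 'compressBound'
    result: #'uint64'
    args: #(#'uint64').
source := 'The quick brown fox jumped over the lazy dog'.
bound := ccallout_bound callWith: (Array with: source size).
"Allocate a destination buffer of the worst-case compressed size"
dest := CByteArray gcMalloc: bound.
dest size
%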
To allocate memory for the destination size, and put a value in the location, we can do the following:
dest_size := CByteArray gcMalloc: 8.
dest_size uint64At: 0 put: dest size.
This allocates 8 bytes in the heap and puts the integer 100 (or whatever size we have allocated for the destination buffer) in that memory location (starting at a zero-based offset of 0). When we call the function we will pass a pointer to the number, not the number itself. This is so we provide a place for the function to tell us the amount of the destination buffer actually used (reusing the memory we allocated). After we make the call we can get the size back from the memory location:
used := dest_size uint64At: 0.
Once we know the amount of the destination actually used, we can extract the zip data. The zip data is generic binary data, not a string, and may include bytes with a value of 0 (so it cannot be treated as a C-string). We are again dealing with zero-based offsets, since the underlying structures are C memory:
compressed := dest byteArrayFrom: 0 to: used - 1.
We can put this all together and pass a source string to be compressed:
topaz 1> run
| ccallout_compress source dest dest_size result used compressed |
ccallout_compress := CCallout
library: (CLibrary
named: '/lib/x86_64-linux-gnu/libz.so.1')
name: 'compress'
result: #'int32'
args: #(#'ptr' #'ptr' #'const char*' #'uint64').
source := 'The quick brown fox jumped over the lazy dog'.
dest := CByteArray gcMalloc: 100.
dest_size := CByteArray gcMalloc: 8.
dest_size uint64At: 0 put: dest size.
result := ccallout_compress callWith:
(Array with: dest with: dest_size with: source with: source size).
used := dest_size uint64At: 0.
compressed := dest byteArrayFrom: 0 to: used - 1.
%
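The return value of compress() is worth checking before the output is used. As a minimal sketch, assuming zlib's documented convention that 0 means Z_OK and the same library path as above, the following round-trips the data through uncompress(), which has the same argument layout as compress(). The variable names lib, out, and out_size are illustrative. The compressed buffer is passed as #'ptr' because it is a CByteArray, not a String.
topaz 1> run
| lib ccallout_compress ccallout_uncompress source dest dest_size used
  out out_size result |
lib := CLibrary named: '/lib/x86_64-linux-gnu/libz.so.1'.
ccallout_compress := CCallout
    library: lib
    name: 'compress'
    result: #'int32'
    args: #(#'ptr' #'ptr' #'const char*' #'uint64').
"int uncompress(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen)"
ccallout_uncompress := CCallout
    library: lib
    name: 'uncompress'
    result: #'int32'
    args: #(#'ptr' #'ptr' #'ptr' #'uint64').
source := 'The quick brown fox jumped over the lazy dog'.
dest := CByteArray gcMalloc: 100.
dest_size := CByteArray gcMalloc: 8.
dest_size uint64At: 0 put: dest size.
result := ccallout_compress callWith:
    (Array with: dest with: dest_size with: source with: source size).
"Stop and answer the zlib error code if compress() did not return Z_OK (0)"
result = 0 ifFalse: [^ result].
used := dest_size uint64At: 0.
"Output buffer chosen to be larger than the original string"
out := CByteArray gcMalloc: 100.
out_size := CByteArray gcMalloc: 8.
out_size uint64At: 0 put: out size.
result := ccallout_uncompress callWith:
    (Array with: out with: out_size with: dest with: used).
result = 0 ifFalse: [^ result].
out byteArrayFrom: 0 to: (out_size uint64At: 0) - 1
%
If both calls succeed, the answer should be a ByteArray holding the bytes of the original source string; otherwise the code answers the zlib error code.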
The CHeader object can also be used to create a new Smalltalk class and automatically generate methods to invoke the C functions.
The method CHeader >> wrapperForLibraryAt: can be used to create a Smalltalk class with default name and methods for each function. The default name is the library name without the ‘lib’, so for zlib.h, the resulting class name is simply “Z”.
When creating Smalltalk methods that allow arguments to be passed to the C function in the generated interface methods, each function argument is represented with “_:”. For example, for the getcwd() function, which has two arguments, the equivalent Smalltalk method is:
getcwd_: buffer _: size
To generate a wrapper class for the zlib library, in the simplest case you could use the following code:
topaz 1> run
| header wrapperClass wrapper |
header := CHeader path: '/usr/include/zlib.h'.
wrapperClass := header wrapperForLibraryAt:
'/lib/x86_64-linux-gnu/libz.so.1'.
wrapperClass initializeFunctions.
UserGlobals at: wrapperClass name put: wrapperClass.
%
After this is executed, you can use a code browser to view the class-side methods that create the CCallout instances, and the instance-side methods that call the functions.
As mentioned earlier, the header file may include many functions beyond those provided in the library: all the functions that are defined in the referenced include files. We can call any of these functions through this library, due to the way the C function lookup occurs.
For example, the function getpid() is defined to take no arguments and return a 32-bit number. This makes it very easy to call once we have defined a wrapper class:
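As a minimal sketch, assuming the Z wrapper class generated above has been stored in UserGlobals, and assuming that the generated method for a function with no arguments is named after the function itself (with no arguments, there are no “_:” keywords):
topaz 1> run
"Invoke the generated wrapper method for the C function getpid()"
Z new getpid
%
The answer is the process ID of the process running the session, so it varies from run to run.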
We probably don’t want to allow the Z class to have access to every function that is included – for example, it might be better not to have access to sethostid(), which changes the current machine's Internet number. It’s better to be more selective about what functions to include in the wrapper. It’s also desirable to have a more descriptive name for the library wrapper class.
The method CHeader >> wrapperNamed:forLibraryAt:select: allows you to specify the class name and a select block that determines which functions to include. The select block should evaluate to a Boolean that indicates whether or not to include the particular function.
For example, to create a wrapper for various compress functions, you could do the following:
topaz 1> run
| header class |
UserGlobals removeKey: #'ZLib' ifAbsent: [].
header := CHeader path: '/usr/include/zlib.h'.
class := header
wrapperNamed: 'ZLib'
forLibraryAt: '/lib/x86_64-linux-gnu/libz.so.1'
select: [:each |
each name includesString: 'compress'].
class initializeFunctions.
UserGlobals at: class name put: class.
%
This code creates a wrapper class, ZLib, that contains only four functions: compress(), uncompress(), compress2(), and compressBound(), all the ones that happen to include the string “compress”. The select block may be considerably more complex, depending on which specific functions you want to include.
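As a sketch of a somewhat more selective block, the following combines the name test with a test on the file in which each function is declared, using the file accessor shown earlier for function declarations. The class name ZLibOnly is hypothetical, and the header path is the Linux location assumed throughout this chapter; on this header the result is expected to be the same four functions, since they are all declared in zlib.h itself.
topaz 1> run
| header class |
UserGlobals removeKey: #'ZLibOnly' ifAbsent: [].
header := CHeader path: '/usr/include/zlib.h'.
class := header
    wrapperNamed: 'ZLibOnly'
    forLibraryAt: '/lib/x86_64-linux-gnu/libz.so.1'
    select: [:each |
        "Keep only functions declared in zlib.h whose name includes 'compress'"
        (each file = '/usr/include/zlib.h')
            and: [each name includesString: 'compress']].
class initializeFunctions.
UserGlobals at: class name put: class.
%
Filtering on the declaring file is one way to keep functions from other included headers, such as those in unistd.h, out of the wrapper.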
To invoke compress() using the ZLib class rather than manually creating a CCallout:
topaz 1> run
| source destination dest_size result used compressed |
source := 'The quick brown fox jumped over the lazy dog'.
destination := CByteArray gcMalloc: 100.
dest_size := CByteArray gcMalloc: 8.
dest_size uint64At: 0 put: destination size.
result := ZLib new
compress_: destination
_: dest_size
_: source
_: source size.
used := dest_size uint64At: 0.
compressed := destination byteArrayFrom: 0 to: used - 1.
compressed
%
x\u9c^KÉHU(,ÍLÎVH*Ê/ÏSH˯PÈ*Í-HMQÈ/K-R(^AÊç$VU*¤ä§^C.k\u93^P0