Green card: a foreign-language interface for HaskellThomas Nordin, Simon Peyton Jones, Alastair Reid, Malcolm Wallace**** Note that this document describes GreenCard as of November 1997 - in particular, it supersedes the Haskell Workshop 97 paper. There are significant syntax changes, some simplifications, and some new features to extend the power of DISs. Note also that it differs from the current Glasgow version of this document, which failed to adopt some of the changes we agreed here. ****
Table of Contents
1 MotivationA foreign-language interface provides a way for software components written in a one language to interact with components written in another. Programming languages that lack foreign-language interfaces die a lingering death. This document describes GreenCard, a foreign-language interface for the non-strict, purely functional language Haskell. We assume some knowledge of Haskell and C.
1.1 Goals and non-goalsOur goals are limited. We do not set out to solve the foreign-language interface in general; rather we intend to profit from others' work in this area. Specifically, we aim to provide the following, in priority order:
The ability to call C from Haskell is an essential foundation. Through it we can access operating system services and mountains of other software libraries. In the other direction, should we be able to write a Haskell library that a C program can use? Yes indeed, but this paper does not address the question directly. (Some implementations of GreenCard, e.g. for nhc98, have provided a limited mechanism to allow this.) Should we support languages other than C? The trite answer is that pretty much everything available as a library is available as a C library. For other languages the right thing to do is to interface to a language-independent software component architecture, rather than to a raft of specific languages. For the moment we choose COM, but CORBA(2) might be another sensible choice. (Note also that there is some current research focussed on using IDL to specify generalised foreign language interfaces for Haskell.) While we do not here propose a mechanism to call Haskell from C, it does make sense to think of writing COM software components in Haskell that are used by clients. For example, one might write an animated component that sits in a Web page. This document, however, describes only /1/, the C interface mechanism. 2 Foreign language interfaces are harder than they lookEven after the scope is restricted to designing a foreign-language interface from Haskell to C, the task remains surprisingly tricky. At first, one might think that one could take the C header file describing a C procedure, and generate suitable interface code to make the procedure callable from Haskell. Alas, there are numerous tiresome details that are simply not expressed by the C procedure prototype in the header file. For example, consider calling a C procedure that opens a file, passing a character string as argument. The C prototype might look like this: int open( char *filename ) Our goal is to generate code that implements a Haskell procedure with type open :: String -> IO FileDescriptor
None of these details are mentioned in the C header file. Instead, many of them are in the manual page for the procedure, while others (such as parameter lifetimes) may not even be written down at all. 3 Overview of GreenCardThe previous section bodes ill for an automatic system that attempts to take C header files and automatically generate the "right" Haskell functions; C header files simply do not contain enough information. The rest of this paper describes how we approach the problem. The general idea is to start from the Haskell type definition for the foreign function, rather than the C prototype. The Haskell type contains quite a bit more information; indeed, it is often enough to generate correct interface code. Sometimes, however, it is not, in which case we provide a way for the programmer to express more details of the interface. All of this is embodied in a program called "GreenCard". GreenCard is a Haskell pre-processor. It takes a Haskell module as input, and scans it for GreenCard directives (which are lines prefixed by `%'). It produces a new Haskell module as output, and (in some implementations) a C module as well. (Figure 1). Figure 1: The big picture GreenCard's output depends on the particular Haskell implementation that is going to compile it. For the Glasgow Haskell Compiler (GHC), GreenCard generates Haskell code that uses GHC's primitive `ccall'/`casm' construct to call C. All of the argument marshalling is done in Haskell. For Hugs, GreenCard generates a C module to do most of the argument marshalling, while the generated Haskell code uses Hugs's `prim' construct to access the generated C stubs. For nhc98, GreenCard generates a C module to do part of the argument marshalling, although the majority of it is done in the generated Haskell code. For example, consider the following Haskell module: module M where %fun sin :: Float -> Float sin2 :: Float -> Float sin2 x = sin (sin x) Everything is standard Haskell except the `%fun' line, which asks GreenCard to generate an interface to a (pure) C function `sin'. After the GHC-targeted version of GreenCard processes the file, it looks like this(3): (Only GHC aficionados will understand this code. The whole point of GreenCard is that Joe Programmer should not have to learn how to write this stuff!) module M where sin :: Float -> Float sin f = unsafePerformPrimIO ( case f of { F# f# -> _casm_ "%r = sin(%0)" f# `thenPrimIO` \ r# -> returnPrimIO (F# r#)}) sin2 :: Float -> Float sin2 x = sin (sin x) The `%fun' line has been expanded to a blob of gruesome boilerplate, while the rest of the module comes through unchanged. If Hugs is the target, the Haskell source file remains unchanged, but the the Hugs variant of GreenCard generates output that uses Hugs's primitive mechanisms for calling C. For the nhc98 target, GreenCard generates something different again. Much of the GreenCard implementation is, however, shared between all variants. 4 GreenCard directivesGreenCard pays attention only to GreenCard directives, each of which starts with a `%' at the beginning of a line. All other lines are passed through to the output Haskell file unchanged. The syntax of GreenCard directives is given in Figure 2). The syntax for the dis production is given later (Figure 3).
Figure 2: Grammar for GreenCard A general principle we have followed is to define a single, explicit (and hence long-winded) general mechanism, that should deal with just about anything, and then define convenient abbreviations that save the programmer from writing out the general mechanism in many common cases. We have erred on the conservative side in defining such abbreviations; that is, we have only defined an abbreviation where doing without it seemed unreasonably long-winded, and where there seemed to be a systematic way of defining an abbreviation. GreenCard understands the following directives:
Following a GreenCard directive, subsequent leading or trailing whitespace is in general ignored or trimmed. This applies even to the `%C' directive. Because there are occasions when it can be desirable to preserve whitespace in the C code, some implementations of GreenCard (currently only for nhc98) allow a special form `%-' which is exactly like `%C' except that it preserves all whitespace. All directives (except `%C' and `%-') can span more than one line, but the continuation lines must each start with a `%' followed by some whitespace. Haskell-style comments are permitted in GreenCard directives (except, for obvious reasons, `%C' and `%-'). For example: %fun draw :: Int -- Length in pixels % -> Maybe Int -- Width in pixels % -> IO () In later sections, we shall encounter the specification of short fragments of literal C code (and indeed, literal Haskell code) deep within a GreenCard directive. On such occasions, the literal C code is enclosed within double-quote marks (and the literal Haskell code is also denoted syntactically). However, within these fragments one sometimes wishes to make use of the value of a name bound by a GreenCard DIS macro, rather than the name itself. Hence, a name used within double-quotes can be escaped by prefixing it with the `%' character. When the literal code is generated, these escaped names will be replaced by the value bound to that name in the current environment. See Section 7.2 for examples. 5 Procedure specificationsThe most common GreenCard directive is a procedure specification. It describes the interface to a C procedure. A procedure specification has four parts:
Any of these parts may be omitted except the type signature. If any part is missing, GreenCard will fill in a suitable statement based on the type signature given in the `%fun' statement. For example, consider this procedure specification: %fun sin :: Float -> Float GreenCard fills in the missing statements like this(4): %fun sin :: Float -> Float %call (float arg1) %code res1 = sin(arg1); %result (float res1) The rules that guide this automatic fill-in are described in Section 5.5 Automatic fill-in. A procedure specification can define a procedure with no input parameter, or even a constant (a "procedure" with no input parameters and no side effects). In the following example, `printBang' is an example of the former, while `grey' is an example of the latter(5): %fun printBang :: IO () %code printf( "!" ); %fun grey :: Colour %code r = GREY; %result (colour r) All the C variables bound in the `%call' statement or mentioned in the `%result' statement, are declared by GreenCard and in scope throughout the body. In the examples above, GreenCard would have declared `arg1', `res1' and `r'.
5.1 Type signatureThe `%fun' statement starts a new procedure specification. GreenCard supports two sorts of C procedures: ones that may cause side effects (including I/O), and ones that are guaranteed to be pure functions. The two are distinguished by their type signatures. Side-effecting functions have the result type `IO t' for some type `t'. If the programmer specifies any result type other than `IO t', GreenCard takes this as a promise that the C function is indeed pure, and will generate code that assumes such. The procedure specification will expand to the definition of a Haskell function, whose name is that given in the `%fun' statement, with two changes: the longest matching prefix specified with a `%prefix' (Section 5.7 Prefixes elaborates) statement is removed from the name and the first letter of the remaining function name is changed to lower case. Haskell requires all function names to start with a lower-case letter (upper case would indicate a data constructor), but when the C procedure name begins with an upper case letter it is convenient to still be able to make use of GreenCard's automatic fill-in facilities. For example: %fun OpenWindow :: Int -> IO Window would expand to a Haskell function `openWindow' that is implemented by calling the C procedure `OpenWindow'. %prefix Win32 %fun Win32OpenWindow :: Int -> IO Window would expand to a Haskell function `openWindow' that is implemented by calling the C procedure `Win32OpenWindow'. 5.2 Parameter marshallingThe `%call' statement tells GreenCard how to translate the Haskell parameters into C values. Its syntax is designed to look rather like Haskell pattern matching, and consists of a sequence of zero or more Data Interface Schemes (DISs), one for each (curried) argument in the type signature. For example: %fun foo :: Float -> (Int,Int) -> String -> IO () %call (float x) (int y, int z) (string s) ... This `%call' statement binds the C variables `x', `y', `z', and `s', in a similar way that Haskell's pattern-matching binds variables to (parts of) a function's arguments. These bindings are in scope throughout the body and result-marshalling statements. In the `%call' statement, `float', `int', and `string' are the names of the DISs that are used to translate between Haskell and C. The names of these DISs are deliberately chosen to be the same as the corresponding Haskell types (apart from changing the initial letter to lower case) so that in many cases, including `foo' above, GreenCard can generate the `%call' line by itself (Section 5.5 Automatic fill-in). In fact there is a fourth DIS hiding in this example, the `(_,_)' pairing DIS. DISs are discussed in detail in Section 6 Data Interface Schemes. 5.3 The bodyThe body consists of arbitrary C code, beginning with `%code'. The reason for allowing arbitrary C is that C procedures sometimes have complicated interfaces. They may return results through parameters passed by address, deposit error codes in global variables, require `#include''d constants to be passed as parameters, and so on. The body of a GreenCard procedure specification allows the programmer to say exactly how to call the procedure, in its native language. The C code starts a block, and may thus start with declarations that create local variables. For example: %code int x, y; % x = foo( &y, GREY ); Here, `x' and `y' are declared as local variables. The local C variables declared at the start of the block scope over the rest of the body and the result-marshalling statements. (The C code may also mention values from included C header files, such as `GREY' above, or use global variables or structures declared earlier by GreenCard `%C' (or `%-') directives. 5.4 Result marshallingFunctions return their results using a `%result' statement. Side-effecting functions -- ones whose result type is `IO t' -- can also use `%fail' to specify the failure value.
5.4.1 Pure functionsThe `%result' statement takes a single DIS that describes how to translate one or more C values back into a single Haskell value. For example: %fun sin :: Float -> Float %call (float x) %code ans = sin(x); %result (float ans) As in the case of the `%call' statement, the `float' in the `%result' statement is the name of a DIS, chosen as before to coincide with the name of the type. A single DIS, `float', is used to denote both the translation from Haskell to C and that from C to Haskell, just as a data constructor can be used both to construct a value and to take one apart (in pattern matching). All the C variables bound in the `%call' statement, the `%result' statement, and all those bound in declarations at the start of the body, scope over all the result-marshalling statements (i.e. `%result' and `%fail'). 5.4.2 Arbitrary C resultsIn a result-marshalling statement an almost arbitrary C expression, enclosed in double quotes, can be used in place of a C variable name. The above example could be written more briefly like this(6): %fun sin :: Float -> Float %call (float x) %result (float "sin(x)") 5.4.3 Side effecting functionsA side effecting function returns a result of type `IO t' for some type `t'. The `IO' monad supports exceptions, so GreenCard allows them to be raised. The result-marshalling statements for a side-effecting call consists of zero or more `%fail' statements, each of which conditionally raise an exception in the `IO' monad, followed by a single `%result' statement that returns successfully in the `IO' monad. Just as in Section 5.4 Result marshalling, the `%result' statement gives a single DIS that describes how to construct the result Haskell value, following successful completion of a side-effecting operation. For example: %fun windowSize :: Window -> IO (Int,Int) %call (window w) %code struct WindowInfo wi; % GetWindowInfo( w, &wi ); %result (int "wi.x", int "wi.y") Here, a pairing DIS is used, with two `int' DISs inside it. The arguments to the `int' DISs are C record selections, enclosed in double quotes; they extract the relevant information from the `WindowInfo' structure that was filled in by the `GetWindowInfo' call(7). The `%fail' statement has two fields, each of which is either a C variable or a C expression, enclosed in double quotes. The first field is a boolean-valued expression that indicates when the call should fail; the second is a `(char *)'-value that indicates what sort of failure occurred. If the boolean is true (i.e. non zero) then the call fails with a `UserError' in the `IO' monad containing the specified string. For example: %fun fopen :: String -> IO FileHandle %call (string s) %code f = fopen( s ); %fail "f == NULL" "errstring(errno)" %result (fileHandle f) The assumption here is that `fopen' puts its error code in the global variable `errno', and `errstring' converts that error number to a string. `UserError's can be caught with `catch', but exactly which error occurred must be encoded in the string, and parsed by the error-handling code. This is rather slow, but errors are meant to be exceptional. 5.5 Automatic fill-inAny or all of the parameter-marshalling, body, and result-marshalling statements may be omitted. If they are omitted, GreenCard will "fill in" plausible statements instead, guided by the function's type signature. The rules by which GreenCard does this filling in are as follows:
5.6 ConstantsSome C header files define a large number of constants of a particular type. The `%const' statement provides a convenient abbreviation to allow these constants to be imported into Haskell. For example: %const PosixError [EACCES, ENOENT] This statement is equivalent to the following `%fun' statements: %fun EACCES :: PosixError %fun ENOENT :: PosixError After the automatic fill-in has taken place we would obtain: %fun EACCES :: PosixError %result (posixError "EACCES") %fun ENOENT :: PosixError %result (posixError "ENOENT") Each constant is made available as a Haskell value of the specified type, converted into Haskell by the DIS macro for that type. (It is up to the programmer to write a `%dis' definition for the macro -- see Section 6.2 DIS macros.) There are variant ways of declaring constants within the `%const' directive. Firstly, the type-name can be replaced by a DIS-name if you wish. Secondly, you may find the Haskell constant names `eACCES' and `eNOENT' somewhat ugly, so you may associate a different Haskell name with each C constant name. %const PosixError [ % errAccess = "EACCES", % errNoEnt = "ENOENT" % ] 5.7 PrefixesIn C it is common practice to give all function names in a library the same prefix, to minimize the impact on the common namespace. In Haskell we use qualified imports to achieve the same result. To simplify the conversion of C style namespace management to Haskell the `%prefix' statement specifies which prefixes to remove from the Haskell function names. module OpenGL where %prefix OpenGL %prefix gl %fun OpenGLInit :: Int -> IO Window %fun glSphere :: Coord -> Int -> IO Object This would define the two procedures init and sphere which would be implemented by calling OpenGLInit and glSphere respectively. 5.8 Arbitrary C inclusionsIt is often useful to be able to write arbitrary lines of C code outside any procedure specification, for instance to include a header file, define the layout of a C structure, or declare a C global variable. The `%C' directive (with its whitespace-preserving variant `%-') is provided expressly for this purpose. For example, either of %C #include <header.h>or %-#include <header.h>tells GreenCard to arrange that a specified C header file will be included by the C code it generates. As another example, for simple convenience one might wish to add data or type declarations directly to the generated C module, rather than in a separate header file. Thus: %-struct _iocb { %- int fd; %- void *buf; %- int pos; %- unsigned flags; %-}; %-typedef struct _iocb *FILE 6 Data Interface SchemesA Data Interface Scheme, or DIS, tells GreenCard how to translate from a Haskell data type to a C data type, and vice versa.
Figure 3: Grammar of DISs 6.1 Forms of DISsThe syntax of DISs is given in Figure 3. It is designed to be similar to the syntax of Haskell patterns. A DIS takes one of the following forms:
6.2 DIS macrosIt would be unbearably tedious to have to write out complete DISs in every procedure specification, so GreenCard supports DIS macros in much the same way that Haskell provides functions. (The big difference is that DIS macros can be used in "patterns" -- such as `%call' statements -- whereas Haskell functions cannot.) DIS macros allow the programmer to define abbreviations for commonly-occurring DISs. For example: newtype This = MkThis Int (Float, Float) %dis this x y z = MkThis (int x) (float y, float z) Along with the `newtype' declaration the programmer can write a `%dis' declaration that defines the DIS macro `this' in the obvious manner. DIS macros are simply expanded out by GreenCard before it generates code. So for example, if we write: %fun f :: This -> This %call (this p q r) ... GreenCard will expand the call to `this': %fun f :: This -> This %call (MkThis (int p) (float q, float r)) ... (In fact, `int' and `float' are also DIS macros defined in GreenCard's standard prelude, so the `%call' line is further expanded to something like: %fun f :: This -> This %call (MkThis ((declare "int" p in I# (%%Int p)) % (declare "float" q in F# (%%Float q), % declare "float" r in F# (%%Float r)))) ... The fully expanded calls describe the marshalling code in full detail; you can see why it would be inconvenient to write them out literally on each occasion!) Notice that DIS macros are automatically bidirectional; that is, they can be used to convert Haskell values to C and vice versa. For example, we can write: %fun f :: This -> This %call (this p q r) %code f( p, q, r, &a, &b, &c); %result (this a b c) The form of DIS macro definitions, given in Figure 3, is very simple. The formal parameters can only be variables (not patterns), and the right hand side is simply another DIS. Only first-order DIS macros are permitted. Note however that the quoting/escape mechanism for literal code enables one to use the value of a macro variable within a fragment of C code (or Haskell code). This feature is very powerful, as shown in Section 6.2.0 Marshalling complex structures. 6.2.0 Marshalling complex structuresThe full power of DIS macros becomes apparent when one attempts to map between a structured Haskell type and a structured C type. For example, let us study a Haskell `ColourPoint' type: data ColourPoint = CP Int Int Colour data Colour = Red | Green | Blue | ...for which we happen to want a representation in C as a `struct colourpoint': struct colourpoint { int x; int y; enum colour c; };It requires just two small DIS macros to capture the mapping: %dis colourPoint cp = % declare "struct colourpoint" cp in % CP (int "%cp.x") (int "%cp.y") (colour "%cp.c") %dis colour x = % declare "enum colour" x in % <fromEnum/toEnum> (int x)Using these, it is then very easy to implement the required interfaces to foreign functions which manipulate coloured points: %fun translate :: Int -> Int -> ColourPoint -> IO ColourPoint %call (int xrel) (int yrel) (colourpoint p) %code p.x += xrel; % p.y += yrel; % render(&p); %result (colourpoint "p")Note that in this example, the return value is actually the same structure as the argument value (destructively updated). It is for this reason that the final `p' is quoted as a C literal - it prevents the `declare' clause of the DIS macro from generating a second (overlapping) declaration of the variable in C. Here is a different example where it is more obvious that the literal-C argument to the `colourPoint' DIS should not generate a variable declaration: %fun nullPoint :: ColourPoint %result (colourPoint "{0,0,RED}") 6.3 Semantics of DISsHow does GreenCard use these DISs to convert between Haskell values and C values? We give an informal algorithm here, although most programmers should be able to manage without knowing the details. To convert from Haskell values to C values, guided by a DIS, GreenCard does the following:
Much the same happens in the other direction, except that GreenCard calls the `to_t' function when inside a user-defined DIS, and builds a value with a data constructor, rather than taking it apart. Again, C variables are declared of the appropriate types, although of course a literal C expression in a result does not generate a declaration. 7 Standard DISsFigure 4 gives the DIS macros that GreenCard provides as a "standard prelude".
Figure 4: Standard DISs 7.1 Haskell type extensionsSeveral of the provided DISs involve types that go beyond standard Haskell:
7.2 MaybeAlmost all DISs work on single-constructor data types. It is much less obvious how to translate values of multi-constructor data types to and from C. In fact, the right way to do it is through user-defined DISs. We illustrate how with a DIS for the `Maybe' type. The definition of a `maybe' DIS is: %dis maybeInt default x = <fromMaybe %default/toMaybe %default> (int x) fromMaybe def (Nothing) = def fromMaybe def (Just x) = x toMaybe def x | def == x = Nothing | otherwise = Just x where `default' is a Haskell expression which represents the `Nothing' value. Note how we use the `%' character to unquote the bound variable `default' within a context where it would otherwise be treated as literal Haskell. In the following example, the function `foo' takes an argument of type `Maybe Int'. If the argument value is `Nothing' it will bind `x' to `0'; if it is `Just a' it will bind `x' to the value of `a'. The return value will be `Just r' unless `r == -1' in which case it will be `Nothing'. %fun foo :: Maybe Int -> Maybe Int %call (maybeInt 0 x) %code r = foo(x); %result (maybeInt -1 r) 8 ImportsGreenCard "connects" with code in other modules in two ways:
9 Invoking GreenCardMost Haskell compilers invoke GreenCard automatically when they are given a source file with the extension `.gc'. However, the general syntax for invoking GreenCard as a stand-alone program is: greencard [options] [filename] GreenCard reads from standard input if no filename is given. The options can be any of these:
10 Related Work
11 Alternative design choices and avenues for improvementHere we summarise aspects of GreenCard that are less than ideal, and indicate possible improvements.
Footnotes(1)Microsoft's Common Object Model (COM) is a language-independent software component architecture. It allows objects written in one language to create objects written in another, and to call their methods. The two objects may be in the same address space, in different address spaces on the same machine, or on separate machines connected by a network. OLE is a set of conventions for building components on top of COM. (2)CORBA is a vendor-independent competitor of COM. (3)Only GHC aficionados will understand this code. The whole point of GreenCard is that Joe Programmer should not have to learn how to write this stuff. (4)The details of the filled-in statements will make more sense after reading the rest of Section 5 Procedure specifications (5)When there are no parameters, the `%call' line must be omitted. The second example can also be shortened by writing a C expression in the `%result' statement; see Section 5.4 Result marshalling. (6)It can be written more briefly still by using automatic fill-in (Section 5.5 Automatic fill-in). (7)This example also shows one way to interface to C procedures that manipulate structures. This document was modified heavily by Malcolm Wallace Nov/Dec 1997, from an original document generated on 21 March 1997 using the texi2html translator version 1.51. |