OVERVIEW

MSCC (Memory Safe C Compiler) is a tool that ensures both temporal and spatial memory safety in C programs through a source-to-source transformation. MSCC was developed with the following criteria in mind:

  * Detect all spatial and temporal memory errors.
  * Handle most C programs with almost no source code changes.
  * Make no changes to the memory allocation model of C. Freed memory is immediately returned to the heap for reuse.
  * Keep performance overheads relatively low compared to previous techniques.

MSCC detects memory access errors at the level of memory blocks, which correspond to the smallest units of memory allocation, such as a global or local variable, or the memory returned by a single invocation of malloc. It flags memory errors only at the point where a pointer is dereferenced. For each pointer p, metadata describing the temporal and spatial attributes of the memory block pointed to by p are maintained at runtime and used to detect memory errors before p is dereferenced. The metadata are stored separately from the pointer for better backwards compatibility with untransformed libraries.

MSCC handles the most common forms of pointer arithmetic, as well as type casts that can be classified as upcasts (casts from a subtype to a supertype) or downcasts (casts from a supertype to a subtype), which account for most casts in typical C programs.

More details of the transformation can be found in our ACM SIGSOFT FSE 2004 paper: "An efficient and backwards-compatible transformation to ensure memory safety of C programs" (http://www.seclab.cs.sunysb.edu/seclab/pubs/papers/fse04.pdf)

MSCC is implemented in Objective Caml (http://caml.inria.fr) and uses CIL (http://manju.cs.berkeley.edu/cil/) as the front end to manipulate C constructs. The MSCC package is available at http://www.seclab.cs.sunysb.edu/mscc/.

This work is supported by NSF grants CCR-0098154 and CCR-0208877, and ONR grant N000140110967.
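The per-pointer metadata described in the overview can be pictured with a small C sketch. This is a hypothetical illustration only: the names block_info, ptr_meta, and access_ok are invented here, and MSCC's actual metadata layout and generated checks are those described in the FSE 2004 paper, not this code. The sketch shows one spatial attribute (the block's bounds) and one temporal attribute (whether the block is still allocated), both kept outside the pointer itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block record. It lives apart from the block itself, so
   it can still be consulted after the block is freed and its memory reused. */
typedef struct {
    bool live;              /* temporal attribute: false once the block is freed */
} block_info;

/* Hypothetical per-pointer metadata, stored separately from the pointer so
   the pointer keeps its ordinary C representation. */
typedef struct {
    uintptr_t base;         /* spatial: address of the block's first byte */
    uintptr_t end;          /* spatial: address one past the block's last byte */
    block_info *block;      /* temporal: the allocation the pointer refers to */
} ptr_meta;

/* Returns true if accessing `size` bytes at `p` is safe according to `m`.
   In this sketch, a transformed program would run such a check just before
   each dereference, and only there. */
static bool access_ok(const void *p, size_t size, ptr_meta m)
{
    uintptr_t addr = (uintptr_t)p;

    if (m.block == NULL || !m.block->live)
        return false;                   /* temporal error: block was freed */
    if (addr < m.base || addr > m.end || m.end - addr < size)
        return false;                   /* spatial error: out of bounds */
    return true;
}
```

Because the metadata never changes the pointer's own representation in this sketch, a plain pointer can still be handed to an untransformed library; the cost is that the metadata must be propagated alongside every copy of the pointer.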
COPYRIGHT

MSCC is distributed under the terms of the GNU General Public License Version 2. Some of the source code is borrowed from CCured (http://manju.cs.berkeley.edu/ccured/) and is governed by its own copyright terms.

STATUS

MSCC is alpha software. It is provided ONLY for research and evaluation purposes. MSCC has been tested on the Olden benchmark programs, several SPECINT benchmark programs, and several GNU utilities such as bc, gzip, patch, and tar. These programs range from 400 to 30,000 lines of code.

INSTALLATION

This version of MSCC has been tested with GNU C Compiler 3.2.2 on Red Hat Linux 9. It requires Objective Caml 3.07 and CIL 1.2.6.

* PREREQUISITES

1. Obtain and install Objective Caml 3.07.

       wget http://caml.inria.fr/pub/distrib/ocaml-3.07pl2/ocaml-3.07pl2.tar.gz
       tar xvfz ocaml-3.07pl2.tar.gz
       cd ocaml-3.07
       ./configure
       make world.opt
       make install

   OCaml will be installed into /usr/local/bin and /usr/local/lib.

2. Obtain and install CIL 1.2.6.

       wget http://manju.cs.berkeley.edu/cil/distrib/cil-1.2.6.tar.gz
       tar xvfz cil-1.2.6.tar.gz
       cd cil
       ./configure
       make
       make install

   The CIL library files and scripts will be installed into /usr/local/lib/cil and /usr/local/share/cil, respectively.

* INSTALLING MSCC

    tar xvfz mscc-0.2.2.tar.gz
    cd mscc-0.2.2
    ./configure
    make

The "configure" script accepts the following options:

  --cil-prefix (default: /usr/local)
      Set cil-libdir and cil-datadir to <prefix>/lib/cil and <prefix>/share/cil, respectively.
  --cil-libdir (default: /usr/local/lib/cil)
      Directory where the CIL library files are installed.
  --cil-datadir (default: /usr/local/share/cil)
      Directory where the CIL scripts are installed.

Once MSCC is compiled, you can add /path/to/mscc-0.2.2/bin to your PATH.

USING MSCC

MSCC can be used just like a C compiler, and it accepts most GCC options.
For example, the following command transforms "foo.c" and compiles the transformed source into an executable "foo":

    mscc -g -O2 -o foo foo.c

To use MSCC with make, set CC to "mscc" and AR to "mscc --mode=AR":

    make CC="mscc" AR="mscc --mode=AR" RANLIB="echo"

RANLIB is also disabled (set to "echo") because the "ranlib" command cannot recognize the intermediate files generated by mscc.

COMMON PROBLEMS WHEN USING MSCC

* RESOLVING MERGING ERRORS

MSCC merges all C files of an application into a single file and then applies the memory-safe transformation to the merged file. However, merging can sometimes fail. Most merging errors are caused by inconsistent declarations of the same global variable or function in different C source files, and are therefore easy to fix. For instance, tar 1.12 has a merging error caused by two different prototypes of xmalloc:

    void * xmalloc(); /* in xmalloc.c */

and

    char * xmalloc(); /* in xgetcwd.c */

This error can be fixed by changing "char *" to "void *" in the second xmalloc() declaration.

* SPECIFYING INTENDED DATA TYPE AT ALLOCATION SITES

MSCC needs to know the intended data type of each heap-allocated memory block at allocation time in order to generate appropriate code for allocating and initializing the associated pointer meta-data. If a malloc-allocated memory block is used for storing pointers, the intended data type of the block should be specified as an explicit type cast on the return value of malloc (or of another heap-allocation function). For example,

    void *p = malloc(1024);
    ...
    char **q = (char **)p;

should be changed to

    char **p = (char **) malloc(1024);
    ...
    char **q = p;

Because of the explicit type cast "(char **)" in the second code snippet, MSCC knows that the allocated block is intended to store "char *" pointers, and hence generates code to allocate meta-data for each "char *" pointer stored in the allocated block.
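A complete allocation site written in this recommended style might look like the following sketch. The function name copy_string_table is hypothetical; what matters is that the "(char **)" cast is applied directly to malloc's return value, so the element type of the block is visible at the allocation site rather than at some later use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a NULL-terminated table holding the first n
   pointers of `src`. Only the pointers are copied, not the strings.
   The (char **) cast on malloc's result marks the block as storage for
   "char *" pointers at the point where it is allocated. */
char **copy_string_table(char *const *src, size_t n)
{
    char **table = (char **) malloc((n + 1) * sizeof(char *));

    if (table == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        table[i] = src[i];      /* pointers are stored into the block */
    table[n] = NULL;            /* terminator */
    return table;
}
```

Sizing the block as a multiple of sizeof(char *) rather than a raw byte count also keeps the allocation consistent with the declared element type.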
* SPECIFYING MALLOC-LIKE USER-DEFINED MEMORY MANAGEMENT FUNCTIONS

For each call to a memory allocation function, MSCC generates additional code to allocate and initialize the meta-data associated with the allocated block. Therefore, all memory allocation functions should be registered with MSCC. By default, MSCC knows only about the standard library allocation functions such as malloc and calloc. User-defined allocation functions with semantics similar to malloc/calloc can be registered using the "csafealloc" pragma, e.g.

    #pragma csafealloc("xmalloc", nozero, sizein(1))
    #pragma csafealloc("xcalloc", zero, sizemul(1,2))

These two pragmas register "xmalloc" and "xcalloc" as allocation functions. xmalloc (like malloc) uses its first argument as the allocation size and does not zero out the allocated block, while xcalloc (like calloc) uses the product of its first two arguments as the allocation size and fills the allocated block with zeroes.

Similarly, deallocation functions can be specified using the "csafedealloc" pragma, e.g.

    #pragma csafedealloc("free")

Because user-defined memory management functions are not transformed, any functions they invoke should also remain untransformed. To tell MSCC not to transform a function, add the attribute "__compat__" to the function definition, e.g.

    static void * (__attribute__((__compat__)) fixup_null_alloc) (size_t n);

KNOWN BUGS AND LIMITATIONS

* USER-DEFINED MEMORY MANAGEMENT FUNCTIONS

Currently MSCC does not support user-defined memory management functions whose semantics differ from those of malloc/free, e.g. an allocation function that returns a matrix of objects.

* EXTERNAL FUNCTIONS

There are two major problems related to external functions. The first concerns function prototypes. MSCC changes function prototypes by introducing extra arguments that hold the meta-data pertaining to the original arguments.
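As a hypothetical illustration of this prototype change, a function that takes one pointer might gain one companion metadata argument. The name sum_t, the type meta_t, and the argument order are assumptions made for this sketch; they are not MSCC's actual generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical metadata passed alongside a pointer argument. */
typedef struct {
    uintptr_t base;     /* address of the block's first byte */
    uintptr_t end;      /* address one past the block's last byte */
} meta_t;

/* Original prototype, as written by the programmer. */
int sum(const int *a, size_t n);

/* Hypothetical transformed prototype: pointer argument `a` gains the
   companion argument `a_meta` carrying its metadata. */
int sum_t(const int *a, meta_t a_meta, size_t n);

int sum(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

int sum_t(const int *a, meta_t a_meta, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++) {
        /* Compute the address as an integer so no out-of-bounds pointer
           is ever formed; check it before dereferencing a[i]. */
        uintptr_t addr = (uintptr_t)a + i * sizeof *a;
        if (addr < a_meta.base || addr + sizeof *a > a_meta.end)
            abort();                    /* report a spatial memory error */
        s += a[i];
    }
    return s;
}
```

Because sum_t has a different type from sum, code that expects the original signature, such as an untransformed caller holding a function pointer of type int (*)(const int *, size_t), could not call the transformed version correctly.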
External functions are automatically marked, and their function prototypes are left unchanged. If a user-defined function is called by an external function as a callback, however, the modified prototype of the user-defined function will be incompatible with what the external function expects. To work around this problem, such user-defined functions can be manually marked as "__compat__" so that they are not transformed.

The second problem concerns the meta-data of external pointers. When an external function returns a pointer, MSCC assumes that the returned pointer is always valid and therefore assigns it special meta-data such that dereferencing the pointer always succeeds. This avoids the need for wrapper functions in most cases, but its drawback is that memory errors caused by such pointers won't be detected. Moreover, the current support for external pointers is limited to simple data structures such as "char *". If an external function returns a complex data structure that contains nested (deep-level) pointers, MSCC does not generate enough meta-data for every pointer contained in the data structure. Runtime memory errors may then occur without being detected. Even worse, the transformed program may terminate prematurely because of the lack of meta-data for validating pointer dereferences. Wrapper functions are required in these cases.

* TYPE CASTS

MSCC supports type casts that follow the upcast/downcast paradigm, e.g. casting from "char **" to "void *" and then to "int **", or casting from "struct A *" to "struct B *" and then back to "struct A *", where "struct A" is a larger structure and "struct B" is a smaller compatible structure.

There are two kinds of type casts that MSCC currently does not support. The first is casting from a pointer to an integer and then back to a pointer. The second is casting between structure pointers in a manner that violates the subtype criteria. Bad casts themselves don't cause runtime errors.
However, if the resulting pointer is dereferenced, a runtime memory error will be reported. Bad casts can usually be eliminated by modifying the source code. For example, if the integer in an integer-to-pointer cast previously stored a pointer value, the bad cast can be removed by changing the integer's type to "void *".

* POINTER ARITHMETIC

MSCC supports all pointer arithmetic that advances a pointer from one array element to another, whether this is achieved by a simple pointer increment or by first casting the pointer to "void *", adding carefully calculated offsets, and casting back to a pointer of the desired type. If pointer arithmetic on a pointer p doesn't satisfy this condition, p may still be dereferenced, but runtime errors will be reported if pointers contained within *p are accessed.
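The two supported styles of pointer arithmetic can be sketched as follows. The struct rec type and the function names are hypothetical. Arithmetic directly on "void *" is a GCC extension, so this sketch computes byte offsets through "char *" instead, which is the portable C way to express the same offset-and-cast-back pattern. The struct deliberately contains a pointer member, since that is where improper arithmetic would be noticed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type containing a pointer member. */
struct rec {
    int key;
    char *name;
};

/* Style 1: plain pointer increment from one element to the next. */
int sum_keys_inc(const struct rec *r, size_t n)
{
    int s = 0;
    for (const struct rec *p = r; p < r + n; p++)
        s += p->key;
    return s;
}

/* Style 2: cast to a byte pointer, add an offset that is a whole multiple
   of the element size, and cast back to the element type. Each resulting
   pointer still lands exactly on an array element. An offset such as
   (bytes + 1) would not, and according to the description above, accessing
   the name pointer through such a result would then be reported. */
int sum_keys_bytes(const struct rec *r, size_t n)
{
    const char *bytes = (const char *)r;
    int s = 0;
    for (size_t i = 0; i < n; i++) {
        const struct rec *p =
            (const struct rec *)(bytes + i * sizeof(struct rec));
        s += p->key;
    }
    return s;
}
```

Both functions visit exactly the same elements; they differ only in how the address of each element is computed.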