Soundness

A main goal of sixgill is to be a verifier, to consider all possible behaviors of the code during analysis and to find all bugs. We are not there yet; this page discusses the various ways in which sixgill is unsound, our assessment of how these unsoundnesses affect its results, and approaches by which these unsoundnesses could be fixed. These unsoundnesses can be divided into three categories.

1. Unsoundness likely to miss bugs. These are serious issues.

Integer Overflow
Terminator Clobbering
Buffer Access Identification

2. Unsoundness which fixing could degrade tool results. Depending on the type of assertion being checked, these may or may not lead to missed bugs; for buffer overflow assertions, we doubt these issues are serious. However, they have a significant effect on the tool's memory model, and fixing them would need to preserve the precision and ease of use of the tool.

Type Safety
Modifies Sets

3. Unsoundness which fixing would not have much effect on the tool results.

Aliasing
Invariant Points
Indirect Calls
Analysis Timeouts

Three additional issues are more fundamental, and not fixable. First, trusted annotations (those of the 'assume' variety) are not checked; if these annotations are wrong, the results of the analysis will be affected. Second, only C/C++ code with available source is checked. Code in unanalyzed libraries, assembly code, or dynamically generated code (as from a JIT compiler) will not be checked. Finally, bugs in the tool itself can certainly affect its results.

Integer Overflow

sixgill does not consider the possibility of integer overflow in arithmetic computations. In the following example, the write to buf[i] will be checked successfully, even though the computation len * sizeof(int) might overflow.
int *buf = malloc(len * sizeof(int));
for (int i = 0; i < len; i++)
  buf[i] = 0;
It would be difficult to modify sixgill's memory model to directly model overflow in arithmetic computations. However, sixgill could, with some modification, check whether an operation can overflow, e.g. for the above example static_assert(len * sizeof(int) <= UINT_MAX). With knowledge of which operations might overflow, it would be straightforward to soundly consider the possibility of such overflow.

Terminator Clobbering

sixgill detects code which scans for a buffer terminator (e.g. null terminators in C strings), and code which establishes a terminator. It does not, however, consider cases where the terminator is overwritten by another value, which may break later code which scans for it. In the following example, the precondition on strlen requiring null termination will be checked successfully, even though foo might overwrite the terminator established by bar.
precondition(zterm(buf) >= 0)
int strlen(char *buf);

void foo(char *buf)
{
  for (int i = 0; i < 100; i++)
    buf[i] = ...;
}

void bar()
{
  char buf[100];
  ...
  buf[99] = 0;
  foo(buf);
  int len = strlen(buf);
}
This problem could be approached either by using a separate analysis (as for integer overflow above) or by improving the analysis as is to detect these terminator clobbers. The latter approach would require a sound treatment of modifies sets, see section below.

Buffer Access Identification

sixgill uses a heuristic to determine which memory accesses to treat as buffer accesses whose bounds need to be checked. This heuristic includes any access with an explicit array index, (e.g. buf[n] = ...), and dereferences whose target may have been produced by pointer arithmetic within the same function, such as *q in the following example:
if (...)
  p++;
...
q = p;
...
*q = ...;
This heuristic does not account for cases where a pointer was produced by arithmetic in one function, then passed to another function which accessed it. In the following example, *x will not be treated as a buffer access, and will not be checked (the &buf[n] term is not actually accessing the buffer).
int buf[100];
int* foo(int n)
{
  return &buf[n];
}

void bar()
{
  int *x = foo(100);
  *x = 0;
}
There are a few straightforward ways of fixing this, though doing so accurately (that is, accesses marked as buffer accesses should actually be buffer accesses) will likely require additional annotations.

Type Safety

sixgill assumes the analyzed program is type safe, namely that stack and heap objects are treated as having a consistent type throughout their lifetime. If an object is used as multiple types, sixgill will not correctly model reads and writes of the overlaying fields. In the following example, the write to pval->c will not be treated as modifying val.a and the assertion will be successfully checked.
struct str1 { int a; int b; };
struct str2 { int c; int d; };

struct str1 val;
val.a = 0;

struct str2 *pval = (struct str2*) &val;
pval->c = 1;

static_assert(val.a == 0);
This problem can be approached using a separate analysis to find type safety violations. For a previous project, Saturn, we investigated this problem in the Linux kernel, and found that the code is almost entirely type safe, though this can be extremely difficult to prove.

For this type safety project we developed sound techniques of dealing with the problems described in the sections on modsets, aliasing, and indirect calls below. These techniques could (and should) be applied to sixgill without much difficulty.

Modifies Sets

When analyzing a function, sixgill needs to know the modset for each called function, the set of heap locations that call might modify. It computes such sets, though these sets are usually incomplete. In the following example, even if bar might modify pval->a its modset will not include that location, and the assertion will be successfully checked.
struct str { int a; ... };

void bar();

void foo(struct str *pval)
{
  pval->a = 0;
  bar();
  static_assert(pval->a == 0);
}
Sound modsets could be constructed and used, though some analysis precision will be lost. This precision can be largely regained using a small number of additional annotations.

Aliasing

When analyzing a function, sixgill does not consider the possibility of aliasing between pointers passed into it or pointers returned from called functions. In the following example, even if a and b refer to the same location the assertion will be successfully checked.
void foo(int *a, int *b)
{
  *a = 0;
  *b = 1;
  static_assert(*a == 0);
}
This sort of aliasing does not occur much in programs, and is fairly easy to detect and model. As an aside, it is common for functions to create and internally use multiple pointers to the same location; this aliasing is modelled exactly by sixgill, as described in Precision.

Invariant Points

When checking that a type or global invariant is preserved by a write to some field or variable, sixgill only checks that the invariant is established by the end of the function or loop iteration containing the write. Intervening function calls can let callees see the structure before its invariant has been established. In the following example the assertion and invariant will both be successfully checked.
struct str {
  invariant(a <= b)
  int a;
  int b;
};

void bar(struct str *s)
{
  if (s->a >= 5)
    static_assert(s->b >= 5);
}

void foo(struct str *s)
{
  s->a = 5;
  bar(s);
  s->b = 6;
}
Fixing this would be fairly straightforward given the sets of locations used by each function, which can be computed in a similar manner to modifies sets.

Indirect Calls

sixgill needs a complete callgraph that accounts for the possible targets of all indirect calls (through both function pointers and virtual calls in C++). A separate analysis is run to identify these indirect targets, and while by and large this analysis gets correct results, there are some cases it misses:

1. Heap-allocated arrays of function pointers.
2. Function pointers passed as arguments to indirect calls.
3. Function pointers cast to another type (e.g. int) and later cast back to the function and called.

This sort of usage is extremely rare (about one dozen occurrences in all of Linux). Fixing these, however, would require additional annotations in some cases.

Analysis Timeouts

Timeouts are set in all the analyses in sixgill. Most timeouts that occur are in the final checking analysis, and these show up as (useful) reports. In earlier analyses, however, timeouts compromise soundness. For about .01% of functions the tool times out while computing the memory model for that function, and for about .1% of functions the tool times out while identifying assertions to check in that function.