Precision

Analysis precision in sixgill works in three ways. This page discusses these views of precision and their limitations. Some of these limitations are fixable. Currently, though, these limitations can be addressed only with Trusted Annotations.

At the basic level, sixgill uses a memory model to determine the possible behaviors of loop free blocks of code. These loop free blocks of code can be single iterations of loops, or entire functions with the loops treated like function calls. A loop free block of code can only do finitely many things, and sixgill models these behaviors with near exact precision. Pointer operations, structure operations, control flow and arithmetic are all modelled exactly. The memory model is incomplete only in its understanding of nonlinear operations.

Nonlinear Operations

At a higher level, sixgill uses program conditions — preconditions, postconditions, loop invariants, and type/global invariants — to determine the possible behaviors of an entire codebase. These conditions can be automatically inferred by sixgill or supplied with annotations, and in either case the same restrictions apply. sixgill does not understand most program conditions which use quantifiers (effectively any annotation which uses loops or recursion), nor type invariants involving multiple objects.

Quantified Asserts
Type Invariants over multiple objects

Finally, sixgill can augment inferred preconditions, postconditions and loop invariants with information about the context those conditions were inferred from. This technique does not affect the expressive power of these conditions, but significantly increases the robustness and quality of sixgill's inference.

Condition Contexts

Nonlinear Operations

Nonlinear operations include all operations other than the arithmetic addition, subtraction, and multiply/divide/modulus/shifts by constants. Nonlinear operations are most commonly used for bitwise operations and for multiplication or shifts by variables. sixgill does not completely model nonlinear operations, and this shows up in sixgill reports as values assigned to variables that do not follow the semantics of those operations. In the following example, if x is assigned 5 and y is assigned 10, z may be assigned 3.
void foo(int x, int y)
{
  int z = x * y;
  ...
}
All nonlinear operations are modelled partially — for example, if a and b are unsigned then a & b <= a and a & b <= b. However, there are several ways sixgill's handling of nonlinear operations could be improved, and this is an active area of work.

Quantified Asserts

Quantification is used in any assertion which reasons about a potentially unbounded number of heap values. For the following example, quantification is used in three different ways, only one of which can currently be expressed by sixgill annotations.
char* strcat_list(char **strings, int strings_len)
{
  int len = 0;
  for (int i = 0; i < strings_len; i++)
    len += strlen(strings[i]);

  char *res = malloc(len + 1);

  char *pos = res;
  for (int i = 0; i < strings_len; i++) {
    strcpy(pos, strings[i]);
    pos += strlen(strings[i]);
  }

  res[len] = 0;
  return res;
}
1. strlen has a precondition requiring its argument to include a null terminator. The same holds for the second argument to strcpy. Null terminators are handled by the zterm annotation term, as described in Annotations. Internally, sixgill supports and infers other kinds of terminators (arrays terminated with a constant other than zero, or arrays of structures terminated by a field with a specific constant), though this is not yet exposed through the annotations.

2. strcat_list has a precondition requiring every element of its strings parameter to be null terminated. Array invariants such as this cannot yet be annotated. This functionality is a good candidate for improving the expressiveness of sixgill annotations.

3. The two loops in strcat_list have invariants that, respectively, len is the sum of the lengths of all strings up to i, and pos is at an index into res equal to that same sum of lengths. This general form of quantification is not supported in annotations, and is unlikely to be supported in the future. Code which uses this general form tends to be far more complicated than this example, and it quickly gets very difficult to specify and prove the quantified conditions.

Type Invariants over multiple objects

The only type invariants which sixgill supports describe various fields of a single heap object. For the following example, while the invariant on str can be annotated and used by sixgill when the program accesses values of type str, sixgill will not be able to check that the program actually maintains the invariant.
struct data {
  int *buf;
  ...
};

struct data_info {
  invariant(ubound(d->buf) >= len)
  struct data *d;
  int len;
};
Similarly, type invariants which mention global variables cannot be checked, as in the following example.
int *glob_buf;
int glob_len;

struct glob_key {
  invariant(index < glob_len)
  int index;
  ...
};
sixgill checks type invariants by checking assertions after every write in the program which might affect an invariant on some object. This approach breaks down for these two kinds of invariants — which structures are affected by the writes in the following two functions?
void init_data(struct data *d)
{
  d->buf = malloc(100 * sizeof(int));
  ...
}

void init_glob()
{
  glob_buf = malloc(100 * sizeof(int));
  glob_len = 100;
}
There are several potential solutions to this problem; a separate analysis could be used to identify objects whose invariants might be affected, annotations could be used to specify those objects, or sixgill could optimistically (and unsoundly) assume that no objects have their invariants affected. None of these solutions are supported yet.

Condition Contexts

Contexts are used by sixgill to infer preconditions, postconditions and loop invariants that do not always have to hold, but hold only in certain cases. To illustrate this, consider the following example:
int foo(char *buf, int flag)
{
  if (flag)
    buf[20] = 0;
}

void bar()
{
  char buf1[10];
  char buf2[30];
  foo(buf1, 0);
  foo(buf2, 1);
}
We are interested in checking that the write to buf[20] is safe. A correct precondition for foo is !flag || ubound(buf) > 20' This is actually fairly hard to reliably infer, as the test for flag may be intermingled with dozens of unrelated tests. Instead, sixgill only needs to infer a precondition which holds whenever the write to buf[20] is actually executed, namely when flag is set. The inferred precondition will be ubound(buf) > 20, which when paired with the context of the write is sufficient to check the safety of the write.

Contexts can be used to infer conditions referring to values not in scope. These are the 'frame(...)' values seen in some sixgill reports. Consider the following example:
int foo(int mx)
{
  for (int i = 0; i < mx; i++) {
    if (...)
      return i;
  }
  return mx;
}

void bar()
{
  char buf[100];
  int x = foo(99);
  buf[x] = 0;
}
Again, we are interested in checking that the write to buf[x] is safe. A correct postcondition for foo is return <= mx, though again sixgill does not have to infer this postcondition, only one which holds whenever foo is called by bar and the write to buf[x] is later executed. The inferred postcondition will be ubound(frame(1,buf)) > return where frame(1,buf) refers to the value of buf within bar.