Automatic code inspection aims to assist developers in writing code with fewer bugs by providing immediate feedback on the code being written. Code inspection tools accomplish this by performing a number of checks on the source code, typically using some form of static analysis. Static analysis entails a textual analysis of the program rather than observing data resulting from runs of the program (which is termed dynamic analysis). Based on the source code itself, various checks can be performed, ranging from constraints on variable types and missing parameter checks to flawed memory addressing. If a particular source code location violates any of the checks supported by the tool, it will issue a warning for that location. By automating the checking of various hazardous code constructs, code inspection tools have the potential to make bug finding easier and more efficient.
However, code inspection tools are notorious for producing too many warnings. In the studies performed in this thesis, we found that tools may flag up to 30% of the lines in a program as violating one or more checks supported by the tool. Clearly this is a long way from being a shortlist of potential bugs, and increases rather than decreases the inspection burden for the developer. There are two important reasons for this: (1) the static analysis may not be powerful enough to determine whether some location passes a check, and will conservatively issue a warning; (2) a check may be too generic and not apply in specific situations or programs. Warning issued by the tool that we do not consider real problems are termed false positives. Even if all false positives are discarded, the sheer number of remaining warnings may still deter developers in inspecting all of them. Automatic code inspection tools should thus be augmented with a means to focus on the most relevant warnings.
Existing approaches that provide such focus prioritize warnings using either the accuracy of the static analysis, the likelihood developers will take action on a certain warning, or the likelihood it represents a real fault. In this thesis, we use the likelihood the flawed construct will lead to an observable failure of the program. The path from faulty source code to failure starts with execution of that piece of code, resulting in corruption of program state and deviation from normal behavior, in turn recorded in the issue database for the program. This thesis proposes two criteria based on this path: execution likelihood, targeting the first step in the path, and historical fault likelihood, encompassing the complete path.
The execution likelihood criterion is implemented by one extra static analysis, dubbed ELAN (for Execution Likelihood ANalysis). The approach models control flow and data flow using graph representations of the program, where statements are nodes and edges represent transfer of control between the various statements. The probabilities associated with transfer of control from conditional statements (e.g., if,for,while in C programs) are modeled by either simple type heuristics or by approximating values for the variables involved in the condition. In conjunction with the probabilities associated with every edge, the path through the graph from execution start (e.g., main in a C program) to a certain location can be used to derive an estimate of the execution likelihood for that location.
The historical fault likelihood criterion is estimated by using the program's version history and its associated issue database. The first contains all versions of the source code for the program, and the second contains, amongst others, a record of all the bugs found in that program. Oftentimes, the version history and issue database can be linked in such a way that all source code lines involved in a bug fix can be identified. If a certain check is consistently violated in code associated with a bug, and not violated anywhere else, this check constitutes a good indication of potential bugs. By keeping score for every check whether it is on a bug fix line or a non-fix line, we can estimate a fault likelihood for that check.
In this thesis, we have placed particular emphasis on empirical validation of the proposed approaches. Contributions of this thesis can thus be summarized as both approaches and validation, in particular: (1) ELAN, an approach to use execution likelihood to prioritize code inspection results; (2) Empirical evaluation of ELAN on open-source programs, showing that simple controlflow-based techniques perform on par with more advanced dataflow-based techniques; (3) Approaches to link inspection warnings to bugs on three different levels of granularity (line, file, project); (4) Application of the historical fault likelihood approach on three industrial embedded C projects, using the checks from the MISRA C standard, a widely-used coding standard; (5) A detailed analysis of the empirical results of those three studies, showing which factors associated with specific programs influence the fault likelihood of checks; and (6) A method to select efficient checks (or rules from coding standards) for a certain program, based on their historical fault likelihood and the tradeoff between the potential number of issues covered by the check and the number of warnings generated using the check. This method reduced the number of warnings to inspect in our case studies by 63% to 95%, while still covering 64% to 95% of the bugs covered by the unreduced list of warnings.