

Author Haidar OSMAN
Director of thesis Prof. Oscar Nierstrasz
Co-director of thesis
Summary of thesis

There is an increasing demand for high-quality software, as software bugs have an economic impact not only on software projects but also on national economies in general. Software quality is achieved mainly through the quality assurance activities of testing and code review. However, these activities are expensive, so they need to be carried out efficiently.

Auxiliary software quality tools, such as bug detection and bug prediction tools, help developers focus their testing and reviewing activities on the parts of the software that are more likely to contain bugs. However, these tools are far from being adopted as mainstream development tools. Previous research points to their inability to adapt to the peculiarities of individual projects and their high rate of false positives as the main obstacles to their adoption.

We propose empirically-grounded analysis to improve the adaptability and efficiency of bug detection and prediction tools. For a bug detector to be efficient, it needs to detect bugs that are conspicuous, frequent, and specific to a software project. We empirically show that null-related bugs fulfill these criteria and are worth building detectors for. We analyze the null dereferencing problem and find that its root cause lies in methods that return null. We propose an empirical solution to this problem that relies on the wisdom of the crowd. For each API method, we extract a nullability measure that expresses how often the return value of this method is checked against null across the ecosystem of the API. We use nullability to annotate API methods with nullness annotations and to warn developers about missing and excessive null checks.
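
A minimal sketch of the nullability idea, assuming call sites have already been extracted from client code in the API's ecosystem. The `CallSite` record, the method names `Foo.lookup` and `Foo.count`, the annotation labels, and both thresholds are hypothetical and chosen only for illustration; they are not the thesis's actual tooling or values. Nullability of a method is computed as the fraction of its call sites that check the return value against null; methods above one threshold are labeled nullable, those below another are labeled non-null, and call sites that disagree with the label are flagged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSite:
    # Hypothetical record: one client-code use of an API method's return value.
    method: str
    null_checked: bool  # True if the return value is compared against null here


def nullability(call_sites):
    """Per method: fraction of its call sites that null-check the return value."""
    checked = defaultdict(int)
    total = defaultdict(int)
    for site in call_sites:
        total[site.method] += 1
        checked[site.method] += site.null_checked  # bool counts as 0 or 1
    return {m: checked[m] / total[m] for m in total}


# Illustrative thresholds only; not values taken from the thesis.
def annotate(scores, nullable_at=0.5, non_null_at=0.1):
    """Label each method '@Nullable', '@NonNull', or None when undecided."""
    labels = {}
    for method, score in scores.items():
        if score >= nullable_at:
            labels[method] = "@Nullable"
        elif score <= non_null_at:
            labels[method] = "@NonNull"
        else:
            labels[method] = None
    return labels


def check(site, labels):
    """Flag a call site whose null handling disagrees with the crowd."""
    label = labels.get(site.method)
    if label == "@Nullable" and not site.null_checked:
        return "missing null check"
    if label == "@NonNull" and site.null_checked:
        return "excessive null check"
    return None


# Usage with made-up call sites for two hypothetical methods.
sites = (
    [CallSite("Foo.lookup", True)] * 3
    + [CallSite("Foo.lookup", False)]
    + [CallSite("Foo.count", False)] * 19
    + [CallSite("Foo.count", True)]
)
scores = nullability(sites)  # Foo.lookup -> 0.75, Foo.count -> 0.05
labels = annotate(scores)    # Foo.lookup -> '@Nullable', Foo.count -> '@NonNull'
```

Leaving a gap between the two thresholds keeps methods with mixed evidence unlabeled, so the sketch only warns where the crowd largely agrees.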

For a bug predictor to be efficient, it needs to be optimized both as a machine learning model and as a software quality tool. We empirically show how feature selection and hyperparameter optimization improve prediction accuracy. We then optimize bug prediction to locate the maximum number of bugs in the minimum amount of code by finding the most cost-effective combination of bug prediction configurations, i.e., independent variables, machine learning model, and response variable. We show that using both source code metrics and change metrics as independent variables, applying feature selection to them, and then using an optimized Random Forest to predict the number of bugs results in the most cost-effective bug predictor.
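
A minimal sketch of one effort-aware way to judge cost-effectiveness, assuming the predictor has already produced a predicted number of bugs per file: rank files by predicted bugs per line of code and measure what share of the actual bugs falls within a fixed fraction of the code base (20% here). The `FileEntry` record, the 20% budget, the stop rule, and all numbers are hypothetical; the thesis's own cost-effectiveness measure may differ, and the feature selection and Random Forest steps are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical per-file record produced after running a bug predictor.
    name: str
    loc: int               # lines of code, used as the inspection cost
    predicted_bugs: float  # predictor output (number of bugs)
    actual_bugs: int       # bugs later observed in the file


def rank_by_density(files):
    """Order files by predicted bugs per line of code, highest first."""
    return sorted(files, key=lambda f: f.predicted_bugs / f.loc, reverse=True)


def bugs_found_within(files, loc_budget=0.2):
    """Share of actual bugs found by inspecting ranked files until a fraction
    of total LOC is spent; stops at the first file that no longer fits."""
    total_bugs = sum(f.actual_bugs for f in files)
    if total_bugs == 0:
        return 0.0
    remaining = loc_budget * sum(f.loc for f in files)
    found = 0
    for f in rank_by_density(files):
        if f.loc > remaining:
            break
        remaining -= f.loc
        found += f.actual_bugs
    return found / total_bugs


# Usage with made-up numbers.
files = [
    FileEntry("A", 100, 3.0, 2),
    FileEntry("B", 900, 4.0, 3),
    FileEntry("C", 100, 0.5, 1),
    FileEntry("D", 900, 1.0, 0),
]
order = [f.name for f in rank_by_density(files)]  # ['A', 'C', 'B', 'D']
share = bugs_found_within(files)                   # 0.5: A and C fit in 400 LOC
```

Ranking by raw predicted count instead would put B first; under this stop rule its 900 lines alone exceed the 400-line budget, so no bugs would be found in this toy example. It illustrates why predicting the number of bugs and weighing it against code size can matter when inspection effort is limited.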

Throughout this thesis, we show how empirically-grounded analysis helps us build efficient bug prediction and detection tools and adapt them to the characteristics of each software project.

Status finished
Administrative delay for the defence 2017
URL http://scg.unibe.ch/staff/Osman
LinkedIn http://ch.linkedin.com/pub/haidar-osman/44/1bb/50b/