Active Model Learning for the Analysis of Network Protocols

Paul Fiterău-Broștean

promotor: prof. dr. F.W. Vaandrager (RU)
Radboud University
Date: 13 April 2018
Thesis: PDF


Network protocols have become deeply ingrained into our everyday lives. Each form of Internet communication involves a series of protocols, as do many of the applications we have come to use. These applications include web browsers and servers, mail clients, chat programs, etc. Note how these applications communicate with each other to provide a desired service. A web browser for example, communicates with a web server to retrieve a desired web page. It is this very communication that is governed by protocols implemented in the applications’ software. Unsurprisingly, verifying that these protocols are properly implemented is of crucial importance, especially when security is considered. Performing such a verification is an arduous task, more so in a black-box setting, where the protocol’s source code is not accessible. Let us expand on some of the tasks required by testing techniques commonly used in verification.

Classical testing techniques require the constant manual maintenance of a large test suite, and may fail to spot corner cases where implementations are wrong. Model-based testing uses a model of the protocol to automatically generate tests. Unfortunately, such a model is rarely provided by the protocol’s specification, so it needs to be manually constructed and maintained. The technique known as model learning can provide significant relief, as it allows for the automatic generation of models from implementations. The models can then be checked against properties extracted from the protocol specification. This can be done manually by inspecting the models, or automatically via model checking. Either way, the tester’s task is greatly facilitated.

One goal of this work is to promote model learning as a viable technique for verifying, or in broader terms, analyzing practical software such as protocol implementations. To that end, in Chapters 2 and 3, we use model learning with abstraction to obtain models of TCP and SSH implementations, respectively. We then perform model checking on these models, in order to analyze the adherence of the learned implementations to the corresponding protocol specifications. This analysis helps uncover various standard violations and bugs.

Another goal is to ease verification of protocol implementations by improving model learning techniques. The challenge posed is that while model learning techniques are useful, their application to verification is made difficult by the many restrictions they impose on the system we want to verify. Such restrictions may require the system to have parameter-less input/output interfaces, to be deterministic, or to have no temporal dependencies. Overcoming these restrictions may not be possible, or may involve significant manual work. The fewer the restrictions, the easier and wider application of learning techniques becomes. So it is a goal of this work to lift some of these restrictions by expanding or developing new learning techniques.

Many current learning algorithms require the system’s behavior to be completely deterministic, and as a result, restrict systems from generating arbitrary, unrelated values in outputs. This restriction greatly limits applicability of learning, as many systems, particularly network protocols, output these values in the form of nonces, sequence numbers and identifiers. Chapter 4 introduces an extension of a well-known framework by which we largely lift this restriction. This extension and other optimizations are integrated in Tomte, the learner implementing this framework, and tested over a series of benchmarks. Network protocols also commonly perform a wide range of arithmetic operations on data, whereas learning algorithms typically limit these operations to assignments and equality checks. A different learning framework provides means of supporting more advanced operations. RALib, the framework’s implementation, supports equality and inequality operations. In Chapter 5, we integrate into RALib extensions for handling inequalities over sums with constants, as well as the extension developed in Chapter 4. Integrating these extensions allows us to infer more detailed models of TCP client implementations. Upon analyzing these models, we find bugs which were made discoverable by the new extensions, and could not have been discovered in our earlier experiments on TCP.

Chapters 4 and 5 shed light on more foundational problems. Active learning algorithms are complex and often tied to the restrictions they impose. This makes them difficult to extend, or to adapt for specific usage scenarios, such as learning a system that cannot be reset. They also require optimization before they can be put to practice due to inefficiencies in the traditional framework. Chapter 6 proposes a learning framework based on SMT for confronting these problems. Within this framework, learning algorithms are expressed by more compact logical formulas. This enables quick prototyping of learning for even advanced formalisms. Breaking away from the traditional framework, our framework also removes the need for optimization and achieves high adaptability. We present extensions of our framework for various formalisms and scenarios. We provide an open-source implementation and use it to assess the framework’s effectiveness over a series of benchmarks.

Over the course of this thesis we explore different approaches for learning practical systems. Research on each approach is supported by implementations, experiments or case studies. Future work should evolve these approaches in order to further facilitate and widen their application to practical systems. In doing so, it should make it possible to verify even complex implementations with the simple click of a button.