GPU strong and branching bisimilarity checking

This page is dedicated to checking strong and branching bisimilarity of Labelled Transition Systems (LTSs) entirely on an NVIDIA GPU with at least CUDA compute capability 2.0. Bisimilarity checking is a useful computation for offline model checking, in which the state space is completely known and stored on disk, and temporal properties must be checked on it.

Download links for both the tools and a set of models for experimentation are provided on this page. To generate the state spaces of the involved models directly on a CPU, one of the model checking toolsets mCRL2, CADP or PRISM is required, depending on the particular model. In the paper, strong and branching bisimilarity checking is done on the CPU using the LTSmin toolset, in particular the LTSmin-reduce-dist tool.

REQUIRED PACKAGES: CUDA driver + SDK. To produce the state spaces of the used models on a CPU: CADP, mCRL2, and PRISM.

SOFTWARE: The tools required for GPU bisimilarity checking can be downloaded here.

Models and State Spaces

In the following, we present the characteristics of the analysed state spaces, with a brief description of each model and how to obtain it.
In the table, "n.a." stands for "not applicable", meaning that no corresponding experiment has been performed; for instance, no branching bisimilarity checking has been done for the 'coin8.3' model.

How to obtain a state space from a particular model depends on the tools that are required. Assuming that CADP, mCRL2, and PRISM have all been installed in standard paths, each state space can be generated from the corresponding model by running the script generate_statespace, which is included in all the archives. For the PRISM models, the resulting files also need to be converted to the .aut format; for this, the GPUreduce tool package includes a Python script called mdp2aut, which converts the state space files to the .aut format.

For the VLTS cases, the ones prefixed 'vasy_' and 'cwi_', direct links to the state spaces themselves are given. These spaces are in '.bcg' format, however; to convert them to the desired '.aut' format, use the CADP toolbox by running 'bcg_io A.bcg A.aut' for each .bcg file named A.

On most of the state spaces we also applied hiding, to analyse how branching bisimilarity checking performs for varying numbers of tau-transitions. In the tables, the resulting state spaces are marked h1 or h2. The GPUreduce tool package contains a tool called produce_hidden_auts, which deterministically selects fixed percentages of the labels in a state space and hides them, i.e. renames them to tau. After that, tau-compression is performed to ensure that no tau-cycles are constructed.

To prepare a .aut file for the GPU, autrelabel should be applied to it first. This tool renames all action labels to integers and sorts the outgoing transitions of each state by label; the latter is a precondition for the GPU bisimilarity checking tool.
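The relabelling step can be illustrated with a minimal sketch. The function name and the in-memory representation of transitions below are illustrative assumptions, not taken from the autrelabel tool itself; the sketch only shows the two operations the text describes: mapping action labels to integers and sorting each state's outgoing transitions by label.

```python
def relabel_and_sort(transitions):
    """transitions: list of (source, label, target) triples with string labels.
    Returns the same transitions with integer labels, grouped by source state
    and sorted by label, as the GPU tool requires."""
    label_ids = {}
    out = []
    for src, label, tgt in transitions:
        if label not in label_ids:
            label_ids[label] = len(label_ids)  # assign ids in first-seen order
        out.append((src, label_ids[label], tgt))
    # Group transitions per source state and sort each group by label id.
    out.sort(key=lambda tr: (tr[0], tr[1]))
    return out

# Hypothetical example: 'send' -> 0, 'ack' -> 1, 'tau' -> 2.
example = [(0, 'send', 1), (0, 'ack', 2), (1, 'tau', 0)]
print(relabel_and_sort(example))  # → [(0, 0, 1), (0, 1, 2), (1, 2, 0)]
```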
Finally, the bisimilarity checking tool gpureduce can be run as follows:

gpureduce <relabelled .aut file> [-br]

The -br argument is optional and selects branching bisimilarity, as opposed to the (default) strong bisimilarity.

Experimental Results

The table below presents our runtime measurements (in seconds) for the strong bisimilarity checking experiments.
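The general scheme underlying both the GPU tool and LTSmin's signature-based approach can be sketched on a toy LTS. This is a minimal CPU-side illustration of signature-based partition refinement for strong bisimilarity, not the tools' actual implementation; the function name and the tiny example LTS are assumptions made for illustration.

```python
def strong_bisim_blocks(num_states, transitions):
    """Compute a stable partition of states under strong bisimilarity.
    transitions: list of (source, label, target) triples."""
    block = [0] * num_states  # start with all states in a single block
    while True:
        # The signature of a state is the set of (label, target block)
        # pairs reachable in one step.
        sigs = [set() for _ in range(num_states)]
        for src, label, tgt in transitions:
            sigs[src].add((label, block[tgt]))
        # Assign a fresh block number to each distinct signature.
        index = {}
        new_block = []
        for s in range(num_states):
            key = frozenset(sigs[s])
            if key not in index:
                index[key] = len(index)
            new_block.append(index[key])
        if new_block == block:  # stable: no block was split this round
            return block
        block = new_block

# Hypothetical LTS: states 0 and 1 both do 'a' to states that both do 'b'.
lts = [(0, 'a', 2), (1, 'a', 3), (2, 'b', 2), (3, 'b', 3)]
blocks = strong_bisim_blocks(4, lts)
print(blocks[0] == blocks[1])  # → True: states 0 and 1 are strongly bisimilar
```

Each iteration of the while-loop corresponds to one partition refinement step; the GPU tool parallelises this over thousands of threads.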
The hardware setups for the experiments were as follows. For the GPU experiments, we used a machine with an Intel E5-2620 2.0 GHz CPU, running CentOS Linux, equipped with an NVIDIA K20m GPU; the latter contains 13 Streaming Multiprocessors, each with 192 Streaming Processors, and 5 GB of global memory. For the CPU experiments, we used a machine with a 48-core AMD Opteron 6172 CPU and 192 GB RAM, running Debian 6.0.7.

In the two tables below, '-' entries indicate that the experiment ran out of memory, either in the global memory of the GPU or in the RAM of the CPU machine, depending on the type of experiment. Entries 'o.o.t.' indicate running out of time; as a time bound, we selected 14 hours.
Finally, the table below presents our measured runtimes (in seconds) for the branching bisimilarity experiments. In this case, GPUreduce ss and GPU LR are no longer present: the first because the strong bisimilarity results already demonstrate how multi-way splitting affects the runtimes, and the second because the LR algorithm is limited to checking strong bisimilarity.
Some nice observations can be made. Firstly, the GPU bisimilarity checking tool requires an a priori known amount of memory, namely 2·|S| + 2·|T|, with S the set of states and T the set of transitions of an LTS. For LTSmin, which stores complete signatures in memory, this is not the case. In the case of vasy_6120, for instance, this means that the GPU approach can perform the entire computation within 5 GB, whereas LTSmin does not manage it with 192 GB. Of course, this is possible because repeatedly reconstructing the signatures is not too harmful in a many-core setting, in which many thousands of threads are run, whereas when running up to 32 threads, this would result in a serious runtime penalty.

Secondly, in some cases, the GPU tool is the slowest of all. Intuitively, this depends on the number of tau-transitions and the maximum branching factor of an LTS, since these influence how many inert paths there can potentially be, and therefore how much checking needs to be done in each partition refinement step to detect them. However, a clear relation between these LTS characteristics and the measured runtimes remains to be identified.
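The a priori memory bound makes it easy to check in advance whether an LTS fits on the GPU. A back-of-the-envelope sketch of the 2·|S| + 2·|T| bound follows; the 4-byte entry size and the example state and transition counts are assumptions for illustration, not figures from the paper.

```python
def gpu_memory_bytes(num_states, num_transitions, entry_bytes=4):
    """Memory required by the GPU tool: 2*|S| + 2*|T| entries,
    assuming each entry occupies entry_bytes bytes."""
    return (2 * num_states + 2 * num_transitions) * entry_bytes

# Hypothetical LTS with 10 million states and 100 million transitions:
needed = gpu_memory_bytes(10_000_000, 100_000_000)
print(needed / 2**30)  # in GiB; well under the K20m's 5 GB of global memory
```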