Software Verification

Compilation Forms

wingolog, Static Single Assignment for Functional Programmers:

Data-race Detection

Edwin Westbrook, Jisheng Zhao, Zoran Budimlic, and Vivek Sarkar, ECOOP 2012, Practical Permissions for Race-free Parallelism: the paper presents a static type system for data-race using function permissions. The type system is gradual, meaning that the compiler inserts dynamic casts unless a programmer annotation indicates otherwise. Annotations are asserted, so a bad annotation does throw an exception–fail-stop semantics that assume a data-race is a bug. An execution is data-race free if no exceptions occur. The result is sound and precise for the execution. A data-race free execution does not imply a deterministic program for two reasons: first, a different schedule may result in a data-race; and second, the program is intended to be non-deterministic, and the data-race detection is only checking for unordered data-race that yields undefined behavior in the memory model. The paper extends the static system with gradual typing to the foreach array construct and the isolated keyword. Empirical studies show minimal overhead and few programmer annotations in practice. The object-stored permissions are reminiscent of Java Pathfinders dynamic check for aliasing which also stores a unique thread ID with each object access. The work is a good extension to the RV 2011 paper that requires significant programmer annotation for array-based parallelism using foreach.

Edwin Westbrook, Jisheng Zhao, Zoran Budimlic, and Vivek Sarkar, RV 2011, Permission Regions for Race-Free Parallelism: the paper presents a gradual type system for data-race detection that not only detects low-level races but high-level races such as the violation of atomic regions. “High-level races are especially insidious because they depend on programmer intent, and can occur even in well-synchronized code.” The ability to detect language level race is significant. The approach uses the notion of a permission region that is either from a programmer annotation or a result of compiler insertion. The runtime dynamically checks that conflicting permission regions are not concurrent at runtime–property of the schedule and not the actual program itself. Two regions conflict if both write a variable or one reads while the other writes in its permissions: “Two dynamic instances of permit statements are said to be conflicting if the write set of one overlaps the read or write sets of the other.” Regions can change dynamically then the base pointer is reassigned. As such, conflict is on the current set of variables that change with assignment. Parallel array constructs do require special annotation to break the array into subviews. The ECOOP 2012 work seems to remove that needed annotation. The algorithm is not precise due to how the compiler constructs the permission regions allowing for false positives. The authors claim that such false positives are rare events (and the claim is born out in the benchmarks). The algorithm assumes the safe initialization of final fields, and it checks a schedule for safe publication on constructors–a property of a trace and not the program itself. The language does include permission method annotations that require permissions to exist before method invocation that can be used to create atomic regions of code. Such annotations reduce the number of dynamic checks. The annotation inference algorithm operates on the AST and, “ the algorithm finds the highest ancestor a of n such that the path from a to n does not contain an async or an isolated . A permission region for x is then inserted around a in the AST, with x in the appropriate variable set.” The actual implementation to check for violation is done through an extension to the Object class where each object receives an acquire and release method for read and write permissions. These are invoked at the start and finish of permission regions. The methods are implemented by a simple state machine that detects conflict (the paper has a good figure showing the states in the machine).

Determinism

Jack B. Dennis, Guang R. Gao, and Vivek Sarkar, DFM 2012, Determinacy and Repeatability of Parallel Program Schemata: the paper precisely defines determinacy and repeatability using parallel execution graphs (PEGs) and function composition graphs (FCGs). A PEG is very similar to a computation graph. The structure of the FCG, unlike the PEG, is dependent on execution order in HJ program. The paper argues that the FCG provides semantics to the PEG. In other words, starting from the PEG, it is possible to construct an FCG to define the semantics of the HJ program. In this context, the semantics give the execution order of the operators. The sufficient condition for determinacy in an HJ program is that there are no conflicting unordered operations in the PEG (i.e., data-race free). A program with determinacy is also repeatable. A program is repeatable if every pair of concurrent operators commute (i.e., order does not matter). More directly, “A system is determinate if the same ultimate output is produced for every run of the system for a presented input, and this is true for all choices of input and for all interpretations of operators.” The property is compositional so any arbitrary interconnection of objects with determinacy yield a system with determinacy. Repeatable restricts the operators to a single interpretation. Given a specific interpretation, a repeatable system produces output for all runs that are strictly a function of inputs. Data-race is an important property of both determinacy and repeatability. In the context of this paper, data-race is any conflicting operation on a shared variable. It does not consider the happens-before ordering for that access. In other words, sequential consistency is guaranteed with programs are data-race free where data-race means a conflicting access that is not ordered by the happens-before relation. Conflicting accesses are ordered when they are marked as volatile or protected by a lock. This level of data-race freedom gives sequential consistent behavior from the runtime in the Java memory model. In the context of this paper, data-race free simply means no conflicting access.

Ming Kawaguchi, Patrick Rondon, Alexander Bakst, Ranjit Jhala., PLDI 2012, Deterministic Parallelism via Liquid Effects: The paper uses first order logic go construct a proof of determinism in concurrent code. The concurrent code is on the level of parallel for-loops ore co-begin over functions. The proof construction writes expressions from the source code that capture the range of addresses that are affected by the particular code. In particular the paper gives the default actions of Read and Write to a memory location. The expressions capture the ranges of addresses that are touched by Read or Write Writing the expressions requires loop-invariants and the ability to write function summaries. Given the summary expressed in first order logic as a relation on operations performed over ranges of addresses, the deterministic proof shows that actions that do not commute (e.g., Read and Write to a common memory location) operate on disjoint address ranges. The proof is accomplished by first projecting the function summaries to specific operations, renaming variables where appropriate to reflect different instances of program variables or calls to recursive functions, combining into a single expressions the projections for the two different actions that do not commute (e.g., disjunction when sequential and conjunction when parallel), and then asking the SMT solver to find an address that satisfies the constraints from both expressions. If no such address exists, then the program is deterministic. If such an address exists, then the code is non-deterministic as the two non-commuting actions conflict (e.g., a non-ordered read and write pair from different threads on a common memory location). The paper relies on liquid types to infer the loop-invariants and the expressions describing effects of program statements. When inference fails, the tool asks for annotation from the user. The benchmarks show the tool requires few annotation. The benchmarks were small in general. The paper presents a grammer for the language, a type-system for the inference and proof construction, and an implementation in CSolve which is built on the Boogie platform. Great paper for an approach to proving determinism and a source for benchmarks.

Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Julian Shun, PPOPP 2012, Internally deterministic parallel algorithms can be fast: the paper implements several benchmark algorithms in a manner that is internally deterministic and then measures speedup on those implementations. The implementations are all non-blocking and use a novel pattern for greedy algorithms built on reserve/commit over iterates. Of particular interest is the definition of internally deterministic with citations to its original definition. The definition is built on a trace, which in this context, is annotated DAG. The DAG represents the parallel computation independent of the schedule yielding the sequential and logical parallel portions of the computation. It is similar to the parallel execution chart or the computation graph in HJ. The annotations the DAG are the inputs and outputs to each operation in the trace. “A program is internally deterministic if for any fixed input I, all possible executions with input I result in equivalent traces.” And to be clear, “two traces are equivalent traces if they have equivalent final states, equivalent control-flow DAGs, and corresponding DAG nodes are annotated with equivalent return values.” The paper defines sufficient conditions to have an internally deterministic program as first, “all logically parallel accesses of shared objects to use operations that commute”; and second, logically parallel operations are linearizable (i.e., reduce to a valid sequential history under any possible interleaving).

Robert H. B. Netzer and Barton P. Miller, LOPLAS 1992, What Are Race Conditions? Some Issues and Formalizations: the paper presents some basic formalism for characterizing data races. The formalism is over an actual program history, and it includes a relation describing program dependence. What is slightly unusual in the formalism though is that it does have an notion of two events being concurrent, literally, happening at the same time. As such, for two events, a may precede b in some execution, b may precede a in some execution, and a and b happen at the same time in some execution. The important nuance is that when a and b are related in the order relation, it is the order in which they are observed in the execution. When they are not ordered in the relation (even transitively), then it implies they happened at the same time. As such, a general race is one in which two events are observed in either order or at the same time. A data-race is only one in which the events are observed at the same time. Any data-race is an error in a program with critical sections. Any general race is an error in a deterministic program. Given the definitions of race, an

internally deterministic program has no general races;
externally deterministic program may have general races by they do not affect the final result from run to run (i.e., the races commute);
associatively non-deterministic programs have general races between associative operations and are externally non-deterministic due to round-off errors; and
completely non-deterministic programs have general races and simply do not fall into any of the previous categorizations.

The paper assumes sequential consistency throughout, but the definition of data-race, where the two events a and b are unordered (and assumed to happen at the same time) precisely matches the condition where sequential consistency breaks down in the Java memory model. In the Java memory model, to guarantee sequential consistency, conflicting events must be happens-before ordered; otherwise, the behavior is undefined. The paper finally does argue that discovering all general races is NP-hard. You must determine all orders allowed by the program.

Formalisms, Complexity, and Decidability

Ahmed Bouajjani and Michael Emmi, POPL 2012, Analysis of Recursively Parallel Programs: the paper presents a simple formalism for recursively parallel programs and shows how several language constructs from Cilk, X10, Habanero, etc. can be reduced to the given language. Reachability in the presence of task passing (tasks are handed to either child or parent) in general is undecidable. With reasonable restrictions (passing tasks only to parent) reachability is decidable. There is some nuance between single wait (only waiting for a spawned task to complete) and multi-wait (waiting for all spawned tasks to complete) that affect the complexity of the reachability problem. In general Cilk like constructs are NP-complete. The async (X10) construct only yields that state reachability is decidable. Reachability is not stated. In the single wait cases, when there is no aliasing (i.e., nothing shared), the problem reduces to PTIME; otherwise, complexity is 2EXPTIME (no surprise). More interesting is the recursively parallel programs language. Its semantics essentially describe computation graphs. The analysis of multi-wait languages such as X10 in section 7 assumes that there is no aliasing, “Note that although X10 allows, in general, asynchronous tasks to interleave their memory accesses, our model captures only non-interfering tasks, by assuming either data-parallelism (i.e., disjoint accesses to data), or by assuming tasks are properly synchronized to ensure atomicity.” The difference in complexity between Cilk and X10 is due to the local-scope property. Cilk and foreach in X10 only use local-scope meaning that tasks return with only empty memory-regions (i.e., tasks do not pass up to the parent from a child). The more general async/finish constructs do not have the local-scope property. The problem is known to be EXPSPACE-hard.

Miscellaneous

Subodh Sharma, Ganesh Gopalakrishnan, and Greg Bronevetsky, SBMF 2012, A sound reduction of persistent-sets for deadlock detection in MPI applications: the paper presents a reduction on the POE persistent set the is sound by not complete. The goal is to detect deadlock in MPI programs that arise from mixing wildcard receives with deterministic receives in the presence of non-determinism on send orderings. The intuition in the algorithm is defined in the notion of commuting sends. Two sends to a common endpoint commute if there exists an enabled wildcard receive that is able to match with either send and matching with one send leads to a state where the other send is still enabled or matchable (i.e., is not disabled). With such a definition, the new reduction removes from the persistent sets created by the POE algorithm any sends that commute. The benchmark results show a significant reduction in the number of schedules explored to show deadlock freedom; however, such a result is not complete since it may remove schedules where an actual deadlock does exist. If completeness is needed, then the POE reduction is preferred.

Mohamed Faouzi Atig, Ahmed Bouajjani, Michael Emmi and Akash Lal, CAV 2012, Detecting Fair Non-Termination in Multithreaded Programs: The paper finds a schedule given a bounded context that leads to a non-terminating execution (if one exists under the context) in a multi-threaded program. The non-terminating schedule is divided into a stem and lasso. The stem is the finite prefix and the lasso is the infinite suffix. A fair execution means any lock which is acquired causing other threads to block before the lasso remains locked through the lasso and any thread that is enabled in the lasso executes at least one instruction in the lasso. The formalization is on interfaces defined by annotated traces. An annotated trace describes the start context (i.e., global variable valuations, procedure stacks, etc.) and an end context. The trace moves the program from the begin context to the end context. The begin and end of an annotated trace creates an interface. The lasso must begin and end at an interface (i.e., arrive at a point where the global valuations are the same). Further, the procedure stacks must be non-decreasing (i.e., it will infinitely repeat. The stem must move from the initial state to the begin interface of the lasso. The formalism is very clean. The actual algorithm to find the stem and lasso performs iterative context bounding on the program by creating a sequential program for each schedule and checking assertions for the stem, lasso, and stack. The tool is implemented in BOOGIE and byte code for verification.

Vineet Kahlon and Chao Wang,, CAV 2012, Lock Removal for Concurrent Trace Programs: The paper recognizes that SMT techniques do not scale efficiently with complex thread interactions and reduces those interactions by removing non-needed locks. The primary contribution of the paper is therefore an algorithm to remove redundant locks from a given program execution trace in preparation for a precise SMT analysis. The single program trace resolves all branches, address references, and loops. The information is used to precisely identify shared memory interactions along the trace. Non-shared memory interactions are considered invisible and replaced with NOPs in the program. The analysis then removes redundant locks. The algorithm to detect redundant locks is based on the lock acquisition pattern (LAP) algorithm and is thread local. The analysis first generates pairs of program locations (POIs) in the trace that start and end a single transition: a transition starts with a communication operation and ends at the next subsequent communication operations–all operations between are not visible or lock releases. For each POI in a single thread, the analysis computes the LAP. This POI and LAP computation is repeated for every thread. The LAPs between threads are then compared to see if they are inconsistent (i.e., actually constrain the valid schedules in the program). If the two LAPs from two different threads are found to be inconsistent, then the root lock causing the inconsistency is considered necessary to the program as it constrains the set of viable schedules. All locks that are not root-cause can be safely removed from the trace after which the trace is ready for SMT analysis. The results show several examples that complete with the non-needed lock interactions removed whereas before they ran out of memory. A good proof-of-concept that static-analysis can reduce the cost of precise analysis. The paper lumps its result in the context of predictive analysis in that it uses static-analysis to predict potential errors and then applies precise analysis to construct a proof of feasibility for the error.

Vincent Cave, Jisheng Zhao, Jun Shirako, and Vivek Sarkar, PPPJ 2011, Habanero-Java: the new adventures of old X10: the paper presents the Habanero Java (HJ) language with its compilation, runtime, and IDE framework. An interesting result is a set of 7 theorems claiming properties of different subsets of the HJ language features. The theorems are not formally proven in the paper (and no obvious reference is given for the proofs).

Michael Emmi, Shaz Qadeer, and Zvonimir Rakamaric, POPL 2011, Delay-Bounded Scheduling: the paper presents a scheduler for multi-threaded programs that is more efficient than iterative context scheduling. The scheduler essentially considers the number of delays from a round-robin schedule. Like iterative context bounding, the scheduler is sound only up to the bound on the number of delays. The paper shows how to implement the delay-bounded scheduler in three different ways including a reduction to a sequential program. The formalism is very well presented.

Roberto Lublinerman, Jisheng Zhao, Zoran Budimlic ́, Swarat Chaudhuri, and Vivek Sarkar, OOPSLA 2011, Delegated Isolation: this paper introduces the an async isolated keyword to Habanero Java. The keyword spawns a task intended to execute in isolation from any other task with which it may conflict. Tasks conflict if they access the same heap objects. To resolve a conflict between two tasks A and B, one task, say A, delegates itself to B. “Delegate” in this context means that A gives B all the heap objects in its region (i.e., those objects to which it has exclusive access), and A places itself on a task queue that B must process at a later date in order to complete the task that A had started. The paper uses the notion of declarative isolation similar to Chorus, but is distinct from Chorus with the delegation rather than replay for conflict. A task is referred to as an assembly in the paper, and an assembly has a region of the heap to which it has exclusive access. A general implementation of the runtime for delegated isolation (that runtime is named Aida in the paper), must implement a rollback mechanism to rollback heap mutations on delegation. The paper notes, however, that for cautious programs (i.e., programs that acquire all objects before any mutation), the rollback mechanism is not needed as the conflict is detected, and the delegation takes place, before any mutation. The paper shows the Aida runtime to be more efficient than transactional memory systems (DTSM2). The paper also discusses some minor details of the Aida runtime implementation including its fine-grain locking for detection object conflicts and implementing delegation (e.g., the union-find structure). The paper does present formal semantics for delegated isolation; those semantics block mutation before async isolated to simplify the presentation and avoid having to kill assemblies spawned before the delegation. As such, assignment/delegation and spawning new isolated blocks take place in phases.

Partial Order Reduction

Sarvani Vakkalanka, Ganesh Gopalakrishnan, and Robert M. Kirby, CAV 2008, Dynamic Verification of MPI Programs with Reductions in Presence of Split Operations and Relaxed Orderings: The work presents a partial order reduction for model checking MPI programs. The reduction is named partial order with elusive transitions (POE). The paper makes the observation that the MPI semantics relax the program issue order of MPI operations from different processes. The work also recognizes that when calls into MPI return to the caller, it does not mean message transactions are completed. For example, the completion of a send operation does not imply message reception. MPI semantics decouple the two operations. The POE algorithm runs each process to the point where the call into MPI must witness the completion of an operation. These witness operations are barriers, tests, and waits. When each process is at a witness operation, the set of issued MPI calls since the last witness is analyzed to determine what the MPI semantics allow in terms of behavior. The algorithm gives priority to barriers, which are issued as soon as possible and the involved processes are then again run to witness operations. In the absence of barriers, it implies that all processes are waiting/testing the completion of send or receive operations. The potential match sets are then constructed and explored. The model checker implementation intercepts all MPI calls and records those calls in a log. Once all processes are witnessing, the model checking chooses the action (with barriers getting preference) and issues the appropriate calls to the MPI runtime. In the case of wild-card receive calls, the model checker rewrites the calls to be deterministic by indicating the specific sends to which receives should match. The model checking witnesses the send/receive pairs by using blocking forms of the API to ensure a particular schedule. The paper confuses out-of-order execution with MPI semantics in the presentation. The program does not execute out-of-order. The program issues all calls into MPI in program order. MPI semantics account for the seeming reordering of those calls, but in reality, there is not reordering, just MPI semantics. As such, the ISP tool implements MPI semantics by first intercepting all the calls, and then generating deterministic schedules that reflect orderings allowed by the MPI semantics. The key is that the tool captures calls in program order, as does an actual runtime, and then generates a schedule allowed by the semantics, as does the actual runtime. Unlike the runtime though, the tool is able to enumerate all schedules allowed by MPI semantics to verify there is no deadlock or assertion violation.

Symbolic Execution

Sam Blackshear, Bor-Yuh Evan Chang, and Manu Sridharan, PLDI 2013, Thresher: Precise Refutations for Heap Reachability: the paper uses context-insensitive flow analysis to build a points-to graph that guides a more precise mixed symbolic-explicit representation of the heap in a backward symbolic execution. The goal is to prune out false-alarms in the points-to graph by either witnessing an actual error or refuting the path. As it must refute the path to prune a false-alarm, and the points-to graph is very imprecise, the approach must consider many edges in the graph before it is able to refute. To speed up the more precise analysis, the paper quickly prunes infeasible paths using the points-to graph (using through separation: the same variable cannot point to two different locations), a special mixed symbolic-explicit representation, and a way to infer loop invariants. That paper shows the value of the approach in an empirical study. A few comments on the loop-invariant claim and the empirical study. The loop-invariant is only valid under a bound on the number of materializations in the symbolic execution, and it abstracts the constraint set for pure (i.e. primitive) variable that are defined on each loop iteration. The analysis is still sound under the abstraction. The results are interesting. The three primary contributions, early refuting, mixed symbolic-explicit representation, and loop invariant inferring, in the results, show that neither contribution alone is the killer feature. In fact, the number of false alarms, with our without any one contribution, is unchanged. Rather, it is the running time that changes. As such, the informed backward symbolic execution adds the precision, and the other contributions improve the running time. Regardless, the paper yields important insights to the application of separation logic in backward symbolic execution with materialization. It also shows the importance of refuting. The contribution on inferring loop-invariance is unclear as it seems to be a direct application of a bound (which is rather common).

Formal Verification and Modeling in Human-Machine Systems

Matthew L. Boulton, Ellen J. Bass, and Radu I. Siminiceanu, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 2012, Using Formal Verification to Evaluate Human-Automation Interaction: A Review: the paper is an excellent starting point for an overview of work in modeling and verifying human-automation interaction (HAI). Sources of failure in HAI include brittle automation, poor interfaces that lack needed information, human attention or lack of training, incompatible behavior between the device and the human, and inadequate human-automation interface that causes mode confusion or lack of usability. Theorem proving is applied to these systems, but the bulk of the work is in model checking (more recently). Modeling is largely handled by some type of guarded transition system (or state machine system) expressed in a multitude of languages. Properties that can readily be verified by temporal logics include (Campos and Harrison)

Reachability
Visibility
Task related
Reliability

Table II has a great summary describing each of these aspects of a system. There is a significant body of work exploring mode confusion. This work also includes automatic construction of mental models to represent the human. These mental models though, “are incomplete, are difficult for the human operator to “run,” are unstable, do not have firm boundaries, are unscientific, and are “parsimonious” (overly abstract to save on mental complexity).” Other work is focused on human knowledge models and rather explores what the human is trying to accomplish “the approach requires that analysts capture the potentially imperfect knowledge the human operator has about achieving goals with the system.” When looking at system properties, there are two broad groups of research: task models and cognitive models. Task models are concerned with capturing the observable manifestation of human behavior while cognitive models are concerned with describing the cognitive process that drives the observable human behavior. I differentiate a task model as process driven while a cognitive model is goal driven (eventually arriving at tasks). Regardless, unlike the work on mode confusion, mental models, and human knowledge models, this later work is trying to model the human and understand the system given the human model. Most interesting is multi-operator systems:

A single task model could be created describing the behavior of every human operator in the system;

Separate task models could be created for each human operator with constructs being used to describe the coordinated and communicative relationships between activities and actions between operator tasks; and

A separate task model could be created for each human operator but with the addition of a shared structure which relates the coordinated and communicative relationships between the activities and actions contained in the separate models.

Cognitive models are goal driven, and there has yet to be research to compare cognitive models to task models (though the cognitive models seem to be more complex). A most interesting piece of future work identified is, “…investigating how common HAI metrics like timing, workload [169], [170], situation awareness [46], [171], trust [172], and compromise [50] can be extracted from verification analyses.” The paper also notes that, “Analysts could then use human subject testing to explore whether the potentially iden- tified problems can actually occur and explore how to correct them. Such a method offers advantages beyond using either of these two techniques separately.” As such, a role for the model is to discover when needs to be considered in a field study. Finally, figuring out measures that reflect the real world for duration of tasks, difficulty of tasks, workload, etc. is a wide open problem. What can be observed in the field, measured, and then integrated into the model in a way that enables the model to be predictive is largely unknown for many tasks.

Ontology Models and Verification

Keith A. Butler, Anne Hunt, John Muehleisen, Jiaji Zhang, and Beth Huffer, Ontology models for interaction design: case study of online support, CHI 2010.

A Usre-friendly Tool for Model Checking Health-care Workflows

vv-lab/annotated-bibliography.txt · Last modified: 2015/02/18 15:31 by egm

Back to top

Table of Contents