Analysis of programs

From Encyclopedia of Mathematics
Jump to: navigation, search

program analysis

Program analysis offers static (i.e. compile-time) techniques for predicting safe and computable approximations to the set of values or behaviours arising dynamically (i.e. at run-time) when executing a program upon a computer. A main application is to allow compilers to generate more efficient code by avoiding to generate code to handle errors that can be shown not to arise, and to generate more specialized and faster code by taking advantage of the availability of data in registers and caches. Among the newer applications is the validation of software (possibly developed by subcontractors) to reduce the likelihood of malicious behaviour. To illustrate the breadth of the approaches, five types of analyses will be discussed: i) flow-based analysis; ii) inference-based analysis; iii) semantics-based analysis; iv) abstract interpretation; and v) fixed-point approximation techniques. These approaches borrow mathematical techniques from lattice theory and logical systems.

Flow-based approaches to program analysis.

These have mainly been developed for imperative, functional and object-oriented programs. For a given program the task is to find a flow function that maps the program points in to approximate descriptions of the data that may reach those points. This may be formalized by a relation defined by a number of clauses depending on the syntax of the program . As a very simple example, consider the program that takes an input, executes on it, then executes on the result obtained from , and finally produces the result from as the overall result. The defining clause for this program may be written:'
<tbody> </tbody>
if and only if

In the presence of recursive programs, the predicate cannot be defined by structural induction on and instead one appeals to co-induction; this merely amounts to taking the greatest fixed point of a function mapping sets of pairs into sets of pairs as described by the right-hand sides of the clauses.

It is essential that the analysis information is semantically correct. Given a semantics for describing when the program maps the input to the output , this amounts in its simplest form to establishing a subject reduction result:

Finally, realization of the analysis amounts to finding the minimal (with respect to the pointwise partial order on ) for which ; this is related to the lattice-theoretic notion of being a Moore family. Often, this is done by obtaining a more implementation-oriented specification for generating a set of constraints , deriving a function mapping flow functions to flow functions, and ensuring that if and only if .

Inference-based approaches to program analysis.

These have mainly been developed for functional, imperative, and concurrent programs. For a given program , the task is to find an "annotated type" describing the overall effect of executing : if describes the input data, then describes the output data, and properties of the dynamic executions (e.g., whether or not errors can occur) are described by . This may be formalized by a relation defined by a number of inference rules (cf. Permissible law (inference)); some of these are defined compositionally in the structure of the program as in:

Here, combines the effects and into an overall effect; in the simplest cases, simply equals , but in more complex cases the causal structure (i.e. which effect comes first) must be retained. Successful analysis of recursive programs demands that invariants be guessed in order to obtain a finite inference tree. (This form of program analysis is closely related to program verification: see Theoretical programming, the section on "The logical theory of programs" .)

Semantic correctness necessitates relations between values and properties, e.g. , and between the effects of the dynamic semantics and the analysis, e.g. . The correctness statement then amounts to a subject reduction result:

Implementation of the analysis amounts to defining an algorithm that is syntactically sound, i.e. , and syntactically complete, i.e. implies that is an "instance" of ; the precise definition of "instance" depends on the nature of the properties and may have some complexity to it. Often, the algorithm works using constraints somewhat similar to those used in the flow-based approach.

Semantics-based approaches to program analysis.

These attempt to define the analysis in much the same way that the semantics itself is defined. These developments have mainly been based on denotational semantics (where is written ) for imperative and functional programs, although work has started on similar developments based on (so-called small-step) structural operational semantics and (so-called big-step) natural semantics (cf. also Theoretical programming). The denotational approach may be formulated as an analysis function , where is a mathematical entity describing the input to and is a mathematical entity describing the output from . The denotational approach calls for to be defined compositionally in terms of the syntactic constituents of the program ; an example of this is the functional composition

i.e., . In case of recursive programs, a least fixed point is taken of a suitable functional.

To express the correctness of the analysis one needs to define correctness relations on the data, e.g. , and on the computations, e.g. ; the latter definition is usually called a logical relation. Semantic soundness then amounts to establishing

for all programs of type . Efficient implementation of the analyses often demands that they be presented in a form less dependent on the program syntax, e.g. using graphs or constraints, and then the solution procedure must be shown sound and complete.

Abstract interpretation.

This is a general framework for obtaining cheaper and more approximative solutions to program analysis problems. In its most general form it attempts to be independent of the semantic base, but usually it is formulated in either a structural operational or a denotational setting. Given the semantics of a program , the first step is often to define the uncomputable collecting semantics . The intention is that

and this may be regarded as a soundness as well as a completeness statement.

To obtain a more approximate and eventually computable analysis, the most commonly used approach is to replace the sets of values by some more approximate properties . Both and are partially ordered in such a way that one generally wants the least correct element and where the greatest element is uninformative. (By the lattice duality principle, one could instead have desired the greatest correct element, and part of the literature on data flow analysis uses this vocabulary.) The relation between and is often formalized by a pair of functions:

where is the concretization function giving the "meaning" of properties in , and is the abstraction function. It is customary to demand that and are complete lattices (cf. Complete lattice). Furthermore, it is customary to demand that is a Galois connection, i.e. and are monotone and is extensive and is reductive , or, equivalently, that is an adjunction, i.e. . This ensures that a set of values always has a best description ; in fact, .

Abstract interpretation then amounts to define an analysis on the properties of such that the following correctness statement holds:

This may be illustrated by the diagram

Figure: a110580a

It expresses that a certain diagram semi-commutes.

Example analyses that can be obtained in this way include intervals as approximations of sets of integers, polygons in the plane as approximations of sets of pairs of numbers, representations of arithmetical congruences as approximations of periodic sets as well as combinations of these.

Fixed-point approximation.

This is essential for program analysis because most of the approaches can be reformulated as finding the least fixed point (cf. also Fixed point) of a monotone function on a complete lattice of properties; often, each element describes a set of values. There are two characterizations of the least fixed point that are of relevance here. Tarski's theorem ensures that

satisfies and hence is the least fixed point of . Transfinite induction allows one to define

and then the least fixed point is given by

for some cardinal number ; it suffices to let be the cardinality of . In simple cases, has finite height (all strictly increasing sequences are finite) and then may be taken to be a natural number ; this then suggests a computable procedure for calculating : simply do the finite number of calculations .

In general, cannot simply be taken to be a natural number and then one has to be content with a computable procedure for finding some such that ; for pragmatic reasons should be "as close to" as possible. An upper bound operator is a not necessarily monotone function such that for all . Given a not necessarily increasing sequence , one can then define

for all natural numbers . It is easy to show that is in fact an increasing sequence. A widening operator is an upper bound operator such that eventually stabilizes (i.e. ) for all increasing sequences . (A trivial and uninteresting example of a widening operator is given by .) One can now define

for all natural numbers . One can show that eventually stabilizes and that equals for some and that . Hence, and, furthermore, there is a computable procedure for calculating : simply do the finite number of calculations . Examples of "good" widening operators can be found in the literature.


[a1] P. Cousot, "Semantic foundations of program analysis" S.S. Muchnick (ed.) N.D. Jones (ed.) , Program Flow Analysis: Theory and Applications , Prentice-Hall (1981) pp. 303–342
[a2] S. Jagannathan, S. Weeks, "A unified treatment of flow analysis in higher-order languages" , Proc. POPL '95 , ACM Press (1995) pp. 393–407
[a3] N.D. Jones, F. Nielson, "Abstract interpretation: a semantics-based tool for program analysis" S. Abramsky (ed.) D.M. Gabbay (ed.) T.S.E. Maibaum (ed.) , Handbook of Logic in Computer Science , 4 , Oxford Univ. Press (1995) pp. 527–636
[a4] J.P. Talpin, P. Jouvelot, "The type and effect discipline" , Information and Computation , 111 (1994)
How to Cite This Entry:
Analysis of programs. H.R. NielsonF. Nielson (originator), Encyclopedia of Mathematics. URL:
This text originally appeared in Encyclopedia of Mathematics - ISBN 1402006098