Principles of Program Analysis: Data Flow Analysis
Principles of Program Analysis: Data Flow Analysis. Transparencies based on Chapter 2 of the book: Flemming Nielson, Hanne Riis Nielson and Chris Hankin: Principles of Program Analysis. © Flemming Nielson & Hanne Riis Nielson & Chris Hankin. Springer Verlag 2005.
PPA Chapter 2
© F.Nielson & H.Riis Nielson & C.Hankin (May 2005)
Example Language

Syntax of While-programs:

a ::= x | n | a1 opa a2
b ::= true | false | not b | b1 opb b2 | a1 opr a2
S ::= [x := a]ℓ | [skip]ℓ | S1; S2 | if [b]ℓ then S1 else S2 | while [b]ℓ do S
Example:
[z:=1]1; while [x>0]2 do ([z:=z*y]3; [x:=x-1]4)
Abstract syntax – parentheses are inserted to disambiguate the syntax
PPA Section 2.1
Building an “Abstract Flowchart”

Example:

[z:=1]1; while [x>0]2 do ([z:=z*y]3; [x:=x-1]4)

init(· · ·) = 1
final(· · ·) = {2}
labels(· · ·) = {1, 2, 3, 4}
flow(· · ·) = {(1, 2), (2, 3), (3, 4), (4, 2)}
flowR(· · ·) = {(2, 1), (2, 4), (3, 2), (4, 3)}

[Flowchart: [z:=1]1 flows into the test [x>0]2; on “yes” control passes to [z:=z*y]3, then [x:=x-1]4, and back to the test; on “no” control leaves the program.]
Initial labels

init(S) is the label of the first elementary block of S:

init : Stmt → Lab

init([x := a]ℓ) = ℓ
init([skip]ℓ) = ℓ
init(S1; S2) = init(S1)
init(if [b]ℓ then S1 else S2) = ℓ
init(while [b]ℓ do S) = ℓ
Example: init([z:=1]1; while [x>0]2 do ([z:=z*y]3; [x:=x-1]4)) = 1
Final labels

final(S) is the set of labels of the last elementary blocks of S:

final : Stmt → P(Lab)

final([x := a]ℓ) = {ℓ}
final([skip]ℓ) = {ℓ}
final(S1; S2) = final(S2)
final(if [b]ℓ then S1 else S2) = final(S1) ∪ final(S2)
final(while [b]ℓ do S) = {ℓ}
Example: final([z:=1]1; while [x>0]2 do ([z:=z*y]3; [x:=x-1]4)) = {2}
Labels

labels(S) is the entire set of labels in the statement S:

labels : Stmt → P(Lab)

labels([x := a]ℓ) = {ℓ}
labels([skip]ℓ) = {ℓ}
labels(S1; S2) = labels(S1) ∪ labels(S2)
labels(if [b]ℓ then S1 else S2) = {ℓ} ∪ labels(S1) ∪ labels(S2)
labels(while [b]ℓ do S) = {ℓ} ∪ labels(S)
Example: labels([z:=1]1; while [x>0]2 do ([z:=z*y]3; [x:=x-1]4)) = {1, 2, 3, 4}
Flows and reverse flows

flow(S) and flowR(S) are representations of how control flows in S:

flow, flowR : Stmt → P(Lab × Lab)

flow([x := a]ℓ) = ∅
flow([skip]ℓ) = ∅
flow(S1; S2) = flow(S1) ∪ flow(S2) ∪ {(ℓ, init(S2)) | ℓ ∈ final(S1)}
flow(if [b]ℓ then S1 else S2) = flow(S1) ∪ flow(S2) ∪ {(ℓ, init(S1)), (ℓ, init(S2))}
flow(while [b]ℓ do S) = flow(S) ∪ {(ℓ, init(S))} ∪ {(ℓ′, ℓ) | ℓ′ ∈ final(S)}

flowR(S) = {(ℓ, ℓ′) | (ℓ′, ℓ) ∈ flow(S)}
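The defining equations for init, final and flow translate almost one-for-one into code. The sketch below uses an assumed tuple encoding of the While AST (my own, not the book's), and drops the expressions inside blocks since only labels matter for control flow; it is checked against the running example [z:=1]1; while [x>0]2 do ([z:=z*y]3; [x:=x-1]4).

```python
# Statements: ('assign', label), ('skip', label), ('seq', S1, S2),
#             ('if', label, S1, S2), ('while', label, body)

def init(s):
    if s[0] == 'seq':
        return init(s[1])                 # init(S1; S2) = init(S1)
    return s[1]                           # the label of the elementary block

def final(s):
    if s[0] == 'seq':
        return final(s[2])                # final(S1; S2) = final(S2)
    if s[0] == 'if':
        return final(s[2]) | final(s[3])  # union over both branches
    return {s[1]}                         # assign, skip, while: the block itself

def flow(s):
    kind = s[0]
    if kind in ('assign', 'skip'):
        return set()
    if kind == 'seq':
        s1, s2 = s[1], s[2]
        return flow(s1) | flow(s2) | {(l, init(s2)) for l in final(s1)}
    if kind == 'if':
        l, s1, s2 = s[1], s[2], s[3]
        return flow(s1) | flow(s2) | {(l, init(s1)), (l, init(s2))}
    if kind == 'while':
        l, body = s[1], s[2]
        return flow(body) | {(l, init(body))} | {(l2, l) for l2 in final(body)}

# [z:=1]1; while [x>0]2 do ([z:=z*y]3; [x:=x-1]4)
prog = ('seq', ('assign', 1),
               ('while', 2, ('seq', ('assign', 3), ('assign', 4))))

print(init(prog), final(prog), sorted(flow(prog)))
```

Running this reproduces the values on the “Abstract Flowchart” slide: init is 1, final is {2}, and flow is {(1, 2), (2, 3), (3, 4), (4, 2)}.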
Elementary blocks

A statement consists of a set of elementary blocks:

blocks : Stmt → P(Blocks)

blocks([x := a]ℓ) = {[x := a]ℓ}
blocks([skip]ℓ) = {[skip]ℓ}
blocks(S1; S2) = blocks(S1) ∪ blocks(S2)
blocks(if [b]ℓ then S1 else S2) = {[b]ℓ} ∪ blocks(S1) ∪ blocks(S2)
blocks(while [b]ℓ do S) = {[b]ℓ} ∪ blocks(S)

A statement S is label consistent if and only if any two elementary statements [S1]ℓ and [S2]ℓ with the same label in S are equal: S1 = S2. A statement where all labels are unique is automatically label consistent.
Intraprocedural Analysis

Classical analyses:
• Available Expressions Analysis
• Reaching Definitions Analysis
• Very Busy Expressions Analysis
• Live Variables Analysis

Derived analysis:
• Use-Definition and Definition-Use Analysis
Available Expressions Analysis

The aim of the Available Expressions Analysis is to determine:

For each program point, which expressions must have already been computed, and not later modified, on all paths to the program point.

Example (the point of interest is the entry to label 3):

[x:=a+b]1; [y:=a*b]2; while [y>a+b]3 do ([a:=a+1]4; [x:=a+b]5)

The analysis enables a transformation into

[x:=a+b]1; [y:=a*b]2; while [y>x]3 do ([a:=a+1]4; [x:=a+b]5)
Available Expressions Analysis – the basic idea

At an assignment x := a with information X1 and X2 on the incoming edges:

N = X1 ∩ X2
X = (N \ {expressions with an x})            (kill)
    ∪ {subexpressions of a without an x}     (gen)
Available Expressions Analysis

kill and gen functions:

killAE([x := a]ℓ) = {a′ ∈ AExp⋆ | x ∈ FV(a′)}
killAE([skip]ℓ) = ∅
killAE([b]ℓ) = ∅

genAE([x := a]ℓ) = {a′ ∈ AExp(a) | x ∉ FV(a′)}
genAE([skip]ℓ) = ∅
genAE([b]ℓ) = AExp(b)

data flow equations: AE=

AEentry(ℓ) = ∅                                        if ℓ = init(S⋆)
AEentry(ℓ) = ∩ {AEexit(ℓ′) | (ℓ′, ℓ) ∈ flow(S⋆)}      otherwise

AEexit(ℓ) = (AEentry(ℓ) \ killAE(Bℓ)) ∪ genAE(Bℓ)     where Bℓ ∈ blocks(S⋆)
Example: [x:=a+b]1; [y:=a*b]2; while [y>a+b]3 do ([a:=a+1]4; [x:=a+b]5)

kill and gen functions:

ℓ   killAE(ℓ)          genAE(ℓ)
1   ∅                  {a+b}
2   ∅                  {a*b}
3   ∅                  {a+b}
4   {a+b, a*b, a+1}    ∅
5   ∅                  {a+b}
Example (cont.): [x:=a+b]1; [y:=a*b]2; while [y>a+b]3 do ([a:=a+1]4; [x:=a+b]5)

Equations:

AEentry(1) = ∅
AEentry(2) = AEexit(1)
AEentry(3) = AEexit(2) ∩ AEexit(5)
AEentry(4) = AEexit(3)
AEentry(5) = AEexit(4)

AEexit(1) = AEentry(1) ∪ {a+b}
AEexit(2) = AEentry(2) ∪ {a*b}
AEexit(3) = AEentry(3) ∪ {a+b}
AEexit(4) = AEentry(4) \ {a+b, a*b, a+1}
AEexit(5) = AEentry(5) ∪ {a+b}
Example (cont.): [x:=a+b]1; [y:=a*b]2; while [y>a+b]3 do ([a:=a+1]4; [x:=a+b]5)

Largest solution:

ℓ   AEentry(ℓ)    AEexit(ℓ)
1   ∅             {a+b}
2   {a+b}         {a+b, a*b}
3   {a+b}         {a+b}
4   {a+b}         ∅
5   ∅             {a+b}
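The largest solution can be computed mechanically by starting every set at the full AExp⋆ = {a+b, a*b, a+1} and iterating the equations until nothing shrinks any further. The following is an illustrative sketch (the set-of-strings encoding of expressions is my own assumption):

```python
AEXP = {'a+b', 'a*b', 'a+1'}
KILL = {1: set(), 2: set(), 3: set(), 4: {'a+b', 'a*b', 'a+1'}, 5: set()}
GEN  = {1: {'a+b'}, 2: {'a*b'}, 3: {'a+b'}, 4: set(), 5: {'a+b'}}
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 3)}   # (5, 3) is the loop's back edge

entry = {l: set() if l == 1 else set(AEXP) for l in KILL}
exit_ = {l: set(AEXP) for l in KILL}

changed = True
while changed:
    changed = False
    for l in KILL:
        if l == 1:                       # l = init(S*): nothing is available
            new_entry = set()
        else:                            # intersection over all predecessors
            preds = [exit_[p] for (p, q) in FLOW if q == l]
            new_entry = set.intersection(*preds)
        new_exit = (new_entry - KILL[l]) | GEN[l]
        if (new_entry, new_exit) != (entry[l], exit_[l]):
            entry[l], exit_[l] = new_entry, new_exit
            changed = True

for l in sorted(KILL):
    print(l, sorted(entry[l]), sorted(exit_[l]))
```

Since every update only removes elements, the iteration descends to the greatest fixed point, which is exactly the table above (e.g. AEentry(3) = {a+b}, AEexit(4) = ∅).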
Why largest solution?

[z:=x+y]ℓ; while [true]ℓ′ do [skip]ℓ″

Equations:

AEentry(ℓ) = ∅
AEentry(ℓ′) = AEexit(ℓ) ∩ AEexit(ℓ″)
AEentry(ℓ″) = AEexit(ℓ′)

AEexit(ℓ) = AEentry(ℓ) ∪ {x+y}
AEexit(ℓ′) = AEentry(ℓ′)
AEexit(ℓ″) = AEentry(ℓ″)

After some simplification: AEentry(ℓ′) = {x+y} ∩ AEentry(ℓ′)

Two solutions to this equation: {x+y} and ∅
Reaching Definitions Analysis

The aim of the Reaching Definitions Analysis is to determine:

For each program point, which assignments may have been made and not overwritten, when program execution reaches this point along some path.

Example (the point of interest is the loop test [x>1]3):

[x:=5]1; [y:=1]2; while [x>1]3 do ([y:=x*y]4; [x:=x-1]5)

useful for definition-use chains and use-definition chains
Reaching Definitions Analysis – the basic idea

At an assignment [x := a]ℓ with information X1 and X2 on the incoming edges:

N = X1 ∪ X2
X = (N \ {(x, ?), (x, 1), · · ·})    (kill)
    ∪ {(x, ℓ)}                       (gen)
Reaching Definitions Analysis

kill and gen functions:

killRD([x := a]ℓ) = {(x, ?)} ∪ {(x, ℓ′) | Bℓ′ is an assignment to x in S⋆}
killRD([skip]ℓ) = ∅
killRD([b]ℓ) = ∅

genRD([x := a]ℓ) = {(x, ℓ)}
genRD([skip]ℓ) = ∅
genRD([b]ℓ) = ∅

data flow equations: RD=

RDentry(ℓ) = {(x, ?) | x ∈ FV(S⋆)}                    if ℓ = init(S⋆)
RDentry(ℓ) = ∪ {RDexit(ℓ′) | (ℓ′, ℓ) ∈ flow(S⋆)}      otherwise

RDexit(ℓ) = (RDentry(ℓ) \ killRD(Bℓ)) ∪ genRD(Bℓ)     where Bℓ ∈ blocks(S⋆)
Example: [x:=5]1; [y:=1]2; while [x>1]3 do ([y:=x*y]4; [x:=x-1]5)

kill and gen functions:

ℓ   killRD(ℓ)                  genRD(ℓ)
1   {(x, ?), (x, 1), (x, 5)}   {(x, 1)}
2   {(y, ?), (y, 2), (y, 4)}   {(y, 2)}
3   ∅                          ∅
4   {(y, ?), (y, 2), (y, 4)}   {(y, 4)}
5   {(x, ?), (x, 1), (x, 5)}   {(x, 5)}
Example (cont.): [x:=5]1; [y:=1]2; while [x>1]3 do ([y:=x*y]4; [x:=x-1]5)

Equations:

RDentry(1) = {(x, ?), (y, ?)}
RDentry(2) = RDexit(1)
RDentry(3) = RDexit(2) ∪ RDexit(5)
RDentry(4) = RDexit(3)
RDentry(5) = RDexit(4)

RDexit(1) = (RDentry(1) \ {(x, ?), (x, 1), (x, 5)}) ∪ {(x, 1)}
RDexit(2) = (RDentry(2) \ {(y, ?), (y, 2), (y, 4)}) ∪ {(y, 2)}
RDexit(3) = RDentry(3)
RDexit(4) = (RDentry(4) \ {(y, ?), (y, 2), (y, 4)}) ∪ {(y, 4)}
RDexit(5) = (RDentry(5) \ {(x, ?), (x, 1), (x, 5)}) ∪ {(x, 5)}
Example (cont.): [x:=5]1; [y:=1]2; while [x>1]3 do ([y:=x*y]4; [x:=x-1]5)

Smallest solution:

ℓ   RDentry(ℓ)                         RDexit(ℓ)
1   {(x, ?), (y, ?)}                   {(y, ?), (x, 1)}
2   {(y, ?), (x, 1)}                   {(x, 1), (y, 2)}
3   {(x, 1), (y, 2), (y, 4), (x, 5)}   {(x, 1), (y, 2), (y, 4), (x, 5)}
4   {(x, 1), (y, 2), (y, 4), (x, 5)}   {(x, 1), (y, 4), (x, 5)}
5   {(x, 1), (y, 4), (x, 5)}           {(y, 4), (x, 5)}
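Dually to Available Expressions, the smallest solution is reached by starting every set at ∅ and iterating upwards. A sketch, with (variable, label) pairs as tuples and '?' standing for the initial “uninitialised” label (an encoding assumption of mine):

```python
KILL = {1: {('x', '?'), ('x', 1), ('x', 5)},
        2: {('y', '?'), ('y', 2), ('y', 4)},
        3: set(),
        4: {('y', '?'), ('y', 2), ('y', 4)},
        5: {('x', '?'), ('x', 1), ('x', 5)}}
GEN  = {1: {('x', 1)}, 2: {('y', 2)}, 3: set(), 4: {('y', 4)}, 5: {('x', 5)}}
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 3)}
IOTA = {('x', '?'), ('y', '?')}          # extremal value at init(S*)

entry = {l: set() for l in KILL}
exit_ = {l: set() for l in KILL}

changed = True
while changed:
    changed = False
    for l in KILL:
        if l == 1:
            new_entry = set(IOTA)
        else:                            # union over all predecessors
            new_entry = set().union(*[exit_[p] for (p, q) in FLOW if q == l])
        new_exit = (new_entry - KILL[l]) | GEN[l]
        if (new_entry, new_exit) != (entry[l], exit_[l]):
            entry[l], exit_[l] = new_entry, new_exit
            changed = True

print(sorted(entry[3]))
```

Since every update only adds elements, the iteration climbs to the least fixed point, matching the table (e.g. RDentry(3) = {(x, 1), (y, 2), (y, 4), (x, 5)}).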
Why smallest solution?

[z:=x+y]ℓ; while [true]ℓ′ do [skip]ℓ″

Equations:

RDentry(ℓ) = {(x, ?), (y, ?), (z, ?)}
RDentry(ℓ′) = RDexit(ℓ) ∪ RDexit(ℓ″)
RDentry(ℓ″) = RDexit(ℓ′)

RDexit(ℓ) = (RDentry(ℓ) \ {(z, ?)}) ∪ {(z, ℓ)}
RDexit(ℓ′) = RDentry(ℓ′)
RDexit(ℓ″) = RDentry(ℓ″)

After some simplification: RDentry(ℓ′) = {(x, ?), (y, ?), (z, ℓ)} ∪ RDentry(ℓ′)

Many solutions to this equation: any superset of {(x, ?), (y, ?), (z, ℓ)}
Very Busy Expressions Analysis

An expression is very busy at the exit from a label if, no matter what path is taken from the label, the expression is always used before any of the variables occurring in it are redefined.

The aim of the Very Busy Expressions Analysis is to determine:

For each program point, which expressions must be very busy at the exit from the point.

Example (the point of interest is the entry to label 1):

if [a>b]1 then ([x:=b-a]2; [y:=a-b]3) else ([y:=b-a]4; [x:=a-b]5)

The analysis enables a transformation into

[t1:=b-a]A; [t2:=a-b]B; if [a>b]1 then ([x:=t1]2; [y:=t2]3) else ([y:=t1]4; [x:=t2]5)
Very Busy Expressions Analysis – the basic idea

Information flows backwards; at an assignment x := a with information N1 and N2 on the outgoing edges:

X = N1 ∩ N2
N = (X \ {all expressions with an x})    (kill)
    ∪ {all subexpressions of a}          (gen)
Very Busy Expressions Analysis

kill and gen functions:

killVB([x := a]ℓ) = {a′ ∈ AExp⋆ | x ∈ FV(a′)}
killVB([skip]ℓ) = ∅
killVB([b]ℓ) = ∅

genVB([x := a]ℓ) = AExp(a)
genVB([skip]ℓ) = ∅
genVB([b]ℓ) = AExp(b)

data flow equations: VB=

VBexit(ℓ) = ∅                                         if ℓ ∈ final(S⋆)
VBexit(ℓ) = ∩ {VBentry(ℓ′) | (ℓ′, ℓ) ∈ flowR(S⋆)}     otherwise

VBentry(ℓ) = (VBexit(ℓ) \ killVB(Bℓ)) ∪ genVB(Bℓ)     where Bℓ ∈ blocks(S⋆)
Example: if [a>b]1 then ([x:=b-a]2; [y:=a-b]3) else ([y:=b-a]4; [x:=a-b]5)

kill and gen functions:

ℓ   killVB(ℓ)   genVB(ℓ)
1   ∅           ∅
2   ∅           {b-a}
3   ∅           {a-b}
4   ∅           {b-a}
5   ∅           {a-b}
Example (cont.): if [a>b]1 then ([x:=b-a]2; [y:=a-b]3) else ([y:=b-a]4; [x:=a-b]5)

Equations:

VBentry(1) = VBexit(1)
VBentry(2) = VBexit(2) ∪ {b-a}
VBentry(3) = {a-b}
VBentry(4) = VBexit(4) ∪ {b-a}
VBentry(5) = {a-b}

VBexit(1) = VBentry(2) ∩ VBentry(4)
VBexit(2) = VBentry(3)
VBexit(3) = ∅
VBexit(4) = VBentry(5)
VBexit(5) = ∅
Example (cont.): if [a>b]1 then ([x:=b-a]2; [y:=a-b]3) else ([y:=b-a]4; [x:=a-b]5)

Largest solution:

ℓ   VBentry(ℓ)    VBexit(ℓ)
1   {a-b, b-a}    {a-b, b-a}
2   {a-b, b-a}    {a-b}
3   {a-b}         ∅
4   {a-b, b-a}    {a-b}
5   {a-b}         ∅
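The same descending iteration as for Available Expressions works here, only backwards: exits are intersected over successors rather than entries over predecessors. A sketch for the conditional above (string encoding of expressions is my assumption; all kill sets are empty for this program, so killVB is left implicit):

```python
GEN   = {1: set(), 2: {'b-a'}, 3: {'a-b'}, 4: {'b-a'}, 5: {'a-b'}}
FLOW  = {(1, 2), (2, 3), (1, 4), (4, 5)}
FINAL = {3, 5}
FULL  = {'a-b', 'b-a'}                   # AExp* for this program

entry = {l: set(FULL) for l in GEN}
exit_ = {l: set() if l in FINAL else set(FULL) for l in GEN}

changed = True
while changed:
    changed = False
    for l in GEN:
        if l in FINAL:
            new_exit = set()
        else:                            # intersection over successors
            succs = [entry[s] for (p, s) in FLOW if p == l]
            new_exit = set.intersection(*succs)
        new_entry = new_exit | GEN[l]    # killVB is empty for every block here
        if (new_entry, new_exit) != (entry[l], exit_[l]):
            entry[l], exit_[l] = new_entry, new_exit
            changed = True

print(sorted(entry[1]), sorted(exit_[1]))
```

The result is the largest solution of the table: both b-a and a-b are very busy at the entry of label 1, which is what justifies hoisting them.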
Why largest solution?

(while [x>1]ℓ do [skip]ℓ′); [x:=x+1]ℓ″

Equations:

VBentry(ℓ) = VBexit(ℓ)
VBentry(ℓ′) = VBexit(ℓ′)
VBentry(ℓ″) = {x+1}

VBexit(ℓ) = VBentry(ℓ′) ∩ VBentry(ℓ″)
VBexit(ℓ′) = VBentry(ℓ)
VBexit(ℓ″) = ∅

After some simplifications: VBexit(ℓ) = VBexit(ℓ) ∩ {x+1}

Two solutions to this equation: {x+1} and ∅
Live Variables Analysis

A variable is live at the exit from a label if there is a path from the label to a use of the variable that does not re-define the variable.

The aim of the Live Variables Analysis is to determine:

For each program point, which variables may be live at the exit from the point.

Example (the point of interest is label 1, whose assignment turns out to be dead):

[x:=2]1; [y:=4]2; [x:=1]3; (if [y>x]4 then [z:=y]5 else [z:=y*y]6); [x:=z]7

The analysis enables a transformation into

[y:=4]2; [x:=1]3; (if [y>x]4 then [z:=y]5 else [z:=y*y]6); [x:=z]7
Live Variables Analysis – the basic idea

Information flows backwards; at an assignment x := a with information N1 and N2 on the outgoing edges:

X = N1 ∪ N2
N = (X \ {x})                    (kill)
    ∪ {all variables of a}       (gen)
Live Variables Analysis

kill and gen functions:

killLV([x := a]ℓ) = {x}
killLV([skip]ℓ) = ∅
killLV([b]ℓ) = ∅

genLV([x := a]ℓ) = FV(a)
genLV([skip]ℓ) = ∅
genLV([b]ℓ) = FV(b)

data flow equations: LV=

LVexit(ℓ) = ∅                                         if ℓ ∈ final(S⋆)
LVexit(ℓ) = ∪ {LVentry(ℓ′) | (ℓ′, ℓ) ∈ flowR(S⋆)}     otherwise

LVentry(ℓ) = (LVexit(ℓ) \ killLV(Bℓ)) ∪ genLV(Bℓ)     where Bℓ ∈ blocks(S⋆)
Example: [x:=2]1; [y:=4]2; [x:=1]3; (if [y>x]4 then [z:=y]5 else [z:=y*y]6); [x:=z]7

kill and gen functions:

ℓ   killLV(ℓ)   genLV(ℓ)
1   {x}         ∅
2   {y}         ∅
3   {x}         ∅
4   ∅           {x, y}
5   {z}         {y}
6   {z}         {y}
7   {x}         {z}
Example (cont.): [x:=2]1; [y:=4]2; [x:=1]3; (if [y>x]4 then [z:=y]5 else [z:=y*y]6); [x:=z]7

Equations:

LVentry(1) = LVexit(1) \ {x}
LVentry(2) = LVexit(2) \ {y}
LVentry(3) = LVexit(3) \ {x}
LVentry(4) = LVexit(4) ∪ {x, y}
LVentry(5) = (LVexit(5) \ {z}) ∪ {y}
LVentry(6) = (LVexit(6) \ {z}) ∪ {y}
LVentry(7) = {z}

LVexit(1) = LVentry(2)
LVexit(2) = LVentry(3)
LVexit(3) = LVentry(4)
LVexit(4) = LVentry(5) ∪ LVentry(6)
LVexit(5) = LVentry(7)
LVexit(6) = LVentry(7)
LVexit(7) = ∅
Example (cont.): [x:=2]1; [y:=4]2; [x:=1]3; (if [y>x]4 then [z:=y]5 else [z:=y*y]6); [x:=z]7

Smallest solution:

ℓ   LVentry(ℓ)   LVexit(ℓ)
1   ∅            ∅
2   ∅            {y}
3   {y}          {x, y}
4   {x, y}       {y}
5   {y}          {z}
6   {y}          {z}
7   {z}          ∅
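Live Variables combines the two variations seen so far: backward flow and union, so the smallest solution is found by ascending iteration over successors. An illustrative sketch for the program above:

```python
KILL  = {1: {'x'}, 2: {'y'}, 3: {'x'}, 4: set(), 5: {'z'}, 6: {'z'}, 7: {'x'}}
GEN   = {1: set(), 2: set(), 3: set(), 4: {'x', 'y'},
         5: {'y'}, 6: {'y'}, 7: {'z'}}
FLOW  = {(1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7)}
FINAL = {7}

entry = {l: set() for l in KILL}
exit_ = {l: set() for l in KILL}

changed = True
while changed:
    changed = False
    for l in KILL:
        if l in FINAL:
            new_exit = set()
        else:                            # union over successors
            new_exit = set().union(*[entry[s] for (p, s) in FLOW if p == l])
        new_entry = (new_exit - KILL[l]) | GEN[l]
        if (new_entry, new_exit) != (entry[l], exit_[l]):
            entry[l], exit_[l] = new_entry, new_exit
            changed = True

print(sorted(exit_[1]))   # [] : x is dead right after [x:=2]1
```

LVexit(1) = ∅ confirms that the assignment [x:=2]1 is dead and may be removed, exactly the transformation shown on the earlier slide.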
Why smallest solution?

(while [x>1]ℓ do [skip]ℓ′); [x:=x+1]ℓ″

Equations:

LVentry(ℓ) = LVexit(ℓ) ∪ {x}
LVentry(ℓ′) = LVexit(ℓ′)
LVentry(ℓ″) = {x}

LVexit(ℓ) = LVentry(ℓ′) ∪ LVentry(ℓ″)
LVexit(ℓ′) = LVentry(ℓ)
LVexit(ℓ″) = ∅

After some calculations: LVexit(ℓ) = LVexit(ℓ) ∪ {x}

Many solutions to this equation: any superset of {x}
Derived Data Flow Information

• Use-Definition chains or ud chains: each use of a variable is linked to all assignments that reach it:

  [x:=0]1; [x:=3]2; (if [z=x]3 then [z:=0]4 else [z:=x]5); [y:=x]6; [x:=y+z]7

• Definition-Use chains or du chains: each assignment to a variable is linked to all uses of it:

  [x:=0]1; [x:=3]2; (if [z=x]3 then [z:=0]4 else [z:=x]5); [y:=x]6; [x:=y+z]7
ud chains

ud : Var⋆ × Lab⋆ → P(Lab⋆) is given by

ud(x, ℓ′) = {ℓ | def(x, ℓ) ∧ ∃ℓ″ : (ℓ, ℓ″) ∈ flow(S⋆) ∧ clear(x, ℓ″, ℓ′)}
          ∪ {? | clear(x, init(S⋆), ℓ′)}

where
• def(x, ℓ) means that the block ℓ assigns a value to x
• clear(x, ℓ, ℓ′) means that none of the blocks on a path from ℓ to ℓ′ contains an assignment to x but that the block ℓ′ uses x (in a test or on the right hand side of an assignment)
ud chains – an alternative definition

UD : Var⋆ × Lab⋆ → P(Lab⋆) is defined by:

UD(x, ℓ) = {ℓ′ | (x, ℓ′) ∈ RDentry(ℓ)}    if x ∈ genLV(Bℓ)
UD(x, ℓ) = ∅                              otherwise

One can show that: ud(x, ℓ) = UD(x, ℓ)
du chains

du : Var⋆ × Lab⋆ → P(Lab⋆) is given by

du(x, ℓ) = {ℓ′ | def(x, ℓ) ∧ ∃ℓ″ : (ℓ, ℓ″) ∈ flow(S⋆) ∧ clear(x, ℓ″, ℓ′)}    if ℓ ≠ ?
du(x, ℓ) = {ℓ′ | clear(x, init(S⋆), ℓ′)}                                     if ℓ = ?

One can show that: du(x, ℓ) = {ℓ′ | ℓ ∈ ud(x, ℓ′)}
Example: [x:=0]1; [x:=3]2; (if [z=x]3 then [z:=0]4 else [z:=x]5); [y:=x]6; [x:=y+z]7

ud(x, ℓ):

ℓ   x     y     z
1   ∅     ∅     ∅
2   ∅     ∅     ∅
3   {2}   ∅     {?}
4   ∅     ∅     ∅
5   {2}   ∅     ∅
6   {2}   ∅     ∅
7   ∅     {6}   {4, 5}

du(x, ℓ):

ℓ   x           y     z
1   ∅           ∅     ∅
2   {3, 5, 6}   ∅     ∅
3   ∅           ∅     ∅
4   ∅           ∅     {7}
5   ∅           ∅     {7}
6   ∅           {7}   ∅
7   ∅           ∅     ∅
?   ∅           ∅     {3}
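The alternative definition UD makes ud chains directly computable from a Reaching Definitions solution plus the genLV (use) sets. Below is an illustrative sketch for the example program; the kill/gen/flow tables are derived by hand from the program, and '?' again stands for the uninitialised marker:

```python
# RD instance for [x:=0]1; [x:=3]2; (if [z=x]3 then [z:=0]4 else [z:=x]5);
#                 [y:=x]6; [x:=y+z]7
X_DEFS = {('x', '?'), ('x', 1), ('x', 2), ('x', 7)}
KILL = {1: set(X_DEFS), 2: set(X_DEFS), 3: set(),
        4: {('z', '?'), ('z', 4), ('z', 5)}, 5: {('z', '?'), ('z', 4), ('z', 5)},
        6: {('y', '?'), ('y', 6)}, 7: set(X_DEFS)}
GEN  = {1: {('x', 1)}, 2: {('x', 2)}, 3: set(),
        4: {('z', 4)}, 5: {('z', 5)}, 6: {('y', 6)}, 7: {('x', 7)}}
USES = {1: set(), 2: set(), 3: {'z', 'x'}, 4: set(),
        5: {'x'}, 6: {'x'}, 7: {'y', 'z'}}       # genLV of each block
FLOW = {(1, 2), (2, 3), (3, 4), (3, 5), (4, 6), (5, 6), (6, 7)}
IOTA = {(x, '?') for x in 'xyz'}

entry = {l: set() for l in KILL}
changed = True
while changed:
    changed = False
    for l in KILL:
        new = set(IOTA) if l == 1 else \
            set().union(*[(entry[p] - KILL[p]) | GEN[p]
                          for (p, q) in FLOW if q == l])
        if new != entry[l]:
            entry[l] = new
            changed = True

def UD(x, l):
    # UD(x, l) = {l' | (x, l') in RDentry(l)} if x is used at l, else empty
    return {l2 for (y, l2) in entry[l] if y == x} if x in USES[l] else set()

print(sorted(UD('x', 3), key=str), sorted(UD('z', 7)))
```

The computed chains agree with the ud table above, e.g. UD(x, 3) = {2}, UD(z, 3) = {?} and UD(z, 7) = {4, 5}.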
Theoretical Properties

• Structural Operational Semantics
• Correctness of Live Variables Analysis
PPA Section 2.2
The Semantics

A state is a mapping from variables to integers: σ ∈ State = Var → Z

The semantics of arithmetic and boolean expressions:

A : AExp → (State → Z)    (no errors allowed)
B : BExp → (State → T)    (no errors allowed)
The transitions of the semantics are of the form

⟨S, σ⟩ → σ′  and  ⟨S, σ⟩ → ⟨S′, σ′⟩
Transitions

⟨[x := a]ℓ, σ⟩ → σ[x ↦ A[[a]]σ]

⟨[skip]ℓ, σ⟩ → σ

⟨S1, σ⟩ → ⟨S1′, σ′⟩  implies  ⟨S1; S2, σ⟩ → ⟨S1′; S2, σ′⟩

⟨S1, σ⟩ → σ′  implies  ⟨S1; S2, σ⟩ → ⟨S2, σ′⟩

⟨if [b]ℓ then S1 else S2, σ⟩ → ⟨S1, σ⟩    if B[[b]]σ = true

⟨if [b]ℓ then S1 else S2, σ⟩ → ⟨S2, σ⟩    if B[[b]]σ = false

⟨while [b]ℓ do S, σ⟩ → ⟨(S; while [b]ℓ do S), σ⟩    if B[[b]]σ = true

⟨while [b]ℓ do S, σ⟩ → σ    if B[[b]]σ = false
Example (σxyz denotes the state mapping x, y and z to the three digits in turn, so σ300 maps x to 3 and y and z to 0):

⟨[y:=x]1; [z:=1]2; while [y>1]3 do ([z:=z*y]4; [y:=y-1]5); [y:=0]6, σ300⟩
→ ⟨[z:=1]2; while [y>1]3 do ([z:=z*y]4; [y:=y-1]5); [y:=0]6, σ330⟩
→ ⟨while [y>1]3 do ([z:=z*y]4; [y:=y-1]5); [y:=0]6, σ331⟩
→ ⟨[z:=z*y]4; [y:=y-1]5; while [y>1]3 do ([z:=z*y]4; [y:=y-1]5); [y:=0]6, σ331⟩
→ ⟨[y:=y-1]5; while [y>1]3 do ([z:=z*y]4; [y:=y-1]5); [y:=0]6, σ333⟩
→ ⟨while [y>1]3 do ([z:=z*y]4; [y:=y-1]5); [y:=0]6, σ323⟩
→ ⟨[z:=z*y]4; [y:=y-1]5; while [y>1]3 do ([z:=z*y]4; [y:=y-1]5); [y:=0]6, σ323⟩
→ ⟨[y:=y-1]5; while [y>1]3 do ([z:=z*y]4; [y:=y-1]5); [y:=0]6, σ326⟩
→ ⟨while [y>1]3 do ([z:=z*y]4; [y:=y-1]5); [y:=0]6, σ316⟩
→ ⟨[y:=0]6, σ316⟩
→ σ306
Equations and Constraints

Equation system LV=(S⋆):

LVexit(ℓ) = ∅                                         if ℓ ∈ final(S⋆)
LVexit(ℓ) = ∪ {LVentry(ℓ′) | (ℓ′, ℓ) ∈ flowR(S⋆)}     otherwise

LVentry(ℓ) = (LVexit(ℓ) \ killLV(Bℓ)) ∪ genLV(Bℓ)     where Bℓ ∈ blocks(S⋆)

Constraint system LV⊆(S⋆):

LVexit(ℓ) ⊇ ∅                                         if ℓ ∈ final(S⋆)
LVexit(ℓ) ⊇ ∪ {LVentry(ℓ′) | (ℓ′, ℓ) ∈ flowR(S⋆)}     otherwise

LVentry(ℓ) ⊇ (LVexit(ℓ) \ killLV(Bℓ)) ∪ genLV(Bℓ)     where Bℓ ∈ blocks(S⋆)
Lemma: Each solution to the equation system LV=(S⋆) is also a solution to the constraint system LV⊆(S⋆). Proof: Trivial.

Lemma: The least solution to the equation system LV=(S⋆) is also the least solution to the constraint system LV⊆(S⋆). Proof: Use Tarski's Theorem. Naive proof: proceed by contradiction. Suppose some LHS is strictly greater than the RHS. Replace the LHS by the RHS in the solution. Argue that you still have a solution. This establishes the desired contradiction.
Lemma: A solution live to the constraint system is preserved during computation: along a computation

⟨S, σ1⟩ → ⟨S′, σ1′⟩ → ⟨S″, σ1″⟩ → · · · → σ1‴

live satisfies the constraint system LV⊆ of each of the configurations in turn.

Proof: requires a lot of machinery; see the book.
Correctness Relation

σ1 ∼V σ2 means that for all practical purposes the two states σ1 and σ2 are equal: only the values of the live variables in V matter, and on those the two states agree.

Example: Consider the statement [x:=y+z]ℓ.

Let V1 = {y, z}. Then σ1 ∼V1 σ2 means σ1(y) = σ2(y) ∧ σ1(z) = σ2(z).

Let V2 = {x}. Then σ1 ∼V2 σ2 means σ1(x) = σ2(x).
Correctness Theorem

The relation “∼” is invariant under computation: the live variables for the initial configuration remain live throughout the computation. If σ1 ∼V σ2, then each step ⟨S, σ1⟩ → ⟨S′, σ1′⟩ is matched by a step ⟨S, σ2⟩ → ⟨S′, σ2′⟩ with σ1′ ∼V′ σ2′, and similarly along the rest of the computation down to final states σ1‴ ∼V‴ σ2‴, where:

V = liveentry(init(S))
V′ = liveentry(init(S′))
V″ = liveentry(init(S″))
V‴ = liveexit(init(S″)) = liveexit(ℓ) for some ℓ ∈ final(S)
Monotone Frameworks

• Monotone and Distributive Frameworks
• Instances of Frameworks
• Constant Propagation Analysis
PPA Section 2.3
The Overall Pattern

Each of the four classical analyses takes the form

Analysis◦(ℓ) = ι                                        if ℓ ∈ E
Analysis◦(ℓ) = ⊔ {Analysis•(ℓ′) | (ℓ′, ℓ) ∈ F}          otherwise

Analysis•(ℓ) = fℓ(Analysis◦(ℓ))

where
• ⊔ is ∩ or ∪,
• F is either flow(S⋆) or flowR(S⋆),
• E is {init(S⋆)} or final(S⋆),
• ι specifies the initial or final analysis information, and
• fℓ is the transfer function associated with Bℓ ∈ blocks(S⋆).
The Principle: forward versus backward

• The forward analyses have F to be flow(S⋆); then Analysis◦ concerns entry conditions and Analysis• concerns exit conditions, and the equation system presupposes that S⋆ has isolated entries.

• The backward analyses have F to be flowR(S⋆); then Analysis◦ concerns exit conditions and Analysis• concerns entry conditions, and the equation system presupposes that S⋆ has isolated exits.
The Principle: union versus intersection

• When ⊔ is ∩ we require the greatest sets that solve the equations, and we are able to detect properties satisfied by all execution paths reaching (or leaving) the entry (or exit) of a label; the analysis is called a must-analysis.

• When ⊔ is ∪ we require the smallest sets that solve the equations, and we are able to detect properties satisfied by at least one execution path to (or from) the entry (or exit) of a label; the analysis is called a may-analysis.
Property Spaces

The property space, L, is used to represent the data flow information, and the combination operator, ⊔ : P(L) → L, is used to combine information from different paths.

• L is a complete lattice, that is, a partially ordered set, (L, ⊑), such that each subset, Y, has a least upper bound, ⊔Y.

• L satisfies the Ascending Chain Condition; that is, each ascending chain eventually stabilises (meaning that if (ln)n is such that l1 ⊑ l2 ⊑ l3 ⊑ · · ·, then there exists n such that ln = ln+1 = · · ·).
Example: Reaching Definitions

• L = P(Var⋆ × Lab⋆) is partially ordered by subset inclusion, so ⊑ is ⊆

• the least upper bound operation ⊔ is ∪ and the least element ⊥ is ∅

• L satisfies the Ascending Chain Condition because Var⋆ × Lab⋆ is finite (unlike Var × Lab)
Example: Available Expressions

• L = P(AExp⋆) is partially ordered by superset inclusion, so ⊑ is ⊇

• the least upper bound operation ⊔ is ∩ and the least element ⊥ is AExp⋆

• L satisfies the Ascending Chain Condition because AExp⋆ is finite (unlike AExp)
Transfer Functions

The set of transfer functions, F, is a set of monotone functions over L, meaning that l ⊑ l′ implies fℓ(l) ⊑ fℓ(l′), and furthermore they fulfil the following conditions:

• F contains all the transfer functions fℓ : L → L in question (for ℓ ∈ Lab⋆)
• F contains the identity function
• F is closed under composition of functions
Frameworks

A Monotone Framework consists of:

• a complete lattice, L, that satisfies the Ascending Chain Condition; we write ⊔ for the least upper bound operator

• a set F of monotone functions from L to L that contains the identity function and that is closed under function composition

A Distributive Framework is a Monotone Framework where additionally all functions f in F are required to be distributive: f(l1 ⊔ l2) = f(l1) ⊔ f(l2)
Instances

An instance of a Framework consists of:

– the complete lattice, L, of the framework
– the space of functions, F, of the framework
– a finite flow, F (typically flow(S⋆) or flowR(S⋆))
– a finite set of extremal labels, E (typically {init(S⋆)} or final(S⋆))
– an extremal value, ι ∈ L, for the extremal labels
– a mapping, f·, from the labels Lab⋆ to transfer functions in F
Equations of the Instance:

Analysis◦(ℓ) = ⊔ {Analysis•(ℓ′) | (ℓ′, ℓ) ∈ F} ⊔ ιℓE
    where ιℓE = ι if ℓ ∈ E, and ιℓE = ⊥ if ℓ ∉ E

Analysis•(ℓ) = fℓ(Analysis◦(ℓ))

Constraints of the Instance:

Analysis◦(ℓ) ⊒ ⊔ {Analysis•(ℓ′) | (ℓ′, ℓ) ∈ F} ⊔ ιℓE
    where ιℓE = ι if ℓ ∈ E, and ιℓE = ⊥ if ℓ ∉ E

Analysis•(ℓ) ⊒ fℓ(Analysis◦(ℓ))
The Examples Revisited

      Available       Reaching                  Very Busy     Live
      Expressions     Definitions               Expressions   Variables

L     P(AExp⋆)        P(Var⋆ × Lab⋆)            P(AExp⋆)      P(Var⋆)
⊑     ⊇               ⊆                         ⊇             ⊆
⊔     ∩               ∪                         ∩             ∪
⊥     AExp⋆           ∅                         AExp⋆         ∅
ι     ∅               {(x, ?) | x ∈ FV(S⋆)}     ∅             ∅
E     {init(S⋆)}      {init(S⋆)}                final(S⋆)     final(S⋆)
F     flow(S⋆)        flow(S⋆)                  flowR(S⋆)     flowR(S⋆)

For all four analyses:

F = {f : L → L | ∃lk, lg : f(l) = (l \ lk) ∪ lg}

fℓ(l) = (l \ kill(Bℓ)) ∪ gen(Bℓ)    where Bℓ ∈ blocks(S⋆)
Bit Vector Frameworks

A Bit Vector Framework has
• L = P(D) for D finite
• F = {f | ∃lk, lg : f(l) = (l \ lk) ∪ lg}

Examples:
• Available Expressions
• Live Variables
• Reaching Definitions
• Very Busy Expressions
Lemma: Bit Vector Frameworks are always Distributive Frameworks

Proof: For ⊔ being ∪ we calculate

f(l1 ∪ l2) = ((l1 ∪ l2) \ lk) ∪ lg
           = ((l1 \ lk) ∪ (l2 \ lk)) ∪ lg
           = ((l1 \ lk) ∪ lg) ∪ ((l2 \ lk) ∪ lg)
           = f(l1) ∪ f(l2)

and for ⊔ being ∩

f(l1 ∩ l2) = ((l1 ∩ l2) \ lk) ∪ lg
           = ((l1 \ lk) ∩ (l2 \ lk)) ∪ lg
           = ((l1 \ lk) ∪ lg) ∩ ((l2 \ lk) ∪ lg)
           = f(l1) ∩ f(l2)

so in both cases f(l1 ⊔ l2) = f(l1) ⊔ f(l2). Furthermore:

• id(l) = (l \ ∅) ∪ ∅
• f2(f1(l)) = (((l \ lk1) ∪ lg1) \ lk2) ∪ lg2 = (l \ (lk1 ∪ lk2)) ∪ ((lg1 \ lk2) ∪ lg2)
• monotonicity follows from distributivity
• P(D) satisfies the Ascending Chain Condition because D is finite
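The two calculations in the proof can be checked exhaustively on a tiny domain. The following illustrative sketch verifies that f(l) = (l \ lk) ∪ lg distributes over both ∪ and ∩ for every choice of kill/gen/argument sets over D = {0, 1, 2}, and that composing two such functions yields another one of the same shape (the particular lk, lg values used for the composition check are arbitrary picks):

```python
from itertools import combinations

D = {0, 1, 2}
subsets = [set(c) for r in range(len(D) + 1) for c in combinations(sorted(D), r)]

def transfer(kill, gen):
    return lambda l: (l - kill) | gen

for kill in subsets:
    for gen in subsets:
        f = transfer(kill, gen)
        for l1 in subsets:
            for l2 in subsets:
                assert f(l1 | l2) == f(l1) | f(l2)   # distributes over join = union
                assert f(l1 & l2) == f(l1) & f(l2)   # distributes over join = intersection

# composition stays a bit-vector function:
# kill = lk1 ∪ lk2, gen = (lg1 \ lk2) ∪ lg2
k1, g1, k2, g2 = {0}, {1}, {1, 2}, {0}
f21 = lambda l: transfer(k2, g2)(transfer(k1, g1)(l))
g = transfer(k1 | k2, (g1 - k2) | g2)
assert all(f21(l) == g(l) for l in subsets)
print("bit vector transfer functions are distributive")
```

This is only an empirical confirmation on one finite domain, of course; the algebraic calculation above is what establishes the lemma in general.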
The Constant Propagation Framework

An example of a Monotone Framework that is not a Distributive Framework.

The aim of the Constant Propagation Analysis is to determine:

For each program point, whether or not a variable has a constant value whenever execution reaches that point.

Example:

[x:=6]1; [y:=3]2; while [x>y]3 do ([x:=x-1]4; [z:=y*y]6)

The analysis enables a transformation into

[x:=6]1; [y:=3]2; while [x>3]3 do ([x:=x-1]4; [z:=9]6)
Elements of L

ŜtateCP = ((Var⋆ → Z⊤)⊥, ⊑)

Idea:

• ⊥ is the least element: no information is available
• σ̂ ∈ Var⋆ → Z⊤ specifies for each variable whether it is constant:
  – σ̂(x) ∈ Z: x is constant and the value is σ̂(x)
  – σ̂(x) = ⊤: x might not be constant
Partial Ordering on L

The partial ordering ⊑ on (Var⋆ → Z⊤)⊥ is defined by:

∀σ̂ ∈ (Var⋆ → Z⊤)⊥ : ⊥ ⊑ σ̂
∀σ̂1, σ̂2 ∈ Var⋆ → Z⊤ : σ̂1 ⊑ σ̂2 iff ∀x : σ̂1(x) ⊑ σ̂2(x)

where Z⊤ = Z ∪ {⊤} is partially ordered as follows:

∀z ∈ Z⊤ : z ⊑ ⊤
∀z1, z2 ∈ Z : (z1 ⊑ z2) ⇔ (z1 = z2)
Transfer Functions in F

FCP = {f | f is a monotone function on ŜtateCP}

Lemma: Constant Propagation as defined by ŜtateCP and FCP is a Monotone Framework
Instances

Constant Propagation is a forward analysis, so for the program S⋆:

• the flow, F, is flow(S⋆),
• the extremal labels, E, is {init(S⋆)},
• the extremal value, ιCP, is λx.⊤, and
• the mapping, f·CP, of labels to transfer functions is as shown next
Constant Propagation Analysis

ACP : AExp → (ŜtateCP → (Z⊤)⊥)

ACP[[x]]σ̂ = ⊥ if σ̂ = ⊥, and σ̂(x) otherwise
ACP[[n]]σ̂ = ⊥ if σ̂ = ⊥, and n otherwise
ACP[[a1 opa a2]]σ̂ = ACP[[a1]]σ̂ ôpa ACP[[a2]]σ̂

transfer functions fℓCP:

[x := a]ℓ:  fℓCP(σ̂) = ⊥ if σ̂ = ⊥, and σ̂[x ↦ ACP[[a]]σ̂] otherwise
[skip]ℓ:    fℓCP(σ̂) = σ̂
[b]ℓ:       fℓCP(σ̂) = σ̂
Lemma: Constant Propagation is not a Distributive Framework

Proof: Consider the transfer function fℓCP for [y:=x*x]ℓ. Let σ̂1 and σ̂2 be such that σ̂1(x) = 1 and σ̂2(x) = −1. Then σ̂1 ⊔ σ̂2 maps x to ⊤, so fℓCP(σ̂1 ⊔ σ̂2) maps y to ⊤. But both fℓCP(σ̂1) and fℓCP(σ̂2) map y to 1, so fℓCP(σ̂1) ⊔ fℓCP(σ̂2) maps y to 1.
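The counterexample is easy to replay concretely. The sketch below uses an assumed dict-based encoding of abstract states (variables mapped to an int or to the string '⊤'); it is only meant to illustrate the lemma, not the book's full framework, and in particular it omits the ⊥ element:

```python
TOP = '⊤'

def join_value(v1, v2):
    return v1 if v1 == v2 else TOP

def join_state(s1, s2):
    return {x: join_value(s1[x], s2[x]) for x in s1}

def times(v1, v2):
    return TOP if TOP in (v1, v2) else v1 * v2

def transfer(s):                      # transfer function for [y:=x*x]
    return {**s, 'y': times(s['x'], s['x'])}

s1 = {'x': 1, 'y': TOP}
s2 = {'x': -1, 'y': TOP}

lhs = transfer(join_state(s1, s2))                    # join first, then transfer
rhs = join_state(transfer(s1), transfer(s2))          # transfer first, then join
print(lhs['y'], rhs['y'])   # ⊤ 1
```

The join-then-transfer order loses the fact that y is 1 along both branches, so f(σ̂1 ⊔ σ̂2) is strictly above f(σ̂1) ⊔ f(σ̂2): monotone, but not distributive.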
Equation Solving

• The MFP solution: “Maximum” (actually least) Fixed Point
  – Worklist algorithm for Monotone Frameworks
• The MOP solution: “Meet” (actually join) Over all Paths
PPA Section 2.4
The MFP Solution

Idea: iterate until stabilisation.

Worklist Algorithm

Input: An instance (L, F, F, E, ι, f·) of a Monotone Framework
Output: The MFP Solution: MFP◦, MFP•

Data structures:
• Analysis: the current analysis result for block entries (or exits)
• The worklist W: a list of pairs (ℓ, ℓ′) indicating that the current analysis result has changed at the entry (or exit) to the block ℓ and hence the entry (or exit) information must be recomputed for ℓ′
Worklist Algorithm

Step 1: Initialisation (of W and Analysis)
  W := nil;
  for all (ℓ, ℓ′) in F do W := cons((ℓ, ℓ′), W);
  for all ℓ in F or E do
    if ℓ ∈ E then Analysis[ℓ] := ι else Analysis[ℓ] := ⊥L;

Step 2: Iteration (updating W and Analysis)
  while W ≠ nil do
    ℓ := fst(head(W)); ℓ′ := snd(head(W)); W := tail(W);
    if fℓ(Analysis[ℓ]) ⋢ Analysis[ℓ′] then
      Analysis[ℓ′] := Analysis[ℓ′] ⊔ fℓ(Analysis[ℓ]);
      for all ℓ″ with (ℓ′, ℓ″) in F do W := cons((ℓ′, ℓ″), W);

Step 3: Presenting the result (MFP◦ and MFP•)
  for all ℓ in F or E do
    MFP◦(ℓ) := Analysis[ℓ];
    MFP•(ℓ) := fℓ(Analysis[ℓ])
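The three steps carry over directly to code. Below is an illustrative sketch of the worklist algorithm for instances whose lattice is a powerset ordered by ⊆ (so ⊔ is ∪ and ⊥L is ∅), run on the Reaching Definitions instance of the earlier example [x:=5]1; [y:=1]2; while [x>1]3 do ([y:=x*y]4; [x:=x-1]5); the encoding of the instance is my own assumption:

```python
def mfp(labels, F, E, iota, transfer):
    # Step 1: initialisation of the worklist W and of Analysis
    W = list(F)
    analysis = {l: set(iota) if l in E else set() for l in labels}
    # Step 2: iteration, updating W and Analysis
    while W:
        l, l2 = W.pop(0)                 # (head of W)
        new = transfer(l, analysis[l])
        if not new <= analysis[l2]:      # f_l(Analysis[l]) not below Analysis[l2]
            analysis[l2] |= new
            W = [(l2, l3) for (a, l3) in F if a == l2] + W
    # Step 3: presenting the result (MFP entry and exit information)
    return analysis, {l: transfer(l, analysis[l]) for l in labels}

KILL = {1: {('x', '?'), ('x', 1), ('x', 5)}, 2: {('y', '?'), ('y', 2), ('y', 4)},
        3: set(), 4: {('y', '?'), ('y', 2), ('y', 4)},
        5: {('x', '?'), ('x', 1), ('x', 5)}}
GEN  = {1: {('x', 1)}, 2: {('y', 2)}, 3: set(), 4: {('y', 4)}, 5: {('x', 5)}}

entry, exit_ = mfp(
    labels={1, 2, 3, 4, 5},
    F={(1, 2), (2, 3), (3, 4), (4, 5), (5, 3)},
    E={1},
    iota={('x', '?'), ('y', '?')},
    transfer=lambda l, v: (v - KILL[l]) | GEN[l])

print(sorted(exit_[5]))   # [('x', 5), ('y', 4)]
```

The output reproduces the smallest RD solution computed earlier by naive iteration; the worklist simply avoids re-evaluating equations whose inputs have not changed.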
Correctness: The worklist algorithm always terminates and it computes the least (or MFP) solution to the instance given as input.

Complexity: Suppose that E and F contain at most b ≥ 1 distinct labels, that F contains at most e ≥ b pairs, and that L has finite height at most h ≥ 1. Count as basic operations the applications of fℓ, applications of ⊔, or updates of Analysis. Then there will be at most O(e · h) basic operations.

Example: Reaching Definitions (assuming unique labels): O(b²) where b is the size of the program, since O(h) = O(b) and O(e) = O(b).
The MOP Solution

Idea: propagate analysis information along paths.

Paths

The paths up to but not including ℓ:

path◦(ℓ) = {[ℓ1, · · · , ℓn−1] | n ≥ 1 ∧ ∀i < n : (ℓi, ℓi+1) ∈ F ∧ ℓn = ℓ ∧ ℓ1 ∈ E}

The paths up to and including ℓ:

path•(ℓ) = {[ℓ1, · · · , ℓn] | n ≥ 1 ∧ ∀i < n : (ℓi, ℓi+1) ∈ F ∧ ℓn = ℓ ∧ ℓ1 ∈ E}

Transfer function for a path ~ℓ = [ℓ1, · · · , ℓn]: f~ℓ = fℓn ◦ · · · ◦ fℓ1 ◦ id
The MOP Solution

The solution up to but not including ℓ:

MOP◦(ℓ) = ⊔ {f~ℓ(ι) | ~ℓ ∈ path◦(ℓ)}

The solution up to and including ℓ:

MOP•(ℓ) = ⊔ {f~ℓ(ι) | ~ℓ ∈ path•(ℓ)}

Precision of the MOP versus MFP solutions:

The MFP solution safely approximates the MOP solution: MFP ⊒ MOP (“because” f(x ⊔ y) ⊒ f(x) ⊔ f(y) when f is monotone).

For Distributive Frameworks the MFP and MOP solutions are equal: MFP = MOP (“because” f(x ⊔ y) = f(x) ⊔ f(y) when f is distributive).
Lemma: Consider the MFP and MOP solutions to an instance (L, F, F, E, ι, f·) of a Monotone Framework; then: MFP◦ ⊒ MOP◦ and MFP• ⊒ MOP•. If the framework is distributive and if path◦(ℓ) ≠ ∅ for all ℓ in E and F, then: MFP◦ = MOP◦ and MFP• = MOP•.
Decidability of MOP and MFP

The MFP solution is always computable (meaning that it is decidable) because of the Ascending Chain Condition. The MOP solution is often uncomputable (meaning that it is undecidable): the existence of a general algorithm for the MOP solution would imply the decidability of the Modified Post Correspondence Problem, which is known to be undecidable.
Lemma: The MOP solution for Constant Propagation is undecidable.

Proof: Let u1, · · · , un and v1, · · · , vn be strings over the alphabet {1, · · · , 9}; let |u| denote the length of u; let [[u]] be the natural number denoted. The Modified Post Correspondence Problem is to determine whether or not ui1 · · · uim = vi1 · · · vim for some sequence i1, · · · , im with i1 = 1.

x:=[[u1]]; y:=[[v1]];
while [· · ·] do
  (if [· · ·] then x:=x * 10^|u1| + [[u1]]; y:=y * 10^|v1| + [[v1]]
   else · · ·
   if [· · ·] then x:=x * 10^|un| + [[un]]; y:=y * 10^|vn| + [[vn]]
   else skip);
[z:=abs((x-y)*(x-y))]ℓ

Then MOP•(ℓ) will map z to 1 if and only if the Modified Post Correspondence Problem has no solution. This is undecidable.
Interprocedural Analysis

• The problem
• MVP: “Meet” over Valid Paths
• Making context explicit
• Context based on call-strings
• Context based on assumption sets

(A restricted treatment; see the book for a more general treatment.)

PPA Section 2.5
The Problem: match entries with exits

proc fib(val z, u; res v) is1 · · ·