6. Pointer Analysis Flashcards
Pointer Analysis
flow of non primitive things
pointer aliasing
Expressions built using pointers, such as x.radius, allow the same memory address to be referred to in
different ways
pointer aliasing circle example
Circle x = new Circle() Circle z = ? x. radius = 1 z.radius = 2 y = x.radius assert(y==1)
Because we don’t know if z represents the same circle as x or not, the analysis is unable to continue past z.radius =2 since we’d be unsure if x.radius still equaled 1
May-Alias Analysis
An analysis that is dedicated to proving facts of the form “x may-alias z?” is called
a MAY-alias analysis.
Is pointer analysis must-alias or may-alias analysis?
may-alias
May-Alias circle example
Circle x = new Circle() Circle z = new Circle() x. radius = 1 z.radius = 2 y = x.radius assert(y==1)
We know now in this case that x!=z so after z.radius=2 is done, we can confidently say that x.radius is still 1.
Must-Alias circle example
Circle x = new Circle() Circle z = x x. radius = 1 z.radius = 2 y = x.radius assert(y==1)
after the assignment to z, we know that x == z. x.radius =1 makes the x radius 1. z.radius=2 makes x.radius=2 since we previously established that x == z.
x and z MUST Alias in this case.
May alias vs must alias
Must alias is more advance, but it is less useful in practice.
May alias analysis is useful for more practical dataflow analysis than must alias.
Why is pointer analysis hard?
you have to keep track of everything. In the case of a doubly linked list, you could refer to h.data in a number of ways:
h.next.prev.data, h.next.next.prev.prev.data, etc
cycles are hard yo
Pointer analysis problem is undecidable. T/F
True. We must sacrifice some combination of Soundness, Completeness, Termination.
what does pointer analysis sacrifice to become decidable?
completeness. This means that we can expect false positives, but no false negatives.
False positive
If the answer is no, but yes is the returned answer.
How can a false positive manifest in the circle example?
Circle x = new Circle() Circle z = new Circle() x. radius = 1 z.radius = 2 y = x.radius assert(y==1)
x May-Alias z returns YES. means that after z=new Circle() our analysis cannot determine that x!=z, but that x==z or x!=z. So going down the remaining analysis we reach the conclusion that y==1 or y==2
Approximate algorithms for pointer analysis have varying levels of precision. These algorithms differ in two key aspects:
- How to abstract the heap (ie dynamically allocated data)
2. How to abstract control-flow
Abstracting the heap (Elevator example): Abstract object based on site they are allocated
Basically take the levels of the program and combine all those things into a single node in the graph. all objects allocated within a for loop are represented by a single node.
Creates a Points to graph http://imgur.com/a/nXlBb
Abstracting the control flow (Elevator example)
only a single points to graph for entire program, so we abstract the control flow.
remember how the Points to graph looked before? Instead of changing the graph representation, change the code itself to create that graph. http://imgur.com/a/a18LY
flow insensitivity
http://imgur.com/a/a18LY idk man
all constructs such as for loops are removed
; are removed as well
all statements that do not affect pointers removed
indices replaced with nondeterministic *s
no order to these statements now even though they are still in rough order
Chaotic Iteration Algorithm
Start with empty point to graph
go through each statement s in set
while applying rule corresponding to s on graph
until graph stops changing.
Statement types
v = new v = v2 object copy v2 = v.f field read statement v.f = v2 field write statement v2 = v[*] field read v[*] = v2 field write
Convert v.events = new Object[] to the statement grammar or whatever its called
tmp = new Object[] v.events = tmp
convert
v.events[*] = e
tmp = v.events tmp[*] = e
convert
v1.f = v2.f
tmp = v2.f v1.f = tmp
convert
v1.f.g = v2.h
tmp1 = v1.f
tmp2 = v2.h
tmp1.g=tmp2
Rule for object allocation sites: Weak update
if there is already an arrow from v to another allocation site node and we just need to add another.
Rule for object allocation sites: Strong update
replace the points-to information
Rule for Object Copy
v1 = v2
create variable node for v1
add blue arrow from v1 to all nodes pointed to by v2
Rule for field writes
v1.f=v2
if v1 points to A and v2 points to B
we add a red arrow from A to B and label it by name for field or by * if its an index
rule for field reads
v1 = v2.f
if v2 points to B and B points to C by f or *
we add a blue arrow from v1 to C
Classifying Pointer Analysis Algorithms
Flow sensitive
Context Sensitive
What heap abstraction scheme
how are aggregate data types modeled
flow sensitivity
how to model control flow within a procedure
either flow insensitive or flow sensitive
flow insensitive
weak updates
only generate new facts, doesn’t kill any old facts
suffice for may-alias analysis
flow sensitive
strong updates - killing and generating facts
must-alias analysis
Inpractical for a must alias analysis to have a low false positive rate by being flow insensitive
true
Context sensitivity
how to model control flow across procedures
context insensitive
context sensitive
context insensitive
only analyze each procedure once despite how many times its called in the program
inprecise but efficient
context sensitive
analyze each procedure possibly multiple times, once per abstract calling context
heap abstraction scheme
scheme to partition unbounded set of concrete objects into finitely many abstract objects (oval nodes)
Ensures pointer analysis terminates
Many sound schemes exist
Heap abstraction scheme: too few abstract objects ==
efficient but imprecise
Heap abstraction scheme: too many abstract objects ==
expensive but precise
Heap abstraction scheme #1: Allocation-Site Based
one abstract object per allocation site
Allocation site identified by: “new” keyword in Java/C++, malloc() call in C
Finitely many allocation sites in program, so guaranteed finitely many abstract objects
Allocation site based downsides
costly
large programs
clients needing quick turnaround time
overly fine granularity of sites
Heap abstraction scheme #2: Type Based
one abstract object per type
finitely many types in program, so finitely many abstract objects
Heap abstraction scheme #3: Heap Insensitive
Single abstract object representing the entire heap.
highly imprecise, but sound
When is heap insensitive scheme useful
popular for languages with primarily stack-directed pointers (C)
Do quiz #31 in lesson 6
dumbass
Modeling Aggregate Data Types: Arrays
Common choice: single [*] field to represent all array elements
Modeling Aggregate Data Types: Records
structs or objects
Three common choices:
1. field insensitive - merge all fields of each record object
2. field based - merge each field of all record objects
3, field sensitive - keep each field of each record object separate