Compiler Construction Flashcards

1
Q

Left recursion

A

Top-down parsers cannot handle left-recursion in a grammar

-> transform the grammar to eliminate the left recursion (cf. slides 40/41)

2
Q

Context-free grammars

A

Syntax is context-free; semantics is context-sensitive.

3
Q

intermediate representation

A

e.g. a parse tree or AST
-> a notation that is not tied to any particular source language or target machine

Choosing the “right” IR is not an easy task: source code is too
high level for performing many analyses, and machine code is
too low level. Sometimes an AST is used (i.e., closer to source
code), and sometimes a special kind of abstract instruction set
(i.e., closer to machine code).

4
Q

derivation

A

A proof that a sentence is in the language of a grammar: start with the start symbol and repeatedly replace a non-terminal by one of its right-hand sides until only terminals remain.

5
Q

left-recursion

A

A grammar is left-recursive if a non-terminal can derive a sentential form that begins with itself (e.g. E -> E + T). Top-down parsers cannot handle it; eliminate it by rewriting the productions with right recursion.

6
Q

Bytecode

A

The “machine-code” of the virtual machine

7
Q

What is the difference between a compiler and an interpreter?

A

A compiler translates a source program into an equivalent target program and does not execute it; an interpreter reads an executable (source) program and directly produces the results of running it, usually translating to some internal intermediate form along the way.

8
Q

What are important qualities of compilers?

A

Correctness (the generated code preserves the meaning of the source program), quality of the generated code (speed, size), reasonable compilation speed, and good error reporting.

9
Q

Why are compilers commonly split into multiple passes?

A

Splitting the compiler into passes (front end, optimizer, back end) breaks it into manageable pieces, isolates the back end from the front end, and lets different source languages and target machines share the IR, the optimizer, and the back end.

10
Q

What are the typical responsibilities of the different parts of a modern compiler?

A

The scanner converts the source text into a stream of tokens; the parser recognizes structure in that stream and builds an IR (e.g. an AST); the optimizer improves the IR; the back end selects instructions, allocates registers, and emits target machine code.

11
Q

How are context-free grammars specified?

A

As a set of productions (rewrite rules) A -> γ over terminal and non-terminal symbols, together with a designated start symbol; in practice usually written in BNF/EBNF.

12
Q

What is “abstract” about an abstract syntax tree?

A

It omits the concrete details of the parse: nodes for most non-terminals (those needed only to encode precedence and associativity in the grammar) are collapsed, leaving just the essential structure of the program.

13
Q

What is intermediate representation and what is it for?

A

An IR is a program representation that is not tied to any particular source language or target machine (e.g. an AST, 3-address code, SSA). It decouples the front end from the back end and is the form on which analyses and machine-independent optimizations are performed.

14
Q

Why is optimization a separate activity?

A

Optimization is optional and largely independent of both source language and target machine: it transforms IR into better IR, can be run as separate passes, and can be shared across front ends and back ends.

15
Q

Is Java compiled or interpreted? What about Smalltalk? Ruby? PHP? Are you sure?

A

Usually both: such languages are typically compiled to bytecode, which is then interpreted (and often JIT-compiled to native code at run time) by a virtual machine. The compiled/interpreted distinction is a property of the implementation, not of the language.

16
Q

What are the key differences between modern compilers and compilers written in the 1970s?

A

s

17
Q

Why is it hard for compilers to generate good error messages?

A

By the time an error is detected the compiler is working with low-level artifacts (tokens, productions, IR) and has lost the programmer's intent; the point where the error is detected may also be far from where the actual mistake was made.

18
Q

What is “context-free” about a context-free grammar?

A

The rules (productions) for a non-terminal can be applied without regard for the context in which the non-terminal appears.

19
Q

Traditional two pass compiler

A

A classical compiler consists of a front end that parses the source
code into an intermediate representation, and a back end that
generates executable code.

20
Q

front end

A

parses the source code into an intermediate representation

  • the front end should deal with all things to do with the language and source code
  • consists of scanner (tokens) and parser (IR)
21
Q

back end

A

generates executable code
- the back end should deal with all things to do with the
target architecture
- translate IR into target machine code

22
Q

scanner

A

is responsible for converting the source code into a stream of tokens of the source language

23
Q

parser

A
is responsible for recognizing structure in the stream of tokens produced by the scanner
- typically it builds a parse tree representing this structure, but it may also produce another kind of IR (intermediate representation) more suitable for the back end
24
Q

IR

A

e.g. parse tree

25
context-free?
“context-free” because rules for nonterminals can be written without regard for the context in which they appear
26
context-sensitive
Context-sensitive: you need the surrounding context of a piece of code to validate it. Context-free: you do not need the context to validate it.
27
unambiguous grammar
it always produces a unique parse for a given valid input.
28
the parse and the parse tree
The parse is the sequence of productions needed to parse the input. The parse tree is a structure representing this derivation
29
LL(1), LR(1)
"sweet spots": allow interesting languages to be specified, but can also be parsed efficiently
30
syntax analysis
An AST is usually the result of the syntax analysis phase of a compiler (lexical analysis -> syntax analysis).
31
Leftmost vs rightmost derivation
replace the leftmost non-terminal at each step vs replace the rightmost non-terminal at each step
32
table-driven parser
- Our recursive descent parser encodes state information in its call stack.
- Using recursive procedure calls to implement a stack abstraction may not be particularly efficient.
- This suggests other implementation methods:
  — explicit stack, hand-coded parser
  — stack-based, table-driven parser
33
constant expression propagation and folding
Evaluate and propagate constant expressions at compile time.
Constant propagation: recognize that certain “variables” actually have constant values, and propagate these constants to the expressions where they are used. Later we will see SSA is ideal to analyze this.
Constant folding: evaluate constant expressions at compile time; only possible when side-effect freeness is guaranteed.
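A minimal sketch (Python, not from the lecture) of constant folding on a tuple-based expression AST; the node shapes and operator set are assumptions for illustration:
```
# Constant folding sketch: nodes are ('num', n), ('var', name) or (op, l, r).
# Only side-effect-free operators are folded, as required above.
def fold(node):
    if node[0] in ('num', 'var'):
        return node
    op, l, r = node
    l, r = fold(l), fold(r)
    if l[0] == 'num' and r[0] == 'num':              # both operands constant
        value = {'+': lambda a, b: a + b,
                 '*': lambda a, b: a * b}[op](l[1], r[1])
        return ('num', value)                        # evaluated at compile time
    return (op, l, r)

# usage: x * (2 + 3)  folds to  x * 5
print(fold(('*', ('var', 'x'), ('+', ('num', 2), ('num', 3)))))
```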
34
ambiguous grammar
A grammar is ambiguous if it has more than one derivation (parse tree) for a single sentential form.
35
Top-down parser (LL)
- starts at the root of the derivation tree and fills in
- picks a production and tries to match the input
- may require backtracking
- some grammars are backtrack-free (predictive)
- uses leftmost derivation
- can be either hand-written or generated
- cannot handle left-recursion in a grammar
-> Since a parser does not just parse, but must also produce a suitable IR, we must decorate the grammar with additional actions that the generated parser will take.
36
Bottom-up parser (LR)
- starts at the leaves and fills in
- starts in a state valid for legal first tokens
- as input is consumed, changes state to encode possibilities (recognize valid prefixes)
- uses a stack to store both state and sentential forms
- uses rightmost derivation (discovered in reverse)
- normally built by parser generators
37
parser generators
- build bottom-up parsers
38
dead code elimination
eliminate code that can never be executed
39
left-recursion
If the parser makes the wrong choices, expansion doesn’t terminate!
-> solve using right recursion:
```
E  -> E + T
E  -> T
```
rewrite as:
```
E  -> T E'
E' -> + T E'
E' -> ε
```
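A small illustrative sketch (Python; function and token names are made up) of how the right-recursive form is handled by mutually recursive procedures — with the left-recursive form, parse_E would call itself forever:
```
def parse_E(tokens, pos):          # E  -> T E'
    pos = parse_T(tokens, pos)
    return parse_E_prime(tokens, pos)

def parse_E_prime(tokens, pos):    # E' -> + T E' | epsilon
    if pos < len(tokens) and tokens[pos] == '+':
        pos = parse_T(tokens, pos + 1)      # consume '+', then parse T
        return parse_E_prime(tokens, pos)   # right recursion terminates
    return pos                              # epsilon alternative

def parse_T(tokens, pos):          # T is just a number in this sketch
    if pos < len(tokens) and tokens[pos].isdigit():
        return pos + 1
    raise SyntaxError(f"expected a number at position {pos}")

print(parse_E("1 + 2 + 3".split(), 0))      # 5: all tokens consumed
```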
40
look-ahead
Top-down parsers may need to backtrack when they select the wrong production.
- Large subclasses of CFGs can be parsed with limited lookahead.
- Among the interesting subclasses are:
  — LL(1): Left-to-right scan, Left-most derivation, 1-token look-ahead
  — LR(1): Left-to-right scan, Right-most derivation, 1-token look-ahead
41
recursive descent / predictive parsing
Basic idea: for any two productions A → α | β, we would like a distinct way of choosing the correct production to expand.
Whenever two productions A → α and A → β both appear in the grammar, we would like FIRST(α) ∩ FIRST(β) = ∅. This would allow the parser to make a correct choice with a look-ahead of only one symbol!
=> This type of parsing works only on grammars where the first terminal symbol of each subexpression provides enough information to choose which production to use (iff the FIRST property can be established).
What if a grammar does not have this property? -> left factoring
42
left factoring
Sometimes we can transform a grammar in order to have this property (FIRST(α) ∩ FIRST(β) = ∅):
- For each non-terminal A, find the longest prefix α common to two or more of its alternatives.
- If α ≠ ε, then replace all of the A productions A → αβ1 | αβ2 | … | αβn with
  A → α A′
  A′ → β1 | β2 | … | βn
  where A′ is fresh.
- Repeat until no two alternatives for a single non-terminal have a common prefix.
-> We do this so that production selection requires only a single token of look-ahead (see the sketch below).
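A simplified sketch (Python; the grammar representation is an assumption) of one left-factoring step — unlike the full algorithm it only factors a prefix shared by all alternatives:
```
def common_prefix(alts):
    prefix = []
    for symbols in zip(*alts):                    # walk alternatives in lockstep
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor_once(grammar, A, fresh):
    alts = grammar[A]
    alpha = common_prefix(alts)
    if not alpha:
        return grammar                            # nothing to factor
    g = dict(grammar)
    g[A] = [alpha + [fresh]]                      # A  -> alpha A'
    g[fresh] = [alt[len(alpha):] or ['eps']       # A' -> beta1 | ... | betan
                for alt in alts]
    return g

# usage: A -> a b c | a b d   becomes   A -> a b A',  A' -> c | d
print(left_factor_once({'A': [['a', 'b', 'c'], ['a', 'b', 'd']]}, 'A', "A'"))
```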
43
lexical analyzer
The lexical analyzer breaks the source text into a series of tokens, removing any whitespace and comments. If the lexical analyzer finds an invalid token, it generates an error. It works closely with the syntax analyzer: it reads character streams from the source code, checks for legal tokens, and passes tokens to the syntax analyzer on demand.
44
FIRST(a)
FIRST(α) is the set of all terminal symbols that can begin any string derived from α.
-> If two different productions have the same left-hand-side symbol X and their right-hand sides have overlapping FIRST sets, then the grammar cannot be parsed using predictive parsing.
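A sketch (Python; the grammar encoding is an assumption, 'eps' stands for ε) of the usual fixpoint computation of FIRST sets:
```
def first_sets(grammar):                    # grammar: {A: [list of alternatives]}
    first = {A: set() for A in grammar}

    def first_of(seq):
        out = set()
        for sym in seq:
            if sym not in grammar:          # terminal: it starts the string, stop
                out.add(sym)
                return out
            out |= first[sym] - {'eps'}
            if 'eps' not in first[sym]:     # sym cannot vanish, stop
                return out
        out.add('eps')                      # the whole sequence can derive eps
        return out

    changed = True
    while changed:                          # iterate to a fixpoint
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                new = first_of([] if alt == ['eps'] else alt)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

g = {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], ['eps']], 'T': [['id']]}
print(first_sets(g))   # E: {id}, E': {+, eps}, T: {id}
```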
45
construct a predictive / recursive-descent parser
(p. 50/51)
- Create a predictive parsing table with one production per entry (if an entry holds more than one production, predictive parsing will not work -> the grammar is ambiguous).
- An ambiguous grammar will always lead to duplicate entries in a predictive parsing table -> need to find an unambiguous grammar.
- Grammars whose predictive parsing tables contain no duplicate entries are called LL(1): left-to-right parse, leftmost derivation, 1-symbol lookahead.
- A recursive-descent parser does its job just by looking at the next token of the input, never looking more than one token ahead … FIRST property.
46
LL(k)
generalize the notion of FIRST sets to describe the first k tokens of a string, and make an LL(k) parsing table whose rows are the nonterminals and columns are every sequence of k terminals (rarely done -> tables get large!)
47
NFA -> DFA
Any NFA can be converted into a DFA, by simulating sets of simultaneous states. Since a DFA must always be in a unique state, the states of the DFA must be all possible subsets of the NFA states.
48
NFA to DFA using the subset construction
Each DFA state is a set of NFA states. Start with the ε-closure of the NFA start state; for each DFA state S and input symbol a, the successor is the ε-closure of all NFA states reachable from S on a. Repeat until no new sets appear; a DFA state is accepting if it contains an accepting NFA state.
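A sketch (Python; the NFA encoding with '' as the ε label is an assumption) of the subset construction:
```
def eps_closure(nfa, states):
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, ''), set()) - closure:   # follow epsilon edges
            closure.add(t)
            stack.append(t)
    return frozenset(closure)

def subset_construction(nfa, start, alphabet):
    start_set = eps_closure(nfa, {start})
    seen, todo, dfa = {start_set}, [start_set], {}
    while todo:
        S = todo.pop()
        for a in alphabet:
            moved = set()
            for s in S:                               # all NFA moves on a
                moved |= nfa.get((s, a), set())
            T = eps_closure(nfa, moved)
            dfa[(S, a)] = T                           # one DFA transition
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return start_set, dfa      # DFA states are (frozen) sets of NFA states

# usage: tiny NFA with an epsilon move 0 -ε-> 1, then a* b
nfa = {(0, ''): {1}, (1, 'a'): {1}, (1, 'b'): {2}}
print(subset_construction(nfa, 0, 'ab'))
```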
49
Non-recursive predictive parsing
- uses an explicit stack and an LL(1) parse table instead of recursive procedure calls
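A sketch (Python; the toy grammar E → T E′, E′ → + T E′ | ε, T → id and its table entries are chosen for illustration) of the table-driven loop:
```
TABLE = {                        # (non-terminal, lookahead) -> right-hand side
    ('E',  'id'): ['T', "E'"],
    ("E'", '+'):  ['+', 'T', "E'"],
    ("E'", '$'):  [],            # epsilon
    ('T',  'id'): ['id'],
}

def ll1_parse(tokens):
    stack = ['$', 'E']                       # goal symbol above the end marker
    tokens = tokens + ['$']
    pos = 0
    while stack:
        top, look = stack.pop(), tokens[pos]
        if top == look:                      # terminal or '$': match, advance
            pos += 1
        elif (top, look) in TABLE:           # expand the predicted production
            stack.extend(reversed(TABLE[(top, look)]))
        else:
            return False                     # no table entry: syntax error
    return pos == len(tokens)

print(ll1_parse(['id', '+', 'id']))          # True
```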
50
DFA Minimization algorithm
Merge equivalent states: start by partitioning states into accepting and non-accepting, then repeatedly split any group whose states go to different groups on some input symbol; when no group can be split further, each group becomes one state of the minimal DFA.
51
LL(1) parse table
A table with one row per non-terminal and one column per terminal (plus the end-of-input marker $); entry [A, t] holds the production to apply when expanding A with lookahead token t. Duplicate entries mean the grammar is not LL(1).
52
Properties of LL(1) grammars
- No left-recursive grammar is LL(1)
- No ambiguous grammar is LL(1)
- Some languages have no LL(1) grammar
- An ε-free grammar where each alternative expansion for A begins with a distinct terminal is a simple LL(1) grammar
53
What are the key responsibilities of a scanner?
Convert the stream of source characters into a stream of tokens: recognize lexemes (identifiers, keywords, literals, operators) and eliminate whitespace and comments.
54
Bottom-up parsing
Goal: given an input string w and a grammar G, construct a parse tree by starting at the leaves and working up to the root.
- We parse bottom-up, replacing terms by non-terminals.
- The question is always whether to shift or to reduce!
55
What is a regular language?
A language that can be described by a regular expression, or equivalently recognized by a finite automaton (NFA/DFA).
56
handle
A handle of a right-sentential form γ is a production A → β and a position in γ where β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
— Suppose S ⇒* αAw ⇒ αβw. Then A → β in the position following α is a handle of αβw.
-> NB: because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols.
57
How can you generate a DFA recognizer from a regular expression?
With a scanner generator: convert the regular expression to an NFA, turn the NFA into a DFA via the subset construction, and (optionally) minimize the DFA.
58
Why aren’t regular languages expressive enough for parsing?
Regular languages cannot describe arbitrarily nested, recursive constructs (e.g. balanced parentheses or nested blocks), which programming-language syntax requires; context-free grammars are needed to impose that structure.
59
handle-pruning, bottom-up parser shift-reduce parser
Shift-reduce parsers use a stack and an input buffer:
1. Initialize the stack with $.
2. Repeat until the top of the stack is the goal symbol and the input token is $:
   a) Find the handle. If we don’t have a handle on top of the stack, shift (push) an input symbol onto the stack.
   b) Prune the handle. If we have a handle A → β on the stack, reduce:
      I. pop |β| symbols off the stack
      II. push A onto the stack
A shift-reduce parser has just four canonical actions: shift, reduce, accept, error.
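A naive sketch (Python; the toy grammar is an assumption) of the shift-reduce loop. A real LR parser consults a parse table to find handles; here we just match right-hand sides on top of the stack, trying the longer production first to dodge a reduce/reduce conflict:
```
GRAMMAR = [('E', ['E', '+', 'T']),    # listed before E -> T on purpose
           ('E', ['T']),
           ('T', ['id'])]

def shift_reduce(tokens, goal='E'):
    stack, pos = ['$'], 0
    while True:
        for lhs, rhs in GRAMMAR:                      # try to prune a handle
            if stack[len(stack) - len(rhs):] == rhs:
                del stack[len(stack) - len(rhs):]     # pop |rhs| symbols
                stack.append(lhs)                     # push the non-terminal
                break
        else:                                         # no reduction possible
            if pos < len(tokens):
                stack.append(tokens[pos])             # shift
                pos += 1
            else:
                return stack == ['$', goal]           # accept or error

print(shift_reduce(['id', '+', 'id']))                # True
```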
60
left-to-right-scan? | right-to-left-scan?
s
61
Scanning vs. parsing
Scanning uses regular expressions to recognize tokens; parsing uses CFGs to impose structure on the token stream.
62
lexical analysis
1. Maps sequences of characters to tokens
2. Eliminates white space (tabs, blanks, comments, etc.)
-> happens within the scanner
-> the string value of a token is a lexeme
63
CFG / parsing
CFGs are used to impose structure!
- For practical purposes it is important that a grammar be unambiguous, i.e., that it always produces a unique parse for a given valid input.
- Although parsers read their input Left to Right (the first “L” in most of these categories), they may work either top-down — producing a leftmost derivation — or bottom-up — producing a rightmost derivation.
- The process of discovering a derivation is called parsing.
64
left recursion vs right recursion
pro-contra? Rule of thumb: —right recursion for top-down parsers —left recursion for bottom-up parsers
65
Leftmost derivation
Replace the leftmost non-terminal at each step.
- Corresponds to top-down parsing, and is especially well-suited to recursive descent parsers.
66
Rightmost derivation
replace the rightmost non-terminal at each step - is produced bottom-up, and is better-suited to table-driven parsing
67
review: LR(k):
must be able to recognize the occurrence of the right hand side of a production after having seen all that is derived from that right hand side with k symbols of look-ahead. LR dilemma: pick A → b or B → b ? (how to reduce?)
68
Resolving ambiguity
eliminate by rearranging the grammar
69
Visitor Pattern
Intent: represent an operation to be performed on the elements of an object structure. Visitor lets you define a new operation without changing the classes of the elements on which it operates.
-> a visitor gathers related operations
-> Visitor makes adding new operations easy (write a new visitor)
-> Visitor can break encapsulation
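An illustrative sketch (Python rather than the Java used with JTB; class and method names are made up) of the pattern on two AST node classes:
```
class Num:
    def __init__(self, value): self.value = value
    def accept(self, v): return v.visit_num(self)

class Add:
    def __init__(self, left, right): self.left, self.right = left, right
    def accept(self, v): return v.visit_add(self)

class Evaluator:               # one visitor gathers one related operation
    def visit_num(self, n): return n.value
    def visit_add(self, n): return n.left.accept(self) + n.right.accept(self)

class Printer:                 # a new operation = a new visitor class
    def visit_num(self, n): return str(n.value)
    def visit_add(self, n): return f"({n.left.accept(self)} + {n.right.accept(self)})"

tree = Add(Num(1), Add(Num(2), Num(3)))
print(tree.accept(Printer()), '=', tree.accept(Evaluator()))   # (1 + (2 + 3)) = 6
```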
70
Java Tree Builder (JTB)
- front end for JavaCC
- supports the building of syntax trees that can be traversed using visitors
- transforms a bare JavaCC grammar into three components:
  1) a JavaCC grammar with embedded Java code for building a syntax tree
  2) one class for every form of syntax tree node
  3) a default visitor which can do a depth-first traversal of a syntax tree
71
Why use intermediate representations?
—break compiler into manageable pieces —isolates back end from front end —different languages can share IR and back end - Enables machine-independent optimization -> general techniques, multiple passes
72
IR types
- Abstract syntax trees (AST) - Linear operator form of tree (e.g., postfix notation) - Directed acyclic graphs (DAG) - Control flow graphs (CFG) - Program dependence graphs (PDG) - Static single assignment form (SSA) - 3-address code - Hybrid combinations
73
lookahead
to decide which production rule to apply at any point without backtracking
74
Abstract syntax tree
An AST is a parse tree with nodes for most non-terminals removed. -> Since the program is already parsed, non-terminals needed to establish precedence and associativity can be collapsed! Remember: Concrete syntax trees, or parse trees, show all of the intermediate non-terminals needed to produce an unambiguous grammar. An abstract syntax tree collapses these to offer a much simpler version of the parse tree.
75
parsing
the process of discovering a derivation
76
Leftmost vs rightmost derivation
replace the leftmost non-terminal at each step vs replace the rightmost non-terminal at each step
77
3-address code
Statements take the form: x = y op z
Advantages:
— compact form
— names for intermediate values
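A small sketch (Python; the tuple-based expression AST is an assumption) of lowering an expression into three-address code with fresh temporaries:
```
def lower(node, code, counter):
    if node[0] == 'num':
        return str(node[1])
    if node[0] == 'var':
        return node[1]
    op, l, r = node
    a, b = lower(l, code, counter), lower(r, code, counter)
    counter[0] += 1
    t = f"t{counter[0]}"                    # fresh name for the intermediate value
    code.append(f"{t} = {a} {op} {b}")      # the x = y op z form
    return t

code, counter = [], [0]
lower(('*', ('var', 'a'), ('+', ('var', 'b'), ('num', 4))), code, counter)
print('\n'.join(code))      # t1 = b + 4
                            # t2 = a * t1
```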
78
Static Single Assignment Form (SSA)
Goal: simplify procedure-global optimizations.
A program is in SSA form if every variable is assigned only once.
- SSA is only used for static analysis and optimization; it disappears when we generate the final executable code.
- SSA is normally used on control-flow graphs (CFGs).
79
The Φ-Function
- Φ-functions are always at the beginning of a basic block
- A Φ-function selects between values depending on control flow
80
Minimal SSA
Two steps: —Place Φ-functions —Rename Variables We want minimal amount of needed Φ —Save memory —Algorithms will work faster
81
Dominance Property of SSA
Dominance: node D dominates node N if every path from the start node to N goes through D (“strictly dominates”: D ≠ N) -Definition dominates use -> Dominance can be used to efficiently build SSA
82
Dominance Frontier
DF(D) = the set of all nodes N such that D dominates an immediate predecessor of N but does not strictly dominate N. application of DF: Φ-Functions are placed in all basic blocks of the Dominance Frontier
83
parser generators
Tools that automatically build (bottom-up, LR) parsers from a grammar specification.
84
top-down parsing
A top-down parser starts with the root of the parse tree, labeled with the start or goal symbol of the grammar.
85
SSA and Register Allocation
- Idea: remove Φ as late as possible - Variables in Φ-function never live at the same time! - > Can be stored in the same register - > Do register allocation on SSA! So, don’t remove Φ functions before register allocation
86
look-ahead
top-down parsers may need to backtrack when they select the wrong production - large subclasses of CFGs can be parsed with limited lookahead - Among the interesting subclasses are: LL(1): Left to right scan, Left-most derivation, 1-token look-ahead LR(1): Left to right scan, Right-most derivation, 1-token look-ahead
87
predictive parsing
basic idea: For any two productions A → α | β, we would like a distinct way of choosing the correct production to expand.
88
Lattices
A lattice is a partially ordered set with meet and join.
A partially ordered set is a set S and a binary relation ≤ such that:
- a ≤ a (reflexivity)
- if a ≤ b and b ≤ a, then a = b (antisymmetry)
- if a ≤ b and b ≤ c, then a ≤ c (transitivity)
A complete lattice is a lattice where every subset has a supremum and infimum.
-> For static analysis, abstract interpretation is mostly performed over finite lattices.
89
Dataflow analysis
Flow based analysis is a particular instance of abstract interpretation
90
Optimization
Performance: faster execution Size: smaller executable, smaller memory footprint Tradeoffs: 1) Performance vs. Size 2) Compilation speed and memory Optimizations both in the optimizer and back-end! - Back-end optimizations may focus on how the machine code is optimally generated
91
Optimizations in the Backend
> Register Allocation > Instruction Selection > Peep-hole Optimization
92
LL(k)
generalize the notion of FIRST sets to describe the first k tokens of a string, and make an LL(k) parsing table whose rows are the nonterminals and columns are every sequence of k terminals (rarely done -> tables get large!)
93
sentential form
is any string derivable from the start symbol
94
left-recursion elimination (after left factoring)
cf. 03 sl 56
95
Examples for Optimizations
``` > Constant Folding / Propagation > Copy Propagation > Algebraic Simplifications > Strength Reduction > Dead Code Elimination (Structure Simplifications) > Loop Optimizations > Partial Redundancy Elimination > Code Inlining ```
96
FOLLOW(X)
is the set of terminals that can immediately follow X, i.e. t ∈ FOLLOW(X) if there is any derivation containing Xt. This can occur if the derivation contains XYZt where Y and Z both derive ε.
-> I.e., a non-terminal’s FOLLOW set specifies the tokens that can legally appear after it.
- A terminal symbol has no FOLLOW set.
97
LL(1) parse table
A table with one row per non-terminal and one column per terminal (plus the end-of-input marker $); entry [A, t] holds the production to apply when expanding A with lookahead token t. Duplicate entries mean the grammar is not LL(1).
98
Properties of LL(1) grammars
- No left-recursive grammar is LL(1) - No ambiguous grammar is LL(1) - Some languages have no LL(1) grammar - An ε–free grammar where each alternative expansion for A begins with a distinct terminal is a simple LL(1) grammar
99
derivation vs parse
A derivation is any sequence of rewriting steps from the start symbol to a sentence; the parse is the particular sequence of productions the parser discovers for a given input, represented by the parse tree.
100
Bottom-up parsing
Goal: - Given an input string w and a grammar G, construct a parse tree by starting at the leaves and working to the root. - We parse bottom-up, replacing terms by non-terminals!
101
right-sentential form
A sentential form that occurs in a rightmost derivation (consequently, everything to the right of the handle is a string of terminals).
102
handle
A handle of a right-sentential form γ is a production A → β and a position in γ where β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ —Suppose: S ⇒* αAw ⇒ αβw —Then A → β in the position following α is a handle of αβw -> NB: Because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols.
103
Common Subexpression Elimination (CSE)
Common Subexpression: —There is another occurrence of the expression whose evaluation always precedes this one —operands remain unchanged
104
Loop Optimizations
Optimizing code in loops is important —often executed, large payoff! e.g. —fission/fusion: split/combine loops to improve locality or reduce overhead —scheduling: run parts in multiple processors —unrolling: duplicate body several times in order to decrease the number of times the loop condition is tested —loop-invariant code motion: move invariant code out of loop
105
handle-pruning, bottom-up parser shift-reduce parser
Shift-reduce parsers use a stack and an input buffer 1. initialize stack with $ 2. Repeat until the top of the stack is the goal symbol and the input token is $
106
Induction Variable Optimizations
Values of variables form an arithmetic progression sl. 38
107
LR(k)
A grammar is LR(k) if k tokens of lookahead are enough to determine a unique parse:
108
Why study LR grammars?
- LR(1) grammars are used to construct LR(1) parsers - everyone’s favorite parser - virtually all context-free programming language constructs can be expressed in an LR(1) form - LR grammars are the most general grammars parsable by a deterministic, bottom-up parser - efficient parsers can be implemented for LR(1) grammars - LR parsers detect an error as soon as possible in a left-to-right scan of the input - LR grammars describe a proper superset of the languages recognized by predictive (i.e., LL) parsers
109
Optimization on SSA
e.g. three simple ones: —Constant Propagation -> SSA: Variables are assigned once -> We know that we can replace all uses by the constant! ``` —Copy Propagation -> SSA: for a statement x1 := y1 -> replace later uses of x1 with y1 -> as a “clean up” optimization after other optimizations have been performed ``` —Simple Dead Code Elimination - > SSA: Variable is live if the list of uses is not empty - > Dead definitions can be deleted
110
Profile-guided optimization
Approach: —Generate code, —profile it in a typical scenario, —then use that information to optimize it Problem: —usage scenarios can change in deployment, there is no way to react to that as profile is generated at compile time.
111
review: Recursive descent
A hand coded recursive descent parser directly encodes a grammar (typically an LL(1) grammar) into a series of mutually recursive procedures. It has most of the linguistic limitations of LL(1).
112
review: LL(k):
must be able to recognize the use of a production after seeing only the first k symbols of its right hand side. LL dilemma: pick A → b or A → c ? (which rule to apply?)
113
Optimization: Iterative Process
- There is no general “right” order of optimizations - One optimization generates new opportunities for a preceding one. => Optimization is an iterative process Compile Time vs. Code Quality
114
JavaCC
- based on LL(k) - Grammars are written in EBNF - Transforms an EBNF grammar into an LL(k) parser - Top-down parsing (recursive descent) with variable lookahead -
115
Visitor Pattern
Intent: Represent an operation to be performed on the elements of an object structure. Visitor lets you define a new operation without changing the classes of the elements on which it operates.
116
Runtime storage organization
study slides again + chap. 6
117
Procedure call conventions
s
118
Calls: Saving and restoring registers
CC-08 sl 24 usual approach: caller’s registers: callee saves Call includes bitmap of caller’s registers to be saved/restored. Best: saves fewer registers, compact call sequences
119
parse tree
the syntactic structure of the program
120
Abstract syntax tree
An AST is a parse tree with nodes for most non-terminals removed. -> Since the program is already parsed, non-terminals needed to establish precedence and associativity can be collapsed!
121
Directed acyclic graph
A DAG is an AST with unique, shared nodes for each value.
122
Control flow graph
A CFG models transfer of control in a program
123
3-address code
Statements take the form: x = y op z Advantages: —compact form —names for intermediate values
124
Static Single Assignment Form (SSA)
Goal: simplify procedure-global optimizations Program is in SSA form if every variable is only assigned once - is only used for static analysis and optimization. It will disappear when we generate the final executable code
125
The Φ-Function
- Φ-functions are always at the beginning of a basic block
- A Φ-function selects between values depending on control flow
126
Optimum tiling
A tiling of the IR tree whose tiles sum to the lowest possible total cost (found with dynamic programming rather than greedy maximal munch).
127
Dominance Property of SSA
Dominance: node D dominates node N if every path from the start node to N goes through D - Definition dominates use - > Dominance can be used to efficiently build SSA
128
Main Components of a VM
The virtual machine provides a virtual processor that interprets bytecode instructions (together with memory management/GC, a threading and scheduling system, and primitives for interacting with the host system).
129
Applications of SSA
Simplifies many optimizations: - Every variable has only one definition - Every use knows its definition, every definition knows its uses - Unrelated variables get different names ``` Examples: —Constant propagation —Value numbering —Invariant code motion and removal —Strength reduction —Partial redundancy elimination ```
130
Pharo Bytecode
``` 256 Bytecodes, four groups: Stack Bytecodes Send Bytecodes Return Bytecodes Jump Bytecodes ```
131
SSA and Register Allocation
- Idea: remove Φ as late as possible - Variables in Φ-function never live at the same time! - > Can be stored in the same register - > Do register allocation on SSA!
132
Program Analysis
Types:
- statically vs. dynamically checked
- soundness
- type checking -> typing rules are usually written as natural deductions
Abstract interpretation:
-> one of the most generic and fundamental ways of approximating the behavior of a program
133
Method Contexts
sl 38 cc-09 A method context is an object that represents an activation record (stack frame) for a method invocation. Each method context points to the context of its caller, thus constituting a stack of contexts.
134
Lattices
A lattice is a partially ordered set with meet and join. A partially ordered set is a set S and a binary relation ≤ such that a ≤ a (reflexivity) if a ≤ b and b ≤ a, then a = b (antisymmetry) if a ≤ b and b ≤ c, then a ≤ c (transitivity) A complete lattice is a lattice where every subset has a supremum and infimum -> For static analysis, abstract interpretation is mostly performed over finite lattices
135
Dataflow analysis
Flow based analysis is a particular instance of abstract interpretation
136
Optimization
Performance: faster execution Size: smaller executable, smaller memory footprint Tradeoffs: 1) Performance vs. Size 2) Compilation speed and memory Optimizations both in the optimizer and back-end! - Back-end optimizations may focus on how the machine code is optimally generated
137
Optimizations in the Backend
> Register Allocation > Instruction Selection > Peep-hole Optimization
138
GC Remembered Set
Write barrier: remember objects with old-young pointers in "Remembered Set": When marking young generation, use objects in remembered set as additional roots
139
Instruction Selection
For every expression there are many ways to realize it on a processor. Example: multiplication by 2 can be done with a bit shift. Instruction selection is a form of optimization.
-> Group IR-tree nodes into clumps that correspond to the actions of target-machine instructions.
140
Pharo GC
mark and sweep compacting collector with two generations > Cooperative, i.e., not concurrent > Single threaded
141
Examples for Optimizations
``` > Constant Folding / Propagation > Copy Propagation > Algebraic Simplifications > Strength Reduction > Dead Code Elimination (Structure Simplifications) > Loop Optimizations > Partial Redundancy Elimination > Code Inlining ```
142
Threading System
Multithreading is the ability to create concurrently running “processes” Non-native threads (green threads): – Only one native thread used by the VM – Simpler to implement and easier to port Native threads – Using the native thread system provided by the OS – Potentially higher performance
143
Constant Propagation
Recognize that certain “variables” actually have constant values and propagate these constants to the expressions where they are used (easy on SSA, where each variable has exactly one definition).
144
Optimization: JIT
Idea: Just In Time Compilation > Translate unit (method, loop, ...) into native machine code at runtime > Store native code in a buffer on the heap Challenges > Run-time overhead of compilation > Machine code takes a lot of space (4-8x compared to bytecode) > Deoptimization (for debugging) is very tricky Adaptive compilation: gather statistics to compile only units that are heavily used (hot spots)
145
Algebraic Simplifications
Use algebraic identities to simplify expressions, e.g. x + 0 = x, x * 1 = x, x * 0 = 0.
146
Strength Reduction
Replace expensive operations with simpler ones (e.g. replace a multiplication by a power of two with a shift).
-> Peephole optimizations are often strength reductions.
147
Dead Code
Code that can never be executed, or whose results are never used; it can be eliminated (dead code elimination).
148
Simplify Structure
Similar to dead code: Simplify CFG Structure e. g. - Delete Empty Basic Blocks - Fuse Basic Blocks (e.g. “conditional” jumps between basic blocks where conditions are always true) -> Optimizations will degenerate CFG: — Needs to be cleaned to simplify further optimization!
149
Common Subexpression Elimination (CSE)
Detect that an expression has another occurrence whose evaluation always precedes this one, with unchanged operands, and reuse the previously computed value instead of re-evaluating it.
150
Loop Optimizations
Optimizing code in loops is important —often executed, large payoff! e.g. —fission/fusion: split/combine loops to improve locality or reduce overhead —scheduling: run parts in multiple processors —unrolling: duplicate body several times in order to decrease the number of times the loop condition is tested —loop-invariant code motion: move invariant code out of loop
151
What can’t PEGs express directly?
> Ambiguous languages — that’s what CFGs are for!
152
Induction Variable Optimizations
Values of variables form an arithmetic progression sl. 38
153
Scannerless parsers
> Traditional linear-time parsers have fixed lookahead —With unlimited lookahead, don’t need separate lexical analysis! - especially useful when mixing languages with different terminals
154
Memoized parsing: Packrat Parsers
By memoizing parsing results, we avoid having to recalculate partially successful parses. => A “packrat parser” is a PEG that memoizes (i.e., caches) intermediate parsing results so they do not have to be recomputed while backtracking.
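A tiny sketch (Python; the grammar S ← 'a' S / 'b' is just an example) of the idea: memoize each (rule, position) result so backtracking never re-parses the same span:
```
from functools import lru_cache

TEXT = "aab"

@lru_cache(maxsize=None)          # the packrat memo table: one entry per position
def rule_S(pos):
    if pos < len(TEXT) and TEXT[pos] == 'a':    # first alternative: 'a' S
        result = rule_S(pos + 1)
        if result is not None:
            return result
    if pos < len(TEXT) and TEXT[pos] == 'b':    # ordered choice: then try 'b'
        return pos + 1
    return None                                  # this rule fails at pos

print(rule_S(0) == len(TEXT))     # True: "aab" is recognized
```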
155
Optimization on SSA
e.g. three simple ones:
— constant propagation
— copy propagation
— simple dead code elimination
156
What is Packrat Parsing not good for?
> General CFG parsing (ambiguous grammars) —produces at most one result > Parsing highly “stateful” syntax (C, C++) —memoization depends on statelessness > Parsing in minimal space —LL/LR parsers grow with stack depth, not input size
157
Parser Combinators
> Parser combinators in functional languages are higher order functions used to build parsers —e.g., Parsec, Haskell > In an OO language, a combinator is a (functional) object —To build a parser, you simply compose the combinators
158
JIT compilation
Dynamic Translation: Compilation done during execution of a program – at run time – rather than prior to execution Improve time and space efficiency of programs using: > portable and space-efficient byte-code > run-time information → feedback directed optimizations > speculative optimization
159
Optimization: Iterative Process
- There is no general “right” order of optimizations - One optimization generates new opportunities for a preceding one. => Optimization is an iterative process Compile Time vs. Code Quality
160
Stratego
Stratego: a language for specifying program transformations. XT: a collection of transformation tools.
161
Registers
s
162
Runtime storage organization
study slides again + chap. 6
163
Stratego vs TXL
Scannerless GLR parsing vs Agile parsing (top-down + bottom-up) Reusable, generic traversal strategies vs Fixed traversals Separates rewrite rules from traversal strategies vs Traversals part of rewrite rules
164
Instruction selection
Choose target-machine instructions for IR operations: group IR-tree nodes into clumps (tiles) that correspond to the actions of target-machine instructions.
165
maximal munch
The “maximal munch” principle is the rule that as much of the input as possible should be processed when creating some construct.
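A sketch (Python; the toy token set is an assumption) of maximal munch in a scanner, where '==' must win over '=':
```
import re

TOKEN_RES = [('ID', r'[A-Za-z_]\w*'), ('NUM', r'\d+'),
             ('EQ', r'=='), ('ASSIGN', r'=')]

def scan(text):
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        kind, lexeme = max(                       # take the longest match
            ((k, m.group()) for k, r in TOKEN_RES
             for m in [re.match(r, text[pos:])] if m),
            key=lambda kl: len(kl[1]))
        tokens.append((kind, lexeme))
        pos += len(lexeme)
    return tokens

print(scan("count == 42"))   # [('ID', 'count'), ('EQ', '=='), ('NUM', '42')]
```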
166
Register allocation (Code generation)
Choose registers for variables and temporary values; variables not simultaneously live can share same register
167
Liveness analysis
Problem:
— the IR has an unbounded number of temporaries
— the machine has a bounded number of registers
Approach:
— temporaries with disjoint live ranges can map to the same register
— if there are not enough registers, spill some temporaries (i.e., keep them in memory)
The compiler must perform liveness analysis for each temporary: it is live if it holds a value that may still be needed.
Liveness information is a form of data-flow analysis over the control flow graph -> liveness analysis calculates the places where each variable holds a still-needed (live) value.
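A sketch (Python; block names and use/def sets are invented) of the standard backward data-flow iteration at basic-block granularity:
```
def liveness(blocks, succ, use, defs):
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:                                 # iterate to a fixpoint
        changed = False
        for b in blocks:
            out = set().union(*(live_in[s] for s in succ[b])) if succ[b] else set()
            inn = use[b] | (out - defs[b])         # live-in = use ∪ (live-out - def)
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

# toy CFG:  B1: a = 1   ->   B2: b = a + 1
blocks = ['B1', 'B2']
succ   = {'B1': ['B2'], 'B2': []}
use    = {'B1': set(),  'B2': {'a'}}
defs   = {'B1': {'a'},  'B2': {'b'}}
print(liveness(blocks, succ, use, defs))           # 'a' is live across B1 -> B2
```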
168
downward exposure
s
169
upward exposure
s
170
optimal tiling
A tiling of the IR tree in which no two adjacent tiles can be combined into a single tile of lower cost (maximal munch finds an optimal tiling).
171
Optimum tiling
A tiling whose tiles sum to the lowest possible total cost; every optimum tiling is also optimal, and it is found with dynamic programming rather than greedy maximal munch.
172
virtual machine
an abstract computing architecture supporting a programming language in a hardware-independent fashion
173
Main Components of a VM
The virtual machine provides a virtual processor that interprets bytecode instructions.
174
Bytecode:
The “machine-code” of the virtual machine Bytecode is analogous to assembler, except it targets a virtual machine rather than a physical one. Many VMs are stack machines: the generated bytecode pushes values onto a stack, and executes operations that consume one or more values on the top of the stack, replacing them with results Different forms of bytecodes: Single bytecodes Groups of similar bytecodes Multibyte bytecodes
175
Pharo Bytecode
``` 256 Bytecodes, four groups: Stack Bytecodes Send Bytecodes Return Bytecodes Jump Bytecodes ```
176
Stack vs. Register VMs
Stack machines
• Smalltalk, Java and most other VMs
• simple to implement for different hardware architectures
• very compact code
Register machines
• potentially faster than stack machines
• only a few register VMs exist, e.g. Parrot VM (Perl 6)
177
Interpreter State and Loop
The interpreter state includes the instruction pointer, the stack pointer, and the current method and context; the interpreter loop repeatedly fetches the next bytecode and dispatches on it (e.g. a 256-way case statement).
178
Method Contexts
sl 38 cc-09 A method context is an object that represents an activation record (stack frame) for a method invocation. Each method context points to the context of its caller, thus constituting a stack of contexts.
179
Automatic Memory Management
Tell when an object is no longer used and then recycle the memory ``` Challenges – Fast allocation – Fast program execution – Small predictable pauses – Scalable to large heaps – Minimal space usage ``` Main Approaches: 1. Reference Counting (cool idea but fragile) 2. Mark and Sweep
180
Reference Counting GC
Idea
> for each store operation, increment the count field in the header of the newly stored object
> decrement if the object is overwritten
> if the count reaches 0, collect the object and decrement the counters of each object it pointed to
Problems
> run-time overhead of counting (particularly on the stack)
> inability to detect cycles (need an additional GC technique)
181
Mark and Sweep GC
Idea
> suspend the current process
> mark phase: trace each accessible object, leaving a mark in the object header (start at known root objects)
> sweep phase: all objects with no mark are collected
> remove all marks and resume the current process
Problems
> need to “stop the world”
> slow for large heaps -> generational collectors
> fragmentation -> compacting collectors
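A toy sketch (Python; the heap representation with explicit 'refs' and 'marked' fields is an assumption) of the two phases:
```
def mark(roots):
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj['marked']:
            obj['marked'] = True             # mark phase: trace from the roots
            stack.extend(obj['refs'])

def sweep(heap):
    survivors = [o for o in heap if o['marked']]   # unmarked objects are collected
    for o in survivors:
        o['marked'] = False                  # clear marks for the next collection
    return survivors

a = {'refs': [], 'marked': False}
b = {'refs': [a], 'marked': False}
garbage = {'refs': [], 'marked': False}
heap, roots = [a, b, garbage], [b]
mark(roots)
heap = sweep(heap)
print(len(heap))                             # 2: 'garbage' was reclaimed
```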
182
Generational Collectors
"Most new objects live very short lives; most older objects live forever" Idea > Partition objects into generations > Create objects in young generation > Tenuring: move live objects from young to old generation > Incremental GC: frequently collect young generation (very fast) > Full GC: infrequently collect young+old generation (slow) Difficulty > Need to track pointers from old to new space
183
GC Remembered Set
Write barrier: remember objects with old-young pointers in "Remembered Set": When marking young generation, use objects in remembered set as additional roots
184
GC Compacting Collectors
Idea > During the sweep phase all live objects are packed to the beginning of the heap > Simplifies allocation since free space is in one contiguous block Challenge > Adjust all pointers of moved objects – object references on the heap – pointer variables of the interpreter!
185
Pharo GC
mark and sweep compacting collector with two generations > Cooperative, i.e., not concurrent > Single threaded
186
When Does the GC Run?
– Incremental GC on allocation count or memory needs – Full GC on memory needs – Tenure objects if survivor threshold exceeded
187
Threading System
Multithreading is the ability to create concurrently running “processes” Non-native threads (green threads): – Only one native thread used by the VM – Simpler to implement and easier to port Native threads – Using the native thread system provided by the OS – Potentially higher performance
188
VM Optimizations
- Method cache for faster lookup: receiver’s class + method selector - Method context cache (as much as 80% of objects created are context objects!) - Interpreter loop: 256 way case statement to dispatch bytecodes - Quick returns: methods that simply return a variable or known constant are compiled as a primitive method - Small integers are tagged pointers: value is directly encoded in field references
189
Optimization: JIT
Idea: Just In Time Compilation > Translate unit (method, loop, ...) into native machine code at runtime > Store native code in a buffer on the heap Challenges > Run-time overhead of compilation > Machine code takes a lot of space (4-8x compared to bytecode) > Deoptimization (for debugging) is very tricky
190
Process Scheduler
> Cooperative between processes of the same priority
> Preemptive between processes of different priorities
191
Designing a Language Syntax
Textbook Method 1. Formalize syntax via a context-free grammar 2. Write a parser generator (.*CC) specification 3. Hack on grammar until “nearly LALR(1)” 4. Use generated parser
192
What exactly does a CFG describe?
a rule system to generate language strings but we really want: a rule system to recognize language strings
193
Parsing Expression Grammars (PEGs)
PEGs model recursive descent parsing best practice: a PEG specifies rules to recognize sentences in a top-down fashion.
CFG: S → aaS | ε        PEG: S ← aaS / ε
CFG: a rule system to generate language strings. PEG: a rule system to recognize language strings.
194
Key benefits of PEGs
> Simplicity, formalism of CFGs > Closer match to syntax practices —More expressive than deterministic CFGs (LL/LR) —Unlimited lookahead, backtracking > Linear time parsing for any PEG (!) -> linear parse time can be achieved with the help of memoization using a “packrat parser”.
195
Formal properties of PEGs
> Expresses all deterministic languages — LR(k) > Closed under union, intersection, complement > Can express some non-context free languages —e.g., a^nb^nc^n > Undecidable whether L(G) = ∅
196
What can’t PEGs express directly?
> Ambiguous languages — that’s what CFGs are for!
197
Top-down parsing techniques
Predictive parsers • use lookahead to decide which rule to trigger • fast, linear time Backtracking parsers • try alternatives in order; backtrack on failure • simpler, more expressive (possibly exponential time!)
198
Scannerless parsers
- especially useful when mixing languages with different terminals
199
Memoized parsing: Packrat Parsers
By memoizing parsing results, we avoid having to recalculate partially successful parses
200
What is Packrat Parsing good for?
> Linear cost —bounded by size(input) × #(parser rules) > Recognizes strictly larger class of languages than deterministic parsing algorithms (LL(k), LR(k)) > Good for scannerless parsing —fine-grained tokens, unlimited lookahead > Scannerless parsing enables unified grammar for entire language —Can express grammars for mixed languages with different lexemes!
201
What is Packrat Parsing not good for?
> General CFG parsing (ambiguous grammars) —produces at most one result > Parsing highly “stateful” syntax (C, C++) —memoization depends on statelessness > Parsing in minimal space —LL/LR parsers grow with stack depth, not input size
202
Parser Combinators
> Parser combinators in functional languages are higher order functions used to build parsers —e.g., Parsec, Haskell > In an OO language, a combinator is a (functional) object —To build a parser, you simply compose the combinators
203
Program Transformation
Translation: - Compilation - migration from procedural to OO Rephrasing: - desugaring regular expressions - partial evaluation
204
Program Transformation pipeline
program -> parse -> (tree) -> transform -> (tree) -> transform -> (tree) -> pretty-print -> program
205
Stratego
Stratego: a language for specifying program transformations. XT: a collection of transformation tools.
206
Stratego Parsing
Stratego parses any context-free language using Scannerless Generalized LR Parsing
207
TXL
A general-purpose source-to-source transformation language -> originally designed as a desugaring tool for syntactic extensions to the teaching language Turing.
208
Stratego vs TXL
Scannerless GLR parsing vs Agile parsing (top-down + bottom-up) Reusable, generic traversal strategies vs Fixed traversals Separates rewrite rules from traversal strategies vs Traversals part of rewrite rules
209
Refactoring
The process of changing a software system in such a way that it does not alter the external behaviour of the code, yet improves its internal structure
210
AOP
cross-cutting concerns: Certain features (like logging, persistence and security), cannot usually be encapsulated as classes. They cross-cut code of the system. AOP improves modularity by supporting the separation of cross-cutting concerns. An aspect packages cross-cutting concerns. A pointcut specifies a set of join points in the target system to be affected. Weaving is the process of applying the aspect to the target system
211
Program transformation: rewrite rules?
s
212
symbol table
aka environment: a symbol table maps variable names to information about the variables (e.g. type, storage location).
213
virtual machine primitive
- > Primitive methods trigger a VM routine and are executed without a new method context unless they fail - Do work that can only be done in VM (new object creation, process manipulation, become, ...) - Improve performance
214
regular expressions
— simpler and more concise for tokens than a grammar
— more efficient scanners can be built from REs
-> CFGs are used to impose structure
-> lexical analysis is the first phase of a compiler: the lexical analyzer breaks the source text into a series of tokens, removing whitespace and comments.
215
token / lexeme
A lexeme is the sequence of characters in the source text that forms a token. There are rules for every lexeme to be identified as a valid token: these rules are given by the grammar by means of patterns, and patterns are usually defined with regular expressions.
216
interpreter
a program that reads an executable program and produces the results of running that program The job of an interpreter is to execute source code, possibly consuming input, and producing output. An interpreter typically will translate the source code to some intermediate representation along the way, but this intermediate form will not necessarily be stored as an artifact. In contrast to a compiler, an interpreter does execute the source program.
217
LALR
LookAhead LR (uses “lookahead sets”)
218
derivation
a proof that shows a sentence is in the language of a grammar. - > start with the start symbol, then repeatedly replace any nonterminal by one of its right-hand sides - there are many different derivations of the same sentence - a parse tree is made by connecting each symbol in a derivation to the one from which it was derived We can view the productions of a CFG as rewriting rules. The process of discovering a derivation is called parsing
219
Scanner generators
``` automatically construct code from regular expression-like descriptions —construct a DFA —use state minimization techniques —emit code for the scanner (table driven or direct code ) ```