Exam Flashcards
What is an Interpreter?
Analyses syntactics and semantics.
- Execute programs without much translation.
- Allows for interactive debugging:
- Breakpoints.
- Viewing of variables during run-time.
- Provides a significant degree of machine independence:
- (Able to run on a variety of computers).
Interpreters can be up to 100 times slower than compiled code.
What is a Language processor?
A language processor is a hardware device used to perform tasks, such as processing program code to machine code.
What is machine code?
Code the machine can directly read.
Consisting only of 0’s and 1’s.
What is Target code?
The output code after a compilation.
What is Syntax?
The structure of symbols in programs.
Defines sequenses of symbols that are legal.
Is not concerned with what the symbols mean.
What is Semantics?
The meaning of symbols in programs.
The semantics of a programming language are commonly divided into two classes.
What are they?
- Static semantics.
- Runtime semantics.
What is Static semantics?
Provides a set of rules that specify which syntactically legal programs are actually valid.
To specify the rules it is required that:
- Used variables are declared.
- Operators and operands are type-compatible.
- Procedures are called with the proper number of parameters.
Static semantics can be expressed in two ways:
- Expressed informally using everyday speech like in most language specifications. (less precise)
- Expressed formally using a variety of notations. (very precise)
- One kind of notation: Attribute Grammars:
- Attribute grammars augment CFG since it cannot express the rules.
- E -> E + T is a production in a CFG.
- This is the production augmented with a type attribute for E and T:
- One kind of notation: Attribute Grammars:
What is Runtime semantics?
Specifies what a program computes.
Runtime semantics are often described informally.
But there are ways to describe them formally, this is three of them:
-
Natural (structural operational semantics):
- Describe how the overall results of the executions are obtained.
-
Axiomatic:
- Based on mathematical logic to proving the correctness of computer programs.
-
Denotational:
- Defines that specifies the meaning of a construct (e.g addition).
- Relys on notation and terminology from mathematics, so they are often very compact.
What are the different kinds of machine code?
There are three different kinds of machine code:
- Pure Machine Code.
- Augmented Machine Code.
- Virtual Machine Code
What is pure machine code?
Code generated on the assumption that it doesn’t require an operating system or other libraries. (It can run purely on its own)
Rarely used, but when used it’s often for things like building operating systems.
What is augmented machine code?
Code generated for a machine architecture that is augmented with an operating system.
- Requires the particular operating system it was developed for to be present.
- Is often used.
What is virtual machine code?
Virtual machine code is composed entirely of virtual instructions.
- Good technique for producing code that can be run easily on a variety of computers.
- You create a Virtual Machine to accompany the code.
- The code can then be run on any machine with the VM installed.
What are the different formats of target code?
There are three different formats of target code:
- Assembly or other source formats.
- Relocatable binary format.
- Absolute binary format.
What does a syntax-directed compiler consist of?
- Scanner
- Parser
- Type Checker
- Translator
- Optimizer
- Code generator
- Symbol tables, which are connected to all of the above.
How is an Abstract Syntax Tree drawn?
What is IR?
An Intermediate representation (IR) is a data structure that is constructed from input data to a program.
Can occour between the Translator and Code generator in a syntax-directed compiler. (not always, some compilers generate target code directly)
When an intermediate representation of the program is created it serves as input to the code generator.
What is tokens?
A token is a structure representing a lexeme that explicitly indicates its categorization for the purpose of parsing.
- Identifiers.
- Integers.
- Reserved words.
- Delimiters.
What is a Scanner?
It is the 1st phase in a syntax-directed compiler.
It reads the source code as a text file and produces a stream of tokens. The tokens are then encoded (as integers) and served onto the parser.
- It makes the program a compact and uniform format (stream of tokens).
- It removes unnecessary information (like comments).
- Can also be called: lexical analyzer, lexer, or tokenizer.
What is a Parser?
It is the 2nd phase in a syntax-directed compiler.
It reads tokens and groups them into phases according to the syntax.
- The parser verifies correct syntax.
- The Parser then builds an AST.
What is a Type checker?
It is the 3rd phase in a syntax-directed compiler.
- It checks for static semantics for each AST node.
- Meaning it checks that all identifiers are declared, that types are correct and so on.
- It adds information to the AST node.
What is a Translator? (Program synthesis)
It is the 4th phase in a syntax-directed compiler.
Translates all the AST nodes that has been evaluated correct from the type checker into IR code.
What is an Optimizer?
It is the 5th phase in a syntax-directed compiler.
It analyzes and transforms the IR code from the Translator into improved IR code.
This process can be complex and take a long time. Often involving numerous subphases.
Some compilers have the option to turn off the optimization phase to speed up compilation time.
What is a Code generator?
It is the 5th or 6th phase in a syntax-directed compiler. (depending on the compiler having an Optimizer)
- Assumes that program has been thoroughly checked and is well formed (scope & type rules)
- It requires detailed information about the target machine and its machine specific optimization such as register allocation and code scheduling.
- It maps the IR code to the target machine code.
So it is often coded by hand and not generated.
What is Symbol Tables?
(Sometimes called Identification tables)
It is an information table that can be accessed by all phases in a syntax-directed compiler.
- Allows information to be associated with identifiers which can be shared among different compile phases.
Each time an identifier/variable is declared or used, the symbol table provides information collected about it.
What is a compiler writing tool?
It is often packaged as a compiler compiler. (SableCC, antlr)
- Includes scanner and parser generators.
- Some also include symbol table managers, and code-generation tools.
How is a Context-free grammar written?
What are the terminal and nonterminal symbols?
They are the lexical elements used in specifying the production rules constituting a formal grammar.
What is a terminal symbol?
A grammar symbol that cannot be rewritten.
(By convention written in lowercase)
(On the picture it’s the id, assign and $ symbols.)
What is a nonterminal symbol?
Nonterminal symbols are those symbols which can be replaced.
(By convention written in uppercase)
(On the picture it’s the Val and Expr symbols)
How can production rules define a grammar?
They specify which symbols may replace which other symbols.
The rule z0 → z1 specifies that z0 can be replaced by z1.
What is a the CFG Start Symbol?
A special nonterminal. It’s usually the symbol on the left-hand side of the grammars first rule.
(On the picture the start symbol is “Prog”)
What is a regular expression?
A regular expression is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching,
“find and replace”-like operations.
Written like this: ab*(c|ε)
How is the formal specification of a language’s tokens typically accomplished?
(token specification)
By associating a regular expression with each token.
What is a Recursive decent parser?
One of the simplest parsing techniques used in practical compilers.
This is a top-down process in which the parser attempts to verify that the syntax of the input stream is correct as it is read from left to right.
What is a parse tree?
A Parse tree (sometimes called a derivation tree) visualizes how a string is structured by a grammar.
It has the following characteristics:
- The root is the grammar’s start symbol S.
- Each node is either a grammar symbol or λ.
- Its interior nodes are nonterminals.
What is Symbol Table construction?
A semantic processing activity that traverses the AST to record all identifiers and their types in a symbol table.
Symbol table:
In the Type-Checking phase, in which order does it check the types?
It walks the AST bottom up from its leaves towards the root.
What is a Tombstone diagram
Tombstone diagrams consist of a set of “puzzle pieces” representing compilers and other related language processing programs.
They are used to illustrate transformations from a source language to a target language realised in an implementation language.
Tombstone diagram: What does the diagram describe?
A program P implemented in the language L.
Tombstone diagram: What does the diagram describe?
A Translator/Compiler implemented in the language L, that can translate the language S into the language T.
Tombstone diagram: What does the diagram describe?
A machine M implemented in hardware.
Tombstone diagram: What does the diagram describe?
A language interpreter written in L that interprets the language M.
When would you want to compile a compiler?
When you have created a compiler you want to run on a machine that cannot run on the code it’s currently written in. (see image for an example)
What is Bootstrapping?
A way to compile a compiler written in it’s own language.
The first compiler has to be compiled by a compiler in another language.
What is a recognizer?
A recognition device that reads input strings over the alphabet of the language and decides whether the input strings belong to that language.
What is Backus-Nair Form (BNF)?
BNF is a notation techniques for context-free grammars.
What is Extended Backus-Nair Form (EBNF)?
EBNF is an adds extra features to BNF:
- ? which means that the symbol (or group of symbols in parenthesis) to the left of the operator is optional (it can appear zero or one times)
- * which means that something can be repeated any number of times (and possibly be skipped altogether)
- which means that something can appear one or more times
When is a grammar ambiguous?
When it generates a sentential form that has two or more distinct parse trees.
What is a Throws analysis visitor?
Throws Visitor is a specialized visitor used to collect information about throws that may ”escape” from a given construct.
These visitors compute the throwsSet field, which records exceptions that may be thrown.