module 2 Flashcards
data type
defines a collection of data objects and a set of predefined
operations on those objects
descriptor
collection of the attributes of a variable
object
an instance of a user-defined (abstract data) type
One design issue for all data types
What operations are defined and how are
they specified?
primitive data types
- Almost all programming languages provide a set of primitive data types
- not defined in terms of other data types
- Some primitive data types are merely reflections of the hardware
- Others require only a little non-hardware support for their implementation
primitive data type examples
Integer
Floating Point
Complex
Decimal
Boolean
Character
Value
sequences of characters
design issues of character strings
Is it a primitive type or just a special kind of array?
Should the length of strings be static or dynamic?
typical string operations
§Assignment and copying
§Comparison (=, >, etc.)
§Catenation
§Substring reference
§Pattern matching
ordinal type
one in which the range of possible values can be easily
associated with the set of positive integers
examples of ordinal types in java
§integer
§char
§boolean
Enumeration Types
All possible values, which are named constants, are provided in the definition
design issues of enumeration
§Is an enumeration constant allowed to appear in more than one type definition, and if
so, how is the type of an occurrence of that constant checked?
§Are enumeration values coerced to integer?
§Any other type coerced to an enumeration type?
array design issues
- What types are legal for subscripts?
- Are subscripting expressions in element references range checked?
- When are subscript ranges bound?
- When does allocation take place?
- Are ragged or rectangular multidimensional arrays allowed, or both?
- What is the maximum number of subscripts?
- Can array objects be initialized?
- Are any kind of slices supported?
4 categories of arrays
- Static: subscript ranges are statically bound and storage allocation is static
(before run-time):
§Advantage: efficiency (no dynamic allocation) - Fixed stack-dynamic: subscript ranges are statically bound, but the allocation is
done at declaration time:
§Advantage: space efficiency - Fixed heap-dynamic: similar to fixed stack-dynamic: storage binding is dynamic
but fixed after allocation (i.e., binding is done when requested and storage is
allocated from heap, not stack) - Heap-dynamic: binding of subscript ranges and storage allocation is dynamic
and can change any number of times:
§Advantage: flexibility (arrays can grow or shrink during program execution)
rectangular array
multi-dimensioned array in which all of the rows have
the same number of elements and all columns have the same number of
elements
F# and C# support rectangular arrays and jagged arrays
jagged matrix
has rows with varying number of elements:
§Possible when multi-dimensioned arrays actually appear as arrays of arrays
C, C++, and Java support jagged arrays
slice
some substructure of an array; nothing more than a referencing
mechanism
Slices are only useful in languages that have array operations
Access function
maps subscript expressions to an address in the array
for single-dimensioned arrays:
address(list[k]) =
address (list[lower_bound]) + ((k-lower_bound) * element_size)
record
a possibly heterogeneous aggregate of data elements in which the
individual elements are identified by names
design issues for records
§What is the syntactic form of references to the field?
§Are elliptical references allowed
tuple
data type that is similar to a record, except that the elements are not
named
Used in Python, ML, and F# to allow functions to return multiple values
Python:
§Closely related to its lists, but immutable
Lists
in Lisp and Scheme are delimited by parentheses and use no commas
union
a type whose variables are allowed to store different type values at
different times during execution
design issue for unions
Should type checking be required?
Discriminated vs. Free Unions
free union: C and C++ provide union constructs in which there is no language support for
type checking
discriminated: each union include a type indicator called a
discriminant (Supported by ML, Haskell, and F#)
pointer type variable
a range of values that consists of memory addresses
and a special value
* Provide the power of indirect addressing
* Provide a way to manage dynamic memory
* A pointer can be used to access a location in the area where storage is
dynamically created (usually called a heap)
Design Issues of Pointers
- What are the scope of and lifetime of a pointer variable?
- What is the lifetime of a heap-dynamic variable?
- Are pointers restricted as to the type of value to which they can point?
- Are pointers used for dynamic storage management, indirect addressing, or
both? - Should the language support pointer types, reference types, or both?
Problems with Pointers
- Dangling pointers (dangerous):
§A pointer points to a heap-dynamic variable that has been deallocated - Lost heap-dynamic variable:
§An allocated heap-dynamic variable that is no longer accessible to the user program
(often called garbage):
o Pointer p1 is set to point to a newly created heap-dynamic variable
o Pointer p1 is later set to point to another newly created heap-dynamic variable
o The process of losing heap-dynamic variables is called memory leakage
Reference Types
used
primarily for formal parameters:
§Advantages of both pass-by-reference and pass-by-value
- Java extends C++’s reference variables and allows them to replace pointers
entirely:
§References are references to objects, rather than being addresses - C# includes both the references of Java and the pointers of C++
Type checking
the activity of ensuring that the operands of an operator are of
compatible types
compatible type
one that is either legal for the operator, or is allowed under
language rules to be implicitly converted, by compiler-generated code, to a legal
type: this automatic conversion is called a coercion
Type error
the application of an operator to an operand of an inappropriate type
If all type bindings are static…
nearly all type checking can be static
If type bindings are dynamic…
type checking must be dynamic
what makes a programming language strongly typed?
type errors are always detected
Advantage of strong typing
allows the detection of the misuses of variables that
result in type errors
examples of strong typing
- Language examples:
§C and C++ are not: parameter type checking can be avoided; unions are not type
checked
§Java and C# are, almost: because of explicit type casting
§ML and F# are - Coercion rules strongly affect strong typing: they can weaken it considerably
(C++ versus ML and F#) - Although Java has just half the assignment coercions of C++, its strong typing is
still far less effective than that of Ada
Name type equivalence
means the two variables have equivalent types if they
are in either the same declaration or in declarations that use the same type
name
Easy to implement but highly restrictive:
§Subranges of integer types are not equivalent with integer types
§Formal parameters must be the same type as their corresponding actual parameters
Structure Type Equivalence
means that two variables have equivalent types if
their types have identical structures
More flexible, but harder to implement
what are the problems with structured type equivalence?
Consider the problem of two structured types:
§Are two record types equivalent if they are structurally the same but use different
field names?
§Are two array types equivalent if they are the same except that the subscripts are
different?
e.g., [1..10] and [0..9]
§Are two enumeration types equivalent if their components are spelled differently?
§With structural type equivalence, you cannot differentiate between types of the same
structure:e.g., different units of speed, both float
expressions
the fundamental means of specifying computations in a
programming language
To understand expression evaluation, need to be familiar with the orders of
operator and operand evaluation
dominant role of assignment statements
Essence of imperative languages
one of the motivations for the development of the first
programming languages
Arithmetic evaluation
Arithmetic expressions
consist of operators, operands, parentheses, and function calls
In most languages, binary operators are
infix, except in Scheme and LISP, in which
they are prefix; Perl also has some prefix binary operators
Most unary operators are
prefix, but the ++ and –- operators in C-based languages
can be either prefix or postfix
Typical precedence levels (operators)
§parentheses
§unary operators
§** (if the language supports it)
§*, /
§+, -
Operator Precedence Rules
define the order in
which “adjacent” operators of different precedence levels are evaluated in expressions
Operator Associativity Rule
define the order in
which adjacent operators with the same precedence level are evaluated
Typical associativity rules:
Left to right, except **, which is right to left
Sometimes unary operators associate right to left (e.g., in FORTRAN)
Precedence and associativity rules can be overriden with
parentheses
Operand evaluation order
§Variables: fetch the value from memory
§Constants: sometimes a fetch from memory; sometimes the constant is in the machine language instruction
§Parenthesized expressions: evaluate all operands and operators first
§The most interesting case is when an operand is a function call
Two possible solutions to the problem of functional side effects
§Write the language definition to disallow functional side effects:
o No two-way parameters in functions
o No non-local references in functions
o Advantage: it works!
o Disadvantage: inflexibility of one-way parameters and lack of non-local references
§Write the language definition to demand that operand evaluation order be fixed:
o Disadvantage: limits some compiler optimizations
o Java requires that operands appear to be evaluated in left-to-right order
referential transparency
if any two expressions in
the program that have the same value can be substituted for one another anywhere in the program, without affecting the action of the program
Advantage of referential transparency
Semantics of a program is much easier to understand if it has referential
transparency
operator overloading
Use of an operator for more than one purpose
Some are common (e.g., + for int and float)
Some are potential trouble (e.g., * in C and C++):
§Loss of compiler error detection (omission of an operand should be a detectable
error)
§Some loss of readability
narrowing conversion
converts an object to a type that cannot
include all of the values of the original type e.g., float to int
widening conversion
an object is converted to a type that can
include at least approximations to all of the values of the original type, e.g., int
to float
Relational Expressions
- Use relational operators and operands of various types
- Evaluate to some Boolean representation
- Operator symbols used vary somewhat among languages (!=, /=, ~=, .NE.,
<>, #)
Boolean Expressions
Operands are Boolean and the result is Boolean
Short Circuit Evaluation
An expression in which the result is determined without evaluating all of the operands and/or operators
Example: (13 * a) * (b / 13 – 1)
§If a is zero, there is no need to evaluate (b /13 - 1)
types of assignments
- Functional language: expressions are the building blocks:
§If computation is expression evaluation that depends only on the referencing
environment for that evaluation
§Expressions in a purely functional language are referentially transparent: value
depends only on the referencing environment - Imperative language: computation usually ordered series of changes to the
values of variables in memory:
§Assignments provide the principal means for these changes
side effect of assignments
a programming construct influences subsequent computation in any
way other than by returning a value for use in the surrounding context:
§Expressions: always produce value and may have side effects
§Statements: executed solely for the side effects
§Imperative programming: computing by means of side effects
Value model of variables
an expression is either an l-value or an r-value, based on the context in which it appears
how are the semantics of an assignment different across languages?
Context based: a variable may refer to the value of the variable (r-value) or its location (l-value) – a named container for a value
Reference model of variables
a variable is a named reference for a value –
every variable is an l-value:
Assignment Statements
- The general syntax:
<target_var> <assign_operator> <expression>
The assignment operator:
= Fortran, BASIC, the C-based languages
\:= Ada
* = can be bad when it is overloaded for the relational operator for equality:
§That’s why the C-based languages use == as the relational operator
</expression></assign_operator></target_var>
Combination Assignment Operators
Imperative languages frequently update variables and can use statements like a
= a + 1; that result in redundant address calculations:
Unary Assignment Operators
Unary assignment operators in C-based languages combine increment and
decrement operations with assignment
sum = ++count (count incremented, then assigned to sum)
sum = count++ (count assigned to sum, then incremented)
Mixed-Mode Assignment
- In Fortran, C, Perl, and C++, any numeric type value can be assigned to any
numeric type variable - In Java and C#, only widening assignment coercions are done
- In Ada, there is no assignment coercion
control structure
a control statement and the statements whose execution it
controls
design question for control structures
Should a control structure have multiple entries?
selection statement
provides the means of choosing between two or more
paths of execution
2 types of selection statements
§Two-way selectors
§Multiple-way selectors
Two-Way Selection Statements
if control_expression
then clause
else clause
design questions for 2 way selection statements
§What is the form and type of the control expression?
§How are the then and else clauses specified?
§How should the meaning of nested selectors be specified?
Multiple-Way Selection Statements
Allow the selection of one of any number of statements or statement groups
design issues for multiple-way selection statements
§What is the form and type of the control expression?
§How are the selectable segments specified?
§Is execution flow through the structure restricted to include just a single selectable
segment?
§How are case values specified?
§What is done about unrepresented expression values?
approaches to implementing multiple selectors
§Multiple conditional branches
§Store case values in a table and use a linear search of the table
§When there are more than ten cases, a hash table of case values can be used
§If the number of cases is small and more than half of the whole range of case values
are represented, an array whose indices are the case values and whose values are
the case labels can be used
iterative statements
The repeated execution of a statement or compound statement is accomplished
either by iteration or recursion
design issues for iteration control statements
§How is iteration controlled?
§Where is the control mechanism in the loop?
Counter-Controlled Loops
A counting iterative statement has a loop variable, and a means of specifying
the initial and terminal, and stepsize values
design issues of Counter-Controlled Loops
§What are the type and scope of the loop variable?
§Should it be legal for the loop variable or loop parameters to be changed in the loop
body, and if so, does the change affect loop control?
§Should the loop parameters be evaluated only once, or once for every iteration?
§What is the value of the loop variable after loop termination?
Logically-Controlled Loops (& design issues)
Repetition control is based on a Boolean expression
Design issues:
§Pretest or posttest?
§Should the logically controlled loop be a special case of the counting loop statement
or a separate statement?
User-Located Loop Control Mechanisms
- Sometimes it is convenient for the programmers to decide a location for loop
control (other than top or bottom of the loop) - Simple design for single loops (e.g., break)
Design issues for nested loops:
§Should the conditional be part of the exit?
§Should control be transferable out of more than one loop?
what controls loop iteration?
The number of elements in a data structure
Control mechanism for iteration
a call to an iterator function that returns the next element in some chosen order, if there is one; else loop is terminated
Unconditional Branching
- Transfers execution control to a specified place in the program
- Represented one of the most heated debates in 1960’s and 1970’s
- Major concern: Readability
- Some languages do not support goto statement (e.g., Java)
- C# offers goto statement (can be used in switch statements)
Guarded Commands
- Designed by Dijkstra
- Purpose: to support a new programming methodology that supported verification
(correctness) during development - Basis for two linguistic mechanisms for concurrent programming (in CSP)
- Basic Idea: if the order of evaluation is not important, the program should not
specify one
Selection Guarded Command
Form:
if <Boolean> -> <statement>
[] <Boolean> -> <statement>
...
[] <Boolean> -> <statement>
fi</statement></Boolean></statement></Boolean></statement></Boolean>
when construct is reached:
§Evaluate all Boolean expressions
§If more than one are true, choose one non-deterministically
§If none are true, it is a runtime error
Loop Guarded Command
do <Boolean> -> <statement>
[] <Boolean> -> <statement>
...
[] <Boolean> -> <statement>
od</statement></Boolean></statement></Boolean></statement></Boolean>
for each iteration:
§Evaluate all Boolean expressions
§If more than one are true, choose one non-deterministically; then start loop again
§If none are true, exit loop
Two fundamental abstraction facilities
Process abstraction:
Data abstraction:
subprogram definition
describes the interface to and the actions of the
subprogram abstraction
subprogram call
an explicit request that the subprogram be executed
subprogram header
the first part of the definition, including the name, the
kind of subprogram, and the formal parameters
parameter profile (aka signature) of a subprogram
the number, order, and
types of its parameters
protocol
a subprogram’s parameter profile and, if it is a function, its return
type
prototypes
Function declarations in C and C++
A subprogram declaration provides
the protocol, but not the body, of the
subprogram
A formal parameter
a dummy variable listed in the subprogram header and
used in the subprogram
An actual parameter
represents a value or address used in the subprogram call
statement
Actual/Formal Parameter Correspondence
- Positional:
§The binding of actual parameters to formal parameters is by position: the first actual
parameter is bound to the first formal parameter and so forth
§Safe and effective - Keyword:
§The name of the formal parameter to which an actual parameter is to be bound is
specified with the actual parameter
§Advantage: parameters can appear in any order, thereby avoiding parameter
correspondence errors
§Disadvantage: user must know the formal parameter’s names
two categories of subprograms
§Procedures are collection of statements that define parameterized computations
§Functions structurally resemble procedures but are semantically modeled on
mathematical functions:
o They are expected to produce no side effects
o In practice, program functions have side effects
Design Issues for Subprograms
- Are local variables static or dynamic?
- Can subprogram definitions appear in other subprogram definitions?
- What parameter passing methods are provided?
- Are parameter types checked?
- If subprograms can be passed as parameters and subprograms can be nested,
what is the referencing environment of a passed subprogram? - Are functional side effects allowed?
- What types of values can be returned from functions?
- How many values can be returned from functions?
- Can subprograms be overloaded?
- Can subprogram be generic?
- If the language allows nested subprograms, are closures supported?
advantages/disadvantages of stack-dynamic local variables
§Advantages:
o Support for recursion
o Storage for locals is shared among some subprograms
§Disadvantages:
o Allocation/de-allocation, initialization time
o Indirect addressing
o Subprograms cannot be history sensitive
advantages/disadvantages of static local variables
Advantages and disadvantages are the opposite of those for stack-dynamic local
variables
draw the semantic models of parameter passing
- in mode
- out mode
- inout mode
Two important considerations for Parameter Passing
§Efficiency
§One-way or two-way data transfer
But the above considerations are in conflict:
§Good programming suggest limited access to variables, which means one-way
whenever possible
§But pass-by-reference is more efficient to pass structures of significant size
Conceptual Models of Transfer
- Physically move a value
- Move an access path to a value
- Approaches:
§Pass-by-Value (In Mode)
§Pass-by-Result (Out Mode)
§Pass-by-Value-Result (inout Mode)
§Pass-by-Reference (Inout Mode)
§Pass-by-Name (Inout Mode)
types of binding for subprograms (3)
- Shallow binding:
§The environment of the call statement that enacts the passed subprogram
§Most natural for dynamic-scoped languages - Deep binding:
§The environment of the definition of the passed subprogram
§Most natural for static-scoped languages - Ad hoc binding:
§The environment of the call statement that passed the subprogram
Calling Subprograms Indirectly happens when
there are several possible subprograms to be called and the
correct one on a particular run of the program is not know until execution (e.g.,
event handling and GUIs)
draw an example of parameter-passing methods with:
* Function header: void sub(int a, int b, int c, int d)
* Function call in main: sub(w, x, y, z)
§Pass w by value
§Pass x by result
§Pass y by value-result
§Pass z by reference
answer is on slide 37
Design Issues for Functions
- Are side effects allowed?
§Parameters should always be in-mode to reduce side effect (like Ada) - What types of return values are allowed?
§Most imperative languages restrict the return types
§C allows any type except arrays and functions
§C++ is like C but also allows user-defined types
§Java and C# methods can return any type (but because methods are not types, they
cannot be returned)
§Python and Ruby treat methods as first-class objects, so they can be returned, as
well as any other class
Overloaded Subprograms
overloaded subprogram is one that has the same name as another
subprogram in the same referencing environment:
§Every version of an overloaded subprogram has a unique protocol
- C++, Java, C#, and Ada include predefined overloaded subprograms
- In Ada, the return type of an overloaded function can be used to disambiguate
calls (thus two overloaded functions can have the same parameters) - Ada, Java, C++, and C# allow users to write multiple versions of subprograms
with the same name
Generic Subprograms
A generic or polymorphic subprogram takes parameters of different types on
different activations
* Overloaded subprograms provide ad hoc polymorphism
Subtype polymorphism
a variable of type T can access any object of
type T or any type derived from T (OOP languages)
parametric polymorphism
A cheap compile-time substitute for dynamic binding
A subprogram that takes a generic parameter that is used in a type expression
that describes the type of the parameters of the subprogram
closure
a subprogram and the referencing environment where it was
defined
§The referencing environment is needed if the subprogram can be called from any
arbitrary place in the program
when are closures needed?
only needed if a subprogram can access variables in nesting scopes
and it can be called from anywhere
A static-scoped language that does not permit nested subprograms doesn’t need
closures
coroutine
a subprogram that has multiple entries and controls them itself –
supported directly in Lua
Also called symmetric control: caller and called coroutines are on a more equal
basis
- Coroutines repeatedly resume each other, possibly forever
- Coroutines provide quasi-concurrent execution of program units (the
coroutines); their execution is interleaved, but not overlapped
resume
A coroutine call
The first resume of a coroutine is to its beginning, but subsequent calls enter at
the point just after the last executed statement in the coroutine
General semantics of calls to a subprogram:
§Parameter passing methods
§Stack-dynamic allocation of local variables
§Save the execution status of calling program
§Transfer of control and arrange for the return
§If subprogram nesting is supported, access to nonlocal variables must be arranged
subprogram linkage
The subprogram call and return operations of a language
Call Semantics:
§Save the execution status of the caller
§Pass the parameters
§Pass the return address to the called
§Transfer control to the called
Return Semantics
§If pass-by-value-result or out mode parameters are used, move the current values of
those parameters to their corresponding actual parameters
§If it is a function, move the functional value to a place the caller can get it
§Restore the execution status of the caller
§Transfer control back to the caller
Required storage:
Status information, parameters, return address, return value for functions,
temporaries
Environment Pointer (EP)
must be maintained by the run-time system. It
always points at the base of the activation record instance of the currently
executing program unit
when are activation record instances created?
when a subprogram is
called
where do activation record instances reside?
on the run-time stack
where does the dynamic link point
to the top of an instance of the activation record of the
caller
The activation record format is static, but its size may be
dynamic
what does an activation record look like?
(stack top)
Local variables
parameters
dynamic link
return address
dynamic chain, or call chain
collection of dynamic links in the stack at a given time
local_offset
can be determined by the compiler at compile
time
Local variables can be accessed by their offset from the beginning of the
activation record, whose address is in the EP.
Nested Subprograms
Some non-C-based static-scoped languages (e.g., Fortran 95+, Ada, Python,
JavaScript, Ruby, and Swift) use stack-dynamic local variables and allow
subprograms to be nested
The process of locating a non-local reference:
§Find the correct activation record instance
§Determine the correct offset within that activation record instance
static chain
a chain of static links that connects certain activation record
instances
* The static link in an activation record instance for subprogram A points to one of
the activation record instances of A’s static parent
* The static chain from an activation record instance connects it to all of its static
ancestors
* static_depth is an integer associated with a static scope whose value is the
depth of nesting of that scope
Blocks
user-specified local scopes for variables
2 ways to implement dynamic scoping
- Deep Access: non-local references are found by searching the activation record
instances on the dynamic chain:
§Length of the chain cannot be statically determined
§Every activation record instance must have variable names - Shallow Access: put locals in a central place:
§One stack for each variable name
§Central table with an entry for each variable name
Referential transparency means that:
The execution of a function always produces the same result, given the same parameters.
Iteration is a principal mechanism in
Imperative programming languages
A cactus stack is used to provide support for:
Declaring more than one coroutine in the same nonglobal scope.
A closure includes only a referencing environment.
FALSE
The subroutine is a principal mechanism for process abstraction
TRUE
Loop categories do not include:
Type-controlled.
Pointers are just addresses of memory locations
FALSE
The goto statement is often used in modern programming languages.
FALSE
A programming language must provide a special syntax for recursive functions.
FALSE
The size of a union is the sum of the sizes of its members.
FALSE
Unlike expressions, statements always have the corresponding value.
FALSE
A slice is nothing more than a referencing mechanism.
TRUE
The earliest fundamental abstraction facility is:
Process abstraction.
Subprograms can be called indirectly, i.e., there are pointers to functions.
TRUE
Two different subprograms cannot have the same name.
FALSE
A subprogram that has multiple entries and controls them itself is called:
Coroutine
Shallow binding is most natural for dynamic-scoped languages.
TRUE
Procedures are expected to produce no side effects.
FALSE
Static scoping is implemented using:
Static chain