Section Seven: Data structures Flashcards
Chapter 33 – Arrays, tuples and records
Data structures
Computer languages such as Python, Pascal and VB have built-in elementary data types such as integer, real, Boolean and char. They also have some built-in structured data types such as string, array and record. These are made up of a number of elements of a specified type such as integer, real or string.
Chapter 33 – Arrays, tuples and records
1-dimensional arrays
An array is defined as a finite, ordered set of elements of the same type, such as integer, real or char. Finite means that there is a specific number of elements in the array. Ordered implies that there is a first, second, third etc. element of the array.
For example, (assuming the first element of the array is myArray[0]:
myArray = [51, 72, 35, 37, 0, 3]
x = myArray[2] #assigns 35 to x
Chapter 33 – Arrays, tuples and records
2-dimensional arrays
An array can have two or more dimensions. A two-dimensional array can be visualised as a table, rather like a spreadsheet.
Chapter 33 – Arrays, tuples and records
Arrays of three dimensions
Arrays may have more than two dimensions. An n-dimensional array is a set of elements of the same type, indexed by n integers. In a 3-dimensional array x, a particular element may be referred to as x[4,5,2], for example. The first element would be referred to as x[0,0,0].
Chapter 33 – Arrays, tuples and records
Tuples
A tuple is an ordered set of values, which could be elements of any type such as strings, integers or real numbers, or even graphic images, sound files or arrays. Unlike arrays, the elements do not all have to be of the same type. However, a tuple, like a string, is immutable, which means that its elements cannot be changed, and you cannot dynamically add elements to or delete elements from a tuple.
In Python a tuple is written in parentheses, for example:
pupil = (“John”, 78, “a”)
You can refer to individual elements of a tuple, for example:
name = pupil[0]
but the following staement is invalid:
pupil[0] = “Mary”
Chapter 33 – Arrays, tuples and records
Records
If you want to store data permanently so that you can read or update it at a future date, the data needs to be stored in a file on disk. The most common way of storing large amounts of data conveniently is to use a database, but sometimes you need to create and interrogate your own files.
Generally, a file consists of a number of records. A record contains a number of fields, each holding one item of data. For example, in a file holding data about students, you might have the following record structure:
ID Firstname Surname DateOfBirth Class
1453 Gemma Baines 01/05/2004 2G
1768 Paul Gerrard 17/11/2003 2G
2016 Brian Davidson 03/08/2002 3H
Chapter 34 – Queues
Abstract data types
An abstract data type is one that is created by the programmer, rather than defined within the programming language. They include structures such as queues, stacks, trees and graphs. These can easily be shown in graphical form, and it is not hard to understand how to perform operations such as adding, deleting or counting elements in each structure. However, programming languages require data types to represent them.
An abstract data type (ADT) is a logical description of how the data is viewed and the operations that can be performed on it, but how this is to be done is not necessarily known to the user. It is up to the programmer who creates the data structure to decide how to implement it, and it may be built in to the programming language. This is a good example of data abstraction, and by providing this level of abstraction we are creating an encapsulation around the data, hiding the details
of implementation from the user.
As a programmer, you will be quite familiar with this concept. When you call a built-in function such as random to generate a random number, or sqrt to find the square root of a number, you are not at all concerned with how these functions are implemented.
Chapter 34 – Queues
Queues
A queue is a First In First Out (FIFO) data structure. New elements may only be added to the end of a queue, and elements may only be retrieved from the front of a queue. The sequence of data items in a queue is determined, therefore, by the order in which they are inserted. The size of the queue depends on the number of items in it, just like a queue at traffic lights or at a supermarket checkout.
Queues are used in a variety of applications:
- Output waiting to be printed is commonly stored in a queue on disk. In a room full of networked computers, several people may send work to be printed at more or less the same time. By putting the output into a queue on disk, the output is printed on a first come, first served basis as soon as the printer is free.
- Characters typed at a keyboard are held in a queue in a keyboard buffer.
- Queues are useful in simulation problems. A simulation program is one which attempts to model a real-life situation so as to learn something about it. An example is a program that simulates customers arriving at random times at the check-outs in a supermarket store, and taking random times to pass through the checkout. With the aid of a simulation program, the optimum number of check-out counters can be established.
Chapter 34 – Queues
Operations on a queue
The abstract data type queue is defined by its structure and the operations which can be performed on it. It is described as an ordered collection of items which are added at the rear of the queue, and removed from the front.
The following queue operations are needed:
- enQueue(item) Add a new item to the rear of the queue
- deQueue() Remove the front item from the queue and return it
- isEmpty() Test to see whether the queue is empty
- isFull() Test to see whether queue is full
Chapter 34 – Queues
Dynamic vs static data structures
A dynamic data structure refers to a collection of data in memory that has the ability to grow or shrink in size. It does this with the aid of the heap, which is a portion of memory from which space is automatically allocated or de-allocated as required.
Languages such as Python, Java and C support dynamic data structures, such as the built-in list data type in Python. A potential drawback of using a dynamic data structure is that the data structure may cause overflow if it exceeds the maximum memory limit.
Dynamic data structures are very useful for implementing data structures such as queues when the maximum size of the data structure is not known in advance. The queue can be given some arbitrary maximum to avoid causing memory overflow, but it is not necessary to allocate space in advance. A
further advantage of using a built-in dynamic data structure such as a list is that many methods or functions such as append, remove, length, insert, search and pop may already be written and can be used in the implementation of other data structures such as a queue or stack.
A static data structure such as an array is fixed in size, and cannot increase in size or free up memory while the program is running. An array is suitable for storing a fixed number of items such as the months of the year, monthly sales or average monthly temperatures. The disadvantage of using an array to implement a dynamic data structure such as a queue is that the size of the array has to be decided in advance by the programmer, and if the number of items added fills up the array, then no more can be added, regardless of how much free space there is in memory. Python does not have a built-in array data structure.
Chapter 34 – Queues
Implementing a linear queue
There are basically two ways to implement a linear queue in an array or list:
- As items leave the queue, all of the other items move up one space so that the front of the queue is always the first element of the structure, e.g. q[0]. With a long queue, this may require significant processing time.
- A linear queue can be implemented as an array with pointers to the front and rear of the queue. An integer holding the size of the array (the maximum size of the queue) is needed, as well as a variable giving the number of items currently in the queue. However, clearly a problem will arise as many items are added to and deleted from the queue, as space is created at the front of the queue which cannot be filled, and items are added until the rear pointer points to the last element of the data structure.
Chapter 34 – Queues
A circular queue
One way of overcoming the limitation of a static data structure such as an array is to implement the queue as a circular queue, so that when the array fills up and the rear pointer points to the last element of the array, say q[5], it will be made to point to the first element, q[0], when the next person joins the queue, assuming this element is empty. This solution requires some extra effort on the part of the programmer, and is less flexible than a dynamic data structure if the maximum number of items is not known in advance.
Chapter 34 – Queues
Priority queues
In some situations where items are placed in a queue, a system of priorities is used. For example an operating system might schedule jobs in order of priority, or a printer may give shorter print jobs priority over longer ones.
A priority queue acts like a queue in that items are dequeued by removing them from the front of the queue. However, the logical order of items within the queue is determined by their priority, with the highest priority items at the front of the queue and the lowest priority items at the back. It is therefore possible that a new item joins the queue at the front, rather than at the rear.
Such a queue could be implemented by checking the priority of each item in the queue, starting at the rear and moving it along one place until an item with the same or lower priority is found, at which point the new item can be inserted.
Chapter 35 – Lists and linked lists
Definition of a list
In computer science, a list is an abstract data type consisting of a number of items in which the same item may occur more than once. The list is sequenced so can refer to the first, second, third, item and we can also refer to the last element of the list.
A list is a very useful data type for a wide variety of operations, and can be used, for example, to implement other data structures such as a queue, stack or tree.
Chapter 35 – Lists and linked lists
Operations on lists
List operation, Description, Example, list contents, Return
value
isEmpty(), Test for empty list, a.isEmpty(), [45, 13, 19, 13, 8], False
append(item), Add a new item to list to the end of the list, a.append(33), [45, 13, 19, 13, 8, 33]
remove(item), Remove the first occurrence of an item from list, a.remove(13), [45, 19, 13, 8, 33]
search(item), Search for an item in list, a.search(22), [45, 19, 13, 8, 33], False
length(), Return the number of items, a.length(), [45, 19, 13, 8, 33], 5
index(item), Return the position of item, a.index(8), [45, 19, 13, 8, 33], 3
insert(pos,item), Insert a new item at position pos a.insert(2,7), [45, 19, 7, 13, 8, 33]
pop() Remove and return the last item in the list, a.pop(), [45, 19, 7, 13, 8], 33
pop(pos), Remove and return the item at position, pos a.pop(1), [45, 7, 13, 8], 18
Chapter 35 – Lists and linked lists
Using an array
It is possible to maintain an ordered collection of data items using an array, which is a static data structure. This may be an option if the programming language does not support the list data type and if the maximum number of data items is small, and is known in advance.
The programmer then has to work out and code algorithms for each list operation. The empty array must be declared in advance as being a particular length, and this could be used, for example, to hold a priority queue.
Chapter 35 – Lists and linked lists
Inserting a new name in the list
If the list needs to be held in sequential order in the array, the algorithm could first determine where a new item has to be added, and then if necessary, start at the end of the list and move the rest of the items along in order to make room for it.
Chapter 35 – Lists and linked lists
Linked lists
A linked list is a dynamic data structure used to hold a sequence, as described below:
- The items which form the sequence are not necessarily held in contiguous data locations, or in the order in which they occur in the sequence
- Each item in the list is called a node and contains a data field and a next address field called a link or pointer field (the data field may consist of several subfields.)
- The data field holds the actual data associated with the list item, and the pointer field contains the address of the next item in the sequence
- The link field in the last item indicates that there are no further items by the use of a null pointer
- Associated with the list is a pointer variable which points to (i.e. contains the address of) the first node in the list
Chapter 36 – Stacks
Concept of a stack
A stack is a Last In, First Out (LIFO) data structure. This means that, like a stack of plates in a cafeteria, items are added to the top and removed from the top.