(4.5) Fundamentals of data representation Flashcards
What are natural numbers? (ℕ)
Non-negative integers
(the positive whole numbers and 0)
number set ℕ = {0, 1, 2, 3, … }
What are integers? (ℤ)
Whole numbers (positive, negative and zero)
number set ℤ = { …, -3, -2, -1, 0, 1, 2, 3, … }
What are rational numbers? (ℚ)
Numbers that can be written as fractions (ratios of integers), including integers (ex 7; 7/1)
What are irrational numbers?
Numbers that cannot be written as a fraction (ex √2, √3, √5, √7, √11, √13, √17, √19)
What are real numbers? ( ℝ)
possible real world quantities
includes natural, rational and irrational numbers
numbers that are not real include imaginary numbers (ex i, the square root of -1)
What are ordinal numbers?
Ordinal numbers are used to tell an object's numerical position in a list (ex 1st, 2nd, 3rd, etc)
What type of numbers are used for:
-counting
-measurements
Counting - natural numbers
Measurements - real numbers
Why is hexadecimal (base 16) used as shorthand for binary (base 2)?
- large numbers can be represented using fewer digits
- easier to understand and remember
(colour values and MAC addresses are often represented in hex)
How do you work out how many values can be represented with n bits?
2^n
ex if n=3, 2^3 = 8
(000 001 010 011 100 101 110 111) = 8 patterns
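A minimal Python sketch of the 2^n rule: it lists every pattern for a given number of bits so the count can be checked. The function name `bit_patterns` is made up for this illustration.

```python
from itertools import product

def bit_patterns(n):
    """Return every n-bit pattern as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

patterns = bit_patterns(3)
print(len(patterns), patterns)  # the length equals 2**3
```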
How can quantities of bytes be described using binary prefixes representing powers of 2?
(ex 1 KiB = 2^10 bytes)
- kibi, Ki - 2^10
- mebi, Mi - 2^20
- gibi, Gi - 2^30
- tebi, Ti - 2^40
How can quantities of bytes be described using decimal prefixes representing powers of 10?
(ex 1 kB = 10^3 bytes)
- kilo, k - 10^3
- mega, M - 10^6
- giga, G - 10^9
- tera, T - 10^12
Historically inaccurate use of units
Historically the terms kilobyte, megabyte, etc
have often been used when kibibyte, mebibyte,
etc are meant
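The gap between the two families of prefixes can be checked with a short Python sketch. The table and function names here are invented for the example.

```python
# Multipliers taken from the two prefix lists above.
BINARY_PREFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_PREFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

def to_bytes(quantity, prefix):
    """Convert a quantity with a binary or decimal prefix into bytes."""
    table = {**BINARY_PREFIXES, **DECIMAL_PREFIXES}
    return quantity * table[prefix]

# How many bytes larger a gibibyte is than a gigabyte.
print(to_bytes(1, "Gi") - to_bytes(1, "G"))
```

The difference grows with each prefix, which is why mixing up the two naming schemes matters more for large capacities.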
In unsigned binary, what are the minimum and maximum values for a given number of bits (n) ?
0 and (2^n)-1
ex bits = 4
Minimum value = 0
Maximum value = (2^4)-1 = 15
Adding two (unsigned) binary integers
010010
100100
110110
0 + 0 = 0
1 + 0 = 1
1 + 1 = 10 (0 carry the 1)
1 + 1 + 1 = 11 (1 carry the 1)
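The column-by-column method can be sketched in Python as below. This is only an illustration of the carry rules on the card; `add_binary` is a hypothetical helper name.

```python
def add_binary(a, b):
    """Add two unsigned binary strings column by column, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    digits = []
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        digits.append(str(total % 2))  # digit written in this column
        carry = total // 2             # carried into the next column
    if carry:
        digits.append("1")
    return "".join(reversed(digits))

print(add_binary("010010", "100100"))  # 110110
```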
Multiplying two (unsigned) binary integers
   10100
x     10
   00000   (10100 x 0)
  101000   (10100 x 1, shifted one place to the left)
  101000   (sum of the partial products)
0 x 0 = 0
0 x 1 = 0
1 x 1 = 1
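Long multiplication in binary amounts to adding shifted copies of the first number, one for each 1 in the second. A rough Python sketch, with `multiply_binary` as an assumed name:

```python
def multiply_binary(a, b):
    """Multiply two unsigned binary strings by shift-and-add."""
    total = 0
    for shift, bit in enumerate(reversed(b)):
        if bit == "1":
            # Each 1 in b contributes a copy of a shifted left by its position.
            total += int(a, 2) << shift
    return format(total, "b")

print(multiply_binary("10100", "10"))  # 101000
```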
How to represent negative and positive integers in two’s complement (Signed binary)
Positive
-128  64  32  16   8   4   2   1
   0   1   1   1   0   1   0   1
= (64+32+16+4+1) = 117
The most significant bit is always 0
Negative
-128  64  32  16   8   4   2   1
   1   0   0   0   1   0   1   1
= (-128+8+2+1) = -117
The most significant bit (negative) is always 1
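Reading a two's complement pattern means treating the most significant bit as a negative place value. A small Python sketch (the function name is an assumption for this example):

```python
def from_twos_complement(bits):
    """Decode a two's complement bit string into a signed integer."""
    value = int(bits, 2)       # value if every column were positive
    if bits[0] == "1":
        # The top column counts as -2^(n-1) rather than +2^(n-1),
        # so the unsigned reading is too large by 2^n.
        value -= 2 ** len(bits)
    return value

print(from_twos_complement("01110101"))  # 117
print(from_twos_complement("10001011"))  # -117
```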
Converting a positive binary number to its negative in two's complement
Positive
01110110 = 118
Negative
10001010 = -118
Starting from the least significant bit, keep every digit the same up to and including the first 1, then flip every remaining digit (1 to 0, 0 to 1).
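The "keep up to the first 1, then flip the rest" rule can be sketched directly on a bit string. `negate` is a hypothetical name used only here.

```python
def negate(bits):
    """Two's complement negation using the keep-then-flip rule."""
    last_one = bits.rfind("1")   # first 1 when reading from the right
    if last_one == -1:
        return bits              # all zeros: negating 0 gives 0
    flipped = "".join("1" if b == "0" else "0" for b in bits[:last_one])
    return flipped + bits[last_one:]

print(negate("01110110"))  # 10001010
print(negate("00010111"))  # 11101001
```

Applying `negate` twice returns the original pattern, which is a quick way to check a conversion by hand.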
Performing subtraction using two’s complement
165 - 23 = ?
1) Convert the number being subtracted (23) into negative signed binary
00010111 = 23
11101001 = -23
2) Add together the positive binary number (165) and the negative number (-23)
  10100101   (165)
+ 11101001   (-23)
 110001110
3) If the addition produces a carry out of the most significant bit, discard it
so 110001110
= 10001110 = 142 (read as unsigned here, since 8-bit two's complement only reaches 127)
165 + -23 = 142
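The three steps can be sketched in Python with integer bit operations. The 8-bit width is taken from the worked example; the function name and the use of a mask to drop the carry are choices made for this sketch.

```python
def subtract(a, b, bits=8):
    """Compute a - b by adding the two's complement of b, within `bits` bits."""
    mask = (1 << bits) - 1
    neg_b = (~b + 1) & mask   # step 1: flip the bits, add 1, keep `bits` bits
    total = a + neg_b         # step 2: ordinary binary addition
    return total & mask       # step 3: discard any carry out of the top bit

print(format((~23 + 1) & 0xFF, "08b"))  # 11101001
print(subtract(165, 23))                # 142
```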
Minimum and maximum ranges of values that can be represented in signed and unsigned binary
Unsigned
Min = 0
Max = (2^n) -1
ex 3 bit
Min = 000 = 0
Max = 111 = (2^3) -1 = 7
Signed
Min = -2^(n-1)
Max = 2^(n-1) -1
ex 3 bit
Min = 100 = -2^(3-1) = -4
Max = 011 = 2^(3-1)-1 = 3
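A quick Python check of both range formulas; the function names are illustrative.

```python
def unsigned_range(n):
    """(min, max) for n-bit unsigned binary."""
    return 0, 2 ** n - 1

def signed_range(n):
    """(min, max) for n-bit two's complement."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1

print(unsigned_range(3))  # (0, 7)
print(signed_range(3))    # (-4, 3)
```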
Representing numbers with fractional points in binary
Fixed point form binary:
-16   8   4   2   1 . 1/2 1/4 1/8
  0   0   1   1   1 .   1   1   0   = 7.75 (4+2+1+1/2+1/4)
Disadvantages:
-range of numbers that can be stored is limited as some bits are being used for fractional part of the number
-Some numbers cannot be stored accurately (ex 1/3, recurring, etc)
Floating point binary:
-8 4 2 1 . 1/2 1/4 1/8 1/16
-64 32 16 8 4 2 1 . 1/2
(both use 8 bits - but can represent a wider range of magnitudes using the same number of bits, or allow for more relative precision for smaller numbers in the range)
Mantissa (number being stored, always in two’s complement)
Exponent (binary point position, always in two’s complement)
ex -4  2  1
    1  1  0   = -2
= move the binary point 2 places to the left (because the exponent is negative)
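Both forms can be decoded with exact fractions, as in the hedged Python sketch below. The helper names are invented, and the floating point function assumes the binary point starts just after the mantissa's sign bit before the exponent moves it; the card does not fix that convention.

```python
from fractions import Fraction

def twos_complement(bits):
    """Decode a two's complement bit string to an integer."""
    value = int(bits, 2)
    return value - 2 ** len(bits) if bits[0] == "1" else value

def fixed_point_value(bits, fraction_bits):
    """Fixed point: the last `fraction_bits` digits sit after the binary point."""
    return Fraction(twos_complement(bits), 2 ** fraction_bits)

def floating_point_value(mantissa, exponent):
    """Floating point. ASSUMPTION: the point starts just after the sign bit
    of the mantissa, then moves by the (signed) exponent."""
    m = fixed_point_value(mantissa, len(mantissa) - 1)
    return m * Fraction(2) ** twos_complement(exponent)

print(float(fixed_point_value("00111110", 3)))   # 7.75
print(floating_point_value("0110", "110"))       # 3/16
```

In the second call the mantissa 0.110 is 3/4 and the exponent 110 is -2, so the point moves two places left.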
Advantages and disadvantages of fixed point and floating point
Range:
Floating point can represent a wider range of magnitudes than a fixed-point number using the same number of bits
Precision:
Floating point also allows for more relative precision for smaller numbers in the range.
Speed of calculation:
Fixed point can be faster and/or use less hardware than floating point
Why are both fixed point and floating point representation of decimal numbers inaccurate?
There are many numbers that binary cannot represent exactly in a finite number of bits (ex 0.1 or 1/3).
Rounding errors
Both fixed point and floating point binary representation of decimal numbers may be inaccurate.
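A fraction has a finite binary expansion only when its denominator (in lowest terms) is a power of 2. The Python sketch below tests that and shows a well-known rounding error in floats; `exact_in_binary` is a made-up name.

```python
from fractions import Fraction

def exact_in_binary(value):
    """True if the fraction can be written with finitely many binary digits."""
    d = Fraction(value).denominator   # Fraction reduces to lowest terms
    return d & (d - 1) == 0           # true only for powers of 2

print(exact_in_binary(Fraction(3, 4)))    # True
print(exact_in_binary(Fraction(1, 10)))   # False
print(0.1 + 0.2 == 0.3)                   # False, because of rounding
```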
Why was Unicode introduced?
To represent a much wider range of characters than ASCII. It uses more bits per character, so there are more bit patterns available to assign to characters.
However, it takes up more space.
Parity bits (Error checking and correction)
ASCII characters only use 7 bits - the left over bit can be used as a parity bit (usually the most significant bit)
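Even parity: the parity bit is set to 0 or 1 so that the total number of 1s (including the parity bit) is even
Odd parity: the parity bit is set to 0 or 1 so that the total number of 1s (including the parity bit) is odd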
Disadvantages:
-We cannot tell which bit has been corrupted, so the whole byte has to be resent
-If two bits were corrupted, the error wouldn't be detected
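A short Python sketch of parity generation and checking, with the parity bit placed as the most significant bit as the card suggests. The function names are assumptions for this example.

```python
def add_parity_bit(seven_bits, even=True):
    """Prefix a parity bit so the total count of 1s is even (or odd)."""
    ones = seven_bits.count("1")
    parity = ones % 2 if even else (ones + 1) % 2
    return str(parity) + seven_bits

def parity_ok(byte, even=True):
    """Check whether a received byte still has the expected parity."""
    return (byte.count("1") % 2 == 0) == even

sent = add_parity_bit("1000001")   # 7-bit ASCII 'A'
print(sent)                        # 01000001
print(parity_ok("01000011"))       # one bit flipped: False
print(parity_ok("01000111"))       # two bits flipped: True (error missed)
```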
Majority voting (Error checking and correction)
Identifies errors in data by transmitting binary digits multiple times and looking at the pattern received.
010 = 0 111 = 1 110 = 1 etc
If the pattern doesn’t match, majority voting checks which bit occurs most frequently and assumes it is the correct bit.
Advantages:
-Data doesn’t need to be requested again
Disadvantages:
-Requires sending three times as many bits (ex 24 bits to receive one 8-bit character)
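Majority voting with each bit sent three times can be sketched as follows. The names `repeat_encode` and `majority_decode` are illustrative only.

```python
def repeat_encode(bits, copies=3):
    """Send every bit `copies` times."""
    return "".join(b * copies for b in bits)

def majority_decode(received, copies=3):
    """Take each group of `copies` bits and keep the most common value."""
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append("1" if group.count("1") > copies // 2 else "0")
    return "".join(decoded)

print(repeat_encode("01"))            # 000111
print(majority_decode("010111110"))   # 011
```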
Checksums (Error checking and correction)
A mathematical algorithm applied to a block of data
-the values in the block are added up to create the checksum, which is transmitted along with the original data
ex (simplified)
01010000 01110010 10001011
= 80 + 114 + 139 = 333 = checksum
The same algorithm is applied at the receiving end. If the checksums match, it is assumed the data has been transmitted correctly.
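A very simple additive checksum in Python, using the three bytes from the example. Reducing the sum modulo 256 so that it fits in one byte is an assumption of this sketch; real checksum algorithms vary.

```python
def checksum(data: bytes) -> int:
    """Add up the byte values and keep the result within one byte (ASSUMED)."""
    return sum(data) % 256

block = bytes([0b01010000, 0b01110010, 0b10001011])   # 80, 114, 139
sent_checksum = checksum(block)

# The receiver recomputes the checksum and compares it with the one sent.
corrupted = bytes([0b01010001, 0b01110010, 0b10001011])
print(checksum(block) == sent_checksum)       # True
print(checksum(corrupted) == sent_checksum)   # False
```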
Check digits (Error checking and correction)
Redundancy check used for error detection on identification numbers, such as bank cards, where they are entered manually (where human error occurs often)
-Each digit of the original code is multiplied by a weight, the results are added up, and a function (which varies between schemes) turns the total into the check digit
(ex ISBN numbers on books)
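As one concrete case, the ISBN-13 scheme weights the first twelve digits alternately by 1 and 3 and picks the check digit that brings the total up to a multiple of 10. The Python sketch below follows that scheme; the function name is made up.

```python
def isbn13_check_digit(first12):
    """Compute the 13th digit of an ISBN-13 from its first 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first12))
    return (10 - total % 10) % 10

print(isbn13_check_digit("978030640615"))  # 7
```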
Analogue signals and digital signals
analogue signal: Natural sound waves, occurring in a continuous wave form. e.g. Human voice
digital signal: Discrete digital format for representing natural sound waves. e.g. CDs and DVDs
Analogue data and digital data
Analogue data: continuous values
Digital data: discrete values
The principles of operation of an analogue to digital converter (ADC)
An analogue to digital converter (ADC): Any device which can convert analogue signal (continuous natural sound waves) into a digital format
They are used together with analogue sensors (e.g. a microphone)
They measure and record the amplitude of the sound wave at set intervals
The principles of operation of a digital to analogue converter (DAC)
A Digital to analogue converter (DAC): Any device which can convert a digital audio signal into an analogue signal (continuous natural sound waves)
Describe sampling rate (digital representation of sound)
The frequency at which you record the amplitude of a sound wave
number of samples per second is measured in hertz (Hz)
The more often you record a sample the smoother the playback will sound
Describe sample resolution (digital representation of sound)
Represents how many different gradations of amplitude can be represented in a digital wave form
sample resolution is measured in bits
For example, if a sample only measures 4 different gradations of amplitude, only 2 bits are required (2^2 = 4), however for a sample that measures 16 gradations of amplitude, 4 bits are required (2^4 = 16)
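The number of bits needed is the smallest n for which 2^n is at least the number of gradations. A small Python sketch, with `bits_needed` as an assumed name:

```python
def bits_needed(levels):
    """Smallest number of bits that can distinguish `levels` amplitude values."""
    return max(1, (levels - 1).bit_length())

print(bits_needed(4))    # 2
print(bits_needed(16))   # 4
print(bits_needed(5))    # 3, since 2 bits only give 4 levels
```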
The Nyquist theorem
If you want to produce an accurate recording you need to use a sampling rate which is at least double that of the highest frequency in the original signal.
Calculating sound sample sizes in bytes
Size of sample = (Number of samples per second) x (Number of bits per sample) x (Length of sample in seconds)
Gives answer in bits.
To find bytes, divide by 8
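The formula as a Python sketch. The figures in the example (44,100 samples per second, 16 bits, 10 seconds) are illustrative and the calculation covers a single channel only.

```python
def sound_size_bytes(sample_rate_hz, bits_per_sample, seconds):
    """Size of an uncompressed single-channel recording in bytes."""
    bits = sample_rate_hz * bits_per_sample * seconds
    return bits / 8

print(sound_size_bytes(44100, 16, 10))  # 882000.0
```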
Purpose of MIDI and the use of event messages in MIDI
MIDI is a technical standard
It allows a wide range of electronic musical instruments, computers, etc. to communicate with each other.
It uses a MIDI controller to send and receive event messages to each device. The messages specify details, such as:
- Duration of note
- Pitch
- Volume change
- Vibrato
- Tempo synchronisation
The advantages of using MIDI files for representing music
- MIDI file uses far less disk space than a traditional digital recording
- Instruments can be recorded separately and put together digitally
What is a pixel?
(picture element) is the smallest addressable element of a picture
How are bitmaps represented?
Digital bitmapped images are made up of pixels. Each pixel is represented by a binary number
Resolution, colour depth and size in pixels for bitmaps
Resolution: The number of dots per inch where a dot is a pixel
Colour depth: The number of bits stored for each pixel
Size in pixels: the width of an image in pixels x height of image in pixels
Calculating storage requirements for bitmapped images
Ignoring metadata:
Storage requirements = size in pixels (width x height) x colour depth
gives size in bits
To find it in bytes, divide by 8
However, bitmap image files may also contain metadata
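The same calculation in Python, ignoring metadata. The 800 x 600 image at 24-bit colour is just an example.

```python
def bitmap_size_bytes(width_px, height_px, colour_depth_bits):
    """Storage for the pixel data of a bitmap, ignoring metadata."""
    bits = width_px * height_px * colour_depth_bits
    return bits / 8

print(bitmap_size_bytes(800, 600, 24))  # 1440000.0
```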
Typical metadata examples
Metadata is data about data
It is stored along with the actual bits which make up the image and increases the overall file size.
examples:
- width
- height
- colour depth
- file name
- etc
How do vector graphics represent images using lists of objects?
The properties of each geometric object/shape in the vector graphic image are stored as a list
Typical properties of objects examples:
- centre coordinates
- radius
- fill colour
- outline colour
- outline width
Advantages and disadvantages of vector graphics vs bitmapped graphics
Vector graphics
Advantages:
- File size is kept relatively small, regardless of scale
- will always scale without loss of quality
- great format for logos or images with simple shapes and colours
Disadvantages:
- Cannot easily replicate an image with continuous areas of changing colour
- Individual pixels cannot be changed
Bitmapped graphics
Advantages:
- Great format for storing full colour images taken on phone/digital camera
- Can manipulate individual pixels easily
- images such as photos can easily be altered, retouched etc.
Disadvantages:
- generally takes up more memory and file storage
- images don't scale very well, they become pixelated the larger they get
Why images, sound files, and text files are compressed
Files are compressed to reduce their size. Smaller files can be transferred faster between storage devices/over the internet
Lossy compression, advantages and disadvantages
Reduces the file size by removing data. Original cannot be reconstructed
Advantages:
* Greatly reduced file sizes
* The extent to which the file size can be reduced is not limited
Disadvantages:
* Loss in data, original cannot be reconstructed
* Quality of file is reduced
Lossless compression, advantages and disadvantages
File size is reduced in a way which results in no data loss
Advantages:
* No reduction in quality
* No loss of data
Disadvantages:
* Larger file sizes than lossy
* Limit to how much a file can be compressed
The principles behind run length encoding (RLE) for lossless compression
RLE reduces the size of a file by removing repeated info and replacing it with one occurrence of the repeated info, followed by the number of times it is repeated
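A minimal run length encoding sketch in Python. It stores each run as a (character, count) pair; actual file formats pack runs differently, and the function names are illustrative.

```python
from itertools import groupby

def rle_encode(text):
    """Replace each run of repeated characters with (character, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original text exactly, showing the method is lossless."""
    return "".join(ch * count for ch, count in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```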
The principles behind dictionary-based methods for lossless compression
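Repeated data is stored once in a dictionary, and each occurrence in the file is replaced by a shorter reference to its dictionary entry. The dictionary is appended to the file so the original can be rebuilt.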
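A simplified word-level sketch of the dictionary idea in Python. Splitting on spaces and using list positions as references are assumptions of this example, not how any particular compressor works.

```python
def dict_compress(text):
    """Store each distinct word once and replace words with dictionary indexes."""
    dictionary = []
    codes = []
    for word in text.split(" "):
        if word not in dictionary:
            dictionary.append(word)
        codes.append(dictionary.index(word))
    return dictionary, codes

def dict_decompress(dictionary, codes):
    """Look every index back up to rebuild the original text."""
    return " ".join(dictionary[c] for c in codes)

dictionary, codes = dict_compress("the cat and the dog and the bird")
print(dictionary)  # ['the', 'cat', 'and', 'dog', 'bird']
print(codes)       # [0, 1, 2, 0, 3, 2, 0, 4]
```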
Encryption definition
The process of scrambling data so that it cannot be understood if intercepted in order to keep it secure during transmission
Meaning of the words ‘cipher’, ‘plaintext’, ‘ciphertext’
Cipher:
A type of encryption method
Plaintext:
Unencrypted information
Ciphertext:
Encrypted information
In order to decrypt a ciphertext, you must know the encryption method and the key used to encrypt the information
Caesar ciphers
Caesar ciphers encrypt information by replacing characters. One character is always replaced by the same character.
There are two types:
Shift ciphers:
* all the letters of the alphabet are shifted by the same amount
* the amount characters are shifted forms the key
Substitution ciphers:
* Letters are randomly replaced
Caesar ciphers are easily cracked because:
- the frequency at which each character occurs can provide a clue as to which letter has been replaced with which
- once you discover one character, a shift cipher can be completely cracked as the key can be found.
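A shift cipher sketch in Python, restricted to uppercase letters for simplicity (an assumption of this example). It also shows why the cipher is weak: there are only 26 possible keys to try.

```python
import string

def shift_encrypt(plaintext, key):
    """Shift every uppercase letter by `key` places, wrapping round the alphabet."""
    result = []
    for ch in plaintext:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            result.append(ch)   # leave anything else unchanged
    return "".join(result)

ciphertext = shift_encrypt("HELLO", 3)
print(ciphertext)                      # KHOOR
print(shift_encrypt(ciphertext, -3))   # HELLO

# Brute force: one of the 26 candidate shifts must be the plaintext.
candidates = [shift_encrypt(ciphertext, -k) for k in range(26)]
```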
Vernam ciphers
A Vernam cipher is a one-time pad cipher. This means each key should only ever be used once. It also requires the key to be random and at least as long as the plaintext that is to be encrypted.
How the Vernam cipher works:
1. Align characters of the plaintext and the key
2. Convert each character to binary (using an information coding system)
3. Apply a logical XOR operation to the two bit patterns
4. Convert the result back to a character
Why Vernam ciphers are not easily cracked:
* The key used with a Vernam cipher is chosen at random
* The ciphertext is also random, and so the cipher is considered absolutely secure
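The XOR steps can be sketched in Python on bytes. Using ASCII bytes as the character coding and `secrets.token_bytes` as the source of the random key are choices made for this illustration.

```python
import secrets

def vernam(data: bytes, key: bytes) -> bytes:
    """XOR each byte of the data with the matching byte of the key."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"HELLO"
key = secrets.token_bytes(len(plaintext))   # random key, to be used only once
ciphertext = vernam(plaintext, key)
print(vernam(ciphertext, key))              # b'HELLO'
```

Because XOR undoes itself, the same function both encrypts and decrypts when given the same key.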
Computational security of ciphers
All ciphers (other than the Vernam cipher) are, in theory, crackable, but not within a reasonable timeframe given current computing power.
Ciphers that use this form of security are said to rely on computational security.