3.5 data representation Flashcards
natural numbers set
- symbol N
- integer
- non-negative (0, 1, 2, 3, ...)
integers set
- symbol Z
- positive and negative
- cannot be fractional
real numbers set
- symbol R
- positive and negative
- includes both rational (fractional) and irrational numbers
rational numbers set
- symbol Q
- can be represented as fractions
- positive and negative
irrational numbers set
- no specific symbol
- cannot be represented as fractions
ordinal numbers
- natural number that describes the numerical position of a value
- used for ordering
why hexadecimal used
- more compact when displayed
- easier for people to remember
- lower likelihood of error when typing in data
- saves programmer time writing in data
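A quick sketch of the compactness point, using Python's built-in formatting. The 16-bit sample value is arbitrary and chosen only for illustration.

```python
# Arbitrary 16-bit example value (an assumption for illustration).
value = 0b1010111100110101

binary_text = format(value, "016b")  # 16 binary digits
hex_text = format(value, "04X")      # 4 hex digits, one per group of 4 bits

print(binary_text)  # 1010111100110101
print(hex_text)     # AF35
```

Each hex digit stands for exactly four bits, so the same value takes a quarter of the characters.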
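kibi, Ki
2^10 (1024); a kibibyte is exactly 1024 bytes, whereas a kilobyte is 10^3 (1000) bytes
mebi, Mi
2^20; a mebibyte is exactly 2^20 bytes, whereas a megabyte is 10^6 bytes
gibi, Gi
2^30; a gibibyte is exactly 2^30 bytes, whereas a gigabyte is 10^9 bytes
tebi, Ti
2^40; a tebibyte is exactly 2^40 bytes, whereas a terabyte is 10^12 bytes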
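The gap between the binary and decimal prefixes can be tabulated with a short sketch. The dictionary names are just illustrative.

```python
# Binary (power-of-2) prefixes next to their decimal (power-of-10) counterparts.
BINARY_PREFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_PREFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

for (b_name, b_value), (d_name, d_value) in zip(
    BINARY_PREFIXES.items(), DECIMAL_PREFIXES.items()
):
    # The difference grows with each step up in size.
    print(f"{b_name} = {b_value:>17,}   {d_name} = {d_value:>17,}")
```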
unsigned binary
- non-negative integers only (zero and positive)
- min and max values for n bits are 0 and (2^n) - 1 respectively
signed binary
- negative (and positive) integers
- range of integers that can be represented by two’s complement with n bits: -2^(n-1) to +(2^(n-1) - 1)
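The two range formulas can be checked with a small sketch. The function names are illustrative, not from any library.

```python
def unsigned_range(bits):
    """Smallest and largest values for unsigned binary with the given number of bits."""
    return 0, 2 ** bits - 1


def twos_complement_range(bits):
    """Smallest and largest values for two's complement with the given number of bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for n in (4, 8, 16):
    print(n, unsigned_range(n), twos_complement_range(n))
```

For 8 bits this gives 0 to 255 unsigned and -128 to 127 in two's complement.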
practice two’s complement representation of signed integers
- most significant bit (leftmost) has a place value of -2^(n-1), where n is the number of bits
- for pos numbers, left bit has to be 0
- for neg numbers, left bit has to be 1
find negative equivalent of a positive number in two’s complement
- flip all the bits, then add 1 (binary addition of 1 at the rightmost, least significant bit)
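A minimal sketch of the two ideas above: the leftmost bit has place value -2^(n-1), and negating means flipping the bits and adding 1. Bit patterns are held as strings for readability, and the function names are illustrative.

```python
def to_twos_complement(value, bits):
    """Return the two's complement bit pattern of value as a string."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking to the bit width wraps negative values round to their pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern):
    """Convert a bit pattern back to an integer."""
    value = int(pattern, 2)
    if pattern[0] == "1":
        # Leftmost bit is worth -2^(n-1) rather than +2^(n-1).
        value -= 1 << len(pattern)
    return value


def negate(pattern):
    """Flip every bit, then add 1 (any carry out of the top bit is dropped)."""
    bits = len(pattern)
    flipped = "".join("1" if b == "0" else "0" for b in pattern)
    total = (int(flipped, 2) + 1) & ((1 << bits) - 1)
    return format(total, f"0{bits}b")


five = to_twos_complement(5, 8)
print(five)          # 00000101
print(negate(five))  # 11111011
print(from_twos_complement(negate(five)))  # -5
```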
why was unicode introduced
- support a larger range of chars
- to facilitate communication / text in different languages
analogue data
- continuous
- no limits to values data can take
- can change as frequently as required
analogue signal
consists of a continuously variable voltage
digital data
- discrete
- can only take specified range of values
- can only change value at specified intervals
digital signal
representation of discrete values over time
digital to analogue converter
- reads bit pattern representing an analogue signal
- outputs an alternating, analogue, electrical current
analogue to digital converter
- analogue signal sampled at regular time intervals
- amplitude of wave at each interval measured
- measurement coded into fixed num of bits
sampling
- taking measurements of the level of the analogue signal (amplitude) at regular time intervals
- measurements assigned a binary pattern, stored in memory
sampling rate
- number of samples taken per second, measured in Hz (1 Hz = 1 sample per second)
- higher sampling rate = better quality of audio recording, as well as bigger file size
sample resolution
- number of bits (audio bit depth) used to represent each sample
- determines number of digital values that can be used during sampling
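A rough sketch of how sampling rate and sample resolution interact. It assumes the analogue signal is a function of time returning an amplitude between -1.0 and 1.0; the function names and the 2 Hz test tone are hypothetical.

```python
import math


def sample_and_quantise(signal, duration_s, sampling_rate_hz, resolution_bits):
    """Sample signal(t) at regular intervals and code each sample in a fixed number of bits."""
    levels = 2 ** resolution_bits  # number of distinct digital values available
    n_samples = int(duration_s * sampling_rate_hz)
    samples = []
    for i in range(n_samples):
        t = i / sampling_rate_hz   # regular time interval
        amplitude = signal(t)      # measured level, assumed to lie in -1.0..1.0
        # Map the amplitude onto the nearest available level, 0..levels-1.
        level = round((amplitude + 1) / 2 * (levels - 1))
        samples.append(format(level, f"0{resolution_bits}b"))
    return samples


def tone(t):
    """Hypothetical 2 Hz sine wave standing in for an analogue signal."""
    return math.sin(2 * math.pi * 2 * t)


samples = sample_and_quantise(tone, duration_s=1, sampling_rate_hz=8, resolution_bits=4)
print(samples)
```

Raising the sampling rate produces more samples per second; raising the resolution makes each sample longer but allows more distinct levels. Both increase the amount of data stored.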
nyquist theorem equation
sampling rate ≥ 2fmax
ideal nyquist theorem answer
- to faithfully recreate the analogue signal, the sampling rate must be at least 2x the highest frequency in the original sound
- e.g. a sound with a frequency of 10kHz must be sampled at a minimum of 20kHz in order to reproduce the original
- reason for doubling: to ensure the samples cover the complete range of peaks + troughs in the analogue signal
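The Nyquist condition is simple enough to express directly. This is a minimal sketch with illustrative function names.

```python
def min_sampling_rate_hz(max_frequency_hz):
    """Lowest sampling rate that satisfies: sampling rate >= 2 * fmax."""
    return 2 * max_frequency_hz


def satisfies_nyquist(sampling_rate_hz, max_frequency_hz):
    """True if the sampling rate is at least twice the highest frequency."""
    return sampling_rate_hz >= 2 * max_frequency_hz


print(min_sampling_rate_hz(10_000))       # 20000
print(satisfies_nyquist(15_000, 10_000))  # False
```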
MIDI file
consists of a list of event messages that explain
- what notes must be played
- when they should be played
- how long or loud each note should be
files do not store a digital representation of analogue sound, instead hold signals used to produce sound
MIDI (musical instrument digital interface) standard
creates sounds as requested either from instrument or piece of software, not live but synthesized sound
MIDI benefits
- significantly reduces amount of data transferred / more compact representation
- event messages can synchronize tempo, control pitch, change volume etc.
- no data lost about musical notes
pixels and colour depth
- picture element
- colour depth is number of bits assigned to a pixel in an image
- each value represents diff colour
- colour depth works through powers of 2
bitmap images and the con
image broken down into pixels, each of which has an assigned binary value; the con is that enlarging results in a blurry, pixelated image
calculating image file size…
…produces minimum value / base file size, bitmap image files may also contain metadata which adds to file size
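A sketch of the usual calculation, assuming the standard formula of width x height x colour depth (the card itself elides it). It also shows how colour depth fixes the number of available colours. The function names and the example dimensions are illustrative.

```python
def colours_available(colour_depth_bits):
    """Each extra bit of colour depth doubles the number of colours."""
    return 2 ** colour_depth_bits


def min_image_size_bits(width_px, height_px, colour_depth_bits):
    """Minimum bitmap size in bits, ignoring any metadata."""
    return width_px * height_px * colour_depth_bits


size_bits = min_image_size_bits(1920, 1080, 24)
print(colours_available(24))  # 16777216
print(size_bits)              # 49766400 bits
print(size_bits // 8)         # 6220800 bytes
```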
vector graphics and their benefits
- represent images using geometric objects and shapes
- properties of each shape / object stored in a list
- can be scaled without losing quality
- well suited to simple images which use shapes
- use less storage space
capturing an image
- light sensor measures intensity of colour in each pixel
- each measurement converted into binary code using ADC
- image analysed to identify runs / sequences of the same colour / value
- num of pixels recorded in the grid affects num. of bits used and therefore file size
sensing colour
- red/green/blue filters used with different sensors in camera to separate out these wavelengths
- intensity of colours falling on sensors is measured and stored to aggregate an RGB value
benefits of compressing data
- faster data transfer times
- less bandwidth used as transfer limits may apply
- buffering on audio/video streams less common
- less storage required
lossy compression
some info lost permanently, through reducing image res or lowering sample res of audio file
common lossy compression formats
mp3, jpeg
lossless compression
no loss of info, file size reduced without decreasing quality, patterns in data spotted + recorded instead of data itself, some limit to how much a file can be compressed
common lossless compression formats
png, zip
benefits of lossless compression
- file can be reproduced exactly as it was originally
- lossless data compression can be reversed
run length encoding
compresses data by replacing each run of identical, consecutively occurring values with the value itself followed by the number of times it occurs
rle - image
- image analysed to identify sequences of the same colour/value
- the colours/values and counts of run-lengths are stored
rle - sound
same sound or note played for fraction of a second could result in hundreds of identical samples, rle records one example of the sample + how many times it consecutively repeats
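Run length encoding can be sketched in a few lines. This version works on any sequence (pixel colours, samples, characters) and stores (value, count) pairs; the pair layout is an assumption, as real file formats differ.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of identical consecutive values into (value, run length)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


row = "WWWWBBBWW"  # hypothetical row of white/black pixels
encoded = rle_encode(row)
print(encoded)  # [('W', 4), ('B', 3), ('W', 2)]
print("".join(rle_decode(encoded)) == row)  # True
```

Because decoding restores the data exactly, this is a lossless method. It only saves space when the data contains long runs.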
dictionary compression
- dictionary containing records of repeated data is appended to the file
- data is then recorded using binary values that reference the dictionary, assigned in order of the number of instances in the file
- best used with large amounts of data, as the dictionary itself has to be stored in the file
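One simple way to picture dictionary compression is at word level. In this sketch each word is replaced by its index in a dictionary ordered by frequency; real schemes work on bit patterns and differ in detail, so treat the approach and names as assumptions.

```python
from collections import Counter


def dictionary_compress(text):
    """Return (dictionary, codes), where codes index into the dictionary."""
    words = text.split(" ")
    # Most frequent words first, so they get the smallest indexes.
    dictionary = [word for word, _ in Counter(words).most_common()]
    index = {word: i for i, word in enumerate(dictionary)}
    return dictionary, [index[word] for word in words]


def dictionary_decompress(dictionary, codes):
    """Rebuild the original text from the dictionary and the list of codes."""
    return " ".join(dictionary[code] for code in codes)


text = "the cat and the dog and the bird"
dictionary, codes = dictionary_compress(text)
print(dictionary)
print(codes)
print(dictionary_decompress(dictionary, codes) == text)  # True
```

With a text this short the dictionary costs more than it saves, which is why the method suits large amounts of data.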
encryption
using an algorithm to convert a message into a form that is not understandable without the key to decrypt it
plaintext
unencrypted info
ciphertext
encrypted info
what must be known to decrypt ciphertext
encryption method used + key used to encrypt
caesar ciphers
encrypt info by replacing characters - one character always replaced by same character
shift cipher
all letters in alphabet shifted by same amount - amount shifted forms the key
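A minimal sketch of a shift cipher over the 26 uppercase letters. Leaving non-letters unchanged and converting input to uppercase are simplifying assumptions.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_encrypt(plaintext, key):
    """Shift every letter forward by key places, wrapping round after Z."""
    result = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            result.append(ch)  # spaces, digits and punctuation pass through
    return "".join(result)


def shift_decrypt(ciphertext, key):
    """Decrypting is shifting back by the same key."""
    return shift_encrypt(ciphertext, -key)


print(shift_encrypt("HELLO", 3))  # KHOOR
print(shift_decrypt("KHOOR", 3))  # HELLO
```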
substitution cipher
letters randomly replaced
caesar cipher effectiveness
can be easily cracked, as the frequency at which characters occur in the ciphertext is a clue to which plaintext characters they represent
brute force attack
attempts to apply every possible key to decrypt ciphertext until one works
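For a shift cipher there are only 25 useful keys, so a brute force attack is trivial. This sketch simply lists every candidate decryption; spotting the readable one is left to the reader. The helper names are illustrative.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_decrypt(ciphertext, key):
    """Shift every letter back by key places, wrapping round before A."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - key) % 26] if ch in ALPHABET else ch
        for ch in ciphertext.upper()
    )


def brute_force(ciphertext):
    """Try every possible key and return {key: candidate plaintext}."""
    return {key: shift_decrypt(ciphertext, key) for key in range(1, 26)}


candidates = brute_force("KHOOR")
for key, guess in candidates.items():
    print(key, guess)
```

Key 3 yields HELLO here; every other key produces text that is not a word.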
vernam cipher
one-time pad cipher, requires key to be random + at least as long as the plaintext to be encrypted
how does vernam cipher work
- aligning characters of the plaintext + the key
- converting each char. to binary (using an info coding system)
- applying logical XOR operation to the 2 bit patterns
- convert result back to a character
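The steps above can be sketched with bytes and XOR. ASCII is assumed as the coding system, and `secrets.token_bytes` (operating system randomness) stands in for a true one-time pad. The result is kept as bytes because XOR output is often not a printable character.

```python
import secrets


def vernam(data, key):
    """XOR each byte of data with the aligned byte of the key."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = "HELLO".encode("ascii")        # characters -> bit patterns
key = secrets.token_bytes(len(plaintext))  # stand-in for a one-time pad
ciphertext = vernam(plaintext, key)
recovered = vernam(ciphertext, key)        # XOR with the same key reverses it

print(ciphertext.hex())
print(recovered.decode("ascii"))  # HELLO
```

The same function encrypts and decrypts, because XORing twice with the same key returns the original bits.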
vernam cipher benefits
- frequency / statistical analysis of ciphertext reveals nothing about plaintext
- if implemented correctly, unbreakable (caesar cipher can be easily cracked)
- more possible keys
one-time pad
generated from physical/unpredictable phenomenon, like atmospheric noise or radioactive decay
when is error checking used
during data transmission
parity bit
- single bit added (as least or most significant bit) to a byte
- set so that the total number of 1s matches the agreed odd or even parity
- sender says what type of parity is used along with the transmitted data
even parity / odd parity
even parity - parity bit makes total num of 1s in data even
odd parity - parity bit makes total num of 1s in data odd
how does receiver perform error detection on received byte (even parity)
- if number of 1s in byte even, data (assumed to have been) received correctly / has not been corrupted
- if number of 1s in byte odd, data corrupted / incorrect
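A small sketch of even parity with 7 data bits and the parity bit placed as the most significant bit. The placement and function names are assumptions for illustration.

```python
def add_even_parity(data_bits):
    """Prefix a parity bit so the total number of 1s is even."""
    parity = "1" if data_bits.count("1") % 2 else "0"
    return parity + data_bits


def check_even_parity(received):
    """True if the received bits contain an even number of 1s."""
    return received.count("1") % 2 == 0


sent = add_even_parity("1010001")     # three 1s, so the parity bit is 1
print(sent)                           # 11010001
print(check_even_parity(sent))        # True
print(check_even_parity("11010011"))  # one bit flipped -> False
print(check_even_parity("11010111"))  # two bits flipped -> True (error missed)
```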
problems with parity bits
- may miss errors if an even number of bits are corrupted
- parity bits can only detect errors, not correct them
why might there be transmission errors when transmitting data?
- electrical interference
- power surges
- physical disruption / distortion
majority voting
- every bit transmitted multiple times (an odd number of times, so there is never a tie to break at random)
- most commonly occurring value taken as correct when data received
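Majority voting can be sketched by sending each bit three times and keeping the most common value in each group. Three copies is just a typical choice; the function names are illustrative.

```python
def send_with_repetition(bits, copies=3):
    """Repeat every bit an odd number of times."""
    if copies % 2 == 0:
        raise ValueError("copies must be odd so a vote cannot tie")
    return "".join(bit * copies for bit in bits)


def majority_decode(received, copies=3):
    """Take the most commonly occurring value in each group of copies."""
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append("1" if group.count("1") > copies // 2 else "0")
    return "".join(decoded)


sent = send_with_repetition("101")
print(sent)                          # 111000111
print(majority_decode("110000111"))  # one corrupted bit, still decodes to 101
```

The transmitted data is three times the size of the original, which is the cost of being able to correct errors.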
why use majority voting
- can detect multiple errors
- more efficient at detecting errors (as parity bit system may miss errors if even number of bits corrupted)
- can identify as well as correct most errors that occur in transmission, due to the majority bits being taken as the correct value
problem with majority voting
volume of data transmitted increased = significantly increased transmission time
checksum
- piece of data added to a block of data to enable error detection
how do checksums work
- produced by applying checksum algo to a block of data
- checksum algo returns a value (the checksum)
- checksum transmitted with the data
- receiver recalculates the checksum from received data using same algo, compares values
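The steps above can be sketched with a deliberately simple checksum algorithm: the sum of all bytes modulo 256. This is an illustrative choice, not the algorithm of any particular protocol.

```python
def checksum(block):
    """Illustrative checksum: sum of the byte values, modulo 256."""
    return sum(block) % 256


def send(block):
    """Transmit the data with its checksum appended as one extra byte."""
    return block + bytes([checksum(block)])


def verify(frame):
    """Recalculate the checksum from the received data and compare values."""
    data, received_checksum = frame[:-1], frame[-1]
    return checksum(data) == received_checksum


frame = send(b"DATA")
print(verify(frame))                 # True
print(verify(b"DATB" + frame[-1:]))  # corrupted data -> False
```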
check digit
a digit calculated (using an algorithm) from the other digits/letters in the input sequence
main purpose of check digits
- recognise and prevent human errors when entering or assigning identification numbers
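As one concrete example of a check digit algorithm, the sketch below uses the ISBN-13 scheme: the first 12 digits are weighted 1, 3, 1, 3, ... and the check digit brings the total up to a multiple of 10. ISBN-13 is chosen here only for illustration.

```python
def isbn13_check_digit(first_12):
    """Calculate the ISBN-13 check digit from the first 12 digits."""
    total = sum(
        int(digit) * (1 if i % 2 == 0 else 3)
        for i, digit in enumerate(first_12)
    )
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn):
    """True if the final digit matches the recalculated check digit."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(isbn13_check_digit("978030640615"))    # 7
print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-8"))  # mistyped last digit -> False
```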
why are messages encrypted
- prevent unauthorized users understanding intercepted data
- prevent message alteration; identify authentic users