3 - Fundamentals of Data Representation Flashcards
To not fail GCSEs
Number base
The number of unique digits available in a numbering system. E.g: In decimal there are 10 individual digits available (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), so it is also known as base 10
Computer binary
Computers use binary to represent all data and instructions. Whatever a user enters, be it a number, text, image, sound or video, it is ultimately stored as a string of 0s and 1s
Binary to decimal conversion
Use the place value table: 128 64 32 16 8 4 2 1
Write the binary digits underneath, then add up the place values that have a 1 under them
E.g: 11001010 = 202
since 128 + 64 + 8 + 2 = 202
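A minimal Python sketch of the place value method above. The function name binary_to_decimal is made up for illustration:

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the place values wherever the bit is 1."""
    total = 0
    place_value = 1  # the rightmost column is worth 1
    for bit in reversed(bits):
        if bit == "1":
            total += place_value
        place_value *= 2  # each column to the left is worth double
    return total

print(binary_to_decimal("11001010"))  # 202
```

Python's built-in int("11001010", 2) gives the same answer, so it is handy for checking working.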
Decimal to binary conversion
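Work down the place values (128, 64, 32, 16, 8, 4, 2, 1): put a 1 under the biggest place value that fits into the number, subtract it, then repeat with what is left. Put a 0 under any place value that does not fit
E.g: 85 = 01010101, since 64 + 16 + 4 + 1 = 85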
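The same steps as a rough Python sketch, assuming the number is between 0 and 255 so it fits in 8 bits. The function name decimal_to_binary is made up for illustration:

```python
def decimal_to_binary(number: int) -> str:
    """Convert 0-255 to an 8-bit binary string using place values."""
    bits = ""
    for place_value in (128, 64, 32, 16, 8, 4, 2, 1):
        if number >= place_value:
            bits += "1"            # this place value fits, so use it
            number -= place_value  # carry on with what is left
        else:
            bits += "0"            # does not fit, so leave it out
    return bits

print(decimal_to_binary(85))  # 01010101
```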
Hexadecimal
One hexadecimal digit can always be translated to four binary digits. Hexadecimal is used as shorthand, since it is quicker to write and less prone to being misread
Hexadecimal examples table
Binary : Decimal : Hex
0000 : 0 : 0
0001 : 1 : 1
1001 : 9 : 9
1010 : 10 : A
1111 : 15 : F
To convert from binary to hexadecimal
- We’re going to convert 011110 to hexadecimal
- If the binary number has a number of digits divisible by 4 (e.g: 4, 8, 12, etc.) it can be left alone. Else add 0s to the left until the number of digits is divisible by 4. Here we add two 0s to the left of our number
- Next split the number into ‘nibbles’ of four bits each
- Finally convert each nibble separately, using the hex table
011110
00011110
0001 1110
0001 = 1
1110 = E
So 011110 = 1E
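The pad, split and convert steps can be sketched in Python like this. The names HEX_DIGITS and binary_to_hex are made up for illustration:

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits: str) -> str:
    # Add 0s to the left until the number of digits is divisible by 4
    while len(bits) % 4 != 0:
        bits = "0" + bits
    hex_string = ""
    # Split into nibbles of four bits and convert each one separately
    for start in range(0, len(bits), 4):
        nibble = bits[start:start + 4]
        hex_string += HEX_DIGITS[int(nibble, 2)]
    return hex_string

print(binary_to_hex("011110"))  # 1E
```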
To convert hexadecimal to binary
- We’re going to convert A6 to binary
- Each hex digit translates to a binary nibble; use the hex table
- Attach the nibbles together
A6
A = 1010
6 = 0110
1010 0110
So A6 = 10100110
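Going the other way, here is a short Python sketch that turns each hex digit into a nibble and joins them up. The function name hex_to_binary is made up for illustration:

```python
def hex_to_binary(hex_string: str) -> str:
    nibbles = []
    for digit in hex_string:
        value = int(digit, 16)                # e.g. "A" -> 10
        nibbles.append(format(value, "04b"))  # 10 -> "1010", always 4 bits
    return "".join(nibbles)

print(hex_to_binary("A6"))  # 10100110
```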
To convert between decimal and hexadecimal numbers
The easiest way is to go through binary as a middle step
So either
- Decimal > Binary > Hex
Or
- Hex > Binary > Decimal
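A rough Python sketch of both routes, assuming values from 0 to 255 (two hex digits) for the decimal to hex direction. The function names are made up for illustration:

```python
def decimal_to_hex(number: int) -> str:
    binary = format(number, "08b")        # Decimal > Binary (8 bits)
    left, right = binary[:4], binary[4:]  # split into two nibbles
    # Binary > Hex, one nibble at a time
    return format(int(left, 2), "X") + format(int(right, 2), "X")

def hex_to_decimal(hex_string: str) -> int:
    # Hex > Binary, one nibble per hex digit
    binary = "".join(format(int(digit, 16), "04b") for digit in hex_string)
    return int(binary, 2)                 # Binary > Decimal

print(decimal_to_hex(166))   # A6
print(hex_to_decimal("A6"))  # 166
```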
Bit (b)
A single binary digit // Either a single ‘1’ or ‘0’
Byte (B)
A sequence of 8 bits // An individual keyboard character such as ‘#’ or ‘k’
Kilobyte (KB)
Approx 1,000 bytes // A paragraph of text containing around 200 words
Megabyte (MB)
Approx: 1,000 KB or
1,000,000 B
// Around 1 min of average quality mp3 music
Gigabyte (GB)
Approx: 1,000 MB or
1,000,000 KB or
1,000,000,000 B
// About 90 mins of standard definition video
Terabyte (TB)
Approx: 1,000 GB or 1,000,000 MB or 1,000,000,000 KB or 1,000,000,000,000 B // Depending on quality - several hundred hours of video
Binary Shift
Moving all the bits of a binary number left or right by a set number of places
a. 00011000 > Shift right 1 > 00001100
b. 00001100 > Shift left 2 > 00110000
Shifting loss/gain
When you shift left or right, any bits that ‘fall off’ the end are lost forever; any bits that join the number at the other end are always ‘0’
Shifting
Shift Left:
Shift 1: X2
Shift 2: X4
Shift 3: X8
Shift Right:
Shift 1: Div 2
Shift 2: Div 4
Shift 3: Div 8
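A small Python sketch of shifting a fixed-width binary string. The function names are made up for illustration. Note that the x2 and div 2 rules only hold exactly when no 1s fall off the end; a right shift that drops a 1 gives a whole-number division with the remainder thrown away:

```python
def shift_left(bits: str, places: int) -> str:
    # 0s join on the right; bits that fall off the left are lost
    return (bits + "0" * places)[-len(bits):]

def shift_right(bits: str, places: int) -> str:
    # 0s join on the left; bits that fall off the right are lost
    return ("0" * places + bits)[:len(bits)]

print(shift_right("00011000", 1))  # 00001100  (24 div 2 = 12)
print(shift_left("00001100", 2))   # 00110000  (12 x 4 = 48)
print(shift_left("11000000", 1))   # 10000000  (a 1 fell off, so not 192 x 2)
```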
Adding binary
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 10 (10 is binary for 2, so write 0 and carry 1)
1 + 1 + 1 = 11 (11 is binary for 3, so write 1 and carry 1)
10110101 + 00111100
- Stack the numbers one above the other
- Add each column from right to left, carrying where needed
- Result should be 11110001
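The column-by-column method can be sketched in Python as below, assuming both numbers are written with the same number of bits. The function name add_binary is made up for illustration:

```python
def add_binary(a: str, b: str) -> str:
    result = ""
    carry = 0
    # Work from the rightmost column to the leftmost
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_total = int(bit_a) + int(bit_b) + carry
        result = str(column_total % 2) + result  # digit to write down
        carry = column_total // 2                # digit to carry
    if carry:
        result = "1" + result  # a final carry needs an extra column
    return result

print(add_binary("10110101", "00111100"))  # 11110001
```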
Character
A single symbol, such as a letter, digit, punctuation mark or space. Most keys on a keyboard cause one character to appear on screen
Character set
A list of all characters recognised by a computer system. Each character has a corresponding code, and the same codes are used by all computers that use the same character set. ASCII (American Standard Code for Information Interchange) and Unicode are two common character sets
Character, ASCII and Unicode
Character Value : ASCII : Unicode
A : 100 0001 : 0000 0000 0100 0001
a : 110 0001 : 0000 0000 0110 0001
# : 010 0011 : 0000 0000 0010 0011
3 : 011 0011 : 0000 0000 0011 0011
Character groups
Characters are commonly grouped and run in the order that you would expect, e.g. ‘A’ in ASCII has a decimal representation of 65, ‘B’ is 66, ‘C’ is 67, ‘D’ is 68, etc. In lower case, ‘a’ is 97, ‘b’ is 98, ‘c’ is 99, etc. Numerals (0, 1, 2, 3, etc.) are similarly grouped together
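Python's built-in ord and chr functions give a quick way to check character codes. The helper name ascii_bits is made up for illustration:

```python
def ascii_bits(character: str) -> str:
    # ord gives the character code; format writes it as 7 binary digits
    return format(ord(character), "07b")

print(ord("A"))         # 65
print(chr(66))          # B
print(chr(ord("a") + 2))  # c  (codes run in alphabetical order)
print(ascii_bits("A"))  # 1000001
```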
Character set bits
ASCII uses seven bits, meaning 128 different characters (2^7) can be represented.
Unicode (in its basic 16-bit form) uses sixteen bits, meaning 65,536 different characters (2^16) can be represented. Using Unicode instead of ASCII gives you access to far more alphabets, including Chinese, Japanese, Arabic and Russian, but more storage space is required
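A quick Python check of those numbers, plus the extra storage cost, assuming every character takes a fixed 7 or 16 bits. The function names are made up for illustration:

```python
def characters_available(bits_per_char: int) -> int:
    # Each extra bit doubles the number of possible codes
    return 2 ** bits_per_char

def storage_bits(text: str, bits_per_char: int) -> int:
    return len(text) * bits_per_char

print(characters_available(7))    # 128
print(characters_available(16))   # 65536
print(storage_bits("hello", 7))   # 35
print(storage_bits("hello", 16))  # 80
```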
Representing Images
An image is divided into pixels, each of which is a tiny dot on screen. A pixel can only be one colour at a time, and when a picture is saved, the colour of each individual pixel must be stored in binary. The more bits that are used to store each pixel, the more colours are potentially available
Pixel
Short for picture element, this term refers to the smallest possible unit within an image or on a screen. A pixel cannot be divided up into smaller units, and a pixel can only ever be one colour at a time. VDUs (visual display units) display output using millions of pixels
More bits?
If more colours are needed, more bits are needed. Many images store 24 bits (three bytes) for each pixel
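As a rough guide, n bits per pixel can tell apart 2^n colours. A tiny Python sketch (the function name colours_available is made up for illustration):

```python
def colours_available(bits_per_pixel: int) -> int:
    # Each extra bit doubles the number of colours that can be stored
    return 2 ** bits_per_pixel

print(colours_available(1))   # 2
print(colours_available(8))   # 256
print(colours_available(24))  # 16777216
```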
Amount of storage required
The amount of storage required for an image depends on a number of factors including:
- Colour depth
- Size in pixels
Colour depth
A measure of how many colours are available; the more colours that are available, the more bits that must be assigned to store each pixel
Size in pixels
The number of pixels in the height and width of an image. An image with more pixels requires more storage space than a lower-res image, but is of higher quality
Calculating Image size
W - width of img (in px)
H - Height of img (in px)
D - Colour depth, the number of bits used to store each pixel
File size in bits: W x H x D
File size in bytes: (W x H x D) / 8
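The formula as a Python sketch. The 800 x 600, 24-bit image is a made-up example, and the function names are only for illustration:

```python
def image_size_bits(width_px: int, height_px: int, colour_depth: int) -> int:
    return width_px * height_px * colour_depth  # W x H x D

def image_size_bytes(width_px: int, height_px: int, colour_depth: int) -> float:
    return image_size_bits(width_px, height_px, colour_depth) / 8

print(image_size_bits(800, 600, 24))   # 11520000
print(image_size_bytes(800, 600, 24))  # 1440000.0
```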
Convert file size from B to KB or B to MB
Bytes to Kilobytes:
Divide the number of bytes by 1,000
Bytes to Megabytes:
Divide the number of bytes by 1,000,000
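The same conversions in Python, using a made-up file of 1,440,000 bytes. The function names are only for illustration:

```python
def bytes_to_kilobytes(size_bytes: int) -> float:
    return size_bytes / 1_000

def bytes_to_megabytes(size_bytes: int) -> float:
    return size_bytes / 1_000_000

print(bytes_to_kilobytes(1_440_000))  # 1440.0
print(bytes_to_megabytes(1_440_000))  # 1.44
```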
Analogue
Sound is an analogue signal, which is the opposite of digital. A computer usually works with digital signals, processing 0s and 1s. An analogue signal is continuously variable: rather than being limited to set values such as 0 or 1, it can take any value in between, such as 0.817. Sound is analogue since a sound wave changes smoothly rather than jumping between set points
To process sound
Analogue data needs to be converted to digital in order to be stored and processed. This is done by taking regular samples of the sound’s amplitude. With sound, thousands of samples are taken per second, each being a measure of the amplitude at a specific point in time
Sample rate
A measure of how often a sample is taken, measured in Hertz (Hz). 1 Hz means once per second, 1 MHz means one million times per second. A sound wave with more samples per second has a higher sampling frequency
High Sampling Rate
A high sampling rate results in higher accuracy but requires more storage space
Sample resolution
Not to be confused with screen resolution, which is a measure of the number of pixels. Resolution in sound sampling refers to how many bits are used to store each sample
Calculating sound file size
Sampling rate - Number of samples per sec
Sample res - Number of bits used to store each sample
Seconds - the length, in seconds, of the whole sound file
File size in bits: Rate x Res x Seconds
File size in bytes: (Rate x Res x Seconds) / 8
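The formula as a Python sketch. The 44,100 Hz rate, 16-bit resolution and 10 second length are made-up example values, and the function names are only for illustration:

```python
def sound_size_bits(sample_rate: int, sample_res: int, seconds: int) -> int:
    return sample_rate * sample_res * seconds  # Rate x Res x Seconds

def sound_size_bytes(sample_rate: int, sample_res: int, seconds: int) -> float:
    return sound_size_bits(sample_rate, sample_res, seconds) / 8

print(sound_size_bits(44_100, 16, 10))   # 7056000
print(sound_size_bytes(44_100, 16, 10))  # 882000.0
```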
Compression
Techniques to reduce the size of a file, so that it takes up less storage space and can be transmitted across a network more quickly. There are different types of compression
Choice of compression
Choice of compression type might depend on a number of factors:
- If data needs to be precise, such as in money transactions, lossless compression would be used
- If data does not need to be precise, typically as in photos or music, lossy compression might be considered to save disk space and transmission time
Run length encoding (RLE)
A form of compression that is effective when dealing with repeating data. Instead of storing each item of repeated data, the data is stored once, along with how many times it repeats
What RLE does
- Is lossless compression
Effective for storing or transmitting a file that contains lots of repeated data, such as:
- a txt file that contains repeated characters e.g: zzzzz
- An img file where lots of adjacent px are identical colour
- Sound file containing stretches of complete silence
RLE example
Uncompressed: QQQQQQQQWWWWWWEEEEEEEEEEEEEEEQQQQQQQQ
Compressed: 8Q6W15E8Q
RLE larger file
Uncompressed: QWERTY
Compressed: 1Q1W1E1R1T1Y
Because no letters were repeated in the uncompressed file, the ‘compression’ process has actually resulted in a larger file
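A minimal Python sketch of RLE on text, writing each run as a count followed by the character, like the examples above. The function name rle_compress is made up for illustration, and it assumes the text contains no digits (otherwise the counts would be ambiguous):

```python
def rle_compress(text: str) -> str:
    if not text:
        return ""
    compressed = ""
    current = text[0]  # the character in the run we are counting
    count = 1
    for character in text[1:]:
        if character == current:
            count += 1  # same character, so the run carries on
        else:
            compressed += str(count) + current  # run ended, write it out
            current = character
            count = 1
    compressed += str(count) + current  # write out the final run
    return compressed

print(rle_compress("Q" * 8 + "W" * 6 + "E" * 15 + "Q" * 8))  # 8Q6W15E8Q
print(rle_compress("QWERTY"))  # 1Q1W1E1R1T1Y (longer than the original)
```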
Huffman encoding
A form of compression that represents commonly used characters with smaller bit patterns, and rarely used characters with larger bit patterns
Huffman example
To write ‘razzmatazz’, using this encoding, the following bit strings would be used in order:
110 10 0 0 1110 10 1111 10 0 0
or
110100011101011111000
An ASCII representation of the word ‘razzmatazz’ would have required 70 bits (10 chars x 7 bits per char). The Huffman encoded representation requires only 21 bits. This is because the most commonly used char, ‘z’, is given the shortest code, ‘0’. Of course, much of this saving would be lost because the Huffman tree itself would also need to be transmitted. However, for larger amounts of text, a substantial saving can be made in terms of file size, even when the size of the tree is factored in
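A rough Python sketch of encoding and decoding with a Huffman code table. The table below is an assumption read off the 21-bit string in the example (z = 0, a = 10, r = 110, m = 1110, t = 1111); it does not build the tree, it only uses the finished codes. The names CODES, huffman_encode and huffman_decode are made up for illustration:

```python
# Assumed code table, worked out from the 21-bit example string
CODES = {"z": "0", "a": "10", "r": "110", "m": "1110", "t": "1111"}

def huffman_encode(text: str) -> str:
    return "".join(CODES[character] for character in text)

def huffman_decode(bits: str) -> str:
    lookup = {code: character for character, code in CODES.items()}
    decoded = ""
    current = ""
    for bit in bits:
        current += bit
        # No code is the start of another, so the first match is the right one
        if current in lookup:
            decoded += lookup[current]
            current = ""
    return decoded

encoded = huffman_encode("razzmatazz")
print(encoded)                  # 110100011101011111000
print(len(encoded))             # 21
print(huffman_decode(encoded))  # razzmatazz
```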