midterm 1 - diff file types Flashcards
lect 7
How does a binary file store numbers vs a text file?
text file –> each digit is stored as an individual ASCII char –> 1 byte per digit
binary –> the whole number is stored as its binary representation
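a minimal Python sketch of the size difference (struct used here just to illustrate fixed-width binary storage):

import struct

n = 1234567890                      # a 10-digit number

as_text = str(n).encode("ascii")    # text file style: 1 ASCII byte per digit
print(len(as_text))                 # 10 bytes

as_binary = struct.pack("<i", n)    # binary style: whole number in one 32-bit int
print(len(as_binary))               # 4 bytes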
pro/con of binary number storage over txt files?
pros of binary storage: uses fewer bits per number because it doesn't spend 1 whole byte per digit
cons of binary storage: more complicated to separate individual number values (a text file can use an ASCII char such as a comma or newline to separate them)
how does a binary file know how to differentiate separate numbers
binary file fixes the # of bits used per number value –> therefore every X bits, the system knows to read the start of a new number
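a sketch of fixed-width reading (assuming a little-endian, 4-bytes-per-number layout for illustration):

import struct

raw = struct.pack("<3i", 10, -7, 42)   # 3 numbers x 4 bytes = 12 bytes on disk

# the reader knows every 4 bytes starts a new number, so no separator chars are needed
numbers = struct.unpack("<3i", raw)
print(numbers)                         # (10, -7, 42)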
binary file number limits explained
since binary files allocate X bits per value –> sets a max value that can be stored
typical limit is 32 bits per #
scientific notation (floating point) representation may be used for larger values –> but introduces limited precision
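worked example of the 32-bit limit (assuming unsigned vs two's-complement signed integers):

print(2 ** 32 - 1)    # 4294967295 -> largest unsigned 32-bit value
print(2 ** 31 - 1)    # 2147483647 -> largest signed 32-bit value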
example of binary number limits and their impact on systems
32-bit systems –> memory addresses are 32 bits –> 2^32 unique addressable bytes
2^32 bytes of RAM = 4 GB
therefore 32-bit systems can only address up to 4 GB of RAM
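the arithmetic behind the 4 GB figure:

addressable_bytes = 2 ** 32
print(addressable_bytes)                # 4294967296 bytes
print(addressable_bytes / (1024 ** 3))  # 4.0 -> 4 GiB of addressable RAM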
how is txt file encoded and interpreted
NO HEADER for file identification
every byte/code in the file represents a character
no single enforced txt encoding –> usually UTF-8 but variations exist –> leads to txt files being incorrectly read if the program assumes the wrong encoding
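a small Python sketch of a wrong encoding guess (mojibake):

text = "café"
data = text.encode("utf-8")     # "é" is stored as two bytes: 0xC3 0xA9

print(data.decode("utf-8"))     # café  -> correct guess
print(data.decode("latin-1"))   # cafÃ© -> wrong guess, file is read incorrectly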
data compression - broad definition
reducing the number of bits required to store encoded information –> reducing overall file byte size
when decompressed –> encoded data is reconstructed to remake the original file
types of data compression
lossless - the original file is reconstructed with the exact same bit sequence
lossy - reconstructed bit sequence does not exactly match the original –> some encoded data is lost. usually associated with multimedia files –> errors manifest as aliasing errors and compression artifacts
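a lossless round trip sketched with Python's zlib (any lossless compressor behaves this way):

import zlib

original = b"AAAAABBBBBCCCCC" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))   # far fewer bytes after compression
print(restored == original)             # True -> exact same bit sequence back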
list the methods of compression encoding (6)
DP - SQRT
dictionary encode
predictive encode
symbol freq
quantization + modelling human perception
run length encode
transformations
symbol frequency compression
aka variable length coding
use algorithm to determine optimal bit pattern per symbol based on how frequently each symbol appears
more frequent = fewer bits –> compression
where a symbol = characters, values, etc
eg Huffman code
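a minimal Huffman-style sketch in Python (illustrative, not the lecture's exact algorithm):

import heapq
from collections import Counter

def huffman_codes(text):
    # heap entries: (frequency, tiebreaker, node); a node is a symbol or a (left, right) pair
    heap = [(freq, i, sym) for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # merge the two least frequent nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):           # internal node: branch on 0 / 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                 # leaf: assign the bit pattern to the symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes("aaaaaaaabbbc"))   # 'a' (most frequent) gets the shortest bit pattern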
run length encoding compression
used for data that is repeated several times
(eg sampled data that doesn’t change often) –> temperature every minute
encodes the data point –> paired with a count of X repetitions
therefore compresses repeat sequences
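a run length encode/decode sketch:

from itertools import groupby

def rle_encode(samples):
    # store (value, repetition count) pairs instead of every repeated sample
    return [(value, len(list(run))) for value, run in groupby(samples)]

def rle_decode(pairs):
    return [value for value, count in pairs for _ in range(count)]

temps = [20, 20, 20, 20, 21, 21, 20, 20, 20]   # temperature every minute
print(rle_encode(temps))                       # [(20, 4), (21, 2), (20, 3)]
print(rle_decode(rle_encode(temps)) == temps)  # True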
dictionary encoding compression
substitute patterns with shorter symbol codes
analogous to saying “let X = (equation)”
a portion of the compressed file is required to store the meaning of these shorter symbol codes so that decompression software knows what the original code meant
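a toy dictionary-substitution sketch (the dictionary and the placeholder codes are made up for illustration):

# "let X = (pattern)" -> the dictionary must travel with the compressed data
dictionary = {"the quick brown fox": "\x01", "jumps over": "\x02"}

def dict_encode(text):
    for pattern, code in dictionary.items():
        text = text.replace(pattern, code)
    return text

def dict_decode(text):
    for pattern, code in dictionary.items():
        text = text.replace(code, pattern)
    return text

msg = "the quick brown fox jumps over the quick brown fox"
packed = dict_encode(msg)
print(len(msg), len(packed))          # far fewer characters after substitution
print(dict_decode(packed) == msg)     # True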
predictive compression
uses current data to predict the next data point –> calculate the diff btw predicted and actual data pts
compression by only encoding failed predictions –> “predict the next data pt, but know it is not X”
OR
encode the difference btw predicted vs actual (decompress by predicting the next value then adding back the difference)
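a delta-encoding sketch of the second approach (predict "next value = current value", then store only the difference):

def delta_encode(samples):
    encoded = [samples[0]]                     # first value stored as-is
    for prev, cur in zip(samples, samples[1:]):
        encoded.append(cur - prev)             # difference btw predicted and actual
    return encoded

def delta_decode(encoded):
    samples = [encoded[0]]
    for diff in encoded[1:]:
        samples.append(samples[-1] + diff)     # predict, then add the difference back
    return samples

readings = [100, 101, 101, 102, 104, 104]
print(delta_encode(readings))                  # [100, 1, 0, 1, 2, 0] -> small numbers, cheap to store
print(delta_decode(delta_encode(readings)) == readings)   # True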
transformation encoding compression
use math algorithm to transform raw data into other formats which are then more easily compressed
decompress the file –> transform BACK into original data format
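a minimal sketch using one Haar-wavelet-style average/difference step as the transform (real codecs use transforms such as the DCT):

def transform(samples):
    # pairs become (average, difference): differences are small and compress well
    averages = [(a + b) / 2 for a, b in zip(samples[0::2], samples[1::2])]
    details  = [(a - b) / 2 for a, b in zip(samples[0::2], samples[1::2])]
    return averages, details

def inverse(averages, details):
    samples = []
    for avg, det in zip(averages, details):
        samples += [avg + det, avg - det]      # transform BACK into the original data
    return samples

data = [10, 12, 11, 13, 50, 52, 51, 49]
avgs, dets = transform(data)
print(dets)                            # small values clustered near zero
print(inverse(avgs, dets) == data)     # True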
quantization
used in multimedia –> decrease bit depth / resolution / sample rate (Hz)
reduce quality to compress the file
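a quantization sketch (step size chosen arbitrarily for illustration):

def quantize(samples, step):
    # reduce bit depth: nearby values collapse onto the same level
    return [round(s / step) for s in samples]

def dequantize(levels, step):
    return [level * step for level in levels]

audio = [0.12, 0.13, 0.11, 0.57, 0.58]
levels = quantize(audio, step=0.1)
print(levels)                   # [1, 1, 1, 6, 6] -> fewer distinct values to encode
print(dequantize(levels, 0.1))  # roughly [0.1, 0.1, 0.1, 0.6, 0.6] -> lossy, originals not recovered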