Unit 4: Exchanging Data Flashcards
What is compression
● The process used to reduce the storage space required by a file
● Particularly important for sharing files over networks or the Internet
● Increases the number of files that can be transferred in a given time
● Downloading a compressed file is faster than downloading the full version
What is lossy compression
● Lossy compression reduces the size of a file while also removing some information
What is lossless compression
● Lossless compression reduces the size of a file without losing any information
What is run length encoding
● A method of lossless compression
● Repeated values are removed and replaced with one occurrence followed by the number of times it should be repeated
● Relies on consecutive pieces of data being the same
● Doesn’t offer a great reduction in file size if there’s little repetition
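As a rough illustration of the idea above, here is a minimal Python sketch of run length encoding. The function names `rle_encode` and `rle_decode` are made up for this example, and real formats store the pairs far more compactly.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of identical characters into (character, run length)."""
    return [(char, len(list(run))) for char, run in groupby(data)]

def rle_decode(pairs):
    """Rebuild the original string by repeating each character."""
    return "".join(char * count for char, count in pairs)

encoded = rle_encode("AAAABBBCCD")
# encoded == [("A", 4), ("B", 3), ("C", 2), ("D", 1)]
restored = rle_decode(encoded)  # "AAAABBBCCD", so nothing was lost
```

With input such as "ABCD" every run has length 1, so the encoded form is larger than the original, which is why little repetition gives little benefit.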
What is dictionary encoding
● A method of lossless compression
● Frequently occurring pieces of data replaced with an index
● Compressed data is stored alongside a dictionary
● Dictionary matches frequently occurring data to an index
● Original data can be restored using the dictionary
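The points above can be sketched in Python as follows. This is a simplified illustration, not a real compression format: it assumes the data is text split on spaces and gives every word an index, not just the frequent ones. The names `dict_encode` and `dict_decode` are hypothetical.

```python
def dict_encode(text):
    """Replace each word with an index; return the indexes and the dictionary."""
    index_of = {}
    encoded = []
    for word in text.split(" "):
        if word not in index_of:
            index_of[word] = len(index_of)  # next unused index
        encoded.append(index_of[word])
    # Stored alongside the compressed data so it can be restored later.
    dictionary = {index: word for word, index in index_of.items()}
    return encoded, dictionary

def dict_decode(encoded, dictionary):
    """Restore the original text by looking each index up in the dictionary."""
    return " ".join(dictionary[index] for index in encoded)

codes, dictionary = dict_encode("the cat and the dog and the bird")
# codes == [0, 1, 2, 0, 3, 2, 0, 4]
restored = dict_decode(codes, dictionary)
```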
What is encryption
● Used to keep data secure when it’s being transmitted
What is asymmetric encryption
● Two keys are used: public and private
● The public key can be published anywhere
● The private key must be kept secret
● Together, the keys are known as a key pair
● The keys are mathematically related to one another
● Messages encrypted with the public key can only be decrypted with the corresponding private key
● Encrypting a message using your private key verifies that the message was sent by you. If your public key can decrypt a message, then it must have been encrypted with your private key, which only you have access to.
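The relationship between the two keys can be illustrated with a toy RSA key pair in Python. RSA is one asymmetric scheme chosen here as an assumption; the tiny primes, the variable names and the helper `apply_key` are for illustration only and offer no real security.

```python
# Toy RSA key pair built from two tiny primes (illustration only).
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, chosen to be coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse (Python 3.8+)

public_key, private_key = (e, n), (d, n)

def apply_key(value, key):
    """Raise value to the key's exponent, modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65

# Confidentiality: encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(message, public_key)
decrypted = apply_key(ciphertext, private_key)

# Authenticity: encrypt with the private key, check with the public key.
signature = apply_key(message, private_key)
verified = apply_key(signature, public_key)
```

Both `decrypted` and `verified` come back as the original message, which mirrors the two uses described above.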
What is hashing
● An input (called a key) is turned into a fixed size value (called a hash)
● A vast number of algorithms, called hash functions, do this
● The output of a hash function can’t be reversed to form the key
● This makes hashing useful for storing passwords: only the hashes are stored, and these can’t be reversed to gain the passwords.
● A hash table is a data structure which holds key-value pairs
● Hash tables can be used to look up data in an array in constant time
● Hash tables are used extensively in situations where a lot of data needs to be stored with constant access times. For example, in caches and databases
● If two keys produce the same hash, a collision is said to occur
● Methods to overcome collisions include storing items together in a list under the hash value and using a second hash function to generate a new hash
● A good hash function should have a low chance of collision and should be quick to calculate
● A hash function’s output should be smaller than the input it was provided
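A minimal Python sketch of a hash table that handles collisions by storing items together in a list under the hash value (often called chaining). The class name `ChainedHashTable` and its deliberately weak hash function are assumptions made for this example, chosen so that a collision is easy to see.

```python
class ChainedHashTable:
    def __init__(self, size=8):
        # One list (bucket) per possible hash value.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Illustrative hash: sum of character codes modulo the table size.
        return sum(ord(char) for char in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # new key, possibly a collision

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)

table = ChainedHashTable()
table.put("ab", 1)
table.put("ba", 2)  # same character sum as "ab", so the two keys collide
```

Both values can still be retrieved because the colliding pairs share a bucket and are told apart by their keys.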
What is a relational database
● A relational database is one which uses different tables for different entities.
● An entity is an item of interest about which information is stored.
What is a flat file
● A flat file database consists of a single file.
● The flat file will most likely be based around a single entity and its attributes.
● Attributes are the categories about which data is collected.
● Flat files are typically written out in the following way:
Entity1(Attribute1, Attribute2, Attribute3 …)
What is a primary key
● A unique identifier which is different for each record added to a table.
What is a foreign key
● A foreign key is the attribute which links two tables together.
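To make primary and foreign keys concrete, here is a small sketch using Python's built-in `sqlite3` module. The `Customer` and `CustomerOrder` tables and their columns are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL
    );
    CREATE TABLE CustomerOrder (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
        Item TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO Customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO CustomerOrder VALUES (100, 1, 'Keyboard')")

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO CustomerOrder VALUES (101, 99, 'Mouse')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The foreign key lets the two tables be joined back together.
joined = conn.execute("""
    SELECT Customer.Name, CustomerOrder.Item
    FROM CustomerOrder JOIN Customer
      ON CustomerOrder.CustomerID = Customer.CustomerID
""").fetchall()
```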
What is a secondary key
● A secondary key is an additional attribute, other than the primary key, which is indexed to enable a database to be searched quickly
● This also allows the table to be sorted on this attribute.
What are the different types of entity relationship modelling
● One-to-one: Each entity can only be linked to one other entity.
● One-to-many: One entity can be associated with many other entities.
● Many-to-many: One entity can be associated with many other entities and the same applies the other way round
What is normalisation
● The process of coming up with the best possible design for a relational database is called normalisation.
● Normalisation tries to accomplish the following things:
○ No redundancy (unnecessary duplicates)
○ Consistent data throughout linked tables.
○ Records can be added and removed without issues.
○ Complex queries can be carried out.
What are the 3 types of normalisation
● First normal form
○ No attribute can contain more than a single value.
● Second normal form
○ Is in first normal form.
○ No partial dependencies (no attribute depends on only part of a composite primary key).
● Third normal form
○ Is in second normal form.
○ Contains no non-key dependencies.
○ A non-key dependency is when an attribute depends on the value of another non-key attribute rather than on the primary key alone.
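As a small illustration of removing a non-key dependency, the Python sketch below splits one table into two. The order and customer data are invented for this example, and plain lists and dictionaries stand in for database tables.

```python
# Before: customer_name depends on customer_id, not on the primary key order_id,
# so the name is repeated on every order placed by the same customer.
orders_flat = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ada", "item": "Keyboard"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ada", "item": "Mouse"},
    {"order_id": 3, "customer_id": 11, "customer_name": "Alan", "item": "Monitor"},
]

# After: the name is stored once per customer in its own table...
customers = {row["customer_id"]: row["customer_name"] for row in orders_flat}

# ...and each order keeps only customer_id as a foreign key.
orders = [
    {"order_id": row["order_id"], "customer_id": row["customer_id"], "item": row["item"]}
    for row in orders_flat
]
```

A change of name now needs updating in one place only, which is the redundancy and consistency benefit normalisation aims for.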
What is indexing
● Method used to store the position of each record when ordered by a certain attribute.
● Used to look up and access data quickly.
● Primary key is automatically indexed.
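One way to picture an index is as a sorted list of (attribute value, record position) pairs that can be searched quickly. The Python sketch below assumes that model; real database engines typically use more elaborate structures, and the records and the `find_by_surname` helper are made up for this example.

```python
from bisect import bisect_left

records = [
    {"id": 3, "surname": "Turing"},
    {"id": 1, "surname": "Lovelace"},
    {"id": 2, "surname": "Hopper"},
]

# The index stores the position of each record, ordered by surname.
surname_index = sorted((rec["surname"], pos) for pos, rec in enumerate(records))
keys = [surname for surname, _ in surname_index]

def find_by_surname(surname):
    """Binary-search the index instead of scanning every record."""
    i = bisect_left(keys, surname)
    if i < len(keys) and keys[i] == surname:
        return records[surname_index[i][1]]
    return None

match = find_by_surname("Hopper")   # the record with id 2
missing = find_by_surname("Knuth")  # None, no such surname
```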
What is capturing data
● Data needs to be input into the database and there are various ways of doing this.
● The chosen method is always dependent on the context.
● Data may need to be manually entered or scanned using methods such as Magnetic Ink Character Recognition (MICR) which is used with cheques.
What is selecting and managing data
● Selecting the correct data is an important part of data preprocessing.
● This could involve only selecting data that fits certain criteria.
● Collected data can be managed using SQL to sort, restructure and select certain sections.
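A short sketch of selecting and sorting with SQL, run through Python's built-in `sqlite3` module. The `Student` table, its columns and the pass mark of 50 are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, Mark INTEGER)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "Ada", 82), (2, "Alan", 47), (3, "Grace", 91)],
)

# Select only the rows that meet the criterion, sorted from highest mark down.
rows = conn.execute(
    "SELECT Name, Mark FROM Student WHERE Mark >= ? ORDER BY Mark DESC", (50,)
).fetchall()
# rows == [("Grace", 91), ("Ada", 82)]
```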
What is exchanging data
● Exchanging data is the process of transferring the data that has been collected.
● One common example of this is EDI (Electronic Data Interchange).
What is transaction processing
● A transaction is defined as a single operation executed on data.
● Transactions must be processed in line with ACID.
What is ACID
Atomicity:
● A transaction must be processed in its entirety or not at all.
Consistency:
● A transaction must keep the referential integrity rules between linked tables.
Isolation:
● Simultaneous execution of transactions must lead to the same result as if they were executed one after the other.
Durability:
● Once a transaction has been executed, its changes are permanent, even in the event of a power cut or system failure.
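Atomicity can be demonstrated with a sketch using Python's built-in `sqlite3` module. The `Account` table, the balances and the `transfer` function are invented for this example; the `CHECK` constraint stands in for a rule that an account cannot be overdrawn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Name TEXT PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    """Move money from A to B as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Name = 'B'", (amount,)
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Name = 'A'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(30)       # succeeds: A has 70, B has 80
failed = transfer(500)  # would overdraw A, so the credit to B is undone as well
balances = dict(conn.execute("SELECT Name, Balance FROM Account"))
```

After the failed transfer the balances are unchanged, so the database never shows B credited without A being debited.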
What is record locking
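● The process of preventing simultaneous access to records in a database.
● This is used to prevent inconsistencies or a loss of updates.
● If anyone else tries to access a record while it is locked, they will not be able to until the lock is released.
● The biggest problem with this is deadlock, where two transactions each wait for a record the other has locked, so neither can proceed.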
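The idea of a locked record refusing a second access can be sketched in Python with `threading.Lock`. This is only an analogy for what a database does internally; the `record_locks` dictionary and the `try_update` function are hypothetical names for this example.

```python
import threading

# One lock per record; a real database manages these internally.
record_locks = {"record_1": threading.Lock()}

def try_update(record_id):
    """Attempt to update a record; refuse immediately if it is already locked."""
    lock = record_locks[record_id]
    if not lock.acquire(blocking=False):
        return False      # another user holds the lock
    try:
        # ... the record would be read and modified here ...
        return True
    finally:
        lock.release()    # always release so that others are not blocked forever

lock = record_locks["record_1"]
lock.acquire()                     # simulate another user locking the record
blocked = try_update("record_1")   # False: the record is locked
lock.release()
allowed = try_update("record_1")   # True: the lock has been released
```

Refusing immediately, rather than waiting for the lock, is one simple way to avoid two users waiting on each other indefinitely.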
What is redundancy
● The process of having one or more copies of the data in physically different locations.
● This means that if one copy is damaged, the data can be recovered from the others.