Data Processing and Information Flashcards
Differences between data and information
Data is raw numbers, text, images, symbols or sounds in an unorganized form, which have no meaning on their own.
Information is data that has been processed and given meaning and context, so it can be understood on its own.
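As a small illustration, assuming some hypothetical temperature readings: the list of numbers on its own is data, but once it is processed (averaged) and given context (what was measured, where, and in what unit) it becomes information.

```python
# Raw data: numbers with no context, so on their own they mean nothing.
readings = [21.0, 22.0, 23.0, 22.0]

def to_information(values, location, unit):
    """Process the raw values and add context so the result can be understood."""
    average = sum(values) / len(values)
    return f"Average temperature in {location}: {average:.1f} {unit}"

print(to_information(readings, "Room 3", "°C"))  # Average temperature in Room 3: 22.0 °C
```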
Direct data
Data that is collected for the specific purpose for which it will be used.
Examples: Questionnaires, Interviews and Data logging
Advantages:
-Collected data is relevant to purpose
-Original source is verified
-Data is up to date
-Data is more likely to be unbiased
Disadvantages:
-Collecting large samples of data may take a long time and be difficult
-Costly compared to indirect data
Suitability of questionnaires
Questionnaires are used to gather specific data and opinions. They can be given to a large number of respondents, and statistical analysis can be carried out on the results.
Online questionnaires enable quicker analysis of data: as the user fills in the questionnaire online, the responses go directly into the database. This saves time and is cost-effective, as no third party is needed to enter the data into the database.
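A minimal sketch of this, assuming a hypothetical one-table database held in memory with Python's built-in `sqlite3` module: each submitted response is written straight into the database, so analysis can run immediately without anyone re-keying the data. The table and column names are made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per respondent's answer to a single rating question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (respondent TEXT, rating INTEGER)")

def submit_response(respondent, rating):
    """Called when a user submits the online form: the data goes straight into the database."""
    with conn:  # commits the insert automatically if no error occurs
        conn.execute("INSERT INTO responses VALUES (?, ?)", (respondent, rating))

for name, rating in [("A", 4), ("B", 5), ("C", 3)]:
    submit_response(name, rating)

# Statistical analysis can be run as soon as the responses arrive.
count, average = conn.execute("SELECT COUNT(*), AVG(rating) FROM responses").fetchone()
print(count, average)  # 3 4.0
```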
Suitability of Interviews
The questions are asked directly to the respondents, so the interviewer can ask a respondent to elaborate on an answer if required.
Data logging
Using sensors to gather data automatically
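A minimal data-logging sketch, assuming a hypothetical `read_sensor` function that stands in for real sensor hardware (here it just returns random temperatures): readings are taken automatically at a fixed interval and stored with a timestamp, with no human involvement.

```python
import random
import time

def read_sensor():
    """Hypothetical stand-in for a real temperature sensor (returns degrees C)."""
    return round(random.uniform(18.0, 25.0), 1)

def log_readings(sensor, samples, interval_s):
    """Take `samples` readings, `interval_s` seconds apart, each stored with a timestamp."""
    log = []
    for i in range(samples):
        log.append((time.time(), sensor()))
        if i < samples - 1:  # no need to wait after the final reading
            time.sleep(interval_s)
    return log

for timestamp, value in log_readings(read_sensor, samples=3, interval_s=0.1):
    print(f"{timestamp:.2f}  {value}")
```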
Indirect data
Data that has been collected for a different purpose (secondary source)
Examples:
-Electoral register
-Personal information collected by businesses that is then used by third parties
-Weather data
Advantages:
-Quick and easy to gather data
-Data is immediately available, and large samples for statistical analysis are more likely to be available.
Disadvantages:
-The required data may not exist, and irrelevant or additional data may need to be sorted out, which is costly
-Data may be out of date
-The original source of the data may not be verified
-Data is more likely to be biased, because the source is unverified
Factors that affect the quality of information:
-Accuracy: Information must be accurate to be considered of good quality; inaccurate information cannot be relied upon.
-Relevance: Information must be relevant to its purpose; otherwise users must search through irrelevant information to find the data they need.
-Age: Information must be up to date in order to be useful
-Level of detail: Good-quality information has the right amount of detail. Too much detail makes it difficult to find the exact information required; too little means it cannot be used correctly.
-Completeness of the information: All the required information must be provided for it to be of good quality; incomplete information cannot be used properly.
The need for encryption:
Encryption: Scrambling data so it cannot be understood without a decryption key, making it unreadable if intercepted.
Thus encryption helps protect private and sensitive data against hackers.
Methods of encryption:
-Symmetric (using a private key only): The sender and the recipient must both possess the same secret key (the private key), which is used for both encryption and decryption. The secret key therefore has to be sent to the recipient.
-Asymmetric (using private and public keys): Uses a public key, which is available to anyone sending data, and a private key, which is known only to the recipient. Data encrypted with the public key can only be decrypted with the matching private key. A key is the value used by the encryption algorithm to encrypt and decrypt the data.
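To show the difference, here is a toy sketch using only the Python standard library. The symmetric part XORs the message with a random shared key, so the same key both encrypts and decrypts. The asymmetric part is textbook RSA with deliberately tiny primes: anyone can encrypt with the public key `(e, n)`, but only the holder of the private exponent `d` can decrypt. Both are illustrations only and are not secure; real systems use vetted libraries, algorithms such as AES, and RSA with padding and much larger keys.

```python
import secrets

# --- Symmetric: one shared secret key both encrypts and decrypts ---
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the key; applying it again with the same key restores the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

shared_key = secrets.token_bytes(16)       # would have to be sent to the recipient securely
ciphertext = xor_cipher(b"meet at noon", shared_key)
assert xor_cipher(ciphertext, shared_key) == b"meet at noon"

# --- Asymmetric: textbook RSA with tiny primes (toy values, NOT secure) ---
p, q = 61, 53
n = p * q                  # 3233, part of both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent: the public key is (e, n)
d = pow(e, -1, phi)        # private exponent 2753, kept only by the recipient (Python 3.8+)

def rsa_encrypt(m: int) -> int:
    return pow(m, e, n)    # anyone can do this with the public key

def rsa_decrypt(c: int) -> int:
    return pow(c, d, n)    # only the private-key holder can do this

c = rsa_encrypt(65)
print(c, rsa_decrypt(c))   # 2790 65
```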
Encryption protocols
-Secure Sockets Layer (SSL): A protocol used to secure websites; it has now been replaced by TLS
-Transport Layer Security (TLS): Encrypts data sent over the Internet to ensure that hackers are unable to see what you transmit
-Asymmetric encryption is used for SSL, and once SSL has established an authenticated session, the client and server will create symmetric keys for faster secure communication.
The use of SSL/TLS in client-server communication:
When a browser requests a secure page, it checks the web server's digital certificate to ensure that it is trusted, valid and issued for the site from which it originates. The browser then uses the server's public key to encrypt a new symmetric key, which is sent to the web server. The browser and web server can then communicate using this symmetric key, because symmetric encryption is much faster than asymmetric encryption.
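Python's standard `ssl` module carries out these steps on the client side. The sketch below is a minimal example, assuming a hypothetical helper named `open_secure_connection`; it is not how any particular browser is implemented. One caveat: the steps described match the older RSA key exchange, whereas TLS 1.3 agrees the symmetric key using Diffie-Hellman key exchange instead of encrypting it with the server's public key. The overall pattern (asymmetric cryptography to authenticate the server and set up a shared symmetric key) is the same.

```python
import socket
import ssl

def open_secure_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS connection to a web server, roughly as a browser does for an HTTPS page."""
    # The default context trusts the system's CA certificates, rejects invalid
    # or expired certificates, and checks that the certificate matches `host`.
    context = ssl.create_default_context()
    raw_sock = socket.create_connection((host, port), timeout=10)
    try:
        # wrap_socket runs the handshake: the server's certificate is verified,
        # then both sides agree symmetric session keys for the rest of the session.
        return context.wrap_socket(raw_sock, server_hostname=host)
    except Exception:
        raw_sock.close()  # don't leak the plain socket if the handshake fails
        raise
```

For example, `with open_secure_connection("example.com") as s: print(s.version(), s.cipher())` would show the negotiated protocol (such as `TLSv1.3`) and the symmetric cipher agreed for the session; this needs network access. If the certificate is untrusted, expired or issued for a different host, the handshake fails with `ssl.SSLCertVerificationError`.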
Uses of encryption
Protecting data on hard disks, in emails and on HTTPS websites
-Disk encryption is used on hard disks and other storage media. It encrypts every single bit of data stored on a disk, and the data is usually accessed using a password or a registered fingerprint.
-HTTPS (Hypertext Transfer Protocol Secure) is the protocol used for secure web pages. It uses SSL or TLS to encrypt and decrypt pages and information sent and received by web users.
-Email encryption uses asymmetric encryption. Encrypting an email will also encrypt any attachments.
-Encryption only scrambles the data so that if it is found, it cannot be understood. It does not stop the data from being intercepted, stolen or lost.
Methods of validation
-Presence check: Ensures that data has been entered (is present).
-Range check: Ensures that data is within a defined range. It has two boundaries: a lower boundary and an upper boundary.
-Type check: Ensures that data is of a defined data type.
-Length check: Ensures that data is of a defined length or within a range of lengths.
-Format check: Ensures that data matches a defined format.
-Lookup check: Tests whether data exists in a list of allowed values. Similar to referential integrity.
-Consistency check: Compares data in one field with data in another field within the same record to check that they are consistent.
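Each validation rule can be sketched as a small Python function. The field names, boundaries, allowed grades and the "two capital letters then four digits" code format are hypothetical examples, not fixed rules.

```python
import re
from datetime import date

def presence_check(value) -> bool:
    """Data must have been entered (not missing or blank)."""
    return value is not None and str(value).strip() != ""

def range_check(value, lower, upper) -> bool:
    """Value must lie between the lower and upper boundaries (inclusive)."""
    return lower <= value <= upper

def type_check(value, expected_type) -> bool:
    """Value must be of the defined data type."""
    return isinstance(value, expected_type)

def length_check(value, min_len, max_len) -> bool:
    """Number of characters must be within the allowed range of lengths."""
    return min_len <= len(value) <= max_len

def format_check(value, pattern) -> bool:
    """Value must match a defined format, written here as a regular expression."""
    return re.fullmatch(pattern, value) is not None

def lookup_check(value, allowed) -> bool:
    """Value must exist in a list of allowed values."""
    return value in allowed

def consistency_check(record) -> bool:
    """Hypothetical rule: a course end date must not be before its start date."""
    return record["end_date"] >= record["start_date"]

# Hypothetical student record used to try every check.
record = {"name": "Ana", "age": 17, "code": "AB1234", "grade": "B",
          "start_date": date(2023, 9, 1), "end_date": date(2025, 7, 1)}

results = {
    "presence": presence_check(record["name"]),
    "range": range_check(record["age"], 11, 18),
    "type": type_check(record["age"], int),
    "length": length_check(record["code"], 6, 6),
    "format": format_check(record["code"], r"[A-Z]{2}[0-9]{4}"),
    "lookup": lookup_check(record["grade"], {"A", "B", "C", "D", "E", "U"}),
    "consistency": consistency_check(record),
}
print(results)  # every check passes for this record
```

Passing these checks only shows that the data is reasonable, not that it is correct: an age of 16 typed instead of 17 would still pass the range check.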
Methods of verification
-Visual checking: The user visually checks that the data matches the original source by reading and comparing the two.
-Double data entry: Data is input into the system twice, and the two entries are compared to check that they match.
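-Parity check: An extra bit (the parity bit) is added to each byte so that the total number of 1s is even (even parity) or odd (odd parity). The receiving computer recounts the 1s; if the parity is wrong, an error occurred during transmission.
-Checksum: A value is calculated from the data before it is sent and is transmitted with it. The receiver recalculates the value from the data it receives; if the two values differ, the data has been corrupted.
-Hash total: A total calculated from a field that has no meaning when added up (e.g. customer ID numbers). It is compared before and after input or processing to check that all the data has been entered correctly.
-Control total: A total calculated from a field that is meaningful when added up (e.g. the total quantity ordered). It is compared before and after input or processing in the same way.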
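These verification methods can be sketched in Python as simplified illustrations. Assumptions: the parity functions use 7 data bits plus one even-parity bit, the checksum is a hypothetical sum-of-bytes-modulo-256 scheme (real protocols use various algorithms), and the order records are made up.

```python
def double_entry_matches(first: str, second: str) -> bool:
    """Double data entry: both inputs of the same data must be identical."""
    return first == second

def add_even_parity(value: int) -> int:
    """Use bit 7 as a parity bit so the byte has an even number of 1s (assumes 7 data bits)."""
    data = value & 0x7F
    ones = bin(data).count("1")
    return data | ((ones % 2) << 7)

def parity_ok(byte: int) -> bool:
    """Receiver's check: an even number of 1s means no error was detected."""
    return bin(byte).count("1") % 2 == 0

def checksum(data: bytes) -> int:
    """Hypothetical simple checksum: sum of all bytes, modulo 256."""
    return sum(data) % 256

sent = add_even_parity(ord("C"))                 # 'C' has three 1s, so the parity bit is set
print(parity_ok(sent), parity_ok(sent ^ 0b10))   # True False: a flipped bit is detected

print(checksum(b"HELLO") == checksum(b"HELLP"))  # False: the corruption is detected

# Hypothetical batch of order records.
orders = [{"customer_id": 1042, "quantity": 3},
          {"customer_id": 2210, "quantity": 5},
          {"customer_id": 3307, "quantity": 2}]
control_total = sum(o["quantity"] for o in orders)   # 10: meaningful (total items ordered)
hash_total = sum(o["customer_id"] for o in orders)   # 6559: meaningless, used only for checking
print(control_total, hash_total)
```

A single parity bit only detects an odd number of flipped bits; if two bits in the same byte change, the error goes unnoticed.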
By using both validation and verification, the chances of entering incorrect data are reduced. If incorrect data passes a validation check, a verification check is likely to spot the error.
Batch Processing
Sets of data are processed all at once without user interaction
Advantages:
-It is a single automated process, which requires little human participation which reduces costs
-It can be scheduled to run when there is little demand for computer resources (e.g. at night)
-There are fewer repetitive tasks for the human operator
Disadvantages:
-There is a delay, as data is not processed until the scheduled time
-Only data of the same type can be processed since an identical, automated process is being applied to all the data
-Errors cannot be corrected until the batch process is completed
For data to be processed, it is often stored first. Master files and transaction files are the main file types used to store data. A master file stores data about a thing (a person, place or object). A transaction file stores data about an event, such as an order, electricity usage or travel expenses.
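A minimal sketch of a batch run, assuming made-up customer records: the master file holds data about customers, the transaction file holds events (charges) collected over a period, and the whole batch is applied in one automated pass without user interaction. A transaction with no matching master record is only reported after the run, reflecting the disadvantage that errors cannot be corrected until the batch has finished. Dictionaries stand in for real files here.

```python
# Hypothetical master file: one record per customer (the "thing").
master = {
    101: {"name": "Ali", "balance": 50.0},
    102: {"name": "Bea", "balance": 0.0},
    103: {"name": "Cal", "balance": 20.0},
}

# Hypothetical transaction file: one record per event (here, electricity usage charges).
transactions = [
    {"customer_id": 102, "amount": 35.5},
    {"customer_id": 101, "amount": 12.0},
    {"customer_id": 102, "amount": 4.5},
    {"customer_id": 999, "amount": 8.0},   # no matching master record
]

def run_batch(master, transactions):
    """Apply every transaction in one automated run; return a new master file and an error report."""
    new_master = {key: dict(record) for key, record in master.items()}  # old master left unchanged
    errors = []
    # Sorting by key mirrors a traditional sequential update, where both files are in key order.
    for t in sorted(transactions, key=lambda t: t["customer_id"]):
        record = new_master.get(t["customer_id"])
        if record is None:
            errors.append(t)   # reported at the end, not corrected during the run
        else:
            record["balance"] += t["amount"]
    return new_master, errors

new_master, errors = run_batch(master, transactions)
print(new_master[102]["balance"], len(errors))  # 40.0 1
```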