Data and Information Flashcards
Sections 1-3 of the syllabus
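Why has the Unicode encoding system replaced ASCII?
ASCII is a 7-bit (per character) encoding system, so it can represent only 128 characters: enough for basic English text but not for other scripts and symbols.
Unicode defines over a million code points, covering almost all writing systems. Its common encoding, UTF-8, stores each character in one to four 8-bit units and keeps the first 128 characters identical to ASCII.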
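The limit of ASCII can be checked directly. The short Python sketch below is only an illustration; the helper names fits_in_ascii and utf8_length are made up for this card. It tests whether a character fits in ASCII's 128 code points and counts how many 8-bit units UTF-8 (one common Unicode encoding) needs for it.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)


def utf8_length(text: str) -> int:
    """Return the number of bytes (8-bit units) UTF-8 uses for the text."""
    return len(text.encode("utf-8"))


# One sample each from English, French, Chinese and emoji.
samples = ["A", "é", "中", "😀"]
for ch in samples:
    # Print the character, its code point, whether ASCII can hold it,
    # and how many UTF-8 bytes it occupies.
    print(ch, hex(ord(ch)), fits_in_ascii(ch), utf8_length(ch))
```

Only "A" fits in ASCII; the other samples need two, three and four UTF-8 bytes respectively.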
What is the difference between backup and archive?
Type of data: A backup copies current data that is still in use, while an archive holds data that is no longer in active use and is needed only occasionally.
Purpose: A backup is used to restore data to a previous state after data corruption, data loss or hardware failure, while an archive frees up storage space by moving data to cheaper, long-term storage.
Deletion: The original data is not deleted after a backup, but the original data in regular storage is deleted after it has been transferred to the archive.
Frequency: Backups are usually scheduled regularly, while archiving is done infrequently, only when needed.
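The "Deletion" difference above can be sketched in Python as copy versus move. This is a simplified illustration only: the function names backup and archive, the file names and the folder names are all hypothetical, and a real backup or archive system does far more (scheduling, compression, verification).

```python
import shutil
import tempfile
from pathlib import Path


def backup(src: Path, backup_dir: Path) -> Path:
    """Copy the file to the backup folder; the original stays in place."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, backup_dir / src.name))


def archive(src: Path, archive_dir: Path) -> Path:
    """Move the file to the archive folder; the original is removed."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(archive_dir / src.name)))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    current = root / "current_orders.csv"   # data still in use
    old = root / "orders_2015.csv"          # data no longer in use
    current.write_text("id,item\n1,pen\n")
    old.write_text("id,item\n9,ink\n")

    backup(current, root / "backup")
    archive(old, root / "archive")

    print(current.exists())  # the original is still there after a backup
    print(old.exists())      # the original is gone after archiving
```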
Why are version control and naming conventions important?
What role does computing play in lifestyle and the workplace for social and economic development?
Social: The main source of news has shifted from newspapers, magazines and television to social media and news apps. The main source of information has shifted from libraries and Google to a wider range of search engines and sites such as Bing, Reddit, Quora and Wikipedia. The main means of publication has shifted from magazines and blogging to YouTube, wikis and Telegram. The main means of contact has shifted from SMS and mail to WhatsApp and Discord.
What is Data Redundancy?
It refers to the same data being stored more than once. The presence of duplicate data in multiple data files occurs when different divisions, functional areas, and groups in an organisation independently collect the same piece of information.
This wastes storage space, as well as the time of the different people who key in and maintain the same data. It also increases the chance of inconsistency, for example if a typo occurs in one copy.
What is Data Inconsistency?
This occurs when copies of the same data, maintained in many files, no longer agree. With duplicated data, every change must be repeated everywhere the data is located, and a missed copy leaves conflicting values.
1. There is an insertion anomaly, as a new item cannot be inserted without all its related fields.
2. There is an update anomaly, as the same field needs to be changed many times across tables/records.
3. There is a deletion anomaly, as deleting one item may also cause information about another item to be lost.
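The anomalies above can be seen in a small Python sketch. The data is invented for illustration: a flat list of order records in which each order repeats the customer's phone number, and a hypothetical helper phones_for that lists the phone numbers stored for a customer.

```python
# A flat "file": the customer's phone number is duplicated in every order.
orders = [
    {"order_id": 1, "customer": "Tan", "phone": "9111", "item": "pen"},
    {"order_id": 2, "customer": "Tan", "phone": "9111", "item": "ink"},
    {"order_id": 3, "customer": "Lim", "phone": "9222", "item": "pad"},
]


def phones_for(records, customer):
    """Return the set of distinct phone numbers stored for a customer."""
    return {r["phone"] for r in records if r["customer"] == customer}


# Update anomaly: only one of Tan's two rows is changed,
# so the file now holds two different phone numbers for Tan.
orders[0]["phone"] = "9333"
print(phones_for(orders, "Tan"))

# Deletion anomaly: removing Lim's only order also removes
# the only record of Lim's phone number.
orders = [r for r in orders if r["order_id"] != 3]
print(phones_for(orders, "Lim"))

# Insertion anomaly: a new customer with no order yet cannot be stored
# without inventing an order_id and an item.
```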
What is Data Dependency?
Program Data Dependency: This is the tight relationship between data stored in files and the specific programs required to update and maintain those files. In a traditional file environment, any change in data requires a change in all the programs that access the data.
Partial dependency: With a composite primary key, a non-key attribute depends on only part of the primary key rather than on the complete primary key.
Transitive dependency: A column's value depends on another column only indirectly, through a second, intermediate column.
Why are database designs usually normalised?
Minimise data duplication.
Eliminate data redundancy.
Eliminate update anomalies.
Eliminate data inconsistency.
Eliminate insertion anomalies.
Eliminate deletion anomalies.
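A minimal sketch of a normalised design, using Python's built-in sqlite3 module. The customer and orders tables and their contents are hypothetical. Each customer's phone number is stored once, so an update touches one row, deleting an order does not lose the customer, and a customer can be inserted before any order exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    item        TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Tan', '9111'), (2, 'Lim', '9222');
INSERT INTO orders VALUES (1, 1, 'pen'), (2, 1, 'ink'), (3, 2, 'pad');
""")

# Update: one row changes, and every order of Tan sees the new number.
conn.execute("UPDATE customer SET phone = '9333' WHERE customer_id = 1")
# Deletion: removing Lim's only order keeps Lim's details.
conn.execute("DELETE FROM orders WHERE order_id = 3")
# Insertion: a customer can be stored before placing any order.
conn.execute("INSERT INTO customer VALUES (3, 'Ng', '9444')")

# Join the two tables to rebuild the combined view when it is needed.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.phone
    FROM orders o JOIN customer c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```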
List all the PDPA obligations.
CAP N PRATA
1. Consent
2. Accountability
3. Protection
4. Notification
5. Purpose Limitation
6. Retention Limitation
7. Access and Correction
8. Transfer Limitation
9. Accuracy
SQL vs NoSQL
SQL: Each field has a fixed datatype (due to constraints) and every record has the same fields. The schema is difficult to change (due to constraints) and it maintains strict data integrity. It scales up (vertically), meaning that when the database grows, a more powerful machine is needed.
NoSQL: Fields have no fixed datatypes and records can have different fields. It is more flexible as there is no fixed schema to follow, and it allows hierarchical data storage (records nested inside other records). It scales out (horizontally): more machines can be added as the database grows, increasing performance to match the increased demand.
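The schema difference can be sketched in Python. The SQL side uses the built-in sqlite3 module; the NoSQL side is only imitated with a list of dictionaries standing in for a document collection, not a real NoSQL database. The student table and its fields are hypothetical.

```python
import sqlite3

# SQL: the table schema fixes which fields every record has.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student VALUES (1, 'Aisha')")

# A record with an extra field is rejected because the table has two columns.
try:
    conn.execute("INSERT INTO student VALUES (2, 'Ben', 'Chess')")
    sql_accepts_extra_field = True
except sqlite3.Error:
    sql_accepts_extra_field = False
print(sql_accepts_extra_field)

# NoSQL (imitated): each document carries its own fields,
# and a field may itself hold a nested record.
collection = []
collection.append({"_id": 1, "name": "Aisha"})
collection.append({
    "_id": 2,
    "name": "Ben",
    "cca": "Chess",
    "contacts": {"email": "ben@example.com"},
})

# The two documents do not share the same set of fields.
fields_per_record = [sorted(doc) for doc in collection]
print(fields_per_record)
```

To accept the extra field on the SQL side, the schema itself would first have to be changed, for example with ALTER TABLE.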
What is a Primary key, Secondary key, Foreign key, and Composite key?
PK: It is the attribute or column chosen to uniquely identify each row in a table in a relational database.
SK: It is an attribute or column that could also uniquely identify each row in a table, but is not the one chosen as the primary key.
FK: It is a key column that refers to the primary key of another table in the relational database.
CK: It is a primary key that is made up of more than one column.
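The four kinds of key can be declared in one small schema. The sketch below uses Python's built-in sqlite3 module; the student and enrolment tables, their columns and the helper rejected are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,      -- primary key
    nric       TEXT NOT NULL UNIQUE,     -- secondary key: unique, but not the PK
    name       TEXT NOT NULL
);
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),  -- foreign key
    subject    TEXT NOT NULL,
    grade      TEXT,
    PRIMARY KEY (student_id, subject)    -- composite key
);
INSERT INTO student VALUES (1, 'S1234567A', 'Aisha');
INSERT INTO enrolment VALUES (1, 'Computing', 'A');
""")


def rejected(sql: str) -> bool:
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Duplicate primary key value.
print(rejected("INSERT INTO student VALUES (1, 'S7654321B', 'Ben')"))
# Duplicate secondary key value.
print(rejected("INSERT INTO student VALUES (2, 'S1234567A', 'Ben')"))
# Foreign key pointing at a student that does not exist.
print(rejected("INSERT INTO enrolment VALUES (9, 'Math', 'B')"))
# Duplicate composite key (same student and same subject).
print(rejected("INSERT INTO enrolment VALUES (1, 'Computing', 'B')"))
# Same student with a new subject: only part of the composite key repeats.
print(rejected("INSERT INTO enrolment VALUES (1, 'Math', 'B')"))
```

The first four statements are rejected; the last one is accepted because the combination of student and subject is new.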
How does NoSQL address the shortcomings of a relational database (SQL)?
NoSQL allows the database to grow more flexibly as it does not enforce constraints on data fields and types. This makes it better suited to a scenario where the requirements and schema change. NoSQL allows new fields to be added easily, while a relational database has to be restructured. Moreover, since NoSQL scales horizontally, it is cheaper to add more machines to meet demand than to upgrade to a more powerful machine.
Describe methods to protect data.
Uninterruptible power supply: Ensures that the power supply stays steady and uninterrupted, preventing sudden data loss from an unexpected shutdown.
Air-conditioned room: Ensures that servers are well-cooled so that they do not overheat and malfunction.
Backups: Regularly backing up data safeguards against sudden loss of data, so that the data and its integrity can be restored.
Secondary power supply: Ensures that, should the main power supply fail, there is a backup supply to power the server until a backup is made or the issue is resolved.
What are the disadvantages of using NoSQL?
NoSQL does not enforce constraints: the startup will have to implement more validation code in its service app.
There is likely to be more data duplication, and it is harder to normalise data with NoSQL since collections generally cannot be joined in a query: the startup has to be more careful to avoid breaking data integrity.
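The extra validation code mentioned above might look like the Python sketch below. Everything here is an assumption for illustration: the REQUIRED_FIELDS schema, the validate and insert_user functions, and the users list standing in for a NoSQL collection. In a relational database, column types and NOT NULL constraints would do much of this checking.

```python
# Hypothetical rules the app must enforce itself: field name -> expected type.
REQUIRED_FIELDS = {"name": str, "email": str, "age": int}


def validate(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document is acceptable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


users = []  # stand-in for a NoSQL collection


def insert_user(doc: dict) -> bool:
    """Insert only documents that pass validation; report whether it was stored."""
    if validate(doc):
        return False
    users.append(doc)
    return True


print(insert_user({"name": "Aisha", "email": "aisha@example.com", "age": 20}))
print(insert_user({"name": "Ben", "age": "20"}))  # no email, age is a string
print(validate({"name": "Ben", "age": "20"}))
```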