Comp 1314 Data Management 1 Flashcards

1
Q

What are the three key features of an OS?

A

Multi-user: Many users can use the same system at the same time
Multi-processing: Multiple processors can work at the same time
Multi-tasking: Multiple processes can run at the same time

2
Q

What is the philosophy of UNIX?

A

Set the cultural norms for minimalistic modular software development

3
Q

What is special about UNIX?

A

Programs can be strung together, yet the system stays secure because the programs do not know about each other

4
Q

What is Linux based on?

A

UNIX

5
Q

What is piping?

A

Connecting the output of one program to the input of the next

6
Q

What is the symbol for piping?

A

|
Program1 | Program2
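
A minimal sketch of what Program1 | Program2 does, written in Python rather than the shell. The two "programs" here are hypothetical one-line Python scripts standing in for real commands; the point is only that the first program's standard output becomes the second program's standard input.

```python
import subprocess
import sys

# Two stand-in programs (hypothetical): one prints lines, one sorts its input.
producer = [sys.executable, "-c", "print('pear'); print('apple')"]
sorter = [sys.executable, "-c",
          "import sys; sys.stdout.writelines(sorted(sys.stdin))"]

# Equivalent of: producer | sorter
p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
p2 = subprocess.Popen(sorter, stdin=p1.stdout,
                      stdout=subprocess.PIPE, text=True)
p1.stdout.close()  # the parent no longer needs its copy of the pipe
output, _ = p2.communicate()
p1.wait()
print(output)
```

Neither program knows the other exists; each only reads its input and writes its output.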

7
Q

What is the Input/Output redirection symbol?

A

<: Input
>: Output
>>: Append

8
Q

List some programs that you can use

A

head
tail
sort
wc
uniq
du
xargs
more
cut
find
tar
gzip
nohup
parallel
basename

9
Q

What are the environmental variables?

A

A set of variables that every running process has access to

Set using the export command
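
As a rough illustration in Python (not the shell), a variable placed in the environment is visible to child processes started afterwards, much like a variable set with export. The variable name COURSE and its value are made up for the example.

```python
import os
import subprocess
import sys

# Similar in effect to: export COURSE=COMP1314
os.environ["COURSE"] = "COMP1314"

# A child process inherits the parent's environment.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['COURSE'])"],
    capture_output=True,
    text=True,
)
print(child.stdout.strip())
```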

10
Q

How do you write a for loop in bash?

A

for var in directory/*;
do
something
done;

11
Q

How do you do a while loop in bash?

A

cat file | while read line;
do
something
done;

12
Q

What does grep do?

A

Searches the text provided for lines that match the pattern provided

Can even use regular expressions
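
The idea can be sketched in a few lines of Python. This is not how grep is implemented, and Python's regular expression syntax differs in places from grep's; the function name and sample lines are invented for illustration.

```python
import re


def grep(pattern, lines):
    """Return the lines that contain a match for the regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


log = ["error: disk full", "ok: backup done", "error: timeout"]
matches = grep(r"^error", log)  # lines that start with "error"
print(matches)
```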

13
Q

What does SED stand for?

A

Stream Editor

14
Q

What does SED do?

A

Reads the input provided, modifies it as specified by the command and then writes the result to the standard output

sed [options] command [file]
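
The most common sed command is substitution, s/old/new/. A loose Python analogue is sketched below; the function name is made up, and Python's regex dialect is not identical to sed's. Like sed, it replaces only the first match on each line unless asked to replace all of them (the g flag in sed).

```python
import re


def sed_substitute(pattern, replacement, lines, replace_all=False):
    """Apply a substitution to every line, like sed 's/pattern/replacement/'."""
    count = 0 if replace_all else 1  # count=0 means "replace every match"
    return [re.sub(pattern, replacement, line, count=count) for line in lines]


first_only = sed_substitute("cat", "dog", ["cat and cat"])
every_match = sed_substitute("cat", "dog", ["cat and cat"], replace_all=True)
print(first_only, every_match)
```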

15
Q

What does scp do?

A

Securely copies files between hosts over an SSH connection

16
Q

What does awk do?

A

Processes structured text files by treating each line as a row (record) split into columns (fields)
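
To show the row-and-column view, here is a small Python sketch of what an awk one-liner such as awk '{ s += $2 } END { print s }' computes. The function name and data are illustrative assumptions.

```python
def column_sum(lines, column, sep=None):
    """Sum one column (1-based, as in awk's $1, $2, ...) over all lines."""
    total = 0.0
    for line in lines:
        fields = line.split(sep)  # sep=None splits on runs of whitespace
        total += float(fields[column - 1])
    return total


total = column_sum(["apples 1", "pears 2.5"], column=2)
print(total)
```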

17
Q

What are wildcards for?

A

Allow one pattern to expand into multiple arguments for a command (accessing multiple files becomes easy). They use glob patterns such as * and ?, which look like regular expressions but are not the same
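
Shell wildcards follow glob rules, which differ from regular expressions. The Python sketch below uses the standard fnmatch module to apply glob rules and re to apply regex rules to the same pattern; the file names are invented.

```python
import fnmatch
import re

names = ["notes.txt", "notes1.txt", "data.csv"]
pattern = "notes?.txt"

# Glob rules: ? means "exactly one character" and . is a literal dot.
glob_hits = fnmatch.filter(names, pattern)

# Regex rules: s? means "an optional s" and . means "any one character".
regex_hits = [n for n in names if re.fullmatch(pattern, n)]

print(glob_hits, regex_hits)
```

The same pattern selects different files under the two sets of rules, which is why the distinction matters.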

18
Q

What are the three permission categories?

A

u: user (the owner of the file)
g: group
o: others

19
Q

How do you change the permissions of a file?

A

chmod u+x file
chmod g-w *

You can also specify an octal number such as 754. Each digit is converted into three bits (read, write, execute) and a permission is granted wherever the bit is a 1
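
The numeric form of chmod is normally written in octal, one digit each for user, group and others. A small Python sketch of how each digit maps to read, write and execute bits (the function name is hypothetical):

```python
def describe_mode(octal_mode):
    """Turn an octal permission string such as '754' into 'rwxr-xr--'."""
    result = ""
    for digit in octal_mode:          # one digit each for user, group, others
        bits = int(digit, 8)
        result += "r" if bits & 4 else "-"   # 4 = binary 100 = read
        result += "w" if bits & 2 else "-"   # 2 = binary 010 = write
        result += "x" if bits & 1 else "-"   # 1 = binary 001 = execute
    return result


print(describe_mode("754"))
```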

20
Q

Who can change the permissions of a file?

A

The owner of a file and the superuser

21
Q

What is everything in UNIX?

A

Either a file or a process; this includes directories.

22
Q

How can you move a process to the background?

A

bg

Adding a & symbol will begin the process in the background

23
Q

What are the two options with the kill command?

A

SIGTERM: A gentle request to kill, giving the process time to close
SIGKILL: A hard-request with no clean up time.

23
Q

How can you prevent all processes terminating when you log off?

A

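With screen, which keeps the session and its processes running after you log off (nohup does the same for a single command)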

24
Q

What is a CSV file?

A

Comma-separated values
Easy to manipulate and process but new lines and commas in text can be problematic
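
A short Python sketch of why commas and new lines inside a field are awkward. Splitting on commas by hand breaks the row, while a CSV parser that understands quoting recovers it. The sample data is invented.

```python
import csv
import io

rows = [["name", "comment"], ["Ada", "likes commas, and\nnew lines"]]

# Write the rows as CSV text; the awkward field is quoted automatically.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Naive approach: take the second line and split on commas.
naive = text.splitlines()[1].split(",")

# Proper approach: let the csv module handle quotes and embedded new lines.
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(naive)
print(parsed)
```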

25
Q

What does YAML stand for?

A

YAML Ain't Markup Language (originally Yet Another Markup Language)

Structure is expressed through indentation
Widely used within config files and passing messages between applications

Made of key-value pairs

26
Q

What does JSON stand for?

A

JavaScript Object Notation

27
Q

What are the benefits of data being understandable by a machine?

A

Searching
Aggregation and Summarisation
Prediction

28
Q

What is metadata?

A

Data about data. It is useful for the machine, but not usually for the human.

29
Q

What is markup?

A

Tags added to a document that describe its structure and meaning, including semantic links to other pages.

30
Q

What is SGML?

A

Standard Generalised Markup Language

A meta-language from which markup languages such as HTML and XML are derived

Separates structure from content

31
Q

What is XML?

A

Extensible Markup Language

Designed to carry data, not display data

32
Q

What must XML have?

A

A root element

33
Q

What is the issue with namespaces?

A

XML files can reference other XML files and people can define two tags with the same name

34
Q

How is the namespace problem resolved?

A

Using the special xmlns attribute to give each document's tags a namespace prefix, which tells the two sets of tags apart.
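
A minimal sketch using Python's standard xml.etree.ElementTree. Two tags are both called table, but each belongs to a different namespace declared with xmlns. The example.com URIs, prefixes and content are made up.

```python
import xml.etree.ElementTree as ET

doc = """
<order xmlns:f="http://example.com/furniture"
       xmlns:h="http://example.com/html">
  <f:table>Oak dining table</f:table>
  <h:table>rows and cells</h:table>
</order>
"""

root = ET.fromstring(doc)

# Map prefixes to namespace URIs so the two "table" tags can be told apart.
ns = {"f": "http://example.com/furniture", "h": "http://example.com/html"}
furniture = root.find("f:table", ns).text
html = root.find("h:table", ns).text

# Internally each tag is stored with its full namespace URI.
full_tag = root.find("f:table", ns).tag
print(furniture, "|", html, "|", full_tag)
```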

35
Q

What is a URI?

A

Uniform Resource Identifier: the unique full name of a namespace. By convention it is written like a URL, but the two are not the same and it need not point to a real page.

36
Q

What does a schema enable us to do?

A

Gives meaning to the structured data we work with. Lots of people have already done this, so there are a lot of different schemas

37
Q

Give an example schema

A

<Pizza>
<Base>Pan</Base>
<Cheese>Cheddar</Cheese>
</Pizza>

38
Q

How can we specify the number of occurrences in a sequence?

A

minOccurs
maxOccurs

39
Q

What does DTD stand for?

A

Document Type Definition

40
Q

What is XSD?

A

XML Schema Definition
Every element is either simple or complex

41
Q

How many parsers does HTML5 contain?

A

2

One which parses HTML and one which parses XML

42
Q

Why is XML not used for large scale solutions?

A

We could easily lose all of the data; a database can mitigate this.

43
Q

What is a DBMS?

A

Database Management System

A collection of software that manages a database

44
Q

Why do we need DBMS?

A

Handles all the data exchange and creates data independence

Applications don’t need to care about the database

45
Q

What is logical and physical independence?

A

Logical: Protects applications from changes in the logical structure of the data

Physical: Protects applications from changes in how the data is stored

46
Q

What does the DBMS manage?

A

The data model
Store large amounts of data persistently, conveniently and efficiently
Transaction management
Concurrency control
Access Control
Resiliency

47
Q

What is a data model?

A

A collection of mathematical concepts for describing data

Every model has a data language

48
Q

What are the two parts of a data language?

A

A data definition language
A data manipulation language

49
Q

What is XPATH?

A

The Data manipulation language for XML

50
Q

What properties does a relation have?

A

Each row represents a k-tuple of R
The ordering of rows is immaterial
All rows are distinct
The order of attributes should not be significant
The significance of each column is conveyed by the name we give it.

51
Q

What is a k-ary relation scheme?

A

A relation name and an ordered sequence of k attributes

52
Q

What is an instance of a relation scheme?

A

A relation that conforms to the schema

Arities match
Data types of attributes match

53
Q

What is a key?

A

A set of attributes where any two different tuples cannot have the same values for all of those attributes

54
Q

What is a superkey?

A

For a relation R and a set X of attributes of R, X is a superkey of R if X -> A for every attribute A of R

The set of all attributes will always be a superkey

55
Q

What is a candidate key?

A

X is a candidate key of R where X is a minimal superkey of R

Where X is a superkey and there is no superkey Y such that Y is a proper subset of X

56
Q

Describe the steps of the closure algorithm

A

X+ = {Ai : F ⊧ X → Ai} is the closure of X with respect to F
F ⊧ X → Y if and only if Y ⊆ X+

INPUT: R, attribute set U, F, X is a subset of U
OUTPUT: X+ = {Ai : F ⊧ X → Ai}
Start with X_0 = X. At each step, X_n+1 is X_n plus the right side of every dependency in F whose left side is a subset of X_n.
Repeat until: X_n+1 == X_n

Example 1:
R(ABCDEGH)
F = {AB -> CD, C -> EG, D -> H}
0. X = {A,C}
1. X_0 = {A,C}, the only dependency that applies is C -> EG
2. X_1 = {A,C,E,G}
3. X_2 = X_1
4. As X_2 is not all of R, this implies that X is not a superkey

Example 2 (same R and F):
0. X = {A,B}
1. X_0 = {A,B}
2. X_1 = {A,B,C,D}
3. X_2 = {A,B,C,D,E,G,H}
4. X_3 = X_2
5. As X_3 = R, this implies X is a superkey
6. X is also the only candidate key; this can be determined by applying the algorithm to each subset
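
The closure steps can be sketched in Python as follows. This is one straightforward way to write the loop, not an optimised algorithm. Because the dependency D -> H mentions H, the sketch assumes the attributes of R are A, B, C, D, E, G and H; the function and variable names are illustrative.

```python
def closure(attributes, fds):
    """Return the closure of a set of attributes under the given dependencies."""
    result = set(attributes)
    changed = True
    while changed:                      # repeat until X_n+1 == X_n
        changed = False
        for left, right in fds:
            # If the left side is already covered, add the right side.
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result


R = set("ABCDEGH")                      # assumed schema, see the note above
F = [("AB", "CD"), ("C", "EG"), ("D", "H")]

print(sorted(closure("AC", F)))         # stops at A, C, E, G
print(closure("AB", F) == R)            # AB determines every attribute
```

A set X is a superkey exactly when closure(X, F) equals R, so the same function can be used to test each subset of a candidate key for minimality.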
57
Q

Why is bad database design a problem?

A

It can lead to anomalies

58
Q

What is a functional dependency?

A

Whenever two tuples within a relation agree on the values of A1,…,An, they also agree on B

A1,…,An -> B

So the right side cannot differ between tuples where the left side is the same.
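
A small Python sketch that tests whether a functional dependency holds in a given set of tuples. It can only show that a dependency is violated or not violated by the sample data. The attribute names and rows are invented.

```python
def holds(rows, left, right):
    """Check whether left -> right holds in the given rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in left)
        value = tuple(row[b] for b in right)
        # The first value seen for a key is remembered; any later
        # disagreement for the same key breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"student": 1, "module": "DB", "lecturer": "Lee"},
    {"student": 2, "module": "DB", "lecturer": "Lee"},
    {"student": 1, "module": "OS", "lecturer": "Kim"},
]

print(holds(rows, ["module"], ["lecturer"]))   # same module, same lecturer
print(holds(rows, ["student"], ["module"]))    # student 1 takes two modules
```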

59
Q

What is splitting/combining in Functional dependencies?

A

We can split bigger functional dependencies into smaller ones and vice versa.

60
Q

What is normalisation for?

A

Avoiding anomalies

61
Q

Give examples of anomalies

A

Redundancy
Update anomalies
Insert anomalies
Deletion anomalies

62
Q

What is 1st Normal Form?

A

A relation that contains only atomic values with no repeating groups

63
Q

What is 2nd Normal Form?

A

No partial key dependencies
Every non-key attribute is dependent on the whole of every candidate key

64
Q

What is 3rd Normal Form?

A

All non-key attributes are determined only by the keys, so there are no transitive dependencies

65
Q

What is transitive dependence?

A

A -> B -> C but B does not determine A

66
Q

What is Boyce-Codd Normal Form?

A

Every determinant is a candidate key

67
Q

What are the benefits and drawbacks of BCNF?

A

Advantages:
No redundancy
Efficiency
No duplication
Changes can cascade across relations

Disadvantages:
More tables
More complex
More relationships
Queries become more complex

68
Q

What is SQL?

A

Structured Query Language
Converts a data model to a physical database by specifying a DDL and a DML

69
Q

What is SQLite?

A

A lightweight DBMS where all of the database is contained within a single file.
Suited to simple databases

70
Q

What types does SQLite use?

A

INTEGER
REAL
TEXT
BLOB
NULL

71
Q

What joins are possible in SQLite?

A

LEFT: All rows of table 1, plus the matching rows of table 2
RIGHT: All rows of table 2, plus the matching rows of table 1
FULL OUTER: All rows from both tables, matched where possible
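
A minimal sketch of a LEFT join using Python's built-in sqlite3 module and an in-memory database. Only LEFT is shown because older SQLite versions do not support RIGHT and FULL OUTER. The table and column names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE mark (student_id INTEGER, score INTEGER);
INSERT INTO student VALUES (1, 'Ada'), (2, 'Bob');
INSERT INTO mark VALUES (1, 70);
""")

# Every student appears; Bob has no mark, so his score is NULL (None).
left_rows = con.execute("""
    SELECT name, score
    FROM student LEFT JOIN mark ON mark.student_id = student.id
    ORDER BY student.id
""").fetchall()
print(left_rows)
```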

72
Q

What are views?

A

Named virtual tables defined by a stored query, which can be queried like ordinary tables. Often pre-joined for convenience

73
Q

What are indexes?

A

Data structures associated with tables to support queries, logically ordered by the values of the key.

74
Q

Why are indexes used?

A

To improve the performance of looking up data

75
Q

How is an index created?

A

CREATE INDEX <name> ON <table> (<column>)
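
A short sketch with Python's sqlite3 module, creating an index on one column and asking SQLite how it would run a lookup. The table, column and index names are made up, and the exact wording of the query plan varies between SQLite versions, so it is printed rather than relied upon.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, surname TEXT)")
con.execute("CREATE INDEX idx_student_surname ON student (surname)")

# The index is recorded in SQLite's catalogue table.
index_names = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(index_names)

# Ask the planner how it would execute a lookup on the indexed column.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE surname = 'Lee'"
).fetchall()
print(plan)
```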

76
Q

What does NO SQL stand for?

A

Not Only SQL
Less adherence to ACID and schemas

77
Q

What does NOSQL do?

A

Storage and retrieval in a non-tabular format
More flexible with various types of data.

78
Q

Why is NoSQL more flexible?

A

As it has no fixed schema and no fields are necessary

79
Q

What is scalability?

A

Ability of a system to handle increasing amounts of workload or data by adding resources to the system.

80
Q

What is vertical scalability?

A

Scaling up a single machine by adding more resources, such as CPUs or memory, to handle increased workload.

81
Q

What is horizontal scalability?

A

As workload increases, we distribute between multiple nodes and so we add more nodes.

82
Q

What architecture does NoSQL operate as?

A

Distributed Architecture

83
Q

What is sharding?

A

Partitioning data across multiple nodes to distribute the workload

84
Q

What does sharding require and allow?

A

Every node is responsible for the data on it, and a request can be handled in parallel. The system will require a shard management system to keep track of shards.

85
Q

What is a shard key?

A

An attribute used to determine how data is distributed between nodes
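
One common way to use a shard key is to hash it and take the result modulo the number of nodes. The Python sketch below shows that idea only; real systems often use range-based or consistent-hashing schemes instead. The function name and keys are hypothetical.

```python
import hashlib


def shard_for(shard_key, node_count):
    """Map a shard key to a node number in the range 0 .. node_count - 1."""
    digest = hashlib.sha256(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# The same key always lands on the same node.
placement = {key: shard_for(key, 4) for key in ["user-1", "user-2", "user-3"]}
print(placement)
```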

86
Q

Why is data replicated between multiple nodes?

A

To provide fault tolerance and high availability

The replication factor determines the number of replicas

87
Q

What is CAP theorem?

A

Consistency - Every read returns the most recent write
Availability - Every request receives a response, without guaranteeing that it is the most recent data
Partition Tolerance - The system continues to operate despite network partitions.

88
Q

What are trade offs with CAP?

A

You can’t have all of CAP at once, so we have CP or AP.
If the network is partitioned and availability is sacrificed - CP

If the network is partitioned and consistency is sacrificed - AP

89
Q

When is CA possible?

A

In a single-node system or partition free environment.

90
Q

What is BASE?

A

BA: Basically Available
System guarantees availability, will always respond to a request

S: Soft State
The state of the database changes over time even with no new input

E: Eventual Consistency
System does not guarantee immediate consistency across all nodes after a write operation. Instead it ensures that, if no new updates are made, all nodes will eventually converge to the same state.

91
Q

What are the types of NoSQL database?

A

Key-value
Document
Graph
Column-family

92
Q

What is key-value style NoSQL?

A

Storing data as a collection of key-value pairs with no schema. Each item has a unique key used to access it.

93
Q

What is Document style NoSQL?

A

Semi-structured format, typically in JSON, BSON or XML.
Each document is a self-contained unit of data that can contain key-value pairs. Allows hierarchical data.

94
Q

What is graph style NoSQL?

A

Storing data in the form of a graph with nodes and edges. This is good for interconnected data with lots of relationships.

95
Q

What is a column-family style NoSQL?

A

Organises data into columns rather than rows. Handles large volumes of data with support for distributed architectures.

Each row can have a different set of columns

96
Q

What are hybrid NoSQL databases?

A

Combining features from multiple types to offer flexibility to handle diverse data models.

97
Q

What factors should be considered when picking a type of NoSQL database?

A

Current Data
Application requirements
Evaluate NoSQL database types
Consider consistency models
Assess scalability and performance
Examine operational conditions
Evaluate ecosystem and integration
Perform a proof of concept.

98
Q

Explain the basics of using XPATH

A

// Select matching nodes at any depth
/ Step to a child node
@ Select an attribute
[ ] Filter by a condition
text() Only the text stored within the element
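
A minimal sketch using Python's xml.etree.ElementTree, which supports only a subset of XPath: it handles //, / and [@attribute='value'] filters, but not text() or selecting an attribute with @, so the text and attribute are read from the matched elements instead. The menu document is invented.

```python
import xml.etree.ElementTree as ET

doc = """
<Menu>
  <Pizza size="large"><Base>Pan</Base><Cheese>Cheddar</Cheese></Pizza>
  <Pizza size="small"><Base>Thin</Base><Cheese>Mozzarella</Cheese></Pizza>
</Menu>
"""
root = ET.fromstring(doc)

# // : find Base elements at any depth below the root.
all_bases = [e.text for e in root.findall(".//Base")]

# / and [ ] : step through children, filtering Pizza by its size attribute.
large_cheese = [e.text for e in root.findall("./Pizza[@size='large']/Cheese")]

# Attribute values are read from the element rather than selected with @.
sizes = [p.get("size") for p in root.findall("Pizza")]

print(all_bases, large_cheese, sizes)
```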

99
Q

How do indexes work?

A

The chosen attribute for the index becomes the index key. The index is typically implemented as either a B-tree or a hash index.

The DBMS plans the best way to execute a query with its indexes.

100
Q

What is the drawback of indexes?

A

They slow down INSERT, UPDATE and DELETE queries as they must account for the indexes and update them as well.

101
Q

How do you create views?

A

CREATE VIEW Name AS SELECT …
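
A small sketch with Python's sqlite3 module: the view stores a join under a name, and can then be queried like a table. The table, column and view names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE mark (student_id INTEGER, score INTEGER);
INSERT INTO student VALUES (1, 'Ada'), (2, 'Bob');
INSERT INTO mark VALUES (1, 70), (2, 55);

-- The view holds the query, not a copy of the data.
CREATE VIEW student_mark AS
    SELECT name, score
    FROM student JOIN mark ON mark.student_id = student.id;
""")

passed = con.execute(
    "SELECT name FROM student_mark WHERE score >= 60").fetchall()
print(passed)
```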

102
Q

What is ACID in relational database transactions?

A

Atomicity
Consistency
Isolation
Durability

103
Q

What is atomicity in a transaction?

A

Each transaction is treated as a single unit. Either the entire transaction is executed, or none of it is.
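
Atomicity can be seen with Python's sqlite3 module. In this sketch a transfer is made of two updates; the second one violates a CHECK constraint, so the whole transaction is rolled back and the first update is undone as well. The table, names and amounts are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE account (
    name TEXT PRIMARY KEY,
    balance INTEGER CHECK (balance >= 0))""")
con.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
con.commit()

try:
    with con:  # commits if the block succeeds, rolls back if it raises
        con.execute(
            "UPDATE account SET balance = balance + 150 WHERE name = 'bob'")
        # Alice only has 100, so this breaks the CHECK constraint.
        con.execute(
            "UPDATE account SET balance = balance - 150 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass  # the failed transfer leaves no partial changes behind

balances = dict(con.execute("SELECT name, balance FROM account"))
print(balances)
```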

104
Q

What is consistency in a transaction?

A

Changes to tables can only happen in predefined, predictable ways.

105
Q

What is isolation in a transaction?

A

Concurrent transactions from multiple users are kept separate, so that one user's transaction does not affect another's.

106
Q

What is durability in a transaction?

A

Ensures that changes to your data made by successful transactions are saved, even in the event of system failure.

107
Q

Give an example of each type of NoSQL database.

A

Column-family:
Cassandra
Key-value:
DynamoDB
Document:
MongoDB
Graph:
Neo4j