Week 2 - Data Management Flashcards

1
Q

What is data management?

A

It includes the collection, storage, retrieval, quality assurance, and security of data

2
Q

Explain data-information-knowledge-wisdom (DIKW)

A

Data: raw observations of the world
Information: data that has been processed to provide meaning
Knowledge: what makes it possible to turn information into instructions, or knowing how to do something
Wisdom: insight that is integrated and actionable

3
Q

What is metadata?

A

Data that describes the properties or characteristics of end-user data and the context of those data

4
Q

What does metadata do?

A

It enhances searchability, categorisation, and the efficiency of data management

5
Q

Structured data

A

Strictly organised such that it is easily searchable - database with a rigid schema

6
Q

Unstructured data

A

Has no predefined schema and requires special handling - e.g. an email body or a social media post

7
Q

Semi-structured data

A

Mix of structured and unstructured data - no rigid schema, but tags or keys give it some organisation (e.g. JSON, XML)

8
Q

What is a database?

A

An organised collection of logically related data

9
Q

Data management system

A

Data integrity: ensuring accuracy and consistency
Data security: protecting sensitive information
Scalability: adapting to growing amounts of data
Collaboration: enabling cross-functional access and analysis

10
Q

What are the fundamental database operations?

A

Create, read (retrieve), update, delete (CRUD)
They form the basis of data manipulation and access
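
As a rough illustration, the four operations can be sketched with Python's built-in sqlite3 module. The `student` table and its values are hypothetical and chosen only for this example.

```python
import sqlite3

# In-memory database; the table name and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Create: insert a new row.
conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (1, "Ada"))

# Read: retrieve the row that was just inserted.
name_after_create = conn.execute(
    "SELECT name FROM student WHERE id = ?", (1,)
).fetchone()[0]

# Update: change a value in the existing row.
conn.execute("UPDATE student SET name = ? WHERE id = ?", ("Ada L.", 1))
name_after_update = conn.execute(
    "SELECT name FROM student WHERE id = ?", (1,)
).fetchone()[0]

# Delete: remove the row again.
conn.execute("DELETE FROM student WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```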

11
Q

What does ACID stand for?

A

Atomicity: all-or-nothing approach; a transaction is the smallest unit of work and either completes fully or not at all [buying concert ticket]
Consistency: ensuring that transactions bring the database from one valid state to another [library checkout]
Isolation: making sure transactions are processed independently [airplane seat tickets]
Durability: guarantees that once a transaction is committed, it will remain even in the case of a system failure [saving a paper]
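
A minimal sketch of atomicity and consistency, assuming a hypothetical `account` table in SQLite with a CHECK constraint that forbids negative balances. The `transfer` helper is an illustrative name, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("buyer", 50), ("venue", 0)])
conn.commit()

def transfer(conn, amount):
    """Move `amount` from buyer to venue; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'venue'",
                (amount,),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'buyer'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was applied.
        return False

ok = transfer(conn, 30)       # buyer can afford this ticket
failed = transfer(conn, 100)  # buyer cannot; the venue credit is rolled back too
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the venue is credited first, but the rollback undoes that credit, so the database never ends up in a half-finished state.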

12
Q

ACID are the principles that …

A

Ensure reliable transactions in a database

13
Q

Tabular data

A

[+] ideal for small amounts of data
[+] easy to create and use
[-] not suitable for complex relationships, only 2 dimensional

14
Q

CSV files

A

Plain-text file of comma-separated values, often used for data exchange between different systems
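
A small sketch of such an exchange using Python's standard csv module: one side writes rows as CSV text, the other reads them back. The column names and rows are hypothetical.

```python
import csv
import io

# Rows to exchange; every value is a string because CSV carries no type information.
rows = [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Grace"}]

# "Exporting" system: write a header line followed by one line per row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# "Importing" system: parse the same text back into dictionaries.
read_back = list(csv.DictReader(io.StringIO(csv_text)))
```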

15
Q

What is a relational database?

A

It is a collection of tables (relations) that are linked to each other through keys

16
Q

What does a relational database enable?

A

Enables complex queries and data manipulation

17
Q

What are the benefits of a relational database?

A

Data integrity, flexibility, scalability, security

18
Q

What is normalisation?

A

Process of organising data in a database to reduce redundancy

19
Q

How do you normalise data?

A

Divide large tables into smaller, related tables and define relationships between them
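
The idea can be sketched in plain Python, assuming a hypothetical flat orders table in which customer details are repeated on every row. Splitting it gives one customer table and one order table linked by `customer_id`.

```python
# Hypothetical denormalised rows: customer name and email repeat for every order.
orders_flat = [
    {"order_id": 1, "customer": "Ada", "email": "ada@example.com", "item": "book"},
    {"order_id": 2, "customer": "Ada", "email": "ada@example.com", "item": "pen"},
    {"order_id": 3, "customer": "Grace", "email": "grace@example.com", "item": "lamp"},
]

customer_ids = {}    # email -> customer_id, used to detect repeated customers
customer_table = []  # one row per customer
order_table = []     # one row per order, pointing at a customer by id

for row in orders_flat:
    if row["email"] not in customer_ids:
        customer_ids[row["email"]] = len(customer_ids) + 1
        customer_table.append({
            "customer_id": customer_ids[row["email"]],
            "name": row["customer"],
            "email": row["email"],
        })
    order_table.append({
        "order_id": row["order_id"],
        "customer_id": customer_ids[row["email"]],  # relationship between the tables
        "item": row["item"],
    })
```

Each customer's email is now stored once, so changing it means updating a single row.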

20
Q

What are the goals of normalisation?

A

  1. Improve data integrity and consistency
  2. Optimise storage and query performance

21
Q

Columnar database

A

Data is stored by column rather than by row. Typically used for data warehousing

22
Q

What are the pros and cons of columnar databases?

A

[+] efficient data compression - values in a column share the same type
[+] fast for queries that sum, count, average or otherwise aggregate values
[-] not suited for OLTP (online transaction processing)
[-] slower for write operations
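
A toy sketch of the trade-off, using plain Python lists rather than a real columnar engine. The `row_store` data and the `append_row` helper are hypothetical.

```python
# Row-oriented layout: each record is kept together.
row_store = [
    {"id": 1, "region": "north", "sales": 120},
    {"id": 2, "region": "south", "sales": 80},
    {"id": 3, "region": "north", "sales": 200},
]

# Column-oriented layout: one list per column, each holding values of one type.
column_store = {key: [row[key] for row in row_store] for key in row_store[0]}

# Aggregation reads only the "sales" column and ignores the others.
total_sales = sum(column_store["sales"])

def append_row(store, row):
    """Writing one record has to touch every column list."""
    for key, values in store.items():
        values.append(row[key])

append_row(column_store, {"id": 4, "region": "south", "sales": 50})
```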

23
Q

What are document databases?

A

NoSQL databases with no fixed schema, designed to store, retrieve, and manage document-oriented information

24
Q

Schema Flexibility

A

Document databases typically allow for a flexible schema within the documents. Documents within the same collection may have different fields and structures
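
A minimal sketch of what this looks like, with a Python list of dictionaries standing in for a collection. The field names are hypothetical and no real document database is involved.

```python
import json

# Two documents in the same collection with different fields and structures.
collection = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace", "address": {"city": "Arlington"},
     "tags": ["navy", "compiler"]},
]

# Queries have to tolerate missing fields, so defaults are supplied with .get().
with_tag = [doc["_id"] for doc in collection if "compiler" in doc.get("tags", [])]
cities = [doc.get("address", {}).get("city") for doc in collection]

# Documents are commonly stored and exchanged as JSON.
serialised = json.dumps(collection)
```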

25
Q

Hierarchical data representation

A

Documents can contain nested structures, arrays, and other complex data types, making them suitable for hierarchical data

26
Q

Distributed architecture

A

Document databases are typically distributed and can scale horizontally across multiple nodes or clusters

27
Q

Indexing and querying

A

Indexing and query support allow for efficient search and retrieval of documents

28
Q

Lack of ACID transactions

A

Document databases may not support full ACID properties across multiple documents or collections

29
Q

Graph databases

A

Data entities are represented as nodes and the relationships between them as edges

30
Q

Graph schema

A

Some graph databases enforce a schema that defines the types of nodes and relationships, while others are schema-less, allowing for more flexibility

31
Q

Graph query language

A

Graph databases support graph query languages such as Cypher to enable efficient querying and manipulation of graph structures

32
Q

Directed and undirected graphs

A

Directed: one-way relationship
Undirected: two-way relationship

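The difference can be sketched with an adjacency-set representation in Python. The `build_graph` helper and the node names are hypothetical.

```python
from collections import defaultdict

def build_graph(edges, directed):
    """Return a node -> set-of-neighbours mapping built from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        if not directed:
            # An undirected edge can be followed in both directions.
            adjacency[target].add(source)
    return adjacency

edges = [("ana", "ben"), ("ben", "cy")]
follows = build_graph(edges, directed=True)   # one-way, like "follows"
friends = build_graph(edges, directed=False)  # two-way, like "is friends with"
```

With the same edge list, "ben" points back to "ana" only in the undirected graph.
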
33
Q

What are graph databases useful for?

A

Highly interconnected data, where the relationships are the defining characteristic of the data and queries are based on the relationships themselves. The graph represents the model in a natural and intuitive way.

34
Q

Uses of graph databases?

A

Social connections (LinkedIn), knowledge discovery, recommender systems (TikTok)

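A toy sketch of a relationship-based query of the kind a recommender might run: suggest friends of friends who are not yet direct connections. The graph and the `suggest` helper are hypothetical, and real systems use far richer signals.

```python
# Undirected friendship graph as node -> set of neighbours (made-up data).
friends = {
    "ana": {"ben"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ben"},
    "dee": {"ben"},
}

def suggest(graph, person):
    """Return friends-of-friends of `person`, excluding existing friends and self."""
    direct = graph.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    return sorted(candidates - direct - {person})

suggestions_for_ana = suggest(friends, "ana")
suggestions_for_ben = suggest(friends, "ben")
```
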