Week 2 - Data Management Flashcards

1
Q

What is Data management?

A

Data management includes the collection, storage, retrieval, quality assurance, and security of data

2
Q

Explain the data-information-knowledge-wisdom (DIKW) hierarchy

A

Data: raw observations of the world
Information: data that has been processed to provide meaning
Knowledge: what makes it possible to transform information into instructions; knowing how to do something
Wisdom: integrated, actionable insight

3
Q

What is meta-data?

A

Data that describes the properties or characteristics of end-user data and the context of those data

4
Q

What does meta-data do?

A

It improves searchability, categorisation, and the efficiency of data management
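
A minimal Python sketch of how metadata aids search. The catalogue entries, the field names (owner, format, tags) and the search_by_tag helper are all hypothetical, chosen only for illustration: datasets are found by inspecting their metadata, without opening the data itself.

```python
# Hypothetical catalogue: each entry pairs a dataset with metadata describing it
catalogue = [
    {"file": "sales_2024.csv", "metadata": {"owner": "finance", "format": "CSV", "tags": {"sales", "2024"}}},
    {"file": "staff.json", "metadata": {"owner": "hr", "format": "JSON", "tags": {"people"}}},
    {"file": "sales_2023.csv", "metadata": {"owner": "finance", "format": "CSV", "tags": {"sales", "2023"}}},
]

def search_by_tag(entries, tag):
    """Find datasets by a tag in their metadata; the data itself is never read."""
    return [entry["file"] for entry in entries if tag in entry["metadata"]["tags"]]

print(search_by_tag(catalogue, "sales"))  # ['sales_2024.csv', 'sales_2023.csv']
```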

5
Q

Structured data

A

Strictly organised so that it is easily searchable, e.g. a database table with a rigid schema

6
Q

Unstructured data

A

Has no predefined structure and requires special handling, e.g. an email body or a social media post

7
Q

Semi-structured data

A

A mix of structured and unstructured data, e.g. JSON or XML: labelled fields but no rigid schema
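
JSON is a common example of semi-structured data: records carry labelled fields, but different records need not share the same fields. A small sketch using Python's standard json module; the records are made up for illustration.

```python
import json

# Hypothetical records: both share "id" and "name", but the other fields differ
raw = """
[
  {"id": 1, "name": "Ada", "email": "ada@example.com"},
  {"id": 2, "name": "Lin", "tags": ["vip", "newsletter"]}
]
"""

records = json.loads(raw)

# Each record describes itself through its keys; there is no fixed table schema,
# so the code has to cope with fields that may be missing
for record in records:
    email = record.get("email", "<no email>")
    print(record["id"], record["name"], email)
```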

8
Q

What is a database?

A

An organised collection of logically related data

9
Q

What are the key goals of a data management system?

A

Data integrity: ensuring accuracy and consistency
Data security: protecting sensitive information
Scalability: adapting to growing amounts of data
Collaboration: enabling cross-functional access and analysis

10
Q

What are the fundamental database operations?

A

Create, read (retrieve), update, delete (CRUD)
These operations form the basis of data manipulation and access
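
In SQL the four operations correspond to INSERT, SELECT, UPDATE and DELETE. The sketch below runs them against an in-memory SQLite database using Python's built-in sqlite3 module; the books table is a hypothetical example.

```python
import sqlite3

# In-memory database, so the example leaves nothing on disk
conn = sqlite3.connect(":memory:")

# Schema setup (DDL); not itself one of the four CRUD operations
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, copies INTEGER)")

# Create: insert a new row
conn.execute("INSERT INTO books (title, copies) VALUES (?, ?)", ("Dune", 3))

# Read: retrieve rows
after_create = conn.execute("SELECT title, copies FROM books").fetchall()

# Update: change an existing row, then read it back
conn.execute("UPDATE books SET copies = copies - 1 WHERE title = ?", ("Dune",))
after_update = conn.execute("SELECT copies FROM books WHERE title = ?", ("Dune",)).fetchone()

# Delete: remove the row, then count what is left
conn.execute("DELETE FROM books WHERE title = ?", ("Dune",))
after_delete = conn.execute("SELECT COUNT(*) FROM books").fetchone()

conn.commit()
print(after_create, after_update, after_delete)  # [('Dune', 3)] (2,) (0,)
```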

11
Q

What does ACID stand for?

A

Atomicity: all-or-nothing; a transaction is the smallest, indivisible unit of work [buying a concert ticket]
Consistency: ensuring that transactions bring the database from one valid state to another [library checkout]
Isolation: ensuring concurrent transactions are processed independently, without interfering with each other [booking airplane seats]
Durability: guarantees that once a transaction is committed, it persists even in the event of a system failure [saving a paper]
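
Atomicity can be sketched with Python's sqlite3 module: a connection used as a context manager commits when the block succeeds and rolls back if an exception escapes it. The tables, the buy_ticket helper and the simulated payment failure are hypothetical, modelled loosely on the concert-ticket example: either the seat is reserved and the booking recorded, or neither happens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (concert TEXT PRIMARY KEY, available INTEGER)")
conn.execute("CREATE TABLE bookings (concert TEXT, customer TEXT)")
conn.execute("INSERT INTO seats VALUES ('Gig', 10)")
conn.commit()

def buy_ticket(customer, payment_ok):
    """Reserve a seat and record the booking as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE seats SET available = available - 1 WHERE concert = 'Gig'")
            conn.execute("INSERT INTO bookings VALUES ('Gig', ?)", (customer,))
            if not payment_ok:
                raise RuntimeError("payment declined")  # simulated failure
    except RuntimeError:
        pass  # both statements above have already been rolled back

buy_ticket("Ada", payment_ok=True)
buy_ticket("Lin", payment_ok=False)

seats_left = conn.execute("SELECT available FROM seats").fetchone()[0]
booking_count = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
print(seats_left, booking_count)  # 9 1
```

The failed purchase leaves no trace: the seat count and the bookings table are exactly as they were before it started.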

12
Q

ACID are the principles that …

A

Ensure reliable transactions in a database

13
Q

Tabular data

A

[+] ideal for small amounts of data
[+] easy to create and use
[-] not suitable for complex relationships; only two-dimensional

14
Q

CSV files

A

Plain-text files of comma-separated values, often used to exchange data between different systems
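
Python's standard csv module handles the quoting rules, e.g. a value that itself contains a comma. A minimal sketch, using an in-memory io.StringIO buffer in place of a real file; the data is made up.

```python
import csv
import io

# io.StringIO stands in for a file so the example is self-contained
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "city"])
writer.writerow(["Ada", "London"])
writer.writerow(["Lin", "Paris, France"])  # the embedded comma forces quoting

text = buffer.getvalue()
print(text)

# Reading back: DictReader uses the header row as the keys of each row
rows = list(csv.DictReader(io.StringIO(text)))
print(rows[1]["city"])  # Paris, France
```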

15
Q

What is a relational database?

A

A collection of tables (relations) that are linked to one another through shared keys

16
Q

What does a relational database enable?

A

Enables complex queries and data manipulation
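
A sketch of a query that spans two related tables, using an in-memory SQLite database via Python's sqlite3 module. The customers/orders schema and data are hypothetical; the customer_id column is what relates each order to a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- links each order to a customer
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 30.0);
""")

# A join combines the related tables; GROUP BY aggregates per customer
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)  # [('Ada', 40.0), ('Lin', 30.0)]
```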

17
Q

What are the benefits of a relational database?

A

Data integrity, flexibility, scalability, security

18
Q

What is normalisation?

A

Process of organising data in a database to reduce redundancy

19
Q

How do you normalise data?

A

Divide large tables into smaller, related tables and define relationships between them
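
A toy sketch in plain Python of splitting one denormalised table in two. The rows are hypothetical, and identifying customers by name is a simplifying assumption (a real schema would use a proper key). After the split, each customer's email is stored once instead of being repeated on every order.

```python
# Hypothetical denormalised rows: customer details repeat on every order
flat_orders = [
    {"order_id": 10, "customer": "Ada", "email": "ada@example.com", "amount": 25.0},
    {"order_id": 11, "customer": "Ada", "email": "ada@example.com", "amount": 15.0},
    {"order_id": 12, "customer": "Lin", "email": "lin@example.com", "amount": 30.0},
]

customers = {}     # customer_id -> customer details, stored once
orders = []        # each order refers to its customer by id (a foreign key)
ids_by_name = {}   # assumption: names are unique, used only to assign ids

for row in flat_orders:
    name = row["customer"]
    if name not in ids_by_name:
        new_id = len(ids_by_name) + 1
        ids_by_name[name] = new_id
        customers[new_id] = {"name": name, "email": row["email"]}
    orders.append({
        "order_id": row["order_id"],
        "customer_id": ids_by_name[name],
        "amount": row["amount"],
    })

print(customers)
print(orders)
```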

20
Q

What are the goals of normalisation?

A

1. Improve data integrity and consistency
2. Optimise storage and query performance

21
Q

Columnar database

A

Data is stored by column rather than by row. Typically used for data warehousing

22
Q

What are the pros and cons of columnar databases?

A

[+] efficient data compression, since values in a column share the same type
[+] fast queries that sum, count, average or otherwise aggregate values
[-] not suited to OLTP (online transaction processing)
[-] slower for write operations
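
A plain-Python illustration (not a real columnar engine) of why the column layout suits aggregation and compression. The sales data is made up, and run-length encoding is just one simple compression scheme chosen here for illustration.

```python
# Row-oriented layout: each record's fields are stored together
rows = [
    ("2024-01-01", "north", 120),
    ("2024-01-01", "south", 80),
    ("2024-01-02", "north", 95),
]

# Column-oriented layout: each column's values are stored together
columns = {
    "date": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "sales": [r[2] for r in rows],
}

# Aggregating one column touches only that column's values,
# instead of reading every field of every row
total_sales = sum(columns["sales"])
print(total_sales)  # 295

def run_length_encode(values):
    """Collapse runs of repeated adjacent values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded

# Same-typed, often repeated values sit side by side, so they compress well
print(run_length_encode(columns["date"]))  # [['2024-01-01', 2], ['2024-01-02', 1]]
```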

23
Q

What are document databases?

A

NoSQL databases with no fixed schema, designed to store, retrieve, and manage document-oriented information (e.g. JSON-like documents)

24
Q

Schema Flexibility

A

Document databases typically allow for a flexible schema within the documents. Documents within the same collection may have different fields and structures

25
Q

Hierarchical Data representation

A

Documents can contain nested structures, arrays, and other complex data types, making them suitable for hierarchical data
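
Nested, flexibly shaped documents can be mimicked with Python dicts. This is not a database: the users collection and the get_path/find helpers are hypothetical, loosely modelled on the dotted field paths some document databases (e.g. MongoDB) use to query nested fields.

```python
# A "collection" of documents modelled as Python dicts (no database involved)
users = [
    {"_id": 1, "name": "Ada",
     "address": {"city": "London", "postcode": "N1"},
     "orders": [{"item": "book", "qty": 2}]},
    {"_id": 2, "name": "Lin",
     "address": {"city": "Paris"},   # no postcode: the schema is flexible
     "phone": "555-0100"},           # a field the first document lacks
]

def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def find(collection, path, expected):
    """Return the documents whose nested field at `path` equals `expected`."""
    return [doc for doc in collection if get_path(doc, path) == expected]

print([d["name"] for d in find(users, "address.city", "Paris")])  # ['Lin']
```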

26
Q

Distributed architecture

A

Document databases are often distributed and can scale horizontally across multiple nodes or clusters

27
Q

Indexing and querying

A

Fields within documents can be indexed, allowing efficient search and retrieval of documents

28
Q

Lack of ACID transactions

A

Document databases may not support full ACID properties across multiple documents or collections

29
Q

Graph Databases

A

Data entities are represented as nodes, and the relationships between them as edges

30
Q

Graph schema

A

Some graph databases use a schema that defines the types of nodes and relationships, while others are schema-less, allowing for more flexibility

31
Q

Graph query language

A

Graph databases support graph query languages, such as Cypher or the ISO standard GQL, to enable efficient querying and manipulation of graph structures

32
Q

Directed and undirected graphs

A

Directed: edges point one way (one-way relationship)
Undirected: edges have no direction (two-way relationship)
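
A small adjacency-list sketch in Python; the build_graph helper and the example edges are hypothetical. In the undirected case each edge is simply stored in both directions.

```python
from collections import defaultdict

def build_graph(edges, directed):
    """Return an adjacency list mapping each node to its set of neighbours."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
        if not directed:
            graph[target].add(source)  # undirected: the edge works both ways
    return graph

# Directed: "follows" is one-way (Ada follows Lin, not the reverse)
follows = build_graph([("Ada", "Lin")], directed=True)
# Undirected: "friends with" is mutual
friends = build_graph([("Ada", "Lin")], directed=False)

print("Ada" in follows["Lin"])  # False
print("Ada" in friends["Lin"])  # True
```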

33
Q

Graph databases are useful for

A

Highly interconnected data, where the relationships are the defining characteristic of the data and queries are based on the relationships themselves. The graph represents the model in a natural and intuitive way.

34
Q

Uses of graph databases?

A

Social connections (LinkedIn), knowledge discovery, recommender systems (TikTok)
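
A friends-of-friends sketch of the kind of relationship query behind connection suggestions. It is a simplified illustration, not how LinkedIn or TikTok actually work: the graph and the suggest_connections helper are hypothetical, and candidates are ranked only by their number of mutual connections.

```python
from collections import Counter

# Hypothetical undirected "connected to" graph as an adjacency list
connections = {
    "Ada": {"Lin", "Sam"},
    "Lin": {"Ada", "Kai"},
    "Sam": {"Ada", "Kai", "Zoe"},
    "Kai": {"Lin", "Sam"},
    "Zoe": {"Sam"},
}

def suggest_connections(graph, person):
    """Rank people two hops away by how many mutual connections they share."""
    direct = graph[person]
    counts = Counter()
    for friend in direct:
        for candidate in graph[friend]:
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    return counts.most_common()

print(suggest_connections(connections, "Ada"))  # [('Kai', 2), ('Zoe', 1)]
```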