Data Types Flashcards
What is structured data?
- Organised🗂️
- Labelled🏷️
- Table format 🔢
- Quick retrieval💨
What can you do with structured data?
- Sort 🔼
- Aggregate ➕
- Query with SQL📟
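A minimal sketch of the three operations above, using Python's built-in sqlite3 module. The sales table and its values are made up for illustration.

```python
import sqlite3

# Hypothetical sales data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Shirt", 3), ("Jeans", 1), ("Shirt", 2), ("Hat", 5)],
)

# Sort: order rows by quantity, largest first.
sorted_rows = conn.execute(
    "SELECT product, qty FROM sales ORDER BY qty DESC"
).fetchall()

# Aggregate: total quantity per product.
totals = conn.execute(
    "SELECT product, SUM(qty) FROM sales GROUP BY product ORDER BY product"
).fetchall()

# Query: keep only the rows that match a condition.
shirts = conn.execute(
    "SELECT qty FROM sales WHERE product = ? ORDER BY qty", ("Shirt",)
).fetchall()
```

Because every row has the same labelled columns, each operation is a single SQL statement.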
What’s an example of structured data?
CSV file
A Collibra export to Excel that was already in a structured format, brought into Google Sheets for analysis
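As a rough illustration of why a CSV file counts as structured, the sketch below reads one with Python's csv module. The column names and rows are hypothetical and do not come from a real export.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a file.
csv_text = (
    "asset,owner,status\n"
    "Customer table,Ana,Approved\n"
    "Sales report,Ben,Draft\n"
)

# The header row labels every column, so each record becomes a dict.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Labelled columns make filtering straightforward.
approved = [row["asset"] for row in rows if row["status"] == "Approved"]
```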
What is unstructured data?
Data with no set structure or format
How can you analyse unstructured data?
Pre-processing methods:
Text mining ⚒️
Natural Language Processing🗣️
Image Recognition🖼️
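A very small text mining sketch, assuming the unstructured data is a handful of free-text reviews. The reviews and the stop-word list are invented for illustration, and real NLP pipelines do far more than count words.

```python
import re
from collections import Counter

# Hypothetical free-text reviews with no fixed format.
reviews = [
    "Great shoes, fast delivery!",
    "Delivery was slow but the shoes are great.",
    "Terrible fit. Slow delivery.",
]

# Pre-processing: lowercase the text and pull out runs of letters as words.
tokens = []
for review in reviews:
    tokens.extend(re.findall(r"[a-z]+", review.lower()))

# Drop a few common filler words (an illustrative stop-word list).
stop_words = {"the", "was", "but", "are"}
word_counts = Counter(t for t in tokens if t not in stop_words)

# The most frequent word hints at what customers talk about most.
top_words = word_counts.most_common(1)
```

Once the text is turned into counts, it can be sorted and aggregated like structured data.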
What’s an example of unstructured data?
customer reviews on social media
What is a relational database?
A database structured into tables made up of rows and columns
What are the tables within a relational database joined by?
Primary key: a unique identifier for each record, referenced from other tables as a foreign key
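A sketch of two tables joined on a key, using sqlite3. The customers and orders tables are hypothetical; the orders table stores customer_id so each order can be matched to a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# customer_id is the primary key: one unique value per customer record.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), "
    "item TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "Shirt"), (11, 2, "Hat"), (12, 1, "Jeans")],
)

# Join the tables by matching orders.customer_id to customers.customer_id.
joined = conn.execute(
    "SELECT o.order_id, c.name, o.item "
    "FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id "
    "ORDER BY o.order_id"
).fetchall()

# A second record with an existing primary key value is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Cara')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```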
What is normalisation?
A process for organising data into tables to reduce duplication and keep it maintainable
What are the steps in normalisation called?
Normal forms
What are the 3 normal forms?
First, second, third - standards used to structure tables
What is the first normal form (1NF)?
- Unique column names🏷️
- Indivisible columns ➗
E.g. in a Products table, you may have started with “Clothing, Casual” in the “Categories” column, but 1NF means “Clothing” and “Casual” are split into separate rows
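The split in the example can be sketched in plain Python. The product names and IDs are hypothetical; only the “Clothing, Casual” value comes from the card.

```python
# Before 1NF: the Categories column holds more than one value per row.
products = [
    {"ProductID": 1, "ProductName": "T-shirt", "Categories": "Clothing, Casual"},
    {"ProductID": 2, "ProductName": "Blazer", "Categories": "Clothing"},
]

# After 1NF: one indivisible category value per row.
first_normal_form = [
    {
        "ProductID": p["ProductID"],
        "ProductName": p["ProductName"],
        "Category": category.strip(),
    }
    for p in products
    for category in p["Categories"].split(",")
]
```

The T-shirt now takes up two rows, one for each category.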
What is the second normal form (2NF)?
- 1NF plus…
- Divided into tables 🪑
- With primary keys 🗝️
E.g. ProductName depends only on ProductID, so the two are moved into a separate Products table, apart from the Sales table, which contains SaleID, Qty and CustomerName
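A sketch of that split in plain Python. The rows are invented, and keeping ProductID in the Sales table as a link back to Products is an assumption, since the card does not list it.

```python
# Hypothetical flat table where ProductName repeats for every sale.
sales_flat = [
    {"SaleID": 1, "ProductID": 101, "ProductName": "T-shirt", "Qty": 2, "CustomerName": "Ana"},
    {"SaleID": 2, "ProductID": 102, "ProductName": "Hat", "Qty": 1, "CustomerName": "Ben"},
    {"SaleID": 3, "ProductID": 101, "ProductName": "T-shirt", "Qty": 4, "CustomerName": "Cara"},
]

# Products table: ProductName is stored once per ProductID.
products = {row["ProductID"]: row["ProductName"] for row in sales_flat}

# Sales table: ProductName is dropped; ProductID stays as the link (assumption).
sales = [
    {key: row[key] for key in ("SaleID", "ProductID", "Qty", "CustomerName")}
    for row in sales_flat
]
```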
What is the third normal form (3NF)?
- 2NF plus…
- Columns dependent only on the PK
- Not inferred from each other
E.g. A sales table with SaleID, ProductID, Qty, CustomerName and CustomerID - CustomerName can be derived from CustomerID, so the table only needs CustomerID and CustomerName is removed
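The same example as a plain Python sketch. The rows and the separate customers lookup are hypothetical; the point is that the name is stored once and looked up through CustomerID.

```python
# Hypothetical sales table before 3NF: CustomerName repeats and depends on CustomerID.
sales = [
    {"SaleID": 1, "ProductID": 101, "Qty": 2, "CustomerID": 7, "CustomerName": "Ana"},
    {"SaleID": 2, "ProductID": 102, "Qty": 1, "CustomerID": 8, "CustomerName": "Ben"},
    {"SaleID": 3, "ProductID": 101, "Qty": 4, "CustomerID": 7, "CustomerName": "Ana"},
]

# Customers table: each name is stored once, keyed by CustomerID.
customers = {row["CustomerID"]: row["CustomerName"] for row in sales}

# Sales table after 3NF: CustomerName is removed.
sales_3nf = [
    {key: value for key, value in row.items() if key != "CustomerName"}
    for row in sales
]

# The name can still be recovered through CustomerID.
name_for_sale_3 = customers[sales_3nf[2]["CustomerID"]]
```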
What is NoSQL?
“Not only SQL”: flexible databases that store and manage both structured and unstructured data
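To give a feel for that flexibility, here is a sketch in plain Python that mimics a document-style store with a list of dicts. It is not a real NoSQL database, and the field names are made up.

```python
import json

# A hypothetical "collection": each document can have different fields.
collection = [
    {"_id": 1, "type": "order", "customer": "Ana", "items": ["Shirt", "Hat"]},
    {"_id": 2, "type": "review", "text": "Great shoes, fast delivery!", "stars": 5},
]

# Query by field value; no table schema is needed.
reviews = [doc for doc in collection if doc["type"] == "review"]

# Documents are commonly stored as JSON-like text.
serialised = json.dumps(collection)
round_trip = json.loads(serialised)
```

The order document holds a list and the review holds free text, yet both sit in the same collection.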