Topic 5 Flashcards
What is an ER diagram?
In Pearson Edexcel International Advanced Level (IAL) IT, an ER (entity-relationship) diagram is a visual model of the entities in a database and the relationships between them.
It consists of:
1. Entities: Objects or concepts about which data is stored (e.g., “Student” or “Course”), drawn as rectangles.
2. Attributes: Details about an entity (e.g., “Name” or “ID”), drawn as ovals.
3. Relationships: Links showing how entities are connected (e.g., “Enrolled in”), drawn as diamonds.
ER diagrams help design and structure a database before implementation, ensuring data is organised logically and effectively.
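As a rough sketch of how such a diagram might translate into tables, the example below uses Python's built-in sqlite3 module. The table and column names (student, course, enrolled_in) are assumptions based on the examples in this card, not a required schema.

```python
import sqlite3

# Hypothetical schema: two entities and one relationship from the card above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entities become tables; attributes become columns.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The "Enrolled in" relationship becomes a linking table.
    -- REFERENCES documents the link; SQLite only enforces it if
    -- PRAGMA foreign_keys = ON is set.
    CREATE TABLE enrolled_in (
        student_id INTEGER REFERENCES student(student_id),
        course_id INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")

# Illustrative rows only.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO course VALUES (10, 'IT')")
conn.execute("INSERT INTO enrolled_in VALUES (1, 10)")

# Follow the relationship to list which student takes which course.
rows = conn.execute("""
    SELECT student.name, course.title
    FROM enrolled_in
    JOIN student ON student.student_id = enrolled_in.student_id
    JOIN course ON course.course_id = enrolled_in.course_id
""").fetchall()
print(rows)
```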
What are three features of structured data?
Structured Data:
1. Organised Format:
○ Stored in a predefined format like rows and columns (e.g., spreadsheets or databases).
○ Example: A customer database with names, ages, and email addresses.
2. Easily Searchable:
○ Can be quickly searched and queried using tools like SQL.
○ Example: Finding all customers older than 30 in a database.
3. Defined Data Types:
○ Each piece of data has a specific type (like numbers, dates, or text).
○ Example: A “Date of Birth” column only contains valid dates.
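The three features above can be seen together in a small sketch using Python's built-in sqlite3 module. The customer table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Organised format with declared column types (hypothetical table).
conn.execute("CREATE TABLE customer (name TEXT, age INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("Ana", 42, "ana@example.com"),
        ("Ben", 25, "ben@example.com"),
        ("Cy", 31, "cy@example.com"),
    ],
)

# Easily searchable: one SQL query finds all customers older than 30.
older = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customer WHERE age > 30 ORDER BY name"
    )
]
print(older)
```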
What are three features of unstructured data?
Unstructured Data:
1. No Fixed Format:
○ Data isn’t stored in rows or columns and doesn’t have a consistent structure.
○ Example: Photos, videos, emails, or social media posts.
2. Challenging to Search:
○ Often requires advanced tools like AI or machine learning to find specific information, because there are no fields to query.
○ Example: Finding every video in a large archive in which a specific phrase is spoken.
3. Varied Data Types:
○ Includes a mix of text, images, audio, and video.
○ Example: A YouTube video with captions and comments.
Define structured data:
Data that is organised into rows and columns in tables, making it easy to search, store, and analyse (e.g., data in a spreadsheet or database).
Define unstructured data:
Data that does not have a clear format or structure, such as images, videos, emails, or text documents.
Explain structured and unstructured data:
Structured data is highly organised and often stored in relational databases. It is easy to query using SQL.
Unstructured data lacks a predefined format, so analysing it requires advanced tools like natural language processing or AI.
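A minimal sketch of the contrast, using made-up records and messages: the structured list can be filtered by a named field, while the free text has to be scanned with a pattern that only guesses where the information is.

```python
import re

# Structured: every record has the same named fields (illustrative data).
customers = [{"name": "Ana", "age": 42}, {"name": "Ben", "age": 25}]
over_30 = [c["name"] for c in customers if c["age"] > 30]

# Unstructured: free text with no fields, so we scan it with a regex.
messages = [
    "Hi, I'm Ana and I turned 42 last week.",
    "Ben here. My order arrived late again.",
]
numbers_found = []
for text in messages:
    match = re.search(r"\b(\d{1,3})\b", text)
    if match:
        # The number might be an age, or might not; the text does not say.
        numbers_found.append(match.group(1))

print(over_30, numbers_found)
```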
Define format:
The way data is arranged, structured, or presented (e.g., CSV, JSON, XML).
Explain why format is useful in IT:
Formats standardise how data is stored and shared, ensuring compatibility across systems and simplifying data processing.
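As an illustration, the sketch below writes the same made-up records as CSV and then as JSON using Python's standard csv and json modules. Because both formats are standard, the data survives the round trip unchanged.

```python
import csv
import io
import json

# Illustrative records; CSV stores every value as text, so ages are strings.
records = [{"name": "Ana", "age": "42"}, {"name": "Ben", "age": "25"}]

# Write the records as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# Another program can read the CSV back and re-export it as JSON.
from_csv = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
json_text = json.dumps(from_csv)
from_json = json.loads(json_text)

print(from_json == records)
```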
Define qualitative data:
Descriptive data that captures qualities or characteristics (e.g., colour, texture, opinions).
Examples of structured data:
Customer names and phone numbers in a database
Sales records in an Excel sheet
Financial transactions stored in a SQL database
Why structured data is easy to search, manipulate, and analyse:
It is organised into predefined fields (rows/columns).
It uses standard formats and can be queried with tools like SQL.
Relationships and patterns are clear due to its structured nature.
Examples of unstructured data:
Social media posts
Photos and videos
Emails and chat logs
How ML uses structured data:
ML algorithms are trained on structured, labelled datasets. They learn the patterns and relationships in the data and then apply them to tasks such as predicting sales or diagnosing diseases.
Process of developing an ML model:
Collect data.
Preprocess data (cleaning, normalisation).
Select an algorithm (e.g., linear regression, decision trees).
Train the model using training data.
Validate the model on validation data, then test it on unseen test data.
Deploy the model and monitor performance.
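The steps above can be sketched in plain Python. This is a simplified illustration, not a full workflow: the data is synthetic, the algorithm is simple linear regression fitted by least squares, there is no separate validation set, and deployment is not shown.

```python
# 1. Collect data: synthetic (input, output) pairs invented for illustration.
data = [(x, 3.0 * x + 2.0) for x in range(10)]

# 2. Preprocess: min-max normalisation scales the input into the range 0 to 1.
xs = [x for x, _ in data]
lowest, highest = min(xs), max(xs)
scaled = [((x - lowest) / (highest - lowest), y) for x, y in data]

# Hold back every fifth point as test data; train on the rest.
train = [p for i, p in enumerate(scaled) if i % 5 != 4]
test = [p for i, p in enumerate(scaled) if i % 5 == 4]


# 3. Select an algorithm: simple linear regression (y = slope * x + intercept).
def fit(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# 4. Train the model using the training data.
slope, intercept = fit(train)

# 5. Test the model: mean absolute error on the held-back test data.
mae = sum(abs(slope * x + intercept - y) for x, y in test) / len(test)
print(slope, intercept, mae)

# 6. Deploying and monitoring would happen outside this sketch.
```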