Week 1 Flashcards
Week 1 course content
What are the types of data?
Structured Data, Semi-Structured Data, and Unstructured Data.
How do the data types differ in terms of format?
Structured data has a predefined schema. Semi-structured data has some structure, often marked with tags. Unstructured data has no fixed format.
How do the data types differ in terms of ease of analysis?
Structured data is easy to analyze. Semi-structured data is moderately difficult. Unstructured data is difficult.
How do the data types differ in terms of tools?
Structured data: SQL and traditional databases. Semi-structured data: specialized tools such as JSON parsers. Unstructured data: natural language processing and computer vision.
How do the data types differ in terms of examples?
Structured data: databases and spreadsheets. Semi-structured data: JSON, HTML, and XML. Unstructured data: text, images, and videos.
What is ‘Structured Data’?
Data organized in a predefined format with a fixed schema, typically stored in rows and columns, as in a spreadsheet or database table. Examples include customer information, spreadsheet sales data, and machine sensor readings.
What is ‘Unstructured Data’?
Data that lacks a predefined structure or format. It is often text-heavy or multimedia-based. Typical examples are social media posts, emails, images, videos, and audio files.
What is ‘Semi-Structured Data’?
Data that has some structure but lacks the rigid format of structured data. It often includes tags or markers that indicate the meaning of different parts of the data. Examples include JSON data, XML data, and HTML documents.
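To make the three data types concrete, here is a minimal Python sketch that handles one small sample of each. The sample values (customer names, totals, tags, and the social media post) are invented for illustration, and the word count stands in for real natural language processing, which would need dedicated tools.

```python
import csv
import io
import json

# Structured: fixed schema, rows and columns (hypothetical sample data).
csv_text = "customer_id,name,total\n1,Ada,19.99\n2,Lin,5.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
structured_total = sum(float(r["total"]) for r in rows)

# Semi-structured: keys act as tags, but records may differ in shape.
# The second record has no "tags" field, so a default is needed.
json_text = '[{"id": 1, "name": "Ada", "tags": ["vip"]}, {"id": 2, "name": "Lin"}]'
records = json.loads(json_text)
tag_counts = [len(rec.get("tags", [])) for rec in records]

# Unstructured: free text with no schema. Without NLP tools,
# only crude measures such as a word count are readily available.
post = "Loved the new phone, but the battery life is disappointing."
word_count = len(post.split())

print(structured_total, tag_counts, word_count)
```

The structured sample can be summed by column name directly, the semi-structured sample needs a fallback for the missing field, and the unstructured sample offers no fields at all, which mirrors the easy, moderate, and difficult ratings on the analysis card.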
What is Data?
The raw material that fuels insights and informed decision-making. It originates from various data sources.
What is a Data Source?
A place where data is stored or generated, such as sensors, social media platforms, customer interactions, databases, and public records.
What is ‘Data Collection’?
The systematic gathering of data from diverse data sources.
What are the two categories of Data Sources?
Primary Data Source and Secondary Data Source
What is a Primary Data Source?
Data collected firsthand by the researcher for a specific purpose or project, typically through surveys, experiments, and observations.
What is a Secondary Data Source?
Data that has already been collected by someone else for another purpose and is being repurposed for a new analysis. Secondary data sources include public databases, published research, and third-party sources.
What is a Database?
An organized collection of data that allows for efficient storage, retrieval, and manipulation. It is designed for transactional processing and day-to-day operations such as creating, reading, updating, and deleting data (CRUD). Examples of database systems are Microsoft SQL Server, MySQL, and MongoDB.
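The four CRUD operations can be sketched with Python's built-in sqlite3 module. SQLite is used here only because it needs no server; it is not one of the systems named on the card. The `customers` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database, discarded when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create: insert new rows.
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ada", "London"))
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Lin", "Taipei"))

# Read: query the rows back.
after_create = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()

# Update: change an existing row.
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Paris", "Ada"))

# Delete: remove a row.
conn.execute("DELETE FROM customers WHERE name = ?", ("Lin",))

after_changes = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
conn.close()

print(after_create)
print(after_changes)
```

Each statement touches one or two rows at a time, which is the kind of small, frequent operation that transactional processing refers to.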
What is a Data Warehouse?
A large, centralized repository of data that aggregates information from various sources. It is designed for analytical processing, historical data storage, and decision-making.
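As a toy illustration of the warehouse idea, the sketch below loads sales figures from two hypothetical sources into one central table and runs an analytical query across all of it. The source names, years, and amounts are invented, and an in-memory SQLite table stands in for a real warehouse product.

```python
import sqlite3

# Two hypothetical source systems, each holding (year, amount) records.
online_sales = [("2023", 120.0), ("2024", 200.0)]
store_sales = [("2023", 80.0), ("2024", 50.0)]

# Central repository: one table that records which source each row came from.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE sales (source TEXT, year TEXT, amount REAL)")
wh.executemany("INSERT INTO sales VALUES ('online', ?, ?)", online_sales)
wh.executemany("INSERT INTO sales VALUES ('store', ?, ?)", store_sales)

# Analytical query: summarize historical data across every source.
yearly_totals = wh.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
wh.close()

print(yearly_totals)
```

Unlike the row-by-row operations of a transactional database, the query here reads the whole history at once to produce a summary for decision-making.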
What is ‘Knowledge Discovery in Databases (KDD)’?
A method that offers a structured framework for extracting valuable insights from data.