Week 1 Flashcards

Question 1

Q

What are the types of data?

Answer

A

Structured Data, Semi-Structured Data, and Unstructured Data.

Question 2

Q

What are the differences between the data types in terms of format?

Answer

A

Structured Data has a predefined schema. Semi-Structured Data has some structure, often marked with tags. Unstructured Data has no fixed format.

Question 3

Q

What are the differences between the data types in terms of analysis?

Answer

A

Structured Data is easy to analyze. Semi-Structured Data is moderately difficult to analyze. Unstructured Data is difficult to analyze.

Question 4

Q

What are the differences between the data types in terms of tools?

Answer

A

Structured Data is handled with SQL and traditional databases. Semi-Structured Data is handled with specialized tools such as JSON parsers. Unstructured Data is handled with Natural Language Processing and Computer Vision.
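
A minimal Python sketch (standard library only) of how each data type is typically handled. The customer table, JSON records, and review text are made-up examples, and the word count merely stands in for real NLP tooling.

```python
import json
import sqlite3

# Structured data: fixed schema, queried with SQL (in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Alice", "Paris"), ("Bob", "Oslo"), ("Chen", "Paris")],
)
paris_count = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE city = ?", ("Paris",)
).fetchone()[0]

# Semi-structured data: keys (tags) describe the values, but fields may vary per record.
raw_json = '[{"name": "Alice", "tags": ["vip"]}, {"name": "Bob"}]'
records = json.loads(raw_json)
# "tags" is optional, so use .get() with a default instead of assuming a fixed schema.
vip_names = [r["name"] for r in records if "vip" in r.get("tags", [])]

# Unstructured data: free text with no schema; even a simple word count needs parsing.
review = "Great product. Great price, slow delivery."
words = [w.strip(".,").lower() for w in review.split()]
great_mentions = words.count("great")
```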

Question 5

Q

What are the differences between the data types in terms of examples?

Answer

A

Structured Data: databases and spreadsheets. Semi-Structured Data: JSON, HTML, and XML. Unstructured Data: text, images, and videos.

Question 6

Q

What is ‘Structured Data’?

Answer

A

Structured Data is organized in a predefined format with a fixed schema and is typically stored in rows and columns, similar to a spreadsheet or database table. Examples include customer information, sales data in spreadsheets, and machine sensor readings.

Question 7

Q

What is ‘Unstructured Data’?

Answer

A

Unstructured Data lacks a predefined structure or format. It is often text-heavy or multimedia-based. Social media posts, emails, images, videos, and audio files are typical examples of unstructured data.

Question 8

Q

What is ‘Semi-Structured Data’?

Answer

A

Semi-Structured Data has some structure but lacks the rigid format of structured data. It often includes tags or markers that indicate the meaning of different parts of the data. Examples of semi-structured data include JSON data, XML data, and HTML documents.

Question 9

Q

What is Data?

Answer

A

Data is the raw material that fuels insights and informed decision-making, originating from various data sources.

Question 10

Q

What is a Data Source?

Answer

A

A Data Source is where data is stored or generated, such as sensors, social media platforms, customer interactions, databases, and public records.

Question 11

Q

What is ‘Data Collection’?

Answer

A

Data Collection involves systematically gathering data from diverse data sources.

Question 12

Q

What are the two categories of Data Sources?

Answer

A

Primary Data Source and Secondary Data Source

Question 13

Q

What is a Primary Data Source?

Answer

A

Primary data is collected firsthand by the researcher for a specific purpose or project. It is gathered from primary data sources through surveys, experiments, and observations.

Question 14

Q

What is a Secondary Data Source?

Answer

A

Secondary data has already been collected by someone else for another purpose and is repurposed for a new analysis. Secondary data sources include public databases, published research, and third-party sources.

Question 15

Q

What is a Database?

Answer

A

A database is an organized collection of data that allows for efficient storage, retrieval, and manipulation. It is designed for transactional processing and day-to-day operations such as creating, reading, updating, and deleting data (CRUD). Examples of database management systems are Microsoft SQL Server, MySQL, and MongoDB.
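
A minimal sketch of the four CRUD operations, assuming Python's built-in `sqlite3` module and a hypothetical `products` table. SQLite is used here only because it needs no server; the same SQL statements apply to other relational databases.

```python
import sqlite3

# In-memory SQLite database; the "products" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, stock INTEGER)")

# Create: insert new rows.
conn.execute("INSERT INTO products (name, stock) VALUES (?, ?)", ("Keyboard", 10))
conn.execute("INSERT INTO products (name, stock) VALUES (?, ?)", ("Mouse", 25))

# Read: query existing rows.
rows = conn.execute("SELECT name, stock FROM products ORDER BY id").fetchall()

# Update: change a value in place.
conn.execute("UPDATE products SET stock = stock - 1 WHERE name = ?", ("Keyboard",))

# Delete: remove a row.
conn.execute("DELETE FROM products WHERE name = ?", ("Mouse",))
conn.commit()

remaining = conn.execute("SELECT name, stock FROM products").fetchall()
```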

Question 16

Q

What is a Data Warehouse?

Answer

A

A data warehouse is a large, centralized repository that aggregates data from various sources. It is designed for analytical processing, historical data storage, and decision-making.

Question 17

Q

What is ‘Knowledge Discovery in Databases (KDD)’?

Answer

A

KDD is a multi-step method that offers a structured framework for extracting valuable insights from data.

Question 18

Q

What are the KDD Steps?

Answer

A

Data Selection, Data Preprocessing, Data Transformation, Data Mining, Pattern Evaluation, and Knowledge Representation.
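
The six KDD steps can be sketched as a chain of small Python functions. Everything here is illustrative: the records, the function names (`select`, `preprocess`, and so on), the outlier cap of 1000, and the 0.2 evaluation threshold are assumptions, and each step is far simpler than in practice.

```python
# Hypothetical raw records: (customer_id, region, monthly_spend); None marks a missing value.
raw = [
    (1, "north", 120.0), (2, "south", None), (3, "north", 80.0),
    (4, "south", 200.0), (5, "east", 95.0), (6, "north", 5000.0),
]

def select(data):
    # Data Selection: keep only the target subset (here, two regions).
    return [r for r in data if r[1] in ("north", "south")]

def preprocess(data):
    # Data Preprocessing: drop missing values and implausible outliers (assumed cap of 1000).
    return [r for r in data if r[2] is not None and r[2] <= 1000]

def transform(data):
    # Data Transformation: min-max normalize spend to [0, 1].
    spends = [r[2] for r in data]
    lo, hi = min(spends), max(spends)
    return [(cid, region, (s - lo) / (hi - lo)) for cid, region, s in data]

def mine(data):
    # Data Mining: a deliberately simple "pattern", the average normalized spend per region.
    groups = {}
    for _, region, s in data:
        groups.setdefault(region, []).append(s)
    return {region: sum(v) / len(v) for region, v in groups.items()}

def evaluate(patterns, min_gap=0.2):
    # Pattern Evaluation: keep the result only if the regions differ by a meaningful margin.
    values = list(patterns.values())
    return patterns if max(values) - min(values) >= min_gap else {}

def represent(patterns):
    # Knowledge Representation: a plain-text report for stakeholders.
    return "\n".join(f"{region}: {score:.2f}" for region, score in sorted(patterns.items()))

report = represent(evaluate(mine(transform(preprocess(select(raw))))))
print(report)
```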

Question 19

Q

What is ‘Data Selection’?

Answer

A

Data Selection identifies the relevant data sources for analysis by selecting the target dataset(s) or focusing on a subset of variables or data samples.

Question 20

Q

What is ‘Data Preprocessing’?

Answer

A

Data Preprocessing cleans and prepares the data for analysis by correcting errors and inconsistencies, for example by removing outliers and handling missing values.
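
A minimal preprocessing sketch, assuming median imputation for missing values and the 1.5 × IQR rule for outliers; both are common choices, not the only ones. The readings are made up.

```python
import statistics

# Hypothetical sensor readings; None = missing value, 250.0 = suspicious spike.
readings = [21.0, 22.5, None, 23.0, 250.0, 22.0, None, 21.5]

# 1) Handle missing values: impute with the median of the observed readings.
observed = [x for x in readings if x is not None]
median = statistics.median(observed)
filled = [median if x is None else x for x in readings]

# 2) Remove outliers with the 1.5 * IQR rule: drop values far outside the middle 50%.
q1, _, q3 = statistics.quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [x for x in filled if low <= x <= high]
```

The median is used for imputation because, unlike the mean, it is barely affected by the 250.0 spike.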

Question 21

Q

What is ‘Data Transformation’?

Answer

A

Data Transformation converts the data into formats suitable for mining. It includes data reduction, normalization, discretization, and feature engineering.
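
Two of the transformations named above, sketched in Python: min-max normalization and discretization into bins. The ages and the bin edges are arbitrary examples.

```python
# Hypothetical customer ages after preprocessing.
ages = [18, 25, 40, 60, 33]

# Normalization: min-max scaling maps each value into the range [0, 1].
lo, hi = min(ages), max(ages)
normalized = [(a - lo) / (hi - lo) for a in ages]

# Discretization: replace the continuous value with a labeled bin.
# The bin edges (under 30, 30 to 49, 50 and over) are arbitrary choices for illustration.
def age_bin(age):
    if age < 30:
        return "young"
    if age < 50:
        return "middle"
    return "senior"

binned = [age_bin(a) for a in ages]
```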

Question 22

Q

What is ‘Data Mining’?

Answer

A

Data Mining applies algorithms such as classification, clustering, and association rules to extract patterns or models from the processed data.

Question 23

Q

How does ‘Data Mining’ work?

Answer

A

Collect Data: Start with a lot of information (like sales records, customer reviews, or website activity).

Analyze: Use tools and algorithms (such as classification, clustering, or association rules) to look for patterns.

Find Insights: Spot trends, such as which products sell best during certain months or what type of customers are most loyal.

Question 24

Q

What is ‘Pattern Evaluation’?

Answer

A

Pattern Evaluation identifies the truly interesting patterns in the dataset by assessing their relevance, validity, novelty, and potential usefulness for action.

Question 25

Q

What is ‘Knowledge Representation’?

Answer

A

Knowledge Representation presents the discovered knowledge through reports, visualizations, or decision support systems to communicate the findings to stakeholders.

Question 26

Q

How does ‘Data Transformation’ work, using aggregation as an example?

Answer

A

Collect Data: Gather a lot of small details (e.g., daily sales from multiple stores).

Group It: Organize the data (e.g., by month, region, or product type).

Summarize: Use calculations like totals, averages, or counts to create a summary (e.g., “Total sales for January = $10,000”).
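
The collect, group, and summarize steps above map onto a short Python sketch. The daily sales records are made up, and grouping is by month only.

```python
from collections import defaultdict

# Collect: hypothetical daily sales as (date "YYYY-MM-DD", store, amount in dollars).
daily_sales = [
    ("2024-01-03", "A", 1200.0), ("2024-01-15", "B", 800.0),
    ("2024-01-28", "A", 500.0), ("2024-02-02", "B", 950.0),
    ("2024-02-20", "A", 1100.0),
]

# Group: bucket the amounts by month, using the "YYYY-MM" prefix of the date.
by_month = defaultdict(list)
for date, _store, amount in daily_sales:
    by_month[date[:7]].append(amount)

# Summarize: total, average, and count per month.
summary = {
    month: {"total": sum(v), "average": sum(v) / len(v), "count": len(v)}
    for month, v in sorted(by_month.items())
}
for month, stats in summary.items():
    print(f"Total sales for {month} = ${stats['total']:,.2f}")
```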

Question 27

Q

What is ‘Data Dredging’?

Answer

A

Data dredging (also called data fishing or data snooping) is when someone looks through a lot of data to find patterns or results without a clear plan or hypothesis, which makes it likely to "discover" patterns that arise purely by chance.

Question 28

Q

What is ‘Data Discrepancy’?

Answer

A

A Data Discrepancy is a mismatch or inconsistency between data that should be the same or aligned, for example the same customer's balance differing between two systems.
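
A minimal sketch of detecting discrepancies between two hypothetical systems that should hold the same customer balances. It reports values that disagree and records that exist in only one system.

```python
# Hypothetical: the same customers' balances as recorded by two systems.
crm = {"C1": 100.0, "C2": 250.0, "C3": 75.0}
billing = {"C1": 100.0, "C2": 245.0, "C4": 30.0}

# Value mismatches: the key exists in both systems but the values disagree.
# (Exact comparison is fine here; real data may need a tolerance for rounding.)
mismatched = {
    k: (crm[k], billing[k])
    for k in crm.keys() & billing.keys()
    if crm[k] != billing[k]
}

# Missing records: the key exists in only one of the two systems.
only_in_crm = sorted(crm.keys() - billing.keys())
only_in_billing = sorted(billing.keys() - crm.keys())
```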

Question 29

Q

What is ‘Data Regression’?

Answer

A

A system bug or error that causes old, incorrect data to reappear.

Question 30

Q

What is ‘Regression Analysis’?

Answer

A

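A statistical method that models the relationship between a dependent variable and one or more independent variables, used to explain that relationship and to make predictions.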
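
A sketch of the simplest case, ordinary least squares with one predictor, in plain Python. The data is made up and lies exactly on y = 2x + 1, so the fitted slope (which explains the relationship) and the prediction are easy to check.

```python
# Hypothetical data: advertising spend (x) vs. sales (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

# Ordinary least squares for one predictor: y ≈ intercept + slope * x.
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

def predict(new_x):
    # Use the fitted line to predict y for a new x.
    return intercept + slope * new_x
```

Here the slope of 2 reads as "each extra unit of spend is associated with 2 more units of sales", and `predict(6.0)` returns 13.0.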

Question 31

Q

What is the difference between ‘Data Analysis’ and ‘Data Mining’?

Answer

A

Data Analysis answers predefined questions using statistical techniques. Data Mining discovers hidden patterns in large datasets without a specific question in advance, using methods such as clustering or association rule mining.

Question 32

Q

What is ‘Feature Engineering’?

Answer

A

Feature Engineering is the process of creating, selecting, or transforming data into meaningful inputs (features) that can improve the performance of a machine learning model.

Question 33

Q

How does ‘Feature Engineering’ work?

Answer

A

Identify Raw Data: Start with the original dataset (e.g., sales data, customer details, sensor readings).

Create New Features: Derive useful information from the raw data.
Example: Combine “date of birth” with the current date to calculate “age.”

Transform Features: Apply techniques like scaling or encoding to make the data usable for models.

Select Features: Pick the most relevant features and remove unnecessary ones to avoid overcomplicating the model.
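
The four steps above, sketched in Python on made-up customer records. The fixed reference date, min-max scaling, and one-hot encoding are illustrative choices, not the only options.

```python
from datetime import date

# Identify raw data: hypothetical customer records.
customers = [
    {"dob": date(1990, 6, 15), "plan": "basic", "monthly_spend": 20.0},
    {"dob": date(2000, 1, 1), "plan": "premium", "monthly_spend": 80.0},
    {"dob": date(1975, 12, 31), "plan": "basic", "monthly_spend": 35.0},
]
today = date(2024, 6, 1)  # fixed reference date so the example is reproducible

def age_on(dob, ref):
    # Create a feature: age in whole years, minus 1 if this year's birthday hasn't happened yet.
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))

spends = [c["monthly_spend"] for c in customers]
lo, hi = min(spends), max(spends)
plans = sorted({c["plan"] for c in customers})

features = []
for c in customers:
    row = {"age": age_on(c["dob"], today)}
    # Transform: min-max scale spend into [0, 1].
    row["spend_scaled"] = (c["monthly_spend"] - lo) / (hi - lo)
    # Transform: one-hot encode the categorical "plan" column.
    for p in plans:
        row[f"plan_{p}"] = int(c["plan"] == p)
    features.append(row)

# Select: keep only the useful columns; plan_basic is dropped because with two plans
# it is always 1 - plan_premium and adds no information.
selected = [{k: r[k] for k in ("age", "spend_scaled", "plan_premium")} for r in features]
```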

Question 34

Q

What is the ‘Classification Technique’?

Answer

A

Classification is a technique used to categorize data into predefined classes or categories based on the features or attributes of the data instances. It involves training a model on labeled data and using it to predict the class labels of new, unseen data instances.
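
A minimal classification sketch using k-nearest neighbours, chosen here only because it is short; it is one of many classification algorithms. The training points and labels are hypothetical. For k-NN, "training" amounts to storing the labeled examples.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (hours_studied, hours_slept) -> "pass" or "fail".
train = [
    ((1.0, 4.0), "fail"), ((2.0, 5.0), "fail"), ((1.5, 6.0), "fail"),
    ((6.0, 7.0), "pass"), ((7.0, 8.0), "pass"), ((8.0, 6.5), "pass"),
]

def knn_predict(point, data, k=3):
    # Label a new point by majority vote among the k closest training points.
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

For example, `knn_predict((7.5, 7.0), train)` returns `"pass"` because all three nearest neighbours are labeled "pass".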

Question 35

Q

What is the ‘Regression Technique’?

Answer

A

Regression is a technique employed to predict numeric or continuous values based on the relationship between input variables and a target variable. It aims to find the mathematical function or model that best fits the data in order to make accurate predictions.

Question 36

Q

What is the ‘Clustering Technique’?

Answer

A

Clustering is a technique used to group similar data instances together based on their intrinsic characteristics or similarities. It aims to discover natural patterns or structures in the data without any predefined classes or labels.
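
A minimal sketch of k-means, one common clustering algorithm, on made-up one-dimensional data. The starting centroids are hand-picked for reproducibility; real implementations usually use random or k-means++ initialization instead.

```python
# Hypothetical 1-D data: customer purchase amounts, with no labels.
points = [2.0, 3.0, 4.0, 20.0, 22.0, 24.0]

def kmeans_1d(data, centroids, iterations=10):
    # k-means: alternate between assigning each point to its nearest centroid and
    # moving each centroid to the mean of the points assigned to it.
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Keep a centroid where it is if no points were assigned to it.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d(points, [0.0, 10.0])
```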

Question 37

Q

What is ‘Association Rule Mining’?

Answer

A

Association rule mining focuses on discovering interesting relationships or patterns among a set of items in transactional or market basket data. It helps identify frequently co-occurring items and generates rules such as “if X, then Y” to reveal associations between items.
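
A sketch of the two basic measures behind "if X, then Y" rules, support and confidence, computed over hypothetical shopping baskets. Full algorithms such as Apriori also search for the frequent itemsets; this sketch only scores one given rule.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def count(itemset):
    # Number of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions)

def support(itemset):
    # Fraction of all transactions that contain the itemset.
    return count(itemset) / len(transactions)

def confidence(x, y):
    # Confidence of "if X, then Y" = support(X ∪ Y) / support(X), computed from raw counts.
    return count(x | y) / count(x)

rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
```

Here "if bread, then milk" holds in 3 of the 4 baskets that contain bread, so its confidence is 0.75.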

Question 38

Q

What is ‘Anomaly Detection’?

Answer

A

Anomaly Detection aims to identify rare or unusual data instances that deviate significantly from the expected patterns. It is useful for detecting fraudulent transactions, network intrusions, manufacturing defects, and other abnormal behavior.
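
A minimal anomaly detection sketch using z-scores on hypothetical transaction amounts. The threshold of 2 standard deviations is an assumption. Note that an extreme value also inflates the standard deviation: in a sample of 10 values no point can have a z-score above about 2.85, so a stricter threshold such as 3 would flag nothing here.

```python
import statistics

# Hypothetical transaction amounts for one customer; 980.0 looks unusual.
amounts = [42.0, 38.0, 45.0, 40.0, 44.0, 39.0, 41.0, 980.0, 43.0, 37.0]

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)

# Flag values more than 2 standard deviations away from the mean.
anomalies = [a for a in amounts if abs(a - mean) / stdev > 2]
```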

Question 39

Q

What is the ‘Time Series Analysis Technique’?

Answer

A

Time Series Analysis focuses on analyzing and predicting data points collected over time. It involves techniques such as forecasting, trend analysis, seasonality detection, and anomaly detection in time-dependent datasets.
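
A minimal trend-analysis sketch: a 3-point moving average over made-up monthly visit counts, followed by a deliberately naive forecast (next value = latest moving average). The window size and the forecast rule are assumptions for illustration.

```python
# Hypothetical monthly website visits (in thousands).
visits = [10, 12, 11, 13, 15, 14, 16, 18]

def moving_average(series, window=3):
    # Trend analysis: average each run of `window` consecutive points to smooth out noise.
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

trend = moving_average(visits)
# Naive forecast (an assumption, not a recommended model): next value = last moving average.
forecast = trend[-1]
```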

Question 40

Q

What is the ‘Neural Networks Technique’?

Answer

A

Neural networks are a type of machine learning or AI model inspired by the human brain’s structure and function. They are composed of interconnected nodes (neurons), organized in layers, that learn from data to recognize patterns and perform classification, regression, or other tasks.
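
A single artificial neuron, the building block of a neural network, can be sketched as a perceptron learning the logical AND function. Real networks stack many such units in layers and train them with gradient-based methods; this sketch only shows the weighted sum, the activation, and a weight update.

```python
# Training examples for logical AND: (inputs, target output).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = [0.0, 0.0]
bias = 0.0
# Starting from zero weights, a perceptron's learning rate only rescales the weights,
# so 1.0 is used here to keep the arithmetic exact.
lr = 1.0

def predict(x):
    # Weighted sum of the inputs plus a bias, passed through a step activation.
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

for _epoch in range(20):
    for x, target in data:
        error = target - predict(x)
        # Perceptron learning rule: shift the weights and bias toward the correct output.
        weights = [w + lr * error * xi for w, xi in zip(weights, x)]
        bias += lr * error
```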

Question 41

Q

What is the ‘Decision Trees Technique’?

Answer

A

Decision trees are models that use a tree-like structure to represent decisions and their possible consequences. They recursively split the data based on different attribute values to form a hierarchical decision-making process.
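
A sketch of the core step in building a decision tree: choosing the split threshold that best separates the labels, measured here with Gini impurity (one common criterion). The loan data is hypothetical, and a full tree would repeat this step recursively on each child node.

```python
# Hypothetical data: (income in $1000s, whether a loan was repaid).
samples = [(20, "no"), (35, "no"), (40, "yes"), (50, "yes"), (60, "yes"), (25, "no")]

def gini(labels):
    # Gini impurity: 0 when all labels are the same, larger when they are mixed.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(data):
    # Try each distinct value as a threshold ("income <= t") and keep the one whose
    # two child nodes have the lowest size-weighted impurity.
    best = None
    for t in sorted({x for x, _ in data}):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
        if best is None or score < best[1]:
            best = (t, score)
    return best

threshold, impurity = best_split(samples)
```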

Question 42

Q

What is the ‘Ensemble Methods Technique’?

Answer

A

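Ensemble methods combine multiple models to improve prediction accuracy and generalization. Random Forests combine many decision trees, each trained on a random sample of the data and features, while Gradient Boosting adds weak learners one at a time, each correcting the errors of the previous ones, to create a stronger, more accurate model.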
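
A sketch of the simplest way to combine models, majority voting, using three hypothetical rule-based spam detectors as the weak learners. It shows only the combination step, similar in spirit to how a Random Forest pools its trees; it does not show how Random Forests or Gradient Boosting train their members.

```python
from collections import Counter

# Three hypothetical "weak" spam rules; each is right only some of the time.
def has_link(email):
    return "http" in email

def has_money_word(email):
    return any(w in email.lower() for w in ("free", "winner", "$$$"))

def is_shouting(email):
    letters = [c for c in email if c.isalpha()]
    return bool(letters) and sum(c.isupper() for c in letters) / len(letters) > 0.5

models = [has_link, has_money_word, is_shouting]

def ensemble_predict(email):
    # Majority vote: the ensemble flags spam (True) if most individual models do.
    votes = Counter(model(email) for model in models)
    return votes.most_common(1)[0][0]
```

An odd number of boolean voters is used so that a vote can never tie.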
