Introduction and Foundations Flashcards
What is data mining?
a) Data cleaning process
b) Discovering patterns in large datasets
c) Data storage technique
d) A way to delete unnecessary data
b) Discovering patterns in large datasets
Which analogy is commonly used to describe data mining?
a) Searching for errors in data
b) Extracting gold from ore
c) Planting information seeds
d) Building data warehouses
b) Extracting gold from ore
Why is data mining often called “Knowledge Discovery from Data (KDD)”?
a) It involves cleaning the data
b) It transforms raw data into useful knowledge
c) It focuses on visualization
d) It is synonymous with machine learning
b) It transforms raw data into useful knowledge
Which step in the KDD process involves combining data from multiple sources?
a) Data cleaning
b) Data integration
c) Data selection
d) Pattern evaluation
b) Data integration
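A minimal sketch of data integration, assuming two hypothetical sources (a customer list and an order log) that share a `customer_id` key. All field names and values here are invented for illustration.

```python
# Hypothetical source 1: customer records keyed by customer_id.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]

# Hypothetical source 2: order records from a separate system.
orders = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 1, "amount": 12.5},
    {"customer_id": 2, "amount": 8.0},
]


def integrate(customers, orders):
    """Combine both sources into one record per customer."""
    # Sum order amounts per customer_id.
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]
    # Attach the total to each customer record (0.0 if no orders).
    return [
        {**customer, "total_spent": totals.get(customer["customer_id"], 0.0)}
        for customer in customers
    ]


integrated = integrate(customers, orders)
```

After this step, later KDD stages such as selection and pattern evaluation can work on one combined view instead of two separate sources.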
What is the primary goal of the data mining process?
a) To store data efficiently
b) To visualize data trends
c) To uncover interesting patterns and models
d) To clean and organize datasets
c) To uncover interesting patterns and models
What are “outliers” in data mining?
a) Common data points in a dataset
b) Data points that deviate significantly from others
c) Summary of the entire dataset
d) Missing data points
b) Data points that deviate significantly from others
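One common way to make "deviates significantly" concrete is the interquartile range (IQR) rule. The sketch below is an illustration only: the 1.5 multiplier is a conventional default, and the sample values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical transaction amounts; one value is far from the rest.
amounts = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(amounts)
```

Other definitions (for example, distance from the mean in standard deviations) are equally valid; the right choice depends on the data.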
Which type of pattern does data mining NOT aim to find?
a) Associations
b) Correlations
c) Predictions
d) Irrelevant trends
d) Irrelevant trends
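To illustrate what an association pattern can look like, the sketch below computes support and confidence, two measures often used to describe association rules. The shopping baskets are hypothetical, and these measures are an assumption added for illustration rather than something the cards define.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the share that also has the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
```

Here bread and butter appear together in half of the baskets, and two of the three baskets containing bread also contain butter.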
What does the term “Big Data” refer to?
a) Small datasets processed in real time
b) Vast amounts of data characterized by volume, velocity, and variety
c) Data limited to structured formats
d) Data that only includes images and videos
b) Vast amounts of data characterized by volume, velocity, and variety
Big Data is characterized by which three V’s?
a) Value, Validation, Velocity
b) Variety, Volume, Velocity
c) Volume, Verification, Variability
d) Visualization, Variety, Value
b) Variety, Volume, Velocity
What is a key challenge in mining Big Data?
a) Limited storage space
b) Poor visualization tools
c) Efficient handling of high velocity and volume
d) Incompatibility of algorithms with structured data
c) Efficient handling of high velocity and volume
Why is Big Data important for data mining?
a) It allows access to unlimited data storage
b) It provides vast, diverse datasets for uncovering patterns
c) It simplifies machine learning algorithms
d) It only focuses on small subsets of data
b) It provides vast, diverse datasets for uncovering patterns
What is Knowledge Discovery from Data (KDD)?
a) Cleaning and summarizing datasets
b) A process for extracting useful knowledge from raw data
c) A tool used to query databases
d) A step focused solely on visualization
b) A process for extracting useful knowledge from raw data
What distinguishes KDD from simple database querying?
a) KDD generates knowledge, not just results
b) KDD is only for structured data
c) KDD relies on external tools
d) KDD ignores data cleaning steps
a) KDD generates knowledge, not just results
Why might outliers be worth examining rather than ignoring?
a) They make data cleaning easier
b) They can reveal valuable anomalies, such as fraud
c) They confirm dataset accuracy
d) They are always indicative of errors
b) They can reveal valuable anomalies, such as fraud
What is the difference between structured and unstructured data?
a) Unstructured data cannot be analyzed
b) Structured data has clear formats and attributes; unstructured data does not
c) Unstructured data is error-prone
d) Structured data is always accurate
b) Structured data has clear formats and attributes; unstructured data does not
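A small sketch of the contrast, using a hypothetical order record and a hypothetical free-text note. The field names, the note, and the regular expression are all invented for illustration.

```python
import re

# Structured: every record has the same named attributes.
structured_row = {"order_id": 1001, "customer": "Ana", "amount": 42.5}

# Unstructured: free text with no predefined fields.
unstructured_note = "Ana called about order 1001 and said the charge looked wrong."

# Reading a value from structured data is a direct attribute lookup.
amount = structured_row["amount"]

# Reading the same kind of value from text requires parsing.
# This pattern is illustrative and would miss other phrasings.
match = re.search(r"order (\d+)", unstructured_note)
order_id_from_text = int(match.group(1)) if match else None
```

Unstructured data can still be analyzed; it simply needs an extra extraction step before its contents behave like attributes.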