Introduction and foundations Flashcards
(39 cards)
What is data mining?
a) Data cleaning process
b) Discovering patterns in large datasets
c) Data storage technique
d) A way to delete unnecessary data
b) Discovering patterns in large datasets
Which analogy is commonly used to describe data mining?
a) Searching for errors in data
b) Extracting gold from ore
c) Planting information seeds
d) Building data warehouses
b) Extracting gold from ore
Why is data mining often called “Knowledge Discovery from Data (KDD)”?
a) It involves cleaning the data
b) It transforms raw data into useful knowledge
c) It focuses on visualization
d) It is synonymous with machine learning
b) It transforms raw data into useful knowledge
Which step in the KDD process involves combining data from multiple sources?
a) Data cleaning
b) Data integration
c) Data selection
d) Pattern evaluation
b) Data integration
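To make the integration step concrete, here is a minimal sketch that joins two sources on a shared key. The record layouts, the `customer_id` key, and the `integrate` helper are hypothetical and chosen only for illustration.

```python
# Two hypothetical sources that share a customer_id key.
crm = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"customer_id": 1, "total": 40.0},
    {"customer_id": 1, "total": 15.5},
    {"customer_id": 3, "total": 9.0},  # no matching customer record
]


def integrate(customers, order_rows):
    """Join both sources on customer_id, summing order totals per customer."""
    totals = {}
    for row in order_rows:
        key = row["customer_id"]
        totals[key] = totals.get(key, 0.0) + row["total"]
    # Customers without orders get a total of 0.0.
    return [{**c, "total_spent": totals.get(c["customer_id"], 0.0)} for c in customers]


integrated = integrate(crm, orders)
```

Orders with no matching customer are dropped in this sketch; a real integration step would need an explicit policy for such unmatched records.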
What is the primary goal of the data mining process?
a) To store data efficiently
b) To visualize data trends
c) To uncover interesting patterns and models
d) To clean and organize datasets
c) To uncover interesting patterns and models
What are “outliers” in data mining?
a) Common data points in a dataset
b) Data points that deviate significantly from others
c) Summary of the entire dataset
d) Missing data points
b) Data points that deviate significantly from others
Which type of pattern does data mining NOT aim to find?
a) Associations
b) Correlations
c) Predictions
d) Irrelevant trends
d) Irrelevant trends
What does the term “Big Data” refer to?
a) Small datasets processed in real-time
b) Vast amounts of data characterized by volume, velocity, and variety
c) Data limited to structured formats
d) Data that only includes images and videos
b) Vast amounts of data characterized by volume, velocity, and variety
Big Data is characterized by which three V’s?
a) Value, Validation, Velocity
b) Variety, Volume, Velocity
c) Volume, Verification, Variability
d) Visualization, Variety, Value
b) Variety, Volume, Velocity
What is a key challenge in mining Big Data?
a) Limited storage space
b) Poor visualization tools
c) Efficient handling of high velocity and volume
d) Incompatibility of algorithms with structured data
c) Efficient handling of high velocity and volume
Why is Big Data important for data mining?
a) It allows access to unlimited data storage
b) It provides vast, diverse datasets for uncovering patterns
c) It simplifies machine learning algorithms
d) It only focuses on small subsets of data
b) It provides vast, diverse datasets for uncovering patterns
What is Knowledge Discovery from Data (KDD)?
a) Cleaning and summarizing datasets
b) A process that involves extracting useful information from raw data
c) A tool used to query databases
d) A step focused solely on visualization
b) A process that involves extracting useful information from raw data
What distinguishes KDD from simple database querying?
a) KDD discovers previously unknown patterns, not just records that match a query
b) KDD is only for structured data
c) KDD relies on external tools
d) KDD ignores data cleaning steps
a) KDD discovers previously unknown patterns, not just records that match a query
Why might outliers be important rather than ignored?
a) They make data cleaning easier
b) They can reveal valuable anomalies such as fraud
c) They confirm dataset accuracy
d) They are always indicative of errors
b) They can reveal valuable anomalies such as fraud
What is the difference between structured and unstructured data?
a) Unstructured data cannot be analyzed
b) Structured data has clear formats and attributes
c) Unstructured data is error-prone
d) Structured data is always accurate
b) Structured data has clear formats and attributes
What is an example of predictive data mining?
a) Clustering similar customers
b) Analyzing frequent purchases
c) Predicting future sales based on patterns
d) Summarizing datasets
c) Predicting future sales based on patterns
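As one possible illustration of predictive mining, the sketch below fits a straight-line trend to past sales by ordinary least squares and extrapolates one period ahead. The monthly figures and the `fit_line` helper are made up for the example; real forecasting would usually need a richer model.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [100.0, 120.0, 140.0, 160.0]

slope, intercept = fit_line(months, sales)
forecast_month_5 = slope * 5 + intercept
```

With this perfectly linear toy data the fitted slope is 20 units per month, so the forecast for month 5 is 180.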
Which of the following is NOT a step in the KDD process?
a) Data transformation
b) Pattern evaluation
c) Knowledge presentation
d) Web scraping
d) Web scraping
Which of these is a standard method for outlier detection?
a) Statistical tests
b) Deep learning only
c) Manual analysis
d) Regression
a) Statistical tests
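One common statistical approach is the interquartile range (IQR) rule, which flags values lying more than 1.5 IQRs outside the quartiles. The sketch below is only an illustration: the `iqr_outliers` helper and the sample readings are assumptions, not part of the deck.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
```

The IQR rule is less sensitive to the outlier itself than a mean-and-standard-deviation test, which is why it is often preferred for small samples.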
How does data cleaning contribute to data mining?
a) By adding more patterns
b) By removing noisy, inconsistent, and irrelevant data
c) By ensuring all models fit all datasets
d) By storing data efficiently
b) By removing noisy, inconsistent, and irrelevant data
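A minimal sketch of what a cleaning pass might look like: it drops records with missing or impossible values and removes exact duplicates. The field names, the 0 to 120 age range, and the `clean` helper are assumptions made for the example.

```python
# Hypothetical raw records with typical quality problems.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 1, "age": 34},    # exact duplicate
    {"id": 3, "age": -5},    # impossible value
]


def clean(records):
    """Keep the first copy of each record whose age is present and plausible."""
    seen = set()
    cleaned = []
    for record in records:
        age = record["age"]
        if age is None or not (0 <= age <= 120):
            continue  # skip missing or out-of-range ages
        key = (record["id"], age)
        if key in seen:
            continue  # skip duplicates
        seen.add(key)
        cleaned.append(record)
    return cleaned


cleaned = clean(raw)
```

Dropping bad records is only one option; imputing missing values or correcting them from another source may be more appropriate, depending on the dataset.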
What are the four primary types of data (levels of measurement)?
a) Binary, Continuous, Nominal, Ratio
b) Nominal, Ordinal, Interval, Ratio
c) Numeric, Text, Boolean, Ratio
d) Structured, Unstructured, Semi-structured, Nominal
b) Nominal, Ordinal, Interval, Ratio
What is nominal data?
a) Data that represents order but not magnitude
b) Data with categories that have no inherent order
c) Data that measures absolute zero
d) Data with equal intervals but no true zero
b) Data with categories that have no inherent order
Which type of data reflects order but not distance between values?
a) Nominal
b) Ordinal
c) Interval
d) Ratio
b) Ordinal
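To illustrate ordinal data, the sketch below sorts category labels by an explicit rank. The size labels and the `SIZE_RANK` mapping are hypothetical; the ranks encode order only, so differences between them carry no meaning.

```python
# Hypothetical ordinal scale: the ranks define order, not distance.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}

observed = ["medium", "small", "large", "small"]

# Sorting by rank is valid for ordinal data.
ordered = sorted(observed, key=SIZE_RANK.__getitem__)

# Comparisons are meaningful as well.
large_exceeds_small = SIZE_RANK["large"] > SIZE_RANK["small"]
```

Averaging the ranks would treat the gaps between categories as equal, which ordinal data does not guarantee.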
What distinguishes ratio data from interval data?
a) Ratio data cannot have a zero value
b) Ratio data includes a true zero point
c) Ratio data is categorical
d) Ratio data lacks any numerical meaning
b) Ratio data includes a true zero point
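Temperature gives a quick way to see why a true zero matters: Celsius is an interval scale, while Kelvin is a ratio scale. In the sketch below (the helper name is illustrative), dividing two Celsius readings gives a number with no physical meaning, whereas the same division in Kelvin is a valid ratio.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale Celsius reading to ratio-scale Kelvin."""
    return celsius + 273.15


warm, cool = 20.0, 10.0

# Interval scale: this "ratio" depends on the arbitrary zero point.
naive_ratio = warm / cool

# Ratio scale: Kelvin has a true zero, so the ratio is meaningful.
true_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)
```

Here `naive_ratio` is 2.0, but 20 °C is not "twice as hot" as 10 °C: on the Kelvin scale the ratio is only about 1.035.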
What is data quality?
a) The process of data storage
b) The degree to which data meets user needs for accuracy and reliability
c) A measure of data size and velocity
d) The ability to visualize datasets
b) The degree to which data meets user needs for accuracy and reliability