Sir Kyle Flashcards
What is data analytics?
a) The process of collecting and organizing data
b) The process of analyzing data to make decisions
c) The process of creating data
d) The process of deleting outdated data
b
Why is data analytics important for businesses?
a) It helps in predicting market trends
b) It provides insights for better decision-making
c) It identifies business performance issues
d) All of the above
d
Which type of analytics predicts future outcomes based on data?
a) Descriptive analytics
b) Diagnostic analytics
c) Predictive analytics
d) Prescriptive analytics
c
What is descriptive analytics?
a) Analytics that explains what has happened
b) Analytics that predicts what will happen
c) Analytics that determines why something happened
d) Analytics that recommends actions to take
a
Which of these is an example of structured data?
a) Social media posts
b) Email contents
c) Customer database with names and phone numbers
d) Images stored in a file
c
What type of data visualization is best suited for showing parts of a whole?
a) Line chart
b) Pie chart
c) Scatter plot
d) Histogram
b
Big Data refers to datasets that are…
a) Easy to store and manage
b) Too large and complex for traditional data-processing methods
c) Small but require a lot of computation
d) Structured and easy to analyze
b
What is the purpose of A/B testing in data analytics?
a) To compare two versions of a product or feature to determine which performs better
b) To clean data
c) To automate the analysis process
d) To visualize complex data
a
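A/B testing compares two versions by their outcomes. Below is a minimal Python sketch that assumes the metric is a conversion rate. It uses a two-proportion z-test, which is one common choice among several. The function name `two_proportion_z_test` and the visitor counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B.

    Returns (z statistic, two-sided p-value) using the pooled-proportion
    normal approximation.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: 1,000 visitors saw each version of a page
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (0.05 is a common cutoff) suggests that the difference between A and B is unlikely to be due to chance alone.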
Which of the following describes prescriptive analytics?
a) Provides insights into why things happened
b) Describes what is happening in real-time
c) Recommends actions based on data analysis
d) Predicts future trends
c
What is the significance of data governance in analytics?
a) To regulate the storage of data
b) To ensure data privacy, security, and compliance
c) To visualize large datasets
d) To improve the speed of data processing
b
Which of the following is a type of data analytics?
A) Predictive Analytics
B) Descriptive Analytics
C) Prescriptive Analytics
D) All of the above
d
What type of data is “Gender” in a dataset?
A) Quantitative
B) Qualitative
C) Continuous
D) Interval
b
Which chart is most commonly used to show trends over time?
A) Pie Chart
B) Bar Chart
C) Line Chart
D) Scatter Plot
c
In data cleaning, which process removes duplicate values in a dataset?
A) Normalization
B) Deduplication
C) Data Merging
D) Standardization
b
Which of the following is NOT a form of data visualization?
a) Bar Chart
b) Line Graph
c) Base Graph
d) Scatter Plot
c
Which chart is best suited for showing the distribution of data across different categories?
a) Line chart
b) Pie chart
c) Bar chart
d) Scatter plot
c
When creating a histogram, the X-axis represents:
a) Data frequency
b) Data values or ranges
c) Percentages
d) None of the above
b
In a histogram, what does the height of each bar represent?
a) The sum of data values in that range
b) The frequency or count of data in a specific range
c) The total data collected
d) The average of the data points in that bin
b
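The histogram cards can be checked with a small sketch. Each bin is a range of values on the X-axis, and each bar height is the count of values in that range. The helper `histogram_counts` and the sample ages are hypothetical, and values that fall outside the bins are simply ignored here.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each bin [lo, lo + width).

    The bin ranges go on the X-axis; the counts are the bar heights.
    Values outside the covered range are skipped.
    """
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


ages = [22, 25, 27, 31, 33, 34, 38, 41, 45, 58]
start, width = 20, 10
counts = histogram_counts(ages, start=start, width=width, n_bins=4)
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"[{lo}, {lo + width}): {'#' * c} ({c})")
```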
If a histogram has a longer tail extending to the right, what does this indicate about the distribution of the data?
a) Symmetric distribution
b) Positively skewed distribution
c) Negatively skewed distribution
d) Uniform distribution
b
Which measure of central tendency is most affected by outliers?
a) Mean
b) Median
c) Mode
d) All are equally affected
a
The median is defined as:
a) The average of all values
b) The most frequently occurring value
c) The middle value when data is ordered
d) The range of the dataset
c
When would the median be a better measure of central tendency than the mean?
a) When data is symmetrically distributed
b) When data has outliers or is skewed
c) When data is categorical
d) When data contains repeated values
b
What does the mean of a dataset represent?
a) The most frequently occurring value
b) The value that divides the data into two equal parts
c) The average of all data points
d) The value with the highest frequency
c
If the mean and median of a dataset are equal, what type of distribution does the data likely have?
a) Skewed to the left
b) Skewed to the right
c) Relatively symmetric
d) Uniform distribution
c
Which measure of central tendency divides the dataset into two equal parts?
a) Mean
b) Median
c) Mode
d) Interquartile range
b
In a dataset where the mean is greater than the median, what can you infer about the shape of the distribution?
a) It is symmetric
b) It is positively skewed (right-skewed)
c) It is negatively skewed (left-skewed)
d) It is normally distributed
b
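This sketch shows the mean/median rule of thumb in practice, using small made-up datasets. Comparing the mean and median is a heuristic, not a formal skewness measure, and it can mislead for some distributions. The helper `skew_direction` is hypothetical.

```python
from statistics import mean, median


def skew_direction(values):
    """Rough rule of thumb: compare mean and median to guess the skew direction."""
    m, md = mean(values), median(values)
    if m > md:
        return "positively skewed (right)"
    if m < md:
        return "negatively skewed (left)"
    return "roughly symmetric"


right_tail = [1, 2, 2, 3, 3, 3, 4, 15]      # one large value stretches the right tail
left_tail = [1, 12, 13, 13, 14, 14, 14, 15]  # one small value stretches the left tail
print(skew_direction(right_tail))
print(skew_direction(left_tail))
```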
When analyzing income data, which measure of central tendency is typically preferred and why?
a) Mean, because it includes all data values
b) Median, because it is less influenced by extreme outliers
c) Mode, because it represents the most common income level
d) Mean, because it minimizes the impact of variance
b
In a dataset with outliers, why might the median be a better measure of central tendency than the mean?
a) The median reflects all values in the dataset
b) The mean is distorted by extreme values, while the median is not
c) The mode is more reliable than the mean
d) The mean and median are always equal
b
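The outlier cards can be verified with Python's standard `statistics` module. The income figures below are invented for illustration. A single extreme value drags the mean far upward, while the median barely moves.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes (in $1,000s); the last value is an extreme outlier
incomes = [42, 45, 45, 48, 51, 53, 55, 900]

print("mean:  ", mean(incomes))
print("median:", median(incomes))
print("mode:  ", mode(incomes))

# Drop the outlier and compare how much each measure changes
without_outlier = incomes[:-1]
print("mean without outlier:  ", mean(without_outlier))
print("median without outlier:", median(without_outlier))
```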
- Questions
- Data Collection
- Data Cleaning
- Data Analysis
- Data Interpretation
Data Analytics Workflow
Why are measures of central tendency important for summarizing large datasets?
a) They reduce the complexity of data by providing a single representative value
b) They eliminate the need to analyze individual data points
c) They measure the spread of the data
d) They provide insight into data variability
a
In 1965, Intel co-founder ____ predicted that
the number of transistors on a chip would double
roughly every two years, with a minimal rise in cost.
Gordon Moore
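Taking the card's "double roughly every two years" at face value, the growth can be written as a formula. Here $N_0$ is the transistor count in a starting year and $t$ is the number of years elapsed; both symbols are introduced only for illustration.

```latex
N(t) = N_0 \cdot 2^{t/2}, \qquad \frac{N(10)}{N_0} = 2^{10/2} = 2^{5} = 32
```

Over ten years, the transistor count therefore grows about 32-fold.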
“I would expect that next year, people will share twice as
much information as they share this year, and next year,
they will be sharing twice as much as they did the year
before”
Mark Zuckerberg
A characteristic of members of a population
e.g., market share, revenue, season, Bike_Rentals, temperature,
date, weather condition
Variables
Observations are named or labeled, with no particular order or ranking imposed on the data.
Words, letters, and even numbers are used to classify the data.
Nominal Level
Observations of variables
e.g., 11%, $225M, summer, 985, 23.5°, 1/12/2011, McDonald's
Data
Contains variables and observations, arranged as an array (rows and columns)
Data Set
Indicates an actual amount (numerical). The order of the values and the difference between them
can be known. Its limitation is that it has no “true zero”.
Interval Level
The degree to which all required data is known.
Completeness
Describes ranking or order. The difference or ratio between rankings may not always be
the same.
Ordinal Level
It has the same properties as the interval level: the order and difference can be described.
Additionally, it has a true zero, so the ratio between two values is meaningful.
Ratio Level
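The difference between the interval and ratio levels is easiest to see with temperature. This is a standard textbook example, not one taken from the cards. Celsius has an arbitrary zero point (interval level), while Kelvin starts at absolute zero (ratio level). The helper `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(c):
    """Kelvin is a ratio scale: 0 K is absolute zero, a true zero point."""
    return c + 273.15


# Celsius is an interval scale: differences are meaningful, ratios are not
c_low, c_high = 10.0, 20.0
print("difference in C:", c_high - c_low)   # meaningful: 10 degrees warmer
print("naive ratio in C:", c_high / c_low)  # 2.0, yet 20 C is not "twice as hot"

# On the Kelvin (ratio) scale the ratio is meaningful
k_low, k_high = celsius_to_kelvin(c_low), celsius_to_kelvin(c_high)
print("ratio in K:", round(k_high / k_low, 3))  # about 1.035
```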
Accuracy. Ensure your data is close to the true values (real-world objects it
represents).
Validity. Whether the data measures what it is supposed to measure.
Completeness. The degree to which all required data is known.
Consistency. Ensure your data is consistent within the same dataset and/or
across multiple data sets.
Uniformity. The degree to which the data is specified using the same unit of
measure.
DATA QUALITY DIMENSIONS
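Two of these dimensions, completeness and uniformity, are easy to measure directly. Below is a minimal sketch that assumes records are Python dictionaries with `None` for missing values. The sensor readings and the helper names `completeness` and `is_uniform` are hypothetical.

```python
# Hypothetical temperature readings; one sensor B reading is in Fahrenheit
readings = [
    {"sensor": "A", "value": 21.5, "unit": "C"},
    {"sensor": "A", "value": None, "unit": "C"},
    {"sensor": "B", "value": 70.7, "unit": "F"},
    {"sensor": "B", "value": 22.0, "unit": "C"},
]


def completeness(rows, field):
    """Share of rows where `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def is_uniform(rows, field="unit"):
    """True if every row uses the same unit of measure."""
    return len({r[field] for r in rows}) == 1


print("completeness of value:", completeness(readings, "value"))
print("uniform units:", is_uniform(readings))
```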
Ensure your data is close to the true values (real-world objects it
represents).
Accuracy
Whether the data measures what it is supposed to measure
Validity
Right (positively) skewed:
The right tail is longer
Values of data extend to the right
Skewed to the Right
Ensure your data is consistent within the same dataset and/or
across multiple data sets.
Consistency
Gather data from various sources, such as databases, files, APIs, or surveys.
Ensure that the data collected is relevant to your research question or analysis
objectives.
Data Collection and Acquisition
The degree to which the data is specified using the same unit of
measure.
Uniformity
Examine the raw data to get a sense of its structure and contents.
Check for missing values, outliers, and anomalies that may require attention.
Data Inspection
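Below is a minimal sketch of the inspection step in plain Python. It assumes records arrive as dictionaries with `None` marking missing values. The code counts missing values per column and flags possible outliers with the 1.5 × IQR rule (Tukey's fences), which is one common convention. It uses `statistics.quantiles` with `method="inclusive"`, and other quartile conventions give slightly different fences. The records and helper names are hypothetical.

```python
from statistics import quantiles

# Hypothetical raw records; None marks a missing value
records = [
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 2, "age": None, "city": "Boston"},
    {"id": 3, "age": 29, "city": None},
    {"id": 4, "age": 31, "city": "Austin"},
    {"id": 5, "age": 230, "city": "Denver"},  # likely a data-entry error
    {"id": 6, "age": 27, "city": "Boston"},
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]


ages = [r["age"] for r in records if r["age"] is not None]
print("missing per column:", missing_counts(records))
print("possible outliers in age:", iqr_outliers(ages))
```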
Address missing data by deciding whether to fill in missing values or remove records with missing
values.
Correct data entry errors, inconsistencies, outliers, and duplicated records.
Standardize data formats (e.g., date formats, data types) to ensure consistency.
Data Cleansing
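The three cleansing steps can be sketched in plain Python. This sketch assumes records are dictionaries and treats only exact duplicates as duplicates. It fills missing amounts with the median; removing those rows is the other option the card mentions. The data, the accepted date formats, and the helper names are all hypothetical.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw rows: a duplicate, a missing amount, and mixed date formats
raw = [
    {"id": 1, "amount": 120.0, "date": "2011-01-12"},
    {"id": 2, "amount": None, "date": "01/13/2011"},
    {"id": 1, "amount": 120.0, "date": "2011-01-12"},  # exact duplicate
    {"id": 3, "amount": 95.5, "date": "14 Jan 2011"},
]

# Assumed set of input date formats; extend this tuple for other sources
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def to_iso_date(text):
    """Parse a date in any of DATE_FORMATS and return it as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(rows):
    # 1. Deduplicate: keep the first occurrence of each identical record
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input rows are untouched
    # 2. Fill missing amounts with the median of the known amounts
    fill = median(r["amount"] for r in unique if r["amount"] is not None)
    for r in unique:
        if r["amount"] is None:
            r["amount"] = fill
        # 3. Standardize every date to ISO 8601 (YYYY-MM-DD)
        r["date"] = to_iso_date(r["date"])
    return unique


for row in clean(raw):
    print(row)
```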
Encode categorical variables into numerical format using techniques like one-hot encoding or label encoding.
Normalize or scale numerical features if necessary to bring them to a common scale.
Data Transformation
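Below is a plain-Python sketch of the two encodings and of min-max scaling, using made-up weather and temperature values. Label encoding imposes an artificial order on the categories, which is why one-hot encoding is often preferred for nominal variables. The function names are hypothetical, not a library API.

```python
def label_encode(categories):
    """Map each distinct category to an integer (alphabetical order)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def one_hot_encode(categories):
    """Turn each category into a 0/1 vector with one column per distinct value."""
    columns = sorted(set(categories))
    return [[1 if c == col else 0 for col in columns] for c in categories], columns


def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical: nothing to spread
    return [(v - lo) / (hi - lo) for v in values]


weather = ["sunny", "rainy", "sunny", "cloudy"]
temps = [23.5, 12.0, 30.0, 18.0]

print(label_encode(weather))
print(one_hot_encode(weather))
print(min_max_scale(temps))
```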
Combine data from multiple sources if needed, ensuring that there are common identifiers to merge the
data correctly.
Data Integration
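Merging on a common identifier can be sketched as an inner join. The tables and the `customer_id` key are hypothetical. The order with no matching customer is silently dropped, which is why identifiers should be checked before merging.

```python
# Hypothetical sources that share the identifier "customer_id"
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "total": 40.0},
    {"order_id": 102, "customer_id": 3, "total": 15.5},
    {"order_id": 103, "customer_id": 1, "total": 22.0},
    {"order_id": 104, "customer_id": 9, "total": 8.0},  # no matching customer
]


def inner_join(left, right, key):
    """Combine rows whose `key` values match (like a SQL INNER JOIN)."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = []
    for r in right:
        for match in index.get(r[key], []):
            joined.append({**match, **r})
    return joined


merged = inner_join(customers, orders, "customer_id")
for row in merged:
    print(row)
```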
Create visualizations to explore the data further
and identify patterns, relationships, or outliers.
Visualization helps in understanding the data’s
characteristics and guiding further analysis.
Data Visualization
Data Creation/Collection
Data Ingestion (ETL)
Data Storage
Data Presentation and Visualization
Data Sharing and Distribution
Data Archiving and Retention
Data Backup and Disaster Recovery
Data Deletion and Disposal
Data Life Cycle
Left (negatively) skewed:
The left tail is longer
Values of data extend to the left
Skewed to the Left