Sir Kyle Flashcards

1
Q

What is data analytics?
a) The process of collecting and organizing data
b) The process of analyzing data to make decisions
c) The process of creating data
d) The process of deleting outdated data

A

b

2
Q

Why is data analytics important for businesses?
a) It helps in predicting market trends
b) It provides insights for better decision-making
c) It identifies business performance issues
d) All of the above

A

d

2
Q

Which type of analytics predicts future outcomes based on data?
a) Descriptive analytics
b) Diagnostic analytics
c) Predictive analytics
d) Prescriptive analytics

A

c

2
Q

What is descriptive analytics?
a) Analytics that explains what has happened
b) Analytics that predicts what will happen
c) Analytics that determines why something happened
d) Analytics that recommends actions to take

A

a

3
Q

Which of these is an example of structured data?
a) Social media posts
b) Email contents
c) Customer database with names and phone numbers
d) Images stored in a file

A

c

4
Q

What type of data visualization is best suited for showing parts of a whole?
a) Line chart
b) Pie chart
c) Scatter plot
d) Histogram

A

b

5
Q

Big Data refers to datasets that are…
a) Easy to store and manage
b) Too large and complex for traditional data-processing methods
c) Small but require a lot of computation
d) Structured and easy to analyze

A

b

6
Q

What is the purpose of A/B testing in data analytics?
a) To compare two versions of a product or feature to determine which performs better
b) To clean data
c) To automate the analysis process
d) To visualize complex data

A

a

6
Q

Which of the following describes prescriptive analytics?
a) Provides insights into why things happened
b) Describes what is happening in real-time
c) Recommends actions based on data analysis
d) Predicts future trends

A

c

7
Q

What is the significance of data governance in analytics?
a) To regulate the storage of data
b) To ensure data privacy, security, and compliance
c) To visualize large datasets
d) To improve the speed of data processing

A

b

8
Q

Which of the following is a type of data analytics?
A) Predictive Analytics
B) Descriptive Analytics
C) Prescriptive Analytics
D) All of the above

A

d

9
Q

What type of data is “Gender” in a dataset?
A) Quantitative
B) Qualitative
C) Continuous
D) Interval

A

b

10
Q

Which chart is most commonly used to show trends over time?
A) Pie Chart
B) Bar Chart
C) Line Chart
D) Scatter Plot

A

c

11
Q

In data cleaning, which process removes duplicate values in a dataset?
A) Normalization
B) Deduplication
C) Data Merging
D) Standardization

A

b

13
Q

Which of the following is NOT a form of data visualization?
a) Bar Chart
b) Line Graph
c) Base Graph
d) Scatter Plot

A

c

14
Q

Which chart is best suited for showing the distribution of data across different categories?
a) Line chart
b) Pie chart
c) Bar chart
d) Scatter plot

A

c

15
Q

When creating a histogram, the X-axis represents:
a) Data frequency
b) Data values or ranges
c) Percentages
d) None of the above

A

b

16
Q

In a histogram, what does the height of each bar represent?
a) The sum of data values in that range
b) The frequency or count of data in a specific range
c) The total data collected
d) The average of the data points in that bin

A

b
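
As an illustrative sketch of the two histogram cards above, the bar heights can be computed by counting how many values fall into each range. The sample values and the bin width of 10 are assumptions made up for this example, not part of the deck.

```python
from collections import Counter

# Hypothetical sample values, chosen only for illustration.
values = [3, 7, 12, 15, 18, 21, 22, 25, 38]
bin_width = 10  # assumed bin width

# Map each value to the lower edge of its bin, then count per bin.
counts = Counter((v // bin_width) * bin_width for v in values)

for lower in sorted(counts):
    # X-axis: the value range. Bar height: the frequency in that range.
    label = f"{lower}-{lower + bin_width - 1}"
    print(f"{label:>6}: {'#' * counts[lower]}")
```

Each printed row stands in for one bar: the label is the range on the X-axis and the number of `#` marks is the count of values in that range.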

17
Q

If the bars in a histogram are skewed to the right, what does this indicate about the distribution of the data?
a) Symmetric distribution
b) Positively skewed distribution
c) Negatively skewed distribution
d) Uniform distribution

A

b

17
Q

Which measure of central tendency is most affected by outliers?
a) Mean
b) Median
c) Mode
d) All are equally affected

A

a

18
Q

The median is defined as:
a) The average of all values
b) The most frequently occurring value
c) The middle value when data is ordered
d) The range of the dataset

A

c

19
Q

When would the median be a better measure of central tendency than the mean?
a) When data is symmetrically distributed
b) When data has outliers or is skewed
c) When data is categorical
d) When data contains repeated values

A

b

19
Q

What does the mean of a dataset represent?
a) The most frequently occurring value
b) The value that divides the data into two equal parts
c) The average of all data points
d) The value with the highest frequency

A

c

20
Q

If the mean and median of a dataset are equal, what type of distribution does the data likely have?
a) Skewed to the left
b) Skewed to the right
c) Relatively Symmetric
d) Uniform distribution

A

c

21
Q

Which measure of central tendency divides the dataset into two equal parts?
a) Mean
b) Median
c) Mode
d) Interquartile range

A

b

22
Q

In a dataset where the mean is greater than the median, what can you infer about the shape of the distribution?
a) It is symmetric
b) It is positively skewed (right-skewed)
c) It is negatively skewed (left-skewed)
d) It is normally distributed

A

b

22
Q

When analyzing income data, which measure of central tendency is typically preferred and why?
a) Mean, because it includes all data values
b) Median, because it is less influenced by extreme outliers
c) Mode, because it represents the most common income level
d) Mean, because it minimizes the impact of variance

A

b

22
Q

In a dataset with outliers, why might the median be a better measure of central tendency than the mean?
a) The median reflects all values in the dataset
b) The mean is distorted by extreme values, while the median is not
c) The mode is more reliable than the mean
d) The mean and median are always equal

A

b
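
A small sketch can make the mean-versus-median cards concrete. The income figures below are invented for illustration, and the last one is a deliberate outlier; the example uses only Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical incomes, made up for this example.
incomes = [30_000, 32_000, 35_000, 38_000, 40_000]

# The same data with one extreme value added.
with_outlier = incomes + [1_000_000]

# Without the outlier, mean and median agree (roughly symmetric data).
print("no outlier  :", mean(incomes), median(incomes))

# With the outlier, the mean is pulled far to the right,
# while the median moves only slightly.
print("with outlier:", round(mean(with_outlier), 2), median(with_outlier))
```

In the second case the mean ends up well above the median, which is the pattern the cards associate with a positively skewed (right-skewed) distribution.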

23
Q
  1. Questions
  2. Data Collection
  3. Data Cleaning
  4. Data Analysis
  5. Data Interpretation
A

Data Analytics Workflow

23
Q

Why are measures of central tendency important for summarizing large datasets?
a) They reduce the complexity of data by providing a single representative value
b) They eliminate the need to analyze individual data points
c) They measure the spread of the data
d) They provide insight into data variability

A

a

23
Q

In 1965, Intel co-founder ____ predicted that
the number of transistors on a chip would double
roughly every two years, with a minimal rise in cost.

A

Gordon Moore

24
Q

“I would expect that next year, people will share twice as
much information as they share this year, and next year,
they will be sharing twice as much as they did the year
before”

A

Mark Zuckerberg

25
Q

A characteristic of members of a population.

e.g., market share, revenue, season, Bike_Rentals, temperature,
date, weather condition

A

Variables

25
Q

Observations can be named without particular order or ranking imposed on the data.

Words, letters, and even numbers are used to classify the data

A

Nominal Value

25
Q

Observations of a variable

e.g., 11%, $225M, summer, 985, 23.5˚, 1/12/2011, McDonald's

A

Data

25
Q

contains variables and observations

Array (rows and columns)

A

Data Set

26
Q

Indicates an actual amount (numerical). The order and the difference between the values can be known. Its limitation is that it has no “true zero”.

A

Interval Level

26
Q

The degree to which all required data is known.

A

Completeness

26
Q

Describes ranking or order. The difference or ratio between rankings may not always be the same.

A

Ordinal Value

26
Q

It has the same properties as the interval level. The order and difference can be described.

Additionally, it has a true zero, and the ratio between two points is meaningful.

A

Ratio Level
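
To illustrate the interval and ratio cards, temperature is a common textbook example (it is an assumption here, not something the cards themselves name). Celsius is an interval scale because 0 °C is not a true zero, while Kelvin is a ratio scale. The readings below are hypothetical.

```python
# Two hypothetical temperature readings in degrees Celsius.
celsius_a, celsius_b = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = celsius_b - celsius_a

# The Celsius ratio is 2.0, but 20 °C is not "twice as hot" as 10 °C,
# because 0 °C is an arbitrary zero point.
ratio_celsius = celsius_b / celsius_a

# Kelvin has a true zero, so its ratio is meaningful.
kelvin_a, kelvin_b = celsius_a + 273.15, celsius_b + 273.15
ratio_kelvin = kelvin_b / kelvin_a

print(difference, ratio_celsius, round(ratio_kelvin, 3))
```

The difference is the same on both scales, but only the Kelvin ratio says something real about the quantities being compared.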

26
Q

Accuracy. Ensure your data is close to the true values (real-world objects it
represents).

Validity. Whether the data measures what it is supposed to measure.

Completeness. The degree to which all required data is known.

Consistency. Ensure your data is consistent within the same dataset and/or
across multiple data sets.

Uniformity. The degree to which the data is specified using the same unit of
measure.

A

DATA QUALITY DIMENSIONS

26
Q

Ensure your data is close to the true values (real-world objects it
represents).

A

Accuracy

26
Q

Whether the data measures what it is supposed to measure.

A

Validity

27
Q

Right (positively) skewed:

The right tail is longer

Values of data extend to the right

A

Skewed to the Right

27
Q

Ensure your data is consistent within the same dataset and/or
across multiple data sets.

A

Consistency

27
Q

Gather data from various sources, such as databases, files, APIs, or surveys.

Ensure that the data collected is relevant to your research question or analysis
objectives.

A

Data Collection and Acquisition

27
Q

The degree to which the data is specified using the same unit of
measure.

A

Uniformity

27
Q

Examine the raw data to get a sense of its structure and contents.

Check for missing values, outliers, and anomalies that may require attention.

A

Data Inspection

27
Q

Address missing data by deciding whether to fill in missing values or remove records with missing
values.

Correct any data entry errors, inconsistencies, outliers, or duplicated records.

Standardize data formats (e.g., date formats, data types) to ensure consistency.

A

Data Cleansing
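
The cleansing steps on this card can be sketched in plain Python. Everything below is a hypothetical illustration: the records, the field names (`id`, `date`, `rentals`), the two accepted date formats, and the choice to fill missing values with the mean are all assumptions, not a prescribed method.

```python
from datetime import datetime

# Hypothetical raw records: one duplicate, one missing value,
# and two different date formats.
raw = [
    {"id": 1, "date": "2011-01-12", "rentals": 985},
    {"id": 1, "date": "2011-01-12", "rentals": 985},
    {"id": 2, "date": "13/01/2011", "rentals": None},
    {"id": 3, "date": "2011-01-14", "rentals": 1015},
]

def parse_date(text):
    """Try each assumed input format and return an ISO date string."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

# 1. Remove duplicated records (keep the first occurrence of each id).
seen, cleaned = set(), []
for row in raw:
    if row["id"] not in seen:
        seen.add(row["id"])
        cleaned.append(dict(row))

# 2. Fill missing values, here with the mean of the known values.
known = [r["rentals"] for r in cleaned if r["rentals"] is not None]
fill_value = sum(known) / len(known)

for r in cleaned:
    if r["rentals"] is None:
        r["rentals"] = fill_value
    # 3. Standardize every date to one format (ISO 8601).
    r["date"] = parse_date(r["date"])

print(cleaned)
```

Removing the record with the missing value instead of filling it is the other option the card mentions; which one is appropriate depends on the analysis.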

27
Q

Encode categorical variables into numerical format using techniques like one-hot encoding or label encoding.

Normalize or scale numerical features if necessary to bring them to a common scale.

A

Data Transformation
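
As a rough sketch of the techniques named on this card, the example below applies label encoding, one-hot encoding, and min-max scaling to made-up data. The `seasons` and `temps` values are hypothetical, and min-max scaling is only one of several possible ways to bring numbers to a common scale.

```python
# Hypothetical categorical and numerical columns.
seasons = ["summer", "winter", "summer", "spring"]
temps = [23.5, 2.0, 30.0, 16.0]

# Fix a stable category order so the encodings are reproducible.
categories = sorted(set(seasons))

# Label encoding: each category becomes a single integer.
label_of = {c: i for i, c in enumerate(categories)}
labels = [label_of[s] for s in seasons]

# One-hot encoding: each category becomes its own 0/1 column.
one_hot = [[1 if s == c else 0 for c in categories] for s in seasons]

# Min-max scaling: rescale numbers to the range [0, 1].
low, high = min(temps), max(temps)
scaled = [(t - low) / (high - low) for t in temps]

print(categories)
print(labels)
print(one_hot)
print(scaled)
```

Label encoding implies an order between categories, so one-hot encoding is often the safer choice for nominal data.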

27
Q

Combine data from multiple sources if needed, ensuring that there are common identifiers to merge the
data correctly.

A

Data Integration
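
A minimal sketch of merging two sources on a common identifier follows. The `customers` and `orders` records and the `customer_id` key are hypothetical, invented only to show the idea.

```python
# Hypothetical data from two sources that share a "customer_id" field.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 3, "total": 40.0},
    {"order_id": 12, "customer_id": 1, "total": 5.0},
]

# Index one source by the common identifier for fast lookup.
by_id = {c["customer_id"]: c for c in customers}

# Keep orders whose identifier exists in both sources (an inner join).
merged = [
    {**o, "name": by_id[o["customer_id"]]["name"]}
    for o in orders
    if o["customer_id"] in by_id
]

# Track records that could not be matched so they can be reviewed.
unmatched = [o for o in orders if o["customer_id"] not in by_id]

print(merged)
print(unmatched)
```

Checking the unmatched records is a simple way to confirm that the identifiers really are common to both sources before trusting the merged result.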

27
Q

Create visualizations to explore the data further
and identify patterns, relationships, or outliers.

Visualization helps in understanding the data’s
characteristics and guiding further analysis.

A

Data Visualization

27
Q

Data Creation/Collection

Data Ingestion (ETL)

Data Storage

Data Presentation and Visualization

Data Sharing and Distribution

Data Archiving and Retention

Data Backup and Disaster Recovery

Data Deletion and Disposal

A

Data Life Cycle

27
Q

Left (negatively) skewed:

The left tail is longer

Values of data extend to the left

A

Skewed to the left