Compiled Summatives - Sheet1 Flashcards

1
Q

What is the primary focus of statistics?

Predictive modeling
Data mining
Application of algorithms to inform strategic decisions
Collection, analysis, interpretation, presentation, and organization of data

A

Collection, analysis, interpretation, presentation, and organization of data

2
Q

Which of the following methods is commonly used in statistics to understand data distributions and relationships?

Algorithm application
Data mining
Hypothesis testing and regression analysis
Predictive modeling

A

Hypothesis testing and regression analysis

3
Q

What does analytics emphasize in addition to statistical methods?

Data presentation
Data interpretation
Predictive modeling and data mining
Data collection

A

Predictive modeling and data mining

4
Q

Which of the following best describes the scope of analytics?

Integrates statistical methods with advanced computational techniques
Focuses solely on hypothesis testing
Limited to data collection and presentation
Only involves data organization

A

Integrates statistical methods with advanced computational techniques

5
Q

What is the first step in the data analysis process?

Get actionable information
Extract patterns
Prepare data
Apply machine learning techniques

A

Prepare data

6
Q

Which of the following is not listed as a data source from the chart?

Printed Books
Email
Social Media Posts
Audio

A

Printed Books

7
Q

What does the second step of the process involve?

Finding patterns using algorithms
Making decisions based on information
Collecting raw information
Cleaning and transforming databases

A

Finding patterns using algorithms

8
Q

In which step would you apply machine learning techniques according to this flowchart?

Step 2 - Extract Patterns
None of the above steps explicitly mention applying machine learning techniques
Step 3 - Get Actionable Information
Step 1 - Prepare Data

A

Step 2 - Extract Patterns

9
Q

What outcome does this flowchart suggest as a result of following these steps?

Creation of new databases
Learning how to code in various programming languages
Development of new software programs
Gaining insights or making informed decisions based on analyzed data

A

Gaining insights or making informed decisions based on analyzed data

10
Q

What does transactional data primarily consist of?

Visual representations of data
General summaries of transactions
Structured, detailed information
Unstructured and random information

A

Structured, detailed information

11
Q

Which of the following is an example of transactional data?

Credit card payment
Social media posts
Weather forecasts
Movie reviews

A

Credit card payment

12
Q

What type of information is included in contractual, subscription, or account data?

Social media interactions
General market trends
Information about the type of product combined with customer characteristics
Weather patterns

A

Information about the type of product combined with customer characteristics

13
Q

Which of the following is an example of a product type mentioned in the statement?

Loan
Weather forecast
Movie review
Social media post

A

Loan

14
Q

What is the primary aim of surveys?

To extract sociodemographic and behavioral data from a particular group of people
To organize social events for communities
To entertain a particular group of people
To provide financial assistance to people

A

To extract sociodemographic and behavioral data from a particular group of people

15
Q

Surveys are typically in the form of:

Novels
Music albums
Questionnaires
Art exhibitions

A

Questionnaires

16
Q

Which of the following is NOT an example of unstructured data?

Social media posts
Media files
Sensor data
Spreadsheets

A

Spreadsheets

17
Q

What is unstructured data?

Information that resides in a traditional row-column database
Data that is always textual
Data that is always numerical
Information that does not reside in a traditional row-column database

A

Information that does not reside in a traditional row-column database

18
Q

Which of the following is an example of a purpose for which data poolers gather data?

Marketing and credit risk assessment
Weather forecasting
Event planning
Cooking recipes

A

Marketing and credit risk assessment

19
Q

What is the primary role of data poolers?

To provide financial advice
To gather and sell data for specific purposes
To develop software applications
To create new databases

A

To gather and sell data for specific purposes

20
Q

What is the first phase in the data analytics process?

Business Understanding
Modelling
Data Preparation
Evaluation

A

Business Understanding

21
Q

What is the primary goal of the Business Understanding phase?

Cleaning data for better quality
Identifying business tasks
Evaluating the model
Applying machine learning algorithms

A

Identifying business tasks

22
Q

Which phase involves selecting related data from various databases?

Data Understanding
Deployment
Data Preparation
Modelling

A

Data Understanding

23
Q

Which of the following is NOT a type of database mentioned in the Data Understanding phase?

Relational Databases
Temporal, Sequence or Time-Series Database
Social Media Databases
Data Warehouses

A

Social Media Databases

24
Q

What is another term for Data Preparation?

Data Modelling
Data Preprocessing
Data Transformation
Data Cleaning

A

Data Preprocessing

25
Q

Which of the following activities is NOT part of Data Preparation?

Aggregating data
Filling in missing values
Applying machine learning algorithms
Filtering outliers

A

Applying machine learning algorithms

26
Q

What does Data Transformation involve?

Converting different measurements into a unified numerical scale
Evaluating the model
Selecting related data from databases
Cleaning data for better quality

A

Converting different measurements into a unified numerical scale

27
Q

Which of the following is an example of categorical values?

Filtered data
Numerical scales
Ordinal values (less, moderate, strong)
Aggregated data

A

Ordinal values (less, moderate, strong)
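
As a rough illustration of the data transformation described in cards 26 and 27, the sketch below maps ordinal labels onto a numerical scale and then rescales the result. The numeric codes 1 to 3 and the sample ratings are assumptions made for this example; only the ordering less < moderate < strong comes from the card.

```python
# Hypothetical mapping: the order less < moderate < strong is from the card,
# the numeric codes 1-3 are an assumption for illustration.
ORDINAL_SCALE = {"less": 1, "moderate": 2, "strong": 3}


def encode_ordinal(values, scale=ORDINAL_SCALE):
    """Replace each ordinal label with its numeric rank."""
    return [scale[v] for v in values]


def min_max_scale(numbers):
    """Rescale numbers linearly onto the interval [0, 1]."""
    lo, hi = min(numbers), max(numbers)
    if hi == lo:
        # All values identical: no spread to rescale.
        return [0.0 for _ in numbers]
    return [(x - lo) / (hi - lo) for x in numbers]


ratings = ["moderate", "less", "strong", "less"]  # made-up sample
codes = encode_ordinal(ratings)   # [2, 1, 3, 1]
scaled = min_max_scale(codes)     # [0.5, 0.0, 1.0, 0.0]
print(codes, scaled)
```

Min-max scaling is only one possible way to put measurements on a unified scale; standardization (z-scores) is a common alternative.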

28
Q

What is the primary focus of the Modelling phase?

Applying statistical and machine learning algorithms
Identifying business tasks
Selecting related data
Cleaning data

A

Applying statistical and machine learning algorithms

29
Q

Which phase involves evaluating the performance of the model?

Deployment
Data Preparation
Business Understanding
Evaluation

A

Evaluation

30
Q

What is the final phase in the data analytics process?

Modelling
Deployment
Evaluation
Data Understanding

A

Deployment

31
Q

Which activity is part of the Data Preparation phase?

Identifying relevant data for the problem description
Evaluating the model
Applying machine learning algorithms
Filtering outliers and redundancies

A

Filtering outliers and redundancies
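
Two of the Data Preparation activities named in the cards, filling in missing values and filtering outliers, might look like the sketch below. Filling with the median and flagging outliers with the 1.5 x IQR rule are assumptions chosen for illustration; the cards do not prescribe a specific method, and the readings are made up.

```python
from statistics import median, quantiles


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def filter_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


raw = [10, 12, None, 11, 13, 12, 95]  # hypothetical readings
filled = fill_missing(raw)            # the None becomes the median
cleaned = filter_outliers(filled)     # the extreme value 95 is dropped
print(filled)
print(cleaned)
```

The median is used here because a mean would be pulled toward the extreme value before it is filtered out.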

32
Q

What type of data can be found in a Temporal, Sequence or Time-Series Database?

Static data
Aggregated data
Time-based data
Categorical data

A

Time-based data

33
Q

Which phase involves selecting the related data from many available databases to correctly describe a given business task?

Data Understanding
Evaluation
Data Preparation
Modelling

A

Data Understanding

34
Q

What is the definition of Mean?

The range of values in a dataset
The average value of a dataset
The middle value in a dataset
The most frequently occurring value in a dataset

A

The average value of a dataset

35
Q

How is the Mean calculated?

By identifying the most frequent value
By summing all values and dividing by the number of values
By subtracting the smallest value from the largest value
By finding the middle value

A

By summing all values and dividing by the number of values

36
Q

What does the Median represent?

The most frequently occurring value in a dataset
The middle value when arranged in order
The difference between the highest and lowest values
The average value of a dataset

A

The middle value when arranged in order

37
Q

Which measure of central tendency can have multiple values?

Median
Mean
Range
Mode

A

Mode
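
A short sketch of the three measures from cards 34 to 37, using Python's standard `statistics` module. The data set is made up for illustration and was chosen so that it has two modes.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 5, 7, 7, 8]  # hypothetical data set

avg = mean(data)         # sum of the values divided by their count: 35 / 7
mid = median(data)       # middle value once the data are sorted
modes = multimode(data)  # every value tied for the highest frequency

print(avg, mid, modes)   # 5 5 [3, 7]
```

Here the mean and median are single numbers, while the mode has two values (3 and 7), which is why the mode is the measure that can have multiple values.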

38
Q

What is the primary purpose of measures of central tendency?

Measuring dispersion
Solving equations
Calculating probability
Organizing, summarizing, and visualizing data

A

Organizing, summarizing, and visualizing data

39
Q

Formula for mean of population data

A

μ = (Σ x_i) / N, where N is the number of values in the population

40
Q

Formula for mean of sample data

A

x̄ = (Σ x_i) / n, where n is the number of values in the sample

41
Q

What is the midrange of the data set 11, 13, 4, 30, 9, 15?

15
17
16
18

A

17
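
The answer can be checked with a few lines of Python. The midrange is the average of the smallest and largest values; the function name below is just a label for this sketch.

```python
def midrange(values):
    """Average of the minimum and maximum of a data set."""
    return (min(values) + max(values)) / 2


data = [11, 13, 4, 30, 9, 15]
result = midrange(data)  # (4 + 30) / 2
print(result)            # 17.0
```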

42
Q

Median Formula for Grouped Datasets

A

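Median = L + ((N/2 - cf) / f) × h, where L is the lower boundary of the median class, N is the total frequency, cf is the cumulative frequency before the median class, f is the frequency of the median class, and h is the class width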

43
Q

What does the SUM function do?

Calculates the mean value of a dataset
Adds a range of cells
Returns the median value of a dataset
Returns the maximum value of a dataset

A

Adds a range of cells

44
Q

Which function would you use to calculate the arithmetic average of a range of cells?

SUMIF
AVERAGE
MEDIAN
MAX

A

AVERAGE

45
Q

For finding the smallest value in your data set, which function will you use?

AVERAGE
MIN
SUMIF
MAX

A

MIN
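
For readers who work in Python rather than a spreadsheet, the functions in cards 43 to 45 have rough standard-library analogues. The cell values below are hypothetical, and the `SUMIF` line assumes a "greater than 10" criterion purely for illustration.

```python
from statistics import mean, median

cells = [4, 8, 15, 16, 23, 42]  # stands in for a range such as A1:A6

total = sum(cells)                         # SUM: adds a range of cells
average = mean(cells)                      # AVERAGE: arithmetic mean
smallest = min(cells)                      # MIN: smallest value
largest = max(cells)                       # MAX: largest value
middle = median(cells)                     # MEDIAN: middle value
sum_if = sum(c for c in cells if c > 10)   # SUMIF with the criterion ">10"

print(total, average, smallest, largest, middle, sum_if)
```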

46
Q

Given the class boundaries 50-60, 60-70, 70-80, 80-90, and 90-100 with frequencies 5, 12, 9, 6, and 4 respectively, what is the total frequency (N)?

30
36
45
40

A

36

47
Q

What is the midpoint (d_i) of the class 70-80?

70
80
65
75

A

75

48
Q

Calculate the product of the midpoint and frequency for the class 80-90.

600
570
540
510

A

510
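
The grouped-data arithmetic in cards 46 to 48 can be reproduced as below. The last two steps, the grouped mean and the grouped median, go one step beyond the cards and use the standard textbook formulas; treat them as a sketch rather than the course's worked solution.

```python
from itertools import accumulate

# Class boundaries and frequencies from card 46.
classes = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [5, 12, 9, 6, 4]

total_n = sum(freqs)                                   # total frequency N
midpoints = [(lo + hi) / 2 for lo, hi in classes]      # class midpoints
products = [m * f for m, f in zip(midpoints, freqs)]   # midpoint x frequency

# Grouped mean: sum of (midpoint x frequency) divided by N.
grouped_mean = sum(products) / total_n

# Grouped median: L + ((N/2 - cf) / f) * h, located in the first class
# whose cumulative frequency reaches N/2.
cumulative = list(accumulate(freqs))
half = total_n / 2
idx = next(i for i, cf in enumerate(cumulative) if cf >= half)
lower, upper = classes[idx]
cf_before = cumulative[idx - 1] if idx > 0 else 0
grouped_median = lower + (half - cf_before) / freqs[idx] * (upper - lower)

print(total_n)        # 36
print(midpoints[2])   # 75.0
print(products[3])    # 510.0
print(round(grouped_mean, 2), round(grouped_median, 2))
```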

49
Q

Match each question to its corresponding statistical method.

Which factors move together?
Are there differences in distribution?
Are two populations similar?

A

Correlation Coefficient
Categorical Distribution
Analysis of Variance (ANOVA, F-Test)

50
Q

What branch of statistics involves using sample data to make conclusions or predictions about a larger population?

Inferential Statistics
Descriptive Statistics
Non-parametric Statistics
Bayesian Statistics

A

Inferential Statistics

51
Q

Which method measures the linear relationship between two numerical variables?

Pearson Correlation Coefficient
Chi-Square Test
ANOVA
T-test

A

Pearson Correlation Coefficient
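
A minimal sketch of how the Pearson correlation coefficient is computed, written with plain Python so each term of the formula is visible. The function name and the sample data are assumptions for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance term over the product of spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # perfectly increasing line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # perfectly decreasing line
print(r_up, r_down)
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.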

52
Q

What does the F-Test in ANOVA compare?

Variances within and between groups
Means of two samples
Medians of two samples
Standard deviations of two samples

A

Variances within and between groups
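
To make the "within versus between" comparison concrete, here is a sketch of the one-way ANOVA F statistic computed by hand. The three groups are made-up numbers, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical samples
f_stat = one_way_anova_f(groups)
print(f_stat)
```

A large F means the group means differ by more than the spread inside the groups would suggest; the value is then compared with a critical F for the chosen significance level.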

53
Q

Which technique is NOT used for hypothesis testing?

Predictive Modeling
Z-test
T-test
Chi-Square Test

A

Predictive Modeling

54
Q

What does the significance level (α) indicate in hypothesis testing?

The probability of rejecting the null hypothesis when it is true.
The maximum allowed sample size.
The minimum sample size for an accurate test.
The variance within a sample.

A

The probability of rejecting the null hypothesis when it is true.

55
Q

What is the significance level (α) commonly set at?

0.05 or 5%
0.10 or 10%
0.01 or 1%
0.20 or 20%

A

0.05 or 5%

56
Q

In the given example, which two categorical variables are being tested for association?

Gender (male/female) and smoking status (smoker/non-smoker)
Age group and education level
Income level and exercise frequency
Ethnicity and diet preference

A

Gender (male/female) and smoking status (smoker/non-smoker)
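
The example the card refers to is not reproduced here, so the sketch below uses invented counts to show how a chi-square test of association between two categorical variables is computed. The table values and function name are assumptions; the critical value 3.841 is the standard chi-square cutoff for 1 degree of freedom at α = 0.05.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = male, female; columns = smoker, non-smoker.
table = [[30, 20],
         [20, 30]]
chi2 = chi_square_statistic(table)
critical = 3.841  # chi-square critical value, df = 1, alpha = 0.05
print(chi2, chi2 > critical)
```

With these invented counts the statistic exceeds the critical value, so the null hypothesis of no association would be rejected at the 5% level.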

57
Q

A T-test is a statistical test used to determine whether there is a significant difference between sample and population means, or between the means of two samples.

True
False

A

True

58
Q

Use a Z-test when the population standard deviation (σ) is unknown and must be estimated from the sample.

True
False

A

False

59
Q

What is the formula for the Pearson Correlation Coefficient?

A

r = Σ (x_i - x̄)(y_i - ȳ) / sqrt( Σ (x_i - x̄)² × Σ (y_i - ȳ)² )

60
Q

If you know the population standard deviation and have a large sample size (n > 30), you can use a Z-test for comparing means.

True
False

A

True

61
Q

If the population standard deviation is unknown or the sample size is small (n < 30), use a t-test to compare means.

True
False

A

True
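
The rule of thumb in cards 60 and 61 can be written as a small decision function. This is only a sketch: the function name is hypothetical, and the cards do not say what to do when σ is known but the sample is small, or when n is exactly 30, so falling back to the t-test in those cases is an assumption.

```python
def choose_mean_test(sigma_known: bool, n: int) -> str:
    """Pick a test for comparing means, following the cards' rule of thumb."""
    if sigma_known and n > 30:
        return "z-test"
    # Unknown sigma or a small sample: use the t-test.
    # Cases the cards leave open also land here (an assumption).
    return "t-test"


print(choose_mean_test(sigma_known=True, n=50))   # z-test
print(choose_mean_test(sigma_known=False, n=50))  # t-test
print(choose_mean_test(sigma_known=True, n=12))   # t-test
```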

62
Q

If the test statistic is greater than the critical t-value, we reject the null hypothesis.

True
False

A

False

63
Q

In the given example, the calculated t-value of -4.22 is less than the critical t-value of -2.821, so we reject the null hypothesis.

True
False

A

True
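
The decision in this card follows the left-tailed rule: reject the null hypothesis when the t statistic falls below the negative critical value. The sketch below shows that comparison with the card's numbers, plus a one-sample t statistic on a made-up sample; the function names and the sample are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


def reject_left_tailed(t_stat, t_crit):
    """Left-tailed test: reject H0 when t is below the critical value."""
    return t_stat < t_crit


# Values quoted in the card.
decision = reject_left_tailed(-4.22, -2.821)
print(decision)  # True

# Hypothetical sample tested against a hypothesized mean of 12.
t_value = one_sample_t([9, 10, 11], mu0=12)
print(round(t_value, 3))
```

The direction of the comparison depends on the tail of the test, which is why "greater than the critical value" is not a rejection rule that holds in every case.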

64
Q

Match the following types of analytics with their corresponding questions

Descriptive Analytics
Diagnostic Analytics
Predictive Analytics

A

What has happened or what is happening now?
Why did it happen?
What will likely happen?

65
Q

Which of the following activities are associated with Data Exploration?

A) Data cleaning
B) Data augmentation and transformation
C) Exploratory data analysis
D) Feature selection
E) Identify data dependencies and correlations
F) Identify trends or anomalies in the data

C, E, F
A, B, D
B, D, F
A, C, E

A

C, E, F

66
Q

Which of the following activities are associated with Data Exploration? Choose 3 correct answers

Identify data dependencies and correlations
Identify trends or anomalies in the data
Exploratory data analysis
Data cleaning
Feature selection
Data augmentation and transformation

A

Identify data dependencies and correlations
Identify trends or anomalies in the data
Exploratory data analysis

67
Q

Which of the following activities are associated with Data Modification?

Data cleaning
Data augmentation and transformation
Exploratory data analysis
Feature selection
Identify data dependencies and correlations
Identify trends or anomalies in the data

A

Data cleaning
Data augmentation and transformation
Feature selection

68
Q

Which process involves removing or correcting errors in the data?

Data cleaning
Data augmentation
Data transformation
Feature selection

A

Data cleaning

69
Q

What is the purpose of Feature Selection?

To reduce the number of variables for modeling
To identify trends in the data
To enhance the data with additional information
To clean the data

A

To reduce the number of variables for modeling

70
Q

Which activity involves adding new data points or modifying existing ones to improve the dataset?

Data augmentation
Data cleaning
Exploratory data analysis
Feature selection

A

Data augmentation

71
Q

Which of the following is NOT typically a part of Data Exploration?

Cleaning the data
Identifying data dependencies
Identifying trends in the data
Exploratory data analysis

A

Cleaning the data

72
Q

Which activity is crucial for understanding the relationships between different variables in a dataset?

Identifying data dependencies and correlations
Data cleaning
Data augmentation
Feature selection

A

Identifying data dependencies and correlations

73
Q

Can you use the model already for prediction purposes?

No, you still need to investigate the model’s goodness-of-fit.
Yes, the model is ready for predictions.

A

No, you still need to investigate the model’s goodness-of-fit.

74
Q

What do you need to prove before using the model for predictions?

If your predictors are significant
The model’s accuracy

A

If your predictors are significant

75
Q

Simple Linear Regression: match each symbol to its meaning.

y
β
x
α
ε

A

dependent variable
beta coefficient
independent variable
alpha intercept
error term
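
The symbols fit together in the model y = α + βx + ε. As a sketch, the least-squares estimates of α and β can be computed directly; the data below are invented so that the fit is exact, and the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of the intercept (alpha) and slope (beta)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx                  # slope coefficient
    alpha = mean_y - beta * mean_x    # intercept
    return alpha, beta


xs = [1, 2, 3, 4, 5]     # independent variable x (made up)
ys = [3, 5, 7, 9, 11]    # dependent variable y (made up, y = 1 + 2x)
alpha, beta = fit_simple_linear_regression(xs, ys)

# Error terms: observed y minus the fitted value alpha + beta * x.
residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
print(alpha, beta, residuals)
```

With real data the residuals would not all be zero; they are the part of y that the line does not explain.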

76
Q

Which of the following methods is best for visualizing the relationship between TV ad spend and sales?

Scatter plot
Line graph
Bar chart
Pie chart

A

Scatter plot

77
Q

What does ANOVA stand for?

Analysis of Variance
Analysis of Variables
Analysis of Values
Analysis of Vectors

A

Analysis of Variance

78
Q

In ANOVA, what does the explained variability represent?

The amount of variation in the response variable that may be attributed to the predictors explicitly stated in the model
The total variation in the response variable
The amount of variation that cannot be explained by the model
The amount of variation attributed to random error

A

The amount of variation in the response variable that may be attributed to the predictors explicitly stated in the model

79
Q

Which part of the variation does ANOVA decompose?

Both explained and unexplained variability
Only the explained variability
Only the unexplained variability
Neither explained nor unexplained variability

A

Both explained and unexplained variability
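
For a fitted regression line, this decomposition can be checked numerically: total variability equals explained plus unexplained variability. The sketch below uses made-up data and a least-squares fit; the variable names (`sst`, `ssr`, `sse`) are common shorthand, not terms taken from the cards.

```python
xs = [1, 2, 3, 4, 5]   # hypothetical predictor values
ys = [2, 4, 5, 4, 5]   # hypothetical response values

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept.
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
alpha = mean_y - beta * mean_x
fitted = [alpha + beta * x for x in xs]

sst = sum((y - mean_y) ** 2 for y in ys)              # total variability
ssr = sum((f - mean_y) ** 2 for f in fitted)          # explained by the model
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained (error)

r_squared = ssr / sst  # share of variability explained, a goodness-of-fit measure
print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r_squared, 4))
```

The identity sst = ssr + sse holds for least-squares fits with an intercept, which is the setting assumed here.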

80
Q

Why is ANOVA used in statistical analysis?

To compare the means of different groups
To measure the central tendency of data
To determine the correlation between variables
To visualize data distributions

A

To compare the means of different groups

81
Q

In multiple regression, what is the purpose of including multiple independent variables?

To improve the prediction accuracy by accounting for more factors
To increase the complexity of the model
To ensure the residuals are normally distributed
To reduce the sample size

A

To improve the prediction accuracy by accounting for more factors

82
Q

Which of the following is a key assumption of linear regression?

The residuals are normally distributed
The relationship between the independent and dependent variables is non-linear
The independent variables are highly correlated
The dependent variable is categorical

A

The residuals are normally distributed

83
Q

Which of the following libraries are used for mathematical and statistical operations on multi-dimensional arrays and matrices in Python?

NumPy
Pandas
Matplotlib

A

NumPy

84
Q

Which of the following libraries are used for data visualization in Python?

Matplotlib
SciPy
NumPy

A

Matplotlib

85
Q

Which of the following libraries are used for sorting, grouping, and rearranging data in Python?

Pandas
NumPy
SciPy
Matplotlib

A

Pandas

86
Q

Which of the following libraries are used for processing large multidimensional arrays and matrices in Python?

SciPy
Pandas
PyTorch

A

SciPy

87
Q

Which of the following libraries are used for deep learning in Python?

TensorFlow
Keras
Scikit-learn

A

TensorFlow

88
Q

Which of the following libraries are used for natural language processing in Python?

NLTK
Scrapy
Scikit-learn

A

NLTK

89
Q

Which of the following libraries are used for data scraping in Python?

Scrapy
Gensim
NLTK
Pandas

A

Scrapy

90
Q

Which of the following libraries are used for efficient learning of word representations in Python?

Gensim
Scrapy
NLTK

A

Gensim

91
Q

Which of the following libraries are used for creating spider bots that scan website pages and collect structured data in Python?

Scrapy
SciPy
Pandas

A

Scrapy

92
Q

Which of the following libraries are used for object identification, speech recognition, and more in Python?

PyTorch
Keras
Dist-keras

A

PyTorch

93
Q

Which of the following libraries are used for reading data, selecting and filtering data, and data manipulation in Python? There are two correct answers among the options; just choose one.

NumPy
Pandas
SciPy
PyTorch

A

NumPy
Pandas

94
Q

Which of the following libraries are used for creating two-dimensional diagrams and graphs in Python?

Matplotlib
NumPy
SciPy
Seaborn

A

Matplotlib

95
Q

Which of the following libraries are used for creating interactive and scalable visualizations in a browser using JavaScript widgets in Python?

A

Plotly
Bokeh

96
Q

Which Python libraries are built on NumPy? There are two correct answers among the choices; just select one.

Pandas
Scikit-Learn
Seaborn
Matplotlib

A

Pandas
Scikit-Learn

97
Q

Which Python library provides machine learning algorithms?

Scikit-Learn
NumPy
Matplotlib
Pandas

A

Scikit-Learn

98
Q

Which data type in Pandas corresponds to a column with mixed data types?

object
int64
float64
timedelta64[ns]

A

object