3.2: Sampling Methods, Data Reduction, and Bias Flashcards

1
Q

What are the main reasons for using sampling methods in business analytics?

A

Sampling methods are used in business analytics to deal with large and costly data sets, making data collection and analysis more efficient.

They allow analysts to work with subsets of data to draw inferences about the entire population.

2
Q

What is data reduction, and why is it useful in business analytics?

A

Data reduction is the process of reducing large data sets, often data for the entire population, into smaller data sets that focus on specific items of interest.

It is useful in business analytics to make data more manageable and relevant for analysis.

3
Q

Why is it crucial to avoid biases when using sampling methods or data reduction techniques?

A

Avoiding biases is essential because biases can lead to inaccurate or skewed results. Biases can occur when the sample or reduced data set is not representative of the population, which can compromise the validity of analytical findings.

4
Q

What are the four common sampling methods used in business analytics?

A

The four common sampling methods are:

Simple random sampling
Stratified random sampling
Cluster sampling
Convenience/non-probability sampling

5
Q

Describe simple random sampling and its effectiveness.

A

In simple random sampling, every observation in the population has an equal chance of being selected into the sample.

It is particularly effective when the goal is to obtain a representative sample of the entire population, and when the population is relatively homogeneous.
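A minimal Python sketch of simple random sampling, assuming the population fits in memory as a list. The household IDs and the helper name `simple_random_sample` are hypothetical. Python's `random.Random.sample` draws without replacement, so each household appears at most once and every subset of the requested size is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n observations without replacement; every observation
    has the same chance of being selected."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

# Hypothetical population: household IDs 1..1000
households = list(range(1, 1001))
sample = simple_random_sample(households, 50, seed=42)
print(len(sample), len(set(sample)))  # 50 50 (no household drawn twice)
```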

6
Q

Can you provide an example of when simple random sampling would be suitable?

A

Simple random sampling would be suitable when conducting a survey to determine the average income of households in a city, where you want every household to have an equal chance of being included in the sample, and there are no specific subsets or attributes of interest.

7
Q

How can Excel be used to create a simple random sample?

A

Excel offers two methods to create a simple random sample:

Using the =RAND() function to generate a random number between 0 and 1 for each observation; the numbers are then copied and pasted as values (so they stop recalculating), sorted, and used to select observations.

Using the Excel Data Analysis ToolPak, which provides a Sampling tool for drawing random samples.

8
Q

What is the purpose of the =RAND() function in Excel when creating a simple random sample?

A

The =RAND() function in Excel generates a random number between 0 and 1, which can be assigned to each observation as a random sort key.

Sorting the observations by these numbers puts them in random order, so selecting the first n rows yields a simple random sample of size n.
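The sort-by-random-number procedure can be emulated in Python to show why it produces a simple random sample: the keys are independent draws from the same distribution, so every ordering of the rows is equally likely, and the first n rows after sorting form a simple random sample. This is a sketch of the idea, not Excel itself; `rand_sort_sample` and the order labels are hypothetical, and `random.random()` plays the role of =RAND() (both return values in [0, 1)).

```python
import random

def rand_sort_sample(rows, n, seed=None):
    """Mimic the Excel approach: give each row a random key in [0, 1)
    (like =RAND()), sort by that key, and keep the first n rows."""
    rng = random.Random(seed)
    # Generate the keys once, like pasting =RAND() results as values
    keyed = [(rng.random(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])  # sort on the random key only
    return [row for _, row in keyed[:n]]

orders = [f"Order {i}" for i in range(1, 201)]  # hypothetical rows
picked = rand_sort_sample(orders, 20, seed=7)
print(len(picked))  # 20
```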

9
Q

How does the Excel Data Analysis ToolPak assist in creating a simple random sample?

A

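The Excel Data Analysis ToolPak is an add-in included with Excel (it must be enabled before use) that provides tools for various data analysis tasks, including a Sampling tool for drawing random samples.

It streamlines the process by letting users specify the input range and the number of observations to draw, then generating the sample without manual calculations. Note that the Sampling tool's Random method draws with replacement, so the same observation can be selected more than once; use the =RAND() approach if each observation must appear at most once.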

10
Q

What are the advantages of using Excel for creating simple random samples?

A

Excel is widely available and familiar to most analysts, which makes it a convenient tool for creating simple random samples. Its functions and the ToolPak automate the random selection, which is faster than choosing observations by hand and reduces the chance of manual errors.

11
Q

What is stratified random sampling, and when is it used in data reduction?

A

Stratified random sampling is a data-reduction method used when a population can be divided into distinct groups or strata, such as demographic or geographic categories, and you want to ensure that each group is adequately represented in your sample.

12
Q

What are the key steps involved in creating a stratified random sample?

A

The steps for creating a stratified random sample are as follows:

Divide the population into groups or strata based on specific criteria.

Calculate the proportion of the population that each group (stratum) represents.

Draw a simple random sample within each stratum, sized according to that stratum's proportion, so that the appropriate number of observations from each stratum is included in the overall sample.
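The three steps can be sketched in Python as proportional stratified sampling. The regional headcounts and the helper name `stratified_sample` are hypothetical; rounding each stratum's share is a simplification, so with uneven shares the total may differ slightly from the requested n.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, stratum_of, n, seed=None):
    """Proportional stratified random sample of about n records."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    total = len(records)
    sample = []
    for members in strata.values():
        # Step 2: apply this stratum's share of the population to n
        # (rounding can make the final total differ slightly from n)
        k = round(n * len(members) / total)
        # Step 3: simple random sample within the stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical headcounts: 400 North, 300 South, 200 East, 100 West
employees = (
    [("North", i) for i in range(400)]
    + [("South", i) for i in range(300)]
    + [("East", i) for i in range(200)]
    + [("West", i) for i in range(100)]
)
sample = stratified_sample(employees, stratum_of=lambda e: e[0], n=100, seed=1)
counts = Counter(region for region, _ in sample)
print(counts)  # Counter({'North': 40, 'South': 30, 'East': 20, 'West': 10})
```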

13
Q

Why is stratified random sampling useful when dealing with populations that have distinct groups or strata?

A

Stratified random sampling is useful in such cases because it ensures that each subgroup or stratum within the population is represented in the sample.

This method allows for more accurate analysis of each subgroup’s characteristics and prevents underrepresentation or bias.

14
Q

Can you provide an example of when stratified random sampling might be applied?

A

Certainly. Suppose you want to study the job satisfaction of employees in a large corporation with divisions in different regions (e.g., North, South, East, and West).

Using stratified random sampling, you can ensure that employees from each region are proportionately represented in the sample to make meaningful regional comparisons.

15
Q

What is convenience sampling, and what is another name for it?

A

Convenience sampling, often referred to as non-probability sampling (strictly, it is one type of non-probability sampling), is a method of data collection in which data points are chosen based on convenience and accessibility.

It may not result in a representative sample of the population.

16
Q

When might convenience sampling be used despite its limitations?

A

Convenience sampling is typically used when time and budget constraints make it impractical to conduct more rigorous sampling methods.

It is chosen for its simplicity and speed, even though it may lead to a non-representative sample.

17
Q

What are the two forms of convenience sampling, and how do they differ?

A

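Convenience sampling can take two forms:

Selecting a subset of data that has already been collected.

Distributing a survey digitally or in paper format and stopping data collection once a specific number of responses (n) has been received.

The first form draws on existing data, while the second collects new data but accepts whoever happens to respond first.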
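Both forms can be sketched in a few lines of Python, which highlights how little selection logic is involved and why the result may not be representative. The helper names are hypothetical, and the arrival-ordered satisfaction scores are made up so that early responders happen to be the most enthusiastic.

```python
from itertools import islice

def first_n_rows(existing_rows, n):
    """Form 1: take whatever subset of already-collected data is easiest,
    here simply the first n rows on file (no randomization)."""
    return list(existing_rows[:n])

def first_n_responses(response_stream, n):
    """Form 2: accept survey responses in arrival order and stop at n."""
    return list(islice(response_stream, n))

# Hypothetical satisfaction scores (1-10) in the order they arrived;
# in this made-up data the earliest responders are the most enthusiastic
arrivals = [9, 10, 9, 8, 9, 4, 3, 5, 2, 4]

early = first_n_responses(iter(arrivals), 5)
everyone_mean = sum(arrivals) / len(arrivals)
print(sum(early) / len(early), everyone_mean)  # 9.0 6.3
```

The first five responses average 9.0 while the full group averages 6.3, so stopping at n would overstate satisfaction in this example.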

18
Q

Why is convenience sampling not recommended for research or data analysis, despite its convenience?

A

Convenience sampling is not recommended because it often results in a sample that is not representative of the population.

This can introduce bias and lead to unreliable conclusions, making it less suitable for rigorous research and analysis.

19
Q

What is data reduction, and why is it important in business analysis projects?

A

Data reduction is the process of reducing the size of a data set to make it more manageable and suitable for analysis. It is crucial in business analysis because it lets analysts focus on critical, interesting, or abnormal data and avoid unnecessary data overload.

20
Q

What is filtering in the context of data reduction?

A

Filtering is the process of selecting a smaller portion of a data set for further viewing or analysis.

It involves removing rows of data that are not of interest to concentrate on a specific and relevant subset.

21
Q

How can filtering be used to reduce data for analysis, and what are the benefits?

A

Filtering allows you to choose specific column attributes to identify and remove rows of data that you are not interested in analyzing.

This helps speed up data analysis by concentrating on the most relevant information, making it easier to derive insights.

22
Q

Which software tools facilitate the process of filtering data?

A

Software tools like Excel, Tableau, and Power BI offer features that make filtering data easy.

They provide options to filter data based on column attributes, allowing users to select and focus on the data of interest.

23
Q

Can you describe how filtering works in Excel as an example?

A

In Excel, filtering uses drop-down arrows on the column headers; turning on Data > Filter adds an arrow to each column.

For instance, if you want to analyze sales to customers from Texas, you can filter the data set by clicking the drop-down arrow in the Customer State field, selecting “TX,” and applying the filter to display only the records associated with Texas.
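The same Texas filter can be expressed in code as keeping only the rows whose Customer State equals "TX". This is a plain-Python sketch; the "Customer State" column comes from the example above, while the other columns, the rows, and the helper `filter_rows` are hypothetical.

```python
def filter_rows(rows, column, keep_value):
    """Keep only rows whose `column` equals `keep_value`,
    like choosing one value in an Excel filter drop-down."""
    return [row for row in rows if row[column] == keep_value]

# Hypothetical sales records
sales = [
    {"Order ID": 1001, "Customer State": "TX", "Sales": 250.0},
    {"Order ID": 1002, "Customer State": "CA", "Sales": 120.0},
    {"Order ID": 1003, "Customer State": "TX", "Sales": 75.5},
    {"Order ID": 1004, "Customer State": "NY", "Sales": 310.0},
]

texas = filter_rows(sales, "Customer State", "TX")
print([r["Order ID"] for r in texas])  # [1001, 1003]
print(sum(r["Sales"] for r in texas))  # 325.5
```

As with Excel's filter, the non-Texas rows are not deleted from `sales`; they are simply excluded from the subset being analyzed.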

24
Q

What factors should be considered when deciding to use a subset of data in analysis?

A

When deciding to use a subset of data, it’s important to consider:

The purpose of the analysis
Time constraints
Cost constraints

25
Q

How does the purpose of the analysis influence the decision to use a data subset?

A

The purpose of the analysis, based on the specific question or goal, determines whether a subset of data should be used.

For instance, if the analysis focuses on a particular timeframe or specific items of interest, filtering the data may be appropriate.

26
Q

In what situations might time and cost constraints lead to using a sample of data rather than the entire dataset?

A

Time and cost constraints can lead to using a sample of data when processing the entire dataset would take too long or be too expensive.

For example, when a quick answer is needed to continue operations and processing a massive dataset would be time-prohibitive, analyzing a sample becomes a practical choice.

27
Q

Why might running calculations on a massive dataset take significantly longer than on a subset?

A

Running calculations on a massive dataset takes longer because of the sheer volume of data to process.

Analyzing a subset reduces the amount of data to handle, making calculations quicker and more manageable.

28
Q

Can you provide an example of a scenario where analyzing a sample of data is the best course of action?

A

Certainly. Imagine a company’s database has 100 million rows of data that take a week to process, but an urgent question arises, requiring an answer within the hour to continue critical operations.

In this case, analyzing a sample of the data is the most practical and timely solution.
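A minimal Python sketch of this trade-off, assuming the urgent question can be answered by a proportion (here, the share of late orders). The data is simulated, and the 12% late rate, the 200,000-row size (kept small so the sketch runs quickly), and the 5,000-row sample are all hypothetical. A simple random sample of a few thousand rows usually lands close to the full-data answer at a fraction of the processing cost.

```python
import random

rng = random.Random(2024)
# Hypothetical "full" dataset: 200,000 orders, each late with probability 0.12
orders = [rng.random() < 0.12 for _ in range(200_000)]

full_rate = sum(orders) / len(orders)    # the slow, exhaustive answer
sample = rng.sample(orders, 5_000)       # 2.5% simple random sample
sample_rate = sum(sample) / len(sample)  # the quick estimate
print(f"full: {full_rate:.3f}  sample: {sample_rate:.3f}")
```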

29
Q

What is bias in the context of business analytics?

A

Bias refers to prejudice in favor of or against a thing, person, group, or idea.

In business analytics, bias can occur intentionally or unintentionally and can affect the collection, analysis, and presentation of data.

30
Q

What are the two main categories of bias based on intention?

A

Bias can be intentional when someone deliberately favors or opposes something, or it can be unintentional, resulting from poor research methods or data misinterpretations.

31
Q

How can bias manifest in the context of business analysis?

A

Bias can manifest at various stages of business analysis, including when collecting new data, conducting the analysis, and presenting the results.

It can influence decision-making and affect the accuracy of analytical findings.

32
Q

What are the four common types of bias in business analytics?

A

The four common types of bias are:

Nonresponse bias
Selection bias
Confirmation bias
Outlier bias

33
Q

Why is it important to be aware of and work to eliminate biases in the business analytics process?

A

It is essential to be aware of and eliminate biases to engage in ethical data analytics.

Bias can lead to inaccurate conclusions, unfair treatment, and flawed decision-making.

Being vigilant about potential biases ensures the integrity and reliability of business analytics.

34
Q

What is nonresponse bias in the context of data collection and surveys?

A

Nonresponse bias is the distortion that arises when the people who respond to a survey or other data-collection method differ systematically from those who do not.

It occurs when individuals choose not to participate and would have answered differently from those who did.
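In practice the non-respondents' answers are unknown, so this bias cannot be measured directly. The Python sketch below uses a hypothetical population in which every score is known, purely to show what "respondents differ from non-respondents" does to an estimate; the bias is measured here as the respondent mean minus the true population mean, and `nonresponse_bias` is a hypothetical helper.

```python
def nonresponse_bias(population):
    """Respondent mean minus true population mean. Only computable here
    because the made-up data includes non-respondents' scores."""
    respondents = [score for score, responded in population if responded]
    everyone = [score for score, _ in population]
    return sum(respondents) / len(respondents) - sum(everyone) / len(everyone)

# Hypothetical survey of 10 employees: (satisfaction 1-10, responded?)
population = [
    (9, True), (8, True), (9, True), (7, True), (8, True),
    (3, False), (4, False), (2, False), (6, True), (4, False),
]
bias = nonresponse_bias(population)
print(round(bias, 2))  # 1.83: respondents look happier than the workforce
```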

35
Q

When does nonresponse bias typically occur, and what are some reasons for it?

A

Nonresponse bias occurs when people opt out of responding to a survey.

It can happen due to various reasons, such as poorly written surveys, lengthy and burdensome questionnaires, or questions that invade respondents’ privacy.

36
Q

How can analysts work to prevent nonresponse bias in surveys and data collection?

A

Analysts can take several steps to prevent nonresponse bias, including:

Informing survey takers about the survey’s importance and how the results will be used.

Providing incentives to participants, such as gift cards or entry into prize drawings, to encourage responses.

37
Q

Why is it important to address nonresponse bias in data collection and surveys?

A

Addressing nonresponse bias is crucial because it helps ensure that the data collected accurately represents the target population.

Failing to do so can lead to skewed results and misinformed decisions, affecting the quality and reliability of the analysis.

38
Q

What is selection bias, and when does it occur in data analysis?

A

Selection bias occurs when an analyst selects (intentionally or not) portions of the population that are likely to provide answers aligned with their beliefs or hypothesis.

It can lead to results that do not accurately reflect the opinions or experiences of the entire population.

39
Q

What are two effective ways to avoid selection bias in data collection?

A

Simple random sampling and stratified random sampling are two effective methods to avoid selection bias in data collection.

These methods help ensure that samples are representative and not skewed by analyst bias.

40
Q

What is confirmation bias, and when does it occur in data analysis?

A

Confirmation bias occurs when analysts analyze or present results in a way that confirms their existing beliefs or theories while ignoring data and analyses that do not support those beliefs.

It can distort the interpretation of results.

41
Q

What is outlier bias, and how can outliers impact study results?

A

Outlier bias refers to the disproportionate influence of extreme values (outliers) on study results, which can significantly change how the results are interpreted.

Including extreme outliers can be informative when they are explained, but leaving outliers unexplained may produce results that do not represent the population as a whole, potentially leading to poor decisions.
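A small Python illustration of how one extreme value can dominate a summary statistic. Comparing the mean with the median, which is resistant to extreme values, is one common way to spot outlier influence; the daily sales figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical daily sales (in $ thousands); the last day is an extreme outlier
daily_sales = [50, 52, 48, 51, 49, 1000]

print(mean(daily_sales))       # about 208.33, pulled far upward by one day
print(median(daily_sales))     # 50.5, barely affected
print(mean(daily_sales[:-1]))  # 50 once the outlier is set aside
```

Neither number is automatically "right": reporting the mean of 208.33 without explaining the 1,000 day would misrepresent a typical day, while silently dropping it could hide a genuine event worth investigating.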

42
Q

Why is it essential for analysts to be aware of and mitigate confirmation and outlier bias in data analysis?

A

Analysts must be aware of and mitigate confirmation and outlier bias because these biases can distort the accuracy and reliability of analytical findings.

Failing to address them can lead to misinformed decisions, impacting the quality of business or research outcomes.
