Statistics (2) Flashcards
What is the purpose of descriptive statistics?
To explore and compare data meaningfully, assess major differences, determine the shape of the data distribution, check for missing or unusual data, gauge the amount of noise in the data, and verify that the data are fit for further testing
Descriptive statistics provides a summary of the data but does not allow for objective decisions regarding hypotheses.
What are some key functions of descriptive statistics?
- Explore and compare data meaningfully
- Assess major differences between conditions/variables
- Determine the shape of data distributions
- Check for missing data or outliers
- See the amount of noise in the data
- Verify data fit for further statistical testing
These functions help in understanding the basic characteristics of the data before applying inferential statistics.
True or False: Descriptive statistics can help us make objective decisions about our alternative hypothesis.
False
For objective decisions regarding the alternative hypothesis, inferential statistics is required.
Fill in the blank: Descriptive statistics allows us to check for _______ or unusual data.
missing
Identifying missing data and outliers is critical for ensuring the integrity of data analysis.
What is needed to arrive at an objective decision about the alternative hypothesis?
Inferential statistics
Inferential statistics allows researchers to make predictions or inferences about a larger population based on sample data.
What are descriptive statistics used for?
Descriptive statistics allow us to:
* Look at measures of central tendency, dispersion, and variation
* Organise and aggregate or disaggregate data in a meaningful way
* Get a ‘feel’ for any relevant patterns
* Present data graphically or in a tabular format
Descriptive statistics summarize data without making inferences about a larger population.
What do inferential statistics allow us to do?
Inferential statistics allow us to:
* Test hypotheses about distributions
* Determine whether differences or relationships are statistically meaningful
* Express whether we can retain or reject the null hypothesis
Inferential statistics make predictions or generalizations about a population based on a sample.
Fill in the blank: Descriptive statistics focus on analyzing _______ data.
observed
Fill in the blank: Inferential statistics are used to determine if differences or relationships are statistically _______.
meaningful
True or False: Descriptive statistics can present data graphically.
True
True or False: Inferential statistics provide a summary of data without making predictions.
False
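What is the normal curve also known as?
The normal (Gaussian) distribution, or bell curve
The standard normal distribution is the special case with a mean of 0 and a standard deviation of 1.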
What are the important measures that tend to be in the center of the distribution?
Mean, median, and mode
What do the mean, median, and mode represent in a distribution?
Numbers that are representative of the distribution as a whole
Fill in the blank: The mean, median, and mode are measures of _______.
Central tendency
True or False: The mode is the measure that represents the least common value in a distribution.
False
What is the significance of the center of the distribution?
It is where important measures tend to be found
What is the mean in statistics?
The mean is the average of a set of numbers, calculated by adding all items in a set and dividing by the number of items.
The mean is commonly used to represent general performance in statistics.
What types of data is the mean primarily used with?
The mean is used mostly with interval and ratio data.
Interval data is numerical data where the difference between values is meaningful, while ratio data has a true zero point.
How is the mean mathematically represented?
x̄ = (Σxi) / N
Where x̄ is the mean, Σxi is the sum of all items in the set, and N is the number of items.
True or False: The mean can only be calculated for integer values.
False
The mean can be calculated for both integers and decimals.
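As a quick illustration of the mean formula, the sketch below adds up a set of scores and divides by N, then checks the result against Python's standard-library `statistics.mean`. The scores are hypothetical values invented for this example.

```python
from statistics import mean

# Hypothetical scores, invented for illustration only.
scores = [4, 8, 6, 5, 7]

# Sum of all items divided by the number of items (N).
manual_mean = sum(scores) / len(scores)

# The standard-library helper gives the same value.
library_mean = mean(scores)

# Decimal values work just as well as integers.
decimal_mean = sum([2.5, 3.5]) / 2

print(manual_mean, library_mean, decimal_mean)
```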
What is the median?
The middle of a set of values if arranged from smallest to largest.
The median is particularly useful in non-normal distributions or with extreme scores.
When is the median most useful?
When you have a non-normal distribution, extreme scores, or ordinal data.
Ordinal data can be ranked in order, but the distances between ranks are not necessarily equal.
What is the mode?
The most commonly occurring number in a set of data.
The mode is most frequently used with nominal data.
When is the mode most frequently used?
With nominal data.
Nominal data is categorical data without a specific order.
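The three measures of central tendency can be compared side by side. This is a minimal sketch using Python's standard `statistics` module; the income figures and eye colours are hypothetical data chosen so that one extreme score is present.

```python
from statistics import mean, median, mode

# Hypothetical incomes (in thousands) with one extreme score.
incomes = [20, 22, 22, 25, 27, 30, 250]

print(mean(incomes))    # pulled upward by the extreme score
print(median(incomes))  # middle value once the data are sorted
print(mode(incomes))    # most commonly occurring value

# Nominal data has no order, so only the mode applies.
eye_colours = ["brown", "blue", "brown", "green"]
print(mode(eye_colours))
```

Here the mean lands well above most of the values, while the median stays with the bulk of the data, which is why the median is preferred when extreme scores are present.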
What do measures of variability or dispersion indicate?
They indicate how the data varies and the spread of data.
What additional information can measures of variability provide?
They can provide insight into the amount of ‘noise’ in the data set.
List common measures of dispersion.
- Range
- Interquartile range
- Mean absolute deviation
- Variance
- Standard deviation
True or False: The interquartile range (IQR) is a common measure of dispersion.
True
Fill in the blank: Common measures of dispersion include range, interquartile range, mean absolute deviation, ______, and standard deviation.
variance
What is the range in descriptive statistics?
The difference between the smallest and largest value in a distribution
What is a limitation of using the range?
It is susceptible to extreme scores in a distribution
What is the interquartile range (IQR)?
The range of the middle 50% of values, between the 25th and 75th percentiles
Why is the interquartile range (IQR) preferred over the range?
It is much less susceptible to extreme scores
How can the interquartile range (IQR) be displayed graphically?
On a boxplot
Fill in the blank: The IQR is the range of values between the _______ and _______ percentile.
25th and 75th
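The difference between the range and the IQR is easiest to see with one extreme score in the data. The sketch below uses hypothetical scores and Python's `statistics.quantiles`; the `method="inclusive"` setting is an assumption about which percentile convention to use, and other software may report slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical scores with one extreme value (40).
scores = [2, 4, 4, 5, 6, 7, 8, 9, 40]
# The same scores with the extreme value replaced by 10.
trimmed = scores[:-1] + [10]

def value_range(data):
    """Difference between the largest and smallest value."""
    return max(data) - min(data)

def iqr(data):
    """Distance between the 25th and 75th percentiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

print(value_range(scores), iqr(scores))    # range is inflated by the extreme score
print(value_range(trimmed), iqr(trimmed))  # IQR is the same in both cases
```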
What is the mean absolute deviation?
A measure of how much, on average, the values deviate from the mean
It is calculated from the difference between each value and the mean, ignoring negative signs (that is, using absolute differences).
How is the mean absolute deviation calculated?
By working out the absolute difference between each value and the mean, summing these differences, and dividing by N
N represents the total number of values.
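The calculation can be sketched in a few lines. The scores are hypothetical; the example also shows why the negative signs have to be ignored, since the signed deviations from the mean always cancel out.

```python
# Hypothetical scores, invented for illustration only.
scores = [2, 4, 6, 8, 10]
n = len(scores)
mean_score = sum(scores) / n

# Signed deviations cancel out, so their sum is zero.
signed_deviations = [x - mean_score for x in scores]

# Ignoring the negative signs gives the absolute deviations.
absolute_deviations = [abs(d) for d in signed_deviations]

# Sum the absolute deviations and divide by N.
mean_absolute_deviation = sum(absolute_deviations) / n

print(sum(signed_deviations), mean_absolute_deviation)
```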
What is a more useful measure than the mean absolute deviation?
Standard deviation
Standard deviation maps onto the standard normal distribution and helps assess proportions within the whole distribution.
What is the relationship between standard deviation and variance?
The standard deviation is the square root of variance.
What is variance?
Variance is a statistic that indicates the overall amount of variability in a set of data. For a sample, it is calculated by adding up the squared differences between each value and the mean and dividing by N - 1.
How is variance calculated?
Variance is calculated by the formula:
Var(X) = (Σ(Xi - x̄)²) / (N - 1)
where Xi represents each value, x̄ is the mean, and N is the number of observations.
What does variance indicate?
Variance indicates the overall amount of variability in a set of data.
What is the relationship between variance and statistical formulas?
Variance is used in many statistical formulas.
In the variance formula, what does N represent?
N represents the number of observations in the data set.
In the variance formula, what does Xi represent?
Xi represents each individual value in the data set.
True or False: Variance is expressed in the same units as the original data.
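False
Variance is expressed in squared units; taking the square root (the standard deviation) returns to the original units.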
Fill in the blank: The formula for sample variance is Var(X) = _______.
(Σ(Xi - x̄)²) / (N - 1)
Fill in the blank: The formula for population variance is Var(X) = _______.
(Σ(Xi - μ)²) / N
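The two variance formulas differ only in the divisor. The sketch below works both out by hand on hypothetical scores and checks them against `statistics.variance` (sample, N - 1) and `statistics.pvariance` (population, N) from Python's standard library.

```python
from statistics import pvariance, variance

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 6, 8, 10]
n = len(scores)
mean_score = sum(scores) / n

# Sum of squared differences between each value and the mean.
sum_of_squares = sum((x - mean_score) ** 2 for x in scores)

sample_variance = sum_of_squares / (n - 1)  # divide by N - 1
population_variance = sum_of_squares / n    # divide by N

print(sample_variance, variance(scores))
print(population_variance, pvariance(scores))
```

Because the differences are squared, both results are in squared units of the original measurement.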
What is the standard deviation?
A measure of variability that is easier to interpret than the variance; it is the square root of the variance.
Standard deviation (s) is measured in the original units of measurement and relates to the standard normal distribution.
What does the standard deviation help us understand?
It helps us get a much better sense of the distribution of scores in our data.
What is the formula for calculating sample standard deviation?
s = √(Σ(xi - x̄)² / (N - 1))
What does ‘N’ represent in the standard deviation formula?
The number of observations in the sample.
True or False: The standard deviation is only applicable to population data.
False
Fill in the blank: The standard deviation relates to the _______ distribution.
standard normal
What is the relationship between variance and standard deviation?
Standard deviation is the square root of variance.
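A short sketch of both ideas: the standard deviation as the square root of the variance, and its link to the standard normal distribution. The scores are hypothetical, and the proportions come from Python's `statistics.NormalDist`, which defaults to a mean of 0 and a standard deviation of 1.

```python
from math import sqrt
from statistics import NormalDist, stdev

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 6, 8, 10]
n = len(scores)
mean_score = sum(scores) / n

# Sample variance, then its square root.
sample_variance = sum((x - mean_score) ** 2 for x in scores) / (n - 1)
s = sqrt(sample_variance)
print(s, stdev(scores))  # the two values agree

# In a standard normal distribution, the proportion of scores
# within 1 and within 2 standard deviations of the mean.
z = NormalDist()
within_one_sd = z.cdf(1) - z.cdf(-1)   # roughly 68%
within_two_sd = z.cdf(2) - z.cdf(-2)   # roughly 95%
print(round(within_one_sd, 3), round(within_two_sd, 3))
```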
What is the standard error of the mean (SE)?
A measure of the error in estimating the population mean, especially with small sample sizes
The SE is the standard deviation of the sampling distribution of the mean, that is, of the means of repeated samples.
Why is the standard error important?
It assesses the degree of error between sample means and indicates uncertainty around knowing the population mean
The sample mean is usually not exactly the same as the population mean.
How is the standard error calculated?
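By imagining a re-sampling of scores, where each sample gives a slightly different mean, and asking how much those sample means vary; in practice it is estimated as SE = s / √N
This allows the degree of error to be assessed from a single sample.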
Fill in the blank: The standard error is a measure of the _______ in estimating the population mean.
error
True or False: The standard error indicates the degree of certainty around knowing the population mean.
False
The standard error indicates the degree of uncertainty.
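The imagined re-sampling can be acted out as a small simulation. This is a sketch under stated assumptions: the population mean, population standard deviation, and sample size are hypothetical, and the estimate SE = s / √N is the standard textbook formula rather than something stated in these cards.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population and sample size.
POP_MEAN, POP_SD, N = 100, 15, 25

def draw_sample():
    """Draw N scores from a normal population."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]

# From a single sample: SE estimated as s / sqrt(N).
sample = draw_sample()
se_estimate = stdev(sample) / sqrt(N)

# Re-sampling: the standard deviation of many sample means.
sample_means = [mean(draw_sample()) for _ in range(2000)]
sd_of_sample_means = stdev(sample_means)

# Value expected from the population SD.
theoretical_se = POP_SD / sqrt(N)

print(se_estimate, sd_of_sample_means, theoretical_se)
```

The spread of the re-sampled means should sit close to the theoretical value, and a larger N shrinks it, which is why small samples carry more uncertainty about the population mean.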
What are the two main measures of the shape of a distribution?
Kurtosis (height and breadth) and skew (asymmetry)
These measures help to describe how data is distributed.
What term refers to the height and breadth of a distribution?
Kurtosis
Kurtosis can vary from tall and thin to short and wide.
What do we refer to as the degree of asymmetry in a distribution?
Skew
Skew can vary in severity and can be positive, negative, or zero.
What is a positive skewness value indicative of?
Positive skew
A positive skew means that the tail on the right side of the distribution is longer or fatter.
What does a negative skewness value indicate?
Negative skew
A negative skew means that the tail on the left side of the distribution is longer or fatter.
What skewness value indicates a symmetrical distribution?
0
A skewness value of 0 indicates no asymmetry in the distribution.
Fill in the blank: The shape of a distribution can vary from _______ to _______.
tall and thin, short and wide
This variation is captured by the concept of kurtosis.
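Skew and kurtosis can both be computed from the deviations around the mean. The sketch below uses the simple moment-based definitions on hypothetical data; this choice is an assumption, and statistics packages often apply small-sample corrections, so their reported values can differ slightly.

```python
def central_moment(data, k):
    """Average of the k-th powers of the deviations from the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Positive for a longer right tail, negative for a longer left tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Kurtosis relative to the normal distribution, which scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical data sets, invented for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-x for x in right_tailed]
peaked = [3] * 8 + [-7, 13]  # most values at the centre, two far out

print(skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
print(excess_kurtosis(symmetric), excess_kurtosis(peaked))
```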
What can extreme values in a distribution of data do to measures of central tendency?
Skew them positively or negatively.
What is the purpose of the Shapiro-Wilk test in JAMOVI?
To test for normality in a data distribution.
What is one visual method to identify extreme values in data?
A boxplot.
Fill in the blank: Extreme values in a distribution of data can _______ the measures of central tendency.
skew
True or False: The Shapiro-Wilk test is a method for testing the presence of extreme values in a dataset.
False
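A boxplot usually flags extreme values with a numerical rule. The sketch below assumes the common convention of marking points more than 1.5 × IQR beyond the quartiles; that multiplier and the `boxplot_outliers` helper are illustrative assumptions, not something taken from these cards or from JAMOVI, and the scores are hypothetical.

```python
from statistics import quantiles

def boxplot_outliers(data, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical scores with one extreme value.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 40]
print(boxplot_outliers(scores))

# The same scores without the extreme value produce no flags.
print(boxplot_outliers([2, 4, 4, 5, 6, 7, 8, 9, 10]))
```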
What should the preliminary information section explain to the respondent?
The study’s purpose and duration
This allows respondents to understand what they are participating in.
What rights do respondents have regarding their data?
They can withdraw their data if they wish
It’s essential to inform them how to do this.
What contact information should be provided to respondents?
Your email address
This allows respondents to reach out for further questions.
What should respondents be informed about if they have further questions?
They can contact you
Providing clarity on how to ask questions increases transparency.
Fill in the blank: The preliminary information should give the respondent sufficient information to make an _______.
informed decision
What must participants have before deciding to take part in a study?
Sufficient information
Participants should be well-informed to make a decision about their participation.
What is required for using participants’ information in a study?
Consent
Participants must agree to allow their information to be used for the specified purposes.
What option must be available to participants regarding their data?
The possibility to withdraw their data
Participants should understand how and when they can withdraw their data.
How should the survey ensure anonymity?
By not asking for names or student IDs
Anonymity helps protect participants’ identities.
What should be explained to participants regarding data withdrawal?
How they can withdraw their data and a timeframe for doing so
For example, a timeframe could be one week.
What is typically desired regarding participant demographic information?
Collect as little information as possible
This approach helps to minimize the burden on participants and focuses on essential data.
What determines the demographic information collected in a study?
What we are studying
The specific research question influences the type of demographic data needed.
What are examples of demographic information that might be collected?
- Employment status
- Geographic location
These factors can provide context for the study’s findings.
What minimum demographic data should be measured in a study?
- Age (as a whole number)
- Gender
These are fundamental demographic variables that are often relevant in research.
What should you use for each questionnaire?
A different block
This helps in organizing questions clearly.
What principles should guide the question type and format?
Simplicity and being clearly understood
Ensures that participants can easily comprehend the questions.
What can be useful to include at the start of each block?
A brief explanation of how to respond
This helps participants understand the expectations for their answers.
What type of response may be encouraged for some questions?
First impression
This approach prevents participants from over-thinking their answers.
What should be included at the end of the survey?
Thank the participant and briefly explain what the study was about
This helps in providing closure and context to the participants regarding their involvement in the study.
What should participants be reminded of at the end of the survey?
Contact details to withdraw their data or ask a question
This ensures participants have the necessary information to exercise their rights regarding their data.
Why is it useful to signpost participants to agencies or resources?
In cases where the content of the study is sensitive
This provides participants with support options if they feel distressed or need further assistance after participating.