Descriptive statistics Flashcards
What is reproducibility, and what are its two types?
Results Reproducibility: Achieving the same results with the same data as in the original study.
Inferential Reproducibility: Drawing similar conclusions from the data.
What is transparency? (Aguinis)
Crucial for evaluating methodological rigor and for enabling policy and managerial application of findings.
What is the research performance problem?
Rooted in insufficient knowledge, skills, or motivation among researchers, leading to inadequate methodological transparency.
What techniques can be used for handling missing data?
Imputation methods
Listwise deletions
What should you do before embarking on the actual data analysis?
Inspect the data for coding errors, outliers and missing values.
Coding errors = illogical values in the data set. If a variable can only contain values from 1 to 7, then all values outside this range are coding errors.
Outliers = extreme values that deviate from what is typical for the variable. If a variable can assume values between 1 and 100, and the majority of observations are grouped from 1 to 20, then values like 70, 80 and 90 are outliers.
Missing values = occur when there is no observation recorded in one or more cells in a data set.
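A minimal Python sketch of such an inspection. The data are made up, None is assumed to mark a missing cell, and the 1.5 × IQR rule is just one common convention for flagging outliers, not the only one:

```python
import statistics

# Hypothetical answers on a 1-7 scale; None marks a missing value.
answers = [3, 5, None, 7, 9, 2, 0, 4]

# Missing values: positions where nothing was recorded.
missing = [i for i, v in enumerate(answers) if v is None]

# Coding errors: recorded values outside the allowed 1-7 range.
coding_errors = [v for v in answers if v is not None and not 1 <= v <= 7]

# Hypothetical variable that can range from 1 to 100.
scores = [3, 5, 8, 10, 12, 14, 15, 18, 20, 70, 90]

# Outliers: values more than 1.5 * IQR beyond the quartiles (assumed rule).
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
outliers = [v for v in scores if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(missing)        # positions of missing values
print(coding_errors)  # values outside 1-7
print(outliers)       # values beyond the IQR fences
```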
How can you treat missing values?
- Listwise deletion: omit all cases with missing values. Works well when there is little missing data relative to the sample size.
- Pairwise deletion: retains more data than listwise deletion by using every case that has data for both variables in a given pair. Whereas listwise deletion removes every case with any missing value, pairwise deletion leaves a case out only of the calculations that involve the variable it is missing.
- Replace missing values with a neutral value.
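The three treatments can be sketched in plain Python. The rows and helper names below are hypothetical, and using the variable's mean as the "neutral value" is an assumption (other choices, such as the scale midpoint, are possible):

```python
# Hypothetical data set; None marks a missing value.
rows = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 2, "y": None, "z": 1},
    {"x": 3, "y": 4, "z": None},
    {"x": 4, "y": 6, "z": 6},
]

def listwise(rows):
    """Keep only cases with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, a, b):
    """Keep every case that has data on both variables a and b."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]

def impute_mean(rows, key):
    """Replace missing values of one variable with its observed mean (assumed neutral value)."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = sum(observed) / len(observed)
    return [fill if r[key] is None else r[key] for r in rows]

print(len(listwise(rows)))            # cases left after listwise deletion
print(len(pairwise(rows, "x", "y")))  # cases usable for the x-y pair
print(impute_mean(rows, "y"))         # y with the gap filled in
```

In this toy data set listwise deletion keeps two cases, while each pair of variables still has at least two usable cases under pairwise deletion.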
Which are the two measures of central tendency?
Mean and median.
Mean = the sum of all the numbers / the number of values
Median = the middle value once the numbers are sorted. With an even number of values, add the two numbers in the middle and divide by two
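A short sketch with Python's standard `statistics` module, using made-up numbers, shows both measures and the even-count case for the median:

```python
import statistics

odd_count = [2, 4, 4, 5, 10]
even_count = [2, 4, 6, 10]

mean_value = statistics.mean(odd_count)      # (2 + 4 + 4 + 5 + 10) / 5
median_odd = statistics.median(odd_count)    # middle value of the sorted list
median_even = statistics.median(even_count)  # (4 + 6) / 2

print(mean_value, median_odd, median_even)
```

The single large value (10) pulls the mean above the median, which is why the median is often preferred for skewed data.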
What is skewness?
The skewness measures how far away an observed distribution is from a theoretical symmetrical distribution.
If the distribution is symmetrical, then the skewness is 0.
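One common way to put a number on this is the moment-based coefficient m3 / m2^1.5. The sketch below assumes that definition (statistical packages often apply a small-sample correction, so their output can differ slightly) and uses made-up data:

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 10]

print(skewness(symmetric))     # 0 for a symmetrical distribution
print(skewness(right_skewed))  # positive: long tail to the right
```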
What is kurtosis?
The kurtosis is a measure of how peaked or flat the distribution is compared to a theoretical normal distribution.
What is it called when the distribution has the same kurtosis as a normal distribution?
Mesokurtic (excess kurtosis is 0, which corresponds to a raw kurtosis of 3)
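A minimal sketch of excess kurtosis, assuming the moment-based definition m4 / m2^2 - 3 without the small-sample correction that many packages apply. The data are made up:

```python
def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (the normal value)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]  # evenly spread values
print(excess_kurtosis(flat))  # negative: flatter than a normal distribution
```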
What is the variance?
Variance is a measure of how spread out the data are around the mean. The spread arises partly because respondents think differently about specific questions, and partly because of respondent error.
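As a small illustration with made-up data, Python's `statistics` module offers both the population variance (divide by n) and the sample variance (divide by n - 1):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5

population_var = statistics.pvariance(data)  # sum of squared deviations / n
sample_var = statistics.variance(data)       # sum of squared deviations / (n - 1)

print(population_var, sample_var)
```

The sample version is the usual choice when the data are a sample from a larger population.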
What are the two basic methods for estimating how reliable an empirical measurement is?
- Consistency over time
- Internal consistency
What is Cronbach’s Alpha?
The most widely used measure of reliability (internal consistency), often referred to simply as alpha (α)
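A sketch of the standard formula, α = k/(k-1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The function name and the item scores are made up for illustration:

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per item, all covering the same respondents."""
    k = len(items)
    item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - item_variances / statistics.variance(totals))

# Hypothetical: three items answered by five respondents.
items = [
    [2, 3, 4, 5, 4],
    [3, 3, 5, 5, 4],
    [2, 4, 4, 5, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

The more the items vary together, the larger the variance of the total score relative to the item variances, and the closer alpha gets to 1.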
What do covariation and correlation measure?
Both measure the linear relationship between two variables. Correlation is the covariance standardized to range from -1 to 1. When people talk about correlation, they are most often referring to the Pearson correlation
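A minimal sketch of both quantities in plain Python, with made-up data. The sample covariance divides by n - 1, and the Pearson correlation divides the covariance by the product of the two standard deviations:

```python
import math

def covariance(x, y):
    """Sample covariance of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson(x, y):
    """Covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # exactly 2 * x, a perfect linear relationship

print(covariance(x, y))  # depends on the units of x and y
print(pearson(x, y))     # unit-free
```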
What is Spearman’s Rank Correlation?
Used to estimate a correlation coefficient for ordinal-level variables
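One way to compute it is to replace each value by its rank (ties get the average rank) and then take the Pearson correlation of the ranks. The helper names and data below are hypothetical:

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]  # always increasing, but not linearly

print(spearman(x, y))  # only the ordering matters
print(pearson(x, y))   # lower, because the relationship is not linear
```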
What is the T-test?
T-tests determine whether there is a statistically significant difference between the means of two groups or between the mean of one group and a specified test value.
Imagine you have two bowls of candy. One bowl might have more candies than the other, or they might have the same amount. A t-test is like a special way to check if one bowl really has more candies, or if it just looks that way.
It’s like asking, “Does this bowl have more candies for sure, or is it just a little different by accident?”
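For the one-group case, the test statistic is t = (sample mean - test value) / (standard deviation / √n). The sketch below uses made-up data and only computes t and the degrees of freedom. Turning t into a p-value needs the t distribution, which is not in Python's standard library (a statistics package would normally handle that step):

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - test_value) / standard_error
    return t, n - 1

sample = [5, 6, 7, 8, 9]  # hypothetical measurements
t, df = one_sample_t(sample, test_value=5)
print(round(t, 3), df)
```

The larger t is in absolute value, the less plausible it is that the difference from the test value arose by accident.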
What is the independent samples t-test?
Determines whether there is a statistically significant difference between the means of two unrelated samples. The samples are assumed to be mutually exclusive, meaning that no case is present in both groups. A typical example is comparing men and women (assuming two gender categories) on a continuous variable, like the amount of sick leave.
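A minimal sketch of the test statistic, assuming the classic pooled-variance (equal variances) version of the test. The sick-leave figures are invented, and the p-value step is left out because it needs the t distribution:

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t, degrees of freedom) using a pooled variance estimate."""
    na, nb = len(group_a), len(group_b)
    pooled_variance = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    standard_error = math.sqrt(pooled_variance * (1 / na + 1 / nb))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, na + nb - 2

# Hypothetical sick-leave days; no person appears in both groups.
men = [2, 4, 6, 8]
women = [1, 2, 3, 4]

t, df = independent_t(men, women)
print(round(t, 3), df)
```

If the two groups have clearly different variances, Welch's version of the test, which does not pool the variances, is the usual alternative.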
Quantitative is more XXX and XXX than qualitative?
Quantitative research design is more linear and sequential than qualitative. One step determines the next, and each is dependent on what has been done before.
Why does quantitative research have a deductive logic?
The logic is deductive in that it requires researchers to work from a theory/hypothesis and then gather data to describe it or test it.
What is coding (quan)?
Turning raw data (e.g. answers/observations) into numeric codes (numbers)
A two-category “nominal” variable is often called a dummy variable (coded 1 or 0, for example men = 1 and women = 0).
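A small sketch of dummy coding in Python. The raw answers and the labels "male" and "female" are hypothetical:

```python
# Raw answers as they might appear in a survey export.
raw_answers = ["male", "female", "female", "male"]

# Coding scheme: one category becomes 1, the other 0.
codes = {"male": 1, "female": 0}
dummy = [codes[answer] for answer in raw_answers]

print(dummy)
print(sum(dummy) / len(dummy))  # the mean of a dummy is the share coded 1
```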
Which are the two general categories of statistics?
- Descriptive statistics (statistical procedures that are used for summarizing, organizing and describing data in an illustrative way)
- Inferential statistics (allow us to draw inferences and conclusions about the population on the basis of sample data; represented as tests of significance)
What is nominal data?
questions that ask about categories; categories without values or ranking
What are the four data measurement scales?
- nominal (categorical)
- ordinal (categorical)
- interval (numerical)
- ratio (numerical)
What is ordinal data?
questions that ask about order/ranking. often used to capture preferences/attitudes
“A master's degree at Uppsala is beneficial for your future”
1 = don't agree, 10 = strongly agree
What is scale data?
numeric values on an interval/ratio scale. often used to capture an exact amount, like income, weight, age.
“How many employees does your company have?”