Descriptive statistics Flashcards
What is reproducibility, and what are its two types?
Results Reproducibility: Achieving the same results with the same data as in the original study.
Inferential Reproducibility: Drawing similar conclusions from the data.
What is transparency? (Aguinis)
Crucial for evaluating methodological rigor and for enabling policy and managerial application of findings.
What is the research performance problem?
Rooted in insufficient knowledge, skills, or motivation among researchers, leading to inadequate methodological transparency.
What techniques can be used for handling missing data?
Imputation methods
Listwise deletions
What should you do before embarking on the actual data analysis?
Inspect the data for coding errors, outliers and missing values.
How can you treat missing values?
- listwise deletion: omit every case that has a missing value on any variable. Works well when there is little missing data relative to the sample size.
- pairwise deletion: retains more data than listwise deletion by using, for each pair of variables in the analysis, every case that has values for both. Where listwise deletion removes all cases with any missing data, pairwise deletion drops a case only from the calculations that involve a variable it is missing.
- imputation: replace missing values with a neutral value, such as the variable's mean.
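A minimal sketch of the three treatments, using only the Python standard library. The rows, the variable names and the choice of the mean as the "neutral value" are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical survey rows; None marks a missing answer.
rows = [
    {"age": 25, "income": 30, "satisfaction": 4},
    {"age": 31, "income": None, "satisfaction": 5},
    {"age": None, "income": 42, "satisfaction": 3},
    {"age": 40, "income": 50, "satisfaction": None},
    {"age": 28, "income": 35, "satisfaction": 4},
]

def listwise(rows):
    """Keep only cases with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, x, y):
    """Keep cases that have values for both variables of one pair."""
    return [r for r in rows if r[x] is not None and r[y] is not None]

def impute_mean(rows, var):
    """Replace missing values of one variable with its observed mean."""
    observed = mean(r[var] for r in rows if r[var] is not None)
    return [{**r, var: observed if r[var] is None else r[var]} for r in rows]

n_listwise = len(listwise(rows))                          # 2 complete cases
n_pair_age_income = len(pairwise(rows, "age", "income"))  # 3 usable cases
imputed_income = [r["income"] for r in impute_mean(rows, "income")]
```

Listwise deletion keeps 2 of the 5 cases here, while the age/income pair alone can use 3, which is the sense in which pairwise deletion retains more data.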
Which are the two measures of central tendency?
Mean and median.
Mean = the sum of all values / the number of values
Median = the middle value when the values are sorted; with an even number of values, add the two middle values and divide by two
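A short sketch with Python's `statistics` module; the numbers are made up. It also shows why the two measures can disagree: one extreme value pulls the mean but not the median.

```python
from statistics import mean, median

odd_count = [3, 1, 7, 5, 9]      # sorted: 1, 3, 5, 7, 9
even_count = [3, 1, 7, 5]        # sorted: 1, 3, 5, 7
with_outlier = [1, 3, 5, 7, 90]  # same as odd_count but 9 replaced by 90

mean_odd = mean(odd_count)        # 25 / 5 = 5
median_odd = median(odd_count)    # middle value = 5
median_even = median(even_count)  # (3 + 5) / 2 = 4.0

mean_outlier = mean(with_outlier)      # 106 / 5 = 21.2
median_outlier = median(with_outlier)  # still 5
```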
What is skewness?
The skewness measures how far away an observed distribution is from a theoretical symmetrical distribution.
If the distribution is symmetrical, then the skewness is 0.
What is kurtosis?
The kurtosis is a measure of how peaked or flat the distribution is compared to a theoretical normal distribution.
What is it called when the distribution has the same kurtosis as a normal distribution?
Mesokurtic (excess kurtosis is 0).
What is the variance?
Variance is a measure of how the data is spread out around the mean. The spread arises partly because respondents genuinely think differently about specific questions, and partly because of respondent error.
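As a worked example (made-up answers), variance is the average squared deviation from the mean. Dividing by n gives the population variance; dividing by n - 1 gives the sample variance that statistics packages usually report.

```python
from statistics import mean, pvariance, variance

answers = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(answers)                               # 40 / 8 = 5
squared_dev = [(x - m) ** 2 for x in answers]   # 9, 1, 1, 1, 0, 0, 4, 16

pop_var = sum(squared_dev) / len(answers)         # 32 / 8 = 4
samp_var = sum(squared_dev) / (len(answers) - 1)  # 32 / 7

# The standard library gives the same two numbers.
lib_pop_var = pvariance(answers)
lib_samp_var = variance(answers)
```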
What are the two basic methods for estimating how reliable an empirical measurement is?
- Consistency over time
- Internal consistency
What is Cronbach’s Alpha?
The most widely used measure of internal-consistency reliability, often referred to simply as alpha (α).
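The card does not give the formula, so the sketch below assumes the standard one: α = k/(k-1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The three-item scale and its scores are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of respondent scores per questionnaire item."""
    k = len(items)
    item_var_sum = sum(variance(col) for col in items)
    # Total score per respondent = sum across the items.
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical 3-item scale answered by 5 respondents.
items = [
    [2, 3, 4, 4, 5],
    [3, 3, 4, 5, 5],
    [2, 4, 4, 5, 5],
]
alpha = cronbach_alpha(items)  # about 0.95 for these scores
```

Alpha rises when the items move together across respondents, which is what "internal consistency" refers to.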
What do covariance and correlation measure?
Both measure the linear relationship between two variables. Covariance depends on the units of measurement, while correlation is standardized to lie between -1 and 1. When people talk about correlation, they most often mean the Pearson correlation.
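A small sketch of both measures on made-up data (study hours and test scores are assumed names). Rescaling one variable changes the covariance but leaves the Pearson correlation untouched.

```python
from math import sqrt
from statistics import mean

def covariance(x, y):
    """Sample covariance: average product of deviations, using n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
score_x10 = [s * 10 for s in score]  # same data in different units

cov_original = covariance(hours, score)      # 41 / 4 = 10.25
cov_rescaled = covariance(hours, score_x10)  # ten times larger
r_original = pearson(hours, score)
r_rescaled = pearson(hours, score_x10)       # unchanged by the rescaling
```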
What is Spearman’s Rank Correlation?
A correlation coefficient for ordinal-level variables. It is calculated on the ranks of the values rather than on the values themselves.
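One way to sketch this: convert each variable to ranks (tied values share the average rank) and compute the Pearson correlation of the ranks. The judge ratings are hypothetical.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical ordinal ratings (1-5) from two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [1, 2, 2, 5, 4]
rho = spearman(judge_a, judge_b)
```

Because only the ordering matters, any strictly increasing relationship gives a rho of 1 even when it is far from a straight line.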
What is the t-test?
T-tests determine whether there is a statistically significant difference between the means of two groups or between the mean of one group and a specified test value.
It’s like asking, “Does this bowl have more candies for sure, or is it just a little different by accident?”
What is the independent samples t-test?
Determines whether there is a statistically significant difference between the means of two unrelated samples. The samples are assumed to be mutually exclusive, meaning that no case is present in both groups. A typical example is comparing men and women (assuming two gender categories) on a continuous variable, such as the amount of sick leave.
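A sketch of the test statistic, assuming the classic pooled-variance (Student) version, which presumes similar variances in the two groups. The sick-leave figures are invented, and the p-value lookup is left out because it needs a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """Return (t statistic, degrees of freedom) for two unrelated samples."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical sick-leave days for two mutually exclusive groups.
men = [4, 6, 5, 7, 3]
women = [8, 9, 7, 10, 6]

t_stat, df = independent_t(men, women)  # t is about -3.0 with df = 8
```

With 8 degrees of freedom the two-tailed critical value at the 5% level is about 2.306, so a t of about -3 would count as a significant difference in this toy example.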
Quantitative is more XXX and XXX than qualitative?
Quantitative research design is more linear and sequential than qualitative. One step determines the next, and each is dependent on what has been done before.
Why does quantitative research have a deductive logic?
The logic is deductive in that it requires researchers to work from a theory/hypothesis and then gather data to describe it or test it.
What is coding (quan)?
Turning raw data (e.g. answers or observations) into numeric codes.
A nominal variable with two categories is often called a dummy variable: it is coded 1 or 0, for example male = 1 and female = 0.
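For illustration, dummy coding a handful of made-up answers with the coding scheme from the card:

```python
# Raw answers as collected (hypothetical respondents).
answers = ["male", "female", "female", "male", "female"]

# Coding scheme: male = 1, female = 0.
codes = {"male": 1, "female": 0}
dummy = [codes[a] for a in answers]  # [1, 0, 0, 1, 0]

# The mean of a dummy variable is the proportion of cases coded 1.
share_male = sum(dummy) / len(dummy)  # 2 / 5 = 0.4
```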
Which are the two general categories of statistics?
- Descriptive statistics (statistical procedures that are used for summarizing, organizing and describing data in an illustrative way)
- Inferential statistics (allow us to draw inferences and conclusions about the population on the basis of sample data; represented as tests of significance)
What is nominal data?
Questions that ask about categories that have no numeric value or ranking, such as gender or race.
What are the four data measurement scales?
- nominal (categorical)
- ordinal (categorical)
- interval (numerical)
- ratio (numerical)
What is ordinal data?
Questions that ask about order or ranking. Often used to capture preferences and attitudes.
“A master's degree at Uppsala is beneficial for your future”
1 = don't agree, 10 = strongly agree
What is scale data?
Numeric values on an interval or ratio scale. Often used to capture an exact amount, such as income, weight or age.
“how many employees does your company have?”
What is the difference between dropouts and missing data?
If individuals who make up the target sample do not participate at all, they are dropouts.
If specific questions are NOT answered, these are referred to as missing data.
How can descriptive statistics be presented?
- graphical: charts, histogram
- numerical: mean/median, spread (standard deviation), shape (skew/kurtosis)
What types of graphs are there?
Bar charts
- represents categories, not numerical values
Pie charts
- each variable value is represented by a sector proportional to its frequency
Time plot
- suitable when the categories are points in time
- good tool for illustrating trends
What is a histogram?
It is like a bar chart, but for continuous variables.
What is a scatterplot?
Shows the relationship between two variables. It becomes hard to read when there are many data points.
What is the mode?
The most frequent (most common) value.
What is spread?
The range of the data: the difference between the minimum value and the maximum value.
What is standard deviation?
Shows roughly the average distance between the individual data points and the mean. If all data points are close to the mean, the standard deviation is low, showing that there is little difference between values. A large standard deviation shows a large spread in the data.
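A quick illustration with two invented sets of ages that share the same mean but differ in spread. Strictly speaking, the standard deviation is the square root of the variance (the average squared deviation), so it is close to, but not exactly, the average distance from the mean.

```python
from statistics import mean, pstdev

ages_tight = [29, 30, 30, 31]  # everyone close to the mean
ages_wide = [10, 20, 40, 50]   # same mean, values far from it

mean_tight = mean(ages_tight)  # 30
mean_wide = mean(ages_wide)    # 30

sd_tight = pstdev(ages_tight)  # sqrt(0.5), about 0.71
sd_wide = pstdev(ages_wide)    # sqrt(250), about 15.8
```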
What is the Kolmogorov-Smirnov value?
A test of normality. You use it when the number of observations is > 50.
What is the Shapiro-Wilk value?
A test of normality. You use it when the number of observations is < 50.
What is a rule of thumb for the normality tests (Kolmogorov-Smirnov, Shapiro-Wilk)?
A significance (p) value above 0.05 means the variable can be treated as normally distributed.
What does skewness = 0 say?
The distribution (histogram) is symmetric.
What does positive skewness say?
More observations lie below the mean than above it, so the tail is to the right. The mean is greater than the median.
What does a negative skewness say?
You have a small number of low observations and a large number of high ones, so the tail is to the left. The median is greater than the mean.
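A sketch of the sign of skewness, assuming the simple moment-based formula (third central moment divided by the second central moment to the power 1.5). Statistics packages often add a small-sample correction, so their values can differ slightly. The data are made up.

```python
from statistics import mean, median

def skewness(data):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]  # one very high value
left_tail = [-x for x in right_tail]    # mirror image: one very low value

skew_sym = skewness(symmetric)     # 0: deviations cancel out
skew_right = skewness(right_tail)  # positive, and mean (4.75) > median (3)
skew_left = skewness(left_tail)    # negative, and mean (-4.75) < median (-3)
```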
What is kurtosis?
Kurtosis shows how peaked or flat the distribution (histogram) is
Negative kurtosis (<0) = a flat and wide distribution (platykurtic)
Positive kurtosis (>0) = a peaked distribution (leptokurtic)
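To make the sign concrete, here is a sketch that assumes the moment-based excess kurtosis (fourth central moment divided by the squared second central moment, minus 3, so a normal distribution scores 0). The two data sets are invented, and packages that apply a small-sample correction will report slightly different numbers.

```python
from statistics import mean

def excess_kurtosis(data):
    """m4 / m2 ** 2 - 3, so that a normal distribution gives 0."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # evenly spread values
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]   # most values at the center

kurt_flat = excess_kurtosis(flat)      # negative: platykurtic
kurt_peaked = excess_kurtosis(peaked)  # 125 / 25 - 3 = 2: leptokurtic
```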
What is correlation?
Correlation is the degree to which two variables are linearly related
What is the dependent t-test?
Also known as the paired t-test. It compares the means of two related groups (for example the same respondents measured twice) to determine whether there is a statistically significant difference between the means.
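A minimal sketch of the paired statistic: take the difference within each pair, then divide the mean difference by its standard error. The before/after scores are hypothetical, and the p-value lookup is omitted.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical scores for the same five people, measured twice.
before = [10, 12, 9, 11, 13]
after = [12, 13, 12, 12, 16]

t_stat, df = paired_t(before, after)  # differences: 2, 1, 3, 1, 3
```

Here the mean difference is 2 and the standard deviation of the differences is 1, so t = 2 × sqrt(5), about 4.47, with 4 degrees of freedom. That exceeds the two-tailed 5% critical value of about 2.776.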
When are surveys a good research method?
- for descriptive, exploratory, explanatory research purposes
- to collect original information about a population
- to measure attitudes, preferences
Note: a survey is NOT the same as a questionnaire.
What are some sampling techniques?
- random sampling (everyone has equal chance to be selected)
- cluster sampling (dividing a population into clusters, then random selection)
- stratified sampling (the population is divided into homogeneous groups and then random selection)
- convenience sampling
- purposeful (purposive) sampling
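The three probability-based techniques can be sketched with Python's `random` module. The population of 30 employees, the department names, the cluster size and the sample sizes are all assumptions for the example. Convenience and purposeful sampling are left out because they do not rely on random selection.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 30 employees spread over three departments.
departments = ["sales", "it", "hr"]
population = [{"id": i, "dept": departments[i % 3]} for i in range(30)]

# Random sampling: every individual has the same chance of selection.
simple_sample = rng.sample(population, 6)

# Stratified sampling: split into homogeneous groups (here: departments),
# then draw randomly within each group.
strata = {}
for person in population:
    strata.setdefault(person["dept"], []).append(person)
stratified_sample = [p for group in strata.values() for p in rng.sample(group, 2)]

# Cluster sampling: split into clusters (here: blocks of 5 ids, standing in
# for e.g. offices), randomly pick whole clusters and take everyone in them.
clusters = [population[i:i + 5] for i in range(0, len(population), 5)]
cluster_sample = [p for cluster in rng.sample(clusters, 2) for p in cluster]
```

The stratified draw guarantees that every department is represented, which a simple random sample of the same size does not.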
What is a questionnaire?
A research instrument consisting of a series of standardized questions for the purpose of gathering information from a specific target group or audience. Questionnaires are used to obtain a structured set of survey data.
What is important before conducting questionnaires?
You should specify your variables properly and find suitable indicators before you go out and collect your data.
Why would you prefer closed-ended questions?
quick to answer
precise
no confusion (as there can be when respondents answer in their own words)
easy coding and analysis
you get access to a wide range of participants
What is respondent bias in questionnaires?
- lack of knowledge
- incomplete or inaccurate information
- respondents are not able to comprehend the questions
- context effect
- memory loss (so you guess)
- time constraints
- social desirability
- affirmative behavior (answer what you think is wanted)
- fear of disclosure (fear of consequences when telling the truth)
What are the disadvantages with questionnaires?
- single source and self-reported data (common method bias; if you study issues related to companies, it is often the case that only one person answers)
- rather low response rates in general
What is a concept?
An abstract idea that is based on theory.
What is a construct (variable)?
A concept that is operationalized
What are indicators?
For example, the questions in a questionnaire: directly operationalized measures.