Descriptive statistics Flashcards

1
Q

What is reproducibility, and what are its two types?

A

Results Reproducibility: Achieving the same results with the same data as in the original study.

Inferential Reproducibility: Drawing similar conclusions from the data.

2
Q

What is transparency? (Aguinis)

A

Methodological transparency is crucial for evaluating methodological rigor and for enabling policy and managerial application of findings.

3
Q

What is the research performance problem?

A

A problem rooted in insufficient knowledge, skills, or motivation among researchers, which leads to inadequate methodological transparency.

4
Q

What techniques can be used for handling missing data?

A

Imputation methods

Listwise deletion

5
Q

What should you do before embarking on the actual data analysis?

A

Inspect the data for coding errors, outliers and missing values.

Coding errors = illogical values in the data set. If a variable can only contain values from 1 to 7, then all values outside this range are coding errors.

Outliers = extreme values that deviate from what is typical for the variable. If a variable can assume values between 1 and 100, and the majority of observations are grouped from 1 to 20, then values like 70, 80 and 90 are outliers.

Missing values = occur when no observation is recorded in one or more cells of a data set.
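
A minimal sketch of this inspection step in Python. The answers, the valid range of 1 to 7, and the use of None for an empty cell are all assumptions made for illustration; an outlier check is left out to keep it short.

```python
# Hypothetical answers on a 1-7 scale; None marks a missing value.
answers = [3, 5, None, 7, 9, 2, 0, 4, None, 6]

VALID_MIN, VALID_MAX = 1, 7  # assumed valid range for this variable

# Positions where no observation was recorded.
missing = [i for i, value in enumerate(answers) if value is None]

# Recorded values that fall outside the valid range are coding errors.
coding_errors = [
    (i, value)
    for i, value in enumerate(answers)
    if value is not None and not (VALID_MIN <= value <= VALID_MAX)
]

print("missing at positions:", missing)
print("coding errors (position, value):", coding_errors)
```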

6
Q

How can you treat missing values?

A
  1. Listwise deletion: omit all cases with missing values. Works well when there is little missing data relative to the sample size.
  2. Pairwise deletion: retains more data than listwise deletion by using every case that has data for the pair of variables being analyzed. Where listwise deletion removes any case with a missing value on any variable, pairwise deletion drops a case only from the analyses that involve a variable it is missing.
  3. Replace missing values with a neutral value, such as the variable's mean.
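
The difference between listwise and pairwise deletion can be sketched as follows. The data set and the helper names listwise and pairwise are hypothetical, and None stands for a missing value.

```python
# Hypothetical data set; None marks a missing value.
rows = [
    {"age": 25, "income": 300, "score": 4},
    {"age": 31, "income": None, "score": 5},
    {"age": None, "income": 280, "score": 3},
    {"age": 40, "income": 410, "score": None},
    {"age": 52, "income": 500, "score": 2},
]

def listwise(rows):
    """Keep only cases with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, x, y):
    """Keep every case that has values for both x and y."""
    return [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]

complete_cases = listwise(rows)
age_income = pairwise(rows, "age", "income")

print(len(complete_cases), "cases survive listwise deletion")
print(len(age_income), "cases are usable for the age/income pair")
```

In this example listwise deletion keeps fewer cases than pairwise deletion does for the age and income pair, because a case that is only missing its score is still usable for that pair.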
7
Q

Which are the two measures of central tendency?

A

Mean and median.

Mean = the sum of all the values / the number of values

Median = the middle value when the values are sorted. With an even number of values, add the two middle values and divide by two.
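
A small illustration of both measures, using made-up numbers. The median function is written out by hand to show the rule for odd and even counts.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_count = [2, 4, 4, 5, 7, 9, 40]   # hypothetical data with one extreme value
even_count = [2, 4, 5, 7]

print(mean(odd_count), median(odd_count))
print(median(even_count))
```

The extreme value 40 pulls the mean well above the median, which is why the median is often preferred when outliers are present.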

8
Q

What is skewness?

A

The skewness measures how far away an observed distribution is from a theoretical symmetrical distribution.

If the distribution is symmetrical, then the skewness is 0.
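
As a rough sketch, skewness can be computed from the second and third central moments. This is the simple moment-based version; statistical packages may apply a small-sample correction, so their values can differ slightly. The three data sets are made up.

```python
def skewness(values):
    """Moment-based skewness: third central moment divided by variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]   # a few high values: tail to the right
left_tail = [1, 8, 9, 9, 10, 10]   # a few low values: tail to the left

for name, data in [("symmetric", symmetric), ("right tail", right_tail), ("left tail", left_tail)]:
    print(name, round(skewness(data), 2))
```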

9
Q

What is kurtosis?

A

The kurtosis is a measure of how peaked or flat the distribution is compared to a theoretical normal distribution.

10
Q

What is it called when the distribution has the same kurtosis as a normal distribution?

A

Mesokurtic (excess kurtosis is 0)

11
Q

What is the variance?

A

Variance is a measure of how the data is spread out around the mean. The spread arises partly because respondents think differently about specific questions, and partly because of respondent error.

12
Q

What are the two basic methods for estimating how reliable an empirical measurement is?

A
  1. Consistency over time
  2. Internal consistency
13
Q

What is Cronbach’s Alpha?

A

The most widely used measure of internal-consistency reliability, often referred to simply as alpha (α).
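
A sketch of how alpha can be computed, assuming the standard formula α = k/(k-1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The responses are invented: each row is one respondent, each column one item of the same scale.

```python
import statistics

# Hypothetical answers: rows are respondents, columns are items of one scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]

def cronbach_alpha(responses):
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of answers per item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The items in this example rise and fall together across respondents, so alpha comes out close to 1.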

14
Q

What do covariation and correlation measure?

A

Both measure the linear relationship between variables. When people talk about correlation, they are most often referring to the Pearson correlation.
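
A minimal sketch of the two measures, with made-up data on study hours and test scores. It illustrates one practical difference: the covariance changes when a variable is rescaled, while the Pearson correlation does not.

```python
import math

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson(x, y):
    """Covariance standardized by both standard deviations; always between -1 and +1."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

hours = [1, 2, 3, 4, 5]          # hypothetical study hours
score = [52, 55, 61, 64, 68]     # hypothetical test scores
score_x10 = [s * 10 for s in score]

print(covariance(hours, score), covariance(hours, score_x10))
print(round(pearson(hours, score), 3), round(pearson(hours, score_x10), 3))
```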

15
Q

What is Spearman’s Rank Correlation?

A

A method for estimating a correlation coefficient for ordinal-level variables.
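
One common way to compute it is to replace each value by its rank (ties get the average rank) and then take the Pearson correlation of the ranks. The sketch below assumes that approach; the data and function names are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    return [(2 * ordered.index(v) + ordered.count(v) + 1) / 2 for v in values]

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]   # always increasing with x, but not along a straight line

print(round(spearman(x, y), 3), round(pearson(x, y), 3))
```

Because only the order matters, Spearman's coefficient is 1 for any relationship that always increases, even when the Pearson correlation is below 1.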

16
Q

What is the T-test?

A

T-tests determine whether there is a statistically significant difference between the means of two groups or between the mean of one group and a specified test value.

Imagine you have two bowls of candy. One bowl might have more candies than the other, or they might have the same amount. A t-test is like a special way to check if one bowl really has more candies, or if it just looks that way.

It’s like asking, “Does this bowl have more candies for sure, or is it just a little different by accident?”

17
Q

What is the independent samples t-test?

A

Determines whether there is a statistically significant difference between the means of two unrelated samples. The samples are assumed to be mutually exclusive, meaning that no case is present in both groups. A typical example is comparing two gender categories on a continuous variable, such as the amount of sick leave for men and women.
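
A sketch of the test statistic for the sick-leave example, assuming equal variances in both groups (the pooled-variance version of the test). The numbers are invented, and the code stops at the t value and degrees of freedom; the p-value would normally come from statistical software or a t-table.

```python
import math
import statistics

men = [2, 4, 3, 5, 6]      # hypothetical sick-leave days
women = [5, 7, 6, 8, 9]

def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two unrelated groups."""
    n1, n2 = len(a), len(b)
    pooled_variance = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, n1 + n2 - 2

t, df = independent_t(men, women)
print(round(t, 2), df)
```

With 8 degrees of freedom, the two-tailed critical value at the 0.05 level in a standard t-table is about 2.31, so a t value of about -3 would count as a significant difference here.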

17
Q

Quantitative research design is more XXX and XXX than qualitative?

A

Quantitative research design is more linear and sequential than qualitative. One step determines the next, and each is dependent on what has been done before.

17
Q

Why does quantitative research have a deductive logic?

A

The logic is deductive in that it requires researchers to work from a theory/hypothesis and then gather data to describe it or test it.

17
Q

What is coding (in quantitative research)?

A

Turning raw data (e.g. answers or observations) into numeric codes.

A two-category nominal variable coded 0 or 1 (e.g. men = 1, women = 0) is often called a dummy variable.
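
A small sketch of coding in practice. The raw answers and the codebook are hypothetical.

```python
# Hypothetical raw answers from a questionnaire.
gender_answers = ["man", "woman", "woman", "man", "woman"]
agreement_answers = ["agree", "neutral", "disagree", "agree", "agree"]

# Dummy coding: 1 for one category, 0 for the other.
gender_dummy = [1 if answer == "man" else 0 for answer in gender_answers]

# Codebook mapping each text answer to a numeric code (assumed scheme).
codebook = {"disagree": 1, "neutral": 2, "agree": 3}
agreement_codes = [codebook[answer] for answer in agreement_answers]

print(gender_dummy)
print(agreement_codes)
```

A handy property of a dummy variable is that its mean equals the share of cases coded 1.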

18
Q

Which are the two general categories of statistics?

A
  1. Descriptive statistics (statistical procedures that are used for summarizing, organizing and describing data in an illustrative way)
  2. Inferential statistics (allow us to draw inferences and conclusions about the population on the basis of sample data; represented as tests of significance)
18
Q

What is nominal data?

A

Data from questions that ask about categories; the categories have no numeric value or ranking.

18
Q

What are the four data measurement scales?

A
  1. nominal (categorical)
  2. ordinal (categorical)
  3. interval (numerical)
  4. ratio (numerical)
19
Q

What is ordinal data?

A

Data from questions that ask about order or ranking. Often used to capture preferences and attitudes.

“A master's degree at Uppsala is beneficial for your future”
1 is do not agree, 10 is strongly agree

20
Q

What is scale data?

A

Numeric values on an interval or ratio scale. Often used to capture an exact amount, such as income, weight or age.

“how many employees does your company have?”

21
Q

What is the difference between dropouts and missing data?

A

If individuals who make up the target sample do not participate, they are dropouts.

If specific questions are NOT answered, these are referred to as missing data.

22
Q

How can descriptive statistics be presented?

A
  1. graphical
    - charts
    - histogram
  2. numerical
    - mean/median
    - spread (standard deviation)
    - shape (skew/kurtosis)
23
Q

What types of graphs are there?

A

Bar charts
- represent categories, not numerical values

Pie charts
- each variable value is represented by a sector proportional to its frequency

Time plot
- suitable when the categories are points in time
- good tool for illustrating trends

24
Q

What is a histogram?

A

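It is like a bar chart, but for continuous variables: the values are grouped into intervals (bins), and each bar shows how many observations fall into that interval.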
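
A rough text-only sketch of what a histogram does with a continuous variable. The ages and the bin width of 10 are assumptions chosen for illustration.

```python
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 58]   # hypothetical continuous variable
bin_width = 10                                    # assumed width of each interval

# Count how many observations fall into each interval.
counts = {}
for age in ages:
    lower = (age // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

# One row per interval; the bar length shows the frequency.
for lower in sorted(counts):
    print(f"{lower}-{lower + bin_width - 1}: " + "#" * counts[lower])
```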

25
Q

What is a scatterplot?

A

Shows the relationship between two variables. It becomes hard to read if there are many data points.

26
Q

What is the mode?

A

The most frequent (most common) value.

27
Q

What is spread?

A

The range of the data: the difference between the minimum value and the maximum value.

28
Q

What is standard deviation?

A

Shows roughly how far, on average, the individual data points lie from the mean. If all data points are close to the mean, the standard deviation is low, showing that there is little difference between values. A large standard deviation shows a large spread in the data.
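
A sketch of the calculation, assuming the sample formula (dividing by n - 1): the variance is the average squared distance from the mean, and the standard deviation is its square root. Both data sets are made up.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

spread_out = [20, 22, 24, 26, 28]       # hypothetical ages, widely spread
close_together = [23, 24, 24, 24, 25]   # same mean, values close to it

var_wide = sample_variance(spread_out)
sd_wide = math.sqrt(var_wide)
sd_narrow = math.sqrt(sample_variance(close_together))

print(var_wide, round(sd_wide, 2), round(sd_narrow, 2))
```

Both lists have the same mean, but the second one produces a much smaller standard deviation because its values sit close to that mean.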

29
Q

What is a normality test?

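A

A statistical test of whether a variable is normally distributed, for example the Kolmogorov-Smirnov or Shapiro-Wilk test.
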
30
Q

What is a Kolmogorov-Smirnov value?

A

A normality test statistic. You use it when the number of observations is greater than 50.

31
Q

What is the Shapiro-Wilk value?

A

A normality test statistic. You use it when the number of observations is less than 50.

32
Q

What is a rule of thumb for normality tests?

A

If the significance value is above 0.05, the variable can be treated as normally distributed.

33
Q

What does skewness = 0 say?

A

The distribution (histogram) is symmetric.

34
Q

What does positive skewness say?

A

Most observations lie below the mean, with a small number of high values forming a tail to the right. The mean is greater than the median.

35
Q

What does a negative skewness say?

A

You have a small number of low observations and a large number of high ones, so the tail is to the left. The median is greater than the mean.

36
Q

What is kurtosis?

A

Kurtosis shows how peaked or flat the distribution (histogram) is

Negative kurtosis (<0) = a flat and wide distribution (platykurtic)

Positive kurtosis (>0) = a peaked distribution (leptokurtic)
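
A sketch of the calculation, assuming the moment-based definition of excess kurtosis (fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores 0). Statistical packages may use a small-sample correction, and both data sets are invented.

```python
def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6]                      # evenly spread values
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]       # most values at the centre, two far out

print(round(excess_kurtosis(flat), 2))     # below 0: platykurtic
print(round(excess_kurtosis(peaked), 2))   # above 0: leptokurtic
```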

37
Q

Can a histogram show outliers?

A

Yes

38
Q

What is correlation?

A

Correlation is the degree to which two variables are linearly related

39
Q

What is the correlation coefficient?

A

It is a statistic representing how closely two variables co-vary, ranging from -1 to +1. It measures the strength and direction of a linear relationship.

-1 is perfectly negative (total opposites)
0 is no correlation
+1 is perfect positive correlation

40
Q

What are two important rules of correlation?

A

Correlation does not imply causation!

A correlation is statistically significant if the significance value is below 0.05.

A correlation coefficient below 0.3 (in absolute value) is a low correlation.

41
Q

What is the hypothesis t-test?

A

It is a technique that checks whether two means are reliably different from each other, that is, whether there is a significant difference between two groups of data.

42
Q

How do you set up the hypotheses for a t-test?

A

Hypothesis 1 (H1): men are more confident about a climb than women (confidence).

The null hypothesis (H0) for H1: there is no difference between men and women with regard to their confidence in climbing (equal means).

43
Q

What is the independent t-test?

A

Also known as the two-sample t-test. It is an inferential statistical test that determines whether there is a statistically significant difference between the means of two unrelated groups (e.g. men and women).

44
Q

What is the dependent t-test?

A

Also known as the paired t-test. It compares the means of two related groups to determine whether there is a statistically significant difference between the means.
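
A sketch of the paired test statistic, using invented before and after scores for the same five people. The test works on the differences within each pair: t is the mean difference divided by its standard error. The p-value would normally come from statistical software or a t-table.

```python
import math
import statistics

before = [70, 68, 75, 80, 72]   # hypothetical scores, first measurement
after = [72, 71, 76, 84, 77]    # same people, second measurement

def paired_t(first, second):
    """t statistic and degrees of freedom based on the within-pair differences."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1

t, df = paired_t(before, after)
print(round(t, 2), df)
```

With 4 degrees of freedom, the two-tailed critical value at the 0.05 level in a standard t-table is about 2.78, so the t value in this example would count as significant.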

45
Q

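Levene's test with a significance above 0.05 means...

A

A non-significant difference between the variances of the groups, so equal variances can be assumed. You then read the top row of the output (labelled "Equal variances assumed" in SPSS) to evaluate the significance of mean differences between groups.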

46
Q

When are surveys a good research method to use?

A
  • for descriptive, exploratory, explanatory research purposes
  • to collect original information about a population
  • to measure attitudes, preferences

Note: a survey is NOT the same as a questionnaire.

47
Q

What are some sampling techniques?

A
  • random sampling (everyone has an equal chance of being selected)
  • cluster sampling (the population is divided into clusters, then clusters are selected at random)
  • stratified sampling (the population is divided into homogeneous groups, then a random selection is made within each group)
  • convenience sampling
  • purposive sampling
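
A sketch of simple random and stratified sampling with Python's random module. The population of 100 employees, the two departments, the 20% sampling fraction and the fixed seed are all assumptions for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the example is repeatable

# Hypothetical population: 25 employees in sales, 75 in production.
population = [
    {"id": i, "dept": "sales" if i % 4 == 0 else "production"}
    for i in range(100)
]

# Simple random sampling: every unit has the same chance of selection.
simple_sample = rng.sample(population, 10)

def stratified_sample(population, key, fraction, rng):
    """Split the population into groups by key, then sample randomly within each group."""
    strata = {}
    for unit in population:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

stratified = stratified_sample(population, "dept", 0.2, rng)
sales_in_sample = sum(1 for unit in stratified if unit["dept"] == "sales")
print(len(simple_sample), len(stratified), sales_in_sample)
```

Stratifying guarantees that each department appears in the sample in proportion to its size, which a simple random sample only achieves on average.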
48
Q

What is a questionnaire?

A

A research instrument consisting of a series of standardized questions for the purpose of gathering information from a specific target group or audience. Questionnaires are used to obtain a structured set of survey data.

49
Q

What is important before conducting questionnaires?

A

You should specify your variables properly and find suitable indicators before you go out and collect your data.

50
Q

What are some types of questions in a questionnaire?

A

Open-ended (write freely)

Closed-ended (choose 1, X, 2)

Combined open- and closed-ended

51
Q

Why would you prefer closed-ended questions?

A

quick to answer
precise
no confusion (as there can be when respondents answer in their own words)
easy coding and analysis
you get access to a wide range of participants

52
Q

What is respondent bias in questionnaires?

A
  • lack of knowledge
  • incomplete or inaccurate information
  • respondents are not able to comprehend the questions
  • context effect
  • memory loss (so you guess)
  • time constraints
  • social desirability
  • affirmative behavior (answer what you think is wanted)
  • fear of disclosure (fear of consequences when telling the truth)
53
Q

What are the disadvantages with questionnaires?

A
  • single source and self-reported data (common method bias; if you study issues related to companies, often only one person per company answers)
  • rather low response rates in general
54
Q

What is a concept?

A

An abstract idea that is based on theory.

55
Q

What is a measurement?

A

For example IQ, which is a measurement of intelligence.

56
Q

What is a construct (variable)?

A

A concept that is operationalized

57
Q

What are indicators?

A

Direct, operationalized measures, e.g. the questions in a questionnaire.

58
Q

What is risk propensity?

A

It measures people's general risk-taking tendencies.