Statistics Flashcards
chapter 1.1-1.4
Describe the relationship between a dependent and an independent variable?
The dependent variable depends on the independent variable.
In an experiment, the treatment or intervention is the independent variable, and the outcome measured on the subjects is the dependent variable. When the independent variable changes, the dependent variable is expected to change as well.
Define the four levels of measurement!
Variables are either categorical or quantitative.
categorical variables (the category a case falls into is what matters):
nominal: categories with no natural order. pie chart / bar chart, central tendency: mode only (no median or mean)
ordinal: categories whose order is meaningful. bar chart, central tendency: median
quantitative variables:
interval: scale with meaningful intervals between the numbers, but no absolute zero point. histogram, central tendency: mean, or median if there are outliers
ratio: like interval, but with an absolute zero point. histogram, central tendency: mean, or median if there are outliers
What makes a graph skewed to the left / right?
skewed to the left: the skew (tail) is on the left, so the peak sits toward the right side.
skewed to the right: the skew (tail) is on the right, so the peak sits toward the left side.
What attributes do the mode, median and mean have?
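If you see a symmetric graph / histogram, the mean and the median are equal (with a single peak, the mode is there too). In a graph skewed to the right, the mode is usually on the left, the median in the middle and the mean on the right, because the mean is pulled toward the tail. In a graph skewed to the left the order is reversed.
The mode is the least informative and the mean is the most informative.
The mean is sensitive to outliers; the median is more stable (resistant).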
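A small sketch of how an outlier moves the mean but barely moves the median. The `incomes` values are made up purely for illustration.

```python
from statistics import mean, median

# Illustrative, made-up scores (not from the textbook)
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]   # add one extreme value

mean_before = mean(incomes)          # 175 / 5
median_before = median(incomes)      # middle value of the sorted list
mean_after = mean(with_outlier)      # pulled strongly toward 400
median_after = median(with_outlier)  # average of 35 and 38

print(mean_before, median_before)
print(mean_after, median_after)
```

The mean jumps from 35 to almost 96, while the median only moves from 35 to 36.5.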
What is the five-number summary?
The minimum, Q1, Median , Q3 and the Maximum.
How do you calculate the Median? P.30
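Location of M = (n+1)/2 in the ordered list of scores. If n is odd, M is the score at that location. If n is even, the location falls between two scores x1 and x2, so take their average: M = (x1+x2)/2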
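The location rule can be sketched in Python as follows. The function name `median_by_location` is only chosen for this example.

```python
def median_by_location(values):
    """Median via the (n + 1) / 2 location rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number, so there is one middle score
        return ordered[mid]
    # Even n: the location lies between two scores, so average them
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median_by_location([7, 1, 5])       # sorted: 1, 5, 7
even_result = median_by_location([8, 2, 6, 4])   # sorted: 2, 4, 6, 8
print(odd_result, even_result)
```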
How do you calculate the mean? P. 28
The sum of the values of all scores divided by the number of scores n.
How do you calculate the quartiles ?
First find the median, which splits the ordered scores into two halves. Q1 is the median of the lower half and Q3 is the median of the upper half.
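A minimal sketch of the quartile rule and the five-number summary. It assumes that for an odd number of scores the median itself is left out of both halves. That is one common convention, and other books and software may compute quartiles slightly differently. It also assumes at least two scores.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]  # scores below the median's location
    # For odd n, skip the middle score itself (assumed convention)
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_number_summary([1, 3, 4, 6, 7, 9, 10])
print(summary)
```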
Define the boxplot! P.34
A graph of the five-number summary: a box drawn along a scale that runs from Q1 to Q3, with a line inside it at the median and lines reaching out to the minimum and the maximum.
How do you calculate the interquartile range IQR ?
IQR= (Q3-Q1)
How do you identify outliers? P.36
Multiply the IQR by 1.5 and
- subtract it from Q1 -> everything below Q1 - 1.5 * IQR is an outlier
+ add it to Q3 -> everything above Q3 + 1.5 * IQR is an outlier
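The 1.5 * IQR rule as a short sketch. The function name and the example scores are made up for illustration, and Q1 and Q3 are passed in rather than computed.

```python
def iqr_outliers(values, q1, q3):
    """Return the values flagged by the 1.5 * IQR rule."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # anything below this is flagged
    upper_fence = q3 + 1.5 * iqr   # anything above this is flagged
    return [v for v in values if v < lower_fence or v > upper_fence]

# Illustrative scores with Q1 = 3 and Q3 = 9, so IQR = 6
flagged = iqr_outliers([1, 3, 4, 6, 7, 9, 30], q1=3, q3=9)
print(flagged)
```

Here the fences are at -6 and 18, so only 30 is flagged.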
What's the standard deviation and how does it differ from the variance? P.38
The standard deviation looks at how far the scores are from their mean and gives roughly the typical distance of a score from the mean.
The variance is simply the step before the standard deviation: the sum of the squared distances from the mean divided by n-1. The standard deviation is the square root of the variance. The variance also shows the spread around the mean, but it is in squared units, so outliers weigh especially heavily in it.
The standard deviation is in the same units as the actual scores.
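A sketch of both steps, assuming the sample version with n-1 in the denominator. The `scores` list is made up for illustration.

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative scores, mean = 5
variance = sample_variance(scores)     # in squared units
std_dev = sqrt(variance)               # back in the units of the scores
print(variance, std_dev)
```

The squared deviations add up to 32, so the variance is 32/7 (about 4.57) and the standard deviation is about 2.14.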
How can you describe the correlation between two variables?
- variables are either positively or negatively associated
- positive association means that above-average values of one variable tend to go together with above-average values of the other (x > average with y > average), and below-average with below-average
- negative association: one goes up, the other goes down (x above average with y below average)
What's a scatterplot?
- visualisation of the relationship between two quantitative variables, e.g. price and rating
- each case is represented as a dot in the coordinate system: its x-position is the value of one variable, its y-position the value of the other.
- -> you can easily spot outliers and see where the points cluster (whether there is a weak or strong correlation)
What do I need to consider when figuring out a relationship between data?
To identify the relationship you need to:
- identify the cases - how many cases are there, and what kind are they?
- are the variables categorical / quantitative?
- which variable is the response variable and which is the explanatory variable?
- What are the values and labels for each variable?
What's a log transformation?
- log comes from logarithm and is the inverse of 10^x (10 to the power of something): log(1000) = 3 because 10^3 = 1000
- if you transform a scatterplot with a log transformation, you replace each value by its logarithm. This pulls large values closer together, so the distances between the scores get shorter and the pattern becomes more visible.
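A small sketch of a base-10 log transformation on made-up values. The raw values span four powers of ten, while the transformed values are evenly spaced.

```python
from math import log10

# Illustrative values that grow by a factor of 10 each step
raw = [1, 10, 100, 1000, 10000]

# Replace every value by its base-10 logarithm
logged = [log10(v) for v in raw]

raw_range = max(raw) - min(raw)           # 9999
logged_range = max(logged) - min(logged)  # about 4
print(logged)
```

In a real scatterplot you would transform the x-values, the y-values, or both, and then plot the transformed values.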
What does smoothing have to do with a scatterplot?
- it's a geometrical method of drawing a smooth curve through a scatterplot that helps us see the overall pattern, for example whether the relationship is linear.
- the higher the smoothing value is, the straighter the curve is.
What does the little r stand for? P 101
- the correlation between two quantitative variables that have a linear relationship.
- -> it describes whether this relationship is positive or negative and how strong or weak it is.
What is the formula for r?
r = 1/(n-1) * Σ ((xi - mean of x)/sx) * ((yi - mean of y)/sy), where sx and sy are the standard deviations of x and y and the sum runs over all n cases
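The formula for r can be sketched directly in Python. The function name and the example values are made up for illustration, and the sketch assumes both lists have the same length and that neither variable is constant.

```python
from math import sqrt

def correlation(xs, ys):
    """r = 1/(n-1) * sum of standardized x times standardized y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = (((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])     # perfect rising line
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])   # perfect falling line
print(r_up, r_down)
```

Points on a perfect rising line give r close to 1, and points on a perfect falling line give r close to -1 (up to floating-point rounding).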
Tell me something about the character of r?
- r is not affected if you change the units: meters -> centimetres, the correlation stays the same
- r is sensitive to outliers
- r is always between -1 and 1
- close to 0 = weak correlation; close to 1 / -1 = the scores lie close to a straight line
- negative correlation = x above mean goes with y below mean
visualisation: left top corner to right bottom corner
- positive correlation = x above mean goes with y above mean
visualisation: left bottom corner to right top corner
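A quick check that changing the units leaves r unchanged. The heights and weights are made up for illustration, and `r` is a small helper written for this example.

```python
from statistics import mean, stdev

def r(xs, ys):
    """Correlation from standardized values, with n - 1 in the denominator."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Illustrative, made-up measurements
heights_m = [1.60, 1.70, 1.75, 1.82, 1.90]
weights_kg = [55, 68, 70, 80, 88]
heights_cm = [h * 100 for h in heights_m]   # same heights, different unit

r_m = r(heights_m, weights_kg)
r_cm = r(heights_cm, weights_kg)
print(r_m, r_cm)
```

Both calls give the same value apart from floating-point rounding, because standardizing removes the unit.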
How can you put two categorical variables in a table? P 137
- you need a two-way table, which shows two categorical variables at once, e.g. yes / no and group one / two
- the yes / no on the left says whether a requirement was met or not = row variable
- the variable across the top, which describes group 1 / 2 = column variable
- each combination of a row and a column is a = cell
What the hell is a joint distribution? P. 138
- divide the count you write in a cell (cell entry) by the entire sample size -> you get a proportion
- -> all proportions together add up to 1
- the collection of these proportions (so not the sum, just all of the proportions written out) = the joint distribution
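A minimal sketch of a joint distribution, using a made-up two-way table with the row variable yes / no and the column variable group 1 / group 2. The counts are purely illustrative.

```python
# Cell entries of an illustrative two-way table: (row, column) -> count
counts = {
    ("yes", "group 1"): 20,
    ("yes", "group 2"): 30,
    ("no", "group 1"): 10,
    ("no", "group 2"): 40,
}

total = sum(counts.values())   # entire sample size

# Joint distribution: every cell entry divided by the sample size
joint = {cell: count / total for cell, count in counts.items()}

for cell, proportion in joint.items():
    print(cell, proportion)
```

The four proportions (0.2, 0.3, 0.1 and 0.4) together add up to 1.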