First-year flashcards
Normal distribution
For a normally distributed population, about 95% of values lie within 1.96 standard deviations of the mean (mean ± 1.96 sd).
If a variable X follows a normal distribution, we write X ~ N(μ, σ²), where μ is the mean and σ is the standard deviation (so σ² is the variance).
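As a quick numerical check of the 95% rule, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The population N(100, 15²) is a hypothetical example, not from the notes. Note that `NormalDist` takes the standard deviation σ, not the variance σ².

```python
from statistics import NormalDist

def central_95_interval(mu: float, sigma: float) -> tuple[float, float]:
    """Return (mu - 1.96*sigma, mu + 1.96*sigma), which holds about 95% of a normal population."""
    return (mu - 1.96 * sigma, mu + 1.96 * sigma)

# Hypothetical population: X ~ N(100, 15^2), i.e. mu = 100, sigma = 15
mu, sigma = 100.0, 15.0
low, high = central_95_interval(mu, sigma)

# Probability mass between the two limits, from the normal cumulative distribution function
dist = NormalDist(mu, sigma)
coverage = dist.cdf(high) - dist.cdf(low)

print(f"95% interval: {low:.1f} to {high:.1f}, coverage = {coverage:.4f}")
```

This prints an interval of 70.6 to 129.4 with coverage 0.9500, confirming that ±1.96 sd captures about 95% of values.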
Data
Data types: numerical (discrete, a count; or continuous, a measurement) or categorical (ordinal, categories can be ordered; or nominal, categories cannot be ordered).
Central tendency: the mode (the most frequently occurring value in a data set), the median (the middle value of an ordered data set) and the mean (the average: the sum of the values divided by the number of values).
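The three measures of central tendency can be computed with Python's standard `statistics` module. This is a minimal sketch; the data set is hypothetical and was chosen with one large value so that the mean and median differ.

```python
import statistics

# Hypothetical example data set (not from the notes)
data = [2, 3, 3, 5, 7, 8, 14]

mode = statistics.mode(data)      # most frequently occurring value
median = statistics.median(data)  # middle value once the data are ordered
mean = statistics.mean(data)      # sum of values / number of values

print(mode, median, mean)
```

Here the mode is 3, the median is 5 and the mean is 6: the single large value (14) pulls the mean above the median.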
Measures of position: the quartiles (Q1, the first or lower quartile; Q2, the median; Q3, the third or upper quartile).
Measures of spread: the range (maximum value - minimum value), the interquartile range (IQR = Q3 - Q1, the difference between the third and first quartiles) and the standard deviation (how closely the data values cluster around the mean).
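A minimal sketch of the measures of position and spread, again using Python's `statistics` module on a hypothetical data set. Quartile conventions differ slightly between textbooks and software; this uses Python's default `'exclusive'` method, and `stdev` is the sample standard deviation (dividing by n - 1).

```python
import statistics

# Hypothetical example data set (not from the notes)
data = [2, 3, 3, 5, 7, 8, 14]

# Quartiles with Python's default 'exclusive' method (other conventions may differ slightly)
q1, q2, q3 = statistics.quantiles(data, n=4)

data_range = max(data) - min(data)  # maximum value - minimum value
iqr = q3 - q1                       # interquartile range
sd = statistics.stdev(data)         # sample standard deviation (divides by n - 1)

print(f"Q1={q1}, median={q2}, Q3={q3}, range={data_range}, IQR={iqr}, sd={sd:.2f}")
```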
Shape of data set
Outliers
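An observation that is numerically distant from the rest of the data. Values below the lower inner fence or above the upper inner fence are flagged as possible outliers:
Lower inner fence: LIF = Q1 - (1.5 × IQR)
Upper inner fence: UIF = Q3 + (1.5 × IQR)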
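The fence rule can be sketched in a few lines of Python. The function names and the data are hypothetical, and the quartiles come from `statistics.quantiles` with its default `'exclusive'` method, so hand-calculated fences from a different quartile convention may differ slightly.

```python
import statistics

def inner_fences(data):
    """Return (LIF, UIF) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data):
    """Values lying below the lower inner fence or above the upper inner fence."""
    lif, uif = inner_fences(data)
    return [x for x in data if x < lif or x > uif]

# Hypothetical data with one extreme value
data = [2, 3, 3, 5, 7, 8, 14, 40]
print(inner_fences(data), outliers(data))
```

Here Q1 = 3.0 and Q3 = 12.5, so IQR = 9.5, the fences are -11.25 and 26.75, and only 40 is flagged.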
Tables
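Percentage, proportion and probability express the same quantity on different scales: a proportion of 0.25 is 25%, which is also the probability that a randomly selected individual falls into that category.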
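A small sketch of moving between counts, proportions and percentages in a frequency table. The category labels and counts are hypothetical, and `to_proportions` is an illustrative helper, not a library function.

```python
def to_proportions(counts):
    """Convert category counts to proportions (0 to 1); multiply by 100 for percentages."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Hypothetical frequency table (counts per category), not from the notes
counts = {"A": 90, "B": 80, "C": 20, "D": 10}

for category, p in to_proportions(counts).items():
    # p is also the probability that a randomly chosen individual is in this category
    print(f"{category}: proportion {p:.2f} = {100 * p:.0f}%")
```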
Correlation 1
Associations between two variables: the response is the variable being observed or measured (the dependent variable) and the factor is the explanatory variable (the independent variable).
The appropriate method depends on the type of each variable:
- Numerical vs numerical: scatter plot
- Categorical vs categorical: Chi-Square test
- Categorical (factor) vs numerical (response):
• Comparing histograms
• Comparing boxplots
• Comparing descriptive statistics
• Formal methods
Correlation coefficients: for ordinal variables, use Spearman’s rank correlation coefficient (rs).
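Two of the methods above, sketched in plain Python: Pearson's chi-square statistic for a categorical vs categorical contingency table, and Spearman's rank correlation coefficient using the textbook shortcut rs = 1 - 6Σd² / (n(n² - 1)). The shortcut assumes there are no tied values, so this sketch raises an error when it finds ties. The tables, data and helper names are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two categorical variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

def rank(values):
    """Ranks 1..n (smallest value gets rank 1); assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's rs = 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only when there are no ties."""
    if len(set(x)) < len(x) or len(set(y)) < len(y):
        raise ValueError("tied values: this shortcut formula assumes no ties")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical 2 x 2 table: rows = exposed / not exposed, columns = disease / no disease
table = [[20, 30], [30, 20]]
print(chi_square_statistic(table))

# Hypothetical ordinal data: ranks (1 = best) given by two judges to five items
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rs(judge_a, judge_b))
```

The table gives a chi-square statistic of 4.0 with (2 - 1) × (2 - 1) = 1 degree of freedom; since the 5% critical value for 1 degree of freedom is 3.84, this would be significant at the 5% level. The two judges' rankings give rs = 0.8.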
Correlation 2
Value of r: correlation (linear relationship)
-1 to -0.7: strong negative correlation
-0.7 to -0.3: moderate negative correlation
-0.3 to 0: weak negative correlation
0 to 0.3: weak positive correlation
0.3 to 0.7: moderate positive correlation
0.7 to 1: strong positive correlation
Very close to 0: no correlation
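A minimal sketch that computes Pearson's r and labels it with the bands in the table. Because the bands share their endpoints, this sketch assigns a boundary value such as 0.7 to the stronger band. The `none_tol` cut-off for "very close to 0" is a hypothetical choice, not a standard value, and the data are made up.

```python
def pearson_r(x, y):
    """Pearson's correlation coefficient r for two equal-length numerical lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def describe_r(r, none_tol=0.05):
    """Label r using the table's bands; none_tol is a hypothetical cut-off for 'very close to 0'."""
    if abs(r) < none_tol:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if abs(r) >= 0.7:
        strength = "strong"
    elif abs(r) >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"

# Hypothetical paired numerical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}: {describe_r(r)}")  # r = 0.775: strong positive correlation
```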
Correlation is NOT causation.
Correlation may support the argument for causation but it does not prove it.
Simple linear regression
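Models a straight-line relationship between 2 variables: y = mx + c, where y is the response (dependent variable), x is the factor (independent variable), m is the gradient (slope) and c is the y-intercept.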
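A minimal sketch of fitting y = mx + c, assuming the usual least-squares method (the notes do not name a fitting method). The data are hypothetical and lie exactly on y = 2x + 1, so the fit should recover m = 2 and c = 1.

```python
def fit_line(x, y):
    """Least-squares estimates of gradient m and intercept c for y = mx + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, c

# Hypothetical data lying exactly on y = 2x + 1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
m, c = fit_line(x, y)
print(f"y = {m}x + {c}")  # y = 2.0x + 1.0
```

The gradient uses the same sums of deviations as Pearson's r, which is why regression and correlation are often taught together.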