Quantitative methods Flashcards
what is random sampling?
Every individual in the population has an equal chance of being selected. Difficult to achieve unless done by an institution with loads of resources.
what is stratified sampling?
The population is divided into groups (strata) based on a common, homogeneous characteristic, e.g. gender. A random selection is then taken from within each of these groups.
What is cluster sampling?
Similar to stratified, but individuals are divided into groups (clusters) that are each heterogeneous, i.e. each group has a good mixture of everything. Entire clusters are then selected as the sample.
What are the types of sampling?
random, stratified, cluster
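The three sampling types above can be sketched in code. This is a minimal illustration only: the population, the `gender` and `class` fields, and the sample sizes are all made up for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 people, each tagged with a gender and a class.
population = [
    {"id": i, "gender": "F" if i % 2 == 0 else "M", "class": i // 5}
    for i in range(20)
]

# Random sampling: every individual has the same chance of being selected.
simple_sample = rng.sample(population, k=6)

# Stratified sampling: split by a shared characteristic (gender here),
# then sample randomly within each stratum.
strata = {}
for person in population:
    strata.setdefault(person["gender"], []).append(person)
stratified_sample = [
    p for group in strata.values() for p in rng.sample(group, k=3)
]

# Cluster sampling: pick whole clusters (classes here) at random
# and keep everyone inside the chosen clusters.
clusters = {}
for person in population:
    clusters.setdefault(person["class"], []).append(person)
chosen_classes = rng.sample(sorted(clusters), k=2)
cluster_sample = [p for c in chosen_classes for p in clusters[c]]
```

Note that the stratified sample always contains 3 people of each gender, while the cluster sample contains all 5 members of each of the 2 chosen classes.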
What are the sampling biases ?
Convenience sample: individuals who are easily accessible are more likely to be included in the sample.
Non-response: if only a (non-random) fraction of the randomly sampled people respond to a survey, the sample is no longer representative of the population.
Voluntary response: occurs when the sample consists of people who volunteer to respond because they have strong opinions on the issue.
What are the types of data?
1) Numerical (quantitative) 2) categorical (qualitative)
What are the types of modality?
unimodal, bimodal, multimodal and uniform
What is skewness, and what are the different types?
Skewness describes how asymmetric a distribution is. Types: left skewed, symmetric, right skewed.
What is a box plot and explain IQR?
A box plot is a graphical display of the quartiles and the IQR. The IQR is the difference between the upper and lower quartiles, i.e. Q3 - Q1, which is the range of the central 50% of the data.
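As a rough sketch of the quartile and IQR calculation behind a box plot, using Python's standard `statistics` module. The data values are invented, and `method="inclusive"` is just one of several quartile conventions; others can give slightly different answers.

```python
import statistics

# Hypothetical data set, chosen so the quartiles land exactly on data points.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# quantiles(..., n=4) returns the three cut points Q1, median (Q2) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# IQR = Q3 - Q1, the range of the central 50% of the data.
iqr = q3 - q1

print(q1, q2, q3)  # 4.0 7.0 10.0
print(iqr)         # 6.0
```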
what are the types of statistics?
Inferential statistics - methods used to estimate, predict and generalise a property of a population on the basis of a sample.
Descriptive statistics - methods of organising, summarising and presenting data in an informative way.
what are the types of descriptive statistics?
measure of central location - mean, median, mode
measure of dispersion - range, variance, standard deviation
what are the meanings of mean, median and mode?
mean - sum of the values divided by the number of items
median - middle item of the data when sorted in order
mode - value that occurs the most often
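A small worked example of the three measures, using Python's standard `statistics` module. The `scores` list is made up for illustration.

```python
import statistics

# Hypothetical data set (already sorted so the middle item is easy to see).
scores = [2, 3, 3, 5, 7]

mean = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7) / 5 = 4
median = statistics.median(scores)  # middle item of the sorted data = 3
mode = statistics.mode(scores)      # most frequent value = 3

print(mean, median, mode)
```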
what are the unusual observations? and how do you deal with them?
1) errors - value not equal to the true/actual value. Use a double-entry check or delete it.
2) outliers - cannot be eliminated! Analyse the data without them or treat them separately.
explain the mean and trimmed mean and their +/-ve?
mean - can be distorted by extreme values, is often quoted to several decimal places, and doesn't necessarily correspond to an actual value.
trimmed mean - removes the effect of unusual values by eliminating a small proportion of the lowest/highest observations. Its -ve is that it is quoted to several decimal places.
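To see the difference, here is a minimal sketch of a trimmed mean. The helper name `trimmed_mean`, the 10% trimming proportion and the data are assumptions for the example, not a standard definition from these cards.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from EACH end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of items to drop per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data with one unusually low and one unusually high value.
data = [1, 10, 11, 12, 13, 14, 15, 16, 17, 100]

plain_mean = sum(data) / len(data)      # 209 / 10 = 20.9, pulled up by 100
trimmed = trimmed_mean(data, 0.10)      # drops 1 and 100: 108 / 8 = 13.5

print(plain_mean, trimmed)
```

The trimmed value sits in the middle of the bulk of the data, while the plain mean is larger than nine of the ten observations.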
explain the mode's +ve/-ve
+ve - only sensible measure for categorical data
-ve - may not be representative; unstable due to its sensitivity to the number of observations.
compare skewness and measure of centre
zero skewness: mode = median = mean
positive skewness: mode < median < mean
negative skewness: mode > median > mean
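A quick check of the positive-skew ordering on an invented data set with a long right tail. This is an illustration of the typical pattern, not a proof that it holds for every data set.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one is large.
data = [1, 2, 2, 3, 4, 5, 15]

mode = statistics.mode(data)      # 2
median = statistics.median(data)  # 3
mean = statistics.mean(data)      # 32 / 7, roughly 4.57

# The large value drags the mean furthest to the right.
print(mode < median < mean)  # True
```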
which way is the skew for a) positive b) negative skewness?
a) right b) left
what is variance ?
the arithmetic mean of the squared deviations from the mean.
what is standard deviation?
it is the square root of the variance.
why is standard deviation useful?
the units of the variance are squared; taking the square root gives the same units as those used to calculate the mean, so we can make direct comparisons with the sample mean.
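A step-by-step sketch of variance and standard deviation as defined on these cards (mean of the squared deviations, i.e. dividing by n). The data are invented. Note that the sample variance, `statistics.variance`, divides by n - 1 instead and would give a slightly larger value.

```python
import math
import statistics

# Hypothetical observations, e.g. measured in cm.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                 # 40 / 8 = 5.0 (cm)
squared_devs = [(x - mean) ** 2 for x in data]
variance = sum(squared_devs) / len(data)     # 32 / 8 = 4.0 (cm squared)
std_dev = math.sqrt(variance)                # 2.0 (cm, same units as the mean)

# The standard library's population variance uses the same divide-by-n definition.
assert statistics.pvariance(data) == variance

print(mean, variance, std_dev)
```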
what is the coefficient of variation? why is it useful?
it is an indication of how large the standard deviation is in relation to the mean (standard deviation divided by the mean). It is useful when we want to compare the variability of variables that have different means and standard deviations.
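A minimal sketch of the comparison, assuming the usual definition CV = standard deviation / mean. The two data sets and the helper name `coefficient_of_variation` are made up: both have the same standard deviation but very different means.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation relative to the mean (population version)."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical data: same spread (std dev 2), different means (5 vs 51).
small_mean = [2, 4, 4, 4, 5, 5, 7, 9]
large_mean = [48, 50, 50, 50, 51, 51, 53, 55]

cv_small = coefficient_of_variation(small_mean)  # 2 / 5 = 0.4
cv_large = coefficient_of_variation(large_mean)  # 2 / 51, roughly 0.039

print(cv_small, cv_large)
```

Relative to its mean, the first data set is far more variable, even though the standard deviations are identical.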
what do we mean by lying with graphs ?
the range used on an axis can make the same data look very different on two graphs
benefit of measure of dispersion? and the types ?
can be more important than the mean (average).
Range, IQR, Variance and standard deviation.
positives and negatives of range?
+ve - simplest measure of dispersion (R = Max - Min); an unexpectedly broad spread is useful for spotting typing errors.
-ve - only takes into account the two most extreme values
positives of IQR?
+ve - not influenced by extreme values; a stable measure, i.e. it doesn't change a lot if we keep adding observations
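The contrast between the range and the IQR can be seen with a single typing error. A small sketch with invented data, where `spread` is a hypothetical helper and the "inclusive" quartile method is one convention among several.

```python
import statistics

def spread(values):
    """Return (range, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1

clean = [2, 4, 4, 5, 7, 9, 10, 12, 15]
typo = [2, 4, 4, 5, 7, 9, 10, 12, 150]  # last value mistyped as 150

range_clean, iqr_clean = spread(clean)  # 13, 6.0
range_typo, iqr_typo = spread(typo)     # 148, 6.0

print(range_clean, iqr_clean)
print(range_typo, iqr_typo)
```

The range jumps from 13 to 148, which flags the typing error, while the IQR stays at 6.0.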
why do we use variance?
squaring gets rid of negatives, so the -ve/+ve deviations don't cancel each other out when added together.
Squaring also magnifies larger deviations more than smaller ones, so that they are weighted more heavily.
characteristics of variance?
- non-negative
- for observations whose values are near the mean, the variance will be small
- for values dispersed far from the mean, the variance will be large
positives and negatives of variance?
+ve - uses all observations in the data set to measure the variation in the sample (vs range)
-ve - variance is measured in squared units, so interpretation isn't straightforward
what is standard deviation and an advantage of it?
it is the most common/useful measure of dispersion; roughly the average distance of each observation from the mean.
Advantage - uses all values of the data set and is expressed in the same unit of measure as the observations.
what kind of relations can a scatter plot show?
linear (positive or negative), non-linear, or no association
types of relationships and their meanings?
linear positive - as one variable increases, so does the other
linear negative - as one variable increases, the other decreases
non-linear association - e.g. hours studied and test score
no association - e.g. no. of people who go to the gym vs no. of tickets sold at a museum
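One way to put a number on the direction of a linear relationship is Pearson's correlation coefficient. It is not covered on these cards, so treat this as an optional sketch: the `pearson_r` helper and all data values are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 perfect positive line, -1 perfect negative line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_dev / (dev_x * dev_y)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # linear positive
falling = [10, 8, 6, 4, 2]  # linear negative
curved = [1, 4, 5, 4, 1]    # non-linear: rises, then falls

r_rising = pearson_r(x, rising)    # close to +1
r_falling = pearson_r(x, falling)  # close to -1
r_curved = pearson_r(x, curved)    # 0: no LINEAR trend, yet clearly related

print(r_rising, r_falling, r_curved)
```

The curved case shows why a scatter plot is still needed: a coefficient near zero does not by itself mean there is no association.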