Lecture 6 Flashcards
mean, SE, SD, median and plots
What can stats provide?
provide objective criteria for evaluating hypotheses, synthesize information, help detect patterns in data, help to critically evaluate arguments
what can stats not provide?
tell the truth, compensate for poor design, indicate clinical significance
examples of narrative data
gender, social status, profession, colour
numerical measurements examples
lab results, vital signs
what is a population
A population is any entire collection of people, animals, plants, or things from which we may collect data. It is the entire group we are interested in, which we wish to describe or draw conclusions about.
what is a sample
A sample is a group of units selected from a larger group (the population) for the study.
why we select samples for data analysis
A sample is generally selected for study because the population is too large to study in its entirety. The sample should be representative of the general population. This is often best achieved by random sampling.
what is descriptive statistics
brief descriptive coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of a population.
why descriptive statistics are important
we get acquainted with the data, identify outliers, assess the assumptions required by statistical hypothesis tests, and check that the data do not contain errors
describe nominal (categorical/qualitative) variable
values fall into arbitrary (conventional) categories and have no units
describe ordinal (categorical/qualitative) variable
values in ordered categories and no units
describe discrete (metric/quantitative) variable
integer values (whole numbers), counted units
describe continuous (metric/quantitative) variable
continuous values, measured units
how can we systemise descriptive statistics
frequency and contingency tables, charts
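A frequency table can be sketched in a few lines. The example below is only an illustration: the blood-group values are made up, and `collections.Counter` is one of several ways to count categories.

```python
from collections import Counter

# Hypothetical blood groups recorded for ten patients (illustration only).
blood_groups = ["A", "O", "B", "O", "A", "O", "AB", "A", "O", "B"]

counts = Counter(blood_groups)   # absolute frequency of each category
total = len(blood_groups)

# Print each category with its absolute and relative frequency,
# most common category first.
for group, count in counts.most_common():
    print(f"{group:>2}  {count:>2}  {count / total:.0%}")
```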
how to summarise metric data
mode, median, mean, quartiles, variance, standard deviation, asymmetry (skewness), kurtosis, charts, histogram
what is a limitation of histograms
they can only represent one variable
how to report data that are assumed to be normally distributed?
mean ± SD
what is mean
The mean is the arithmetic average of the observations
what is variance
the average squared deviation of the observations from the mean; it measures the spread around the mean.
standard deviation
a quantity expressing by how much the members of a group differ from the mean value for the group.
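As a rough sketch of how the mean, variance and SD relate, the snippet below uses Python's standard `statistics` module on made-up readings. It assumes the data are a sample, so the variance divides by n - 1.

```python
import statistics

# Hypothetical systolic blood pressure readings in mmHg (illustration only).
readings = [110, 120, 130, 140, 150]

mean = statistics.mean(readings)          # arithmetic average
variance = statistics.variance(readings)  # sample variance (divides by n - 1)
sd = statistics.stdev(readings)           # sample SD = square root of the variance

print(f"mean = {mean}, variance = {variance}, SD = {sd:.2f}")
```

The SD is in the same units as the data (mmHg here), which is why it is usually reported instead of the variance.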
why analysing only the mean is not accurate
Analysing only the mean is not enough, because very different sets of values can share the same mean. We therefore also need the variance and SD to judge how widely the data are spread around it.
what does the empirical rule state?
The empirical rule states that for a normal distribution (bell shaped histogram), nearly all of the data will fall within three standard deviations of the mean.
the empirical rule
In particular, the empirical rule predicts that 68% of observations fall within the first standard deviation (µ ± σ), 95% within the first two standard deviations (µ ± 2σ), and 99.7% within the first three standard deviations (µ ± 3σ)
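The three percentages can be checked against the normal distribution itself. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The exact proportions are about 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%.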
standard error definition
The SD of the sample means, known as the standard error (SE), measures the deviation of individual sample means from the population mean. It measures the variability in the sample means.
what small and large SE values indicate
small - little variation between the sample means
large - much variation in the sample means
what determines the size of SE?
The size of the SE depends on the variation between individuals in the population and on the sample size, and is calculated as follows: SE = s/sqrt(n), where s is the SD of the sample and n is the sample size
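A small worked example of SE = s/sqrt(n), assuming made-up sample values and the sample SD from Python's `statistics` module:

```python
import math
import statistics

# Hypothetical sample of five measurements (illustration only).
sample = [110, 120, 130, 140, 150]

s = statistics.stdev(sample)   # SD of the sample
n = len(sample)                # sample size
se = s / math.sqrt(n)          # standard error of the mean

print(f"s = {s:.2f}, n = {n}, SE = {se:.2f}")
```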
why using a large sample size is beneficial?
Larger sample sizes lead to smaller standard errors (the denominator, sqrt(n), is larger). Large sample sizes therefore produce more precise estimates of the population value in question.
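To see the effect of sample size, the sketch below keeps the sample SD fixed at a hypothetical value of 20 and varies n. The function name `standard_error` is an assumption made for the example.

```python
import math

def standard_error(s, n):
    """Standard error of the mean for sample SD s and sample size n."""
    return s / math.sqrt(n)

s = 20  # hypothetical sample SD, held constant
for n in (25, 100, 400):
    print(f"n = {n:>3}: SE = {standard_error(s, n):.1f}")
```

Because n sits under a square root, the sample size has to be quadrupled to halve the SE.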
does 95% confidence mean a 95% probability?
no
what does 95% mean?
95% means that, over many repeated samples, about 95% of the confidence intervals will include the population mean and about 5% will not
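The repeated-sampling idea can be illustrated with a small simulation. Everything here is an assumption made for the sketch: a normal population with mean 100 and SD 15, samples of 50, and intervals built as mean ± 1.96 × SE (a normal approximation).

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 100, 15   # hypothetical population parameters
N, REPEATS = 50, 2000        # sample size and number of repeated samples

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Approximate 95% confidence interval for the population mean.
    lower, upper = mean - 1.96 * se, mean + 1.96 * se
    if lower <= POP_MEAN <= upper:
        covered += 1

coverage = covered / REPEATS
print(f"{coverage:.1%} of intervals contain the population mean")
```

The proportion should come out close to 95%. Each single interval either contains the population mean or it does not; the 95% describes the procedure over many samples.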
why SE is useful
Standard error is a useful measure of uncertainty of the mean. However, it is better to use confidence intervals instead
define the median
The median is the middle observation, that is, the point at which half the observations are smaller and half are larger. Symbolized by Md or M.
is median sensitive to extreme values?
no
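A quick check of this with made-up numbers: replacing one value with an extreme one moves the mean a lot but leaves the median where it was.

```python
import statistics

# Hypothetical data, without and with one extreme value (illustration only).
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(f"typical:      mean = {mean_typical}, median = {median_typical}")
print(f"with outlier: mean = {mean_outlier}, median = {median_outlier}")
```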
symmetric histogram
If the mean and the median are equal, the distribution of observations is symmetric.
skewed to the right
If the mean is larger than the median, the distribution is skewed to the right.
skewed to the left
If the mean is smaller than the median, the distribution is skewed to the left.
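The mean-versus-median comparison from the last three cards can be sketched as a small helper. The function name `skew_direction` and the data are made up, and this is a rule of thumb rather than a formal skewness test.

```python
import statistics

def skew_direction(values):
    """Compare mean and median; a rule of thumb, not a formal skewness test."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "skewed to the right"
    if mean < median:
        return "skewed to the left"
    return "symmetric"

# Hypothetical data sets (illustration only).
print(skew_direction([1, 2, 3, 4, 5]))      # mean equals median
print(skew_direction([1, 2, 3, 4, 20]))     # long tail of large values
print(skew_direction([1, 17, 18, 19, 20]))  # long tail of small values
```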
define the quartiles
The quartiles of a ranked set of data values are the three points that divide the data set into four equal groups, each group comprising a quarter of the data.
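As an illustration, Python's `statistics.quantiles` returns the three cut points for made-up data. Statistical packages use slightly different conventions for quartiles, so other software may give slightly different values for the same data.

```python
import statistics

# Hypothetical ranked data: the whole numbers 1 to 11 (illustration only).
data = list(range(1, 12))

# n=4 asks for quartiles; the result is the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
```

The second quartile is the median, so it matches `statistics.median(data)` here.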