rmb Flashcards
what is a simple way to summarise a large set of numbers?
counting how often each number occurs (frequency)
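A minimal sketch of frequency counting, using Python's standard `collections.Counter`. The list `values` is made-up illustration data, not taken from the course.

```python
from collections import Counter

# Hypothetical data: any list of numbers would do.
values = [3, 1, 2, 3, 3, 2, 5, 1, 3, 2]

# Counter maps each distinct value to the number of times it occurs.
frequencies = Counter(values)

# Print each value next to its frequency, in ascending order of value.
for value in sorted(frequencies):
    print(value, frequencies[value])
```

Ten raw numbers reduce to four (value, count) pairs, which is the same summary a histogram draws as bars.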
what type of data do we use histograms for?
continuous
where is the centre of a histogram?
1
what is the benefit of using more bins in a histogram?
shows the distribution with higher resolution (but can get noisy)
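The trade-off between resolution and noise can be explored with a small sketch. The helper `histogram_counts` is a hypothetical name written for this illustration (equal-width bins, pure standard library), and the data are simulated rather than real.

```python
import random

def histogram_counts(data, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins  # assumes the data are not all identical
    counts = [0] * n_bins
    for x in data:
        # The maximum value is placed in the last bin rather than past the end.
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

random.seed(0)
# Simulated continuous data: 200 draws from a normal distribution.
data = [random.gauss(0, 1) for _ in range(200)]

coarse = histogram_counts(data, 5)   # few bins: smooth but low resolution
fine = histogram_counts(data, 50)    # many bins: detailed but noisier

print(coarse)
print(fine)
```

With 50 bins each bin holds only a handful of values, so neighbouring counts jump around more than they do with 5 bins.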
how does a change in mean affect distribution shape metrics?
a change in mean shifts the whole distribution along the axis without changing its shape; the centre of mass moves so that the highest bars sit over the new most likely values
how does a change in variance affect distribution shape metrics?
a change in variance stretches or compresses the distribution: high variance means the values are spread over a wide range, low variance means they fall within a narrow range
how does a change in skewness affect distribution shape metrics?
a dataset with negative skewness has a long tail pointing towards the lower (more negative) values; positive skewness has a long tail pointing towards the higher values
how does a change in kurtosis affect distribution shape metrics?
kurtosis reflects how sharply peaked and heavy-tailed the distribution is
so data with high kurtosis have a sharp peak and heavy tails, and data with low kurtosis have a flatter peak and light tails
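The four shape metrics above can be computed directly from their moment definitions. This is a sketch under stated assumptions: `shape_metrics` is a hypothetical helper, it uses the population (divide-by-n) formulas, and it reports excess kurtosis (kurtosis minus 3, so a normal distribution scores about 0). Statistics packages may apply different corrections.

```python
def shape_metrics(data):
    """Return (mean, variance, skewness, excess kurtosis) of a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n  # population variance
    sd = variance ** 0.5
    # Skewness and kurtosis are the average third and fourth powers of z-scores.
    skewness = sum(((x - mean) / sd) ** 3 for x in data) / n
    excess_kurtosis = sum(((x - mean) / sd) ** 4 for x in data) / n - 3
    return mean, variance, skewness, excess_kurtosis

# Hypothetical data with one long tail towards negative values.
base = [-10, 1, 2, 3, 4]
shifted = [x + 10 for x in base]  # change the mean only
scaled = [x * 2 for x in base]    # change the spread only

print(shape_metrics(base))
print(shape_metrics(shifted))
print(shape_metrics(scaled))
```

Adding 10 to every value moves the mean and leaves the other three metrics alone; doubling every value multiplies the variance by four but leaves skewness and kurtosis unchanged. The skewness of `base` is negative because its long tail points towards the low values.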
what is a dataset?
a collection of data acquired for a specific purpose
may relate to multiple experiments or hypotheses
what is a variable?
a number that can ‘vary’ (e.g. take a high or a low value) depending on an attribute that we’re trying to measure
name all the types of variables
nominal
ordinal
interval
ratio
what is nominal data?
no inherent order or scale relating the different possibilities; sometimes called categorical data
there is a distinct set of possible answers, with no particular order relating them
e.g. country of origin
what is ordinal data?
a natural order between possibilities but nothing else. can’t interpret the ‘magnitude’ of differences
e.g. Likert scales
what is interval data?
the possibilities are ordered and have interpretable magnitudes, though ‘zero’ does not have special meaning
e.g. temperature in degrees Celsius
what is ratio data?
like interval data, but now zero is directly interpretable and we can interpret ratios between values
e.g. reaction times
what is continuous data?
a variable that can change freely to take any value
for example, temperature could be 4 °C, 10.34 °C or -0.0000513 °C
what is discrete data?
a numeric variable that takes one of a fixed set of values
for example, number of cars owned
what is a sample?
the data we’ve actually collected
what is a population?
the total set of everyone within the group that we want to test
in most cases it is a theoretical or hidden quantity: the distribution we would see if we could collect all possible data and completely describe the group we're interested in
do very large datasets reflect the wider population better or worse than small datasets?
better
what does a sampling distribution tell us?
how much the sample mean varies across repeated samples drawn from the same population
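A sampling distribution can be approximated by simulation: draw many samples from one population and collect the mean of each. The sketch below is illustrative only. The population (mean 100, SD 15), the sample sizes, and the helper name `sample_means` are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 10,000 values with mean 100 and SD 15.
population = [random.gauss(100, 15) for _ in range(10_000)]

def sample_means(population, sample_size, n_samples=1000):
    """Draw repeated random samples and return the mean of each one."""
    return [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(n_samples)
    ]

small = sample_means(population, 10)    # means of samples with 10 observations
large = sample_means(population, 100)   # means of samples with 100 observations

# The spread of each list of means is the width of that sampling distribution.
print(statistics.stdev(small))
print(statistics.stdev(large))
```

Both lists of means centre on roughly the same value, but the means from the larger samples are packed much more tightly around it.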
how does a larger standard deviation look different from a smaller one on a distribution graph?
with a larger standard deviation the mean/centre of the sampling distribution stays very similar, but its breadth is much larger
if a larger sample produced a higher standard error of the mean, what does this suggest?
that the population the larger sample came from is more variable, so we can be less precise in estimating the mean from one sample of this second population than from one sample of the first
if a dataset is normally distributed, how can we calculate the standard error of the mean?
SEM = σ / √N
divide the standard deviation of the data (σ) by the square root of the number of observations in the sample (N)
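The formula SEM = σ / √N translates almost directly into code. In this sketch `standard_error_of_mean` is a hypothetical helper and the reaction times are made-up values. One assumption to flag: the true σ is rarely known, so the sample standard deviation (`statistics.stdev`, which divides by N - 1) is used here as its estimate.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the SEM as the sample standard deviation divided by sqrt(N)."""
    n = len(sample)
    sd = statistics.stdev(sample)  # sample SD, used as an estimate of sigma
    return sd / math.sqrt(n)

# Hypothetical reaction times in milliseconds.
reaction_times = [412, 389, 455, 430, 398, 441, 420, 407]

sem = standard_error_of_mean(reaction_times)
print(sem)
```

Because N sits under a square root, a sample has to be four times as large to halve the SEM, assuming the standard deviation stays the same.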