Introduction to Statistical Analysis Flashcards
Why do we analyse data?
=To discover implicit structure in the data (finding patterns in experimental data which might
in turn suggest new models or experiments)
=To confirm or refute a hypothesis about the data
What are the types of data and the scales within them?
=Qualitative
-Categorical scale
-Ordinal scale
=Quantitative
-Interval scale
-Ratio scale
What is the categorical scale?
Each data item is drawn from a fixed number of categories, where the names of the categories may occur in any sequence and are not orderable
-Nationality: French, Japanese, Mexican, etc.
-Also called the nominal scale
What is the ordinal scale?
Data on an ordinal scale have a recognized ordering between data items, but there is no meaningful arithmetic on the values
-Finishing position in a race: 1st, 2nd, 3rd, etc.
What is the interval scale?
A numerical scale (usually with real number values) in which we are interested in relative rather than absolute value (e.g. the Celsius temperature scale)
=The differences between the numbers are interpretable, but the variable doesn’t have a “natural” zero value
=Subtraction and averaging are meaningful, but addition and multiplication are not
What is the ratio scale?
A numerical scale (again usually with real number values) in which there is a notion of absolute value (e.g. response time, age in years)
=Zero really means zero
=Subtraction, averaging, addition and multiplication are meaningful
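A minimal sketch of the practical difference between the interval and ratio scales. The numbers are made up and the helper name `celsius_to_fahrenheit` is only illustrative. A ratio of two values changes with the unit on an interval scale but not on a ratio scale.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit (both are interval scales)."""
    return c * 9 / 5 + 32

# Interval scale: made-up temperatures in Celsius.
cool_c, warm_c = 10.0, 20.0
cool_f, warm_f = celsius_to_fahrenheit(cool_c), celsius_to_fahrenheit(warm_c)

ratio_c = warm_c / cool_c   # 2.0
ratio_f = warm_f / cool_f   # about 1.36, so "twice as hot" depends on the unit

# Ratio scale: made-up response times, in milliseconds and in seconds.
fast_ms, slow_ms = 250.0, 500.0
fast_s, slow_s = fast_ms / 1000, slow_ms / 1000

ratio_ms = slow_ms / fast_ms   # 2.0
ratio_s = slow_s / fast_s      # 2.0, so "twice as long" holds in any unit
```

Because zero milliseconds and zero seconds both mean "no time at all", rescaling leaves the ratio untouched. Zero degrees Celsius and zero degrees Fahrenheit are different temperatures, so the ratio is an artefact of where zero sits.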
What is the difference between continuous and discrete data?
Continuous variable: it is possible to have another value between any two values
e.g. response time
Discrete variable: a variable that is not continuous
e.g. graduation year
Which scales can be continuous and which can be discrete?
Continuous= interval and ratio (quantitative only)
Discrete= nominal, ordinal, interval, ratio (all)
Describe the normal distribution
Any normal distribution is described by two parameters:
=The mean μ is the centre around which the data clusters
=The standard deviation σ is a measure of the spread of the curve
What percentages of a normal distribution lie within 1, 2 and 3 standard deviations of the mean?
Within 1σ= about 68%
Within 2σ= about 95%
Within 3σ= about 99.7%
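The percentages above can be checked numerically. This is a small sketch, not part of the original cards: it uses the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k / √2). The function name `fraction_within` is made up for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the fraction for 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

The result does not depend on μ or σ, which is why the same three percentages apply to every normal distribution.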
What is a statistic?
A single value computed from data that captures some overall property of the data
What measures are we interested in when describing data?
=Central tendency: an idea of what a typical or common value for a given variable is (mean, median, mode)
=Dispersion: an idea of how spread out the data values are (range, variance, standard deviation)
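As a rough illustration, the measures listed above can be computed with Python's standard `statistics` module. The data are made up, and using the population forms (`pvariance`, `pstdev`) rather than the sample forms is an assumption, since the cards do not say which is meant.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up quantitative data

# Central tendency
mean = statistics.mean(data)       # total divided by the number of values
median = statistics.median(data)   # middle of the ranked values
mode = statistics.mode(data)       # most frequent value

# Dispersion
value_range = max(data) - min(data)        # largest minus smallest value
variance = statistics.pvariance(data)      # mean squared deviation from the mean
std_dev = statistics.pstdev(data)          # square root of the variance
```

For this data set the mean is 5, the median 4.5, the mode 4, the range 7, the population variance 4 and the population standard deviation 2.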
Describe the mean
-Total divided by the number of values
-Appropriate for both interval and ratio scales; it does not depend on an absolute zero in the scale
-Does not work for qualitative data
-Affected by outliers
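A short sketch of the definition and of the outlier problem, using made-up response times. The function name `mean` and the variable names are illustrative only.

```python
def mean(values):
    """Arithmetic mean: total divided by the number of values."""
    return sum(values) / len(values)

response_times_ms = [290, 295, 300, 305, 310]   # made-up data
with_outlier = response_times_ms + [3000]       # one extreme value added

typical = mean(response_times_ms)   # 300.0
shifted = mean(with_outlier)        # 750.0
```

A single extreme value moves the mean from 300 ms to 750 ms, a figure that is not close to any of the actual observations.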
Descriptive vs inferential statistics
-Descriptive= present information to summarise and visualise the data
-Inferential= generalise from the data to larger populations
Describe the median
The middle value when the values are ranked in ascending or descending order
-With values in non-decreasing order: x((N+1)/2) for N odd, or any value between x(N/2) and x(N/2 + 1) for N even
=Appropriate for qualitative ordinal data and quantitative interval and ratio data. It does not make sense for categorical data, as that has no appropriate ordering
=A good summary statistic for data where there is a forced cutoff at one end, or possible distortion by extreme outliers
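The rank rule above can be sketched as follows. For even N the card allows any value between the two middle items. Taking their midpoint is one common convention and is an assumption here, as is the function name `median`.

```python
def median(values):
    """Median by rank; uses the midpoint of the two middle values when N is even."""
    ordered = sorted(values)   # non-decreasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]    # x((N+1)/2) in 1-based terms
    # x(N/2) and x(N/2 + 1) in 1-based terms; the midpoint is one valid choice.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([7, 1, 3])                               # 3
even_case = median([1, 2, 3, 4])                           # 2.5
with_outlier = median([290, 295, 300, 305, 310, 3000])     # 302.5
```

In the last call the extreme value 3000 barely moves the median, which is why it suits data distorted by outliers. The midpoint step needs arithmetic, so for ordinal data with even N one of the two middle items would be reported instead.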