RS- 6 Flashcards
population
The collection of items under discussion or observation
* e.g. objects, events, hospital visits, procedures, observations or measurements, or it can be an actual population of people, animals etc.
- Finite - if it is possible to count its individuals (to obtain a total number, N)
- Infinite – if there is no end to the population, or it is uncountable
- Real
- Hypothetical
Variates and variables
* Variate: A single value that a variable can take, e.g. for the variable ‘sex’ the variates could be male or female. Male is a variate.
* Variable: Any characteristic, number, or quantity that can be measured or counted. It can take on different values, e.g. ‘sex’ is a variable because it can be male or female
Cases
An experimental unit from which data are collected
observations
A set of one or more measurements on a single unit of observation
what are the types of variables?
- Quantitative- values are numerical, arithmetic operations can be performed on them, and they result from counting or measuring something
- Qualitative- values are non-numerical
- Constant- a quantity which can assume only one value. Constants can be mathematical constants, which do not vary, or categorical constants
why are these differences important?
The methods we employ to analyse the data depend on the level and type of data (variables) we have.
what are the types of quantitative variables?
- continuous- Can take any value within a range. They are continuous on a scale, the values between the figures have meaning, and the data can be broken into fractional parts, e.g. birth weight of a baby in kg and g
- discrete- Take specific points on a scale and change by steps or jumps. The values between have no meaning and they are often whole numbers, e.g. number of children a person has: it cannot be 2.4 children, it must be 0, 1, 2, 3 etc.
what are the types of qualitative variables?
- Nominal- there is no natural order to the categories these variables are assigned to, e.g. degree course or hair colour
- Ordinal- there is a natural order to the categories, e.g. months of the year follow an order, or a satisfaction scale from 1-10
- Dichotomous- there are only 2 options, e.g. yes / no vote, leave / remain vote
why do we summarise data?
To make the data readable and understandable
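what are the tables and plots used to summarise data? give descriptions
- Frequency table- groups the data into intervals and gives the count in each
- Histogram- graphical representation of the data, shows whether the distribution is skewed
- Dot plot- linear axis, range is finite, retains all the data in original form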
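A frequency table can be sketched in a few lines of Python. This is only an illustration: the weights, the interval width and the helper name `frequency_table` are all made up for the example.

```python
from collections import Counter

# Hypothetical birth weights in grams (made-up values for illustration only).
weights = [2400, 3100, 3300, 2900, 3800, 4100, 3500, 3000, 2700, 3600]

def frequency_table(values, start, width):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = Counter()
    for v in values:
        # Index of the interval this value belongs to.
        k = (v - start) // width
        counts[k] += 1
    # One (lower, upper, count) row per non-empty interval, in order.
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in sorted(counts)]

table = frequency_table(weights, start=2000, width=500)
for lower, upper, count in table:
    # The row of asterisks is a crude text histogram of the same table.
    print(f"{lower} to {upper} g: {count:2d} {'*' * count}")
```

Printing one asterisk per observation next to each interval gives a rough histogram, which makes any skew easy to spot.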
Talk about central tendency
- 1 number to represent the data: the most central value, the central ‘tendency’. Measured by the mean, median and mode.
- Mean- sum of all data points divided by the number of observations. It is the balance point, with equal weighting on both sides; however, outliers strongly affect the mean.
- Median- the middle value when the data are ranked from lowest to highest; it divides the distribution into 2 halves. With an even number of observations it is the sum of the two middle values divided by 2. Better than the mean when there are outliers.
- Mode- the data point that occurs most frequently; no ranking is needed. If there are two modes the data are bimodal.
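As a quick check of these definitions, here is a minimal sketch using Python's standard `statistics` module (Python 3.8 or later is assumed for `multimode`). The numbers are made up for illustration.

```python
import statistics

# Hypothetical data: number of children per person (made-up values).
children = [0, 1, 1, 2, 3, 5]

mean = statistics.mean(children)      # sum of values / number of observations
median = statistics.median(children)  # even count: average of the two middle values
mode = statistics.mode(children)      # most frequently occurring value

# multimode returns every most-frequent value, so two entries means bimodal.
modes = statistics.multimode([1, 1, 2, 3, 3])

print(mean, median, mode, modes)
```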
when to use which central tendency
- For categorical data use the mode
- For quantitative data use the median or mean
- The mean is strongly affected by outliers
- The median is insensitive to outliers and to skewed distributions
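The effect of an outlier on each measure can be seen with a small made-up example. The values below are hypothetical and only meant to illustrate the rules above.

```python
import statistics

# Hypothetical lengths of hospital stay in days (made-up values).
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [60]  # one extreme value added

mean_before = statistics.mean(stays)
mean_after = statistics.mean(stays_with_outlier)
median_before = statistics.median(stays)
median_after = statistics.median(stays_with_outlier)

# The mean jumps, the median barely moves.
print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)

# For categorical data only the mode makes sense.
hair_colour = ["brown", "blonde", "brown", "black"]
most_common = statistics.mode(hair_colour)
print("mode:", most_common)
```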
what is the limiting factor of central tendency?
Even though the central tendency information would tell us that these data were very similar, we can see that there is a greater spread of the data for year 2.
* We therefore need a better way to describe the data and the spread of the data we can see
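A small sketch of the same idea, using made-up numbers (these are NOT the values from the original example): two data sets share the same mean and median, yet one is far more spread out.

```python
import statistics

# Made-up values for illustration, not the data from the original example.
year_1 = [48, 49, 50, 51, 52]
year_2 = [30, 40, 50, 60, 70]

summary = {}
for name, data in (("year 1", year_1), ("year 2", year_2)):
    # Same centre for both years, but very different spread (max - min).
    summary[name] = (statistics.mean(data),
                     statistics.median(data),
                     max(data) - min(data))
    print(name, summary[name])
```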
what are the other ways to represent data that are not based on central tendency?
- Range- the difference between the lowest and the highest data values. Shows the overall spread but not how variable the data are in between, and extreme values have an extreme effect.
- IQR (interquartile range)- gives a better idea of how the data are distributed. It measures where the “middle fifty %” of a data set is, i.e. where the bulk of the values lie, which helps rule out extreme values or outliers. The first quartile, denoted Q1, is the value in the data set that holds 25% of the values below it. The third quartile, denoted Q3, is the value in the data set that holds 25% of the values above it. The quartiles split the ranked data into four groups, and the interquartile range covers the middle two groups (Q1 to Q3), so it shows the range in values of the central 50% of the data. To find it: find the median, then the median of the lower half (Q1) and the median of the upper half (Q3), then subtract Q1 from Q3. Used by population scientists with large datasets; not useful with small numbers of observations. Shown by a box and whisker plot.
- Box and whisker plot- a Box and Whisker Plot (or Box Plot) is a convenient way of visually displaying the data distribution through its quartiles. The lines extending from the boxes are known as the “whiskers”, which indicate variability outside the upper and lower quartiles. Box Plots can be drawn either vertically or horizontally. Extreme values (high or low) that are more than 1.5 x IQR below Q1 or above Q3 can be classed as outliers. Outliers are sometimes plotted separately as individual dots, asterisks (*) or other symbols in line with the whiskers.
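The range, the quartiles and the 1.5 x IQR outlier rule can be worked through in a short Python sketch. The data and the helper name `quartiles` are made up for illustration. The sketch assumes the "median of each half" method, leaving the overall median out of both halves when the number of values is odd; other quartile conventions exist and can give slightly different answers.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half method.

    Assumption: with an odd number of values the overall median is
    excluded from both halves (conventions differ between textbooks).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

# Hypothetical measurements (made-up values), including one extreme value.
values = [1, 3, 4, 5, 6, 7, 8, 9, 30]

data_range = max(values) - min(values)
q1, q2, q3 = quartiles(values)
iqr = q3 - q1

# Values more than 1.5 x IQR below Q1 or above Q3 are classed as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]

print("range:", data_range)
print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("outliers:", outliers)
```

Here the single extreme value stretches the range a great deal, while the IQR describes only the middle half of the data and the fence rule flags that value as an outlier.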
summary- descriptive statistics
- Measures of average
– Mean: works best for mathematicians
– Median: sometimes gives a more sensible answer when there are outliers, or a skewed distribution
- Measures of spread
– Range (only tells you about the smallest and largest observations)
– Interquartile range (only useful if large number of observations)