lec 5 variables and scales Flashcards by Timi Lamikanra

what is a variable

a condition/ characteristic that can change or have different values

defining characteristics

attribute that describes a person/ place/ thing
value can vary between different entities

qualitative vs quantitative variables

qualitative: values that are names/ labels
quantitative: numeric variables that measure quantity

discrete vs continuous variables

continuous variable: a variable that can take any value between its minimum and maximum values

discrete variable: a variable that can't take just any value between its minimum and maximum, only certain separate values (e.g. whole-number counts)

univariate vs bivariate data

Univariate data: when a study consists of only one variable

Bivariate data: when a study examines the relationship between 2 variables

what is a nominal scale

lowest statistical measurement level
this scale is given to items that are divided into categories without any order or structure
- e.g.
  - gender
  - eye colour
  - blood type

what is the ordinal scale

consists of variables that have an inherent order to the relationship among different categories

a ranking of responses that may have different meanings among individuals
shows the overall order of values but not the relative distance between them, as the distances are not equal
properties of the ordinal scale:
1) Identity: quality being measured
2) Magnitude: amount of the quality being measured gives a quantitative distance betw/

what is the interval scale

variables that have constant, equal distances between values, but the zero point is arbitrary

properties:

identity
magnitude
equal distance: the difference between points is the same along the whole scale

e.g. IQ score, pain scale with numbers

what is a ratio scale

top level of measurement: has all the properties of an abstract number system, including an absolute zero

properties

identity
magnitude
equal distance
absolute zero: allows you to say how many times greater one case is than another

allows use of all mathematical operations
- e.g.
  - weight
  - pulse rate
  - respiratory rate

what is a measure of central tendency / central location

a single value that attempts to describe a set of data by identifying the central position within the data set

mean
median
mode

describe the mean

most familiar measure of central tendency
often not one of the observed values in the data set, but it is the value that produces the least error from all the other values
used with discrete and continuous data, most often the latter
the sum of all the values divided by the number of values in the data set
includes every value of the data set
the only measure of central tendency for which the sum of the deviations of each value from the mean is always 0
sample mean = x̄ (x bar)
population mean = µ
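A minimal sketch of the mean and its zero-sum-of-deviations property. The `scores` list and the helper name `sample_mean` are made up for illustration.

```python
def sample_mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)


scores = [2, 4, 4, 5, 10]  # hypothetical data set

x_bar = sample_mean(scores)

# Deviation of each value from the mean; these always sum to zero
# (up to floating-point rounding).
deviations = [x - x_bar for x in scores]
total_deviation = sum(deviations)

print("mean:", x_bar)
print("sum of deviations:", total_deviation)
```

Note that 5.0 happens to appear in this data set, but the mean does not have to be one of the observed values.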

what is the main disadvantage of the mean

very susceptible to outliers (values that are unusually small or large compared with the rest of the data set)

mean can be skewed by these values

if so the median is a better measure of central tendency
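A quick illustration of how one outlier pulls the mean but barely moves the median. The salary figures are invented, and Python's standard `statistics` module is used here only as a convenient calculator.

```python
from statistics import mean, median

# Hypothetical salaries in thousands; the last entry added below is an outlier.
salaries = [15, 18, 16, 14, 17]
with_outlier = salaries + [90]

mean_before = mean(salaries)
median_before = median(salaries)
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

The mean jumps from 16 to over 28, while the median only shifts from 16 to 16.5.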

when not to use the mean and use the median instead

presence of outliers

_skewed distribution_: the mean moves away from the centre, but the median stays central and is least influenced

in a normal distribution: mean = median = mode

what is the median

the middle score for a set of data that has been arranged in order of magnitude

least affected by outliers and skewed data
order the values and find the middle; if there is an even number of values, find the mean of the middle two
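The ordering-and-middle procedure can be sketched as below. The function name `median_of` and the sample lists are assumptions for illustration only.

```python
def median_of(values):
    """Middle score of the values once arranged in order of magnitude."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the single middle one.
        return ordered[mid]
    # Even number of values: take the mean of the two middle ones.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median_of([3, 1, 7])       # ordered: 1, 3, 7
even_result = median_of([3, 1, 7, 5])   # ordered: 1, 3, 5, 7

print(odd_result, even_result)
```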

what is the mode

most frequent score in the data set.

highest bar on histogram
used for categorical data when the most popular option is sought after

problems with the mode

non-unique
- causes problems when 2 values are equally popular
- even more problematic for continuous data, as finding an exact mode is unlikely => the mode is rarely used with continuous data
if the mode is far from the rest of the data in the set then it's inaccurate as a measure of central tendency
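A small sketch of the non-uniqueness problem, using `statistics.multimode` from the Python standard library (available from Python 3.8). The transport votes and the readings are invented data.

```python
from statistics import multimode

# Categorical data where two options are equally popular.
votes = ["bus", "car", "bus", "train", "car"]
vote_modes = multimode(votes)

# Continuous data: every reading is different, so no value repeats
# and every value ties as a "mode".
readings = [1.02, 1.37, 2.51, 0.98, 1.64]
reading_modes = multimode(readings)

print(vote_modes)
print(len(reading_modes), "tied modes out of", len(readings), "readings")
```

With the tie, the mode no longer picks out a single most popular option; with the continuous readings it says nothing useful at all.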

which measures of central tendency are best used in normal and skewed distributions

normal distribution

mean or median can be used, but the mean is ideal because:
- it has the least amount of error, since it includes all values in the data set
- any change in the scores affects the value of the mean but not necessarily the mode / median

skewed distribution

the mean is dragged in the direction of the skew, so the median is best
increased skew increases the difference between the mean and median

normal distribution expected, but normality tests show NON-NORMAL DATA

median preferred over mean as a rule of thumb, unless there's no real difference between the median and mean

match the variable to the type of central tendency preferred

nominal = mode

ordinal = median

interval/ratio (not skewed) = mean

interval/ratio (skewed) = median

what is a measure of spread / measure of dispersion

describes the variability in a sample / population

used alongside measures of central tendency to give a description of the overall data

what is the purpose of measuring a data spread

shows how well the central tendency represents the data
a large spread suggests large differences between individual scores, and vice versa for a small spread
measures of spread include
- range
- quartiles
- absolute deviation
- standard deviation

what is the range

the difference between the highest and lowest scores in a data set; the simplest measure of spread

range = max value - min value
sets the boundaries for scores
- useful for measuring critically high or low thresholds
detects errors when inputting data
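A minimal sketch of the range and of how it can flag a data-entry error. The pulse rates and the helper name `value_range` are invented for illustration.

```python
def value_range(values):
    """Range = max value - min value."""
    return max(values) - min(values)


pulse_rates = [72, 65, 80, 91, 68]    # hypothetical readings (beats per minute)
with_typo = [72, 65, 80, 910, 68]     # 91 mistyped as 910

clean_range = value_range(pulse_rates)
typo_range = value_range(with_typo)

print("range of clean data:", clean_range)
print("range with typo:", typo_range)
```

An implausibly large range, like the second one, is a hint that a value was entered incorrectly.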

what are quartile and interquartile ranges

quartiles: break the data into quarters

even number of scores: find the mean of the 2 scores at each quarterly place in the data set

odd number of scores: the values at the 25th, 50th and 75th percentile positions are the quartiles

Q2 = median

benefits of quartiles and what is the interquartile range

less affected by outliers and skewed data, like the median, so they are the best choice for measuring the spread of these data sets
interquartile range = the difference between Q3 & Q1, which shows the range of the middle half of the scores in the distribution
- Q3 - Q1 = interquartile range
semi-interquartile range: half the interquartile range = (Q3 - Q1) / 2
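A sketch of the quartiles, interquartile range and semi-interquartile range using `statistics.quantiles` (Python 3.8+). The scores are made up. Different textbooks and packages use slightly different quartile conventions, so other tools may give slightly different Q1 and Q3 for the same data.

```python
from statistics import median, quantiles

scores = [2, 4, 4, 5, 7, 8, 10]  # hypothetical data, 7 scores

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(scores, n=4)

iqr = q3 - q1            # range of the middle half of the scores
semi_iqr = iqr / 2       # half the interquartile range

print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr, "semi-IQR:", semi_iqr)
print("Q2 equals the median:", q2 == median(scores))
```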

Drawback of quartiles

they don't take into account every score in the data set

what are the absolute deviation / variance / standard deviation

how to calculate absolute & mean absolute deviation

show the amount of deviation/variation that occurs around the mean score

total variability: the sum of the deviations of each score, divided by the number of scores

the choice of absolute deviation, variance or standard deviation depends on the type of statistic

easiest way to calculate a deviation = individual score minus mean score
values above the mean are +ve and below are -ve
total variability would be 0 because the positive and negative deviations cancel, so the signs are ignored and only absolute values are used = absolute deviation => summed and divided by the number of scores to give the mean absolute deviation
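The cancelling problem and the absolute-value fix can be sketched as follows. The `scores` list and the helper name `mean_absolute_deviation` are assumptions for illustration.

```python
def mean_absolute_deviation(values):
    """Average distance of each score from the mean, ignoring sign."""
    mean_score = sum(values) / len(values)
    absolute_deviations = [abs(x - mean_score) for x in values]
    return sum(absolute_deviations) / len(values)


scores = [2, 4, 4, 5, 10]  # hypothetical data, mean = 5

mean_score = sum(scores) / len(scores)
signed_total = sum(x - mean_score for x in scores)  # +ve and -ve cancel out
mad = mean_absolute_deviation(scores)

print("sum of signed deviations:", signed_total)
print("mean absolute deviation:", mad)
```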

how to calculate variance

achieves positive values of the deviations from the mean by squaring them
addition of the squared deviations gives the sum of squares
the sum of squares is divided by n
- if the values in the data are spread out from the mean then the variance is a large number
- if the values are closer to the mean then the variance is small
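A step-by-step sketch of the variance calculation, dividing the sum of squares by n as the card describes. The data and the helper name `variance_by_n` are invented for illustration.

```python
def variance_by_n(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    mean_score = sum(values) / n
    sum_of_squares = sum((x - mean_score) ** 2 for x in values)
    return sum_of_squares / n


tight = [4, 5, 5, 5, 6]                    # values close to the mean
spread_out = [2, 4, 4, 4, 5, 5, 7, 9]      # values further from the mean

tight_variance = variance_by_n(tight)
spread_variance = variance_by_n(spread_out)

print("variance (tight data):", tight_variance)
print("variance (spread-out data):", spread_variance)
```

The more spread-out list gives the larger variance, in squared units of the original scores.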

problems with variance

- squaring gives more weight to extreme scores, so it is susceptible to outliers
- the units of variance are squared, so they differ from the units of the data set and can't be directly related to data set values
- calculating the standard deviation solves this problem

what is the standard deviation

a measure of the spread of scores within a data set
sample SDs differ from population SDs in their calculation
when to calculate the population SD:
1. if data on the entire population is present
2. if the sample is all you're interested in and you **don't want to generalize** your result
when to use the sample SD: if you have sample data and wish to generalize to the _population_
NB: **the sample SD is not the standard deviation of the sample itself but an estimate of the population SD based on sample data**
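A sketch of the two calculations using the Python standard library: `statistics.pstdev` divides the sum of squares by n (population SD), while `statistics.stdev` divides by n - 1 (sample SD, an estimate of the population SD). The data is invented.

```python
import math
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5, sum of squares = 32

population_sd = pstdev(scores)   # sqrt(32 / 8)
sample_sd = stdev(scores)        # sqrt(32 / 7)

print("population SD:", population_sd)
print("sample SD:", sample_sd)
```

The sample SD comes out slightly larger because dividing by n - 1 compensates for a sample tending to understate the spread of the whole population.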

which type of data should be used to calculate SD

- SD is used along with the mean to summarize **continuous data**, NOT CATEGORICAL DATA
- only appropriate if the data is normally distributed / not skewed

define the EMPIRICAL RULE

**for a normal distribution, nearly all of the data will fall within three standard deviations of the mean**

what are the three parts of the empirical rule

1. 68% of data falls within 1 SD of the mean: **µ ± 1×SD**
2. 95% falls within 2 SDs: **µ ± 2×SD**
3. 99.7% falls within 3 SDs: **µ ± 3×SD**
aka the 3 sigma rule
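The three percentages can be checked against the normal distribution itself. This sketch uses `statistics.NormalDist` (Python 3.8+); the helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution lying within mu ± k×SD."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


within_1 = share_within(1)
within_2 = share_within(2)
within_3 = share_within(3)

for k, share in [(1, within_1), (2, within_2), (3, within_3)]:
    print(f"within {k} SD: {share:.1%}")
```

The result does not depend on the particular mean or SD chosen, only on the number of SDs.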

when is the 3 sigma rule used

used to estimate what the data would look like if the entire population were surveyed, for when the right data is difficult/impossible to get
applies to a random variable that follows the normal distribution (bell curve)
doesn't apply to non-normal distributions, but **Chebyshev's theorem** can be used for those
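For comparison, Chebyshev's theorem states that for any distribution with a finite mean and SD, at least 1 - 1/k² of the data lies within k SDs of the mean (for k > 1). A minimal sketch, with the helper name `chebyshev_lower_bound` chosen for illustration:

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k SDs of the mean, for any distribution."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


bound_2 = chebyshev_lower_bound(2)
bound_3 = chebyshev_lower_bound(3)

print(f"at least {bound_2:.1%} within 2 SDs")
print(f"at least {bound_3:.1%} within 3 SDs")
```

These guarantees (75% and about 88.9%) are weaker than the 95% and 99.7% of the 3 sigma rule, which is the price of not assuming a normal distribution.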

(31 cards)