lec 5 variables and scales Flashcards

1
Q

what is a variable

A

a condition/ characteristic that can change or have different values

defining characteristics

  • attribute that describes a person/ place/ thing
  • value can vary between different entities
2
Q

qualitative vs quantitative variables

A

qualitative: values that are names/ labels
quantitative: numeric variables that measure quantity

3
Q

discrete vs continuous variables

A

continuous: a variable that can take any value between its minimum and maximum values

discrete: a variable that can only take certain separate values, not every value between its minimum and maximum

4
Q

univariate vs bivariate data

A

Univariate data: when a study consists of only one variable

Bivariate data: when a study examines the relationship between 2 variables

5
Q

what is a nominal scale

A
  • lowest statistical measurement level
  • this scale is applied to items that are divided into categories without any order or structure
    • e.g.
      • gender
      • eye colour
      • blood type
6
Q

what is the ordinal scale

A

consists of variables whose categories have an inherent order

  • a ranking of responses that may have different meanings among individuals
  • shows gross order but not the relative distance between ranks, as the distances are not equal
  • properties of ordinal scale:
    1. identity: each value identifies the quality being measured
    2. magnitude: values can be ordered by the amount of the quality being measured, but without a quantitative distance between them
7
Q

what is the interval scale

A

variables that have constant, equal distances between values, but the zero point is arbitrary

properties:

  1. identity
  2. magnitude
  3. equal distance: the difference between adjacent points is the same all along the scale

e.g. IQ score, pain scale w/ no,

8
Q

what is a ratio scale

A

top level of measurement, with all the properties of an abstract number system, including an absolute zero

properties

  1. identity
  2. magnitude
  3. equal distance
  4. absolute zero: lets us say how many times greater one case is than another
  • allows use of all mathematical operations
    • e.g.
      • weight
      • pulse rate
      • respiratory rate
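
A small Python sketch of why the absolute zero matters. The weights and the Celsius/Kelvin comparison are illustrative assumptions, not examples from the lecture: a ratio is meaningful for weight, but for a scale with an arbitrary zero (such as Celsius) the same division gives a misleading answer.

```python
# Ratio scale: weight has a true zero, so "twice as heavy" is meaningful.
weight_a_kg, weight_b_kg = 40.0, 80.0
weight_ratio = weight_b_kg / weight_a_kg
print(weight_ratio)  # 2.0

# Interval scale (assumed example): 0 degrees Celsius is an arbitrary zero,
# so 20 C is NOT "twice as hot" as 10 C even though 20 / 10 == 2.
celsius_a, celsius_b = 10.0, 20.0
naive_ratio = celsius_b / celsius_a

# Converting to Kelvin (which has an absolute zero) shows the real ratio.
kelvin_ratio = (celsius_b + 273.15) / (celsius_a + 273.15)
print(naive_ratio, round(kelvin_ratio, 3))  # 2.0 1.035
```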
9
Q

what is a measure of central tendency / central location

A

a single value that attempts to describe a set of data by identifying the central position within the data set

  • mean
  • median
  • mode
10
Q

describe the mean

A
  • most familiar measure of central tendency
  • acts as a model of the data set: it is often not one of the observed values, but it is the single value that gives the smallest total squared error when used to predict the values in the set
  • used with discrete and continuous data, most often the latter
  • sum of all the values divided by the number of values in the data set
  • includes every value of the data set
  • the only measure of central tendency for which the sum of the deviations of each value from it is always 0
  • sample mean = x̄ (x-bar)
  • population mean = µ
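
A minimal Python sketch of two of the points above, using made-up scores (the `scores` list is an assumption for illustration): the mean is the sum divided by the count, and the deviations from the mean add up to zero.

```python
# Hypothetical scores, chosen only for illustration.
scores = [2, 4, 4, 6, 9]

# Mean = sum of all values divided by the number of values.
mean = sum(scores) / len(scores)

# Deviation of each score from the mean (positive above, negative below).
deviations = [x - mean for x in scores]

print(mean)             # 5.0 (note: not one of the observed scores)
print(sum(deviations))  # 0.0
```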
11
Q

what is the main disadvantage of the mean

A

very susceptible to outliers (values that are unusual compared with the rest of the data set because they are especially small or large)

mean can be skewed by these values

if so the median is a better measure of central tendency

12
Q

when not to use the mean and use the median instead

A

presence of outliers

skewed distribution: the mean moves away from the centre, but the median stays more central and is less influenced by the skew

  • in a normal distribution: mean = median = mode
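
To see the effect of an outlier, here is a short Python sketch with invented scores (both lists are assumptions for illustration). Adding one extreme value drags the mean a long way, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical data set, then the same set with one extreme value added.
scores = [10, 12, 13, 14, 16]
with_outlier = scores + [95]

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)           # 13 13
print(round(mean_after, 2), median_after)   # 26.67 13.5
```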
13
Q

what is the median

A

the middle score for a set of data that has been arranged in order of magnitude

  • least affected by outliers and skewed data
  • order the values and find the middle; if there is an even number of values, take the mean of the two middle values
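
The procedure can be sketched in Python as follows. The function name `median_of` and the sample lists are assumptions for illustration, and the sketch expects a non-empty list.

```python
def median_of(values):
    """Middle value of the ordered data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 1, 5]))     # 5
print(median_of([7, 1, 5, 3]))  # 4.0
```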
14
Q

what is the mode

A

most frequent score in the data set.

  • highest bar on histogram
  • used for categorical data when the most popular option is sought after
15
Q

problems with the mode

A
  • not unique
    • causes problems when 2 or more values are equally popular
    • even more problematic for continuous data, as an exactly repeated value (and so a clear mode) is unlikely => the mode is rarely used with continuous data
  • if the mode is far from the rest of the data in the set, it gives a misleading picture of the centre
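
Both problems show up in a quick Python sketch using `statistics.multimode` (Python 3.8+). The eye colours and heights are invented for illustration.

```python
from statistics import multimode

# Categorical data with a tie: two values are equally popular.
eye_colour = ["brown", "blue", "brown", "green", "blue"]
tied_modes = multimode(eye_colour)
print(tied_modes)  # ['brown', 'blue']

# Continuous data: no value repeats, so every value counts as a "mode".
heights_cm = [170.2, 168.9, 181.4, 175.0]
height_modes = multimode(heights_cm)
print(len(height_modes))  # 4
```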
16
Q

which measures of central tendency are best for normal and skewed distributions

A

normal distribution

  • mean or median can be used, but the mean is ideal because:
    • it has the least amount of error, since it includes all values in the data set
    • any change in the scores affects the value of the mean, which is not necessarily true of the mode or median

skewed distribution

  • the mean is dragged in the direction of the skew, so the median is best
  • the stronger the skew, the greater the difference between the mean and the median

data that look normally distributed but are flagged as NON-NORMAL by normality tests

  • the median is preferred over the mean as a rule of thumb, unless there is no real difference between the median and the mean
17
Q

match the variable type to the preferred measure of central tendency

A

nominal = mode

ordinal = median

interval/ratio (not skewed) = mean

interval/ratio (skewed) = median

18
Q

what is a measure of spread / measure of dispersion

A

describes the variability in a sample/ population

  • used alongside measures of central tendency to give a description of the overall data
19
Q

what is the purpose of measuring a data spread

A
  • shows how well the central tendency represents the data
  • a large spread suggests large differences between individual scores, and vice versa for a small spread
  • measures of spread include
    • range
    • quartiles
    • absolute deviation
    • standard deviation
20
Q

what is the range

A

the difference between the highest and lowest scores in a data set and is the simplest measure of spread

  • range = max value - min value
  • sets the boundaries for scores
    • useful for measuring critically high or low thresholds
  • detects errors when inputting data
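
A short Python sketch of the range, including how it can flag a data-entry error. The pulse rates and the mistyped value 710 are invented for illustration.

```python
# Hypothetical pulse rates in beats per minute.
pulse_rates = [62, 75, 58, 90, 71]

# Range = max value - min value.
data_range = max(pulse_rates) - min(pulse_rates)
print(data_range)  # 32

# A mistyped entry (710 instead of 71) makes the range implausibly large,
# which is a quick hint that something was entered wrongly.
with_typo = pulse_rates + [710]
typo_range = max(with_typo) - min(with_typo)
print(typo_range)  # 652
```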
21
Q

what are quartiles and interquartile ranges

A

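quartiles: break the ordered data into quarters

even number of scores: each quartile is the mean of the 2 scores either side of the quarter position, e.g. with 100 scores Q1 is the mean of the 25th and 26th scores

odd number of scores: the values at the quarter positions are the quartiles, e.g. the 25th, 50th and 75th of 99 ordered scores

Q2 = median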

22
Q

benefits of quartiles and what is the interquartile range

A
  • like the median, less affected by outliers and skewed data, so they are the best choice for measuring the spread of such data sets
  • interquartile range = the difference between Q3 & Q1, which shows the range of the middle half of the scores in the distribution
    • Q3 - Q1 = interquartile range
  • semi-interquartile range: half the interquartile range = (Q3 - Q1) / 2
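
A minimal Python sketch of these quantities using `statistics.quantiles` (Python 3.8+). The scores are invented, and the library's default "exclusive" method is only one of several quartile conventions, so other software may give slightly different values.

```python
from statistics import quantiles

# Hypothetical ordered scores.
scores = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(scores, n=4)

iqr = q3 - q1        # interquartile range: spread of the middle half
semi_iqr = iqr / 2   # semi-interquartile range: (Q3 - Q1) / 2

print(q1, q2, q3)     # 2.0 4.0 6.0
print(iqr, semi_iqr)  # 4.0 2.0
```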
23
Q

Drawback of quartiles

A

they don’t take into account every score in the data set

24
Q

what is the absolute deviation / variance / standard deviation

how to calculate absolute & mean absolute deviation

A

these show the amount of deviation/variation that occurs around the mean score

total variability: the sum of the deviations of all scores, which is then divided by the number of scores

the choice of absolute deviation, variance or standard deviation depends on the type of statistic

  • easiest way to calculate a deviation = individual score minus mean score
  • values above the mean give +ve deviations and values below give -ve
  • total variability would be 0 because the positive and negative deviations cancel, so the signs are ignored and only absolute values are used = absolute deviation
  • sum of the absolute deviations divided by the number of scores = mean absolute deviation
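
The steps above can be sketched in Python as follows, using made-up scores (the `scores` list is an assumption for illustration).

```python
# Hypothetical scores.
scores = [2, 4, 4, 6, 9]
mean = sum(scores) / len(scores)  # 5.0

# Deviation = individual score minus mean score.
deviations = [x - mean for x in scores]  # [-3.0, -1.0, -1.0, 1.0, 4.0]

# Signed deviations cancel out, so their total is 0.
signed_total = sum(deviations)

# Ignore the signs, then average: mean absolute deviation.
absolute_deviations = [abs(d) for d in deviations]
mean_absolute_deviation = sum(absolute_deviations) / len(scores)

print(signed_total)             # 0.0
print(mean_absolute_deviation)  # 2.0
```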
25
Q

how to calc variance

A

gets positive values for the deviations from the mean by squaring them

adding up the squared deviations gives the sum of squares

the sum of squares is divided by n (or by n - 1 when estimating from a sample)

  • if the values in the data are spread out from the mean then the variance is a large number
  • if the values are closer to the mean then the variance is small
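
A worked Python sketch of the calculation, dividing by n as described above. The scores are invented, and `statistics.pvariance` is used only as a cross-check of the hand calculation.

```python
from statistics import pvariance

# Hypothetical scores.
scores = [2, 4, 4, 6, 9]
mean = sum(scores) / len(scores)  # 5.0

# Square each deviation so that every term is positive.
squared_deviations = [(x - mean) ** 2 for x in scores]  # [9.0, 1.0, 1.0, 1.0, 16.0]

# Sum of squares, then divide by n.
sum_of_squares = sum(squared_deviations)
variance = sum_of_squares / len(scores)

print(sum_of_squares)  # 28.0
print(variance)        # 5.6

# The standard library's population variance should agree.
library_variance = pvariance(scores)
```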
26
Q

problems with variance

A
  • squaring gives more weight to extreme scores, so the variance is susceptible to outliers
  • the units of variance are squared, so they differ from the units of the data set and can’t be directly related to data set values
  • calculating the standard deviation solves the units problem
27
Q

what is the standard deviation

A

a measure of the spread of scores w/in a data set

sample SDs differ from population SDs in their calculation

when to calculate the pop SD

  1. if data on entire pop is present
  2. if the sample is all you’re interested in and don’t want to generalize your result

when to use the sample SD: if you have sample data and wish to generalize to population

NB: the sample SD is not a deviation of the sample itself but an estimate of the pop SD based on sample data
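
The difference in calculation is the divisor: the population SD divides the sum of squares by n, while the sample SD divides by n - 1. A small Python sketch with invented scores, using the standard library's `pstdev` (population) and `stdev` (sample):

```python
from statistics import pstdev, stdev

# Hypothetical scores; treat them first as a whole population, then as a sample.
scores = [2, 4, 4, 6, 9]

population_sd = pstdev(scores)  # sqrt(sum of squares / n)       = sqrt(28 / 5)
sample_sd = stdev(scores)       # sqrt(sum of squares / (n - 1)) = sqrt(28 / 4)

print(round(population_sd, 3))  # 2.366
print(round(sample_sd, 3))      # 2.646
```

The sample SD is always a little larger for the same data, which reflects the extra uncertainty of estimating a population value from a sample.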

28
Q

which type of data should be used to calculate SD

A
  • SD is used along with the mean to summarize continuous data, NOT CATEGORICAL DATA
  • only appropriate if the data is normally distributed/ not skewed
29
Q

define the EMPIRICAL RULE

A

for a normal distribution, nearly all of the data will fall within three standard deviations of the mean

30
Q

what are the three parts of the empirical rule

A
  1. 68% of data falls within 1 SD of the mean: µ ± 1 × SD
  2. 95% falls within 2 SDs: µ ± 2 × SD
  3. 99.7% falls within 3 SDs: µ ± 3 × SD

aka the 3 sigma rule
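
The three percentages can be checked against the standard normal distribution with Python's `statistics.NormalDist` (Python 3.8+). This is a quick verification sketch, not part of the lecture; the 68/95/99.7 figures are rounded versions of the values it prints.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution lying within k SDs of the mean, for k = 1, 2, 3.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(k, round(shares[k] * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```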

31
Q

when is the 3 sigma rule used

A

used to estimate how the data would be distributed if the entire pop was surveyed, for when the full data is difficult/impossible to get

applies to a random variable that follows the normal distribution (bell curve)

doesn’t apply to non normal distributions, but Chebyshev’s theorem can be used for those
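
For comparison, Chebyshev's theorem guarantees that at least 1 - 1/k² of the data lies within k SDs of the mean for any distribution with a finite mean and variance (k > 1). A minimal Python sketch, where the function name `chebyshev_min_share` is an assumption for illustration:

```python
def chebyshev_min_share(k):
    """Minimum share of data within k SDs of the mean, for any distribution (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

print(chebyshev_min_share(2))            # 0.75
print(round(chebyshev_min_share(3), 3))  # 0.889
```

These bounds (at least 75% within 2 SDs, about 89% within 3 SDs) are weaker than the 95% and 99.7% of the empirical rule, which is the price of making no assumption about the shape of the distribution.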