data analysis and descriptive tendencies Flashcards

1
Q

what is a population?

A
  • complete set of objects
  • group containing elements of anything you want to study
2
Q

what is a sample?

A
  • subset of a given population
3
Q

does the sample have to be people?

A
  • no, can be cells, products, SMS messages
4
Q

why do you take a sample?

A
  • it is not possible to test every individual, so a sample is taken and used to infer about the population; this inference introduces error
5
Q

what should the sample represent? what should be considered?

A
  • represents the population
  • careful consideration of sub-categories is required to ensure that the sample reliably represents the population
6
Q

what shouldn’t be done to the sample after it has been determined?

A
  • the sample shouldn’t be modified or subdivided after it has been determined for the sake of deriving a better conclusion
    (‘cherry picking’)
7
Q

what is a variable?

A
  • set of related events that can take on more than one value
8
Q

can a variable be changed? give examples

A
  • something that can be changed
    e.g., characteristic or value like weight, exam mark, academic degree, hometown
9
Q

what is statistical inference?

A
  • involves figuring out how well a property of one variable can be predicted by that of another variable
10
Q

what is an independent variable?

A
  • value being changed or manipulated
  • controlled or selected to determine its effect on an observed outcome
11
Q

what is a dependent variable?

A
  • observed result of the IV being manipulated
  • it is something that may depend on the IV
12
Q

what does research aim to do with the variables?

A
  • attempts to find evidence that the DV depends on the IV
13
Q

what do independent variables consist of?

A
  • different categories called levels, conditions or treatments
14
Q

how is the number of levels of an independent variable different from the number of independent variables?

A
  • a study can have multiple independent variables, but each participant belongs to only one level of a given independent variable
15
Q

what is a control variable?

A
  • kept constant to prevent them influencing the effect of IV on DV
16
Q

what are control variables critical for?

A
  • critical for study design e.g., recruitment criteria for participants
17
Q

what are the different types of data?

A
  • categorical
  • ordered
  • continuous
  • measured
18
Q

what are nominal and ordinal variables?

A
  • qualitative and categorical
19
Q

what are interval and ratio variables?

A
  • quantitative and continuous
20
Q

what is nominal data?

A
  • categorical
  • cannot be ordered or used as numbers (only the frequency of each category can be counted)
21
Q

what are examples of nominal data?

A
  • gender
  • country
  • occupation
  • blood type
22
Q

what is ordinal data?

A
  • can be ordered but cannot be added or subtracted
23
Q

what are examples of ordinal data?

A
  • satisfaction rating
  • education level
  • spice level
24
Q

what is interval data?

A
  • can be ordered
  • difference can be measured but cannot compute a ratio between two values
  • no meaningful zero exists
25
Q

what are examples of interval data?

A
  • exam mark
  • date
  • year
26
Q

what is ratio data?

A
  • has the properties of interval data, and a ratio between two values can be computed
  • has meaningful zero
27
Q

what are examples of ratio data?

A
  • distance
  • height
  • annual income
  • number of successes
28
Q

how do you distinguish between interval and ratio?

A
  • can a value meaningfully be doubled?
    yes = ratio; no = interval
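The doubling test can be illustrated with a small sketch. Temperature is not one of the examples on these cards; it is used here as an assumed illustration because Celsius has no meaningful zero, whereas distance does. All values are hypothetical.

```python
# Hypothetical illustration: Celsius temperature behaves as interval data,
# while distance behaves as ratio data.
KELVIN_OFFSET = 273.15
KM_TO_MILES = 0.621371

cold_c, warm_c = 10.0, 20.0
# Looks like "twice as hot", but 0 degrees C is not "no temperature",
# so this ratio is not meaningful.
naive_ratio = warm_c / cold_c
# On a scale with a meaningful zero (Kelvin) the ratio is only about 1.035.
kelvin_ratio = (warm_c + KELVIN_OFFSET) / (cold_c + KELVIN_OFFSET)

short_km, long_km = 5.0, 10.0
# For ratio data the ratio survives a change of units.
ratio_km = long_km / short_km
ratio_miles = (long_km * KM_TO_MILES) / (short_km * KM_TO_MILES)

print(naive_ratio, round(kelvin_ratio, 3), ratio_km, round(ratio_miles, 3))
```

The difference between the two temperatures (10 degrees) is the same on both scales, which is why differences are meaningful for interval data even though ratios are not.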
29
Q

what are the four main descriptive tendencies?

A
  • central tendency
  • spread
  • shape
  • outliers
30
Q

what are the three central tendencies?

A
  • mode
  • median
  • mean
31
Q

what is the mode and what variable/ data is it used for?

A
  • most frequently occurring value (the value with the highest frequency)
  • can be used for all types of variables
  • often used for nominal and ordinal variables
32
Q

what is the median and what variable/ data is it used for?

A
  • middle value
  • cannot be obtained for nominal variables
  • obtained only on ordered variables e.g., ordinal, interval, ratio
33
Q

what is the mean and what variable/ data is it used for?

A
  • average
  • the point at which distances to the data points (1st moment) are balanced
  • only defined in interval and ratio variables
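As a minimal sketch of the three central tendencies, the following uses Python's standard `statistics` module on hypothetical data (the exam marks and blood types are made up for illustration).

```python
import statistics

marks = [52, 61, 61, 70, 76]          # hypothetical exam marks
blood_types = ["A", "O", "O", "B"]    # hypothetical nominal data

mode_value = statistics.mode(marks)        # most frequent value
median_value = statistics.median(marks)    # middle value of the sorted data
mean_value = statistics.mean(marks)        # arithmetic average

# The mode is also defined for nominal data, where median and mean are not.
nominal_mode = statistics.mode(blood_types)

# Deviations from the mean sum to zero: the "balanced" 1st moment.
balance = sum(x - mean_value for x in marks)

print(mode_value, median_value, mean_value, nominal_mode, balance)
```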
34
Q

what two of the central tendencies are normally similar?

A
  • mean and median are similar
35
Q

how does an outlier affect central tendencies?

A
  • hugely affects the mean value but hardly affects the median
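A small sketch of this effect, using hypothetical numbers in which the largest value is replaced by an extreme one (for example through a data-entry slip):

```python
import statistics

clean = [10, 12, 13, 15, 20]          # hypothetical measurements
with_outlier = [10, 12, 13, 15, 200]  # same data, largest value now extreme

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

print("mean:", mean_clean, "->", mean_outlier)
print("median:", median_clean, "->", median_outlier)
```

Here the mean jumps from 14 to 50 while the median stays at 13, because the median depends only on the ordering of the values and not on how far the extreme value lies from the rest.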
36
Q

what are the three types of measures used for spread?

A
  • quantile / quartile / percentile
  • variance and standard deviation
  • Z score
37
Q

how do you find out the quantile, quartile and percentiles?

A
  • divide the data into sections containing the same number of data points and report where the boundaries between sections are located
38
Q

what is a quantile? where do we plot this data?

A
  • sample is divided into equal sized subgroups
  • for N sections = N-1 values
  • plotted onto a scatterplot
39
Q

what is a quartile? what is the median?

A
  • 1st to 3rd
  • when there are four sections in total
  • median = 2nd quartile
40
Q

what are percentiles? what is the median?

A
  • 1st to 99th
  • when there are 100 sections
  • median is the 50th percentile
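The cards on quantiles, quartiles and percentiles can be sketched with `statistics.quantiles` from the Python standard library. The data are hypothetical, and the exact cut points depend on the interpolation method used (this sketch relies on the library default), so only the counts and the median relationship are highlighted.

```python
import statistics

data = list(range(1, 13))  # hypothetical data: 1, 2, ..., 12

# N sections are separated by N - 1 cut points.
quartiles = statistics.quantiles(data, n=4)      # 3 values
deciles = statistics.quantiles(data, n=10)       # 9 values
percentiles = statistics.quantiles(data, n=100)  # 99 values

median = statistics.median(data)

print(len(quartiles), len(deciles), len(percentiles))
# The 2nd quartile and the 50th percentile both coincide with the median.
print(quartiles[1], percentiles[49], median)
```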
41
Q

how do you calculate the 2nd moment?

A

variance = sum of (distance from mean)^2 over all data points / number of data points

42
Q

what is the square root of variance called and what is it?

A
  • called standard deviation
  • standard distance from mean
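A minimal sketch of the variance and standard deviation as defined on these cards, using hypothetical data. It divides by the number of data points, which corresponds to the population variance; the sample variance, which divides by one fewer, is a different convention not covered here.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = sum(data) / len(data)

# 2nd moment: average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance, in the data's own units.
sd = math.sqrt(variance)

# Cross-check against the standard library's population variance.
library_variance = statistics.pvariance(data)

print(mean, variance, sd, library_variance)
```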
43
Q

what does mean ± SD provide information on?

A
  • where the centre is
  • how spread the data points are
44
Q

given SD, how can distance be described? what is this called and what does it enable?

A
  • distance can be described as a ratio with respect to SD
  • known as Z-score
  • enables fair comparison of deviations
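As a sketch of why the Z-score enables fair comparison, the example below compares two marks from exams with different means and spreads. The function name `z_score` and all numbers are hypothetical.

```python
def z_score(value, mean, sd):
    """Distance from the mean expressed in units of standard deviation."""
    return (value - mean) / sd

# Hypothetical exams: the raw marks differ by only 5, but the spreads differ a lot.
maths_z = z_score(70, mean=60, sd=5)     # 10 marks above a tightly spread exam
english_z = z_score(75, mean=65, sd=10)  # 10 marks above a widely spread exam

print(maths_z, english_z)
```

Both marks are 10 above their exam's mean, yet the maths mark is 2 SD above the mean and the English mark only 1 SD, so the maths result is the more unusual one.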
45
Q

what are the two main types of shapes?

A
  • skewness
  • kurtosis
46
Q

what does skewness measure and correspond to?

A
  • measures degree of asymmetry
  • corresponds to 3rd moment
47
Q

how do you calculate the 3rd moment? what do you divide it by and why?

A

3rd moment = sum of (distance from mean)^3 over all data points / number of data points
- divide by SD^3 to make it dimensionless
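The skewness calculation above can be sketched as follows. The helper names `central_moment` and `skewness` and the data sets are hypothetical; the moments divide by the number of data points, as on the cards.

```python
def central_moment(data, k):
    """k-th moment about the mean: average of (distance from mean)^k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """3rd moment divided by SD^3, which makes it dimensionless."""
    sd = central_moment(data, 2) ** 0.5
    return central_moment(data, 3) / sd ** 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 10]              # long tail towards high values
left_skewed = [-x for x in right_skewed]     # mirror image

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```

The symmetric data give zero skewness, the long right tail gives a positive value, and its mirror image gives the same magnitude with a negative sign.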

48
Q

what does zero skewness mean?

A
  • data are symmetrically distributed
49
Q

what does high skewness mean?

A
  • distribution is highly asymmetrical
50
Q

what does positive/ negative skewness mean?

A
  • indicates which direction data are skewed
51
Q

what does kurtosis measure? what does it correspond to?

A
  • measures the sharpness/thinness of the distribution
  • corresponds to the 4th moment
52
Q

how do you work out 4th moment? what do you divide it by and why?

A

4th moment = sum of (distance from mean)^4 over all data points / number of data points
- divide by SD^4 to make it dimensionless

53
Q

what is kurtosis always by definition? what do we subtract?

A
  • always positive
  • subtract 3 (the kurtosis of a normal distribution) to give the excess kurtosis
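A minimal sketch of the kurtosis calculation from the two cards above, with hypothetical function names and data. Because the 4th moment and SD^4 are both built from even powers, the result cannot be negative; subtracting 3 then compares it against a normal distribution.

```python
def kurtosis(data):
    """4th moment divided by SD^4 (dimensionless)."""
    n = len(data)
    mean = sum(data) / n
    second_moment = sum((x - mean) ** 2 for x in data) / n  # variance
    fourth_moment = sum((x - mean) ** 4 for x in data) / n
    return fourth_moment / second_moment ** 2  # SD^4 equals variance^2

def excess_kurtosis(data):
    """Kurtosis relative to a normal distribution, whose kurtosis is 3."""
    return kurtosis(data) - 3

flat = [1, 2, 3, 4, 5]  # hypothetical, evenly spread data

print(kurtosis(flat), excess_kurtosis(flat))
```

For this evenly spread data the kurtosis is about 1.7, so the excess kurtosis is negative (about -1.3), indicating a flatter shape than a normal distribution.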
54
Q

what are outliers?

A
  • extreme values relative to bulk of values in a data set
55
Q

what are outliers due to?

A
  • inaccuracies in data processing
  • problems with methodology e.g., measures, instruments, participants not following instructions
  • actual extreme value from an unusual participant
56
Q

what are the two ways you can detect outliers?

A
  • based on z-score
  • based on inter-quartile range
57
Q

how does Z-score detect outliers?

A
  • outlier if z-score is more than 3 or less than -3
  • i.e., when the distance from the mean is more than 3 × SD
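The z-score rule can be sketched as below, using hypothetical data with one extreme value. The population standard deviation (`statistics.pstdev`) is assumed here, matching the divide-by-number-of-points definition of variance.

```python
import statistics

# Hypothetical data: 20 values close to 50 plus one extreme value.
values = [48, 49, 50, 51, 52] * 4 + [120]

mean = statistics.fmean(values)
sd = statistics.pstdev(values)

# Flag values whose distance from the mean exceeds 3 standard deviations.
outliers = [x for x in values if abs(x - mean) / sd > 3]

print(outliers)
```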
58
Q

how does inter-quartile range detect outliers?

A
  • IQR is the width between the 1st and 3rd quartiles
  • outlier if the value is more than 1.5 × IQR above the 3rd quartile or more than 1.5 × IQR below the 1st quartile
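A sketch of the inter-quartile range rule on hypothetical data. Quartile values depend slightly on the interpolation method; this sketch assumes the default method of `statistics.quantiles`.

```python
import statistics

data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 60]  # hypothetical data

q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # width between the 1st and 3rd quartiles

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```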
59
Q

in which samples do outliers distort the data most?

A
  • in small samples
60
Q

describe a histogram: what does height represent?

A
  • visualises how data is distributed
  • height represents frequency (how often a value appears in data)
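As a rough sketch of what a histogram counts, the example below bins hypothetical exam marks into bins of width 10 and prints a text bar whose length is the frequency. The bin width and data are assumptions chosen for illustration.

```python
from collections import Counter

marks = [52, 55, 61, 64, 68, 71, 73, 75, 79, 88]  # hypothetical exam marks
BIN_WIDTH = 10

# Map each mark to the lower edge of its bin, then count marks per bin.
frequencies = Counter((mark // BIN_WIDTH) * BIN_WIDTH for mark in marks)

# Bar length (the "height" of the histogram) is the frequency in that bin.
for lower_edge in sorted(frequencies):
    bar = "#" * frequencies[lower_edge]
    print(f"{lower_edge}-{lower_edge + BIN_WIDTH - 1}: {bar}")
```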
61
Q

describe a box plot

A
  • plot summarising quartile-based stats of a data set, including:
  • location of quartiles
  • range of data excluding outliers
  • outliers detected by quartiles
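The quantities a box plot displays can be computed without drawing anything. The sketch below is one possible way to do so; the function name `box_plot_summary`, the dictionary keys and the data are hypothetical, and the 1.5 × IQR rule is assumed for detecting outliers.

```python
import statistics

def box_plot_summary(data):
    """Return the quartile-based statistics that a box plot displays."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers span the range of the data excluding outliers.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

summary = box_plot_summary([-20, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 60])
print(summary)
```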