March 10 Flashcards

1
Q

Three steps to data analysis

A

data validation
data preparation
data analysis

2
Q

data validation

A

confirm data collection occurred as planned
check for errors and problems
if possible, address or repair them

3
Q

data preparation

A

data entry: convert data to electronic form if needed
data coding: group and assign numeric codes to qualitative responses
data cleaning: check for errors and inconsistencies
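A minimal Python sketch of data coding and cleaning, assuming a hypothetical yes / no / not sure survey question and a hypothetical coding scheme (1 = yes, 2 = no, 3 = not sure, 99 = missing or unrecognized):

```python
# Hypothetical raw answers to a qualitative survey question.
raw_responses = ["Yes", "yes ", "No", "Not sure", "no", "", "Yes"]

# Hypothetical coding scheme: each grouped answer gets a numeric code.
CODES = {"yes": 1, "no": 2, "not sure": 3}
MISSING = 99  # assumed code for blank or unrecognized answers

def code_response(answer: str) -> int:
    """Normalize a free-text answer and map it to its numeric code."""
    key = answer.strip().lower()  # remove stray spaces and case differences
    return CODES.get(key, MISSING)

coded = [code_response(r) for r in raw_responses]
print(coded)  # [1, 1, 2, 3, 2, 99, 1]

# Cleaning check: list the positions of answers that did not get a valid code.
invalid = [i for i, c in enumerate(coded) if c not in CODES.values()]
print(invalid)  # [5]
```

The flagged positions are the responses to inspect and, if possible, repair.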

4
Q

data analysis

A

exploration: get a feel for the data
tabulations and analyses: answer research questions using the appropriate method

5
Q

exploration

A

tabulations: frequency tables, cross tabulations
descriptive statistics: measures of central tendency, measures of dispersion
visualizations: histograms, pie charts, tables, bar charts

6
Q

Difference between frequencies and proportions

A

frequency: how many people use something (a count)
proportion: how many relative to the total (often shown as a %)
pie charts can represent proportions; bar charts can show either simple proportions or frequencies
always label the graphs
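A small Python sketch of the difference, assuming hypothetical answers to "Which carrier do you use?" (options A, B, C). The frequency is the count per option; the proportion divides that count by the total:

```python
from collections import Counter

# Hypothetical answers from 10 respondents.
responses = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

freq = Counter(responses)      # frequency: number of respondents per option
total = sum(freq.values())     # total number of respondents

for option, count in sorted(freq.items()):
    proportion = count / total  # proportion: share of the total
    print(f"{option}: frequency={count}, proportion={proportion:.0%}")
# A: frequency=5, proportion=50%
# B: frequency=3, proportion=30%
# C: frequency=2, proportion=20%
```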

7
Q

cross tabulation

A

looking at 2 variables at once by counting respondents in each combination of categories. Ex: gender (variable 1) by carrier (variable 2)
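A minimal Python sketch of a cross tabulation, assuming hypothetical respondents recorded as (gender, carrier) pairs. Each cell counts the respondents with that combination of the two variables:

```python
from collections import Counter

# Hypothetical respondents: (gender, carrier) = (variable 1, variable 2).
respondents = [
    ("F", "A"), ("M", "A"), ("F", "B"), ("F", "A"),
    ("M", "B"), ("M", "B"), ("F", "C"), ("M", "A"),
]

counts = Counter(respondents)  # count of each (gender, carrier) combination
genders = sorted({g for g, _ in respondents})
carriers = sorted({c for _, c in respondents})

# One row per gender, one column per carrier.
print("gender " + " ".join(f"{c:>3}" for c in carriers))
for g in genders:
    row = " ".join(f"{counts[(g, c)]:>3}" for c in carriers)  # missing combos count as 0
    print(f"{g:<6} {row}")
```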

8
Q

Measures of central tendency

A

Central tendency describes the typical respondent in a data set
Can use the mode, median and mean to describe the data set
Mode: most frequent value. Put values in ascending order, count how many times each value occurs, and identify which value occurs most often
Median: middle value. Put values in ascending order and identify the middle value
Mean: average. Add up all values and divide by the number of observations
Problem with the mean: it is influenced by outliers (extreme values in the data set)
Better measure for skewed data: use the median, which is not influenced by outliers
Ex: for salary data in Canada, it is more useful to look at the median salary than the average (mean) salary

Levels of measurement each measure can be used with:
mode: nominal, ordinal, interval and ratio
median: ordinal, interval and ratio
mean: interval and ratio
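A short Python sketch using the standard `statistics` module, assuming hypothetical salaries (in thousands of dollars) with one extreme value. It shows the outlier pulling the mean far above the median:

```python
import statistics

# Hypothetical salaries in thousands of dollars; 400 is an outlier.
salaries = [42, 48, 48, 51, 55, 60, 400]

print(statistics.mode(salaries))             # most frequent value: 48
print(statistics.median(salaries))           # middle value once sorted: 51
print(round(statistics.mean(salaries), 2))   # sum / count: 100.57
```

The median (51) still describes a typical respondent, while the mean (about 100.57) is dragged up by the single outlier.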

9
Q

Normal distribution

A

A normal distribution is a symmetric, bell-shaped distribution: most data points cluster around the middle (centre) of the distribution, with fewer values further out on either side
Mean, median and mode are all equal because the values are spread symmetrically around the centre
If the graph is skewed (outliers or a long tail on one side), the distribution is not normal, and the mean, median and mode are not equal
Large samples of many variables will often look approximately normal
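A Python sketch comparing the three measures, assuming two small hypothetical data sets: one symmetric around its centre and one right-skewed (long tail of high values):

```python
import statistics

symmetric = [3, 4, 4, 5, 5, 5, 6, 6, 7]   # hypothetical, clusters around 5
skewed = [1, 1, 1, 2, 2, 3, 4, 8, 14]     # hypothetical, long right tail

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name,
          "mode =", statistics.mode(data),
          "median =", statistics.median(data),
          "mean =", statistics.mean(data))
# symmetric mode = 5 median = 5 mean = 5
# skewed mode = 1 median = 2 mean = 4
```

In the skewed set the high values pull the mean above the median, and the median above the mode.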

10
Q

Dispersion

A

Range and standard deviation
Range: interval and ratio
Standard deviation: interval and ratio
Ex: two pizza companies can have the same mean delivery time, because the mean is only an average. One company may be very consistent, always delivering at about the same time (low variability: not fast or slow, but consistent), while the other varies a lot. This is why the mean is not enough and you also need a measure of variability such as the variance or standard deviation
Range: the max minus the min value (the spread)
Standard deviation: dispersion around the mean; a good measure of spread
Outliers and extreme values (as in a skewed data set) inflate the standard deviation
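A Python sketch of the pizza example, assuming hypothetical delivery times (minutes) for two companies with the same mean. `statistics.stdev` computes the sample standard deviation (dividing by n - 1); `statistics.pstdev` would divide by n instead:

```python
import statistics

consistent = [29, 30, 30, 31, 30]   # hypothetical company 1
erratic = [15, 45, 20, 40, 30]      # hypothetical company 2

for name, times in [("consistent", consistent), ("erratic", erratic)]:
    value_range = max(times) - min(times)   # range: max minus min
    sd = statistics.stdev(times)            # spread around the mean
    print(name, "mean =", statistics.mean(times),
          "range =", value_range, "sd =", round(sd, 2))
# consistent mean = 30 range = 2 sd = 0.71
# erratic mean = 30 range = 30 sd = 12.75
```

Both means are 30, but the range and standard deviation show that only the first company is consistent.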
