intro to data Flashcards

1
Q

data matrix

A

table of data
columns : variables
rows : individuals
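A data matrix can be sketched in Python as a list of rows, one per individual, where every row has the same variables as keys. The survey values below are made up purely for illustration.

```python
# Hypothetical data matrix: each row is one individual,
# each key (column) is one variable.
data_matrix = [
    {"age": 19, "num_pets": 2, "blood_type": "O"},
    {"age": 22, "num_pets": 0, "blood_type": "A"},
    {"age": 20, "num_pets": 1, "blood_type": "AB"},
]

# Individuals = rows, variables = columns.
individuals = len(data_matrix)
variables = list(data_matrix[0].keys())

# Pull out one variable (one column) across all individuals.
ages = [row["age"] for row in data_matrix]

print(individuals)  # 3
print(variables)    # ['age', 'num_pets', 'blood_type']
print(ages)         # [19, 22, 20]
```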

2
Q

what are variables and individuals?
(information given in data)

A

variables = characteristics recorded for each individual
individuals = observational units (the objects the data describe)

3
Q

quantitative variable

A

numerical or measurement variable
ex: age, distance
two types: discrete & continuous

4
Q

quantitative variable
discrete

A

can only take separate numerical values with jumps (gaps) between them,
e.g. 1, 2, 3, 4
ex: # of plants in a garden, # of dogs in a house

5
Q

quantitative variable
continuous

A

can take on any value in an interval
ex: temperature throughout the day
values can include decimals

6
Q

categorical variable

A

also called a qualitative variable
places an individual or item into one of several groups or categories, called levels

7
Q

examples of levels

A

blood types (A, B, AB, O)
gender

8
Q

two types of categorical variables

A

nominal and ordinal

9
Q

categorical variable
nominal

A

no natural ordering for the categories
ex: dog breed, brand of soda

10
Q

categorical variable
ordinal

A

the categories have a logical order
ex: size of soda, grade level

11
Q

what graphs are used to display categorical data?

A

bar graph and pie chart
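A bar graph shows the count of each level and a pie chart shows each level's share of the whole. As a minimal sketch, the table behind both graphs can be built with `collections.Counter`; the blood-type data below are made up.

```python
from collections import Counter

# Hypothetical categorical data: blood type of 10 individuals.
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "A", "O", "B"]

# Count of each level = height of each bar in a bar graph.
counts = Counter(blood_types)

# Proportion of each level = size of each slice in a pie chart.
n = len(blood_types)
proportions = {level: count / n for level, count in counts.items()}

# Crude text bar graph, one row per level, largest count first.
for level, count in counts.most_common():
    print(f"{level:>2} | {'#' * count} ({count})")
```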

12
Q

box plot qualities

A

shows the median as a dark horizontal line inside the box
can’t show the number of modes

13
Q

what graphs are used to display quantitative data?

A

dotplots and histograms

14
Q

dotplots qualities

A

represents each observation in a data set with a single dot along the x-axis
works well for displaying the values of a variable in a smaller data set
NOT good for data with too many different values -> you lose a sense of the overall distribution

15
Q

histograms qualities

A

give a good sense of the shape of the distribution
show the modes
make symmetry (or skewness) visible
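A histogram sorts the observations into equal-width bins and draws one bar per bin count. The sketch below computes those counts in plain Python; the scores, the starting value of 50, and the bin width of 10 are arbitrary choices for illustration, and a different bin width can make the same data look somewhat different.

```python
# Hypothetical quantitative data (e.g. quiz scores).
scores = [52, 61, 64, 67, 70, 71, 73, 75, 78, 79, 81, 84, 88, 95]

def histogram_counts(data, start, width, num_bins):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_bins
    for x in data:
        k = int((x - start) // width)  # index of the bin x falls into
        if 0 <= k < num_bins:          # ignore values outside the chosen bins
            counts[k] += 1
    return counts

counts = histogram_counts(scores, start=50, width=10, num_bins=5)

# Text histogram: one peak in the middle -> unimodal and roughly symmetric.
for k, c in enumerate(counts):
    low = 50 + k * 10
    print(f"{low}-{low + 9}: {'*' * c}")
```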

16
Q

distribution

A

what values the variable takes and how often it takes them

17
Q

modes

A

# of peaks
unimodal, bimodal, multimodal

18
Q

symmetry

A

symmetric
skewed to right (tail on right): most values are low, a few are much higher
skewed to left (tail on left): most values are high, a few are much lower

19
Q

outliers

A

observations that lie outside the overall pattern of the distribution
always consider the reason they exist

20
Q

population

A

the entire group we are interested in learning about

21
Q

sample

A

a subset of individuals from the population, often a small fraction of it

22
Q

parameter

A

a numerical summary of a characteristic of the population as a whole
keyword: “all”

23
Q

statistic

A

the numerical summary for a characteristic of a sample
keyword: sample

24
Q

which two go together?
sample and parameter
statistic and population

A

sample goes with statistic
population goes with parameter
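To see the pairing concretely, the sketch below computes the same summary (the mean) once for a whole population, giving a parameter, and once for a random sample, giving a statistic. The population of 1,000 values (simply the numbers 1 to 1000) and the sample size of 50 are made up for illustration.

```python
import random
import statistics

# Hypothetical population: 1,000 values (here simply the numbers 1 to 1000).
population = list(range(1, 1001))

# Parameter: a numerical summary of ALL individuals in the population.
population_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample only.
rng = random.Random(42)              # fixed seed so the run is repeatable
sample = rng.sample(population, 50)  # 50 individuals drawn without replacement
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)  # close to, but usually not equal to, the parameter
```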

25
Q

sample mean

A

the sum of the observations divided by the # of observations
symbol: x̄ (“x-bar”)
use as the center when the distribution is roughly symmetric
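The definition translates directly into code: add up the observations and divide by how many there are. The data values are made up.

```python
def sample_mean(observations):
    """x-bar: sum of the observations divided by the number of observations."""
    return sum(observations) / len(observations)

data = [4, 8, 6, 5, 7]     # hypothetical observations
print(sample_mean(data))   # 6.0
```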

26
Q

sample median

A

the middle value when data are arranged from smallest to largest
use as the center when the distribution is skewed
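The card says “the middle value”; when the number of observations is even there are two middle values, and the usual convention is to average them. A minimal sketch of both cases, with made-up data:

```python
def sample_median(observations):
    """Middle value after sorting; average of the two middle values when n is even."""
    ordered = sorted(observations)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

print(sample_median([7, 1, 5, 3, 9]))  # 5
print(sample_median([7, 1, 5, 3]))     # 4.0
```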

27
Q

mean and median sensitivity to extremes

A

mean = sensitive to extremes
median = resistant to extremes
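Adding a single extreme value to a small data set shows the difference: the mean jumps, while the median barely moves. The incomes below are made up, and the sketch uses Python's standard `statistics` module.

```python
import statistics

incomes = [30, 32, 35, 38, 40]  # hypothetical data, in $1000s
with_extreme = incomes + [400]  # add one extreme value

print(statistics.mean(incomes), statistics.median(incomes))  # 35 35
# The mean jumps to about 95.8, while the median only moves to 36.5.
print(statistics.mean(with_extreme), statistics.median(with_extreme))
```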

28
Q

what do the mean and median, relative to each other, tell us about the distribution?
mean= median
mean>median
mean<median

A

mean = median : approximately symmetric
mean>median : skewed to right
mean<median : skewed to left
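The comparison above is a rule of thumb, so in practice “mean = median” means “close together”. The sketch below assumes a tolerance (a hypothetical `tol` of 5% of the data's range) for what counts as approximately equal; that cutoff is an arbitrary choice, not part of the rule.

```python
import statistics

def describe_skew(data, tol=0.05):
    """Rule of thumb: compare mean and median.
    tol is an assumed cutoff: a mean-median gap within tol * (max - min)
    is treated as 'approximately symmetric'."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    spread = (max(data) - min(data)) or 1  # avoid a zero cutoff when all values are equal
    if abs(mean - median) <= tol * spread:
        return "approximately symmetric"
    return "skewed to right" if mean > median else "skewed to left"

print(describe_skew([1, 2, 3, 4, 5]))               # approximately symmetric
print(describe_skew([1, 2, 2, 3, 3, 3, 20]))        # skewed to right
print(describe_skew([1, 18, 18, 19, 19, 19, 20]))   # skewed to left
```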

29
Q

range

A

range = maximum - minimum
very sensitive to extremes
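Because the range uses only the two most extreme observations, one unusual value can change it drastically. The data below are made up; the function is named `data_range` so it does not shadow Python's built-in `range`.

```python
def data_range(observations):
    """Range = maximum - minimum."""
    return max(observations) - min(observations)

data = [12, 15, 11, 18, 14]      # hypothetical observations
print(data_range(data))          # 7
print(data_range(data + [60]))   # 49: one extreme value changes the range a lot
```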

30
Q

interquartile range

A

goes hand in hand with median
used to measure variability when median is the center
IQR = Q3-Q1
describes the variability of the middle 50% of data
NOT sensitive to extreme values
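The sketch below finds Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving the overall median out of both halves when n is odd. That is one common textbook convention, assumed here; software (for example `statistics.quantiles`) may use a different method and give slightly different quartiles. The data are made up, and the function needs at least two observations.

```python
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves
    (the overall median is excluded from both halves when n is odd)."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(upper)

data = [1, 3, 4, 6, 7, 8, 9, 12, 20]  # hypothetical observations
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, median(data), q3, iqr)      # 3.5 7 10.5 7.0

# Replacing the largest value with an extreme one leaves Q1, Q3, and the IQR unchanged.
print(quartiles(data[:-1] + [200]))   # (3.5, 10.5)
```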

31
Q

percentiles

A

the pth percentile is the value with p% of the observations at or below it
first quartile (Q1) = 25th percentile
third quartile (Q3) = 75th percentile

32
Q

standard deviation

A

used as the measure of variability when the sample mean is the measure of center
measures roughly how far observations typically fall from the mean, built from each observation's deviation: observation - mean

33
Q

sample variance

A

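$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (sum of squared deviations divided by n - 1)

x_i = each observation, x̄ = sample mean, n = number of observations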

34
Q

sample standard deviation equation

A

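$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ (square root of the sample variance)
x_i = each observation, x̄ = sample mean, n = number of observations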
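The formula can be followed step by step: find each deviation (observation - mean), square it, add the squares, divide by n - 1 to get the variance, then take the square root. The data are made up; the result can be cross-checked against `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

def sample_variance(observations):
    n = len(observations)
    x_bar = sum(observations) / n
    deviations = [x - x_bar for x in observations]    # observation - mean
    return sum(d ** 2 for d in deviations) / (n - 1)  # squared deviations / (n - 1)

def sample_sd(observations):
    return math.sqrt(sample_variance(observations))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
print(sample_variance(data))     # 32/7, about 4.57
print(sample_sd(data))           # about 2.14
print(statistics.stdev(data))    # same value from the standard library
```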

35
Q

how do you know there is more variability when you look at the standard deviation (s)?

A

a larger s means more variability (observations are more spread out around the mean)
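Two made-up data sets with the same mean but different spread make the comparison concrete: the one whose values sit farther from the mean has the larger s.

```python
import statistics

# Two hypothetical data sets, both with mean 50.
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

s_tight = statistics.stdev(tight)  # about 1.58
s_wide = statistics.stdev(wide)    # about 15.81
print(s_tight, s_wide)             # larger s -> more variability
```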

36
Q

which measures go together, and when?
median
mean
iqr
standard deviation

A

median and IQR: use when the distribution is skewed
mean and SD: use when the distribution is roughly symmetric

37
Q

how do you describe the distribution shown in a graph?

A

SOCS
Shape (# of modes; symmetric or skewed?)
Outliers (do any exist?)
Center (mean or median)
Spread/variability (IQR or SD)

38
Q

suspected outliers in boxplots

A

points that lie at or below Q1 - 1.5 × IQR or at or above Q3 + 1.5 × IQR
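The 1.5 × IQR rule can be sketched as two “fences”. This version takes Q1 and Q3 as inputs, so it does not depend on how the quartiles were computed, and it uses “at or beyond” the fences as the card states (many texts use strict inequalities, which only matters for a value exactly on a fence). The data and quartiles are made up.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def suspected_outliers(data, q1, q3):
    """Observations at or below the lower fence, or at or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x <= low or x >= high]

data = [1, 3, 4, 6, 7, 8, 9, 12, 30]  # hypothetical observations, Q1 = 3.5, Q3 = 10.5
print(outlier_fences(3.5, 10.5))            # (-7.0, 21.0)
print(suspected_outliers(data, 3.5, 10.5))  # [30]
```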