intro to data Flashcards

1
Q

data matrix

A

a table of data
columns: variables
rows: individuals
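A data matrix can be sketched in Python as a list of rows, one row per individual, where each row maps a variable name (column) to that individual's value. The names, variables, and values below are made up for illustration.

```python
# Hypothetical data matrix: each row is one individual,
# each key (column) is one variable.
data_matrix = [
    {"name": "Ana", "age": 19, "blood_type": "A"},
    {"name": "Ben", "age": 22, "blood_type": "O"},
    {"name": "Cai", "age": 20, "blood_type": "AB"},
]

# Columns = variables
variables = list(data_matrix[0].keys())
# Rows = individuals
n_individuals = len(data_matrix)

# Read one variable (column) across all individuals
ages = [row["age"] for row in data_matrix]

print(variables)      # ['name', 'age', 'blood_type']
print(n_individuals)  # 3
print(ages)           # [19, 22, 20]
```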

2
Q

what are variables and individuals
(information given in data)

A

variables = characteristics recorded about each individual
individuals = observational units (the objects the data describe)

3
Q

quantitative variable

A

numerical or measurement variable
ex: age, distance
two types: discrete & continuous

4
Q

quantitative variable
discrete

A

can only take separate numerical values with jumps (gaps) between them, e.g., 1, 2, 3, 4
ex: # of plants in a garden, # of dogs in a house

5
Q

quantitative variable
continuous

A

can take on any value in an interval
values can include decimals
ex: temperature throughout the day

6
Q

categorical variable

A

also called a qualitative variable
places an individual or item into one of several groups or categories, called levels

7
Q

examples of levels

A

blood types (A, B, AB, O)
gender

8
Q

two types of categorical variables

A

nominal and ordinal

9
Q

categorical variable
nominal

A

no natural ordering for the categories
ex: dog breed, brand of soda

10
Q

categorical variable
ordinal

A

the categories have a logical order
ex: size of soda, grade level

11
Q

what graphs are used to display categorical data

A

bar graph and pie chart

12
Q

box plot qualities

A

shows the median as a dark horizontal line inside the box
can’t show the number of modes

13
Q

what graphs are used to display quantitative data

A

dotplots and histograms

14
Q

dotplots qualities

A

represent each observation in a data set as a single dot along the x-axis
work well for displaying the values of a variable in a smaller data set
NOT good at displaying data with many different values → lose sense of the overall distribution

15
Q

histograms qualities

A

give a good sense of the shape of the distribution
show the modes
show whether the distribution is symmetric or skewed

16
Q

distribution

A

what values the variable takes and how often it takes each one
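For a categorical variable, the distribution can be tabulated by counting how often each value occurs; this is the table a bar graph or pie chart is drawn from. A minimal sketch using Python's `collections.Counter`, with made-up soda-size data:

```python
from collections import Counter

# Hypothetical observations of a categorical (ordinal) variable
sizes = ["small", "large", "medium", "large", "small", "large"]

# The distribution: which values occur and how often
counts = Counter(sizes)
for value, count in counts.most_common():
    # value, count, and share of all observations
    print(value, count, f"{count / len(sizes):.0%}")
# large 3 50%
# small 2 33%
# medium 1 17%
```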

17
Q

modes

A

# of peaks
unimodal, bimodal, multimodal

18
Q

symmetry

A

symmetric
skewed to right (tail on right): most values are lower, with a few high values
skewed to left (tail on left): most values are higher, with a few low values

19
Q

outliers

A

observations that lie outside the overall pattern of the distribution
always consider the reason they exist

20
Q

population

A

the entire group we are interested in learning about

21
Q

sample

A

subset of individuals that is often a small fraction of the overall population

22
Q

parameter

A

a numerical summary of a characteristic of the population as a whole
keyword: “all”

23
Q

statistic

A

a numerical summary of a characteristic of a sample
keyword: sample

24
Q

which terms go together?
sample, population
statistic, parameter

A

sample goes with statistic
population goes with parameter

25
Q

sample mean

A

the sum of the observations divided by the # of observations
symbol: x̄ (x bar)
used as the center when the distribution is roughly symmetric

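The definition translates directly into code. A minimal sketch; the function name `sample_mean` and the data are illustrative, not from the deck:

```python
def sample_mean(observations):
    """x-bar: the sum of the observations divided by the number of observations."""
    if not observations:
        raise ValueError("need at least one observation")
    return sum(observations) / len(observations)

print(sample_mean([2, 4, 4, 5, 10]))  # 5.0
```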
26
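Q

sample median

A

the middle value when the data are arranged from smallest to largest
(with an even number of observations, the average of the two middle values)
used as the center when the distribution is skewed
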
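Sorting first and then picking the middle, or averaging the two middles when n is even, is the whole procedure. A minimal sketch; `sample_median` is an illustrative name:

```python
def sample_median(observations):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    if not observations:
        raise ValueError("need at least one observation")
    ordered = sorted(observations)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sample_median([7, 1, 3]))      # 3
print(sample_median([7, 1, 3, 10]))  # 5.0
```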
27
Q

mean and median: sensitivity to extremes

A

mean = sensitive to extremes
median = resistant to extremes

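One way to see the difference: make the largest value extreme and recompute both. A small sketch using Python's `statistics` module on made-up data:

```python
from statistics import mean, median

# Hypothetical data, then the same data with the largest value made extreme
data = [10, 12, 13, 15, 16]
with_extreme = [10, 12, 13, 15, 160]

for label, values in [("original", data), ("with extreme", with_extreme)]:
    print(f"{label}: mean = {mean(values):.1f}, median = {median(values):.1f}")
# original: mean = 13.2, median = 13.0
# with extreme: mean = 42.0, median = 13.0
```

The mean is pulled toward the extreme value; the median stays put.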
28
Q

what does comparing the mean and median tell us about the distribution?
mean = median, mean > median, mean < median

A

mean ≈ median: approximately symmetric
mean > median: skewed to right
mean < median: skewed to left

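The comparison can be written as a rough rule of thumb. This is a sketch, not a formal test of skewness; the function name `describe_shape` and its `tolerance` parameter are hypothetical (real data rarely has mean exactly equal to median, so some tolerance is a judgment call):

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.0):
    """Rough rule of thumb only: compare the mean and median to guess the shape."""
    m, md = mean(values), median(values)
    if abs(m - md) <= tolerance:
        return "approximately symmetric"
    # The mean is pulled toward the long tail
    return "skewed to right" if m > md else "skewed to left"

print(describe_shape([1, 2, 3, 4, 5]))      # approximately symmetric
print(describe_shape([1, 2, 2, 3, 20]))     # skewed to right
print(describe_shape([1, 18, 19, 19, 20]))  # skewed to left
```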
29
Q

range

A

range = maximum - minimum
very sensitive to extremes

30
Q

interquartile range

A

goes hand in hand with the median: used to measure variability when the median is the center
IQR = Q3 - Q1
describes the variability of the middle 50% of the data
NOT sensitive to extreme values

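A sketch computing Q1, Q3, the IQR, and the range, assuming the "median of each half" convention for quartiles (leaving out the overall median when n is odd). That is one common textbook convention; software such as `statistics.quantiles` interpolates differently by default, so results can differ slightly. The function name `quartiles` and the data are illustrative:

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.
    For odd n, the overall median is left out of both halves (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]
    return median(lower), median(upper)

data = [1, 3, 4, 6, 7, 8, 9, 12, 40]
q1, q3 = quartiles(data)
iqr = q3 - q1
data_range = max(data) - min(data)
print(q1, q3, iqr, data_range)  # 3.5 10.5 7.0 39

# Make the largest value even more extreme: the range jumps, the IQR does not move
more_extreme = [1, 3, 4, 6, 7, 8, 9, 12, 400]
q1_b, q3_b = quartiles(more_extreme)
print(q3_b - q1_b, max(more_extreme) - min(more_extreme))  # 7.0 399
```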
31
Q

percentiles

A

the pth percentile is a value with about p% of the observations at or below it
first quartile (Q1) = the 25th percentile
third quartile (Q3) = the 75th percentile

32
Q

standard deviation

A

used as the measure of variability when the sample mean is the measure of center
built from deviations, which tell how much an observation departs from the mean: deviation = observation - mean

33
Q

sample variance

A

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
(sum of squared deviations divided by n - 1, where n = number of observations)

34
Q

sample standard deviation equation

A

$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
(square root of the sample variance)

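The two formulas above, step by step: find x̄, square each deviation, sum, divide by n - 1, then take the square root for s. A minimal sketch; `sample_variance` and `sample_sd` are illustrative names, and the data are made up:

```python
import math

def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """s: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

print(sample_variance([1, 3, 5]))  # 4.0
print(sample_sd([1, 3, 5]))        # 2.0
print(sample_sd([2, 3, 4]))        # 1.0  (less spread out, so smaller s)
```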
35
Q

how can you tell from the standard deviation (s) which data set has more variability?

A

the data set with the larger s has more variability

36
Q

which measures go together, and when?
(median, mean, IQR, standard deviation)

A

median and IQR: skewed distributions
mean and SD: roughly symmetric distributions

37
Q

how to describe the distribution shown in a graph

A

SOCS
Shape (# of modes; symmetric or skewed?)
Outliers (do any exist?)
Center (mean or median)
Spread/variability (IQR or SD)

38
Q

suspected outliers in boxplots

A

points that lie at or below Q1 - 1.5 × IQR or at or above Q3 + 1.5 × IQR
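The 1.5 × IQR rule as code. This sketch follows the card's "at or below / at or above" wording; some textbooks use strict inequalities instead. The function name `suspected_outliers` is hypothetical, and Q1 and Q3 are passed in already computed (for the made-up data below, as medians of the lower and upper halves):

```python
def suspected_outliers(values, q1, q3):
    """Flag points at or below Q1 - 1.5*IQR or at or above Q3 + 1.5*IQR."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x <= low_fence or x >= high_fence]

data = [1, 3, 4, 6, 7, 8, 9, 12, 40]
# Q1 = 3.5 and Q3 = 10.5 here, so IQR = 7 and the fences are -7.0 and 21.0
print(suspected_outliers(data, q1=3.5, q3=10.5))  # [40]
```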