Chapter 1: Intro to statistics Flashcards

1
Q

What are cases, variables, and labels?

A

Cases are the objects described by the data. A variable is a characteristic of a case. Labels are special variables that distinguish between cases, e.g. the name of a song.

2
Q

What are the types of variables?

A

Categorical: places each case into one of several categories or groups. Continuous/quantitative: takes numerical values on which arithmetic operations can be performed.

The distribution of a variable tells us WHAT VALUES the variable takes and HOW OFTEN it takes each value.

Categorical: bar graph or pie chart (the pie chart is less flexible: the percentages must make up the whole pie, so all categories must be included)

Continuous: histograms, stemplots, timeplots

Stemplot: the stem is all but the final (rightmost) digit, the leaf is the final digit; write the leaves on each stem in ascending order
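
The stemplot rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `stemplot` and the data are made up, and it handles non-negative whole numbers only.

```python
from collections import defaultdict

def stemplot(values):
    """Build stemplot lines: stem = all but the last digit, leaf = last digit."""
    groups = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in ascending order
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest so empty stems show up as gaps.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

scores = [23, 25, 31, 31, 38, 47, 52, 52, 59]  # made-up data
print("\n".join(stemplot(scores)))
```

Keeping the empty stems is a design choice: it preserves the shape of the distribution, much like an empty class in a histogram.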

3
Q

Bar graph vs histogram

A

Bar graph: gives the number of cases in each category of a categorical variable. There can be space between bars.
Histogram: gives the DISTRIBUTION of a single continuous variable by showing the count in each of the numerical classes it is divided into. There is no space between bars unless a class has frequency 0.
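
A rough sketch of the difference in terms of counting, with made-up data and hypothetical helper names: a bar graph counts cases per category, while a histogram counts cases per numerical class of equal width.

```python
from collections import Counter

def category_counts(labels):
    """Bar graph heights: number of cases in each category."""
    return Counter(labels)

def histogram_counts(values, start, width, n_classes):
    """Histogram heights: number of cases in each class [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_classes:        # values outside the classes are ignored
            counts[i] += 1
    return counts

genres = ["rock", "pop", "rock", "jazz"]   # categorical variable (made up)
durations = [125, 180, 190, 210, 305]      # continuous variable in seconds (made up)

print(category_counts(genres))
print(histogram_counts(durations, start=120, width=60, n_classes=4))
```

In the second result the class from 240 to 300 has a count of 0, which is the only situation where a histogram shows a gap.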

4
Q

Overall pattern of distribution

A

Shape (symmetric, unimodal, right- or left-skewed), center, spread (min to max value), plus any outliers

5
Q

Right-skewed

A

The right side OF THE GRAPH (containing the half of the observations with the larger values) is longer than its left side

6
Q

mean vs median

A

Both give the center
Mean is the average; sensitive to outliers
Median is the midpoint (half the values are larger, the other half smaller); resistant to outliers
odd n: the middle number of the sorted obs
even n: average of the 2 middle numbers of the sorted obs
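
A small illustration with made-up numbers, using Python's standard `statistics` module: adding one extreme value pulls the mean a long way but barely moves the median.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]         # made-up observations, odd n
with_outlier = data + [60]     # same data plus one extreme value, even n

print(mean(data), median(data))                  # both equal 4
print(mean(with_outlier), median(with_outlier))  # mean jumps above 13, median is 4.5
```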

7
Q

Spread

A

Can be given by:
1) Quartiles: Q1, Q2, Q3
Q1 is the median of the obs to the left of the median
Q3 is the median of the obs to the right of the median
Q2 = median
IQR = Q3 - Q1
2) Five-number summary: Min Q1 M Q3 Max
Can be shown with a box plot. The whiskers extending from the box give the min and max values (THAT ARE NOT OUTLIERS).
3) 1.5×IQR rule: a value below Q1 - 1.5×IQR is an outlier
a value above Q3 + 1.5×IQR is an outlier
4) Standard deviation (s):
obs - mean: deviation
average of the squares of all deviations: variance (s^2)
square root of the variance is the SD
But the average divides by n-1: the degrees of freedom
USED only when the MEAN is the measure of center
SENSITIVE to outliers like the mean, even more so because the deviations are squared
s = 0 when all values are the same, i.e. no spread
SD preferred over variance as its UNITS are the same as the OBS
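
The measures of spread above can be sketched as follows. The function names and data are made up for illustration, and the quartiles use the "median of each half" convention from this card (software packages often use slightly different quartile rules).

```python
from math import sqrt
from statistics import mean, median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # obs to the left of the median
    upper = xs[half + (n % 2):]    # obs to the right of the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers_1_5_iqr(values):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

def sample_sd(values):
    """Square root of the variance, where the variance divides by n - 1."""
    m = mean(values)
    variance = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return sqrt(variance)

data = [1, 3, 4, 5, 6, 7, 9, 30]   # made-up observations
print(five_number_summary(data))
print(outliers_1_5_iqr(data))
print(sample_sd(data))
```

For this data the fences are -3.25 and 14.75, so only 30 is flagged as an outlier.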

8
Q

Measure of center and spread

A

Use the mean and SD only for reasonably symmetric distributions, as both are sensitive to outliers
For skewed distributions use the median and quartiles

9
Q

Linear transformation of variables

A

xnew = a + bx; x: obs
1) b multiplies the center (mean, median); the spread (SD, IQR) is multiplied by |b|
2) “a” adds to the center (mean, median) but NOT to the spread (SD, IQR), i.e. the spread remains the same

BUT the OVERALL SHAPE of the distribution does not change under a linear transformation
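
A quick numerical check, using made-up temperatures and the Celsius-to-Fahrenheit conversion (a = 32, b = 1.8) as an assumed example of xnew = a + bx:

```python
from statistics import mean, median, stdev

def transform(values, a, b):
    """Apply the linear transformation xnew = a + b*x to every observation."""
    return [a + b * x for x in values]

celsius = [10.0, 15.0, 20.0, 30.0]              # made-up observations
fahrenheit = transform(celsius, a=32, b=1.8)

# Center: multiplied by b, then shifted by a.
print(mean(fahrenheit), 32 + 1.8 * mean(celsius))
print(median(fahrenheit), 32 + 1.8 * median(celsius))
# Spread: multiplied by |b| only; adding a has no effect.
print(stdev(fahrenheit), 1.8 * stdev(celsius))
```

Each printed pair should agree up to floating-point rounding.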

10
Q

Density Curve

A
  • Draw a smooth curve over the histogram and scale it so that AUC = 1 (areas are proportions); the curve is always on or above the horizontal axis
  • Outliers are not described
  • Median: divides the area under the curve in half
  • Mean: balance point
  • Actual obs: x-bar and s; density curve: mu and sigma
11
Q

Normal distribution and Normal density curve

A
  • Distribution whose shape is symmetric, unimodal and bell-shaped
  • Its density curve is the NORMAL density curve
  • N(mu, sigma)
  • Mean = median, and it is the measure of center as the distribution is symmetric
  • If mu is changed, the curve moves right or left but the spread stays the same
  • Spread is determined by sigma
12
Q

68-95-99.7 Rule

A

In a Normal distribution, about 68% of obs fall within 1 sigma of mu, 95% within 2 sigma, and 99.7% within 3 sigma of mu
CHECK OUT SUMS
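
The three percentages can be checked with `NormalDist` from Python's standard `statistics` module (Python 3.8+). The helper name `prob_within` is made up for this sketch.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """Proportion of a Normal(mu, sigma) distribution within k sigma of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The result does not depend on mu or sigma, which is why the rule holds for every Normal distribution; the rounded figures 68, 95 and 99.7 are approximations of the exact areas.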

13
Q

Standard Normal Distribution

A
  • A variable x has a “z-score” (standardized value): z = (x - mu)/sigma
  • If x is Normal, z follows the standard Normal distribution
  • Always N(0,1)
  • CHECK OUT SUMS
  • Centered at 0, with nearly all values within 3 SD on either side (acc. to the 68-95-99.7 rule)
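
A minimal sketch of standardizing a value, with a hypothetical example in which heights are assumed to follow N(64, 3) in inches. The function name `z_score` is made up here.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: x = 70 under an assumed N(64, 3).
z = z_score(70, mu=64, sigma=3)
print(z)  # 2.0

# The proportion below x under N(64, 3) matches the proportion below z under N(0, 1).
print(NormalDist(64, 3).cdf(70), NormalDist(0, 1).cdf(z))
```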
14
Q

Standard Normal Table

A

Gives the AUC to the left of each z value
Table covers z from -3.4 to 3.4
SOLVE SUMS
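
As a way to check table look-ups, the cumulative distribution function of `NormalDist` in Python's standard `statistics` module returns the same "area to the left of z". The helper names below are made up for this sketch.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0, 1)

def area_left(z):
    """Area under the standard Normal curve to the left of z (the table entry)."""
    return STANDARD_NORMAL.cdf(z)

def area_right(z):
    """Area to the right of z: total area 1 minus the table entry."""
    return 1 - area_left(z)

def area_between(z_low, z_high):
    """Area between two z values: difference of two table entries."""
    return area_left(z_high) - area_left(z_low)

print(round(area_left(1.0), 4))
print(round(area_right(1.0), 4))
print(round(area_between(-1.96, 1.96), 4))
```

Areas to the right and between two z values are not listed in the table directly; they come from subtracting table entries, as the two helper functions show.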
