Lec 1 & TB Flashcards

1
Q

2 main types of data

A
  • Continuous
  • Discrete
2
Q
  • Define discrete data and name its 2 types
    • 3 types of categorical data
A

Discrete data:

  • Counts: # of times something happens
  • Categorical: put into categories
    • Binary (0/1)
    • Ordered/ordinal: the order is meaningful, but the distance b/w categories is not
      • E.g. S, M, L; agree, neutral, disagree
      • IOW: no meaningful distance from S to M
    • Unordered/nominal: names; no logical order
      • E.g. race, sex
3
Q

Define continuous data

  • 2 types of cont data
A
  • Cont data: quantities that can, in principle, be measured to infinite precision (eg height, weight, BP); differences b/w values are meaningful
  • Sometimes data that are technically discrete are treated like cont data (eg SF-36 QoL instrument)
    • Eg: a single Likert item is ordinal
      • Goal: add the items up -> total score
      • Since the total combines several different variables, it ends up behaving LIKE a cont variable
  • 2 types of cont data
    • Ratio: has a TRUE, meaningful zero (eg height, weight)
    • Interval: zero is arbitrary (eg scale data, temp)
      • Eg 0 in °C: freezing pt of H2O
      • Eg 0 in °F: freezing pt of salt H2O (brine)
4
Q

define “X”

A
  • “X”: variable of interest
  • “xi”: the i-th observation; the subscript identifies a specific item in the data set
  • “N”: # in pop
  • “n”: lower case is # in sample
  • fi = frequency of xi
  • f = total # of observations in an interval
  • ∑ = sum
  • Greek letters represent pop characteristics (parameters)
    • µ = pop mean
    • σ² = pop variance
    • σ = pop SD
  • Roman letters represent sample characteristics (stats)
    • x̄ = sample mean
    • s² = sample variance
    • s = sample SD
5
Q

Type of data described by descriptive statistics

What does the distribution of data tell us?

A
  • Descriptive stats describe characteristics relating to distribution of data
  • The most appropriate descriptive stats depend on data distribution
  • Distribution of data = pattern of observations
6
Q

mean

  • Where is the mean if the graph is right skewed?
  • Where is the mean if the graph is left skewed?
  • Geometric mean
  • Arithmetic mean

median

  • define
  • odd vs even # of observations

mode

A

mean = avg

  • +ve or right skewed: mean is pulled to the right (toward the long tail)
  • −ve or left skewed: mean is pulled to the left
  • Geometric mean: multiply the n values, then take the nth root
  • Arithmetic mean: add the values, then divide by n

Median

  • (Q2): middle value of the sorted data
  • Odd # of observations: median = middle #
  • Even # of observations: median = avg of the 2 middle values

Mode

  • The most frequently occurring value in the data set
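
Worked example (a minimal sketch, not from the lecture): the data values below are invented for illustration, and the calls come from Python's standard `statistics` module (`geometric_mean` needs Python 3.8+). It shows the three measures side by side and how a long right tail pulls the mean above the median.

```python
import statistics

# Hypothetical right-skewed sample (values invented for illustration).
data = [1, 2, 2, 3, 4, 8, 22]

arithmetic_mean = statistics.mean(data)           # add, then divide by n
geometric_mean = statistics.geometric_mean(data)  # multiply, then take the nth root
median = statistics.median(data)                  # middle value of the sorted data
mode = statistics.mode(data)                      # most frequently occurring value

# Even number of observations: median is the average of the two middle values.
even_median = statistics.median([1, 2, 3, 4])

print(arithmetic_mean, geometric_mean, median, mode, even_median)
```

Here the arithmetic mean (6) sits to the right of the median (3), which is the pattern expected for right-skewed data.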
7
Q
  • 5 number summary
  • How to get Q1 and Q3
A
  • 5 # summary: min, Q1, Q2, Q3, max
  • Quartiles
    • Sort the data
    • Q1 = (n+1)/4 th ordered observation
    • Q3: 3(n+1)/4 th ordered observation
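
Worked example (a sketch under assumptions): the helper names `ordered_position_value` and `five_number_summary` are hypothetical, and the linear interpolation used when (n+1)/4 is not a whole number is an assumption, since the card does not say how to handle fractional positions. Other textbooks and software use slightly different quartile conventions.

```python
def ordered_position_value(sorted_data, position):
    """Value at a 1-based, possibly fractional, position in sorted data.

    Fractional positions are resolved by linear interpolation between
    neighbours (an assumption; the card does not specify this).
    """
    lower = int(position)        # whole part of the position
    fraction = position - lower  # fractional part of the position
    if lower < 1:
        return sorted_data[0]
    if lower >= len(sorted_data):
        return sorted_data[-1]
    low_value = sorted_data[lower - 1]
    high_value = sorted_data[lower]
    return low_value + fraction * (high_value - low_value)


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the (n+1)/4 position rule."""
    ordered = sorted(data)  # step 1: sort the data
    n = len(ordered)
    q1 = ordered_position_value(ordered, (n + 1) / 4)
    q2 = ordered_position_value(ordered, (n + 1) / 2)
    q3 = ordered_position_value(ordered, 3 * (n + 1) / 4)
    return ordered[0], q1, q2, q3, ordered[-1]


# Hypothetical data: n = 7, so Q1, Q2, Q3 are the 2nd, 4th and 6th ordered values.
summary = five_number_summary([7, 1, 5, 3, 9, 11, 13])
print(summary)
```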
8
Q

Formulas

  • sample variance
  • coefficient of variation
A

Sample variance: s² = ∑(xi − x̄)² / (n − 1)

Sample standard deviation: s = √(s²)

Interquartile range: IQR = Q3 − Q1

Range: Max − Min

Coefficient of variation: CV = (s / x̄) × 100% (only valid for ratio data)
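
Worked example (a minimal sketch): the measurements are invented, and the sample variance is computed with the usual n − 1 denominator, which is assumed to match the formula in the missing image.

```python
import math

# Hypothetical ratio-scale measurements (invented for illustration).
data = [4, 8, 6, 5, 7]

n = len(data)
mean = sum(data) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
sample_variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Square root restores the original units.
sample_sd = math.sqrt(sample_variance)

data_range = max(data) - min(data)

# Coefficient of variation, as a percentage of the mean.
cv_percent = sample_sd / mean * 100

print(sample_variance, sample_sd, data_range, cv_percent)
```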

9
Q

Degrees of freedom

Why do we √ the variance?

When is the empirical rule used?

Empirical rule

How do we determine outliers if the data are asymmetrical, not normally distributed?

A

Degrees of freedom

  • df of an estimate is the # of independent pieces of info used to obtain the estimate
  • √ of the variance gives the SD and restores the original units
  • If the freq distribution is symmetrical and bell-shaped (normal distribution), the empirical rule is used
  • Empirical rule: for a normal distribution, nearly all the data lie w/in 3 SD of the mean
    • ~68% of data lie w/in µ ± σ
    • ~95% of data lie w/in µ ± 2σ
    • ~99.7% of data lie w/in µ ± 3σ
  • IOW: only ~0.3% of data fall beyond 3 SD; such values can be flagged as outliers
  • When the distribution is not normal / is asymmetric, outliers are identified as values
    • < Q1 − (1.5 × IQR)
    • > Q3 + (1.5 × IQR)
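
Worked example (a sketch, not the lecture's own code): the first part checks the empirical-rule percentages against the standard normal distribution using `statistics.NormalDist` (Python 3.8+); the second applies the 1.5 × IQR fences. The function name `iqr_outliers` and the data are hypothetical, and Q1 = 3, Q3 = 11 are assumed to come from the (n+1)/4 position rule for these 7 values.

```python
from statistics import NormalDist

# Share of a normal distribution within k SD of the mean, for k = 1, 2, 3.
standard_normal = NormalDist()  # mean 0, SD 1
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}


def iqr_outliers(data, q1, q3):
    """Return values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical asymmetric data; fences are 3 - 12 = -9 and 11 + 12 = 23.
values = [1, 3, 5, 7, 9, 11, 40]
outliers = iqr_outliers(values, q1=3, q3=11)

print(within, outliers)
```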
10
Q

Graphs that show distribution

Graphs that show association

Models for inferences

A
  • Distribution: histogram, density plot, box-whisker plot, quantile-quantile (Q-Q) plot
  • Association: scatter plot
  • Inferences
    • t-tests (parametric), Wilcoxon (non-parametric)
    • Linear regression, analysis of variance (ANOVA)
11
Q

Graphs

  • con
  • most common graph

Descriptive stats

  • what is displayed
  • how do we display relationships
A
  • Graphs
    • Challenging to display at times
    • Usually use dotplots
  • Descriptive stats
    • # or prop (%) in each category
    • Crosstabulations b/w categorical variables (when there are multiple variables) to display relationships
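
Worked example (a minimal sketch): the records and the variable names `sex` and `smoker` are invented for illustration. It counts each category, converts counts to proportions, and builds a crosstabulation of two categorical variables with `collections.Counter`.

```python
from collections import Counter

# Hypothetical records: (sex, smoker) pairs invented for illustration.
records = [
    ("F", "yes"), ("F", "no"), ("M", "no"),
    ("M", "no"), ("F", "no"), ("M", "yes"),
]

# Number and proportion in each category of one variable.
counts = Counter(sex for sex, _ in records)
proportions = {category: count / len(records) for category, count in counts.items()}

# Crosstabulation: count of each (sex, smoker) combination.
crosstab = Counter(records)

for (sex, smoker), count in sorted(crosstab.items()):
    print(sex, smoker, count)
```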
12
Q

Inference and stat models

  • binomial test
  • Fisher’s exact or χ² test
A

Inference and stat methods

  • Binary data: binomial (prop) test (single sample)
  • Fisher’s exact or χ² (chi-square) test (comparing samples)
  • More than 2 categories: use the chi-square test
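
Worked example (a sketch under assumptions): the function name `chi_square_statistic` and the 2 × 2 table are hypothetical. It computes the Pearson chi-square statistic, ∑ (observed − expected)² / expected, without a continuity correction, plus the degrees of freedom; turning the statistic into a p-value would need a chi-square table or a statistics package.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 2 table: rows = groups, columns = outcome yes/no.
statistic, df = chi_square_statistic([[10, 20], [20, 10]])
print(statistic, df)
```

With very small expected counts, Fisher's exact test is usually the safer choice for a 2 × 2 table.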
13
Q
  • Population
  • Population parameters
  • Sample
  • Sample stats
A

Population vs samples

  • Pop
    • Collection of all possible subjects
    • Parameters: unknown constants to estimate
      • µ = mean
      • σ² = variance
      • π = proportion w/ characteristic
  • Sample
    • Subset of pop (used to estimate pop characteristics)
      • x̄ = mean
      • s² = sample variance
      • p = proportion in the sample w/ characteristic
    • Sample stats: variable b/c they depend on the particular sample
      • Used to estimate pop parameters
  • Mnemonic (Roman to Greek): “m” “mu” “mean”; “s” “sigma” “sd”; “p” “pi” “prop”

14
Q

Describe box plot

A

Box whisker
