intro to stats Flashcards

1
Q

categorical variables

A

Values differ in type (category) rather than in amount
- Levels are usually string-based (character-based)
- Levels can be numbers if the numbers are used only as names (no numerical value is attached to them)

2
Q

integer variables

A
  • Numerical variable consisting of whole numbers
  • Numbers have real numerical meaning
3
Q

continuous variables

A
  • Numerical variables which can theoretically have infinite decimal places
4
Q

Dichotomous variables

A
  • only 2 levels
  • 0/1, Ctrl/Treatment, TRUE/FALSE
  • Can be categorical or integer
    Variables defined by data type
5
Q

nominal variables

A

categorical variables with no inherent order

6
Q

ordinal variables

A

ranked data
ex. 1st, 2nd, 3rd

7
Q

interval and ratio scale variables

A

can be integers or decimals
ratio: has a true zero, so ratios can be meaningfully calculated (ex. 0 K is the absence of heat)

interval: does not have a true zero (0 °C is not the absence of heat)
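
A minimal sketch of why ratios only make sense on a ratio scale. The temperatures (10 °C and 20 °C) are made-up illustration values, not from the card:

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale temperature (Celsius) to a ratio-scale one (Kelvin)."""
    return celsius + 273.15

# On the Celsius (interval) scale the ratio looks like "twice as hot".
c_ratio = 20.0 / 10.0

# On the Kelvin (ratio) scale, which has a true zero, the real ratio is about 1.035.
k_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print(f"Celsius ratio: {c_ratio:.3f}, Kelvin ratio: {k_ratio:.3f}")
```

The Celsius ratio depends on where the arbitrary zero was placed, so only the Kelvin ratio is meaningful.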

8
Q

variables defined by causal relationship

A

In an experimental setting, we manipulate the independent variable, and measure scores for the dependent variable

9
Q

nuisance variables

A
  • confounding variables can potentially change the value of the outcome variable, and vary systematically with the predictor variable
  • obscuring variables can also potentially change the value of the outcome variable, but do not vary systematically with the predictor variable
10
Q

experimental designs 1

A

Different individuals are in different experimental conditions
* Between-subjects designs
* Independent groups designs

11
Q

experimental designs 2

A

The same individuals are in different experimental conditions
* Within-subjects designs
* Repeated-measures designs

12
Q

mixed designs

A

some predictor variables are between-subjects and some are within-subjects

13
Q

inference

A

based on various methods such as hypothesis testing,
confidence interval estimation and parameter estimation

14
Q

inferential statistics

A

uses sample statistics to estimate the value of a population parameter

15
Q

parameter

A

a constant numerical characteristic of a population
- can include shapes (normal distribution), as shapes can be defined numerically

16
Q

statistic

A

corresponding value calculated for a sample

17
Q

population parameter and sample statistics symbols

A

standard deviation
- sigma: population parameter
- s: sample statistic

mean
- mu: population parameter
- M: sample statistic (or x bar)
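
As a rough illustration of the symbols above, the sketch below draws a sample from a simulated population and computes M and s as estimates of mu and sigma. The population values (mu = 100, sigma = 15), the sample size, and the seed are assumptions chosen for the example:

```python
import random
import statistics

# Hypothetical population parameters (assumed for illustration only).
MU = 100.0
SIGMA = 15.0

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.gauss(MU, SIGMA) for _ in range(1000)]

M = statistics.mean(sample)   # sample mean, estimates mu
s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1), estimates sigma

print(f"M = {M:.2f} (mu = {MU}), s = {s:.2f} (sigma = {SIGMA})")
```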

18
Q

i

A

index or individual
- refers to each score

19
Q

statistics are invented tools

A
  • statistical tools are invented to estimate probabilities that guesses are correct
20
Q

characteristics of popular statistical methods

A
  • common sense
  • ease of use
  • inertia (being good enough)
21
Q

x^2 (chi-squared)

A

the test statistic is a single number that represents how well the observed data fit the null hypothesis
- it needs to produce a single number that incorporates two properties of the data (number and proportion)
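
The card does not spell out a formula, so the sketch below assumes the usual Pearson form, the sum of (observed - expected)^2 / expected over categories. The coin-toss counts are made up for illustration:

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 100 coin tosses giving 60 heads and 40 tails.
observed = [60, 40]
# Counts expected under the null hypothesis of a fair coin.
expected = [50, 50]

stat = chi_squared(observed, expected)
print(stat)  # 10^2/50 + 10^2/50 = 4.0
```

A perfect match between observed and expected counts gives 0, and the statistic grows as the data move away from what the null hypothesis predicts.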

22
Q

probability distribution

A

divide counts (y) by the total number of simulations
- used to obtain probabilities associated with specific outcomes

23
Q

p value

A

probability of obtaining results at least as extreme as our observed results if H0 were true
- low p value = low probability of obtaining results like ours, if H0 is true
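
One way to sketch a simulation-based p value, assuming a fair-coin null hypothesis, 100 tosses, and a Pearson chi-squared statistic (none of which are fixed by the card). The function names, seed, and number of simulations are hypothetical choices:

```python
import random

def chi_squared(observed, expected):
    """Pearson's chi-squared: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_p_value(observed_heads, n_tosses=100, n_sims=5000, seed=1):
    """Proportion of simulated fair-coin experiments at least as extreme as the data."""
    rng = random.Random(seed)
    expected = [n_tosses / 2, n_tosses / 2]
    observed_stat = chi_squared([observed_heads, n_tosses - observed_heads], expected)
    hits = 0
    for _ in range(n_sims):
        # Simulate one experiment under H0 (fair coin).
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if chi_squared([heads, n_tosses - heads], expected) >= observed_stat:
            hits += 1
    # Dividing the count by the number of simulations gives a probability.
    return hits / n_sims

p_fair = simulate_p_value(50)    # data match H0 exactly
p_biased = simulate_p_value(65)  # data are far from what H0 predicts

print(p_fair, p_biased)
```

With 50 heads every simulated result is at least as extreme, so the p value is 1; with 65 heads very few simulated results are that extreme, so the p value is small.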

24
Q

classical statistics

A

calculates the theoretical probability distribution that would be obtained if the null hypothesis is correct
ex. df= k-1

25
Q

simulation vs. classical statistics

A

theoretical
- any value of x^2 is possible; the probability distribution is continuous

simulation-based
- only a limited number of x^2 values are possible
- ex. there is a small number of possible outcomes when counting heads/tails from 100 coin tosses
- the probability distribution is discrete

simulation-based methods are better because they only deal with possible outcomes
- but classical stats are more widespread
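
To make the "limited number of values" point concrete, this sketch lists every chi-squared value that 100 tosses of a fair coin can produce. It assumes the Pearson formula with expected counts of 50 heads and 50 tails:

```python
n_tosses = 100
expected = n_tosses / 2  # 50 heads and 50 tails expected under H0

# Compute the statistic for every possible number of heads (0 to 100)
# and keep only the distinct values.
possible = sorted({
    round((h - expected) ** 2 / expected
          + ((n_tosses - h) - expected) ** 2 / expected, 10)
    for h in range(n_tosses + 1)
})

print(len(possible), possible[:4], possible[-1])
```

Only 51 distinct values can occur (one per possible distance from 50 heads), whereas the theoretical distribution treats every non-negative value as possible.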

26
Q

how can we assess variability amongst a set of 6 scores?

A
  • calculate difference between each score and a single point
  • difference between each score and the mean
  • DEVIATION SCORE (xi -x bar)
27
Q

how to calculate mean deviation?

A
  1. ignore the signs (mean absolute deviation)
  2. remove the signs by squaring all deviation scores, calculating the average, then taking the square root (standard deviation)
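
The signs have to be removed because raw deviation scores always sum to zero. A small sketch with six made-up scores, using `statistics.stdev` (which divides by n - 1) for the second method:

```python
import statistics

scores = [2, 4, 4, 5, 7, 8]  # hypothetical set of 6 scores
mean = sum(scores) / len(scores)

# Deviation scores (xi - x bar): positives and negatives cancel out.
deviations = [x - mean for x in scores]
print(sum(deviations))  # 0.0, so a plain average of deviations is useless

# Method 1: ignore the signs (mean absolute deviation).
mad = sum(abs(d) for d in deviations) / len(scores)

# Method 2: square, average, square-root (standard deviation, n - 1 version).
s = statistics.stdev(scores)

print(f"MAD = {mad:.3f}, s = {s:.3f}")
```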
28
Q

MAD vs standard deviation (s)

A
  • outliers will distort estimates of s more than MAD, because larger deviation scores get even larger when squared
  • MAD is more intuitive, because s is the result of squaring, adding, and square-rooting
  • in real datasets, MAD estimates from a sample may be better estimates of the underlying population parameter than s

s
- s is one of the parameters used to define the normal distribution, which is centrally important in classical statistics
- Fisher (1920) demonstrated that in a perfect normal distribution, sample s is a better estimate of the population standard deviation than sample MAD is of the population MAD (s estimates its corresponding parameter better than MAD does)

s is the dominant measure of variation used in stats
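
A quick way to see the outlier effect is to compare how much each measure grows when one extreme score is added. The two datasets below are invented for the example, and `mad` is a helper defined here rather than a library function:

```python
import statistics

def mad(scores):
    """Mean absolute deviation from the mean."""
    m = statistics.mean(scores)
    return sum(abs(x - m) for x in scores) / len(scores)

clean = [2, 4, 4, 5, 7, 8]
with_outlier = [2, 4, 4, 5, 7, 38]  # same scores, but the last one is an outlier

# How many times larger each measure becomes once the outlier is present.
mad_growth = mad(with_outlier) / mad(clean)
s_growth = statistics.stdev(with_outlier) / statistics.stdev(clean)

print(f"MAD grows x{mad_growth:.2f}, s grows x{s_growth:.2f}")
```

In this example s inflates more than MAD, because the outlier's deviation score is squared before it is averaged.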

29
Q

standard deviation (s)

A

s and variance (s^2) are the primary measures of variation

divide by n - 1 (degrees of freedom)

s = the square root of the sum of squares divided by the degrees of freedom

the mean deviation score is calculated by summing the squared deviation scores, dividing by the number of scores that are free to vary, then taking the square root
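
The calculation described above can be sketched step by step and checked against Python's `statistics` module, which also divides by n - 1. The scores are made up for illustration:

```python
import math
import statistics

scores = [2, 4, 4, 5, 7, 8]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n

ss = sum((x - mean) ** 2 for x in scores)  # sum of squares
df = n - 1                                 # degrees of freedom
variance = ss / df                         # s^2
s = math.sqrt(variance)                    # s

print(f"SS = {ss}, df = {df}, s^2 = {variance:.2f}, s = {s:.3f}")
```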

30
Q

df

A

first calculate the sum of squares (SS): sum of (xi - x bar)^2

df (number of things that can vary): n - 1

Σx = n × x bar (so once the mean is known, only n - 1 scores are free to vary)

purpose of s = generate an estimate of average variation
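
A short sketch of why df is n - 1: because the sum of the scores must equal n × x bar, the last score is fixed once the mean and the other n - 1 scores are known. The numbers are hypothetical:

```python
n = 6
mean = 5.0  # known sample mean (assumed for the example)

# Any n - 1 scores can be chosen freely...
free_scores = [2, 4, 4, 5, 7]

# ...but the final score is then forced, because sum(x) must equal n * mean.
last_score = n * mean - sum(free_scores)

print(last_score)  # 30 - 22 = 8.0
```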

31
Q

normal distribution

A

natural variables commonly approximate the normal distribution
errors of measurement commonly approximate the normal distribution
means calculated from multiple samples drawn from a population will approximate the normal distribution

is a probability distribution
- common use: derive the probability that a score selected at random from a normally distributed population will fall within a specific range of values

32
Q

reading the normal distribution

A

distribution
y axis: ignore values
- for most plots: interested in value of y that corresponds to a value of x
- probability distributions: interested in the area under the curve between two values of x, expressed as a percentage of the total area under the curve

x-axis: number of standard deviations from the mean
- standard deviation: average deviation from the mean
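
Reading the curve as "area between two values of x" can be sketched with the standard library's `statistics.NormalDist`. The helper name `area_between` is made up for this example:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: x axis is in standard deviations from the mean

def area_between(lo, hi):
    """Proportion of the total area under the curve between two x values."""
    return z.cdf(hi) - z.cdf(lo)

within_1 = area_between(-1, 1)  # within 1 standard deviation of the mean
within_2 = area_between(-2, 2)  # within 2 standard deviations of the mean

print(f"{within_1:.1%} of scores within 1 sd, {within_2:.1%} within 2 sd")
```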

33
Q

summary of normal distribution

A

originated from attempts to stop disputes between gamblers

the distribution is an approximation of the binomial distribution with a large number of trials (games) and can be calculated simply from mu and sigma

the combination of mathematical simplicity and the usefulness of the normal distribution in modelling real variables and errors resulted in it holding a central position in classical statistics

if we know a population's mu and sigma, and know that the variable is normally distributed, we can easily estimate the probability that a score will be within a specific range of values
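
As a rough check on the binomial link, the sketch below compares the exact probability of 45 to 55 heads in 100 fair coin tosses with the normal estimate built only from the mean and standard deviation. The range, the choice of 100 tosses, and the half-unit continuity correction are assumptions added for the example:

```python
import math
from statistics import NormalDist

n, p = 100, 0.5
mu = n * p                          # mean of the binomial: 50
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of the binomial: 5

# Exact binomial probability of 45 to 55 heads (inclusive).
exact = sum(math.comb(n, k) for k in range(45, 56)) / 2 ** n

# Normal approximation over the same range, widened by 0.5 on each side
# (continuity correction) because the counts are whole numbers.
normal = NormalDist(mu, sigma)
approx = normal.cdf(55.5) - normal.cdf(44.5)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two answers agree closely, which is why knowing only the mean and standard deviation is enough to estimate the probability of a range of scores.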