Midterm Flashcards
data
observations collected from field notes, surveys, experiments, etc
what is the backbone of statistical investigation
data
statistics
study of how to collect, analyze, and draw conclusions from data
classic challenge in statistics
evaluating the efficacy of medical treatment
summary statistic
a single number summarizing a large amount of data
variables
a characteristic recorded for each case
data matrix
a way to organize data: each row is a case (observation), each column is a variable
numerical variable
wide range of numerical values, sensible to add/subtract/take averages
types of numerical variables
discrete, continuous
discrete
can only take numerical values with jumps (eg number of siblings)
continuous
can take numerical values without jumps (eg height)
categorical
responses are categories
types of categorical
ordinal, nominal
ordinal variable
categorical but have a natural ordering (eg Likert scale)
nominal variable
categorical and no natural ordering (eg favourite ice cream)
negative, positive, independent association
positive: both variables tend to increase together; negative: one tends to decrease as the other increases; independent: the variables show no association
population vs sample
group we want to make a generalization about vs the group we actually have information about
anecdotal evidence
data collected in haphazard fashion from individual cases, usually composed of unusual cases that we recall based on their striking characteristics
random sampling
avoid adding bias
simple random sample
most basic random sample, using raffle; every case in population has equal chance of being included
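As a sketch of the raffle idea (the population of case IDs here is made up), Python's `random.sample` draws without replacement, so every case has an equal chance of being included:

```python
import random

random.seed(1)  # fixed seed so the draw is reproducible

# hypothetical population of 100 case IDs
population = list(range(1, 101))

# simple random sample of 10 cases, drawn without replacement
srs = random.sample(population, k=10)
```

Because sampling is without replacement, the 10 selected IDs are guaranteed to be distinct.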
non response bias
low response rates can bias even a random sample, because those who respond may differ systematically from those who do not
convenience sample
individuals who are easily accessible are more likely to be included in the sample
explanatory variables
independent variable
response variables
dependent variable
observational studies
collection of data in way that doesn’t directly interfere with how the data arises
eg: collecting surveys, ethnography, etc
randomized experiment
when individuals are randomly assigned to a group
confounding variable
variable correlated with both the explanatory and response variables
aka: lurking variable, confounding factor, confounder
prospective study
identifies individuals and collects information as events unfold
eg: medical researchers may identify and follow a group of similar individuals over many years
retrospective study
collects data after events have taken place
eg: researchers may review past events in medical records
simple random sampling
every case in population has equal chance of being included
stratified sampling
divide-and-conquer; population is divided into strata (which are chosen so similar cases are grouped together), then a second sampling method (usually simple random) is employed within each stratum
eg: who in Canada goes to theme parks? intentionally oversampling PEI because if we didn’t, most of the respondents would probably be from other provinces like Ontario, and PEI might be skipped entirely
when is stratified sampling useful?
when cases in each stratum are very similar with respect to the outcome of interest
cluster sample
break up population into clusters, then sample a fixed number of clusters and include all observations from each sampled cluster
eg: surveying Saskatchewan children by sampling Saskatchewan schools randomly, then simple random sampling kids from the selected schools
multistage sample
like cluster sample, but collect random sample within each selected cluster
pros and cons of cluster and multistage sample
+cluster/multistage can be more economical than alternative sampling techniques
+most useful when there’s a lot of case-to-case variability within cluster but clusters themselves don’t look very different from one another
eg: neighbourhoods when they are very diverse
-more advanced analysis techniques are typically required
scatter plots (and their strengths)
provides case by case view of two numerical variables
+helpful in quickly spotting associations relating variables, trends, etc
dot plots
provides the most basic display for one variable; like a one-variable scatter plot
mean
common way to measure centre of distribution of data
- add up and divide by n
- often labeled as x-bar
μ
population mean
μx
used to represent which variable to population mean refers to
histograms
doesn’t show value of each observation
each value belongs to a bin
binned counts are plotted as bars on histogram
provide view of data density
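The binning step can be sketched in plain Python (the observations and bin width are made up for illustration); the per-bin counts are what the histogram plots as bar heights:

```python
# bin a small set of observations into width-10 bins,
# then count observations per bin (the bar heights)
observations = [3, 7, 12, 15, 18, 24, 25, 31, 44, 47]
bin_width = 10

counts = {}
for x in observations:
    lo = (x // bin_width) * bin_width  # left edge of x's bin
    counts[lo] = counts.get(lo, 0) + 1
```

Note that once binned, the individual values (e.g. 12 vs 18) are no longer visible, only the density per bin.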
pros and cons of histogram
convenient for describing shape of data distribution
doesn’t show the exact value of each observation
skewness
right skew (longer right tail) left skew (longer left tail) symmetric (equal tails)
one, two, three prominent peaks
unimodal, bimodal, multimodal
two measures of variability
variance, standard deviation
variance
the average squared deviation
σ², the standard deviation squared
standard deviation
σ
describes how far away the typical observation is from the mean
deviation
distance of an observation from its mean
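The three cards above fit together in a few lines; this sketch (made-up data) computes the sample variance as the average squared deviation and cross-checks it against the standard library:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)           # x-bar
deviations = [x - mean for x in data]  # distance of each observation from its mean
# sample variance: average squared deviation (dividing by n - 1)
variance = sum(d ** 2 for d in deviations) / (len(data) - 1)
sd = variance ** 0.5                   # standard deviation

# cross-check against the standard library
assert abs(variance - statistics.variance(data)) < 1e-9
```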
box plots
•summarizes data set using five statistics while also plotting unusual observations
•step 1: draw dark line denoting the median, which splits data in half
•step 2: draw rectangle to represent the middle 50% of the data
⁃aka interquartile range aka IQR
⁃measure of variability in data
⁃the more variable the data, the larger the standard deviation and IQR
⁃two boundaries are called first quartile and third quartile
⁃Q1 and Q3 respectively
⁃IQR = Q3 − Q1
•step 3: whiskers attempt to capture data outside of the box
⁃reach is never allowed to be more than 1.5 x IQR
•step 4: any observations beyond the whiskers are identified as outliers
•robust estimates: extreme observations have little effect on value
⁃median and IQR are robust estimates
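The box-plot steps above can be sketched numerically (data values made up, with one suspiciously large observation); `statistics.quantiles` supplies Q1, the median, and Q3, and the 1.5 × IQR fences flag outliers:

```python
import statistics

data = [1, 3, 4, 5, 6, 7, 8, 9, 10, 40]  # 40 is a suspected outlier

q1, median, q3 = statistics.quantiles(data, n=4)  # the three quartiles
iqr = q3 - q1
# whiskers reach at most 1.5 * IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Only 40 lands beyond the fences, matching step 4 of the card.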
Mapping Data
colours are used to show higher and lower values of a variable
not helpful for getting precise values
helpful for seeing geographic trends and generating interesting research questions
contingency tables
summarized data for two categorical variables
-each value in table represents number of times a particular combination of variable outcomes occurred
row totals
total counts across each row
column totals
total counts down each column
relative frequency table
replace counts with percentages or proportions
row proportions
computed as counts divided by row totals
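A small sketch of a contingency table with its row totals and row proportions (the counts are invented for illustration):

```python
# hypothetical contingency table: rows = group, columns = response
table = {
    "treatment": {"improved": 30, "not improved": 20},
    "control":   {"improved": 15, "not improved": 35},
}

# row totals: total counts across each row
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
# row proportions: each count divided by its row total
row_props = {
    row: {col: count / row_totals[row] for col, count in cells.items()}
    for row, cells in table.items()
}
```

Each row of `row_props` sums to 1, which is what makes rows of different sizes comparable.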
segmented bar plot
graphical display of contingency table information
mosaic plot
graphical display of contingency table information
-use areas to represent number of observations
probability
proportion of times the outcome would occur if we observed the random process an infinite number of times
law of large numbers
as more observations are collected, the proportion p̂ₙ of occurrences with a particular outcome converges to the probability p of that outcome
disjoint outcomes
aka mutually exclusive
when two outcomes cannot happen at the same time
probability distributions
table of all disjoint outcomes and their associated probabilities
complement of an event
all outcomes not in the event
sample space
set of all possible outcomes
independence
when knowing the outcome of one process provides no useful information about the outcome of the other
marginal probability
if a probability is based on a single variable
joint probability
probability of outcomes is based on two or more variables
defining conditional probability
two parts: outcome of interest and condition
condition
information we know to be true
conditional probability
the probability of the outcome of interest A given condition B, written P(A | B)
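The marginal/joint/conditional distinction can be computed from a table of counts; this sketch uses an invented smoking-and-disease table and the identity P(A | B) = P(A and B) / P(B):

```python
# hypothetical joint counts for two variables:
# smoker status (condition B) and disease status (outcome A)
counts = {
    ("smoker", "disease"): 20,
    ("smoker", "no disease"): 80,
    ("non-smoker", "disease"): 10,
    ("non-smoker", "no disease"): 190,
}
total = sum(counts.values())

# joint probability P(A and B): based on both variables
p_joint = counts[("smoker", "disease")] / total
# marginal probability P(B): based on the single variable "smoker status"
p_smoker = (counts[("smoker", "disease")]
            + counts[("smoker", "no disease")]) / total
# conditional probability P(A | B) = P(A and B) / P(B)
p_disease_given_smoker = p_joint / p_smoker
```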
tree diagrams
organize outcomes and probabilities around the structure of data
when are tree diagrams most useful?
when two or more processes occur in a sequence and each process is conditioned on its predecessors
expected value of X
average outcome of X
denoted by E(X)
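E(X) is each outcome weighted by its probability; a one-line sketch with a made-up distribution:

```python
# hypothetical distribution of X = number of books a student buys
distribution = {0: 0.20, 1: 0.55, 2: 0.25}

# E(X) = sum over outcomes of x * P(X = x)
expected_value = sum(x * p for x, p in distribution.items())
```

Here E(X) = 0(0.20) + 1(0.55) + 2(0.25) = 1.05, the average outcome over many repetitions.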
deductive
reasoning from general theory down to specific hypotheses and observations
inductive
reasoning from specific observations and experience up to general theory
wheel of science
theory -> (deduction) -> hypotheses -> observations -> (induction) -> empirical generalizations -> back to theory
measurement
downward part of wheel of science
conceptualization vs operationalize
“lack of money” vs “lack of opportunity” are two conceptualizations of poverty
“do you have enough money to feed your family?” operationalizes the conceptualization of poverty
different conceptualizations often require different operationalizations
quantitative vs qualitative
a little about a lot of people vs a lot about a few people
administrative data
growing source
digital data that is collected in the process of administering other social goals
everything from information attached to social health number to credit card number
hard to make generalizations beyond the population
eg using database dealing with health cards is hard to generalize to all of Canada because people who didn’t use health cards would be completely ignored
survey research
designed to ask research questions
responses distilled into data that we work with
measurement necessitates some simplification because we need to compare across different groups of people
population vs sample
group we want to make a generalization about vs the group we actually have information about
census
rare kind of sample that covers an entire population; can be very expensive
basically the opposite of an anecdote
snowball sampling is often used for?
vulnerable communities like illegal immigrant workers in America
experiments
typically create artificial situations that are designed to isolate variables of interest and their effects
R
increasingly popular open-source statistical software
accessible because it’s free
SPSS
popular for undergrads and certain fields
designed for doing experiment research
Stata
popular among sociologists and economists
stacked dot plot
higher bars represent areas where there are more observations
makes it easier to judge the centre and shape of the distribution
questionnaire
contains actual phrasing of question and options for the responses
codebook
summarizes the data set; tells us what the variable names mean, like a dictionary
CANSIM
micro data, summary statistics (overall estimates)
ODESI
contains confidential information
we can use the public-use parts of ODESI, in which everything is anonymized and variables have been “tweaked” a little in order to make sure that information can’t be traced back to respondents
Research Data Centres
stuff you can’t find on PUMFs
measures of central tendency
mode, median, mean
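All three measures of central tendency are in Python's `statistics` module; this sketch (ages invented, echoing the "53 is the mode" example below) also shows the outlier pulling the mean above the median:

```python
import statistics

ages = [21, 22, 22, 23, 24, 24, 24, 25, 53]  # 53 is an outlier

mode = statistics.mode(ages)      # most common value
median = statistics.median(ages)  # middle observation (n is odd here)
mean = statistics.mean(ages)      # add up and divide by n
```

The mean (about 26.4) sits well above the median (24) because it is susceptible to the outlier, while the median and mode are not.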
pros and cons of mode
+can be used for all types of measures, relatively quick/simple measure
-doesn’t use much information; most common doesn’t necessarily mean typical (eg: if the modal age is 53, plenty of people are still other ages)
how to calculate median
odd: middle observation
even: average of two middle observations
pros and cons of median
+captures the actual centre of the distribution, less susceptible to outliers
-computationally awkward, cannot be estimated for unordered categorical variables
percentiles
general concept, closely related to median (median = 50th percentile)
there are 100 percentiles in total
interquartile range
the interval between the 25th and 75th percentiles
90th percentile
90% of observations are lower, 10% are higher
25th percentile
25% of observations are lower, 75% are higher
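`statistics.quantiles` with n=100 gives the 99 cut points between percentiles; a sketch with made-up scores of 1 through 100:

```python
import statistics

scores = list(range(1, 101))  # made-up scores 1..100

# 99 cut points splitting the data into 100 groups;
# the 25th, 50th, and 90th percentiles are entries 24, 49, and 89
pct = statistics.quantiles(scores, n=100)
p25, p50, p90 = pct[24], pct[49], pct[89]
```

As the cards describe, about 25% of scores fall below p25 and about 90% fall below p90; p50 equals the median.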
mean cons
more susceptible to outliers
measures of dispersion
aim to give us a sense of the breadth of the distribution
range
interval between smallest and largest values
pros and cons for range
+good for quick check
-only takes into account two observations, very sensitive, only useful for numeric variables
pros and cons of SD
+variance and SD take into account all scores, accurately describes “typical” deviation, easily interpreted
-sensitive to outliers, can only be calculated for numerical variables
proportions
raw frequencies make comparisons between groups of different sizes difficult, so proportions standardize frequencies by the number of cases
frequency cons
working with them is tough when trying to conceptualize comparisons
-this can be fixed by changing them into percentages
cumulative percentage
the percentage in the category + the category under it
only works for ordinal variables
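A sketch of the running-total idea for an ordinal variable (the Likert frequencies are invented); each category's cumulative percentage is its own percentage plus everything below it:

```python
# frequencies for an ordinal variable (Likert-style), listed in order
freq = {"disagree": 10, "neutral": 15, "agree": 25}
total = sum(freq.values())

cumulative = {}
running = 0
for category, count in freq.items():  # dicts preserve insertion order
    running += count
    # percentage in this category plus all categories under it
    cumulative[category] = 100 * running / total
```

The ordering matters, which is why this only makes sense for ordinal (not nominal) variables; the last category always reaches 100%.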
random process
a process where we know what outcomes can happen, but we don’t know which particular outcome will happen
rules for probability distribution
- outcomes listed are disjoint
- each probability must be between 0 and 1
- all probabilities must total 1
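The two numeric rules above can be written as a small checker (the function name is made up; the fair-die example is standard):

```python
def is_valid_distribution(probs):
    """Check the numeric rules for a probability distribution:
    each probability is between 0 and 1, and they total 1."""
    return (all(0 <= p <= 1 for p in probs)
            and abs(sum(probs) - 1) < 1e-9)

fair_die = [1 / 6] * 6  # six disjoint outcomes, equal probability
```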
algebra of probability
if an event is made up of disjoint outcomes whose probabilities we know, we can find the probability of the event by adding them
continuous distribution
another way of summarizing information
-more advanced mathematical concept than bar graph
the line is called probability density function
-describes information in graph
-has interesting properties
-can be used to infer probability of any outcome
-never loops back (the curve only moves from left to right)
-never dips below zero
-the area under the curve adds up to 1
the area equals P
the area under the curve gives the probability of people falling in that range
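A sketch of "area under the curve = probability", using a made-up uniform density on [0, 2] (height 0.5, so the total area is 1) and a simple Riemann-sum approximation:

```python
# uniform density on [0, 2]: f(x) = 0.5 there, 0 elsewhere
def density(x):
    return 0.5 if 0 <= x <= 2 else 0.0

def area(f, lo, hi, steps=10_000):
    """Midpoint Riemann-sum approximation of the area under f on [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

total_area = area(density, 0, 2)     # whole curve: should be 1
p_between = area(density, 0.5, 1.0)  # P(0.5 < X < 1.0) = 0.25
```

The whole curve encloses area 1 (the distribution rule), and the area over any sub-range gives the probability of an outcome falling in that range.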