Final POLI 399 Quantitative Research Methods Flashcards

Question

type 2 error

Answer 1

Failing to reject the null hypothesis when it is false: you do not reject the null when you should, so you conclude there is no relationship when there is one.

Answer 2

Used to describe the distribution of the data, or how spread out the data are. Explaining variation is the goal of the scientific method. Examples: interquartile range, variance and sum of squares.
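
A minimal sketch of the three measures named above, using Python's standard library. The scores are hypothetical, and the variance shown is the population version (sum of squares divided by N); a sample variance would divide by N - 1 instead.

```python
from statistics import mean, pvariance, quantiles

# Hypothetical scores, for illustration only.
scores = [2, 4, 4, 5, 7, 9, 11]

m = mean(scores)

# Sum of squares: total squared deviation from the mean.
sum_of_squares = sum((x - m) ** 2 for x in scores)

# Population variance: the average squared deviation.
variance = sum_of_squares / len(scores)

# Interquartile range: distance between the 1st and 3rd quartiles.
# quantiles() uses its default "exclusive" method; other quartile
# conventions can give slightly different cut points.
q1, _median, q3 = quantiles(scores, n=4)
iqr = q3 - q1

print(sum_of_squares, variance, iqr)
```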

Answer 3

Individual values of a standardized variable: the exact number of standard deviation units any particular case lies above or below the mean (on a normal distribution, also the most typically occurring value). On a normal distribution, about 95% of cases fall within 2 z-scores of the mean.
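
As a rough illustration of the definition above, a z-score is the case's distance from the mean divided by the standard deviation. The values and the function name `z_score` are made up for this sketch, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

# Hypothetical test scores, for illustration only.
values = [50, 60, 70, 80, 90]

mu = mean(values)       # mean of the variable
sigma = pstdev(values)  # population standard deviation (an assumption)

def z_score(x, mu, sigma):
    """Number of standard deviation units x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(90, mu, sigma))  # positive: above the mean
print(z_score(50, mu, sigma))  # negative: below the mean
```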

Answer 4

Statistical Package for the Social Sciences

Answer 5

empiricism, intersubjectivity, explanation, determinism

Answer 6

Every knowledge claim must be verified by systematic observation. Assumes that our senses give us the most accurate information about the world. Explanations must be supported by observable evidence. Assumes an objective reality exists. Guards against bias.

Answer 7

prejudiced for or against a particular idea or explanation

Answer 8

Agreement between individuals about how we do work using the scientific method. Views science as a way of knowing, with shared standards for determining what is empirically acceptable. Two parts: transmissibility (the steps followed in the research are reported so that someone can repeat it) and replicability (when the work is repeated, the same results are obtained). Allows other researchers to test for bias. Journals have replication policies and peer review.

Answer 9

Risk averse: observe systematically with criteria established in advance, avoid overgeneralizations, test alternative explanations, make more observations, guard against bias. Versus: jump to conclusions, overlook contradictory evidence and explain it away.

Answer 10

1. Never true or proven, irrespective of how many times they have been subjected to testing. Still capable of making a knowledge claim. 2. Must be testable and thus able to be proven wrong; in other words, falsifiable. 3. Disconfirming evidence must always be possible.

Answer 11

Push to become more scientific in the 1950s and 1960s in the US.
1. Human reaction problem: people perform or perceive differently when they know they are being observed. A problem for empiricism, because objective reality cannot be observed. However, reactivity is not an insurmountable methodological barrier.
2. Can never be value free, because value-laden individuals are studying value-laden phenomena. A problem for intersubjectivity and bias. Value commitments should be recognized and made explicit, as is the case with all scientific work (Gunnar Myrdal). Testing alternative interpretations and forming definitional agreement across the spectrum of value commitments is powerful.
3. Too complicated to explain in a generalizable way. A problem for determinism, but also not insurmountable: empirical laws do exist.
4. Free will: causation is impossible because humans are free to behave as they wish. A problem for determinism. However, people are confined by the context they are in and bound by law. There are not endless possibilities, so a pattern will eventually emerge.
5. Every person is unique and behaves differently. The idea that nothing is shared is not a compelling critique and not necessarily true.

Answer 12

Women were absent from the work as researchers and as subjects. Critiques methodological norms.
1. Philosophical: researchers previously asserted their work was value free, but because of this critique that is no longer the case. Science cannot be value free.
2. Moral: research ethics. Treat people as humans, not subjects. Foundational work was done in response to this.
3. Practical: the most forceful at this point. The inclusion of women and people of colour should be mandatory when attempting to generalize to the population; otherwise the conclusion will be distorted and will misrepresent the population.

Answer 13

An inferential statistic (used to generalize from sample to population) used to discern statistical significance, or the probability that the relationship occurred by chance. Can have type 1 or type 2 error. Based on a theoretical probability distribution that gives the likelihood of each degree of relationship occurring in the sample if there were no relationship in the population from which the sample was drawn. Assumes no relationship and compares that to the relationship at hand. A p-value of 0.05 or less means a relationship this strong would appear by chance in no more than 5% of samples if there were no relationship in the population; loosely, the researcher can be 95% confident the relationship did not occur by chance.
1. The relationship must be hypothesized in advance.
2. Random sample: the odds of inclusion are known for everyone in the population and no one has a zero chance.
3. No more than 20% of cells should have an expected frequency of less than 5 (the usual rule of thumb).
4. A non-significant chi-square means there is no evidence of a relationship in the population; it does not indicate that the sample is unrepresentative of the population.

Answer 14

Step 1: State the null hypothesis, which asserts that there is no relationship between the IV and DV. No relationship would mean that in a cross tabulation there are no gaps across the columns. The goal is to reject the null hypothesis.
Step 2: Calculate the expected cell frequencies. If there is no relationship between the IV and DV, each cell percentage should be the same as the marginal row percentage. Expected frequency = (column marginal x row marginal) / total number of cases.
Step 3: Compare the expected cell frequencies with the observed cell frequencies (observed minus expected, squared).
Step 4: Adjust for sample size: divide each squared difference by its expected frequency, then sum across all cells.
Step 5: Calculate degrees of freedom: (# of columns - 1) x (# of rows - 1).
Step 6: Consult the theoretical chi-square distribution.
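
The steps above can be sketched in a few lines of Python. The table is hypothetical (rows are DV categories, columns are IV categories) and the function name `chi_square` is an assumption of this sketch; the arithmetic is the standard Pearson chi-square calculation.

```python
# Hypothetical 2x2 crosstab: rows = DV categories, columns = IV categories.
observed = [
    [30, 10],
    [20, 40],
]

def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a crosstab."""
    row_totals = [sum(row) for row in observed]        # row marginals
    col_totals = [sum(col) for col in zip(*observed)]  # column marginals
    n = sum(row_totals)                                # total number of cases

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Step 2: expected frequency under the null hypothesis.
            expected = row_totals[i] * col_totals[j] / n
            # Steps 3 and 4: squared gap, adjusted by the expected frequency.
            chi2 += (obs - expected) ** 2 / expected

    # Step 5: degrees of freedom.
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

chi2, df = chi_square(observed)
print(chi2, df)
```

Step 6 is then a table lookup: with 1 degree of freedom, the critical value at p = 0.05 is 3.841, so a statistic larger than that leads to rejecting the null hypothesis.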

Answer 15

A measure of association is a descriptive statistic that tells the researcher how strong the observed relationship between the IV and DV is. The appropriate measure depends on the level of measurement and must correspond to the lower level of the two variables.
PRE (proportionate reduction in error) measures are a specific type of measure of association because they can be interpreted in a particular way: they indicate how much the researcher's ability to predict values of the DV increases when the values of the IV are known. Interpreted as the percentage increase in predictive ability, ranging from 0 to 1.0. For example, 0.3 is a 30% increase in predictive ability, a strong relationship. Always try to use PRE measures because they are the most precise: they state with percentage precision how much predictive ability increases. Non-PRE measures can only state the degree of the relationship.
Nominal: Lambda. Asymmetric measure. The best prediction is the modal category, so find the mode within every IV category. Lambda = (total error without knowing the IV - total error when the IV is known) / total error without knowing the IV. Misleading because it is always 0 if the modal category is the same for all categories of the IV.
Ordinal or higher: Gamma. Symmetric. Based on the probability of drawing a positive pair: a pair of cases ranked in the same order on both variables. Ranges from -1 to +1, perfect inversion to perfect agreement. Ignores every pair of cases tied on the IV or DV, so it overstates the relationship. Tau-b for square tables, tau-c for rectangular tables.
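
A small sketch of lambda as a PRE measure, using the standard Goodman-Kruskal form (E1 - E2) / E1, where E1 is the number of prediction errors made without knowing the IV and E2 is the number made when the IV is known. The tables and the function name `lambda_pre` are hypothetical; rows are DV categories and columns are IV categories.

```python
def lambda_pre(table):
    """Lambda for predicting the row variable (DV) from the column variable (IV)."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)

    # E1: errors when always guessing the overall modal DV category.
    e1 = n - max(row_totals)

    # E2: errors when guessing the modal DV category within each IV category.
    e2 = sum(sum(col) - max(col) for col in zip(*table))

    # Assumes e1 > 0, i.e. the DV has more than one non-empty category.
    return (e1 - e2) / e1

# The mode differs across IV categories: knowing the IV helps.
helpful = [[30, 10],
           [20, 40]]

# The same row is modal in every IV category: lambda is 0 even though
# the column percentages differ, which is the misleading case.
same_mode = [[30, 40],
             [20, 10]]

print(lambda_pre(helpful), lambda_pre(same_mode))
```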

Answer 16

A sample is data collected from a select group of people within a population; the population is every member of the group being studied. Two methodological categories: probability vs. non-probability.
Probability: every person in the population has a non-zero probability of being included in the sample, and that probability is known. Needs a population list or a surrogate population list. Advantageous because it avoids bias and allows inferential statistics.
Non-probability: the probability of a person in the population being included is not known, and it is not certain that everyone in the population has a non-zero chance of inclusion. Cannot do inferential statistics. Used for economic and convenience reasons, or when there is no access to a full population list.
To increase accuracy, increase the size of the sample or reduce variability by stratifying.

Answer 17

Simple Random Sample: every member of the population has an equal, non-zero probability of being included in the sample. Can be created using a lottery method for a small population or a random number generator for a large population. Can produce extreme samples, because every possible sample is equally likely.

Answer 18

Systematic Random Sample: the size of the total population is divided by the researcher's desired sample size. This creates a sampling interval (k). Using a population list, randomly select the first person and then choose every kth person after that. Reduces the risk of extreme samples, but can still produce them if there is a cyclical pattern in the population list.
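
A rough sketch of the procedure, assuming the random start is drawn from the first k entries of the list and that the population size divides evenly by the sample size. The population list and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(population, sample_size, rng):
    """Pick a random start within the first interval, then every kth case."""
    k = len(population) // sample_size  # sampling interval
    start = rng.randrange(k)            # random start in the first k entries
    return population[start::k][:sample_size]

# Hypothetical population list of 1,000 ID numbers.
population = list(range(1000))
rng = random.Random(42)  # fixed seed so the example is repeatable

sample = systematic_sample(population, 100, rng)
print(sample[:5])
```

If the list itself has a cycle of length k (for example, every 10th entry is a corner house), every selected case shares that trait, which is the risk noted above.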

Answer 19

Proportionate Stratified Random Sample: used when the researcher is aware of certain characteristics that must be represented in the sample. Groups holding those characteristics are sampled according to their population weight, or the proportion of that group within the population. Three steps:
1. Divide the population into groups that hold the same characteristic. These groups are homogeneous. The characteristics used to create the groups are called stratification variables and must be related to the phenomenon being studied.
2. Use a simple random sample to select a sample from within each homogeneous group.
3. Combine the samples to produce one sample that is representative and weighted to the population.
Produces less extreme samples because of the stratification, but it may be difficult to operationalize a theoretically important characteristic.
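
One way to sketch the three steps in Python. The strata, their sizes and the function name are hypothetical, and rounding each stratum's share is an assumption of this sketch (with awkward proportions the total can be off by one).

```python
import random

def proportionate_stratified_sample(strata, total_n, rng):
    """Draw a simple random sample from each stratum, sized by population weight."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Stratum sample size = total sample size x stratum's population weight.
        n_stratum = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, n_stratum)
    return sample

# Hypothetical population: 70% urban, 30% rural.
strata = {
    "urban": list(range(0, 700)),
    "rural": list(range(700, 1000)),
}
rng = random.Random(1)  # fixed seed so the example is repeatable

sample = proportionate_stratified_sample(strata, 100, rng)
print({name: len(cases) for name, cases in sample.items()})
```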

Answer 20

Disproportionate Stratified Random Sample (CES): follows the same three steps as a proportionate stratified sample.
1. Divide the population into groups that hold the same characteristic. These groups are homogeneous. The characteristics used to create the groups are called stratification variables and must be related to the phenomenon being studied.
2. Use a simple random sample to select a sample from within each homogeneous group.
3. Combine the samples, weighting them to the population.
The difference: certain groups or strata are oversampled and others undersampled relative to their population weight. Allows strata that are small but theoretically important to be analyzed statistically, for example the territories or Indigenous people.

Answer 21

Multi-Stage Random Cluster Sample: when there is no population list available, the researcher randomly selects groupings of members of the population. This process is repeated for different randomly selected groupings, but each stage compounds the risk of sampling error. Example: to study a geographic area, start with municipal districts, then move to cities or towns, then eventually households. That is a three-stage random cluster sample.

Answer 22

CANNOT USE INFERENTIAL STATISTICS. WORST TO BEST:
1. Convenience sampling: use whoever is easiest to access. Very unlikely to produce a sample that is representative of the population. Examples: online polls, and businesses like A & W who stop people on the street to try their food.
2. Purposive sampling: the researcher uses their own judgement, examining the population and applying their own expertise to the sample. Tries to make the sample as representative as possible and can yield representative results, but can also be biased.
3. Quota sampling: similar to stratifying the population in proportionate or disproportionate stratified random samples. Creates a sample that mirrors the population, but what that looks like is ultimately up to the researcher's discretion.

Answer 23

Population size has nothing to do with sample size; population dispersion does. To use inferential statistics, the researcher must be able to infer from the sample to the population. When choosing a sample size, consider: the amount of error the researcher is prepared to tolerate, an estimate of the population dispersion for a given variable (the population standard deviation), and the z-value that corresponds to the level of confidence. The lower the tolerated error, the larger the sample. Populations with a lot of dispersion (widely scattered data) need bigger samples.
Central Limit Theorem: concerns the sampling distribution of the sample means. Take multiple samples and find the mean of each sample. As the number of samples increases or the sample size gets larger, the distribution of sample means approaches a normal distribution (mean, median and mode are all the same and sit at the highest point on the curve; symmetrical; about 95% of cases within 2 SD). Regardless of the population distribution, the mean of all the sample means is a true approximation of the population mean.
If you have a normal distribution (and according to the Central Limit Theorem, the sampling distribution of sample means should be), you can convert data into z-scores to estimate the probability of any range of values occurring around the mean. Z-scores are in SD units and refer to the percentage of the area under the normal curve that is covered (1 SD: 68%, 2 SD: 95%, 3 SD: 99.7%).
Confidence intervals are the range around the sample estimate within which the researcher is confident the real value lies. They determine how well the sample reflects the population.
1. Estimate the error around the sample's mean, that is, where it lies on the distribution of the means of all samples taken: standard error = population SD / square root of the sample size (substitute the sample SD if the population SD is not known).
2. Multiply by the z-value for the chosen level of confidence, according to how much error the researcher is willing to tolerate.
3. z-values: 1.00 for about 68%, 1.645 for 90%, 1.96 for 95% and 2.576 for 99%.
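
The three ingredients listed for sample size (tolerated error, population standard deviation, z-value) combine in the standard formula for estimating a mean, n = (z x SD / error)^2. The sketch below assumes that formula; the function name and the numbers are hypothetical.

```python
import math

def required_sample_size(sigma, margin_of_error, z=1.96):
    """Sample size needed to estimate a mean within +/- margin_of_error.

    sigma: estimated population standard deviation (dispersion)
    z:     z-value for the confidence level (1.96 for 95%)
    Note that the population size does not appear anywhere.
    """
    return math.ceil((z * sigma / margin_of_error) ** 2)

baseline = required_sample_size(sigma=15, margin_of_error=2)
less_error = required_sample_size(sigma=15, margin_of_error=1)       # lower tolerated error
more_dispersion = required_sample_size(sigma=30, margin_of_error=2)  # more scattered data

print(baseline, less_error, more_dispersion)
```

Halving the tolerated error or doubling the dispersion roughly quadruples the required sample, which matches the two rules stated above.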

Answer 24

The difference between the sample and the population. However, researchers are rarely certain enough about the population value to compare it with the sample. A lower tolerated error requires a larger sample; a population with a lot of dispersion requires a larger sample.

Answer 25

Central Limit Theorem: N = 30+. Concerns the sampling distribution of the sample means. Take multiple samples and find the mean of each sample. As the number of samples increases or the sample size gets larger, the distribution of sample means approaches a normal distribution (mean, median and mode are all the same and sit at the highest point on the curve; symmetrical; about 95% of cases within 2 SD). Regardless of the population distribution, the mean of all the sample means is a true approximation of the population mean.
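
A quick simulation can make this concrete. The sketch below assumes a deliberately skewed population (exponential, with a true mean of 1.0) and draws 2,000 samples of size 30; all of these choices are illustrative, not part of the course material.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of one sample of size n from a skewed population (true mean 1.0)."""
    return mean(rng.expovariate(1.0) for _ in range(n))

# Sampling distribution of the sample means: 2,000 samples of N = 30.
sample_means = [sample_mean(30) for _ in range(2000)]

grand_mean = mean(sample_means)  # should sit close to the population mean
spread = pstdev(sample_means)    # should sit close to SD / sqrt(N)

print(grand_mean, spread)
```

Even though individual cases are far from normally distributed, the sample means cluster symmetrically around the population mean, with a spread of roughly the population SD divided by the square root of 30.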

Answer 26

Confidence intervals are the range around the sample estimate within which the researcher is confident the real value lies. They determine how well the sample reflects the population.
1. Estimate the error around the sample's mean, that is, where it lies on the distribution of the means of all samples taken: standard error = population SD / square root of the sample size (substitute the sample SD if the population SD is not known).
2. Multiply by the z-value for the chosen level of confidence, according to how much error the researcher is willing to tolerate.
3. z-values: 1.645 for 90% confidence, 1.96 for 95% confidence and 2.576 for 99% confidence (1.00 gives about 68%).
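
A minimal sketch of the calculation: standard error = SD / square root of N, margin = z x standard error. The z-values used are the standard normal ones (1.645 for 90%, 1.96 for 95%, 2.576 for 99%); the function name and the sample figures are hypothetical.

```python
import math

# Standard normal z-values for common confidence levels.
Z_VALUES = {90: 1.645, 95: 1.96, 99: 2.576}

def confidence_interval(sample_mean, sd, n, level=95):
    """Return (lower, upper) bounds around a sample mean."""
    standard_error = sd / math.sqrt(n)          # step 1
    margin = Z_VALUES[level] * standard_error   # steps 2 and 3
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 50, SD 10, 100 cases.
low95, high95 = confidence_interval(50, 10, 100, level=95)
low99, high99 = confidence_interval(50, 10, 100, level=99)

print(low95, high95)
print(low99, high99)
```

Asking for more confidence widens the interval; a larger sample narrows it.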

Answer 27

1. Look for stars. No stars = fail to reject the null hypothesis, because the relationship is not statistically significant. 2. Look for the sign. A negative sign means a negative relationship: as the IV (x) goes up, the DV (y) goes down. A positive sign means a positive relationship: as the IV (x) goes up, the DV (y) goes up.

Answer 28

If both the IV and DV are interval/ratio: recode them into ordinal variables or dichotomies and run crosstabs, calculate the correlation coefficient (r), or run a regression.

Answer 29

Pearson's r measures the strength of the relationship between x and y (IV and DV). Ranges from -1 to +1, where -1 is a perfect negative relationship and +1 is a perfect positive relationship. A measure of association, NOT causality, so spuriousness is still an issue: cannot definitively state that x causes y. Measures how close the observations are to the straight line of best fit. Only works on linear relationships. The further observations lie from the line of best fit, the weaker the relationship. +/- 0.01-0.30 is a weak correlation, +/- 0.31-0.70 is moderate (most political science work falls here) and +/- 0.71-1.00 is strong.
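
For reference, r can be computed by hand as the sum of cross-products of deviations divided by the square root of the product of the two sums of squares. The data and the function name `pearson_r` below are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariation of x and y scaled by the variation in each."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical IV and DV values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_inverse = pearson_r(x, [10, 8, 6, 4, 2])  # points fall exactly on a downward line

print(r, r_inverse)
```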

Answer 30

Explains how good the entire model is at explaining variation in the DV, that is, how well the IV explains the DV. r^2 x 100 is equal to a PRE measure and is the percentage of variation in Y explained by knowing X. The stronger the relationship, the steeper the line (when the variables are on comparable scales). The regression line is the best guess for y for every value of x: it takes all the variation in y at every x value and draws the straight line that minimizes that variation.
Regression equation: Y = a + bX + e, where Y is the DV; a is the constant, the value of Y when the IV = 0, where the regression line intersects the y axis; b is the slope of the regression line (how much Y changes for a one-unit change in X); X is the independent variable; and e is the error term.
Error term: an estimate of how much error there will be (there is no such thing as no error). Called the residual; it is the best measure of prediction error and should be small.
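
A small sketch of a bivariate least-squares fit, assuming the usual formulas: slope b = (sum of cross-products) / (sum of squares of x), constant a = mean of y - b x mean of x, and r^2 as the share of variation in y explained. The data and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b, r_squared) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    b = sxy / sxx                   # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x         # constant: predicted Y when X = 0
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared

# Hypothetical IV and DV values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, r_squared = fit_line(x, y)

# Residuals: the gap between each observed y and the line's best guess.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(a, b, r_squared)
```

Here r^2 x 100 is 60, so knowing X explains 60% of the variation in Y in this made-up example.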

Answer 31

Several x's and one y; in other words, several IVs and their effect on one DV. Used to reduce spuriousness and find the real causal influence on the DV. More dominant in empirical political science. Must examine three statistics:
1. R^2: the proportion of variation in the DV that is explained by the combination of IVs in the model.
2. F test: a test of statistical significance that can be interpreted like chi-square. Does this relationship hold true in the population from which the sample was drawn, or did it happen by chance?
3. T test: the confidence we can have that each coefficient is significantly different from zero. The coefficient is the slope of the linear relationship between that IV and the DV.

Answer 32

Concepts: abstractions with clear definitions. A universal descriptive word that must refer to an observable phenomenon (directly or indirectly). Act as building blocks for theories, since variation in concepts leads to theories: relationships between concepts, at the abstract level of experience, form theories. Concepts are data containers that direct what must be observed. Can be defined in three ways; the move from 2 to 3 is an increase in focus.
1. Real: the essential nature or attributes. Not the definition used in empirical research.
2. Nominal: names the concept and the properties of the associated phenomena that the concept represents. Usually there is some consensus. A nominal definition must be:
- Clear and explicit, with assumptions clearly stated. Must be in accordance with transmissibility to guard against bias.
- Precise. Indicates what should be excluded and included.
- Non-circular. Must explain a phenomenon without using or repeating the same words or ideas.
- Positive. Must define the concept by what it is, not by what characteristics it lacks. No negative statements.
3. Operational: after the nominal definition has been established, operational definitions indicate the observations that will be used to represent the concept in the realm of empiricism. Related to measurement validity (did we measure what we said we were going to measure?).

Answer 33

Classification: sorting phenomena by defining concepts and categorizing them in mutually exclusive groupings. Comparison: discerning whether there is more or less of a given concept. Quantification: measuring how much of a concept there is. Statistical analysis happens at this point. Anything that is countable falls here.

Answer 34

In order to operationalize a concept, it must have a direct observable counterpart, have an indirect observable counterpart through an operational definition, or form relationships with other concepts to form theories. A concept's empirical counterpart is a variable, which is not yet an observation. When operationalizing a concept, parts of the concept that do not transfer into operational definitions or variables can be lost. Stating the relationship between concepts is the process of theory building; depending on the research method, these are inductive empirical generalizations or deductive propositions. Important because concepts are abstractions and these are the steps for moving to the empirical realm. Building theories requires concepts, and theories are important because they explain how and why concepts are linked, organize knowledge and help generate hypotheses. All are steps to take before moving from the realm of abstraction to the realm of

Answer 35

The IV is the column, the DV is the row. Cell frequency is the number of people in the box. Column marginal is the total for the column. Row marginal is the total for the row.
Ordinal: cannot remove categories, but can collapse them into 3 or fewer categories. Only interpret the top and bottom rows. Curvilinear gaps across columns cannot be congruent with the hypothesis, since it is a linear conjecture.
Nominal: can remove categories. Interpret the row that is named in the hypothesis.
Percentages are calculated based on the number of cases in each category of the IV (column percentages). Interpret percentage point gaps ACROSS columns: 1-4 is trivial, 8-10 is significant.
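
To illustrate column percentages and the gap across columns, here is a short sketch with a hypothetical 2x2 crosstab (rows are DV categories, columns are IV categories). The function name `column_percentages` is an assumption of this example.

```python
def column_percentages(table):
    """Convert cell frequencies to percentages of each column (IV category)."""
    column_marginals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / column_marginals[j] for j, cell in enumerate(row)]
        for row in table
    ]

# Hypothetical cell frequencies: rows = DV categories, columns = IV categories.
frequencies = [
    [30, 10],
    [20, 40],
]

percentages = column_percentages(frequencies)

# Percentage point gap ACROSS columns for the top row.
top_row_gap = percentages[0][0] - percentages[0][1]

print(percentages, top_row_gap)
```

Each column sums to 100, so categories of the IV can be compared even when they contain different numbers of cases.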

Answer 36

John Santos

Answer 37

Science Theaters 147

Answer 38

Ordinal-level crosstabs: the gaps across categories are large and then become small. Relates to where the IV has the most power to move values on the DV. Ensure that it does not become a curvilinear gap across columns.