Final Flashcards

1
Q

What is a causal relationship? Can you give an example?

A

The theoretical linkage between two concepts, otherwise called cause and effect: a change in one concept (the cause) produces a change in the other (the effect).
Example: the threat of mutually assured destruction prevents the use of nuclear weapons (the threat is the cause; non-use is the effect).

2
Q

What’s the difference between falsification and proving theories? Why is this important in social science?

A

The defining characteristic of a theory is that it must be falsifiable: it must imply an empirical pattern that scientists can use to try to disprove it. Theories can be falsified by evidence but never definitively proven. This matters in social science because researchers test theories against observable patterns rather than claiming to have proven them true.

3
Q

What is the difference between quantitative and qualitative methods? Can you give examples of both?

A

Quantitative methods use empirical, numerical data and statistical patterns to test theories, whereas qualitative methods focus on perspective-taking and the feelings and meanings of subjects. In other words, quantitative research uses numbers and statistics (e.g., large-scale surveys, regression analysis) while qualitative research uses words and meanings (e.g., in-depth interviews, ethnography). Qualitative research also tends to use a much smaller number of cases, whereas quantitative research relies on large-scale studies.

4
Q

What is a dependent variable? What is an independent variable? How are they related? Can you write a hypothesis with a dependent and independent variable and identify which is the dependent variable and which is the independent variable?

A

The independent variable is the cause and the dependent variable is the outcome (effect) in the testing of a theory. Because of this, the dependent variable's values change in correspondence with the independent variable. Example hypothesis: higher levels of education increase voter turnout; education is the independent variable and voter turnout is the dependent variable.

5
Q

What are the four levels of measurement? How are they different? Why is the distinction important to know? What's the difference between an independent variable and an interval variable?

A

Nominal: no inherent ranking; typically used for binary measures or categorical variables
Ordinal: an inherent ranking of the values, but no fixed distance or measure between them
Interval: an inherent ranking with a continuous series of numerical values
Ratio: similar to interval except that it contains an absolute zero
The distinction matters because the level of measurement determines which statistics (for example, which measures of central tendency) are appropriate.

6
Q

What is central tendency? Can you explain the different measures of central tendency? What's the relationship between measures of central tendency and different types of variable measurements? Can you explain why certain measures of central tendency are only appropriate for certain types of measures?

A

Definition of central tendency: measures that indicate the locations where typical scores of a variable are found.
- Modal value: the most frequently occurring value of a variable (x)
- Median value: the value located at the exact center of our cases (N) when the variable (x) is sorted from lowest to highest (or highest to lowest)
- Mean value: the sum of all values of a variable (x) across the observations divided by the total number of cases (N) in the sample (the average)
Central tendency and levels of measurement
- Nominal variables can only be summarized with the mode
- Ordinal variables can be summarized using both the mode and the median
o Special case: when ordinal variables have a large number of values (typically 7 or more), we can use the mean because they begin to take on the mathematical properties of continuous variables
- Interval and ratio variables
o Can be summarized using the mode, the median, and the mean
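A minimal sketch (assuming Python's standard statistics module and a made-up set of interval-level scores) of computing the three measures:

import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 7, 9]          # hypothetical interval-level scores

print(statistics.mode(scores))                # modal value: most frequent score -> 5
print(statistics.median(scores))              # median value: middle score once sorted -> 5
print(statistics.mean(scores))                # mean value: sum of scores divided by N -> about 4.78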

7
Q

Can you explain the concept of dispersion? Why is it important? Are you familiar with different ways of calculating dispersion for different levels of measurement?

A

Dispersion is an important feature of a variable that refers to how spread out its scores are in a sample.
Two groups may have a similar central tendency value, yet the scores of one group might be tightly clustered while the scores of the other are widely spread out. Measures of dispersion summarize how widely the scores on a variable actually differ in a sample.
4 measures of dispersion (remember the formulas for calculating each; a short calculation sketch follows the list):
- Range
- Variance
- Standard deviation
- IQV (index of qualitative variation)
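A minimal sketch (assuming Python with numpy and made-up scores) of the range, variance, and standard deviation; the IQV is omitted because its formula works on category proportions rather than raw scores:

import numpy as np

scores = np.array([4, 8, 6, 5, 3, 7, 9, 5], dtype=float)   # hypothetical sample

data_range = scores.max() - scores.min()   # range: highest score minus lowest score
variance = scores.var(ddof=1)              # sample variance: average squared deviation from the mean (N - 1 denominator)
std_dev = scores.std(ddof=1)               # standard deviation: square root of the variance
print(data_range, variance, std_dev)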

8
Q

What are the characteristics of a normal distribution?

A

The mean, mode and median are all equal. The curve is symmetric at the center (i.e. around the mean, μ). Exactly half of the values are to the left of center and exactly half the values are to the right. The total area under the curve is 1.

9
Q
What are the characteristics of the normal curve? Why is it important to know when most variables are not normally distributed?
A

Continuous (interval) variables with a normal curve constitute a class in the sense that they share a set of common characteristics:
o All three measures of central tendency (mode, median, and mean) are approximately the same
o The distribution is symmetrical; the halves closely mirror one another, without skewness (or approximately without skewness, in practice)
o Example: height is normally distributed (follows a normal curve)
o Fixed distances: going out from the mean a fixed distance (measured in terms of the standard deviation, s), you will find the same percentage of cases, regardless of the raw values of the mean and standard deviation
It is the most important probability distribution in statistics because it accurately describes the distribution of values for many natural phenomena.

10
Q

Can you explain what a z-score is? Why is it important?

A

For normally distributed variables, we can translate raw values into z-scores (a common unit, measured in standard deviations, for comparing values of a variable).
- We use the variable's mean and standard deviation to create z-scores
- The resulting distribution of z-scores has a mean of 0 and a standard deviation of 1
We use z-scores to locate observations in a normal distribution: at a given z-score we can identify the percentage of cases above and below that observation.
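A minimal sketch (assuming Python with scipy and a made-up mean and standard deviation) of the calculation z = (x - mean) / sd and of locating an observation:

from scipy.stats import norm

mean, sd = 70, 10        # hypothetical mean and standard deviation of exam scores
x = 85                   # one observation

z = (x - mean) / sd      # z-score: how many standard deviations x lies from the mean -> 1.5
below = norm.cdf(z)      # proportion of cases below this observation -> about 0.933
above = 1 - below        # proportion of cases above -> about 0.067
print(z, below, above)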

11
Q

Can you describe the difference between univariate, bivariate, and multivariate analysis?

A

Univariate analysis: variables are analyzed and reported individually; in other words, the analysis is of one variable at a time.
Bivariate analysis: centers on the relationship between two variables.
Multivariate analysis: occurs when statistical tools take account of three or more variables simultaneously.

12
Q

We use the terms categorical and continuous variables often. What are they and how are they related to types of measures? Why is this distinction important for analysis?

A

Categorical variables can also be called discrete or qualitative variables. The levels of measurement involved are nominal and ordinal. Whether a variable is categorical comes from the nature of its values: something like religion can only be classified as categorical because its values do not represent any ranking or inherent order.
Continuous variables can also be called quantitative variables. Their numerical values can be subjected to legitimate arithmetic operations. The levels of measurement involved are interval and ratio.
The distinction is important because it organizes the choice of statistical techniques for analysis.

13
Q

What are crosstabs? What types of variables are most appropriate to use in crosstabs? Can you interpret them?

A

A bivariate method used to analyze the strength and form of a relationship.
Use crosstabs when both variables are categorical (either ordinal or nominal).
Definition: a table displaying the frequencies of intersecting values of two variables (an independent and a dependent variable).
Crosstabs are useful for establishing the form of the relationship between two variables, but they cannot be used for univariate or multivariate analysis. They are useful for preliminary hypothesis testing (a small construction sketch follows).
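A minimal sketch (assuming Python with pandas and made-up categorical data; the variable names are hypothetical) of building a crosstab:

import pandas as pd

df = pd.DataFrame({
    "education": ["high school", "college", "college", "high school", "college", "high school"],  # independent variable
    "voted":     ["no", "yes", "yes", "no", "no", "yes"],                                          # dependent variable
})

# Crosstab: frequencies of the intersecting values of the two categorical variables
print(pd.crosstab(df["voted"], df["education"]))

# Column percentages make it easier to compare across categories of the independent variable
print(pd.crosstab(df["voted"], df["education"], normalize="columns"))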

14
Q

We talk a lot about form, strength and precision. Can you define them? How are they different? Can you discuss these terms in the context of crosstabs and regression analysis?

A

Form: refers to the structure of the connection and answers the question "what kind of relationship is it?"
Strength: addresses how much impact one variable has on another.
Precision: refers to how well the regression line approximates the actual data and, by extension, how appropriate the form and strength (extent) descriptions are. In crosstabs, form and strength are read from the pattern of frequencies and percentages; in regression, form is given by the equation, strength by the coefficients, and precision by Pearson's r.

15
Q

Using your own words, can you explain regression analysis? What is it? How is it used? What types of measures can we use with regression analysis?

A

Regression is a technique used to model and analyze the relationships between variables, often to understand how several variables together contribute to producing a particular outcome. A linear regression is a regression model in which the relationship between the variables is expressed as a straight line. Regression is used with continuous (interval or ratio) measures, although categorical variables can be included as dummy variables. (A small fitting sketch follows the list.)

  • Multiple regression estimates each independent variable's effect on the dependent variable while taking into account the effects of the other independent variables
  • Multiple regression can be used to identify:
  • which independent variables have the strongest relationship with the DV
  • how much impact each independent variable has on the DV
  • how the set of independent variables jointly affects the DV
  • how well the set of independent variables explains the DV
  • and it models a multi-causal approach to explaining the DV
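A minimal sketch (assuming Python with numpy, made-up data, and hypothetical variable names) of estimating a multiple regression by least squares:

import numpy as np

# Hypothetical data: two independent variables (education, income) and a dependent variable (turnout)
education = np.array([10, 12, 12, 14, 16, 16, 18, 20], dtype=float)
income    = np.array([20, 25, 30, 35, 40, 50, 60, 80], dtype=float)
turnout   = np.array([40, 45, 50, 55, 60, 65, 72, 80], dtype=float)

# Design matrix with a column of 1s for the intercept
X = np.column_stack([np.ones_like(education), education, income])

# Least-squares estimates: the intercept and one coefficient per independent variable
coefs, *_ = np.linalg.lstsq(X, turnout, rcond=None)
print(coefs)   # [intercept, education coefficient, income coefficient]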
16
Q

Why do we need to account for multiple variables? How do the Venn diagrams discussed in class relate to bivariate and multiple regression?

A

Limitations of bivariate analysis:

  • Does not account for other explanations
  • Does not include context
Multiple regression addresses these limitations by taking several independent variables into account simultaneously.
17
Q

What is Pearson's r and how is it useful in quantitative analysis?

A

Pearson's r is the measure used to assess precision. In particular, the larger the magnitude of Pearson's r, the better the regression line fits the data points. This is important because it is the best-known measure of association between variables.
- The statistical measure for calculating precision is called Pearson's r, or the correlation coefficient. Pearson's r ranges from -1.0 to 1.0. A value of 0 signifies that the ordered pairs on a scatterplot are random, indicating no relationship. As the magnitude of r (independent of its sign) increases, precision increases.
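A minimal sketch (assuming Python with numpy and made-up paired observations) of computing Pearson's r:

import numpy as np

# Hypothetical paired observations of two interval-level variables
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 1, 4, 3, 6, 5], dtype=float)

# Pearson's r is the off-diagonal element of the 2x2 correlation matrix
r = np.corrcoef(x, y)[0, 1]
print(r)   # ranges from -1.0 to 1.0; larger magnitude means the points fit the line more tightly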

18
Q

What is the elaboration model and can you explain the logic behind it? We learned four types of relationships in the context of the elaboration model. What are they and how are they different?

A

Elaboration analysis starts with two sets of tables: an original table that contains the connection between the independent and dependent variables when other variables are operating, and partial tables that examine the independent-dependent variable connection when a contextual variable's (Z's) effects are removed.
4 types of multivariate relationships
- Authentic
o The relationship between X and Y remains the same whether Z is allowed to vary or is held constant (the same relationship under condition 1 and condition 2)
o That is, the X-Y relationship holds regardless of whether Z is taken into account
- Spurious
o Spurious is the opposite of authentic; spurious relationships are illusory
o In a spurious relationship, an X-Y relationship is apparent when Z is operating (condition 1, not being controlled), but this apparent X-Y connection disappears or weakens when Z is held constant (condition 2)
- Intervening
o The apparent relationship exists when Z is allowed to vary (condition 1), but disappears (usually the strength of the relationship weakens) when Z and its effects are controlled (condition 2)
o X causes Z and Z causes Y
- Interactive
o Z1 and Z2 indicate different values of Z
o Here, the effect of Z on the X-Y relationship depends on the particular value of the Z variable, and this is the sign of interaction (e.g., the X-Y relationship is strong when Z = Z1 but weak or absent when Z = Z2)

19
Q

If given a regression table, can you write out the linear equation? Can you calculate predicted values (y-hat)? How do you interpret coefficients? What is the intercept and why is it important to include in the linear equation?

A

Y = mX + b is the equation of the best-fit regression line, where m is the slope (coefficient) and b is the intercept. (A worked example follows.)

  • You can interpret a coefficient as a slope, which is important because the slope characterizes how much the dependent variable changes for a fixed (one-unit) change in the independent variable.
  • The Y-intercept is important because it is the point where the line crosses the axis of the dependent variable, i.e., the predicted value of Y when X equals zero; without it, the equation cannot produce correct predicted values.
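A minimal sketch (assuming Python and a hypothetical regression table with an intercept of 12.0 and a coefficient of 2.5 for the independent variable) of calculating predicted values:

b = 12.0   # intercept from the regression table: predicted Y when X = 0
m = 2.5    # coefficient (slope): change in Y for each one-unit increase in X

def y_hat(x):
    # Linear equation written out from the table: y-hat = b + m * x
    return b + m * x

print(y_hat(0))    # 12.0 -> the intercept itself
print(y_hat(16))   # 52.0 -> predicted value of Y for an observation with X = 16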
20
Q

How is inferential statistics different from descriptive statistics? When do we use inferential statistics? What does “inferences” mean in the context of statistics?

A

Inferential statistics use the sample to make inferences (generalizations) about a broader population, whereas descriptive statistics focus on the sample itself, describing relationships within it. "Inference" means taking data from a specific sample and drawing conclusions about the larger population from that data. We use inferential statistics when we want to know about a broader population, because describing the sample alone does not test theories about that population. And because it is usually too difficult and costly to measure the entire population, drawing inferences from a smaller sample is necessary.

21
Q

What is the difference between the types of sampling procedures learned in class? Why are sampling procedures important for inferential statistics?

A

There are two types of sampling procedures: non-probability sampling and probability (random) sampling.
• Non-probability sampling
• Convenience sampling (e.g., polling whoever happens to be in a building), snowball sampling (a first sampled individual passes the questions on to others), and quota sampling (once you reach the required number of respondents with a given characteristic, for example 50 males, you stop polling that type of individual)
• We cannot make credible inferences beyond the sample
• Commonly used in pilot studies
• Probability (random) sampling
• In taking a probability sample, the researcher ensures that every member (unit) of the population of interest has an equal chance of being chosen for the sample
• In a student population (N) of 30,000, each student has a 1/30,000 chance of being chosen (a known probability)
• A probability sample eliminates selection bias, a source of systematic error
• But it still contains random sampling error
Sampling procedure matters for inferential statistics because credible inferences from sample to population are only possible when every unit had a known chance of selection.

22
Q

What is the sampling distribution? Can you describe the characteristics of the sampling distribution? Why is it the centerpiece of inferential statistics?

A

Definition: a sampling distribution is the set of different scores (statistics) that result from repeated replications of the same study using different samples. The differences in the outcomes are due to random sampling error. Understanding that the result of any single study comes from a sampling distribution of different possible outcomes lets you appreciate that any single investigation, even one using the best sampling procedures, can produce very wrong results. The problem of inference, of trying to generalize from any particular sample statistic to the population parameter, centers on managing this ever-present possibility.
Increasing N decreases the standard error (and increases our confidence that the sample statistic is close to the population parameter).
Increased variation in the population increases the standard error.
All variables, including skewed ones, have normally distributed sampling distributions (given a sufficiently large N).
When thinking about inferential statistics, reporting a single value for a sample statistic is risky. Why?
All values we take from a sample contain error.
Even without systematic error, random sampling error still occurs.
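A minimal sketch (assuming Python with numpy and a made-up skewed population) of what repeated sampling produces: the sample means form an approximately normal sampling distribution whose spread (the standard error) shrinks as N grows:

import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # a skewed, non-normal hypothetical population

def sample_means(n, replications=5_000):
    # Repeatedly draw samples of size n and record each sample mean
    return np.array([rng.choice(population, size=n).mean() for _ in range(replications)])

for n in (25, 100, 400):
    means = sample_means(n)
    # The means cluster around the population mean; their standard deviation (the standard error) falls as n rises
    print(n, round(means.mean(), 2), round(means.std(), 3))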

23
Q

What does "standard error" mean? What are the two types of standard error discussed in class? Why is it important to know? How do we use it in inferential statistics?

A

Sampling error, or standard error, is defined as the extent to which a sample statistic differs, by chance (stochastic error), from the population parameter. It tells us how dispersed the sample statistics from various studies are; because the standard error indicates this level of dispersion, its size shapes the appearance of the sampling distribution.
It is important for inference: the standard error computation is really just the standard deviation calculated on the (normal) distribution of sample statistics. Even if probability sampling techniques are used to select the sample from the population, the sample statistic may imperfectly reflect the population parameter. The standard deviation of the sampling distribution indicates how large the resulting sampling errors might be, hence "standard error".
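A minimal sketch (assuming Python, and assuming the standard error of the mean, s / sqrt(N), is the formula intended) of how the standard error shrinks as the sample grows:

import math

s = 12.0   # hypothetical sample standard deviation

for n in (25, 100, 400):
    se = s / math.sqrt(n)   # standard error of the mean: sample standard deviation divided by the square root of N
    print(n, se)            # 2.4, 1.2, 0.6 -> quadrupling N halves the standard error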

24
Q

What’s the relationship between accuracy and precision? How do these terms relate to confidence intervals?

A

In general, in taking measurements or making inferences, there is always a trade-off between precision and accuracy: the more precise statements are, the less accurate they are likely to be. For this reason, inferential statistics rely on interval estimates. An interval estimate uses sample statistics to generate a range of scores within which the population parameter is likely to occur. Social science research trades away precision to improve the accuracy of parameter estimates, and the result of this trade-off is the interval estimate. Interval estimates are expressed as confidence intervals, which are ranges of scores with the statistical point estimate at the center. The wider the confidence interval, the greater the confidence that it captures the parameter, but the less precise the estimate.

25
Q

What is a point estimate? What is an interval estimate? How are they different? Can you give an example of a point estimate?

A

Point estimates are precise descriptive numbers that generalize directly from the sample statistic to the population parameter. Point estimates are appealing inferential procedures because of their precision.
Example of a point estimate: if researchers concluded that the proportion of women in the population is 0.56 because the proportion of women in their sample is 0.56, they would be using a point estimate.
An interval estimate uses sample statistics to generate a range of scores within which the population parameter is likely to occur; it trades the precision of a single number for greater accuracy.

26
Q

What is the logic of confidence intervals? What is the relationship between z-scores and confidence intervals? What do we mean when we say “confidence”?

A

The logic of confidence intervals is that, since point estimates are very unlikely to be exactly accurate, a researcher employs interval estimation, identifying a range within which the population mean likely occurs. The probability of the confidence interval capturing the population parameter is called the confidence level.
The z-score that bounds the chosen confidence level's area under the normal curve determines the width of the confidence interval: the interval extends that many standard errors on either side of the sample statistic. "Confidence" here means that if we construct, say, 80% confidence intervals, then across repeated samples 80% of such intervals will capture the population parameter.
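A minimal sketch (assuming Python with scipy and a made-up sample mean, standard deviation, and sample size) of building confidence intervals from z-scores:

import math
from scipy.stats import norm

mean, s, n = 50.0, 12.0, 144            # hypothetical sample mean, standard deviation, and size
se = s / math.sqrt(n)                   # standard error of the mean -> 1.0

for level in (0.80, 0.95):
    z = norm.ppf(1 - (1 - level) / 2)   # z-score that bounds the chosen confidence level
    low, high = mean - z * se, mean + z * se
    print(level, round(z, 2), (round(low, 2), round(high, 2)))
# A higher confidence level gives a larger z and a wider interval: more accuracy, less precision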

27
Q

What is hypothesis testing? Can you describe the logic?

A

The logic of hypothesis testing begins differently from interval estimation: interval estimation procedures begin with no prior idea of the value of the population parameter, whereas hypothesis testing begins with an assumption about what the value of the population parameter is. The logic employs two kinds of hypotheses, the null hypothesis (the key one) and the research hypothesis, which are contrasted against each other.

28
Q

Can you explain statistical significance? When is a sample statistic statistically significant?

A

A test of statistical significance helps us consider whether the observed relationship in the sample is likely to exist in the population or whether it could have happened by chance when the sample was drawn.
A measure of association, by contrast, tells the researcher whether (and how strongly) two variables are associated. Examples: strength and form in crosstabs, and R2 and beta coefficients in bivariate and multiple regression.
Determining the level of significance (a small decision-rule sketch follows):
• The convention in social science is the .05 level or the .01 level
• Ask this question: "If, in the unobserved population, H0 is true, how often by chance will we obtain the relationship observed in the sample?" If the answer is "more than 5 times out of 100", do not reject H0. If the answer is "less than 5 times out of 100", reject H0.
Alternatively, compare the 95% confidence intervals of two or more sample means: if the intervals overlap, we cannot reject the null hypothesis.
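A minimal sketch (assuming Python with scipy, made-up group scores, and a two-sample t-test as the significance test) of the .05 decision rule:

from scipy.stats import ttest_ind

group_a = [62, 65, 68, 70, 71, 74, 75, 78]   # hypothetical scores for group A
group_b = [55, 58, 60, 61, 63, 64, 66, 69]   # hypothetical scores for group B

t_stat, p_value = ttest_ind(group_a, group_b)
# p_value: how often, by chance alone, we would see a difference this large if H0 (no difference) were true
if p_value < 0.05:
    print("reject H0 at the .05 level; the difference is statistically significant", round(p_value, 4))
else:
    print("do not reject H0", round(p_value, 4))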

29
Q

What is the null hypothesis and how is it different from the research hypothesis? What does it mean to "accept/reject" the null hypothesis? Can you give an example of each? How does the logic of the null hypothesis relate to falsification and proving causal relationships?

A

The null hypothesis makes an assumption about conditions in the population, typically stating that no relationship exists between the variables of interest in the population.
The research hypothesis makes a prediction about population parameters based on beliefs, expectations, and theories.
They are different in that the research hypothesis is derived from preliminary research activity; in that sense it is substantive and is an alternative to the null hypothesis. In short, the null hypothesis makes an assumption about conditions in the population while the research hypothesis makes a prediction about population parameters based on existing understandings.
Rejecting the null hypothesis means the sample evidence makes "no relationship in the population" implausible; failing to reject it means the observed relationship could plausibly be due to chance. Example: H0: education and voter turnout are unrelated in the population; research hypothesis: higher education increases turnout. This logic parallels falsification: we never prove a causal relationship directly, we only test whether the evidence allows us to reject the null hypothesis.