Final Flashcards
What is a causal relationship? Can you give an example?
The theoretical linkage between two concepts in which one is the cause and the other is the effect (also called cause and effect)
Example: the threat of mutually assured destruction (cause) prevents the use of nuclear weapons (effect)
What’s the difference between falsification and proving theories? Why is this important in social science?
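A defining characteristic of a theory is that it is falsifiable: it must make claims that empirical evidence could show to be wrong. Theories are never proven outright; they gain support when tests that could have disproved them fail to do so. This matters in social science because it gives the scientist an empirical pattern against which the theory must be tested.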
What is the difference between quantitative and qualitative methods? Can you give examples of both?
Quantitative methods use numbers and statistics to test theories, whereas qualitative methods use words and meanings and tend to focus on the perspectives and feelings of subjects. Qualitative research also typically uses a small number of cases, while quantitative research uses large-scale studies. For example, a statistical analysis of survey responses is quantitative, while a set of in-depth interviews is qualitative.
What is a dependent variable? What is an independent variable? How are they related? Can you write a hypothesis with a dependent and independent variable and identify which is the dependent variable and which is the independent variable?
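The independent variable is the presumed cause and the dependent variable is the outcome (effect) in the test of a theory. Because of this, the dependent variable's values change in response to the independent variable.
Example hypothesis: in a comparison of individuals, those with more years of education (independent variable) will be more likely to vote (dependent variable).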
What are the four levels of measurement? How are they different? Why is the distinction important to know? What’s the difference between an independent variable and an interval variable?
Nominal: there is no inherent ranking; typically used for binary measures or categorical variables
Ordinal: there is an inherent ranking to the values, but there is no fixed distance between the values
Interval: there is an inherent ranking and a fixed, equal distance between the numerical values, but no absolute zero
Ratio: similar to interval except that it has an absolute zero
What is central tendency? Can you explain the different measures of central tendency? What’s the relationship between measures of central tendency and different types of variable measurements? Can you explain why certain measures of central tendency are only appropriate for certain types of measures?
Definition of central tendency: measures that indicate the locations where typical scores of a variable are found.
- Modal value: the most frequently occurring value of variable (x)
- Median value: the value that is located at the exact center of our cases (N) when the variable (x) is sorted from lowest to highest (or highest to lowest)
- Mean value: sum of all values of a variable (x) across the observations divided by the total number of cases (N) in the sample (average)
Central Tendency and Levels of Measurement
- Nominal variables can only be summarized with the mode
- Ordinal variables can be summarized using both the mode and the median
o Special cases: when ordinal variables have a large number of values (typically 7 or more) we can use the mean because they begin to take on mathematical properties of continuous variables
- Interval and ratio variables
o Can be summarized using the mode, the median and the mean
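The three measures above can be sketched in a few lines. This is a minimal illustration using Python's standard statistics module; the data and variable names are made up for the example and are not from the course material.

```python
import statistics

# Made-up interval-level scores (illustrative only)
scores = [2, 3, 3, 5, 7, 8, 14]

mode_value = statistics.mode(scores)      # most frequently occurring value: 3
median_value = statistics.median(scores)  # middle value of the sorted cases: 5
mean_value = statistics.mean(scores)      # sum of values (42) divided by N (7): 6

# Made-up nominal data: only the mode is meaningful, because the
# categories cannot be sorted or averaged.
religion = ["Catholic", "Protestant", "Catholic", "None"]
religion_mode = statistics.mode(religion)

print(mode_value, median_value, mean_value, religion_mode)
```

Note how the one large score (14) pulls the mean above the median, which is one reason the median is often reported for skewed variables.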
Can you explain the concept of dispersion? Why is it important? Are you familiar with different ways of calculating dispersion for different levels of measurement?
Dispersion is an important feature of a distribution that refers to how spread out a variable's scores are in a sample
Two groups can have a similar central tendency value, yet the scores of one group might be tightly clustered while the scores of the other might be widely spread out. Measures of dispersion summarize how widely scores on a variable actually differ in a sample
4 Measures of Dispersion (remember the formula for calculating each)
- Range
- Variance
- Standard Deviation
- IQV (Index of Qualitative Variation, used for nominal variables)
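As a hedged sketch of the four measures listed above: the example below assumes the sample formulas (dividing by N - 1) for variance and standard deviation, and assumes that K in the IQV formula is the number of categories that actually appear in the data. Your course may define either differently, so check against the formulas you were given. All data are made up.

```python
import statistics
from collections import Counter

# Made-up interval-level scores (illustrative only)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)  # highest value minus lowest value
variance = statistics.variance(scores)   # sample variance: sum of squared deviations / (N - 1)
std_dev = statistics.stdev(scores)       # square root of the sample variance


def iqv(observations):
    """Index of Qualitative Variation for a nominal variable (0 = no variation, 1 = maximum).

    Assumption: K is the number of categories that actually appear in the data.
    Formula used: K * (N^2 - sum of squared category counts) / (N^2 * (K - 1)).
    """
    counts = Counter(observations)
    k = len(counts)
    n = len(observations)
    if k < 2:
        return 0.0  # only one category observed, so there is no variation
    sum_squared_counts = sum(c * c for c in counts.values())
    return k * (n * n - sum_squared_counts) / (n * n * (k - 1))


# Hypothetical nominal data, evenly split across two categories
party = ["Dem", "Dem", "Rep", "Rep"]

print(value_range, variance, std_dev, iqv(party))
```

An evenly split nominal variable gives the maximum IQV of 1, while a variable where every case falls in one category gives 0.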
What are the characteristics of a distribution?
In a normal distribution, the mean, mode and median are all equal. The curve is symmetric around the center (i.e. around the mean, μ). Exactly half of the values are to the left of center and exactly half of the values are to the right. The total area under the curve is 1.
What are the characteristics of the normal curve? Why is it important to know when most variables are not normally distributed?
Continuous (interval or ratio) variables that follow a normal curve share a set of common characteristics:
o All three measures of central tendency (mode, median and mean) are approximately the same
o The distribution is symmetrical; the halves closely mirror one another, without skewness (or approximately without skewness, in practice)
Example: height is approximately normally distributed (follows a normal curve)
Fixed: going out a fixed distance from the mean (measured in standard deviations, s), you will find the same percentage of cases, regardless of the raw values of the mean and standard deviation
It is the most important probability distribution in statistics because it closely approximates the distribution of values for many natural phenomena
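The "fixed" property can be checked numerically. The sketch below uses the standard normal curve and Python's math.erf to compute the share of cases within 1, 2 and 3 standard deviations of the mean; the function name share_within is just an illustrative label.

```python
import math


def share_within(z):
    """Proportion of a normal distribution lying within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))


for z in (1, 2, 3):
    # Roughly 68%, 95% and 99.7% of cases, whatever the raw mean and s are
    print(f"within {z} s of the mean: {share_within(z):.4f}")
```

These percentages hold for any normally distributed variable because distance is measured in units of s rather than in raw scores.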
Can you explain what a z-score is? Why is it important?
For normally distributed variables, we can translate raw values into z-scores (a common unit for comparing values of a variable). A z-score expresses how many standard deviations a value lies above or below the mean
- We use the variable's mean and standard deviation to create z-scores: z = (x - mean) / s
- The resulting distribution of the z-scores has a mean of 0 and a standard deviation of 1
We use z-scores to locate observations in a normal distribution. At a given z-score we can identify the percentage of cases above and below the observation
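A minimal sketch of the z-score calculation, assuming made-up height data and the sample standard deviation. The helper share_below is an illustrative name; it uses the standard normal curve to estimate the percentage of cases below a given z-score.

```python
import math
import statistics


def z_scores(values):
    """Convert raw values to z-scores: (x - mean) / s."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [(x - mean) / s for x in values]


def share_below(z):
    """Proportion of a normal distribution that falls below a given z-score."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Made-up heights in centimeters
heights = [160, 165, 170, 175, 180]
zs = z_scores(heights)

for height, z in zip(heights, zs):
    print(f"{height} cm: z = {z:.2f}, about {100 * share_below(z):.0f}% of cases below")
```

The observation at the mean (170 cm) gets z = 0, with half of the cases expected below it.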
Can you describe the difference between univariate, bivariate, and multivariate analysis?
Univariate analysis: variables are reported individually. In other words, the analysis is of one variable.
Bivariate analysis: is centered on the relationship between two variables.
Multivariate analysis: occurs when statistical tools take account of three or more variables simultaneously.
We use the terms categorical and continuous variables often. What are they and how are they related to types of measures? Why is this distinction important for analysis?
Categorical variables can also be called discrete or qualitative variables. Their levels of measurement are nominal and ordinal. Whether a variable is categorical depends on the nature of its values. Religion, for example, can only be classified as categorical (nominal) because its values do not represent any ranking or inherent order.
Continuous variables can also be called quantitative variables. The numerical values of these variables can be subject to legitimate arithmetic operations. Their levels of measurement are interval and ratio.
The distinction is important because it guides the choice of statistical tools for an analysis.
What are crosstabs? What types of variables are most appropriate to use in crosstabs? Can you interpret them?
A bivariate method to analyze the strength and form of the relationship between two variables
Use crosstabs when both variables are categorical (either ordinal or nominal)
Definition: a table displaying the frequencies of intersecting values of two variables (an independent and a dependent variable)
They are useful for establishing the form of the relationship between two variables; however, they cannot be used in univariate or multivariate analysis. They are useful for preliminary hypothesis testing
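To make the definition concrete, here is a small sketch that builds a crosstab by hand with collections.Counter. The survey rows and category labels are made up, and percentaging within categories of the independent variable is shown as one common way to read such a table.

```python
from collections import Counter

# Made-up survey rows: (independent variable, dependent variable)
rows = [
    ("college", "voted"), ("college", "voted"),
    ("college", "voted"), ("college", "did not vote"),
    ("no college", "voted"), ("no college", "did not vote"),
    ("no college", "did not vote"), ("no college", "did not vote"),
]

cell_counts = Counter(rows)                     # frequency of each intersecting pair of values
column_totals = Counter(iv for iv, _ in rows)   # number of cases in each IV category


def column_percent(iv_value, dv_value):
    """Percentage of cases in an IV category that fall in a given DV category."""
    return 100 * cell_counts[(iv_value, dv_value)] / column_totals[iv_value]


for dv_value in ("voted", "did not vote"):
    cells = [f"{column_percent(iv_value, dv_value):.0f}%" for iv_value in ("college", "no college")]
    print(dv_value, cells)
```

To interpret the table, compare percentages across the categories of the independent variable: here 75% of the "college" group voted versus 25% of the "no college" group.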
We talk a lot about form, strength and precision. Can you define them? How are they different? Can you discuss these terms in the context of crosstabs and regression analysis?
Form: refers to the structure of the connection and answers the question “what kind of relationship is it”
Strength: addresses how much impact one variable has on another
Precision: refers to how well the regression line approximates the actual data and, by extension, how appropriate the descriptions of form and strength are.
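In a bivariate regression these three ideas can be read off the fitted line. The sketch below is one hedged way to do so, assuming the sign of the slope describes form (positive or negative), the size of the slope describes strength, and R-squared describes precision. The data and the function name simple_regression are made up for illustration.

```python
def simple_regression(x, y):
    """Least-squares line for one IV and one DV; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total  # share of variation in y captured by the line
    return slope, intercept, r_squared


# Made-up data: x is the independent variable, y is the dependent variable
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept, r_squared = simple_regression(x, y)
print(f"form: {'positive' if slope > 0 else 'negative or flat'}")
print(f"strength: y changes by {slope:.2f} for each one-unit increase in x")
print(f"precision: R-squared = {r_squared:.2f}")
```

For these data the line is y = 2.2 + 0.6x with an R-squared of 0.6, so the relationship is positive but the line fits the points only moderately well.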
Using your own words, can you explain regression analysis? What is it? How is it used? What types of measures can we use with regression analysis?
Regression is a technique used to model and analyze the relationships between variables, and often how several variables together contribute to producing a particular outcome. A linear regression is a regression model in which the dependent variable is modeled as a linear (straight-line) function of the independent variables
- Multiple regression estimates each independent variable's effect on the dependent variable (DV), while taking into account the effects of the other independent variables
- Multiple regression can be used to identify
o Which independent variables have the strongest relationship with the DV
o How much impact each independent variable has on the DV
o How the set of independent variables jointly affects the DV
o How well the set of independent variables explains the DV
- In this way, multiple regression models a multi-causal approach to explaining the DV
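A minimal sketch of multiple regression with two independent variables, written in plain Python so that each step is visible. It solves the least-squares normal equations by Gauss-Jordan elimination; in practice a statistics package would do this for you. The function names and data are hypothetical, and the DV is generated exactly from y = 1 + 2*x1 + 3*x2 (no noise) so the recovered coefficients can be checked.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_regression(iv_rows, dv):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in iv_rows]  # leading 1.0 estimates the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, dv)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data: each row holds (x1, x2); dv is built as 1 + 2*x1 + 3*x2
iv_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
dv = [1 + 2 * x1 + 3 * x2 for x1, x2 in iv_rows]

intercept, b1, b2 = multiple_regression(iv_rows, dv)
print(f"intercept = {intercept:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Each coefficient is read as the change in the DV for a one-unit increase in that independent variable while the other is held constant, which is what "taking into account" the other variables means here.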