2: Descriptive Statistics And Inferential Statistics Flashcards
Measures of ______.
- Mean
- Median
- Mode
Central Tendency
Measures of Central Tendency:
Arithmetic ____ is the sum of all values divided by the number of values.
Mean
Measures of Central Tendency:
The middle number.
Median
Measures of Central Tendency:
The most frequently occurring number.
Mode.
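The three measures of central tendency above can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the scores are made up for illustration.

```python
import statistics

# Hypothetical exam scores, chosen only for illustration.
scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)      # sum of all values / number of values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```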
It provides insights into how spread out or scattered the data points in a data set are.
Measures of Variations
Measures of __________:
- Range
- Variance
- Standard Deviation (SD)
- Interquartile Range
- Coefficient of Variation
Variations
Measures of Variations:
Simplest measure of variation. Difference between the maximum and minimum values in a data set. Its drawbacks: it is sensitive to outliers and may not give a complete picture of data dispersion.
Range
Measures of Variations:
The average squared difference from the mean, quantifying how far individual data points deviate from it.
Variance
Measures of Variations:
Square root of variance, showing average data dispersion, higher values indicating more variability.
Standard Deviation
Measures of Variations:
The range between 25th and 75th percentiles, less affected by outliers.
Interquartile Range
Measures of Variations:
Standard deviation relative to the mean, used for comparing variability in different datasets.
Coefficient of Variation
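The five measures of variation above can be computed side by side. This is a rough sketch with the standard `statistics` module; the data are hypothetical, the population formulas (`pvariance`, `pstdev`) are an assumption, and the quartiles use the module's default method, so other tools may report slightly different cut points.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

data_range = max(data) - min(data)            # maximum minus minimum
variance = statistics.pvariance(data)         # average squared deviation from the mean
sd = statistics.pstdev(data)                  # square root of the variance
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points (default method)
iqr = q3 - q1                                 # spread of the middle 50% of the data
cv = sd / statistics.mean(data)               # SD relative to the mean

print(data_range, variance, sd, iqr, cv)
```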
________ is the average of the squared distances (deviations) from the mean:
To bring it back to the original units, take its square root, which gives the Standard Deviation.
Variance
- Most common measure of variation used to describe data.
- Most confusing concept.
- Understanding it is essential to understanding statistics.
Standard Deviation (SD)
In a normal distribution, this rule approximates the share of values within 1, 2, and 3 SDs of the mean: 68%, 95%, and 99.7%.
Empirical Rule
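The 68%, 95%, and 99.7% figures can be recovered from the standard normal distribution. A minimal sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

# Share of values within k standard deviations of the mean.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```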
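Often described loosely as the probability that the study results are due to chance; more precisely, it is the probability of results at least as extreme as those observed, assuming the null hypothesis is true.
Knowing this is the first step in avoiding common errors in statistical interpretation.
It puts a number on uncertainty but cannot eliminate uncertainty.
It is not the probability that the null hypothesis is true.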
P-Value
A __________ is merely a statement of fact, which can be true or false.
Hypothesis
In _________ hypothesis testing, one takes the hypothesis of interest and translates it with “not” into a null hypothesis, and then looks for evidence to reject the null.
Classical
According to the Dictionary of Epidemiology, ______ is the probability that a test statistic would be as extreme as or more extreme than observed if the null hypothesis were true.
P-Value
The near-impossibility of obtaining a truly random ______ is the first limitation and threat to the accuracy of the P-Value and to the ability to generalize results to a larger population.
Sample
The distribution of the test statistic _ has a mean of 0 and a standard deviation of 1.
Z
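The Z statistic and the P-value definition above can be tied together in a short sketch. It assumes a one-sample z-test with a known population SD; all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical study: sample mean 103 from n = 36, null-hypothesis mean 100,
# known population SD 15.
sample_mean, null_mean, sigma, n = 103, 100, 15, 36

# Z: how many standard errors the sample mean lies from the null mean.
z = (sample_mean - null_mean) / (sigma / sqrt(n))

# Two-sided P-value: probability of a Z at least this extreme
# if the null hypothesis were true.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(p_two_sided, 3))
```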
True or False.
A nonsignificant P value is good evidence that the null hypothesis is true.
False.
Absence of evidence is not evidence of absence. Other evidence is needed to appropriately accept the null hypothesis.
Two types of Parameters:
- Metric Level (Quantitative Data)
- Categorical Parameters (Qualitative Data)
“Everything is related to everything else.” (Meehl, 1990b)
Crud Factor.
“But the notion that the correlation between arbitrarily paired trait variables will be, while not literally zero, of such miniscule size as to be of no importance, is surely wrong.”
It is a way to understand and quantify the relationship between 2 or more variables.
Regression
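For the simplest case of regression, one predictor and one outcome, the relationship can be quantified with a least-squares line. This is a minimal sketch; the data and the helper name `fit_line` are hypothetical.

```python
# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # predicted score after 6 hours of study

print(slope, intercept, predicted)
```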
Karl Popper, a philosopher of science, made one of the most significant contributions to philosophy and science.
“You can never prove a theory beyond all doubt, but you can disprove it through evidence.”
It is a process of representing our theories as null hypotheses and subjecting them to challenge.
Popperian Principle/Falsifiability
2 major types of sampling:
Probability Sampling and Non-probability Sampling.
Probability Sampling - each member of the population has a known, nonzero chance of being selected for the survey (an equal chance under simple random sampling).
Non-probability Sampling - does not involve random sampling.
Types of Probability Sampling: (3)
- Random Sampling - most basic form. Randomly choosing.
- Stratified Random Sampling - separating the population into groups and randomly choosing from each group in the same ratio. For example, your population is the undergraduate and graduate students in a university. If your university has 70% undergraduates and 30% graduates, your sample will have a similar ratio.
- Cluster Sampling - used when a population is spread over a large geographic region. For example, if you need a sample from all over the Philippines, you will randomly choose 10 provinces, and choose random people in each selected province.
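The three probability sampling types above can be sketched with Python's `random` module. The population, group sizes, and province names are hypothetical placeholders.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 70 undergraduates and 30 graduates.
population = [("undergrad", i) for i in range(70)] + [("grad", i) for i in range(30)]

# Random sampling: every member can be drawn.
simple = rng.sample(population, 10)

# Stratified random sampling: sample each group separately, keeping the 70/30 ratio.
undergrads = [p for p in population if p[0] == "undergrad"]
grads = [p for p in population if p[0] == "grad"]
stratified = rng.sample(undergrads, 7) + rng.sample(grads, 3)

# Cluster sampling: randomly choose 10 provinces, then random people within each.
provinces = {f"province_{k}": [f"person_{k}_{i}" for i in range(20)] for k in range(30)}
chosen = rng.sample(sorted(provinces), 10)
cluster = [person for name in chosen for person in rng.sample(provinces[name], 2)]

print(len(simple), len(stratified), len(cluster))
```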
Types of Non-probability Sampling: (10)
Note. For reading purposes only lol
- Expert Sampling
a. Description: Experts in a particular field are selected to participate in the study.
b. Example: Conducting interviews with renowned psychologists on a specific psychological theory.
- Availability Sampling
a. Description: Participants are selected based on their availability at a particular time or place.
b. Example: Surveying attendees at a specific event.
- Quasi-Random Sampling
a. Description: Participants are selected using a method that appears to be random, but there may be some level of researcher discretion.
b. Example: Drawing names from a hat after assigning numbers to participants.
- Homogeneous Sampling
a. Description: Selecting participants who share similar characteristics or experiences related to the research topic.
b. Example: Studying only psychology majors to understand their academic performance.
- Heterogeneous Sampling
a. Description: Selecting participants who have a wide range of characteristics or experiences related to the research topic.
b. Example: Studying students from different majors to explore their career aspirations.
- Volunteer (Self-selection) Sampling
a. Description: Participants voluntarily choose to participate in the study.
b. Example: Responding to an advertisement for a research study.
- Quota Sampling
a. Description: The researcher sets specific quotas for different subgroups to ensure representation. Participants within those quotas are then selected using convenience or judgmental methods.
b. Example: Ensuring equal representation of different age groups in a survey.
- Snowball Sampling
a. Description: Initial participants (seeds) are identified and then asked to refer others who meet the inclusion criteria.
b. Example: Researching a stigmatized group where direct access is challenging.
- Purposive (Judgmental) Sampling
a. Description: Participants are selected based on specific characteristics or qualities that are relevant to the research question.
b. Example: Selecting individuals with a particular expertise for an expert panel.
- Convenience Sampling
a. Description: Participants are chosen based on their easy accessibility or availability to the researcher.
b. Example: Surveying people passing by in a mall.
It states that the larger the sample, the more closely the distribution of the sample means approximates a normal distribution.
Central Limit Theorem
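The theorem can be illustrated by simulation. This sketch draws repeated samples from a skewed (exponential) population, an assumption chosen only for illustration, and compares the means of small and large samples. For a normal distribution, about 68% of values fall within one SD of the mean.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_means(sample_size, repeats=2000):
    """Means of repeated samples from a skewed (exponential) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repeats)
    ]

def share_within_one_sd(values):
    """Fraction of values within one SD of their own mean (about 0.68 if normal)."""
    center = statistics.fmean(values)
    spread = statistics.stdev(values)
    return sum(abs(v - center) <= spread for v in values) / len(values)

small = sample_means(2)   # means of samples of size 2
large = sample_means(50)  # means of samples of size 50

print(round(share_within_one_sd(small), 2), round(share_within_one_sd(large), 2))
print(round(statistics.stdev(small), 2), round(statistics.stdev(large), 2))
```

With the larger sample size, the means cluster more tightly around the population mean and their share within one SD moves toward the normal value.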
An analysis used to determine the relationship between two variables.
Under this analysis, there are specific analyses to use:
- Scatter plots: A graphical representation of the relationship between two continuous variables.
- Correlation analysis: Measures the strength and direction of association between two continuous variables.
- T-tests: Compares means of two groups to determine if there’s a significant difference.
- Chi-square tests: Examines the association between two categorical variables.
Bivariate Analysis
A table in which each cell represents a unique combination of values. It gives you a visual comparison of the data but does not test statistical significance.
Cross Tabulation (Cross Tab)
It is a test that looks at each cell in a cross-tabulation and measures the difference between what was observed and what would be expected if there were no relationship between the variables.
It is one of the most important statistics when you are assessing the relationship between ordinal and/or nominal measures.
It cannot be used if any cell has an expected frequency of zero, or a negative integer.
It also provides a p-value, like the T-test.
Chi-Square
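The observed-versus-expected comparison described above can be worked by hand for a small cross-tabulation. This is a sketch on a hypothetical 2x2 table; the p-value shortcut through the normal distribution is valid only when there is 1 degree of freedom.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical 2x2 cross-tabulation: rows = study technique, columns = pass / fail.
observed = [[30, 10],
            [20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count in each cell if the two variables were unrelated.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With df = 1, chi-square is a squared standard normal value,
# so the p-value can be read from the normal distribution.
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi_square)))

print(round(chi_square, 3), df, round(p_value, 3))
```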
It compares the means of two groups. For more than two groups, use ANOVA.
It involves means, therefore the dependent variable must be a ratio variable (e.g. exam score). The independent variable is nominal or ordinal (e.g. study technique).
T-test
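The comparison of two group means can be sketched as a t statistic computed by hand. The scores are hypothetical, and the Welch form (which does not assume equal variances) is an assumption here. Turning the statistic into a p-value needs the t distribution, which the standard library does not provide, so that step is left out.

```python
from math import sqrt
import statistics

# Hypothetical exam scores for two study techniques.
group_a = [78, 82, 85, 88, 92]
group_b = [70, 74, 77, 79, 80]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)  # sample variances

# Welch's t: difference in means divided by its standard error.
t_stat = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)

print(mean_a, mean_b, round(t_stat, 3))
```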
It measures the strength of association between two variables, and reveals whether the correlation is negative or positive.
Correlation Coefficients
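As one concrete case, Pearson's r for two ratio-level variables can be computed directly from its definition. This is a minimal sketch; the data and the helper name `pearson_r` are hypothetical.

```python
from math import sqrt

# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, scores)                           # positive: scores rise with hours
r_reversed = pearson_r(hours, [-s for s in scores])    # same strength, negative direction

print(round(r, 3), round(r_reversed, 3))
```

The sign gives the direction of the relationship and the magnitude (between 0 and 1) gives its strength.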
It is the number of values that can vary in the estimation of a parameter. It is calculated depending on the statistical test or analysis you are using.
Degrees of Freedom
Correlation coefficients are used whenever we want to test the strength of a relationship.
There are many tests to measure correlation; which one to use depends on what variables you are examining.
_______ variables: Phi, Cramer’s V, Lambda, Goodman and Kruskal’s Tau.
Nominal
Correlation coefficients are used whenever we want to test the strength of a relationship.
There are many tests to measure correlation; which one to use depends on what variables you are examining.
_______ variables: Gamma, Somers' D, Spearman’s Rho.
Ordinal
Correlation coefficients are used whenever we want to test the strength of a relationship.
There are many tests to measure correlation; which one to use depends on what variables you are examining.
_______ variables: Pearson r
Ratio
A statistical test that compares the means of more than two groups.
The dependent variable must be ratio. Independent variable must be nominal or ordinal.
It shows there are significant differences between groups but it does not illustrate where the significance lies.
ANOVA
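A one-way ANOVA boils down to an F statistic: variation between group means divided by variation within groups. This is a sketch on hypothetical scores for three study techniques. As the card notes, a large F says the groups differ somewhere, not which groups differ.

```python
import statistics

# Hypothetical exam scores for three study techniques.
groups = {
    "flashcards": [80, 85, 90],
    "rereading": [70, 75, 80],
    "practice_tests": [85, 90, 95],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = statistics.mean(all_values)
k = len(groups)       # number of groups
n = len(all_values)   # total number of observations

# Variation of the group means around the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values())
# Variation of individual scores around their own group mean.
ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups.values() for v in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))

print(round(f_stat, 3))
```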
Because ANOVA cannot show where the significance lies, we use this test to determine which groups differ.
Post Hoc Comparison
It looks at the relationship between more than two variables.
Examples:
- Multiple Regression: Examines the relationship between a dependent variable and two or more independent variables.
- Factor Analysis: Reduces a large number of variables into a smaller set of underlying factors.
- Principal Component Analysis (PCA): Similar to factor analysis, it identifies underlying patterns in the data.
- Cluster Analysis: Groups observations or variables based on similarity.
- MANOVA (Multivariate Analysis of Variance): Examines the differences in means across multiple dependent variables.
Multivariate Analysis
It is preferred over running multiple ANOVA tests because it reduces the risk of Type I errors.
MANOVA
It examines the relationship between one effect variable, called the dependent or outcome variable, and one or more predictors, called independent variables.
Multiple Linear Regression