Research and Assessment Methods Flashcards
An approach for understanding the meaning individuals and groups ascribe to a human or social problem
Emerging questions
Flexible written report
Qualitative research
An approach for testing objective theories by examining the relationships among variables (deductive)
Numbered data which can be analyzed using statistical procedures
Structured written report
Quantitative research
Collection of both qualitative and quantitative data
Integrating the two forms of data
May involve both philosophical assumptions and theoretical frameworks
Mixed methods research
A research method focusing on the study of a single episode. Usually it is not designed to compare one individual or group to another, although sometimes it may be included in comparative analysis as a key or illustrative example.
Case Study Method
Analysis where data from different settings or groups at the same point in time or from the same settings or groups over a period of time are analyzed to identify similarities and differences.
Comparative Analysis
A study of the way versions of the world, society, events, and psyche are produced in the use of language. It is often concerned with the construction of subjects within various forms of knowledge/power. Semiotics, deconstruction, and narrative analysis are forms of discourse analysis.
Discourse Analysis
Also known as e-Science or e-Social Science, the harnessing of any digital technology to undertake and promote social research. This includes treating the digital sphere as a site of research by examining social interaction in the e-infrastructure.
e-Research
A multi-method qualitative (participant observation, interviews, discourse analyses of natural language and personal documents) approach that studies people in their “…naturally occurring settings or ‘fields’ by means of methods which capture their social meanings and ordinary activities, involving the researcher participating directly in the setting…”
Ethnography
a researcher goes to observe an everyday event in the environment where it occurs.
Field Research
An inductive form of qualitative research where data collection and analysis are conducted together. Theories remain rooted in the observations rather than generated in the abstract. An approach that develops the theory from the data collected, rather than applying a theory to the data.
Grounded Theory
a form of discourse analysis that seeks to study the textual devices at work in the constructions of process or sequence within a text.
The respondent gives a detailed account of themselves and is encouraged to tell their story rather than answer a predetermined list of questions.
This method is more successful when people are discussing a life changing event. Analysis of the narrative tells the researcher about the person’s understanding of the meaning of events in their lives.
Narrative Analysis
steps in the statistical process
(1) collect data (e.g., surveys); (2) describe and summarize the distribution of the values in the data set; and (3) interpret by means of inferential statistics and statistical modeling (i.e., draw general conclusions for the population on the basis of the sample).
classified into mutually exclusive groups or categories and lack intrinsic order. A zoning classification, social security number, and sex are examples of nominal data. The label of the categories does not matter and should not imply any order. So, even if one category might be labeled as 1 and the other as 2, those labels can be switched.
Nominal data
ordered categories implying a ranking of the observations. Even though ordinal data may be given numerical values, such as 1, 2, 3, and 4, the values themselves are meaningless. Only the rank counts. It would be incorrect to infer, for example, that 4 is twice 2, despite the temptation. Examples of ordinal data include letter grades, suitability for development, and response scales on a survey (e.g., 1 through 5).
Ordinal data
has an ordered relationship where the difference between values on the scale has a meaningful interpretation, but there is no true zero, so ratios are not meaningful. The typical example of interval data is temperature, where the difference between 40 and 30 degrees is the same as between 30 and 20 degrees, but 40 degrees is not twice as warm as 20 degrees.
Interval data
the gold standard of measurement, where both absolute and relative differences have a meaning. The classic example of ratio data is a distance measure, where the difference between 40 and 30 miles is the same as the difference between 30 and 20 miles, and in addition, 40 miles is twice as far as 20 miles.
Ratio data
Type of variable that represents interval or ratio measurements
Quantitative variable
Type of variable that represents nominal or ordinal measurement
Qualitative variable
take an infinite number of values, both positive and negative, and with as fine a degree of precision as desired
Continuous variables
can only take on a finite (or at least countable) number of distinct values, such as counts
Discrete variables
can only take on two values, typically coded as 0 and 1
binary or dichotomous variables
the totality of some entity
population
subset of the population
sample
the characteristics of the distribution of values in a population or in a sample
Descriptive Statistics. The context will make clear whether the statistic pertains to the population (all values known), or to a sample (only partial observations).
use probability theory to determine characteristics of a population based on observations made on a sample from that population
Inferential Statistics - We infer things about the population based on what is observed in the sample.
the overall shape of all observed data. It can be listed as an ordered table, or graphically represented by a histogram or density plot
Distribution - A histogram groups observations in bins represented as what’s commonly referred to as a bar chart. A density plot shows a smooth curve.
a typical or representative value for the distribution of observed values. There are several ways to measure central tendency, including mean, median, and mode
Central tendency
lack of symmetry (in distribution of data)
skewness
presence of thick tails
kurtosis (i.e., a high likelihood of extreme values)
the difference between the largest and the smallest value
range
symmetric and has the additional property that the spread around the mean can be related to the proportion of observations. Often used as the reference distribution for statistical inference.
normal or Gaussian distribution, also referred to as the bell curve - about 95% of observations are within two standard deviations of the mean
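The link between spread and proportion of observations can be checked numerically. The following is a minimal sketch using the standard library's `NormalDist`; the helper name `share_within` is an illustrative assumption, not part of the flashcards.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_one = share_within(1.0)   # roughly 68%
within_two = share_within(2.0)   # roughly 95%

print(f"within 1 standard deviation:  {within_one:.4f}")
print(f"within 2 standard deviations: {within_two:.4f}")
```

The two-standard-deviation figure is slightly above 95%, which is why the rule is usually stated as an approximation.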
an equal number of observations are below and above the mean (e.g., this is the case for the normal distribution)
Symmetric distribution
where there are either more observations below the mean or more above the mean
Asymmetric distribution, or skew
average of a distribution - computed by adding up the values and dividing by the number of observations
mean
greater importance placed on specific entries or when representative values are used for groups of observations
weighted mean - For example, when computing the mean income among a number of counties, the value for each county could be multiplied by the number of people in the county, and the sum of these products divided by the total population, yielding a population-weighted mean
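A small sketch of the county example, assuming two made-up counties; the income and population figures and the function name `weighted_mean` are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical counties as (mean income, population) pairs. Values are invented.
counties = [(40_000, 1_000), (60_000, 3_000)]


def weighted_mean(pairs):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total_weight


# Unweighted mean treats both counties equally.
simple = sum(value for value, _ in counties) / len(counties)

# Population-weighted mean is pulled toward the larger county.
weighted = weighted_mean(counties)

print(simple, weighted)
```

Here the weighted result sits closer to the larger county's income, which is the point of weighting by population.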
the middle value of a ranked distribution
median
the most frequent number in a distribution
mode
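The three measures of central tendency can be compared on one small data set. This is a sketch using Python's standard `statistics` module; the values are invented, with one large value included to show how the mean reacts to it.

```python
from statistics import mean, median, mode

# Hypothetical observations; 12 is deliberately much larger than the rest.
values = [2, 3, 3, 5, 12]

mean_value = mean(values)      # sum of values divided by their count
median_value = median(values)  # middle value of the ranked data
mode_value = mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```

The mean is pulled up by the large value, while the median and mode stay with the bulk of the data.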
the average squared deviation from the mean
variance
square root of the variance
standard deviation
For example, for [1, 2, 3, 4, 5], the mean is 15/5 = 3.
The squared deviations for the observations are (1 - 3)² = 4, (2 - 3)² = 1, (3 - 3)² = 0, (4 - 3)² = 1, and (5 - 3)² = 4.
The sum of these squared deviations is 4 + 1 + 0 + 1 + 4 = 10. The variance is this value divided by the number of observations, or 10/5 = 2.
The standard deviation is the square root of the variance, or √2 = 1.41…
with samples, the mean is estimated rather than known. Because the mean has to be computed from the data first, one degree of freedom is used up, so the sum of squared deviations is divided by n - 1 instead of n
degrees of freedom correction
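The worked example for [1, 2, 3, 4, 5] and the n - 1 correction can be reproduced in a few lines. This is a sketch; the variable names are illustrative, and the standard library functions `pvariance` (divide by n) and `variance` (divide by n - 1) are used only as a cross-check.

```python
from math import sqrt
from statistics import pvariance, variance

data = [1, 2, 3, 4, 5]
n = len(data)

mean_value = sum(data) / n
sum_of_squares = sum((x - mean_value) ** 2 for x in data)

# Population formula: divide by the number of observations.
population_variance = sum_of_squares / n
population_sd = sqrt(population_variance)

# Sample formula: divide by n - 1 (degrees of freedom correction).
sample_variance = sum_of_squares / (n - 1)

print(population_variance, population_sd, sample_variance)

# Cross-check against the standard library.
print(pvariance(data), variance(data))
```

The corrected sample variance is always a little larger than the uncorrected one, and the gap shrinks as n grows.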
observations that lie outside the range of two standard deviations below and above the mean in a normal distribution (about 5% of data points)
outlier
measures the relative dispersion from the mean by taking the standard deviation and dividing by the mean
coefficient of variation
a standardization of the original variable by subtracting the mean and dividing by the standard deviation
z-score - For example, a z-score of more than 2 in absolute value means the observation is more than two standard deviations away from the mean, i.e., it is an outlier in the sense just defined.
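The coefficient of variation and the z-score can both be computed from the mean and standard deviation. The sketch below assumes a made-up data set with one unusually large value and uses the population standard deviation (`pstdev`); a sample standard deviation would give slightly different numbers.

```python
from statistics import mean, pstdev

# Hypothetical observations; 30 is far from the rest.
values = [10, 12, 11, 13, 12, 11, 30]

center = mean(values)
spread = pstdev(values)  # population standard deviation (divides by n)

# Coefficient of variation: dispersion relative to the mean.
coefficient_of_variation = spread / center

# z-score: subtract the mean, divide by the standard deviation.
z_scores = [(v - center) / spread for v in values]

# Flag observations more than two standard deviations from the mean.
outliers = [v for v, z in zip(values, z_scores) if abs(z) > 2]

print(round(coefficient_of_variation, 3))
print([round(z, 2) for z in z_scores])
print(outliers)
```

Standardized values always have mean 0, which makes z-scores from different variables comparable.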
An alternative measure of dispersion and outliers: the difference in value between the 75th percentile and the 25th percentile, i.e., between the 3/4 cut-off value and the 1/4 cut-off value in a set of ranked values
inter-quartile range or IQR - Two fences are computed as the first quartile less 1.5 times the IQR and the third quartile plus 1.5 times the IQR. Observations that are outside these fences are termed outliers. This is visualized in a box plot (also called box and whiskers plot).
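The fence rule can be sketched as follows. The data are invented, and the quartiles come from `statistics.quantiles` with `method="inclusive"`; other quartile conventions exist and would shift the fences slightly, so treat the choice as an assumption.

```python
from statistics import quantiles

# Hypothetical observations; 50 is far above the rest.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# Cut points that split the ranked data into four parts.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences: 1.5 times the IQR below the first and above the third quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, q3, iqr)
print(lower_fence, upper_fence)
print(outliers)
```

Unlike the two-standard-deviation rule, the fences are built from ranks, so a single extreme value does not inflate them.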
a statement about a particular characteristic of a population (or several populations)
hypothesis test
the point of departure or reference
null hypothesis (H0)
the research hypothesis, which the researcher hopes to support by rejecting the null hypothesis
alternative hypothesis - the alternative hypothesis is never accepted outright; there is only statistical evidence to reject (or fail to reject) the null
a way to operationalize a hypothesis test
test statistic
variation between the population statistic and the sample statistic
sampling error
model misspecification, the assumptions of the model are wrong
systematic error
p-value or Type 1 Error
Significance, or the chance that the statistical decision is wrong in the sense of rejecting a null hypothesis that is actually true (a Type I error). 5% or 1% is the usual benchmark.
a range around the sample statistic that contains the population statistic with a given level of confidence, typically 95% or 99%
confidence interval
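A 95% confidence interval for a mean can be sketched as the sample mean plus or minus a critical value times the standard error. The sample below is invented, and the normal critical value (about 1.96) is an assumption made for simplicity; with a sample this small, a t critical value would normally be used and would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of eight measurements.
sample = [48, 52, 50, 47, 53, 51, 49, 50]
confidence = 0.95

# Two-sided critical value from the standard normal distribution.
critical_value = NormalDist().inv_cdf(0.5 + confidence / 2)

sample_mean = mean(sample)
# Standard error: sample standard deviation (n - 1) over the square root of n.
standard_error = stdev(sample) / sqrt(len(sample))

lower = sample_mean - critical_value * standard_error
upper = sample_mean + critical_value * standard_error

print(f"{sample_mean} [{lower:.3f}, {upper:.3f}]")
```

Raising the confidence level to 99% widens the interval, since a larger critical value is needed.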
used to compare the means of two populations based on their sample averages; for testing the significance of a regression coefficient
t-test
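The test statistic for comparing two sample means can be sketched by hand. The version below is the unequal-variance (Welch) form, which is an assumption since the flashcard does not say which variant is meant; the two groups are invented. Turning the statistic into a p-value requires the t distribution, which is not in Python's standard library, so that step is left out.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical samples from two groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]


def welch_t(a, b):
    """Difference in sample means divided by its estimated standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


t_statistic = welch_t(group_a, group_b)
print(t_statistic)
```

A large absolute value of the statistic means the difference in means is large relative to the sampling variation.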
a more complex form of testing the equality of means between groups; for treatment effects analysis where the outcome of a variable is compared between a treatment group and a control group
ANOVA or analysis of variance
a measure of fit. It is a test that assesses the difference between a sample distribution and a hypothesized distribution. A Chi Square test is often used to test the null hypothesis of independence in a contingency table, i.e. when the observations are grouped according to two categorical variables
Chi Square test
a skewed distribution that is obtained by taking the square of a standard normal variable (so, it only takes positive values)
Chi Square distribution
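A test of independence in a contingency table can be sketched by comparing observed counts with the counts expected under independence. The 2 x 2 table below is invented. The p-value shortcut relies on the fact that, with one degree of freedom, the statistic behaves like a squared standard normal variable; it does not apply to larger tables.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical 2 x 2 table of counts (rows and columns are two categorical variables).
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if the two variables were independent.
        expected = row_totals[i] * column_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid only for one degree of freedom: P(Z^2 > x) = 2 * (1 - Phi(sqrt(x))).
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi_square)))

print(chi_square, degrees_of_freedom, round(p_value, 4))
```

With these counts the p-value falls just under 5%, so independence would be rejected at the 5% benchmark but not at 1%.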
measures the strength of a linear relationship between two variables. Note that, very importantly, this does not imply anything about causation, i.e., whether one variable influences the other
correlation coefficient - computed by standardizing each of the variables; its value is between -1 and +1. The square of a correlation coefficient is often referred to as r² (or R²), i.e., r-squared.
high values of one variable match high values of the other, and low values match low values
positive correlation
high values of one variable match low values of the other, and vice versa
negative correlation
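The correlation coefficient can be sketched as the average product of the standardized values of the two variables. The data and the helper names `standardize` and `correlation` are illustrative assumptions.

```python
from statistics import mean, pstdev


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    center, spread = mean(values), pstdev(values)
    return [(v - center) / spread for v in values]


def correlation(x, y):
    """Average product of the standardized values; lies between -1 and +1."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 5]    # tends to rise with x
y_falling = [5, 4, 5, 4, 2]   # tends to fall as x rises

r_positive = correlation(x, y_rising)
r_negative = correlation(x, y_falling)

print(round(r_positive, 4), round(r_negative, 4))
print(round(r_positive ** 2, 4))  # r-squared
```

The sign shows the direction of the linear relationship and the magnitude shows its strength; neither says anything about causation.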
the linear relation between two or more variables; typically a dependent variable (on the left-hand side of the equal sign) and one or more explanatory variables (on the right-hand side of the equal sign)
linear regression
y = a + b1x1 + b2x2 + e
typical regression equation - y is the dependent variable, for example, the outcome on the AICP test, and x1 and x2 are explanatory variables, such as the number of hours studied and the years of experience. The e stands for a random error term, since the variables observed are a sample from the population. The coefficient a is the intercept, and b1 and b2 are the slope coefficients. The coefficients of the linear regression are estimated by means of least squares, and their significance is interpreted by means of a t-test.
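Least squares estimation can be sketched for the simplest case of a single explanatory variable, y = a + b·x + e. This is a simplification of the two-variable equation on the card, and the hours and scores below are invented for illustration.

```python
from statistics import mean

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]


def least_squares(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    mean_x, mean_y = mean(x), mean(y)
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares(hours, scores)

# Residuals are the estimated error terms e for each observation.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(hours, scores)]

print(round(intercept, 3), round(slope, 3))
print([round(r, 3) for r in residuals])
```

With an intercept in the model, the residuals sum to zero, which is a quick sanity check on the fit.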