Topic 1 - data relationships Flashcards
What are cases?
Cases are the objects described by a set of data; these may be
customers, companies, subjects in a study, units in an experiment or other objects
What is a variable?
A variable is a characteristic of a case;
different cases can have different values of the variables
What are the 3 types of data, including the 2 subcategories of one of them,
and what do they mean?
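Categorical - labels with no natural order (gender, race, ...)
Ordinal - categories with a natural order (level of education, how much you agree, ...)
Numerical - discrete (a count: number of ...) or continuous (a measurement: weight, temperature, ...)
- an average can be computed only for numerical data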
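A minimal sketch of which summary suits which data type. All records and variable names below are invented for illustration, and the numeric coding of education (1 to 3) is an assumption, not part of the flashcards.

```python
from statistics import mean, median, mode

# Hypothetical survey records, invented for illustration only.
gender = ["F", "M", "F", "F", "M"]            # categorical: labels, no order
education = [1, 3, 2, 3, 2]                   # ordinal: assumed coding 1 < 2 < 3
children = [0, 2, 1, 3, 4]                    # numerical, discrete: a count
weight_kg = [61.5, 82.0, 70.0, 90.5, 76.0]    # numerical, continuous: a measurement

most_common_gender = mode(gender)       # counting labels is meaningful
middle_education = median(education)    # order is meaningful, so a median works
average_children = mean(children)       # arithmetic is meaningful, so a mean works
average_weight = mean(weight_kg)

print(most_common_gender, middle_education, average_children, average_weight)
```

Averaging the gender labels would make no sense, and averaging the education codes would depend entirely on the arbitrary coding.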
How do we know if two variables measured on the same cases are associated?
Knowing the values of one of the variables tells us something about the values of the other variable
that we would not know without this information
e.g. a larger number of books in the household is associated with higher grades for the children
When checking for causation between 2 variables, which one is the response variable and which is the explanatory variable?
Response variable (dependent) - this is the measure of the outcome of a study (e.g. grades)
Explanatory variable (independent) - this explains or causes changes in the response variable (e.g. books)
What's a scatterplot,
and what's on the vertical and horizontal axes?
A scatterplot is a graph showing the relationship between two quantitative variables measured on the same cases
vertical - y - dependent
horizontal - x - independent
What should you look at when interpreting a scatterplot?
Form - linear or non-linear
Direction - positive or negative
Strength of the relationship - strong or weak
Striking deviations from the pattern - outliers
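What does positively associated mean?
Above-average values of one variable tend to accompany above-average values of the other, and below-average values also tend to occur together
What does negatively associated mean?
Above-average values of one variable tend to accompany below-average values of the other, and vice versa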
What does correlation measure?
It measures the direction and strength of the linear relationship
between two quantitative variables
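As a rough sketch, the correlation r can be computed from the deviations of each variable from its mean. The function name `correlation` and all data below are invented for illustration.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means: its sign gives the direction.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Dividing by the spreads puts r between -1 and +1.
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
books = [10, 20, 30, 40, 50]
grades = [55, 60, 70, 72, 83]
hours_tv = [5, 4, 3, 2, 1]

r_books_grades = correlation(books, grades)   # positive and close to +1
r_tv_books = correlation(hours_tv, books)     # perfectly linear and decreasing
print(round(r_books_grades, 3), round(r_tv_books, 3))
```

The sign of r gives the direction and its distance from 0 gives the strength. Because r only captures linear relationships, a strong curved pattern can still produce an r near 0.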
What's a regression line?
A regression line is a straight line that describes how a response/dependent variable y changes as an explanatory variable x changes. (The regression line is the one that best approximates the points in the scatterplot)
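A minimal sketch of fitting such a line, assuming "best approximates" means least squares (the usual choice, though the card does not say so). The function name `least_squares_line` and the data are invented for illustration.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, invented for illustration only.
books = [10, 20, 30, 40, 50]     # explanatory variable x
grades = [55, 60, 70, 72, 83]    # response variable y

slope, intercept = least_squares_line(books, grades)
predicted_grade = intercept + slope * 25   # 25 books lies inside the observed range
print(round(slope, 2), round(intercept, 2), round(predicted_grade, 2))
```

The slope is the predicted change in y for each one-unit increase in x; here that is about 0.68 grade points per additional book.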
What is extrapolation used for?
though…
Extrapolation is the use of a regression line for prediction far outside the range of values of the explanatory variable x used to obtain the line
- Such predictions are often not accurate and should be avoided
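To see why, here is a small sketch with invented child-height data. The line fits well between ages 2 and 8, but using it at age 30 gives an absurd answer. The helper `least_squares_line` and all numbers are assumptions made for illustration.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical child heights, invented for illustration only.
age_years = [2, 4, 6, 8]
height_cm = [86, 102, 115, 128]

slope, intercept = least_squares_line(age_years, height_cm)

def predict(age):
    """Height predicted by the fitted line at the given age."""
    return intercept + slope * age

height_at_5 = predict(5)     # inside the observed range of 2 to 8 years
height_at_30 = predict(30)   # far outside the range: extrapolation
print(round(height_at_5, 2), round(height_at_30, 2))
```

The prediction at age 5 is plausible (about 108 cm), while the prediction at age 30 is over 2.8 metres, because growth does not stay linear outside the range the line was fitted on.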
What's an outlier?
An outlier is an observation that lies outside the overall pattern of the other observations
What happens with points that are outliers in the y direction?
Points that are outliers in the y direction of a scatterplot have large regression residuals,
but other outliers need not have large residuals
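A residual is the observed y minus the y predicted by the regression line. The sketch below uses invented data with one outlier in the y direction and one in the x direction, assuming a least-squares fit; the names are hypothetical.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data, invented for illustration only.
# (4, 18) is an outlier in the y direction; (20, 40) is an outlier in the x direction.
x = [1, 2, 3, 4, 5, 20]
y = [2, 4, 6, 18, 10, 40]

slope, intercept = least_squares_line(x, y)

# Residual = observed y minus the y predicted by the line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

for a, b, res in zip(x, y, residuals):
    print(a, b, round(res, 2))
```

The point (4, 18) has by far the largest residual, while (20, 40) has the smallest residual in absolute value: being far out in x, it pulls the line toward itself.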
What's a lurking variable?
A lurking variable is a variable that is neither an explanatory nor a response variable in a study
and yet may influence the interpretation of relationships among these variables
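A small sketch of how this can play out, using invented data in which parental education (a hypothetical lurking variable, not taken from the flashcards) drives both books and grades. Overall the two look strongly associated, yet within each education group there is no linear association at all.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical households, invented for illustration only,
# split by the assumed lurking variable: parental education.
books_low, grades_low = [10, 20, 30], [60, 58, 60]      # lower parental education
books_high, grades_high = [60, 70, 80], [80, 78, 80]    # higher parental education

r_low = correlation(books_low, grades_low)      # within the first group
r_high = correlation(books_high, grades_high)   # within the second group
r_all = correlation(books_low + books_high, grades_low + grades_high)

print(round(r_low, 3), round(r_high, 3), round(r_all, 3))
```

Ignoring the lurking variable here would suggest that books raise grades, when in this made-up data both simply differ between the two groups.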