Maths Flashcards
Describe 4 types of data
- Ratio - interval data with a natural (true) zero point. For example, time is ratio data because zero time is meaningful, and temperature in kelvins has a true zero (absolute zero). Both scales have equal-sized steps, and because zero means "none", ratios are meaningful: 20 seconds is twice as long as 10 seconds.
- Interval - like ordinal, except that the intervals between values are equal, so distance is meaningful; however, there is no true zero. The most common example is temperature in degrees Fahrenheit: the difference between 29 and 30 degrees is the same magnitude as the difference between 78 and 79 (although I know I prefer the latter). Attitudinal scales, such as the Likert questions you usually see on surveys, are rarely truly interval, although many points on such a scale are likely to be roughly equally spaced.
- Ordinal - refers to quantities that have a natural ordering, such as the ranking of favorite sports, the order of people’s places in a line, the order of runners finishing a race or, more often, the choice on a rating scale from 1 to 5. With ordinal data you cannot state with certainty whether the intervals between values are equal. For example, we often use rating scales (Likert questions): on a 10-point scale, the difference between a 9 and a 10 is not necessarily the same as the difference between a 6 and a 7. This one is also easy to remember: ordinal sounds like order.
- Nominal - refers to categorical data with no natural order, such as the name of your school, the type of car you drive or the name of a book. This one is easy to remember because nominal sounds like name (nominal comes from the Latin nomen, meaning "name").
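A small Python sketch can show why Fahrenheit is interval but not ratio data. It assumes the standard conversion K = (F - 32) x 5/9 + 273.15; the function name `fahrenheit_to_kelvin` is just an illustrative helper. Equal Fahrenheit steps stay equal after converting to kelvins, but the ratio 40/20 stops being 2 once you measure from a true zero.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins: K = (F - 32) * 5/9 + 273.15."""
    return (f - 32) * 5 / 9 + 273.15

# Interval property: equal differences stay equal after conversion.
step_low = fahrenheit_to_kelvin(30) - fahrenheit_to_kelvin(29)
step_high = fahrenheit_to_kelvin(79) - fahrenheit_to_kelvin(78)
print(round(step_low, 4), round(step_high, 4))  # 0.5556 0.5556

# Ratio property needs a true zero: 40 F is not "twice as hot" as 20 F.
print(40 / 20)  # 2.0 on the Fahrenheit numbers
print(fahrenheit_to_kelvin(40) / fahrenheit_to_kelvin(20))  # about 1.04 in kelvins
```

The same 1-degree Fahrenheit step is 5/9 of a kelvin wherever it occurs, which is exactly what "distance is meaningful" means.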
Enumeration or census v. sample
A census (enumeration) is a quantitative research method in which every member of the population is counted or measured. Sampling, on the other hand, is the widely used method in statistical testing in which a subset of data is selected from a large population to represent the entire group.
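As a rough illustration, the sketch below treats the values 1 to 1000 as a hypothetical population. The census mean uses every member; the sample mean uses 100 members drawn at random without replacement and only estimates it. The fixed seed (42) is an arbitrary choice to make runs repeatable.

```python
import random

# Hypothetical population of 1,000 measurements (simply the values 1..1000).
population = list(range(1, 1001))

# Census: enumerate every member of the population.
census_mean = sum(population) / len(population)  # 500.5

# Sample: draw 100 members without replacement to represent the whole group.
rng = random.Random(42)  # fixed seed so the example is repeatable
sample = rng.sample(population, k=100)
sample_mean = sum(sample) / len(sample)  # an estimate of census_mean

print("census mean:", census_mean)
print("sample mean:", sample_mean)
```

The sample mean will usually be close to, but not exactly, 500.5; that gap is the sampling error a census avoids at the cost of measuring everyone.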
Primary Data vs. Secondary Data
Primary data is first-hand data gathered by the researcher themself, for example through surveys, observations, experiments, questionnaires or personal interviews.
Secondary data is data collected earlier by someone else, for example government publications, websites, books, journal articles or internal records.
Mean
Measure of central tendency: mean = sum of items / count of items.
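A minimal Python sketch of the formula, using made-up values; `mean` here is an illustrative helper, and Python's built-in `statistics.mean` is shown alongside for comparison.

```python
import statistics

def mean(values):
    """Arithmetic mean: sum of items divided by count of items."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
print(mean(data))             # 5.0  (40 / 8)
print(statistics.mean(data))  # standard-library equivalent
```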
Median
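Measure of central tendency. Sort the items (high to low or low to high) and select the middle item. If there is an even number of items, the median is the mean of the two middle items.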
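A short Python sketch of the sort-and-pick rule, including the even-count case; `median` is an illustrative helper and the inputs are made up. It should agree with `statistics.median` from the standard library.

```python
import statistics

def median(values):
    """Middle item of the sorted data; mean of the two middle items if the count is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)  # sort direction does not change the middle item
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))      # 3    (sorted: 1, 3, 7)
print(median([7, 1, 3, 10]))  # 5.0  (sorted: 1, 3, 7, 10 -> (3 + 7) / 2)
print(statistics.median([7, 1, 3, 10]))
```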
Mode
Which value occurs most often.
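A minimal Python sketch that counts occurrences and returns every value tied for the highest count (so it also handles data with more than one mode). The helper name `modes` is illustrative; the standard library's `statistics.multimode` does the same job.

```python
import statistics
from collections import Counter

def modes(values):
    """Return every value that occurs most often (more than one if there is a tie)."""
    if not values:
        raise ValueError("modes requires at least one value")
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([2, 4, 4, 4, 5, 5, 7, 9]))  # [4]
print(modes([1, 1, 2, 2, 3]))           # [1, 2]
print(statistics.multimode([1, 1, 2, 2, 3]))
```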
Bimodal distribution
When two clearly separate groups are visible in a histogram, you have a bimodal distribution. Literally, a bimodal distribution has two modes, or two distinct clusters of data.
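One way to see a bimodal shape is to bin the data into a histogram and count the peaks. The sketch below is a naive check under stated assumptions: the finishing times are hypothetical, the bin width of 5 is an arbitrary choice, and `histogram` / `count_peaks` are illustrative helpers. On noisy real data, small bumps would also be counted as peaks.

```python
from collections import Counter

def histogram(values, bin_width):
    """Return (bin_start, count) pairs, including empty bins between clusters."""
    counts = Counter(int(v // bin_width) for v in values)
    first, last = min(counts), max(counts)
    return [(b * bin_width, counts.get(b, 0)) for b in range(first, last + 1)]

def count_peaks(hist):
    """Count bins taller than the bin before them and not lower than the bin after.

    The heights are padded with a zero at each end so edge bins can count as peaks.
    """
    heights = [0] + [count for _, count in hist] + [0]
    return sum(
        1
        for i in range(1, len(heights) - 1)
        if heights[i] > heights[i - 1] and heights[i] >= heights[i + 1]
    )

# Hypothetical race finishing times (minutes) from two separate groups of runners.
times = [21, 22, 22, 23, 23, 23, 24, 24, 25, 41, 42, 42, 43, 43, 43, 44, 45]
hist = histogram(times, bin_width=5)
for start, count in hist:
    print(f"{start:>3} {'#' * count}")
print("peaks:", count_peaks(hist))  # peaks: 2
```

The empty bins at 30 and 35 are what make the two clusters visible as separate humps.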
Range
= High value minus low value
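The range calculation in Python, on illustrative values. The helper is called `data_range` (a hypothetical name) so it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Range = highest value minus lowest value."""
    if not values:
        raise ValueError("data_range requires at least one value")
    return max(values) - min(values)

print(data_range([2, 4, 4, 4, 5, 5, 7, 9]))  # 7  (9 - 2)
```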
Variance
Subtract the mean from each value, square each difference, then sum the squared differences and divide by the number of cases. (This gives the population variance; for a sample, divide by n - 1 instead of n.)
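The steps can be followed one at a time in Python. This is a sketch on made-up data; `population_variance` and `sample_variance` are illustrative helpers. The sample version divides by n - 1 rather than n, and the standard library's `statistics.pvariance` and `statistics.variance` are used as cross-checks.

```python
import statistics

def population_variance(values):
    """Deviations from the mean, squared, summed, then divided by the number of cases."""
    n = len(values)
    mu = sum(values) / n
    squared_diffs = [(x - mu) ** 2 for x in values]
    return sum(squared_diffs) / n

def sample_variance(values):
    """Same steps, but divide by n - 1 when the data are a sample."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5; squared deviations sum to 32
print(population_variance(data))  # 4.0  (32 / 8)
print(sample_variance(data))      # 32 / 7, about 4.571
print(statistics.pvariance(data), statistics.variance(data))
```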
Standard deviation
Square root of the variance.
Measure of the amount of variation of a random variable expected about its mean. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.
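A Python sketch contrasting a tightly clustered data set with a spread-out one; both are hypothetical and both have a mean of 5. `population_std_dev` is an illustrative helper that takes the square root of the population variance, cross-checked against `statistics.pstdev`.

```python
import math
import statistics

def population_std_dev(values):
    """Square root of the population variance."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)

tight = [4, 5, 5, 5, 6]   # values close to the mean of 5
spread = [1, 3, 5, 7, 9]  # same mean, values further away

print(population_std_dev(tight))   # about 0.63 (low: close to the mean)
print(population_std_dev(spread))  # about 2.83 (high: spread out)
print(population_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
print(statistics.pstdev([2, 4, 4, 4, 5, 5, 7, 9]))
```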
Normal distribution
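Symmetric, bell-shaped curve centered on the mean. If the curve is divided into bands 1 standard deviation wide, about 68% of values lie within 1 standard deviation of the mean, about 95% within 2 and about 99.7% within 3.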
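The share of values in each 1-standard-deviation band can be computed with `statistics.NormalDist` from Python's standard library (Python 3.8+). The sketch uses the standard normal (mean 0, SD 1); because the bands are measured in standard deviations, the shares are the same for any normal distribution. `band_share` and `within_sd` are illustrative helper names.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # standard normal: mean 0, SD 1

def band_share(lower_sd, upper_sd):
    """Share of the distribution lying between two distances (in SDs) from the mean."""
    return std_normal.cdf(upper_sd) - std_normal.cdf(lower_sd)

def within_sd(k):
    """Share lying within k standard deviations either side of the mean."""
    return band_share(-k, k)

# Each 1-SD band on one side of the mean.
for k in (1, 2, 3):
    print(f"{k - 1} to {k} SD: {band_share(k - 1, k):.1%}")

# Cumulative coverage: roughly 68%, 95% and 99.7%.
for k in (1, 2, 3):
    print(f"within {k} SD: {within_sd(k):.1%}")
```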
Dependent variable (y variable)
Variable being predicted or explained
Independent variable (x variable)
Variable used to predict or explain.
Difference between bivariate regression and multiple regression
The number of X variables used to predict: bivariate regression uses one X variable, while multiple regression uses two or more.
Bivariate regression could be used, for example, to predict the number of automobiles per household from a single X variable.
Multiple regression could be used to predict house sale price from a number of factors, including bedroom and bathroom count, accessibility to employment, etc.
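A multiple regression with two X variables can be sketched in plain Python by solving the least-squares normal equations (XᵀX)b = Xᵀy with Gaussian elimination. Everything here is an assumption for illustration: the house data are invented (price in thousands, built exactly from price = 50 + 30 x bedrooms + 20 x bathrooms so the fit can be checked), and `solve` / `fit_multiple` are hypothetical helpers. For real data, a numerical library would normally be preferred.

```python
def solve(a, b):
    """Solve the square system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(rows, y):
    """Least-squares coefficients [a, b1, b2, ...] for y = a + b1*x1 + b2*x2 + ..."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept a
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical houses: (bedrooms, bathrooms) -> sale price in thousands.
rows = [(2, 1), (3, 1), (3, 2), (4, 2), (4, 3), (5, 3)]
prices = [130, 160, 180, 210, 230, 260]
a, b_bed, b_bath = fit_multiple(rows, prices)
print(round(a, 6), round(b_bed, 6), round(b_bath, 6))  # about 50, 30, 20
```

With one X variable the same code reduces to bivariate regression; each extra X variable just adds a column.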
Regression
Assumes a straight line can be used to describe the relationship between the independent (x) variable and the dependent (y) variable.
- y = a + bx
- a is the line’s y-intercept
- b is the line’s slope
- The same line is often written y = mx + b; in that form m is the slope and b is the intercept.
- R² measures how well the line fits the data (the proportion of the variation in y explained by the line) and ranges from 0.0 to 1.0
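The least-squares intercept a, slope b and R² for y = a + bx can be computed directly in Python. This is a sketch on made-up data; `fit_line` is an illustrative helper. The slope is the sum of cross-deviations divided by the sum of squared x deviations, the intercept makes the line pass through the point (mean x, mean y), and R² is 1 minus residual variation over total variation.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + bx, plus R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx        # slope
    a = my - b * mx      # y-intercept: line passes through (mean x, mean y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ss_tot = sum((yi - my) ** 2 for yi in y)                        # total
    r2 = 1 - ss_res / ss_tot
    return a, b, r2

a, b, r2 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 3), round(b, 3), round(r2, 3))  # 2.2 0.6 0.6
```

Here the line y = 2.2 + 0.6x explains 60% of the variation in y.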