Module 1: Introduction to Data Flashcards
Concept
Answer
A frequency table exhibits how…
frequencies are distributed over various categories (known as a frequency distribution)
Associated variables
When two variables show some connection/relationship with one another
Blocking (experimental design)
Grouping the sample based on variables which may affect the outcome and then randomizing within groups
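A minimal sketch of randomizing within blocks, assuming a made-up set of eight subjects blocked by age group. The subject IDs, the block labels and the helper name `blocked_assignment` are hypothetical and serve only to illustrate the definition above.

```python
import random

def blocked_assignment(subjects, block_of, rng):
    """Split subjects into blocks, then randomize treatment/control within each block."""
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of[s], []).append(s)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]          # copy so the input order is left untouched
        rng.shuffle(shuffled)          # the randomization happens inside the block
        half = len(shuffled) // 2
        for s in shuffled[:half]:
            assignment[s] = "treatment"
        for s in shuffled[half:]:
            assignment[s] = "control"
    return assignment

# Hypothetical subjects and blocking variable (age group)
subjects = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
block_of = {"s1": "young", "s2": "young", "s3": "young", "s4": "young",
            "s5": "old", "s6": "old", "s7": "old", "s8": "old"}

assignment = blocked_assignment(subjects, block_of, random.Random(0))
print(assignment)
```

Because the shuffle is done separately inside each block, every age group contributes the same number of subjects to treatment and to control.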
Categorical variable
The individual entries are categories; the possible values are called “levels”
Cluster sample
Break the population into groups and then sample a fixed number of those groups and include all observations from each group; helpful when there’s a lot of variability between cases within a cluster but the clusters themselves don’t differ much from one another
Confounding variable
A variable that is correlated with both the explanatory and the response variables
Continuous variable
A numerical variable that can take any value within a range (e.g. to arbitrary decimal precision); e.g. height, weight (think how much)
Controlling (experimental design)
Mitigate the differences between groups
Convenience sample bias
When individuals who are more accessible are more likely to be included in the sample
Cumulative frequency
The total of a frequency and all frequencies below it in a frequency distribution; the running total of frequencies
Cumulative relative frequency
Cumulative frequency for that category divided by the sum of all frequencies
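The two cumulative definitions above can be sketched on a small example. The die rolls below are made-up illustration data, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: ten die rolls (invented for illustration)
rolls = [1, 2, 2, 3, 3, 3, 4, 4, 6, 6]

def frequency_table(values):
    """Return rows of (category, frequency, cumulative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    categories = sorted(counts)
    freqs = [counts[c] for c in categories]
    cumulative = list(accumulate(freqs))   # running total of frequencies
    total = cumulative[-1]                 # sum of all frequencies
    return [(c, f, cf, cf / total)
            for c, f, cf in zip(categories, freqs, cumulative)]

table = frequency_table(rolls)
for row in table:
    print(row)
```

The last row always has a cumulative frequency equal to the number of observations and a cumulative relative frequency of 1.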
Data
Information we gather with experiments and with surveys
Description
Summarizing the data that are obtained
Descriptive statistics
Refers to methods for summarizing the data; describes the sample only (graphs, numerical summaries)
Design
Planning how to obtain data to answer the questions of interest (experimental design, sample size, power, etc.)
Discrete variable
A numerical variable that only takes number values in jumps (e.g. whole numbers); e.g. the number that appears when throwing a die (think how many)
Experiment
Used to investigate the possible causal connection between variables
Explanatory variable
The variable (first) that causally affects the other
Frequency
The number of elements that belong in a certain category
Graphical methods
Histogram, boxplot, bar graph, etc.
Graphs (categorical)
Bar chart, pie chart; focuses on frequencies or relative frequencies of the levels of the variable
Graphs (numerical/scale)
Dot chart (discrete variable), stem-and-leaf plot, histogram, boxplot, scatterplot
Histogram
A bar chart that gives the frequencies or relative frequencies of occurrences of a scale variable in certain intervals; the heights of the bars in the histogram are called the distribution of the sample
Characteristics of a distribution: left-skewed
Negatively skewed; the values to the left of the center fall further away from the center than those to the right of the center; the mean is less than the median
Characteristics of a distribution: right-skewed
Positively skewed; the values to the right of the center fall further away from the center than those to the left of the center; the mean is greater than the median
Characteristics of a distribution: symmetric
Left and right sides of the graph are roughly mirror images of each other; the center is the mean and the mean ≈ the median
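The mean-versus-median comparisons in the three skewness cards above can be checked numerically. This is a rough sketch on made-up samples: comparing the mean to the median is a rule of thumb rather than a formal test of skewness, and `describe_skew` is a hypothetical helper name.

```python
from statistics import mean, median

def describe_skew(values):
    """Label a sample by comparing its mean to its median (rule of thumb only)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical samples (invented for illustration)
right_tail = [1, 2, 2, 3, 3, 4, 20]        # one large value pulls the mean up
left_tail = [1, 17, 18, 18, 19, 19, 20]    # one small value pulls the mean down
balanced = [1, 2, 3, 4, 5]

for sample in (right_tail, left_tail, balanced):
    print(mean(sample), median(sample), describe_skew(sample))
```

In each skewed sample the mean is dragged toward the long tail while the median stays near the bulk of the data.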
How to describe graphical data
Center, variation, distribution, outliers, time
Independent variables
When two variables are not associated/there is no evident relationship between the two
Inference
Making decisions and predictions based on the data
Inferential statistics
Used when data are available only for a sample but we want to make a decision or prediction about the entire population (confidence intervals, significance tests)
Intensity map (heat map)
Colors are used to show higher and lower values of a variable
Multi-stage sample
Like cluster sampling, but a random sample is taken within each selected cluster rather than including every observation in the cluster
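The difference between a cluster sample and a multi-stage sample can be sketched side by side. The three clusters and their observations are made up, and the function names `cluster_sample` and `multistage_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Pick n_clusters at random and keep every observation in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [obs for name in chosen for obs in clusters[name]]

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Pick n_clusters at random, then randomly sample n_per_cluster observations from each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [obs
            for name in chosen
            for obs in rng.sample(clusters[name], n_per_cluster)]

# Hypothetical population: three clusters of five observations each
clusters = {
    "A": ["a1", "a2", "a3", "a4", "a5"],
    "B": ["b1", "b2", "b3", "b4", "b5"],
    "C": ["c1", "c2", "c3", "c4", "c5"],
}

rng = random.Random(0)
whole_clusters = cluster_sample(clusters, 2, rng)
within_clusters = multistage_sample(clusters, 2, 2, rng)
print(whole_clusters)
print(within_clusters)
```

With two clusters chosen, the cluster sample returns all ten observations from them, while the multi-stage sample returns only two from each.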