UNIT 1, ALL LESSONS VOCAB / KEY CONCEPTS Flashcards
memorize & understand key concepts and vocab
descriptive statistics
summarizing and describing features of a data set without making any generalizations/conclusions about a population
› states facts and proven outcomes we already know
inferential statistics
uses data from a sample to draw conclusions and make predictions about a larger population
› analyzes data to make predictions about what we don’t yet know
phases of a statistical study
› phase 1, data gathering: any process that gets you data (surveys, questionnaires, counting, etc.)
› phase 2, data organization and analysis: includes making graphs, charts, and tables from data; can also include calculating statistics and looking for patterns in the data
› phase 3, probability-based inference: the process of using data to make conclusions about a population based on a sample of that population
statistical inference
process of using data to make conclusions about a population based on a sample
statistic
number that describes a sample
probability
mathematical concept that measures the likelihood of an event occurring
data
a bunch of facts collected together for reference and analysis
parameter
a number that describes a population
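A minimal sketch of the difference between the "statistic" and "parameter" cards, assuming a made-up population of quiz scores and a made-up sample drawn from it (both lists are hypothetical, not from the unit):

```python
from statistics import mean

# Hypothetical population: quiz scores for every student in one small class.
population = [70, 82, 91, 65, 88, 77, 94, 59]

# Hypothetical sample: scores from four of those students.
sample = [82, 65, 77, 94]

parameter = mean(population)  # a number that describes the whole population
statistic = mean(sample)      # a number that describes only the sample

print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

The two numbers differ because the sample only contains part of the population; using the statistic to estimate the parameter is what statistical inference does.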
data set
set or collection of data
datum
a single piece of information (one fact); the singular of data
numeric/quantitative data
data that is expressed as numbers and can be sorted and used in calculations
› can be sorted and worked with mathematically
› presented in numbers
› can be discrete or continuous
- discrete data is gathered by counting and takes whole-number values (counted data is a form of discrete data: numbers of people, anything counted)
- continuous data is gathered by measuring and is always numerical; it can be fractions or decimals of any length (measured data is a form of continuous data: lengths, weights, volumes, anything measured)
(non-numeric) categorical/qualitative data
data that is named or can be put into categories; it cannot be sorted numerically or used in calculations
› can be divided into groups or categories
distribution
a set of numbers on a graph that shows the possible values for a variable and how often they occur
› not every graph shows a distribution
variable
a characteristic, number, or quantity that can be measured or counted, and that can take on different values (ex. age, sex, income, etc.)
univariate data
a set of data that focuses on only one variable (ex. salaries of employees in a company, the number of pets in different households, lengths of trout in a lake, etc.)
unit of analysis
the major entity that you are analyzing in your study
class width
the difference between the upper and lower class limits of a class interval (ex. class interval 163-175, class width is 175-163=12)
constant
a fixed value that never changes within a given context
bar graph
a visual representation of data that uses rectangular bars to show the distribution of the data
› counts or percents are on the vertical axis
› categories are on the horizontal axis
observation
a fact or figure we collect about a given variable
class
the range of values assigned to a group of data points (ex. 0-10, 11-20, 21-30, etc.)
class boundaries
the values that separate different classes (or groups) within a data set
relative frequency
the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes
› calculation: divide the frequency (the number of times a particular value for a variable has been observed to occur) of a specific category by the total number of observations
› can also be shown in table form: one column contains the values or intervals, the other column contains the frequencies
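The calculation on this card can be sketched in a few lines of Python. The list of observations is hypothetical, and the printed table layout is just one possible choice:

```python
from collections import Counter

# Hypothetical observations of one categorical variable (favorite pet).
observations = ["dog", "cat", "dog", "fish", "dog", "cat", "dog", "cat"]

frequencies = Counter(observations)  # how often each value occurs
total = len(observations)            # total number of observations

# relative frequency = frequency of a value / total number of observations
relative_frequencies = {
    value: count / total for value, count in frequencies.items()
}

# Small table: value, frequency, relative frequency.
for value, count in frequencies.items():
    print(f"{value:<6}{count:>3}{relative_frequencies[value]:>8.3f}")
```

The relative frequencies always add up to 1, which is a quick way to check the table.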
class interval
a range of values within a data set that are grouped together for analysis
histogram
a graphical representation that displays the distribution of numerical data using rectangles
› height of a rectangle (vertical axis) represents the frequency of a variable (the amount, or how often that value appears)
› height of bars displays frequency
› width of bars displays intervals
› has midpoints: a midpoint is the exact center value of a given bin (or class interval), calculated by adding the upper and lower boundary values of that bin and dividing the sum by two
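A rough sketch of how the class frequencies and midpoints behind a histogram could be computed. The heights and class boundaries are hypothetical, and the rule that a value belongs to a class when lower <= value < upper is an assumption (textbooks differ on which end is included):

```python
# Hypothetical heights in cm (continuous data).
heights = [151.2, 158.9, 163.0, 164.5, 167.8, 171.1, 172.4, 174.9, 178.3, 181.0]

# Hypothetical class boundaries as (lower, upper) pairs.
# Assumption: a value belongs to a class when lower <= value < upper.
classes = [(150, 160), (160, 170), (170, 180), (180, 190)]

rows = []
for lower, upper in classes:
    frequency = sum(1 for h in heights if lower <= h < upper)
    midpoint = (lower + upper) / 2  # add the two boundaries, divide by two
    rows.append((lower, upper, midpoint, frequency))

# Text histogram: one row per class, bar length shows the frequency.
for lower, upper, midpoint, frequency in rows:
    print(f"{lower}-{upper} (midpoint {midpoint}): {'#' * frequency}")
```

The bars are drawn sideways here only because text prints in rows; in a drawn histogram the same frequencies would be the bar heights.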
relative frequency histogram
a graph that displays the classes on the horizontal axis and the relative frequencies of the classes on the vertical axis
frequency table
a list that shows how often each value occurs in a set of data
cumulative relative frequency
a summary of a data set showing the relative frequency of items less than or equal to the upper class limit of each class; tells you the percentage of data that falls within a specific category or at or below a specific point on the data scale
› plot cumulative frequency on the vertical axis
› place class boundaries or interval midpoints on the horizontal axis
› classes are on the horizontal axis
› cumulative frequencies are on the vertical axis
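A short sketch of the running totals behind a cumulative frequency plot, assuming hypothetical class labels and frequencies listed from the lowest class to the highest:

```python
from itertools import accumulate

# Hypothetical classes and their frequencies, lowest class first.
class_labels = ["150-160", "160-170", "170-180", "180-190"]
frequencies = [2, 3, 4, 1]

total = sum(frequencies)
cumulative_frequencies = list(accumulate(frequencies))  # running totals
cumulative_relative = [c / total for c in cumulative_frequencies]

for label, c, cr in zip(class_labels, cumulative_frequencies, cumulative_relative):
    print(f"{label}: cumulative frequency {c}, cumulative relative frequency {cr:.2f}")
```

The last cumulative relative frequency is always 1, because every observation is less than or equal to the upper limit of the highest class.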
SUMMARY 1.2 ACTIVE RECALL
data presented with frequencies
› individual values often occur many times in a distribution
› classes or intervals group data
› frequency tables show frequencies for all the classes in your data
histograms from frequency data
› histograms are graphical interpretations of frequency data
› are made up of vertical bars
› height of bars displays frequency
› width of bars displays intervals
relative frequency tables
› relative frequencies are included in a summary table
› one column contains values or intervals
› one column contains frequencies
relative frequency histograms
› similar to a histogram from count data
› vertical axis represents the relative frequency of occurrence
cumulative frequency plots
› line graphs
› classes are on the horizontal axis
› cumulative frequencies are on the vertical axis
frequency graph
a visual representation of data that shows how often each value (or category) appears in a dataset
stem and leaf plot
a visual representation of data where each data point is divided into a “stem” (leading digit or digits) and a “leaf” (last digit)
› displays the data in a way that lets you see the spread of the data, where most values cluster, and extreme outliers
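A minimal sketch of building a stem and leaf plot, assuming a hypothetical list of two-digit test scores (so the stem is the tens digit and the leaf is the ones digit):

```python
from collections import defaultdict

# Hypothetical two-digit test scores.
scores = [67, 72, 75, 78, 81, 83, 83, 88, 90, 94]

plot = defaultdict(list)
for score in sorted(scores):
    stem, leaf = divmod(score, 10)  # stem = leading digit, leaf = last digit
    plot[stem].append(leaf)

# One row per stem, with its leaves in increasing order.
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Reading the rows shows where the values cluster (here the 70s and 80s) and whether any stem sits far from the rest.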
spread
how much variation or dispersion exists within a data set; describes how scattered the data points are within a dataset
distribution
is the way data is spread or organized; describes how frequently each value occurs and provides insights into patterns and characteristics of the dataset
› is symmetric when it can be cut in half by a vertical line and the halves are exact mirror images of each other
› may be described as approx. symmetric if halves are close but not exact
› is called nonsymmetric when the halves are not mirror images of each other
› is uniform when all values have roughly the same number of observations
› can have one or several modes: unimodal, bimodal, or multimodal
- a unimodal distribution has only one peak, a bimodal distribution has two peaks, and a multimodal distribution has three or more peaks
› if mound-shaped with a long tail of observations that trails out in one direction or the other, it’s skewed (if the tail sticks out to the left, it is left skewed; if it sticks out to the right, it is right skewed)
› can have gaps, clusters, and outliers
- gaps happen where there is a significant range of values with no observations
- clusters are groups of observations at similar values
- outliers are isolated observations that lie far from the bulk of the distribution, separated from it by a wide gap
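Two of the shape descriptions on this card can be checked roughly from a list of class frequencies. This is only a sketch: the helper names `is_symmetric` and `count_peaks` and both example lists are hypothetical, and the peak counter assumes a peak is a class strictly taller than both of its neighbors, so flat-topped peaks are not counted.

```python
def is_symmetric(frequencies):
    """True when the left half mirrors the right half exactly."""
    return frequencies == frequencies[::-1]


def count_peaks(frequencies):
    """Count classes that are strictly taller than both neighbors."""
    padded = [0] + list(frequencies) + [0]  # treat the outside as empty
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical class frequencies, left to right along the horizontal axis.
mound = [2, 5, 9, 5, 2]      # one peak, mirror-image halves
two_humps = [1, 6, 2, 7, 1]  # two peaks, halves do not match

print(is_symmetric(mound), count_peaks(mound))
print(is_symmetric(two_humps), count_peaks(two_humps))
```

One peak corresponds to unimodal, two to bimodal, and three or more to multimodal. Real data is rarely exactly symmetric, which is why the card also allows "approx. symmetric".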