Exam 1 Flashcards
Variables that take categories as their values, such as “yes” and “no,” or blue, brown, and green
Categorical (qualitative)
Variables that have values that represent a counted or measured quantity
Numerical (quantitative)
Variables that arise from a counting process
Discrete / numerical (quantitative)
Variables that arise from a measuring process
Continuous / numerical (quantitative)
Facts and figures collected, analyzed, and summarized for presentation and interpretation
Data
All the data collected in a particular study are referred to as the __________ for the study
Data set
The entities on which data are collected
Elements
A characteristic of interest for the elements
Variable
The set of measurements obtained for a particular element is called _____
An observation
A data set with n ________ contains n __________.
Elements, observations
How do you calculate the total number of data values?
The total number of data values in a complete data set is the number of elements multiplied by the number of variables
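The multiplication rule above can be checked on a tiny example. This is only a sketch: the three students and two variables below are made up for illustration.

```python
# Hypothetical data set: each key is an element, each inner dict is its observation.
data_set = {
    "Student A": {"major": "Finance", "gpa": 3.4},
    "Student B": {"major": "Marketing", "gpa": 3.1},
    "Student C": {"major": "Finance", "gpa": 3.8},
}

n_elements = len(data_set)                          # 3 elements, so 3 observations
n_variables = len(next(iter(data_set.values())))    # 2 variables per observation
total_values = n_elements * n_variables             # elements x variables

print(total_values)  # 6
```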
Nominal data
Defined categories such as eye color, political party, marital status
Ordinal data
Categorical - Ordered categories such as good, better, best or low, medium, high
Data classified into distinct categories in which no ranking is implied
A nominal scale. Ex: Do you have a Facebook profile? (Y or N); cellular provider? (Verizon, AT&T, etc.)
Classifies data into distinct categories in which ranking is implied
Ordinal data - EX: grades, ratings, product satisfaction
Data that has the properties of ordinal data and the ___ between observations is expressed in terms of a fixed unit of measure. It is always ___. The scale ____ contain a ____ value that indicates that nothing exists for the variable at the _____ point.
Interval, numeric, does not, zero, zero
Data that has all the properties of interval data and the ___ of two values is meaningful. The scale ____ contain a zero value that indicates that nothing exists for the variable at the zero point.
Ratio, must
Data that is collected at the same or approximately the same point in time.
Cross-sectional data. Ex: data detailing the number of building permits issued in Nov. 2019 in each of the counties of Ohio.
Data that is collected over several time periods
Time series data. Ex: data detailing the number of building permits issued in Lucas County, Ohio in each of the last 36 months. Graphs of time series help analysts understand what happened in the past, identify any trends over time, and project future levels for the time series.
The set of all elements of interest in a particular study
Population
A subset of the population
Sample
The process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
Statistical inference
Collecting data for the entire population
Census
Collecting data for a sample
Sample survey
Collecting data via sampling is used when doing so is:
Less time consuming than selecting every item in the population. Less costly than selecting every item in the population. Less cumbersome and more practical than analyzing the entire population.
Summarizes the value of a specific variable for a population.
Population Parameter
Summarizes the value of a specific variable for sample data
Sample statistic
Tallies the frequencies or percentages of items in a set of categories so that you can see differences between categories
A summary table
A summary of data showing the number (frequency) of observations in each of several non-overlapping categories or classes.
Frequency distribution
The ____ ______ of a class is the fraction or proportion of the total number of data items belonging to the class. What is the equation?
Relative frequency. Equation: relative frequency of a class = frequency of the class / n
How do you calculate percent frequency of a class?
The relative frequency multiplied by 100
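The frequency, relative frequency, and percent frequency cards above can be tied together in a short sketch. The ten provider responses are hypothetical.

```python
from collections import Counter

# Hypothetical responses for one categorical variable.
responses = ["Verizon", "AT&T", "Verizon", "T-Mobile", "Verizon",
             "AT&T", "Verizon", "T-Mobile", "AT&T", "Verizon"]

n = len(responses)
frequency = Counter(responses)                          # frequency distribution
relative = {c: f / n for c, f in frequency.items()}     # frequency of class / n
percent = {c: 100 * r for c, r in relative.items()}     # relative frequency x 100

for category in frequency:
    print(category, frequency[category], relative[category], percent[category])
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick way to check a table built by hand.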
Used to study patterns that may exist between the responses of two or more categorical variables.
A contingency table
It cross tabulates or tallies jointly the responses of the categorical variables
A contingency table.
A tabular summary of data for two variables.
A crosstabulation
Contingency table - For two variables, the tallies for one variable are located in the ____ and the tallies for the second variable are located in the ________.
Rows, columns
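A contingency table like the one described above can be tallied in a few lines. This is a minimal sketch with made-up paired responses: the first variable (has a profile, Y or N) goes in the rows and the second (provider) in the columns.

```python
from collections import Counter

# Hypothetical paired responses: (has_profile, provider) for each respondent.
responses = [("Y", "Verizon"), ("N", "AT&T"), ("Y", "AT&T"),
             ("Y", "Verizon"), ("N", "Verizon"), ("Y", "AT&T")]

joint = Counter(responses)                    # joint tally for each (row, column) cell
rows = sorted({r for r, _ in responses})      # first variable -> rows
cols = sorted({c for _, c in responses})      # second variable -> columns

# Print a header line, then one line per row category.
print("".ljust(4) + "".join(c.ljust(10) for c in cols))
for r in rows:
    print(r.ljust(4) + "".join(str(joint[(r, c)]).ljust(10) for c in cols))
```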
A sequence of data, in rank order, from the smallest value to the largest value.
An ordered array.
It shows range (minimum value to maximum value)
Ordered array.
May help identify outliers (unusual observations)
An ordered array
A summary table in which the data are arranged into numerically ordered classes
Frequency distribution
You must give attention to selecting the appropriate number of _____ ______ for the table, determining a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid overlapping.
Class groupings.
How do you determine the width of a class interval?
Divide the range (highest value-lowest value) of the data by the number of class groupings desired
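The class-width calculation above, worked on hypothetical data. Rounding the result up to a convenient whole number is a common practice assumed here, not something stated on the card.

```python
import math

values = [12, 47, 23, 35, 18, 41, 29, 16, 38, 44]   # hypothetical data
num_classes = 4                                      # chosen by the analyst

data_range = max(values) - min(values)               # 47 - 12 = 35
width = data_range / num_classes                     # 35 / 4 = 8.75
rounded_width = math.ceil(width)                     # assumed convention: round up to 9

print(data_range, width, rounded_width)
```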
A ______ visualizes a categorical variable as a series of bars. The length of each bar represents either the _____ or ___ of values for each category. Each bar is separated by a space called a ______.
Bar chart, frequency or percentage, gap
A ___ ___ is a circle that is broken up into slices that represent categories. The size of each ___ varies according to the percentage in each category.
Pie chart, slice
A ___ ____ is the outer part of a circle broken up into pieces that represent categories. The size of each piece varies according to the percentage in each category.
Doughnut chart
Used to portray categorical data. A vertical bar chart where categories are shown in descending order of frequency. A cumulative polygon is shown in the same graph. Used to separate the “___ ___” from the “___ ___.”
The Pareto chart. The “vital few” from the “trivial many”
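The numbers behind a Pareto chart are the categories sorted by descending frequency, plus the cumulative percentages that form the cumulative polygon. This is a sketch with hypothetical complaint counts; it computes the values only and does not draw the chart.

```python
from itertools import accumulate

# Hypothetical complaint counts by category.
counts = {"Wrong item": 15, "Late delivery": 45, "Other": 3,
          "Damaged item": 30, "Billing error": 7}

# Bars: categories in descending order of frequency.
ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
total = sum(counts.values())

# Cumulative polygon: running total expressed as a percentage of all items.
cumulative = list(accumulate(f for _, f in ordered))
cum_percent = [100 * c / total for c in cumulative]

for (category, f), cp in zip(ordered, cum_percent):
    print(f"{category:15s} {f:3d} {cp:6.1f}%")
```

In this made-up example the first two categories account for 75 percent of all complaints, which is the "vital few" reading of the chart.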
Represents data from a contingency table
Side-by-side bar chart
Can be used to represent the data from a contingency table
Doughnut chart
Organizes data in groups (called ___) so that values within each group (the ____) branch out to the right of each row.
Stem-and-leaf display; stems, leaves
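A stem-and-leaf display for two-digit values can be sketched as follows. The data are hypothetical, and using the tens digit as the stem and the ones digit as the leaf is an assumption that fits this data only.

```python
from collections import defaultdict

values = [24, 12, 38, 21, 15, 30, 41, 24, 37]   # hypothetical two-digit data

stems = defaultdict(list)
for v in sorted(values):
    stems[v // 10].append(v % 10)   # tens digit is the stem, ones digit is the leaf

# One row per stem, with its leaves branching out to the right.
for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```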
A vertical bar chart of the data in a frequency distribution is called a _______.
Histogram
Formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
Percentage polygon
Displays the variable of interest along the X axis, and the cumulative percentages along the Y axis. Useful when there are two or more groups to compare.
Cumulative percentage polygon, or ogive.
Used for numerical data consisting of paired observations taken from two numerical variables.
Scatter plots
Used to examine possible relationships between two numerical variables.
Scatter plots
Used to study patterns in the values of a numeric variable over time.
Time-series plot
Constructed by tallying the responses of three or more categorical variables.
Multidimensional contingency table.
Provides a measure of central location
Mean
The average of all the data values
Mean
Perhaps the most important measure of location.
The mean
The value in the middle when the data items are arranged in ascending order
Median
Whenever a data set has extreme values, the ____ is the preferred measure of central location
Median
For an odd number of observations (in ____ order) the median is the ____ value
ascending, middle
For an even number of observations (in ____order), the median is the ______ of the two middle values
Ascending, average. Ex: median = (19 + 26)/2 = 22.5
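The odd and even cases above, plus the effect of an extreme value, can be checked with Python's standard `statistics` module. All three data sets are hypothetical.

```python
import statistics

odd = [26, 12, 19, 32, 14]           # 5 observations
even = [12, 14, 19, 26, 27, 32]      # 6 observations

median_odd = statistics.median(odd)      # sorted: 12 14 19 26 32 -> middle value 19
median_even = statistics.median(even)    # middle values 19 and 26 -> (19 + 26) / 2

# One extreme value moves the mean far more than it moves the median.
skewed = [10, 12, 14, 16, 1000]
mean_skewed = statistics.mean(skewed)        # 1052 / 5 = 210.4
median_skewed = statistics.median(skewed)    # still the middle value, 14

print(median_odd, median_even, mean_skewed, median_skewed)
```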
The ____ of a data set is the value that occurs with the greatest frequency
Mode
The greatest frequency can occur at two or more different values
The Mode
If the data have exactly two modes, the data are ____.
Bimodal
If the data have more than two modes, the data are ______.
multimodal
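The mode cards above can be illustrated with `statistics.multimode` (Python 3.8 or later), which returns every value tied for the greatest frequency. The data are hypothetical.

```python
import statistics

one_mode = [1, 2, 2, 3, 4]
two_modes = [1, 1, 2, 3, 3]

modes_single = statistics.multimode(one_mode)    # [2]
modes_double = statistics.multimode(two_modes)   # [1, 3] -> bimodal

print(modes_single, modes_double)
```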
Excel's mean function
=AVERAGE(data cell range)
Excel's median function
=MEDIAN(data cell range)
Excel's mode function
=MODE.SNGL(data cell range)
How does one calculate the geometric mean?
Finding the nth root of the product of n values
What is the geometric mean function?
=GEOMEAN(data cell range)
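The "nth root of the product of n values" definition can be compared with a library routine. This is a sketch on hypothetical values, using `math.prod` and `statistics.geometric_mean` (both Python 3.8 or later).

```python
import math
import statistics

values = [1, 3, 9]   # hypothetical positive values

n = len(values)
by_hand = math.prod(values) ** (1 / n)               # nth root of the product: 27 ** (1/3)
by_library = statistics.geometric_mean(values)       # same quantity from the standard library

print(by_hand, by_library)   # both approximately 3
```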
The _______ of a data set is a value such that at least _ percent of the items take on this value or less and at least (100- __) percent of the items take on this value or more.
pth percentile, p, p
Excel function used to compute percentiles
=PERCENTILE.EXC(data range, p/100)
Quartiles examples
First quartile = 25th percentile, second quartile = 50th percentile = median, third quartile = 75th percentile
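The three quartiles can be computed with `statistics.quantiles`. The `"exclusive"` method is assumed here to follow the same (n + 1)p position rule as Excel's PERCENTILE.EXC; the data are hypothetical.

```python
import statistics

values = [10, 20, 30, 40, 50, 60, 70]   # hypothetical data, 7 observations

# n=4 cuts the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")

print(q1, q2, q3)                  # 25th, 50th, and 75th percentiles
print(statistics.median(values))   # the second quartile equals the median
```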
Measures of ____ give information on the ____ or _____ or _____ of the data values
Variation; spread, variability, dispersion
The difference between the largest and smallest data values
The range
What is the range calculation?
Range = largest value - smallest value
The simplest measure of variability
Range
Is very sensitive to the smallest and largest data values
Range
A measure of variability that utilizes all the data
Variance
Based on the difference between the value of each observation and the mean
Variance
The ____ _____ of a data set is the positive square root of the variance
Standard deviation
Measured in the same units as the data, making it more easily interpreted than the variance
Standard deviation
Excel function for sample variance
=VAR.S(data cell range)
Excel function for sample standard deviation
=STDEV.S(data cell range)
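The range, sample variance, and sample standard deviation cards above, worked on one small hypothetical sample. The hand calculation divides by n - 1, which is the sample form that Excel's VAR.S and STDEV.S use.

```python
import math
import statistics

values = [2, 4, 4, 5, 10]   # hypothetical sample

data_range = max(values) - min(values)    # largest - smallest = 10 - 2
mean = statistics.mean(values)            # 25 / 5 = 5

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in values) / (len(values) - 1)   # 36 / 4
std_dev = math.sqrt(variance)             # positive square root of the variance

print(data_range, variance, std_dev)
print(statistics.variance(values), statistics.stdev(values))   # library equivalents
```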
Indicates how large the standard deviation is in relation to the mean
Coefficient of variation
The number of standard deviations a data value is from the mean
Z-score
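The coefficient of variation and the Z-score both combine the mean and the standard deviation. A minimal sketch on a hypothetical sample, with the coefficient of variation expressed as a percentage:

```python
import statistics

values = [2, 4, 4, 5, 10]   # hypothetical sample

mean = statistics.mean(values)    # 5
s = statistics.stdev(values)      # sample standard deviation, 3.0

cv = s / mean * 100                            # standard deviation relative to the mean, in percent
z_scores = [(x - mean) / s for x in values]    # standard deviations each value lies from the mean

print(cv)
print(z_scores)
```

A value equal to the mean has a Z-score of 0; values below the mean have negative Z-scores.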
Describes how data are distributed
Shape of a distribution
Measures the extent to which data values are not symmetrical
Skewness
Measures the peakedness of the curve of the distribution, that is, how sharply the curve rises approaching the center of the distribution
Kurtosis
Summary measures describing a population, called ____, are denoted with Greek letters
Parameters
The sum of the values in the population divided by the population size, N
Population mean
The ___ ___ approximates the variation of data in a bell-shaped distribution
Empirical rule
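The empirical rule says that in a bell-shaped distribution roughly 68%, 95%, and 99.7% of the values fall within 1, 2, and 3 standard deviations of the mean. The sketch below checks this on simulated data; the mean of 50, standard deviation of 10, and sample size are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable
values = [random.gauss(50, 10) for _ in range(10_000)]   # simulated bell-shaped data

mean = statistics.mean(values)
s = statistics.stdev(values)

within = {}
for k in (1, 2, 3):
    inside = sum(1 for x in values if abs(x - mean) <= k * s)
    within[k] = inside / len(values)   # share of values within k standard deviations
    print(k, round(within[k], 3))
```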
The _____ measures the strength of the linear relationship between two ____ variables
Covariance, numerical
Excel function for the sample covariance
=COVARIANCE.S(X,Y)
Excel function for the coefficient of correlation
=CORREL(X,Y)
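The sample covariance and the coefficient of correlation can be computed by hand to see how they relate. This is a sketch on hypothetical paired data; it uses the n - 1 (sample) form to match COVARIANCE.S.

```python
import math

x = [1, 2, 3, 4, 5]   # hypothetical paired observations
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: sum of products of paired deviations, divided by n - 1.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable.
s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# Coefficient of correlation: covariance scaled by both standard deviations.
r = cov_xy / (s_x * s_y)

print(cov_xy, r)
```

Unlike the covariance, which depends on the units of X and Y, the coefficient of correlation is unit-free and always lies between -1 and +1.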