Math: Data Analysis Review Flashcards
frequency/count
the number of times a category or numerical value appears in the data
frequency distribution
a table/graph that presents the categories or numerical values along with their corresponding frequencies
often presented as a 2 column table in which the categories or numerical values of the data are listed in the first column and the corresponding frequencies are listed in the second column
relative frequency
the corresponding frequency divided by the total number of data, expressed as a percent, fraction or decimal
relative frequency distribution
table/graph that presents the relative frequencies of the categories or numerical values
often presented as a 2 column table in which the categories or numerical values of the data are listed in the first column and the corresponding relative frequencies are listed in the second column
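A minimal Python sketch of the two tables described above. The sample data (number of children in ten families) is made up for illustration, and the names `freq` and `rel_freq` are just labels chosen here.

```python
from collections import Counter

# Hypothetical data: number of children in each of 10 families.
data = [1, 2, 0, 2, 3, 1, 2, 0, 1, 2]

# Frequency distribution: each value mapped to its count.
freq = Counter(data)

# Relative frequency distribution: each count divided by the total number of data.
total = len(data)
rel_freq = {value: count / total for value, count in freq.items()}

# Print a table: value, frequency, relative frequency as a percent.
for value in sorted(freq):
    print(value, freq[value], f"{rel_freq[value]:.0%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check on the table.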
bar graphs/charts
a frequency distribution or relative frequency distribution collected from a population observing one or more variables can be presented this way
each of the data categories or numerical values is represented by a rectangular bar; the bars all have the same width, with heights proportional to the corresponding frequencies or relative frequencies, and can be presented either vertically or horizontally
Bar graphs enable comparisons across several categories more easily than tables do
They can sometimes be used to compare numerical data as well
Segmented bar graph/ Stacked bar graph
similar to a regular bar graph except that in a segmented bar graph, each rectangular bar is divided, or segmented, into smaller rectangles that show how the variable is “separated” into other related variables
Histograms
organize the data by grouping values into intervals/classes (divide the entire interval of values into smaller ones of equal length and then count the values that fall into each interval)
graphs of frequency distributions that are similar to bar graphs BUT they must have a number line for the horizontal axis which represents a numerical value
there are no spaces between the bars; a gap appears only where an interval contains no data
They are useful for identifying the general shape of a distribution of data (including center and degree of spread as well as high/low freq intervals)
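The grouping step behind a histogram can be sketched in Python as follows. The scores, the starting point `low` and the interval length `width` are all assumptions picked for this example.

```python
# Hypothetical data: 12 test scores.
scores = [52, 67, 71, 58, 93, 75, 88, 64, 79, 70, 85, 61]

# Equal-length intervals: 50-59, 60-69, 70-79, 80-89, 90-99.
low, width, n_bins = 50, 10, 5

# Count how many values fall into each interval.
counts = [0] * n_bins
for s in scores:
    index = (s - low) // width
    counts[index] += 1

# Text histogram: one row per interval, one '#' per value.
for i, c in enumerate(counts):
    start = low + i * width
    print(f"{start}-{start + width - 1}: {'#' * c}")
```

Here the 70-79 interval has the highest frequency, which hints at where the center of the distribution lies.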
Circle Graphs/ Pie Charts
used to represent data that have been separated into a small number of categories
may be used to represent a freq distribution or relative freq distribution
generally may rep any total amt that is distributed into a small # of categories
each part of circle graph called sector
area of each sector is proportional to the percent of the whole that the sector reps
measure of the central angle of a sector is the same percent of 360 deg that the sector reps of the whole
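A short Python sketch of the sector rule: each sector's central angle is its share of the whole times 360 degrees. The monthly budget below is invented for illustration.

```python
# Hypothetical monthly budget split into a small number of categories.
budget = {"rent": 900, "food": 450, "transport": 150, "other": 300}
total = sum(budget.values())

# Central angle of each sector, in degrees: (amount / total) * 360.
angles = {name: amount * 360 / total for name, amount in budget.items()}

for name, angle in angles.items():
    share = budget[name] / total
    print(f"{name}: {share:.1%} of the whole, {angle:.0f} deg")
```

Rent is half of the total, so its sector spans half of the circle (180 deg), and the angles together add up to 360 deg.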
scatterplot
type of graph that is useful for showing the relationship b/w 2 numerical var whose values can be observed in a single pop of individuals or objs
values of one var appear on the horizontal axis and vals of other var appear on vertical axis
for each individual or obj in data, an ordered pair of numbers is collected
makes it possible to observe an overall pattern/trend in the relationship b/w 2 vars. Also strength of trend and deviations from the trend
a line/curve of best fit can be drawn through the points to make predictions about the pop
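One common way to get a line of best fit is least squares. The card does not say which method is meant, so treat this Python sketch as an assumption. The five (x, y) pairs are made up.

```python
# Hypothetical ordered pairs (x, y), one pair per individual.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept (assumed method, not stated on the card).
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Use the fitted line to predict y for a new x value.
prediction = slope * 6 + intercept
print(f"y = {slope:.1f}x + {intercept:.1f}; predicted y at x = 6: {prediction:.1f}")
```

Points that sit far from the fitted line are the deviations from the trend mentioned above.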
line graphs
useful for showing the relationship b/w 2 numerical vars (especially over time)
uses a coordinate plane where each data point reps a pair of vals observed for the 2 numerical vars
Sub Cat: Time series: time is plotted on horizontal axis with reg time intervals
Numerical methods for describing data
aka stats/stat measures
arithmetic mean
median
mode
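The three measures listed above can be computed with Python's standard `statistics` module. The data list is hypothetical.

```python
import statistics

# Hypothetical list of numerical data.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values divided by how many there are
median = statistics.median(data)  # middle value (average of the two middle values here)
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

For this list the mean is 5, the median is 4 (halfway between 3 and 5) and the mode is 3.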
Measures of position
basic positions: beginning, middle, end
quartiles and percentiles which divide data into roughly equal groups after the data have been ordered from least to greatest val
three quartile numbers: first, second and third
that divide the data into 4 roughly equal groups
99 percentile numbers that divide the data into 100 roughly equal groups
percentiles are mostly used for very large lists of numerical data ordered from least to greatest; instead of dividing the data into four groups, the 99 percentiles P1-P99 divide the data into 100 groups
Quartiles
Q1,Q2,Q3
in all cases Q2= median
to find Q1 and Q3, most commonly: order the data from least to greatest val and find Q2 (median) first, which divides the data into two groups of equal size. then take the median of the lower group for Q1 and the median of the upper group for Q3
interquartile range: Q3-Q1; measures the spread of the middle half of the data
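A Python sketch of the median-of-each-half method for quartiles. One assumption is made explicit here: when the number of data is odd, the middle value is left out of both halves. Other conventions exist and can give slightly different Q1 and Q3. The function name `quartiles` and the data are illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-each-half method."""
    ordered = sorted(values)          # least to greatest
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for an odd count, the middle value is excluded from both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

# Hypothetical data (order does not matter; the function sorts it).
data = [8, 1, 15, 4, 9, 3, 11, 6]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

For this list Q1 = 3.5, Q2 = 7.0, Q3 = 10.0, so the interquartile range is 6.5.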
measures of dispersion
dispersion: degree of spread of data
range: difference b/w greatest and least numbers in data
outliers: data values that lie far out from the rest of the data
interquartile range: Q3-Q1, measures spread of middle half of data
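A small Python sketch showing how one far-out value affects the range. The data are invented, and 48 is labeled an outlier here by eye, not by any formal rule.

```python
# Hypothetical data; 48 lies far from the rest of the values.
data = [12, 15, 11, 14, 13, 16, 48]

# Range: greatest value minus least value.
full_range = max(data) - min(data)

# Same calculation with the far-out value set aside.
rest = [x for x in data if x != 48]
rest_range = max(rest) - min(rest)

print(full_range, rest_range)
```

The range jumps from 5 to 37 because of a single value, which is why the interquartile range is often reported alongside it.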
box and whisker plots/ boxplots
a box is used to ID each of the 2 middle quartile groups of data and whiskers extend outward from the boxes to the least and greatest values
Using boxplots, several diff comparisons b/w two lists of data can be made