General Flashcards
(77 cards)
Sample percentile
Within a sample ranked from least to greatest, the 100p-th percentile is the data value such that
- at least 100p percent of the data are less than or equal to it, and
- at least 100(1-p) percent of the data are greater than or equal to it.
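A minimal sketch of one common way to compute a value that satisfies both conditions. The function name `sample_percentile` and the rule used (average two neighbours when n·p is a whole number, otherwise round the position up) are assumptions for illustration; other textbooks and software use slightly different conventions.

```python
import math

def sample_percentile(data, p):
    """Return the sample 100p-th percentile of data, for 0 < p < 1."""
    if not data or not 0 < p < 1:
        raise ValueError("need non-empty data and 0 < p < 1")
    xs = sorted(data)              # rank from least to greatest
    n = len(xs)
    pos = n * p                    # 1-based position n*p
    k = round(pos)
    if 1 <= k < n and abs(pos - k) < 1e-9:
        # n*p is a whole number: average the k-th and (k+1)-th values
        return (xs[k - 1] + xs[k]) / 2
    # otherwise take the value at position ceil(n*p)
    return xs[math.ceil(pos) - 1]
```

For example, `sample_percentile([10, 20, 30, 40, 50], 0.5)` picks the third ranked value, and with eight observations and p = 0.25 the result is the average of the second and third ranked values.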
A statistic
This is a numerical value that is derived from data.
Bivariate Data Analysis
This is the investigation of the relationship between an independent variable (IV) and a dependent variable (DV).
Box Plots
This is a plot that shows the extreme values, the first quartile, median, and third quartile.
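The five values a box plot displays can be sketched as below. The function name `five_number_summary` is hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; quartile conventions differ between textbooks, so treat this as one possible choice rather than the definition.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    # quantiles(..., n=4) returns the three cut points that split the
    # data into four parts; "inclusive" treats data as the whole population
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)
```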
Central Tendency Measures
This is described by the mean, median, and mode of the dataset. The mean is influenced by extreme values, whereas the median is not.
Chebyshev's inequality
For a dataset with sample mean x̄ and standard deviation s, and any number k > 1, the minimum percentage of the data lying within x̄ ± ks is
minimum % = $100\left(1 - \frac{1}{k^2}\right)$
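A short sketch of the bound and a check against a small made-up dataset. The function names and the sample values are assumptions for illustration only.

```python
import statistics

def chebyshev_min_percent(k):
    """Minimum percentage of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)

def percent_within(data, k):
    """Observed percentage of data strictly within mean +/- k*s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)     # sample standard deviation (n - 1 divisor)
    inside = sum(1 for x in data if abs(x - mean) < k * s)
    return 100 * inside / len(data)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not real data
```

With k = 2 the bound is 75 percent, and `percent_within(sample, 2)` should never fall below it, whatever the shape of the data.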
Class Boundaries
These are the minimum and maximum values of each class interval. We use the left-end inclusion rule: a value equal to the left boundary is included in the bin, and a value equal to the right boundary is not.
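The left-end inclusion rule can be sketched as a half-open test, boundary[i] <= x < boundary[i+1]. The function name `bin_counts` and the example boundaries are assumptions for illustration.

```python
def bin_counts(data, boundaries):
    """Count observations per class using left-end inclusion.

    boundaries is an increasing list b0 < b1 < ... < bm; class i is [b_i, b_{i+1}).
    Values outside [b0, bm) are not counted.
    """
    counts = [0] * (len(boundaries) - 1)
    for x in data:
        for i in range(len(counts)):
            if boundaries[i] <= x < boundaries[i + 1]:
                counts[i] += 1
                break
    return counts
```

With boundaries `[0, 10, 20, 30]`, an observation of exactly 10 is counted in the second class, not the first.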
Class intervals
These are the bins for grouping observations in a reasonable way.
Closed Data
This is data that is of a fixed ratio where the maximum cannot exceed some value.
Examples include any cumulative data.
Correlation Coefficient
$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$
For paired data (x_i, y_i) with sample means x̄ and ȳ and sample standard deviations s_x and s_y, this statistic indicates how closely the pairs follow a straight line y = mx + b. It ranges from -1 to 1, with values near ±1 indicating a strong linear relationship.
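A minimal sketch of the calculation using the sums-of-squares form of r. The function name `correlation` is an assumption, and the sketch does not handle the case where either variable is constant (the denominator would be zero).

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # sum of cross-products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # sums of squared deviations for each variable
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Pairs that lie exactly on a rising line give r = 1, and pairs on a falling line give r = -1.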
Cumulative Frequency
This shows, for each bin, the running total of the frequencies of that bin and all bins before it.
A plot of cumulative frequency is also called an ogive.
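The running total can be sketched with the standard library. The function name `cumulative_frequencies` is hypothetical.

```python
from itertools import accumulate

def cumulative_frequencies(counts):
    """Turn per-bin frequencies into running totals, one per bin."""
    return list(accumulate(counts))
```

The last entry always equals the total number of observations, which is a quick consistency check on a frequency table.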
Directional Data
This is data expressed in angles and can indicate how a vector is directed in space.
Frequency Table
This is a table that lists the number of occurrences of each value of a characteristic of the sample being investigated. It suits characteristics that take a relatively small number of discrete values.
Gini Coefficient
The Gini coefficient (G) is twice the area between the line of equality, L(p) = p, and the Lorenz curve. That area has a maximum value of 0.5 and a minimum value of 0, so G ranges from 0 to 1.
G = 1 - 2B, where B = area under the Lorenz curve, L(p)
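A sketch of G = 1 - 2B for a finite list of incomes, with B approximated by the trapezoidal rule on the empirical Lorenz curve. The function name `gini` and the piecewise-linear treatment of L(p) are assumptions; other estimators exist and give slightly different values for small samples.

```python
def gini(incomes):
    """Gini coefficient from G = 1 - 2B, B = area under the Lorenz curve."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    # Lorenz curve points L(k/n): share of total income held by the poorest k
    lorenz = [0.0]
    running = 0
    for x in xs:
        running += x
        lorenz.append(running / total)
    # trapezoidal rule over n strips of width 1/n
    area_b = sum((lorenz[k] + lorenz[k + 1]) / 2 for k in range(n)) / n
    return 1 - 2 * area_b
```

Perfectly equal incomes give G = 0. When one person out of n holds everything, this estimator gives (n - 1)/n, which approaches 1 as n grows.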
Histograms
These are bar charts of class frequencies drawn with no spaces between adjacent bars.
Image Processing
This is an increasingly important form of analysis that involves converting images from signals to visuals, enhancing the signal-to-noise ratio, extracting features, and recognizing patterns.
Inferential Statistics
This is the practice of using statistics to make inferences about an experiment or population.
Interval Data
These are data measured on a scale with equal intervals between values but no true zero, so values can be less than zero (for example, temperature).
Lorenz Curve
This is a cumulative curve, L(p), showing the share of total income held by the poorest fraction p of the population.
Mean
x̄ = Σx/n = Σ(v·f)/n
where v = bin value, f = frequency, and n = total number of observations
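Both forms of the mean can be sketched as follows. The function names `mean` and `grouped_mean` are assumptions, and the grouped form takes n to be the sum of the frequencies.

```python
def mean(values):
    """Arithmetic mean of raw observations: sum(x) / n."""
    return sum(values) / len(values)

def grouped_mean(bin_values, frequencies):
    """Mean from a frequency table: sum(v * f) / n, with n = sum(f)."""
    n = sum(frequencies)
    return sum(v * f for v, f in zip(bin_values, frequencies)) / n
```

The two agree when the bin values are the observations themselves: `mean([1, 2, 2, 3, 3, 3])` matches `grouped_mean([1, 2, 3], [1, 2, 3])`. When bins are intervals represented by midpoints, the grouped form is only an approximation.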
Mean influence by multiplication/addition
For a linear transformation y = ax + b,
ȳ = a·x̄ + b, so the mean is affected by both multiplication and addition in a linear way: it is scaled by a and shifted by b.
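A quick numerical check of this rule, using a Celsius to Fahrenheit conversion (a = 1.8, b = 32) as the linear transformation. The values are made up for illustration.

```python
def mean(values):
    """Arithmetic mean: sum(x) / n."""
    return sum(values) / len(values)

a, b = 1.8, 32.0                       # y = a*x + b
celsius = [0.0, 10.0, 20.0, 30.0]      # illustrative x values
fahrenheit = [a * c + b for c in celsius]

# mean of the transformed data vs. transformed mean of the original data
lhs = mean(fahrenheit)
rhs = a * mean(celsius) + b
```

Here `lhs` and `rhs` agree up to floating-point rounding, so there is no need to convert every observation before averaging.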
Median
This is the middle value of a sample when data is arranged from least to greatest
If n is odd, then the median is the value at position (n+1)/2
If n is even, then the median is the average of the values at positions n/2 and (n/2)+1
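The two cases can be sketched directly. The function name `median` is an assumption, and positions are converted from the 1-based ranks used above to Python's 0-based indices.

```python
def median(data):
    """Middle value of the ranked sample."""
    if not data:
        raise ValueError("median of an empty sample is undefined")
    xs = sorted(data)                  # arrange from least to greatest
    n = len(xs)
    if n % 2 == 1:
        # odd n: value at 1-based position (n + 1) / 2
        return xs[(n + 1) // 2 - 1]
    # even n: average of the values at 1-based positions n/2 and n/2 + 1
    return (xs[n // 2 - 1] + xs[n // 2]) / 2
```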
Mode
This is the observed value that occurs most often within a dataset. If more than one value occurs with the same highest frequency, then the dataset has several modal values.
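A small sketch that returns every modal value, so ties are reported and not silently dropped. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(data):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(data)             # value -> number of occurrences
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)
```

A single-element result means the dataset has one mode; a longer list means there are several modal values.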
Nominal Data
This is data that is non-numerical in character (fossils, minerals, rocks…)
It is occasionally converted into binary (0 = not present, 1 = present)
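The binary conversion can be sketched as a presence/absence table. The function name `presence_absence`, the category list, and the sample contents are all made up for illustration.

```python
def presence_absence(samples, categories):
    """Encode each sample as a row of 0/1 flags, one flag per category."""
    return [[1 if c in sample else 0 for c in categories]
            for sample in samples]

categories = ["quartz", "feldspar", "mica"]       # illustrative labels
samples = [{"quartz", "feldspar"}, {"mica"}]      # what each sample contains
table = presence_absence(samples, categories)
```

Each row of `table` corresponds to one sample and each column to one category, with 1 meaning present and 0 meaning not present.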