Topic 8 Flashcards
what is a histogram
graph showing the frequency of measurements/observations plotted against the range of observations
an important data exploration and summary tool
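A minimal sketch of what a histogram counts, assuming fixed-width bins. The function name `histogram_counts`, the bin width, and the measurement values are made up for illustration.

```python
def histogram_counts(values, bin_width, start, n_bins):
    """Count how many values fall into each fixed-width bin."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin holding v
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical measurements, for illustration only
measurements = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
bin_width = 2
counts = histogram_counts(measurements, bin_width, start=0, n_bins=5)

# Text rendering: one row per bin, one '#' per observation
for k, n in enumerate(counts):
    left = k * bin_width
    print(f"{left:>2}-{left + bin_width:<2} {'#' * n}")
```

The empty bin (6-8) and the isolated value at 9 are the kind of feature a histogram makes visible during data exploration.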
explain modality and symmetry
median - 50% higher and 50% lower
skew: which way the tail is facing
symmetrical distribution
mean, median, and mode are coincident
modality
when more than one value has a high frequency
greatly impacts the use of median and mean measures
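A small sketch of symmetry, skew, and modality using Python's `statistics` module. The three datasets are invented for illustration, and the helper `summarize` is a hypothetical name.

```python
import statistics as st

def summarize(data):
    """Return (mean, median, list of modes) for a dataset."""
    return st.mean(data), st.median(data), st.multimode(data)

# Hypothetical datasets, for illustration only
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 14]
bimodal = [1, 2, 2, 2, 3, 7, 8, 8, 8, 9]

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("bimodal", bimodal)]:
    mean, median, modes = summarize(data)
    print(f"{name}: mean={mean}, median={median}, modes={modes}")
```

In the symmetric set the mean, median, and mode all equal 3. In the right-skewed set the mean (4) is pulled toward the tail, above the median (2). In the bimodal set the mean and median both land at 5, a value that never occurs in the data, which is why modality affects how useful those measures are.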
sample mean vs population mean
the sum is divided by the sample size (n) or by the size of the entire population (N)
we use deviation from the mean or deviation from the median
median is more popular because it is not as easily affected by outliers
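A short sketch of the two ideas on this card, with made-up numbers: the mean is a sum divided by either the sample size or the population size, and a single outlier moves the mean much more than the median.

```python
import statistics as st

# Hypothetical values, for illustration only
population = [30, 32, 35, 36, 40, 400]   # every member, N = 6
sample = population[:5]                  # a subset, n = 5

population_mean = sum(population) / len(population)  # divide by N
sample_mean = sum(sample) / len(sample)              # divide by n

# The single outlier (400) drags the mean far more than the median
print("population:", population_mean, st.median(population))
print("sample:    ", sample_mean, st.median(sample))
```

Including the outlier moves the mean from 34.6 to 95.5, while the median only moves from 35 to 35.5.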
standard deviation is important with data transformations T/F?
true
what is data normalization
raw totals (numerator) are standardized against a denominator
min-max scaling
comparing something to make it comprehensible
standardization (z-score normalization)
types of normalization
denominator is standard deviation
max is 1
min is 0
goes to 0
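A minimal sketch of the two normalization types named in these cards, assuming their common textbook forms: z-score as (x - mean) / standard deviation, and min-max scaling as (x - min) / (max - min). The function names and the data are illustrative, and the population standard deviation (`pstdev`) is an assumption; a sample standard deviation could be used instead.

```python
import statistics as st

def z_scores(values):
    """Standardize: subtract the mean, divide by the standard deviation."""
    mu = st.mean(values)
    sigma = st.pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical data: mean is 5, population standard deviation is 2
data = [2, 4, 4, 4, 5, 5, 7, 9]

z = z_scores(data)
scaled = min_max(data)
print(z)       # centered on 0, in units of standard deviations
print(scaled)  # runs from 0 (at the min) to 1 (at the max)
```

Under these forms the z-scores are centered on 0, and the min-max values are bounded by 0 and 1.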
min-max scaling
important for rasters
range of data
min and max values
does not go to 0 only to the min value
how do you know when you need to standardize your data?
know your data before normalizing it. Normalizing unrelated data is like mixing apples and oranges. It makes fruit salad, not a good analysis
not all variables need to be normalized
results can be proportions or percentages
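A small sketch of normalizing raw totals against a denominator to get percentages. The region names, counts, and populations are hypothetical.

```python
# Hypothetical regions: (raw case count, total population)
regions = {"A": (50, 1000), "B": (50, 20000)}

# Numerator (raw total) standardized against a denominator (population)
percentages = {
    name: 100 * cases / population
    for name, (cases, population) in regions.items()
}

for name, pct in percentages.items():
    print(f"region {name}: {pct}% of population")
```

Both regions have the same raw total (50), but the percentages (5.0 vs 0.25) show they are not comparable until a related denominator is applied.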
data classification considerations
grouping of numerical data into classes for mapping, with each class represented by an individual symbol
class interval: where to put breaks in the data
number of intervals: 4-7
describe equal intervals
equal intervals or steps along the number line
determine data range
not very good
susceptible to outliers
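A minimal sketch of equal-interval classification, assuming the data range is split into classes of equal width. The function names and the values are illustrative.

```python
from bisect import bisect_left

def equal_interval_breaks(values, n_classes):
    """Split the data range into n_classes steps of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(1, n_classes)]

def class_counts(values, breaks):
    """Count the values in each class; a value on a break joins the lower class."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect_left(breaks, v)] += 1
    return counts

# Hypothetical data with one outlier (101)
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 101]
breaks = equal_interval_breaks(values, n_classes=4)
counts = class_counts(values, breaks)
print(breaks, counts)
```

The single outlier stretches the range, so eleven of the twelve values fall into the first class and two classes are empty.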
describe quantiles
each class contains the same number of observations/values
easy to understand
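A minimal sketch of quantile classification using `statistics.quantiles` from the Python standard library. The data and the helper `class_counts` are illustrative; the exact break values depend on the interpolation method, so only the class counts are checked here.

```python
import statistics as st
from bisect import bisect_left

def class_counts(values, breaks):
    """Count the values in each class; a value on a break joins the lower class."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect_left(breaks, v)] += 1
    return counts

# Hypothetical data with one outlier (101)
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 101]

# n=4 returns the three cut points that separate four classes
breaks = st.quantiles(values, n=4)
counts = class_counts(values, breaks)
print(breaks, counts)
```

Each of the four classes ends up with three observations, even though the outlier makes the last class span a much wider range of values than the others.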
describe mean standard deviation
derive classes from the descriptive statistics of overall data distribution
worst method
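A minimal sketch of mean-standard deviation classification, assuming breaks are placed at the mean and at one standard deviation on either side of it. The multiples used (-1, 0, 1), the function name, and the data are assumptions for illustration; other multiples are possible.

```python
import statistics as st

def mean_sd_breaks(values, multiples=(-1, 0, 1)):
    """Place class breaks at mean + k * standard deviation for each k."""
    mu = st.mean(values)
    sd = st.pstdev(values)  # assumption: population standard deviation
    return [mu + k * sd for k in multiples]

# Hypothetical data: mean is 5, population standard deviation is 2
data = [2, 4, 4, 4, 5, 5, 7, 9]
breaks = mean_sd_breaks(data)
print(breaks)  # three breaks define four classes
```

The classes come from the descriptive statistics of the whole distribution, so they describe how far each value sits from the mean rather than its raw magnitude.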