Data Summary Flashcards
What is quantitative data?
Data that measures some quantity, resulting in a numerical value
What is Qualitative data?
Data describing a quality or category of something, resulting in a non-numerical value (colour, religion, season)
What is discrete quantitative data?
Numerical data whose possible values form a distinct, separate series of numbers, often whole numbers (number of traffic accidents, number of children born to a woman)
What is continuous quantitative data?
Numerical data that can take any value in a range and can, in principle, be measured ever more precisely (height, speed)
What is ordinal qualitative data?
Non-numerical values that have a natural order (poor, fair, good, great)
What is nominal qualitative data?
Unordered, distinct by name only (red, blue, green)
What is a frequency distribution?
Used for discrete variables with a limited number of distinct values. Formed by counting the number of times each distinct value occurs (its frequency).
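A minimal Python sketch of building a frequency distribution, using `collections.Counter` to do the counting. The sample of household sizes is hypothetical.

```python
from collections import Counter

# Hypothetical discrete data: number of children in each of 10 households
children = [0, 2, 1, 2, 3, 2, 0, 1, 2, 4]

# Counter maps each distinct value to the number of times it occurs
freq = Counter(children)

# Print the frequency distribution in value order
for value in sorted(freq):
    print(value, freq[value])
```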
What is the mode?
Most frequently recorded value
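A short Python sketch using the standard `statistics` module (Python 3.8+ for `multimode`); the data are hypothetical. A data set can have more than one mode, which `multimode` reports.

```python
import statistics

# Hypothetical discrete sample: number of children in each of 10 households
children = [0, 2, 1, 2, 3, 2, 0, 1, 2, 4]

# mode() returns the most frequently recorded value
most_common = statistics.mode(children)

# multimode() returns every value tied for the highest frequency
tied = statistics.multimode([1, 1, 2, 2, 3])

print(most_common, tied)  # 2 [1, 2]
```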
What are some measures of centre?
Mean and median
What are some measures of spread?
Range, interquartile range, sample variance, standard deviation
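A minimal sketch of the sample variance and standard deviation, assuming the usual definition that divides the sum of squared deviations by n − 1 (this is also what Python's `statistics.variance` and `statistics.stdev` do). The sample is hypothetical.

```python
import math

# Hypothetical sample
data = [4.0, 7.0, 5.0, 9.0, 5.0]
n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation: square root of the variance, in the same units as the data
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 6.0 4.0 2.0
```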
Why do we not usually know the population mean parameter?
Finding it would mean measuring the whole population, which usually takes too long or is too expensive
What would you use if you didn’t know the true value of the parameter?
An estimate of the parameter calculated from a sample, e.g. the sample mean (mu hat) as an estimate of the population mean
How would you find the sample mean (mu hat)?
Mu hat = the sum of the sample values divided by the sample size n, i.e. mu hat = (x1 + x2 + ... + xn) / n
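The formula above, written out in Python for a hypothetical sample of n = 6 values:

```python
# Hypothetical sample of n = 6 values
sample = [2, 4, 4, 5, 6, 9]

# mu hat = (x1 + x2 + ... + xn) / n
mu_hat = sum(sample) / len(sample)

print(mu_hat)  # 5.0
```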
How can an outcome be expressed using a simple model?
Outcome = model (here, the mean) + error
How would you model the density at some time/location i?
Density at time/location i is equal to the mean density plus an error term for that observation: density_i = mean + error_i
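A sketch of the "outcome = mean + error" decomposition, assuming hypothetical density measurements and using the sample mean (mu hat) in place of the unknown true mean. Each error is the observation minus the model; errors measured around the sample mean always sum to zero (up to rounding).

```python
# Hypothetical density measurements at locations i = 0, 1, ..., 4
density = [5.50, 5.61, 5.36, 5.29, 5.44]

# Model: the sample mean stands in for the unknown mean density
mu_hat = sum(density) / len(density)

# error_i = observation_i - model
errors = [d - mu_hat for d in density]

# Each observation is recovered as model + error
for d, e in zip(density, errors):
    assert abs((mu_hat + e) - d) < 1e-12

print(mu_hat, errors)
```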
How would you find the sample median?
Sort the values into order and take the middle value (or the average of the two middle values when n is even)
What is the position of the sample median in an ordered set of n values?
(n+1)/2; when n is even this falls halfway between two values, so the median is their average
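A minimal Python sketch of the (n+1)/2 rule; `sample_median` is a hypothetical helper name, and its result should agree with `statistics.median`.

```python
def sample_median(values):
    """Median via the (n+1)/2 position rule (positions counted from 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    pos = (n + 1) / 2
    if pos.is_integer():
        # Odd n: the median is the value at position pos
        return ordered[int(pos) - 1]
    # Even n: pos is k + 0.5, so average the k-th and (k+1)-th values
    k = int(pos)
    return (ordered[k - 1] + ordered[k]) / 2


print(sample_median([7, 1, 3]))     # 3
print(sample_median([4, 1, 3, 2]))  # 2.5
```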
What is the range?
The difference between the maximum and minimum values
What is an outlier?
A value that is very different to the other values
How do you find the interquartile range?
Subtract the 25th percentile (lower quartile, Q1) from the 75th percentile (upper quartile, Q3): IQR = Q3 - Q1. The middle 50% of the data lies in this range.
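A sketch comparing the range and the IQR on a hypothetical sample that contains one extreme value. Quartiles are computed with `statistics.quantiles(..., method="inclusive")` (linear interpolation); other software may use a different quartile convention and give slightly different values.

```python
import statistics

# Hypothetical sample, already sorted; 40 is a deliberately extreme value
data = [2, 4, 4, 5, 6, 7, 8, 9, 40]

# Range: max minus min, so the single extreme value dominates it
data_range = max(data) - min(data)

# Quartiles by linear interpolation; Q3 - Q1 ignores the extreme value
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 38 4.0 8.0 4.0
```

The range is pulled far out by the one extreme value, while the IQR only describes the middle half of the data.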
What is the error useful for?
The size of the errors shows how well the model fits the data: small errors indicate a good fit, large errors a poor fit
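One common way to summarise the size of the errors is the sum of squared errors (SSE); that choice is an assumption here, not something stated on the card. This sketch compares candidate single-value models for a hypothetical sample; the sample mean gives the smallest SSE. `sse` is a hypothetical helper name.

```python
def sse(observations, model_value):
    """Sum of squared errors when every observation is modelled by model_value."""
    return sum((y - model_value) ** 2 for y in observations)


# Hypothetical sample
data = [2, 4, 4, 5, 6, 9]
mean = sum(data) / len(data)  # 5.0

print(sse(data, mean))  # 28.0
print(sse(data, 4.0))   # 34.0
print(sse(data, 7.0))   # 52.0
```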
When would you use a bar plot?
When illustrating frequency information across discrete categories or groups
When would you use a histogram?
To display continuous data: the data are partitioned into distinct bins (intervals) and the number of values in each bin is shown as a bar
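The binning step behind a histogram can be sketched in plain Python. This assumes equal-width bins that include their left edge and exclude their right edge, [start, start + width), and silently drops values outside the bins. The helper `bin_counts` and the heights are hypothetical.

```python
def bin_counts(values, start, width, n_bins):
    """Count values falling in the bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # values outside all bins are ignored
            counts[idx] += 1
    return counts


# Hypothetical heights in cm, binned as 150-160, 160-170, 170-180, 180-190
heights = [151.2, 158.9, 160.0, 163.4, 167.5, 170.1, 171.8, 176.0, 182.3]
counts = bin_counts(heights, start=150, width=10, n_bins=4)
print(counts)  # [2, 3, 3, 1]
```

Each count would become the height of one bar; plotting libraries perform the same counting before drawing.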
If the tail of the histogram is on the left, which way is it skewed?
Left skewed (negatively skewed)
If the mean is less than the median, which way is the distribution skewed?
Left, as a rule of thumb: the long left tail pulls the mean below the median
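A quick check of the rule of thumb on a hypothetical left-skewed sample: most values are high, with a long tail of low values.

```python
import statistics

# Hypothetical left-skewed sample (long tail towards low values)
scores = [20, 55, 70, 75, 78, 80, 82, 85, 88]

mean = statistics.mean(scores)      # pulled down by the low values
median = statistics.median(scores)  # middle (5th) value

print(mean, median)
```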
When would you use a box plot?
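To convey summary information about a variable: the box shows the lower quartile, median and upper quartile, and the whiskers and individual points show the spread of the remaining data and any outliers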
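A sketch of the numbers a box plot draws. It assumes one common convention not stated on these cards: Tukey's fences, where values more than 1.5 × IQR beyond the quartiles are plotted individually as outliers and the whiskers reach the most extreme remaining values. `box_summary` and the data are hypothetical, and plotting libraries may compute quartiles slightly differently.

```python
import statistics


def box_summary(values):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR convention."""
    ordered = sorted(values)
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }


summary = box_summary([2, 4, 4, 5, 6, 7, 8, 9, 40])
print(summary)
```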
What do notched box plots include?
A notch around the median showing an approximate confidence interval for the median
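A sketch of one common notch formula, used here as an assumption: median ± 1.58 × IQR / √n (R's `boxplot` uses 1.58, matplotlib uses 1.57, and both compute quartiles slightly differently from this sketch). If the notches of two boxes do not overlap, that is rough evidence that their medians differ. `median_notch` is a hypothetical helper name.

```python
import math
import statistics


def median_notch(values, k=1.58):
    """Approximate interval for the median: median +/- k * IQR / sqrt(n)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = k * (q3 - q1) / math.sqrt(len(values))
    return med - half_width, med + half_width


low, high = median_notch([2, 4, 4, 5, 6, 7, 8, 9, 40])
print(low, high)
```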
What are violin plots a combination of?
A box plot and a smoothed, sideways histogram (a density curve) mirrored on both sides of the box
What do the x and y variables on a plot usually represent?
X - explanatory variable
Y - response variable
When would you use a scatter plot?
To show the relationship between two continuous variables, with one point per observation