[LEC3] Descriptive Statistics Flashcards
Observations on one variable may be shown visually by putting the variable’s values on one axis and the frequencies on the other.
Visual Presentation of Data
___________ are best used to interpret the frequency distribution visually.
Figures
A bar graph in which the number of units observed is on the y-axis and the measurement levels are on the x-axis. The bars are visually proportional to each other.
Histogram
A figure that presents a histogram in shorthand. A dot is placed at the center of the top of each bar, and the dots are connected to form a polygon. This shows the shape of the data more clearly.
Frequency Polygon
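Worked example for the histogram and frequency polygon cards: a minimal sketch that tallies frequencies and derives the polygon points without drawing anything. The data values and variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical measurements, invented purely for illustration.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Histogram data: one bar per measurement level, bar height = frequency.
frequencies = Counter(values)
levels = sorted(frequencies)
heights = [frequencies[level] for level in levels]

# Frequency polygon: one point at the top center of each bar, joined in order.
polygon_points = list(zip(levels, heights))

# Text rendering of the bars, one row per measurement level.
for level, height in polygon_points:
    print(f"{level}: {'#' * height}")
```

Connecting the points in polygon_points with straight lines gives the polygon; the bars themselves are no longer needed to see the shape.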
Basic graphs that can illustrate one or more data sets in one graph.
Line Graph
Have both the x- and y-axes on an arithmetic scale.
Arithmetic Line Graphs
Has the y-axis on a logarithmic scale, with the x-axis on an arithmetic scale.
Semilogarithmic Line Graph
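Worked example contrasting the two line graph types: a minimal sketch showing why a quantity that doubles each period curves upward on an arithmetic y-axis but forms a straight line on a logarithmic y-axis. The counts are invented for illustration.

```python
import math

# Hypothetical counts that double every period (invented for illustration).
counts = [10, 20, 40, 80, 160]

# Arithmetic y-axis: the vertical step between points keeps growing.
arithmetic_steps = [b - a for a, b in zip(counts, counts[1:])]

# Logarithmic y-axis: the vertical step is constant, so the line is straight.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(counts, counts[1:])]

print(arithmetic_steps)
print([round(step, 4) for step in log_steps])
```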
Frequency distributions from continuous data are defined by types of descriptors, known as ___________
Parameters
The two types of parameters are measures of __________ and _______
Central Tendency and Dispersion
- Defined as the value used to represent the center or the middle of a set of data values.
- Locates observations on a measurement scale.
Central Tendency
- Describes the spread of values in a given data set.
- Suggests how widely spread out the observations are.
Dispersion
- The average value, or the sum (Σ, sigma) of all the observed values (xi) divided by the total number of observations (N).
- Has the most useful mathematical properties and is the most representative value of a data set, unless outliers are present.
- Used in arithmetic and mathematical techniques.
Mean (x̅)
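Worked example for the mean: a minimal sketch of "sum of the observed values divided by N", plus the effect of one outlier. The observations are invented for illustration.

```python
# Hypothetical observations, invented for illustration.
observations = [2, 4, 4, 5, 7, 8]

# Mean = (sum of all observed values x_i) / (number of observations N).
n = len(observations)
mean = sum(observations) / n  # 30 / 6

# A single outlier pulls the mean away from the bulk of the data.
with_outlier = observations + [100]
mean_with_outlier = sum(with_outlier) / len(with_outlier)  # 130 / 7

print(mean, mean_with_outlier)
```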
The middle observation when the data have been arranged from highest to lowest. When the data set has an even number of observations (hence no natural middle point), the two middle values are averaged to find the __________.
Median
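Worked example for the median: a minimal sketch covering both the odd and the even case. The function name is an illustrative choice. Sorting ascending or descending gives the same middle value.

```python
def median(values):
    """Return the middle value of the data, averaging the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: there is a natural middle observation.
        return ordered[mid]
    # Even N: average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))
print(median([4, 1, 3, 2]))
```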
Rarely used to draw inferential conclusions, but used frequently in healthcare and economics.
Median
- The most commonly observed value.
- Has some clinical interest but is seldom used in statistics. If two or more values appear with the same highest frequency, each is a mode. The downside of using the mode as a measure of central tendency is that a data set may have no mode, or it may have more than one mode.
Mode
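Worked example for the mode: a minimal sketch that returns every mode. It assumes one common convention, namely that data in which no value repeats has no mode; other conventions exist. The function name is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all most frequent values, or [] when no value repeats (assumed convention)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(values) > 1:
        # Assumption: if every value occurs exactly once, report no mode.
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 2, 2, 3]))
print(modes([1, 1, 2, 2, 3]))
print(modes([1, 2, 3]))
```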
- X or Xi stands for “each individual observation”
- Simple to calculate
Arithmetic Mean
A type of mean that is calculated by multiplying the weight (or probability) associated with each event or outcome by its value and summing the products.
A type of mean that gives differing importance to the values in a data set.
Weighted Means
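Worked example for the weighted mean: a minimal sketch that multiplies each value by its weight, sums the products, and divides by the total weight (which is 1 when the weights are probabilities). The function name and the numbers are invented for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical scores where the second counts three times as much as the first.
print(weighted_mean([80, 90], [1, 3]))

# Hypothetical outcomes with probabilities as weights.
print(weighted_mean([10, 20], [0.5, 0.5]))
```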
A statistical measurement of the spread between numbers in a data set. It measures how far each number in the set is from the mean (average), and thus from every other number in the set.
Variance
The average amount of variability in your dataset. It tells you, on average, how far each value lies from the mean. A high standard deviation means that values are generally far from the mean, while a low standard deviation indicates that values are clustered close to the mean.
Standard Deviation
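Worked example for variance and standard deviation: a minimal sketch using the population form (dividing by N); the sample form divides by N - 1 instead. The data are invented for illustration.

```python
import math

# Hypothetical data set, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n  # 40 / 8

# Variance: the average squared distance of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]
population_variance = sum(squared_deviations) / n  # 32 / 8

# Standard deviation: the square root of the variance, in the data's own units.
population_sd = math.sqrt(population_variance)

print(population_variance, population_sd)
```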
The difference between the observed value of a data point and the expected value is known as ____________ in statistics.
Deviation
- Also called the mean absolute deviation.
- The average deviation of a data point from the mean, median or mode of the data set.
Mean Deviation
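Worked example for the mean deviation: a minimal sketch that averages the absolute distances from a chosen center. The mean is used when no center is given. The function and parameter names are illustrative.

```python
def mean_absolute_deviation(values, center=None):
    """Average absolute distance of the values from a center (the mean by default)."""
    if center is None:
        center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)


# Hypothetical data set, invented for illustration (mean = 5).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_absolute_deviation(data))

# The same measure taken around a different center, here the median of [1, 2, 3].
print(mean_absolute_deviation([1, 2, 3], center=2))
```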
Degrees of Freedom
N-1
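Worked example for N-1: a minimal sketch, assuming the card refers to the divisor of the sample variance (the card itself does not say). Python's standard `statistics` module provides both forms: `pvariance` divides by N and `variance` divides by N - 1. The data are invented for illustration.

```python
import statistics

# Hypothetical sample, invented for illustration; squared deviations sum to 32.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: divide the sum of squared deviations by N.
population_variance = statistics.pvariance(sample)

# Sample variance: divide by N - 1, the degrees of freedom.
sample_variance = statistics.variance(sample)

print(population_variance, sample_variance)
```

The sample variance is always the larger of the two, and the gap shrinks as N grows.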
Values that split sorted data or a probability distribution into equal parts. In general terms, a q-quantile divides sorted data into q parts.
Quantiles
A statistical term that describes a division of observations into four defined intervals based on the values of the data and how they compare to the entire set of observations.
Quartiles
Quartiles are organized into:
- lower quartiles
- median quartiles
- upper quartiles
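Worked example for quartiles: a minimal sketch using `statistics.quantiles` from Python's standard library (Python 3.8+) with its default method. Several quartile conventions exist, so other tools may give slightly different cut points. The data are invented for illustration.

```python
import statistics

# Hypothetical sorted data, invented for illustration.
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 asks for the three cut points that split the data into four parts.
quartiles = statistics.quantiles(data, n=4)
lower_quartile, median_quartile, upper_quartile = quartiles

print(quartiles)
```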
- A type of quantile, obtained by subdividing the data into 100 groups.
- A number denoting the position of a data point within a numeric dataset by indicating the percentage of the dataset with a lesser value.
Percentiles
Percentiles are calculated by dividing an ordered set of data into ______________
100 equal parts
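Worked example for percentiles: a minimal sketch of the "percentage of the data set with a lesser value" definition. The function name and the scores are invented for illustration; other percentile definitions handle ties differently.

```python
def percentile_rank(data, value):
    """Percentage of observations in data that are strictly less than value."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Hypothetical exam scores, invented for illustration.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 85))
```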
The statistic that, for a given data set, is the difference between the highest and lowest values.
Range
The size of the narrowest interval which contains all the data.
Range
Range formula
Range = Max(X) - Min(X)
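Worked example for the range formula: a minimal sketch. The function name `value_range` is an illustrative choice (it avoids shadowing Python's built-in `range`).

```python
def value_range(values):
    """Range = Max(X) - Min(X)."""
    return max(values) - min(values)


# Hypothetical data, invented for illustration: max is 9, min is 1.
print(value_range([3, 9, 1, 7]))
```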
Defined as the difference between the third and the first quartile.
Interquartile Range
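Worked example for the interquartile range: a minimal sketch that takes the quartiles from `statistics.quantiles` (Python 3.8+, default method) and subtracts the first from the third. The data are invented for illustration. The second data set shows that one extreme value changes the range but not the interquartile range.

```python
import statistics


def interquartile_range(values):
    """Third quartile minus first quartile."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical data sets, invented for illustration.
plain = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 700]

print(interquartile_range(plain), max(plain) - min(plain))
print(interquartile_range(with_outlier), max(with_outlier) - min(with_outlier))
```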
The size of the interval between the first and third quartiles, which contains the middle 50% of the data.
Interquartile Range
A measure of the asymmetry of a distribution
Skewness
A distribution is asymmetrical when its left and right sides are not mirror images. A distribution can have right (or positive), left (or negative) or zero skewness.
Skewness (Horizontal Imbalance)
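Worked example for skewness: a minimal sketch of one common estimator, the moment coefficient (third central moment divided by the second central moment raised to the power 1.5). This choice of estimator is an assumption; the cards do not name a formula. The data are invented for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (undefined if all values are equal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))       # symmetric data
print(skewness([1, 1, 1, 2, 10]))      # long tail to the right
print(skewness([-10, -2, -1, -1, -1])) # long tail to the left
```

A positive result corresponds to right skew, a negative result to left skew, and zero to a symmetric data set.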
A descriptive statistic used to help measure how data disperse between a distribution's center and tails, with larger values indicating that a data distribution may have “heavy” tails that are thickly concentrated with observations or that are long with extreme observations.
Kurtosis (Vertical Imbalance)
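Worked example for kurtosis: a minimal sketch of the moment-based form (fourth central moment divided by the squared second central moment), under which a normal distribution scores 3. The choice of estimator is an assumption; the cards do not name a formula. The data are invented for illustration.

```python
def kurtosis(values):
    """Moment-based kurtosis: m4 / m2 ** 2 (undefined if all values are equal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2


# Evenly spread data: light tails, low kurtosis.
flat = [1, 2, 3, 4, 5]

# Mostly central values with two extreme observations: heavy tails.
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print(kurtosis(flat), kurtosis(heavy_tailed))
```

Subtracting 3 from the result gives the excess kurtosis, which some tools report instead.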