Describing data Flashcards
Micro data
Data collected on individual units
Macro data
Data collected on groups of units
Population
The set of all statistical units of interest. Its size is denoted by N
Sample
Subset of units drawn from the population. Its size is denoted by n
Non-probability sampling
Units are drawn from the population according to the judgement of the researcher
Probability sampling
Units are drawn from the population at random. This helps make the sample representative of the population, because no part of the population is favored
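A minimal sketch of simple random sampling, which is one form of probability sampling. The population of unit IDs, the sizes, and the names `population` and `sample` are made up for illustration.

```python
import random

# Hypothetical population of N = 1000 unit identifiers (illustrative data).
population = list(range(1000))
N = len(population)

# Draw n = 50 units without replacement; every unit has the same chance
# of being selected, so no part of the population is favored.
rng = random.Random(42)  # fixed seed only so the sketch is reproducible
n = 50
sample = rng.sample(population, n)

print(len(sample), "units drawn out of", N)
```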
Inferential process
Drawing conclusions about the entire population from the information contained in the sample.
A collection of techniques that use sample statistics to learn about population parameters
Parameter
Numerical summary of a characteristic at the population level (all N units)
Statistic
Numerical summary of a characteristic at the sample level (the n sampled units)
Categorical values
Non-numerical values; can be either nominal or ordinal
Nominal categorical values
Non-numerical values that cannot be ranked
Ordinal categorical values
Non-numerical values that can be ranked
Numerical values
Number values; can be either discrete or continuous
Discrete numerical values
Take on a finite number of values, or an infinite but COUNTABLE number of values
Continuous numerical values
Can take any value between two numbers (e.g., height and weight)
How is a frequency distribution table composed (its columns)?
LEFT: distinct observed values (classes/groups)
MIDDLE: absolute frequency (number of observations for each value)
RIGHT: relative frequency (absolute frequency / n)
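A minimal sketch of building such a table, assuming a small made-up sample; the names `observations`, `absolute` and `relative` are illustrative.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (made-up data).
observations = ["red", "blue", "red", "green", "red", "blue"]
n = len(observations)

# Absolute frequency: how many times each distinct value is observed.
absolute = Counter(observations)

# Relative frequency: absolute frequency divided by the sample size n.
relative = {value: count / n for value, count in absolute.items()}

# One row per distinct value: value, absolute frequency, relative frequency.
for value, count in absolute.items():
    print(f"{value:<6} {count:>3} {relative[value]:.3f}")
```

The relative frequencies always sum to 1.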
How can you represent a freq. table (not with intervals)?
Pie or bar chart
How can you represent a freq. table (with intervals)?
Histogram
How can you read a histogram?
HORIZONTAL AXIS: the intervals. On each interval there is a bar whose area is equal to the interval's relative frequency
VERTICAL AXIS: interval density (the height of each bar)
How can you calculate an interval density?
Relative frequency / interval length
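A small worked sketch of the density calculation, assuming three made-up intervals of unequal length; `intervals` and `relative_freq` are illustrative names and values.

```python
# Hypothetical intervals (lower bound, upper bound) and their relative frequencies.
intervals = [(0, 10), (10, 20), (20, 50)]
relative_freq = [0.2, 0.5, 0.3]

# Density = relative frequency / interval length (this is the bar height).
densities = [
    freq / (upper - lower)
    for (lower, upper), freq in zip(intervals, relative_freq)
]

# Check: bar area = density * interval length gives back the relative frequency.
areas = [
    density * (upper - lower)
    for (lower, upper), density in zip(intervals, densities)
]

for (lower, upper), density in zip(intervals, densities):
    print(f"[{lower}, {upper}): density = {density:.3f}")
```

The last interval has a larger relative frequency than the first but a lower density, because it is three times as long.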
The higher the number of intervals, the ……….. the degree of detail of the description
Higher
Mode
The level or value of a variable that is observed with the highest frequency = the most observed value
What is the only measure of central tendency that can be used for nominal variables?
Mode
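A minimal sketch of finding the mode of a nominal variable, assuming made-up eye-color data. Only counting is needed, which is why the mode works even when values cannot be ranked or averaged.

```python
from collections import Counter

# Hypothetical nominal data: categories with no natural order.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Count each distinct value and take the most frequent one.
counts = Counter(eye_colors)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)
```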
Median
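The central value of the ordered distribution. It divides the sample in half
How to calculate the median for an odd or even number of observations (n)?
ODD: the observation in position (n+1)/2 of the ordered data
EVEN: either of the two middle observations (positions n/2 and n/2 + 1), or their arithmetic average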
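A minimal sketch of the odd/even rule, using the convention of averaging the two middle observations when n is even. The function name `median_of` is hypothetical.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # the median is defined on the ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: 0-based index n // 2 is position (n + 1) / 2 in 1-based counting.
        return ordered[middle]
    # Even n: average the observations in positions n/2 and n/2 + 1.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([5, 1, 3]))     # odd n
print(median_of([4, 1, 3, 2]))  # even n
```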
How can we calculate the median from a frequency table?
It is the first value (in increasing order) at which the cumulative percentage reaches or exceeds 50%
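A minimal sketch of reading the median off a frequency table, assuming the table is a list of (value, absolute frequency) pairs sorted by value. The function name `median_from_table` and the data are made up.

```python
def median_from_table(table):
    """table: list of (value, absolute_frequency) pairs sorted by value."""
    n = sum(freq for _, freq in table)
    cumulative = 0
    for value, freq in table:
        cumulative += freq
        # First value whose cumulative share reaches at least 50%.
        # Comparing 2 * cumulative with n avoids floating-point division.
        if 2 * cumulative >= n:
            return value
    raise ValueError("empty frequency table")


# Hypothetical table: value 1 seen 2 times, 2 seen 5 times, 3 seen 3 times.
table = [(1, 2), (2, 5), (3, 3)]
print(median_from_table(table))
```

When the cumulative percentage hits exactly 50%, this sketch returns the lower of the two middle observations; averaging the two is an equally common convention.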
Mean
Arithmetic average of all variable values. ONLY for numerical values
Deviation
The difference between each observed value and the mean. Positive if higher than the mean and negative if lower
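A small worked sketch of deviations from the mean, on made-up values. It also illustrates a standard property: the deviations always sum to zero.

```python
# Hypothetical observed values.
values = [2, 4, 9]

mean = sum(values) / len(values)

# Deviation of each observation: observed value minus the mean.
deviations = [x - mean for x in values]

print(mean)             # the arithmetic average
print(deviations)       # negative below the mean, positive above it
print(sum(deviations))  # positive and negative deviations cancel out
```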
How to calculate the mean from freq. distribution tables?
[(value1 × frequency1) + (value2 × frequency2) + ……] / n
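A minimal sketch of the weighted-sum calculation, assuming a made-up table of (value, absolute frequency) pairs; `table` and `mean_from_table` are illustrative names.

```python
# Hypothetical frequency table: (value, absolute frequency).
table = [(1, 2), (2, 5), (3, 3)]

# n is the total number of observations, i.e. the sum of the frequencies.
n = sum(freq for _, freq in table)

# Multiply each value by its frequency, add everything up, divide by n.
weighted_sum = sum(value * freq for value, freq in table)
mean_from_table = weighted_sum / n

print(n, weighted_sum, mean_from_table)
```

Here the weighted sum is 1×2 + 2×5 + 3×3 = 21 and n = 10, so the mean is 2.1.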
Do outliers affect the mean? Why?
Yes, because it is computed using ALL the observed values, so a single extreme value pulls it toward itself
Do outliers affect the median? Why?
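No (or very little), because it depends only on the position of the observations in the ordered data (the cumulative frequencies), not on how extreme the values are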
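A small sketch comparing the two measures on made-up data, with and without a single outlier, using Python's standard `statistics` module.

```python
import statistics

# Hypothetical data; the second list replaces the largest value with an outlier.
data = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean jumps from 3 to 102, while the median stays at 3 because the middle position in the ordered data is unchanged.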