Economic Methods Statistics Flashcards
What is data?
Data is a body of numerical evidence.
What is a population?
A population is the collection of all items under investigation.
What is a sample?
A sample is an observed subset of the population.
What is a parameter?
A parameter is a specific characteristic of a population.
What is a statistic?
A statistic is a specific characteristic of a sample.
What are the two branches of statistics?
Descriptive statistics and inferential statistics.
What is descriptive statistics?
Graphical and numerical procedures used to summarise and process data.
What is inferential statistics?
The use of sample data to make predictions and estimates about the population.
What are the two types of studies?
Observational and experimental.
What is an observational study?
An observational study does not directly interfere with how the data arise. It can be retrospective or prospective.
What is an experimental study?
An experimental study randomly assigns subjects to treatments, so the researcher is directly involved in how the data arise.
What is a simple random sample?
A simple random sample is a sample chosen by a procedure in which:
- each member of the population is chosen strictly by chance
- each member of the population is equally likely to be chosen
- every possible sample of n objects is equally likely to be chosen
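A minimal Python sketch of drawing a simple random sample, assuming the population is available as a list. `random.Random(seed).sample` draws without replacement and gives every subset of size n the same chance of selection. The household IDs and the helper name `simple_random_sample` are hypothetical, for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement.

    Every subset of size n has the same chance of being selected.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)

# Hypothetical population of 100 household IDs
households = [f"H{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(households, 10, seed=42)
print(sample)
```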
What is needed for observational data to be reliable?
The data must be collected within a random framework from the population. If they are not, the statistical methods (the estimates and the errors associated with those estimates) are not reliable.
What is systematic sampling?
Suppose that the population list is arranged in some fashion unconnected with the subject of interest. Systematic sampling involves the selection of every jth item in the population, where j is the ratio of the population size N to the desired sample size n; that is, $j = N/n$. Randomly select a number from 1 to j to obtain the first item to be included in your systematic sample.
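The procedure can be sketched in Python as follows, assuming the population list is already in an order unrelated to the variable of interest. When N/n is not a whole number, this sketch rounds j down with integer division; that is one simple convention, not the only one. The firm list and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every jth item after a random start between 1 and j."""
    N = len(population)
    j = N // n                    # sampling interval j = N / n (rounded down if not whole)
    rng = random.Random(seed)
    start = rng.randint(1, j)     # random 1-based starting position in 1..j
    # Take the items at 1-based positions start, start + j, start + 2j, ...
    return [population[start - 1 + k * j] for k in range(n)]

# Hypothetical list of 1,000 firms; a sample of 50 gives j = 20
firms = list(range(1, 1001))
chosen = systematic_sample(firms, 50, seed=7)
print(chosen[:5])
```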
What is a histogram?
A histogram is a graph that shows how a set of data is distributed. Taller bars show where values are more common. The y-axis shows frequency density (frequency divided by bin width) and the x-axis shows the bins. The chosen bin width can change the story the histogram tells.
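To make frequency density concrete, here is a small sketch that counts observations per bin and divides by the bin width, assuming bins of the form [lower, upper) with the last bin closed on the right. With unequal widths, density rather than raw frequency keeps bar areas proportional to counts. The income data and the function name `histogram_table` are hypothetical.

```python
def histogram_table(data, edges):
    """Count observations per bin and compute frequency density.

    Bins are [edges[i], edges[i+1]); the last bin also includes its right edge.
    """
    rows = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        last = i == len(edges) - 2
        freq = sum(1 for x in data if lo <= x < hi or (last and x == hi))
        width = hi - lo
        rows.append((lo, hi, freq, freq / width))  # density = frequency / width
    return rows

# Hypothetical weekly incomes (in hundreds of pounds) grouped into unequal bins
incomes = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9, 12]
for lo, hi, freq, density in histogram_table(incomes, [0, 2, 4, 6, 12]):
    print(f"{lo}-{hi}: frequency={freq}, density={density:.2f}")
```

Trying a different list of edges on the same data is a quick way to see how bin width changes the shape.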
What does unimodal mean?
A single peak.
What does bimodal mean?
Two peaks.
What does multimodal mean?
More than two peaks.
What does a uniform histogram look like?
The bars are all roughly the same height, so their tops form a roughly flat line. There may be small peaks, but the bar heights stay close together.
What are the two ways of describing data numerically?
Measures of central tendency and measures of variation.
What does central tendency include?
The mean, the median and the mode.
What does variation include?
Range, interquartile range, variance, standard deviation and coefficient of variation.
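What is the mean?
The mean is the arithmetic average of the data. It comes in two forms: the sample mean and the population mean.
What is the formula for the sample mean?
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where n is the sample size.
What is the formula for the population mean?
$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$, where N is the population size.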
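Both formulas are the same calculation applied to different sets of data: divide the sum by n for a sample, or by N for the whole population. A minimal sketch with hypothetical numbers (the "sample" here is just the first four items, chosen for illustration rather than at random):

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

population = [4, 8, 6, 5, 3, 7, 9, 6]   # hypothetical population, N = 8
sample = population[:4]                 # illustrative subset, n = 4

print(mean(population))  # mu = 48 / 8 = 6.0
print(mean(sample))      # x-bar = 23 / 4 = 5.75
```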
What is the median?
The median is the middle value of an ordered list.
What is the mode?
The mode is the most frequently observed value.
What is a flaw with the mode?
It may not exist, or there may be more than one.
What is a benefit of the mode?
It is not affected by outliers.
What is a benefit of the median?
It is not affected by outliers.
What is the formula for finding the median?
The position of the median in the ordered list is given by $(n+1)/2$. If n is odd, this is a whole number and the median is the observation at that position. If n is even, the position falls halfway between two observations, and the median is the mean of those two middle values. Note that the formula gives the position of the median in the ordered list, not the median value itself.
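The position rule translates directly into code. This sketch (the function name `median_by_position` is hypothetical) sorts the data, computes the 1-based position $(n+1)/2$, and either returns that observation or averages the two neighbours:

```python
def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule on the ordered list."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2           # 1-based position, e.g. 3.0 or 3.5
    if position.is_integer():        # n odd: a single middle observation
        return ordered[int(position) - 1]
    lower = int(position)            # n even: average the two neighbouring observations
    return (ordered[lower - 1] + ordered[lower]) / 2

print(median_by_position([7, 1, 5, 3, 9]))      # n = 5, position 3 -> 5
print(median_by_position([7, 1, 5, 3, 9, 11]))  # n = 6, position 3.5 -> 6.0
```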
What is a flaw with the mean?
It is affected by outliers.
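The flaws and benefits of the three measures of central tendency can be checked with Python's standard `statistics` module. In this hypothetical example, replacing one value with an extreme outlier pulls the mean up sharply while the median and mode stay at 12.

```python
import statistics

# Hypothetical data: the largest value 14 is replaced by an outlier of 140
clean = [10, 12, 12, 13, 14]
with_outlier = [10, 12, 12, 13, 140]

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(label,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data))
```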
What are the three ways data can be distributed?
Data can be symmetric, left-skewed or right-skewed.
When is a curve left-skewed?
A curve is left-skewed when mean < median.
When is a curve right-skewed?
A curve is right-skewed when mean > median.
When is a curve symmetric?
A curve is symmetric when the mean is equal to the median.
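The mean-versus-median rule above can be written as a small classifier. This is a sketch of that rule of thumb only, not a formal skewness statistic; the function name `skew_direction` and its `tol` parameter (used to treat nearly equal values as equal) are hypothetical.

```python
import math
import statistics

def skew_direction(data, tol=1e-9):
    """Classify skew with the mean-versus-median rule of thumb."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if math.isclose(mean, median, abs_tol=tol):
        return "symmetric"
    return "right-skewed" if mean > median else "left-skewed"

print(skew_direction([1, 2, 3, 4, 100]))     # mean 22 > median 3
print(skew_direction([1, 97, 98, 99, 100]))  # mean 79 < median 98
print(skew_direction([1, 2, 3, 4, 5]))       # mean 3 = median 3
```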
What is a percentile?
A percentile indicates the value below which a given percentage of observations lie.
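Textbooks and software use several slightly different percentile conventions. A minimal sketch, assuming the convention that places the pth percentile at position $(n+1)p/100$ in the ordered list (chosen because it matches the $(n+1)/2$ median rule) and interpolates between neighbouring observations. The function name `percentile` and the exam scores are hypothetical.

```python
def percentile(data, p):
    """pth percentile using the (n + 1) * p / 100 position rule (one convention)."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100     # 1-based position in the ordered list
    if position <= 1:                # clamp positions that fall outside the data
        return ordered[0]
    if position >= n:
        return ordered[-1]
    lower = int(position)            # whole part: the observation just below
    fraction = position - lower      # how far to move towards the next observation
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

scores = [55, 62, 68, 70, 74, 79, 83, 88, 95]   # hypothetical exam scores, n = 9
print(percentile(scores, 25))  # position 2.5, halfway between 62 and 68
print(percentile(scores, 50))  # position 5, the median
```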
What is a quartile?
Quartiles split the ranked data into 4 segments with an equal number of values per segment (although the widths of the segments may vary).
What is the five-number summary and what does it include?
It refers to the following descriptive measures:
Minimum
First quartile
Median
Third quartile
Maximum
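The five-number summary can be computed with `statistics.quantiles` from the Python standard library (Python 3.8+). With `n=4` it returns the three cut points Q1, Q2 (the median) and Q3, and its default `"exclusive"` method uses $(n+1)$-based positions, consistent with the median rule above. The helper name `five_number_summary` and the scores are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # quantiles(..., n=4) gives the three cut points that split the data into quarters
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

scores = [55, 62, 68, 70, 74, 79, 83, 88, 95]   # hypothetical exam scores
print(five_number_summary(scores))  # (55, 65.0, 74.0, 85.5, 95)
```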
What are measures of variability?
Measures of variability give information on the spread of the data.
What are the different measures of variability?
Range, interquartile range, variance, standard deviation and coefficient of variation.
What is the range?
The simplest measure of variation.
The difference between the largest and the smallest observation.
What is the flaw with using the range?
It is affected by outliers.
What is the interquartile range?
The difference between the observation at the third quartile and the observation at the first quartile: $IQR = Q_3 - Q_1$.
What is the benefit of using the interquartile range?
It is not as affected by outliers as the range.
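A quick comparison shows why the interquartile range is preferred when outliers are present. In this hypothetical example, changing the maximum from 95 to 950 inflates the range but leaves the IQR unchanged, because the quartiles depend only on the middle of the ordered data. Quartiles again come from `statistics.quantiles` with its default method; the function names `value_range` and `iqr` are hypothetical.

```python
import statistics

def value_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

def iqr(data):
    """Q3 - Q1, using statistics.quantiles' default (n + 1)-based method."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# Hypothetical data, then the same data with the maximum replaced by an outlier
base = [55, 62, 68, 70, 74, 79, 83, 88, 95]
outlier = [55, 62, 68, 70, 74, 79, 83, 88, 950]

print(value_range(base), iqr(base))        # 40 20.5
print(value_range(outlier), iqr(outlier))  # 895 20.5
```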
What is the flaw with using the interquartile range?
It uses only the middle 50% of the observations and ignores the rest, so it may not represent the spread of the full data set.
What is the population variance?
The population variance is the mean of the squared deviations of the values from the population mean: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$.
Why do you square the deviations in the variance?
The variance uses squares because squaring weighs outliers more heavily than data that lie closer to the mean. Squaring also prevents deviations above the mean from cancelling out those below: the unsquared deviations from the mean always sum to zero, so their mean would be zero for every data set.
What is the shortcut formula for the population variance?
The mean of the squares minus the square of the mean: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$.
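A short check that the definition and the shortcut give the same population variance, and that the unsquared deviations cancel out. The data are hypothetical and the function names are for illustration only.

```python
def population_variance(values):
    """Mean of the squared deviations from the population mean."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def population_variance_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    N = len(values)
    mu = sum(values) / N
    return sum(x ** 2 for x in values) / N - mu ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical population, mu = 5
deviations = [x - sum(data) / len(data) for x in data]
print(sum(deviations))                     # 0.0: unsquared deviations cancel out
print(population_variance(data))           # 4.0
print(population_variance_shortcut(data))  # 4.0
```

The two formulas agree algebraically, but with very large values the shortcut subtracts two big numbers and can lose floating-point precision, which is why software usually works from the deviations directly.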
What is the sample variance?
The sample variance is (approximately) the mean of the squared deviations from the sample mean; it divides by n − 1 rather than n: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$.
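The difference between dividing by n − 1 and by n is easy to see in Python's standard `statistics` module, where `variance` is the sample version and `pvariance` the population version. The data are hypothetical; the hand calculation mirrors the formula for $s^2$.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical sample, n = 8

x_bar = statistics.mean(sample)
squared_deviations = sum((x - x_bar) ** 2 for x in sample)
s2_by_hand = squared_deviations / (len(sample) - 1)   # divide by n - 1

# statistics.variance  -> sample variance, divides by n - 1
# statistics.pvariance -> population variance, divides by n
s2 = statistics.variance(sample)
sigma2 = statistics.pvariance(sample)
print(s2_by_hand, s2, sigma2)
```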
What is the standard deviation?
It is the most commonly used measure of variation. It is the square root of the variance.
What is the advantage of using the standard deviation over the variance?
Its advantage over variance is that it has the same units as the original data
What is the coefficient of variation?
Measures the relative variation
Is expressed as a percentage
Can be used to compare two or more sets of data measured in different units
There is a population coefficient of variation, $CV = \frac{\sigma}{\mu} \times 100\%$, and a sample coefficient of variation, $CV = \frac{s}{\bar{x}} \times 100\%$.
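A minimal sketch of the coefficient of variation, comparing two hypothetical data sets measured in different units. The standard deviation here comes from `statistics.stdev` (the square root of the sample variance) or `statistics.pstdev` (the square root of the population variance); the function name and its `sample` flag are hypothetical.

```python
import statistics

def coefficient_of_variation(data, sample=True):
    """CV as a percentage: standard deviation divided by the mean, times 100."""
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    return 100 * sd / statistics.mean(data)

# Hypothetical data in different units: heights (cm) and weights (kg)
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 78, 90]
print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(weights_kg))
```

Because the CV is unit-free, the weights can be said to vary more relative to their mean than the heights, even though kilograms and centimetres cannot be compared directly.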
What are the two measures of the relationship between variables?
The covariance and the correlation coefficient.
What is the correlation coefficient?
Correlation coefficient: a measure of both the direction and the strength of a linear relationship between two variables.
What is the covariance?
Covariance: a measure of the direction of a linear relationship between two variables.
What do the variables do when the covariance is positive (negative)?
When the covariance is positive (negative), the variables tend to move in the same (opposite) direction.
What does a correlation coefficient close to 1 (or -1) indicate?
The closer the correlation coefficient is to 1 (-1), the stronger the positive (negative) linear relationship between the variables.
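A sketch of both measures, assuming the sample versions (dividing by n − 1), since the cards do not specify which is meant. The correlation coefficient rescales the covariance by the two standard deviations, $r = \frac{s_{xy}}{s_x s_y}$, which is why it lies between -1 and 1 and can describe strength as well as direction. The data and function names are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviation products divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Correlation coefficient r = cov(x, y) / (s_x * s_y), between -1 and 1."""
    s_x = math.sqrt(sample_covariance(x, x))  # cov(x, x) is the sample variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical data: advertising vs. sales, and price vs. quantity demanded
advertising = [1, 2, 3, 4, 5]
sales = [3, 5, 4, 7, 8]
price = [1, 2, 3, 4, 5]
quantity = [10, 8, 6, 4, 2]
print(sample_covariance(advertising, sales), correlation(advertising, sales))
print(sample_covariance(price, quantity), correlation(price, quantity))
```

The first pair has a positive covariance and a correlation below 1 (a strong but imperfect positive relationship); the second lies exactly on a downward-sloping line, so its correlation is -1 up to floating-point rounding.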