L1 Flashcards
What is the definition of statistics
Math used to describe and answer questions about a dataset
What is a dataset
Organized facts that can generally be assembled in a table
What are the two branches of statistics
Descriptive and inferential
What is the descriptive branch of statistics
It summarizes the population or sample data
How does one do inference in statistics
One makes conclusions about a population based on sample data
What is a population in statistics
The entire potential dataset one is interested in
What is a sample of a population in statistics
A subset of the population, used because the full dataset one is interested in is usually unavailable
Why use samples when doing statistics
It is more feasible and cheaper than observing the whole population
How does one make sure that the sample data is representative of the population
Through random selection
What are the variable measurement scales, ordered from least to most informative
Nominal, ordinal, interval and ratio
What are the properties of a nominal variable
They are non-numerical but can be used to categorize the data into groups that cannot be ordered in any obvious way, such as a firm's industry or an animal's reproduction strategy
What are the properties of ordinal variables
These can be ordered but do not have a fixed distance between the categories. Often used for subjective questions, where for example a 5 does not have to be five times as great as a 1, so the values cannot be used in arithmetic (+, -, *, /)
What properties do interval scale variables have
The data can be ranked and we can do meaningful addition and subtraction with it, although there is no meaningful zero, so we cannot do multiplication, division or ratios. An example of this is clock or calendar time.
What properties do ratio scale variables have
They can be fully compared and used in math. There is a meaningful zero. An example is weight: a 1 kg bag is twice as heavy as a 500 g bag.
Which variable scales have fixed differences
Interval and ratio, but not nominal and ordinal
Which variable scales don't have a ranked order
Only nominal
Which is the only variable scale with an absolute zero
Ratio
Which variable scales can you add, subtract, multiply and divide
Interval and ratio values can be added and subtracted, but only ratio values can be multiplied and divided
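A small sketch of why ratios fail on an interval scale. Temperature in Celsius is used here as an assumed example of an interval variable (it is not from the cards above), and `celsius_to_kelvin` is a helper written only for this illustration. Differences survive a change of zero point, but ratios do not.

```python
def celsius_to_kelvin(celsius):
    """Shift the zero point from the Celsius zero to absolute zero."""
    return celsius + 273.15

cold, warm = 10.0, 20.0  # hypothetical readings in degrees Celsius

# Differences are the same on both scales, so subtraction is meaningful.
diff_celsius = warm - cold
diff_kelvin = celsius_to_kelvin(warm) - celsius_to_kelvin(cold)

# Ratios depend on where zero is placed, so "twice as warm" is not meaningful in Celsius.
ratio_celsius = warm / cold
ratio_kelvin = celsius_to_kelvin(warm) / celsius_to_kelvin(cold)

print(diff_celsius, round(diff_kelvin, 6))
print(ratio_celsius, round(ratio_kelvin, 3))
```

Kelvin has an absolute zero, so only the second ratio describes a real physical relationship.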
What is a continuous variable
An interval or ratio variable that can take the value of any real number within a range, aka anything that can be written as a decimal. The possible values are uncountably many
What are discrete variables
Variables that can only take values from a finite or countable set of alternatives. Treating a variable as discrete is often done out of convenience
What are some types of unit/time data
Cross sectional, time series and panel data
What is cross sectional data
N>1, T=1: many units observed at the same point in time
What is a time series of data
N=1, T>1: observing the same unit at multiple points in time
What is panel data
N>1, T>1: observing the same set of units at multiple points in time
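The N and T rules for the three data types can be sketched as a small classifier. This is a minimal illustration, assuming each observation is a `(unit, period)` pair; the function name `classify_dataset` and the firm labels are hypothetical.

```python
def classify_dataset(observations):
    """Classify (unit, period) observations by the number of units N and periods T."""
    units = {unit for unit, _ in observations}
    periods = {period for _, period in observations}
    n, t = len(units), len(periods)
    if n > 1 and t == 1:
        return "cross-sectional"
    if n == 1 and t > 1:
        return "time series"
    if n > 1 and t > 1:
        return "panel"
    return "single observation"

# Hypothetical firms and years, used only to exercise the three cases.
cross_section = [("firm_a", 2020), ("firm_b", 2020), ("firm_c", 2020)]
time_series = [("firm_a", 2018), ("firm_a", 2019), ("firm_a", 2020)]
panel = [(firm, year) for firm in ("firm_a", "firm_b") for year in (2019, 2020)]

for name, data in [("cross_section", cross_section),
                   ("time_series", time_series),
                   ("panel", panel)]:
    print(name, "->", classify_dataset(data))
```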
What is the risk when presenting raw data
That it will overwhelm the audience and be hard to understand
What are the pros and cons of visualization in statistics
Visualizations make the data easier to understand but they can also be misleading
What is a frequency distribution in statistics
The number of observations, or in the relative version the share of observations, that fall into each category of a variable. It can be used to put a value on how often each category of a nominal variable appears
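A frequency distribution for a nominal variable can be sketched with the standard library. The industry labels are made-up data and `frequency_distribution` is a hypothetical helper; it returns both the count and the share for each category.

```python
from collections import Counter

def frequency_distribution(values):
    """Map each category to (count, share of all observations)."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, count / total) for category, count in counts.items()}

# Hypothetical nominal data: the industry of six firms.
industries = ["tech", "retail", "tech", "energy", "tech", "retail"]
distribution = frequency_distribution(industries)

for category, (count, share) in distribution.items():
    print(f"{category}: count={count}, share={share:.2f}")
```

The category with the largest count is the modal answer, here "tech".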
What is meant by qualitative data
That it is categorical (nominal) rather than numerical, which misleadingly suggests it is less useful
What is meant by the modal answer
The answer that is most common
What is a way to order continuous data
To use intervals as categories. If there are infinitely many possible values between 0 and 1, you can simply treat that interval as one category and save some paper
What are the requirements when choosing intervals to compare when presenting a dataset
The interval categories must be mutually exclusive, so no observation can fall into two of them, and exhaustive, so every observation fits into a category. For goodness' sake, it is also best to keep the number of intervals limited and easy to understand
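One way to get intervals that are mutually exclusive and exhaustive is to use half-open intervals that share edges. The sketch below assumes intervals of the form [lower, upper) and a hypothetical helper `assign_interval`; a value sitting exactly on an edge lands in exactly one interval, and a value outside the edges raises an error instead of being silently dropped.

```python
from bisect import bisect_right

def assign_interval(value, edges):
    """Return the index i of the half-open interval [edges[i], edges[i + 1]) holding value."""
    if not edges[0] <= value < edges[-1]:
        # Not exhaustive for this value: the edges need to be widened.
        raise ValueError(f"{value} is outside [{edges[0]}, {edges[-1]})")
    return bisect_right(edges, value) - 1

edges = [0, 10, 20, 30]  # three intervals: [0, 10), [10, 20), [20, 30)

for value in (0, 9.9, 10, 29.9):
    i = assign_interval(value, edges)
    print(f"{value} -> [{edges[i]}, {edges[i + 1]})")
```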
What is cumulative relative frequency
The proportion of observations that falls below the upper limit of a particular interval
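Cumulative relative frequency is a running total of the interval counts divided by the number of observations. A minimal sketch, with made-up counts and a hypothetical function name:

```python
from itertools import accumulate

def cumulative_relative_frequency(counts):
    """Running share of observations up to and including each interval."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]

# Hypothetical counts for three consecutive intervals (10 observations in total).
interval_counts = [2, 5, 3]
crf = cumulative_relative_frequency(interval_counts)
print(crf)
```

The last value is always 1, since every observation falls below the upper limit of the final interval.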
What is a histogram
A series of rectangles drawn against an x and a y axis, where each rectangle's width (x) represents the interval size and its height (y) represents the frequency
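A rough text version of a histogram, assuming equal-width intervals. The heights are made-up data and `text_histogram` is a hypothetical helper. The bars are drawn sideways, so the length of each row of `#` plays the role of the rectangle height.

```python
def text_histogram(values, start, width, n_bins):
    """Count values per equal-width interval and draw one row of '#' per interval."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside the intervals are ignored in this sketch
            counts[index] += 1
    lines = []
    for i, count in enumerate(counts):
        lower = start + i * width
        lines.append(f"[{lower}, {lower + width}): " + "#" * count)
    return counts, lines

# Hypothetical heights in cm.
heights = [158, 162, 165, 171, 174, 175, 178, 183, 191]
counts, lines = text_histogram(heights, start=150, width=10, n_bins=5)
print("\n".join(lines))
```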
What can be learned from a histogram about the variables
The spread and rough shape
What is meant by a population being positively skewed
That the outliers lie on the high side, giving a long right tail, and that the frequency is higher in the low intervals
What is a scatterplot
A diagram that shows two variables of a population, such as the height and weight of Gothenburgers, in one figure. The x axis represents one variable and the y axis the other, and the pattern of the points shows whether they are correlated
How can you introduce more categories in a scatterplot diagram
By representing them with colors, symbols or some other visual dimension
What is a line chart
A chart that draws a line by connecting consecutive observations of a numerical variable