Statistics Flashcards
Descriptive and Inferential Statistics
Statistics is the science of making decisions under uncertainty and variability.
It is a group of methods used to collect, analyze, present and interpret data in order to make decisions.
Terminology
Data ; Measurements or facts that are collected from a statistical unit / entity of interest.
Population ; Set of all elements in the universe of interest to the researcher .
Sample ; Subset or a portion of the population of interest . Sample must be representative of the population .
Elements / Units ; Entities or objects which the data are collected from .
Variable ; Characteristic or an attribute that can assume different values .
Census ; Gather data from the whole population for a given measurement .
Sampling ; Collecting data on only a sample of a population.
Sampling Error ; Error in a statistical analysis arising from the unrepresentativeness of the sample taken .
Branches Of Statistics
Descriptive ; Uses graphical / numerical methods to summarize data.
Inferential ; Used to reach inferences about the populations from which the samples were drawn.
Raw data table
Consists of all the individual values measured during a study .
Frequency Table
Shows how many times each value “x” occurs in the data set.
Tally Marks
When the number of observations is large we can use tally marks, which are slashes such as “/”. Once 4 of these marks are present, the fifth is drawn as a diagonal slash across them to complete a group of five.
Cumulative Frequency Table
Shows the total number of values that fall below the upper boundary of each class.
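As a rough illustration, a frequency table and its cumulative column can be built in a few lines of Python. This is only a sketch for ungrouped data: the function name `frequency_table` and the sibling counts are made up for the example, and the cumulative column here counts values at or below each distinct value rather than below a class boundary.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return (value, frequency, cumulative frequency) rows in ascending order."""
    counts = Counter(values)              # how many times each value occurs
    ordered = sorted(counts)              # distinct values, smallest first
    freqs = [counts[v] for v in ordered]
    cumulative = list(accumulate(freqs))  # running total of the frequencies
    return list(zip(ordered, freqs, cumulative))


# Hypothetical data: number of siblings reported by ten students.
siblings = [1, 0, 2, 1, 3, 1, 0, 2, 1, 2]

for value, freq, cum in frequency_table(siblings):
    print(value, freq, cum)
```

The last cumulative frequency always equals the total number of observations, which is a quick way to check the table.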
Measures of Central Tendency
Central tendency, a descriptive statistic, identifies a typical or central value of a dataset, such as its average, its middle value or its most repeated value.
Mean (Arithmetic Average)
This is the average of a set of numerical values.
It can only be applied to quantitative data and is found by adding all the numbers and then dividing the total by how many numbers were added.
Equation to calculate Mean ; x̄ = (Σ x) / n , where n is the number of values.
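The mean formula translates directly into code. A minimal sketch, with a made-up list of scores:

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical scores used only for illustration.
scores = [4, 8, 15, 16, 23, 42]
print(mean(scores))  # 108 / 6 = 18.0
```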
Median
This is the central value of a set of numbers once they are arranged in order. It can be found through two methods ;
Method 1 ; If n is odd, add 1 to n and divide by 2 ; the median is the value in position (n + 1) / 2.
If n is even, the median is [value in position (n/2) + value in position (n/2) + 1] / 2.
Method 2 ; First divide n by 2. If n/2 is a whole number, take the mid point between that term and the next term.
If n/2 isn’t a whole number, round the number up and pick the corresponding term.
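Both methods pick the same value, so one function is enough to illustrate them. The sketch below follows Method 1 and sorts the data first. Positions in the flashcard count from 1 while Python lists count from 0, hence the `- 1`.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2, converted to a 0-based index.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the values in positions n/2 and n/2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median([7, 1, 5]))     # sorted: 1, 5, 7 -> 5
print(median([7, 1, 5, 3]))  # sorted: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```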
Mode
This is the most frequently occurring number .
At times there may be no mode or there may be multiple modes ; when there are two modes the data is called bimodal.
If there are more than 2 modes, the mode becomes a less reliable indicator of the centre of the data.
The mode is the only measure of central tendency that can be used with nominal data.
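A small sketch of finding the mode, including the no-mode and multiple-mode cases. The function name `modes` is hypothetical, and treating a dataset in which every value appears exactly once as having no mode is one common convention, assumed here.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or an empty list if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if several values each appear only once, there is no mode.
    if highest == 1 and len(counts) > 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3, 3]))          # two modes (bimodal): [2, 3]
print(modes([1, 2, 3]))                # no mode: []
print(modes(["red", "blue", "red"]))   # works on nominal data: ['red']
```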
Measures of Dispersion
This describes how spread out the data is within a dataset, i.e. the degree to which numerical data tends to spread about an average value.
Without knowing the variation between the numbers, a measure of central tendency can be misleading.
Range
This is the difference between the largest and smallest number .
Variance
This is the average of the squared differences from the mean.
Variance can be found by :
Working out the mean.
Subtracting the mean from each number in the dataset, then squaring the result.
Finally working out the average of the squared differences.
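The range and the variance steps above can be sketched as follows. The function names and data are made up for illustration. The code divides by n, as described in the steps, which gives the population variance. When the data is a sample, the sum of squared differences is often divided by n - 1 instead.

```python
def value_range(values):
    """Difference between the largest and smallest number."""
    return max(values) - min(values)


def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n                              # step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in values]   # step 2: (x - mean) squared
    return sum(squared_diffs) / n                       # step 3: average them


# Hypothetical dataset with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(value_range(data))          # 9 - 2 = 7
print(population_variance(data))  # 32 / 8 = 4.0
```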
Decimals , Fractions and Percentages
Percent to Decimal - Divide by 100.
Decimal to Percent - Multiply decimal by 100.
Fraction to Decimal - Divide numerator by denominator.
Decimal to Fraction - Write the decimal over 1, multiply both numerator and denominator by 10 for each decimal place, then simplify.
Fraction to Percentage - Multiply the fraction by 100.
Percentage to Fraction - Write the percentage over 100, then simplify.
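The six conversions can be tried out with Python's standard `fractions` module, which simplifies fractions automatically. This is only a sketch: the helper names are made up, and the decimal is passed as text (for example "0.75") so that no floating-point rounding creeps in.

```python
from fractions import Fraction


def percent_to_decimal(percent):
    return percent / 100


def decimal_to_percent(decimal):
    return decimal * 100


def fraction_to_decimal(numerator, denominator):
    return numerator / denominator


def decimal_to_fraction(decimal_text):
    # Fraction("0.75") reads the decimal exactly and returns it in lowest terms.
    return Fraction(decimal_text)


def fraction_to_percent(numerator, denominator):
    return Fraction(numerator, denominator) * 100


def percent_to_fraction(percent):
    # Whole-number percentages only: 45% -> 45/100 -> 9/20.
    return Fraction(percent, 100)


print(decimal_to_fraction("0.75"))   # 3/4
print(percent_to_fraction(45))       # 9/20
print(fraction_to_percent(3, 8))     # 75/2, i.e. 37.5%
```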
Grouped Frequency Distribution
Obtained by constructing classes/intervals for the data and then listing the corresponding number of values in each interval .
Classes are mutually exclusive , which means a particular observation can only fit in one category .
Class Intervals And Class Limits
The total range of the observations is divided into a number of classes, which are known as class intervals.
Class limits are the smallest and largest possible values that can fall into a given class.
The frequency of a class interval is the number of values that fall within that interval.
Class Boundaries
In a frequency distribution , class boundaries are the values that separate the classes .
Obtained by adding the upper class limit of one class to the lower class limit of the next higher class and dividing by two.
Class Width
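The difference between the lower and upper class boundaries of a class.
When building the classes, a suitable width can be estimated as ; Width = (Largest Value - Smallest Value) / Number of Classes , usually rounded up to a convenient number.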
Class Mark
Mid point of the class ; (Lower Limit + Upper Limit) / 2
The mid point value is taken as the representative of the class .
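To see how class limits, boundaries, marks and frequencies fit together, here is a sketch that builds a grouped frequency distribution. It assumes whole-number data and equal-width classes such as 10-19, 20-29, 30-39. The function name, its parameters and the marks are all made up for the example.

```python
def grouped_frequency(values, first_lower_limit, width, num_classes):
    """Rows of (lower limit, upper limit, lower boundary, upper boundary,
    class mark, frequency) for whole-number data."""
    rows = []
    for i in range(num_classes):
        lower = first_lower_limit + i * width
        upper = lower + width - 1        # e.g. 10-19 when width is 10
        next_lower = lower + width       # lower limit of the next higher class
        prev_upper = lower - 1           # upper limit of the previous class
        lower_boundary = (prev_upper + lower) / 2
        upper_boundary = (upper + next_lower) / 2
        mark = (lower + upper) / 2       # mid point of the class
        # Classes are mutually exclusive, so each value is counted once.
        frequency = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, lower_boundary, upper_boundary, mark, frequency))
    return rows


# Hypothetical test marks for ten students.
marks = [12, 15, 21, 22, 27, 30, 34, 38, 39, 18]

for row in grouped_frequency(marks, first_lower_limit=10, width=10, num_classes=3):
    print(row)
```

For the class 10-19 this gives boundaries 9.5 and 19.5, so the class width is 19.5 - 9.5 = 10, and the class mark is 14.5.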
Bar Graphs
A Bar Graph or Bar Chart is a graphical display of data using bars of different heights.
Bar graphs are most useful when we have data that can be categorised.
Histogram
Useful for presenting distributions of observations of continuous variables .
The area of one block is proportional to its frequency .
Some of the detail in the original data is lost through the grouping process.
The importance of histograms is that they give an overall picture of a set of data.
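The height of an individual bar is its frequency density, so the frequency of a bar is its Frequency Density × Class Width. The frequency density is found via the equation ;
Frequency Density = Frequency / Class Width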
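Frequency density matters most when the classes have unequal widths. A brief sketch, using made-up class boundaries and frequencies:

```python
def frequency_density(frequency, lower_boundary, upper_boundary):
    """Height of a histogram bar: frequency divided by class width."""
    width = upper_boundary - lower_boundary
    return frequency / width


# Hypothetical classes as (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 50, 9)]

for lower, upper, freq in classes:
    print(lower, upper, frequency_density(freq, lower, upper))
```

The last class holds 9 values, more than the first class, yet its bar is the shortest because those values are spread over a class three times as wide.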