Summarising Data Flashcards
Types of data
Numerical data
Categorical data
Numerical data
Discrete data
Continuous data
Categorical data
Attribute/dichotomous data
Nominal data
Ordinal data
Descriptive statistics
The methodology for describing or summarising a set of data using tables, diagrams and numerical measures.
Batch data
Are a set of related observations, such as the current inflation rates of EU countries.
Sample data
Are a set of observations selected from a population and designed to be representative of that population.
Discrete data
Can only take one of a set of particular values.
Discrete data arise from counting.
Continuous data
Can take any value within a specified range.
Continuous data arise from measuring.
Attribute/dichotomous data
Have only two categories.
E.g. yes/no, male/female
Nominal data
Have several unordered categories.
E.g. type of policy, nature of claim
Ordinal data
Have several ordered categories.
Strongly in favour/ … / Strongly against
Frequency distribution
Lists data values along with their corresponding frequencies.
Frequency
The number of times something occurs.
Types of frequency distribution
Standard frequency distribution
Cumulative frequency distribution
Grouped frequency distribution
Relative frequency distribution
Percentage frequency distribution
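A minimal sketch of how the standard, cumulative, relative and percentage frequency distributions relate to one another. The data set (claims per policy) is hypothetical and only serves as an illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample: number of claims made on each of 10 policies.
data = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]

counts = Counter(data)
values = sorted(counts)                      # distinct data values, in order
freq = [counts[v] for v in values]           # standard frequencies
cum_freq = list(accumulate(freq))            # cumulative frequencies
n = len(data)
rel_freq = [f / n for f in freq]             # relative frequencies (sum to 1)
pct_freq = [100 * f / n for f in freq]       # percentage frequencies (sum to 100)

print("value  freq  cum   rel    pct")
for v, f, c, r, p in zip(values, freq, cum_freq, rel_freq, pct_freq):
    print(f"{v:>5} {f:>5} {c:>4} {r:>6.2f} {p:>5.1f}%")
```

The grouped frequency distribution is the same idea applied to classes (ranges of values) rather than to individual values.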
Number of classes in a frequency distribution
Choose the smallest k such that 2^k >= n
k = number of classes
n = number of observations
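The rule above can be sketched as a small helper that searches for the smallest k satisfying 2^k >= n. The function name `number_of_classes` is an illustrative choice, not a standard library call.

```python
def number_of_classes(n: int) -> int:
    """Return the smallest k (at least 1) such that 2**k >= n."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = 1
    while 2 ** k < n:
        k += 1
    return k


print(number_of_classes(50))   # 2**5 = 32 < 50 <= 64 = 2**6, so k = 6
print(number_of_classes(100))  # 2**6 = 64 < 100 <= 128 = 2**7, so k = 7
```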
Class interval
Each category (range of values) into which the data sample is grouped.
Class interval formula
Range = max value - min value
Class width
Range / number of classes
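A rough sketch of building a grouped frequency distribution from the class width. The data are hypothetical; rounding the width up to a whole number and placing the largest value in the last class are assumptions made here for convenience, not rules stated in these cards.

```python
import math

# Hypothetical sample of 10 observations.
data = [12, 15, 21, 22, 27, 30, 34, 35, 41, 47]

k = 4                                   # number of classes (2**4 = 16 >= 10)
data_range = max(data) - min(data)      # max value - min value
width = math.ceil(data_range / k)       # class width, rounded up (assumption)

lower = min(data)
classes = [(lower + i * width, lower + (i + 1) * width) for i in range(k)]

counts = [0] * k
for x in data:
    # Each class is [lower bound, upper bound); clamp so the maximum
    # always lands in the last class.
    idx = min((x - lower) // width, k - 1)
    counts[idx] += 1

for (lo, hi), c in zip(classes, counts):
    print(f"{lo} to under {hi}: {c}")
```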
Bar Chart
Is a chart or graph that represents categorical data with rectangular bars whose heights are proportional to the values they represent.
A bar graph shows comparisons among discrete categories.
Types of bar chart
Standard bar chart
Grouped bar chart
Stacked bar chart
Grouped bar chart
Is used to compare the same categories within different groups.
Stacked Bar Chart
Highlights the part-to-whole relationship of the categories within each bar while still allowing the groups to be compared.
Histogram
Is an accurate representation of the distribution of numerical data; an estimate of the probability distribution of a continuous variable.
Measures of location are used to
Estimate the central point of a sample; they are different ways of calculating an average value for the data set.
The sample mean
Is used to describe central tendency where the sample is not influenced by outliers.
The sample mean for grouped data
Uses the midpoint of each group as the representative value for every observation in that group.
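A minimal sketch of the grouped-data mean: each class contributes its midpoint weighted by its frequency. The classes and frequencies below are hypothetical.

```python
# Hypothetical grouped data: (lower bound, upper bound) and frequency per class.
classes = [(12, 21), (21, 30), (30, 39), (39, 48)]
freqs = [2, 3, 3, 2]

# The midpoint of each class stands in for every observation in that class.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Weighted average of the midpoints, using the frequencies as weights.
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)

print(midpoints)
print(grouped_mean)
```

Because the midpoints only approximate the underlying observations, the result is an estimate of the sample mean rather than its exact value.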
The median
Place the n observations in order of magnitude. The median is the value that splits the data into two equal halves, so that half of the observations are less than the median and half are greater than the median.
How can the median be expressed
The ((n+1)/2)th observation
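The positional rule above can be sketched as follows. When (n+1)/2 is not a whole number (n even), this sketch averages the two middle observations, which is the usual convention but is an assumption beyond what the card states.

```python
def median(values):
    """Median via the ((n + 1) / 2)th ordered observation."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    pos = (n + 1) / 2            # 1-based position of the median
    lo = int(pos)                # whole part of the position
    if pos == lo:                # n odd: a single middle observation
        return ordered[lo - 1]
    # n even: average the two observations either side of the position
    return (ordered[lo - 1] + ordered[lo]) / 2


print(median([7, 1, 3, 9, 5]))      # 3rd of 5 ordered values
print(median([1, 3, 5, 7]))         # average of the 2nd and 3rd values
print(median([1, 3, 5, 7, 1000]))   # an extreme value does not move the median
```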
The median is used
- when the distribution is skewed
- for ordinal data in which values are ranked relative to each other but are not measured absolutely
Advantage of the median
Is robust or resistant to the effects of extreme observations
The median of grouped data
We use interpolation
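One common way to carry out the interpolation, sketched under the assumption that observations are spread evenly within the median class: median ≈ L + ((n/2 - F) / f) × w, where L is the lower bound of the median class, F the cumulative frequency before it, f its frequency and w its width. Some texts use (n+1)/2 in place of n/2; the classes and frequencies below are hypothetical.

```python
def grouped_median(classes, freqs):
    """Estimate the median of grouped data by linear interpolation."""
    n = sum(freqs)
    target = n / 2                      # position of the median (n/2 convention)
    cumulative = 0                      # cumulative frequency before this class
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= target:
            # Interpolate within the median class.
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("no observations")


classes = [(12, 21), (21, 30), (30, 39), (39, 48)]
freqs = [2, 3, 3, 2]
print(grouped_median(classes, freqs))
```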
The mode
The value which occurs with the greatest frequency or the most typical value
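A short sketch of finding the mode by counting occurrences. Returning every value that ties for the greatest frequency is a design choice made here; the helper name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency."""
    counts = Counter(values)
    top = max(counts.values())          # raises ValueError on empty input
    return [v for v, c in counts.items() if c == top]


print(modes([1, 2, 2, 3]))                 # a single mode
print(modes(["a", "b", "a", "b", "c"]))    # two values tie for the mode
```

Unlike the mean and the median, the mode can also be used with nominal data, since it only requires counting.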
Probability Space
Is a mathematical construct that models a real-world process or “experiment” consisting of states that occur randomly.
A probability space consists of
A sample space
A set of events
A probability function that measures the likelihood of each event happening
A sample space
Is the set of all possible outcomes
A set of events
Each event contains 0 or more outcomes; is a subset of the sample space.
The probability function
Is a function returning an event’s probability; a number between 0 and 1
Outcome
The result of a single execution of the model
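As an illustration of the three components, here is a minimal sketch of a probability space for one roll of a fair six-sided die. The fair-die assumption (all outcomes equally likely) and the names used are choices made for this example.

```python
from fractions import Fraction

# Sample space: the set of all possible outcomes of one roll.
sample_space = frozenset(range(1, 7))


def probability(event):
    """Probability function for equally likely outcomes (assumed fair die)."""
    event = frozenset(event)
    if not event <= sample_space:       # every event is a subset of the sample space
        raise ValueError("event contains outcomes outside the sample space")
    return Fraction(len(event), len(sample_space))


even = {2, 4, 6}                        # an event containing three outcomes
print(probability(even))                # 1/2
print(probability(set()))               # 0: the event with no outcomes
print(probability(sample_space))        # 1: some outcome always occurs
```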