L1 - Presentation of Data Flashcards
What is the aim of descriptive statistical methods?
- it is simply to present information in a clear, concise and accurate manner
- the difficulty that comes when analysing many phenomena, be they economic, social or otherwise, is that there is simply too much information for the mind to assimilate
- The task of descriptive methods is therefore to summarise all this information and draw out the main features, without distorting the picture
What is Cross-sectional data?
A cross-sectional dataset is one where all data is treated as being at one point in time. Let’s say you have a dataset of salaries across a city - they have all been gathered at one point in time and thus we refer to the data as cross-sectional.
- Ordering of cross-sectional data is irrelevant
What is an issue with cross-sectional data?
- While such an enumeration provides all the information available, it is difficult to get any overall ‘feel’ for certain crucial features of the data, such as the average income and the extent of deviations about that average, which would give some indication of income inequality across the world
- as the data only come from one point in time, changes in income inequality over time cannot be seen
What is a histogram good for?
- a histogram is a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval
- a histogram is a means of representing the underlying frequency distribution of the data
- it is good at identifying outliers
What is a nominal variable?
- A variable can be treated as nominal when its values represent categories with no intrinsic ranking; for example, the department of the company in which an employee works.
- Examples of nominal variables include region, zip code, or religious affiliation
What is an ordinal variable?
- A variable can be treated as ordinal when its values represent categories with some intrinsic ranking; for example, levels of service satisfaction from highly dissatisfied to highly satisfied.
- Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating scores.
- For ordinal string variables, the alphabetic order of string values is assumed to reflect the true order of the categories. For example, for a string variable with the values low, medium, high, the order of the categories is interpreted as high, low, medium, which is not the correct order. In general, it is more reliable to use numeric codes to represent ordinal data.
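The ordering problem can be reproduced in a few lines. This is a minimal Python sketch, not tied to any particular statistics package; the numeric codes (1 = low, 2 = medium, 3 = high) are an assumption made for illustration.

```python
# Hypothetical ordinal categories stored as strings.
levels = ["low", "medium", "high"]

# Sorting strings uses alphabetical order, which is not the intended ranking.
alphabetical = sorted(levels)

# Assumed numeric codes that carry the true ranking.
codes = {"low": 1, "medium": 2, "high": 3}
ranked = sorted(levels, key=lambda level: codes[level])

print(alphabetical)  # ['high', 'low', 'medium']
print(ranked)        # ['low', 'medium', 'high']
```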
What is a scale variable?
- A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate.
- Examples of scale variables include age in years and income in thousands of dollars
What is time series data?
- A time series dataset is one where the observations are time dependent. For instance, let us now suppose that a researcher collects salary data across a city on a month-by-month basis. The observations in the dataset will now differ across time.
- Ordering, or in other words the calendar characteristics of the data, is a key feature of time series data
How do you find the angle needed for a set of data on a pie chart?
angle = ((frequency)/(total frequency)) x 360
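As a quick check of the formula, here is a small Python sketch. The function name `pie_angles` and the category counts are made up for illustration.

```python
def pie_angles(frequencies):
    """Return the pie-chart angle (in degrees) for each category."""
    total = sum(frequencies.values())
    # angle = (frequency / total frequency) x 360
    return {category: 360 * count / total for category, count in frequencies.items()}


# Hypothetical frequencies for three categories.
frequencies = {"A": 10, "B": 20, "C": 30}
angles = pie_angles(frequencies)

print(angles)                # {'A': 60.0, 'B': 120.0, 'C': 180.0}
print(sum(angles.values()))  # 360.0
```

The angles always sum to 360, which is a useful sanity check when working by hand.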
How do you work out frequency density on a histogram?
frequency density = frequency/class width
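A short Python sketch of the calculation, assuming each class is given as (lower bound, upper bound, frequency). The class boundaries and counts are invented for illustration.

```python
# Hypothetical classes: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 30, 8), (30, 40, 6)]

densities = []
for lower, upper, frequency in classes:
    class_width = upper - lower
    # frequency density = frequency / class width
    densities.append(frequency / class_width)

for (lower, upper, frequency), density in zip(classes, densities):
    print(f"{lower}-{upper}: frequency {frequency}, density {density}")
```

The middle class has the highest frequency but the lowest density because it is twice as wide, which is why the height of a histogram bar uses density rather than raw frequency.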
What is relative frequency?
- relative frequency = frequency of that category / sum of frequencies
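For example, the following Python sketch computes relative frequencies for some made-up category counts.

```python
# Hypothetical category counts.
counts = {"a": 2, "b": 3, "c": 5}

total = sum(counts.values())
# relative frequency = frequency of that category / sum of frequencies
relative = {category: count / total for category, count in counts.items()}

print(relative)  # {'a': 0.2, 'b': 0.3, 'c': 0.5}
```

Relative frequencies always add up to 1 (or 100% when expressed as percentages).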
What are the different ways of summarising data using numerical techniques?
- A measure of location –> giving an idea of whether people own a lot of wealth or a little. An example is the average, which gives some idea of where the distribution is located along the x-axis. In fact, we will encounter three different measures of the ‘average’ –> mean, median, mode
- A measure of dispersion –> showing how wealth is dispersed around (usually) the average, whether it is concentrated close to the average or is generally far away from it. An example here is the standard deviation
- A measure of skewness –> showing how symmetric the distribution is, i.e. whether the left half is a mirror image of the right half or not. This is obviously not the case for the wealth distribution
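The three kinds of summary can be illustrated with Python's standard `statistics` module. The wealth figures are invented, and the skewness formula used here (the average cubed deviation divided by the cubed population standard deviation) is one common definition, assumed for illustration since the cards do not give one.

```python
import statistics

# Hypothetical wealth figures with one large value in the right tail.
wealth = [1, 2, 2, 3, 4, 5, 20]

# Measures of location.
mean = statistics.mean(wealth)
median = statistics.median(wealth)
mode = statistics.mode(wealth)

# Measure of dispersion: population standard deviation.
std_dev = statistics.pstdev(wealth)

# Measure of skewness (assumed definition): positive means a long right tail.
skewness = sum((x - mean) ** 3 for x in wealth) / (len(wealth) * std_dev ** 3)

print("mean:", mean, "median:", median, "mode:", mode)
print("standard deviation:", std_dev)
print("skewness:", skewness)
```

With a long right tail the mean lies above the median and the skewness is positive.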
How do you calculate the mean?
x̅ =Σx/N
- for grouped data
x̅= Σfx/Σf –> where x is the midpoint of the group
- Typically, no data value will exactly equal the mean
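The grouped-data formula can be sketched in Python as follows, assuming each group is given as (lower bound, upper bound, frequency). The groups are made up for illustration.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
groups = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

sum_fx = 0.0
sum_f = 0
for lower, upper, frequency in groups:
    midpoint = (lower + upper) / 2  # x is the midpoint of the group
    sum_fx += frequency * midpoint
    sum_f += frequency

grouped_mean = sum_fx / sum_f  # mean = sum(f * x) / sum(f)
print(grouped_mean)  # 16.0
```

Here the mean is 16, which is not the midpoint of any group.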
What is an issue with the mean?
While the sample mean is a very popular and simple to calculate measure of location or central tendency, it can be overly influenced by extreme values
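A small Python sketch shows the effect of a single extreme value. The salary figures (in thousands) are invented for illustration.

```python
import statistics

# Hypothetical salaries in thousands.
typical = [20, 22, 25, 27, 30]
# The same salaries plus one extreme value.
with_outlier = typical + [500]

mean_typical = statistics.mean(typical)
mean_with_outlier = statistics.mean(with_outlier)

print(mean_typical)       # 24.8
print(mean_with_outlier)  # 104
```

One extreme salary pulls the mean from 24.8 to 104, a value larger than every other observation in the data.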
How do you calculate the median?
- order the numbers
- add one to the total number of values, then divide by 2 to get the position of the median
- if the total is odd, this position gives the median directly
- if the total is even, the position falls between two values, so add those two values together and divide by two to get the median
- Note that the median calculation does not preserve the natural time ordering of the data
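The steps above can be sketched in Python as follows. The function name `median_by_position` is made up for illustration; in practice the standard library's `statistics.median` does the same job.

```python
def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)      # step 1: order the numbers
    n = len(ordered)
    position = (n + 1) / 2        # step 2: 1-based position of the median
    if n % 2 == 1:
        # Odd total: the position is a whole number and gives the median.
        return ordered[int(position) - 1]
    # Even total: the position falls between two values, so average them.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_by_position([7, 1, 5]))     # 5
print(median_by_position([4, 1, 3, 2]))  # 2.5
```

Because the values are sorted first, any original time ordering is lost, as the last point above notes.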