L1 - Presentation of Data Flashcards

1
Q

What is the aim of descriptive statistical methods?

A
  • It is simply to present information in a clear, concise and accurate manner
  • The difficulty in analysing many phenomena, be they economic, social or otherwise, is that there is simply too much information for the mind to assimilate
  • The task of descriptive methods is therefore to summarise all this information and draw out the main features, without distorting the picture
2
Q

What is Cross-sectional data?

A

A cross-sectional dataset is one where all the data are treated as being observed at one point in time. For example, a dataset of salaries across a city, all gathered at the same time, is cross-sectional.
- The ordering of cross-sectional data is irrelevant

3
Q

What is an issue with cross-sectional data?

A
  • While a full enumeration of the data provides all the information available, it is difficult to get any overall ‘feel’ for crucial features of the data, such as the average income and the extent of deviations about that average, which would give some indication of income inequality across the world
  • Because the data come from a single year, they cannot show how income inequality changes over time
4
Q

What is a histogram good for?

A

A histogram is a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.

  • A histogram is a means of representing the underlying frequency distribution of the data
  • It is good at identifying outliers
5
Q

What is a nominal variable?

A
  • A variable can be treated as nominal when its values represent categories with no intrinsic ranking; for example, the department of the company in which an employee works.
  • Examples of nominal variables include region, zip code, or religious affiliation
6
Q

What is an ordinal variable?

A
  • A variable can be treated as ordinal when its values represent categories with some intrinsic ranking; for example, levels of service satisfaction from highly dissatisfied to highly satisfied.
  • Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating scores.
  • For ordinal string variables, the alphabetic order of string values is assumed to reflect the true order of the categories
  • For example, for a string variable with the values low, medium, high, the order of the categories is interpreted as high, low, medium, which is not the correct order. In general, it is more reliable to use numeric codes to represent ordinal data.
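
The alphabetical-ordering problem can be seen in a small sketch. The category labels and the numeric coding (1 = low, 2 = medium, 3 = high) are assumptions chosen for illustration, not part of the original card.

```python
# Hypothetical satisfaction levels stored as strings.
levels = ["low", "medium", "high"]

# Plain alphabetical sorting ignores the intended ranking.
alphabetical = sorted(levels)

# Assumed numeric codes that carry the true order of the categories.
codes = {"low": 1, "medium": 2, "high": 3}
by_rank = sorted(levels, key=codes.get)

print(alphabetical)  # ['high', 'low', 'medium']
print(by_rank)       # ['low', 'medium', 'high']
```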
7
Q

What is a scale variable?

A
  • A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate.
  • Examples of scale variables include age in years and income in thousands of dollars
8
Q

What is time series data?

A
  • A time series dataset is one where the observations are time dependent. For instance, suppose that a researcher collects salary data across a city on a month-by-month basis. The observations in the dataset will now differ across time.
    - Ordering, or in other words the calendar characteristics of the data, is a key feature of time series data
9
Q

How do you find the angle needed for a set of data on a pie chart?

A

angle = (frequency / total frequency) x 360°
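
As a rough illustration of the angle formula, the sketch below converts a set of category frequencies into pie-chart angles. The function name `pie_angles` and the counts are made up for the example.

```python
def pie_angles(frequencies):
    """Return the pie-chart angle (in degrees) for each category."""
    total = sum(frequencies.values())
    return {name: freq / total * 360 for name, freq in frequencies.items()}

# Hypothetical category counts.
counts = {"A": 10, "B": 20, "C": 10}
angles = pie_angles(counts)
print(angles)  # {'A': 90.0, 'B': 180.0, 'C': 90.0}
```

The angles always add up to 360 because the frequency shares add up to 1.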

10
Q

How do you work out frequency density on a histogram?

A

frequency density = frequency/class width
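
A minimal sketch of the frequency density calculation, assuming each class is given as a hypothetical (lower bound, upper bound, frequency) triple with unequal class widths:

```python
# Each tuple is (lower bound, upper bound, frequency); the figures are invented.
classes = [(0, 10, 20), (10, 30, 30), (30, 40, 5)]

# Frequency density = frequency / class width.
densities = [freq / (upper - lower) for lower, upper, freq in classes]
print(densities)  # [2.0, 1.5, 0.5]
```

Plotting density rather than raw frequency keeps the area of each rectangle proportional to the frequency when class widths differ.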

11
Q

What is relative frequency?

A
  • relative frequency = frequency of that category / sum of frequencies
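
The same idea in a short sketch, using invented category counts:

```python
# Hypothetical frequencies for three categories.
counts = {"A": 10, "B": 20, "C": 10}

total = sum(counts.values())
relative = {name: freq / total for name, freq in counts.items()}
print(relative)  # {'A': 0.25, 'B': 0.5, 'C': 0.25}
```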
12
Q

What are the different ways of summarising data using numerical techniques?

A
  • A measure of location –> giving an idea of whether people own a lot of wealth or a little. An example is the average, which gives some idea of where the distribution is located along the x-axis. In fact, we will encounter three different measures of the ‘average’ –> mean, median, mode
  • A measure of dispersion –> showing how wealth is dispersed around (usually) the average, whether it is concentrated close to the average or is generally far away from it. An example here is the standard deviation
  • A measure of skewness –> showing how symmetric the distribution is, i.e. whether the left half of the distribution is a mirror image of the right half or not. This is obviously not the case for the wealth distribution
13
Q

How do you calculate the mean?

A

x̅ = Σx/N
- For grouped data:
x̅ = Σfx/Σf –> where x is the midpoint of each group and f is its frequency
- Typically, no data value will exactly equal the mean
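
Both versions of the mean can be sketched as follows. The data, class midpoints and frequencies are invented for illustration.

```python
def mean(values):
    """Ungrouped mean: sum of the values divided by their number."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Grouped mean: sum of f*x over sum of f, with x the class midpoint."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

print(mean([2, 4, 6, 8]))                      # 5.0
print(grouped_mean([5, 15, 25], [2, 5, 3]))    # 16.0
```

Note that 5.0 does not appear in the first dataset, which illustrates the last point on the card.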

14
Q

What is an issue with the mean?

A

While the sample mean is a very popular and simple to calculate measure of location or central tendency, it can be overly influenced by extreme values

15
Q

How do you calculate the median?

A
  • Order the numbers
  • Add one to the number of observations N, then divide by 2; this gives the position of the median in the ordered list
  • If N is odd, the position is a whole number and the value at that position is the median
  • If N is even, the position falls halfway between two values; add those two values together and divide by two to get the median
  • Note that the median calculation does not preserve the natural time ordering of the data
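
The steps above can be sketched in a few lines. This is an illustrative implementation of the (N + 1)/2 position rule, not a replacement for a library routine such as `statistics.median`.

```python
def median(values):
    ordered = sorted(values)          # step 1: order the numbers
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position of the median
    if n % 2 == 1:
        # Odd N: the position is a whole number.
        return ordered[int(position) - 1]
    # Even N: the position lies halfway between two neighbouring values.
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2

print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```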
16
Q

How do you calculate the range?

A
  • Range = the largest value - the smallest value (the last number minus the first number once the data are ordered)

However, it is sensitive to outliers
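
A small sketch, with made-up numbers, showing both the calculation and how a single extreme value changes the result:

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

print(value_range([3, 9, 4, 7]))        # 6
print(value_range([3, 9, 4, 7, 100]))   # 97, driven entirely by the outlier
```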

17
Q

How do you find the Upper and Lower Quartiles and the Interquartile Range?

A
  • The quartiles are those values that ‘split’ the ordered data into four equally sized parts.
  • After finding the median, look at the groups of numbers below and above it
  • Find the median of each of these two groups to give Q{1} and Q{3}
  • The IQR = Q{3} - Q{1}
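
The median-of-halves procedure described above might look like this in code. Other quartile conventions exist and can give slightly different answers, so treat this as one possible sketch; the data are invented.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]             # values below the median
    upper = ordered[n // 2 + n % 2 :]     # values above the median
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 3 7 11 8
```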
18
Q

What is the weighted average?

A
  • The weighted average is obtained by multiplying each value (for example, each unit cost figure) by its weight, the proportion of the variable in each category, and summing
  • x̅{w} = Σ{i} w{i}x{i}
  • where Σ{i} w{i} = 1 (the weights must sum to 1)
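
A minimal sketch of the weighted average, assuming hypothetical unit costs and category proportions. The check that the weights sum to 1 mirrors the condition on the card.

```python
import math

def weighted_average(values, weights):
    """Sum of w_i * x_i, where the weights must sum to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * x for x, w in zip(values, weights))

# Invented unit costs and the proportion of items in each category.
unit_costs = [10, 20, 30]
proportions = [0.5, 0.25, 0.25]
print(weighted_average(unit_costs, proportions))  # 17.5
```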
19
Q

How do you calculate the mode?

A
  • Look for the value that occurs most often in the data
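
One way to find the mode is to count occurrences, as in the sketch below. It returns a list because a dataset can have more than one most common value; the function name `modes` is just a label for this example.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([2, 3, 3, 5, 3, 2]))  # [3]
print(modes([1, 1, 2, 2]))        # [1, 2]
```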
20
Q

Why is the Interquartile Range useful?

A

On its own, the IQR is no more than a summary measure, but it could be used to compare the dispersion of, say, two income distributions if they were both measured in the same currency units: the distribution having the larger IQR would exhibit the greater inequality.

21
Q

What is standard deviation and how do you calculate it?

A
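  • A more useful measure of dispersion is the sample variance, which makes use of all the available data. It is defined as the average of the ‘squared deviations about the mean’ (using N-1 rather than N as the divisor). The standard deviation s is the square root of the variance
  • s = sqrt((Σ(x{i}-x̅)^2)/(N-1))
    for grouped data –>
    s = sqrt((Σf(x{i}-x̅)^2)/(N-1)), where x{i} is the class midpoint and N = Σf
  • where s^2 is the variance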
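
Both formulas can be sketched directly, using the N-1 divisor from the card. The data, midpoints and frequencies are invented; for ungrouped data `statistics.stdev` gives the same result.

```python
from math import sqrt

def sample_std(values):
    """Sample standard deviation with an N-1 divisor."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / (n - 1))

def grouped_std(midpoints, freqs):
    """Grouped version: x is the class midpoint, N is the sum of frequencies."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sqrt(sum(f * (x - m) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1))

print(sample_std([2, 4, 4, 4, 5, 5, 7, 9]))
print(grouped_std([5, 15, 25], [2, 5, 3]))
```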
22
Q

What is one of the difficulties with the variance?

A

It is measured in units that are the square of the units in which the data are measured. Thus, for our income data, the variance will be in ‘squared dollars’, which are both very difficult to interpret and, in this case, lead to a very large numerical value.

23
Q

How do you calculate the coefficient of variation?

A
  • On its own, a standard deviation is of little use, but when used in conjunction with the sample mean, various features of the data can be measured
  • CV = s/x̅
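
The sketch below computes the coefficient of variation for an invented income sample and for the same sample expressed in different units. Because the mean and standard deviation scale together, the ratio is unaffected by the choice of units.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)

incomes = [20, 25, 30, 45, 80]            # hypothetical incomes in dollars
in_cents = [x * 100 for x in incomes]     # the same incomes in cents

cv_dollars = coefficient_of_variation(incomes)
cv_cents = coefficient_of_variation(in_cents)
print(cv_dollars, cv_cents)
```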
24
Q

How are the different summary statistics represented on a boxplot?

A
  • The median and the two quartiles form the lines of the box. The two whiskers extend above and below the box as far as the highest and lowest observations, or at least up to a maximum point that excludes outliers
  • Outliers can be defined as observations more than 1.5 x IQR above the upper quartile or below the lower quartile
  • The mean is represented with a star
  • Any outliers are represented with a cross
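
The 1.5 x IQR rule can be sketched as follows. The quartiles here are found with the median-of-halves method, which is one of several conventions, and the data are invented.

```python
from statistics import median

def outlier_fences(values):
    """Return the (lower, upper) limits beyond which points count as outliers."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[n // 2 + n % 2 :])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 3, 5, 7, 9, 11, 13, 40]
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)  # -8.0 24.0 [40]
```

On a boxplot the upper whisker would stop at 13, the highest observation inside the limits, and 40 would be drawn as a separate point.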
25
Q

What is Skewness and How do you calculate it?

A
  • A measure of skewness gives a numerical indication of how asymmetric the distribution is
  • The coefficient of skewness is the average of the cubed deviations from the sample mean, divided by the cube of the standard deviation
  • skew = (Σ(x{i}-x̅)^3)/(N s^3)
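
A short sketch of the calculation, assuming the common definition in which the sum of cubed deviations is divided by N times the cube of the sample standard deviation. The three datasets are invented to show a zero, a positive and a negative result.

```python
from statistics import mean, stdev

def skewness(values):
    """Sum of cubed deviations from the mean, divided by N * s^3."""
    n = len(values)
    m = mean(values)
    s = stdev(values)
    return sum((x - m) ** 3 for x in values) / (n * s ** 3)

symmetric = [1, 2, 3]
right_tail = [1, 2, 2, 3, 10]
left_tail = [-10, -3, -2, -2, -1]

print(skewness(symmetric))    # 0.0
print(skewness(right_tail))   # positive
print(skewness(left_tail))    # negative
```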
26
Q

What does the Coefficient of Skewness tell us?

A
  • The direction of skewness is given by the sign.
  • The coefficient compares the sample distribution with a normal distribution. The larger the absolute value, the more the distribution differs from a normal distribution.
  • A value of zero means no skewness at all.
  • A large negative value means the distribution is negatively skewed. It has a left-hand tail and mode > median > mean
  • A large positive value means the distribution is positively skewed. It has a right-hand tail and mode < median < mean
27
Q

What is a problem with the coefficient of skewness?

A
  • The measure of skewness is much less useful in practical work than measures of location and dispersion, and even knowing the value of the coefficient does not always give much idea of the shape of the distribution: two quite different distributions can share the same coefficient.
  • In descriptive work it is probably better to draw the histogram itself