Introduction Flashcards by Rachael Piddington

What does a geographical data frame comprise?

Cases + geographical references + variables (attributes/measurements about the cases)

Salary in £s per week is an example of which measurement?

Ratio

What are the three broad purposes of data analysis?

Description/exploration
Probabilistic inference and confirmation
Modelling relationships

What measures of central tendency can be used to describe ratio data?

Mode, Median, arithmetic mean, geometric mean
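
As a quick illustration, all four measures can be computed with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The sample values are made up for this sketch.

```python
import statistics

# Hypothetical ratio-scale data; the values are illustrative only.
values = [1, 2, 2, 4, 8]

mode = statistics.mode(values)                  # most frequent value
median = statistics.median(values)              # middle value of the sorted data
mean = statistics.mean(values)                  # arithmetic mean
geo_mean = statistics.geometric_mean(values)    # nth root of the product; values must be > 0

print(mode, median, mean, round(geo_mean, 3))
```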

What measures of spread can be used to describe ordinal data, and which cannot?

% in the mode and IQR can both be used, but standard deviation and the coefficient of variation cannot.

What is the variance, and why does this explain its low resistance to outliers?

The average of the squared deviations from the mean - squaring magnifies large deviations, so outliers have a disproportionate effect.
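
A minimal sketch of the idea, assuming the population form (dividing by n, as the card describes; the sample form divides by n - 1). The data are hypothetical, and a single extreme value is enough to inflate the result.

```python
def variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical values
with_outlier = [2, 4, 4, 4, 5, 5, 7, 90]    # same data with one extreme value

print(variance(data), variance(with_outlier))
```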

By dividing the standard deviation by the mean, and multiplying the outcome by 100 what do I get?

The coefficient of variation - a dimensionless measure that gives relative spread in comparison to the mean.
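
The calculation can be sketched as follows, assuming the population standard deviation. The two hypothetical lists hold the same lengths in different units, so their coefficients of variation should agree even though their standard deviations do not.

```python
import math

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (population form)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return 100 * sd / mean

# Hypothetical measurements of the same objects in two different units.
lengths_cm = [150, 160, 170, 180, 190]
lengths_m = [1.5, 1.6, 1.7, 1.8, 1.9]

cv_cm = coefficient_of_variation(lengths_cm)
cv_m = coefficient_of_variation(lengths_m)
print(round(cv_cm, 2), round(cv_m, 2))
```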

How do you standardize values?

Subtract the mean and then divide by the standard deviation.
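
For example, a short sketch that turns hypothetical values into standardized (z) scores, assuming the population standard deviation:

```python
import math

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation (population form)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, standard deviation 2
print(z)
```

Each z-score is the number of standard deviations a value lies above or below the mean.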

What is the coefficient of variation particularly useful for?

Comparing distributions with very different means and comparing variables measured in different units.

Boxplots are good for comparing multiple batches of data, but what do they show about each batch?

The middle (median), the extremes, the IQR and any outliers.

What equation(s) can be used to find outliers on a boxplot?

> UQ + 1.5*IQR

< LQ - 1.5*IQR
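
The two fences can be applied in code as sketched below. Quartile conventions differ between textbooks and packages, so the `method="inclusive"` choice here is an assumption, and the data are hypothetical (`statistics.quantiles` needs Python 3.8 or later).

```python
import statistics

def boxplot_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    # Quartile definition is an assumption; other conventions shift the fences slightly.
    lq, _median, uq = statistics.quantiles(values, n=4, method="inclusive")
    iqr = uq - lq
    lower_fence = lq - 1.5 * iqr
    upper_fence = uq + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

outliers = boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(outliers)
```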

What do stem and leaf plots enable us to see about the data?

The frequency distribution and overall shape of the data, the centre of the data and marked deviations.

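What is important about the sum of squared deviations from the mean?

It will be less than the sum of squared deviations from any other number. (The sum of absolute deviations is smallest about the median, not the mean.)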

How can the arithmetic mean be applied to nominal data?

Binary variables can be split into categories of 0 and 1, with the mean giving us the proportion of data in the ‘1’ class.
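
A small sketch of this recoding, using a made-up yes/no variable:

```python
# Hypothetical nominal variable: does each household own a car?
owns_car = ["yes", "no", "yes", "yes", "no"]

# Recode the two categories as 1 and 0.
coded = [1 if answer == "yes" else 0 for answer in owns_car]

# The arithmetic mean of the 0/1 codes is the proportion in the '1' class.
proportion_yes = sum(coded) / len(coded)
print(proportion_yes)
```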

Describe the difference between inferential, explanatory and relational statistics.

Inferential - Go beyond the data to say something about a population.
Relational - why two events coincide
Explanatory - What caused an event

How do you calculate the standard deviation?

The square root of the variance.

When calculating the trimmed mean, what is g? And how can g:0 and g:1 be otherwise described?

g = robustness factor.

g:0 is the arithmetic mean (no values trimmed) whilst g:1 represents guarding against 2 outliers.
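
A minimal sketch, assuming g is the number of values trimmed from each end of the sorted data (so g = 1 removes two values in total). The function name and data are illustrative.

```python
def trimmed_mean(values, g):
    """Mean after dropping the g smallest and g largest values."""
    ordered = sorted(values)
    if 2 * g >= len(ordered):
        raise ValueError("g is too large for this sample")
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 100]           # hypothetical values with one high outlier
print(trimmed_mean(data, 0))       # g = 0: the ordinary arithmetic mean
print(trimmed_mean(data, 1))       # g = 1: lowest and highest value removed
```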

For data with a positive skew, the median, mode and mean occur in which order?

Mode < Median < Mean

Starting with x^3 as a transformation for negatively skewed data, write out the ladder of power.

x^3, x^2, x, sqrt(x), ln(x), -1/sqrt(x), -1/x, -1/x^2

What skew is found in most distributions and what transformation is used to correct this?

Positive - log scale corrects this.
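
To illustrate, the sketch below compares skewness before and after a natural-log transformation. The moment coefficient of skewness is one common definition (an assumption here), and the data are hypothetical.

```python
import math

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation divided by sd cubed."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum((v - mean) ** 3 for v in values) / n / sd ** 3

raw = [1, 2, 3, 4, 5, 10, 50, 200]      # hypothetical positively skewed data
logged = [math.log(v) for v in raw]     # natural log; requires values > 0

print(round(skewness(raw), 2), round(skewness(logged), 2))
```

The log pulls in the long upper tail, so the transformed values should be noticeably less skewed.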

If an outlier is a mistake, you can correct or omit it. If not, what two steps should you follow?

Try a transformation. If there is still no apparent reason, do a with/without analysis.

What does the multiplication theorem state?

That for independent events, the probability of them occurring together is given by the product of their individual probabilities.
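
The theorem can be checked by enumeration. The sketch below uses two fair dice as an illustrative case, with events chosen for the example: the first die shows 6, and the second die is even.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Proportion of outcomes for which event(first, second) is true."""
    favourable = sum(1 for first, second in outcomes if event(first, second))
    return Fraction(favourable, len(outcomes))

p_a = probability(lambda first, second: first == 6)
p_b = probability(lambda first, second: second % 2 == 0)
p_both = probability(lambda first, second: first == 6 and second % 2 == 0)

print(p_a, p_b, p_both, p_a * p_b)
```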

A permutation is a set of objects in a given order, whereas in a combination the order does not matter. How can we find the number of possible combinations of a sample?

C(population size, sample size) = n!/(r!(n-r)!), where n is the population size and r is the sample size. For example, C(3,2) = 3!/(2!(3-2)!) = 3.
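
The same counts can be checked with Python's standard library (`math.comb` and `math.perm` need Python 3.8 or later). Combinations ignore order and permutations count each ordering separately. The three-item population is hypothetical.

```python
import math
from itertools import combinations, permutations

population = ["A", "B", "C"]    # hypothetical population of size 3
sample_size = 2

n_combinations = math.comb(len(population), sample_size)    # order ignored
n_permutations = math.perm(len(population), sample_size)    # order counted

print(n_combinations, list(combinations(population, sample_size)))
print(n_permutations, list(permutations(population, sample_size)))
```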

What is probabilistic inference?

Making an inference from a sample to a population and calculating the chances of this being wrong.

After stating H1, collecting representative data and stating H0, what are stages 4, 5 and 6 in hypothesis testing?

4 - Specify the significance level
5 - Choose the statistical test
6 - Calculate the test statistic

What does a p value tell you?

The probability of obtaining a result at least as extreme as the one observed, purely by chance, if H0 is true.

Under what circumstance can we reject H0 for a given p value?

When p is less than the chosen significance level.

What does a sampling distribution show?

All possible results that can be obtained under H0

Linear association between two variables is measured by what unit free indicator of both strength and direction?

Pearson's correlation coefficient, r. Where r = -1 there is a perfect negative linear correlation, where r = 0 there is no linear correlation, and where r = 1 there is a perfect positive linear correlation.
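
A minimal sketch of the calculation, using the standard product-moment formula. The function name and data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and the two sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))    # points on a rising straight line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))    # points on a falling straight line
print(pearson_r([-1, 0, 1], [1, 0, 1]))         # clear pattern, but not a linear one
```

The last case shows why r = 0 means only that there is no linear association.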

What is 'error'?

The deviation of points from their expected value.

To make all deviation positive, we use the sum of the squared error. How do you calculate this?

Sum of (y - ŷ)^2, where y is the observed value and ŷ is the expected (predicted) value.
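
As a short sketch, with made-up observed values and predictions standing in for the output of some fitted model:

```python
def sum_of_squared_errors(observed, predicted):
    """Sum of (y - y_hat)^2 over paired observed and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))

# Hypothetical observed values and the values a model predicted for them.
observed = [1, 2, 3]
predicted = [1, 2, 5]

sse = sum_of_squared_errors(observed, predicted)
print(sse)
```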

Introduction Flashcards

(31 cards)