Quiz 2 Flashcards
What is a nominal scale?
A scale whose numbers serve only as labels or tags for identifying and classifying objects, with a strict one-to-one correspondence between the numbers and the objects, eg. Medicare numbers.
What is an ordinal scale?
A ranking scale in which numbers are assigned to objects to indicate the relative extent to which the objects possess some characteristic, eg. market position.
What is an interval scale?
A scale in which numerically equal distances on the scale represent equal values in the characteristic being measured, eg. attitudes and opinions. The zero point is arbitrary, so ratios of scale values are not meaningful.
What is a ratio scale?
A scale that allows the researcher to identify or classify objects, rank order the objects, compare intervals or differences and compute ratios of scale values, because it has an absolute (true) zero point, eg. age in years.
Instead of classifying data by the four scales of measurement (nominal, ordinal, interval or ratio), what are the two broader ways of classifying it?
- Metric data
- Categorical data
What is metric data?
Data measured on an interval or ratio scale. It is numeric and measured on a scale where differences between values are meaningful, eg. how old are you in years?
What is categorical data?
Includes nominal and ordinal data and groups possible responses into two or more separate categories, eg. are you male or female?
Can data fit into both metric and categorical categories?
Yes. Age can be metric (eg. 10 years old) or categorical (eg. the 0-18 years category).
What sort of data does a multi-item scale, such as a Likert scale, produce?
The individual items are categorical, but the overall scale score obtained by averaging the item responses is treated as metric.
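As a minimal sketch, assuming made-up answers to a three-item Likert scale scored from 1 (strongly disagree) to 5 (strongly agree), each item answer is a category, while the average across items gives one metric score per respondent:

```python
from statistics import mean

# Hypothetical responses (assumed data): each inner list is one respondent's
# answers to a three-item Likert scale scored 1-5.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
]

# Averaging the item responses gives one metric scale score per respondent.
scale_scores = [mean(items) for items in responses]
print(scale_scores)
```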
What is a frequency table?
A tabulation of how many times each of the possible responses was recorded.
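A frequency table can be sketched with Python's `collections.Counter`; the brand answers below are hypothetical data used only for illustration:

```python
from collections import Counter

# Hypothetical answers to "Which brand do you buy most often?" (assumed data).
answers = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

counts = Counter(answers)
n = len(answers)

# Each response with its frequency and its percentage of the sample.
print(f"{'Response':<10}{'Frequency':>10}{'Percent':>10}")
for response, freq in counts.most_common():
    print(f"{response:<10}{freq:>10}{freq / n:>10.0%}")
print(f"{'Total':<10}{n:>10}{1:>10.0%}")
```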
What is a pie chart?
A circular chart divided into slices whose sizes are proportional to each category's share of the total. It works best when the number of categories is not too large and no individual category is too small.
When writing a report what information should always be mentioned?
- the sample size
- percentages
- Interesting aspects of the responses
- type of tests applied
- the middle
- the spread
- the shape
What is important to remember in capturing results?
Do not speculate; keep it fact-based. Speculation belongs in the discussion section.
What are the rules for using a histogram?
- No spaces between the bars
- Each class interval (bar width) must be the same size
What is the difference between a bar chart and a histogram?
- Bar charts show categorical data on the x axis, whereas histograms show metric data grouped into intervals
- There are gaps between the bars on a bar chart, but none on a histogram
What are the key features of a histogram (or of any data set) that need to be described?
- The Middle
- The Spread
- The Shape
What are the descriptors of the middle of a data set?
- Mean
- Mode
- Median
What is the Mean?
The average - the value obtained by summing all elements in a set and dividing by the number of elements
What is the Mode?
A measure of central tendency given as the value that occurs most often in the sample distribution.
What is the Median?
A measure of central tendency given as the value above which half of the values fall and below which half of the values fall, ie. the middle value once the data are sorted.
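The three measures of the middle can be checked with Python's standard `statistics` module; the ages below are hypothetical data chosen to include one unusually high value:

```python
from statistics import mean, median, mode

# Hypothetical ages of seven respondents (assumed data).
ages = [22, 25, 25, 31, 38, 45, 60]

print("Mean:  ", mean(ages))    # sum of the values divided by the number of values
print("Median:", median(ages))  # middle value once the data are sorted
print("Mode:  ", mode(ages))    # most frequently occurring value
```

Here the single 60-year-old pulls the mean (about 35.1) above the median (31), which is the pattern of a positively skewed distribution.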
What are the descriptors of the spread of a data set?
- Range
- Percentile
- First quartile
- Interquartile range
- Variance
- Standard deviation
What is the range?
The difference between the largest and smallest values of a distribution.
What is a percentile?
A value below which a given percentage of the data lies, eg. the 30th percentile has 30% of the data below it.
What is the first quartile?
The value below which the lowest 25% of the data lies, ie. the 25th percentile.
What is the interquartile range?
The range covered by the middle 50% of the observations, calculated as the third quartile (75th percentile) minus the first quartile (25th percentile).
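A sketch of the range, percentiles, quartiles and interquartile range using `statistics.quantiles` (Python 3.8+), on hypothetical scores. Percentile conventions differ slightly between software packages; `method="inclusive"` is assumed here because it interpolates linearly between data points:

```python
from statistics import quantiles

# Hypothetical test scores (assumed data), sorted for readability.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

data_range = max(scores) - min(scores)

# Three cut points split the data into quarters: Q1, median, Q3.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# With n=100 there are 99 cut points; element k-1 is the k-th percentile.
p30 = quantiles(scores, n=100, method="inclusive")[29]

print("Range:", data_range)
print("First quartile:", q1, "Median:", q2, "Third quartile:", q3)
print("Interquartile range:", iqr)
print("30th percentile:", p30)
```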
What is variance?
It is the mean squared deviation of all the values from the mean. For a sample, the sum of squared deviations is usually divided by n - 1 rather than n.
What is standard deviation and what is the important rule with standard deviation?
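It is the square root of the variance, expressed in the same units as the data. The important rule (Chebyshev's theorem) is that, whatever the shape of the distribution, at least 75% of all data lie within 2 standard deviations of the mean and at least 88.89% lie within 3 standard deviations of the mean.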
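A minimal sketch with Python's `statistics` module, using hypothetical values, that computes the variance and standard deviation (population and sample versions) and checks the 75% / 88.89% rule, which comes from 1 - 1/k² for k = 2 and k = 3:

```python
from statistics import pstdev, pvariance, stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Population versions divide the sum of squared deviations by n;
# sample versions divide by n - 1.
print("Population variance:", pvariance(values))
print("Population SD:      ", pstdev(values))
print("Sample variance:    ", variance(values))
print("Sample SD:          ", stdev(values))

# Share of values within k standard deviations of the mean, compared with
# the minimum 1 - 1/k**2 guaranteed by Chebyshev's theorem.
m, sd = sum(values) / len(values), pstdev(values)
for k in (2, 3):
    share = sum(abs(v - m) <= k * sd for v in values) / len(values)
    print(f"Within {k} SD: {share:.0%} (guaranteed at least {1 - 1 / k**2:.2%})")
```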
What are the options for the shape of the data set?
Symmetrical (mean = median)
Negatively skewed (long tail to the left) - the mean is usually lower than the median
Positively skewed (long tail to the right) - the mean is usually higher than the median
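The mean-versus-median rule of thumb can be written as a small helper. `describe_shape` is a hypothetical name, the example data are made up, and real data rarely gives an exact tie, so treat the result as a rough guide rather than a formal skewness measure:

```python
from statistics import mean, median

def describe_shape(data):
    """Rough rule of thumb: compare the mean with the median."""
    m, md = mean(data), median(data)
    if m > md:
        return "positively skewed"
    if m < md:
        return "negatively skewed"
    return "symmetrical"

for data in ([22, 25, 25, 31, 38, 45, 60], [10, 40, 42, 45, 47], [1, 2, 3, 4, 5]):
    print(data, "->", describe_shape(data))
```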
When writing your report, what is the rule to use if the distribution is skewed?
Refer to the median and quartiles, not the mean and standard deviation.
What are descriptive statistics?
Statistics that summarise the basic features of a data set, such as its middle, spread and shape. They are the basic way data is described.
Categorical data can be classified into what two categories?
- Ordinal data - can be classified and ranked
- Nominal data - values and observations can be classified but cannot be ranked.
What is the advantage of a box plot graph?
It shows the median, the interquartile range and the overall spread of the data in one graph, making it easy to see skewness and to compare groups.
What is a uniform graph?
A graph in which all the bars are about the same height, meaning each value or category occurs with roughly the same frequency (a uniform distribution).
What is statistical inference?
Using the sample to draw conclusions about the population.
What does ‘comparing’ mean in statistical inference?
Looking at how responses differ between groups, eg. men versus women.
What is correlation?
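A measure of the strength and direction of the linear relationship between two metric variables. When the data are plotted on a scatter plot, how closely the points align to a straight trend line indicates how strongly they are correlated.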
How is perfect correlation expressed?
r = 1 (perfect positive correlation) or r = -1 (perfect negative correlation).
What is a perfect linear positive relationship?
High values of one variable always go with high values of the other, and all points lie exactly on an upward-sloping straight line (r = 1).
What is a perfect linear negative relationship?
High values of one variable always go with low values of the other, and all points lie exactly on a downward-sloping straight line (r = -1).
What is a non linear relationship?
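A relationship that does not follow a straight line, eg. a curve. Because r only measures linear association, a strong non-linear relationship can still give r close to 0; r = 0 means there is no linear relationship, not necessarily no relationship at all.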
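A minimal sketch of the Pearson correlation coefficient written from its definition (`pearson_r` is a hypothetical helper name; Python 3.10+ also offers `statistics.correlation`). The made-up data show r = 1, r = -1, and a perfectly curved relationship that still gives r = 0:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # perfect positive linear relationship
print(pearson_r(x, [-3 * v for v in x]))     # perfect negative linear relationship
print(pearson_r(x, [v ** 2 for v in x]))     # curved (y = x squared), yet r is 0
```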
How would a correlation coefficient of 0.8 be described?
Strong positive linear relationship
How would a correlation coefficient of -0.5 be described?
Moderate negative linear relationship
How would a correlation of 0.1 be described?
Extremely weak linear relationship.
What is the purpose of the p-value?
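The p-value is the probability of getting a result at least as extreme as the one observed in the sample if there were actually no effect or relationship in the population (ie. if the null hypothesis were true). A small p-value suggests the sample result is unlikely to be due to chance alone, giving evidence that it also holds in the population.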
What is the generally ‘considered’ acceptable p-value for there to be enough evidence that the survey result holds in the population?
A p-value below 0.05.
If the p-value is high, what can the researcher do to reduce it?
- Increase the sample size
- Determine the point (eg. the sample size) at which the results become significant (p-value under 0.05)
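To illustrate how sample size affects the p-value, here is a sketch of a two-sided one-sample z-test for a proportion (one form of the z-test listed below), using the normal approximation. The survey figures are hypothetical: 60% of respondents prefer a product, tested against a null hypothesis of 50%; `proportion_z_test` is an assumed helper name:

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0=0.5):
    """Two-sided one-sample z-test for a proportion (normal approximation)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error if the null hypothesis is true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Same sample proportion (60%), two different sample sizes.
for successes, n in [(60, 100), (240, 400)]:
    z, p = proportion_z_test(successes, n)
    print(f"n = {n}: z = {z:.2f}, p-value = {p:.4f}")
```

With the sample proportion held at 60%, quadrupling the sample size doubles z and drives the p-value down sharply; a larger sample only helps in this way if the difference is real.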
When the p-value is high, how would we write this in our report?
There is insufficient evidence to draw a conclusion.
What are the different types of tests that can be applied?
- One Sample t-test
- Two sample t-test
- Paired t-test
- ANOVA (Analysis of Variance)
- Chi-square
- z-test
- F-test
What is a point estimate?
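A single value, such as a sample percentage, used on its own to estimate a population value. Reporting it by itself is generally considered poor practice because it gives no indication of how precise the estimate is.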
What is a confidence interval?
An interval for a parameter estimate with a specified level of confidence, eg. if 55% of the sample agree, we might say with 95% confidence that the population figure lies between 45% and 65%.
What happens to the width of the confidence interval when a higher level of confidence is required?
It gets wider.
How will a higher sample size affect the confidence interval?
It narrows the interval, giving a more precise estimate at the same level of confidence.
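A sketch of a normal-approximation confidence interval for a proportion, p ± z·√(p(1 − p)/n). The sample size of 96 is an assumption, chosen so that a 55% result gives roughly the 45% to 65% interval in the example above; `proportion_ci` is a hypothetical helper name:

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample proportion."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Higher confidence widens the interval; a larger sample narrows it.
for n, conf in [(96, 0.95), (96, 0.99), (384, 0.95)]:
    low, high = proportion_ci(0.55, n, conf)
    print(f"n = {n}, {conf:.0%} confidence: {low:.1%} to {high:.1%}")
```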
Does correlation mean causation?
No. Be very careful: just because two variables are correlated does not mean one causes the other.
Correlation tells us about the strength of a relationship; what does regression do?
It gives us an equation that characterises the straight line relationship between the two variables (an independent variable that predicts a dependent variable)
What is the difference between correlation and regression?
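Correlation only measures the strength and direction of the relationship and treats both variables the same way. Regression specifies a dependent variable and one or more independent variables and produces an equation to predict the dependent variable. Regression assumes a direction of influence, but on its own it does not prove causation.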
What is regression analysis?
A statistical procedure for analysing associative relationships between a metric dependent variable and one or more independent variables.
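A minimal sketch of simple (one independent variable) least squares regression, fitting y = a + b·x to hypothetical advertising and sales figures. `fit_line` is an assumed helper name; Python 3.10+ provides the same fit as `statistics.linear_regression`:

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x (one independent variable)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx   # slope: change in y for a one-unit change in x
    a = my - b * mx  # intercept: the line passes through (mean x, mean y)
    return a, b

# Hypothetical data: advertising spend (independent) and sales (dependent).
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
a, b = fit_line(spend, sales)
print(f"sales = {a:.2f} + {b:.2f} * spend")
print("Predicted sales at spend = 6:", round(a + b * 6, 2))
```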
What is factor analysis?
An exploratory technique used to find groups of items that are highly correlated with one another and to test whether the items in a multi-item scale are correlated, ie. measure the same underlying idea.
What does factor analysis achieve?
Reduces a large number of intercorrelated variables down to a smaller set of meaningful underlying variables.
What are the two main uses of factor analysis?
- Summarising information
- Creating/testing scales
What is a factor analysis table?
It shows the loading of each item on each factor, ie. how strongly the item correlates with that factor. A loading of 1.00 would be a perfect correlation, though this is rare.
What are the most important items on a factor analysis table?
Those with a loading greater than 0.3 or less than -0.3.
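Picking out the important items is a simple filter on the absolute loading. The loadings table below is made up for illustration, and `important_items` is a hypothetical helper name:

```python
def important_items(loadings, factor, threshold=0.3):
    """Items whose loading on the given factor is above +threshold or below -threshold."""
    return [item for item, row in loadings.items() if abs(row[factor]) > threshold]

# Hypothetical two-factor loadings (made-up numbers): item -> (factor 1, factor 2).
loadings = {
    "Price is fair": (0.72, 0.10),
    "Good value for money": (0.65, -0.05),
    "Staff are friendly": (0.12, 0.81),
    "Store is clean": (-0.34, 0.44),
}

for factor in range(2):
    print(f"Factor {factor + 1}:", important_items(loadings, factor))
```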
What are the three components to a good recommendation?
What - the recommended course of action
Why - giving data based evidence for our recommendation
How - further detail about the action.
What are the four main steps in the ‘what’ stage of a report?
- Find significant relationships in the data
- Use this to determine sensible actions to recommend
- Recommendations generally relate to 1) features and design and 2) marketing
- Common sense check.
What are the three main steps in the ‘why’ stage of a report?
- Give evidence from your data for why we made our recommendation
- Describe relationships in words (not statistical terms)
- Common sense check.
What goes into the ‘how’ stage of a report?
More detail on the recommendation - be clear and concise and ensure it is a recommendation, not a statement of intent.
What is the sampling design process?
- Define the target population
- Determine the sampling frame
- Select the sampling techniques
- Determine the sample size
- Execute the sampling process
What is the sampling frame?
A representation of the elements of the target population. It consists of a list or set of directions for identifying the target population.
What are the two major types of sampling techniques?
- Non-probability sampling techniques - rely on the personal judgement of the researcher
- Probability sampling techniques - each element of the population is selected by chance.
What are the four types of non-probability sampling techniques?
- Convenience sampling
- Judgemental sampling
- Quota sampling
- Snowball sampling
What are the five types of probability sampling techniques?
- Simple random sampling
- Systematic sampling
- Stratified sampling
- Cluster Sampling
- Other sampling techniques
What is convenience sampling?
Non-probability sampling that attempts to obtain a sample of convenient elements, with selection left largely to the interviewer, eg. people who happen to be in the right place at the right time, such as shoppers in a shopping centre.
What is judgemental sampling?
A form of convenience sampling (non-probability) in which the population elements are selected based on the researcher's judgement, often at specified places, eg. a university, to get students' opinions on teaching methods.
What is quota sampling?
A non-probability sampling technique consisting of two-stage restricted judgemental sampling. The first stage develops quotas based on the make-up of the population, eg. if the population is 40% women, the quota for a sample of 100 people would be 40 women. In the second stage, elements are selected based on convenience or judgement to fill the quotas.
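The first stage of quota sampling is simple arithmetic on the population proportions. This sketch uses a hypothetical helper, `quotas`; note that rounding may not always sum exactly to the sample size when there are many groups:

```python
def quotas(proportions, sample_size):
    """Stage 1 of quota sampling: turn population proportions into quotas."""
    return {group: round(share * sample_size) for group, share in proportions.items()}

# Example from the card: the population is 40% women, sample of 100.
print(quotas({"women": 0.40, "men": 0.60}, 100))
```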
What is snowball sampling?
A non-probability sampling technique in which an initial group of respondents is selected, usually at random, and they then identify others in the target population, who identify others in turn, and so on. This makes it possible to reach people with rare characteristics, such as widowed men under 35 years.
What is simple random sampling?
A probability sampling technique in which each element in the population has a known and equal probability of selection. Every element is selected independently of every other element, and the sample is drawn by a random procedure from the sampling frame.
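In Python, `random.Random.sample` draws a simple random sample without replacement. The sampling frame of customer IDs below is hypothetical, and the fixed seed only makes the draw reproducible:

```python
import random

# Hypothetical sampling frame of 1,000 customer IDs.
frame = [f"C{i:04d}" for i in range(1, 1001)]

rng = random.Random(42)         # fixed seed so the draw can be reproduced
sample = rng.sample(frame, 10)  # every element equally likely, no repeats
print(sample)
```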
What is systematic sampling?
A probability sampling technique in which the sample is chosen by selecting a random starting point and then picking every ‘x’th element in succession from the sampling frame.
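A sketch of systematic sampling, assuming the sampling interval is taken as the frame size divided by the sample size (`systematic_sample` is a hypothetical helper name):

```python
import random

def systematic_sample(frame, n, rng=random):
    """Random start within the first interval, then every k-th element, k = N // n."""
    k = len(frame) // n       # sampling interval
    start = rng.randrange(k)  # random starting point
    return frame[start::k][:n]

# Hypothetical frame of 1,000 numbered elements; sample of 10 means every 100th.
frame = list(range(1, 1001))
print(systematic_sample(frame, 10, random.Random(7)))
```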
What is stratified sampling?
A probability sampling technique that uses a two-step process to partition the population into subpopulations, or strata. Elements are selected from each stratum by a random procedure.
MUST MENTION STRATA
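A sketch of the two steps of stratified sampling: partition the frame into strata, then draw a random sample within each stratum. The student frame and the proportional allocation (25% Arts, 75% Business) are hypothetical, as is the helper name `stratified_sample`:

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, rng=random):
    """Step 1: partition the frame into strata. Step 2: random sample within each."""
    strata = {}
    for element in frame:
        strata.setdefault(stratum_of(element), []).append(element)
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

# Hypothetical frame of 200 students: 50 in Arts, 150 in Business.
students = [(i, "Arts" if i % 4 == 0 else "Business") for i in range(200)]

# Proportional allocation for a sample of 20.
sample = stratified_sample(students, lambda s: s[1], {"Arts": 5, "Business": 15},
                           random.Random(3))
for stratum, members in sample.items():
    print(stratum, len(members))
```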
What is cluster sampling?
A two-step probability sampling technique. First, the target population is divided into mutually exclusive and collectively exhaustive subpopulations called clusters. Then a random sample of clusters is selected using a probability sampling technique such as simple random sampling (SRS). For each selected cluster, either all the elements are included in the sample or a sample of elements is drawn.
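A sketch of cluster sampling showing both options for the second step. The suburbs and households are hypothetical data, and `cluster_sample` is an assumed helper name:

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, rng=random):
    """Step 1: simple random sample of clusters.
    Step 2: keep every element of each chosen cluster or, if per_cluster is
    given, a simple random sample of that many elements from each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        sample.extend(members if per_cluster is None else rng.sample(members, per_cluster))
    return chosen, sample

# Hypothetical frame: 8 suburbs (clusters), each with 20 households.
clusters = {f"Suburb {c}": [f"{c}-{h}" for h in range(1, 21)] for c in "ABCDEFGH"}

rng = random.Random(5)
chosen, everyone = cluster_sample(clusters, 3, rng=rng)              # all households
chosen2, subset = cluster_sample(clusters, 3, per_cluster=5, rng=rng)  # 5 per suburb
print(chosen, len(everyone))
print(chosen2, len(subset))
```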
What are examples of a non-sampling error or bias?
A non-sampling error is a statistical error caused by human factors rather than by the sampling itself, and any statistical analysis is exposed to it. Examples include data entry errors, biased questions, biased processes or decision making, inappropriate analysis and incorrect conclusions.