Chapter 1-9 Flashcards
An outlier is a data value that…
…is not consistent with the bulk of the data
Which statistic is not resistant to an outlier in the data?
Mean
Which one of these statistics would be affected by an outlier?
Standard Deviation
Which of these statistics is unaffected (not affected) by outliers? A. Mean B. SD C. Interquartile Range D. Range
C. Interquartile range
True or False: Outliers cause complications in all statistical analyses
False: outliers do affect statistics such as the mean and standard deviation, but there are resistant measures of location and spread (e.g. the median and interquartile range)
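The outlier cards above can be checked with a minimal Python sketch. The two lists are made-up values (not from the course material), and `iqr` is a hypothetical helper built on the standard library's `statistics.quantiles`; other quartile conventions would give slightly different numbers.

```python
import statistics

def iqr(values):
    """Interquartile range from the quartile cut points (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Made-up ages; the second list swaps the largest value for an outlier.
clean = [19, 20, 21, 22, 23, 24, 25, 26]
with_outlier = [19, 20, 21, 22, 23, 24, 25, 90]

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(label,
          "| mean:", statistics.mean(data),
          "| sd:", round(statistics.stdev(data), 2),
          "| median:", statistics.median(data),
          "| iqr:", iqr(data))
```

In this example the mean and SD move when the outlier is introduced, while the median and interquartile range stay the same.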
True or false: since outliers cause complications in statistical analyses, they should be discarded before computing summaries such as the mean and the standard deviation
False: they should never be discarded without justification
What is a reasonable action if an outlier was a mistake made in measuring the object?
The value should be corrected if possible or discarded if not possible to correct it
What is a reasonable action if an outlier is the value for the only young subject in a sample where all the other values were older subjects?
The value should be discarded and the results summarised and reported for the older subjects only
Tallies and cross-tabulations are used to summarise which of these variable types? A. Quantitative. B. Mathematical C. Continuous D. Categorical
D. Categorical
Which of these variables is a categorical variable?
A. Number of ear pierces a person has
B. Height of a person
C. Weight of a person
D. Opinion about legalisation of marijuana
D.
Which of the following variables is not categorical? A. Age of a person B. Gender of a person C. Choice of test item: true or false D. Marital status of a person
A. Age
Which of the following is not a term used for a quantitative variable? A. Measurement variable B. Numerical variable C. Continuous variable D. Categorical variable
D. Categorical variable
A variable that is not the main concern of the study but may be partially responsible for the observed results is known as…
… the confounding variable
_______ of only a few thousand, or even a few hundred, can give reasonably accurate information about a population of many millions
A representative sample
All of the following are categorical variables except?
A. Gender of a student
B. Colour of a car entering the parking lot
C. Number of flowers on an azalea plant
D. The state in which a person lives
C.
Which of the following is not a continuous variable?
A. A person’s body temperature
B. Number of claims received by an insurance company during one day
C. Weight of two dozen shrimp
D. Height in inches of freshmen at a university
B.
An experiment was conducted to compare the mean lengths of time required for the body to absorb two drugs (A and B). Ten people were randomly selected and assigned to receive one of the drugs. The length of time in minutes for the drug to reach a specified level in the blood was recorded. What is the explanatory variable in this study?
The type of drug
What is the percent of data which lies between the minimum and the upper quartile?
75%
Which of the following would indicate that a dataset is not bell shaped?
A. The range is equal to 6 SD
B. The range is larger than the interquartile range
C. The mean is much smaller than the median
D. There are no outliers
C.
Exam scores in % range from 0 to 100. Suppose an exam was difficult and most of the students scored low with only a few students scoring high. Which would best describe the shape of the distribution?
A. Right - Skewed
B. Left - skewed
Right-skewed
If the bars of a diagram run from high (on the left) to low (on the right), in which direction is the distribution skewed?
Skewed to the right
When the results of an experiment can be applied to real-world conditions, that experiment is said to have... A. Factorial Validity B. Criterion Validity C. Ecological Validity D. Content Validity
C
A frequency distribution in which high scores are most frequent (i.e. bars on the graph are highest on the right hand side) is said to be... A. Negatively skewed B. Leptokurtic C. Positively skewed D. Platykurtic
A
Which of the following is designed to compensate for practice effects?
A. A repeated measures design
B. Counterbalancing
C. A control condition
D. Giving participants a break between tasks
B
Variation due to variables that have not been measured is known as: A. Unsystematic variation B. Homogeneous variance C. Systematic variation D. Residual variance
A
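What is the standard deviation?
A. The variance squared
B. The degree to which scores cluster at the ends of the distribution
C. A measure of the relationship between two variables
D. The square root of the variance: an estimate of the average spread of scores around the mean
D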
What does a low SD indicate?
the values tend to be close to the mean of the set
What does a high SD indicate?
the values are spread out over a wider range
If a test is valid, what does it mean?
A. The test has internal consistency
B. The test will give consistent results
C. The test measured a useful construct or variable
D. The test measures what it claims to measure
D
A variable that measures the effect that manipulating another variable has is known as: A. Independent variable B. Dependent variable C. Confounding variable D. Predictor variable
B
If the scores on a test have a mean of 26 and an SD of 4, what is the z-score for a score of 18? A. 2 B. 11 C. -2 D. -1.14
C. -2
(18 - 26) / 4 = -2
How do you calculate the z-score?
(score-mean)/(SD)
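The formula above can be written as a small Python function. This is only a sketch; the function name `z_score` is an illustrative choice, and the numbers are the ones used in the cards (mean 26, SD 4, score 18).

```python
def z_score(score, mean, sd):
    """Number of standard deviations that `score` lies above (+) or below (-) the mean."""
    return (score - mean) / sd

print(z_score(18, 26, 4))  # the test-score card: (18 - 26) / 4
```

A negative result means the score is below the mean; a positive result means it is above.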
A frequency distribution in which there are too many scores at the extremes of the distribution is said to be: A. Negatively skewed B. Leptokurtic (steep) C. Positively skewed D. Platykurtic (flat)
Platykurtic (flat)
A café owner wanted to compare how much revenue he gained from lattes across different months of the year. What type of variable is 'month'? A. categorical B. dependent C. interval D. continuous
A
Complete the following sentence: A large standard deviation (relative to the value of the mean itself)…
…indicates that the data points are distant from the mean (i.e. the mean is a poor fit of the data).
Complete the following sentence: A small standard deviation (relative to the value of the mean itself)…
…indicates that data points are close to the mean (i.e. the mean is a good fit of the data).
A frequency distribution in which low scores are most frequent (i.e. bars on the graph are highest on the left hand side) is said to be:
positively (right) skewed
A frequency distribution in which there are too many scores at the extremes of the distribution is said to be:
Platykurtic (flat)
At Donald’s Donuts the number of donut holes in a bag can vary. Help Donald find the mode.
12,10,10,10,13,12,11,13,10
10
Roger bowled 7 games last weekend. His scores are: 155, 165, 138, 172, 127, 193, 142. What is the range of Roger’s scores?
66
193-127
Find the mean of the following cell phone usage per month: 445, 516, 618, 575, 288
488.4
Find the median from the list of numbers:
60, 58, 52, 48, 60, 67
59
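The four worked cards above (mode, range, mean, median) can be reproduced with Python's standard `statistics` module. The variable names are illustrative; the data are copied from the cards.

```python
import statistics

donut_holes = [12, 10, 10, 10, 13, 12, 11, 13, 10]
bowling_scores = [155, 165, 138, 172, 127, 193, 142]
phone_usage = [445, 516, 618, 575, 288]
numbers = [60, 58, 52, 48, 60, 67]

mode_value = statistics.mode(donut_holes)                  # most frequent value
range_value = max(bowling_scores) - min(bowling_scores)    # highest minus lowest
mean_value = statistics.mean(phone_usage)                  # sum divided by count
median_value = statistics.median(numbers)                  # middle of the sorted values

print(mode_value, range_value, mean_value, median_value)   # 10 66 488.4 59.0
```

With an even number of values, as in the last list, the median is the average of the two middle values (58 and 60).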
"Students' scores on a biology test" is an example of which scale of measurement? A. Ratio B. interval C. ordinal D. Nominal
A. ratio
Calendar year is an example of what scale of measurement?
interval
“Amount of calories in a small Al Marai Yogurt” is which scale of measurement?
Ratio
“Shades of lipstick available at a MAC store” is which scale of measurement?
Nominal
Your age is an example of which scale of measurement?
Ratio
ZIP code is an example of which scale of measurement?
nominal
Arranging the shirt sizes as small, medium and large is an example of which scale of measurement?
ordinal
Pain scale in a doctor’s office is an example of which scale of measurement?
ordinal
Blood type is an example of which scale of measurement?
nominal
A(n) _________ is a person or object that is a member of the population being studied.
Individual
The entire group of individuals to be studied is called the population. An individual is a person or object that is a member of the population being studied.
A(n) ______ is a numerical summary of a sample.
A statistic is a numerical summary of a sample.
A(n) ______ is a numerical summary of a population.
A parameter is a numerical summary of a population.
_________ are the characteristics of the individuals of the population being studied.
variables
What is the difference between a parameter and a statistic?
A parameter is a numerical description of a population characteristic.
A statistic is a numerical description of a sample characteristic.
Determine whether the variable is qualitative or quantitative.
Color of a car driven
The variable is qualitative because it is an attribute characteristic.
Determine whether the quantitative variable is discrete or continuous.
Frequency of a guitar note
The variable is continuous because it is measured rather than counted.
What is the difference between a discrete and a continuous variable?
A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values.
The term “countable” means that the values result from counting, such as 0, 1, 2, 3, and so on.
A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable.
Determine whether the quantitative variable is discrete or continuous.
Points scored in a college basketball game
The variable is discrete because it is countable.
What is the advantage of using SPSS over calculating statistics by hand?
a) This is how most quantitative data analysis is done in “real research” nowadays
b) It reduces the chance of making errors in your calculations
c) It equips you with a useful transferable skill
ALL OF THE ABOVE
In SPSS, what is the “Data Viewer”?
A spreadsheet into which data can be entered
How is a variable name different from a variable label?
It is shorter and less detailed
What does the operation “Recode Into Different Variables” do to the data?
Redistributes a range of values into a new set of categories and creates a new variable
How would you use the drop-down menus in SPSS to generate a frequency table?
Click on: Analyze; Descriptive Statistics; Frequencies
Why might you tell SPSS to represent the “slices” of a pie chart in different patterns?
If you do not have a colour printer, it makes the differences between the slices clearer
In which sub-dialog box can the Chi Square test be found?
Crosstabs: Statistics
To generate a Spearman’s rho test, which set of instructions should you give SPSS?
Analyze; Correlate; Bivariate; [select variables]; Spearman; OK
Determining a Raw Score (X) from a z-Score
X = μ + zσ
The value of zσ is the deviation of X and determines both the direction and the size of the distance from the mean.
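As a sketch of the formula X = μ + zσ above, the function below converts a z-score back into a raw score. The name `raw_score` is an illustrative choice; the example values reuse figures that appear elsewhere in these cards.

```python
def raw_score(z, mu, sigma):
    """Raw score X for a given z-score: start at the mean and move z standard deviations."""
    deviation = z * sigma      # signed distance from the mean
    return mu + deviation

print(raw_score(-2, 26, 4))    # test-score card: mean 26, SD 4, z = -2
print(raw_score(2, 100, 10))   # IQ example: mean 100, SD 10, z = 2
```

The sign of `deviation` gives the direction (above or below the mean) and its size gives the distance.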
What does the normal distribution (Gaussian distribution) look like?
A symmetrical, bell-shape that describes the distribution of many types of data; most scores fall near the mean (68 percent fall within one standard deviation of it) and fewer and fewer near the extremes.
How do you calculate variance?
The average of the squared difference from the mean.
What is Central tendency?
Average of a set of data (mean, median and mode).
The grades on a math midterm are normally distributed with mean of 67% and a standard deviation of 2.5. Greg scored a 70%. What is his z-score?
1.2
Richard grows prize winning pumpkins. He grows a pumpkin which weighs 450 pounds and enters it into a contest. The average weight of pumpkins in the contest is 320 pounds with a standard deviation of 75 pounds. What percentage of pumpkins weigh more than Richard’s pumpkin?
The z-score is 1.73, so (assuming the weights are normally distributed) the percentage of pumpkins weighing less than Richard’s is 95.82%. This means that only 4.18% of pumpkins weighed more than his.
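The pumpkin answer can be reproduced with `statistics.NormalDist` from the Python standard library. This sketch assumes the weights are normally distributed and, like a printed z-table, rounds z to two decimal places before looking up the area; without that rounding the percentage comes out slightly lower.

```python
from statistics import NormalDist

mean_weight, sd_weight, richards_pumpkin = 320, 75, 450

# z-score, rounded to two decimals as when using a printed z-table.
z = round((richards_pumpkin - mean_weight) / sd_weight, 2)

share_below = NormalDist().cdf(z)   # area under the standard normal curve to the left of z
share_above = 1 - share_below       # area to the right of z

print(z, round(share_below * 100, 2), round(share_above * 100, 2))
```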
What is a measure of dispersion used to describe?
The spread of data items in a data set.
What are the two most common measures of dispersion?
range and standard deviation.
What are the 2 basic approaches to research design?
Comparative design
Correlational design
What is comparative design?
look for differences between different groups
What is correlational design?
look for relationships between variables in a single group of cases
What does correlational design investigate?
It investigates whether there is a correlation (i.e. a statistical relationship) between the chosen variables in a single group of cases
What is a case?
A case is simply the source of the data; in psychology, this is usually an individual person
What are the 2 main types of data?
Numerical and nominal
What is the difference between numerical and nominal data?
Numerical variables are those in which a case is assigned a numerical value. Numerical variables are also referred to as score or quantitative variables
Nominal variables are those in which a case falls into one of two or more categories. Nominal variables are also referred to as categorical or qualitative variables
What are the 4 scales of measurement?
Nominal
Ordinal
Interval
Ratio
What is nominal scale of measurement?
Placing cases into named categories (e.g. sprinters could be categorised based on their nationality)
What is ordinal scale of measurement?
This ranks cases based on their order on a given variable (i.e. sprinters can be ranked 1st, 2nd, 3rd etc.)
What is interval scale of measurement?
Where the distances between the sequential points on the scale are equal (e.g. the temperature at the time of the race)
What is ratio scale of measurement?
The same as interval categorisation, but with an absolute zero (e.g. the sprinters’ best times)
Statistical techniques perform three key functions: What are they?
Descriptive
Inferential
Data reduction
What is descriptive statistical technique used to describe?
used to describe the information collected
What is an inferential statistical technique?
relates to the confidence with which we can generalise from our sample to the population of interest
What is data reduction statistical technique?
allow a researcher to make sense of large amounts of data through using more advanced statistics
What are descriptive statistics?
Descriptive statistics are visual and numerical techniques for presenting the major features of one’s raw data
What is raw data?
Raw data are the actual measures taken from the sample (e.g. gender = female, age = 19)
What are score variables?
Score variables are those in which cases are given a numerical value (e.g. age, income)
What are nominal variables?
Nominal variables are those where cases are placed into named groups (e.g. gender, eye colour)
What is the best way to present data that are in the form of scores?
Frequency charts and histograms
What is valid percent?
…is a percentage that does not include missing cases
How do you calculate the cumulative percent?
divide cumulative frequency by the total number of observations then multiply by 100
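A minimal Python sketch of the cumulative percent calculation described above. The frequency table is hypothetical (invented for illustration), and the variable names are arbitrary.

```python
from itertools import accumulate

# Hypothetical frequency table: score -> number of cases with that score.
frequencies = {1: 4, 2: 6, 3: 5, 4: 3, 5: 2}

total = sum(frequencies.values())                    # total number of observations
cumulative = list(accumulate(frequencies.values()))  # running total of frequencies
cumulative_percent = [100 * c / total for c in cumulative]

for score, cum, pct in zip(frequencies, cumulative, cumulative_percent):
    print(score, cum, pct)
```

The last cumulative percent is always 100, because by then every observation has been counted.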
Steps to creating a frequency distribution table and histogram on SPSS?
- Click on Analyze -> Descriptive Statistics -> Frequencies
- Move the variable of interest into the right-hand column
- Click on the Chart button, select Histograms, and then press the Continue button
- Click OK to generate a frequency distribution table
To save space, we also use numerical values to describe our sample. What are the three categories these fall into?
– Central tendency (i.e. mean, median and mode)
– Spread (i.e. range and inter quartile range)
– Variability (i.e. variance)
Measures of central tendency (mean, median, mode) use a…
… single value to describe the data set.
What are the three measures of central tendency?
mean, median, and mode
What is interquartile range?
Considers the middle 50% of the cases.
How to find the interquartile range?
- You remove the lowest and highest 25% of the sample; in our case this is 17, 18, 35 and 41.
- This leaves 19, 21, 22 and 25, and hence the inter-quartile range runs from 19 to 25: 25 - 19 = 6 years.
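The trimming method in the card above can be sketched in Python using the eight ages it mentions. The helper `middle_half` is hypothetical and assumes the sample size divides evenly by four; statistical packages often use interpolated quartiles, which can give a slightly different value.

```python
ages = [17, 18, 19, 21, 22, 25, 35, 41]   # the eight ages from the card

def middle_half(values):
    """Drop the lowest 25% and highest 25% of the sorted values (hypothetical helper)."""
    ordered = sorted(values)
    cut = len(ordered) // 4                # number of cases to drop at each end
    return ordered[cut:len(ordered) - cut]

middle = middle_half(ages)
interquartile_range = max(middle) - min(middle)

print(middle, interquartile_range)        # [19, 21, 22, 25] 6
```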
What is variance?
- focuses on the average amount that each case in the sample differs from the mean.
- It is an indication of the variability of your data.
- The variance is not a stand-alone statistic. It is typically used in order to calculate other statistics, such as the standard deviation. The higher the variance, the more spread out your data are.
How to calculate variance?
You can work out the variance by calculating:
– The mean
– How much each case differs from this mean
– Squaring each of these deviations
– Summing these squared deviations
– Dividing the result by the number of cases
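The steps above can be followed one at a time in Python. The scores are hypothetical, chosen so the arithmetic is easy to check by hand. Dividing by the number of cases, as the card describes, matches `statistics.pvariance`.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical scores

mean = sum(scores) / len(scores)               # the mean
deviations = [x - mean for x in scores]        # how much each case differs from the mean
squared = [d ** 2 for d in deviations]         # squaring each deviation
sum_of_squares = sum(squared)                  # summing the squared deviations
variance = sum_of_squares / len(scores)        # dividing by the number of cases

print(mean, sum_of_squares, variance)          # 5.0 32.0 4.0
print(statistics.pvariance(scores))            # same result from the standard library
```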
What are the three categories of central tendency?
Arithmetic Mean
Median
Mode
What are the 2 categories of spread?
Range
Interquartile range
What is skewness?
…considers whether the data is mostly to the left, right or central
– It is the degree of distortion from the symmetrical bell curve in a probability distribution.
What is kurtosis?
…considers whether the distribution is particularly flat or steep
Is a negative skew left or right?
Left: the long, low tail is on the left
Is a positive skew left or right?
Right: the long, low tail is on the right
What is a negative (left) skew?
- More scores are to the right of the mode than to the left.
* The mean and median are smaller than the mode
What is a positive (right) skew?
- More scores are to the left of the mode than to the right
* The mean and median are bigger than the mode
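The ordering of mean, median and mode described in the two skew cards above can be checked on small made-up data sets. This is only an illustration on invented numbers: the pattern is a rule of thumb, not a guarantee for every skewed data set.

```python
import statistics

def centre(values):
    """Return (mean, median, mode) for a list of scores."""
    return statistics.mean(values), statistics.median(values), statistics.mode(values)

right_skewed = [1, 1, 1, 2, 2, 3, 4, 8]          # made-up scores with a long tail to the right
left_skewed = [9 - x for x in right_skewed]      # mirror image: long tail to the left

r_mean, r_median, r_mode = centre(right_skewed)
l_mean, l_median, l_mode = centre(left_skewed)

print("right skew:", r_mean, r_median, r_mode)   # 2.75 2.0 1
print("left skew: ", l_mean, l_median, l_mode)   # 6.25 7.0 8
```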
SPSS includes skewness in:
Descriptive Stats, Frequencies, Explore, Statistics
A positive value of kurtosis:
the curve is steep compared to the normal curve
A zero value of kurtosis:
the curve is middling – just like the normal curve
A negative value of kurtosis:
the curve is flatter compared to the normal curve
SPSS includes kurtosis in:
Descriptive Stats, Frequencies, Explore, Statistics
In kurtosis, value of 0 means?
no kurtosis
In kurtosis, negative value means?
flat curve
In kurtosis, positive value means?
steep curve
What is bimodal?
A frequency distribution with two peaks (a distribution with more than two peaks is called multimodal)
A simple frequency distribution indicates…
…the number of people who achieved any particular score
A cumulative frequency distribution gives …
the number scoring, say, one, two or less, three or less, four or less, and five or less. In other words the frequencies accumulate.
What are percentiles?
- Merely a form of cumulative frequency distribution but the categorisation is in terms of whole numbers of percentages of people
- So it is the score which a given percentage of scores equals or is less than
The 50th percentile corresponds to?
the median score
What are the quartiles?
Quartiles are the 25th percentile, the 50th and the 75th percentile
What are quartiles used in?
They are commonly used in standardisation tables of psychological tests and measures
What is SD computationally?
the square root of the variance.
What is Standard deviation Conceptually?
a distance along a frequency distribution of scores.
What is standard deviation?
The average amount that the scores on a variable deviate (or differ) from the mean of the set of scores. It is the square root of the variance
As the SD gets larger, the distribution gets … which can make distributions look …
fatter… flat when in fact they are not
The variance and SD tell us about the shape of the distribution of scores. As such, they are a measure of …
dispersion
If the mean represents the data well, then most scores will _____ to the mean and the resulting SD is small relative to the mean.
cluster close
When the mean is a worse representation of the data, the scores cluster more _____ around the mean.
widely
The SD and estimated SD are slightly different. Why?
The estimated SD is used when you are generalizing from a sample to the population from which the sample was taken.
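The difference can be seen numerically with Python's `statistics` module. This sketch uses hypothetical scores and relies on the usual convention, not spelled out in the card, that the estimated SD divides by n - 1 rather than n.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical sample of scores

sd = statistics.pstdev(scores)                 # divides by n: describes these scores only
estimated_sd = statistics.stdev(scores)        # divides by n - 1: estimate for the population

print(sd, round(estimated_sd, 3))
```

The estimated SD is always a little larger, and the gap shrinks as the sample gets bigger.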
How do you calculate SD on SPSS?
• Click Analyze -> Descriptive Statistics -> Descriptives
• Drag the variable of interest from the left into the Variables box on the right
• Click Options, and select Standard Deviation
• Press Continue, and then press OK
• Result will appear in the SPSS output viewer
What are the three most important features of a z-score?
- The mean of the distribution (e.g. 100)
- The SD of the distribution (e.g. 10)
• For an IQ of 120, the z-score is (120 - 100) / 10 = 20 / 10 = 2
- That the distribution is more or less bell-shaped (or normal)
If a score is interval or ratio in nature, SD and z-scores are ______?
appropriate
Difference between correlation and regression?
Correlation considers how closely the data points fall to the line of best fit
Regression describes the characteristics of the straight line
What does regression allow researchers to do?
to make predictions
In regression what should the horizontal (x) axis represent and what should the vertical (y) axis represent?
X = the variable from which the prediction is being made
Y = what is being predicted
In regression, is the independent variable presented on the x or y axis?
x axis (horizontal)
In regression, is the dependent variable presented on the x or y axis?
y axis (vertical)
What is the manual dexterity test score used to predict?
the number of units produced per hour
How is the prediction made from a manual dexterity test score?
– A vertical line is drawn from the manual dexterity score until it meets the best-fit line
– A horizontal line is then drawn from that point until it meets the vertical axis, which gives the predicted score
One major problem with drawing lines to the regression line: it’s a _____?
subjective factor
What is the regression line?
the closest fit to the points on the scattergram.
In order to specify the regression line for any scattergram, you quantify 2 things: What are they?
– The intercept (a) or constant: the point at which the regression line cuts the vertical axis (i.e. where X = 0). This is a number of units of measurement from the zero point of the vertical axis.
– The slope (b) of the regression line or the gradient of the best-fitting line through the points on the scattergram. This slope may be positive or negative.
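As a sketch of how the intercept (a) and slope (b) can be computed rather than drawn by eye, the function below uses the ordinary least-squares formulas. The cards do not state which fitting method the course uses, so treat least squares as an assumption; the data and the names `fit_line` and `predict` are also hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                  # slope: change in y per unit change in x
    a = mean_y - b * mean_x        # intercept: predicted y when x = 0
    return a, b

def predict(x, a, b):
    """Predicted y for a given x, read off the fitted line."""
    return a + b * x

# Hypothetical data: x = test score, y = units produced per hour.
test_scores = [10, 20, 30, 40, 50]
units_per_hour = [7, 12, 17, 22, 27]

a, b = fit_line(test_scores, units_per_hour)
print(a, b, predict(35, a, b))
```

Calculating the line this way removes the subjective element of drawing lines on the scattergram by hand.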
When both variables are numerical what type of table and graph should be used?
Cross-tabulation or contingency tables
Scatterplot
Overlapping points marked with lines around the point on the scattergram are called?
sunflowers
When both variables are nominal what type of table and graph should be used?
Cross-tabulation or contingency tables
Compound Bar Chart & Stacked Bar Chart
If you have more than a few nominal categories in a cross-tabulation table, the tables or diagram can be too big and cumbersome. True or false?
True
When both variables are nominal and a Cross-tabulation or contingency table is used, percentages and frequencies should be used. True or false?
True
When one variable is nominal and the other is numerical, what type of table and graph should be used?
Cross-tabulation or Contingency Table
Compound histogram
What type of graph can you use to get a pictorial view of the relationship between two variables?
Scatter plot
In a scatter plot, if there is a negative relationship between your variables then there will be a general trend from…?
from the top left to the bottom right
In a scatter plot, if there is a positive relationship between your variables then there will be a general trend from…?
from the bottom left to the top right
What does the correlation coefficient indicate?
The amount of variance shared by two variables
A correlation of ___ shows that all of the variance can be explained
1.0
A correlation of ___ shows that none of the variance can be explained
0
When the correlation is closer to 1 (or -1) the ____ the relationship; the closer to 0 the ____ the relationship
stronger
weaker
How can one calculate the ‘proportion of variance explained’?
by squaring the correlation coefficient and multiplying by 100
The squared correlation coefficient is also known as the?
coefficient of determination
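The squaring step described above is simple enough to show directly. The function name `variance_explained` is an illustrative choice, and the correlation values are arbitrary examples.

```python
def variance_explained(r):
    """Percentage of variance shared by two variables, given their correlation r."""
    return r ** 2 * 100

for r in (1.0, 0.5, 0.0, -0.5):
    print(r, variance_explained(r))
```

Because r is squared, a correlation of -0.5 explains the same percentage of variance as a correlation of 0.5.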
What are the two types of correlation?
Pearson Product Moment Correlation Coefficient
Spearman Rho Correlation Coefficient
What is Pearson Product Moment Correlation Coefficient?
This can be used to explore the linear relationship between two variables
What is Spearman Rho Correlation Coefficient?
This is similar to the PPMCC but uses ranked scores rather than raw data
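The relationship between the two coefficients can be sketched in Python: Spearman's rho is computed here as a Pearson correlation on the ranked scores. The helper names and the data are hypothetical, and tied scores are given the average of their ranks, which is a common convention rather than something stated in the cards.

```python
import math

def pearson(xs, ys):
    """Pearson product moment correlation between two lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranked scores."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]

print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Here the ranks agree perfectly, so Spearman's rho is 1, while Pearson's coefficient is lower because the relationship is not linear.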
What is a main assumption underlying Pearson correlation?
the variables are normally distributed and correspond to the bell-shaped frequency curve
If the main assumptions underlying Pearson correlation are not met, then the outcome becomes?
More and more inaccurate.
The Spearman rho correlation makes an assumption about the normality of variables. True or false?
False. It doesn’t
Definition of non-parametric?
statistical techniques which do not assume that each variable is normally distributed
Use Pearson correlation coefficient unless?
there is a good reason not to
The correlation coefficient is sometimes used as an indicator of the validity of a psychological test. True or false?
True
– It might be used to indicate the relationship between a test of intelligence and children’s performance in school.
– The test is a valid predictor of school performance if there is a substantial correlation between the test score and school performance.
The correlation coefficient is not a useful indicator of the reliability of a psychological test. True or false?
False
For example, reliability includes the extent to which people’s scores on the test are consistent over time. You can use the correlation coefficient to indicate whether those who perform well on the test now also performed well a year ago (test-retest reliability)
Which of the following is not categorical?
A. Age
B. Gender
C. Marital status
A. Age
Which of these variables is a nominal variable? A. Age B. Weight C. Gender D. Intelligence
C. Gender
What does the operation “Recode Into Different Variables” do to the data?
A. Replaces missing data with some random scores
B. Represents the data in the form of a pie chart
C. transforms value ranges into a new set of categories & creates new variable
D. Reverses the position of the independent and dependent variable on a graph
C. transforms value ranges into a new set of categories & creates new variable
How would you use the drop-down menus in SPSS to generate a frequency table?
A. Open the Output Viewer and click: Save As; Pie Chart
B. Click on: Graphs; Frequencies; Pearson
C. Click on: Analyze; Descriptive Statistics; Frequencies
D. Open the Variable Viewer and recode the value labels
C. Click on: Analyze; Descriptive Statistics; Frequencies
In order to work out the z-score for a particular score (X) on a variable, we need to know: A. the mean and median B. variance and standard deviation C. the standard deviation and the median D. the mean and standard deviation
D. the mean and standard deviation
A big advantage using z-scores is:
A. variables measured using different units of measurement can be compared.
B. They are the same as standard deviations
C. They are pretty and interesting
D. They always show a normal distribution.
A. variables measured using different units of measurement can be compared.
A data set has a mean of 290 and a standard deviation of 25. Calculate the z-score for X=265. A. -1 B. 1 C. -1.25 D. 1.25
-1
What does it mean to have a z-score of z=0 on a quiz?
A. you scored the lowest grade
B. doesn’t mean anything
C. you scored the same grade as the average grade.
D. you scored the highest grade
C. you scored the same grade as the average grade.
The sign of the z-score indicates whether the location is above(positive) or below(negative) the mean. True or False?
True
The distribution of z-scores will always have a standard deviation of 1. True or false
True
The mean of the z-scores will always be zero, even if the mean of the raw scores is 100. True or False
True
Transforming raw scores to z-scores will change the shape of the distribution. True or False
False
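The last few true/false cards can be checked with a short Python sketch. The raw scores are hypothetical (chosen to have a mean of 100), and the SD used is the one that divides by the number of cases.

```python
import statistics

raw = [85, 90, 100, 110, 115]                  # hypothetical raw scores, mean 100

mean = statistics.mean(raw)
sd = statistics.pstdev(raw)                    # SD of these scores (divides by n)
z_scores = [(x - mean) / sd for x in raw]      # (score - mean) / SD for every score

z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)

print(round(z_mean, 10), round(z_sd, 10))
print([round(z, 2) for z in z_scores])
```

Every score is shifted and rescaled in the same way, so the order and relative spacing of the scores stay the same; only the mean (now 0) and the SD (now 1) change.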
In SPSS: how is a variable name different from a variable label?
A. It is abstract and unspecific
B. It is longer and more detailed
C. It refers to codes rather than variables
D. It is shorter and less detailed
D. It is shorter and less detailed
The inter-quartile range is:
A. the range of the central 50% of the sample
B. the difference between the lowest and the highest value
C. the most frequent scores
D. a symmetrical ‘bell-shaped’ distribution
A. the range of the central 50% of the sample
A steep curve has a... A. positive kurtosis value B. negative kurtosis value C. zero kurtosis value D. none of these answers
A. positive kurtosis value
Standard Deviation is... A. the same value as the variance B. 1 value lower than the variance C. the squared variance D. the square root of the variance
D. the square root of the variance
What diagram will you build If you have 2 score variables? A. bar chart B. Scatter plot C. histogram D. Compound histogram
B. scatter plot
This cross-tabulation / contingency table has:
A. 1 nominal variable and 1 score variable
B. 2 score variables
C. 2 nominal variables
D. none of these
C. 2 nominal variables
Overlapping points marked with lines around the point on the scattergram/scatterplot are called: A. daisies B. tic-tac-toe C. Sunflowers D. Dandelions
C. sunflowers
Diagrams: For 2 score variables you use _____; 2 nominal variables _______ and 1 score and 1 nominal _______
A. compound/stacked bar chart, compound histogram, scatter plot.
B. scatter plot, compound/stacked bar chart, compound histogram
C. scatter plot, compound histogram, compound/stacked bar chart.
D. scatterplot, compound bar chart, compound bar chart.
B. scatter plot, compound/stacked bar chart, compound histogram