Study unit 8: Describing and interpreting quantitative data Flashcards
define descriptive statistics
descriptive statistics: mathematical techniques used to see underlying patterns of data
what is a frequency distribution
A frequency distribution indicates the number of cases in a data set that obtained a particular score or that fall in a particular category of a variable. A frequency distribution is therefore a grouping of raw data.
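A minimal sketch of building a frequency distribution from raw scores, using Python's standard library. The scores are made up for illustration and are not taken from the study unit.

```python
from collections import Counter

# Hypothetical raw scores, made up for illustration.
scores = [3, 5, 3, 4, 5, 5, 2, 4, 5, 3]

# Count how many cases obtained each score value.
frequency = Counter(scores)

# Print the distribution in ascending order of score.
for score in sorted(frequency):
    print(score, frequency[score])
```

The frequencies always add up to the number of cases in the sample.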
what is a grouped frequency table
grouped frequency table: frequency distribution table with a limited number of categories
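One possible way to group raw scores into class intervals is sketched below. The marks and the interval width of 20 are assumptions chosen for illustration only.

```python
from collections import Counter

# Hypothetical test marks, made up for illustration.
marks = [12, 47, 33, 58, 21, 39, 45, 50, 18, 36]
width = 20  # assumed class-interval width

# Map each mark to the lower limit of its class interval (0, 20, 40, ...).
grouped = Counter((mark // width) * width for mark in marks)

# Print each class interval with its frequency.
for lower in sorted(grouped):
    print(f"{lower}-{lower + width - 1}: {grouped[lower]}")
```

Ten different marks are reduced to three categories, which is the point of a grouped table.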
what is cumulative frequency
The cumulative frequency (cf) of a class interval is the number of cases in the specified interval plus all the cases in the previous intervals. In other words, the cumulative frequency (cf) of a class interval is the number of cases that fall below the lower limit of the next interval
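The definition above can be sketched as a running total. The class intervals and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical class intervals and their frequencies.
intervals = ["0-19", "20-39", "40-59"]
frequencies = [2, 4, 4]

# Each cf is the interval's own frequency plus all previous frequencies.
cumulative = list(accumulate(frequencies))

for interval, f, cf in zip(intervals, frequencies, cumulative):
    print(interval, f, cf)
```

The cumulative frequency of the last interval equals the total number of cases.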
explain percentages
The percentage of a category, a score value or a class interval indicates what part of the whole sample of scores that particular category, value or class interval represents. Percentage is determined by dividing the frequency by the total number of cases (n) and then multiplying it by 100 (100% represents the whole sample).
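A short sketch of the percentage calculation described above (frequency divided by n, times 100). The categories and counts are invented for the example.

```python
# Hypothetical category frequencies, made up for illustration.
frequencies = {"agree": 12, "neutral": 5, "disagree": 8}

# n is the total number of cases in the sample.
n = sum(frequencies.values())

# Percentage = frequency / n * 100 for each category.
percentages = {category: f / n * 100 for category, f in frequencies.items()}

for category, pct in percentages.items():
    print(f"{category}: {pct:.1f}%")
```

The percentages of all categories together add up to 100, the whole sample.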
what is one advantage of graphs
An important advantage of graphs is that they make it easier to obtain an overall impression of the data: a graph gives you a “picture” of a set of scores.
what is a bar chart
bar chart: graph representing the frequency distribution of categorical data
what is a histogram
histogram: graph representing the frequency distribution of successive scores or class intervals
what is the difference between a bar chart and a histogram
Histograms are used to illustrate the frequency distribution of numerical data (data measured on an interval or ratio level of measurement). A bar chart reflects discrete data, whereas a histogram is used for continuous data.
what is a frequency polygon
frequency polygon: graph in which the frequencies of class intervals are connected by straight lines
how do distributions of data differ
The distributions of data differ in terms of central location (the middle point of the distribution) and variation (the spread of the scores around the middle point).
Distributions also differ in skewness, that is, the symmetry or asymmetry of the distribution. A distribution can be symmetrical — that is, it can have the same shape on both sides of the middle point. If a distribution is asymmetrical and the larger frequencies are concentrated towards the low end, it is said to be positively skewed. If the larger frequencies are concentrated toward the high end of the variable, the distribution is negatively skewed
The kurtosis of distributions refers to the flatness or peakedness of the distribution. A symmetrical bell-shaped distribution is known as a normal distribution. In terms of kurtosis this distribution is mesokurtic. A more peaked distribution is called leptokurtic, while a flatter distribution is platykurtic.
what is a measure of central tendency
A score or value which represents all the scores in the sample is called a measure of central tendency
define mode
score in a sample of scores that occurs with the greatest frequency
If two or more successive scores in a sample all have the highest frequency, the average (mean) of those scores is taken as the mode of the distribution.
If two values that do not follow on each other both have the highest frequency, the sample has two modes. Such a distribution is called bimodal (compared to a unimodal distribution with a single mode).
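As an illustration of unimodal and bimodal samples, the sketch below uses `statistics.multimode` from Python's standard library (Python 3.8 or later). The samples are hypothetical. Note that `multimode` simply lists every most-frequent value; it does not apply the rule of averaging successive scores that share the highest frequency.

```python
from statistics import multimode

# Hypothetical samples, made up for illustration.
unimodal = [2, 3, 3, 3, 4, 5]
bimodal = [1, 1, 1, 2, 3, 4, 4, 4]

# multimode returns a list of all values with the highest frequency.
modes_unimodal = multimode(unimodal)
modes_bimodal = multimode(bimodal)

print(modes_unimodal)  # one mode
print(modes_bimodal)   # two modes that do not follow on each other
```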
discuss median
value or score such that half the observations fall above it and half below it
If the number of scores is an odd number, the median is simply the score in the middle of the list. When the number of scores is an even number, the middle of the list falls between two values and the median is the average of these two scores
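The odd and even cases can be checked with `statistics.median`. The two samples below are made up for illustration.

```python
from statistics import median

# Hypothetical samples: one with an odd and one with an even number of scores.
odd_sample = [7, 3, 9, 5, 1]   # sorted: 1, 3, 5, 7, 9
even_sample = [7, 3, 9, 5]     # sorted: 3, 5, 7, 9

# Odd n: the middle score of the sorted list.
median_odd = median(odd_sample)

# Even n: the average of the two middle scores (5 and 7).
median_even = median(even_sample)

print(median_odd, median_even)
```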
explain what is mean
mean: sum of a sample of scores divided by the number of scores in the sample
The n measurements in a sample of scores are represented by the symbols $x_1, x_2, x_3, \dots, x_n$. The formula for the mean is
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
and this can also be written as
$$\bar{x} = \frac{\sum x}{n}$$
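A worked sketch of the mean formula, using a hypothetical sample of five scores.

```python
# Hypothetical sample of scores, made up for illustration.
scores = [4, 8, 6, 5, 7]

n = len(scores)       # number of scores in the sample
total = sum(scores)   # sum of x: 4 + 8 + 6 + 5 + 7 = 30
mean = total / n      # 30 / 5

print(mean)
```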
what is score variability
The degree to which scores in a sample differ, that is, how spread out they are, is called the variability of the scores.
The simplest measure of variability is the range. In any sample of scores the range is taken as the difference between the highest and lowest scores. The range is a measure of variability of scores in a sample, because it indicates the range of the distribution of scores from the lowest to the highest.
A disadvantage of the range of the distribution as a measure of variability is that it is calculated by using only two of the scores in the sample of scores; the other scores are ignored
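The sketch below computes the range for two hypothetical samples and illustrates the disadvantage noted above: both samples have the same range even though their middle scores are spread differently.

```python
# Two hypothetical samples with the same lowest and highest scores.
clustered = [1, 5, 5, 5, 9]   # middle scores bunched together
spread = [1, 2, 5, 8, 9]      # middle scores more spread out

# Range = highest score minus lowest score.
range_clustered = max(clustered) - min(clustered)
range_spread = max(spread) - min(spread)

print(range_clustered, range_spread)
```

Only the two extreme scores enter the calculation, so the range cannot tell these two samples apart.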
what is variance
variance: measure of variability based on the deviation of each score in a distribution from the mean of that distribution
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
In this formula $s^2$ is the variance, $\sum$ means to sum, $x$ is each raw score, $\bar{x}$ is the mean, and $n$ is the sample size.
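The variance formula can be followed step by step in code. The sample is hypothetical, and `statistics.variance` (which also divides by n - 1) is included only as a cross-check.

```python
from statistics import variance

# Hypothetical sample of scores, made up for illustration.
scores = [4, 8, 6, 5, 7]
n = len(scores)
mean = sum(scores) / n  # 6.0

# Squared deviation of each raw score from the mean: 4, 4, 0, 1, 1.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Sum of squared deviations divided by n - 1.
s_squared = sum(squared_deviations) / (n - 1)

print(s_squared, variance(scores))
```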
what is standard deviation
standard deviation: index of variability that is expressed in the same units as the original measures
$$s = \sqrt{s^2}$$
Both the variance and the standard deviation of a sample of scores indicate the average extent to which scores in a distribution differ from one another. Because the standard deviation is expressed in the same units as the original measure, it is usually easier to interpret than the variance.
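A brief sketch of how the standard deviation follows from the variance, using a hypothetical sample. `statistics.stdev` is the sample standard deviation in Python's standard library and is used here as a cross-check.

```python
import math
from statistics import stdev

# Hypothetical sample of scores, made up for illustration.
scores = [4, 8, 6, 5, 7]
n = len(scores)
mean = sum(scores) / n

# Sample variance: sum of squared deviations divided by n - 1.
s_squared = sum((x - mean) ** 2 for x in scores) / (n - 1)

# Standard deviation: square root of the variance, in the original units.
s = math.sqrt(s_squared)

print(round(s, 3), round(stdev(scores), 3))
```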
comment on the relationship between variables
A direct or positive relationship means that relatively high scores on one variable are associated with relatively high scores on the other and relatively low scores on the first correspond with relatively low scores on the second. An inverse or negative relationship means that high scores on one variable correspond with low scores on the other variable. If the variables are not related, changes on the one variable do not correspond with changes on the other.
what is a correlation coefficient
correlation coefficient: index of the extent of the linear relationship between two variables
A correlation coefficient can take values from –1 to +1. The extreme values represent a perfect negative (–1) or a perfect positive (+1) correlation. A value close to 0 indicates a weak linear relationship, while 0 means there is no linear relationship.
A positive correlation means that an increase in one variable is associated with an increase in the other. A negative correlation between two variables means that as the value of one variable increases, the value of the other one decreases.
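The flashcards do not give a formula for the correlation coefficient. The sketch below assumes the Pearson product-moment coefficient, which is the usual index of linear relationship; the function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed here; not stated in the source)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: marks rise with hours in one case and fall in the other.
hours = [1, 2, 3, 4, 5]
marks_rising = [2, 4, 6, 8, 10]
marks_falling = [10, 8, 6, 4, 2]

r_positive = pearson_r(hours, marks_rising)
r_negative = pearson_r(hours, marks_falling)

print(r_positive, r_negative)
```

Because both made-up relationships are exactly linear, the two results sit at the extremes of the scale, a perfect positive and a perfect negative correlation.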