Week 11 - Introduction, Data, and Graphical presentation Flashcards
What is data?
the facts and figures collected, summarised, analysed, and interpreted
What is a data set?
The data collected in a particular study
What are elements?
the entities on which data are collected
What is a variable?
a characteristic of interest for the elements
What is an observation?
The set of measurements collected for a particular element
How is the stock return calculated?
Percentage change in the price of the stock
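A minimal sketch of this calculation in Python. The function name `stock_return` and the prices are hypothetical, and the formula assumes the return is based on the price change alone.

```python
def stock_return(price_start: float, price_end: float) -> float:
    """Percentage change in price from price_start to price_end."""
    return (price_end - price_start) / price_start * 100


# Hypothetical prices: the stock moves from 50 to 55.
print(stock_return(50.0, 55.0))
```

A negative result means the price fell over the period.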
How do you calculate the total number of data values in a data set?
the number of elements multiplied by the number of variables
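As a rough illustration, the count can be checked on a small data set. The layout below (one dictionary per element) and the `school` values are assumptions made for the sketch.

```python
# Each dictionary is one element (a student); each key is one variable.
data_set = [
    {"school": "Business", "credits": 36, "gmat": 605},    # school is hypothetical
    {"school": "Humanities", "credits": 72, "gmat": 490},  # school is hypothetical
]

n_elements = len(data_set)       # 2 students
n_variables = len(data_set[0])   # 3 variables per student
n_values = n_elements * n_variables

print(n_elements, n_variables, n_values)  # prints: 2 3 6
```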
What are the scales of measurement?
Scales of measurement refer to how data is categorised, ordered, and quantified.
There are four main types of measurement scales: Nominal, Ordinal, Interval, and Ratio.
Data are either?
Qualitative or quantitative
What is within quantitative data?
Numerical -> interval, ratio
What is within qualitative data?
Numeric codes -> nominal, ordinal
Non-numeric labels -> nominal, ordinal
What is nominal data?
Data are labels or names used to identify an attribute of the element.
A nonnumeric label or numeric code may be used.
Example of nominal data
Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).
What is ordinal data?
The data have the properties of nominal data and the order or rank of the data is meaningful.
A nonnumeric label or numeric code may be used.
Example of ordinal data
Students of a university are classified by their course performance using a nonnumeric label such as Distinction, Merit, Pass or Fail.
Alternatively, a numeric code could be used for the course performance variable (e.g. 1 denotes Distinction, 2 denotes Merit, and so on).
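A small sketch of the difference between nominal and ordinal codes. The codes 3 for Pass and 4 for Fail are an assumed continuation of "and so on", and the helper `outperforms` is hypothetical.

```python
# Nominal: the codes only identify a category, so their order means nothing.
school_codes = {"Business": 1, "Humanities": 2, "Education": 3}

# Ordinal: the codes also carry a rank (a lower code is a better result here).
# Codes 3 and 4 are assumed.
performance_codes = {"Distinction": 1, "Merit": 2, "Pass": 3, "Fail": 4}


def outperforms(first: str, second: str) -> bool:
    """True if the first performance label ranks above the second."""
    return performance_codes[first] < performance_codes[second]


print(outperforms("Distinction", "Pass"))  # prints: True
```

Comparing `school_codes` values in the same way would run, but the result would have no meaning, because nominal codes have no order.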
What is interval data?
The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
Interval data are always numeric.
Example of interval data
Marianna has a GMAT score of 605, while Kostas has a GMAT score of 490. Marianna scored 115 points more than Kostas.
What is ratio data?
The data have all the properties of interval data and the ratio of two values is meaningful.
Variables such as distance, height, weight, and time use the ratio scale.
This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.
Example of ratio data
Marianna’s college record shows 36 credits earned, while Kostas’s record shows 72 credits earned. Kostas has twice as many credits earned as Marianna.
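The interval and ratio examples can be checked with a few lines of Python. The variable names are illustrative only.

```python
gmat = {"Marianna": 605, "Kostas": 490}
credits = {"Marianna": 36, "Kostas": 72}

# Interval scale: the difference between two scores is meaningful.
gmat_difference = gmat["Marianna"] - gmat["Kostas"]

# Ratio scale: zero credits means no credits earned, so the ratio is meaningful too.
credit_ratio = credits["Kostas"] / credits["Marianna"]

print(gmat_difference)  # prints: 115
print(credit_ratio)     # prints: 2.0
```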
What is cross sectional data?
data collected at a single point in time or over a short period for multiple subjects (e.g., individuals, companies, countries).
Example of cross sectional data
data detailing the number of building permits issued in June 2006 in each of the regions of Italy
What is time series data?
data collected over a period of time at regular intervals (e.g., daily, monthly, yearly).
Example of time series data
data detailing the number of building permits issued in Tuscany, Italy in each of the last 36 months
What are descriptive statistics?
The tabular, graphical, and numerical methods used to summarise data.
What are the 3 main categories of descriptive statistics?
- Measures of Central Tendency (Where is the data centred?)
- Measures of Dispersion (Variability) (How spread out is the data?)
- Measures of Shape & Distribution (What is the pattern of data?)
Example of descriptive statistics
The manager of Heuson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups and records the cost of parts on each, rounded to the nearest euro. Tabular, graphical, and numerical summaries of these 50 costs are descriptive statistics.
How is a histogram presented?
The variable of interest is placed on the x axis.
A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percentage frequency.
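A rough text-only sketch of how the class frequencies behind a histogram could be built. The cost values and the class width of 10 are assumptions for illustration. Classes with no values are simply not listed.

```python
from collections import Counter

# Hypothetical parts costs in euros (not the Heuson Auto invoices).
costs = [52, 58, 61, 63, 67, 68, 71, 72, 74, 75, 79, 82, 85, 88, 91, 97]
class_width = 10  # assumed class width

# Frequency of each class, keyed by the lower limit of the class.
frequency = Counter((cost // class_width) * class_width for cost in costs)

for lower in sorted(frequency):
    count = frequency[lower]
    relative = count / len(costs)
    bar = "#" * count  # bar length stands in for rectangle height
    print(f"{lower}-{lower + class_width - 1}: {bar} "
          f"freq={count} rel={relative:.3f} pct={relative * 100:.1f}%")
```

Each bar's length plays the role of the rectangle's height. The same shape results whether frequency, relative frequency, or percentage frequency is plotted.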
How to describe a symmetrical histogram?
left tail is the mirror image of the right tail
How to describe a skewed left (negative skewness) histogram?
a longer tail to the left
How to describe a right skewed (positive skewness) histogram?
a longer tail to the right
What does the cumulative frequency distribution show?
the number of items with values less than or equal to the upper limit of each class
What does the cumulative relative frequency distribution show?
the proportion of items with values less than or equal to the upper limit of each class
What does the cumulative percentage frequency distribution show?
the percentage of items with values less than or equal to the upper limit of each class
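The three cumulative distributions can be sketched from one list of class frequencies. The class limits and frequencies below are hypothetical.

```python
from itertools import accumulate

upper_limits = [59, 69, 79, 89, 99]  # hypothetical upper class limits
frequencies = [2, 4, 5, 3, 2]        # hypothetical class frequencies
total = sum(frequencies)

# Running totals give the number of items at or below each upper limit.
cumulative = list(accumulate(frequencies))
cumulative_relative = [c / total for c in cumulative]
cumulative_percentage = [100 * r for r in cumulative_relative]

for limit, c, r, p in zip(upper_limits, cumulative,
                          cumulative_relative, cumulative_percentage):
    print(f"<= {limit}: {c} items, proportion {r:.4f}, {p:.2f}%")
```

The last class always reaches the total count, a proportion of 1, and 100%.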
What are the 3 measures of central tendency?
mean, median, mode
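A minimal sketch using Python's standard `statistics` module. The data values are hypothetical.

```python
import statistics

costs = [62, 68, 71, 71, 75, 79, 91]  # hypothetical values

mean_cost = statistics.mean(costs)      # arithmetic average
median_cost = statistics.median(costs)  # middle value of the sorted data
mode_cost = statistics.mode(costs)      # most frequent value

print(mean_cost, median_cost, mode_cost)
```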
What are the 4 measures of dispersion?
standard deviation, variance, range and interquartile range
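A minimal sketch of the four measures using Python's standard `statistics` module. The data are hypothetical. The sketch assumes the sample versions of variance and standard deviation (dividing by n - 1). Quartile conventions differ between software packages, so the interquartile range may not match other tools exactly.

```python
import statistics

costs = [62, 68, 71, 71, 75, 79, 91]  # hypothetical values

variance = statistics.variance(costs)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(costs)      # square root of the sample variance
data_range = max(costs) - min(costs)   # largest value minus smallest value

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(costs, n=4)
iqr = q3 - q1                          # interquartile range

print(variance, std_dev, data_range, iqr)
```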
What are the 2 measures of shape and distribution?
skewness and kurtosis
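One common way to compute these two measures is from the central moments, sketched below. This is an assumption about the definition: statistical packages often apply sample-size corrections or report excess kurtosis (kurtosis minus 3), so their output can differ. The function name and data are hypothetical.

```python
def shape_measures(values):
    """Return (skewness, kurtosis) based on population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


symmetric = [1, 2, 3, 4, 5]         # hypothetical, mirror-image tails
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical, long right tail

print(shape_measures(symmetric))
print(shape_measures(right_skewed))
```

Skewness near zero suggests a symmetrical distribution, a positive value a longer right tail, and a negative value a longer left tail.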
What is a scatter diagram?
a graphical presentation of the relationship between two quantitative variables
How is a scatter diagram presented?
One variable is shown on the x axis and the other variable is shown on the y axis
The general pattern of the plotted points suggests the overall relationship between the variables
A trend line is an approximation of the relationship
What does a positive relationship trend line look like?
upward sloping line
What does a negative relationship trend line look like?
downward sloping line
What does a no apparent relationship trend line look like?
roughly horizontal line
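A sketch of how a trend line's direction could be checked numerically. It assumes the trend line is fitted by least squares, which is one common choice and not stated in these cards. The function names, the tolerance, and the data are hypothetical.

```python
def trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def describe(slope, tolerance=1e-9):
    """Label the relationship suggested by the slope (tolerance is assumed)."""
    if slope > tolerance:
        return "positive"
    if slope < -tolerance:
        return "negative"
    return "no apparent relationship"


xs = [1, 2, 3, 4]
print(describe(trend_line(xs, [2, 4, 6, 8])[0]))  # prints: positive
print(describe(trend_line(xs, [8, 6, 4, 2])[0]))  # prints: negative
print(describe(trend_line(xs, [5, 5, 5, 5])[0]))  # prints: no apparent relationship
```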
What is statistical inference?
the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
What is the population (part of statistical inference)?
the set of all elements of interest in a particular study
What is a sample (part of statistical inference)?
a subset of the population
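A small sketch of drawing a sample and using it to estimate a population characteristic. The population values, the sample size of 50, and the seed are all hypothetical.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 values of some variable of interest.
population = [rng.randint(20, 120) for _ in range(1000)]

# A sample is a subset of the population, drawn here without replacement.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # estimate based on the sample

print(population_mean, sample_mean)
```

In a real study only the sample mean would be available, and it would serve as the estimate of the population mean.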
What does statistical analysis often involve?
working with large amounts of data
What is usually used to conduct statistical analysis?
Computer software is typically used to conduct the analysis.
Statistical software packages such as Microsoft Excel, Minitab, and SPSS are capable of data management, analysis, and presentation.