BUSI344 / CHAPTER 3 Flashcards
___________________ is an approach to learning from data.
A by-product of both the computer revolution and the growth of the Internet has been an exponential growth in the amount and complexity of available data.
Exploratory Data Analysis (EDA) is an approach to learning from data.
Give examples of ratio variables
Variables like height, weight, and enzyme activity are ratio variables.
Interval Variables
Interval variables have a relationship between them, e.g., House A built in 2011 is five years newer than House B built in 2006. However, the relative “distance” between the two does not have a direct mathematical relationship or meaning. For example, is House A 2011/2006 = 1.00249 times better than House B based on year built? Probably not.
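The arithmetic in the house example above can be checked with a short Python sketch. The variable names are illustrative only; the point is that the difference in years is meaningful, while the ratio is just a number with no useful interpretation.

```python
# Build years from the house example (interval data).
year_a = 2011
year_b = 2006

# Differences are meaningful for interval variables:
# House A is this many years newer than House B.
difference = year_a - year_b

# The ratio can be computed, but it says nothing useful
# about how the two houses compare.
ratio = year_a / year_b

print(difference)
print(round(ratio, 5))
```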
EDA is a way to help _ _ _ _ _ _
EDA is a way to help make sense of the vast data facing today’s business analyst.
The EDA process is about _________, __________, and _______ information.
The EDA process is about evaluating, synthesizing, and leveraging information.
A set of data involves a number of “variables” and “observations” or “cases”. The “cases” are the observations accumulated into the dataset, such as 550 property sales, 120 leases, or 14,562 automobile purchasers.
The “variables” are the characteristics of the cases, such as the number of bedrooms, square footage, base rents, or car colour preferences.
BINARY VARIABLES
Binary variables (or dummy variables) are a special case of a discrete variable used for non-numeric variables, such as location, building features, and views. A binary variable, as the name implies, has only two possible values; the classic example is on and off. These are most often used in data analysis to indicate the presence or absence of a particular characteristic.
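As a minimal sketch of how a non-numeric characteristic becomes a binary (dummy) variable, the Python below codes the presence of a view as 1 and its absence as 0. The property records and the field name "view" are hypothetical, made up purely for illustration.

```python
# Hypothetical property records; IDs and view labels are invented for this example.
properties = [
    {"id": 1, "view": "ocean"},
    {"id": 2, "view": "none"},
    {"id": 3, "view": "mountain"},
]

def has_view(record):
    """Return 1 if the property has any view, 0 if it has none."""
    return 0 if record["view"] == "none" else 1

# One binary value per case: presence (1) or absence (0) of the characteristic.
view_dummy = [has_view(p) for p in properties]
print(view_dummy)  # [1, 0, 1]
```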
Explain ordinal data.
Ordinal data is a categorical, statistical data type where the variables have natural, ordered categories and the distances between the categories are not known.
Examples of ordinal data
A well-known example of ordinal data is the Likert scale.
Examples of ordinal data are often found in questionnaires: for example, the survey question “Is your general health poor, reasonable, good, or excellent?” may have those answers coded respectively as 1, 2, 3, and 4. Sometimes data on an interval scale or ratio scale are grouped onto an ordinal scale: for example, individuals whose income is known might be grouped into the income categories $0-$19,999, $20,000-$39,999, $40,000-$59,999, …, which then might be coded as 1, 2, 3, 4, …. Other examples of ordinal data include socioeconomic status, military ranks, and letter grades for coursework.
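The coding described above might look like this in Python. This is a sketch only: it covers just the three income brackets the text names, and the function name income_code is an assumption introduced for illustration. The codes preserve order, but the gap between codes carries no meaning.

```python
# Survey answers coded as in the health question above.
HEALTH_CODES = {"poor": 1, "reasonable": 2, "good": 3, "excellent": 4}

# Only the three income brackets named in the text; higher brackets
# are omitted because the source does not list them.
INCOME_BRACKETS = [(0, 19_999, 1), (20_000, 39_999, 2), (40_000, 59_999, 3)]

def income_code(income):
    """Return the ordinal code for a whole-dollar income, or None if no listed bracket applies."""
    for low, high, code in INCOME_BRACKETS:
        if low <= income <= high:
            return code
    return None

print(HEALTH_CODES["good"])
print(income_code(25_000))
# Order is meaningful: a higher bracket gets a higher code.
print(income_code(45_000) > income_code(25_000))
```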
Explain what a nominal variable is.
A nominal variable is a type of categorical variable.
Nominal variables have two or more categories without having any kind of natural order. They are variables with no numeric value, such as occupation or political party affiliation.
What is an interval variable?
An interval variable is a measurement where the difference between two values is meaningful. The difference between a temperature of 100 degrees and 90 degrees is the same difference as between 90 degrees and 80 degrees.
What is a ratio variable?
A ratio variable has all the properties of an interval variable, and also has a clear definition of 0.0. When the variable equals 0.0, there is none of that variable.
Variables like height, weight, and enzyme activity are ratio variables.
Temperature, expressed in F or C, is not a ratio variable. A temperature of 0.0 on either of those scales does not mean ‘no heat’.
However, temperature in Kelvin is a ratio variable, as 0.0 Kelvin really does mean ‘no heat’.
Another counterexample is pH. It is not a ratio variable, as pH = 0 just means 1 molar of H+, and the definition of molar is fairly arbitrary. A pH of 0.0 does not mean ‘no acidity’ (quite the opposite!).
When working with ratio variables, but not interval variables, you can look at the ratio of two measurements. A weight of 4 grams is twice a weight of 2 grams, because weight is a ratio variable.
A temperature of 100 degrees C is not twice as hot as 50 degrees C, because temperature C is not a ratio variable. A pH of 3 is not twice as acidic as a pH of 6, because pH is not a ratio variable.
Ratio variables on the other hand can tell us a great deal. For example, a 5,000 square metre warehouse is twice as large as a 2,500 square metre one. A house with 50 metres of waterfront has 25% more than a house with 40 metres.
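The ratio comparisons above can be worked through in a short Python sketch. The helper celsius_to_kelvin is introduced here for illustration; it uses the standard offset of 273.15 to show that the Celsius ratio changes once the readings are moved to a scale with a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin, a scale with a true zero."""
    return celsius + 273.15

# Ratio variables: the ratio of two measurements is meaningful.
weight_ratio = 4 / 2            # 4 g is twice 2 g
warehouse_ratio = 5000 / 2500   # square metres
waterfront_ratio = 50 / 40      # 1.25, i.e. 25% more frontage

# Interval variable: the ratio of Celsius readings can be computed,
# but it does not describe how much "hotter" one reading is.
celsius_ratio = 100 / 50

# Converting to kelvin first gives a ratio that does have physical meaning.
kelvin_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)

print(weight_ratio, warehouse_ratio, waterfront_ratio)
print(celsius_ratio, round(kelvin_ratio, 3))
```

The kelvin ratio comes out well below 2, which is why 100 degrees C cannot be called "twice as hot" as 50 degrees C.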
Define Ratio Variable
Ratio variables are interval variables, but with the added condition that 0 (zero) of the measurement indicates that there is none of that variable. So, temperature measured in degrees Celsius or Fahrenheit is not a ratio variable because 0 °C does not mean there is no temperature.
BINARY VARIABLES
A SPECIAL CASE OF
Binary variables (or dummy variables) are a special case of a discrete variable used for non-numeric variables, such as location, building features, and views.
4 Rs OF EDA
The “four Rs” of EDA: reduction, revelation, re-expression, and residuals (from models).
REDUCTION
Reduction means simplifying the information, focusing it to a small enough “package” that it becomes comprehensible. As an analogy, consider how the term “reduction” is used in cooking: boiling down a soup until the excess liquid is evaporated, with the broth becoming increasingly concentrated. The same process is used in data analysis: “boil it down” until the essential elements become clear.
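One common way to "boil down" a variable is to replace the raw values with a handful of summary statistics. The Python sketch below illustrates that idea and is not necessarily the technique the course has in mind; the sale prices are hypothetical.

```python
import statistics

# Hypothetical sale prices (in dollars) for five properties.
sale_prices = [310_000, 295_000, 420_000, 385_000, 350_000]

# Reduce the list to a small, comprehensible "package".
summary = {
    "count": len(sale_prices),
    "min": min(sale_prices),
    "median": statistics.median(sale_prices),
    "mean": statistics.mean(sale_prices),
    "max": max(sale_prices),
}

for name, value in summary.items():
    print(name, value)
```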