1.3 Understanding your data set Flashcards

Question 1

Q

Observation

Answer

A

are instances of some group of interest and are generally represented as the rows of a worksheet. If you had data on individual students, for example, each student would count as a single observation. If you had data on different songs, each song would count as a single observation.

Question 2

Q

Characteristics

Answer

A

are what each column within a data set represents. In Figure 2, for instance, each observation, or student, has only a single characteristic: “Classroom.” In the second image, another characteristic is added: “Semester.” While both of these examples are quite simple, in principle, you could have any number of characteristics.
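As a rough sketch of the row/column idea from the two cards above, a small data set can be modeled as a list of records. The student values here are made up for illustration; only the "classroom" and "semester" characteristics come from the cards.

```python
# Hypothetical data set: each dict is one observation (one student, one row),
# and each key is one characteristic (one column).
students = [
    {"classroom": "A", "semester": "Fall"},
    {"classroom": "B", "semester": "Fall"},
    {"classroom": "A", "semester": "Spring"},
]

n_observations = len(students)             # number of rows
characteristics = list(students[0].keys())  # column names

print(n_observations, characteristics)
```

Adding another characteristic means adding another key to every record, which is the same as adding a column to the worksheet.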

Question 3

Q

Discrete

Answer

A

data is data that can only be represented through whole numbers (e.g., the number of students in a class or the number of animals in a zoo). You couldn’t have half of a student, or, say, .378 of a leopard (unless you’re looking at data from some kind of horror movie!).

Question 4

Q

Continuous

Answer

A

data, on the other hand, is measured along a scale and can take any point along that scale as its value (for example, temperatures of 94.5, 93.5, etc.).
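One way to see the discrete/continuous difference is to check whether values have a fractional part. This is only a heuristic sketch with made-up numbers: a continuous measurement can happen to land on a whole number (95.0 is still a temperature), so the check describes the values, not the variable type itself.

```python
# Hypothetical values for illustration.
student_counts = [28, 31, 25]       # discrete: whole-number counts
temperatures = [94.5, 93.5, 95.0]   # continuous: any point on the scale

def is_whole_number(value):
    """Return True if the value has no fractional part."""
    return float(value).is_integer()

# Counts are always whole numbers.
all_counts_whole = all(is_whole_number(v) for v in student_counts)
# Temperatures may take values between whole numbers.
some_temps_fractional = any(not is_whole_number(v) for v in temperatures)

print(all_counts_whole, some_temps_fractional)
```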

Question 5

Q

Nominal

Answer

A

the categories have no order. If you were categorizing cars, for example, you could have categories for each manufacturer (e.g., Honda, Ford, Toyota, etc.). As none of these categories are more or less than the other categories, there’s no implicit order to how you might organize them.
Example: Honda, Ford, Toyota

Question 6

Q

Ordinal

Answer

A

data in which there exists an implicit order to the way it’s organized.
Example: Small, Medium, Large
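The nominal/ordinal distinction shows up when you try to sort. In this sketch the `size_rank` mapping is an assumption that spells out the implicit order of the ordinal categories; the nominal categories have no such mapping, so any ordering of them (alphabetical here) is just a convenience.

```python
# Ordinal: the categories carry an implicit order, written out explicitly here.
size_rank = {"Small": 0, "Medium": 1, "Large": 2}
sizes = ["Large", "Small", "Medium", "Small"]
sorted_sizes = sorted(sizes, key=lambda s: size_rank[s])

# Nominal: no category is "more" than another, so alphabetical order
# is only a display choice and carries no meaning.
makes = ["Toyota", "Honda", "Ford"]
alphabetical_makes = sorted(makes)

print(sorted_sizes)
print(alphabetical_makes)
```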

Question 7

Q

Binary

Answer

A

categorizes data into two groups
Examples: Good/Bad, Yes/No

Question 8

Q

Discrete

Answer

A

Number of Students

Question 9

Q

Continuous

Answer

A

Temperature

Question 10

Q

Continuous

Answer

A

Temperature

Question 11

Q

Data imputation

Answer

A

involves substituting an estimated value for a missing value. There are various approaches to making the estimation: averaging the non-missing values, taking the most common of the non-missing values, or even taking a random value from the non-missing data. At the end of the day, the analyst needs to decide carefully whether to remove rows with missing data or to impute values for the missing data.
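The three estimation approaches named above (average, most common value, random draw) can be sketched with the standard library. The scores are made up, `None` stands in for a missing value, and the helper name `impute` is hypothetical.

```python
import random
import statistics

# Hypothetical scores; None marks a missing value.
scores = [70, None, 90, 70, None, 100]
observed = [s for s in scores if s is not None]

def impute(values, fill):
    """Replace every missing value with the same estimated fill value."""
    return [fill if v is None else v for v in values]

# 1. Average of the non-missing values.
mean_imputed = impute(scores, statistics.mean(observed))

# 2. Most common of the non-missing values.
mode_imputed = impute(scores, statistics.mode(observed))

# 3. A random non-missing value, drawn separately for each gap.
rng = random.Random(0)  # seeded so the sketch is repeatable
random_imputed = [rng.choice(observed) if v is None else v for v in scores]

print(mean_imputed)
print(mode_imputed)
print(random_imputed)
```

Which approach fits depends on the data: a mean only makes sense for numeric values, while the most common value also works for categories.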

Question 12

Q

What should you do if the missing data is random?

Answer

A

Remove the rows that contain missing values.
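Removing incomplete rows might look like the following sketch. The records are made up, and `None` is assumed to be the marker for a missing value.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"student": "s1", "score": 70},
    {"student": "s2", "score": None},
    {"student": "s3", "score": 90},
]

# Keep only the rows in which every characteristic has a value.
complete_rows = [
    row for row in rows
    if all(value is not None for value in row.values())
]

print(complete_rows)
```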

Question 13

Q

If the missing data isn’t random?

Answer

A

If the missing data isn’t random, however, it’s usually better to impute. This will keep you from introducing bias into your data set.

Question 14

Q

Population

Answer

A

the domain of interest (e.g., women between 25 and 35 years of age)

Question 15

Q

Sample

Answer

A

a subset of the population
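To make the population/sample relationship concrete, here is a sketch that draws a random sample from a simulated population. The ages are randomly generated for illustration and are not real data; the population size and sample size are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # seeded so the sketch is repeatable

# Hypothetical population: ages of 1,000 people between 25 and 35.
population = [rng.randint(25, 35) for _ in range(1000)]

# A sample is a subset of the population, drawn here without replacement.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# The sample mean is an estimate and will usually differ from the true value.
gap = abs(sample_mean - population_mean)

print(population_mean, sample_mean, gap)
```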

Question 16

Q

There will always be a gap between inferences you make on the basis of your sample and what’s actually true of the population. This could be described as