Quiz 8 Flashcards

1
Q

What does the unique function do in R?

A

Returns the distinct values in a vector, such as a particular column of a data frame
ex. unique(ldt_data$RT_speed)
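A minimal sketch of unique() in action. The small ldt_data frame below is a made-up stand-in for illustration, not the real course data set.

```r
# Hypothetical toy version of ldt_data (values invented for this example)
ldt_data <- data.frame(
  RT_speed = c("fast", "slow", "fast", "mid", "slow"),
  Length = c(5, 9, 7, 9, 11),
  stringsAsFactors = FALSE
)

# unique() drops repeats and keeps values in order of first appearance
speeds <- unique(ldt_data$RT_speed)
print(speeds)
```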

2
Q

What does the “filter” function do?

A

Selects the rows that match given criteria
ex. filter(ldt_data, RT_speed == "fast")
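A short sketch of filter() from the dplyr package (part of the tidyverse), assuming dplyr is installed. The toy ldt_data frame is hypothetical.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
ldt_data <- data.frame(
  RT_speed = c("fast", "slow", "fast"),
  Length = c(5, 9, 7),
  stringsAsFactors = FALSE
)

# Keep only the rows where RT_speed is exactly "fast"
fast_rows <- filter(ldt_data, RT_speed == "fast")
print(fast_rows)
```

Note that the value to match goes in straight quotes; curly quotes copied from slides or a word processor will cause a syntax error in R.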

3
Q

What does == do in R?

A

(two equals signs)
tests whether two values are equal
ex. x == y tests whether x and y are equal and returns TRUE or FALSE
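A quick illustration of == in base R. The variable names here are arbitrary and chosen only for the example.

```r
x <- 3
y <- 3

same <- x == y                     # TRUE: both are 3
different <- x == 4                # FALSE: 3 is not 4
elementwise <- c(1, 2, 3) == 2     # compares each element: FALSE TRUE FALSE
case_check <- "fast" == "Fast"     # FALSE: string comparison is case-sensitive

print(same)
print(elementwise)
```

A single = does not test equality; it assigns a value or names a function argument.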

4
Q

How do you remove rows that have a specific value in a column?

A

filter(ldt_data, Length_type != "mid")

5
Q

How can you get R to treat values that it is reading as characters as numbers (ex. if your numeric data has quotes around it)?

A

ex. x <- c("10", 20, 30) (R will read all three values as characters)
as.numeric(x) (converts them to numbers, so numeric functions can now be run)
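A small sketch of the conversion in base R. The variable name x_num is just an illustrative choice.

```r
# One quoted value is enough to make the whole vector character
x <- c("10", 20, 30)
x_class <- class(x)        # "character"

# Convert to numbers so numeric functions work
x_num <- as.numeric(x)
x_mean <- mean(x_num)      # 20

print(x_class)
print(x_mean)
```

Strings that do not look like numbers (for example "ten") become NA, with a warning, when passed to as.numeric().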

6
Q

How can you combine multiple conditions when filtering in R?

A

use & to combine conditions (a row must meet all of them to be kept)
ex. filter(ldt_data, RT_speed == "slow" & Length == 9)
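A sketch of combining conditions inside filter(), assuming dplyr is installed. The toy data is hypothetical, and the | ("or") line is an extra shown only for contrast.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
ldt_data <- data.frame(
  RT_speed = c("slow", "slow", "fast", "slow"),
  Length = c(9, 7, 9, 9),
  stringsAsFactors = FALSE
)

# & keeps rows where BOTH conditions are true
slow_and_nine <- filter(ldt_data, RT_speed == "slow" & Length == 9)

# | keeps rows where AT LEAST ONE condition is true
slow_or_nine <- filter(ldt_data, RT_speed == "slow" | Length == 9)

print(slow_and_nine)
```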

7
Q

What does the “mutate” function do?

A

creates a new variable (column) in a data frame, or changes an existing one (it “mutates” the data frame)

8
Q

How can you save a new data frame that has the changes made from a mutate operation?

A

reassign the result to the same variable, ex. ldt_data:
ldt_data <- mutate(ldt_data, Length_10 = Length * 10)

or, if you are not adding to the existing data frame, save the result as a new variable:
ex. new_length <- ldt_data$Length * 10
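A sketch showing why the reassignment matters, assuming dplyr is installed. The toy data and the name preview are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
ldt_data <- data.frame(Length = c(5, 9, 7))

# mutate() returns a new data frame; the original is untouched until you reassign
preview <- mutate(ldt_data, Length_10 = Length * 10)
cols_before <- names(ldt_data)     # still just "Length"

# Reassign to keep the new column in ldt_data
ldt_data <- mutate(ldt_data, Length_10 = Length * 10)
cols_after <- names(ldt_data)      # "Length" "Length_10"

# Alternative: store the computed values as a separate vector
new_length <- ldt_data$Length * 10

print(ldt_data)
```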

9
Q

What is the purpose of descriptive statistics?

A
  • to make a summary of the data we have at hand
  • to explore existing data for patterns
10
Q

What are the types of data in descriptive statistics?

A
  1. nominal (values are names or labels)
  2. ordinal (nominal categories are ordered)
  3. interval (equal intervals on scale represent equal differences between the points on the scale)
  4. ratio (similar to interval, but here zero is meaningful, not arbitrary)
11
Q

Describe nominal data

A
  • values are simply names or labels
  • nominal variables are the least precise and least informative level of measurement compared with the other data types
    ex. a speaker of a language can be:
  • “native” or “non-native”
  • male or female
    only A = B or A ≠ B comparisons make sense (vowel category: i, e, a, o, u)
12
Q

Describe ordinal data

A
  • ordinal variables are nominal categories that have an order
  • ex. Likert scale: “strongly disagree” < “disagree” < “agree” < “strongly agree”
  • ex. vowel height: low < low-mid < high-mid < high
13
Q

Describe interval data

A
  • interval variables have equal intervals on the scale that represent equal differences between the points on the scale
  • ex. temperature (C or F): 25C to 30C is the same difference as 20C to 25C (0C doesn’t mean no temperature)
  • multiplication and division don’t make sense (20C is not twice as warm as 10C)
  • negative values are possible
    A+B, A-B (birth year 1985, 1978, 2005)
14
Q

Describe ratio data

A
  • ratio variables are similar to interval variables, but zero is meaningful
  • zero can mean absence of the thing being measured, ex. 0 occurrences of the word “aardvark” in a text
  • multiplication and division make sense (one book can have 2x as many “aardvark” references as another)
  • negative values are not possible
    A x B, A / B (vowel duration: 50, 49, 53, 60ms)
15
Q

What is a pipe in tidyverse?

A

a special operator, %>%, that feeds the result of one step into the next process
ex. df4 <- ldt_data %>%
filter(Length > 9)
(%>% takes whatever comes before it and passes it on to the next step)
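A sketch of a two-step pipeline, assuming dplyr is installed (it provides %>%). The toy data and the mutate() step are illustrative additions, not part of the original card.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
ldt_data <- data.frame(Length = c(5, 10, 12, 9))

# Read the pipe as "then": take ldt_data, then filter, then mutate
df4 <- ldt_data %>%
  filter(Length > 9) %>%
  mutate(Length_10 = Length * 10)

# The same steps without the pipe, nested inside-out
df4_nested <- mutate(filter(ldt_data, Length > 9), Length_10 = Length * 10)

print(df4)
```

Each line that is followed by another step must end with %>%; the last step has no pipe after it.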

16
Q

What kinds of data are considered “categorical”?

A

nominal and ordinal

17
Q

What kinds of data are called “continuous/numeric” data?

A

interval and ratio

18
Q

When is it useful to use the mean of a variable?

A

when the data is roughly normally distributed (symmetric, without extreme outliers)

19
Q

What is standard deviation?

A
  • measure of how far the data points of a variable typically are from the mean
  • indicator of how reliable the mean is
    * when 2 means are the same, the one with the smaller SD is more reliable
    ** it is necessary to include the SD when a mean is reported
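A small sketch comparing two made-up samples that share a mean but differ in spread. The numbers are invented for illustration.

```r
# Two hypothetical samples with the same mean
a <- c(48, 49, 50, 51, 52)
b <- c(10, 30, 50, 70, 90)

mean_a <- mean(a)   # 50
mean_b <- mean(b)   # 50

# sd() computes the sample standard deviation (n - 1 in the denominator)
sd_a <- sd(a)       # small: points sit close to the mean
sd_b <- sd(b)       # large: points are spread far from the mean

print(c(sd_a, sd_b))
```

Here the mean of a describes its data points much more closely than the mean of b does, which is why the SD is reported alongside the mean.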
20
Q

What is the median in a list of data points?

A

the data point that is right in the middle after data points are sorted in ascending order

21
Q

What is the IQR?

A

the interquartile range
the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of the data points (see slide 18 of the descriptive data slides)

22
Q

How do you calculate the IQR?

A
  1. sort the data and calculate the overall median
  2. calculate the median of the first half (Q1)
  3. calculate the median of the 2nd half (Q3)
  4. find the difference between them: IQR = Q3 - Q1
    (see slide 18 of descriptive data slides)
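The steps above can be sketched in base R as follows. This assumes the overall median is left out of both halves when the number of data points is odd (conventions vary), and the data values are invented.

```r
# Hypothetical data, sorted in ascending order
x <- sort(c(7, 1, 3, 9, 5, 11, 13, 15))   # 1 3 5 7 9 11 13 15
n <- length(x)

# Split into lower and upper halves (middle point excluded if n is odd)
lower <- x[1:floor(n / 2)]
upper <- x[(ceiling(n / 2) + 1):n]

q1 <- median(lower)        # median of the first half
q3 <- median(upper)        # median of the second half
iqr_manual <- q3 - q1

# R's built-in IQR() interpolates quantiles differently by default,
# so it can give a slightly different answer on small samples
iqr_builtin <- IQR(x)

print(c(manual = iqr_manual, builtin = iqr_builtin))
```

On this sample the hand method gives 8 while IQR() gives 7; both are accepted ways of defining quartiles, so it helps to say which one was used.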
23
Q

How are IQR and median related?

A

IQR should be reported with the median
smaller IQR = more accurate median

24
Q

When is it better to report mean vs median?

A
  • mean is a better choice when the distribution is symmetric and normal
  • median is better when the distribution of the data is asymmetric and non-normal (skewed), because it is less affected by outliers that are not part of the main trend being examined (ex. the mean income for Canadians could seem really high because of a small number of very rich people, so the data is not normally distributed)
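A sketch of the income example with made-up numbers (in thousands of dollars); the values are purely illustrative.

```r
# Hypothetical incomes: five typical earners and one very rich person
incomes <- c(40, 45, 50, 55, 60, 1000)

income_mean <- mean(incomes)       # pulled far upward by the single outlier
income_median <- median(incomes)   # stays with the main group: 52.5

print(c(mean = income_mean, median = income_median))
```

The mean lands above 200 even though five of the six people earn 60 or less, while the median still reflects a typical earner.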
25
Q

What is a histogram?

A

a plot that shows the distribution of the data points in a variable by plotting how many data points fall into each value or range of values (bin)
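A sketch of the counting that sits behind a histogram, using base R's hist() with plot = FALSE so that only the bin counts are computed. The data and the bin edges are invented for the example.

```r
# Hypothetical data points
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# One bin per whole number: edges at 0.5, 1.5, ..., 5.5
h <- hist(x, breaks = seq(0.5, 5.5, by = 1), plot = FALSE)

# Number of data points in each bin; these become the bar heights
bin_counts <- h$counts
print(bin_counts)

# Calling hist(x, breaks = seq(0.5, 5.5, by = 1)) without plot = FALSE draws the plot
```

The counts rise to a peak in the middle and fall off evenly on both sides, which is the symmetric shape that suggests the mean is a reasonable summary.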

26
Q

How can you decide whether to use mean or median by looking at a histogram?

A

A histogram will be roughly bell-shaped if the data is normally distributed, in which case the mean is the better option; if the histogram is clearly skewed, the median is the better option

27
Q

What are the characteristics of a (perfect) normal distribution?

A
  • mean= median
  • 50% of values on the left side of the mean (and median)
  • 50% of values on the right side of the mean (and median)
  • will be bell-shaped (“bell curve”)
28
Q

What does a left-skewed distribution look like?

A
  • the curve has a long left tail (negatively skewed)