W3: Data Visualization Flashcards
What are the 2 methods of scoring scales?
- Sum: add the items together for a total score
- Mean: average all items
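Here is a minimal base-R sketch of both scoring methods. It assumes a hypothetical data frame `items` holding three items scored 1 to 5. The data are made up for illustration.

```r
# Hypothetical responses: 3 people answering 3 items on a 1-5 scale
items <- data.frame(
  Q1 = c(1, 4, 5),
  Q2 = c(2, 4, 5),
  Q3 = c(3, 4, 2)
)

# Method 1: add the items together for a total (sum) score
items$total <- rowSums(items[, c("Q1", "Q2", "Q3")])

# Method 2: average the items; the score stays on the original 1-5 metric
items$mean <- rowMeans(items[, c("Q1", "Q2", "Q3")])

print(items)
```

The mean keeps the score on the same 1 to 5 metric as the individual items. The sum grows with the number of items.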
What are the two ways of averaging item scales using rowMeans()?
- Normal: just take the average
- Multiply by the number of items after averaging (puts the score back on the total-sum scale)
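The second approach matters most when some items are missing. This is a minimal sketch, assuming hypothetical items `Q1` to `Q4` and one skipped response. Averaging with `na.rm = TRUE` and then multiplying by the number of items gives a prorated score on the same metric as a total sum.

```r
# Hypothetical 4-item scale; person 2 skipped Q3
items <- data.frame(
  Q1 = c(2, 3),
  Q2 = c(2, 3),
  Q3 = c(2, NA),
  Q4 = c(2, 3)
)
k <- ncol(items)  # number of items on the scale

# Way 1: plain average of the answered items
avg <- rowMeans(items, na.rm = TRUE)

# Way 2: average, then multiply by the number of items (prorated sum)
prorated_sum <- rowMeans(items, na.rm = TRUE) * k

# For comparison: a plain sum effectively treats the missing item as 0
naive_sum <- rowSums(items, na.rm = TRUE)
```

Person 2 answered 3 on every item they completed. Their prorated sum is 12, which matches what a complete set of four 3s would give. The plain sum drops to 9.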
What is .SD in data.table?
- Refers to the Subset (S) of Data (D)
- On its own, returns all the data you’re working on
E.g. unicorn[, .SD]
- .SDcols tells data.table which columns to include in .SD
db[, StressAVG := rowMeans(.SD, na.rm = TRUE), .SDcols = c("PSS1", "PSS2", "PSS3", "PSS4")]
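Below is a runnable version of the `.SD` example. It assumes the data.table package is installed and uses a small made-up `db` in place of the real data.

```r
library(data.table)

# Made-up PSS responses; the third person skipped PSS2
db <- data.table(
  PSS1 = c(1, 2, 3),
  PSS2 = c(2, 2, NA),
  PSS3 = c(3, 4, 3),
  PSS4 = c(2, 4, 4)
)

# .SD holds only the columns named in .SDcols; := adds the result as a new column
db[, StressAVG := rowMeans(.SD, na.rm = TRUE),
   .SDcols = c("PSS1", "PSS2", "PSS3", "PSS4")]

print(db)
```

Because of `na.rm = TRUE`, the third person's score is the average of only the three items they answered.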
What function is used to calculate reliability of a scale?
psych::alpha()
* Computes Cronbach’s alpha
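To see what `psych::alpha()` is estimating, here is a base-R hand computation of the standard Cronbach's alpha formula:

alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)

The helper name `cronbach_alpha` is hypothetical and only handles complete data. In practice, use `psych::alpha()`, which also reports item-level statistics.

```r
# Hypothetical helper: Cronbach's alpha from a matrix/data frame of items
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)                    # number of items
  item_vars <- apply(items, 2, var)   # variance of each item
  total_var <- var(rowSums(items))    # variance of the total score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Two made-up items: each has variance 1, the total score has variance 3
a <- c(1, 2, 3)
b <- c(1, 3, 2)
cronbach_alpha(cbind(a, b))  # 2 / 3
```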
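What should you add to psych::alpha() when a scale has reverse-scored items?
- check.keys = TRUE (reverse-keys any items that load negatively on the first principal component, with a warning)
E.g. psych::alpha(as.data.frame(db[, .(PSS1, PSS2r, PSS3r, PSS4)]),
  check.keys = TRUE)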
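Reverse-scored items such as `PSS2r` and `PSS3r` are usually created before running `psych::alpha()`. This is a minimal sketch that assumes the items are scored 0 to 4, so check the real scale's range before using it. A reversed item equals (minimum + maximum) minus the original response. The helper `reverse_score` and the data are hypothetical.

```r
# Hypothetical helper: flip a response so that min <-> max
reverse_score <- function(x, min_score, max_score) {
  (min_score + max_score) - x
}

# Made-up responses on an assumed 0-4 scale
db <- data.frame(PSS2 = c(0, 1, 4, NA))
db$PSS2r <- reverse_score(db$PSS2, min_score = 0, max_score = 4)
print(db)  # PSS2r is 4, 3, 0, NA
```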
What do aesthetics do and what are 4 examples of them in ggplot2?
- Control how geoms are displayed by mapping variables to visual properties
- Size, shape, colour, transparency (alpha)
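Here is a minimal ggplot2 sketch of the four aesthetics. It uses R's built-in `mtcars` data as a stand-in, but any data frame works. Variables mapped inside `aes()` change with the data. A value set outside `aes()`, such as `alpha = 0.6` here, applies to every point.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        colour = factor(cyl),  # colour by number of cylinders
                        shape = factor(am),    # shape by transmission type
                        size = hp)) +          # size by horsepower
  geom_point(alpha = 0.6)                      # fixed transparency for all points

print(p)
```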
What are 4 common geom_ functions used for univariate graphs?
geom_histogram(), geom_density(), geom_dotplot(), geom_qq()
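This sketch shows three of the univariate geoms. It assumes a made-up numeric variable `Stress` in a data frame `df`, simulated with `rnorm()`. The bin settings are illustrative choices, not defaults from the course.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(Stress = rnorm(100, mean = 20, sd = 5))  # simulated scores

p_hist    <- ggplot(df, aes(Stress)) + geom_histogram(bins = 20)  # counts per bin
p_density <- ggplot(df, aes(Stress)) + geom_density()             # smoothed curve
p_dot     <- ggplot(df, aes(Stress)) + geom_dotplot(binwidth = 1) # one dot per case

print(p_hist)
```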
What does geom_qq() need?
The sample aesthetic, usually z-scored with scale(), e.g. aes(sample = scale(variable))
* z = (x - mean) / SD
geom_abline(intercept = 0, slope = 1): reference line where all points would fall if the data were normally distributed
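Here is a sketch of a Q-Q plot built this way, using a simulated variable `Stress`. `scale()` returns a one-column matrix, so `as.numeric()` turns it back into a plain vector. The hand-computed z-score is included to show that `scale()` applies z = (x - mean) / SD.

```r
library(ggplot2)

set.seed(2)
df <- data.frame(Stress = rnorm(200, mean = 20, sd = 5))  # simulated scores

# z-score by hand and with scale(); both give (x - mean) / SD
df$z_manual <- (df$Stress - mean(df$Stress)) / sd(df$Stress)
df$z <- as.numeric(scale(df$Stress))

p <- ggplot(df, aes(sample = z)) +
  geom_qq() +                             # sample quantiles vs normal quantiles
  geom_abline(intercept = 0, slope = 1)   # where points fall if data are normal

print(p)
```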
What function is used to check a variable’s distribution?
plot(testDistribution()) (from the JWileymisc package)
E.g. plot(testDistribution(db$Stress,
  extremevalues = "theoretical", ev.perc = .005))
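The base-R sketch below is a rough analogue of the idea behind `extremevalues = "theoretical"`. It is not JWileymisc's implementation. It assumes a normal distribution, estimates the mean and SD from the data, and flags values in the most extreme `ev.perc` of that distribution. Splitting `ev.perc` evenly across both tails is also an assumption. The helper `flag_extremes` is hypothetical.

```r
# Hypothetical helper: flag values beyond the normal quantiles that cut off
# ev_perc of the distribution (ev_perc / 2 in each tail)
flag_extremes <- function(x, ev_perc = 0.005) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  cutoff <- qnorm(1 - ev_perc / 2)  # about 2.81 for ev_perc = .005
  abs(z) > cutoff
}

x <- c(1:20, 100)        # made-up data with one obvious outlier
which(flag_extremes(x))  # 21 (the value 100)
```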
What 3 plots are shown when using plot(testDistribution())?
Density plot, rug plot, deviates plot
What do you need to do when mapping additional categorical variables onto graphs?
Convert the variable into a factor
* db[, sex := factor(sex, levels = c(1, 2),
  labels = c("male", "female"))]
* ggplot(db[!is.na(sex)], aes(Stress, colour = sex)) + geom_density()
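Below is a runnable version of the factor example. It assumes data.table and ggplot2 are installed and uses made-up data where sex is coded 1/2 and some values are missing.

```r
library(data.table)
library(ggplot2)

set.seed(3)
# Made-up data: sex coded 1/2 with some missing values
db <- data.table(
  Stress = rnorm(30, mean = 20, sd = 5),
  sex = rep(c(1, 2, NA), length.out = 30)
)

# Turn the numeric codes into a labelled factor
db[, sex := factor(sex, levels = c(1, 2), labels = c("male", "female"))]

# Drop missing sex before plotting, then colour the density lines by sex
p <- ggplot(db[!is.na(sex)], aes(Stress, colour = sex)) +
  geom_density()

print(p)
```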
When using geom_histogram, it is more helpful to control what?
Fill colour (fill =), since colour = only changes the bar outlines
E.g. ggplot(db[!is.na(sex)], aes(Stress, fill = sex)) +
  geom_histogram()
What argument puts bars side by side when using geom_histogram()?
geom_histogram(position = "dodge")
* bars are stacked by default
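This sketch compares the default stacked histogram with the dodged version. It uses made-up, overlapping scores for two groups, and `bins = 10` is an arbitrary choice.

```r
library(ggplot2)

# Made-up, overlapping stress scores for two groups
df <- data.frame(
  Stress = c(seq(1, 10, length.out = 20), seq(3, 12, length.out = 20)),
  sex = factor(rep(c("male", "female"), each = 20))
)

# fill = colours the inside of the bars; bars are stacked by default
p_stacked <- ggplot(df, aes(Stress, fill = sex)) +
  geom_histogram(bins = 10)

# position = "dodge" puts each group's bar side by side instead
p_dodged <- ggplot(df, aes(Stress, fill = sex)) +
  geom_histogram(bins = 10, position = "dodge")

print(p_dodged)
```

In the stacked version, one group's bars start on top of the other's. In the dodged version, every bar starts at zero, so each group's counts are easier to read off.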
What are 3 common geoms used for bivariate graphs?
geom_point() for scatter plots, geom_line() for line graphs, geom_bar(stat = "identity") so bar heights are the actual data values
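Here is a sketch of the three bivariate geoms on made-up data. With `stat = "identity"`, `geom_bar()` draws the y values as given instead of counting rows. `geom_col()` is ggplot2's shortcut for the same thing.

```r
library(ggplot2)

# Made-up group means: the bar heights are the values themselves
means <- data.frame(group = c("A", "B", "C"), stress = c(12, 18, 15))

p_bar <- ggplot(means, aes(group, stress)) +
  geom_bar(stat = "identity")   # same result as geom_col()

# Made-up scores over time: scatter points joined by a line
scores <- data.frame(week = 1:5, stress = c(20, 18, 17, 15, 16))
p_line <- ggplot(scores, aes(week, stress)) +
  geom_point() +
  geom_line()

print(p_bar)
```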
What is best practice for data visualization?
More data, less ink (maximise the share of ink that shows data, minimise decoration)