W3: Data Visualization Flashcards by Val Y

What are the 2 methods of scoring scales?

Added together for total sum score
Average of all items

What are the two ways of averaging item scales using rowMeans()?

Plain average of the items
Average multiplied by the number of items (returns the score to the sum-score metric)
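
As a rough illustration of the two approaches, the sketch below scores a made-up set of responses both ways. The PSS1 to PSS4 column names follow the cards; the data values are invented for the example.

```r
# Toy item responses; values are invented for illustration only
items <- data.frame(
  PSS1 = c(1, 3, 4),
  PSS2 = c(2, 3, 0),
  PSS3 = c(1, 4, 2),
  PSS4 = c(0, 2, 2)
)

# Way 1: plain average of the items
avg_score <- rowMeans(items, na.rm = TRUE)

# Way 2: average multiplied by the number of items,
# which puts the score back on the sum-score metric
scaled_score <- rowMeans(items, na.rm = TRUE) * ncol(items)

# With no missing data, way 2 matches the total sum score
sum_score <- rowSums(items)
```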

What is .SD?

Refers to Subset (S) of Data (D)
On its own, returns all the data you're working on
E.g. unicorn[, .SD]
.SDcols tells data.table which columns you want
db[, StressAVG := rowMeans(.SD, na.rm = TRUE), .SDcols = c("PSS1", "PSS2", "PSS3", "PSS4")]
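
A minimal sketch of .SD and .SDcols in action, assuming the data.table package is installed. The table contents and the ID column are invented for the example; only the PSS column names and the StressAVG line come from the card.

```r
library(data.table)

# Toy data; column names follow the card, values are invented
db <- data.table(
  ID   = 1:3,
  PSS1 = c(1, 3, NA),
  PSS2 = c(2, 3, 0),
  PSS3 = c(1, 4, 2),
  PSS4 = c(0, 2, 4)
)

# On its own, .SD returns all the columns being worked on
all_cols <- db[, .SD]

# .SDcols restricts .SD to the named columns before rowMeans() sees it
db[, StressAVG := rowMeans(.SD, na.rm = TRUE),
   .SDcols = c("PSS1", "PSS2", "PSS3", "PSS4")]
```

Because ID is left out of .SDcols, it does not leak into the average.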

What function is used to calculate reliability of a scale?

psych::alpha()
* refers to Cronbach’s alpha

What should you add to psych::alpha when using reverse scored scales?

check.keys = TRUE
E.g. psych::alpha(
  as.data.frame(db[, .(PSS1, PSS2r, PSS3r, PSS4)]),
  check.keys = TRUE)
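
To see why reverse scoring matters for reliability, the sketch below computes raw Cronbach's alpha by hand from its standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). The cronbach_alpha() helper is hypothetical and written for this example; it is not how psych::alpha() is implemented, and the data and the 0 to 4 response range are assumptions.

```r
# Toy responses on an assumed 0-4 scale; values are invented
items <- data.frame(
  PSS1 = c(0, 1, 3, 4, 2),
  PSS2 = c(4, 3, 1, 0, 2),  # negatively worded item
  PSS3 = c(3, 3, 0, 1, 2),  # negatively worded item
  PSS4 = c(1, 0, 4, 3, 2)
)

# Reverse score the negatively worded items: (min + max) - x
items$PSS2r <- 4 - items$PSS2
items$PSS3r <- 4 - items$PSS3
scored <- items[, c("PSS1", "PSS2r", "PSS3r", "PSS4")]

# Hypothetical helper: raw alpha from the textbook formula
cronbach_alpha <- function(x) {
  k <- ncol(x)
  item_var <- sum(apply(x, 2, var))   # sum of item variances
  total_var <- var(rowSums(x))        # variance of the total score
  (k / (k - 1)) * (1 - item_var / total_var)
}

alpha_reversed   <- cronbach_alpha(scored)
alpha_unreversed <- cronbach_alpha(items[, c("PSS1", "PSS2", "PSS3", "PSS4")])
```

In this toy data, forgetting to reverse the two items gives a far lower alpha than the correctly keyed version.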

What do aesthetics do and what are 4 examples of them in ggplot2?

Control how geoms (geometric objects) are displayed
Size, shape, colour, transparency level

What are 4 common geoms used for univariate graphs?

geom_histogram( ) , geom_density( ) , geom_dotplot( ), geom_qq( )

What argument does geom_qq() need?

aes(sample = scale(predictor)): the sample aesthetic, with scale() z-scoring the data
* z = (x - mean) / SD
geom_abline(intercept = 0, slope = 1): the line on which all points would fall if the data were normally distributed
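
The z-score formula above can be checked against scale() in base R. This is only a sketch with invented scores; the last line shows the theoretical normal quantiles that a Q-Q plot compares the z-scores against, which is why the reference line has intercept 0 and slope 1.

```r
# Toy scores; values are invented
x <- c(10, 12, 15, 18, 25)

# Manual z-score: z = (x - mean) / SD
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so flatten it before comparing
z_scaled <- as.numeric(scale(x))

# Theoretical standard-normal quantiles for a sample of this size
theoretical <- qnorm(ppoints(length(x)))
```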

What function is used to check distribution?

plot(testDistribution())
E.g. plot(testDistribution(db$Stress, extremevalues = "theoretical", ev.perc = .005))

What 3 plots are shown when using plot(testDistribution())?

Density plot, rug plot, deviates plot

What do you need to do when mapping additional categorical variables onto graphs?

Convert variable into a factor
* db[, sex := factor(sex, levels = c(1, 2), labels = c("male", "female"))]
* ggplot(db[!is.na(sex)], aes(Stress, colour = sex)) + geom_density()
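
A small self-contained version of the two steps, assuming ggplot2 is installed. It uses a plain data.frame rather than a data.table to keep the example short, and the values are invented; the 1 = male, 2 = female coding follows the card.

```r
library(ggplot2)

# Toy data; values are invented, coding 1 = male, 2 = female as in the card
db <- data.frame(
  sex    = c(1, 2, 2, 1, NA, 2, 1, 2),
  Stress = c(1.5, 2.0, 3.5, 1.0, 2.5, 3.0, 2.0, 2.5)
)

# Convert the numeric codes to a labelled factor
db$sex <- factor(db$sex, levels = c(1, 2), labels = c("male", "female"))

# Drop rows with missing sex, then draw one density curve per level
p <- ggplot(db[!is.na(db$sex), ], aes(Stress, colour = sex)) +
  geom_density()
```

Without the factor conversion, ggplot2 would treat the 1/2 codes as a continuous variable rather than as two groups.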

When using geom_histogram, it is more helpful to control what?

Fill colour
E.g. ggplot(db[!is.na(sex)], aes(Stress, fill = sex)) + geom_histogram()

What is the argument to have bars side by side when using geom_histogram?

geom_histogram(position = "dodge")
* bars are stacked by default

What are 3 common geoms used for bivariate graphs?

geom_point() for a scatter plot
geom_line()
geom_bar(stat = "identity") so that the values are the actual bar heights

What is best practice for data visualization?

More data, less ink

What are 4 ways to reduce ink and provide more data in graphs?

Remove background borders - theme_pubr()
Remove axis lines - theme(axis.line = element_blank() )
Replace geom_bar with geom_point
Use shapes for values - scale_shape_manual(name = "Sex", values = c("male" = 1, "female" = 3))

How do you make the axes span only the range of the observed data?

Using geom_rangeframe()

What are 2 ways to add [interquartile] break points to axis labels?

Using quantile()
* scale_x_continuous(breaks = as.numeric(quantile(db$Stress)))
* scale_y_continuous(breaks = as.numeric(quantile(db$SE)))
Using scale_x/y_discrete()
* scale_y_continuous(labels = percent) +
  scale_x_discrete(
    breaks = c("High SE", "Low SE"),
    labels = c("High SE (median)", "Low SE (median)"))
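
To show what the quantile() approach actually passes to the axis, here is a base R sketch with invented scores. By default quantile() returns the minimum, the quartiles, and the maximum, and as.numeric() strips the percentage names so only the break values remain.

```r
# Toy scores; values are invented
stress <- c(1, 2, 2, 3, 4, 5, 5, 6, 9)

# quantile() defaults to the 0%, 25%, 50%, 75% and 100% points
q <- quantile(stress)

# as.numeric() drops the percentage names, leaving plain break values
breaks <- as.numeric(q)
```

The resulting vector is what would go into scale_x_continuous(breaks = breaks), so the tick marks themselves carry the five-number summary.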

What are the functions used for boxplot with raw data shown?

geom_boxplot() + geom_jitter()

What are the 2 ways to provide mean and/or 95% CI on graphs?

stat_summary(fun.data = mean_cl_normal)
Using prop.test()
LL = prop.test(x = sum(sex == "female", na.rm = TRUE), n = sum(!is.na(sex)), correct = FALSE)$conf.int[1]
UL = prop.test(x = sum(sex == "female", na.rm = TRUE), n = sum(!is.na(sex)), correct = FALSE)$conf.int[2]
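
The prop.test() approach can be tried on a toy vector in base R. The data below are invented, and the intermediate names n_female, n_valid, and p_hat are introduced only for this sketch; LL and UL follow the card.

```r
# Toy data; values are invented
sex <- c(rep("female", 7), rep("male", 5), NA)

n_female <- sum(sex == "female", na.rm = TRUE)  # number of "successes"
n_valid  <- sum(!is.na(sex))                    # non-missing observations

# correct = FALSE turns off the continuity correction
ci <- prop.test(x = n_female, n = n_valid, correct = FALSE)$conf.int

p_hat <- n_female / n_valid  # observed proportion
LL <- ci[1]                  # lower limit of the 95% CI
UL <- ci[2]                  # upper limit of the 95% CI
```

Calling prop.test() once and indexing the interval twice avoids repeating the same test for each limit.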

What should you do before graphing all categorical variables?

Make 1 variable “continuous” by getting their percentages using egltable()

What function provides a multi-panel plot, which is useful when graphing all categorical variables?

facet_grid and/or coord_flip

What is the common graph/geom for all continuous variables?

geom_point(), i.e. a scatter plot

What are 4 things you can add to a graph with all continuous variables?

Correlation coefficient and p-value using
cor.test(~ SE + Stress, data = db)
Regression line using
stat_smooth(method = "lm")
Text annotation using
annotate("text", x = max(db$Stress), y = max(db$SE),
  label = "r = -0.65, p < .001",
  size = 6, hjust = 1, vjust = 1)
Histograms in the margins using
ggMarginal(x, type = "histogram")
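
Rather than typing the annotation label by hand, it can be built from the cor.test() result. The sketch below is one possible way to do that in base R, with invented data; the label format mirrors the card, but the formatting rule for small p-values is an assumption of this example.

```r
# Toy data; values are invented
db <- data.frame(
  Stress = c(1, 2, 3, 4, 5, 6, 7, 8),
  SE     = c(8, 7, 7, 5, 4, 4, 2, 1)
)

# Pearson correlation test using the formula interface from the card
res <- cor.test(~ SE + Stress, data = db)

r <- unname(res$estimate)  # correlation coefficient
p <- res$p.value           # p-value

# Build the annotation label from the computed values
p_text <- if (p < .001) "< .001" else sprintf("= %.3f", p)
label <- sprintf("r = %.2f, p %s", r, p_text)
```

The resulting string can be passed to annotate("text", ..., label = label) so the plot stays in sync with the data.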

How do you make more space for long axis labels?

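Rotate the text or rotate the graph (the two options can be shown side by side with ggarrange(), each titled with ggtitle(), e.g. "rotate text" and "rotate graph")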

What are 3 ways to improve geom_dotplot visualizations?

1. binwidth = .1 to shrink dot size
2. alpha = .2 for dot transparency
3. y = jitter to add noise to the scores

What are 2 scenarios you would use geom_violin?

1. For large datasets
2. To compare distributions across variables

What is a benefit of using rowMeans instead of simply adding all variable scores together?

With na.rm = TRUE, rowMeans() averages only the items a person answered, which in effect imputes that person's own mean for their missing items
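
The difference shows up as soon as one item is missing. In this sketch (invented values, PSS column names from the cards), person 2 skipped one item: summing treats the gap as 0, while averaging and rescaling fills it with that person's own mean.

```r
# Toy responses; person 2 skipped PSS3 (values are invented)
items <- data.frame(
  PSS1 = c(2, 3),
  PSS2 = c(2, 3),
  PSS3 = c(2, NA),
  PSS4 = c(2, 3)
)

# Summing with na.rm = TRUE silently treats the missing item as 0
sum_score <- rowSums(items, na.rm = TRUE)

# Averaging uses only the answered items ...
avg_score <- rowMeans(items, na.rm = TRUE)

# ... so rescaling to the sum metric fills the gap with the person's own mean
rescaled <- avg_score * ncol(items)
```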

(28 cards)