Week 3: Univariate Flashcards
In psychology, scales/subscales are scored in which two ways?
Either the individual items are simply added together, or the individual items are averaged.
Sometimes items are worded in the opposite direction and must first be reverse coded and then added/averaged.
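A minimal sketch of reverse coding followed by both scoring methods. The item names and the 1 to 5 response range are hypothetical, chosen only for illustration: a reverse-worded item is flipped by subtracting it from the scale minimum plus the scale maximum.

```r
# Hypothetical responses on a 1-5 scale; item2 is worded in the opposite direction
d <- data.frame(
  item1 = c(5, 4, 2),
  item2 = c(1, 2, 4),
  item3 = c(4, 4, 1)
)

scale_min <- 1  # assumed lowest response option
scale_max <- 5  # assumed highest response option

# Reverse code: 1 becomes 5, 2 becomes 4, and so on
d$item2_r <- (scale_min + scale_max) - d$item2

# Score the scale by adding or by averaging the (recoded) items
d$total <- d$item1 + d$item2_r + d$item3
d$mean  <- rowMeans(d[, c("item1", "item2_r", "item3")])
```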
What is the rule of thumb for scoring in data management?
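Use the rowMeans() function to average the items. If the scale is typically added up, multiply the mean by the number of items.
The alternative is to list each item variable name separated by + for addition.
The downside of adding with + is that if a participant misses a single item, they are missing on the entire subscale. rowMeans() with na.rm = TRUE instead averages the items that were answered.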
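To illustrate the difference, here is a small sketch with made-up data in which one participant skipped an item. The data frame and item names are hypothetical.

```r
# Hypothetical items; participant 2 skipped item2
d <- data.frame(
  item1 = c(3, 4, 2),
  item2 = c(4, NA, 2),
  item3 = c(5, 2, 2)
)
items <- c("item1", "item2", "item3")

# Adding with + : one missing item makes the whole score missing
d$sum_plus <- d$item1 + d$item2 + d$item3

# rowMeans() with na.rm = TRUE averages the items that were answered
d$avg <- rowMeans(d[, items], na.rm = TRUE)

# If the scale is conventionally summed, multiply the mean by the number of items
d$sum_scaled <- d$avg * length(items)
```

Participant 2 is missing on sum_plus but still receives a score on avg and sum_scaled.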
It’s standard to report the internal consistency or reliability of a scale. How do we get this from the psych package?
By using the alpha() function.
The long way of typing it is psych::alpha(), which tells R we want to use the alpha() function from the psych package. Mostly we don’t need to do this.
In this case we need to specify psych::alpha() because we have another package, ggplot2, loaded that also has an alpha() function, so the prefix makes sure R doesn’t use the wrong one. If we were using only the psych package or only the ggplot2 package, it would not be necessary.
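A rough sketch of the call, assuming the psych package is installed. The items are simulated here purely for illustration and are not course data.

```r
library(psych)

# Simulated, hypothetical item responses that share a common factor
set.seed(1)
common <- rnorm(200)
items <- data.frame(
  item1 = common + rnorm(200),
  item2 = common + rnorm(200),
  item3 = common + rnorm(200)
)

# The package prefix removes any ambiguity with ggplot2's alpha()
res <- psych::alpha(items)

# Raw Cronbach's alpha for the whole scale
res$total$raw_alpha
```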
What does the .SD symbol in data.table mean?
Currently selected data.
That is, whatever rows we picked and whichever columns are specified by .SDcols, which is listed at the end of the call.
Why is .SD needed in the rowMeans() function?
Because rowMeans() expects to be given a data set, not individual variables. We are already calling it within db, a data set, so we need some way of referring to a subset of that data set from inside the data.table, and the way we do that is with .SD.
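A minimal sketch of this pattern, assuming the data.table package is installed. The name db follows the text above, while the columns and values are hypothetical.

```r
library(data.table)

# Hypothetical data set; the columns are illustrative only
db <- data.table(
  id    = 1:3,
  item1 = c(3, 4, 2),
  item2 = c(4, NA, 2),
  item3 = c(5, 2, 2)
)

# .SD is the subset of db restricted to the columns listed in .SDcols,
# so rowMeans() receives a data set rather than individual variables
db[, score := rowMeans(.SD, na.rm = TRUE),
   .SDcols = c("item1", "item2", "item3")]
```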
Under the system of ggplot2, are line plots and scatter plots different?
No, they are essentially the same. Both map data to the x and y axes.
What is the difference between line plots and scatterplot data?
The plotting symbol (geometries, labelled geoms in ggplot2): it is a point in a scatter plot and a line in a line plot.
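As a small sketch, assuming ggplot2 is installed and using made-up data, the two plots share the same data and mapping and differ only in the geom:

```r
library(ggplot2)

# Hypothetical data
d <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

# Same data and same x/y mapping; only the geom changes
p_scatter <- ggplot(d, aes(x = x, y = y)) + geom_point()
p_line    <- ggplot(d, aes(x = x, y = y)) + geom_line()

# Use print(p_scatter) or print(p_line) to draw either plot
```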
What are aesthetics in ggplot2 and what do they do?
They control how geometries are displayed. For example, the size, shape, colour, transparency level all are aesthetics.
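A brief sketch of aesthetics in use, again with hypothetical data. Aesthetics can be fixed to one value for every point, or mapped from a variable inside aes().

```r
library(ggplot2)

# Hypothetical data with a grouping variable
d <- data.frame(
  x     = 1:6,
  y     = c(2, 4, 3, 6, 5, 7),
  group = rep(c("a", "b"), 3)
)

# Fixed aesthetics, set outside aes(), apply to every point
p_fixed <- ggplot(d, aes(x = x, y = y)) +
  geom_point(size = 3, shape = 17, colour = "blue", alpha = 0.5)

# Mapped aesthetics, set inside aes(), vary with a variable
p_mapped <- ggplot(d, aes(x = x, y = y, colour = group)) +
  geom_point(size = 3)
```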
What do density plots do?
They attempt to provide an empirical approximation of the probability density function (PDF) for the data.
What is a probability density function (PDF)?
A function describing how probability is spread over the values of a continuous variable. It always integrates to one (i.e., the area under the curve is always one). The underlying assumption is that the observed data likely come from some relatively smooth distribution, so typically a smoothing kernel is used so that you see the approximate density of the data, rather than seeing exactly where each data point falls. Density plots show a univariate distribution.
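The area-under-the-curve property can be checked numerically. This sketch uses base R's density() on simulated data and a simple trapezoid rule; the result is only approximately one because the estimated curve is evaluated on a finite grid.

```r
# Simulated, hypothetical data
set.seed(42)
x <- rnorm(100)

# Kernel density estimate (Gaussian smoothing kernel by default)
dens <- density(x)

# Trapezoid rule: approximate the area under the estimated curve
area <- sum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2)
area  # close to 1
```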
If you have a small data set would you use a histogram or a dotplot?
A dotplot, as it provides greater precision.
If two data points would overlap, they are vertically displaced leading to another name: stacked dot plots.
What type of data does dotplot show?
Raw
What type of distribution do dotplots show?
Univariate
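A minimal dotplot sketch for a small, made-up data set, assuming ggplot2 is installed. The binwidth value is an illustrative choice, not a recommendation.

```r
library(ggplot2)

# Small hypothetical data set with repeated values
d <- data.frame(score = c(1, 2, 2, 3, 3, 3, 4, 4, 5))

# Each dot is one raw observation; equal values are stacked vertically
p_dot <- ggplot(d, aes(x = score)) +
  geom_dotplot(binwidth = 0.25)

# Use print(p_dot) to draw the plot
```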
For basic univariate plots, do we need values on the y axis?
No, just on the x axis.
How does the Y axis differ between a histogram and a density plot?
In a histogram the y axis shows the count of observations in each bin, whereas in a density plot the y axis shows the density: the relative frequency (more like a percentage) of observations, scaled so that the total area under the curve is one.
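The two scales can be compared with base R's hist(), which stores both the counts and the density for each bin. The data are simulated for illustration.

```r
# Simulated, hypothetical data
set.seed(7)
x <- rnorm(50)

# Compute the histogram without drawing it
h <- hist(x, plot = FALSE)

# Count scale: bin counts add up to the number of observations
total_count <- sum(h$counts)

# Density scale: bar height times bar width adds up to one
total_area <- sum(h$density * diff(h$breaks))
```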