Week 3: Univariate Flashcards

1
Q

In psychology, scales/subscales are scored in which two ways?

A

Either the individual items are simply added together, or the individual items are averaged.
Sometimes items are worded in the opposite direction and must first be reverse coded before being added/averaged.

2
Q

What is the rule of thumb for scoring in data management?

A

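Use the rowMeans function. If the scale is typically added up, multiply the row mean by the number of items.

The alternative is to list each item variable name separated by + for addition.

Downside of adding with +: if a participant misses a single item, they are missing on the entire subscale.

The rowMeans function can instead average whichever items were answered (na.rm = TRUE).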
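
A minimal sketch of the two scoring approaches, assuming a hypothetical three-item subscale (item1 to item3) where one participant skipped an item. The na.rm = TRUE argument is an assumption about how missing items should be handled:

```r
# Hypothetical three-item subscale; participant 2 skipped item2.
d <- data.frame(
  item1 = c(1, 2, 3),
  item2 = c(2, NA, 4),
  item3 = c(3, 4, 5)
)

# Adding with + : a single missing item makes the whole score NA.
sum_plus <- d$item1 + d$item2 + d$item3

# rowMeans with na.rm = TRUE averages the items that were answered.
mean_score <- rowMeans(d[, c("item1", "item2", "item3")], na.rm = TRUE)

# For a scale that is conventionally summed, multiply by the number of items.
sum_score <- mean_score * 3
```

Participant 2 is NA in sum_plus but still receives a score from rowMeans.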

3
Q

It’s standard to report the internal consistency or reliability of a scale. How do we get this from the psych package?

A

By using the alpha() function.

The long way of typing it is psych::alpha(), which tells R we want to use the alpha function from the psych package. Mostly we don’t need to do this.

In this case we need to specify psych::alpha() because we have another package, ggplot2, loaded that also has an alpha() function, so we need to make sure R doesn’t get confused. If we were using only the psych package or only the ggplot2 package, it would not be necessary.
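
A small sketch of the namespaced call, assuming both packages are installed. The simulated items (item1 to item3) are hypothetical and only there to give alpha() something to work on:

```r
library(psych)
library(ggplot2)  # also exports an alpha() function (for colour transparency)

# Hypothetical items that share a common component, so they correlate.
set.seed(1)
common <- rnorm(100)
items <- data.frame(
  item1 = common + rnorm(100),
  item2 = common + rnorm(100),
  item3 = common + rnorm(100)
)

# The psych:: prefix makes sure we get the reliability function,
# not the transparency helper from ggplot2.
rel <- psych::alpha(items)
rel$total$raw_alpha  # Cronbach's alpha for the three items
```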

4
Q

What does the .SD symbol in data.table mean?

A

Currently selected data.

That is, whatever rows we picked and whichever columns are specified by .SDcols, which is listed at the end of the call.

5
Q

Why is .SD needed in the rowMeans() function?

A

Because rowMeans() expects to be given a data set, not individual variables. We are already calling it within db, a dataset, so we need some way of referring to a subset of that dataset from inside the data.table, and the way we do that is with .SD.
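
A minimal sketch of .SD with rowMeans(), assuming a hypothetical data.table called db with three item columns. The column names and na.rm = TRUE are assumptions for illustration:

```r
library(data.table)

# Hypothetical dataset: an id column plus three subscale items.
db <- data.table(
  id = 1:3,
  item1 = c(1, 2, 3),
  item2 = c(2, NA, 4),
  item3 = c(3, 4, 5)
)

# .SD is the subset of db restricted to the columns named in .SDcols,
# so rowMeans() receives a dataset of just the three items (id is excluded).
db[, score := rowMeans(.SD, na.rm = TRUE),
   .SDcols = c("item1", "item2", "item3")]
```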

6
Q

Under the system of ggplot2, are line plots and scatter plots different?

A

No, they are essentially the same. Both map data to the x and y axes.

7
Q

What is the difference between line plots and scatterplot data?

A

The plotting symbol (geometries, labelled geoms in R) is a point or a line.
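
As a sketch with made-up data, the same x and y mapping gives a scatter plot or a line plot depending only on the geom that is added:

```r
library(ggplot2)

# Hypothetical data, purely for illustration.
d <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

# One shared mapping of data to the x and y axes.
base <- ggplot(d, aes(x = x, y = y))

scatter <- base + geom_point()  # plotting symbol is a point
line <- base + geom_line()      # plotting symbol is a line
```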

8
Q

What are aesthetics in ggplot2 and what do they do?

A

They control how geometries are displayed. For example, size, shape, colour, and transparency level are all aesthetics.
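
A short sketch with made-up data, setting each of the aesthetics named above to a fixed value on a point geom. The specific values are arbitrary:

```r
library(ggplot2)

# Hypothetical data, purely for illustration.
d <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

# size, shape, colour, and alpha (transparency) change how the points
# are drawn without changing the data mapped to x and y.
p <- ggplot(d, aes(x = x, y = y)) +
  geom_point(size = 3, shape = 17, colour = "blue", alpha = 0.5)
```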

9
Q

What do density plots do?

A

They attempt to provide an empirical approximation of the probability density function (PDF) for the data.

10
Q

What is a probability density function (PDF)?

A

A probability density function always integrates to one (i.e., if you integrated to get the area under the curve, it would always be one). For density plots, the underlying assumption is that the observed data likely come from some relatively smooth distribution, so typically a smoothing kernel is used so that you see the approximate density of the data, rather than seeing exactly where each data point falls. Density plots show a univariate distribution.
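
A rough numerical check of the area-under-the-curve idea, using base R's density() on simulated data. The simulated sample and the simple Riemann-sum approximation are assumptions for illustration, so the result is only approximately one:

```r
set.seed(42)
x <- rnorm(200)  # simulated data, purely illustrative

# Kernel density estimate (Gaussian smoothing kernel by default),
# evaluated on an equally spaced grid of x values.
d <- density(x)

# Approximate the area under the curve: sum of heights times grid spacing.
area <- sum(d$y) * diff(d$x)[1]
area
```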

11
Q

If you have a small data set would you use a histogram or a dotplot?

A

A dotplot, as it provides greater precision.

If two data points would overlap, they are vertically displaced leading to another name: stacked dot plots.

12
Q

What type of data does a dotplot show?

A

Raw

13
Q

What type of distribution do dotplots show?

A

Univariate

14
Q

For basic univariate plots, do we need values on the y axis?

A

No, just on the x axis.

15
Q

How does the Y axis differ between a histogram and a density plot?

A

In a histogram the Y axis shows the COUNT of the observations in each bin, whereas in a density plot the Y axis shows the density: a relative measure (more like a proportion than a count), scaled so that the total area under the curve is one.
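
A sketch of the two plots on the same simulated variable. Only x is mapped, and ggplot2 computes the y values (counts or density) itself. The data and the choice of 10 bins are assumptions for illustration:

```r
library(ggplot2)

set.seed(7)
d <- data.frame(x = rnorm(50))  # simulated data, purely illustrative

# Histogram: y axis is the count of observations in each bin.
p_hist <- ggplot(d, aes(x = x)) + geom_histogram(bins = 10)

# Density plot: y axis is the estimated density.
p_dens <- ggplot(d, aes(x = x)) + geom_density()

# The computed bin counts add up to the number of observations.
hist_counts <- layer_data(p_hist)$count
sum(hist_counts)
```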

16
Q

On a QQ plot what is the Y axis showing?

A

On the Y axis, the QQ plot shows our sample of the variable (whichever variable in the dataset we are checking), and it plots this against some theoretical distribution. For a normal QQ plot, it shows how close the sample is to a normal (theoretical) distribution.
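
A minimal sketch using base R's qqnorm() on simulated data (an assumption; the same idea applies to a QQ plot drawn with any other package):

```r
set.seed(11)
x <- rnorm(100)  # simulated sample, purely illustrative

# qqnorm() returns the plotted coordinates:
#   $x = theoretical normal quantiles, $y = the sample values.
qq <- qqnorm(x, plot.it = FALSE)

# Draw the plot with a reference line; points near the line suggest
# the sample is close to a normal distribution.
qqnorm(x)
qqline(x)
```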

17
Q

What is a univariate plot showing?

A

The frequency or distribution of a single variable

18
Q

What is a bivariate plot showing?

A

The relationship between two variables

19
Q

What do we use z scores for in checking the distribution of our data?

A

If a variable follows a normal distribution, we use z scores to identify extreme values or outliers.
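
A small sketch of flagging extreme values with z scores. The data are made up, and the cut-off of 3 is an assumed convention, not something stated on the card:

```r
# Hypothetical data: 30 ordinary values plus one extreme value (40).
x <- c(rep(c(10, 11, 12), 10), 40)

# z score: distance from the mean in standard deviation units.
z <- (x - mean(x)) / sd(x)

# Flag values more than 3 standard deviations from the mean (assumed cut-off).
extreme <- abs(z) > 3
which(extreme)
```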

20
Q

In a density plot if the two lines (blue dotted and black) are close together what does this indicate?

A

The variable is approximately normally distributed.

21
Q

When there are multiple univariate distributions to view, would we use a histogram or density plot?

A

A density plot. Histograms of multiple distributions are difficult to view because they are either stacked, which makes interpretation more difficult, or dodged, which is visually difficult to see, or overplotted, which can hide some of the data.
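
A sketch of overlaying two univariate distributions as density lines, assuming a hypothetical long-format dataset with a value column and a group column:

```r
library(ggplot2)

# Hypothetical data: two groups with different means.
set.seed(3)
d <- data.frame(
  value = c(rnorm(100, mean = 0), rnorm(100, mean = 2)),
  group = rep(c("a", "b"), each = 100)
)

# Mapping colour to group draws one density line per group on the same axes.
p <- ggplot(d, aes(x = value, colour = group)) +
  geom_density()
```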