Week 3: Univariate Flashcards

Question 1

Q

In psychology, scales/subscales are scored in which two ways?

Answer

A

Either the individual items are simply added together, or the individual items are averaged.
Sometimes items are worded in the opposite direction and must first be reverse-coded before being added or averaged.

Question 2

Q

What is the rule of thumb for scoring in data management?

Answer

A

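Use the rowMeans() function to average the items. If the scale is typically added up, then multiply the mean by the number of items.

The alternative is to list each item variable name separated by + for addition.

The downside of adding with + is that if a participant misses a single item, they are missing on the entire subscale. rowMeans() can avoid this, because with na.rm = TRUE it averages over the items that were answered.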
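
A minimal sketch of the two scoring approaches in base R. The data frame and its column names (item1 to item3) are made up for illustration and are not from the course data.

```r
# Hypothetical item responses for four participants on a three-item subscale.
items <- data.frame(
  item1 = c(1, 2, 3, 4),
  item2 = c(2, 2, NA, 4),
  item3 = c(3, 2, 3, 4)
)

# Adding with + propagates NA: participant 3 gets no score at all.
sum_plus <- items$item1 + items$item2 + items$item3

# rowMeans() with na.rm = TRUE averages over the items that were answered.
mean_score <- rowMeans(items, na.rm = TRUE)

# For a scale that is conventionally summed, rescale the mean by the item count.
sum_score <- mean_score * ncol(items)

print(data.frame(sum_plus, mean_score, sum_score))
```

Participant 3 skipped one item, so the + version gives NA while the rowMeans() version still produces a score.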

Question 3

Q

It’s standard to report the internal consistency or reliability of a scale. How do we get this from the psych package?

Answer

A

By using the alpha() function.

The long way of typing it is psych::alpha(), which tells R we want to use the alpha() function from the psych package. Most of the time we don't need to do this.

In this case we need to specify psych::alpha() because we have another package, ggplot2, loaded that also has an alpha() function, so we need to make sure R doesn't get confused about which one we mean. If we were only using psych or only using ggplot2, it would not be necessary.
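
A small sketch of the namespace prefix in use, assuming both the psych and ggplot2 packages are installed. The simulated items are hypothetical, not course data.

```r
set.seed(1)

# Hypothetical three-item scale: a shared factor plus item-specific noise.
f <- rnorm(100)
items <- data.frame(
  item1 = f + rnorm(100),
  item2 = f + rnorm(100),
  item3 = f + rnorm(100)
)

# Explicit namespace: internal consistency (Cronbach's alpha) from psych.
rel <- psych::alpha(items)
print(rel$total$raw_alpha)

# Explicit namespace: ggplot2's alpha() instead makes a colour transparent.
print(ggplot2::alpha("blue", 0.5))
```

Writing the package name before :: removes any dependence on which package was loaded last.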

Question 4

Q

What does the .SD symbol in data.table mean?

Answer

A

The currently selected data (.SD stands for "Subset of Data").

It holds whatever rows we picked and whichever columns we specified with .SDcols, which is listed at the end of the call.

Question 5

Q

Why is .SD needed in the rowMeans() function?

Answer

A

Because rowMeans() expects to be given a data set, not individual variables. We are already calling it within db, a data.table, so we need some way of referring to a subset of that dataset from inside the data.table. The way we do that is with .SD.
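
A minimal sketch of this pattern, assuming the data.table package is installed. The table contents and the column names (id, item1 to item3, score) are invented for illustration; only the name db follows the card.

```r
library(data.table)

# Hypothetical data.table with three item columns.
db <- data.table(
  id = 1:3,
  item1 = c(1, 2, 3),
  item2 = c(3, NA, 3),
  item3 = c(5, 4, 3)
)

# .SD is the subset of db holding only the columns named in .SDcols,
# so rowMeans() receives a data set rather than individual variables.
db[, score := rowMeans(.SD, na.rm = TRUE),
   .SDcols = c("item1", "item2", "item3")]

print(db)
```

Without .SDcols, .SD would include every column, so the id column would be averaged in as well.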

Question 6

Q

Under the system of ggplot2, are line plots and scatter plots different?

Answer

A

No, they are essentially the same. Both map data to the x and y axes.

Question 7

Q

What is the difference between line plots and scatter plots?

Answer

A

The plotting symbol (the geometry, labelled a geom in ggplot2): it is a point for a scatter plot and a line for a line plot.

Question 8

Q

What are aesthetics in ggplot2 and what do they do?

Answer

A

They control how geometries are displayed. For example, size, shape, colour, and transparency level are all aesthetics.
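
A short sketch of aesthetics on a point geom, assuming ggplot2 is installed. It uses the built-in mtcars data rather than any course dataset, and the specific values chosen are arbitrary.

```r
library(ggplot2)

# x and y are aesthetics mapped to variables inside aes();
# size, shape, colour and alpha are set here to fixed values.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3, shape = 17, colour = "blue", alpha = 0.5)

print(p)
```

Swapping geom_point() for geom_line() keeps the same x and y mapping and changes only the geometry that is drawn (the shape argument would then be dropped, since it does not apply to lines).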

Question 9

Q

What do density plots do?

Answer

A

They attempt to provide an empirical approximation of the probability density function (PDF) of the data.

Question 10

Q

What is a probability density function (PDF)?

Answer

A

A probability density function describes how densely the values of a continuous variable are distributed, and it always integrates to one (i.e., if you integrated to get the area under the curve, it would always be one). The underlying assumption is that the observed data likely come from some relatively smooth distribution, so typically a smoothing kernel is used so that you see the approximate density of the data, rather than seeing exactly where each data point falls. Density plots show a univariate distribution.
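
The "area under the curve is one" property can be checked numerically. This is a rough sketch using base R's density() on simulated data; the sample and the trapezoid-rule integration are illustrative choices, not part of the course material.

```r
set.seed(42)
x <- rnorm(200)  # hypothetical sample

# density() smooths the data with a kernel (Gaussian by default).
d <- density(x)

# Trapezoid rule over the evaluation grid approximates the area under the curve.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)  # close to 1
```

The result is only approximately one because the estimate is evaluated on a finite grid.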

Question 11

Q

If you have a small data set would you use a histogram or a dotplot?

Answer

A

A dotplot, as it provides greater precision.

If two data points would overlap, they are vertically displaced leading to another name: stacked dot plots.

Question 12

Q

What type of data does dotplot show?

Question 13

Q

What type of distribution do dotplots show?

Answer

A

Univariate

Question 14

Q

For basic univariate plots, do we need values on the y axis?

Answer

A

No, just on the x axis.

Question 15

Q

How does the Y axis differ between a histogram and a density plot?

Answer

A

In a histogram, the Y axis shows the count of observations in each bin, whereas in a density plot the Y axis shows the density: a relative frequency (more like a proportion than a count), scaled so that the total area under the curve is one.
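
The two scales can be compared side by side. This sketch uses base R's hist() as a stand-in for the ggplot2 plots, with simulated data that is not from the course.

```r
set.seed(42)
x <- rnorm(200)  # hypothetical sample

# plot = FALSE returns the binned values without drawing anything.
h <- hist(x, plot = FALSE)

print(h$counts)   # number of observations in each bin
print(h$density)  # the same bars rescaled so their total area is one

total_count <- sum(h$counts)                   # all 200 observations
total_area <- sum(h$density * diff(h$breaks))  # bar heights times bin widths
print(c(total_count, total_area))
```

The counts add up to the sample size, while the density values add up to one only after multiplying by the bin widths.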

Question 16

Q

On a QQ plot what is the Y axis showing?

Answer

A

The Y axis of a QQ plot shows the quantiles of our sample of the variable (whichever variable in the dataset we are checking), and it plots them against the quantiles of some theoretical distribution on the X axis. With a normal distribution as the theoretical one, the plot shows how close the variable is to normal.
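
A brief sketch of what sits on each axis, using base R's qqnorm() on simulated data. The sample is hypothetical, and base R is used here only to expose the plotted values.

```r
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)  # hypothetical sample

# plot.it = FALSE returns the coordinates instead of drawing the plot.
qq <- qqnorm(x, plot.it = FALSE)

# qq$y holds the sample values (Y axis);
# qq$x holds the matching theoretical normal quantiles (X axis).
print(head(data.frame(theoretical = qq$x, sample = qq$y)))

# For roughly normal data the points fall near a straight line.
print(cor(qq$x, qq$y))
```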

Question 17

Q

What is a univariate plot showing?

Answer

A

The frequency or distribution of a single variable

Question 18

Q

What is a bivariate plot showing?

Answer

A

The relationship between two variables

Question 19

Q

What do we use z scores for in checking the distribution of our data?

Answer

A

If a variable follows a normal distribution, we use z scores to identify extreme values or outliers.
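
A minimal sketch of flagging extreme values with z scores in base R. The data are invented, and the cutoff of 3 is an assumption for illustration; the card does not state which threshold the course uses.

```r
# Hypothetical scores: twenty ordinary values plus one extreme value.
x <- c(rep(c(10, 12), each = 10), 40)

# z score: distance from the mean in standard deviation units.
z <- (x - mean(x)) / sd(x)

# Assumed cutoff: flag anything more than 3 SDs from the mean.
cutoff <- 3
extreme <- which(abs(z) > cutoff)

print(round(z, 2))
print(x[extreme])
```

Only the value 40 is flagged here, because it lies more than three standard deviations above the mean.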

Question 20

Q

In a density plot if the two lines (blue dotted and black) are close together what does this indicate?

Answer

A

The variable is approximately normally distributed.

Question 21

Q

When there are multiple univariate distributions to view, would we use a histogram or density plot?

Answer

A

A density plot. Histograms of several distributions are difficult to view because the bars must be stacked, which makes interpretation more difficult; dodged, which is visually difficult to see; or overplotted, which can hide some of the data.

(21 cards)