Final exam Flashcards

Question

What is head()?

Answer 1

shows the first six observations (rows) of a dataset by default
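
A minimal sketch of head() on a made-up data frame (the `data` object and its columns are illustrative, not taken from the course). The optional n argument changes how many rows are returned.

```r
# Hypothetical data frame with 10 observations
data <- data.frame(id = 1:10, score = seq(50, 95, by = 5))

# With no n argument, head() returns the first 6 rows
first_rows <- head(data)

# Ask for a different number of rows with n
first_three <- head(data, n = 3)

print(first_rows)
```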

Answer 2

the relational operator that evaluates whether two values are equal to each other. The output is a logical value: TRUE or FALSE

Answer 3

creates the contents of a new variable based on the values of an existing one: "if the logical test is true, return this, else return that". In principle, this makes it possible to replace one set of values with another. For example, a text variable can be recoded as a numeric one: ifelse(data$var == "yes", 1, 0)

Answer 4

data$var <- ifelse(data$var == "yes", 1, 0) (ifelse() is just an example here)
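
A short sketch of recoding with ifelse() and storing the result with <-. The data frame and the new column name `var_numeric` are made up for illustration.

```r
# Hypothetical data frame with a text variable
data <- data.frame(var = c("yes", "no", "yes", "no"))

# If var equals "yes" return 1, otherwise return 0
recoded <- ifelse(data$var == "yes", 1, 0)

# Store the result as a new variable in the data frame
data$var_numeric <- recoded

print(data)
```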

Answer 5

[] is the operator used to extract a selection of observations from a variable. To its left, we specify the variable we want to subset. Inside the [] we specify the criterion of selection. Example: data$var1[data$variable2 == 1]
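
As an illustration, the sketch below subsets one variable by a condition on another. The values are invented; only the var1 and variable2 names come from the card.

```r
# Hypothetical data: var1 is the variable to subset, variable2 is a 0/1 indicator
data <- data.frame(var1 = c(10, 20, 30, 40),
                   variable2 = c(1, 0, 1, 0))

# Keep only the values of var1 for observations where variable2 equals 1
selected <- data$var1[data$variable2 == 1]

# A subset can be passed straight to another function, such as mean()
mean_selected <- mean(selected)

print(selected)
```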

Answer 6

in the same unit of measurement as the outcome variable

Answer 7

in percentage points (after multiplying the result by 100)

Answer 8

1. the predictor (X): variable that we use as the basis for our predictions 2. the outcome variable (Y): variable that we are trying to predict based on the values of the predictor

Answer 9

the values of Y we predict based on (i) the fitted model that summarizes the relationship between X and Y in a dataset where we observe both X and Y for each observation and (ii) the observed values of X

Answer 10

1. fit a model: (i) we observe both X and Y (ii) we summarize the relationship between the average Y and X with a model 2. make predictions: (i) we observe X but not Y (ii) we compute Ŷ by plugging the observed values of X into the fitted model
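
The two steps can be sketched as follows. The numbers are invented so that Y = 1 + 2X exactly, and the object names (`train`, `new_data`) are assumptions made for the example.

```r
# Step 1: fit a model on data where both X and Y are observed
train <- data.frame(x = c(1, 2, 3, 4, 5),
                    y = c(3, 5, 7, 9, 11))
fit <- lm(y ~ x, data = train)

# Step 2: predict Y for new observations where only X is observed
new_data <- data.frame(x = c(6, 7))
y_hat <- predict(fit, newdata = new_data)

print(y_hat)
```

Passing the data frame through the data argument (rather than writing train$y ~ train$x) is what allows predict() to match the column x in new_data.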

Answer 11

it measures how far our prediction is from the observed value; it's the difference between the observed outcome and predicted outcome
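
A small sketch of that difference, computed by hand on made-up data and compared with R's resid() function.

```r
# Hypothetical data
data <- data.frame(x = c(1, 2, 3, 4),
                   y = c(2, 4, 5, 9))
fit <- lm(y ~ x, data = data)

# Predicted outcomes for the observations used to fit the model
y_hat <- fitted(fit)

# Error for each observation: observed outcome minus predicted outcome
errors <- data$y - y_hat

print(errors)
```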

Answer 12

Yi = α + βXi + εi
Yi is the outcome for observation i
α (the Greek letter alpha) is the intercept coefficient
β (the Greek letter beta) is the slope coefficient
Xi is the value of the predictor (or independent variable) for observation i
εi (pronounced epsilon sub i) is the error for observation i

Answer 13

the intercept and the slope

Answer 14

Ŷ when X = 0. Increasing or decreasing the intercept moves the line up or down

Answer 15

specifies the angle or steepness of the line. It is the change in Ŷ associated with a one-unit increase in X

Answer 16

identifies the line that minimizes the "sum of the squared residuals"
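
To see the idea in practice, the sketch below compares the sum of squared residuals of the line fitted by lm() with that of two slightly different lines. The data and the helper function `ssr` are made up for illustration.

```r
# Hypothetical helper: sum of squared residuals for a line with given intercept and slope
ssr <- function(alpha, beta, x, y) {
  sum((y - (alpha + beta * x))^2)
}

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least squares line
fit <- lm(y ~ x)
a <- unname(coef(fit)[1])  # intercept
b <- unname(coef(fit)[2])  # slope

ssr_ls <- ssr(a, b, x, y)               # fitted line
ssr_shifted <- ssr(a + 0.5, b, x, y)    # same slope, higher intercept
ssr_steeper <- ssr(a, b + 0.1, x, y)    # same intercept, steeper slope

print(c(ssr_ls, ssr_shifted, ssr_steeper))
```

Any line other than the fitted one gives a larger sum of squared residuals on the same data.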

Answer 17

dim(dataset_name)

Answer 18

plot(data$x_var, data$y_var)

Answer 19

it ranges from -1 to 1 and summarizes the direction and strength of the linear association between two variables

Answer 20

cor(data$var_1, data$var_2)

Answer 21

1. If the correlation coefficient is positive (closer to 1), it means the two variables tend to increase or decrease together. When one goes up, the other tends to go up, and when one goes down, the other tends to go down. 2. If the correlation coefficient is negative (closer to -1), it means there's an inverse relationship. When one variable goes up, the other tends to go down, and vice versa. 3. A correlation coefficient close to 0 suggests a weak or no linear relationship.
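
The three cases can be illustrated with small invented vectors. The first two are exact linear relationships, so the coefficient reaches its limits of 1 and -1.

```r
x <- c(1, 2, 3, 4, 5)

# Variables move together: perfect positive linear relationship
cor_pos <- cor(x, 2 * x + 1)

# Variables move in opposite directions: perfect negative linear relationship
cor_neg <- cor(x, -3 * x + 10)

# V-shaped pattern: the variables are related, but not linearly
cor_zero <- cor(x, c(2, 1, 0, 1, 2))

print(c(cor_pos, cor_neg, cor_zero))
```

The last case shows why a coefficient near 0 only rules out a linear relationship, not every kind of relationship.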

Answer 22

lm(data$y_var ~ data$x_var)

Answer 23

it computes the natural logarithm of the argument specified inside the parentheses. This is a kind of transformation that can make a skewed variable of interest more normally distributed and, in turn, improve the fit of the line to the data

Answer 24

hist(data$variable)

Answer 25

it's a fitted linear model in which both Y and X have been log-transformed. In this model, we interpret β as the predicted percentage change in the outcome associated with an increase in the predictor of 1 percent
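
A minimal sketch of a log-log model on invented data in which Y grows with the square root of X, so the slope on the log scale is 0.5.

```r
# Hypothetical data: y is proportional to the square root of x
x <- c(1, 2, 4, 8, 16)
y <- 3 * x^0.5

# Log-transform both the outcome and the predictor inside the formula
fit <- lm(log(y) ~ log(x))

# Slope: predicted percentage change in y for a 1 percent increase in x
beta <- unname(coef(fit)[2])

print(beta)
```

Here a 1 percent increase in x is associated with a predicted increase in y of about 0.5 percent.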

Answer 26

ranges from 0 to 1 and measures the proportion of the variation of the outcome variable explained by the model. The higher the R^2, the better the model fits the data.

Answer 27

in a linear model with one predictor, it is equivalent to the squared correlation between X and Y: R^2 = cor(X, Y)^2. It follows that the higher the correlation between X and Y (in absolute terms), the better the model fits the data.
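
This equality can be checked numerically for a model with a single predictor. The data below are made up.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.0, 4.5, 5.1, 8.2, 9.9, 11.0)

fit <- lm(y ~ x)

# R-squared as reported by summary()
r_squared <- summary(fit)$r.squared

# Squared correlation between the predictor and the outcome
cor_squared <- cor(x, y)^2

print(c(r_squared, cor_squared))
```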

Answer 28

Estimate=Estimand+Bias+Noise

Answer 29

the number we get as a result of our analysis

Answer 30

is the true quantity of interest in the population that we are trying to learn about. Our hope is that our estimate closely approximates our estimand

Answer 31

errors that occur for systematic reasons

Answer 32

idiosyncratic errors (random mistakes) that occur because of chance

Answer 33

refers to the natural differences or fluctuations that can occur when you take different samples from the same population

Answer 34

1. Randomization of treatment status 2. careful selection of units and measurements

Answer 35

tells you how much you can expect the sample mean to vary from the actual population mean. 1. If the standard error is large, the estimates would be very spread out and the estimator is relatively imprecise 2. If the standard error is small, the estimates would be very close together and the estimator is relatively precise
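
One way to make this concrete is a simulation: draw many samples from the same population, compute the mean of each, and look at how spread out those means are. Everything below (the population, the sample size of 100, the 2000 repetitions) is an assumption chosen for illustration.

```r
set.seed(1)

# Hypothetical population
population <- rnorm(100000, mean = 50, sd = 10)

# Draw 2000 samples of 100 observations each and record every sample mean
sample_means <- replicate(2000, mean(sample(population, size = 100)))

# The spread of the sample means approximates the standard error
se_simulated <- sd(sample_means)

# Usual formula for the standard error of a sample mean
se_formula <- sd(population) / sqrt(100)

print(c(se_simulated, se_formula))
```

The two numbers should be close to each other; repeating the exercise with a larger sample size gives a smaller standard error.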

Answer 36

a strategy for assessing the probability of getting a result as extreme as yours under the assumption that the null hypothesis is true.

Answer 37

This is a statement of no effect or no difference. It represents the status quo or the idea that there is no change or no relationship in the population.

Answer 38

the probability that we observe a value of the test statistic at least as extreme as the one we actually observed if the (sharp) null hypothesis is true. A smaller p-value provides stronger evidence against the null

Answer 39

the range of values that is likely to include the true value of the parameter (estimand)

Answer 40

determines the rejection threshold of the test and characterizes the probability of false rejection of the null hypothesis

Answer 41

provides a table with the following statistics related to a fitted linear model: estimated regression coefficients, standard errors, test statistics and two-sided p values

Answer 42

estimate ± 1.96 × SE (95% confidence interval)
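
A sketch of how the pieces fit together in R: the coefficient table from summary() supplies the estimate, the standard error and the p-value, and the interval is computed by hand. The data are invented, and 1.96 is the large-sample approximation; R's confint() uses the t distribution and gives a slightly wider interval in small samples.

```r
# Hypothetical data
data <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                   y = c(2.3, 3.8, 6.1, 8.4, 9.7, 12.2, 13.8, 16.1))
fit <- lm(y ~ x, data = data)

# Table of estimates, standard errors, test statistics and two-sided p-values
coef_table <- summary(fit)$coefficients

est <- coef_table["x", "Estimate"]
se <- coef_table["x", "Std. Error"]
p_value <- coef_table["x", "Pr(>|t|)"]

# Approximate 95% confidence interval for the slope
ci_lower <- est - 1.96 * se
ci_upper <- est + 1.96 * se

print(c(ci_lower, est, ci_upper))
```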

Answer 43

a variable that affects both the likelihood of receiving the treatment and the outcome

Answer 44

treatment and control groups are not comparable, correlation does not necessarily imply causation, and the difference-in-means (DiM) estimator does not provide a valid estimate of the average treatment effect

Answer 45

by randomly assigning treatment we break the link between any potential confounders and the treatment variable, thereby eliminating all potential confounding variables
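
A simulated illustration of this idea, under assumptions made up for the example: a variable that strongly affects the outcome, a true treatment effect of 2, and treatment assigned at random to half of 10000 units.

```r
set.seed(42)
n <- 10000

# Hypothetical variable that strongly affects the outcome
confounder <- rnorm(n)

# Random assignment: half of the units treated, in random order
treatment <- sample(rep(c(0, 1), each = n / 2))

# Outcome depends on the treatment (true effect = 2) and on the other variable
true_effect <- 2
outcome <- 1 + true_effect * treatment + 3 * confounder + rnorm(n)

# Because assignment is random, the two groups are balanced on the other variable
balance <- mean(confounder[treatment == 1]) - mean(confounder[treatment == 0])

# Difference-in-means estimate of the average treatment effect
dim_estimate <- mean(outcome[treatment == 1]) - mean(outcome[treatment == 0])

print(c(balance, dim_estimate))
```

The balance figure should be close to 0 and the difference in means close to the true effect, because the treated and untreated groups are comparable on average.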

(70 cards)