R Details Flashcards
How can you join vectors together?
Use c() with the names of the data sets/vectors you want to join - eg for girls and boys this is the code
> children = c(girls, boys)
How do you check the length of the vector
> length(vector name)
When adding vectors what is key to remember?
Don’t put + signs - only commas
How to extract particular elements/numbers from a vector / data set?
> nameofvector[1]
The square brackets tell R which positions in the vector you want to be shown
A range of elements is written
nameofvector[1:7]
How do you see a vector without certain elements
> nameofvector[-1]
Minus the first element
Maximum value of the vector
> max(vectorname)
How to work out if any vectors match our number
> which(vectorname==7)
Will give you the position of those values that match
Change the name of a vector
> newname = nameofvector
How to calculate the sum of all elements
> sum(vector)
Mean of elements
> mean(vector)
Median of elements
> median(vector)
Variance of elements
> var(vector)
Standard deviation
> std = function(x) sqrt(var(x))
std(vector)
You have to teach r how to calculate standard deviation
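A quick sketch of the card above in a session; `vec` is made-up data, and note that base R already has a built-in sd() that gives the same answer as the hand-rolled function.

```r
# Teach R the standard deviation: square root of the variance
std <- function(x) sqrt(var(x))

vec <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical data
std(vec)
sd(vec)  # built-in equivalent, same result
```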
Normality test example
Shapiro-Wilk test
When should you use Shapiro-Wilk?
To test the null hypothesis: the data are drawn from a normal population
The p-value is the probability of getting data at least this far from normal if the null hypothesis is true
A low p-value (below 0.05/5%) allows us to reject the null hypothesis - meaning the alternative is accepted - the data are not normal
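A minimal sketch of the test in R on simulated data (the seed and sample size here are arbitrary choices):

```r
set.seed(1)       # arbitrary seed, for reproducibility
x <- rnorm(50)    # data drawn from a normal population
shapiro.test(x)   # a large p-value gives no evidence against normality
```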
What do you do if your data is not Normal?
Calculate a non-parametric measure of data spread eg interquartile range
>IQR(vector)
Or
Median absolute deviation (MAD) - this finds the median of the absolute differences from the median and then multiplies by a constant (1.4826), which makes it comparable with the standard deviation
>mad(vector)
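Both non-parametric spread measures on a small made-up data set:

```r
x <- c(1, 2, 4, 7, 11, 16, 22)  # hypothetical skewed data
IQR(x)  # interquartile range
mad(x)  # median absolute deviation, scaled by 1.4826
```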
What is the code for summary and what does it show you?
> summary(vector)
Reports:
Minimum
Maximum
Median
Mean
1st quartile
3rd quartile
How do you graphically show that random data is approx normal?
"Normal probability plot"
Any curving will show that the distribution has short or long tails
The line is drawn through points formed by the 1st and 3rd quartiles
> qqnorm(vector, main = "normal (0,1)")
> qqline(vector)
What does a data transformation do?
Attempts to approximate normality before parametric stats can be applied
If data can't be transformed to normality, non-parametric stats have to be used
Common data transformation process
Logarithm of the data - log(x+1)
>qqnorm(log(vector+1))
>qqline(log(vector+1))
Test if it worked with a normality test
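Putting the card together with simulated right-skewed data (the seed and exponential sample are arbitrary choices):

```r
set.seed(42)
skewed <- rexp(40)               # right-skewed, non-normal data
transformed <- log(skewed + 1)   # the log(x + 1) transformation
qqnorm(transformed)
qqline(transformed)
shapiro.test(transformed)        # re-test normality after transforming
```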
Barcharts in r
> barplot(vector)
How to generate a more informative barplot
> table(vector)
barplot(table(vector))
How to change the scale on a barplot
> barplot(table(vector)/length(vector))
How to add labels to a barplot
> labels = as.vector(c("one", "two", "three"))
> barplot(table(vector)/length(vector), names.arg = labels, xlab = "Number of children", ylab = "relative frequency")
xlab and ylab set the axis labels
Histogram code
> hist(vector)
How to upload a larger data set
> dataset = read.table("name of file", header = TRUE)
attach(dataset)
dataset*
*this will show you the attached data set
summary(dataset)
Binomial or chi squared
Nominal or frequency data
2 categories
Chi-squared
Nominal or frequency
More than 2 categories
Pearson product-moment / Spearman rank
Interval or ratio data and measures with a reasonably normal distribution
2 conditions
Testing hypotheses about:
Correlation - relationship between two dependent variables
Simple linear regression
Interval or ratio data and measures with a reasonably normal distribution
2 conditions
Testing hypotheses:
Regression - effect of an independent variable upon a dependent variable
T test
Interval or ratio data and measures with a reasonably normal distribution
2 conditions
Testing hypotheses about
Means
Independent measures design
T test
Interval or ratio data and measures with a reasonably normal distribution
2 conditions
Testing hypotheses about - means
Matched measures or repeated measures designs
Analysis of variance - ANOVA
Parametric
Interval or ratio data and measures with a reasonably normal distribution
More than 2 conditions
Testing hypotheses about - means
Difference between means
Null hypotheses= there is no significant difference between the means of two conditions
Multiple linear regression
Interval or ratio data and measures with a reasonably normal distribution
More than 2 conditions
Testing hypotheses about - regression (effect of 2 or more independent variables upon a dependent variable)
Spearman rank
Ordinal data or non-normal distribution of measure
2 conditions
Testing hypotheses - correlation - relationship between two dependent variables
Mann- Whitney
Ordinal data or non-normal distribution or measure
2 conditions
Testing hypotheses - medians
Independent measures
Wilcoxon
Ordinal data or non normal distribution of measure
2 conditions
Testing hypotheses about - medians
Repeated measures
Kruskal-Wallis
Ordinal data or non-normal distribution of measure
More than 2 conditions
Non-parametric analysis of variance
Independent measures
Friedman
Ordinal or non-normal distribution of measure
More than 2 conditions
Non- parametric analysis of variance
Repeated measures
Continuous variable
Take on any value within a given range
There are an infinite number of possible values, limited only by our ability to measure them eg distance
Discrete variable
Only certain distinct values within a given range
The scale is still meaningful - can't have half values
Categorical variable
One in which the value taken by the variable is a non numerical category or class
Ranked variable
Is a categorical variable in which the categories imply some order or relative position
Numerical values are usually assigned but 4 is not necessarily twice as much as 2
How to set class intervals
- Use intervals of equal length with midpoints at convenient round numbers
- For small data sets, use a small number of intervals
- For large data sets , use more intervals
Stem leaf plots
Allow a summary of the data , retaining the original values
1. The stem consists of a column of figures, omitting the last digit
2. Add the final digit of each value as a "leaf" in the matching row
3. Put the "leaves" in order
Interquartile range IQR
Based on the median
Divides the data into four equal groups and looks to see how far apart the extreme groups are
- Put the data in numerical order
- Find the overall median. Divide the data set into two equally sized subsets. If n for the whole data set is odd, put the overall median in both subsets
- Find the median for the lower group. This is the first quartile
- Find the median for the upper group. This is the third quartile
Interquartile range is IQR=Q3-Q1
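The steps above can be checked with quantile() (note that R's default quartile rule can differ slightly from the split-at-the-median method described here); the data are made up:

```r
x <- c(5, 7, 8, 9, 11, 14, 21, 30)  # hypothetical ordered data
q <- quantile(x, c(0.25, 0.75))     # Q1 and Q3
unname(q[2] - q[1])                 # IQR = Q3 - Q1
IQR(x)                              # same thing in one call
```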
What is a box whisker plot
A way to illustrate the IQR
A good way to demonstrate the differences between groups
Standard deviation
A measure of spread around the mean
A bit like the average distance of the data from the mean
Random variable?
Is the numerical outcome of a random experiment
Binomial distribution
A binomial variable is one with just two possible outcomes eg a single toss of a coin
- one outcome = success and the other= failure
What are the four attributes of the normal distribution?
Normal distributions vary in shape:
wide and flat
or
narrow and high
Chi- squared test
Suitable for frequency data: counts of things
Do the number of individuals in different categories fit a null hypothesis of some sort (the expectation)
Yates' correction (1 df)
Apply where there are only two categories of data (eg male and female)
Subtract 0.5 from each value of O-E, ignoring the sign: |O-E| - 0.5
Continue the rest of the calculation as normal
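R's chisq.test() only applies Yates' correction automatically to 2x2 tables, so for a two-category goodness-of-fit test the correction can be sketched by hand; the counts and 50:50 expectation here are hypothetical:

```r
obs <- c(male = 28, female = 12)               # hypothetical observed counts
exp <- c(20, 20)                               # expected under a 50:50 null
chi_sq <- sum((abs(obs - exp) - 0.5)^2 / exp)  # Yates: subtract 0.5 from |O - E|
pchisq(chi_sq, df = 1, lower.tail = FALSE)     # p-value on 1 df
```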
Mann-Whitney test- detailed
Non parametric alternative to the unpaired t-test
Tests for the significant difference between the median of two independent groups
Use this test when one or both groups have non-normal distribution
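In R this is wilcox.test() on two independent vectors (the groups here are hypothetical):

```r
a <- c(3, 5, 6, 8, 9)      # hypothetical group A
b <- c(7, 10, 12, 14, 15)  # hypothetical group B
wilcox.test(a, b)          # Mann-Whitney U / Wilcoxon rank-sum test
```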
Wilcoxon paired sample test- detailed
Non parametric alternative to the paired t-test
Tests for a significant difference between the medians of paired observations
Use this test when one or both groups have a non normal distribution (or cannot be induced to be normal)
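The same function with paired = TRUE gives the paired-sample version (before/after measurements are hypothetical):

```r
before <- c(12, 15, 11, 18, 14, 16)        # hypothetical first measurements
after  <- before + c(1, 2, 3, 4, 5, 6)     # same subjects measured again
wilcox.test(before, after, paired = TRUE)  # Wilcoxon signed-rank test
```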
Kruskal-Wallis
Non-parametric one-way analysis of variance
Non-parametric alternative to one way ANOVA
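A sketch with kruskal.test() on a formula of measurements against a grouping factor (data hypothetical):

```r
scores <- c(2, 3, 5, 8, 9, 11, 14, 16, 18)         # hypothetical measurements
group  <- factor(rep(c("a", "b", "c"), each = 3))  # three independent groups
kruskal.test(scores ~ group)
```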
Friedman’s
Non-parametric alternative to two-way analysis of variance
Used to detect differences in medians between three or more treatments of the same subjects
Wide variations of the standard deviations for rows or columns of a data matrix suggest that we cannot use parametric ANOVA
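friedman.test() takes a matrix with subjects as rows and treatments as columns (the values here are made up):

```r
# Rows = subjects, columns = treatments (hypothetical repeated measures)
m <- matrix(c(1, 2, 3,
              2, 4, 5,
              1, 3, 6,
              2, 5, 7), nrow = 4, byrow = TRUE)
friedman.test(m)
```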
Parametric stats
Based on assumptions about the distribution of the population from which the sample was taken
Evaluate hypotheses for a particular parameter usually the population mean
Quantitative data
Require assumptions about the distributional characteristics of the population distribution
- normal data
- equal variance
More powerful than non-parametric tests when assumptions are met
Non parametric stats
Evaluate hypotheses for entire population distributions
Quantitative or ranked qualitative data
Require no assumptions (distribution-free), so used with non-normal distributions and when the variances of the groups are not equal
Generally easy to compute
List of parametric tests
Paired t test
Unpaired t test
Pearson correlation
ANOVA
Non parametric test- examples
Wilcoxon rank sum test
Mann-Whitney U test
Spearman correlation
Kruskal Wallis test
Friedman
Hierarchical clustering - what is it
- A way to find hierarchical patterns of similarity between sets of objects
- Not a test. There is no null hypothesis. No assumption about the distribution of the data
When to use hierarchical clustering?
You have objects or things described by a large number of continuous or discrete variables
Some implementations also work with ordinal or categorical variables
Allows you to visualise this graphically (dendrogram or tree)
Hierarchical clustering: three steps
- Data transformation (eg Z-scores)
- Matrix of similarities , differences or distances (eg Euclidean)
- Clustering algorithm (eg UPGMA / average linkage)
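The three steps map onto scale(), dist() and hclust(); the data here are simulated, and the seed and dimensions are arbitrary:

```r
set.seed(7)
m  <- matrix(rnorm(40), nrow = 10)   # 10 objects x 4 variables (simulated)
z  <- scale(m)                       # 1. transform to Z-scores
d  <- dist(z, method = "euclidean")  # 2. distance matrix
hc <- hclust(d, method = "average")  # 3. average-linkage (UPGMA) clustering
plot(hc)                             # dendrogram
```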
Principal component analysis (PCA)- what is it
- A data reduction technique
- Not a test. There is no null hypothesis. No assumption about the distribution of the data
When to use principal component analysis (PCA)?
(ie which variables, what you explore, what you see)
You have objects or things described by a large number of continuous or discrete variables (not ordinal or categorical)
You want to explore the differences between the objects as measured by all the variables simultaneously
Allows you to visualise this graphically (space- filling model)
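A sketch with prcomp() on R's built-in iris measurements (four continuous variables); scale. = TRUE gives the correlation-matrix form:

```r
p <- prcomp(iris[, 1:4], scale. = TRUE)  # correlation-matrix PCA
summary(p)  # proportion of variance explained by each component
head(p$x)   # scores of the objects on the new components
```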
Multiple regression - what is it?
- An extension of linear regression to situations where there is more than one independent variable
- A data reduction technique. Seeks to explain a reasonable fraction of the variance in the dependent variable using only some of the independent variables
When to use multiple regression?
You have objects or things described by a large number of continuous or discrete variables
These are distributed reasonably normally
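A minimal simulated sketch: two hypothetical predictors in one lm() formula, giving one coefficient per independent variable.

```r
set.seed(3)
x1 <- rnorm(30)  # hypothetical predictors
x2 <- rnorm(30)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(30, sd = 0.3)  # simulated response
fit <- lm(y ~ x1 + x2)  # regression on two independent variables
summary(fit)            # intercept plus a coefficient for each predictor
```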
Why do you test for normality before performing a variance ratio test for the equality of variances
The variance test is sensitive to departures from normality
Independent variables = random
Temperature = random
ANOVA
Independent variables - fixed
Barley = 2 varieties
Interval or ratio data and measures with a reasonably normal distribution
Categories are ranked and have equal spacing between adjacent values
Only ratio scales have a true zero
- zero is treated as a point of origin
2 categories - which parametric tests
Binomial / chi squared
2 conditions:
Pearson product-moment correlation
Simple linear regression
T-test (unpaired and paired)
Factor analysis
Multiple hypotheses
Describes variability among observed, correlated variables
Describe the graphs of both Pearson's correlation and Spearman's rank
Pearson's = straight line (linear relationship)
Spearman's rank = only has to be monotonic (consistently rising or falling)
Simple linear regression
A regression model that estimates the relationship between one independent variable and one dependent variable using a straight line
Regression analysis
Reliable method of identifying which variables have an impact on a topic of interest.
the process of performing a regression allows you to confidently determine which factors matter most , which factors can be ignored and how the factors influence each other
Tests that are used for more than 2 categories
Chi squared
ANOVA
Kruskal-Wallis
Friedman
Multiple linear regression
Regression model that estimates the relationship between a quantitative dependent variable and 2 or more independent variables using a straight line
Principal component analysis regression - when is it used
For variables that are strongly correlated
The PCA technique is used in processing data where multicollinearity exists between the features/variables
How to test for significant differences between medians of 2 paired observations
3 steps
1 - calculate the differences between the paired observations
2 - take the absolute differences
3 - rank the absolute differences
Ordinal data meaning
Ordinal data violates the assumption of normal distribution
- categories within a variable that have a natural rank order
Variance
Tells you the degree of spread in your data set
Variance test - what does it do
Sees if the variance of 2 populations from which the samples have been drawn is equal or not
SSyy = SSR + SSE
SSyy = total variation in y
SSR = variation explained by the regression
SSE = error (unexplained) variation
How to calculate SSE
Sum of the squared estimate of errors
The regression/least squares line…
Is the line with the smallest SSE
What is regression analysis - equation
X= predictor variable
Y= response variable
Y= a+bx
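Fitting Y = a + bX by least squares in R, with hypothetical x and y values:

```r
x <- c(1, 2, 3, 4, 5)            # predictor (hypothetical)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)  # response (hypothetical)
fit <- lm(y ~ x)                 # least-squares fit of Y = a + bX
coef(fit)                        # a = intercept, b = slope
```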
Regression analysis explained
Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables
It can be utilised to assess the strength of the relationship between variables and for modelling the future relationship between them
What is SSR
SSR is the additional amount of explained variability in Y due to the regression model compared to the baseline model
What is multicollinearity and why is it a problem
It is the phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy
It is a problem because it undermines the statistical significance of an independent variable
What does it mean if the standard error of a regression coefficient is large
The coefficient will be less statistically significant
What is PCA
Is a tool for exploring the structure of multivariate data
Data reduction technique
- allows us to reduce the number of variables to a manageable number of new variables or components
Limitations of PCA
Variables must be continuous or on an interval scale
Two types of PCA
Covariance matrix - applies more weight to some variables than others
Correlation matrix - expresses each variable with equal weight
1 sample t test
Tests whether one sample mean is significantly different from a set mean
2 sample t test
Tests whether two unknown population means are equal or not
Unpaired t test
2 different categories eg different weights of lemurs
Or how many carrots boys and girls eat
Paired test
Speed of a human wearing 1 type of shoe compared to another
The measurements must be paired because humans run at different speeds regardless of shoe type
One tailed
Only one way the results can go - directional results
The critical area of the distribution is one-sided - eg only values greater than that specified in the null hypothesis
Two tailed
Critical area of distribution is two sided and tests whether a sample is greater than or less than a certain range of values
Eg did Group A score higher or lower than Group B
Types of hierarchical clustering
Pubs in two towns = pre-defined clusters (already close together)
Geographical midpoint of all Swindon pubs to the midpoint of Bath pubs, measuring that distance = centroid clustering
Average distance between every pub in Swindon and every pub in Bath = average linkage clustering
Closest pair of pubs, one from each town = single linkage or nearest neighbour clustering
Take the most distant pair = complete linkage clustering
Distance matrix
Condenses the univariate distances down to a single number
Add them up: Manhattan / city block distance
Or
Euclidean distance = square root of the sum of the squares of univariate distances
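Both distances on two hypothetical objects, by hand and via dist():

```r
a <- c(1, 2)                             # two hypothetical objects
b <- c(4, 6)
sum(abs(a - b))                          # Manhattan / city block distance
sqrt(sum((a - b)^2))                     # Euclidean distance
dist(rbind(a, b), method = "manhattan")  # same, via dist()
dist(rbind(a, b))                        # Euclidean is the default
```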
Requirements of Mann-Whitney
Rank observations as if they were a single sample - eg smallest to largest