Behavioural Analytics Flashcards
What does GAM stand for?
Generalised Additive Model
Why are GAMs useful in looking at emotion?
We need to understand how emotions change over time - when observing people interacting we often have time series data.
What are often used to look at emotional states?
Valence and arousal are often used to describe the emotional state.
What is trace annotation?
Raters watch a video and continuously report the valence and arousal they perceive at each moment in time.
These traces typically go up and down, and the classical statistical techniques we use (eg linear models) are not great for this. Therefore we need GAMs.
What do GAMs allow?
They allow you to analyse trace data using ideas similar to those of regression, with more advanced modelling capabilities.
What command in R is used for linear models?
lm()
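A minimal sketch of fitting and inspecting a linear model (the data here are simulated purely for illustration):
# simulated example data
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50)
model <- lm(y ~ x)
summary(model)  # coefficients, R-squared, p-values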
What is the equation for a straight line?
y = mx + c
y = α + βx + ε
where ε (epsilon) represents the errors around the line
In regression, what should we always be checking?
Our assumptions of linearity
Check the partial plots (in a multiple regression) or regression plots and investigate to see if the data has some curved nature.
Eg curvature in residuals vs the fitted values suggests a straight line is not a good way to capture the model.
What is Anscombe's Quartet?
A visual warning that you should always visualise the data and do some EDA.
Four datasets share nearly identical summary statistics yet look very different when plotted.
What is a more modern version of Anscombe's Quartet?
The Datasaurus Dozen
What is Simpson’s Paradox?
A statistical phenomenon where an association between two variables in a population emerges, disappears or reverses when the population is divided into subpopulations.
What are other non-linear models like GAMs?
LOESS - locally estimated scatterplot smoothing
ARIMA - auto regressive integrated moving average
What is the function for a GAM?
y = α + f(x) + ε
It is the regression function as before, but we swap out the single beta coefficient for a function.
This function allows us to come up with a way of capturing the data - splines that combine to make a non-linear smooth representation of the function.
Coefficients tell you about the nature of the basis functions; you add them together to give an overall "wiggly" line.
What do we call the combination of basis functions?
A smooth
What is the line of code to produce a GAM model?
model <- gam(dependent ~ s(predictor), data = data)
What is different about gam() compared to lm()?
The predictor variable is wrapped in s() which instructs R to come up with a function which best fits this data.
Once we have created a GAM model, what functions do we call on the model?
- summary(model)
- coef(model)
- plot(model)
- gam.check(model, pages = 1)
In the output, what is the EDF?
Effective degrees of freedom
What does a very small p-value represent?
That the smooth term is significant - the fitted curve captures the data better than a flat line.
What does gam.check() do?
Runs model diagnostics - in particular it checks whether the basis dimension k allows enough curvature.
We don’t want a very small P value in the GAM check.
Higher p-values are preferred because they suggest the model residuals are well-behaved.
What should we change about the GAM model?
Change the number of basis functions/knots by adjusting the k argument within the smooth function; k controls the wiggliness of the line.
model <- gam(response ~ s(predictor, k = 15), data = data)
What is concurvity?
An issue we need to deal with, the smooth equivalent of collinearity.
What determines the wiggliness of a smooth?
- The number of knots / basis functions
- The smoothing parameter - lambda
How do we change the lambda smoothing parameter?
Use the term sp = within the GAM specification
model <- gam(response ~ s(predictor, k = 15), sp = 0.1, data = data)
What other way can we change the smoothing?
REstricted Maximum Likelihood method
model <- gam(response ~ s(predictor, k = 15), method = "REML", data = data)
How can we improve GAM plots?
- Adding confidence intervals
- Adding residuals
How can we improve GAM plots by adding confidence intervals?
Adding confidence intervals (variability bands) and shading them.
plot(model, se = TRUE, shade = TRUE, shade.col = "rosybrown2")
How can we improve GAM plots by adding residuals?
plot(model, se = TRUE, shade = TRUE, shade.col = "rosybrown2", residuals = TRUE, pch = 1, cex = 1)
How can you add in covariates to the GAM?
As it is an “additive” model, the separate components simply add to create the overall model.
Looking at more than one function in the same model
model <- gam(response ~ s(pred1) + s(pred2), data = data)
Or looking at them together as an interaction
model <- gam(response ~ s(pred1, pred2), data = data)
In addition to covariates in GAM, how else can you have a multivariate GAM?
Include linear variables
model <- gam(response ~ s(pred1) + pred2, data = data)
Include factor/categorical variables
model <- gam(response ~ s(pred1, by = sex), data = data)
Tensor product smooths can be made for 2D and spatial data. What do tensors allow for?
Two differing scales to interact
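A hedged sketch of a tensor product smooth in mgcv (variable names are illustrative):
model <- gam(response ~ te(pred1, pred2), data = data)
te() builds the surface from marginal bases, so pred1 and pred2 can sit on completely different scales.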
What function is used to visualise GAMs?
vis.gam()
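For example (assuming a fitted model containing smooths of pred1 and pred2):
vis.gam(model, view = c("pred1", "pred2"), plot.type = "persp", theta = 30)  # 3D perspective plot of the fitted surface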
What is a GAMM?
Generalised additive mixed model
They are the multi-level mixed model form. They are more sophisticated.
How do you initialise a new plot?
ggplot()
How do you change the “dot” of plotted points?
geom_point(shape = 1) creates a hollow circle
How do you plot lines on a plot?
geom_abline() - diagonal line
geom_hline() - horizontal line
geom_vline() - vertical line
How do you expand the axis in view of ggplot?
coord_cartesian(xlim = c(-1, 3), ylim = c(-1, 3))
Why may linear regression not be ideal for a real-world scenario?
In a real world data collection situation we never get data that falls along a straight line. We always have some aspects of the data that are not explained by the model.
What is a residual in linear regression?
The difference between the actual value and the value predicted by the model (y-ŷ) for any given point
How do you determine the predictions for data based on a model?
predict(model)
How do you determine the residuals of the data based on model predictions?
residuals(model)
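For example, adding both to the data frame used for plotting (assuming a data frame called curveData, as used elsewhere in these notes):
curveData$predicted <- predict(model)    # fitted value for each observation
curveData$residuals <- residuals(model)  # actual minus predicted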
How do you add the predicted values onto a plot?
+ geom_point(aes(y = predicted), shape = 1)
Predicted must be added to the dataframe
How do you add on vertical lines to show the difference between a predicted value and the actual value (residuals)?
+ geom_segment(aes(xend = Time, yend = predicted), alpha = 0.5)
What does geom_segment() do?
Draws a straight line between two points - we use it to show the residual
How do you add labels to show the residuals in a plot?
geom_text(aes(y = predicted + (residuals / 2),
label = paste0(round(residuals, 1))),
nudge_x = 0.5, size = 2)
nudge - so they don’t overlap with the points
What does adding fill = NA to geom_smooth() do?
Removes the default shaded confidence interval
What is Anscombe’s quartet?
A statistical warning - it comprises four datasets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when visualised.
What kind of non-linear relationship is better captured by Generalised Additive Models than linear models?
Curvilinear
Describe the data in the built-in anscombe data.
Columns 1-4 (x1, x2, x3, x4) contain x-values.
Columns 5-8 (y1, y2, y3, y4) contain corresponding y-values.
How do you get a fitted line to extend beyond the range of the data?
fullrange = TRUE
How do you plot the anscombe data?
anscombeData <- data.frame()
for (i in 1:4) {
  anscombeData <- rbind(anscombeData, data.frame(set = i, x = anscombe[, i], y = anscombe[, i + 4]))
}
ggplot(anscombeData, aes(x, y)) +
  theme_bw() +
  geom_point(size = 3, color = "red", fill = "orange", shape = 21) +
  geom_smooth(method = "lm", fill = NA, fullrange = TRUE) +
  facet_wrap(~ set, ncol = 2)
How do you account for randomness?
set.seed(123)
How do you add randomness to a curve?
randomError <- rnorm(mean=0, sd=0.5*sd(z), n=length(x))
where z is the vector of function outcomes and x is the vector of data points, so n = length(x) gives the number of points
What is the difference between se = FALSE and fill = NA in geom_smooth()?
se = FALSE - Removes Confidence Interval
fill = NA - Removes the Fill Colour of the Confidence Interval
What does geom_path() do?
Adds a line which connects the points in the order they are given in the data frame.
What should we always do when fitting a linear model?
Check our assumptions
What is the quick “cheat” way to get a smooth curve?
Using the geom_smooth() function.
What are the automatic methods of geom_smooth()?
LOESS if the number of observations < 1000
GAM if the number of observations is 1000 or more
What are curvilinear lines also known as?
Smooths
What does LOESS stand for?
Locally estimated scatterplot smoothing
What is the disadvantage of using geom_smooth() over a GAM?
It is not as sophisticated.
It draws the line but does not give us an output to interpret and understand (we need a GAM package such as mgcv for this)
What package do we use for GAMs?
mgcv
What is nlme?
A mixed modelling regression package.
How is the wiggliness of a GAM curve determined?
The number of joining knots that are allowed in a spline
What is a spline?
A piecewise combination of lots of little cubic sections
How much data is there usually in emotion data?
Emotion examples use intensive longitudinal data so we typically have lots of data.
How do we check the number of basis functions or knots?
Using the p-value of the statistic reported by gam.check() - its output notes that a "low p-value (k-index<1) may indicate that k is too low".
When using the REML method, what happens to the P value when you re-run?
When using REML we have a p value that actually jumps around a bit.
If we run this more than once we can see that the p-value changes. This is telling us that something is wrong, as this is not reliable.
Which method should we use when running a GAM?
REML is probably best as a default
If you don’t specify this it defaults to using Generalised Cross-Validation (GCV)
What does it mean if the GAM is not stable?
It typically means that the model is overfitting, underfitting, or failing to converge properly.
How can you tell if the model is unstable?
The p-value from the k-check (which tests whether the basis dimension k is sufficient) changes each time you run the model. This variability is a sign that the model is not robust.
What is one way to check stability?
run gam.check(model)
What does gamSim() do?
The function gamSim() generates example datasets that are commonly used for demonstrating GAMs.
What does a lower GCV score suggest?
Better model generalisability
What is concurvity?
Concurvity describes the situation in GAMs (and similar non-linear models) where two or more smooth terms or predictors are highly correlated in their functional forms - they move in similar ways - leading to potential issues with model estimation and interpretation.
What is physio dash?
An R Shiny App
What are the three parts of a Shiny app?
- Initialisation code
- A UI part
- A server part
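A minimal sketch of that three-part structure (the contents are placeholders, not the physio_dash code):
library(shiny)  # initialisation code: load packages, define helper functions

ui <- fluidPage(  # UI part
  sliderInput("n", "Number of points:", min = 10, max = 100, value = 50),
  plotOutput("plot")
)

server <- function(input, output) {  # server part
  output$plot <- renderPlot(plot(rnorm(input$n)))
}

shinyApp(ui, server)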
What is the popular software design pattern, MVC?
Model–View–Controller (MVC)
The model is where the data is stored and manipulated - this is the server part in Shiny.
The View part is the code for the user interface.
The controller code glues these aspects together and deals with user inputs.
Why is simulated data beneficial to use?
It gives you some fast and straightforward data in a shape you define, this allows you to quickly show how data will work throughout the model and UI parts of your app.
In the physio_dash application, what do the sliders do?
Collect information concerning the nature of the simulated data - this information will be sent to the server part of the Shiny app to feed into the simulation functions.
Describe how time is created for the physiological features when the "simulate_data" button is pressed?
It is created as a vector with the seq() function.
Time can be difficult to work with - one of the easiest ways is to treat it as an integer, using UNIX time/POSIX time.
How does UNIX deal with time?
By taking the number of seconds that have elapsed since Jan 01 1970 (UTC)
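For instance, a sketch of building such a time vector with seq() (values are illustrative):
start <- as.numeric(Sys.time())  # current time as seconds since 1970-01-01 (UTC)
time <- seq(from = start, by = 1, length.out = 60)  # one sample per second for a minute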
What is the arima.sim() function?
ARIMA stands for AutoRegressive Integrated Moving Average
arima.sim() simulates data for these kinds of models
In arima.sim(), what does the model argument take?
model = list(ar = input$autocorrelation)
It takes the input from the autocorrelation slider in our UI as the model and uses it as a simple autoregressive (ar) model.
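A minimal sketch (the ar value here stands in for the slider input):
simulated <- arima.sim(model = list(ar = 0.7), n = 200)  # AR(1) series with autocorrelation 0.7
plot(simulated)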
How do you add an icon to the shiny app?
shiny::icon("heart-o")
How do you output a graph in the Shiny app?
UI - dygraphOutput()
Server - renderDygraph()
What is a dygraph, what does this package do?
Dynamic graphic
Dygraphs produces nice dynamic interactive graphs for time series style data.
What are some ways that dplyr is used for tidy data manipulation in physio dash?
(Exercise 4 in summary)
data <- data %>%
  dplyr::select(time, Heartrate) %>%  # keep only the time and Heartrate columns
  dplyr::mutate(biometric = "HR") %>%  # add a biometric column with the constant value "HR", identifying the data type
  dplyr::rename(value = Heartrate) %>%  # rename "Heartrate" to "value"
  dplyr::mutate(time_date = as.POSIXct(as.numeric(as.character(time)) / 1000, origin = "1970-01-01", tz = "Europe/London"))  # convert the UNIX timestamp (milliseconds) to human-readable time
return(data)
How is HTML code placed into the shiny app?
Using the tags object, eg tags$div(), tags$h1()
What is a requirement for using the dygraphs package for time series data?
Time series data must be presented in xts format.
Type ?xts
We opt for a POSIX form of data representation, using the POSIXct class with the as.POSIXct() function.
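A hedged sketch of preparing data for dygraphs (assuming a data frame with value and POSIXct time_date columns, as above):
library(xts)
library(dygraphs)
series <- xts(data$value, order.by = data$time_date)  # order.by must be a time-based class such as POSIXct
dygraph(series)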
What does the combination of dygraph and Shiny allow?
Interactivity
In physio dash, what is a difference in dygraphs for heart rate and ECG?
ECG processing detects peaks, extracts RR intervals, and computes NIHR, while HR processing simply plots BPM values.
ECG processing extracts HRV metrics from the raw ECG signal, while HR processing only displays BPM trends.
Additional plots in ECG - ECG processing allows advanced HRV visualizations, while HR processing is limited to basic HR plotting.
What is the features tab of physio dash for?
Taking a deeper dive into the data on an individual level
What is the biometrics tab of physio dash for?
Seeks to try and combine the data for a view of the different data streams all at the same time.
In a dygraph, how do you add a zoomable range selector?
dyRangeSelector()
In a dygraph, how do you change the colours?
eg define custom_palette <- c("red", "blue", "orange", "green", "darkgreen", "purple") outside renderDygraph()
then within dygraph: dyOptions(colors = custom_palette)
How do you adjust the legend size within dygraph?
dyLegend(width = 500)
How do you input a conditional message for loading, if the data is taking time?
conditionalPanel(condition = "$('html').hasClass('shiny-busy')", tags$div("Loading...", id = "loadmessage")),
How do you add in a piece of text (not a heading/paragraph) to inform the user what a certain aspect is for?
helpText("Click and drag on the plot to zoom and select date ranges"),
What does the descriptive tab of physio dash show?
The plot presents the time series together in a series of faceted plots
How do you extract and transform the galvanic skin response data for plotting in the descriptive tab?
The descriptive tab plots six plots (facet wrapped) so the GSR must be split into its two components
data_SCL <- data_GSR() %>%
  select(time, SCL, time_date) %>%
  mutate(biometric = "SCL") %>%
  dplyr::rename(value = SCL)
data_SCR <- data_GSR() %>%
  select(time, SCR, time_date) %>%
  mutate(biometric = "SCR") %>%
  dplyr::rename(value = SCR)
What is the difference between merge.zoo() and rbind()?
merge.zoo - biometric plot
rbind - descriptive plot
Use rbind() when you want a tidy, long-format dataset for faceted plotting (ggplot2) (vertical stacking)
Use merge.zoo() when you need to align time series and create a wide-format dataset for dygraph()
How are GAMs utilised in physio dash?
Generalised Additive Mixed Models (GAMMs) are used to analyse the biometric data and display key statistics in value boxes.
What does renderValueBox() allow you to do?
Create an infographic-style display - in particular we extract summary statistics and present them prominently.
How do you fit a GAMM model for the physio data?
eg within renderValueBox() - using gamm() rather than gam() so that the corAR1() correlation structure is actually applied
gamm.HR <- gamm(value ~ s(time), data = data_HR(), method = "REML", correlation = corAR1())
summary_gamm.HR <- summary(gamm.HR$gam)  # a gamm fit stores the GAM part in $gam
edf.HR <- round(summary_gamm.HR$edf, 2)
valueBox(
  edf.HR, "Heart Rate", icon = icon("heartbeat"),
  color = "red"
)
What does GAMM allow compared to GAM?
This is a more complicated style of regression model that allows us to incorporate correlations in the data into the model.
This is useful where the independence assumptions of a regression are violated, we can bring them into the model as an approach to dealing with that violation.
What do we need to be aware of with time series data?
As these are time series data we have to be aware of autocorrelation; GAMM models allow us to deal with such issues, at least to some extent.
What does gather() do?
eg tidyr::gather(key = emotion, value = emotion_score, c("joy", "fear", "disgust", "sadness", "anger", "surprise"))
Creates a longer format, where each row represents a time point and the corresponding score for each emotion.
The wide format data (where each emotion has its own column) isn’t suitable for plotting or time series analysis because it complicates visualizing how emotion scores change over time.
gather() converts the data into a long format, where you can easily plot emotion scores (emotion_score) against time (time), making it simpler to work with for visualization.
What is the difference when using readr::read_csv()?
The readr function creates a tibble rather than a dataframe.
This is the Tidyverse version of a data.frame/data.table that works in a way that is compatible with Tidyverse code
What is a gauge chart and how do you create one?
A Gauge chart is a type of chart that uses a radial scale to display data in the form of a dial.
renderGauge()
What kind of machine learning is inter-rater reliability relevant to?
Supervised machine learning - we get an algorithm to learn the relationship that exists between some set of inputs and known outcome.
This is carried out on a training set of data, then we provide a novel set of data as an input and the algorithm makes a prediction or classification based on the learnings of the training set.
What does “ground truth” refer to?
In machine learning, "ground truth" refers to the actual true values or correct labels used for training, validating or testing a model - the benchmark that a model's predictions are compared against.
It is an assumption or “operationalisation” of the truth - it typically depends on many theoretical assumptions that have been made when collecting material for the training set.
What do poorly defined assumptions result in?
A poorly defined and collected training set of data means the algorithms developed will not function well when given new data from the real world, as it will not perform according to the poorly-defined “ground truth”.
What should you do with a training set?
Reserve some of it as a test set.
We want to be able to test the extent to which the algorithm can do its job. We want to test it on previously unseen data. This also helps us to investigate if the algorithm is overfit to the training data.
What does it mean if the model is overfit?
If the algorithm is overfit to a particular set of data, it will perform very well on this data (eg even to go as far as accounting for the unique statistical anomalies) and will not generalise well to unseen data.
What is the disadvantage of labelled data?
Labelled data is usually expensive, labour intensive and takes a lot of time to create.
Humans are often the people who create labelled data sets, and in effect the machine learning algorithms are trying to copy the human behaviour and set of decision processes that went into labelling data.
Why is there a temptation to use as much of the training set as possible, thus minimising the amount of data for testing? Why is this a problem?
It is so expensive to create training sets and machine learning performs better with large quantities of data.
It will almost certainly lead to creating an algorithm that is overfit.
You need to choose the proportions of test and training data wisely.
What does testing on a test set allow?
Allows you to improve the accuracy of the algorithm.
How well the algorithm performs on the test set provides an error metric that you can use to improve your algorithm by changing the parameters, or adding components to more complex models.
What are synonyms for labelling?
Labelling, annotating, rating and coding
All mean someone observing data and making a judgement about it.
What is the gold standard for labelling?
Using well trained experts with a well defined coding scheme.
eg PhD students at universities have the time to create the labels or annotations.
Due to the intensity of the work, large numbers of raters are not normally possible.
What is one way to get more raters?
Use naive raters who do not need to be trained in as much depth.
This typically requires a simple coding scheme and payment for the recruits.
This is useful as control can be retained over the performance of the raters, but it takes a lot of organisation and teaching of the raters.
What is another alternative to expert or naive raters?
Crowdsourcing - using the web and internet to get access to a large number of raters.
- eg set up a website and use gamification to get people to do labelling
- eg use a paid crowdsourcing site such as Amazon Mechanical Turk; typically these tasks need to be very simple and there is very limited control over the people who participate
What is a coding scheme?
A coding scheme is a structured system used to classify, categorise, and interpret data.
It involves assigning labels, numbers, or categories to different elements of a dataset based on predefined rules.
- eg transcription - there could be variations in punctuation choices
What can be a problem with coding schemes?
There can be a lot of room for subjective judgement
What do we need to do in response to the fact that there can be a lot of room for subjective judgement?
We need to check how well the subjective decision makers agree, to give some idea of the consistency of the coding scheme and a measure of how objective its use is.
What are the two main ways coding can be done?
- Discrete (categorical or nominal) coding
- Continuous (ordinal, interval or ratio variables) coding
What is inter rater reliability (IRR)?
Inter-rater reliability (IRR) is a measure of how much agreement there is between multiple raters or observers. It’s used to ensure that data is consistent and reliable, regardless of who collects or analyses it.
What is Hallgren’s account of measurement error?
Observed score = True score + Measurement Error
Var(X) = Var(T) + Var(E)
ie the variance (the variability in the scores we observe) can usefully be thought of as the variance of the true score that we want to observe plus the variance of the measurement error
How are IRR scores typically set up?
To give an estimate of how much of the true scores we are getting.
eg an IRR estimate of 0.8 indicates that 80% of the observed variance is due to the true score variance, or similarity in ratings between coders and 20% is due to error variance or differences in ratings between coders.
What is the most simple type of inter rater agreement?
Percentage rater agreement
- This captures the amount of times two raters agree in a very simple sense
% Agreement = (no. observations agreed by raters) / (total no. observations)
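A quick sketch of this in R (the rating vectors are hypothetical):
rater1 <- c("A", "B", "A", "A", "B")
rater2 <- c("A", "B", "B", "A", "B")
mean(rater1 == rater2) * 100  # percentage agreement: 4 of 5 match = 80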
What does percentage agreement work easily for?
Categorical data.
It does not work so well for continuous data, where some sort of agreement interval is needed (ie turning the continuous data into a form of categorical data).
What is the downside of percentage agreement?
It does not take into account chance agreement
When can chance agreement be a really big problem?
In a simple classification problem where there are only two categories - it is likely that a lot of agreement occurs by chance.
eg rating cells as cancerous or not - about 1 in 20 cells is expected to be cancerous (p = 0.05)
What function is used to tell us how similar ratings are?
The agree() function from the irr package.
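For example (the ratings are hypothetical; columns are raters, rows are subjects):
library(irr)
ratings <- cbind(rater1 = c(1, 2, 2, 3), rater2 = c(1, 2, 3, 3))
agree(ratings)  # reports the percentage agreement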
How do we account for chance agreement between raters?
Cohen’s Kappa
What does Cohen’s Kappa range from?
-1 to +1
0 - no agreement
1 - perfect agreement
Negative - systematic disagreement
How do we combine the ratings of different raters?
cbind()
Each column is a different rater and rows are subjects
What are the limitations of Cohen’s Kappa?
It is limited to cases where there is categorical data and two raters (nominal - Hallgren).
If we want to have faith in our ground truth, we would rather have it coming from more than the opinion of just two raters.
For continuous data with more than two raters, what is the appropriate statistic?
Intraclass correlation coefficient (ICC)
What is the code for Cohen’s Kappa?
kappa2()
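A minimal sketch (hypothetical ratings from exactly two raters):
library(irr)
ratings <- cbind(rater1 = c("yes", "no", "yes", "yes"),
                 rater2 = c("yes", "no", "no", "yes"))
kappa2(ratings)  # Cohen's kappa for two raters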
Who proposed different ICCs?
Shrout and Fleiss
How many ways of calculating ICCs did Shrout and Fleiss suggest?
6 different ways
Appropriate depending on the characteristics of the data and the goals of the researchers
What is something to consider in ICC?
The spread of ratings across the whole data set.
Due to expense and time associated with rating, often sets of ratings are partially coded by different people.
Eg full coding: every rater coded every subject
Eg one rater has much more time available - a subset of the material is coded by other raters to ensure that the main rater is coding according to the coding scheme (often 10% may be coded by others)
Eg (Often in online ratings with many naive raters) different subsets of material are rated by different people, in a way that means no single rater rates all the data.
ICC can handle all of these situations but you need to be aware of what style of rating you are using.
What is the ideal scenario for ICC?
Fully crossed - two way model
Describe the two way model.
We have information about all of the raters rating all of the subjects.
This allows you to see how the two things interact - the ratings and the subjects that they rate.
What are the four things to consider for ICC?
- If it is fully crossed or not
- How the ratings should be interpreted (absolute values or consistency of ratings)
- The way the coding is set up (average or single measures)
- Whether coders selected for the study are considered to be random or fixed effects
What is the difference between IRR and ICC?
Inter-Rater Reliability (IRR) is the general concept of agreement between raters; the Intraclass Correlation Coefficient (ICC) is a specific family of statistics that measures this agreement for continuous data, taking the magnitude of disagreements into account.
What does a fully crossed design take into account that a non-fully crossed model does not?
In a fully crossed design, the ICC can take into account systematic deviations between the coders because it has that information
Why are there different equations for fully crossed and not fully crossed models?
Models which are not fully crossed do not have enough information so the systematic deviation must be left out
When do we use a two way model and when do we choose a one way model?
- Two way model: when it is fully crossed
- One way model: when it is not fully crossed
What is the difference in equation for the fully crossed and not fully crossed?
- In the not fully crossed, only information about the ratings (r) can be used
- In the fully crossed, information about both the ratings (r) and the raters/coders (c) can be used, and there is an interaction between them, making this a two way design (rc)
Discuss why you would want to know how the ratings should be interpreted.
- Sometimes we are interested in absolute values ie that the raters get the right value correct
- Eg the intensity of smiles
- Sometimes we are interested in the consistency of ratings ie how the ratings change, we want to see if values go up and down in the same way but it does not matter if the exact numbers are different.
Discuss why you want to consider the way people set up the coding.
- We can use the average of all the raters to calculate the ICC in a fully crossed design - we have enough information to use this and it will allow more confidence and a higher ICC as we have more of the relevant data
- When we use a subset of ratings to justify the ratings of a single coder, we have to use single measures, which is a more conservative calculation
Discuss why you should consider if coders selected for the study are considered to be random or fixed effects.
- If coders are selected from a larger population and the ratings are meant to generalise to the population, can use random effects model - Random Model
- If you do not wish to generalise the results to a larger population of coders, or if the coders in the sample are not randomly sampled, use a fixed effects model (subjects considered random but coders considered fixed)
What are the different types of ICC we considered?
Different types of ICC exist depending on the study design and whether raters are considered random or fixed effects (following Shrout and Fleiss).
What do A and C refer to in ICC notation?
Uses nomenclature from McGraw and Wong
C - consistency
A - absolute agreement
What is the code for computing the ICC?
eg ICC(1,1)
icc(dataicc1, model = "oneway", type = "agreement", unit = "single")
What is one issue that can be difficult and common in ICC?
Missing data
How do we deal with missing data for ICC?
A package has become available on CRAN that helps deal with missing data
irrNA - copes with randomly missing data
What format does irrNA expect the data to be in?
Columns - raters
Rows - subjects
May need to transpose the data
How do we transpose the data?
t(data)
What is Krippendorff's alpha?
A modern approach to IRR which can be used for all kinds of data (eg nominal, ordinal, interval, ratio etc).
It is newer and is therefore not as familiar as other methods (as not been adopted as widely).
It is also robust to missing values.
Where is Krippendorff's alpha less flexible than the various types of intraclass correlation?
Interval data
How do we conduct a Krippendorff's alpha?
kripp.alpha()
Pass in the data and the "method" (type of data) if applicable
May need to transpose the data
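A minimal sketch (hypothetical ratings; note the transpose, since kripp.alpha() expects raters in rows):
library(irr)
ratings <- cbind(rater1 = c(1, 2, 3, 3), rater2 = c(1, 2, 3, 4))  # subjects in rows
kripp.alpha(t(ratings), method = "interval")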
What is one of the advantages of Krippendorff's alpha?
The ability to apply the same measurement across different forms of data.
What is the drawback of Krippendorff's alpha?
It does not have the flexibility offered by the varieties of ICC that we can engage in.
When applied to interval data, what is Krippendorff's alpha equivalent to?
ICC(1)
This is a special case of Krippendorff's alpha.
How does the data expected by kripp.alpha() differ to that expected by icc()?
The data format expected by kripp.alpha() is transposed relative to that expected by icc() - raters are in rows rather than columns.
What are regular expressions?
They are a way to describe a set of strings.
They allow us to create patterns that can then be used to search and replace very efficiently.
In regular expressions, what is the difference between [0-9] and [0-9]+?
[0-9] matches a single digit; to match numbers longer than one digit add a plus at the end: [0-9]+ (one or more digits).
[A-Za-z0-9]+ matches runs of uppercase letters, lowercase letters and digits.
What is * in regular expressions?
A quantifier - it matches the preceding element zero or more times.
What are the quantifiers in regular expressions?
*, + and ?
What is + in regular expressions?
Matches something one or more times
Regex: go+d
Matches: god, good, goood (but NOT gd)
What is ? in regular expressions?
Matches something zero or one times
a? → Matches "", "a" (but NOT "aa")
colou?r → Matches "color" and "colour"
What is "\d" in regex?
Matches a digit, in the same way as [0-9]
What is “\w” in regex?
Matches any word character (letters, digits and underscore, like [A-Za-z0-9_]); add a + to match full words.
How do we find words within a string?
str_detect()
In our example, we specified the column the text was in
str_detect(headlines$title, "word1|word2|word3")
str_detect(headlines$title, "[0-9]+") - for headlines with numbers
str_detect(headlines$title, "\"[A-Za-z]+\"") - for headlines with quoted words - the quote marks require the escape character \
How do we find the position of matched words?
str_match to return matched patterns
eg str_match(headlines$title, "word")
How do we find matched lines?
str_subset to return matched lines
eg str_subset(headlines$title, "word")
How do you replace matches with new text?
str_replace
str_replace_all(headlines$title, "Cameron", "Pancake")
How do we import the IMDB dataset?
from datasets import load_dataset
imdb_dataset = load_dataset("imdb")
How should we investigate a dataset?
dataset.shape
dataset.num_columns
dataset.num_rows
dataset.column_names
type(dataset)
What is one of the biggest barriers of natural language processing?
We have to get our text into a shape that can be used by these pre-trained models.
They each expect the text to come in a very precise format with special tokens added to the text that inform the model of the start and end of sentences for example.
eg for BERT, words get turned into tokens with integer IDs, and a few special tokens are used to delimit the boundaries of the sentences.
What are the tokens for BERT?
[CLS] - all sentences start with a special token
[SEP] - all sentences end with
[UNK] - when a word is unknown
[PAD] - fills out the empty space at the end of sentences; all sentences in a batch need to be the same length so they fit into a rectangular matrix
How do we create the autotokeniser to prepare the data?
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
This now contains the tokeniser associated with the “bert-base-cased” model
What kind of design can you use Cohen’s kappa on?
Fully-crossed designs with exactly two coders
What is a difference between Cohen’s Kappa and ICC?
ICCs incorporate the magnitude of the disagreement to compute IRR estimates.
Larger magnitude disagreements result in lower ICCs than smaller-magnitude disagreements.
Which ICCs tend to have higher values?
Average-measure ICCs higher than single-measure ICCs.
What kind of deletion for missing data does ICC use?
List-wise deletion
Therefore it cannot accommodate datasets in fully-crossed designs with large amounts of missing data - Krippendorff's alpha may be more suitable when missing data poses problems in fully crossed designs.
When would you use average-measures?
When you have all subjects rated by all coders.
The researcher is likely interested in the reliability of the mean ratings provided by all coders.
What is the null hypothesis for IRR?
That ICC = 0
What is the code you need if a package is not installed?
install.packages("readr")
What outputs can you run after a model?
The sum of squares of the residuals
print(sum(curveData$residuals^2))
or anova(model)
How do you check the assumption of linearity?
plot(model, 1) - we want the first plot: residuals against the predicted model / fitted values
What function generates diagnostic plots to check if the model assumptions hold?
appraise(model)
How can you plot the model using gratia?
draw(curveGAMModel, rug = FALSE)
How do you extract the basis dimensions from the model using gratia?
model$smooth[[1]]$bs.dim
How should you investigate imported data?
data <- read.csv("file.csv")
- str(data)
- glimpse(data)
- head(data)
What does REML help with?
model stability
How do you rearrange the data to select only columns y and x2, removing all others, sorting based on x2?
stableData1 <- dplyr::select(stableData1, y, x2) %>%
arrange(x2)
How do you check for concurvity?
concurvity(model, full = TRUE)
full = FALSE for pairwise comparison
Want things to be < 0.8
For adding a different smooth type, what should we look at?
?smooth.terms
What help commands can you use for the plots in physio dash?
?dygraphs