Lecture 1 Flashcards
behavioural data science =
Behavioral Data Science is a multidisciplinary
scientific field that aims to facilitate
understanding, prediction, and change of
human behavior through the analysis of
behaviorally defined variables as they arise in
large datasets (“Big Data”), typically gathered
using modern digital technology (e.g., online
or through mobile devices) and analyzed with
techniques for detecting patterns from high-
dimensional data (e.g., machine learning).
important point about this
- multidisciplinary field: crossroad of methodology, psychology, mathematical modeling and statistics.
3 goals of BDS
facilitate understanding, prediction, and change of human behavior
understanding =
CONSTRUCTION OF PSYCHOLOGICAL THEORIES TO EXPLAIN BEHAVIOR
prediction =
APPLICATION OF STATISTICAL MODELS TO PREDICT BEHAVIOR
change =
DEVELOPMENT OF INTERVENTIONS TO CHANGE BEHAVIOR
Human behaviour is at the root of
many of the most central problems of
our time: COVID-19 spread and
climate change but also war and
famine have important behavioural
components
okay
Human behaviour = (Skinner, 1987)
“is possibly the most difficult subject ever submitted to scientific analysis”
Yet standard methods to study it are remarkably simple: questionnaires, tests, and small-scale experiments
so the concept is complicated, while the methods are (too?) simple
what do we call this era for the social sciences?
The golden age of social science
twitter experiment segregation
Example of a polarized and segregated network on Twitter. The network visualizes retweets of political hashtags from the 2010 US midterm elections. The nodes represent Twitter users and there is a directed edge from node i to node j if user j retweeted user i. Colors represent political preference: red for conservatives and blue for progressives
what did Twitter show?
- faster connections between people
- but these contacts tend toward polarization
variables are…
abstract entities!!! they are made up.
phenomena=
patterns in data, robust features of the world
data =
representations of observations
- Observation example: “Pete correctly solved IQ test item 36”
- Representation: the row that represents Pete has a 1 in the column that represents the IQ item
- Typically, data are structured in rows and columns, i.e., in a spreadsheet
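A minimal sketch of the step from observations to a data matrix, in Python. The persons, item numbers, and outcomes are hypothetical and only illustrate the "1 in Pete's row, item 36 column" idea.

```python
# Hypothetical observations: (person, item number, solved correctly?)
observations = [
    ("Pete", 36, True),
    ("Pete", 37, False),
    ("Anna", 36, False),
    ("Anna", 37, True),
]

persons = sorted({p for p, _, _ in observations})  # rows = cases
items = sorted({i for _, i, _ in observations})    # columns = features

# Build the data matrix: one row per person, one column per item, 1 = correct.
data = {p: {i: None for i in items} for p in persons}
for person, item, correct in observations:
    data[person][item] = 1 if correct else 0

# Print it as a small spreadsheet.
print("person " + " ".join(f"item{i}" for i in items))
for p in persons:
    print(f"{p:<6} " + " ".join(f"{data[p][i]:>6}" for i in items))
```

The observation "Pete correctly solved item 36" ends up as data["Pete"][36] == 1.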
okay
rows =
cases
columns =
represent features/properties/attributes
difference between data and phenomena
Phenomena are not themselves data! Rather, phenomena are evidenced by patterns in the data. Because psychology is very complex, we often need advanced statistical models to “see” the patterns.
Data = observations. Phenomena are robust: they do not depend on the method used or on where and when the data were collected.
examples of phenomena
For instance, the positive
manifold of intelligence,
the robust correlation
between insomnia and
depression, the effect of
time pressure on
accuracy
positive manifold =
refers to the fact that scores on cognitive assessments tend to correlate positively with each other, which is often taken to indicate a strong common latent dimension.
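A small simulation sketch of what a positive manifold looks like in data. It assumes, purely for illustration, that every test score is a common factor plus independent noise; the loading of 0.7, the number of tests, and the sample size are made-up values, and a common factor is only one possible explanation of the pattern.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
n_people, n_tests = 2000, 4
loading = 0.7  # assumed strength of the common factor (hypothetical value)

# scores[t] holds the scores of all simulated people on test t.
scores = [[] for _ in range(n_tests)]
for _ in range(n_people):
    g = rng.gauss(0, 1)  # this person's value on the common factor
    for t in range(n_tests):
        unique = rng.gauss(0, 1) * (1 - loading ** 2) ** 0.5
        scores[t].append(loading * g + unique)

# Correlate every pair of tests.
correlations = {
    (a, b): pearson(scores[a], scores[b])
    for a in range(n_tests)
    for b in range(a + 1, n_tests)
}
for pair, r in correlations.items():
    print(pair, round(r, 2))
```

Every pairwise correlation comes out positive, which is the pattern in the data that the phenomenon refers to.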
explanatory theory=
is a set of principles that aims to explain phenomena
It describes a world in which the phenomena would follow “as a matter of course”
Coming up with a good theory is a creative act,
but it can be systematized and practiced
Ideally, in behavioral data science we are after
mathematically formulated models
okay
so what is the difference between theory, phenomena, and data?
theories explain phenomena, and phenomena are evidenced by the data
Identical twins’ cognitive test scores are more similar
than those of fraternal twins. This feature is best represented as
a) data
b) a phenomenon
c) an explanatory theory
b
lexical decision task=
- Participants have to decide whether a letter string is a word (e.g., tango) or a nonword (e.g., drapa).
- Participants usually decide by pressing a keyboard key with their index fingers.
- Participants may judge hundreds or even thousands of letter strings in a single session.
- Usually the stimulus set contains 50% words.
- Performance on this task is supposed to measure the ease with which lexical representations are activated from memory.
- For instance, performance is better for high-frequency words (e.g., cat) than for low-frequency words (e.g., feline).
- Participants are usually told to do this as quickly and accurately as possible.
- Key dependent variables of interest are response time (RT) and accuracy (proportion correct responses).
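A minimal sketch of how the two dependent variables could be computed per condition. The trials, condition labels, and response times below are invented for illustration, and mean RT is taken over all trials here (restricting it to correct trials is another common choice).

```python
from statistics import mean

# Hypothetical trials: (stimulus, condition, response time in seconds, correct?)
trials = [
    ("cat", "high_freq", 0.52, True),
    ("dog", "high_freq", 0.55, True),
    ("feline", "low_freq", 0.71, True),
    ("canine", "low_freq", 0.78, False),
    ("drapa", "nonword", 0.80, True),
    ("blick", "nonword", 0.75, True),
]

def summarize(trials, condition):
    """Return (mean RT, proportion correct) for one condition."""
    subset = [t for t in trials if t[1] == condition]
    mean_rt = mean(t[2] for t in subset)
    accuracy = mean(1 if t[3] else 0 for t in subset)
    return mean_rt, accuracy

for condition in ("high_freq", "low_freq", "nonword"):
    rt, acc = summarize(trials, condition)
    print(f"{condition}: mean RT = {rt:.3f} s, accuracy = {acc:.2f}")
```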
lexical decision task results
older adults are slower than younger adults (from roughly age 30 onward)
how is this result on the LDT explained?
This decrease of response speed is explained by
the “general slowing” hypothesis, which says that
all cognitive processes operate more slowly in
older adults.
Maybe age-related demyelinization harms basic
neural transmission speed?
what if you only look at response time?
then the speed-accuracy trade-off is not taken into account -> so you also have to look at the number of correct responses!
so do not look only at response time! That would be biased against people who take it slower but are more accurate.
so use a process model instead -> e.g. Ratcliff's diffusion model
ratcliff diffusion model
- A model that describes how noisy evidence is accumulated over time.
- A model that describes the data from simple decision making experiments (so it concerns decisions made within a few seconds).
- A model that allows manifest behavior to be decomposed into latent psychological processes.
diffusion process in action
small particles visible in the light through a window -> an example of a random walk.
what are sequential sampling and drift rate?
- In the model, noisy information is accumulated over time (=sequential sampling).
- The deterministic or signal component of this
noisy process is called the drift rate.
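One common way to write this accumulation in discrete time steps is sketched below. The symbols are assumptions for illustration and do not come from the lecture: X(t) is the accumulated evidence, v the drift rate (signal), s the noise scale, and \Delta t a small time step.

```latex
X(t + \Delta t) = X(t) + v\,\Delta t + s\sqrt{\Delta t}\;\varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, 1)
```

The term v\,\Delta t is the deterministic part of each step; the term with \varepsilon_t is the noise.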
suppose you see a word
there is still randomness, but you move toward a decision quickly. With a difficult word it is harder, and there is more randomness at first.
what do you assess in the model
the distance of the boundary to the word to the red line.
what do repeated draws lead to?
Repeated draws from the underlying lexical
dimension drive a noisy accumulation of
evidence.
After some time the accumulated evidence
reaches a predetermined threshold amount,
and the corresponding response is initiated.
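A simplified simulation sketch of this accumulate-to-threshold idea. It is not the full Ratcliff model (for example, it leaves out non-decision time), and the drift rates, boundary values, noise level, and step size are hypothetical choices. With a positive drift, reaching the upper boundary counts as the correct response.

```python
import random

def simulate_trial(drift, boundary, rng, noise=1.0, dt=0.002, max_time=10.0):
    """Simulate one decision; return (response, decision_time).

    Evidence starts midway between the lower boundary (0) and the upper
    boundary (`boundary`). The response is "upper", "lower", or None when
    no boundary is reached within max_time seconds.
    """
    evidence = boundary / 2
    t = 0.0
    step_sd = noise * dt ** 0.5
    while t < max_time:
        # One noisy draw: signal (drift) plus random noise.
        evidence += drift * dt + rng.gauss(0, step_sd)
        t += dt
        if evidence >= boundary:
            return "upper", t
        if evidence <= 0:
            return "lower", t
    return None, t

def summarize(drift, boundary, n_trials=1000, seed=0):
    """Return (proportion of upper responses, mean decision time)."""
    rng = random.Random(seed)
    results = [simulate_trial(drift, boundary, rng) for _ in range(n_trials)]
    finished = [(r, t) for r, t in results if r is not None]
    p_upper = sum(r == "upper" for r, _ in finished) / len(finished)
    mean_time = sum(t for _, t in finished) / len(finished)
    return p_upper, mean_time

easy = summarize(drift=2.0, boundary=1.5)      # easy stimulus, high drift
hard = summarize(drift=0.5, boundary=1.5)      # hard stimulus, low drift
cautious = summarize(drift=0.5, boundary=3.0)  # hard stimulus, wider boundaries

for label, (p, rt) in [("easy", easy), ("hard", hard), ("cautious", cautious)]:
    print(f"{label}: proportion correct = {p:.2f}, mean decision time = {rt:.2f} s")
```

In this sketch a higher drift rate gives faster and more accurate responses, while wider boundaries give slower but more accurate responses. The second pattern is the speed-accuracy trade-off that response time alone cannot separate from ability.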