Defining the model¶
Let's begin by defining a function that accumulates evidence in steps. The accumulation is stochastic: each step is drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$. The diffusion stops once the accumulated evidence passes one of the boundaries.
drift = function(mu = .1, sigma = .5, bounds = c(-10, 10), bias = 0){
  d = bias
  ds = d
  while (d < bounds[2] & d > bounds[1]){
    e = rnorm(1, mu, sigma)
    ds = c(ds, e)
    d = d + e
  }
  return(cumsum(ds))
}
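As a quick sanity check: the loop only exits once the accumulated evidence reaches or passes a bound, so the last value of any returned trial should lie at or beyond one of the boundaries.

```r
set.seed(1) #for reproducibility
trial = drift()
abs(trial[length(trial)]) >= 10 #TRUE: the trial ended by crossing a boundary
```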
The function returns a single trial by default:
plot(drift(), type = 'l')
However, it is trivial to generate many trials with the help of the replicate function.
nTrials = 1000
data = replicate(nTrials, drift())
head(sapply(data, length)) #note how each trial has a different length
[1]  92 156 101 165 133  82
Let's try plotting them all
plotDrifts = function(data, bounds = c(-10, 10), bias = 0){
  plot(data[[1]], type = 'l',
       xlim = c(1, max(sapply(data, length))),
       ylim = c(-1, 1)*max(sapply(data, max)),
       xlab = 'Step', ylab = 'Evidence')
  for (l in 2:length(data)){
    lines(data[[l]], type = 'l')
  }
  abline(h = bounds)
  abline(h = bias, lty = 'dashed')
}
plotDrifts(data)
We can also calculate the overall proportion of "correct" choices; in this case, trials in which the accumulated evidence crossed the upper bound.
getAcc = function(trials){
  return(mean(sapply(trials, function(x) sign(x[length(x)])) == 1))
}
getAcc(data) #perfect accuracy
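As a sanity check on that near-perfect accuracy, the probability of reaching the upper bound first has a known closed form for a Wiener process with drift (the continuous idealization of our step-wise walk, so small deviations are expected):

```r
# P(hit upper bound before lower | start at bias), for a Wiener process
# with drift mu and standard deviation sigma (standard first-passage
# result; assumes mu != 0)
pUpper = function(mu = .1, sigma = .5, bounds = c(-10, 10), bias = 0){
  x = bias - bounds[1]      #distance from the lower bound
  b = bounds[2] - bounds[1] #distance between the bounds
  (1 - exp(-2*mu*x/sigma^2)) / (1 - exp(-2*mu*b/sigma^2))
}
pUpper() #~0.9997: errors should occur on roughly 3 in 10000 trials
```

With the default parameters, observing 1000/1000 correct simulated trials is therefore unsurprising.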
Similarly, we can get the mean reaction time, along with its histogram and density plots. However, we first need to come up with a value that describes how much time each step takes.
getRTs = function(trials, step = .1){
  return(sapply(trials, length)*step)
}
rts = getRTs(data)
op = par(mfrow = c(1, 2))
hist(rts, breaks = 20)
abline(v = mean(rts), lty = 'dashed')
plot(density(rts), main = mean(rts))
abline(v = mean(rts), lty = 'dashed')
par(op)
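The mean RT can be checked the same way: for an unbiased start between symmetric bounds at $\pm a$, the expected first-passage time of a Wiener process is $(a/\mu)\tanh(a\mu/\sigma^2)$ steps, another standard result that the discrete simulation only approximates:

```r
# Expected decision time for a Wiener process starting at 0 between
# bounds at -a and a, converted to the same units as getRTs
expectedRT = function(mu = .1, sigma = .5, a = 10, step = .1){
  (a/mu) * tanh(a*mu/sigma^2) * step
}
expectedRT() #~10, which should be close to mean(rts) above
```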
Modeling bias: changing the starting point¶
The model can capture bias in two ways. The first is to shift the starting point of each diffusion process (z, in the model) towards the biased response. The SDT equivalent of z is, of course, c.
nTrials = 200
bias = 5
drate = .02
bounds = c(-10, 10)
abiased = replicate(nTrials, drift(mu = drate, bounds = bounds, bias = bias))
bbiased = replicate(nTrials, drift(mu = -drate, bounds = bounds, bias = bias))
plotDrifts(abiased, bias = bias)
plotDrifts(bbiased, bias = bias)
#Get accuracies
getAcc(abiased)
1-getAcc(bbiased)
#Show RTs
arts = getRTs(abiased)
brts = getRTs(bbiased)
plot(density(arts), main = c(mean(arts), mean(brts)), col = 'blue')
abline(v = mean(arts), lty = 'dashed', col = 'blue')
lines(density(brts), col = 'red')
abline(v = mean(brts), lty = 'dashed', col = 'red')
Modeling bias: Using different drift rates for each response¶
The second way to model bias is to change the mean of the normal process that generates the drift rate for each stimulus. If we assume normal trial-to-trial variation with a fixed standard deviation, then we can assign a larger drift rate to the biased response. In essence, we are expanding the drift rate parameter, $\mu$, as:
$$ \mu \sim Norm(\mu, \sigma_\mu) $$where $\sigma_\mu$ is the standard deviation of this specific normal process (not to be confused with the $\sigma$ used in the evidence sampling process).
#Set parameters
adrift = .05
bdrift = -.0075
driftsd = .01
#Let's look at the drift distributions
hist(rnorm(1000, adrift, driftsd))
hist(rnorm(1000, bdrift, driftsd))
a = replicate(nTrials, drift(mu = rnorm(1, adrift, driftsd)))
b = replicate(nTrials, drift(mu = rnorm(1, bdrift, driftsd)))
plotDrifts(a)
plotDrifts(b)
getAcc(a)
1-getAcc(b)
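These accuracies can also be predicted analytically. For an unbiased start between bounds at $\pm a$, the Wiener first-passage probability of hitting the upper bound reduces to a logistic in the drift, $P(upper) = 1/(1 + e^{-2\mu a/\sigma^2})$; averaging it over the trial-to-trial drift distribution (here by simple Monte Carlo) approximates the expected accuracy for each stimulus, under the same continuous idealization as before:

```r
# P(hit upper bound | drift mu), unbiased start, bounds at -a and a
pUpperGivenMu = function(mu, sigma = .5, a = 10){
  1/(1 + exp(-2*mu*a/sigma^2))
}
# Average over the drift distribution of each stimulus
mean(pUpperGivenMu(rnorm(1e5, adrift, driftsd)))     #~.97, predicted getAcc(a)
1 - mean(pUpperGivenMu(rnorm(1e5, bdrift, driftsd))) #~.63, predicted 1-getAcc(b)
```

Note that most errors for b come from trials whose sampled drift is small in magnitude (and hence near chance), not only from trials where the drift has the wrong sign.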
Let's take a look at the RTs
arts = getRTs(a)
brts = getRTs(b)
plot(density(arts), main = c(mean(arts), mean(brts)), col = 'blue')
abline(v = mean(arts), lty = 'dashed', col = 'blue')
lines(density(brts), col = 'red')
abline(v = mean(brts), lty = 'dashed', col = 'red')
Quantile plots¶
A way to summarize RT distributions for correct and incorrect choices is the quantile plot. On the x-axis, quantile plots specify the proportion of correct choices. Each line (usually 5) represents a quantile of the RT distribution (.1, .3, .5, .7, and .9), and each point on a line represents an experimental condition. Let's say we are running a prevalence experiment with multiple prevalence values. We'll choose to simulate prevalence effects as a change in the starting point of the diffusion process.
Additionally, the starting point of each trial, z, is randomized on a trial-by-trial basis by sampling uniformly from a range (zrange in the code).
Caution: I'm using the parallel package to run these using as many cores as possible. If you're running them yourself, you'll need about 3GB of free memory for the data alone.
#Set parameters
nTrials = 1000 #the smoothness of the final plot is very sensitive to the number of trials simulated (for obvious reasons)
bounds = c(-10, 10)
prevs = -7:0
musd = .012
zrange = .2 #for the uniform sampling
mu = .03
library(parallel)
no_cores = detectCores()
cl = makeCluster(no_cores)
clusterExport(cl, c('bounds', 'zrange', 'mu', 'musd', 'nTrials', 'drift'))
data = parLapply(cl, prevs, function(x)
  replicate(nTrials, drift(mu = rnorm(1, mu, musd),
                           bounds = bounds,
                           bias = runif(1, min = x - zrange, max = x + zrange))))
stopCluster(cl) #release the workers once the simulation is done
It takes about 5 minutes to get the data, and it feels like an eternity; I'm very spoiled.
With the data in hand, we can define a function to obtain the quantiles for each condition.
#define function for getting the quantiles of a given condition
getQuantiles = function(data, quantiles = seq(.1, .9, .2)){
  #get RTs
  rtdata = getRTs(data)
  #identify correct responses
  corrects = sapply(data, function(x) sign(x[length(x)]) == 1)
  pcorr = mean(corrects)
  #get quantiles for correct and incorrect choices
  corr = quantile(rtdata[corrects], quantiles)
  incorr = quantile(rtdata[!corrects], quantiles)
  return(
    data.frame(RT = c(corr, incorr),
               q = quantiles,
               pcorr = rep(c(pcorr, 1-pcorr), each = length(quantiles)))
  )
}
#use sapply to do it all
plot_data = do.call('rbind',
                    sapply(1:length(prevs),
                           function(x) cbind(prev = prevs[x], getQuantiles(data[[x]])),
                           simplify = F))
Finally, plot the data.
library(ggplot2)
ggplot(plot_data, aes(x = pcorr, y = RT, colour = as.factor(q))) +
geom_line() + geom_point() + theme_bw() + labs(x = 'P(correct)', colour = 'Quantile') +
geom_line(data = plot_data[plot_data$prev == 0 & plot_data$q == .5, ], linetype = 'dashed', colour = 'black') +
geom_line(data = plot_data[plot_data$prev == -7 & plot_data$q == .5, ], linetype = 'solid', colour = 'black') +
geom_point(data = plot_data[plot_data$prev == 0 & plot_data$q == .5, ], aes(x = sum(pcorr*pcorr), y = sum(RT*pcorr)),
colour = 'black') +
geom_point(data = plot_data[plot_data$prev == -7 & plot_data$q == .5, ], aes(x = sum(pcorr*pcorr), y = sum(RT*pcorr)),
colour = 'black') +
coord_cartesian(xlim = c(0, 1))
Note that within each quantile line (different colors), points should be interpreted in pairs, starting from the middle (.5) and moving outwards in each direction. As one moves further outwards, the task becomes easier and easier (in this case, meaning that the starting point becomes unbiased, i.e. z = 0).
A very weird, non-monotonic plot. As we approach maximum difficulty (which in this case would mean that the starting point for stimulus A sits right on top of the bound for stimulus B), we reach the boundary and the function flips. At that point the model is undefined: participants would be infinitely fast at making incorrect choices and infinitely slow at making correct ones.
Boundary aside, the DDM predicts that overall RTs should be faster on high-prevalence trials than on low-prevalence trials. In the plot, this is illustrated by the black lines (matching the plotting code: dashed for the prev = 0 condition, solid for prev = -7). The dashed line connects the expected RTs on correct and incorrect trials for high-prevalence trials. On high-prevalence trials, the probability of being correct is, of course, much larger than the probability of being incorrect, and so the weighted average of RTs should approximate the expected RT for correct trials (black dot). Conversely, the solid line connects the expected correct and incorrect RTs on low-prevalence trials. Here, the probability of being correct is only slightly larger than the probability of being incorrect; hence, the weighted average of RTs on these trials should be close to the midpoint of the line (black dot).
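To make the weighted-average logic concrete, here is a toy calculation with made-up numbers (the values are hypothetical, not taken from the simulation):

```r
# A hypothetical high-prevalence condition: correct responses dominate,
# so the weighted mean RT sits close to the correct-trial RT
p = .95            #probability of a correct choice
rt_correct = 1.2   #expected RT on correct trials (made-up value)
rt_incorrect = 1.8 #expected RT on incorrect trials (made-up value)
p*rt_correct + (1 - p)*rt_incorrect #1.23, pulled towards rt_correct
```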