ADMUR: Ancient Demographic Modelling Using Radiocarbon
This vignette provides a comprehensive guide to modelling population dynamics using the R package ADMUR, and accompanies the publication ‘Directly modelling population dynamics in the South American Arid Diagonal using 14C dates’, Philosophical Transactions B, 2020, A. Timpson et al. https://royalsocietypublishing.org/doi/10.1098/rstb.2019.0723
Throughout this vignette, R code blocks often use objects created earlier in the vignette in previous code blocks. However, the manual for each function provides examples with self sufficient R code blocks.
The motivation for creating the ADMUR package is to provide a robust framework to infer population dynamics from radiocarbon datasets, given the uncontroversial assumption that (to a first order of approximation) the archaeological record contains more dateable anthropogenic material from prehistoric periods when population levels were greater. Unfortunately, the spatiotemporal sparsity of radiocarbon data conspires with the wiggly nature of the calibration curve to encourage the overinterpretation of such datasets, often leading to colourful but statistically unjustified interpretations of population dynamics. No statistical method can (or ever will) be able to perfectly reconstruct the true population dynamics from such a dataset. ADMUR is no exception to this, but provides tools to infer a plausible yet conservative reconstruction of population dynamics.
ADMUR discretises time to calculate model likelihoods numerically, and uses the Bayesian framework to obtain posterior probabilities of population parameters by combining a prior with the likelihood. However, given the recent fetishization of the word ‘Bayesian’ in archaeology, it is worth reflecting on exactly what this means. The Bayesian approach provides a superb framework for justifying knowledge as beliefs that are being constantly updated in the light of new evidence. This implies that objective knowledge does not exist. Instead, given a set of observations, every observer should form a different conclusion depending on their prior beliefs. This makes intuitive sense; a child may believe they have witnessed a victim being sawn in half at a magic show, whilst a parent is unlikely to believe it was real given their strong prior beliefs formed over many years of absorbing data that informs on what is anatomically possible with a human, the concepts of showmanship, and the punishments for murder. Therefore, the beauty and great power of the Bayesian approach is in providing a sensible means of making inferences when there are strong prior beliefs and only a small amount of new data. Priors are, by their very nature, subjective. The likelihood is not (the likelihood of something being true is defined as the probability of the data give that thing is true). In Bayesian inference, the most challenging part is usually formulating the likelihood function. Once obtained as a mathematical function, making the inference Bayesian is trivial. Both the frequentist and Bayesians agree on the likelihood, but in the extreme case on no data, the frequentist cannot provide an estimate, whilst the Bayesian recovers her prior. In the opposite situation where there are weak priors and a lot of data, the priors are ‘washed out’ by the new data, and the Bayesian posterior becomes indistinguishable from the frequentist estimate; both are data-dominated estimates (i.e. dominated by the likelihood). In the case of demographic modelling, we are fundamentally interested in what the data has to tell us, and are not interested in posteriors dominated by one researcher’s strong priors and a tiny dataset. Therefore ADMUR applies a weak prior that is uniformly distributed across parameter space, and there is deliberately no provision for the user to change this prior.
The ADMUR package can be installed directly from the CRAN in the usual way:
Alternatively it can be installed from GitHub, after installing and loading the ‘devtools’ package on the CRAN:
Either way, the ADMUR package can then be locally loaded:
A summary of the available help files and data sets included in the package can be browsed, which include a terrestrial anthropogenic 14C dataset from the South American Arid Diagonal:
Datasets must be structured as a data frame that include columns ‘age’ and ‘sd’, which represent the uncalibrated 14C age and its error, respectively.
## UniqID site lat long age sd LabNo Material_D
## 1 237 Villa del Mar -17.62295 -71.34017 6360 60 Beta-71133 bone
## 2 240 Yara -17.51998 -71.36716 5020 60 Beta-80970 bone
## 3 505 El Ahogado -17.96667 -70.88333 3515 40 Pa-1769 charcoal
## 4 506 El Ahogado -17.96667 -70.88333 3535 60 Pa-1768 charcoal
## 5 507 El Ahogado -17.96667 -70.88333 3660 40 Pa-1789 charcoal
Citations are available as follows:
## To cite the ADMUR package in publications please use both the
## following:
##
## A. Timpson, R. Barberena, M. G. Thomas, C. Méndez, K. Manning.
## Directly modelling population dynamics in the South American Arid
## Diagonal using 14C dates. 2020. Philosophical Transactions of the
## Royal Society B.
## https://royalsocietypublishing.org/doi/10.1098/rstb.2019.0723
##
## ADMUR: Ancient Demographic Modelling Using Radiocarbon. Adrian
## Timpson. University College London. Research Department of Genetics,
## Environment and Evolution (GEE), Darwin Building, Gower Street,
## London, WC1E 6BT. 2020 https://github.com/UCL/ADMUR
##
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
The algorithm used by ADMUR to calculate model likelihoods of a 14C dataset uses several functions to first calibrate 14C dates. These functions are also intrinsically useful for 14C date calibration or for generating a Summed Probability Distribution (SPD).
Generating a single calibrated date distribution or SPD requires either a two-step process to give the user full control of the date range and temporal resolution, or a simpler one step process using a wrapper function that automatically estimates a sensible date range and resolution from the dataset, performs the two step process internally, and plots the SPD.
Notice the function assumes the data provided were all 14C dates. However, if you have other kinds of date such as thermoluminescence you can specify this. Non-14C types are assumed to be in calendar time, BP. You can also specify a particular calibration curve:
Generating the SPD without the wrapper gives you more control, and requires a two-step process:
This is useful for improving computational times if generating many SPDs, for example in a simulation framework, since the CalArray needs generating only once.
data <- data.frame( age = c(9144), sd=c(151) )
CalArray <- makeCalArray( calcurve=intcal20, calrange=c(8000,13000) )
cal <- summedCalibrator(data, CalArray)
plotPD(cal)
The CalArray is essentially a two-dimensional probability array of the calibration curve, and can be viewed using the plotCalArray() function. Calibration curves vary in their temporal resolution, and the preferred resolution can be specified using the parameter inc which interpolates the calibration curve. It would become prohibitively time and memory costly if analysing the entire 50,000 year range of the calibration curve at a 1 year resolution (requiring a 50,000 by 50,000 array) and in practice the default 5 year resolution provides equivalent results to 1 year resolution for study periods wider than c.1000 years.
It is worth noting that the algorithm used by this package to calibrate 14C dates gives practically equivalent results to those from OxCal generated using oxcAAR and Bchron
However, there are two fringe circumstances where these software programs differ substantially: at the border of the calibration curve; and if a date has a large error.
Consider the real 14C date [MAMS-13035] https://doi.org/10.1016/j.aeae.2015.11.003 age: 50524 +/- 833 BP calibrated through intcal13, which only extends to 46401BP. Bchron throws an error, whilst OxCal applies a one-to-one mapping between Conventional Radiocarbon (CRA) time and calendar time for any date (mean) beyond the range of the calibration curve. The latter is in theory a reasonable way to mitigate the problem, however OxCal applies this in a binary manner that can create peculiarities. Instead ADMUR gradually fades the calibration curve to a one-to-one mapping between the end of the curve and 60,000 BP.
A 14C date is typically reported as a mean date with an error, which is often interpreted as representing a symmetric Gaussian distribution before calibration. However, a Gaussian has a non-zero probability at all possible years (between -∞ and +∞), and therefore cannot fairly represent the date uncertainty which must be skewed towards the past. Specifically, if we consider the date in CRA time, it must have a zero probability of occurring in the future. Alternatively, if we consider the date as a 14C/12C ratio, it cannot be smaller than 1 (the present). Therefore ADMUR assumes a 14C date error is lognormally distributed with a mean equal to the CRA date, and a variance equal to the CRA error squared. This naturally skews the distribution away from the present. In practice, this difference is undetectably trivial for typical radiocarbon errors since the lognormal distribution approximates a normal distribution away from zero. However, theoretically the differences can be large if considering dates with large errors that are close to the present.
A naive approach to generating an SPD as a proxy for population dynamics would be to sum all dates in the dataset, but a more sensible approach is to sum the SPDs of each phase. The need to bin dates into phases is an important step in modelling population dynamics to adjust for the data ascertainment bias of some archaeological finds having more dates by virtue of a larger research interest or budget.
Therefore phaseCalibrator() generates an SPD for each phase in a dataset, and includes a binning algorithm which provides a useful solution to handling large datasets that have not been phased. For example, consider the following 8 dates from 2 sites:
## site lat long age sd LabNo
## 1192 Carrizal -6.063056 -79.49806 3640 100 Beta-31075
## 1193 Carrizal -6.063056 -79.49806 4390 110 Beta-18920
## 1194 Carrizal -6.063056 -79.49806 4450 100 Beta-31073
## 1195 Carrizal -6.063056 -79.49806 4620 100 Beta-31074
## 1196 Carrizal -6.063056 -79.49806 4690 120 Beta-27417
## 1205 Pacopampa -6.200000 -79.01000 2385 155 SI-794
## 1206 Pacopampa -6.200000 -79.01000 2765 135 SI-792
## 1207 Pacopampa -6.200000 -79.01000 2855 95 SI-793
The data have not already been phased (do not include a column ‘phase’) therefore the default binning algorithm calibrates these dates into four phases. this is achieved by binning dates that have a mean 14C date within 200 14C years of any other date in that respective bin. Therefore Pacopampa.1 comprises samples 1207 and 1206, Pacopampa.2 comprises sample 1205, Carrizal.1 comprises samples 1196 and 1195 and 1194 and 1193, and Carrizal.2 comprises sample 1192:
CalArray <- makeCalArray( calcurve=shcal20, calrange=c(2000,6000) )
x <- phaseCalibrator(data=data, CalArray=CalArray)
plotPD(x)
Finally, the distributions in each phase can be summed and normalised to unity. It is straight forward to achieve this directly from the dataframe created above:
Alternatively, the wrapper function summedPhaseCalibrator() will perform this entire workflow internally:
A CPL model lends itself well to the objectives of identifying specific demographic events. Its parameters are the (x,y) coordinates of the hinge points, which are the relative population size (y) and timing (x) of these events.
Theoretically a calibrated date should be a continuous Probability Density Function (PDF), however in practice a date is represented as a discrete vector of probabilities corresponding to each calendar year, and therefore is a Probability Mass Function (PMF). This discretisation provides the advantage that numerical methods can be used to easily calculate likelihoods, provided the model is also discretised to the same time points.
A toy() model is provided to demonstrate how this achieved. First, we simulate a plausible 14C dataset and calibrate it. The function simulateCalendarDates() automatically covers a slightly wider date range to ensure simulated 14C dates are well represented around the edges:
set.seed(12345)
N <- 350
# randomly sample calendar dates from the toy model
cal <- simulateCalendarDates(toy, N)
# Convert to 14C dates.
age <- uncalibrateCalendarDates(cal, shcal20)
data <- data.frame(age = age, sd = 50, phase = 1:N, datingType = '14C')
# Calibrate each phase, taking care to restrict to the modelled date range with 'remove.external'
CalArray <- makeCalArray(shcal20, calrange = range(toy$year))
PD <- phaseCalibrator(data, CalArray, remove.external = TRUE)
The argument ‘remove.external = TRUE’ ensures any calibrated phases with less than 50% of their probability mass within the modelled date range are excluded, reducing the effective sample size from 350 to 303. This is a crucial step to avoid mischievous edge effects of dates outside the date range. Similarly, notice we constrained the CalArray to the modelled date range. These are important to ensure that we only model the population across a range that is well represented by data. To extend the model beyond the range of available data would be to assume the absence of evidence means evidence of absence. No doubt there may be occasions when this is reasonable (for example if modelling the first colonisation of an island that has been well excavated, and the period before arrival is evidenced by the absence of datable material), but more often the range of representative data is due to research interest, and therefore the logic of only including dates with at least 50% of their probability within the date range is that their true dates are more likely to be internal (within the date range) than external.
## [1] 303
Finally we calculate the overall relative log likelihood of the model using function loglik()
## [1] -2266.526
For comparison, we can calculate the overall relative likelihood of a uniform model given exactly the same data. Intuitively this should have a lower relative likelihood, since our dataset was randomly generated from the non-uniform toy population history:
uniform.model <- convertPars(pars=NA, years=5500:7500, type='uniform')
loglik(PD=PD, model=uniform.model)
## [1] -2311.632
And indeed the toy model is thirty nine million trillion times more likely than the uniform model:
## [1] 3.882277e+19
Crucially, loglik() calculates the relative likelihoods for each effective sample separately (each phase containing a few dates). The overall model likelihood is the overall product of these individual likelihoods. This means that even in the case where there is no ascertainment bias, each date should still be assigned to its own phase, to ensure phaseCalibrator() calibrates each date separately. In contrast, attempting to calculate a likelihood for a single SPD constructed from the entire dataset would be incorrect, as this would be treating the entire dataset as a single ‘average’ sample.
We can use any out-of-the-box search algorithm to find the maximum a posteriori (MAP) estimates for the model parameters. This first requires us to describe the PD of any population model in terms of a small number of parameters, rather than a vector of probabilities for each year.
We achieve this using the Continuous Piecewise Linear (CPL) model, which is defined by the (x,y) coordinates of its hinge points.
When performing a search for the best 3-CPL model coordinates (given a dataset), only five of these eight values are free parameters. The x-coordinates of the start and end (5500 BP and 7500 BP) are fixed by the choice of date range. Additionally, one of the y-coordinates must be constrained by the other parameters, since the total probability (area) must equal 1. As a result, an n-CPL model will have 2n-1 free parameters.
We use the function convertPars() to map our search parameters to their corresponding PD coordinates. This allows us to propose independent parameter values from a uniform distribution between 0 and 1, and convert them into coordinates that describe a corresponding CPL model PD. This parameter-to-coordinate mapping is achieved using a modified stick breaking Dirichlet process.
The Dirichlet Process (not to be confused with the Dirichlet distribution) is an algorithm that can break a stick (the x-axis date range) into a desired number of pieces, ensuring all lengths are sampled evenly. The length (proportion) of remaining stick to break is chosen by sampling from the Beta distribution, such that we use the Beta CDF (with α = 1 and β = the number of pieces still to be broken) to convert an x-parameter into its equivalent x-coordinate value. We extend this algorithm for use with the CPL model by also converting y-parameters to y-coordinates as follows:
The parameters must be provided as a single vector with an odd length, each between 0 and 1 (y,x,y,x,…y). For example, a randomly generated 6-CPL model will have 11 parameters and 7 hinges:
## year pdf
## 1 5500.000 3.261064e-06
## 2 5950.539 4.771822e-07
## 3 6580.113 1.299434e-06
## 4 6929.120 3.426049e-06
## 5 7307.354 1.357384e-05
## 6 7395.293 1.031902e-02
## 7 7500.000 7.915814e-08
Note: The Area Breaking Algorithm is a heuristic that ensures all parameter space is explored and therefore the MAP parameters are always found. However, unlike the one-dimensional stick-breaking process, its mapping of random parameters to PD coordinates is not perfectly even, and we welcome ideas for a more elegant algorithm.
Any preferred search algorithm can be used. For example, the JDEoptim function from DEoptimR uses a differential evolution optimisation algorithm that performs very nicely for this application. We recommend increasing the default NP parameter to at least 20 times the number of parameters, and repeating the search to ensure consistency:
library(DEoptimR)
best <- JDEoptim(lower = rep(0,5),
upper = rep(1,5),
fn = objectiveFunction,
PDarray = PD,
type = 'CPL',
NP = 100,
trace = TRUE)
CPL <- CPLparsToHinges(pars=best$par, years=5500:7500)
SPD <- summedPhaseCalibrator( data=data, calcurve=shcal20, calrange=c(5500,7500) )
plotPD(SPD)
lines(CPL$year, CPL$pdf, lwd=2, col='firebrick')
legend(x=6300, y=max(CPL$pdf), cex=0.7, lwd=2, col='firebrick', bty='n', legend='best fitted 3-CPL')
text(x=CPL$year, y=CPL$pdf, pos=3, labels=c('H1','H2','H3','H4'))
The ADMUR function mcmc() uses the Metropolis-Hastings algorithm to search joint parameter values of an n-CPL model, given a the calibrated probability distributions of phases in a 14C dataset (PDarray). In principle the starting parameters do not matter if burn is of an appropriate length, but in practice it is more efficient to start in a sensible place such as the MAP parameters:
chain <- mcmc(PDarray=PD, startPars=best$par, type='CPL', N=100000, burn=2000, thin=5, jumps =0.025)
The acceptance ratio (AR) and raw chain (before burn-in and thinning) can be sanity checked. Ideally we want the AR somewhere in the range 0.3 to 0.5 (this can be tuned with the ‘jumps’ argument), and the raw chain to resemble ‘hairy caterpillars’:
print(chain$acceptance.ratio)
par(mfrow=c(3,2), mar=c(4,3,3,1))
col <- 'steelblue'
for(n in 1:5){
plot(chain$all.pars[,n], type='l', ylim=c(0,1), col=col, xlab='', ylab='', main=paste('par',n))
}
These parameters can then be converted to the hinge coordinates using the convertPars() function, and their marginal distributions plotted. Note, the MAP parameters (red lines) may not exactly match the peaks of these distributions because they are only marginals. Note also the dates of hinge 1 and 2 are fixed at 5500 and 7500:
hinges <- convertPars(pars=chain$res, years=5500:7500, type='CPL')
par(mfrow=c(3,2), mar=c(4,3,3,1))
c1 <- 'steelblue'
c2 <- 'firebrick'
lwd <- 3
pdf.brk <- seq(0,0.0015, length.out=40)
yr.brk <- seq(5500,7500,length.out=40)
names <- c('Date of H2','Date of H3','PD of H1','PD of H2','PD of H3','PD of H4')
hist(hinges$yr2,border=c1,breaks=yr.brk, main=names[1], xlab='');abline(v=CPL$year[2],col=c2,lwd=lwd)
hist(hinges$yr3, border=c1,breaks=yr.brk, main=names[2], xlab='');abline(v=CPL$year[3],col=c2,lwd=lwd)
hist(hinges$pdf1, border=c1,breaks=pdf.brk, main=names[3], xlab='');abline(v=CPL$pdf[1],col=c2,lwd=lwd)
hist(hinges$pdf2, border=c1,breaks=pdf.brk, main=names[4], xlab='');abline(v=CPL$pdf[2],col=c2,lwd=lwd)
hist(hinges$pdf3, border=c1,breaks=pdf.brk, main=names[5], xlab='');abline(v=CPL$pdf[3],col=c2,lwd=lwd)
hist(hinges$pdf4, border=c1,breaks=pdf.brk, main=names[6], xlab='');abline(v=CPL$pdf[4],col=c2,lwd=lwd)
Some two-dimensional combinations of joint parameters may be preferred, but still these are 2D marginal representations of 5D parameters, again with MAP in red:
library(scales)
par( mfrow=c(1,2) , mar=c(4,4,1.5,2), cex=0.7 )
plot(hinges$yr2, hinges$pdf2, pch=16, col=alpha(1,0.02), ylim=c(0,0.0005))
points(CPL$year[2], CPL$pdf[2], col='red', pch=16, cex=1.2)
plot(hinges$yr3, hinges$pdf3, pch=16, col=alpha(1,0.02), ylim=c(0,0.0015))
points(CPL$year[3], CPL$pdf[3], col='red', pch=16, cex=1.2)
Alternatively, the joint distributions can be visualised by plotting the CPL model for each iteration of the chain, with the MAP in red:
plot(NULL, xlim=c(7500,5500),ylim=c(0,0.0011), xlab='calBP', ylab='PD', cex=0.7)
for(n in 1:nrow(hinges)){
x <- c(hinges$yr1[n], hinges$yr2[n], hinges$yr3[n], hinges$yr4[n])
y <- c(hinges$pdf1[n], hinges$pdf2[n], hinges$pdf3[n], hinges$pdf4[n])
lines( x, y, col=alpha(1,0.005) )
}
lines(x=CPL$year, y=CPL$pdf, lwd=2, col=c2)
Percentage growth rates per generation provide an intuitive statistic to quantify and compare population changes through time. However there are two key issues to overcome when estimating growth rates for a CPL model.
The first problem is an extreme manifestation of the asymmetry from conventionally reporting change always with respect to the first value. For example, if we consider a population of 80 individuals at time t1, changing to 100 at t2 and to 80 at t3, this would be conventionally described as a 25% increase followed by a 20% decrease. This asymmetry is unintuitive and unhelpful in the context of population change, and instead we use a relative rate which is always calculated with respect to the larger value (e.g., a ‘20% relative growth’ followed by ‘20% relative decline’).
We overcome the second problem by calculating the expected (mean average) rate across the entire linear piece. This is achieved by notionally breaking the line into N equal pieces, such that the coordinates of the ends of the ith piece are (x1, y1) and (x2, y2). The generational (25 yr) rate r of this ith piece is: $$r_i=100\times exp[\ln(\frac{y_2}{y_1})/\frac{x_1-x_2}{25}]-100$$ and the expected rate across the entire line as N approaches +∞ is: $$\sum_{i=1}^{N}r_i/N$$
For example, a population decline from n=200 to n=160 across 100 years is conventionally considered to have a generational decline rate of $100\times exp[\ln(\frac{160}{200})/\frac{100}{25}]-100$ = 5.426% loss per generation. If partitioned into just N = 2 equal sections (n=200, n=180, n=160), we require two generational decline rates: $100\times exp[\ln(\frac{180}{200})/\frac{50}{25}]-100$ = 5.132% and $100\times exp[\ln(\frac{160}{180})/\frac{50}{25}]-100$ = 5.719%, giving a mean of 5.425%. As the number of sections N approaches +∞, the mean rate asymptotically approaches 5.507%. The similarity to the conventional rate of 5.426% is because the total percentage loss is small (20%), therefore an exponential curve between n=200 and n=160 is similar to a straight line. In contrast, a huge percentage loss of 99.5% illustrates the importance of calculating the expected growth rate, averaged across the whole line: An exponential curve between n=200 and n=1 across the same 100 years has a decline rate of $100\times exp[\ln(\frac{1}{200})/\frac{100}{25}]-100$ = 73.409% loss per 25 yr generation. Meanwhile a linear model between n=200 and n=1 across the same 100 years has an expected decline rate of 47.835% loss per generation.
The relationship between the conventional rate and relative rate is almost identical for realistic rates of change (c. -10% to +10% per generation):
N <- 1000
x <- cbind(rep(5100,N),rep(5000,N))
y <- cbind(seq(1,100,length.out=N),seq(100,1,length.out=N))
conventional <- 100 * exp(log(y[,2]/y[,1])/((x[,1]-x[,2])/25))-100
relative <- relativeRate(x,y)
plot(conventional, relative, type='l')
rect(-100,-100,c(10,0,-10),c(10,0,-10), lty=2,border='grey')
A fundamentally important issue in modelling is the need to avoid overfitting an unjustifiably complex model to data, by using a formal model selection approach. In the example above we arbitrarily chose a 3-CPL model to fit to the data (since the data was randomly sampled from a 3-CPL toy population), however, given the small sample size (n = 303) it is possible a simpler model may have better predictive power. ADMUR achieves this using the so-called Bayesian Information Criterion (BIC) aka Schwarz Information Criterion, which balances the model likelihood against the number of parameters and sample size.
Therefore we should also find the Maximum Likelihood for other plausible models such as a 4-CPL, 2-CPL, 1-CPL, exponential and even a uniform:
# CPL parameters must be between 0 and 1, and an odd length.
CPL.1 <- JDEoptim(lower=0, upper=1, fn=objectiveFunction, PDarray=PD, type='CPL', NP=20)
CPL.2 <- JDEoptim(lower=rep(0,3), upper=rep(1,3), fn=objectiveFunction, PDarray=PD, type='CPL', NP=60)
CPL.3 <- JDEoptim(lower=rep(0,5), upper=rep(1,5), fn=objectiveFunction, PDarray=PD, type='CPL', NP=100)
CPL.4 <- JDEoptim(lower=rep(0,7), upper=rep(1,7), fn=objectiveFunction, PDarray=PD, type='CPL', NP=140)
# exponential has a single parameter, which can be negative (decay).
exp <- JDEoptim(lower=-0.01, upper=0.01, fn=objectiveFunction, PDarray=PD, type='exp', NP=20)
# uniform has no parameters so a search is not required.
uniform <- objectiveFunction(NULL, PD, type='uniform')
The objective function returns the negative log-likelihood since the search algorithm seeks to minimise the objective function. It is therefore trivial to extract the log-likelihoods, and calculate the BIC scores using the formula BIC = kln (n) − 2L where k is the number of parameters, n is the effective sample size (i.e. the number of phases = 303), and L is the maximum log-likelihood.
# likelihoods
data.frame(L1= -CPL.1$value,
L2= -CPL.2$value,
L3= -CPL.3$value,
L4= -CPL.4$value,
Lexp= -exp$value,
Lunif= -uniform)
## L1 L2 L3 L4 Lexp Lunif
## 1 -2282.807 -2277.687 -2264.563 -2263.702 -2285.838 -2311.632
BIC.1 <- 1*log(303) - 2*(-CPL.1$value)
BIC.2 <- 3*log(303) - 2*(-CPL.2$value)
BIC.3 <- 5*log(303) - 2*(-CPL.3$value)
BIC.4 <- 7*log(303) - 2*(-CPL.4$value)
BIC.exp <- 1*log(303) - 2*(-exp$value)
BIC.uniform <- 0 - 2*(-uniform)
data.frame(BIC.1,BIC.2,BIC.3,BIC.4,BIC.exp,BIC.uniform)
## BIC.1 BIC.2 BIC.3 BIC.4 BIC.exp BIC.uniform
## 1 4571.328 4572.515 4557.695 4567.401 4577.391 4623.263
Clearly the 4-CPL has the highest likelihood, however the 3-CPL model has the lowest BIC and is selected as the best. This tells us that the 4-CPL is overfitted to the data and is unjustifiably complex, whilst the other models are underfitted and lack explanatory power. Nevertheless for comparison we can plot all the competing models, illustrating that the 4-CPL fits the closest, but cannot warn us that it is overfit:
# convert parameters to model PDs
CPL1 <- convertPars(pars=CPL.1$par, years=5500:7500, type='CPL')
CPL2 <- convertPars(pars=CPL.2$par, years=5500:7500, type='CPL')
CPL3 <- convertPars(pars=CPL.3$par, years=5500:7500, type='CPL')
CPL4 <- convertPars(pars=CPL.4$par, years=5500:7500, type='CPL')
EXP <- convertPars(pars=exp$par, years=5500:7500, type='exp')
# Plot SPD and five competing models:
plotPD(SPD)
cols <- c('firebrick','orchid2','coral2','steelblue','goldenrod3')
lines(CPL1$year, CPL1$pdf, col=cols[1], lwd=2)
lines(CPL2$year, CPL2$pdf, col=cols[2], lwd=2)
lines(CPL3$year, CPL3$pdf, col=cols[3], lwd=2)
lines(CPL4$year, CPL4$pdf, col=cols[4], lwd=2)
lines(EXP$year, EXP$pdf, col=cols[5], lwd=2)
legend <- c('1-CPL','2-CPL','3-CPL','4-CPL','exponential')
legend(x=6300, y=max(CPL$pdf), cex=0.7, lwd=2, col=cols, bty='n', legend=legend)
it is crucial to test if the selected model is plausible, or in other words, to test if the observed data is a reasonable outcome of the model. If the observed data is highly unlikely the model must be rejected, even if it was the best model selected.
Typically a GOF quantifies how unusual it would be for the observed data to be generated by the model. Of course the probability of any particular dataset being generated by any particular model is vanishingly small, so instead we estimate how probable it is for the model to produce the observed data, or data that are more extreme. This is the same concept as the p-value, but instead of using a null hypothesis we use the best selected model. We can generate many simulated datasets under this model, and calculate a summary statistic for each simulation. A one-tailed test will then establish the proportion of simulations that have a poorer summary statistic (more extreme) than the observed data’s summary statistic.
For each dataset (simulated and observed) we generate an SPD and use a statistic that measures how divergent each SPD is from expectation, by calculating the proportion of the SPD that sits outside the 95% CI.
summary <- SPDsimulationTest(data, calcurve=shcal20, calrange=c(5500,7500), pars=CPL.3$par, type='CPL')
The test provides a p-value of 1.00 for the best model (3-CPL), since all of the 20,000 simulated SPDs were as or more extreme than the observed SPD, providing a sanity check that the data cannot be rejected under this model, and therefore is a plausible model:
## [1] 1
The previous section provided a framework to directly select the best model given a dataset. This contrasts with the SPD simulation methodology which requires the researcher to a priori specify a single null model, then generate many simulated datasets under this null model which are compared with the observed dataset to generate a p-value. Without the model selection framework, the SPD simulation approach alone has several inferential shortcomings:
Nevertheless, the p-value from the SPD simulation framework is hugely useful in providing a Goodness of Fit test for the best selected model. Therefore the summary generated in the section ‘Goodness of fit test’ by the SPDsimulationTest() function provides a number of other useful outputs that can be plotted, including:
pvalue
the proportion of N simulated SPDs that have more points outside the 95%CI than the observed SPD has.
observed.stat
the summary statistic for the observed data (number of points outside the 95% CI).
simulated.stat
a vector of summary statistics (number of points outside the 95% CI), one for each simulated SPD.
n.dates.all
the total number of date in the whole data set. Trivially, the number of rows in data.
n.dates.effective
the effective number of dates within the date range. Will be non-integer since a proportion of some dates will be outside the date range.
n.phases.all
the total number of phases in the whole data set.
n.phases.effective
the effective number of phases within the date range. Will be non-integer since a proportion of some phases will be outside the date range.
n.phases.internal
an integer subset of n.phases.all that have more than 50% of their total probability mass within the date range.
timeseries
a data frame containing the following:
CI
several vectors of various Confidence Intervals.
calBP
a vector of calendar years BP.
expected.sim
a vector of the expected simulation (mean average of all N simulations).
local.sd
a vector of the local (each year) standard deviation of all N simulations.
model
a vector of the model PDF.
SPD
a vector of the observed SPD PDF, generated from data.
index
a vector of -1,0,+1 corresponding to the SPD points that are above, within or below the 95% CI of all N simulations.
summary <- SPDsimulationTest(data, calcurve=shcal20, calrange=c(5500,7500), pars=exp$par, type='exp')
The function plotSimulationSummary() then represents these summary results in a single plot:
The above modelling components (MCMC, GOF, model comparison, relative likelihoods, BIC etc) are not constrained to CPL models, but can be applied to any model structure. Currently ADMUR offers the following:
CPL, Uniform, Exponential, Gaussian, Cauchy, Sinusoidal, Logistic, Power law
See convertPars() for details. Care should be taken when considering a Gaussian model. The distribution of data from a single event can often superficially appear to be normally distributed due to the tendency to unconsciously apply regression methods (minimising the residuals). However, contrary to appearances (and intuitions) a Gaussian does not ‘flatten’ towards the tails, but decreases at a greater and greater rate towards zero. As a consequence, small amounts of data that are several standard deviations away from the mean appear to fit a Gaussian quite well, but are in fact absurdly improbable. Instead, for single events consider a Cauchy model, given the phenomenon that real life data usually has fatter tails than a Gaussian. Alternatively, if the waxing and waning of data is suspected to be driven by an oscillating system (such as climate) a sinusoidal model may more sensible.
The following code uses three toy datasets to demonstrate these models. After calibration through intcal20, they retain effective sample sizes of a little under n = 100.
# generate SPDs
CalArray <- makeCalArray(intcal20, calrange = c(1000,4000))
spd1 <- summedCalibrator(data1, CalArray, normalise='full')
spd2 <- summedCalibrator(data2, CalArray, normalise='full')
spd3 <- summedCalibrator(data3, CalArray, normalise='full')
# calibrate phases
PD1 <- phaseCalibrator(data1, CalArray, remove.external = TRUE)
PD2 <- phaseCalibrator(data2, CalArray, remove.external = TRUE)
PD3 <- phaseCalibrator(data3, CalArray, remove.external = TRUE)
# effective sample sizes
ncol(PD1)
ncol(PD2)
ncol(PD3)
# MAP search, fitting various models to various datasets
norm <- JDEoptim(lower=c(1000,1), upper=c(4000,5000),
fn=objectiveFunction, PDarray=PD1, type='norm', NP=40, trace=T)
cauchy <- JDEoptim(lower=c(1000,1), upper=c(4000,5000),
fn=objectiveFunction, PDarray=PD1, type='cauchy', NP=40, trace=T)
sine <- JDEoptim(lower=c(0,0,0), upper=c(1/1000,2*pi,1),
fn=objectiveFunction, PDarray=PD2, type='sine', NP=60, trace=T)
logistic <- JDEoptim(lower=c(0,0), upper=c(1,10000),
fn=objectiveFunction, PDarray=PD3, type='logistic', NP=40, trace=T)
exp <- JDEoptim(lower=c(0), upper=c(1),
fn=objectiveFunction, PDarray=PD3, type='exp', NP=20, trace=T)
power <- JDEoptim(lower=c(0,-10), upper=c(10000,0),
fn=objectiveFunction, PDarray=PD3, type='power', NP=40, trace=T)
Note the upper boundaries for the sinewave (see sinewavePDF() for details). The first parameter governs the frequency, so should be constrained by a wavelength no shorter than c. 1/10th of the date range.
Now the MAP parameters need to be converted into a PDF and plotted:
# convert parameters to model PDs
years <- 1000:4000
mod.norm <- convertPars(pars=norm$par, years, type='norm')
mod.cauchy <- convertPars(pars=cauchy$par, years, type='cauchy')
mod.sine <- convertPars(pars=sine$par, years, type='sine')
mod.uniform <- convertPars(pars=NULL, years, type='uniform')
mod.logistic <- convertPars(pars=logistic$par, years, type='logistic')
mod.exp <- convertPars(pars=exp$par, years, type='exp')
mod.power <- convertPars(pars=power$par, years, type='power')
# Plot SPDs and various fitted models:
par(mfrow=c(3,1), mar=c(4,4,1,1))
cols <- c('steelblue','firebrick','orange')
plotPD(spd1)
lines(mod.norm, col=cols[1], lwd=5)
lines(mod.cauchy, col=cols[2], lwd=5)
legend(x=4000, y=max(spd1)*1.2, lwd=5, col=cols, bty='n', legend=c('Gaussian','Cauchy'))
plotPD(spd2)
lines(mod.sine, col=cols[1], lwd=5)
lines(mod.uniform, col=cols[2], lwd=5)
legend(x=4000, y=max(spd2)*1.2, lwd=5, col=cols, bty='n', legend=c('Sinewave','Uniform'))
plotPD(spd3)
lines(mod.logistic, col=cols[1], lwd=5)
lines(mod.exp, col=cols[2], lwd=5)
lines(mod.power, col=cols[3], lwd=5)
legend(x=4000, y=max(spd3)*1.2, lwd=5, col=cols, bty='n', legend=c('Logistic','Exponential','Power Law'))
ADMUR can also combine multiple models previously described into a more complex aggregated model. For example, we might hypothesise that an island’s population dynamics are driven by the combination of some regular climatic oscillations (perhaps from orbital forcing) and growth to carrying capacity, and therefore wish to use a model that is the combination of sinusoidal and logistic. This can be achieved by using a vector of model types, and combining all the required parameters into a single vector, in their respective order. For example:
# generate SPDs
CalArray <- makeCalArray(intcal20, calrange = c(1000,4000))
spd <- summedCalibrator(data2, CalArray, normalise='full')
# calibrate phases
PD <- phaseCalibrator(data2, CalArray, remove.external = TRUE)
# MAP parameter search
combo.sine.logistic <- JDEoptim(lower=c(0,0,0,0,0), upper=c(1/1000,2*pi,1,1,10000),
fn=objectiveFunction, PDarray=PD, type=c('sine','logistic'), NP=100, trace=T)
# Convert MAP parameters to a PDF
mod.combo <- convertPars(pars=combo.sine.logistic$par, years, type=c('sine','logistic'))
# Plot the SPD and the model
Taphonomic loss has an important influence on the amount of datable material that can be recovered, with the obvious bias that older material is less likely to survive. This means that if a constant population deposited a perfectly uniform amount of material through time, we should expect the archaeological record to show an increase in dates towards the present, rather than a uniform distribution.
This taphonomic loss rate has been estimated by Surovell et al. and Bluhm and Surovell who make a compelling argument that a power function a(x + b)c provides a useful model of taphonomic loss through time (x), providing not only a good statistical fit to empirical data, but is also consistent with the mechanism that datable material is subject to greater initial environmental degradation when first deposited on the ground surface compared to the increasing protection through time as it becomes cocooned from these forces.
However, there are two important issues to consider when modelling taphonomy:
The above studies use regression methods to estimate the taphonomic curve parameters. These methods don’t incorporate the full information of the calibrated 14C dates (instead a point estimate is used for each date), and therefore are not based on likelihoods. Nor do they provide confidence intervals for the curve parameters. Finally, the parameter a is unnecessary for the purposes of population modelling, since we are not interested in estimating the absolute loss in material, merely the relative loss through time. Therefore we can consider the taphonomic curve as a PDF such that the total area across the study period equals 1. This results in the following formula, where x is time, and xmin and xmax are the time boundaries of the study period: $$\frac{(c+1)(b+x)^c}{(b+x_{max})^{(c+1)} - (b+x_{min})^{(c+1)}}$$
This defines ADMUR’s power model PDF which we apply within the MCMC framework to estimate the joint parameter distributions of b and c from the same two datasets used in the above studies, constraining the study period (as Bluhm and Surovell did) to between 1kyr and 40kyr BP as follows:
# generate an PD array for each dataset
years <- seq(1000,40000,by=50)
CalArray <- makeCalArray(intcal20, calrange = c(1000,40000),inc=50)
PD1 <- phaseCalibrator(bryson1848, CalArray, remove.external = TRUE)
PD2 <- phaseCalibrator(bluhm2421, CalArray, remove.external = TRUE)
# MCMC search
chain.bryson <- mcmc(PDarray=PD1,
startPars=c(10000,-1.5),
type='power', N=50000,
burn=2000,
thin=5,
jumps =c(250,0.075))
chain.bluhm <- mcmc(PDarray=PD2,
startPars=c(10000,-1.5),
type='power', N=50000,
burn=2000,
thin=5,
jumps =c(250,0.075))
# convert parameters to taphonomy curves
curve.bryson <- convertPars(chain.bryson$res, type='power', years=years)
curve.bluhm <- convertPars(chain.bluhm$res, type='power', years=years)
# plot
plot(NULL, xlim=c(0,12000),ylim=c(-2.5,-1), xlab='parameter b', ylab='parameter c')
points(chain.bryson$res, col=cols[1])
points(chain.bluhm$res, col=cols[2])
plot(NULL, xlim=c(0,40000),ylim=c(0,0.00025), xlab='yrs BP', ylab='PD')
N <- nrow(chain.bryson$res)
for(n in sample(1:N,size=1000)){
lines(years,curve.bryson[n,], col=cols[1])
lines(years,curve.bluhm[n,], col=cols[2])
}
Clearly the taphonomic parameters b and c are highly correlated, and although the curves superficially appear very similar, the parameters differ significantly between the two datasets.
Taphonomy can be included in any ADMUR model by including ‘power’ model parameters (taphonomic parameters b and c). We suggest constraining these parameters to 0 < b < 20000 and −3 < c < 0, but if there is better prior knowledge of this range (perhaps an independent dataset based on volcanic eruptions for the same study area) then this can be further constrained accordingly.
For example, we might perform a MAP search using the previously generated PD array, to find the best 3-CPL model with and without taphonomy as follows:
best <- JDEoptim(lower=c(0,0,0,0,0),
upper=c(1,1,1,1,1),
fn=objectiveFunction,
PDarray=PD,
type='CPL',
trace=T,
NP=100)
best.taph <- JDEoptim(lower=c(0,0,0,0,0,0,-3),
upper=c(1,1,1,1,1,20000,0),
fn=objectiveFunction,
PDarray=PD,
type=c('CPL','power'),
trace=T,
NP=140)
These parameters can then be converted to model PDFs and plotted:
CPL <- convertPars(pars=best$par, years=5500:7500, type='CPL')
CPL.taph <- convertPars(pars=best.taph$par, years=5500:7500, type=c('CPL','power'))
SPD <- summedPhaseCalibrator( data=data, calcurve=shcal20, calrange=c(5500,7500) )
plotPD(SPD)
lines(CPL$year, CPL$pdf, lwd=2, col=cols[1])
lines(CPL.taph$year, CPL.taph$pdf, lwd=2, col=cols[2])
legend(x=6300,y=0.001,legend=c('3-CPL','3-CPL with taphonomy'),bty='n',col=cols[1:2],lwd=2,cex=0.7)
The above 3-CPL with taphonomy model represents a conflation of two model components: the population dynamics and the taphonomic loss. Instead we are interested in separating these components:
pop <- convertPars(pars=best.taph$par[1:5], years=5500:7500, type='CPL')
taph <- convertPars(pars=best.taph$par[6:7], years=5500:7500, type='power')
plotPD(pop)
title('Population dynamics')
Finally the hinge coordinates of the population dynamics can be extracted:
## year pdf
## 1 5500.000 0.0001464940
## 2 6260.272 0.0001168829
## 3 6749.097 0.0009878312
## 4 7500.000 0.0006898161
We should always be cautious of assigning too much importance to point estimates. MAP estimates above are no exception to this. Smaller sample sizes will always result in larger uncertainties, and it is always better to estimate the plausible range of results. This is of particular concern with taphonomic parameters since the reanalysis of the volcanic datasets above illustrates how a large range of parameter combinations provide very similar taphonomic curves. Furthermore, when including taphonomy in the model, the taphonomic parameters have the potential to interact with the population dynamics parameters in vastly more parameter combinations that will give in many different combinations of similar overall radiocarbon date distributions. Therefore we can perform an MCMC parameter search as follows:
chain.taph <- mcmc(PDarray = PD,
startPars = c(0.5,0.5,0.5,0.5,0.5,10000,-1.5),
type=c('CPL','power'),
N = 30000,
burn = 2000,
thin = 5,
jumps = 0.025)
These can then be separated into population dynamics parameters and taphonomic parameters for either direct plotting, or converted to model PDFs and plotted:
# convert parameters into model PDFs
pop <- convertPars(pars=chain.taph$res[,1:5], years=5500:7500, type='CPL')
taph <- convertPars(pars=chain.taph$res[,6:7], years=seq(1000,30000,by=50), type='power')
# plot population dynamics PDF
plot(NULL, xlim=c(7500,5500),ylim=c(0,0.0013), xlab='calBP', ylab='PD', las=1)
for(n in 1:nrow(pop))lines(5500:7500, pop[n,],col=alpha(1,0.05))
# plot taphonomy PDF
plot(NULL, xlim=c(30000,0),ylim=c(0,0.00025), xlab='calBP', ylab='PD',las=1,)
for(n in 1:nrow(taph))lines(seq(1000,30000,by=50), taph[n,],col=alpha(1,0.02))
# plot taphonomic parameters
plot(NULL, xlim=c(0,20000),ylim=c(-3,0), xlab='parameter b', ylab='parameter c',las=1)
for(n in 1:nrow(chain.taph$res))points(chain.taph$res[n,6], chain.taph$res[n,7],col=alpha(1,0.2),pch=20)