Goodness-of-fit test for the von Mises distribution: the bootstrapped Watson test in R

The von Mises distribution, a close approximation of the wrapped Gaussian, is a particularly popular option to describe circular, unimodal and symmetric data. The parameters of the distribution, the mean and the kappa, can be estimated using classical fitting procedure like Maximum Likelihood Estimation (MLE) or Bayesian estimations.

For model selection, one should be able to test whether the choice of a von Mises to describe a given dataset is sound. The use of a standard goodness-of-fit test like Kolmogorov-Smirnoff (K-S) or the Cramér-von Mises criterion (C-vM) is not recommended, because of the need for setting an arbitrary reference point on the circle in order to define the CDF. In other words, because the space is circular, one should expect invariance of the test results to space rotations, something not considered in the K-S and C-vM test. Two alternatives are usually used when dealing with circular data: the Kuiper and the Watson tests. Here, I propose a basic bootstrap implementation of the Watson test in R.

The U2 statistic and its asymptotic null distribution

Bootstrap script in R

Existing implementations of the Watson tests in R are part of the Circular and CircStats packages. They are using a critical points table for the asymptotic U2 distribution provided by Lockhart & Stephens (1985), which links different (Kappa,Alpha) ranges to U2 diagnostic values. However, this table is only providing significance ranges, and has an lower bound limit of 0.005 for the p-value. Here, we will try a very basic implementation of a bootstrap version of this test, where instead of the original table, we will be computing a bootstrapped U2 distribution to compare to our data. This way, p-values can be approximated at an arbitrary precision level.

First, we need to define our R function for the U2 statistic:

require("circular")

# Watson U^2 statistic
get.u2<-function(x){
    n <- length(x)
    res <- mle.vonmises(x, bias=FALSE)
    mu.hat <- res$mu
    kappa.hat <- res$kappa
    x <- (x - mu.hat) %% (2 * pi)
    x <- matrix(x, ncol = 1)
    z <- apply(x, 1, pvonmises, mu=0, kappa=kappa.hat, tol=1e-020)
    z <- sort(z)
    z.bar <- mean.default(z)
    i <- 1:n
    sum.terms <- (z - (2 * i - 1)/(2 * n))^2
    Value <- sum(sum.terms) - n * (z.bar - 0.5)^2 + 1/(12 * n)
    return(Value)
}

This function is largely inspired from the original watson.test (Circular package). Note the estimate of the two parameters via MLE (the “res” variable). These values are then used to calculate the U2 value:

Now, we just have to define a function that will generate random samples from a true von Mises distribution, and estimate the U2 value for different groups of samples.


bootstrap.u2<-function(x,n=100){
  samples<-length(x) # number of samples in the original data
  mle.result<-mle.vonmises(x,bias=F) # von Mises MLE parameters estimate for the original data 
  i=0
  
  bootstrap.u2.distrib<-c()
  
  while(i<n){
    boots<-rvonmises(samples,mle.result$mu,mle.result$kappa) # generate deviates from a von Mises with the estimated parameters from the original data
    u2.boots<-get.u2(boots) # get U2 statistics for the current random deviates
    bootstrap.u2.distrib<-c(bootstrap.u2.distrib,u2.boots)
    i=i+1
  }
  results<-sort(bootstrap.u2.distrib)
  return(results)
}

This function takes the original angular values as a vector, calculates the U2 of the original dataset, and estimates the mean and kappa via MLE. Importantly, the estimated parameters values from the original data are then used to generate random von Mises deviates using the rvonmises() function. The number of samples is determined by the original vector size. The U2 is then calculated for these each vector of random deviates, and stored in the bootstrap.u2.distrib vector. This process is iterated for n bootstraps. The function returns an approximation of the U2 distribution for a given kappa and mean, which represents our “null” distribution, that is, the U2 values and their probabilities that corresponds to a given von Mises distribution.

To test whether a given circular distribution is significantly differing from a von Mises, we can count the number of time our U2 statistic from the “null” distribution (returned by our bootstrap.u2() function) falls above the U2 value from our empirical distribution (calculated using get.u2() on our original data). The ‘p-value’ can then be inferred by dividing this value by the total number of U2 samples in the distribution.

References

Lockhart, R. A., & Stephens, M. A. (1985). Tests of fit for the von Mises distribution. Biometrika, 72(3), 647-652. https://doi.org/10.1093/biomet/72.3.647

Sun, Z. (2009). Comparing measures of fit for circular distributions (Doctoral dissertation).
Retrieved from https://dspace.library.uvic.ca/bitstream/handle/1828/2698/zhengsun_master_thesis.pdf

Watson, G. S. (1961). Goodness-of-fit tests on a circle. Biometrika, 48(1/2), 109-114.
https://doi.org/10.1093/biomet/48.1-2.109

Machine learning for birds conservation

The ‘microfaune’ (or ‘micro-fauna’ in English) project was started last September within our research division at Wazo, the NGO I co-founded. Microfaune aims to develop machine-learning tools for bioacoustics research, in order to foster birds and wildlife conservation in cities.

The goal of the project is to improve the assessment of urban biodiversity with deep learning algorithms. A first step involves the detection of birdsong from audio recordings, made at the Cité Universitaire de Paris using devices provided by the Cornell University Laboratory of Ornithology. The contributions of this project are:

  • A platform for annotating bird songs (presence or absence)
  • A model allowing the rapid identification chunking and labelling of bird songs
  • An open-source labelled database
The Cornell Lab of Ornithology provided us with state-of-the-art recording tools. Independent recording devices can be installed at specific locations of interest in the canopy. A typical device is able to collect approx 250 GO of data every 15 days.

The project, led by Hadrien Jean and his team, was selected for the Fall 2019 and Fall 2020 season of DataForGood, the French incubator for common good which provide AI resources and solutions for the future of humanity.

The code has been made freely available on github, and can be deployed on Google Cloud AI platform and AWS.

Wazo just awarded the Medal of honour from the City of Paris

Wazo – Cité internationale des Oiseaux, a France-based NGO I co-founded two years ago to fight biodiversity decline in cities, has been awarded the prestigious Medal of Honor of the City of Paris. Wazo develops machine-learning algorithms and bioacoustics tools for birds and wildlife conservation. We are particularly involved in certain parks in Paris (e.g. the Cité internationale universtaire park, in the 14th arrondissement).