Intro

Can we improve the odds of finding new fossil sites during palaeoanthropological reconnaissance? Much has changed since the early days of human palaeontology in Africa, when miners using explosive charges uncovered hominin fossils in limestone breccias. Today, military, topographic, and geological maps, aerial photographs taken from planes or drones, and software such as Google Earth or ArcGIS are ubiquitous. These spatial datasets help identify high-priority areas before multidisciplinary teams begin systematic surveys (Njau and Hlusko, 2010). Although major fossil finds are still attributed to serendipity, modern surveying is no longer just a question of being in the right place at the right time.

Geospatial palaeontology is an emerging discipline that uses remote sensing data, GIS, and statistical algorithms to model the probability of finding new fossiliferous deposits (Anemone et al., 2011). Most of this research has focused on identifying new Eocene deposits that contain early primates among other vertebrates (Conroy, 2014; Emerson et al., 2015). Such approaches have yet to be applied, however, to regions known to be rich in hominin-bearing deposits, such as the Omo-Turkana basin.

Methods

# The following packages are required to run all the code in this notebook.
# Install any that are missing from within R with install.packages("nameofpackage")

library(raster)
library(maptools)
library(ggplot2)
library(zoo)
library(oneClass) # for svm
library(dismo) # for maxEnt
rangePlot <- function(x){(x - min(x))/(max(x) - min(x))} # function to standardize ranges used in plots
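As a quick check of the rangePlot helper defined above, the sketch below applies it to a few toy vectors. The input values are made up for illustration. The helper is repeated so the snippet runs on its own, and the two edge cases (a constant vector and a vector containing NA) are worth keeping in mind before using it on raster values.

```r
# Same helper as above, repeated so this snippet is self-contained
rangePlot <- function(x){(x - min(x))/(max(x) - min(x))}

# Toy input (illustrative values only): rescaled to the 0-1 interval
scaled <- rangePlot(c(2, 4, 6))
print(scaled)  # 0.0 0.5 1.0

# Edge case 1: a constant vector gives 0/0, so every element is NaN
constant <- rangePlot(c(5, 5, 5))
print(constant)

# Edge case 2: min() and max() return NA when any value is missing,
# so a single NA turns the whole result into NA
with_na <- rangePlot(c(1, NA, 3))
print(with_na)
```

If missing values are expected, one option is to pass na.rm = TRUE to min() and max() inside the helper. That would be a change to the function as written here, not something this notebook already does.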

We are applying several supervised one-class classification algorithms to a dataset of fossiliferous areas at Koobi Fora. A one-class classifier has the same goal as a supervised binary classifier: to separate a class of interest from everything else. Unlike in most standard classification problems, however, its training data contain labelled samples from the positive class only (here, fossil sites), whereas a binary classifier also requires labelled samples of the negative class. Collecting a representative training set for the negative class can be costly and time-consuming, because the negative class aggregates every class other than the positive one (Mack and Waske, 2017). A one-class classifier is therefore particularly useful when only one class has to be mapped and when representative labelled data for the negative class are expensive to acquire or difficult to define. Moreover, why should we assume a priori that other land-cover classes (e.g. shrubland, grassland) are unfossiliferous, instead of letting the model determine this from the distribution of the data? An early pipeline has already been developed using a subset of georeferenced fossils from the Turkana Database (Bobe et al., 2011).
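To make the idea of training on the positive class only more concrete, here is a minimal base-R sketch. It is not the SVM or MaxEnt model used in this notebook: a simple Gaussian envelope based on Mahalanobis distance stands in for them, purely for illustration. The band values are simulated, and the names fit_one_class and predict_one_class are hypothetical helpers, not functions from any package.

```r
# Illustrative one-class classifier trained on positive samples only.
# ASSUMPTION: a Mahalanobis-distance envelope stands in for the SVM and
# MaxEnt models; all data are simulated, not from the Turkana Database.
set.seed(42)

# Hypothetical training set: two spectral values for 200 known fossil-site pixels
positives <- cbind(band_a = rnorm(200, mean = 0.30, sd = 0.03),
                   band_b = rnorm(200, mean = 0.55, sd = 0.05))

# Fit: summarise the positive class by its mean vector and covariance matrix
fit_one_class <- function(x) {
  list(center = colMeans(x), cov = cov(x))
}

# Predict: squared Mahalanobis distance to the positive class; pixels inside
# the chosen chi-squared quantile are flagged as candidate fossil areas
predict_one_class <- function(model, newdata, level = 0.95) {
  d2 <- mahalanobis(newdata, center = model$center, cov = model$cov)
  d2 <= qchisq(level, df = length(model$center))
}

model <- fit_one_class(positives)

# Unlabelled pixels: one resembling the training sites, one very different
unlabelled <- rbind(similar   = c(0.31, 0.54),
                    different = c(0.80, 0.10))
pred <- predict_one_class(model, unlabelled)
print(pred)
```

No negative samples are ever supplied: the model only learns what fossil-site pixels look like and treats anything sufficiently far from that description as "other". The 0.95 level is an arbitrary choice for this sketch.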

Code pipeline

We begin by loading into R a Landsat 8 scene downloaded from the USGS. For reproducibility, we provide its ID (LANDSAT_SCENE_ID = "LC81690572018036LGN00") so that others can download the same file and rerun all the analyses. We chose the most recent good-quality image available that was cloud-free over Koobi Fora; it covers a northeastern section of Lake Turkana and was acquired on 5 February 2018.
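The scene ID itself encodes the WRS-2 path and row and the acquisition date, so it can be checked against the description above. The sketch below decodes it and then, only if the files are present, stacks the reflective bands with the raster package. The helper parse_scene_id, the directory layout under "data", and the "_B1.TIF" to "_B7.TIF" file-name pattern are assumptions made for this example; the actual file names depend on how the scene was downloaded.

```r
scene_id <- "LC81690572018036LGN00"

# Hypothetical helper: split a 21-character Landsat scene ID into its parts
parse_scene_id <- function(id) {
  stopifnot(nchar(id) == 21)
  list(
    sensor   = substr(id, 1, 3),                                 # "LC8"
    path     = as.integer(substr(id, 4, 6)),                     # WRS-2 path
    row      = as.integer(substr(id, 7, 9)),                     # WRS-2 row
    acquired = as.Date(substr(id, 10, 16), format = "%Y%j"),     # year + day of year
    station  = substr(id, 17, 19),                               # ground station
    version  = substr(id, 20, 21)                                # archive version
  )
}

info <- parse_scene_id(scene_id)
print(info$acquired)  # "2018-02-05"

# ASSUMPTION: bands were unpacked into data/<scene_id>/ as *_B1.TIF ... *_B7.TIF
scene_dir  <- file.path("data", scene_id)
band_files <- list.files(scene_dir, pattern = "_B[1-7]\\.TIF$", full.names = TRUE)

if (length(band_files) > 0 && requireNamespace("raster", quietly = TRUE)) {
  scene <- raster::stack(band_files)  # one layer per band file
  print(scene)
} else {
  message("No band files found in ", scene_dir, "; skipping raster stack.")
}
```

Day 036 of 2018 is 5 February, which matches the acquisition date given above, and the path and row decode to 169 and 57.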