Classification is a common data science problem that involves determining object membership in a set of categories. All three remote sensing data products may be useful for the classification task. LiDAR data provides information on the spatial variation in canopy height that may allow distinguishing species based on height, shape, or crown structure variability. Hyperspectral data allows development of spectral signatures to identify object categories and is likely the most useful remote sensing dataset for classification. RGB photographs provide visible-color spectral reflectance at a finer spatial resolution (0.10 x 0.10 m as opposed to 1.0 x 1.0 m for the hyperspectral data), which may provide additional useful information for spectral separability of species classes.

A large number of ecological, environmental, and conservation oriented questions depend on species identification. This includes efforts to conserve individual species, understand and maintain biodiversity, and incorporate the biosphere into global circulation models. Being able to describe the density and distribution of different species using remote sensing would allow these efforts to occur more rapidly and at larger scales than field sampling.

The goal of this task is to classify trees in remote sensing data to their taxonomic species. In addition to its utility for the domain, this task represents a challenging version of general classification problems because it involves classifying different species with very similar spectral signatures and categorizing data where some categories (species) have only small samples in the training set (i.e. rare species). 

To make this task independent of the delineation task, ITC delineations are provided. Participants will determine the probability that each ITC belongs to a species class. Since classification is at the level of the ITC, any pixel-level classification models must be upscaled to the crown.
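One simple way to upscale pixel-level output to the crown is to average the per-pixel class probabilities within each ITC. A minimal sketch with pandas, using the indvdID field from the submission format and hypothetical taxon codes (PIEL, QULA):

```python
import pandas as pd

# Hypothetical pixel-level class probabilities: one row per pixel,
# labeled with the crown (ITC) ID that the pixel falls inside.
pixels = pd.DataFrame({
    "indvdID": ["A", "A", "A", "B", "B"],
    "PIEL":    [0.7, 0.6, 0.8, 0.2, 0.1],
    "QULA":    [0.3, 0.4, 0.2, 0.8, 0.9],
})

# One possible upscaling rule: average the pixel probabilities per crown.
crowns = pixels.groupby("indvdID").mean()
print(crowns)
```

Other aggregation rules (e.g., majority vote over pixel-level hard labels) are equally valid; the choice is left to participants.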

One important criterion for evaluating a classification model is its ability to deal with species from outside the training set. Thus, we also evaluate participants’ classification models on their ability to correctly identify crowns to an “Other” class when they do not belong to any of the previously seen classes.

Data split

The data will be split into datasets for training to develop models and testing to evaluate model performance. In addition, since this competition will compare how well methods generalize to different forests, different data are provided for the OSBS, MLBS, and TALL sites.

Training​ ​Data

Remote sensing data, which includes the hyperspectral, LiDAR, and RGB photos, are provided for the training plots at the OSBS and MLBS sites. ITC data provided are spatial bounding boxes that define each ITC for all plots. Field data are provided for all ITCs. The field data provide the taxonomic species class for each ITC. The ITC and field data can be used in any way for developing classification methods, whether that be directly for supervised methods, or indirectly by evaluating the output of unsupervised classification methods. Since the TALL site is used as a test of how models apply to untrained sites, no TALL data are provided in the training data. Participants can use the training data for self-evaluation of their methods.

Test​ ​Data

Remote sensing data are provided for the testing plots at the OSBS, MLBS, and TALL sites. The ITC bounding boxes that define the pixels to be classified are provided, but no field data are provided. Participants apply methods developed using the training data to the testing data.

Submission​ ​Data

Participants will submit taxonomic species predictions for the ITCs in the test data. Each prediction should be a probability between 0 and 1 that the crown belongs to the associated taxon ID.

Submissions will be a single .csv file containing information on the crown ID and the taxonomic ID probabilities for each crown. The file should contain one row for each combination of ITC crown ID and taxon ID (i.e., the number of rows should equal the number of ITCs in the testing data times the number of unique taxon IDs in the training data). The dictionary of taxonomic species classifications and their unique IDs will be provided.

The testing data contains classes that are not present in the training data. Therefore, participant submission data may include ITC labels that do not have a known taxonID class. When this occurs, participants should include the class prediction as “Other” to signify that the class is unknown and not one of the classes included in the training data.

  • indvdID: the matching ID from the ITC data
  • taxonID: the predicted taxonomic code
  • probability: the probability that the crown belongs to the associated taxonID. The probabilities for a given ITC ID (including the “Other” category) will be normalized to sum to 1 if the submitted values do not already.
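The submission table above can be assembled and renormalized with pandas. A sketch assuming hypothetical taxon codes (PIEL, QULA) alongside the “Other” category:

```python
import pandas as pd

# Hypothetical long-format submission: one row per crown x taxon combination.
sub = pd.DataFrame({
    "indvdID":     ["A", "A", "A", "B", "B", "B"],
    "taxonID":     ["PIEL", "QULA", "Other"] * 2,
    "probability": [0.6, 0.3, 0.3, 0.5, 0.25, 0.25],
})

# Renormalize so each crown's probabilities sum to 1 (the evaluation
# will do this anyway if the submitted values do not).
sub["probability"] /= sub.groupby("indvdID")["probability"].transform("sum")

sub.to_csv("submission.csv", index=False)
```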

Performance​ ​Metrics

The primary metric for assessing species classification will be the macro F1 score:

F1 = 2 x (precision x recall) / (precision + recall), averaged over all classes.

The F1 score for each class is the harmonic mean of that class’s precision and recall, and the macro F1 score is the unweighted mean of the per-class scores. The score will weight each class equally regardless of the number of individuals in each class. This makes this score good for unbalanced datasets because it weights the performance on rare classes (species) equally.

For the site where no training data are provided, the primary comparisons will be made only on species that are in the training data for the other two sites. To assess models’ ability to identify unseen classes, we will also compute a second macro F1 score with unseen classes designated as “Other”.
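This second score can be sketched as follows: any true label outside the set of training classes is remapped to “Other” before macro F1 is computed (taxon codes here are hypothetical):

```python
from sklearn.metrics import f1_score

# Hypothetical training classes; ACRU never appeared in training.
train_classes = {"PIEL", "QULA"}
y_true = ["PIEL", "QULA", "ACRU", "PIEL"]
y_pred = ["PIEL", "QULA", "Other", "QULA"]

# Remap unseen true labels to "Other" before scoring.
y_true_mapped = [t if t in train_classes else "Other" for t in y_true]
score = f1_score(y_true_mapped, y_pred, average="macro")
```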

Both of these F1 scores will be computed using the following sklearn function: 

sklearn.metrics.f1_score(y_true, y_pred, average='macro')
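A small toy example illustrates why macro averaging suits unbalanced data: the rare class contributes equally to the macro score, whereas a support-weighted average is dominated by the common class:

```python
from sklearn.metrics import f1_score

# Imbalanced toy labels: class "A" has 8 individuals, class "B" only 2,
# and one "B" individual is misclassified as "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]

macro    = f1_score(y_true, y_pred, average="macro")     # rare class weighted equally
weighted = f1_score(y_true, y_pred, average="weighted")  # weighted by class support
```

Here the macro score is lower than the weighted score because the error on the rare class counts as much as performance on the common class.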

For submissions that include probabilistic classification (i.e., a probability that each individual belongs to each class), we will assess model performance incorporating this uncertainty using cross-entropy. Average cross-entropy is defined as:

-(1/N) Σ_i Σ_c δ(y_i, c) log(p_i,c)

where N is the number of crowns and p_i,c is the submitted probability that crown i belongs to class c. The δ(x, y) is a function that takes a value of 1 when x = y and 0 otherwise. This metric rewards participants for submitting well-calibrated probabilities that reflect their uncertainty about which crowns belong to which class. If the probability values do not sum to 1, they will be renormalized. Cross-entropy scores will be computed using sklearn’s function as follows:

sklearn.metrics.log_loss(y_true, y_pred, eps=1e-15, normalize=True)
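As a usage sketch, log_loss expects a probability matrix whose columns follow the supplied label order (taxon codes here are hypothetical):

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical predictions for three crowns over three classes; the
# columns of the matrix follow the order given in `labels` below.
y_true = ["PIEL", "QULA", "Other"]
probs = np.array([
    [0.1, 0.8, 0.1],   # crown 1: most mass on PIEL (correct)
    [0.2, 0.2, 0.6],   # crown 2: most mass on QULA (correct)
    [0.7, 0.2, 0.1],   # crown 3: most mass on Other (correct)
])
score = log_loss(y_true, probs, labels=["Other", "PIEL", "QULA"])
```

Confident, correct predictions drive the score toward 0; confident wrong predictions are penalized heavily, which is why well-calibrated probabilities are rewarded.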

Finally, full confusion matrices will be calculated for each submission to allow for further analysis, discussion and comparison, particularly to identify classes that are commonly confused (e.g., species within a genus) across methods. Confusion matrices will be computed using sklearn’s function as follows: 

sklearn.metrics.confusion_matrix(y_true, y_pred)
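For example, with hypothetical taxon codes, the off-diagonal entries of the matrix reveal which species pairs a model confuses:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: two oak codes (QULA, QUGE) that are easily confused.
y_true = ["QULA", "QULA", "QUGE", "QUGE", "PIEL"]
y_pred = ["QULA", "QUGE", "QULA", "QUGE", "PIEL"]

# Rows are true classes, columns predicted classes, in the `labels` order;
# off-diagonal counts show the misclassifications between the two oaks.
cm = confusion_matrix(y_true, y_pred, labels=["PIEL", "QUGE", "QULA"])
```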