Data Challenge

CVPR 2019 AAMVEM Workshop Challenge

GOAL: achieve the highest detection and classification accuracy of organisms in the provided videos.


Please find the annotated training data here, and the challenge data here.

Data Overview

The data releases comprise images and annotations from five different data sources, with six datasets in total.

  • HabCam: habcam_seq0
  • MOUSS: mouss_seq2, mouss_seq3
  • AFSC DropCam: afsc_seq0
  • MBARI: mbari_seq0
  • NWFSC: nwfsc_seq0

Each dataset contains different types of imagery, with different lighting conditions, camera angles, and wildlife. Which data were released from each dataset depends on the nature of that dataset as a whole. The following gallery contains an example image from each dataset. See details below.

HabCam (habcam_seq0): The HabCam imagery is collected from a downward-looking RGB camera about 1.5 m above the ocean floor. The platform includes synchronized flashes near the cameras, because there is no ambient illumination at the collection depths (>200 m). The images tend to show scallops, sand dollars, rocks, sand, and the occasional fish. The annotations include scallops and fish. The initial dataset has 10465 images, each 2720x1024. Note that the HabCam imagery was captured in stereo: each image in this dataset contains both the left and right camera views stacked horizontally. Annotations are provided only for the left camera.
MOUSS (mouss_seq2): The MOUSS data is collected from a stationary, horizontally looking grayscale camera 1-2 meters above the ocean floor, using ambient illumination. Typically the camera stays on the bottom for 15 minutes in each position. The initial data is from three such collections, consisting of 159 images with species labels on all fish close enough to the camera to be identified by a human expert. Fish smaller than this (about 30 pixels in length) will be disregarded in scoring, i.e. missing them will not count against recall and detecting them will not count as false alarms. The test data without released annotations will include images from the training collections as well as novel collections. The images have a resolution of 968x728.
MOUSS (mouss_seq3): This dataset was also collected with the MOUSS platform described above.
AFSC DropCam (afsc_seq0): The AFSC data is collected from an underwater remotely operated vehicle (ROV) equipped with an RGB video camera looking horizontally. The overall dataset consists of a number of videos from different ROV missions. Because the platform moves slowly, some of the images have a fair amount of spatial overlap; such groups of images are called “clusters”. The released data is randomly sampled from some of the clusters, while other clusters are sequestered and will only be released in the test data. There are 571 images in this dataset. The image resolution is 2112x2816.
MBARI (mbari_seq0): The MBARI dataset was collected by the Monterey Bay Aquarium Research Institute. It contains a single video consisting of 666 RGB frames, each with a resolution of 1920x1080.
NWFSC (nwfsc_seq0): The NWFSC data was also collected from an ROV, but looking downward at the ocean floor. The spatial overlap in this collection is minimal, and the released data is randomly sampled from the overall set. There are 123 images in the initial release, each with a resolution of 2448x2050. The annotations in this dataset are keypoints rather than bounding boxes.
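Because each HabCam image stacks the left and right camera views side by side and only the left view is annotated, it can help to crop images to the left view or to check which half a box falls in. The sketch below assumes the two views have equal width (1360 pixels each in a 2720-pixel-wide image); that even split is an assumption, not something stated above, and the helper names are hypothetical.

```python
def left_view_box(width, height):
    """Crop box (left, upper, right, lower) for the left view of a stacked stereo image.

    Assumption: the left and right views have equal width, so the left view
    is simply the left half of the stacked image.
    """
    return (0, 0, width // 2, height)


def in_left_view(bbox, stacked_width):
    """Return True if a COCO-style [x, y, w, h] box lies entirely in the left half."""
    x, _, w, _ = bbox
    return x >= 0 and x + w <= stacked_width // 2


if __name__ == "__main__":
    print(left_view_box(2720, 1024))                 # (0, 0, 1360, 1024)
    print(in_left_view([100, 200, 50, 40], 2720))    # True
    print(in_left_view([1300, 200, 100, 40], 2720))  # False: crosses into the right view
```

With Pillow, for example, img.crop(left_view_box(*img.size)) would return the left view, since Image.crop takes a (left, upper, right, lower) tuple. If test images follow the same convention, boxes in the right half would have no matching ground truth, so checking them may be worthwhile.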

Ground Truth Annotation Format

The annotations are formatted according to the MSCOCO standard. This JSON format lists the images included, the annotations for those images, and the categories used by those annotations.

See Sections 1 and 2 on the official documentation page for more information.
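When working with these files it is often convenient to index the three top-level lists ("images", "annotations", "categories") by ID. A minimal sketch using only the Python standard library; the function names are illustrative and not part of any challenge tooling.

```python
import json
from collections import defaultdict


def index_mscoco(data):
    """Index an already-parsed MSCOCO dict by ID.

    Returns (images by id, categories by id, list of annotations per image id).
    """
    images = {img["id"]: img for img in data.get("images", [])}
    categories = {cat["id"]: cat for cat in data.get("categories", [])}
    annots_by_image = defaultdict(list)
    for ann in data.get("annotations", []):
        annots_by_image[ann["image_id"]].append(ann)
    return images, categories, annots_by_image


def load_mscoco(path):
    """Read an MSCOCO JSON file from disk and index it."""
    with open(path) as f:
        return index_mscoco(json.load(f))
```

For example, load_mscoco("habcam_seq0.mscoco.json") would read the default flavor through its symlink in the annotations tarball. Images with no annotations simply map to an empty list.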

Results Annotation Format

The submitted results files must be in the COCO results format; a file that does not comply with this standard will cause your submission to fail. This format is a JSON array with one entry per detection, giving the image ID and category ID (both defined in the input MSCOCO file), your detector's score, and the detection's geometry. Although the COCO results format can also carry keypoints, please note that your submission files may (currently) contain only bounding box detections, not keypoints.
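A minimal sketch of building and writing bounding-box results in the COCO results format, where bbox is [x, y, width, height] in pixels with (x, y) the top-left corner. Rejecting zero-size boxes is this sketch's own choice, not a stated challenge rule, and the helper names are hypothetical.

```python
import json


def make_result(image_id, category_id, bbox, score):
    """Build one COCO-results entry for a bounding-box detection (no keypoints)."""
    x, y, w, h = (float(v) for v in bbox)
    if w <= 0 or h <= 0:
        # Our own sanity check: a degenerate box is almost certainly a bug.
        raise ValueError("bbox width and height must be positive")
    return {
        "image_id": int(image_id),
        "category_id": int(category_id),
        "bbox": [x, y, w, h],
        "score": float(score),
    }


def write_results(results, path):
    """Write result entries as a single JSON array."""
    with open(path, "w") as f:
        json.dump(results, f)
```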

Archive Structure

The data are in two tarballs, structured as detailed below:

Imagery Tarball

The imagery tarball contains 6 folders, each corresponding to a dataset. Each folder contains the images belonging to that dataset in either JPEG or PNG format. The image names have not been changed and should be considered arbitrary.


Annotations Tarball

The root of the annotations tarball has 6 symlinks and 6 folders. Each folder corresponds to a dataset and contains 5 different “flavors” of the dataset (see Dataset Details for more information on the flavors). The 6 symlinks point to the default “coarse-bbox-only” flavor of each dataset.

data-challenge-training-annotations.tar.gz
├── afsc_seq0.mscoco.json -> afsc_seq0/afsc_seq0-coarse-bbox-only.mscoco.json
├── habcam_seq0.mscoco.json -> habcam_seq0/habcam_seq0-coarse-bbox-only.mscoco.json
├── mouss_seq0.mscoco.json -> mouss_seq0/mouss_seq0-coarse-bbox-only.mscoco.json
├── mouss_seq1.mscoco.json -> mouss_seq1/mouss_seq1-coarse-bbox-only.mscoco.json
├── mbari_seq0.mscoco.json -> mbari_seq0/mbari_seq0-coarse-bbox-only.mscoco.json
├── nwfsc_seq0.mscoco.json -> nwfsc_seq0/nwfsc_seq0-coarse-bbox-only.mscoco.json
├── afsc_seq0
│   ├── afsc_seq0-coarse-bbox-only.mscoco.json
│   ├── afsc_seq0-fine-bbox-only.mscoco.json
│   ├── afsc_seq0-coarse-bbox-keypoints.mscoco.json
│   ├── afsc_seq0-fine-bbox-keypoints.mscoco.json
│   └── original_afsc_seq0.mscoco.json
├── habcam_seq0
│   ├── habcam_seq0-coarse-bbox-only.mscoco.json
│   ├── habcam_seq0-fine-bbox-only.mscoco.json
│   ├── habcam_seq0-coarse-bbox-keypoints.mscoco.json
│   ├── habcam_seq0-fine-bbox-keypoints.mscoco.json
│   └── original_habcam_seq1.mscoco.json
├── mouss_seq0
├── mouss_seq1
├── mbari_seq0
└── nwfsc_seq0
..

Dataset Details

The six original datasets use several disparate annotation formats: some objects were annotated with boxes, others with lines, and others with points. Furthermore, the raw class labels were inconsistent between datasets. For these reasons we preprocessed and standardized the data, which was itself a challenge. First, all datasets were converted into the MSCOCO format.

To both capture the original nature of the datasets and provide ready-to-use annotations, we created 4 standardized flavors of the annotations. For each dataset we created variants with either coarse- or fine-grained categories (see notes about category standardization), each with or without the more challenging keypoint annotations (see notes about keypoint annotations). We also include the original annotations with their raw categories.

Thus, for each dataset (afsc_seq0, habcam_seq0, mbari_seq0, mouss_seq0, mouss_seq1, nwfsc_seq0) there are 5 annotation files.

  • <dataset>-coarse-bbox-only.mscoco.json
  • <dataset>-fine-bbox-only.mscoco.json
  • <dataset>-coarse-bbox-keypoints.mscoco.json
  • <dataset>-fine-bbox-keypoints.mscoco.json
  • original_<dataset>.mscoco.json

Note that contestants may choose to be evaluated on the coarse-grained or fine-grained categories or both. Contestants will not be scored on keypoint annotations, but they may be used in training.
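Following the naming pattern shown in the archive listing (e.g. afsc_seq0/afsc_seq0-coarse-bbox-only.mscoco.json), a small helper can build the path to any flavor. This is a sketch that assumes every dataset follows that pattern; the listing above shows original_habcam_seq1.mscoco.json inside habcam_seq0, so check the extracted files for exceptions. The function name is hypothetical.

```python
from pathlib import Path

# The four standardized flavors; the raw annotations use an "original_" prefix instead.
FLAVORS = (
    "coarse-bbox-only",
    "fine-bbox-only",
    "coarse-bbox-keypoints",
    "fine-bbox-keypoints",
)


def annotation_path(root, dataset, flavor="coarse-bbox-only"):
    """Path of one annotation flavor inside the extracted annotations tarball."""
    root = Path(root)
    if flavor == "original":
        return root / dataset / f"original_{dataset}.mscoco.json"
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor: {flavor!r}")
    return root / dataset / f"{dataset}-{flavor}.mscoco.json"
```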

Category standardization and hierarchy

In an effort to standardize the class labels between the different datasets we have relabeled the categories in the original datasets by mapping each category to the appropriate and most specific scientific organism name. This mapping defines the fine-grained categorization.

Because many classes had only a few examples, we coarsened the categorization by merging related classes (e.g. all types of rockfish, greenlings, etc. were merged into the Scorpaeniformes category). This significantly increases the number of examples per class for most categories.

In both the coarse and fine-grained cases, we provide a category hierarchy, created using the NCBI taxonomic database. This encodes, for instance, that annotations originally labeled “Rockfish” might reasonably be labeled “Sebastes maliger” or “Sebastes ruberrimus” in the fine-grained case. In the coarse-grained case, the category “Fish” has the children “Pleuronectiformes” (flatfish) and “NotPleuronectiformes” (roundfish). The category hierarchy is encoded as a tree (actually a forest) in the MSCOCO data using the “supercategory” attribute. Note that each dataset contains annotations from both leaf and non-leaf categories.
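A minimal sketch of recovering the hierarchy from the category list, assuming the “supercategory” field holds the parent category's name and is null, missing, or empty for roots (an assumption about the encoding, not confirmed above). Helper names are hypothetical.

```python
def build_hierarchy(categories):
    """Map each category name to its parent name (None for roots)."""
    return {cat["name"]: (cat.get("supercategory") or None) for cat in categories}


def ancestors(name, parent_of):
    """Ancestors of a category, nearest first, e.g. Fish -> Chordata -> ..."""
    chain = []
    parent = parent_of.get(name)
    while parent is not None:
        if parent in chain:
            raise ValueError(f"cycle in category hierarchy at {parent!r}")
        chain.append(parent)
        parent = parent_of.get(parent)
    return chain


def leaf_names(parent_of):
    """Categories that are not the supercategory of any other category."""
    parents = {p for p in parent_of.values() if p is not None}
    return {name for name in parent_of if name not in parents}
```

Because annotations can use non-leaf categories, one option is to treat a detection as compatible with a ground-truth label when one is an ancestor of the other; whether the scorer does anything like this is not stated here.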

Images without annotations

In some cases an image may have no annotations but still contain objects of interest. To indicate when this is the case, we augment each image object in the MSCOCO JSON dataset with an attribute “has_annots”, which can be true, false, or null. If it is true, the image contains objects of interest even if no annotation objects are associated with it (e.g. because its keypoint annotations were removed). If it is false, the image was explicitly labeled as having no objects of interest. If it is null, the image might or might not contain objects of interest, although in most cases such an image contains none.
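The three-valued “has_annots” flag can be folded into a per-image status. A minimal sketch; the status names are hypothetical labels chosen here. Only images marked "negative" are explicitly known to be empty, so they are the safest choice if you want hard-negative training images.

```python
def image_status(image, annots_by_image):
    """Classify an MSCOCO image dict using its annotations and "has_annots" flag."""
    if annots_by_image.get(image["id"]):
        return "annotated"
    flag = image.get("has_annots")
    if flag is True:
        return "unannotated_positive"  # objects present, annotations absent (e.g. removed keypoints)
    if flag is False:
        return "negative"              # explicitly labeled as having no objects of interest
    return "unknown"                   # null or missing: usually empty, but not guaranteed
```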

Bounding Box, Line, and Keypoint Annotations

Originally, the datasets contained annotations in the form of boxes, lines, and points. In our preprocessing step, we converted line annotations into boxes by interpreting each line as the diameter of a circle and taking the box that circumscribes that circle. The majority of annotations are bounding boxes. However, in a significant number of images each object is labeled with a keypoint.
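The line-to-box conversion described above can be sketched as follows: the line's midpoint is the circle's center, half its length is the radius, and the axis-aligned square around that circle becomes a COCO [x, y, w, h] box. Whether the preprocessing also clipped boxes to the image bounds is not stated, so this sketch does not clip.

```python
import math


def line_to_bbox(x1, y1, x2, y2):
    """Convert a line annotation to a COCO [x, y, w, h] box.

    The line is treated as the diameter of a circle; the result is the
    axis-aligned square that circumscribes that circle (no clipping).
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    r = math.hypot(x2 - x1, y2 - y1) / 2.0
    return [cx - r, cy - r, 2 * r, 2 * r]


if __name__ == "__main__":
    print(line_to_bbox(0, 0, 6, 8))      # [-2.0, -1.0, 10.0, 10.0]
    print(line_to_bbox(10, 20, 30, 20))  # [10.0, 10.0, 20.0, 20.0]
```

Note that a diagonal line near an image edge can yield a box that extends past the image, as in the first example.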

For these keypoint annotations, the points are not always in consistent locations on the object; often the point does not even touch the object of interest. The general rule used when creating the keypoint annotations is that each point should be unambiguously associated with a single object. As a result, using these keypoints as ground truth for training an object detector is difficult.

For these reasons, keypoint annotations will not count toward final scoring. We provide them for optional use in training a bounding box detector. For convenience, we also provide a flavor of each dataset with the keypoint annotations removed. Note that we do not remove the affected images from the dataset, only their annotations. This can make an image appear to contain no objects when in fact it does (see Images without annotations above for details).
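If you start from a keypoint flavor and want to reproduce a bbox-only view yourself, a sketch like the following drops keypoint-only annotations, keeps every image, and flags images that lost all their annotations with has_annots = true, mirroring the behavior described above. The rule for spotting a keypoint-only annotation (a "keypoints" field and no "bbox") is a hypothetical assumption; inspect the actual files before relying on it.

```python
import copy


def strip_keypoint_annotations(dataset):
    """Return a copy of an MSCOCO dict without keypoint-only annotations.

    Hypothetical rule: an annotation is keypoint-only if it has "keypoints"
    but no (or an empty) "bbox". Images are never removed.
    """
    out = copy.deepcopy(dataset)
    kept, dropped_from = [], set()
    for ann in out.get("annotations", []):
        if "keypoints" in ann and not ann.get("bbox"):
            dropped_from.add(ann["image_id"])
        else:
            kept.append(ann)
    out["annotations"] = kept
    still_annotated = {ann["image_id"] for ann in kept}
    for img in out.get("images", []):
        # An image that lost every annotation still contains objects of interest.
        if img["id"] in dropped_from and img["id"] not in still_annotated:
            img["has_annots"] = True
    return out
```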

The following gallery illustrates images with different styles of annotations.

Phase1 Dataset Statistics

The following is a table summarizing statistics for each dataset. The “roi shapes” row indicates the number of annotations of each type (e.g. bbox, keypoints, line) in the original data. The “#negative images” row indicates the number of images with no objects of interest. In the case of nwfsc_seq1, images were explicitly labeled as negative, but for habcam_seq1, images without annotations might contain unannotated objects of interest.

For each dataset we summarize the number of annotations for each coarse category.

Finally, the following trees illustrate the coarse-grained and fine-grained category hierarchies. The suffix of each node is the number of annotations in the phase1 data with that label across all datasets. We then summarize this data on a per-dataset basis.

Coarse Hierarchy
├── “Physical”:0
│   ├── “Animalia”:0
│   │   ├── “Decapoda”:623
│   │   ├── “Echinodermata”:189
│   │   ├── “Chordata”:0
│   │   │   ├── “Fish”:929
│   │   │   │   ├── “Pleuronectiformes”:2050
│   │   │   │   └── “NotPleuronectiformes”:1913
│   │   │   │       ├── “Scorpaeniformes”:4580
│   │   │   │       ├── “Gadiformes”:697
│   │   │   │       ├── “Perciformes”:1809
│   │   │   │       ├── “Rajiformes”:317
│   │   │   │       ├── “Osmeriformes”:11
│   │   │   │       └── “Carcharhiniformes”:175
│   │   │   └── “Aplousobranchia”:1
│   │   └── “Mollusca”:0
│   │       ├── “Cephalopoda”:2
│   │       ├── “Gastropoda”:186
│   │       └── “Osteroida”:42133
│   └── “NonLiving”:2067
└── “ignore”:0
Fine-Grained Hierarchy
├── “Physical”:0
│   ├── “Animalia”:0
│   │   ├── “Chordata”:0
│   │   │   ├── “Fish”:929
│   │   │   │   ├── “Pleuronectiformes”:441
│   │   │   │   │   ├── “Solea solea”:1
│   │   │   │   │   └── “Pleuronectidae”:0
│   │   │   │   │       ├── “Hippoglossus stenolepis”:53
│   │   │   │   │       ├── “Glyptocephalus zachirus”:1541
│   │   │   │   │       ├── “Atheresthes stomias”:4
│   │   │   │   │       ├── “Hippoglossoides elassodon”:5
│   │   │   │   │       ├── “Lepidopsetta bilineata”:3
│   │   │   │   │       ├── “Eopsetta jordani”:1
│   │   │   │   │       └── “Parophrys vetulus”:1
│   │   │   │   └── “NotPleuronectiformes”:1909
│   │   │   │       ├── “Rajidae”:316
│   │   │   │       │   └── “Dipturus oxyrinchus”:1
│   │   │   │       ├── “Osmeridae”:11
│   │   │   │       ├── “Clupea harengus”:1
│   │   │   │       ├── “Carcharhinus”:0
│   │   │   │       │   └── “Carcharhinus plumbeus”:175
│   │   │   │       ├── “Gadiformes”:0
│   │   │   │       │   ├── “Gadidae”:3
│   │   │   │       │   │   ├── “Gadus macrocephalus”:351
│   │   │   │       │   │   └── “Pollachius”:43
│   │   │   │       │   └── “Merluccius productus”:300
│   │   │   │       ├── “Hydrolagus colliei”:2
│   │   │   │       ├── “Lophius”:1
│   │   │   │       ├── “Perciformes”:0
│   │   │   │       │   ├── “Bathymasteridae”:30
│   │   │   │       │   │   └── “Bathymaster signatus”:50
│   │   │   │       │   ├── “Zaprora silenus”:3
│   │   │   │       │   ├── “Stichaeidae”:17
│   │   │   │       │   ├── “Pholidichthys leucotaenia”:17
│   │   │   │       │   ├── “Pristipomoides”:0
│   │   │   │       │   │   ├── “Pristipomoides filamentosus”:856
│   │   │   │       │   │   └── “Pristipomoides sieboldii”:772
│   │   │   │       │   └── “Zoarcidae”:0
│   │   │   │       │       ├── “Lycodes”:55
│   │   │   │       │       │   └── “Lycodes diapterus”:8
│   │   │   │       │       └── “Lycodopsis pacificus”:1
│   │   │   │       └── “Scorpaeniformes”:0
│   │   │   │           ├── “Cottoidea”:57
│   │   │   │           │   ├── “Agonidae”:94
│   │   │   │           │   └── “Cottidae”:0
│   │   │   │           │       ├── “Hemilepidotus hemilepidotus”:59
│   │   │   │           │       └── “Icelinus filamentosus”:2
│   │   │   │           ├── “Hexagrammidae”:2
│   │   │   │           │   ├── “Pleurogrammus monopterygius”:16
│   │   │   │           │   ├── “Hexagrammos decagrammus”:0
│   │   │   │           │   └── “Ophiodon elongatus”:1
│   │   │   │           ├── “Liparidae”:1
│   │   │   │           ├── “Anoplopoma fimbria”:1
│   │   │   │           └── “Sebastidae”:0
│   │   │   │               ├── “Sebastes”:3083
│   │   │   │               │   ├── “Sebastes alutus”:252
│   │   │   │               │   ├── “Sebastes polyspinis”:750
│   │   │   │               │   ├── “Sebastes ciliatus”:88
│   │   │   │               │   ├── “Sebastes variegatus”:120
│   │   │   │               │   ├── “Sebastes zacentrus”:4
│   │   │   │               │   ├── “Sebastes melanostictus”:1
│   │   │   │               │   ├── “Sebastes melanops”:4
│   │   │   │               │   ├── “Sebastes proriger”:5
│   │   │   │               │   ├── “Sebastes borealis”:1
│   │   │   │               │   ├── “Sebastes brevispinis”:11
│   │   │   │               │   ├── “Sebastes helvomaculatus”:9
│   │   │   │               │   ├── “Sebastes ruberrimus”:3
│   │   │   │               │   ├── “Sebastes maliger”:1
│   │   │   │               │   ├── “Sebastes elongatus”:8
│   │   │   │               │   ├── “Sebastes emphaeus”:2
│   │   │   │               │   └── “Sebastes saxicola”:3
│   │   │   │               └── “Sebastolobus”:1
│   │   │   │                   └── “Sebastolobus altivelis”:1
│   │   │   └── “Didemnum”:1
│   │   ├── “Decapoda”:0
│   │   │   ├── “Brachyura”:64
│   │   │   │   ├── “Chionoecetes bairdi”:2
│   │   │   │   └── “jonah_or_rock_crab”:555
│   │   │   ├── “shrimp”:1
│   │   │   └── “Homarus americanus”:1
│   │   ├── “Echinodermata”:0
│   │   │   ├── “Asteroidea”:1
│   │   │   │   └── “Rathbunaster californicus”:187
│   │   │   └── “Psolus segregatus”:1
│   │   └── “Mollusca”:0
│   │       ├── “Cephalopoda”:0
│   │       │   ├── “Octopoda”:1
│   │       │   └── “Teuthida”:1
│   │       ├── “Gastropoda”:0
│   │       │   ├── “Nudibranchia”:1
│   │       │   └── “Buccinum undatum”:185
│   │       └── “Placopecten magellanicus”:0
│   │           ├── “scallopdead”:146
│   │           │   └── “scallopclapper”:114
│   │           └── “scalloplive”:39363
│   │               └── “scallopswimming”:2510
│   └── “NonLiving”:1963
│       ├── “DustCloud”:96
│       └── “Rock”:8
└── “ignore”:0
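The per-node suffixes count annotations labeled directly with each category, not with its descendants (which is why intermediate nodes such as “Animalia” show 0). A minimal sketch that reproduces this kind of count across several MSCOCO files; it maps category IDs to names separately for each file in case IDs are not shared between files.

```python
from collections import Counter


def count_by_category(datasets):
    """Count annotations per category name across several parsed MSCOCO dicts.

    Only direct labels are counted; descendants are not rolled up into parents.
    """
    counts = Counter()
    for data in datasets:
        id_to_name = {c["id"]: c["name"] for c in data.get("categories", [])}
        for ann in data.get("annotations", []):
            counts[id_to_name[ann["category_id"]]] += 1
    return counts
```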


The challenge will evaluate accuracy in detection and classification, following the methodology of the MSCOCO Detection Challenge, for bounding box output (not pixel-level segmentation masks). The annotations used for scoring are bounding boxes around every animal, each with a species classification label. Kitware’s online challenge platform will score submissions automatically; participants upload their detections (in the COCO results format) to the challenge website. Kitware hosts a number of challenges in the biomedical domain, which can be found here.
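MSCOCO-style detection scoring matches each detection to ground truth by intersection-over-union (IoU) and reports average precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below covers only the IoU criterion for COCO [x, y, w, h] boxes, useful for sanity-checking predictions locally; the full metric also involves score-ranked matching and interpolated precision, as implemented in pycocotools' COCOeval. It is not the challenge scorer itself.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two COCO-style [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis (zero if the boxes do not overlap).
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds used for COCO-style AP: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]
```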


The submission process is handled on our data challenge site. For a successful submission, the following steps must be taken:

  • For each folder in the challenge data release, run your classifier on the imagery and annotation data in that folder. The output of your classifier must be in the COCO results format. The name of your output file MUST be foldername.mscoco.json, where foldername is the name of the folder on which you ran your classifier to generate the results. See the File Names section below for a list of the valid filenames for the submission.

Note that, because the challenge data consists of images only, we have released a JSON file that maps filenames to the corresponding image_id values. Use this file to look up the image_id values for your result submissions. Please go here for the download.

  • You can submit detections for as few as one or as many as all of the folders contained in the released challenge data. The scorer will score only the files you submit. Each submission file must be in the valid results format and contain only bounding box detections; otherwise the scorer will fail.
  • Once you have generated the files you want to submit, go to the submission site for our challenge on
  • On that page, you will see a green button that says “Submit your results”. You’ll be prompted to enter a submission title (choose something descriptive that contains your team name).
  • You will see another green button that says “Browse or drop files here”. When you click that button a file browser should open up, allowing you to select one or many submission files. Choose the file(s) you would like to submit for scoring. Ensure the files follow the naming requirements described in the first step above, or in more detail in File Names below. Click “Choose”.
  • Press the “Start Upload” button. You’ll see progress bars showing the upload progress.
  • Once the submission is uploaded, refresh your browser and you will see a progress wheel spinning that says “Your submission is being scored, please wait…”. The scoring process can take some time, depending on server load and other factors. Feel free to navigate away from that page.
  • Once the scoring is completed, you’ll receive an email from notifying you as such. Click the link in the email to see your score, which will be broken down by dataset/submission. You can also download your submission to ensure what you expected to be scored was actually scored.
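Before uploading, it can save a round trip to convert detections with the released filename-to-image_id file and check each results file locally. A minimal sketch; it assumes the mapping file is a flat JSON object of the form {file_name: image_id} and that your detector yields (file_name, category_id, [x, y, w, h], score) tuples. Both layouts and all helper names are hypothetical, and the checks only approximate what the scorer requires.

```python
REQUIRED_KEYS = {"image_id", "category_id", "bbox", "score"}


def detections_to_results(detections, filename_to_id):
    """Convert (file_name, category_id, bbox, score) tuples to COCO results entries."""
    return [
        {
            "image_id": filename_to_id[fname],
            "category_id": category_id,
            "bbox": [float(v) for v in bbox],
            "score": float(score),
        }
        for fname, category_id, bbox, score in detections
    ]


def validate_results(results):
    """Return a list of problems likely to make the scorer reject the file."""
    if not isinstance(results, list):
        return ["results must be a JSON array"]
    problems = []
    for i, r in enumerate(results):
        if not isinstance(r, dict):
            problems.append(f"entry {i}: must be a JSON object")
            continue
        missing = REQUIRED_KEYS - set(r)
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
        if "keypoints" in r or "segmentation" in r:
            problems.append(f"entry {i}: only bounding boxes are accepted")
        bbox = r.get("bbox")
        if bbox is not None and len(bbox) != 4:
            problems.append(f"entry {i}: bbox must be [x, y, w, h]")
    return problems
```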

File Names

The challenge data was released with a folder for each dataset. The submission mirrors this structure, with one results file per folder. If you were to make a full submission, you would upload files with the names:


If you would like to make a partial submission, you only need to submit files with the above names corresponding to the datasets you would like to be scored on.
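Since each results file is named after its folder (foldername.mscoco.json), the expected names can be derived from the extracted challenge data. A minimal sketch; the folder names in the usage lines are hypothetical examples borrowed from the training-data names, so use whatever folders the challenge release actually contains.

```python
from pathlib import Path


def dataset_folders(challenge_root):
    """Names of the per-dataset folders in the extracted challenge data."""
    return sorted(p.name for p in Path(challenge_root).iterdir() if p.is_dir())


def submission_filenames(folder_names, only=None):
    """Required result file names; pass `only` to build a partial submission."""
    return [f"{name}.mscoco.json" for name in folder_names
            if only is None or name in only]


if __name__ == "__main__":
    folders = ["afsc_seq0", "mbari_seq0"]  # hypothetical example folder names
    print(submission_filenames(folders))
    print(submission_filenames(folders, only={"mbari_seq0"}))
```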