As ecological research and biomonitoring increasingly use molecular methods to explore and measure diatom diversity, it is ever more important to ‘curate’ the sequence data in public databases (such as GenBank and BOLD) and so build a reliable reference database for identifying high-throughput sequencing (HTS) DNA reads. Curation is needed because many publicly available sequences come from isolates that were misidentified when they were obtained, or whose names have since become incorrect as taxonomy has improved.
To meet this need, a group of us led by Frédéric Rimet (INRAE, Thonon, France) has been updating Diat.barcode, a DNA reference database of rbcL sequences that Frédéric first created in 2012. Every year we work together to review new and existing sequences, and a new version of the database is produced and made freely available (open access) for download. The curation process is described in Rimet et al. (2019), and the database (current and previous versions; currently version 8) is available at https://www6.inrae.fr/carrtel-collection/Barcoding-database. Enjoy!
Reference: Rimet, F., Gusev, E., Kahlert, M., Kelly, M.G., Kulikovskiy, M., Maltsev, Y., Mann, D.G., Pfannkuchen, M., Trobajo, R., Vasselon, V., Zimmermann, J. & Bouchez, A. (2019). Diat.barcode, an open-access curated barcode library for diatoms. Scientific Reports 9: 15116. https://doi.org/10.1038/s41598-019-51500-6