Open Datasets

At the Future Blood Testing Network+ we are putting together a list of open datasets pertaining to blood monitoring and healthcare data. If you have any suggestions for additions to this list, please email s.kanza@reading.ac.uk.

Awesome-CGM
https://github.com/irinagain/Awesome-CGM
This is a collection of links to publicly available continuous glucose monitoring (CGM) data. CGMs are small wearable devices that measure glucose levels continuously throughout the day, with some meters taking measurements as often as every 5 minutes. For a head start on CGM data analysis, check out the maintainers' R package iglu. This collection follows the style of Mike Love’s awesome-multi-omics and Sean Davis’ awesome-single-cell repos, although those are collections of methods rather than dataset links.

PhysioNet
https://physionet.org/
PhysioNet is the name of both the Research Resource for Complex Physiologic Signals and its website, physionet.org. The Resource established the website for the free and open dissemination and exchange of recorded biomedical signals and open-source software for analyzing them. It provides facilities for cooperative analysis of data and evaluation of proposed new algorithms.

A dataset for microscopic peripheral blood cell images for development of automatic recognition systems
https://data.mendeley.com/datasets/snkd93bnjr/1
The dataset contains a total of 17,092 images of individual normal cells, which were acquired using the CellaVision DM96 analyzer in the Core Laboratory at the Hospital Clinic of Barcelona. The dataset is organized into the following eight groups: neutrophils, eosinophils, basophils, lymphocytes, monocytes, immature granulocytes (promyelocytes, myelocytes, and metamyelocytes), erythroblasts, and platelets (thrombocytes). The images are 360 x 363 pixels in JPG format and were annotated by expert clinical pathologists. They were captured from individuals without infection, hematologic or oncologic disease, and free of any pharmacologic treatment at the time of blood collection. This high-quality labelled dataset may be used to train and test machine learning and deep learning models to recognize different types of normal peripheral blood cells. To our knowledge, this is the first publicly available set with large numbers of normal peripheral blood cells, so it is expected to serve as a canonical dataset for model benchmarking.
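
As a first sanity check after downloading an image dataset like this one, it can help to count the images in each group. The sketch below is illustrative only: it assumes the archive has been extracted so that each group has its own subfolder of JPG files, which may not match the actual layout, and the function name `count_images_per_class` is hypothetical.

```python
from pathlib import Path

# File extensions treated as JPG images (compared case-insensitively).
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def count_images_per_class(root):
    """Return {subfolder name: number of JPG files} for each subfolder of root.

    Assumes one subfolder per cell group; this layout is an assumption,
    not something stated in the dataset description.
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore loose files such as a README
        counts[class_dir.name] = sum(
            1
            for path in class_dir.rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

If the layout assumption holds, the counts should cover eight groups and sum to the 17,092 images stated in the description; a mismatch would suggest an incomplete download or a different folder structure.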