'Comically bad' datasets used to train clinical models for stroke and diabetes
Summary
A detailed report on 'comically bad' datasets used to train clinical models for stroke and diabetes. One Kaggle dataset contains celebrity images used as stroke indicators, raising concerns about data provenance, ethics, and reliability. The article traces how multiple papers relied on these dubious datasets and how publishers and platforms responded, including retractions and calls for better provenance checks.