Early detection of colorectal polyps is critical for reducing the risk of cancer-related mortality. However, polyps are still often missed during colonoscopy. This study aimed to develop and evaluate a YOLOv9-based polyp detection model whose training data were augmented with synthetic polyp images generated by a diffusion model.
The YOLOv9-C model, pretrained on the MS COCO dataset, was fine-tuned on the LDPolypVideo dataset. When internally tested on the LDPolypVideo dataset, the model trained with all the synthetic images (precision 0.895, recall 0.776, F1 0.831, and mAP 50 0.859) outperformed the model trained without synthetic images (precision 0.812, recall 0.706, F1 0.755, and mAP 50 0.783) on all metrics.
Our best-performing model, trained on the LDPolypVideo dataset augmented with synthetic polyps, achieved a precision of 0.878, recall of 0.861, F1 score of 0.869, and mAP 50 of 0.897 when externally tested on the Kvasir-SEG dataset.
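As a consistency check on the figures above, the F1 score can be recomputed as the harmonic mean of precision and recall. The following is a minimal sketch, not the authors' evaluation code: the helper name `f1_score` and the dictionary labels are illustrative assumptions, and only the numeric values are taken from the text.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the text; labels are illustrative.
reported = {
    "internal test, with synthetic images": (0.895, 0.776, 0.831),
    "internal test, without synthetic images": (0.812, 0.706, 0.755),
    "external test (Kvasir-SEG)": (0.878, 0.861, 0.869),
}

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: computed F1 = {f1:.3f}, reported F1 = {f1_reported:.3f}")
```

Under this definition, each reported F1 value agrees with its precision and recall pair to three decimal places.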
Our YOLOv9 model generalized beyond its internal training, validation, and test sets, performing well on an external dataset on which it had not been trained. The integration of synthetic polyp images improved the performance of the YOLOv9 model for colorectal polyp detection. This approach addresses the challenge of limited polyp diversity in training datasets and enhances the potential for real-world application in colonoscopy.