Conventional object detection dataset preparation pipelines impose a multi-stage engineering burden: images are collected separately, annotated with desktop tools that produce intermediate formats such as Pascal VOC XML or COCO JSON, and then processed by conversion scripts before they are training-ready. This fragmented workflow cannot be carried out on resource-constrained devices and adds unnecessary latency to rapid AI development cycles. This paper introduces YOLOFlow, an Android-native annotation and dataset engineering framework designed to eliminate these intermediate stages. YOLOFlow integrates image capture, touch-based bounding box annotation, real-time YOLO-format coordinate generation, and structured dataset export into a single 6.3 MB application that runs entirely offline on devices with as little as 500 MB of RAM. The framework standardizes image resolution at 640 × 640 pixels so that annotation accuracy does not vary with device hardware. A dataset of 1,247 garbage and waste images was annotated with YOLOFlow and evaluated against an equivalent LabelImg-based desktop pipeline across six operational dimensions. YOLOFlow reduced the end-to-end dataset preparation workflow from seven discrete stages to three, eliminated all format conversion steps, and produced training-ready YOLO datasets that can be exported via USB, as a compressed archive, or directly through communication platforms. The proposed framework is particularly useful for field data collection, edge AI prototyping, and resource-constrained research environments where desktop annotation infrastructure is unavailable.
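The abstract does not describe how YOLOFlow turns a touch-drawn box into a label, so the following is only a minimal sketch under stated assumptions. It relies on the standard YOLO text label format (one line per object: class index, box center x, center y, width, height, each normalized to [0, 1] by the image size) and assumes the 640 × 640 image is stretched to fill the annotation view without letterboxing. The names `Box`, `view_to_image`, and `to_yolo_line` are hypothetical and are not taken from YOLOFlow.

```python
from dataclasses import dataclass

TARGET_SIZE = 640  # fixed square resolution stated in the abstract


@dataclass
class Box:
    """Hypothetical pixel-space box: (x0, y0) and (x1, y1) are opposite corners."""
    x0: float
    y0: float
    x1: float
    y1: float


def view_to_image(box: Box, view_w: float, view_h: float, size: int = TARGET_SIZE) -> Box:
    """Scale a box drawn on a view of arbitrary size onto a size x size image.

    Assumption: the image is stretched to fill the whole view (no letterboxing),
    so each axis is scaled independently.
    """
    sx, sy = size / view_w, size / view_h
    return Box(box.x0 * sx, box.y0 * sy, box.x1 * sx, box.y1 * sy)


def to_yolo_line(class_id: int, box: Box, size: int = TARGET_SIZE) -> str:
    """Convert an image-space box into one YOLO label line."""
    # Order the corners in case the user dragged right-to-left or bottom-to-top.
    x0, x1 = sorted((box.x0, box.x1))
    y0, y1 = sorted((box.y0, box.y1))
    # Clamp to the image so normalized values stay within [0, 1].
    x0, x1 = max(0.0, x0), min(float(size), x1)
    y0, y1 = max(0.0, y0), min(float(size), y1)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box has zero area after clamping")
    xc = (x0 + x1) / 2 / size  # normalized center x
    yc = (y0 + y1) / 2 / size  # normalized center y
    w = (x1 - x0) / size       # normalized width
    h = (y1 - y0) / size       # normalized height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A box drawn from (270, 270) to (810, 540) on a 1080 x 1080 view,
    # i.e. (160, 160) to (480, 320) on the 640 x 640 image.
    print(to_yolo_line(0, view_to_image(Box(270, 270, 810, 540), 1080, 1080)))
    # prints "0 0.500000 0.375000 0.500000 0.250000"
```

Because the label is normalized against the fixed 640 × 640 image rather than the screen, the same drawn region yields the same label line regardless of the device's display size, which is one plausible reading of how a fixed resolution decouples annotation from hardware.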
Sanmay Kotkar studied this question.