Context
A robotics R&D project requiring real-time coordination of multiple autonomous quadcopters. The system needed to handle high-frequency sensor data from motion capture, onboard IMUs, and computer vision — all while maintaining deterministic control loops and producing analyzable datasets.
Problem
Coordinating multiple autonomous drones presents significant data engineering challenges:
- High-frequency data streams: 6+ devices producing IMU, position, and video data simultaneously
- Real-time requirements: Control decisions needed within milliseconds
- Training data quality: ML models for drone detection required carefully curated datasets
- Reproducibility: Flight logs needed to support post-hoc analysis and debugging
What We Built
Data Pipeline Architecture
Sensor Fusion Layer:
- OptiTrack motion capture integration via VRPN protocol (6+ rigid bodies)
- Per-drone IMU polling with coordinate frame corrections (body frame FLU → lab frame ENU)
- Automatic yaw calibration handling 180-degree quaternion ambiguity
- Timestamp-synchronized logging across all devices
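The frame-correction and yaw-calibration steps above can be sketched as follows. This is a minimal illustration, not the project's actual code; the function names are hypothetical, and it shows only the two quaternion-related details called out above: canonicalizing q vs. -q (the double-cover ambiguity) and wrapping yaw offsets into (-180, 180].

```python
import numpy as np

def quat_canonical(q):
    """Resolve the q / -q double-cover ambiguity by forcing w >= 0,
    so two trackers reporting the same attitude agree numerically."""
    q = np.asarray(q, dtype=float)
    return -q if q[0] < 0 else q

def yaw_from_quat(q):
    """Extract yaw (rotation about the lab Z axis) in degrees from a
    (w, x, y, z) quaternion, using the standard ZYX convention."""
    w, x, y, z = q
    return np.degrees(np.arctan2(2.0 * (w * z + x * y),
                                 1.0 - 2.0 * (y * y + z * z)))

def wrap_deg(angle):
    """Wrap an angle to the (-180, 180] degree range."""
    return (angle + 180.0) % 360.0 - 180.0

def calibrate_yaw_offset(imu_yaw_deg, mocap_yaw_deg):
    """Yaw offset between the IMU body frame and the mocap lab frame,
    wrapped so a 180-degree-flipped rigid-body definition is handled."""
    return wrap_deg(mocap_yaw_deg - imu_yaw_deg)
```

The same wrapping is what keeps a rigid body defined "backwards" in the mocap software from producing a 360-degree spin command at takeoff.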
Protocol Buffers Logging:
- Custom .proto schema for efficient binary serialization
- Message types: OptiTrack position, IMU readings, RC commands, coordination state
- Supports 200Hz+ write rates with minimal overhead
- CSV export for analysis, binary archives for replay
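The binary archive format is a stream of length-prefixed records. The sketch below shows that framing pattern in plain Python with a fixed 4-byte length header; the real pipeline serializes protobuf messages into the payload (and may use a different prefix encoding), so treat the names and header layout here as illustrative.

```python
import io
import struct

def write_record(stream, payload: bytes) -> None:
    """Write one record: 4-byte little-endian length, then the body.
    In the real pipeline the body is a serialized protobuf message."""
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)

def read_records(stream):
    """Yield payloads back out of a length-delimited binary archive,
    stopping cleanly at end of stream."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack("<I", header)
        yield stream.read(length)
```

Because each record carries its own length, replay tools can skip or seek through an archive without parsing every message body.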
ML Training Pipeline
Custom Dataset Tooling:
- Interactive annotation editor with keyboard/mouse controls for bounding box adjustment
- Chunk-based train/val splitting to prevent temporal data leakage
- Automatic smoothing filters for label jitter in video sequences
- Statistical analysis for diagnosing camera pitch issues
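Chunk-based splitting deserves a concrete illustration, since naive random splits leak badly on video: adjacent frames are near-duplicates, so shuffling frames puts almost-identical images in both train and val. A minimal sketch (hypothetical function, illustrative parameters):

```python
def chunk_split(items, chunk_size=200, val_every=5):
    """Group consecutive frames into chunks, then send every
    `val_every`-th chunk to validation and the rest to training.
    Whole chunks move together, so temporally adjacent frames never
    straddle the train/val boundary."""
    chunks = [items[i:i + chunk_size]
              for i in range(0, len(items), chunk_size)]
    train, val = [], []
    for i, chunk in enumerate(chunks):
        (val if i % val_every == 0 else train).extend(chunk)
    return train, val
```

With 200-frame chunks, a validation frame is guaranteed to be at least one full chunk away in time from any training frame.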
YOLOv8 Drone Detection:
- Specialized augmentation strategy for small object detection (drones at 20-100px)
- Disabled mosaic augmentation, which would otherwise shrink already-small objects further
- Conservative scaling (0.3 vs default 1.0) to preserve drone visibility
- Trained on lab environment data with careful session isolation
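The small-object training recipe can be expressed as a set of Ultralytics hyperparameter overrides. Only `mosaic` and `scale` reflect the choices described above; `imgsz` and everything else here is an illustrative assumption, not the project's actual configuration.

```python
# Hypothetical training overrides for small-object (20-100 px) detection.
small_object_overrides = {
    "mosaic": 0.0,  # mosaic tiles 4 images, shrinking small drones further
    "scale": 0.3,   # conservative random-scale range to keep drones visible
    "imgsz": 960,   # higher input resolution for tiny objects (assumption)
}

# Applied via the Ultralytics API, roughly:
# from ultralytics import YOLO
# YOLO("yolov8n.pt").train(data="drones.yaml", **small_object_overrides)
```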
OWLv2 Zero-Shot Detection:
- Google’s Open-World Localization Vision Transformer
- Natural language prompts: “DJI Tello, quadcopter, drone”
- Advanced filtering: size constraints, aspect ratio, ROI masking, motion detection
- Hysteresis tracking for stable bounding boxes across frames
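Hysteresis tracking is the standard cure for boxes flickering when detector confidence hovers near a single cutoff: a high score is required to acquire a track, but only a lower score to keep it. A minimal sketch with hypothetical thresholds:

```python
class HysteresisTrack:
    """Two-threshold track state: `enter` to acquire, `keep` to retain.
    Scores between the two thresholds preserve the current state,
    so the box neither flickers on nor flickers off."""

    def __init__(self, enter=0.45, keep=0.25):
        self.enter, self.keep = enter, keep
        self.active = False

    def update(self, score):
        if self.active:
            self.active = score >= self.keep   # easy to stay tracked
        else:
            self.active = score >= self.enter  # hard to start tracking
        return self.active
```

The same per-frame `update` slots in after the size, aspect-ratio, and ROI filters, acting on whatever detections survive them.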
Distributed Control System
Multi-Drone Coordination:
- DroneManager handling 6+ concurrent Tello EDU units
- JSON-based device configuration with MAC/IP/tracker mappings
- Auto-reconnection logic with 2Hz health checks
- Thread-safe Qt signals/slots architecture
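The health-check side of the reconnection logic boils down to tracking when each drone was last heard from and flagging stale units. The sketch below shows that bookkeeping only; names and the staleness window are illustrative, and the real system drives it from a 2 Hz Qt timer and triggers reconnection on the result.

```python
import time

class HealthMonitor:
    """Track last-heard timestamps per drone and report stale units.
    `clock` is injectable so the logic is testable without sleeping."""

    def __init__(self, timeout=1.5, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}

    def heartbeat(self, drone_id):
        """Call whenever any packet arrives from this drone."""
        self.last_seen[drone_id] = self.clock()

    def stale(self):
        """Drones silent for longer than `timeout`; reconnect these."""
        now = self.clock()
        return [d for d, t in self.last_seen.items()
                if now - t > self.timeout]
```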
Physics Simulation:
- Force-based flocking model with 7 force components
- KDTree spatial indexing for O(log n) neighbor queries
- Vectorized NumPy operations for real-time integration
- Stress-tested at 50 agents / 200Hz
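One of the force components can illustrate how the KD-tree fits in. The sketch below computes a separation force using SciPy's `cKDTree` for neighbor queries; it is written for readability rather than fully vectorized, the function name and 1/r falloff are assumptions, and the other six force terms follow the same query-then-accumulate shape.

```python
import numpy as np
from scipy.spatial import cKDTree

def separation_force(positions, radius=1.0, gain=1.0):
    """Separation term of the flocking model: push each agent away from
    neighbors within `radius`. The KD-tree keeps each neighbor lookup
    near O(log n) instead of scanning all n agents."""
    tree = cKDTree(positions)
    forces = np.zeros_like(positions)
    for i, neighbors in enumerate(tree.query_ball_point(positions, radius)):
        for j in neighbors:
            if j == i:
                continue  # skip self-match returned by the radius query
            offset = positions[i] - positions[j]
            dist = np.linalg.norm(offset)
            if dist > 1e-9:
                forces[i] += gain * offset / (dist * dist)  # 1/r falloff
    return forces
```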
Technical Stack
- C++17 / Qt: Core control system, real-time loop, visualization
- Python: ML training, data analysis, CV inference
- Protocol Buffers: High-frequency data serialization
- OpenCV / PyTorch: Detection pipeline
- Ultralytics YOLOv8: Object detection training
- VRPN: Motion capture interface
Results
- Real-time coordination of 6+ drones with sub-10ms latency
- Custom ML pipeline from raw video → trained detector → real-time inference
- Reproducible experiments via comprehensive protobuf logging
- Scalable architecture validated at 50+ simulated agents
Architecture described at high level; specific algorithms and trained models remain proprietary.