From import to inference.
Four lines.
Bounding boxes, segmentation masks, and pose skeletons from a single import. Millisecond inference. No dependency hell.

{"latency_ms": 4.2,"detections": [{ "label": "person", "conf": 0.97, "bbox": [142, 88, 310, 420] },{ "label": "laptop", "conf": 0.91, "bbox": [380, 210, 560, 380] },{ "label": "backpack", "conf": 0.83, "bbox": [88, 310, 195, 490] }]}
Instruments on. Numbers real.
Five modules. One API.
Every capability uses the same model.task(frame) pattern. No context switching.
Object Detection
YOLO-v9, RT-DETR, and custom architectures. Multi-class, multi-scale, with NMS built in.
Instance Segmentation
Pixel-perfect masks per instance. SAM-2, Mask R-CNN, and YOLO-seg in one API surface.
Multi-Object Tracking
ByteTrack, DeepSORT, and BoT-SORT. Persistent IDs across frames, even through occlusions.
Pose Estimation
17-keypoint COCO skeleton. Whole-body, hand, and face landmarks. Real-time on edge devices.
Image Classification
Top-k predictions with calibrated confidence. EfficientNet, ViT, ConvNeXt. Fine-tune in 3 lines.
Plugs into your stack. Ships fast.
Stop evaluating.
Start inferring.
Free tier. No email gate. Apache 2.0. The only thing between you and 4ms inference is one terminal command.