July 2nd, 2024

DETRs Beat YOLOs on Real-Time Object Detection

DETRs outperform YOLOs with the RT-DETR model, which balances speed and accuracy and lets inference speed be tuned by adjusting the number of decoder layers. RT-DETR-R50 / R101 achieve 53.1% / 54.3% AP on COCO at 108 / 74 FPS on a T4 GPU, and RT-DETR-R50 surpasses DINO-R50 by 2.2% AP while running about 21 times faster in FPS.


The paper introduces the Real-Time DEtection TRansformer (RT-DETR), a DETR-based detector that outperforms YOLOs in real-time object detection. It is built in two steps: first improving speed while maintaining accuracy, then improving accuracy while maintaining speed. Its inference speed can be tuned flexibly by changing the number of decoder layers, without retraining. RT-DETR-R50 / R101 achieve 53.1% / 54.3% AP on COCO at 108 / 74 FPS on a T4 GPU, beating YOLOs in both speed and accuracy. RT-DETR-R50 also outperforms DINO-R50 by 2.2% AP and runs about 21 times faster in FPS.

Two components drive these results. An efficient hybrid encoder processes multi-scale features by decoupling intra-scale interaction from cross-scale fusion, and uncertainty-minimal query selection improves the quality of the initial queries passed to the decoder. The paper's NMS analysis explains why removing NMS matters: YOLO-style detectors rely on non-maximum suppression, whose execution time depends on the confidence threshold, because a lower threshold sends more candidate boxes into NMS and slows the overall detection pipeline. DETRs are end-to-end and need no NMS, so RT-DETR avoids this cost.
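The dependence of NMS cost on the confidence threshold can be illustrated with a minimal greedy NMS in plain Python. This is a sketch, not the YOLO or RT-DETR code: the helpers `iou`, `nms`, and `random_boxes`, the synthetic boxes and scores, and the threshold values are all hypothetical. It counts IoU comparisons as a rough proxy for execution time: only boxes above `score_thr` reach NMS, and the greedy loop can need up to m(m-1)/2 comparisons for m surviving boxes.

```python
import random
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], score_thr: float,
        iou_thr: float) -> Tuple[List[int], int, int]:
    """Greedy NMS (hypothetical helper).

    Returns (kept indices, number of candidates after the confidence
    filter, number of IoU comparisons performed).
    """
    # Confidence filter: only boxes scoring at least score_thr reach NMS.
    cand = [i for i, s in enumerate(scores) if s >= score_thr]
    num_candidates = len(cand)
    # Visit candidates from highest to lowest score.
    cand.sort(key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    comparisons = 0
    while cand:
        best, rest = cand[0], cand[1:]
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        cand = []
        for i in rest:
            comparisons += 1
            if iou(boxes[best], boxes[i]) < iou_thr:
                cand.append(i)
    return keep, num_candidates, comparisons


def random_boxes(n: int, seed: int = 0) -> Tuple[List[Box], List[float]]:
    """Synthetic detector output (assumption): many boxes, most with low scores."""
    rng = random.Random(seed)
    boxes: List[Box] = []
    for _ in range(n):
        x, y = rng.uniform(0, 600), rng.uniform(0, 600)
        w, h = rng.uniform(10, 80), rng.uniform(10, 80)
        boxes.append((x, y, x + w, y + h))
    # Raising to the 4th power skews scores toward 0, mimicking a dense detector.
    scores = [rng.random() ** 4 for _ in range(n)]
    return boxes, scores


if __name__ == "__main__":
    boxes, scores = random_boxes(500)
    for thr in (0.001, 0.05, 0.25):
        keep, m, comps = nms(boxes, scores, score_thr=thr, iou_thr=0.7)
        print(f"score_thr={thr}: {m} candidates, {comps} IoU checks, {len(keep)} kept")
```

Because the candidates at a higher threshold are a subset of those at a lower one, lowering `score_thr` can only increase the number of boxes NMS has to process. An end-to-end detector skips this post-processing step entirely.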

Related

20x Faster Background Removal in the Browser Using ONNX Runtime with WebGPU

Running background removal in the browser with ONNX Runtime on WebGPU and WebAssembly achieves a 20x speedup, reducing server load, improving scalability, and improving data security. ONNX models run efficiently with WebGPU support, offering near real-time performance. IMG.LY aims to use these technologies to make its design tools more accessible and efficient.

HybridNeRF: Efficient Neural Rendering

HybridNeRF combines surface and volumetric representations for efficient neural rendering, improving error rates by 15-30% over baselines. It renders in real time at 36 FPS at 2K×2K resolution and outperforms VR-NeRF in both quality and speed on various datasets.

Etched Is Making the Biggest Bet in AI

Etched is betting on Sohu, a chip specialized for transformers, on the premise that transformers have overtaken earlier model types such as DLRMs and CNNs. Sohu is optimized for transformer models like ChatGPT, and the company is aiming at AI superintelligence.

Whats better: Neural nets wider with less layers or thinner with more layers

Experiments compared Transformer models of varying depth (number of layers) and width (embedding dimension). The best performance in these experiments came from a model with four layers and an embedding dimension of 1024. Balancing depth and width is crucial for both efficiency and performance.

Getting the World Record in Hatetris (2022)

David and Felipe set a world record in HATETRIS, a deliberately hard version of Tetris. They used Rust, Monte Carlo tree search (MCTS), and ideas from AlphaZero to improve their play, reaching a score of 66 points in 2021.

4 comments
By @isoprophlex - 6 months
Good to see some progress in the space, as NMS always felt like a very hacky solution to me.

Also good that the YOLO moniker is being challenged. After pjreddie went off to do better things, I've always felt a bit sad about random parties co-opting the YOLO name. And then with Ultralytics and their weird approach to monetizing YOLOv8 ("yeah uh so it's open source but if you actually train a model plz fork over your money")... not a nice look.

By @imjonse - 6 months
YOLOv9 and YOLOv10 were both published after this paper, and according to their papers they are better than RT-DETR.

By @GaggiX - 6 months
Just to clarify: this paper was released in April 2023.