Depth Anything V2
Depth Anything V2 is a monocular depth estimation model trained on synthetic and real images, offering improved details, robustness, and speed compared to previous models. It focuses on enhancing predictions using synthetic images and large-scale pseudo-labeled real images.
Depth Anything V2 is a monocular depth estimation model trained on 595K synthetic labeled images and 62M+ real unlabeled images. It produces finer details and is more robust than Depth Anything V1 and SD-based (Stable Diffusion) models, and it is 10 times faster and lighter than SD-based models while also being more accurate. The model also demonstrates impressive fine-tuned performance, and six metric depth models are released for indoor and outdoor scenes. The work improves depth predictions in three ways: using synthetic images, scaling up the teacher model, and training student models on large-scale pseudo-labeled real images. Concretely, a teacher model is first trained on the synthetic images and then used to generate pseudo labels for the real images; the student models are trained on those pseudo labels. The paper also provides an evaluation benchmark with sparse depth annotations to support future research.
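The teacher-student pipeline described above can be illustrated with a toy sketch. This is not the authors' code: a least-squares linear model stands in for the depth networks, random feature vectors stand in for images, and all names (`fit_linear`, `run_pipeline`, `true_w`) are hypothetical. It only shows the three steps in order: train a teacher on labeled synthetic data, pseudo-label the unlabeled real data, and train a student on those pseudo labels alone.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares fit of y ~= x @ w (toy stand-in for training a depth model)."""
    w, *_ = np.linalg.lstsq(x, y, rcond=None)
    return w


def run_pipeline(seed=0):
    rng = np.random.default_rng(seed)

    # Hypothetical ground-truth mapping from "image features" to "depth".
    true_w = np.array([2.0, -1.0, 0.5])

    # Small synthetic set with exact labels, large real set with no labels.
    x_synth = rng.normal(size=(200, 3))
    y_synth = x_synth @ true_w
    x_real = rng.normal(size=(5000, 3))

    # Step 1: train the teacher on labeled synthetic data only.
    teacher_w = fit_linear(x_synth, y_synth)

    # Step 2: the teacher generates pseudo labels for the unlabeled real data.
    pseudo_labels = x_real @ teacher_w

    # Step 3: train the student on the pseudo-labeled real data only.
    student_w = fit_linear(x_real, pseudo_labels)

    return true_w, teacher_w, student_w


true_w, teacher_w, student_w = run_pipeline()
print("teacher:", teacher_w)
print("student:", student_w)
```

In this toy setting the student simply recovers the teacher's weights. In the pipeline the summary describes, the student models are separate networks trained on the much larger pseudo-labeled real set; that part is not modeled here.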
Related
Video annotator: a framework for efficiently building video classifiers
The Netflix Technology Blog presents the Video Annotator (VA) framework for efficient video classifier creation. VA integrates vision-language models, active learning, and user validation, outperforming baseline methods with an 8.3 point Average Precision improvement.
MeshAnything – Converts 3D representations into efficient 3D meshes
MeshAnything efficiently generates high-quality Artist-Created Meshes with optimized topology, fewer faces, and precise shapes. Its innovative approach enhances 3D industry applications by improving storage and rendering efficiencies.
HybridNeRF: Efficient Neural Rendering
HybridNeRF combines surface and volumetric representations for efficient neural rendering, achieving 15-30% error rate improvement over baselines. It enables real-time framerates of 36 FPS at 2K×2K resolutions, outperforming VR-NeRF in quality and speed on various datasets.
TERI, almost IRL Blade Runner movie image enhancement tool
Researchers at the University of South Florida introduce TERI, an image processing algorithm inspired by Blade Runner that reconstructs hidden objects in photos using shadows. Potential applications include self-driving vehicles and robotics.
DETRs Beat YOLOs on Real-Time Object Detection
The RT-DETR model outperforms YOLO detectors in real-time object detection and balances speed and accuracy by adjusting the number of decoder layers. RT-DETR-R50 / R101 achieve 53.1% / 54.3% AP on COCO at 108 / 74 FPS on a T4 GPU, and RT-DETR-R50 surpasses DINO-R50 by 2.2% AP while running about 21 times faster.