July 31st, 2024

Aryn/deformable-detr-DocLayNet – open-source Layout Model

The Deformable DETR model, trained on DocLayNet, achieves 57.1 mAP for object detection using a transformer architecture. It is available on Hugging Face and has been downloaded 108,960 times in the last month.

The Deformable DETR model, trained on the DocLayNet dataset, is designed for object detection in document layouts. It uses an encoder-decoder transformer architecture on top of a convolutional backbone, with two heads: one for class-label prediction and one for bounding-box regression. The decoder takes a fixed set of learned object queries (typically 100 for datasets like COCO), each of which attends to the image and proposes one detection. Training uses a bipartite matching loss: the Hungarian algorithm finds a one-to-one assignment between predictions and ground-truth annotations, and the loss then compares each matched pair's class and bounding box. The model achieves a mean Average Precision (mAP) of 57.1 on DocLayNet, a dataset of 80,000 annotated pages labeled with 11 layout classes.
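To make the bipartite matching step concrete, here is a minimal sketch of DETR-style matching on made-up data. The predictions, labels, boxes, and cost weights below are all illustrative, not taken from the actual model; the real implementation matches 100 queries against ground truth with the Hungarian algorithm, while this toy version brute-forces the assignment over two predictions.

```python
import itertools

# Toy bipartite matching between predictions and ground-truth boxes,
# as in DETR-style training. The matching cost combines class probability
# and L1 box distance. All numbers here are made up for illustration,
# and the "no object" class used in real DETR training is omitted.

preds = [
    {"probs": {"text": 0.9, "table": 0.1}, "box": (0.1, 0.1, 0.4, 0.2)},
    {"probs": {"text": 0.2, "table": 0.8}, "box": (0.5, 0.5, 0.9, 0.8)},
]
gts = [
    {"label": "table", "box": (0.5, 0.5, 0.9, 0.8)},
    {"label": "text", "box": (0.1, 0.1, 0.4, 0.2)},
]

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cost(p, g):
    # Low probability for the true class and a large box distance both hurt.
    return -p["probs"][g["label"]] + l1(p["box"], g["box"])

# Brute-force Hungarian matching: try every assignment of preds to gts
# and keep the one with the lowest total cost.
best = min(
    itertools.permutations(range(len(preds))),
    key=lambda perm: sum(cost(preds[i], gts[j]) for j, i in enumerate(perm)),
)
print(best)  # → (1, 0): pred 1 matches gt 0 (table), pred 0 matches gt 1 (text)
```

Once each ground-truth box has exactly one matched prediction, the classification and box-regression losses are computed only over those matched pairs, which is what lets the model train without hand-designed anchor boxes or non-maximum suppression.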

To use the model, users import the necessary libraries, load an image, and run it through the model to obtain detection results. The outputs include bounding boxes and class logits, which can be filtered by a confidence threshold. The model is available on the Hugging Face platform, along with additional resources and related models. It is licensed under Apache 2.0, and its development is documented in the paper "Deformable DETR: Deformable Transformers for End-to-End Object Detection." The model has been downloaded 108,960 times in the last month, indicating significant interest and usage within the community.
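The confidence-threshold filtering described above can be sketched without loading the model itself. The snippet below mimics that post-processing step on made-up per-query logits and boxes: softmax the class logits, take the top class, and keep only detections above the threshold. The labels, logits, and boxes are illustrative (the real model has 11 DocLayNet classes plus a "no object" class, which this toy version omits).

```python
import math

# Illustrative subset of layout classes; the real checkpoint has 11.
LABELS = ["text", "title", "table"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def filter_detections(class_logits, boxes, threshold=0.7):
    # Keep detections whose top class probability exceeds the threshold.
    detections = []
    for logits, box in zip(class_logits, boxes):
        probs = softmax(logits)
        score = max(probs)
        if score >= threshold:
            detections.append({
                "label": LABELS[probs.index(score)],
                "score": round(score, 3),
                "box": box,
            })
    return detections

# Made-up outputs: query 1 is confident, query 2 is not.
outputs_logits = [[4.0, 0.5, 0.1], [0.2, 0.3, 0.1]]
outputs_boxes = [(0.1, 0.1, 0.5, 0.3), (0.4, 0.4, 0.6, 0.9)]
print(filter_detections(outputs_logits, outputs_boxes))
# Only the confident first query survives the threshold.
```

Raising the threshold trades recall for precision: for noisy scanned pages a lower threshold may recover more layout regions at the cost of false positives.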
