We provide a spectrum of pre-trained models on different datasets.
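Example usage:

```python
import cv2
import layoutparser as lp

# Load a page image ("page.png" is a placeholder path); OpenCV reads BGR, so reverse to RGB
image = cv2.imread("page.png")[..., ::-1]

model = lp.Detectron2LayoutModel(
    config_path="lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",  # In model catalog
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},  # In model `label_map`
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],  # Optional
)
layout = model.detect(image)
```

```python
import cv2
import layoutparser as lp

# Load a page image ("page.png" is a placeholder path); OpenCV reads BGR, so reverse to RGB
image = cv2.imread("page.png")[..., ::-1]

model = lp.PaddleDetectionLayoutModel(
    config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",  # In model catalog
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},  # In model `label_map`
    threshold=0.5,  # Optional
)
layout = model.detect(image)
```

Model catalog:

| Dataset | Model | Config Path | Eval Result (mAP) |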
|---|---|---|---|
| HJDataset | faster_rcnn_R_50_FPN_3x | lp://HJDataset/faster_rcnn_R_50_FPN_3x/config | |
| HJDataset | mask_rcnn_R_50_FPN_3x | lp://HJDataset/mask_rcnn_R_50_FPN_3x/config | |
| HJDataset | retinanet_R_50_FPN_3x | lp://HJDataset/retinanet_R_50_FPN_3x/config | |
| PubLayNet | faster_rcnn_R_50_FPN_3x | lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config | |
| PubLayNet | mask_rcnn_R_50_FPN_3x | lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config | |
| PubLayNet | mask_rcnn_X_101_32x8d_FPN_3x | lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config | 88.98 |
| PubLayNet | ppyolov2_r50vd_dcn_365e_publaynet | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config | 93.6 |
| PrimaLayout | mask_rcnn_R_50_FPN_3x | lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config | 69.35 |
| NewspaperNavigator | faster_rcnn_R_50_FPN_3x | lp://NewspaperNavigator/faster_rcnn_R_50_FPN_3x/config | |
| TableBank | faster_rcnn_R_50_FPN_3x | lp://TableBank/faster_rcnn_R_50_FPN_3x/config | 89.78 |
| TableBank | faster_rcnn_R_101_FPN_3x | lp://TableBank/faster_rcnn_R_101_FPN_3x/config | 91.26 |
| TableBank | ppyolov2_r50vd_dcn_365e_tableBank_word | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | 96.2 |
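The catalog paths follow an `lp://<dataset>/<model>/config` pattern. As a minimal plain-Python sketch (no LayoutParser calls; `parse_config_path` and `best_reported` are hypothetical helper names, not part of the library), the snippet below splits a path into its parts and picks the entry with the highest reported mAP per dataset. Models without a listed result are skipped, and scores are only meaningful within one dataset, so treat the pick as a starting point rather than a recommendation.

```python
from typing import Dict, List, Optional, Tuple

# (config path, reported mAP) pairs copied from the catalog table; None = no value listed.
CATALOG: List[Tuple[str, Optional[float]]] = [
    ("lp://HJDataset/faster_rcnn_R_50_FPN_3x/config", None),
    ("lp://HJDataset/mask_rcnn_R_50_FPN_3x/config", None),
    ("lp://HJDataset/retinanet_R_50_FPN_3x/config", None),
    ("lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config", None),
    ("lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config", None),
    ("lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config", 88.98),
    ("lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config", 93.6),
    ("lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config", 69.35),
    ("lp://NewspaperNavigator/faster_rcnn_R_50_FPN_3x/config", None),
    ("lp://TableBank/faster_rcnn_R_50_FPN_3x/config", 89.78),
    ("lp://TableBank/faster_rcnn_R_101_FPN_3x/config", 91.26),
    ("lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config", 96.2),
]


def parse_config_path(path: str) -> Tuple[str, str]:
    """Split 'lp://<dataset>/<model>/config' into (dataset, model)."""
    prefix, suffix = "lp://", "/config"
    if not (path.startswith(prefix) and path.endswith(suffix)):
        raise ValueError(f"not a catalog config path: {path!r}")
    dataset, model = path[len(prefix):-len(suffix)].split("/")
    return dataset, model


def best_reported(catalog: List[Tuple[str, Optional[float]]]) -> Dict[str, Tuple[str, float]]:
    """Highest reported mAP per dataset, skipping entries without a value."""
    best: Dict[str, Tuple[str, float]] = {}
    for path, score in catalog:
        if score is None:
            continue
        dataset, model = parse_config_path(path)
        if dataset not in best or score > best[dataset][1]:
            best[dataset] = (model, score)
    return best


for dataset, (model, score) in best_reported(CATALOG).items():
    print(f"{dataset}: {model} ({score})")
```

Datasets with no reported result (HJDataset, NewspaperNavigator) simply do not appear in the output.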
- For PubLayNet models, we suggest using the `mask_rcnn_X_101_32x8d_FPN_3x` model, as it is trained on the whole training set, while the others are trained only on the validation set (only around 1/50 of the size). You can expect a 15% AP improvement with the `mask_rcnn_X_101_32x8d_FPN_3x` model.
- Time cost of Detectron2 vs. PaddleDetection (the `ppyolov2_*` models in the above table):

PubLayNet Dataset:

| Framework | Model | mAP | CPU time cost (ms) | GPU time cost (ms) |
|---|---|---|---|---|
| Detectron2 | mask_rcnn_X_101_32x8d_FPN_3x | 89.0 | 16545.5 | 209.5 |
| PaddleDetection | ppyolov2_r50vd_dcn_365e | 93.6 | 1713.7 | 66.6 |

TableBank Dataset:

| Framework | Model | mAP | CPU time cost (ms) | GPU time cost (ms) |
|---|---|---|---|---|
| Detectron2 | faster_rcnn_R_101_FPN_3x | 91.3 | 7623.2 | 104.2 |
| PaddleDetection | ppyolov2_r50vd_dcn_365e | 96.2 | 1968.4 | 65.1 |

Environment:

- CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores
- GPU: a single NVIDIA Tesla P40
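
The relative speed implied by the two timing tables can be computed directly. This is a plain-Python sketch that only restates the reported numbers (measured on the hardware listed above, so other machines will differ); `TIMINGS_MS` and `speedups` are hypothetical names introduced for illustration.

```python
# Timings in milliseconds, copied from the two time-cost tables above.
TIMINGS_MS = {
    "PubLayNet": {
        "Detectron2": {"CPU": 16545.5, "GPU": 209.5},
        "PaddleDetection": {"CPU": 1713.7, "GPU": 66.6},
    },
    "TableBank": {
        "Detectron2": {"CPU": 7623.2, "GPU": 104.2},
        "PaddleDetection": {"CPU": 1968.4, "GPU": 65.1},
    },
}


def speedups(timings):
    """Ratio of Detectron2 time to PaddleDetection time per (dataset, device)."""
    result = {}
    for dataset, backends in timings.items():
        for device in ("CPU", "GPU"):
            result[(dataset, device)] = (
                backends["Detectron2"][device] / backends["PaddleDetection"][device]
            )
    return result


for (dataset, device), ratio in speedups(TIMINGS_MS).items():
    print(f"{dataset} {device}: {ratio:.2f}x")
```

On these numbers PaddleDetection is about 9.7× faster on CPU and 3.1× faster on GPU for PubLayNet, and about 3.9× and 1.6× faster for TableBank.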

Model `label_map`:

| Dataset | Label Map |
|---|---|
| HJDataset | {1:"Page Frame", 2:"Row", 3:"Title Region", 4:"Text Region", 5:"Title", 6:"Subtitle", 7:"Other"} |
| PubLayNet | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"} |
| PrimaLayout | {1:"TextRegion", 2:"ImageRegion", 3:"TableRegion", 4:"MathsRegion", 5:"SeparatorRegion", 6:"OtherRegion"} |
| NewspaperNavigator | {0: "Photograph", 1: "Illustration", 2: "Map", 3: "Comics/Cartoon", 4: "Editorial Cartoon", 5: "Headline", 6: "Advertisement"} |
| TableBank | {0: "Table"} |
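
Each config path must be paired with the label map of the dataset it was trained on. A minimal sketch of keeping the two together, assuming nothing beyond the tables above: `LABEL_MAPS` and `model_kwargs` are hypothetical names, and the final constructor call is shown only as a comment because it needs LayoutParser with Detectron2 installed.

```python
from typing import Dict

# Label maps copied from the table above (category id -> class name).
LABEL_MAPS: Dict[str, Dict[int, str]] = {
    "HJDataset": {1: "Page Frame", 2: "Row", 3: "Title Region", 4: "Text Region",
                  5: "Title", 6: "Subtitle", 7: "Other"},
    "PubLayNet": {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    "PrimaLayout": {1: "TextRegion", 2: "ImageRegion", 3: "TableRegion", 4: "MathsRegion",
                    5: "SeparatorRegion", 6: "OtherRegion"},
    "NewspaperNavigator": {0: "Photograph", 1: "Illustration", 2: "Map", 3: "Comics/Cartoon",
                           4: "Editorial Cartoon", 5: "Headline", 6: "Advertisement"},
    "TableBank": {0: "Table"},
}


def model_kwargs(dataset: str, model: str) -> dict:
    """Hypothetical helper: pair a catalog config path with its dataset's label map."""
    if dataset not in LABEL_MAPS:
        raise KeyError(f"no label map listed for {dataset!r}")
    return {
        "config_path": f"lp://{dataset}/{model}/config",
        "label_map": LABEL_MAPS[dataset],
    }


kwargs = model_kwargs("TableBank", "faster_rcnn_R_101_FPN_3x")
print(kwargs["config_path"])
# With LayoutParser and Detectron2 installed, this could be passed on as:
# model = lp.Detectron2LayoutModel(**kwargs)
```

Keying both values by dataset avoids pairing a config with the wrong label map. Note that the HJDataset and PrimaLayout ids start at 1 rather than 0.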