A PREPRINT - MAY 6, 2021
Discussion.
This competition prompts several reflections. First, for end-to-end table recognition to HTML code, structure prediction is a critical stage, especially for the TEDS metric. As shown in Figure 7, even though all text line information is recognized correctly, our method obtains a very low TEDS score (0.423) because of an incorrect structure prediction. Although the provided data set is large, we believe that a larger-scale data set covering more templates may further improve structure prediction. Secondly, text line detection and text line recognition are relatively easy tasks, since all table images are printed. Thirdly, there are some labeling inconsistency issues, such as <td></td> versus <td> </td>. Finally, the box assignment sub-task could be handled by a Graph Neural Network (GNN) [12] instead of hand-crafted rules.
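The labeling inconsistency mentioned above (<td></td> versus <td> </td>) can be neutralized by a simple normalization pass over the HTML before training or evaluation. The sketch below is illustrative only, not part of our submitted system; the function name and regular expression are our own assumptions.

```python
import re

def normalize_empty_cells(html: str) -> str:
    """Collapse whitespace-only table cells so that <td></td> and
    <td> </td> are treated as the same label."""
    return re.sub(r"<td>\s*</td>", "<td></td>", html)

# Example: both spellings of an empty cell map to the same string.
row = "<tr><td> </td><td>value</td><td></td></tr>"
print(normalize_empty_cells(row))
# <tr><td></td><td>value</td><td></td></tr>
```

Applying such a normalization to both predictions and ground truth would keep the TEDS comparison from penalizing a purely cosmetic difference.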
4 Conclusion
In this paper, we present our solution for the ICDAR 2021 competition on Scientific Literature Parsing, task B: table recognition to HTML. We divide the table recognition system into four sub-tasks: table structure prediction, text line detection, text line recognition, and box assignment. Our system achieves a 96.84 TEDS score on the validation set in the development phase and a 96.324 TEDS score in the final evaluation phase.
References
[1] Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, and Xiang Bai. MASTER: Multi-aspect non-local network for scene text recognition. Pattern Recognition, 2021.
[2] Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, and Shuai Shao. Shape robust text detection with progressive scale expansion network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9336–9345, 2019.
[3] Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. Image-based table recognition: data, model, and evaluation. arXiv preprint arXiv:1911.10683, 2019.
[4] Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. Context encoding for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7151–7160, 2018.
[5] Less Wright. Ranger-Deep-Learning-Optimizer, 2019.
[6] Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. On the variance of the adaptive learning rate and beyond. In Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020), April 2020.
[7] Michael R. Zhang, James Lucas, Geoffrey Hinton, and Jimmy Ba. Lookahead optimizer: k steps forward, 1 step back. arXiv preprint arXiv:1907.08610, 2019.
[8] Hongwei Yong, Jianqiang Huang, Xiansheng Hua, and Lei Zhang. Gradient centralization: A new optimization technique for deep neural networks. In European Conference on Computer Vision, pages 635–652. Springer, 2020.
[9] Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Shuming Shi, and Tong Zhang. Exploiting deep representations for neural machine translation. arXiv preprint arXiv:1810.10181, 2018.
[10] Yelin He, Xianbiao Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, and Rong Xiao. PingAn-VCGroup's solution for ICDAR 2021 competition on scientific table image recognition to LaTeX. arXiv, 2021.
[11] Pratik Kayal, Mrinal Anand, Harsh Desai, and Mayank Singh. ICDAR 2021 competition on scientific table image recognition to LaTeX. In 2021 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2021.
[12] Yihao Chen, Xin Tang, Xianbiao Qi, Chun-Guang Li, and Rong Xiao. Learning graph normalization for graph neural networks. arXiv preprint arXiv:2009.11746, 2020.