{"id":1148,"date":"2023-06-01T09:49:52","date_gmt":"2023-06-01T09:49:52","guid":{"rendered":"https:\/\/wp.lancs.ac.uk\/cfie\/?page_id=1148"},"modified":"2023-10-25T07:21:15","modified_gmt":"2023-10-25T07:21:15","slug":"fintoc2023","status":"publish","type":"page","link":"http:\/\/wp.lancs.ac.uk\/cfie\/fintoc2023\/","title":{"rendered":"FinTOC 2023"},"content":{"rendered":"<h1><span style=\"font-size: 18pt;color: #ff6600\"><strong>FinTOC-2023 Shared Task: <\/strong><strong>&#8220;<\/strong><strong>Financial Document Structure Extraction<\/strong><strong>&#8221;<\/strong><\/span><\/h1>\n<p>To be held at <a href=\"https:\/\/wp.lancs.ac.uk\/cfie\/fnp2023\/\">The 5th Financial Narrative Processing Workshop (FNP 2023)<\/a>, Sorrento, Italy, <strong>15-18 December 2023<\/strong>.<\/p>\n<hr \/>\n<p><strong><span style=\"font-size: 14pt\"><span style=\"color: #ff6600\">Important Dates:<\/span><\/span><\/strong><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">1st Call for papers &amp; shared task participants: June 12, 2023<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">2nd Call for papers &amp; shared task participants: July 17, 2023<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Final Call for papers &amp; shared task participants: August 17, 2023<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Training set release: August 21, 2023<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Blind test set release: September 21, 2023<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">System submission: October 3, 2023<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Release of results: October 9, 2023<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Paper submission deadline: <strong>October 30, 2023<\/strong> (anywhere in the 
world).<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Notification of paper acceptance to authors: November 12, 2023<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Camera-ready version of accepted papers: November 20, 2023<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Workshop date (1-day event): December 15-18, 2023 (exact date to be announced)<\/span><\/li>\n<\/ul>\n<hr \/>\n<p style=\"text-align: justify\"><strong><span style=\"font-size: 14pt\"><span style=\"color: #ff6600\">Introduction:<\/span><\/span><\/strong><\/p>\n<p><span style=\"font-weight: 400\">A vast and continuously growing volume of financial documents is being created and published in machine-readable formats, predominantly PDF. Unfortunately, these documents often lack comprehensive structural information, which makes efficient analysis and interpretation challenging. Nevertheless, they play a crucial role in enabling firms to report their activities, financial situation, and investment plans to shareholders, investors, and the financial markets. Corporate annual reports, for example, offer detailed financial and operational information.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In some countries, such as the United States and France, regulators like the SEC (Securities and Exchange Commission) and the AMF (Financial Markets Authority) require firms to follow specific reporting templates. These regulations aim to promote standardization and consistency across firms&#8217; disclosures. 
However, in various other European countries, management typically has more flexibility in deciding what, where, and how to report financial information, resulting in a lack of standardization among financial documents published within the same market.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Although some research has addressed table of contents (TOC) recognition for books and documents, most existing work has focused on small-scale, application-dependent, and domain-specific datasets. This limited scope poses challenges for large collections of heterogeneous documents and books, in which TOCs from different domains vary significantly in visual layout and style. Consequently, recognizing and extracting TOCs becomes an intricate problem. Indeed, compared to regular books, which are typically provided in a full-text format with limited structural information such as pages and paragraphs, financial documents have a more complex structure. They consist of various elements, including parts, sections, sub-sections, and even sub-sub-sections, and incorporate both textual and non-textual content. 
Moreover, TOC pages are not always present to help readers navigate the document, and when they are, they often cover only the main sections.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In this shared task, our objective is to analyze several types of financial documents:<\/span><\/p>\n<ol>\n<li><span style=\"font-weight: 400\"><strong>KIID<\/strong>: Key Investor Information Document.<\/span><\/li>\n<li><span style=\"font-weight: 400\"><strong>Prospectus<\/strong>: official PDF documents in which investment funds describe their characteristics and investment modalities in detail.<\/span><\/li>\n<li><span style=\"font-weight: 400\"><strong>R\u00e8glement and Financial Annual Reports\/Financial Statements<\/strong>: they provide a detailed overview of a company&#8217;s financial performance and operations over the course of a fiscal year.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400\">These documents provide crucial information to investors, stakeholders, and regulatory bodies. While the content they must contain is often prescribed by regulation, their format is not standardized, which leads to considerable variability. Presentation styles range from plain text to visually rich, data-driven graphical and tabular representations. Notably, most of these documents are published without a table of contents.<\/span><\/p>\n<p><span style=\"font-weight: 400\">A TOC is essential for readers because it enables easy navigation within the document by providing a clear outline of headers and their corresponding page numbers. Additionally, TOCs are a valuable resource for legal teams, helping them verify that all the required content is included. 
Consequently, the automated analysis of these documents to extract their structure is becoming increasingly useful for numerous firms worldwide.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Our primary focus for this edition is to extend table of contents extraction to a wider variety of financial documents. The task involves developing efficient algorithms and methodologies that address the challenges posed by such a heterogeneous dataset. Our aim is generalization: a system developed for the task should be applicable to different types of financial documents. In this way, we want to demonstrate the versatility and effectiveness of machine learning (ML) algorithms for TOC extraction, enabling a streamlined and consistent approach across financial document types.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In addition, for this edition, we are excited to introduce a dataset that goes beyond textual annotations. Our dataset will include visual (spatial) annotations that capture the coordinates of the titles as well as the hierarchical structure of the documents, enabling a more holistic analysis and understanding of financial documents.<\/span><\/p>\n<p><span style=\"font-weight: 400\">By incorporating visual annotations, we can capture the visual cues and design elements that contribute to the overall structure and organization of the documents. This allows a closer study of how the table of contents is visually represented and of the visual hierarchy present in these financial documents. 
The combination of textual and visual annotations provides a richer and more nuanced dataset, which can help improve the accuracy and effectiveness of the machine learning algorithms and methodologies employed in TOC extraction.<\/span><\/p>\n<p>Thanks to the contribution of the Autonomous University of Madrid (UAM, Spain), the fifth edition of the FinTOC Shared Task again includes a track for Spanish documents, as in the previous edition, in addition to the English and French tracks.<\/p>\n<hr \/>\n<p style=\"text-align: justify\"><strong><span style=\"font-size: 14pt\"><span style=\"color: #ff6600\">Task:<\/span><\/span><\/strong><\/p>\n<p><em>The fifth edition of the FinTOC Shared Task features three tracks, following the format of the fourth edition: one for English documents, one for French documents, and one for Spanish documents. <span style=\"font-weight: 400\">In this edition, systems will be scored on their performance in both title detection and TOC generation, using more precise evaluation metrics based on visual annotations.<\/span><\/em><\/p>\n<p>Participants need to register. Once registered, all participating teams will be provided with a common training dataset containing PDF documents and the associated TOC annotations.<\/p>\n<hr \/>\n<p style=\"text-align: justify\"><strong><span style=\"font-size: 14pt\"><span style=\"color: #ff6600\">Background:<\/span><\/span><\/strong><\/p>\n<p>Existing work on table of contents (TOC) recognition for books and documents has focused almost entirely on small, application-dependent, and domain-specific datasets. 
However, the TOCs of documents from different domains differ significantly in visual layout and style, which makes TOC recognition a challenging problem for large-scale collections of heterogeneous documents and books. Compared to regular books (mostly provided in a full-text format with limited structural information such as pages and paragraphs), financial documents, which contain both textual and non-textual content, have a more sophisticated structure that includes parts, sections, sub-sections, and sub-sub-sections.<\/p>\n<hr \/>\n<p><span style=\"color: #ff6600\"><span style=\"font-size: 18.6667px\"><b>How to participate:<\/b><\/span><\/span><\/p>\n<p>To participate, please use <a href=\"https:\/\/docs.google.com\/forms\/d\/e\/1FAIpQLSdqUKy3YGho0Cw2GF__VHilHZZbR75UDG3JRBC4k0Yxw4acWg\/viewform?usp=pp_url\"><strong>this registration form<\/strong><\/a> to add details of your team.<\/p>\n<p><strong>Registration is open as of June 1, 2023.<\/strong><\/p>\n<hr \/>\n<p style=\"text-align: justify\"><span style=\"font-size: 14pt\"><span style=\"color: #ff6600\"><b>Data Format and Evaluation:<\/b><\/span><\/span><\/p>\n<p>TBA<\/p>\n<hr \/>\n<h1><span style=\"color: #ff6600;font-size: 14pt\"><b>Shared Task Paper Submission Instructions:<\/b><\/span><\/h1>\n<p><span style=\"font-family: helvetica\">TBA<\/span><\/p>\n<hr \/>\n<p style=\"text-align: justify\"><strong><span style=\"font-size: 14pt\"><span style=\"color: #ff6600\">Shared Task Organisers:<\/span><\/span><\/strong><\/p>\n<ul>\n<li>Abderrahim Aitazzi, 3DS Outscale (ex Fortia), France<\/li>\n<li>Sandra Bellato, <span style=\"font-weight: 400\">3DS Outscale (ex Fortia), France<\/span><\/li>\n<li>Blanca Carbajo Coronado, Universidad Aut\u00f3noma de Madrid<\/li>\n<li>Dr Ismail El Maarouf, <span style=\"font-weight: 400\">Imprevisible<\/span><\/li>\n<li>Dr Juyeon Kang, <span style=\"font-weight: 400\">3DS Outscale (ex Fortia), France<\/span><\/li>\n<li>Prof. 
Ana Gisbert, Universidad Aut\u00f3noma de Madrid<\/li>\n<li>Prof. Antonio Moreno Sandoval, Universidad Aut\u00f3noma de Madrid<\/li>\n<\/ul>\n<hr \/>\n<p style=\"text-align: justify\"><strong><span style=\"font-size: 14pt\"><span style=\"color: #ff6600\">Shared Task Contact:<\/span><\/span><\/strong><\/p>\n<p>Questions about the FinTOC-2023 shared task can be sent to:<\/p>\n<p><a href=\"mailto:fin.toc.task@gmail.com\">fin.toc.task@gmail.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>FinTOC-2023 Shared Task: &#8220;Financial Document Structure Extraction&#8220; To be held at The 5th Financial Narrative Processing Workshop (FNP 2023), Sorrento, Italy, 15-18 December 2023. Important Dates: 1st Call for papers&hellip; <a href=\"http:\/\/wp.lancs.ac.uk\/cfie\/fintoc2023\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">FinTOC 
2023<\/span><\/a><\/p>\n","protected":false},"author":1659,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"class_list":["post-1148","page","type-page","status-publish","hentry","without-featured-image"],"_links":{"self":[{"href":"http:\/\/wp.lancs.ac.uk\/cfie\/wp-json\/wp\/v2\/pages\/1148","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/wp.lancs.ac.uk\/cfie\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/wp.lancs.ac.uk\/cfie\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/wp.lancs.ac.uk\/cfie\/wp-json\/wp\/v2\/users\/1659"}],"replies":[{"embeddable":true,"href":"http:\/\/wp.lancs.ac.uk\/cfie\/wp-json\/wp\/v2\/comments?post=1148"}],"version-history":[{"count":13,"href":"http:\/\/wp.lancs.ac.uk\/cfie\/wp-json\/wp\/v2\/pages\/1148\/revisions"}],"predecessor-version":[{"id":1263,"href":"http:\/\/wp.lancs.ac.uk\/cfie\/wp-json\/wp\/v2\/pages\/1148\/revisions\/1263"}],"wp:attachment":[{"href":"http:\/\/wp.lancs.ac.uk\/cfie\/wp-json\/wp\/v2\/media?parent=1148"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}