The 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation (FNP-FNS 2020)

Financial Narrative Summarisation (FNS 2020)

To be held at the 28th International Conference on Computational Linguistics (COLING’2020), Barcelona, Spain [online], on 12 December 2020.

FNP-FNS Online Running Instructions: http://wp.lancs.ac.uk/cfie/fnpfns-instructions/

Workshop Program: Click here to see the workshop schedule

Keynote speaker: Dr Ana Gisbert. To join the talk: http://wp.lancs.ac.uk/cfie/keynote/

Shared task results: https://bit.ly/2CoSLP4

Submission instructions: http://wp.lancs.ac.uk/cfie/fnp2020/guidelines/

The training and validation datasets have been released. Please use the registration form below to receive a copy, or email fns.shared.task@gmail.com

~~Participation Form: http://bit.ly/34xWCCp~~

Dataset Sample: http://bit.ly/2PaUdbL

Task Details: FNS_2020_Financial_Summarisation description (please read)

Important Dates [New Dates].

~~December 1st, 2019: Registration opens.~~
~~February 17th, 2020: Release of training set.~~
~~March 23rd, 2020: Release of test set.~~
~~April 6th, 2020: Registration deadline.~~
~~June 15th, 2020: Shared task results submission.~~
September 1st, 2020: Shared task papers due.
~~October 1st, 2020: Notification of acceptance.~~
~~November 1st, 2020: Camera-ready papers due.~~
December 12th, 2020: Workshop and shared task.

Introduction

The volume of available financial information is increasing sharply, and the study of NLP methods that automatically summarise this content has therefore grown rapidly into a major research area.

The Financial Narrative Summarisation shared task (FNS 2020) aims to demonstrate the value and challenges of applying automatic text summarisation to financial text written in English, usually referred to as financial narrative disclosures. The task dataset has been extracted from UK annual reports published in PDF format.

Participants are asked to produce a single structured summary for each report, based on real-world, publicly available annual reports of UK firms, by extracting information from different key sections. The summaries should reflect the analysis and assessment of the business’s financial trend over the past year, as provided in the annual reports. For evaluation we aim to use the JRouge package for ROUGE, from Marina Litvak’s team (https://bitbucket.org/nocgod/jrouge/wiki/Home), with multiple variants (ROUGE-2, ROUGE-SU4).
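To make the ROUGE variants concrete, the sketch below computes ROUGE-2 and ROUGE-SU4 precision, recall and F-score for one candidate summary. It is a simplified re-implementation for illustration, not the JRouge package: the tokenisation, the absence of stemming and stop-word removal, and taking the best score over multiple gold summaries are all assumptions, and all function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric tokens; line breaks act like spaces.
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens: list[str]) -> Counter:
    # Contiguous word pairs, the unit counted by ROUGE-2.
    return Counter(zip(tokens, tokens[1:]))


def skip_bigrams_su4(tokens: list[str], max_gap: int = 4) -> Counter:
    # Unigrams plus ordered word pairs with at most `max_gap` words between them.
    grams = Counter((t,) for t in tokens)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            grams[(tokens[i], tokens[j])] += 1
    return grams


def prf(cand: Counter, ref: Counter) -> tuple[float, float, float]:
    # Clipped overlap: Counter '&' keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return p, r, 2 * p * r / (p + r)


def rouge_2(candidate: str, reference: str) -> tuple[float, float, float]:
    return prf(bigrams(tokenize(candidate)), bigrams(tokenize(reference)))


def rouge_su4(candidate: str, reference: str) -> tuple[float, float, float]:
    return prf(skip_bigrams_su4(tokenize(candidate)), skip_bigrams_su4(tokenize(reference)))


def best_over_references(metric, candidate: str, references: list[str]):
    # Assumed aggregation: keep the score against the best-matching gold summary.
    return max((metric(candidate, ref) for ref in references), key=lambda s: s[2])
```

For example, `rouge_2("the company reported strong growth", "the company reported strong revenue growth")` finds 3 shared bigrams out of 4 in the candidate and 5 in the reference, giving precision 0.75 and recall 0.6.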

Summaries Evaluation

We also aim to use the NPowER package (https://github.com/ggianna/SummaryEvaluation/tree/master/Releases/V1), which includes AutoSummENG as one of its outputs. The parameters will be the defaults (minNgram=3, maxNgram=3, dist/Dwin=3).
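AutoSummENG compares summaries through character n-gram graphs. The following is a simplified sketch under the stated defaults (n-grams of length 3 only, since minNgram = maxNgram = 3, and a window Dwin = 3): each text becomes a graph whose edges link n-grams occurring within the window, and two graphs are compared with a value-similarity score. Edge direction, window handling and averaging over gold summaries are assumptions; the NPowER/AutoSummENG implementation may differ in these details, and the function names are hypothetical.

```python
from collections import Counter


def ngram_graph(text: str, n: int = 3, dwin: int = 3) -> Counter:
    # Nodes: character n-grams. Edges: (n-gram, n-gram starting up to `dwin`
    # positions later), weighted by how often that pair co-occurs.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        for j in range(i + 1, min(i + dwin + 1, len(grams))):
            edges[(gram, grams[j])] += 1
    return edges


def value_similarity(g1: Counter, g2: Counter) -> float:
    # Each shared edge contributes min(w1, w2) / max(w1, w2);
    # the sum is normalised by the size of the larger graph.
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    total = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return total / max(len(g1), len(g2))


def autosummeng_like(candidate: str, references: list[str], n: int = 3, dwin: int = 3) -> float:
    # Average the candidate's similarity to every gold (model) summary.
    cand = ngram_graph(candidate, n, dwin)
    scores = [value_similarity(cand, ngram_graph(ref, n, dwin)) for ref in references]
    return sum(scores) / len(scores)
```

Because it works on character n-grams rather than words, this kind of measure is less sensitive to tokenisation and inflection than ROUGE.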

Task

For the FNS 2020 task we focus on annual reports produced by UK firms listed on the London Stock Exchange (LSE). In the UK and elsewhere, annual reports are much less rigidly structured than those produced in the US. Companies produce glossy brochures with a much looser structure, which makes automatic summarisation of the narratives in UK annual reports a challenging task: the structure of these documents must first be extracted before their narrative sections can be summarised. This can be done by detecting the narrative sections, which usually contain the management disclosures, as opposed to the financial statements.
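Since summaries should come from the narrative (“front-end”) sections, a system first needs some way to separate them from the back-end financial statements. A minimal heuristic sketch, assuming the report has already been split into (title, text) sections: it flags a section as back-end if its title matches a few statement-like cues or if most of its tokens contain digits. The cue list and the 0.2 threshold are hypothetical and would need tuning on the training data, or replacing with a trained classifier.

```python
import re

# Hypothetical title cues for "back-end" (financial statement) sections.
BACK_END_TITLES = (
    "balance sheet",
    "statement of cash flows",
    "income statement",
    "notes to the financial statements",
    "independent auditor",
)


def numeric_ratio(text: str) -> float:
    # Share of whitespace-separated tokens that contain at least one digit.
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if re.search(r"\d", t)) / len(tokens)


def is_narrative(title: str, text: str, max_numeric: float = 0.2) -> bool:
    # Treat a section as narrative unless its title looks like a financial
    # statement or its body is dominated by figures.
    if any(cue in title.lower() for cue in BACK_END_TITLES):
        return False
    return numeric_ratio(text) < max_numeric
```

A typical use is to keep only `[s for s in sections if is_narrative(*s)]` before summarising.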

In this shared task we introduce a new summarisation task which we call Financial Narrative Summarisation, in which the summary requires extraction from different key sections of the annual reports. Those sections are usually referred to as “narrative sections” or “front-end” sections, and they usually contain textual information and reviews by the firm’s management and board of directors. Sections containing financial statements in the form of tables and numbers are usually referred to as “back-end” sections and are not supposed to be part of the narrative summaries. UK annual reports are lengthy documents, around 80 pages long on average, and some span more than 250 pages, which makes the summarisation task challenging but academically interesting.

For the purpose of this task we ask participants to produce one summary for each annual report. The summary length should not exceed 1000 words. We advise generating or extracting the summary from the narrative sections, so participating summarisers need to be trained to detect narrative sections before creating the summaries. The MultiLing team, with help from Barcelona’s UPF summarisation team, will help organise the shared task, including generating the evaluation results and the final proceedings. The MultiLing team has extensive experience in organising summarisation tasks, dating back to 2011.
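One simple way to respect the 1000-word limit in an extractive system is greedy sentence selection under a word budget. The sketch below assumes the narrative sections have already been detected and concatenated into one string. It scores sentences by average document word frequency, a deliberately crude and hypothetical choice (a real system would at least remove stop words), and keeps the best-scoring sentences that still fit the budget, in their original order.

```python
import re
from collections import Counter

WORD_LIMIT = 1000  # FNS 2020 maximum summary length in words


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarise(narrative_text: str, limit: int = WORD_LIMIT) -> str:
    sentences = split_sentences(narrative_text)
    # Document-level word frequencies act as a crude importance signal.
    freqs = Counter(w.lower() for w in re.findall(r"[A-Za-z]+", narrative_text))

    def score(sentence: str) -> float:
        words = re.findall(r"[A-Za-z]+", sentence)
        return sum(freqs[w.lower()] for w in words) / (len(words) or 1)

    # Greedily take the highest-scoring sentences that still fit the word budget.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n_words = len(sentences[i].split())
        if used + n_words <= limit:
            chosen.append(i)
            used += n_words
    # Restore the original order so the summary reads like the report.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Skipping sentences that do not fit (rather than stopping at the first one) lets shorter, lower-ranked sentences fill the remaining budget.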

FAQs:

Q1: Is this an Extractive or Abstractive summarisation task?
A1: The process is about extracting (or regenerating) relevant information from a document, so it can be either extractive or abstractive; the choice is yours.

Q2: Is this similar to summarising news documents?
A2: Financial annual reports differ from news articles: they are large and contain a great deal of information that is repetitive or irrelevant to a summary of the year’s performance.

Q3: What part of the annual reports are you interested in summarising?
A3: We are mainly interested in summarising the narrative sections (learn more about narratives in FNS_2020_Financial_Summarisation).

Q4: How did you generate the gold-standard summaries?
A4: Every section in an annual report is written by human experts with the aim of summarising the previous year’s performance. To clarify, we did not ask those experts to write a summary, as we are not involved in creating the annual reports. Instead, we asked the experts who created those annual reports to tell us which sections are considered a summary of the whole report, and those sections were used as the gold-standard summaries.

Q5: Are the gold standard summaries contained within the annual reports?
A5: Yes, they are contained within the reports. However, this information does not always appear in the same order, location or format, or even with the same content. Therefore, detecting which information in an annual report should be extracted into the final summary is a challenge even for experts in the field.

Q6: What is the word limit for the generated summaries?
A6: You are to generate a summary of no more than 1000 words.

Q7: Can we submit results for more than one summarisation-system?
A7: Yes, you can submit up to three systems (one run each).

Q8: Will there be a leader-board?
A8: Yes.

Q9: Is the dataset extracted from PDF to txt following the procedure in this referenced paper?
A9: Yes

Q10: Is it possible to provide us with the original PDF files so we can look into the structures?
A10: Unfortunately, we cannot provide the original PDF files due to copyright issues. The idea is to work on extracting a structure for the unstructured plain text file formats.

Q11: Could you clarify the quoted sentences in the gold summaries? They do not seem to be quotations from the full text.
A11: The quotations could be a result of highlighted text (usually floating text) in the original PDF files but they are not intended to affect or guide the summarisation process in any way.

Q12: Is it safe to ignore the new lines in the gold summary? I assume they are results of PDF conversions.
A12: Yes, we will not look into line breaks so feel free to ignore them.

Q13: What evaluation methods will be used?
A13: We aim to use the JRouge package for ROUGE, from Marina Litvak’s team (JROUGE), using multiple variants (ROUGE-2, ROUGE-SU4). We also aim to use NPowER package (NPowER), which includes AutoSummENG as one of the outputs. The parameters will be the default (minNgram=3, maxNgram=3, dist/Dwin=3).

Q14: Do we need to submit a paper describing our system(s)?
A14: Preferably, yes! Depending on the number of teams across all shared tasks we might limit that to the best performing teams. More information to follow nearer to the results submission deadline.

If in doubt, please do not hesitate to contact us at fns.shared.task@gmail.com
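Before submitting, it may help to check each output against the answers to Q6 and Q12: line breaks can be ignored, and no summary may exceed 1000 words. This sketch assumes words are counted by splitting on whitespace after collapsing line breaks; the organisers’ exact counting rule is not stated here, so treat the result as an approximation. The function names are hypothetical.

```python
import re

MAX_WORDS = 1000  # word limit from Q6


def normalise(summary: str) -> str:
    # Line breaks carry no meaning (Q12), so collapse all whitespace runs.
    return re.sub(r"\s+", " ", summary).strip()


def check_summary(summary: str, max_words: int = MAX_WORDS) -> tuple[bool, int]:
    # Returns (within_limit, word_count) using whitespace tokenisation.
    n_words = len(normalise(summary).split())
    return n_words <= max_words, n_words
```

For instance, `check_summary("Profit\nrose\n\nsharply.")` reports three words, within the limit.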

Dataset

Training and validation sets have been released. Only registered teams will receive the datasets. To participate, use the participation form or the contact email at the end of this page. Participation Form: http://bit.ly/34xWCCp

To create the financial narrative summarisation dataset we used 3,863 annual reports, randomly split into training (c. 75%) and testing plus validation (c. 25%). Table 1 shows the dataset details. We will provide participants with the training and validation sets, including the full text of each annual report along with the extracted sections and gold-standard summaries. At a later stage participants will be given the testing data. On average there are more than two gold-standard summaries per annual report (12,796 summaries for 3,863 reports in Table 1, i.e. about 3.3 per report).

Table 1: Dataset

Data Type	Training	Validation	Testing	Total
Report full text	3,000	363	500	3,863
Gold summaries	9,873	1,250	1,673	12,796
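The split sizes in Table 1 can be reproduced with a seeded random shuffle of report identifiers. This is only a sketch of a random split with those sizes: the organisers’ actual procedure and seed are not given, and the integer IDs stand in for real report identifiers.

```python
import random

# Split sizes taken from Table 1.
N_TRAIN, N_VAL, N_TEST = 3000, 363, 500


def split_reports(report_ids, n_train=N_TRAIN, n_val=N_VAL, seed=0):
    # Shuffle a copy with a fixed seed so the split is reproducible;
    # whatever remains after training and validation becomes the test set.
    ids = list(report_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


train, val, test = split_reports(range(N_TRAIN + N_VAL + N_TEST))
print(len(train), len(val), len(test))  # prints: 3000 363 500
```

Splitting at the report level, as here, keeps all gold summaries of a report in the same partition, so no report leaks between training and testing.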

Output:

Details on how to perform the task and on the output format are given in the following document: FNS_2020_Financial_Summarisation description (please read before running the task).

Each team should write a short paper describing their methods. The papers will be published in the ACL Anthology in the FNP 2020 proceedings as part of COLING 2020.

Shared task Paper Submission Instructions:

Submission URL: https://www.softconf.com/coling2020/FNP-FNS/

Detailed submission guidelines can be found here: http://wp.lancs.ac.uk/cfie/fnp2020/guidelines/

Shared Task Co-Organisers

Mahmoud El-Haj (Lancaster University)
Ahmed AbuRa’ed (Universitat Pompeu Fabra)
Nikiforos Pittaras (NCSR Demokritos)
Marina Litvak (Sami Shamoon College of Engineering)
George Giannakopoulos (SKEL Lab – NCSR Demokritos)

Contact

fns.shared.task@gmail.com
