Workshop on: Hypothesis Generating in Genetics and Biomedical Text Mining

08 January 2019

Registration: [9:30-10:00], Workshop talks and discussion [10:00-16:00]

Venue: Charles Carter Room A19, Lancaster University

NEW: Workshop programme and abstracts: HG2BTM_Programme_and_Abstracts

Over recent years, research into biomedical data using corpora and corpus methods has grown from a small-scale pursuit confined to isolated pockets into a large and very active field, with work advancing rapidly on many fronts in both corpus and computational linguistics.

In many areas of academic publishing, the volume of literature has exploded and fields have split into subfields, leading to "stove-piping", where sub-communities of expertise become disconnected from one another. This has been especially true of the genetics literature over the last 10 years, where researchers are no longer able to keep up with work in previously related areas.

We invite one-page abstract submissions on topics that include, but are not limited to, how techniques developed in Natural Language Processing (NLP) and Corpus Linguistics (CL) can help close this knowledge gap and lead to better hypothesis generation. This multidisciplinary effort aims to harness the power of NLP and CL to build methods and techniques that will provide new clues to disease aetiology.

The HG2BTM workshop aims to provide a venue where different strands of corpus research into biomedical data can be brought together to explore progress in the field, by inviting renowned speakers working in NLP and CL on advancing biomedical and gene ontology research.

Organisers:

Confirmed Speakers (so far):

Prof Nancy Ide, Department of Computer Science, Vassar College, USA. [SLIDES]
Prof Udo Hahn, Jena University Language & Information Engineering (JULIE) Lab at the University of Jena, Germany.
Dr Paul Thompson, Centre for Corpus Research, University of Birmingham, UK.
Dr Mark Stevenson, Department of Computer Science, University of Sheffield, UK. [SLIDES]
Dr Mahmoud El-Haj, Dr Sheryl Prentice, Nathan Rutherford, SCC, Lancaster University, UK. [SLIDES]

Speakers' Short Bios:

Professor Nancy Ide:

Nancy Ide is Professor of Computer Science at Vassar College. She has been an active researcher in the field of computational linguistics for over 30 years and has published widely on topics including computational lexicography, word sense disambiguation, semantic web implementations of linguistically annotated data, and interoperable standards for representing language resources. In 1987, she co-founded the Text Encoding Initiative (TEI), whose guidelines remain the major XML format for representing annotated humanities data. In the early and mid-1990s she was project leader for two major European projects involving the creation and annotation of large-scale, multilingual corpora and the design and implementation of pipeline architectures for language processing tools. In this context she developed the XML Corpus Encoding Standard (XCES), which is a standard in the field. She has been Principal Investigator on several National Science Foundation-funded projects, including a major effort to create a massive linguistically annotated corpus of American English, the Open American National Corpus (OANC), and a manually annotated sub-corpus of the OANC (MASC), as well as an ongoing project, the Language Applications (LAPPS) Grid, which provides an easy-to-use platform of fully interoperable NLP tools and data. Dr. Ide serves on several sub-committees and working groups of the International Organization for Standardization (ISO) Technical Committee on Language Resource Management, and is the principal architect of the ISO Linguistic Annotation Framework. Recently, she co-edited (with James Pustejovsky) and contributed several chapters to the 1300-page Handbook of Linguistic Annotation, published by Springer in 2017. Dr. Ide is co-editor-in-chief of the Springer journal Language Resources and Evaluation, one of the premier journals in computational linguistics.
She is also editor of the Springer book series Text, Speech, and Language Technology, which contains over 40 books covering the full range of computational linguistics research. She is the co-founder and President of the Association for Computational Linguistics Special Interest Group on Annotation (ACL-SIGANN). From 1985 to 1995 she was President of the Association for Computers and the Humanities, and from 1996 to 2001 she served as co-editor of the journal Computers and the Humanities (since renamed Language Resources and Evaluation).

Professor Udo Hahn:

Udo Hahn heads the Jena University Language & Information Engineering (JULIE) Lab at the University of Jena, Germany. For more than two decades, his work has focused on biomedical information extraction and text mining, and on the associated development of language resources (corpora and software tools). Recently, his lab was given the leading role in a national research initiative to set up a fully interoperable (cross-)clinical NLP infrastructure in Germany, as part of a multi-million funding initiative by the German Federal Ministry of Education and Research (BMBF).

Professor Joanne Knight:

Jo Knight arrived at Lancaster in January 2016. Since then, she has become Research Director of the Medical School and Health Theme lead for the Data Science Institute. Her expertise lies in statistical genetics: analysing genetic data to identify variants that increase the risk of complex traits. Jo publishes both applied and methodological work in journals including Nature and Nature Genetics. The diseases she has most experience with include psychiatric and autoimmune traits and cardiovascular disease. More recently, Jo has started a number of projects under the broad title “using NHS held data to improve health records”.

Dr Paul Rayson:

Paul is the director of the UCREL research centre and a Reader in the School of Computing and Communications, in InfoLab21 at Lancaster University. A long-term focus of his work is the application of semantic-based NLP in extreme circumstances where language is noisy, e.g. in historical, learner, speech, email, txt and other computer-mediated communication (CMC) varieties. His applied research is in the areas of dementia detection, online child protection, cyber security, learner dictionaries, and text mining of historical corpora and annual financial reports.

Dr Mahmoud El-Haj:

Mahmoud is a Senior Research Associate at the School of Computing and Communications at Lancaster University. He received his PhD in Computer Science from the University of Essex, where he worked on Arabic multi-document summarization. His research interests include Arabic and multilingual NLP, machine learning, information extraction, financial narrative processing, and corpus and computational linguistics. Mahmoud has worked on multidisciplinary research projects at Lancaster University in collaboration with large financial firms in London, and previously worked as a data mining developer and researcher at the UK Data Archive. Website: http://www.lancaster.ac.uk/staff/elhaj/

Dr Sheryl Prentice:

Sheryl is a Senior Research Associate in the Faculty of Health and Medicine. She is currently employed on the BioTM project, which seeks to use text mining methods to generate new hypotheses in genetics research. Her background is in corpus linguistics, and she has undertaken a number of research projects spanning multiple disciplines, including psychology, linguistics and, most recently, health. She is currently exploring the application of corpus methods to genetic association study abstracts to extract common and unusual terms.

Important dates:

~~Registration is now open: Closing date 18th December (nominal fee applies)~~
~~Abstract submission deadline (200-300 words): 10 December 2018, extended from 8 December 2018 (11:59pm anywhere in the world).~~
~~Notification of acceptance: 15 December 2018.~~
Workshop: 8 January 2019 (Registration: [9:30-10:00], Workshop talks [10:00-16:00])

Venue:

Charles Carter Room A19, Lancaster University

Contact details:

Email: BioTM.Project@gmail.com

URL: http://wp.lancs.ac.uk/btm/hg2btm

Biomedical Text Mining @ Lancaster University
