Programme

The workshop will take place on the afternoon of Monday 5th December in room “Congressional B”. Our acceptance rate was 60%.

Each paper is allotted 15 minutes for the presentation plus 5 minutes for questions.

1:30 Welcome and introduction (Programme Chair: Paul Rayson) (slides)

1:30-2:30 Session 1: Social/topic and large-scale processing

1:30 Domain-specific user preference prediction based on multiple user activities (slides)
Yunfei Long, Qin Lu, Yue Xiao, MingLei Li, and Chu-Ren Huang
1:50 Large-scale text processing pipeline with Apache Spark (slides)
Alexey Svyatkovskiy, Kosuke Imai, Mary Kroeger, and Yuki Shiraito
2:10 lexiDB: A Scalable Corpus Database Management System (slides)
Matthew Coole, Paul Rayson, and John Mariani

2:30-3:30 Session 2: Annotation

2:30 Scaling Character-Based Morphological Tagging to Fourteen Languages (slides)
Georg Heigold, Günter Neumann, and Josef van Genabith
2:50 A Grapheme-level Approach for Constructing a Korean Morphological Analyzer without Linguistic Knowledge
Jihun Choi, Jonghem Youn, and Sang-goo Lee
3:10 Lightweight System for NE-tagged News Headlines corpus creation (slides and video)
Avinash Kumar, Dhaval Patel, and Nikita Jain

3:30 Afternoon coffee break

3:50-5:10 Session 3: Classification

3:50 Document Classification through Image-Based Character Embedding and Wildcard Training (slides)
Daiki Shimada, Ryunosuke Kotani, and Hitoshi Iyatomi
4:10 Automatic Classification of Securities using Hierarchical Clustering of the 10-Ks (slides)
Hoseong Yang, Hye Jin Lee, Eugene Cho, and Sungzoon Cho
4:30 Large-Scale Taxonomy Categorization for Noisy Product Listings
Pradipto Das, Yandi Xia, Aaron Levine, Giuseppe Di Fabbrizio, and Ankur Datta (withdrawn: accepted at EACL 2017)
4:50 Efficient Natural Language Pre-processing for analysing large data sets
Billal Belainine, Alexsandro Fonseca, and Fatiha Sadat (withdrawn: author unable to present)

5:10 System demonstrations
6:00 Finish