Language Technologies for the Challenges of the Digital Age: 27th International Conference, GSCL 2017, Berlin, Germany, September 13-14, 2017, Proceedings

This open access volume constitutes the refereed proceedings of the 27th biennial conference of the German Society for Computational Linguistics and Language Technology, GSCL 2017, held in Berlin, Germany, in September 2017, which focused on language technologies for the digital age. The 16 full pap...

Bibliographic details
Main authors:	Rehm, Georg; Declerck, Thierry
Format:	eBook
Language:	English
Published:	Cham: Springer Nature; Springer Open; Springer International Publishing AG, 2018
Edition:	1
Series:	Lecture Notes in Artificial Intelligence
Subjects:	artificial intelligence; Computer programming, programs, data; Education; Language and Linguistics; Language Arts & Disciplines; machine learning; named entities; natural language processing; natural language processing systems; NLP; semantics; social networking; Society and Social Sciences; support vector machines; SVM; Technology & Engineering; Technology, Engineering, Agriculture, Industrial processes; Technology: general issues
ISBN:	3319737066, 9783319737065, 3319737058, 9783319737058, 9784431543275, 4431543279
Contents:

Intro -- Preface -- Organization -- Contents -- Processing German: Basic Technologies -- Reconstruction of Separable Particle Verbs in a Corpus of Spoken German -- 1 Introduction -- 2 Detecting Separable Particle Verbs -- 3 Results and Discussion -- 4 Related Work -- 5 Conclusion and Outlook -- References -- Detecting Vocal Irony -- 1 Introduction -- 2 The Irony Simulation App -- 3 Acoustic Irony Classification -- 4 Textual Sentiment Classification -- 5 Data Collection -- 6 Data Labeling -- 6.1 Textual Data Labeling -- 7 Experiments -- 7.1 Irony and Anger Expression -- 7.2 Sentiment Analysis -- 7.3 Acoustic Emotion Classification Results -- 8 Summary and Outlook -- References -- The Devil is in the Details: Parsing Unknown German Words -- 1 Introduction -- 2 Related Work -- 2.1 Handling Rare and Unknown Words -- 2.2 Word Clustering -- 3 Methodology -- 3.1 Clustering Data -- 3.2 Methods -- 3.3 Evaluation -- 4 Results -- 4.1 Rare and Unknown Word Thresholds -- 4.2 Suffix Results -- 4.3 Cluster and Signature Results -- 5 Discussion -- 5.1 External POS Tagger -- 5.2 Number of Clusters -- 6 Conclusion and Future Work -- References -- Exploring Ensemble Dependency Parsing to Reduce Manual Annotation Workload -- 1 Introduction -- 2 Ensemble Parsing -- 3 Setting -- 3.1 Parser Ensemble -- 3.2 Training Domain -- 3.3 Test Domain and Gold Standard -- 4 Results -- 4.1 Quantitative Results -- 4.2 Qualitative Results -- 5 Conclusion -- References -- Different German and English Coreference Resolution Models for Multi-domain Content Curation Scenarios -- 1 Introduction to Coreference Resolution -- 2 Summary of Approaches to Coreference Resolution -- 3 Three Implementations -- 3.1 Rule-Based Approach -- 3.2 Statistical Approach -- 3.3 Projection-Based Approach -- 4 Evaluation and Case Studies --
4.1 Add-On Value of Coreference Resolution to Digital Curation Scenarios -- 5 Conclusions and Future Work -- References -- Word and Sentence Segmentation in German: Overcoming Idiosyncrasies in the Use of Punctuation in Private Communication -- 1 Introduction -- 2 Use of Punctuation -- 3 Conditional Random Fields (CRF)-Based Text Segmentation -- 3.1 Conditional Random Fields -- 3.2 Sequence -- 3.3 Features -- 4 Experiments -- 5 Evaluation and Conclusion -- 6 Appendix -- References -- Fine-Grained POS Tagging of German Social Media and Web Texts -- 1 Introduction -- 2 Model -- 2.1 Baseline Model -- 2.2 Distributional Smoothing -- 2.3 Lookup -- 3 Evaluation -- 3.1 Datasets -- 3.2 Experimental Setup -- 3.3 Results -- 3.4 Performance on Unknown Words -- 4 Conclusions -- References -- Developing a Stemmer for German Based on a Comparative Analysis of Publicly Available Stemmers -- 1 Introduction -- 1.1 The Stemming Task -- 1.2 Motivation -- 1.3 Summary of Work -- 2 Existing Stemmers for German and Related Work -- 2.1 German Stemmers -- 2.2 Evaluation Studies -- 3 Evaluation -- 3.1 Runtime Analysis -- 3.2 Gold Standard Development -- 3.3 Evaluation -- 3.4 Results -- 4 Development -- 4.1 CISTEM Development -- 4.2 Final Evaluation -- 5 Conclusion -- References -- Negation Modeling for German Polarity Classification -- 1 Introduction -- 2 Data and Annotation -- 3 Baselines -- 3.1 Baseline I: Window-Based Scope -- 3.2 Baseline II: Clause-Based Scope -- 4 Our Approach -- 4.1 Scope for Negation Function Words -- 4.2 Scope for Negation Content Words -- 4.3 Normalization of the Dependency Graph -- 4.4 Scope Expansion -- 5 Experiments -- 5.1 Intrinsic Evaluation on Negation Dataset -- 5.2 Extrinsic Evaluation on Sentence-Level Polarity Classification -- 6 Related Work -- 7 Conclusion -- References -- Processing German: Named Entities --
NECKAr: A Named Entity Classifier for Wikidata -- 1 Introduction -- 2 Related Work -- 3 The NECKAr Tool -- 3.1 Wikidata Data Model -- 3.2 Location Extraction -- 3.3 Organization Extraction -- 3.4 Person Extraction -- 3.5 Extracting Links to Other Knowledge Bases -- 3.6 Extraction Algorithm -- 4 Wikidata NE Dataset -- 4.1 Location Entities -- 4.2 Person Entities -- 4.3 Organization Entities -- 4.4 Assignment to More Than One Class -- 5 Comparison to YAGO3 -- 5.1 Location Comparison -- 5.2 Person Comparison -- 5.3 Organization Comparison -- 6 Conclusion and Future Work -- References -- Investigating the Morphological Complexity of German Named Entities: The Case of the GermEval NER Challenge -- 1 Introduction -- 2 Related Work -- 3 Data Basis and Approach -- 3.1 GermEval 2014 NER Challenge Corpus -- 3.2 GermEval 2014 System Predictions -- 3.3 Scope of the Analyses -- 4 Morphological Complexity of German NE Tokens -- 4.1 Measuring Morphological Complexity -- 4.2 Distribution of Morphologically Complex NE Tokens -- 4.3 Morphological Complexity in Context of NER System Errors -- 5 Reference Annotation Related Issues -- 5.1 Reference Annotation Issue Types -- 5.2 Distribution and Effects of Annotation Issues -- 6 Conclusion -- References -- Detecting Named Entities and Relations in German Clinical Reports -- 1 Introduction -- 2 Data and Methods -- 2.1 Annotated Data -- 2.2 Machine Learning Methods -- 3 Experiment -- 3.1 Preprocessing -- 3.2 Named Entity Recognition -- 3.3 Relation Extraction -- 4 Conclusion and Future Work -- References -- In-Memory Distributed Training of Linear-Chain Conditional Random Fields with an Application to Fine-Grained Named Entity Recognition -- 1 Introduction -- 2 Parallelization of Conditional Random Fields -- 3 Implementation -- 4 Scalability Experiments -- 5 Scalability Evaluation -- 6 Accuracy Experiments --
7 Accuracy Evaluation -- 8 Discussion and Conclusion -- References -- Online-Media and Online-Content -- What Does This Imply? Examining the Impact of Implicitness on the Perception of Hate Speech -- 1 Introduction -- 2 Theoretical Grounding -- 3 Manufacturing Controllable Explicitness -- 3.1 Indicators for Explicit Hate Speech -- 3.2 Paraphrasing -- 3.3 Supervised Machine Learning -- 4 User Study -- 5 Results -- 6 Conclusion and Future Work -- References -- Automatic Classification of Abusive Language and Personal Attacks in Various Forms of Online Communication -- 1 Introduction -- 2 Related Work -- 3 Data Sets -- 4 Evaluation -- 5 Conclusion -- References -- Token Level Code-Switching Detection Using Wikipedia as a Lexical Resource -- 1 Introduction -- 2 Related Work -- 3 Datasets -- 4 Classification -- 5 Results -- 6 Conclusion and Future Work -- References -- How Social Media Text Analysis Can Inform Disaster Management -- 1 Introduction -- 2 Approach -- 2.1 End-User Requirements -- 2.2 Implementation of Requirements -- 3 Examples -- 4 Conclusion -- References -- A Comparative Study of Uncertainty Based Active Learning Strategies for General Purpose Twitter Sentiment Analysis with Deep Neural Networks -- 1 Introduction -- 2 Experimental Setup -- 2.1 Initial Deep Neural Network -- 2.2 Investigated Active Learning Strategies -- 2.3 Experimental Procedure and Data Usage -- 3 Results -- 4 Conclusions and Outlook -- References -- An Infrastructure for Empowering Internet Users to Handle Fake News and Other Online Media Phenomena -- 1 Introduction -- 2 Modern Online Media Phenomena -- 3 Technology Framework: Approach -- 3.1 Services of the Infrastructure -- 3.2 Characteristics of the Infrastructure -- 3.3 Building Blocks of the Proposed Infrastructure -- 4 Related Work -- 5 Summary and Conclusions -- References --
Different Types of Automated and Semi-automated Semantic Storytelling: Curation Technologies for Different Sectors -- 1 Introduction -- 2 Curation Technologies -- 2.1 Named Entity Recognition and Named Entity Linking -- 2.2 Geographical Localisation Module and Map Visualisations -- 2.3 Temporal Expression Analysis and Timelining -- 2.4 Text Classification and Document Clustering -- 2.5 Coreference Resolution -- 2.6 Monolingual and Cross-Lingual Event Detection -- 2.7 Single and Multi-document Summarisation -- 2.8 User Interaction in the Curation Technologies Prototypes -- 3 Semantic Storytelling: Four Sector-Specific Use Cases -- 3.1 Sector: Museums and Exhibitions -- 3.2 Sector: Public Archives, Libraries, Digital Humanities -- 3.3 Sector: Journalism -- 4 Related Work -- 5 Conclusions -- References -- Twitter Geolocation Prediction Using Neural Networks -- 1 Introduction -- 2 Related Work -- 3 Methods -- 3.1 Model Combination -- 4 Results -- 5 Conclusion -- References -- Miscellaneous -- Diachronic Variation of Temporal Expressions in Scientific Writing Through the Lens of Relative Entropy -- 1 Introduction -- 2 Related Work -- 3 Data -- 4 Processing Temporal Information -- 4.1 Temporal Expressions -- 4.2 Temporal Tagging -- 4.3 Extraction Quality -- 5 Typicality of Temporal Expressions -- 6 Analysis -- 6.1 Frequency-Based Diachronic Tendencies -- 6.2 Diachronic Tendencies of 'Typical' Temporal Expressions -- 7 Discussion and Conclusion -- References -- A Case Study on the Relevance of the Competence Assumption for Implicature Calculation in Dialogue Systems -- 1 Introduction -- 2 Related Work -- 3 Testing the Competence Assumption Locally and Globally -- 3.1 Participants -- 3.2 Materials -- 3.3 Discussion and Results -- 4 Implications for Future Work -- References -- Supporting Sustainable Process Documentation -- 1 Introduction -- 2 Related Work --
3 Process Metadata