Turku BioNLP Group
The Turku BioNLP Group is a group of researchers at the Department of Information Technology at the University of Turku and at the Turku Centre for Computer Science (TUCS) graduate school. Our research focuses on various aspects of Natural Language Processing, ranging from corpus annotation to machine learning theory and applications. Our main application area is biological, biomedical, and clinical text.
Research Unit Web Page: http://bionlp.utu.fi/
Leader of the unit
Tapio Salakoski
Researchers
Jorma Boberg, Filip Ginter, Tapio Pahikkala, Antti Airola, Veronika Laippala
Doctoral Students
Jari Björne, Katri Haverinen, Juho Heimonen, Timo Viljanen
Projects
BioInfer
We have created the BioInfer corpus to support the development of IE systems in the biomedical domain. The project has its own webpage where you can find the corpus as well as the software relevant to it.
PPI Corpora
We have created and released software that converts five well-known protein-protein interaction corpora (AIMed, BioInfer, LLL, IEPA, and HPRD50) into a shared XML-based format. This project has its own webpage where you can find the software as well as a pre-processed release of BioInfer.
IKITIK
The aim of IKITIK is to support producing and using health information and communication by developing innovative, intelligent, state-of-the-art clinical information and language technology solutions. They are based on end-user needs and will be carefully tested using both statistical techniques and genuine end-user feedback. To assure their quality, international applicability, practical relevance and interoperability with existing electronic patient information systems, solutions are developed in interdisciplinary and international collaboration of care providers, clinical documentation and decision-making experts, as well as information and communication technology developers and providers. Outcomes contribute to clarity, understandability and accessibility of patient narratives. This has positive impacts on patient safety, care quality, and efficiency and profitability of health care services. Further, improved patient narratives emphasize customer orientation and individualized care. (Webpage)
RLScore
RLScore is a machine learning package based on Regularized Least-Squares (RLS). It contains implementations of the RLS and RankRLS learners, which allow the optimization of performance measures for regression, ranking, and classification tasks. Efficient cross-validation algorithms are integrated into the package, together with functionality for fast parallel learning of multiple outputs. (Webpage)
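To illustrate why cross-validation can be made efficient for regularized least-squares, the following minimal sketch compares naive leave-one-out retraining with the well-known closed-form shortcut. It is not RLScore code and does not use the RLScore API: it assumes a single feature and no bias term, and the function names and toy data are hypothetical.

```python
def fit_rls(xs, ys, lam):
    """Closed-form RLS weight for one feature: w = sum(x*y) / (sum(x*x) + lam)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


def loo_residuals_naive(xs, ys, lam):
    """Leave-one-out residuals obtained by retraining once per held-out example."""
    residuals = []
    for i in range(len(xs)):
        w = fit_rls(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        residuals.append(ys[i] - w * xs[i])
    return residuals


def loo_residuals_fast(xs, ys, lam):
    """Leave-one-out residuals from a single fit.

    Uses the identity r_loo_i = r_i / (1 - h_ii), where r_i is the training
    residual and h_ii = x_i^2 / (sum(x*x) + lam) is the leverage of example i.
    Requires lam > 0 so that the denominator is never zero.
    """
    sxx = sum(x * x for x in xs)
    w = fit_rls(xs, ys, lam)
    return [(y - w * x) / (1.0 - x * x / (sxx + lam)) for x, y in zip(xs, ys)]


# Hypothetical toy data, for illustration only.
xs = [0.5, 1.0, 1.5, 2.0, 3.0]
ys = [1.1, 1.9, 3.2, 3.9, 6.1]
naive = loo_residuals_naive(xs, ys, lam=1.0)
fast = loo_residuals_fast(xs, ys, lam=1.0)
```

Both functions return the same residuals, but the shortcut needs one fit instead of one fit per example.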
Turku Dependency Treebank
We are building a broad-coverage dependency-annotated treebank of general Finnish. The treebank is annotated in a minor revision of the Stanford dependency scheme (de Marneffe et al. [1,2]). The primary purpose of the treebank is to support Finnish NLP.
Turku Clinical Corpus
We have developed a dependency-annotated treebank of Finnish Intensive Care Nursing Narratives. The treebank is annotated in a minor revision of the Stanford dependency scheme (de Marneffe et al. [1,2]). A PropBank-style predicate argument annotation is built on top of the syntactic annotation, covering 90% of all verb occurrences in the corpus. The argument annotation is tightly bound to the syntax, requiring arguments to be governed by the verb.
Biological Event Extraction
This project concerns the extraction from text of biomolecular events, which are recursively nested, typed associations of arbitrarily many participants (genes / gene products) in specific roles.
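One way to picture such an event is as a small recursive data structure. The sketch below is an illustrative assumption, not the group's actual representation: the class names, the role labels (Theme, Cause), the event types, and the example entities are chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A gene or gene product mentioned in the text."""
    name: str


@dataclass(frozen=True)
class Argument:
    """A participant filling a specific role; the filler may itself be an event."""
    role: str
    filler: Union["Entity", "Event"]


@dataclass(frozen=True)
class Event:
    """A typed association of arbitrarily many role-labelled participants."""
    event_type: str
    trigger: str
    arguments: Tuple[Argument, ...] = ()


def nesting_depth(event):
    """Number of event levels, counting the event itself as level 1."""
    inner = [nesting_depth(a.filler) for a in event.arguments
             if isinstance(a.filler, Event)]
    return 1 + max(inner, default=0)


def participants(event):
    """Names of all entities reachable from the event, in argument order."""
    names = []
    for arg in event.arguments:
        if isinstance(arg.filler, Event):
            names.extend(participants(arg.filler))
        else:
            names.append(arg.filler.name)
    return names


# Hypothetical example: "IL-6 induces phosphorylation of STAT3".
phosphorylation = Event("Phosphorylation", "phosphorylation",
                        (Argument("Theme", Entity("STAT3")),))
regulation = Event("Positive_regulation", "induces",
                   (Argument("Cause", Entity("IL-6")),
                    Argument("Theme", phosphorylation)))
```

Here the regulation event takes another event as its Theme, which is what makes the structure recursive.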
Publications
The full list of publications is available in the TUCS Publication Database.
Most recently updated publications:
Predrag Radivojac, Wyatt T. Clark, Tal Ronnen Oron, Alexandra M. Schnoes, Tobias Wittkop, Artem Sokolov, Kiley Graim, Christopher Funk, Karin Verspoor, Asa Ben-Hur, Gaurav Pandey, Jeffrey M. Yunes, Ameet S. Talwalkar, Susanna Repo, Michael L. Souza, Damiano Piovesan, Rita Casadio, Zheng Wang, Jianlin Cheng, Hai Fang, Julian Gough, Patrik Koskinen, Petri Törönen, Jussi Nokso-Koivisto, Liisa Holm, Domenico Cozzetto, Daniel W. A. Buchan, Kevin Bryson, David T. Jones, Bhakt Limave, Harshal Inamdar, Avik Datta, Sunitha K. Manjari, Rajendra Joshi, Meghana Chitale, Daisuke Kihara, Andreas M. Lisewski, Serkan Erdin, Eric Venner, Olivier Lichtarge, Robert Rentzsch, Haixuan Yang, Alfonso E. Romero, Prajwal Bhat, Alberto Paccanaro, Tobias Hamp, Rebecca Kaßner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A. Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos, Jari Björne, Tapio Salakoski, Andrew Wong, Hagit Shatkay, Fanny Gatzmann, Ingolf Sommer, Mark N. Wass, Michael J. E. Sternberg, Nives Škunca, Fran Supek, Matko Bošnjak, Panče Panov, Sašo Džeroski, Tomislav Šmuc, Yiannis A. I. Kourmpetis, Aalt D. J. van Dijk, Cajo J. F. ter Braak, Yuanpeng Zhou, Qingtian Gong, Xinran Dong, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Barbara Di Camillo, Stefano Toppo, Liang Lan, Nemanja Djuric, Yuhong Guo, Slobodan Vucetic, Amos Bairoch, Michal Linial, Patricia C. Babbitt, Steven E. Brenner, Christine Orengo, Burkhard Rost, Sean D. Mooney, Iddo Friedberg, A Large-Scale Evaluation of Computational Protein Function Prediction. Nature Methods 10, 221–227, 2013.
Antti Airola, Tapio Pahikkala, Heljä Lundgrén-Laine, Anne Santalahti, Päivi Rautava, Sanna Salanterä, Tapio Salakoski, A Machine Learning Approach Towards Early Detection of Frequent Health Care Users. In: Hanna Suominen (Ed.), Proceedings of the 4th International Louhi Workshop on Health Document Text Mining and Information Analysis, –, National ICT Australia, 2013.
Veronika Laippala, Timo Viljanen, Antti Airola, Jenna Nyblom, Sanna Salanterä, Tapio Salakoski, Filip Ginter, Statistical Parsing of Varieties of Clinical Finnish. In: Hanna Suominen (Ed.), Proceedings of the 4th International Louhi Workshop on Health Document Text Mining and Information Analysis, 1–6, National ICT Australia, 2013.
Sasu Tarkoma, Joni-Kristian Kämäräinen, Tapio Pahikkala (Eds.), The Federated Computer Science Event, Unigrafia Oy, 2012.
Jari Björne, Filip Ginter, Tapio Salakoski, University of Turku in the BioNLP'11 Shared Task. BMC Bioinformatics 13, S4, 2012.
Fabian Gieseke, Oliver Kramer, Antti Airola, Tapio Pahikkala, Efficient Recurrent Local Search Strategies for Semi- and Unsupervised Regularized Least-Squares Classification. Evolutionary Intelligence 5(3), 189–205, 2012.
Tapio Pahikkala, Antti Airola, Thomas Canhao Xu, Pasi Liljeberg, Hannu Tenhunen, Tapio Salakoski, Parallelized Online Regularized Least-Squares for Adaptive Embedded Systems. International Journal of Embedded and Real-Time Communication Systems 3(2), 73–91, 2012.
Tapio Pahikkala, Sebastian Okser, Antti Airola, Tapio Salakoski, Tero Aittokallio, Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms for Molecular Biology 7, 1–11, 2012.
Tapio Pahikkala, Hanna Suominen, Jorma Boberg, Efficient Cross-Validation for Kernelized Least-Squares Regression with Sparse Basis Expansions. Machine Learning 87(3), 381–407, 2012.
Sofie Van Landeghem, Jari Björne, Thomas Abeel, Bernard De Baets, Tapio Salakoski, Yves Van de Peer, Semantically linking molecular entities in literature through entity relationships. BMC Bioinformatics 13, S6, 2012.
Willem Waegeman, Tapio Pahikkala, Antti Airola, Tapio Salakoski, Michiel Stock, Bernard De Baets, A Kernel-based Framework for Learning Graded Relations from Data. IEEE Transactions on Fuzzy Systems 20(6), 1090–1101, 2012.
Antti Airola, Machine Learning and Performance Estimation Methods for Ranking Problems. In: Sasu Tarkoma, Joni-Kristian Kämäräinen, Tapio Pahikkala (Eds.), Proceedings of the Federated Computer Science Event 2012, 8–14, University of Helsinki, 2012.
Jari Björne, Sofie van Landeghem, Sampo Pyysalo, Tomoko Ohta, Filip Ginter, Yves van de Peer, Sophia Ananiadou, Tapio Salakoski, PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations. In: Kevin Bretonnel Cohen, Dina Demner-Fushman, Sophia Ananiadou, John Pestian (Eds.), Proceedings of the 2012 Workshop on Biomedical Natural Language Processing (BioNLP 2012), 82–90, Association for Computational Linguistics, 2012.
Fabian Gieseke, Antti Airola, Tapio Pahikkala, Oliver Kramer, Sparse Quasi-Newton Optimization for Semi-Supervised Support Vector Machines. In: Pedro Latorre Carmona, J. Salvador Sánchez, Ana L. N. Fred (Eds.), Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods (ICPRAM), 45–54, SciTePress, 2012.
Kai Hakala, Sofie van Landeghem, Suwisa Kaewphan, Tapio Salakoski, Yves van de Peer, Filip Ginter, CyEVEX: Literature-Scale Network Integration and Visualization Through Cytoscape. In: Sophia Ananiadou, Sampo Pyysalo, Dietrich Rebholz-Schuhmann, Fabio Rinaldi, Tapio Salakoski (Eds.), Proceedings of the 5th International Symposium on Semantic Mining in Biomedicine, 91–96, ACM Press, 2012.
Juho Heimonen, Tapio Salakoski, Sanna Salanterä, An Ontology to Improve Accessibility and Quality of Patient Instructions. In: Pamela Forner, Jussi Karlgren, Christa Womser-Hacker (Eds.), Proceedings of CLEF 2012 Evaluation Labs and Workshop, 137, Fondazione Bruno Kessler Press, 2012.
Suwisa Kaewphan, Sanna Kreula, Sofie Van Landeghem, Yves Van de Peer, Patrik R. Jones, Filip Ginter, Integrating Large-Scale Text Mining and Co-Expression Networks: Targeting NADP(H) Metabolism in E. coli with Event Extraction. In: Sophia Ananiadou, Kevin Cohen, Dina Demner-Fushman, Paul Thompson (Eds.), Third Workshop on Building and Evaluating Resources for Biomedical Text Mining, 8–15, European Language Resources Association (ELRA), 2012.
Olli Sjöblom, Juho Heimonen, Lotta Kauhanen, Veronika Laippala, Heljä Lundgrén-Laine, Laura-Maria Murtola, Tapio Salakoski, Sanna Salanterä, Avoiding Hazards - What Can Health Care Learn from Aviation? In: Kristina Eriksson-Backa, Annika Luoma, Erica Krook (Eds.), Exploring the Abyss of Inequalities - Proceedings of the 4th International Conference on Well-Being in the Information Society, Communications in Computer and Information Science 313, 119–127, Springer, 2012.
Michiel Stock, Tapio Pahikkala, Antti Airola, Tapio Salakoski, Bernard De Baets, Willem Waegeman, Learning Monadic and Dyadic Relations: Three Case Studies in Systems Biology. In: Oliver Ray, Katsumi Inoue (Eds.), Proceedings of the ECML/PKDD 2012 Workshop on Learning and Discovery in Symbolic Systems Biology, 74–84, ECML/PKDD 2012 Workshop on Learning and Discovery in Symbolic Systems Biology, 2012.
Willem Waegeman, Tapio Pahikkala, Antti Airola, Tapio Salakoski, Bernard De Baets, Learning Valued Relations from Data. In: Pedro Melo-Pinto, Pedro Couto, Carlos Serôdio, János Fodor, Bernard De Baets (Eds.), Eurofuse 2011, Advances in Soft Computing 107, 257–268, Springer, 2012.
Willem Waegeman, Michiel Stock, Bernard De Baets, Tapio Pahikkala, Antti Airola, Tapio Salakoski, Conditional Ranking Algorithms for Efficient Object Retrieval and Object Querying on Relational Data. In: Thomas Demeester, Johannes Deleu, Laurent Mertens, Dieter Plaetinck, An De Moor, Thong Hoang, Tim Wauters, Chris Develder, Brecht Vermeulen, Piet Demeester (Eds.), Proceedings of the 12th Dutch-Belgian Information Retrieval Workshop (DIR 2012), 59–60, Ghent University, 2012.
Katri Haverinen, Syntax Annotation Guidelines for the Turku Dependency Treebank. TUCS Technical Reports 1034, Turku Centre for Computer Science, 2012.
