Exploiting domain information for Word Sense Disambiguation of medical documents
- ¹Department of Computer Science, Sheffield University, Sheffield, UK
- ²IXA NLP Group, University of the Basque Country, Donostia, Basque Country, Spain
- Correspondence to Dr Mark Stevenson, Department of Computer Science, Sheffield University, Regent Court, 211 Portobello, Sheffield S1 4DP, UK;
- Received 1 June 2011
- Accepted 11 August 2011
- Published Online First 7 September 2011
Objective Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears.
Design The authors proposed and implemented several methods to extract lists of key terms associated with Medical Subject Headings (MeSH). These key terms represent the document topic in a knowledge-based WSD system and are used both alone and in combination with the local context of the ambiguous term.
Measurements Accuracy, a standard evaluation measure, was calculated over the set of target words in the widely used National Library of Medicine WSD dataset.
Results and discussion The authors report a significant improvement when the key terms are combined with local context, showing that domain information improves on a WSD system that relies on the Unified Medical Language System Metathesaurus alone. The best results were achieved with key terms extracted by relevance feedback and weighted by inverse document frequency.
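The abstract summarises the pipeline without specifying it, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes documents are already tokenised, treats the documents indexed with one MeSH heading as the source of candidate key terms, and scores each candidate as its frequency in those documents multiplied by its inverse document frequency over the whole collection (a tf.idf-style reading of "weighted by inverse document frequency"). The relevance feedback step and the knowledge-based WSD algorithm itself are not modelled. All function names (`idf_table`, `key_terms_for_heading`, `build_context`), the window size, and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple


def idf_table(collection: List[List[str]]) -> Dict[str, float]:
    """idf(t) = log(N / df(t)) over a tokenised document collection."""
    n_docs = len(collection)
    df: Counter = Counter()
    for doc in collection:
        df.update(set(doc))  # each term counted at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def key_terms_for_heading(
    heading_docs: List[List[str]],
    idf: Dict[str, float],
    top_k: int = 10,
    stopwords: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, float]]:
    """Rank terms from documents indexed with one MeSH heading by frequency x IDF.

    Hypothetical tf.idf-style scoring; relevance feedback is not modelled here.
    """
    tf: Counter = Counter()
    for doc in heading_docs:
        tf.update(t for t in doc if t not in stopwords)
    scored = [(term, count * idf.get(term, 0.0)) for term, count in tf.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # highest weight first, ties by term
    return scored[:top_k]


def build_context(
    tokens: List[str],
    target_index: int,
    topic_terms: List[str],
    window: int = 5,
    use_local: bool = True,
    use_topic: bool = True,
) -> List[str]:
    """Context words handed to a knowledge-based WSD system (hypothetical helper)."""
    context: List[str] = []
    if use_local:
        # Local context: words within `window` positions of the target, excluding the target.
        start = max(0, target_index - window)
        end = min(len(tokens), target_index + window + 1)
        context.extend(t for i, t in enumerate(tokens[start:end], start) if i != target_index)
    if use_topic:
        # Topic context: key terms for the MeSH headings assigned to the document.
        context.extend(topic_terms)
    return list(dict.fromkeys(context))  # drop duplicates, keep first-seen order


if __name__ == "__main__":
    # Toy collection; the first two documents stand in for those indexed with one heading.
    collection = [["virus", "virus", "cold"], ["virus", "fever"], ["cold", "weather"], ["cold"]]
    idf = idf_table(collection)
    terms = [t for t, _ in key_terms_for_heading(collection[:2], idf, top_k=2)]
    print(terms)  # ['virus', 'fever']
    sentence = ["the", "patient", "had", "a", "cold", "last", "week"]
    print(build_context(sentence, 4, terms, window=2))
    # ['had', 'a', 'last', 'week', 'virus', 'fever']
```

Setting `use_local=False` gives the "key terms alone" condition, and setting `use_topic=False` gives a local-context-only baseline, mirroring the two ways the abstract says the key terms are applied.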
Funding MS is grateful for support from the Engineering and Physical Sciences Research Council (EP/D069548/1). EA and AS are grateful for support from the Ministry of Science (KNOW2—TIN2009-14715-C04-01): (1) Ministerio de Educacion y Ciencia; (2) Engineering and Physical Sciences Research Council.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non-commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.