The Yale cTAKES extensions for document classification: architecture and application
- Vijay Garla1,
- Vincent Lo Re III2,
- Zachariah Dorey-Stein2,
- Farah Kidwai3,
- Matthew Scotch3,4,
- Julie Womack3,5,
- Amy Justice3,6,
- Cynthia Brandt3,7
- 1Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, Connecticut, USA
- 2Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA
- 3Connecticut VA Healthcare System, West Haven, Connecticut, USA
- 4Department of Biomedical Informatics, Arizona State University, Tempe, Arizona, USA
- 5Yale University School of Nursing, New Haven, Connecticut, USA
- 6General Internal Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- 7Yale Center of Medical Informatics, Yale University School of Medicine, New Haven, Connecticut, USA
- Correspondence to Vijay Garla, Yale Center for Medical Informatics, PO Box 208009, New Haven, CT 06520-8009, USA
- Received 9 December 2010
- Accepted 22 April 2011
- Published Online First 27 May 2011
Background Open-source clinical natural-language-processing (NLP) systems have lowered the barrier to developing effective clinical document classification systems. Clinical NLP systems annotate the syntax and semantics of clinical text; however, feature extraction and representation for document classification pose technical challenges.
Methods The authors developed extensions to the clinical Text Analysis and Knowledge Extraction System (cTAKES) that simplify feature extraction, experimentation with various feature representations, and the development of both rule-based and machine-learning-based document classifiers. The authors describe and evaluate their system, the Yale cTAKES Extensions (YTEX), on the classification of radiology reports that contain findings suggestive of hepatic decompensation.
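To make the idea of a rule-based document classifier concrete, the following is a minimal sketch in Python. It is not the YTEX implementation and does not use cTAKES: the term list, the negation cues, the sentence splitter, and the function names are all hypothetical and were chosen only for illustration. It flags a report as positive for a finding when a sentence names the finding without a preceding negation cue.

```python
import re

# Hypothetical term list and negation cues; NOT taken from YTEX or the paper.
ASCITES_TERMS = ("ascites",)
NEGATION_CUES = ("no", "without", "negative for")


def sentences(text):
    """Split text into rough sentences on periods, semicolons, or newlines."""
    return [s.strip() for s in re.split(r"[.;\n]+", text) if s.strip()]


def mentions_finding(report, terms=ASCITES_TERMS, cues=NEGATION_CUES):
    """Return True if any sentence names a term with no negation cue before it."""
    for sentence in sentences(report.lower()):
        for term in terms:
            idx = sentence.find(term)
            if idx == -1:
                continue
            preceding = sentence[:idx]
            # Whole-word match so that "no" does not fire inside "nodular".
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", preceding)
                for cue in cues
            )
            if not negated:
                return True
    return False
```

A rule this simple ignores the syntactic and semantic annotations that a clinical NLP system provides; it is meant only to show the general shape of a document-level rule.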
Results and discussion The system achieved an F1-score of 96% for the retrieval of abdominal radiology reports, and F1-scores of 79%, 91%, and 95% for the presence of liver masses, ascites, and varices, respectively. The authors released YTEX as open source, available at http://code.google.com/p/ytex.
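For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below shows the standard calculation from classification counts; the function name is an assumption made for illustration, and any counts passed to it would be hypothetical rather than figures from this study.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Compute F1 as the harmonic mean of precision and recall.

    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier cannot reach a high F1-score by maximizing precision or recall alone.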
- visualization of data and knowledge
- computational methods
- advanced algorithms
- natural-language processing
- distributed systems
- software engineering: architecture
- developing and refining EHR data standards (including image standards)
- data models, data exchange
- integration across care settings (inter- and intra-enterprise)
- detecting disease outbreaks and biological threats
- translational research—application of biological knowledge to clinical care
- linking the genotype and phenotype
- delivering health information and knowledge to the public
- monitoring the health of populations
The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs.
Funding Yale School of Medicine (VG). VA grant HIR 08-374 HSR&D: Consortium for Health Informatics (CB, MS). VA Office, Academic Affiliations, Information Research & Development (Medical Informatics Fellowship Program) (JW, CB, AJ). National Institute on Alcohol Abuse and Alcoholism (U10 AA 13566) (AJ, PI; CB, FK). National Institute of Allergy and Infectious Diseases (K01 AI 070001; VLR).
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.