J Am Med Inform Assoc 18:614-620 doi:10.1136/amiajnl-2011-000093
  • Research and applications

The Yale cTAKES extensions for document classification: architecture and application

  1. Cynthia Brandt (affiliations 3, 7)
  1. Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, Connecticut, USA
  2. Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA
  3. Connecticut VA Healthcare System, West Haven, Connecticut, USA
  4. Department of Biomedical Informatics, Arizona State University, Tempe, Arizona, USA
  5. Yale University School of Nursing, New Haven, Connecticut, USA
  6. General Internal Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
  7. Yale Center of Medical Informatics, Yale University School of Medicine, New Haven, Connecticut, USA
  • Correspondence to Vijay Garla, Yale Center for Medical Informatics, PO Box 208009, New Haven, CT 06520-8009, USA; vijay.garla{at}
  • Received 9 December 2010
  • Accepted 22 April 2011
  • Published Online First 27 May 2011


Background: Open-source clinical natural-language-processing (NLP) systems have lowered the barrier to developing effective clinical document classification systems. These systems annotate the syntax and semantics of clinical text; however, extracting features from those annotations and representing them for document classification remain technically challenging.

Methods: The authors developed extensions to the clinical Text Analysis and Knowledge Extraction System (cTAKES) that simplify feature extraction, experimentation with different feature representations, and the development of both rule-based and machine-learning-based document classifiers. The authors describe their system, the Yale cTAKES Extensions (YTEX), and evaluate it on the classification of radiology reports that contain findings suggestive of hepatic decompensation.

Results and discussion: The system achieved an F1-score of 96% for the retrieval of abdominal radiology reports, and F1-scores of 79%, 91%, and 95% for identifying the presence of liver masses, ascites, and varices, respectively. The authors released YTEX as open source, available at
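For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The following is a minimal Python sketch of that standard definition; the function name `f1_score` and the counts in the example are hypothetical illustrations and are not taken from the article or from YTEX.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    tp, fp, fn are counts of true positives, false positives, and
    false negatives for one class (e.g. "report mentions ascites").
    """
    if tp == 0:
        # With no true positives, precision and recall are both 0
        # (or undefined); by convention the F1-score is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only, not data from the study.
example = f1_score(tp=90, fp=10, fn=10)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier cannot reach a high F1-score by trading away recall for precision or the reverse.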


  • The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs.

  • Funding: Yale School of Medicine (VG). VA grant HIR 08-374 HSR&D: Consortium for Health Informatics (CB, MS). VA Office, Academic Affiliations, Information Research & Development (Medical Informatics Fellowship Program) (JW, CB, AJ). National Institute on Alcohol Abuse and Alcoholism (U10 AA 13566) (AJ, PI; CB, FK). National Institute of Allergy and Infectious Diseases (K01 AI 070001; VLR).

  • Competing interests: None.

  • Provenance and peer review: Not commissioned; externally peer reviewed.
