A Comparative Analysis of the Entropy and Transition Point Approach in Representing Index Terms of Literary Text
Abstract
Problem statement: A concept hierarchy is a hierarchically organized collection of domain concepts. It is particularly useful in applications such as information retrieval, document browsing and document classification. Approach: One of the important tasks in constructing a concept hierarchy is identifying suitable terms while keeping the domain vocabulary at an appropriate size. Results: One way of achieving such a size is term reduction. The aim of this study is to examine how effectively term selection methods reduce the size of the vocabulary for literary text. The experiment compares the entropy method, the transition point method and a hybrid of the transition point and entropy methods with the Vector Space Model (VSM). Conclusion/Recommendations: The results indicate that the Transition Point method is more effective than the other methods at reducing the size of the vocabulary, while at the same time preserving the important terms that occur in the literary documents.
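The abstract names the term selection methods without defining them. The snippet below is a minimal sketch built on formulations that are common in the term-selection literature. These formulations may differ from the authors' exact setup:

- The transition point is computed from I1, the number of terms that occur exactly once, as TP = (sqrt(8 * I1 + 1) - 1) / 2.
- Terms are kept when their frequency falls within a neighbourhood of width u around TP. The default u = 0.4 is an assumption.
- A normalised per-term entropy is computed over documents.

All function names are hypothetical. The sketch does not attempt the hybrid method or the VSM comparison.

```python
import math
from collections import Counter

def term_frequencies(docs):
    """Count each term's total frequency across tokenised documents (lists of tokens)."""
    return Counter(t for doc in docs for t in doc)

def transition_point(freqs):
    """Transition point from I1, the number of terms with frequency 1 (hapax legomena)."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2

def tp_select(freqs, u=0.4):
    """Keep terms whose frequency lies in [TP*(1-u), TP*(1+u)].

    The neighbourhood width u is an assumed parameter, not taken from the study.
    """
    tp = transition_point(freqs)
    lo, hi = tp * (1 - u), tp * (1 + u)
    return {t for t, f in freqs.items() if lo <= f <= hi}

def term_entropy(term, docs):
    """Normalised entropy of a term's distribution over documents.

    0 means the term is concentrated in one document; 1 means it is spread evenly.
    This is one common definition, assumed here for illustration.
    """
    counts = [doc.count(term) for doc in docs]
    total = sum(counts)
    if total == 0 or len(docs) < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(len(docs))
```

For example, a vocabulary with six terms of frequency 1 gives TP = (sqrt(49) - 1) / 2 = 3. With u = 0.4, only terms whose frequency lies between 1.8 and 4.2 are retained. This mid-frequency band is the kind of reduced vocabulary the transition point method aims at: very rare and very frequent terms are both discarded.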
DOI: https://doi.org/10.3844/jcssp.2011.1088.1093
Copyright: © 2011 Hayati Abd Rahman and Shahrul Azman Noah. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Information retrieval
- term reduction
- concept hierarchy
- Dominating Set Problem (DSP)
- Vector Space Model (VSM)
- Transition Point (TP)