Malaysian Journal of Computer Science (ISSN 0127-9084)
Indexing Page

Article Information
Title: Linguistic Feature Classifying and Tracing
Author(s): Mohammadreza Moohebat, Ram Gopal Raj, Dirk Thorleuchter, S. Abdul-Kareem
Journal: Malaysian Journal of Computer Science (ISSN 0127-9084)
Volume: 30, No. 2
Keywords: Scientific articles, Linguistic features, Latent semantic indexing, Text mining
Abstract: We investigate the identification and analysis of linguistic (lexico-grammatical) features that are characteristic of articles published in a specific year. Linguistic features differ from shallow features because they represent authors’ lexico-grammatical writing styles and do not rely on the well-known bag-of-words model. Current literature focusses on shallow features rather than on linguistic features, and existing methods for identifying linguistic features use well-known knowledge-structure-based approaches. In contrast, we advance these existing methods by applying semantic clustering instead of knowledge-structure-based approaches. For evaluation purposes, a linguistic feature-based prediction model is built to enable the automated assignment of articles to their years of publication. In a case study, the proposed methodology is applied to articles of the Springer book series 'Communications in Computer and Information Science' published from 2009 to 2013. The case study results show the feasibility of the proposed approach compared to a frequently used baseline.
