Arabic Text Classification Using Linear Discriminant Analysis

EasyChair Preprint 75

6 pages • Date: April 18, 2018

Abstract

Linear Discriminant Analysis (LDA) is a dimensionality reduction technique that is widely used in pattern recognition applications. LDA aims to generate effective feature vectors by projecting the original data (e.g. a bag-of-words representation) into a low-dimensional space. Hence, LDA is a convenient method for text classification, which is generally characterized by high-dimensional feature vectors. In this paper, we empirically investigate two LDA-based methods for Arabic text classification. The first method is based on computing the generalized eigenvectors of the scatter ratio (the inverse within-class scatter combined with the between-class scatter). The second method uses linear classification functions that assume equal population covariance matrices (i.e. a pooled sample covariance matrix). We used a text collection that contains 1,750 documents belonging to five categories. The testing set contains 250 documents belonging to the same five categories (50 documents per category). The experimental results show that the linear classification functions method outperforms the eigenvalue decomposition method.
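
For readers unfamiliar with the two methods, they can be sketched in standard textbook notation. The symbols below are assumptions introduced here for illustration and are not taken from the paper: K classes C_k with n_k training documents each, N documents in total, class mean vectors \bar{x}_k, and overall mean \bar{x}. The classification function is written for equal prior probabilities, which may differ from the paper's exact formulation.

```latex
% Within-class and between-class scatter matrices (assumed notation)
S_W = \sum_{k=1}^{K} \sum_{x_i \in C_k} (x_i - \bar{x}_k)(x_i - \bar{x}_k)^{\top},
\qquad
S_B = \sum_{k=1}^{K} n_k \, (\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})^{\top}

% Method 1: projection directions w are eigenvectors of the scatter ratio
S_W^{-1} S_B \, w = \lambda \, w

% Method 2: linear classification functions with a pooled covariance matrix
S_{pl} = \frac{1}{N - K} \, S_W,
\qquad
L_k(x) = \bar{x}_k^{\top} S_{pl}^{-1} x \;-\; \tfrac{1}{2} \, \bar{x}_k^{\top} S_{pl}^{-1} \bar{x}_k

% Decision rule: assign x to the class with the largest L_k(x)
\hat{k}(x) = \arg\max_{k \in \{1, \dots, K\}} L_k(x)
```

Under these definitions S_B has rank at most K - 1, so with five categories the first method yields at most four discriminant directions with nonzero eigenvalues.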

Keyphrases: Arabic, Classification, Fisher, Linear Discriminant Analysis, text

Links:

https://easychair.org/publications/preprint/7Krk

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:75,
  author    = {Fawaz Al-Anzi and Dia Abuzeina},
  title     = {Arabic Text Classification Using Linear Discriminant Analysis},
  howpublished = {EasyChair Preprint 75},
  year      = {EasyChair, 2018}}
