
Classification based on Associations (CBA) - a performance analysis

EasyChair Preprint 501

9 pages. Date: September 12, 2018

Abstract

Classification Based on Associations (CBA) has for two decades been the algorithm of choice for researchers and practitioners alike, owing to the simplicity of the rules it produces, the accuracy of its models, and its fast model building. Two versions of CBA differing in speed, M1 and M2, were originally proposed by Liu et al. in 1998. Although the more complex M2 version was originally described as on average 50% faster, in this article we present benchmarks of multiple CBA implementations on the UCI lymph dataset that contest the supremacy of M2: M1 was faster in most of the evaluated setups. M2 was faster only when the number of input rules was very small and the number of input instances was large. We hypothesize that the better performance of M1 can be attributed to recent advances in the optimization of vectorized operations and memory structures in scikit-learn and R, which M1 can exploit better because it lends itself more readily to vectorization.
This paper is accompanied by a Python implementation of CBA available at https://pypi.org/project/pyARC/.
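
To illustrate why an M1-style pass lends itself to vectorization, the following is a minimal sketch of M1-like rule selection using NumPy boolean masks. It is a simplified illustration and not the pyARC implementation: the function name `m1_select_rules`, the precomputed coverage matrix `covers`, and the toy data are all assumptions made for this example, and the rules are assumed to be already sorted by precedence (confidence, then support).

```python
import numpy as np

def m1_select_rules(covers, rule_class, y):
    """Simplified M1-style selection (hypothetical helper, not pyARC's API).

    covers:     (n_rules, n_instances) bool matrix, True where a rule's
                antecedent matches an instance; rows sorted by precedence.
    rule_class: (n_rules,) class predicted by each rule.
    y:          (n_instances,) true class of each instance.
    Returns (indices of kept rules, default class).
    """
    remaining = np.ones(y.shape[0], dtype=bool)  # instances not yet covered
    kept, defaults, errors = [], [], []
    rule_errors = 0
    for r in range(covers.shape[0]):
        covered = covers[r] & remaining            # one vectorized mask per rule
        correct = covered & (y == rule_class[r])
        if not correct.any():
            continue                               # rule classifies nothing correctly
        kept.append(r)
        rule_errors += int((covered & ~correct).sum())
        remaining &= ~covered                      # drop covered instances
        if remaining.any():
            classes, counts = np.unique(y[remaining], return_counts=True)
            default = classes[counts.argmax()]     # majority class of the rest
            default_errors = int(remaining.sum() - counts.max())
        else:
            default, default_errors = rule_class[r], 0
        defaults.append(default)
        errors.append(rule_errors + default_errors)
        if not remaining.any():
            break
    if not kept:                                   # no usable rule: majority class only
        classes, counts = np.unique(y, return_counts=True)
        return [], classes[counts.argmax()]
    cut = int(np.argmin(errors))                   # first prefix with the fewest errors
    return kept[:cut + 1], defaults[cut]

# Toy data (assumed for illustration): 3 rules, 4 instances.
covers = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 0],
                   [1, 0, 0, 1]], dtype=bool)
rule_class = np.array([0, 1, 0])
y = np.array([0, 0, 1, 1])
kept, default = m1_select_rules(covers, rule_class, y)
```

Each rule is processed with a handful of whole-array operations over all instances, which is the kind of work that array libraries handle efficiently; the per-instance bookkeeping of M2 is harder to express this way.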

Keyphrases: CBA, Classification, Classification by Associations, association rule, benchmark

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:501,
  author    = {Jiří Filip and Tomáš Kliegr},
  title     = {Classification based on Associations (CBA) - a performance analysis},
  doi       = {10.29007/gjl4},
  howpublished = {EasyChair Preprint 501},
  year      = {EasyChair, 2018}}