pedrosaurio
R Question

How to ignore a list of attributes when clustering from the command line in Weka?

I am running a series of clustering analyses in Weka, and I have realized that automating them is the way to go if I want to get anywhere. Here is how I currently work:


  • I do all the preprocessing manually in R and save the result as a CSV file, which I then import into Weka and save again as an ARFF file.

  • I use Weka's GUI: I usually just open the ARFF file, go straight to the Cluster tab and experiment there. (My experience with the CLI is limited.)
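The CSV-to-ARFF step in the first bullet does not need the GUI. Weka's CSVLoader converter prints the loaded data as ARFF on standard output, so it can be wrapped in a small bash script. This is a minimal sketch, not part of the original post: it assumes weka.jar is on the CLASSPATH (as the commands in this post do), and the script name csv2arff.sh is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: convert CSV files exported from R into ARFF without opening the GUI.
# Assumes weka.jar is on the CLASSPATH; usage (hypothetical name): ./csv2arff.sh exports/*.csv
set -euo pipefail

csv_to_arff() {
  local csv="$1"
  local arff="${csv%.csv}.arff"   # same base name, .arff extension
  # CSVLoader loads the CSV and prints the data set in ARFF format on stdout.
  java weka.core.converters.CSVLoader "$csv" > "$arff"
  echo "$arff"                    # report which file was written
}

# Convert every CSV file given on the command line (does nothing with no arguments).
for csv in "$@"; do
  csv_to_arff "$csv"
done
```

Running it on the exported files writes one .arff next to each CSV, and those files can then be passed to -t in the clustering commands.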



I am trying to reproduce some results I obtained in the GUI, this time with commands in the CLI. The problem is that in the GUI I usually ignore a list of attributes when clustering, and I cannot find a way to specify a list of attributes to ignore on the command line.

For example:

java weka.clusterers.XMeans \
-I 10 -M 1000 -J 1000 \
-L 2 -H 9 -B 1.0 -C 0.25 \
-D "weka.core.MinkowskiDistance -R first-last" -S 10 \
-t "/home/pedrosaurio/bigtable.arff"


My experience with Weka is limited, so I may be missing some basic understanding of how it works.

Answer

In Weka, data preprocessing functions are called filters. To ignore attributes on the command line, combine a filter with the clustering algorithm by wrapping the clusterer in FilteredClusterer, as in the example below.

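java weka.clusterers.FilteredClusterer \
-t "/home/pedrosaurio/bigtable.arff" \
-F "weka.filters.unsupervised.attribute.Remove -R 1-5" \
-W weka.clusterers.XMeans \
-- -I 10 -M 1000 -J 1000 -L 2 -H 9 -B 1.0 -C 0.25 \
-D "weka.core.MinkowskiDistance -R first-last" -S 10

Here the Remove filter deletes attributes 1 to 5 before the data reach XMeans. The filter and its options must be quoted as a single argument to -F; otherwise -R is read as an option of FilteredClusterer itself. The options after -- are passed to XMeans. Make sure no spaces follow the trailing backslashes, or the shell will not continue the command onto the next line. To keep only the listed attributes instead of removing them, add -V (invert selection) to the filter options; note that a comma-separated list such as 1,5 selects attributes 1 and 5 only, not the range 1-5.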
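Since the goal is to automate a series of runs, the same FilteredClusterer call can be repeated over several attribute lists from a script. The bash sketch below is not part of the original answer. It assumes weka.jar is on the CLASSPATH, and the ranges in IGNORE_SETS are hypothetical examples written in the Remove filter's -R syntax. It only prints the commands unless DRY_RUN=0 is set. The filter specification is passed as one argument, and the XMeans options follow the -- separator that Weka's meta-schemes use for base-learner options.

```bash
#!/usr/bin/env bash
# Sketch: rerun one XMeans configuration once per list of attributes to ignore.
# Assumptions (not from the original post): weka.jar is on the CLASSPATH and the
# ranges in IGNORE_SETS are hypothetical examples in Remove's -R syntax.
set -euo pipefail

DATA="${DATA:-/home/pedrosaurio/bigtable.arff}"
IGNORE_SETS=("1-5" "1,5" "2-4,last")

# Fill the global array CMD with the FilteredClusterer call for one range.
build_cmd() {
  local range="$1"
  CMD=(java weka.clusterers.FilteredClusterer
    -t "$DATA"
    -F "weka.filters.unsupervised.attribute.Remove -R $range"
    -W weka.clusterers.XMeans
    -- -I 10 -M 1000 -J 1000 -L 2 -H 9 -B 1.0 -C 0.25
       -D "weka.core.MinkowskiDistance -R first-last" -S 10)
}

# DRY_RUN=1 (the default here) only prints each command, shell-quoted;
# DRY_RUN=0 runs it and saves Weka's text output to one file per range.
for range in "${IGNORE_SETS[@]}"; do
  build_cmd "$range"
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%q ' "${CMD[@]}"; echo
  else
    "${CMD[@]}" > "xmeans_ignore_${range//,/_}.txt"
  fi
done
```

Building each command as a bash array keeps every argument intact, so the quoted filter specification reaches FilteredClusterer as a single option without any extra escaping.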
