Berk U. Berk U. - 8 days ago 9
R Question

Printing Rules / Tree from C5.0 Output in R

I currently working on a project where I have to compare classiication models that are produced using different algorithms. I am wondering how I can save a text version of the rules / tree that is produced using the C5.0 package in R.

Currently I can go about setting up and training the model as follows:

c50model = C5.0(x=X, y=Y, rules=TRUE)


I can then get the full version by calling:

summary(c50model)


This command produces a nice output of the model in the command window, though I am not sure of how to save it into a formatted text file.

I also know that the C50 package will produce a character version of the rules file in
c50model$tree
and a string version of the tree file in
c50model$tree
. Ideally, I would like to print the contents of these files into a text file so that I can easily incorporate it into a research paper later on. Unfortunately, however, the output of these fields is always in a weird type of format... such as:

"id=\"See5/C5.0 2.07 GPL Edition 2013-03-13\"\nentries=\"1\"\nrules=\"6\" default=\"0\"\nconds=\"2\" cover=\"322\" ok=\"321\" lift=\"1.55321\" class=\"0\"\ntype=\"2\" att=\"UniformityOfCellSize\" cut=\"3\" result=\"<\"\ntype=\"2\" att=\"BareNuclei\" cut=\"2\" result=\"<\"\nconds=\"2\" cover=\"305\" ok=\"304\" lift=\"1.55268\" class=\"0\"\ntype=\"2\" att=\"UniformityOfCellShape\" cut=\"2\" result=\"<\"\ntype=\"2\" att=\"BareNuclei\" cut=\"3\" result=\"<\"\nconds=\"2\" cover=\"310\" ok=\"307\" lift=\"1.54282\" class=\"0\"\ntype=\"2\" att=\"UniformityOfCellShape\" cut=\"2\" result=\"<\"\ntype=\"2\" att=\"NormalNucleoli\" cut=\"2\" result=\"<\"\nconds=\"2\" cover=\"137\" ok=\"132\" lift=\"2.65679\" class=\"1\"\ntype=\"2\" att=\"BareNuclei\" cut=\"3\" result=\">\"\ntype=\"2\" att=\"NormalNucleoli\" cut=\"2\" result=\">\"\nconds=\"2\" cover=\"179\" ok=\"170\" lift=\"2.62324\" class=\"1\"\ntype=\"2\" att=\"UniformityOfCellShape\" cut=\"2\" result=\">\"\ntype=\"2\" att=\"BareNuclei\" cut=\"2\" result=\">\"\nconds=\"2\" cover=\"175\" ok=\"166\" lift=\"2.61978\" class=\"1\"\ntype=\"2\" att=\"UniformityOfCellSize\" cut=\"3\" result=\">\"\ntype=\"2\" att=\"UniformityOfCellShape\" cut=\"2\" result=\">\"\n"


Any advice is always appreciated.

Answer
string <- id=\"See5/C5.0 2.07 GPL Edition 2013-03-13\"\nentries=\"1\"\nrules=\"6\" default=\"0\"\nconds=\"2\" cover=\"322\" ok=\"321\" lift=\"1.55321\" class=\"0\"\ntype=\"2\" att=\"UniformityOfCellSize\" cut=\"3\" result=\"<\"\ntype=\"2\" att=\"BareNuclei\" cut=\"2\" result=\"<\"\nconds=\"2\" cover=\"305\" ok=\"304\" lift=\"1.55268\" class=\"0\"\ntype=\"2\" att=\"UniformityOfCellShape\" cut=\"2\" result=\"<\"\ntype=\"2\" att=\"BareNuclei\" cut=\"3\" result=\"<\"\nconds=\"2\" cover=\"310\" ok=\"307\" lift=\"1.54282\" class=\"0\"\ntype=\"2\" att=\"UniformityOfCellShape\" cut=\"2\" result=\"<\"\ntype=\"2\" att=\"NormalNucleoli\" cut=\"2\" result=\"<\"\nconds=\"2\" cover=\"137\" ok=\"132\" lift=\"2.65679\" class=\"1\"\ntype=\"2\" att=\"BareNuclei\" cut=\"3\" result=\">\"\ntype=\"2\" att=\"NormalNucleoli\" cut=\"2\" result=\">\"\nconds=\"2\" cover=\"179\" ok=\"170\" lift=\"2.62324\" class=\"1\"\ntype=\"2\" att=\"UniformityOfCellShape\" cut=\"2\" result=\">\"\ntype=\"2\" att=\"BareNuclei\" cut=\"2\" result=\">\"\nconds=\"2\" cover=\"175\" ok=\"166\" lift=\"2.61978\" class=\"1\"\ntype=\"2\" att=\"UniformityOfCellSize\" cut=\"3\" result=\">\"\ntype=\"2\" att=\"UniformityOfCellShape\" cut=\"2\" result=\">\"\n"

If you want txt file output:

write(string, file="string.txt")

You'll notice that there's a new line every time you see the \n character and that all quotes are escaped by having \ before them. print doesn't do anything with them, but cat does. If you want to see beforehand what it will look like in the file, you can use:

cat(string)

Alternatively, if you just want the output from the summary:

write(capture.output(summary(c50model)), "c50model.txt")
Comments