Harish Harish - 1 month ago 19
R Question

How to remove empty columns in transaction data read with the arules package?

I have a dataset made in the format of a basket data. I have read that dataset in R using a package call arules which has an inbuilt function for reading transactions, so I have used that and read my dataset. Following is the code I used:

trans = read.transactions("C:/Users/HARI/Desktop/Graph_mining/transactional_data_v3.csv", format = "basket", sep=",",rm.duplicates=TRUE)
inspect(trans[1:5])
items
1 {,
ANTIVERT,
SOFTCLIX}
2 {,
CEFADROXIL,
ESTROGEN}
3 {,
BENZAMYCIN,
BETAMETH,
KEFLEX,
PERCOCET}
4 {,
ACCUTANE(RXPAK;10X10),
BENZAMYCIN}
5 {,
ALBUTEROL,
BUTISOLSODIUM,
CLARITIN,
NASACORTAQ}


As you can see, when I use
inspect(trans)
it shows transactions with an empty column in each. My question is how can I remove those empty columns?

For a full
dput
of the
trans
object, please see this link.

Answer

I think I've found a solution to your problem. I took your csv file, opened it in Excel and replaced all empty cells with NA. Then I pasted the whole thing into LibreOffice Calc and saved it back to csv specifying that double quotes should be used for all cells (oddly enough, Excel won't do that except with a vba macro. You could read the file directly in LibreOffice instead of Excel, however, replacing empty cells with NA's will take forever). Then:

trans <- read.table("d:/downloads/transactional_data_2.csv", sep=",", stringsAsFactors = TRUE, na.strings="NA", header=TRUE)
trans2 <- as(trans, "transactions")
inspect(trans2[1:5])

RESULTS

inspect(trans[1:5])

  items                      transactionID
1 {X1=SOFTCLIX,                           
   X2=ANTIVERT}                          1
2 {X1=ESTROGEN,                           
   X2=CEFADROXIL}                        2
3 {X1=KEFLEX,                             
   X2=BETAMETH,                           
   X3=PERCOCET,                           
   X4=BENZAMYCIN}                        3
4 {X1=BENZAMYCIN,                         
   X2=ACCUTANE(RXPAK;10X10)}             4
5 {X1=CLARITIN,                           
   X2=ALBUTEROL,                          
   X3=NASACORTAQ,                         
   X4=BUTISOLSODIUM}                     5

I think that's the results you're looking for...?

Comments