Betty Betty - 1 month ago 6
R Question

Converting data to a 2 column format?

If I have a dataset like the following:

LA NY MA
1 2 3
4 5 6
3 5
4


(In other words, each row has a different structure. LA has 3 values, NY has 4 values, etc.)
I am trying to use
lm
to perform an ANOVA test (to decide whether the mean number is the same in each state), and it keeps showing "an error occurred" because rows do not match. One idea I got was to convert data to a 2-column format. Which command/package should I use to perform that task?

Edit: the data is from the txt file.

Answer

Another option after you read the file to convert to a 2-column format would be

df <- read.table("Betty.txt", header=TRUE, fill=TRUE, sep="\t") 
## (as @Richard Scriven mentioned in the comment)
na.omit(stack(df))
#    values ind
#1       1  LA
#2       4  LA
#3       3  LA
#5       2  NY
#6       5  NY
#7       5  NY
#8       4  NY
#9       3  MA
#10      6  MA

Update

The above I got by transforming the data to have \t delimiter. But, if the file is copy/pasted directly from the OP's post without any change (making sure that there are spaces for the 3rd and 4th row after the 2nd column)

 lines <- readLines('Betty1.txt')
 lines2 <-  gsub("(?<=[^ ]) +|^[ ]+(?<=[ ])(?=[^ ])", ",", lines, perl=TRUE)

 lines2
 #[1] "LA,NY,MA" "1,2,3"    "4,5,6"    "3,5,"     ",4,"    
 df1 <- read.table(text=lines2, sep=',', header=TRUE)
 df1
 # LA NY MA
 #1  1  2  3
 #2  4  5  6
 #3  3  5 NA
 #4 NA  4 NA

and then do

 na.omit(stack(df1))

Update2

Another option if you have fixed width columns is to use read.fwf

 df <- read.fwf('Betty1.txt', widths=c(3,3,3), skip=1)
 colnames(df) <- scan('Betty1.txt', nlines=1, what="", quiet=TRUE)
 df
 #  LA NY MA
 #1  1  2  3
 #2  4  5  6
 #3  3  5 NA
 #4 NA  4 NA

 library(tidyr)
 gather(df, Var, Val, LA:MA, na.rm=TRUE)
 #  Var Val
 #1  LA   1
 #2  LA   4
 #3  LA   3
 #4  NY   2
 #5  NY   5
 #6  NY   5
 #7  NY   4
 #8  MA   3
 #9  MA   6
Comments