Wilcar - 3 months ago
R Question

Splitting hashtags into a data.frame object with R

I am collecting hashtags from Twitter. Each tweet can contain zero or more hashtags.

tests <- c("xxxxxx #SaveTheDate xxxxxx #Histoire] xxxxxx #Femmes xxxxxxx #ports",
"xxxxxxxxxxxx",
"xxxx #rock xxxxxx #Nantes" ,
"xxxxxx #lvan xxxxxxx #nantes xxxxx #ilsepassetoujoursuntruc")


library(stringr)

hashtags <- str_extract_all(tests, "#\\S+")

str(hashtags)


My results:

str(hashtags)
List of 4
$ : chr [1:4] "#SaveTheDate" "#Histoire]" "#Femmes" "#ports"
$ : chr(0)
$ : chr [1:2] "#rock" "#Nantes"
$ : chr [1:3] "#lvan" "#nantes" "#ilsepassetoujoursuntruc"


What I expect: a data.frame with one hashtag per row

"#SaveTheDate"
"#Histoire"
"#Femmes"
"#ports"
NA
....


What I tried:

hashtags_df <- as.data.frame(hashtags)

Answer
# Replace every empty element (a tweet with no hashtag) with NA
for (i in seq_along(hashtags)) {
   if (length(hashtags[[i]]) == 0) {
      hashtags[[i]] <- NA_character_
   }
}

This will replace the zero-length elements of your list with NA, so a tweet without any hashtag still ends up with a row of its own.
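If you would rather avoid the loop, the same replacement can probably be done in one line with lengths(), which returns the length of each list element. This is only a sketch: the hashtags list below is typed in by hand to mirror the str() output in the question, so it runs without stringr.

```r
# Hand-typed copy of the list shown in the question's str() output
hashtags <- list(
  c("#SaveTheDate", "#Histoire]", "#Femmes", "#ports"),
  character(0),
  c("#rock", "#Nantes"),
  c("#lvan", "#nantes", "#ilsepassetoujoursuntruc")
)

# lengths() gives the length of every element; select the empty ones
# and overwrite each of them with a character NA
hashtags[lengths(hashtags) == 0] <- NA_character_

str(hashtags)
```

NA_character_ is used instead of plain NA only so that every element stays of type character.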

hashtags <- unlist(hashtags)

will give you a single flat character vector of the values. If you'd like a data frame, you can use as.data.frame now.

hashtags_df <- as.data.frame(hashtags)
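One thing unlist() loses is which tweet each hashtag came from. If that matters to you, a possible sketch is to repeat the tweet index once per hashtag. The column names tweet and hashtag are my own choice, not something from the question, and the list is again typed in by hand from the str() output.

```r
# Hand-typed copy of the list shown in the question's str() output
hashtags <- list(
  c("#SaveTheDate", "#Histoire]", "#Femmes", "#ports"),
  character(0),
  c("#rock", "#Nantes"),
  c("#lvan", "#nantes", "#ilsepassetoujoursuntruc")
)

# Tweets without a hashtag become a single NA so they keep a row
hashtags[lengths(hashtags) == 0] <- NA_character_

# 'tweet' and 'hashtag' are illustrative column names
hashtags_df <- data.frame(
  tweet   = rep(seq_along(hashtags), lengths(hashtags)),  # index of the source tweet
  hashtag = unlist(hashtags),                             # one hashtag (or NA) per row
  stringsAsFactors = FALSE
)

print(hashtags_df)
```

The rep() call has to run after the NA replacement, otherwise the empty tweet contributes zero rows and the two columns end up with different lengths.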

I don't know the best way to extract hashtags, etc., but this should answer the question as currently asked.
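On the extraction itself: the pattern "#\\S+" takes everything up to the next whitespace, which is why "#Histoire]" keeps its bracket. A tentative fix is to allow only letters, digits and underscores after the "#". That is my assumption about what counts as a hashtag here, and Twitter's real rules may differ. The sketch below uses base R's regmatches() and gregexpr() instead of stringr so it runs on its own; I would expect the same pattern to work in str_extract_all() too, but I have not checked that.

```r
tests <- c("xxxxxx #SaveTheDate xxxxxx #Histoire] xxxxxx #Femmes xxxxxxx #ports",
           "xxxxxxxxxxxx",
           "xxxx #rock xxxxxx #Nantes",
           "xxxxxx #lvan xxxxxxx #nantes xxxxx #ilsepassetoujoursuntruc")

# "#" followed by one or more letters, digits or underscores
# (assumed definition of a hashtag; trailing punctuation is dropped)
pattern <- "#[[:alnum:]_]+"

# gregexpr() finds every match per string, regmatches() extracts them;
# strings without a match yield character(0), as with str_extract_all()
hashtags <- regmatches(tests, gregexpr(pattern, tests))

str(hashtags)
```

The result is still a list with one element per tweet, so the NA replacement and unlist() steps above apply to it unchanged.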