dommer dommer - 11 days ago 5
R Question

How do I convert an integer column in a SparkR data frame to a string?

I have a SparkR dataframe where all columns are integers. I want to replace one column with strings.

So, if the column contains 0, 1, 1, 0, I want to make that "no", "yes", "yes", "no".

I tried

df$C0 <- ifelse(df$C0 == 0, "no", "yes")


but that just gives me

Error in as.logical(from) :
cannot coerce type 'S4' to vector of type 'logical'


How would I go about making this update?

P.S. I based the above attempt on the fact that this works:

df$C0 <- df$C0 + 1

Answer

Probably the simplest solution here is to use SQL:

# Because it is hard to live without pipes
library(magrittr)

# Create sqlContext
# sc is an existing SparkContext, e.g. from sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Register table
registerTempTable(df, 'df')

# Query
sql(sqlContext, "SELECT *, IF(C0 = 0, 'no', 'yes') AS C0 FROM df") %>% showDF()

Unfortunately this creates a duplicate column name, so it is probably best to rename the existing column first:

df <- df %>% withColumnRenamed(existingCol = 'C0', newCol = 'C0_old')
registerTempTable(df, 'df')
sql(sqlContext, "SELECT *, IF(C0_old = 0, 'no', 'yes') AS C0 FROM df")

or simply replace * with a list of columns you need.
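Listing the columns explicitly might look like the sketch below. It assumes a SparkR 1.x setup like the one above; the sample data and the extra column C1 are made up for illustration.

```r
library(SparkR)

# Assumed SparkR 1.x-style local setup
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Hypothetical data: C0 as in the question, plus an illustrative column C1
local_df <- data.frame(C0 = c(0L, 1L, 1L, 0L), C1 = c(10L, 20L, 30L, 40L))
df <- createDataFrame(sqlContext, local_df)
registerTempTable(df, "df")

# Name each column instead of using *, so the integer C0 is not
# selected alongside its string replacement
result <- sql(sqlContext, "SELECT IF(C0 = 0, 'no', 'yes') AS C0, C1 FROM df")
showDF(result)
```

With many columns the list gets long, in which case renaming the old column first is less typing.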

It is also possible to use when / otherwise:

df %>% select(when(df$C0 == 0, 'no') %>% otherwise('yes'))
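To replace the column in place rather than select a new one, the when / otherwise expression can probably be assigned back to the column, the same way df$C0 <- df$C0 + 1 is in the question. This is a minimal sketch, assuming SparkR 1.5 or later (where when and otherwise are available) and a 1.x-style context; the sample data and the column C1 are made up for illustration.

```r
library(SparkR)

# Assumed SparkR 1.x-style local setup
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Hypothetical data with the 0/1 values from the question
local_df <- data.frame(C0 = c(0L, 1L, 1L, 0L), C1 = c(10L, 20L, 30L, 40L))
df <- createDataFrame(sqlContext, local_df)

# when() and otherwise() build a Column expression that Spark evaluates,
# unlike base R's ifelse(), which tries to coerce the Column to logical
df$C0 <- otherwise(when(df$C0 == 0, "no"), "yes")

showDF(df)
```

This avoids both the duplicate column name and the temporary table.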