nik nik - 3 months ago
R Question

How to add a character to duplicated strings and then remove it

I have a data frame with one column of strings and 10 columns of values. I cannot set the first column as row names, and I figured out that this is because the first column contains duplicated strings. So I identified the duplicates like this:

dftt <- data.frame(myname[which(duplicated(myname)),])


A small portion of dftt is shown below

dftt<- structure(list(V1 = structure(c(6L, 6L, 4L, 6L, 2L, 9L, 10L,
1L, 7L, 11L, 10L, 3L, 8L, 5L, 10L, 10L, 1L, 10L, 11L, 1L), .Label = c("alp-1",
"cfim-2", "eps-8", "fln-2", "istr-1", "lev-11", "pqn-87", "ret-1",
"sao-1", "sup-26", "vab-10"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA,
-20L))


What I want is to add something to the duplicated strings to make them unique, so that I can set the column as row names, while keeping an index or marker that lets me remove the addition again after I have manipulated the data.

For example, the addition could be a number or a specific letter:

# V1
#1 lev-11
#2 lev-11_nik1
#3 fln-2
#4 lev-11_nik2
#5 cfim-2
#6 sao-1
#7 sup-26
#8 alp-1
#9 pqn-87
#10 vab-10
#11 sup-26_nik1
#12 eps-8
#13 ret-1
#14 istr-1
#15 sup-26_nik2
#16 sup-26_nik3
#17 alp-1_nik1
#18 sup-26_nik4
#19 vab-10_nik1
#20 alp-1_nik2


I know I probably need to use paste0 or paste, but I don't know how.

Answer

First, determine the duplicates:

dup <- duplicated(dftt$V1)
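
As a quick illustration of what duplicated() returns, here is a minimal sketch on a made-up vector (x and dup_x are hypothetical names, not part of the question's data). The first occurrence of each value is not flagged, only the later repeats are:

```r
# Hypothetical toy vector, not taken from the question's data
x <- c("a", "b", "a", "c", "a")

# duplicated() is FALSE for the first occurrence of each value
# and TRUE for every later repeat
dup_x <- duplicated(x)
dup_x
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Positions of the repeated entries
which(dup_x)
# [1] 3 5
```

This is why dup has to be computed before the values are changed: once the strings are unique, duplicated() would return FALSE everywhere.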

Now make all values in the column unique with make.unique():

dftt$V1 <- make.unique(as.character(dftt$V1), sep = "_nik")
dftt
#             V1
# 1       lev-11
# 2  lev-11_nik1
# 3        fln-2
# 4  lev-11_nik2
# 5       cfim-2
# 6        sao-1
# 7       sup-26
# 8        alp-1
# 9       pqn-87
# 10      vab-10
# 11 sup-26_nik1
# 12       eps-8
# 13       ret-1
# 14      istr-1
# 15 sup-26_nik2
# 16 sup-26_nik3
# 17  alp-1_nik1
# 18 sup-26_nik4
# 19 vab-10_nik1
# 20  alp-1_nik2
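
Since the goal in the question is to use these strings as row names, the following is a rough sketch of that step. The data frame df and its columns name, v1 and v2 are made up for illustration (the real data has 10 value columns), so treat it as an assumption about the layout rather than the asker's actual data:

```r
# Hypothetical stand-in for the real data: one name column, two value columns
df <- data.frame(
  name = c("lev-11", "lev-11", "fln-2", "lev-11"),
  v1   = c(1.2, 3.4, 5.6, 7.8),
  v2   = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Remember which rows were repeats before altering the names
dup <- duplicated(df$name)

# Unique labels are accepted as row names; duplicated ones are rejected
rownames(df) <- make.unique(df$name, sep = "_nik")

# Rows can now be looked up by label
df["lev-11_nik2", "v1"]
# [1] 7.8
```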

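To return the values to their original state, use sub() to strip the _nik suffix and its trailing digit(s) from the end of the strings. Only the elements flagged in dup (computed before the values were changed) are touched: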

dftt$V1[dup] <- sub("_nik\\d+$", "", dftt$V1[dup])
dftt
#        V1
# 1  lev-11
# 2  lev-11
# 3   fln-2
# 4  lev-11
# 5  cfim-2
# 6   sao-1
# 7  sup-26
# 8   alp-1
# 9  pqn-87
# 10 vab-10
# 11 sup-26
# 12  eps-8
# 13  ret-1
# 14 istr-1
# 15 sup-26
# 16 sup-26
# 17  alp-1
# 18 sup-26
# 19 vab-10
# 20  alp-1

Note that the as.character() call in the make.unique() step changes the column from factor to character, which makes these kinds of string operations easier anyway.
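
If the factor class is needed again afterwards, it can be rebuilt with factor(). The sketch below runs the whole round trip on a small made-up factor (orig, tmp and restored are hypothetical names) and checks that the original values come back unchanged:

```r
# Hypothetical toy factor standing in for dftt$V1
orig <- factor(c("alp-1", "sup-26", "alp-1", "sup-26", "sup-26"))

# Flag the repeats first, then make the labels unique (as character)
dup <- duplicated(orig)
tmp <- make.unique(as.character(orig), sep = "_nik")

# Strip the suffix again, only from the flagged elements
tmp[dup] <- sub("_nik\\d+$", "", tmp[dup])

# The labels match the original values
identical(tmp, as.character(orig))
# [1] TRUE

# factor() restores the factor class with the original levels
restored <- factor(tmp, levels = levels(orig))
identical(restored, orig)
# [1] TRUE
```

Restricting sub() to the dup positions means first occurrences are never modified, even if one of them happened to end in _nik followed by digits.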