arielle - 4 months ago
R Question

R: Sorting a vector alphabetically after nth character

I would like to sort the elements (strings) of a vector alphabetically, but only considering the characters after the nth. The strings can contain both digits and letters, for example:

v <- c("ENCSR529JNJ_HNR35NPK_21_K562", "ENCSR529MBZ_AP22IG_11_K562", "ENCSR529MBZ_AP22IG_21_K562", "ENCSR530BOP_DUPT6H_11_K562", "ENCSR530BOP_DUPT6H_21_K562")

After sorting on the characters that follow the 11th, v would become:

"ENCSR529MBZ_AP22IG_11_K562", "ENCSR529MBZ_AP22IG_21_K562", "ENCSR530BOP_DUPT6H_11_K562", "ENCSR530BOP_DUPT6H_21_K562", "ENCSR529JNJ_HNR35NPK_21_K562"

Any help will be greatly appreciated! Thanks

v[order(substr(v, start = 12, stop = max(nchar(v))))]
# [1] "ENCSR529MBZ_AP22IG_11_K562"   "ENCSR529MBZ_AP22IG_21_K562"   "ENCSR530BOP_DUPT6H_11_K562"   "ENCSR530BOP_DUPT6H_21_K562"  
# [5] "ENCSR529JNJ_HNR35NPK_21_K562"

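substr(v, start = 12, stop = max(nchar(v))) returns each string with its first 11 characters dropped. order() then gives the permutation that sorts those substrings, and indexing v with that permutation reorders the original, full-length strings.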
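If n varies, the same idea can be wrapped in a small function. This is only a sketch: the name sort_after_n is made up for illustration and is not from the answer above. It uses substring() instead of substr() because its `last` argument defaults to a very large value, so there is no need to compute max(nchar(v)).

```r
# Hypothetical helper (name chosen for illustration):
# sort strings by everything after their first n characters.
sort_after_n <- function(x, n) {
  # substring() keeps each string from position n + 1 to its end,
  # since `last` defaults to a large value.
  key <- substring(x, n + 1)
  # order() gives the permutation that sorts the keys;
  # applying it to x reorders the original strings.
  x[order(key)]
}

v <- c("ENCSR529JNJ_HNR35NPK_21_K562", "ENCSR529MBZ_AP22IG_11_K562",
       "ENCSR529MBZ_AP22IG_21_K562", "ENCSR530BOP_DUPT6H_11_K562",
       "ENCSR530BOP_DUPT6H_21_K562")

sorted <- sort_after_n(v, 11)
print(sorted)
```

Strings with n or fewer characters get an empty key and therefore sort first. Strings whose keys are equal keep their original relative order.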