I have a big data set with around 500,000 rows. Each row is a string. I would like to trim all rows to a fixed size.
I found this:
dt$rev <- strtrim(dt$rev, width=max_len)
This has nothing to do with data.table. It's just that strtrim() is fairly slow.
As long as you're operating on single-width characters (that is, not double-width ones such as Chinese, Japanese, or Korean characters), you can instead use substr(), which is much faster.
## Make a long character vector with 5 million elements
x <- rep(state.name, 1e5)

## Speed comparison
system.time(substr(x, 1, 3))
#    user  system elapsed
#    0.43    0.00    0.44
system.time(strtrim(x, 3))
#    user  system elapsed
#   44.63    0.03   44.85

## Confirm that both methods return the same output
identical(substr(state.name, 1, 3), strtrim(state.name, 3))
# [1] TRUE
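Applied to the column from the question, the swap could look something like the sketch below. The contents of `dt` and the value of `max_len` are made up here for illustration; only the names `dt`, `rev`, and `max_len` come from the question. The `nchar()` comparison is a rough guard I would add, on the assumption that you want to check the single-width condition before switching functions; it is not a complete test for every possible mix of character widths.

```r
## Toy stand-in for the real data (hypothetical values)
dt <- data.frame(
  rev = c("a fairly long review text", "short", "another long-ish string"),
  stringsAsFactors = FALSE
)
max_len <- 10

## Rough guard: for single-width text, the number of characters
## equals the display width of each string
single_width <- isTRUE(all(
  nchar(dt$rev, type = "chars") == nchar(dt$rev, type = "width")
))

## Use substr() when the guard passes, otherwise fall back to strtrim()
dt$rev <- if (single_width) {
  substr(dt$rev, 1, max_len)
} else {
  strtrim(dt$rev, max_len)
}
```

If the guard fails, or comes back `NA` because of missing values, the code simply keeps the original (slower) strtrim() call, so the result should be no worse than what you started with.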