White Big - 2 months ago
R Question

Sort data frame by values of character string in column in R

I have a dataset like this:

term                  occ     value
Less Than 1 year      Yale    1
Less Than 1 year      MIT     3
1 Year                Yale    2
2 Years               Yale    3
2 Years               Yale    8
2 years               CMU     2
3 Years               Yale    5
3 years               NYU     2
Greater than 3 Years  NYU     5
Greater Than 3 Years  CALTEC  4
No Fixed Term         Yale    2
Other                 Bu      9


I want a table that shows the number of records for each term, with the rows sorted by term in the order shown in the desired output below.

NOTE: Some terms differ only in capitalization ("Years" vs. "years", "Than" vs. "than"); these should be counted as the same term.

The desired output is:

term                  count
Less Than 1 year      2
1 Year                1
2 Years               3
3 Years               2
Greater than 3 Years  2
No Fixed Term         1
Other                 1

Answer

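If you want a custom order, you need to specify the order of the factor's levels explicitly; otherwise table() lists the terms in alphabetical order. You also need to compare the terms without regard to case, so that "2 years" and "2 Years" are counted together. This should work: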

# reproducible data
dd <- read.table(text = "term,occ,value
Less Than 1 year,Yale,1
Less Than 1 year,MIT,3
1 Year,Yale,2
2 Years,Yale,3
2 Years,Yale,8
2 years,CMU,2
3 Years,Yale,5
3 years,NYU,2
Greater than 3 Years,NYU,5
Greater Than 3 Years,CALTEC,4
No Fixed Term,Yale,2
Other,Bu,9", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# specify custom order
termorder <- c("Less Than 1 year", "1 Year", "2 Years", "3 Years",
               "Greater than 3 Years", "No Fixed Term", "Other")

# tabulate: match terms case-insensitively, label and order them by termorder
tt <- table(factor(tolower(dd$term), levels = tolower(termorder), labels = termorder))
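To see why both the lowercasing and the explicit levels are needed, here is a small self-contained illustration on a made-up vector x (not the question's data; x and ord are hypothetical names). Without explicit levels, table() treats "2 Years" and "2 years" as different values and sorts the groups alphabetically (the relative order of case variants depends on the locale). Lowercasing merges the variants, levels sets the row order, and labels supplies the names that are displayed.

```r
# Made-up toy vector with a capitalization variant ("2 Years" vs "2 years")
x <- c("2 Years", "Other", "1 Year", "2 years")

# Default: exact-match grouping with alphabetical level order, so 4 separate cells
table(x)

# Hypothetical display order for this toy example ("Other" first on purpose)
ord <- c("Other", "1 Year", "2 Years")

# Match on lowercased values, order rows by 'levels', display them with 'labels'
counts <- table(factor(tolower(x), levels = tolower(ord), labels = ord))
counts
```

The second table has three cells in the order of ord, with "2 Years" counted twice.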

tt is a one-dimensional table object (it prints like a named vector). If you want a data.frame, you can do:

as.data.frame(tt)
#                 Var1 Freq
# 1     Less Than 1 year    2
# 2               1 Year    1
# 3              2 Years    3
# 4              3 Years    2
# 5 Greater than 3 Years    2
# 6        No Fixed Term    1
# 7                Other    1
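The question asks for columns named term and count rather than Var1 and Freq. One way to get that is sketched below, with the question's term values repeated so the snippet runs on its own (the names terms, unmatched and res are just illustrative). as.data.frame() on a table accepts responseName (the name of the count column) and stringsAsFactors, and the first column can then be renamed. The snippet also flags terms missing from termorder, because factor() turns any value not listed in levels into NA and table() drops NA by default, so such records would silently disappear from the counts.

```r
# Term column from the question's data (repeated so this runs standalone)
terms <- c("Less Than 1 year", "Less Than 1 year", "1 Year", "2 Years",
           "2 Years", "2 years", "3 Years", "3 years",
           "Greater than 3 Years", "Greater Than 3 Years",
           "No Fixed Term", "Other")

termorder <- c("Less Than 1 year", "1 Year", "2 Years", "3 Years",
               "Greater than 3 Years", "No Fixed Term", "Other")

# Values not in termorder would become NA and be dropped by table(),
# so warn about them before counting
unmatched <- setdiff(unique(tolower(terms)), tolower(termorder))
if (length(unmatched) > 0) {
  warning("Terms not in termorder: ", paste(unmatched, collapse = ", "))
}

tt <- table(factor(tolower(terms), levels = tolower(termorder), labels = termorder))

# responseName names the count column; stringsAsFactors = FALSE keeps term as character
res <- as.data.frame(tt, responseName = "count", stringsAsFactors = FALSE)
names(res)[1] <- "term"
res
```

If a term such as "4 Years" appeared in the data, the warning would name it (in lowercase), and you could add it to termorder at the position you want.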