leo leo - 1 year ago 47
Bash Question

How to sort a 3 GB access log file?

Hi all: I have a 3 GB Tomcat access log named urls, where each line is a URL. I want to count how many times each URL occurs and sort the URLs by that count. I did it this way:

awk '{print $0}' urls | sort | uniq -c | sort -nr >> output

But it takes a really long time to finish: it has already been running for 30 minutes and is still working.
The log file looks like this:


Is there another way to process and sort a 3 GB file? Thanks in advance!

Answer

I'm not sure why you're using awk at the moment: awk '{print $0}' just prints every line unchanged, so it's not doing anything useful.

I would suggest using something like this:

awk '{ ++urls[$0] } END { for (i in urls) print urls[i], i }' urls | sort -nr

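This builds up a count of each URL in an awk associative array in a single pass over the file, and then sorts the output. Only the distinct URLs with their counts have to be sorted, not every line of the 3 GB file. The array has to fit in memory, so this works best when the number of distinct URLs is much smaller than the number of lines.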
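
To see what the output looks like, here is a minimal sketch that runs the same awk approach on a tiny made-up sample. The count_urls function name, the temporary file and the sample URLs are assumptions for illustration only. The real input would be the urls file from the question.

```shell
# Hypothetical helper: count identical lines in the file given as $1
# and print "count line", highest count first.
count_urls() {
    awk '{ ++count[$0] } END { for (url in count) print count[url], url }' "$1" | sort -nr
}

# Made-up sample data standing in for the real access log.
sample=$(mktemp)
printf '%s\n' /index.html /login /index.html /about /index.html /login > "$sample"

count_urls "$sample"
# prints:
# 3 /index.html
# 2 /login
# 1 /about

rm -f "$sample"
```

On the real file this would be called as count_urls urls > output. The order of lines with equal counts depends on how sort breaks ties, so don't rely on it.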