leo leo - 1 year ago 70
Bash Question

How to use shell to filter a file by lines?

I have a log file like the one below:

5082 //open_api/user/get_user_info
5074 /user/get_user_idCard_info?passportId=YRD1412538757&viewSource=02
5029 /user/getuserinfo?passportId=YRD1412538757
4706 /user/getuserinfo?passportId=YRD1507000030516
4611 /user/get_user_idCard_info?passportId=YRD1507000030516&viewSource=02
4040 /salesloan/update_draw_bank

The output should be like:

5082 //open_api/user/get_user_info
9685 /user/get_user_idCard_info
9735 /user/getuserinfo
4040 /salesloan/update_draw_bank

The number at the start of each line is how many times that URL was called. I want to count how many times each URL (without the query parameters of GET requests) was requested; for example, I only want to count the times the '/repay/query_need_repay_data.action' URL was called. I am currently using Java to filter and process the lines, but for a 200 MB file it has already been running for 4 hours and is still not finished. How can I get this done quickly?

Java code:

public static void main(String[] args) throws IOException {
    String source = "/Users/leo/logs/p2pservice/access/a2.output";
    String target = "/Users/leo/logs/p2pservice/access/targetUrls";
    File targetFile = new File(target);
    String splinter = "\\?";

    // Files and Charsets here are the Guava classes
    // (com.google.common.io.Files, com.google.common.base.Charsets)
    List<String> strings = Files.readLines(new File(source), Charsets.UTF_8);
    for (String string : strings) {
        if (string.contains("?")) {
            // Keep only the part before the '?', i.e. drop the query string
            String[] split = string.split(splinter);
            Files.append(split[0] + "\n", targetFile, Charsets.UTF_8);
        } else {
            Files.append(string + "\n", targetFile, Charsets.UTF_8);
        }
    }
}

Thanks in advance.

Answer Source

awk to the rescue!

$ awk -F'[ ?]' '{a[$2]+=$1} END{for(k in a) print a[k], k}' file

14341 /repay/query_need_repay_data.action
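To see what the one-liner does step by step, here is a sketch that runs it against the six sample lines from the question. The file name sample.log and the array name total are assumptions for illustration. The trailing sort is an addition of mine: awk's for-in loop visits keys in an unspecified order, so sorting makes the output stable.

```shell
# sample.log is a hypothetical file name holding the sample lines from the question
cat > sample.log <<'EOF'
5082 //open_api/user/get_user_info
5074 /user/get_user_idCard_info?passportId=YRD1412538757&viewSource=02
5029 /user/getuserinfo?passportId=YRD1412538757
4706 /user/getuserinfo?passportId=YRD1507000030516
4611 /user/get_user_idCard_info?passportId=YRD1507000030516&viewSource=02
4040 /salesloan/update_draw_bank
EOF

# -F'[ ?]' splits each line on a space or a '?', so $1 is the count and
# $2 is the URL path without its query string.
awk -F'[ ?]' '
    { total[$2] += $1 }                            # add this count to the running total for the path
    END { for (url in total) print total[url], url }
' sample.log | sort -k1,1nr                        # sort by total, largest first
```

On the sample this should print the four distinct paths with totals 9735, 9685, 5082 and 4040, matching the desired output in the question. The whole file is read once and only one counter per distinct path is kept in memory, which is why this approach tends to finish quickly even on a file of a few hundred megabytes.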