user1471980 user1471980 - 3 months ago 7
R Question

how do you parse a sysmon file to extract certain information using R?

I am trying to read bunch of this type of files using R to parse out the information and put the data in a data frame like format:

this is the contents of the file:

last_run current_run seconds
------------------------------- ------------------------------- -----------
Jul 4 2016 7:17AM Jul 4 2016 7:21AM 226


Engine Utilization (Tick %) User Busy System Busy I/O Busy Idle
------------------------- ------------ ------------ ---------- ----------
ThreadPool : syb_default_pool
Engine 0 5.0 % 0.4 % 22.4 % 72.1 %
Engine 1 3.9 % 0.5 % 22.8 % 72.8 %
Engine 2 5.6 % 0.3 % 22.5 % 71.6 %
Engine 3 5.1 % 0.4 % 22.7 % 71.8 %

------------------------- ------------ ------------ ---------- ----------
Pool Summary Total 336.1 % 25.6 % 1834.6 % 5803.8 %
Average 4.2 % 0.3 % 22.9 % 72.5 %

------------------------- ------------ ------------ ---------- ----------
Server Summary Total 336.1 % 25.6 % 1834.6 % 5803.8 %
Average 4.2 % 0.3 % 22.9 % 72.5 %

Transaction Profile
-------------------

Transaction Summary per sec per xact count % of total
------------------------- ------------ ------------ ---------- ----------
Committed Xacts 137.3 n/a 41198 n/a

Average Runnable Tasks 1 min 5 min 15 min % of total
------------------------- ------------ ------------ ---------- ----------
ThreadPool : syb_default_pool
Global Queue 0.0 0.0 0.0 0.0 %
Engine 0 0.0 0.1 0.1 0.6 %
Engine 1 0.0 0.0 0.0 0.0 %
Engine 2 0.2 0.1 0.1 2.6 %

------------------------- ------------ ------------ ----------
Pool Summary Total 7.2 5.9 6.1
Average 0.1 0.1 0.1

------------------------- ------------ ------------ ----------
Server Summary Total 7.2 5.9 6.1
Average 0.1 0.1 0.1


I need to capture "Jul 4 2016 7:21AM" as Date,
from "Engine Utilization (Tick%) line, Server Summary ->Average "4.2%"

From "Transaction Profile" section ->Transaction Profile "count" entry.

so, my data frame should look something like this:

Date Cpu Count
Jul 4 2016 7:21AM 4.2 41198


Can somebody help me how to parse this file to get these output?

I have tried something like this:

read.table(text=readLines("file.txt")[count.fields("file.txt", blank.lines.skip=FALSE) == 9])


to get this line:

Average 4.2 % 0.3 % 22.9 % 72.5 %


But I want to be able to only extract Average right after

Engine Utilization (Tick %), since there could be many lines that start with Average. The Average line that shows up right after Engine Utilization (Tick %), is the one I want.

How do I put that in this line to extract this information from this file:

read.table(text=readLines("file.txt")[count.fields("file.txt", blank.lines.skip=FALSE) == 9])


Can I use grep in this read.table line to search for certain characters?

Answer
extract <- function(filenam="file.txt"){
    txt <- readLines(filenam)

    ## date of current run:
    ## assumed to be on 2nd line following the first line matching "current_run"
    ii <- 2 + grep("current_run",txt, fixed=TRUE)[1]
    line_current_run <- Filter(function(v) v!="", strsplit(txt[ii]," ")[[1]])
    date_current_run <- paste(line_current_run[5:8], collapse=" ")


    ## Cpu:
    ## assumed to be on line following the first line matching "Server Summary"
    ## which comes after the first line matching "Engine Utilization ..."
    jj <- grep("Engine Utilization (Tick %)", txt, fixed=TRUE)[1]
    ii <- grep("Server Summary",txt, fixed=TRUE)
    ii <- 1 + min(ii[ii>jj])
    line_Cpu <- Filter(function(v) v!="", strsplit(txt[ii]," ")[[1]])
    Cpu <- line_Cpu[2]


    ## Count:
    ## assumed to be on 2nd line following the first line matching "Transaction Summary"
    ii <- 2 + grep("Transaction Summary",txt, fixed=TRUE)[1]
    line_count <- Filter(function(v) v!="", strsplit(txt[ii]," ")[[1]])
    count <- line_count[5]

    data.frame(Date=date_current_run, Cpu=Cpu, Count=count, stringsAsFactors=FALSE)
}

print(extract("file.txt"))

##file.list <- dir("./")
file.list <- rep("file.txt",3)
merged <- do.call("rbind", lapply(file.list, extract))

print(merged)

file.list <- rep("file.txt",2000)
print(system.time(merged <- do.call("rbind", lapply(file.list, extract))))
## runs in about 2.5 secs on my laptop
Comments