Mohsin Waqas - 1 year ago
R Question

How to extract multi-digit numeric values from a data frame in R

Below is a small data frame built from a model output file. I extracted the required parameters, time and WatBalR, and converted them into a data frame. The complete code is:

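library(stringr)   # loaded, but not used by the base R code below

# Read the model output file, one element per line
x <- readLines("G:/Rlearning/Mohsin-FM/Balance.out")

# "[T]" is a regex character class: this keeps every line with a capital T
a <- grep("[T]", x, value = TRUE)
# ...then only the lines mentioning "Time"
b <- grep("Time", a, value = TRUE)

# Drop the first two matching lines
c <- b[-c(1, 2)]
# Lines holding the water-balance ratio
d <- grep("WatBalR", x, value = TRUE)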
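As a side note on the first grep() call: in a regular expression, [T] is a character class, so it matches any line containing a capital T, not only lines containing the literal text [T]. The second grep("Time", ...) narrows the result anyway, so this may be harmless here. A minimal sketch with made-up lines (the real Balance.out layout is an assumption) shows what fixed = TRUE changes:

```r
# Hypothetical lines in the style of Balance.out (made-up sample data)
x <- c(" Time       [T]        3.0000",
       " Total water balance",
       " WatBalR  [%]              0.040")

# Regex: "[T]" is a character class, so any line with a capital T matches
regex_hits <- grep("[T]", x, value = TRUE)
regex_hits      # lines 1 and 2 ("Total" also contains a T)

# fixed = TRUE: "[T]" is treated as the literal three-character string
literal_hits <- grep("[T]", x, value = TRUE, fixed = TRUE)
literal_hits    # line 1 only
```

Escaping the brackets, as in grep("\\[T\\]", x), would have the same effect as fixed = TRUE here.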


The data frame looks like this:

data <- data.frame(time = c, watbalr = d)

> data
               time             watbalr
1  Time [T]  3.0000  WatBalR [%] 0.040
2  Time [T]  6.0000  WatBalR [%] 0.024
3  Time [T]  9.0000  WatBalR [%] 0.044
4  Time [T] 30.0000  WatBalR [%] 0.034


I checked the classes: c and d are character vectors, and data is a data frame.

> c
[1] " Time [T] 3.0000" " Time [T] 6.0000"
[3] " Time [T] 9.0000" " Time [T] 30.0000"

> class(c)
[1] "character"



> d
[1] " WatBalR [%] 0.040" " WatBalR [%] 0.024"
[3] " WatBalR [%] 0.044" " WatBalR [%] 0.034"

> class(d)
[1] "character"

> class(data)
[1] "data.frame"


The code I use to extract the values is shown below. However, it only keeps a single digit (0 to 9), so any time above 9 comes out wrong; for example, 30 becomes 0.

times <- sub("^.+?(\\d)", "\\1", c)
WatBlaR <- sub("^.+?(\\d)", "\\1", d)

times <- as.numeric(times)
WatBlaR <- as.numeric(WatBlaR)

# plot
plot(x = times, y = WatBlaR)
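The capture group (\\d) in the pattern above holds exactly one digit, so a two-digit time such as 30 cannot come through intact. A minimal sketch comparing a one-digit pattern with a multi-digit one, using base R's regexpr() and regmatches() on the four strings printed for c above (spacing simplified, and named s here to avoid reusing c):

```r
# The four strings printed for c above (spacing simplified)
s <- c(" Time [T] 3.0000", " Time [T] 6.0000",
       " Time [T] 9.0000", " Time [T] 30.0000")

# "\\d" matches exactly one digit, so the two-digit time loses a digit
one_digit <- regmatches(s, regexpr("\\d", s))
one_digit
# [1] "3" "6" "9" "3"

# "\\d+" takes the whole run of digits; "\\.?\\d*" adds an optional decimal part
full <- as.numeric(regmatches(s, regexpr("\\d+\\.?\\d*", s)))
full
# [1]  3  6  9 30
```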


For the four values in the data frame above, the result is:

> times
[1] 3 6 9 0


The expected times, however, are:

3, 6, 9, 30


When I extract the daily model output, the values come out as:

> times
0,1,2,3,4,5,6,7,8,9, 0,1,2,3,4,5,6,7,8,9, 0,1,2,3,4,5,6,7,8,9


It just repeats the sequence 0 to 9, whereas the required output is:

> times
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30

Answer

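You can extract the numbers with sub() from base R. The pattern skips any leading non-digit characters and then captures:

  • one or more digits, followed by
  • an optional dot, followed by
  • zero or more digits

Everything after the number is matched too, so the whole string is replaced by the captured number.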

This is how you could do it:

library(magrittr)   ## For the pipe %>%

# Some sample data
data <- data.frame(
  time    = c(" Time       [T]        3.0000",
              " Time       [T]        6.0000",
              " Time       [T]        9.0000",
              " Time       [T]       30.0000"),
  watbalr = c(" WatBalR  [%]              0.040",
              " WatBalR  [%]              0.024",
              " WatBalR  [%]              0.044",
              " WatBalR  [%]              0.034"),
  stringsAsFactors = FALSE
)

## Extract the pattern and convert to numeric:
times <- sub("[^[:digit:]]*(\\d+\\.?\\d*).*", "\\1", data$time) %>%
  as.numeric()
WatBalR <- sub("[^[:digit:]]*(\\d+\\.?\\d*).*", "\\1", data$watbalr) %>%
  as.numeric()

> times
# [1]  3  6  9 30
> WatBalR
# [1] 0.040 0.024 0.044 0.034
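A base R variant without magrittr, sketched under the assumption that each string holds exactly one number and that it is the first run of digits in the string. The helper name extract_number is hypothetical. It uses regexpr() and regmatches(); stringr::str_extract() (the question already loads stringr) could serve the same purpose. The daily vector is made-up data in the question's layout, used to check that times 1 to 30 come back intact:

```r
# Hypothetical helper: return the first number found in each string
extract_number <- function(x) {
  # one or more digits, optionally followed by a dot and more digits
  as.numeric(regmatches(x, regexpr("[0-9]+\\.?[0-9]*", x)))
}

time_str <- c(" Time       [T]        3.0000", " Time       [T]       30.0000")
extract_number(time_str)
# [1]  3 30

# Made-up daily output with times 1 to 30 in the same layout
daily <- sprintf(" Time       [T]   %10.4f", as.numeric(1:30))
all(extract_number(daily) == 1:30)
# [1] TRUE
```

One caveat: regmatches() silently drops strings that contain no match, so the result can be shorter than the input. The sub() approach instead returns such strings unchanged, which as.numeric() turns into NA with a warning, keeping the result aligned with the rows of the data frame.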