Jelmer - 3 years ago 73
R Question

# Merge 2 dataframes based on condition in R

I have the following 2 data frames that I want to merge:

``````x <- data.frame(a= 1:11, b =3:13, c=2:12, d=7:17, invoice = 1:11)
x =
a  b  c  d invoice
1  3  2  7       1
2  4  3  8       2
3  5  4  9       3
4  6  5 10       4
5  7  6 11       5
6  8  7 12       6
7  9  8 13       7
8 10  9 14       8
9 11 10 15       9
10 12 11 16      10
11 13 12 17      11

y <- data.frame(nr = 100:125, invoice = 1)
y\$invoice[12:26] <- 2

> y
nr invoice
100       1
101       1
102       1
103       1
104       1
105       1
106       1
107       1
108       1
109       1
110       1
111       2
112       2
113       2
114       2
115       2
116       2
117       2
``````

I want to merge the letters from dataframe X with dataframe Y when the invoice number is the same. It should start with merging the value from letter A, then B ect. This should be happening until the invoice number is not the same anymore and then choose the numbers from invoice nr 2.

the output should be like this:

``````> output
nr invoice  letter_count
100       1    1
101       1    3
102       1    2
103       1    7
104       1    1
105       1    3
106       1    2
107       1    7
108       1    1
109       1    2
110       1    7
111       2    2
112       2    4
113       2    3
114       2    8
115       2    2
116       2    4
``````

I tried to use the
`merge`
function with the
`by`
argument but this created an error that the number of rows is not the same. Any help I will appreciate.

Here is a solution using the `purrr` package.

``````# Prepare the data frames
x <- data.frame(a = 1:11, b = 3:13, c = 2:12, d = 7:17, invoice = 1:11)
y <- data.frame(nr = 100:125, invoice = 1)
y\$invoice[12:26] <- 2

library(purrr)

# Split the data based on invoice
y_list <- split(y, f = y\$invoice)

# Design a function to transfer data
trans_fun <- function(main_df, letter_df = x){

# Get the invoice number
temp_num<- unique(main_df\$invoice)
# Extract letter_count information from x
add_vec <- unlist(letter_df[letter_df\$invoice == temp_num, 1:4])
# Get the remainder of nrow(main_df) and length(add_vec)
reamin_num <- nrow(main_df) %% length(add_vec)
# Get the multiple of nrow(main_df) and length(add_vec)
multiple_num <- nrow(main_df) %/% length(add_vec)
# Create the entire sequence to add
# Add new column, add_seq2, to main_df

return(main_df)
}

# Apply the trans_fun function using map_df
output <- map_df(y_list, .f = trans_fun)

# See the result
output
nr invoice letter_count
1  100       1            1
2  101       1            3
3  102       1            2
4  103       1            7
5  104       1            1
6  105       1            3
7  106       1            2
8  107       1            7
9  108       1            1
10 109       1            3
11 110       1            2
12 111       2            2
13 112       2            4
14 113       2            3
15 114       2            8
16 115       2            2
17 116       2            4
18 117       2            3
19 118       2            8
20 119       2            2
21 120       2            4
22 121       2            3
23 122       2            8
24 123       2            2
25 124       2            4
26 125       2            3
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download