Merging two datasets only by first row in R

r merge

40 观看

1回复

662 作者的声誉

This question already has an answer here:

I need to merge two datasets, but in the second one, there can be duplicate id, for example several id of 1,1,1. If there are duplicate id, how to merge to the very first row of them?

To be more clear, here's a reproducible example:

df1
structure(list(id = 1:2, y = 10:11), .Names = c("id", "y"), class = "data.frame", row.names = c(NA, 
-2L))

df2
structure(list(id = c(1L, 1L, 1L, 2L), x1 = 435:438, x2 = c(435L, 
436L, 436L, 438L), x3 = c(435L, 436L, 436L, 438L)), .Names = c("id", 
"x1", "x2", "x3"), class = "data.frame", row.names = c(NA, -4L
))

Eaxample: In output i expect this format

id  y   x1  x2  x3
1   10  435 435 435
2   11  438 438 438

I.E. 2 and 3 rows (1 id) do not participate in merging.

作者: D.Joe 的来源 发布者: 2017 年 12 月 27 日

回应 1


1

1545 作者的声誉

You can do it using data.table. You can retain only first occurrence where id == 1 from your second data set and then merge both the data sets.

Here is the solution:

library(data.table)
setDT(df2)
df2[, idx := 1:.N, by = id]
df2 <- df2[idx == 1, ]
df2[, idx := NULL]
output <- merge(df1, df2, by = "id")
output

It'll give you your desired output:

 id  y  x1  x2  x3
1  1 10 435 435 435
2  2 11 438 438 438
作者: samadhi 发布者: 2017 年 12 月 27 日
32x32