I have two tables and I would like to append them so that only all the data in table A is retained and data from table B is only added if its key is unique (Key values are unique in table A and B however in some cases a Key will occur in both table A and B).
I think the way to do this will involve some sort of filtering join (anti-join) to get values in table B that do not occur in table A then append the two tables.
I am familiar with R and this is the code I would use to do this in R.
## Filtering join to remove values already in "TableA" from "TableB"
FilteredTableB <- anti_join(TableB,TableA, by = "Key")
## Append "FilteredTableB" to "TableA"
CombinedTable <- bind_rows(TableA,FilteredTableB)
Consider the following dataframes
TableA = pd.DataFrame(np.random.rand(4, 3), pd.Index(list('abcd'), name='Key'), ['A', 'B', 'C']).reset_index() TableB = pd.DataFrame(np.random.rand(4, 3), pd.Index(list('aecf'), name='Key'), ['A', 'B', 'C']).reset_index()
This is one way to do what you want
# Identify what values are in TableB and not in TableA key_diff = set(TableB.Key).difference(TableA.Key) where_diff = TableB.Key.isin(key_diff) # Slice TableB accordingly and append to TableA TableA.append(TableB[where_diff], ignore_index=True)
rows =  for i, row in TableB.iterrows(): if row.Key not in TableA.Key.values: rows.append(row) pd.concat([TableA.T] + rows, axis=1).T
4 rows with 2 overlap
Method 1 is much quicker
10,000 rows 5,000 overlap
loops are bad