I'm looking at trying to query a table in a MySQL database (I have the primary key, which is comprised of two categories, a name and a number but string comparison), such that this table could have anywhere from very few rows to upwards of hundreds of millions. Now, for efficiency, I'm not exactly sure how costly it is to actually do an
Let the database do the work.
You should do the second method. If you don't want to get a failure, you can use
on duplicate key update:
insert into t(pk1, pk2, . . . ) values ( . . . ) on duplicate key update set pk1 = values(pk1);
The only purpose of
on duplicate key update is to do nothing useful but not return an error.
Why is this the best solution? In a database, a primary key (or columns declared unique) have an index structure. This is efficient for the database to use.
Second, this requires only one round-trip to the database.
Third, there are no race conditions, if you have multiple threads or applications that might be attempting to insert the same record(s).
Fourth, the method with
on duplicate key update will work for inserting multiple rows at once. (Without
on duplicate key insert, then a multi-value statement would fail if a single row is duplicated.) Combining multiple inserts into a single statement can be another big efficiency.
Your second option is really the right way to go.