Jonathan Stegall Jonathan Stegall - 2 months ago 14
MySQL Question

SQL to migrate author/story relationships from Drupal to WordPress

I'm working on a series of SQL (MySQL) queries to migrate a site from Drupal 6 to WordPress (4.5.3 currently). I'm running all the queries on localhost on my laptop (so it just moves data from one database to another), so at this time performance is not a concern.

I'm having trouble migrating the relationships between author nodes and story nodes. In Drupal it works like this:


  • There is an article node. For this purpose it is not custom. There are 50,000+ of these.

  • There is a custom content type for authors (because many authors are not actually Drupal users, and also because it allows us a more custom display for author pages). There are ~2000 of these.

  • There is a
    content_field_op_author
    table with the following columns: vid (for revisions), nid (for the article node), and field_op_author_nid (for the author node).



I tried running this query to understand how many pairs there are:

SELECT COUNT(*) FROM content_field_op_author GROUP BY nid;


This resulted in 57,898 rows (which seems realistic to me).

In WordPress, the setup is like this:


  • wp_posts has the standard post content type. All articles have been successfully imported, and the primary ID matches the node rows in Drupal.

  • There is a custom post type for guest author (this is run by a plugin). All ~2000 have been successfully imported, and the primary ID matches the node rows in Drupal here as well.

  • Each guest author also exists as a row in
    wp_terms
    and
    wp_term_taxonomy
    . There is a column I've created called
    user_node_id_old
    just to preserve the node ID for the user's post ID. I'm not sure if this is necessary, but it does exist.



I've been unable to create the correct rows in
wp_term_relationships
with some different query attempts.

Queries I've tried:

INSERT IGNORE INTO `wordpress`.wp_term_relationships(object_id, term_taxonomy_id)
SELECT nid as object_id, tax.term_taxonomy_id as term_taxonomy_id
FROM `drupal`.content_field_op_author author
INNER JOIN `wordpress`.wp_terms t ON t.user_node_id_old = author.field_op_author_nid
INNER JOIN `wordpress`.wp_term_taxonomy tax ON t.term_id = tax.term_id
INNER JOIN `wordpress`.wp_posts p ON author.nid = p.Id
WHERE field_op_author_nid IS NOT NULL
GROUP BY object_id
;


This results in 52,754 rows being added to
wp_term_relationships
which seems like a bizarre gap to me.

I ran this query to try to find rows that were in my Drupal database but not in WordPress:

SELECT DISTINCT `wordpress`.p.ID as wordpress_post_id, `wordpress`.t.name as wordpress_author_name, n.nid as drupal_id, au.title as drupal_author_name
FROM `wordpress`.wp_posts p
INNER JOIN `wordpress`.wp_term_relationships r ON r.object_id = p.ID
INNER JOIN `wordpress`.wp_term_taxonomy tax ON tax.term_taxonomy_id = r.term_taxonomy_id
INNER JOIN `wordpress`.wp_terms t ON t.term_id = tax.term_id
LEFT OUTER JOIN `drupal`.node n ON p.ID = n.nid
LEFT OUTER JOIN `drupal`.content_field_op_author a ON n.nid = a.nid
LEFT OUTER JOIN `drupal`.node au ON a.field_op_author_nid = au.nid
WHERE tax.taxonomy = 'author'
AND `wordpress`.t.name != au.title
;


This resulted in 333 rows.

I tried this one as well:

INSERT IGNORE INTO `wordpress`.wp_term_relationships(object_id, term_taxonomy_id, user_node_id_old)
SELECT nid as object_id, 0 as term_taxonomy_id, field_op_author_nid as user_node_id_old
FROM `drupal`.content_field_op_author
WHERE field_op_author_nid IS NOT NULL
GROUP BY CONCAT(nid, field_op_author_nid)
;


This resulted in 331 rows. I think this query is much more accurate with what it's actually migrating, but it still doesn't get the number match I think I need. It also doesn't seem relevant to the difference in total row count.

Answer

I've determined that at least part of the reason for my mismatch is that Drupal stores the story/author pair in whatever order it wants to. It's not alphabetical, or by the author's node ID. I discovered this after many different ways of trying to sort the insert before it went into WordPress.

I tried this query, and it resulted in 0 rows. This is as it should be, but I'm still a bit nervous that this means everything was successful:

SELECT DISTINCT `wordpress`.p.ID as wordpress_post_id, `wordpress`.t.name as wordpress_author_name, n.nid as drupal_id, au.title as drupal_author_name
FROM `wordpress`.wp_posts p
INNER JOIN `wordpress`.wp_term_relationships r ON r.object_id = p.ID
INNER JOIN `wordpress`.wp_term_taxonomy tax ON tax.term_taxonomy_id = r.term_taxonomy_id
INNER JOIN `wordpress`.wp_terms t ON t.term_id = tax.term_id
LEFT OUTER JOIN `drupal`.node n ON p.ID = n.nid
LEFT OUTER JOIN `drupal`.content_field_op_author a ON n.nid = a.nid
LEFT OUTER JOIN `drupal`.node au ON t.name = au.title
WHERE tax.taxonomy = 'author'
AND `wordpress`.t.name != au.title
;

This query - I think - does a join on the name of the author rather than the ID it comes in with, which should allow for a match even if the order is flipped. This flipped order was what I kept noticing with the previous query - if a story had multiple authors, their order would be flipped between the two systems.

Comments