Chris Chris - 1 month ago 14
MySQL Question

ElasticSearch usage with MySQL

I'm using ElasticSearch for the search component of a site. The data that is being indexed and eventually searched is the same data that is being saved in a MySQL DB.

My approach to this is to add/delete/modify data in the index when the corresponding CRUD MySQL operation happens.

For instance, a create operation looks something like this:

public function savePost(Request $request) {
//Firstly, create the object and save it to MySQL
$post = new Post();
$post->title = $request->title;
$post->body = $request->body;
//...
//and so on
$post->save();

//Secondly, index this new data:
$elasticSearchClient = ClientBuilder::create()->build();

$params = [
'index' => 'some_index_elasticsearch',
'id' => $post->id,
'type' => 'post',
'timestamp' => time(),
'body' => [
'id' => $post->id,
'title' => $post->title,
'body' => $post->body,
//... and so on
],
];

$elasticSearchClient->index($params);

}


If the data is deleted/updated in MySQL I'd just delete it or update it from the index.

Is this the right approach to using MySQL with ElasticSearch (or any other comparable technology like Sphinx)? or would you recommend a better approach to using MySQL as a more of a data source for ElasticSearch? (which really isn't happening at all here because there is no interaction between ElasticSearch and MySQL at all).

I'm using https://github.com/elastic/elasticsearch-php to interact with ElasticSearch if it makes any difference.

Just to clarify: this approach does work so far - I'm just not sure if it is the right way, or if anyone can see problems that I may run into with this way of doing things.

Val Val
Answer

There is no "right way" to use Elasticsearch. "Right" is relative, so the "right way" is a way that supports your use case(s). Elasticsearch doesn't only work for one specific use case, but for increasingly many more than one use case.

The case you describe is a perfectly valid one, i.e. indexing in ES whatever content you have in another RDBMS such as MySQL and making sure the indexed content is in synch with the primary source of truth.

One difficult thing in your use case that you need to keep in mind is that you have to guarantee that MySQL and ES are always 1:1 in synch, and that's not necessarily easy to do for various reasons:

  • what happens if you need to bring ES down for maintenance, but your app has to stay up for whatever reason?
  • what happens if there's an issue in ES and a document doesn't get indexed/updated/deleted? (remember there's no transactional support)

There are otherways to synch MySQL and ES that are less brittle, e.g. by using the binlog.

You need to ask yourself those questions and figure out a strategy to mitigate those potential problems, because I can assure you they (and others) will definitely arise.

To sum up, there's no problem with your architecture, thousands of companies do the exact same thing, however, you need to have a plan if your synch plan goes south.