user1255553 - 7 months ago
PHP Question

Improving the speed of Neo4j recommendations

I'm trying to create a simple recommendation engine using Neo4j and Reco4PHP.

The data model consists of the following nodes and relationships:


(User)-[:HAS_BOUGHT]->(Product {category_id: int})-[:DESIGNED_BY]->(Designer)


In this system I want to recommend products and boost those that share a designer with products the user has already bought. To create the recommendations I use one discovery class and one post-processor class that boosts the products (see below). This works, but it is very slow: it takes more than 5 seconds to complete, even though the data model holds only ~1000 products and ~100 designers.

// Discovery class
<?php
namespace App\Reco4PHP\Discovery;
use GraphAware\Common\Cypher\Statement;
use GraphAware\Common\Type\NodeInterface;
use GraphAware\Reco4PHP\Engine\SingleDiscoveryEngine;

class InCategory extends SingleDiscoveryEngine {

    protected $categoryId;

    public function __construct($categoryId) {
        $this->categoryId = $categoryId;
    }

    /**
     * @return string The name of the discovery engine
     */
    public function name() {
        return 'in_category';
    }

    /**
     * The statement to be executed for finding items to be recommended
     *
     * @param \GraphAware\Common\Type\NodeInterface $input
     * @return \GraphAware\Common\Cypher\Statement
     */
    public function discoveryQuery(NodeInterface $input) {

        $query = "
        MATCH (reco:Card)
        WHERE reco.category_id = {category_id}
        RETURN reco, 1 as score
        ";

        return Statement::create($query, ['category_id' => $this->categoryId]);
    }
}

// Boost shared designers
class RewardSharedDesigners extends RecommendationSetPostProcessor {

    public function buildQuery(NodeInterface $input, Recommendations $recommendations)
    {
        $ids = [];
        foreach ($recommendations->getItems() as $recommendation) {
            $ids[] = $recommendation->item()->identity();
        }

        $query = 'UNWIND {ids} as id
        MATCH (reco) WHERE id(reco) = id
        MATCH (user:User) WHERE id(user) = {userId}
        MATCH (user)-[:HAS_BOUGHT]->(product:Product)-[:DESIGNED_BY]->()<-[:DESIGNED_BY]-(reco)

        RETURN id, count(product) as sharedDesignedBy';

        return Statement::create($query, ['ids' => $ids, 'userId' => $input->identity()]);
    }

    public function postProcess(Node $input, Recommendation $recommendation, Record $record) {
        $recommendation->addScore($this->name(), new SingleScore((int)$record->get('sharedDesignedBy')));
    }

    public function name() {
        return 'reward_shared_designers';
    }
}


I'm happy that it works, but if it takes more than 5 seconds to compute, it is not usable in a production environment.

To improve the speed I have:


  • Created indexes on Product:id and Designer:id

  • Added node_auto_indexing=true to neo4j.properties

  • Added -Xmx4096m to .neo4j-community.vmoptions

But it doesn't really make a difference.



Is it normal that these Cypher queries take more than 5 seconds, or are there improvements possible? :)

Answer

The main problem is your post-processor query. The goal is:

Boost the recommendation based on the number of products I bought from the designer having designed the recommended item.

Therefore, you can modify your query slightly to match the designer directly and aggregate on it. It is also best to find the user before the UNWIND; otherwise the user is matched again on every iteration over the product ids:

MATCH (user) WHERE id(user) = {userId}
UNWIND {ids} as productId
MATCH (product:Product)-[:DESIGNED_BY]->(designer)
WHERE id(product) = productId
WITH productId, designer, user
MATCH (user)-[:HAS_BOUGHT]->(p)-[:DESIGNED_BY]->(designer)
RETURN productId as id, count(*) as score
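
To make the aggregation concrete, here is a small worked example on made-up data. It is only a sketch: the name property and the sample nodes are hypothetical, and products are matched by name instead of internal id so that the example is reproducible. The relationship types follow the model in the question.

```cypher
// Hypothetical sample data: alice bought p1 and p2, both designed by d1.
// r1 (designed by d1) and r2 (designed by d2) stand in for the recommended products.
CREATE (u:User {name: 'alice'}),
       (d1:Designer {name: 'd1'}),
       (d2:Designer {name: 'd2'}),
       (p1:Product {name: 'p1'})-[:DESIGNED_BY]->(d1),
       (p2:Product {name: 'p2'})-[:DESIGNED_BY]->(d1),
       (r1:Product {name: 'r1'})-[:DESIGNED_BY]->(d1),
       (r2:Product {name: 'r2'})-[:DESIGNED_BY]->(d2),
       (u)-[:HAS_BOUGHT]->(p1),
       (u)-[:HAS_BOUGHT]->(p2);

// Same shape as the post processor query: find the user once, then for each
// recommended product count the products the user bought from its designer.
MATCH (user:User {name: 'alice'})
UNWIND ['r1', 'r2'] AS productName
MATCH (product:Product {name: productName})-[:DESIGNED_BY]->(designer)
WITH productName, designer, user
MATCH (user)-[:HAS_BOUGHT]->(p)-[:DESIGNED_BY]->(designer)
RETURN productName AS name, count(*) AS score;

// Expected: a single row (name: 'r1', score: 2).
// r2 produces no row because alice bought nothing designed by d2.
```

Note that a recommended product whose designer the user has never bought from does not appear in the result at all, so it receives no boost from this query.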

The complete post processor would look like this:

    public function buildQuery(NodeInterface $input, Recommendations $recommendations)
    {
        $ids = [];
        foreach ($recommendations->getItems() as $recommendation) {
            $ids[] = $recommendation->item()->identity();
        }

        $query = 'MATCH (user) WHERE id(user) = {userId}
        UNWIND {ids} as productId
        MATCH (product:Product)-[:DESIGNED_BY]->(designer)
        WHERE id(product) = productId
        WITH productId, designer, user
        MATCH (user)-[:HAS_BOUGHT]->(p)-[:DESIGNED_BY]->(designer)
        RETURN productId as id, count(*) as score';

        return Statement::create($query, ['userId' => $input->identity(), 'ids' => $ids]);
    }

    public function postProcess(Node $input, Recommendation $recommendation, Record $record)
    {
        $recommendation->addScore($this->name(), new SingleScore($record->get('score')));
    }
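
One further thing that may be worth checking, separate from the post processor: the discovery query filters on category_id, while the indexes mentioned in the question are on an id property that neither query uses (both look nodes up by internal id via id(n)). Below is a minimal sketch, assuming Neo4j 3.x index syntax and that the products carry the Product label. The discovery query in the question uses Card, so the label would need to match whichever one is actually in the graph. The category value 42 is a made-up placeholder.

```cypher
// Schema index on the property the discovery query filters on
// (Neo4j 3.x syntax; newer versions use CREATE INDEX FOR ... ON ...).
CREATE INDEX ON :Product(category_id);

// PROFILE shows the execution plan and db hits, which helps confirm whether
// the index is used (NodeIndexSeek) or the whole label is scanned (NodeByLabelScan).
PROFILE
MATCH (reco:Product)
WHERE reco.category_id = 42
RETURN reco, 1 AS score;
```

With only ~1000 products the gain from this index is likely to be small. Running PROFILE on the post processor query in the same way is probably the more useful check.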

I have created a repository with a fully functional implementation that follows your domain:

https://github.com/ikwattro/reco4php-example-so