Scrobot Scrobot - 7 months ago 139
PHP Question

How to optimise hadle of big data on laravel?

Everyone!
I have some problem. My task is:
"To take transactions table, grouped row by transaction date and calculate statuses. This manipulations will be formed statistics, wich will be rendered on the page".

This is my method of this statistics generation

public static function getStatistics(Website $website = null)
{
if($website == null) return [];

$query = \DB::table('transactions')->where("website_id", $website->id)->orderBy("dt", "desc")->get();

$transitions = collect(static::convertDate($query))->groupBy("dt");
$statistics = collect();

dd($transitions);

foreach ($transitions as $date => $trans) {
$subscriptions = $trans->where("status", 'subscribe')->count();
$unsubscriptions = $trans->where("status", 'unsubscribe')->count();
$prolongations = $trans->where("status", 'rebilling')->count();
$redirections = $trans->where("status", 'redirect_to_lp')->count();
$conversion = $redirections == 0 ? 0 : ((float) ($subscriptions / $redirections));
$earnings = $trans->sum("pay");

$statistics->push((object)[
"date" => $date,
"subscriptions" => $subscriptions,
'unsubscriptions' => $unsubscriptions,
'prolongations' => $prolongations,
'redirections' => $redirections,
'conversion' => round($conversion, 2),
'earnings' => $earnings,
]);

}

return $statistics;
}


if count of transaction rows below 100,000 - it's all wright. But, if count is above 150-200k - nginx throw 502 bad gateway. What can you advise to me? I'm dont' have any expierince in bigdata handling. May be, my impiments has fundamental error?

Thanks.

Answer

So. After some days of learning info about this question, i'm find only one wright answer:

NOT to use PHP for handling raw data. It's better to use SQL!

In my case, we are use PostgreSQL.

Below, i'll write sql-query that help me, maybe it will help anybody else.

WITH
        cte_range(dt) AS
        (
            SELECT
                generate_series('2016-04-01 00:00:00'::timestamp with time zone, '{$date} 00:00:00'::timestamp with time zone, INTERVAL '1 day')
        ),

        cte_data AS
        (
            SELECT
                date_trunc('day', dt) AS dt,
                COUNT(*) FILTER (WHERE status = 'subscribe') AS count_subscribes,
                COUNT(*) FILTER (WHERE status = 'unsubscribe') AS count_unsubscribes,
                COUNT(*) FILTER (WHERE status = 'rebilling') AS count_rebillings,
                COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
                SUM(pay) AS earnings,
                CASE
                    WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0 THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
                    ELSE 0
                END
                AS conversion_percent

            FROM
                transactions

            WHERE
                website_id = {$website->id}

            GROUP BY
                date_trunc('day', dt)
        )

        SELECT
            to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
            COALESCE(cte_data.count_subscribes, 0) AS count_subscribe,
            COALESCE(cte_data.count_unsubscribes, 0) AS count_unsubscribes,
            COALESCE(cte_data.count_rebillings, 0) AS count_rebillings,
            COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
            COALESCE(cte_data.conversion_percent, 0) AS conversion_percent,
            COALESCE(cte_data.earnings, 0) AS earnings

        FROM
            cte_range

        LEFT JOIN
            cte_data
            ON cte_data.dt = cte_range.dt

        ORDER BY
            cte_range.dt DESC