I have a very big unindexed table called table with rows like this:
IP entrypoint timestamp
220.127.116.11 /page-title/?kw=abc 2016-04-14 11:59:52
18.104.22.168 /another-page/?kw=123 2016-04-12 04:13:20
22.214.171.124 /a-third-page/ 2016-05-12 09:43:30
Rows from the table
SELECT ... FROM very_big_unindexed_table t
only within the past week...
WHERE t.timestamp >= NOW() + INTERVAL - 1 WEEK
that contains '?kw=' in the entry point
AND t.entrypoint LIKE '%?kw=%'
only the latest row for each IP. There's a couple of approaches to that. A correlated subquery on a very big unindexed table is going to eat your lunch and your lunch box. And without an index, there's no getting around a full scan of the table and a "Using filesort" operation.
Given the unfortunate circumstances, our best bet for performance is likely going to be getting the set whittled down as small as we can, and then perform the sort, and avoid any join operations (back to that table) and avoid correlated subqueries.
So, let's start with something like this, to return all of the rows from the past week with '?kw=' in entry point. This is going to be full scan of the table, and a sort operation...
SELECT t.ip , t.timestamp , t.entry_point FROM very_big_unindexed_table t WHERE t.timestamp >= NOW() + INTERVAL -1 WEEK AND t.entrypoint LIKE '%?kw=%' ORDER BY t.ip DESC, t.timestamp DESC
We can use an unsupported trick with user-defined variables. (The MySQL Reference Manual specifically warns against using a pattern like this, because the behavior is (officially) undefined. Unofficially, the optimizer in MySQL 5.1 and 5.5 (at least) is very predictable.
I think this is going to be about as good as you are going to get, if the number of rows from the past week are significant subset of the entire table. This is going to create a sizable intermediate resultset (derived table), if there are lot of rows that satisfy the predicates.
SELECT q.ip , q.entrypoint , q.timestamp FROM ( SELECT IF(t.ip = @prev_ip, 0, 1) AS new_ip , @prev_ip := t.ip AS ip , t.timestamp AS timestamp , t.entrypoint AS entrypoint FROM (SELECT @prev_ip := NULL) i CROSS JOIN very_big_unindexed_table t WHERE t.timestamp >= NOW() + INTERVAL -1 WEEK AND t.entrypoint LIKE '%?kw=%' ORDER BY t.ip DESC, t.timestamp DESC ) q WHERE q.new_ip
Execution of that query will require (in terms of what's going to take the time)