Nabih Ibrahim Bawazir - 8 months ago
MySQL Question

Reading a database table into a pandas DataFrame in parts and showing how much time each part takes

For example, I have 88 million rows in my SQL table and I want to import them into pandas. I query one million rows at a time; how do I avoid hardcoding the process?

Here's my previous code

%%time
import mysql.connector as sql
import pandas as pd

db_connection = sql.connect(host='an aws link', database='p_v1', user='sci', password='tRaF')
db_cursor = db_connection.cursor()
db_cursor.execute('SELECT * FROM transactions t LIMIT 10')

table_rows = db_cursor.fetchall()

# first million rows
df1 = pd.read_sql('SELECT order_id, title, qty, amount FROM transactions t WHERE order_id <= 1000000', con=db_connection)


and

%%time
# second million rows
df2 = pd.read_sql('SELECT order_id, title, qty, amount FROM transactions t WHERE order_id > 1000000 AND order_id <= 2000000', con=db_connection)


etc.

After that

frames = [df1, df2, ..., df88]  # shortened to make it easier to read
result = pd.concat(frames)


I guess the solution involves chunking or a loop, but I still haven't found a better approach.

Answer Source

Since I can't comment on the question, here are the steps.

I'm quite sure any popular library has a ready-to-go tool for this case.
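For instance, pandas itself can do the chunking: passing chunksize to pd.read_sql makes it return an iterator of DataFrames instead of one huge frame. A minimal sketch, reusing db_connection from the question (the chunk size is illustrative):

%%time
import pandas as pd

# chunksize makes read_sql return an iterator of DataFrames,
# one per million rows, instead of loading everything at once
chunks = pd.read_sql(
    'SELECT order_id, title, qty, amount FROM transactions',
    con=db_connection,
    chunksize=1000000,
)
result = pd.concat(chunks)

This way nothing is hardcoded: the number of chunks follows from the table size, and pd.concat happily consumes the iterator.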

You need an algorithm; in this case I can think of something like this:

  1. Build a query that limits each fetch to 100,000 rows (for example)
  2. Repeat it 880 times to cover all the data, and don't forget to add %%time in the first part of the cell (see the sketch after this list)
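A sketch of that loop, assuming order_id values run contiguously from 1 up to 88 million (chunk_size and the bounds are illustrative):

%%time
import pandas as pd

chunk_size = 100000
n_chunks = 880  # 88 million rows / 100,000 rows per query

frames = []
for i in range(n_chunks):
    lower = i * chunk_size
    upper = (i + 1) * chunk_size
    # fetch one slice of the table per iteration
    query = (
        'SELECT order_id, title, qty, amount FROM transactions '
        'WHERE order_id > {} AND order_id <= {}'.format(lower, upper)
    )
    frames.append(pd.read_sql(query, con=db_connection))

result = pd.concat(frames)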

The problem may be in your MySQL connection, not in the code.
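One thing worth checking is whether the server drops the connection between long-running queries; mysql.connector's ping() can re-establish it. A minimal sketch (the attempts and delay values are illustrative):

# before each chunked query, verify the connection is alive
# and reconnect if the server has dropped it
db_connection.ping(reconnect=True, attempts=3, delay=5)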
