duckman duckman - 6 months ago 41
Python Question

how to create a SQL database from a csv file in Python

I need to load in a large dataset (20gb for now but will be 100 gb in the future) from a csv file. I am using the MySQLdb module in python (PyCharm). I also need to only select some certain columns. So far I have tried this:

import csv
import MySQLdb

mydb = MySQLdb.connect(host='localhost',
user='root',
passwd='',
db='mydb')
cursor = mydb.cursor()

csv_data = csv.reader(file('collected_quotes_sample.csv'))
for row in csv_data:
cursor.execute('INSERT INTO testcsv(RIC, Date, Time, Ask, Bid, BAS, window ) VALUES(%s, %s, %s, %s, %s, %s, %s)', row)
#close the connection to the database.
mydb.commit()
cursor.close()
print "Done"


but it provides many errors. I am new to python and SQL database so I am not familiar with the commands/codes. So I have a few questions:

1. with MySQLdb.connect( host,user,passwd,db etc), what are host, user, password etc? to my understanding, they are credentials for my account in the computer. so do I need to put in my user account and password?
2. What does mydb.cursor do?
3. How to upload a csv file into a SQL database? and after the database is created, I can write a python script to work on it and there is no need to re read/create the database?


Thanks a lots!

Answer

To answer your questions

  1. with MySQLdb.connect( host,user,passwd,db etc), what are host, user, password etc? to my understanding, they are credentials for my account in the computer. so do I need to put in my user account and password?
  • Answer: host is the server IP address + the port num. user/pwd is the client user you created on the server side. And one host can have many DB created inside, so you need to specify the DB. Usually when you started, you are likely using the localhost(127.0.0.1), and the port number is what you defined when you create the server. Once the server is started, one or more clients can be connected to the database server. You then need to have a superuser (like root) with all the privileges and several other normal users (probably with less privileges).
  1. What does mydb.cursor do?
  • Answer: cursor is a object that can execute operations such as SQL statements. You always need Cursor objects interact with the MySQL server. If you are using native SQL script to interact with the MySQL server, you don't really need this cursor object, but since you are using MySQLdb as a Python wrapper, then you need to use it as the DB-API requires that you interface to them that way (The cursor object is an abstraction specified in the Python DB-API 2.0).
  1. How to upload a csv file into a SQL database? and after the database is created, I can write a python script to work on it and there is no need to re read/create the database?
  • Answer: Generally, the code you shown in the question is performing the uploading process. So once you uploaded successfully, you can then retrieve it later by implementing some SQL retrieving statement in your Python by using MySQLdb. Once the DB is created, you will rarely need to re-create unless you would like to delete all your old data. And to work with it, you would always need to read from the DB. But you can put the commonly used SQL (or call some stored function/procedure) into your Python code so that you can just call one function to retrieve the data in the way you want.

Generally, I think you should understand more about MySQL basics before rushing to use them. And also how to Install MySQL on Windows

Comments