Ke Tian - 4 months ago
HTML Question

Using Python to parse HTML data and store it in a database

This has been troubling me for two days. I am new to Python, and I want to parse the HTML data at the following link:

and then store the data in the PostgreSQL database named movie_db, which has a table named films created by the following command:

CREATE TABLE films (
    title varchar(128) NOT NULL,
    description varchar(256) NOT NULL,
    directors varchar(128)[],
    roles varchar(128)[]
);

I have parsed the data into four lists: title, description, director, and roles. For example, title = ['a', ..., 'b'], description = ['c', ..., 'f'], director = ['d', ..., 'g'], roles = [['f', 'g', 't'], ..., ['h', 't', 'u']].

sql = "INSERT INTO films (title, description, directors, roles) VALUES (%s, %s, %s, %s);"
for obj in zip(t, des, dirt, r):
    cur.execute(cur.mogrify(sql, obj))

This produces the following error:

psycopg2.DataError: malformed array literal: "サム・メンデス"

LINE 1: ...ームズ・ボンドの戦いを描く『007』シリーズ第24作', 'サム・メ...
DETAIL: Array value must start with "{" or dimension information.


I know this error. It means you are trying to insert a plain string value into an array column. You can verify this by printing the SQL that is actually generated:

sql2 = cur.mogrify(sql, obj)
print(sql2)

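Your directors fetched from the HTML are a flat list of strings, so after the zip function each obj contains the director as a single string (as the error message shows), while the directors column expects an array. The same applies to roles if any of its entries is a plain string rather than a list.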

In your case it looks like you are trying to insert only one row, so there is probably no need for zip.

I am not familiar with the API you used, but can you try printing the values received from the HTML before inserting them? Then I can provide the exact SQL required.
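If the cause is what is described above (each director arriving as a plain string while the column is an array), one possible fix is to wrap the string in a list before executing the statement, since psycopg2 adapts Python lists to PostgreSQL arrays. The following is only a minimal sketch under that assumption: the helper names build_film_rows and insert_films are hypothetical, and cur is assumed to be an open psycopg2 cursor on movie_db.

```python
INSERT_SQL = (
    "INSERT INTO films (title, description, directors, roles) "
    "VALUES (%s, %s, %s, %s);"
)


def build_film_rows(titles, descriptions, directors, roles):
    """Pair up the parsed lists into one tuple per film (hypothetical helper).

    A director given as a plain string is wrapped in a one-element list so
    that it matches the varchar(128)[] column.
    """
    rows = []
    for title, desc, director, cast in zip(titles, descriptions, directors, roles):
        director_list = director if isinstance(director, list) else [director]
        rows.append((title, desc, director_list, list(cast)))
    return rows


def insert_films(cur, rows):
    """Insert each row using an already open cursor (assumed psycopg2)."""
    for row in rows:
        # Pass the parameters directly; psycopg2 does the quoting, so a
        # separate mogrify() call is not needed here.
        cur.execute(INSERT_SQL, row)
```

With the lists from the question this could be called as insert_films(cur, build_film_rows(t, des, dirt, r)), followed by a commit on the connection.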