jonasv jonasv - 5 months ago 18
SQL Question

Skipping first column with LOAD DATA INFILE

I have got such table:

mysql> show create table final\G;
*************************** 1. row ***************************
Table: final
Create Table: CREATE TABLE `final` (
`id` int(4) NOT NULL AUTO_INCREMENT,
`cdatetime` varchar(255) NOT NULL,
`address` varchar(255) NOT NULL,
`district` varchar(255) NOT NULL,
`beat` varchar(255) NOT NULL,
`grid` varchar(255) NOT NULL,
`crimedescr` varchar(255) NOT NULL,
`ucr_ncic_code` varchar(255) NOT NULL,
`latitude` varchar(255) NOT NULL,
`longitude` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)


And I have got csv file which looks like this:

cdatetime,address,district,beat,grid,crimedescr,ucr_ncic_code,latitude,longitude
1/1/06 0:00,3108 OCCIDENTAL DR,3,3C ,1115,10851(A)VC TAKE VEH W/O OWNER,2404,38.55042047,-121.3914158
1/1/06 0:00,2082 EXPEDITION WAY,5,5A ,1512,459 PC BURGLARY RESIDENCE,2204,38.47350069,-121.4901858
1/1/06 0:00,4 PALEN CT,2,2A ,212,10851(A)VC TAKE VEH W/O OWNER,2404,38.65784584,-121.4621009
1/1/06 0:00,22 BECKFORD CT,6,6C ,1443,476 PC PASS FICTICIOUS CHECK,2501,38.50677377,-121.4269508


What I want to do is load that CSV file into table final. Problem is that csv file doesn't have ID column so I'm thinking is it possible to somehow tell mysql to skip column ID and load data into the rest columns, but ID must be used. So ideally it would look like this:

"1/1/06 0:00,3108 OCCIDENTAL DR,3,3C ,1115,10851(A)VC TAKE VEH W/O OWNER,2404,38.55042047,-121.3914158" gets loaded into columns and mysql automatically add 1 to column ID, then "1/1/06 0:00,2082 EXPEDITION WAY,5,5A ,1512,459 PC BURGLARY RESIDENCE,2204,38.47350069,-121.4901858" gets loaded and mysql add 2 to ID column etc etc..

Recently user 'Shadow' told me that I should specify which column I want to load so I did something like this:

load data infile '/SacramentocrimeJanuary2006.csv' INTO TABLE final (cdatetime, address, district, beat, grid, crimedescr, ucr_ncic_code, latitude, longitude);


Mysql returns:

ERROR 1261 (01000): Row 1 doesn't contain data for all columns


According to mysql load data infile manual field delimiter is not "," so I tried to change it by adding FIELDS TERMINATED BY ',' at the end of my statement but this breaks query. What is the correct syntax here ?

Thanks

ANSWER

mysql> CREATE TABLE `final` (
-> `id` int(4) NOT NULL AUTO_INCREMENT,
-> `cdatetime` longtext NULL,
-> `address` longtext NULL,
-> `district` longtext NULL,
-> `beat` longtext NULL,
-> `grid` longtext NULL,
-> `crimedescr` longtext NULL,
-> `ucr_ncic_code` longtext NULL,
-> `latitude` longtext NULL,
-> `longitude` longtext NULL,
-> PRIMARY KEY (`id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.17 sec)

mysql> LOAD DATA infile '/SacramentocrimeJanuary2006.csv' INTO TABLE final FIELDS TERMINATED BY ',' lines terminated by '\r' IGNORE 1 ROWS (cdatetime, address, district, beat, grid, crimedescr, ucr_ncic_code, latitude, longitude);
Query OK, 7584 rows affected (0.08 sec)
Records: 7584 Deleted: 0 Skipped: 0 Warnings: 0

Answer

Linux:

LOAD DATA INFILE '/home/frank/try_this123.txt'
INTO TABLE final
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(cdatetime, address,district,beat,grid,crimedescr,ucr_ncic_code,latitude,longitude)
set id = NULL;

or Windows:

LOAD DATA INFILE 'c:\\nate\\try_this123.txt'
INTO TABLE final
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(cdatetime, address,district,beat,grid,crimedescr,ucr_ncic_code,latitude,longitude)
set id = NULL;

.

mysql> select * from final;
+----+-------------+---------------------+----------+------------+------+-------------------------------+---------------+-------------+---------------+
| id | cdatetime   | address             | district | beat       | grid | crimedescr                    | ucr_ncic_code | latitude    | longitude     |
+----+-------------+---------------------+----------+------------+------+-------------------------------+---------------+-------------+---------------+
 | 1 | 1/1/06 0:00 | 3108 OCCIDENTAL DR  | 3        | 3C         | 1115 | 10851(A)VC TAKE VEH W/O OWNER | 2404          | 38.55042047 | -121.3914158
 | 2 | 1/1/06 0:00 | 2082 EXPEDITION WAY | 5        | 5A         | 1512 | 459 PC  BURGLARY RESIDENCE    | 2204          | 38.47350069 | -121.4901858
 | 3 | 1/1/06 0:00 | 4 PALEN CT          | 2        | 2A         | 212  | 10851(A)VC TAKE VEH W/O OWNER | 2404          | 38.65784584 | -121.4621009
 | 4 | 1/1/06 0:00 | 22 BECKFORD CT      | 6        | 6C         | 1443 | 476 PC PASS FICTICIOUS CHECK  | 2501          | 38.50677377 | -121.4269508
+----+-------------+---------------------+----------+------------+------+-------------------------------+---------------+-------------+---------------+

I got that to work without any enclosing demarcations like single or double quotes. The problem is, what is going to happen when, say, your address has commas and it throws off all your data with a shifting problem.

That is why, ideally (read: almost absolutely), you need the data wrapped in double quotes in general unless your data is generated by you and almost simplistic, like:

1,2,cat,14,8

So, in the case of 3rd party systems when there is no control over how the data is pumped in, people have to write ETL routines to scrub the data first to get the data ready for imports with adequate fail-safe wrappers.

Comments