I want to upload a large CSV file (approx. 10,000,000 records) into a MySQL table that already contains the same number of records or more, including some duplicates. I tried LOAD DATA LOCAL INFILE but it is also taking a long time. How can I resolve this without waiting so long? If it can't be resolved, how can I do it with AJAX, sending and processing a few records at a time until the whole CSV is uploaded/processed?
- You'll have to first explain this new number notation you've come up with. – mowwwalker, Sep 28, 2011 at 20:51
- Is it a billion records or 10 million? – webbiedave, Sep 28, 2011 at 20:52
- Via AJAX it would be even slower. If you want the LOAD DATA INFILE commands to not take so long, break up the CSV into smaller chunks. – Marc B, Sep 28, 2011 at 20:53
- Adjusting your commas, it seems it's a million? Is that correct? We need to know the scale we're talking about. – Cyclone, Sep 28, 2011 at 20:53
- @Walkerneo it's the Indian style of writing. 10 million = 1,00,00,000 = 1 crore. – akashdeep, Sep 28, 2011 at 20:59
6 Answers
LOAD DATA INFILE isn't going to be beaten speed-wise. There are a few things you can do to speed it up:
- Drop or disable some indexes (but of course, you'll get to wait for them to build after the load; this is often faster overall). If you're using MyISAM, you can ALTER TABLE foo DISABLE KEYS, but InnoDB doesn't support that, unfortunately. You'll have to drop the indexes instead (see the sketch after this list).
- Optimize your my.cnf settings. In particular, you may be able to disable a lot of safety features (like fsync). Of course, if you take a crash, you'll have to restore from a backup and start the load over again. Also, if you're running the default my.cnf, last I checked it's pretty sub-optimal for a database machine. Plenty of tuning guides are around.
- Buy faster hardware. Or rent some (e.g., try a fast Amazon EC2 instance).
- As @ZendDevel mentions, consider other data storage solutions, if you're not locked into MySQL. For example, if you're just storing a list of telephone numbers (and some data with them), a plain hash table is going to be many times faster.
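A minimal sketch of the first point, assuming a MyISAM table; the table name whitelist and the file path are placeholders:

-- Only works on MyISAM; with InnoDB you would drop and re-create the indexes instead.
ALTER TABLE whitelist DISABLE KEYS;

LOAD DATA LOCAL INFILE '/path/to/yourcsvfile.csv'
INTO TABLE whitelist
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n';

-- Rebuild the indexes in one pass after the bulk load (this is where the wait moves to).
ALTER TABLE whitelist ENABLE KEYS;

Rebuilding all keys once after the load is usually faster than maintaining them row by row during the insert.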
If the problem is that it's killing database performance, you can split your CSV file into multiple smaller files and load them in chunks.
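For example, assuming the CSV has already been split into hypothetical chunk files chunk_01.csv, chunk_02.csv, and so on (table name is again a placeholder):

LOAD DATA LOCAL INFILE '/path/to/chunk_01.csv'
INTO TABLE whitelist
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n';

-- Repeat for chunk_02.csv, chunk_03.csv, ...; pausing between loads
-- keeps the database responsive for other queries.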
Try this:
LOAD DATA LOCAL INFILE '/yourcsvfile.csv'
INTO TABLE yourtable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
Depending on your storage engine this can take a long time. I've noticed that with MyISAM it goes a bit faster. I just tested with a dataset of the same size and finally went with PostgreSQL because it was more robust at loading the file. InnoDB was so slow that I aborted it after two hours with a dataset of the same size, although mine was 10,000,000 records by 128 columns full of data.
As this is a whitelist being updated on a daily basis, doesn't that mean there will be a very large number of duplicates (after the first day)? If so, it would make the upload a lot faster to run a simple script that checks whether a record already exists before inserting it.
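Instead of a separate checking script, MySQL can also skip duplicates during the load itself, provided there is a UNIQUE key on whatever identifies a record. A rough sketch, with hypothetical table and column names:

-- One-time setup: a unique key so MySQL can recognise duplicates.
ALTER TABLE whitelist ADD UNIQUE KEY uniq_phone (phone_number);

-- IGNORE silently skips rows that would violate the unique key
-- instead of aborting the whole load.
LOAD DATA LOCAL INFILE '/path/to/yourcsvfile.csv'
IGNORE INTO TABLE whitelist
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n';

Using REPLACE instead of IGNORE would overwrite the existing rows with the new data rather than skipping them.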
Try this query:
$sql="LOAD DATA LOCAL INFILE '../upload/csvfile.csv'
INTO TABLE table_name FIELDS
TERMINATED BY ','
ENCLOSED BY ''
LINES TERMINATED BY '\n' "
I ran into the same problem and found a way out. You can check out the process of uploading a large CSV file using AJAX here:
How to use AJAX to upload large CSV file?