Click here to Skip to main content
Rate this: bad
good
Please Sign up or sign in to vote.
See more: Java JavaSE6 JDBC
I am importing huge amount of data from CSV files into MSSQL Server 2008. I am using core JDBC (Without any ORM frameworks) and communicating with the DB using the driver 'sqljdbc4.jar' provided by Microsoft.
 
As of now am importing the records one by one. Which takes considerable amount of time even though am using PreparedStatement.
 
I am not using batch import since I need to log the exact error into the Error file.
 
Kindly suggest any an idea to improve the performance with out sacrificing the accurate error logging. And I am forced to do this without any ORM tools.
 
Here is the sample code:
 
public void importCSV(){
   // Create a db connection if its null or closed.
   // Create PreparedStatement objects for selects and inserts if null or closed.

   for (Map<String, String> csvRecord : csvAsList) {
      // Prepare category object using csvRecord.
      // Check whether category exists in by.
      // Import files can have up to 1,00,000 records so tracking errors is critical.
      try{
         categoryDAO.findByName(categoryName,<PreparedStatement object>);
      }
      catch(Exception exp){
         // log this to error.csv file
      }
      // If its a new category import it to the db.
      try{   
         categoryDAO.insert(category,<PreparedStatement object>);
      }
      catch(Exception exp){
         // log this to error.csv file
      }
   }
   // Close PreparedStatement objects
   // Close DB Connection
}
 
public Category insert(Category category,PreparedStatement pstmt ) throws SQLException{
 
   if (category == null) {
      return null;
   }
   ResultSet rs = null;
   try {
      pstmt.setInt(1, category.getField1());
      pstmt.setString(2, category.getField2());
      int result = pstmt.executeUpdate();
      if (result < 1) {
         return null;
      }
      rs = pstmt.getGeneratedKeys();
      if (rs.next()) {
         category.setId(rs.getInt(1));
      }
   } finally {
      if (rs != null)
         rs.close();
   }
   return category;
}
 
Note:
The time taken for inserting 42390 records + (8 * 42390) select operations is 6.30 mins(approx). Is it a good figure ? Or still is there any way to improve the performance even better ?
 

Thanks in advance.
Posted 21-Sep-12 4:02am
Edited 21-Sep-12 4:11am
v2
Comments
Ashraff Ali Wahab at 21-Sep-12 10:05am
   
You can split the list into configurable size and use the thread to instert the data.

1 solution

Rate this: bad
good
Please Sign up or sign in to vote.

Solution 1

  Permalink  
Comments
pasztorpisti at 21-Sep-12 14:06pm
   
+5 bulk insert is nice magic. however I would mention a funny story: on a machine the regional/language settings contained comma (',') as a decimal point. For this reason excel saved the floating point data so that the decimal point was ',' in the cvs file (just like the separators of course). The saved data was corrupted so badly even excel was unable to load the data back correctly from the file and we succeeded to mess up the database as well. +few hours of manual work to cleanup. :-)
Mehdi Gholam at 21-Sep-12 14:15pm
   
Cheers!
leejoyprakash at 26-Sep-12 8:38am
   
Thank you Mehdi Gholam for your answer. This is really a useful piece of information.
 
Sorry that I forget to explain about my CSV file structure, in my case, the contents of one CSV file doesn't belongs to one table.
The CSV file is as follows:
 
COL 1,CAT 1,CAT 2,CAT 3,CAT 4,ITEM,DESCRIPTION
Val 1,Val 2,Val 3,Val 4,Val 5,Val 6,Val 7
 
The requirements are as follows:
 
1. COL 1 belongs to table1. First I need to check whether it exists, if not skip the record by tracking this in Error.csv
2. CAT 1 to CAT 4 belongs to table2. Check whether each of this exists, if not import into table2.
3. Item & Description belongs to table3. Check if exists, if not import into table3.
 
So I assume that, it not possible to do a bulk import for a CSV with values belonging to more than one table. Please correct me if I am wrong.
 
Thanks once again for your answer which really was a new idea for me.
 
Thanks

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

  Print Answers RSS
0 Maciej Los 290
1 OriginalGriff 275
2 Sergey Alexandrovich Kryukov 195
3 BillWoodruff 140
4 ProgramFOX 140
0 OriginalGriff 6,524
1 Sergey Alexandrovich Kryukov 6,048
2 DamithSL 5,218
3 Manas Bhardwaj 4,657
4 Maciej Los 4,150


Advertise | Privacy | Mobile
Web04 | 2.8.1411022.1 | Last Updated 21 Sep 2012
Copyright © CodeProject, 1999-2014
All Rights Reserved. Terms of Service
Layout: fixed | fluid

CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100