I have an ArrayList with more than a million records in it. I want to dump it as-is into a PostgreSQL database using something built in, such as CopyManager, or any other method that is faster at importing this large amount of data into PostgreSQL.

My code is As follows

What I have tried:

@Service
public class OrderProcessingService {

    @Autowired
    private OrderRepository orderRepository;

    public void processData(String hfrCode, List<Order> orders) {

        List<Order> orderList = new ArrayList<>();

        for (Order data : orders) {
            Order orderData = new Order(); // was "new Visit()" - a copy/paste slip
            orderData.setUuid(UUID.randomUUID().toString());
            orderData.setOrderID(data.getOrderID());
            orderData.setOrderDate(data.getOrderDate());
            orderData.setWeight(data.getWeight());
            orderData.setOrderQuantity(data.getOrderQuantity());
            orderData.setOrderLocation(data.getOrderLocation());
            orderList.add(orderData);
        }

        orderRepository.saveAll(orderList);
    }
}



Instead of saving the list with saveAll, I want to copy it and dump it into the PostgreSQL database in one go, to minimize the execution time of importing the data.
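For reference, here is a minimal sketch of what the CopyManager route could look like. It assumes an orders table whose columns mirror the setters above (the table and column names are guesses, not taken from the question), and it builds the CSV in memory, which is only sensible if the list fits comfortably in the heap:

import java.io.StringReader;
import java.sql.Connection;
import java.util.List;
import java.util.UUID;
import javax.sql.DataSource;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public long copyOrders(DataSource dataSource, List<Order> orders) throws Exception {
    StringBuilder csv = new StringBuilder();
    for (Order o : orders) {
        // One CSV row per order. Real data may need proper CSV escaping
        // (quotes, commas, newlines) - a library such as Apache Commons CSV
        // is safer for arbitrary strings.
        csv.append(UUID.randomUUID()).append(',')
           .append(o.getOrderID()).append(',')
           .append(o.getOrderDate()).append(',')
           .append(o.getWeight()).append(',')
           .append(o.getOrderQuantity()).append(',')
           .append(o.getOrderLocation()).append('\n');
    }
    try (Connection conn = dataSource.getConnection()) {
        // Unwrap the pooled connection to reach the driver-specific copy API.
        CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
        return copyManager.copyIn(
            "COPY orders (uuid, order_id, order_date, weight, order_quantity, order_location) "
            + "FROM STDIN WITH (FORMAT csv)",
            new StringReader(csv.toString()));
    }
}

copyIn returns the number of rows written, so the caller can sanity-check it against orders.size().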
Comments
PIEBALDconsult 19-Mar-24 17:25pm    
A number of points come to mind...
0) Why are you copying all the data from data to orderdata/orderlist?
1) Can't you pre-allocate the orderlist with the ArrayList(Int32) constructor? That should save some time.
1.a) Or the ArrayList(ICollection) constructor?
2) Instantiating such a large structure for such a short duration is a waste of resources.
3) Wouldn't something like orderRepository.save(data) work?
3.a) Maybe with a begin transaction / commit transaction? (A rough sketch of points 1 and 3.a follows below.)
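As a rough sketch of points 1 and 3.a, assuming Spring Data JPA with Hibernate (the batch-size value is only an example and needs tuning for the actual workload):

// application.properties - standard Hibernate batching settings:
//   spring.jpa.properties.hibernate.jdbc.batch_size=1000
//   spring.jpa.properties.hibernate.order_inserts=true

@Transactional // one transaction for the whole import instead of one per row
public void processData(String hfrCode, List<Order> orders) {
    // Pre-allocate to the known size so the backing array is never
    // resized during the copy loop.
    List<Order> orderList = new ArrayList<>(orders.size());
    // ... populate orderList as in the question ...
    orderRepository.saveAll(orderList); // inserts are flushed in JDBC batches
}

Note that Hibernate only batches inserts when the identifier is assigned in code (as the UUID is here); IDENTITY-generated keys disable JDBC batching.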

Looking at that code, all I see is time/resource wasters.

Nobody can guarantee you "a fraction of a second" to add a million rows to a DB, PostgreSQL or not - it will depend on far too many factors: the computer(s) the DB engine is running on, the activity level on the DB, and the connection between the DB server and the computer running your code.
Think about it: if you have 1,000,000 rows to add, and each row has 100 characters (generally a conservative estimate), then you have 100 megabytes to transfer - a significant chunk of bandwidth. A gigabit network moves at most about 125 MB/s, so even with no other traffic it will take the best part of a second just to get the data between the two computers. And that's before the DB engine gets to process it and start juggling rows in the DB files to get them stored.

Have a think about what you are trying to do and why, then think about ways you can heavily reduce the data instead of assuming it's "just a software problem".
 
Comments
Office Systems 19-Mar-24 8:17am    
Thanks @OriginalGriff for your detailed explanation. However, the list I want to dump into PostgreSQL is large, and it is produced by server-side processing in a Spring Boot application, which is where the problem starts. When the amount of data is small the application works fine and takes only milliseconds, but when the amount of data increases things get complicated. So I was trying to figure out whether there is a way to copy the whole list and dump it at once; even if the process takes a minute, the rows will still get inserted.
Without knowing where you are sourcing your data from, this is an almost impossible question to answer. A few things do occur to me from looking at this, though. You are adding a random UUID, but that could easily be auto-generated by the database; nothing else there depends on this value. If the data comes from the result of an API call, you could use an ETL tool such as HPCC Systems[^] to do the heavy lifting for you.
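As a hypothetical illustration of the auto-generated key point, using standard Jakarta Persistence 3.1 annotations (the table name is an assumption):

import jakarta.persistence.*;
import java.util.UUID;

@Entity
@Table(name = "orders") // assumed table name
public class Order {
    // Letting the persistence layer generate the key removes the
    // UUID.randomUUID() call from the copy loop entirely.
    @Id
    @GeneratedValue(strategy = GenerationType.UUID) // JPA 3.1+
    private UUID uuid;

    // ... order_id, order_date, weight, order_quantity, order_location ...
}

Alternatively, a DEFAULT gen_random_uuid() column in PostgreSQL itself (version 13+) achieves the same thing without any application code.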

Ultimately, you are going to have to try some things out to find out what is going to work best for you.
 
 
Comments
Office Systems 19-Mar-24 8:58am    
Thanks @Pete O'Hanlon for that response. Let me have a look at these ETL tools so that I can use them to lift data into PostgreSQL. Thanks again.
Pete O'Hanlon 19-Mar-24 9:05am    
You are most welcome. I am a big fan of ETL tools.
