Click here to Skip to main content
15,890,282 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I'm doing a little light dabbling in Java, and decided to write something to benchmark how the top four db engines perform a set of basic operations. I decided to read the server settings from an Excel spreadsheet, and write the results back to the same sheet. The first column holds a short description of the current operation, and it processes each column after that as a separate server, until we run out of populated columns. If the input is garbage, or anything goes wrong, it populates the status row with the relevant error.

The thing is, if you provide the details for a server that doesn't in fact exist, it can take quite some time for the error to return. There's no way to get away from that, but it would be nice to multithread each column so all the nonexistent servers fail concurrently instead of sequentially.

I coded this up using apache poi, which is utterly brilliant, I'm not using any sort of intermediary data structure at all, I read the input cells and update the results straight back. But it (understandably) doesn't allow multithreading. I think I'll give it a try anyway, because every thread has it's own column, so it may be forgiving, but probably not.

Plan B is to expose an int array in the runnable class, and return the times in that. The status message too. The parent process then loads all of the results back into the sheet.

What I have tried:

I'm kinda just looking for a sanity check here. Is there some data structure designed specifically for multithreading? Also, I'm something of a spreadsheet fanboi, so is that just a slightly weird output for normal people? Are there even any normal people here? Spin round three times singing the Peter gunn theme if you're a normal person.

EDIT or I could just multithread the connections to the servers, and then run the timings sequentially on the ones that exist.
Posted
Updated 20-Oct-21 20:45pm
v2

I would multithread the connections, then sequentially run the timings: otherwise you aren't timing the same thing - they are "stealing bandwidth" from each other in terms of file access, network access, processor core access, memory, ...
It gets really difficult to get even close to accurate timings unless each test is run on a "bare machine" where nothing else is going on.
 
Share this answer
 
Comments
ThePotty1 18-May-21 3:29am    
Yeah I assumed that would kinda balance out, everyone steals as much as they can get and nobody's entirely happy, but they all get equal shares. Perhaps not a safe bet, it doesn't seem to be working irl :p
Multithreading the timings really distorts the results, so as suggested those are now run sequentially. The code is running about as well as I can make it at this point, and I'm pretty chuffed with it.

This is the output I'm getting using batched prepared SQL. I commented out the updatable resultset code, because it works great for SQL server, but MySQL was unacceptably slow. Unbatched prepared sql was the same. I suspect the SQL server Java data driver does automatic batching, while the MySQL one needs to be told to batch. So those 18 seconds to update / delete in SQL server may look bad, but MySQL was taking just under 30 minutes to insert 24k records using either unbatched prepared sql, or resultsets.

Server engine  SQL server 2019    SQL server 2017    MySQL
Server name    //localhost:54321  //localhost:54322  //localhost:3306

Table (NumKey,NumData) 
Create table          6                3              1007
Insert 24k rows     425              160              6057
Delete 4k rows       36               29               462
Update 4k values     78               28               441
Select *              1                1                15
Delete all (SQL)     33               16               234
Drop table            2               13               191

Table (Alphakey,NumKey,NumData,AlphaData)
Create table         1                 1               375
Insert 24k rows    241               170              6107
Delete 4k rows   18496             18521               547
Update 4k rows   16399             16410               551
Select *             1                 1                18
Delete all (SQL)    31                21               205
Drop table           1                 2               298


That said, I realise that it's probably a horrendous legacy-like abomination that needs to be abstracted the hell out of. Thing is, I'm pretty much a horrendous legacy-like abomination myself, so I need some guidance on how I need to change this thing so it's not quite so legacy. Can I dump the code on this thread for comments, or do I need to start a new thread?
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900