Multithreaded output to a spreadsheet

Question

0.00/5 (No votes)

See more:

I'm doing a little light dabbling in Java, and decided to write something to benchmark how the top four db engines perform a set of basic operations. I decided to read the server settings from an Excel spreadsheet, and write the results back to the same sheet. The first column holds a short description of the current operation, and it processes each column after that as a separate server, until we run out of populated columns. If the input is garbage, or anything goes wrong, it populates the status row with the relevant error.

The thing is, if you provide the details for a server that doesn't in fact exist, it can take quite some time for the error to return. There's no way to get away from that, but it would be nice to multithread each column so all the nonexistent servers fail concurrently instead of sequentially.

I coded this up using apache poi, which is utterly brilliant, I'm not using any sort of intermediary data structure at all, I read the input cells and update the results straight back. But it (understandably) doesn't allow multithreading. I think I'll give it a try anyway, because every thread has it's own column, so it may be forgiving, but probably not.

Plan B is to expose an int array in the runnable class, and return the times in that. The status message too. The parent process then loads all of the results back into the sheet.

What I have tried:

I'm kinda just looking for a sanity check here. Is there some data structure designed specifically for multithreading? Also, I'm something of a spreadsheet fanboi, so is that just a slightly weird output for normal people? Are there even any normal people here? Spin round three times singing the Peter gunn theme if you're a normal person.

EDIT or I could just multithread the connections to the servers, and then run the timings sequentially on the ones that exist.

Posted 17-May-21 2:36am

ThePotty1

Updated 20-Oct-21 20:45pm

v2

Add a Solution

2 solutions

Solution 2

Multithreading the timings really distorts the results, so as suggested those are now run sequentially. The code is running about as well as I can make it at this point, and I'm pretty chuffed with it.

This is the output I'm getting using batched prepared SQL. I commented out the updatable resultset code, because it works great for SQL server, but MySQL was unacceptably slow. Unbatched prepared sql was the same. I suspect the SQL server Java data driver does automatic batching, while the MySQL one needs to be told to batch. So those 18 seconds to update / delete in SQL server may look bad, but MySQL was taking just under 30 minutes to insert 24k records using either unbatched prepared sql, or resultsets.

Server engine  SQL server 2019    SQL server 2017    MySQL
Server name    //localhost:54321  //localhost:54322  //localhost:3306

Table (NumKey,NumData) 
Create table          6                3              1007
Insert 24k rows     425              160              6057
Delete 4k rows       36               29               462
Update 4k values     78               28               441
Select *              1                1                15
Delete all (SQL)     33               16               234
Drop table            2               13               191

Table (Alphakey,NumKey,NumData,AlphaData)
Create table         1                 1               375
Insert 24k rows    241               170              6107
Delete 4k rows   18496             18521               547
Update 4k rows   16399             16410               551
Select *             1                 1                18
Delete all (SQL)    31                21               205
Drop table           1                 2               298

That said, I realise that it's probably a horrendous legacy-like abomination that needs to be abstracted the hell out of. Thing is, I'm pretty much a horrendous legacy-like abomination myself, so I need some guidance on how I need to change this thing so it's not quite so legacy. Can I dump the code on this thread for comments, or do I need to start a new thread?

Posted 20-Oct-21 20:45pm

ThePotty1

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Accepted Answer · 2021-05-17T03:41:00

Solution 1

I would multithread the connections, then sequentially run the timings: otherwise you aren't timing the same thing - they are "stealing bandwidth" from each other in terms of file access, network access, processor core access, memory, ...
It gets really difficult to get even close to accurate timings unless each test is run on a "bare machine" where nothing else is going on.

Posted 17-May-21 3:41am

OriginalGriff

Comments

ThePotty1 18-May-21 3:29am

Yeah I assumed that would kinda balance out, everyone steals as much as they can get and nobody's entirely happy, but they all get equal shares. Perhaps not a safe bet, it doesn't seem to be working irl :p