I'm doing a little light dabbling in Java, and decided to write something to benchmark how the top four db engines perform a set of basic operations. I decided to read the server settings from an Excel spreadsheet, and write the results back to the same sheet. The first column holds a short description of the current operation, and it processes each column after that as a separate server, until we run out of populated columns. If the input is garbage, or anything goes wrong, it populates the status row with the relevant error.

The thing is, if you provide the details for a server that doesn't in fact exist, it can take quite some time for the error to return. There's no way to get away from that, but it would be nice to multithread each column so all the nonexistent servers fail concurrently instead of sequentially.

I coded this up using Apache POI, which is utterly brilliant. I'm not using any sort of intermediary data structure at all: I read the input cells and write the results straight back. But it (understandably) doesn't allow multithreading. I think I'll give it a try anyway, because every thread has its own column, so it may be forgiving, but probably not.

Plan B is to expose an int array in the Runnable class and return the times in that, along with the status message. The parent thread then loads all of the results back into the sheet.
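A minimal sketch of Plan B, assuming each column's work can be wrapped in a Callable that returns its timings and status instead of exposing an array on a Runnable. The names ColumnBenchmarks, ColumnResult and runAll are hypothetical, and so is the convention that task i belongs to sheet column i + 1 (column 0 holding the descriptions). No POI calls appear here: the idea is that only the parent thread touches the workbook, after every worker has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ColumnBenchmarks {

    /** Hypothetical per-column result: timings in milliseconds plus a status message. */
    static final class ColumnResult {
        final int column;
        final long[] timingsMs;
        final String status;

        ColumnResult(int column, long[] timingsMs, String status) {
            this.column = column;
            this.timingsMs = timingsMs;
            this.status = status;
        }
    }

    /**
     * Runs one task per server column on its own worker thread and returns
     * the results in the same order as the tasks. A task that throws is
     * turned into a result whose status carries the error message.
     */
    static List<ColumnResult> runAll(List<Callable<ColumnResult>> tasks)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            // invokeAll blocks until every task has completed or failed,
            // and returns the futures in task order.
            List<Future<ColumnResult>> futures = pool.invokeAll(tasks);
            List<ColumnResult> results = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    results.add(futures.get(i).get());
                } catch (ExecutionException e) {
                    // Assumption: task i corresponds to sheet column i + 1.
                    results.add(new ColumnResult(i + 1, new long[0],
                            "ERROR: " + e.getCause().getMessage()));
                }
            }
            // Only now would the calling thread copy the results into the
            // sheet, so the workbook is never touched by two threads at once.
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the workers only ever hand back their own result object, nothing is shared between threads while they run, so no special concurrent data structure is needed for this approach.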

What I have tried:

I'm kinda just looking for a sanity check here. Is there some data structure designed specifically for multithreading? Also, I'm something of a spreadsheet fanboi, so is that just a slightly weird output for normal people? Are there even any normal people here? Spin round three times singing the Peter Gunn theme if you're a normal person.

EDIT: Or I could just multithread the connections to the servers, and then run the timings sequentially on the ones that exist.
Updated 17-May-21 3:41am

1 solution

I would multithread the connections, then sequentially run the timings: otherwise you aren't timing the same thing - they are "stealing bandwidth" from each other in terms of file access, network access, processor core access, memory, ...
It gets really difficult to get even close to accurate timings unless each test is run on a "bare machine" where nothing else is going on.
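A rough sketch of that two-phase idea, under stated assumptions: the class TwoPhaseBenchmark and its methods are hypothetical, the probe stands in for whatever "try to connect" means for a given engine, and the workload stands in for the operations being timed. Phase 1 lets all the dead servers fail at the same time; phase 2 then times the live ones one after another so they don't compete with each other.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Predicate;

class TwoPhaseBenchmark {

    /** Phase 1: probe every server concurrently; returns the ones that answered, in input order. */
    static List<String> findReachable(List<String> servers, Predicate<String> probe)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
        try {
            List<Callable<Boolean>> probes = new ArrayList<>();
            for (String server : servers) {
                probes.add(() -> probe.test(server));
            }
            // All probes run at once, so the slow failures overlap.
            List<Future<Boolean>> outcomes = pool.invokeAll(probes);
            List<String> reachable = new ArrayList<>();
            for (int i = 0; i < outcomes.size(); i++) {
                try {
                    if (outcomes.get(i).get()) {
                        reachable.add(servers.get(i));
                    }
                } catch (ExecutionException e) {
                    // A probe that throws is treated as an unreachable server.
                }
            }
            return reachable;
        } finally {
            pool.shutdown();
        }
    }

    /** Phase 2: run the workload against one server at a time and record elapsed nanoseconds. */
    static Map<String, Long> timeSequentially(List<String> reachable, Consumer<String> workload) {
        Map<String, Long> elapsedNanos = new LinkedHashMap<>();
        for (String server : reachable) {
            long start = System.nanoTime();
            workload.accept(server);
            elapsedNanos.put(server, System.nanoTime() - start);
        }
        return elapsedNanos;
    }
}
```

The timings are still subject to whatever else the machine is doing, but at least the servers under test are no longer stealing resources from each other.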
   
Comments
ThePotty1 18-May-21 3:29am
   
Yeah I assumed that would kinda balance out, everyone steals as much as they can get and nobody's entirely happy, but they all get equal shares. Perhaps not a safe bet, it doesn't seem to be working irl :p

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



