|
Are there no other NoSQL databases you can try besides LiteDB? If you go the relational route and try to normalize the flattened data you're inserting, it's not going to get faster. I'm not sure whether your import is meant to be real-time or not, so these are just general-purpose ideas...
These would be your options:
1) Are you using connection pooling or opening a new connection with every insert? Perhaps that's the bottleneck. LiteDB may not have the concept of connections at all, but if it does, that's the first place to check.
2) Can you throw more threads at this import process? Will LiteDB even handle concurrency, or will it choke?
3) Determine why your current LiteDB setup is choking. Is the bottleneck in your code or the DB? Is there a transaction stuck on a lock? Is it thread-safe? Are multiple threads getting locked? Etc.
4) If the above doesn't work, find a different NoSQL DB that doesn't choke. Get a real server, not something shipped as a DLL; an engine designed for concurrency will have the best throughput even with a single user. I've used MongoDB; I'm sure there are others.
5) If none of that works, go download MariaDB (a MySQL fork), but make sure the import table uses the MyISAM storage engine. SQL Server won't be anywhere near as fast here because it doesn't let you choose storage engines. MyISAM is for unnormalized data only and supports nothing fancy like triggers, constraints, or foreign keys, but it's fast precisely for that reason. And you can always have another import process/ETL transform the data later, in non-realtime.
6) Since it's just an import, the fastest (but last-resort) solution would be to just write the data out to an append-only binary file. You'd still need a process to import it into an actual DB, but that can be offloaded so the original process isn't bottlenecked.
Rene Rose 2021 wrote: One possible solution would be to distribute this data class across multiple tables to avoid duplicate entries. However, I'm concerned that this may cause performance issues when inserting several hundred thousand data points. Duplication isn't an issue for writing, as long as your tables aren't indexed. Most NoSQL databases don't index, so chances are you're good. Reading is a different story, however.
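A minimal sketch of option 6, using Python and sqlite3 as stand-ins for whatever the real pipeline uses (the fixed record layout and the `import_data` table name here are made up for illustration):

```python
import sqlite3
import struct

# Writer side: append fixed-size binary records (int64 id, float64 value)
# to a log file. Appending is sequential I/O, so the producer never blocks
# on the database.
RECORD = struct.Struct("<qd")  # hypothetical layout: int64 id, float64 value

def append_records(path, records):
    with open(path, "ab") as f:
        for rec_id, value in records:
            f.write(RECORD.pack(rec_id, value))

# Loader side (offline): read the log back and bulk-insert it in one
# transaction, which is where most of the throughput comes from.
def load_into_db(path, conn):
    with open(path, "rb") as f:
        data = f.read()
    rows = [RECORD.unpack_from(data, off)
            for off in range(0, len(data), RECORD.size)]
    with conn:  # a single transaction wraps the whole batch
        conn.executemany(
            "INSERT INTO import_data (id, value) VALUES (?, ?)", rows)
    return len(rows)
```

The point of the split is that the slow, transactional part can run whenever it likes, while the producer only ever does cheap sequential appends.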
Jeremy Falcon
|
|
|
|
|
Cutting to the chase: both SQLite (always) and MS SQL Server (for development, certainly) are free. Assuming you've got your experience writing queries in LiteDB and all your code, it should be fairly easy to convert to either one and, without much ado, find out for yourself whether they solve the problems you have.
modified 20-Apr-24 15:20pm.
|
|
|
|
|
Rene Rose 2021 wrote: The issue is that the identifier is identical for several hundred thousand data points,
Err... that is not a 'database' problem.
It is an architecture/design problem.
Can you store duplicate rows in a database? Yes.
Should you? That is an architecture/design question.
Certainly, if the rows are exactly the same then there is no point in storing them. But even in that case, you can.
Rene Rose 2021 wrote: when inserting several hundred thousand data points.
Over what period of time?
One day? Easy. Any database can handle that.
One second? Then yes that is going to be a problem. But even then if that only happens once a day it is not a problem. But if continuous then yes it is a problem.
Let's say it is continuous, you do 100,000 rows a second, and each row is 100 bytes.
Size = (24 * 60 * 60) * 100,000 * 100
If I did the math right, that means you are storing about 860 gigs a day.
So if that runs every day, you are going to need over 300 terabytes of storage per year, roughly a third of a petabyte. So you might want to check your cloud pricing options before you dive in.
Not to mention that 100 bytes of data for one row is very small.
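The back-of-the-envelope math above can be checked directly; this is pure arithmetic, assuming only the 100,000 rows/sec and 100 bytes/row figures from the post:

```python
rows_per_sec = 100_000
bytes_per_row = 100

# Seconds per day times throughput times row size.
per_day = 24 * 60 * 60 * rows_per_sec * bytes_per_row
per_year = per_day * 365

print(per_day / 1e9)    # 864.0  -> about 860 GB per day
print(per_year / 1e12)  # 315.36 -> about 315 TB per year
```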
|
|
|
|
|
Run an external C# program from within Oracle Developer and receive the return values from the external program
|
|
|
|
|
|
Greetings,
How do I ensure that the following code only queries records that were submitted between March and May 2024?
If a record was submitted in that date range, display 1; otherwise, display 0.
Currently, we have code that does this, but for the entire year.
In the code below, I left in the check that performs this, with the alias thisYear.
I commented that line out just to show what we have that works; this time, we just want the check to cover March through May 2024.
The code:
CASE WHEN d.dateCreated BETWEEN DATEFROMPARTS(2024, 3, 1) AND DATEFROMPARTS(2024, 5 + 1, 1) THEN 1 ELSE 0 END as thisYear
appears to work sometimes but not other times. No errors, just wrong results.
The code above temporarily replaces the line below, as described.
--CASE WHEN YEAR(d.dateCreated) = YEAR(getdate()) THEN 1 ELSE 0 END as thisYear
Here is the entire code:
SELECT e.Name, e.email, e.emptitle, d.dateCreated,
CASE WHEN YEAR(d.dateSubmitted) < YEAR(getdate()) THEN 1 ELSE 0 END as previousYear,
--CASE WHEN YEAR(d.dateSubmitted) = YEAR(getdate()) THEN 1 ELSE 0 END as thisYear
CASE WHEN d.dateSubmitted BETWEEN DATEFROMPARTS(2024, 3, 1) AND DATEFROMPARTS(2024, 5 + 1, 1) THEN 1 ELSE 0 END as thisYear
FROM Employees e
INNER JOIN dateDetails d on e.employeeID = d.employeeID
WHERE e.employeeID = someID
I have also tried this:
CASE WHEN d.dateSubmitted >= DATEFROMPARTS(2024, 3, 1) AND d.dateSubmitted < DATEFROMPARTS(2024,5 + 1, 1)
Same inconsistent result.
I guess my question is why does this work perfectly:
CASE WHEN YEAR(d.dateSubmitted) = YEAR(getdate()) THEN 1 ELSE 0 END as thisYear
but the date range does not work well?
Thanks in advance for your help.
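One classic edge case worth knowing here: BETWEEN is inclusive at both ends, and DATEFROMPARTS(2024, 5 + 1, 1) is midnight on June 1, so a row stamped exactly at that instant gets counted. That may not be the whole story in this thread, but it is easy to demonstrate. This sketch uses Python's sqlite3 with ISO text timestamps standing in for SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dateDetails (dateSubmitted TEXT)")
conn.executemany(
    "INSERT INTO dateDetails VALUES (?)",
    [("2024-03-15 09:00:00",),   # inside the range
     ("2024-06-01 00:00:00",),   # exactly the upper bound
     ("2024-06-01 10:30:00",)],  # past the upper bound
)

# BETWEEN is inclusive at both ends, so the row stamped exactly at
# midnight on June 1 is counted even though June should be out of range.
between = conn.execute(
    "SELECT COUNT(*) FROM dateDetails WHERE dateSubmitted "
    "BETWEEN '2024-03-01 00:00:00' AND '2024-06-01 00:00:00'"
).fetchone()[0]

# A half-open range (>= start AND < end) has no such edge case.
half_open = conn.execute(
    "SELECT COUNT(*) FROM dateDetails WHERE "
    "dateSubmitted >= '2024-03-01 00:00:00' "
    "AND dateSubmitted < '2024-06-01 00:00:00'"
).fetchone()[0]
```

Here `between` counts two rows while `half_open` counts one, which is why the half-open form is usually preferred for date ranges.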
|
|
|
|
|
You didn't say what "inconsistent" means.
Additionally, what is the data type of 'dateSubmitted'?
|
|
|
|
|
You could have easily told me if my code was right or wrong.
Anyway, I have resolved it.
I was actually coming here to delete the thread.
|
|
|
|
|
You'd do better to share your solution.
|
|
|
|
|
samflex wrote: I was actually coming here to delete the thread. Please don't. Removing part of it and leaving an orphaned thread makes it difficult to know what the original question was, and many of the replies would no longer make sense. Just update your question with the solution you found and add the word "Solved" to the title.
|
|
|
|
|
I wasn't going to delete the thread once someone had responded to it.
Here is what ultimately worked for me.
SELECT e.Name, e.email, e.emptitle, d.dateSubmitted,
CASE WHEN YEAR(d.dateSubmitted) < YEAR(getdate()) THEN 1 ELSE 0 END as previousYear,
CASE WHEN d.dateSubmitted >= '20240301' AND d.dateSubmitted < '20240501' THEN 1 ELSE 0 END as thisYear
FROM Employees e
INNER JOIN dateDetails d on e.employeeID = d.employeeID
WHERE e.employeeID = someID
|
|
|
|
|
samflex wrote: me if my code was right or wrong.
I wasn't sure without the information I asked for.
However, I can recognize that mixing date/time/timestamp types can be problematic, as can failing to account for what the time zone means, both for programming and for display.
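As a small illustration of how unforgiving mixed time-zone handling can be, Python flatly refuses to order a zone-naive datetime against a zone-aware one:

```python
from datetime import datetime, timezone

naive = datetime(2024, 3, 1, 12, 0)                       # no zone attached
aware = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)  # anchored to UTC

# Ordering a naive value against an aware one is ambiguous without
# knowing the naive value's zone, so Python raises TypeError.
try:
    naive < aware
    comparable = True
except TypeError:
    comparable = False
```

SQL engines are less strict and will happily compare whatever you give them, which is exactly how "inconsistent" results sneak in.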
|
|
|
|
|
Let's say you have a table with an index on the field "username".
Is it true that eventually you'll have some really time-consuming inserts when the hash table has to resize itself?
Hash table resize is O(n), I think.
Thanks.
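For the in-memory hash-table part of the question, the resize jumps are easy to observe with Python's dict. Note this only models the question's premise: database indexes are typically B-trees rather than hash tables, so a real DB does not behave this way.

```python
import sys

# Grow a dict one key at a time and record its allocated size. The size
# stays flat for long stretches, then jumps when the table resizes and
# rehashes every entry -- the O(n) step the question asks about.
sizes = []
d = {}
for i in range(10_000):
    d[i] = i
    sizes.append(sys.getsizeof(d))

# Indices where the allocation grew, i.e. where a resize happened.
jumps = [i for i in range(1, len(sizes)) if sizes[i] > sizes[i - 1]]
```

For 10,000 inserts there are only on the order of a dozen jumps, because each resize multiplies the capacity; that is why the O(n) resizes amortize away and inserts stay O(1) on average.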
|
|
|
|
|
INSERT is costly if there are a lot of indexes. It is not so much about resizing; there's usually plenty of spare space in a DB and hardly anything needs resizing.
If you can come up with a better strategy than indexes, I'd love to hear it.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
|
mike7411 wrote: time
... define ...
|
|
|
|
|
I was looking at the documentation for CREATE INDEX in SQL:
https://www.w3schools.com/sql/sql_create_index.asp
CREATE INDEX index_name
ON table_name (column1, column2, ...);
I don't understand when you would create it for multiple columns. Let's say you did this:
CREATE INDEX idx_pname ON Persons (LastName, FirstName);
Does that create one index on two columns? Or, is that the same as creating two indexes?
For instance, does that statement create two hashtables? Or one hashtable?
Thank you.
|
|
|
|
|
|
One index. The first column is most significant; the second is then used to narrow the result further when searching.
I don't think you should consider it a 'hash table'. At a minimum, I doubt any relational database uses a hash, in general, to fully implement an index; I suspect hash tables are not an optimal solution for file-based storage. A quick google suggests SQL Server specifically supports that type, but only for a memory-optimized table (so not a normal table).
I don't want to do the deep-dive research to figure out exactly what they use for DB indexes, but you can read up on what clustered indexes are.
Although not directly related, the one example I know without research is the comparison for sorting: some algorithms are suited to in-memory data, while external sorts are suited to file-based processing.
|
|
|
|
|
Copy/paste from the answer to the question that was posted before yours;
Quote: That means the index is sorted on col1, then col2. Meaning, col1 will be located first, then the rows sharing the same col1 will be sorted on col2. Meaning, in a list of users, while stored randomly in the table, the index would order them as such;
Lastname, Firstname
Doe, Jane
Doe, John
Vluggen, Albert
Vluggen, Eddy
Vluggen, Zack
If you search it, you go "WHERE lastname = bla (col1 from the index) AND firstname = bla2 (col2 from the index)".
This way, it will first locate the segment for "Vluggen", and then only has to scan within that segment until it finds "Eddy". That's what the DB is optimized to do.
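The "most significant column" behaviour described above can be verified with SQLite's EXPLAIN QUERY PLAN (SQLite standing in here for whatever engine you use; the planner's exact wording varies by version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")
conn.execute("CREATE INDEX idx_pname ON Persons (LastName, FirstName)")

def plan(sql):
    # The 'detail' column of EXPLAIN QUERY PLAN says whether SQLite
    # seeks via an index (SEARCH) or reads everything (SCAN).
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The leading column alone can seek into the composite index...
p1 = plan("SELECT * FROM Persons WHERE LastName = 'Vluggen'")
# ...and so can both columns together...
p2 = plan("SELECT * FROM Persons WHERE LastName = 'Vluggen' AND FirstName = 'Eddy'")
# ...but FirstName alone cannot seek, because it is not the most
# significant column; the planner falls back to a scan.
p3 = plan("SELECT * FROM Persons WHERE FirstName = 'Eddy'")
```

So one composite index, not two separate ones: it serves lookups on (LastName) and (LastName, FirstName), but not FirstName by itself.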
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Let's say you have a SQL database with a table called USERS.
Let's say it has 500,000 users in it.
Let's say you do this:
SELECT PASSWORD FROM USERS WHERE USER='bill.gates';
Let's say there's no indices on the table.
Does the database just do a linear search thru each user?
Thank you.
|
|
|
|
|
Let's say there are books available on SQL, and let's say there are search engines on the internet.
|
|
|
|
|
Yes. It will scan the segments and stop once all predicates are satisfied. That's why having a PK with a clustered index is so very nice to have. Who comes up with the idea of a non-indexed table?
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
In database talk it is known as a 'table scan'.
Most databases have a way to let you see what they are doing; the Oracle, SQL Server, and MySQL IDEs all have a way to show this clearly. The nomenclature used in the output is hard to read, but practice makes that easier.
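You can see the table scan, and watch it disappear once an index exists, with SQLite's EXPLAIN QUERY PLAN (SQLite standing in for the unspecified database in the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE USERS (USER TEXT, PASSWORD TEXT)")

def plan(sql):
    # Join the 'detail' text of each plan row: SCAN means a linear
    # read of the table, SEARCH means an index seek.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT PASSWORD FROM USERS WHERE USER = 'bill.gates'"

# With no index, the planner has no choice but to read every row.
before = plan(query)

conn.execute("CREATE INDEX idx_user ON USERS (USER)")

# With an index on USER, it can seek straight to the matching rows.
after = plan(query)
```

Running the same trick in your own engine's plan viewer shows the same SCAN-to-SEARCH switch, whatever the local spelling is.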
|
|
|
|
|