|
I have an MFC application, where I am creating a new feature.
The new feature is to write a structure of around 900 KB to a file on disk every second. Another application of mine will then read the same file every second, after some delay.
My questions are:
1. How do I save the structure (900 KB) to the file in a reduced-size format, so that the file size doesn't become too big?
2. Which is more efficient for reading and writing: saving each second's structure as an individual file (there would be 60 files for 60 seconds), or saving the structures sequentially in the same file (only one file)?
Please note that I am going to write the structure to the file for a maximum period of 4 days.
|
|
|
|
|
Quote: 1. How do I save the structure (900 KB) to the file in a reduced-size format, so that the file size doesn't become too big? It depends on the data. A general compression algorithm is probably not recommended here, because you have to access the struct quickly. So it is up to you (based on your knowledge of the data) to implement, if feasible, an ad hoc compression/decompression algorithm.
Quote: 2. Which is more efficient for reading and writing: saving each second's structure as an individual file (there would be 60 files for 60 seconds), or saving the structures sequentially in the same file (only one file)? At first glance the individual-file option looks more promising. But, again, knowledge of the data could change that estimate.
|
|
|
|
|
Unless you have a lot of software experience, consider posting your specification (i.e. the actual problem that you're trying to solve). It might be that there are better approaches than having to write, and later read, a 900KB file every second.
|
|
|
|
|
manoharbalu wrote: Please note that I am going to write the structure to the file for a maximum period of 4 days. 4 days is 345,600 seconds. Are you saying that you are going to write 311 gigabytes of data to disk in 4 days? (That is, if you don't find a good way to compress it.)
I certainly would not want to manage a single 311 GByte file. Nor would I want to manage 345,600 files ...
Writing a megabyte per second to a modern disk is nothing to worry about. A modern processor could easily handle compressing a megabyte a second, too. A disk that can hold a third of a terabyte is no problem. You don't need any super-technology to get this to work. I would be more worried about handling either a 300 GByte file or a third of a million tiny files (yes, by modern standards, files under 1 MByte are "tiny").
Compressing to save space is just a technical detail - at the application level, you relate to the data at its uncompressed size. That comes naturally if the data is, say, a video stream, but then you would never consider spreading it over a third of a million files. So I am really curious to hear about the nature of the data!
I do not share CPallini's scepticism towards general compression algorithms. They work by recognizing recurring patterns and representing them with shorter codewords. Totally unprocessed data often contains a large share of repeated patterns, especially logging data. So does all sorts of readable text. Either you'll get quite a long way using a general method (such as pkzip), or it takes a lot of detailed knowledge of the information structures to do much better.
Obviously, if your data is a known media type, such as video, photo or sound, then the methods have already been developed and you can get hold of a library to do it - I am thinking of the case where you handle your own data types for which no dedicated compression method is yet known. Try general methods first (a small example follows below). Only if the results are completely unsatisfactory, the files still way too large, should you consider making something of your own. Maybe you can reduce the data volume before you send it to compression, by factoring out information elements, omitting data that can be reconstructed from other data, and so on.
Also consider whether you need lossless compression. If these 900 kB chunks are individual log entries, will they ever later be addressed as individual entries? Maybe you can reduce the data to statistics - averages, sums, whatever. (Like when you swipe your card at the grocery store: the 49 individual entries on your checkout slip are reduced to a single total sum.)
Giving specific advice would be a lot easier with some details about the nature of the data you want to store.
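For what it's worth, here is a minimal sketch of the "try a general method first" idea, using zlib's one-shot compress() as a stand-in for any deflate-style compressor. The PayLoad struct and its contents are invented, since we don't know the real data; only the actual 900 KB record can tell you whether the ratio justifies the CPU time.
#include <zlib.h>       // general-purpose deflate compressor; link against zlib
#include <cstdio>
#include <vector>

// Hypothetical 900 KB record -- a stand-in for the real structure.
struct PayLoad { unsigned char bytes[900 * 1024]; };

// Compress one record with zlib's one-shot API; returns an empty vector on failure.
std::vector<unsigned char> CompressRecord(const PayLoad& rec)
{
    uLongf destLen = compressBound(static_cast<uLong>(sizeof rec));  // worst-case output size
    std::vector<unsigned char> out(destLen);
    if (compress(out.data(), &destLen,
                 reinterpret_cast<const Bytef*>(&rec),
                 static_cast<uLong>(sizeof rec)) != Z_OK)
        return {};
    out.resize(destLen);                                             // shrink to the actual size
    return out;
}

int main()
{
    PayLoad rec = {};   // dummy data; an all-zero buffer compresses absurdly well
    std::vector<unsigned char> packed = CompressRecord(rec);
    std::printf("raw: %zu bytes, compressed: %zu bytes\n", sizeof rec, packed.size());
    return 0;
}
Run it on a buffer filled from your real structure, not the zero-filled dummy, before drawing any conclusion about the achievable ratio.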
|
|
|
|
|
Putting aside the data storage constraints mentioned by others, you need to experiment.
1) Is the structure bitwise copyable?
2) If so, just write it to disk using CFile and measure the impact.
2a) Use LZ4 to compress the data first; see the sketch at the end of this post. (My own guess is that this would make a difference with a hard drive, but may actually take more time with an SSD.)
3) If not, you will need to serialize it; I'd still keep all the data binary.
3a) You could serialize it to a block of memory and then write that in one go (very likely the fastest way)
3b) You could serialize to a file using CStdioFile (depending on the data, this could be very slow. Even slower if you used C++ iostreams.)
3c) Do a combination; Serialize the data in 32k chunks and write them, perhaps asynchronously. (That said, when doing anything comparable, I prefer just having a second thread write synchronously.)
If the data needs to be future proofed, consider serializing using RapidJSON (which is a very fast C++ JSON library), compressing the result with LZ4 and then writing that. However, this could easily take longer than a second, depending on what the data is.
Edit: If the data is fairly regular, you might be able to save the full thing every ten seconds and differences in between. This can be tricky; I once worked on an app which did this and I found that the differencing alone exceeded the time it took to transmit the data over TCP.
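To make 2) and 2a) concrete, here is a minimal sketch assuming a bitwise-copyable structure; the TelemetryBlock name and size are invented. It appends either the raw struct or an LZ4-compressed copy prefixed with its compressed length so a reader can find record boundaries, using the one-shot LZ4_compressBound / LZ4_compress_default calls and plain CFile::Write.
#include <afx.h>        // CFile (MFC)
#include <lz4.h>        // LZ4 one-shot API
#include <vector>

// Hypothetical bitwise-copyable record of roughly 900 KB.
struct TelemetryBlock { char data[900 * 1024]; };

// 2) Raw append: one CFile::Write per record.
void AppendRaw(CFile& file, const TelemetryBlock& block)
{
    file.Write(&block, sizeof block);
}

// 2a) Compress with LZ4 first, then write a small length prefix followed by
//     the payload, so the reader can locate record boundaries.
bool AppendCompressed(CFile& file, const TelemetryBlock& block)
{
    const int maxSize = LZ4_compressBound(static_cast<int>(sizeof block));
    std::vector<char> buf(maxSize);
    const int packed = LZ4_compress_default(reinterpret_cast<const char*>(&block),
                                            buf.data(),
                                            static_cast<int>(sizeof block),
                                            maxSize);
    if (packed <= 0)
        return false;                         // compression failed (should not happen here)
    DWORD header = static_cast<DWORD>(packed);
    file.Write(&header, sizeof header);       // length prefix
    file.Write(buf.data(), packed);
    return true;
}
Timing both variants against your actual disk (for instance with QueryPerformanceCounter around each call) will answer the HDD-versus-SSD question in a few minutes.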
|
|
|
|
|
Joe Woodbury wrote: If the data needs to be future proofed, consider serializing using RapidJSON (which is a very fast C++ JSON library), compressing the result with LZ4 and then writing that Guaranteed to be a viable format forever, or five years, whichever comes first.
During my career, I have seen shockingly many "future proof" formats come and go. I have come to adopt the attitude I met when working on a large project with digital libraries: Do not go for one standardized format, but let each library choose its own. When you, thirty years from now, need to recover some document, hopefully one of the different formats used is still recognized.
Don't forget that when you pick up that file thirty years from now, that ST506 disk needs a machine with the proper interface. And you need software to interpret the disk layout and directory structures of the file system. You may need to understand all the metadata associated with the data stream. The record structure in the file. The encoding of numerics and characters. The data schema. The semantics of the data.
Sure, JSON is one of the fourteen everlasting syntax standards for how to represent a hierarchical structure. Ten years ago, it wasn't - it didn't exist. Some of the formats used ten years ago are dead now; maybe JSON will be dead in ten years.
Bottom line: Never choose a data representation because it will last forever. Or more than five years.
If you have a need for that, make a plan to regularly move your data to a new disk (or other physical storage) format. Move it to a new machine with the proper interface for the physical unit. Move it to a new file system. A new character (or other low-level) encoding. A new schema. A new concrete grammar. Be prepared for some information loss during each move. While having format n as the primary one, always preserve format n-1 (along with all hardware and software required to access it) until you switch to format n+1 - i.e. always have data available in two operational formats. Preferably generate format n+1 from format n-1, to avoid the losses from n-1 to n and further from n to n+1.
But first of all: don't trust DCT100 or Travan to be formats that will last forever. Nor eSATA. Nor SGML. Nor EBCDIC. HFS. BER. YAML. ... For five years, it may be safe. Anything significantly beyond that is gambling.
|
|
|
|
|
JSON will be around for a while. At the very least, it's very readable and a step up from plain text or CSV. (And to be pedantic, JSON has been around for almost 20 years. XML has been around for 24 years and its predecessor, SGML, for 34 years.)
Your rant is borderline senseless and unproductive. I use the slang "future proof" to mean it will last as long as the program, and you know that; you are arguing for argument's sake and preening while doing so. Moreover, your statement "Do not go for one standardized format, but let each library choose its own." is meaningless in this context--short-lived files--and in the broader sense, since it traps you back where you started, afraid to do anything lest it become obsolete.
You are also mixing hardware protocols with file formats. Even obscure formats, such as BSON, would be readable in a lossless way fifty years from now as would a BMP. A plain text file of JSON or YAML is even more readable.
JSON is one step above key/value pairs--how would you lose information? And, if moving from one disk to another, what does "proper interface for the physical unit" have to do with anything? It's a file. I have 30-year-old text files; should I be panicking? Perhaps I should have kept them on floppy disks and kept a floppy disk reader and format n-1 (whatever that is for a text file).
modified 9-Jun-20 5:32am.
|
|
|
|
|
I am happy to see that you have rock solid confidence in today's high fashions. Keep it up! We need enthusiasts.
Nevertheless: when you work on digital library projects with the aim of preserving information (as contrasted with "data" and "files" - the semantic contents!) for at least one hundred years, maybe five hundred, then things appear in a different light.
To illustrate the problems, I used to show the audience an XML file where everything was tagged in Northern Sami. It made little sense to anyone (except the Sami guy who had helped me make the file). So why couldn't the entire world speak English? My next example was one where a 'p' tag identified a 'person', a 'part' or a 'paragraph', depending on context. It makes little difference whether those are XML or JSON tags if they make no sense whatsoever to the reader, or are highly ambiguous unless you have extensive context information.
Of course you can lose information! Say that you want to preserve a document where page formatting is essential (it could be for legal reasons, or others): for one digital library project, it didn't take me long to dig up seven different strategies, used by various text processing systems, for handling space between paragraphs in connection with page breaks. If you can "do an XKCD 927" and replace the 14 existing standards with a 15th that covers them all, then good luck! Many have tried, none have succeeded. When you select a format, whether JSON, MS-Word, HTML, PDF or any other, for your document storage, and convert documents in other formats to the chosen one, you will lose information.
I could show you a pile of different physical media with my old files, totally inaccessible today. If you want to preserve data, you cannot simply say "forget about physical formats, interfaces, media - as long as we use JSON it is safe". No, it isn't. The Travan tape reader uses its own interface card, for the ISA bus. I no longer have a PC with an ISA bus. I've got one SCSI disk with a 50-pin connector; it is not the Centronics 50-pin but a "D" connector with three rows of pins. I once had a 50-pin D-to-Centronics cable, but even if I had saved it, I have no SCSI interface where it fits. I have got the full source code of the OS I was using, on an 8-inch hard disk cartridge, but this millennium I haven't seen a reader for it. I still keep a Win98 PC alive: if I plug the USB floppy disk reader into a modern (XP or later) PC, it won't read floppies without a proper format code in the boot sector. Win98 could.
Sometime in the 1990s I had a user come to me with 8" floppy disks from an old project, asking if we could still handle them - I was the one who still had a reader for those. The user had no idea about formats at any level, but they needed all the information they could get from these floppies, whatever it was. I copied each floppy to a disk file for analysis; at first it looked like digitized white noise. It was suspected to be text, so I ran a count of byte values: the most common values were the EBCDIC codes for 'e' and 't'. EBCDIC comes in many variants ("code pages"), and we had to try a few before getting all the characters right. That made it possible to identify the block size, and I could start flipping unordered blocks around to make longer strands of coherent text.
Then I might have found some JSON (if it had existed back then) to start identifying the information - the semantics. Fortunately, the information was not in Northern Sami. Or in French - in another project, we obtained another piece of structured text, a C source library for record management that had a high reputation. When we got the source code, all the variable names, all the comments, all the documentation were in French, a language none of us mastered. We were unable to make the extensions to the library that we had planned.
Long-term information preservation and recovery goes far beyond JSON-style "firstName" : "Peter", "age" : 51, ... JSON is no more than some dots of foam on top of the ocean. Cute enough for a demo, and you may even employ it for something useful! But dig down into history and see how ASCII was introduced as the final solution for all computers all over the world to interchange arbitrary data, the solution to all interconnect problems. The universal format for all purposes. Yeah, right.
|
|
|
|
|
The original poster isn't preserving information; he is saving it temporarily. This is all about temporary data.
Right now you are just repeating the obvious. You also shifted your argument; you went from you will lose information to you can lose information. Then you introduce both complex documents and hardware into your argument. Nobody disagrees about the transience of hardware standards and physical media, so why preach on it? What does it have to do with a text or csv file I have from 1993?
I have thirty-year-old source that was on floppy, tape backup, Zip drives, Jaz drives, various hard drives, CD-ROM, DVD-ROM and is now on an SSD and on OneDrive. It still compiles. Yet you argued that I was all but guaranteed to lose information in each transfer; that no file format lasts. It has. (Granted, I'm the only one who cares about that specific project, so when I die or get tired of keeping it around, it will vanish, but that's also true of almost everything I own--that's life.)
I'm also a bit baffled by the claim that "ASCII was introduced as the final solution for all computers all over the world to interchange arbitrary data". No it wasn't. Your straw man collection is now complete! Or is it?
|
|
|
|
|
I'm not sure where "member" is coming from. I get the point, but I'm not sure it's meaningful. This all boils down to requirements, and nothing has been offered by the OP except for "I need to write it fast." There is nothing to do with hardware here - it might be another requirement.
As for the file format - document it and you're done. This is not rocket science. Custom file formats are all over the place, and as long as they are documented, you can always write a filter to change their format. OP, I recommend that you rig up a sample application that dumps dummy data at the frequency you need. See if it works. Fool around with it a bit. This is what you originally posted:
Quote: 1. How do I save the structure (900 KB) to the file in a reduced-size format, so that the file size doesn't become too big?
2. Which is more efficient for reading and writing: saving each second's structure as an individual file (there would be 60 files for 60 seconds), or saving the structures sequentially in the same file (only one file)?
Please note that I am going to write the structure to the file for a maximum period of 4 days.
Item 1: why do you care about file size? Are you on an embedded system, or do you have restricted resources? Does this project limit your disk size? If not, size the drive accordingly, write the data, and move on.
Item 2: see my note above - *try* something. You can easily rig up code to test this; a quick harness is sketched below. FWIW, you've not provided sufficient requirements for us to help you.
You are going to have to worry about mutual file access, so you'll need to be thoughtful, but I don't see a performance concern at all.
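As a starting point, a throwaway harness along these lines (plain CRT/Win32, all names invented) answers item 2 empirically: write a 900 KB dummy block once per second and print how long each write takes, once appending to a single growing file and once creating a file per second.
#include <windows.h>
#include <cstdio>
#include <vector>

int main()
{
    const size_t kBlock = 900 * 1024;              // size of the dummy record
    std::vector<char> block(kBlock, 'x');          // fake payload

    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);

    FILE* single = std::fopen("all_records.bin", "wb");
    if (!single)
        return 1;

    for (int second = 0; second < 60; ++second)    // one minute is enough for a first impression
    {
        LARGE_INTEGER t0, t1;
        QueryPerformanceCounter(&t0);

        // Option A: append every record to one growing file.
        std::fwrite(block.data(), 1, kBlock, single);
        std::fflush(single);

        // Option B: one file per second (uncomment to compare).
        // char name[64];
        // std::sprintf(name, "record_%03d.bin", second);
        // FILE* one = std::fopen(name, "wb");
        // std::fwrite(block.data(), 1, kBlock, one);
        // std::fclose(one);

        QueryPerformanceCounter(&t1);
        std::printf("second %2d: %.3f ms\n", second,
                    (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart);

        Sleep(1000);                               // simulate the once-per-second cadence
    }
    std::fclose(single);
    return 0;
}
Swap in data shaped like your real structure and run it on the target machine; the per-write timings will tell you far more than any of our guesses.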
Charlie Gilley
Stuck in a dysfunctional matrix from which I must escape...
"Where liberty dwells, there is my country." B. Franklin, 1783
“They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759
|
|
|
|
|
charlieg wrote: This all boils down to requirements, and nothing has been offered by the OP except for "I need to write it fast." There is nothing to do with hardware here - it might be another requirement. What spun off this sub-thread of the discussion was
Joe Woodbury wrote: If the data needs to be future proofed, consider serializing using RapidJSON (which is a very fast C++ JSON library), compressing the result with LZ4 and then writing that. "Future proof" was not stated as a requirement, but now that Joe Woodbury has presented what is - in my eyes - a rather naive approach to future proofing, I chose to point out that if you want future proofing, it takes a lot more than just using a basic structure encoding that is currently fashionable.
It seems quite obvious that Joe Woodbury has never worked in the area of long-term information preservation. I have a few years of experience, and I know that it is not a trivial issue. When someone makes a statement that suggests "Just use JSON and LZ4, and the information is safe for the future", I think that is so naive that it crosses the border into "fake news", and I want to correct it.
However, Joe Woodbury is not willing to accept anything that can affect the validity of his claim, calling my comments a "rant", "senseless and unproductive", that I am "mixing up" things by pointing to other important elements, that I am "just repeating the obvious".
I wrote "Be preparered for some information loss during each move". When Joe Woodbury stated "JSON is one step above key/value pairs--how would you lose information?", I went on to provide examples. Then he comes back with "You also shifted your argument; you went from you will lose information to you can lose information", and concludes "Your straw man collection is now complete!"
I don't think anything valuable will come out of a further discussion with Joe Woodbury. So I let it rest.
What regards "just document it": I have seen guides seriously suggesting a URL to visit if you have problems connecting to the Internet. I have seen document format descriptions stored electronically in the format that is described. I have seen format "documentation" that is hopelessly inadequate - having worked with the format for a long time gives the documentation some value, but often you need access to the format designer to have him explain it. I have format descriptions on 5.25" floppies. Are you able to imagine that there could be a complete breakdown of the Internet? How much of the format descriptions would then be inaccessible?
Documenting the format is a necessary, but not sufficient, provision for the data to be accessible in the future: the documentation must unconditionally be available to the person who needs to decode the data. That takes a lot more than just JSON or something similar; it takes a full storage stack all the way down to the physical medium. That could be a physical printout, on acid-free paper. How many format specifications do you currently have in printed form on acid-free paper? I've got a couple of printed ones, but I am not sure that the paper will survive that many years.
One of the format descriptions I have in print is for a 30+ year old format; the manufacturer went bankrupt 28 years ago and the format description was only available internally. The format was used for tens or hundreds of thousands of documents that are still residing in old archives around the country. If someone needs to access one of these documents, how would they know that they could come to me for the format description? I have asked a few of my co-workers from the early 1980s if they have kept the specification document; I haven't met one who has. It could be that my copy is the last one in existence (at least in an immediately available form - the electronic version of the specification was written in the format it describes).
So your requirement about documenting the format is satisfied. When you need to decode a document file in that format, just come to me. Problem solved. Or...?
|
|
|
|
|
I catch what you are saying; it just seemed to me that, in the context of the original question, the discussion veered. No harm, no foul.
Charlie Gilley
Stuck in a dysfunctional matrix from which I must escape...
"Where liberty dwells, there is my country." B. Franklin, 1783
“They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759
|
|
|
|
|
We do not know how WaitForSingleObject is implemented. We are just relying on the MSDN documentation.
The documentation says that if I pass INFINITE as the timeout value, the function will block indefinitely until the object is signaled.
But when I checked the #define value of INFINITE, it is 0xFFFFFFFF.
So we pass 0xFFFFFFFF to WaitForSingleObject as the timeout. That is equivalent to about 49 days.
So my question is: does WaitForSingleObject time out after 49 days if INFINITE is passed?
Prafulla Vedante
modified 8-Jun-20 13:26pm.
|
|
|
|
|
Without testing it, I'd say it won't time out. Some value has to be used for INFINITE, and the maximum unsigned value is a good choice. Writing the code so that it never expires isn't difficult; it's not as if something is going to wake up and decrement it every millisecond.
|
|
|
|
|
0xFFFFFFFF is a value which is literally NOT "passed" as a timeout; it acts as a sentinel. It is something like the zero ("0") a user can select/type in a BIOS setting which, under the advanced CPU settings for example, signals that ALL cores are to be used (say there are 16 cores). Which is not to say that 1, 2, 3, 4, etc. cores cannot take zero's place as the specific input value.
It seems contrary to common sense that a real value can be used to represent a ceiling or a floor in this way, but where that is the case, there is generally a typed message/note about the substitution next to such a control.
|
|
|
|
|
Quote: But when I checked the #define value of the INFINITE, it is 0xFFFF. Probably you meant 0xFFFFFFFF.
That said, the documentation is clear: if you pass INFINITE (0xFFFFFFFF) then the function shall wait forever (not 2^32-1 milliseconds). So it just depends on how much you trust the documentation.
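For what it's worth, a tiny console sketch like this (the event and the 2-second timeout are just made up for the demo) shows the difference by checking the return value: a finite timeout can come back as WAIT_TIMEOUT, while an INFINITE wait only returns once the object is signaled.
#include <windows.h>
#include <cstdio>

int main()
{
    // Manual-reset event, initially non-signaled.
    HANDLE evt = CreateEvent(NULL, TRUE, FALSE, NULL);

    // Finite timeout: nobody signals the event, so this returns WAIT_TIMEOUT after ~2 seconds.
    DWORD rc = WaitForSingleObject(evt, 2000);
    std::printf("finite wait   -> %s\n", rc == WAIT_TIMEOUT ? "WAIT_TIMEOUT" : "signaled");

    // INFINITE: the call blocks until the object is signaled, however long that takes.
    // Signal it first so this demo actually terminates.
    SetEvent(evt);
    rc = WaitForSingleObject(evt, INFINITE);
    std::printf("infinite wait -> %s\n", rc == WAIT_OBJECT_0 ? "WAIT_OBJECT_0" : "unexpected");

    CloseHandle(evt);
    return 0;
}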
|
|
|
|
|
You are right, there are 8 Fs, as it is a 32-bit value.
Quote: So it just depends on how much you trust the documentation.
Yes. As the API source is not open source, is there any small possibility of it being blocked for only 49 days? Perhaps we could check it by debugging the API in assembly code.
I am just wondering whether Microsoft has passed the timeout value straight to the internal generic logic, considering the user will never keep itself blocked for 49 days ....
Prafulla Vedante
|
|
|
|
|
Generally speaking, I trust their documentation.
|
|
|
|
|
PrafullaVedante wrote: considering the user will never keep itself blocked for 49 days ....
I have. Code starts and a section of it waits for a global shutdown object to be signaled. (That aside, I believe Raymond Chen has confirmed that INFINITE really is infinite.)
|
|
|
|
|
Suppose there are two matrices, a[3][3] and b[3][3]. We scanned the elements into matrix a[][], and b is the transpose of the former. Now,
when we write the code to assign the values, we use
for(i=0;i<3;i++)
{
    for(j=0;j<3;j++)
        b[i][j]=a[j][i];
} // correct
but when we do b[j][i]=a[i][j] it gives a wrong result. Why? In both cases the assignments are the same.
|
|
|
|
|
This should be very easy to trace in a debugger. You don't even tell us in which way the result is wrong, and you do not show the declarations of the two matrices, so there is really not much information on which to base a qualified guess.
|
|
|
|
|
Of course the result must be the same. Try
#include <iostream>
using namespace std;
void transp_v1( char tgt[3][3], char src[3][3] );
void transp_v2( char tgt[3][3], char src[3][3] );
void dump( char m[3][3] );
int main()
{
char a[3][3] =
{
{ 'a', 'b', 'c' },
{ 'd', 'e', 'f' },
{ 'g', 'h', 'i' },
};
char b[3][3];
char c[3][3];
transp_v1( b, a);
transp_v2( c, a);
dump(a);
cout << "--------\n";
dump(b);
cout << "--------\n";
dump(c);
cout << "--------\n";
}
void transp_v1( char tgt[3][3], char src[3][3] )
{
for (int i=0; i<3; ++i)
for (int j=0; j<3; ++j)
tgt[i][j] = src[j][i];
}
void transp_v2( char tgt[3][3], char src[3][3] )
{
for (int i=0; i<3; ++i)
for (int j=0; j<3; ++j)
tgt[j][i] = src[i][j];
}
void dump( char m[3][3] )
{
for (int i=0; i<3; ++i)
{
for (int j=0; j<3; ++j)
cout << m[i][j] << " ";
cout << "\n";
}
}
output:
a b c
d e f
g h i
--------
a d g
b e h
c f i
--------
a d g
b e h
c f i
--------
|
|
|
|
|
But as per my code it's giving a wrong result. What's wrong with my code?
I have tested it many times.
void main()
{
    int a[3][3],b[3][3],i,j;

    for(i=0;i<3;i++)
        for(j=0;j<3;j++)
            scanf("%d",&a[i][j]);

    for(i=0;i<3;i++)
    {
        for(j=0;j<3;j++)
        {
            printf("%d ",a[i][j]);
        }
        printf("\n");
    }

    for(i=0;i<3;i++)
    {
        for(j=0;j<3;j++)
        {
            b[j][i]=a[i][j];
            printf("%d ",b[i][j]);
        }
        printf("\n");
    }
}
|
|
|
|
|
Your code is wrong because you are printing the b matrix while you're updating it (you are printing 'not yet updated' items).
Try
#include <stdio.h>
int main()
{
int a[3][3],b[3][3],i,j;
for(i=0;i<3;i++)
for(j=0;j<3;j++)
scanf("%d",&a[i][j]);
for(i=0;i<3;i++)
{
for(j=0;j<3;j++)
{
printf("%d ",a[i][j]);
}
printf("\n");
}
for(i=0;i<3;i++)
{
for(j=0;j<3;j++)
{
b[j][i]=a[i][j];
}
printf("\n");
}
for(i=0;i<3;i++)
{
for(j=0;j<3;j++)
{
printf("%d ",b[i][j]);
}
printf("\n");
}
return 0;
}
|
|
|
|
|
b[j][i]=a[i][j];
printf("%d ",b[i][j]); So you wanted to print the value of b[i][j] after you updated b[j][i]?
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
|
|
|
|
|