I would guess that something, more like SED, is what you need. As I remember it from way back (many years), it would search for differences down to single characters. Its name means Stream EDitor.
I'm not too sure whether this would work or not, depending on the implementation. Usually DIFF algos are based on line differences and the whole line is marked as different. Depending on how large the files are, this could take a lot of time (order O(nm)). The method I developed talking to Skippums (in the referenced thread) speeded this up with some tricks.
Later, Skippums had another problem. He wanted an archive method for huge files (exceeding 512 MB) where the files were prior versions of documents, or programs, etc, etc - what it amounted to was he needed a small SED type of file created in order to save only the changes. Once you get into massive files, comparisons like this are extremely difficult and time consuming. I developed a method to minimize this time and was testing this. I asked for a sample file or two of this massive size (I could only come up with 9 and 11 MB libraries). I had sent him this data (the algorithm - my code was in MASM) and he finally responded to my Email about a month later explaining that he had been on vacation and would get back to me That was in July. I have checked his message log, and that is about when his last message was posted here.
I have also worked with DIFF algos.
If any of this algo info would be of use to you, I would be happy to help you.
There is a question I have been wondering about for ages and I was hoping someone could give me an answer to rest my mind.
Lets assume I have an input stream (like a file/socket/pipe) and want to parse the incoming data. Lets assume each block of incoming data is split by a newline like most common internet protocols. This just however just as well be parsing html/xml or any other smart data structure. The point is that the data is split into logical blocks by a delimiter rather than a fixed length. How can I buffer the data to wait for the delimiter to appear?
The question seems simple enough: just have a large enough byte/char array to fit the entire thing.
But what if the delimiter come after the fixed size buffer is full? This is actually a question about how to fit a dynamic block of data in a fixed size block. I can only really think of a few alternatives:
1) increase the buffer size when needed. This may require heavy memory reallocation, and may lead to resource exhaustion from specially crafted streams (or perhaps even denial of service in the case of sockets where we want to protect ourselves against exhaustion attacks and drop connections that try to exhaust resources...and an attacker starts sending fake, oversized, packets to trigger the protection)
2) start overwriting old data by using a circular buffer. Perhaps not an ideal method since the logical block would become incomplete
3) dump new data when the buffer is full. However, this way the delimiter will never be found so this choice is obviously not a good option
4) just make the fixed size buffer damn large and assume all incoming logical data blocks is within its bounds...and if ever full just interpret the full buffer as a logical block...
In either case I feel we must just assume that the logical blocks will never exceed a certain size...
Any thoughts on this topic? Obviously it must be a way since the higher level languages offer some sort of buffering mechanisms with their readLine() stream methods.
Is there any "best way" to solve this or is there always a tradeoff? I really appreciate all thoughts and ideas on this topic since this question has been haunting me everytime I have needed to write a parser of some sort.
You're right reallocation can be very costly, one solution I've seen to be quiet effective is to have a memory pool (set of buffers of common length) and have your stream maintain the chaining, as more memory is required (using a linked-list like approach) chain-in more blocks of memory from the pool.
Finally (socket example) when the frame, packet or token has arrived, provide per item(char usually) iterator access to the chain of buffers so as to make the chain transparent to the end user.
This obviously has its limits as well, specifically the largest block you can read will be the size of the pool of memory, however due to the design choice, if you ever reach that limit all you need to do is create more memory blocks in the pool which is much cheaper than realloc'ing everything accumulated so far.
It sounds like you need to make a policy decision that's algorithm-independent: How large can a block be before you decide it's a denial-of-service attack?
This is domain-dependent. It depends on the knowledge you have of blocks' contents from your particular domain, and on the resources of your system. If you have lots of memory, you can afford to accept a big block on the chance it may be real.
The memory-reallocation approach wastes time because each reallocation requires you to recopy all the data received for that block so far. This starts to approach an O(n^2) running time for what should be a linear algorithm.
It seems like a buffer pool would use up memory waiting for big blocks that may never come. It would also limit the maximum size of a block you could accept.
The linked-list approach is the standard way of dealing with this type of situation; don't preallocate anything, and dynamically allocate blocks as you need them.
I know that I am opening Pandora's box, and I fully expect to be met with an angry hoard of rabid Bats, but I have an idea for constructing a convex hull from a data set. What I was looking for was a "Best" algorithm (here come the Bats) so that I might code it as a baseline to see if my algorithm is any faster. I was actually looking for triangulation, but wanted to start with the convex hull first.
I have already googled convex hull, but there is not too much info about relative performance.
The data will be just 2D (x,y) where there will be 64K-1 unsigned shorts for both x and y, no repeats (randomly select an integer from the set of (1 to (64k-1)), then remove the integer from the set, decrement the set size, and randomly select another entry from the remainder of the set). x and y will be independently randomized. The x y pair will not both be equal (swap the y value with the immediately following or previously defined pair whichever does not violate x!=y for either resultant pair). The set of points will be saved on a file as binary pairs, each algorithm will then be used to read the points and create the convex hull.
I will code my algorithm and the base algorithm in MASM, single threaded (the method should be able to be implemented as multi threaded, but for simplicity and timing, I wanted single threaded). I will use the MASM32 timing macros to set the CPU into Real Time Priority to try to get as accurate a timing as possible.
I'm not looking for code, just a detailed algorithm which would allow me to implement this sought after "Best" algorithm to use as a baseline.
Anyone have a guess how long this 64K-1 set might take to encircle? Should I reduce the size of the set?
Yes, I have read all of the CP articles and Wikis related to "Convex Hull" and read all of their references, but there is not too much reference to relative speeds under the same set of conditions, the touted advantages are mainly about how the algorithm extends to three or more dimensions, etc.
My attempt will be along the lines of "divide and conquer" with several twists I dreamed up (literally - went to bed thinking about the problem and woke up with what I thought was a good solution).
Sorry for the double post, but please forgive me. I did not thank you for your reply (I did vote you for a good answer to my question). I actually had expected you to be the first to respond, and my initial reply was a attempt to be humorous (think back to 1941 and "Casablanca").
The limit for a particular class of functions or an arbitrary function?
If you want to enter any given function and return the mathematical limit, then you'll need a parser and some kind of symbolic math engine in order to carry out operations like L'Hopital's rule, etc...
I am wondering what kind of answer you are looking for. Are you looking for a numerical answer such as 3.14159 or a symbolic answer like PI? Also, do you want this function to work any function? For example the limit as x goes to 0 of log(x) is not defined. I believe I could give you a better answer if I had a better idea of the real problem you are solving.
If you already have some for drawring, you should be able to adapt that. Lim x->c = slope of a function at point c, right? (change in y over change in x) If you add some boundary checking, you might come close without too much code required.
I think you need to be more clear about what, exactly, you want. For certain classes of functions calculating the limit is quite involved and requires differentiation. If you want to do this for any input function then you'll need to get into symbolic manipulation. That's a project in itself. Calculating mathematical limits is not an easy task.
Thank you all
In fact yes, I want to write a program for calculating all limits, for example Lim(sin(2/x)) when x->0, or even x->Infinite
so I must write a parser I think
but maybe I can find something that helps me how to start!
The problem is not just the parser. You will also need both a symbolic manipulation and numeric component. The symbolic package will be needed to calculate derivatives for certain limits that require l'Hopital's rule[^]. Other methods used in computer-based limit calculations are Wynn's epsilon algorithm[^] and generalized Euler transforms.
As you can see, it is a non-trivial project and requires an understanding of some rather advanced mathematics.
Yeah, but don't black bodies absorb and then re-emit photons of the same frequency? Melanin absorbs UV and then re-emits about 99.9% of it as infra-red. How does it do THAT? Where does the rest of the energy go?
Richard A. Abbott wrote:
Are you querying this because of interest in Anti-Cancer agents or ???
If I can remember back 40 years to my Chemistry/Physics classes, the energentic UV hits the atoms of the Melanin and causes the electrons to jump several states (all jumps are discrete jumps of energy levels), and then the electrons drop back down one level at a time releasing the energy in the form of IR (lower frequency, lower energy). There is no extra energy, but the IR will cause an increase in temperature in the local area.