Posted 24 Sep 2006
Comments and Discussions
I read all the comments following Damon Carr's post. Very interesting, the conversation between people from different backgrounds.
After all, I always think that old technology is not necessarily bad technology, and new technology is not necessarily a more powerful silver bullet. I do agree that the "standard compression" suggested by Damon might be more appropriate, depending, of course, on more unspoken "factors".
It is always risky to put a lot of programming effort on top of an architecture not intended for the use case. This could be considered hacking the architecture, comparable to over-clocking the CPU.
I don't deny that sometimes we need to hack to make things work (because of a deadline, or a lack of knowledge of alternative architectures), but we need to keep in mind that we are hacking.
For articles published on CodeProject, I wish the authors who publish their work, in particular programming work, would talk about more "factors", such as use cases, limitations, and how to choose or balance between alternative solutions.
Zijian
Hi Zijian
The issue with using "standard compression" for remoting (I'm assuming you mean by creating custom sinks) is that it can only be applied *after* the object graph has been serialized.
It will therefore help to reduce the amount of information that is *transmitted* across the wire, but it won't help with memory usage/fragmentation prior to transmission and it won't help with the speed of the serialization process - in fact it will add an overhead, both time-wise and memory-wise. It also won't help if the serialization process itself throws an OutOfMemoryException.
I don't believe it is a hack at all. Microsoft supplies the ISerializable interface specifically to allow the developer to store one or more data items, each with a string tag.
All this code is doing is storing a single data item with a string tag - a byte[] which contains a binary representation of all the data items combined.
Therefore it is still well within the contract defined by ISerializable and works transparently with respect to the remoting process. I don't agree that it is a new technology from the remoting point of view - that stays *exactly* the same; the only difference is the data supplied to that technology.
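As a minimal sketch of that pattern (the class and field names are invented, not code from the article), an ISerializable class can pack all of its state into one tagged byte[] like this:

using System;
using System.IO;
using System.Runtime.Serialization;

[Serializable]
public class Quote : ISerializable
{
    private string symbol;
    private decimal price;

    public Quote(string symbol, decimal price)
    {
        this.symbol = symbol;
        this.price = price;
    }

    // Called by the formatter during serialization: everything is packed
    // into one byte[] and stored under a single string tag.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        using (MemoryStream ms = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(ms))
        {
            writer.Write(symbol);
            writer.Write(price);
            writer.Flush();
            info.AddValue("data", ms.ToArray());
        }
    }

    // Deserialization constructor: unpack the single byte[] back into fields.
    protected Quote(SerializationInfo info, StreamingContext context)
    {
        byte[] data = (byte[])info.GetValue("data", typeof(byte[]));
        using (MemoryStream ms = new MemoryStream(data))
        using (BinaryReader reader = new BinaryReader(ms))
        {
            symbol = reader.ReadString();
            price = reader.ReadDecimal();
        }
    }
}

The remoting infrastructure never sees anything other than a normal SerializationInfo entry, which is the point being made above.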
Cheers
Simon
Hi Simmo
Well, I read all the comments and I just thought I'd drop you a note to say thank you for the excellent and (to me) relevant code. I work in the financial industry, and am specifically involved a lot with trading systems and internet trading systems for stock exchanges and futures exchanges. Every byte shaved off the data, not to mention improvements in serialization speed, is extremely important. With your serialization technique I was able to shave 40% off the size of our quote messages (90% of the messages we send). While this might not appear to make a major difference when the packets are only around 180 bytes in size, to us, living in a country where bandwidth is still an extremely expensive resource, this is major. It gives me extremely happy clients, as they do not need to upgrade expensive bandwidth.
So, my 2 cents for the current thread is that for a lot of scenarios this is indeed extremely relevant code (and no, we are not using DataTables). If you take a look at the FIX protocol, even they are using 7-bit encoding for their new FAST protocol.
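For anyone unfamiliar with 7-bit (stop-bit) encoding, here is a rough sketch of the idea in C# - the names are mine and FAST's actual wire format differs in detail, but the space saving comes from the same trick of using only as many bytes as the value needs:

using System;
using System.Collections.Generic;

static class SevenBitEncoding
{
    // Encode an unsigned integer using 7 data bits per byte; the high bit
    // of each byte signals that another byte follows (varint style).
    public static byte[] Encode(uint value)
    {
        List<byte> bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)(value | 0x80));
            value >>= 7;
        }
        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // Decode by accumulating 7 bits per byte until a byte with the high bit clear.
    public static uint Decode(byte[] data)
    {
        uint value = 0;
        int shift = 0;
        foreach (byte b in data)
        {
            value |= (uint)(b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) break;   // high bit clear => last byte
        }
        return value;
    }
}

// Example: 300 fits in two bytes (0xAC, 0x02) instead of the four an Int32 would take.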
Drop me a mail and I will send you a nice bottle of wine as a thank you.
Cheers
Andre
Hi Andre
Thanks for your comments and thanks for the offer of wine, very much appreciated but there really is no need - just happy that someone is making use of the code.
Cheers
Simon
Zijian,
If I had half the diplomatic skill you have, my post would have been almost exactly the same. But I was working on 48 hours with no sleep.
If I only had one point it would be what you said:
For articles published on CodeProject, I wish the authors who publish their work, in particular programming work, would talk about more "factors", such as use cases, limitations, and how to choose or balance between alternative solutions.
Unfortunately our industry still operates on the 'Frederick Taylor' scientific-management model and the 'economies of scale' model, with the belief that 'more programmers on a slipping project means more work is done. No process change required. Waterfall has always "sort of" worked for us since the 70s.'
We all know that is untrue. Work is often segmented to individuals who operate in a vacuum, and people are put in a box.
'Oh, Jim is our database guy. You have to talk to him'.
'Oh, Mary is our Reflection expert. I cannot help you with that code'
etc.......
It's all creating waste.
Rarely do all team-members get to participate in all aspects (Architecture, Design, Iterative Development, etc.). THAT is what my firm does (and many, many like mine). There are no junior people. We all do everything. And we kick ass doing it time and time again (OK self-flattery I know).
For larger firms, I always see so many isolated 'cubicle programmers', as I call them, who could care less about the 'big picture', instead just hammering out code: no daily stand-up, no pair programming or collective ownership, no TDD, no Continuous Integration, VERY long build cycles instead of multiple a day, QA not involved for months after code is written (the most common), and dev 'throwing code over the fence' at the end of a cycle to QA, creating weeks of wasted re-work. The customer often gets involved only here and says: 'Well, I know we said X, but now I see this, I think Y would be better. The test works, but fail it and send it back to dev. We need to change the requirements.'
All of these are disastrous. Our industry is like the caveman era for large corps where software is not a revenue source. It is a cost center and treated with massive disdain.
Want a solution? Look at Toyota! We are so arrogant in the US. Very few have the guts to stand up at the C-level and say ENOUGH! We are losing millions here! The CEO says 'but it is technology... I don't know how to fix it', and the CTO is probably some crony of the CEO, so he spins it, and the CEO buys it by mentioning some 'silver bullet' of the moment, which typically fails.
Just read the two books by the Poppendiecks to see why... Start with the first, then the second. If you don't learn this stuff, prepare to be a cubicle programmer forever with a commensurate salary/compensation.
For example, do you follow a waterfall method? Large up-front requirements? A static project plan? God, is this new to anyone? This has been common sense since, like, 2002!
You will almost certainly fail if so...... Not my words... Studies have proven this (I can provide them if anyone likes). My book is largely based on these facts (not opinions).
I am almost embarrassed to be involved in this, but at least I am part of the solution (or so I tell myself).
Thanks,
Damon
To jump in here...
1. It's difficult to criticize a design without knowing the history that led up to the design decision.
2. The real point of this article and my article that the author referenced is that Microsoft gives you the illusion that you just slap a BinaryFormatter onto a DataSet or DataTable and you're done. If you want to criticize anybody for design issues, I'd criticize Microsoft for creating a framework that makes it so easy to do serialization the wrong way and for the wrong reasons.
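To illustrate the illusion (a minimal sketch, not code from either article): this compiles and "works", and nothing about it hints at how large or slow the resulting stream can be for a big table:

using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class SerializationIllusion
{
    // Two lines and the DataTable is "serialized" - the hidden cost in
    // size, speed and memory only shows up later, at scale.
    static byte[] Serialize(DataTable table)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, table);
            return stream.ToArray();
        }
    }

    static DataTable Deserialize(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream(data))
        {
            return (DataTable)new BinaryFormatter().Deserialize(stream);
        }
    }
}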
1) Cost of Incremental Development and Maintenance. Remember the majority of costs for a system are in the maintenance AFTER release, not development. Most systems die a slow and painful death due to entropy and code like you advocate here which nobody can figure out (when the rookies come in to maintain the thing).
This is a most dangerous statement. It certainly reflects reality. I was essentially booted off a project because my code needed to be "dumbed down" for exactly the reasons you state--the rookies couldn't figure it out. But it's a dangerous statement because it implies that complicated, difficult to maintain code is not acceptable. Sometimes complicated code, or more precisely, the complicated issues that require the complicated code, are unavoidable.
In fact, what really surprised me was that most people seem to view layers of abstraction, declarative programming, and other techniques - which I personally view as promoting maintainability - as making the code more difficult to maintain! It makes no sense to me.
DamonCarr wrote: I am an Agile process leader
Then you of all people should appreciate that the real issue is not the complexity of the code but the quality of the infrastructure supporting the code--the documentation, change logs, ATP's, unit tests, and so forth. Given a quality infrastructure, yes, a rookie should be able to maintain code to any level of complexity.
Yet in my experience, that's the area ignored in the cost of development. If the development cost accurately reflected the cost of documentation, testing, test plans, procedures, tools, etc., then the maintenance cost wouldn't be burdened with the costs of an incomplete development budget.
Marc
 Marc,
I love your points. I agree with almost all of them.
To jump in here...
1. It's difficult to criticize a design without knowing the history that led up to the design decision.
Agreed. That is why I repeatedly asked for more information on what led to the design. Nice point, but I tried my best not to do what you have described.
2. The real point of this article and my article that the author referenced is that Microsoft gives you the illusion that you just slap a BinaryFormatter onto a DataSet or DataTable and you're done. If you want to criticize anybody for design issues, I'd criticize Microsoft for creating a framework that makes it so easy to do serialization the wrong way and for the wrong reasons.
Well... I would say it is a poor design (again, depending on (1) above) to serialize those types... But that is just me... Microsoft HAD to make this work. They are large types because they are amazingly powerful.
Most people doing read-only work should get a DataReader and move the info into domain classes stored in a generic collection or other serializable collections (like HashSet from Iesi.Collections - another plug for NHibernate), with the domain types defined by an interface or abstract base.
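A rough sketch of that approach (the table, column and class names are made up):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

[Serializable]
public class Trade                      // small domain class instead of a DataSet row
{
    public int Id;
    public string Symbol;
    public decimal Price;
}

public static class TradeRepository
{
    // Read-only query: stream rows through a DataReader straight into a
    // serializable generic collection of lightweight domain objects.
    public static List<Trade> LoadTrades(string connectionString)
    {
        List<Trade> trades = new List<Trade>();
        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand(
                   "SELECT Id, Symbol, Price FROM Trades", connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Trade trade = new Trade();
                    trade.Id = reader.GetInt32(0);
                    trade.Symbol = reader.GetString(1);
                    trade.Price = reader.GetDecimal(2);
                    trades.Add(trade);
                }
            }
        }
        return trades;
    }
}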
1) Cost of Incremental Development and Maintenance. Remember the majority of costs for a system are in the maintenance AFTER release, not development. Most systems die a slow and painful death due to entropy and code like you advocate here which nobody can figure out (when the rookies come in to maintain the thing).
This is a most dangerous statement.
Like Darwin's? That was called dangerous. It still doesn't make it false or irrelevant (not that I mean to imply you are saying that).
It certainly reflects reality. I was essentially booted off a project because my code needed to be "dumbed down" for exactly the reasons you state--the rookies couldn't figure it out.
I have almost left this industry for this reason. I fundamentally do not believe in 'entry level' developers except for the most irrelevant projects. Give me 4 superstars and I will blow away 50 mediocre people any day (depending on (1) above - ha ha).
But it's a dangerous statement because it implies that complicated, difficult to maintain code is not acceptable. Sometimes complicated code, or more precisely, the complicated issues that require the complicated code, are unavoidable.
Kent Beck says he is often criticized for this. He says: NO! Don't make the code simpler than it needs to be, make it AS SIMPLE AS IT MUST BE TO MEET CURRENT AND FUTURE CHANGES! (OK, I added that last bit but it is the same idea.)
If someone does not know Design Patterns and other techniques (plug-in architectures using reflection, etc.) then they should:
1) Learn - buy 'Head First Design Patterns' and read it, for god's sake! Buy 'Refactoring' and 'Refactoring to Patterns' (Fowler/Kerievsky). I have a full reading list on Amazon for .NET people; many others do as well. The best developers are up to 28x better than the worst. Where are you, reader? I only hire those top people; I make a massive return and they get paid hundreds of thousands a year. You want to make $40,000 a year? Want to go home, have a beer and watch American Idol? See 2. I will profit from your lack of interest.
2) Find a new profession
3) Work for the Government (not NASA please or any other area where I could die)
In fact, what really surprised me was that most people seem to view layers of abstraction, declarative programming, and other techniques - which I personally view as promoting maintainability - as making the code more difficult to maintain!
See above. Abstract or go home. SERIOUSLY! We need a zero tolerance policy for this crap! Why will it never happen? The people hiring them are even stupider.
It makes no sense to me.
Nor me... That is why I come in and charge insane amounts to usually be told 'we cannot change like that. It is too hard'.
Fine by me. I still get paid and you still lose another $10,000,000 a year in wasted dev projects. Oh, and I know to short your stock.
DamonCarr wrote:
I am an Agile process leader
Then you of all people should appreciate that the real issue is not the complexity of the code but the quality of the infrastructure supporting the code
OK, here we diverge a little. It's not the complexity, but the NECESSARY complexity. No more. In other terms, as simple as possible but no simpler (as I believe I said before). This article presented (see (1) for a caveat) what some might see as a license to violate this.
THIS IS HOW THIS ALL STARTED! Now it is a full-on discussion and I love it.
--the documentation, change logs, ATP's, unit tests, and so forth.
See my article on this in 'Agile Development'. I am with you here (although it depends on what you mean by documentation)..... For me I need:
1) Test Driven Development
2) Continuous Integration
3) Nightly full-Automated System Regression Testing (like AutomatedQA)
4) Daily Stand-Up
5) Short (1-2 week) iterations
and here is where I start to break away:
6) An obsession with Design Patterns as the iterations evolve. We call this 'Pattern Hunting'
7) We reverse engineer the code into UML diagrams at the start of an iteration, look for patterns, play with ideas, and start TDD, eventually throwing away all UML diagrams. They are just a base of reference and an 'AH HA! Here is a place where a Command pattern would help!'.
TDD is the 'architect' of a system for lack of a better term, not large up-front UML (and I know you never said that).
8) MASSIVE customer involvement
9) 'Iteration Planning' and ORM 'index card' style thinking
10) LOTS of white boards everywhere
11) I know I am forgetting at least one thing......
Given a quality infrastructure, yes, a rookie should be able to maintain code to any level of complexity.
Absolutely, positively not true. Here is my main disagreement with you.
We cannot compromise our profession to (as people would say here) the 'lowest common denominator', which is an idiot in most cases. Do this and all is lost.
Instead? REQUIRE ALL DEVELOPERS TO 'RAISE THE WATER LEVEL'. How? I certainly do not know. I know Brainbench has a Design Patterns certification, which is probably the single most important thing I can think of for a developer to master.
Often my first interview question: Name 3 Design patterns and how you used them to make your software more flexible to change. Fail on that? Interview is over.
90% fail.
Second question:
What are the two main kinds of types in .NET and how are they managed differently in terms of data structure (reference types on the heap, value types on the stack)? Another 6% fail. That leaves 4%. I am lucky to hire 1 (if 100 were starting). Bonus points for mentioning the large object heap (almost nobody does).
So we have a failure of: 1) OO principles and practices that are over a decade old, and 2) platform knowledge. And I have like 18 more questions!
No wonder Microsoft and Google get the best people!
Assume we receive a new requirement which fits perfectly into the existing Decorator we have set up.
The 'rookie' has never even heard of the Decorator. So he will F**L things up. Again, Michael Feathers wrote the absolute definitive book (Working Effectively with Legacy Code) on this (although he is far more diplomatic than I).
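For anyone who hasn't met the Decorator, a minimal sketch (the pricing example is invented, not from any project discussed here):

using System;

public interface IPriceCalculator
{
    decimal GetPrice(decimal basePrice);
}

public class StandardPricing : IPriceCalculator
{
    public decimal GetPrice(decimal basePrice) { return basePrice; }
}

// Decorator: wraps any IPriceCalculator and adds behaviour around it.
public class DiscountDecorator : IPriceCalculator
{
    private readonly IPriceCalculator inner;
    private readonly decimal discount;

    public DiscountDecorator(IPriceCalculator inner, decimal discount)
    {
        this.inner = inner;
        this.discount = discount;
    }

    public decimal GetPrice(decimal basePrice)
    {
        return inner.GetPrice(basePrice) * (1m - discount);
    }
}

// A new pricing rule fits by writing one more decorator and adding it to the
// chain - no existing class changes:
// IPriceCalculator calc = new DiscountDecorator(new StandardPricing(), 0.10m);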
Hell, most NEW projects are starting out by writing legacy code!
Yet in my experience, that's the area ignored in the cost of development.
Yes... Managers need to hire FAR better people and FAR fewer people. But how? They need to:
1) Be amazing themselves or
2) Have help from someone amazing in the hiring process
3) Read and re-read the book 'Facts and Fallacies of Software Engineering' by Robert Glass.
If the development cost accurately reflected the cost of documentation, testing, test plans, procedures, tools, etc., then the maintenance cost wouldn't be burdened with the costs of an incomplete development budget.
They may never. Can you imagine going to get money approval for a $10,000,000 project (3 years) and saying 'well it is now $100,000,000 over 15'?
It's like political figures with a constituency, CEOs, etc. Deliver short term and get as much (which is not much) strategy into your legacy. That is why CTOs are the least likely to become CEO and the most likely to be fired.
Thanks,
Damon Carr
What I've done:
I read the article and the ensuing discussion in full... I did not download the code and play with it, but I did read over it enough to understand what it's doing.
What I still don't understand:
Why was this developed? What is the problem this is solving? Optimization, in general, is done to create performance gains in a situation where performance is lacking, or is the major failing point of an existing stable system.
What is the system? Could someone please describe, in simple terms, the system in which this code is playing a part? Could someone also describe how the standard serialization techniques would be applied, and then compare and contrast it with this solution?
Could someone describe how the standard serialization scheme created a performance problem which was best solved by this optimization?
I apologize if I missed where this was already clearly explained.
The reason I ask, is that this seems like an interesting topic, and the discussion has some obviously experienced and capable people involved, and if I was able to put it in some kind of concrete context, I think I would get a LOT more out of the discussion as a whole.
As a side note, this might also be very useful to me in the near future, as we're currently working on a data server architecture for a distributed computing system which will involve extremely large object counts, as well as large individual object sizes in some cases, being shuffled around between various systems depending on what work needs to be done on the object at the time...
Thanks,
Troy
Troy,
My stack pointer is at an invalid memory location for my Win32 process and about to try to jump to pop the stack and execute the pointer for a routine that is supposed to be there..... - GRIN.
I hope, on behalf of the small group, you can see that EVERYONE is right from the correct perspective. Nietzsche helped us understand that there is no one universal right and wrong vantage point. It is dependent on the observer, their own mental filters, and the specifics of their concerns. Listen to the reports of a crime from 10 suspects and you will see just how important and powerful this is.
We can say things like 'rape is wrong and we need laws to prosecute those who commit it' and who would argue? Not me….
However, let’s take the moral and legal points out for a moment and ask ‘Why do men rape?’ Actually, people already have. Some very smart ‘evolutionary psychologists’ (a field I am deeply interested in as a way to gain a better understanding of ‘human nature’) wrote a book on the idea that JUST SAID MAYBE this was an evolutionary trait that evolved as a mating option for those with no other. Does that make it MORALLY right? Hell no.
Does it help us deal with this profound problem? Absolutely it does, as it would lead to a better understanding of rape prevention and treatment for those who perform these vile acts.
But you might know what happened to them. They were attacked viciously (as I was at least once here).
Our society says to scientists:
1) You Can work over here
2) But don’t even THINK about going there (as in ‘The Bell Curve’, which tried to attribute certain traits (intelligence being just one of many) to ethnic groups). They were also attacked viciously, as it went against one of the largest false statements ever made: ‘All men are created equal’.
Yeah? Then what about the twin studies? What about children from highly intelligent parents who are adopted? Yet statistically they are FAR more likely to carry on that intelligence. I could go on and on….
JUST FOR RAISING THE INTELLECTUAL ARGUMENT they (and hundreds, even thousands, of scientists who choose to study 'unfavorable' subject matter) are prosecuted, jailed, exiled, even killed for their work. Am I likening myself in some grandiose way to a scientist? Well, I am a scientist. So are all of you. I just don’t have a PhD (grin).. Seriously, we all must not allow our own baggage to keep us from an objective view of new ideas.
These scientists were destroyed in the media and, as far as I know, lost their careers or significantly damaged them (anyone? I am not sure…). Did they deserve it for just doing what they are supposed to do? Hell no… NOTHING is beyond deep examination at any time, especially that which you hold most sacred and truthful.
My point? They were just scientists investigating a hypothesis and were doing exactly what they should; screw the 'taboo' nature of the subject. Religion teaches us not to lie, yet it practically discourages scientific discovery of the truth. So religion and science are aligned in one way, yet science often proves religion to be incorrect in 'faith'-based beliefs.
My god the Pope even now says Darwin had it right, however the instant we became conscious, it was a divine act. Hmm.. Interesting that of the millions of species that evolve in exactly the same way we did, we are the only ones.. But I have nothing against religion (OK I do but it is not relevant here).
How did we now get to religion? Just stay with me…
Religion demands we not lie and seek truth. You may not immediately see my larger point here. Both institutions (Science and Religion) are correct and form a kind of balance. People need faith based ‘supernatural’ beliefs as humans. We now know that. However science is destroying these ‘supernatural’ religious beliefs one by one (for all religions).
Science proved the ‘Shroud of Turin’ to be a fake. Yet it is still proudly displayed and worshipped because WE NEED FAITH.
We even have the big bang figured out to something like everything after 10 to the -100000 seconds. It is predicted we will be able to create ‘carbon-based life’ in the same form as the first life on Earth in labs within 10-25 years (or less), and the ‘singularity’, where computers exceed human intelligence, is likely in the next 20-50 years.
Both have a place in society, just as all of the opinions here are valid.
So I may appear to be asking others to agree with me in my posts (and in a way I would be lying to myself and my ego if I said it wasn't a nice thought), but what I am REALLY doing instead is ASKING THEM to try to see my 'observer's' perspective and to try to view this problem from an opposing (but no less valid) viewpoint.
I think if you were to say 'He was right here and he was wrong here' it would be a negative and possibly destructive way to move forward. There is no way one person can say this as a kind of ‘universal truth’.
All I can even say is 'based on the work of others and based on my experiences, this APPEARS to have been a bad implementation'. And even then ALL I AM ASKING FUTURE AUTHORS TO DO is to first write a disclaimer as such in similar situations:
1) "The techniques provided in this article are not generally recommended as a 'first line of attack'. Instead this is a solution when you are not faced with any other alternatives but to get yourself out of a jam you likely didn’t create. To START a design using this work would be a mistake as Optimization should be left to the end of an iteration. Optimize last, never before there is a very good business driven reason and be sure it is not caused by a flawed design/architecture first if you have the luxury to revisit it. Also remember solutions in software are not ‘waiting to be found’ as Michelangelo said of the people ‘trapped’ the huge slabs of marble he simply ‘set free’. There is no one person; there are hundreds, all with Greek-God like pros and cons. Your job is to help find the best balance.
How do you do this? The only way in my 16-17 year career, and decades of obsessive reading, is through an iterative process. In other words, you cannot know what you want before you start so don’t even try. Software is what is known as a ’Wicked Problem’.
I could go off here on another tangent. Just please (unless you already know the idea) educate yourself on what is the central principle in software development that has made our industry a kind of joke. We have insane losses and a miserable track record and it is not improving.
http://en.wikipedia.org/wiki/Wicked_problems
All points (from what I have heard), especially the gentleman who likened me to a movie's idiotic middle manager, have merit.
He was just expressing his ‘perspective’ that I was full of it (and perhaps I am).
Hell, I even emailed that person directly to see if I could learn more from them. LEARN! I thought he could teach me something about myself and help me improve how I communicate.
I would not have been angry, only interested in their life experiences that would make them read my writings and think what they concluded. THAT interests me, not any absolute 'right and wrong'.
Others have already benefited from this article, so from a utilitarian perspective, perhaps THAT moral framework calls it a success. In my framework, it is far more complex (and I could argue even the Utilitarian model fails here). I would look at all of the POTENTIALLY negative forces the article would move people towards.
What I've done:
I read the article and the ensuing discussion in full... I did not download the code and play with it, but I did read over it enough to understand what it's doing.
That is excellent. But I would only humbly ask you not to attempt to provide 'judgment' or 'right and wrong' here. It does not exist. We are all right and wrong from our perspectives. By sharing mine, I hope some people could see the article in a different light, one that I am paid to represent and one that I would never change, as it is the largest good I can do (far more good than any development role in the big picture on almost all occasions).
Most superstar developers are instinctively against my position here. 10 years ago I would probably have flamed me as well. There is a yin and yang here (sorry, this is sounding more like a Dalai Lama speech than a post, I am starting to realize)…..
I represent the opposing force: I must consider the 3-10 year picture, where the 'superstar' developer(s) will probably be long gone. I represent the client's interests and am paid many orders of magnitude above what a developer is. Why? Because the ROI I provide is many orders of magnitude higher. Am I some tyrant who kills all creativity and advanced code? HELL NO! Code must be as complicated as it must be, no more! And that, in my experience, has been pretty damn complex many times.
However, as a coder (just as much as I was at 26 – I am now 36) and as a previous 'young superstar', I can instinctively feel the pull of the other side – OK, now this sounds like Star Wars (grin)….
The other main opposing force is that of the developer who:
1) Optimizes code before any indication it needs optimizing, because it is challenging and shows his/her peers their status
2) From their perspective this is almost always the 'right thing to do', even though it may create waste and no tangible benefit (NOT what I am saying about the article. That would have to be looked at on a case by case basis).
What I still don't understand:
Why was this developed?
This was clear: The writer had a requirement to serialize very large data (and did not have any ability to re-architect the solution). For this, it appears to have been the ‘least worst option’.
I wrote my comments as I am very alarmed at the undeniable dynamic of the ‘guru developer culture' of 'screw the client', and just coding the coolest (most complex) work possible.
What is the problem this is solving?
Overall time to transfer very large objects via single-machine cross-AppDomain or cross-machine serialization is significantly reduced (like storing that DataSet in the SQL Server session). A bad design, but we all must live with them.
Optimization, in general, is done to create performance gains in a situation where performance is lacking,
I would agree but add: optimization almost always is (and should almost always be) done ONLY after a performance problem has been shown to be a problem. The anticipation of areas to optimize, and the work done BEFORE they occur, are almost always waste. Why? We have many studies that show we are usually wrong (not always). This is a foundation of Agile. Don't optimize until you must, or if you are sure it will be a problem, do it last. Make sure to get it working first, then add small levels of complexity and unit tests (you started with one, remember – TDD) as you refactor your way to NECESSARY optimization.
Also, "code without unit tests - think NUnit or equivalent - is legacy code". Why?
You cannot verify the stability of your code base after any change is made (I am of course speaking of non-trivial apps here), and you almost certainly do not have time to pay a QA individual to manually regression-test the entire system every day (and that is only SYSTEM regression - you still need unit regression).
AGAIN: I highly recommend you all read Michael Feather's book "Working Effectively with Legacy Code". It should be called:
"Being an Excellent Developer: Both on Legacy projects and New Ones - or How not to Code a New Project as a Legacy One"
To most of the readers here, I probably do represent 'the man'. However, I can code at the level of 99% of the people here, I would guess (in C# or Java across all domains, especially large distributed object systems like the one this article is trying to help).
or is the major failing point to a existing stable system.
I believe the author said this was an existing problem with an unfortunate prior architecture that had fundamentally eroded by designing things this way (again, just my 'perspective').
What is the system? Could someone please describe, in simple terms, the system in which this code is playing a part? Could someone also describe how the standard serialization techniques would be applied, and then compare and contrast it with this solution?
Author?
Could someone describe how the standard serialization scheme created a performance problem which was best solved by this optimization?
Easy... It was far too bloated to support the flawed architecture in place (again my perspective) so a kind of 'hack' was required to get around the architectural ignorance that was present I believe before the author became involved. I could be wrong. It would appear however the Author did a damn fine job in the nasty place he found himself in.
I apologize if I missed where this was already clearly explained.
The reason I ask, is that this seems like an interesting topic, and the discussion has some obviously experienced and capable people involved, and if I was able to put it in some kind of concrete context, I think I would get a LOT more out of the discussion as a whole.
I agree. But your 'context' is unlike anyone else's. Your experiences, biases, concerns, etc. make the questions you asked wise ones in my opinion.
What are my thoughts to you? If you have the luxury of architecting this correctly, serializing large single transaction style objects over the wire is a recipe for deep disaster in most cases (and not even my opinion really).
Ask yourself:
1) How are recoveries performed and security enforced?
2) Are these required to be 'guaranteed' in any way if the destination server is down (or overloaded because you are slamming it so hard)?
3) Are there multiple units of work that need to be atomic?
4) Do not rely on sending large data over a network unless you do so in a guaranteed way and usually in a batch style mode, not transactional
5) If you ARE doing ATOMIC work, learn from TCP/IP and other protocols and how they deal with this problem.
6) Design your domain objects with MANY small classes each with very specific and singular responsibilities
7) With the use of Generics, there is little argument among the top .NET gurus (not me) now (as there was before) that sending around DataSets/Types is a legacy concept.
It is just too easy to use generic collections which better represent your domain (where almost all development should be focused anyway) and allow you to easily get around the large DataSet scenario that so many people try to force. It's one of the most common 'short-term' consulting engagements I get asked to fix:
“A web app has moved session state from 'In-Proc' to SQL Server and now, all of a sudden (with a new farm of web servers) the app takes 10 seconds to load a page instead of 1 when only 1 server existed before”
Why? Every user is storing a 100,000-row DataSet in their session, which must now serialize to SQL Server. When it was in-memory all was OK (a bad design from my perspective, but to the business there was no visible problem). This brings home the point: FUTURE CHANGES and entropy kill systems, not guns (grin).
As a side note, this might also be very useful to me in the near future, as we're currently working on a dataserver architecture for a distributed computing system which will involve extremely large object counts as well as large individual object sizes in some cases, being shuffled around between various systems depending what work needs to be done on the object at the time...
Well, this demands the highest levels of .NET expertise and architectural expertise IN GENERAL across all the .NET Distributed Object technologies. I would be happy to offer ideas as I have done on many systems like this (most for global Financial Services firms in New York and London). I am sure others here could help as well.
Remember: There is basically never a 'best' solution for a scenario but there are almost always MANY bad solutions.
A fast and stable serialization and transport system would benefit this project greatly, and the existing systems have already proven somewhat insufficient for our needs.
No, you already possess a 'fast and stable serialization and transport system' in .NET.
This amazing system is called Remoting and represents millions and millions of dollars of investment. What you DO NOT seem to have is an architecture that will use this 'fast and stable serialization and transport system' in the best way for your needs.
Thanks,
Damon Carr
Hi Damon,
Thank you for your lengthy response.
Regarding our current data server design, we've been going with a standard setup of .Net Remoting for transport and serialization for persistence. We have already done a lot of work on that. My main goal was not to think about changing our existing design to implement the custom serialization done here, but rather to gain some perspective on our design by hearing, in detail, the account of why the original poster chose to implement this design.
I agree that there as many ways to skin a cat as there are cats with skin, but as everyone knows, you generally start with a knife and a living cat that you have to first chase down.
On my way walking to the light-rail a couple days ago, I saw something that reminded me (for some reason) of this discussion. On a smallish residential street near the city, I saw a crow sort of dancing around under a walnut tree. The crow was playing with a fallen walnut. He picked it up in his beak, then flew to the top of a light pole. He waited there a moment, then dropped the nut on the ground. He immediately flew after it, and tapped it around a little while on the ground, before picking it up again, and flying to the top of the pole once more. He dropped the walnut again, and repeated the whole process about two or three times, before finally, the walnut had broken open, and he pecked away at the soft nut meats inside.
As I said, I don't know what the relevance is, but I thought I'd share.
Talk to you soon,
Troy
Troy,
The crow is a magnificent illustration.
If you are at all interested in philosophy (as I am, based on my rather long, on-topic-to-me but probably crazy-sounding-to-most posts), we are all living the fate of Sisyphus (SIS-i-fus).
http://www.mythweb.com/encyc/entries/sisyphus.html
Who was he?
In Greek Mythology, a sinner condemned in Tartarus (even below hell - a deep, gloomy place, a pit or abyss used as a dungeon of torment and suffering) to an eternity of rolling a boulder uphill then watching it roll back down again. (then push it back up, just to watch it fall back again... FOREVER!)
Sounds KIND of like the crow..
Anyway, I love the book by Camus as well, called 'The Myth of Sisyphus'.
http://en.wikipedia.org/wiki/The_Myth_of_Sisyphus
Camus undertakes to answer what he considers to be the central question of philosophy: Does the realization of the meaninglessness and absurdity of life require suicide?
Anyway... Back to more positive and hopefully revenue generating topics....
In spite of posts to the contrary, I would never say it is ALWAYS bad to send large objects via serialization. ALL THINGS IN CONTEXT!
So I am willing to help anyone who asks with this domain issue and offer my suggestions should you be so interested. Just please (as you posted as well), will someone provide a REAL example so we can respond, instead of discussing Greek mythology and existential philosophy?
Thanks,
Damon
Hi Troy
The reasons why I need some optimization are mainly detailed in the article (with some additional detail in these comments) - standard .Net serialization took too much time and memory space and could crash with an out of memory exception under certain circumstances.
Let me start by briefly describing how Serialization works (as I understand it)
Any class that will be involved in serialization/remoting *must* have the [Serializable] attribute applied to it - if the .Net serializer encounters a class anywhere in the object graph that doesn't, it will throw an exception.
During serialization, .Net will examine all of the private fields within a class and attempt to store them in a binary stream. If the field is a value type then it is stored directly otherwise if it is an object type then it is added to the object graph and a reference to it stored - this is done so that a given object will only be stored once.
The examination of objects is done via reflection so that .Net serialization can cope with, in theory, any type of object without any prior knowledge of it. Now we know that reflection is relatively slow in comparison to direct field access but for most serialization/remoting work the performance is acceptable and you don't need to write any code to make it work.
There are some options to 'help' the .Net serializer during its object examination. You can apply a [NonSerialized] attribute to any field to indicate it should be ignored, for example, but that's about all without writing some code.
The next level of optimization is to implement the ISerializable interface on your class, for which you will need to implement two methods (well, one method and a special constructor). The GetObjectData method allows you to take over the process and store any data you deem necessary to reconstruct your object in the SerializationInfo object passed into the method. It is like a dictionary in that you tag each of your data items with a string. In your deserialization constructor, you do the reverse and extract your data via its string name. One thing you need to be careful of is that the objects you retrieve are not necessarily populated at this point - so don't try to use them - just store them in your fields and they will be populated once deserialization is complete.
There is also the optional IDeserializationCallback interface which gives you an opportunity to be 'notified' when deserialization of the entire object graph has been completed, ie your objects are now fully populated and usable.
This level allows more control (and possibly some increase in speed since reflection is not used) but involves some manual work on the developer's part. You won't want to do this on all your objects, only those frequently used or that can result in large object graphs.
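A compact sketch of those two members plus the optional callback (the class and field names are invented):

using System;
using System.Runtime.Serialization;

[Serializable]
public class Customer
{
    public string Name;
    public decimal Discount;
}

[Serializable]
public class Order : ISerializable, IDeserializationCallback
{
    private int id;
    private Customer customer;
    private decimal discountFactor;     // derived value - never stored

    public Order(int id, Customer customer)
    {
        this.id = id;
        this.customer = customer;
        discountFactor = 1m - customer.Discount;
    }

    // Each data item is stored under its own string tag.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("id", id);
        info.AddValue("customer", customer);
    }

    // Deserialization constructor: extract items by tag. Object references
    // (like customer) may not be fully populated yet, so don't use them here.
    protected Order(SerializationInfo info, StreamingContext context)
    {
        id = info.GetInt32("id");
        customer = (Customer)info.GetValue("customer", typeof(Customer));
    }

    // Called once the whole object graph is populated - fields are safe to use now.
    public void OnDeserialization(object sender)
    {
        discountFactor = 1m - customer.Discount;
    }
}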
It still has some drawbacks however. Suppose your object has an object[10] as part of its data. You can either store this as a single object[] or as 10 object items. The latter is actually quicker but involves writing a loop, and you will have 10 string names, all of which take up extra space.
My code is a utility (not a design!) to allow further optimization as a replacement for some or all of the code you would write in the previous ISerializable optimization. It allows you to store all of your class's 'owned' data extremely quickly and compactly in a single byte[]. Instead of identifying which data items you want to store and giving them names, you Write them into a SerializationWriter instance and, once all data has been written, you store the resulting byte[] into the SerializationInfo block using a single name. Most data types have some level of optimization, but where it really shines is when you have data whose type is unknown at compile time, such as an object[], in which case it identifies the type and stores it in its most optimized form. The biggest payback, though, is when you can identify certain 'root' classes that encapsulate potentially many other objects (a DataSet for example) - by having a single SerializationWriter instance store data for the whole object graph, you get the advantages of string tokenization across all your objects and just one single, relatively small, byte[] to store.
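As an approximate sketch only (check the downloaded source for the exact API - the method names below may differ slightly), usage inside GetObjectData looks something like this:

using System;
using System.Runtime.Serialization;

[Serializable]
public class Portfolio : ISerializable
{
    private string owner;
    private decimal[] positions;

    public Portfolio(string owner, decimal[] positions)
    {
        this.owner = owner;
        this.positions = positions;
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        SerializationWriter writer = new SerializationWriter();
        writer.Write(owner);
        writer.WriteObject(positions);            // type-tagged, optimized storage
        info.AddValue("data", writer.ToArray());  // one tag, one compact byte[]
    }

    protected Portfolio(SerializationInfo info, StreamingContext context)
    {
        SerializationReader reader =
            new SerializationReader((byte[])info.GetValue("data", typeof(byte[])));
        owner = reader.ReadString();
        positions = (decimal[])reader.ReadObject();
    }
}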
So you are still having to do some manual coding work for optimization but typically no more than if you were using the standard .Net way of using ISerializable and typically on the fewest classes that you need.
In my particular case, DataSets and LLBLGenPro entities/collections (2 completely different projects) gave excellent results for size, speed, and (indirectly) memory fragmentation and network usage. The code only needed to be written once for each type but will work for *any* DataSet we now pass across the network regardless of whether it is small or large or even empty and the same for the LLBLGenPro entities/collections we have now or create in the future because they are all ultimately derived from the same class. Write well once, use many times.
Other posters have found value even when they *know* that the objects they are remoting will always be small - see andreboom's comment about saving 40% even on an object that is around 180 bytes. Another poster needed to serialize web state and found FastSerialization to be much faster than Microsoft's LosFormatter which does a similar thing but only for a few, specific types. v2.1 added support for Surrogate helper classes so that you can move the code for FastSerialization into separate classes to help optimize serialization of classes where you don't have control of the source code - hence a WebFastSerializationHelper sample to get you started.
Damon's main point (there are many!) is that your application should have been designed so that the objects going across the network are small and few - anything else indicates a design flaw. I don't necessarily fully subscribe to this point of view. Yes, by all means reduce what is going across the network, but it's not the individual object size that causes problems, it's the object *graph* size, and that is very rarely predictable. If you let your user enter range criteria, for a set of reports say, then it is very difficult to predict how much data this will actually involve, especially if there are a large number of criteria permutations.
The intention of the article is not to tell you how to design or write your application but provide an option for optimization (and some technique to use) where you deem it necessary or desirable. If you can identify certain classes that will be remoted frequently or in large quantities (as part of an object graph, not just their individual size), it may be worthwhile investing a little time to see if they are optimizable using FastSerialization. If you use DataSets or LLBLGenPro entities then the work has already been done for you.
In your particular case, you seem to have already identified that many and possibly large objects will need to be moved around. You can either speak to Damon who will tell you that your design is wrong or you can see whether the speed is acceptable using as-is .Net remoting and, if not, try ISerializable (.Net style) and then ISerializable (Fast Serialization style). Compression is also an option to look at but bear in mind it is usually applied *after* serialization not *during* and so helps only with the network transmission side.
Cheers
Simon
@Damon Wow! What a clever man! You're definitely one of those dangerous "code religious" types (database or else). This whole discussion has been taken over by one huge ego massage. Others have found benefit; why try to convince us that we have not, when we are not blind?
Dear Damon Carr:
I wasn't going to add something to a 1½ year old thread, but I see somebody else couldn't resist, so I'll give in to my innate fish-slapping urges and reply, too.
My favorite quote from your original post:
"Can you really do a better job the Microsoft here?"
Well, yes... yes, he can. He has proven he can, and taken the time to analyze the results and post them for us to review. He has given a set of detailed posts explaining the underlying problem, come up with an answer, presented us with the code, and explained in detail what he is doing to address the problem and why.
He has researched shortcomings of his approach and improved his code. He has listened to the input provided by many readers (even you), and incorporated their responses into his solution. He even tracked down a faster compression library (one that I, at least, had never heard of), improving his results even further.
Simon Hewitt is the perfect consultant: smart, insightful, and clever. He has the perfect balance of theory and practicality, and, most importantly, he solves problems (and his solutions are not the horrific mess you make them out to be).
Damon, if I were looking for a company to contract for work, and read this thread, I would dump agilefactor out of consideration and not give them another thought. I would, however, feel very confident in SimmoTech's ability to meet my needs and would be willing to trust them with my most important projects.
Not only does your post come across as quite arrogant, but I see you wearing blinders, on a moral high road that you will follow regardless of the reality in front of you. You say you are willing to acknowledge circumstances, but that sounds like lip-service to me. Nobody truly willing to acknowledge circumstances would even consider writing a reply like yours.
Be honest: if the circumstances required you to pass large quantities of data across the wire, how willing would you be to do it? How much time would you lose, certain there is a 'better' way? At the end of the day, the goal is a working solution, and the key to success is balance.
You probably can't imagine a case where you would have to pass so much data, so answering those questions might be tough for you. I don't have to imagine any specific situation to acknowledge that such a case could easily exist. But as a lead developer for a data-analysis/metrics company, I could rattle off a dozen situations where we have to aggregate a million rows or more by arbitrary parameters and display the results in various graphs and controls across the web. Simon's code is salvation for a problem we face, one that "better design" wouldn't resolve if we spent the rest of the year on it.
You may feel that your post was respectful and encouraged a "lively debate", but I found it condescending, uninformed, and near-sighted, and every reply you posted merely cast you in a worsening light. These posts are now preserved for countless generations to discover when they search for more information about you and your company.
When you are looking for a "lively debate", consider the difference between "I know best and your way is wrong" and statements such as "The problem I see with this approach", or "I've found there's often a better way of handling this much data", or "I'm concerned that a, b, and c".
When I say statements such as these, I mean them, because (a) no matter how much experience I gain, I know that the person across from me has different, but equally valid, experiences; (b) a tone like that of your post will lead to an attack-and-defend debate rather than an open meeting of minds; and (c) that even if I were 100% right, and the other person 100% wrong, talking down to them and implying they have no idea what they're doing is unproductive.
I am extremely unimpressed with your knowledge, maturity, and professionalism, and sincerely hope you take these thoughts into consideration when you work with others.
Best Regards,
James B
Hi,
You write that the WriteString method is always optimized, so I wonder whether ReadString can deserialize a V1-serialized string object.
Background: I'm using your FastSerializer V1 to store serialized objects in a database (as a BLOB). Is it possible to deserialize these objects using V2 of your FastSerializer, or would there be any problems?
Example:
Greetings
Klaus-Jürgen
Hi Klaus
It's not really recommended to use different versions for this purpose, since even the tiniest change can result in a failure to deserialize - remoting was the original aim, where the code would always be the same version. (Having said that, I have an app which stores very large DataSets of audit data in a BLOB field - I ran a script to reserialize all of the stored data when I moved to .NET 2. That worked for me but might not be appropriate for you.)
Even if the string optimization code were identical between versions, changes to other parts of the code, even a reordering of the Enum of type codes, might result in the data not being deserializable.
The easiest way to test this for your particular situation is write a quick app using v2 to read and deserialize your BLOB fields - you will soon get an exception if the data is not exactly in the expected format.
Another alternative to be absolutely safe is to incorporate both versions into your code - just move v1 to a different namespace. You still have the problem of knowing which to run for a given BLOB field though.
Further still, you could change the source code to add some versioning information to the stream.
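One simple way to do that (a sketch, not code from the library): prefix the BLOB with a version byte ahead of the serialized payload and branch on it when reading.

using System;

static class VersionedBlob
{
    const byte CurrentVersion = 2;

    // Prepend a version byte to whatever the serializer produced.
    public static byte[] Wrap(byte[] serializedData)
    {
        byte[] result = new byte[serializedData.Length + 1];
        result[0] = CurrentVersion;
        Buffer.BlockCopy(serializedData, 0, result, 1, serializedData.Length);
        return result;
    }

    // Strip the version byte so the caller can pick the right deserializer.
    public static byte[] Unwrap(byte[] blob, out byte version)
    {
        version = blob[0];
        byte[] payload = new byte[blob.Length - 1];
        Buffer.BlockCopy(blob, 1, payload, 0, payload.Length);
        return payload;
    }
}

// Store Wrap(serializedBytes) in the BLOB column; on load, call Unwrap and route
// version 1 data to the old deserializer and version 2 to the new one.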
Persisting serialized data will always have this problem - even Microsoft have this problem.
Cheers
Simon
Hi,
Great code!
What's the best approach for enums?
I'm doing this:
writer.Write(MyEnumType.ToString());
... read
_myenumtype = (MyEnumType)Enum.Parse(typeof(MyEnumType),reader.ReadString());
Do you have a better option?
Depends on the Enum.
If you will know its type at deserialization time then you can cast it to/from an int and store it optimized.
If you don't know the type, then v2.1 has support in WriteObject for Enums anyway and I hope to release it this week.
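For the known-type case, the cast approach is just the following (a sketch; since SerializationWriter/SerializationReader derive from BinaryWriter/BinaryReader, Write(int) and ReadInt32() should be available):
writer.Write((int)_myenumtype);               // store the underlying integer, not the name
... read
_myenumtype = (MyEnumType)reader.ReadInt32();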
Cheers
Simon
Very Good article...
azam's
Hi,
First of all, thanks for your work. Rare quality, does as advertised beautifully, and browsing through it made me learn a lot of new stuff (like the excellent ConditionalAttribute). Thank you.
I have noticed a tiny copy/paste error in FastSerializer.cs - class SerializationWriter - method public void WriteOptimized(Decimal value):
else if (data[1] <= HighestOptimizable32BitValue && data[0] >= 0) (and again below with data[2]). The second condition should be data[1] >= 0 (and then data[2] >= 0).
Best Regards,
Eric.
Well spotted Eric
'Tis indeed a bug and was picked up in the comments in Part 2 of the article.
Cheers
Simon
Can you post a new version with all the fixes from parts 1 & 2 of the article?
Hi,
First of all, really cool code - well done!
I am trying to save the ASP.NET view state on the server. The problem is that currently the view state is serialized using the LosFormatter, which is the slowest piece of code I ever saw. I serialize an ArrayList with 5000 objects I created (all of the same type) using your code. The serialization takes 1 second and produces around 1.5MB of data. To deserialize it, the LosFormatter first needs to deserialize the view state to a byte array so I can use your code to deserialize it into the 5000 objects. The LosFormatter deserialization takes 50 seconds, and your code takes 1-2 seconds to deserialize the stream back into objects!
To make a long story short, I thought about replacing the LosFormatter class with your class so I can serialize the view state faster. However, when trying to do so, I found that your serializer does not handle 2 types of objects that are in the view state all the time: System.Web.UI.Triplet and System.Web.UI.Pair. The only other thing that is missing is serialization of System.Web.UI.StateBag, which is basically a dictionary-based collection.
Can you modify your code to include these types? I would have done it myself (and posted the results, of course) but I am a VB guy and not a C# guy.
Hi Dan
The problem with including Pair, Triplet and StateBag is that System.Web would need to be referenced by the project holding FastSerializer.cs. Since I don't write Web apps at all, I, and presumably other non-web-app writers, would prefer not to have that reference included in our WinForms apps.
Also, there are a limited number of slots in the Type Enum remaining and whilst it sounds like these classes are frequently used in web apps, it is difficult to decide which classes should be assigned to these slots.
However, I think a solution might be possible. I was thinking of adding support for a 'surrogate' type of class whereby the WriteObject method would give an instance of surrogate class an opportunity to perform the Fast Serialization before the default of using the .NET serializer is used.
This would allow an unlimited number of Types to be supported without having to change the base code and would allow VB developers an easier way to extend FastSerializer.cs without changing C# code.
I envisage it on the lines of defining an interface, say IFastSerializerSurrogate, with members to allow SerializationWriter to query whether a Type is supported; to perform the FastSerialization; to perform the FastDeserialization and to allow surrogate instances to be chained.
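Purely as a speculative sketch (this is not released code), such an interface might look something like this:

using System;

public interface IFastSerializerSurrogate
{
    // Asked by SerializationWriter before falling back to the .NET serializer.
    bool SupportsType(Type type);

    // Perform the fast serialization/deserialization for a supported type.
    void Serialize(SerializationWriter writer, object value);
    object Deserialize(SerializationReader reader, Type type);

    // Allows surrogate instances to be chained: if this one can't handle the
    // type, the next one in the chain is asked.
    IFastSerializerSurrogate NextSurrogate { get; set; }
}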
Back to your problem. Whether FastSerialization would be effective or not in your circumstances depends very much on the type of objects you will be storing in Pair, Triplet and StateBag. If they are primitive types or self-contained items, ie no circular references, you will probably get excellent results. If there are other object references included and especially if there are circular references then the default binary serializer will be used and at best the objects will be serialized multiple times and at worst may cause an infinite loop.
Having said that, I had a quick look at this LosFormatter class. Hadn't studied this before but there is a bit of deja vu - tokens for known Types, string tables etc! The quick look shows that there doesn't seem to be any particular support for finding circular references and that the resulting data is stored as a Base64 string and encrypted too? - maybe that accounts for the 50 seconds.
There are some classes tokenized by LosFormatter that FastSerialization doesn't currently directly support: IntEnum, HybridDictionary, Color, KnownColor, Unit/EmptyUnit, Pair, Triplet. The former two (possibly four) seem good candidates for addition to the base FastSerializer code and the others as part of a WebFastSurrogateSerializer class.
Due to time constraints, I can't promise anything but I'll try to at least add the surrogate functionality within a week or so.
In the meantime, can you let me know what a web app typically stores in a StateBag and whether you would be prepared to try out any beta code I write?
Cheers
Simon
Hi Simon,
Thanks for the quick reply. The main problem with the LosFormatter .NET 1.1 implementation is that it stores the fast-serialization byte stream as a Base64 string. To read the string, it goes into a loop reading 1 char at a time from the Base64 until it reaches the end-of-string token (since I have around 1.5MB of string, that probably accounts for the 50 sec).
As to the problem with Triplet & Pair serialization (and possibly other types of data which do not support serialization natively): I do agree that adding a type for each of the new objects would use up the type enum fast. So what I propose is similar to what you wrote:
Add a function to the writer/reader to which I give a type and a delegate of a function for serialization, like this (VB syntax):
Function AddSerializationDelegate(Type As System.Type, Delegate As ObjectSerializationDelegate)
In my code I would then do something like this:
Writer.AddSerializationDelegate(GetType(System.Web.UI.Triplet), AddressOf MySerialize)
Writer.AddSerializationDelegate(GetType(System.Web.UI.Pair), AddressOf MySerialize)
and so on.....
All of these entries go into a collection of specific serializers. When serializing, if I send any of the types in the collection, you call the delegate and provide it with the object and the serialization stream, and I, in my code, will implement the serialization/deserialization.
As for your question about what is stored in the StateBag, almost anything goes. In my case I store an ICollection-based class which holds a collection of 5000 objects, each with a set of primitive-type properties - no self-references or anything like that. Other apps/pages in my app might store DataSets, primitive types, and almost anything else, if it is serializable or the LosFormatter has built-in support for it.
I will be able to test beta code and help you improve it. You can email me directly so we don't have to do the code development in the forum.