Feels good to be a part of this forum of technical stalwarts and gurus. I feel I will get a solution for my long-standing problem here.
The problem we are facing is a performance issue when processing large XML files. Using the Java TransformerFactory API in interpretive mode takes an unbelievable six days to process huge files, which is a very serious concern in a production system.
Please help me out, and do let me know what information I should attach that would be helpful for the analysis.
takes an unbelievable six days to process huge files
Most people would have figured out there was something wrong before this point. Take a look at the data you are processing and how you are dealing with it. The chances of anyone being able to guess what is wrong are not very high.
The answer is no: the people I have checked with were not able to diagnose where the problem is or find a way around it. Everyone understands that the volume of data in the file is huge, say a 100MB XML document, but no one to date has suggested a way to work with it efficiently. Let me know if you wish to have a look, and I will look forward to your suggestions.
This is not the sort of issue that can be resolved in a forum like this. You need to do some analysis of your code and measure your processing time and code paths. Try running against some small data files, see how long they take, and work out whether that is an acceptable length of time.
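To act on that advice, here is a minimal sketch of a timing harness, using a trivial identity stylesheet and synthetic in-memory inputs as stand-ins for the real stylesheet and files. It also compiles the stylesheet once into a `Templates` object, since recompiling per document is a common hidden cost:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformTimer {
    // Identity stylesheet used as a stand-in for the real one.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'><xsl:copy>"
      + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      + "</xsl:stylesheet>";

    public static void main(String[] args) throws Exception {
        // Compile the stylesheet ONCE; a Templates object is reusable and thread-safe.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));

        // Synthetic inputs of increasing size stand in for real sample files.
        for (int records : new int[]{100, 1_000, 10_000}) {
            StringBuilder xml = new StringBuilder("<root>");
            for (int i = 0; i < records; i++) xml.append("<row>").append(i).append("</row>");
            xml.append("</root>");

            long start = System.nanoTime();
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(xml.toString())),
                    new StreamResult(out));
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(records + " records took " + ms + " ms");
        }
    }
}
```

If the time per record grows as the input grows, the transform is doing worse than linear work and the stylesheet itself is the place to look.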
I have done my analysis, and I guess you do not understand what the issue actually is. Do understand that I am not a kindergarten student who just posts a question and expects a ready-made answer; I have done quite a lot of research on this, more than you. I will look for someone who can help me with which parameters to look at, rather than just replying generically. Please do not bother to reply further. Thanks again.
I asked specifically what sort of information I should post here for your analysis, since I do not know which logs, JVM settings, or XSL snippets (whichever is taking the most memory and time) you may require. If you had asked me for any of those details, or whatever else you think is required, I could have provided them in the first place.
Like most people on this forum, I do not have the time or resources to analyse an issue like this. As I said earlier, this question cannot be answered in a forum such as this. You need to do the analysis and ask a more specific, detailed question before anyone can hope to offer any suggestions.
Have you tried using a profiler? A colleague of mine wrote an XML parser/validator in .Net for large files, and the first file would complete in about 20-30 minutes. After the first one, though, it would get incredibly slow, taking around 18-30 hours to process 50MB files. When we checked the memory allocations, there were about 10 times more than necessary. The app was running out of memory and continually hitting the page file on disk to make up the difference. All of that because he didn't understand the immutable nature of .Net Strings.
EDIT - I realize you are not using .Net, but you could try a profiler for Java.
That is the best advice I can give, good luck friend.
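For what it's worth, Java Strings are immutable too, so the same trap applies there. Here is a small sketch (sizes chosen just for illustration) contrasting naive concatenation with StringBuilder, which is exactly the kind of hotspot a profiler would surface:

```java
public class ConcatDemo {
    public static void main(String[] args) {
        int n = 20_000;

        // Naive: each += allocates a brand-new String and copies everything
        // accumulated so far, making the loop O(n^2) overall.
        long start = System.nanoTime();
        String s = "";
        for (int i = 0; i < n; i++) s += "x";
        long naiveMs = (System.nanoTime() - start) / 1_000_000;

        // StringBuilder mutates one growable buffer in place (roughly O(n)).
        start = System.nanoTime();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append("x");
        String t = sb.toString();
        long builderMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("concat: " + naiveMs + " ms, builder: " + builderMs + " ms");
        System.out.println("equal results: " + s.equals(t));
    }
}
```

The two produce identical strings; only the allocation behavior differs, which is why the problem hides until the inputs get large.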
The issue is most likely poor XML and XSLT design. Too many people refuse to take the time to understand their data and XML, and in the rush to 'get something working' they create a schema that is bloated, incomplete, and requires a lot of resources to even attempt to use.
Six days for 100MB? Yes, I can say that the schema does not represent the data well, and the XSL is probably even worse, because it must translate this rickety XML into something that might be even more poorly designed and implemented. It is apparent that the XSL is where all of the changes and fixes have been applied.
For starters: do not avoid using attributes simply because elements seem 'easier'; they are not. Things that describe a thing are attributes. Things that own things are usually elements.
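To illustrate that rule of thumb with a hypothetical record (the names here are invented for the example):

```xml
<!-- Bloated: every simple descriptor is its own element. -->
<book>
  <id>42</id>
  <language>en</language>
  <title>XSLT Performance</title>
  <chapter>
    <number>1</number>
    <title>Profiling</title>
  </chapter>
</book>

<!-- Leaner: descriptors become attributes; owned things stay elements. -->
<book id="42" language="en">
  <title>XSLT Performance</title>
  <chapter number="1" title="Profiling"/>
</book>
```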
I post here because when you try to engineer a system and write the entire application at the same time, all the while saying 'just get it running, we'll fill in the details later', and you also take a lot of shortcuts ('it will be easier if we do ...'), all you are doing is moving the work further down the line and making it more difficult.
There is a definite amount of work that must be done; never assume that you can avoid it, and do as much of it up front as you can.
I have a program I have written that displays the contents of an XML file in a browser. I am trying to make it more user-friendly, so people can change the XML files without opening them in Notepad or a canned XML editor. What I would like to do is pull the information from the XML file into a webpage, let the users make changes, and save the result back to the XML file. I've done Google searches for about a week and can't come up with a good way to do it. Any suggestions would be much appreciated.
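Not a full answer, but the server-side half of such a page could be sketched as a load/modify/save round trip using the standard DOM and Transformer APIs. The file contents and the edited element here are hypothetical stand-ins; in a real webpage the change in step 2 would come from the submitted form:

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XmlRoundTrip {
    public static void main(String[] args) throws Exception {
        // Stand-in for the real file on disk.
        File file = File.createTempFile("settings", ".xml");
        Files.write(file.toPath(),
                "<settings><title>Old title</title></settings>"
                        .getBytes(StandardCharsets.UTF_8));

        // 1. Parse the file into a DOM tree (this is what would feed the web form).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(file);

        // 2. Apply the user's change; here, hypothetically, the <title> text.
        doc.getElementsByTagName("title").item(0).setTextContent("New title");

        // 3. Serialize the modified tree back over the original file.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(file));

        System.out.println(new String(
                Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8));
    }
}
```

DOM is fine for config-sized files; for very large documents you would want a streaming approach instead.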
Hello. I'm trying to create an XML schema that captures the following behavior:
1. The <info> element should be the first child of <root>.
2. <ParamA>, <ParamB>, <ParamC>, <ParamD>, and <ParamE> are optional; they should appear after the <info> element and can appear in any order.
The following XML is valid:
<root>
  <info></info>
  <ParamB></ParamB><!-- Interchangeable with other Params -->
  <ParamD></ParamD><!-- Interchangeable with other Params -->
  <ParamC></ParamC><!-- Interchangeable with other Params -->
  <ParamA></ParamA><!-- Interchangeable with other Params -->
  <ParamE></ParamE><!-- Interchangeable with other Params -->
</root>
The following XML is NOT valid:
<root>
  <ParamB></ParamB>
  <ParamD></ParamD>
  <ParamC></ParamC>
  <ParamA></ParamA>
  <info></info><!-- This should appear as the first element inside root -->
  <ParamE></ParamE>
</root>
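A sketch of a schema for this, with one caveat: XSD 1.0 cannot express "a fixed first element followed by optional elements in any order, each at most once" without enumerating every permutation. The usual compromise is a sequence of <info> followed by an unbounded choice, which enforces everything listed above except that it also permits each Param to repeat:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <!-- info must come first. -->
        <xs:element name="info" type="xs:string"/>
        <!-- Params may follow in any order. Note this also allows each
             Param to appear more than once, which XSD 1.0 cannot forbid
             here without listing every permutation explicitly. -->
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element name="ParamA" type="xs:string"/>
          <xs:element name="ParamB" type="xs:string"/>
          <xs:element name="ParamC" type="xs:string"/>
          <xs:element name="ParamD" type="xs:string"/>
          <xs:element name="ParamE" type="xs:string"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

If duplicates must be rejected as well, XSD 1.1 assertions (or a Schematron rule) would be needed on top of this.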