See more: Java
Hi friends,
I am trying to extract the contents of ODT files for indexing.
Let me elaborate.
 
These are the steps I follow to extract the contents of an ODT file:

Steps
1 - Convert the ODT file into a temporary ZIP file.
2 - Loop through the files inside and retrieve the 'content.xml' file.
3 - Extract the actual content of the ODT file, which resides in XML elements called <text:p>.
4 - Index the contents retrieved from <text:p>.
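
Steps 1 and 2 could be sketched roughly as below. Since an ODT file is already a ZIP package, this sketch assumes `java.util.zip.ZipFile` can open it directly, so the temporary copy may not be needed. The class name `OdtContentReader` and the method `readContentXml` are made up for illustration and are not from the original post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: pulls content.xml out of an ODT package.
class OdtContentReader {

    /** Returns the raw bytes of the content.xml entry inside the given ODT file. */
    static byte[] readContentXml(String odtPath) throws IOException {
        // An ODT file is a ZIP archive, so it can be opened as one directly.
        try (ZipFile zip = new ZipFile(odtPath)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("content.xml not found in " + odtPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                // Copy the entry into memory chunk by chunk.
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        }
    }
}
```

Looking the entry up by name with `getEntry` avoids looping over every file in the archive.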
 

I am having trouble with step 3.
I do not have the schema for content.xml. Without the schema, I cannot generate the Java classes for its elements.
 
Please guide me.
Posted 8-Mar-10 21:57pm
Edited 9-Mar-10 21:10pm

Solution 1

And which part of your program are you having trouble with?

Solution 3

koolshiva wrote:
But it doesn't work.

 
Sorry, but that really does not help anyone guess what might be wrong. Take a look at this article for guidance on reading XML data.

Solution 5

Hey friends,
 
I have found an alternative. I am now using SAX instead of JAXB. I already had this option, but I personally preferred JAXB for performance reasons.
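
A SAX-based extraction of the <text:p> contents might look something like the sketch below. It is not the poster's actual code. The class name `OdtParagraphSax` and the method `extractParagraphs` are hypothetical; the namespace URI is the one ODF content.xml files declare for the `text:` prefix. Whitespace elements such as <text:s> and <text:tab> are ignored here, which is a simplification.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of every top-level <text:p> element.
class OdtParagraphSax {

    // Namespace bound to the "text" prefix in ODF documents.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    static List<String> extractParagraphs(InputStream contentXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match on namespace URI + local name
        SAXParser parser = factory.newSAXParser();
        final List<String> paragraphs = new ArrayList<>();

        parser.parse(contentXml, new DefaultHandler() {
            private int depth = 0; // > 0 while inside a <text:p>
            private final StringBuilder current = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (TEXT_NS.equals(uri) && "p".equals(localName)) {
                    if (depth == 0) {
                        current.setLength(0); // start a fresh paragraph
                    }
                    depth++;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Also captures text inside child elements such as <text:span>.
                if (depth > 0) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (TEXT_NS.equals(uri) && "p".equals(localName)) {
                    depth--;
                    if (depth == 0) {
                        paragraphs.add(current.toString());
                    }
                }
            }
        });
        return paragraphs;
    }
}
```

No schema or generated classes are involved: the handler only matches on the element's namespace and local name, and each returned string can be passed to the indexer.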

Solution 2

I am using JAXB to extract data from the 'content.xml' file in the ODT. I am unable to get the XML Schema for content.xml. I tried generating one from the XML using the hitsw site, but it doesn't work.
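
For what it's worth, the OpenDocument schema is published by OASIS in RELAX NG form rather than as W3C XML Schema, which may be why an XSD is hard to come by. If the goal is only to read the <text:p> elements, a schema may not be needed at all. The sketch below uses the JDK's namespace-aware DOM parser as one schema-free option; the class name `OdtParagraphDom` and the method `extractParagraphs` are hypothetical, and loading the whole document into memory is assumed to be acceptable.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads <text:p> elements without any schema or generated classes.
class OdtParagraphDom {

    // Namespace bound to the "text" prefix in ODF documents.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    static List<String> extractParagraphs(InputStream contentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(contentXml);

        // Finds every <text:p> in document order, at any depth.
        NodeList nodes = doc.getElementsByTagNameNS(TEXT_NS, "p");
        List<String> paragraphs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() concatenates the text of all descendants,
            // so a paragraph nested inside another is also part of the outer one's text.
            paragraphs.add(nodes.item(i).getTextContent());
        }
        return paragraphs;
    }
}
```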
Comments
Sudhakar Shinde at 28-May-13 2:17am
   
You have to put this as a comment and not as an answer.

Solution 4

Sorry for not being specific. Let me elaborate.

These are the steps I follow to extract the contents of an ODT file:

Steps
1 - Convert the ODT file into a temporary ZIP file.
2 - Loop through the files inside and retrieve the 'content.xml' file.
3 - Extract the actual content of the ODT file, which resides in XML elements called <text:p>.
4 - Index the contents retrieved from <text:p>.

I am having trouble with step 3.
I do not have the schema for content.xml. Without the schema, I cannot generate the Java classes for its elements.

Please guide me.
Comments
Sudhakar Shinde at 28-May-13 2:16am
   
You have to put this as a comment and not as an answer.

Solution 6

Could you share your source code with me? I have the same question.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 25 May 2013
Copyright © CodeProject, 1999-2014