See more: Java
Hi friends,
I am trying to extract the contents of ODT files for indexing.
Let me elaborate.

These are the steps I follow to extract the contents of the ODT file:

Steps
1 - Convert the ODT file into a temporary ZIP file.
2 - Loop through the files inside and retrieve the 'content.xml' file.
3 - Read the actual content of the ODT file, which resides in XML elements called <text:p>.
4 - Index the contents retrieved from <text:p>.
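
Steps 1 and 2 could be sketched roughly as follows. An ODT file is already a ZIP archive, so it can be read as one without first making a temporary copy. The class name `OdtContentReader` and the method `readContentXml` are hypothetical names chosen for this illustration, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: pulls the raw bytes of content.xml out of an ODT stream.
class OdtContentReader {

    /** Returns the bytes of content.xml, or null if the archive has no such entry. */
    static byte[] readContentXml(InputStream odt) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(odt)) {
            ZipEntry entry;
            // Step 2: loop through the entries until content.xml is found.
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    // Reads only the current entry; read() returns -1 at its end.
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

For a file on disk, something like `OdtContentReader.readContentXml(new FileInputStream("document.odt"))` would do; the file name here is only a placeholder.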


I am having trouble with step 3.
I do not have the schema for content.xml. Only with the schema can I generate the corresponding Java classes for the elements.

Please guide me.
Posted 8-Mar-10 21:57pm
Updated 9-Mar-10 21:10pm

Solution 1

And which part of your program are you having trouble with?

Solution 3

koolshiva wrote:
But it doesn't work.


Sorry, but that really does not help anyone to guess what might be wrong. Take a look at this article for guidance on reading XML data.

Solution 5

Hey friends,

I have found an alternative. I am using SAX instead of JAXB now. I already had this option, but I personally preferred JAXB for performance reasons.
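
The poster did not share the code, so the following is only a minimal sketch of what a SAX-based extraction of <text:p> could look like, not the poster's actual solution. No schema or generated classes are needed. The class name `OdtParagraphExtractor` is hypothetical; the namespace URI is the one ODF documents declare for the `text:` prefix. The sketch ignores headings (`text:h`) and special whitespace elements such as `text:s` and `text:tab`.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the character data of each top-level <text:p>.
class OdtParagraphExtractor extends DefaultHandler {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    private final List<String> paragraphs = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private int depth = 0; // > 0 while inside a <text:p>

    private static boolean isParagraph(String uri, String localName) {
        return TEXT_NS.equals(uri) && "p".equals(localName);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isParagraph(uri, localName) && depth++ == 0) {
            current.setLength(0); // start a fresh paragraph buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            current.append(ch, start, length); // includes text of child spans
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isParagraph(uri, localName) && --depth == 0) {
            paragraphs.add(current.toString());
        }
    }

    /** Parses content.xml and returns the text of each paragraph in document order. */
    static List<String> extract(InputStream contentXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so uri/localName are populated
        SAXParser parser = factory.newSAXParser();
        OdtParagraphExtractor handler = new OdtParagraphExtractor();
        parser.parse(contentXml, handler);
        return handler.paragraphs;
    }
}
```

Matching on the namespace URI rather than the literal prefix `text:` keeps the handler working even if a document binds that namespace to a different prefix. The returned strings can then be handed to whatever indexer is in use.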

Solution 2

I am using JAXB to extract the text from the 'content.xml' file in the ODT. I am unable to get the XML Schema for the content.xml file. I tried generating it from the XML using the hitsw site. But it doesn't work.
Comments
Sudhakar Shinde 28-May-13 2:17am
   
You have to put this as a comment and not as an answer.

Solution 4

Sorry for not being specific. Let me elaborate.

These are the steps I follow to extract the contents of the ODT file:

Steps
1 - Convert the ODT file into a temporary ZIP file.
2 - Loop through the files inside and retrieve the 'content.xml' file.
3 - Read the actual content of the ODT file, which resides in XML elements called <text:p>.
4 - Index the contents retrieved from <text:p>.

I am having trouble with step 3.
I do not have the schema for content.xml. Only with the schema can I generate the corresponding Java classes for the elements.

Please guide me.
Comments
Sudhakar Shinde 28-May-13 2:16am
   
You have to put this as a comment and not as an answer.

Solution 6

Could you share your source code with me? I have the same question.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 25 May 2013
Copyright © CodeProject, 1999-2017