Click here to Skip to main content
15,880,427 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I wanted to copy the content of one docx/document to another docx/document at particular position.

Note: Content might be having text data with background color, tables, smartart and images.

What I have tried:

def add_para_data(output_doc_name, paragraph):
    """
    Write the run to the new file and then set its font, bold, alignment, color etc. data.
    """
    output_para = output_doc_name.add_paragraph()
    for run in paragraph.runs:
        # output_run = output_para.add_run(run.text)
        output_run = output_para.add_run()

        # Run's bold data
        output_run.bold = run.bold
        # Run's italic data
        output_run.italic = run.italic
        # Run's underline data
        output_run.underline = run.underline
        # Run's color data
        output_run.font.color.rgb = run.font.color.rgb
        # Run's font data
        output_run.font.name = run.font.name
        output_run.font.highlight_color = run.font.highlight_color
        output_run.font.size = run.font.size
        # output_run.style.name = run.style.name
        output_run.style = run.style



    # Paragraph's alignment data
    output_para.style = paragraph.style
    output_para.alignment = paragraph.alignment
    output_para.paragraph_format.alignment = paragraph.paragraph_format.alignment
    output_para.paragraph_format.widow_control = paragraph.paragraph_format.widow_control
Posted
Comments
Richard MacCutchan 19-Jul-19 10:52am    
And, what is the question?
Raman_patil 13-Aug-19 2:43am    
How to copy paragraph of docx to another docx and retain all styles
Richard MacCutchan 13-Aug-19 4:05am    
You need to study the documentation for the package that you are using, to see how it identifies different data types.
Raman_patil 13-Aug-19 8:38am    
I gone through the documentation of python-docx.
It reads only content and it won't identify the images and tables inside the paragraph in between text data. The python-docx will identifies all the images and tables present in whole document not particularly with paragraph.

I used some more modules, please find the observations below:
1. Python-docx:
• Able to read the text data by iterating over paragraph, however paragraph won’t detect the table and images in it
• We have to use the document object which will detect all the tables and images present in the whole document

2. Tika:
• Tika reads both text data and meta data(like images) in separate variable
• Tika reads whole content in one single string so it won’t be possible to iterate over it

3. Win32:
• Unable to read the data with formats

4. Converting docx to xml:
• Bullets were not getting copied along with the content
• Image will be present in the different directory which has to copied separately in the output file, even if we are able to get the image path we are unable to add it to specific position in the output docx file

5. Converting docx to HTML:
• Unable to copy the bullet point
• Image is stored in html page with Base64 encoding, which will not be store in the directory
• Base64 is basically used for representing data in an ASCII string format

6. BytesIO:
• Able to read and write complete data, however this data will not be iterated line by line, please find the Documentation

7. Mammoth:
• Similar to HTML parsing

8. Pydocx:
• Similar to HTML parsing
Richard MacCutchan 13-Aug-19 12:50pm    
Looks like you will have to find some other way of reading them. If you know C# you can use the Microsoft provided interop libraries.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900