python-openxml / python-docx
Create and modify Word documents with Python
License: MIT License
I see section breaks are discussed in the Analysis section. Does it mean that it would be added sometime soon? I would need a feature where I can switch page-orientation mid-document. Is there a way to achieve this?
Hi scanny.
I need to insert some pictures, but doing so throws an exception.
The code is below.
from docx import Document
from docx.shared import Inches
document = Document()
document.add_heading('Document Title', 0)
p = document.add_paragraph('A plain paragraph having some ')
document.add_picture('amazon.png', width=Inches(1.25))
document.add_picture('web_report.png', width=Inches(1.25))
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
document.add_page_break()
document.save('demo.docx')
and the exception is:
Traceback (most recent call last):
  File "test_add_picture.py", line 11, in <module>
    document.add_picture('web_report.png', width=Inches(1.25))
  File "C:\Python27\lib\site-packages\python_docx-0.3.0a1-py2.7.egg\docx\api.py", line 83, in add_picture
    picture = self.inline_shapes.add_picture(image_path_or_stream)
  File "C:\Python27\lib\site-packages\python_docx-0.3.0a1-py2.7.egg\docx\parts\document.py", line 207, in add_picture
    image_part, rId = self.part.get_or_add_image_part(image_descriptor)
  File "C:\Python27\lib\site-packages\python_docx-0.3.0a1-py2.7.egg\docx\parts\document.py", line 64, in get_or_add_image_part
    image_part = image_parts.get_or_add_image_part(image_descriptor)
  File "C:\Python27\lib\site-packages\python_docx-0.3.0a1-py2.7.egg\docx\package.py", line 76, in get_or_add_image_part
    matching_image_part = self._get_by_sha1(image.sha1)
  File "C:\Python27\lib\site-packages\python_docx-0.3.0a1-py2.7.egg\docx\package.py", line 97, in _get_by_sha1
    if image_part.sha1 == sha1:
  File "C:\Python27\lib\site-packages\python_docx-0.3.0a1-py2.7.egg\docx\parts\image.py", line 269, in sha1
    raise NotImplementedError
NotImplementedError
Thanks for sharing the docs! I have a question: does docx support page orientation? I think this feature (landscape and portrait) is quite useful for common usage. If it is not supported at the moment, is it scheduled?
It would be nice, especially when using loaded documents, to be able to manipulate a document in some manner beyond appending elements to their parent element. Two possibilities that leap to mind:
- An index kwarg to the various add methods so that they can be used in lieu of insert. Possibly include a delete method as well.
- The paragraphs and runs properties of the Document and Paragraph classes, respectively. Override the various list methods to appropriately modify the underlying Etree elements. For instance, d.paragraphs[3] = "Hello world" would replace the 4th paragraph with a new hello world paragraph. This could be powerful and flexible, but it also feels hackish. I'm not really sure.
Thoughts?
Hello,
It would be nice if an object could be added to a document as a duplicate of an existing object. Am I just not seeing how this is done, or does it not work yet? Is there any chance this feature will be implemented?
Example:
tblList = wdoc.tables
t = tblList[2]
t2 = wdoc.add_table(t)
...
I'd like to use python-docx to create structured text like this in a docx file:
It should also be editable in MS Office: when I delete "2.2. section 2", "2.3. chapter 3" should automatically become "2.2. chapter 3", and its subsection numbers should change automatically too, that is, "2.3.1. subsection 1" to "2.2.1. subsection 1" and "2.3.2. subsection 1" to "2.2.2. subsection 1".
In fact, the structured text in MS Word format comes from a .xmind file created by XMind 3.4.1, so I wonder whether it can be created by python-docx.
A similar question concerns numbered figures: how can the figure numbers change automatically? For example, when I delete a figure, the numbers of the figures after it should decrease by 1 automatically.
When inserting pictures into a document, the first picture works just fine, as shown by the example, etc. This problem arises when you go to add a second picture. The second picture addition triggers the sha1 function in the ImagePart class in docx/parts/image.py. This function is currently:
@property
def sha1(self):
    """
    SHA1 hash digest of the blob of this image part.
    """
    raise NotImplementedError
Which is clearly not very helpful. The fix is quite simple, just add the same sha1 functionality from the Image class in the same file. The resulting routine is:
@property
def sha1(self):
    """
    SHA1 hash digest of the blob of this image part.
    """
    return hashlib.sha1(self.blob).hexdigest()
Note that this should also probably be a lazyproperty instead of property, but either will work.
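As a minimal sketch of the lazily computed variant, the standard library's functools.cached_property can stand in for the project's lazyproperty decorator. ImagePartSketch is a hypothetical class used only for illustration; it is not the library's ImagePart, and it assumes the image bytes are available as blob.

```python
import hashlib
from functools import cached_property


class ImagePartSketch:
    """Hypothetical stand-in for an image part that keeps its bytes in `blob`."""

    def __init__(self, blob):
        self.blob = blob

    @cached_property
    def sha1(self):
        """SHA1 hex digest of the blob, computed once on first access."""
        return hashlib.sha1(self.blob).hexdigest()


first = ImagePartSketch(b"same image bytes")
second = ImagePartSketch(b"same image bytes")
# Identical blobs hash to the same digest, which is what lets a package
# recognize that a picture has already been added.
print(first.sha1 == second.sha1)
```

Caching matters only when the digest is read repeatedly, for example when every existing image part is compared against each newly added picture.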
I experienced this error under Python 3.3.0 and 3.3.3, but it seems it will happen under any version.
Regards,
Steve
Hi,
Is it possible to extract data from tables using the docx module?
It would be useful to have more examples with regards to learning to use this library.
Regards,
Ben
Really liking the new api, but I have a need to set column widths for a table and am unable to. I would appreciate this module being enhanced to allow for setting column and row properties such as width and height.
in Run.add_text maybe, or might be better to do it closer to the API level, for add_paragraph('string\totherstring') and Paragraph.add_run('text\tseparated\tby\ttabs').
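Assuming this request is about turning tab characters in a string into explicit tab elements, the splitting step could look roughly like the sketch below. split_on_tabs is a hypothetical helper, not part of python-docx; a caller would still have to emit a text element for each "text" token and a tab element for each "tab" token.

```python
def split_on_tabs(text):
    """Return ('text', s) and ('tab', None) tokens in the order they occur."""
    tokens = []
    for i, piece in enumerate(text.split("\t")):
        if i > 0:
            # Every piece after the first was preceded by a tab character.
            tokens.append(("tab", None))
        if piece:
            tokens.append(("text", piece))
    return tokens


print(split_on_tabs("text\tseparated\tby\ttabs"))
```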
It looks like you want the user to work entirely through python-docx, as Etree elements are abstracted away through wrapper classes. If that's the case, what are you planning with regards to methods such as iter(), find(), xpath expressions etc.? I know that for simpler documents, statements like document.add_paragraph() are sufficient, but I've found lxml methods like the ones I mentioned above to be invaluable for more involved Docx scripting.
I understand I cannot do something like table_cell[0].add_picture(), as I can with document?
Can you implement this feature, or should I not expect it? :-)
With this version, how can I get only the text from a docx, as with getdocumenttext?
I'd like to add a feature request to support superscripts and subscripts. Thank you.
After reading the documentation, I cannot find a way to change the size of a cell in a table, much like mentioned here: https://stackoverflow.com/questions/15688389/cell-spanning-multiple-columns-in-table-using-python-docx Is there a way to do this?
I'd like to iterate over the elements of the document in the order they appear in it. For example, if there is a paragraph, a table, and then a paragraph again, I want to get them in that order. AFAIK there are currently two properties on Document, paragraphs and tables, but they have no notion of ordering between them.
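One way to get document order, sketched here with only the standard library, is to walk the direct children of the w:body element and classify each one. The XML string below is a hypothetical, heavily trimmed document body, and iter_block_items is an illustrative name rather than a python-docx function.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, heavily trimmed document.xml content used only for illustration.
DOCUMENT_XML = (
    '<w:document xmlns:w="%s"><w:body>'
    "<w:p><w:r><w:t>before</w:t></w:r></w:p>"
    "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>cell</w:t></w:r></w:p></w:tc></w:tr></w:tbl>"
    "<w:p><w:r><w:t>after</w:t></w:r></w:p>"
    "</w:body></w:document>" % W_NS
)


def iter_block_items(body):
    """Yield ('paragraph', el) or ('table', el) for each direct child of w:body."""
    kinds = {"{%s}p" % W_NS: "paragraph", "{%s}tbl" % W_NS: "table"}
    for child in body:
        kind = kinds.get(child.tag)
        if kind is not None:
            yield kind, child


body = ET.fromstring(DOCUMENT_XML).find("{%s}body" % W_NS)
print([kind for kind, _ in iter_block_items(body)])
```

Only direct children are visited, so the paragraph inside the table cell is not reported as a top-level paragraph.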
How can I set the font and font size of a paragraph? Is there any way to do it?
Using setup.py install throws the following error for me in Windows 7.
C:\Users\efredericksen\Documents\GitHub\python-docx>python setup.py install
Traceback (most recent call last):
  File "setup.py", line 24, in <module>
    LICENSE = open(license).read()
  File "C:\Users\efredericksen\python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 231: character maps to <undefined>
I replaced the “ characters in the LICENSE file with " and the installation ran fine afterwards.
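An alternative to editing the file is to read it with an explicit encoding, so the result does not depend on the Windows locale default (cp1252 in the traceback above). The sketch below assumes the LICENSE file is UTF-8 encoded, which the report does not state; read_text is a hypothetical helper name.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with io.open(path, encoding=encoding) as f:
        return f.read()


# Demonstration with a temporary file that contains curly quotes.
with tempfile.TemporaryDirectory() as tmp:
    license_path = os.path.join(tmp, "LICENSE")
    with io.open(license_path, "w", encoding="utf-8") as f:
        f.write(u"\u201cSoftware\u201d")
    text = read_text(license_path)

print(repr(text))
```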
Hi scanny!
I searched for a Cell class API and found that it is not supported yet.
Could you tell me whether it will be supported in the future?
Hi scanny.
How do I set the table style?
I am using
document.add_table(rows=len(table_data), cols=len(table_data[0]), style="TableGrid")
I searched the API but didn't find a method for this. I need to set the table size, width, height, color, and so on.
The Cell class doesn't seem to have any method that supports it either.
When a table cell contains text with a line break, the line break is missing when I parse the file.
The cell seems to have only one paragraph (which is correct), but without the line break.
The document is created with python-docx, and the line break is created automatically when I assign text containing "\n" in the middle to the cell's text member.
I then load the document in Office and LibreOffice, resave it, and reload it in Python.
In Office and LibreOffice the line feed is correct. It is missing only when parsing the file again with python-docx.
If I try to replace the "\n" with "\r", the table disappears and I'm left with a list of paragraphs.
Hi! Thanks for this great library. After looking at some of the XML docs, I really see the pain in creating it ;)
I am developing a web application that takes user input from a form and generates a docx file from it. Among others, some fields are formatted in Markdown. I am planning to take the Markdown fields, convert them to XML (with pandoc or Python-Markdown), and put the result into the document via your low-level API.
Is there a better/easier way to do this, or are there any plans for implementing Markdown directly in python-docx?
Greetings
Hello
Does anyone have an example of how to merge 2 word documents into one file?
Thanks,
Greg
A footnote consists of two things: the reference run in the body text,
<w:r>
  <w:rPr>
    <w:rStyle w:val="FootnoteReference"/>
  </w:rPr>
  <w:footnoteReference w:id="1"/>
</w:r>
and the footnote content itself:
<w:footnote w:id="1">
  <w:p w:rsidRDefault="00D935D7" w:rsidR="00D935D7">
    <w:pPr>
      <w:pStyle w:val="FootnoteText"/>
    </w:pPr>
    <w:r>
      <w:rPr>
        <w:rStyle w:val="FootnoteReference"/>
      </w:rPr>
      <w:footnoteRef/>
    </w:r>
    <w:r>
      <w:t xml:space="preserve"> Note</w:t>
    </w:r>
  </w:p>
</w:footnote>
Is there a way to center text?
document.add_picture(img, width=Inches(7))
fileName, extension = os.path.splitext(img)
capt = 'Figure %d,Meter number %s' % (figureNum, fileName)
c = document.add_paragraph(capt, style='Caption')
Now I would like the caption to be centered on the page.
<w:pStyle w:val="Caption"/>
<w:jc w:val="center"/>
Request to add support for paragraph alignment: left, right, both, center...
Should it support True, False, and None?
Is this the correct enumeration: http://msdn.microsoft.com/en-us/library/office/ff835817(v=office.15).aspx
It would be nice to have the ability to insert *.emf images.
I understand how to "style" a complete paragraph, but how do I apply a named style to a part of a paragraph (a "run")? I can "style" it with 'bold' or 'italics' but how about a named character style like "emphasis" or "link"?
I just discovered this great project and I wonder if there is a feature to add a Table of Contents to a document that I create with python-docx.
I need to generate a .docx file for a customer and he wants to have a TOC in it.
We need to change header text-orientation of tables. We are aware that this may not be possible with the current state of the API. We identify the xml-snippet to be inserted using opc-diag as suggested elsewhere. Can we use xml-snippet insertion to achieve this? If yes what is the API-command to do the xml-insertion at a specific point of the docx?
-- sub
I'm creating a docx containing a table with three columns, some of them containing 80 characters of text.
When I open the doc, the first column of the table is wider than the page. When I select table properties and set the width to relative and 100%, it fits the whole table nicely and wraps the text where necessary.
Is there a way to specify the width of the created table to 100% relative?
At the moment I'm digging around in docx/oxml/table.py and using this http://www.docx4java.org/forums/pdf-output-f27/pdf-conversion-table-width-t1233.html as a hint, but pointers would be greatly appreciated!
Document.paragraphs shall contain the sequence of paragraphs corresponding to the "Final" view of the document. Inserted paragraphs appear in the sequence. Deleted paragraphs do not. Moved paragraphs appear in their new location.
I have a use case in which a Cell within a table contains another table. I can extract the paragraphs of the Cell but not the sub-table. I am able to work around this by traversing the element tree and searching for sub-rows.
It is very easy to create a docx file with python-docx, but I would like to search for specific words and count how many times they occur. How can I do this in python-docx? I know this can be done in mikemaccana/python-docx, but its code syntax is different from that of python-openxml / python-docx, and I would rather not switch to mikemaccana/python-docx.
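Independently of either library, a .docx file is a zip package whose main text lives in word/document.xml, so a rough count can be sketched with the standard library alone. paragraph_texts and count_word are hypothetical helper names, and the in-memory package below is made-up sample content. The sketch reads only the main document part and ignores headers, footers, and footnotes.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_file):
    """Return the text of every w:p element in word/document.xml."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    texts = []
    for p in root.iter("{%s}p" % W_NS):
        # Join the w:t pieces so a word split across runs is still found.
        texts.append("".join(t.text or "" for t in p.iter("{%s}t" % W_NS)))
    return texts


def count_word(docx_file, word):
    """Count whole-word, case-insensitive occurrences of `word`."""
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in paragraph_texts(docx_file))


# Build a tiny in-memory package (hypothetical content) to exercise the functions.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as package:
    package.writestr(
        "word/document.xml",
        '<w:document xmlns:w="%s"><w:body>'
        "<w:p><w:r><w:t>Python and py</w:t></w:r><w:r><w:t>thon</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>No match here</w:t></w:r></w:p>"
        "</w:body></w:document>" % W_NS,
    )
sample.seek(0)
print(count_word(sample, "python"))
```

Joining the text pieces per paragraph is the important design choice: Word often splits a single word across several runs, so searching run by run would miss matches.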
Provide an underline property on Run with semantics similar to .bold and .italic, allowing simple underline formatting to be applied to a run, but not precluding the broader set of underline styles that are possible, such as dashed, wavy, and double-underline.
Feature Suggestion/Request
Support for inserting/adding Field Codes in a word document. They are a handy feature for report generation type applications (originally intended for automatic mailout merges I believe).
In Office, they make it easy to add dynamic features to a document without getting your fingers all slimy with macros/VBA (although, if designed properly, they are accessible from VBA using custom DocProperties and clever references). You work with them using text "markups" that form a restricted scripting framework (hit Ctrl+F9 to get started within Word).
They can easily work with DocProperties, named elements (tables, lists, headings), and external text documents that will drive the dynamic content. You can still use styles to drive a document, but with field codes you can adjust/apply styles conditionally. But field codes are admittedly awkward to work with (odd syntax, updating, poor UI tools). That's where python-docx needs to come in.
I think with a little love, working with field codes could actually be neat, organized, readable, and very functional. They would slip in just like your other elements...
document.add_fcode("ASK", "chap1_caption", "Type in caption for Chaper 1")
Nesting field codes arbitrarily would be an important requirement.
Instead of making custom routines in python-docx that help a user hack together structured portions of a document or specific output patterns, let us use field codes to define those structures explicitly. Then, when we export back into an Office-driven workflow, all of the glue is still intact and fully functional.
The main weakness of field codes is that they are typically hidden/disabled by default in Word, but in that regard it's not much different from shipping a document with embedded macros: it is understood that you know how to interact with the extended features.
Proxy classes such as Document, Paragraph, and Table each hold a private reference to the lxml element they correspond to, <w:document>, <w:p>, and <w:tbl> respectively. With these elements, advanced users can call the underlying lxml API directly to develop customized solutions the existing API does not yet support.
Add documentation so advanced users can readily access these elements without consulting the source code.
I'm doing a lot of work with existing docx (creating many docx from a template). I hacked this together but there are better ways I think, any plans to natively increase support in modifying docx? XPATH? This is my main use case.
import logging
import re

def replace(document, search, replacement):
    """Walk the tree down to each w:t element and update its text node."""
    pattern = re.compile(search)
    count = 0
    # Loop over all paragraphs in the document
    for para in document.paragraphs:
        # Loop over all runs in the paragraph
        for run in para.runs:
            t_lst = run._r.t_lst
            if len(t_lst) > 1:
                # A bare `raise` has no active exception here, so raise explicitly
                raise ValueError("run contains more than one w:t element")
            if len(t_lst) == 1:
                element_wt = t_lst[0]
                this_text = element_wt.text
                if pattern.search(this_text):
                    # Rewrite the text node and count the run as changed
                    element_wt.text = pattern.sub(replacement, this_text)
                    count += 1
    logging.debug("Replaced {} with {} {} times".format(search, replacement, count))
We can easily add an ordered list with document.add_paragraph(style='ListNumber'). But how can we restart its numbering?
Thanks for python-docx!
I need to be able to add an image in the middle of a document, so document.add_picture doesn't work for my purposes.
To be precise, I have a template .docx which contains the text [$signature], and I need to be able to replace that text wherever it appears with a signature image. Ideally, there would be an add_picture method on the Run class. Would you be willing to accept a pull request that added this?
Here's how I'm currently doing this:
d = Document('template_doc.docx')
# p = paragraph to append image to
# ...
image_part, r_id = d.inline_shapes.part.get_or_add_image_part('sig.png')
shape_id = d.inline_shapes.part.next_id
r=p.add_run()
InlineShape.new_picture(r._r, image_part, r_id, shape_id)
Hi, in your documentation, you detail how to add a table. How about detailing how to read a table from an existing document?
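A minimal sketch of what reading could look like, assuming the rows, cells, and text attributes used when building a table can also be read back (an assumption about the library, not something confirmed here). table_to_rows is a hypothetical helper, and the stand-in objects below only mimic those three attributes so the snippet runs without a document.

```python
from types import SimpleNamespace


def table_to_rows(table):
    """Return the table as a list of rows, each a list of cell text strings."""
    return [[cell.text for cell in row.cells] for row in table.rows]


# Stand-in objects that mimic only the `rows`, `cells`, and `text` attributes;
# with a loaded document one would pass an element of `document.tables` instead.
def _row(*texts):
    return SimpleNamespace(cells=[SimpleNamespace(text=t) for t in texts])


fake_table = SimpleNamespace(rows=[_row("Qty", "Id", "Desc"), _row("3", "101", "Spam")])
print(table_to_rows(fake_table))
```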
1) Would it be possible to have a function to merge/concatenate two docx files, with all images, paragraphs, etc.?
In order to modify an existing document
As a developer using python-docx
I need a way to delete a paragraph
Need to account for the possibility that the paragraph contains the last reference to a relationship, as a hyperlink or inline picture might.
Hi Scanny,
Is there any way to change font, size, color or style of a paragraph or a character?
Does this package support usage of word-docx templates?
If so how?
It seems from the discussion at http://stackoverflow.com/questions/22625022/reading-coreproperties-keywords-from-docx-file-with-python-docx that python-docx can write keywords but not read them.
Could a method/function/etc for reading them be added?
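Until such a method exists, the keywords can be read straight from the package with the standard library. This sketch assumes the core properties part sits at the conventional docProps/core.xml location (strictly, it should be located through the package relationships); read_keywords is a hypothetical helper and the in-memory package is made-up sample content.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"


def read_keywords(docx_file):
    """Return the cp:keywords text from docProps/core.xml, or None if absent."""
    with zipfile.ZipFile(docx_file) as package:
        if "docProps/core.xml" not in package.namelist():
            return None
        root = ET.fromstring(package.read("docProps/core.xml"))
    element = root.find("{%s}keywords" % CP_NS)
    return None if element is None else element.text


# Build a tiny in-memory package (hypothetical content) to exercise the function.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as package:
    package.writestr(
        "docProps/core.xml",
        '<cp:coreProperties xmlns:cp="%s">'
        "<cp:keywords>report; draft</cp:keywords>"
        "</cp:coreProperties>" % CP_NS,
    )
sample.seek(0)
print(read_keywords(sample))
```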
Right now, the quickest way to add text to the last run of paragraph p (preserving formatting/styles, versus p.add_run()) is through p.runs[-1].add_text(), which doesn't look particularly clean. I know that Texts aren't children of Paragraphs, but p.add_text() is intuitive and I suspect that many people will try to call it when they first start using the library.
Hi all,
Firstly, great work on this project.
I believe I've found a bug in /docx/parts/document.py, in function "next_id".
In some of the documents that I have been using python-docx with, it turns out that some of the IDs are non-numeric. For example, inserting a "print(id_str_lst)" at line 90 in the aforementioned file gives me:
['4', '_x0000_t202', 'Text Box 5', '7', 'Text Box 9', '9', 'Text Box 7', '8', 'Text Box 11', '6', 'Text Box 6', '10', '0', '1', '3', 'Group 4', 'AutoShape 3', '5', '0', '12', '0', '26', '0', '25', '0', '2', '0', '13', '1', '14', '1', '15', '1', '16', '1', '39', '0', '40', '0', '35', '0', '21', '21', '22', '22', '20', '0', '18', '1']
Thus, I would get a ValueError as soon as the second element in the list was processed with "int(id_str)".
I have implemented a workaround by modifying the code for the "next_id" function to the following, to perform a quick check to ensure the id is numeric prior to adding to the list of used IDs:
def next_id(self):
    """
    The next available positive integer id value in this document. Gaps
    in id sequence are filled. The id attribute value is unique in the
    document, without regard to the element type it appears on.
    """
    id_str_lst = self._element.xpath('//@id')
    used_ids = []
    for id_str in id_str_lst:
        if id_str.isdigit():
            used_ids.append(int(id_str))
    for n in range(1, len(used_ids)+2):
        if n not in used_ids:
            return n
This appears to fix the problem for me.
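The same idea can be tried out in isolation on a plain list of id strings. next_available_id is a hypothetical standalone function, not the library's method; it uses a similar numeric check and keeps the used ids in a set so each membership test is cheap.

```python
def next_available_id(id_strs):
    """Return the smallest positive integer not used by any numeric id string."""
    # isdecimal() accepts only characters that int() can parse.
    used_ids = {int(s) for s in id_strs if s.isdecimal()}
    n = 1
    while n in used_ids:
        n += 1
    return n


ids = ["4", "_x0000_t202", "Text Box 5", "1", "2", "0"]
print(next_available_id(ids))  # non-numeric ids are ignored
```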
This is the first time I've ever had input to an open source project, so I am not certain how to go about officially submitting this 'fix' to the repository, and surely a better programmer than I will have a more efficient fix. :-)
Thanks again, and I hope this helps.
Kind regards,
Mike Nye