toxicphreak / python-docx-ng

This project forked from python-openxml/python-docx

Create and modify Word documents with Python (next-gen)

License: MIT License

Python 93.58% Makefile 0.09% Gherkin 6.33%

python-docx-ng's Introduction

python-docx-ng

python-docx-ng is a Python library for creating and updating Microsoft Word (.docx) files. It was originally designed and developed by scanny as python-docx. Because he is no longer actively developing his repo and there are many useful open pull requests, this repo aims to merge many of them into a more powerful version, while keeping scanny's original structure in mind.

A new Markdown-based documentation section will be built up soon in the docs section. Examples can be found here: examples. Older information is available in the python-docx documentation.

Repo: https://github.com/toxicphreAK/python-docx-ng/blob/master/
Release: https://github.com/toxicphreAK/python-docx-ng/releases
PyPI: https://pypi.org/project/python-docx-ng/

Installation

pip install python-docx-ng

Hint: The library is imported as docx in Python scripts, so use imports like import docx.
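
A minimal sketch of what the hint means in practice: the package is installed as python-docx-ng but imported as docx. The helper names (`docx_is_importable`, `write_hello`) and the output path are hypothetical; `Document()`, `add_heading`, `add_paragraph`, and `save` are the calls known from python-docx, and it is assumed here that python-docx-ng keeps them unchanged.

```python
import importlib.util


def docx_is_importable() -> bool:
    """Return True if a top-level module named `docx` can be found."""
    return importlib.util.find_spec("docx") is not None


def write_hello(path: str) -> None:
    """Create a tiny document; requires `pip install python-docx-ng`."""
    import docx  # the import name is `docx`, not `python_docx_ng`

    document = docx.Document()
    document.add_heading("Hello", level=1)
    document.add_paragraph("Created with the docx module.")
    document.save(path)


if __name__ == "__main__":
    if docx_is_importable():
        write_hello("hello.docx")  # hypothetical output file
    else:
        print("docx is not installed; run: pip install python-docx-ng")
```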

Features

  • Extended Properties support python-docx #1206
  • Word 16 (Office 2019) Template (54a1269)
  • Faster & improved tables (#1)
  • SVG support (#4)
  • EMF support (85a30f1)
  • WMF support (9288ec9)
  • Font scaling (#6)
  • Outline level (#7) - shows outline in navigation (e.g. Word or PDF application - not affecting the document itself)
  • RGB color font highlighting (#14)
  • Hyperlink text (#16)
  • .docm file support (#19) - enables macro-enabled documents
  • Form fields & AltChunk support (#20)
  • Custom namespaces (#21)
  • Comment support (85a30f1)
  • Footnote support (85a30f1)
  • Shading support (9288ec9)
  • Performance improvements
    • Paragraph.text (#3)
    • Cache for table cells (#8)
  • Fixes
    • Fix table issue python-docx#1196 - as table columns were not assigned correctly, see python-docx#1193
    • Fix table merging recursion python-openxml#1208 - replace recursion with for loop
    • add_picture (#10) - fix next_id to support multiple pictures
    • Heading 1 key error due to style capitalization (e.g. in LibreOffice) (#12)
    • Fix XPath for sectPr in document (#15)
    • Reproducible documents (#17) - same binary output with same data
    • AttValue too long in etree xml parser (#24)
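
As an illustration of the "Reproducible documents" item above: a .docx is a ZIP package, and its bytes depend on member order and member timestamps. The sketch below is not the implementation from #17; it only shows, under that assumption, how fixing both makes the output byte-identical. `build_package` and `FIXED_TIME` are hypothetical names.

```python
import io
import zipfile

# Earliest timestamp the ZIP format can store; any constant would do.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_package(parts: dict) -> bytes:
    """Zip `parts` (name -> bytes) with a stable order and fixed timestamps."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(parts):  # stable member order
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            archive.writestr(info, parts[name])
    return buffer.getvalue()


if __name__ == "__main__":
    first = build_package({"word/document.xml": b"<doc/>", "a.xml": b"<a/>"})
    second = build_package({"a.xml": b"<a/>", "word/document.xml": b"<doc/>"})
    print(first == second)
```

Without the fixed `date_time`, each member would carry the time it was written, so two runs with the same data would differ.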

Roadmap

  • Document all functionality by building a new sample document that includes all (or most) features
  • Remove code references to original repo of python-docx
  • Set up new docs (Markdown based)
  • Add missing tests

python-docx-ng's People

Contributors

scanny, toxicphreak, dkwoods, ondrej-111, revossen-asml, apteryks, onlyjus, virajkanwade, takis, eupharis, kjhellico, lonetwin, keisial, ziembla, yoniy-talon, vbeland, samzhangjy, timgates42, stevecohen42, stdedos, yudytskiy, fdabek1, edwinsmulders, dominiclauyf, martin005, brnstz, bdgalloway, andresrreina

python-docx-ng's Issues

Include VBA handling

Use olevba.py to provide basic VBA support in Python.

  1. Reading / decompressing (nearly as is in the file)
  2. Writing OLE objects to create VBA content from scratch
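
A small sketch of the first step, under stated assumptions: a macro-enabled .docm is a ZIP package that stores its VBA project as an OLE compound file, usually at `word/vbaProject.bin`. The code below only locates and extracts that raw blob with the standard library. Decompressing the VBA source inside it is what a tool like olevba would do and is not attempted here. `find_vba_parts` and `read_vba_blob` are hypothetical helper names.

```python
import zipfile

# Magic bytes at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_vba_parts(docm_file):
    """Return the names of package members that look like VBA projects."""
    with zipfile.ZipFile(docm_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]


def read_vba_blob(docm_file, part_name="word/vbaProject.bin"):
    """Return the raw OLE bytes of the VBA project, still compressed."""
    with zipfile.ZipFile(docm_file) as archive:
        data = archive.read(part_name)
    if not data.startswith(OLE_MAGIC):
        raise ValueError(f"{part_name} is not an OLE compound file")
    return data
```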

Catchup with python-docx

Any plans to refresh the alignment between this project and the original python-docx? It looks like they've added a fair few changes since the fork was done.

Failing to import module

Hi, thank you for bringing python-docx back to life!

I'm trying to use python-docx-ng but a basic import of from docx import Document fails for me. Am I missing something here?

STR:

python -m venv .venv
source .venv/bin/activate
echo 'python-docx-ng' > requirements.txt
pip3 install -r requirements.txt
❯ python3
Python 3.10.11 (main, May 10 2023, 11:30:20) [Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from docx import Document
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/__init__.py", line 3, in <module>
    from docx.api import Document  # noqa
  File "/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/api.py", line 14, in <module>
    from docx.package import Package
  File "/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/package.py", line 9, in <module>
    from docx.opc.package import OpcPackage
  File "/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/opc/package.py", line 9, in <module>
    from docx.opc.part import PartFactory
  File "/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/opc/part.py", line 13, in <module>
    from ..oxml import parse_xml
  File "/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/oxml/__init__.py", line 334, in <module>
    from .comment import CT_Comments,CT_Com, CT_CRE, CT_CRS, CT_CRef
  File "/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/oxml/comment.py", line 8, in <module>
    from ..text.paragraph import Paragraph
  File "/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/text/paragraph.py", line 13, in <module>
    from .run import Run
  File "/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/text/run.py", line 11, in <module>
    from docx.opc.part import PackURI, Part
ImportError: cannot import name 'PackURI' from partially initialized module 'docx.opc.part' (most likely due to a circular import) (/Users/erasmas/docx-ng/.venv/lib/python3.10/site-packages/docx/opc/part.py)
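
The traceback above has the usual shape of a circular import: docx.opc.part is still being executed when docx.text.run asks it for PackURI. The following sketch reproduces that kind of error with two throwaway modules and shows one common workaround, deferring the import until call time. The module names (`cyc_a`, `cyc_b`, `ok_a`, `ok_b`) are hypothetical, and whether a deferred import is the right fix for python-docx-ng itself is an assumption.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def write_module(directory: Path, name: str, source: str) -> None:
    """Write `source` as `<name>.py` inside `directory`."""
    (directory / f"{name}.py").write_text(textwrap.dedent(source))


def try_import(name: str) -> str:
    """Import `name` and return 'ok' or the ImportError message."""
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return str(exc)


tmp = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))

# Broken pair: cyc_a imports cyc_b before defining helper(), and cyc_b
# immediately asks cyc_a for helper while cyc_a is only half executed.
write_module(tmp, "cyc_a", """
    import cyc_b
    def helper():
        return "a"
""")
write_module(tmp, "cyc_b", """
    from cyc_a import helper
""")

# Workaround: ok_b looks up helper only when use_helper() is called,
# by which time ok_a has finished executing.
write_module(tmp, "ok_a", """
    import ok_b
    def helper():
        return "a"
""")
write_module(tmp, "ok_b", """
    def use_helper():
        from ok_a import helper  # deferred until call time
        return helper()
""")

importlib.invalidate_caches()
broken_result = try_import("cyc_a")
fixed_result = try_import("ok_a")
print(broken_result)
print(fixed_result)
```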

Review and merge bigger forks

Word Default Template

Write a macro that adds all default styles to a new document and removes the whole document content. Save it as default.docx.
Currently this is done by creating a new document, adding all styles by hand, and then removing the content from the file, because a style only appears in the XML once it has been used (e.g. by adding a new heading to the doc).

Remove author from doc props after creation.

Read TOC of a document through paragraphs

Hi,

I am trying to get the TOC of my document, but I cannot reach it through paragraphs, so I use etree to get to the paragraph elements.

Here is my document. Checking the source, the TOC seems to be wrapped in w:sdt. Is that the root cause?

This is just part of the source (obtained by renaming the file to .zip and unzipping it):

<w:sdt>
<w:sdtPr>
<w:rPr>
<w:rFonts w:ascii="Tahoma" w:eastAsia="微软雅黑" w:hAnsi="Tahoma" w:cs="黑体"/>
<w:sz w:val="22"/>
<w:lang w:val="zh-CN"/>
</w:rPr>
<w:id w:val="1773670984"/>
<w:docPartObj>
<w:docPartGallery w:val="Table of Contents"/>
<w:docPartUnique/>
</w:docPartObj>
</w:sdtPr>
<w:sdtEndPr>
<w:rPr>
<w:b/>
<w:bCs/>
</w:rPr>
</w:sdtEndPr>
<w:sdtContent>
<w:p w14:paraId="0B4A64B5" w14:textId="77777777" w:rsidR="001A16C5" w:rsidRDefault="001A16C5">
<w:pPr>
<w:pStyle w:val="TOC1"/>
</w:pPr>
</w:p>
<w:p w14:paraId="400634CB" w14:textId="77777777" w:rsidR="001A16C5" w:rsidRDefault="00000000">
<w:pPr>
<w:pStyle w:val="TOC1"/>
<w:rPr>
<w:rFonts w:asciiTheme="minorHAnsi" w:eastAsiaTheme="minorEastAsia" w:hAnsiTheme="minorHAnsi" w:cstheme="minorBidi"/>
<w:kern w:val="2"/>
<w:sz w:val="21"/>
</w:rPr>
</w:pPr>
<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText xml:space="preserve"> TOC \o "1-3" \h \z \u </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:hyperlink w:anchor="_Toc131613027" w:history="1">
<w:r>
<w:rPr>
<w:rStyle w:val="afa"/>
<w:rFonts w:ascii="黑体" w:hAnsi="黑体"/>
</w:rPr>
<w:t>摘  要</w:t>
</w:r>
<w:r>
<w:tab/>
</w:r>
<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText xml:space="preserve"> PAGEREF _Toc131613027 \h </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
<w:t>I</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r>
</w:hyperlink>
</w:p>
<w:p w14:paraId="49241CB2" w14:textId="77777777" w:rsidR="001A16C5" w:rsidRDefault="00000000">
<w:pPr>
<w:pStyle w:val="TOC1"/>
<w:rPr>
<w:rFonts w:asciiTheme="minorHAnsi" w:eastAsiaTheme="minorEastAsia" w:hAnsiTheme="minorHAnsi" w:cstheme="minorBidi"/>
<w:kern w:val="2"/>
<w:sz w:val="21"/>
</w:rPr>
</w:pPr>
<w:hyperlink w:anchor="_Toc131613028" w:history="1">
<w:r>
<w:rPr>
<w:rStyle w:val="afa"/>
<w:rFonts w:eastAsia="宋体"/>
<w:b/>
</w:rPr>
<w:t>ABSTRACT</w:t>
</w:r>
<w:r>
<w:tab/>
</w:r>
<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText xml:space="preserve"> PAGEREF _Toc131613028 \h </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
<w:t>II</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r>
</w:hyperlink>
</w:p>
<w:p w14:paraId="096DC769" w14:textId="77777777" w:rsidR="001A16C5" w:rsidRDefault="00000000">
<w:pPr>
<w:pStyle w:val="TOC1"/>
<w:rPr>
<w:rFonts w:asciiTheme="minorHAnsi" w:eastAsiaTheme="minorEastAsia" w:hAnsiTheme="minorHAnsi" w:cstheme="minorBidi"/>
<w:kern w:val="2"/>
<w:sz w:val="21"/>
</w:rPr>
</w:pPr>
<w:hyperlink w:anchor="_Toc131613029" w:history="1">
<w:r>
<w:rPr>
<w:rStyle w:val="afa"/>
</w:rPr>
<w:t>第</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="afa"/>
</w:rPr>
<w:t>1</w:t>
</w:r>
<w:r>
331.docx
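
In the XML above, the TOC paragraphs sit inside w:sdt/w:sdtContent rather than directly under w:body, which is probably why a paragraph listing that only looks at the body's direct children does not return them. A minimal sketch using only the standard library reads them straight from word/document.xml. `toc_entries` is a hypothetical helper, not part of python-docx-ng.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def toc_entries(docx_file):
    """Return the text of each paragraph inside a 'Table of Contents' w:sdt."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    entries = []
    for sdt in root.iter(f"{{{W}}}sdt"):
        gallery = sdt.find("w:sdtPr/w:docPartObj/w:docPartGallery", NS)
        if gallery is None or gallery.get(f"{{{W}}}val") != "Table of Contents":
            continue  # some other content control, not the TOC
        for para in sdt.iterfind("w:sdtContent/w:p", NS):
            parts = []
            for node in para.iter():
                if node.tag == f"{{{W}}}t":
                    parts.append(node.text or "")
                elif node.tag == f"{{{W}}}tab":
                    parts.append("\t")  # separates title from page number
            text = "".join(parts)
            if text.strip():
                entries.append(text)
    return entries


if __name__ == "__main__":
    for entry in toc_entries("331.docx"):  # the file attached to the issue
        print(entry)
```

Field codes such as `TOC \o "1-3"` and `PAGEREF` live in w:instrText, not w:t, so they are skipped and only the visible title and page number remain.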

Add contributing guidelines

Hello @toxicphreAK, so glad you started a fork like this and have begun to sift through things. Thanks for your work on this.

Would you like some help? Any thoughts about how you'd like contributions to happen? I can draft some guidelines if you give me a steer, and then hopefully start contributing in other ways.

Can't style a table.

    table.style = document.styles["Table Grid"]
                  ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/home/user/folder/.direnv/python-3.11.8/lib/python3.11/site-packages/docx/styles/styles.py", line 57, in __getitem__
    raise KeyError("no style with name '%s'" % key)
KeyError: "no style with name 'Table Grid'"

I have done some research, and I think something is wrong with the CT_Styles object used in my Document.
With self being an instance of CT_Styles:

(Pdb) pp [style.name.val for style in self._iter_styles()]
['Normal',
 'heading 1',
 'heading 2',
 'heading 3',
 'heading 4',
 'heading 5',
 'Default Paragraph Font',
 'Normal Table',
 'No List',
 'Title',
 'Title Char',
 'Heading 1 Char',
 'Heading 2 Char',
 'Heading 3 Char',
 'Heading 4 Char',
 'Heading 5 Char',
 'Subtitle',
 'Subtitle Char',
 'Subtle Emphasis',
 'Emphasis',
 'Intense Emphasis',
 'Strong',
 'Quote',
 'Quote Char',
 'Intense Quote',
 'Intense Quote Char',
 'Subtle Reference',
 'Intense Reference',
 'Book Title',
 'List Paragraph',
 'No Spacing',
 'List Number',
 'List Bullet',
 'macro',
 'Macro Text Char',
 'Mention',
 'Message Header',
 'Message Header Char',
 'Plain Text',
 'Plain Text Char']

I tested the same in python-docx and got this:

(Pdb) styles = [item.name.val for item in self._iter_styles()]
(Pdb) styles
['Normal', 'header', 'Header Char', 'footer', 'Footer Char', 'heading 1', 'heading 2', 'heading 3', 'heading 4', 'heading 5', 'heading 6', 'heading 7', 'heading 8', 'heading 9', 'Default Paragraph Font', 'Normal Table', 'No List', 'No Spacing', 'Heading 1 Char', 'Heading 2 Char', 'Heading 3 Char', 'Title', 'Title Char', 'Subtitle', 'Subtitle Char', 'List Paragraph', 'Body Text', 'Body Text Char', 'Body Text 2', 'Body Text 2 Char', 'Body Text 3', 'Body Text 3 Char', 'List', 'List 2', 'List 3', 'List Bullet', 'List Bullet 2', 'List Bullet 3', 'List Number', 'List Number 2', 'List Number 3', 'List Continue', 'List Continue 2', 'List Continue 3', 'macro', 'Macro Text Char', 'Quote', 'Quote Char', 'Heading 4 Char', 'Heading 5 Char', 'Heading 6 Char', 'Heading 7 Char', 'Heading 8 Char', 'Heading 9 Char', 'caption', 'Strong', 'Emphasis', 'Intense Quote', 'Intense Quote Char', 'Subtle Emphasis', 'Intense Emphasis', 'Subtle Reference', 'Intense Reference', 'Book Title', 'TOC Heading', 'Table Grid', 'Light Shading', 'Light Shading Accent 1', 'Light Shading Accent 2', 'Light Shading Accent 3', 'Light Shading Accent 4', 'Light Shading Accent 5', 'Light Shading Accent 6', 'Light List', 'Light List Accent 1', 'Light List Accent 2', 'Light List Accent 3', 'Light List Accent 4', 'Light List Accent 5', 'Light List Accent 6', 'Light Grid', 'Light Grid Accent 1', 'Light Grid Accent 2', 'Light Grid Accent 3', 'Light Grid Accent 4', 'Light Grid Accent 5', 'Light Grid Accent 6', 'Medium Shading 1', 'Medium Shading 1 Accent 1', 'Medium Shading 1 Accent 2', 'Medium Shading 1 Accent 3', 'Medium Shading 1 Accent 4', 'Medium Shading 1 Accent 5', 'Medium Shading 1 Accent 6', 'Medium Shading 2', 'Medium Shading 2 Accent 1', 'Medium Shading 2 Accent 2', 'Medium Shading 2 Accent 3', 'Medium Shading 2 Accent 4', 'Medium Shading 2 Accent 5', 'Medium Shading 2 Accent 6', 'Medium List 1', 'Medium List 1 Accent 1', 'Medium List 1 Accent 2', 'Medium List 1 Accent 3', 'Medium List 1 Accent 4', 'Medium 
List 1 Accent 5', 'Medium List 1 Accent 6', 'Medium List 2', 'Medium List 2 Accent 1', 'Medium List 2 Accent 2', 'Medium List 2 Accent 3', 'Medium List 2 Accent 4', 'Medium List 2 Accent 5', 'Medium List 2 Accent 6', 'Medium Grid 1', 'Medium Grid 1 Accent 1', 'Medium Grid 1 Accent 2', 'Medium Grid 1 Accent 3', 'Medium Grid 1 Accent 4', 'Medium Grid 1 Accent 5', 'Medium Grid 1 Accent 6', 'Medium Grid 2', 'Medium Grid 2 Accent 1', 'Medium Grid 2 Accent 2', 'Medium Grid 2 Accent 3', 'Medium Grid 2 Accent 4', 'Medium Grid 2 Accent 5', 'Medium Grid 2 Accent 6', 'Medium Grid 3', 'Medium Grid 3 Accent 1', 'Medium Grid 3 Accent 2', 'Medium Grid 3 Accent 3', 'Medium Grid 3 Accent 4', 'Medium Grid 3 Accent 5', 'Medium Grid 3 Accent 6', 'Dark List', 'Dark List Accent 1', 'Dark List Accent 2', 'Dark List Accent 3', 'Dark List Accent 4', 'Dark List Accent 5', 'Dark List Accent 6', 'Colorful Shading', 'Colorful Shading Accent 1', 'Colorful Shading Accent 2', 'Colorful Shading Accent 3', 'Colorful Shading Accent 4', 'Colorful Shading Accent 5', 'Colorful Shading Accent 6', 'Colorful List', 'Colorful List Accent 1', 'Colorful List Accent 2', 'Colorful List Accent 3', 'Colorful List Accent 4', 'Colorful List Accent 5', 'Colorful List Accent 6', 'Colorful Grid', 'Colorful Grid Accent 1', 'Colorful Grid Accent 2', 'Colorful Grid Accent 3', 'Colorful Grid Accent 4', 'Colorful Grid Accent 5', 'Colorful Grid Accent 6']
(Pdb) "Table Grid" in styles
True

I think you are missing something.
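
The two listings suggest that the default template used here simply does not define 'Table Grid'. A quick way to check any .docx template without going through the library is to read word/styles.xml directly. This is a sketch with hypothetical helper names (`style_names`, `missing_styles`), using only the standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def style_names(docx_file):
    """Return the w:name value of every style defined in word/styles.xml."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    names = []
    for style in root.iter(f"{{{W}}}style"):
        name = style.find(f"{{{W}}}name")
        if name is not None:
            names.append(name.get(f"{{{W}}}val"))
    return names


def missing_styles(docx_file, required):
    """Return the entries of `required` that the file does not define."""
    present = set(style_names(docx_file))
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # "default.docx" is a placeholder path for the template under test.
    print(missing_styles("default.docx", ["Table Grid", "Normal"]))
```

If 'Table Grid' is reported missing, one workaround is to open a document that does define it (for example one saved from Word) instead of relying on the built-in default.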
