bokuweb / docx-rs


323.0 10.0 56.0 11.24 MB

📝 A .docx file writer with Rust/WebAssembly.

Home Page: https://bokuweb.github.io/docx-rs/

License: MIT License

JavaScript 3.13% Rust 86.19% HTML 0.02% Makefile 0.03% TypeScript 10.62%

docx-rs's Introduction

A .docx file `writer` with Rust/WebAssembly.

Installation

Rust

[dependencies]
docx-rs = "0.4"

Browser/Node.js

$ yarn add docx-wasm

Example

Rust

use docx_rs::*;

pub fn hello() -> Result<(), DocxError> {
    let path = std::path::Path::new("./hello.docx");
    let file = std::fs::File::create(path).unwrap();
    Docx::new()
        .add_paragraph(Paragraph::new().add_run(Run::new().add_text("Hello")))
        .build()
        .pack(file)?;
    Ok(())
}

Browser

import { saveAs } from "file-saver";

// Note that a dynamic `import` statement here is required due to webpack/webpack#6615.
import("docx-wasm").then((w) => {
  const { buffer } = new w.Docx()
    .addParagraph(
      new w.Paragraph().addRun(new w.Run().addText("Hello world!!"))
    )
    .build();
  saveAs(new Blob([buffer]), "hello.docx");
});

Node.js

const w = require("docx-wasm");
const { writeFileSync } = require("fs");

const { buffer } = new w.Docx()
  .addParagraph(new w.Paragraph().addRun(new w.Run().addText("Hello world!!")))
  .build();

writeFileSync("hello.docx", Buffer.from(buffer));

More examples

Development

Requirements

Node.js 16+
yarn 1+
wasm-pack 0.10.1 (https://rustwasm.github.io/wasm-pack/)
insta (https://github.com/mitsuhiko/insta)

Examples

You can run an example with the following command. See the examples directory for the available examples.

$ cargo run --example [EXAMPLE_NAME]

For example, to run the hello example, run the following command:

$ cargo run --example hello

The output file is written to the output directory.

Testing

Rust

Run the following command:

make lint && make test

If a snapshot test fails, fix the code or update the snapshot files (see https://insta.rs/):

$ cargo-insta review

Then rerun the tests:

$ make test

Wasm

Run the following command:

$ cd docx-wasm && yarn install && yarn test

If a snapshot test fails, fix the code or update the snapshot files (see https://jestjs.io/docs/snapshot-testing):

$ yarn test -- --updateSnapshot

Features


docx-rs's Issues

Describe the bug

A clear and concise description of what the bug is.

Reproduced step

Steps to reproduce the behavior:

It is impossible to change the orientation of a page to landscape with page_orient(PageOrientationType::Landscape).

Expected behavior

Change the orientation of the page

Actual behavior

Nothing happens.

Desktop (please complete the following information)

OS: macOS
Version: 0.4.6
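A likely cause is that Word lays out the page from the `w:pgSz` width and height rather than from the `w:orient` flag alone. This is an assumption that the issue does not confirm. If it holds, setting the orientation is not enough: the width and height also have to be swapped. The hypothetical helpers below only compute a landscape size in twips (1/1440 inch). With docx-rs, the result would then go to `Docx::page_size(w, h)` alongside `page_orient(PageOrientationType::Landscape)`, assuming `page_size` takes twips in your version.

```rust
/// Converts millimetres to twips (1440 twips = 1 inch = 25.4 mm).
fn mm_to_twips(mm: f64) -> u32 {
    (mm / 25.4 * 1440.0).round() as u32
}

/// Returns (width, height) with the longer side as the width, i.e. a landscape page.
fn landscape(width: u32, height: u32) -> (u32, u32) {
    if width >= height {
        (width, height)
    } else {
        (height, width)
    }
}
```

For A4 (210 × 297 mm) this gives a portrait size of 11906 × 16838 twips, so the landscape call would pass 16838 × 11906.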

quick_xml has better performance (around 50 times faster) than xml-rs

Is your feature request related to a problem? Please describe.

On a particular file, quick-xml is around 50 times faster than the xml-rs crate:
https://github.com/tafia/quick-xml#performance-1


Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Warning

These dependencies are deprecated:

Datasource	Name	Replacement PR?
npm	`text-encoding`


Detected dependencies

cargo

docx-core/Cargo.toml

xml-rs 0.8.4

thiserror 1.0

zip 0.6.3

serde 1.0

serde_json 1.0

base64 0.13.1

image 0.24.4

wasm-bindgen 0.2.78

ts-rs 6.1

pretty_assertions 1.3.0

insta 1.16

docx-wasm/Cargo.toml

wasm-bindgen 0.2.78

console_error_panic_hook 0.1.7

github-actions

.github/workflows/ci.yml

actions-rs/toolchain v1

actions/cache v1

actions/cache v1

actions/cache v1

actions-rs/toolchain v1

actions/cache v1

actions/cache v1

actions/checkout v2

actions-rs/toolchain v1

actions/checkout v1

actions-rs/toolchain v1

actions-rs/cargo v1

html

docs/index.html

FileSaver.js 2.0.5

docx-wasm/assets/template.html

FileSaver.js 2.0.5

npm

docx-wasm/package.json

@types/file-saver 2.0.7

@wasm-tool/wasm-pack-plugin 1.6.0

adm-zip 0.5.14

cpy-cli 4.2.0

file-saver 2.0.5

html-webpack-plugin 5.5.3

jest 28.1.3

npm-run-all2 5.0.2

text-encoding 0.7.0

ts-loader 9.4.2

typescript 4.9.3

webpack 4.46.0

webpack-cli 5.0.1

webpack-dev-server 3.11.3

webpack-merge 5.8.0

serialize-javascript 6.0.2

vrt/package.json

glob ^8.0.0

libreoffice-convert 1.3.5


Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Cannot find preset's package (github>whitesource/merge-confidence:beta)


How to add arbitrary lines in single TableCell?

Hi everyone.
I want to create a table cell that contains an arbitrary number of lines of content (the number of lines is determined at runtime).

head
line1 line2 ... lineN

N is determined at runtime.

How can this be accomplished?
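The docx-rs builders consume `self` and return the updated value, as in the `TableCell::new().add_paragraph(...)` chains elsewhere on this page. Because of that, a runtime number of lines can be added with a fold, or by rebinding a variable inside a loop. The sketch below uses a hypothetical stand-in `Cell` type so it compiles without docx-rs. With the real crate, the closure body would be something like `cell.add_paragraph(Paragraph::new().add_run(Run::new().add_text(line)))`.

```rust
/// Hypothetical stand-in for a consuming builder such as docx-rs's `TableCell`.
#[derive(Debug, Default)]
struct Cell {
    paragraphs: Vec<String>,
}

impl Cell {
    fn new() -> Self {
        Self::default()
    }

    /// Consumes the builder and returns it, mirroring `TableCell::add_paragraph`.
    fn add_paragraph(mut self, text: &str) -> Self {
        self.paragraphs.push(text.to_string());
        self
    }
}

/// Adds one paragraph per line; the number of lines is only known at runtime.
fn cell_from_lines(lines: &[String]) -> Cell {
    lines.iter().fold(Cell::new(), |cell, line| cell.add_paragraph(line))
}
```

An equivalent loop form is `let mut cell = Cell::new(); for line in lines { cell = cell.add_paragraph(line); }`.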

Strike is only available on run_property

Describe the bug

Hello, thanks for this crate. It is very useful. I see that strike(), which as far as I know is for ~~strikethrough~~, exists only on RunProperty and not on Run itself.

Reproduced step

Run::new().strike()

Expected behavior

I think strike() should exist on Run, not only on RunProperty.

Actual behavior

It does not exist, while bold, italic, etc. do exist.


Desktop (please complete the following information)

OS: Windows 11
Version: 0.4.7

Is your feature request related to a problem? Please describe.

When importing a docx file, the value of specVanish is not translated from the XML to the output.

Describe the solution you'd like

Capture the specVanish value and pass it into the output.

Describe alternatives you've considered

N/A

Additional context

https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.specvanish?view=openxml-2.8.1

You are capturing vanish but not specVanish, and they have slightly different meanings.

Share images/pics/drawings between elements

Is your feature request related to a problem? Please describe.

I need to render a grid of identical pictures. If I try to reuse elements, docx-rs generates an invalid .docx file.

If I copy the image buffer instead, generation takes a lot of time and produces pretty large files.

But if I open the file in Word and re-save it, Word shrinks it by up to 30 times.

Additional context

I tried to speed it up with parallel execution, but it seems this should be handled in docx-rs, because Word can optimize and deduplicate the generated file.

use docx_rs::*;
use rayon::prelude::*;

fn create_docx(code: &str, barcode: &str, barcodes_image: &Vec<u8>) -> Docx {
  Docx::new()
    .page_margin(
      PageMargin::new()
        .header(0)
        .bottom(0)
        .footer(0)
        .left(0)
        .right(0)
        .top(0)
        .gutter(0)
    )
    .add_table(create_table(code, barcode, barcodes_image))
}

fn create_table(code: &str, barcode: &str, barcodes_image: &Vec<u8>) -> Table {
  let rows = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    .par_iter()
    .map(|_| create_row(code, barcode, barcodes_image))
    .collect();

  Table::new(rows)
    .layout(TableLayoutType::Fixed)
    .indent(0)
    .width(11880, WidthType::DXA)
}

fn create_row(code: &str, barcode: &str, barcodes_image: &Vec<u8>) -> TableRow {
  let cells = [0, 1, 2, 3]
    .par_iter()
    .map(|_| create_cell(code, barcode, barcodes_image.clone()))
    .collect();

  TableRow::new(cells)
    .row_height(1678f32)
    .height_rule(docx_rs::HeightRule::Exact)
}

fn create_cell(code: &str, barcode: &str, barcodes_image: Vec<u8>) -> TableCell {
  TableCell::new()
    .add_paragraph(
      Paragraph::default()
        .add_run(
          Run::new()
            .add_image(
              Pic::new(barcodes_image)
                .floating()
                .offset_x(20)
                .offset_y(10)
                .size(155, 45)
            )
            .add_break(BreakType::TextWrapping)
            .add_break(BreakType::TextWrapping)
            .add_text(barcode)
            .add_break(BreakType::TextWrapping)
            .add_text(code)
            .fonts(
              RunFonts::new()
                .east_asia("Calibri")
                .ascii("Calibri")
                .hi_ansi("Calibri")
            )
            .size(22)
            .character_spacing(0)
        )
        .align(AlignmentType::Center)
    )
    .clear_all_border()
    .vertical_align(VAlignType::Center)
}

before.docx
after.docx
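Word's much smaller output is consistent with it storing each distinct image once in the package and pointing every drawing at the same relationship id. It is not confirmed here whether docx-rs can share one media part between several `Pic`s. The sketch below only illustrates the deduplication idea. A hypothetical `MediaRegistry` is keyed by the image bytes, so identical buffers map to one id such as `rIdImage1`; that naming scheme is invented for this example.

```rust
use std::collections::HashMap;

/// Hypothetical registry that stores each distinct image once and returns the
/// same relationship id every time identical bytes are registered again.
#[derive(Default)]
struct MediaRegistry {
    ids: HashMap<Vec<u8>, String>,
}

impl MediaRegistry {
    fn register(&mut self, bytes: &[u8]) -> String {
        // Identical content: reuse the existing id instead of storing a new copy.
        if let Some(id) = self.ids.get(bytes) {
            return id.clone();
        }
        let id = format!("rIdImage{}", self.ids.len() + 1);
        self.ids.insert(bytes.to_vec(), id.clone());
        id
    }

    /// Number of media parts that would actually be written to the package.
    fn part_count(&self) -> usize {
        self.ids.len()
    }
}
```

Keying by the full bytes avoids hash-collision false positives, at the cost of keeping one copy of each distinct image in memory.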

`read_docx` does not parse image?

Describe the bug

When docx_rs::read_docx reads a .docx file that contains images, the returned Docx should contain a RunChild::Drawing whose Pic.image is a non-empty vector, but it doesn't.

Reproduced step

Steps to reproduce the behavior:

use std::io::Read;
use std::path::Path;

fn read_to_vec(file_name: &Path) -> anyhow::Result<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::new();
    std::fs::File::open(file_name)?.read_to_end(&mut buf)?;
    Ok(buf)
}

// `path` is a PathBuf; `&path` coerces to `&Path`.
let original_docx = docx_rs::read_docx(&read_to_vec(&path)?)?;
dbg!(&original_docx);

Expected behavior

dbg!(&original_docx) should print something indicating that the Pic.image field of the RunChild::Drawing variant is a non-empty vector.

Actual behavior

Instead, it prints out this:

children: [
    Drawing(
        Drawing {
            data: Some(
                Pic(
                    Pic {
                        id: "rId5",
                        image: [],
                        size: (
                            679639,
                            679639,
                        ),
                        position_type: Inline,
                        simple_pos: false,
                        simple_pos_x: 0,
                        simple_pos_y: 0,
                        layout_in_cell: true,
                        relative_height: 0,
                        allow_overlap: true,
                        position_h: Offset(
                            0,
                        ),
                        position_v: Offset(
                            0,
                        ),
                        relative_from_h: Margin,
                        relative_from_v: Margin,
                        dist_t: 0,
                        dist_b: 0,
                        dist_l: 0,
                        dist_r: 0,
                        rot: 0,
                    },
                ),
            ),
        },
    ),
]

This indicates that the images are indeed not read.

Screenshots

Screenshot of original .docx file


Special characters are being carried across using HTML entities vs unicode

Describe the bug

Text elements contain HTML-encoded values instead of Unicode characters; the ones we have seen are for ', " and &.

Reproduced step

Steps to reproduce the behavior:

import { readDocx } from 'docx-wasm'
const parsedDoc = readDocx(buf)
console.log("parsedDoc:", parsedDoc)

Expected behavior

We would prefer to see the values in the output as Unicode (e.g. ←), since many special characters do not actually have HTML entity translations (MS Word's opening and closing double quotes are different Unicode characters, U+201C and U+201D).

Actual behavior

HTML-encoded values for ', " and &.

Screenshots

Corporate Arbitration.docx

Desktop (please complete the following information)

OS: Windows 11 and macOS
Browser: Chrome
Version: 0.0.276-rc28 (https://www.npmjs.com/package/docx-wasm)
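Until the output contains plain Unicode, a consumer-side workaround is to decode character references after reading. The sketch below shows that post-processing step in Rust. It is not a change to docx-rs or docx-wasm, and a JavaScript consumer would need an equivalent. It handles the five predefined XML entities plus decimal and hexadecimal character references, and leaves anything else untouched.

```rust
/// Decodes the five predefined XML entities and numeric character references
/// (`&#8220;`, `&#x201D;`). Anything that is not a recognized reference is kept as-is.
fn decode_xml_entities(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let tail = &rest[amp..];
        // Candidate reference: everything between '&' and the next ';'.
        let decoded = tail.find(';').and_then(|semi| {
            let name = &tail[1..semi];
            let ch = match name {
                "amp" => Some('&'),
                "lt" => Some('<'),
                "gt" => Some('>'),
                "quot" => Some('"'),
                "apos" => Some('\''),
                _ => {
                    if let Some(hex) = name.strip_prefix("#x").or_else(|| name.strip_prefix("#X")) {
                        u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
                    } else if let Some(dec) = name.strip_prefix('#') {
                        dec.parse::<u32>().ok().and_then(char::from_u32)
                    } else {
                        None
                    }
                }
            };
            ch.map(|c| (c, semi))
        });
        match decoded {
            Some((c, semi)) => {
                out.push(c);
                rest = &tail[semi + 1..];
            }
            None => {
                // Not a reference we understand: keep the '&' literally.
                out.push('&');
                rest = &tail[1..];
            }
        }
    }
    out.push_str(rest);
    out
}
```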

Add page number to footer

Hey, thanks for this wonderful little library!

I'm trying to add a page number in the footer of every page, but I'm not able to figure out how to do it.

Based on the footer example, I thought I could just set has_numbering to true, but it doesn't appear to do anything:

use docx_rs::*;

pub fn main() -> Result<(), DocxError> {
    let path = std::path::Path::new("footer.docx");
    let file = std::fs::File::create(&path).unwrap();
    let mut footer = Footer::new();
    footer.has_numbering = true;
    footer = footer.add_paragraph(Paragraph::new().add_run(Run::new().add_text("Hello")));
    Docx::new()
        .footer(footer)
        .add_paragraph(Paragraph::new().add_run(Run::new().add_text("World")))
        .build()
        .pack(file)?;
    Ok(())
}

If there is no handy API to do it directly yet, this Stack Overflow post describes a way it could be added manually. I see that docx-rs offers some level of XML manipulation via functions like Docx::add_custom_item and Docx::custom_property, but I don't understand the docx format well enough to implement this.

Could you advise?

Thanks in advance!!
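In WordprocessingML, a page number is normally a field. It is a `PAGE` instruction wrapped in `w:fldChar` begin/separate/end runs, with a cached result between separate and end that Word refreshes when it updates fields. The hypothetical helper below only builds those runs as a string; they would sit inside a `w:p` in the footer part. How to inject raw XML into a docx-rs `Footer` depends on your version. Check the crate docs for field or page-number support before falling back to raw XML.

```rust
/// Builds the runs of a complex `PAGE` field. `cached` is the placeholder text shown
/// until Word updates the field; it is inserted verbatim, so it must already be XML-safe.
fn page_number_field_runs(cached: &str) -> String {
    [
        r#"<w:r><w:fldChar w:fldCharType="begin"/></w:r>"#.to_string(),
        r#"<w:r><w:instrText xml:space="preserve"> PAGE </w:instrText></w:r>"#.to_string(),
        r#"<w:r><w:fldChar w:fldCharType="separate"/></w:r>"#.to_string(),
        format!("<w:r><w:t>{cached}</w:t></w:r>"),
        r#"<w:r><w:fldChar w:fldCharType="end"/></w:r>"#.to_string(),
    ]
    .concat()
}
```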

Allow overriding styles and paragraph properties for table of contents

First of all, thank you for maintaining this crate! It is really useful!

Is your feature request related to a problem? Please describe.

When creating a table of contents, it is created with a specific style (i.e., normal text without any additional formatting and "..." as tab leader):

If a different style is needed (for example, bold text and no tab leader) it is not possible to set, because:

A preset style is defined, and it does not seem possible to override it with add_style, because an added style is overridden when the document is built anyway.
A tab is added to the paragraph in TableOfContentsItem, and it does not seem possible to change it in any way.

Describe the solution you'd like

TOC item paragraph customization could be solved by allowing the user to specify a mapping of levels to style names (used instead of ToC{level}). In addition, the logic in build could be updated to skip adding a style if a style with that name is already present, which would let users customize the style by creating a ToC{level} style themselves.

Customizing the paragraph tab is a bit more difficult. It could be solved by letting the user set a custom tab (per level) through a function on TableOfContents. Another option is a way for the user to set a Paragraph instance that is later cloned for each TOC item (or to set a Paragraph factory).

Describe alternatives you've considered

It does not seem that there are any alternative solutions here. If there are - please let me know!

Additional context

Please let me know what you think about possible solutions, and I will be happy to submit an MR for this!

Thank you!
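The second half of the proposed build change can be sketched as a set difference: only add a preset `ToC{level}` style when the user has not already defined one. The function name is hypothetical and the `ToC{level}` ids are taken from the issue text. This is not docx-rs's current build logic.

```rust
use std::collections::HashSet;

/// Returns the preset `ToC{level}` style ids that still need to be added,
/// skipping any id the user has already defined.
fn preset_toc_styles_to_add(existing_style_ids: &[&str], max_level: usize) -> Vec<String> {
    let existing: HashSet<&str> = existing_style_ids.iter().copied().collect();
    (1..=max_level)
        .map(|level| format!("ToC{level}"))
        .filter(|id| !existing.contains(id.as_str()))
        .collect()
}
```

With this rule, a user-defined `ToC1` (for example bold, with no tab leader) survives the build while levels 2 and above still get the presets.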

Numbered list example

Can you please provide numbered list example?

My experiments with numbering haven't helped much.
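The level text pattern is a common stumbling block in numbering experiments. In WordprocessingML, a `w:lvlText` such as `%1.` or `%1.%2.` is a template in which `%N` is replaced by the current counter of level N (1-based). In docx-rs this is the pattern you would pass via `LevelText::new("%1.")` when defining a `Level`; the repository's examples directory may show the full `AbstractNumbering`/`Numbering` setup. The hypothetical helper below only expands such a pattern with decimal counters, to make the placeholder semantics concrete.

```rust
/// Expands a `w:lvlText` pattern using decimal counters.
/// `counters[0]` is the current number of level 1 (`%1`), `counters[1]` of level 2 (`%2`), ...
fn expand_level_text(pattern: &str, counters: &[u32]) -> String {
    let mut out = String::new();
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '%' {
            if let Some(level) = chars.peek().and_then(|d| d.to_digit(10)) {
                chars.next(); // consume the level digit
                if let Some(n) = (level as usize).checked_sub(1).and_then(|i| counters.get(i)) {
                    out.push_str(&n.to_string());
                }
                continue;
            }
        }
        out.push(c);
    }
    out
}
```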

Set page size/margin?

I can create PageSize and PageMargin but I can't set document.section_property because SectionPriority can't be made.

Heading styles example

Hey, could you show me how to add a Heading 1 style to a run in Rust using .style()? I am having difficulty with it and can't get it to work. I tried .style("Heading 1") but it's not working. Thanks.

Enable a means of curtailing read_docx?

Is your feature request related to a problem? Please describe.

Inability to end parsing mid-stream
In a Rust context, if I am parsing multiple large .docx documents in parallel threads, read_docx() can take a significant time to finish. There could be a fairly simple way of tweaking your code to check an AtomicBool.

Describe the solution you'd like

Provide an AtomicBool called "stop_parse" or similar. The user could then set it to true from an independent thread. The code used by read_docx() would be liberally sprinkled with checks for whether "stop_parse" is now true, and would exit parsing ASAP. Graceful shutdown, in other words.

Describe alternatives you've considered

It is possible to run in another process and kill the process. Or just wait for the long thread task(s) to end, accepting that a number of threads will be needlessly occupied for a second or more after the stop instruction is given.

Additional context

This is powerful code which on a big file leads to substantial processing. It would be nice to provide a means of shutting down gracefully.
🥰
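The requested mechanism is ordinary cooperative cancellation: the parser polls a shared flag between units of work and bails out early. The sketch below is hypothetical (`parse_elements` and `ParseError::Cancelled` are not docx-rs APIs). In practice, the flag would live in an `Arc<AtomicBool>` shared with the thread that requests the stop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
enum ParseError {
    Cancelled,
}

/// Hypothetical parse loop that checks a shared stop flag before each element.
fn parse_elements(elements: &[&str], stop: &AtomicBool) -> Result<usize, ParseError> {
    let mut parsed = 0;
    for _element in elements {
        // Relaxed is enough here: the flag carries no other data that needs synchronizing.
        if stop.load(Ordering::Relaxed) {
            return Err(ParseError::Cancelled);
        }
        parsed += 1; // real parsing of `_element` would happen here
    }
    Ok(parsed)
}
```

The check costs one atomic load per element, so checking per paragraph or per table rather than per character keeps the overhead negligible.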

footer for default section property.


numProperty read error

Describe the bug

A numPr read error occurred.
I ran into the following XML.
Is it a bug in Word?

            <w:pPr>
                <w:numPr>
                    <w:ins w:id="14" w:author="a" w:date="2006-09-07T14:34:00Z"/>
                </w:numPr>
                <w:jc w:val="center"/>
                <w:rPr>
                    <w:ins w:id="15" w:author="aeon" w:date="2006-09-07T14:34:00Z"/>
                    <w:rFonts w:ascii="ＭＳ 明朝" w:eastAsia="ＭＳ 明朝" w:hAnsi="ＭＳ 明朝" w:cs="Times New Roman"/>
                    <w:lang w:eastAsia="zh-TW"/>
                </w:rPr>
            </w:pPr>
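As far as we know, a `w:ins` revision marker inside `w:numPr` is allowed by the schema, so this is probably not a Word bug. It appears to record that the numbering change was a tracked insertion. A tolerant reader would treat `w:ilvl` and `w:numId` as optional and skip unknown children instead of failing. The sketch below is hypothetical and works on pre-extracted `(element name, w:val)` pairs rather than real XML.

```rust
/// Numbering reference of a paragraph; both parts are optional.
#[derive(Debug, Default, PartialEq)]
struct NumPr {
    ilvl: Option<usize>,
    num_id: Option<usize>,
}

/// Hypothetical tolerant reader over `(element name, w:val)` pairs of `w:numPr`'s children.
fn read_num_pr(children: &[(&str, Option<&str>)]) -> NumPr {
    let mut num_pr = NumPr::default();
    for &(name, val) in children {
        match name {
            "w:ilvl" => num_pr.ilvl = val.and_then(|v| v.parse().ok()),
            "w:numId" => num_pr.num_id = val.and_then(|v| v.parse().ok()),
            // e.g. `w:ins` revision markers: skip them instead of returning an error.
            _ => {}
        }
    }
    num_pr
}
```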


Add paragraph spacing support

Is your feature request related to a problem? Please describe.

Currently docx-rs has indent and align options on Paragraph, for example:

pub fn indent(
    mut self,
    left: Option<i32>,
    special_indent: Option<SpecialIndentType>,
    end: Option<i32>,
    start_chars: Option<i32>,
) -> Paragraph

But Paragraph has no spacing option (before, after, line spacing modes, etc.).

Describe the solution you'd like

Add spacing method on paragraph

Additional context

I need "At least" line spacing with 0 pixels
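Paragraph spacing maps to the `w:spacing` element in the paragraph properties. Per the OOXML spec, `w:before` and `w:after` are in twentieths of a point. `w:line` is in 240ths of a line when `w:lineRule` is `auto`, and in twentieths of a point for `atLeast` and `exact`. The hypothetical helper below shows the XML such a method would need to emit, including the requested "at least 0" case. The enum and function names are invented for this sketch.

```rust
/// Line spacing modes of `w:spacing` (names invented for this sketch).
enum LineSpacing {
    /// Multiple of single spacing, e.g. 1.5 (lineRule="auto", value in 240ths of a line).
    Multiple(f32),
    /// Minimum line height in points (lineRule="atLeast", value in twentieths of a point).
    AtLeastPt(f32),
    /// Exact line height in points (lineRule="exact", value in twentieths of a point).
    ExactPt(f32),
}

/// Builds a `w:spacing` element; `before_pt` and `after_pt` are in points.
fn spacing_xml(before_pt: f32, after_pt: f32, line: LineSpacing) -> String {
    // Twentieths of a point.
    let twips = |pt: f32| (pt * 20.0).round() as i32;
    let (line_val, rule) = match line {
        LineSpacing::Multiple(m) => ((m * 240.0).round() as i32, "auto"),
        LineSpacing::AtLeastPt(pt) => (twips(pt), "atLeast"),
        LineSpacing::ExactPt(pt) => (twips(pt), "exact"),
    };
    format!(
        r#"<w:spacing w:before="{}" w:after="{}" w:line="{}" w:lineRule="{}"/>"#,
        twips(before_pt),
        twips(after_pt),
        line_val,
        rule
    )
}
```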

Reading and then writing shortens the document

Describe the bug

I was going to make a search-and-replace utility, but it looks like the template document doesn't survive reading and writing.

Reproduced step

Here's how far I got. Read in the document, write it again. I had a search and replace step (untested) but that's commented out.

https://github.com/bburdette/docx-template/blob/f2205657daa70795ffe9a177653fadf37df4f77a/src/main.rs#L38

Expected behavior

Expected the same document to be written as was read in.

Actual behavior

Only the first line is found in the output document.

Screenshots

Input:

An error occurs on opening the output file:

Proceeding anyway, only the first line survives:

Desktop (please complete the following information)

OS: NixOS
Browser: n/a
Version: tried with 0.4.6. Also with this rev (latest commit as of today):

docx-rs = { git = "https://github.com/bokuweb/docx-rs.git", rev="bffbd468e3af9739792bcf2b35eb373aadba4785" }

Read support

Hi, thanks for this crate! ❤️

Do you have any intention to add docx reading capability to this crate? If anyone contributes reading support, will that be merged?

Thanks again!

Images in headers/footers not importing

Describe the bug

When an image is in the header or footer the image is not present in the JSON of readDocx()

Reproduced step

Steps to reproduce the behavior:

create a document with an image in the header or footer
import that document using readDocx
image is not present

Expected behavior

All images should be referenced in the JSON output of readDocx regardless of position in the document

Actual behavior

Image is not present when it is in the header or footer

Screenshots

N/A

Desktop (please complete the following information)

N/A

Macros for commonly created objects

Is your feature request related to a problem? Please describe.

It is quite a pain to create things like paragraphs or table cells. For example, if you need to create a bunch of paragraphs, you have to write
Paragraph::new().add_run(Run::new().add_text("Text")); over and over again.

Describe the solution you'd like

It would be great if there were macros for that. For example, here's my macro for creating a paragraph:

#[macro_export]
macro_rules! paragraph {
    // Without an explicit alignment, default to centered text.
    ($text:expr) => {
        $crate::paragraph!($text, ::docx_rs::AlignmentType::Center)
    };
    ($text:expr, $align:expr) => {
        ::docx_rs::Paragraph::new()
            .add_run(::docx_rs::Run::new().add_text($text))
            .align($align)
    };
}

With this macro I can just write paragraph!("Text"), which is so much nicer.

Sometimes error author in comments

Hi, sometimes the author name in comments (I can't remember whether it comes from MS Word or LibreOffice...) causes an error: if I read the .docx and write it back, the output is broken.
Maybe this could be changed during parsing?

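// Escape '&' first so the entities added afterwards are not escaped twice.
let a = author
    .replace('&', "&amp;")
    .replace('<', "&lt;")
    .replace('>', "&gt;")
    .replace('"', "&quot;");
comment = comment.author(a);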

Image disappears when editing a document

If I take a .docx that contains an image, process it and write it out again, I lose the image. Is this a bug, or is my implementation wrong?

use docx_rs::*;
use std::fs::File;
use std::io::Read;

pub fn main() -> Result<(), DocxError> {
    let mut file = File::open("./files/file.docx").unwrap();
    let mut buf = vec![];
    file.read_to_end(&mut buf).unwrap();
    let res = read_docx(&buf).unwrap();

    let path = std::path::Path::new("./docxsout/file3_out.docx");
    let file_out = File::create(path).unwrap();

    res
        .add_paragraph(Paragraph::new().add_run(Run::new().add_text("Hello")))
        // add_paragraph(paragraph) /* in an ideal world */
        .build()
        .pack(file_out)?;
    Ok(())
}

isLgl

Is your feature request related to a problem? Please describe.

When importing a docx file, the value of isLgl is not translated from the XML to the output.

Describe the solution you'd like

Capture the isLgl value/presence and pass it into the output.

Describe alternatives you've considered

N/A

Additional context

https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.islegalnumberingstyle?view=openxml-2.8.1

Missing Contributing Guidelines

Hello, first of all I would like to thank you for your work. It's very handy. 👍

Is your feature request related to a problem? Please describe.

I would like to add a feature, but I don't know how to even start developing.
What's your development workflow? What's the documentation for the docx "standard"?
What's the project structure? What is inside fixtures, vrt, output, etc.?

Describe the solution you'd like

A one-page guide on how to start developing new things.

Header image not imported

When you add an in-paragraph image to the header or footer, the image is not correctly imported and related.

[Bug?] Runs don't respect newlines

Describe the bug

Run::new().add_text("Hello\nworld")

does not produce text on two lines.

Reproduced step

use docx_rs::*;

pub fn main() -> Result<(), DocxError> {
    let path = std::path::Path::new("./hello.docx");
    let file = std::fs::File::create(path).unwrap();
    Docx::new()
        .add_paragraph(Paragraph::new().add_run(Run::new().add_text("Hello\nworld")))
        .build()
        .pack(file)?;
    Ok(())
}
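As far as we know, Word does not render a `\n` inside `w:t` as a line break; a break is a separate `w:br` element. A workaround is to split the string yourself and interleave breaks. The sketch computes the pieces with a hypothetical `Piece` type. With docx-rs, each `Piece::Text(t)` would become `.add_text(t)` and each `Piece::Break` would become `.add_break(BreakType::TextWrapping)`, the same calls used in the barcode example on this page.

```rust
/// One segment of a run: either literal text or a line break.
#[derive(Debug, PartialEq)]
enum Piece<'a> {
    Text(&'a str),
    Break,
}

/// Splits text on '\n' (tolerating "\r\n") and puts a Break between consecutive lines.
fn split_lines(text: &str) -> Vec<Piece<'_>> {
    let mut pieces = Vec::new();
    for (i, line) in text.split('\n').enumerate() {
        if i > 0 {
            pieces.push(Piece::Break);
        }
        let line = line.strip_suffix('\r').unwrap_or(line);
        if !line.is_empty() {
            pieces.push(Piece::Text(line));
        }
    }
    pieces
}
```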

Expected behavior

Actual behavior

Support Multi-Column Pages.

I want to create multi-column pages.

fn set_pages(columns: usize, mut doc: Docx) -> Docx {
    let mut section = docx_rs::SectionProperty::new();
    section.columns = columns;
    doc.document.section_property = section;
    doc
}

This confirms that I can change the settings for section properties for the entire document. However, the columns are not reflected.

I have confirmed that the columns can be changed by partially updating section_property.rs and elements.rs.

I know that I have not yet added support for Section.
Can I send a pull request?
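For reference, multiple columns are expressed in the section properties by a `w:cols` element: `w:num` columns separated by `w:space` twips. The hypothetical helpers below build that element and compute the resulting equal column width. The width can be handy for sizing tables or images inside a column. Neither helper is part of docx-rs.

```rust
/// Builds a `w:cols` element for `num` equal-width columns separated by `space` twips.
fn cols_xml(num: u32, space: u32) -> String {
    format!(r#"<w:cols w:num="{num}" w:space="{space}"/>"#)
}

/// Width of each equal column in twips, or None if the margins and gaps do not fit.
fn equal_column_width(page_w: u32, margin_left: u32, margin_right: u32, num: u32, space: u32) -> Option<u32> {
    if num == 0 {
        return None;
    }
    let margins = margin_left.checked_add(margin_right)?;
    let gaps = space.checked_mul(num - 1)?;
    let text_w = page_w.checked_sub(margins)?;
    text_w.checked_sub(gaps).map(|w| w / num)
}
```

For an A4 page (11906 twips wide) with 1-inch margins and two columns 720 twips apart, each column is (11906 - 2880 - 720) / 2 = 4153 twips.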

Issue importing table

Describe the bug

When trying to import a docx file containing a table the importer fails.

Reproduced step

Steps to reproduce the behavior:

try to import the attached file table test (1).docx
see this error:

Expected behavior

Expect the engine to throw an intelligible error or better yet to import the file without error (since the word document opens with no issues).

Actual behavior

Cannot import the file while the table is in it.


Desktop (please complete the following information)

OS: Windows 11 and macOS
Browser: Chrome
Version: 108.0.5359.94 (Official Build) (64-bit)

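After converting .docx content into a Docx with read_docx, I want to modify the values of its nodes by iterating over docx.document.children, but I don't know how to write it.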
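A common pattern is to walk the tree with `iter_mut()` and an `if let` on each enum level, rewriting text in place. The sketch uses hypothetical, simplified stand-ins loosely modeled on docx-rs's `DocumentChild`, `ParagraphChild` and `RunChild` enums. The real types have many more variants and fields, and wrap text in a struct rather than a bare `String`. Treat the names as illustrative and check the crate source for the exact shapes.

```rust
// Hypothetical, simplified stand-ins for the docx-rs document tree.
enum RunChild {
    Text(String),
    Break,
}

struct Run {
    children: Vec<RunChild>,
}

enum ParagraphChild {
    Run(Box<Run>),
    Other,
}

struct Paragraph {
    children: Vec<ParagraphChild>,
}

enum DocumentChild {
    Paragraph(Box<Paragraph>),
    Other,
}

/// Replaces `from` with `to` in every text node; returns how many nodes changed.
fn replace_text(children: &mut [DocumentChild], from: &str, to: &str) -> usize {
    let mut changed = 0;
    for child in children.iter_mut() {
        if let DocumentChild::Paragraph(paragraph) = child {
            for pc in paragraph.children.iter_mut() {
                if let ParagraphChild::Run(run) = pc {
                    for rc in run.children.iter_mut() {
                        if let RunChild::Text(text) = rc {
                            if text.contains(from) {
                                *text = text.replace(from, to);
                                changed += 1;
                            }
                        }
                    }
                }
            }
        }
    }
    changed
}
```

Word often splits a visible word across several runs, so a placeholder that spans runs will not be matched by this per-node replacement.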


Adding optional properties error

If we have a run property like

<w:b w:val="false" />

then in RunProperty this will be Some(...).

And function

pub(crate) fn add_optional_child<T>(mut self, child: &Option<T>) -> Self
where
    T: BuildXML,
{
    if let Some(c) = child {
        self = self.add_child(c)
    }
    self
}

will add the property to the run even if val="false".
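`w:b` is an on/off (toggle) property. An element with no `w:val` means on, while `w:val="false"`, `"0"` or `"off"` means explicitly off. The explicit off matters because it can override a style that turns bold on. Below is a sketch of reading the value and writing it back faithfully. The helper names are hypothetical and this is not docx-rs's implementation.

```rust
/// Interprets an on/off attribute: a missing `w:val` means "on".
fn on_off(val: Option<&str>) -> bool {
    match val {
        None => true,
        Some(v) => !matches!(v, "false" | "0" | "off"),
    }
}

/// Serializes an optional bold property, keeping an explicit "off"
/// instead of turning it into `<w:b/>` (which would mean "on").
fn bold_xml(bold: Option<bool>) -> Option<&'static str> {
    match bold {
        Some(true) => Some("<w:b/>"),
        Some(false) => Some(r#"<w:b w:val="false"/>"#),
        None => None,
    }
}
```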

can't find `<w:ptab>`

Hello, I tried to add a ptab to the document,
but I can't find it in the crate.
Its XML structure looks like this:

<w:r>
  <w:ptab w:relativeTo="margin" w:alignment="right" w:leader="none" />
</w:r>
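If there is no dedicated API in your version, the element itself is small. The hypothetical helper below renders a run containing `w:ptab` from typed options. The attribute values (`margin`/`indent`, `left`/`center`/`right`, and the leader characters) follow the OOXML spec as we understand it; the enum names are invented for this sketch.

```rust
/// Hypothetical typed options for `w:ptab` (names invented for this sketch).
enum PTabRelativeTo {
    Margin,
    Indent,
}

enum PTabAlignment {
    Left,
    Center,
    Right,
}

enum PTabLeader {
    None,
    Dot,
    Hyphen,
    Underscore,
    MiddleDot,
}

/// Renders a run that contains a single positional tab.
fn ptab_run_xml(relative_to: PTabRelativeTo, alignment: PTabAlignment, leader: PTabLeader) -> String {
    let relative_to = match relative_to {
        PTabRelativeTo::Margin => "margin",
        PTabRelativeTo::Indent => "indent",
    };
    let alignment = match alignment {
        PTabAlignment::Left => "left",
        PTabAlignment::Center => "center",
        PTabAlignment::Right => "right",
    };
    let leader = match leader {
        PTabLeader::None => "none",
        PTabLeader::Dot => "dot",
        PTabLeader::Hyphen => "hyphen",
        PTabLeader::Underscore => "underscore",
        PTabLeader::MiddleDot => "middleDot",
    };
    format!(r#"<w:r><w:ptab w:relativeTo="{relative_to}" w:alignment="{alignment}" w:leader="{leader}"/></w:r>"#)
}
```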

Get at least the styles from an existing document and use them in a new document

First of all, I want to say I like your project. Is it possible with docx-rs to extract the styles from an existing document and use them in a new document?

Test cases failed locally

Describe the bug

I found that my .docx file could not be read, and after I cloned the repo, the test cases did not pass locally either.

Reproduced step

Steps to reproduce the behavior:

git clone https://github.com/bokuweb/docx-rs.git
cd docx-rs
cargo test -- --test-threads=1

Expected behavior

All pass.

Actual behavior

running 38 tests
failures:

---- reader::read_bom stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:148:32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- reader::read_bookmark stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:163:32

---- reader::read_comment stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:238:32

---- reader::read_decoration stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:42:32

---- reader::read_extended_comment stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:253:32

---- reader::read_from_doc stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:208:32

---- reader::read_hello stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:12:32

---- reader::read_highlight_and_underline stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:58:32

---- reader::read_history stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:73:32

---- reader::read_indent_word_online stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:88:32

---- reader::read_insert_table stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:178:32

---- reader::read_line_spacing stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:268:32

---- reader::read_lvl_override stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:223:32

---- reader::read_numbering stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:27:32

---- reader::read_tab_and_break stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:103:32

---- reader::read_table_docx stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:118:32

---- reader::read_table_merged_libre_office stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:133:32

---- reader::read_textbox stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ZipError(FileNotFound)', docx-core\tests\reader.rs:193:32


failures:
    reader::read_bom
    reader::read_bookmark
    reader::read_comment
    reader::read_decoration
    reader::read_extended_comment
    reader::read_from_doc
    reader::read_hello
    reader::read_highlight_and_underline
    reader::read_history
    reader::read_indent_word_online
    reader::read_insert_table
    reader::read_line_spacing
    reader::read_lvl_override
    reader::read_numbering
    reader::read_tab_and_break
    reader::read_table_docx
    reader::read_table_merged_libre_office
    reader::read_textbox

test result: FAILED. 20 passed; 18 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.08s

error: test failed, to rerun pass '-p docx-rs --test lib'

Screenshots

Desktop (please complete the following information)

OS: Windows 10 20H2

How do I detect tables and their contents?

This is about reading .docx files rather than writing them.

I have some lines like this (based on this page):

let data: Value = serde_json::from_str(&read_docx(&read_to_vec(file_name)?)?.json())?;
if let Some(children) = data["document"]["children"].as_array() {
    children.iter().for_each(|node| {
        let n = read_children(node);
        n_words_in_docx += n;
    });
    ...

The idea is that you get an array of "nodes". Nodes can themselves have child nodes, and the function read_children calls itself recursively. Despite that, no text inside tables in the Word document is found. That's my main question, but I'm also unsure how headers, footers, footnotes, comments, text boxes and watermarks are handled. I want to sweep up all the text in the file if possible.

N.B. I'm a Rust uber-newb, but I have now cloned your repo and am looking at fn read_docx in reader/mod.rs and fn read_zip in reader/read_zip.rs. Is it possible that one of these fails to parse something (document.xml?) correctly?

With my document (consisting of just one table), I get just two nodes. Neither produces any text using the method from the page above, so I looked at the JSON produced:

println!("read_children...\n{}", serde_json::to_string_pretty(node).unwrap());

read_children...
{
  "data": {
    "grid": [
      4928,
      4926
    ],
    "hasNumbering": false,
    "property": {
      "borders": {
        "bottom": null,
        "insideH": null,
        "insideV": null,
        "left": null,
        "right": null,
        "top": null
      },
      "justification": "left",
      "style": "TableGrid",
      "width": {
        "width": 0,
        "widthType": "auto"
      }
    },
    "rows": [
      {
        "data": {
          "cells": [
            {
              "data": {
                "children": [
                  {
                    "data": {
                      "children": [
                        {
                          "data": {
                            "children": [
                              {
                                "data": {
                                  "preserveSpace": true,
                                  "text": "CHAPTER I. "
                                },
                                "type": "text"
                              }
                            ],
                            "runProperty": {}
                          },
                          "type": "run"
                        }
                      ],
                      "hasNumbering": false,
                      "id": "00000001",
                      "property": {
                        "indent": {
                          "end": null,
                          "firstLineChars": null,
                          "hangingChars": null,
                          "specialIndent": {
                            "type": "firstLine",
                            "val": 0
                          },
                          "start": 0,
                          "startChars": null
                        },
                        "runProperty": {},
                        "tabs": []
                      }
                    },
                    "type": "paragraph"
                  }
                ],
                "hasNumbering": false,
                "property": {
                  "borders": null,
                  "gridSpan": null,
                  "shading": null,
                  "textDirection": null,
                  "verticalAlign": null,
                  "verticalMerge": null,
                  "width": {
                    "width": 4928,
                    "widthType": "dxa"
                  }
                }
              },
              "type": "tableCell"
            },
            {
              "data": {
                "children": [
                  {
...

"text": "CHAPTER I. " in the above is not identified as text. ... is it possible that the parsing of such a file is not recursively exploring "rows", "cells" and "children" keys? Way out of my depth now.

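The excerpt above suggests the table text is present in the JSON, nested under "rows", then "cells", then "children". A walker that descends into every object value and array element, rather than only into "children", should pick it up. The sketch below is a minimal illustration under assumptions: it uses a small stand-in `Json` enum (hypothetical, mirroring the shape of `serde_json::Value`) so it compiles without dependencies, and it treats any string stored under a "text" key as document text.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for serde_json::Value so the sketch has no
// dependencies. With serde_json, match on Value::Object, Value::Array
// and Value::String in the same way.
#[derive(Debug)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Collects every string stored under a "text" key, descending into all
/// object values and array elements. Apart from "text", no key names are
/// hard-coded, so nested "rows", "cells" and "children" are all visited.
/// Document order is preserved because it lives in the arrays.
pub fn collect_text(node: &Json, out: &mut Vec<String>) {
    match node {
        Json::Obj(map) => {
            for (key, value) in map {
                match (key.as_str(), value) {
                    // A text node: {"preserveSpace": true, "text": "..."}
                    ("text", Json::Str(s)) => out.push(s.clone()),
                    // Anything else may contain text further down.
                    _ => collect_text(value, out),
                }
            }
        }
        Json::Arr(items) => {
            for item in items {
                collect_text(item, out);
            }
        }
        // Scalars that are not under a "text" key carry no document text.
        Json::Null | Json::Bool(_) | Json::Num(_) | Json::Str(_) => {}
    }
}

/// Word count over all collected text nodes.
/// Counted per text node: a word split across two runs counts twice.
pub fn count_words(root: &Json) -> usize {
    let mut texts = Vec::new();
    collect_text(root, &mut texts);
    texts.iter().map(|t| t.split_whitespace().count()).sum()
}
```

Walking generically trades precision for coverage: it will also pick up text from any other part of the JSON that uses a "text" key, which may or may not be wanted for headers, footnotes and comments.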
Disable header if it is not set

Is your feature request related to a problem? Please describe.

The header is always visible. It breaks the table layout and can't be disabled from Rust.

Describe the solution you'd like

@bokuweb's idea is to disable the header if it is not set (#115 (comment)).

Additional context

Not importing (or translating) linked style names

Describe the bug

Style import does not include linked styles.

Reproduced step

Import a document with a linked style

The linked style is not in the imported styles

Expected behavior

The imported styles should include the linked style

Actual behavior

The linked style is not included
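In WordprocessingML a linked pair is expressed with a w:link element inside each w:style, naming the other style's ID (for example a paragraph style Heading1 linked to a character style Heading1Char). One way to pin this bug down in a regression test is to check that every w:link target is itself among the imported styles. The `StyleStub` type and `missing_linked_styles` function below are hypothetical stand-ins, not docx-rs's own `Style` API.

```rust
use std::collections::HashSet;

/// Hypothetical minimal view of an imported style: its w:styleId and the
/// w:val of its w:link element, if it has one.
pub struct StyleStub {
    pub id: String,
    pub link: Option<String>,
}

/// Returns the style IDs named by w:link that are missing from the
/// imported styles, in the order they are first referenced.
pub fn missing_linked_styles(styles: &[StyleStub]) -> Vec<String> {
    let ids: HashSet<&str> = styles.iter().map(|s| s.id.as_str()).collect();
    let mut missing = Vec::new();
    for style in styles {
        if let Some(target) = &style.link {
            // Report each missing target only once.
            if !ids.contains(&target.as_str()) && !missing.contains(target) {
                missing.push(target.clone());
            }
        }
    }
    missing
}
```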

Screenshots


Desktop (please complete the following information)

OS: Windows 11 and macOS
Browser: Chrome
Version: 108.0.5359.94 (Official Build) (64-bit)

Set a table's bgcolor?

Can I set a table's bgcolor attribute? I cannot find a method for it.
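WordprocessingML has no bgcolor attribute as such; a table background is normally applied as shading (w:shd, with the colour in w:fill), typically on each cell's w:tcPr. The "shading": null field on table cells in the JSON dump earlier on this page hints that docx-rs models shading per cell, but its exact builder method is not confirmed here, so the sketch below only shows the target XML. The function name `cell_shading_xml` is hypothetical.

```rust
/// Builds the <w:tcPr> fragment that gives a table cell a solid background.
/// `fill` must be a six-digit hex RGB value such as "FFFF00"; anything else
/// (including "auto", which the spec allows) is rejected by this sketch.
pub fn cell_shading_xml(fill: &str) -> Option<String> {
    let valid = fill.len() == 6 && fill.chars().all(|c| c.is_ascii_hexdigit());
    if !valid {
        return None;
    }
    // w:val="clear" means no pattern, so only the fill colour shows.
    Some(format!(
        r#"<w:tcPr><w:shd w:val="clear" w:color="auto" w:fill="{}"/></w:tcPr>"#,
        fill.to_ascii_uppercase()
    ))
}
```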

Documentation

Is your feature request related to a problem? Please describe.

It is hard to learn how to use this crate.

Describe the solution you'd like

This crate needs more documentation.

Describe alternatives you've considered

Trial and error

Additional context

https://docs.rs/docx-rs/0.4.6/docx_rs/index.html

Please update your documentation on how to save the finalized `Docx` or `Document`.

How to write data into a template .docx

How do I write data into a template .docx file?

error

Hehe haha

Not respecting the headerfooter flags properly

Describe the bug

When importing a document with a header/footer, there are three possible state variables. titlePg in the sectPr indicates that the headerFooter's first-page property is on; if it is absent, the property is off. When we add a first-page header, enter a value and then uncheck it, the property reported by the importer is not reset to false.

For even-page headers and footers, the property should be based on the presence or absence of the evenAndOddHeaders element in settings.xml: presence means on, absence means off. But the importer always gives us false.

Reproduced step

Steps to reproduce the behavior:
See above

Expected behavior

When the titlePg element is present in the sectPr of the Word XML, we expect the value to be true in the JSON; when it is absent, we expect the value to be false.

When the evenAndOddHeaders element is present in the Word settings.xml, we expect the value to be true in the JSON; when it is absent, we expect the value to be false.

Actual behavior

When the titlePg element is present in the sectPr of the Word XML, we get true, but after unchecking the option in Word the value still comes across as true.

When the evenAndOddHeaders element is present in the Word settings.xml, the value is still false; it is always false regardless of whether the element is in the XML.
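Both titlePg (in sectPr) and evenAndOddHeaders (in settings.xml) are on/off properties in ECMA-376: an absent element means off, a present element with no w:val means on, and an explicit w:val of "0", "false" or "off" switches it off again. So presence alone may not be enough; an importer also has to look at w:val. The sketch below encodes that rule; the function name `on_off` and its `Option<Option<&str>>` input are illustrative assumptions, not docx-rs's reader API.

```rust
/// Interprets a WordprocessingML on/off element (ST_OnOff).
/// `element` is None when the element is absent, and Some(val) when it is
/// present, where `val` is its w:val attribute if it has one.
pub fn on_off(element: Option<Option<&str>>) -> bool {
    match element {
        // Element absent: the property is off.
        None => false,
        // e.g. <w:titlePg/> with no w:val: the property is on.
        Some(None) => true,
        // Explicit value: these spellings mean off; anything else is treated as on.
        Some(Some(val)) => !matches!(val, "0" | "false" | "off"),
    }
}
```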

Screenshots

first page unchecked

first page checked

evenAndOddHeaders unchecked

evenAndOddHeaders checked

Desktop (please complete the following information)

N/A

[BUG] `widow_control` XML structure error

Built in Word:

<w:style w:type="paragraph" w:customStyle="1" w:styleId="t">
    <w:name w:val="t"/>
    <w:qFormat/>
</w:style>
<w:style w:type="paragraph" w:customStyle="1" w:styleId="f">
    <w:name w:val="f"/>
    <w:qFormat/>
    <w:rsid w:val="009B7F4F"/>
    <w:pPr>
        <w:widowControl w:val="0"/>
    </w:pPr>
</w:style>

Built with docx-rs:

<w:style w:type="paragraph" w:styleId="t">
    <w:name w:val="t"/>
    <w:pPr>
        <w:widowControl/>
    </w:pPr>
    <w:qFormat/>
</w:style>
<w:style w:type="paragraph" w:styleId="f">
    <w:name w:val="f"/>
    <w:qFormat/>
</w:style>

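Two differences stand out between the outputs above. First, a bare <w:widowControl/> means "on"; to turn widow control off, Word writes w:val="0". Second, Word places w:pPr after w:qFormat (and w:rsid), which matches the element order the w:style schema expects, whereas docx-rs emits w:pPr before w:qFormat. A minimal sketch of serialization that respects both points follows; `StyleSketch` is a hypothetical simplified type, not docx-rs's `Style`.

```rust
/// Hypothetical, simplified paragraph style (not docx-rs's Style type).
pub struct StyleSketch {
    pub id: String,
    pub name: String,
    pub q_format: bool,
    /// None leaves widow control unspecified, so it is inherited.
    pub widow_control: Option<bool>,
}

impl StyleSketch {
    /// Serializes children in schema order: w:name, then w:qFormat, then w:pPr.
    /// IDs and names are assumed to be XML-safe; no escaping is done.
    pub fn to_xml(&self) -> String {
        let mut xml = format!(
            r#"<w:style w:type="paragraph" w:styleId="{}"><w:name w:val="{}"/>"#,
            self.id, self.name
        );
        if self.q_format {
            xml.push_str("<w:qFormat/>");
        }
        if let Some(on) = self.widow_control {
            // A bare <w:widowControl/> already means "on"; "off" needs w:val="0".
            let elem = if on {
                "<w:widowControl/>"
            } else {
                r#"<w:widowControl w:val="0"/>"#
            };
            xml.push_str("<w:pPr>");
            xml.push_str(elem);
            xml.push_str("</w:pPr>");
        }
        xml.push_str("</w:style>");
        xml
    }
}
```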
Is it possible to retrieve formatting for highlighted text?

Hi, I was wondering whether it's possible to retrieve formatting data for highlighted text.

I can't see it anywhere in the parsed document's JSON.

If it's not available but possible, where might be a good place to start looking in the code to investigate implementing it?

I believe the XML structure is along the lines of:

<w:p>
  <w:r>
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t>This text is highlighted.</w:t>
  </w:r>
</w:p>

Thanks a lot!
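The test name reader::read_highlight_and_underline in the failing-test list near the top of this page suggests the reader already handles highlights in some form, so checking the "runProperty" object of each run in the JSON may be a good first step. To confirm where the value sits in the raw XML, here is a rough sketch that scans the text of word/document.xml (extracting that part from the .docx zip is not shown). It is naive string scanning, does not decode XML entities, and is no substitute for a real XML parser; `highlighted_runs` and its helpers are hypothetical names.

```rust
/// Finds the next run start tag ("<w:r>" or "<w:r ...>"), skipping elements
/// that merely share the prefix, such as <w:rPr> or <w:rFonts>.
fn find_run_start(s: &str) -> Option<usize> {
    let mut from = 0;
    while let Some(i) = s[from..].find("<w:r") {
        let pos = from + i;
        match s.as_bytes().get(pos + 4) {
            Some(b'>') | Some(b' ') => return Some(pos),
            _ => from = pos + 4,
        }
    }
    None
}

/// Returns the w:val of the first <w:highlight .../> in a run, if any.
fn highlight_val(run: &str) -> Option<&str> {
    let tag = &run[run.find("<w:highlight")?..];
    let tag = &tag[..tag.find('>')?];
    let val = &tag[tag.find("w:val=\"")? + "w:val=\"".len()..];
    Some(&val[..val.find('"')?])
}

/// Concatenates the contents of a run's <w:t> elements.
fn run_text(run: &str) -> String {
    let mut text = String::new();
    let mut rest = run;
    while let Some(i) = rest.find("<w:t") {
        let after = &rest[i + 4..];
        // Skip elements such as <w:tab/> whose names only start with "w:t".
        if !(after.starts_with('>') || after.starts_with(' ')) {
            rest = after;
            continue;
        }
        let open_end = match after.find('>') { Some(j) => j, None => break };
        let body = &after[open_end + 1..];
        let close = match body.find("</w:t>") { Some(k) => k, None => break };
        text.push_str(&body[..close]);
        rest = &body[close + "</w:t>".len()..];
    }
    text
}

/// Returns (highlight colour, run text) for each run that has a highlight.
pub fn highlighted_runs(xml: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    let mut rest = xml;
    while let Some(start) = find_run_start(rest) {
        let run_and_tail = &rest[start..];
        let end = match run_and_tail.find("</w:r>") { Some(e) => e, None => break };
        let run = &run_and_tail[..end];
        if let Some(colour) = highlight_val(run) {
            out.push((colour.to_string(), run_text(run)));
        }
        rest = &run_and_tail[end + "</w:r>".len()..];
    }
    out
}
```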

Can any struct fields be made pub?

For example, once LineSpacing.line has been set to a value, it cannot be reset to None.
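One common Rust pattern for this, sketched here as an alternative to making fields pub (it is not docx-rs's actual API), is to keep the field private but give the builder a way to clear it, plus a read-only accessor. `LineSpacingSketch` and its methods are hypothetical, and the `u32` field type is an assumption.

```rust
/// Hypothetical stand-in for a line-spacing builder (not docx-rs's type).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LineSpacingSketch {
    line: Option<u32>,
}

impl LineSpacingSketch {
    pub fn new() -> Self {
        Self::default()
    }

    /// Sets the line value (builder style: consumes and returns self).
    pub fn line(mut self, value: u32) -> Self {
        self.line = Some(value);
        self
    }

    /// Removes a previously set line value, restoring the "unset" state.
    pub fn clear_line(mut self) -> Self {
        self.line = None;
        self
    }

    /// Read-only accessor, so callers can inspect the value without a pub field.
    pub fn get_line(&self) -> Option<u32> {
        self.line
    }
}
```

Private fields with accessors leave the crate free to change its internal representation later; pub fields are simpler but make every field part of the public API.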

