Open XML parts missing from dynamically created Word documents

Luis Barajas

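I'm creating WordprocessingDocuments in C# with the Open XML SDK and then converting them to PDF. Originally I used Interop to save the documents as PDF, but that is no longer an option. I found that LibreOffice can convert documents when soffice.exe is called from cmd, and it gives good results with ordinary documents. However, when I tested the LibreOffice converter with my dynamically generated documents, the converter crashed.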
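The soffice call described above can also be made from C# instead of cmd. The following is a minimal sketch that assumes LibreOffice's standard `--headless`, `--convert-to` and `--outdir` command-line options. The class `LibreOfficeConverter`, its methods, the timeout handling, and the soffice path you pass in are illustrative assumptions and are not part of the original question. Checking the exit code and enforcing a timeout makes a failed or hung conversion surface as an exception in the caller, rather than going unnoticed.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: wraps "soffice --headless --convert-to pdf --outdir <dir> <file>".
public static class LibreOfficeConverter
{
    // Builds the start info for the headless DOCX -> PDF conversion.
    public static ProcessStartInfo BuildStartInfo(string sofficePath, string docxPath, string outDir)
    {
        var startInfo = new ProcessStartInfo(sofficePath)
        {
            UseShellExecute = false, // start soffice directly, not through the shell
            CreateNoWindow = true,
        };
        // ArgumentList quotes each argument as needed, so paths with spaces are safe.
        foreach (string arg in new[] { "--headless", "--convert-to", "pdf", "--outdir", outDir, docxPath })
            startInfo.ArgumentList.Add(arg);
        return startInfo;
    }

    // Runs the conversion and returns the PDF path (same base name as the input, inside outDir).
    // Throws if soffice cannot start, times out, exits with an error, or writes no PDF.
    public static string ConvertToPdf(string sofficePath, string docxPath, string outDir, TimeSpan timeout)
    {
        using Process process = Process.Start(BuildStartInfo(sofficePath, docxPath, outDir))
            ?? throw new InvalidOperationException("Could not start soffice.");
        if (!process.WaitForExit((int)timeout.TotalMilliseconds))
        {
            process.Kill();
            throw new TimeoutException("LibreOffice did not finish the conversion in time.");
        }
        if (process.ExitCode != 0)
            throw new InvalidOperationException($"soffice exited with code {process.ExitCode}.");

        string pdfPath = Path.Combine(outDir, Path.GetFileNameWithoutExtension(docxPath) + ".pdf");
        if (!File.Exists(pdfPath))
            throw new FileNotFoundException("soffice exited without writing the expected PDF.", pdfPath);
        return pdfPath;
    }
}
```

For example, `LibreOfficeConverter.ConvertToPdf(sofficeExePath, "Result.docx", "out", TimeSpan.FromMinutes(1))` would return `out/Result.pdf` on success. Here `sofficeExePath` stands for wherever soffice.exe is installed on your machine.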

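I made a copy of one of the files and opened it in LibreOffice Writer, and the structure was broken. I then opened the same file in Microsoft Word, and the structure was fine. Finally, I saved it from Microsoft Word and opened both documents as ZIP files, as shown below:

This is the good one:

Good document structure

This is the bad one:

Bad document structure

I noticed that these Open XML parts (which I called "files" in an earlier version of this question) appear when I save the document in Microsoft Word. When I open the Word-saved copy in LibreOffice, the document displays correctly again.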
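The ZIP comparison described above can also be scripted. The sketch below uses only `System.IO.Compression`. It lists the part names in each package and reports the ones that are present in the Word-saved copy but missing from the generated one. `DocxInspector` and its methods are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper for comparing the contents of two .docx packages.
public static class DocxInspector
{
    // Returns the part names (ZIP entry paths) of a .docx package, sorted ordinally.
    public static List<string> ListParts(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        return zip.Entries
            .Select(entry => entry.FullName)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    // Returns the parts found in `good` (e.g. the copy re-saved by Word) but not in `bad`.
    public static List<string> MissingParts(Stream good, Stream bad)
    {
        var badParts = new HashSet<string>(ListParts(bad), StringComparer.Ordinal);
        return ListParts(good).Where(part => !badParts.Contains(part)).ToList();
    }
}
```

To use it, pass the Word-saved copy as `good` and the generated file as `bad`, for instance as two `File.OpenRead(...)` streams. You can then print the returned list.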

So, is there a way to generate these Open XML parts (in the Word document) without opening Microsoft Word?

I'm using the following code (please check whether it creates all the files):

        using (MemoryStream mem = new MemoryStream())
        {
            // Create Document
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document, true))
            {
                // Add a main document part.
                MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();

                // Create the document structure and add some text.
                mainPart.Document = new Document();
                Body docBody = new Body();

                // Add your docx content here
                CreateParagraph(docBody);
                CreateStyledParagraph(docBody);
                CreateTable(docBody);
                CreateList(docBody);

                Paragraph pImg = new Paragraph();
                // The URL below points to a .png file, so declare the image part as PNG, not JPEG.
                ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png);
                string imgPath = "https://cdn.pixabay.com/photo/2019/11/15/05/23/dog-4627679_960_720.png";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(imgPath);
                req.UseDefaultCredentials = true;
                req.PreAuthenticate = true;
                req.Credentials = CredentialCache.DefaultCredentials;
                // Dispose the response and its stream once the image data has been copied.
                using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
                using (Stream imgStream = resp.GetResponseStream())
                {
                    imagePart.FeedData(imgStream);
                }

                // 1500000 and 1092000 are the image width and height (DrawingML extents are in EMUs)
                Run rImg = new Run(DrawingManager(mainPart.GetIdOfPart(imagePart), "PictureName", 1500000, 1092000, string.Empty));
                pImg.Append(rImg);
                docBody.Append(pImg);

                Paragraph pLink = new Paragraph();
                // mainPart is the MainDocumentPart created above
                pLink.Append(HyperLinkManager("http://YourLink", "My awesome link", mainPart));
                docBody.Append(pLink);

                mainPart.Document.Append(docBody);
                mainPart.Document.Save();
                // No explicit Close() needed: disposing wordDocument at the end of this using block closes the package.
            }

            result = Convert.ToBase64String(mem.ToArray());
        }

The code above creates a Word document called Result.docx with the following structure:

Result.docx structure

But there aren't any other Open XML parts (such as app.xml or styles.xml).

Thomas Barnekow

You need to distinguish between:

  • the Open XML standard and its minimum requirements for a WordprocessingDocument and
  • the "minimum" document created by Microsoft Word or other applications.

According to the standard, a minimal WordprocessingDocument needs only a main document part (MainDocumentPart, document.xml) with the following content:

<w:document xmlns:w="...">
  <w:body>
    <w:p />
  </w:body>
</w:document>
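For illustration only, here is what that minimum looks like at the package level. It is written by hand with `System.IO.Compression` rather than the Open XML SDK. Besides `word/document.xml`, an Open XML package also needs `[Content_Types].xml` and a package relationship in `_rels/.rels` that points to the main document part. The SDK writes both of these for you, so in practice you would not build them manually. `MinimalDocx` is a hypothetical name.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical illustration: the smallest WordprocessingML package, assembled by hand.
public static class MinimalDocx
{
    const string Decl = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    // Writes one XML string as a UTF-8 (no BOM) entry of the ZIP package.
    static void AddEntry(ZipArchive zip, string name, string xml)
    {
        using var writer = new StreamWriter(zip.CreateEntry(name).Open(), new UTF8Encoding(false));
        writer.Write(xml);
    }

    // Returns the package bytes: content types, package relationship, and document.xml.
    public static byte[] Create()
    {
        using var mem = new MemoryStream();
        using (var zip = new ZipArchive(mem, ZipArchiveMode.Create, leaveOpen: true))
        {
            // [Content_Types].xml declares the content type of every part.
            AddEntry(zip, "[Content_Types].xml", Decl +
                "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
                "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
                "<Override PartName=\"/word/document.xml\" " +
                "ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>" +
                "</Types>");

            // _rels/.rels tells consumers where the main document part lives.
            AddEntry(zip, "_rels/.rels", Decl +
                "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
                "<Relationship Id=\"rId1\" " +
                "Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" " +
                "Target=\"word/document.xml\"/>" +
                "</Relationships>");

            // word/document.xml: a body containing one empty paragraph.
            AddEntry(zip, "word/document.xml", Decl +
                "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
                "<w:body><w:p/></w:body></w:document>");
        }
        return mem.ToArray();
    }
}
```

Note that there is no styles.xml, numbering.xml or docProps/app.xml in this package. Those parts are optional extras that Word adds when it saves a file.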

Further parts such as the StyleDefinitionsPart (styles.xml) or the NumberingDefinitionsPart (numbering.xml) are only required if you have styles or numbering, in which case you must explicitly create them in your code.

Next, looking at your sample code, it seems you are creating:

  1. paragraphs that reference styles (see CreateStyledParagraph(docBody)), which would have to be defined in the StyleDefinitionsPart (styles.xml); and
  2. numbered lists (e.g., CreateList(docBody)), which would have to be defined in the NumberingDefinitionsPart (numbering.xml).

However, your code creates neither a StyleDefinitionsPart nor a NumberingDefinitionsPart, which means your document is likely not a valid Open XML document.

Now, Word is very forgiving and fixes various issues silently, ignoring parts of your Open XML markup (e.g., the styles you might have assigned to your paragraphs).

By contrast, depending on how fault-tolerant LibreOffice is, invalid Open XML markup might lead to a crash. For example, if LibreOffice simply assumes that a StyleDefinitionsPart exists when it finds an element like <w:pStyle w:val="MyStyleName" /> in your w:document and then does not check whether it gets a null reference when asking for the StyleDefinitionsPart, it could crash.
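To find out whether a generated document has this kind of dangling reference before handing it to LibreOffice, a small check can compare the style and numbering IDs used in `document.xml` with the ones defined in `styles.xml` and `numbering.xml`. This is a sketch using `System.IO.Compression` and `System.Xml.Linq`. It assumes the conventional part names `word/document.xml`, `word/styles.xml` and `word/numbering.xml` that Word uses, rather than resolving the parts through the relationship files. `ReferenceChecker` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical checker. Assumes the conventional part paths instead of following relationships.
public static class ReferenceChecker
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Loads a part by its ZIP entry name, or returns null if the part is missing.
    static XDocument? LoadPart(ZipArchive zip, string name)
    {
        ZipArchiveEntry? entry = zip.GetEntry(name);
        if (entry == null) return null;
        using Stream stream = entry.Open();
        return XDocument.Load(stream);
    }

    // Collects the w:<attribute> values of all w:<element> elements in a part (empty if the part is missing).
    static HashSet<string> Ids(XDocument? part, string element, string attribute) =>
        new HashSet<string>(part?.Descendants(W + element)
            .Select(e => (string?)e.Attribute(W + attribute) ?? "") ?? Enumerable.Empty<string>());

    // Style IDs used by w:pStyle, w:rStyle or w:tblStyle in document.xml but not defined
    // by any w:style in styles.xml (all of them when styles.xml is missing).
    public static List<string> UndefinedStyles(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        XDocument? document = LoadPart(zip, "word/document.xml");
        HashSet<string> defined = Ids(LoadPart(zip, "word/styles.xml"), "style", "styleId");
        return new[] { "pStyle", "rStyle", "tblStyle" }
            .SelectMany(name => Ids(document, name, "val"))
            .Where(id => !defined.Contains(id))
            .Distinct()
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToList();
    }

    // Numbering IDs used by w:numId in document.xml but not defined by any w:num in
    // numbering.xml. numId 0 is skipped because it means "no numbering".
    public static List<string> UndefinedNumIds(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        HashSet<string> used = Ids(LoadPart(zip, "word/document.xml"), "numId", "val");
        HashSet<string> defined = Ids(LoadPart(zip, "word/numbering.xml"), "num", "numId");
        return used.Where(id => id != "0" && !defined.Contains(id))
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToList();
    }
}
```

A non-empty result from either method points to exactly the kind of markup that Word silently ignores but a less forgiving consumer might not.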

Finally, to add parts to a Word document, you would use the Open XML SDK as follows:

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using Xunit;

// Any xUnit test class works; the class name here is illustrative.
public class OpenXmlPartTests
{
    [Fact]
    public void CanAddParts()
    {
        const string path = "Document.docx";
        const WordprocessingDocumentType type = WordprocessingDocumentType.Document;

        // The using declaration disposes (and thereby saves) the package when the method returns.
        using WordprocessingDocument wordDocument = WordprocessingDocument.Create(path, type);

        // Create minimum main document part.
        MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
        mainDocumentPart.Document = new Document(new Body(new Paragraph()));

        // Create empty style definitions part (styles.xml).
        var styleDefinitionsPart = mainDocumentPart.AddNewPart<StyleDefinitionsPart>();
        styleDefinitionsPart.Styles = new Styles();

        // Create empty numbering definitions part (numbering.xml).
        var numberingDefinitionsPart = mainDocumentPart.AddNewPart<NumberingDefinitionsPart>();
        numberingDefinitionsPart.Numbering = new Numbering();
    }
}
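Empty parts at least exist, but any style or numbering ID that document.xml references still needs a matching definition inside them. The sketch below assumes the `DocumentFormat.OpenXml` NuGet package. It fills styles.xml with one paragraph style and numbering.xml with one bullet definition, and then references both from document.xml. The style ID `MyStyle` and the numbering ID `1` are hypothetical. In the question's code they would have to match whatever `CreateStyledParagraph` and `CreateList` actually emit, which the question does not show.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical example: every style and numbering reference in document.xml has a definition.
public static class DocumentWithDefinitions
{
    public static byte[] Create()
    {
        using var mem = new MemoryStream();
        // AutoSave defaults to true, so disposing wordDocument writes all parts to mem.
        using (var wordDocument = WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();

            // styles.xml: defines the style ID that the first paragraph references via w:pStyle.
            var stylesPart = mainPart.AddNewPart<StyleDefinitionsPart>();
            stylesPart.Styles = new Styles(
                new Style(
                    new StyleName { Val = "My Style" },
                    new StyleRunProperties(new Bold()))
                { Type = StyleValues.Paragraph, StyleId = "MyStyle" });

            // numbering.xml: an abstract bullet definition plus the instance (numId 1) that list items use.
            var numberingPart = mainPart.AddNewPart<NumberingDefinitionsPart>();
            numberingPart.Numbering = new Numbering(
                new AbstractNum(
                    new Level(
                        new NumberingFormat { Val = NumberFormatValues.Bullet },
                        new LevelText { Val = "\u2022" })
                    { LevelIndex = 0 })
                { AbstractNumberId = 1 },
                new NumberingInstance(new AbstractNumId { Val = 1 }) { NumberID = 1 });

            // document.xml: one styled paragraph and one bulleted list item.
            mainPart.Document = new Document(new Body(
                new Paragraph(
                    new ParagraphProperties(new ParagraphStyleId { Val = "MyStyle" }),
                    new Run(new Text("Styled paragraph"))),
                new Paragraph(
                    new ParagraphProperties(
                        new NumberingProperties(
                            new NumberingLevelReference { Val = 0 },
                            new NumberingId { Val = 1 })),
                    new Run(new Text("List item")))));
        }
        return mem.ToArray();
    }
}
```

Keeping the definitions and the references in the same method makes it easy to check that the IDs agree. In a larger generator, you would more likely define the styles and numbering once, up front, and reuse their IDs.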
