How to convert HTML to PDF with bookmarks

Kasra

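I am trying to save a custom HTML file as a PDF. Normally I press Ctrl-P in the browser (Chrome) and print to PDF.

[Image: saving HTML as PDF]

However, when I open the PDF file, there is no bookmarks tab on the left side of the PDF reader (Adobe).

[Image: PDF without bookmarks]

I want to save the HTML file as a PDF with the bookmarks shown on the left side of the PDF reader.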


I created the HTML file and used ids and hyperlinks to add links to certain parts of the file:

<a href="#part1">part1</a>
...some codes here...
<div id="part1">

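This works, but I don't know how to create PDF bookmarks from HTML. MS Word and LibreOffice can normally convert their documents to PDF with bookmarks.

So how can I produce a PDF with bookmarks from HTML?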

Kevin

Okay, so I ran into this problem and really wanted there to be a solution here that worked. When there wasn't, I figured I should add what I found so that hopefully the next developer can benefit from it.

First up: HTML conversion to PDF isn't really up to the HTML itself - it's up to whatever the conversion engine decides to do with your HTML. So for instance, if your approach is: Open it in IE/Chrome/Firefox/whatever > File > Print > Microsoft Print to PDF - well, your conversion engine is 'Microsoft Print to PDF'. It doesn't matter what browser you were using at that point - all it's doing is creating a print stream to send to a printer. So if Microsoft Print to PDF isn't going to make bookmarks for you (which it doesn't), then it doesn't matter which web browser you use to open the HTML.

And this is the critical problem with any Ctrl-P / Print avenue. The web browser is ultimately creating a print stream, which the conversion library simply streams into a PDF. None of the web browsers I looked at have native support built in to convert to PDF (why would they? 99% of the use cases are covered by 'Print to PDF' functionality). And the print drivers I tried (Microsoft Print to PDF, Adobe PDF Print) didn't manage to suss out bookmarks from the raw print stream. Which makes sense.

So, at this point, what you're looking for is a standalone PDF Conversion engine - something that can actively open the HTML file and convert from there, instead of going through a web browser. Are there PDF Conversion engines that do this and add Header-Tag based bookmarks? Possibly. The ones we had at our disposal (ABCPdf, Neevia) weren't able to do it, but it's certainly possible there's one out there.

So what now?

There are a few different options I explored.

Option #1: Separate Files, Combined With Adobe

Adobe Acrobat (non-viewer version), when it's the conversion engine, will automatically add bookmarks for each file it converts. So you can submit the HTML contents, not as a single HTML file, but as one HTML file for each section you want a bookmark for.

The good news is that if a section has a hyperlink that points to another document it's merging, it's smart enough to have that hyperlink point to the spot within the internal PDF it's creating (it's not an external hyperlink like I expected it would be). There are two bits of bad news, though:

  • Each section has to be the start of a PDF page. If your section is two inches tall, the rest of the page will be blank, and the next section will start on the following page.
  • The bookmarks aren't clean. When I did it, each file had 3 bookmarks. Which is pretty darned ugly and off-putting.
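
The per-section split described above can be scripted. This is a minimal sketch, assuming each section starts with an `<h1>` tag and the markup is simple enough for a regular expression. The function names and the `sectionNN.html` naming are made up for illustration.

```python
import re
from pathlib import Path

# Zero-width lookahead: split *before* every <h1 ...> tag so that each
# chunk keeps its own heading. Requires Python 3.7+ (split on empty matches).
H1_START = re.compile(r"(?=<h1[\s>])", re.IGNORECASE)


def split_sections(body_html: str) -> list[str]:
    """Split body markup into chunks, one per <h1> section.

    Any non-blank markup before the first <h1> becomes its own chunk.
    """
    return [chunk for chunk in H1_START.split(body_html) if chunk.strip()]


def write_section_files(body_html: str, out_dir: str) -> list[Path]:
    """Write each section as a standalone HTML file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(split_sections(body_html), start=1):
        path = out / f"section{i:02d}.html"  # hypothetical naming scheme
        path.write_text(
            f"<!DOCTYPE html>\n<html><body>\n{chunk}\n</body></html>\n",
            encoding="utf-8",
        )
        paths.append(path)
    return paths
```

In-page links such as `href="#part1"` would then have to point at the file that contains the target (for example `section01.html#part1`) for the cross-file hyperlink handling to come into play.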

Option #2: Separate Files, Combined With Another Library

The first 'downside' of Option #1 might not be a problem. But the second is pretty ugly. And other libraries definitely can create the bookmarks without creating 3-per-file. The main obstacle here is: the library has to be smart enough to resolve those 'external' hyperlinks to within the PDF that's created. One thing that hurts is that those conversion libraries often want to convert each separate file to a PDF internally first and then merge the PDFs together... but that means they won't handle the cross-file hyperlinks correctly. I wasn't able to find a way to make this work with our existing PDF conversion libraries.

Option #3: Different Origination Method

Instead of having a 'Help.html', which is then converted to PDF somehow, start with a format other than HTML. And the easiest source to get into PDF+Bookmarks is MS Word with heading styles. Generally, for each PDF help file you want, you can have a master .DOCX sitting somewhere behind the scenes. We've used this approach before, and while it's not the most elegant, it at least works pretty well.

Option #4: Programmatic with Library

This might not be applicable for the OP's use case... but if you're generating the help, there's nothing to say you can't use the PDF conversion library programmatically to add whatever bookmarks you want. Pretty much every PDF engine I've seen allows API access to bookmarks, so if this avenue is open to you, it's almost certainly the cleanest solution.
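
As a rough illustration of the programmatic route, the bookmark list itself can be derived from the HTML before it is handed to whichever PDF library is in use. This is a minimal sketch using only the Python standard library. It assumes the sections are marked with `h1`-`h6` headings (the OP's `<div id="part1">` would need a heading added), and the names `HeadingCollector` and `collect_bookmarks` are made up for this example.

```python
from html.parser import HTMLParser

HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingCollector(HTMLParser):
    """Collect (level, title, anchor_id) for every h1-h6 element."""

    def __init__(self):
        super().__init__()
        self.entries = []
        # [level, text_parts, anchor_id] while inside a heading, else None
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self._current = [HEADING_LEVELS[tag], [], dict(attrs).get("id")]

    def handle_data(self, data):
        # Text inside nested inline tags (<em>, <code>, ...) is included too.
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS and self._current is not None:
            level, parts, anchor = self._current
            self.entries.append((level, "".join(parts).strip(), anchor))
            self._current = None


def collect_bookmarks(html_text):
    """Return the document's headings in order as bookmark candidates."""
    parser = HeadingCollector()
    parser.feed(html_text)
    parser.close()
    return parser.entries
```

Each tuple can then be passed to the bookmark call your PDF library exposes (pypdf, for example, has `PdfWriter.add_outline_item`). The page number for each heading still has to come from the conversion engine, which this sketch does not attempt.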

Option #5: PDF Conversion Scouring

Like I mentioned, it's possible there's a PDF conversion engine out there that has a good HTML parsing engine and can handle bookmarks from various HTML tags (like H1, H2, etc.) However, it's probably going to take a bit to find it, because it's so much easier for a potential engine-writer to allow the file to be rendered with a native viewer. Think about it. If you were writing a PDF Conversion Service, which would you rather do:

  • Develop routines that can accurately render an HTML document fed into it - aka, basically write your own web browser from scratch.
  • Have IE/Chrome/Whatever render it and simply take their print output to convert to PDF.

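...the second option is so ridiculously easier than the first that it's not surprising that most PDF conversion engines don't have their own internal HTML parsers (or, for that matter, Word parsers, Excel parsers, etc.).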
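
One candidate worth trying for this kind of scouring is wkhtmltopdf, a command-line converter built on a WebKit renderer that builds a PDF outline from heading tags. This is a minimal sketch, assuming wkhtmltopdf is installed and on the PATH. The file names and the helper names `build_command` and `convert` are placeholders, and the result on any particular HTML file is not guaranteed.

```python
import subprocess


def build_command(src_html, dst_pdf, outline_depth=3):
    """Build the wkhtmltopdf argument list (no shell involved)."""
    return [
        "wkhtmltopdf",
        "--outline",                   # put an outline (bookmarks) into the PDF
        "--outline-depth", str(outline_depth),  # deepest heading level to include
        src_html,
        dst_pdf,
    ]


def convert(src_html, dst_pdf, outline_depth=3):
    """Run the conversion; raises CalledProcessError if wkhtmltopdf fails."""
    subprocess.run(build_command(src_html, dst_pdf, outline_depth), check=True)
```

Calling `convert("Help.html", "Help.pdf")` would run the tool once. Only headings (`h1`, `h2`, ...) feed the outline, so sections marked only with `<div id="...">` would need a heading added.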
