Simplifying HTML to PDF Conversion in Python

January 7, 2024

Python-PDFKit: Simplifying HTML to PDF Conversion in Python

Do you need a simple and efficient way to convert HTML files to PDF format using Python? Look no further than the Python-PDFKit library. Whether you’re a software engineer, solution architect, or anyone in need of HTML to PDF conversion, this article will guide you through the installation process, provide usage examples, and offer troubleshooting tips.

Why Python-PDFKit?

Python-PDFKit is a Python 3 wrapper for the wkhtmltopdf utility, which uses the Webkit rendering engine to convert HTML to PDF. Drawing inspiration from the Ruby PDFKit library, Python-PDFKit offers a convenient and straightforward interface for generating PDF files from HTML content.

Installation

To get started, you’ll need to install the Python-PDFKit library. Simply open your terminal and run the following command:

#
$ pip install pdfkit

In addition to installing the library, you’ll also need to install the wkhtmltopdf utility. The installation process varies depending on your operating system:

Debian/Ubuntu:

#
$ sudo apt-get install wkhtmltopdf

macOS:

#
$ brew install homebrew/cask/wkhtmltopdf

Please note that the version of wkhtmltopdf available in the Debian/Ubuntu repositories may have reduced functionality. To access all available options, consider installing the static binary from the official wkhtmltopdf website or using the provided script for Ubuntu/Debian versions.

For Windows and other options, refer to the wkhtmltopdf homepage for binary installers.

Usage

Using Python-PDFKit to convert HTML to PDF is straightforward. Here are a few examples to get you started:

#python
import pdfkit

pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')

You can pass either a single URL or file path, or a list of multiple URLs or file paths to convert multiple HTML files into PDF. Additionally, you can easily handle an opened file object:

#python
with open('file.html') as f:
    pdfkit.from_file(f, 'out.pdf')

If you prefer to store the generated PDF in a variable for further processing, omit the output path parameter:

#python
pdf = pdfkit.from_url('http://google.com')

To customize the PDF generation process, you can specify various options available in wkhtmltopdf. These options can be passed as a dictionary to the options parameter. Here’s an example:

#python
options = {
    'page-size': 'Letter',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'custom-header': [
        ('Accept-Encoding', 'gzip')
    ],
    'cookie': [
        ('cookie-empty-value', '""'),
        ('cookie-name1', 'cookie-value1'),
        ('cookie-name2', 'cookie-value2'),
    ],
    'no-outline': None
}

pdfkit.from_url('http://google.com', 'out.pdf', options=options)

Remember, you can find a comprehensive list of options in the wkhtmltopdf documentation. To access these options, drop the leading -- in the option name. For options without a value, use None, False, or '' as the dictionary value. For repeatable options and those requiring multiple values, use a list or tuple.

Configuration

Python-PDFKit provides a configuration option to customize the behavior of the wkhtmltopdf binary. To specify the location of the wkhtmltopdf binary or check if it is present in your $PATH, use the pdfkit.configuration() API call.

Here’s an example of using a custom wkhtmltopdf location:

#python
config = pdfkit.configuration(wkhtmltopdf='/opt/bin/wkhtmltopdf')
pdfkit.from_string(html_string, output_file, configuration=config)

You can also use the configuration() call to check if wkhtmltopdf is present in your $PATH:

#python
try:
    config = pdfkit.configuration()
    pdfkit.from_string(html_string, output_file)
except OSError:
    # wkhtmltopdf not present in PATH

Troubleshooting

If you encounter any issues with PDF generation, the following tips may help you troubleshoot the problem:

Debugging issues with PDF generation

If you’re struggling to generate the correct PDF, pass the verbose=True parameter to the API calls to view the wkhtmltopdf output:

#python
pdfkit.from_url('http://google.com', 'out.pdf', verbose=True)

If you’re getting unexpected results or certain options appear to be ignored, try running wkhtmltopdf directly with the CLI command produced by the pdfkit.PDFKit class:

#python
import pdfkit

r = pdfkit.PDFKit('html', 'string', verbose=True)
print(' '.join(r.command()))
# try running wkhtmltopdf to create PDF
output = r.to_pdf()

Common Errors

'No wkhtmltopdf executable found' IOError:

Make sure you have wkhtmltopdf in your $PATH or use the custom configuration option. On Windows, the command where wkhtmltopdf should return the actual path to the binary. On Linux, use which wkhtmltopdf.
'Command Failed' IOError:

This error indicates that PDFKit was unable to process the input. Try running the command directly from the error message to see what caused the failure. Note that some versions of wkhtmltopdf may produce segmentation faults.

Wrap Up

Python-PDFKit is a powerful tool for converting HTML to PDF effortlessly using Python. Its intuitive interface, extensive options, and compatibility with the wkhtmltopdf utility make it a popular choice for developers and software architects alike.

In this article, we covered the installation process, basic usage examples, and troubleshooting tips for Python-PDFKit. Now it’s time to unleash your creativity and explore the possibilities of HTML to PDF conversion in your Python projects!

If you have any questions or want to share your experiences with Python-PDFKit, please feel free to leave a comment below.

References

Python-PDFKit Repository: github.com/JazzCore/python-pdfkit
wkhtmltopdf Project: wkhtmltopdf.org

Acknowledgements

Special thanks to the JazzCore team for creating and maintaining the Python-PDFKit library.

Group Sum