Have you ever needed to manipulate HTML data quickly and efficiently? Look no further, because hq is here to save the day! hq is a powerful command-line tool written in Python that lets you query HTML and reshape the results with ease. Whether you need to extract specific information from a web page or transform HTML into other HTML, JSON, or any other text format, hq has got you covered.
Features of hq
XPath 1.0 Compliance with Some Extra Goodies
hq is 99% compliant with the XPath 1.0 standard, making it an excellent choice for querying HTML. However, it doesn’t stop there. hq also includes some additional features that give you more power to control the shape and format of the data you produce.
Nuggets of XQuery
With hq, you get the best parts of XQuery without the overwhelming complexity. Use it for iteration, branching, and other essential tasks.
XPath Expansions for HTML
hq provides expanded functionality for working specifically with HTML. It introduces a class:: axis and a class() function, making it easier to navigate and manipulate HTML elements. Additionally, it includes abbreviated axes to keep your commands concise and readable.
Super-charged String Interpolation
Transforming data on the fly is a breeze with hq’s powerful string interpolation capabilities. Chain together various filters to manipulate and format your data as you extract it.
Computed Constructors for HTML and JSON
Need to programmatically generate HTML or JSON objects and arrays? hq has you covered with computed constructors. Assemble and output new HTML or JSON structures effortlessly.
Out-of-left-field Union Decomposition
hq takes unions in XPath expressions to new levels of terseness and power. Use different expressions for each clause in a union, enabling incredibly concise and effective mappings.
Installing and Running hq
Getting started with hq is easy. Simply run the following command to install hq using pip:
#
pip install hq
Once installed, you can run hq by piping HTML data into it or specifying a file to read from. Here are a couple of examples:
#
cat /path/to/file.html | hq '`Hello, ${/html/head/title}!`'
or
#
hq -f /path/to/file.html '`Hello, ${/html/head/title}!`'
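If you would rather call hq from a Python script than from the shell, the two invocations above can be wrapped with the standard library's subprocess module. This is only a minimal sketch: build_hq_command and run_hq are hypothetical helper names that are not part of hq, the code assumes the hq executable is on your PATH, and it needs Python 3.7 or later for capture_output.

```python
import subprocess
from typing import List, Optional


def build_hq_command(expression: str, html_path: Optional[str] = None) -> List[str]:
    """Return the argument list for an hq call (hypothetical helper, not part of hq)."""
    command = ["hq"]
    if html_path is not None:
        # -f makes hq read the HTML from a file instead of standard input.
        command += ["-f", html_path]
    command.append(expression)
    return command


def run_hq(expression: str, html: Optional[str] = None,
           html_path: Optional[str] = None) -> str:
    """Run hq and return its standard output. Assumes hq is installed and on the PATH."""
    completed = subprocess.run(
        build_hq_command(expression, html_path),
        input=html,           # sent to hq's standard input when no file is given
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # work with str rather than bytes
        check=True,           # raise CalledProcessError if hq exits with an error
    )
    return completed.stdout


# The same interpolated string used in the shell examples above.
GREETING = "`Hello, ${/html/head/title}!`"
```

Passing the expression as a single list element sidesteps shell quoting, so the backticks and ${...} need no escaping. A call such as run_hq(GREETING, html_path="/path/to/file.html") then mirrors the -f example.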
For complete usage information, you can run:
#
hq --help
Running hq in a Container
Not ready to install anything locally? No problem! hq offers a Docker image that makes it incredibly easy to try it out without any software installation (except Docker). Here’s how to run hq in a container:
#
cat /path/to/file.html | docker run -i frew/hq '//some/hquery'
Thank you, Frew, for providing this convenient Docker image!
Learning More about hq
To dive deeper into the hq tool and the language it is built upon, be sure to check out the wiki. The wiki provides valuable resources such as discussions on the motivations behind the HQuery language’s design and a language reference for hq.
Contributing to hq
If you’re interested in contributing to the development of hq, the project welcomes your contributions. To get started, make sure you have Python 3.5 through 3.9 installed. The project’s file structure and setup.py script are based on this helpful blog post.
Dependencies for hq are split into a “base” file for running the application and a “dev” file providing the tools necessary for testing and development. To set up the development environment, run the following command:
#
pip install -r requirements/dev.txt
The parsing logic in hq is based on the top-down operator precedence approach, inspired by Douglas Crockford’s top-down operator precedence parser.
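If top-down operator precedence parsing is new to you, a toy example may help before you open the parser source. The sketch below is not hq's parser and does not handle HQuery. It is a small arithmetic evaluator written only to illustrate the technique, and every name and binding-power value in it is made up for the illustration.

```python
import re

# Binding powers: a higher number binds more tightly (illustrative values).
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
UNARY_POWER = 100  # unary minus binds tighter than every binary operator


def tokenize(text):
    """Split an arithmetic expression into number, operator and parenthesis tokens."""
    return re.findall(r"\d+|[-+*/()]", text)


def apply_operator(operator, left, right):
    """Combine two already-evaluated operands."""
    if operator == "+":
        return left + right
    if operator == "-":
        return left - right
    if operator == "*":
        return left * right
    return left / right


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self, right_binding_power=0):
        # Prefix position ("null denotation"): what a token means at the start.
        token = self.advance()
        if token == "(":
            left = self.expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif token == "-":
            left = -self.expression(UNARY_POWER)
        elif token is not None and token.isdigit():
            left = int(token)
        else:
            raise SyntaxError("unexpected token: %r" % (token,))
        # Infix position ("left denotation"): keep absorbing operators that
        # bind more tightly than the operator that called us.
        while self.peek() in BINDING_POWER and BINDING_POWER[self.peek()] > right_binding_power:
            operator = self.advance()
            right = self.expression(BINDING_POWER[operator])
            left = apply_operator(operator, left, right)
        return left


def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(tokenize(text))
    result = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing token: %r" % (parser.peek(),))
    return result


evaluate("1 + 2 * 3")  # → 7, because * binds tighter than +
```

The whole grammar lives in the binding-power table and the two token roles, which is what makes this approach attractive for an expression-heavy language.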
When making changes, ensure that all tests pass by running:
#
py.test
If you want to generate a coverage report, you can run:
#
py.test --cov=hq --cov-report html
For more detailed and verbose output during tests, use the --gabby flag. Please consider running one test at a time when using the --gabby flag to avoid excessive output:
#
py.test --gabby -vv -k some_particular_test_function
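Because --gabby is a project-specific flag rather than a built-in pytest option, it helps to know how such flags are usually wired up. The sketch below shows the standard pytest hook for registering a custom boolean option in a conftest.py file. It is an assumption about the general mechanism, not a copy of hq's own configuration; the help text and the is_gabby helper are invented for the illustration.

```python
# conftest.py (illustrative sketch; hq's real configuration may differ)


def pytest_addoption(parser):
    """Hook that pytest calls at startup so projects can register extra flags."""
    parser.addoption(
        "--gabby",
        action="store_true",  # the flag takes no value; present means True
        default=False,
        help="print verbose diagnostic output while tests run",
    )


def is_gabby(config):
    """Return True when the test run was started with --gabby (hypothetical helper)."""
    return bool(config.getoption("--gabby"))
```

Test helpers can then check the option and print extra detail only when it is set, which is why pairing the flag with -k to select a single test keeps the output manageable.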
Remember to submit pull requests rather than uploading directly to PyPI. Uploading instructions can be found in the project’s setup guide and blog post.
Conclusion
hq is a versatile and powerful tool for HTML manipulation and data extraction. With its XPath querying capabilities, XQuery nuggets, HTML-specific expansions, string interpolation, computed constructors, and union decomposition, you’ll have all the tools you need to slice and dice HTML data with ease. Install hq today and take control of your HTML data manipulation!
If you have any questions or would like to share your experiences with hq, please leave a comment below. Happy hacking!