html
stable
Parse and manipulate HTML documents using CSS selectors, and escape or create HTML elements.
use plugin html::{parse_select, parse_select_text, parse_select_attr, …}
Functions (18)
- parse_select Select elements by CSS selector, return HTML
- parse_select_text Select elements, return text content
- parse_select_attr Select elements, return attribute values
- extract_links Extract all anchor links with text and href
- extract_images Extract all images with src and alt
- text_content Get all text from entire document
- select_nth Get text of nth CSS selector match
- select_count Count elements matching a selector
- outer_html Get full outer HTML of matching elements
- extract_meta Extract all meta tag name/content pairs
- extract_title Extract the page title string
- extract_scripts Extract script tags with src and inline code
- extract_styles Extract stylesheet links and inline CSS
- strip_tags Strip all HTML tags, return plain text
- extract_tables Extract HTML tables as nested arrays
- escape Escape special characters as HTML entities
- unescape Decode HTML entities back to characters
- create_element Build an HTML element string
Overview
html is a stateless HTML toolkit built on a real CSS-selector engine, so you
work with documents the same way a browser or scraper would: feed in an HTML
string, query it with familiar selectors like "a", ".card", or
"link[rel='stylesheet']", and get back plain Zolo strings and tables. There
are no handles or objects to manage — every function takes the document as a
string argument and returns ordinary values, so each call is independent and
re-parses what it needs.
The functions fall into three groups: selector queries (parse_select,
parse_select_text, parse_select_attr, select_nth, select_count,
outer_html) that pull elements out of a document; high-level extractors
(extract_links, extract_images, extract_meta, extract_title,
extract_scripts, extract_styles, extract_tables, text_content,
strip_tags) that return structured data for common page parts; and string
builders (escape, unescape, create_element) for safely producing HTML.
Reach for it whenever you need to scrape, inspect, or assemble HTML without an
external parser.
Common patterns
Scrape a navigation menu by pulling every link with its text and target:
use plugin html::{extract_links, select_count}
let page = "<nav><a href='/home'>Home</a><a href='/blog'>Blog</a></nav>"
let links = extract_links(page)
print("found {select_count(page, "a")} links")
for link in links {
    print("{link["text"]} -> {link["href"]}")
}
Read a page's metadata in one pass — title plus every <meta> tag:
use plugin html::{extract_title, extract_meta}
let page = "<head><title>Cats</title><meta name='description' content='All about cats'></head>"
print("title: {extract_title(page)}")
for m in extract_meta(page) {
    print("{m["name"]} = {m["content"]}")
}
Build an element safely by escaping untrusted text before nesting it:
use plugin html::{escape, create_element}
let comment = escape("<b>hi</b> & bye")
let safe = create_element("p", #{"class": "comment"}, comment)
print(safe)
Select elements by CSS selector, return HTML
Parses the HTML document and returns a table of outer HTML strings for each element matching the CSS selector. Keys are 1-indexed integers.
use plugin html::{parse_select}
let doc = "<ul><li>Alice</li><li>Bob</li></ul>"
let items = parse_select(doc, "li")
print(items[1])
print(items[2])
Select elements, return text content
Like parse_select but returns the text content of each matched element instead of its HTML.
use plugin html::{parse_select_text}
let doc = "<div><p>Hello <b>World</b></p><p>Goodbye</p></div>"
let texts = parse_select_text(doc, "p")
print(texts[1])
Select elements, return attribute values
Returns a table of attribute values for a given attribute name across all elements matching the selector. Elements without the attribute are skipped.
use plugin html::{parse_select_attr}
let doc = "<a href='/home'>Home</a><a href='/about'>About</a>"
let hrefs = parse_select_attr(doc, "a", "href")
print(hrefs[1])
print(hrefs[2])
Use any attribute name and a more specific selector to read, say, image sources inside a gallery:
use plugin html::{parse_select_attr}
let doc = "<div class='gallery'><img src='1.png'><img src='2.png'></div>"
let srcs = parse_select_attr(doc, ".gallery img", "src")
print(srcs[1])
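Because elements without the attribute are skipped, the result can hold fewer entries than the selector matches. A minimal sketch of that difference, using only functions documented on this page; the sample markup is made up for illustration and the `//` comment syntax is an assumption:

```zolo
use plugin html::{parse_select_attr, select_count}
// Hypothetical markup: the second anchor has no href attribute.
let doc = "<a href='/home'>Home</a><a>Placeholder</a><a href='/about'>About</a>"
// select_count counts every <a>, with or without href.
let total = select_count(doc, "a")
print("anchors matched: {total}")
// Per the description above, only anchors that carry href contribute a value.
let hrefs = parse_select_attr(doc, "a", "href")
for h in hrefs {
    print(h)
}
```

Comparing the count with the values printed is one way to notice matched elements that lack the attribute.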
Extract all anchor links with text and href
Returns a table of {text, href} tables for every <a href="..."> element in the document.
use plugin html::{extract_links}
let doc = "<a href='https://example.com'>Example</a>"
let links = extract_links(doc)
print(links[1]["text"])
print(links[1]["href"])
Extract all images with src and alt
Returns a table of {src, alt} tables for every <img> element in the document.
use plugin html::{extract_images}
let doc = "<img src='logo.png' alt='Logo'><img src='banner.jpg' alt=''>"
let imgs = extract_images(doc)
print(imgs[1]["src"])
Get all text from entire document
Concatenates all text nodes in the document and returns a single string.
use plugin html::{text_content}
let doc = "<h1>Title</h1><p>Body text here.</p>"
let text = text_content(doc)
print(text)
Get text of nth CSS selector match
Returns the text content of the nth match (1-indexed) of the CSS selector. Returns nil if there is no nth match.
use plugin html::{select_nth}
let doc = "<ul><li>First</li><li>Second</li><li>Third</li></ul>"
let second = select_nth(doc, "li", 2)
print(second)
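Since select_nth returns nil when the index is out of range, one way to avoid handling nil is to ask select_count for the number of matches first. A minimal sketch that reads the last match this way; the markup is made up and the `//` comment syntax is an assumption:

```zolo
use plugin html::{select_nth, select_count}
let doc = "<ul><li>First</li><li>Second</li><li>Third</li></ul>"
// n is the number of <li> matches, so index n is the last one when n > 0.
let n = select_count(doc, "li")
if n > 0 {
    let last = select_nth(doc, "li", n)
    print(last)
}
```

Each call re-parses the document, so this costs two parses; for small pages that is usually an acceptable trade for the simpler control flow.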
Count elements matching a selector
Counts how many elements in the document match the given CSS selector.
use plugin html::{select_count}
let doc = "<p>One</p><p>Two</p><p>Three</p>"
let n = select_count(doc, "p")
print("Paragraph count: {n}")
Pair it with a class selector to check whether a page contains a given widget before doing more work:
use plugin html::{select_count}
let doc = "<div class='alert'>Warning</div><div class='alert'>Error</div>"
if select_count(doc, ".alert") > 0 {
    print("page has alerts")
}
Get full outer HTML of matching elements
Returns a table of full outer HTML strings (including the element tag itself) for each element matching the selector.
use plugin html::{outer_html}
let doc = "<div class='card'><span>Hi</span></div>"
let results = outer_html(doc, ".card")
print(results[1])
Extract all meta tag name/content pairs
Returns a table of {name, content} tables for every <meta> tag. The name field is taken from the tag's name or property attribute, so Open Graph tags such as og:title are included.
use plugin html::{extract_meta}
let doc = "<meta name='description' content='Page about cats'><meta property='og:title' content='Cats'>"
let metas = extract_meta(doc)
print(metas[1]["name"])
print(metas[1]["content"])
Extract the page title string
Returns the text content of the first <title> element, or an empty string if none exists.
use plugin html::{extract_title}
let doc = "<html><head><title>My Page</title></head><body></body></html>"
let title = extract_title(doc)
print(title)
Extract script tags with src and inline code
Returns a table of {src, inline_code} tables for every <script> tag. External scripts have src set; inline scripts have inline_code set.
use plugin html::{extract_scripts}
let doc = "<script src='/app.js'></script><script>console.log('hi')</script>"
let scripts = extract_scripts(doc)
print(scripts[1]["src"])
print(scripts[2]["inline_code"])
Extract stylesheet links and inline CSS
Returns a table of {href, inline_css} tables for every <link rel="stylesheet"> and <style> element.
use plugin html::{extract_styles}
let doc = "<link rel='stylesheet' href='/style.css'><style>body { margin: 0 }</style>"
let styles = extract_styles(doc)
print(styles[1]["href"])
print(styles[2]["inline_css"])
Extract HTML tables as nested arrays
Extracts every <table> element as nested arrays. The result holds one entry per <table>; each entry is a table of rows, and each row is a table of cell text strings.
use plugin html::{extract_tables}
let doc = "<table><tr><th>Name</th><th>Age</th></tr><tr><td>Alice</td><td>30</td></tr></table>"
let tables = extract_tables(doc)
let row1 = tables[1][1]
print(row1[1])
print(row1[2])
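The nesting described above lends itself to a loop over rows. A minimal sketch that walks every row of the first table, assuming header and data rows are returned alike (as the example above suggests); the extra Bob row is made up and the `//` comment syntax is an assumption:

```zolo
use plugin html::{extract_tables}
let doc = "<table><tr><th>Name</th><th>Age</th></tr><tr><td>Alice</td><td>30</td></tr><tr><td>Bob</td><td>25</td></tr></table>"
let tables = extract_tables(doc)
// tables[1] is the first <table>; each row is a table of cell strings.
for row in tables[1] {
    print("{row[1]}: {row[2]}")
}
```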
Escape special characters as HTML entities
Escapes &, <, >, ", and ' into their HTML entity equivalents, making the text safe to insert into HTML.
use plugin html::{escape}
let safe = escape("<script>alert('xss')</script>")
print(safe)
Escape user input before interpolating it into markup you build by hand:
use plugin html::{escape}
let name = "Tom & \"Jerry\""
print("<span>{escape(name)}</span>")
Decode HTML entities back to characters
Decodes common HTML entities (&amp;, &lt;, &gt;, &quot;, &#39;, &nbsp;, etc.) back to their original characters.
use plugin html::{unescape}
let raw = unescape("Tom &amp; Jerry &lt;3")
print(raw)
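Because unescape is described as the inverse of escape, a round trip is a quick check that text survives encoding. A minimal sketch using only the two documented functions; the sample string is made up, the `//` comment syntax is an assumption, and the exact entity spellings printed depend on the plugin:

```zolo
use plugin html::{escape, unescape}
let original = "a < b & c"
// Encode the special characters as entities.
let encoded = escape(original)
// Decode them again; this should give back the original text.
let decoded = unescape(encoded)
print(encoded)
print(decoded)
```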
Build an HTML element string
Builds an HTML element string. Pass a table for attrs (string key/value pairs) and a string for inner_html. Void tags (br, img, input, etc.) are self-closed.
use plugin html::{create_element}
let link = create_element("a", #{"href": "/home", "class": "nav"}, "Home")
print(link)
let br = create_element("br", nil, nil)
print(br)
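Since create_element returns a plain string, elements can be nested by passing one result as the inner_html of another. A minimal sketch, assuming a void tag accepts an attrs table together with a nil inner_html (the examples above show each only separately); the class names are made up and the `//` comment syntax is an assumption:

```zolo
use plugin html::{create_element, escape}
// A void tag with attributes and no inner HTML.
let img = create_element("img", #{"src": "logo.png", "alt": "Logo"}, nil)
// Escape untrusted text before it becomes inner HTML.
let label = create_element("span", #{"class": "name"}, escape("Tom & Jerry"))
// Nest the span inside a list item by passing it as inner_html.
let item = create_element("li", #{"class": "entry"}, label)
print(img)
print(item)
```

Only the text is escaped, never the output of create_element itself; escaping the span string would turn its tags into literal text.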