ocr
stable
Image preprocessing primitives for optical character recognition pipelines: grayscale conversion, thresholding, morphological operations, and connected-component analysis.
use plugin ocr::{to_grayscale, threshold, invert_bytes, …}
Functions (13)
- to_grayscale Convert RGBA bytes to single-channel grayscale
- threshold Apply a global binary threshold
- invert_bytes Invert all byte values in an image buffer
- count_black Count zero-value pixels in a binary image
- dilate 3×3 morphological dilation
- erode 3×3 morphological erosion
- connected_components Label connected foreground regions
- bounding_boxes Compute bounding boxes of connected regions
- histogram Build a 256-bin intensity histogram
- adaptive_threshold Local mean adaptive thresholding
- crop_region Extract a rectangular region from an image
- resize_nearest Nearest-neighbor resize of a grayscale image
- pixel_density Fraction of foreground pixels in a region
Overview
ocr is a collection of low-level image preprocessing primitives for building optical character recognition pipelines without pulling in a heavyweight vision library. It operates on plain byte buffers: an RGBA buffer (4 bytes per pixel) or a single-channel buffer (1 byte per pixel) where 0 is black and 255 is white. Nothing is stateful or handle-based — every function takes a buffer plus the dimensions it needs and returns a new buffer, an integer, a number, or a table, so you compose them freely by feeding one result into the next.
The typical mental model is a funnel: start from raw RGBA pixels, collapse to grayscale, binarize to pure black-and-white, clean up with morphology, then analyze the connected ink regions to find candidate characters. Reach for this plugin when you need the preprocessing front end of an OCR or document-analysis system and want full control over each stage.
Common patterns
Binarize an RGBA frame and measure how much ink it contains:
use plugin ocr::{to_grayscale, threshold, count_black}
let gray = to_grayscale(rgba_bytes, 640, 480)
let binary = threshold(gray, 128)
print("ink pixels: {count_black(binary)}")
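Clean noise with morphology, then locate candidate character regions. These functions treat 255 as foreground, so the example as written suits light text on a dark background; for dark text on a light background, pass the thresholded image through invert_bytes first: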
use plugin ocr::{to_grayscale, threshold, erode, dilate, bounding_boxes}
let gray = to_grayscale(rgba_bytes, 800, 600)
let binary = threshold(gray, 140)
let cleaned = dilate(erode(binary, 800, 600), 800, 600)
let boxes = bounding_boxes(cleaned, 800, 600)
print("{boxes[1]["width"]}x{boxes[1]["height"]} region")
Isolate a detected box and normalize it to a fixed classifier input size (binary here is an 800×600 thresholded buffer, as produced in the previous example):
use plugin ocr::{bounding_boxes, crop_region, resize_nearest}
let boxes = bounding_boxes(binary, 800, 600)
let b = boxes[1]
let patch = crop_region(binary, 800, 600, b["x"], b["y"], b["width"], b["height"])
let normalized = resize_nearest(patch, b["width"], b["height"], 28, 28)
print("patch bytes: {normalized}")
Convert RGBA bytes to single-channel grayscale
Converts a flat RGBA byte buffer to a single-channel grayscale buffer using the luminance formula 0.299R + 0.587G + 0.114B. The input must be exactly w * h * 4 bytes; the output is w * h bytes.
use plugin ocr::{to_grayscale}
let gray = to_grayscale(rgba_bytes, 640, 480)
print("gray bytes: {gray}")
Grayscale is the first stage of almost every pipeline — feed it straight into a histogram to study the intensity distribution before choosing a threshold:
use plugin ocr::{to_grayscale, histogram}
let gray = to_grayscale(rgba_bytes, 640, 480)
let hist = histogram(gray)
print("very dark pixels: {hist[0]}")
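The luminance formula above can be sketched in Python to show what happens per pixel. This is an illustrative stand-in, not the plugin's implementation: the name to_grayscale_sketch is hypothetical, and both the rounding mode and the decision to ignore the alpha byte are assumptions.

```python
def to_grayscale_sketch(rgba: bytes, w: int, h: int) -> bytes:
    """Collapse a flat RGBA buffer to one luminance byte per pixel."""
    if len(rgba) != w * h * 4:
        raise ValueError("expected exactly w * h * 4 bytes")
    out = bytearray(w * h)
    for i in range(w * h):
        # Alpha (rgba[4 * i + 3]) is ignored here; that is an assumption.
        r, g, b = rgba[4 * i], rgba[4 * i + 1], rgba[4 * i + 2]
        # Round to nearest; the plugin may truncate instead (assumption).
        out[i] = min(255, int(0.299 * r + 0.587 * g + 0.114 * b + 0.5))
    return bytes(out)


# One white pixel followed by one black pixel.
gray = to_grayscale_sketch(bytes([255, 255, 255, 255, 0, 0, 0, 255]), 2, 1)
```

Green dominates the weighting, so a pure green pixel comes out much brighter than a pure blue one.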
Apply a global binary threshold
Applies a global binary threshold: pixels with value >= val become 255 (white), all others become 0 (black). Use after to_grayscale to binarize an image before further processing.
use plugin ocr::{to_grayscale, threshold}
let gray = to_grayscale(rgba_bytes, 320, 240)
let binary = threshold(gray, 128)
For text that is lighter than its background, invert after thresholding so the ink ends up black (0) on a white (255) background:
use plugin ocr::{to_grayscale, threshold, invert_bytes}
let gray = to_grayscale(rgba_bytes, 320, 240)
let binary = invert_bytes(threshold(gray, 200))
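The boundary behavior of the rule described above (values equal to val become white) is easy to check with a small Python sketch. The function name threshold_sketch is hypothetical and stands in for the plugin call.

```python
def threshold_sketch(gray: bytes, val: int) -> bytes:
    """Pixels >= val become 255 (white); everything else becomes 0 (black)."""
    return bytes(255 if p >= val else 0 for p in gray)


row = bytes([0, 127, 128, 200])
binary = threshold_sketch(row, 128)  # bytes([0, 0, 255, 255])
```

Note that 127 falls to black while 128 is promoted to white, so val is the lowest intensity that still counts as white.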
Invert all byte values in an image buffer
Inverts every byte value (255 - b) in the buffer. Useful when text is white-on-black and algorithms expect black-on-white, or vice versa.
use plugin ocr::{invert_bytes}
let inverted = invert_bytes(binary_bytes)
Count zero-value pixels in a binary image
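Counts the pixels with value 0 in a binary image buffer. Use it to measure how much black ink a thresholded dark-on-light image contains. The count covers the whole buffer, and it measures 0-valued pixels, whereas the morphology and component functions treat 255 as foreground.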
use plugin ocr::{threshold, count_black}
let binary = threshold(gray, 128)
let ink = count_black(binary)
print("black pixels: {ink}")
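Because count_black counts zeros while other functions treat 255 as foreground, it can help to see how inversion and counting interact. The following Python sketch uses hypothetical stand-in names (invert_bytes_sketch, count_black_sketch) and mirrors only the behavior described in the text.

```python
def invert_bytes_sketch(buf: bytes) -> bytes:
    """Replace every byte b with 255 - b."""
    return bytes(255 - b for b in buf)


def count_black_sketch(binary: bytes) -> int:
    """Count the bytes equal to 0."""
    return binary.count(0)


binary = bytes([0, 255, 255, 0, 0])
black = count_black_sketch(binary)                       # zeros in the buffer
white = count_black_sketch(invert_bytes_sketch(binary))  # 255s in the buffer
```

Counting black pixels in the inverted buffer is therefore a way to count white (255) pixels in the original one.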
3×3 morphological dilation
Performs 3×3 morphological dilation on a binary image: a pixel becomes 255 if any pixel in its 3×3 neighborhood is 255. Expands foreground regions and closes small gaps between characters.
use plugin ocr::{dilate}
let expanded = dilate(binary_bytes, 320, 240)
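A minimal Python sketch of the 3×3 dilation rule described above. The name dilate_sketch is hypothetical, and treating neighbors outside the image as absent (rather than as background or foreground) is an assumption; the plugin's border handling is not documented here.

```python
def dilate_sketch(binary: bytes, w: int, h: int) -> bytes:
    """A pixel becomes 255 if any in-bounds pixel in its 3x3 neighborhood is 255."""
    out = bytearray(w * h)
    for y in range(h):
        for x in range(w):
            # Clip the 3x3 window to the image (border handling is an assumption).
            ys = range(max(0, y - 1), min(h, y + 2))
            xs = range(max(0, x - 1), min(w, x + 2))
            hit = any(binary[ny * w + nx] == 255 for ny in ys for nx in xs)
            out[y * w + x] = 255 if hit else 0
    return bytes(out)


# A single foreground pixel in the middle of a 5x5 image grows into a 3x3 block.
img = bytearray(25)
img[2 * 5 + 2] = 255
grown = dilate_sketch(bytes(img), 5, 5)
```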
3×3 morphological erosion
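Performs 3×3 morphological erosion on a binary image: a pixel becomes 0 if any pixel in its 3×3 neighborhood is 0. Shrinks foreground regions and removes small noise specks. The example applies erode after dilate, a combination known as morphological closing, which fills small holes and gaps while leaving region sizes roughly unchanged.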
use plugin ocr::{erode, dilate}
let cleaned = erode(dilate(binary_bytes, 320, 240), 320, 240)
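The erosion rule can be sketched the same way in Python. As before, erode_sketch is a hypothetical name and ignoring out-of-bounds neighbors is an assumption about border handling.

```python
def erode_sketch(binary: bytes, w: int, h: int) -> bytes:
    """A pixel becomes 0 if any in-bounds pixel in its 3x3 neighborhood is 0."""
    out = bytearray(w * h)
    for y in range(h):
        for x in range(w):
            # Clip the 3x3 window to the image (border handling is an assumption).
            ys = range(max(0, y - 1), min(h, y + 2))
            xs = range(max(0, x - 1), min(w, x + 2))
            gap = any(binary[ny * w + nx] == 0 for ny in ys for nx in xs)
            out[y * w + x] = 0 if gap else 255
    return bytes(out)


# A 3x3 foreground block inside a 5x5 image shrinks to its center pixel.
img = bytearray(25)
for yy in range(1, 4):
    for xx in range(1, 4):
        img[yy * 5 + xx] = 255
shrunk = erode_sketch(bytes(img), 5, 5)
```

An isolated single-pixel speck disappears entirely under this rule, which is why erosion is the usual first step of noise removal.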
Label connected foreground regions
Labels connected foreground (255) regions using two-pass union-find. Returns {count, labels} where count is the number of distinct components and labels is a flat table mapping each pixel index to its component label (0 = background).
use plugin ocr::{connected_components}
let result = connected_components(binary_bytes, 320, 240)
print("{result["count"]} components found")
The flat labels table lets you inspect which component owns a given pixel — index it by y * w + x:
use plugin ocr::{connected_components}
let result = connected_components(binary_bytes, 320, 240)
let labels = result["labels"]
print("pixel (5,3) belongs to component {labels[3 * 320 + 5]}")
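For readers unfamiliar with two-pass labeling, here is a compact Python sketch of the idea. It assumes 4-connectivity (the text does not say whether diagonal neighbors count), uses the hypothetical name connected_components_sketch, and numbers components in scan order, which may differ from the plugin's numbering.

```python
def connected_components_sketch(binary: bytes, w: int, h: int) -> dict:
    """Two-pass labeling with union-find; 4-connectivity is an assumption."""
    parent = [0]  # parent[0] is reserved for the background label

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    labels = [0] * (w * h)
    # Pass 1: assign provisional labels and record which ones touch.
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if binary[i] != 255:
                continue
            left = labels[i - 1] if x > 0 else 0
            up = labels[i - w] if y > 0 else 0
            if left == 0 and up == 0:
                parent.append(len(parent))  # new provisional label
                labels[i] = len(parent) - 1
            elif left and up:
                ra, rb = find(left), find(up)
                parent[max(ra, rb)] = min(ra, rb)  # merge the two sets
                labels[i] = min(ra, rb)
            else:
                labels[i] = left or up
    # Pass 2: replace provisional labels with compact final labels 1..count.
    final = {}
    for i, lab in enumerate(labels):
        if lab:
            labels[i] = final.setdefault(find(lab), len(final) + 1)
    return {"count": len(final), "labels": labels}


# A U-shaped region: its two arms get different provisional labels, then merge.
u_shape = bytes([255, 0, 255,
                 255, 255, 255])
result = connected_components_sketch(u_shape, 3, 2)
```

The U shape is the classic case that needs the second pass: the two arms are only discovered to be connected when the scan reaches the bottom row.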
Compute bounding boxes of connected regions
Computes an axis-aligned bounding box for each connected component. Returns a table of entries, each with {x, y, width, height, label}. Useful for locating candidate character or word regions before feeding to a classifier.
use plugin ocr::{to_grayscale, threshold, bounding_boxes}
let gray = to_grayscale(rgba_bytes, 800, 600)
let binary = threshold(gray, 140)
let boxes = bounding_boxes(binary, 800, 600)
let box1 = boxes[1]
print("region at ({box1["x"]}, {box1["y"]}) size {box1["width"]}x{box1["height"]}")
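Given a flat label table like the one connected_components returns, the boxes can be derived in one scan. The Python sketch below illustrates that step only; bounding_boxes_from_labels is a hypothetical helper, and zero-based top-left coordinates are an assumption.

```python
def bounding_boxes_from_labels(labels: list, w: int) -> list:
    """Compute one axis-aligned box per non-zero label in a flat label table."""
    extents = {}  # label -> [min_x, min_y, max_x, max_y]
    for i, lab in enumerate(labels):
        if lab == 0:
            continue  # background
        x, y = i % w, i // w
        e = extents.setdefault(lab, [x, y, x, y])
        e[0], e[1] = min(e[0], x), min(e[1], y)
        e[2], e[3] = max(e[2], x), max(e[3], y)
    return [
        {"x": x0, "y": y0, "width": x1 - x0 + 1, "height": y1 - y0 + 1, "label": lab}
        for lab, (x0, y0, x1, y1) in sorted(extents.items())
    ]


# 4x3 label table with an L-shaped component 1 and a single-pixel component 2.
labels = [0, 1, 1, 0,
          0, 1, 0, 0,
          2, 0, 0, 0]
boxes = bounding_boxes_from_labels(labels, 4)
```

Width and height are inclusive extents, so a single-pixel component yields a 1×1 box rather than 0×0.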
Build a 256-bin intensity histogram
Builds a 256-entry intensity histogram from a grayscale buffer. The table is indexed 0–255, and each value is the count of pixels at that intensity. Use it to pick a good threshold value, for example with Otsu's method.
use plugin ocr::{to_grayscale, histogram}
let gray = to_grayscale(rgba_bytes, 640, 480)
let hist = histogram(gray)
print("pixels at 128: {hist[128]}")
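As an illustration of how a histogram feeds threshold selection, the Python sketch below builds the 256-bin table and applies Otsu's method to it. Otsu's method is not part of the plugin as listed; both function names are hypothetical, and the returned cut is chosen so that it can be passed to a ">= val becomes white" threshold.

```python
def histogram_sketch(gray: bytes) -> list:
    """Count how many pixels fall on each of the 256 intensities."""
    hist = [0] * 256
    for p in gray:
        hist[p] += 1
    return hist


def otsu_threshold(hist: list) -> int:
    """Return the cut t that maximizes between-class variance (dark: < t, bright: >= t)."""
    total = sum(hist)
    total_sum = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels below the cut
    sum0 = 0  # sum of intensities below the cut
    for t in range(1, 256):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mean0, mean1 = sum0 / w0, (total_sum - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2  # unnormalized between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


# A clearly bimodal image: half the pixels at 10, half at 200.
gray = bytes([10] * 50 + [200] * 50)
hist = histogram_sketch(gray)
cut = otsu_threshold(hist)
```

Any cut between the two modes separates this image equally well; the sketch returns the first one it finds.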
Local mean adaptive thresholding
Applies local mean adaptive thresholding: for each pixel, the threshold is the mean of the block_size × block_size neighborhood minus c. This works better than a global threshold on images with uneven lighting.
use plugin ocr::{to_grayscale, adaptive_threshold}
let gray = to_grayscale(rgba_bytes, 640, 480)
let binary = adaptive_threshold(gray, 640, 480, 15, 5.0)
A larger block size averages over a wider neighborhood, which is more robust to gradual shadows. Raising c lowers every local threshold (mean minus c), so a pixel must be darker relative to its surroundings before it is classified as dark:
use plugin ocr::{to_grayscale, adaptive_threshold, invert_bytes, count_black}
let gray = to_grayscale(rgba_bytes, 640, 480)
let tight = adaptive_threshold(gray, 640, 480, 31, 10.0)
print("foreground after adaptive: {count_black(invert_bytes(tight))}")
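The local-mean rule can be sketched directly in Python. Several details here are assumptions rather than documented behavior: the window is centered on the pixel and clipped at the image border, the mean is taken over in-bounds pixels only, and a pixel at or above its local threshold becomes 255 (matching the global threshold rule). The name adaptive_threshold_sketch is hypothetical, and the naive loop favors clarity over speed.

```python
def adaptive_threshold_sketch(gray: bytes, w: int, h: int,
                              block_size: int, c: float) -> bytes:
    """Binarize each pixel against the mean of its neighborhood minus c."""
    r = block_size // 2
    out = bytearray(w * h)
    for y in range(h):
        for x in range(w):
            # Window clipped to the image (border handling is an assumption).
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            total = sum(gray[ny * w + nx] for ny in ys for nx in xs)
            local_mean = total / (len(ys) * len(xs))
            # ">=" mirrors the global threshold rule (assumption).
            out[y * w + x] = 255 if gray[y * w + x] >= local_mean - c else 0
    return bytes(out)


# A single dark pixel in an otherwise even row stands out against its neighbors.
row = bytes([100, 100, 40, 100, 100])
binary = adaptive_threshold_sketch(row, 5, 1, 3, 5.0)
```

On a perfectly uniform image every pixel equals its local mean, so with a positive c everything comes out white, which is the desired result for blank paper.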
Extract a rectangular region from an image
Extracts a rectangular region from a single-channel image buffer. Pixels outside the source image bounds are filled with 0. Useful for isolating a detected bounding box for per-character processing.
use plugin ocr::{bounding_boxes, crop_region}
let boxes = bounding_boxes(binary, 800, 600)
let b = boxes[1]
let region = crop_region(binary, 800, 600, b["x"], b["y"], b["width"], b["height"])
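The zero-fill behavior for out-of-bounds pixels is the part of cropping most worth seeing concretely. A Python sketch, using the hypothetical name crop_region_sketch and the same argument order as the call above:

```python
def crop_region_sketch(img: bytes, w: int, h: int,
                       x: int, y: int, cw: int, ch: int) -> bytes:
    """Copy a cw x ch window starting at (x, y); pixels outside the source stay 0."""
    out = bytearray(cw * ch)  # zero-initialized
    for row in range(ch):
        sy = y + row
        if not 0 <= sy < h:
            continue  # whole row lies outside the source
        for col in range(cw):
            sx = x + col
            if 0 <= sx < w:
                out[row * cw + col] = img[sy * w + sx]
    return bytes(out)


# 3x3 source with values 1..9 so each pixel is identifiable.
src = bytes(range(1, 10))
inside = crop_region_sketch(src, 3, 3, 1, 1, 2, 2)    # fully inside the image
overhang = crop_region_sketch(src, 3, 3, 2, 2, 2, 2)  # hangs past the corner
```

Since 0 is black in this plugin's convention, a window that overhangs the image gains a black margin, which matters if a later stage counts black pixels.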
Nearest-neighbor resize of a grayscale image
Resizes a single-channel image buffer to new_w × new_h using nearest-neighbor interpolation. Use to normalize character patches to a fixed input size before classification.
use plugin ocr::{resize_nearest}
let patch = resize_nearest(region_bytes, 24, 32, 28, 28)
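Nearest-neighbor resizing maps each output pixel back to one source pixel. The Python sketch below uses a floor mapping (output x maps to source x * w // new_w), which is a common choice but an assumption about the plugin; resize_nearest_sketch is a hypothetical name.

```python
def resize_nearest_sketch(img: bytes, w: int, h: int,
                          new_w: int, new_h: int) -> bytes:
    """Resize a single-channel image by copying the nearest source pixel."""
    out = bytearray(new_w * new_h)
    for y in range(new_h):
        sy = y * h // new_h  # floor mapping (assumption)
        for x in range(new_w):
            sx = x * w // new_w
            out[y * new_w + x] = img[sy * w + sx]
    return bytes(out)


# Doubling a 2x2 image turns each pixel into a 2x2 block.
small = bytes([1, 2,
               3, 4])
big = resize_nearest_sketch(small, 2, 2, 4, 4)
```

No new intensity values are created, so a binary image stays strictly binary after resizing.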
Fraction of foreground pixels in a region
Returns the fraction of foreground (255) pixels within a rectangular region of a binary image. Values near 1.0 indicate a dense region; values near 0.0 indicate mostly background. Useful as a quick check of whether a region holds enough foreground to be worth passing to a classifier.
use plugin ocr::{pixel_density}
let density = pixel_density(binary, 640, 480, 10, 10, 50, 20)
print("ink density: {density}")
Pair it with bounding_boxes to score each detected region and skip near-empty ones:
use plugin ocr::{bounding_boxes, pixel_density}
let boxes = bounding_boxes(binary, 640, 480)
let b = boxes[1]
let d = pixel_density(binary, 640, 480, b["x"], b["y"], b["width"], b["height"])
if d > 0.1 {
print("region {b["label"]} looks like ink ({d})")
}
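The density computation itself is a bounded count divided by the region area. In the Python sketch below, pixel_density_sketch is a hypothetical name, and dividing by the full requested area (so that any part of the region outside the image counts as background) is an assumption.

```python
def pixel_density_sketch(binary: bytes, w: int, h: int,
                         x: int, y: int, rw: int, rh: int) -> float:
    """Fraction of 255-valued pixels inside the rw x rh region at (x, y)."""
    if rw <= 0 or rh <= 0:
        return 0.0
    fg = 0
    for sy in range(max(0, y), min(h, y + rh)):
        for sx in range(max(0, x), min(w, x + rw)):
            if binary[sy * w + sx] == 255:
                fg += 1
    # Out-of-bounds area counts as background (assumption).
    return fg / (rw * rh)


# 4x2 image with three foreground pixels clustered on the left.
img = bytes([255, 255, 0, 0,
             255, 0,   0, 0])
left = pixel_density_sketch(img, 4, 2, 0, 0, 2, 2)
right = pixel_density_sketch(img, 4, 2, 2, 0, 2, 2)
```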