ocr

stable

Image preprocessing primitives for optical character recognition pipelines: grayscale conversion, thresholding, morphological operations, and connected-component analysis.

use plugin ocr::{to_grayscale, threshold, invert_bytes, …}

13 functions AI & ML


Functions (13)

to_grayscale          Convert RGBA bytes to single-channel grayscale
threshold             Apply a global binary threshold
invert_bytes          Invert all byte values in an image buffer
count_black           Count zero-value pixels in a binary image
dilate                3×3 morphological dilation
erode                 3×3 morphological erosion
connected_components  Label connected foreground regions
bounding_boxes        Compute bounding boxes of connected regions
histogram             Build a 256-bin intensity histogram
adaptive_threshold    Local mean adaptive thresholding
crop_region           Extract a rectangular region from an image
resize_nearest        Nearest-neighbor resize of a grayscale image
pixel_density         Fraction of foreground pixels in a region

Overview

ocr is a collection of low-level image preprocessing primitives for building optical character recognition pipelines without pulling in a heavyweight vision library. It operates on plain byte buffers: an RGBA buffer (4 bytes per pixel) or a single-channel buffer (1 byte per pixel) where 0 is black and 255 is white. Nothing is stateful or handle-based — every function takes a buffer plus the dimensions it needs and returns a new buffer, an integer, a number, or a table, so you compose them freely by feeding one result into the next.

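The typical mental model is a funnel: start from raw RGBA pixels, collapse to grayscale, binarize to pure black-and-white, clean up with morphology, then analyze the connected ink regions to find candidate characters. Keep polarity in mind along the way: count_black counts 0-valued pixels, while dilate, erode, connected_components, bounding_boxes, and pixel_density treat 255 as foreground, so a dark-on-light page needs invert_bytes after thresholding before those stages see ink as foreground. Reach for this plugin when you need the preprocessing front end of an OCR or document-analysis system and want full control over each stage.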

Common patterns

Binarize an RGBA frame of dark text on a light background and measure how much ink it contains:

use plugin ocr::{to_grayscale, threshold, count_black}

let gray = to_grayscale(rgba_bytes, 640, 480)
let binary = threshold(gray, 128)
print("ink pixels: {count_black(binary)}")

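Clean noise with morphology, then locate candidate character regions. This assumes light text on a dark background, so ink is already 255 (foreground) after thresholding; for dark text, pass the thresholded buffer through invert_bytes first: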

use plugin ocr::{to_grayscale, threshold, erode, dilate, bounding_boxes}

let gray = to_grayscale(rgba_bytes, 800, 600)
let binary = threshold(gray, 140)
let cleaned = dilate(erode(binary, 800, 600), 800, 600)
let boxes = bounding_boxes(cleaned, 800, 600)
print("{boxes[1]["width"]}x{boxes[1]["height"]} region")

Isolate a detected box and normalize it to a fixed classifier input size:

use plugin ocr::{bounding_boxes, crop_region, resize_nearest}

let boxes = bounding_boxes(binary, 800, 600)
let b = boxes[1]
let patch = crop_region(binary, 800, 600, b["x"], b["y"], b["width"], b["height"])
let normalized = resize_nearest(patch, b["width"], b["height"], 28, 28)
print("patch bytes: {normalized}")

to_grayscale(rgba_bytes, w, h) → any

Convert RGBA bytes to single-channel grayscale

Converts a flat RGBA byte buffer to a single-channel grayscale buffer using the luminance formula 0.299R + 0.587G + 0.114B. The input must be exactly w * h * 4 bytes; the output is w * h bytes.

use plugin ocr::{to_grayscale}

let gray = to_grayscale(rgba_bytes, 640, 480)
print("gray bytes: {gray}")

Grayscale is the first stage of almost every pipeline — feed it straight into a histogram to study the intensity distribution before choosing a threshold:

use plugin ocr::{to_grayscale, histogram}

let gray = to_grayscale(rgba_bytes, 640, 480)
let hist = histogram(gray)
print("pixels at intensity 0: {hist[0]}")
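
To make the luminance formula concrete, here is a minimal Python sketch of the conversion described above. It is an illustration rather than the plugin's implementation: the name to_grayscale_sketch is hypothetical, and both the rounding mode and the decision to ignore the alpha channel are assumptions.

```python
def to_grayscale_sketch(rgba: bytes, w: int, h: int) -> bytes:
    """Collapse an RGBA buffer to one luminance byte per pixel."""
    if len(rgba) != w * h * 4:
        raise ValueError("expected exactly w * h * 4 bytes")
    out = bytearray(w * h)
    for i in range(w * h):
        r, g, b = rgba[4 * i], rgba[4 * i + 1], rgba[4 * i + 2]
        # Assumption: alpha (byte 4 * i + 3) is ignored and the weighted
        # sum is rounded to nearest; the plugin may truncate instead.
        out[i] = min(255, int(0.299 * r + 0.587 * g + 0.114 * b + 0.5))
    return bytes(out)
```

Because green carries the largest weight, a pure green pixel maps to a much lighter gray than a pure blue one, which matters when choosing a threshold for colored ink.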

threshold(gray_bytes, val) → any

Apply a global binary threshold

Applies a global binary threshold: pixels with value >= val become 255 (white), all others become 0 (black). Use after to_grayscale to binarize an image before further processing.

use plugin ocr::{to_grayscale, threshold}

let gray = to_grayscale(rgba_bytes, 320, 240)
let binary = threshold(gray, 128)

For text that is lighter than its background, thresholding leaves the ink white (255); invert afterwards if a later step, such as count_black, expects ink to be black (0):

use plugin ocr::{to_grayscale, threshold, invert_bytes}

let gray = to_grayscale(rgba_bytes, 320, 240)
let binary = invert_bytes(threshold(gray, 200))
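
The comparison rule and its interaction with inversion can be sketched in a few lines of Python. The function names below are hypothetical stand-ins that follow the rules stated in this reference (>= val becomes 255, inversion is 255 - b); they are not the plugin's code.

```python
def threshold_sketch(gray: bytes, val: int) -> bytes:
    """Pixels with value >= val become 255, all others become 0."""
    return bytes(255 if p >= val else 0 for p in gray)


def invert_sketch(buf: bytes) -> bytes:
    """Replace every byte b with 255 - b."""
    return bytes(255 - b for b in buf)
```

Note that a pixel exactly equal to val lands on the white side, so threshold_sketch(bytes([127, 128]), 128) yields one black and one white pixel.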

invert_bytes(bytes) → any

Invert all byte values in an image buffer

Inverts every byte value (255 - b) in the buffer. Useful when text is white-on-black and algorithms expect black-on-white, or vice versa.

use plugin ocr::{invert_bytes}

let inverted = invert_bytes(binary_bytes)

count_black(bytes) → int

Count zero-value pixels in a binary image

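Counts the number of pixels with value 0 in a binary image buffer. Use it to measure how much dark ink is present in an image or cropped region. The morphology, component, and density functions treat 255 as foreground instead, so apply invert_bytes first when you want to count their foreground pixels.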

use plugin ocr::{threshold, count_black}

let binary = threshold(gray, 128)
let ink = count_black(binary)
print("black pixels: {ink}")

dilate(bytes, w, h) → any

3×3 morphological dilation

Performs 3×3 morphological dilation on a binary image: a pixel becomes 255 if any pixel in its 3×3 neighborhood is 255. Expands foreground regions and closes small gaps between characters.

use plugin ocr::{dilate}

let expanded = dilate(binary_bytes, 320, 240)
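
The neighborhood rule above can be written out directly. The following Python sketch is a hedged illustration, not the plugin's implementation: dilate_sketch is a hypothetical name, and skipping neighbors that fall outside the image is an assumption about border handling.

```python
def dilate_sketch(buf: bytes, w: int, h: int) -> bytes:
    """3x3 dilation: a pixel becomes 255 if any in-bounds neighbor is 255."""
    out = bytearray(w * h)  # starts as all 0 (background)
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    # Assumption: neighbors outside the image are skipped.
                    if 0 <= ny < h and 0 <= nx < w and buf[ny * w + nx] == 255:
                        out[y * w + x] = 255
    return bytes(out)
```

A single foreground pixel therefore grows into a 3×3 block, and two strokes separated by a one-pixel gap merge into one region.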

erode(bytes, w, h) → any

3×3 morphological erosion

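Performs 3×3 morphological erosion on a binary image: a pixel becomes 0 if any pixel in its 3×3 neighborhood is 0. Shrinks foreground regions and removes small noise specks. Applied after dilate, as in the example below, it forms a morphological closing: small gaps are filled and shapes shrink back to roughly their original size. Apply erode first and dilate second (an opening) to remove specks instead.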

use plugin ocr::{erode, dilate}

let cleaned = erode(dilate(binary_bytes, 320, 240), 320, 240)
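
The order in which erosion and dilation are composed changes the result. The Python sketch below illustrates this under stated assumptions: morph3x3, opening, and closing are hypothetical helper names, and neighbors outside the image are ignored, which may differ from the plugin's border handling.

```python
def morph3x3(buf: bytes, w: int, h: int, erode: bool) -> bytes:
    """Erode (any 0 neighbor -> 0) or dilate (any 255 neighbor -> 255)."""
    trigger = 0 if erode else 255
    out = bytearray(w * h)
    for y in range(h):
        for x in range(w):
            # Assumption: the window is clipped to the image bounds.
            found = any(
                buf[ny * w + nx] == trigger
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            out[y * w + x] = trigger if found else 255 - trigger
    return bytes(out)


def opening(buf: bytes, w: int, h: int) -> bytes:
    """Erode then dilate: removes specks smaller than the 3x3 window."""
    return morph3x3(morph3x3(buf, w, h, True), w, h, False)


def closing(buf: bytes, w: int, h: int) -> bytes:
    """Dilate then erode: fills small gaps and holes."""
    return morph3x3(morph3x3(buf, w, h, False), w, h, True)
```

In this sketch, opening deletes an isolated foreground pixel while leaving a 3×3 block intact, and closing fills a one-pixel gap in a stroke.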

connected_components(binary_bytes, w, h) → table

Label connected foreground regions

Labels connected foreground (255) regions using two-pass union-find. Returns {count, labels} where count is the number of distinct components and labels is a flat table mapping each pixel index to its component label (0 = background).

use plugin ocr::{connected_components}

let result = connected_components(binary_bytes, 320, 240)
print("{result["count"]} components found")

The flat labels table lets you inspect which component owns a given pixel — index it by y * w + x:

use plugin ocr::{connected_components}

let result = connected_components(binary_bytes, 320, 240)
let labels = result["labels"]
print("pixel (5,3) belongs to component {labels[3 * 320 + 5]}")
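
For readers unfamiliar with two-pass union-find labeling, here is a minimal Python sketch of the technique. It is not the plugin's code: the function name is hypothetical, 4-connectivity is an assumption (the plugin may use 8-connectivity), and the consecutive renumbering of labels from 1 is also an assumption.

```python
def connected_components_sketch(buf: bytes, w: int, h: int) -> dict:
    """Label 255-valued regions; 0 is reserved for background."""
    parent = [0]  # parent[i] == i marks a root; index 0 is background

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    labels = [0] * (w * h)
    # Pass 1: assign provisional labels and record equivalences.
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if buf[i] != 255:
                continue
            left = labels[i - 1] if x > 0 else 0
            up = labels[i - w] if y > 0 else 0
            if left == 0 and up == 0:
                parent.append(len(parent))  # new label is its own root
                labels[i] = len(parent) - 1
            elif left and up:
                a, b = find(left), find(up)
                parent[max(a, b)] = min(a, b)  # union the two sets
                labels[i] = min(a, b)
            else:
                labels[i] = left or up
    # Pass 2: resolve every label to its root, renumbering from 1.
    remap = {}
    for i, lab in enumerate(labels):
        if lab:
            root = find(lab)
            labels[i] = remap.setdefault(root, len(remap) + 1)
    return {"count": len(remap), "labels": labels}
```

The second pass is what merges shapes such as a "U", whose two arms receive different provisional labels before the scan reaches the row that joins them.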

bounding_boxes(binary_bytes, w, h) → table

Compute bounding boxes of connected regions

Computes an axis-aligned bounding box for each connected component. Returns a table of entries, each with {x, y, width, height, label}. Useful for locating candidate character or word regions before feeding to a classifier.

use plugin ocr::{to_grayscale, threshold, bounding_boxes}

let gray = to_grayscale(rgba_bytes, 800, 600)
let binary = threshold(gray, 140)
let boxes = bounding_boxes(binary, 800, 600)
let box1 = boxes[1]
print("region at ({box1["x"]}, {box1["y"]}) size {box1["width"]}x{box1["height"]}")
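
One way to derive such boxes is to scan a flat label table, like the one connected_components returns, and track the extent of each label. The Python sketch below is illustrative only: boxes_from_labels is a hypothetical helper, and treating width and height as inclusive pixel counts (max - min + 1) is an assumption about the plugin's convention.

```python
def boxes_from_labels(labels: list, w: int, h: int) -> list:
    """Compute one axis-aligned box per nonzero label."""
    if len(labels) != w * h:
        raise ValueError("labels must contain w * h entries")
    extents = {}  # label -> [min_x, min_y, max_x, max_y]
    for i, lab in enumerate(labels):
        if lab == 0:
            continue  # background
        x, y = i % w, i // w
        e = extents.setdefault(lab, [x, y, x, y])
        e[0], e[1] = min(e[0], x), min(e[1], y)
        e[2], e[3] = max(e[2], x), max(e[3], y)
    return [
        {
            "x": e[0],
            "y": e[1],
            # Assumption: sizes count pixels inclusively.
            "width": e[2] - e[0] + 1,
            "height": e[3] - e[1] + 1,
            "label": lab,
        }
        for lab, e in sorted(extents.items())
    ]
```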

histogram(gray_bytes) → table

Build a 256-bin intensity histogram

Builds a 256-entry intensity histogram from a grayscale buffer. The table is indexed 0–255, each value being the count of pixels at that intensity. Use it to pick a good threshold value, for example with Otsu's method.

use plugin ocr::{to_grayscale, histogram}

let gray = to_grayscale(rgba_bytes, 640, 480)
let hist = histogram(gray)
print("pixels at 128: {hist[128]}")
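
Since the plugin does not list an Otsu function, the threshold has to be computed from the histogram by the caller. The Python sketch below shows one way to do that, assuming a 256-entry list of counts; otsu_threshold_sketch is a hypothetical name, and returning t + 1 is a choice made here to match the >= val rule that threshold uses.

```python
def otsu_threshold_sketch(hist: list) -> int:
    """Pick the split that maximizes between-class variance."""
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = 0    # pixel count with intensity <= t
    sum_dark = 0  # intensity sum of those pixels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark class yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    # Pixels > best_t are "light"; threshold() whitens pixels >= val.
    return best_t + 1
```

The returned value can be passed as val to threshold, so that the darker class maps to 0 and the lighter class to 255.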

adaptive_threshold(gray_bytes, w, h, block_size, c) → any

Local mean adaptive thresholding

Applies local mean adaptive thresholding: for each pixel, the threshold is the mean of the block_size × block_size neighborhood minus c. This works better than a global threshold on images with uneven lighting.

use plugin ocr::{to_grayscale, adaptive_threshold}

let gray = to_grayscale(rgba_bytes, 640, 480)
let binary = adaptive_threshold(gray, 640, 480, 15, 5.0)

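A larger block size averages over a wider neighborhood, which is more robust to gradual shadows. Raising c lowers every local threshold (the mean minus c), so a pixel must be darker relative to its neighborhood to fall below it, which suppresses faint background texture more aggressively: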

use plugin ocr::{to_grayscale, adaptive_threshold, invert_bytes, count_black}

let gray = to_grayscale(rgba_bytes, 640, 480)
let tight = adaptive_threshold(gray, 640, 480, 31, 10.0)
print("foreground after adaptive: {count_black(invert_bytes(tight))}")
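
The local-mean rule can be sketched directly in Python. This is an illustration under several assumptions, not the plugin's implementation: the function name is hypothetical, the window is centered on the pixel and clipped at the image border, and a pixel becomes 255 when its value is >= mean - c (mirroring the rule documented for threshold).

```python
def adaptive_threshold_sketch(gray: bytes, w: int, h: int,
                              block_size: int, c: float) -> bytes:
    """Binarize each pixel against the mean of its neighborhood minus c."""
    r = block_size // 2
    out = bytearray(w * h)
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            # Assumption: the window is clipped to the image, so border
            # pixels average over fewer neighbors.
            total = sum(
                sum(gray[yy * w + x0: yy * w + x1]) for yy in range(y0, y1)
            )
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y * w + x] = 255 if gray[y * w + x] >= mean - c else 0
    return bytes(out)
```

This direct version recomputes each window sum and is only meant to show the rule; a summed-area table would avoid the repeated work on large images.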

crop_region(bytes, w, h, x, y, crop_w, crop_h) → any

Extract a rectangular region from an image

Extracts a rectangular region from a single-channel image buffer. Pixels outside the source image bounds are filled with 0. Useful for isolating a detected bounding box for per-character processing.

use plugin ocr::{bounding_boxes, crop_region}

let boxes = bounding_boxes(binary, 800, 600)
let b = boxes[1]
let region = crop_region(binary, 800, 600, b["x"], b["y"], b["width"], b["height"])
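
The zero-fill behavior for out-of-bounds pixels is easiest to see in code. The Python sketch below follows the description above; crop_region_sketch is a hypothetical name and the code is not taken from the plugin.

```python
def crop_region_sketch(buf: bytes, w: int, h: int,
                       x: int, y: int, crop_w: int, crop_h: int) -> bytes:
    """Copy a crop_w x crop_h window; pixels outside the source stay 0."""
    out = bytearray(crop_w * crop_h)  # zero-filled by default
    for cy in range(crop_h):
        sy = y + cy
        if not 0 <= sy < h:
            continue  # whole row lies outside the source image
        for cx in range(crop_w):
            sx = x + cx
            if 0 <= sx < w:
                out[cy * crop_w + cx] = buf[sy * w + sx]
    return bytes(out)
```

Because the fill value is 0, a crop that extends past the image edge is padded with black, which reads as background to functions that treat 255 as foreground and as ink to count_black.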

resize_nearest(bytes, w, h, new_w, new_h) → any

Nearest-neighbor resize of a grayscale image

Resizes a single-channel image buffer to new_w × new_h using nearest-neighbor interpolation. Use to normalize character patches to a fixed input size before classification.

use plugin ocr::{resize_nearest}

let patch = resize_nearest(region_bytes, 24, 32, 28, 28)
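
Nearest-neighbor resizing copies, for each destination pixel, the value of one source pixel. The Python sketch below shows one common index mapping; the function name is hypothetical, and the floor-based mapping (dx * w // new_w) is an assumption, since the plugin may sample at pixel centers instead.

```python
def resize_nearest_sketch(buf: bytes, w: int, h: int,
                          new_w: int, new_h: int) -> bytes:
    """Resize a single-channel buffer by sampling the source grid."""
    out = bytearray(new_w * new_h)
    for dy in range(new_h):
        sy = min(h - 1, dy * h // new_h)  # source row for this output row
        for dx in range(new_w):
            sx = min(w - 1, dx * w // new_w)  # source column
            out[dy * new_w + dx] = buf[sy * w + sx]
    return bytes(out)
```

No new intensity values are created, so a binary patch stays strictly 0/255 after resizing, which is convenient when the classifier expects binarized input.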

pixel_density(binary_bytes, w, h, x, y, region_w, region_h) → number

Fraction of foreground pixels in a region

Returns the fraction of foreground (255) pixels within a rectangular region of a binary image. Values near 1.0 indicate a dense region; values near 0.0 indicate mostly background. Useful as a quick confidence estimate.

use plugin ocr::{pixel_density}

let density = pixel_density(binary, 640, 480, 10, 10, 50, 20)
print("ink density: {density}")

Pair it with bounding_boxes to score each detected region and skip near-empty ones:

use plugin ocr::{bounding_boxes, pixel_density}

let boxes = bounding_boxes(binary, 640, 480)
let b = boxes[1]
let d = pixel_density(binary, 640, 480, b["x"], b["y"], b["width"], b["height"])
if d > 0.1 {
  print("region {b["label"]} looks like ink ({d})")
}
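
As a reference for what the density value means, here is a minimal Python sketch of the computation. It is not the plugin's code: pixel_density_sketch is a hypothetical name, and two details are assumptions, namely that pixels outside the image count as background and that the denominator is the full region area.

```python
def pixel_density_sketch(buf: bytes, w: int, h: int,
                         x: int, y: int,
                         region_w: int, region_h: int) -> float:
    """Fraction of 255-valued pixels inside the given rectangle."""
    area = region_w * region_h
    if area <= 0:
        return 0.0  # empty region: report no foreground
    fg = 0
    for ry in range(y, y + region_h):
        for rx in range(x, x + region_w):
            # Assumption: out-of-bounds pixels count as background.
            if 0 <= rx < w and 0 <= ry < h and buf[ry * w + rx] == 255:
                fg += 1
    return fg / area
```

A bounding box that hugs a thin diagonal stroke scores low even though it contains real ink, so a cutoff such as the 0.1 used above is best tuned on sample data.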
