Under the hood

How RemeDocs works

A three-phase pipeline that turns any PDF — scanned, complex, or legacy — into a fully compliant accessible document in seconds.

Analysis

Phase 1

Document intelligence & structure detection

Before any remediation begins, RemeDocs builds a full map of your document — page layout, content types, reading flow, and existing tag structure. This analysis phase is what makes the downstream fixes accurate rather than guessed.

OCR & text extraction — Tesseract-powered OCR for scanned pages, PyMuPDF for native text. Handles handwriting, mixed content, and multi-language documents.

Layout classification — Detects single-column, multi-column, table-heavy, form, and mixed layouts. Each gets routed to the appropriate remediation path.

Reading order inference — Uses spatial analysis and semantic heuristics to establish logical content flow, even in complex multi-column and sidebar layouts.

Existing tag audit — Evaluates any pre-existing tags for accuracy. Broken or malformed tags are flagged and re-built rather than simply preserved.

Image inventory — Catalogs every image, figure, chart, and decorative element. Each is classified as informational or decorative to guide alt text generation.

Remediation

Phase 2

AI-powered accessibility fixes

With the document map in hand, RemeDocs applies a full suite of accessibility fixes — structural tagging, alt text generation, heading hierarchy, and metadata — programmatically and at scale.

Full PDF tagging — Builds a complete tag tree: Document, Part, Article, Section, P, H1–H6, Table, TR, TD, Figure, and Span tags per the PDF/UA specification.

AI alt text — Every informational image is described by a vision model (Claude). Descriptions capture content, context, and purpose — not just visual appearance.

Heading hierarchy repair — Detects visual heading styles and maps them to semantic H1–H6 levels. Skipped levels and ambiguous headings are resolved automatically.

Table structure — Identifies header rows, scope attributes, and merged cells. Complex tables get TH/TD mapping with proper scope and id/headers relationships.

Link text remediation — "Click here" and bare-URL links are rewritten to be descriptive. Anchor text is inferred from surrounding context.

Language & metadata — Document language is set in XMP metadata and the PDF catalog. Title, author, and subject are populated or corrected.

Output

Phase 3

Compliant file delivery & audit trail

The remediated PDF is packaged to the PDF/UA-1 specification, validated against WCAG 2.1 AA, and delivered with a full audit certificate. Every fix is logged for traceability.

PDF/UA-1 packaging — The output PDF is written to ISO 14289-1 (PDF/UA) standard using pikepdf. DisplayDocTitle is set, tab order is standardized, and the XMP metadata includes the PDF/UA identifier.

Automated validation — The output is checked with PAC 2024 and VeraPDF rules before delivery. Issues that can't be auto-resolved are flagged in the audit report.

Audit certificate — Every document gets a signed PDF/UA conformance certificate with a timestamp, issue log, fix summary, and test results. Suitable for legal and compliance records.

API delivery — Remediated files are returned via REST API, S3-compatible storage, or webhook callback. Async processing with job tracking for large batches.

What RemeDocs fixes

Every PDF that passes through the pipeline receives a comprehensive set of accessibility corrections.

Tag structure

Most PDFs are created without any semantic tag structure, which means screen readers encounter a wall of undifferentiated text. RemeDocs adds a complete tag tree — headings, paragraphs, lists, tables, figures, and sections — so assistive technologies can navigate the document the same way a sighted user scans a page. Tags follow the PDF/UA specification, and every content element is assigned the correct role in the document hierarchy.

Reading order

A PDF can look perfectly organized on screen yet present content in a completely wrong sequence to a screen reader. This happens because the visual layout and the underlying content stream are two different things. RemeDocs analyzes the spatial relationships between text blocks, columns, sidebars, and footnotes to establish a logical reading sequence that matches what a sighted reader would follow from top to bottom and left to right.

Alt text

Images without alternative text are invisible to screen reader users. RemeDocs uses AI-powered vision models to generate descriptive alt text for every informational image — charts, photographs, diagrams, and logos. Decorative images are marked as artifacts so screen readers skip them entirely. The result is that users who cannot see the images still receive the information those images are meant to convey.

Table structure

Data tables are among the hardest elements to make accessible. Without proper header cell associations, a screen reader user has no way to know which column or row a particular value belongs to. RemeDocs identifies header rows and columns, adds scope attributes, and maps complex merged-cell layouts with id/headers relationships so that every data cell is associated with its corresponding headers.

Metadata

Standards-compliant PDFs require specific metadata fields: a document title (displayed in the browser tab instead of the filename), a declared language (so screen readers use the correct pronunciation engine), and a PDF/UA identifier that signals conformance to assistive technology. RemeDocs sets all of these automatically, along with author, subject, and XMP metadata.

Bookmarks

Long documents without bookmarks force screen reader users to scroll through every page to find a specific section. RemeDocs generates a navigable bookmark outline from the heading structure, giving users a table of contents they can use to jump directly to any part of the document. This is especially valuable for reports, manuals, and multi-chapter publications.

Standards we meet

Every remediated document is validated against three accessibility standards before delivery.

RemeDocs output conforms to WCAG 2.1 Level AA, the Web Content Accessibility Guidelines published by the W3C and referenced by virtually every accessibility regulation worldwide. It also meets Section 508 of the Rehabilitation Act, which requires federal agencies and their contractors to produce accessible electronic documents. Finally, every PDF is packaged to the PDF/UA-1 (ISO 14289-1) specification, the international standard for accessible PDF files.

These three standards overlap significantly, but each has requirements the others do not. By targeting all three simultaneously, RemeDocs ensures your documents are compliant regardless of which regulation applies to your organization. For a detailed comparison of how these standards differ and where they converge, see our guide: Section 508 vs. WCAG vs. PDF/UA — What You Actually Need to Know.

Get started

See it on your own documents

Upload a PDF and get a free accessibility audit — no credit card, no commitment.

Get your free audit