Python: How can I split a PDF while preserving Web Accessibility to comply with WCAG?


I'm writing a small Python program that splits a PDF into chunks with a given number of pages. The splitting itself works perfectly, but the output files lose their web accessibility, i.e. they no longer comply with the Web Content Accessibility Guidelines (WCAG).

How can I preserve the formatting necessary to keep the output files web accessible?

I first tried PyPDF2 and got the splitting to work. I then switched to PyMuPDF, a more advanced library that is apparently good at this type of thing.

Here is a simplified version of my method (I excluded a lot of irrelevant code for the sake of my question):

import os
import fitz  # PyMuPDF

def split_pdf(file_path, chunk_size):
    # file_path: path to the source PDF; chunk_size: pages per output file (int)
    output_folder = f"{os.path.splitext(file_path)[0]}_output"
    os.makedirs(output_folder, exist_ok=True)  # make sure the output folder exists

    pdf_document = fitz.open(file_path)
    total_pages = pdf_document.page_count

    for start in range(0, total_pages, chunk_size):
        end = min(start + chunk_size, total_pages)
        output_filename = os.path.join(output_folder, f"pages_{start+1}_to_{end}.pdf")

        doc_subset = fitz.open()  # Create a new, empty PDF to hold the subset
        # to_page is inclusive, so a single call copies pages start..end-1
        doc_subset.insert_pdf(pdf_document, from_page=start, to_page=end - 1)

        doc_subset.save(output_filename)
        doc_subset.close()

    pdf_document.close()
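
The page arithmetic in the loop can be checked separately from any PDF library. The following is a minimal sketch; `chunk_ranges` is a hypothetical helper name that is not part of my program or of PyMuPDF. It only computes the same zero-based `start` and exclusive `end` values that the loop above uses.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, end) pairs: start is zero-based, end is exclusive."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


# Example: a 10-page document split into chunks of 4 pages.
for start, end in chunk_ranges(10, 4):
    # File names use one-based page numbers, as in the method above.
    print(f"pages_{start + 1}_to_{end}.pdf")
```

For 10 pages and a chunk size of 4 this gives pages 1 to 4, 5 to 8 and 9 to 10, so the page ranges themselves look correct and the problem seems to be limited to the accessibility information.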

Any help would be greatly appreciated!
