Welcome to Pdfstruct

A Python module that builds upon the PyMuPDF library to extract the physical and logical structure of a PDF file.

Its main goals are to detect section titles and tables of contents. It also handles splitting aggregated PDF files (single files that bundle several documents) into their parts.
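To give a feel for what section-title detection can involve, here is a minimal sketch of one common heuristic: flag text whose font size is clearly larger than the dominant body-text size. This is an illustrative assumption, not Pdfstruct's actual algorithm. The function `detect_titles` and its `ratio` parameter are hypothetical. The input spans are plain dicts with `"text"` and `"size"` keys, loosely mirroring the span dicts that PyMuPDF's `page.get_text("dict")` returns.

```python
from collections import Counter


def detect_titles(spans, ratio=1.2):
    """Hypothetical heuristic: return span texts noticeably larger than body text.

    `spans` is a list of dicts with "text" and "size" keys, similar in shape
    to the spans found in PyMuPDF's page.get_text("dict") output.
    """
    # Estimate the body-text font size as the size carrying the most characters
    weights = Counter()
    for span in spans:
        weights[round(span["size"], 1)] += len(span["text"].strip())
    if not weights:
        return []
    body_size = weights.most_common(1)[0][0]

    # Keep non-empty spans whose size is at least `ratio` times the body size
    return [
        span["text"].strip()
        for span in spans
        if span["text"].strip() and span["size"] >= ratio * body_size
    ]


spans = [
    {"text": "1. Introduction", "size": 16.0},
    {"text": "This report describes the project.", "size": 10.0},
    {"text": "It continues here.", "size": 10.0},
    {"text": "2. Methods", "size": 16.0},
    {"text": "Details follow.", "size": 10.0},
]
print(detect_titles(spans))  # ['1. Introduction', '2. Methods']
```

Font size alone is a weak signal. Real documents often also need bold flags, numbering patterns, and position on the page to separate true section titles from large captions or headers.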

Pdfstruct is a tool used in the LIRIAe project (Projet de Liseuse et Recherche Intelligente pour les Autorités environnementales, a reading and intelligent search project for environmental authorities).