That's not for TrOCR, it's just plain OCR, which may not work for handwriting.
I did try some of the GPT steps:
pip install --upgrade transformers pillow pdf2image
I'm getting some errors:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 3/4 [transformers] WARNING: The scripts transformers and transformers-cli are installed in '/home/user/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
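I assume the PATH warning is harmless and just means the transformers-cli scripts went into ~/.local/bin; something like this in ~/.bashrc would presumably silence it (the Python script itself shouldn't care either way):
export PATH="$HOME/.local/bin:$PATH"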
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mistral-common 1.5.2 requires pillow<11.0.0,>=10.3.0, but you have pillow 12.1.0 which is incompatible.
moviepy 2.1.2 requires pillow<11.0,>=9.2.0, but you have pillow 12.1.0 which is incompatible.
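If I'm reading the conflict right, the --upgrade pulled in Pillow 12 while mistral-common and moviepy both want a 10.x release, so presumably pinning it back down would satisfy everything (no idea yet whether TrOCR cares which version it gets):
pip install "pillow>=10.3,<11"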
This is what GPT said to run, but it makes no sense to me, because I don't even have TrOCR downloaded or running at all.
Install packages: pip install --upgrade transformers pillow pdf2image
Ensure poppler is installed:
Ubuntu/Debian: sudo apt install -y poppler-utils
macOS: brew install poppler
Execute: python3 trocr_pdf.py input.pdf output.txt
This is the script it said to save and run:
#!/usr/bin/env python3
import sys
from pdf2image import convert_from_path
from PIL import Image
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

def main(pdf_path, out_path="output.txt", dpi=300):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model_name = "microsoft/trocr-base-handwritten"
    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)

    pages = convert_from_path(pdf_path, dpi=dpi)
    results = []
    for i, page in enumerate(pages, 1):
        page = page.convert("RGB")
        # downscale if very large to avoid OOM
        max_dim = 1600
        if max(page.width, page.height) > max_dim:
            scale = max_dim / max(page.width, page.height)
            page = page.resize((int(page.width*scale), int(page.height*scale)), Image.Resampling.LANCZOS)
        pixel_values = processor(images=page, return_tensors="pt").pixel_values.to(device)
        generated_ids = model.generate(pixel_values, max_length=512)
        text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        results.append(f"--- Page {i} ---\n{text.strip()}\n")

    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(results))
    print(f"Saved OCR text to {out_path}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python3 trocr_pdf.py input.pdf [output.txt]")
        sys.exit(1)
    pdf_path = sys.argv[1]
    out_path = sys.argv[2] if len(sys.argv) > 2 else "output.txt"
    main(pdf_path, out_path)
This is the terminal error after running the GPT code:
python3 trocr_pdf.py small.pdf output.txt
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 479, in cached_files
    hf_hub_download(
  File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1007, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1124, in _hf_hub_download_to_cache_dir
    os.makedirs(os.path.dirname(blob_path), exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/user/.cache/huggingface/hub/models--microsoft--trocr-base-handwritten'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/Documents/trocr_pdf.py", line 39, in <module>
    main(pdf_path, out_path)
  File "/home/user/Documents/trocr_pdf.py", line 11, in main
    processor = TrOCRProcessor.from_pretrained(model_name)
  File "/home/user/.local/lib/python3.10/site-packages/transformers/processing_utils.py", line 1394, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/transformers/processing_utils.py", line 1453, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 489, in from_pretrained
    raise initial_exception
  File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 476, in from_pretrained
    config_dict, _ = ImageProcessingMixin.get_image_processor_dict(
  File "/home/user/.local/lib/python3.10/site-packages/transformers/image_processing_base.py", line 333, in get_image_processor_dict
    resolved_image_processor_files = [
  File "/home/user/.local/lib/python3.10/site-packages/transformers/image_processing_base.py", line 337, in <listcomp>
    resolved_file := cached_file(
  File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 322, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 524, in cached_files
    raise OSError(
OSError: PermissionError at /home/user/.cache/huggingface/hub/models--microsoft--trocr-base-handwritten when downloading microsoft/trocr-base-handwritten. Check cache directory permissions. Common causes: 1) another user is downloading the same model (please wait); 2) a previous download was canceled and the lock file needs manual removal.

LLMs are so bad at code sometimes. This happens to me all the time with LLMs and code: the code is unusable, and it saves no time because it's a rabbit hole leading nowhere.
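From the traceback it looks like the actual failure is just permissions on the Hugging Face cache directory, not the model or the script, so I'm guessing either fixing ownership of ~/.cache/huggingface or pointing the cache somewhere writable would get past it (I haven't tried either yet):
sudo chown -R "$USER":"$USER" ~/.cache/huggingface
or
export HF_HOME=/some/writable/dir
(where /some/writable/dir is just a placeholder for wherever the cache should live).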
I also don't know if this is even the right approach to the problem; some sort of GUI would be easier. And this is hundreds of pages of handwritten material that I want to convert to text.