Posts: 3 · Comments: 58 · Joined: 1 mo. ago

  • Terminal error after running GPT code:

    python3 trocr_pdf.py small.pdf output.txt
    Traceback (most recent call last):
      File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 479, in cached_files
        hf_hub_download(
      File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
        return fn(*args, **kwargs)
      File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1007, in hf_hub_download
        return _hf_hub_download_to_cache_dir(
      File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1124, in _hf_hub_download_to_cache_dir
        os.makedirs(os.path.dirname(blob_path), exist_ok=True)
      File "/usr/lib/python3.10/os.py", line 215, in makedirs
        makedirs(head, exist_ok=exist_ok)
      File "/usr/lib/python3.10/os.py", line 225, in makedirs
        mkdir(name, mode)
    PermissionError: [Errno 13] Permission denied: '/home/user/.cache/huggingface/hub/models--microsoft--trocr-base-handwritten'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/home/user/Documents/trocr_pdf.py", line 39, in <module>
        main(pdf_path, out_path)
      File "/home/user/Documents/trocr_pdf.py", line 11, in main
        processor = TrOCRProcessor.from_pretrained(model_name)
      File "/home/user/.local/lib/python3.10/site-packages/transformers/processing_utils.py", line 1394, in from_pretrained
        args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
      File "/home/user/.local/lib/python3.10/site-packages/transformers/processing_utils.py", line 1453, in _get_arguments_from_pretrained
        args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
      File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 489, in from_pretrained
        raise initial_exception
      File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 476, in from_pretrained
        config_dict, _ = ImageProcessingMixin.get_image_processor_dict(
      File "/home/user/.local/lib/python3.10/site-packages/transformers/image_processing_base.py", line 333, in get_image_processor_dict
        resolved_image_processor_files = [
      File "/home/user/.local/lib/python3.10/site-packages/transformers/image_processing_base.py", line 337, in <listcomp>
        resolved_file := cached_file(
      File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 322, in cached_file
        file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
      File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 524, in cached_files
        raise OSError(
    OSError: PermissionError at /home/user/.cache/huggingface/hub/models--microsoft--trocr-base-handwritten when downloading microsoft/trocr-base-handwritten. Check cache directory permissions. Common causes: 1) another user is downloading the same model (please wait); 2) a previous download was canceled and the lock file needs manual removal.
    

    LLMs are so bad at code sometimes. This happens to me all the time with LLMs and code; the code is unusable, and it saves no time because it's a rabbit hole leading nowhere.

    I also don't know if this is even the right approach to the problem; any sort of GUI would be easier. And this is hundreds of pages of handwritten material that I want to convert to text.
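The PermissionError in the traceback means the Hugging Face cache directory is not writable by the current user — commonly because it was created by an earlier `sudo` run, or a cancelled download left a lock file behind. A minimal sketch of two ways around it, assuming the default cache location (`~/.cache/huggingface`):

```shell
# Option 1 (needs sudo): reclaim the cache and clear stale lock files.
# Only relevant if the directory really is owned by root:
#   sudo chown -R "$USER" ~/.cache/huggingface
#   find ~/.cache/huggingface -name '*.lock' -delete

# Option 2 (no sudo): point Hugging Face at a fresh, user-writable
# cache via the HF_HOME environment variable, then rerun the script:
export HF_HOME="$HOME/hf-cache"
mkdir -p "$HF_HOME"
echo "cache at $HF_HOME"
```

Option 2 sidesteps the broken directory entirely; the model is simply re-downloaded into the new location.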

  • that's not for TrOCR, it's just for OCR, which may not work for handwriting

    I did try some of the GPT steps:

    pip install --upgrade transformers pillow pdf2image

    getting some errors:

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 3/4 [transformers]  WARNING: The scripts transformers and transformers-cli are installed in '/home/user/.local/bin' which is not on PATH.
      Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    mistral-common 1.5.2 requires pillow<11.0.0,>=10.3.0, but you have pillow 12.1.0 which is incompatible.
    moviepy 2.1.2 requires pillow<11.0,>=9.2.0, but you have pillow 12.1.0 which is incompatible.

    this is what GPT said to run, but it makes no sense because I don't have TrOCR even downloaded or running at all.
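Those pip messages are two separate, fixable issues: the PATH warning just means pip put its scripts in `~/.local/bin`, which the shell doesn't search by default, and the pillow lines mean two already-installed packages want an older pillow. A sketch of both fixes (the `pillow<11` pin is an assumption — it satisfies the two complaining packages, but something else may want a newer pillow):

```shell
# Make scripts installed by `pip install --user` findable; append this
# export line to ~/.bashrc to make it permanent:
export PATH="$HOME/.local/bin:$PATH"
echo "$PATH" | cut -d: -f1

# The pillow conflict can be resolved by pinning below 11, which both
# mistral-common and moviepy accept (commented out so this sketch
# stays side-effect free):
#   pip install 'pillow<11'
```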

    1. Install packages: pip install --upgrade transformers pillow pdf2image
    2. Ensure poppler is installed:
       Ubuntu/Debian: sudo apt install -y poppler-utils
       macOS: brew install poppler
    3. Execute: python3 trocr_pdf.py input.pdf output.txt

    That's the script to save and run.

    #!/usr/bin/env python3
    import sys
    from pdf2image import convert_from_path
    from PIL import Image
    import torch
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel
    
    def main(pdf_path, out_path="output.txt", dpi=300):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        model_name = "microsoft/trocr-base-handwritten"
        processor = TrOCRProcessor.from_pretrained(model_name)
        model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
    
        pages = convert_from_path(pdf_path, dpi=dpi)
        results = []
        for i, page in enumerate(pages, 1):
            page = page.convert("RGB")
            # downscale if very large to avoid OOM
            max_dim = 1600
            if max(page.width, page.height) > max_dim:
                scale = max_dim / max(page.width, page.height)
                page = page.resize((int(page.width*scale), int(page.height*scale)), Image.Resampling.LANCZOS)
    
            pixel_values = processor(images=page, return_tensors="pt").pixel_values.to(device)
            generated_ids = model.generate(pixel_values, max_length=512)
            text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
            results.append(f"--- Page {i} ---\n{text.strip()}\n")
    
        with open(out_path, "w", encoding="utf-8") as f:
            f.write("\n".join(results))
        print(f"Saved OCR text to {out_path}")
    
    if __name__ == "__main__":
        if len(sys.argv) < 2:
            print("Usage: python3 trocr_pdf.py input.pdf [output.txt]")
            sys.exit(1)
        pdf_path = sys.argv[1]
        out_path = sys.argv[2] if len(sys.argv) > 2 else "output.txt"
        main(pdf_path, out_path)
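One caveat about the script above: TrOCR is trained on images of single text lines, so feeding it whole scanned pages usually produces poor output. Pages generally need to be split into line crops before the `processor(...)` call. A minimal, pure-Python sketch of row-projection line segmentation — the function name and thresholds are illustrative, not part of the original script, and a real pipeline would use a proper text-line detector:

```python
def find_line_bands(rows, thresh=200, min_height=2):
    """Given a grayscale page as a list of rows (each a list of 0-255
    pixel values, e.g. reshaped from PIL's Image.getdata()), return
    (top, bottom) row spans of text lines. A row belongs to a line if
    any pixel in it is darker than `thresh`."""
    inked = [any(v < thresh for v in row) for row in rows]
    bands, start = [], None
    # the trailing False flushes a band that runs to the page bottom
    for y, row_has_ink in enumerate(inked + [False]):
        if row_has_ink and start is None:
            start = y                      # a new line band begins
        elif not row_has_ink and start is not None:
            if y - start >= min_height:    # skip specks of noise
                bands.append((start, y))
            start = None
    return bands

# toy page: white (255) with two dark bands at rows 5-9 and 18-23
page = [[255] * 10 for _ in range(30)]
for y in list(range(5, 10)) + list(range(18, 24)):
    page[y] = [0] * 10
print(find_line_bands(page))  # -> [(5, 10), (18, 24)]
```

Each `(top, bottom)` span could then be cropped with `page.crop((0, top, page.width, bottom))` and passed to the processor one line at a time.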

    I don't remember exactly, but I have ROCm 7.2 installed, and there was something I was trying to install through pip for ROCm that just wouldn't work; it was like ROCm 7.2 wasn't out yet, or the link didn't work. The LLM tried multiple suggestions and they all failed, then I gave up. When I said "inside" pip, I don't know if that's accurate. I am very new to pip; I'm decent at Linux but only know a small amount of coding and lack Python familiarity.

  • Lots of people laughing in Germany in the 30s at people fleeing weren't laughing in the 40s.

  • which community? i did look and didn't see one.

  • I am happy to participate. I do not want my IP and ping data sold to data brokers to serve targeted ads, track me, and go to the police surveillance state. I'm sure you are a good citizen who always keeps location on and feels like life would be easier if everyone just complied while you proudly put ring cameras on every door. Not everyone is a tech bro neo-feudalist.

  • i got that a lot when i first started using linux-based distros. "why aren't you typing man?" or "just type sudo rmdir /"

  • GPT 5 gave me a lot of code that returned errors. I really need help with the specific terminal code or knowing if I am even approaching the problem right.

    I am trying to run an OCR program for handwriting to process some large PDFs of old journals that have been scanned. Doing it by hand would take a very long time. I have an AMD GPU and have ROCm installed. I tried to configure pip with ROCm and failed. I was considering pulling a Docker image of PyTorch, configuring Gradio in it, and then trying to get Gradio to run TrOCR. I have never run Gradio. I have "easier" LLM programs like LM Studio and Ollama, but I don't know if they can run TrOCR. There is AMD documentation on running OCR (https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/inference/ocr_vllm.html), but it's not clear whether it works well with handwriting. TrOCR is trained specifically for handwriting. It's also on Hugging Face, which I don't know how to use that well.
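For the ROCm side, the path of least resistance is usually AMD's prebuilt PyTorch container rather than wrestling with pip wheels: the official `rocm/pytorch` image ships a ROCm-enabled torch, and TrOCR runs on plain transformers inside it. A sketch, assuming Docker is installed; the image tag and wheel index version below are illustrative — check what is current before using them:

```shell
# Pull AMD's prebuilt ROCm + PyTorch container and start a shell with
# the GPU devices passed through (these are the flags AMD documents):
#   docker pull rocm/pytorch:latest
#   docker run -it --device=/dev/kfd --device=/dev/dri \
#       --security-opt seccomp=unconfined rocm/pytorch:latest
#
# Or, without Docker, install a ROCm wheel from PyTorch's own index.
# Wheels lag behind the newest ROCm releases, which may be why the
# ROCm 7.2 install attempts kept failing:
#   pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.2
```

Inside the container, `pip install transformers pdf2image` plus the script above is enough to try TrOCR; LM Studio and Ollama run LLMs (GGUF chat models), not vision-encoder models like TrOCR.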

  • hell fucking no. I do not need some large corporate company taking a) my browser fingerprint b) my real IP address and ping time and c) selling them to every data broker that exists. Fuck no. Buying a residential IP costs less than a dollar. Big data does not need to associate my online activities with who I am IRL.

    It's also a terrible idea because a lot of my online protection comes from the fact that no one knows my initial ping time, so I would be giving big data information like "here's this user with browser hash ########### and they have a ping time of X ms and an origination IP of X.X.X.X." After that, even if I protected my origination IP later, they could guess who I am based on hops, ping time, and browser fingerprint, because even privacy browsers have a browser fingerprint. Fuuuuuuuuuuuuuuck no.

  • I added a warning to the original post based on what you said.

    Most of the public's awareness of technology is so naive that they basically stick their heads in the sand; they want the convenience and ease and are willing to overlook evil. So they don't care about supply-chain evils or the corruption embedded into the system, so long as they can scroll TikTok and watch Netflix in their off time while using Klarna to pay for groceries. Real organizing in which people aren't being data-mined would require some technological awareness. The masses just don't have it.

    They join political groups on Facebook then get data mined and classified into oblivion until a computer can process their views and give them a discounted ad-supported Netflix so they are less politically upset.

    This is our world.

  • Right, when you apply for jobs, they want socials and indeeds and this and that, when you meet people they want to add you on Facebook. These are "private" companies, but because they are natural monopolies, they essentially become quasi-required parts of being able to function normally within society.

    The right way to deal with this is to regulate the fuck out of the natural monopolies, but rich elitists who own the government by proxy and big surveillance tech have this symbiotic relationship where big tech surveils and secretly data mines the public (to blacklist and exclude potential risks to the system and monitor those people) and so it won't happen.

    Republicans, hurt most by the blacklisting initially, were most likely to regulate big tech, ironically, but now money and ass-kissing of Republican leaders has changed the game and no one will fucking regulate them. All politicians who allow this to happen are either weak or removed, being paid to just perform a role of politician.

  • This information goes right to the US government.

    Anyone who doesn't think Facebook is an arm of the US government is naive at best.

    conservatives are notorious for underreporting prejudice in surveys and also for not responding to surveys at all

    even if this article is right, Republicans still use anti-trans rhetoric and anti-trans policy to appeal to ignorant uneducated males of low socioeconomic status, and it works...

    if you go far enough down on the totem pole of socioeconomic status and education, you find people who genuinely think that Democrats are trying to give all the money to the elites and don't understand that proposals for trickle down economics mean they, the lower class people, are going to get fucked. Because so many of these low education and low intelligence people can only vote emotionally, because they actually can't understand the economic issues, they get tricked by conservatives every time with emotional ploys about trans people and other bullshit...

    Because if you're some tractor driver who went to Bob Jones university and can't understand things like monetary supply and federal interest rates and crowding out of private investment by government bonds and real prices and the problems of the CPI, you still may understand the myths of the Bible and that trans people "don't seem quite right" and you may laugh at "trans-insanity" insults and other things enough to connect with super rich people out to fuck you as hard as your ignorant brain will allow. And so if you're that type, you're going to vote Republican and then 2 years later complain about how it's unfair you're losing the farm and tractor.

    It's not about how people rationally feel, this is about rhetorical mendacious trickery to confuse and dupe the stupid plus rage-bait wedge issues so that enough poor people vote with the rich so that both the lower and middle classes get fucked.

    People who use Lemmy are a very particular subset, and people who self-host are an even more particular one; and you are posting this, saying various things about the government in a long AI-generated post, then claiming you wrote it yourself. People don't randomly lie. You're being paid.

    Your post is CLEARLY written by AI. Your responses aren't. So you're being deceptive.

    You're posting in a self-hosting community, preaching self-hosting to people who already do it. You could very well be trying to identify people who hold more unusual views about technology and harbor anti-government sentiments. You think people here are stupid? You're clearly a liar and probably working for someone. No one lies like that without an agenda.

  • shut the fuck up you liar

  • You're a liar. No one is that influenced. The post is AI, your responses aren't. Who are you?

    To everyone other than OP: this may be someone trying to collect data on people on lemmy and what their views are on the government. This person is lying and being deceptive. Something is off.

  • This thread is by a malicious actor trying to collect data on lemmy users.