
A dummy's request for Nepenthes

I want to set up my own Nepenthes against LLMs. I have purchased a domain, say "wowsocool.com".

I have a Raspberry Pi 4B that I want to use as an nginx reverse proxy, and an old Acer laptop that will host Nepenthes. I'm going to host this behind my current residence's router, since I won't be staying there much longer. I thought this would be a cool temporary project.

My problem is that the website sort of glosses over the whole nginx setup and IP pointing etc.

If anyone has done this before, could you please write up a dummy's guide that goes through everything? I'm not at all confident, and my skills in this field are nonexistent.

Pretty please.

12 comments
  • So, I set this up recently and agree with all of your points about the actual integration being glossed over.

    I already had bot detection set up in my Nginx config, so adding Nepenthes was just a matter of changing the behavior of that. Previously I just returned either 404 or 444 to those requests, but now it redirects them to Nepenthes.

    Rather than trying to do rewrites and pretend the Nepenthes content is under my app's URL namespace, I just do a redirect which the bot crawlers tend to follow just fine.

    There are several parts to this, to keep my config sane. Each of them lives in an include file:

    • An include file that looks at the user agent, compares it to a list of bot UA regexes, and sets a variable to either 0 or 1. By itself, that include file doesn't do anything more than set that variable. This allows me to have it as a global config without having it apply to every virtual host.
    • An include file that performs the action if a variable is set to true. This has to be included in the server portion of each virtual host where I want the bot traffic to go to Nepenthes. If this isn't included in a virtual host's server block, then bot traffic is allowed.
    • A virtual host where the Nepenthes content is presented. I run a subdomain (content.mydomain.xyz). You could also do this as a path off of your protected domain, but this works for me and keeps my already complex config from getting any worse. Plus, it was easier to integrate into my existing bot config. Had I not already had that, I would have run it off of a path (and may go back and do that when I have time to mess with it again).
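
    As a sketch of that third piece, the subdomain virtual host fronting Nepenthes could look something like this. The upstream address and port are placeholders (point them at wherever the Nepenthes daemon actually listens on the laptop), and the subdomain matches the example above:

    ```nginx
    # Virtual host that serves the Nepenthes tarpit on its own subdomain.
    server {
        listen 80;
        server_name content.mydomain.xyz;

        location / {
            # Placeholder address/port for the laptop running Nepenthes.
            proxy_pass http://192.168.1.50:8893;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
    ```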

    The map-bot-user-agents.conf is included in the http section of Nginx and applies to all virtual hosts. You can either include this in the main nginx.conf or at the top (above the server section) in your individual virtual host config file(s).
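
    A minimal sketch of what map-bot-user-agents.conf might contain. The variable name $block_bot and the particular user-agent regexes are assumptions; extend the list to whatever crawlers you want to catch:

    ```nginx
    # map-bot-user-agents.conf -- included in the http{} context.
    # Sets $block_bot to 1 for matching crawler user agents, 0 otherwise.
    # On its own this takes no action; it only sets the variable.
    map $http_user_agent $block_bot {
        default           0;
        "~*GPTBot"        1;
        "~*ClaudeBot"     1;
        "~*CCBot"         1;
        "~*Bytespider"    1;
    }
    ```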

    The deny-disallowed.conf is included individually in each virtual host's server section. Even though the bot detection is global, if a virtual host's server section does not include the action file, then nothing is done.
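
    A minimal sketch of deny-disallowed.conf, assuming the map include sets a variable named $block_bot and the tarpit runs on the subdomain mentioned above:

    ```nginx
    # deny-disallowed.conf -- included in each protected server{} block.
    # Redirects flagged bot traffic to the Nepenthes virtual host.
    if ($block_bot) {
        return 307 https://content.mydomain.xyz$request_uri;
    }
    ```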

    Files
