# CaddyHole

A Caddy module that blocks clients by country and sends crawlers and bots a random amount of data from `/dev/random`.

## Features

- **Country-based blocking**: Block requests from specific countries using a GeoIP2 database
- **Bot/crawler detection**: Automatically detect bots and crawlers based on the `User-Agent` header
- **Bot tarpit**: Send bots and crawlers random data from `/dev/random` to waste their resources
- **Configurable**: Customize the blocked countries and the amount of data sent to bots

## Installation

To use this module, you need to build Caddy with this plugin included. You can use [xcaddy](https://github.com/caddyserver/xcaddy):

```bash
xcaddy build --with git.wntrmute.dev/kyle/caddyhole
```

## Configuration

### Caddyfile

```caddyfile
example.com {
    caddyhole {
        # Path to GeoIP2 database (optional, required for country blocking)
        database /path/to/GeoLite2-Country.mmdb

        # List of country ISO codes to block (optional)
        block_countries CN RU KP IN IL

        # Minimum bytes to send to bots (optional, default: 1MB)
        min_bot_bytes 1048576

        # Maximum bytes to send to bots (optional, default: 100MB)
        max_bot_bytes 104857600
    }

    # Your other handlers here
    respond "Hello, World!"
}
```

### JSON Config

```json
{
  "apps": {
    "http": {
      "servers": {
        "srv0": {
          "listen": [":443"],
          "routes": [
            {
              "handle": [
                {
                  "handler": "caddyhole",
                  "database_path": "/path/to/GeoLite2-Country.mmdb",
                  "blocked_countries": ["CN", "RU", "KP"],
                  "min_bot_bytes": 1048576,
                  "max_bot_bytes": 104857600
                },
                {
                  "handler": "static_response",
                  "body": "Hello, World!"
                }
              ]
            }
          ]
        }
      }
    }
  }
}
```

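
To make the JSON keys and documented defaults concrete, here is a minimal Go sketch of how a handler object like the one above could be decoded. The `Config` struct, its field names, `parseConfig`, and the `max < min` check are all hypothetical; only the JSON keys and the 1 MB / 100 MB defaults come from this README, and the module's real code may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the JSON keys shown in this README. The struct and its
// field names are hypothetical, not taken from the module's source.
type Config struct {
	DatabasePath     string   `json:"database_path,omitempty"`
	BlockedCountries []string `json:"blocked_countries,omitempty"`
	MinBotBytes      int64    `json:"min_bot_bytes,omitempty"`
	MaxBotBytes      int64    `json:"max_bot_bytes,omitempty"`
}

const (
	defaultMinBotBytes = 1 << 20   // 1048576, the documented default
	defaultMaxBotBytes = 100 << 20 // 104857600, the documented default
)

// parseConfig decodes a handler object and fills in the documented defaults
// for any size left unset. Rejecting max < min is an assumption.
func parseConfig(raw []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(raw, &c); err != nil {
		return Config{}, err
	}
	if c.MinBotBytes == 0 {
		c.MinBotBytes = defaultMinBotBytes
	}
	if c.MaxBotBytes == 0 {
		c.MaxBotBytes = defaultMaxBotBytes
	}
	if c.MaxBotBytes < c.MinBotBytes {
		return Config{}, fmt.Errorf("max_bot_bytes (%d) is smaller than min_bot_bytes (%d)",
			c.MaxBotBytes, c.MinBotBytes)
	}
	return c, nil
}

func main() {
	// Unknown keys such as "handler" are ignored by encoding/json.
	raw := []byte(`{"handler": "caddyhole", "blocked_countries": ["CN", "RU", "KP"]}`)
	c, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.BlockedCountries, c.MinBotBytes, c.MaxBotBytes)
	// prints: [CN RU KP] 1048576 104857600
}
```
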
## How It Works

### Bot Detection

The module detects bots/crawlers by examining the `User-Agent` header. It looks for common bot signatures including:

- bot, crawler, spider, scraper
- curl, wget
- python-requests, python-urllib
- go-http-client
- java, perl, ruby, php
- http_request

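
The signature check can be sketched as a substring match over the `User-Agent` value. This is an illustration only: the `botSignatures` slice and `isBot` function are hypothetical names, and case-insensitive matching is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// botSignatures holds the substrings listed in this README. The variable
// name is hypothetical.
var botSignatures = []string{
	"bot", "crawler", "spider", "scraper",
	"curl", "wget",
	"python-requests", "python-urllib",
	"go-http-client",
	"java", "perl", "ruby", "php",
	"http_request",
}

// isBot reports whether the User-Agent contains any known signature.
// Lower-casing first (case-insensitive matching) is an assumption.
func isBot(userAgent string) bool {
	ua := strings.ToLower(userAgent)
	for _, sig := range botSignatures {
		if strings.Contains(ua, sig) {
			return true
		}
	}
	return false
}

func main() {
	for _, ua := range []string{
		"curl/8.5.0",
		"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
		"Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0",
	} {
		fmt.Printf("%-5v %s\n", isBot(ua), ua)
	}
}
```

Plain substring matching is cheap but broad: a short signature such as `java` would also match any `User-Agent` that happens to contain `javascript`.
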
When a bot is detected, the module:

1. Picks a random response size between `min_bot_bytes` and `max_bot_bytes`
2. Streams that many bytes of random data from `/dev/random` (falling back to `/dev/urandom`)
3. Returns HTTP 200 OK so the bot believes the request succeeded
4. Never passes the request to downstream handlers

### Country Blocking

The module uses a MaxMind GeoIP2 database to determine the client's country from its IP address. If the country code matches any entry in the `blocked_countries` list, the module:

1. Returns HTTP 403 Forbidden
2. Sends an "Access denied" message
3. Never passes the request to downstream handlers

The module determines the client IP from the following sources (in order):

1. The `X-Forwarded-For` header (first IP)
2. The `X-Real-IP` header
3. The connection's `RemoteAddr`

### Execution Order

The module processes requests in this order:

1. Check if request is from a bot → If yes, send random data
2. Check if request is from a blocked country → If yes, return 403
3. Otherwise, pass to the next handler

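
The ordering can be expressed as a small decision function. This is an illustrative sketch: the `action` type, its constants, and `decide` are hypothetical, and upper-casing the country code before the lookup is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// action names the three possible outcomes for a request.
type action string

const (
	actionTarpit action = "tarpit" // send random data
	actionForbid action = "forbid" // return 403
	actionNext   action = "next"   // pass to the next handler
)

// decide applies the documented order: the bot check runs first, then the
// country check, and everything else falls through to the next handler.
func decide(bot bool, countryISO string, blocked map[string]bool) action {
	if bot {
		return actionTarpit
	}
	if countryISO != "" && blocked[strings.ToUpper(countryISO)] {
		return actionForbid
	}
	return actionNext
}

func main() {
	blocked := map[string]bool{"CN": true, "RU": true}
	fmt.Println(decide(true, "RU", blocked))  // prints: tarpit
	fmt.Println(decide(false, "RU", blocked)) // prints: forbid
	fmt.Println(decide(false, "DE", blocked)) // prints: next
}
```

The first call shows the consequence of this order: a bot from a blocked country receives random data, not a 403.
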
## GeoIP2 Database

To use country blocking, you need a GeoIP2 database. You can download the free GeoLite2 Country database from MaxMind:

1. Sign up for a free account at [MaxMind](https://www.maxmind.com/en/geolite2/signup)
2. Download the GeoLite2 Country database in MMDB format
3. Extract the `.mmdb` file
4. Configure the `database` path in your Caddyfile

## Examples

### Block only specific countries (no bot handling)

```caddyfile
example.com {
    caddyhole {
        database /path/to/GeoLite2-Country.mmdb
        block_countries CN RU
    }
    respond "Hello, World!"
}
```

### Bot tarpit only (no country blocking)

```caddyfile
example.com {
    caddyhole {
        min_bot_bytes 10485760    # 10MB
        max_bot_bytes 1073741824  # 1GB
    }
    respond "Hello, World!"
}
```

### Full protection

```caddyfile
example.com {
    caddyhole {
        database /path/to/GeoLite2-Country.mmdb
        block_countries CN RU KP IR
        min_bot_bytes 52428800   # 50MB
        max_bot_bytes 524288000  # 500MB
    }
    respond "Hello, World!"
}
```

## Notes

- Bot detection happens before country blocking, so bots will get random data regardless of their country
- The random data is streamed directly from `/dev/random` (or `/dev/urandom`), which may impact system entropy on some systems
- The `Content-Type` is set to `application/octet-stream` and `Content-Length` is set to make the response appear legitimate
- Country blocking requires a GeoIP2 database; without it, no country blocking occurs
- All configuration parameters are optional, but you need at least `database` and `block_countries` for country blocking to work

## License

This module is provided as-is for use with Caddy.