---
title: Policy Definitions
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

Out of the box, Anubis is pretty heavy-handed. It will aggressively challenge everything that might be a browser (usually indicated by having `Mozilla` in its user agent). However, some bots are smart enough to get past the challenge. Some things that look like bots may actually be fine (e.g. RSS readers). Some resources need to be visible no matter what. Some resources and remotes are fine to begin with.

Anubis lets you customize its configuration with a Policy File. This is a YAML document that spells out what actions Anubis should take when evaluating requests. The [default configuration](https://github.com/TecharoHQ/anubis/blob/main/data/botPolicies.yaml) explains everything, but this page contains an overview of everything you can do with it.

## Bot Policies

Bot policies let you customize the rules that Anubis uses to allow, deny, or challenge incoming requests. Currently you can set policies based on the following matches:

- Request path
- User agent string
- HTTP request header values
- Remote IP address ranges (see [Remote IP based filtering](#remote-ip-based-filtering))
- [Importing other configuration snippets](./configuration/import.mdx)

As of v1.17.0, configuration can be written in either JSON or YAML.

Here's an example rule that denies [Amazonbot](https://developer.amazon.com/en/amazonbot):

```yaml
- name: amazonbot
  user_agent_regex: Amazonbot
  action: DENY
```

When this rule is evaluated, Anubis will check the `User-Agent` string of the request. If it contains `Amazonbot`, Anubis will send an error page to the user saying that access is denied, but in a way that makes scrapers think they have successfully loaded the webpage.

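
Because the pattern is not anchored with `^` or `$`, it matches anywhere in the header, which is why the rule catches any user agent that merely *contains* `Amazonbot`. The Go sketch below demonstrates this with the standard `regexp` package (the package Anubis uses for policy rules, as noted later on this page). `matchesUA` is a hypothetical helper and the User-Agent value is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesUA reports whether pattern matches anywhere in userAgent.
// Go's MatchString is unanchored, so a bare word behaves like "contains".
func matchesUA(pattern, userAgent string) bool {
	return regexp.MustCompile(pattern).MatchString(userAgent)
}

func main() {
	// Made-up User-Agent value, for illustration only.
	ua := "Mozilla/5.0 (compatible; Amazonbot/0.1)"

	fmt.Println(matchesUA(`Amazonbot`, ua))     // substring anywhere in the header
	fmt.Println(matchesUA(`^Amazonbot$`, ua))   // anchored: whole header must equal "Amazonbot"
	fmt.Println(matchesUA(`amazonbot`, ua))     // matching is case-sensitive by default
	fmt.Println(matchesUA(`(?i)amazonbot`, ua)) // (?i) makes it case-insensitive
}
```

If you want a rule to match the whole header exactly, anchor it; if you want it to ignore case, add the `(?i)` flag.
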
Right now the only kinds of policies you can write are bot policies. Other forms of policies will be added in the future.

Here is a minimal policy file that will protect against most scraper bots:

```yaml
bots:
  - name: cloudflare-workers
    headers_regex:
      CF-Worker: .*
    action: DENY
  - name: well-known
    path_regex: ^/.well-known/.*$
    action: ALLOW
  - name: favicon
    path_regex: ^/favicon.ico$
    action: ALLOW
  - name: robots-txt
    path_regex: ^/robots.txt$
    action: ALLOW
  - name: generic-browser
    user_agent_regex: Mozilla
    action: CHALLENGE
```

This denies any request carrying a `CF-Worker` header (which Cloudflare sets on requests sent from Cloudflare Workers), allows requests to [`/.well-known`](https://en.wikipedia.org/wiki/Well-known_URI), `/favicon.ico`, and `/robots.txt`, and challenges any request that has the word `Mozilla` in its User-Agent string. The [default policy file](https://github.com/TecharoHQ/anubis/blob/main/data/botPolicies.yaml) is more comprehensive, but this should be more than enough for most users.

If no rules match the request, it is allowed through. For more details on this default behavior and its implications, see [Default allow behavior](./default-allow-behavior.mdx).

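
To make the evaluation flow concrete, here is a simplified Go sketch of the minimal policy's path and user-agent rules. It assumes rules are checked in the order they are listed and that the first matching rule decides. It ignores `WEIGH` rules, header rules, and the challenge itself. The `rule` type and the `evaluate` and `minimalRules` functions are hypothetical and are not Anubis's internal API.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical, heavily simplified stand-in for a bot policy entry.
// A nil regex means "this rule does not check that field".
type rule struct {
	name      string
	pathRegex *regexp.Regexp
	uaRegex   *regexp.Regexp
	action    string
}

// matches requires every configured check to pass; a rule with no checks never matches.
func (r rule) matches(path, userAgent string) bool {
	if r.pathRegex == nil && r.uaRegex == nil {
		return false
	}
	if r.pathRegex != nil && !r.pathRegex.MatchString(path) {
		return false
	}
	if r.uaRegex != nil && !r.uaRegex.MatchString(userAgent) {
		return false
	}
	return true
}

// minimalRules mirrors the path and user-agent rules of the minimal policy above.
func minimalRules() []rule {
	return []rule{
		{name: "well-known", pathRegex: regexp.MustCompile(`^/.well-known/.*$`), action: "ALLOW"},
		{name: "favicon", pathRegex: regexp.MustCompile(`^/favicon.ico$`), action: "ALLOW"},
		{name: "robots-txt", pathRegex: regexp.MustCompile(`^/robots.txt$`), action: "ALLOW"},
		{name: "generic-browser", uaRegex: regexp.MustCompile(`Mozilla`), action: "CHALLENGE"},
	}
}

// evaluate returns the name and action of the first matching rule, or ALLOW
// when no rule matches (the default-allow behavior described above).
func evaluate(rules []rule, path, userAgent string) (name, action string) {
	for _, r := range rules {
		if r.matches(path, userAgent) {
			return r.name, r.action
		}
	}
	return "(no match)", "ALLOW"
}

func main() {
	rules := minimalRules()
	fmt.Println(evaluate(rules, "/robots.txt", "Mozilla/5.0"))
	fmt.Println(evaluate(rules, "/", "Mozilla/5.0"))
	fmt.Println(evaluate(rules, "/", "curl/8.5.0"))
}
```

Under this assumption, ordering matters. Because `robots-txt` is listed before `generic-browser`, a browser fetching `/robots.txt` is allowed rather than challenged, while a client that matches nothing (here, `curl`) falls through to the default allow.
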
### Writing your own rules

There are four actions that can be returned from a rule:

| Action      | Effects |
| :---------- | :------ |
| `ALLOW`     | Bypass all further checks and send the request to the backend. |
| `DENY`      | Deny the request and send back an error message that scrapers think is a success. |
| `CHALLENGE` | Show a challenge page and/or validate that clients have passed a challenge. |
| `WEIGH`     | Change the [request weight](#request-weight) for this request. See the [request weight](#request-weight) docs for more information. |

Name your rules in lowercase kebab-case. Rule names are exposed in Prometheus metrics.

### Challenge configuration

Rules can also have their own challenge settings. These are customized using the `challenge` key. For example, here is a rule that makes challenges artificially hard for clients whose user agent contains "bot" or "crawler" (case-insensitive):

:::warning
This rule has been known to have a high false positive rate in testing. Please use it with care.
:::

```yaml
# Punish any client with "bot" or "crawler" in its user-agent string
- name: generic-bot-catchall
  user_agent_regex: (?i:bot|crawler)
  action: CHALLENGE
  challenge:
    difficulty: 16 # effectively impossible
    algorithm: slow # intentionally waste CPU cycles and time
```

Challenges can be configured with these settings:

| Key          | Example  | Description |
| :----------- | :------- | :---------- |
| `difficulty` | `4`      | The challenge difficulty (number of leading zeros) for proof-of-work. See [Why does Anubis use Proof-of-Work?](/docs/design/why-proof-of-work) for more details. |
| `algorithm`  | `"fast"` | The challenge method to use. See [the list of challenge methods](./configuration/challenges/) for more information. |

### Remote IP based filtering

The `remote_addresses` field of a bot rule restricts that rule to requests coming from the listed IP ranges, written in CIDR notation.

For example, you can allow a search engine to connect if and only if its IP address falls within the ranges its operator has published:

```yaml
- name: qwantbot
  user_agent_regex: \+https\://help\.qwant\.com/bot/
  action: ALLOW
  # https://help.qwant.com/wp-content/uploads/sites/2/2025/01/qwantbot.json
  remote_addresses: ["91.242.162.0/24"]
```

A rule can also match on IP ranges alone, without any other checks:

```yaml
- name: internal-network
  action: ALLOW
  remote_addresses:
    - 100.64.0.0/10
```

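
Checking whether an address falls inside a CIDR range is a prefix comparison. A minimal sketch using Go's `net/netip` package follows. `inRanges` is a hypothetical helper, not Anubis's implementation, and it only covers the IP check (the Qwant rule above also checks the user agent).

```go
package main

import (
	"fmt"
	"net/netip"
)

// inRanges reports whether addr falls inside any of the CIDR ranges,
// the same idea as a rule's remote_addresses list.
func inRanges(addr string, cidrs []string) (bool, error) {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	for _, c := range cidrs {
		prefix, err := netip.ParsePrefix(c)
		if err != nil {
			return false, err
		}
		if prefix.Contains(ip) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	internal := []string{"100.64.0.0/10"} // 100.64.0.0 through 100.127.255.255
	for _, addr := range []string{"100.100.1.2", "100.128.0.1", "192.0.2.10"} {
		ok, err := inRanges(addr, internal)
		fmt.Println(addr, ok, err)
	}
}
```
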
## Imprint / Impressum support

Anubis has support for showing imprint / impressum information. This is defined in the `impressum` block of your configuration. See [Imprint / Impressum configuration](./configuration/impressum.mdx) for more information.

## Storage backends

Anubis needs to store temporary data in order to determine if a user is legitimate or not. Administrators should choose a storage backend based on their infrastructure needs. Each backend has its own advantages and disadvantages.

Anubis offers the following storage backends:

- [`memory`](#memory) -- A simple in-memory hashmap
- [`bbolt`](#bbolt) -- An on-disk key/value store backed by [bbolt](https://github.com/etcd-io/bbolt), an embedded key/value database for Go programs
- [`s3api`](#s3api) -- A network-backed store using any S3-compatible object storage service
- [`valkey`](#valkey) -- A remote in-memory key/value database backed by [Valkey](https://valkey.io/) (or another database compatible with the [RESP](https://redis.io/docs/latest/develop/reference/protocol-spec/) protocol)

If no storage backend is set in the policy file, Anubis will use the [`memory`](#memory) backend by default. This is equivalent to the following in the policy file:

```yaml
store:
  backend: memory
  parameters: {}
```

### `memory`

The memory backend is an in-memory cache. This backend works best if you don't use multiple instances of Anubis or don't have mutable storage in the environment you're running Anubis in.

| Should I use this backend?                                    | Yes/no |
| :------------------------------------------------------------ | :----- |
| Are you running only one instance of Anubis for this service? | ✅ Yes |
| Does your service get a lot of traffic?                       | 🚫 No  |
| Do you want to store data persistently when Anubis restarts?  | 🚫 No  |
| Do you run Anubis without mutable filesystem storage?         | ✅ Yes |

The biggest downside is that there is currently no limit to how much data can be stored in memory. This will be addressed at a later time.

:::warning
The in-memory backend exists mostly for validation, testing, and to ensure that the default configuration of Anubis works as expected. Do not use this persistently in production.
:::

#### Configuration

The memory backend does not require any configuration to use.

### `bbolt`

An on-disk storage layer powered by [bbolt](https://github.com/etcd-io/bbolt), a high-performance embedded key/value database used by containerd, etcd, Kubernetes, and NATS. This backend works best if you're running Anubis on a single host and get a lot of traffic.

| Should I use this backend?                                    | Yes/no |
| :------------------------------------------------------------ | :----- |
| Are you running only one instance of Anubis for this service? | ✅ Yes |
| Does your service get a lot of traffic?                       | ✅ Yes |
| Do you want to store data persistently when Anubis restarts?  | ✅ Yes |
| Do you run Anubis without mutable filesystem storage?         | 🚫 No  |

When Anubis opens a bbolt database, it takes an exclusive lock on that database. Other instances of Anubis or other tools cannot view the bbolt database while it is locked by another instance of Anubis. If you run multiple instances of Anubis for different services, give each its own `bbolt` configuration with a distinct `path`.

#### Configuration

The `bbolt` backend takes the following configuration options:

| Name   | Type | Example            | Description |
| :----- | :--- | :----------------- | :---------- |
| `path` | path | `/data/anubis.bdb` | The filesystem path for the Anubis bbolt database. Anubis requires write access to the folder containing the bbolt database. |

Example:

If you have persistent storage mounted to `/data`, then your store configuration could look like this:

```yaml
store:
  backend: bbolt
  parameters:
    path: /data/anubis.bdb
```

### `s3api`

A network storage layer backed by [object storage](https://en.wikipedia.org/wiki/Object_storage) and accessed through the [S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/Type_API_Reference.html). This can be backed by any S3-compatible object storage service, such as:

- [AWS S3](https://aws.amazon.com/s3/)
- [Cloudflare R2](https://www.cloudflare.com/developer-platform/products/r2/)
- [Hetzner Object Storage](https://www.hetzner.com/storage/object-storage/)
- [Minio](https://www.min.io/)
- [Tigris](https://www.tigrisdata.com/)

If you are using a cloud platform, it likely provides an S3-compatible object storage service. If not, you may want to choose [one of the fastest options](https://www.tigrisdata.com/blog/benchmark-small-objects/).

| Should I use this backend?                                    | Yes/no |
| :------------------------------------------------------------ | :----- |
| Are you running only one instance of Anubis for this service? | 🚫 No  |
| Does your service get a lot of traffic?                       | ✅ Yes |
| Do you want to store data persistently when Anubis restarts?  | ✅ Yes |
| Do you run Anubis without mutable filesystem storage?         | ✅ Yes |

:::note
Using this backend results in many S3 operations per challenge: at least one each for creating, invalidating, and removing challenges, plus one for updating challenges to prevent double-spends.
:::

#### Configuration

The `s3api` backend takes the following configuration options:

| Name         | Type    | Example       | Description |
| :----------- | :------ | :------------ | :---------- |
| `bucketName` | string  | `anubis-data` | (Required) The name of the dedicated bucket for Anubis to store information in. |
| `pathStyle`  | boolean | `false`       | If true, use path-style S3 API operations. Please consult your storage provider's documentation if you don't know what you should put here. |

:::note
You should probably enable a lifecycle expiration rule for buckets containing Anubis data. Here is an example policy:

```json
{
  "Rules": [
    {
      "Status": "Enabled",
      "Expiration": {
        "Days": 7
      }
    }
  ]
}
```

Adjust this as facts and circumstances demand, but 7 days should be enough for anyone.
:::

Example:

Assuming your environment looks like this:

```sh
# All of the following are fake credentials that look like real ones.
AWS_ACCESS_KEY_ID=accordingToAllKnownRulesOfAviation
AWS_SECRET_ACCESS_KEY=thereIsNoWayABeeShouldBeAbleToFly
AWS_REGION=yow
AWS_ENDPOINT_URL_S3=https://yow.s3.probably-not-malware.lol
```

Then your configuration would look like this:

```yaml
store:
  backend: s3api
  parameters:
    bucketName: techaro-prod-anubis
    pathStyle: false
```

### `valkey`

[Valkey](https://valkey.io/) is an in-memory key/value store that clients access over the network. This allows multiple instances of Anubis to share information and does not require each instance of Anubis to have persistent filesystem storage.

:::note
You can also use [Redis™](http://redis.io/) with Anubis.
:::

This backend is ideal if you are running multiple instances of Anubis in a worker pool (e.g. Kubernetes Deployments with a copy of Anubis in each Pod).

| Should I use this backend?                                    | Yes/no |
| :------------------------------------------------------------ | :----- |
| Are you running only one instance of Anubis for this service? | 🚫 No  |
| Does your service get a lot of traffic?                       | ✅ Yes |
| Do you want to store data persistently when Anubis restarts?  | ✅ Yes |
| Do you run Anubis without mutable filesystem storage?         | ✅ Yes |
| Do you have Redis™ or Valkey installed?                       | ✅ Yes |

#### Configuration

The `valkey` backend takes the following configuration options:

| Name       | Type    | Example                 | Description |
| :--------- | :------ | :---------------------- | :---------- |
| `cluster`  | boolean | `false`                 | If true, use [Redis™ Clustering](https://redis.io/topics/cluster-spec) for storing Anubis data. |
| `sentinel` | object  | `{}`                    | See [Redis™ Sentinel](#redis-sentinel) for more detail. |
| `url`      | string  | `redis://valkey:6379/0` | The URL for the instance of Redis™ or Valkey that Anubis should store data in. This is in the same format as `REDIS_URL` in many cloud providers. |

Example:

If you have an instance of Valkey running with the hostname `valkey.int.techaro.lol`, then your store configuration could look like this:

```yaml
store:
  backend: valkey
  parameters:
    url: "redis://valkey.int.techaro.lol:6379/0"
```

This would have the Valkey client connect to host `valkey.int.techaro.lol` on port `6379` with database `0` (the default database).

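
The host, port, and database number all come from the URL itself. As an illustration only (Anubis hands the URL to its Valkey/Redis client library rather than parsing it like this), here is a Go sketch using `net/url` that splits a `redis://` URL the same way. `redisTarget` is a hypothetical helper. Falling back to port 6379 and database 0 when they are omitted is an assumption based on common Redis client behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// redisTarget splits a redis:// URL into host, port, and database number.
func redisTarget(raw string) (host, port string, db int, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", 0, err
	}
	if u.Scheme != "redis" && u.Scheme != "rediss" {
		return "", "", 0, fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	host, port = u.Hostname(), u.Port()
	if port == "" {
		port = "6379" // assumed default: the conventional Redis/Valkey port
	}
	// The path holds the database number, e.g. "/0"; an empty path means database 0.
	if p := strings.TrimPrefix(u.Path, "/"); p != "" {
		if db, err = strconv.Atoi(p); err != nil {
			return "", "", 0, fmt.Errorf("invalid database number %q: %w", p, err)
		}
	}
	return host, port, db, nil
}

func main() {
	host, port, db, err := redisTarget("redis://valkey.int.techaro.lol:6379/0")
	fmt.Println(host, port, db, err)
}
```
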
#### Redis™ Sentinel

If you are using [Redis™ Sentinel](https://redis.io/docs/latest/operate/oss_and_stack/management/sentinel/) for a high availability setup, you need to configure the `sentinel` object. This object takes the following configuration options:

| Name         | Type                     | Example               | Description |
| :----------- | :----------------------- | :-------------------- | :---------- |
| `addr`       | string or list of string | `10.43.208.130:26379` | (Required) The host and port of the Redis™ Sentinel server. When possible, use DNS names for this. If you have multiple addresses, supply a list of them. |
| `clientName` | string                   | `Anubis`              | The client name reported to Redis™ Sentinel. Set this if you want to track Anubis connections to your Redis™ Sentinel. |
| `masterName` | string                   | `mymaster`            | (Required) The name of the master in the Redis™ Sentinel configuration. This is used to discover where to find client connection hosts/ports. |
| `username`   | string                   | `azurediamond`        | The username used to authenticate against the Redis™ Sentinel and Redis™ servers. |
| `password`   | string                   | `hunter2`             | The password used to authenticate against the Redis™ Sentinel and Redis™ servers. |

## Logging management

Anubis has very verbose logging out of the box. This is intentional: it lets administrators confirm that Anubis is working just by watching its logs in real time. Some administrators may not want this much logging, so Anubis lets you customize how it logs data.

Anubis uses a practice called [structured logging](https://stackify.com/what-is-structured-logging-and-why-developers-need-it/) to emit log messages with key-value pair context. To make analyzing large volumes of log messages easier, Anubis encodes all logs as JSON. This allows you to use any tool that can parse JSON to perform analytics or monitor for issues.

Anubis exposes the following logging settings in the policy file:

| Name         | Type                     | Example         | Description |
| :----------- | :----------------------- | :-------------- | :---------- |
| `level`      | [log level](#log-levels) | `info`          | The logging level threshold. Any logs that are at or above this threshold will be drained to the sink. Any other logs will be discarded. |
| `sink`       | string                   | `stdio`, `file` | The sink that logs are drained to as Anubis records them. |
| `parameters` | object                   |                 | Parameters for the given logging sink. This will vary based on the logging sink of choice. See below for more information. |

Anubis supports the following logging sinks:

1. `file`: logs are emitted to a file that is rotated based on size and age. Old log files are compressed with gzip to save space. This integrates better with legacy service managers (OpenRC, FreeBSD's init, etc.).
2. `stdio`: logs are emitted to the standard error stream of the Anubis process. This allows runtimes such as Docker, Podman, systemd, and Kubernetes to capture logs with their native logging subsystems without any additional configuration.

### Log levels

Anubis uses Go's [standard library `log/slog` package](https://pkg.go.dev/log/slog) to emit structured logs. By default, Anubis logs at the [Info level](https://pkg.go.dev/log/slog#Level), which is fairly verbose out of the box. Here are the possible logging levels in Anubis:

| Log level | Use in Anubis |
| :-------- | :------------ |
| `DEBUG`   | The raw unfiltered torrent of doom. Only use this if you are actively working on Anubis or have very good reasons to use it. |
| `INFO`    | The default logging level, fairly verbose in order to make it easier for automation to parse. |
| `WARN`    | A "more silent" logging level. Much less verbose. Some messages currently logged at `INFO` will be moved up to `WARN` in future releases. |
| `ERROR`   | Only log error messages. |

Additionally, you can set a "slightly higher" log level if you need to, such as:

```yaml
logging:
  sink: stdio
  level: "INFO+1"
```

This isn't currently used by Anubis, but will be in the future for "slightly important" information.

### `file` sink

The `file` sink makes Anubis write its logs to the filesystem and rotate them out when the log file meets certain thresholds. This logging sink takes the following parameters:

| Name           | Type            | Example               | Description |
| :------------- | :-------------- | :-------------------- | :---------- |
| `file`         | string          | `/var/log/anubis.log` | The file where Anubis logs should be written. Make sure the user Anubis runs as has permission to write and create files in this file's directory. |
| `maxBackups`   | number          | `3`                   | The number of old log files to keep when log files are rotated out. |
| `maxBytes`     | number of bytes | `67108864` (64 MiB)   | The maximum size of each log file before it is rotated out. |
| `maxAge`       | number of days  | `7`                   | If a log file is more than this many days old, rotate it out. |
| `compress`     | boolean         | `true`                | If true, compress old log files with gzip. This should be set to `true` and is only exposed as an option for dealing with legacy workflows where there is magical thinking about log files at play. |
| `useLocalTime` | boolean         | `false`               | If true, use the system local time zone to create log filenames instead of UTC. This should almost always be set to `false` and is only exposed for legacy workflows where there is magical thinking about time zones at play. |

```yaml
logging:
  sink: file
  parameters:
    file: "./var/anubis.log"
    maxBackups: 3 # keep 3 old copies
    maxBytes: 67108864 # each file can have up to 64 MiB of logs
    maxAge: 7 # rotate out files older than 7 days
    compress: true # gzip-compress old log files
    useLocalTime: false # timezone for rotated files is UTC
```

When files are rotated out, the old files will be named after the rotation timestamp in [RFC 3339 format](https://www.rfc-editor.org/rfc/rfc3339).

### `stdio` sink

By default, Anubis logs everything to the standard error stream of its process. This requires no configuration:

```yaml
logging:
  sink: stdio
```

If you use a service orchestration platform that does not capture the standard error stream of processes, you need to use a different logging sink.

## Risk calculation for downstream services

If your service needs it for risk calculation, Anubis tells it which rule each request matched by adding the following headers:

| Header            | Explanation                                          | Example          |
| :---------------- | :--------------------------------------------------- | :--------------- |
| `X-Anubis-Rule`   | The name of the rule that was matched                | `bot/lightpanda` |
| `X-Anubis-Action` | The action that Anubis took in response to that rule | `CHALLENGE`      |
| `X-Anubis-Status` | The status and how strict Anubis was in its checks   | `PASS`           |

Policy rules are matched using [Go's standard library regular expressions package](https://pkg.go.dev/regexp). You can experiment with the syntax at [regex101.com](https://regex101.com); make sure to select the Golang flavor.

## Request Weight

Anubis rules can also add or remove "weight" from requests, allowing administrators to configure custom levels of suspicion. For example, if your application uses session tokens named `i_love_gitea`:

```yaml
- name: gitea-session-token
  action: WEIGH
  expression:
    all:
      - '"Cookie" in headers'
      - headers["Cookie"].contains("i_love_gitea=")
  # Remove 5 weight points
  weight:
    adjust: -5
```

This would remove five weight points from the request, which would make Anubis present the [Meta Refresh challenge](./configuration/challenges/metarefresh.mdx) in the default configuration.

### Weight Thresholds

For more information on configuring weight thresholds, see [Weight Threshold Configuration](./configuration/thresholds.mdx).

### Advice

Weight is still very new and needs work. This is an experimental feature and should be treated as such. Here's some advice to help you tune requests:

- The default weight for browser-like clients is 10. This triggers an aggressive challenge.
- Remove and add weight in multiples of five.
- Be careful with how you configure weight.