author | Yuchen Wu <[email protected]> | 2024-02-27 20:25:44 -0800 |
---|---|---|
committer | Yuchen Wu <[email protected]> | 2024-02-27 20:25:44 -0800 |
commit | 8797329225018c4d0ab990166dd020338ae292dc (patch) | |
tree | 1e8d0bf6f3c27e987559f52319d91ff75e4da5cb | |
parent | 0bca116c1027a878469b72352e1e9e3916e85dde (diff) | |
download | pingora-8797329225018c4d0ab990166dd020338ae292dc.tar.gz pingora-8797329225018c4d0ab990166dd020338ae292dc.zip |
Release Pingora version 0.1.0 (tag: v0.1.0)
Co-authored-by: Andrew Hauck <[email protected]>
Co-authored-by: Edward Wang <[email protected]>
279 files changed, 48111 insertions, 18 deletions
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md new file mode 100644 index 0000000..383a2be --- /dev/null +++ b/.github/CONTRIBUTING.md @@ -0,0 +1,51 @@ +# Contributing + +Welcome to Pingora! Before you make a contribution, be it a bug report, documentation improvement, +pull request (PR), etc., please read and follow these guidelines. + +## Start with filing an issue + +More often than not, **start by filing an issue on GitHub**. If you have a bug report or feature +request, open a GitHub issue. Non-trivial PRs will also require a GitHub issue. The issue provides +us with a space to discuss proposed changes with you and the community. + +Having a discussion via GitHub issue upfront is the best way to ensure your contribution lands in +Pingora. We don't want you to spend your time making a PR, only to find that we won't accept it on +a design basis. For example, we may find that your proposed feature works better as a third-party +module built on top of or for use with Pingora and encourage you to pursue that direction instead. + +**You do not need to file an issue for small fixes.** What counts as a "small" or trivial fix is a +judgment call, so here are a few examples to clarify: +- fixing a typo +- refactoring a bit of code +- most documentation or comment edits + +Still, _sometimes_ we may review your PR and ask you to file an issue if we expect there are larger +design decisions to be made. + +## Making a PR + +After you've filed an issue, you can make your PR referencing that issue number. Once you open your +PR, it will be labelled _needs review_. A maintainer will review your PR as soon as they can. The +reviewer may ask for changes - they will mark the PR as _changes requested_ and _work in progress_ +and will give you details about the requested changes. Feel free to ask lots of questions! The +maintainers are there to help you. + +### Caveats + +Currently, internal contributions will take priority. Today Pingora is being maintained by +Cloudflare's Content Delivery team, and internal Cloudflare proxy services are a primary user of +Pingora. We value the community's work on Pingora, but the reality is that our team has a limited +amount of resources and time. We can't promise we will review or address all PRs or issues in a +timely manner. + +## Conduct + +Pingora and Cloudflare OpenSource generally follow the [Contributor Covenant Code of Conduct]. +Violating the CoC could result in a warning or a ban from Pingora or any and all repositories in the Cloudflare organization. + +[Contributor Covenant Code of Conduct]: https://github.com/cloudflare/.github/blob/26b37ca2ba7ab3d91050ead9f2c0e30674d3b91e/CODE_OF_CONDUCT.md + +## Contact + +If you have any questions, please reach out to [[email protected]](mailto:[email protected]). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 0000000..434a12e --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,37 @@ +--- +name: Bug Report +about: Report an issue to help us improve +title: '' +labels: '' +assignees: '' +--- + +## Describe the bug + +A clear and concise description of what the bug is. + +## Pingora info + +Please include the following information about your environment: + +**Pingora version**: release number or commit hash +**Rust version**: e.g. the output of `cargo --version` +**Operating system version**: e.g. Ubuntu 22.04, Debian 12.4 + +## Steps to reproduce + +Please provide step-by-step instructions to reproduce the issue.
Include any relevant code +snippets. + +## Expected results + +What were you expecting to happen? + +## Observed results + +What actually happened? + +## Additional context + +What other information would you like to provide? e.g. screenshots, how you're working around the +issue, or other clues you think could be helpful to identify the root cause. diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 0000000..cc8d785 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,27 @@ +--- +name: Feature request +about: Propose a new feature +title: '' +labels: '' +assignees: '' +--- + +## What is the problem your feature solves, or the need it fulfills? + +A clear and concise description of why this feature should be added. What is the problem? Who is +this for? + +## Describe the solution you'd like + +What do you propose to resolve the problem or fulfill the need above? How would you like it to +work? + +## Describe alternatives you've considered + +What other solutions, features, or workarounds have you considered that might also solve the issue? +What are the tradeoffs for these alternatives compared to what you're proposing? + +## Additional context + +This could include references to documentation or papers, prior art, screenshots, or benchmark +results. diff --git a/.gitignore b/.gitignore @@ -1,6 +1,7 @@ -**/target +Cargo.lock +/target **/*.rs.bk -**/Cargo.lock -**/dhat-heap.json +dhat-heap.json .vscode -.cover
\ No newline at end of file +.idea +.cover diff --git a/.rustfmt.toml b/.rustfmt.toml new file mode 100644 index 0000000..e69de29 --- /dev/null +++ b/.rustfmt.toml diff --git a/Cargo.toml b/Cargo.toml @@ -1,4 +1,37 @@ [workspace] +resolver = "2" members = [ + "pingora", + "pingora-core", + "pingora-pool", + "pingora-error", "pingora-limits", -]
\ No newline at end of file + "pingora-timeout", + "pingora-header-serde", + "pingora-proxy", + "pingora-cache", + "pingora-http", + "pingora-lru", + "pingora-openssl", + "pingora-boringssl", + "pingora-runtime", + "pingora-ketama", + "pingora-load-balancing", + "pingora-memory-cache", + "tinyufo", +] + +[workspace.dependencies] +tokio = "1" +async-trait = "0.1.42" +httparse = "1" +bytes = "1.0" +http = "1.0.0" +log = "0.4" +h2 = ">=0.4.2" +once_cell = "1" +lru = "0" +ahash = ">=0.8.9" + +[profile.bench] +debug = true diff --git a/README.md b/README.md @@ -1,5 +1,65 @@ # Pingora -[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) +![Pingora banner image](./docs/assets/pingora_banner.png) -A library for building fast, reliable and evolvable network services. +## What is Pingora +Pingora is a Rust framework to [build fast, reliable and programmable networked systems](https://blog.cloudflare.com/pingora-open-source). + +Pingora is battle tested, as it has been serving more than 40 million Internet requests per second for [more than a few years](https://blog.cloudflare.com/how-we-built-pingora-the-proxy-that-connects-cloudflare-to-the-internet). + +## Feature highlights +* Async Rust: fast and reliable +* HTTP 1/2 end to end proxy +* TLS over OpenSSL or BoringSSL +* gRPC and websocket proxying +* Graceful reload +* Customizable load balancing and failover strategies +* Support for a variety of observability tools + +## Reasons to use Pingora +* **Security** is your top priority: Pingora is a more memory safe alternative for services that are written in C/C++. +* Your service is **performance-sensitive**: Pingora is fast and efficient. +* Your service requires extensive **customization**: The APIs the Pingora proxy framework provides are highly programmable. + +# Getting started + +See our [quick starting guide](./docs/quick_start.md) to see how easy it is to build a load balancer. + +Our [user guide](./docs/user_guide/index.md) covers more topics such as how to configure and run Pingora servers, as well as how to build custom HTTP server and proxy logic on top of Pingora's framework. + +API docs are also available for all the crates. + +# Notable crates in this workspace +* Pingora: the "public facing" crate to build networked systems and proxies. +* Pingora-core: this crate defines the protocols, functionalities and basic traits. +* Pingora-proxy: the logic and APIs to build HTTP proxies. +* Pingora-error: the common error type used across Pingora crates +* Pingora-http: the HTTP header definitions and APIs +* Pingora-openssl & pingora-boringssl: SSL related extensions and APIs +* Pingora-ketama: the [Ketama](https://github.com/RJ/ketama) consistent hashing algorithm +* Pingora-limits: efficient counting algorithms +* Pingora-load-balancing: load balancing algorithm extensions for pingora-proxy +* Pingora-memory-cache: Async in-memory caching with cache lock to prevent cache stampede. +* Pingora-timeout: A more efficient async timer system. +* TinyUfo: The caching algorithm behind pingora-memory-cache. + +# System requirements + +## Systems +Linux is our tier 1 environment and main focus. + +We will try our best to have most code compile for Unix environments, so that developers and users have an easier time developing with Pingora in Unix-like environments like macOS (though some features might be missing). + +Both x86_64 and aarch64 architectures will be supported.
+ +## Rust version + +Pingora keeps a rolling MSRV (minimum supported Rust version) policy of 6 months. This means we will accept PRs that upgrade the MSRV as long as the new Rust version used is at least 6 months old. + +Our current MSRV is 1.72. + +# Contributing +Please see our [contribution guidelines](./.github/CONTRIBUTING.md). + +# License +This project is licensed under [Apache License, Version 2.0](./LICENSE). diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..06e1394 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,14 @@ +# Pingora User Manual + +## Quick Start +In this section we show you how to build a barebones load balancer. + +[Read the quick start here.](quick_start.md) + +## User Guide +Covers how to configure and run Pingora servers, as well as how to build custom HTTP server and proxy logic on top of Pingora's framework. + +[Read the user guide here.](user_guide/index.md) + +## API Reference +TBD diff --git a/docs/assets/pingora_banner.png b/docs/assets/pingora_banner.png Binary files differnew file mode 100644 index 0000000..43c6932 --- /dev/null +++ b/docs/assets/pingora_banner.png diff --git a/docs/quick_start.md b/docs/quick_start.md new file mode 100644 index 0000000..c59add2 --- /dev/null +++ b/docs/quick_start.md @@ -0,0 +1,324 @@ +# Quick Start: load balancer + +## Introduction + +This quick start shows how to build a bare-bones load balancer using pingora and pingora-proxy. + +The goal of the load balancer is, for every incoming HTTP request, to select one of two backends, https://1.1.1.1 or https://1.0.0.1, in a round-robin fashion. + +## Build a basic load balancer + +Create a new cargo project for our load balancer. Let's call it `load_balancer`. + +``` +cargo new load_balancer +``` + +### Include the Pingora Crate and Basic Dependencies + +In your project's `Cargo.toml` file, add the following to your dependencies: +``` +async-trait="0.1" +pingora = { version = "0.1", features = [ "lb" ] } +``` + +### Create a pingora server +First, let's create a pingora server. A pingora `Server` is a process which can host one or many +services. The pingora `Server` takes care of configuration and CLI argument parsing, daemonization, +signal handling, and graceful restart or shutdown. + +The preferred usage is to initialize the `Server` in the `main()` function and +use `run_forever()` to spawn all the runtime threads and block the main thread until the server is +ready to exit. + + +```rust +use async_trait::async_trait; +use pingora::prelude::*; +use std::sync::Arc; + +fn main() { + let mut my_server = Server::new(None).unwrap(); + my_server.bootstrap(); + my_server.run_forever(); +} +``` + +This will compile and run, but it doesn't do anything interesting. + +### Create a load balancer proxy +Next, let's create a load balancer. Our load balancer holds a static list of upstream IPs. The `pingora-load-balancing` crate already provides the `LoadBalancer` struct with common selection algorithms such as round robin and hashing. So let’s just use it. If the use case requires more sophisticated or customized server selection logic, users can implement it themselves in the `upstream_peer()` callback introduced below. + + +```rust +pub struct LB(Arc<LoadBalancer<RoundRobin>>); +``` + +In order to make the server a proxy, we need to implement the `ProxyHttp` trait for it. + +Any object that implements the `ProxyHttp` trait essentially defines how a request is handled in +the proxy.
The only required method in the `ProxyHttp` trait is `upstream_peer()`, which returns +the address where the request should be proxied to. + +In the body of `upstream_peer()`, let's use the `select()` method of the `LoadBalancer` to round-robin across the upstream IPs. In this example we use HTTPS to connect to the backends, so we also need to enable `use_tls` and set the SNI when constructing our [`Peer`](peer.md) object. + +```rust +#[async_trait] +impl ProxyHttp for LB { + + /// For this small example, we don't need context storage + type CTX = (); + fn new_ctx(&self) -> () { + () + } + + async fn upstream_peer(&self, _session: &mut Session, _ctx: &mut ()) -> Result<Box<HttpPeer>> { + let upstream = self.0 + .select(b"", 256) // hash doesn't matter for round robin + .unwrap(); + + println!("upstream peer is: {upstream:?}"); + + // Set SNI to one.one.one.one + let peer = Box::new(HttpPeer::new(upstream, true, "one.one.one.one".to_string())); + Ok(peer) + } +} +``` + +In order for the 1.1.1.1 backends to accept our requests, a `Host` header must be present. Adding this header +can be done in the `upstream_request_filter()` callback, which modifies the request header after +the connection to the backend is established and before the request header is sent. + +```rust +impl ProxyHttp for LB { + // ... + async fn upstream_request_filter( + &self, + _session: &mut Session, + upstream_request: &mut RequestHeader, + _ctx: &mut Self::CTX, + ) -> Result<()> { + upstream_request.insert_header("Host", "one.one.one.one").unwrap(); + Ok(()) + } +} +``` + + +### Create a pingora-proxy service +Next, let's create a proxy service that follows the instructions of the load balancer above. + +A pingora `Service` listens to one or multiple (TCP or Unix domain socket) endpoints. When a new connection is established, +the `Service` hands the connection over to its "application." `pingora-proxy` is such an application, +which proxies the HTTP request to the given backend as configured above. + +In the example below, we create a `LB` instance with two backends: `1.1.1.1:443` and `1.0.0.1:443`. +We put that `LB` instance into a proxy `Service` via the `http_proxy_service()` call and then tell our +`Server` to host that proxy `Service`. + +```rust +fn main() { + let mut my_server = Server::new(None).unwrap(); + my_server.bootstrap(); + + let upstreams = + LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443"]).unwrap(); + + let mut lb = http_proxy_service(&my_server.configuration, LB(Arc::new(upstreams))); + lb.add_tcp("0.0.0.0:6188"); + + my_server.add_service(lb); + + my_server.run_forever(); +} +``` + +### Run it + +Now that we have added the load balancer to the service, we can run our new +project with + +```cargo run``` + +To test it, simply send the server a few requests with the command: +``` +curl 127.0.0.1:6188 -svo /dev/null +``` + +You can also navigate your browser to [http://localhost:6188](http://localhost:6188) + +The following output shows that the load balancer is doing its job to balance across the two backends: +``` +upstream peer is: Backend { addr: Inet(1.0.0.1:443), weight: 1 } +upstream peer is: Backend { addr: Inet(1.1.1.1:443), weight: 1 } +upstream peer is: Backend { addr: Inet(1.0.0.1:443), weight: 1 } +upstream peer is: Backend { addr: Inet(1.1.1.1:443), weight: 1 } +upstream peer is: Backend { addr: Inet(1.0.0.1:443), weight: 1 } +... +``` + +Well done! At this point you have a functional load balancer.
It is a _very_ +basic load balancer though, so the next section will walk you through how to +make it more robust with some built-in pingora tooling. + +## Add functionality + +Pingora provides several helpful features that can be enabled and configured +with just a few lines of code. These range from simple peer health checks to +the ability to seamlessly update the running binary with zero service interruptions. + +### Peer health checks + +To make our load balancer more reliable, we would like to add some health checks +to our upstream peers. That way if there is a peer that has gone down, we can +quickly stop routing our traffic to that peer. + +First let's see how our simple load balancer behaves when one of the peers is +down. To do this, we'll update the list of peers to include a peer that is +guaranteed to be broken. + +```rust +fn main() { + // ... + let upstreams = + LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443", "127.0.0.1:343"]).unwrap(); + // ... +} +``` + +Now if we run our load balancer again with `cargo run` and test it with + +``` +curl 127.0.0.1:6188 -svo /dev/null +``` + +We can see that one in every three requests fails with `502: Bad Gateway`. This is +because our peer selection is strictly following the `RoundRobin` selection +pattern we gave it with no consideration for whether that peer is healthy. We can +fix this by adding a basic health check service. + +```rust +fn main() { + let mut my_server = Server::new(None).unwrap(); + my_server.bootstrap(); + + // Note that upstreams needs to be declared as `mut` now + let mut upstreams = + LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443", "127.0.0.1:343"]).unwrap(); + + let hc = TcpHealthCheck::new(); + upstreams.set_health_check(hc); + upstreams.health_check_frequency = Some(std::time::Duration::from_secs(1)); + + let background = background_service("health check", upstreams); + let upstreams = background.task(); + + // `upstreams` no longer needs to be wrapped in an `Arc` + let mut lb = http_proxy_service(&my_server.configuration, LB(upstreams)); + lb.add_tcp("0.0.0.0:6188"); + + my_server.add_service(background); + + my_server.add_service(lb); + my_server.run_forever(); +} +``` + +Now if we again run and test our load balancer, we see that all requests +succeed and the broken peer is never used. Based on the configuration we used, +if that peer were to become healthy again, it would be re-included in the round +robin within 1 second. + +### Command line options + +The pingora `Server` type provides a lot of built-in functionality that we can +take advantage of with a single-line change. + +```rust +fn main() { + let mut my_server = Server::new(Some(Opt::default())).unwrap(); + ... +} +``` + +With this change, the command-line arguments passed to our load balancer will be +consumed by Pingora. We can test this by running: + +``` +cargo run -- -h +``` + +We should see a help menu with the list of arguments now available to us. We +will take advantage of those in the next sections to do more with our load +balancer for free. + +### Running in the background + +Passing the parameter `-d` or `--daemon` will tell the program to run in the background. + +``` +cargo run -- -d +``` + +To stop this service, you can send the `SIGTERM` signal to it for a graceful shutdown, in which the service will stop accepting new requests but try to finish all ongoing requests before exiting. +``` +pkill -SIGTERM load_balancer +```
+ +### Configurations +Pingora configuration files help define how to run the service. Here is an +example config file that defines how many threads the service can have, the +location of the pid file, the error log file, and the upgrade coordination +socket (which we will explain later). Copy the contents below and put them into +a file called `conf.yaml` in your `load_balancer` project directory. + +```yaml +--- +version: 1 +threads: 2 +pid_file: /tmp/load_balancer.pid +error_log: /tmp/load_balancer_err.log +upgrade_sock: /tmp/load_balancer.sock +``` + +To use this conf file: +``` +RUST_LOG=INFO cargo run -- -c conf.yaml -d +``` +`RUST_LOG=INFO` is here so that the service actually populate the error log. + +Now you can find the pid of the service. +``` + cat /tmp/load_balancer.pid +``` + +### Gracefully upgrade the service +(Linux only) + +Let's say we changed the code of the load balancer, recompiled the binary. Now we want to upgrade the service running in the background to this newer version. + +If we simply stop the old service, then start the new one, some request arriving in between could be lost. Fortunately, Pingora provides a graceful way to upgrade the service. + +This is done by, first, send `SIGQUIT` signal to the running server, and then start the new server with the parameter `-u` \ `--upgrade`. + +``` +pkill -SIGQUIT load_balancer &&\ +RUST_LOG=INFO cargo run -- -c conf.yaml -d -u +``` + +In this process, The old running server will wait and hand over its listening sockets to the new server. Then the old server runs until all its ongoing requests finish. + +From a client's perspective, the service is always running because the listening socket is never closed. + +## Full examples + +The full code for this example is available in this repository under + +[pingora-proxy/examples/load_balancer.rs](../pingora-proxy/examples/load_balancer.rs) + +Other examples that you may find helpful are also available here + +[pingora-proxy/examples/](../pingora-proxy/examples/) +[pingora/examples](../pingora/examples/)
\ No newline at end of file diff --git a/docs/user_guide/conf.md b/docs/user_guide/conf.md new file mode 100644 index 0000000..6e13609 --- /dev/null +++ b/docs/user_guide/conf.md @@ -0,0 +1,33 @@ +# Configuration + +A Pingora configuration file is a list of Pingora settings in YAML format. + +Example: +```yaml +--- +version: 1 +threads: 2 +pid_file: /run/pingora.pid +upgrade_sock: /tmp/pingora_upgrade.sock +user: nobody +group: webusers +``` +## Settings +| Key | meaning | value type | +| ------------- |-------------| ----| +| version | the version of the conf, currently it is a constant `1` | number | +| pid_file | the path to the pid file | string | +| daemon | whether to run the server in the background | bool | +| error_log | the path to the error log output file. STDERR is used if not set | string | +| upgrade_sock | the path to the upgrade socket | string | +| threads | number of threads per service | number | +| user | the user the pingora server should be run under after daemonization | string | +| group | the group the pingora server should be run under after daemonization | string | +| client_bind_to_ipv4 | source IPv4 addresses to bind to when connecting to server | list of string | +| client_bind_to_ipv6 | source IPv6 addresses to bind to when connecting to server | list of string | +| ca_file | the path to the root CA file | string | +| work_stealing | enable work stealing runtime (default true). See Pingora runtime (WIP) section for more info | bool | +| upstream_keepalive_pool_size | the number of total connections to keep in the connection pool | number | + +## Extension +Any unknown settings will be ignored. This allows extending the conf file to add and pass user defined settings. See User defined configuration section. diff --git a/docs/user_guide/ctx.md b/docs/user_guide/ctx.md new file mode 100644 index 0000000..2d585e6 --- /dev/null +++ b/docs/user_guide/ctx.md @@ -0,0 +1,116 @@ +# Sharing state across phases with `CTX` + +## Using `CTX` +The custom filters users implement in different phases of the request don't interact with each other directly. In order to share information and state across the filters, users can define a `CTX` struct. Each request owns a single `CTX` object. All the filters are able to read and update members of the `CTX` object. The `CTX` object will be dropped at the end of the request. + +### Example + +In the following example, the proxy parses the request header in the `request_filter` phase and stores a boolean flag so that later, in the `upstream_peer` phase, the flag can be used to decide which server to route traffic to. (Technically, the header could be parsed in the `upstream_peer` phase as well; we do it in an earlier phase just for demonstration.)
+ +```Rust +pub struct MyProxy(); + +pub struct MyCtx { + beta_user: bool, +} + +fn check_beta_user(req: &pingora_http::RequestHeader) -> bool { + // some simple logic to check if user is beta + req.headers.get("beta-flag").is_some() +} + +#[async_trait] +impl ProxyHttp for MyProxy { + type CTX = MyCtx; + fn new_ctx(&self) -> Self::CTX { + MyCtx { beta_user: false } + } + + async fn request_filter(&self, session: &mut Session, ctx: &mut Self::CTX) -> Result<bool> { + ctx.beta_user = check_beta_user(session.req_header()); + Ok(false) + } + + async fn upstream_peer( + &self, + _session: &mut Session, + ctx: &mut Self::CTX, + ) -> Result<Box<HttpPeer>> { + let addr = if ctx.beta_user { + info!("I'm a beta user"); + ("1.0.0.1", 443) + } else { + ("1.1.1.1", 443) + }; + + let peer = Box::new(HttpPeer::new(addr, true, "one.one.one.one".to_string())); + Ok(peer) + } +} +``` + +## Sharing state across requests +Sharing state such as counters, caches and other info across requests is common. There is nothing special needed for sharing resources and data across requests in Pingora. `Arc`, `static` or any other mechanism can be used. + + +### Example +Let's modify the example above to track the number of beta visitors as well as the number of total visitors. The counters can either be defined in the `MyProxy` struct itself or defined as a global variable. Because the counters can be concurrently accessed, a `Mutex` is used here. + +```Rust +// global counter +static REQ_COUNTER: Mutex<usize> = Mutex::new(0); + +pub struct MyProxy { + // counter for the service + beta_counter: Mutex<usize>, // AtomicUsize works too +} + +pub struct MyCtx { + beta_user: bool, +} + +fn check_beta_user(req: &pingora_http::RequestHeader) -> bool { + // some simple logic to check if user is beta + req.headers.get("beta-flag").is_some() +} + +#[async_trait] +impl ProxyHttp for MyProxy { + type CTX = MyCtx; + fn new_ctx(&self) -> Self::CTX { + MyCtx { beta_user: false } + } + + async fn request_filter(&self, session: &mut Session, ctx: &mut Self::CTX) -> Result<bool> { + ctx.beta_user = check_beta_user(session.req_header()); + Ok(false) + } + + async fn upstream_peer( + &self, + _session: &mut Session, + ctx: &mut Self::CTX, + ) -> Result<Box<HttpPeer>> { + let mut req_counter = REQ_COUNTER.lock().unwrap(); + *req_counter += 1; + + let addr = if ctx.beta_user { + let mut beta_count = self.beta_counter.lock().unwrap(); + *beta_count += 1; + info!("I'm a beta user #{beta_count}"); + ("1.0.0.1", 443) + } else { + info!("I'm user #{req_counter}"); + ("1.1.1.1", 443) + }; + + let peer = Box::new(HttpPeer::new(addr, true, "one.one.one.one".to_string())); + Ok(peer) + } +} +``` + +The complete example can be found under [`pingora-proxy/examples/ctx.rs`](../../pingora-proxy/examples/ctx.rs). You can run it using `cargo`: +``` +RUST_LOG=INFO cargo run --example ctx +```
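The comment above notes that `AtomicUsize` works too. As a point of comparison, here is a small sketch of ours (not part of the original example) that keeps the same two counters lock-free with atomics; `Ordering::Relaxed` is sufficient when the counts are only used for logging:

```Rust
use std::sync::atomic::{AtomicUsize, Ordering};

// global counter, no lock required
static REQ_COUNTER: AtomicUsize = AtomicUsize::new(0);

pub struct MyProxy {
    // per-service counter
    beta_counter: AtomicUsize,
}

impl MyProxy {
    // hypothetical helper that upstream_peer() could call
    fn count_request(&self, beta_user: bool) -> usize {
        let req_count = REQ_COUNTER.fetch_add(1, Ordering::Relaxed) + 1;
        if beta_user {
            self.beta_counter.fetch_add(1, Ordering::Relaxed);
        }
        req_count
    }
}
```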
\ No newline at end of file diff --git a/docs/user_guide/daemon.md b/docs/user_guide/daemon.md new file mode 100644 index 0000000..f5f8d4a --- /dev/null +++ b/docs/user_guide/daemon.md @@ -0,0 +1,7 @@ +# Daemonization + +When a Pingora server is configured to run as a daemon, after its bootstrapping, it will move itself to the background and optionally change to run under the configured user and group. The `pid_file` option comes in handy in this case for the user to track the PID of the daemon in the background. + +Daemonization also allows the server to perform privileged actions like loading secrets and then switch to an unprivileged user before accepting any requests from the network. + +This process happens in the `run_forever()` call. Because daemonization involves `fork()`, certain things like threads created before this call are likely lost. diff --git a/docs/user_guide/error_log.md b/docs/user_guide/error_log.md new file mode 100644 index 0000000..6dd0830 --- /dev/null +++ b/docs/user_guide/error_log.md @@ -0,0 +1,13 @@ +# Error logging + +Pingora libraries are built to expect issues like disconnects, timeouts and invalid inputs from the network. A common way to record these issues is to output them in an error log (STDERR or log files). + +## Log level guidelines +Pingora adopts the idea behind [log](https://docs.rs/log/latest/log/). There are five log levels: +* `error`: This level should be used when the error stops the request from being handled correctly. For example when the server we try to connect to is offline. +* `warning`: This level should be used when an error occurs but the system recovers from it. For example when the primary DNS timed out but the system is able to query the secondary DNS. +* `info`: Pingora logs when the server is starting up or shutting down. +* `debug`: Internal details. This log level is not compiled in `release` builds. +* `trace`: Fine-grained internal details. This log level is not compiled in `release` builds. + +The pingora-proxy crate has a well-defined interface to log errors, so that users don't have to manually log common proxy errors. See its guide for more details. diff --git a/docs/user_guide/errors.md b/docs/user_guide/errors.md new file mode 100644 index 0000000..b0a9b82 --- /dev/null +++ b/docs/user_guide/errors.md @@ -0,0 +1,53 @@ +# How to return errors + +For easy error handling, the `pingora-error` crate exports a custom `Result` type used throughout other Pingora crates. + +The `Error` struct used in this `Result`'s error variant is a wrapper around arbitrary error types. It allows the user to tag the source of the underlying error and attach other custom context info. + +Users will often need to return errors by propagating an existing error or creating a wholly new one. `pingora-error` makes this easy with its error building functions.
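As a quick taste before the fuller example below, here is a minimal sketch of the two main builders described in the guidelines further down this page; the exact argument orders are our assumption, so treat it as illustrative rather than authoritative:

```Rust
use pingora_error::{Error, ErrorType::ConnectTimedout, Result};

// create a brand-new error with an associated type and context, no cause
fn deny() -> Result<()> {
    Err(Error::explain(ConnectTimedout, "upstream probing disabled"))
}

// wrap a causing error (here a std::io::Error) with a type and more context
fn connect(io_result: std::io::Result<()>) -> Result<()> {
    io_result.map_err(|e| Error::because(ConnectTimedout, "while connecting to upstream", e))
}
```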
+ +## Examples + +For example, one could return an error when an expected header is not present: + +```rust +fn validate_req_header(req: &RequestHeader) -> Result<()> { + // validate that the `host` header exists + req.headers + .get(http::header::HOST) + .ok_or_else(|| Error::explain(InvalidHTTPHeader, "No host header detected"))?; + Ok(()) +} + +impl MyServer { + pub async fn handle_request_filter( + &self, + http_session: &mut Session, + ctx: &mut CTX, + ) -> Result<bool> { + validate_req_header(http_session.req_header()).or_err(HTTPStatus(400), "Missing required headers")?; + Ok(true) + } +} +``` + +`validate_req_header` returns an `Error` if the `host` header is not found, using `Error::explain` to create a new `Error` along with an associated type (`InvalidHTTPHeader`) and helpful context that may be logged in an error log. + +This error will eventually propagate to the request filter, where it is returned as a new `HTTPStatus` error using `or_err`. (As part of the default pingora-proxy `fail_to_proxy()` phase, not only will this error be logged, but it will result in sending a `400 Bad Request` response downstream.) + +Note that the original causing error will be visible in the error logs as well. `or_err` wraps the original causing error in a new one with additional context, but `Error`'s `Display` implementation also prints the chain of causing errors. + +## Guidelines + +An error has a _type_ (e.g. `ConnectionClosed`), a _source_ (e.g. `Upstream`, `Downstream`, `Internal`), and optionally, a _cause_ (another wrapped error) and a _context_ (arbitrary user-provided string details). + +A minimal error can be created using functions like `new_in` / `new_up` / `new_down`, each of which specifies a source and asks the user to provide a type. + +Generally speaking: +* To create a new error, without a direct cause but with more context, use `Error::explain`. You can also use `explain_err` on a `Result` to replace the potential error inside it with a new one. +* To wrap a causing error in a new one with more context, use `Error::because`. You can also use `or_err` on a `Result` to replace the potential error inside it by wrapping the original one. + +## Retry + +Errors can be "retry-able." If the error is retry-able, pingora-proxy will be allowed to retry the upstream request. Some errors are only retry-able on [reused connections](pooling.md), e.g. to handle situations where the remote end has dropped a connection we attempted to reuse. + +By default a newly created `Error` either takes on its direct causing error's retry status, or, if left unspecified, is considered not retry-able. diff --git a/docs/user_guide/failover.md b/docs/user_guide/failover.md new file mode 100644 index 0000000..5783256 --- /dev/null +++ b/docs/user_guide/failover.md @@ -0,0 +1,67 @@ +# Handling failures and failover + +Pingora-proxy allows users to define how to handle failures throughout the life of a proxied request. + +When a failure happens before the response header is sent downstream, users have a few options: +1. Send an error page downstream and then give up. +2. Retry the same upstream again. +3. Try another upstream if applicable. + +Otherwise, once the response header is already sent downstream, there is nothing the proxy can do other than logging an error and then giving up on the request. + + +## Retry / Failover +In order to implement retry or failover, `fail_to_connect()` / `error_while_proxy()` needs to mark the error as "retry-able."
For failover, `fail_to_connect() / error_while_proxy()` also needs to update the `CTX` to tell `upstream_peer()` not to use the same `Peer` again. + +### Safety +In general, idempotent HTTP requests, e.g., `GET`, are safe to retry. Other requests, e.g., `POST`, are not safe to retry if the requests have already been sent. When `fail_to_connect()` is called, pingora-proxy guarantees that nothing was sent upstream. It is not recommended to retry a non-idempotent request after `error_while_proxy()` unless you know the upstream server well enough to know whether it is safe. + +### Example +In the following example we set a `tries` variable on the `CTX` to track how many connection attempts we've made. When setting our peer in `upstream_peer` we check if `tries` is less than one and connect to 192.0.2.1. On connect failure we increment `tries` in `fail_to_connect` and call `e.set_retry(true)`, which tells Pingora this is a retryable error. On retry we enter `upstream_peer` again and this time connect to 1.1.1.1. If we're unable to connect to 1.1.1.1 we return a 502 since we only set `e.set_retry(true)` in `fail_to_connect` when `tries` is zero. + +```Rust +pub struct MyProxy(); + +pub struct MyCtx { + tries: usize, +} + +#[async_trait] +impl ProxyHttp for MyProxy { + type CTX = MyCtx; + fn new_ctx(&self) -> Self::CTX { + MyCtx { tries: 0 } + } + + fn fail_to_connect( + &self, + _session: &mut Session, + _peer: &HttpPeer, + ctx: &mut Self::CTX, + mut e: Box<Error>, + ) -> Box<Error> { + if ctx.tries > 0 { + return e; + } + ctx.tries += 1; + e.set_retry(true); + e + } + + async fn upstream_peer( + &self, + _session: &mut Session, + ctx: &mut Self::CTX, + ) -> Result<Box<HttpPeer>> { + let addr = if ctx.tries < 1 { + ("192.0.2.1", 443) + } else { + ("1.1.1.1", 443) + }; + + let mut peer = Box::new(HttpPeer::new(addr, true, "one.one.one.one".to_string())); + peer.options.connection_timeout = Some(Duration::from_millis(100)); + Ok(peer) + } +} +``` diff --git a/docs/user_guide/graceful.md b/docs/user_guide/graceful.md new file mode 100644 index 0000000..1f67aa3 --- /dev/null +++ b/docs/user_guide/graceful.md @@ -0,0 +1,19 @@ +# Graceful restart and shutdown + +Graceful restart, upgrade, and shutdown mechanisms are very commonly used to avoid errors or downtime when releasing new versions of pingora servers. + +Pingora's graceful upgrade mechanism guarantees the following: +* A request is guaranteed to be handled either by the old server instance or the new one. No request will see connection refused when trying to connect to the server endpoints. +* A request that can finish within the grace period is guaranteed not to be terminated. + +## How to gracefully upgrade +### Step 0 +Configure the upgrade socket. The old and new servers need to agree on the same path to this socket. See the configuration manual for details. + +### Step 1 +Start the new instance with the `--upgrade` cli option. The new instance will not try to listen to the service endpoint right away. It will try to acquire the listening socket from the old instance instead. + +### Step 2 +Send the SIGQUIT signal to the old instance. The old instance will start to transfer the listening socket to the new instance. + +Once step 2 is successful, the new instance will start to handle new incoming connections right away. Meanwhile, the old instance will enter its graceful shutdown mode. It waits a short period of time (to give the new instance time to initialize and prepare to handle traffic), after which it will not accept any new connections.
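Putting the steps together, an upgrade session might look like the sketch below; the binary name and the pid file path are just the ones used in the quick start guide and should be replaced with your own:

```
# Step 1: start the new instance; with --upgrade it waits to receive the
# listening sockets from the old instance instead of binding them itself
./load_balancer -c conf.yaml -d -u

# Step 2: ask the old instance to transfer its sockets and finish gracefully
kill -SIGQUIT $(cat /tmp/load_balancer.pid)
```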
diff --git a/docs/user_guide/index.md b/docs/user_guide/index.md new file mode 100644 index 0000000..a8abcb1 --- /dev/null +++ b/docs/user_guide/index.md @@ -0,0 +1,31 @@ +# User Guide + +In this guide, we will cover the most used features, operations and settings of Pingora. + +## Running Pingora servers +* [Start and stop](start_stop.md) +* [Graceful restart and graceful shutdown](graceful.md) +* [Configuration](conf.md) +* [Daemonization](daemon.md) +* [Systemd integration](systemd.md) +* [Handling panics](panic.md) +* [Error logging](error_log.md) +* [Prometheus](prom.md) + +## Building HTTP proxies +* [Life of a request: `pingora-proxy` phases and filters](phase.md) +* [`Peer`: how to connect to upstream](peer.md) +* [Sharing state across phases with `CTX`](ctx.md) +* [How to return errors](errors.md) +* [Examples: take control of the request](modify_filter.md) +* [Connection pooling and reuse](pooling.md) +* [Handling failures and failover](failover.md) + +## Advanced topics (WIP) +* [Pingora internals](internals.md) +* Using BoringSSL +* User defined configuration
* Pingora async runtime and threading model
* Background Service
* Blocking code in async context
* Tracing diff --git a/docs/user_guide/internals.md b/docs/user_guide/internals.md new file mode 100644 index 0000000..0655379 --- /dev/null +++ b/docs/user_guide/internals.md @@ -0,0 +1,256 @@ +# Pingora Internals + +(Special thanks to [James Munns](https://github.com/jamesmunns) for writing this section) + + +## Starting the `Server` + +The pingora system starts by spawning a *server*. The server is responsible for starting *services*, and listening for termination events. + +``` + ┌───────────┐
 ┌─────────>│ Service │
 │ └───────────┘
┌────────┐ │ ┌───────────┐
│ Server │──Spawns──┼─────────>│ Service │
└────────┘ │ └───────────┘
 │ ┌───────────┐
 └─────────>│ Service │
 └───────────┘
```

After spawning the *services*, the server continues to listen for a termination event, which it will propagate to the created services.

## Services

*Services* are entities that listen on given sockets and perform the core functionality. A *service* is tied to a particular protocol and set of options.

> NOTE: there are also "background" services, which just do *stuff*, and aren't necessarily listening to a socket. For now we're just talking about listener services.

Each service has its own threadpool/tokio runtime, with a number of threads based on the configured value. Worker threads are not shared across services. Service runtime threadpools may be work-stealing (tokio-default), or non-work-stealing (N isolated single threaded runtimes).

```
┌─────────────────────────┐
│ ┌─────────────────────┐ │
│ │┌─────────┬─────────┐│ │
│ ││ Conn │ Conn ││ │
│ │├─────────┼─────────┤│ │
│ ││Endpoint │Endpoint ││ │
│ │├─────────┴─────────┤│ │
│ ││ Listeners ││ │
│ │├─────────┬─────────┤│ │
│ ││ Worker │ Worker ││ │
│ ││ Thread │ Thread ││ │
│ │├─────────┴─────────┤│ │
│ ││ Tokio Executor ││ │
│ │└───────────────────┘│ │
│ └─────────────────────┘ │
│ ┌───────┐ │
└─┤Service├───────────────┘
 └───────┘
```

## Service Listeners

At startup, each Service is assigned a set of downstream endpoints that it listens on. A single service may listen to more than one endpoint. The Server also passes along any relevant configuration, including TLS settings where applicable.

These endpoints are converted into listening sockets, called `TransportStack`s.
Each `TransportStack` is assigned to an async task within that service's executor. + +``` + ┌───────────────────┐ + │┌─────────────────┐│ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ + ┌─────────┐ ││ TransportStack ││ ┌────────────────────┐│ +┌┤Listeners├────────┐ ││ ││ │ │ ││ │ +│└─────────┘ │ ││ (Listener, TLS │├──────spawn(run_endpoint())────>│ Service<ServerApp> ││ +│┌─────────────────┐│ ││ Acceptor, ││ │ │ ││ │ +││ Endpoint ││ ││ UpgradeFDs) ││ └────────────────────┘│ +││ addr/ports ││ │├─────────────────┤│ │ │ │ +││ + TLS Settings ││ ││ TransportStack ││ ┌────────────────────┐│ +│├─────────────────┤│ ││ ││ │ │ ││ │ +││ Endpoint ││──build()─> ││ (Listener, TLS │├──────spawn(run_endpoint())────>│ Service<ServerApp> ││ +││ addr/ports ││ ││ Acceptor, ││ │ │ ││ │ +││ + TLS Settings ││ ││ UpgradeFDs) ││ └────────────────────┘│ +│├─────────────────┤│ │├─────────────────┤│ │ │ │ +││ Endpoint ││ ││ TransportStack ││ ┌────────────────────┐│ +││ addr/ports ││ ││ ││ │ │ ││ │ +││ + TLS Settings ││ ││ (Listener, TLS │├──────spawn(run_endpoint())────>│ Service<ServerApp> ││ +│└─────────────────┘│ ││ Acceptor, ││ │ │ ││ │ +└───────────────────┘ ││ UpgradeFDs) ││ └────────────────────┘│ + │└─────────────────┘│ │ ┌───────────────┐ │ │ ┌──────────────┐ + └───────────────────┘ ─│start_service()│─ ─ ─ ─│ Worker Tasks ├ ─ ─ ┘ + └───────────────┘ └──────────────┘ +``` + +## Downstream connection lifecycle + +Each service processes incoming connections by spawning a task-per-connection. These connections are held open +as long as there are new events to be handled. + +``` + ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ + + │ ┌───────────────┐ ┌────────────────┐ ┌─────────────────┐ ┌─────────────┐ │ +┌────────────────────┐ │ UninitStream │ │ Service │ │ App │ │ Task Ends │ +│ │ │ │ ::handshake() │──>│::handle_event()│──>│ ::process_new() │──┬>│ │ │ +│ Service<ServerApp> │──spawn()──> └───────────────┘ └────────────────┘ └─────────────────┘ │ └─────────────┘ +│ │ │ ▲ │ │ +└────────────────────┘ │ while + │ └─────────reuse │ + ┌───────────────────────────┐ + └ ─│ Task on Service Runtime │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ + └───────────────────────────┘ +``` + +## What is a proxy then? + +Interestingly, the `pingora` `Server` itself has no particular notion of a Proxy. + +Instead, it only thinks in terms of `Service`s, which are expected to contain a particular implementor of the `ServiceApp` trait. + +For example, this is how an `HttpProxy` struct, from the `pingora-proxy` crate, "becomes" a `Service` spawned by the `Server`: + +``` +┌─────────────┐ +│ HttpProxy │ +│ (struct) │ +└─────────────┘ + │ + implements ┌─────────────┐ + │ │HttpServerApp│ + └───────>│ (trait) │ + └─────────────┘ + │ + implements ┌─────────────┐ + │ │ ServerApp │ + └───────>│ (trait) │ + └─────────────┘ + │ + contained ┌─────────────────────┐ + within │ │ + └───────>│ Service<ServiceApp> │ + │ │ + └─────────────────────┘ +``` + +Different functionality and helpers are provided at different layers in this representation. 
+

```
┌─────────────┐ ┌──────────────────────────────────────┐
│ HttpProxy │ │Handles high level Proxying workflow, │
│ (struct) │─ ─ ─ ─ │ customizable via ProxyHttp trait │
└──────┬──────┘ └──────────────────────────────────────┘
 │
┌──────▼──────┐ ┌──────────────────────────────────────┐
│HttpServerApp│ │ Handles selection of H1 vs H2 stream │
│ (trait) │─ ─ ─ ─ │ handling, incl H2 handshake │
└──────┬──────┘ └──────────────────────────────────────┘
 │
┌──────▼──────┐ ┌──────────────────────────────────────┐
│ ServerApp │ │ Handles dispatching of App instances │
│ (trait) │─ ─ ─ ─ │ as individual tasks, per Session │
└──────┬──────┘ └──────────────────────────────────────┘
 │
┌──────▼──────┐ ┌──────────────────────────────────────┐
│ Service<A> │ │ Handles dispatching of App instances │
│ (struct) │─ ─ ─ ─ │ as individual tasks, per Listener │
└─────────────┘ └──────────────────────────────────────┘
```

The `HttpProxy` struct handles the high level workflow of proxying an HTTP connection.

It uses the `ProxyHttp` (note the flipped wording order!) **trait** to allow customization
at each of the following steps (note: taken from [the phase chart](./phase_chart.md) doc):

```mermaid
 graph TD;
 start("new request")-->request_filter;
 request_filter-->upstream_peer;

 upstream_peer-->Connect{{IO: connect to upstream}};

 Connect--connection success-->connected_to_upstream;
 Connect--connection failure-->fail_to_connect;

 connected_to_upstream-->upstream_request_filter;
 upstream_request_filter --> SendReq{{IO: send request to upstream}};
 SendReq-->RecvResp{{IO: read response from upstream}};
 RecvResp-->upstream_response_filter-->response_filter-->upstream_response_body_filter-->response_body_filter-->logging-->endreq("request done");

 fail_to_connect --can retry-->upstream_peer;
 fail_to_connect --can't retry-->fail_to_proxy--send error response-->logging;

 RecvResp--failure-->IOFailure;
 SendReq--failure-->IOFailure;
 error_while_proxy--can retry-->upstream_peer;
 error_while_proxy--can't retry-->fail_to_proxy;

 request_filter --send response-->logging


 Error>any response filter error]-->error_while_proxy
 IOFailure>IO error]-->error_while_proxy

```

## Zooming out

Before we zoom in, it's probably good to zoom out and remind ourselves how
a proxy generally works:

```
┌────────────┐ ┌─────────────┐ ┌────────────┐
│ Downstream │ │ Proxy │ │ Upstream │
│ Client │─────────>│ │────────>│ Server │
└────────────┘ └─────────────┘ └────────────┘
```

The proxy will be taking connections from the **Downstream** client, and (if
everything goes right), establishing a connection with the appropriate
**Upstream** server. This selected upstream server is referred to as
the **Peer**.

Once the connection is established, the Downstream and Upstream can communicate
bidirectionally.

So far, the discussion of Server, Services, and Listeners has focused on the LEFT
half of this diagram, handling incoming Downstream connections, and getting them TO
the proxy component.

Next, we'll look at the RIGHT half of this diagram, connecting to Upstreams.

## Managing the Upstream

Connections to Upstream Peers are made through `Connector`s. This is not a specific type or trait, but more
of a "style".
+ +Connectors are responsible for a few things: + +* Establishing a connection with a Peer +* Maintaining a connection pool with the Peer, allowing for connection reuse across: + * Multiple requests from a single downstream client + * Multiple requests from different downstream clients +* Measuring health of connections, for connections like H2, which perform regular pings +* Handling protocols with multiple poolable layers, like H2 +* Caching, if relevant to the protocol and enabled +* Compression, if relevant to the protocol and enabled + +Now in context, we can see how each end of the Proxy is handled: + +```
┌────────────┐ ┌─────────────┐ ┌────────────┐
│ Downstream │ ┌ ─│─ Proxy ┌ ┼ ─ │ Upstream │
│ Client │─────────>│ │ │──┼─────>│ Server │
└────────────┘ │ └───────────┼─┘ └────────────┘
 ─ ─ ┘ ─ ─ ┘
 ▲ ▲
 ┌──┘ └──┐
 │ │
 ┌ ─ ─ ─ ─ ┐ ┌ ─ ─ ─ ─ ─
 Listeners Connectors│
 └ ─ ─ ─ ─ ┘ └ ─ ─ ─ ─ ─
```

## What about multiple peers?

`Connectors` only handle the connection to a single peer, so selecting one of potentially multiple Peers
is actually handled one level up, in the `upstream_peer()` method of the `ProxyHttp` trait. diff --git a/docs/user_guide/modify_filter.md b/docs/user_guide/modify_filter.md new file mode 100644 index 0000000..3e5378f --- /dev/null +++ b/docs/user_guide/modify_filter.md @@ -0,0 +1,133 @@ +# Examples: taking control of the request + +In this section we will go through how to route, modify or reject requests. + +## Routing +Any information from the request can be used to make routing decisions. Pingora doesn't impose any constraints on how users can implement their own routing logic. + +In the following example, the proxy sends traffic to 1.0.0.1 only when the request path starts with `/family/`. All the other requests are routed to 1.1.1.1. + +```Rust +pub struct MyGateway; + +#[async_trait] +impl ProxyHttp for MyGateway { + type CTX = (); + fn new_ctx(&self) -> Self::CTX {} + + async fn upstream_peer( + &self, + session: &mut Session, + _ctx: &mut Self::CTX, + ) -> Result<Box<HttpPeer>> { + let addr = if session.req_header().uri.path().starts_with("/family/") { + ("1.0.0.1", 443) + } else { + ("1.1.1.1", 443) + }; + + info!("connecting to {addr:?}"); + + let peer = Box::new(HttpPeer::new(addr, true, "one.one.one.one".to_string())); + Ok(peer) + } +} +``` + + +## Modifying headers + +Both request and response headers can be added, removed or modified in their corresponding phases. In the following example, we add logic to the `response_filter` phase to update the `Server` header and remove the `alt-svc` header. + +```Rust +#[async_trait] +impl ProxyHttp for MyGateway { + ... + async fn response_filter( + &self, + _session: &mut Session, + upstream_response: &mut ResponseHeader, + _ctx: &mut Self::CTX, + ) -> Result<()> + where + Self::CTX: Send + Sync, + { + // replace existing header if any + upstream_response + .insert_header("Server", "MyGateway") + .unwrap(); + // because we don't support h3 + upstream_response.remove_header("alt-svc"); + + Ok(()) + } +} +``` + +## Returning error pages + +Sometimes, under certain conditions such as authentication failures, you might want the proxy to just return an error page instead of proxying the traffic. + +```Rust +fn check_login(req: &pingora_http::RequestHeader) -> bool { + // implement your login check logic here + req.headers.get("Authorization").map(|v| v.as_bytes()) == Some(b"password") +} + +#[async_trait] +impl ProxyHttp for MyGateway { + ...
+ async fn request_filter(&self, session: &mut Session, _ctx: &mut Self::CTX) -> Result<bool> { + if session.req_header().uri.path().starts_with("/login") + && !check_login(session.req_header()) + { + let _ = session.respond_error(403).await; + // true: tell the proxy that the response is already written + return Ok(true); + } + Ok(false) + } +} +``` +## Logging + +Logging logic can be added to the `logging` phase of Pingora. The logging phase runs on every request right before Pingora proxy finishes processing it. This phase runs for both successful and failed requests. + +In the example below, we add a Prometheus metric and access logging to the proxy. In order for the metrics to be scraped, we also start a Prometheus metric server on a different port. + + +```Rust +pub struct MyGateway { + req_metric: prometheus::IntCounter, +} + +#[async_trait] +impl ProxyHttp for MyGateway { + ... + async fn logging( + &self, + session: &mut Session, + _e: Option<&pingora::Error>, + ctx: &mut Self::CTX, + ) { + let response_code = session + .response_written() + .map_or(0, |resp| resp.status.as_u16()); + // access log + info!( + "{} response code: {response_code}", + self.request_summary(session, ctx) + ); + + self.req_metric.inc(); + } +} + +fn main() { + ... + let mut prometheus_service_http = + pingora::services::listening::Service::prometheus_http_service(); + prometheus_service_http.add_tcp("127.0.0.1:6192"); + my_server.add_service(prometheus_service_http); + + my_server.run_forever(); +} +```
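To see the gateway above in action, requests along these lines should exercise each path; the proxy listener port 6191 is a hypothetical choice here, since the snippet only shows the Prometheus listener on 6192:

```
curl 127.0.0.1:6191/login -v                                # no Authorization header: 403
curl 127.0.0.1:6191/login -H 'Authorization: password' -v   # passes check_login()
curl 127.0.0.1:6192                                         # scrape the request counter
```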
\ No newline at end of file diff --git a/docs/user_guide/panic.md b/docs/user_guide/panic.md new file mode 100644 index 0000000..58bf19d --- /dev/null +++ b/docs/user_guide/panic.md @@ -0,0 +1,10 @@ +# Handling panics + +Any panic that happens to particular requests does not affect other ongoing requests or the server's ability to handle other requests. Sockets acquired by the panicking requests are dropped (closed). The panics will be captured by the tokio runtime and then ignored. + +In order to monitor the panics, the Pingora server has built-in Sentry integration. +```rust +my_server.sentry = Some("SENTRY_DSN"); +``` + +Even though a panic is not fatal in Pingora, it is still not the preferred way to handle failures like network timeouts. Panics should be reserved for unexpected logic errors. diff --git a/docs/user_guide/peer.md b/docs/user_guide/peer.md new file mode 100644 index 0000000..1176d14 --- /dev/null +++ b/docs/user_guide/peer.md @@ -0,0 +1,35 @@ +# `Peer`: how to connect to upstream + +In the `upstream_peer()` phase the user should return a `Peer` object, which defines how to connect to a certain upstream. + +## `Peer` +An `HttpPeer` defines which upstream to connect to. +| attribute | meaning | +| ------------- |-------------| +|address: `SocketAddr`| The IP:Port to connect to | +|scheme: `Scheme`| Http or Https | +|sni: `String`| The SNI to use, Https only | +|proxy: `Option<Proxy>`| The setting to proxy the request through a [CONNECT proxy](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/CONNECT) | +|client_cert_key: `Option<Arc<CertKey>>`| The client certificate to use in mTLS connections to upstream | +|options: `PeerOptions`| See below | + + +## `PeerOptions` +A `PeerOptions` defines how to connect to the upstream. +| attribute | meaning | +| ------------- |-------------| +|bind_to: `Option<InetSocketAddr>`| Which local address to bind to as the client IP | +|connection_timeout: `Option<Duration>`| How long to wait before giving up *establishing* a TCP connection | +|total_connection_timeout: `Option<Duration>`| How long to wait before giving up *establishing* a connection including TLS handshake time | +|read_timeout: `Option<Duration>`| How long to wait before each individual `read()` from upstream. The timer is reset after each `read()` | +|idle_timeout: `Option<Duration>`| How long to wait before closing an idle connection waiting for connection reuse | +|write_timeout: `Option<Duration>`| How long to wait before a `write()` to upstream finishes | +|verify_cert: `bool`| Whether to check if upstream's server cert is valid and validated | +|verify_hostname: `bool`| Whether to check if upstream server cert's CN matches the SNI | +|alternative_cn: `Option<String>`| Accept the cert if the CN matches this name | +|alpn: `ALPN`| Which HTTP protocol to advertise during ALPN, http1.1 and/or http2 | +|ca: `Option<Arc<Box<[X509]>>>`| Which Root CA to use to validate the server's cert | +|tcp_keepalive: `Option<TcpKeepalive>`| TCP keepalive settings to upstream | + +## Examples +TBD diff --git a/docs/user_guide/phase.md b/docs/user_guide/phase.md new file mode 100644 index 0000000..aa30f3a --- /dev/null +++ b/docs/user_guide/phase.md @@ -0,0 +1,126 @@ +# Life of a request: pingora-proxy phases and filters + +## Intro +The pingora-proxy HTTP proxy framework supports highly programmable proxy behaviors. This is done by allowing users to inject custom logic into different phases (stages) in the life of a request. + +## Life of a proxied HTTP request +1.
The life of a proxied HTTP request starts when the proxy reads the request header from the **downstream** (i.e., the client). +2. Then, the proxy connects to the **upstream** (i.e., the remote server). This step is skipped if there is a previously established [connection to reuse](pooling.md). +3. The proxy then sends the request header to the upstream. +4. Once the request header is sent, the proxy enters a duplex mode, which simultaneously proxies: + a. upstream response (both header and body) to the downstream, and + b. downstream request body to upstream (if any). +5. Once the entire request/response finishes, the life of the request is ended. All resources are released. The downstream connections and the upstream connections are recycled to be reused if applicable. + +## Pingora-proxy phases and filters +Pingora-proxy allows users to insert arbitrary logic into the life of a request. +```mermaid + graph TD; + start("new request")-->request_filter; + request_filter-->upstream_peer; + + upstream_peer-->Connect{{IO: connect to upstream}}; + + Connect--connection success-->connected_to_upstream; + Connect--connection failure-->fail_to_connect; + + connected_to_upstream-->upstream_request_filter; + upstream_request_filter --> SendReq{{IO: send request to upstream}}; + SendReq-->RecvResp{{IO: read response from upstream}}; + RecvResp-->upstream_response_filter-->response_filter-->upstream_response_body_filter-->response_body_filter-->logging-->endreq("request done"); + + fail_to_connect --can retry-->upstream_peer; + fail_to_connect --can't retry-->fail_to_proxy--send error response-->logging; + + RecvResp--failure-->IOFailure; + SendReq--failure-->IOFailure; + error_while_proxy--can retry-->upstream_peer; + error_while_proxy--can't retry-->fail_to_proxy; + + request_filter --send response-->logging + + + Error>any response filter error]-->error_while_proxy + IOFailure>IO error]-->error_while_proxy +``` + +### General filter usage guidelines +* Most filters return a [`pingora_error::Result<_>`](errors.md). When the returned value is `Result::Err`, `fail_to_proxy()` will be called and the request will be terminated. +* Most filters are async functions, which allows other async operations such as IO to be performed within the filters. +* A per-request `CTX` object can be defined to share states across the filters of the same request. All filters have mutable access to this object. +* Most filters are optional. +* The reason both `upstream_response_*_filter()` and `response_*_filter()` exist is for HTTP caching integration reasons (still WIP). + + +### `request_filter()` +This is the first phase of every request. + +This phase is usually for validating request inputs, rate limiting, and initializing context. + +### `proxy_upstream_filter()` +This phase determines if we should continue to the upstream to serve a response. If we short-circuit, a 502 is returned by default, but a different response can be implemented. + +This phase returns a boolean determining if we should continue to the upstream or error. + +### `upstream_peer()` +This phase decides which upstream to connect to (e.g. with DNS lookup and hashing/round-robin), and how to connect to it. + +This phase returns a `Peer` that defines the upstream to connect to. Implementing this phase is **required**. + +### `connected_to_upstream()` +This phase is executed when upstream is successfully connected. + +Usually this phase is for logging purposes. Connection info such as RTT and upstream TLS ciphers are reported in this phase. 
+
+### `fail_to_connect()`
+The counterpart of `connected_to_upstream()`. This phase is called if an error is encountered when connecting to the upstream.
+
+In this phase users can report the error in Sentry/Prometheus/error log. Users can also decide if the error is retry-able.
+
+If the error is retry-able, `upstream_peer()` will be called again, in which case the user can decide whether to retry the same upstream or fail over to a secondary one.
+
+If the error is not retry-able, the request will end.
+
+### `upstream_request_filter()`
+This phase is to modify requests before sending them to the upstream.
+
+### `upstream_response_filter()/upstream_response_body_filter()`
+This phase is triggered after an upstream response header/body is received.
+
+This phase is to modify response headers (or body) before sending them to the downstream. Note that this phase is called _prior_ to HTTP caching and therefore any changes made here will affect the response stored in the HTTP cache.
+
+### `response_filter()/response_body_filter()/response_trailer_filter()`
+This phase is triggered after a response header/body/trailer is ready to send to the downstream.
+
+This phase is to modify them before they are sent to the downstream.
+
+### `error_while_proxy()`
+This phase is triggered on errors proxying to the upstream, after the connection has been established.
+
+This phase may decide to retry a request if the connection was re-used and the HTTP method is idempotent.
+
+### `fail_to_proxy()`
+This phase is called whenever an error is encountered during any of the phases above.
+
+This phase is usually for error logging and error reporting to the downstream.
+
+### `logging()`
+This is the last phase that runs after the request is finished (or errors) and before any of its resources are released. Every request will end up in this final phase.
+
+This phase is usually for logging and post-request cleanup.
+
+### `request_summary()`
+This is not a phase, but a commonly used callback.
+
+Every error that reaches `fail_to_proxy()` will be automatically logged in the error log. `request_summary()` will be called to dump the info regarding the request when logging the error.
+
+This callback returns a string which allows users to customize what info to dump in the error log to help track and debug failures.
+
+### `suppress_error_log()`
+This is also not a phase, but another callback.
+
+`fail_to_proxy()` errors are automatically logged in the error log, but users may not be interested in every error. For example, downstream errors are logged if the client disconnects early, but these errors can become noisy if users are mainly interested in observing upstream issues. This callback can inspect the error and return true or false. If it returns true, the error will not be written to the log.
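+
+To make these two callbacks concrete, here is a hedged sketch continuing the `MyProxy` example from earlier. These methods go inside the same `impl ProxyHttp for MyProxy` block; the `"myproxy"` tag and the downstream-source check are illustrative choices, and the exact callback signatures are assumed from the `ProxyHttp` trait.
+```rust
+use pingora_error::{Error, ErrorSource};
+
+// Inside `impl ProxyHttp for MyProxy { ... }`:
+fn request_summary(&self, session: &Session, _ctx: &Self::CTX) -> String {
+    // Prefix the default summary with a service tag to make error logs easier to grep.
+    format!("myproxy, {}", session.as_ref().request_summary())
+}
+
+fn suppress_error_log(&self, _session: &Session, _ctx: &Self::CTX, error: &Error) -> bool {
+    // Hide errors that originate from the downstream (e.g. early client disconnects)
+    // so the error log focuses on upstream issues.
+    matches!(error.esource, ErrorSource::Downstream)
+}
+```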
+ +### Cache filters + +To be documented diff --git a/docs/user_guide/phase_chart.md b/docs/user_guide/phase_chart.md new file mode 100644 index 0000000..a7d01d4 --- /dev/null +++ b/docs/user_guide/phase_chart.md @@ -0,0 +1,30 @@ +Pingora proxy phases without caching +```mermaid + graph TD; + start("new request")-->request_filter; + request_filter-->upstream_peer; + + upstream_peer-->Connect{{IO: connect to upstream}}; + + Connect--connection success-->connected_to_upstream; + Connect--connection failure-->fail_to_connect; + + connected_to_upstream-->upstream_request_filter; + upstream_request_filter --> SendReq{{IO: send request to upstream}}; + SendReq-->RecvResp{{IO: read response from upstream}}; + RecvResp-->upstream_response_filter-->response_filter-->upstream_response_body_filter-->response_body_filter-->logging-->endreq("request done"); + + fail_to_connect --can retry-->upstream_peer; + fail_to_connect --can't retry-->fail_to_proxy--send error response-->logging; + + RecvResp--failure-->IOFailure; + SendReq--failure-->IOFailure; + error_while_proxy--can retry-->upstream_peer; + error_while_proxy--can't retry-->fail_to_proxy; + + request_filter --send response-->logging + + + Error>any response filter error]-->error_while_proxy + IOFailure>IO error]-->error_while_proxy +```
\ No newline at end of file
diff --git a/docs/user_guide/pooling.md b/docs/user_guide/pooling.md
new file mode 100644
index 0000000..0a108b3
--- /dev/null
+++ b/docs/user_guide/pooling.md
@@ -0,0 +1,22 @@
+# Connection pooling and reuse
+
+When the request to a `Peer` (upstream server) is finished, the connection to that peer is kept alive and added to a connection pool to be _reused_ by subsequent requests. This happens automatically without any special configuration.
+
+Requests that reuse previously established connections avoid the latency and compute cost of setting up a new connection, improving the Pingora server's overall performance and scalability.
+
+## Same `Peer`
+Only connections to the exact same `Peer` can be reused by a request. For correctness and security reasons, two `Peer`s are the same if and only if all of the following attributes are the same:
+* IP:port
+* scheme
+* SNI
+* client cert
+* verify cert
+* verify hostname
+* alternative_cn
+* proxy settings
+
+## Disable pooling
+To disable connection pooling and reuse to a certain `Peer`, just set the `idle_timeout` to 0 seconds for all requests using that `Peer`.
+
+## Failure
+A connection is considered not reusable if errors happen during the request.
diff --git a/docs/user_guide/prom.md b/docs/user_guide/prom.md
new file mode 100644
index 0000000..f248b0e
--- /dev/null
+++ b/docs/user_guide/prom.md
@@ -0,0 +1,22 @@
+# Prometheus
+
+Pingora has a built-in Prometheus HTTP metric server for scraping.
+
+```rust
+    ...
+    let mut prometheus_service_http = Service::prometheus_http_service();
+    prometheus_service_http.add_tcp("0.0.0.0:1234");
+    my_server.add_service(prometheus_service_http);
+    my_server.run_forever();
+```
+
+The simplest way to use it is to have [static metrics](https://docs.rs/prometheus/latest/prometheus/#static-metrics).
+
+```rust
+static MY_COUNTER: Lazy<IntCounter> = Lazy::new(|| {
+    register_int_counter!("my_counter", "my counter").unwrap()
+});
+
+```
+
+This static metric will automatically appear in the Prometheus metric endpoint.
diff --git a/docs/user_guide/start_stop.md b/docs/user_guide/start_stop.md
new file mode 100644
index 0000000..2c2a585
--- /dev/null
+++ b/docs/user_guide/start_stop.md
@@ -0,0 +1,27 @@
+# Starting and stopping a Pingora server
+
+A Pingora server is a regular unprivileged multithreaded process.
+
+## Start
+By default, the server will run in the foreground.
+
+A Pingora server by default takes the following command-line arguments:
+
+| Argument | Effect | default|
+| ------------- |-------------| ----|
+| -d, --daemon | Daemonize the server | false |
+| -t, --test | Test the server conf and then exit (WIP) | false |
+| -c, --conf | The path to the configuration file | empty string |
+| -u, --upgrade | This server should gracefully upgrade a running server | false |
+
+## Stop
+A Pingora server will listen to the following signals.
+
+### SIGINT: fast shutdown
+Upon receiving SIGINT (ctrl + c), the server will exit immediately with no delay. All unfinished requests will be interrupted. This behavior is usually less preferred because it could break requests.
+
+### SIGTERM: graceful shutdown
+Upon receiving SIGTERM, the server will notify all its services to shut down, wait for some preconfigured time and then exit. This behavior gives requests a grace period to finish.
+
+### SIGQUIT: graceful upgrade
+Similar to SIGTERM, but the server will also transfer all its listening sockets to a new Pingora server so that there is no downtime during the upgrade.
+See the [graceful upgrade](graceful.md) section for more details.
diff --git a/docs/user_guide/systemd.md b/docs/user_guide/systemd.md
new file mode 100644
index 0000000..d5474d0
--- /dev/null
+++ b/docs/user_guide/systemd.md
@@ -0,0 +1,14 @@
+# Systemd integration
+
+A Pingora server doesn't depend on systemd, but it can easily be made into a systemd service.
+
+```ini
+[Service]
+Type=forking
+PIDFile=/run/pingora.pid
+ExecStart=/bin/pingora -d -c /etc/pingora.conf
+ExecReload=kill -QUIT $MAINPID
+ExecReload=/bin/pingora -u -d -c /etc/pingora.conf
+```
+
+The example systemd setup integrates Pingora's graceful upgrade into systemd. To upgrade the pingora service, simply install a new version of the binary and then call `systemctl reload pingora.service`.
diff --git a/pingora-boringssl/Cargo.toml b/pingora-boringssl/Cargo.toml
new file mode 100644
index 0000000..d84ed81
--- /dev/null
+++ b/pingora-boringssl/Cargo.toml
@@ -0,0 +1,36 @@
+[package]
+name = "pingora-boringssl"
+version = "0.1.0"
+authors = ["Yuchen Wu <[email protected]>"]
+license = "Apache-2.0"
+edition = "2021"
+repository = "https://github.com/cloudflare/pingora"
+categories = ["asynchronous", "network-programming"]
+keywords = ["async", "tls", "ssl", "pingora"]
+description = """
+BoringSSL async APIs for Pingora.
+"""
+
+# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+[lib]
+name = "pingora_boringssl"
+path = "src/lib.rs"
+
+[dependencies]
+boring = { version = "4.5", features = ["pq-experimental"] }
+boring-sys = "4.5"
+futures-util = { version = "0.3", default-features = false }
+tokio = { workspace = true, features = ["io-util", "net", "macros", "rt-multi-thread"] }
+libc = "0.2.70"
+foreign-types-shared = { version = "0.3" }
+
+
+[dev-dependencies]
+tokio-test = "0.4"
+tokio = { workspace = true, features = ["full"] }
+
+[features]
+default = []
+pq_use_second_keyshare = []
+# waiting for boring-rs release
+read_uninit = []
diff --git a/pingora-boringssl/LICENSE b/pingora-boringssl/LICENSE
new file mode 100644
index 0000000..d645695
--- /dev/null
+++ b/pingora-boringssl/LICENSE
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+ + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora-boringssl/src/boring_tokio.rs b/pingora-boringssl/src/boring_tokio.rs new file mode 100644 index 0000000..dd99533 --- /dev/null +++ b/pingora-boringssl/src/boring_tokio.rs @@ -0,0 +1,305 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! This file reimplements tokio-boring with the [overhauled](https://github.com/sfackler/tokio-openssl/commit/56f6618ab619f3e431fa8feec2d20913bf1473aa) +//! 
tokio-openssl interface while the tokio APIs from the official [boring] crate have not yet caught up to it.
+
+use boring::error::ErrorStack;
+use boring::ssl::{self, ErrorCode, ShutdownResult, Ssl, SslRef, SslStream as SslStreamCore};
+use futures_util::future;
+use std::fmt;
+use std::io::{self, Read, Write};
+use std::pin::Pin;
+use std::task::{Context, Poll};
+use tokio::io::{AsyncRead, AsyncWrite, ReadBuf};
+
+struct StreamWrapper<S> {
+    stream: S,
+    context: usize,
+}
+
+impl<S> fmt::Debug for StreamWrapper<S>
+where
+    S: fmt::Debug,
+{
+    fn fmt(&self, fmt: &mut fmt::Formatter<'_>) -> fmt::Result {
+        fmt::Debug::fmt(&self.stream, fmt)
+    }
+}
+
+impl<S> StreamWrapper<S> {
+    /// # Safety
+    ///
+    /// Must be called with `context` set to a valid pointer to a live `Context` object, and the
+    /// wrapper must be pinned in memory.
+    unsafe fn parts(&mut self) -> (Pin<&mut S>, &mut Context<'_>) {
+        debug_assert_ne!(self.context, 0);
+        let stream = Pin::new_unchecked(&mut self.stream);
+        let context = &mut *(self.context as *mut _);
+        (stream, context)
+    }
+}
+
+impl<S> Read for StreamWrapper<S>
+where
+    S: AsyncRead,
+{
+    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
+        let (stream, cx) = unsafe { self.parts() };
+        let mut buf = ReadBuf::new(buf);
+        match stream.poll_read(cx, &mut buf)? {
+            Poll::Ready(()) => Ok(buf.filled().len()),
+            Poll::Pending => Err(io::Error::from(io::ErrorKind::WouldBlock)),
+        }
+    }
+}
+
+impl<S> Write for StreamWrapper<S>
+where
+    S: AsyncWrite,
+{
+    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
+        let (stream, cx) = unsafe { self.parts() };
+        match stream.poll_write(cx, buf) {
+            Poll::Ready(r) => r,
+            Poll::Pending => Err(io::Error::from(io::ErrorKind::WouldBlock)),
+        }
+    }
+
+    fn flush(&mut self) -> io::Result<()> {
+        let (stream, cx) = unsafe { self.parts() };
+        match stream.poll_flush(cx) {
+            Poll::Ready(r) => r,
+            Poll::Pending => Err(io::Error::from(io::ErrorKind::WouldBlock)),
+        }
+    }
+}
+
+fn cvt<T>(r: io::Result<T>) -> Poll<io::Result<T>> {
+    match r {
+        Ok(v) => Poll::Ready(Ok(v)),
+        Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => Poll::Pending,
+        Err(e) => Poll::Ready(Err(e)),
+    }
+}
+
+fn cvt_ossl<T>(r: Result<T, ssl::Error>) -> Poll<Result<T, ssl::Error>> {
+    match r {
+        Ok(v) => Poll::Ready(Ok(v)),
+        Err(e) => match e.code() {
+            ErrorCode::WANT_READ | ErrorCode::WANT_WRITE => Poll::Pending,
+            _ => Poll::Ready(Err(e)),
+        },
+    }
+}
+
+/// An asynchronous version of [`boring::ssl::SslStream`].
+#[derive(Debug)]
+pub struct SslStream<S>(SslStreamCore<StreamWrapper<S>>);
+
+impl<S: AsyncRead + AsyncWrite> SslStream<S> {
+    /// Like [`SslStream::new`](ssl::SslStream::new).
+    pub fn new(ssl: Ssl, stream: S) -> Result<Self, ErrorStack> {
+        SslStreamCore::new(ssl, StreamWrapper { stream, context: 0 }).map(SslStream)
+    }
+
+    /// Like [`SslStream::connect`](ssl::SslStream::connect).
+    pub fn poll_connect(
+        self: Pin<&mut Self>,
+        cx: &mut Context<'_>,
+    ) -> Poll<Result<(), ssl::Error>> {
+        self.with_context(cx, |s| cvt_ossl(s.connect()))
+    }
+
+    /// A convenience method wrapping [`poll_connect`](Self::poll_connect).
+    pub async fn connect(mut self: Pin<&mut Self>) -> Result<(), ssl::Error> {
+        future::poll_fn(|cx| self.as_mut().poll_connect(cx)).await
+    }
+
+    /// Like [`SslStream::accept`](ssl::SslStream::accept).
+    pub fn poll_accept(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Result<(), ssl::Error>> {
+        self.with_context(cx, |s| cvt_ossl(s.accept()))
+    }
+
+    /// A convenience method wrapping [`poll_accept`](Self::poll_accept).
+ pub async fn accept(mut self: Pin<&mut Self>) -> Result<(), ssl::Error> { + future::poll_fn(|cx| self.as_mut().poll_accept(cx)).await + } + + /// Like [`SslStream::do_handshake`](ssl::SslStream::do_handshake). + pub fn poll_do_handshake( + self: Pin<&mut Self>, + cx: &mut Context<'_>, + ) -> Poll<Result<(), ssl::Error>> { + self.with_context(cx, |s| cvt_ossl(s.do_handshake())) + } + + /// A convenience method wrapping [`poll_do_handshake`](Self::poll_do_handshake). + pub async fn do_handshake(mut self: Pin<&mut Self>) -> Result<(), ssl::Error> { + future::poll_fn(|cx| self.as_mut().poll_do_handshake(cx)).await + } + + // TODO: early data +} + +impl<S> SslStream<S> { + /// Returns a shared reference to the `Ssl` object associated with this stream. + pub fn ssl(&self) -> &SslRef { + self.0.ssl() + } + + /// Returns a shared reference to the underlying stream. + pub fn get_ref(&self) -> &S { + &self.0.get_ref().stream + } + + /// Returns a mutable reference to the underlying stream. + pub fn get_mut(&mut self) -> &mut S { + &mut self.0.get_mut().stream + } + + /// Returns a pinned mutable reference to the underlying stream. + pub fn get_pin_mut(self: Pin<&mut Self>) -> Pin<&mut S> { + unsafe { Pin::new_unchecked(&mut self.get_unchecked_mut().0.get_mut().stream) } + } + + fn with_context<F, R>(self: Pin<&mut Self>, ctx: &mut Context<'_>, f: F) -> R + where + F: FnOnce(&mut SslStreamCore<StreamWrapper<S>>) -> R, + { + let this = unsafe { self.get_unchecked_mut() }; + this.0.get_mut().context = ctx as *mut _ as usize; + let r = f(&mut this.0); + this.0.get_mut().context = 0; + r + } +} + +#[cfg(feature = "read_uninit")] +impl<S> AsyncRead for SslStream<S> +where + S: AsyncRead + AsyncWrite, +{ + fn poll_read( + self: Pin<&mut Self>, + ctx: &mut Context<'_>, + buf: &mut ReadBuf<'_>, + ) -> Poll<io::Result<()>> { + self.with_context(ctx, |s| { + // SAFETY: read_uninit does not de-initialize the buffer. + match cvt(s.read_uninit(unsafe { buf.unfilled_mut() }))? { + Poll::Ready(nread) => { + unsafe { + buf.assume_init(nread); + } + buf.advance(nread); + Poll::Ready(Ok(())) + } + Poll::Pending => Poll::Pending, + } + }) + } +} + +#[cfg(not(feature = "read_uninit"))] +impl<S> AsyncRead for SslStream<S> +where + S: AsyncRead + AsyncWrite, +{ + fn poll_read( + self: Pin<&mut Self>, + ctx: &mut Context<'_>, + buf: &mut ReadBuf<'_>, + ) -> Poll<io::Result<()>> { + self.with_context(ctx, |s| { + // This isn't really "proper", but rust-openssl doesn't currently expose a suitable interface even though + // OpenSSL itself doesn't require the buffer to be initialized. So this is good enough for now. + let slice = unsafe { + let buf = buf.unfilled_mut(); + std::slice::from_raw_parts_mut(buf.as_mut_ptr().cast::<u8>(), buf.len()) + }; + match cvt(s.read(slice))? 
{ + Poll::Ready(nread) => { + unsafe { + buf.assume_init(nread); + } + buf.advance(nread); + Poll::Ready(Ok(())) + } + Poll::Pending => Poll::Pending, + } + }) + } +} + +impl<S> AsyncWrite for SslStream<S> +where + S: AsyncRead + AsyncWrite, +{ + fn poll_write(self: Pin<&mut Self>, ctx: &mut Context, buf: &[u8]) -> Poll<io::Result<usize>> { + self.with_context(ctx, |s| cvt(s.write(buf))) + } + + fn poll_flush(self: Pin<&mut Self>, ctx: &mut Context) -> Poll<io::Result<()>> { + self.with_context(ctx, |s| cvt(s.flush())) + } + + fn poll_shutdown(mut self: Pin<&mut Self>, ctx: &mut Context) -> Poll<io::Result<()>> { + match self.as_mut().with_context(ctx, |s| s.shutdown()) { + Ok(ShutdownResult::Sent) | Ok(ShutdownResult::Received) => {} + Err(ref e) if e.code() == ErrorCode::ZERO_RETURN => {} + Err(ref e) if e.code() == ErrorCode::WANT_READ || e.code() == ErrorCode::WANT_WRITE => { + return Poll::Pending; + } + Err(e) => { + return Poll::Ready(Err(e + .into_io_error() + .unwrap_or_else(|e| io::Error::new(io::ErrorKind::Other, e)))); + } + } + + self.get_pin_mut().poll_shutdown(ctx) + } +} + +#[tokio::test] +async fn test_google() { + use boring::ssl; + use std::net::ToSocketAddrs; + use std::pin::Pin; + use tokio::io::{AsyncReadExt, AsyncWriteExt}; + use tokio::net::TcpStream; + + let addr = "8.8.8.8:443".to_socket_addrs().unwrap().next().unwrap(); + let stream = TcpStream::connect(&addr).await.unwrap(); + + let ssl_context = ssl::SslContext::builder(ssl::SslMethod::tls()) + .unwrap() + .build(); + let ssl = ssl::Ssl::new(&ssl_context).unwrap(); + let mut stream = crate::tokio_ssl::SslStream::new(ssl, stream).unwrap(); + + Pin::new(&mut stream).connect().await.unwrap(); + + stream.write_all(b"GET / HTTP/1.0\r\n\r\n").await.unwrap(); + + let mut buf = vec![]; + stream.read_to_end(&mut buf).await.unwrap(); + let response = String::from_utf8_lossy(&buf); + let response = response.trim_end(); + + // any response code is fine + assert!(response.starts_with("HTTP/1.0 ")); + assert!(response.ends_with("</html>") || response.ends_with("</HTML>")); +} diff --git a/pingora-boringssl/src/ext.rs b/pingora-boringssl/src/ext.rs new file mode 100644 index 0000000..bdc0d56 --- /dev/null +++ b/pingora-boringssl/src/ext.rs @@ -0,0 +1,192 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
The extended functionalities that are not yet exposed via the [`boring`] APIs
+
+use boring::error::ErrorStack;
+use boring::pkey::{HasPrivate, PKeyRef};
+use boring::ssl::{Ssl, SslAcceptor, SslRef};
+use boring::x509::store::X509StoreRef;
+use boring::x509::verify::X509VerifyParamRef;
+use boring::x509::X509Ref;
+use foreign_types_shared::ForeignTypeRef;
+use libc::*;
+use std::ffi::CString;
+
+fn cvt(r: c_int) -> Result<c_int, ErrorStack> {
+    if r != 1 {
+        Err(ErrorStack::get())
+    } else {
+        Ok(r)
+    }
+}
+
+/// Add name as an additional reference identifier that can match the peer's certificate
+///
+/// See [X509_VERIFY_PARAM_set1_host](https://www.openssl.org/docs/man3.1/man3/X509_VERIFY_PARAM_set1_host.html).
+pub fn add_host(verify_param: &mut X509VerifyParamRef, host: &str) -> Result<(), ErrorStack> {
+    if host.is_empty() {
+        return Ok(());
+    }
+    unsafe {
+        cvt(boring_sys::X509_VERIFY_PARAM_add1_host(
+            verify_param.as_ptr(),
+            host.as_ptr() as *const _,
+            host.len(),
+        ))
+        .map(|_| ())
+    }
+}
+
+/// Set the verify cert store of `ssl`
+///
+/// See [SSL_set1_verify_cert_store](https://www.openssl.org/docs/man1.1.1/man3/SSL_set1_verify_cert_store.html).
+pub fn ssl_set_verify_cert_store(
+    ssl: &mut SslRef,
+    cert_store: &X509StoreRef,
+) -> Result<(), ErrorStack> {
+    unsafe {
+        cvt(boring_sys::SSL_set1_verify_cert_store(
+            ssl.as_ptr(),
+            cert_store.as_ptr(),
+        ))?;
+    }
+    Ok(())
+}
+
+/// Load the certificate into `ssl`
+///
+/// See [SSL_use_certificate](https://www.openssl.org/docs/man1.1.1/man3/SSL_use_certificate.html).
+pub fn ssl_use_certificate(ssl: &mut SslRef, cert: &X509Ref) -> Result<(), ErrorStack> {
+    unsafe {
+        cvt(boring_sys::SSL_use_certificate(ssl.as_ptr(), cert.as_ptr()))?;
+    }
+    Ok(())
+}
+
+/// Load the private key into `ssl`
+///
+/// See [SSL_use_PrivateKey](https://www.openssl.org/docs/man1.1.1/man3/SSL_use_PrivateKey.html).
+pub fn ssl_use_private_key<T>(ssl: &mut SslRef, key: &PKeyRef<T>) -> Result<(), ErrorStack>
+where
+    T: HasPrivate,
+{
+    unsafe {
+        cvt(boring_sys::SSL_use_PrivateKey(ssl.as_ptr(), key.as_ptr()))?;
+    }
+    Ok(())
+}
+
+/// Add the certificate into the cert chain of `ssl`
+///
+/// See [SSL_add1_chain_cert](https://www.openssl.org/docs/man1.1.1/man3/SSL_add1_chain_cert.html)
+pub fn ssl_add_chain_cert(ssl: &mut SslRef, cert: &X509Ref) -> Result<(), ErrorStack> {
+    unsafe {
+        cvt(boring_sys::SSL_add1_chain_cert(ssl.as_ptr(), cert.as_ptr()))?;
+    }
+    Ok(())
+}
+
+/// Set renegotiation
+///
+/// This function is specific to BoringSSL.
+/// See <https://commondatastorage.googleapis.com/chromium-boringssl-docs/ssl.h.html#SSL_set_renegotiate_mode>
+pub fn ssl_set_renegotiate_mode_freely(ssl: &mut SslRef) {
+    unsafe {
+        boring_sys::SSL_set_renegotiate_mode(
+            ssl.as_ptr(),
+            boring_sys::ssl_renegotiate_mode_t::ssl_renegotiate_freely,
+        );
+    }
+}
+
+/// Set the curves/groups of `ssl`
+///
+/// See [set_groups_list](https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set1_curves.html).
+pub fn ssl_set_groups_list(ssl: &mut SslRef, groups: &str) -> Result<(), ErrorStack> {
+    let groups = CString::new(groups).unwrap();
+    unsafe {
+        // somehow SSL_set1_groups_list doesn't exist, but SSL_set1_curves_list means the same anyway
+        cvt(boring_sys::SSL_set1_curves_list(
+            ssl.as_ptr(),
+            groups.as_ptr(),
+        ))?;
+    }
+    Ok(())
+}
+
+/// Sets whether a second keyshare is sent in the client hello when PQ is used.
+///
+/// Default is true. When `true`, the first PQ (if any) and non-PQ keyshares are sent.
+/// When `false`, only the first configured keyshare is sent.
+#[cfg(feature = "pq_use_second_keyshare")]
+pub fn ssl_use_second_key_share(ssl: &mut SslRef, enabled: bool) {
+    unsafe { boring_sys::SSL_use_second_keyshare(ssl.as_ptr(), enabled as _) }
+}
+#[cfg(not(feature = "pq_use_second_keyshare"))]
+pub fn ssl_use_second_key_share(_ssl: &mut SslRef, _enabled: bool) {}
+
+/// Clear the error stack
+///
+/// SSL calls should check and clear the BoringSSL error stack. But some calls fail to do so.
+/// This causes the next unrelated SSL call to fail due to the leftover errors. This function allows
+/// the caller to clear the error stack before performing SSL calls to avoid this issue.
+pub fn clear_error_stack() {
+    let _ = ErrorStack::get();
+}
+
+/// Create a new [Ssl] from &[SslAcceptor]
+///
+/// This function is needed because [Ssl::new()] doesn't take `&SslContextRef` like openssl-rs
+pub fn ssl_from_acceptor(acceptor: &SslAcceptor) -> Result<Ssl, ErrorStack> {
+    Ssl::new_from_ref(acceptor.context())
+}
+
+/// Suspend the TLS handshake when a certificate is needed.
+///
+/// This function will cause the TLS handshake to pause and return the error: SSL_ERROR_WANT_X509_LOOKUP.
+/// The caller should set the certificate and then call [unblock_ssl_cert()] before continuing the
+/// handshake on the TLS connection.
+pub fn suspend_when_need_ssl_cert(ssl: &mut SslRef) {
+    unsafe {
+        boring_sys::SSL_set_cert_cb(ssl.as_ptr(), Some(raw_cert_block), std::ptr::null_mut());
+    }
+}
+
+/// Unblock a TLS handshake after the certificate is set.
+///
+/// The user should continue to call the TLS handshake after this function is called.
+pub fn unblock_ssl_cert(ssl: &mut SslRef) {
+    unsafe {
+        boring_sys::SSL_set_cert_cb(ssl.as_ptr(), None, std::ptr::null_mut());
+    }
+}
+
+// Just block the handshake
+extern "C" fn raw_cert_block(_ssl: *mut boring_sys::SSL, _arg: *mut c_void) -> c_int {
+    -1
+}
+
+/// Whether the TLS error is SSL_ERROR_WANT_X509_LOOKUP
+pub fn is_suspended_for_cert(error: &boring::ssl::Error) -> bool {
+    error.code().as_raw() == boring_sys::SSL_ERROR_WANT_X509_LOOKUP
+}
+
+#[allow(clippy::mut_from_ref)]
+/// Get a mutable SslRef out of an SslRef, which is a missing functionality for certain SslStream
+/// # Safety
+/// the caller needs to make sure that they hold a &mut SslRef
+pub unsafe fn ssl_mut(ssl: &SslRef) -> &mut SslRef {
+    unsafe { SslRef::from_ptr_mut(ssl.as_ptr()) }
+}
diff --git a/pingora-boringssl/src/lib.rs b/pingora-boringssl/src/lib.rs
new file mode 100644
index 0000000..26dfaab
--- /dev/null
+++ b/pingora-boringssl/src/lib.rs
@@ -0,0 +1,34 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! The BoringSSL API compatibility layer.
+//!
+//! This crate aims at making [boring] APIs exchangeable with [openssl-rs](https://docs.rs/openssl/latest/openssl/).
+//! In other words, this crate and `pingora-openssl` expose identical Rust APIs.
+ +#![warn(clippy::all)] + +use boring as ssl_lib; +pub use boring_sys as ssl_sys; +pub mod boring_tokio; +pub use boring_tokio as tokio_ssl; +pub mod ext; + +// export commonly used libs +pub use ssl_lib::error; +pub use ssl_lib::hash; +pub use ssl_lib::nid; +pub use ssl_lib::pkey; +pub use ssl_lib::ssl; +pub use ssl_lib::x509; diff --git a/pingora-cache/Cargo.toml b/pingora-cache/Cargo.toml new file mode 100644 index 0000000..3add293 --- /dev/null +++ b/pingora-cache/Cargo.toml @@ -0,0 +1,64 @@ +[package] +name = "pingora-cache" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["asynchronous", "network-programming"] +keywords = ["async", "http", "cache"] +description = """ +HTTP caching APIs for Pingora proxy. +""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_cache" +path = "src/lib.rs" + +[dependencies] +pingora-core = { version = "0.1.0", path = "../pingora-core" } +pingora-error = { version = "0.1.0", path = "../pingora-error" } +pingora-header-serde = { version = "0.1.0", path = "../pingora-header-serde" } +pingora-http = { version = "0.1.0", path = "../pingora-http" } +pingora-lru = { version = "0.1.0", path = "../pingora-lru" } +pingora-timeout = { version = "0.1.0", path = "../pingora-timeout" } +http = { workspace = true } +indexmap = "1" +once_cell = { workspace = true } +regex = "1" +blake2 = "0.10" +serde = { version = "1.0", features = ["derive"] } +rmp-serde = "1" +bytes = { workspace = true } +httpdate = "1.0.2" +log = { workspace = true } +async-trait = { workspace = true } +parking_lot = "0.12" +rustracing = "0.5.1" +rustracing_jaeger = "0.7" +rmp = "0.8" +tokio = { workspace = true } +lru = { workspace = true } +ahash = { workspace = true } +hex = "0.4" +httparse = { workspace = true } + +[dev-dependencies] +tokio-test = "0.4" +tokio = { workspace = true, features = ["fs"] } +env_logger = "0.9" +dhat = "0" +futures = "0.3" + +[[bench]] +name = "simple_lru_memory" +harness = false + +[[bench]] +name = "lru_memory" +harness = false + +[[bench]] +name = "lru_serde" +harness = false diff --git a/pingora-cache/LICENSE b/pingora-cache/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-cache/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. 
+ + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora-cache/benches/lru_memory.rs b/pingora-cache/benches/lru_memory.rs new file mode 100644 index 0000000..d2d022f --- /dev/null +++ b/pingora-cache/benches/lru_memory.rs @@ -0,0 +1,96 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#[global_allocator] +static ALLOC: dhat::Alloc = dhat::Alloc; + +use pingora_cache::{ + eviction::{lru::Manager, EvictionManager}, + CacheKey, +}; + +const ITEMS: usize = 5 * usize::pow(2, 20); + +/* + Total: 681,836,456 bytes (100%, 28,192,797.16/s) in 10,485,845 blocks (100%, 433,572.15/s), avg size 65.02 bytes, avg lifetime 5,935,075.17 µs (24.54% of program duration) + At t-gmax: 569,114,536 bytes (100%) in 5,242,947 blocks (100%), avg size 108.55 bytes + At t-end: 88 bytes (100%) in 3 blocks (100%), avg size 29.33 bytes + Allocated at { + #0: [root] + } + ├── PP 1.1/5 { + │ Total: 293,601,280 bytes (43.06%, 12,139,921.91/s) in 5,242,880 blocks (50%, 216,784.32/s), avg size 56 bytes, avg lifetime 11,870,032.65 µs (49.08% of program duration) + │ Max: 293,601,280 bytes in 5,242,880 blocks, avg size 56 bytes + │ At t-gmax: 293,601,280 bytes (51.59%) in 5,242,880 blocks (100%), avg size 56 bytes + │ At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes + │ Allocated at { + │ #1: 0x5555703cf69c: alloc::alloc::exchange_malloc (alloc/src/alloc.rs:326:11) + │ #2: 0x5555703cf69c: alloc::boxed::Box<T>::new (alloc/src/boxed.rs:215:9) + │ #3: 0x5555703cf69c: pingora_lru::LruUnit<T>::admit (pingora-lru/src/lib.rs:201:20) + │ #4: 0x5555703cf69c: pingora_lru::Lru<T,_>::admit (pingora-lru/src/lib.rs:48:26) + │ #5: 0x5555703cf69c: <pingora_cache::eviction::lru::Manager<_> as pingora_cache::eviction::EvictionManager>::admit (src/eviction/lru.rs:114:9) + │ #6: 0x5555703cf69c: lru_memory::main (pingora-cache/benches/lru_memory.rs:78:9) + │ } + │ } + ├── PP 1.2/5 { + │ Total: 203,685,456 bytes (29.87%, 8,422,052.97/s) in 50 blocks (0%, 2.07/s), avg size 4,073,709.12 bytes, avg lifetime 6,842,528.74 µs (28.29% of program duration) + │ Max: 132,906,576 bytes in 32 blocks, avg size 4,153,330.5 bytes + │ At t-gmax: 132,906,576 bytes (23.35%) in 32 blocks (0%), avg size 4,153,330.5 bytes + │ At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes + │ Allocated at { + │ #1: 0x5555703cec54: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc/src/alloc.rs:237:9) + │ #2: 0x5555703cec54: alloc::raw_vec::RawVec<T,A>::allocate_in (alloc/src/raw_vec.rs:185:45) + │ #3: 0x5555703cec54: alloc::raw_vec::RawVec<T,A>::with_capacity_in (alloc/src/raw_vec.rs:131:9) + │ #4: 0x5555703cec54: alloc::vec::Vec<T,A>::with_capacity_in (src/vec/mod.rs:641:20) + │ #5: 0x5555703cec54: alloc::vec::Vec<T>::with_capacity (src/vec/mod.rs:483:9) + │ #6: 0x5555703cec54: pingora_lru::linked_list::Nodes::with_capacity (pingora-lru/src/linked_list.rs:50:25) + │ #7: 0x5555703cec54: pingora_lru::linked_list::LinkedList::with_capacity (pingora-lru/src/linked_list.rs:121:20) + │ #8: 0x5555703cec54: pingora_lru::LruUnit<T>::with_capacity (pingora-lru/src/lib.rs:176:20) + │ #9: 0x5555703cec54: pingora_lru::Lru<T,_>::with_capacity (pingora-lru/src/lib.rs:28:36) + │ #10: 0x5555703cec54: pingora_cache::eviction::lru::Manager<_>::with_capacity (src/eviction/lru.rs:22:17) + │ #11: 0x5555703cec54: lru_memory::main (pingora-cache/benches/lru_memory.rs:74:19) + │ } + │ } + ├── PP 1.3/5 { + │ Total: 142,606,592 bytes 
(20.92%, 5,896,544.09/s) in 32 blocks (0%, 1.32/s), avg size 4,456,456 bytes, avg lifetime 22,056,252.88 µs (91.2% of program duration) + │ Max: 142,606,592 bytes in 32 blocks, avg size 4,456,456 bytes + │ At t-gmax: 142,606,592 bytes (25.06%) in 32 blocks (0%), avg size 4,456,456 bytes + │ At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes + │ Allocated at { + │ #1: 0x5555703ceb64: alloc::alloc::alloc (alloc/src/alloc.rs:95:14) + │ #2: 0x5555703ceb64: <hashbrown::raw::alloc::inner::Global as hashbrown::raw::alloc::inner::Allocator>::allocate (src/raw/alloc.rs:47:35) + │ #3: 0x5555703ceb64: hashbrown::raw::alloc::inner::do_alloc (src/raw/alloc.rs:62:9) + │ #4: 0x5555703ceb64: hashbrown::raw::RawTableInner<A>::new_uninitialized (src/raw/mod.rs:1080:38) + │ #5: 0x5555703ceb64: hashbrown::raw::RawTableInner<A>::fallible_with_capacity (src/raw/mod.rs:1109:30) + │ #6: 0x5555703ceb64: hashbrown::raw::RawTable<T,A>::fallible_with_capacity (src/raw/mod.rs:460:20) + │ #7: 0x5555703ceb64: hashbrown::raw::RawTable<T,A>::with_capacity_in (src/raw/mod.rs:481:15) + │ #8: 0x5555703ceb64: hashbrown::raw::RawTable<T>::with_capacity (src/raw/mod.rs:411:9) + │ #9: 0x5555703ceb64: hashbrown::map::HashMap<K,V,S>::with_capacity_and_hasher (hashbrown-0.12.3/src/map.rs:422:20) + │ #10: 0x5555703ceb64: hashbrown::map::HashMap<K,V>::with_capacity (hashbrown-0.12.3/src/map.rs:326:9) + │ #11: 0x5555703ceb64: pingora_lru::LruUnit<T>::with_capacity (pingora-lru/src/lib.rs:175:27) + │ #12: 0x5555703ceb64: pingora_lru::Lru<T,_>::with_capacity (pingora-lru/src/lib.rs:28:36) + │ #13: 0x5555703ceb64: pingora_cache::eviction::lru::Manager<_>::with_capacity (src/eviction/lru.rs:22:17) + │ #14: 0x5555703ceb64: lru_memory::main (pingora-cache/benches/lru_memory.rs:74:19) + │ } + │ } +*/ +fn main() { + let _profiler = dhat::Profiler::new_heap(); + let manager = Manager::<32>::with_capacity(ITEMS, ITEMS / 32); + let unused_ttl = std::time::SystemTime::now(); + for i in 0..ITEMS { + let item = CacheKey::new("", i.to_string(), "").to_compact(); + manager.admit(item, 1, unused_ttl); + } +} diff --git a/pingora-cache/benches/lru_serde.rs b/pingora-cache/benches/lru_serde.rs new file mode 100644 index 0000000..7ad4234 --- /dev/null +++ b/pingora-cache/benches/lru_serde.rs @@ -0,0 +1,46 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +use std::time::Instant; + +use pingora_cache::{ + eviction::{lru::Manager, EvictionManager}, + CacheKey, +}; + +const ITEMS: usize = 5 * usize::pow(2, 20); + +fn main() { + let manager = Manager::<32>::with_capacity(ITEMS, ITEMS / 32); + let manager2 = Manager::<32>::with_capacity(ITEMS, ITEMS / 32); + let unused_ttl = std::time::SystemTime::now(); + for i in 0..ITEMS { + let item = CacheKey::new("", i.to_string(), "").to_compact(); + manager.admit(item, 1, unused_ttl); + } + + /* lru serialize shard 19 22.573338ms, 5241623 bytes + * lru deserialize shard 19 39.260669ms, 5241623 bytes */ + for i in 0..32 { + let before = Instant::now(); + let ser = manager.serialize_shard(i).unwrap(); + let elapsed = before.elapsed(); + println!("lru serialize shard {i} {elapsed:?}, {} bytes", ser.len()); + + let before = Instant::now(); + manager2.deserialize_shard(&ser).unwrap(); + let elapsed = before.elapsed(); + println!("lru deserialize shard {i} {elapsed:?}, {} bytes", ser.len()); + } +} diff --git a/pingora-cache/benches/simple_lru_memory.rs b/pingora-cache/benches/simple_lru_memory.rs new file mode 100644 index 0000000..b12a5c8 --- /dev/null +++ b/pingora-cache/benches/simple_lru_memory.rs @@ -0,0 +1,78 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
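+
+//! Memory benchmark (heap-profiled via dhat as the global allocator) for the
+//! simple `lru`-crate based eviction manager. Compare the profile below with
+//! the one in `lru_memory.rs`, which measures the sharded `pingora-lru`
+//! implementation.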
+ +#[global_allocator] +static ALLOC: dhat::Alloc = dhat::Alloc; + +use pingora_cache::{ + eviction::{simple_lru::Manager, EvictionManager}, + CacheKey, +}; + +const ITEMS: usize = 5 * usize::pow(2, 20); + +/* + Total: 704,643,412 bytes (100%, 29,014,058.85/s) in 10,485,787 blocks (100%, 431,757.73/s), avg size 67.2 bytes, avg lifetime 6,163,799.09 µs (25.38% of program duration) + At t-gmax: 520,093,936 bytes (100%) in 5,242,886 blocks (100%), avg size 99.2 bytes + ├── PP 1.1/4 { + │ Total: 377,487,360 bytes (53.57%, 15,543,238.31/s) in 5,242,880 blocks (50%, 215,878.31/s), avg size 72 bytes, avg lifetime 12,327,602.83 µs (50.76% of program duration) + │ Max: 377,487,360 bytes in 5,242,880 blocks, avg size 72 bytes + │ At t-gmax: 377,487,360 bytes (72.58%) in 5,242,880 blocks (100%), avg size 72 bytes + │ At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes + │ Allocated at { + │ #1: 0x5555791dd7e0: alloc::alloc::exchange_malloc (alloc/src/alloc.rs:326:11) + │ #2: 0x5555791dd7e0: alloc::boxed::Box<T>::new (alloc/src/boxed.rs:215:9) + │ #3: 0x5555791dd7e0: lru::LruCache<K,V,S>::replace_or_create_node (lru-0.8.1/src/lib.rs:391:20) + │ #4: 0x5555791dd7e0: lru::LruCache<K,V,S>::capturing_put (lru-0.8.1/src/lib.rs:355:44) + │ #5: 0x5555791dd7e0: lru::LruCache<K,V,S>::push (lru-0.8.1/src/lib.rs:334:9) + │ #6: 0x5555791dd7e0: pingora_cache::eviction::simple_lru::Manager::insert (src/eviction/simple_lru.rs:49:23) + │ #7: 0x5555791dd7e0: <pingora_cache::eviction::simple_lru::Manager as pingora_cache::eviction::EvictionManager>::admit (src/eviction/simple_lru.rs:166:9) + │ #8: 0x5555791dd7e0: simple_lru_memory::main (pingora-cache/benches/simple_lru_memory.rs:21:9) + │ } + │ } + ├── PP 1.2/4 { + │ Total: 285,212,780 bytes (40.48%, 11,743,784.5/s) in 22 blocks (0%, 0.91/s), avg size 12,964,217.27 bytes, avg lifetime 1,116,774.23 µs (4.6% of program duration) + │ Max: 213,909,520 bytes in 2 blocks, avg size 106,954,760 bytes + │ At t-gmax: 142,606,344 bytes (27.42%) in 1 blocks (0%), avg size 142,606,344 bytes + │ At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes + │ Allocated at { + │ #1: 0x5555791dae20: alloc::alloc::alloc (alloc/src/alloc.rs:95:14) + │ #2: 0x5555791dae20: <hashbrown::raw::alloc::inner::Global as hashbrown::raw::alloc::inner::Allocator>::allocate (src/raw/alloc.rs:47:35) + │ #3: 0x5555791dae20: hashbrown::raw::alloc::inner::do_alloc (src/raw/alloc.rs:62:9) + │ #4: 0x5555791dae20: hashbrown::raw::RawTableInner<A>::new_uninitialized (src/raw/mod.rs:1080:38) + │ #5: 0x5555791dae20: hashbrown::raw::RawTableInner<A>::fallible_with_capacity (src/raw/mod.rs:1109:30) + │ #6: 0x5555791dae20: hashbrown::raw::RawTableInner<A>::prepare_resize (src/raw/mod.rs:1353:29) + │ #7: 0x5555791dae20: hashbrown::raw::RawTableInner<A>::resize_inner (src/raw/mod.rs:1426:29) + │ #8: 0x5555791dae20: hashbrown::raw::RawTableInner<A>::reserve_rehash_inner (src/raw/mod.rs:1403:13) + │ #9: 0x5555791dae20: hashbrown::raw::RawTable<T,A>::reserve_rehash (src/raw/mod.rs:680:13) + │ #10: 0x5555791dde50: hashbrown::raw::RawTable<T,A>::reserve (src/raw/mod.rs:646:16) + │ #11: 0x5555791dde50: hashbrown::raw::RawTable<T,A>::insert (src/raw/mod.rs:725:17) + │ #12: 0x5555791dde50: hashbrown::map::HashMap<K,V,S,A>::insert (hashbrown-0.12.3/src/map.rs:1679:13) + │ #13: 0x5555791dde50: lru::LruCache<K,V,S>::capturing_put (lru-0.8.1/src/lib.rs:361:17) + │ #14: 0x5555791dde50: lru::LruCache<K,V,S>::push (lru-0.8.1/src/lib.rs:334:9) + │ #15: 0x5555791dde50: pingora_cache::eviction::simple_lru::Manager::insert 
(src/eviction/simple_lru.rs:49:23) + │ #16: 0x5555791dde50: <pingora_cache::eviction::simple_lru::Manager as pingora_cache::eviction::EvictionManager>::admit (src/eviction/simple_lru.rs:166:9) + │ #17: 0x5555791dde50: simple_lru_memory::main (pingora-cache/benches/simple_lru_memory.rs:21:9) + │ } + │ } +*/ +fn main() { + let _profiler = dhat::Profiler::new_heap(); + let manager = Manager::new(ITEMS); + let unused_ttl = std::time::SystemTime::now(); + for i in 0..ITEMS { + let item = CacheKey::new("", i.to_string(), "").to_compact(); + manager.admit(item, 1, unused_ttl); + } +} diff --git a/pingora-cache/src/cache_control.rs b/pingora-cache/src/cache_control.rs new file mode 100644 index 0000000..6686c3e --- /dev/null +++ b/pingora-cache/src/cache_control.rs @@ -0,0 +1,839 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Functions and utilities to help parse Cache-Control headers + +use super::*; + +use http::header::HeaderName; +use http::HeaderValue; +use indexmap::IndexMap; +use once_cell::sync::Lazy; +use pingora_error::{Error, ErrorType, Result}; +use pingora_http::ResponseHeader; +use regex::bytes::Regex; +use std::num::IntErrorKind; +use std::slice; +use std::str; + +/// The max delta-second per [RFC 7234](https://datatracker.ietf.org/doc/html/rfc7234#section-1.2.1) +// "If a cache receives a delta-seconds +// value greater than the greatest integer it can represent, or if any +// of its subsequent calculations overflows, the cache MUST consider the +// value to be either 2147483648 (2^31) or the greatest positive integer +// it can conveniently represent." +pub const DELTA_SECONDS_OVERFLOW_VALUE: u32 = 2147483648; + +/// Cache control directive key type +pub type DirectiveKey = String; + +/// Cache control directive value type +#[derive(Debug)] +pub struct DirectiveValue(pub Vec<u8>); + +impl AsRef<[u8]> for DirectiveValue { + fn as_ref(&self) -> &[u8] { + &self.0 + } +} + +impl DirectiveValue { + /// A [DirectiveValue] without quotes (`"`). + pub fn parse_as_bytes(&self) -> &[u8] { + self.0 + .strip_prefix(&[b'"']) + .and_then(|bytes| bytes.strip_suffix(&[b'"'])) + .unwrap_or(&self.0[..]) + } + + /// A [DirectiveValue] without quotes (`"`) as `str`. + pub fn parse_as_str(&self) -> Result<&str> { + str::from_utf8(self.parse_as_bytes()).or_else(|e| { + Error::e_because(ErrorType::InternalError, "could not parse value as utf8", e) + }) + } + + /// Parse the [DirectiveValue] as delta seconds + /// + /// `"`s are ignored. The value is capped to [DELTA_SECONDS_OVERFLOW_VALUE]. + pub fn parse_as_delta_seconds(&self) -> Result<u32> { + match self.parse_as_str()?.parse::<u32>() { + Ok(value) => Ok(value), + Err(e) => { + // delta-seconds expect to handle positive overflow gracefully + if e.kind() == &IntErrorKind::PosOverflow { + Ok(DELTA_SECONDS_OVERFLOW_VALUE) + } else { + Error::e_because(ErrorType::InternalError, "could not parse value as u32", e) + } + } + } + } +} + +/// An ordered map to store cache control key value pairs. 
+pub type DirectiveMap = IndexMap<DirectiveKey, Option<DirectiveValue>>; + +/// Parsed Cache-Control directives +#[derive(Debug)] +pub struct CacheControl { + /// The parsed directives + pub directives: DirectiveMap, +} + +/// Cacheability calculated from cache control. +#[derive(Debug, PartialEq, Eq)] +pub enum Cacheable { + /// Cacheable + Yes, + /// Not cacheable + No, + /// No directive found for explicit cacheability + Default, +} + +/// An iter over all the cache control directives +pub struct ListValueIter<'a>(slice::Split<'a, u8, fn(&u8) -> bool>); + +impl<'a> ListValueIter<'a> { + pub fn from(value: &'a DirectiveValue) -> Self { + ListValueIter(value.parse_as_bytes().split(|byte| byte == &b',')) + } +} + +// https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.3 +// optional whitespace OWS = *(SP / HTAB); SP = 0x20, HTAB = 0x09 +fn trim_ows(bytes: &[u8]) -> &[u8] { + fn not_ows(b: &u8) -> bool { + b != &b'\x20' && b != &b'\x09' + } + // find first non-OWS char from front (head) and from end (tail) + let head = bytes.iter().position(not_ows).unwrap_or(0); + let tail = bytes + .iter() + .rposition(not_ows) + .map(|rpos| rpos + 1) + .unwrap_or(head); + &bytes[head..tail] +} + +impl<'a> Iterator for ListValueIter<'a> { + type Item = &'a [u8]; + + fn next(&mut self) -> Option<Self::Item> { + Some(trim_ows(self.0.next()?)) + } +} + +/* + Originally from https://github.com/hapijs/wreck: + Cache-Control = 1#cache-directive + cache-directive = token [ "=" ( token / quoted-string ) ] + token = [^\x00-\x20\(\)<>@\,;\:\\"\/\[\]\?\=\{\}\x7F]+ + quoted-string = "(?:[^"\\]|\\.)*" +*/ +static RE_CACHE_DIRECTIVE: Lazy<Regex> = + // unicode support disabled, allow ; or , delimiter | capture groups: 1: directive = 2: token OR quoted-string + Lazy::new(|| { + Regex::new(r#"(?-u)(?:^|(?:\s*[,;]\s*))([^\x00-\x20\(\)<>@,;:\\"/\[\]\?=\{\}\x7F]+)(?:=((?:[^\x00-\x20\(\)<>@,;:\\"/\[\]\?=\{\}\x7F]+|(?:"(?:[^"\\]|\\.)*"))))?"#).unwrap() + }); + +impl CacheControl { + // Our parsing strategy is more permissive than the RFC in a few ways: + // - Allows semicolons as delimiters (in addition to commas). + // - Allows octets outside of visible ASCII in tokens. + // - Doesn't require no-value for "boolean directives," such as must-revalidate + // - Allows quoted-string format for numeric values. 
+ fn from_headers(headers: http::header::GetAll<HeaderValue>) -> Option<Self> { + let mut directives = IndexMap::new(); + // should iterate in header line insertion order + for line in headers { + for captures in RE_CACHE_DIRECTIVE.captures_iter(line.as_bytes()) { + // directive key + // header values don't have to be utf-8, but we store keys as strings for case-insensitive hashing + let key = captures.get(1).and_then(|cap| { + str::from_utf8(cap.as_bytes()) + .ok() + .map(|token| token.to_lowercase()) + }); + if key.is_none() { + continue; + } + // directive value + // match token or quoted-string + let value = captures + .get(2) + .map(|cap| DirectiveValue(cap.as_bytes().to_vec())); + directives.insert(key.unwrap(), value); + } + } + Some(CacheControl { directives }) + } + + /// Parse from the given header name in `headers` + pub fn from_headers_named(header_name: &str, headers: &http::HeaderMap) -> Option<Self> { + if !headers.contains_key(header_name) { + return None; + } + + Self::from_headers(headers.get_all(header_name)) + } + + /// Parse from the given header name in the [ReqHeader] + pub fn from_req_headers_named(header_name: &str, req_header: &ReqHeader) -> Option<Self> { + Self::from_headers_named(header_name, &req_header.headers) + } + + /// Parse `Cache-Control` header name from the [ReqHeader] + pub fn from_req_headers(req_header: &ReqHeader) -> Option<Self> { + Self::from_req_headers_named("cache-control", req_header) + } + + /// Parse from the given header name in the [RespHeader] + pub fn from_resp_headers_named(header_name: &str, resp_header: &RespHeader) -> Option<Self> { + Self::from_headers_named(header_name, &resp_header.headers) + } + + /// Parse `Cache-Control` header name from the [RespHeader] + pub fn from_resp_headers(resp_header: &RespHeader) -> Option<Self> { + Self::from_resp_headers_named("cache-control", resp_header) + } + + /// Whether the given directive is in the cache control. + pub fn has_key(&self, key: &str) -> bool { + self.directives.contains_key(key) + } + + /// Whether the `public` directive is in the cache control. + pub fn public(&self) -> bool { + self.has_key("public") + } + + /// Whether the given directive exists and it has no value. + fn has_key_without_value(&self, key: &str) -> bool { + matches!(self.directives.get(key), Some(None)) + } + + /// Whether the standalone `private` exists in the cache control + // RFC 7234: using the #field-name versions of `private` + // means a shared cache "MUST NOT store the specified field-name(s), + // whereas it MAY store the remainder of the response." + // It must be a boolean form (no value) to apply to the whole response. + // https://datatracker.ietf.org/doc/html/rfc7234#section-5.2.2.6 + pub fn private(&self) -> bool { + self.has_key_without_value("private") + } + + fn get_field_names(&self, key: &str) -> Option<ListValueIter> { + if let Some(Some(value)) = self.directives.get(key) { + Some(ListValueIter::from(value)) + } else { + None + } + } + + /// Get the values of `private=` + pub fn private_field_names(&self) -> Option<ListValueIter> { + self.get_field_names("private") + } + + /// Whether the standalone `no-cache` exists in the cache control + pub fn no_cache(&self) -> bool { + self.has_key_without_value("no-cache") + } + + /// Get the values of `no-cache=` + pub fn no_cache_field_names(&self) -> Option<ListValueIter> { + self.get_field_names("no-cache") + } + + /// Whether `no-store` exists. 
+ pub fn no_store(&self) -> bool { + self.has_key("no-store") + } + + fn parse_delta_seconds(&self, key: &str) -> Result<Option<u32>> { + if let Some(Some(dir_value)) = self.directives.get(key) { + Ok(Some(dir_value.parse_as_delta_seconds()?)) + } else { + Ok(None) + } + } + + /// Return the `max-age` seconds + pub fn max_age(&self) -> Result<Option<u32>> { + self.parse_delta_seconds("max-age") + } + + /// Return the `s-maxage` seconds + pub fn s_maxage(&self) -> Result<Option<u32>> { + self.parse_delta_seconds("s-maxage") + } + + /// Return the `stale-while-revalidate` seconds + pub fn stale_while_revalidate(&self) -> Result<Option<u32>> { + self.parse_delta_seconds("stale-while-revalidate") + } + + /// Return the `stale-if-error` seconds + pub fn stale_if_error(&self) -> Result<Option<u32>> { + self.parse_delta_seconds("stale-if-error") + } + + /// Whether `must-revalidate` exists. + pub fn must_revalidate(&self) -> bool { + self.has_key("must-revalidate") + } + + /// Whether `proxy-revalidate` exists. + pub fn proxy_revalidate(&self) -> bool { + self.has_key("proxy-revalidate") + } + + /// Whether `only-if-cached` exists. + pub fn only_if_cached(&self) -> bool { + self.has_key("only-if-cached") + } +} + +impl InterpretCacheControl for CacheControl { + fn is_cacheable(&self) -> Cacheable { + if self.no_store() || self.private() { + return Cacheable::No; + } + if self.has_key("s-maxage") || self.has_key("max-age") || self.public() { + return Cacheable::Yes; + } + Cacheable::Default + } + + fn allow_caching_authorized_req(&self) -> bool { + // RFC 7234 https://datatracker.ietf.org/doc/html/rfc7234#section-3 + // "MUST NOT" store requests with Authorization header + // unless response contains one of these directives + self.must_revalidate() || self.public() || self.has_key("s-maxage") + } + + fn fresh_sec(&self) -> Option<u32> { + if self.no_cache() { + // always treated as stale + return Some(0); + } + match self.s_maxage() { + Ok(Some(seconds)) => Some(seconds), + // s-maxage not present + Ok(None) => match self.max_age() { + Ok(Some(seconds)) => Some(seconds), + _ => None, + }, + _ => None, + } + } + + fn serve_stale_while_revalidate_sec(&self) -> Option<u32> { + // RFC 7234: these directives forbid serving stale. + // https://datatracker.ietf.org/doc/html/rfc7234#section-4.2.4 + if self.must_revalidate() || self.proxy_revalidate() || self.has_key("s-maxage") { + return Some(0); + } + self.stale_while_revalidate().unwrap_or(None) + } + + fn serve_stale_if_error_sec(&self) -> Option<u32> { + if self.must_revalidate() || self.proxy_revalidate() || self.has_key("s-maxage") { + return Some(0); + } + self.stale_if_error().unwrap_or(None) + } + + // Strip header names listed in `private` or `no-cache` directives from a response. + fn strip_private_headers(&self, resp_header: &mut ResponseHeader) { + fn strip_listed_headers(resp: &mut ResponseHeader, field_names: ListValueIter) { + for name in field_names { + if let Ok(header) = HeaderName::from_bytes(name) { + resp.remove_header(&header); + } + } + } + + if let Some(headers) = self.private_field_names() { + strip_listed_headers(resp_header, headers); + } + // We interpret `no-cache` the same way as `private`, + // though technically it has a less restrictive requirement + // ("MUST NOT be sent in the response to a subsequent request + // without successful revalidation with the origin server"). 
+        // https://datatracker.ietf.org/doc/html/rfc7234#section-5.2.2.2
+        if let Some(headers) = self.no_cache_field_names() {
+            strip_listed_headers(resp_header, headers);
+        }
+    }
+}
+
+/// `InterpretCacheControl` provides a meaningful interface to the parsed `CacheControl`.
+/// These functions actually interpret the parsed cache-control directives to return
+/// the freshness or other cache meta values that cache-control is signaling.
+///
+/// By default `CacheControl` implements an RFC-7234 compliant reading that assumes it is being
+/// used with a shared (proxy) cache.
+pub trait InterpretCacheControl {
+    /// Does cache-control specify this response is cacheable?
+    ///
+    /// Note that an RFC-7234 compliant cacheability check must also
+    /// check if the request contained the Authorization header and
+    /// `allow_caching_authorized_req`.
+    fn is_cacheable(&self) -> Cacheable;
+
+    /// Does this cache-control allow caching a response to
+    /// a request with the Authorization header?
+    fn allow_caching_authorized_req(&self) -> bool;
+
+    /// Returns freshness ttl specified in cache-control
+    ///
+    /// - `Some(_)` indicates cache-control specifies a valid ttl. Some(0) = always stale.
+    /// - `None` means cache-control did not specify a valid ttl.
+    fn fresh_sec(&self) -> Option<u32>;
+
+    /// Returns the stale-while-revalidate ttl.
+    ///
+    /// The result should consider all the relevant cache directives, not just the SWR directive itself.
+    ///
+    /// Some(0) means serving such stale is disallowed by a directive like `must-revalidate`
+    /// or `stale-while-revalidate=0`.
+    ///
+    /// `None` indicates no SWR ttl was specified.
+    fn serve_stale_while_revalidate_sec(&self) -> Option<u32>;
+
+    /// Returns the stale-if-error ttl.
+    ///
+    /// The result should consider all the relevant cache directives, not just the SIE directive itself.
+    ///
+    /// Some(0) means serving such stale is disallowed by a directive like `must-revalidate`
+    /// or `stale-if-error=0`.
+    ///
+    /// `None` indicates no SIE ttl was specified.
+    fn serve_stale_if_error_sec(&self) -> Option<u32>;
+
+    /// Strip header names listed in `private` or `no-cache` directives from a response,
+    /// usually prior to storing that response in cache.
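+    ///
+    /// For example (illustrative): given `Cache-Control: private="set-cookie"`,
+    /// an implementation is expected to remove the `Set-Cookie` header from the
+    /// response before it is written to cache.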
+ fn strip_private_headers(&self, resp_header: &mut ResponseHeader); +} + +#[cfg(test)] +mod tests { + use super::*; + use http::header::CACHE_CONTROL; + use http::HeaderValue; + use http::{request, response}; + + fn build_response(cc_key: HeaderName, cc_value: &str) -> response::Parts { + let (parts, _) = response::Builder::new() + .header(cc_key, cc_value) + .body(()) + .unwrap() + .into_parts(); + parts + } + + #[test] + fn test_simple_cache_control() { + let resp = build_response(CACHE_CONTROL, "public, max-age=10000"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(cc.public()); + assert_eq!(cc.max_age().unwrap().unwrap(), 10000); + } + + #[test] + fn test_private_cache_control() { + let resp = build_response(CACHE_CONTROL, "private"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + + assert!(cc.private()); + assert!(cc.max_age().unwrap().is_none()); + } + + #[test] + fn test_directives_across_header_lines() { + let (parts, _) = response::Builder::new() + .header(CACHE_CONTROL, "public,") + .header("cache-Control", "max-age=10000") + .body(()) + .unwrap() + .into_parts(); + let cc = CacheControl::from_resp_headers(&parts).unwrap(); + + assert!(cc.public()); + assert_eq!(cc.max_age().unwrap().unwrap(), 10000); + } + + #[test] + fn test_recognizes_semicolons_as_delimiters() { + let resp = build_response(CACHE_CONTROL, "public; max-age=0"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + + assert!(cc.public()); + assert_eq!(cc.max_age().unwrap().unwrap(), 0); + } + + #[test] + fn test_unknown_directives() { + let resp = build_response(CACHE_CONTROL, "public,random1=random2, rand3=\"\""); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + let mut directive_iter = cc.directives.iter(); + + let first = directive_iter.next().unwrap(); + assert_eq!(first.0, &"public"); + assert!(first.1.is_none()); + + let second = directive_iter.next().unwrap(); + assert_eq!(second.0, &"random1"); + assert_eq!(second.1.as_ref().unwrap().0, "random2".as_bytes()); + + let third = directive_iter.next().unwrap(); + assert_eq!(third.0, &"rand3"); + assert_eq!(third.1.as_ref().unwrap().0, "\"\"".as_bytes()); + + assert!(directive_iter.next().is_none()); + } + + #[test] + fn test_case_insensitive_directive_keys() { + let resp = build_response( + CACHE_CONTROL, + "Public=\"something\", mAx-AGe=\"10000\", foo=cRaZyCaSe, bAr=\"inQuotes\"", + ); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + + assert!(cc.public()); + assert_eq!(cc.max_age().unwrap().unwrap(), 10000); + + let mut directive_iter = cc.directives.iter(); + let first = directive_iter.next().unwrap(); + assert_eq!(first.0, &"public"); + assert_eq!(first.1.as_ref().unwrap().0, "\"something\"".as_bytes()); + + let second = directive_iter.next().unwrap(); + assert_eq!(second.0, &"max-age"); + assert_eq!(second.1.as_ref().unwrap().0, "\"10000\"".as_bytes()); + + // values are still stored with casing + let third = directive_iter.next().unwrap(); + assert_eq!(third.0, &"foo"); + assert_eq!(third.1.as_ref().unwrap().0, "cRaZyCaSe".as_bytes()); + + let fourth = directive_iter.next().unwrap(); + assert_eq!(fourth.0, &"bar"); + assert_eq!(fourth.1.as_ref().unwrap().0, "\"inQuotes\"".as_bytes()); + + assert!(directive_iter.next().is_none()); + } + + #[test] + fn test_non_ascii() { + let resp = build_response(CACHE_CONTROL, "püblic=💖, max-age=\"💯\""); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + + // Not considered valid registered directive keys / values + 
assert!(!cc.public()); + assert_eq!( + cc.max_age().unwrap_err().context.unwrap().to_string(), + "could not parse value as u32" + ); + + let mut directive_iter = cc.directives.iter(); + let first = directive_iter.next().unwrap(); + assert_eq!(first.0, &"püblic"); + assert_eq!(first.1.as_ref().unwrap().0, "💖".as_bytes()); + + let second = directive_iter.next().unwrap(); + assert_eq!(second.0, &"max-age"); + assert_eq!(second.1.as_ref().unwrap().0, "\"💯\"".as_bytes()); + + assert!(directive_iter.next().is_none()); + } + + #[test] + fn test_non_utf8_key() { + let mut resp = response::Builder::new().body(()).unwrap(); + resp.headers_mut().insert( + CACHE_CONTROL, + HeaderValue::from_bytes(b"bar\xFF=\"baz\", a=b").unwrap(), + ); + let (parts, _) = resp.into_parts(); + let cc = CacheControl::from_resp_headers(&parts).unwrap(); + + // invalid bytes for key + let mut directive_iter = cc.directives.iter(); + let first = directive_iter.next().unwrap(); + assert_eq!(first.0, &"a"); + assert_eq!(first.1.as_ref().unwrap().0, "b".as_bytes()); + + assert!(directive_iter.next().is_none()); + } + + #[test] + fn test_non_utf8_value() { + // RFC 7230: 0xFF is part of obs-text and is officially considered a valid octet in quoted-strings + let mut resp = response::Builder::new().body(()).unwrap(); + resp.headers_mut().insert( + CACHE_CONTROL, + HeaderValue::from_bytes(b"max-age=ba\xFFr, bar=\"baz\xFF\", a=b").unwrap(), + ); + let (parts, _) = resp.into_parts(); + let cc = CacheControl::from_resp_headers(&parts).unwrap(); + + assert_eq!( + cc.max_age().unwrap_err().context.unwrap().to_string(), + "could not parse value as utf8" + ); + + let mut directive_iter = cc.directives.iter(); + + let first = directive_iter.next().unwrap(); + assert_eq!(first.0, &"max-age"); + assert_eq!(first.1.as_ref().unwrap().0, b"ba\xFFr"); + + let second = directive_iter.next().unwrap(); + assert_eq!(second.0, &"bar"); + assert_eq!(second.1.as_ref().unwrap().0, b"\"baz\xFF\""); + + let third = directive_iter.next().unwrap(); + assert_eq!(third.0, &"a"); + assert_eq!(third.1.as_ref().unwrap().0, "b".as_bytes()); + + assert!(directive_iter.next().is_none()); + } + + #[test] + fn test_age_overflow() { + let resp = build_response( + CACHE_CONTROL, + "max-age=-99999999999999999999999999, s-maxage=99999999999999999999999999", + ); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + + assert_eq!( + cc.s_maxage().unwrap().unwrap(), + DELTA_SECONDS_OVERFLOW_VALUE + ); + // negative ages still result in errors even with overflow handling + assert_eq!( + cc.max_age().unwrap_err().context.unwrap().to_string(), + "could not parse value as u32" + ); + } + + #[test] + fn test_fresh_sec() { + let resp = build_response(CACHE_CONTROL, ""); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(cc.fresh_sec().is_none()); + + let resp = build_response(CACHE_CONTROL, "max-age=12345"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert_eq!(cc.fresh_sec().unwrap(), 12345); + + let resp = build_response(CACHE_CONTROL, "max-age=99999,s-maxage=123"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + // prefer s-maxage over max-age + assert_eq!(cc.fresh_sec().unwrap(), 123); + } + + #[test] + fn test_cacheability() { + let resp = build_response(CACHE_CONTROL, ""); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert_eq!(cc.is_cacheable(), Cacheable::Default); + + // uncacheable + let resp = build_response(CACHE_CONTROL, "private, max-age=12345"); + let cc = 
CacheControl::from_resp_headers(&resp).unwrap(); + assert_eq!(cc.is_cacheable(), Cacheable::No); + + let resp = build_response(CACHE_CONTROL, "no-store, max-age=12345"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert_eq!(cc.is_cacheable(), Cacheable::No); + + // cacheable + let resp = build_response(CACHE_CONTROL, "public"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert_eq!(cc.is_cacheable(), Cacheable::Yes); + + let resp = build_response(CACHE_CONTROL, "max-age=0"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert_eq!(cc.is_cacheable(), Cacheable::Yes); + } + + #[test] + fn test_no_cache() { + let resp = build_response(CACHE_CONTROL, "no-cache, max-age=12345"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert_eq!(cc.is_cacheable(), Cacheable::Yes); + assert_eq!(cc.fresh_sec().unwrap(), 0); + } + + #[test] + fn test_no_cache_field_names() { + let resp = build_response(CACHE_CONTROL, "no-cache=\"set-cookie\", max-age=12345"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(!cc.private()); + assert_eq!(cc.is_cacheable(), Cacheable::Yes); + assert_eq!(cc.fresh_sec().unwrap(), 12345); + let mut field_names = cc.no_cache_field_names().unwrap(); + assert_eq!( + str::from_utf8(field_names.next().unwrap()).unwrap(), + "set-cookie" + ); + assert!(field_names.next().is_none()); + + let mut resp = response::Builder::new().body(()).unwrap(); + resp.headers_mut().insert( + CACHE_CONTROL, + HeaderValue::from_bytes( + b"private=\"\", no-cache=\"a\xFF, set-cookie, Baz\x09 , c,d ,, \"", + ) + .unwrap(), + ); + let (parts, _) = resp.into_parts(); + let cc = CacheControl::from_resp_headers(&parts).unwrap(); + let mut field_names = cc.private_field_names().unwrap(); + assert_eq!(str::from_utf8(field_names.next().unwrap()).unwrap(), ""); + assert!(field_names.next().is_none()); + let mut field_names = cc.no_cache_field_names().unwrap(); + assert!(str::from_utf8(field_names.next().unwrap()).is_err()); + assert_eq!( + str::from_utf8(field_names.next().unwrap()).unwrap(), + "set-cookie" + ); + assert_eq!(str::from_utf8(field_names.next().unwrap()).unwrap(), "Baz"); + assert_eq!(str::from_utf8(field_names.next().unwrap()).unwrap(), "c"); + assert_eq!(str::from_utf8(field_names.next().unwrap()).unwrap(), "d"); + assert_eq!(str::from_utf8(field_names.next().unwrap()).unwrap(), ""); + assert_eq!(str::from_utf8(field_names.next().unwrap()).unwrap(), ""); + assert!(field_names.next().is_none()); + } + + #[test] + fn test_strip_private_headers() { + let mut resp = ResponseHeader::build(200, None).unwrap(); + resp.append_header( + CACHE_CONTROL, + "no-cache=\"x-private-header\", max-age=12345", + ) + .unwrap(); + resp.append_header("X-Private-Header", "dropped").unwrap(); + + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + cc.strip_private_headers(&mut resp); + assert!(!resp.headers.contains_key("X-Private-Header")); + } + + #[test] + fn test_stale_while_revalidate() { + let resp = build_response(CACHE_CONTROL, "max-age=12345, stale-while-revalidate=5"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert_eq!(cc.stale_while_revalidate().unwrap().unwrap(), 5); + assert_eq!(cc.serve_stale_while_revalidate_sec().unwrap(), 5); + assert!(cc.serve_stale_if_error_sec().is_none()); + } + + #[test] + fn test_stale_if_error() { + let resp = build_response(CACHE_CONTROL, "max-age=12345, stale-if-error=3600"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + 
assert_eq!(cc.stale_if_error().unwrap().unwrap(), 3600); + assert_eq!(cc.serve_stale_if_error_sec().unwrap(), 3600); + assert!(cc.serve_stale_while_revalidate_sec().is_none()); + } + + #[test] + fn test_must_revalidate() { + let resp = build_response( + CACHE_CONTROL, + "max-age=12345, stale-while-revalidate=60, stale-if-error=30, must-revalidate", + ); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(cc.must_revalidate()); + assert_eq!(cc.stale_while_revalidate().unwrap().unwrap(), 60); + assert_eq!(cc.stale_if_error().unwrap().unwrap(), 30); + assert_eq!(cc.serve_stale_while_revalidate_sec().unwrap(), 0); + assert_eq!(cc.serve_stale_if_error_sec().unwrap(), 0); + } + + #[test] + fn test_proxy_revalidate() { + let resp = build_response( + CACHE_CONTROL, + "max-age=12345, stale-while-revalidate=60, stale-if-error=30, proxy-revalidate", + ); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(cc.proxy_revalidate()); + assert_eq!(cc.stale_while_revalidate().unwrap().unwrap(), 60); + assert_eq!(cc.stale_if_error().unwrap().unwrap(), 30); + assert_eq!(cc.serve_stale_while_revalidate_sec().unwrap(), 0); + assert_eq!(cc.serve_stale_if_error_sec().unwrap(), 0); + } + + #[test] + fn test_s_maxage_stale() { + let resp = build_response( + CACHE_CONTROL, + "s-maxage=0, stale-while-revalidate=60, stale-if-error=30", + ); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert_eq!(cc.stale_while_revalidate().unwrap().unwrap(), 60); + assert_eq!(cc.stale_if_error().unwrap().unwrap(), 30); + assert_eq!(cc.serve_stale_while_revalidate_sec().unwrap(), 0); + assert_eq!(cc.serve_stale_if_error_sec().unwrap(), 0); + } + + #[test] + fn test_authorized_request() { + let resp = build_response(CACHE_CONTROL, "max-age=10"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(!cc.allow_caching_authorized_req()); + + let resp = build_response(CACHE_CONTROL, "s-maxage=10"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(cc.allow_caching_authorized_req()); + + let resp = build_response(CACHE_CONTROL, "public"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(cc.allow_caching_authorized_req()); + + let resp = build_response(CACHE_CONTROL, "must-revalidate, max-age=0"); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(cc.allow_caching_authorized_req()); + + let resp = build_response(CACHE_CONTROL, ""); + let cc = CacheControl::from_resp_headers(&resp).unwrap(); + assert!(!cc.allow_caching_authorized_req()); + } + + fn build_request(cc_key: HeaderName, cc_value: &str) -> request::Parts { + let (parts, _) = request::Builder::new() + .header(cc_key, cc_value) + .body(()) + .unwrap() + .into_parts(); + parts + } + + #[test] + fn test_request_only_if_cached() { + let req = build_request(CACHE_CONTROL, "only-if-cached=1"); + let cc = CacheControl::from_req_headers(&req).unwrap(); + assert!(cc.only_if_cached()) + } +} diff --git a/pingora-cache/src/eviction/lru.rs b/pingora-cache/src/eviction/lru.rs new file mode 100644 index 0000000..47c88a6 --- /dev/null +++ b/pingora-cache/src/eviction/lru.rs @@ -0,0 +1,431 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! A shared LRU cache manager
+
+use super::EvictionManager;
+use crate::key::CompactCacheKey;
+
+use async_trait::async_trait;
+use pingora_error::{BError, ErrorType::*, OrErr, Result};
+use pingora_lru::Lru;
+use serde::de::SeqAccess;
+use serde::{Deserialize, Serialize};
+use std::fs::File;
+use std::hash::{Hash, Hasher};
+use std::io::prelude::*;
+use std::path::Path;
+use std::time::SystemTime;
+
+/// A shared LRU cache manager designed to manage a large volume of assets.
+///
+/// - Space-optimized in-memory LRU (see [pingora_lru]).
+/// - Instead of a single giant LRU, this struct shards the assets into `N` independent LRUs.
+/// This allows [EvictionManager::save()] not to lock the entire cache manager while performing
+/// serialization.
+pub struct Manager<const N: usize>(Lru<CompactCacheKey, N>);
+
+#[derive(Debug, Serialize, Deserialize)]
+struct SerdeHelperNode(CompactCacheKey, usize);
+
+impl<const N: usize> Manager<N> {
+    /// Create a [Manager] with the given size limit and estimated per-shard capacity.
+    ///
+    /// The `capacity` is for preallocating to avoid reallocation cost when the LRU grows.
+    pub fn with_capacity(limit: usize, capacity: usize) -> Self {
+        Manager(Lru::with_capacity(limit, capacity))
+    }
+
+    /// Serialize the given shard
+    pub fn serialize_shard(&self, shard: usize) -> Result<Vec<u8>> {
+        use rmp_serde::encode::Serializer;
+        use serde::ser::SerializeSeq;
+        use serde::ser::Serializer as _;
+
+        assert!(shard < N);
+
+        // NOTE: This could use a lot of memory to buffer the serialized data in memory
+        // NOTE: This for loop could lock the LRU for too long
+        let mut nodes = Vec::with_capacity(self.0.shard_len(shard));
+        self.0.iter_for_each(shard, |(node, size)| {
+            nodes.push(SerdeHelperNode(node.clone(), size));
+        });
+        let mut ser = Serializer::new(vec![]);
+        let mut seq = ser
+            .serialize_seq(Some(self.0.shard_len(shard)))
+            .or_err(InternalError, "fail to serialize node")?;
+        for node in nodes {
+            seq.serialize_element(&node).unwrap(); // write to vec, safe
+        }
+
+        seq.end().or_err(InternalError, "when serializing LRU")?;
+        Ok(ser.into_inner())
+    }
+
+    /// Deserialize a shard
+    ///
+    /// The shard number is not needed because the key itself will hash to the correct shard.
+    pub fn deserialize_shard(&self, buf: &[u8]) -> Result<()> {
+        use rmp_serde::decode::Deserializer;
+        use serde::de::Deserializer as _;
+
+        let mut de = Deserializer::new(buf);
+        let visitor = InsertToManager { lru: self };
+        de.deserialize_seq(visitor)
+            .or_err(InternalError, "when deserializing LRU")?;
+        Ok(())
+    }
+}
+
+struct InsertToManager<'a, const N: usize> {
+    lru: &'a Manager<N>,
+}
+
+impl<'de, 'a, const N: usize> serde::de::Visitor<'de> for InsertToManager<'a, N> {
+    type Value = ();
+
+    fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
+        formatter.write_str("array of lru nodes")
+    }
+
+    fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
+    where
+        A: SeqAccess<'de>,
+    {
+        while let Some(node) = seq.next_element::<SerdeHelperNode>()? {
+            let key = u64key(&node.0);
+            self.lru.0.insert_tail(key, node.0, node.1); // insert in the back
+        }
+        Ok(())
+    }
+}
+
+#[inline]
+fn u64key(key: &CompactCacheKey) -> u64 {
+    // NOTE: the std hasher does not guarantee uniform output; it is unclear whether
+    // ahash shares that caveat
+    let mut hasher = ahash::AHasher::default();
+    key.hash(&mut hasher);
+    hasher.finish()
+}
+
+const FILE_NAME: &str = "lru.data";
+
+#[inline]
+fn err_str_path(s: &str, path: &Path) -> String {
+    format!("{s} {}", path.display())
+}
+
+#[async_trait]
+impl<const N: usize> EvictionManager for Manager<N> {
+    fn total_size(&self) -> usize {
+        self.0.weight()
+    }
+    fn total_items(&self) -> usize {
+        self.0.len()
+    }
+    fn evicted_size(&self) -> usize {
+        self.0.evicted_weight()
+    }
+    fn evicted_items(&self) -> usize {
+        self.0.evicted_len()
+    }
+
+    fn admit(
+        &self,
+        item: CompactCacheKey,
+        size: usize,
+        _fresh_until: SystemTime,
+    ) -> Vec<CompactCacheKey> {
+        let key = u64key(&item);
+        self.0.admit(key, item, size);
+        self.0
+            .evict_to_limit()
+            .into_iter()
+            .map(|(key, _weight)| key)
+            .collect()
+    }
+
+    fn remove(&self, item: &CompactCacheKey) {
+        let key = u64key(item);
+        self.0.remove(key);
+    }
+
+    fn access(&self, item: &CompactCacheKey, size: usize, _fresh_until: SystemTime) -> bool {
+        let key = u64key(item);
+        if !self.0.promote(key) {
+            self.0.admit(key, item.clone(), size);
+            false
+        } else {
+            true
+        }
+    }
+
+    fn peek(&self, item: &CompactCacheKey) -> bool {
+        let key = u64key(item);
+        self.0.peek(key)
+    }
+
+    async fn save(&self, dir_path: &str) -> Result<()> {
+        let dir_path_str = dir_path.to_owned();
+
+        tokio::task::spawn_blocking(move || {
+            let dir_path = Path::new(&dir_path_str);
+            std::fs::create_dir_all(dir_path)
+                .or_err_with(InternalError, || err_str_path("fail to create", dir_path))
+        })
+        .await
+        .or_err(InternalError, "async blocking IO failure")??;
+
+        for i in 0..N {
+            let data = self.serialize_shard(i)?;
+            let dir_path = dir_path.to_owned();
+            tokio::task::spawn_blocking(move || {
+                let file_path = Path::new(&dir_path).join(format!("{}.{i}", FILE_NAME));
+                let mut file = File::create(&file_path)
+                    .or_err_with(InternalError, || err_str_path("fail to create", &file_path))?;
+                file.write_all(&data).or_err_with(InternalError, || {
+                    err_str_path("fail to write to", &file_path)
+                })
+            })
+            .await
+            .or_err(InternalError, "async blocking IO failure")??;
+        }
+        Ok(())
+    }
+
+    async fn load(&self, dir_path: &str) -> Result<()> {
+        // TODO: check the saved shards so that we load all the saved files
+        for i in 0..N {
+            let dir_path = dir_path.to_owned();
+
+            let data = tokio::task::spawn_blocking(move || {
+                let file_path = Path::new(&dir_path).join(format!("{}.{i}", FILE_NAME));
+                let mut file = File::open(&file_path)
+                    .or_err_with(InternalError, || err_str_path("fail to open", &file_path))?;
+                let mut buffer = Vec::with_capacity(8192);
+                file.read_to_end(&mut buffer)
+                    .or_err_with(InternalError, || {
+                        err_str_path("fail to read from", &file_path)
+                    })?;
+                Ok::<Vec<u8>, BError>(buffer)
+            })
+            .await
+            .or_err(InternalError, "async blocking IO failure")??;
+            self.deserialize_shard(&data)?;
+        }
+
+        Ok(())
+    }
+}
+
+#[cfg(test)]
+mod test {
+    use super::*;
+    use crate::CacheKey;
+    use EvictionManager;
+
+    // we use shard (N) = 1 for eviction consistency in all tests
+
+    #[test]
+    fn test_admission() {
+        let lru = Manager::<1>::with_capacity(4, 10);
+        let key1 = CacheKey::new("", "a", "1").to_compact();
+        let until = SystemTime::now(); // unused value as a placeholder
+        let v = lru.admit(key1.clone(), 1, until);
+        assert_eq!(v.len(), 0);
+        let key2 = CacheKey::new("", "b", "1").to_compact();
+        let v = lru.admit(key2.clone(), 2, until);
+        assert_eq!(v.len(), 0);
+        let key3 = CacheKey::new("", "c", "1").to_compact();
+        let v = lru.admit(key3, 1, until);
+        assert_eq!(v.len(), 0);
+
+        // lru is full (4) now
+
+        let key4 = CacheKey::new("", "d", "1").to_compact();
+        let v = lru.admit(key4, 2, until);
+        // need to reduce used by at least 2, both key1 and key2 are evicted, freeing 3
+        assert_eq!(v.len(), 2);
+        assert_eq!(v[0], key1);
+        assert_eq!(v[1], key2);
+    }
+
+    #[test]
+    fn test_access() {
+        let lru = Manager::<1>::with_capacity(4, 10);
+        let key1 = CacheKey::new("", "a", "1").to_compact();
+        let until = SystemTime::now(); // unused value as a placeholder
+        let v = lru.admit(key1.clone(), 1, until);
+        assert_eq!(v.len(), 0);
+        let key2 = CacheKey::new("", "b", "1").to_compact();
+        let v = lru.admit(key2.clone(), 2, until);
+        assert_eq!(v.len(), 0);
+        let key3 = CacheKey::new("", "c", "1").to_compact();
+        let v = lru.admit(key3, 1, until);
+        assert_eq!(v.len(), 0);
+
+        // lru is full (4) now
+        // make key1 most recently used
+        lru.access(&key1, 1, until);
+        assert_eq!(v.len(), 0);
+
+        let key4 = CacheKey::new("", "d", "1").to_compact();
+        let v = lru.admit(key4, 2, until);
+        assert_eq!(v.len(), 1);
+        assert_eq!(v[0], key2);
+    }
+
+    #[test]
+    fn test_remove() {
+        let lru = Manager::<1>::with_capacity(4, 10);
+        let key1 = CacheKey::new("", "a", "1").to_compact();
+        let until = SystemTime::now(); // unused value as a placeholder
+        let v = lru.admit(key1.clone(), 1, until);
+        assert_eq!(v.len(), 0);
+        let key2 = CacheKey::new("", "b", "1").to_compact();
+        let v = lru.admit(key2.clone(), 2, until);
+        assert_eq!(v.len(), 0);
+        let key3 = CacheKey::new("", "c", "1").to_compact();
+        let v = lru.admit(key3, 1, until);
+        assert_eq!(v.len(), 0);
+
+        // lru is full (4) now
+        // remove key1
+        lru.remove(&key1);
+
+        // key2 is the least recently used one now
+        let key4 = CacheKey::new("", "d", "1").to_compact();
+        let v = lru.admit(key4, 2, until);
+        assert_eq!(v.len(), 1);
+        assert_eq!(v[0], key2);
+    }
+
+    #[test]
+    fn test_access_add() {
+        let lru = Manager::<1>::with_capacity(4, 10);
+        let until = SystemTime::now(); // unused value as a placeholder
+
+        let key1 = CacheKey::new("", "a", "1").to_compact();
+        lru.access(&key1, 1, until);
+        let key2 = CacheKey::new("", "b", "1").to_compact();
+        lru.access(&key2, 2, until);
+        let key3 = CacheKey::new("", "c", "1").to_compact();
+        lru.access(&key3, 2, until);
+
+        let key4 = CacheKey::new("", "d", "1").to_compact();
+        let v = lru.admit(key4, 2, until);
+        // need to reduce used by at least 2, both key1 and key2 are evicted, freeing 3
+        assert_eq!(v.len(), 2);
+        assert_eq!(v[0], key1);
+        assert_eq!(v[1], key2);
+    }
+
+    #[test]
+    fn test_admit_update() {
+        let lru = Manager::<1>::with_capacity(4, 10);
+        let key1 = CacheKey::new("", "a", "1").to_compact();
+        let until = SystemTime::now(); // unused value as a placeholder
+        let v = lru.admit(key1.clone(), 1, until);
+        assert_eq!(v.len(), 0);
+        let key2 = CacheKey::new("", "b", "1").to_compact();
+        let v = lru.admit(key2.clone(), 2, until);
+        assert_eq!(v.len(), 0);
+        let key3 = CacheKey::new("", "c", "1").to_compact();
+        let v = lru.admit(key3, 1, until);
+        assert_eq!(v.len(), 0);
+
+        // lru is full (4) now
+        // update key2 to reduce its size by 1
+        let v = lru.admit(key2, 1, until);
+        assert_eq!(v.len(), 0);
+
+        // lru is not full anymore
+        let key4 = CacheKey::new("", "d", "1").to_compact();
+        let v =
lru.admit(key4.clone(), 1, until); + assert_eq!(v.len(), 0); + + // make key4 larger + let v = lru.admit(key4, 2, until); + // need to evict now + assert_eq!(v.len(), 1); + assert_eq!(v[0], key1); + } + + #[test] + fn test_peek() { + let lru = Manager::<1>::with_capacity(4, 10); + let until = SystemTime::now(); // unused value as a placeholder + + let key1 = CacheKey::new("", "a", "1").to_compact(); + lru.access(&key1, 1, until); + let key2 = CacheKey::new("", "b", "1").to_compact(); + lru.access(&key2, 2, until); + assert!(lru.peek(&key1)); + assert!(lru.peek(&key2)); + } + + #[test] + fn test_serde() { + let lru = Manager::<1>::with_capacity(4, 10); + let key1 = CacheKey::new("", "a", "1").to_compact(); + let until = SystemTime::now(); // unused value as a placeholder + let v = lru.admit(key1.clone(), 1, until); + assert_eq!(v.len(), 0); + let key2 = CacheKey::new("", "b", "1").to_compact(); + let v = lru.admit(key2.clone(), 2, until); + assert_eq!(v.len(), 0); + let key3 = CacheKey::new("", "c", "1").to_compact(); + let v = lru.admit(key3, 1, until); + assert_eq!(v.len(), 0); + + // lru is full (4) now + // make key1 most recently used + lru.access(&key1, 1, until); + assert_eq!(v.len(), 0); + + // load lru2 with lru's data + let ser = lru.serialize_shard(0).unwrap(); + let lru2 = Manager::<1>::with_capacity(4, 10); + lru2.deserialize_shard(&ser).unwrap(); + + let key4 = CacheKey::new("", "d", "1").to_compact(); + let v = lru2.admit(key4, 2, until); + assert_eq!(v.len(), 1); + assert_eq!(v[0], key2); + } + + #[tokio::test] + async fn test_save_to_disk() { + let until = SystemTime::now(); // unused value as a placeholder + let lru = Manager::<2>::with_capacity(10, 10); + + lru.admit(CacheKey::new("", "a", "1").to_compact(), 1, until); + lru.admit(CacheKey::new("", "b", "1").to_compact(), 2, until); + lru.admit(CacheKey::new("", "c", "1").to_compact(), 1, until); + lru.admit(CacheKey::new("", "d", "1").to_compact(), 1, until); + lru.admit(CacheKey::new("", "e", "1").to_compact(), 2, until); + lru.admit(CacheKey::new("", "f", "1").to_compact(), 1, until); + + // load lru2 with lru's data + lru.save("/tmp/test_lru_save").await.unwrap(); + let lru2 = Manager::<2>::with_capacity(4, 10); + lru2.load("/tmp/test_lru_save").await.unwrap(); + + let ser0 = lru.serialize_shard(0).unwrap(); + let ser1 = lru.serialize_shard(1).unwrap(); + + assert_eq!(ser0, lru2.serialize_shard(0).unwrap()); + assert_eq!(ser1, lru2.serialize_shard(1).unwrap()); + } +} diff --git a/pingora-cache/src/eviction/mod.rs b/pingora-cache/src/eviction/mod.rs new file mode 100644 index 0000000..7c9d60b --- /dev/null +++ b/pingora-cache/src/eviction/mod.rs @@ -0,0 +1,89 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
 Cache eviction module
+
+use crate::key::CompactCacheKey;
+
+use async_trait::async_trait;
+use pingora_error::Result;
+use std::time::SystemTime;
+
+pub mod lru;
+pub mod simple_lru;
+
+/// The trait that a cache eviction algorithm needs to implement
+///
+/// NOTE: these trait methods require &self not &mut self, which means concurrency should
+/// be handled by the implementations internally.
+#[async_trait]
+pub trait EvictionManager {
+    /// Total size of the cache in bytes tracked by this eviction manager
+    fn total_size(&self) -> usize;
+    /// Number of assets tracked by this eviction manager
+    fn total_items(&self) -> usize;
+    /// Number of bytes that are already evicted
+    ///
+    /// The accumulated number is returned to play well with the Prometheus counter metric type.
+    fn evicted_size(&self) -> usize;
+    /// Number of assets that are already evicted
+    ///
+    /// The accumulated number is returned to play well with the Prometheus counter metric type.
+    fn evicted_items(&self) -> usize;
+
+    /// Admit an item
+    ///
+    /// Return zero or more items to evict. The sizes of these items are already deducted
+    /// from the total size. The caller needs to make sure that these assets are actually
+    /// removed from the storage.
+    ///
+    /// If the item is already admitted: A. its freshness is updated; B. if the new size is larger
+    /// than the existing one, a non-empty list might be returned for the caller to evict.
+    fn admit(
+        &self,
+        item: CompactCacheKey,
+        size: usize,
+        fresh_until: SystemTime,
+    ) -> Vec<CompactCacheKey>;
+
+    /// Remove an item from the eviction manager.
+    ///
+    /// The size of the item will be deducted.
+    fn remove(&self, item: &CompactCacheKey);
+
+    /// Access an item that should already be in cache. Returns whether the item was
+    /// already tracked by this manager.
+    ///
+    /// If the item is not tracked by this [EvictionManager], track it, but no eviction will happen.
+    ///
+    /// This call is used to ask the eviction manager to track the assets that are already admitted
+    /// in the cache storage system.
+    fn access(&self, item: &CompactCacheKey, size: usize, fresh_until: SystemTime) -> bool;
+
+    /// Peek into the manager to see if the item is already tracked by the system
+    ///
+    /// This function should have no side effect on the asset itself. For example, for LRU, this
+    /// method shouldn't change the popularity of the asset being peeked.
+    fn peek(&self, item: &CompactCacheKey) -> bool;
+
+    /// Serialize to save the state of this eviction manager to disk
+    ///
+    /// This function is for preserving the eviction manager's state across server restarts.
+    ///
+    /// `dir_path` defines the directory on disk that the data should use.
+    // dir_path is &str not AsRef<Path> so that trait objects can be used
+    async fn save(&self, dir_path: &str) -> Result<()>;
+
+    /// The counterpart of [Self::save()].
+    async fn load(&self, dir_path: &str) -> Result<()>;
+}
diff --git a/pingora-cache/src/eviction/simple_lru.rs b/pingora-cache/src/eviction/simple_lru.rs
new file mode 100644
index 0000000..73efb85
--- /dev/null
+++ b/pingora-cache/src/eviction/simple_lru.rs
@@ -0,0 +1,445 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! A simple LRU cache manager built on top of the `lru` crate
+
+use super::EvictionManager;
+use crate::key::CompactCacheKey;
+
+use async_trait::async_trait;
+use lru::LruCache;
+use parking_lot::RwLock;
+use pingora_error::{BError, ErrorType::*, OrErr, Result};
+use serde::de::SeqAccess;
+use serde::{Deserialize, Serialize};
+use std::collections::hash_map::DefaultHasher;
+use std::fs::File;
+use std::hash::{Hash, Hasher};
+use std::io::prelude::*;
+use std::path::Path;
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::time::SystemTime;
+
+#[derive(Debug, Deserialize, Serialize)]
+struct Node {
+    key: CompactCacheKey,
+    size: usize,
+}
+
+/// A simple LRU eviction manager
+///
+/// The implementation is not optimized. All operations require a global lock.
+pub struct Manager {
+    lru: RwLock<LruCache<u64, Node>>,
+    limit: usize,
+    used: AtomicUsize,
+    items: AtomicUsize,
+    evicted_size: AtomicUsize,
+    evicted_items: AtomicUsize,
+}
+
+impl Manager {
+    /// Create a new [Manager] with the given total size limit `limit`.
+    pub fn new(limit: usize) -> Self {
+        Manager {
+            lru: RwLock::new(LruCache::unbounded()),
+            limit,
+            used: AtomicUsize::new(0),
+            items: AtomicUsize::new(0),
+            evicted_size: AtomicUsize::new(0),
+            evicted_items: AtomicUsize::new(0),
+        }
+    }
+
+    fn insert(&self, hash_key: u64, node: CompactCacheKey, size: usize, reverse: bool) {
+        use std::cmp::Ordering::*;
+        let node = Node { key: node, size };
+        let old = {
+            let mut lru = self.lru.write();
+            let old = lru.push(hash_key, node);
+            if reverse && old.is_none() {
+                lru.demote(&hash_key);
+            }
+            old
+        };
+        if let Some(old) = old {
+            // replacing a node, just need to update used size
+            match size.cmp(&old.1.size) {
+                Greater => self.used.fetch_add(size - old.1.size, Ordering::Relaxed),
+                Less => self.used.fetch_sub(old.1.size - size, Ordering::Relaxed),
+                Equal => 0, // same size, update nothing, use 0 to match other arms' type
+            };
+        } else {
+            self.used.fetch_add(size, Ordering::Relaxed);
+            self.items.fetch_add(1, Ordering::Relaxed);
+        }
+    }
+
+    // evict items until the used capacity is below the limit
+    fn evict(&self) -> Vec<CompactCacheKey> {
+        if self.used.load(Ordering::Relaxed) <= self.limit {
+            return vec![];
+        }
+        let mut to_evict = Vec::with_capacity(1); // we will at least pop 1 item
+        while self.used.load(Ordering::Relaxed) > self.limit {
+            if let Some((_, node)) = self.lru.write().pop_lru() {
+                self.used.fetch_sub(node.size, Ordering::Relaxed);
+                self.items.fetch_sub(1, Ordering::Relaxed);
+                self.evicted_size.fetch_add(node.size, Ordering::Relaxed);
+                self.evicted_items.fetch_add(1, Ordering::Relaxed);
+                to_evict.push(node.key);
+            } else {
+                // lru empty
+                return to_evict;
+            }
+        }
+        to_evict
+    }
+
+    // This could use a lot of memory to buffer the serialized data in memory and could lock the
+    // LRU for too long
+    fn serialize(&self) -> Result<Vec<u8>> {
+        use rmp_serde::encode::Serializer;
+        use serde::ser::SerializeSeq;
+        use serde::ser::Serializer as _;
+        // NOTE: This could use a lot of memory to buffer the serialized data in memory
+        let mut ser = Serializer::new(vec![]);
+        // NOTE: This long for loop could lock the LRU for too long
+        let lru = self.lru.read();
+        let mut seq = ser
+            .serialize_seq(Some(lru.len()))
+            .or_err(InternalError, "fail to serialize node")?;
+        for item in lru.iter() {
+            seq.serialize_element(item.1).unwrap(); // write to vec, safe
+        }
+        seq.end().or_err(InternalError, "when serializing LRU")?;
+        Ok(ser.into_inner())
+    }
+
+    fn deserialize(&self, buf: &[u8]) -> Result<()> {
+        use rmp_serde::decode::Deserializer;
+        use serde::de::Deserializer as _;
+        let mut de = Deserializer::new(buf);
+        let visitor = InsertToManager { lru: self };
+        de.deserialize_seq(visitor)
+            .or_err(InternalError, "when deserializing LRU")?;
+        Ok(())
+    }
+}
+
+struct InsertToManager<'a> {
+    lru: &'a Manager,
+}
+
+impl<'de, 'a> serde::de::Visitor<'de> for InsertToManager<'a> {
+    type Value = ();
+
+    fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
+        formatter.write_str("array of lru nodes")
+    }
+
+    fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
+    where
+        A: SeqAccess<'de>,
+    {
+        while let Some(node) = seq.next_element::<Node>()? {
+            let key = u64key(&node.key);
+            self.lru.insert(key, node.key, node.size, true); // insert in the back
+        }
+        Ok(())
+    }
+}
+
+#[inline]
+fn u64key(key: &CompactCacheKey) -> u64 {
+    let mut hasher = DefaultHasher::new();
+    key.hash(&mut hasher);
+    hasher.finish()
+}
+
+const FILE_NAME: &str = "simple_lru.data";
+
+#[async_trait]
+impl EvictionManager for Manager {
+    fn total_size(&self) -> usize {
+        self.used.load(Ordering::Relaxed)
+    }
+    fn total_items(&self) -> usize {
+        self.items.load(Ordering::Relaxed)
+    }
+    fn evicted_size(&self) -> usize {
+        self.evicted_size.load(Ordering::Relaxed)
+    }
+    fn evicted_items(&self) -> usize {
+        self.evicted_items.load(Ordering::Relaxed)
+    }
+
+    fn admit(
+        &self,
+        item: CompactCacheKey,
+        size: usize,
+        _fresh_until: SystemTime,
+    ) -> Vec<CompactCacheKey> {
+        let key = u64key(&item);
+        self.insert(key, item, size, false);
+        self.evict()
+    }
+
+    fn remove(&self, item: &CompactCacheKey) {
+        let key = u64key(item);
+        let node = self.lru.write().pop(&key);
+        if let Some(n) = node {
+            self.used.fetch_sub(n.size, Ordering::Relaxed);
+            self.items.fetch_sub(1, Ordering::Relaxed);
+        }
+    }
+
+    fn access(&self, item: &CompactCacheKey, size: usize, _fresh_until: SystemTime) -> bool {
+        let key = u64key(item);
+        if self.lru.write().get(&key).is_none() {
+            self.insert(key, item.clone(), size, false);
+            false
+        } else {
+            true
+        }
+    }
+
+    fn peek(&self, item: &CompactCacheKey) -> bool {
+        let key = u64key(item);
+        self.lru.read().peek(&key).is_some()
+    }
+
+    async fn save(&self, dir_path: &str) -> Result<()> {
+        let data = self.serialize()?;
+        let dir_path = dir_path.to_owned();
+        tokio::task::spawn_blocking(move || {
+            let dir_path = Path::new(&dir_path);
+            std::fs::create_dir_all(dir_path)
+                .or_err_with(InternalError, || format!("fail to create {}", dir_path.display()))?;
+            let file_path = dir_path.join(FILE_NAME);
+            let mut file = File::create(&file_path)
+                .or_err_with(InternalError, || format!("fail to create {}", file_path.display()))?;
+            file.write_all(&data)
+                .or_err_with(InternalError, || format!("fail to write to {}", file_path.display()))
+        })
+        .await
+        .or_err(InternalError, "async blocking IO failure")?
+ }
+
+ async fn load(&self, dir_path: &str) -> Result<()> {
+ let dir_path = dir_path.to_owned();
+ let data = tokio::task::spawn_blocking(move || {
+ let file_path = Path::new(&dir_path).join(FILE_NAME);
+ let mut file =
+ File::open(file_path).or_err(InternalError, "fail to open {file_path}")?;
+ let mut buffer = Vec::with_capacity(8192);
+ file.read_to_end(&mut buffer)
+ .or_err(InternalError, "fail to read from {file_path}")?;
+ Ok::<Vec<u8>, BError>(buffer)
+ })
+ .await
+ .or_err(InternalError, "async blocking IO failure")??;
+ self.deserialize(&data)
+ }
+}
+
+#[cfg(test)]
+mod test {
+ use super::*;
+ use crate::CacheKey;
+
+ #[test]
+ fn test_admission() {
+ let lru = Manager::new(4);
+ let key1 = CacheKey::new("", "a", "1").to_compact();
+ let until = SystemTime::now(); // unused value as a placeholder
+ let v = lru.admit(key1.clone(), 1, until);
+ assert_eq!(v.len(), 0);
+ let key2 = CacheKey::new("", "b", "1").to_compact();
+ let v = lru.admit(key2.clone(), 2, until);
+ assert_eq!(v.len(), 0);
+ let key3 = CacheKey::new("", "c", "1").to_compact();
+ let v = lru.admit(key3, 1, until);
+ assert_eq!(v.len(), 0);
+
+ // lru is full (4) now
+
+ let key4 = CacheKey::new("", "d", "1").to_compact();
+ let v = lru.admit(key4, 2, until);
+ // need to reduce used by at least 2; evicting key1 alone is not enough,
+ // so key2 goes too, freeing 3 in total
+ assert_eq!(v.len(), 2);
+ assert_eq!(v[0], key1);
+ assert_eq!(v[1], key2);
+ }
+
+ #[test]
+ fn test_access() {
+ let lru = Manager::new(4);
+ let key1 = CacheKey::new("", "a", "1").to_compact();
+ let until = SystemTime::now(); // unused value as a placeholder
+ let v = lru.admit(key1.clone(), 1, until);
+ assert_eq!(v.len(), 0);
+ let key2 = CacheKey::new("", "b", "1").to_compact();
+ let v = lru.admit(key2.clone(), 2, until);
+ assert_eq!(v.len(), 0);
+ let key3 = CacheKey::new("", "c", "1").to_compact();
+ let v = lru.admit(key3, 1, until);
+ assert_eq!(v.len(), 0);
+
+ // lru is full (4) now
+ // make key1 most recently used
+ lru.access(&key1, 1, until);
+ assert_eq!(v.len(), 0);
+
+ let key4 = CacheKey::new("", "d", "1").to_compact();
+ let v = lru.admit(key4, 2, until);
+ assert_eq!(v.len(), 1);
+ assert_eq!(v[0], key2);
+ }
+
+ #[test]
+ fn test_remove() {
+ let lru = Manager::new(4);
+ let key1 = CacheKey::new("", "a", "1").to_compact();
+ let until = SystemTime::now(); // unused value as a placeholder
+ let v = lru.admit(key1.clone(), 1, until);
+ assert_eq!(v.len(), 0);
+ let key2 = CacheKey::new("", "b", "1").to_compact();
+ let v = lru.admit(key2.clone(), 2, until);
+ assert_eq!(v.len(), 0);
+ let key3 = CacheKey::new("", "c", "1").to_compact();
+ let v = lru.admit(key3, 1, until);
+ assert_eq!(v.len(), 0);
+
+ // lru is full (4) now
+ // remove key1
+ lru.remove(&key1);
+
+ // key2 is the least recently used one now
+ let key4 = CacheKey::new("", "d", "1").to_compact();
+ let v = lru.admit(key4, 2, until);
+ assert_eq!(v.len(), 1);
+ assert_eq!(v[0], key2);
+ }
+
+ #[test]
+ fn test_access_add() {
+ let lru = Manager::new(4);
+ let until = SystemTime::now(); // unused value as a placeholder
+
+ let key1 = CacheKey::new("", "a", "1").to_compact();
+ lru.access(&key1, 1, until);
+ let key2 = CacheKey::new("", "b", "1").to_compact();
+ lru.access(&key2, 2, until);
+ let key3 = CacheKey::new("", "c", "1").to_compact();
+ lru.access(&key3, 2, until);
+
+ let key4 = CacheKey::new("", "d", "1").to_compact();
+ let v = lru.admit(key4, 2, until);
+ // need to reduce used by at least 3; evicting both key1 and key2 frees exactly 3
+ assert_eq!(v.len(), 
2); + assert_eq!(v[0], key1); + assert_eq!(v[1], key2); + } + + #[test] + fn test_admit_update() { + let lru = Manager::new(4); + let key1 = CacheKey::new("", "a", "1").to_compact(); + let until = SystemTime::now(); // unused value as a placeholder + let v = lru.admit(key1.clone(), 1, until); + assert_eq!(v.len(), 0); + let key2 = CacheKey::new("", "b", "1").to_compact(); + let v = lru.admit(key2.clone(), 2, until); + assert_eq!(v.len(), 0); + let key3 = CacheKey::new("", "c", "1").to_compact(); + let v = lru.admit(key3, 1, until); + assert_eq!(v.len(), 0); + + // lru is full (4) now + // update key2 to reduce its size by 1 + let v = lru.admit(key2, 1, until); + assert_eq!(v.len(), 0); + + // lru is not full anymore + let key4 = CacheKey::new("", "d", "1").to_compact(); + let v = lru.admit(key4.clone(), 1, until); + assert_eq!(v.len(), 0); + + // make key4 larger + let v = lru.admit(key4, 2, until); + // need to evict now + assert_eq!(v.len(), 1); + assert_eq!(v[0], key1); + } + + #[test] + fn test_serde() { + let lru = Manager::new(4); + let key1 = CacheKey::new("", "a", "1").to_compact(); + let until = SystemTime::now(); // unused value as a placeholder + let v = lru.admit(key1.clone(), 1, until); + assert_eq!(v.len(), 0); + let key2 = CacheKey::new("", "b", "1").to_compact(); + let v = lru.admit(key2.clone(), 2, until); + assert_eq!(v.len(), 0); + let key3 = CacheKey::new("", "c", "1").to_compact(); + let v = lru.admit(key3, 1, until); + assert_eq!(v.len(), 0); + + // lru is full (4) now + // make key1 most recently used + lru.access(&key1, 1, until); + assert_eq!(v.len(), 0); + + // load lru2 with lru's data + let ser = lru.serialize().unwrap(); + let lru2 = Manager::new(4); + lru2.deserialize(&ser).unwrap(); + + let key4 = CacheKey::new("", "d", "1").to_compact(); + let v = lru2.admit(key4, 2, until); + assert_eq!(v.len(), 1); + assert_eq!(v[0], key2); + } + + #[tokio::test] + async fn test_save_to_disk() { + let lru = Manager::new(4); + let key1 = CacheKey::new("", "a", "1").to_compact(); + let until = SystemTime::now(); // unused value as a placeholder + let v = lru.admit(key1.clone(), 1, until); + assert_eq!(v.len(), 0); + let key2 = CacheKey::new("", "b", "1").to_compact(); + let v = lru.admit(key2.clone(), 2, until); + assert_eq!(v.len(), 0); + let key3 = CacheKey::new("", "c", "1").to_compact(); + let v = lru.admit(key3, 1, until); + assert_eq!(v.len(), 0); + + // lru is full (4) now + // make key1 most recently used + lru.access(&key1, 1, until); + assert_eq!(v.len(), 0); + + // load lru2 with lru's data + lru.save("/tmp/test_simple_lru_save").await.unwrap(); + let lru2 = Manager::new(4); + lru2.load("/tmp/test_simple_lru_save").await.unwrap(); + + let key4 = CacheKey::new("", "d", "1").to_compact(); + let v = lru2.admit(key4, 2, until); + assert_eq!(v.len(), 1); + assert_eq!(v[0], key2); + } +} diff --git a/pingora-cache/src/filters.rs b/pingora-cache/src/filters.rs new file mode 100644 index 0000000..8293cb5 --- /dev/null +++ b/pingora-cache/src/filters.rs @@ -0,0 +1,673 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Utility functions to help process HTTP headers for caching
+
+use super::*;
+use crate::cache_control::{CacheControl, Cacheable, InterpretCacheControl};
+use crate::{RespCacheable, RespCacheable::*};
+
+use http::{header, HeaderValue};
+use httpdate::HttpDate;
+use log::warn;
+use pingora_http::{RequestHeader, ResponseHeader};
+
+/// Decide if the request is cacheable
+pub fn request_cacheable(req_header: &ReqHeader) -> bool {
+ // TODO: the check is incomplete
+ matches!(req_header.method, Method::GET | Method::HEAD)
+}
+
+/// Decide if the response is cacheable.
+///
+/// `cache_control` is the parsed [CacheControl] from the response header. It is a standalone
+/// argument so that the caller has the flexibility to use, change, or ignore it.
+// TODO: vary processing
+pub fn resp_cacheable(
+ cache_control: Option<&CacheControl>,
+ resp_header: &ResponseHeader,
+ authorization_present: bool,
+ defaults: &CacheMetaDefaults,
+) -> RespCacheable {
+ let now = SystemTime::now();
+ let expire_time = calculate_fresh_until(
+ now,
+ cache_control,
+ resp_header,
+ authorization_present,
+ defaults,
+ );
+ if let Some(fresh_until) = expire_time {
+ let (stale_while_revalidate_sec, stale_if_error_sec) =
+ calculate_serve_stale_sec(cache_control, defaults);
+
+ let mut cloned_header = resp_header.clone();
+ if let Some(cc) = cache_control {
+ cc.strip_private_headers(&mut cloned_header);
+ }
+ return Cacheable(CacheMeta::new(
+ fresh_until,
+ now,
+ stale_while_revalidate_sec,
+ stale_if_error_sec,
+ cloned_header,
+ ));
+ }
+ Uncacheable(NoCacheReason::OriginNotCache)
+}
+
+/// Calculate the [SystemTime] at which the asset expires
+///
+/// Return None when not cacheable.
+pub fn calculate_fresh_until(
+ now: SystemTime,
+ cache_control: Option<&CacheControl>,
+ resp_header: &RespHeader,
+ authorization_present: bool,
+ defaults: &CacheMetaDefaults,
+) -> Option<SystemTime> {
+ fn freshness_ttl_to_time(now: SystemTime, fresh_sec: u32) -> Option<SystemTime> {
+ if fresh_sec == 0 {
+ // ensure that the response is treated as stale
+ now.checked_sub(Duration::from_secs(1))
+ } else {
+ now.checked_add(Duration::from_secs(fresh_sec.into()))
+ }
+ }
+
+ // A request with Authorization is normally not cacheable, unless Cache-Control allows it
+ if authorization_present {
+ let uncacheable = cache_control
+ .as_ref()
+ .map_or(true, |cc| !cc.allow_caching_authorized_req());
+ if uncacheable {
+ return None;
+ }
+ }
+
+ let uncacheable = cache_control
+ .as_ref()
+ .map_or(false, |cc| cc.is_cacheable() == Cacheable::No);
+ if uncacheable {
+ return None;
+ }
+
+ // For the TTL, check cache-control first, then the expires header, then defaults
+ cache_control
+ .and_then(|cc| {
+ cc.fresh_sec()
+ .and_then(|ttl| freshness_ttl_to_time(now, ttl))
+ })
+ .or_else(|| calculate_expires_header_time(resp_header))
+ .or_else(|| {
+ defaults
+ .fresh_sec(resp_header.status)
+ .and_then(|ttl| freshness_ttl_to_time(now, ttl))
+ })
+}
+
+/// Calculate the expire time from the `Expires` header only
+pub fn calculate_expires_header_time(resp_header: &RespHeader) -> Option<SystemTime> {
+ // according to RFC 7234:
+ // https://datatracker.ietf.org/doc/html/rfc7234#section-4.2.1
+ // - treat multiple expires headers as invalid
+ // https://datatracker.ietf.org/doc/html/rfc7234#section-5.3
+ // - "MUST interpret invalid date formats... as representing a time in the past"
+ fn parse_expires_value(expires_value: &HeaderValue) -> Option<SystemTime> {
+ let expires = expires_value.to_str().ok()?;
+ Some(SystemTime::from(
+ expires
+ .parse::<HttpDate>()
+ .map_err(|e| warn!("Invalid HttpDate in Expires: {}, error: {}", expires, e))
+ .ok()?,
+ ))
+ }
+
+ let mut expires_iter = resp_header.headers.get_all("expires").iter();
+ let expires_header = expires_iter.next();
+ if expires_header.is_none() || expires_iter.next().is_some() {
+ return None;
+ }
+ parse_expires_value(expires_header.unwrap()).or(Some(SystemTime::UNIX_EPOCH))
+}
+
+/// Calculate stale-while-revalidate and stale-if-error seconds from Cache-Control or the [CacheMetaDefaults].
+pub fn calculate_serve_stale_sec(
+ cache_control: Option<&impl InterpretCacheControl>,
+ defaults: &CacheMetaDefaults,
+) -> (u32, u32) {
+ let serve_stale_while_revalidate_sec = cache_control
+ .and_then(|cc| cc.serve_stale_while_revalidate_sec())
+ .unwrap_or_else(|| defaults.serve_stale_while_revalidate_sec());
+ let serve_stale_if_error_sec = cache_control
+ .and_then(|cc| cc.serve_stale_if_error_sec())
+ .unwrap_or_else(|| defaults.serve_stale_if_error_sec());
+ (serve_stale_while_revalidate_sec, serve_stale_if_error_sec)
+}
+
+/// Filters to run when sending requests to upstream
+pub mod upstream {
+ use super::*;
+
+ /// Adjust the request header for cacheable requests
+ ///
+ /// This filter does the following in order to fetch the entire response to cache
+ /// - Convert HEAD to GET
+ /// - `If-*` headers are removed
+ /// - `Range` header is removed
+ ///
+ /// When `meta` is set, this function will inject `If-modified-since` according to the
+ /// `Last-Modified` header and inject `If-none-match` according to the `Etag` header
+ pub fn request_filter(req: &mut RequestHeader, meta: Option<&CacheMeta>) -> Result<()> {
+ // change HEAD to GET, HEAD itself is not semantically cacheable
+ if req.method == Method::HEAD {
+ req.set_method(Method::GET);
+ }
+
+ // remove downstream precondition headers https://datatracker.ietf.org/doc/html/rfc7232#section-3
+ // we'd like to cache the 200 not the 304
+ req.remove_header(&header::IF_MATCH);
+ req.remove_header(&header::IF_NONE_MATCH);
+ req.remove_header(&header::IF_MODIFIED_SINCE);
+ req.remove_header(&header::IF_UNMODIFIED_SINCE);
+ // see below range header
+ req.remove_header(&header::IF_RANGE);
+
+ // remove the downstream range header as we'd like to cache the entire response (this might change in the future)
+ req.remove_header(&header::RANGE);
+
+ // we have a presumably stale response already; add precondition headers for revalidation
+ if let Some(m) = meta {
+ // rfc7232: "SHOULD send both validators in cache validation" but
+ // there have been weird cases where an origin has a matching etag but not Last-Modified
+ if let Some(since) = m.headers().get(&header::LAST_MODIFIED) {
+ req.insert_header(header::IF_MODIFIED_SINCE, since).unwrap();
+ }
+ if let Some(etag) = m.headers().get(&header::ETAG) {
+ req.insert_header(header::IF_NONE_MATCH, etag).unwrap();
+ }
+ }
+
+ Ok(())
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+ use http::header::{HeaderName, CACHE_CONTROL, EXPIRES, SET_COOKIE};
+ use http::StatusCode;
+ use httpdate::fmt_http_date;
+
+ fn init_log() {
+ let _ = env_logger::builder().is_test(true).try_init();
+ }
+
+ const DEFAULTS: CacheMetaDefaults = CacheMetaDefaults::new(
+ |status| match status {
+ StatusCode::OK => Some(10),
+ StatusCode::NOT_FOUND => Some(5),
+ StatusCode::PARTIAL_CONTENT => None,
+ _ => 
Some(1), + }, + 0, + u32::MAX, /* "infinite" stale-if-error */ + ); + + // Cache nothing, by default + const BYPASS_CACHE_DEFAULTS: CacheMetaDefaults = CacheMetaDefaults::new(|_| None, 0, 0); + + fn build_response(status: u16, headers: &[(HeaderName, &str)]) -> ResponseHeader { + let mut header = ResponseHeader::build(status, Some(headers.len())).unwrap(); + for (k, v) in headers { + header.append_header(k.to_string(), *v).unwrap(); + } + header + } + + fn resp_cacheable_wrapper( + resp: &ResponseHeader, + defaults: &CacheMetaDefaults, + authorization_present: bool, + ) -> Option<CacheMeta> { + if let Cacheable(meta) = resp_cacheable( + CacheControl::from_resp_headers(resp).as_ref(), + resp, + authorization_present, + defaults, + ) { + Some(meta) + } else { + None + } + } + + #[test] + fn test_resp_cacheable() { + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "max-age=12345")]), + &DEFAULTS, + false, + ); + + let meta = meta.unwrap(); + assert!(meta.is_fresh(SystemTime::now())); + assert!(meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(12)) + .unwrap() + ),); + assert!(!meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(12346)) + .unwrap() + )); + } + + #[test] + fn test_resp_uncacheable_directives() { + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "private, max-age=12345")]), + &DEFAULTS, + false, + ); + assert!(meta.is_none()); + + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "no-store, max-age=12345")]), + &DEFAULTS, + false, + ); + assert!(meta.is_none()); + } + + #[test] + fn test_resp_cache_authorization() { + let meta = resp_cacheable_wrapper(&build_response(200, &[]), &DEFAULTS, true); + assert!(meta.is_none()); + + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "max-age=10")]), + &DEFAULTS, + true, + ); + assert!(meta.is_none()); + + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "s-maxage=10")]), + &DEFAULTS, + true, + ); + assert!(meta.unwrap().is_fresh(SystemTime::now())); + + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "public, max-age=10")]), + &DEFAULTS, + true, + ); + assert!(meta.unwrap().is_fresh(SystemTime::now())); + + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "must-revalidate")]), + &DEFAULTS, + true, + ); + assert!(meta.unwrap().is_fresh(SystemTime::now())); + } + + #[test] + fn test_resp_zero_max_age() { + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "max-age=0, public")]), + &DEFAULTS, + false, + ); + + // cacheable, but needs revalidation + assert!(!meta.unwrap().is_fresh(SystemTime::now())); + } + + #[test] + fn test_resp_expires() { + let five_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(5)) + .unwrap(); + + // future expires is cacheable + let meta = resp_cacheable_wrapper( + &build_response(200, &[(EXPIRES, &fmt_http_date(five_sec_time))]), + &DEFAULTS, + false, + ); + + let meta = meta.unwrap(); + assert!(meta.is_fresh(SystemTime::now())); + assert!(!meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(6)) + .unwrap() + )); + + // even on default uncacheable statuses + let meta = resp_cacheable_wrapper( + &build_response(206, &[(EXPIRES, &fmt_http_date(five_sec_time))]), + &DEFAULTS, + false, + ); + assert!(meta.is_some()); + } + + #[test] + fn test_resp_past_expires() { + // cacheable, but expired + let meta = resp_cacheable_wrapper( + 
&build_response(200, &[(EXPIRES, "Fri, 15 May 2015 15:34:21 GMT")]), + &BYPASS_CACHE_DEFAULTS, + false, + ); + assert!(!meta.unwrap().is_fresh(SystemTime::now())); + } + + #[test] + fn test_resp_nonstandard_expires() { + // init log to allow inspecting warnings + init_log(); + + // invalid cases, according to parser + // (but should be stale according to RFC) + let meta = resp_cacheable_wrapper( + &build_response(200, &[(EXPIRES, "Mon, 13 Feb 0002 12:00:00 GMT")]), + &BYPASS_CACHE_DEFAULTS, + false, + ); + assert!(!meta.unwrap().is_fresh(SystemTime::now())); + + let meta = resp_cacheable_wrapper( + &build_response(200, &[(EXPIRES, "Fri, 01 Dec 99999 16:00:00 GMT")]), + &BYPASS_CACHE_DEFAULTS, + false, + ); + assert!(!meta.unwrap().is_fresh(SystemTime::now())); + + let meta = resp_cacheable_wrapper( + &build_response(200, &[(EXPIRES, "0")]), + &BYPASS_CACHE_DEFAULTS, + false, + ); + assert!(!meta.unwrap().is_fresh(SystemTime::now())); + } + + #[test] + fn test_resp_multiple_expires() { + let five_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(5)) + .unwrap(); + let ten_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(10)) + .unwrap(); + + // multiple expires = uncacheable + let meta = resp_cacheable_wrapper( + &build_response( + 200, + &[ + (EXPIRES, &fmt_http_date(five_sec_time)), + (EXPIRES, &fmt_http_date(ten_sec_time)), + ], + ), + &BYPASS_CACHE_DEFAULTS, + false, + ); + assert!(meta.is_none()); + + // unless the default is cacheable + let meta = resp_cacheable_wrapper( + &build_response( + 200, + &[ + (EXPIRES, &fmt_http_date(five_sec_time)), + (EXPIRES, &fmt_http_date(ten_sec_time)), + ], + ), + &DEFAULTS, + false, + ); + assert!(meta.is_some()); + } + + #[test] + fn test_resp_cache_control_with_expires() { + let five_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(5)) + .unwrap(); + // cache-control takes precedence over expires + let meta = resp_cacheable_wrapper( + &build_response( + 200, + &[ + (EXPIRES, &fmt_http_date(five_sec_time)), + (CACHE_CONTROL, "max-age=0"), + ], + ), + &DEFAULTS, + false, + ); + assert!(!meta.unwrap().is_fresh(SystemTime::now())); + } + + #[test] + fn test_resp_stale_while_revalidate() { + // respect defaults + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "max-age=10")]), + &DEFAULTS, + false, + ); + + let meta = meta.unwrap(); + let eleven_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(11)) + .unwrap(); + assert!(!meta.is_fresh(eleven_sec_time)); + assert!(!meta.serve_stale_while_revalidate(SystemTime::now())); + assert!(!meta.serve_stale_while_revalidate(eleven_sec_time)); + + // override with stale-while-revalidate + let meta = resp_cacheable_wrapper( + &build_response( + 200, + &[(CACHE_CONTROL, "max-age=10, stale-while-revalidate=5")], + ), + &DEFAULTS, + false, + ); + + let meta = meta.unwrap(); + let eleven_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(11)) + .unwrap(); + let sixteen_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(16)) + .unwrap(); + assert!(!meta.is_fresh(eleven_sec_time)); + assert!(meta.serve_stale_while_revalidate(eleven_sec_time)); + assert!(!meta.serve_stale_while_revalidate(sixteen_sec_time)); + } + + #[test] + fn test_resp_stale_if_error() { + // respect defaults + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "max-age=10")]), + &DEFAULTS, + false, + ); + + let meta = meta.unwrap(); + let hundred_years_time = SystemTime::now() + 
.checked_add(Duration::from_secs(86400 * 365 * 100)) + .unwrap(); + assert!(!meta.is_fresh(hundred_years_time)); + assert!(meta.serve_stale_if_error(hundred_years_time)); + + // override with stale-if-error + let meta = resp_cacheable_wrapper( + &build_response( + 200, + &[( + CACHE_CONTROL, + "max-age=10, stale-while-revalidate=5, stale-if-error=60", + )], + ), + &DEFAULTS, + false, + ); + + let meta = meta.unwrap(); + let eleven_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(11)) + .unwrap(); + let seventy_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(70)) + .unwrap(); + assert!(!meta.is_fresh(eleven_sec_time)); + assert!(meta.serve_stale_if_error(SystemTime::now())); + assert!(meta.serve_stale_if_error(eleven_sec_time)); + assert!(!meta.serve_stale_if_error(seventy_sec_time)); + + // never serve stale + let meta = resp_cacheable_wrapper( + &build_response(200, &[(CACHE_CONTROL, "max-age=10, stale-if-error=0")]), + &DEFAULTS, + false, + ); + + let meta = meta.unwrap(); + let eleven_sec_time = SystemTime::now() + .checked_add(Duration::from_secs(11)) + .unwrap(); + assert!(!meta.is_fresh(eleven_sec_time)); + assert!(!meta.serve_stale_if_error(eleven_sec_time)); + } + + #[test] + fn test_resp_status_cache_defaults() { + // 200 response + let meta = resp_cacheable_wrapper(&build_response(200, &[]), &DEFAULTS, false); + assert!(meta.is_some()); + + let meta = meta.unwrap(); + assert!(meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(9)) + .unwrap() + )); + assert!(!meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(11)) + .unwrap() + )); + + // 404 response, different ttl + let meta = resp_cacheable_wrapper(&build_response(404, &[]), &DEFAULTS, false); + assert!(meta.is_some()); + + let meta = meta.unwrap(); + assert!(meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(4)) + .unwrap() + )); + assert!(!meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(6)) + .unwrap() + )); + + // 206 marked uncacheable (no cache TTL) + let meta = resp_cacheable_wrapper(&build_response(206, &[]), &DEFAULTS, false); + assert!(meta.is_none()); + + // default uncacheable status with explicit Cache-Control is cacheable + let meta = resp_cacheable_wrapper( + &build_response(206, &[(CACHE_CONTROL, "public, max-age=10")]), + &DEFAULTS, + false, + ); + assert!(meta.is_some()); + + let meta = meta.unwrap(); + assert!(meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(9)) + .unwrap() + )); + assert!(!meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(11)) + .unwrap() + )); + + // 416 matches any status + let meta = resp_cacheable_wrapper(&build_response(416, &[]), &DEFAULTS, false); + assert!(meta.is_some()); + + let meta = meta.unwrap(); + assert!(meta.is_fresh(SystemTime::now())); + assert!(!meta.is_fresh( + SystemTime::now() + .checked_add(Duration::from_secs(2)) + .unwrap() + )); + } + + #[test] + fn test_resp_cache_no_cache_fields() { + // check #field-names are stripped from the cache header + let meta = resp_cacheable_wrapper( + &build_response( + 200, + &[ + (SET_COOKIE, "my-cookie"), + (CACHE_CONTROL, "private=\"something\", max-age=10"), + (HeaderName::from_bytes(b"Something").unwrap(), "foo"), + ], + ), + &DEFAULTS, + false, + ); + let meta = meta.unwrap(); + assert!(meta.headers().contains_key(SET_COOKIE)); + assert!(!meta.headers().contains_key("Something")); + + let meta = resp_cacheable_wrapper( + &build_response( + 200, + &[ + (SET_COOKIE, "my-cookie"), + ( 
+ CACHE_CONTROL,
+ "max-age=0, no-cache=\"meta1, SeT-Cookie ,meta2\"",
+ ),
+ (HeaderName::from_bytes(b"meta1").unwrap(), "foo"),
+ ],
+ ),
+ &DEFAULTS,
+ false,
+ );
+ let meta = meta.unwrap();
+ assert!(!meta.headers().contains_key(SET_COOKIE));
+ assert!(!meta.headers().contains_key("meta1"));
+ }
+}
diff --git a/pingora-cache/src/hashtable.rs b/pingora-cache/src/hashtable.rs
new file mode 100644
index 0000000..a89f9ad
--- /dev/null
+++ b/pingora-cache/src/hashtable.rs
@@ -0,0 +1,112 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Concurrent hash tables and LRUs
+
+use lru::LruCache;
+use parking_lot::{RwLock, RwLockReadGuard, RwLockWriteGuard};
+use std::collections::HashMap;
+
+// There are probably off-the-shelf crates for this, e.g. DashMap
+/// A hash table that shards to a constant number of tables to reduce lock contention
+pub struct ConcurrentHashTable<V, const N: usize> {
+ tables: [RwLock<HashMap<u128, V>>; N],
+}
+
+#[inline]
+fn get_shard(key: u128, n_shards: usize) -> usize {
+ (key % n_shards as u128) as usize
+}
+
+impl<V, const N: usize> ConcurrentHashTable<V, N>
+where
+ [RwLock<HashMap<u128, V>>; N]: Default,
+{
+ pub fn new() -> Self {
+ ConcurrentHashTable {
+ tables: Default::default(),
+ }
+ }
+ pub fn get(&self, key: u128) -> &RwLock<HashMap<u128, V>> {
+ &self.tables[get_shard(key, N)]
+ }
+
+ #[allow(dead_code)]
+ pub fn read(&self, key: u128) -> RwLockReadGuard<HashMap<u128, V>> {
+ self.get(key).read()
+ }
+
+ pub fn write(&self, key: u128) -> RwLockWriteGuard<HashMap<u128, V>> {
+ self.get(key).write()
+ }
+
+ // TODO: work out the lifetimes to provide get/set directly
+}
+
+impl<V, const N: usize> Default for ConcurrentHashTable<V, N>
+where
+ [RwLock<HashMap<u128, V>>; N]: Default,
+{
+ fn default() -> Self {
+ Self::new()
+ }
+}
+
+#[doc(hidden)] // not needed in the public API
+pub struct LruShard<V>(RwLock<LruCache<u128, V>>);
+impl<V> Default for LruShard<V> {
+ fn default() -> Self {
+ // help satisfy default construction of the array
+ LruShard(RwLock::new(LruCache::unbounded()))
+ }
+}
+
+/// Sharded concurrent data structure for LruCache
+pub struct ConcurrentLruCache<V, const N: usize> {
+ lrus: [LruShard<V>; N],
+}
+
+impl<V, const N: usize> ConcurrentLruCache<V, N>
+where
+ [LruShard<V>; N]: Default,
+{
+ pub fn new(shard_capacity: usize) -> Self {
+ use std::num::NonZeroUsize;
+ // safe, 1 != 0
+ const ONE: NonZeroUsize = unsafe { NonZeroUsize::new_unchecked(1) };
+ let mut cache = ConcurrentLruCache {
+ lrus: Default::default(),
+ };
+ for lru in &mut cache.lrus {
+ lru.0
+ .write()
+ .resize(shard_capacity.try_into().unwrap_or(ONE));
+ }
+ cache
+ }
+ pub fn get(&self, key: u128) -> &RwLock<LruCache<u128, V>> {
+ &self.lrus[get_shard(key, N)].0
+ }
+
+ #[allow(dead_code)]
+ pub fn read(&self, key: u128) -> RwLockReadGuard<LruCache<u128, V>> {
+ self.get(key).read()
+ }
+
+ pub fn write(&self, key: u128) -> RwLockWriteGuard<LruCache<u128, V>> {
+ self.get(key).write()
+ }
+
+ // TODO: work out the lifetimes to provide get/set directly
+}
diff --git a/pingora-cache/src/key.rs b/pingora-cache/src/key.rs
new file mode 100644
index 0000000..26e9362
--- /dev/null
+++ b/pingora-cache/src/key.rs
@@ -0,0 +1,302 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Cache key
+
+use super::*;
+
+use blake2::{Blake2b, Digest};
+use serde::{Deserialize, Serialize};
+
+// 16-byte / 128-bit key: large enough to avoid collision
+const KEY_SIZE: usize = 16;
+
+/// A 128-bit hash binary
+pub type HashBinary = [u8; KEY_SIZE];
+
+fn hex2str(hex: &[u8]) -> String {
+ use std::fmt::Write;
+ let mut s = String::with_capacity(KEY_SIZE * 2);
+ for c in hex {
+ write!(s, "{:02x}", c).unwrap(); // safe, just dump hex to string
+ }
+ s
+}
+
+/// Decode the hex str into [HashBinary].
+///
+/// Return `None` when the decode fails or the input is not exactly 32 hex characters
+/// (which decode to 16 bytes).
+pub fn str2hex(s: &str) -> Option<HashBinary> {
+ if s.len() != KEY_SIZE * 2 {
+ return None;
+ }
+ let mut output = [0; KEY_SIZE];
+ // no need to bubble the error, it should be obvious why the decode fails
+ hex::decode_to_slice(s.as_bytes(), &mut output).ok()?;
+ Some(output)
+}
+
+/// The trait for cache keys
+pub trait CacheHashKey {
+ /// Return the hash of the cache key
+ fn primary_bin(&self) -> HashBinary;
+
+ /// Return the variance hash of the cache key.
+ ///
+ /// `None` if no variance.
+ fn variance_bin(&self) -> Option<HashBinary>;
+
+ /// Return the hash including both primary and variance keys
+ fn combined_bin(&self) -> HashBinary {
+ let key = self.primary_bin();
+ if let Some(v) = self.variance_bin() {
+ let mut hasher = Blake2b128::new();
+ hasher.update(key);
+ hasher.update(v);
+ hasher.finalize().into()
+ } else {
+ // if there is no variance, combined_bin should return the same as primary_bin
+ key
+ }
+ }
+
+ /// An extra tag for identifying users
+ ///
+ /// For example, if the storage backend implements per-user quota, this tag can be used.
+ fn user_tag(&self) -> &str;
+
+ /// The hex string of [Self::primary_bin()]
+ fn primary(&self) -> String {
+ hex2str(&self.primary_bin())
+ }
+
+ /// The hex string of [Self::variance_bin()]
+ fn variance(&self) -> Option<String> {
+ self.variance_bin().as_ref().map(|b| hex2str(&b[..]))
+ }
+
+ /// The hex string of [Self::combined_bin()]
+ fn combined(&self) -> String {
+ hex2str(&self.combined_bin())
+ }
+}
+
+/// General purpose cache key
+#[derive(Debug, Clone)]
+pub struct CacheKey {
+ // All strings for now, can be more structural as long as it can be hashed
+ namespace: String,
+ primary: String,
+ variance: Option<HashBinary>,
+ /// An extra tag for identifying users
+ ///
+ /// For example, if the storage backend implements per-user quota, this tag can be used.
+ pub user_tag: String,
+}
+
+impl CacheKey {
+ /// Set the value of the variance hash
+ pub fn set_variance_key(&mut self, key: HashBinary) {
+ self.variance = Some(key)
+ }
+
+ /// Get the value of the variance hash
+ pub fn get_variance_key(&self) -> Option<&HashBinary> {
+ self.variance.as_ref()
+ }
+
+ /// Removes the variance from this cache key
+ pub fn remove_variance_key(&mut self) {
+ self.variance = None
+ }
+}
+
+/// Storage optimized cache key to keep in memory or in storage
+// 16 bytes + 8 bytes (+16 * u8) + user_tag.len() + 16 bytes (Box<str>)
+#[derive(Debug, Deserialize, Serialize, Clone, Hash, PartialEq, Eq)]
+pub struct CompactCacheKey {
+ pub primary: HashBinary,
+ // saves 8 bytes for the non-variance case but wastes 8 bytes for the variance case,
+ // vs. storing a flat 16 bytes
+ pub variance: Option<Box<HashBinary>>,
+ pub user_tag: Box<str>, // the len should be small to keep memory usage bounded
+}
+
+impl CacheHashKey for CompactCacheKey {
+ fn primary_bin(&self) -> HashBinary {
+ self.primary
+ }
+
+ fn variance_bin(&self) -> Option<HashBinary> {
+ self.variance.as_ref().map(|s| *s.as_ref())
+ }
+
+ fn user_tag(&self) -> &str {
+ &self.user_tag
+ }
+}
+
+/*
+ * We use blake2 hashing, which is faster and more secure, to replace md5.
+ * We have not given too much thought on whether a non-crypto hash can be safely
+ * used because hashing performance is not critical.
+ * Note: we should avoid hashes like ahash which do not have consistent output
+ * across machines because they are designed purely for in-memory hashtables.
+*/
+
+// hash output: we use a 128-bit (16-byte) hash which maps to a 32-byte hex string
+pub(crate) type Blake2b128 = Blake2b<blake2::digest::consts::U16>;
+
+/// helper function: hash str to u8
+pub fn hash_u8(key: &str) -> u8 {
+ let mut hasher = Blake2b128::new();
+ hasher.update(key);
+ let raw = hasher.finalize();
+ raw[0]
+}
+
+/// helper function: hash str to [HashBinary]
+pub fn hash_key(key: &str) -> HashBinary {
+ let mut hasher = Blake2b128::new();
+ hasher.update(key);
+ let raw = hasher.finalize();
+ raw.into()
+}
+
+impl CacheKey {
+ fn primary_hasher(&self) -> Blake2b128 {
+ let mut hasher = Blake2b128::new();
+ hasher.update(&self.namespace);
+ hasher.update(&self.primary);
+ hasher
+ }
+
+ /// Create a default [CacheKey] from a request, which just takes its URI as the primary key.
+ pub fn default(req_header: &ReqHeader) -> Self {
+ CacheKey {
+ namespace: "".into(),
+ primary: format!("{}", req_header.uri),
+ variance: None,
+ user_tag: "".into(),
+ }
+ }
+
+ /// Create a new [CacheKey] from the given namespace, primary, and user_tag string.
+ ///
+ /// Both `namespace` and `primary` will be used for the primary hash
+ pub fn new<S1, S2, S3>(namespace: S1, primary: S2, user_tag: S3) -> Self
+ where
+ S1: Into<String>,
+ S2: Into<String>,
+ S3: Into<String>,
+ {
+ CacheKey {
+ namespace: namespace.into(),
+ primary: primary.into(),
+ variance: None,
+ user_tag: user_tag.into(),
+ }
+ }
+
+ /// Return the namespace of this key
+ pub fn namespace(&self) -> &str {
+ &self.namespace
+ }
+
+ /// Return the primary key of this key
+ pub fn primary_key(&self) -> &str {
+ &self.primary
+ }
+
+ /// Convert this key to [CompactCacheKey]. 
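+ ///
+ /// A minimal sketch (key values are illustrative only):
+ /// ```
+ /// use pingora_cache::key::{CacheHashKey, CacheKey};
+ ///
+ /// let key = CacheKey::new("namespace", "/some/asset", "user1");
+ /// let compact = key.to_compact();
+ /// // the compact form preserves the primary and combined hashes
+ /// assert_eq!(key.primary(), compact.primary());
+ /// assert_eq!(key.combined(), compact.combined());
+ /// ```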
+ pub fn to_compact(&self) -> CompactCacheKey { + let primary = self.primary_bin(); + CompactCacheKey { + primary, + variance: self.variance_bin().map(Box::new), + user_tag: self.user_tag.clone().into_boxed_str(), + } + } +} + +impl CacheHashKey for CacheKey { + fn primary_bin(&self) -> HashBinary { + self.primary_hasher().finalize().into() + } + + fn variance_bin(&self) -> Option<HashBinary> { + self.variance + } + + fn user_tag(&self) -> &str { + &self.user_tag + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_cache_key_hash() { + let key = CacheKey { + namespace: "".into(), + primary: "aa".into(), + variance: None, + user_tag: "1".into(), + }; + let hash = key.primary(); + assert_eq!(hash, "ac10f2aef117729f8dad056b3059eb7e"); + assert!(key.variance().is_none()); + assert_eq!(key.combined(), hash); + let compact = key.to_compact(); + assert_eq!(compact.primary(), hash); + assert!(compact.variance().is_none()); + assert_eq!(compact.combined(), hash); + } + + #[test] + fn test_cache_key_vary_hash() { + let key = CacheKey { + namespace: "".into(), + primary: "aa".into(), + variance: Some([0u8; 16]), + user_tag: "1".into(), + }; + let hash = key.primary(); + assert_eq!(hash, "ac10f2aef117729f8dad056b3059eb7e"); + assert_eq!(key.variance().unwrap(), "00000000000000000000000000000000"); + assert_eq!(key.combined(), "004174d3e75a811a5b44c46b3856f3ee"); + let compact = key.to_compact(); + assert_eq!(compact.primary(), "ac10f2aef117729f8dad056b3059eb7e"); + assert_eq!( + compact.variance().unwrap(), + "00000000000000000000000000000000" + ); + assert_eq!(compact.combined(), "004174d3e75a811a5b44c46b3856f3ee"); + } + + #[test] + fn test_hex_str() { + let mut key = [0; KEY_SIZE]; + for (i, v) in key.iter_mut().enumerate() { + // key: [0, 1, 2, .., 15] + *v = i as u8; + } + let hex_str = hex2str(&key); + let key2 = str2hex(&hex_str).unwrap(); + for i in 0..KEY_SIZE { + assert_eq!(key[i], key2[i]); + } + } +} diff --git a/pingora-cache/src/lib.rs b/pingora-cache/src/lib.rs new file mode 100644 index 0000000..be352dc --- /dev/null +++ b/pingora-cache/src/lib.rs @@ -0,0 +1,1093 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! The HTTP caching layer for proxies. 
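+//!
+//! A minimal sketch of constructing the state machine (caching stays disabled
+//! until [HttpCache::enable()] is called with a storage backend):
+//! ```
+//! use pingora_cache::HttpCache;
+//!
+//! let cache = HttpCache::new();
+//! assert!(!cache.enabled()); // disabled by default
+//! ```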
+
+#![allow(clippy::new_without_default)]
+
+use http::{method::Method, request::Parts as ReqHeader, response::Parts as RespHeader};
+use key::{CacheHashKey, HashBinary};
+use lock::WritePermit;
+use pingora_error::Result;
+use pingora_http::ResponseHeader;
+use std::time::{Duration, SystemTime};
+use trace::CacheTraceCTX;
+
+pub mod cache_control;
+pub mod eviction;
+pub mod filters;
+pub mod hashtable;
+pub mod key;
+pub mod lock;
+pub mod max_file_size;
+mod memory;
+pub mod meta;
+pub mod predictor;
+pub mod put;
+pub mod storage;
+pub mod trace;
+mod variance;
+
+use crate::max_file_size::MaxFileSizeMissHandler;
+pub use key::CacheKey;
+use lock::{CacheLock, LockStatus, Locked};
+pub use memory::MemCache;
+pub use meta::{CacheMeta, CacheMetaDefaults};
+pub use storage::{HitHandler, MissHandler, Storage};
+pub use variance::VarianceBuilder;
+
+pub mod prelude {}
+
+/// The state machine for HTTP caching
+///
+/// This object is used to handle the state and transitions for HTTP caching through the life of a
+/// request.
+pub struct HttpCache {
+ phase: CachePhase,
+ // Box the rest so that a disabled HttpCache struct is small
+ inner: Option<Box<HttpCacheInner>>,
+}
+
+/// This reflects the phase of HttpCache during the lifetime of a request
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum CachePhase {
+ /// Cache disabled, with reason (NeverEnabled if never explicitly used)
+ Disabled(NoCacheReason),
+ /// Cache enabled but nothing is set yet
+ Uninit,
+ /// Cache was enabled, the request decided not to use it
+ // HttpCache.inner is kept
+ Bypass,
+ /// Awaiting the cache key to be generated
+ CacheKey,
+ /// Cache hit
+ Hit,
+ /// No cached asset is found
+ Miss,
+ /// A stale (expired) asset is found
+ Stale,
+ /// A stale (expired) asset was found, so a fresh one was fetched
+ Expired,
+ /// A stale (expired) asset was found, and it was revalidated to be fresh
+ Revalidated,
+ /// Revalidated, but deemed uncacheable, so we do not freshen it
+ RevalidatedNoCache(NoCacheReason),
+}
+
+impl CachePhase {
+ /// Convert [CachePhase] as `str`, for logging and debugging.
+ pub fn as_str(&self) -> &'static str {
+ match self {
+ CachePhase::Disabled(_) => "disabled",
+ CachePhase::Uninit => "uninitialized",
+ CachePhase::Bypass => "bypass",
+ CachePhase::CacheKey => "key",
+ CachePhase::Hit => "hit",
+ CachePhase::Miss => "miss",
+ CachePhase::Stale => "stale",
+ CachePhase::Expired => "expired",
+ CachePhase::Revalidated => "revalidated",
+ CachePhase::RevalidatedNoCache(_) => "revalidated-nocache",
+ }
+ }
+}
+
+/// The possible reasons for not caching
+#[derive(Copy, Clone, Debug, PartialEq, Eq)]
+pub enum NoCacheReason {
+ /// Caching is not enabled to begin with
+ NeverEnabled,
+ /// Origin directives indicated this was not cacheable
+ OriginNotCache,
+ /// Response size was larger than the cache's configured maximum asset size
+ ResponseTooLarge,
+ /// Due to internal caching storage error
+ StorageError,
+ /// Due to other types of internal issues
+ InternalError,
+ /// Will be cacheable but skip cache admission now
+ ///
+ /// This happens when the cache predictor predicted that this request is not cacheable, but
+ /// the response turns out to be OK to cache. However, it might be too late to re-enable caching
+ /// for this request. 
+ Deferred, + /// The writer of the cache lock sees that the request is not cacheable (Could be OriginNotCache) + CacheLockGiveUp, + /// This request waited too long for the writer of the cache lock to finish, so this request will + /// fetch from the origin without caching + CacheLockTimeout, + /// Other custom defined reasons + Custom(&'static str), +} + +impl NoCacheReason { + /// Convert [NoCacheReason] as `str`, for logging and debugging. + pub fn as_str(&self) -> &'static str { + use NoCacheReason::*; + match self { + NeverEnabled => "NeverEnabled", + OriginNotCache => "OriginNotCache", + ResponseTooLarge => "ResponseTooLarge", + StorageError => "StorageError", + InternalError => "InternalError", + Deferred => "Deferred", + CacheLockGiveUp => "CacheLockGiveUp", + CacheLockTimeout => "CacheLockTimeout", + Custom(s) => s, + } + } +} + +/// Response cacheable decision +/// +/// +#[derive(Debug)] +pub enum RespCacheable { + Cacheable(CacheMeta), + Uncacheable(NoCacheReason), +} + +impl RespCacheable { + /// Whether it is cacheable + #[inline] + pub fn is_cacheable(&self) -> bool { + matches!(*self, Self::Cacheable(_)) + } + + /// Unwrap [RespCacheable] to get the [CacheMeta] stored + /// # Panic + /// Panic when this object is not cacheable. Check [Self::is_cacheable()] first. + pub fn unwrap_meta(self) -> CacheMeta { + match self { + Self::Cacheable(meta) => meta, + Self::Uncacheable(_) => panic!("expected Cacheable value"), + } + } +} + +/// Freshness state of cache hit asset +/// +/// +#[derive(Debug, Copy, Clone)] +pub enum HitStatus { + Expired, + ForceExpired, + FailedHitFilter, + Fresh, +} + +impl HitStatus { + /// For displaying cache hit status + pub fn as_str(&self) -> &'static str { + match self { + Self::Expired => "expired", + Self::ForceExpired => "force_expired", + Self::FailedHitFilter => "failed_hit_filter", + Self::Fresh => "fresh", + } + } + + /// Whether cached asset can be served as fresh + pub fn is_fresh(&self) -> bool { + match self { + Self::Expired | Self::ForceExpired | Self::FailedHitFilter => false, + Self::Fresh => true, + } + } +} + +struct HttpCacheInner { + pub key: Option<CacheKey>, + pub meta: Option<CacheMeta>, + // when set, even if an asset exists, it would only be considered valid after this timestamp + pub valid_after: Option<SystemTime>, + // when set, an asset will be rejected from the cache if it exceeds this size in bytes + pub max_file_size_bytes: Option<usize>, + pub miss_handler: Option<MissHandler>, + pub body_reader: Option<HitHandler>, + pub storage: &'static (dyn storage::Storage + Sync), // static for now + pub eviction: Option<&'static (dyn eviction::EvictionManager + Sync)>, + pub predictor: Option<&'static (dyn predictor::CacheablePredictor + Sync)>, + pub lock: Option<Locked>, // TODO: these 3 fields should come in 1 sub struct + pub cache_lock: Option<&'static CacheLock>, + pub lock_duration: Option<Duration>, + pub traces: trace::CacheTraceCTX, +} + +impl HttpCache { + /// Create a new [HttpCache]. + /// + /// Caching is not enabled by default. 
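+ ///
+ /// A minimal sketch of the initial state:
+ /// ```
+ /// use pingora_cache::{CachePhase, HttpCache, NoCacheReason};
+ ///
+ /// let cache = HttpCache::new();
+ /// assert_eq!(cache.phase(), CachePhase::Disabled(NoCacheReason::NeverEnabled));
+ /// ```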
+ pub fn new() -> Self {
+ HttpCache {
+ phase: CachePhase::Disabled(NoCacheReason::NeverEnabled),
+ inner: None,
+ }
+ }
+
+ /// Whether the cache is enabled
+ pub fn enabled(&self) -> bool {
+ !matches!(self.phase, CachePhase::Disabled(_) | CachePhase::Bypass)
+ }
+
+ /// Whether the cache is being bypassed
+ pub fn bypassing(&self) -> bool {
+ matches!(self.phase, CachePhase::Bypass)
+ }
+
+ /// Return the [CachePhase]
+ pub fn phase(&self) -> CachePhase {
+ self.phase
+ }
+
+ /// Whether anything was fetched from the upstream
+ ///
+ /// This essentially checks all possible [CachePhase] variants that need to contact the upstream server
+ pub fn upstream_used(&self) -> bool {
+ use CachePhase::*;
+ match self.phase {
+ Disabled(_) | Bypass | Miss | Expired | Revalidated | RevalidatedNoCache(_) => true,
+ Hit | Stale => false,
+ Uninit | CacheKey => false, // invalid states for this call, treat them as false to keep it simple
+ }
+ }
+
+ /// Check whether the backend storage is the type `T`.
+ pub fn storage_type_is<T: 'static>(&self) -> bool {
+ self.inner
+ .as_ref()
+ .and_then(|inner| inner.storage.as_any().downcast_ref::<T>())
+ .is_some()
+ }
+
+ /// Disable caching
+ pub fn disable(&mut self, reason: NoCacheReason) {
+ use NoCacheReason::*;
+ match self.phase {
+ CachePhase::Disabled(_) => {
+ // replace reason
+ self.phase = CachePhase::Disabled(reason);
+ }
+ _ => {
+ self.phase = CachePhase::Disabled(reason);
+ if let Some(inner) = self.inner.as_mut() {
+ let lock = inner.lock.take();
+ if let Some(Locked::Write(_r)) = lock {
+ let lock_status = match reason {
+ // let the next request try to fetch it
+ InternalError | StorageError | Deferred => LockStatus::TransientError,
+ // no need for the lock anymore
+ OriginNotCache | ResponseTooLarge => LockStatus::GiveUp,
+ // not sure which LockStatus makes sense, we treat it as GiveUp for now
+ Custom(_) => LockStatus::GiveUp,
+ // should never happen, NeverEnabled shouldn't hold a lock
+ NeverEnabled => panic!("NeverEnabled holds a write lock"),
+ CacheLockGiveUp | CacheLockTimeout => {
+ panic!("CacheLock* are for cache lock readers only")
+ }
+ };
+ inner
+ .cache_lock
+ .unwrap()
+ .release(inner.key.as_ref().unwrap(), lock_status);
+ }
+ }
+ // log the initial disable reason
+ self.inner_mut()
+ .traces
+ .cache_span
+ .set_tag(|| trace::Tag::new("disable_reason", reason.as_str()));
+ self.inner = None;
+ }
+ }
+ }
+
+ /* The following methods panic when they are used in the wrong phase.
+ * This is better than returning errors as such panics are only caused by coding errors, which
+ * should be fixed right away. The Tokio runtime only crashes the current task instead of the whole
+ * program when these panics happen. */
+
+ /// Set the cache to bypass
+ ///
+ /// # Panic
+ /// This call is only allowed in [CachePhase::CacheKey] phase (before any cache lookup is performed).
+ /// Using it in any other phase will lead to panic.
+ pub fn bypass(&mut self) {
+ match self.phase {
+ CachePhase::CacheKey => {
+ // before cache lookup / found / miss
+ self.phase = CachePhase::Bypass;
+ self.inner_mut()
+ .traces
+ .cache_span
+ .set_tag(|| trace::Tag::new("bypassed", true));
+ }
+ _ => panic!("wrong phase to bypass HttpCache {:?}", self.phase),
+ }
+ }
+
+ /// Enable the cache
+ ///
+ /// - `storage`: the cache storage backend that implements [storage::Storage]
+ /// - `eviction`: optionally the eviction manager; without it, nothing will be evicted from the storage
+ /// - `predictor`: optionally a cache predictor. The cache predictor predicts whether something is likely
+ /// to be cacheable or not. This is useful because the proxy can apply different types of optimization to
+ /// cacheable and uncacheable requests.
+ /// - `cache_lock`: optionally a cache lock which handles concurrent lookups to the same asset. Without it
+ /// such lookups will all be allowed to fetch the asset independently.
+ pub fn enable(
+ &mut self,
+ storage: &'static (dyn storage::Storage + Sync),
+ eviction: Option<&'static (dyn eviction::EvictionManager + Sync)>,
+ predictor: Option<&'static (dyn predictor::CacheablePredictor + Sync)>,
+ cache_lock: Option<&'static CacheLock>,
+ ) {
+ match self.phase {
+ CachePhase::Disabled(_) => {
+ self.phase = CachePhase::Uninit;
+ self.inner = Some(Box::new(HttpCacheInner {
+ key: None,
+ meta: None,
+ valid_after: None,
+ max_file_size_bytes: None,
+ miss_handler: None,
+ body_reader: None,
+ storage,
+ eviction,
+ predictor,
+ lock: None,
+ cache_lock,
+ lock_duration: None,
+ traces: CacheTraceCTX::new(),
+ }));
+ }
+ _ => panic!("Cannot enable already enabled HttpCache {:?}", self.phase),
+ }
+ }
+
+ /// Enable distributed tracing
+ pub fn enable_tracing(&mut self, parent_span: trace::Span) {
+ if let Some(inner) = self.inner.as_mut() {
+ inner.traces.enable(parent_span);
+ }
+ }
+
+ /// Get the cache `miss` tracing span
+ pub fn get_miss_span(&mut self) -> Option<trace::SpanHandle> {
+ self.inner.as_mut().map(|i| i.traces.get_miss_span())
+ }
+
+ // shortcut to access inner, panic if phase is disabled
+ #[inline]
+ fn inner_mut(&mut self) -> &mut HttpCacheInner {
+ self.inner.as_mut().unwrap()
+ }
+
+ #[inline]
+ fn inner(&self) -> &HttpCacheInner {
+ self.inner.as_ref().unwrap()
+ }
+
+ /// Set the cache key
+ /// # Panic
+ /// The cache key is only allowed to be set in its own phase. Setting it in other phases will cause panic.
+ pub fn set_cache_key(&mut self, key: CacheKey) {
+ match self.phase {
+ CachePhase::Uninit | CachePhase::CacheKey => {
+ self.phase = CachePhase::CacheKey;
+ self.inner_mut().key = Some(key);
+ }
+ _ => panic!("wrong phase {:?}", self.phase),
+ }
+ }
+
+ /// Return the cache key used for asset lookup
+ /// # Panic
+ /// Can only be called after the cache key is set and the cache is not disabled. Panic otherwise.
+ pub fn cache_key(&self) -> &CacheKey {
+ match self.phase {
+ CachePhase::Disabled(_) | CachePhase::Uninit => panic!("wrong phase {:?}", self.phase),
+ _ => self.inner().key.as_ref().unwrap(),
+ }
+ }
+
+ /// Return the max size allowed to be cached.
+ pub fn max_file_size_bytes(&self) -> Option<usize> {
+ match self.phase {
+ CachePhase::Disabled(_) | CachePhase::Uninit => panic!("wrong phase {:?}", self.phase),
+ _ => self.inner().max_file_size_bytes,
+ }
+ }
+
+ /// Set the maximum response _body_ size in bytes that will be admitted to the cache.
+ ///
+ /// The response header size does not contribute to the max file size.
+ pub fn set_max_file_size_bytes(&mut self, max_file_size_bytes: usize) {
+ match self.phase {
+ CachePhase::Disabled(_) => panic!("wrong phase {:?}", self.phase),
+ _ => {
+ self.inner_mut().max_file_size_bytes = Some(max_file_size_bytes);
+ }
+ }
+ }
+
+ /// Set that the asset is found in cache storage.
+ ///
+ /// This function is called after [Self::cache_lookup()] which returns the [CacheMeta] and
+ /// [HitHandler].
+ ///
+ /// The `hit_status` enum allows the caller to force expire assets. 
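+ ///
+ /// A hedged sketch of the lookup-to-found flow (wiring of storage and keys omitted):
+ /// ```ignore
+ /// if let Some((meta, hit_handler)) = cache.cache_lookup().await? {
+ ///     let status = if meta.is_fresh(SystemTime::now()) {
+ ///         HitStatus::Fresh
+ ///     } else {
+ ///         HitStatus::Expired
+ ///     };
+ ///     cache.cache_found(meta, hit_handler, status);
+ /// } else {
+ ///     cache.cache_miss();
+ /// }
+ /// ```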
+ pub fn cache_found(&mut self, meta: CacheMeta, hit_handler: HitHandler, hit_status: HitStatus) {
+ match self.phase {
+ // Stale is allowed because of the cache lock and then retry
+ CachePhase::CacheKey | CachePhase::Stale => {
+ self.phase = if hit_status.is_fresh() {
+ CachePhase::Hit
+ } else {
+ CachePhase::Stale
+ };
+ let phase = self.phase;
+ let inner = self.inner_mut();
+ let key = inner.key.as_ref().unwrap();
+ if phase == CachePhase::Stale {
+ if let Some(lock) = inner.cache_lock.as_ref() {
+ inner.lock = Some(lock.lock(key));
+ }
+ }
+ inner.traces.log_meta(&meta);
+ if let Some(eviction) = inner.eviction {
+ // TODO: make access() accept CacheKey
+ let cache_key = key.to_compact();
+ // FIXME: get size
+ eviction.access(&cache_key, 0, meta.0.internal.fresh_until);
+ }
+ inner.traces.start_hit_span(phase, hit_status);
+ inner.meta = Some(meta);
+ inner.body_reader = Some(hit_handler);
+ }
+ _ => panic!("wrong phase {:?}", self.phase),
+ }
+ }
+
+ /// Mark `self` as a cache miss.
+ ///
+ /// This function is called after [Self::cache_lookup()] finds nothing or the caller decides
+ /// not to use the assets found.
+ /// # Panic
+ /// Panic in other phases.
+ pub fn cache_miss(&mut self) {
+ match self.phase {
+ // from CacheKey: set state to miss during cache lookup
+ // from Bypass: response became cacheable, set state to miss to cache
+ CachePhase::CacheKey | CachePhase::Bypass => {
+ self.phase = CachePhase::Miss;
+ self.inner_mut().traces.start_miss_span();
+ }
+ _ => panic!("wrong phase {:?}", self.phase),
+ }
+ }
+
+ /// Return the [HitHandler]
+ /// # Panic
+ /// Call this after [Self::cache_found()], panic in other phases.
+ pub fn hit_handler(&mut self) -> &mut HitHandler {
+ match self.phase {
+ CachePhase::Hit
+ | CachePhase::Stale
+ | CachePhase::Revalidated
+ | CachePhase::RevalidatedNoCache(_) => self.inner_mut().body_reader.as_mut().unwrap(),
+ _ => panic!("wrong phase {:?}", self.phase),
+ }
+ }
+
+ /// Return the body reader during a cache admission (miss/expired) which decouples the downstream
+ /// read and upstream cache write
+ pub fn miss_body_reader(&mut self) -> Option<&mut HitHandler> {
+ match self.phase {
+ CachePhase::Miss | CachePhase::Expired => {
+ let inner = self.inner_mut();
+ if inner.storage.support_streaming_partial_write() {
+ inner.body_reader.as_mut()
+ } else {
+ // body_reader could be set even when the storage doesn't support streaming
+ // partial writes; an expired cache hit would have the reader set.
+ None
+ }
+ }
+ _ => None,
+ }
+ }
+
+ /// Call this when the cache hit is fully read.
+ ///
+ /// This call will release resources if any and log the timing in tracing if set.
+ /// # Panic
+ /// Panic in phases where there is no cache hit. 
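+ ///
+ /// A small sketch (assuming a hit was set up via [Self::cache_found()]):
+ /// ```ignore
+ /// // ... stream the cached body to the client via Self::hit_handler() ...
+ /// cache.finish_hit_handler().await?;
+ /// // calling it again after the reader is consumed is a no-op
+ /// cache.finish_hit_handler().await?;
+ /// ```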
+ pub async fn finish_hit_handler(&mut self) -> Result<()> {
+ match self.phase {
+ CachePhase::Hit
+ | CachePhase::Miss
+ | CachePhase::Expired
+ | CachePhase::Stale
+ | CachePhase::Revalidated
+ | CachePhase::RevalidatedNoCache(_) => {
+ let inner = self.inner_mut();
+ if inner.body_reader.is_none() {
+ // already finished, we allow calling this function more than once
+ return Ok(());
+ }
+ let body_reader = inner.body_reader.take().unwrap();
+ let key = inner.key.as_ref().unwrap();
+ let result = body_reader
+ .finish(inner.storage, key, &inner.traces.hit_span.handle())
+ .await;
+ inner.traces.finish_hit_span();
+ result
+ }
+ _ => panic!("wrong phase {:?}", self.phase),
+ }
+ }
+
+ /// Set the [MissHandler] according to cache_key and meta; can only be called once
+ pub async fn set_miss_handler(&mut self) -> Result<()> {
+ match self.phase {
+ // set_miss_handler() needs to be called after set_cache_meta() (which changes Stale to Expired).
+ // This is an artificial rule to enforce the state transitions
+ CachePhase::Miss | CachePhase::Expired => {
+ let max_file_size_bytes = self.max_file_size_bytes();
+
+ let inner = self.inner_mut();
+ if inner.miss_handler.is_some() {
+ panic!("write handler is already set")
+ }
+ let meta = inner.meta.as_ref().unwrap();
+ let key = inner.key.as_ref().unwrap();
+ let miss_handler = inner
+ .storage
+ .get_miss_handler(key, meta, &inner.traces.get_miss_span())
+ .await?;
+
+ inner.miss_handler = if let Some(max_size) = max_file_size_bytes {
+ Some(Box::new(MaxFileSizeMissHandler::new(
+ miss_handler,
+ max_size,
+ )))
+ } else {
+ Some(miss_handler)
+ };
+
+ if inner.storage.support_streaming_partial_write() {
+ // If readers can access the partial write, the cache lock can be released here
+ // to let readers start reading the body.
+ let lock = inner.lock.take();
+ if let Some(Locked::Write(_r)) = lock {
+ inner.cache_lock.unwrap().release(key, LockStatus::Done);
+ }
+ // Downstream read and upstream write can be decoupled
+ let body_reader = inner
+ .storage
+ .lookup(key, &inner.traces.get_miss_span())
+ .await?;
+
+ if let Some((_meta, body_reader)) = body_reader {
+ inner.body_reader = Some(body_reader);
+ } else {
+ // body_reader should exist now because streaming partial write exists to support it
+ panic!("unable to get body_reader for {:?}", meta);
+ }
+ }
+ Ok(())
+ }
+ _ => panic!("wrong phase {:?}", self.phase),
+ }
+ }
+
+ /// Return the [MissHandler] to write the response body to cache.
+ ///
+ /// `None`: the handler has not been set or has already finished
+ pub fn miss_handler(&mut self) -> Option<&mut MissHandler> {
+ match self.phase {
+ CachePhase::Miss | CachePhase::Expired => self.inner_mut().miss_handler.as_mut(),
+ _ => panic!("wrong phase {:?}", self.phase),
+ }
+ }
+
+ /// Finish the cache admission
+ ///
+ /// If [self] is dropped without calling this, the cache admission is considered incomplete and
+ /// should be cleaned up.
+ ///
+ /// This call will also trigger eviction if set. 
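+ ///
+ /// A hedged sketch of the admission write path; `upstream_chunks` is a hypothetical
+ /// source of response body bytes, and the miss handler's write method is assumed to
+ /// take a body chunk plus an end-of-stream flag:
+ /// ```ignore
+ /// cache.set_miss_handler().await?;
+ /// while let Some(chunk) = upstream_chunks.next().await { // hypothetical source
+ ///     cache.miss_handler().unwrap().write_body(chunk, false).await?;
+ /// }
+ /// cache.finish_miss_handler().await?; // finalize admission; may trigger eviction
+ /// ```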
+ pub async fn finish_miss_handler(&mut self) -> Result<()> { + match self.phase { + CachePhase::Miss | CachePhase::Expired => { + let inner = self.inner_mut(); + if inner.miss_handler.is_none() { + // already finished, we allow calling this function more than once + return Ok(()); + } + let miss_handler = inner.miss_handler.take().unwrap(); + let size = miss_handler.finish().await?; + let lock = inner.lock.take(); + let key = inner.key.as_ref().unwrap(); + if let Some(Locked::Write(_r)) = lock { + // no need to call r.unlock() because release() will call it + // r is a guard to make sure the lock is unlocked when this request is dropped + inner.cache_lock.unwrap().release(key, LockStatus::Done); + } + if let Some(eviction) = inner.eviction { + let cache_key = key.to_compact(); + let meta = inner.meta.as_ref().unwrap(); + let evicted = eviction.admit(cache_key, size, meta.0.internal.fresh_until); + // TODO: make this async + let span = inner.traces.child("eviction"); + let handle = span.handle(); + for item in evicted { + // TODO: warn/log the error + let _ = inner.storage.purge(&item, &handle).await; + } + } + inner.traces.finish_miss_span(); + Ok(()) + } + _ => panic!("wrong phase {:?}", self.phase), + } + } + + /// Set the [CacheMeta] of the cache + pub fn set_cache_meta(&mut self, meta: CacheMeta) { + match self.phase { + // TODO: store the staled meta somewhere else for future use? + CachePhase::Stale | CachePhase::Miss => { + let inner = self.inner_mut(); + inner.traces.log_meta(&meta); + inner.meta = Some(meta); + } + _ => panic!("wrong phase {:?}", self.phase), + } + if self.phase == CachePhase::Stale { + self.phase = CachePhase::Expired; + } + } + + /// Set the [CacheMeta] of the cache after revalidation. + /// + /// Certain info such as the original cache admission time will be preserved. Others will + /// be replaced by the input `meta`. + pub async fn revalidate_cache_meta(&mut self, mut meta: CacheMeta) -> Result<bool> { + let result = match self.phase { + CachePhase::Stale => { + let inner = self.inner_mut(); + // TODO: we should keep old meta in place, just use new one to update it + // that requires cacheable_filter to take a mut header and just return InternalMeta + + // update new meta with old meta's created time + let created = inner.meta.as_ref().unwrap().0.internal.created; + meta.0.internal.created = created; + // meta.internal.updated was already set to new meta's `created`, + // no need to set `updated` here + + inner.meta.replace(meta); + + let lock = inner.lock.take(); + if let Some(Locked::Write(_r)) = lock { + inner + .cache_lock + .unwrap() + .release(inner.key.as_ref().unwrap(), LockStatus::Done); + } + + let mut span = inner.traces.child("update_meta"); + // TODO: this call can be async + let result = inner + .storage + .update_meta( + inner.key.as_ref().unwrap(), + inner.meta.as_ref().unwrap(), + &span.handle(), + ) + .await; + span.set_tag(|| trace::Tag::new("updated", result.is_ok())); + result + } + _ => panic!("wrong phase {:?}", self.phase), + }; + self.phase = CachePhase::Revalidated; + result + } + + /// After a successful revalidation, update certain headers for the cached asset + /// such as `Etag` with the fresh response header `resp`. + pub fn revalidate_merge_header(&mut self, resp: &RespHeader) -> ResponseHeader { + match self.phase { + CachePhase::Stale => { + /* + * https://datatracker.ietf.org/doc/html/rfc9110#section-15.4.5 + * 304 response MUST generate ... would have been sent in a 200 ... 
+ * - Content-Location, Date, ETag, and Vary + * - Cache-Control and Expires... + */ + let mut old_header = self.inner().meta.as_ref().unwrap().0.header.clone(); + let mut clone_header = |header_name: &'static str| { + // TODO: multiple headers + if let Some(value) = resp.headers.get(header_name) { + old_header.insert_header(header_name, value).unwrap(); + } + }; + clone_header("cache-control"); + clone_header("expires"); + clone_header("cache-tag"); + clone_header("cdn-cache-control"); + clone_header("etag"); + // https://datatracker.ietf.org/doc/html/rfc9111#section-4.3.4 + // "...cache MUST update its header fields with the header fields provided in the 304..." + // But if the Vary header changes, the cached response may no longer match the + // incoming request. + // + // For simplicity, ignore changing Vary in revalidation for now. + // TODO: if we support vary during revalidation, there are a few edge cases to + // consider (what if Vary header appears/disappears/changes)? + // + // clone_header("vary"); + old_header + } + _ => panic!("wrong phase {:?}", self.phase), + } + } + + /// Mark this asset uncacheable after revalidation + pub fn revalidate_uncacheable(&mut self, header: ResponseHeader, reason: NoCacheReason) { + match self.phase { + CachePhase::Stale => { + // replace cache meta header + self.inner_mut().meta.as_mut().unwrap().0.header = header; + } + _ => panic!("wrong phase {:?}", self.phase), + } + self.phase = CachePhase::RevalidatedNoCache(reason); + // TODO: remove this asset from cache once finished? + } + + /// Update the variance of the [CacheMeta]. + /// + /// Note that this process may change the lookup `key`, and eventually (when the asset is + /// written to storage) invalidate other cached variants under the same primary key as the + /// current asset. + pub fn update_variance(&mut self, variance: Option<HashBinary>) { + // If this is a cache miss, we will simply update the variance in the meta. + // + // If this is an expired response, we will have to consider a few cases: + // + // **Case 1**: Variance was absent, but caller sets it now. + // We will just insert it into the meta. The current asset becomes the primary variant. + // Because the current location of the asset is already the primary variant, nothing else + // needs to be done. + // + // **Case 2**: Variance was present, but it changed or was removed. + // We want the current asset to take over the primary slot, in order to invalidate all + // other variants derived under the old Vary. + // + // **Case 3**: Variance did not change. + // Nothing needs to happen. + let inner = match self.phase { + CachePhase::Miss | CachePhase::Expired => self.inner_mut(), + _ => panic!("wrong phase {:?}", self.phase), + }; + + // Update the variance in the meta + if let Some(variance_hash) = variance.as_ref() { + inner + .meta + .as_mut() + .unwrap() + .set_variance_key(*variance_hash); + } else { + inner.meta.as_mut().unwrap().remove_variance(); + } + + // Change the lookup `key` if necessary, in order to admit asset into the primary slot + // instead of the secondary slot. + let key = inner.key.as_ref().unwrap(); + if let Some(old_variance) = key.get_variance_key().as_ref() { + // This is a secondary variant slot. + if Some(*old_variance) != variance.as_ref() { + // This new variance does not match the variance in the cache key we used to look + // up this asset. 
+                // Drop the cache lock to avoid leaving a dangling lock
+                // (because we locked with the old cache key for the secondary slot)
+                // TODO: maybe we should try to signal waiting readers to compete for the primary key
+                // lock instead? we will not be modifying this secondary slot so it's not actually
+                // ready for readers
+                if let Some(lock) = inner.cache_lock.as_ref() {
+                    lock.release(key, LockStatus::Done);
+                }
+                // Remove the `variance` from the `key`, so that we admit this asset into the
+                // primary slot. (`key` is used to tell storage where to write the data.)
+                inner.key.as_mut().unwrap().remove_variance_key();
+            }
+        }
+    }
+
+    /// Return the [CacheMeta] of this asset
+    ///
+    /// # Panic
+    /// Panic in phases which have no cache meta.
+    pub fn cache_meta(&self) -> &CacheMeta {
+        match self.phase {
+            // TODO: allow in Bypass phase?
+            CachePhase::Stale
+            | CachePhase::Expired
+            | CachePhase::Hit
+            | CachePhase::Revalidated
+            | CachePhase::RevalidatedNoCache(_) => self.inner().meta.as_ref().unwrap(),
+            CachePhase::Miss => {
+                // this is the async body read case, safe because body_reader is only set
+                // after meta is retrieved
+                if self.inner().body_reader.is_some() {
+                    self.inner().meta.as_ref().unwrap()
+                } else {
+                    panic!("wrong phase {:?}", self.phase);
+                }
+            }
+
+            _ => panic!("wrong phase {:?}", self.phase),
+        }
+    }
+
+    /// Return the [CacheMeta] of this asset if any
+    ///
+    /// Different from [Self::cache_meta()], this function is allowed to be called in
+    /// [CachePhase::Miss] phase where the cache meta may be set.
+    /// # Panic
+    /// Panic in phases that shouldn't have cache meta.
+    pub fn maybe_cache_meta(&self) -> Option<&CacheMeta> {
+        match self.phase {
+            CachePhase::Miss
+            | CachePhase::Stale
+            | CachePhase::Expired
+            | CachePhase::Hit
+            | CachePhase::Revalidated
+            | CachePhase::RevalidatedNoCache(_) => self.inner().meta.as_ref(),
+            _ => panic!("wrong phase {:?}", self.phase),
+        }
+    }
+
+    /// Perform the cache lookup from the given cache storage with the given cache key
+    ///
+    /// A cache hit will return [CacheMeta] which contains the header and meta info about
+    /// the cache as well as a [HitHandler] to read the cache hit body.
+    /// # Panic
+    /// Panic in other phases.
+    pub async fn cache_lookup(&mut self) -> Result<Option<(CacheMeta, HitHandler)>> {
+        match self.phase {
+            // Stale is allowed here because stale -> cache_lock -> lookup again
+            CachePhase::CacheKey | CachePhase::Stale => {
+                let inner = self.inner_mut();
+                let mut span = inner.traces.child("lookup");
+                let key = inner.key.as_ref().unwrap(); // safe, this phase should have cache key
+                let result = inner.storage.lookup(key, &span.handle()).await?;
+                let result = result.and_then(|(meta, header)| {
+                    if let Some(ts) = inner.valid_after {
+                        if meta.created() < ts {
+                            span.set_tag(|| trace::Tag::new("not valid", true));
+                            return None;
+                        }
+                    }
+                    Some((meta, header))
+                });
+                if result.is_none() {
+                    if let Some(lock) = inner.cache_lock.as_ref() {
+                        inner.lock = Some(lock.lock(key));
+                    }
+                }
+                span.set_tag(|| trace::Tag::new("found", result.is_some()));
+                Ok(result)
+            }
+            _ => panic!("wrong phase {:?}", self.phase),
+        }
+    }
+
+    /// Update variance and see if the meta matches the current variance
+    ///
+    /// `cache_lookup() -> compute vary hash -> cache_vary_lookup()`
+    /// This function allows callers to compute vary based on the initial cache hit.
+    /// `meta` should be the one returned from the initial cache_lookup()
+    /// - return true if the meta matches the variance.
+    /// - return false if the current meta doesn't match the variance, and the caller needs to cache_lookup() again
+    pub fn cache_vary_lookup(&mut self, variance: HashBinary, meta: &CacheMeta) -> bool {
+        match self.phase {
+            CachePhase::CacheKey => {
+                let inner = self.inner_mut();
+                // make sure that all variance found are fresher than this asset
+                // this is because when purging all the variance, only the primary slot is deleted
+                // the created TS of the primary is the tombstone of all the variances
+                inner.valid_after = Some(meta.created());
+
+                // update vary
+                let key = inner.key.as_mut().unwrap();
+                // if no variance was previously set, then this is the first cache hit
+                let is_initial_cache_hit = key.get_variance_key().is_none();
+                key.set_variance_key(variance);
+                let variance_binary = key.variance_bin();
+                let matches_variance = meta.variance() == variance_binary;
+
+                // We should remove the variance in the lookup `key` if this is the primary variant
+                // slot. We know this is the primary variant slot if this is the initial cache hit
+                // AND the variance in the `key` already matches the `meta`'s.
+                //
+                // For the primary variant slot, the storage backend needs to use the primary key
+                // for both cache lookup and updating the meta. Otherwise it will look for the
+                // asset in the wrong location during revalidation.
+                //
+                // We can recreate the "full" cache key by using the meta's variance, if needed.
+                if matches_variance && is_initial_cache_hit {
+                    inner.key.as_mut().unwrap().remove_variance_key();
+                }
+
+                matches_variance
+            }
+            _ => panic!("wrong phase {:?}", self.phase),
+        }
+    }
+
+    /// Whether this request is behind a cache lock in order to wait for another request to read the
+    /// asset.
+    pub fn is_cache_locked(&self) -> bool {
+        matches!(self.inner().lock, Some(Locked::Read(_)))
+    }
+
+    /// Whether this request is the leader request to fetch the assets for itself and other requests
+    /// behind the cache lock.
+    pub fn is_cache_lock_writer(&self) -> bool {
+        matches!(self.inner().lock, Some(Locked::Write(_)))
+    }
+
+    /// Take the write lock from this request to transfer it to another one.
+    /// # Panic
+    /// Call is_cache_lock_writer() to check first; this will panic otherwise.
+    pub fn take_write_lock(&mut self) -> WritePermit {
+        let lock = self.inner_mut().lock.take().unwrap();
+        match lock {
+            Locked::Write(w) => w,
+            Locked::Read(_) => panic!("take_write_lock() called on read lock"),
+        }
+    }
+
+    /// Set the write lock, which is usually transferred from [Self::take_write_lock()]
+    pub fn set_write_lock(&mut self, write_lock: WritePermit) {
+        self.inner_mut().lock.replace(Locked::Write(write_lock));
+    }
+
+    /// Whether this request's cache hit is stale
+    fn has_staled_asset(&self) -> bool {
+        self.phase == CachePhase::Stale
+    }
+
+    /// Whether this asset is stale and stale-if-error is allowed
+    pub fn can_serve_stale_error(&self) -> bool {
+        self.has_staled_asset() && self.cache_meta().serve_stale_if_error(SystemTime::now())
+    }
+
+    /// Whether this asset is stale and stale-while-revalidate is allowed.
+    pub fn can_serve_stale_updating(&self) -> bool {
+        self.has_staled_asset()
+            && self
+                .cache_meta()
+                .serve_stale_while_revalidate(SystemTime::now())
+    }
+
+    /// Wait for the cache read lock to be unlocked
+    /// # Panic
+    /// Check [Self::is_cache_locked()], panic if this request doesn't have a read lock.
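+    ///
+    /// A minimal sketch of the expected call pattern (hypothetical caller code, not from
+    /// this crate):
+    /// ```ignore
+    /// if cache.is_cache_locked() {
+    ///     match cache.cache_lock_wait().await {
+    ///         LockStatus::Done => { /* the writer finished, look up the cache again */ }
+    ///         LockStatus::Timeout | LockStatus::GiveUp => { /* fetch upstream independently */ }
+    ///         _ => { /* e.g. Dangling: compete for the write lock again */ }
+    ///     }
+    /// }
+    /// ```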
+    pub async fn cache_lock_wait(&mut self) -> LockStatus {
+        let inner = self.inner_mut();
+        let _span = inner.traces.child("cache_lock");
+        let lock = inner.lock.take(); // remove the lock from self
+        if let Some(Locked::Read(r)) = lock {
+            let now = std::time::Instant::now();
+            r.wait().await;
+            let lock_duration = now.elapsed();
+            // it's possible for a request to be locked more than once
+            inner.lock_duration = Some(
+                inner
+                    .lock_duration
+                    .map_or(lock_duration, |d| d + lock_duration),
+            );
+            r.lock_status() // TODO: tag the span with lock status
+        } else {
+            // should always call is_cache_locked() before this function
+            panic!("cache_lock_wait on wrong type of lock")
+        }
+    }
+
+    /// How long did this request wait behind the read lock
+    pub fn lock_duration(&self) -> Option<Duration> {
+        // FIXME: this duration is lost when cache is disabled
+        self.inner.as_ref().and_then(|i| i.lock_duration)
+    }
+
+    /// Delete the asset from the cache storage
+    /// # Panic
+    /// Need to be called after the cache key is set. Panic otherwise.
+    pub async fn purge(&mut self) -> Result<bool> {
+        match self.phase {
+            CachePhase::CacheKey => {
+                let inner = self.inner_mut();
+                let mut span = inner.traces.child("purge");
+                let key = inner.key.as_ref().unwrap().to_compact();
+                let result = inner.storage.purge(&key, &span.handle()).await;
+                // FIXME: also need to remove from eviction manager
+                span.set_tag(|| trace::Tag::new("purged", matches!(result, Ok(true))));
+                result
+            }
+            _ => panic!("wrong phase {:?}", self.phase),
+        }
+    }
+
+    /// Check the cacheable prediction
+    ///
+    /// Return true if the predictor is not set
+    pub fn cacheable_prediction(&self) -> bool {
+        if let Some(predictor) = self.inner().predictor {
+            predictor.cacheable_prediction(self.cache_key())
+        } else {
+            true
+        }
+    }
+
+    /// Tell the predictor that this response, which was previously predicted to be uncacheable,
+    /// is cacheable now.
+    pub fn response_became_cacheable(&self) {
+        if let Some(predictor) = self.inner().predictor {
+            predictor.mark_cacheable(self.cache_key());
+        }
+    }
+
+    /// Tell the predictor that this response is uncacheable so that it will know next time
+    /// this request arrives.
+    pub fn response_became_uncacheable(&self, reason: NoCacheReason) {
+        if let Some(predictor) = self.inner().predictor {
+            predictor.mark_uncacheable(self.cache_key(), reason);
+        }
+    }
+}
+
+/// Set the header compression dictionary that helps serialize http headers.
+///
+/// Return false if it is already set.
+pub fn set_compression_dict_path(path: &str) -> bool {
+    crate::meta::COMPRESSION_DICT_PATH
+        .set(path.to_string())
+        .is_ok()
+}
diff --git a/pingora-cache/src/lock.rs b/pingora-cache/src/lock.rs
new file mode 100644
index 0000000..c5e3c31
--- /dev/null
+++ b/pingora-cache/src/lock.rs
@@ -0,0 +1,336 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Cache lock
+
+use crate::key::CacheHashKey;
+
+use crate::hashtable::ConcurrentHashTable;
+use pingora_timeout::timeout;
+use std::sync::Arc;
+
+const N_SHARDS: usize = 16;
+
+/// The global cache locking manager
+pub struct CacheLock {
+    lock_table: ConcurrentHashTable<LockStub, N_SHARDS>,
+    timeout: Duration, // fixed timeout value for now
+}
+
+/// A struct representing a locked cache access
+#[derive(Debug)]
+pub enum Locked {
+    /// The writer is allowed to fetch the asset
+    Write(WritePermit),
+    /// The reader waits for the writer to fetch the asset
+    Read(ReadLock),
+}
+
+impl Locked {
+    /// Is this a write lock
+    pub fn is_write(&self) -> bool {
+        matches!(self, Self::Write(_))
+    }
+}
+
+impl CacheLock {
+    /// Create a new [CacheLock] with the given lock timeout
+    ///
+    /// When the timeout is reached, the read locks are automatically unlocked
+    pub fn new(timeout: Duration) -> Self {
+        CacheLock {
+            lock_table: ConcurrentHashTable::new(),
+            timeout,
+        }
+    }
+
+    /// Try to lock a cache fetch
+    ///
+    /// Users should call this after a cache miss, before fetching the asset.
+    /// The returned [Locked] will tell the caller either to fetch or wait.
+    pub fn lock<K: CacheHashKey>(&self, key: &K) -> Locked {
+        let hash = key.combined_bin();
+        let key = u128::from_be_bytes(hash); // endianness doesn't matter
+        let table = self.lock_table.get(key);
+        if let Some(lock) = table.read().get(&key) {
+            // already has an ongoing request
+            if lock.0.lock_status() != LockStatus::Dangling {
+                return Locked::Read(lock.read_lock());
+            }
+            // Dangling: the previous writer quit without unlocking the lock. Requests should
+            // compete for the write lock again.
+        }
+
+        let (permit, stub) = WritePermit::new(self.timeout);
+        let mut table = table.write();
+        // check again in case another request already added it
+        if let Some(lock) = table.get(&key) {
+            if lock.0.lock_status() != LockStatus::Dangling {
+                return Locked::Read(lock.read_lock());
+            }
+        }
+        table.insert(key, stub);
+        Locked::Write(permit)
+    }
+
+    /// Release a lock for the given key
+    ///
+    /// When the write lock is dropped without being released, the read lock holders will consider
+    /// it to be failed so that they will compete for the write lock again.
+    pub fn release<K: CacheHashKey>(&self, key: &K, reason: LockStatus) {
+        let hash = key.combined_bin();
+        let key = u128::from_be_bytes(hash); // endianness doesn't matter
+        if let Some(lock) = self.lock_table.write(key).remove(&key) {
+            // make sure that the caller didn't forget to unlock it
+            if lock.0.locked() {
+                lock.0.unlock(reason);
+            }
+        }
+    }
+}
+
+use std::sync::atomic::{AtomicU8, Ordering};
+use std::time::{Duration, Instant};
+use tokio::sync::Semaphore;
+
+/// Status which the read locks could possibly see.
+#[derive(Debug, Copy, Clone, PartialEq, Eq)]
+pub enum LockStatus {
+    /// Waiting for the writer to populate the asset
+    Waiting,
+    /// The writer finishes, readers can start
+    Done,
+    /// The writer encountered error, such as network issue. A new writer will be elected.
+    TransientError,
+    /// The writer observed that no cache lock is needed (e.g., uncacheable), readers should start
+    /// to fetch independently without a new writer
+    GiveUp,
+    /// The write lock is dropped without being unlocked
+    Dangling,
+    /// The lock is held for too long
+    Timeout,
+}
+
+impl From<LockStatus> for u8 {
+    fn from(l: LockStatus) -> u8 {
+        match l {
+            LockStatus::Waiting => 0,
+            LockStatus::Done => 1,
+            LockStatus::TransientError => 2,
+            LockStatus::GiveUp => 3,
+            LockStatus::Dangling => 4,
+            LockStatus::Timeout => 5,
+        }
+    }
+}
+
+impl From<u8> for LockStatus {
+    fn from(v: u8) -> Self {
+        match v {
+            0 => Self::Waiting,
+            1 => Self::Done,
+            2 => Self::TransientError,
+            3 => Self::GiveUp,
+            4 => Self::Dangling,
+            5 => Self::Timeout,
+            _ => Self::GiveUp, // placeholder
+        }
+    }
+}
+
+#[derive(Debug)]
+struct LockCore {
+    pub lock_start: Instant,
+    pub timeout: Duration,
+    pub(super) lock: Semaphore,
+    // use u8 for Atomic enum
+    lock_status: AtomicU8,
+}
+
+impl LockCore {
+    pub fn new_arc(timeout: Duration) -> Arc<Self> {
+        Arc::new(LockCore {
+            lock: Semaphore::new(0),
+            timeout,
+            lock_start: Instant::now(),
+            lock_status: AtomicU8::new(LockStatus::Waiting.into()),
+        })
+    }
+
+    fn locked(&self) -> bool {
+        self.lock.available_permits() == 0
+    }
+
+    fn unlock(&self, reason: LockStatus) {
+        self.lock_status.store(reason.into(), Ordering::SeqCst);
+        // any small positive number will do, 10 is used for RwLock too
+        // no need to wake up all at once
+        self.lock.add_permits(10);
+    }
+
+    fn lock_status(&self) -> LockStatus {
+        self.lock_status.load(Ordering::Relaxed).into()
+    }
+}
+
+// all 3 structs below are just Arc<LockCore> with different interfaces
+
+/// ReadLock: requests who get it need to wait until it is released
+#[derive(Debug)]
+pub struct ReadLock(Arc<LockCore>);
+
+impl ReadLock {
+    /// Wait for the writer to release the lock
+    pub async fn wait(&self) {
+        if !self.locked() || self.expired() {
+            return;
+        }
+
+        // TODO: should subtract now - start so that the lock doesn't wait beyond start + timeout
+        // Also need to be careful not to wake everyone up at the same time
+        // (maybe not an issue because regular cache lock release behaves that way)
+        let _ = timeout(self.0.timeout, self.0.lock.acquire()).await;
+        // permit is returned to Semaphore right away
+    }
+
+    /// Test if it is still locked
+    pub fn locked(&self) -> bool {
+        self.0.locked()
+    }
+
+    /// Whether the lock is expired, e.g., the writer has been holding the lock for too long
+    pub fn expired(&self) -> bool {
+        // NOTE: this is whether the lock is currently expired,
+        // not whether it was timed out during wait()
+        self.0.lock_start.elapsed() >= self.0.timeout
+    }
+
+    /// The current status of the lock
+    pub fn lock_status(&self) -> LockStatus {
+        let status = self.0.lock_status();
+        if matches!(status, LockStatus::Waiting) && self.expired() {
+            LockStatus::Timeout
+        } else {
+            status
+        }
+    }
+}
+
+/// WritePermit: requests who get it need to populate the cache and then release it
+#[derive(Debug)]
+pub struct WritePermit(Arc<LockCore>);
+
+impl WritePermit {
+    fn new(timeout: Duration) -> (WritePermit, LockStub) {
+        let lock = LockCore::new_arc(timeout);
+        let stub = LockStub(lock.clone());
+        (WritePermit(lock), stub)
+    }
+
+    fn unlock(&self, reason: LockStatus) {
+        self.0.unlock(reason)
+    }
+}
+
+impl Drop for WritePermit {
+    fn drop(&mut self) {
+        // writer exited without properly unlocking; let others compete for the write lock again
+        if self.0.locked() {
+            self.unlock(LockStatus::Dangling);
+        }
+    }
+} + +struct LockStub(Arc<LockCore>); +impl LockStub { + pub fn read_lock(&self) -> ReadLock { + ReadLock(self.0.clone()) + } +} + +#[cfg(test)] +mod test { + use super::*; + use crate::CacheKey; + + #[test] + fn test_get_release() { + let cache_lock = CacheLock::new(Duration::from_secs(1000)); + let key1 = CacheKey::new("", "a", "1"); + let locked1 = cache_lock.lock(&key1); + assert!(locked1.is_write()); // write permit + let locked2 = cache_lock.lock(&key1); + assert!(!locked2.is_write()); // read lock + cache_lock.release(&key1, LockStatus::Done); + let locked3 = cache_lock.lock(&key1); + assert!(locked3.is_write()); // write permit again + } + + #[tokio::test] + async fn test_lock() { + let cache_lock = CacheLock::new(Duration::from_secs(1000)); + let key1 = CacheKey::new("", "a", "1"); + let permit = match cache_lock.lock(&key1) { + Locked::Write(w) => w, + _ => panic!(), + }; + let lock = match cache_lock.lock(&key1) { + Locked::Read(r) => r, + _ => panic!(), + }; + assert!(lock.locked()); + let handle = tokio::spawn(async move { + lock.wait().await; + assert_eq!(lock.lock_status(), LockStatus::Done); + }); + permit.unlock(LockStatus::Done); + handle.await.unwrap(); // check lock is unlocked and the task is returned + } + + #[tokio::test] + async fn test_lock_timeout() { + let cache_lock = CacheLock::new(Duration::from_secs(1)); + let key1 = CacheKey::new("", "a", "1"); + let permit = match cache_lock.lock(&key1) { + Locked::Write(w) => w, + _ => panic!(), + }; + let lock = match cache_lock.lock(&key1) { + Locked::Read(r) => r, + _ => panic!(), + }; + assert!(lock.locked()); + + let handle = tokio::spawn(async move { + // timed out + lock.wait().await; + assert_eq!(lock.lock_status(), LockStatus::Timeout); + }); + + tokio::time::sleep(Duration::from_secs(2)).await; + + // expired lock + let lock2 = match cache_lock.lock(&key1) { + Locked::Read(r) => r, + _ => panic!(), + }; + assert!(lock2.locked()); + assert_eq!(lock2.lock_status(), LockStatus::Timeout); + lock2.wait().await; + assert_eq!(lock2.lock_status(), LockStatus::Timeout); + + permit.unlock(LockStatus::Done); + handle.await.unwrap(); + } +} diff --git a/pingora-cache/src/max_file_size.rs b/pingora-cache/src/max_file_size.rs new file mode 100644 index 0000000..7d812f2 --- /dev/null +++ b/pingora-cache/src/max_file_size.rs @@ -0,0 +1,75 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Set limit on the largest size to cache + +use crate::storage::HandleMiss; +use crate::MissHandler; +use async_trait::async_trait; +use bytes::Bytes; +use pingora_error::{Error, ErrorType}; + +/// [MaxFileSizeMissHandler] wraps a MissHandler to enforce a maximum asset size that should be +/// written to the MissHandler. +/// +/// This is used to enforce a maximum cache size for a request when the +/// response size is not known ahead of time (no Content-Length header). 
When the response size _is_ +/// known ahead of time, it should be checked up front (when calculating cacheability) for efficiency. +/// Note: for requests with partial read support (where downstream reads the response from cache as +/// it is filled), this will cause the request as a whole to fail. The response will be remembered +/// as uncacheable, though, so downstream will be able to retry the request, since the cache will be +/// disabled for the retried request. +pub struct MaxFileSizeMissHandler { + inner: MissHandler, + max_file_size_bytes: usize, + bytes_written: usize, +} + +impl MaxFileSizeMissHandler { + /// Create a new [MaxFileSizeMissHandler] wrapping the given [MissHandler] + pub fn new(inner: MissHandler, max_file_size_bytes: usize) -> MaxFileSizeMissHandler { + MaxFileSizeMissHandler { + inner, + max_file_size_bytes, + bytes_written: 0, + } + } +} + +/// Error type returned when the limit is reached. +pub const ERR_RESPONSE_TOO_LARGE: ErrorType = ErrorType::Custom("response too large"); + +#[async_trait] +impl HandleMiss for MaxFileSizeMissHandler { + async fn write_body(&mut self, data: Bytes, eof: bool) -> pingora_error::Result<()> { + // fail if writing the body would exceed the max_file_size_bytes + if self.bytes_written + data.len() > self.max_file_size_bytes { + return Error::e_explain( + ERR_RESPONSE_TOO_LARGE, + format!( + "writing data of size {} bytes would exceed max file size of {} bytes", + data.len(), + self.max_file_size_bytes + ), + ); + } + + self.bytes_written += data.len(); + self.inner.write_body(data, eof).await + } + + async fn finish(self: Box<Self>) -> pingora_error::Result<usize> { + self.inner.finish().await + } +} diff --git a/pingora-cache/src/memory.rs b/pingora-cache/src/memory.rs new file mode 100644 index 0000000..679517d --- /dev/null +++ b/pingora-cache/src/memory.rs @@ -0,0 +1,510 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Hash map based in memory cache +//! +//! 
For testing only, not for production use
+
+//TODO: Mark this module #[test] only
+
+use super::*;
+use crate::key::{CacheHashKey, CompactCacheKey};
+use crate::storage::{HandleHit, HandleMiss, Storage};
+use crate::trace::SpanHandle;
+
+use async_trait::async_trait;
+use bytes::Bytes;
+use parking_lot::RwLock;
+use pingora_error::*;
+use std::any::Any;
+use std::collections::HashMap;
+use std::sync::Arc;
+use tokio::sync::watch;
+
+type BinaryMeta = (Vec<u8>, Vec<u8>);
+
+pub(crate) struct CacheObject {
+    pub meta: BinaryMeta,
+    pub body: Arc<Vec<u8>>,
+}
+
+pub(crate) struct TempObject {
+    pub meta: BinaryMeta,
+    // these are Arc because they need to continue to exist after this TempObject is removed
+    pub body: Arc<RwLock<Vec<u8>>>,
+    bytes_written: Arc<watch::Sender<PartialState>>, // this should match body.len()
+}
+
+impl TempObject {
+    fn new(meta: BinaryMeta) -> Self {
+        let (tx, _rx) = watch::channel(PartialState::Partial(0));
+        TempObject {
+            meta,
+            body: Arc::new(RwLock::new(Vec::new())),
+            bytes_written: Arc::new(tx),
+        }
+    }
+    // this is not at all optimized
+    fn make_cache_object(&self) -> CacheObject {
+        let meta = self.meta.clone();
+        let body = Arc::new(self.body.read().clone());
+        CacheObject { meta, body }
+    }
+}
+
+/// Hash map based in memory cache
+///
+/// For testing only, not for production use.
+pub struct MemCache {
+    pub(crate) cached: Arc<RwLock<HashMap<String, CacheObject>>>,
+    pub(crate) temp: Arc<RwLock<HashMap<String, TempObject>>>,
+}
+
+impl MemCache {
+    /// Create a new [MemCache]
+    pub fn new() -> Self {
+        MemCache {
+            cached: Arc::new(RwLock::new(HashMap::new())),
+            temp: Arc::new(RwLock::new(HashMap::new())),
+        }
+    }
+}
+
+pub enum MemHitHandler {
+    Complete(CompleteHit),
+    Partial(PartialHit),
+}
+
+#[derive(Copy, Clone)]
+enum PartialState {
+    Partial(usize),
+    Complete(usize),
+}
+
+pub struct CompleteHit {
+    body: Arc<Vec<u8>>,
+    done: bool,
+    range_start: usize,
+    range_end: usize,
+}
+
+impl CompleteHit {
+    fn get(&mut self) -> Option<Bytes> {
+        if self.done {
+            None
+        } else {
+            self.done = true;
+            Some(Bytes::copy_from_slice(
+                &self.body.as_slice()[self.range_start..self.range_end],
+            ))
+        }
+    }
+
+    fn seek(&mut self, start: usize, end: Option<usize>) -> Result<()> {
+        if start >= self.body.len() {
+            return Error::e_explain(
+                ErrorType::InternalError,
+                format!("seek start out of range {start} >= {}", self.body.len()),
+            );
+        }
+        self.range_start = start;
+        if let Some(end) = end {
+            // end over the actual last byte is allowed, we just need to return the actual bytes
+            self.range_end = std::cmp::min(self.body.len(), end);
+        }
+        // seek resets read so that one handler can be used for multiple ranges
+        self.done = false;
+        Ok(())
+    }
+}
+
+pub struct PartialHit {
+    body: Arc<RwLock<Vec<u8>>>,
+    bytes_written: watch::Receiver<PartialState>,
+    bytes_read: usize,
+}
+
+impl PartialHit {
+    async fn read(&mut self) -> Option<Bytes> {
+        loop {
+            let bytes_written = *self.bytes_written.borrow_and_update();
+            let bytes_end = match bytes_written {
+                PartialState::Partial(s) => s,
+                PartialState::Complete(c) => {
+                    // no more data will arrive
+                    if c == self.bytes_read {
+                        return None;
+                    }
+                    c
+                }
+            };
+            assert!(bytes_end >= self.bytes_read);
+
+            // more data available to read
+            if bytes_end > self.bytes_read {
+                let new_bytes =
+                    Bytes::copy_from_slice(&self.body.read()[self.bytes_read..bytes_end]);
+                self.bytes_read = bytes_end;
+                return Some(new_bytes);
+            }
+
+            // wait for more data
+            if self.bytes_written.changed().await.is_err() {
+                // err: sender dropped, body is finished
+                // FIXME: sender could drop because of an error
+                return None;
+            }
+        }
+    }
+}
+
+#[async_trait]
+impl HandleHit for MemHitHandler {
+    async fn read_body(&mut self) -> Result<Option<Bytes>> {
+        match self {
+            Self::Complete(c) => Ok(c.get()),
+            Self::Partial(p) => Ok(p.read().await),
+        }
+    }
+    async fn finish(
+        self: Box<Self>, // because self is always used as a trait object
+        _storage: &'static (dyn storage::Storage + Sync),
+        _key: &CacheKey,
+        _trace: &SpanHandle,
+    ) -> Result<()> {
+        Ok(())
+    }
+
+    fn can_seek(&self) -> bool {
+        match self {
+            Self::Complete(_) => true,
+            Self::Partial(_) => false, // TODO: support seeking in partial reads
+        }
+    }
+
+    fn seek(&mut self, start: usize, end: Option<usize>) -> Result<()> {
+        match self {
+            Self::Complete(c) => c.seek(start, end),
+            Self::Partial(_) => Error::e_explain(
+                ErrorType::InternalError,
+                "seek not supported for partial cache",
+            ),
+        }
+    }
+
+    fn as_any(&self) -> &(dyn Any + Send + Sync) {
+        self
+    }
+}
+
+pub struct MemMissHandler {
+    body: Arc<RwLock<Vec<u8>>>,
+    bytes_written: Arc<watch::Sender<PartialState>>,
+    // these are used only in finish() to move data from temp to cache
+    key: String,
+    cache: Arc<RwLock<HashMap<String, CacheObject>>>,
+    temp: Arc<RwLock<HashMap<String, TempObject>>>,
+}
+
+#[async_trait]
+impl HandleMiss for MemMissHandler {
+    async fn write_body(&mut self, data: bytes::Bytes, eof: bool) -> Result<()> {
+        let current_bytes = match *self.bytes_written.borrow() {
+            PartialState::Partial(p) => p,
+            PartialState::Complete(_) => panic!("already EOF"),
+        };
+        self.body.write().extend_from_slice(&data);
+        let written = current_bytes + data.len();
+        let new_state = if eof {
+            PartialState::Complete(written)
+        } else {
+            PartialState::Partial(written)
+        };
+        self.bytes_written.send_replace(new_state);
+        Ok(())
+    }
+
+    async fn finish(self: Box<Self>) -> Result<usize> {
+        // safe, the temp object is inserted when the miss handler is created
+        let cache_object = self.temp.read().get(&self.key).unwrap().make_cache_object();
+        let size = cache_object.body.len(); // FIXME: this is just body size, also track meta size
+        self.cache.write().insert(self.key.clone(), cache_object);
+        self.temp.write().remove(&self.key);
+        Ok(size)
+    }
+}
+
+impl Drop for MemMissHandler {
+    fn drop(&mut self) {
+        self.temp.write().remove(&self.key);
+    }
+}
+
+#[async_trait]
+impl Storage for MemCache {
+    async fn lookup(
+        &'static self,
+        key: &CacheKey,
+        _trace: &SpanHandle,
+    ) -> Result<Option<(CacheMeta, HitHandler)>> {
+        let hash = key.combined();
+        // always prefer partial read otherwise fresh asset will not be visible on expired asset
+        // until it is fully updated
+        if let Some(temp_obj) = self.temp.read().get(&hash) {
+            let meta = CacheMeta::deserialize(&temp_obj.meta.0, &temp_obj.meta.1)?;
+            let partial = PartialHit {
+                body: temp_obj.body.clone(),
+                bytes_written: temp_obj.bytes_written.subscribe(),
+                bytes_read: 0,
+            };
+            let hit_handler = MemHitHandler::Partial(partial);
+            Ok(Some((meta, Box::new(hit_handler))))
+        } else if let Some(obj) = self.cached.read().get(&hash) {
+            let meta = CacheMeta::deserialize(&obj.meta.0, &obj.meta.1)?;
+            let hit_handler = CompleteHit {
+                body: obj.body.clone(),
+                done: false,
+                range_start: 0,
+                range_end: obj.body.len(),
+            };
+            let hit_handler = MemHitHandler::Complete(hit_handler);
+            Ok(Some((meta, Box::new(hit_handler))))
+        } else {
+            Ok(None)
+        }
+    }
+
+    async fn get_miss_handler(
+        &'static self,
+        key: &CacheKey,
+        meta: &CacheMeta,
+        _trace: &SpanHandle,
+    ) -> Result<MissHandler> {
+        // TODO: support multiple concurrent writes or panic if there is already a writer
+        let hash = key.combined();
+        let meta = meta.serialize()?;
+        let temp_obj = TempObject::new(meta);
+        let miss_handler = MemMissHandler {
+            body: temp_obj.body.clone(),
+            bytes_written: temp_obj.bytes_written.clone(),
+            key: hash.clone(),
+            cache: self.cached.clone(),
+            temp: self.temp.clone(),
+        };
+        self.temp.write().insert(hash, temp_obj);
+        Ok(Box::new(miss_handler))
+    }
+
+    async fn purge(&'static self, key: &CompactCacheKey, _trace: &SpanHandle) -> Result<bool> {
+        // TODO: purge partial
+
+        // This usually purges the primary key because, without a lookup, variance key is usually
+        // empty
+        let hash = key.combined();
+        Ok(self.cached.write().remove(&hash).is_some())
+    }
+
+    async fn update_meta(
+        &'static self,
+        key: &CacheKey,
+        meta: &CacheMeta,
+        _trace: &SpanHandle,
+    ) -> Result<bool> {
+        let hash = key.combined();
+        if let Some(obj) = self.cached.write().get_mut(&hash) {
+            obj.meta = meta.serialize()?;
+            Ok(true)
+        } else {
+            panic!("no meta found")
+        }
+    }
+
+    fn support_streaming_partial_write(&self) -> bool {
+        true
+    }
+
+    fn as_any(&self) -> &(dyn Any + Send + Sync) {
+        self
+    }
+}
+
+#[cfg(test)]
+mod test {
+    use super::*;
+    use once_cell::sync::Lazy;
+    use rustracing::span::Span;
+
+    fn gen_meta() -> CacheMeta {
+        let mut header = ResponseHeader::build(200, None).unwrap();
+        header.append_header("foo1", "bar1").unwrap();
+        header.append_header("foo2", "bar2").unwrap();
+        header.append_header("foo3", "bar3").unwrap();
+        header.append_header("Server", "Pingora").unwrap();
+        let internal = crate::meta::InternalMeta::default();
+        CacheMeta(Box::new(crate::meta::CacheMetaInner {
+            internal,
+            header,
+            extensions: http::Extensions::new(),
+        }))
+    }
+
+    #[tokio::test]
+    async fn test_write_then_read() {
+        static MEM_CACHE: Lazy<MemCache> = Lazy::new(MemCache::new);
+        let span = &Span::inactive().handle();
+
+        let key1 = CacheKey::new("", "a", "1");
+        let res = MEM_CACHE.lookup(&key1, span).await.unwrap();
+        assert!(res.is_none());
+
+        let cache_meta = gen_meta();
+
+        let mut miss_handler = MEM_CACHE
+            .get_miss_handler(&key1, &cache_meta, span)
+            .await
+            .unwrap();
+        miss_handler
+            .write_body(b"test1"[..].into(), false)
+            .await
+            .unwrap();
+        miss_handler
+            .write_body(b"test2"[..].into(), false)
+            .await
+            .unwrap();
+        miss_handler.finish().await.unwrap();
+
+        let (cache_meta2, mut hit_handler) = MEM_CACHE.lookup(&key1, span).await.unwrap().unwrap();
+        assert_eq!(
+            cache_meta.0.internal.fresh_until,
+            cache_meta2.0.internal.fresh_until
+        );
+
+        let data = hit_handler.read_body().await.unwrap().unwrap();
+        assert_eq!("test1test2", data);
+        let data = hit_handler.read_body().await.unwrap();
+        assert!(data.is_none());
+    }
+
+    #[tokio::test]
+    async fn test_read_range() {
+        static MEM_CACHE: Lazy<MemCache> = Lazy::new(MemCache::new);
+        let span = &Span::inactive().handle();
+
+        let key1 = CacheKey::new("", "a", "1");
+        let res = MEM_CACHE.lookup(&key1, span).await.unwrap();
+        assert!(res.is_none());
+
+        let cache_meta = gen_meta();
+
+        let mut miss_handler = MEM_CACHE
+            .get_miss_handler(&key1, &cache_meta, span)
+            .await
+            .unwrap();
+        miss_handler
+            .write_body(b"test1test2"[..].into(), false)
+            .await
+            .unwrap();
+        miss_handler.finish().await.unwrap();
+
+        let (cache_meta2, mut hit_handler) = MEM_CACHE.lookup(&key1, span).await.unwrap().unwrap();
+        assert_eq!(
+            cache_meta.0.internal.fresh_until,
+            cache_meta2.0.internal.fresh_until
+        );
+
+        // out of range
assert!(hit_handler.seek(10000, None).is_err()); + + assert!(hit_handler.seek(5, None).is_ok()); + let data = hit_handler.read_body().await.unwrap().unwrap(); + assert_eq!("test2", data); + let data = hit_handler.read_body().await.unwrap(); + assert!(data.is_none()); + + assert!(hit_handler.seek(4, Some(5)).is_ok()); + let data = hit_handler.read_body().await.unwrap().unwrap(); + assert_eq!("1", data); + let data = hit_handler.read_body().await.unwrap(); + assert!(data.is_none()); + } + + #[tokio::test] + async fn test_write_while_read() { + use futures::FutureExt; + + static MEM_CACHE: Lazy<MemCache> = Lazy::new(MemCache::new); + let span = &Span::inactive().handle(); + + let key1 = CacheKey::new("", "a", "1"); + let res = MEM_CACHE.lookup(&key1, span).await.unwrap(); + assert!(res.is_none()); + + let cache_meta = gen_meta(); + + let mut miss_handler = MEM_CACHE + .get_miss_handler(&key1, &cache_meta, span) + .await + .unwrap(); + + // first reader + let (cache_meta1, mut hit_handler1) = MEM_CACHE.lookup(&key1, span).await.unwrap().unwrap(); + assert_eq!( + cache_meta.0.internal.fresh_until, + cache_meta1.0.internal.fresh_until + ); + + // No body to read + let res = hit_handler1.read_body().now_or_never(); + assert!(res.is_none()); + + miss_handler + .write_body(b"test1"[..].into(), false) + .await + .unwrap(); + + let data = hit_handler1.read_body().await.unwrap().unwrap(); + assert_eq!("test1", data); + let res = hit_handler1.read_body().now_or_never(); + assert!(res.is_none()); + + miss_handler + .write_body(b"test2"[..].into(), false) + .await + .unwrap(); + let data = hit_handler1.read_body().await.unwrap().unwrap(); + assert_eq!("test2", data); + + // second reader + let (cache_meta2, mut hit_handler2) = MEM_CACHE.lookup(&key1, span).await.unwrap().unwrap(); + assert_eq!( + cache_meta.0.internal.fresh_until, + cache_meta2.0.internal.fresh_until + ); + + let data = hit_handler2.read_body().await.unwrap().unwrap(); + assert_eq!("test1test2", data); + let res = hit_handler2.read_body().now_or_never(); + assert!(res.is_none()); + + let res = hit_handler1.read_body().now_or_never(); + assert!(res.is_none()); + + miss_handler.finish().await.unwrap(); + + let data = hit_handler1.read_body().await.unwrap(); + assert!(data.is_none()); + let data = hit_handler2.read_body().await.unwrap(); + assert!(data.is_none()); + } +} diff --git a/pingora-cache/src/meta.rs b/pingora-cache/src/meta.rs new file mode 100644 index 0000000..a534dc0 --- /dev/null +++ b/pingora-cache/src/meta.rs @@ -0,0 +1,608 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
Metadata for caching
+
+use http::Extensions;
+use pingora_error::{Error, ErrorType::*, OrErr, Result};
+use pingora_http::{HMap, ResponseHeader};
+use serde::{Deserialize, Serialize};
+use std::time::{Duration, SystemTime};
+
+use crate::key::HashBinary;
+
+pub(crate) type InternalMeta = internal_meta::InternalMetaLatest;
+mod internal_meta {
+    use super::*;
+
+    pub(crate) type InternalMetaLatest = InternalMetaV2;
+
+    #[derive(Debug, Deserialize, Serialize, Clone)]
+    pub(crate) struct InternalMetaV0 {
+        pub(crate) fresh_until: SystemTime,
+        pub(crate) created: SystemTime,
+        pub(crate) stale_while_revalidate_sec: u32,
+        pub(crate) stale_if_error_sec: u32,
+        // Do not add more fields
+    }
+
+    impl InternalMetaV0 {
+        #[allow(dead_code)]
+        fn serialize(&self) -> Result<Vec<u8>> {
+            rmp_serde::encode::to_vec(self).or_err(InternalError, "failed to encode cache meta")
+        }
+
+        fn deserialize(buf: &[u8]) -> Result<Self> {
+            rmp_serde::decode::from_slice(buf)
+                .or_err(InternalError, "failed to decode cache meta v0")
+        }
+    }
+
+    #[derive(Debug, Deserialize, Serialize, Clone)]
+    pub(crate) struct InternalMetaV1 {
+        pub(crate) version: u8,
+        pub(crate) fresh_until: SystemTime,
+        pub(crate) created: SystemTime,
+        pub(crate) stale_while_revalidate_sec: u32,
+        pub(crate) stale_if_error_sec: u32,
+        // Do not add more fields
+    }
+
+    impl InternalMetaV1 {
+        #[allow(dead_code)]
+        pub const VERSION: u8 = 1;
+
+        #[allow(dead_code)]
+        pub fn serialize(&self) -> Result<Vec<u8>> {
+            assert_eq!(self.version, 1);
+            rmp_serde::encode::to_vec(self).or_err(InternalError, "failed to encode cache meta")
+        }
+
+        fn deserialize(buf: &[u8]) -> Result<Self> {
+            rmp_serde::decode::from_slice(buf)
+                .or_err(InternalError, "failed to decode cache meta v1")
+        }
+    }
+
+    #[derive(Debug, Deserialize, Serialize, Clone)]
+    pub(crate) struct InternalMetaV2 {
+        pub(crate) version: u8,
+        pub(crate) fresh_until: SystemTime,
+        pub(crate) created: SystemTime,
+        pub(crate) updated: SystemTime,
+        pub(crate) stale_while_revalidate_sec: u32,
+        pub(crate) stale_if_error_sec: u32,
+        // Only extended fields are to be added below. One field at a time.
+        // 1. serde default in order to accept an older version schema without the field existing
+        // 2. serde skip_serializing_if in order for software with only an older version of this
+        //    schema to decode it
+        // After full releases, remove `skip_serializing_if` so that we can add the next extended field.
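+        //
+        // For illustration only, a hypothetical next extended field would follow the same
+        // pattern (the name and type here are made up, not part of this schema):
+        //
+        // #[serde(default)]
+        // #[serde(skip_serializing_if = "Option::is_none")]
+        // pub(crate) hypothetical_field: Option<u32>,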
+ #[serde(default)] + #[serde(skip_serializing_if = "Option::is_none")] + pub(crate) variance: Option<HashBinary>, + } + + impl Default for InternalMetaV2 { + fn default() -> Self { + let epoch = SystemTime::UNIX_EPOCH; + InternalMetaV2 { + version: InternalMetaV2::VERSION, + fresh_until: epoch, + created: epoch, + updated: epoch, + stale_while_revalidate_sec: 0, + stale_if_error_sec: 0, + variance: None, + } + } + } + + impl InternalMetaV2 { + pub const VERSION: u8 = 2; + + pub fn serialize(&self) -> Result<Vec<u8>> { + assert_eq!(self.version, Self::VERSION); + rmp_serde::encode::to_vec(self).or_err(InternalError, "failed to encode cache meta") + } + + fn deserialize(buf: &[u8]) -> Result<Self> { + rmp_serde::decode::from_slice(buf) + .or_err(InternalError, "failed to decode cache meta v2") + } + } + + impl From<InternalMetaV0> for InternalMetaV2 { + fn from(v0: InternalMetaV0) -> Self { + InternalMetaV2 { + version: InternalMetaV2::VERSION, + fresh_until: v0.fresh_until, + created: v0.created, + updated: v0.created, + stale_while_revalidate_sec: v0.stale_while_revalidate_sec, + stale_if_error_sec: v0.stale_if_error_sec, + ..Default::default() + } + } + } + + impl From<InternalMetaV1> for InternalMetaV2 { + fn from(v1: InternalMetaV1) -> Self { + InternalMetaV2 { + version: InternalMetaV2::VERSION, + fresh_until: v1.fresh_until, + created: v1.created, + updated: v1.created, + stale_while_revalidate_sec: v1.stale_while_revalidate_sec, + stale_if_error_sec: v1.stale_if_error_sec, + ..Default::default() + } + } + } + + // cross version decode + pub(crate) fn deserialize(buf: &[u8]) -> Result<InternalMetaLatest> { + const MIN_SIZE: usize = 10; // a small number to read the first few bytes + if buf.len() < MIN_SIZE { + return Error::e_explain( + InternalError, + format!("Buf too short ({}) to be InternalMeta", buf.len()), + ); + } + let preread_buf = &mut &buf[..MIN_SIZE]; + // the struct is always packed as a fixed size array + match rmp::decode::read_array_len(preread_buf) + .or_err(InternalError, "failed to decode cache meta array size")? 
+        {
+            // v0 has 4 items and no version number
+            4 => Ok(InternalMetaV0::deserialize(buf)?.into()),
+            // other versions should have the version number encoded
+            _ => {
+                // rmp will encode version < 128 into a fixint (one byte),
+                // so we use read_pfix
+                let version = rmp::decode::read_pfix(preread_buf)
+                    .or_err(InternalError, "failed to decode meta version")?;
+                match version {
+                    1 => Ok(InternalMetaV1::deserialize(buf)?.into()),
+                    2 => InternalMetaV2::deserialize(buf),
+                    _ => Error::e_explain(
+                        InternalError,
+                        format!("Unknown InternalMeta version {version}"),
+                    ),
+                }
+            }
+        }
+    }
+
+    #[cfg(test)]
+    mod tests {
+        use super::*;
+
+        #[test]
+        fn test_internal_meta_serde_v0() {
+            let meta = InternalMetaV0 {
+                fresh_until: SystemTime::now(),
+                created: SystemTime::now(),
+                stale_while_revalidate_sec: 0,
+                stale_if_error_sec: 0,
+            };
+            let binary = meta.serialize().unwrap();
+            let meta2 = InternalMetaV0::deserialize(&binary).unwrap();
+            assert_eq!(meta.fresh_until, meta2.fresh_until);
+        }
+
+        #[test]
+        fn test_internal_meta_serde_v1() {
+            let meta = InternalMetaV1 {
+                version: InternalMetaV1::VERSION,
+                fresh_until: SystemTime::now(),
+                created: SystemTime::now(),
+                stale_while_revalidate_sec: 0,
+                stale_if_error_sec: 0,
+            };
+            let binary = meta.serialize().unwrap();
+            let meta2 = InternalMetaV1::deserialize(&binary).unwrap();
+            assert_eq!(meta.fresh_until, meta2.fresh_until);
+        }
+
+        #[test]
+        fn test_internal_meta_serde_v2() {
+            let meta = InternalMetaV2::default();
+            let binary = meta.serialize().unwrap();
+            let meta2 = InternalMetaV2::deserialize(&binary).unwrap();
+            assert_eq!(meta2.version, 2);
+            assert_eq!(meta.fresh_until, meta2.fresh_until);
+            assert_eq!(meta.created, meta2.created);
+            assert_eq!(meta.updated, meta2.updated);
+        }
+
+        #[test]
+        fn test_internal_meta_serde_across_versions() {
+            let meta = InternalMetaV0 {
+                fresh_until: SystemTime::now(),
+                created: SystemTime::now(),
+                stale_while_revalidate_sec: 0,
+                stale_if_error_sec: 0,
+            };
+            let binary = meta.serialize().unwrap();
+            let meta2 = deserialize(&binary).unwrap();
+            assert_eq!(meta2.version, 2);
+            assert_eq!(meta.fresh_until, meta2.fresh_until);
+
+            let meta = InternalMetaV1 {
+                version: 1,
+                fresh_until: SystemTime::now(),
+                created: SystemTime::now(),
+                stale_while_revalidate_sec: 0,
+                stale_if_error_sec: 0,
+            };
+            let binary = meta.serialize().unwrap();
+            let meta2 = deserialize(&binary).unwrap();
+            assert_eq!(meta2.version, 2);
+            assert_eq!(meta.fresh_until, meta2.fresh_until);
+            // `updated` == `created` when upgrading to v2
+            assert_eq!(meta2.created, meta2.updated);
+        }
+
+        #[test]
+        fn test_internal_meta_serde_v2_extend_fields() {
+            // make sure that v2 format is backward compatible
+            // this is the base version of v2 without any extended fields
+            #[derive(Deserialize, Serialize)]
+            pub(crate) struct InternalMetaV2Base {
+                pub(crate) version: u8,
+                pub(crate) fresh_until: SystemTime,
+                pub(crate) created: SystemTime,
+                pub(crate) updated: SystemTime,
+                pub(crate) stale_while_revalidate_sec: u32,
+                pub(crate) stale_if_error_sec: u32,
+            }
+
+            impl InternalMetaV2Base {
+                pub const VERSION: u8 = 2;
+                pub fn serialize(&self) -> Result<Vec<u8>> {
+                    assert!(self.version >= Self::VERSION);
+                    rmp_serde::encode::to_vec(self)
+                        .or_err(InternalError, "failed to encode cache meta")
+                }
+                fn deserialize(buf: &[u8]) -> Result<Self> {
+                    rmp_serde::decode::from_slice(buf)
+                        .or_err(InternalError, "failed to decode cache meta v2")
+                }
+            }
+
+            // ext V2 to base v2
+            let meta = InternalMetaV2::default();
+            let binary = meta.serialize().unwrap();
+            let meta2 = InternalMetaV2Base::deserialize(&binary).unwrap();
+            assert_eq!(meta2.version, 2);
+            assert_eq!(meta.fresh_until, meta2.fresh_until);
+            assert_eq!(meta.created, meta2.created);
+            assert_eq!(meta.updated, meta2.updated);
+
+            // base V2 to ext v2
+            let now = SystemTime::now();
+            let meta = InternalMetaV2Base {
+                version: InternalMetaV2::VERSION,
+                fresh_until: now,
+                created: now,
+                updated: now,
+                stale_while_revalidate_sec: 0,
+                stale_if_error_sec: 0,
+            };
+            let binary = meta.serialize().unwrap();
+            let meta2 = InternalMetaV2::deserialize(&binary).unwrap();
+            assert_eq!(meta2.version, 2);
+            assert_eq!(meta.fresh_until, meta2.fresh_until);
+            assert_eq!(meta.created, meta2.created);
+            assert_eq!(meta.updated, meta2.updated);
+        }
+    }
+}
+
+#[derive(Debug)]
+pub(crate) struct CacheMetaInner {
+    // http header and Internal meta have different ways of serialization, so keep them separated
+    pub(crate) internal: InternalMeta,
+    pub(crate) header: ResponseHeader,
+    /// An opaque type map to hold extra information for communication between cache backends
+    /// and users. This field is **not** guaranteed to be persistently stored in the cache backend.
+    pub extensions: Extensions,
+}
+
+/// The cacheable response header and cache metadata
+#[derive(Debug)]
+pub struct CacheMeta(pub(crate) Box<CacheMetaInner>);
+
+impl CacheMeta {
+    /// Create a [CacheMeta] from the given metadata and the response header
+    pub fn new(
+        fresh_until: SystemTime,
+        created: SystemTime,
+        stale_while_revalidate_sec: u32,
+        stale_if_error_sec: u32,
+        header: ResponseHeader,
+    ) -> CacheMeta {
+        CacheMeta(Box::new(CacheMetaInner {
+            internal: InternalMeta {
+                version: InternalMeta::VERSION,
+                fresh_until,
+                created,
+                updated: created, // created == updated for new meta
+                stale_while_revalidate_sec,
+                stale_if_error_sec,
+                ..Default::default()
+            },
+            header,
+            extensions: Extensions::new(),
+        }))
+    }
+
+    /// When the asset was created/admitted to cache
+    pub fn created(&self) -> SystemTime {
+        self.0.internal.created
+    }
+
+    /// The last time the asset was revalidated
+    ///
+    /// This value will be the same as [Self::created()] if no revalidation ever happens
+    pub fn updated(&self) -> SystemTime {
+        self.0.internal.updated
+    }
+
+    /// Is the asset still valid
+    pub fn is_fresh(&self, time: SystemTime) -> bool {
+        // NOTE: HTTP cache time resolution is second
+        self.0.internal.fresh_until >= time
+    }
+
+    /// How long (in seconds) the asset should be fresh since its admission/revalidation
+    ///
+    /// This is essentially the max-age value (or its equivalent)
+    pub fn fresh_sec(&self) -> u64 {
+        // swallow `duration_since` error, assets that are always stale have earlier `fresh_until` than `created`
+        // practically speaking we can always treat these as 0 ttl
+        // XXX: return Error if `fresh_until` is much earlier than expected?
+ self.0 + .internal + .fresh_until + .duration_since(self.0.internal.updated) + .map_or(0, |duration| duration.as_secs()) + } + + /// Until when the asset is considered fresh + pub fn fresh_until(&self) -> SystemTime { + self.0.internal.fresh_until + } + + /// How old the asset is since its admission/revalidation + pub fn age(&self) -> Duration { + SystemTime::now() + .duration_since(self.updated()) + .unwrap_or_default() + } + + /// The stale-while-revalidate limit in seconds + pub fn stale_while_revalidate_sec(&self) -> u32 { + self.0.internal.stale_while_revalidate_sec + } + + /// The stale-if-error limit in seconds + pub fn stale_if_error_sec(&self) -> u32 { + self.0.internal.stale_if_error_sec + } + + /// Can the asset be used to serve stale during revalidation at the given time. + /// + /// NOTE: the serve stale functions do not check !is_fresh(time), + /// i.e. the object is already assumed to be stale. + pub fn serve_stale_while_revalidate(&self, time: SystemTime) -> bool { + self.can_serve_stale(self.0.internal.stale_while_revalidate_sec, time) + } + + /// Can the asset be used to serve stale after error at the given time. + /// + /// NOTE: the serve stale functions do not check !is_fresh(time), + /// i.e. the object is already assumed to be stale. + pub fn serve_stale_if_error(&self, time: SystemTime) -> bool { + self.can_serve_stale(self.0.internal.stale_if_error_sec, time) + } + + /// Disable serve stale for this asset + pub fn disable_serve_stale(&mut self) { + self.0.internal.stale_if_error_sec = 0; + self.0.internal.stale_while_revalidate_sec = 0; + } + + /// Get the variance hash of this asset + pub fn variance(&self) -> Option<HashBinary> { + self.0.internal.variance + } + + /// Set the variance key of this asset + pub fn set_variance_key(&mut self, variance_key: HashBinary) { + self.0.internal.variance = Some(variance_key); + } + + /// Set the variance (hash) of this asset + pub fn set_variance(&mut self, variance: HashBinary) { + self.0.internal.variance = Some(variance) + } + + /// Removes the variance (hash) of this asset + pub fn remove_variance(&mut self) { + self.0.internal.variance = None + } + + /// Get the response header in this asset + pub fn response_header(&self) -> &ResponseHeader { + &self.0.header + } + + /// Modify the header in this asset + pub fn response_header_mut(&mut self) -> &mut ResponseHeader { + &mut self.0.header + } + + /// Expose the extensions to read + pub fn extensions(&self) -> &Extensions { + &self.0.extensions + } + + /// Expose the extensions to modify + pub fn extensions_mut(&mut self) -> &mut Extensions { + &mut self.0.extensions + } + + /// Get a copy of the response header + pub fn response_header_copy(&self) -> ResponseHeader { + self.0.header.clone() + } + + /// get all the headers of this asset + pub fn headers(&self) -> &HMap { + &self.0.header.headers + } + + fn can_serve_stale(&self, serve_stale_sec: u32, time: SystemTime) -> bool { + if serve_stale_sec == 0 { + return false; + } + if let Some(stale_until) = self + .0 + .internal + .fresh_until + .checked_add(Duration::from_secs(serve_stale_sec.into())) + { + stale_until >= time + } else { + // overflowed: treat as infinite ttl + true + } + } + + /// Serialize this object + pub fn serialize(&self) -> Result<(Vec<u8>, Vec<u8>)> { + let internal = self.0.internal.serialize()?; + let header = header_serialize(&self.0.header)?; + Ok((internal, header)) + } + + /// Deserialize from the binary format + pub fn deserialize(internal: &[u8], header: &[u8]) -> Result<Self> { + let 
internal = internal_meta::deserialize(internal)?;
+        let header = header_deserialize(header)?;
+        Ok(CacheMeta(Box::new(CacheMetaInner {
+            internal,
+            header,
+            extensions: Extensions::new(),
+        })))
+    }
+}
+
+use http::StatusCode;
+
+/// The function to generate TTL from the given [StatusCode].
+pub type FreshSecByStatusFn = fn(StatusCode) -> Option<u32>;
+
+/// The default settings to generate [CacheMeta]
+pub struct CacheMetaDefaults {
+    // if a status code is not included in fresh_sec, it's not considered cacheable by default.
+    fresh_sec_fn: FreshSecByStatusFn,
+    stale_while_revalidate_sec: u32,
+    // TODO: allow "error" condition to be configurable?
+    stale_if_error_sec: u32,
+}
+
+impl CacheMetaDefaults {
+    /// Create a new [CacheMetaDefaults]
+    pub const fn new(
+        fresh_sec_fn: FreshSecByStatusFn,
+        stale_while_revalidate_sec: u32,
+        stale_if_error_sec: u32,
+    ) -> Self {
+        CacheMetaDefaults {
+            fresh_sec_fn,
+            stale_while_revalidate_sec,
+            stale_if_error_sec,
+        }
+    }
+
+    /// Return the default TTL for the given [StatusCode]
+    ///
+    /// `None`: do not cache this code.
+    pub fn fresh_sec(&self, resp_status: StatusCode) -> Option<u32> {
+        // safeguard to make sure a 304 response shares the same default ttl as 200
+        if resp_status == StatusCode::NOT_MODIFIED {
+            (self.fresh_sec_fn)(StatusCode::OK)
+        } else {
+            (self.fresh_sec_fn)(resp_status)
+        }
+    }
+
+    /// The default SWR seconds
+    pub fn serve_stale_while_revalidate_sec(&self) -> u32 {
+        self.stale_while_revalidate_sec
+    }
+
+    /// The default SIE seconds
+    pub fn serve_stale_if_error_sec(&self) -> u32 {
+        self.stale_if_error_sec
+    }
+}
+
+use log::warn;
+use once_cell::sync::{Lazy, OnceCell};
+use pingora_header_serde::HeaderSerde;
+use std::fs::File;
+use std::io::Read;
+
+/* load header compression engine and its dictionary globally */
+pub(crate) static COMPRESSION_DICT_PATH: OnceCell<String> = OnceCell::new();
+
+fn load_file(path: &String) -> Option<Vec<u8>> {
+    let mut file = File::open(path)
+        .map_err(|e| {
+            warn!(
+                "failed to open header compress dictionary file at {}, {:?}",
+                path, e
+            );
+            e
+        })
+        .ok()?;
+    let mut dict = Vec::new();
+    file.read_to_end(&mut dict)
+        .map_err(|e| {
+            warn!(
+                "failed to read header compress dictionary file at {}, {:?}",
+                path, e
+            );
+            e
+        })
+        .ok()?;
+
+    Some(dict)
+}
+
+static HEADER_SERDE: Lazy<HeaderSerde> = Lazy::new(|| {
+    let dict = COMPRESSION_DICT_PATH.get().and_then(load_file);
+    HeaderSerde::new(dict)
+});
+
+pub(crate) fn header_serialize(header: &ResponseHeader) -> Result<Vec<u8>> {
+    HEADER_SERDE.serialize(header)
+}
+
+pub(crate) fn header_deserialize<T: AsRef<[u8]>>(buf: T) -> Result<ResponseHeader> {
+    HEADER_SERDE.deserialize(buf.as_ref())
+}
diff --git a/pingora-cache/src/predictor.rs b/pingora-cache/src/predictor.rs
new file mode 100644
index 0000000..df8f374
--- /dev/null
+++ b/pingora-cache/src/predictor.rs
@@ -0,0 +1,228 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Cacheability Predictor
+
+use crate::hashtable::{ConcurrentLruCache, LruShard};
+
+pub type CustomReasonPredicate = fn(&'static str) -> bool;
+
+/// Cacheability Predictor
+///
+/// Remembers previously uncacheable assets.
+/// Allows bypassing cache / cache lock early based on historical precedent.
+///
+/// NOTE: to simply avoid caching requests with certain characteristics,
+/// add checks in request_cache_filter to avoid enabling cache in the first place.
+/// The predictor's bypass mechanism handles cases where the request _looks_ cacheable
+/// but its previous responses suggest otherwise. The request _could_ be cacheable in the future.
+pub struct Predictor<const N_SHARDS: usize> {
+    uncacheable_keys: ConcurrentLruCache<(), N_SHARDS>,
+    skip_custom_reasons_fn: Option<CustomReasonPredicate>,
+}
+
+use crate::{key::CacheHashKey, CacheKey, NoCacheReason};
+use log::debug;
+
+/// The cache predictor trait.
+///
+/// This trait allows a user-defined predictor to replace [Predictor].
+pub trait CacheablePredictor {
+    /// Return true if likely cacheable, false if likely not.
+    fn cacheable_prediction(&self, key: &CacheKey) -> bool;
+
+    /// Mark cacheable to allow the next request to cache.
+    /// Returns false if the key was already marked cacheable.
+    fn mark_cacheable(&self, key: &CacheKey) -> bool;
+
+    /// Mark uncacheable to actively bypass cache on the next request.
+    /// May skip marking on certain NoCacheReasons.
+    /// Returns None if we skipped marking uncacheable.
+    /// Returns Some(false) if the key was already marked uncacheable.
+    fn mark_uncacheable(&self, key: &CacheKey, reason: NoCacheReason) -> Option<bool>;
+}
+
+// This particular bit of `where [LruShard...; N]: Default` nonsense arises from
+// ConcurrentLruCache needing this trait bound, which in turn arises from the Rust
+// compiler not being able to guarantee that all array sizes N implement `Default`.
+// See https://github.com/rust-lang/rust/issues/61415
+impl<const N_SHARDS: usize> Predictor<N_SHARDS>
+where
+    [LruShard<()>; N_SHARDS]: Default,
+{
+    /// Create a new Predictor with `N_SHARDS * shard_capacity` total capacity for
+    /// uncacheable cache keys.
+    ///
+    /// - `shard_capacity`: defines the number of keys remembered as uncacheable per LRU shard.
+    /// - `skip_custom_reasons_fn`: an optional predicate used in `mark_uncacheable`
+    /// that can customize which `Custom` `NoCacheReason`s ought to be remembered as uncacheable.
+    /// If the predicate returns true, then the predictor will skip remembering the current
+    /// cache key as uncacheable (and avoid bypassing cache on the next request).
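+    // A construction sketch (the shard count, capacity, and "Skipping" reason
+    // string are illustrative assumptions, not fixed by this API; compare the
+    // tests at the bottom of this file):
+    //
+    //     // 32 shards x 1024 keys each, skipping one custom no-cache reason
+    //     let predictor: Predictor<32> =
+    //         Predictor::new(1024, Some(|reason| reason == "Skipping"));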
+ pub fn new( + shard_capacity: usize, + skip_custom_reasons_fn: Option<CustomReasonPredicate>, + ) -> Predictor<N_SHARDS> { + Predictor { + uncacheable_keys: ConcurrentLruCache::<(), N_SHARDS>::new(shard_capacity), + skip_custom_reasons_fn, + } + } +} + +impl<const N_SHARDS: usize> CacheablePredictor for Predictor<N_SHARDS> +where + [LruShard<()>; N_SHARDS]: Default, +{ + fn cacheable_prediction(&self, key: &CacheKey) -> bool { + // variance key is ignored because this check happens before cache lookup + let hash = key.primary_bin(); + let key = u128::from_be_bytes(hash); // Endianness doesn't matter + + // Note: LRU updated in mark_* functions only, + // as we assume the caller always updates the cacheability of the response later + !self.uncacheable_keys.read(key).contains(&key) + } + + fn mark_cacheable(&self, key: &CacheKey) -> bool { + // variance key is ignored because cacheable_prediction() is called before cache lookup + // where the variance key is unknown + let hash = key.primary_bin(); + let key = u128::from_be_bytes(hash); + + let cache = self.uncacheable_keys.get(key); + if !cache.read().contains(&key) { + // not in uncacheable list, nothing to do + return true; + } + + let mut cache = cache.write(); + cache.pop(&key); + debug!("bypassed request became cacheable"); + false + } + + fn mark_uncacheable(&self, key: &CacheKey, reason: NoCacheReason) -> Option<bool> { + // only mark as uncacheable for the future on certain reasons, + // (e.g. InternalErrors) + use NoCacheReason::*; + match reason { + // CacheLockGiveUp: the writer will set OriginNotCache (if applicable) + // readers don't need to do it + NeverEnabled | StorageError | InternalError | Deferred | CacheLockGiveUp + | CacheLockTimeout => { + return None; + } + // Skip certain NoCacheReason::Custom according to user + Custom(reason) if self.skip_custom_reasons_fn.map_or(false, |f| f(reason)) => { + return None; + } + Custom(_) | OriginNotCache | ResponseTooLarge => { /* mark uncacheable for these only */ + } + } + + // variance key is ignored because cacheable_prediction() is called before cache lookup + // where the variance key is unknown + let hash = key.primary_bin(); + let key = u128::from_be_bytes(hash); + + let mut cache = self.uncacheable_keys.get(key).write(); + // put() returns Some(old_value) if the key existed, else None + let new_key = cache.put(key, ()).is_none(); + if new_key { + debug!("request marked uncacheable"); + } + Some(new_key) + } +} + +#[cfg(test)] +mod tests { + use super::*; + #[test] + fn test_mark_cacheability() { + let predictor = Predictor::<1>::new(10, None); + let key = CacheKey::new("a", "b", "c"); + // cacheable if no history + assert!(predictor.cacheable_prediction(&key)); + + // don't remember internal / storage errors + predictor.mark_uncacheable(&key, NoCacheReason::InternalError); + assert!(predictor.cacheable_prediction(&key)); + predictor.mark_uncacheable(&key, NoCacheReason::StorageError); + assert!(predictor.cacheable_prediction(&key)); + + // origin explicitly said uncacheable + predictor.mark_uncacheable(&key, NoCacheReason::OriginNotCache); + assert!(!predictor.cacheable_prediction(&key)); + + // mark cacheable again + predictor.mark_cacheable(&key); + assert!(predictor.cacheable_prediction(&key)); + } + + #[test] + fn test_custom_skip_predicate() { + let predictor = Predictor::<1>::new( + 10, + Some(|custom_reason| matches!(custom_reason, "Skipping")), + ); + let key = CacheKey::new("a", "b", "c"); + // cacheable if no history + 
assert!(predictor.cacheable_prediction(&key)); + + // custom predicate still uses default skip reasons + predictor.mark_uncacheable(&key, NoCacheReason::InternalError); + assert!(predictor.cacheable_prediction(&key)); + + // other custom reasons can still be marked uncacheable + predictor.mark_uncacheable(&key, NoCacheReason::Custom("DontCacheMe")); + assert!(!predictor.cacheable_prediction(&key)); + + let key = CacheKey::new("a", "c", "d"); + assert!(predictor.cacheable_prediction(&key)); + // specific custom reason is skipped + predictor.mark_uncacheable(&key, NoCacheReason::Custom("Skipping")); + assert!(predictor.cacheable_prediction(&key)); + } + + #[test] + fn test_mark_uncacheable_lru() { + let predictor = Predictor::<1>::new(3, None); + let key1 = CacheKey::new("a", "b", "c"); + predictor.mark_uncacheable(&key1, NoCacheReason::OriginNotCache); + assert!(!predictor.cacheable_prediction(&key1)); + + let key2 = CacheKey::new("a", "bc", "c"); + predictor.mark_uncacheable(&key2, NoCacheReason::OriginNotCache); + assert!(!predictor.cacheable_prediction(&key2)); + + let key3 = CacheKey::new("a", "cd", "c"); + predictor.mark_uncacheable(&key3, NoCacheReason::OriginNotCache); + assert!(!predictor.cacheable_prediction(&key3)); + + // promote / reinsert key1 + predictor.mark_uncacheable(&key1, NoCacheReason::OriginNotCache); + + let key4 = CacheKey::new("a", "de", "c"); + predictor.mark_uncacheable(&key4, NoCacheReason::OriginNotCache); + assert!(!predictor.cacheable_prediction(&key4)); + + // key 1 was recently used + assert!(!predictor.cacheable_prediction(&key1)); + // key 2 was evicted + assert!(predictor.cacheable_prediction(&key2)); + assert!(!predictor.cacheable_prediction(&key3)); + assert!(!predictor.cacheable_prediction(&key4)); + } +} diff --git a/pingora-cache/src/put.rs b/pingora-cache/src/put.rs new file mode 100644 index 0000000..c50cc2b --- /dev/null +++ b/pingora-cache/src/put.rs @@ -0,0 +1,754 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Cache Put module + +use crate::*; +use bytes::Bytes; +use http::header; +use pingora_core::protocols::http::{ + v1::common::header_value_content_length, HttpTask, ServerSession, +}; + +/// The interface to define cache put behavior +pub trait CachePut { + /// Return whether to cache the asset according to the given response header. 
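+    // A minimal implementor sketch (`MyCachePut` is an illustrative name;
+    // compare `TestCachePut` in the tests at the bottom of this file):
+    //
+    //     struct MyCachePut;
+    //     impl CachePut for MyCachePut {
+    //         fn cache_defaults() -> &'static CacheMetaDefaults {
+    //             // 10s TTL for any status, 5s stale-while-revalidate/if-error
+    //             const D: CacheMetaDefaults = CacheMetaDefaults::new(|_| Some(10), 5, 5);
+    //             &D
+    //         }
+    //     }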
+ fn cacheable(&self, response: &ResponseHeader) -> RespCacheable { + let cc = cache_control::CacheControl::from_resp_headers(response); + filters::resp_cacheable(cc.as_ref(), response, false, Self::cache_defaults()) + } + + /// Return the [CacheMetaDefaults] + fn cache_defaults() -> &'static CacheMetaDefaults; +} + +use parse_response::ResponseParse; + +/// The cache put context +pub struct CachePutCtx<C: CachePut> { + cache_put: C, // the user defined cache put behavior + key: CacheKey, + storage: &'static (dyn storage::Storage + Sync), // static for now + eviction: Option<&'static (dyn eviction::EvictionManager + Sync)>, + miss_handler: Option<MissHandler>, + max_file_size_bytes: Option<usize>, + meta: Option<CacheMeta>, + parser: ResponseParse, + // FIXME: cache put doesn't have cache lock but some storage cannot handle concurrent put + // to the same asset. + trace: trace::Span, +} + +impl<C: CachePut> CachePutCtx<C> { + /// Create a new [CachePutCtx] + pub fn new( + cache_put: C, + key: CacheKey, + storage: &'static (dyn storage::Storage + Sync), + eviction: Option<&'static (dyn eviction::EvictionManager + Sync)>, + trace: trace::Span, + ) -> Self { + CachePutCtx { + cache_put, + key, + storage, + eviction, + miss_handler: None, + max_file_size_bytes: None, + meta: None, + parser: ResponseParse::new(), + trace, + } + } + + /// Set the max cacheable size limit + pub fn set_max_file_size_bytes(&mut self, max_file_size_bytes: usize) { + self.max_file_size_bytes = Some(max_file_size_bytes); + } + + async fn put_header(&mut self, meta: CacheMeta) -> Result<()> { + let trace = self.trace.child("cache put header", |o| o.start()).handle(); + let miss_handler = self + .storage + .get_miss_handler(&self.key, &meta, &trace) + .await?; + self.miss_handler = Some( + if let Some(max_file_size_bytes) = self.max_file_size_bytes { + Box::new(MaxFileSizeMissHandler::new( + miss_handler, + max_file_size_bytes, + )) + } else { + miss_handler + }, + ); + self.meta = Some(meta); + Ok(()) + } + + async fn put_body(&mut self, data: Bytes, eof: bool) -> Result<()> { + let miss_handler = self.miss_handler.as_mut().unwrap(); + miss_handler.write_body(data, eof).await + } + + async fn finish(&mut self) -> Result<()> { + let Some(miss_handler) = self.miss_handler.take() else { + // no miss_handler, uncacheable + return Ok(()); + }; + let size = miss_handler.finish().await?; + if let Some(eviction) = self.eviction.as_ref() { + let cache_key = self.key.to_compact(); + let meta = self.meta.as_ref().unwrap(); + let evicted = eviction.admit(cache_key, size, meta.0.internal.fresh_until); + // TODO: make this async + let trace = self + .trace + .child("cache put eviction", |o| o.start()) + .handle(); + for item in evicted { + // TODO: warn/log the error + let _ = self.storage.purge(&item, &trace).await; + } + } + + Ok(()) + } + + async fn do_cache_put(&mut self, data: &[u8]) -> Result<Option<NoCacheReason>> { + let tasks = self.parser.inject_data(data)?; + for task in tasks { + match task { + HttpTask::Header(header, _eos) => match self.cache_put.cacheable(&header) { + RespCacheable::Cacheable(meta) => { + if let Some(max_file_size_bytes) = self.max_file_size_bytes { + let content_length_hdr = header.headers.get(header::CONTENT_LENGTH); + if let Some(content_length) = + header_value_content_length(content_length_hdr) + { + if content_length > max_file_size_bytes { + return Ok(Some(NoCacheReason::ResponseTooLarge)); + } + } + } + + self.put_header(meta).await?; + } + RespCacheable::Uncacheable(reason) => { + return 
Ok(Some(reason));
+                    }
+                },
+                HttpTask::Body(data, eos) => {
+                    if let Some(data) = data {
+                        self.put_body(data, eos).await?;
+                    }
+                }
+                _ => {
+                    panic!("unexpected HttpTask during cache put {task:?}");
+                }
+            }
+        }
+        Ok(None)
+    }
+
+    /// Start the cache put logic for the given request
+    ///
+    /// This function will read the request body and write it into the cache.
+    /// Return:
+    /// - `Ok(None)` when the payload will be cached.
+    /// - `Ok(Some(reason))` when the payload is not cacheable.
+    pub async fn cache_put(
+        &mut self,
+        session: &mut ServerSession,
+    ) -> Result<Option<NoCacheReason>> {
+        let mut no_cache_reason = None;
+        while let Some(data) = session.read_request_body().await? {
+            if no_cache_reason.is_some() {
+                // even if uncacheable, the entire body needs to be drained so that
+                // 1. downstream doesn't see errors and 2. the connection can be reused
+                continue;
+            }
+            no_cache_reason = self.do_cache_put(&data).await?;
+        }
+        self.parser.finish()?;
+        self.finish().await?;
+        Ok(no_cache_reason)
+    }
+}
+
+#[cfg(test)]
+mod test {
+    use super::*;
+    use once_cell::sync::Lazy;
+    use rustracing::span::Span;
+
+    struct TestCachePut();
+    impl CachePut for TestCachePut {
+        fn cache_defaults() -> &'static CacheMetaDefaults {
+            const DEFAULT: CacheMetaDefaults = CacheMetaDefaults::new(|_| Some(1), 1, 1);
+            &DEFAULT
+        }
+    }
+
+    type TestCachePutCtx = CachePutCtx<TestCachePut>;
+    static CACHE_BACKEND: Lazy<MemCache> = Lazy::new(MemCache::new);
+
+    #[tokio::test]
+    async fn test_cache_put() {
+        let key = CacheKey::new("", "a", "1");
+        let span = Span::inactive();
+        let put = TestCachePut();
+        let mut ctx = TestCachePutCtx::new(put, key.clone(), &*CACHE_BACKEND, None, span);
+        let payload = b"HTTP/1.1 200 OK\r\n\
+            Date: Thu, 26 Apr 2018 05:42:05 GMT\r\n\
+            Content-Type: text/html; charset=utf-8\r\n\
+            Connection: keep-alive\r\n\
+            X-Frame-Options: SAMEORIGIN\r\n\
+            Cache-Control: public, max-age=1\r\n\
+            Server: origin-server\r\n\
+            Content-Length: 4\r\n\r\nrust";
+        // here we skip mocking a real http session for simplicity
+        let res = ctx.do_cache_put(payload).await.unwrap();
+        assert!(res.is_none()); // cacheable
+        ctx.parser.finish().unwrap();
+        ctx.finish().await.unwrap();
+
+        let span = Span::inactive();
+        let (meta, mut hit) = CACHE_BACKEND
+            .lookup(&key, &span.handle())
+            .await
+            .unwrap()
+            .unwrap();
+        assert_eq!(
+            meta.headers().get("date").unwrap(),
+            "Thu, 26 Apr 2018 05:42:05 GMT"
+        );
+        let data = hit.read_body().await.unwrap().unwrap();
+        assert_eq!(data, "rust");
+    }
+
+    #[tokio::test]
+    async fn test_cache_put_uncacheable() {
+        let key = CacheKey::new("", "a", "1");
+        let span = Span::inactive();
+        let put = TestCachePut();
+        let mut ctx = TestCachePutCtx::new(put, key.clone(), &*CACHE_BACKEND, None, span);
+        let payload = b"HTTP/1.1 200 OK\r\n\
+            Date: Thu, 26 Apr 2018 05:42:05 GMT\r\n\
+            Content-Type: text/html; charset=utf-8\r\n\
+            Connection: keep-alive\r\n\
+            X-Frame-Options: SAMEORIGIN\r\n\
+            Cache-Control: no-store\r\n\
+            Server: origin-server\r\n\
+            Content-Length: 4\r\n\r\nrust";
+        // here we skip mocking a real http session for simplicity
+        let no_cache = ctx.do_cache_put(payload).await.unwrap().unwrap();
+        assert_eq!(no_cache, NoCacheReason::OriginNotCache);
+        ctx.parser.finish().unwrap();
+        ctx.finish().await.unwrap();
+    }
+}
+
+// maybe this can simplify some logic in pingora::h1
+
+mod parse_response {
+    use super::*;
+    use bytes::{Bytes, BytesMut};
+    use httparse::Status;
+    use pingora_error::{
+        Error,
+        ErrorType::{self, *},
+        Result,
+    };
+    use pingora_http::ResponseHeader;
+
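+    // A usage sketch for this push parser (it is crate-private and normally
+    // driven by `do_cache_put()` above; the input bytes are illustrative):
+    //
+    //     let mut parser = ResponseParse::new();
+    //     let tasks =
+    //         parser.inject_data(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")?;
+    //     // tasks: [Header(.., eos=false), Body(Some("hi"), eos=true)]
+    //     parser.finish()?;
+
+    pub 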
const INVALID_CHUNK: ErrorType = ErrorType::new("InvalidChunk"); + pub const INCOMPLETE_BODY: ErrorType = ErrorType::new("IncompleteHttpBody"); + + const MAX_HEADERS: usize = 256; + const INIT_HEADER_BUF_SIZE: usize = 4096; + const CHUNK_DELIMITER_SIZE: usize = 2; // \r\n + + #[derive(Debug, Clone, Copy)] + enum ParseState { + Init, + PartialHeader, + PartialBodyContentLength(usize, usize), + PartialChunkedBody(usize), + PartialHttp10Body(usize), + Done(usize), + Invalid(httparse::Error), + } + + impl ParseState { + fn is_done(&self) -> bool { + matches!(self, Self::Done(_)) + } + fn read_header(&self) -> bool { + matches!(self, Self::Init | Self::PartialHeader) + } + fn read_body(&self) -> bool { + matches!( + self, + Self::PartialBodyContentLength(..) + | Self::PartialChunkedBody(_) + | Self::PartialHttp10Body(_) + ) + } + } + + pub(super) struct ResponseParse { + state: ParseState, + buf: BytesMut, + header_bytes: Bytes, + } + + impl ResponseParse { + pub fn new() -> Self { + ResponseParse { + state: ParseState::Init, + buf: BytesMut::with_capacity(INIT_HEADER_BUF_SIZE), + header_bytes: Bytes::new(), + } + } + + pub fn inject_data(&mut self, data: &[u8]) -> Result<Vec<HttpTask>> { + self.put_data(data); + + let mut tasks = vec![]; + while !self.state.is_done() { + if self.state.read_header() { + let header = self.parse_header()?; + let Some(header) = header else { + break; + }; + tasks.push(HttpTask::Header(Box::new(header), self.state.is_done())); + } else if self.state.read_body() { + let body = self.parse_body()?; + let Some(body) = body else { + break; + }; + tasks.push(HttpTask::Body(Some(body), self.state.is_done())); + } else { + break; + } + } + Ok(tasks) + } + + fn put_data(&mut self, data: &[u8]) { + use ParseState::*; + if matches!(self.state, Done(_) | Invalid(_)) { + panic!("Wrong phase {:?}", self.state); + } + self.buf.extend_from_slice(data); + } + + fn parse_header(&mut self) -> Result<Option<ResponseHeader>> { + let mut headers = [httparse::EMPTY_HEADER; MAX_HEADERS]; + let mut resp = httparse::Response::new(&mut headers); + let mut parser = httparse::ParserConfig::default(); + parser.allow_spaces_after_header_name_in_responses(true); + parser.allow_obsolete_multiline_headers_in_responses(true); + + let res = parser.parse_response(&mut resp, &self.buf); + let res = match res { + Ok(res) => res, + Err(e) => { + self.state = ParseState::Invalid(e); + return Error::e_because( + InvalidHTTPHeader, + format!("buf: {:?}", String::from_utf8_lossy(&self.buf)), + e, + ); + } + }; + + let split_to = match res { + Status::Complete(s) => s, + Status::Partial => { + self.state = ParseState::PartialHeader; + return Ok(None); + } + }; + // safe to unwrap, valid response always has code set. 
+            let mut response =
+                ResponseHeader::build(resp.code.unwrap(), Some(resp.headers.len())).unwrap();
+            for header in resp.headers {
+                // TODO: consider holding a Bytes so all header values can be Bytes referencing the
+                // original buffer without reallocation
+                response.append_header(header.name.to_owned(), header.value.to_owned())?;
+            }
+            // TODO: see above, we can make header values `Bytes` referencing header_bytes
+            let header_bytes = self.buf.split_to(split_to).freeze();
+            self.header_bytes = header_bytes;
+            self.state = body_type(&response);
+
+            Ok(Some(response))
+        }
+
+        fn parse_body(&mut self) -> Result<Option<Bytes>> {
+            use ParseState::*;
+            if self.buf.is_empty() {
+                return Ok(None);
+            }
+            match self.state {
+                Init | PartialHeader | Invalid(_) => {
+                    panic!("Wrong phase {:?}", self.state);
+                }
+                Done(_) => Ok(None),
+                PartialBodyContentLength(total, mut seen) => {
+                    let end = if total < self.buf.len() + seen {
+                        // TODO: warn! more data than expected
+                        total - seen
+                    } else {
+                        self.buf.len()
+                    };
+                    seen += end;
+                    if seen >= total {
+                        self.state = Done(seen);
+                    } else {
+                        self.state = PartialBodyContentLength(total, seen);
+                    }
+                    Ok(Some(self.buf.split_to(end).freeze()))
+                }
+                PartialChunkedBody(seen) => {
+                    let parsed = httparse::parse_chunk_size(&self.buf).map_err(|e| {
+                        self.state = Done(seen);
+                        Error::explain(INVALID_CHUNK, format!("Invalid chunked encoding: {e:?}"))
+                    })?;
+                    match parsed {
+                        httparse::Status::Complete((header_len, body_len)) => {
+                            // e.g. "4\r\nRust\r\n": header "4\r\n", body "Rust", delimiter "\r\n"
+                            let total_chunk_size =
+                                header_len + body_len as usize + CHUNK_DELIMITER_SIZE;
+                            if self.buf.len() < total_chunk_size {
+                                // wait for the full chunk to be read
+                                // Note that we have to buffer the entire chunk in this design
+                                Ok(None)
+                            } else {
+                                if body_len == 0 {
+                                    self.state = Done(seen);
+                                } else {
+                                    self.state = PartialChunkedBody(seen + body_len as usize);
+                                }
+                                let mut chunk_bytes = self.buf.split_to(total_chunk_size);
+                                let mut chunk_body = chunk_bytes.split_off(header_len);
+                                chunk_body.truncate(body_len as usize);
+                                // Note that the final 0-sized chunk will return an empty Bytes
+                                // rather than None
+                                Ok(Some(chunk_body.freeze()))
+                            }
+                        }
+                        httparse::Status::Partial => {
+                            // not even a full chunk header yet, continue waiting for more data
+                            Ok(None)
+                        }
+                    }
+                }
+                PartialHttp10Body(seen) => {
+                    self.state = PartialHttp10Body(seen + self.buf.len());
+                    Ok(Some(self.buf.split().freeze()))
+                }
+            }
+        }
+
+        pub fn finish(&mut self) -> Result<()> {
+            if let ParseState::PartialHttp10Body(seen) = self.state {
+                self.state = ParseState::Done(seen);
+            }
+            if !self.state.is_done() {
+                Error::e_explain(INCOMPLETE_BODY, format!("{:?}", self.state))
+            } else {
+                Ok(())
+            }
+        }
+    }
+
+    fn body_type(resp: &ResponseHeader) -> ParseState {
+        use http::StatusCode;
+
+        if matches!(
+            resp.status,
+            StatusCode::NO_CONTENT | StatusCode::NOT_MODIFIED
+        ) {
+            // these status codes cannot have a body by definition
+            return ParseState::Done(0);
+        }
+        if let Some(encoding) = resp.headers.get(http::header::TRANSFER_ENCODING) {
+            // TODO: case sensitive?
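+            // (the comparison below is an exact byte match; a sketch of what it
+            // accepts vs. falls through on today — note that per the HTTP RFCs
+            // transfer-coding names are case-insensitive and the header may list
+            // several codings, which this match would not catch:
+            //
+            //     "chunked"       -> PartialChunkedBody
+            //     "Chunked"       -> falls through to Content-Length / HTTP/1.0 body
+            //     "gzip, chunked" -> falls through as well)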
+            if encoding.as_bytes() == b"chunked" {
+                return ParseState::PartialChunkedBody(0);
+            }
+        }
+        if let Some(cl) = resp.headers.get(http::header::CONTENT_LENGTH) {
+            // ignore invalid header values
+            if let Some(cl) = std::str::from_utf8(cl.as_bytes())
+                .ok()
+                .and_then(|cl| cl.parse::<usize>().ok())
+            {
+                return if cl == 0 {
+                    ParseState::Done(0)
+                } else {
+                    ParseState::PartialBodyContentLength(cl, 0)
+                };
+            }
+        }
+        ParseState::PartialHttp10Body(0)
+    }
+
+    #[cfg(test)]
+    mod test {
+        use super::*;
+
+        #[test]
+        fn test_basic_response() {
+            let input = b"HTTP/1.1 200 OK\r\n\r\n";
+            let mut parser = ResponseParse::new();
+            let output = parser.inject_data(input).unwrap();
+            assert_eq!(output.len(), 1);
+            let HttpTask::Header(header, eos) = &output[0] else {
+                panic!("{:?}", output);
+            };
+            assert_eq!(header.status, 200);
+            assert!(!eos);
+
+            let body = b"abc";
+            let output = parser.inject_data(body).unwrap();
+            assert_eq!(output.len(), 1);
+            let HttpTask::Body(data, _eos) = &output[0] else {
+                panic!("{:?}", output);
+            };
+            assert_eq!(data.as_ref().unwrap(), &body[..]);
+            parser.finish().unwrap();
+        }
+
+        #[test]
+        fn test_partial_response_headers() {
+            let input = b"HTTP/1.1 200 OK\r\n";
+            let mut parser = ResponseParse::new();
+            let output = parser.inject_data(input).unwrap();
+            // header is not complete
+            assert_eq!(output.len(), 0);
+
+            let output = parser
+                .inject_data("Server: pingora\r\n\r\n".as_bytes())
+                .unwrap();
+            assert_eq!(output.len(), 1);
+            let HttpTask::Header(header, eos) = &output[0] else {
+                panic!("{:?}", output);
+            };
+            assert_eq!(header.status, 200);
+            assert_eq!(header.headers.get("Server").unwrap(), "pingora");
+            assert!(!eos);
+        }
+
+        #[test]
+        fn test_invalid_headers() {
+            let input = b"HTP/1.1 200 OK\r\nServer: pingora\r\n\r\n";
+            let mut parser = ResponseParse::new();
+            let output = parser.inject_data(input);
+            // "HTP" is not a valid protocol name, so parsing should fail
+            assert!(output.is_err());
+        }
+
+        #[test]
+        fn test_body_content_length() {
+            let input = b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nabc";
+            let mut parser = ResponseParse::new();
+            let output = parser.inject_data(input).unwrap();
+
+            assert_eq!(output.len(), 2);
+            let HttpTask::Header(header, _eos) = &output[0] else {
+                panic!("{:?}", output);
+            };
+            assert_eq!(header.status, 200);
+
+            let HttpTask::Body(data, eos) = &output[1] else {
+                panic!("{:?}", output);
+            };
+            assert_eq!(data.as_ref().unwrap(), "abc");
+            assert!(!eos);
+
+            let output = parser.inject_data(b"def").unwrap();
+            assert_eq!(output.len(), 1);
+            let HttpTask::Body(data, eos) = &output[0] else {
+                panic!("{:?}", output);
+            };
+            assert_eq!(data.as_ref().unwrap(), "def");
+            assert!(eos);
+
+            parser.finish().unwrap();
+        }
+
+        #[test]
+        fn test_body_chunked() {
+            let input = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n4\r\nrust\r\n";
+            let mut parser = ResponseParse::new();
+            let output = parser.inject_data(input).unwrap();
+
+            assert_eq!(output.len(), 2);
+            let HttpTask::Header(header, _eos) = &output[0] else {
+                panic!("{:?}", output);
+            };
+            assert_eq!(header.status, 200);
+
+            let HttpTask::Body(data, eos) = &output[1] else {
+                panic!("{:?}", output);
+            };
+            assert_eq!(data.as_ref().unwrap(), "rust");
+            assert!(!eos);
+
+            let output = parser.inject_data(b"0\r\n\r\n").unwrap();
+            assert_eq!(output.len(), 1);
+            let HttpTask::Body(data, eos) = &output[0] else {
+                panic!("{:?}", output);
+            };
+            assert_eq!(data.as_ref().unwrap(), "");
+            assert!(eos);
+
+            parser.finish().unwrap();
+        }
+
+        #[test]
+        fn test_body_content_length_early() {
+            let input = 
b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nabc"; + let mut parser = ResponseParse::new(); + let output = parser.inject_data(input).unwrap(); + + assert_eq!(output.len(), 2); + let HttpTask::Header(header, _eos) = &output[0] else { + panic!("{:?}", output); + }; + assert_eq!(header.status, 200); + + let HttpTask::Body(data, eos) = &output[1] else { + panic!("{:?}", output); + }; + assert_eq!(data.as_ref().unwrap(), "abc"); + assert!(!eos); + + parser.finish().unwrap_err(); + } + + #[test] + fn test_body_content_length_more_data() { + let input = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nabc"; + let mut parser = ResponseParse::new(); + let output = parser.inject_data(input).unwrap(); + + assert_eq!(output.len(), 2); + let HttpTask::Header(header, _eos) = &output[0] else { + panic!("{:?}", output); + }; + assert_eq!(header.status, 200); + + let HttpTask::Body(data, eos) = &output[1] else { + panic!("{:?}", output); + }; + assert_eq!(data.as_ref().unwrap(), "ab"); + assert!(eos); + + // extra data is dropped without error + parser.finish().unwrap(); + } + + #[test] + fn test_body_chunked_early() { + let input = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n4\r\nrust\r\n"; + let mut parser = ResponseParse::new(); + let output = parser.inject_data(input).unwrap(); + + assert_eq!(output.len(), 2); + let HttpTask::Header(header, _eos) = &output[0] else { + panic!("{:?}", output); + }; + assert_eq!(header.status, 200); + + let HttpTask::Body(data, eos) = &output[1] else { + panic!("{:?}", output); + }; + assert_eq!(data.as_ref().unwrap(), "rust"); + assert!(!eos); + + parser.finish().unwrap_err(); + } + + #[test] + fn test_body_chunked_partial_chunk() { + let input = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n4\r\nru"; + let mut parser = ResponseParse::new(); + let output = parser.inject_data(input).unwrap(); + + assert_eq!(output.len(), 1); + let HttpTask::Header(header, _eos) = &output[0] else { + panic!("{:?}", output); + }; + assert_eq!(header.status, 200); + + let output = parser.inject_data(b"st\r\n").unwrap(); + assert_eq!(output.len(), 1); + let HttpTask::Body(data, eos) = &output[0] else { + panic!("{:?}", output); + }; + assert_eq!(data.as_ref().unwrap(), "rust"); + assert!(!eos); + } + + #[test] + fn test_body_chunked_partial_chunk_head() { + let input = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n4\r"; + let mut parser = ResponseParse::new(); + let output = parser.inject_data(input).unwrap(); + + assert_eq!(output.len(), 1); + let HttpTask::Header(header, _eos) = &output[0] else { + panic!("{:?}", output); + }; + assert_eq!(header.status, 200); + + let output = parser.inject_data(b"\nrust\r\n").unwrap(); + assert_eq!(output.len(), 1); + let HttpTask::Body(data, eos) = &output[0] else { + panic!("{:?}", output); + }; + assert_eq!(data.as_ref().unwrap(), "rust"); + assert!(!eos); + } + + #[test] + fn test_body_chunked_many_chunks() { + let input = + b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n4\r\nrust\r\n1\r\ny\r\n"; + let mut parser = ResponseParse::new(); + let output = parser.inject_data(input).unwrap(); + + assert_eq!(output.len(), 3); + let HttpTask::Header(header, _eos) = &output[0] else { + panic!("{:?}", output); + }; + assert_eq!(header.status, 200); + let HttpTask::Body(data, eos) = &output[1] else { + panic!("{:?}", output); + }; + assert!(!eos); + assert_eq!(data.as_ref().unwrap(), "rust"); + let HttpTask::Body(data, eos) = &output[2] else { + panic!("{:?}", output); + }; + assert_eq!(data.as_ref().unwrap(), "y"); + 
assert!(!eos);
+        }
+    }
+}
diff --git a/pingora-cache/src/storage.rs b/pingora-cache/src/storage.rs
new file mode 100644
index 0000000..c6365c7
--- /dev/null
+++ b/pingora-cache/src/storage.rs
@@ -0,0 +1,122 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Cache backend storage abstraction
+
+use super::{CacheKey, CacheMeta};
+use crate::key::CompactCacheKey;
+use crate::trace::SpanHandle;
+
+use async_trait::async_trait;
+use pingora_error::Result;
+use std::any::Any;
+
+/// Cache storage interface
+#[async_trait]
+pub trait Storage {
+    // TODO: shouldn't have to be static
+
+    /// Look up the storage for the given [CacheKey]
+    async fn lookup(
+        &'static self,
+        key: &CacheKey,
+        trace: &SpanHandle,
+    ) -> Result<Option<(CacheMeta, HitHandler)>>;
+
+    /// Write the given [CacheMeta] to the storage. Return [MissHandler] to write the body later.
+    async fn get_miss_handler(
+        &'static self,
+        key: &CacheKey,
+        meta: &CacheMeta,
+        trace: &SpanHandle,
+    ) -> Result<MissHandler>;
+
+    /// Delete the cached asset for the given key
+    ///
+    /// [CompactCacheKey] is used here because it is how eviction managers store the keys
+    async fn purge(&'static self, key: &CompactCacheKey, trace: &SpanHandle) -> Result<bool>;
+
+    /// Update the cache header and metadata for the already stored asset.
+    async fn update_meta(
+        &'static self,
+        key: &CacheKey,
+        meta: &CacheMeta,
+        trace: &SpanHandle,
+    ) -> Result<bool>;
+
+    /// Whether this storage backend supports reading partially written data
+    ///
+    /// This indicates when the cache should unlock readers
+    fn support_streaming_partial_write(&self) -> bool {
+        false
+    }
+
+    /// Helper function to cast the trait object to concrete types
+    fn as_any(&self) -> &(dyn Any + Send + Sync + 'static);
+}
+
+/// Cache hit handling trait
+#[async_trait]
+pub trait HandleHit {
+    /// Read the cached body
+    ///
+    /// Return `None` when there is no more body to read.
+    async fn read_body(&mut self) -> Result<Option<bytes::Bytes>>;
+
+    /// Finish the current cache hit
+    async fn finish(
+        self: Box<Self>, // because self is always used as a trait object
+        storage: &'static (dyn Storage + Sync),
+        key: &CacheKey,
+        trace: &SpanHandle,
+    ) -> Result<()>;
+
+    /// Whether this storage allows seeking to a certain range of the body
+    fn can_seek(&self) -> bool {
+        false
+    }
+
+    /// Try to seek to a certain range of the body
+    ///
+    /// `end: None` means to read to the end of the body.
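+    // A range-read sketch (assuming the handler reports itself seekable;
+    // `hit` is an illustrative HitHandler):
+    //
+    //     if hit.can_seek() {
+    //         hit.seek(0, Some(1024))?; // only the first 1024 bytes
+    //     }
+    //     while let Some(chunk) = hit.read_body().await? {
+    //         // forward `chunk` downstream
+    //     }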
+    fn seek(&mut self, _start: usize, _end: Option<usize>) -> Result<()> {
+        // to prevent implementing can_seek() without implementing seek()
+        todo!("seek() needs to be implemented")
+    }
+    // TODO: fn is_stream_hit()
+
+    /// Helper function to cast the trait object to concrete types
+    fn as_any(&self) -> &(dyn Any + Send + Sync);
+}
+
+/// Hit Handler
+pub type HitHandler = Box<(dyn HandleHit + Sync + Send)>;
+
+/// Cache miss handling trait
+#[async_trait]
+pub trait HandleMiss {
+    /// Write the given body to the storage
+    async fn write_body(&mut self, data: bytes::Bytes, eof: bool) -> Result<()>;
+
+    /// Finish the cache admission
+    ///
+    /// When `self` is dropped without calling this function, the storage should consider this write
+    /// failed.
+    async fn finish(
+        self: Box<Self>, // because self is always used as a trait object
+    ) -> Result<usize>;
+}
+
+/// Miss Handler
+pub type MissHandler = Box<(dyn HandleMiss + Sync + Send)>;
diff --git a/pingora-cache/src/trace.rs b/pingora-cache/src/trace.rs
new file mode 100644
index 0000000..c385aea
--- /dev/null
+++ b/pingora-cache/src/trace.rs
@@ -0,0 +1,98 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Distributed tracing helpers
+
+use rustracing_jaeger::span::SpanContextState;
+use std::time::SystemTime;
+
+use crate::{CacheMeta, CachePhase, HitStatus};
+
+pub use rustracing::tag::Tag;
+
+pub type Span = rustracing::span::Span<SpanContextState>;
+pub type SpanHandle = rustracing::span::SpanHandle<SpanContextState>;
+
+#[derive(Debug)]
+pub(crate) struct CacheTraceCTX {
+    // parent span
+    pub cache_span: Span,
+    // only spans that live across multiple calls need to be stored here
+    pub miss_span: Span,
+    pub hit_span: Span,
+}
+
+impl CacheTraceCTX {
+    pub fn new() -> Self {
+        CacheTraceCTX {
+            cache_span: Span::inactive(),
+            miss_span: Span::inactive(),
+            hit_span: Span::inactive(),
+        }
+    }
+
+    pub fn enable(&mut self, cache_span: Span) {
+        self.cache_span = cache_span;
+    }
+
+    #[inline]
+    pub fn child(&self, name: &'static str) -> Span {
+        self.cache_span.child(name, |o| o.start())
+    }
+
+    pub fn start_miss_span(&mut self) {
+        self.miss_span = self.child("miss");
+    }
+
+    pub fn get_miss_span(&self) -> SpanHandle {
+        self.miss_span.handle()
+    }
+
+    pub fn finish_miss_span(&mut self) {
+        self.miss_span.set_finish_time(SystemTime::now);
+    }
+
+    pub fn start_hit_span(&mut self, phase: CachePhase, hit_status: HitStatus) {
+        self.hit_span = self.child("hit");
+        self.hit_span.set_tag(|| Tag::new("phase", phase.as_str()));
+        self.hit_span
+            .set_tag(|| Tag::new("status", hit_status.as_str()));
+    }
+
+    pub fn finish_hit_span(&mut self) {
+        self.hit_span.set_finish_time(SystemTime::now);
+    }
+
+    pub fn log_meta(&mut self, meta: &CacheMeta) {
+        fn ts2epoch(ts: SystemTime) -> f64 {
+            ts.duration_since(SystemTime::UNIX_EPOCH)
+                .unwrap_or_default() // should never fail but be safe here
+                .as_secs_f64()
+        }
+        let internal = &meta.0.internal;
+        self.hit_span.set_tags(|| {
+            [
+                Tag::new("created", 
ts2epoch(internal.created)), + Tag::new("fresh_until", ts2epoch(internal.fresh_until)), + Tag::new("updated", ts2epoch(internal.updated)), + Tag::new("stale_if_error_sec", internal.stale_if_error_sec as i64), + Tag::new( + "stale_while_revalidate_sec", + internal.stale_while_revalidate_sec as i64, + ), + Tag::new("variance", internal.variance.is_some()), + ] + }); + } +} diff --git a/pingora-cache/src/variance.rs b/pingora-cache/src/variance.rs new file mode 100644 index 0000000..cce8160 --- /dev/null +++ b/pingora-cache/src/variance.rs @@ -0,0 +1,120 @@ +use std::{borrow::Cow, collections::BTreeMap}; + +use blake2::Digest; + +use crate::key::{Blake2b128, HashBinary}; + +/// A builder for variance keys, used for distinguishing multiple cached assets +/// at the same URL. This is intended to be easily passed to helper functions, +/// which can each populate a portion of the variance. +pub struct VarianceBuilder<'a> { + values: BTreeMap<Cow<'a, str>, Cow<'a, [u8]>>, +} + +impl<'a> VarianceBuilder<'a> { + /// Create an empty variance key. Has no variance by default - add some variance using + /// [`Self::add_value`]. + pub fn new() -> Self { + VarianceBuilder { + values: BTreeMap::new(), + } + } + + /// Add a byte string to the variance key. Not sensitive to insertion order. + /// `value` is intended to take either `&str` or `&[u8]`. + pub fn add_value(&mut self, name: &'a str, value: &'a (impl AsRef<[u8]> + ?Sized)) { + self.values + .insert(name.into(), Cow::Borrowed(value.as_ref())); + } + + /// Move a byte string to the variance key. Not sensitive to insertion order. Useful when + /// writing helper functions which generate a value then add said value to the VarianceBuilder. + /// Without this, the helper function would have to move the value to the calling function + /// to extend its lifetime to at least match the VarianceBuilder. + pub fn add_owned_value(&mut self, name: &'a str, value: Vec<u8>) { + self.values.insert(name.into(), Cow::Owned(value)); + } + + /// Check whether this variance key actually has variance, or just refers to the root asset + pub fn has_variance(&self) -> bool { + !self.values.is_empty() + } + + /// Hash this variance key. Returns [`None`] if [`Self::has_variance`] is false. 
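+    // A usage sketch (the header name/value pair is illustrative):
+    //
+    //     let mut variance = VarianceBuilder::new();
+    //     variance.add_value("accept-encoding", "gzip");
+    //     let key: Option<HashBinary> = variance.finalize(); // Some(hash) here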
+ pub fn finalize(self) -> Option<HashBinary> { + const SALT: &[u8; 1] = &[0u8; 1]; + if self.has_variance() { + let mut hash = Blake2b128::new(); + for (name, value) in self.values.iter() { + hash.update(name.as_bytes()); + hash.update(SALT); + hash.update(value); + hash.update(SALT); + } + Some(hash.finalize().into()) + } else { + None + } + } +} + +#[cfg(test)] +mod test { + use super::*; + + #[test] + fn test_basic() { + let key_empty = VarianceBuilder::new().finalize(); + assert_eq!(None, key_empty); + + let mut key_value = VarianceBuilder::new(); + key_value.add_value("a", "a"); + let key_value = key_value.finalize(); + + let mut key_owned_value = VarianceBuilder::new(); + key_owned_value.add_owned_value("a", "a".as_bytes().to_vec()); + let key_owned_value = key_owned_value.finalize(); + + assert_ne!(key_empty, key_value); + assert_ne!(key_empty, key_owned_value); + assert_eq!(key_value, key_owned_value); + } + + #[test] + fn test_value_ordering() { + let mut key_abc = VarianceBuilder::new(); + key_abc.add_value("a", "a"); + key_abc.add_value("b", "b"); + key_abc.add_value("c", "c"); + let key_abc = key_abc.finalize().unwrap(); + + let mut key_bac = VarianceBuilder::new(); + key_bac.add_value("b", "b"); + key_bac.add_value("a", "a"); + key_bac.add_value("c", "c"); + let key_bac = key_bac.finalize().unwrap(); + + let mut key_cba = VarianceBuilder::new(); + key_cba.add_value("c", "c"); + key_cba.add_value("b", "b"); + key_cba.add_value("a", "a"); + let key_cba = key_cba.finalize().unwrap(); + + assert_eq!(key_abc, key_bac); + assert_eq!(key_abc, key_cba); + } + + #[test] + fn test_value_overriding() { + let mut key_a = VarianceBuilder::new(); + key_a.add_value("a", "a"); + let key_a = key_a.finalize().unwrap(); + + let mut key_b = VarianceBuilder::new(); + key_b.add_value("a", "b"); + key_b.add_value("a", "a"); + let key_b = key_b.finalize().unwrap(); + + assert_eq!(key_a, key_b); + } +} diff --git a/pingora-core/Cargo.toml b/pingora-core/Cargo.toml new file mode 100644 index 0000000..76d239c --- /dev/null +++ b/pingora-core/Cargo.toml @@ -0,0 +1,81 @@ +[package] +name = "pingora-core" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["asynchronous", "network-programming"] +keywords = ["async", "http", "network", "pingora"] +exclude = ["tests/*"] +description = """ +Pingora's APIs and traits for the core network protocols. 
+""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html + +[lib] +name = "pingora_core" +path = "src/lib.rs" + +[dependencies] +pingora-runtime = { version = "0.1.0", path = "../pingora-runtime" } +pingora-openssl = { version = "0.1.0", path = "../pingora-openssl", optional = true } +pingora-boringssl = { version = "0.1.0", path = "../pingora-boringssl", optional = true } +pingora-pool = { version = "0.1.0", path = "../pingora-pool" } +pingora-error = { version = "0.1.0", path = "../pingora-error" } +pingora-timeout = { version = "0.1.0", path = "../pingora-timeout" } +pingora-http = { version = "0.1.0", path = "../pingora-http" } +tokio = { workspace = true, features = ["rt-multi-thread", "signal"] } +futures = "0.3" +async-trait = { workspace = true } +httparse = { workspace = true } +bytes = { workspace = true } +http = { workspace = true } +log = { workspace = true } +h2 = { workspace = true } +lru = { workspace = true } +nix = "0.24" +structopt = "0.3" +once_cell = { workspace = true } +serde = { version = "1.0", features = ["derive"] } +serde_yaml = "0.8" +libc = "0.2.70" +chrono = { version = "0.4", features = ["alloc"], default-features = false } +thread_local = "1.0" +prometheus = "0.13" +daemonize = "0.5.0" +sentry = { version = "0.26", features = [ + "backtrace", + "contexts", + "panic", + "reqwest", + "rustls", +], default-features = false } +regex = "1" +percent-encoding = "2.1" +parking_lot = "0.12" +socket2 = { version = "0", features = ["all"] } +flate2 = { version = "1", features = ["zlib-ng"], default-features = false } +sfv = "0" +rand = "0.8" +ahash = { workspace = true } +unicase = "2" +brotli = "3" +openssl-probe = "0.1" +tokio-test = "0.4" +zstd = "0" + +[dev-dependencies] +matches = "0.1" +env_logger = "0.9" +reqwest = { version = "0.11", features = ["rustls"], default-features = false } +hyperlocal = "0.8" +hyper = "0.14" +jemallocator = "0.5" + +[features] +default = ["openssl"] +openssl = ["pingora-openssl"] +boringssl = ["pingora-boringssl"] +patched_http1 = [] diff --git a/pingora-core/LICENSE b/pingora-core/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-core/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. 
+ + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora-core/src/apps/http_app.rs b/pingora-core/src/apps/http_app.rs new file mode 100644 index 0000000..4dc9059 --- /dev/null +++ b/pingora-core/src/apps/http_app.rs @@ -0,0 +1,210 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
A simple HTTP application trait that maps a request to a response + +use async_trait::async_trait; +use http::Response; +use log::{debug, error, trace}; +use pingora_http::ResponseHeader; +use std::sync::Arc; + +use crate::apps::HttpServerApp; +use crate::modules::http::{HttpModules, ModuleBuilder}; +use crate::protocols::http::HttpTask; +use crate::protocols::http::ServerSession; +use crate::protocols::Stream; +use crate::server::ShutdownWatch; + +/// This trait defines how to map a request to a response +#[cfg_attr(not(doc_async_trait), async_trait)] +pub trait ServeHttp { + /// Define the mapping from a request to a response. + /// Note that the request header is already read, but the implementation needs to read the + /// request body if any. + /// + /// # Limitation + /// In this API, the entire response has to be generated before the end of this call. + /// So it is not suitable for streaming response or interactive communications. + /// Users need to implement their own [`super::HttpServerApp`] for those use cases. + async fn response(&self, http_session: &mut ServerSession) -> Response<Vec<u8>>; +} + +// TODO: remove this in favor of HttpServer? +#[cfg_attr(not(doc_async_trait), async_trait)] +impl<SV> HttpServerApp for SV +where + SV: ServeHttp + Send + Sync, +{ + async fn process_new_http( + self: &Arc<Self>, + mut http: ServerSession, + shutdown: &ShutdownWatch, + ) -> Option<Stream> { + match http.read_request().await { + Ok(res) => match res { + false => { + debug!("Failed to read request header"); + return None; + } + true => { + debug!("Successfully get a new request"); + } + }, + Err(e) => { + error!("HTTP server fails to read from downstream: {e}"); + return None; + } + } + trace!("{:?}", http.req_header()); + if *shutdown.borrow() { + http.set_keepalive(None); + } else { + http.set_keepalive(Some(60)); + } + let new_response = self.response(&mut http).await; + let (parts, body) = new_response.into_parts(); + let resp_header: ResponseHeader = parts.into(); + match http.write_response_header(Box::new(resp_header)).await { + Ok(()) => { + debug!("HTTP response header done."); + } + Err(e) => { + error!( + "HTTP server fails to write to downstream: {e}, {}", + http.request_summary() + ); + } + } + if !body.is_empty() { + // TODO: check if chunked encoding is needed + match http.write_response_body(body.into()).await { + Ok(_) => debug!("HTTP response written."), + Err(e) => error!( + "HTTP server fails to write to downstream: {e}, {}", + http.request_summary() + ), + } + } + match http.finish().await { + Ok(c) => c, + Err(e) => { + error!("HTTP server fails to finish the request: {e}"); + None + } + } + } +} + +/// A helper struct for HTTP server with http modules embedded +pub struct HttpServer<SV> { + app: SV, + modules: HttpModules, +} + +impl<SV> HttpServer<SV> { + /// Create a new [HttpServer] with the given app which implements [ServeHttp] + pub fn new_app(app: SV) -> Self { + HttpServer { + app, + modules: HttpModules::new(), + } + } + + /// Add [ModuleBuilder] to this [HttpServer] + pub fn add_module(&mut self, module: ModuleBuilder) { + self.modules.add_module(module) + } +} + +#[cfg_attr(not(doc_async_trait), async_trait)] +impl<SV> HttpServerApp for HttpServer<SV> +where + SV: ServeHttp + Send + Sync, +{ + async fn process_new_http( + self: &Arc<Self>, + mut http: ServerSession, + shutdown: &ShutdownWatch, + ) -> Option<Stream> { + match http.read_request().await { + Ok(res) => match res { + false => { + debug!("Failed to read request header"); + return None; + } + 
true => { + debug!("Successfully get a new request"); + } + }, + Err(e) => { + error!("HTTP server fails to read from downstream: {e}"); + return None; + } + } + trace!("{:?}", http.req_header()); + if *shutdown.borrow() { + http.set_keepalive(None); + } else { + http.set_keepalive(Some(60)); + } + let mut module_ctx = self.modules.build_ctx(); + let req = http.req_header_mut(); + module_ctx.request_header_filter(req).ok()?; + let new_response = self.app.response(&mut http).await; + let (parts, body) = new_response.into_parts(); + let resp_header: ResponseHeader = parts.into(); + let mut task = HttpTask::Header(Box::new(resp_header), body.is_empty()); + module_ctx.response_filter(&mut task).ok()?; + + trace!("{task:?}"); + + match http.response_duplex_vec(vec![task]).await { + Ok(_) => { + debug!("HTTP response header done."); + } + Err(e) => { + error!( + "HTTP server fails to write to downstream: {e}, {}", + http.request_summary() + ); + } + } + let mut task = if !body.is_empty() { + HttpTask::Body(Some(body.into()), true) + } else { + HttpTask::Body(None, true) + }; + + trace!("{task:?}"); + + module_ctx.response_filter(&mut task).ok()?; + + // TODO: check if chunked encoding is needed + match http.response_duplex_vec(vec![task]).await { + Ok(_) => debug!("HTTP response written."), + Err(e) => error!( + "HTTP server fails to write to downstream: {e}, {}", + http.request_summary() + ), + } + match http.finish().await { + Ok(c) => c, + Err(e) => { + error!("HTTP server fails to finish the request: {e}"); + None + } + } + } +} diff --git a/pingora-core/src/apps/mod.rs b/pingora-core/src/apps/mod.rs new file mode 100644 index 0000000..db8f11b --- /dev/null +++ b/pingora-core/src/apps/mod.rs @@ -0,0 +1,135 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! The abstraction and implementation interface for service application logic + +pub mod http_app; +pub mod prometheus_http_app; + +use crate::server::ShutdownWatch; +use async_trait::async_trait; +use log::{debug, error}; +use std::sync::Arc; + +use crate::protocols::http::v2::server; +use crate::protocols::http::ServerSession; +use crate::protocols::Stream; +use crate::protocols::ALPN; + +#[cfg_attr(not(doc_async_trait), async_trait)] +/// This trait defines the interface of a transport layer (TCP or TLS) application. +pub trait ServerApp { + /// Whenever a new connection is established, this function will be called with the established + /// [`Stream`] object provided. + /// + /// The application can do whatever it wants with the `session`. + /// + /// After processing the `session`, if the `session`'s connection is reusable, This function + /// can return it to the service by returning `Some(session)`. The returned `session` will be + /// fed to another [`Self::process_new()`] for another round of processing. + /// If not reusable, `None` should be returned. + /// + /// The `shutdown` argument will change from `false` to `true` when the server receives a + /// signal to shutdown. 
This argument allows the application to react accordingly. + async fn process_new( + self: &Arc<Self>, + mut session: Stream, + // TODO: make this ShutdownWatch so that all task can await on this event + shutdown: &ShutdownWatch, + ) -> Option<Stream>; + + /// This callback will be called once after the service stops listening to its endpoints. + fn cleanup(&self) {} +} + +/// This trait defines the interface of a HTTP application. +#[cfg_attr(not(doc_async_trait), async_trait)] +pub trait HttpServerApp { + /// Similar to the [`ServerApp`], this function is called whenever a new HTTP session is established. + /// + /// After successful processing, [`ServerSession::finish()`] can be called to return an optionally reusable + /// connection back to the service. The caller needs to make sure that the connection is in a reusable state + /// i.e., no error or incomplete read or write headers or bodies. Otherwise a `None` should be returned. + async fn process_new_http( + self: &Arc<Self>, + mut session: ServerSession, + // TODO: make this ShutdownWatch so that all task can await on this event + shutdown: &ShutdownWatch, + ) -> Option<Stream>; + + /// Provide options on how HTTP/2 connection should be established. This function will be called + /// every time a new HTTP/2 **connection** needs to be established. + /// + /// A `None` means to use the built-in default options. See [`server::H2Options`] for more details. + fn h2_options(&self) -> Option<server::H2Options> { + None + } + + fn http_cleanup(&self) {} +} + +#[cfg_attr(not(doc_async_trait), async_trait)] +impl<T> ServerApp for T +where + T: HttpServerApp + Send + Sync + 'static, +{ + async fn process_new( + self: &Arc<Self>, + stream: Stream, + shutdown: &ShutdownWatch, + ) -> Option<Stream> { + match stream.selected_alpn_proto() { + Some(ALPN::H2) => { + let h2_options = self.h2_options(); + let h2_conn = server::handshake(stream, h2_options).await; + let mut h2_conn = match h2_conn { + Err(e) => { + error!("H2 handshake error {e}"); + return None; + } + Ok(c) => c, + }; + loop { + // this loop ends when the client decides to close the h2 conn + // TODO: add a timeout? + let h2_stream = server::HttpSession::from_h2_conn(&mut h2_conn).await; + let h2_stream = match h2_stream { + Err(e) => { + // It is common for client to just disconnect TCP without properly + // closing H2. So we don't log the errors here + debug!("H2 error when accepting new stream {e}"); + return None; + } + Ok(s) => s?, // None means the connection is ready to be closed + }; + let app = self.clone(); + let shutdown = shutdown.clone(); + pingora_runtime::current_handle().spawn(async move { + app.process_new_http(ServerSession::new_http2(h2_stream), &shutdown) + .await; + }); + } + } + _ => { + // No ALPN or ALPN::H1 or something else, just try Http1 + self.process_new_http(ServerSession::new_http1(stream), shutdown) + .await + } + } + } + + fn cleanup(&self) { + self.http_cleanup() + } +} diff --git a/pingora-core/src/apps/prometheus_http_app.rs b/pingora-core/src/apps/prometheus_http_app.rs new file mode 100644 index 0000000..146f3da --- /dev/null +++ b/pingora-core/src/apps/prometheus_http_app.rs @@ -0,0 +1,60 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! A HTTP application that reports Prometheus metrics. + +use async_trait::async_trait; +use http::{self, Response}; +use prometheus::{Encoder, TextEncoder}; + +use super::http_app::HttpServer; +use crate::apps::http_app::ServeHttp; +use crate::modules::http::compression::ResponseCompressionBuilder; +use crate::protocols::http::ServerSession; + +/// A HTTP application that reports Prometheus metrics. +/// +/// This application will report all the [static metrics](https://docs.rs/prometheus/latest/prometheus/index.html#static-metrics) +/// collected via the [Prometheus](https://docs.rs/prometheus/) crate; +pub struct PrometheusHttpApp; + +#[cfg_attr(not(doc_async_trait), async_trait)] +impl ServeHttp for PrometheusHttpApp { + async fn response(&self, _http_session: &mut ServerSession) -> Response<Vec<u8>> { + let encoder = TextEncoder::new(); + let metric_families = prometheus::gather(); + let mut buffer = vec![]; + encoder.encode(&metric_families, &mut buffer).unwrap(); + Response::builder() + .status(200) + .header(http::header::CONTENT_TYPE, encoder.format_type()) + .header(http::header::CONTENT_LENGTH, buffer.len()) + .body(buffer) + .unwrap() + } +} + +/// The [HttpServer] for [PrometheusHttpApp] +/// +/// This type provides the functionality of [PrometheusHttpApp] with compression enabled +pub type PrometheusServer = HttpServer<PrometheusHttpApp>; + +impl PrometheusServer { + pub fn new() -> Self { + let mut server = Self::new_app(PrometheusHttpApp); + // enable gzip level 7 compression + server.add_module(ResponseCompressionBuilder::enable(7)); + server + } +} diff --git a/pingora-core/src/connectors/http/mod.rs b/pingora-core/src/connectors/http/mod.rs new file mode 100644 index 0000000..c399530 --- /dev/null +++ b/pingora-core/src/connectors/http/mod.rs @@ -0,0 +1,221 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Connecting to HTTP servers + +use crate::connectors::ConnectorOptions; +use crate::protocols::http::client::HttpSession; +use crate::upstreams::peer::Peer; +use pingora_error::Result; +use std::time::Duration; + +pub mod v1; +pub mod v2; + +pub struct Connector { + h1: v1::Connector, + h2: v2::Connector, +} + +impl Connector { + pub fn new(options: Option<ConnectorOptions>) -> Self { + Connector { + h1: v1::Connector::new(options.clone()), + h2: v2::Connector::new(options), + } + } + + /// Get an [HttpSession] to the given server. + /// + /// The second return value indicates whether the session is connected via a reused stream. 
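+    ///
+    /// A minimal usage sketch (not compiled as a doc-test; it assumes a
+    /// configured `HttpPeer` named `peer` and an async context):
+    /// ```ignore
+    /// let connector = Connector::new(None);
+    /// let (session, reused) = connector.get_http_session(&peer).await?;
+    /// // ... drive the request over `session` ...
+    /// connector.release_http_session(session, &peer, None).await;
+    /// ```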
+ pub async fn get_http_session<P: Peer + Send + Sync + 'static>( + &self, + peer: &P, + ) -> Result<(HttpSession, bool)> { + // NOTE: maybe TODO: we do not yet enforce that only TLS traffic can use h2, which is the + // de facto requirement for h2, because non TLS traffic lack the negotiation mechanism. + + // We assume no peer option == no ALPN == h1 only + let h1_only = peer + .get_peer_options() + .map_or(true, |o| o.alpn.get_max_http_version() == 1); + if h1_only { + let (h1, reused) = self.h1.get_http_session(peer).await?; + Ok((HttpSession::H1(h1), reused)) + } else { + // the peer allows h2, we first check the h2 reuse pool + let reused_h2 = self.h2.reused_http_session(peer).await?; + if let Some(h2) = reused_h2 { + return Ok((HttpSession::H2(h2), true)); + } + let h2_only = peer + .get_peer_options() + .map_or(false, |o| o.alpn.get_min_http_version() == 2) + && !self.h2.h1_is_preferred(peer); + if !h2_only { + // We next check the reuse pool for h1 before creating a new h2 connection. + // This is because the server may not support h2 at all, connections to + // the server could all be h1. + if let Some(h1) = self.h1.reused_http_session(peer).await { + return Ok((HttpSession::H1(h1), true)); + } + } + let session = self.h2.new_http_session(peer).await?; + Ok((session, false)) + } + } + + pub async fn release_http_session<P: Peer + Send + Sync + 'static>( + &self, + session: HttpSession, + peer: &P, + idle_timeout: Option<Duration>, + ) { + match session { + HttpSession::H1(h1) => self.h1.release_http_session(h1, peer, idle_timeout).await, + HttpSession::H2(h2) => self.h2.release_http_session(h2, peer, idle_timeout), + } + } + + /// Tell the connector to always send h1 for ALPN for the given peer in the future. + pub fn prefer_h1(&self, peer: &impl Peer) { + self.h2.prefer_h1(peer); + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::protocols::http::v1::client::HttpSession as Http1Session; + use crate::upstreams::peer::HttpPeer; + use pingora_http::RequestHeader; + + async fn get_http(http: &mut Http1Session, expected_status: u16) { + let mut req = Box::new(RequestHeader::build("GET", b"/", None).unwrap()); + req.append_header("Host", "one.one.one.one").unwrap(); + http.write_request_header(req).await.unwrap(); + http.read_response().await.unwrap(); + http.respect_keepalive(); + + assert_eq!(http.get_status().unwrap(), expected_status); + while http.read_body_bytes().await.unwrap().is_some() {} + } + + #[tokio::test] + async fn test_connect_h2() { + let connector = Connector::new(None); + let mut peer = HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into()); + peer.options.set_http_version(2, 2); + let (h2, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(!reused); + match &h2 { + HttpSession::H1(_) => panic!("expect h2"), + HttpSession::H2(h2_stream) => assert!(!h2_stream.ping_timedout()), + } + + connector.release_http_session(h2, &peer, None).await; + + let (h2, reused) = connector.get_http_session(&peer).await.unwrap(); + // reused this time + assert!(reused); + match &h2 { + HttpSession::H1(_) => panic!("expect h2"), + HttpSession::H2(h2_stream) => assert!(!h2_stream.ping_timedout()), + } + } + + #[tokio::test] + async fn test_connect_h1() { + let connector = Connector::new(None); + let mut peer = HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into()); + peer.options.set_http_version(1, 1); + let (mut h1, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(!reused); + match &mut h1 { + HttpSession::H1(http) 
=> { + get_http(http, 200).await; + } + HttpSession::H2(_) => panic!("expect h1"), + } + connector.release_http_session(h1, &peer, None).await; + + let (mut h1, reused) = connector.get_http_session(&peer).await.unwrap(); + // reused this time + assert!(reused); + match &mut h1 { + HttpSession::H1(_) => {} + HttpSession::H2(_) => panic!("expect h1"), + } + } + + #[tokio::test] + async fn test_connect_h2_fallback_h1_reuse() { + // this test verify that if the server doesn't support h2, the Connector will reuse the + // h1 session instead. + + let connector = Connector::new(None); + let mut peer = HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into()); + // As it is hard to find a server that support only h1, we use the following hack to trick + // the connector to think the server supports only h1. We force ALPN to use h1 and then + // return the connection to the Connector. And then we use a Peer that allows h2 + peer.options.set_http_version(1, 1); + let (mut h1, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(!reused); + match &mut h1 { + HttpSession::H1(http) => { + get_http(http, 200).await; + } + HttpSession::H2(_) => panic!("expect h1"), + } + connector.release_http_session(h1, &peer, None).await; + + let mut peer = HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into()); + peer.options.set_http_version(2, 1); + + let (mut h1, reused) = connector.get_http_session(&peer).await.unwrap(); + // reused this time + assert!(reused); + match &mut h1 { + HttpSession::H1(_) => {} + HttpSession::H2(_) => panic!("expect h1"), + } + } + + #[tokio::test] + async fn test_connect_prefer_h1() { + let connector = Connector::new(None); + let mut peer = HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into()); + peer.options.set_http_version(2, 1); + connector.prefer_h1(&peer); + + let (mut h1, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(!reused); + match &mut h1 { + HttpSession::H1(http) => { + get_http(http, 200).await; + } + HttpSession::H2(_) => panic!("expect h1"), + } + connector.release_http_session(h1, &peer, None).await; + + peer.options.set_http_version(2, 2); + let (mut h1, reused) = connector.get_http_session(&peer).await.unwrap(); + // reused this time + assert!(reused); + match &mut h1 { + HttpSession::H1(_) => {} + HttpSession::H2(_) => panic!("expect h1"), + } + } +} diff --git a/pingora-core/src/connectors/http/v1.rs b/pingora-core/src/connectors/http/v1.rs new file mode 100644 index 0000000..513fed1 --- /dev/null +++ b/pingora-core/src/connectors/http/v1.rs @@ -0,0 +1,119 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
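+
+//! Connecting to HTTP/1.x servers.
+//!
+//! A minimal usage sketch of this connector (not compiled as a doc-test; it
+//! assumes a configured `HttpPeer` named `peer`, mirroring the tests below):
+//! ```ignore
+//! let connector = Connector::new(None);
+//! let (mut http, reused) = connector.get_http_session(&peer).await?;
+//! // ... send the request over `http` ...
+//! connector.release_http_session(http, &peer, None).await;
+//! ```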
+ +use crate::connectors::{ConnectorOptions, TransportConnector}; +use crate::protocols::http::v1::client::HttpSession; +use crate::upstreams::peer::Peer; + +use pingora_error::Result; +use std::time::Duration; + +pub struct Connector { + transport: TransportConnector, +} + +impl Connector { + pub fn new(options: Option<ConnectorOptions>) -> Self { + Connector { + transport: TransportConnector::new(options), + } + } + + pub async fn get_http_session<P: Peer + Send + Sync + 'static>( + &self, + peer: &P, + ) -> Result<(HttpSession, bool)> { + let (stream, reused) = self.transport.get_stream(peer).await?; + let http = HttpSession::new(stream); + Ok((http, reused)) + } + + pub async fn reused_http_session<P: Peer + Send + Sync + 'static>( + &self, + peer: &P, + ) -> Option<HttpSession> { + self.transport + .reused_stream(peer) + .await + .map(HttpSession::new) + } + + pub async fn release_http_session<P: Peer + Send + Sync + 'static>( + &self, + session: HttpSession, + peer: &P, + idle_timeout: Option<Duration>, + ) { + if let Some(stream) = session.reuse().await { + self.transport + .release_stream(stream, peer.reuse_hash(), idle_timeout); + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::upstreams::peer::HttpPeer; + use pingora_http::RequestHeader; + + async fn get_http(http: &mut HttpSession, expected_status: u16) { + let mut req = Box::new(RequestHeader::build("GET", b"/", None).unwrap()); + req.append_header("Host", "one.one.one.one").unwrap(); + http.write_request_header(req).await.unwrap(); + http.read_response().await.unwrap(); + http.respect_keepalive(); + + assert_eq!(http.get_status().unwrap(), expected_status); + while http.read_body_bytes().await.unwrap().is_some() {} + } + + #[tokio::test] + async fn test_connect() { + let connector = Connector::new(None); + let peer = HttpPeer::new(("1.1.1.1", 80), false, "".into()); + // make a new connection to 1.1.1.1 + let (http, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(!reused); + + // this http is not even used, so not be able to reuse + connector.release_http_session(http, &peer, None).await; + let (mut http, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(!reused); + + get_http(&mut http, 301).await; + connector.release_http_session(http, &peer, None).await; + let (_, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(reused); + } + + #[tokio::test] + async fn test_connect_tls() { + let connector = Connector::new(None); + let peer = HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into()); + // make a new connection to https://1.1.1.1 + let (http, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(!reused); + + // this http is not even used, so not be able to reuse + connector.release_http_session(http, &peer, None).await; + let (mut http, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(!reused); + + get_http(&mut http, 200).await; + connector.release_http_session(http, &peer, None).await; + let (_, reused) = connector.get_http_session(&peer).await.unwrap(); + assert!(reused); + } +} diff --git a/pingora-core/src/connectors/http/v2.rs b/pingora-core/src/connectors/http/v2.rs new file mode 100644 index 0000000..389bd4e --- /dev/null +++ b/pingora-core/src/connectors/http/v2.rs @@ -0,0 +1,531 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use super::HttpSession; +use crate::connectors::{ConnectorOptions, TransportConnector}; +use crate::protocols::http::v1::client::HttpSession as Http1Session; +use crate::protocols::http::v2::client::{drive_connection, Http2Session}; +use crate::protocols::{Digest, Stream}; +use crate::upstreams::peer::{Peer, ALPN}; + +use bytes::Bytes; +use h2::client::SendRequest; +use log::debug; +use parking_lot::RwLock; +use pingora_error::{Error, ErrorType::*, OrErr, Result}; +use pingora_pool::{ConnectionMeta, ConnectionPool, PoolNode}; +use std::collections::HashMap; +use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering}; +use std::sync::Arc; +use std::time::Duration; +use tokio::sync::watch; + +struct Stub(SendRequest<Bytes>); + +impl Stub { + async fn new_stream(&self) -> Result<SendRequest<Bytes>> { + let send_req = self.0.clone(); + send_req + .ready() + .await + .or_err(H2Error, "while creating new stream") + } +} + +pub(crate) struct ConnectionRefInner { + connection_stub: Stub, + closed: watch::Receiver<bool>, + ping_timeout_occurred: Arc<AtomicBool>, + id: i32, + // max concurrent streams this connection is allowed to create + max_streams: usize, + // how many concurrent streams already active + current_streams: AtomicUsize, + // because `SendRequest` doesn't actually have access to the underlying Stream, + // we log info about timing and tcp info here. 
+    pub(crate) digest: Digest,
+}
+
+#[derive(Clone)]
+pub(crate) struct ConnectionRef(Arc<ConnectionRefInner>);
+
+impl ConnectionRef {
+    pub fn new(
+        send_req: SendRequest<Bytes>,
+        closed: watch::Receiver<bool>,
+        ping_timeout_occurred: Arc<AtomicBool>,
+        id: i32,
+        max_streams: usize,
+        digest: Digest,
+    ) -> Self {
+        ConnectionRef(Arc::new(ConnectionRefInner {
+            connection_stub: Stub(send_req),
+            closed,
+            ping_timeout_occurred,
+            id,
+            max_streams,
+            current_streams: AtomicUsize::new(0),
+            digest,
+        }))
+    }
+    pub fn more_streams_allowed(&self) -> bool {
+        self.0.max_streams > self.0.current_streams.load(Ordering::Relaxed)
+    }
+
+    pub fn is_idle(&self) -> bool {
+        self.0.current_streams.load(Ordering::Relaxed) == 0
+    }
+
+    pub fn release_stream(&self) {
+        self.0.current_streams.fetch_sub(1, Ordering::SeqCst);
+    }
+
+    pub fn id(&self) -> i32 {
+        self.0.id
+    }
+
+    pub fn digest(&self) -> &Digest {
+        &self.0.digest
+    }
+
+    pub fn ping_timedout(&self) -> bool {
+        self.0.ping_timeout_occurred.load(Ordering::Relaxed)
+    }
+
+    pub fn is_closed(&self) -> bool {
+        *self.0.closed.borrow()
+    }
+
+    // spawn a stream if more streams are allowed, otherwise return Ok(None)
+    pub async fn spawn_stream(&self) -> Result<Option<Http2Session>> {
+        // Atomically check whether current_streams is over the limit:
+        // a separate load(), compare and then fetch_add() cannot guarantee atomicity
+        let current_streams = self.0.current_streams.fetch_add(1, Ordering::SeqCst);
+        if current_streams >= self.0.max_streams {
+            // already over the limit, reset the counter to the previous value
+            self.0.current_streams.fetch_sub(1, Ordering::SeqCst);
+            return Ok(None);
+        }
+        let send_req = self.0.connection_stub.new_stream().await.map_err(|e| {
+            // failed to create the stream, reset the counter
+            self.0.current_streams.fetch_sub(1, Ordering::SeqCst);
+            e
+        })?;
+
+        Ok(Some(Http2Session::new(send_req, self.clone())))
+    }
+}
+
+struct InUsePool {
+    // TODO: use pingora hashmap to shard the lock contention
+    pools: RwLock<HashMap<u64, PoolNode<ConnectionRef>>>,
+}
+
+impl InUsePool {
+    fn new() -> Self {
+        InUsePool {
+            pools: RwLock::new(HashMap::new()),
+        }
+    }
+
+    fn insert(&self, reuse_hash: u64, conn: ConnectionRef) {
+        {
+            let pools = self.pools.read();
+            if let Some(pool) = pools.get(&reuse_hash) {
+                pool.insert(conn.id(), conn);
+                return;
+            }
+        } // drop read lock
+
+        let pool = PoolNode::new();
+        pool.insert(conn.id(), conn);
+        let mut pools = self.pools.write();
+        pools.insert(reuse_hash, pool);
+    }
+
+    // retrieve an h2 conn ref to create a new stream
+    // the caller should return the conn ref to this pool if there is still
+    // capacity left for more streams
+    fn get(&self, reuse_hash: u64) -> Option<ConnectionRef> {
+        let pools = self.pools.read();
+        pools.get(&reuse_hash)?.get_any().map(|v| v.1)
+    }
+
+    // release an h2 stream; this function will cause a ConnectionRef to be returned (if it exists)
+    // the caller should update the ref and then decide where to put it (in-use pool or idle)
+    fn release(&self, reuse_hash: u64, id: i32) -> Option<ConnectionRef> {
+        let pools = self.pools.read();
+        if let Some(pool) = pools.get(&reuse_hash) {
+            pool.remove(id)
+        } else {
+            None
+        }
+    }
+}
+
+const DEFAULT_POOL_SIZE: usize = 128;
+
+/// Http2 connector
+pub struct Connector {
+    // just for creating connections, the Stream of h2 should be reused
+    transport: TransportConnector,
+    // the h2 connection idle pool
+    idle_pool: Arc<ConnectionPool<ConnectionRef>>,
+    // the pool of h2 connections that have ongoing streams
+    in_use_pool: InUsePool,
+}
+
+impl Connector {
+    /// Create a new [Connector] from the given [ConnectorOptions]
+    pub fn new(options: Option<ConnectorOptions>) -> Self {
+        let pool_size = options
+            .as_ref()
+            .map_or(DEFAULT_POOL_SIZE, |o| o.keepalive_pool_size);
+        // connection offload is handled by the [TransportConnector]
+        Connector {
+            transport: TransportConnector::new(options),
+            idle_pool: Arc::new(ConnectionPool::new(pool_size)),
+            in_use_pool: InUsePool::new(),
+        }
+    }
+
+    /// Create a new Http2 connection to the given server
+    ///
+    /// Either an Http2 or Http1 session can be returned depending on the server's preference.
+    pub async fn new_http_session<P: Peer + Send + Sync + 'static>(
+        &self,
+        peer: &P,
+    ) -> Result<HttpSession> {
+        let stream = self.transport.new_stream(peer).await?;
+
+        // check alpn
+        match stream.selected_alpn_proto() {
+            Some(ALPN::H2) => { /* continue */ }
+            Some(_) => {
+                // H2 not supported
+                return Ok(HttpSession::H1(Http1Session::new(stream)));
+            }
+            None => {
+                // if tls but no ALPN, default to h1
+                // else if plaintext and min http version is 1, this is most likely h1
+                if peer.tls()
+                    || peer
+                        .get_peer_options()
+                        .map_or(true, |o| o.alpn.get_min_http_version() == 1)
+                {
+                    return Ok(HttpSession::H1(Http1Session::new(stream)));
+                }
+                // else: min http version=H2 over plaintext, there is no ALPN anyway, we trust
+                // the caller that the server speaks h2c
+            }
+        }
+        let max_h2_stream = peer.get_peer_options().map_or(1, |o| o.max_h2_streams);
+        let conn = handshake(stream, max_h2_stream, peer.h2_ping_interval()).await?;
+        let h2_stream = conn
+            .spawn_stream()
+            .await?
+            .expect("newly created connections should have at least one free stream");
+        if conn.more_streams_allowed() {
+            self.in_use_pool.insert(peer.reuse_hash(), conn);
+        }
+        Ok(HttpSession::H2(h2_stream))
+    }
+
+    /// Try to create a new http2 stream from any existing H2 connection.
+    ///
+    /// `None` means there is no "free" connection left.
+    pub async fn reused_http_session<P: Peer + Send + Sync + 'static>(
+        &self,
+        peer: &P,
+    ) -> Result<Option<Http2Session>> {
+        // check the in-use pool first so that we use fewer total connections,
+        // then the idle pool
+        let reuse_hash = peer.reuse_hash();
+
+        // NOTE: We grab a conn from the pools, create a new stream and put the conn back if the
+        // conn has more free streams. During this process another caller could arrive but not
+        // be able to find the conn even though the conn still has free streams to use.
+        // We accept this false negative to keep the implementation simple. This false negative
+        // only makes an actual impact when there are just a few connections.
+        // Alternative design 1: give each free stream its own conn object: a lot of Arc<>s.
+        // Alternative design 2: mutex the pool, which creates lock contention when concurrency is high.
+        // Alternative design 3: do not pop the conn from the pool so that multiple callers can grab it,
+        // which will cause an issue where spawn_stream() could return None because others called it
+        // first. Thus a caller might have to retry or give up. This issue is more likely to happen
+        // when concurrency is high.
+        let maybe_conn = self
+            .in_use_pool
+            .get(reuse_hash)
+            .or_else(|| self.idle_pool.get(&reuse_hash));
+        if let Some(conn) = maybe_conn {
+            let h2_stream = conn
+                .spawn_stream()
+                .await?
+                .expect("connection from the pools should have free stream to allocate");
+            if conn.more_streams_allowed() {
+                self.in_use_pool.insert(reuse_hash, conn);
+            }
+            Ok(Some(h2_stream))
+        } else {
+            Ok(None)
+        }
+    }
+
+    /// Release a finished h2 stream.
+    ///
+    /// This function will terminate the [Http2Session]. The corresponding h2 connection will now
+    /// have one more free stream to use.
+    ///
+    /// The h2 connection will be closed after `idle_timeout` if it has no active streams.
+    pub fn release_http_session<P: Peer + Send + Sync + 'static>(
+        &self,
+        session: Http2Session,
+        peer: &P,
+        idle_timeout: Option<Duration>,
+    ) {
+        let id = session.conn.id();
+        let reuse_hash = peer.reuse_hash();
+        // get a ref to the connection, which we might need below, before dropping the h2
+        let conn = session.conn();
+        // this drop() will both drop the actual stream and call the conn.release_stream()
+        drop(session);
+        // find and remove the conn stored in in_use_pool so that it could be put in the idle pool
+        // if necessary
+        let conn = self.in_use_pool.release(reuse_hash, id).unwrap_or(conn);
+        if conn.is_closed() {
+            // Already dead h2 connection
+            return;
+        }
+        if conn.is_idle() {
+            let meta = ConnectionMeta {
+                key: reuse_hash,
+                id,
+            };
+            let closed = conn.0.closed.clone();
+            let (notify_evicted, watch_use) = self.idle_pool.put(&meta, conn);
+            if let Some(to) = idle_timeout {
+                let pool = self.idle_pool.clone(); // clone the arc
+                let rt = pingora_runtime::current_handle();
+                rt.spawn(async move {
+                    pool.idle_timeout(&meta, to, notify_evicted, closed, watch_use)
+                        .await;
+                });
+            }
+        } else {
+            self.in_use_pool.insert(reuse_hash, conn);
+        }
+    }
+
+    /// Tell the connector to always send h1 for ALPN for the given peer in the future.
+    pub fn prefer_h1(&self, peer: &impl Peer) {
+        self.transport.prefer_h1(peer);
+    }
+
+    pub(crate) fn h1_is_preferred(&self, peer: &impl Peer) -> bool {
+        self.transport
+            .preferred_http_version
+            .get(peer)
+            .map_or(false, |v| matches!(v, ALPN::H1))
+    }
+}
+
+// The h2 library we use has unbounded internal buffering, which will cause excessive memory
+// consumption when the downstream is slower than the upstream. This window size caps the buffering by
+// limiting how much data can be inflight. However, setting this value will also cap the max
+// download speed by limiting the bandwidth-delay product of a link.
+// Long term, we should advertise a large window but shrink it when a small buffer is full.
+// 8 Mbytes = 80 Mbytes/s x 100ms, which should be enough for most links.
+const H2_WINDOW_SIZE: u32 = 1 << 23;
+
+async fn handshake(
+    stream: Stream,
+    max_streams: usize,
+    h2_ping_interval: Option<Duration>,
+) -> Result<ConnectionRef> {
+    use h2::client::Builder;
+    use pingora_runtime::current_handle;
+
+    // Safe guard: new_http_session() assumes there should be at least one free stream
+    if max_streams == 0 {
+        return Error::e_explain(H2Error, "zero max_stream configured");
+    }
+
+    let id = stream.id();
+    let digest = Digest {
+        // NOTE: this field is always false because the digest is shared across all streams.
+        // The streams should log their own reuse info
+        ssl_digest: stream.get_ssl_digest(),
+        // TODO: log h2 handshake time
+        timing_digest: stream.get_timing_digest(),
+        proxy_digest: stream.get_proxy_digest(),
+    };
+    // TODO: make these configurable
+    let (send_req, connection) = Builder::new()
+        .enable_push(false)
+        .initial_max_send_streams(max_streams)
+        // The limit for the server.
Server push is not allowed, so this value doesn't matter + .max_concurrent_streams(1) + .max_frame_size(64 * 1024) // advise server to send larger frames + .initial_window_size(H2_WINDOW_SIZE) + // should this be max_streams * H2_WINDOW_SIZE? + .initial_connection_window_size(H2_WINDOW_SIZE) + .handshake(stream) + .await + .or_err(HandshakeError, "during H2 handshake")?; + debug!("H2 handshake to server done."); + let ping_timeout_occurred = Arc::new(AtomicBool::new(false)); + let ping_timeout_clone = ping_timeout_occurred.clone(); + let max_allowed_streams = std::cmp::min(max_streams, connection.max_concurrent_send_streams()); + + // Safe guard: new_http_session() assumes there should be at least one free stream + // The server won't commonly advertise 0 max stream. + if max_allowed_streams == 0 { + return Error::e_explain(H2Error, "zero max_concurrent_send_streams received"); + } + + let (closed_tx, closed_rx) = watch::channel(false); + + current_handle().spawn(async move { + drive_connection( + connection, + id, + closed_tx, + h2_ping_interval, + ping_timeout_clone, + ) + .await; + }); + Ok(ConnectionRef::new( + send_req, + closed_rx, + ping_timeout_occurred, + id, + max_allowed_streams, + digest, + )) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::upstreams::peer::HttpPeer; + + #[tokio::test] + async fn test_connect_h2() { + let connector = Connector::new(None); + let mut peer = HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into()); + peer.options.set_http_version(2, 2); + let h2 = connector.new_http_session(&peer).await.unwrap(); + match h2 { + HttpSession::H1(_) => panic!("expect h2"), + HttpSession::H2(h2_stream) => assert!(!h2_stream.ping_timedout()), + } + } + + #[tokio::test] + async fn test_connect_h1() { + let connector = Connector::new(None); + let mut peer = HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into()); + // a hack to force h1, new_http_session() in the future might validate this setting + peer.options.set_http_version(1, 1); + let h2 = connector.new_http_session(&peer).await.unwrap(); + match h2 { + HttpSession::H1(_) => {} + HttpSession::H2(_) => panic!("expect h1"), + } + } + + #[tokio::test] + async fn test_connect_h1_plaintext() { + let connector = Connector::new(None); + let mut peer = HttpPeer::new(("1.1.1.1", 80), false, "".into()); + peer.options.set_http_version(2, 1); + let h2 = connector.new_http_session(&peer).await.unwrap(); + match h2 { + HttpSession::H1(_) => {} + HttpSession::H2(_) => panic!("expect h1"), + } + } + + #[tokio::test] + async fn test_h2_single_stream() { + let connector = Connector::new(None); + let mut peer = HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into()); + peer.options.set_http_version(2, 2); + peer.options.max_h2_streams = 1; + let h2 = connector.new_http_session(&peer).await.unwrap(); + let h2_1 = match h2 { + HttpSession::H1(_) => panic!("expect h2"), + HttpSession::H2(h2_stream) => h2_stream, + }; + + let id = h2_1.conn.id(); + + assert!(connector + .reused_http_session(&peer) + .await + .unwrap() + .is_none()); + + connector.release_http_session(h2_1, &peer, None); + + let h2_2 = connector.reused_http_session(&peer).await.unwrap().unwrap(); + assert_eq!(id, h2_2.conn.id()); + + connector.release_http_session(h2_2, &peer, None); + + let h2_3 = connector.reused_http_session(&peer).await.unwrap().unwrap(); + assert_eq!(id, h2_3.conn.id()); + } + + #[tokio::test] + async fn test_h2_multiple_stream() { + let connector = Connector::new(None); + let mut peer = HttpPeer::new(("1.1.1.1", 
443), true, "one.one.one.one".into()); + peer.options.set_http_version(2, 2); + peer.options.max_h2_streams = 3; + let h2 = connector.new_http_session(&peer).await.unwrap(); + let h2_1 = match h2 { + HttpSession::H1(_) => panic!("expect h2"), + HttpSession::H2(h2_stream) => h2_stream, + }; + + let id = h2_1.conn.id(); + + let h2_2 = connector.reused_http_session(&peer).await.unwrap().unwrap(); + assert_eq!(id, h2_2.conn.id()); + let h2_3 = connector.reused_http_session(&peer).await.unwrap().unwrap(); + assert_eq!(id, h2_3.conn.id()); + + // max stream is 3 for now + assert!(connector + .reused_http_session(&peer) + .await + .unwrap() + .is_none()); + + connector.release_http_session(h2_1, &peer, None); + + let h2_4 = connector.reused_http_session(&peer).await.unwrap().unwrap(); + assert_eq!(id, h2_4.conn.id()); + + connector.release_http_session(h2_2, &peer, None); + connector.release_http_session(h2_3, &peer, None); + connector.release_http_session(h2_4, &peer, None); + + // all streams are released, now the connection is idle + let h2_5 = connector.reused_http_session(&peer).await.unwrap().unwrap(); + assert_eq!(id, h2_5.conn.id()); + } +} diff --git a/pingora-core/src/connectors/l4.rs b/pingora-core/src/connectors/l4.rs new file mode 100644 index 0000000..6f0f5fd --- /dev/null +++ b/pingora-core/src/connectors/l4.rs @@ -0,0 +1,313 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use log::debug; +use pingora_error::{Context, Error, ErrorType::*, OrErr, Result}; +use rand::seq::SliceRandom; +use std::net::SocketAddr as InetSocketAddr; + +use crate::protocols::l4::ext::{connect as tcp_connect, connect_uds, set_tcp_keepalive}; +use crate::protocols::l4::socket::SocketAddr; +use crate::protocols::l4::stream::Stream; +use crate::upstreams::peer::Peer; + +/// Establish a connection (l4) to the given peer using its settings and an optional bind address. 
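+///
+/// A minimal usage sketch (not compiled as a doc-test; the address is a
+/// hypothetical example and `bind_to` is left as `None` so the OS picks the
+/// source address):
+/// ```ignore
+/// let peer = BasicPeer::new("127.0.0.1:8080");
+/// let stream = connect(&peer, None).await?;
+/// ```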
+pub async fn connect<P>(peer: &P, bind_to: Option<InetSocketAddr>) -> Result<Stream> +where + P: Peer + Send + Sync, +{ + if peer.get_proxy().is_some() { + return proxy_connect(peer) + .await + .err_context(|| format!("Fail to establish CONNECT proxy: {}", peer)); + } + let mut stream: Stream = match peer.address() { + SocketAddr::Inet(addr) => { + let connect_future = tcp_connect(addr, bind_to.as_ref()); + let conn_res = match peer.connection_timeout() { + Some(t) => pingora_timeout::timeout(t, connect_future) + .await + .explain_err(ConnectTimedout, |_| { + format!("timeout {t:?} connecting to server {peer}") + })?, + None => connect_future.await, + }; + match conn_res { + Ok(socket) => { + debug!("connected to new server: {}", peer.address()); + if let Some(ka) = peer.tcp_keepalive() { + debug!("Setting tcp keepalive"); + set_tcp_keepalive(&socket, ka)?; + } + Ok(socket.into()) + } + Err(e) => { + let c = format!("Fail to connect to {peer}"); + match e.etype() { + SocketError | BindError => Error::e_because(InternalError, c, e), + _ => Err(e.more_context(c)), + } + } + } + } + SocketAddr::Unix(addr) => { + let connect_future = connect_uds( + addr.as_pathname() + .expect("non-pathname unix sockets not supported as peer"), + ); + let conn_res = match peer.connection_timeout() { + Some(t) => pingora_timeout::timeout(t, connect_future) + .await + .explain_err(ConnectTimedout, |_| { + format!("timeout {t:?} connecting to server {peer}") + })?, + None => connect_future.await, + }; + match conn_res { + Ok(socket) => { + debug!("connected to new server: {}", peer.address()); + // no SO_KEEPALIVE for UDS + Ok(socket.into()) + } + Err(e) => { + let c = format!("Fail to connect to {peer}"); + match e.etype() { + SocketError | BindError => Error::e_because(InternalError, c, e), + _ => Err(e.more_context(c)), + } + } + } + } + }?; + let tracer = peer.get_tracer(); + if let Some(t) = tracer { + t.0.on_connected(); + stream.tracer = Some(t); + } + + stream.set_nodelay()?; + Ok(stream) +} + +pub(crate) fn bind_to_random<P: Peer>( + peer: &P, + v4_list: &[InetSocketAddr], + v6_list: &[InetSocketAddr], +) -> Option<InetSocketAddr> { + let selected = peer.get_peer_options().and_then(|o| o.bind_to); + if selected.is_some() { + return selected; + } + + fn bind_to_ips(ips: &[InetSocketAddr]) -> Option<InetSocketAddr> { + match ips.len() { + 0 => None, + 1 => Some(ips[0]), + _ => { + // pick a random bind ip + ips.choose(&mut rand::thread_rng()).copied() + } + } + } + + match peer.address() { + SocketAddr::Inet(sockaddr) => match sockaddr { + InetSocketAddr::V4(_) => bind_to_ips(v4_list), + InetSocketAddr::V6(_) => bind_to_ips(v6_list), + }, + SocketAddr::Unix(_) => None, + } +} + +use crate::protocols::raw_connect; + +async fn proxy_connect<P: Peer>(peer: &P) -> Result<Stream> { + // safe to unwrap + let proxy = peer.get_proxy().unwrap(); + let options = peer.get_peer_options().unwrap(); + + // combine required and optional headers + let mut headers = proxy + .headers + .iter() + .chain(options.extra_proxy_headers.iter()); + + // not likely to timeout during connect() to UDS + let stream: Box<Stream> = Box::new( + connect_uds(&proxy.next_hop) + .await + .or_err_with(ConnectError, || { + format!("CONNECT proxy connect() error to {:?}", &proxy.next_hop) + })? 
+ .into(), + ); + + let req_header = raw_connect::generate_connect_header(&proxy.host, proxy.port, &mut headers)?; + let fut = raw_connect::connect(stream, &req_header); + let (mut stream, digest) = match peer.connection_timeout() { + Some(t) => pingora_timeout::timeout(t, fut) + .await + .explain_err(ConnectTimedout, |_| "establishing CONNECT proxy")?, + None => fut.await, + } + .map_err(|mut e| { + // http protocol may ask to retry if reused client + e.retry.decide_reuse(false); + e + })?; + debug!("CONNECT proxy established: {:?}", proxy); + stream.set_proxy_digest(digest); + let stream = stream.into_any().downcast::<Stream>().unwrap(); // safe, it is Stream from above + Ok(*stream) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::upstreams::peer::{BasicPeer, HttpPeer, Proxy}; + use std::collections::BTreeMap; + use std::path::PathBuf; + use tokio::io::AsyncWriteExt; + use tokio::net::UnixListener; + + #[tokio::test] + async fn test_conn_error_refused() { + let peer = BasicPeer::new("127.0.0.1:79"); // hopefully port 79 is not used + let new_session = connect(&peer, None).await; + assert_eq!(new_session.unwrap_err().etype(), &ConnectRefused) + } + + // TODO broken on arm64 + #[ignore] + #[tokio::test] + async fn test_conn_error_no_route() { + let peer = BasicPeer::new("[::3]:79"); // no route + let new_session = connect(&peer, None).await; + assert_eq!(new_session.unwrap_err().etype(), &ConnectNoRoute) + } + + #[tokio::test] + async fn test_conn_error_addr_not_avail() { + let peer = HttpPeer::new("127.0.0.1:121".to_string(), false, "".to_string()); + let new_session = connect(&peer, Some("192.0.2.2:0".parse().unwrap())).await; + assert_eq!(new_session.unwrap_err().etype(), &InternalError) + } + + #[tokio::test] + async fn test_conn_error_other() { + let peer = HttpPeer::new("240.0.0.1:80".to_string(), false, "".to_string()); // non localhost + + // create an error: cannot send from src addr: localhost to dst addr: a public IP + let new_session = connect(&peer, Some("127.0.0.1:0".parse().unwrap())).await; + let error = new_session.unwrap_err(); + // XXX: some system will allow the socket to bind and connect without error, only to timeout + assert!(error.etype() == &ConnectError || error.etype() == &ConnectTimedout) + } + + #[tokio::test] + async fn test_conn_timeout() { + // 192.0.2.1 is effectively a blackhole + let mut peer = BasicPeer::new("192.0.2.1:79"); + peer.options.connection_timeout = Some(std::time::Duration::from_millis(1)); //1ms + let new_session = connect(&peer, None).await; + assert_eq!(new_session.unwrap_err().etype(), &ConnectTimedout) + } + + #[tokio::test] + async fn test_connect_proxy_fail() { + let mut peer = HttpPeer::new("1.1.1.1:80".to_string(), false, "".to_string()); + let mut path = PathBuf::new(); + path.push("/tmp/123"); + peer.proxy = Some(Proxy { + next_hop: path.into(), + host: "1.1.1.1".into(), + port: 80, + headers: BTreeMap::new(), + }); + let new_session = connect(&peer, None).await; + let e = new_session.unwrap_err(); + assert_eq!(e.etype(), &ConnectError); + assert!(!e.retry()); + } + + const MOCK_UDS_PATH: &str = "/tmp/test_unix_connect_proxy.sock"; + + // one-off mock server + async fn mock_connect_server() { + let _ = std::fs::remove_file(MOCK_UDS_PATH); + let listener = UnixListener::bind(MOCK_UDS_PATH).unwrap(); + if let Ok((mut stream, _addr)) = listener.accept().await { + stream.write_all(b"HTTP/1.1 200 OK\r\n\r\n").await.unwrap(); + // wait a bit so that the client can read + 
tokio::time::sleep(std::time::Duration::from_millis(100)).await; + } + let _ = std::fs::remove_file(MOCK_UDS_PATH); + } + + #[tokio::test(flavor = "multi_thread")] + async fn test_connect_proxy_work() { + tokio::spawn(async { + mock_connect_server().await; + }); + // wait for the server to start + tokio::time::sleep(std::time::Duration::from_millis(100)).await; + let mut peer = HttpPeer::new("1.1.1.1:80".to_string(), false, "".to_string()); + let mut path = PathBuf::new(); + path.push(MOCK_UDS_PATH); + peer.proxy = Some(Proxy { + next_hop: path.into(), + host: "1.1.1.1".into(), + port: 80, + headers: BTreeMap::new(), + }); + let new_session = connect(&peer, None).await; + assert!(new_session.is_ok()); + } + + const MOCK_BAD_UDS_PATH: &str = "/tmp/test_unix_bad_connect_proxy.sock"; + + // one-off mock bad proxy + // closes connection upon accepting + async fn mock_connect_bad_server() { + let _ = std::fs::remove_file(MOCK_BAD_UDS_PATH); + let listener = UnixListener::bind(MOCK_BAD_UDS_PATH).unwrap(); + if let Ok((mut stream, _addr)) = listener.accept().await { + stream.shutdown().await.unwrap(); + tokio::time::sleep(std::time::Duration::from_millis(100)).await; + } + let _ = std::fs::remove_file(MOCK_BAD_UDS_PATH); + } + + #[tokio::test(flavor = "multi_thread")] + async fn test_connect_proxy_conn_closed() { + tokio::spawn(async { + mock_connect_bad_server().await; + }); + // wait for the server to start + tokio::time::sleep(std::time::Duration::from_millis(100)).await; + let mut peer = HttpPeer::new("1.1.1.1:80".to_string(), false, "".to_string()); + let mut path = PathBuf::new(); + path.push(MOCK_BAD_UDS_PATH); + peer.proxy = Some(Proxy { + next_hop: path.into(), + host: "1.1.1.1".into(), + port: 80, + headers: BTreeMap::new(), + }); + let new_session = connect(&peer, None).await; + let err = new_session.unwrap_err(); + assert_eq!(err.etype(), &ConnectionClosed); + assert!(!err.retry()); + } +} diff --git a/pingora-core/src/connectors/mod.rs b/pingora-core/src/connectors/mod.rs new file mode 100644 index 0000000..ad9fbc4 --- /dev/null +++ b/pingora-core/src/connectors/mod.rs @@ -0,0 +1,477 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Connecting to servers + +pub mod http; +mod l4; +mod offload; +mod tls; + +use crate::protocols::Stream; +use crate::server::configuration::ServerConf; +use crate::tls::ssl::SslConnector; +use crate::upstreams::peer::{Peer, ALPN}; + +use l4::connect as l4_connect; +use log::{debug, error, warn}; +use offload::OffloadRuntime; +use parking_lot::RwLock; +use pingora_error::{Error, ErrorType::*, OrErr, Result}; +use pingora_pool::{ConnectionMeta, ConnectionPool}; +use std::collections::HashMap; +use std::net::SocketAddr; +use std::sync::Arc; +use tokio::sync::Mutex; + +/// The options to configure a [TransportConnector] +#[derive(Clone)] +pub struct ConnectorOptions { + /// Path to the CA file used to validate server certs. 
+    ///
+    /// If `None`, the CA in the [default](https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_default_verify_paths.html)
+    /// locations will be loaded
+    pub ca_file: Option<String>,
+    /// The default client cert and key to use for mTLS
+    ///
+    /// Each individual connection can use its own cert key to override this.
+    pub cert_key_file: Option<(String, String)>,
+    /// How many connections to keep alive
+    pub keepalive_pool_size: usize,
+    /// Optionally offload the connection establishment to dedicated thread pools
+    ///
+    /// TCP and TLS connection establishment can be CPU intensive. Sometimes such tasks can slow
+    /// down the entire service, causing timeouts, which lead to more connections, which
+    /// snowballs the issue. Use this option to isolate these CPU intensive tasks from impacting
+    /// other traffic.
+    ///
+    /// Syntax: (#pools, #threads in each pool)
+    pub offload_threadpool: Option<(usize, usize)>,
+    /// Bind to any of the given source IPv4 addresses
+    pub bind_to_v4: Vec<SocketAddr>,
+    /// Bind to any of the given source IPv6 addresses
+    pub bind_to_v6: Vec<SocketAddr>,
+}
+
+impl ConnectorOptions {
+    /// Derive the [ConnectorOptions] from a [ServerConf]
+    pub fn from_server_conf(server_conf: &ServerConf) -> Self {
+        // if both pools and threads are Some(>0)
+        let offload_threadpool = server_conf
+            .upstream_connect_offload_threadpools
+            .zip(server_conf.upstream_connect_offload_thread_per_pool)
+            .filter(|(pools, threads)| *pools > 0 && *threads > 0);
+
+        // create SocketAddrs with port 0 for src addr bind
+
+        let bind_to_v4 = server_conf
+            .client_bind_to_ipv4
+            .iter()
+            .map(|v4| {
+                let ip = v4.parse().unwrap();
+                SocketAddr::new(ip, 0)
+            })
+            .collect();
+
+        let bind_to_v6 = server_conf
+            .client_bind_to_ipv6
+            .iter()
+            .map(|v6| {
+                let ip = v6.parse().unwrap();
+                SocketAddr::new(ip, 0)
+            })
+            .collect();
+        ConnectorOptions {
+            ca_file: server_conf.ca_file.clone(),
+            cert_key_file: None, // TODO: use it
+            keepalive_pool_size: server_conf.upstream_keepalive_pool_size,
+            offload_threadpool,
+            bind_to_v4,
+            bind_to_v6,
+        }
+    }
+
+    /// Create a new [ConnectorOptions] with the given keepalive pool size
+    pub fn new(keepalive_pool_size: usize) -> Self {
+        ConnectorOptions {
+            ca_file: None,
+            cert_key_file: None,
+            keepalive_pool_size,
+            offload_threadpool: None,
+            bind_to_v4: vec![],
+            bind_to_v6: vec![],
+        }
+    }
+}
+
+/// [TransportConnector] provides APIs to connect to servers via TCP or TLS with connection reuse
+pub struct TransportConnector {
+    tls_ctx: tls::Connector,
+    connection_pool: Arc<ConnectionPool<Arc<Mutex<Stream>>>>,
+    offload: Option<OffloadRuntime>,
+    bind_to_v4: Vec<SocketAddr>,
+    bind_to_v6: Vec<SocketAddr>,
+    preferred_http_version: PreferredHttpVersion,
+}
+
+const DEFAULT_POOL_SIZE: usize = 128;
+
+impl TransportConnector {
+    /// Create a new [TransportConnector] with the given [ConnectorOptions]
+    pub fn new(mut options: Option<ConnectorOptions>) -> Self {
+        let pool_size = options
+            .as_ref()
+            .map_or(DEFAULT_POOL_SIZE, |c| c.keepalive_pool_size);
+        // Take the offloading setting here because this layer implements offloading,
+        // so there is no need for stacks at a lower layer to offload again.
+ let offload = options.as_mut().and_then(|o| o.offload_threadpool.take()); + let bind_to_v4 = options + .as_ref() + .map_or_else(Vec::new, |o| o.bind_to_v4.clone()); + let bind_to_v6 = options + .as_ref() + .map_or_else(Vec::new, |o| o.bind_to_v6.clone()); + TransportConnector { + tls_ctx: tls::Connector::new(options), + connection_pool: Arc::new(ConnectionPool::new(pool_size)), + offload: offload.map(|v| OffloadRuntime::new(v.0, v.1)), + bind_to_v4, + bind_to_v6, + preferred_http_version: PreferredHttpVersion::new(), + } + } + + /// Connect to the given server [Peer] + /// + /// No connection is reused. + pub async fn new_stream<P: Peer + Send + Sync + 'static>(&self, peer: &P) -> Result<Stream> { + let rt = self + .offload + .as_ref() + .map(|o| o.get_runtime(peer.reuse_hash())); + let bind_to = l4::bind_to_random(peer, &self.bind_to_v4, &self.bind_to_v6); + let alpn_override = self.preferred_http_version.get(peer); + let stream = if let Some(rt) = rt { + let peer = peer.clone(); + let tls_ctx = self.tls_ctx.clone(); + rt.spawn(async move { do_connect(&peer, bind_to, alpn_override, &tls_ctx.ctx).await }) + .await + .or_err(InternalError, "offload runtime failure")?? + } else { + do_connect(peer, bind_to, alpn_override, &self.tls_ctx.ctx).await? + }; + + Ok(stream) + } + + /// Try to find a reusable connection to the given server [Peer] + pub async fn reused_stream<P: Peer + Send + Sync>(&self, peer: &P) -> Option<Stream> { + match self.connection_pool.get(&peer.reuse_hash()) { + Some(s) => { + debug!("find reusable stream, trying to acquire it"); + { + let _ = s.lock().await; + } // wait for the idle poll to release it + match Arc::try_unwrap(s) { + Ok(l) => { + let mut stream = l.into_inner(); + // test_reusable_stream: we assume server would never actively send data + // first on an idle stream. + if peer.matches_fd(stream.id()) && test_reusable_stream(&mut stream) { + Some(stream) + } else { + None + } + } + Err(_) => { + error!("failed to acquire reusable stream"); + None + } + } + } + None => { + debug!("No reusable connection found for {peer}"); + None + } + } + } + + /// Return the [Stream] to the [TransportConnector] for connection reuse. + /// + /// Not all TCP/TLS connection can be reused. It is the caller's responsibility to make sure + /// that protocol over the [Stream] supports connection reuse and the [Stream] itself is ready + /// to be reused. + /// + /// If a [Stream] is dropped instead of being returned via this function. it will be closed. + pub fn release_stream( + &self, + mut stream: Stream, + key: u64, // usually peer.reuse_hash() + idle_timeout: Option<std::time::Duration>, + ) { + if !test_reusable_stream(&mut stream) { + return; + } + let id = stream.id(); + let meta = ConnectionMeta::new(key, id); + debug!("Try to keepalive client session"); + let stream = Arc::new(Mutex::new(stream)); + let locked_stream = stream.clone().try_lock_owned().unwrap(); // safe as we just created it + let (notify_close, watch_use) = self.connection_pool.put(&meta, stream); + let pool = self.connection_pool.clone(); //clone the arc + let rt = pingora_runtime::current_handle(); + rt.spawn(async move { + pool.idle_poll(locked_stream, &meta, idle_timeout, notify_close, watch_use) + .await; + }); + } + + /// Get a stream to the given server [Peer] + /// + /// This function will try to find a reusable [Stream] first. If there is none, a new connection + /// will be made to the server. + /// + /// The returned boolean will indicate whether the stream is reused. 
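+    ///
+    /// A minimal sketch (not compiled as a doc-test; it uses `BasicPeer` just
+    /// like the tests below):
+    /// ```ignore
+    /// let connector = TransportConnector::new(None);
+    /// let peer = BasicPeer::new("1.1.1.1:80");
+    /// let (stream, reused) = connector.get_stream(&peer).await?;
+    /// // ... use the stream, then hand it back for possible reuse ...
+    /// connector.release_stream(stream, peer.reuse_hash(), None);
+    /// ```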
+ pub async fn get_stream<P: Peer + Send + Sync + 'static>( + &self, + peer: &P, + ) -> Result<(Stream, bool)> { + let reused_stream = self.reused_stream(peer).await; + if let Some(s) = reused_stream { + Ok((s, true)) + } else { + let s = self.new_stream(peer).await?; + Ok((s, false)) + } + } + + /// Tell the connector to always send h1 for ALPN for the given peer in the future. + pub fn prefer_h1(&self, peer: &impl Peer) { + self.preferred_http_version.add(peer, 1); + } +} + +// Perform the actual L4 and tls connection steps while respecting the peer's +// connection timeout if there one +async fn do_connect<P: Peer + Send + Sync>( + peer: &P, + bind_to: Option<SocketAddr>, + alpn_override: Option<ALPN>, + tls_ctx: &SslConnector, +) -> Result<Stream> { + // Create the future that does the connections, but don't evaluate it until + // we decide if we need a timeout or not + let connect_future = do_connect_inner(peer, bind_to, alpn_override, tls_ctx); + + match peer.total_connection_timeout() { + Some(t) => match pingora_timeout::timeout(t, connect_future).await { + Ok(res) => res, + Err(_) => Error::e_explain( + ConnectTimedout, + format!("connecting to server {peer}, total-connection timeout {t:?}"), + ), + }, + None => connect_future.await, + } +} + +// Perform the actual L4 and tls connection steps with no timeout +async fn do_connect_inner<P: Peer + Send + Sync>( + peer: &P, + bind_to: Option<SocketAddr>, + alpn_override: Option<ALPN>, + tls_ctx: &SslConnector, +) -> Result<Stream> { + let stream = l4_connect(peer, bind_to).await?; + if peer.tls() { + let tls_stream = tls::connect(stream, peer, alpn_override, tls_ctx).await?; + Ok(Box::new(tls_stream)) + } else { + Ok(Box::new(stream)) + } +} + +struct PreferredHttpVersion { + // TODO: shard to avoid the global lock + versions: RwLock<HashMap<u64, u8>>, // <hash of peer, version> +} + +// TODO: limit the size of this + +impl PreferredHttpVersion { + pub fn new() -> Self { + PreferredHttpVersion { + versions: RwLock::default(), + } + } + + pub fn add(&self, peer: &impl Peer, version: u8) { + let key = peer.reuse_hash(); + let mut v = self.versions.write(); + v.insert(key, version); + } + + pub fn get(&self, peer: &impl Peer) -> Option<ALPN> { + let key = peer.reuse_hash(); + let v = self.versions.read(); + v.get(&key) + .copied() + .map(|v| if v == 1 { ALPN::H1 } else { ALPN::H2H1 }) + } +} + +use futures::future::FutureExt; +use tokio::io::AsyncReadExt; + +/// Test whether a stream is already closed or not reusable (server sent unexpected data) +fn test_reusable_stream(stream: &mut Stream) -> bool { + let mut buf = [0; 1]; + let result = stream.read(&mut buf[..]).now_or_never(); + if let Some(data_result) = result { + match data_result { + Ok(n) => { + if n == 0 { + debug!("Idle connection is closed"); + } else { + warn!("Unexpected data read in idle connection"); + } + } + Err(e) => { + debug!("Idle connection is broken: {e:?}"); + } + } + false + } else { + true + } +} + +#[cfg(test)] +mod tests { + use pingora_error::ErrorType; + use pingora_openssl::ssl::SslMethod; + + use super::*; + use crate::upstreams::peer::BasicPeer; + + // 192.0.2.1 is effectively a black hole + const BLACK_HOLE: &str = "192.0.2.1:79"; + + #[tokio::test] + async fn test_connect() { + let connector = TransportConnector::new(None); + let peer = BasicPeer::new("1.1.1.1:80"); + // make a new connection to 1.1.1.1 + let stream = connector.new_stream(&peer).await.unwrap(); + connector.release_stream(stream, peer.reuse_hash(), None); + + let (_, reused) = 
+        assert!(reused);
+    }
+
+    #[tokio::test]
+    async fn test_connect_tls() {
+        let connector = TransportConnector::new(None);
+        let mut peer = BasicPeer::new("1.1.1.1:443");
+        // BasicPeer will use TLS when SNI is set
+        peer.sni = "one.one.one.one".to_string();
+        // make a new connection to https://1.1.1.1
+        let stream = connector.new_stream(&peer).await.unwrap();
+        connector.release_stream(stream, peer.reuse_hash(), None);
+
+        let (_, reused) = connector.get_stream(&peer).await.unwrap();
+        assert!(reused);
+    }
+
+    async fn do_test_conn_timeout(conf: Option<ConnectorOptions>) {
+        let connector = TransportConnector::new(conf);
+        let mut peer = BasicPeer::new(BLACK_HOLE);
+        peer.options.connection_timeout = Some(std::time::Duration::from_millis(1));
+        let stream = connector.new_stream(&peer).await;
+        match stream {
+            Ok(_) => panic!("should throw an error"),
+            Err(e) => assert_eq!(e.etype(), &ConnectTimedout),
+        }
+    }
+
+    #[tokio::test]
+    async fn test_conn_timeout() {
+        do_test_conn_timeout(None).await;
+    }
+
+    #[tokio::test]
+    async fn test_conn_timeout_with_offload() {
+        let mut conf = ConnectorOptions::new(8);
+        conf.offload_threadpool = Some((2, 2));
+        do_test_conn_timeout(Some(conf)).await;
+    }
+
+    #[tokio::test]
+    async fn test_connector_bind_to() {
+        // connecting to a remote address while bound to localhost will fail
+        let peer = BasicPeer::new("240.0.0.1:80");
+        let mut conf = ConnectorOptions::new(1);
+        conf.bind_to_v4.push("127.0.0.1:0".parse().unwrap());
+        let connector = TransportConnector::new(Some(conf));
+
+        let stream = connector.new_stream(&peer).await;
+        let error = stream.unwrap_err();
+        // XXX: some systems will allow the socket to bind and connect without error, only to time out
+        assert!(error.etype() == &ConnectError || error.etype() == &ConnectTimedout)
+    }
+
+    /// Helper function for testing error handling in the `do_connect` function.
+    /// This assumes that the connection will fail on the peer and returns
+    /// the decomposed error type and message
+    async fn get_do_connect_failure_with_peer(peer: &BasicPeer) -> (ErrorType, String) {
+        let ssl_connector = SslConnector::builder(SslMethod::tls()).unwrap().build();
+        let stream = do_connect(peer, None, None, &ssl_connector).await;
+        match stream {
+            Ok(_) => panic!("should throw an error"),
+            Err(e) => (
+                e.etype().clone(),
+                e.context
+                    .as_ref()
+                    .map(|ctx| ctx.as_str().to_owned())
+                    .unwrap_or_default(),
+            ),
+        }
+    }
+
+    #[tokio::test]
+    async fn test_do_connect_with_total_timeout() {
+        let mut peer = BasicPeer::new(BLACK_HOLE);
+        peer.options.total_connection_timeout = Some(std::time::Duration::from_millis(1));
+        let (etype, context) = get_do_connect_failure_with_peer(&peer).await;
+        assert_eq!(etype, ConnectTimedout);
+        assert!(context.contains("total-connection timeout"));
+    }
+
+    #[tokio::test]
+    async fn test_tls_connect_timeout_supersedes_total() {
+        let mut peer = BasicPeer::new(BLACK_HOLE);
+        peer.options.total_connection_timeout = Some(std::time::Duration::from_millis(10));
+        peer.options.connection_timeout = Some(std::time::Duration::from_millis(1));
+        let (etype, context) = get_do_connect_failure_with_peer(&peer).await;
+        assert_eq!(etype, ConnectTimedout);
+        assert!(!context.contains("total-connection timeout"));
+    }
+
+    #[tokio::test]
+    async fn test_do_connect_without_total_timeout() {
+        let peer = BasicPeer::new(BLACK_HOLE);
+        let (etype, context) = get_do_connect_failure_with_peer(&peer).await;
+        assert!(etype != ConnectTimedout || !context.contains("total-connection timeout"));
+    }
+}
diff --git a/pingora-core/src/connectors/offload.rs b/pingora-core/src/connectors/offload.rs
new file mode 100644
index 0000000..17334b3
--- /dev/null
+++ b/pingora-core/src/connectors/offload.rs
@@ -0,0 +1,77 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use log::debug;
+use once_cell::sync::OnceCell;
+use rand::Rng;
+use tokio::runtime::{Builder, Handle};
+use tokio::sync::oneshot::{channel, Sender};
+
+// TODO: use pingora_runtime
+// a shared runtime (thread pools)
+pub(crate) struct OffloadRuntime {
+    shards: usize,
+    thread_per_shard: usize,
+    // Lazily init the runtimes so that they are created after pingora
+    // daemonizes itself. Otherwise the runtime threads are lost.
+    pools: OnceCell<Box<[(Handle, Sender<()>)]>>,
+}
+
+impl OffloadRuntime {
+    pub fn new(shards: usize, thread_per_shard: usize) -> Self {
+        assert!(shards != 0);
+        assert!(thread_per_shard != 0);
+        OffloadRuntime {
+            shards,
+            thread_per_shard,
+            pools: OnceCell::new(),
+        }
+    }
+
+    fn init_pools(&self) -> Box<[(Handle, Sender<()>)]> {
+        let threads = self.shards * self.thread_per_shard;
+        let mut pools = Vec::with_capacity(threads);
+        for _ in 0..threads {
+            // We use single thread runtimes to reduce the scheduling overhead of the multithread
+            // tokio runtime, which can account for 50% of the runtimes' on-CPU time
+            let rt = Builder::new_current_thread().enable_all().build().unwrap();
+            let handler = rt.handle().clone();
+            let (tx, rx) = channel::<()>();
+            std::thread::Builder::new()
+                .name("Offload thread".to_string())
+                .spawn(move || {
+                    debug!("Offload thread started");
+                    // the thread that calls block_on() will drive the runtime
+                    // rx will return when tx is dropped so this runtime and thread will exit
+                    rt.block_on(rx)
+                })
+                .unwrap();
+            pools.push((handler, tx));
+        }
+
+        pools.into_boxed_slice()
+    }
+
+    pub fn get_runtime(&self, hash: u64) -> &Handle {
+        let mut rng = rand::thread_rng();
+
+        // choose a shard based on the hash and a random thread within that shard
+        // e.g. say thread_per_shard=2, shard 1 thread 1 is 1 * 2 + 1 = 3
+        // [[th0, th1], [th2, th3], ...]
+        let shard = hash as usize % self.shards;
+        let thread_in_shard = rng.gen_range(0..self.thread_per_shard);
+        let pools = self.pools.get_or_init(|| self.init_pools());
+        &pools[shard * self.thread_per_shard + thread_in_shard].0
+    }
+}
diff --git a/pingora-core/src/connectors/tls.rs b/pingora-core/src/connectors/tls.rs
new file mode 100644
index 0000000..e8eb37e
--- /dev/null
+++ b/pingora-core/src/connectors/tls.rs
@@ -0,0 +1,309 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+ +use log::debug; +use pingora_error::{Error, ErrorType::*, OrErr, Result}; +use std::sync::{Arc, Once}; + +use super::ConnectorOptions; +use crate::protocols::ssl::client::handshake; +use crate::protocols::ssl::SslStream; +use crate::protocols::IO; +use crate::tls::ext::{ + add_host, clear_error_stack, ssl_add_chain_cert, ssl_set_groups_list, + ssl_set_renegotiate_mode_freely, ssl_set_verify_cert_store, ssl_use_certificate, + ssl_use_private_key, ssl_use_second_key_share, +}; +#[cfg(feature = "boringssl")] +use crate::tls::ssl::SslCurve; +use crate::tls::ssl::{SslConnector, SslFiletype, SslMethod, SslVerifyMode, SslVersion}; +use crate::tls::x509::store::X509StoreBuilder; +use crate::upstreams::peer::{Peer, ALPN}; + +const CIPHER_LIST: &str = "AES-128-GCM-SHA256\ + :AES-256-GCM-SHA384\ + :CHACHA20-POLY1305-SHA256\ + :ECDHE-ECDSA-AES128-GCM-SHA256\ + :ECDHE-ECDSA-AES256-GCM-SHA384\ + :ECDHE-RSA-AES128-GCM-SHA256\ + :ECDHE-RSA-AES256-GCM-SHA384\ + :ECDHE-RSA-AES128-SHA\ + :ECDHE-RSA-AES256-SHA384\ + :AES128-GCM-SHA256\ + :AES256-GCM-SHA384\ + :AES128-SHA\ + :AES256-SHA\ + :DES-CBC3-SHA"; + +/** + * Enabled signature algorithms for signing/verification (ECDSA). + * As of 4/10/2023, the only addition to boringssl's defaults is ECDSA_SECP521R1_SHA512. + */ +const SIGALG_LIST: &str = "ECDSA_SECP256R1_SHA256\ + :RSA_PSS_RSAE_SHA256\ + :RSA_PKCS1_SHA256\ + :ECDSA_SECP384R1_SHA384\ + :RSA_PSS_RSAE_SHA384\ + :RSA_PKCS1_SHA384\ + :RSA_PSS_RSAE_SHA512\ + :RSA_PKCS1_SHA512\ + :RSA_PKCS1_SHA1\ + :ECDSA_SECP521R1_SHA512"; +/** + * Enabled curves for ECDHE (signature key exchange). + * As of 4/10/2023, the only addition to boringssl's defaults is SECP521R1. + * + * N.B. The ordering of these curves is important. The boringssl library will select the first one + * as a guess when negotiating a handshake with a server using TLSv1.3. We should opt for curves + * that are both computationally cheaper and more supported. + */ +#[cfg(feature = "boringssl")] +const BORINGSSL_CURVE_LIST: &[SslCurve] = &[ + SslCurve::X25519, + SslCurve::SECP256R1, + SslCurve::SECP384R1, + SslCurve::SECP521R1, +]; + +static INIT_CA_ENV: Once = Once::new(); +fn init_ssl_cert_env_vars() { + // this sets env vars to pick up the root certs + // it is universal across openssl and boringssl + INIT_CA_ENV.call_once(openssl_probe::init_ssl_cert_env_vars); +} + +#[derive(Clone)] +pub struct Connector { + pub(crate) ctx: Arc<SslConnector>, // Arc to support clone +} + +impl Connector { + pub fn new(options: Option<ConnectorOptions>) -> Self { + let mut builder = SslConnector::builder(SslMethod::tls()).unwrap(); + // TODO: make these conf + // Set supported ciphers. + builder.set_cipher_list(CIPHER_LIST).unwrap(); + // Set supported signature algorithms and ECDH (key exchange) curves. + builder + .set_sigalgs_list(&SIGALG_LIST.to_lowercase()) + .unwrap(); + #[cfg(feature = "boringssl")] + builder.set_curves(BORINGSSL_CURVE_LIST).unwrap(); + builder + .set_max_proto_version(Some(SslVersion::TLS1_3)) + .unwrap(); + builder + .set_min_proto_version(Some(SslVersion::TLS1)) + .unwrap(); + if let Some(conf) = options.as_ref() { + if let Some(ca_file_path) = conf.ca_file.as_ref() { + builder.set_ca_file(ca_file_path).unwrap(); + } else { + init_ssl_cert_env_vars(); + // load from default system wide trust location. 
+                // (the name is misleading)
+                builder.set_default_verify_paths().unwrap();
+            }
+            if let Some((cert, key)) = conf.cert_key_file.as_ref() {
+                builder
+                    .set_certificate_file(cert, SslFiletype::PEM)
+                    .unwrap();
+
+                builder.set_private_key_file(key, SslFiletype::PEM).unwrap();
+            }
+        } else {
+            init_ssl_cert_env_vars();
+            builder.set_default_verify_paths().unwrap();
+        }
+
+        Connector {
+            ctx: Arc::new(builder.build()),
+        }
+    }
+}
+
+/*
+    OpenSSL considers underscores in hostnames non-compliant.
+    We replace the underscore in the leftmost label as we must support these
+    hostnames for wildcard matches and we have not patched OpenSSL.
+
+    https://github.com/openssl/openssl/issues/12566
+
+    > The labels must follow the rules for ARPANET host names. They must
+    > start with a letter, end with a letter or digit, and have as interior
+    > characters only letters, digits, and hyphen. There are also some
+    > restrictions on the length. Labels must be 63 characters or less.
+    - https://datatracker.ietf.org/doc/html/rfc1034#section-3.5
+*/
+fn replace_leftmost_underscore(sni: &str) -> Option<String> {
+    // wildcard is only leftmost label
+    let mut s = sni.splitn(2, '.');
+    if let (Some(leftmost), Some(rest)) = (s.next(), s.next()) {
+        // if not a subdomain or leftmost does not contain underscore return
+        if !rest.contains('.') || !leftmost.contains('_') {
+            return None;
+        }
+        // we have a subdomain, replace underscores
+        let leftmost = leftmost.replace('_', "-");
+        return Some(format!("{leftmost}.{rest}"));
+    }
+    None
+}
+
+pub(crate) async fn connect<T, P>(
+    stream: T,
+    peer: &P,
+    alpn_override: Option<ALPN>,
+    tls_ctx: &SslConnector,
+) -> Result<SslStream<T>>
+where
+    T: IO,
+    P: Peer + Send + Sync,
+{
+    let mut ssl_conf = tls_ctx.configure().unwrap();
+
+    ssl_set_renegotiate_mode_freely(&mut ssl_conf);
+
+    // Set up CA/verify cert store
+    // TODO: store X509Store in the peer directly
+    if let Some(ca_list) = peer.get_ca() {
+        let mut store_builder = X509StoreBuilder::new().unwrap();
+        for ca in &***ca_list {
+            store_builder.add_cert(ca.clone()).unwrap();
+        }
+        ssl_set_verify_cert_store(&mut ssl_conf, &store_builder.build())
+            .or_err(InternalError, "failed to load cert store")?;
+    }
+
+    // Set up client cert/key
+    if let Some(key_pair) = peer.get_client_cert_key() {
+        debug!("setting client cert and key");
+        ssl_use_certificate(&mut ssl_conf, key_pair.leaf())
+            .or_err(InternalError, "invalid client cert")?;
+        ssl_use_private_key(&mut ssl_conf, key_pair.key())
+            .or_err(InternalError, "invalid client key")?;
+
+        let intermediates = key_pair.intermediates();
+        if !intermediates.is_empty() {
+            debug!("adding intermediate certificates for mTLS chain");
+            for int in intermediates {
+                ssl_add_chain_cert(&mut ssl_conf, int)
+                    .or_err(InternalError, "invalid intermediate client cert")?;
+            }
+        }
+    }
+
+    if let Some(curve) = peer.get_peer_options().and_then(|o| o.curves) {
+        ssl_set_groups_list(&mut ssl_conf, curve).or_err(InternalError, "invalid curves")?;
+    }
+
+    // second_keyshare is default true
+    if !peer.get_peer_options().map_or(true, |o| o.second_keyshare) {
+        ssl_use_second_key_share(&mut ssl_conf, false);
+    }
+
+    // disable verification if sni does not exist
+    // XXX: verifying on an empty string causes a null string segfault
+    if peer.sni().is_empty() {
+        ssl_conf.set_use_server_name_indication(false);
+        /* NOTE: technically we can still verify who signs the cert but turn it off to be
+        consistent with nginx's behavior */
+        ssl_conf.set_verify(SslVerifyMode::NONE);
+    } else if peer.verify_cert() {
+        if
peer.verify_hostname() { + let verify_param = ssl_conf.param_mut(); + add_host(verify_param, peer.sni()).or_err(InternalError, "failed to add host")?; + // if sni had underscores in leftmost label replace and add + if let Some(sni_s) = replace_leftmost_underscore(peer.sni()) { + add_host(verify_param, sni_s.as_ref()).unwrap(); + } + if let Some(alt_cn) = peer.alternative_cn() { + if !alt_cn.is_empty() { + add_host(verify_param, alt_cn).unwrap(); + // if alt_cn had underscores in leftmost label replace and add + if let Some(alt_cn_s) = replace_leftmost_underscore(alt_cn) { + add_host(verify_param, alt_cn_s.as_ref()).unwrap(); + } + } + } + } + ssl_conf.set_verify(SslVerifyMode::PEER); + } else { + ssl_conf.set_verify(SslVerifyMode::NONE); + } + + /* + We always set set_verify_hostname(false) here because: + - verify case.) otherwise ssl.connect calls X509_VERIFY_PARAM_set1_host + which overrides the names added by add_host. Verify is + essentially on as long as the names are added. + - off case.) the non verify hostname case should have it disabled + */ + ssl_conf.set_verify_hostname(false); + + if let Some(alpn) = alpn_override.as_ref().or(peer.get_alpn()) { + ssl_conf.set_alpn_protos(alpn.to_wire_preference()).unwrap(); + } + + clear_error_stack(); + let connect_future = handshake(ssl_conf, peer.sni(), stream); + + match peer.connection_timeout() { + Some(t) => match pingora_timeout::timeout(t, connect_future).await { + Ok(res) => res, + Err(_) => Error::e_explain( + ConnectTimedout, + format!("connecting to server {}, timeout {:?}", peer, t), + ), + }, + None => connect_future.await, + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_replace_leftmost_underscore() { + let none_cases = [ + "", + "some", + "some.com", + "1.1.1.1:5050", + "dog.dot.com", + "dog.d_t.com", + "dog.dot.c_m", + "d_g.com", + "_", + "dog.c_m", + ]; + + for case in none_cases { + assert!(replace_leftmost_underscore(case).is_none(), "{}", case); + } + + assert_eq!( + Some("bb-b.some.com".to_string()), + replace_leftmost_underscore("bb_b.some.com") + ); + assert_eq!( + Some("a-a-a.some.com".to_string()), + replace_leftmost_underscore("a_a_a.some.com") + ); + assert_eq!( + Some("-.some.com".to_string()), + replace_leftmost_underscore("_.some.com") + ); + } +} diff --git a/pingora-core/src/lib.rs b/pingora-core/src/lib.rs new file mode 100644 index 0000000..cdeff85 --- /dev/null +++ b/pingora-core/src/lib.rs @@ -0,0 +1,69 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#![warn(clippy::all)] +#![allow(clippy::new_without_default)] +#![allow(clippy::type_complexity)] +#![allow(clippy::match_wild_err_arm)] +#![allow(clippy::missing_safety_doc)] +#![allow(clippy::upper_case_acronyms)] +// enable nightly feature async trait so that the docs are cleaner +#![cfg_attr(doc_async_trait, feature(async_fn_in_trait))] + +//! # Pingora +//! +//! Pingora is a collection of service frameworks and network libraries battle-tested by the Internet. +//! 
It is used to build robust, scalable and secure network infrastructures and services at Internet scale.
+//!
+//! # Features
+//! - HTTP 1.x and HTTP 2
+//! - Modern TLS with OpenSSL or BoringSSL (FIPS compatible)
+//! - Zero downtime upgrade
+//!
+//! # Usage
+//! This crate provides low-level service and protocol implementations and abstractions.
+//!
+//! If you are looking to build a (reverse) proxy, see the `pingora-proxy` crate.
+//!
+//! # Optional features
+//! `boringssl`: Switch the internal TLS library from OpenSSL to BoringSSL.
+
+pub mod apps;
+pub mod connectors;
+pub mod listeners;
+pub mod modules;
+pub mod protocols;
+pub mod server;
+pub mod services;
+pub mod upstreams;
+pub mod utils;
+
+pub use pingora_error::{ErrorType::*, *};
+
+// If both openssl and boringssl are enabled, prefer boringssl.
+// This is to make sure that boringssl can override the default openssl feature
+// when this crate is used indirectly by other crates.
+#[cfg(feature = "boringssl")]
+pub use pingora_boringssl as tls;
+
+#[cfg(all(not(feature = "boringssl"), feature = "openssl"))]
+pub use pingora_openssl as tls;
+
+pub mod prelude {
+    pub use crate::server::configuration::Opt;
+    pub use crate::server::Server;
+    pub use crate::services::background::background_service;
+    pub use crate::upstreams::peer::HttpPeer;
+    pub use pingora_error::{ErrorType::*, *};
+}
diff --git a/pingora-core/src/listeners/l4.rs b/pingora-core/src/listeners/l4.rs
new file mode 100644
index 0000000..1bec6c6
--- /dev/null
+++ b/pingora-core/src/listeners/l4.rs
@@ -0,0 +1,311 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use log::warn;
+use pingora_error::{
+    ErrorType::{AcceptError, BindError},
+    OrErr, Result,
+};
+use std::fs::Permissions;
+use std::io::ErrorKind;
+use std::net::{SocketAddr, ToSocketAddrs};
+use std::os::unix::io::{AsRawFd, FromRawFd};
+use std::os::unix::net::UnixListener as StdUnixListener;
+use std::time::Duration;
+use tokio::net::TcpSocket;
+
+use crate::protocols::l4::listener::Listener;
+pub use crate::protocols::l4::stream::Stream;
+use crate::server::ListenFds;
+
+const TCP_LISTENER_MAX_TRY: usize = 30;
+const TCP_LISTENER_TRY_STEP: Duration = Duration::from_secs(1);
+// TODO: configurable backlog
+const LISTENER_BACKLOG: u32 = 65535;
+
+/// Address for a listening server, either a TCP or UDS socket.
+#[derive(Clone, Debug)]
+pub enum ServerAddress {
+    Tcp(String, Option<TcpSocketOptions>),
+    Uds(String, Option<Permissions>),
+}
+
+impl AsRef<str> for ServerAddress {
+    fn as_ref(&self) -> &str {
+        match &self {
+            Self::Tcp(l, _) => l,
+            Self::Uds(l, _) => l,
+        }
+    }
+}
+
+/// TCP socket configuration options.
+#[derive(Clone, Debug)]
+pub struct TcpSocketOptions {
+    /// IPV6_V6ONLY flag (if true, limit socket to IPv6 communication only).
+    /// This is mostly useful when binding to `[::]`, which on most Unix distributions
+    /// will bind to both IPv4 and IPv6 addresses by default.
+    pub ipv6_only: bool,
+    // TODO: allow configuring reuseaddr, backlog, etc. from here?
+} + +mod uds { + use super::{OrErr, Result}; + use crate::protocols::l4::listener::Listener; + use log::{debug, error}; + use pingora_error::ErrorType::BindError; + use std::fs::{self, Permissions}; + use std::io::ErrorKind; + use std::os::unix::fs::PermissionsExt; + use std::os::unix::net::UnixListener as StdUnixListener; + use tokio::net::UnixListener; + + use super::LISTENER_BACKLOG; + + pub(super) fn set_perms(path: &str, perms: Option<Permissions>) -> Result<()> { + // set read/write permissions for all users on the socket by default + let perms = perms.unwrap_or(Permissions::from_mode(0o666)); + fs::set_permissions(path, perms).or_err_with(BindError, || { + format!("Fail to bind to {path}, could not set permissions") + }) + } + + pub(super) fn set_backlog(l: StdUnixListener, backlog: u32) -> Result<UnixListener> { + let socket: socket2::Socket = l.into(); + // Note that we call listen on an already listening socket + // POSIX undefined but on Linux it will update the backlog size + socket + .listen(backlog as i32) + .or_err_with(BindError, || format!("listen() failed on {socket:?}"))?; + UnixListener::from_std(socket.into()).or_err(BindError, "Failed to convert to tokio socket") + } + + pub(super) fn bind(addr: &str, perms: Option<Permissions>) -> Result<Listener> { + /* + We remove the filename/address in case there is a dangling reference. + + "Binding to a socket with a filename creates a socket in the + filesystem that must be deleted by the caller when it is no + longer needed (using unlink(2))" + */ + match std::fs::remove_file(addr) { + Ok(()) => { + debug!("unlink {addr} done"); + } + Err(e) => match e.kind() { + ErrorKind::NotFound => debug!("unlink {addr} not found: {e}"), + _ => error!("unlink {addr} failed: {e}"), + }, + } + let listener_socket = UnixListener::bind(addr) + .or_err_with(BindError, || format!("Bind() failed on {addr}"))?; + set_perms(addr, perms)?; + let std_listener = listener_socket.into_std().unwrap(); + Ok(set_backlog(std_listener, LISTENER_BACKLOG)?.into()) + } +} + +// currently, these options can only apply on sockets prior to calling bind() +fn apply_tcp_socket_options(sock: &TcpSocket, opt: Option<&TcpSocketOptions>) -> Result<()> { + let Some(opt) = opt else { + return Ok(()); + }; + let socket_ref = socket2::SockRef::from(sock); + socket_ref + .set_only_v6(opt.ipv6_only) + .or_err(BindError, "failed to set IPV6_V6ONLY") +} + +fn from_raw_fd(address: &ServerAddress, fd: i32) -> Result<Listener> { + match address { + ServerAddress::Uds(addr, perm) => { + let std_listener = unsafe { StdUnixListener::from_raw_fd(fd) }; + // set permissions just in case + uds::set_perms(addr, perm.clone())?; + Ok(uds::set_backlog(std_listener, LISTENER_BACKLOG)?.into()) + } + ServerAddress::Tcp(_, _) => { + let std_listener_socket = unsafe { std::net::TcpStream::from_raw_fd(fd) }; + let listener_socket = TcpSocket::from_std_stream(std_listener_socket); + // Note that we call listen on an already listening socket + // POSIX undefined but on Linux it will update the backlog size + Ok(listener_socket + .listen(LISTENER_BACKLOG) + .or_err_with(BindError, || format!("Listen() failed on {address:?}"))? + .into()) + } + } +} + +async fn bind_tcp(addr: &str, opt: Option<TcpSocketOptions>) -> Result<Listener> { + let mut try_count = 0; + loop { + let sock_addr = addr + .to_socket_addrs() // NOTE: this could invoke a blocking network lookup + .or_err_with(BindError, || format!("Invalid listen address {addr}"))? 
+            .next() // take the first one for now
+            .unwrap(); // assume there is always at least one
+
+        let listener_socket = match sock_addr {
+            SocketAddr::V4(_) => TcpSocket::new_v4(),
+            SocketAddr::V6(_) => TcpSocket::new_v6(),
+        }
+        .or_err_with(BindError, || format!("fail to create address {sock_addr}"))?;
+
+        // NOTE: this is to preserve the current TcpListener::bind() behavior.
+        // We have a few tests relying on this behavior to allow multiple identical
+        // test servers to coexist.
+        listener_socket
+            .set_reuseaddr(true)
+            .or_err(BindError, "fail to set_reuseaddr(true)")?;
+
+        apply_tcp_socket_options(&listener_socket, opt.as_ref())?;
+
+        match listener_socket.bind(sock_addr) {
+            Ok(()) => {
+                break Ok(listener_socket
+                    .listen(LISTENER_BACKLOG)
+                    .or_err(BindError, "bind() failed")?
+                    .into())
+            }
+            Err(e) => {
+                if e.kind() != ErrorKind::AddrInUse {
+                    break Err(e).or_err_with(BindError, || format!("bind() failed on {addr}"));
+                }
+                try_count += 1;
+                if try_count >= TCP_LISTENER_MAX_TRY {
+                    break Err(e).or_err_with(BindError, || {
+                        format!("bind() failed, after retries, {addr} still in use")
+                    });
+                }
+                warn!("{addr} is in use, will try again");
+                tokio::time::sleep(TCP_LISTENER_TRY_STEP).await;
+            }
+        }
+    }
+}
+
+async fn bind(addr: &ServerAddress) -> Result<Listener> {
+    match addr {
+        ServerAddress::Uds(l, perm) => uds::bind(l, perm.clone()),
+        ServerAddress::Tcp(l, opt) => bind_tcp(l, opt.clone()).await,
+    }
+}
+
+pub struct ListenerEndpoint {
+    listen_addr: ServerAddress,
+    listener: Option<Listener>,
+}
+
+impl ListenerEndpoint {
+    pub fn new(listen_addr: ServerAddress) -> Self {
+        ListenerEndpoint {
+            listen_addr,
+            listener: None,
+        }
+    }
+
+    pub fn as_str(&self) -> &str {
+        self.listen_addr.as_ref()
+    }
+
+    pub async fn listen(&mut self, fds: Option<ListenFds>) -> Result<()> {
+        if self.listener.is_some() {
+            return Ok(());
+        }
+
+        let listener = if let Some(fds_table) = fds {
+            let addr = self.listen_addr.as_ref();
+            // consider making this mutex a std::sync::Mutex or OnceCell
+            let mut table = fds_table.lock().await;
+            if let Some(fd) = table.get(addr.as_ref()) {
+                from_raw_fd(&self.listen_addr, *fd)?
+            } else {
+                // not found
+                let listener = bind(&self.listen_addr).await?;
+                table.add(addr.to_string(), listener.as_raw_fd());
+                listener
+            }
+        } else {
+            // not found, no fd table
+            bind(&self.listen_addr).await?
+        };
+        self.listener = Some(listener);
+        Ok(())
+    }
+
+    pub async fn accept(&mut self) -> Result<Stream> {
+        let Some(listener) = self.listener.as_mut() else {
+            // panic, otherwise this thing would loop forever
+            panic!("Need to call listen() first");
+        };
+        let mut stream = listener
+            .accept()
+            .await
+            .or_err(AcceptError, "Fail to accept()")?;
+        stream.set_nodelay()?;
+        Ok(stream)
+    }
+}
+
+#[cfg(test)]
+mod test {
+    use super::*;
+
+    #[tokio::test]
+    async fn test_listen_tcp() {
+        let addr = "127.0.0.1:7100";
+        let mut listener = ListenerEndpoint::new(ServerAddress::Tcp(addr.into(), None));
+        listener.listen(None).await.unwrap();
+        tokio::spawn(async move {
+            // just try to accept once
+            listener.accept().await.unwrap();
+        });
+        tokio::net::TcpStream::connect(addr)
+            .await
+            .expect("can connect to TCP listener");
+    }
+
+    #[tokio::test]
+    async fn test_listen_tcp_ipv6_only() {
+        let sock_opt = Some(TcpSocketOptions { ipv6_only: true });
+        let mut listener = ListenerEndpoint::new(ServerAddress::Tcp("[::]:7101".into(), sock_opt));
+        listener.listen(None).await.unwrap();
+        tokio::spawn(async move {
+            // just try to accept twice
+            listener.accept().await.unwrap();
+            listener.accept().await.unwrap();
+        });
+        tokio::net::TcpStream::connect("127.0.0.1:7101")
+            .await
+            .expect_err("cannot connect to v4 addr");
+        tokio::net::TcpStream::connect("[::1]:7101")
+            .await
+            .expect("can connect to v6 addr");
+    }
+
+    #[tokio::test]
+    async fn test_listen_uds() {
+        let addr = "/tmp/test_listen_uds";
+        let mut listener = ListenerEndpoint::new(ServerAddress::Uds(addr.into(), None));
+        listener.listen(None).await.unwrap();
+        tokio::spawn(async move {
+            // just try to accept once
+            listener.accept().await.unwrap();
+        });
+        tokio::net::UnixStream::connect(addr)
+            .await
+            .expect("can connect to UDS listener");
+    }
+}
diff --git a/pingora-core/src/listeners/mod.rs b/pingora-core/src/listeners/mod.rs
new file mode 100644
index 0000000..9d08f14
--- /dev/null
+++ b/pingora-core/src/listeners/mod.rs
@@ -0,0 +1,248 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! The listening endpoints (TCP and TLS) and their configurations.
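+//!
+//! A minimal setup sketch (editor's illustration; the address and file paths are
+//! placeholders, not from the original docs):
+//! ```ignore
+//! let mut listeners = Listeners::tcp("0.0.0.0:80");
+//! listeners.add_tls("0.0.0.0:443", "cert.pem", "key.pem")?;
+//! ```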
+
+mod l4;
+mod tls;
+
+use crate::protocols::Stream;
+use crate::server::ListenFds;
+
+use pingora_error::Result;
+use std::{fs::Permissions, sync::Arc};
+
+use l4::{ListenerEndpoint, Stream as L4Stream};
+use tls::Acceptor;
+
+pub use crate::protocols::ssl::server::TlsAccept;
+pub use l4::{ServerAddress, TcpSocketOptions};
+pub use tls::{TlsSettings, ALPN};
+
+struct TransportStackBuilder {
+    l4: ServerAddress,
+    tls: Option<TlsSettings>,
+}
+
+impl TransportStackBuilder {
+    pub fn build(&mut self, upgrade_listeners: Option<ListenFds>) -> TransportStack {
+        TransportStack {
+            l4: ListenerEndpoint::new(self.l4.clone()),
+            tls: self.tls.take().map(|tls| Arc::new(tls.build())),
+            upgrade_listeners,
+        }
+    }
+}
+
+pub(crate) struct TransportStack {
+    l4: ListenerEndpoint,
+    tls: Option<Arc<Acceptor>>,
+    // listeners sent from the old process for graceful upgrade
+    upgrade_listeners: Option<ListenFds>,
+}
+
+impl TransportStack {
+    pub fn as_str(&self) -> &str {
+        self.l4.as_str()
+    }
+
+    pub async fn listen(&mut self) -> Result<()> {
+        self.l4.listen(self.upgrade_listeners.take()).await
+    }
+
+    pub async fn accept(&mut self) -> Result<UninitializedStream> {
+        let stream = self.l4.accept().await?;
+        Ok(UninitializedStream {
+            l4: stream,
+            tls: self.tls.clone(),
+        })
+    }
+
+    pub fn cleanup(&mut self) {
+        // placeholder
+    }
+}
+
+pub(crate) struct UninitializedStream {
+    l4: L4Stream,
+    tls: Option<Arc<Acceptor>>,
+}
+
+impl UninitializedStream {
+    pub async fn handshake(self) -> Result<Stream> {
+        if let Some(tls) = self.tls {
+            let tls_stream = tls.tls_handshake(self.l4).await?;
+            Ok(Box::new(tls_stream))
+        } else {
+            Ok(Box::new(self.l4))
+        }
+    }
+}
+
+/// The struct to hold one or multiple listening endpoints
+pub struct Listeners {
+    stacks: Vec<TransportStackBuilder>,
+}
+
+impl Listeners {
+    /// Create a new [`Listeners`] with no listening endpoints.
+    pub fn new() -> Self {
+        Listeners { stacks: vec![] }
+    }
+
+    /// Create a new [`Listeners`] with a TCP server endpoint from the given string.
+    pub fn tcp(addr: &str) -> Self {
+        let mut listeners = Self::new();
+        listeners.add_tcp(addr);
+        listeners
+    }
+
+    /// Create a new [`Listeners`] with a Unix domain socket endpoint from the given string.
+    pub fn uds(addr: &str, perm: Option<Permissions>) -> Self {
+        let mut listeners = Self::new();
+        listeners.add_uds(addr, perm);
+        listeners
+    }
+
+    /// Create a new [`Listeners`] with a TLS (TCP) endpoint with the given address string,
+    /// and paths to the certificate/private key pair.
+    /// This endpoint will adopt the [Mozilla Intermediate](https://wiki.mozilla.org/Security/Server_Side_TLS#Intermediate_compatibility_.28recommended.29)
+    /// server side TLS settings.
+    pub fn tls(addr: &str, cert_path: &str, key_path: &str) -> Result<Self> {
+        let mut listeners = Self::new();
+        listeners.add_tls(addr, cert_path, key_path)?;
+        Ok(listeners)
+    }
+
+    /// Add a TCP endpoint to `self`.
+    pub fn add_tcp(&mut self, addr: &str) {
+        self.add_address(ServerAddress::Tcp(addr.into(), None));
+    }
+
+    /// Add a TCP endpoint to `self`, with the given [`TcpSocketOptions`].
+    pub fn add_tcp_with_settings(&mut self, addr: &str, sock_opt: TcpSocketOptions) {
+        self.add_address(ServerAddress::Tcp(addr.into(), Some(sock_opt)));
+    }
+
+    /// Add a Unix domain socket endpoint to `self`.
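+    ///
+    /// For example (editor's illustration; the path and mode are placeholders, and a
+    /// `Listeners` named `listeners` is assumed to be in scope):
+    /// ```ignore
+    /// use std::os::unix::fs::PermissionsExt;
+    /// listeners.add_uds("/tmp/app.sock", Some(std::fs::Permissions::from_mode(0o600)));
+    /// ```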
+ pub fn add_uds(&mut self, addr: &str, perm: Option<Permissions>) { + self.add_address(ServerAddress::Uds(addr.into(), perm)); + } + + /// Add a TLS endpoint to `self` with the [Mozilla Intermediate](https://wiki.mozilla.org/Security/Server_Side_TLS#Intermediate_compatibility_.28recommended.29) + /// server side TLS settings. + pub fn add_tls(&mut self, addr: &str, cert_path: &str, key_path: &str) -> Result<()> { + self.add_tls_with_settings(addr, None, TlsSettings::intermediate(cert_path, key_path)?); + Ok(()) + } + + /// Add a TLS endpoint to `self` with the given socket and server side TLS settings. + /// See [`TlsSettings`] and [`TcpSocketOptions`] for more details. + pub fn add_tls_with_settings( + &mut self, + addr: &str, + sock_opt: Option<TcpSocketOptions>, + settings: TlsSettings, + ) { + self.add_endpoint(ServerAddress::Tcp(addr.into(), sock_opt), Some(settings)); + } + + /// Add the given [`ServerAddress`] to `self`. + pub fn add_address(&mut self, addr: ServerAddress) { + self.add_endpoint(addr, None); + } + + /// Add the given [`ServerAddress`] to `self` with the given [`TlsSettings`] if provided + pub fn add_endpoint(&mut self, l4: ServerAddress, tls: Option<TlsSettings>) { + self.stacks.push(TransportStackBuilder { l4, tls }) + } + + pub(crate) fn build(&mut self, upgrade_listeners: Option<ListenFds>) -> Vec<TransportStack> { + self.stacks + .iter_mut() + .map(|b| b.build(upgrade_listeners.clone())) + .collect() + } + + pub(crate) fn cleanup(&self) { + // placeholder + } +} + +#[cfg(test)] +mod test { + use super::*; + use tokio::io::AsyncWriteExt; + use tokio::net::TcpStream; + use tokio::time::{sleep, Duration}; + + #[tokio::test] + async fn test_listen_tcp() { + let addr1 = "127.0.0.1:7101"; + let addr2 = "127.0.0.1:7102"; + let mut listeners = Listeners::tcp(addr1); + listeners.add_tcp(addr2); + + let listeners = listeners.build(None); + assert_eq!(listeners.len(), 2); + for mut listener in listeners { + tokio::spawn(async move { + listener.listen().await.unwrap(); + // just try to accept once + let stream = listener.accept().await.unwrap(); + stream.handshake().await.unwrap(); + }); + } + + // make sure the above starts before the lines below + sleep(Duration::from_millis(10)).await; + + TcpStream::connect(addr1).await.unwrap(); + TcpStream::connect(addr2).await.unwrap(); + } + + #[tokio::test] + async fn test_listen_tls() { + use tokio::io::AsyncReadExt; + + let addr = "127.0.0.1:7103"; + let cert_path = format!("{}/tests/keys/server.crt", env!("CARGO_MANIFEST_DIR")); + let key_path = format!("{}/tests/keys/key.pem", env!("CARGO_MANIFEST_DIR")); + let mut listeners = Listeners::tls(addr, &cert_path, &key_path).unwrap(); + let mut listener = listeners.build(None).pop().unwrap(); + + tokio::spawn(async move { + listener.listen().await.unwrap(); + // just try to accept once + let stream = listener.accept().await.unwrap(); + let mut stream = stream.handshake().await.unwrap(); + let mut buf = [0; 1024]; + let _ = stream.read(&mut buf).await.unwrap(); + stream + .write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 1\r\n\r\na") + .await + .unwrap(); + }); + // make sure the above starts before the lines below + sleep(Duration::from_millis(10)).await; + + let client = reqwest::Client::builder() + .danger_accept_invalid_certs(true) + .build() + .unwrap(); + + let res = client.get(format!("https://{addr}")).send().await.unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); + } +} diff --git a/pingora-core/src/listeners/tls.rs b/pingora-core/src/listeners/tls.rs new file 
mode 100644
index 0000000..ec53551
--- /dev/null
+++ b/pingora-core/src/listeners/tls.rs
@@ -0,0 +1,152 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use log::debug;
+use pingora_error::{ErrorType, OrErr, Result};
+use std::ops::{Deref, DerefMut};
+
+use crate::protocols::ssl::{
+    server::{handshake, handshake_with_callback, TlsAcceptCallbacks},
+    SslStream,
+};
+use crate::protocols::IO;
+use crate::tls::ssl::{SslAcceptor, SslAcceptorBuilder, SslFiletype, SslMethod};
+
+pub use crate::protocols::ssl::ALPN;
+
+pub const TLS_CONF_ERR: ErrorType = ErrorType::Custom("TLSConfigError");
+
+pub(crate) struct Acceptor {
+    ssl_acceptor: SslAcceptor,
+    callbacks: Option<TlsAcceptCallbacks>,
+}
+
+/// The TLS settings of a listening endpoint
+pub struct TlsSettings {
+    accept_builder: SslAcceptorBuilder,
+    callbacks: Option<TlsAcceptCallbacks>,
+}
+
+impl Deref for TlsSettings {
+    type Target = SslAcceptorBuilder;
+
+    fn deref(&self) -> &Self::Target {
+        &self.accept_builder
+    }
+}
+
+impl DerefMut for TlsSettings {
+    fn deref_mut(&mut self) -> &mut Self::Target {
+        &mut self.accept_builder
+    }
+}
+
+impl TlsSettings {
+    /// Create a new [`TlsSettings`] with the [Mozilla Intermediate](https://wiki.mozilla.org/Security/Server_Side_TLS#Intermediate_compatibility_.28recommended.29)
+    /// server side TLS settings. Users can adjust the TLS settings after this object is created.
+    /// Returns an error if the provided certificate and private key are invalid or not found.
+    pub fn intermediate(cert_path: &str, key_path: &str) -> Result<Self> {
+        let mut accept_builder = SslAcceptor::mozilla_intermediate_v5(SslMethod::tls()).or_err(
+            TLS_CONF_ERR,
+            "fail to create mozilla_intermediate_v5 Acceptor",
+        )?;
+        accept_builder
+            .set_private_key_file(key_path, SslFiletype::PEM)
+            .or_err(TLS_CONF_ERR, "fail to read key file {key_path}")?;
+        accept_builder
+            .set_certificate_chain_file(cert_path)
+            .or_err(TLS_CONF_ERR, "fail to read cert file {cert_path}")?;
+        Ok(TlsSettings {
+            accept_builder,
+            callbacks: None,
+        })
+    }
+
+    /// Create a new [`TlsSettings`] similar to [TlsSettings::intermediate()]. A struct that implements [TlsAcceptCallbacks]
+    /// is needed to provide the certificate during the TLS handshake.
+    pub fn with_callbacks(callbacks: TlsAcceptCallbacks) -> Result<Self> {
+        let accept_builder = SslAcceptor::mozilla_intermediate_v5(SslMethod::tls()).or_err(
+            TLS_CONF_ERR,
+            "fail to create mozilla_intermediate_v5 Acceptor",
+        )?;
+        Ok(TlsSettings {
+            accept_builder,
+            callbacks: Some(callbacks),
+        })
+    }
+
+    /// Enable HTTP/2 support for this endpoint, which is off by default.
+    /// This effectively sets the ALPN to prefer HTTP/2 with HTTP/1.1 allowed
+    pub fn enable_h2(&mut self) {
+        self.set_alpn(ALPN::H2H1);
+    }
+
+    /// Set the ALPN preference of this endpoint.
See [`ALPN`] for more details + pub fn set_alpn(&mut self, alpn: ALPN) { + match alpn { + ALPN::H2H1 => self + .accept_builder + .set_alpn_select_callback(alpn::prefer_h2), + ALPN::H1 => self.accept_builder.set_alpn_select_callback(alpn::h1_only), + ALPN::H2 => self.accept_builder.set_alpn_select_callback(alpn::h2_only), + } + } + + pub(crate) fn build(self) -> Acceptor { + Acceptor { + ssl_acceptor: self.accept_builder.build(), + callbacks: self.callbacks, + } + } +} + +impl Acceptor { + pub async fn tls_handshake<S: IO>(&self, stream: S) -> Result<SslStream<S>> { + debug!("new ssl session"); + // TODO: be able to offload this handshake in a thread pool + if let Some(cb) = self.callbacks.as_ref() { + handshake_with_callback(&self.ssl_acceptor, stream, cb).await + } else { + handshake(&self.ssl_acceptor, stream).await + } + } +} + +mod alpn { + use super::*; + use crate::tls::ssl::{select_next_proto, AlpnError, SslRef}; + + // A standard implementation provided by the SSL lib is used below + + pub fn prefer_h2<'a>(_ssl: &mut SslRef, alpn_in: &'a [u8]) -> Result<&'a [u8], AlpnError> { + match select_next_proto(ALPN::H2H1.to_wire_preference(), alpn_in) { + Some(p) => Ok(p), + _ => Err(AlpnError::NOACK), // unknown ALPN, just ignore it. Most clients will fallback to h1 + } + } + + pub fn h1_only<'a>(_ssl: &mut SslRef, alpn_in: &'a [u8]) -> Result<&'a [u8], AlpnError> { + match select_next_proto(ALPN::H1.to_wire_preference(), alpn_in) { + Some(p) => Ok(p), + _ => Err(AlpnError::NOACK), // unknown ALPN, just ignore it. Most clients will fallback to h1 + } + } + + pub fn h2_only<'a>(_ssl: &mut SslRef, alpn_in: &'a [u8]) -> Result<&'a [u8], AlpnError> { + match select_next_proto(ALPN::H2.to_wire_preference(), alpn_in) { + Some(p) => Ok(p), + _ => Err(AlpnError::ALERT_FATAL), // cannot agree + } + } +} diff --git a/pingora-core/src/modules/http/compression.rs b/pingora-core/src/modules/http/compression.rs new file mode 100644 index 0000000..b07e1c7 --- /dev/null +++ b/pingora-core/src/modules/http/compression.rs @@ -0,0 +1,65 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
HTTP compression filter
+
+use super::*;
+use crate::protocols::http::compression::ResponseCompressionCtx;
+
+/// HTTP response compression module
+pub struct ResponseCompression(ResponseCompressionCtx);
+
+impl HttpModule for ResponseCompression {
+    fn as_any(&self) -> &dyn std::any::Any {
+        self
+    }
+    fn as_any_mut(&mut self) -> &mut dyn std::any::Any {
+        self
+    }
+
+    fn request_header_filter(&mut self, req: &mut RequestHeader) -> Result<()> {
+        self.0.request_filter(req);
+        Ok(())
+    }
+
+    fn response_filter(&mut self, t: &mut HttpTask) -> Result<()> {
+        self.0.response_filter(t);
+        Ok(())
+    }
+}
+
+/// The builder for the HTTP response compression module
+pub struct ResponseCompressionBuilder {
+    level: u32,
+}
+
+impl ResponseCompressionBuilder {
+    /// Return a [ModuleBuilder] for [ResponseCompression] with the given compression level
+    pub fn enable(level: u32) -> ModuleBuilder {
+        Box::new(ResponseCompressionBuilder { level })
+    }
+}
+
+impl HttpModuleBuilder for ResponseCompressionBuilder {
+    fn init(&self) -> Module {
+        Box::new(ResponseCompression(ResponseCompressionCtx::new(
+            self.level, false,
+        )))
+    }
+
+    fn order(&self) -> i16 {
+        // run the response filter later than most other filters
+        i16::MIN / 2
+    }
+}
diff --git a/pingora-core/src/modules/http/mod.rs b/pingora-core/src/modules/http/mod.rs
new file mode 100644
index 0000000..9c446b1
--- /dev/null
+++ b/pingora-core/src/modules/http/mod.rs
@@ -0,0 +1,277 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Modules for HTTP traffic.
+//!
+//! [HttpModule]s define request and response filters to use while running an [HttpServer]
+//! application.
+//! See the [ResponseCompression] module for an example of how to implement a basic module.
+
+pub mod compression;
+
+use crate::protocols::http::HttpTask;
+use bytes::Bytes;
+use once_cell::sync::OnceCell;
+use pingora_error::Result;
+use pingora_http::RequestHeader;
+use std::any::Any;
+use std::any::TypeId;
+use std::collections::HashMap;
+use std::sync::Arc;
+
+/// The trait an HTTP traffic module needs to implement
+// TODO: * async filters for, e.g., 3rd party auth server; * access the connection for, e.g., GeoIP
+pub trait HttpModule {
+    fn request_header_filter(&mut self, _req: &mut RequestHeader) -> Result<()> {
+        Ok(())
+    }
+
+    fn request_body_filter(&mut self, body: Option<Bytes>) -> Result<Option<Bytes>> {
+        Ok(body)
+    }
+
+    fn response_filter(&mut self, _t: &mut HttpTask) -> Result<()> {
+        Ok(())
+    }
+
+    fn as_any(&self) -> &dyn Any;
+    fn as_any_mut(&mut self) -> &mut dyn Any;
+}
+
+type Module = Box<dyn HttpModule + 'static + Send + Sync>;
+
+/// Trait to init the http module ctx for each request
+pub trait HttpModuleBuilder {
+    /// The order the module will run
+    ///
+    /// The lower the value, the later it runs relative to other filters.
+    /// If the order of the filter is not important, leave it at the default 0.
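+    /// For example (illustrative): a builder returning `1` runs its module's filters
+    /// before one returning the default `0`, which in turn runs before one returning `-1`.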
+    fn order(&self) -> i16 {
+        0
+    }
+
+    /// Initialize and return the per request module context
+    fn init(&self) -> Module;
+}
+
+pub type ModuleBuilder = Box<dyn HttpModuleBuilder + 'static + Send + Sync>;
+
+/// The object to hold multiple http modules
+pub struct HttpModules {
+    modules: Vec<ModuleBuilder>,
+    module_index: OnceCell<Arc<HashMap<TypeId, usize>>>,
+}
+
+impl HttpModules {
+    /// Create a new [HttpModules]
+    pub fn new() -> Self {
+        HttpModules {
+            modules: vec![],
+            module_index: OnceCell::new(),
+        }
+    }
+
+    /// Add a new [ModuleBuilder] to [HttpModules]
+    ///
+    /// Each type of [HttpModule] can only be added once.
+    /// # Panic
+    /// Panics if any [HttpModule] is added more than once.
+    pub fn add_module(&mut self, builder: ModuleBuilder) {
+        if self.module_index.get().is_some() {
+            // We use a shared module_index; the index would be out of sync if we
+            // added more modules.
+            panic!("cannot add module after ctx is already built")
+        }
+        self.modules.push(builder);
+        // not the most efficient way but should be fine
+        // largest order first
+        self.modules.sort_by_key(|m| -m.order());
+    }
+
+    /// Build the contexts of all the modules added to this [HttpModules]
+    pub fn build_ctx(&self) -> HttpModuleCtx {
+        let module_ctx: Vec<_> = self.modules.iter().map(|b| b.init()).collect();
+        let module_index = self
+            .module_index
+            .get_or_init(|| {
+                let mut module_index = HashMap::with_capacity(self.modules.len());
+                for (i, c) in module_ctx.iter().enumerate() {
+                    let exist = module_index.insert(c.as_any().type_id(), i);
+                    if exist.is_some() {
+                        panic!("duplicated filters found")
+                    }
+                }
+                Arc::new(module_index)
+            })
+            .clone();
+
+        HttpModuleCtx {
+            module_ctx,
+            module_index,
+        }
+    }
+}
+
+/// The contexts of multiple modules
+///
+/// This is the object that will apply all the included modules to a certain HTTP request.
+/// The modules are ordered according to their `order()`.
+pub struct HttpModuleCtx {
+    // the modules in the order of execution
+    module_ctx: Vec<Module>,
+    // find the module in the vec with its type ID
+    module_index: Arc<HashMap<TypeId, usize>>,
+}
+
+impl HttpModuleCtx {
+    /// Create a placeholder empty [HttpModuleCtx].
+    ///
+    /// [HttpModules] should be used to create a nonempty [HttpModuleCtx].
+    pub fn empty() -> Self {
+        HttpModuleCtx {
+            module_ctx: vec![],
+            module_index: Arc::new(HashMap::new()),
+        }
+    }
+
+    /// Get a ref to a [HttpModule] if any.
+    pub fn get<T: 'static>(&self) -> Option<&T> {
+        let idx = self.module_index.get(&TypeId::of::<T>())?;
+        let ctx = &self.module_ctx[*idx];
+        Some(
+            ctx.as_any()
+                .downcast_ref::<T>()
+                .expect("type should always match"),
+        )
+    }
+
+    /// Get a mut ref to a [HttpModule] if any.
+    pub fn get_mut<T: 'static>(&mut self) -> Option<&mut T> {
+        let idx = self.module_index.get(&TypeId::of::<T>())?;
+        let ctx = &mut self.module_ctx[*idx];
+        Some(
+            ctx.as_any_mut()
+                .downcast_mut::<T>()
+                .expect("type should always match"),
+        )
+    }
+
+    /// Run the `request_header_filter` for all the modules according to their orders.
+    pub fn request_header_filter(&mut self, req: &mut RequestHeader) -> Result<()> {
+        for filter in self.module_ctx.iter_mut() {
+            filter.request_header_filter(req)?;
+        }
+        Ok(())
+    }
+
+    /// Run the `request_body_filter` for all the modules according to their orders.
+ pub fn request_body_filter(&mut self, mut body: Option<Bytes>) -> Result<Option<Bytes>> { + for filter in self.module_ctx.iter_mut() { + body = filter.request_body_filter(body)?; + } + Ok(body) + } + + /// Run the `response_filter` for all the modules according to their orders. + pub fn response_filter(&mut self, t: &mut HttpTask) -> Result<()> { + for filter in self.module_ctx.iter_mut() { + filter.response_filter(t)?; + } + Ok(()) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + struct MyModule; + impl HttpModule for MyModule { + fn as_any(&self) -> &dyn Any { + self + } + fn as_any_mut(&mut self) -> &mut dyn Any { + self + } + fn request_header_filter(&mut self, req: &mut RequestHeader) -> Result<()> { + req.insert_header("my-filter", "1") + } + } + struct MyModuleBuilder; + impl HttpModuleBuilder for MyModuleBuilder { + fn order(&self) -> i16 { + 1 + } + + fn init(&self) -> Module { + Box::new(MyModule) + } + } + + struct MyOtherModule; + impl HttpModule for MyOtherModule { + fn as_any(&self) -> &dyn Any { + self + } + fn as_any_mut(&mut self) -> &mut dyn Any { + self + } + fn request_header_filter(&mut self, req: &mut RequestHeader) -> Result<()> { + if req.headers.get("my-filter").is_some() { + // if this MyOtherModule runs after MyModule + req.insert_header("my-filter", "2") + } else { + // if this MyOtherModule runs before MyModule + req.insert_header("my-other-filter", "1") + } + } + } + struct MyOtherModuleBuilder; + impl HttpModuleBuilder for MyOtherModuleBuilder { + fn order(&self) -> i16 { + -1 + } + + fn init(&self) -> Module { + Box::new(MyOtherModule) + } + } + + #[test] + fn test_module_get() { + let mut http_module = HttpModules::new(); + http_module.add_module(Box::new(MyModuleBuilder)); + http_module.add_module(Box::new(MyOtherModuleBuilder)); + let mut ctx = http_module.build_ctx(); + assert!(ctx.get::<MyModule>().is_some()); + assert!(ctx.get::<MyOtherModule>().is_some()); + assert!(ctx.get::<usize>().is_none()); + assert!(ctx.get_mut::<MyModule>().is_some()); + assert!(ctx.get_mut::<MyOtherModule>().is_some()); + assert!(ctx.get_mut::<usize>().is_none()); + } + + #[test] + fn test_module_filter() { + let mut http_module = HttpModules::new(); + http_module.add_module(Box::new(MyOtherModuleBuilder)); + http_module.add_module(Box::new(MyModuleBuilder)); + let mut ctx = http_module.build_ctx(); + let mut req = RequestHeader::build("Get", b"/", None).unwrap(); + ctx.request_header_filter(&mut req).unwrap(); + // MyModule runs before MyOtherModule + assert_eq!(req.headers.get("my-filter").unwrap(), "2"); + assert!(req.headers.get("my-other-filter").is_none()); + } +} diff --git a/pingora-core/src/modules/mod.rs b/pingora-core/src/modules/mod.rs new file mode 100644 index 0000000..385b124 --- /dev/null +++ b/pingora-core/src/modules/mod.rs @@ -0,0 +1,16 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Modules to extend the functionalities of pingora services. 
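+//!
+//! A registration sketch (editor's illustration) using the HTTP module types from [`http`]:
+//! ```ignore
+//! use pingora_core::modules::http::compression::ResponseCompressionBuilder;
+//! use pingora_core::modules::http::HttpModules;
+//!
+//! let mut modules = HttpModules::new();
+//! modules.add_module(ResponseCompressionBuilder::enable(6)); // compression level 6 is an assumption
+//! let mut ctx = modules.build_ctx(); // per-request module contexts
+//! ```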
+pub mod http; diff --git a/pingora-core/src/protocols/digest.rs b/pingora-core/src/protocols/digest.rs new file mode 100644 index 0000000..13ce35c --- /dev/null +++ b/pingora-core/src/protocols/digest.rs @@ -0,0 +1,66 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Extra information about the connection + +use std::sync::Arc; +use std::time::SystemTime; + +use super::raw_connect::ProxyDigest; +use super::ssl::digest::SslDigest; + +/// The information can be extracted from a connection +#[derive(Clone, Debug)] +pub struct Digest { + /// Information regarding the TLS of this connection if any + pub ssl_digest: Option<Arc<SslDigest>>, + /// Timing information + pub timing_digest: Vec<Option<TimingDigest>>, + /// information regarding the CONNECT proxy this connection uses. + pub proxy_digest: Option<Arc<ProxyDigest>>, +} + +/// The interface to return protocol related information +pub trait ProtoDigest { + fn get_digest(&self) -> Option<&Digest> { + None + } +} + +/// The timing information of the connection +#[derive(Clone, Debug)] +pub struct TimingDigest { + /// When this connection was established + pub established_ts: SystemTime, +} + +impl Default for TimingDigest { + fn default() -> Self { + TimingDigest { + established_ts: SystemTime::UNIX_EPOCH, + } + } +} + +/// The interface to return timing information +pub trait GetTimingDigest { + /// Return the timing for each layer from the lowest layer to upper + fn get_timing_digest(&self) -> Vec<Option<TimingDigest>>; +} + +/// The interface to set or return proxy information +pub trait GetProxyDigest { + fn get_proxy_digest(&self) -> Option<Arc<ProxyDigest>>; + fn set_proxy_digest(&mut self, _digest: ProxyDigest) {} +} diff --git a/pingora-core/src/protocols/http/body_buffer.rs b/pingora-core/src/protocols/http/body_buffer.rs new file mode 100644 index 0000000..f3d7cdf --- /dev/null +++ b/pingora-core/src/protocols/http/body_buffer.rs @@ -0,0 +1,61 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use bytes::{Bytes, BytesMut}; + +/// A buffer with size limit. When the total amount of data written to the buffer is below the limit +/// all the data will be held in the buffer. Otherwise, the buffer will report to be truncated. 
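+///
+/// A behavior sketch (editor's illustration):
+/// ```ignore
+/// let mut buf = FixedBuffer::new(4);
+/// buf.write_to_buffer(&Bytes::from_static(b"1234")); // 4 bytes fit exactly
+/// assert!(!buf.is_truncated());
+/// buf.write_to_buffer(&Bytes::from_static(b"5")); // over capacity: buffer reports truncated
+/// assert!(buf.is_truncated());
+/// ```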
+pub(crate) struct FixedBuffer {
+    buffer: BytesMut,
+    capacity: usize,
+    truncated: bool,
+}
+
+impl FixedBuffer {
+    pub fn new(capacity: usize) -> Self {
+        FixedBuffer {
+            buffer: BytesMut::new(),
+            capacity,
+            truncated: false,
+        }
+    }
+
+    // TODO: maybe store a Vec of Bytes for zero-copy
+    pub fn write_to_buffer(&mut self, data: &Bytes) {
+        if !self.truncated && (self.buffer.len() + data.len() <= self.capacity) {
+            self.buffer.extend_from_slice(data);
+        } else {
+            // TODO: clear data because the data held here is useless anyway?
+            self.truncated = true;
+        }
+    }
+    pub fn clear(&mut self) {
+        self.truncated = false;
+        self.buffer.clear();
+    }
+    pub fn is_empty(&self) -> bool {
+        self.buffer.len() == 0
+    }
+    pub fn is_truncated(&self) -> bool {
+        self.truncated
+    }
+    pub fn get_buffer(&self) -> Option<Bytes> {
+        // TODO: return None if truncated?
+        if !self.is_empty() {
+            Some(self.buffer.clone().freeze())
+        } else {
+            None
+        }
+    }
+}
diff --git a/pingora-core/src/protocols/http/client.rs b/pingora-core/src/protocols/http/client.rs
new file mode 100644
index 0000000..0fe6c90
--- /dev/null
+++ b/pingora-core/src/protocols/http/client.rs
@@ -0,0 +1,161 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use bytes::Bytes;
+use pingora_error::Result;
+use pingora_http::{RequestHeader, ResponseHeader};
+use std::time::Duration;
+
+use super::v1::client::HttpSession as Http1Session;
+use super::v2::client::Http2Session;
+use crate::protocols::Digest;
+
+/// A type for an HTTP client session. It can be either an HTTP/1 connection or an HTTP/2 stream.
+pub enum HttpSession {
+    H1(Http1Session),
+    H2(Http2Session),
+}
+
+impl HttpSession {
+    pub fn as_http1(&self) -> Option<&Http1Session> {
+        match self {
+            Self::H1(s) => Some(s),
+            Self::H2(_) => None,
+        }
+    }
+
+    pub fn as_http2(&self) -> Option<&Http2Session> {
+        match self {
+            Self::H1(_) => None,
+            Self::H2(s) => Some(s),
+        }
+    }
+    /// Write the request header to the server.
+    /// After the request header is sent, the caller can either start reading the response or
+    /// start sending the request body, if any.
+    pub async fn write_request_header(&mut self, req: Box<RequestHeader>) -> Result<()> {
+        match self {
+            HttpSession::H1(h1) => {
+                h1.write_request_header(req).await?;
+                Ok(())
+            }
+            HttpSession::H2(h2) => h2.write_request_header(req, false),
+        }
+    }
+
+    /// Write a chunk of the request body.
+    pub async fn write_request_body(&mut self, data: Bytes, end: bool) -> Result<()> {
+        match self {
+            HttpSession::H1(h1) => {
+                // TODO: maybe h1 should also have the concept of `end`
+                h1.write_body(&data).await?;
+                Ok(())
+            }
+            HttpSession::H2(h2) => h2.write_request_body(data, end),
+        }
+    }
+
+    /// Signal that the request body has ended
+    pub async fn finish_request_body(&mut self) -> Result<()> {
+        match self {
+            HttpSession::H1(h1) => {
+                h1.finish_body().await?;
+                Ok(())
+            }
+            HttpSession::H2(h2) => h2.finish_request_body(),
+        }
+    }
+
+    /// Set the read timeout for reading header and body.
+    ///
+    /// The timeout is per read operation, not on the overall time spent reading the entire response
+    pub fn set_read_timeout(&mut self, timeout: Duration) {
+        match self {
+            HttpSession::H1(h1) => h1.read_timeout = Some(timeout),
+            HttpSession::H2(h2) => h2.read_timeout = Some(timeout),
+        }
+    }
+
+    /// Set the write timeout for writing header and body.
+    ///
+    /// The timeout is per write operation, not on the overall time spent writing the entire request
+    pub fn set_write_timeout(&mut self, timeout: Duration) {
+        match self {
+            HttpSession::H1(h1) => h1.write_timeout = Some(timeout),
+            HttpSession::H2(_) => { /* no write timeout because the actual write happens async */ }
+        }
+    }
+
+    /// Read the response header from the server.
+    /// For HTTP/1, this function can be called multiple times if the headers received are just
+    /// informational headers.
+    pub async fn read_response_header(&mut self) -> Result<()> {
+        match self {
+            HttpSession::H1(h1) => {
+                h1.read_response().await?;
+                Ok(())
+            }
+            HttpSession::H2(h2) => h2.read_response_header().await,
+        }
+    }
+
+    /// Read the response body
+    ///
+    /// `None` when there is no more body to read.
+    pub async fn read_response_body(&mut self) -> Result<Option<Bytes>> {
+        match self {
+            HttpSession::H1(h1) => h1.read_body_bytes().await,
+            HttpSession::H2(h2) => h2.read_response_body().await,
+        }
+    }
+
+    /// No (more) body to read
+    pub fn response_done(&mut self) -> bool {
+        match self {
+            HttpSession::H1(h1) => h1.is_body_done(),
+            HttpSession::H2(h2) => h2.response_finished(),
+        }
+    }
+
+    /// Give up the HTTP session abruptly.
+    /// For H1 this will close the underlying connection.
+    /// For H2 this will send a RST_STREAM frame to end this stream if the stream has not already ended.
+    pub async fn shutdown(&mut self) {
+        match self {
+            Self::H1(s) => s.shutdown().await,
+            Self::H2(s) => s.shutdown(),
+        }
+    }
+
+    /// Get the response header from the server
+    ///
+    /// `None` if the response header is not read yet.
+    pub fn response_header(&self) -> Option<&ResponseHeader> {
+        match self {
+            Self::H1(s) => s.resp_header(),
+            Self::H2(s) => s.response_header(),
+        }
+    }
+
+    /// Return the [Digest] of the connection
+    ///
+    /// For a reused connection, the timing in the digest will reflect its initial handshakes.
+    /// The caller should check if the connection is reused to avoid misusing the timing field.
+    pub fn digest(&self) -> Option<&Digest> {
+        match self {
+            Self::H1(s) => Some(s.digest()),
+            Self::H2(s) => s.digest(),
+        }
+    }
+}
diff --git a/pingora-core/src/protocols/http/compression/brotli.rs b/pingora-core/src/protocols/http/compression/brotli.rs
new file mode 100644
index 0000000..956f87d
--- /dev/null
+++ b/pingora-core/src/protocols/http/compression/brotli.rs
@@ -0,0 +1,161 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
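+
+// A minimal sketch of this module's streaming interface (hypothetical `br_bytes`
+// input; `?` assumes a `Result` context):
+//
+//   let mut d = Decompressor::new();
+//   let plain = d.encode(&br_bytes, true)?; // `true` flushes at the end of the stream
+//   let (name, total_in, total_out, took) = d.stat(); // name is "de-brotli"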
+
+use super::Encode;
+use super::COMPRESSION_ERROR;
+
+use brotli::{CompressorWriter, DecompressorWriter};
+use bytes::Bytes;
+use pingora_error::{OrErr, Result};
+use std::io::Write;
+use std::time::{Duration, Instant};
+
+pub struct Decompressor {
+    decompress: DecompressorWriter<Vec<u8>>,
+    total_in: usize,
+    total_out: usize,
+    duration: Duration,
+}
+
+impl Decompressor {
+    pub fn new() -> Self {
+        Decompressor {
+            // default buf is 4096 if 0 is used, TODO: figure out the significance of this value
+            decompress: DecompressorWriter::new(vec![], 0),
+            total_in: 0,
+            total_out: 0,
+            duration: Duration::new(0, 0),
+        }
+    }
+}
+
+impl Encode for Decompressor {
+    fn encode(&mut self, input: &[u8], end: bool) -> Result<Bytes> {
+        // reserve at most 16k
+        const MAX_INIT_COMPRESSED_SIZE_CAP: usize = 4 * 1024;
+        // Brotli's compression ratio can be 3.5 to 4.5
+        const ESTIMATED_COMPRESSION_RATIO: usize = 4;
+        let start = Instant::now();
+        self.total_in += input.len();
+        // cap the buf size amplification: there is a DoS risk in always allocating
+        // 4x the memory of the input buffer
+        let reserve_size = if input.len() < MAX_INIT_COMPRESSED_SIZE_CAP {
+            input.len() * ESTIMATED_COMPRESSION_RATIO
+        } else {
+            input.len()
+        };
+        self.decompress.get_mut().reserve(reserve_size);
+        self.decompress
+            .write_all(input)
+            .or_err(COMPRESSION_ERROR, "while decompress Brotli")?;
+        // write to vec will never fail. The only possible error is that the input data
+        // is invalid (not brotli compressed)
+        if end {
+            self.decompress
+                .flush()
+                .or_err(COMPRESSION_ERROR, "while decompress Brotli")?;
+        }
+        self.total_out += self.decompress.get_ref().len();
+        self.duration += start.elapsed();
+        Ok(std::mem::take(self.decompress.get_mut()).into()) // into() Bytes will drop excess capacity
+    }
+
+    fn stat(&self) -> (&'static str, usize, usize, Duration) {
+        ("de-brotli", self.total_in, self.total_out, self.duration)
+    }
+}
+
+pub struct Compressor {
+    compress: CompressorWriter<Vec<u8>>,
+    total_in: usize,
+    total_out: usize,
+    duration: Duration,
+}
+
+impl Compressor {
+    pub fn new(level: u32) -> Self {
+        Compressor {
+            // buf_size: 4096, lgwin: 19, TODO: fine-tune these
+            compress: CompressorWriter::new(vec![], 4096, level, 19),
+            total_in: 0,
+            total_out: 0,
+            duration: Duration::new(0, 0),
+        }
+    }
+}
+
+impl Encode for Compressor {
+    fn encode(&mut self, input: &[u8], end: bool) -> Result<Bytes> {
+        // reserve at most 16k
+        const MAX_INIT_COMPRESSED_BUF_SIZE: usize = 16 * 1024;
+        let start = Instant::now();
+        self.total_in += input.len();
+
+        // reserve at most input size, cap at 16k, compressed output should be smaller
+        self.compress
+            .get_mut()
+            .reserve(std::cmp::min(MAX_INIT_COMPRESSED_BUF_SIZE, input.len()));
+        self.compress
+            .write_all(input)
+            .or_err(COMPRESSION_ERROR, "while compress Brotli")?;
+        // write to vec will never fail.
+        if end {
+            self.compress
+                .flush()
+                .or_err(COMPRESSION_ERROR, "while compress Brotli")?;
+        }
+        self.total_out += self.compress.get_ref().len();
+        self.duration += start.elapsed();
+        Ok(std::mem::take(self.compress.get_mut()).into()) // into() Bytes will drop excess capacity
+    }
+
+    fn stat(&self) -> (&'static str, usize, usize, Duration) {
+        ("brotli", self.total_in, self.total_out, self.duration)
+    }
+}
+
+#[cfg(test)]
+mod tests_stream {
+    use super::*;
+
+    #[test]
+    fn decompress_brotli_data() {
+        let mut decompressor = Decompressor::new();
+        let decompressed = decompressor
+            .encode(
+                &[
+                    0x1f, 0x0f, 0x00, 0xf8, 0x45, 0x07, 0x87, 0x3e, 0x10, 0xfb, 0x55, 0x92, 0xec,
+                    0x12, 0x09, 0xcc, 0x38, 0xdd, 0x51, 0x1e,
+                ],
+                true,
+            )
+            .unwrap();
+
+        assert_eq!(&decompressed[..], &b"adcdefgabcdefgh\n"[..]);
+    }
+
+    #[test]
+    fn compress_brotli_data() {
+        let mut compressor = Compressor::new(11);
+        let compressed = compressor.encode(&b"adcdefgabcdefgh\n"[..], true).unwrap();
+
+        assert_eq!(
+            &compressed[..],
+            &[
+                0x85, 0x07, 0x00, 0xf8, 0x45, 0x07, 0x87, 0x3e, 0x10, 0xfb, 0x55, 0x92, 0xec, 0x12,
+                0x09, 0xcc, 0x38, 0xdd, 0x51, 0x1e,
+            ],
+        );
+    }
+}
diff --git a/pingora-core/src/protocols/http/compression/gzip.rs b/pingora-core/src/protocols/http/compression/gzip.rs
new file mode 100644
index 0000000..d64c961
--- /dev/null
+++ b/pingora-core/src/protocols/http/compression/gzip.rs
@@ -0,0 +1,103 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
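+
+// A minimal sketch of the streaming use of this module's `Compressor`
+// (hypothetical chunks; `?` assumes a `Result` context):
+//
+//   let mut c = Compressor::new(6); // gzip level 6
+//   let first = c.encode(b"hello ", false)?; // may be empty while flate2 buffers
+//   let rest = c.encode(b"world", true)?;    // `true` finishes the gzip stream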
+
+use super::Encode;
+
+use bytes::Bytes;
+use flate2::write::GzEncoder;
+use pingora_error::Result;
+use std::io::Write;
+use std::time::{Duration, Instant};
+
+// TODO: unzip
+
+pub struct Compressor {
+    // TODO: enum for other compression algorithms
+    compress: GzEncoder<Vec<u8>>,
+    total_in: usize,
+    total_out: usize,
+    duration: Duration,
+}
+
+impl Compressor {
+    pub fn new(level: u32) -> Compressor {
+        Compressor {
+            compress: GzEncoder::new(vec![], flate2::Compression::new(level)),
+            total_in: 0,
+            total_out: 0,
+            duration: Duration::new(0, 0),
+        }
+    }
+}
+
+impl Encode for Compressor {
+    // infallible because compression can take any data
+    fn encode(&mut self, input: &[u8], end: bool) -> Result<Bytes> {
+        // reserve at most 16k
+        const MAX_INIT_COMPRESSED_BUF_SIZE: usize = 16 * 1024;
+        let start = Instant::now();
+        self.total_in += input.len();
+        self.compress
+            .get_mut()
+            .reserve(std::cmp::min(MAX_INIT_COMPRESSED_BUF_SIZE, input.len()));
+        self.write_all(input).unwrap(); // write to vec, should never fail
+        if end {
+            self.try_finish().unwrap(); // write to vec, should never fail
+        }
+        self.total_out += self.compress.get_ref().len();
+        self.duration += start.elapsed();
+        Ok(std::mem::take(self.compress.get_mut()).into()) // into() Bytes will drop excess capacity
+    }
+
+    fn stat(&self) -> (&'static str, usize, usize, Duration) {
+        ("gzip", self.total_in, self.total_out, self.duration)
+    }
+}
+
+use std::ops::{Deref, DerefMut};
+impl Deref for Compressor {
+    type Target = GzEncoder<Vec<u8>>;
+
+    fn deref(&self) -> &Self::Target {
+        &self.compress
+    }
+}
+
+impl DerefMut for Compressor {
+    fn deref_mut(&mut self) -> &mut Self::Target {
+        &mut self.compress
+    }
+}
+
+#[cfg(test)]
+mod tests_stream {
+    use super::*;
+
+    #[test]
+    fn gzip_data() {
+        let mut compressor = Compressor::new(6);
+        let compressed = compressor.encode(b"abcdefg", true).unwrap();
+        // gzip magic headers
+        assert_eq!(&compressed[..3], &[0x1f, 0x8b, 0x08]);
+        // check the crc32 footer
+        assert_eq!(
+            &compressed[compressed.len() - 9..],
+            &[0, 166, 106, 42, 49, 7, 0, 0, 0]
+        );
+        assert_eq!(compressor.total_in, 7);
+        assert_eq!(compressor.total_out, compressed.len());
+
+        assert!(compressor.get_ref().is_empty());
+    }
+}
diff --git a/pingora-core/src/protocols/http/compression/mod.rs b/pingora-core/src/protocols/http/compression/mod.rs
new file mode 100644
index 0000000..a16c774
--- /dev/null
+++ b/pingora-core/src/protocols/http/compression/mod.rs
@@ -0,0 +1,612 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! HTTP response (de)compression libraries
+//!
+//! Brotli and Gzip are partially supported.
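+//!
+//! A minimal per-session sketch (hypothetical `req_header` and `tasks` values):
+//! ```ignore
+//! let mut ctx = ResponseCompressionCtx::new(6, false); // gzip level 6, no decompression
+//! ctx.request_filter(req_header); // record what the client accepts
+//! for task in tasks.iter_mut() {
+//!     ctx.response_filter(task); // rewrites headers / encodes body when applicable
+//! }
+//! ```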
+
+use super::HttpTask;
+
+use bytes::Bytes;
+use log::warn;
+use pingora_error::{ErrorType, Result};
+use pingora_http::{RequestHeader, ResponseHeader};
+use std::time::Duration;
+
+mod brotli;
+mod gzip;
+mod zstd;
+
+/// The type of error to return when (de)compression fails
+pub const COMPRESSION_ERROR: ErrorType = ErrorType::new("CompressionError");
+
+/// The trait for both compress and decompress because the interface and syntax are the same:
+/// encode some bytes to other bytes
+pub trait Encode {
+    /// Encode the input bytes. The `end` flag signals the end of the entire input. The `end` flag
+    /// helps the encoder to flush out the remaining buffered encoded data because certain compression
+    /// algorithms prefer to collect large enough data to compress all together.
+    fn encode(&mut self, input: &[u8], end: bool) -> Result<Bytes>;
+    /// Return the Encoder's name, the total input bytes, the total output bytes and the total
+    /// duration spent on encoding the data.
+    fn stat(&self) -> (&'static str, usize, usize, Duration);
+}
+
+/// The response compression object. Currently supports gzip compression and brotli decompression.
+///
+/// To use it, the caller should create a [`ResponseCompressionCtx`] per HTTP session.
+/// The caller should call the corresponding filters for the request header, response header and
+/// response body. If the algorithms are supported, the output response body will be encoded.
+/// The response header will be adjusted accordingly as well. If the algorithm is not supported
+/// or no encoding is needed, the response is untouched.
+///
+/// If configured, and the request's `accept-encoding` header contains a supported algorithm that
+/// the incoming response is not already encoded with, the filter will compress the response.
+/// If configured and supported, and the incoming response's `content-encoding` isn't one of the
+/// algorithms accepted by the request's `accept-encoding`, the ctx will decompress the response.
+///
+/// # Currently supported algorithms and actions
+/// - Brotli decompression: if the response is br compressed, this ctx can decompress it
+/// - Gzip compression: if the response is uncompressed, this ctx can compress it with gzip
+pub struct ResponseCompressionCtx(CtxInner);
+
+enum CtxInner {
+    HeaderPhase {
+        compression_level: u32,
+        decompress_enable: bool,
+        // Store the preferred list to compare with content-encoding
+        accept_encoding: Vec<Algorithm>,
+    },
+    BodyPhase(Option<Box<dyn Encode + Send + Sync>>),
+}
+
+impl ResponseCompressionCtx {
+    /// Create a new [`ResponseCompressionCtx`] with the expected compression level. `0` will disable
+    /// the compression.
+    /// The `decompress_enable` flag will tell the ctx to decompress if needed.
+    pub fn new(compression_level: u32, decompress_enable: bool) -> Self {
+        Self(CtxInner::HeaderPhase {
+            compression_level,
+            decompress_enable,
+            accept_encoding: Vec::new(),
+        })
+    }
+
+    /// Whether the encoder is enabled.
+    /// The enablement may change according to the request and response filters of this ctx.
+    pub fn is_enabled(&self) -> bool {
+        match &self.0 {
+            CtxInner::HeaderPhase {
+                compression_level,
+                decompress_enable,
+                accept_encoding: _,
+            } => *compression_level != 0 || *decompress_enable,
+            CtxInner::BodyPhase(c) => c.is_some(),
+        }
+    }
+
+    /// Return the stat of this ctx:
+    /// algorithm name, in bytes, out bytes, time taken for the compression
+    pub fn get_info(&self) -> Option<(&'static str, usize, usize, Duration)> {
+        match &self.0 {
+            CtxInner::HeaderPhase {
+                compression_level: _,
+                decompress_enable: _,
+                accept_encoding: _,
+            } => None,
+            CtxInner::BodyPhase(c) => c.as_ref().map(|c| c.stat()),
+        }
+    }
+
+    /// Adjust the compression level.
+    /// # Panic
+    /// This function will panic if it has already started encoding the response body.
+    pub fn adjust_level(&mut self, new_level: u32) {
+        match &mut self.0 {
+            CtxInner::HeaderPhase {
+                compression_level,
+                decompress_enable: _,
+                accept_encoding: _,
+            } => {
+                *compression_level = new_level;
+            }
+            CtxInner::BodyPhase(_) => panic!("Wrong phase: BodyPhase"),
+        }
+    }
+
+    /// Adjust the decompression flag.
+    /// # Panic
+    /// This function will panic if it has already started encoding the response body.
+    pub fn adjust_decompression(&mut self, enabled: bool) {
+        match &mut self.0 {
+            CtxInner::HeaderPhase {
+                compression_level: _,
+                decompress_enable,
+                accept_encoding: _,
+            } => {
+                *decompress_enable = enabled;
+            }
+            CtxInner::BodyPhase(_) => panic!("Wrong phase: BodyPhase"),
+        }
+    }
+
+    /// Feed the request header into this ctx.
+    pub fn request_filter(&mut self, req: &RequestHeader) {
+        if !self.is_enabled() {
+            return;
+        }
+        match &mut self.0 {
+            CtxInner::HeaderPhase {
+                compression_level: _,
+                decompress_enable: _,
+                accept_encoding,
+            } => parse_accept_encoding(
+                req.headers.get(http::header::ACCEPT_ENCODING),
+                accept_encoding,
+            ),
+            CtxInner::BodyPhase(_) => panic!("Wrong phase: BodyPhase"),
+        }
+    }
+
+    fn response_header_filter(&mut self, resp: &mut ResponseHeader, end: bool) {
+        match &self.0 {
+            CtxInner::HeaderPhase {
+                compression_level,
+                decompress_enable,
+                accept_encoding,
+            } => {
+                if resp.status.is_informational() {
+                    if resp.status == http::status::StatusCode::SWITCHING_PROTOCOLS {
+                        // no transformation for websocket (TODO: cite RFC)
+                        self.0 = CtxInner::BodyPhase(None);
+                    }
+                    // else, wait for the final response header for decision
+                    return;
+                }
+                // do nothing if no body
+                if end {
+                    self.0 = CtxInner::BodyPhase(None);
+                    return;
+                }
+
+                let action = decide_action(resp, accept_encoding);
+                let encoder = match action {
+                    Action::Noop => None,
+                    Action::Compress(algorithm) => algorithm.compressor(*compression_level),
+                    Action::Decompress(algorithm) => algorithm.decompressor(*decompress_enable),
+                };
+                if encoder.is_some() {
+                    adjust_response_header(resp, &action);
+                }
+                self.0 = CtxInner::BodyPhase(encoder);
+            }
+            CtxInner::BodyPhase(_) => panic!("Wrong phase: BodyPhase"),
+        }
+    }
+
+    fn response_body_filter(&mut self, data: Option<&Bytes>, end: bool) -> Option<Bytes> {
+        match &mut self.0 {
+            CtxInner::HeaderPhase {
+                compression_level: _,
+                decompress_enable: _,
+                accept_encoding: _,
+            } => panic!("Wrong phase: HeaderPhase"),
+            CtxInner::BodyPhase(compressor) => {
+                let result = compressor
+                    .as_mut()
+                    .map(|c| {
+                        // Feed even an empty slice to the compressor because it might yield data
+                        // when `end` is true
+                        let data = if let Some(b) = data { b.as_ref() } else { &[] };
+                        c.encode(data, end)
+                    })
+                    .transpose();
+                result.unwrap_or_else(|e| {
+                    warn!("Failed to compress, compression disabled, {}", e);
+                    // no point to transcode further data because bad data is already seen
+                    self.0 = CtxInner::BodyPhase(None);
+                    None
+                })
+            }
+        }
+    }
+
+    /// Feed the response into this ctx.
+    /// This filter will mutate the response accordingly if encoding is needed.
+    pub fn response_filter(&mut self, t: &mut HttpTask) {
+        if !self.is_enabled() {
+            return;
+        }
+        match t {
+            HttpTask::Header(resp, end) => self.response_header_filter(resp, *end),
+            HttpTask::Body(data, end) => {
+                let compressed = self.response_body_filter(data.as_ref(), *end);
+                if compressed.is_some() {
+                    *t = HttpTask::Body(compressed, *end);
+                }
+            }
+            HttpTask::Done => {
+                // try to finish/flush compression
+                let compressed = self.response_body_filter(None, true);
+                if compressed.is_some() {
+                    // compressor has more data to flush
+                    *t = HttpTask::Body(compressed, true);
+                }
+            }
+            _ => { /* Trailer, Failed: do nothing? */ }
+        }
+    }
+}
+
+#[derive(Debug, PartialEq, Eq, Clone, Copy)]
+enum Algorithm {
+    Any, // the "*"
+    Gzip,
+    Brotli,
+    Zstd,
+    // TODO: Identify,
+    // TODO: Deflate
+    Other, // anything unknown
+}
+
+impl Algorithm {
+    pub fn as_str(&self) -> &'static str {
+        match self {
+            Algorithm::Gzip => "gzip",
+            Algorithm::Brotli => "br",
+            Algorithm::Zstd => "zstd",
+            Algorithm::Any => "*",
+            Algorithm::Other => "other",
+        }
+    }
+
+    pub fn compressor(&self, level: u32) -> Option<Box<dyn Encode + Send + Sync>> {
+        if level == 0 {
+            None
+        } else {
+            match self {
+                Self::Gzip => Some(Box::new(gzip::Compressor::new(level))),
+                Self::Brotli => Some(Box::new(brotli::Compressor::new(level))),
+                Self::Zstd => Some(Box::new(zstd::Compressor::new(level))),
+                _ => None, // not implemented
+            }
+        }
+    }
+
+    pub fn decompressor(&self, enabled: bool) -> Option<Box<dyn Encode + Send + Sync>> {
+        if !enabled {
+            None
+        } else {
+            match self {
+                Self::Brotli => Some(Box::new(brotli::Decompressor::new())),
+                _ => None, // not implemented
+            }
+        }
+    }
+}
+
+impl From<&str> for Algorithm {
+    fn from(s: &str) -> Self {
+        use unicase::UniCase;
+
+        let coding = UniCase::new(s);
+        if coding == UniCase::ascii("gzip") {
+            Algorithm::Gzip
+        } else if coding == UniCase::ascii("br") {
+            Algorithm::Brotli
+        } else if coding == UniCase::ascii("zstd") {
+            Algorithm::Zstd
+        } else if s.is_empty() {
+            Algorithm::Any
+        } else {
+            Algorithm::Other
+        }
+    }
+}
+
+#[derive(Debug, PartialEq, Eq, Clone, Copy)]
+enum Action {
+    Noop, // do nothing, e.g. when the input is already gzip
+    Compress(Algorithm),
+    Decompress(Algorithm),
+}
+
+// parse the Accept-Encoding header and put the algorithms into the list
+fn parse_accept_encoding(accept_encoding: Option<&http::HeaderValue>, list: &mut Vec<Algorithm>) {
+    // https://www.rfc-editor.org/rfc/rfc9110#name-accept-encoding
+    if let Some(ac) = accept_encoding {
+        // fast path
+        if ac.as_bytes() == b"gzip" {
+            list.push(Algorithm::Gzip);
+            return;
+        }
+        // properly parse AC header
+        match sfv::Parser::parse_list(ac.as_bytes()) {
+            Ok(parsed) => {
+                for item in parsed {
+                    if let sfv::ListEntry::Item(i) = item {
+                        if let Some(s) = i.bare_item.as_token() {
+                            // TODO: support q value
+                            let algorithm = Algorithm::from(s);
+                            // ignore algorithms that we don't understand
+                            if algorithm != Algorithm::Other {
+                                list.push(Algorithm::from(s));
+                            }
+                        }
+                    }
+                }
+            }
+            Err(e) => {
+                warn!("Failed to parse accept-encoding {ac:?}, {e}")
+            }
+        }
+    } else {
+        // "If no Accept-Encoding header, any content coding is acceptable"
+        // keep the list empty
+    }
+}
+
+#[test]
+fn test_accept_encoding_req_header() {
+    let mut header = RequestHeader::build("GET", b"/", None).unwrap();
+    let mut ac_list = Vec::new();
+    parse_accept_encoding(
+        header.headers.get(http::header::ACCEPT_ENCODING),
+        &mut ac_list,
+    );
+    assert!(ac_list.is_empty());
+
+    let mut ac_list = Vec::new();
+    header.insert_header("accept-encoding", "gzip").unwrap();
+    parse_accept_encoding(
+        header.headers.get(http::header::ACCEPT_ENCODING),
+        &mut ac_list,
+    );
+    assert_eq!(ac_list[0], Algorithm::Gzip);
+
+    let mut ac_list = Vec::new();
+    header
+        .insert_header("accept-encoding", "what, br, gzip")
+        .unwrap();
+    parse_accept_encoding(
+        header.headers.get(http::header::ACCEPT_ENCODING),
+        &mut ac_list,
+    );
+    assert_eq!(ac_list[0], Algorithm::Brotli);
+    assert_eq!(ac_list[1], Algorithm::Gzip);
+}
+
+// filter response header to see if (de)compression is needed
+fn decide_action(resp: &ResponseHeader, accept_encoding: &[Algorithm]) -> Action {
+    use http::header::CONTENT_ENCODING;
+
+    let content_encoding = if let Some(ce) = resp.headers.get(CONTENT_ENCODING) {
+        // https://www.rfc-editor.org/rfc/rfc9110#name-content-encoding
+        if let Ok(ce_str) = std::str::from_utf8(ce.as_bytes()) {
+            Some(Algorithm::from(ce_str))
+        } else {
+            // not utf-8, treat it as unknown encoding to leave it untouched
+            Some(Algorithm::Other)
+        }
+    } else {
+        // no Content-Encoding
+        None
+    };
+
+    if let Some(ce) = content_encoding {
+        if accept_encoding.contains(&ce) {
+            // downstream can accept this encoding, nothing to do
+            Action::Noop
+        } else {
+            // always decompress because uncompressed is always acceptable
+            // https://www.rfc-editor.org/rfc/rfc9110#field.accept-encoding
+            // "If the representation has no content coding, then it is acceptable by default
+            // unless specifically excluded..." TODO: check the exclude case
+            // TODO: we could also transcode it to a preferred encoding, e.g. br->gzip
+            Action::Decompress(ce)
+        }
+    } else if accept_encoding.is_empty() // both CE and AE are empty
+        || !compressible(resp) // the type is not compressible
+        || accept_encoding[0] == Algorithm::Any
+    {
+        Action::Noop
+    } else {
+        // try to compress with the first AC
+        // TODO: support configuring the preferred encoding
+        Action::Compress(accept_encoding[0])
+    }
+}
+
+#[test]
+fn test_decide_action() {
+    use Action::*;
+    use Algorithm::*;
+
+    let header = ResponseHeader::build(200, None).unwrap();
+    // no compression asked, no compression needed
+    assert_eq!(decide_action(&header, &[]), Noop);
+
+    // already gzip, no compression needed
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-type", "text/html").unwrap();
+    header.insert_header("content-encoding", "gzip").unwrap();
+    assert_eq!(decide_action(&header, &[Gzip]), Noop);
+
+    // already gzip, no compression needed, upper case
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-encoding", "GzIp").unwrap();
+    header.insert_header("content-type", "text/html").unwrap();
+    assert_eq!(decide_action(&header, &[Gzip]), Noop);
+
+    // no encoding, compression needed, accepted content-type, large enough
+    // Will compress
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-length", "20").unwrap();
+    header.insert_header("content-type", "text/html").unwrap();
+    assert_eq!(decide_action(&header, &[Gzip]), Compress(Gzip));
+
+    // too small
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-length", "19").unwrap();
+    header.insert_header("content-type", "text/html").unwrap();
+    assert_eq!(decide_action(&header, &[Gzip]), Noop);
+
+    // already compressed MIME
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-length", "20").unwrap();
+    header
+        .insert_header("content-type", "text/html+zip")
+        .unwrap();
+    assert_eq!(decide_action(&header, &[Gzip]), Noop);
+
+    // unsupported MIME
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-length", "20").unwrap();
+    header.insert_header("content-type", "image/jpg").unwrap();
+    assert_eq!(decide_action(&header, &[Gzip]), Noop);
+
+    // compressed, need decompress
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-encoding", "gzip").unwrap();
+    assert_eq!(decide_action(&header, &[]), Decompress(Gzip));
+
+    // accept-encoding different, need decompress
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-encoding", "gzip").unwrap();
+    assert_eq!(decide_action(&header, &[Brotli]), Decompress(Gzip));
+
+    // less preferred but no need to decompress
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-encoding", "gzip").unwrap();
+    assert_eq!(decide_action(&header, &[Brotli, Gzip]), Noop);
+}
+
+use once_cell::sync::Lazy;
+use regex::Regex;
+
+// Allow text, application, font, a few image/ MIME types and binary/octet-stream
+// TODO: fine tune this list
+static MIME_CHECK: Lazy<Regex> = Lazy::new(|| {
+    Regex::new(r"^(?:text/|application/|font/|image/(?:x-icon|svg\+xml|vnd\.microsoft\.icon)|binary/octet-stream)")
+        .unwrap()
+});
+
+// check if the response mime type is compressible
+fn compressible(resp: &ResponseHeader) -> bool {
+    // arbitrary size limit, things to consider
+    // 1. a body that is too short may have little redundancy to compress
+    // 2. gzip header and footer overhead
+    // 3. latency is the same as long as data fits in a TCP congestion window regardless of size
+    const MIN_COMPRESS_LEN: usize = 20;
+
+    // check if response is too small to compress
+    if let Some(cl) = resp.headers.get(http::header::CONTENT_LENGTH) {
+        if let Some(cl_num) = std::str::from_utf8(cl.as_bytes())
+            .ok()
+            .and_then(|v| v.parse::<usize>().ok())
+        {
+            if cl_num < MIN_COMPRESS_LEN {
+                return false;
+            }
+        }
+    }
+    // no Content-Length or large enough, check content-type next
+    if let Some(ct) = resp.headers.get(http::header::CONTENT_TYPE) {
+        if let Ok(ct_str) = std::str::from_utf8(ct.as_bytes()) {
+            if ct_str.contains("zip") {
+                // heuristic: don't compress MIME types that contain "zip"
+                false
+            } else {
+                // check if the mime type is in the allow list
+                MIME_CHECK.find(ct_str).is_some()
+            }
+        } else {
+            false // invalid CT header, don't compress
+        }
+    } else {
+        false // don't compress empty content-type
+    }
+}
+
+fn adjust_response_header(resp: &mut ResponseHeader, action: &Action) {
+    use http::header::{HeaderValue, CONTENT_ENCODING, CONTENT_LENGTH, TRANSFER_ENCODING};
+
+    fn set_stream_headers(resp: &mut ResponseHeader) {
+        // because the transcoding is streamed, the content length is not known ahead of time
+        resp.remove_header(&CONTENT_LENGTH);
+        // we stream the body now, TODO: chunked is for h1 only
+        resp.insert_header(&TRANSFER_ENCODING, HeaderValue::from_static("chunked"))
+            .unwrap();
+    }
+
+    match action {
+        Action::Noop => { /* do nothing */ }
+        Action::Decompress(_) => {
+            resp.remove_header(&CONTENT_ENCODING);
+            set_stream_headers(resp)
+        }
+        Action::Compress(a) => {
+            resp.insert_header(&CONTENT_ENCODING, HeaderValue::from_static(a.as_str()))
+                .unwrap();
+            set_stream_headers(resp)
+        }
+    }
+}
+
+#[test]
+fn test_adjust_response_header() {
+    use Action::*;
+    use Algorithm::*;
+
+    // noop
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-length", "20").unwrap();
+    header.insert_header("content-encoding", "gzip").unwrap();
+    adjust_response_header(&mut header, &Noop);
+    assert_eq!(
+        header.headers.get("content-encoding").unwrap().as_bytes(),
+        b"gzip"
+    );
+    assert_eq!(
+        header.headers.get("content-length").unwrap().as_bytes(),
+        b"20"
+    );
+    assert!(header.headers.get("transfer-encoding").is_none());
+
+    // decompress gzip
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-length", "20").unwrap();
+    header.insert_header("content-encoding", "gzip").unwrap();
+    adjust_response_header(&mut header, &Decompress(Gzip));
+    assert!(header.headers.get("content-encoding").is_none());
+    assert!(header.headers.get("content-length").is_none());
+    assert_eq!(
+        header.headers.get("transfer-encoding").unwrap().as_bytes(),
+        b"chunked"
+    );
+
+    // compress
+    let mut header = ResponseHeader::build(200, None).unwrap();
+    header.insert_header("content-length", "20").unwrap();
+    adjust_response_header(&mut header, &Compress(Gzip));
+    assert_eq!(
+        header.headers.get("content-encoding").unwrap().as_bytes(),
+        b"gzip"
+    );
+    assert!(header.headers.get("content-length").is_none());
+    assert_eq!(
+        header.headers.get("transfer-encoding").unwrap().as_bytes(),
+        b"chunked"
+    );
+}
diff --git a/pingora-core/src/protocols/http/compression/zstd.rs b/pingora-core/src/protocols/http/compression/zstd.rs
new file mode 100644
index 0000000..88a2bfc
--- /dev/null
+++ b/pingora-core/src/protocols/http/compression/zstd.rs
@@ -0,0 +1,91 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use super::{Encode, COMPRESSION_ERROR};
+use bytes::Bytes;
+use parking_lot::Mutex;
+use pingora_error::{OrErr, Result};
+use std::io::Write;
+use std::time::{Duration, Instant};
+use zstd::stream::write::Encoder;
+
+pub struct Compressor {
+    compress: Mutex<Encoder<'static, Vec<u8>>>,
+    total_in: usize,
+    total_out: usize,
+    duration: Duration,
+}
+
+impl Compressor {
+    pub fn new(level: u32) -> Self {
+        Compressor {
+            // Mutex because Encoder is not Sync
+            // https://github.com/gyscos/zstd-rs/issues/186
+            compress: Mutex::new(Encoder::new(vec![], level as i32).unwrap()),
+            total_in: 0,
+            total_out: 0,
+            duration: Duration::new(0, 0),
+        }
+    }
+}
+
+impl Encode for Compressor {
+    fn encode(&mut self, input: &[u8], end: bool) -> Result<Bytes> {
+        // reserve at most 16k
+        const MAX_INIT_COMPRESSED_BUF_SIZE: usize = 16 * 1024;
+        let start = Instant::now();
+        self.total_in += input.len();
+        let mut compress = self.compress.lock();
+        // reserve at most input size, cap at 16k, compressed output should be smaller
+        compress
+            .get_mut()
+            .reserve(std::cmp::min(MAX_INIT_COMPRESSED_BUF_SIZE, input.len()));
+        compress
+            .write_all(input)
+            .or_err(COMPRESSION_ERROR, "while compress zstd")?;
+        // write to vec will never fail.
+        if end {
+            compress
+                .do_finish()
+                .or_err(COMPRESSION_ERROR, "while compress zstd")?;
+        }
+        self.total_out += compress.get_ref().len();
+        self.duration += start.elapsed();
+        Ok(std::mem::take(compress.get_mut()).into()) // into() Bytes will drop excess capacity
+    }
+
+    fn stat(&self) -> (&'static str, usize, usize, Duration) {
+        ("zstd", self.total_in, self.total_out, self.duration)
+    }
+}
+
+#[cfg(test)]
+mod tests_stream {
+    use super::*;
+
+    #[test]
+    fn compress_zstd_data() {
+        let mut compressor = Compressor::new(11);
+        let input = b"adcdefgabcdefghadcdefgabcdefghadcdefgabcdefghadcdefgabcdefgh\n";
+        let compressed = compressor.encode(&input[..], false).unwrap();
+        // waiting for more data
+        assert!(compressed.is_empty());
+
+        let compressed = compressor.encode(&input[..], true).unwrap();
+
+        // the zstd Magic_Number
+        assert_eq!(&compressed[..4], &[0x28, 0xB5, 0x2F, 0xFD]);
+        assert!(compressed.len() < input.len());
+    }
+}
diff --git a/pingora-core/src/protocols/http/date.rs b/pingora-core/src/protocols/http/date.rs
new file mode 100644
index 0000000..4b15c4e
--- /dev/null
+++ b/pingora-core/src/protocols/http/date.rs
@@ -0,0 +1,90 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use chrono::NaiveDateTime;
+use http::header::HeaderValue;
+use std::cell::RefCell;
+use std::time::{Duration, SystemTime};
+
+fn to_date_string(epoch_sec: i64) -> String {
+    let dt = NaiveDateTime::from_timestamp_opt(epoch_sec, 0).unwrap();
+    dt.format("%a, %d %b %Y %H:%M:%S GMT").to_string()
+}
+
+struct CacheableDate {
+    h1_date: HeaderValue,
+    epoch: Duration,
+}
+
+impl CacheableDate {
+    pub fn new() -> Self {
+        let d = SystemTime::now()
+            .duration_since(SystemTime::UNIX_EPOCH)
+            .unwrap();
+        CacheableDate {
+            h1_date: HeaderValue::from_str(&to_date_string(d.as_secs() as i64)).unwrap(),
+            epoch: d,
+        }
+    }
+
+    pub fn update(&mut self, d_now: Duration) {
+        if d_now.as_secs() != self.epoch.as_secs() {
+            self.epoch = d_now;
+            self.h1_date = HeaderValue::from_str(&to_date_string(d_now.as_secs() as i64)).unwrap();
+        }
+    }
+
+    pub fn get_date(&mut self) -> HeaderValue {
+        let d = SystemTime::now()
+            .duration_since(SystemTime::UNIX_EPOCH)
+            .unwrap();
+        self.update(d);
+        self.h1_date.clone()
+    }
+}
+
+thread_local! {
+    static CACHED_DATE: RefCell<CacheableDate>
+        = RefCell::new(CacheableDate::new());
+}
+
+pub fn get_cached_date() -> HeaderValue {
+    CACHED_DATE.with(|cache_date| (*cache_date.borrow_mut()).get_date())
+}
+
+#[cfg(test)]
+mod test {
+    use super::*;
+
+    fn now_date_header() -> HeaderValue {
+        HeaderValue::from_str(&to_date_string(
+            SystemTime::now()
+                .duration_since(SystemTime::UNIX_EPOCH)
+                .unwrap()
+                .as_secs() as i64,
+        ))
+        .unwrap()
+    }
+
+    #[test]
+    fn test_date_string() {
+        let date_str = to_date_string(1);
+        assert_eq!("Thu, 01 Jan 1970 00:00:01 GMT", date_str);
+    }
+
+    #[test]
+    fn test_date_cached() {
+        assert_eq!(get_cached_date(), now_date_header());
+    }
+}
diff --git a/pingora-core/src/protocols/http/error_resp.rs b/pingora-core/src/protocols/http/error_resp.rs
new file mode 100644
index 0000000..acf5d2e
--- /dev/null
+++ b/pingora-core/src/protocols/http/error_resp.rs
@@ -0,0 +1,41 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Error response generating utilities.
+
+use http::header;
+use once_cell::sync::Lazy;
+use pingora_http::ResponseHeader;
+
+use super::SERVER_NAME;
+
+/// Generate an error response with the given status code.
+///
+/// This error response has a zero `Content-Length` and `Cache-Control: private, no-store`.
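+///
+/// A minimal sketch (names from this module):
+/// ```ignore
+/// let resp = gen_error_response(404);
+/// assert_eq!(resp.status.as_u16(), 404);
+/// assert_eq!(resp.headers.get("content-length").unwrap().as_bytes(), b"0");
+/// ```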
+pub fn gen_error_response(code: u16) -> ResponseHeader {
+    let mut resp = ResponseHeader::build(code, Some(4)).unwrap();
+    resp.insert_header(header::SERVER, &SERVER_NAME[..])
+        .unwrap();
+    resp.insert_header(header::DATE, "Sun, 06 Nov 1994 08:49:37 GMT")
+        .unwrap(); // placeholder
+    resp.insert_header(header::CONTENT_LENGTH, "0").unwrap();
+    resp.insert_header(header::CACHE_CONTROL, "private, no-store")
+        .unwrap();
+    resp
+}
+
+/// Pre-generated 502 response
+pub static HTTP_502_RESPONSE: Lazy<ResponseHeader> = Lazy::new(|| gen_error_response(502));
+/// Pre-generated 400 response
+pub static HTTP_400_RESPONSE: Lazy<ResponseHeader> = Lazy::new(|| gen_error_response(400));
diff --git a/pingora-core/src/protocols/http/mod.rs b/pingora-core/src/protocols/http/mod.rs
new file mode 100644
index 0000000..d5e8ee9
--- /dev/null
+++ b/pingora-core/src/protocols/http/mod.rs
@@ -0,0 +1,57 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! HTTP/1.x and HTTP/2 implementation APIs
+
+mod body_buffer;
+pub mod client;
+pub mod compression;
+pub(crate) mod date;
+pub mod error_resp;
+pub mod server;
+pub mod v1;
+pub mod v2;
+
+pub use server::Session as ServerSession;
+
+/// The Pingora server name string
+pub const SERVER_NAME: &[u8; 7] = b"Pingora";
+
+/// An enum to hold all possible HTTP response events.
+#[derive(Debug)]
+pub enum HttpTask {
+    /// The response header and the boolean end-of-response flag
+    Header(Box<pingora_http::ResponseHeader>, bool),
+    /// A piece of the response body and the end-of-response boolean flag
+    Body(Option<bytes::Bytes>, bool),
+    /// HTTP response trailer
+    Trailer(Option<Box<http::HeaderMap>>),
+    /// Signal that the response is already finished
+    Done,
+    /// Signal that reading the response encountered errors.
+    Failed(pingora_error::BError),
+}
+
+impl HttpTask {
+    /// Whether this [`HttpTask`] means the end of the response
+    pub fn is_end(&self) -> bool {
+        match self {
+            HttpTask::Header(_, end) => *end,
+            HttpTask::Body(_, end) => *end,
+            HttpTask::Trailer(_) => true,
+            HttpTask::Done => true,
+            HttpTask::Failed(_) => true,
+        }
+    }
+}
diff --git a/pingora-core/src/protocols/http/server.rs b/pingora-core/src/protocols/http/server.rs
new file mode 100644
index 0000000..1f84997
--- /dev/null
+++ b/pingora-core/src/protocols/http/server.rs
@@ -0,0 +1,333 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! HTTP server session APIs
+
+use super::error_resp;
+use super::v1::server::HttpSession as SessionV1;
+use super::v2::server::HttpSession as SessionV2;
+use super::HttpTask;
+use crate::protocols::Stream;
+use bytes::Bytes;
+use http::header::AsHeaderName;
+use http::HeaderValue;
+use log::error;
+use pingora_error::Result;
+use pingora_http::{RequestHeader, ResponseHeader};
+
+/// HTTP server session object for both HTTP/1.x and HTTP/2
+pub enum Session {
+    H1(SessionV1),
+    H2(SessionV2),
+}
+
+impl Session {
+    /// Create a new [`Session`] from an established connection for HTTP/1.x
+    pub fn new_http1(stream: Stream) -> Self {
+        Self::H1(SessionV1::new(stream))
+    }
+
+    /// Create a new [`Session`] from an established HTTP/2 stream
+    pub fn new_http2(session: SessionV2) -> Self {
+        Self::H2(session)
+    }
+
+    /// Whether the session is HTTP/2. If not, it is HTTP/1.x
+    pub fn is_http2(&self) -> bool {
+        matches!(self, Self::H2(_))
+    }
+
+    /// Read the request header. This method must be called before doing anything
+    /// else with the session.
+    /// - `Ok(true)`: successful
+    /// - `Ok(false)`: the client exited without sending any bytes. This is normal on a reused
+    ///   connection. In this case the user should give up this session.
+    pub async fn read_request(&mut self) -> Result<bool> {
+        match self {
+            Self::H1(s) => {
+                let read = s.read_request().await?;
+                Ok(read.is_some())
+            }
+            // This call will always return `Ok(true)` for Http2 because the request is already read
+            Self::H2(_) => Ok(true),
+        }
+    }
+
+    /// Return the request header it just read.
+    /// # Panic
+    /// This function will panic if [`Self::read_request()`] is not called.
+    pub fn req_header(&self) -> &RequestHeader {
+        match self {
+            Self::H1(s) => s.req_header(),
+            Self::H2(s) => s.req_header(),
+        }
+    }
+
+    /// Return a mutable reference to the request header it just read.
+    /// # Panic
+    /// This function will panic if [`Self::read_request()`] is not called.
+    pub fn req_header_mut(&mut self) -> &mut RequestHeader {
+        match self {
+            Self::H1(s) => s.req_header_mut(),
+            Self::H2(s) => s.req_header_mut(),
+        }
+    }
+
+    /// Return the header by name. None if the header doesn't exist.
+    ///
+    /// In case there are multiple headers under the same name, the first one will be returned. To
+    /// get all the headers: use `self.req_header().headers.get_all()`.
+    pub fn get_header<K: AsHeaderName>(&self, key: K) -> Option<&HeaderValue> {
+        self.req_header().headers.get(key)
+    }
+
+    /// Get the header value in its raw format.
+    /// If the header doesn't exist, return an empty slice.
+    pub fn get_header_bytes<K: AsHeaderName>(&self, key: K) -> &[u8] {
+        self.get_header(key).map_or(b"", |v| v.as_bytes())
+    }
+
+    /// Read the request body. Ok(None) if there is no (more) body to read.
+    pub async fn read_request_body(&mut self) -> Result<Option<Bytes>> {
+        match self {
+            Self::H1(s) => s.read_body_bytes().await,
+            Self::H2(s) => s.read_body_bytes().await,
+        }
+    }
+
+    /// Write the response header to the client.
+    /// Informational headers (status code 100-199, excluding 101) can be written multiple times
+    /// before the final response header (status code 200+ or 101) is written.
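+    ///
+    /// A minimal sketch (hypothetical `session` that has already called `read_request()`):
+    /// ```ignore
+    /// let resp = ResponseHeader::build(200, None).unwrap();
+    /// session.write_response_header(Box::new(resp)).await?;
+    /// ```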
+    pub async fn write_response_header(&mut self, resp: Box<ResponseHeader>) -> Result<()> {
+        match self {
+            Self::H1(s) => {
+                s.write_response_header(resp).await?;
+                Ok(())
+            }
+            Self::H2(s) => s.write_response_header(resp, false),
+        }
+    }
+
+    /// Similar to `write_response_header()`, but this fn will clone the `resp` internally
+    pub async fn write_response_header_ref(&mut self, resp: &ResponseHeader) -> Result<()> {
+        match self {
+            Self::H1(s) => {
+                s.write_response_header_ref(resp).await?;
+                Ok(())
+            }
+            Self::H2(s) => s.write_response_header_ref(resp, false),
+        }
+    }
+
+    /// Write the response body to the client
+    pub async fn write_response_body(&mut self, data: Bytes) -> Result<()> {
+        match self {
+            Self::H1(s) => {
+                s.write_body(&data).await?;
+                Ok(())
+            }
+            Self::H2(s) => s.write_body(data, false),
+        }
+    }
+
+    /// Finish the life of this request.
+    /// For H1, if connection reuse is supported, a Some(Stream) will be returned, otherwise None.
+    /// For H2, always return None because an H2 stream is not reusable.
+    pub async fn finish(self) -> Result<Option<Stream>> {
+        match self {
+            Self::H1(mut s) => {
+                // need to flush body due to buffering
+                s.finish_body().await?;
+                Ok(s.reuse().await)
+            }
+            Self::H2(mut s) => {
+                s.finish()?;
+                Ok(None)
+            }
+        }
+    }
+
+    pub async fn response_duplex_vec(&mut self, tasks: Vec<HttpTask>) -> Result<bool> {
+        match self {
+            Self::H1(s) => s.response_duplex_vec(tasks).await,
+            Self::H2(s) => s.response_duplex_vec(tasks),
+        }
+    }
+
+    /// Set connection reuse. `duration` defines how long the connection is kept open for the next
+    /// request to reuse. Noop for h2
+    pub fn set_keepalive(&mut self, duration: Option<u64>) {
+        match self {
+            Self::H1(s) => s.set_server_keepalive(duration),
+            Self::H2(_) => {}
+        }
+    }
+
+    /// Return a digest of the request including the method, path and Host header
+    // TODO: make this use a `Formatter`
+    pub fn request_summary(&self) -> String {
+        match self {
+            Self::H1(s) => s.request_summary(),
+            Self::H2(s) => s.request_summary(),
+        }
+    }
+
+    /// Return the written response header. `None` if it is not written yet.
+    /// Only the final (status code >= 200 or 101) response header will be returned
+    pub fn response_written(&self) -> Option<&ResponseHeader> {
+        match self {
+            Self::H1(s) => s.response_written(),
+            Self::H2(s) => s.response_written(),
+        }
+    }
+
+    /// Give up the HTTP session abruptly.
+    /// For H1 this will close the underlying connection.
+    /// For H2 this will send a RST_STREAM frame to end this stream without impacting the connection.
+    pub async fn shutdown(&mut self) {
+        match self {
+            Self::H1(s) => s.shutdown().await,
+            Self::H2(s) => s.shutdown(),
+        }
+    }
+
+    pub fn to_h1_raw(&self) -> Bytes {
+        match self {
+            Self::H1(s) => s.get_headers_raw_bytes(),
+            Self::H2(s) => s.pseudo_raw_h1_request_header(),
+        }
+    }
+
+    /// Whether the whole request body is sent
+    pub fn is_body_done(&mut self) -> bool {
+        match self {
+            Self::H1(s) => s.is_body_done(),
+            Self::H2(s) => s.is_body_done(),
+        }
+    }
+
+    /// Notify the client that the entire body has been sent:
+    /// for H1 chunked encoding, this will end the last empty chunk;
+    /// for H1 content-length, this has no effect;
+    /// for H2, this will send an empty DATA frame with the END_STREAM flag
+    pub async fn finish_body(&mut self) -> Result<()> {
+        match self {
+            Self::H1(s) => s.finish_body().await.map(|_| ()),
+            Self::H2(s) => s.finish(),
+        }
+    }
+
+    /// Send an error response to the client
+    pub async fn respond_error(&mut self, error: u16) {
+        let resp = match error {
+            /* common error responses are pre-generated */
+            502 => error_resp::HTTP_502_RESPONSE.clone(),
+            400 => error_resp::HTTP_400_RESPONSE.clone(),
+            _ => error_resp::gen_error_response(error),
+        };
+
+        // TODO: we shouldn't be closing downstream connections on internally generated errors
+        // and possibly other upstream connect() errors (connection refused, timeout, etc)
+        //
+        // This change is only here because we DO NOT re-use downstream connections
+        // today on these errors and we should signal to the client that pingora is dropping it
+        // rather than misleading the client with 'keep-alive'
+        self.set_keepalive(None);
+
+        self.write_response_header(Box::new(resp))
+            .await
+            .unwrap_or_else(|e| {
+                error!("failed to send error response to downstream: {e}");
+            });
+    }
+
+    /// Whether there is no request body
+    pub fn is_body_empty(&mut self) -> bool {
+        match self {
+            Self::H1(s) => s.is_body_empty(),
+            Self::H2(s) => s.is_body_empty(),
+        }
+    }
+
+    pub fn retry_buffer_truncated(&self) -> bool {
+        match self {
+            Self::H1(s) => s.retry_buffer_truncated(),
+            Self::H2(s) => s.retry_buffer_truncated(),
+        }
+    }
+
+    pub fn enable_retry_buffering(&mut self) {
+        match self {
+            Self::H1(s) => s.enable_retry_buffering(),
+            Self::H2(s) => s.enable_retry_buffering(),
+        }
+    }
+
+    pub fn get_retry_buffer(&self) -> Option<Bytes> {
+        match self {
+            Self::H1(s) => s.get_retry_buffer(),
+            Self::H2(s) => s.get_retry_buffer(),
+        }
+    }
+
+    /// Read the body (same as `read_request_body()`) or pend forever until the downstream
+    /// terminates the session.
+    pub async fn read_body_or_idle(&mut self, no_body_expected: bool) -> Result<Option<Bytes>> {
+        match self {
+            Self::H1(s) => s.read_body_or_idle(no_body_expected).await,
+            Self::H2(s) => s.read_body_or_idle(no_body_expected).await,
+        }
+    }
+
+    pub fn as_http1(&self) -> Option<&SessionV1> {
+        match self {
+            Self::H1(s) => Some(s),
+            Self::H2(_) => None,
+        }
+    }
+
+    pub fn as_http2(&self) -> Option<&SessionV2> {
+        match self {
+            Self::H1(_) => None,
+            Self::H2(s) => Some(s),
+        }
+    }
+
+    /// Write a 100 Continue response to the client.
+    pub async fn write_continue_response(&mut self) -> Result<()> {
+        match self {
+            Self::H1(s) => s.write_continue_response().await,
+            Self::H2(s) => s.write_response_header(
+                Box::new(ResponseHeader::build(100, Some(0)).unwrap()),
+                false,
+            ),
+        }
+    }
+
+    /// Whether this request is for upgrade (e.g., websocket)
+    pub fn is_upgrade_req(&self) -> bool {
+        match self {
+            Self::H1(s) => s.is_upgrade_req(),
+            Self::H2(_) => false,
+        }
+    }
+
+    /// How many response body bytes have already been sent
+    pub fn body_bytes_sent(&self) -> usize {
+        match self {
+            Self::H1(s) => s.body_bytes_sent(),
+            Self::H2(s) => s.body_bytes_sent(),
+        }
+    }
+}
diff --git a/pingora-core/src/protocols/http/v1/body.rs b/pingora-core/src/protocols/http/v1/body.rs
new file mode 100644
index 0000000..81c8b22
--- /dev/null
+++ b/pingora-core/src/protocols/http/v1/body.rs
@@ -0,0 +1,1015 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use bytes::{Buf, BufMut, Bytes, BytesMut};
+use log::{debug, trace, warn};
+use pingora_error::{
+    Error,
+    ErrorType::{self, *},
+    OrErr, Result,
+};
+use std::fmt::Debug;
+use tokio::io::{AsyncRead, AsyncReadExt, AsyncWrite, AsyncWriteExt};
+
+use crate::protocols::l4::stream::AsyncWriteVec;
+use crate::utils::BufRef;
+
+// TODO: make this dynamically adjusted
+const BODY_BUFFER_SIZE: usize = 1024 * 64;
+// limit how much incomplete chunk-size and chunk-ext to buffer
+const PARTIAL_CHUNK_HEAD_LIMIT: usize = 1024 * 8;
+
+const LAST_CHUNK: &[u8; 5] = b"0\r\n\r\n";
+
+pub const INVALID_CHUNK: ErrorType = ErrorType::new("InvalidChunk");
+pub const PREMATURE_BODY_END: ErrorType = ErrorType::new("PrematureBodyEnd");
+
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub enum ParseState {
+    ToStart,
+    Complete(usize),                     // total size
+    Partial(usize, usize),               // size read, remaining size
+    Chunked(usize, usize, usize, usize), // size read, next to read in current buf start, read in current buf start, remaining chunked size to read from IO
+    Done(usize),                         // done but there was an error, size read
+    HTTP1_0(usize),                      // read until connection closed, size read
+}
+
+type PS = ParseState;
+
+impl ParseState {
+    pub fn finish(&self, additional_bytes: usize) -> Self {
+        match self {
+            PS::Partial(read, to_read) => PS::Complete(read + to_read),
+            PS::Chunked(read, _, _, _) => PS::Complete(read + additional_bytes),
+            PS::HTTP1_0(read) => PS::Complete(read + additional_bytes),
+            _ => self.clone(), /* invalid transaction */
+        }
+    }
+
+    pub fn done(&self, additional_bytes: usize) -> Self {
+        match self {
+            PS::Partial(read, _) => PS::Done(read + additional_bytes),
+            PS::Chunked(read, _, _, _) => PS::Done(read + additional_bytes),
+            PS::HTTP1_0(read) => PS::Done(read + additional_bytes),
+            _ => self.clone(), /* invalid transaction */
+        }
+    }
+
+    pub fn partial_chunk(&self, bytes_read: usize, bytes_to_read: usize) -> Self {
+        match self {
+            PS::Chunked(read, _, _, _) => PS::Chunked(read + bytes_read, 0, 0, bytes_to_read),
+            _ => self.clone(), /* invalid transaction */
+        }
+    }
+
+    pub fn multi_chunk(&self, bytes_read: usize, buf_start_index: usize) -> Self {
+        match self {
+            PS::Chunked(read, _, buf_end, _) => {
+                PS::Chunked(read + bytes_read, buf_start_index, *buf_end, 0)
+            }
+            _ => self.clone(), /* invalid transaction */
+        }
+    }
+
+    pub fn partial_chunk_head(&self, head_end: usize, head_size: usize) -> Self {
+        match self {
+            /* inform reader to read more to form a legal chunk */
+            PS::Chunked(read, _, _, _) => PS::Chunked(*read, 0, head_end, head_size),
+            _ => self.clone(), /* invalid transaction */
+        }
+    }
+
+    pub fn new_buf(&self, buf_end: usize) -> Self {
+        match self {
+            PS::Chunked(read, _, _, _) => PS::Chunked(*read, 0, buf_end, 0),
+            _ => self.clone(), /* invalid transaction */
+        }
+    }
+}
+
+pub struct BodyReader {
+    pub body_state: ParseState,
+    pub body_buf: Option<BytesMut>,
+    pub body_buf_size: usize,
+    rewind_buf_len: usize,
+}
+
+impl BodyReader {
+    pub fn new() -> Self {
+        BodyReader {
+            body_state: PS::ToStart,
+            body_buf: None,
+            body_buf_size: BODY_BUFFER_SIZE,
+            rewind_buf_len: 0,
+        }
+    }
+
+    pub fn need_init(&self) -> bool {
+        matches!(self.body_state, PS::ToStart)
+    }
+
+    pub fn reinit(&mut self) {
+        self.body_state = PS::ToStart;
+    }
+
+    fn prepare_buf(&mut self, buf_to_rewind: &[u8]) {
+        let mut body_buf = BytesMut::with_capacity(self.body_buf_size);
+        if !buf_to_rewind.is_empty() {
+            self.rewind_buf_len = buf_to_rewind.len();
+            // TODO: this is still 1 copy. Make it zero
+            body_buf.put_slice(buf_to_rewind);
+        }
+        if self.body_buf_size > buf_to_rewind.len() {
+            //body_buf.resize(self.body_buf_size, 0);
+            unsafe {
+                body_buf.set_len(self.body_buf_size);
+            }
+        }
+        self.body_buf = Some(body_buf);
+    }
+
+    pub fn init_chunked(&mut self, buf_to_rewind: &[u8]) {
+        self.body_state = PS::Chunked(0, 0, 0, 0);
+        self.prepare_buf(buf_to_rewind);
+    }
+
+    pub fn init_content_length(&mut self, cl: usize, buf_to_rewind: &[u8]) {
+        match cl {
+            0 => self.body_state = PS::Complete(0),
+            _ => {
+                self.prepare_buf(buf_to_rewind);
+                self.body_state = PS::Partial(0, cl);
+            }
+        }
+    }
+
+    pub fn init_http10(&mut self, buf_to_rewind: &[u8]) {
+        self.prepare_buf(buf_to_rewind);
+        self.body_state = PS::HTTP1_0(0);
+    }
+
+    pub fn get_body(&self, buf_ref: &BufRef) -> &[u8] {
+        // TODO: these get_*() could panic. handle them better
+        buf_ref.get(self.body_buf.as_ref().unwrap())
+    }
+
+    pub fn body_done(&self) -> bool {
+        matches!(self.body_state, PS::Complete(_) | PS::Done(_))
+    }
+
+    pub fn body_empty(&self) -> bool {
+        self.body_state == PS::Complete(0)
+    }
+
+    pub async fn read_body<S>(&mut self, stream: &mut S) -> Result<Option<BufRef>>
+    where
+        S: AsyncRead + Unpin + Send,
+    {
+        match self.body_state {
+            PS::Complete(_) => Ok(None),
+            PS::Done(_) => Ok(None),
+            PS::Partial(_, _) => self.do_read_body(stream).await,
+            PS::Chunked(_, _, _, _) => self.do_read_chunked_body(stream).await,
+            PS::HTTP1_0(_) => self.do_read_body_until_closed(stream).await,
+            PS::ToStart => panic!("need to init BodyReader first"),
+        }
+    }
+
+    pub async fn do_read_body<S>(&mut self, stream: &mut S) -> Result<Option<BufRef>>
+    where
+        S: AsyncRead + Unpin + Send,
+    {
+        let body_buf = self.body_buf.as_deref_mut().unwrap();
+        let mut n = self.rewind_buf_len;
+        self.rewind_buf_len = 0; // we only need to read rewind data once
+        if n == 0 {
+            /* Need to actually read */
+            n = stream
+                .read(body_buf)
+                .await
+                .or_err(ReadError, "when reading body")?;
+        }
+        match self.body_state {
+            PS::Partial(read, to_read) => {
+                debug!(
+                    "BodyReader body_state: {:?}, read data from IO: {n}",
+                    self.body_state
+                );
+                if n == 0 {
+                    self.body_state = PS::Done(read);
+                    Error::e_explain(ConnectionClosed, format!(
+                        "Peer prematurely closed connection with {} bytes of body remaining to read",
+                        to_read
+                    ))
+                } else if n >= to_read {
+                    if n > to_read {
+                        warn!(
+                            "Peer sent more data than expected: extra {} \
+                            bytes, discarding them",
+                            n - to_read
+                        )
+                    }
+                    self.body_state = PS::Complete(read + to_read);
+                    Ok(Some(BufRef::new(0, to_read)))
+                } else {
+                    self.body_state = PS::Partial(read + n, to_read - n);
+                    Ok(Some(BufRef::new(0, n)))
+                }
+            }
+            _ => panic!("wrong body state: {:?}", self.body_state),
+        }
+    }
+
+    pub async fn do_read_body_until_closed<S>(&mut self, stream: &mut S) -> Result<Option<BufRef>>
+    where
+        S: AsyncRead + Unpin + Send,
+    {
+        let body_buf = self.body_buf.as_deref_mut().unwrap();
+        let mut n = self.rewind_buf_len;
+        self.rewind_buf_len = 0; // we only need to read rewind data once
+        if n == 0 {
+            /* Need to actually read */
+            n = stream
+                .read(body_buf)
+                .await
+                .or_err(ReadError, "when reading body")?;
+        }
match self.body_state {
+ PS::HTTP1_0(read) => {
+ if n == 0 {
+ self.body_state = PS::Complete(read);
+ Ok(None)
+ } else {
+ self.body_state = PS::HTTP1_0(read + n);
+ Ok(Some(BufRef::new(0, n)))
+ }
+ }
+ _ => panic!("wrong body state: {:?}", self.body_state),
+ }
+ }
+
+ pub async fn do_read_chunked_body<S>(&mut self, stream: &mut S) -> Result<Option<BufRef>>
+ where
+ S: AsyncRead + Unpin + Send,
+ {
+ match self.body_state {
+ PS::Chunked(
+ total_read,
+ existing_buf_start,
+ mut existing_buf_end,
+ mut expecting_from_io,
+ ) => {
+ if existing_buf_start == 0 {
+ // read a new buf from IO
+ let body_buf = self.body_buf.as_deref_mut().unwrap();
+ if existing_buf_end == 0 {
+ existing_buf_end = self.rewind_buf_len;
+ self.rewind_buf_len = 0; // we only need to read rewind data once
+ if existing_buf_end == 0 {
+ existing_buf_end = stream
+ .read(body_buf)
+ .await
+ .or_err(ReadError, "when reading body")?;
+ }
+ } else {
+ /* existing_buf_end != 0: this is a partial chunk head */
+ /* copy the #expecting_from_io bytes until index existing_buf_end
+ * to the front and read more to form a valid chunk head.
+ * existing_buf_end is the end of the partial head and
+ * expecting_from_io is the len of it */
+ body_buf
+ .copy_within(existing_buf_end - expecting_from_io..existing_buf_end, 0);
+ let new_bytes = stream
+ .read(&mut body_buf[expecting_from_io..])
+ .await
+ .or_err(ReadError, "when reading body")?;
+ /* more data is read, extend the buffer */
+ existing_buf_end = expecting_from_io + new_bytes;
+ expecting_from_io = 0;
+ }
+ self.body_state = self.body_state.new_buf(existing_buf_end);
+ }
+ if existing_buf_end == 0 {
+ self.body_state = self.body_state.done(0);
+ Error::e_explain(
+ ConnectionClosed,
+ format!(
+ "Connection prematurely closed without the termination chunk, \
+ read {total_read} bytes"
+ ),
+ )
+ } else {
+ if expecting_from_io > 0 {
+ trace!(
+ "partial chunk payload, expecting_from_io: {}, \
+ existing_buf_end {}, buf: {:?}",
+ expecting_from_io,
+ existing_buf_end,
+ String::from_utf8_lossy(
+ &self.body_buf.as_ref().unwrap()[..existing_buf_end]
+ )
+ );
+ // partial chunk payload, will read more
+ if expecting_from_io >= existing_buf_end + 2 {
+ // not enough
+ self.body_state = self.body_state.partial_chunk(
+ existing_buf_end,
+ expecting_from_io - existing_buf_end,
+ );
+ return Ok(Some(BufRef::new(0, existing_buf_end)));
+ }
+ /* could be expecting DATA + CRLF or just CRLF */
+ let payload_size = if expecting_from_io > 2 {
+ expecting_from_io - 2
+ } else {
+ 0
+ };
+ /* expecting_from_io < existing_buf_end + 2 */
+ if expecting_from_io >= existing_buf_end {
+ self.body_state = self
+ .body_state
+ .partial_chunk(payload_size, expecting_from_io - existing_buf_end);
+ return Ok(Some(BufRef::new(0, payload_size)));
+ }
+
+ /* expecting_from_io < existing_buf_end */
+ self.body_state =
+ self.body_state.multi_chunk(payload_size, expecting_from_io);
+ return Ok(Some(BufRef::new(0, payload_size)));
+ }
+ self.parse_chunked_buf(existing_buf_start, existing_buf_end)
+ }
+ }
+ _ => panic!("wrong body state: {:?}", self.body_state),
+ }
+ }
+
+ fn parse_chunked_buf(
+ &mut self,
+ buf_index_start: usize,
+ buf_index_end: usize,
+ ) -> Result<Option<BufRef>> {
+ let buf = &self.body_buf.as_ref().unwrap()[buf_index_start..buf_index_end];
+ let chunk_status = httparse::parse_chunk_size(buf);
+ match chunk_status {
+ Ok(status) => {
+ match status {
+ httparse::Status::Complete((payload_index, chunk_size)) => {
+ // TODO: Check chunk_size overflow
+ trace!(
+ "Got size 
{chunk_size}, payload_index: {payload_index}, chunk: {:?}",
+ String::from_utf8_lossy(buf)
+ );
+ let chunk_size = chunk_size as usize;
+ if chunk_size == 0 {
+ /* terminating chunk. TODO: trailer */
+ self.body_state = self.body_state.finish(0);
+ return Ok(None);
+ }
+ // chunk-size CRLF [payload_index] byte*[chunk_size] CRLF
+ let data_end_index = payload_index + chunk_size;
+ let chunk_end_index = data_end_index + 2;
+ if chunk_end_index >= buf.len() {
+ // no multi chunk in this buf
+ let actual_size = if data_end_index > buf.len() {
+ buf.len() - payload_index
+ } else {
+ chunk_size
+ };
+ self.body_state = self
+ .body_state
+ .partial_chunk(actual_size, chunk_end_index - buf.len());
+ return Ok(Some(BufRef::new(
+ buf_index_start + payload_index,
+ actual_size,
+ )));
+ }
+ /* got multiple chunks, return the first */
+ self.body_state = self
+ .body_state
+ .multi_chunk(chunk_size, buf_index_start + chunk_end_index);
+ Ok(Some(BufRef::new(
+ buf_index_start + payload_index,
+ chunk_size,
+ )))
+ }
+ httparse::Status::Partial => {
+ if buf.len() > PARTIAL_CHUNK_HEAD_LIMIT {
+ // https://datatracker.ietf.org/doc/html/rfc9112#name-chunk-extensions
+ // "A server ought to limit the total length of chunk extensions received"
+ // The buf.len() here is the total length of chunk-size + chunk-ext seen
+ // so far. This check applies to both server and client
+ self.body_state = self.body_state.done(0);
+ Error::e_explain(INVALID_CHUNK, "Chunk ext over limit")
+ } else {
+ self.body_state =
+ self.body_state.partial_chunk_head(buf_index_end, buf.len());
+ Ok(Some(BufRef::new(0, 0)))
+ }
+ }
+ }
+ }
+ Err(e) => {
+ let context = format!("Invalid chunked encoding: {e:?}");
+ debug!("{context}, {:?}", String::from_utf8_lossy(buf));
+ self.body_state = self.body_state.done(0);
+ Error::e_explain(INVALID_CHUNK, context)
+ }
+ }
+ }
+}
+
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub enum BodyMode {
+ ToSelect,
+ ContentLength(usize, usize), // total length to write, bytes already written
+ ChunkedEncoding(usize), //bytes written
+ HTTP1_0(usize), //bytes written
+ Complete(usize), //bytes written
+}
+
+type BM = BodyMode;
+
+pub struct BodyWriter {
+ pub body_mode: BodyMode,
+}
+
+impl BodyWriter {
+ pub fn new() -> Self {
+ BodyWriter {
+ body_mode: BM::ToSelect,
+ }
+ }
+
+ pub fn init_chunked(&mut self) {
+ self.body_mode = BM::ChunkedEncoding(0);
+ }
+
+ pub fn init_http10(&mut self) {
+ self.body_mode = BM::HTTP1_0(0);
+ }
+
+ pub fn init_content_length(&mut self, cl: usize) {
+ self.body_mode = BM::ContentLength(cl, 0);
+ }
+
+ // NOTE on buffering/flushing the stream when writing the body:
+ // Buffering writes can reduce syscalls and hence improve the efficiency of the system,
+ // but it hurts real-time communication.
+ // So we only allow buffering when the body size is known ahead of time, which is less
+ // likely to be a real-time interaction.
+
+ pub async fn write_body<S>(&mut self, stream: &mut S, buf: &[u8]) -> Result<Option<usize>>
+ where
+ S: AsyncWrite + Unpin + Send,
+ {
+ trace!("Writing Body, size: {}", buf.len());
+ match self.body_mode {
+ BM::Complete(_) => Ok(None),
+ BM::ContentLength(_, _) => self.do_write_body(stream, buf).await,
+ BM::ChunkedEncoding(_) => self.do_write_chunked_body(stream, buf).await,
+ BM::HTTP1_0(_) => self.do_write_http1_0_body(stream, buf).await,
+ BM::ToSelect => Ok(None), // Error here? 
+ }
+ }
+
+ pub fn finished(&self) -> bool {
+ match self.body_mode {
+ BM::Complete(_) => true,
+ BM::ContentLength(total, written) => written >= total,
+ _ => false,
+ }
+ }
+
+ async fn do_write_body<S>(&mut self, stream: &mut S, buf: &[u8]) -> Result<Option<usize>>
+ where
+ S: AsyncWrite + Unpin + Send,
+ {
+ match self.body_mode {
+ BM::ContentLength(total, written) => {
+ if written >= total {
+ // already written full length
+ return Ok(None);
+ }
+ let mut to_write = total - written;
+ if to_write < buf.len() {
+ warn!("Trying to write data over content-length: {total}");
+ } else {
+ to_write = buf.len();
+ }
+ let res = stream.write_all(&buf[..to_write]).await;
+ match res {
+ Ok(()) => {
+ self.body_mode = BM::ContentLength(total, written + to_write);
+ if self.finished() {
+ stream.flush().await.or_err(WriteError, "flushing body")?;
+ }
+ Ok(Some(to_write))
+ }
+ Err(e) => Error::e_because(WriteError, "while writing body", e),
+ }
+ }
+ _ => panic!("wrong body mode: {:?}", self.body_mode),
+ }
+ }
+
+ async fn do_write_chunked_body<S>(
+ &mut self,
+ stream: &mut S,
+ buf: &[u8],
+ ) -> Result<Option<usize>>
+ where
+ S: AsyncWrite + Unpin + Send,
+ {
+ match self.body_mode {
+ BM::ChunkedEncoding(written) => {
+ let chunk_size = buf.len();
+
+ let chunk_size_buf = format!("{:X}\r\n", chunk_size);
+ let mut output_buf = Bytes::from(chunk_size_buf).chain(buf).chain(&b"\r\n"[..]);
+
+ while output_buf.has_remaining() {
+ let res = stream.write_vec(&mut output_buf).await;
+ match res {
+ Ok(n) => {
+ if n == 0 {
+ return Error::e_explain(ConnectionClosed, "while writing body");
+ }
+ }
+ Err(e) => {
+ return Error::e_because(WriteError, "while writing body", e);
+ }
+ }
+ }
+ stream.flush().await.or_err(WriteError, "flushing body")?;
+ self.body_mode = BM::ChunkedEncoding(written + chunk_size);
+ Ok(Some(chunk_size))
+ }
+ _ => panic!("wrong body mode: {:?}", self.body_mode),
+ }
+ }
+
+ async fn do_write_http1_0_body<S>(
+ &mut self,
+ stream: &mut S,
+ buf: &[u8],
+ ) -> Result<Option<usize>>
+ where
+ S: AsyncWrite + Unpin + Send,
+ {
+ match self.body_mode {
+ BM::HTTP1_0(written) => {
+ let res = stream.write_all(buf).await;
+ match res {
+ Ok(()) => {
+ self.body_mode = BM::HTTP1_0(written + buf.len());
+ stream.flush().await.or_err(WriteError, "flushing body")?;
+ Ok(Some(buf.len()))
+ }
+ Err(e) => Error::e_because(WriteError, "while writing body", e),
+ }
+ }
+ _ => panic!("wrong body mode: {:?}", self.body_mode),
+ }
+ }
+
+ pub async fn finish<S>(&mut self, stream: &mut S) -> Result<Option<usize>>
+ where
+ S: AsyncWrite + Unpin + Send,
+ {
+ match self.body_mode {
+ BM::Complete(_) => Ok(None),
+ BM::ContentLength(_, _) => self.do_finish_body(stream),
+ BM::ChunkedEncoding(_) => self.do_finish_chunked_body(stream).await,
+ BM::HTTP1_0(_) => self.do_finish_http1_0_body(stream),
+ BM::ToSelect => Ok(None),
+ }
+ }
+
+ fn do_finish_body<S>(&mut self, _stream: S) -> Result<Option<usize>> {
+ match self.body_mode {
+ BM::ContentLength(total, written) => {
+ self.body_mode = BM::Complete(written);
+ if written < total {
+ return Error::e_explain(
+ PREMATURE_BODY_END,
+ format!("Content-length: {total} bytes written: {written}"),
+ );
+ }
+ Ok(Some(written))
+ }
+ _ => panic!("wrong body mode: {:?}", self.body_mode),
+ }
+ }
+
+ async fn do_finish_chunked_body<S>(&mut self, stream: &mut S) -> Result<Option<usize>>
+ where
+ S: AsyncWrite + Unpin + Send,
+ {
+ match self.body_mode {
+ BM::ChunkedEncoding(written) => {
+ let res = stream.write_all(&LAST_CHUNK[..]).await;
+ 
self.body_mode = BM::Complete(written); + match res { + Ok(()) => Ok(Some(written)), + Err(e) => Error::e_because(WriteError, "while writing body", e), + } + } + _ => panic!("wrong body mode: {:?}", self.body_mode), + } + } + + fn do_finish_http1_0_body<S>(&mut self, _stream: &mut S) -> Result<Option<usize>> { + match self.body_mode { + BM::HTTP1_0(written) => { + self.body_mode = BM::Complete(written); + Ok(Some(written)) + } + _ => panic!("wrong body mode: {:?}", self.body_mode), + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::utils::BufRef; + use tokio_test::io::Builder; + + fn init_log() { + let _ = env_logger::builder().is_test(true).try_init(); + } + + #[tokio::test] + async fn read_with_body_content_length() { + init_log(); + let input = b"abc"; + let mut mock_io = Builder::new().read(&input[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_content_length(3, b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 3)); + assert_eq!(body_reader.body_state, ParseState::Complete(3)); + assert_eq!(input, body_reader.get_body(&res)); + } + + #[tokio::test] + async fn read_with_body_content_length_2() { + init_log(); + let input1 = b"a"; + let input2 = b"bc"; + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_content_length(3, b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 1)); + assert_eq!(body_reader.body_state, ParseState::Partial(1, 2)); + assert_eq!(input1, body_reader.get_body(&res)); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 2)); + assert_eq!(body_reader.body_state, ParseState::Complete(3)); + assert_eq!(input2, body_reader.get_body(&res)); + } + + #[tokio::test] + async fn read_with_body_content_length_less() { + init_log(); + let input1 = b"a"; + let input2 = b""; // simulating close + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_content_length(3, b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 1)); + assert_eq!(body_reader.body_state, ParseState::Partial(1, 2)); + assert_eq!(input1, body_reader.get_body(&res)); + let res = body_reader.read_body(&mut mock_io).await.unwrap_err(); + assert_eq!(&ConnectionClosed, res.etype()); + assert_eq!(body_reader.body_state, ParseState::Done(1)); + } + + #[tokio::test] + async fn read_with_body_content_length_more() { + init_log(); + let input1 = b"a"; + let input2 = b"bcd"; + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_content_length(3, b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 1)); + assert_eq!(body_reader.body_state, ParseState::Partial(1, 2)); + assert_eq!(input1, body_reader.get_body(&res)); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 2)); + assert_eq!(body_reader.body_state, ParseState::Complete(3)); + assert_eq!(&input2[0..2], body_reader.get_body(&res)); + } + + #[tokio::test] + async fn read_with_body_content_length_rewind() { + init_log(); + let rewind = b"ab"; + let input = b"c"; + let mut mock_io = 
Builder::new().read(&input[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_content_length(3, rewind); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 2)); + assert_eq!(body_reader.body_state, ParseState::Partial(2, 1)); + assert_eq!(rewind, body_reader.get_body(&res)); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 1)); + assert_eq!(body_reader.body_state, ParseState::Complete(3)); + assert_eq!(input, body_reader.get_body(&res)); + } + + #[tokio::test] + async fn read_with_body_http10() { + init_log(); + let input1 = b"a"; + let input2 = b""; // simulating close + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_http10(b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 1)); + assert_eq!(body_reader.body_state, ParseState::HTTP1_0(1)); + assert_eq!(input1, body_reader.get_body(&res)); + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, None); + assert_eq!(body_reader.body_state, ParseState::Complete(1)); + } + + #[tokio::test] + async fn read_with_body_http10_rewind() { + init_log(); + let rewind = b"ab"; + let input1 = b"c"; + let input2 = b""; // simulating close + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_http10(rewind); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 2)); + assert_eq!(body_reader.body_state, ParseState::HTTP1_0(2)); + assert_eq!(rewind, body_reader.get_body(&res)); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 1)); + assert_eq!(body_reader.body_state, ParseState::HTTP1_0(3)); + assert_eq!(input1, body_reader.get_body(&res)); + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, None); + assert_eq!(body_reader.body_state, ParseState::Complete(3)); + } + + #[tokio::test] + async fn read_with_body_zero_chunk() { + init_log(); + let input = b"0\r\n\r\n"; + let mut mock_io = Builder::new().read(&input[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_chunked(b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, None); + assert_eq!(body_reader.body_state, ParseState::Complete(0)); + } + + #[tokio::test] + async fn read_with_body_chunk_ext() { + init_log(); + let input = b"0;aaaa\r\n\r\n"; + let mut mock_io = Builder::new().read(&input[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_chunked(b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, None); + assert_eq!(body_reader.body_state, ParseState::Complete(0)); + } + + #[tokio::test] + async fn read_with_body_chunk_ext_oversize() { + init_log(); + let chunk_size = b"0;"; + let ext1 = [b'a'; 1024 * 5]; + let ext2 = [b'a'; 1024 * 3]; + let mut mock_io = Builder::new() + .read(&chunk_size[..]) + .read(&ext1[..]) + .read(&ext2[..]) + .build(); + let mut body_reader = BodyReader::new(); + body_reader.init_chunked(b""); + // read chunk-size, chunk incomplete + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, Some(BufRef::new(0, 0))); + // read ext1, chunk incomplete + let res = body_reader.read_body(&mut 
mock_io).await.unwrap(); + assert_eq!(res, Some(BufRef::new(0, 0))); + // read ext2, now oversized + let res = body_reader.read_body(&mut mock_io).await; + assert!(res.is_err()); + assert_eq!(body_reader.body_state, ParseState::Done(0)); + } + + #[tokio::test] + async fn read_with_body_1_chunk() { + init_log(); + let input1 = b"1\r\na\r\n"; + let input2 = b"0\r\n\r\n"; + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_chunked(b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(3, 1)); + assert_eq!(&input1[3..4], body_reader.get_body(&res)); + assert_eq!(body_reader.body_state, ParseState::Chunked(1, 0, 0, 0)); + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, None); + assert_eq!(body_reader.body_state, ParseState::Complete(1)); + } + + #[tokio::test] + async fn read_with_body_1_chunk_rewind() { + init_log(); + let rewind = b"1\r\nx\r\n"; + let input1 = b"1\r\na\r\n"; + let input2 = b"0\r\n\r\n"; + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_chunked(rewind); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(3, 1)); + assert_eq!(&rewind[3..4], body_reader.get_body(&res)); + assert_eq!(body_reader.body_state, ParseState::Chunked(1, 0, 0, 0)); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(3, 1)); + assert_eq!(&input1[3..4], body_reader.get_body(&res)); + assert_eq!(body_reader.body_state, ParseState::Chunked(2, 0, 0, 0)); + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, None); + assert_eq!(body_reader.body_state, ParseState::Complete(2)); + } + + #[tokio::test] + async fn read_with_body_multi_chunk() { + init_log(); + let input1 = b"1\r\na\r\n2\r\nbc\r\n"; + let input2 = b"0\r\n\r\n"; + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_chunked(b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(3, 1)); + assert_eq!(&input1[3..4], body_reader.get_body(&res)); + assert_eq!(body_reader.body_state, ParseState::Chunked(1, 6, 13, 0)); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(9, 2)); + assert_eq!(&input1[9..11], body_reader.get_body(&res)); + assert_eq!(body_reader.body_state, ParseState::Chunked(3, 0, 0, 0)); + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, None); + assert_eq!(body_reader.body_state, ParseState::Complete(3)); + } + + #[tokio::test] + async fn read_with_body_partial_chunk() { + init_log(); + let input1 = b"3\r\na"; + let input2 = b"bc\r\n0\r\n\r\n"; + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_chunked(b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(3, 1)); + assert_eq!(&input1[3..4], body_reader.get_body(&res)); + assert_eq!(body_reader.body_state, ParseState::Chunked(1, 0, 0, 4)); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 2)); + assert_eq!(&input2[0..2], body_reader.get_body(&res)); + 
assert_eq!(body_reader.body_state, ParseState::Chunked(3, 4, 9, 0)); + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, None); + assert_eq!(body_reader.body_state, ParseState::Complete(3)); + } + + #[tokio::test] + async fn read_with_body_partial_head_chunk() { + init_log(); + let input1 = b"1\r"; + let input2 = b"\na\r\n0\r\n\r\n"; + let mut mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut body_reader = BodyReader::new(); + body_reader.init_chunked(b""); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 0)); + assert_eq!(body_reader.body_state, ParseState::Chunked(0, 0, 2, 2)); + let res = body_reader.read_body(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(3, 1)); // input1 concat input2 + assert_eq!(&input2[1..2], body_reader.get_body(&res)); + assert_eq!(body_reader.body_state, ParseState::Chunked(1, 6, 11, 0)); + let res = body_reader.read_body(&mut mock_io).await.unwrap(); + assert_eq!(res, None); + assert_eq!(body_reader.body_state, ParseState::Complete(1)); + } + + #[tokio::test] + async fn write_body_cl() { + init_log(); + let output = b"a"; + let mut mock_io = Builder::new().write(&output[..]).build(); + let mut body_writer = BodyWriter::new(); + body_writer.init_content_length(1); + assert_eq!(body_writer.body_mode, BodyMode::ContentLength(1, 0)); + let res = body_writer + .write_body(&mut mock_io, &output[..]) + .await + .unwrap() + .unwrap(); + assert_eq!(res, 1); + assert_eq!(body_writer.body_mode, BodyMode::ContentLength(1, 1)); + // write again, over the limit + let res = body_writer + .write_body(&mut mock_io, &output[..]) + .await + .unwrap(); + assert_eq!(res, None); + assert_eq!(body_writer.body_mode, BodyMode::ContentLength(1, 1)); + let res = body_writer.finish(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, 1); + assert_eq!(body_writer.body_mode, BodyMode::Complete(1)); + } + + #[tokio::test] + async fn write_body_chunked() { + init_log(); + let data = b"abcdefghij"; + let output = b"A\r\nabcdefghij\r\n"; + let mut mock_io = Builder::new() + .write(&output[..]) + .write(&output[..]) + .write(&LAST_CHUNK[..]) + .build(); + let mut body_writer = BodyWriter::new(); + body_writer.init_chunked(); + assert_eq!(body_writer.body_mode, BodyMode::ChunkedEncoding(0)); + let res = body_writer + .write_body(&mut mock_io, &data[..]) + .await + .unwrap() + .unwrap(); + assert_eq!(res, data.len()); + assert_eq!(body_writer.body_mode, BodyMode::ChunkedEncoding(data.len())); + let res = body_writer + .write_body(&mut mock_io, &data[..]) + .await + .unwrap() + .unwrap(); + assert_eq!(res, data.len()); + assert_eq!( + body_writer.body_mode, + BodyMode::ChunkedEncoding(data.len() * 2) + ); + let res = body_writer.finish(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, data.len() * 2); + assert_eq!(body_writer.body_mode, BodyMode::Complete(data.len() * 2)); + } + + #[tokio::test] + async fn write_body_http10() { + init_log(); + let data = b"a"; + let mut mock_io = Builder::new().write(&data[..]).write(&data[..]).build(); + let mut body_writer = BodyWriter::new(); + body_writer.init_http10(); + assert_eq!(body_writer.body_mode, BodyMode::HTTP1_0(0)); + let res = body_writer + .write_body(&mut mock_io, &data[..]) + .await + .unwrap() + .unwrap(); + assert_eq!(res, 1); + assert_eq!(body_writer.body_mode, BodyMode::HTTP1_0(1)); + let res = body_writer + .write_body(&mut mock_io, &data[..]) + .await + .unwrap() + .unwrap(); + 
assert_eq!(res, 1); + assert_eq!(body_writer.body_mode, BodyMode::HTTP1_0(2)); + let res = body_writer.finish(&mut mock_io).await.unwrap().unwrap(); + assert_eq!(res, 2); + assert_eq!(body_writer.body_mode, BodyMode::Complete(2)); + } +} diff --git a/pingora-core/src/protocols/http/v1/client.rs b/pingora-core/src/protocols/http/v1/client.rs new file mode 100644 index 0000000..1d970d7 --- /dev/null +++ b/pingora-core/src/protocols/http/v1/client.rs @@ -0,0 +1,1085 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! HTTP/1.x client session + +use bytes::{BufMut, Bytes, BytesMut}; +use http::{header, header::AsHeaderName, HeaderValue, StatusCode, Version}; +use log::{debug, trace}; +use pingora_error::{Error, ErrorType::*, OrErr, Result, RetryType}; +use pingora_http::{HMap, IntoCaseHeaderName, RequestHeader, ResponseHeader}; +use pingora_timeout::timeout; +use std::io::ErrorKind; +use std::str; +use std::time::Duration; +use tokio::io::{AsyncReadExt, AsyncWriteExt}; + +use super::body::{BodyReader, BodyWriter}; +use super::common::*; +use crate::protocols::http::HttpTask; +use crate::protocols::{Digest, Stream, UniqueID}; +use crate::utils::{BufRef, KVRef}; + +/// The HTTP 1.x client session +pub struct HttpSession { + buf: Bytes, + pub(crate) underlying_stream: Stream, + raw_header: Option<BufRef>, + preread_body: Option<BufRef>, + body_reader: BodyReader, + body_writer: BodyWriter, + // timeouts: + /// The read timeout, which will be applied to both reading the header and the body. + /// The timeout is reset on every read. This is not a timeout on the overall duration of the + /// response. + pub read_timeout: Option<Duration>, + /// The write timeout which will be applied to both writing request header and body. + /// The timeout is reset on every write. This is not a timeout on the overall duration of the + /// request. + pub write_timeout: Option<Duration>, + keepalive_timeout: KeepaliveStatus, + pub(crate) digest: Box<Digest>, + response_header: Option<Box<ResponseHeader>>, + request_written: Option<Box<RequestHeader>>, + bytes_sent: usize, + upgraded: bool, +} + +/// HTTP 1.x client session +impl HttpSession { + /// Create a new http client session from an established (TCP or TLS) [`Stream`]. 
+ pub fn new(stream: Stream) -> Self {
+ // TODO: maybe we should put digest in the connection itself
+ let digest = Box::new(Digest {
+ ssl_digest: stream.get_ssl_digest(),
+ timing_digest: stream.get_timing_digest(),
+ proxy_digest: stream.get_proxy_digest(),
+ });
+ HttpSession {
+ underlying_stream: stream,
+ buf: Bytes::new(), // zero size, will be replaced by parsed header later
+ raw_header: None,
+ preread_body: None,
+ body_reader: BodyReader::new(),
+ body_writer: BodyWriter::new(),
+ keepalive_timeout: KeepaliveStatus::Off,
+ response_header: None,
+ request_written: None,
+ read_timeout: None,
+ write_timeout: None,
+ digest,
+ bytes_sent: 0,
+ upgraded: false,
+ }
+ }
+ /// Write the request header to the server
+ /// After the request header is sent, the caller can either start reading the response or
+ /// sending the request body, if any.
+ pub async fn write_request_header(&mut self, req: Box<RequestHeader>) -> Result<usize> {
+ // TODO: make sure this can only be called once
+ // init body writer
+ self.init_req_body_writer(&req);
+
+ let to_wire = http_req_header_to_wire(&req).unwrap();
+ trace!("Writing request header: {to_wire:?}");
+
+ let write_fut = self.underlying_stream.write_all(to_wire.as_ref());
+ match self.write_timeout {
+ Some(t) => match timeout(t, write_fut).await {
+ Ok(res) => res,
+ Err(_) => Err(std::io::Error::from(ErrorKind::TimedOut)),
+ },
+ None => write_fut.await,
+ }
+ .map_err(|e| match e.kind() {
+ ErrorKind::TimedOut => {
+ Error::because(WriteTimedout, "while writing request headers (timeout)", e)
+ }
+ _ => Error::because(WriteError, "while writing request headers", e),
+ })?;
+
+ self.underlying_stream
+ .flush()
+ .await
+ .or_err(WriteError, "flushing request header")?;
+
+ // write was successful
+ self.request_written = Some(req);
+ Ok(to_wire.len())
+ }
+
+ async fn do_write_body(&mut self, buf: &[u8]) -> Result<Option<usize>> {
+ let written = self
+ .body_writer
+ .write_body(&mut self.underlying_stream, buf)
+ .await;
+
+ if let Ok(Some(num_bytes)) = written {
+ self.bytes_sent += num_bytes;
+ }
+
+ written
+ }
+
+ /// Write the request body. Return `Ok(None)` if no more body should be written, either because
+ /// the Content-Length is reached or the last chunk has already been sent
+ pub async fn write_body(&mut self, buf: &[u8]) -> Result<Option<usize>> {
+ // TODO: verify that request header is sent already
+ match self.write_timeout {
+ Some(t) => match timeout(t, self.do_write_body(buf)).await {
+ Ok(res) => res,
+ Err(_) => Error::e_explain(WriteTimedout, format!("writing body, timeout: {t:?}")),
+ },
+ None => self.do_write_body(buf).await,
+ }
+ }
+
+ fn maybe_force_close_body_reader(&mut self) {
+ if self.upgraded && !self.body_reader.body_done() {
+ // request is done, reset the response body to close
+ self.body_reader.init_content_length(0, b"");
+ }
+ }
+
+ /// Flush local buffer and notify the server by sending the last chunk if chunked encoding is
+ /// used.
+ pub async fn finish_body(&mut self) -> Result<Option<usize>> {
+ let res = self.body_writer.finish(&mut self.underlying_stream).await?;
+ self.underlying_stream
+ .flush()
+ .await
+ .or_err(WriteError, "flushing body")?;
+
+ self.maybe_force_close_body_reader();
+ Ok(res)
+ }
+
+ /// Read the response header from the server
+ /// This function can be called multiple times if the headers received so far are just
+ /// informational headers.
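+ ///
+ /// A sketch of draining informational responses before the final header, assuming
+ /// `session` has already written the request (illustrative only, not compiled as a doctest):
+ ///
+ /// ```ignore
+ /// loop {
+ ///     session.read_response().await?;
+ ///     match session.get_status().map(|s| s.as_u16()) {
+ ///         // 1xx (except 101) is informational; the final header follows
+ ///         Some(code @ 100..=199) if code != 101 => continue,
+ ///         _ => break,
+ ///     }
+ /// }
+ /// ```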
+ pub async fn read_response(&mut self) -> Result<usize> { + self.buf.clear(); + let mut buf = BytesMut::with_capacity(INIT_HEADER_BUF_SIZE); + let mut already_read: usize = 0; + loop { + if already_read > MAX_HEADER_SIZE { + /* NOTE: this check only blocks second read. The first large read is allowed + since the buf is already allocated. The goal is to avoid slowly bloating + this buffer */ + return Error::e_explain( + InvalidHTTPHeader, + format!("Response header larger than {MAX_HEADER_SIZE}"), + ); + } + + let read_fut = self.underlying_stream.read_buf(&mut buf); + let read_result = match self.read_timeout { + Some(t) => match timeout(t, read_fut).await { + Ok(res) => res, + Err(_) => Err(std::io::Error::from(ErrorKind::TimedOut)), + }, + None => read_fut.await, + }; + let n = match read_result { + Ok(n) => match n { + 0 => { + let mut e = Error::explain( + ConnectionClosed, + format!( + "while reading response headers, bytes already read: {already_read}", + ), + ); + e.retry = RetryType::ReusedOnly; + return Err(e); + } + _ => { + n /* read n bytes, continue */ + } + }, + Err(e) => { + return match e.kind() { + ErrorKind::TimedOut => { + Error::e_explain(ReadTimedout, "while reading response headers") + } + _ => { + let true_io_error = e.raw_os_error().is_some(); + let mut e = Error::because( + ReadError, + format!( + "while reading response headers, bytes already read: {already_read}", + ), + e, + ); + // Likely OSError, typical if a previously reused connection drops it + if true_io_error { + e.retry = RetryType::ReusedOnly; + } // else: not safe to retry TLS error + Err(e) + } + }; + } + }; + already_read += n; + let mut headers = [httparse::EMPTY_HEADER; MAX_HEADERS]; + let mut resp = httparse::Response::new(&mut headers); + let parsed = parse_resp_buffer(&mut resp, &buf); + match parsed { + HeaderParseState::Complete(s) => { + self.raw_header = Some(BufRef(0, s)); + self.preread_body = Some(BufRef(s, already_read)); + let base = buf.as_ptr() as usize; + let mut header_refs = Vec::<KVRef>::with_capacity(resp.headers.len()); + + // Note: resp.headers has the correct number of headers + // while header_refs doesn't as it is still empty + let _num_headers = populate_headers(base, &mut header_refs, resp.headers); + + let mut response_header = Box::new(ResponseHeader::build( + resp.code.unwrap(), + Some(resp.headers.len()), + )?); + + response_header.set_version(match resp.version { + Some(1) => Version::HTTP_11, + Some(0) => Version::HTTP_10, + _ => Version::HTTP_09, + }); + + let buf = buf.freeze(); + + for header in header_refs { + let header_name = header.get_name_bytes(&buf); + let header_name = header_name.into_case_header_name(); + let value_bytes = header.get_value_bytes(&buf); + let header_value = if cfg!(debug_assertions) { + // from_maybe_shared_unchecked() in debug mode still checks whether + // the header value is valid, which breaks the _obsolete_multiline + // support. 
To work around this, in debug mode, we replace CRLF with
+ // whitespace
+ if let Some(p) = value_bytes.windows(CRLF.len()).position(|w| w == CRLF)
+ {
+ let mut new_header = Vec::from_iter(value_bytes);
+ new_header[p] = b' ';
+ new_header[p + 1] = b' ';
+ unsafe {
+ http::HeaderValue::from_maybe_shared_unchecked(new_header)
+ }
+ } else {
+ unsafe {
+ http::HeaderValue::from_maybe_shared_unchecked(value_bytes)
+ }
+ }
+ } else {
+ // safe because this is from what we parsed
+ unsafe { http::HeaderValue::from_maybe_shared_unchecked(value_bytes) }
+ };
+ response_header
+ .append_header(header_name, header_value)
+ .or_err(InvalidHTTPHeader, "while parsing response header")?;
+ }
+
+ self.buf = buf;
+ self.upgraded = self.is_upgrade(&response_header).unwrap_or(false);
+ self.response_header = Some(response_header);
+ return Ok(s);
+ }
+ HeaderParseState::Partial => { /* continue the loop */ }
+ HeaderParseState::Invalid(e) => {
+ return Error::e_because(
+ InvalidHTTPHeader,
+ format!("buf: {:?}", String::from_utf8_lossy(&buf)),
+ e,
+ )
+ }
+ }
+ }
+ }
+
+ /// Similar to [`Self::read_response()`], read the response header and then return a copy of it.
+ pub async fn read_resp_header_parts(&mut self) -> Result<Box<ResponseHeader>> {
+ self.read_response().await?;
+ // safe to unwrap because it is just read
+ Ok(Box::new(self.resp_header().unwrap().clone()))
+ }
+
+ /// Return a reference of the [`ResponseHeader`] if the response is read
+ pub fn resp_header(&self) -> Option<&ResponseHeader> {
+ self.response_header.as_deref()
+ }
+
+ /// Get the header value for the given header name from the response header
+ /// If there are multiple headers under the same name, the first one will be returned
+ /// Use `self.resp_header().headers.get_all(name)` to get all the headers under the same name
+ /// Always return `None` if the response is not read yet.
+ pub fn get_header(&self, name: impl AsHeaderName) -> Option<&HeaderValue> {
+ self.response_header
+ .as_ref()
+ .and_then(|h| h.headers.get(name))
+ }
+
+ /// Get the header value as raw bytes, `b""` when the header doesn't exist or response not read
+ pub fn get_header_bytes(&self, name: impl AsHeaderName) -> &[u8] {
+ self.get_header(name).map_or(b"", |v| v.as_bytes())
+ }
+
+ /// Return the status code of the response if read
+ pub fn get_status(&self) -> Option<StatusCode> {
+ self.response_header.as_ref().map(|h| h.status)
+ }
+
+ async fn do_read_body(&mut self) -> Result<Option<BufRef>> {
+ self.init_body_reader();
+ self.body_reader
+ .read_body(&mut self.underlying_stream)
+ .await
+ }
+
+ /// Read the response body into the internal buffer.
+ /// Return `Ok(Some(ref))` after a successful read.
+ /// Return `Ok(None)` if there is no more body to read.
+ pub async fn read_body_ref(&mut self) -> Result<Option<&[u8]>> {
+ let result = match self.read_timeout {
+ Some(t) => match timeout(t, self.do_read_body()).await {
+ Ok(res) => res,
+ Err(_) => Error::e_explain(ReadTimedout, format!("reading body, timeout: {t:?}")),
+ },
+ None => self.do_read_body().await,
+ };
+
+ result.map(|maybe_body| maybe_body.map(|body_ref| self.body_reader.get_body(&body_ref)))
+ }
+
+ /// Similar to [`Self::read_body_ref`] but return `Bytes` instead of a slice reference.
+ pub async fn read_body_bytes(&mut self) -> Result<Option<Bytes>> {
+ let read = self.read_body_ref().await?;
+ Ok(read.map(Bytes::copy_from_slice))
+ }
+
+ /// Whether there is no more body to read.
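+ ///
+ /// A sketch of a typical body read loop built on this check, assuming the response
+ /// header was already read (illustrative only, not compiled as a doctest):
+ ///
+ /// ```ignore
+ /// while !session.is_body_done() {
+ ///     if let Some(bytes) = session.read_body_bytes().await? {
+ ///         // process `bytes`
+ ///     }
+ /// }
+ /// ```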
+ pub fn is_body_done(&mut self) -> bool {
+ self.init_body_reader();
+ self.body_reader.body_done()
+ }
+
+ pub(super) fn get_headers_raw(&self) -> &[u8] {
+ // TODO: these get_*() could panic. handle them better
+ self.raw_header.as_ref().unwrap().get(&self.buf[..])
+ }
+
+ /// Get the raw response header bytes
+ pub fn get_headers_raw_bytes(&self) -> Bytes {
+ self.raw_header.as_ref().unwrap().get_bytes(&self.buf)
+ }
+
+ fn set_keepalive(&mut self, seconds: Option<u64>) {
+ match seconds {
+ Some(sec) => {
+ if sec > 0 {
+ self.keepalive_timeout = KeepaliveStatus::Timeout(Duration::from_secs(sec));
+ } else {
+ self.keepalive_timeout = KeepaliveStatus::Infinite;
+ }
+ }
+ None => {
+ self.keepalive_timeout = KeepaliveStatus::Off;
+ }
+ }
+ }
+
+ /// Apply keepalive settings according to the server's response
+ /// For HTTP 1.1, assume keepalive as long as there is no `Connection: Close` response header.
+ /// For HTTP 1.0, only keepalive if there is an explicit header `Connection: keep-alive`.
+ pub fn respect_keepalive(&mut self) {
+ if self.get_status() == Some(StatusCode::SWITCHING_PROTOCOLS) {
+ // make sure the connection is closed at the end when 101/upgrade is used
+ self.set_keepalive(None);
+ return;
+ }
+ if let Some(keepalive) = self.is_connection_keepalive() {
+ if keepalive {
+ let (timeout, _max_use) = self.get_keepalive_values();
+ // TODO: respect max_use
+ match timeout {
+ Some(d) => self.set_keepalive(Some(d)),
+ None => self.set_keepalive(Some(0)), // infinite
+ }
+ } else {
+ self.set_keepalive(None);
+ }
+ } else if self.resp_header().map(|h| h.version) == Some(Version::HTTP_11) {
+ self.set_keepalive(Some(0)); // on by default for http 1.1
+ } else {
+ self.set_keepalive(None); // off by default for http 1.0
+ }
+ }
+
+ // Whether this session will be kept alive
+ pub fn will_keepalive(&self) -> bool {
+ // TODO: check self.body_writer. If it is http1.0 type then keepalive
+ // cannot be used because the connection close is the signal of the end of the body
+ !matches!(self.keepalive_timeout, KeepaliveStatus::Off)
+ }
+
+ fn is_connection_keepalive(&self) -> Option<bool> {
+ is_buf_keepalive(self.get_header(header::CONNECTION).map(|v| v.as_bytes()))
+ }
+
+ // `Keep-Alive: timeout=5, max=1000` => 5, 1000
+ fn get_keepalive_values(&self) -> (Option<u64>, Option<usize>) {
+ // TODO: implement this parsing
+ (None, None)
+ }
+
+ /// Close the connection abruptly. This allows signaling the server that the connection is
+ /// closed before dropping [`HttpSession`]
+ pub async fn shutdown(&mut self) {
+ let _ = self.underlying_stream.shutdown().await;
+ }
+
+ /// Consume `self`; if the connection can be reused, the underlying stream will be returned.
+ /// The returned stream can be kept in a connection pool so that the next time the same
+ /// server is contacted, a new client session can be created via [`Self::new()`].
+ /// If the connection cannot be reused, the underlying stream will be closed and `None` will be
+ /// returned.
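+ ///
+ /// A sketch of pooling the stream, where `pool` stands in for any user-provided
+ /// connection pool (illustrative only, not compiled as a doctest):
+ ///
+ /// ```ignore
+ /// if let Some(stream) = session.reuse().await {
+ ///     pool.put(peer_key, stream); // hypothetical pool API
+ /// }
+ /// ```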
+ pub async fn reuse(mut self) -> Option<Stream> {
+ // TODO: this function is unnecessarily slow for keepalive case
+ // because that case does not need async
+ match self.keepalive_timeout {
+ KeepaliveStatus::Off => {
+ debug!("HTTP shutdown connection");
+ self.shutdown().await;
+ None
+ }
+ _ => Some(self.underlying_stream),
+ }
+ }
+
+ fn init_body_reader(&mut self) {
+ if self.body_reader.need_init() {
+ /* follow https://tools.ietf.org/html/rfc7230#section-3.3.3 */
+ let preread_body = self.preread_body.as_ref().unwrap().get(&self.buf[..]);
+
+ if let Some(req) = self.request_written.as_ref() {
+ if req.method == http::method::Method::HEAD {
+ self.body_reader.init_content_length(0, preread_body);
+ return;
+ }
+ }
+
+ let upgraded = if let Some(code) = self.get_status() {
+ match code.as_u16() {
+ 101 => self.is_upgrade_req(),
+ 100..=199 => {
+ // informational headers, not enough to init body reader
+ return;
+ }
+ 204 | 304 => {
+ // no body by definition
+ self.body_reader.init_content_length(0, preread_body);
+ return;
+ }
+ _ => false,
+ }
+ } else {
+ false
+ };
+
+ if upgraded {
+ self.body_reader.init_http10(preread_body);
+ } else if self.is_chunked_encoding() {
+ // if chunked encoding, content-length should be ignored
+ self.body_reader.init_chunked(preread_body);
+ } else if let Some(cl) = self.get_content_length() {
+ self.body_reader.init_content_length(cl, preread_body);
+ } else {
+ self.body_reader.init_http10(preread_body);
+ }
+ }
+ }
+
+ /// Whether this request is for upgrade
+ pub fn is_upgrade_req(&self) -> bool {
+ match self.request_written.as_deref() {
+ Some(req) => is_upgrade_req(req),
+ None => false,
+ }
+ }
+
+ /// `Some(true)` if this is a successful upgrade
+ /// `Some(false)` if the request is an upgrade but the response refuses it
+ /// `None` if the request is not an upgrade.
+ fn is_upgrade(&self, header: &ResponseHeader) -> Option<bool> {
+ if self.is_upgrade_req() {
+ Some(is_upgrade_resp(header))
+ } else {
+ None
+ }
+ }
+
+ fn get_content_length(&self) -> Option<usize> {
+ buf_to_content_length(
+ self.get_header(header::CONTENT_LENGTH)
+ .map(|v| v.as_bytes()),
+ )
+ }
+
+ fn is_chunked_encoding(&self) -> bool {
+ is_header_value_chunked_encoding(self.get_header(header::TRANSFER_ENCODING))
+ }
+
+ fn init_req_body_writer(&mut self, header: &RequestHeader) {
+ if self.is_upgrade_req() {
+ self.body_writer.init_http10();
+ } else {
+ self.init_body_writer_comm(&header.headers)
+ }
+ }
+
+ fn init_body_writer_comm(&mut self, headers: &HMap) {
+ let te_value = headers.get(http::header::TRANSFER_ENCODING);
+ if is_header_value_chunked_encoding(te_value) {
+ // transfer-encoding takes priority over content-length
+ self.body_writer.init_chunked();
+ } else {
+ let content_length =
+ header_value_content_length(headers.get(http::header::CONTENT_LENGTH));
+ match content_length {
+ Some(length) => {
+ self.body_writer.init_content_length(length);
+ }
+ None => {
+ /* TODO: 1. connection: keepalive cannot be used,
+ 2. 
mark connection must be closed */ + self.body_writer.init_http10(); + } + } + } + } + + // should (continue to) try to read response header or start reading response body + fn should_read_resp_header(&self) -> bool { + match self.get_status().map(|s| s.as_u16()) { + Some(101) => false, // switching protocol successful, no more header to read + Some(100..=199) => true, // only informational header read + Some(_) => false, + None => true, // no response code, no header read yet + } + } + + pub async fn read_response_task(&mut self) -> Result<HttpTask> { + if self.should_read_resp_header() { + let resp_header = self.read_resp_header_parts().await?; + let end_of_body = self.is_body_done(); + debug!("Response header: {:?}", resp_header); + trace!( + "Raw Response header: {:?}", + str::from_utf8(self.get_headers_raw()).unwrap() + ); + Ok(HttpTask::Header(resp_header, end_of_body)) + } else if self.is_body_done() { + debug!("Response is done"); + Ok(HttpTask::Done) + } else { + /* need to read body */ + let data = self.read_body_bytes().await?; + let end_of_body = self.is_body_done(); + if let Some(body) = data { + debug!("Response body: {} bytes", body.len()); + trace!("Response body: {:?}", body); + Ok(HttpTask::Body(Some(body), end_of_body)) + } else { + debug!("Response is done"); + Ok(HttpTask::Done) + } + } + // TODO: support h1 trailer + } + + pub fn digest(&self) -> &Digest { + &self.digest + } +} + +#[inline] +fn parse_resp_buffer<'buf>( + resp: &mut httparse::Response<'_, 'buf>, + buf: &'buf [u8], +) -> HeaderParseState { + let mut parser = httparse::ParserConfig::default(); + parser.allow_spaces_after_header_name_in_responses(true); + parser.allow_obsolete_multiline_headers_in_responses(true); + let res = match parser.parse_response(resp, buf) { + Ok(s) => s, + Err(e) => { + return HeaderParseState::Invalid(e); + } + }; + match res { + httparse::Status::Complete(s) => HeaderParseState::Complete(s), + _ => HeaderParseState::Partial, + } +} + +// TODO: change it to to_buf +#[inline] +pub(crate) fn http_req_header_to_wire(req: &RequestHeader) -> Option<BytesMut> { + let mut buf = BytesMut::with_capacity(512); + + // Request-Line + let method = req.method.as_str().as_bytes(); + buf.put_slice(method); + buf.put_u8(b' '); + buf.put_slice(req.raw_path()); + buf.put_u8(b' '); + + let version = match req.version { + Version::HTTP_09 => "HTTP/0.9", + Version::HTTP_10 => "HTTP/1.0", + Version::HTTP_11 => "HTTP/1.1", + Version::HTTP_2 => "HTTP/2", + _ => { + return None; /*TODO: unsupported version */ + } + }; + buf.put_slice(version.as_bytes()); + buf.put_slice(CRLF); + + // headers + req.header_to_h1_wire(&mut buf); + buf.put_slice(CRLF); + Some(buf) +} + +impl UniqueID for HttpSession { + fn id(&self) -> i32 { + self.underlying_stream.id() + } +} + +#[cfg(test)] +mod tests_stream { + use super::*; + use crate::protocols::http::v1::body::ParseState; + use crate::ErrorType; + use std::str; + use std::time::Duration; + use tokio_test::io::Builder; + + fn init_log() { + let _ = env_logger::builder().is_test(true).try_init(); + } + + #[tokio::test] + async fn read_basic_response() { + init_log(); + let input = b"HTTP/1.1 200 OK\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_response().await; + assert_eq!(input.len(), res.unwrap()); + assert_eq!(0, http_stream.resp_header().unwrap().headers.len()); + } + + #[tokio::test] + async fn read_response_default() { + init_log(); + let input_header = 
b"HTTP/1.1 200 OK\r\n\r\n"; + let input_body = b"abc"; + let input_close = b""; // simulating close + let mock_io = Builder::new() + .read(&input_header[..]) + .read(&input_body[..]) + .read(&input_close[..]) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_response().await; + assert_eq!(input_header.len(), res.unwrap()); + let res = http_stream.read_body_ref().await.unwrap(); + assert_eq!(res.unwrap(), input_body); + assert_eq!(http_stream.body_reader.body_state, ParseState::HTTP1_0(3)); + let res = http_stream.read_body_ref().await.unwrap(); + assert_eq!(res, None); + assert_eq!(http_stream.body_reader.body_state, ParseState::Complete(3)); + } + + #[tokio::test] + async fn read_resp_header_with_space() { + init_log(); + let input = b"HTTP/1.1 200 OK\r\nServer : pingora\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_response().await; + assert_eq!(input.len(), res.unwrap()); + assert_eq!(1, http_stream.resp_header().unwrap().headers.len()); + assert_eq!(http_stream.get_header("Server").unwrap(), "pingora"); + } + + #[cfg(feature = "patched_http1")] + #[tokio::test] + async fn read_resp_header_with_utf8() { + init_log(); + let input = "HTTP/1.1 200 OK\r\nServer👍: pingora\r\n\r\n".as_bytes(); + let mock_io = Builder::new().read(input).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let resp = http_stream.read_resp_header_parts().await.unwrap(); + assert_eq!(1, http_stream.resp_header().unwrap().headers.len()); + assert_eq!(http_stream.get_header("Server👍").unwrap(), "pingora"); + assert_eq!(resp.headers.get("Server👍").unwrap(), "pingora"); + } + + #[tokio::test] + #[should_panic(expected = "There is still data left to read.")] + async fn read_timeout() { + init_log(); + let input = b"HTTP/1.1 200 OK\r\n\r\n"; + let mock_io = Builder::new() + .wait(Duration::from_secs(2)) + .read(&input[..]) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_timeout = Some(Duration::from_secs(1)); + let res = http_stream.read_response().await; + assert_eq!(res.unwrap_err().etype(), &ErrorType::ReadTimedout); + } + + #[tokio::test] + async fn read_2_buf() { + init_log(); + let input1 = b"HTTP/1.1 200 OK\r\n"; + let input2 = b"Server: pingora\r\n\r\n"; + let mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_response().await; + assert_eq!(input1.len() + input2.len(), res.unwrap()); + assert_eq!( + input1.len() + input2.len(), + http_stream.get_headers_raw().len() + ); + assert_eq!(1, http_stream.resp_header().unwrap().headers.len()); + assert_eq!(http_stream.get_header("Server").unwrap(), "pingora"); + + assert_eq!(Some(StatusCode::OK), http_stream.get_status()); + assert_eq!(Version::HTTP_11, http_stream.resp_header().unwrap().version); + } + + #[tokio::test] + #[should_panic(expected = "There is still data left to read.")] + async fn read_invalid() { + let input1 = b"HTP/1.1 200 OK\r\n"; + let input2 = b"Server: pingora\r\n\r\n"; + let mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_response().await; + assert_eq!(&ErrorType::InvalidHTTPHeader, res.unwrap_err().etype()); + } + + #[tokio::test] + async fn write() { + let wire = b"GET /test HTTP/1.1\r\nFoo: Bar\r\n\r\n"; 
+ let mock_io = Builder::new().write(wire).build();
+ let mut http_stream = HttpSession::new(Box::new(mock_io));
+ let mut new_request = RequestHeader::build("GET", b"/test", None).unwrap();
+ new_request.insert_header("Foo", "Bar").unwrap();
+ let n = http_stream
+ .write_request_header(Box::new(new_request))
+ .await
+ .unwrap();
+ assert_eq!(wire.len(), n);
+ }
+
+ #[tokio::test]
+ #[should_panic(expected = "There is still data left to write.")]
+ async fn write_timeout() {
+ let wire = b"GET /test HTTP/1.1\r\nFoo: Bar\r\n\r\n";
+ let mock_io = Builder::new()
+ .wait(Duration::from_secs(2))
+ .write(wire)
+ .build();
+ let mut http_stream = HttpSession::new(Box::new(mock_io));
+ http_stream.write_timeout = Some(Duration::from_secs(1));
+ let mut new_request = RequestHeader::build("GET", b"/test", None).unwrap();
+ new_request.insert_header("Foo", "Bar").unwrap();
+ let res = http_stream
+ .write_request_header(Box::new(new_request))
+ .await;
+ assert_eq!(res.unwrap_err().etype(), &ErrorType::WriteTimedout);
+ }
+
+ #[tokio::test]
+ #[should_panic(expected = "There is still data left to write.")]
+ async fn write_body_timeout() {
+ let header = b"POST /test HTTP/1.1\r\n\r\n";
+ let body = b"abc";
+ let mock_io = Builder::new()
+ .write(&header[..])
+ .wait(Duration::from_secs(2))
+ .write(&body[..])
+ .build();
+ let mut http_stream = HttpSession::new(Box::new(mock_io));
+ http_stream.write_timeout = Some(Duration::from_secs(1));
+
+ let new_request = RequestHeader::build("POST", b"/test", None).unwrap();
+ http_stream
+ .write_request_header(Box::new(new_request))
+ .await
+ .unwrap();
+ let res = http_stream.write_body(body).await;
+ assert_eq!(res.unwrap_err().etype(), &WriteTimedout);
+ }
+
+ #[cfg(feature = "patched_http1")]
+ #[tokio::test]
+ async fn write_invalid_path() {
+ let wire = b"GET /\x01\xF0\x90\x80 HTTP/1.1\r\nFoo: Bar\r\n\r\n";
+ let mock_io = Builder::new().write(wire).build();
+ let mut http_stream = HttpSession::new(Box::new(mock_io));
+ let mut new_request = RequestHeader::build("GET", b"/\x01\xF0\x90\x80", None).unwrap();
+ new_request.insert_header("Foo", "Bar").unwrap();
+ let n = http_stream
+ .write_request_header(Box::new(new_request))
+ .await
+ .unwrap();
+ assert_eq!(wire.len(), n);
+ }
+
+ #[tokio::test]
+ async fn read_informational() {
+ init_log();
+ let input1 = b"HTTP/1.1 100 Continue\r\n\r\n";
+ let input2 = b"HTTP/1.1 204 OK\r\nServer: pingora\r\n\r\n";
+ let mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build();
+ let mut http_stream = HttpSession::new(Box::new(mock_io));
+
+ // read 100 header first
+ let task = http_stream.read_response_task().await.unwrap();
+ match task {
+ HttpTask::Header(h, eob) => {
+ assert_eq!(h.status, 100);
+ assert!(!eob);
+ }
+ _ => {
+ panic!("task should be header")
+ }
+ }
+ // read 204 header next
+ let task = http_stream.read_response_task().await.unwrap();
+ match task {
+ HttpTask::Header(h, eob) => {
+ assert_eq!(h.status, 204);
+ assert!(eob);
+ }
+ _ => {
+ panic!("task should be header")
+ }
+ }
+ }
+
+ #[tokio::test]
+ async fn read_switching_protocol() {
+ init_log();
+ let input1 = b"HTTP/1.1 101 Continue\r\n\r\n";
+ let input2 = b"PAYLOAD";
+ let mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build();
+ let mut http_stream = HttpSession::new(Box::new(mock_io));
+
+ // read 101 header first
+ let task = http_stream.read_response_task().await.unwrap();
+ match task {
+ HttpTask::Header(h, eob) => {
+ assert_eq!(h.status, 101);
+ assert!(!eob);
+ }
+ _ => {
+ panic!("task 
should be header") + } + } + // read body + let task = http_stream.read_response_task().await.unwrap(); + match task { + HttpTask::Body(b, eob) => { + assert_eq!(b.unwrap(), &input2[..]); + assert!(!eob); + } + _ => { + panic!("task should be body") + } + } + // read body + let task = http_stream.read_response_task().await.unwrap(); + match task { + HttpTask::Done => {} + _ => { + panic!("task should be Done") + } + } + } + + // Note: in debug mode, due to from_maybe_shared_unchecked() still tries to validate headers + // values, so the code has to replace CRLF with whitespaces. In release mode, the CRLF is + // reserved + #[tokio::test] + async fn read_obsolete_multiline_headers() { + init_log(); + let input = b"HTTP/1.1 200 OK\r\nServer : pingora\r\n Foo: Bar\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_response().await; + assert_eq!(input.len(), res.unwrap()); + + assert_eq!(1, http_stream.resp_header().unwrap().headers.len()); + assert_eq!( + http_stream.get_header("Server").unwrap(), + "pingora Foo: Bar" + ); + + let input = b"HTTP/1.1 200 OK\r\nServer : pingora\r\n\t Fizz: Buzz\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_response().await; + assert_eq!(input.len(), res.unwrap()); + assert_eq!(1, http_stream.resp_header().unwrap().headers.len()); + assert_eq!( + http_stream.get_header("Server").unwrap(), + "pingora \t Fizz: Buzz" + ); + } + + #[cfg(feature = "patched_http1")] + #[tokio::test] + async fn read_headers_skip_invalid_line() { + init_log(); + let input = b"HTTP/1.1 200 OK\r\n;\r\nFoo: Bar\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_response().await; + assert_eq!(input.len(), res.unwrap()); + assert_eq!(1, http_stream.resp_header().unwrap().headers.len()); + assert_eq!(http_stream.get_header("Foo").unwrap(), "Bar"); + } + + #[tokio::test] + async fn read_keepalive_headers() { + init_log(); + + async fn build_resp_with_keepalive(conn: &str) -> HttpSession { + let input = format!("HTTP/1.1 200 OK\r\nConnection: {conn}\r\n\r\n"); + let mock_io = Builder::new().read(input.as_bytes()).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_response().await; + assert_eq!(input.len(), res.unwrap()); + http_stream.respect_keepalive(); + http_stream + } + + assert_eq!( + build_resp_with_keepalive("close").await.keepalive_timeout, + KeepaliveStatus::Off + ); + + assert_eq!( + build_resp_with_keepalive("keep-alive") + .await + .keepalive_timeout, + KeepaliveStatus::Infinite + ); + + assert_eq!( + build_resp_with_keepalive("foo").await.keepalive_timeout, + KeepaliveStatus::Infinite + ); + + assert_eq!( + build_resp_with_keepalive("upgrade,close") + .await + .keepalive_timeout, + KeepaliveStatus::Off + ); + + assert_eq!( + build_resp_with_keepalive("upgrade, close") + .await + .keepalive_timeout, + KeepaliveStatus::Off + ); + + assert_eq!( + build_resp_with_keepalive("Upgrade, close") + .await + .keepalive_timeout, + KeepaliveStatus::Off + ); + + assert_eq!( + build_resp_with_keepalive("Upgrade,close") + .await + .keepalive_timeout, + KeepaliveStatus::Off + ); + + assert_eq!( + build_resp_with_keepalive("close,upgrade") + .await + .keepalive_timeout, + KeepaliveStatus::Off + ); + + assert_eq!( + 
build_resp_with_keepalive("close, upgrade") + .await + .keepalive_timeout, + KeepaliveStatus::Off + ); + + assert_eq!( + build_resp_with_keepalive("close,Upgrade") + .await + .keepalive_timeout, + KeepaliveStatus::Off + ); + + assert_eq!( + build_resp_with_keepalive("close, Upgrade") + .await + .keepalive_timeout, + KeepaliveStatus::Off + ); + } + + /* Note: body tests are covered in server.rs */ +} + +#[cfg(test)] +mod test_sync { + use super::*; + use log::error; + + #[test] + fn test_request_to_wire() { + let mut new_request = RequestHeader::build("GET", b"/", None).unwrap(); + new_request.insert_header("Foo", "Bar").unwrap(); + let wire = http_req_header_to_wire(&new_request).unwrap(); + let mut headers = [httparse::EMPTY_HEADER; 128]; + let mut req = httparse::Request::new(&mut headers); + let result = req.parse(wire.as_ref()); + match result { + Ok(_) => {} + Err(e) => error!("{:?}", e), + } + assert!(result.unwrap().is_complete()); + // FIXME: the order is not guaranteed + assert_eq!("/", req.path.unwrap()); + assert_eq!(b"Foo", headers[0].name.as_bytes()); + assert_eq!(b"Bar", headers[0].value); + } +} diff --git a/pingora-core/src/protocols/http/v1/common.rs b/pingora-core/src/protocols/http/v1/common.rs new file mode 100644 index 0000000..9ea2cf3 --- /dev/null +++ b/pingora-core/src/protocols/http/v1/common.rs @@ -0,0 +1,237 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
Common functions and constants + +use http::header; +use log::warn; +use pingora_http::{HMap, RequestHeader, ResponseHeader}; +use std::str; +use std::time::Duration; + +use super::body::BodyWriter; +use crate::utils::KVRef; + +pub(super) const MAX_HEADERS: usize = 256; + +pub(super) const INIT_HEADER_BUF_SIZE: usize = 4096; +pub(super) const MAX_HEADER_SIZE: usize = 1048575; + +pub(super) const BODY_BUF_LIMIT: usize = 1024 * 64; + +pub const CRLF: &[u8; 2] = b"\r\n"; +pub const HEADER_KV_DELIMITER: &[u8; 2] = b": "; + +pub(super) enum HeaderParseState { + Complete(usize), + Partial, + Invalid(httparse::Error), +} + +#[derive(Clone, Debug, PartialEq, Eq)] +pub(super) enum KeepaliveStatus { + Timeout(Duration), + Infinite, + Off, +} + +struct ConnectionValue { + keep_alive: bool, + upgrade: bool, + close: bool, +} + +impl ConnectionValue { + fn new() -> Self { + ConnectionValue { + keep_alive: false, + upgrade: false, + close: false, + } + } + + fn close(mut self) -> Self { + self.close = true; + self + } + fn upgrade(mut self) -> Self { + self.upgrade = true; + self + } + fn keep_alive(mut self) -> Self { + self.keep_alive = true; + self + } +} + +fn parse_connection_header(value: &[u8]) -> ConnectionValue { + // only parse keep-alive, close, and upgrade tokens + // https://www.rfc-editor.org/rfc/rfc9110.html#section-7.6.1 + + const KEEP_ALIVE: &str = "keep-alive"; + const CLOSE: &str = "close"; + const UPGRADE: &str = "upgrade"; + + // fast path + if value.eq_ignore_ascii_case(CLOSE.as_bytes()) { + ConnectionValue::new().close() + } else if value.eq_ignore_ascii_case(KEEP_ALIVE.as_bytes()) { + ConnectionValue::new().keep_alive() + } else if value.eq_ignore_ascii_case(UPGRADE.as_bytes()) { + ConnectionValue::new().upgrade() + } else { + // slow path, parse the connection value + let mut close = false; + let mut upgrade = false; + let value = str::from_utf8(value).unwrap_or(""); + for token in value + .split(',') + .map(|s| s.trim()) + .filter(|&x| !x.is_empty()) + { + if token.eq_ignore_ascii_case(CLOSE) { + close = true; + } else if token.eq_ignore_ascii_case(UPGRADE) { + upgrade = true; + } + if upgrade && close { + return ConnectionValue::new().upgrade().close(); + } + } + if close { + ConnectionValue::new().close() + } else if upgrade { + ConnectionValue::new().upgrade() + } else { + ConnectionValue::new() + } + } +} + +pub(crate) fn init_body_writer_comm(body_writer: &mut BodyWriter, headers: &HMap) { + let te_value = headers.get(http::header::TRANSFER_ENCODING); + if is_header_value_chunked_encoding(te_value) { + // transfer-encoding takes priority over content-length + body_writer.init_chunked(); + } else { + let content_length = header_value_content_length(headers.get(http::header::CONTENT_LENGTH)); + match content_length { + Some(length) => { + body_writer.init_content_length(length); + } + None => { + /* TODO: 1. connection: keepalive cannot be used, + 2. 
mark connection must be closed */
+ body_writer.init_http10();
+ }
+ }
+ }
+}
+
+#[inline]
+pub(super) fn is_header_value_chunked_encoding(
+ header_value: Option<&http::header::HeaderValue>,
+) -> bool {
+ match header_value {
+ Some(value) => value.as_bytes().eq_ignore_ascii_case(b"chunked"),
+ None => false,
+ }
+}
+
+pub(super) fn is_upgrade_req(req: &RequestHeader) -> bool {
+ req.version == http::Version::HTTP_11 && req.headers.get(header::UPGRADE).is_some()
+}
+
+// Unlike the upgrade check on the request, this function doesn't check the Upgrade or Connection
+// header, because when seeing 101 we assume the server has accepted the protocol switch.
+// In reality it is not uncommon that some servers don't send all the required headers to establish
+// websocket connections.
+pub(super) fn is_upgrade_resp(header: &ResponseHeader) -> bool {
+ header.status == 101 && header.version == http::Version::HTTP_11
+}
+
+#[inline]
+pub fn header_value_content_length(
+ header_value: Option<&http::header::HeaderValue>,
+) -> Option<usize> {
+ match header_value {
+ Some(value) => buf_to_content_length(Some(value.as_bytes())),
+ None => None,
+ }
+}
+
+#[inline]
+pub(super) fn buf_to_content_length(header_value: Option<&[u8]>) -> Option<usize> {
+ match header_value {
+ Some(buf) => {
+ match str::from_utf8(buf) {
+ // check valid string
+ Ok(str_cl_value) => match str_cl_value.parse::<i64>() {
+ Ok(cl_length) => {
+ if cl_length >= 0 {
+ Some(cl_length as usize)
+ } else {
+ warn!("negative content-length header value {cl_length}");
+ None
+ }
+ }
+ Err(_) => {
+ warn!("invalid content-length header value {str_cl_value}");
+ None
+ }
+ },
+ Err(_) => {
+ warn!("invalid content-length header encoding");
+ None
+ }
+ }
+ }
+ None => None,
+ }
+}
+
+#[inline]
+pub(super) fn is_buf_keepalive(header_value: Option<&[u8]>) -> Option<bool> {
+ header_value.and_then(|value| {
+ let value = parse_connection_header(value);
+ if value.keep_alive {
+ Some(true)
+ } else if value.close {
+ Some(false)
+ } else {
+ None
+ }
+ })
+}
+
+#[inline]
+pub(super) fn populate_headers(
+ base: usize,
+ header_ref: &mut Vec<KVRef>,
+ headers: &[httparse::Header],
+) -> usize {
+ let mut used_header_index = 0;
+ for header in headers.iter() {
+ if !header.name.is_empty() {
+ header_ref.push(KVRef::new(
+ header.name.as_ptr() as usize - base,
+ header.name.as_bytes().len(),
+ header.value.as_ptr() as usize - base,
+ header.value.len(),
+ ));
+ used_header_index += 1;
+ }
+ }
+ used_header_index
+}
diff --git a/pingora-core/src/protocols/http/v1/mod.rs b/pingora-core/src/protocols/http/v1/mod.rs
new file mode 100644
index 0000000..2604357
--- /dev/null
+++ b/pingora-core/src/protocols/http/v1/mod.rs
@@ -0,0 +1,20 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! HTTP/1.x implementation
+
+pub(crate) mod body;
+pub mod client;
+pub mod common;
+pub mod server;
diff --git a/pingora-core/src/protocols/http/v1/server.rs b/pingora-core/src/protocols/http/v1/server.rs
new file mode 100644
index 0000000..5b3a111
--- /dev/null
+++ b/pingora-core/src/protocols/http/v1/server.rs
@@ -0,0 +1,1566 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! HTTP/1.x server session
+
+use bytes::Bytes;
+use bytes::{BufMut, BytesMut};
+use http::HeaderValue;
+use http::{header, header::AsHeaderName, Method, Version};
+use log::{debug, error, warn};
+use once_cell::sync::Lazy;
+use percent_encoding::{percent_encode, AsciiSet, CONTROLS};
+use pingora_error::{Error, ErrorType::*, OrErr, Result};
+use pingora_http::{IntoCaseHeaderName, RequestHeader, ResponseHeader};
+use pingora_timeout::timeout;
+use regex::bytes::Regex;
+use std::time::Duration;
+use tokio::io::{AsyncReadExt, AsyncWriteExt};
+
+use super::body::{BodyReader, BodyWriter};
+use super::common::*;
+use crate::protocols::http::{body_buffer::FixedBuffer, date, error_resp, HttpTask};
+use crate::protocols::Stream;
+use crate::utils::{BufRef, KVRef};
+
+/// The HTTP 1.x server session
+pub struct HttpSession {
+ underlying_stream: Stream,
+ /// The buf that holds the raw request header + possibly a portion of the request body.
+ /// Request body bytes can appear here because they may arrive in the same read() that
+ /// delivers the request header.
+ buf: Bytes,
+ /// A slice reference to `buf` which points to the exact range of the request header
+ raw_header: Option<BufRef>,
+ /// A slice reference to `buf` which points to the range of a portion of the request body, if any
+ preread_body: Option<BufRef>,
+ /// A state machine to track how to read the request body
+ body_reader: BodyReader,
+ /// A state machine to track how to write the response body
+ body_writer: BodyWriter,
+ /// An internal buffer that batches multiple body writes to reduce the underlying syscalls
+ body_write_buf: BytesMut,
+ /// Track how many application (not on the wire) body bytes have already been sent
+ body_bytes_sent: usize,
+ /// Whether to update headers like Connection, Date
+ update_resp_headers: bool,
+ /// timeouts:
+ keepalive_timeout: KeepaliveStatus,
+ read_timeout: Option<Duration>,
+ write_timeout: Option<Duration>,
+ /// A copy of the response that is already written to the client
+ response_written: Option<Box<ResponseHeader>>,
+ /// The parsed request header
+ request_header: Option<Box<RequestHeader>>,
+ /// An internal buffer that holds a copy of the request body up to a certain size
+ retry_buffer: Option<FixedBuffer>,
+ /// Whether this session is an upgraded session. This flag is calculated when sending the
+ /// response header to the client.
+ upgraded: bool,
+}
+
+impl HttpSession {
+ /// Create a new http server session from an established (TCP or TLS) [`Stream`].
+ /// The created session needs to call [`Self::read_request()`] first before performing
+ /// any other operations.
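+ // NOTE (editorial addition, illustrative only): a minimal sketch of the call
+ // sequence described above, assuming `stream` is an accepted connection and
+ // eliding error handling:
+ //
+ // let mut session = HttpSession::new(stream);
+ // if session.read_request().await?.is_some() {
+ //     // inspect session.req_header(), read the body, then respond
+ //     let resp = ResponseHeader::build(200, None)?;
+ //     session.write_response_header(Box::new(resp)).await?;
+ //     session.finish_body().await?;
+ // }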
+ pub fn new(underlying_stream: Stream) -> Self {
+ HttpSession {
+ underlying_stream,
+ buf: Bytes::new(), // zero size, will be replaced by the parsed header later
+ raw_header: None,
+ preread_body: None,
+ body_reader: BodyReader::new(),
+ body_writer: BodyWriter::new(),
+ body_write_buf: BytesMut::new(),
+ keepalive_timeout: KeepaliveStatus::Off,
+ update_resp_headers: true,
+ response_written: None,
+ request_header: None,
+ read_timeout: None,
+ write_timeout: None,
+ body_bytes_sent: 0,
+ retry_buffer: None,
+ upgraded: false,
+ }
+ }
+
+ /// Read the request header. Return `Ok(Some(n))` when the read and parsing are successful.
+ /// Return `Ok(None)` when the client closed the connection without sending any data, which
+ /// is common on a reused connection.
+ pub async fn read_request(&mut self) -> Result<Option<usize>> {
+ self.buf.clear();
+ let mut buf = BytesMut::with_capacity(INIT_HEADER_BUF_SIZE);
+ let mut already_read: usize = 0;
+ loop {
+ if already_read > MAX_HEADER_SIZE {
+ /* NOTE: this check only blocks the second read. The first large read is allowed
+ since the buf is already allocated. The goal is to avoid slowly bloating
+ this buffer */
+ return Error::e_explain(
+ InvalidHTTPHeader,
+ format!("Request header larger than {}", MAX_HEADER_SIZE),
+ );
+ }
+
+ let read_result = {
+ let read_event = self.underlying_stream.read_buf(&mut buf);
+ match self.keepalive_timeout {
+ KeepaliveStatus::Timeout(d) => match timeout(d, read_event).await {
+ Ok(res) => res,
+ Err(e) => {
+ debug!("keepalive timeout {d:?} reached, {e}");
+ return Ok(None);
+ }
+ },
+ _ => read_event.await,
+ }
+ };
+ let n = match read_result {
+ Ok(n_read) => {
+ if n_read == 0 {
+ if already_read > 0 {
+ return Error::e_explain(
+ ConnectionClosed,
+ format!(
+ "while reading request headers, bytes already read: {}",
+ already_read
+ ),
+ );
+ } else {
+ /* common when the client decides to close a keepalived session */
+ debug!("Client prematurely closed connection with 0 bytes sent");
+ return Ok(None);
+ }
+ }
+ n_read
+ }
+
+ Err(e) => {
+ if already_read > 0 {
+ return Error::e_because(ReadError, "while reading request headers", e);
+ }
+ /* nothing harmful since we have not read anything yet */
+ return Ok(None);
+ }
+ };
+ already_read += n;
+
+ // Use this loop as a GOTO to retry the escaped request buffer; not a real loop
+ loop {
+ let mut headers = [httparse::EMPTY_HEADER; MAX_HEADERS];
+ let mut req = httparse::Request::new(&mut headers);
+ let parsed = parse_req_buffer(&mut req, &buf);
+ match parsed {
+ HeaderParseState::Complete(s) => {
+ self.raw_header = Some(BufRef(0, s));
+ self.preread_body = Some(BufRef(s, already_read));
+
+ // We want the header names and values we parsed to be zero-copy Bytes
+ // referencing the original buf. That requires converting the buf from
+ // BytesMut to Bytes. But `req` holds a reference to `buf`. So we use the
+ // `KVRef`s to record the offset of each piece of data, drop `req`, convert
+ // buf, then do the zero-copy update
+ let base = buf.as_ptr() as usize;
+ let mut header_refs = Vec::<KVRef>::with_capacity(req.headers.len());
+ // Note: req.headers has the correct number of headers
+ // while header_refs doesn't, as it is still empty
+ let _num_headers = populate_headers(base, &mut header_refs, req.headers);
+
+ let mut request_header = Box::new(RequestHeader::build(
+ req.method.unwrap_or(""),
+ // we patch httparse to allow unsafe bytes in the str
+ req.path.unwrap_or("").as_bytes(),
+ Some(req.headers.len()),
+ )?);
+
+ request_header.set_version(match req.version {
+ Some(1) => Version::HTTP_11,
+ Some(0) => Version::HTTP_10,
+ _ => Version::HTTP_09,
+ });
+
+ let buf = buf.freeze();
+
+ for header in header_refs {
+ let header_name = header.get_name_bytes(&buf);
+ let header_name = header_name.into_case_header_name();
+ let value_bytes = header.get_value_bytes(&buf);
+ // safe because these bytes come from what we just parsed
+ let header_value = unsafe {
+ http::HeaderValue::from_maybe_shared_unchecked(value_bytes)
+ };
+ request_header
+ .append_header(header_name, header_value)
+ .or_err(InvalidHTTPHeader, "while parsing request header")?;
+ }
+
+ self.buf = buf;
+ self.request_header = Some(request_header);
+
+ self.body_reader.reinit();
+ self.response_written = None;
+ self.respect_keepalive();
+
+ return Ok(Some(s));
+ }
+ HeaderParseState::Partial => {
+ break; /* continue the read loop */
+ }
+ HeaderParseState::Invalid(e) => match e {
+ httparse::Error::Token | httparse::Error::Version => {
+ // try to escape the URI
+ if let Some(new_buf) = escape_illegal_request_line(&buf) {
+ buf = new_buf;
+ already_read = buf.len();
+ } else {
+ debug!("Invalid request header from {:?}", self.underlying_stream);
+ return Error::e_because(
+ InvalidHTTPHeader,
+ format!("buf: {:?}", String::from_utf8_lossy(&buf)),
+ e,
+ );
+ }
+ }
+ _ => {
+ debug!("Invalid request header from {:?}", self.underlying_stream);
+ return Error::e_because(
+ InvalidHTTPHeader,
+ format!("buf: {:?}", String::from_utf8_lossy(&buf)),
+ e,
+ );
+ }
+ },
+ }
+ }
+ }
+ }
+
+ /// Return a reference to the `RequestHeader` this session read
+ /// # Panics
+ /// This function and most other functions will panic if called before [`Self::read_request()`]
+ pub fn req_header(&self) -> &RequestHeader {
+ self.request_header
+ .as_ref()
+ .expect("Request header is not read yet")
+ }
+
+ /// Return a mutable reference to the `RequestHeader` this session read
+ /// # Panics
+ /// This function and most other functions will panic if called before [`Self::read_request()`]
+ pub fn req_header_mut(&mut self) -> &mut RequestHeader {
+ self.request_header
+ .as_mut()
+ .expect("Request header is not read yet")
+ }
+
+ /// Get the header value for the given header name.
+ /// If there are multiple headers under the same name, the first one will be returned.
+ /// Use `self.req_header().headers.get_all(name)` to get all the headers under the same name.
+ pub fn get_header(&self, name: impl AsHeaderName) -> Option<&HeaderValue> {
+ self.request_header
+ .as_ref()
+ .and_then(|h| h.headers.get(name))
+ }
+
+ /// Return the method of this request. None if the request is not read yet.
+ pub(super) fn get_method(&self) -> Option<&http::Method> {
+ self.request_header.as_ref().map(|r| &r.method)
+ }
+
+ /// Return the path of the request (i.e., the `/hello?1` of `GET /hello?1 HTTP/1.1`).
+ /// An empty slice will be used if there is no path or the request is not read yet
+ pub(super) fn get_path(&self) -> &[u8] {
+ self.request_header.as_ref().map_or(b"", |r| r.raw_path())
+ }
+
+ /// Return the host header of the request. An empty slice will be used if there is no host header
+ pub(super) fn get_host(&self) -> &[u8] {
+ self.request_header
+ .as_ref()
+ .and_then(|h| h.headers.get(header::HOST))
+ .map_or(b"", |h| h.as_bytes())
+ }
+
+ /// Return a string `$METHOD $PATH, Host: $HOST`. Mostly for logging and debug purposes
+ pub fn request_summary(&self) -> String {
+ format!(
+ "{} {}, Host: {}",
+ self.get_method().map_or("-", |r| r.as_str()),
+ String::from_utf8_lossy(self.get_path()),
+ String::from_utf8_lossy(self.get_host())
+ )
+ }
+
+ /// Is the request an upgrade request
+ pub fn is_upgrade_req(&self) -> bool {
+ match self.request_header.as_deref() {
+ Some(req) => is_upgrade_req(req),
+ None => false,
+ }
+ }
+
+ /// Get the request header as raw bytes, `b""` when the header doesn't exist
+ pub fn get_header_bytes(&self, name: impl AsHeaderName) -> &[u8] {
+ self.get_header(name).map_or(b"", |v| v.as_bytes())
+ }
+
+ /// Read the request body. `Ok(None)` when there is no (more) body to read.
+ pub async fn read_body_bytes(&mut self) -> Result<Option<Bytes>> {
+ let read = self.read_body().await?;
+ Ok(read.map(|b| {
+ let bytes = Bytes::copy_from_slice(self.get_body(&b));
+ if let Some(buffer) = self.retry_buffer.as_mut() {
+ buffer.write_to_buffer(&bytes);
+ }
+ bytes
+ }))
+ }
+
+ async fn do_read_body(&mut self) -> Result<Option<BufRef>> {
+ self.init_body_reader();
+ self.body_reader
+ .read_body(&mut self.underlying_stream)
+ .await
+ }
+
+ /// Read the body into the internal buffer
+ async fn read_body(&mut self) -> Result<Option<BufRef>> {
+ match self.read_timeout {
+ Some(t) => match timeout(t, self.do_read_body()).await {
+ Ok(res) => res,
+ Err(_) => Error::e_explain(ReadTimedout, format!("reading body, timeout: {t:?}")),
+ },
+ None => self.do_read_body().await,
+ }
+ }
+
+ /// Whether there is no (more) body left to read.
+ pub fn is_body_done(&mut self) -> bool {
+ self.init_body_reader();
+ self.body_reader.body_done()
+ }
+
+ /// Whether the request has an empty body.
+ /// Because HTTP 1.1 clients have to send either `Content-Length` or `Transfer-Encoding` in order
+ /// to signal the server that they will send a body, this function returns accurate results
+ /// as soon as the request header is read.
+ pub fn is_body_empty(&mut self) -> bool {
+ self.init_body_reader();
+ self.body_reader.body_empty()
+ }
+
+ /// Write the response header to the client.
+ /// This function can be called more than once to send 1xx informational headers, excluding 101.
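+ // NOTE (editorial addition, illustrative only): per the doc comment above, an
+ // interim 1xx header can precede the final response. A sketch for a request
+ // carrying `Expect: 100-continue`, eliding error handling:
+ //
+ // session.write_continue_response().await?;      // interim "100 Continue"
+ // let body = session.read_body_bytes().await?;   // the client then sends the body
+ // let resp = ResponseHeader::build(200, None)?;
+ // session.write_response_header(Box::new(resp)).await?; // final header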
+ pub async fn write_response_header(&mut self, mut header: Box<ResponseHeader>) -> Result<()> {
+ if let Some(resp) = self.response_written.as_ref() {
+ if !resp.status.is_informational() {
+ warn!("Response header is already sent, cannot send again");
+ return Ok(());
+ }
+ }
+
+ // no need to add these headers to 1xx responses
+ if !header.status.is_informational() && self.update_resp_headers {
+ /* update headers */
+ header.insert_header(header::DATE, date::get_cached_date())?;
+
+ // TODO: make these lazy static
+ let connection_value = if self.will_keepalive() {
+ "keep-alive"
+ } else {
+ "close"
+ };
+ header.insert_header(header::CONNECTION, connection_value)?;
+ }
+
+ if header.status.as_u16() == 101 {
+ // make sure the connection is closed at the end when 101/upgrade is used
+ self.set_keepalive(None);
+ }
+
+ // Allow informational headers (excluding 101) to pass through without affecting the state
+ // of the request
+ if header.status == 101 || !header.status.is_informational() {
+ // reset request body to done for incomplete upgrade handshakes
+ if let Some(upgrade_ok) = self.is_upgrade(&header) {
+ if upgrade_ok {
+ debug!("ok upgrade handshake");
+ // For ws we use HTTP1_0 do_read_body_until_closed
+ //
+ // On ws close the initiator sends a close frame and
+ // then waits for a response from the peer; once it receives
+ // a response it closes the conn. After receiving a
+ // control frame indicating the connection should be closed,
+ // a peer discards any further data received.
+ // https://www.rfc-editor.org/rfc/rfc6455#section-1.4
+ self.upgraded = true;
+ } else {
+ debug!("bad upgrade handshake!");
+ // reset the request body buf and mark it as done;
+ // safe to reset an upgrade because it doesn't have a body
+ self.body_reader.init_content_length(0, b"");
+ }
+ }
+ self.init_body_writer(&header);
+ }
+
+ // A response with a content length is less likely to be real-time communication,
+ // so its flush can be deferred. Do flush when:
+ // 1. 1xx response: the client needs to see it before the rest of the response
+ // 2. no content length: the response could be generated in real time
+ let flush = header.status.is_informational()
+ || header.headers.get(header::CONTENT_LENGTH).is_none();
+
+ let mut write_buf = BytesMut::with_capacity(INIT_HEADER_BUF_SIZE);
+ http_resp_header_to_buf(&header, &mut write_buf).unwrap();
+ match self.underlying_stream.write_all(&write_buf).await {
+ Ok(()) => {
+ // flush the stream if it is a 1xx header or there is no response body
+ if flush || self.body_writer.finished() {
+ self.underlying_stream
+ .flush()
+ .await
+ .or_err(WriteError, "flushing response header")?;
+ }
+ self.response_written = Some(header);
+ self.body_bytes_sent += write_buf.len();
+ Ok(())
+ }
+ Err(e) => Error::e_because(WriteError, "writing response header", e),
+ }
+ }
+
+ /// Return the response header if it is already sent.
+ pub fn response_written(&self) -> Option<&ResponseHeader> {
+ self.response_written.as_deref()
+ }
+
+ /// `Some(true)` if this is a successful upgrade
+ /// `Some(false)` if the request is an upgrade but the response refuses it
+ /// `None` if the request is not an upgrade.
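+ // NOTE (editorial addition, illustrative only): how the tri-state documented above
+ // is typically consumed when relaying a response header:
+ //
+ // match session.is_upgrade(&header) {
+ //     Some(true) => { /* 101 accepted: start tunneling bytes in both directions */ }
+ //     Some(false) => { /* upgrade refused: serve it as a regular response */ }
+ //     None => { /* not an upgrade request to begin with */ }
+ // }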
+ pub fn is_upgrade(&self, header: &ResponseHeader) -> Option<bool> { + if self.is_upgrade_req() { + Some(is_upgrade_resp(header)) + } else { + None + } + } + + fn set_keepalive(&mut self, seconds: Option<u64>) { + match seconds { + Some(sec) => { + if sec > 0 { + self.keepalive_timeout = KeepaliveStatus::Timeout(Duration::from_secs(sec)); + } else { + self.keepalive_timeout = KeepaliveStatus::Infinite; + } + } + None => { + self.keepalive_timeout = KeepaliveStatus::Off; + } + } + } + + /// Return whether the session will be keepalived for connection reuse. + pub fn will_keepalive(&self) -> bool { + // TODO: check self.body_writer. If it is http1.0 type then keepalive + // cannot be used because the connection close is the signal of end body + !matches!(self.keepalive_timeout, KeepaliveStatus::Off) + } + + // `Keep-Alive: timeout=5, max=1000` => 5, 1000 + fn get_keepalive_values(&self) -> (Option<u64>, Option<usize>) { + // TODO: implement this parsing + (None, None) + } + + fn is_connection_keepalive(&self) -> Option<bool> { + is_buf_keepalive(self.get_header(header::CONNECTION).map(|v| v.as_bytes())) + } + + /// Apply keepalive settings according to the client + /// For HTTP 1.1, assume keepalive as long as there is no `Connection: Close` request header. + /// For HTTP 1.0, only keepalive if there is an explicit header `Connection: keep-alive`. + pub fn respect_keepalive(&mut self) { + if let Some(keepalive) = self.is_connection_keepalive() { + if keepalive { + let (timeout, _max_use) = self.get_keepalive_values(); + // TODO: respect max_use + match timeout { + Some(d) => self.set_keepalive(Some(d)), + None => self.set_keepalive(Some(0)), // infinite + } + } else { + self.set_keepalive(None); + } + } else if self.req_header().version == Version::HTTP_11 { + self.set_keepalive(Some(0)); // on by default for http 1.1 + } else { + self.set_keepalive(None); // off by default for http 1.0 + } + } + + fn init_body_writer(&mut self, header: &ResponseHeader) { + use http::StatusCode; + /* the following responses don't have body 204, 304, and HEAD */ + if matches!( + header.status, + StatusCode::NO_CONTENT | StatusCode::NOT_MODIFIED + ) || self.get_method() == Some(&Method::HEAD) + { + self.body_writer.init_content_length(0); + return; + } + + if header.status.is_informational() && header.status != StatusCode::SWITCHING_PROTOCOLS { + // 1xx response, not enough to init body + return; + } + + if self.is_upgrade(header) == Some(true) { + self.body_writer.init_http10(); + } else { + init_body_writer_comm(&mut self.body_writer, &header.headers); + } + } + + /// Same as [`Self::write_response_header()`] but takes a reference. + pub async fn write_response_header_ref(&mut self, resp: &ResponseHeader) -> Result<()> { + self.write_response_header(Box::new(resp.clone())).await + } + + async fn do_write_body(&mut self, buf: &[u8]) -> Result<Option<usize>> { + let written = self + .body_writer + .write_body(&mut self.underlying_stream, buf) + .await; + + if let Ok(Some(num_bytes)) = written { + self.body_bytes_sent += num_bytes; + } + + written + } + + /// Write response body to the client. 
Return `Ok(None)` when there shouldn't be more body + /// to be written, e.g., writing more bytes than what the `Content-Length` header suggests + pub async fn write_body(&mut self, buf: &[u8]) -> Result<Option<usize>> { + // TODO: check if the response header is written + match self.write_timeout { + Some(t) => match timeout(t, self.do_write_body(buf)).await { + Ok(res) => res, + Err(_) => Error::e_explain(WriteTimedout, format!("writing body, timeout: {t:?}")), + }, + None => self.do_write_body(buf).await, + } + } + + async fn write_body_buf(&mut self) -> Result<Option<usize>> { + // Don't flush empty chunks, they are considered end of body for chunks + if self.body_write_buf.is_empty() { + return Ok(None); + } + + let written = self + .body_writer + .write_body(&mut self.underlying_stream, &self.body_write_buf) + .await; + + if let Ok(Some(num_bytes)) = written { + self.body_bytes_sent += num_bytes; + } + + // make sure this buf is safe to reuse + self.body_write_buf.clear(); + + written + } + + fn maybe_force_close_body_reader(&mut self) { + if self.upgraded && !self.body_reader.body_done() { + // response is done, reset the request body to close + self.body_reader.init_content_length(0, b""); + } + } + + /// Signal that there is no more body to write. + /// This call will try to flush the buffer if there is any un-flushed data. + /// For chunked encoding response, this call will also send the last chunk. + /// For upgraded sessions, this call will also close the reading of the client body. + pub async fn finish_body(&mut self) -> Result<Option<usize>> { + let res = self.body_writer.finish(&mut self.underlying_stream).await?; + self.underlying_stream + .flush() + .await + .or_err(WriteError, "flushing body")?; + + self.maybe_force_close_body_reader(); + Ok(res) + } + + /// Return how many (application, not wire) body bytes that have been written + pub fn body_bytes_sent(&self) -> usize { + self.body_bytes_sent + } + + fn is_chunked_encoding(&self) -> bool { + is_header_value_chunked_encoding(self.get_header(header::TRANSFER_ENCODING)) + } + + fn get_content_length(&self) -> Option<usize> { + buf_to_content_length( + self.get_header(header::CONTENT_LENGTH) + .map(|v| v.as_bytes()), + ) + } + + fn init_body_reader(&mut self) { + if self.body_reader.need_init() { + // reset retry buffer + if let Some(buffer) = self.retry_buffer.as_mut() { + buffer.clear(); + } + + /* follow https://tools.ietf.org/html/rfc7230#section-3.3.3 */ + let preread_body = self.preread_body.as_ref().unwrap().get(&self.buf[..]); + + if self.req_header().version == Version::HTTP_11 && self.is_upgrade_req() { + self.body_reader.init_http10(preread_body); + return; + } + + if self.is_chunked_encoding() { + // if chunked encoding, content-length should be ignored + self.body_reader.init_chunked(preread_body); + } else { + let cl = self.get_content_length(); + match cl { + Some(i) => { + self.body_reader.init_content_length(i, preread_body); + } + None => { + match self.req_header().version { + Version::HTTP_11 => { + // Per RFC assume no body by default in HTTP 1.1 + self.body_reader.init_content_length(0, preread_body); + } + _ => { + self.body_reader.init_http10(preread_body); + } + } + } + } + } + } + } + + pub fn retry_buffer_truncated(&self) -> bool { + self.retry_buffer + .as_ref() + .map_or_else(|| false, |r| r.is_truncated()) + } + + pub fn enable_retry_buffering(&mut self) { + if self.retry_buffer.is_none() { + self.retry_buffer = Some(FixedBuffer::new(BODY_BUF_LIMIT)) + } + } + + pub fn get_retry_buffer(&self) 
-> Option<Bytes> {
+ self.retry_buffer.as_ref().and_then(|b| {
+ if b.is_truncated() {
+ None
+ } else {
+ b.get_buffer()
+ }
+ })
+ }
+
+ fn get_body(&self, buf_ref: &BufRef) -> &[u8] {
+ // TODO: these get_*() could panic. handle them better
+ self.body_reader.get_body(buf_ref)
+ }
+
+ /// This function will (async) block forever until the client closes the connection.
+ pub async fn idle(&mut self) -> Result<usize> {
+ // NOTE: this implementation breaks http pipelining, ideally we need poll_error
+ // NOTE: buf cannot be empty, openssl-rs read() requires a non-empty buf.
+ let mut buf: [u8; 1] = [0; 1];
+ self.underlying_stream
+ .read(&mut buf)
+ .await
+ .or_err(ReadError, "during HTTP idle state")
+ }
+
+ /// This function will return body bytes (same as [`Self::read_body_bytes()`]), but after
+ /// the client body finishes (`Ok(None)` is returned), calling this function again will block
+ /// forever, same as [`Self::idle()`].
+ pub async fn read_body_or_idle(&mut self, no_body_expected: bool) -> Result<Option<Bytes>> {
+ if no_body_expected || self.is_body_done() {
+ let read = self.idle().await?;
+ if read == 0 {
+ Error::e_explain(
+ ConnectionClosed,
+ if self.response_written.is_none() {
+ "Prematurely before response header is sent"
+ } else {
+ "Prematurely before response body is complete"
+ },
+ )
+ } else {
+ Error::e_explain(ConnectError, "Sent data after end of body")
+ }
+ } else {
+ self.read_body_bytes().await
+ }
+ }
+
+ /// Return the raw bytes of the request header.
+ pub fn get_headers_raw_bytes(&self) -> Bytes {
+ self.raw_header.as_ref().unwrap().get_bytes(&self.buf)
+ }
+
+ /// Close the connection abruptly. This allows signaling the client that the connection is closed
+ /// before dropping [`HttpSession`]
+ pub async fn shutdown(&mut self) {
+ let _ = self.underlying_stream.shutdown().await;
+ }
+
+ /// Set the server keepalive timeout.
+ /// `None`: disable keepalive, this session cannot be reused.
+ /// `Some(0)`: reusing this session is allowed and there is no timeout.
+ /// `Some(>0)`: reusing this session is allowed within the given timeout in seconds.
+ /// If the client disallows connection reuse, then `keepalive` will be ignored.
+ pub fn set_server_keepalive(&mut self, keepalive: Option<u64>) {
+ if let Some(false) = self.is_connection_keepalive() {
+ // connection: close is set
+ self.set_keepalive(None);
+ } else {
+ self.set_keepalive(keepalive);
+ }
+ }
+
+ /// Consume `self`; if the connection can be reused, the underlying stream will be returned
+ /// to be fed to the next [`Self::new()`]. The next session can just call [`Self::read_request()`].
+ /// If the connection cannot be reused, the underlying stream will be closed and `None` will be
+ /// returned.
+ pub async fn reuse(mut self) -> Option<Stream> {
+ // TODO: this function is unnecessarily slow for the keepalive case
+ // because that case does not need async
+ match self.keepalive_timeout {
+ KeepaliveStatus::Off => {
+ debug!("HTTP shutdown connection");
+ self.shutdown().await;
+ None
+ }
+ _ => Some(self.underlying_stream),
+ }
+ }
+
+ /// Return an error response to the client. This default error response comes with `cache-control: private, no-store`.
+ /// It has no response body.
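+ // NOTE (editorial addition, illustrative only): the keepalive loop implied by
+ // reuse() above; `serve_one` is a hypothetical per-request handler:
+ //
+ // let mut stream = accepted_stream;
+ // loop {
+ //     let mut session = HttpSession::new(stream);
+ //     if session.read_request().await?.is_none() { break; } // client closed
+ //     serve_one(&mut session).await?;
+ //     match session.reuse().await {
+ //         Some(s) => stream = s, // connection reusable for the next request
+ //         None => break,         // keepalive off; stream already shut down
+ //     }
+ // }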
+ pub async fn respond_error(&mut self, error_status_code: u16) {
+ let (resp, resp_tmp) = match error_status_code {
+ /* common error responses are pre-generated */
+ 502 => (Some(&*error_resp::HTTP_502_RESPONSE), None),
+ 400 => (Some(&*error_resp::HTTP_400_RESPONSE), None),
+ _ => (
+ None,
+ Some(error_resp::gen_error_response(error_status_code)),
+ ),
+ };
+
+ let resp = match resp {
+ Some(r) => r,
+ None => resp_tmp.as_ref().unwrap(),
+ };
+
+ self.write_response_header_ref(resp)
+ .await
+ .unwrap_or_else(|e| {
+ error!("failed to send error response to downstream: {}", e);
+ });
+ }
+
+ /// Write a `100 Continue` response to the client.
+ pub async fn write_continue_response(&mut self) -> Result<()> {
+ // only send if we haven't already
+ if self.response_written.is_none() {
+ // size hint Some(0) because the default is 8
+ return self
+ .write_response_header(Box::new(ResponseHeader::build(100, Some(0)).unwrap()))
+ .await;
+ }
+ Ok(())
+ }
+
+ async fn response_duplex(&mut self, task: HttpTask) -> Result<bool> {
+ match task {
+ HttpTask::Header(header, end_stream) => {
+ self.write_response_header(header)
+ .await
+ .map_err(|e| e.into_down())?;
+ Ok(end_stream)
+ }
+ HttpTask::Body(data, end_stream) => match data {
+ Some(d) => {
+ if !d.is_empty() {
+ self.write_body(&d).await.map_err(|e| e.into_down())?;
+ }
+ Ok(end_stream)
+ }
+ None => Ok(end_stream),
+ },
+ HttpTask::Trailer(_) => Ok(true), // h1 trailers are not supported yet
+ HttpTask::Done => {
+ self.finish_body().await.map_err(|e| e.into_down())?;
+ Ok(true)
+ }
+ HttpTask::Failed(e) => Err(e),
+ }
+ }
+
+ // TODO: use vectored write to avoid copying
+ pub async fn response_duplex_vec(&mut self, mut tasks: Vec<HttpTask>) -> Result<bool> {
+ let n_tasks = tasks.len();
+ if n_tasks == 1 {
+ // fall back to a single operation to avoid the copy
+ return self.response_duplex(tasks.pop().unwrap()).await;
+ }
+ let mut end_stream = false;
+ for task in tasks.into_iter() {
+ end_stream = match task {
+ HttpTask::Header(header, end_stream) => {
+ self.write_response_header(header)
+ .await
+ .map_err(|e| e.into_down())?;
+ end_stream
+ }
+ HttpTask::Body(data, end_stream) => match data {
+ Some(d) => {
+ if !d.is_empty() && !self.body_writer.finished() {
+ self.body_write_buf.put_slice(&d);
+ }
+ end_stream
+ }
+ None => end_stream,
+ },
+ HttpTask::Trailer(_) => true, // h1 trailers are not supported yet
+ HttpTask::Done => {
+ // flush the body first
+ self.write_body_buf().await.map_err(|e| e.into_down())?;
+ self.finish_body().await.map_err(|e| e.into_down())?;
+ return Ok(true);
+ }
+ HttpTask::Failed(e) => {
+ // flush the data we have and quit
+ self.write_body_buf().await.map_err(|e| e.into_down())?;
+ self.underlying_stream
+ .flush()
+ .await
+ .or_err(WriteError, "flushing response")?;
+ return Err(e);
+ }
+ }
+ }
+ self.write_body_buf().await.map_err(|e| e.into_down())?;
+ Ok(end_stream)
+ }
+}
+
+// Regex to parse a request line that has illegal chars in it
+static REQUEST_LINE_REGEX: Lazy<Regex> =
+ Lazy::new(|| Regex::new(r"^\w+ (?P<uri>.+) HTTP/\d(?:\.\d)?").unwrap());
+
+// the chars httparse considers illegal in a URL
+// Almost https://url.spec.whatwg.org/#query-percent-encode-set + {}
+const URI_ESC_CHARSET: &AsciiSet = &CONTROLS.add(b' ').add(b'<').add(b'>').add(b'"');
+
+fn escape_illegal_request_line(buf: &BytesMut) -> Option<BytesMut> {
+ if let Some(captures) = REQUEST_LINE_REGEX.captures(buf) {
+ // return if nothing matches: not a request line at all
+ let uri = captures.name("uri")?;
+
+ let escaped_uri =
percent_encode(uri.as_bytes(), URI_ESC_CHARSET); + + // rebuild the entire request buf in a new buffer + // TODO: this might be able to be done in place + + // need to be slightly bigger than the current buf; + let mut new_buf = BytesMut::with_capacity(buf.len() + 32); + new_buf.extend_from_slice(&buf[..uri.start()]); + + for s in escaped_uri { + new_buf.extend_from_slice(s.as_bytes()); + } + + if new_buf.len() == uri.end() { + // buf unchanged, nothing is escaped, return None to avoid loop + return None; + } + + new_buf.extend_from_slice(&buf[uri.end()..]); + + Some(new_buf) + } else { + None + } +} + +#[inline] +fn parse_req_buffer<'buf>( + req: &mut httparse::Request<'_, 'buf>, + buf: &'buf [u8], +) -> HeaderParseState { + use httparse::Result; + + #[cfg(feature = "patched_http1")] + fn parse<'buf>(req: &mut httparse::Request<'_, 'buf>, buf: &'buf [u8]) -> Result<usize> { + req.parse_unchecked(buf) + } + + #[cfg(not(feature = "patched_http1"))] + fn parse<'buf>(req: &mut httparse::Request<'_, 'buf>, buf: &'buf [u8]) -> Result<usize> { + req.parse(buf) + } + + let res = match parse(req, buf) { + Ok(s) => s, + Err(e) => { + return HeaderParseState::Invalid(e); + } + }; + match res { + httparse::Status::Complete(s) => HeaderParseState::Complete(s), + _ => HeaderParseState::Partial, + } +} + +#[inline] +fn http_resp_header_to_buf( + resp: &ResponseHeader, + buf: &mut BytesMut, +) -> std::result::Result<(), ()> { + // Status-Line + let version = match resp.version { + Version::HTTP_09 => "HTTP/0.9 ", + Version::HTTP_10 => "HTTP/1.0 ", + Version::HTTP_11 => "HTTP/1.1 ", + _ => { + return Err(()); /*TODO: unsupported version */ + } + }; + buf.put_slice(version.as_bytes()); + let status = resp.status; + buf.put_slice(status.as_str().as_bytes()); + buf.put_u8(b' '); + let reason = status.canonical_reason(); + if let Some(reason_buf) = reason { + buf.put_slice(reason_buf.as_bytes()); + } + buf.put_slice(CRLF); + + // headers + // TODO: style: make sure Server and Date headers are the first two + resp.header_to_h1_wire(buf); + + buf.put_slice(CRLF); + Ok(()) +} + +#[cfg(test)] +mod tests_stream { + use super::*; + use crate::protocols::http::v1::body::{BodyMode, ParseState}; + use http::{Method, StatusCode}; + use std::str; + use std::time::Duration; + use tokio_test::io::Builder; + + fn init_log() { + let _ = env_logger::builder().is_test(true).try_init(); + } + + #[tokio::test] + async fn read_basic() { + init_log(); + let input = b"GET / HTTP/1.1\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_request().await; + assert_eq!(input.len(), res.unwrap().unwrap()); + assert_eq!(0, http_stream.req_header().headers.len()); + } + + #[cfg(feature = "patched_http1")] + #[tokio::test] + async fn read_invalid_path() { + init_log(); + let input = b"GET /\x01\xF0\x90\x80 HTTP/1.1\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_request().await; + assert_eq!(input.len(), res.unwrap().unwrap()); + assert_eq!(0, http_stream.req_header().headers.len()); + assert_eq!(b"/\x01\xF0\x90\x80", http_stream.get_path()); + } + + #[tokio::test] + async fn read_2_buf() { + init_log(); + let input1 = b"GET / HTTP/1.1\r\n"; + let input2 = b"Host: pingora.org\r\n\r\n"; + let mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res 
= http_stream.read_request().await; + assert_eq!(input1.len() + input2.len(), res.unwrap().unwrap()); + assert_eq!( + input1.len() + input2.len(), + http_stream.raw_header.as_ref().unwrap().len() + ); + assert_eq!(1, http_stream.req_header().headers.len()); + assert_eq!(Some(&Method::GET), http_stream.get_method()); + assert_eq!(b"/", http_stream.get_path()); + assert_eq!(Version::HTTP_11, http_stream.req_header().version); + + assert_eq!(b"pingora.org", http_stream.get_header_bytes("Host")); + } + + #[tokio::test] + async fn read_with_body_content_length() { + init_log(); + let input1 = b"GET / HTTP/1.1\r\n"; + let input2 = b"Host: pingora.org\r\nContent-Length: 3\r\n\r\n"; + let input3 = b"abc"; + let mock_io = Builder::new() + .read(&input1[..]) + .read(&input2[..]) + .read(&input3[..]) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + let res = http_stream.read_body().await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 3)); + assert_eq!(http_stream.body_reader.body_state, ParseState::Complete(3)); + assert_eq!(input3, http_stream.get_body(&res)); + } + + #[tokio::test] + #[should_panic(expected = "There is still data left to read.")] + async fn read_with_body_timeout() { + init_log(); + let input1 = b"GET / HTTP/1.1\r\n"; + let input2 = b"Host: pingora.org\r\nContent-Length: 3\r\n\r\n"; + let input3 = b"abc"; + let mock_io = Builder::new() + .read(&input1[..]) + .read(&input2[..]) + .wait(Duration::from_secs(2)) + .read(&input3[..]) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_timeout = Some(Duration::from_secs(1)); + http_stream.read_request().await.unwrap(); + let res = http_stream.read_body().await; + assert_eq!(res.unwrap_err().etype(), &ReadTimedout); + } + + #[tokio::test] + async fn read_with_body_content_length_single_read() { + init_log(); + let input1 = b"GET / HTTP/1.1\r\n"; + let input2 = b"Host: pingora.org\r\nContent-Length: 3\r\n\r\nabc"; + let mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + let res = http_stream.read_body().await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 3)); + assert_eq!(http_stream.body_reader.body_state, ParseState::Complete(3)); + assert_eq!(b"abc", http_stream.get_body(&res)); + } + + #[tokio::test] + async fn read_with_body_http10() { + init_log(); + let input1 = b"GET / HTTP/1.0\r\n"; + let input2 = b"Host: pingora.org\r\n\r\n"; + let input3 = b"a"; + let input4 = b""; // simulating close + let mock_io = Builder::new() + .read(&input1[..]) + .read(&input2[..]) + .read(&input3[..]) + .read(&input4[..]) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + let res = http_stream.read_body().await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 1)); + assert_eq!(http_stream.body_reader.body_state, ParseState::HTTP1_0(1)); + assert_eq!(input3, http_stream.get_body(&res)); + let res = http_stream.read_body().await.unwrap(); + assert_eq!(res, None); + assert_eq!(http_stream.body_reader.body_state, ParseState::Complete(1)); + } + + #[tokio::test] + async fn read_with_body_http10_single_read() { + init_log(); + let input1 = b"GET / HTTP/1.0\r\n"; + let input2 = b"Host: pingora.org\r\n\r\na"; + let input3 = b"b"; + let input4 = b""; // simulating close + let mock_io = Builder::new() + .read(&input1[..]) + .read(&input2[..]) + 
.read(&input3[..]) + .read(&input4[..]) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + let res = http_stream.read_body().await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 1)); + assert_eq!(http_stream.body_reader.body_state, ParseState::HTTP1_0(1)); + assert_eq!(b"a", http_stream.get_body(&res)); + let res = http_stream.read_body().await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 1)); + assert_eq!(http_stream.body_reader.body_state, ParseState::HTTP1_0(2)); + assert_eq!(input3, http_stream.get_body(&res)); + let res = http_stream.read_body().await.unwrap(); + assert_eq!(res, None); + assert_eq!(http_stream.body_reader.body_state, ParseState::Complete(2)); + } + + #[tokio::test] + async fn read_http11_default_no_body() { + init_log(); + let input1 = b"GET / HTTP/1.1\r\n"; + let input2 = b"Host: pingora.org\r\n\r\n"; + let mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + let res = http_stream.read_body().await.unwrap(); + assert_eq!(res, None); + assert_eq!(http_stream.body_reader.body_state, ParseState::Complete(0)); + } + + #[tokio::test] + async fn read_with_body_chunked_0() { + init_log(); + let input1 = b"GET / HTTP/1.1\r\n"; + let input2 = b"Host: pingora.org\r\nTransfer-Encoding: chunked\r\n\r\n"; + let input3 = b"0\r\n"; + let mock_io = Builder::new() + .read(&input1[..]) + .read(&input2[..]) + .read(&input3[..]) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + assert!(http_stream.is_chunked_encoding()); + let res = http_stream.read_body().await.unwrap(); + assert_eq!(res, None); + assert_eq!(http_stream.body_reader.body_state, ParseState::Complete(0)); + } + + #[tokio::test] + async fn read_with_body_chunked_single_read() { + init_log(); + let input1 = b"GET / HTTP/1.1\r\n"; + let input2 = b"Host: pingora.org\r\nTransfer-Encoding: chunked\r\n\r\n1\r\na\r\n"; + let input3 = b"0\r\n\r\n"; + let mock_io = Builder::new() + .read(&input1[..]) + .read(&input2[..]) + .read(&input3[..]) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + assert!(http_stream.is_chunked_encoding()); + let res = http_stream.read_body().await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(3, 1)); + assert_eq!( + http_stream.body_reader.body_state, + ParseState::Chunked(1, 0, 0, 0) + ); + let res = http_stream.read_body().await.unwrap(); + assert_eq!(res, None); + assert_eq!(http_stream.body_reader.body_state, ParseState::Complete(1)); + } + + #[tokio::test] + #[should_panic(expected = "There is still data left to read.")] + async fn read_invalid() { + let input1 = b"GET / HTP/1.1\r\n"; + let input2 = b"Host: pingora.org\r\n\r\n"; + let mock_io = Builder::new().read(&input1[..]).read(&input2[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let res = http_stream.read_request().await; + assert_eq!(&InvalidHTTPHeader, res.unwrap_err().etype()); + } + + async fn build_req(upgrade: &str, conn: &str) -> HttpSession { + let input = format!("GET / HTTP/1.1\r\nHost: pingora.org\r\nUpgrade: {upgrade}\r\nConnection: {conn}\r\n\r\n"); + let mock_io = Builder::new().read(input.as_bytes()).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + http_stream + } + + #[tokio::test] + 
async fn read_upgrade_req() { + // http 1.0 + let input = b"GET / HTTP/1.0\r\nHost: pingora.org\r\nUpgrade: websocket\r\nConnection: upgrade\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + assert!(!http_stream.is_upgrade_req()); + + // different method + let input = b"POST / HTTP/1.1\r\nHost: pingora.org\r\nUpgrade: websocket\r\nConnection: upgrade\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + assert!(http_stream.is_upgrade_req()); + + // missing upgrade header + let input = b"GET / HTTP/1.1\r\nHost: pingora.org\r\nConnection: upgrade\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + assert!(!http_stream.is_upgrade_req()); + + // no connection header + let input = b"GET / HTTP/1.1\r\nHost: pingora.org\r\nUpgrade: WebSocket\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + assert!(http_stream.is_upgrade_req()); + + assert!(build_req("websocket", "Upgrade").await.is_upgrade_req()); + + // mixed case + assert!(build_req("WebSocket", "Upgrade").await.is_upgrade_req()); + } + + #[tokio::test] + async fn read_upgrade_req_with_1xx_response() { + let input = b"GET / HTTP/1.1\r\nHost: pingora.org\r\nUpgrade: websocket\r\nConnection: upgrade\r\n\r\n"; + let mock_io = Builder::new() + .read(&input[..]) + .write(b"HTTP/1.1 100 Continue\r\n\r\n") + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + assert!(http_stream.is_upgrade_req()); + let mut response = ResponseHeader::build(StatusCode::CONTINUE, None).unwrap(); + response.set_version(http::Version::HTTP_11); + http_stream + .write_response_header(Box::new(response)) + .await + .unwrap(); + // 100 won't affect body state + assert!(!http_stream.is_body_done()); + } + + #[tokio::test] + async fn set_server_keepalive() { + // close + let input = b"GET / HTTP/1.1\r\nHost: pingora.org\r\nConnection: close\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + // verify close + assert_eq!(http_stream.keepalive_timeout, KeepaliveStatus::Off); + http_stream.set_server_keepalive(Some(60)); + // verify no change on override + assert_eq!(http_stream.keepalive_timeout, KeepaliveStatus::Off); + + // explicit keep-alive + let input = b"GET / HTTP/1.1\r\nHost: pingora.org\r\nConnection: keep-alive\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + // default is infinite for 1.1 + http_stream.read_request().await.unwrap(); + assert_eq!(http_stream.keepalive_timeout, KeepaliveStatus::Infinite); + http_stream.set_server_keepalive(Some(60)); + // override respected + assert_eq!( + http_stream.keepalive_timeout, + KeepaliveStatus::Timeout(Duration::from_secs(60)) + ); + + // not specified + let input = b"GET / HTTP/1.1\r\nHost: pingora.org\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + 
// default is infinite for 1.1 + assert_eq!(http_stream.keepalive_timeout, KeepaliveStatus::Infinite); + http_stream.set_server_keepalive(Some(60)); + // override respected + assert_eq!( + http_stream.keepalive_timeout, + KeepaliveStatus::Timeout(Duration::from_secs(60)) + ); + } + + #[tokio::test] + async fn write() { + let wire = b"HTTP/1.1 200 OK\r\nFoo: Bar\r\n\r\n"; + let mock_io = Builder::new().write(wire).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let mut new_response = ResponseHeader::build(StatusCode::OK, None).unwrap(); + new_response.append_header("Foo", "Bar").unwrap(); + http_stream.update_resp_headers = false; + http_stream + .write_response_header_ref(&new_response) + .await + .unwrap(); + } + + #[tokio::test] + async fn write_informational() { + let wire = b"HTTP/1.1 100 Continue\r\n\r\nHTTP/1.1 200 OK\r\nFoo: Bar\r\n\r\n"; + let mock_io = Builder::new().write(wire).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let response_100 = ResponseHeader::build(StatusCode::CONTINUE, None).unwrap(); + http_stream + .write_response_header_ref(&response_100) + .await + .unwrap(); + let mut response_200 = ResponseHeader::build(StatusCode::OK, None).unwrap(); + response_200.append_header("Foo", "Bar").unwrap(); + http_stream.update_resp_headers = false; + http_stream + .write_response_header_ref(&response_200) + .await + .unwrap(); + } + + #[tokio::test] + async fn write_101_switching_protocol() { + let wire = b"HTTP/1.1 101 Switching Protocols\r\nFoo: Bar\r\n\r\n"; + let wire_body = b"nPAYLOAD"; + let mock_io = Builder::new().write(wire).write(wire_body).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let mut response_101 = + ResponseHeader::build(StatusCode::SWITCHING_PROTOCOLS, None).unwrap(); + response_101.append_header("Foo", "Bar").unwrap(); + http_stream + .write_response_header_ref(&response_101) + .await + .unwrap(); + let n = http_stream.write_body(wire_body).await.unwrap().unwrap(); + assert_eq!(wire_body.len(), n); + } + + #[tokio::test] + async fn write_body_cl() { + let wire_header = b"HTTP/1.1 200 OK\r\nContent-Length: 1\r\n\r\n"; + let wire_body = b"a"; + let mock_io = Builder::new().write(wire_header).write(wire_body).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let mut new_response = ResponseHeader::build(StatusCode::OK, None).unwrap(); + new_response.append_header("Content-Length", "1").unwrap(); + http_stream.update_resp_headers = false; + http_stream + .write_response_header_ref(&new_response) + .await + .unwrap(); + assert_eq!( + http_stream.body_writer.body_mode, + BodyMode::ContentLength(1, 0) + ); + let n = http_stream.write_body(wire_body).await.unwrap().unwrap(); + assert_eq!(wire_body.len(), n); + let n = http_stream.finish_body().await.unwrap().unwrap(); + assert_eq!(wire_body.len(), n); + } + + #[tokio::test] + async fn write_body_http10() { + let wire_header = b"HTTP/1.1 200 OK\r\n\r\n"; + let wire_body = b"a"; + let mock_io = Builder::new().write(wire_header).write(wire_body).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let new_response = ResponseHeader::build(StatusCode::OK, None).unwrap(); + http_stream.update_resp_headers = false; + http_stream + .write_response_header_ref(&new_response) + .await + .unwrap(); + assert_eq!(http_stream.body_writer.body_mode, BodyMode::HTTP1_0(0)); + let n = http_stream.write_body(wire_body).await.unwrap().unwrap(); + assert_eq!(wire_body.len(), n); + let n = 
http_stream.finish_body().await.unwrap().unwrap(); + assert_eq!(wire_body.len(), n); + } + + #[tokio::test] + async fn write_body_chunk() { + let wire_header = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"; + let wire_body = b"1\r\na\r\n"; + let wire_end = b"0\r\n\r\n"; + let mock_io = Builder::new() + .write(wire_header) + .write(wire_body) + .write(wire_end) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let mut new_response = ResponseHeader::build(StatusCode::OK, None).unwrap(); + new_response + .append_header("Transfer-Encoding", "chunked") + .unwrap(); + http_stream.update_resp_headers = false; + http_stream + .write_response_header_ref(&new_response) + .await + .unwrap(); + assert_eq!( + http_stream.body_writer.body_mode, + BodyMode::ChunkedEncoding(0) + ); + let n = http_stream.write_body(b"a").await.unwrap().unwrap(); + assert_eq!(b"a".len(), n); + let n = http_stream.finish_body().await.unwrap().unwrap(); + assert_eq!(b"a".len(), n); + } + + #[tokio::test] + async fn read_with_illegal() { + init_log(); + let input1 = b"GET /a?q=b c HTTP/1.1\r\n"; + let input2 = b"Host: pingora.org\r\nContent-Length: 3\r\n\r\n"; + let input3 = b"abc"; + let mock_io = Builder::new() + .read(&input1[..]) + .read(&input2[..]) + .read(&input3[..]) + .build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.read_request().await.unwrap(); + assert_eq!(http_stream.get_path(), &b"/a?q=b%20c"[..]); + let res = http_stream.read_body().await.unwrap().unwrap(); + assert_eq!(res, BufRef::new(0, 3)); + assert_eq!(http_stream.body_reader.body_state, ParseState::Complete(3)); + assert_eq!(input3, http_stream.get_body(&res)); + } + + #[test] + fn escape_illegal() { + init_log(); + // in query string + let input = BytesMut::from( + &b"GET /a?q=<\"b c\"> HTTP/1.1\r\nHost: pingora.org\r\nContent-Length: 3\r\n\r\n"[..], + ); + let output = escape_illegal_request_line(&input).unwrap(); + assert_eq!( + &output, + &b"GET /a?q=%3C%22b%20c%22%3E HTTP/1.1\r\nHost: pingora.org\r\nContent-Length: 3\r\n\r\n"[..] + ); + + // in path + let input = BytesMut::from( + &b"GET /a:\"bc\" HTTP/1.1\r\nHost: pingora.org\r\nContent-Length: 3\r\n\r\n"[..], + ); + let output = escape_illegal_request_line(&input).unwrap(); + assert_eq!( + &output, + &b"GET /a:%22bc%22 HTTP/1.1\r\nHost: pingora.org\r\nContent-Length: 3\r\n\r\n"[..] 
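+ // NOTE (editorial addition, illustrative only): URI_ESC_CHARSET percent-encodes
+ // controls plus space, '<', '>' and '"', which is what produces the expected
+ // bytes asserted here. In isolation:
+ //
+ // let s: String = percent_encode(b"<\"b c\">", URI_ESC_CHARSET).collect();
+ // assert_eq!(s, "%3C%22b%20c%22%3E");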
+ ); + + // empty uri, unable to parse + let input = + BytesMut::from(&b"GET HTTP/1.1\r\nHost: pingora.org\r\nContent-Length: 3\r\n\r\n"[..]); + assert!(escape_illegal_request_line(&input).is_none()); + } + + #[tokio::test] + async fn test_write_body_buf() { + let wire = b"HTTP/1.1 200 OK\r\nFoo: Bar\r\n\r\n"; + let mock_io = Builder::new().write(wire).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + let mut new_response = ResponseHeader::build(StatusCode::OK, None).unwrap(); + new_response.append_header("Foo", "Bar").unwrap(); + http_stream.update_resp_headers = false; + http_stream + .write_response_header_ref(&new_response) + .await + .unwrap(); + let written = http_stream.write_body_buf().await.unwrap(); + assert!(written.is_none()); + } + + #[tokio::test] + async fn test_write_continue_resp() { + let wire = b"HTTP/1.1 100 Continue\r\n\r\n"; + let mock_io = Builder::new().write(wire).build(); + let mut http_stream = HttpSession::new(Box::new(mock_io)); + http_stream.write_continue_response().await.unwrap(); + } + + #[test] + fn test_is_upgrade_resp() { + let mut response = ResponseHeader::build(StatusCode::SWITCHING_PROTOCOLS, None).unwrap(); + response.set_version(http::Version::HTTP_11); + response.insert_header("Upgrade", "websocket").unwrap(); + response.insert_header("Connection", "upgrade").unwrap(); + assert!(is_upgrade_resp(&response)); + + // wrong http version + response.set_version(http::Version::HTTP_10); + response.insert_header("Upgrade", "websocket").unwrap(); + response.insert_header("Connection", "upgrade").unwrap(); + assert!(!is_upgrade_resp(&response)); + + // not 101 + response.set_status(http::StatusCode::OK).unwrap(); + response.set_version(http::Version::HTTP_11); + assert!(!is_upgrade_resp(&response)); + } +} + +#[cfg(test)] +mod test_sync { + use super::*; + use http::StatusCode; + use log::{debug, error}; + use std::str; + + fn init_log() { + let _ = env_logger::builder().is_test(true).try_init(); + } + + #[test] + fn test_response_to_wire() { + init_log(); + let mut new_response = ResponseHeader::build(StatusCode::OK, None).unwrap(); + new_response.append_header("Foo", "Bar").unwrap(); + let mut wire = BytesMut::with_capacity(INIT_HEADER_BUF_SIZE); + http_resp_header_to_buf(&new_response, &mut wire).unwrap(); + debug!("{}", str::from_utf8(wire.as_ref()).unwrap()); + let mut headers = [httparse::EMPTY_HEADER; 128]; + let mut resp = httparse::Response::new(&mut headers); + let result = resp.parse(wire.as_ref()); + match result { + Ok(_) => {} + Err(e) => error!("{:?}", e), + } + assert!(result.unwrap().is_complete()); + // FIXME: the order is not guaranteed + assert_eq!(b"Foo", headers[0].name.as_bytes()); + assert_eq!(b"Bar", headers[0].value); + } +} diff --git a/pingora-core/src/protocols/http/v2/client.rs b/pingora-core/src/protocols/http/v2/client.rs new file mode 100644 index 0000000..48551ec --- /dev/null +++ b/pingora-core/src/protocols/http/v2/client.rs @@ -0,0 +1,480 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. + +//! HTTP/2 client session and connection +// TODO: this module needs a refactor + +use bytes::Bytes; +use h2::client::{self, ResponseFuture, SendRequest}; +use h2::{Reason, RecvStream, SendStream}; +use http::HeaderMap; +use log::{debug, error, warn}; +use pingora_error::{Error, ErrorType, ErrorType::*, OrErr, Result, RetryType}; +use pingora_http::{RequestHeader, ResponseHeader}; +use pingora_timeout::timeout; +use std::sync::atomic::{AtomicBool, Ordering}; +use std::sync::Arc; +use std::time::Duration; +use tokio::io::{AsyncRead, AsyncWrite}; +use tokio::sync::watch; + +use crate::connectors::http::v2::ConnectionRef; +use crate::protocols::Digest; + +pub const PING_TIMEDOUT: ErrorType = ErrorType::new("PingTimedout"); + +pub struct Http2Session { + send_req: SendRequest<Bytes>, + send_body: Option<SendStream<Bytes>>, + resp_fut: Option<ResponseFuture>, + req_sent: Option<Box<RequestHeader>>, + response_header: Option<ResponseHeader>, + response_body_reader: Option<RecvStream>, + /// The read timeout, which will be applied to both reading the header and the body. + /// The timeout is reset on every read. This is not a timeout on the overall duration of the + /// response. + pub read_timeout: Option<Duration>, + pub(crate) conn: ConnectionRef, + // Indicate that whether a END_STREAM is already sent + ended: bool, +} + +impl Drop for Http2Session { + fn drop(&mut self) { + self.conn.release_stream(); + } +} + +impl Http2Session { + pub(crate) fn new(send_req: SendRequest<Bytes>, conn: ConnectionRef) -> Self { + Http2Session { + send_req, + send_body: None, + resp_fut: None, + req_sent: None, + response_header: None, + response_body_reader: None, + read_timeout: None, + conn, + ended: false, + } + } + + fn sanitize_request_header(req: &mut RequestHeader) -> Result<()> { + req.set_version(http::Version::HTTP_2); + if req.uri.authority().is_some() { + return Ok(()); + } + // use host header to populate :authority field + let Some(authority) = req.headers.get(http::header::HOST).map(|v| v.as_bytes()) else { + return Error::e_explain(InvalidHTTPHeader, "no authority header for h2"); + }; + let uri = http::uri::Builder::new() + .scheme("https") // fixed for now + .authority(authority) + .path_and_query(req.uri.path_and_query().as_ref().unwrap().as_str()) + .build(); + match uri { + Ok(uri) => { + req.set_uri(uri); + Ok(()) + } + Err(_) => Error::e_explain( + InvalidHTTPHeader, + format!("invalid authority from host {authority:?}"), + ), + } + } + + /// Write the request header to the server + pub fn write_request_header(&mut self, mut req: Box<RequestHeader>, end: bool) -> Result<()> { + if self.req_sent.is_some() { + // cannot send again, TODO: warn + return Ok(()); + } + Self::sanitize_request_header(&mut req)?; + let parts = req.as_owned_parts(); + let request = http::Request::from_parts(parts, ()); + // There is no write timeout for h2 because the actual write happens async from this fn + let (resp_fut, send_body) = self + .send_req + .send_request(request, end) + .or_err(H2Error, "while sending request") + .map_err(|e| self.handle_err(e))?; + self.req_sent = Some(req); + self.send_body = Some(send_body); + self.resp_fut = Some(resp_fut); + self.ended = self.ended || end; + + Ok(()) + } + + /// Write a request body chunk + pub fn write_request_body(&mut self, data: Bytes, end: bool) -> Result<()> { + if self.ended { + warn!("Try to write request body after end of stream, dropping the 
extra data"); + return Ok(()); + } + + let body_writer = self + .send_body + .as_mut() + .expect("Try to write request body before sending request header"); + + write_body(body_writer, data, end).map_err(|e| self.handle_err(e))?; + self.ended = self.ended || end; + Ok(()) + } + + /// Signal that the request body has ended + pub fn finish_request_body(&mut self) -> Result<()> { + if self.ended { + return Ok(()); + } + + let body_writer = self + .send_body + .as_mut() + .expect("Try to finish request stream before sending request header"); + + // Just send an empty data frame with end of stream set + body_writer + .send_data("".into(), true) + .or_err(WriteError, "while writing empty h2 request body") + .map_err(|e| self.handle_err(e))?; + self.ended = true; + Ok(()) + } + + /// Read the response header + pub async fn read_response_header(&mut self) -> Result<()> { + // TODO: how to read 1xx headers? + // https://github.com/hyperium/h2/issues/167 + + if self.response_header.is_some() { + panic!("H2 response header is already read") + } + + let Some(resp_fut) = self.resp_fut.take() else { + panic!("Try to response header is already read") + }; + + let res = match self.read_timeout { + Some(t) => timeout(t, resp_fut) + .await + .map_err(|_| Error::explain(ReadTimedout, "while reading h2 response header")) + .map_err(|e| self.handle_err(e))?, + None => resp_fut.await, + }; + let (resp, body_reader) = res.map_err(handle_read_header_error)?.into_parts(); + self.response_header = Some(resp.into()); + self.response_body_reader = Some(body_reader); + + Ok(()) + } + + /// Read the response body + /// + /// `None` means, no more body to read + pub async fn read_response_body(&mut self) -> Result<Option<Bytes>> { + let Some(body_reader) = self.response_body_reader.as_mut() else { + // req is not sent or response is already read + // TODO: warn + return Ok(None); + }; + + if body_reader.is_end_stream() { + return Ok(None); + } + + let fut = body_reader.data(); + let res = match self.read_timeout { + Some(t) => timeout(t, fut) + .await + .map_err(|_| Error::explain(ReadTimedout, "while reading h2 response body"))?, + None => fut.await, + }; + let body = res + .transpose() + .or_err(ReadError, "while read h2 response body") + .map_err(|mut e| { + // cannot use handle_err() because of borrow checker + if self.conn.ping_timedout() { + e.etype = PING_TIMEDOUT; + } + e + })?; + + if let Some(data) = body.as_ref() { + body_reader + .flow_control() + .release_capacity(data.len()) + .or_err(ReadError, "while releasing h2 response body capacity")?; + } + + Ok(body) + } + + /// Whether the response has ended + pub fn response_finished(&self) -> bool { + // if response_body_reader doesn't exist, the response is not even read yet + self.response_body_reader + .as_ref() + .map_or(false, |reader| reader.is_end_stream()) + } + + /// Read the optional trailer headers + pub async fn read_trailers(&mut self) -> Result<Option<HeaderMap>> { + let Some(reader) = self.response_body_reader.as_mut() else { + // response is not even read + // TODO: warn + return Ok(None); + }; + let fut = reader.trailers(); + + let res = match self.read_timeout { + Some(t) => timeout(t, fut) + .await + .map_err(|_| Error::explain(ReadTimedout, "while reading h2 trailer")) + .map_err(|e| self.handle_err(e))?, + None => fut.await, + }; + match res { + Ok(t) => Ok(t), + Err(e) => { + // GOAWAY with no error: this is graceful shutdown, continue as if no trailer + // RESET_STREAM with no error: 
https://datatracker.ietf.org/doc/html/rfc9113#section-8.1: + // this is to signal client to stop uploading request without breaking the response. + // TODO: should actually stop uploading + // TODO: should we try reading again? + // TODO: handle this when reading headers and body as well + // https://github.com/hyperium/h2/issues/741 + + if (e.is_go_away() || e.is_reset()) + && e.is_remote() + && e.reason() == Some(Reason::NO_ERROR) + { + Ok(None) + } else { + Err(e) + } + } + } + .or_err(ReadError, "while reading h2 trailers") + } + + /// The response header if it is already read + pub fn response_header(&self) -> Option<&ResponseHeader> { + self.response_header.as_ref() + } + + /// Give up the http session abruptly. + pub fn shutdown(&mut self) { + if !self.ended || !self.response_finished() { + if let Some(send_body) = self.send_body.as_mut() { + send_body.send_reset(h2::Reason::INTERNAL_ERROR) + } + } + } + + /// Drop everything in this h2 stream. Return the connection ref. + /// After this function the underlying h2 connection should already notify the closure of this + /// stream so that another stream can be created if needed. + pub(crate) fn conn(&self) -> ConnectionRef { + self.conn.clone() + } + + /// Whether ping timeout occurred. After a ping timeout, the h2 connection will be terminated. + /// Ongoing h2 streams will receive an stream/connection error. The streams should check this + /// flag to tell whether the error is triggered by the timeout. + pub(crate) fn ping_timedout(&self) -> bool { + self.conn.ping_timedout() + } + + /// Return the [Digest] of the connection + /// + /// For reused connection, the timing in the digest will reflect its initial handshakes + /// The caller should check if the connection is reused to avoid misuse the timing field + pub fn digest(&self) -> Option<&Digest> { + Some(self.conn.digest()) + } + + /// the FD of the underlying connection + pub fn fd(&self) -> i32 { + self.conn.id() + } + + /// take the body sender to another task to perform duplex read and write + pub fn take_request_body_writer(&mut self) -> Option<SendStream<Bytes>> { + self.send_body.take() + } + + fn handle_err(&self, mut e: Box<Error>) -> Box<Error> { + if self.ping_timedout() { + e.etype = PING_TIMEDOUT; + } + e + } +} + +/// A helper function to write the request body +pub fn write_body(send_body: &mut SendStream<Bytes>, data: Bytes, end: bool) -> Result<()> { + let data_len = data.len(); + send_body.reserve_capacity(data_len); + send_body + .send_data(data, end) + .or_err(WriteError, "while writing h2 request body") +} + +/* helper functions */ + +/* Types of errors during h2 header read + 1. peer requests to downgrade to h1, mostly IIS server for NTLM: we will downgrade and retry + 2. peer sends invalid h2 frames, usually sending h1 only header: we will downgrade and retry + 3. peer sends GO_AWAY(NO_ERROR) on reused conn, usually hit http2_max_requests: we will retry + 4. peer IO error on reused conn, usually firewall kills old conn: we will retry + 5. 
All other errors will terminate the request +*/ +fn handle_read_header_error(e: h2::Error) -> Box<Error> { + if e.is_remote() + && e.reason() + .map_or(false, |r| r == h2::Reason::HTTP_1_1_REQUIRED) + { + let mut err = Error::because(H2Downgrade, "while reading h2 header", e); + err.retry = true.into(); + err + } else if e.is_go_away() + && e.is_library() + && e.reason() + .map_or(false, |r| r == h2::Reason::PROTOCOL_ERROR) + { + // remote send invalid H2 responses + let mut err = Error::because(InvalidH2, "while reading h2 header", e); + err.retry = true.into(); + err + } else if e.is_go_away() + && e.is_remote() + && e.reason().map_or(false, |r| r == h2::Reason::NO_ERROR) + { + // is_go_away: retry via another connection, this connection is being teardown + // only retry if the connection is reused + let mut err = Error::because(H2Error, "while reading h2 header", e); + err.retry = RetryType::ReusedOnly; + err + } else if e.is_io() { + // is_io: typical if a previously reused connection silently drops it + // only retry if the connection is reused + let true_io_error = e.get_io().unwrap().raw_os_error().is_some(); + let mut err = Error::because(ReadError, "while reading h2 header", e); + if true_io_error { + err.retry = RetryType::ReusedOnly; + } // else could be TLS error, which is unsafe to retry + err + } else { + Error::because(H2Error, "while reading h2 header", e) + } +} + +use tokio::sync::oneshot; + +pub async fn drive_connection<S>( + mut c: client::Connection<S>, + id: i32, + closed: watch::Sender<bool>, + ping_interval: Option<Duration>, + ping_timeout_occurred: Arc<AtomicBool>, +) where + S: AsyncRead + AsyncWrite + Send + Unpin, +{ + let interval = ping_interval.unwrap_or(Duration::ZERO); + if !interval.is_zero() { + // for ping to inform this fn to drop the connection + let (tx, rx) = oneshot::channel::<()>(); + // for this fn to inform ping to give up when it is already dropped + let dropped = Arc::new(AtomicBool::new(false)); + let dropped2 = dropped.clone(); + + if let Some(ping_pong) = c.ping_pong() { + pingora_runtime::current_handle().spawn(async move { + do_ping_pong(ping_pong, interval, tx, dropped2, id).await; + }); + } else { + warn!("Cannot get ping-pong handler from h2 connection"); + } + + tokio::select! 
{ + r = c => match r { + Ok(_) => debug!("H2 connection finished fd: {id}"), + Err(e) => debug!("H2 connection fd: {id} errored: {e:?}"), + }, + r = rx => match r { + Ok(_) => { + ping_timeout_occurred.store(true, Ordering::Relaxed); + warn!("H2 connection Ping timeout/Error fd: {id}, closing conn"); + }, + Err(e) => warn!("H2 connection Ping Rx error {e:?}"), + }, + }; + + dropped.store(true, Ordering::Relaxed); + } else { + match c.await { + Ok(_) => debug!("H2 connection finished fd: {id}"), + Err(e) => debug!("H2 connection fd: {id} errored: {e:?}"), + } + } + let _ = closed.send(true); +} + +const PING_TIMEOUT: Duration = Duration::from_secs(5); + +async fn do_ping_pong( + mut ping_pong: h2::PingPong, + interval: Duration, + tx: oneshot::Sender<()>, + dropped: Arc<AtomicBool>, + id: i32, +) { + // delay before sending the first ping, no need to race with the first request + tokio::time::sleep(interval).await; + loop { + if dropped.load(Ordering::Relaxed) { + break; + } + let ping_fut = ping_pong.ping(h2::Ping::opaque()); + debug!("H2 fd: {id} ping sent"); + match tokio::time::timeout(PING_TIMEOUT, ping_fut).await { + Err(_) => { + error!("H2 fd: {id} ping timeout"); + let _ = tx.send(()); + break; + } + Ok(r) => match r { + Ok(_) => { + debug!("H2 fd: {} pong received", id); + tokio::time::sleep(interval).await; + } + Err(e) => { + if dropped.load(Ordering::Relaxed) { + // drive_connection() exits first, no need to error again + break; + } + error!("H2 fd: {id} ping error: {e}"); + let _ = tx.send(()); + break; + } + }, + } + } +} diff --git a/pingora-core/src/protocols/http/v2/mod.rs b/pingora-core/src/protocols/http/v2/mod.rs new file mode 100644 index 0000000..8b10430 --- /dev/null +++ b/pingora-core/src/protocols/http/v2/mod.rs @@ -0,0 +1,18 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! HTTP/2 implementation + +pub mod client; +pub mod server; diff --git a/pingora-core/src/protocols/http/v2/server.rs b/pingora-core/src/protocols/http/v2/server.rs new file mode 100644 index 0000000..6ec75e8 --- /dev/null +++ b/pingora-core/src/protocols/http/v2/server.rs @@ -0,0 +1,488 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
HTTP/2 server session
+
+use bytes::Bytes;
+use futures::Future;
+use h2::server;
+use h2::server::SendResponse;
+use h2::{RecvStream, SendStream};
+use http::header::HeaderName;
+use http::{header, Response};
+use log::{debug, warn};
+use pingora_http::{RequestHeader, ResponseHeader};
+
+use crate::protocols::http::body_buffer::FixedBuffer;
+use crate::protocols::http::date::get_cached_date;
+use crate::protocols::http::v1::client::http_req_header_to_wire;
+use crate::protocols::http::HttpTask;
+use crate::protocols::Stream;
+use crate::{Error, ErrorType, OrErr, Result};
+
+const BODY_BUF_LIMIT: usize = 1024 * 64;
+
+type H2Connection<S> = server::Connection<S, Bytes>;
+
+pub use h2::server::Builder as H2Options;
+
+/// Perform HTTP/2 connection handshake with an established (TLS) connection.
+///
+/// The optional `options` allow adjusting certain HTTP/2 parameters and settings.
+/// See [`H2Options`] for more details.
+pub async fn handshake(io: Stream, options: Option<H2Options>) -> Result<H2Connection<Stream>> {
+    let options = options.unwrap_or_default();
+    let res = options.handshake(io).await;
+    match res {
+        Ok(connection) => {
+            debug!("H2 handshake done.");
+            Ok(connection)
+        }
+        Err(e) => Error::e_because(
+            ErrorType::HandshakeError,
+            "while h2 handshaking with client",
+            e,
+        ),
+    }
+}
+
+use futures::task::Context;
+use futures::task::Poll;
+use std::pin::Pin;
+/// The future to poll for an idle session.
+///
+/// Calling `.await` on this object will not return until the client decides to close this stream.
+pub struct Idle<'a>(&'a mut HttpSession);
+
+impl<'a> Future for Idle<'a> {
+    type Output = Result<h2::Reason>;
+
+    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
+        if let Some(body_writer) = self.0.send_response_body.as_mut() {
+            body_writer.poll_reset(cx)
+        } else {
+            self.0.send_response.poll_reset(cx)
+        }
+        .map_err(|e| Error::because(ErrorType::H2Error, "downstream error while idling", e))
+    }
+}
+
+/// HTTP/2 server session
+pub struct HttpSession {
+    request_header: RequestHeader,
+    request_body_reader: RecvStream,
+    send_response: SendResponse<Bytes>,
+    send_response_body: Option<SendStream<Bytes>>,
+    // Remember what has been written
+    response_written: Option<Box<ResponseHeader>>,
+    // Indicates whether an END_STREAM frame was already sent,
+    // in order to tell whether an extra frame is needed when this response finishes
+    ended: bool,
+    // How many request body bytes have been read so far.
+    body_read: usize,
+    // How many response body bytes have been sent so far.
+    body_sent: usize,
+    // buffered request body for retry logic
+    retry_buffer: Option<FixedBuffer>,
+}
+
+impl HttpSession {
+    /// Create a new [`HttpSession`] from the HTTP/2 connection.
+    /// This function returns a new HTTP/2 session when the provided HTTP/2 connection, `conn`,
+    /// establishes a new HTTP/2 stream to this server.
+    ///
+    /// Note: in order to handle all **existing** and new HTTP/2 sessions, the server must call
+    /// this function in a loop until the client decides to close the connection.
+    ///
+    /// `None` will be returned when the connection is closing so that the loop can exit.
+    pub async fn from_h2_conn(conn: &mut H2Connection<Stream>) -> Result<Option<Self>> {
+        // NOTE: conn.accept().await is what drives the entire connection.
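+        // Illustrative sketch, not part of the original commit: a typical server
+        // drives this in a loop, spawning a task per accepted stream, e.g.
+        //
+        //     while let Some(http) = HttpSession::from_h2_conn(&mut conn).await? {
+        //         tokio::spawn(async move { /* serve the stream */ });
+        //     }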
+        let res = conn.accept().await.transpose().or_err(
+            ErrorType::H2Error,
+            "while accepting new downstream requests",
+        )?;
+
+        Ok(res.map(|(req, send_response)| {
+            let (request_header, request_body_reader) = req.into_parts();
+            HttpSession {
+                request_header: request_header.into(),
+                request_body_reader,
+                send_response,
+                send_response_body: None,
+                response_written: None,
+                ended: false,
+                body_read: 0,
+                body_sent: 0,
+                retry_buffer: None,
+            }
+        }))
+    }
+
+    /// The request sent from the client
+    ///
+    /// Different from its HTTP/1.X counterpart, this function never panics, as the request was
+    /// already read when the new HTTP/2 stream was established.
+    pub fn req_header(&self) -> &RequestHeader {
+        &self.request_header
+    }
+
+    /// A mutable reference to the request sent from the client
+    ///
+    /// Different from its HTTP/1.X counterpart, this function never panics, as the request was
+    /// already read when the new HTTP/2 stream was established.
+    pub fn req_header_mut(&mut self) -> &mut RequestHeader {
+        &mut self.request_header
+    }
+
+    /// Read request body bytes. `None` when there is no more body to read.
+    pub async fn read_body_bytes(&mut self) -> Result<Option<Bytes>> {
+        // TODO: timeout
+        let data = self.request_body_reader.data().await.transpose().or_err(
+            ErrorType::ReadError,
+            "while reading downstream request body",
+        )?;
+        if let Some(data) = data.as_ref() {
+            self.body_read += data.len();
+            if let Some(buffer) = self.retry_buffer.as_mut() {
+                buffer.write_to_buffer(data);
+            }
+            let _ = self
+                .request_body_reader
+                .flow_control()
+                .release_capacity(data.len());
+        }
+        Ok(data)
+    }
+
+    // the write_* fns don't have timeouts because the actual writing happens on the connection,
+    // not here.
+
+    /// Write the response header to the client.
+    /// # The `end` flag
+    /// `end` marks the end of this session.
+    /// If the `end` flag is set, no more header or body can be sent to the client.
+    pub fn write_response_header(
+        &mut self,
+        mut header: Box<ResponseHeader>,
+        end: bool,
+    ) -> Result<()> {
+        if self.ended {
+            // TODO: error or warn?
+            return Ok(());
+        }
+
+        // FIXME: we should ignore 1xx headers because send_response() can only be called once
+        // https://github.com/hyperium/h2/issues/167
+
+        if let Some(resp) = self.response_written.as_ref() {
+            if !resp.status.is_informational() {
+                warn!("Response header is already sent, cannot send again");
+                return Ok(());
+            }
+        }
+
+        // no need to add these headers to 1xx responses
+        if !header.status.is_informational() {
+            /* update headers */
+            header.insert_header(header::DATE, get_cached_date())?;
+        }
+
+        // remove other h1 hop headers that cannot be present in H2
+        // https://httpwg.org/specs/rfc7540.html#n-connection-specific-header-fields
+        header.remove_header(&header::TRANSFER_ENCODING);
+        header.remove_header(&header::CONNECTION);
+        header.remove_header(&header::UPGRADE);
+        header.remove_header(&HeaderName::from_static("keep-alive"));
+        header.remove_header(&HeaderName::from_static("proxy-connection"));
+
+        let resp = Response::from_parts(header.as_owned_parts(), ());
+
+        let body_writer = self.send_response.send_response(resp, end).or_err(
+            ErrorType::WriteError,
+            "while writing h2 response to downstream",
+        )?;
+
+        self.response_written = Some(header);
+        self.send_response_body = Some(body_writer);
+        self.ended = self.ended || end;
+        Ok(())
+    }
+
+    /// Write response body to the client. See [Self::write_response_header] for how to use `end`.
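+    ///
+    /// # Example
+    /// A minimal sketch (illustrative, not from the original commit); assumes the response
+    /// header was already sent with `end == false`:
+    /// ```ignore
+    /// session.write_body(Bytes::from_static(b"hello"), false)?;
+    /// session.finish()?; // or pass `end == true` with the last chunk instead
+    /// ```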
+ pub fn write_body(&mut self, data: Bytes, end: bool) -> Result<()> { + if self.ended { + // NOTE: in h1, we also track to see if content-length matches the data + // We have not tracked that in h2 + warn!("Try to write body after end of stream, dropping the extra data"); + return Ok(()); + } + let Some(writer) = self.send_response_body.as_mut() else { + return Err(Error::explain( + ErrorType::H2Error, + "try to send body before header is sent", + )); + }; + let data_len = data.len(); + writer.reserve_capacity(data_len); + writer.send_data(data, end).or_err( + ErrorType::WriteError, + "while writing h2 response body to downstream", + )?; + self.body_sent += data_len; + self.ended = self.ended || end; + Ok(()) + } + + /// Similar to [Self::write_response_header], this function takes a reference instead + pub fn write_response_header_ref(&mut self, header: &ResponseHeader, end: bool) -> Result<()> { + self.write_response_header(Box::new(header.clone()), end) + } + + // TODO: trailer + + /// Mark the session end. If no `end` flag is already set before this call, this call will + /// signal the client. Otherwise this call does nothing. + /// + /// Dropping this object without sending `end` will cause an error to the client, which will cause + /// the client to treat this session as bad or incomplete. + pub fn finish(&mut self) -> Result<()> { + if self.ended { + // already ended the stream + return Ok(()); + } + if let Some(writer) = self.send_response_body.as_mut() { + // use an empty data frame to signal the end + writer.send_data("".into(), true).or_err( + ErrorType::WriteError, + "while writing h2 response body to downstream", + )?; + self.ended = true; + }; + // else: the response header is not sent, do nothing now. + // When send_response_body is dropped, an RST_STREAM will be sent + + Ok(()) + } + + pub fn response_duplex_vec(&mut self, tasks: Vec<HttpTask>) -> Result<bool> { + let mut end_stream = false; + for task in tasks.into_iter() { + end_stream = match task { + HttpTask::Header(header, end) => { + self.write_response_header(header, end) + .map_err(|e| e.into_down())?; + end + } + HttpTask::Body(data, end) => match data { + Some(d) => { + if !d.is_empty() { + self.write_body(d, end).map_err(|e| e.into_down())?; + } + end + } + None => end, + }, + HttpTask::Trailer(_) => true, // trailer is not supported yet + HttpTask::Done => { + self.finish().map_err(|e| e.into_down())?; + return Ok(true); + } + HttpTask::Failed(e) => { + return Err(e); + } + } || end_stream // safe guard in case `end` in tasks flips from true to false + } + Ok(end_stream) + } + + /// Return a string `$METHOD $PATH $HOST`. Mostly for logging and debug purpose + pub fn request_summary(&self) -> String { + format!( + "{} {}, Host: {}", + self.request_header.method, + self.request_header.uri, + self.request_header + .headers + .get(header::HOST) + .map(|v| String::from_utf8_lossy(v.as_bytes())) + .unwrap_or_default() + ) + } + + /// Return the written response header. `None` if it is not written yet. + pub fn response_written(&self) -> Option<&ResponseHeader> { + self.response_written.as_deref() + } + + /// Give up the stream abruptly. 
+ /// + /// This will send a `INTERNAL_ERROR` stream error to the client + pub fn shutdown(&mut self) { + if !self.ended { + self.send_response.send_reset(h2::Reason::INTERNAL_ERROR); + } + } + + // This is a hack for pingora-proxy to create subrequests from h2 server session + // TODO: be able to convert from h2 to h1 subrequest + pub fn pseudo_raw_h1_request_header(&self) -> Bytes { + let buf = http_req_header_to_wire(&self.request_header).unwrap(); // safe, None only when version unknown + buf.freeze() + } + + /// Whether there is no more body to read + pub fn is_body_done(&self) -> bool { + self.request_body_reader.is_end_stream() + } + + /// Whether there is any body to read. + pub fn is_body_empty(&self) -> bool { + self.body_read == 0 + && (self.is_body_done() + || self + .request_header + .headers + .get(header::CONTENT_LENGTH) + .map_or(false, |cl| cl.as_bytes() == b"0")) + } + + pub fn retry_buffer_truncated(&self) -> bool { + self.retry_buffer + .as_ref() + .map_or_else(|| false, |r| r.is_truncated()) + } + + pub fn enable_retry_buffering(&mut self) { + if self.retry_buffer.is_none() { + self.retry_buffer = Some(FixedBuffer::new(BODY_BUF_LIMIT)) + } + } + + pub fn get_retry_buffer(&self) -> Option<Bytes> { + self.retry_buffer.as_ref().and_then(|b| { + if b.is_truncated() { + None + } else { + b.get_buffer() + } + }) + } + + /// `async fn idle() -> Result<Reason, Error>;` + /// This async fn will be pending forever until the client closes the stream/connection + /// This function is used for watching client status so that the server is able to cancel + /// its internal tasks as the client waiting for the tasks goes away + pub fn idle(&mut self) -> Idle { + Idle(self) + } + + /// Similar to `read_body_bytes()` but will be pending after Ok(None) is returned, + /// until the client closes the connection + pub async fn read_body_or_idle(&mut self, no_body_expected: bool) -> Result<Option<Bytes>> { + if no_body_expected || self.is_body_done() { + let reason = self.idle().await?; + Error::e_explain( + ErrorType::H2Error, + format!("Client closed H2, reason: {reason}"), + ) + } else { + self.read_body_bytes().await + } + } + + /// How many response body bytes sent to the client + pub fn body_bytes_sent(&self) -> usize { + self.body_sent + } +} + +#[cfg(test)] +mod test { + use super::*; + use http::{Method, Request}; + use tokio::io::duplex; + + #[tokio::test] + async fn test_server_handshake_accept_request() { + let (client, server) = duplex(65536); + let client_body = "test client body"; + let server_body = "test server body"; + + tokio::spawn(async move { + let (h2, connection) = h2::client::handshake(client).await.unwrap(); + tokio::spawn(async move { + connection.await.unwrap(); + }); + + let mut h2 = h2.ready().await.unwrap(); + + let request = Request::builder() + .method(Method::GET) + .uri("https://www.example.com/") + .body(()) + .unwrap(); + + let (response, mut req_body) = h2.send_request(request, false).unwrap(); + req_body.reserve_capacity(client_body.len()); + req_body.send_data(client_body.into(), true).unwrap(); + + let (head, mut body) = response.await.unwrap().into_parts(); + assert_eq!(head.status, 200); + let data = body.data().await.unwrap().unwrap(); + assert_eq!(data, server_body); + }); + + let mut connection = handshake(Box::new(server), None).await.unwrap(); + + while let Some(mut http) = HttpSession::from_h2_conn(&mut connection).await.unwrap() { + tokio::spawn(async move { + let req = http.req_header(); + assert_eq!(req.method, Method::GET); + 
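+                // The request header is available as soon as the stream is accepted;
+                // the body is streamed separately via read_body_or_idle() below.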
assert_eq!(req.uri, "https://www.example.com/"); + + http.enable_retry_buffering(); + + assert!(!http.is_body_empty()); + assert!(!http.is_body_done()); + + let body = http.read_body_or_idle(false).await.unwrap().unwrap(); + assert_eq!(body, client_body); + assert!(http.is_body_done()); + + let retry_body = http.get_retry_buffer().unwrap(); + assert_eq!(retry_body, client_body); + + // test idling before response header is sent + tokio::select! { + _ = http.idle() => {panic!("downstream should be idling")}, + _= tokio::time::sleep(tokio::time::Duration::from_secs(1)) => {} + } + + let response_header = Box::new(ResponseHeader::build(200, None).unwrap()); + http.write_response_header(response_header, false).unwrap(); + + // test idling after response header is sent + tokio::select! { + _ = http.read_body_or_idle(false) => {panic!("downstream should be idling")}, + _= tokio::time::sleep(tokio::time::Duration::from_secs(1)) => {} + } + + // end: false here to verify finish() closes the stream nicely + http.write_body(server_body.into(), false).unwrap(); + + http.finish().unwrap(); + }); + } + } +} diff --git a/pingora-core/src/protocols/l4/ext.rs b/pingora-core/src/protocols/l4/ext.rs new file mode 100644 index 0000000..331a852 --- /dev/null +++ b/pingora-core/src/protocols/l4/ext.rs @@ -0,0 +1,297 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
Extensions to the regular TCP APIs + +#![allow(non_camel_case_types)] + +use libc::socklen_t; +#[cfg(target_os = "linux")] +use libc::{c_int, c_void}; +use pingora_error::{Error, ErrorType::*, OrErr, Result}; +use std::io::{self, ErrorKind}; +use std::mem; +use std::net::SocketAddr; +use std::os::unix::io::{AsRawFd, RawFd}; +use std::time::Duration; +use tokio::net::{TcpSocket, TcpStream, UnixStream}; + +/// The (copy of) the kernel struct tcp_info returns +#[repr(C)] +#[derive(Copy, Clone, Debug)] +pub struct TCP_INFO { + tcpi_state: u8, + tcpi_ca_state: u8, + tcpi_retransmits: u8, + tcpi_probes: u8, + tcpi_backoff: u8, + tcpi_options: u8, + tcpi_snd_wscale_4_rcv_wscale_4: u8, + tcpi_delivery_rate_app_limited: u8, + tcpi_rto: u32, + tcpi_ato: u32, + tcpi_snd_mss: u32, + tcpi_rcv_mss: u32, + tcpi_unacked: u32, + tcpi_sacked: u32, + tcpi_lost: u32, + tcpi_retrans: u32, + tcpi_fackets: u32, + tcpi_last_data_sent: u32, + tcpi_last_ack_sent: u32, + tcpi_last_data_recv: u32, + tcpi_last_ack_recv: u32, + tcpi_pmtu: u32, + tcpi_rcv_ssthresh: u32, + pub tcpi_rtt: u32, + tcpi_rttvar: u32, + /* uncomment these field if needed + tcpi_snd_ssthresh: u32, + tcpi_snd_cwnd: u32, + tcpi_advmss: u32, + tcpi_reordering: u32, + tcpi_rcv_rtt: u32, + tcpi_rcv_space: u32, + tcpi_total_retrans: u32, + tcpi_pacing_rate: u64, + tcpi_max_pacing_rate: u64, + tcpi_bytes_acked: u64, + tcpi_bytes_received: u64, + tcpi_segs_out: u32, + tcpi_segs_in: u32, + tcpi_notsent_bytes: u32, + tcpi_min_rtt: u32, + tcpi_data_segs_in: u32, + tcpi_data_segs_out: u32, + tcpi_delivery_rate: u64, + */ + /* and more, see include/linux/tcp.h */ +} + +impl TCP_INFO { + /// Create a new zeroed out [`TCP_INFO`] + pub unsafe fn new() -> Self { + mem::zeroed() + } + + /// Return the size of [`TCP_INFO`] + pub fn len() -> socklen_t { + mem::size_of::<Self>() as socklen_t + } +} + +#[cfg(target_os = "linux")] +fn set_opt<T: Copy>(sock: c_int, opt: c_int, val: c_int, payload: T) -> io::Result<()> { + unsafe { + let payload = &payload as *const T as *const c_void; + cvt_linux_error(libc::setsockopt( + sock, + opt, + val, + payload as *const _, + mem::size_of::<T>() as socklen_t, + ))?; + Ok(()) + } +} + +#[cfg(target_os = "linux")] +fn get_opt<T>( + sock: c_int, + opt: c_int, + val: c_int, + payload: &mut T, + size: &mut socklen_t, +) -> io::Result<()> { + unsafe { + let payload = payload as *mut T as *mut c_void; + cvt_linux_error(libc::getsockopt(sock, opt, val, payload as *mut _, size))?; + Ok(()) + } +} + +#[cfg(target_os = "linux")] +fn cvt_linux_error(t: i32) -> io::Result<i32> { + if t == -1 { + Err(io::Error::last_os_error()) + } else { + Ok(t) + } +} + +#[cfg(target_os = "linux")] +fn ip_bind_addr_no_port(fd: RawFd, val: bool) -> io::Result<()> { + const IP_BIND_ADDRESS_NO_PORT: i32 = 24; + + set_opt(fd, libc::IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, val as c_int) +} + +#[cfg(not(target_os = "linux"))] +fn ip_bind_addr_no_port(_fd: RawFd, _val: bool) -> io::Result<()> { + Ok(()) +} + +#[cfg(target_os = "linux")] +fn set_so_keepalive(fd: RawFd, val: bool) -> io::Result<()> { + set_opt(fd, libc::SOL_SOCKET, libc::SO_KEEPALIVE, val as c_int) +} + +#[cfg(target_os = "linux")] +fn set_so_keepalive_idle(fd: RawFd, val: Duration) -> io::Result<()> { + set_opt( + fd, + libc::IPPROTO_TCP, + libc::TCP_KEEPIDLE, + val.as_secs() as c_int, // only the seconds part of val is used + ) +} + +#[cfg(target_os = "linux")] +fn set_so_keepalive_interval(fd: RawFd, val: Duration) -> io::Result<()> { + set_opt( + fd, + libc::IPPROTO_TCP, + libc::TCP_KEEPINTVL, + 
val.as_secs() as c_int, // only the seconds part of val is used + ) +} + +#[cfg(target_os = "linux")] +fn set_so_keepalive_count(fd: RawFd, val: usize) -> io::Result<()> { + set_opt(fd, libc::IPPROTO_TCP, libc::TCP_KEEPCNT, val as c_int) +} + +#[cfg(target_os = "linux")] +fn set_keepalive(fd: RawFd, ka: &TcpKeepalive) -> io::Result<()> { + set_so_keepalive(fd, true)?; + set_so_keepalive_idle(fd, ka.idle)?; + set_so_keepalive_interval(fd, ka.interval)?; + set_so_keepalive_count(fd, ka.count) +} + +#[cfg(not(target_os = "linux"))] +fn set_keepalive(_fd: RawFd, _ka: &TcpKeepalive) -> io::Result<()> { + Ok(()) +} + +/// Get the kernel TCP_INFO for the given FD. +#[cfg(target_os = "linux")] +pub fn get_tcp_info(fd: RawFd) -> io::Result<TCP_INFO> { + let mut tcp_info = unsafe { TCP_INFO::new() }; + let mut data_len: socklen_t = TCP_INFO::len(); + get_opt( + fd, + libc::IPPROTO_TCP, + libc::TCP_INFO, + &mut tcp_info, + &mut data_len, + )?; + if data_len != TCP_INFO::len() { + return Err(std::io::Error::new( + std::io::ErrorKind::Other, + "TCP_INFO struct size mismatch", + )); + } + Ok(tcp_info) +} + +#[cfg(not(target_os = "linux"))] +pub fn get_tcp_info(_fd: RawFd) -> io::Result<TCP_INFO> { + Ok(unsafe { TCP_INFO::new() }) +} + +/* + * this extention is needed until the following are addressed + * https://github.com/tokio-rs/tokio/issues/1543 + * https://github.com/tokio-rs/mio/issues/1257 + * https://github.com/tokio-rs/mio/issues/1211 + */ +/// connect() to the given address while optionally bind to the specific source address +/// +/// `IP_BIND_ADDRESS_NO_PORT` is used. +pub async fn connect(addr: &SocketAddr, bind_to: Option<&SocketAddr>) -> Result<TcpStream> { + let socket = if addr.is_ipv4() { + TcpSocket::new_v4() + } else { + TcpSocket::new_v6() + } + .or_err(SocketError, "failed to create socket")?; + + if cfg!(target_os = "linux") { + ip_bind_addr_no_port(socket.as_raw_fd(), true) + .or_err(SocketError, "failed to set socket opts")?; + + if let Some(baddr) = bind_to { + socket + .bind(*baddr) + .or_err_with(BindError, || format!("failed to bind to socket {}", *baddr))?; + }; + } + // TODO: add support for bind on other platforms + + socket + .connect(*addr) + .await + .map_err(|e| wrap_os_connect_error(e, format!("Fail to connect to {}", *addr))) +} + +/// connect() to the given Unix domain socket +pub async fn connect_uds(path: &std::path::Path) -> Result<UnixStream> { + UnixStream::connect(path) + .await + .map_err(|e| wrap_os_connect_error(e, format!("Fail to connect to {}", path.display()))) +} + +fn wrap_os_connect_error(e: std::io::Error, context: String) -> Box<Error> { + match e.kind() { + ErrorKind::ConnectionRefused => Error::because(ConnectRefused, context, e), + ErrorKind::TimedOut => Error::because(ConnectTimedout, context, e), + ErrorKind::PermissionDenied | ErrorKind::AddrInUse | ErrorKind::AddrNotAvailable => { + Error::because(InternalError, context, e) + } + _ => match e.raw_os_error() { + Some(code) => match code { + libc::ENETUNREACH | libc::EHOSTUNREACH => { + Error::because(ConnectNoRoute, context, e) + } + _ => Error::because(ConnectError, context, e), + }, + None => Error::because(ConnectError, context, e), + }, + } +} + +/// The configuration for TCP keepalive +#[derive(Clone, Debug)] +pub struct TcpKeepalive { + /// The time a connection needs to be idle before TCP begins sending out keep-alive probes. + pub idle: Duration, + /// The number of seconds between TCP keep-alive probes. 
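+    /// Note: only whole seconds of this value are applied, since it maps to `TCP_KEEPINTVL`.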
+ pub interval: Duration, + /// The maximum number of TCP keep-alive probes to send before giving up and killing the connection + pub count: usize, +} + +impl std::fmt::Display for TcpKeepalive { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + write!(f, "{:?}/{:?}/{}", self.idle, self.interval, self.count) + } +} + +/// Apply the given TCP keepalive settings to the given connection +pub fn set_tcp_keepalive(stream: &TcpStream, ka: &TcpKeepalive) -> Result<()> { + let fd = stream.as_raw_fd(); + // TODO: check localhost or if keepalive is already set + set_keepalive(fd, ka).or_err(ConnectError, "failed to set keepalive") +} diff --git a/pingora-core/src/protocols/l4/listener.rs b/pingora-core/src/protocols/l4/listener.rs new file mode 100644 index 0000000..6473fb4 --- /dev/null +++ b/pingora-core/src/protocols/l4/listener.rs @@ -0,0 +1,59 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Listeners + +use std::io; +use std::os::unix::io::AsRawFd; +use tokio::net::{TcpListener, UnixListener}; + +use crate::protocols::l4::stream::Stream; + +/// The type for generic listener for both TCP and Unix domain socket +#[derive(Debug)] +pub enum Listener { + Tcp(TcpListener), + Unix(UnixListener), +} + +impl From<TcpListener> for Listener { + fn from(s: TcpListener) -> Self { + Self::Tcp(s) + } +} + +impl From<UnixListener> for Listener { + fn from(s: UnixListener) -> Self { + Self::Unix(s) + } +} + +impl AsRawFd for Listener { + fn as_raw_fd(&self) -> std::os::unix::prelude::RawFd { + match &self { + Self::Tcp(l) => l.as_raw_fd(), + Self::Unix(l) => l.as_raw_fd(), + } + } +} + +impl Listener { + /// Accept a connection from the listening endpoint + pub async fn accept(&self) -> io::Result<Stream> { + match &self { + Self::Tcp(l) => l.accept().await.map(|(stream, _)| stream.into()), + Self::Unix(l) => l.accept().await.map(|(stream, _)| stream.into()), + } + } +} diff --git a/pingora-core/src/protocols/l4/mod.rs b/pingora-core/src/protocols/l4/mod.rs new file mode 100644 index 0000000..cfa65e0 --- /dev/null +++ b/pingora-core/src/protocols/l4/mod.rs @@ -0,0 +1,20 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
Transport layer protocol implementation
+
+pub mod ext;
+pub mod listener;
+pub mod socket;
+pub mod stream;
diff --git a/pingora-core/src/protocols/l4/socket.rs b/pingora-core/src/protocols/l4/socket.rs
new file mode 100644
index 0000000..02eab36
--- /dev/null
+++ b/pingora-core/src/protocols/l4/socket.rs
@@ -0,0 +1,185 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Generic socket type
+
+use crate::{Error, OrErr};
+use std::cmp::Ordering;
+use std::hash::{Hash, Hasher};
+use std::net::SocketAddr as StdSockAddr;
+use std::os::unix::net::SocketAddr as StdUnixSockAddr;
+
+/// [`SocketAddr`] is a storage type that contains either an Internet (IP address)
+/// socket address or a Unix domain socket address.
+#[derive(Debug, Clone)]
+pub enum SocketAddr {
+    Inet(StdSockAddr),
+    Unix(StdUnixSockAddr),
+}
+
+impl SocketAddr {
+    /// Get a reference to the IP socket if it is one
+    pub fn as_inet(&self) -> Option<&StdSockAddr> {
+        if let SocketAddr::Inet(addr) = self {
+            Some(addr)
+        } else {
+            None
+        }
+    }
+
+    /// Get a reference to the Unix domain socket if it is one
+    pub fn as_unix(&self) -> Option<&StdUnixSockAddr> {
+        if let SocketAddr::Unix(addr) = self {
+            Some(addr)
+        } else {
+            None
+        }
+    }
+
+    /// Set the port if the address is an IP socket.
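+    ///
+    /// # Example
+    /// An illustrative sketch (not part of the original commit):
+    /// ```ignore
+    /// let mut addr: SocketAddr = "127.0.0.1:0".parse().unwrap();
+    /// addr.set_port(8080); // no-op for Unix domain socket addresses
+    /// ```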
+ pub fn set_port(&mut self, port: u16) { + if let SocketAddr::Inet(addr) = self { + addr.set_port(port) + } + } +} + +impl std::fmt::Display for SocketAddr { + fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { + match self { + SocketAddr::Inet(addr) => write!(f, "{addr}"), + SocketAddr::Unix(addr) => { + if let Some(path) = addr.as_pathname() { + write!(f, "{}", path.display()) + } else { + write!(f, "{addr:?}") + } + } + } + } +} + +impl Hash for SocketAddr { + fn hash<H: Hasher>(&self, state: &mut H) { + match self { + Self::Inet(sockaddr) => sockaddr.hash(state), + Self::Unix(sockaddr) => { + if let Some(path) = sockaddr.as_pathname() { + // use the underlying path as the hash + path.hash(state); + } else { + // unnamed or abstract UDS + // abstract UDS name not yet exposed by std API + // panic for now, we can decide on the right way to hash them later + panic!("Unnamed and abstract UDS types not yet supported for hashing") + } + } + } + } +} + +impl PartialEq for SocketAddr { + fn eq(&self, other: &Self) -> bool { + match self { + Self::Inet(addr) => Some(addr) == other.as_inet(), + Self::Unix(addr) => { + let path = addr.as_pathname(); + // can only compare UDS with path, assume false on all unnamed UDS + path.is_some() && path == other.as_unix().and_then(|addr| addr.as_pathname()) + } + } + } +} + +impl PartialOrd for SocketAddr { + fn partial_cmp(&self, other: &Self) -> Option<Ordering> { + Some(self.cmp(other)) + } +} + +impl Ord for SocketAddr { + fn cmp(&self, other: &Self) -> Ordering { + match self { + Self::Inet(addr) => { + if let Some(o) = other.as_inet() { + addr.cmp(o) + } else { + // always make Inet < Unix "smallest for variants at the top" + Ordering::Less + } + } + Self::Unix(addr) => { + if let Some(o) = other.as_unix() { + // NOTE: unnamed UDS are consider the same + addr.as_pathname().cmp(&o.as_pathname()) + } else { + // always make Inet < Unix "smallest for variants at the top" + Ordering::Greater + } + } + } + } +} + +impl Eq for SocketAddr {} + +impl std::str::FromStr for SocketAddr { + type Err = Box<Error>; + + // This is very basic parsing logic, it might treat invalid IP:PORT str as UDS path + // TODO: require UDS to have some prefix + fn from_str(s: &str) -> Result<Self, Self::Err> { + match StdSockAddr::from_str(s) { + Ok(addr) => Ok(SocketAddr::Inet(addr)), + Err(_) => { + let uds_socket = StdUnixSockAddr::from_pathname(s) + .or_err(crate::BindError, "invalid UDS path")?; + Ok(SocketAddr::Unix(uds_socket)) + } + } + } +} + +impl std::net::ToSocketAddrs for SocketAddr { + type Iter = std::iter::Once<StdSockAddr>; + + // Error if UDS addr + fn to_socket_addrs(&self) -> std::io::Result<Self::Iter> { + if let Some(inet) = self.as_inet() { + Ok(std::iter::once(*inet)) + } else { + Err(std::io::Error::new( + std::io::ErrorKind::Other, + "UDS socket cannot be used as inet socket", + )) + } + } +} + +#[cfg(test)] +mod test { + use super::*; + + #[test] + fn parse_ip() { + let ip: SocketAddr = "127.0.0.1:80".parse().unwrap(); + assert!(ip.as_inet().is_some()); + } + + #[test] + fn parse_uds() { + let uds: SocketAddr = "/tmp/my.sock".parse().unwrap(); + assert!(uds.as_unix().is_some()); + } +} diff --git a/pingora-core/src/protocols/l4/stream.rs b/pingora-core/src/protocols/l4/stream.rs new file mode 100644 index 0000000..ebe4f67 --- /dev/null +++ b/pingora-core/src/protocols/l4/stream.rs @@ -0,0 +1,378 @@ +// Copyright 2024 Cloudflare, Inc. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Transport layer connection + +use async_trait::async_trait; +use futures::FutureExt; +use log::{debug, error}; +use pingora_error::{ErrorType::*, OrErr, Result}; +use std::os::unix::io::AsRawFd; +use std::pin::Pin; +use std::sync::Arc; +use std::task::{Context, Poll}; +use std::time::SystemTime; +use tokio::io::{self, AsyncRead, AsyncWrite, AsyncWriteExt, BufStream, ReadBuf}; +use tokio::net::{TcpStream, UnixStream}; + +use crate::protocols::raw_connect::ProxyDigest; +use crate::protocols::{GetProxyDigest, GetTimingDigest, Shutdown, Ssl, TimingDigest, UniqueID}; +use crate::upstreams::peer::Tracer; + +#[derive(Debug)] +enum RawStream { + Tcp(TcpStream), + Unix(UnixStream), +} + +impl AsyncRead for RawStream { + fn poll_read( + self: Pin<&mut Self>, + cx: &mut Context<'_>, + buf: &mut ReadBuf<'_>, + ) -> Poll<io::Result<()>> { + // Safety: Basic enum pin projection + unsafe { + match &mut Pin::get_unchecked_mut(self) { + RawStream::Tcp(s) => Pin::new_unchecked(s).poll_read(cx, buf), + RawStream::Unix(s) => Pin::new_unchecked(s).poll_read(cx, buf), + } + } + } +} + +impl AsyncWrite for RawStream { + fn poll_write(self: Pin<&mut Self>, cx: &mut Context, buf: &[u8]) -> Poll<io::Result<usize>> { + // Safety: Basic enum pin projection + unsafe { + match &mut Pin::get_unchecked_mut(self) { + RawStream::Tcp(s) => Pin::new_unchecked(s).poll_write(cx, buf), + RawStream::Unix(s) => Pin::new_unchecked(s).poll_write(cx, buf), + } + } + } + + fn poll_flush(self: Pin<&mut Self>, cx: &mut Context) -> Poll<io::Result<()>> { + // Safety: Basic enum pin projection + unsafe { + match &mut Pin::get_unchecked_mut(self) { + RawStream::Tcp(s) => Pin::new_unchecked(s).poll_flush(cx), + RawStream::Unix(s) => Pin::new_unchecked(s).poll_flush(cx), + } + } + } + + fn poll_shutdown(self: Pin<&mut Self>, cx: &mut Context) -> Poll<io::Result<()>> { + // Safety: Basic enum pin projection + unsafe { + match &mut Pin::get_unchecked_mut(self) { + RawStream::Tcp(s) => Pin::new_unchecked(s).poll_shutdown(cx), + RawStream::Unix(s) => Pin::new_unchecked(s).poll_shutdown(cx), + } + } + } + + fn poll_write_vectored( + self: Pin<&mut Self>, + cx: &mut Context<'_>, + bufs: &[std::io::IoSlice<'_>], + ) -> Poll<io::Result<usize>> { + // Safety: Basic enum pin projection + unsafe { + match &mut Pin::get_unchecked_mut(self) { + RawStream::Tcp(s) => Pin::new_unchecked(s).poll_write_vectored(cx, bufs), + RawStream::Unix(s) => Pin::new_unchecked(s).poll_write_vectored(cx, bufs), + } + } + } + + fn is_write_vectored(&self) -> bool { + match self { + RawStream::Tcp(s) => s.is_write_vectored(), + RawStream::Unix(s) => s.is_write_vectored(), + } + } +} + +// Large read buffering helps reducing syscalls with little trade-off +// Ssl layer always does "small" reads in 16k (TLS record size) so L4 read buffer helps a lot. +const BUF_READ_SIZE: usize = 64 * 1024; +// Small write buf to match MSS. Too large write buf delays real time communication. 
+// This buffering effectively implements something similar to Nagle's algorithm. +// The benefit is that user space can control when to flush, where Nagle's can't be controlled. +// And userspace buffering reduce both syscalls and small packets. +const BUF_WRITE_SIZE: usize = 1460; + +// NOTE: with writer buffering, users need to call flush() to make sure the data is actually +// sent. Otherwise data could be stuck in the buffer forever or get lost when stream is closed. + +/// A concrete type for transport layer connection + extra fields for logging +#[derive(Debug)] +pub struct Stream { + stream: BufStream<RawStream>, + buffer_write: bool, + proxy_digest: Option<Arc<ProxyDigest>>, + /// When this connection is established + pub established_ts: SystemTime, + /// The distributed tracing object for this stream + pub tracer: Option<Tracer>, +} + +impl Stream { + /// set TCP nodelay for this connection if `self` is TCP + pub fn set_nodelay(&mut self) -> Result<()> { + if let RawStream::Tcp(s) = &self.stream.get_ref() { + s.set_nodelay(true) + .or_err(ConnectError, "failed to set_nodelay")?; + } + Ok(()) + } +} + +impl From<TcpStream> for Stream { + fn from(s: TcpStream) -> Self { + Stream { + stream: BufStream::with_capacity(BUF_READ_SIZE, BUF_WRITE_SIZE, RawStream::Tcp(s)), + buffer_write: true, + established_ts: SystemTime::now(), + proxy_digest: None, + tracer: None, + } + } +} + +impl From<UnixStream> for Stream { + fn from(s: UnixStream) -> Self { + Stream { + stream: BufStream::with_capacity(BUF_READ_SIZE, BUF_WRITE_SIZE, RawStream::Unix(s)), + buffer_write: true, + established_ts: SystemTime::now(), + proxy_digest: None, + tracer: None, + } + } +} + +impl UniqueID for Stream { + fn id(&self) -> i32 { + match &self.stream.get_ref() { + RawStream::Tcp(s) => s.as_raw_fd(), + RawStream::Unix(s) => s.as_raw_fd(), + } + } +} + +impl Ssl for Stream {} + +#[async_trait] +impl Shutdown for Stream { + async fn shutdown(&mut self) { + AsyncWriteExt::shutdown(self).await.unwrap_or_else(|e| { + debug!("Failed to shutdown connection: {:?}", e); + }); + } +} + +impl GetTimingDigest for Stream { + fn get_timing_digest(&self) -> Vec<Option<TimingDigest>> { + let mut digest = Vec::with_capacity(2); // expect to have both L4 stream and TLS layer + digest.push(Some(TimingDigest { + established_ts: self.established_ts, + })); + digest + } +} + +impl GetProxyDigest for Stream { + fn get_proxy_digest(&self) -> Option<Arc<ProxyDigest>> { + self.proxy_digest.clone() + } + + fn set_proxy_digest(&mut self, digest: ProxyDigest) { + self.proxy_digest = Some(Arc::new(digest)); + } +} + +impl Drop for Stream { + fn drop(&mut self) { + if let Some(t) = self.tracer.as_ref() { + t.0.on_disconnected(); + } + /* use nodelay/local_addr function to detect socket status */ + let ret = match &self.stream.get_ref() { + RawStream::Tcp(s) => s.nodelay().err(), + RawStream::Unix(s) => s.local_addr().err(), + }; + if let Some(e) = ret { + match e.kind() { + tokio::io::ErrorKind::Other => { + if let Some(ecode) = e.raw_os_error() { + if ecode == 9 { + // Or we could panic here + error!("Crit: socket {:?} is being double closed", self.stream); + } + } + } + _ => { + debug!("Socket is already broken {:?}", e); + } + } + } else { + // try flush the write buffer. We use now_or_never() because + // 1. Drop cannot be async + // 2. write should usually be ready, unless the buf is full. 
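+            // now_or_never() polls the flush future exactly once; if the write
+            // buffer cannot be flushed immediately, the remaining data is dropped.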
+ let _ = self.flush().now_or_never(); + } + debug!("Dropping socket {:?}", self.stream); + } +} + +impl AsyncRead for Stream { + fn poll_read( + mut self: Pin<&mut Self>, + cx: &mut Context<'_>, + buf: &mut ReadBuf<'_>, + ) -> Poll<io::Result<()>> { + Pin::new(&mut self.stream).poll_read(cx, buf) + } +} + +impl AsyncWrite for Stream { + fn poll_write( + mut self: Pin<&mut Self>, + cx: &mut Context, + buf: &[u8], + ) -> Poll<io::Result<usize>> { + if self.buffer_write { + Pin::new(&mut self.stream).poll_write(cx, buf) + } else { + Pin::new(&mut self.stream.get_mut()).poll_write(cx, buf) + } + } + + fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context) -> Poll<io::Result<()>> { + Pin::new(&mut self.stream).poll_flush(cx) + } + + fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context) -> Poll<io::Result<()>> { + Pin::new(&mut self.stream).poll_shutdown(cx) + } + + fn poll_write_vectored( + mut self: Pin<&mut Self>, + cx: &mut Context<'_>, + bufs: &[std::io::IoSlice<'_>], + ) -> Poll<io::Result<usize>> { + if self.buffer_write { + Pin::new(&mut self.stream).poll_write_vectored(cx, bufs) + } else { + Pin::new(&mut self.stream.get_mut()).poll_write_vectored(cx, bufs) + } + } + + fn is_write_vectored(&self) -> bool { + if self.buffer_write { + self.stream.is_write_vectored() // it is true + } else { + self.stream.get_ref().is_write_vectored() + } + } +} + +pub mod async_write_vec { + use bytes::Buf; + use futures::ready; + use std::future::Future; + use std::io::IoSlice; + use std::pin::Pin; + use std::task::{Context, Poll}; + use tokio::io; + use tokio::io::AsyncWrite; + + /* + the missing write_buf https://github.com/tokio-rs/tokio/pull/3156#issuecomment-738207409 + https://github.com/tokio-rs/tokio/issues/2610 + In general vectored write is lost when accessing the trait object: Box<S: AsyncWrite> + */ + + #[must_use = "futures do nothing unless you `.await` or poll them"] + pub struct WriteVec<'a, W, B> { + writer: &'a mut W, + buf: &'a mut B, + } + + pub trait AsyncWriteVec { + fn poll_write_vec<B: Buf>( + self: Pin<&mut Self>, + _cx: &mut Context<'_>, + _buf: &mut B, + ) -> Poll<io::Result<usize>>; + + fn write_vec<'a, B>(&'a mut self, src: &'a mut B) -> WriteVec<'a, Self, B> + where + Self: Sized, + B: Buf, + { + WriteVec { + writer: self, + buf: src, + } + } + } + + impl<W, B> Future for WriteVec<'_, W, B> + where + W: AsyncWriteVec + Unpin, + B: Buf, + { + type Output = io::Result<usize>; + + fn poll(mut self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<io::Result<usize>> { + let me = &mut *self; + Pin::new(&mut *me.writer).poll_write_vec(ctx, me.buf) + } + } + + /* from https://github.com/tokio-rs/tokio/blob/master/tokio-util/src/lib.rs#L177 */ + impl<T> AsyncWriteVec for T + where + T: AsyncWrite, + { + fn poll_write_vec<B: Buf>( + self: Pin<&mut Self>, + ctx: &mut Context, + buf: &mut B, + ) -> Poll<io::Result<usize>> { + const MAX_BUFS: usize = 64; + + if !buf.has_remaining() { + return Poll::Ready(Ok(0)); + } + + let n = if self.is_write_vectored() { + let mut slices = [IoSlice::new(&[]); MAX_BUFS]; + let cnt = buf.chunks_vectored(&mut slices); + ready!(self.poll_write_vectored(ctx, &slices[..cnt]))? + } else { + ready!(self.poll_write(ctx, buf.chunk()))? 
+ }; + + buf.advance(n); + + Poll::Ready(Ok(n)) + } + } +} + +pub use async_write_vec::AsyncWriteVec; diff --git a/pingora-core/src/protocols/mod.rs b/pingora-core/src/protocols/mod.rs new file mode 100644 index 0000000..4df6da8 --- /dev/null +++ b/pingora-core/src/protocols/mod.rs @@ -0,0 +1,253 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Abstractions and implementations for protocols including TCP, TLS and HTTP + +mod digest; +pub mod http; +pub mod l4; +pub mod raw_connect; +pub mod ssl; + +pub use digest::{Digest, GetProxyDigest, GetTimingDigest, ProtoDigest, TimingDigest}; +pub use ssl::ALPN; + +use async_trait::async_trait; +use std::fmt::Debug; +use std::sync::Arc; + +/// Define how a protocol should shutdown its connection. +#[async_trait] +pub trait Shutdown { + async fn shutdown(&mut self) -> (); +} + +/// Define how a given session/connection identifies itself. +pub trait UniqueID { + /// The ID returned should be unique among all existing connections of the same type. + /// But ID can be recycled after a connection is shutdown. + fn id(&self) -> i32; +} + +/// Interface to get TLS info +pub trait Ssl { + /// Return the TLS info if the connection is over TLS + fn get_ssl(&self) -> Option<&crate::tls::ssl::SslRef> { + None + } + + /// Return the [`ssl::SslDigest`] for logging + fn get_ssl_digest(&self) -> Option<Arc<ssl::SslDigest>> { + None + } + + /// Return selected ALPN if any + fn selected_alpn_proto(&self) -> Option<ALPN> { + let ssl = self.get_ssl()?; + ALPN::from_wire_selected(ssl.selected_alpn_protocol()?) 
+ } +} + +use std::any::Any; +use tokio::io::{AsyncRead, AsyncWrite}; + +/// The abstraction of transport layer IO +pub trait IO: + AsyncRead + + AsyncWrite + + Shutdown + + UniqueID + + Ssl + + GetTimingDigest + + GetProxyDigest + + Unpin + + Debug + + Send + + Sync +{ + /// helper to cast as the reference of the concrete type + fn as_any(&self) -> &dyn Any; + /// helper to cast back of the concrete type + fn into_any(self: Box<Self>) -> Box<dyn Any>; +} + +impl< + T: AsyncRead + + AsyncWrite + + Shutdown + + UniqueID + + Ssl + + GetTimingDigest + + GetProxyDigest + + Unpin + + Debug + + Send + + Sync, + > IO for T +where + T: 'static, +{ + fn as_any(&self) -> &dyn Any { + self + } + fn into_any(self: Box<Self>) -> Box<dyn Any> { + self + } +} + +/// The type of any established transport layer connection +pub type Stream = Box<dyn IO>; + +// Implement IO trait for 3rd party types, mostly for testing +mod ext_io_impl { + use super::*; + use tokio_test::io::Mock; + + #[async_trait] + impl Shutdown for Mock { + async fn shutdown(&mut self) -> () {} + } + impl UniqueID for Mock { + fn id(&self) -> i32 { + 0 + } + } + impl Ssl for Mock {} + impl GetTimingDigest for Mock { + fn get_timing_digest(&self) -> Vec<Option<TimingDigest>> { + vec![] + } + } + impl GetProxyDigest for Mock { + fn get_proxy_digest(&self) -> Option<Arc<raw_connect::ProxyDigest>> { + None + } + } + + use std::io::Cursor; + + #[async_trait] + impl<T: Send> Shutdown for Cursor<T> { + async fn shutdown(&mut self) -> () {} + } + impl<T> UniqueID for Cursor<T> { + fn id(&self) -> i32 { + 0 + } + } + impl<T> Ssl for Cursor<T> {} + impl<T> GetTimingDigest for Cursor<T> { + fn get_timing_digest(&self) -> Vec<Option<TimingDigest>> { + vec![] + } + } + impl<T> GetProxyDigest for Cursor<T> { + fn get_proxy_digest(&self) -> Option<Arc<raw_connect::ProxyDigest>> { + None + } + } + + use tokio::io::DuplexStream; + + #[async_trait] + impl Shutdown for DuplexStream { + async fn shutdown(&mut self) -> () {} + } + impl UniqueID for DuplexStream { + fn id(&self) -> i32 { + 0 + } + } + impl Ssl for DuplexStream {} + impl GetTimingDigest for DuplexStream { + fn get_timing_digest(&self) -> Vec<Option<TimingDigest>> { + vec![] + } + } + impl GetProxyDigest for DuplexStream { + fn get_proxy_digest(&self) -> Option<Arc<raw_connect::ProxyDigest>> { + None + } + } +} + +pub(crate) trait ConnFdReusable { + fn check_fd_match<V: AsRawFd>(&self, fd: V) -> bool; +} + +use l4::socket::SocketAddr; +use log::{debug, error}; +use nix::sys::socket::{getpeername, SockaddrStorage, UnixAddr}; +use std::{net::SocketAddr as InetSocketAddr, os::unix::prelude::AsRawFd, path::Path}; + +impl ConnFdReusable for SocketAddr { + fn check_fd_match<V: AsRawFd>(&self, fd: V) -> bool { + match self { + SocketAddr::Inet(addr) => addr.check_fd_match(fd), + SocketAddr::Unix(addr) => addr + .as_pathname() + .expect("non-pathname unix sockets not supported as peer") + .check_fd_match(fd), + } + } +} + +impl ConnFdReusable for Path { + fn check_fd_match<V: AsRawFd>(&self, fd: V) -> bool { + let fd = fd.as_raw_fd(); + match getpeername::<UnixAddr>(fd) { + Ok(peer) => match UnixAddr::new(self) { + Ok(addr) => { + if addr == peer { + debug!("Unix FD to: {peer:?} is reusable"); + true + } else { + error!("Crit: unix FD mismatch: fd: {fd:?}, peer: {peer:?}, addr: {addr}",); + false + } + } + Err(e) => { + error!("Bad addr: {self:?}, error: {e:?}"); + false + } + }, + Err(e) => { + error!("Idle unix connection is broken: {e:?}"); + false + } + } + } +} + +impl ConnFdReusable for 
InetSocketAddr { + fn check_fd_match<V: AsRawFd>(&self, fd: V) -> bool { + let fd = fd.as_raw_fd(); + match getpeername::<SockaddrStorage>(fd) { + Ok(peer) => { + let addr = SockaddrStorage::from(*self); + if addr == peer { + debug!("Inet FD to: {peer:?} is reusable"); + true + } else { + error!("Crit: FD mismatch: fd: {fd:?}, addr: {addr:?}, peer: {peer:?}",); + false + } + } + Err(e) => { + debug!("Idle connection is broken: {e:?}"); + false + } + } + } +} diff --git a/pingora-core/src/protocols/raw_connect.rs b/pingora-core/src/protocols/raw_connect.rs new file mode 100644 index 0000000..08fdc9a --- /dev/null +++ b/pingora-core/src/protocols/raw_connect.rs @@ -0,0 +1,271 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! CONNECT protocol over http 1.1 via raw Unix domain socket +//! +//! This mod implements the most rudimentary CONNECT client over raw stream. +//! The idea is to yield raw stream once the CONNECT handshake is complete +//! so that the protocol encapsulated can use the stream directly. +//! this idea only works for CONNECT over HTTP 1.1 and localhost (or where the server is close by). + +use super::http::v1::client::HttpSession; +use super::http::v1::common::*; +use super::Stream; + +use bytes::{BufMut, BytesMut}; +use http::request::Parts as ReqHeader; +use http::Version; +use pingora_error::{Error, ErrorType::*, OrErr, Result}; +use pingora_http::ResponseHeader; +use tokio::io::AsyncWriteExt; + +/// Try to establish a CONNECT proxy via the given `stream`. +/// +/// `request_header` should include the necessary request headers for the CONNECT protocol. +/// +/// When successful, a [`Stream`] will be returned which is the established CONNECT proxy connection. 
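+///
+/// A minimal usage sketch, mirroring this module's tests (hypothetical names: `proxy_stream`
+/// stands for an already-established [`Stream`] to the CONNECT proxy):
+///
+/// ```ignore
+/// use std::collections::BTreeMap;
+///
+/// let headers: BTreeMap<String, Vec<u8>> = BTreeMap::new(); // extra CONNECT headers, if any
+/// let req = generate_connect_header("example.org", 443, headers.iter())?;
+/// let (tunnel, digest) = connect(proxy_stream, &req).await?;
+/// // `tunnel` now carries raw bytes to example.org:443 through the proxy
+/// ```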
+pub async fn connect(stream: Stream, request_header: &ReqHeader) -> Result<(Stream, ProxyDigest)> { + let mut http = HttpSession::new(stream); + + // We write to stream directly because HttpSession doesn't write req header in auth form + let to_wire = http_req_header_to_wire_auth_form(request_header); + http.underlying_stream + .write_all(to_wire.as_ref()) + .await + .or_err(WriteError, "while writing request headers")?; + http.underlying_stream + .flush() + .await + .or_err(WriteError, "while flushing request headers")?; + + // TODO: set http.read_timeout + let resp_header = http.read_resp_header_parts().await?; + Ok(( + http.underlying_stream, + validate_connect_response(resp_header)?, + )) +} + +/// Generate the CONNECT header for the given destination +pub fn generate_connect_header<'a, H, S>( + host: &str, + port: u16, + headers: H, +) -> Result<Box<ReqHeader>> +where + S: AsRef<[u8]>, + H: Iterator<Item = (S, &'a Vec<u8>)>, +{ + // TODO: valid that host doesn't have port + // TODO: support adding ad-hoc headers + + let authority = format!("{host}:{port}"); + let req = http::request::Builder::new() + .version(http::Version::HTTP_11) + .method(http::method::Method::CONNECT) + .uri(format!("https://{authority}/")) // scheme doesn't matter + .header(http::header::HOST, &authority); + + let (mut req, _) = match req.body(()) { + Ok(r) => r.into_parts(), + Err(e) => { + return Err(e).or_err(InvalidHTTPHeader, "Invalid CONNECT request"); + } + }; + + for (k, v) in headers { + let header_name = http::header::HeaderName::from_bytes(k.as_ref()) + .or_err(InvalidHTTPHeader, "Invalid CONNECT request")?; + let header_value = http::header::HeaderValue::from_bytes(v.as_slice()) + .or_err(InvalidHTTPHeader, "Invalid CONNECT request")?; + req.headers.insert(header_name, header_value); + } + + Ok(Box::new(req)) +} + +/// The information about the CONNECT proxy. +#[derive(Debug)] +pub struct ProxyDigest { + /// The response header the proxy returns + pub response: Box<ResponseHeader>, +} + +impl ProxyDigest { + pub fn new(response: Box<ResponseHeader>) -> Self { + ProxyDigest { response } + } +} + +/// The error returned when the CONNECT proxy fails to establish. 
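+///
+/// It carries the proxy's full response header; the `Display` impl below also surfaces the
+/// value of the `proxy-status` header when the proxy provides one.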
+#[derive(Debug)] +pub struct ConnectProxyError { + /// The response header the proxy returns + pub response: Box<ResponseHeader>, +} + +impl ConnectProxyError { + pub fn boxed_new(response: Box<ResponseHeader>) -> Box<Self> { + Box::new(ConnectProxyError { response }) + } +} + +impl std::fmt::Display for ConnectProxyError { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + const PROXY_STATUS: &str = "proxy-status"; + + let reason = self + .response + .headers + .get(PROXY_STATUS) + .and_then(|s| s.to_str().ok()) + .unwrap_or("missing proxy-status header value"); + write!( + f, + "Failed CONNECT Response: status {}, proxy-status {reason}", + &self.response.status + ) + } +} + +impl std::error::Error for ConnectProxyError {} + +#[inline] +fn http_req_header_to_wire_auth_form(req: &ReqHeader) -> BytesMut { + let mut buf = BytesMut::with_capacity(512); + + // Request-Line + let method = req.method.as_str().as_bytes(); + buf.put_slice(method); + buf.put_u8(b' '); + // NOTE: CONNECT doesn't need URI path so we just skip that + if let Some(path) = req.uri.authority() { + buf.put_slice(path.as_str().as_bytes()); + } + buf.put_u8(b' '); + + let version = match req.version { + Version::HTTP_09 => "HTTP/0.9", + Version::HTTP_10 => "HTTP/1.0", + Version::HTTP_11 => "HTTP/1.1", + _ => "HTTP/0.9", + }; + buf.put_slice(version.as_bytes()); + buf.put_slice(CRLF); + + // headers + let headers = &req.headers; + for (key, value) in headers.iter() { + buf.put_slice(key.as_ref()); + buf.put_slice(HEADER_KV_DELIMITER); + buf.put_slice(value.as_ref()); + buf.put_slice(CRLF); + } + + buf.put_slice(CRLF); + buf +} + +#[inline] +fn validate_connect_response(resp: Box<ResponseHeader>) -> Result<ProxyDigest> { + if !resp.status.is_success() { + return Error::e_because( + ConnectProxyFailure, + "None 2xx code", + ConnectProxyError::boxed_new(resp), + ); + } + + // Checking Content-Length and Transfer-Encoding is optional because we already ignore them. + // We choose to do so because we want to be strict for internal use of CONNECT. + // Ignore Content-Length header because our internal CONNECT server is coded to send it. 
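+    // Transfer-Encoding, on the other hand, implies a response body, and a successful
+    // CONNECT response must not carry one, so its presence is treated as a failure below.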
+ if resp.headers.get(http::header::TRANSFER_ENCODING).is_some() { + return Error::e_because( + ConnectProxyFailure, + "Invalid Transfer-Encoding presents", + ConnectProxyError::boxed_new(resp), + ); + } + Ok(ProxyDigest::new(resp)) +} + +#[cfg(test)] +mod test_sync { + use super::*; + use std::collections::BTreeMap; + use tokio_test::io::Builder; + + #[test] + fn test_generate_connect_header() { + let mut headers = BTreeMap::new(); + headers.insert(String::from("foo"), b"bar".to_vec()); + let req = generate_connect_header("pingora.org", 123, headers.iter()).unwrap(); + + assert_eq!(req.method, http::method::Method::CONNECT); + assert_eq!(req.uri.authority().unwrap(), "pingora.org:123"); + assert_eq!(req.headers.get("Host").unwrap(), "pingora.org:123"); + assert_eq!(req.headers.get("foo").unwrap(), "bar"); + } + #[test] + fn test_request_to_wire_auth_form() { + let new_request = http::Request::builder() + .method("CONNECT") + .uri("https://pingora.org:123/") + .header("Foo", "Bar") + .body(()) + .unwrap(); + let (new_request, _) = new_request.into_parts(); + let wire = http_req_header_to_wire_auth_form(&new_request); + assert_eq!( + &b"CONNECT pingora.org:123 HTTP/1.1\r\nfoo: Bar\r\n\r\n"[..], + &wire + ); + } + + #[test] + fn test_validate_connect_response() { + let resp = ResponseHeader::build(200, None).unwrap(); + validate_connect_response(Box::new(resp)).unwrap(); + + let resp = ResponseHeader::build(404, None).unwrap(); + assert!(validate_connect_response(Box::new(resp)).is_err()); + + let mut resp = ResponseHeader::build(200, None).unwrap(); + resp.append_header("content-length", 0).unwrap(); + assert!(validate_connect_response(Box::new(resp)).is_ok()); + + let mut resp = ResponseHeader::build(200, None).unwrap(); + resp.append_header("transfer-encoding", 0).unwrap(); + assert!(validate_connect_response(Box::new(resp)).is_err()); + } + + #[tokio::test] + async fn test_connect_write_request() { + let wire = b"CONNECT pingora.org:123 HTTP/1.1\r\nhost: pingora.org:123\r\n\r\n"; + let mock_io = Box::new(Builder::new().write(wire).build()); + + let headers: BTreeMap<String, Vec<u8>> = BTreeMap::new(); + let req = generate_connect_header("pingora.org", 123, headers.iter()).unwrap(); + // ConnectionClosed + assert!(connect(mock_io, &req).await.is_err()); + + let to_wire = b"CONNECT pingora.org:123 HTTP/1.1\r\nhost: pingora.org:123\r\n\r\n"; + let from_wire = b"HTTP/1.1 200 OK\r\n\r\n"; + let mock_io = Box::new(Builder::new().write(to_wire).read(from_wire).build()); + + let req = generate_connect_header("pingora.org", 123, headers.iter()).unwrap(); + let result = connect(mock_io, &req).await; + assert!(result.is_ok()); + } +} diff --git a/pingora-core/src/protocols/ssl/client.rs b/pingora-core/src/protocols/ssl/client.rs new file mode 100644 index 0000000..abb6da6 --- /dev/null +++ b/pingora-core/src/protocols/ssl/client.rs @@ -0,0 +1,78 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
TLS client specific implementation + +use super::SslStream; +use crate::protocols::raw_connect::ProxyDigest; +use crate::protocols::{GetProxyDigest, GetTimingDigest, TimingDigest, IO}; +use crate::tls::{ + ssl, + ssl::ConnectConfiguration, + ssl_sys::{X509_V_ERR_INVALID_CALL, X509_V_OK}, +}; + +use pingora_error::{Error, ErrorType::*, OrErr, Result}; +use std::sync::Arc; + +/// Perform the TLS handshake for the given connection with the given configuration +pub async fn handshake<S: IO>( + conn_config: ConnectConfiguration, + domain: &str, + io: S, +) -> Result<SslStream<S>> { + let ssl = conn_config + .into_ssl(domain) + .explain_err(TLSHandshakeFailure, |e| format!("ssl config error: {e}"))?; + let mut stream = SslStream::new(ssl, io) + .explain_err(TLSHandshakeFailure, |e| format!("ssl stream error: {e}"))?; + let handshake_result = stream.connect().await; + match handshake_result { + Ok(()) => Ok(stream), + Err(e) => { + let context = format!("TLS connect() failed: {e}, SNI: {domain}"); + match e.code() { + ssl::ErrorCode::SSL => match stream.ssl().verify_result().as_raw() { + // X509_V_ERR_INVALID_CALL in case verify result was never set + X509_V_OK | X509_V_ERR_INVALID_CALL => { + Error::e_explain(TLSHandshakeFailure, context) + } + _ => Error::e_explain(InvalidCert, context), + }, + /* likely network error, but still mark as TLS error */ + _ => Error::e_explain(TLSHandshakeFailure, context), + } + } + } +} + +impl<S> GetTimingDigest for SslStream<S> +where + S: GetTimingDigest, +{ + fn get_timing_digest(&self) -> Vec<Option<TimingDigest>> { + let mut ts_vec = self.get_ref().get_timing_digest(); + ts_vec.push(Some(self.timing.clone())); + ts_vec + } +} + +impl<S> GetProxyDigest for SslStream<S> +where + S: GetProxyDigest, +{ + fn get_proxy_digest(&self) -> Option<Arc<ProxyDigest>> { + self.get_ref().get_proxy_digest() + } +} diff --git a/pingora-core/src/protocols/ssl/digest.rs b/pingora-core/src/protocols/ssl/digest.rs new file mode 100644 index 0000000..3cdb7aa --- /dev/null +++ b/pingora-core/src/protocols/ssl/digest.rs @@ -0,0 +1,65 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
TLS information from the TLS connection + +use crate::tls::{hash::MessageDigest, ssl::SslRef}; +use crate::utils; + +/// The TLS connection information +#[derive(Clone, Debug)] +pub struct SslDigest { + /// The cipher used + pub cipher: &'static str, + /// The TLS version of this connection + pub version: &'static str, + /// The organization of the peer's certificate + pub organization: Option<String>, + /// The serial number of the peer's certificate + pub serial_number: Option<String>, + /// The digest of the peer's certificate + pub cert_digest: Vec<u8>, +} + +impl SslDigest { + pub fn from_ssl(ssl: &SslRef) -> Self { + let cipher = match ssl.current_cipher() { + Some(c) => c.name(), + None => "", + }; + + let (cert_digest, org, sn) = match ssl.peer_certificate() { + Some(cert) => { + let cert_digest = match cert.digest(MessageDigest::sha256()) { + Ok(c) => c.as_ref().to_vec(), + Err(_) => Vec::new(), + }; + ( + cert_digest, + utils::get_organization(&cert), + utils::get_serial(&cert).ok(), + ) + } + None => (Vec::new(), None, None), + }; + + SslDigest { + cipher, + version: ssl.version_str(), + organization: org, + serial_number: sn, + cert_digest, + } + } +} diff --git a/pingora-core/src/protocols/ssl/mod.rs b/pingora-core/src/protocols/ssl/mod.rs new file mode 100644 index 0000000..f1ce8b9 --- /dev/null +++ b/pingora-core/src/protocols/ssl/mod.rs @@ -0,0 +1,246 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! The TLS layer implementations + +pub mod client; +pub mod digest; +pub mod server; + +use crate::protocols::digest::TimingDigest; +use crate::protocols::{Ssl, UniqueID}; +use crate::tls::{self, ssl, tokio_ssl::SslStream as InnerSsl}; +use log::warn; +use pingora_error::{ErrorType::*, OrErr, Result}; +use std::pin::Pin; +use std::sync::Arc; +use std::task::{Context, Poll}; +use std::time::SystemTime; +use tokio::io::{self, AsyncRead, AsyncWrite, ReadBuf}; + +pub use digest::SslDigest; + +/// The TLS connection +#[derive(Debug)] +pub struct SslStream<T> { + ssl: InnerSsl<T>, + digest: Option<Arc<SslDigest>>, + timing: TimingDigest, +} + +impl<T> SslStream<T> +where + T: AsyncRead + AsyncWrite + std::marker::Unpin, +{ + /// Create a new TLS connection from the given `stream` + /// + /// The caller needs to perform [`Self::connect()`] or [`Self::accept()`] to perform TLS + /// handshake after. 
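+    ///
+    /// A brief sketch of the intended call sequence (hypothetical variables: `ssl` is a
+    /// configured [`ssl::Ssl`], `io` an established transport stream):
+    ///
+    /// ```ignore
+    /// let mut tls = SslStream::new(ssl, io)?;
+    /// tls.connect().await?; // client side; a server would call `tls.accept()` instead
+    /// ```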
+ pub fn new(ssl: ssl::Ssl, stream: T) -> Result<Self> { + let ssl = InnerSsl::new(ssl, stream) + .explain_err(TLSHandshakeFailure, |e| format!("ssl stream error: {e}"))?; + + Ok(SslStream { + ssl, + digest: None, + timing: Default::default(), + }) + } + + /// Connect to the remote TLS server as a client + pub async fn connect(&mut self) -> Result<(), ssl::Error> { + Self::clear_error(); + Pin::new(&mut self.ssl).connect().await?; + self.timing.established_ts = SystemTime::now(); + self.digest = Some(Arc::new(SslDigest::from_ssl(self.ssl()))); + Ok(()) + } + + /// Finish the TLS handshake from client as a server + pub async fn accept(&mut self) -> Result<(), ssl::Error> { + Self::clear_error(); + Pin::new(&mut self.ssl).accept().await?; + self.timing.established_ts = SystemTime::now(); + self.digest = Some(Arc::new(SslDigest::from_ssl(self.ssl()))); + Ok(()) + } + + #[inline] + fn clear_error() { + let errs = tls::error::ErrorStack::get(); + if !errs.errors().is_empty() { + warn!("Clearing dirty TLS error stack: {}", errs); + } + } +} + +impl<T> SslStream<T> { + pub fn ssl_digest(&self) -> Option<Arc<SslDigest>> { + self.digest.clone() + } +} + +use std::ops::{Deref, DerefMut}; + +impl<T> Deref for SslStream<T> { + type Target = InnerSsl<T>; + + fn deref(&self) -> &Self::Target { + &self.ssl + } +} + +impl<T> DerefMut for SslStream<T> { + fn deref_mut(&mut self) -> &mut Self::Target { + &mut self.ssl + } +} + +impl<T> AsyncRead for SslStream<T> +where + T: AsyncRead + AsyncWrite + Unpin, +{ + fn poll_read( + mut self: Pin<&mut Self>, + cx: &mut Context<'_>, + buf: &mut ReadBuf<'_>, + ) -> Poll<io::Result<()>> { + Self::clear_error(); + Pin::new(&mut self.ssl).poll_read(cx, buf) + } +} + +impl<T> AsyncWrite for SslStream<T> +where + T: AsyncRead + AsyncWrite + Unpin, +{ + fn poll_write( + mut self: Pin<&mut Self>, + cx: &mut Context, + buf: &[u8], + ) -> Poll<io::Result<usize>> { + Self::clear_error(); + Pin::new(&mut self.ssl).poll_write(cx, buf) + } + + fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context) -> Poll<io::Result<()>> { + Self::clear_error(); + Pin::new(&mut self.ssl).poll_flush(cx) + } + + fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context) -> Poll<io::Result<()>> { + Self::clear_error(); + Pin::new(&mut self.ssl).poll_shutdown(cx) + } + + fn poll_write_vectored( + mut self: Pin<&mut Self>, + cx: &mut Context<'_>, + bufs: &[std::io::IoSlice<'_>], + ) -> Poll<io::Result<usize>> { + Self::clear_error(); + Pin::new(&mut self.ssl).poll_write_vectored(cx, bufs) + } + + fn is_write_vectored(&self) -> bool { + true + } +} + +impl<T> UniqueID for SslStream<T> +where + T: UniqueID, +{ + fn id(&self) -> i32 { + self.ssl.get_ref().id() + } +} + +impl<T> Ssl for SslStream<T> { + fn get_ssl(&self) -> Option<&ssl::SslRef> { + Some(self.ssl()) + } + + fn get_ssl_digest(&self) -> Option<Arc<SslDigest>> { + self.ssl_digest() + } +} + +/// The protocol for Application-Layer Protocol Negotiation +#[derive(Hash, Clone, Debug)] +pub enum ALPN { + /// Prefer HTTP/1.1 only + H1, + /// Prefer HTTP/2 only + H2, + /// Prefer HTTP/2 over HTTP/1.1 + H2H1, +} + +impl std::fmt::Display for ALPN { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + match self { + ALPN::H1 => write!(f, "H1"), + ALPN::H2 => write!(f, "H2"), + ALPN::H2H1 => write!(f, "H2H1"), + } + } +} + +impl ALPN { + /// Create a new ALPN according to the `max` and `min` version constraints + pub fn new(max: u8, min: u8) -> Self { + if max == 1 { + ALPN::H1 + } else if min == 2 { + ALPN::H2 + } else { + 
ALPN::H2H1 + } + } + + /// Return the max http version this [`ALPN`] allows + pub fn get_max_http_version(&self) -> u8 { + match self { + ALPN::H1 => 1, + _ => 2, + } + } + + /// Return the min http version this [`ALPN`] allows + pub fn get_min_http_version(&self) -> u8 { + match self { + ALPN::H2 => 2, + _ => 1, + } + } + + pub(crate) fn to_wire_preference(&self) -> &[u8] { + // https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_alpn_select_cb.html + // "vector of nonempty, 8-bit length-prefixed, byte strings" + match self { + Self::H1 => b"\x08http/1.1", + Self::H2 => b"\x02h2", + Self::H2H1 => b"\x02h2\x08http/1.1", + } + } + + pub(crate) fn from_wire_selected(raw: &[u8]) -> Option<Self> { + match raw { + b"http/1.1" => Some(Self::H1), + b"h2" => Some(Self::H2), + _ => None, + } + } +} diff --git a/pingora-core/src/protocols/ssl/server.rs b/pingora-core/src/protocols/ssl/server.rs new file mode 100644 index 0000000..98dc2f1 --- /dev/null +++ b/pingora-core/src/protocols/ssl/server.rs @@ -0,0 +1,193 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! TLS server specific implementation + +use super::SslStream; +use crate::protocols::{Shutdown, IO}; +use crate::tls::ext; +use crate::tls::ext::ssl_from_acceptor; +use crate::tls::ssl; +use crate::tls::ssl::{SslAcceptor, SslRef}; + +use async_trait::async_trait; +use log::warn; +use pingora_error::{ErrorType::*, OrErr, Result}; +use std::pin::Pin; +use tokio::io::{AsyncRead, AsyncWrite, AsyncWriteExt}; + +/// Prepare a TLS stream for handshake +pub fn prepare_tls_stream<S: IO>(ssl_acceptor: &SslAcceptor, io: S) -> Result<SslStream<S>> { + let ssl = ssl_from_acceptor(ssl_acceptor) + .explain_err(TLSHandshakeFailure, |e| format!("ssl_acceptor error: {e}"))?; + SslStream::new(ssl, io).explain_err(TLSHandshakeFailure, |e| format!("ssl stream error: {e}")) +} + +/// Perform TLS handshake for the given connection with the given configuration +pub async fn handshake<S: IO>(ssl_acceptor: &SslAcceptor, io: S) -> Result<SslStream<S>> { + let mut stream = prepare_tls_stream(ssl_acceptor, io)?; + stream + .accept() + .await + .explain_err(TLSHandshakeFailure, |e| format!("TLS accept() failed: {e}"))?; + Ok(stream) +} + +/// Perform TLS handshake for the given connection with the given configuration and callbacks +pub async fn handshake_with_callback<S: IO>( + ssl_acceptor: &SslAcceptor, + io: S, + callbacks: &TlsAcceptCallbacks, +) -> Result<SslStream<S>> { + let mut tls_stream = prepare_tls_stream(ssl_acceptor, io)?; + let done = Pin::new(&mut tls_stream) + .start_accept() + .await + .explain_err(TLSHandshakeFailure, |e| format!("TLS accept() failed: {e}"))?; + if !done { + // safety: we do hold a mut ref of tls_stream + let ssl_mut = unsafe { ext::ssl_mut(tls_stream.ssl()) }; + callbacks.certificate_callback(ssl_mut).await; + Pin::new(&mut tls_stream) + .resume_accept() + .await + .explain_err(TLSHandshakeFailure, |e| format!("TLS accept() failed: {e}"))?; + Ok(tls_stream) + } else { + 
Ok(tls_stream)
+    }
+}
+
+/// The APIs to customize things like certificates during the TLS server-side handshake
+#[async_trait]
+pub trait TlsAccept {
+    // TODO: return error?
+    /// This function is called in the middle of a TLS handshake. Implementors of this trait
+    /// should provide the TLS certificate and key to the [SslRef] via [ext::ssl_use_certificate] and [ext::ssl_use_private_key].
+    async fn certificate_callback(&self, _ssl: &mut SslRef) -> () {
+        // does nothing by default
+    }
+}
+
+pub type TlsAcceptCallbacks = Box<dyn TlsAccept + Send + Sync>;
+
+#[async_trait]
+impl<S> Shutdown for SslStream<S>
+where
+    S: AsyncRead + AsyncWrite + Sync + Unpin + Send,
+{
+    async fn shutdown(&mut self) {
+        match <Self as AsyncWriteExt>::shutdown(self).await {
+            Ok(()) => {}
+            Err(e) => {
+                warn!("TLS shutdown failed, {e}");
+            }
+        }
+    }
+}
+
+/// Resumable TLS server side handshake.
+#[async_trait]
+pub trait ResumableAccept {
+    /// Start a resumable TLS accept handshake.
+    ///
+    /// * `Ok(true)` when the handshake is finished
+    /// * `Ok(false)` when the handshake is paused midway
+    ///
+    /// For now, the accept will only pause when a certificate is needed.
+    async fn start_accept(self: Pin<&mut Self>) -> Result<bool, ssl::Error>;
+
+    /// Continue the TLS handshake
+    ///
+    /// This function should be called after the certificate is provided.
+    async fn resume_accept(self: Pin<&mut Self>) -> Result<(), ssl::Error>;
+}
+
+#[async_trait]
+impl<S: AsyncRead + AsyncWrite + Send + Unpin> ResumableAccept for SslStream<S> {
+    async fn start_accept(mut self: Pin<&mut Self>) -> Result<bool, ssl::Error> {
+        // safety: &mut self
+        let ssl_mut = unsafe { ext::ssl_mut(self.ssl()) };
+        ext::suspend_when_need_ssl_cert(ssl_mut);
+        let res = self.accept().await;
+
+        match res {
+            Ok(()) => Ok(true),
+            Err(e) => {
+                if ext::is_suspended_for_cert(&e) {
+                    Ok(false)
+                } else {
+                    Err(e)
+                }
+            }
+        }
+    }
+
+    async fn resume_accept(mut self: Pin<&mut Self>) -> Result<(), ssl::Error> {
+        // safety: &mut ssl
+        let ssl_mut = unsafe { ext::ssl_mut(self.ssl()) };
+        ext::unblock_ssl_cert(ssl_mut);
+        self.accept().await
+    }
+}
+
+#[tokio::test]
+async fn test_async_cert() {
+    use tokio::io::AsyncReadExt;
+    let acceptor = ssl::SslAcceptor::mozilla_intermediate_v5(ssl::SslMethod::tls())
+        .unwrap()
+        .build();
+
+    struct Callback;
+    #[async_trait]
+    impl TlsAccept for Callback {
+        async fn certificate_callback(&self, ssl: &mut SslRef) -> () {
+            assert_eq!(
+                ssl.servername(ssl::NameType::HOST_NAME).unwrap(),
+                "pingora.org"
+            );
+            let cert = format!("{}/tests/keys/server.crt", env!("CARGO_MANIFEST_DIR"));
+            let key = format!("{}/tests/keys/key.pem", env!("CARGO_MANIFEST_DIR"));
+
+            let cert_bytes = std::fs::read(cert).unwrap();
+            let cert = crate::tls::x509::X509::from_pem(&cert_bytes).unwrap();
+
+            let key_bytes = std::fs::read(key).unwrap();
+            let key = crate::tls::pkey::PKey::private_key_from_pem(&key_bytes).unwrap();
+            ext::ssl_use_certificate(ssl, &cert).unwrap();
+            ext::ssl_use_private_key(ssl, &key).unwrap();
+        }
+    }
+
+    let cb: TlsAcceptCallbacks = Box::new(Callback);
+
+    let (client, server) = tokio::io::duplex(1024);
+
+    tokio::spawn(async move {
+        let ssl_context = ssl::SslContext::builder(ssl::SslMethod::tls())
+            .unwrap()
+            .build();
+        let mut ssl = ssl::Ssl::new(&ssl_context).unwrap();
+        ssl.set_hostname("pingora.org").unwrap();
+        ssl.set_verify(ssl::SslVerifyMode::NONE); // we don't have a valid cert
+        let mut stream = SslStream::new(ssl, client).unwrap();
+        Pin::new(&mut stream).connect().await.unwrap();
+        let
mut buf = [0; 1]; + let _ = stream.read(&mut buf).await; + }); + + handshake_with_callback(&acceptor, server, &cb) + .await + .unwrap(); +} diff --git a/pingora-core/src/server/configuration/mod.rs b/pingora-core/src/server/configuration/mod.rs new file mode 100644 index 0000000..19db553 --- /dev/null +++ b/pingora-core/src/server/configuration/mod.rs @@ -0,0 +1,267 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Server configurations +//! +//! Server configurations define startup settings such as: +//! * User and group to run as after daemonization +//! * Number of threads per service +//! * Error log file path + +use log::{debug, trace}; +use pingora_error::{Error, ErrorType::*, OrErr, Result}; +use serde::{Deserialize, Serialize}; +use std::fs; +use structopt::StructOpt; + +/// The configuration file +/// +/// Pingora configuration files are by default YAML files but any key value format can potentially +/// be used. +/// +/// # Extension +/// New keys can be added to the configuration files which this configuration object will ignore. +/// Then, users can parse these key-values to pass to their code to use. +#[derive(Debug, PartialEq, Eq, Serialize, Deserialize)] +#[serde(default)] +pub struct ServerConf { + version: usize, + /// Whether to run this process in the background. + pub daemon: bool, + /// When configured, error log will be written to the given file. Otherwise StdErr will be used. + pub error_log: Option<String>, + /// The pid (process ID) file of this server + pub pid_file: String, + /// the path to the upgrade socket + /// + /// In order to perform zero downtime restart, both the new and old process need to agree on the + /// path to this sock in order to coordinate the upgrade. + pub upgrade_sock: String, + /// If configured, after daemonization, this process will switch to the given user before + /// starting to serve traffic. + pub user: Option<String>, + /// Similar to `user`, the group this process should switch to. + pub group: Option<String>, + /// How many threads **each** service should get. The threads are not shared across services. + pub threads: usize, + /// Allow work stealing between threads of the same service. Default `true`. + pub work_stealing: bool, + /// The path to CA file the SSL library should use. If empty, the default trust store location + /// defined by the SSL library will be used. 
+    pub ca_file: Option<String>,
+    // These options don't belong here as they are specific to certain services
+    pub(crate) client_bind_to_ipv4: Vec<String>,
+    pub(crate) client_bind_to_ipv6: Vec<String>,
+    pub(crate) upstream_keepalive_pool_size: usize,
+    pub(crate) upstream_connect_offload_threadpools: Option<usize>,
+    pub(crate) upstream_connect_offload_thread_per_pool: Option<usize>,
+}
+
+impl Default for ServerConf {
+    fn default() -> Self {
+        ServerConf {
+            version: 0,
+            client_bind_to_ipv4: vec![],
+            client_bind_to_ipv6: vec![],
+            ca_file: None,
+            daemon: false,
+            error_log: None,
+            pid_file: "/tmp/pingora.pid".to_string(),
+            upgrade_sock: "/tmp/pingora_upgrade.sock".to_string(),
+            user: None,
+            group: None,
+            threads: 1,
+            work_stealing: true,
+            upstream_keepalive_pool_size: 128,
+            upstream_connect_offload_threadpools: None,
+            upstream_connect_offload_thread_per_pool: None,
+        }
+    }
+}
+
+/// Command line options
+///
+/// Call `Opt::from_args()` to build this object from the process's command line arguments.
+#[derive(StructOpt, Debug)]
+#[structopt(name = "basic")]
+pub struct Opt {
+    /// Whether this server should try to upgrade from a running old server
+    ///
+    /// `-u` or `--upgrade` can be used
+    #[structopt(short, long)]
+    pub upgrade: bool,
+    /// Whether this server should run in the background
+    ///
+    /// `-d` or `--daemon` can be used
+    #[structopt(short, long)]
+    pub daemon: bool,
+    /// Not actually used. This flag exists so that the server does not reject it when
+    /// `cargo test` sometimes passes it along
+    #[structopt(long)]
+    pub nocapture: bool,
+    /// Test the configuration and exit
+    ///
+    /// When this flag is set, calling `server.bootstrap()` will exit the process without errors
+    ///
+    /// This flag is useful during a service upgrade, when the user wants to make sure the new
+    /// service can start before shutting down the old server process.
+    ///
+    /// `-t` or `--test` can be used
+    #[structopt(short, long)]
+    pub test: bool,
+    /// The path to the configuration file.
+    ///
+    /// See [`ServerConf`] for more details of the configuration file.
+    ///
+    /// `-c` or `--conf` can be used
+    #[structopt(short, long)]
+    pub conf: Option<String>,
+}
+
+/// Create the default instance of Opt based on the current command-line args.
+/// This is equivalent to running `Opt::from_args`, but does not require the
+/// caller to have the `structopt::StructOpt` trait in scope.
+impl Default for Opt {
+    fn default() -> Self {
+        Opt::from_args()
+    }
+}
+
+impl ServerConf {
+    // Does not have to be async until we want runtime reload
+    pub fn load_from_yaml<P>(path: P) -> Result<Self>
+    where
+        P: AsRef<std::path::Path> + std::fmt::Display,
+    {
+        let conf_str = fs::read_to_string(&path).or_err_with(ReadError, || {
+            format!("Unable to read conf file from {path}")
+        })?;
+        debug!("Conf file read from {path}");
+        Self::from_yaml(&conf_str)
+    }
+
+    pub fn load_yaml_with_opt_override(opt: &Opt) -> Result<Self> {
+        if let Some(path) = &opt.conf {
+            let mut conf = Self::load_from_yaml(path)?;
+            if opt.daemon {
+                conf.daemon = true;
+            }
+            Ok(conf)
+        } else {
+            Error::e_explain(ReadError, "No path specified")
+        }
+    }
+
+    pub fn new() -> Option<Self> {
+        Self::from_yaml("---\nversion: 1").ok()
+    }
+
+    pub fn new_with_opt_override(opt: &Opt) -> Option<Self> {
+        let conf = Self::new();
+        match conf {
+            Some(mut c) => {
+                if opt.daemon {
+                    c.daemon = true;
+                }
+                Some(c)
+            }
+            None => None,
+        }
+    }
+
+    pub fn from_yaml(conf_str: &str) -> Result<Self> {
+        trace!("Read conf file: {conf_str}");
+        let conf: ServerConf = serde_yaml::from_str(conf_str).or_err_with(ReadError, || {
+            format!("Unable to parse yaml conf {conf_str}")
+        })?;
+
+        trace!("Loaded conf: {conf:?}");
+        conf.validate()
+    }
+
+    pub fn to_yaml(&self) -> String {
+        serde_yaml::to_string(self).unwrap()
+    }
+
+    pub fn validate(self) -> Result<Self> {
+        // TODO: do the validation
+        Ok(self)
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn init_log() {
+        let _ = env_logger::builder().is_test(true).try_init();
+    }
+
+    #[test]
+    fn not_a_test_i_cannot_write_yaml_by_hand() {
+        init_log();
+        let conf = ServerConf {
+            version: 1,
+            client_bind_to_ipv4: vec!["1.2.3.4".to_string(), "5.6.7.8".to_string()],
+            client_bind_to_ipv6: vec![],
+            ca_file: None,
+            daemon: false,
+            error_log: None,
+            pid_file: "".to_string(),
+            upgrade_sock: "".to_string(),
+            user: None,
+            group: None,
+            threads: 1,
+            work_stealing: true,
+            upstream_keepalive_pool_size: 4,
+            upstream_connect_offload_threadpools: None,
+            upstream_connect_offload_thread_per_pool: None,
+        };
+        // cargo test -- --nocapture not_a_test_i_cannot_write_yaml_by_hand
+        println!("{}", conf.to_yaml());
+    }
+
+    #[test]
+    fn test_load_file() {
+        init_log();
+        let conf_str = r#"
+---
+version: 1
+client_bind_to_ipv4:
+    - 1.2.3.4
+    - 5.6.7.8
+client_bind_to_ipv6: []
+    "#
+        .to_string();
+        let conf = ServerConf::from_yaml(&conf_str).unwrap();
+        assert_eq!(2, conf.client_bind_to_ipv4.len());
+        assert_eq!(0, conf.client_bind_to_ipv6.len());
+        assert_eq!(1, conf.version);
+    }
+
+    #[test]
+    fn test_default() {
+        init_log();
+        let conf_str = r#"
+---
+version: 1
+    "#
+        .to_string();
+        let conf = ServerConf::from_yaml(&conf_str).unwrap();
+        assert_eq!(0, conf.client_bind_to_ipv4.len());
+        assert_eq!(0, conf.client_bind_to_ipv6.len());
+        assert_eq!(1, conf.version);
+        assert_eq!("/tmp/pingora.pid", conf.pid_file);
+    }
+}
diff --git a/pingora-core/src/server/daemon.rs b/pingora-core/src/server/daemon.rs
new file mode 100644
index 0000000..fa0f9d1
--- /dev/null
+++ b/pingora-core/src/server/daemon.rs
@@ -0,0 +1,112 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use daemonize::Daemonize; +use log::{debug, error}; +use std::ffi::CString; +use std::fs::{self, OpenOptions}; +use std::os::unix::prelude::OpenOptionsExt; +use std::path::Path; + +use crate::server::configuration::ServerConf; + +// Utilities to daemonize a pingora server, i.e. run the process in the background, possibly +// under a different running user and/or group. + +// XXX: this operation should have been done when the old service is exiting. +// Now the new pid file just kick the old one out of the way +fn move_old_pid(path: &str) { + if !Path::new(path).exists() { + debug!("Old pid file does not exist"); + return; + } + let new_path = format!("{path}.old"); + match fs::rename(path, &new_path) { + Ok(()) => { + debug!("Old pid file renamed"); + } + Err(e) => { + error!( + "failed to rename pid file from {} to {}: {}", + path, new_path, e + ); + } + } +} + +unsafe fn gid_for_username(name: &CString) -> Option<libc::gid_t> { + let passwd = libc::getpwnam(name.as_ptr() as *const libc::c_char); + if !passwd.is_null() { + return Some((*passwd).pw_gid); + } + None +} + +/// Start a server instance as a daemon. +pub fn daemonize(conf: &ServerConf) { + // TODO: customize working dir + + let daemonize = Daemonize::new() + .umask(0o007) // allow same group to access files but not everyone else + .pid_file(&conf.pid_file); + + let daemonize = if let Some(error_log) = conf.error_log.as_ref() { + let err = OpenOptions::new() + .append(true) + .create(true) + // open read() in case there are no readers + // available otherwise we will panic with + // an ENXIO since O_NONBLOCK is set + .read(true) + .custom_flags(libc::O_NONBLOCK) + .open(error_log) + .unwrap(); + daemonize.stderr(err) + } else { + daemonize + }; + + let daemonize = match conf.user.as_ref() { + Some(user) => { + let user_cstr = CString::new(user.as_str()).unwrap(); + + #[cfg(target_os = "macos")] + let group_id = unsafe { gid_for_username(&user_cstr).map(|gid| gid as i32) }; + #[cfg(target_os = "linux")] + let group_id = unsafe { gid_for_username(&user_cstr) }; + + daemonize + .privileged_action(move || { + if let Some(gid) = group_id { + // Set the supplemental group privileges for the child process. + unsafe { + libc::initgroups(user_cstr.as_ptr() as *const libc::c_char, gid); + } + } + }) + .user(user.as_str()) + .chown_pid_file(true) + } + None => daemonize, + }; + + let daemonize = match conf.group.as_ref() { + Some(group) => daemonize.group(group.as_str()), + None => daemonize, + }; + + move_old_pid(&conf.pid_file); + + daemonize.start().unwrap(); // hard crash when fail +} diff --git a/pingora-core/src/server/mod.rs b/pingora-core/src/server/mod.rs new file mode 100644 index 0000000..4273550 --- /dev/null +++ b/pingora-core/src/server/mod.rs @@ -0,0 +1,341 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Server process and configuration management
+
+pub mod configuration;
+mod daemon;
+pub(crate) mod transfer_fd;
+
+use daemon::daemonize;
+use log::{debug, error, info};
+use pingora_runtime::Runtime;
+use pingora_timeout::fast_timeout;
+use std::clone::Clone;
+use std::sync::Arc;
+use std::thread;
+use tokio::signal::unix;
+use tokio::sync::{watch, Mutex};
+use tokio::time::{sleep, Duration};
+
+use crate::services::Service;
+use configuration::{Opt, ServerConf};
+use transfer_fd::Fds;
+
+use pingora_error::{Error, ErrorType, Result};
+
+/* time to wait before exiting the program;
+this is the graceful period for all existing sessions to finish */
+const EXIT_TIMEOUT: u64 = 60 * 5;
+/* time to wait before shutting down listening sockets;
+this is the graceful period for the new service to get ready */
+const CLOSE_TIMEOUT: u64 = 5;
+
+enum ShutdownType {
+    Graceful,
+    Quick,
+}
+
+/// The receiver for the server's shutdown event. The value will turn to true once the server
+/// starts to shut down.
+pub type ShutdownWatch = watch::Receiver<bool>;
+pub(crate) type ListenFds = Arc<Mutex<Fds>>;
+
+/// The server object
+///
+/// This object represents an entire pingora server process which may have multiple independent
+/// services (see [crate::services]). The server object handles signals, reading configuration,
+/// zero downtime upgrade and error reporting.
+pub struct Server {
+    services: Vec<Box<dyn Service>>,
+    listen_fds: Option<ListenFds>,
+    shutdown_watch: watch::Sender<bool>,
+    // TODO: we may want to drop this copy to let the sender call closed()
+    shutdown_recv: ShutdownWatch,
+    /// the parsed server configuration
+    pub configuration: Arc<ServerConf>,
+    /// the parsed command line options
+    pub options: Option<Opt>,
+    /// the Sentry DSN
+    ///
+    /// Panics and other events Sentry captures will be sent to this DSN **only in release mode**
+    pub sentry: Option<String>,
+}
+
+// TODO: delete the pid file on exit
+
+impl Server {
+    async fn main_loop(&self) -> ShutdownType {
+        // waiting for exit signal
+        // TODO: there should be a signal handling function
+        let mut graceful_upgrade_signal = unix::signal(unix::SignalKind::quit()).unwrap();
+        let mut graceful_terminate_signal = unix::signal(unix::SignalKind::terminate()).unwrap();
+        let mut fast_shutdown_signal = unix::signal(unix::SignalKind::interrupt()).unwrap();
+        tokio::select! {
+            _ = fast_shutdown_signal.recv() => {
+                info!("SIGINT received, exiting");
+                ShutdownType::Quick
+            },
+            _ = graceful_terminate_signal.recv() => {
+                // we received a graceful terminate; all instances are instructed to stop
+                info!("SIGTERM received, gracefully exiting");
+                // graceful shutdown if there are listening sockets
+                info!("Broadcasting graceful shutdown");
+                match self.shutdown_watch.send(true) {
+                    Ok(_) => { info!("Graceful shutdown started!"); }
+                    Err(e) => {
+                        error!("Graceful shutdown broadcast failed: {e}");
+                    }
+                }
+                info!("Broadcast graceful shutdown complete");
+                ShutdownType::Graceful
+            }
+            _ = graceful_upgrade_signal.recv() => {
+                // TODO: still need to select!
on signals in case a fast shutdown is needed + // aka: move below to another task and only kick it off here + info!("SIGQUIT received, sending socks and gracefully exiting"); + if let Some(fds) = &self.listen_fds { + let fds = fds.lock().await; + info!("Trying to send socks"); + // XXX: this is blocking IO + match fds.send_to_sock( + self.configuration.as_ref().upgrade_sock.as_str()) + { + Ok(_) => {info!("listener sockets sent");}, + Err(e) => { + error!("Unable to send listener sockets to new process: {e}"); + // sentry log error on fd send failure + #[cfg(not(debug_assertions))] + sentry::capture_error(&e); + } + } + sleep(Duration::from_secs(CLOSE_TIMEOUT)).await; + info!("Broadcasting graceful shutdown"); + // gracefully exiting + match self.shutdown_watch.send(true) { + Ok(_) => { info!("Graceful shutdown started!"); } + Err(e) => { + error!("Graceful shutdown broadcast failed: {e}"); + // switch to fast shutdown + return ShutdownType::Graceful; + } + } + info!("Broadcast graceful shutdown complete"); + ShutdownType::Graceful + } else { + info!("No socks to send, shutting down."); + ShutdownType::Graceful + } + }, + } + } + + fn run_service( + mut service: Box<dyn Service>, + fds: Option<ListenFds>, + shutdown: ShutdownWatch, + threads: usize, + work_stealing: bool, + ) -> Runtime +// NOTE: we need to keep the runtime outside async since + // otherwise the runtime will be dropped. + { + let service_runtime = Server::create_runtime(service.name(), threads, work_stealing); + service_runtime.get_handle().spawn(async move { + service.start_service(fds, shutdown).await; + info!("service exited.") + }); + service_runtime + } + + fn load_fds(&mut self, upgrade: bool) -> Result<(), nix::Error> { + let mut fds = Fds::new(); + if upgrade { + debug!("Trying to receive socks"); + fds.get_from_sock(self.configuration.as_ref().upgrade_sock.as_str())? + } + self.listen_fds = Some(Arc::new(Mutex::new(fds))); + Ok(()) + } + + /// Create a new [`Server`]. + /// + /// Only one [`Server`] needs to be created for a process. A [`Server`] can hold multiple + /// independent services. + /// + /// Command line options can either be passed by parsing the command line arguments via + /// `Opt::from_args()`, or be generated by other means. + pub fn new(opt: Option<Opt>) -> Result<Server> { + let (tx, rx) = watch::channel(false); + + let conf = if let Some(opt) = opt.as_ref() { + opt.conf.as_ref().map_or_else( + || { + // options, no conf, generated + ServerConf::new_with_opt_override(opt).ok_or_else(|| { + Error::explain(ErrorType::ReadError, "Conf generation failed") + }) + }, + |_| { + // options and conf loaded + ServerConf::load_yaml_with_opt_override(opt) + }, + ) + } else { + ServerConf::new() + .ok_or_else(|| Error::explain(ErrorType::ReadError, "Conf generation failed")) + }?; + + Ok(Server { + services: vec![], + listen_fds: None, + shutdown_watch: tx, + shutdown_recv: rx, + configuration: Arc::new(conf), + options: opt, + sentry: None, + }) + } + + /// Add a service to this server. + /// + /// A service is anything that implements [`Service`]. 
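+    ///
+    /// A typical lifecycle sketch (here `my_service` is a placeholder for any value
+    /// implementing [`Service`]):
+    ///
+    /// ```ignore
+    /// let mut my_server = Server::new(None).unwrap();
+    /// my_server.bootstrap();
+    /// my_server.add_service(my_service);
+    /// my_server.run_forever();
+    /// ```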
+ pub fn add_service(&mut self, service: impl Service + 'static) { + self.services.push(Box::new(service)); + } + + /// Similar to [`Self::add_service()`], but take a list of services + pub fn add_services(&mut self, services: Vec<Box<dyn Service>>) { + self.services.extend(services); + } + + /// Prepare the server to start + /// + /// When trying to zero downtime upgrade from an older version of the server which is already + /// running, this function will try to get all its listening sockets in order to take them over. + pub fn bootstrap(&mut self) { + info!("Bootstrap starting"); + debug!("{:#?}", self.options); + + /* only init sentry in release builds */ + #[cfg(not(debug_assertions))] + let _guard = match self.sentry.as_ref() { + Some(uri) => Some(sentry::init(uri.as_str())), + None => None, + }; + + if self.options.as_ref().map_or(false, |o| o.test) { + info!("Server Test passed, exiting"); + std::process::exit(0); + } + + // load fds + match self.load_fds(self.options.as_ref().map_or(false, |o| o.upgrade)) { + Ok(_) => { + info!("Bootstrap done"); + } + Err(e) => { + // sentry log error on fd load failure + #[cfg(not(debug_assertions))] + sentry::capture_error(&e); + + error!("Bootstrap failed on error: {:?}, exiting.", e); + std::process::exit(1); + } + } + } + + /// Start the server + /// + /// This function will block forever until the server needs to quit. So this would be the last + /// function to call for this object. + /// + /// Note: this function may fork the process for daemonization, so any additional threads created + /// before this function will be lost to any service logic once this function is called. + pub fn run_forever(&mut self) { + info!("Server starting"); + + let conf = self.configuration.as_ref(); + + if conf.daemon { + info!("Daemonizing the server"); + fast_timeout::pause_for_fork(); + daemonize(&self.configuration); + fast_timeout::unpause(); + } + + /* only init sentry in release builds */ + #[cfg(not(debug_assertions))] + let _guard = match self.sentry.as_ref() { + Some(uri) => Some(sentry::init(uri.as_str())), + None => None, + }; + + let mut runtimes: Vec<Runtime> = Vec::new(); + + while let Some(service) = self.services.pop() { + let threads = service.threads().unwrap_or(conf.threads); + let runtime = Server::run_service( + service, + self.listen_fds.clone(), + self.shutdown_recv.clone(), + threads, + conf.work_stealing, + ); + runtimes.push(runtime); + } + + // blocked on main loop so that it runs forever + // Only work steal runtime can use block_on() + let server_runtime = Server::create_runtime("Server", 1, true); + let shutdown_type = server_runtime.get_handle().block_on(self.main_loop()); + + if matches!(shutdown_type, ShutdownType::Graceful) { + info!("Graceful shutdown: grace period {}s starts", EXIT_TIMEOUT); + thread::sleep(Duration::from_secs(EXIT_TIMEOUT)); + info!("Graceful shutdown: grace period ends"); + } + + // Give tokio runtimes time to exit + let shutdown_timeout = match shutdown_type { + ShutdownType::Quick => Duration::from_secs(0), + ShutdownType::Graceful => Duration::from_secs(5), + }; + let shutdowns: Vec<_> = runtimes + .into_iter() + .map(|rt| { + info!("Waiting for runtimes to exit!"); + thread::spawn(move || { + rt.shutdown_timeout(shutdown_timeout); + thread::sleep(shutdown_timeout) + }) + }) + .collect(); + for shutdown in shutdowns { + if let Err(e) = shutdown.join() { + error!("Failed to shutdown runtime: {:?}", e); + } + } + info!("All runtimes exited, exiting now"); + std::process::exit(0); + } + + fn 
create_runtime(name: &str, threads: usize, work_steal: bool) -> Runtime { + if work_steal { + Runtime::new_steal(threads, name) + } else { + Runtime::new_no_steal(threads, name) + } + } +} diff --git a/pingora-core/src/server/transfer_fd/mod.rs b/pingora-core/src/server/transfer_fd/mod.rs new file mode 100644 index 0000000..ae07e33 --- /dev/null +++ b/pingora-core/src/server/transfer_fd/mod.rs @@ -0,0 +1,461 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#[cfg(target_os = "linux")] +use log::{debug, error, warn}; +use nix::errno::Errno; +#[cfg(target_os = "linux")] +use nix::sys::socket::{self, AddressFamily, RecvMsg, SockFlag, SockType, UnixAddr}; +#[cfg(target_os = "linux")] +use nix::sys::stat; +use nix::{Error, NixPath}; +use std::collections::HashMap; +use std::io::Write; +#[cfg(target_os = "linux")] +use std::io::{IoSlice, IoSliceMut}; +use std::os::unix::io::RawFd; +#[cfg(target_os = "linux")] +use std::{thread, time}; + +// Utilities to transfer file descriptors between sockets, e.g. during graceful upgrades. + +/// Container for open file descriptors and their associated bind addresses. +pub struct Fds { + map: HashMap<String, RawFd>, +} + +impl Fds { + pub fn new() -> Self { + Fds { + map: HashMap::new(), + } + } + + pub fn add(&mut self, bind: String, fd: RawFd) { + self.map.insert(bind, fd); + } + + pub fn get(&self, bind: &str) -> Option<&RawFd> { + self.map.get(bind) + } + + pub fn serialize(&self) -> (Vec<String>, Vec<RawFd>) { + let serialized: Vec<(String, RawFd)> = self + .map + .iter() + .map(|(key, value)| (key.clone(), *value)) + .collect(); + + ( + serialized.iter().map(|v| v.0.clone()).collect(), + serialized.iter().map(|v| v.1).collect(), + ) + // Surely there is a better way of doing this + } + + pub fn deserialize(&mut self, binds: Vec<String>, fds: Vec<RawFd>) { + assert!(binds.len() == fds.len()); + // TODO: use zip() + for i in 0..binds.len() { + self.map.insert(binds[i].clone(), fds[i]); + } + } + + pub fn send_to_sock<P>(&self, path: &P) -> Result<usize, Error> + where + P: ?Sized + NixPath + std::fmt::Display, + { + let (vec_key, vec_fds) = self.serialize(); + let mut ser_buf: [u8; 2048] = [0; 2048]; + let ser_key_size = serialize_vec_string(&vec_key, &mut ser_buf); + send_fds_to(vec_fds, &ser_buf[..ser_key_size], path) + } + + pub fn get_from_sock<P>(&mut self, path: &P) -> Result<(), Error> + where + P: ?Sized + NixPath + std::fmt::Display, + { + let mut de_buf: [u8; 2048] = [0; 2048]; + let (fds, bytes) = get_fds_from(path, &mut de_buf)?; + let keys = deserialize_vec_string(&de_buf[..bytes]); + self.deserialize(keys, fds); + Ok(()) + } +} + +fn serialize_vec_string(vec_string: &[String], mut buf: &mut [u8]) -> usize { + // There are many way to do this. 
serde is probably the way to go + // But let's start with something simple: space separated strings + let joined = vec_string.join(" "); + // TODO: check the buf is large enough + buf.write(joined.as_bytes()).unwrap() +} + +fn deserialize_vec_string(buf: &[u8]) -> Vec<String> { + let joined = std::str::from_utf8(buf).unwrap(); // TODO: handle error + let mut results: Vec<String> = Vec::new(); + for iter in joined.split_ascii_whitespace() { + results.push(String::from(iter)); + } + results +} + +#[cfg(target_os = "linux")] +pub fn get_fds_from<P>(path: &P, payload: &mut [u8]) -> Result<(Vec<RawFd>, usize), Error> +where + P: ?Sized + NixPath + std::fmt::Display, +{ + const MAX_FDS: usize = 32; + + let listen_fd = socket::socket( + AddressFamily::Unix, + SockType::Stream, + SockFlag::SOCK_NONBLOCK, + None, + ) + .unwrap(); + let unix_addr = UnixAddr::new(path).unwrap(); + // clean up old sock + match nix::unistd::unlink(path) { + Ok(()) => { + debug!("unlink {} done", path); + } + Err(e) => { + // Normal if file does not exist + debug!("unlink {} failed: {}", path, e); + // TODO: warn if exist but not able to unlink + } + }; + socket::bind(listen_fd, &unix_addr).unwrap(); + + /* sock is created before we change user, need to give permission to all */ + stat::fchmodat( + None, + path, + stat::Mode::all(), + stat::FchmodatFlags::FollowSymlink, + ) + .unwrap(); + + socket::listen(listen_fd, 8).unwrap(); + + let fd = match accept_with_retry(listen_fd) { + Ok(fd) => fd, + Err(e) => { + error!("Giving up reading socket from: {path}, error: {e:?}"); + //cleanup + if nix::unistd::close(listen_fd).is_ok() { + nix::unistd::unlink(path).unwrap(); + } + return Err(e); + } + }; + + let mut io_vec = [IoSliceMut::new(payload); 1]; + let mut cmsg_buf = nix::cmsg_space!([RawFd; MAX_FDS]); + let msg: RecvMsg<UnixAddr> = socket::recvmsg( + fd, + &mut io_vec, + Some(&mut cmsg_buf), + socket::MsgFlags::empty(), + ) + .unwrap(); + + let mut fds: Vec<RawFd> = Vec::new(); + for cmsg in msg.cmsgs() { + if let socket::ControlMessageOwned::ScmRights(mut vec_fds) = cmsg { + fds.append(&mut vec_fds) + } else { + warn!("Unexpected control messages: {cmsg:?}") + } + } + + //cleanup + if nix::unistd::close(listen_fd).is_ok() { + nix::unistd::unlink(path).unwrap(); + } + + Ok((fds, msg.bytes)) +} + +#[cfg(not(target_os = "linux"))] +pub fn get_fds_from<P>(_path: &P, _payload: &mut [u8]) -> Result<(Vec<RawFd>, usize), Error> +where + P: ?Sized + NixPath + std::fmt::Display, +{ + Err(Errno::ECONNREFUSED) +} + +#[cfg(target_os = "linux")] +const MAX_RETRY: usize = 5; +#[cfg(target_os = "linux")] +const RETRY_INTERVAL: time::Duration = time::Duration::from_secs(1); + +#[cfg(target_os = "linux")] +fn accept_with_retry(listen_fd: i32) -> Result<i32, Error> { + let mut retried = 0; + loop { + match socket::accept(listen_fd) { + Ok(fd) => return Ok(fd), + Err(e) => { + if retried > MAX_RETRY { + return Err(e); + } + match e { + Errno::EAGAIN => { + error!( + "No incoming socket transfer, sleep {RETRY_INTERVAL:?} and try again" + ); + retried += 1; + thread::sleep(RETRY_INTERVAL); + } + _ => { + error!("Error accepting socket transfer: {e}"); + return Err(e); + } + } + } + } + } +} + +#[cfg(target_os = "linux")] +pub fn send_fds_to<P>(fds: Vec<RawFd>, payload: &[u8], path: &P) -> Result<usize, Error> +where + P: ?Sized + NixPath + std::fmt::Display, +{ + const MAX_NONBLOCKING_POLLS: usize = 20; + const NONBLOCKING_POLL_INTERVAL: time::Duration = time::Duration::from_millis(500); + + let send_fd = socket::socket( + 
AddressFamily::Unix, + SockType::Stream, + SockFlag::SOCK_NONBLOCK, + None, + )?; + let unix_addr = UnixAddr::new(path)?; + let mut retried = 0; + let mut nonblocking_polls = 0; + + let conn_result: Result<usize, Error> = loop { + match socket::connect(send_fd, &unix_addr) { + Ok(_) => break Ok(0), + Err(e) => match e { + /* If the new process hasn't created the upgrade sock we'll get an ENOENT. + ECONNREFUSED may happen if the sock wasn't cleaned up + and the old process tries sending before the new one is listening. + EACCES may happen if connect() happen before the correct permission is set */ + Errno::ENOENT | Errno::ECONNREFUSED | Errno::EACCES => { + /*the server is not ready yet*/ + retried += 1; + if retried > MAX_RETRY { + error!( + "Max retry: {} reached. Giving up sending socket to: {}, error: {:?}", + MAX_RETRY, path, e + ); + break Err(e); + } + warn!("server not ready, will try again in {RETRY_INTERVAL:?}"); + thread::sleep(RETRY_INTERVAL); + } + /* handle nonblocking IO */ + Errno::EINPROGRESS => { + nonblocking_polls += 1; + if nonblocking_polls >= MAX_NONBLOCKING_POLLS { + error!("Connect() not ready after retries when sending socket to: {path}",); + break Err(e); + } + warn!("Connect() not ready, will try again in {NONBLOCKING_POLL_INTERVAL:?}",); + thread::sleep(NONBLOCKING_POLL_INTERVAL); + } + _ => { + error!("Error sending socket to: {path}, error: {e:?}"); + break Err(e); + } + }, + } + }; + + let result = match conn_result { + Ok(_) => { + let io_vec = [IoSlice::new(payload); 1]; + let scm = socket::ControlMessage::ScmRights(fds.as_slice()); + let cmsg = [scm; 1]; + loop { + match socket::sendmsg( + send_fd, + &io_vec, + &cmsg, + socket::MsgFlags::empty(), + None::<&UnixAddr>, + ) { + Ok(result) => break Ok(result), + Err(e) => match e { + /* handle nonblocking IO */ + Errno::EAGAIN => { + nonblocking_polls += 1; + if nonblocking_polls >= MAX_NONBLOCKING_POLLS { + error!( + "Sendmsg() not ready after retries when sending socket to: {}", + path + ); + break Err(e); + } + warn!( + "Sendmsg() not ready, will try again in {:?}", + NONBLOCKING_POLL_INTERVAL + ); + thread::sleep(NONBLOCKING_POLL_INTERVAL); + } + _ => break Err(e), + }, + } + } + } + Err(_) => conn_result, + }; + + nix::unistd::close(send_fd).unwrap(); + result +} + +#[cfg(not(target_os = "linux"))] +pub fn send_fds_to<P>(_fds: Vec<RawFd>, _payload: &[u8], _path: &P) -> Result<usize, Error> +where + P: ?Sized + NixPath + std::fmt::Display, +{ + Ok(0) +} + +#[cfg(test)] +#[cfg(target_os = "linux")] +mod tests { + use super::*; + use log::{debug, error}; + use std::thread; + + fn init_log() { + let _ = env_logger::builder().is_test(true).try_init(); + } + + #[test] + fn test_add_get() { + init_log(); + let mut fds = Fds::new(); + let key = "1.1.1.1:80".to_string(); + fds.add(key.clone(), 128); + assert_eq!(128, *fds.get(&key).unwrap()); + } + + #[test] + fn test_table_serde() { + init_log(); + let mut fds = Fds::new(); + let key1 = "1.1.1.1:80".to_string(); + fds.add(key1.clone(), 128); + let key2 = "1.1.1.1:443".to_string(); + fds.add(key2.clone(), 129); + + let (k, v) = fds.serialize(); + let mut fds2 = Fds::new(); + fds2.deserialize(k, v); + + assert_eq!(128, *fds2.get(&key1).unwrap()); + assert_eq!(129, *fds2.get(&key2).unwrap()); + } + + #[test] + fn test_vec_string_serde() { + init_log(); + let vec_str: Vec<String> = vec!["aaaa".to_string(), "bbb".to_string()]; + let mut ser_buf: [u8; 1024] = [0; 1024]; + let size = serialize_vec_string(&vec_str, &mut ser_buf); + let de_vec_string = 
deserialize_vec_string(&ser_buf[..size]);
+        assert_eq!(de_vec_string.len(), 2);
+        assert_eq!(de_vec_string[0], "aaaa");
+        assert_eq!(de_vec_string[1], "bbb");
+    }
+
+    #[test]
+    fn test_send_receive_fds() {
+        init_log();
+        let dumb_fd = socket::socket(
+            AddressFamily::Unix,
+            SockType::Stream,
+            SockFlag::empty(),
+            None,
+        )
+        .unwrap();
+
+        // the receiver needs to start in another thread since it is blocking
+        let child = thread::spawn(move || {
+            let mut buf: [u8; 32] = [0; 32];
+            let (fds, bytes) = get_fds_from("/tmp/pingora_fds_receive.sock", &mut buf).unwrap();
+            debug!("{:?}", fds);
+            assert_eq!(1, fds.len());
+            assert_eq!(32, bytes);
+            assert_eq!(1, buf[0]);
+            assert_eq!(1, buf[31]);
+        });
+
+        let fds = vec![dumb_fd];
+        let buf: [u8; 128] = [1; 128];
+        match send_fds_to(fds, &buf, "/tmp/pingora_fds_receive.sock") {
+            Ok(sent) => {
+                assert!(sent > 0);
+            }
+            Err(e) => {
+                error!("{:?}", e);
+                panic!()
+            }
+        }
+
+        child.join().unwrap();
+    }
+
+    #[test]
+    fn test_serde_via_socket() {
+        init_log();
+        let mut fds = Fds::new();
+        let key1 = "1.1.1.1:80".to_string();
+        let dumb_fd1 = socket::socket(
+            AddressFamily::Unix,
+            SockType::Stream,
+            SockFlag::empty(),
+            None,
+        )
+        .unwrap();
+        fds.add(key1.clone(), dumb_fd1);
+        let key2 = "1.1.1.1:443".to_string();
+        let dumb_fd2 = socket::socket(
+            AddressFamily::Unix,
+            SockType::Stream,
+            SockFlag::empty(),
+            None,
+        )
+        .unwrap();
+        fds.add(key2.clone(), dumb_fd2);
+
+        let child = thread::spawn(move || {
+            let mut fds2 = Fds::new();
+            fds2.get_from_sock("/tmp/pingora_fds_receive2.sock")
+                .unwrap();
+            assert!(*fds2.get(&key1).unwrap() > 0);
+            assert!(*fds2.get(&key2).unwrap() > 0);
+        });
+
+        fds.send_to_sock("/tmp/pingora_fds_receive2.sock").unwrap();
+        child.join().unwrap();
+    }
+}
diff --git a/pingora-core/src/services/background.rs b/pingora-core/src/services/background.rs
new file mode 100644
index 0000000..4eec577
--- /dev/null
+++ b/pingora-core/src/services/background.rs
@@ -0,0 +1,84 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! The background service
+//!
+//! A [BackgroundService] can be run as part of a Pingora application to add supporting logic that
+//! exists outside of the request/response lifecycle.
+//! Examples might include service discovery (load balancing) and background updates such as
+//! push-style metrics.
+
+use async_trait::async_trait;
+use std::sync::Arc;
+
+use super::Service;
+use crate::server::{ListenFds, ShutdownWatch};
+
+/// The background service interface
+#[cfg_attr(not(doc_async_trait), async_trait)]
+pub trait BackgroundService {
+    /// This function is called when the pingora server tries to start all the
+    /// services. The background service can return at any time or wait for the
+    /// `shutdown` signal.
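As a concrete sketch of this contract, the hypothetical task below runs periodically and exits once the shutdown watch flips, mirroring the watch-channel handling used elsewhere in this diff:

```rust
use async_trait::async_trait;
use pingora_core::server::ShutdownWatch;
use pingora_core::services::background::BackgroundService;
use std::time::Duration;

// Hypothetical task that pushes metrics once a minute.
struct MetricsPusher;

#[async_trait]
impl BackgroundService for MetricsPusher {
    async fn start(&self, mut shutdown: ShutdownWatch) {
        let mut period = tokio::time::interval(Duration::from_secs(60));
        loop {
            tokio::select! {
                _ = shutdown.changed() => break, // server signaled shutdown
                _ = period.tick() => {
                    // push metrics, refresh service discovery, etc.
                }
            }
        }
    }
}
```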
+    async fn start(&self, mut shutdown: ShutdownWatch);
+}
+
+/// A generic type of background service
+pub struct GenBackgroundService<A> {
+    // Name of the service
+    name: String,
+    // Task the service will execute
+    task: Arc<A>,
+    /// The number of threads. Default is 1
+    pub threads: Option<usize>,
+}
+
+impl<A> GenBackgroundService<A> {
+    /// Generates a background service that can run in the pingora runtime
+    pub fn new(name: String, task: Arc<A>) -> Self {
+        Self {
+            name,
+            task,
+            threads: Some(1),
+        }
+    }
+
+    /// Return the task behind [Arc] to be shared with other logic.
+    pub fn task(&self) -> Arc<A> {
+        self.task.clone()
+    }
+}
+
+#[async_trait]
+impl<A> Service for GenBackgroundService<A>
+where
+    A: BackgroundService + Send + Sync + 'static,
+{
+    async fn start_service(&mut self, _fds: Option<ListenFds>, shutdown: ShutdownWatch) {
+        self.task.start(shutdown).await;
+    }
+
+    fn name(&self) -> &str {
+        &self.name
+    }
+
+    fn threads(&self) -> Option<usize> {
+        self.threads
+    }
+}
+
+// Helper function to create a background service with a human-readable name
+pub fn background_service<SV>(name: &str, task: SV) -> GenBackgroundService<SV> {
+    GenBackgroundService::new(format!("BG {name}"), Arc::new(task))
+}
diff --git a/pingora-core/src/services/listening.rs b/pingora-core/src/services/listening.rs
new file mode 100644
index 0000000..6960034
--- /dev/null
+++ b/pingora-core/src/services/listening.rs
@@ -0,0 +1,232 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! The listening service
+//!
+//! A [Service] (listening service) responds to incoming requests on its endpoints.
+//! Each [Service] can be configured with custom application logic (e.g. an `HTTPProxy`) and one or
+//! more endpoints to listen to.
+
+use crate::apps::ServerApp;
+use crate::listeners::{Listeners, ServerAddress, TcpSocketOptions, TlsSettings, TransportStack};
+use crate::protocols::Stream;
+use crate::server::{ListenFds, ShutdownWatch};
+use crate::services::Service as ServiceTrait;
+
+use async_trait::async_trait;
+use log::{debug, error, info};
+use pingora_error::Result;
+use pingora_runtime::current_handle;
+use std::fs::Permissions;
+use std::sync::Arc;
+
+/// The type of service that is associated with a list of listening endpoints and a particular application
+pub struct Service<A> {
+    name: String,
+    listeners: Listeners,
+    app_logic: Arc<A>,
+    /// The number of preferred threads. `None` to follow global setting.
+    pub threads: Option<usize>,
+}
+
+impl<A> Service<A> {
+    /// Create a new [`Service`] with the given application (see [`crate::apps`]).
+    pub fn new(name: String, app_logic: Arc<A>) -> Self {
+        Service {
+            name,
+            listeners: Listeners::new(),
+            app_logic,
+            threads: None,
+        }
+    }
+
+    /// Create a new [`Service`] with the given application (see [`crate::apps`]) and the given
+    /// [`Listeners`].
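Continuing the hypothetical `MetricsPusher` from the earlier sketch, the `background_service` helper above can be used to wrap a task and register it with the server:

```rust
use pingora_core::server::Server;
use pingora_core::services::background::background_service;

// Sketch only: assumes the `MetricsPusher` type from the previous example
// and a `Server` already created via `Server::new`.
fn register(server: &mut Server) {
    let bg = background_service("metrics pusher", MetricsPusher);
    let _task = bg.task(); // shared handle, usable by other logic
    server.add_service(bg);
}
```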
+    pub fn with_listeners(name: String, listeners: Listeners, app_logic: Arc<A>) -> Self {
+        Service {
+            name,
+            listeners,
+            app_logic,
+            threads: None,
+        }
+    }
+
+    /// Get the [`Listeners`], mostly to add more endpoints.
+    pub fn endpoints(&mut self) -> &mut Listeners {
+        &mut self.listeners
+    }
+
+    // the following add* functions have no effect if the server is already started
+
+    /// Add a TCP listening endpoint with the given address (e.g., `127.0.0.1:8000`).
+    pub fn add_tcp(&mut self, addr: &str) {
+        self.listeners.add_tcp(addr);
+    }
+
+    /// Add a TCP listening endpoint with the given [`TcpSocketOptions`].
+    pub fn add_tcp_with_settings(&mut self, addr: &str, sock_opt: TcpSocketOptions) {
+        self.listeners.add_tcp_with_settings(addr, sock_opt);
+    }
+
+    /// Add a Unix domain socket listening endpoint with the given path.
+    ///
+    /// Optionally takes the permissions of the socket file. The default is read and write access for
+    /// everyone (0o666).
+    pub fn add_uds(&mut self, addr: &str, perm: Option<Permissions>) {
+        self.listeners.add_uds(addr, perm);
+    }
+
+    /// Add a TLS listening endpoint with the given certificate and key paths.
+    pub fn add_tls(&mut self, addr: &str, cert_path: &str, key_path: &str) -> Result<()> {
+        self.listeners.add_tls(addr, cert_path, key_path)
+    }
+
+    /// Add a TLS listening endpoint with the given [`TlsSettings`] and [`TcpSocketOptions`].
+    pub fn add_tls_with_settings(
+        &mut self,
+        addr: &str,
+        sock_opt: Option<TcpSocketOptions>,
+        settings: TlsSettings,
+    ) {
+        self.listeners
+            .add_tls_with_settings(addr, sock_opt, settings)
+    }
+
+    /// Add an endpoint according to the given [`ServerAddress`]
+    pub fn add_address(&mut self, addr: ServerAddress) {
+        self.listeners.add_address(addr);
+    }
+}
+
+impl<A: ServerApp + Send + Sync + 'static> Service<A> {
+    pub async fn handle_event(event: Stream, app_logic: Arc<A>, shutdown: ShutdownWatch) {
+        debug!("new event!");
+        let mut reuse_event = app_logic.process_new(event, &shutdown).await;
+        while let Some(event) = reuse_event {
+            // TODO: with a no-steal runtime, consider spawn()ing the next event on
+            // another thread for more even load balancing
+            debug!("new reusable event!");
+            reuse_event = app_logic.process_new(event, &shutdown).await;
+        }
+    }
+
+    async fn run_endpoint(
+        app_logic: Arc<A>,
+        mut stack: TransportStack,
+        mut shutdown: ShutdownWatch,
+    ) {
+        if let Err(e) = stack.listen().await {
+            error!("Listen() failed: {e}");
+            return;
+        }
+
+        // the accept loop, until the system is shutting down
+        loop {
+            let new_io = tokio::select! { // TODO: consider biased for perf reason?
+                new_io = stack.accept() => new_io,
+                shutdown_signal = shutdown.changed() => {
+                    match shutdown_signal {
+                        Ok(()) => {
+                            if !*shutdown.borrow() {
+                                // this happens on the initial read
                                continue;
+                            }
+                            info!("Shutting down {}", stack.as_str());
+                            break;
+                        }
+                        Err(e) => {
+                            error!("shutdown_signal error {e}");
+                            break;
+                        }
+                    }
+                }
+            };
+            match new_io {
+                Ok(io) => {
+                    let app = app_logic.clone();
+                    let shutdown = shutdown.clone();
+                    current_handle().spawn(async move {
+                        match io.handshake().await {
+                            Ok(io) => Self::handle_event(io, app, shutdown).await,
+                            Err(e) => {
+                                // TODO: Maybe IOApp trait needs a fn to handle/filter out this error
+                                error!("Downstream handshake error {e}");
+                            }
+                        }
+                    });
+                }
+                Err(e) => {
+                    error!("Accept() failed {e}");
+                    if let Some(io_error) = e
+                        .root_cause()
+                        .downcast_ref::<std::io::Error>()
+                        .and_then(|e| e.raw_os_error())
+                    {
+                        // 24: too many open files.
In this case accept() will continue to return this
+                        // error without blocking, which could use up all the resources
+                        if io_error == 24 {
+                            // call sleep to calm the thread down and wait for others to release
+                            // some resources
+                            tokio::time::sleep(std::time::Duration::from_secs(1)).await;
+                        }
+                    }
+                }
+            }
+        }
+
+        stack.cleanup();
+    }
+}
+
+#[async_trait]
+impl<A: ServerApp + Send + Sync + 'static> ServiceTrait for Service<A> {
+    async fn start_service(&mut self, fds: Option<ListenFds>, shutdown: ShutdownWatch) {
+        let runtime = current_handle();
+        let endpoints = self.listeners.build(fds);
+
+        let handlers = endpoints.into_iter().map(|endpoint| {
+            let app_logic = self.app_logic.clone();
+            let shutdown = shutdown.clone();
+            runtime.spawn(async move {
+                Self::run_endpoint(app_logic, endpoint, shutdown).await;
+            })
+        });
+
+        futures::future::join_all(handlers).await;
+        self.listeners.cleanup();
+        self.app_logic.cleanup();
+    }
+
+    fn name(&self) -> &str {
+        &self.name
+    }
+
+    fn threads(&self) -> Option<usize> {
+        self.threads
+    }
+}
+
+use crate::apps::prometheus_http_app::PrometheusServer;
+
+impl Service<PrometheusServer> {
+    /// The Prometheus HTTP server
+    ///
+    /// The HTTP server endpoint that reports Prometheus metrics collected in the entire service
+    pub fn prometheus_http_service() -> Self {
+        Service::new(
+            "Prometheus metric HTTP".to_string(),
+            Arc::new(PrometheusServer::new()),
+        )
+    }
+}
diff --git a/pingora-core/src/services/mod.rs b/pingora-core/src/services/mod.rs
new file mode 100644
index 0000000..67e72dc
--- /dev/null
+++ b/pingora-core/src/services/mod.rs
@@ -0,0 +1,55 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! The service interface
+//!
+//! A service to the pingora server is just something that runs forever until the server is shutting
+//! down.
+//!
+//! Two types of services are particularly useful:
+//! - services that are listening to some (TCP) endpoints
+//! - services that are just running in the background.
+
+use async_trait::async_trait;
+
+use crate::server::{ListenFds, ShutdownWatch};
+
+pub mod background;
+pub mod listening;
+
+/// The service interface
+#[async_trait]
+pub trait Service: Sync + Send {
+    /// This function will be called when the server is ready to start the service.
+    ///
+    /// - `fds`: a collection of listening file descriptors. During a zero-downtime restart,
+    /// `fds` will contain the listening sockets passed from the old service; services should
+    /// take the sockets they need to use from it. If the sockets the service looks for don't appear in
+    /// the collection, the service should create its own listening sockets and then put them into
+    /// the collection in order for them to be passed to the next server.
+    /// - `shutdown`: the shutdown signal this server would receive.
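Most applications will use the listening or background service types above rather than implementing this trait by hand, but as a sketch of the contract, a hypothetical minimal implementation could look like:

```rust
use async_trait::async_trait;
use pingora_core::server::{ListenFds, ShutdownWatch};
use pingora_core::services::Service;

struct MyDaemon; // hypothetical service with no listening sockets

#[async_trait]
impl Service for MyDaemon {
    async fn start_service(&mut self, _fds: Option<ListenFds>, mut shutdown: ShutdownWatch) {
        // run until the shutdown signal flips
        let _ = shutdown.changed().await;
    }

    fn name(&self) -> &str {
        // only the first 16 chars are used for thread naming
        "my daemon"
    }
}
```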
+ async fn start_service(&mut self, fds: Option<ListenFds>, mut shutdown: ShutdownWatch); + + /// The name of the service, just for logging and naming the threads assigned to this service + /// + /// Note that due to the limit of the underlying system, only the first 16 chars will be used + fn name(&self) -> &str; + + /// The preferred number of threads to run this service + /// + /// If `None`, the global setting will be used + fn threads(&self) -> Option<usize> { + None + } +} diff --git a/pingora-core/src/upstreams/mod.rs b/pingora-core/src/upstreams/mod.rs new file mode 100644 index 0000000..7352b61 --- /dev/null +++ b/pingora-core/src/upstreams/mod.rs @@ -0,0 +1,17 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! The interface to connect to a remote server + +pub mod peer; diff --git a/pingora-core/src/upstreams/peer.rs b/pingora-core/src/upstreams/peer.rs new file mode 100644 index 0000000..a36418b --- /dev/null +++ b/pingora-core/src/upstreams/peer.rs @@ -0,0 +1,537 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Defines where to connect to and how to connect to a remote server + +use ahash::AHasher; +use std::collections::BTreeMap; +use std::fmt::{Display, Formatter, Result as FmtResult}; +use std::hash::{Hash, Hasher}; +use std::net::{IpAddr, SocketAddr as InetSocketAddr, ToSocketAddrs as ToInetSocketAddrs}; +use std::os::unix::net::SocketAddr as UnixSocketAddr; +use std::os::unix::prelude::AsRawFd; +use std::path::{Path, PathBuf}; +use std::sync::Arc; +use std::time::Duration; + +pub use crate::protocols::l4::ext::TcpKeepalive; +use crate::protocols::l4::socket::SocketAddr; +use crate::protocols::ConnFdReusable; +use crate::tls::x509::X509; +use crate::utils::{get_organization_unit, CertKey}; + +pub use crate::protocols::ssl::ALPN; + +/// The interface to trace the connection +pub trait Tracing: Send + Sync + std::fmt::Debug { + /// This method is called when successfully connected to a remote server + fn on_connected(&self); + /// This method is called when the connection is disconnected. 
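As an example, here is a sketch of a connection counter implementing this tracing interface; the static gauge is a stand-in for a real metric:

```rust
use pingora_core::upstreams::peer::Tracing;
use std::sync::atomic::{AtomicIsize, Ordering};

static CONNECTED: AtomicIsize = AtomicIsize::new(0);

#[derive(Debug, Clone)]
struct ConnCounter;

impl Tracing for ConnCounter {
    fn on_connected(&self) {
        CONNECTED.fetch_add(1, Ordering::Relaxed);
    }

    fn on_disconnected(&self) {
        CONNECTED.fetch_sub(1, Ordering::Relaxed);
    }

    fn boxed_clone(&self) -> Box<dyn Tracing> {
        Box::new(self.clone())
    }
}
```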
+    fn on_disconnected(&self);
+    /// A way to clone itself
+    fn boxed_clone(&self) -> Box<dyn Tracing>;
+}
+
+/// A wrapper of the boxed [Tracing] trait object that implements Clone
+#[derive(Debug)]
+pub struct Tracer(pub Box<dyn Tracing>);
+
+impl Clone for Tracer {
+    fn clone(&self) -> Self {
+        Tracer(self.0.boxed_clone())
+    }
+}
+
+/// [`Peer`] defines the interface to communicate with the [`crate::connectors`] regarding where to
+/// connect to and how to connect to it.
+pub trait Peer: Display + Clone {
+    /// The remote address to connect to
+    fn address(&self) -> &SocketAddr;
+    /// Whether TLS should be used
+    fn tls(&self) -> bool;
+    /// The SNI to send, if TLS is used
+    fn sni(&self) -> &str;
+    /// To decide whether a [`Peer`] can use the connection established by another [`Peer`].
+    ///
+    /// The connections to two peers are considered reusable to each other if their reuse hashes are
+    /// the same
+    fn reuse_hash(&self) -> u64;
+    /// Get the proxy setting to connect to the remote server
+    fn get_proxy(&self) -> Option<&Proxy> {
+        None
+    }
+    /// Get the additional options to connect to the peer.
+    ///
+    /// See [`PeerOptions`] for more details
+    fn get_peer_options(&self) -> Option<&PeerOptions> {
+        None
+    }
+    /// Get the additional options for modification.
+    fn get_mut_peer_options(&mut self) -> Option<&mut PeerOptions> {
+        None
+    }
+    /// Whether the TLS handshake should validate the cert of the server.
+    fn verify_cert(&self) -> bool {
+        match self.get_peer_options() {
+            Some(opt) => opt.verify_cert,
+            None => false,
+        }
+    }
+    /// Whether the TLS handshake should verify that the server cert matches the SNI.
+    fn verify_hostname(&self) -> bool {
+        match self.get_peer_options() {
+            Some(opt) => opt.verify_hostname,
+            None => false,
+        }
+    }
+    /// The alternative common name to use to verify the server cert.
+    ///
+    /// If the server cert doesn't match the SNI, this name will be used to
+    /// verify the cert.
+    fn alternative_cn(&self) -> Option<&String> {
+        match self.get_peer_options() {
+            Some(opt) => opt.alternative_cn.as_ref(),
+            None => None,
+        }
+    }
+    /// Which local source address this connection should be bound to.
+    fn bind_to(&self) -> Option<&InetSocketAddr> {
+        match self.get_peer_options() {
+            Some(opt) => opt.bind_to.as_ref(),
+            None => None,
+        }
+    }
+    /// How long the connect() call should wait before it returns a timeout error.
+    fn connection_timeout(&self) -> Option<Duration> {
+        match self.get_peer_options() {
+            Some(opt) => opt.connection_timeout,
+            None => None,
+        }
+    }
+    /// How long the overall connection establishment should take before a timeout error is returned.
+    fn total_connection_timeout(&self) -> Option<Duration> {
+        match self.get_peer_options() {
+            Some(opt) => opt.total_connection_timeout,
+            None => None,
+        }
+    }
+    /// If the connection can be reused, how long the connection should wait to be reused before it
+    /// shuts down.
+    fn idle_timeout(&self) -> Option<Duration> {
+        self.get_peer_options().and_then(|o| o.idle_timeout)
+    }
+
+    /// Get the ALPN preference.
+    fn get_alpn(&self) -> Option<&ALPN> {
+        self.get_peer_options().map(|opt| &opt.alpn)
+    }
+
+    /// Get the CA cert to use to validate the server cert.
+    ///
+    /// If not set, the default CAs will be used.
+    fn get_ca(&self) -> Option<&Arc<Box<[X509]>>> {
+        match self.get_peer_options() {
+            Some(opt) => opt.ca.as_ref(),
+            None => None,
+        }
+    }
+
+    /// Get the client cert and key for mutual TLS if any
+    fn get_client_cert_key(&self) -> Option<&Arc<CertKey>> {
+        None
+    }
+
+    /// The TCP keepalive setting that should be applied to this connection
+    fn tcp_keepalive(&self) -> Option<&TcpKeepalive> {
+        self.get_peer_options()
+            .and_then(|o| o.tcp_keepalive.as_ref())
+    }
+
+    /// The interval at which to send H2 pings to the server, if any
+    fn h2_ping_interval(&self) -> Option<Duration> {
+        self.get_peer_options().and_then(|o| o.h2_ping_interval)
+    }
+
+    fn matches_fd<V: AsRawFd>(&self, fd: V) -> bool {
+        self.address().check_fd_match(fd)
+    }
+
+    fn get_tracer(&self) -> Option<Tracer> {
+        None
+    }
+}
+
+/// A simple TCP or TLS peer without many complicated settings.
+#[derive(Debug, Clone)]
+pub struct BasicPeer {
+    pub _address: SocketAddr,
+    pub sni: String,
+    pub options: PeerOptions,
+}
+
+impl BasicPeer {
+    /// Create a new [`BasicPeer`]
+    pub fn new(address: &str) -> Self {
+        BasicPeer {
+            _address: SocketAddr::Inet(address.parse().unwrap()), // TODO: check error, add support
+            // for UDS
+            sni: "".to_string(), // TODO: add support for SNI
+            options: PeerOptions::new(),
+        }
+    }
+}
+
+impl Display for BasicPeer {
+    fn fmt(&self, f: &mut Formatter<'_>) -> FmtResult {
+        write!(f, "{:?}", self)
+    }
+}
+
+impl Peer for BasicPeer {
+    fn address(&self) -> &SocketAddr {
+        &self._address
+    }
+
+    fn tls(&self) -> bool {
+        !self.sni.is_empty()
+    }
+
+    fn bind_to(&self) -> Option<&InetSocketAddr> {
+        None
+    }
+
+    fn sni(&self) -> &str {
+        &self.sni
+    }
+
+    // TODO: change connection pool to accept u64 instead of String
+    fn reuse_hash(&self) -> u64 {
+        let mut hasher = AHasher::default();
+        self._address.hash(&mut hasher);
+        hasher.finish()
+    }
+
+    fn get_peer_options(&self) -> Option<&PeerOptions> {
+        Some(&self.options)
+    }
+}
+
+/// Define whether to connect via http or https
+#[derive(Hash, Clone, Debug, PartialEq)]
+pub enum Scheme {
+    HTTP,
+    HTTPS,
+}
+
+impl Display for Scheme {
+    fn fmt(&self, f: &mut Formatter<'_>) -> FmtResult {
+        match self {
+            Scheme::HTTP => write!(f, "HTTP"),
+            Scheme::HTTPS => write!(f, "HTTPS"),
+        }
+    }
+}
+
+impl Scheme {
+    pub fn from_tls_bool(tls: bool) -> Self {
+        if tls {
+            Self::HTTPS
+        } else {
+            Self::HTTP
+        }
+    }
+}
+
+/// The preferences to connect to a remote server
+///
+/// See [`Peer`] for the meaning of the fields
+#[derive(Clone, Debug)]
+pub struct PeerOptions {
+    pub bind_to: Option<InetSocketAddr>,
+    pub connection_timeout: Option<Duration>,
+    pub total_connection_timeout: Option<Duration>,
+    pub read_timeout: Option<Duration>,
+    pub idle_timeout: Option<Duration>,
+    pub write_timeout: Option<Duration>,
+    pub verify_cert: bool,
+    pub verify_hostname: bool,
+    /* accept the cert if its CN matches the SNI or this name */
+    pub alternative_cn: Option<String>,
+    pub alpn: ALPN,
+    pub ca: Option<Arc<Box<[X509]>>>,
+    pub tcp_keepalive: Option<TcpKeepalive>,
+    pub no_header_eos: bool,
+    pub h2_ping_interval: Option<Duration>,
+    // how many concurrent h2 streams are allowed in the same connection
+    pub max_h2_streams: usize,
+    pub extra_proxy_headers: BTreeMap<String, Vec<u8>>,
+    // The list of curves the tls connection should advertise
+    // if `None`, the default curves will be used
+    pub curves: Option<&'static str>,
+    // see ssl_use_second_key_share
+    pub second_keyshare: bool,
+    // use Arc because Clone is required but not allowed in trait object
+    pub tracer:
Option<Tracer>,
+}
+
+impl PeerOptions {
+    /// Create a new [`PeerOptions`]
+    pub fn new() -> Self {
+        PeerOptions {
+            bind_to: None,
+            connection_timeout: None,
+            total_connection_timeout: None,
+            read_timeout: None,
+            idle_timeout: None,
+            write_timeout: None,
+            verify_cert: true,
+            verify_hostname: true,
+            alternative_cn: None,
+            alpn: ALPN::H1,
+            ca: None,
+            tcp_keepalive: None,
+            no_header_eos: false,
+            h2_ping_interval: None,
+            max_h2_streams: 1,
+            extra_proxy_headers: BTreeMap::new(),
+            curves: None,
+            second_keyshare: true, // default true and noop when not using PQ curves
+            tracer: None,
+        }
+    }
+
+    /// Set the ALPN according to the `max` and `min` constraints.
+    pub fn set_http_version(&mut self, max: u8, min: u8) {
+        self.alpn = ALPN::new(max, min);
+    }
+}
+
+impl Display for PeerOptions {
+    fn fmt(&self, f: &mut Formatter<'_>) -> FmtResult {
+        if let Some(b) = self.bind_to {
+            write!(f, "bind_to: {:?},", b)?;
+        }
+        if let Some(t) = self.connection_timeout {
+            write!(f, "conn_timeout: {:?},", t)?;
+        }
+        if let Some(t) = self.total_connection_timeout {
+            write!(f, "total_conn_timeout: {:?},", t)?;
+        }
+        if self.verify_cert {
+            write!(f, "verify_cert: true,")?;
+        }
+        if self.verify_hostname {
+            write!(f, "verify_hostname: true,")?;
+        }
+        if let Some(cn) = &self.alternative_cn {
+            write!(f, "alt_cn: {},", cn)?;
+        }
+        write!(f, "alpn: {},", self.alpn)?;
+        if let Some(cas) = &self.ca {
+            for ca in cas.iter() {
+                write!(
+                    f,
+                    "CA: {}, expire: {},",
+                    get_organization_unit(ca).unwrap_or_default(),
+                    ca.not_after()
+                )?;
+            }
+        }
+        if let Some(tcp_keepalive) = &self.tcp_keepalive {
+            write!(f, "tcp_keepalive: {},", tcp_keepalive)?;
+        }
+        if self.no_header_eos {
+            write!(f, "no_header_eos: true,")?;
+        }
+        if let Some(h2_ping_interval) = self.h2_ping_interval {
+            write!(f, "h2_ping_interval: {:?},", h2_ping_interval)?;
+        }
+        Ok(())
+    }
+}
+
+/// A peer representing the remote HTTP server to connect to
+#[derive(Debug, Clone)]
+pub struct HttpPeer {
+    pub _address: SocketAddr,
+    pub scheme: Scheme,
+    pub sni: String,
+    pub proxy: Option<Proxy>,
+    pub client_cert_key: Option<Arc<CertKey>>,
+    pub options: PeerOptions,
+}
+
+impl HttpPeer {
+    // These methods are pretty ad-hoc
+    pub fn is_tls(&self) -> bool {
+        match self.scheme {
+            Scheme::HTTP => false,
+            Scheme::HTTPS => true,
+        }
+    }
+
+    fn new_from_sockaddr(address: SocketAddr, tls: bool, sni: String) -> Self {
+        HttpPeer {
+            _address: address,
+            scheme: Scheme::from_tls_bool(tls),
+            sni,
+            proxy: None,
+            client_cert_key: None,
+            options: PeerOptions::new(),
+        }
+    }
+
+    /// Create a new [`HttpPeer`] with the given socket address and TLS settings.
+    pub fn new<A: ToInetSocketAddrs>(address: A, tls: bool, sni: String) -> Self {
+        let mut addrs_iter = address.to_socket_addrs().unwrap(); //TODO: handle error
+        let addr = addrs_iter.next().unwrap();
+        Self::new_from_sockaddr(SocketAddr::Inet(addr), tls, sni)
+    }
+
+    /// Create a new [`HttpPeer`] with the given path to a Unix domain socket and TLS settings.
+    pub fn new_uds(path: &str, tls: bool, sni: String) -> Self {
+        let addr = SocketAddr::Unix(UnixSocketAddr::from_pathname(Path::new(path)).unwrap()); //TODO: handle error
+        Self::new_from_sockaddr(addr, tls, sni)
+    }
+
+    /// Create a new [`HttpPeer`] that uses a proxy to connect to the upstream IP and port
+    /// combination.
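As a sketch of typical peer construction, the snippet below builds an HTTPS peer and adjusts its options through the `Peer` trait; the address and SNI are placeholders:

```rust
use pingora_core::upstreams::peer::{HttpPeer, Peer};
use std::time::Duration;

fn make_peer() -> HttpPeer {
    // TLS enabled; the cert is verified against the SNI by default
    let mut peer = HttpPeer::new("192.0.2.1:443", true, "example.org".to_string());
    if let Some(opts) = peer.get_mut_peer_options() {
        opts.connection_timeout = Some(Duration::from_secs(5));
        opts.set_http_version(2, 1); // prefer h2, allow fallback to h1
    }
    peer
}
```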
+ pub fn new_proxy( + next_hop: &str, + ip_addr: IpAddr, + port: u16, + tls: bool, + sni: &str, + headers: BTreeMap<String, Vec<u8>>, + ) -> Self { + HttpPeer { + _address: SocketAddr::Inet(InetSocketAddr::new(ip_addr, port)), + scheme: Scheme::from_tls_bool(tls), + sni: sni.to_string(), + proxy: Some(Proxy { + next_hop: PathBuf::from(next_hop).into(), + host: ip_addr.to_string(), + port, + headers, + }), + client_cert_key: None, + options: PeerOptions::new(), + } + } + + fn peer_hash(&self) -> u64 { + let mut hasher = AHasher::default(); + self.hash(&mut hasher); + hasher.finish() + } +} + +impl Hash for HttpPeer { + fn hash<H: Hasher>(&self, state: &mut H) { + self._address.hash(state); + self.scheme.hash(state); + self.proxy.hash(state); + self.sni.hash(state); + // client cert serial + self.client_cert_key.hash(state); + // origin server cert verification + self.verify_cert().hash(state); + self.verify_hostname().hash(state); + self.alternative_cn().hash(state); + } +} + +impl Display for HttpPeer { + fn fmt(&self, f: &mut Formatter<'_>) -> FmtResult { + write!(f, "addr: {}, scheme: {},", self._address, self.scheme)?; + if !self.sni.is_empty() { + write!(f, "sni: {},", self.sni)?; + } + if let Some(p) = self.proxy.as_ref() { + write!(f, "proxy: {p},")?; + } + if let Some(cert) = &self.client_cert_key { + write!(f, "client cert: {},", cert)?; + } + Ok(()) + } +} + +impl Peer for HttpPeer { + fn address(&self) -> &SocketAddr { + &self._address + } + + fn tls(&self) -> bool { + self.is_tls() + } + + fn sni(&self) -> &str { + &self.sni + } + + // TODO: change connection pool to accept u64 instead of String + fn reuse_hash(&self) -> u64 { + self.peer_hash() + } + + fn get_peer_options(&self) -> Option<&PeerOptions> { + Some(&self.options) + } + + fn get_mut_peer_options(&mut self) -> Option<&mut PeerOptions> { + Some(&mut self.options) + } + + fn get_proxy(&self) -> Option<&Proxy> { + self.proxy.as_ref() + } + + fn matches_fd<V: AsRawFd>(&self, fd: V) -> bool { + if let Some(proxy) = self.get_proxy() { + proxy.next_hop.check_fd_match(fd) + } else { + self.address().check_fd_match(fd) + } + } + + fn get_client_cert_key(&self) -> Option<&Arc<CertKey>> { + self.client_cert_key.as_ref() + } + + fn get_tracer(&self) -> Option<Tracer> { + self.options.tracer.clone() + } +} + +/// The proxy settings to connect to the remote server, CONNECT only for now +#[derive(Debug, Hash, Clone)] +pub struct Proxy { + pub next_hop: Box<Path>, // for now this will be the path to the UDS + pub host: String, // the proxied host. Could be either IP addr or hostname. + pub port: u16, // the port to proxy to + pub headers: BTreeMap<String, Vec<u8>>, // the additional headers to add to CONNECT +} + +impl Display for Proxy { + fn fmt(&self, f: &mut Formatter) -> FmtResult { + write!( + f, + "next_hop: {}, host: {}, port: {}", + self.next_hop.display(), + self.host, + self.port + ) + } +} diff --git a/pingora-core/src/utils/mod.rs b/pingora-core/src/utils/mod.rs new file mode 100644 index 0000000..c36f7c8 --- /dev/null +++ b/pingora-core/src/utils/mod.rs @@ -0,0 +1,232 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! This module contains various types that make it easier to work with bytes and X509 +//! certificates. + +// TODO: move below to its own mod +use crate::tls::{nid::Nid, pkey::PKey, pkey::Private, x509::X509}; +use crate::Result; +use bytes::Bytes; +use pingora_error::{ErrorType::*, OrErr}; +use std::hash::{Hash, Hasher}; + +/// A `BufRef` is a reference to a buffer of bytes. It removes the need for self-referential data +/// structures. It is safe to use as long as the underlying buffer does not get mutated. +/// +/// # Panics +/// +/// This will panic if an index is out of bounds. +#[derive(Clone, PartialEq, Eq, Debug)] +pub struct BufRef(pub usize, pub usize); + +impl BufRef { + /// Return a sub-slice of `buf`. + pub fn get<'a>(&self, buf: &'a [u8]) -> &'a [u8] { + &buf[self.0..self.1] + } + + /// Return a slice of `buf`. This operation is O(1) and increases the reference count of `buf`. + pub fn get_bytes(&self, buf: &Bytes) -> Bytes { + buf.slice(self.0..self.1) + } + + /// Return the size of the slice reference. + pub fn len(&self) -> usize { + self.1 - self.0 + } + + /// Return true if the length is zero. + pub fn is_empty(&self) -> bool { + self.1 == self.0 + } +} + +impl BufRef { + /// Initialize a `BufRef` that can reference a slice beginning at index `start` and has a + /// length of `len`. + pub fn new(start: usize, len: usize) -> Self { + BufRef(start, start + len) + } +} + +/// A `KVRef` contains a key name and value pair, stored as two [BufRef] types. +#[derive(Clone)] +pub struct KVRef { + name: BufRef, + value: BufRef, +} + +impl KVRef { + /// Like [BufRef::get] for the name. + pub fn get_name<'a>(&self, buf: &'a [u8]) -> &'a [u8] { + self.name.get(buf) + } + + /// Like [BufRef::get] for the value. + pub fn get_value<'a>(&self, buf: &'a [u8]) -> &'a [u8] { + self.value.get(buf) + } + + /// Like [BufRef::get_bytes] for the name. + pub fn get_name_bytes(&self, buf: &Bytes) -> Bytes { + self.name.get_bytes(buf) + } + + /// Like [BufRef::get_bytes] for the value. + pub fn get_value_bytes(&self, buf: &Bytes) -> Bytes { + self.value.get_bytes(buf) + } + + /// Return a new `KVRef` with name and value start indices and lengths. + pub fn new(name_s: usize, name_len: usize, value_s: usize, value_len: usize) -> Self { + KVRef { + name: BufRef(name_s, name_s + name_len), + value: BufRef(value_s, value_s + value_len), + } + } + + /// Return a reference to the value. + pub fn value(&self) -> &BufRef { + &self.value + } +} + +/// A [KVRef] which contains empty sub-slices. +pub const EMPTY_KV_REF: KVRef = KVRef { + name: BufRef(0, 0), + value: BufRef(0, 0), +}; + +fn get_subject_name(cert: &X509, name_type: Nid) -> Option<String> { + cert.subject_name() + .entries_by_nid(name_type) + .next() + .map(|name| { + name.data() + .as_utf8() + .map(|s| s.to_string()) + .unwrap_or_default() + }) +} + +/// Return the organization associated with the X509 certificate. +pub fn get_organization(cert: &X509) -> Option<String> { + get_subject_name(cert, Nid::ORGANIZATIONNAME) +} + +/// Return the common name associated with the X509 certificate. 
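To illustrate the zero-copy indexing these types provide, here is a sketch of slicing a `name: value` pair out of a shared buffer; the input is illustrative:

```rust
use bytes::Bytes;
use pingora_core::utils::KVRef;

fn demo() {
    let buf = Bytes::from_static(b"host: example.org");
    // name covers bytes 0..4 ("host"), value covers bytes 6..17 ("example.org")
    let kv = KVRef::new(0, 4, 6, 11);
    assert_eq!(kv.get_name(&buf), &b"host"[..]);
    // O(1): slicing `Bytes` only bumps a reference count
    let value = kv.get_value_bytes(&buf);
    assert_eq!(&value[..], &b"example.org"[..]);
}
```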
+pub fn get_common_name(cert: &X509) -> Option<String> {
+    get_subject_name(cert, Nid::COMMONNAME)
+}
+
+/// Return the organizational unit associated with the X509 certificate.
+pub fn get_organization_unit(cert: &X509) -> Option<String> {
+    get_subject_name(cert, Nid::ORGANIZATIONALUNITNAME)
+}
+
+/// Return the serial number associated with the X509 certificate as a hexadecimal value.
+pub fn get_serial(cert: &X509) -> Result<String> {
+    let bn = cert
+        .serial_number()
+        .to_bn()
+        .or_err(InvalidCert, "Invalid serial")?;
+    let hex = bn.to_hex_str().or_err(InvalidCert, "Invalid serial")?;
+
+    let hex_str: &str = hex.as_ref();
+    Ok(hex_str.to_owned())
+}
+
+/// This type contains a list of one or more certificates and an associated private key. The leaf
+/// certificate should always be first.
+#[derive(Clone)]
+pub struct CertKey {
+    certificates: Vec<X509>,
+    key: PKey<Private>,
+}
+
+impl CertKey {
+    /// Create a new `CertKey` given a list of certificates and a private key.
+    pub fn new(certificates: Vec<X509>, key: PKey<Private>) -> CertKey {
+        assert!(
+            !certificates.is_empty(),
+            "expected a non-empty vector of certificates in CertKey::new"
+        );
+
+        CertKey { certificates, key }
+    }
+
+    /// Peek at the leaf certificate.
+    pub fn leaf(&self) -> &X509 {
+        // This is safe due to the assertion above.
+        &self.certificates[0]
+    }
+
+    /// Return the key.
+    pub fn key(&self) -> &PKey<Private> {
+        &self.key
+    }
+
+    /// Return a slice of intermediate certificates. An empty slice means there are none.
+    pub fn intermediates(&self) -> &[X509] {
+        if self.certificates.len() <= 1 {
+            return &[];
+        }
+        &self.certificates[1..]
+    }
+
+    /// Return the organization from the leaf certificate.
+    pub fn organization(&self) -> Option<String> {
+        get_organization(self.leaf())
+    }
+
+    /// Return the serial from the leaf certificate.
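A sketch of assembling a `CertKey` from the PEM files used by the tests in this diff, assuming the `crate::tls` re-exports shown above are publicly accessible; paths are relative to `pingora-core` and error handling is elided:

```rust
use pingora_core::tls::pkey::PKey;
use pingora_core::tls::x509::X509;
use pingora_core::utils::CertKey;

fn load() -> CertKey {
    let cert_pem = std::fs::read("tests/keys/server.crt").unwrap();
    let leaf = X509::from_pem(&cert_pem).unwrap();
    let key_pem = std::fs::read("tests/keys/key.pem").unwrap();
    let key = PKey::private_key_from_pem(&key_pem).unwrap();
    // leaf certificate first; intermediates, if any, would follow
    CertKey::new(vec![leaf], key)
}
```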
+ pub fn serial(&self) -> Result<String> { + get_serial(self.leaf()) + } +} + +impl Hash for CertKey { + fn hash<H: Hasher>(&self, state: &mut H) { + for certificate in &self.certificates { + if let Ok(serial) = get_serial(certificate) { + serial.hash(state) + } + } + } +} + +// hide private key +impl std::fmt::Debug for CertKey { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.debug_struct("CertKey") + .field("X509", &self.leaf()) + .finish() + } +} + +impl std::fmt::Display for CertKey { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + let leaf = self.leaf(); + if let Some(cn) = get_common_name(leaf) { + // Write CN if it exists + write!(f, "CN: {cn},")?; + } else if let Some(org_unit) = get_organization_unit(leaf) { + // CA cert might not have CN, so print its unit name instead + write!(f, "Org Unit: {org_unit},")?; + } + write!(f, ", expire: {}", leaf.not_after()) + // ignore the details of the private key + } +} diff --git a/pingora-core/tests/keys/key.pem b/pingora-core/tests/keys/key.pem new file mode 100644 index 0000000..0fe68f2 --- /dev/null +++ b/pingora-core/tests/keys/key.pem @@ -0,0 +1,5 @@ +-----BEGIN EC PRIVATE KEY----- +MHcCAQEEIN5lAOvtlKwtc/LR8/U77dohJmZS30OuezU9gL6vmm6DoAoGCCqGSM49 +AwEHoUQDQgAE2f/1Fm1HjySdokPq2T0F1xxol9nSEYQ+foFINeaWYk+FxMGpriJT +Bb8AGka87cWklw1ZqytfaT6pkureDbTkwg== +-----END EC PRIVATE KEY----- diff --git a/pingora-core/tests/keys/public.pem b/pingora-core/tests/keys/public.pem new file mode 100644 index 0000000..0866a04 --- /dev/null +++ b/pingora-core/tests/keys/public.pem @@ -0,0 +1,4 @@ +-----BEGIN PUBLIC KEY----- +MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE2f/1Fm1HjySdokPq2T0F1xxol9nS +EYQ+foFINeaWYk+FxMGpriJTBb8AGka87cWklw1ZqytfaT6pkureDbTkwg== +-----END PUBLIC KEY----- diff --git a/pingora-core/tests/keys/server.crt b/pingora-core/tests/keys/server.crt new file mode 100644 index 0000000..afb2d1e --- /dev/null +++ b/pingora-core/tests/keys/server.crt @@ -0,0 +1,13 @@ +-----BEGIN CERTIFICATE----- +MIIB9zCCAZ2gAwIBAgIUMI7aLvTxyRFCHhw57hGt4U6yupcwCgYIKoZIzj0EAwIw +ZDELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQHDA1TYW4gRnJhbmNp +c2NvMRgwFgYDVQQKDA9DbG91ZGZsYXJlLCBJbmMxFjAUBgNVBAMMDW9wZW5ydXN0 +eS5vcmcwHhcNMjIwNDExMjExMzEzWhcNMzIwNDA4MjExMzEzWjBkMQswCQYDVQQG +EwJVUzELMAkGA1UECAwCQ0ExFjAUBgNVBAcMDVNhbiBGcmFuY2lzY28xGDAWBgNV +BAoMD0Nsb3VkZmxhcmUsIEluYzEWMBQGA1UEAwwNb3BlbnJ1c3R5Lm9yZzBZMBMG +ByqGSM49AgEGCCqGSM49AwEHA0IABNn/9RZtR48knaJD6tk9BdccaJfZ0hGEPn6B +SDXmlmJPhcTBqa4iUwW/ABpGvO3FpJcNWasrX2k+qZLq3g205MKjLTArMCkGA1Ud +EQQiMCCCDyoub3BlbnJ1c3R5Lm9yZ4INb3BlbnJ1c3R5Lm9yZzAKBggqhkjOPQQD +AgNIADBFAiAjISZ9aEKmobKGlT76idO740J6jPaX/hOrm41MLeg69AIhAJqKrSyz +wD/AAF5fR6tXmBqlnpQOmtxfdy13wDr4MT3h +-----END CERTIFICATE----- diff --git a/pingora-core/tests/keys/server.csr b/pingora-core/tests/keys/server.csr new file mode 100644 index 0000000..ca75dce --- /dev/null +++ b/pingora-core/tests/keys/server.csr @@ -0,0 +1,9 @@ +-----BEGIN CERTIFICATE REQUEST----- +MIIBJzCBzgIBADBsMQswCQYDVQQGEwJVUzETMBEGA1UECAwKQ2FsaWZvcm5pYTEW +MBQGA1UEBwwNU2FuIEZyYW5jaXNjbzEYMBYGA1UECgwPQ2xvdWRmbGFyZSwgSW5j +MRYwFAYDVQQDDA1vcGVucnVzdHkub3JnMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcD +QgAE2f/1Fm1HjySdokPq2T0F1xxol9nSEYQ+foFINeaWYk+FxMGpriJTBb8AGka8 +7cWklw1ZqytfaT6pkureDbTkwqAAMAoGCCqGSM49BAMCA0gAMEUCIFyDN8eamnoY +XydKn2oI7qImigxahyCftzjxkIEV5IKbAiEAo5l72X4U+YTVYmyPPnJIj2v5nA1R +RuUfMh5sXzwlwuM= +-----END CERTIFICATE REQUEST----- diff --git a/pingora-core/tests/nginx.conf b/pingora-core/tests/nginx.conf new file mode 100644 index 0000000..55f2e24 --- 
/dev/null +++ b/pingora-core/tests/nginx.conf @@ -0,0 +1,92 @@ + +#user nobody; +worker_processes 1; + +error_log /dev/stdout; +#error_log logs/error.log notice; +#error_log logs/error.log info; + +pid logs/nginx.pid; +master_process off; +daemon off; + +events { + worker_connections 4096; +} + + +http { + #include mime.types; + #default_type application/octet-stream; + + #log_format main '$remote_addr - $remote_user [$time_local] "$request" ' + # '$status $body_bytes_sent "$http_referer" ' + # '"$http_user_agent" "$http_x_forwarded_for"'; + + # access_log logs/access.log main; + access_log off; + + sendfile on; + #tcp_nopush on; + + #keepalive_timeout 0; + keepalive_timeout 10; + keepalive_requests 99999; + + #gzip on; + + server { + listen 8000; + listen [::]:8000; + listen 8443 ssl http2; + #listen 8443 ssl http2; + server_name localhost; + + ssl_certificate keys/server.crt; + ssl_certificate_key keys/key.pem; + ssl_protocols TLSv1.2; + ssl_ciphers TLS-AES-128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256; + + #charset koi8-r; + + #access_log logs/host.access.log main; + + location / { + root /home/yuchen/nfs/tmp; + index index.html index.htm; + } + location /test { + keepalive_timeout 20; + return 200; + } + location /test2 { + keepalive_timeout 0; + return 200 "hello world"; + } + location /test3 { + keepalive_timeout 0; + return 200; + #content_by_lua_block { + # ngx.print("hello world") + #} + } + + location /test4 { + keepalive_timeout 20; + rewrite_by_lua_block { + ngx.exit(200) + } + #return 201; + + } + + #error_page 404 /404.html; + + # redirect server error pages to the static page /50x.html + # + error_page 500 502 503 504 /50x.html; + location = /50x.html { + root html; + } + } +} diff --git a/pingora-core/tests/nginx_proxy.conf b/pingora-core/tests/nginx_proxy.conf new file mode 100644 index 0000000..0acbd93 --- /dev/null +++ b/pingora-core/tests/nginx_proxy.conf @@ -0,0 +1,86 @@ + +#user nobody; +worker_processes 1; + +error_log /dev/stdout; +#error_log logs/error.log notice; +#error_log logs/error.log info; + +#pid logs/nginx.pid; +master_process off; +daemon off; + +events { + worker_connections 4096; +} + + +http { + #include mime.types; + #default_type application/octet-stream; + + #log_format main '$remote_addr - $remote_user [$time_local] "$request" ' + # '$status $body_bytes_sent "$http_referer" ' + # '"$http_user_agent" "$http_x_forwarded_for"'; + + # access_log logs/access.log main; + access_log off; + + sendfile on; + #tcp_nopush on; + + keepalive_timeout 30; + keepalive_requests 99999; + + upstream plantext { + server 127.0.0.1:8000; + keepalive 128; + keepalive_requests 99999; + } + + upstream ssl { + server 127.0.0.1:8443; + keepalive 128; + keepalive_requests 99999; + } + + #gzip on; + + server { + listen 8001; + listen [::]:8001; + server_name localproxy; + + location / { + keepalive_timeout 30; + proxy_pass http://plantext; + proxy_http_version 1.1; + proxy_set_header Connection "Keep-Alive"; + } + + } + + server { + listen 8002 ssl; + listen [::]:8002 ssl; + server_name localproxy_https; + + ssl_certificate keys/server.crt; + ssl_certificate_key keys/key.pem; + ssl_protocols TLSv1.2; + ssl_ciphers TLS-AES-128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256; + + location / { + keepalive_timeout 30; + proxy_pass https://ssl; + proxy_http_version 1.1; + proxy_ssl_session_reuse off; + proxy_ssl_verify on; + proxy_ssl_server_name on; + proxy_ssl_name "openrusty.org"; + proxy_ssl_trusted_certificate keys/server.crt; + 
proxy_set_header Connection "Keep-Alive"; + } + + } +} diff --git a/pingora-core/tests/pingora_conf.yaml b/pingora-core/tests/pingora_conf.yaml new file mode 100644 index 0000000..c21ae15 --- /dev/null +++ b/pingora-core/tests/pingora_conf.yaml @@ -0,0 +1,5 @@ +--- +version: 1 +client_bind_to_ipv4: + - 127.0.0.2 +ca_file: tests/keys/server.crt
\ No newline at end of file diff --git a/pingora-core/tests/test_basic.rs b/pingora-core/tests/test_basic.rs new file mode 100644 index 0000000..171f757 --- /dev/null +++ b/pingora-core/tests/test_basic.rs @@ -0,0 +1,60 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +mod utils; + +use hyper::Client; +use hyperlocal::{UnixClientExt, Uri}; +use utils::init; + +#[tokio::test] +async fn test_http() { + init(); + let res = reqwest::get("http://127.0.0.1:6145").await.unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); +} + +#[tokio::test] +async fn test_https_http2() { + init(); + + let client = reqwest::Client::builder() + .danger_accept_invalid_certs(true) + .build() + .unwrap(); + + let res = client.get("https://127.0.0.1:6146").send().await.unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); + assert_eq!(res.version(), reqwest::Version::HTTP_2); + + let client = reqwest::Client::builder() + .danger_accept_invalid_certs(true) + .http1_only() + .build() + .unwrap(); + + let res = client.get("https://127.0.0.1:6146").send().await.unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); + assert_eq!(res.version(), reqwest::Version::HTTP_11); +} + +#[tokio::test] +async fn test_uds() { + init(); + let url = Uri::new("/tmp/echo.sock", "/").into(); + let client = Client::unix(); + + let res = client.get(url).await.unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); +} diff --git a/pingora-core/tests/utils/mod.rs b/pingora-core/tests/utils/mod.rs new file mode 100644 index 0000000..1555b7d --- /dev/null +++ b/pingora-core/tests/utils/mod.rs @@ -0,0 +1,123 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +use once_cell::sync::Lazy; +use std::{thread, time}; + +use pingora_core::listeners::Listeners; +use pingora_core::server::configuration::Opt; +use pingora_core::server::Server; +use pingora_core::services::listening::Service; +use structopt::StructOpt; + +use async_trait::async_trait; +use bytes::Bytes; +use http::{Response, StatusCode}; +use pingora_timeout::timeout; +use std::sync::Arc; +use std::time::Duration; + +use pingora_core::apps::http_app::ServeHttp; +use pingora_core::protocols::http::ServerSession; + +#[derive(Clone)] +pub struct EchoApp; + +#[async_trait] +impl ServeHttp for EchoApp { + async fn response(&self, http_stream: &mut ServerSession) -> Response<Vec<u8>> { + // read timeout of 2s + let read_timeout = 2000; + let body = match timeout( + Duration::from_millis(read_timeout), + http_stream.read_request_body(), + ) + .await + { + Ok(res) => match res.unwrap() { + Some(bytes) => bytes, + None => Bytes::from("no body!"), + }, + Err(_) => { + panic!("Timed out after {:?}ms", read_timeout); + } + }; + + Response::builder() + .status(StatusCode::OK) + .header(http::header::CONTENT_TYPE, "text/html") + .header(http::header::CONTENT_LENGTH, body.len()) + .body(body.to_vec()) + .unwrap() + } +} + +pub fn new_http_echo_app() -> Arc<EchoApp> { + Arc::new(EchoApp {}) +} + +pub struct MyServer { + pub handle: thread::JoinHandle<()>, +} + +fn entry_point(opt: Option<Opt>) { + env_logger::init(); + + let cert_path = format!("{}/tests/keys/server.crt", env!("CARGO_MANIFEST_DIR")); + let key_path = format!("{}/tests/keys/key.pem", env!("CARGO_MANIFEST_DIR")); + + let mut my_server = Server::new(opt).unwrap(); + my_server.bootstrap(); + + let mut listeners = Listeners::tcp("0.0.0.0:6145"); + listeners.add_uds("/tmp/echo.sock", None); + + let mut tls_settings = + pingora_core::listeners::TlsSettings::intermediate(&cert_path, &key_path).unwrap(); + tls_settings.enable_h2(); + listeners.add_tls_with_settings("0.0.0.0:6146", None, tls_settings); + + let echo_service_http = Service::with_listeners( + "Echo Service HTTP".to_string(), + listeners, + new_http_echo_app(), + ); + + my_server.add_service(echo_service_http); + my_server.run_forever(); +} + +impl MyServer { + pub fn start() -> Self { + let opts: Vec<String> = vec![ + "pingora".into(), + "-c".into(), + "tests/pingora_conf.yaml".into(), + ]; + let server_handle = thread::spawn(|| { + entry_point(Some(Opt::from_iter(opts))); + }); + // wait until the server is up + thread::sleep(time::Duration::from_secs(2)); + MyServer { + handle: server_handle, + } + } +} + +pub static TEST_SERVER: Lazy<MyServer> = Lazy::new(MyServer::start); + +pub fn init() { + let _ = *TEST_SERVER; +} diff --git a/pingora-error/Cargo.toml b/pingora-error/Cargo.toml new file mode 100644 index 0000000..cdbc619 --- /dev/null +++ b/pingora-error/Cargo.toml @@ -0,0 +1,17 @@ +[package] +name = "pingora-error" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["rust-patterns"] +keywords = ["error", "error-handling", "pingora"] +description = """ +Error types and error handling APIs for Pingora. 
+""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_error" +path = "src/lib.rs" diff --git a/pingora-error/LICENSE b/pingora-error/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-error/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." 
+ + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. 
We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora-error/src/immut_str.rs b/pingora-error/src/immut_str.rs new file mode 100644 index 0000000..e131b63 --- /dev/null +++ b/pingora-error/src/immut_str.rs @@ -0,0 +1,71 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use std::fmt; + +/// A data struct that holds either an immutable string or a reference to a static str. +/// Compared to `String` or `Box<str>`, it avoids a memory allocation for static strs. +#[derive(Debug, PartialEq, Eq, Clone)] +pub enum ImmutStr { + Static(&'static str), + Owned(Box<str>), +} + +impl ImmutStr { + #[inline] + pub fn as_str(&self) -> &str { + match self { + ImmutStr::Static(s) => s, + ImmutStr::Owned(s) => s.as_ref(), + } + } + + pub fn is_owned(&self) -> bool { + match self { + ImmutStr::Static(_) => false, + ImmutStr::Owned(_) => true, + } + } +} + +impl fmt::Display for ImmutStr { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + write!(f, "{}", self.as_str()) + } +} + +impl From<&'static str> for ImmutStr { + fn from(s: &'static str) -> Self { + ImmutStr::Static(s) + } +} + +impl From<String> for ImmutStr { + fn from(s: String) -> Self { + ImmutStr::Owned(s.into_boxed_str()) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_static_vs_owned() { + let s: ImmutStr = "test".into(); + assert!(!s.is_owned()); + let s: ImmutStr = "test".to_string().into(); + assert!(s.is_owned()); + } +} diff --git a/pingora-error/src/lib.rs b/pingora-error/src/lib.rs new file mode 100644 index 0000000..6bd6702 --- /dev/null +++ b/pingora-error/src/lib.rs @@ -0,0 +1,589 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#![warn(clippy::all)] +//!
The library that provides the structs to represent errors in Pingora. + +pub use std::error::Error as ErrorTrait; +use std::fmt; +use std::fmt::Debug; +use std::result::Result as StdResult; + +mod immut_str; +pub use immut_str::ImmutStr; + +/// The boxed [Error], the desired way to pass [Error] +pub type BError = Box<Error>; +/// Syntax sugar for `std::result::Result<T, BError>` +pub type Result<T, E = BError> = StdResult<T, E>; + +/// The struct that represents an error +#[derive(Debug)] +pub struct Error { + /// the type of error + pub etype: ErrorType, + /// the source of error: from upstream, downstream or internal + pub esource: ErrorSource, + /// whether the error is retryable + pub retry: RetryType, + /// chain to the cause of this error + pub cause: Option<Box<(dyn ErrorTrait + Send + Sync)>>, + /// an arbitrary string that explains the context when the error happens + pub context: Option<ImmutStr>, +} + +/// The source of the error +#[derive(Debug, PartialEq, Eq, Clone)] +pub enum ErrorSource { + /// The error is caused by the remote server + Upstream, + /// The error is caused by the remote client + Downstream, + /// The error is caused by the internal logic + Internal, + /// Error source unknown or to be set + Unset, +} + +/// Whether the request can be retried after encountering this error +#[derive(Debug, PartialEq, Eq, Clone, Copy)] +pub enum RetryType { + Decided(bool), + ReusedOnly, // only retry when the error is from a reused connection +} + +impl RetryType { + pub fn decide_reuse(&mut self, reused: bool) { + if matches!(self, RetryType::ReusedOnly) { + *self = RetryType::Decided(reused); + } + } + + pub fn retry(&self) -> bool { + match self { + RetryType::Decided(b) => *b, + RetryType::ReusedOnly => { + panic!("Retry is not decided") + } + } + } +} + +impl From<bool> for RetryType { + fn from(b: bool) -> Self { + RetryType::Decided(b) + } +} + +impl ErrorSource { + /// for displaying the error source + pub fn as_str(&self) -> &str { + match self { + Self::Upstream => "Upstream", + Self::Downstream => "Downstream", + Self::Internal => "Internal", + Self::Unset => "", + } + } +} + +/// Predefined types of errors +#[derive(Debug, PartialEq, Eq, Clone)] +pub enum ErrorType { + // connect errors + ConnectTimedout, + ConnectRefused, + ConnectNoRoute, + TLSHandshakeFailure, + TLSHandshakeTimedout, + InvalidCert, + HandshakeError, // other handshake errors + ConnectError, // catch all + BindError, + AcceptError, + SocketError, + ConnectProxyFailure, + // protocol errors + InvalidHTTPHeader, + H1Error, // catch all + H2Error, // catch all + H2Downgrade, // Peer over h2 requests to downgrade to h1 + InvalidH2, // Peer sends invalid h2 frames to us + // IO error on established connections + ReadError, + WriteError, + ReadTimedout, + WriteTimedout, + ConnectionClosed, + // application error, will return HTTP status code + HTTPStatus(u16), + // file related + FileOpenError, + FileCreateError, + FileReadError, + FileWriteError, + // other errors + InternalError, + // catch all + UnknownError, + /// Custom error with static string. + /// This variant allows users to extend the types of errors. If a runtime-generated string + /// is needed, it is more likely to be treated as "context" rather than "type". + Custom(&'static str), + /// Custom error with static string and code. + /// This variant allows users to extend errors further with error codes. + CustomCode(&'static str, u16), +} + +impl ErrorType { + /// create a new type of error. Users should try to make `name` unique.
+ pub const fn new(name: &'static str) -> Self { + ErrorType::Custom(name) + } + + /// create a new type of error with an associated error code. Users should try to make `name` unique. + pub const fn new_code(name: &'static str, code: u16) -> Self { + ErrorType::CustomCode(name, code) + } + + /// for displaying the error type + pub fn as_str(&self) -> &str { + match self { + ErrorType::ConnectTimedout => "ConnectTimedout", + ErrorType::ConnectRefused => "ConnectRefused", + ErrorType::ConnectNoRoute => "ConnectNoRoute", + ErrorType::ConnectProxyFailure => "ConnectProxyFailure", + ErrorType::TLSHandshakeFailure => "TLSHandshakeFailure", + ErrorType::TLSHandshakeTimedout => "TLSHandshakeTimedout", + ErrorType::InvalidCert => "InvalidCert", + ErrorType::HandshakeError => "HandshakeError", + ErrorType::ConnectError => "ConnectError", + ErrorType::BindError => "BindError", + ErrorType::AcceptError => "AcceptError", + ErrorType::SocketError => "SocketError", + ErrorType::InvalidHTTPHeader => "InvalidHTTPHeader", + ErrorType::H1Error => "H1Error", + ErrorType::H2Error => "H2Error", + ErrorType::InvalidH2 => "InvalidH2", + ErrorType::H2Downgrade => "H2Downgrade", + ErrorType::ReadError => "ReadError", + ErrorType::WriteError => "WriteError", + ErrorType::ReadTimedout => "ReadTimedout", + ErrorType::WriteTimedout => "WriteTimedout", + ErrorType::ConnectionClosed => "ConnectionClosed", + ErrorType::FileOpenError => "FileOpenError", + ErrorType::FileCreateError => "FileCreateError", + ErrorType::FileReadError => "FileReadError", + ErrorType::FileWriteError => "FileWriteError", + ErrorType::HTTPStatus(_) => "HTTPStatus", + ErrorType::InternalError => "InternalError", + ErrorType::UnknownError => "UnknownError", + ErrorType::Custom(s) => s, + ErrorType::CustomCode(s, _) => s, + } + } +} + +impl Error { + /// Simply create the error. See other functions that provide less verbose interfaces. + #[inline] + pub fn create( + etype: ErrorType, + esource: ErrorSource, + context: Option<ImmutStr>, + cause: Option<Box<dyn ErrorTrait + Send + Sync>>, + ) -> BError { + let retry = if let Some(c) = cause.as_ref() { + if let Some(e) = c.downcast_ref::<BError>() { + e.retry + } else { + false.into() + } + } else { + false.into() + }; + Box::new(Error { + etype, + esource, + retry, + cause, + context, + }) + } + + #[inline] + fn do_new(e: ErrorType, s: ErrorSource) -> BError { + Self::create(e, s, None, None) + } + + /// Create an error with the given type + #[inline] + pub fn new(e: ErrorType) -> BError { + Self::do_new(e, ErrorSource::Unset) + } + + /// Create an error with the given type, a context string and the causing error. + /// This method is usually used when the error is caused by another error. + /// ``` + /// use pingora_error::{Error, ErrorType, Result}; + /// + /// fn b() -> Result<()> { + /// // ... + /// Ok(()) + /// } + /// fn do_something() -> Result<()> { + /// // a()?; + /// b().map_err(|e| Error::because(ErrorType::InternalError, "b failed after a", e)) + /// } + /// ``` + /// Choose carefully between simply surfacing the causing error versus Because() here. + /// Only use Because() when there is extra context that is not captured by + /// the causing error itself.
+ #[inline] + pub fn because<S: Into<ImmutStr>, E: Into<Box<dyn ErrorTrait + Send + Sync>>>( + e: ErrorType, + context: S, + cause: E, + ) -> BError { + Self::create( + e, + ErrorSource::Unset, + Some(context.into()), + Some(cause.into()), + ) + } + + /// Short for Err(Self::because) + #[inline] + pub fn e_because<T, S: Into<ImmutStr>, E: Into<Box<dyn ErrorTrait + Send + Sync>>>( + e: ErrorType, + context: S, + cause: E, + ) -> Result<T> { + Err(Self::because(e, context, cause)) + } + + /// Create an error with context but no direct causing error + #[inline] + pub fn explain<S: Into<ImmutStr>>(e: ErrorType, context: S) -> BError { + Self::create(e, ErrorSource::Unset, Some(context.into()), None) + } + + /// Short for Err(Self::explain) + #[inline] + pub fn e_explain<T, S: Into<ImmutStr>>(e: ErrorType, context: S) -> Result<T> { + Err(Self::explain(e, context)) + } + + /// The new_{up, down, in} functions are to create new errors with source + /// {upstream, downstream, internal} + #[inline] + pub fn new_up(e: ErrorType) -> BError { + Self::do_new(e, ErrorSource::Upstream) + } + + #[inline] + pub fn new_down(e: ErrorType) -> BError { + Self::do_new(e, ErrorSource::Downstream) + } + + #[inline] + pub fn new_in(e: ErrorType) -> BError { + Self::do_new(e, ErrorSource::Internal) + } + + /// Create a new custom error with the static string + #[inline] + pub fn new_str(s: &'static str) -> BError { + Self::do_new(ErrorType::Custom(s), ErrorSource::Unset) + } + + // the err_* functions are the same as new_* but return a Result<T> + #[inline] + pub fn err<T>(e: ErrorType) -> Result<T> { + Err(Self::new(e)) + } + + #[inline] + pub fn err_up<T>(e: ErrorType) -> Result<T> { + Err(Self::new_up(e)) + } + + #[inline] + pub fn err_down<T>(e: ErrorType) -> Result<T> { + Err(Self::new_down(e)) + } + + #[inline] + pub fn err_in<T>(e: ErrorType) -> Result<T> { + Err(Self::new_in(e)) + } + + pub fn etype(&self) -> &ErrorType { + &self.etype + } + + pub fn esource(&self) -> &ErrorSource { + &self.esource + } + + pub fn retry(&self) -> bool { + self.retry.retry() + } + + pub fn set_retry(&mut self, retry: bool) { + self.retry = retry.into(); + } + + pub fn reason_str(&self) -> &str { + self.etype.as_str() + } + + pub fn source_str(&self) -> &str { + self.esource.as_str() + } + + /// The as_{up, down, in} functions are to change the current errors with source + /// {upstream, downstream, internal} + pub fn as_up(&mut self) { + self.esource = ErrorSource::Upstream; + } + + pub fn as_down(&mut self) { + self.esource = ErrorSource::Downstream; + } + + pub fn as_in(&mut self) { + self.esource = ErrorSource::Internal; + } + + /// The into_{up, down, in} are the same as as_* but takes `self` and also return `self` + pub fn into_up(mut self: BError) -> BError { + self.as_up(); + self + } + + pub fn into_down(mut self: BError) -> BError { + self.as_down(); + self + } + + pub fn into_in(mut self: BError) -> BError { + self.as_in(); + self + } + + pub fn into_err<T>(self: BError) -> Result<T> { + Err(self) + } + + pub fn set_cause<C: Into<Box<dyn ErrorTrait + Send + Sync>>>(&mut self, cause: C) { + self.cause = Some(cause.into()); + } + + pub fn set_context<T: Into<ImmutStr>>(&mut self, context: T) { + self.context = Some(context.into()); + } + + /// Create a new error from self, with the same type and source and put self as the cause + /// ``` + /// use pingora_error::Result; + /// + /// fn b() -> Result<()> { + /// // ... 
+ /// Ok(()) + /// } + /// + /// fn do_something() -> Result<()> { + /// // a()?; + /// b().map_err(|e| e.more_context("b failed after a")) + /// } + /// ``` + /// This function is less verbose than `Because`. But it only works for [Error], while + /// `Because` works for all error types that implement the [std::error::Error] trait. + pub fn more_context<T: Into<ImmutStr>>(self: BError, context: T) -> BError { + let esource = self.esource.clone(); + let retry = self.retry; + let mut e = Self::because(self.etype.clone(), context, self); + e.esource = esource; + e.retry = retry; + e + } + + // Display error but skip the duplicate elements from the error in the previous hop + fn chain_display(&self, previous: Option<&Error>, f: &mut fmt::Formatter<'_>) -> fmt::Result { + if previous.map(|p| p.esource != self.esource).unwrap_or(true) { + write!(f, "{}", self.esource.as_str())? + } + if previous.map(|p| p.etype != self.etype).unwrap_or(true) { + write!(f, " {}", self.etype.as_str())? + } + + if let Some(c) = self.context.as_ref() { + write!(f, " context: {}", c)?; + } + if let Some(c) = self.cause.as_ref() { + if let Some(e) = c.downcast_ref::<BError>() { + write!(f, " cause: ")?; + e.chain_display(Some(self), f) + } else { + write!(f, " cause: {}", c) + } + } else { + Ok(()) + } + } + + /// Return the ErrorType of the root Error + pub fn root_etype(&self) -> &ErrorType { + self.cause.as_ref().map_or(&self.etype, |c| { + // Stop the recursion if the cause is not Error + c.downcast_ref::<BError>() + .map_or(&self.etype, |e| e.root_etype()) + }) + } + + pub fn root_cause(&self) -> &(dyn ErrorTrait + Send + Sync + 'static) { + self.cause.as_deref().map_or(self, |c| { + c.downcast_ref::<BError>().map_or(c, |e| e.root_cause()) + }) + } +} + +impl fmt::Display for Error { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + self.chain_display(None, f) + } +} + +impl ErrorTrait for Error {} + +/// Helper trait to add more context to a given error +pub trait Context<T> { + /// Wrap the `Err(E)` in [Result] with more context; the existing E will be the cause. + /// + /// This is a shortcut for map_err() + more_context() + fn err_context<C: Into<ImmutStr>, F: FnOnce() -> C>(self, context: F) -> Result<T, BError>; +} + +impl<T> Context<T> for Result<T, BError> { + fn err_context<C: Into<ImmutStr>, F: FnOnce() -> C>(self, context: F) -> Result<T, BError> { + self.map_err(|e| e.more_context(context())) + } +} + +/// Helper trait to chain errors with context +pub trait OrErr<T, E> { + /// Wrap the E in [Result] with a new [ErrorType] and context; the existing E will be the cause. + /// + /// This is a shortcut for map_err() + because() + fn or_err(self, et: ErrorType, context: &'static str) -> Result<T, BError> + where + E: Into<Box<dyn ErrorTrait + Send + Sync>>; + + /// Similar to or_err(), but takes a closure, which is useful for constructing a String. + fn or_err_with<C: Into<ImmutStr>, F: FnOnce() -> C>( + self, + et: ErrorType, + context: F, + ) -> Result<T, BError> + where + E: Into<Box<dyn ErrorTrait + Send + Sync>>; + + /// Replace the E in [Result] with a new [Error] generated from the current error + /// + /// This is useful when the current error cannot move out of scope. This is a shortcut for map_err() + explain().
+ fn explain_err<C: Into<ImmutStr>, F: FnOnce(E) -> C>( + self, + et: ErrorType, + context: F, + ) -> Result<T, BError>; +} + +impl<T, E> OrErr<T, E> for Result<T, E> { + fn or_err(self, et: ErrorType, context: &'static str) -> Result<T, BError> + where + E: Into<Box<dyn ErrorTrait + Send + Sync>>, + { + self.map_err(|e| Error::because(et, context, e)) + } + + fn or_err_with<C: Into<ImmutStr>, F: FnOnce() -> C>( + self, + et: ErrorType, + context: F, + ) -> Result<T, BError> + where + E: Into<Box<dyn ErrorTrait + Send + Sync>>, + { + self.map_err(|e| Error::because(et, context(), e)) + } + + fn explain_err<C: Into<ImmutStr>, F: FnOnce(E) -> C>( + self, + et: ErrorType, + exp: F, + ) -> Result<T, BError> { + self.map_err(|e| Error::explain(et, exp(e))) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_chain_of_error() { + let e1 = Error::new(ErrorType::InternalError); + let mut e2 = Error::new(ErrorType::HTTPStatus(400)); + e2.set_cause(e1); + assert_eq!(format!("{}", e2), " HTTPStatus cause: InternalError"); + assert_eq!(e2.root_etype().as_str(), "InternalError"); + + let e3 = Error::new(ErrorType::InternalError); + let e4 = Error::because(ErrorType::HTTPStatus(400), "test", e3); + assert_eq!( + format!("{}", e4), + " HTTPStatus context: test cause: InternalError" + ); + assert_eq!(e4.root_etype().as_str(), "InternalError"); + } + + #[test] + fn test_error_context() { + let mut e1 = Error::new(ErrorType::InternalError); + e1.set_context(format!("{} {}", "my", "context")); + assert_eq!(format!("{}", e1), " InternalError context: my context"); + } + + #[test] + fn test_context_trait() { + let e1: Result<(), BError> = Err(Error::new(ErrorType::InternalError)); + let e2 = e1.err_context(|| "another"); + assert_eq!( + format!("{}", e2.unwrap_err()), + " InternalError context: another cause: " + ); + } + + #[test] + fn test_cause_trait() { + let e1: Result<(), BError> = Err(Error::new(ErrorType::InternalError)); + let e2 = e1.or_err(ErrorType::HTTPStatus(400), "another"); + assert_eq!( + format!("{}", e2.unwrap_err()), + " HTTPStatus context: another cause: InternalError" + ); + } +} diff --git a/pingora-header-serde/Cargo.toml b/pingora-header-serde/Cargo.toml new file mode 100644 index 0000000..2968cae --- /dev/null +++ b/pingora-header-serde/Cargo.toml @@ -0,0 +1,32 @@ +[package] +name = "pingora-header-serde" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["compression"] +keywords = ["http", "compression", "pingora"] +exclude = ["samples/*"] +description = """ +HTTP header (de)serialization and compression for Pingora. 
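[Editor's note] Taken together, the helpers in this crate compose much like the tests above show: wrap a foreign error with `or_err`, stack further context with `err_context`, then inspect the chain. A minimal, hedged sketch (the function names and config path are ours, for illustration):

use pingora_error::{Context, ErrorType, OrErr, Result, RetryType};

fn read_config() -> Result<String> {
    std::fs::read_to_string("/nonexistent/conf.yaml")
        // the io::Error becomes the cause; FileOpenError is the pingora type
        .or_err(ErrorType::FileOpenError, "while loading config")
}

fn startup() -> Result<String> {
    // add one more layer of context onto the boxed Error
    read_config().err_context(|| "during startup")
}

fn main() {
    let e = startup().unwrap_err();
    println!("{}", e); // Display walks the source/type/context/cause chain
    assert_eq!(e.root_etype().as_str(), "FileOpenError");
    assert!(!e.retry()); // foreign causes default to Decided(false)

    // ReusedOnly defers the retry decision until connection reuse is known
    let mut retry = RetryType::ReusedOnly;
    retry.decide_reuse(true);
    assert!(retry.retry());
}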
+""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_header_serde" +path = "src/lib.rs" + +[[bin]] +name = "trainer" +path = "src/trainer.rs" + +[dependencies] +zstd = "0.9.0" +zstd-safe = "4.1.1" +http = { workspace = true } +bytes = { workspace = true } +httparse = { workspace = true } +pingora-error = { version = "0.1.0", path = "../pingora-error" } +pingora-http = { version = "0.1.0", path = "../pingora-http" } +thread_local = "1.0" diff --git a/pingora-header-serde/LICENSE b/pingora-header-serde/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-header-serde/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. 
The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora-header-serde/samples/test/1 b/pingora-header-serde/samples/test/1 new file mode 100644 index 0000000..9d4c680 --- /dev/null +++ b/pingora-header-serde/samples/test/1 @@ -0,0 +1,15 @@ +HTTP/1.1 200 OK +Server: nginx +Date: Wed, 22 Dec 2021 06:30:29 GMT +Content-Type: application/javascript +Last-Modified: Mon, 29 Nov 2021 10:13:32 GMT +Transfer-Encoding: chunked +Connection: keep-alive +Vary: Accept-Encoding +ETag: W/"61a4a7cc-21df8" +Access-Control-Allow-Origin: * +Access-Control-Allow-Credentials: true +Access-Control-Expose-Headers: Content-Length,Content-Range +Access-Control-Allow-Headers: Range +Content-Encoding: gzip + diff --git a/pingora-header-serde/samples/test/2 b/pingora-header-serde/samples/test/2 new file mode 100644 index 0000000..5cd4026 --- /dev/null +++ b/pingora-header-serde/samples/test/2 @@ -0,0 +1,15 @@ +HTTP/1.1 200 OK +Server: nginx +Date: Thu, 23 Dec 2021 15:12:32 GMT +Content-Type: application/javascript +Last-Modified: Mon, 09 Sep 2019 12:47:14 GMT +Transfer-Encoding: chunked +Connection: keep-alive +Vary: Accept-Encoding +ETag: W/"5d7649d2-16ec64" +Access-Control-Allow-Origin: * +Access-Control-Allow-Credentials: true +Access-Control-Expose-Headers: Content-Length,Content-Range +Access-Control-Allow-Headers: Range +Content-Encoding: gzip + diff --git a/pingora-header-serde/samples/test/3 b/pingora-header-serde/samples/test/3 new file mode 100644 index 0000000..b02aadd --- /dev/null +++ b/pingora-header-serde/samples/test/3 @@ -0,0 +1,15 @@ +HTTP/1.1 200 OK +Server: nginx +Date: Wed, 22 Dec 2021 12:29:00 GMT +Content-Type: application/javascript +Last-Modified: Mon, 09 Sep 2019 07:47:37 GMT +Transfer-Encoding: chunked +Connection: keep-alive +Vary: Accept-Encoding +ETag: W/"5d760399-52868" +Access-Control-Allow-Origin: * +Access-Control-Allow-Credentials: true +Access-Control-Expose-Headers: Content-Length,Content-Range +Access-Control-Allow-Headers: Range +Content-Encoding: gzip + diff --git a/pingora-header-serde/samples/test/4 b/pingora-header-serde/samples/test/4 new file mode 100644 index 0000000..8215d6e --- /dev/null +++ b/pingora-header-serde/samples/test/4 @@ -0,0 +1,15 @@ +HTTP/1.1 200 OK +Server: nginx +Date: Wed, 22 Dec 2021 06:11:09 GMT +Content-Type: application/javascript +Last-Modified: Mon, 20 Dec 2021 01:23:10 GMT +Transfer-Encoding: chunked +Connection: keep-alive +Vary: 
Accept-Encoding +ETag: W/"61bfdafe-21bc4" +Access-Control-Allow-Origin: * +Access-Control-Allow-Credentials: true +Access-Control-Expose-Headers: Content-Length,Content-Range +Access-Control-Allow-Headers: Range +Content-Encoding: gzip + diff --git a/pingora-header-serde/samples/test/5 b/pingora-header-serde/samples/test/5 new file mode 100644 index 0000000..4bae598 --- /dev/null +++ b/pingora-header-serde/samples/test/5 @@ -0,0 +1,15 @@ +HTTP/1.1 200 OK +Server: nginx +Date: Thu, 23 Dec 2021 15:23:29 GMT +Content-Type: application/javascript +Last-Modified: Sat, 09 Oct 2021 23:41:34 GMT +Transfer-Encoding: chunked +Connection: keep-alive +Vary: Accept-Encoding +ETag: W/"616228ae-52054" +Access-Control-Allow-Origin: * +Access-Control-Allow-Credentials: true +Access-Control-Expose-Headers: Content-Length,Content-Range +Access-Control-Allow-Headers: Range +Content-Encoding: gzip + diff --git a/pingora-header-serde/samples/test/6 b/pingora-header-serde/samples/test/6 new file mode 100644 index 0000000..9d4c680 --- /dev/null +++ b/pingora-header-serde/samples/test/6 @@ -0,0 +1,15 @@ +HTTP/1.1 200 OK +Server: nginx +Date: Wed, 22 Dec 2021 06:30:29 GMT +Content-Type: application/javascript +Last-Modified: Mon, 29 Nov 2021 10:13:32 GMT +Transfer-Encoding: chunked +Connection: keep-alive +Vary: Accept-Encoding +ETag: W/"61a4a7cc-21df8" +Access-Control-Allow-Origin: * +Access-Control-Allow-Credentials: true +Access-Control-Expose-Headers: Content-Length,Content-Range +Access-Control-Allow-Headers: Range +Content-Encoding: gzip + diff --git a/pingora-header-serde/samples/test/7 b/pingora-header-serde/samples/test/7 new file mode 100644 index 0000000..b57e5c0 --- /dev/null +++ b/pingora-header-serde/samples/test/7 @@ -0,0 +1,14 @@ +HTTP/1.1 200 OK +server: nginx +date: Sat, 25 Dec 2021 03:05:35 GMT +content-type: application/javascript +last-modified: Fri, 24 Dec 2021 04:20:01 GMT +transfer-encoding: chunked +connection: keep-alive +vary: Accept-Encoding +etag: W/"61c54a71-2d590" +access-control-allow-origin: * +access-control-allow-credentials: true +access-control-expose-headers: Content-Length,Content-Range +access-control-allow-headers: Range +content-encoding: gzip diff --git a/pingora-header-serde/src/dict.rs b/pingora-header-serde/src/dict.rs new file mode 100644 index 0000000..bc50ada --- /dev/null +++ b/pingora-header-serde/src/dict.rs @@ -0,0 +1,88 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Training to generate the zstd dictionary. 
+ +use std::fs; +use zstd::dict; + +/// Train the zstd dictionary from all the files under the given `dir_path` +/// +/// The output will be the trained dictionary +pub fn train<P: AsRef<std::path::Path>>(dir_path: P) -> Vec<u8> { + // TODO: check that each entry is a file; it could be a dir + let files = fs::read_dir(dir_path) + .unwrap() + .filter_map(|entry| entry.ok().map(|f| f.path())); + dict::from_files(files, 64 * 1024 * 1024).unwrap() +} + +#[cfg(test)] +mod test { + use super::*; + use crate::resp_header_to_buf; + use pingora_http::ResponseHeader; + + fn gen_test_dict() -> Vec<u8> { + let mut path = std::path::PathBuf::from(env!("CARGO_MANIFEST_DIR")); + path.push("samples/test"); + train(path) + } + + fn gen_test_header() -> ResponseHeader { + let mut header = ResponseHeader::build(200, None).unwrap(); + header + .append_header("Date", "Thu, 23 Dec 2021 11:23:29 GMT") + .unwrap(); + header + .append_header("Last-Modified", "Sat, 09 Oct 2021 22:41:34 GMT") + .unwrap(); + header.append_header("Connection", "keep-alive").unwrap(); + header.append_header("Vary", "Accept-encoding").unwrap(); + header.append_header("Content-Encoding", "gzip").unwrap(); + header + .append_header("Access-Control-Allow-Origin", "*") + .unwrap(); + header + } + + #[test] + fn test_ser_with_dict() { + let dict = gen_test_dict(); + let serde = crate::HeaderSerde::new(Some(dict)); + let serde_no_dict = crate::HeaderSerde::new(None); + let header = gen_test_header(); + + let compressed = serde.serialize(&header).unwrap(); + let compressed_no_dict = serde_no_dict.serialize(&header).unwrap(); + let mut buf = vec![]; + let uncompressed = resp_header_to_buf(&header, &mut buf); + + assert!(compressed.len() < uncompressed); + assert!(compressed.len() < compressed_no_dict.len()); + } + + #[test] + fn test_ser_de_with_dict() { + let dict = gen_test_dict(); + let serde = crate::HeaderSerde::new(Some(dict)); + let header = gen_test_header(); + + let compressed = serde.serialize(&header).unwrap(); + let header2 = serde.deserialize(&compressed).unwrap(); + + assert_eq!(header.status, header2.status); + assert_eq!(header.headers, header2.headers); + } +} diff --git a/pingora-header-serde/src/lib.rs b/pingora-header-serde/src/lib.rs new file mode 100644 index 0000000..73b9b29 --- /dev/null +++ b/pingora-header-serde/src/lib.rs @@ -0,0 +1,203 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! HTTP Response header serialization with compression +//! +//! This crate is able to serialize an HTTP response header to about 1/3 of its original size (HTTP/1.1 wire format) +//! with a trained dictionary.
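[Editor's note] For callers outside the test module, `train` is the public entry point for offline dictionary generation. A hedged sketch (the sample path mirrors the crate's own tests and the output location is an assumption about your layout):

use pingora_header_serde::dict::train;

fn main() {
    // every file under the directory is one training sample, e.g. the
    // captured response headers in samples/test/1..7 shown above
    let dict = train("pingora-header-serde/samples/test");
    println!("trained zstd dictionary: {} bytes", dict.len());
    // persist it however you like; HeaderSerde::new(Some(dict)) consumes it
    std::fs::write("/tmp/header.dict", &dict).unwrap();
}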
+ +#![warn(clippy::all)] +#![allow(clippy::new_without_default)] +#![allow(clippy::type_complexity)] + +pub mod dict; +mod thread_zstd; + +use bytes::BufMut; +use http::Version; +use pingora_error::{Error, ErrorType, Result}; +use pingora_http::ResponseHeader; +use std::cell::RefCell; +use std::ops::DerefMut; +use thread_local::ThreadLocal; + +/// HTTP Response header serialization +/// +/// This struct provides the APIs to convert an HTTP response header into a compressed wire format +/// for storage. +pub struct HeaderSerde { + compression: thread_zstd::Compression, + level: i32, + // internal buffer for uncompressed data to be compressed and vice versa + buf: ThreadLocal<RefCell<Vec<u8>>>, +} + +const MAX_HEADER_SIZE: usize = 64 * 1024; +const COMPRESS_LEVEL: i32 = 3; + +impl HeaderSerde { + /// Create a new [HeaderSerde] + /// + /// An optional zstd compression dictionary can be provided to improve the compression ratio + /// and speed. See [dict] for more details. + pub fn new(dict: Option<Vec<u8>>) -> Self { + if let Some(dict) = dict { + HeaderSerde { + compression: thread_zstd::Compression::with_dict(dict), + level: COMPRESS_LEVEL, + buf: ThreadLocal::new(), + } + } else { + HeaderSerde { + compression: thread_zstd::Compression::new(), + level: COMPRESS_LEVEL, + buf: ThreadLocal::new(), + } + } + } + + /// Serialize the given response header + pub fn serialize(&self, header: &ResponseHeader) -> Result<Vec<u8>> { + // for now we use HTTP 1.1 wire format for that + // TODO: should convert to h1 if the incoming header is for h2 + let mut buf = self + .buf + .get_or(|| RefCell::new(Vec::with_capacity(MAX_HEADER_SIZE))) + .borrow_mut(); + buf.clear(); // reset the buf + resp_header_to_buf(header, &mut buf); + self.compression + .compress(&buf, self.level) + .map_err(|e| into_error(e, "compress header")) + } + + /// Deserialize the given response header + pub fn deserialize(&self, data: &[u8]) -> Result<ResponseHeader> { + let mut buf = self + .buf + .get_or(|| RefCell::new(Vec::with_capacity(MAX_HEADER_SIZE))) + .borrow_mut(); + buf.clear(); // reset the buf + self.compression + .decompress_to_buffer(data, buf.deref_mut()) + .map_err(|e| into_error(e, "decompress header"))?; + buf_to_http_header(&buf) + } +} + +#[inline] +fn into_error(e: &'static str, context: &'static str) -> Box<Error> { + Error::because(ErrorType::InternalError, context, e) +} + +const CRLF: &[u8; 2] = b"\r\n"; + +// Borrowed from pingora http1 +#[inline] +fn resp_header_to_buf(resp: &ResponseHeader, buf: &mut Vec<u8>) -> usize { + // Status-Line + let version = match resp.version { + Version::HTTP_10 => "HTTP/1.0 ", + Version::HTTP_11 => "HTTP/1.1 ", + _ => "HTTP/1.1 ", // store everything else (including h2) in http 1.1 format + }; + buf.put_slice(version.as_bytes()); + let status = resp.status; + buf.put_slice(status.as_str().as_bytes()); + buf.put_u8(b' '); + let reason = status.canonical_reason(); + if let Some(reason_buf) = reason { + buf.put_slice(reason_buf.as_bytes()); + } + buf.put_slice(CRLF); + + // headers + resp.header_to_h1_wire(buf); + + buf.put_slice(CRLF); + + buf.len() +} + +// Should match pingora http1 setting +const MAX_HEADERS: usize = 160; + +#[inline] +fn buf_to_http_header(buf: &[u8]) -> Result<ResponseHeader> { + let mut headers = vec![httparse::EMPTY_HEADER; MAX_HEADERS]; + let mut resp = httparse::Response::new(&mut headers); + + match resp.parse(buf) { + Ok(s) => match s { + httparse::Status::Complete(_size) => parsed_to_header(&resp), + // we always feed the buf that contains the entire
header to parse + _ => Error::e_explain(ErrorType::InternalError, "incomplete uncompressed header"), + }, + Err(e) => Error::e_because( + ErrorType::InternalError, + format!( + "parsing failed on uncompressed header, {}", + String::from_utf8_lossy(buf) + ), + e, + ), + } +} + +#[inline] +fn parsed_to_header(parsed: &httparse::Response) -> Result<ResponseHeader> { + // code should always be there + let mut resp = ResponseHeader::build(parsed.code.unwrap(), Some(parsed.headers.len()))?; + + for header in parsed.headers.iter() { + resp.append_header(header.name.to_string(), header.value)?; + } + + Ok(resp) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_ser_wo_dict() { + let serde = HeaderSerde::new(None); + let mut header = ResponseHeader::build(200, None).unwrap(); + header.append_header("foo", "bar").unwrap(); + header.append_header("foo", "barbar").unwrap(); + header.append_header("foo", "barbarbar").unwrap(); + header.append_header("Server", "Pingora").unwrap(); + + let compressed = serde.serialize(&header).unwrap(); + let mut buf = vec![]; + let uncompressed = resp_header_to_buf(&header, &mut buf); + assert!(compressed.len() < uncompressed); + } + + #[test] + fn test_ser_de_no_dict() { + let serde = HeaderSerde::new(None); + let mut header = ResponseHeader::build(200, None).unwrap(); + header.append_header("foo1", "bar1").unwrap(); + header.append_header("foo2", "barbar2").unwrap(); + header.append_header("foo3", "barbarbar3").unwrap(); + header.append_header("Server", "Pingora").unwrap(); + + let compressed = serde.serialize(&header).unwrap(); + let header2 = serde.deserialize(&compressed).unwrap(); + assert_eq!(header.status, header2.status); + assert_eq!(header.headers, header2.headers); + } +} diff --git a/pingora-header-serde/src/thread_zstd.rs b/pingora-header-serde/src/thread_zstd.rs new file mode 100644 index 0000000..5c6406e --- /dev/null +++ b/pingora-header-serde/src/thread_zstd.rs @@ -0,0 +1,79 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
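[Editor's note] End to end, the public API of this crate is just `HeaderSerde::new` plus `serialize`/`deserialize`. A hedged usage sketch mirroring the crate's own tests above:

use pingora_header_serde::HeaderSerde;
use pingora_http::ResponseHeader;

fn main() {
    // pass Some(dict) from dict::train() for a better compression ratio
    let serde = HeaderSerde::new(None);

    let mut header = ResponseHeader::build(200, None).unwrap();
    header.append_header("Content-Type", "text/html").unwrap();
    header.append_header("Server", "Pingora").unwrap();

    // compressed HTTP/1.1 wire format, suitable for cache storage
    let bytes = serde.serialize(&header).unwrap();
    let restored = serde.deserialize(&bytes).unwrap();

    assert_eq!(restored.status, header.status);
    assert_eq!(restored.headers, header.headers);
}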
+ +use std::cell::RefCell; +use thread_local::ThreadLocal; + +/// Each thread owns its own compression and decompression CTXes, and they share a single dict; +/// https://facebook.github.io/zstd/zstd_manual.html recommends reusing the ctx per thread +#[derive(Default)] +pub struct Compression { + com_context: ThreadLocal<RefCell<zstd_safe::CCtx<'static>>>, + de_context: ThreadLocal<RefCell<zstd_safe::DCtx<'static>>>, + dict: Vec<u8>, +} + +// this code is inspired by the zstd crate + +impl Compression { + pub fn new() -> Self { + Compression { + com_context: ThreadLocal::new(), + de_context: ThreadLocal::new(), + dict: vec![], + } + } + pub fn with_dict(dict: Vec<u8>) -> Self { + Compression { + com_context: ThreadLocal::new(), + de_context: ThreadLocal::new(), + dict, + } + } + + pub fn compress_to_buffer<C: zstd_safe::WriteBuf + ?Sized>( + &self, + source: &[u8], + destination: &mut C, + level: i32, + ) -> Result<usize, &'static str> { + self.com_context + .get_or(|| RefCell::new(zstd_safe::create_cctx())) + .borrow_mut() + .compress_using_dict(destination, source, &self.dict[..], level) + .map_err(zstd_safe::get_error_name) + } + + pub fn compress(&self, data: &[u8], level: i32) -> Result<Vec<u8>, &'static str> { + let buffer_len = zstd_safe::compress_bound(data.len()); + let mut buffer = Vec::with_capacity(buffer_len); + + self.compress_to_buffer(data, &mut buffer, level)?; + + Ok(buffer) + } + + pub fn decompress_to_buffer<C: zstd_safe::WriteBuf + ?Sized>( + &self, + source: &[u8], + destination: &mut C, + ) -> Result<usize, &'static str> { + self.de_context + .get_or(|| RefCell::new(zstd_safe::create_dctx())) + .borrow_mut() + .decompress_using_dict(destination, source, &self.dict) + .map_err(zstd_safe::get_error_name) + } +} diff --git a/pingora-header-serde/src/trainer.rs b/pingora-header-serde/src/trainer.rs new file mode 100644 index 0000000..36308e5 --- /dev/null +++ b/pingora-header-serde/src/trainer.rs @@ -0,0 +1,23 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use pingora_header_serde::dict::train; +use std::env; +use std::io::{self, Write}; + +pub fn main() { + let args: Vec<String> = env::args().collect(); + let dict = train(&args[1]); + io::stdout().write_all(&dict).unwrap(); +} diff --git a/pingora-http/Cargo.toml b/pingora-http/Cargo.toml new file mode 100644 index 0000000..6cf7f97 --- /dev/null +++ b/pingora-http/Cargo.toml @@ -0,0 +1,26 @@ +[package] +name = "pingora-http" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["web-programming"] +keywords = ["http", "pingora"] +description = """ +HTTP request and response header types for Pingora.
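[Editor's note] Since lib.rs declares `mod thread_zstd` (not `pub mod`), Compression is only reachable from inside the crate; the trainer binary above is its offline companion, taking a sample directory as its first argument and writing the trained dictionary to stdout. A hedged, crate-internal sketch of the round trip that HeaderSerde performs (the helper name and buffer size are ours):

// hypothetical helper inside pingora-header-serde, e.g. in lib.rs
use crate::thread_zstd::Compression;

fn zstd_round_trip() -> Result<(), &'static str> {
    let c = Compression::new(); // or Compression::with_dict(dict)
    let data = b"HTTP/1.1 200 OK\r\nServer: Pingora\r\n\r\n";

    let compressed = c.compress(data, 3)?;

    // decompress_to_buffer writes into the Vec's spare capacity,
    // so reserve at least the expected decompressed size up front
    let mut out: Vec<u8> = Vec::with_capacity(64 * 1024);
    let n = c.decompress_to_buffer(&compressed, &mut out)?;
    assert_eq!(n, data.len());
    assert_eq!(&out[..], &data[..]);
    Ok(())
}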
+""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_http" +path = "src/lib.rs" + +[dependencies] +http = { workspace = true } +bytes = { workspace = true } +pingora-error = { version = "0.1.0", path = "../pingora-error" } + +[features] +default = [] +patched_http1 = [] diff --git a/pingora-http/LICENSE b/pingora-http/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-http/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. 
The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora-http/src/case_header_name.rs b/pingora-http/src/case_header_name.rs new file mode 100644 index 0000000..4b6c133 --- /dev/null +++ b/pingora-http/src/case_header_name.rs @@ -0,0 +1,116 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use crate::*; +use bytes::Bytes; +use http::header; + +#[derive(Debug, Clone)] +pub struct CaseHeaderName(Bytes); + +impl CaseHeaderName { + pub fn new(name: String) -> Self { + CaseHeaderName(name.into()) + } +} + +impl CaseHeaderName { + pub fn as_slice(&self) -> &[u8] { + &self.0 + } + + pub fn from_slice(buf: &[u8]) -> Self { + CaseHeaderName(Bytes::copy_from_slice(buf)) + } +} + +/// A trait that converts into case sensitive header names. 
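+///
+/// For example (a sketch of the conversions implemented below; `"X-Custom-Id"` is an
+/// arbitrary name):
+///
+/// ```ignore
+/// let a = "X-Custom-Id".into_case_header_name(); // bytes kept exactly as written
+/// let b = http::header::SERVER.into_case_header_name(); // titled back to "Server"
+/// ```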
+pub trait IntoCaseHeaderName {
+    fn into_case_header_name(self) -> CaseHeaderName;
+}
+
+impl IntoCaseHeaderName for CaseHeaderName {
+    fn into_case_header_name(self) -> CaseHeaderName {
+        self
+    }
+}
+
+impl IntoCaseHeaderName for String {
+    fn into_case_header_name(self) -> CaseHeaderName {
+        CaseHeaderName(self.into())
+    }
+}
+
+impl IntoCaseHeaderName for &'static str {
+    fn into_case_header_name(self) -> CaseHeaderName {
+        CaseHeaderName(self.into())
+    }
+}
+
+impl IntoCaseHeaderName for HeaderName {
+    fn into_case_header_name(self) -> CaseHeaderName {
+        CaseHeaderName(titled_header_name(&self))
+    }
+}
+
+impl IntoCaseHeaderName for &HeaderName {
+    fn into_case_header_name(self) -> CaseHeaderName {
+        CaseHeaderName(titled_header_name(self))
+    }
+}
+
+impl IntoCaseHeaderName for Bytes {
+    fn into_case_header_name(self) -> CaseHeaderName {
+        CaseHeaderName(self)
+    }
+}
+
+fn titled_header_name(header_name: &HeaderName) -> Bytes {
+    titled_header_name_str(header_name).map_or_else(
+        || Bytes::copy_from_slice(header_name.as_str().as_bytes()),
+        |s| Bytes::from_static(s.as_bytes()),
+    )
+}
+
+pub(crate) fn titled_header_name_str(header_name: &HeaderName) -> Option<&'static str> {
+    Some(match *header_name {
+        header::AGE => "Age",
+        header::CACHE_CONTROL => "Cache-Control",
+        header::CONNECTION => "Connection",
+        header::CONTENT_TYPE => "Content-Type",
+        header::CONTENT_ENCODING => "Content-Encoding",
+        header::CONTENT_LENGTH => "Content-Length",
+        header::DATE => "Date",
+        header::TRANSFER_ENCODING => "Transfer-Encoding",
+        header::HOST => "Host",
+        header::SERVER => "Server",
+        // TODO: add more const headers here to map to their titled case
+        // TODO: automatically upper case the first letter?
+        _ => {
+            return None;
+        }
+    })
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_case_header_name() {
+        assert_eq!("FoO".into_case_header_name().as_slice(), b"FoO");
+        assert_eq!("FoO".to_string().into_case_header_name().as_slice(), b"FoO");
+        assert_eq!(header::SERVER.into_case_header_name().as_slice(), b"Server");
+    }
+}
diff --git a/pingora-http/src/lib.rs b/pingora-http/src/lib.rs
new file mode 100644
index 0000000..f310308
--- /dev/null
+++ b/pingora-http/src/lib.rs
@@ -0,0 +1,672 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! HTTP header objects that preserve HTTP header cases
+//!
+//! Although HTTP header names are supposed to be case-insensitive for compatibility, proxies
+//! ideally shouldn't alter the HTTP traffic, especially the headers they don't need to read.
+//!
+//! This crate provides structs and methods to preserve the headers in order to build a transparent
+//! proxy.
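+//!
+//! A short sketch of the intended use, based on the types defined in this file:
+//!
+//! ```ignore
+//! use pingora_http::RequestHeader;
+//!
+//! let mut req = RequestHeader::build("GET", b"/", None).unwrap();
+//! req.append_header("FoO", "bar").unwrap(); // the original spelling is remembered
+//! let mut buf = vec![];
+//! req.header_to_h1_wire(&mut buf); // writes b"FoO: bar\r\n", case preserved
+//! ```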
+
+#![allow(clippy::new_without_default)]
+
+use bytes::BufMut;
+use http::header::{AsHeaderName, HeaderName, HeaderValue};
+use http::request::Builder as ReqBuilder;
+use http::request::Parts as ReqParts;
+use http::response::Builder as RespBuilder;
+use http::response::Parts as RespParts;
+use http::uri::Uri;
+use pingora_error::{ErrorType::*, OrErr, Result};
+use std::convert::TryInto;
+use std::ops::Deref;
+
+pub use http::method::Method;
+pub use http::status::StatusCode;
+pub use http::version::Version;
+pub use http::HeaderMap as HMap;
+
+mod case_header_name;
+use case_header_name::CaseHeaderName;
+pub use case_header_name::IntoCaseHeaderName;
+
+pub mod prelude {
+    pub use crate::RequestHeader;
+}
+
+/* an ordered header map to store the original case of each header name
+HMap({
+    "foo": ["Foo", "foO", "FoO"]
+})
+The order in which HeaderMap iterates over its items is "arbitrary, but consistent".
+Hopefully this property makes sure that this map of header names always iterates in the
+same order as the map of header values.
+This idea is inspired by hyper @nox
+*/
+type CaseMap = HMap<CaseHeaderName>;
+
+/// The HTTP request header type.
+///
+/// This type is similar to [http::request::Parts] but preserves header name case.
+/// It also preserves the request path even if it is not UTF-8.
+///
+/// [RequestHeader] implements [Deref] for [http::request::Parts] so it can be used as one in most
+/// places.
+#[derive(Debug)]
+pub struct RequestHeader {
+    base: ReqParts,
+    header_name_map: Option<CaseMap>,
+    // store the raw path bytes only if it is invalid utf-8
+    raw_path_fallback: Vec<u8>, // can also be Box<[u8]>
+}
+
+impl AsRef<ReqParts> for RequestHeader {
+    fn as_ref(&self) -> &ReqParts {
+        &self.base
+    }
+}
+
+impl Deref for RequestHeader {
+    type Target = ReqParts;
+
+    fn deref(&self) -> &Self::Target {
+        &self.base
+    }
+}
+
+impl RequestHeader {
+    fn new_no_case(size_hint: Option<usize>) -> Self {
+        let mut base = ReqBuilder::new().body(()).unwrap().into_parts().0;
+        base.headers.reserve(http_header_map_upper_bound(size_hint));
+        RequestHeader {
+            base,
+            header_name_map: None,
+            raw_path_fallback: vec![],
+        }
+    }
+
+    /// Create a new [RequestHeader] with the given method and path.
+    ///
+    /// The `path` can be non-UTF-8.
+    pub fn build(
+        method: impl TryInto<Method>,
+        path: &[u8],
+        size_hint: Option<usize>,
+    ) -> Result<Self> {
+        let mut req = Self::build_no_case(method, path, size_hint)?;
+        req.header_name_map = Some(CaseMap::with_capacity(http_header_map_upper_bound(
+            size_hint,
+        )));
+        Ok(req)
+    }
+
+    /// Create a new [RequestHeader] with the given method and path without preserving header case.
+    ///
+    /// A [RequestHeader] created this way is more space efficient than one from [Self::build()].
+    ///
+    /// Use this method if reading from or writing to HTTP/2 sessions, where header case doesn't matter anyway.
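+    ///
+    /// A sketch of the difference (illustrative, using the constructors above and below):
+    ///
+    /// ```ignore
+    /// let case_preserving = RequestHeader::build("GET", b"/", None)?;
+    /// let case_less = RequestHeader::build_no_case("GET", b"/", None)?; // smaller, e.g. for h2
+    /// ```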
+    pub fn build_no_case(
+        method: impl TryInto<Method>,
+        path: &[u8],
+        size_hint: Option<usize>,
+    ) -> Result<Self> {
+        let mut req = Self::new_no_case(size_hint);
+        req.base.method = method
+            .try_into()
+            .explain_err(InvalidHTTPHeader, |_| "invalid method")?;
+        if let Ok(p) = std::str::from_utf8(path) {
+            let uri = Uri::builder()
+                .path_and_query(p)
+                .build()
+                .explain_err(InvalidHTTPHeader, |_| format!("invalid uri {}", p))?;
+            req.base.uri = uri;
+            // keep raw_path empty, no need to store twice
+        } else {
+            // put a valid utf-8 path into base for read-only access
+            let lossy_str = String::from_utf8_lossy(path);
+            let uri = Uri::builder()
+                .path_and_query(lossy_str.as_ref())
+                .build()
+                .explain_err(InvalidHTTPHeader, |_| format!("invalid uri {}", lossy_str))?;
+            req.base.uri = uri;
+            req.raw_path_fallback = path.to_vec();
+        }
+
+        Ok(req)
+    }
+
+    /// Append the header name and value to `self`.
+    ///
+    /// If there are already headers under the same name, a new value will be added without
+    /// any of the others being removed.
+    pub fn append_header(
+        &mut self,
+        name: impl IntoCaseHeaderName,
+        value: impl TryInto<HeaderValue>,
+    ) -> Result<bool> {
+        let header_value = value
+            .try_into()
+            .explain_err(InvalidHTTPHeader, |_| "invalid value while append")?;
+        append_header_value(
+            self.header_name_map.as_mut(),
+            &mut self.base.headers,
+            name,
+            header_value,
+        )
+    }
+
+    /// Insert the header name and value to `self`.
+    ///
+    /// Different from [Self::append_header()], this method will replace all other existing headers
+    /// under the same name (case-insensitive).
+    pub fn insert_header(
+        &mut self,
+        name: impl IntoCaseHeaderName,
+        value: impl TryInto<HeaderValue>,
+    ) -> Result<()> {
+        let header_value = value
+            .try_into()
+            .explain_err(InvalidHTTPHeader, |_| "invalid value while insert")?;
+        insert_header_value(
+            self.header_name_map.as_mut(),
+            &mut self.base.headers,
+            name,
+            header_value,
+        )
+    }
+
+    /// Remove all headers under the name
+    pub fn remove_header<'a, N: ?Sized>(&mut self, name: &'a N) -> Option<HeaderValue>
+    where
+        &'a N: 'a + AsHeaderName,
+    {
+        remove_header(self.header_name_map.as_mut(), &mut self.base.headers, name)
+    }
+
+    /// Write the header to the `buf` in HTTP/1.1 wire format.
+    ///
+    /// The header case will be preserved.
+    pub fn header_to_h1_wire(&self, buf: &mut impl BufMut) {
+        header_to_h1_wire(self.header_name_map.as_ref(), &self.base.headers, buf)
+    }
+
+    /// Set the request method
+    pub fn set_method(&mut self, method: Method) {
+        self.base.method = method;
+    }
+
+    /// Set the request URI
+    pub fn set_uri(&mut self, uri: http::Uri) {
+        self.base.uri = uri;
+    }
+
+    /// Return the request path in its raw format
+    ///
+    /// Non-UTF-8 paths are supported.
+    pub fn raw_path(&self) -> &[u8] {
+        if !self.raw_path_fallback.is_empty() {
+            &self.raw_path_fallback
+        } else {
+            // the URI should always be set
+            self.base
+                .uri
+                .path_and_query()
+                .as_ref()
+                .unwrap()
+                .as_str()
+                .as_bytes()
+        }
+    }
+
+    /// Return the file extension of the path
+    pub fn uri_file_extension(&self) -> Option<&str> {
+        // get everything after the last '.' in path
+        let (_, ext) = self
+            .uri
+            .path_and_query()
+            .and_then(|pq| pq.path().rsplit_once('.'))?;
+        Some(ext)
+    }
+
+    /// Set the HTTP version
+    pub fn set_version(&mut self, version: Version) {
+        self.base.version = version;
+    }
+
+    /// Clone `self` into [http::request::Parts].
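+    ///
+    /// This hands a plain `Parts` copy to APIs from the `http` crate. Note that the
+    /// preserved header case does not carry over, since [http::request::Parts] has no
+    /// place to store it.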
+    pub fn as_owned_parts(&self) -> ReqParts {
+        clone_req_parts(&self.base)
+    }
+}
+
+impl Clone for RequestHeader {
+    fn clone(&self) -> Self {
+        Self {
+            base: self.as_owned_parts(),
+            header_name_map: self.header_name_map.clone(),
+            raw_path_fallback: self.raw_path_fallback.clone(),
+        }
+    }
+}
+
+// The `RequestHeader` will be the no-case variant, because `ReqParts` keeps no header case
+impl From<ReqParts> for RequestHeader {
+    fn from(parts: ReqParts) -> RequestHeader {
+        Self {
+            base: parts,
+            header_name_map: None,
+            // no illegal path
+            raw_path_fallback: vec![],
+        }
+    }
+}
+
+impl From<RequestHeader> for ReqParts {
+    fn from(resp: RequestHeader) -> ReqParts {
+        resp.base
+    }
+}
+
+/// The HTTP response header type.
+///
+/// This type is similar to [http::response::Parts] but preserves header name case.
+/// [ResponseHeader] implements [Deref] for [http::response::Parts] so it can be used as one in most
+/// places.
+#[derive(Debug)]
+pub struct ResponseHeader {
+    base: RespParts,
+    // an ordered header map to store the original case of each header name
+    header_name_map: Option<CaseMap>,
+}
+
+impl AsRef<RespParts> for ResponseHeader {
+    fn as_ref(&self) -> &RespParts {
+        &self.base
+    }
+}
+
+impl Deref for ResponseHeader {
+    type Target = RespParts;
+
+    fn deref(&self) -> &Self::Target {
+        &self.base
+    }
+}
+
+impl Clone for ResponseHeader {
+    fn clone(&self) -> Self {
+        Self {
+            base: self.as_owned_parts(),
+            header_name_map: self.header_name_map.clone(),
+        }
+    }
+}
+
+// The `ResponseHeader` will be the no-case variant, because `RespParts` keeps no header case
+impl From<RespParts> for ResponseHeader {
+    fn from(parts: RespParts) -> ResponseHeader {
+        Self {
+            base: parts,
+            header_name_map: None,
+        }
+    }
+}
+
+impl From<ResponseHeader> for RespParts {
+    fn from(resp: ResponseHeader) -> RespParts {
+        resp.base
+    }
+}
+
+impl From<Box<ResponseHeader>> for Box<RespParts> {
+    fn from(resp: Box<ResponseHeader>) -> Box<RespParts> {
+        Box::new(resp.base)
+    }
+}
+
+impl ResponseHeader {
+    fn new(size_hint: Option<usize>) -> Self {
+        let mut resp_header = Self::new_no_case(size_hint);
+        resp_header.header_name_map = Some(CaseMap::with_capacity(http_header_map_upper_bound(
+            size_hint,
+        )));
+        resp_header
+    }
+
+    fn new_no_case(size_hint: Option<usize>) -> Self {
+        let mut base = RespBuilder::new().body(()).unwrap().into_parts().0;
+        base.headers.reserve(http_header_map_upper_bound(size_hint));
+        ResponseHeader {
+            base,
+            header_name_map: None,
+        }
+    }
+
+    /// Create a new [ResponseHeader] with the given status code.
+    pub fn build(code: impl TryInto<StatusCode>, size_hint: Option<usize>) -> Result<Self> {
+        let mut resp = Self::new(size_hint);
+        resp.base.status = code
+            .try_into()
+            .explain_err(InvalidHTTPHeader, |_| "invalid status")?;
+        Ok(resp)
+    }
+
+    /// Create a new [ResponseHeader] with the given status code without preserving header case.
+    ///
+    /// A [ResponseHeader] created this way is more space efficient than one from [Self::build()].
+    ///
+    /// Use this method if reading from or writing to HTTP/2 sessions, where header case doesn't matter anyway.
+    pub fn build_no_case(code: impl TryInto<StatusCode>, size_hint: Option<usize>) -> Result<Self> {
+        let mut resp = Self::new_no_case(size_hint);
+        resp.base.status = code
+            .try_into()
+            .explain_err(InvalidHTTPHeader, |_| "invalid status")?;
+        Ok(resp)
+    }
+
+    /// Append the header name and value to `self`.
+    ///
+    /// If there are already headers under the same name, a new value will be added without
+    /// any of the others being removed.
+    pub fn append_header(
+        &mut self,
+        name: impl IntoCaseHeaderName,
+        value: impl TryInto<HeaderValue>,
+    ) -> Result<bool> {
+        let header_value = value
+            .try_into()
+            .explain_err(InvalidHTTPHeader, |_| "invalid value while append")?;
+        append_header_value(
+            self.header_name_map.as_mut(),
+            &mut self.base.headers,
+            name,
+            header_value,
+        )
+    }
+
+    /// Insert the header name and value to `self`.
+    ///
+    /// Different from [Self::append_header()], this method will replace all other existing headers
+    /// under the same name (case-insensitive).
+    pub fn insert_header(
+        &mut self,
+        name: impl IntoCaseHeaderName,
+        value: impl TryInto<HeaderValue>,
+    ) -> Result<()> {
+        let header_value = value
+            .try_into()
+            .explain_err(InvalidHTTPHeader, |_| "invalid value while insert")?;
+        insert_header_value(
+            self.header_name_map.as_mut(),
+            &mut self.base.headers,
+            name,
+            header_value,
+        )
+    }
+
+    /// Remove all headers under the name
+    pub fn remove_header<'a, N: ?Sized>(&mut self, name: &'a N) -> Option<HeaderValue>
+    where
+        &'a N: 'a + AsHeaderName,
+    {
+        remove_header(self.header_name_map.as_mut(), &mut self.base.headers, name)
+    }
+
+    /// Write the header to the `buf` in HTTP/1.1 wire format.
+    ///
+    /// The header case will be preserved.
+    pub fn header_to_h1_wire(&self, buf: &mut impl BufMut) {
+        header_to_h1_wire(self.header_name_map.as_ref(), &self.base.headers, buf)
+    }
+
+    /// Set the status code
+    pub fn set_status(&mut self, status: impl TryInto<StatusCode>) -> Result<()> {
+        self.base.status = status
+            .try_into()
+            .explain_err(InvalidHTTPHeader, |_| "invalid status")?;
+        Ok(())
+    }
+
+    /// Set the HTTP version
+    pub fn set_version(&mut self, version: Version) {
+        self.base.version = version
+    }
+
+    /// Clone `self` into [http::response::Parts].
+    pub fn as_owned_parts(&self) -> RespParts {
+        clone_resp_parts(&self.base)
+    }
+}
+
+fn clone_req_parts(me: &ReqParts) -> ReqParts {
+    let mut parts = ReqBuilder::new()
+        .method(me.method.clone())
+        .uri(me.uri.clone())
+        .version(me.version)
+        .body(())
+        .unwrap()
+        .into_parts()
+        .0;
+    parts.headers = me.headers.clone();
+    parts
+}
+
+fn clone_resp_parts(me: &RespParts) -> RespParts {
+    let mut parts = RespBuilder::new()
+        .status(me.status)
+        .version(me.version)
+        .body(())
+        .unwrap()
+        .into_parts()
+        .0;
+    parts.headers = me.headers.clone();
+    parts
+}
+
+// This function returns an upper bound on the size of the header map used inside the http crate.
+// As of version 0.2, there is a limit of 1 << 15 (32,768) items inside the map. There is an
+// assertion against this size inside the crate so we want to avoid panicking by not exceeding this
+// upper bound.
+fn http_header_map_upper_bound(size_hint: Option<usize>) -> usize {
+    // Even though the crate has 1 << 15 as the max size, calls to `with_capacity` invoke a
+    // function that returns the size + size / 3.
+    //
+    // See https://github.com/hyperium/http/blob/34a9d6bdab027948d6dea3b36d994f9cbaf96f75/src/header/map.rs#L3220
+    //
+    // Therefore we set our max size to be even lower so we guarantee ourselves we won't hit that
+    // upper bound in the crate. Any way you cut it, 4,096 headers is insane.
+    const PINGORA_MAX_HEADER_COUNT: usize = 4096;
+    const INIT_HEADER_SIZE: usize = 8;
+
+    // We select the size hint or the max size here such that we pick a value substantially lower
+    // than 1 << 15, with room to grow the header map.
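+    // For example (matching the `header_map_upper_bound` unit test below):
+    // None -> 8, Some(16) -> 16, Some(7777) -> 4096.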
+    std::cmp::min(
+        size_hint.unwrap_or(INIT_HEADER_SIZE),
+        PINGORA_MAX_HEADER_COUNT,
+    )
+}
+
+#[inline]
+fn append_header_value<T>(
+    name_map: Option<&mut CaseMap>,
+    value_map: &mut HMap<T>,
+    name: impl IntoCaseHeaderName,
+    value: T,
+) -> Result<bool> {
+    let case_header_name = name.into_case_header_name();
+    let header_name: HeaderName = case_header_name
+        .as_slice()
+        .try_into()
+        .or_err(InvalidHTTPHeader, "invalid header name")?;
+    // store the original case in the map
+    if let Some(name_map) = name_map {
+        name_map.append(header_name.clone(), case_header_name);
+    }
+
+    Ok(value_map.append(header_name, value))
+}
+
+#[inline]
+fn insert_header_value<T>(
+    name_map: Option<&mut CaseMap>,
+    value_map: &mut HMap<T>,
+    name: impl IntoCaseHeaderName,
+    value: T,
+) -> Result<()> {
+    let case_header_name = name.into_case_header_name();
+    let header_name: HeaderName = case_header_name
+        .as_slice()
+        .try_into()
+        .or_err(InvalidHTTPHeader, "invalid header name")?;
+    if let Some(name_map) = name_map {
+        // store the original case in the map
+        name_map.insert(header_name.clone(), case_header_name);
+    }
+    value_map.insert(header_name, value);
+    Ok(())
+}
+
+// the &N here is to avoid clone(). Non-Copy types like String can impl AsHeaderName
+#[inline]
+fn remove_header<'a, T, N: ?Sized>(
+    name_map: Option<&mut CaseMap>,
+    value_map: &mut HMap<T>,
+    name: &'a N,
+) -> Option<T>
+where
+    &'a N: 'a + AsHeaderName,
+{
+    if let Some(name_map) = name_map {
+        name_map.remove(name);
+    }
+    value_map.remove(name)
+}
+
+#[inline]
+fn header_to_h1_wire(key_map: Option<&CaseMap>, value_map: &HMap, buf: &mut impl BufMut) {
+    const CRLF: &[u8; 2] = b"\r\n";
+    const HEADER_KV_DELIMITER: &[u8; 2] = b": ";
+
+    if let Some(key_map) = key_map {
+        let iter = key_map.iter().zip(value_map.iter());
+        for ((header, case_header), (header2, val)) in iter {
+            if header != header2 {
+                // in case the header iter order changes in future versions of HMap
+                panic!("header iter mismatch {}, {}", header, header2)
+            }
+            buf.put_slice(case_header.as_slice());
+            buf.put_slice(HEADER_KV_DELIMITER);
+            buf.put_slice(val.as_ref());
+            buf.put_slice(CRLF);
+        }
+    } else {
+        for (header, value) in value_map {
+            let titled_header =
+                case_header_name::titled_header_name_str(header).unwrap_or(header.as_str());
+            buf.put_slice(titled_header.as_bytes());
+            buf.put_slice(HEADER_KV_DELIMITER);
+            buf.put_slice(value.as_ref());
+            buf.put_slice(CRLF);
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn header_map_upper_bound() {
+        assert_eq!(8, http_header_map_upper_bound(None));
+        assert_eq!(16, http_header_map_upper_bound(Some(16)));
+        assert_eq!(4096, http_header_map_upper_bound(Some(7777)));
+    }
+
+    #[test]
+    fn test_single_header() {
+        let mut req = RequestHeader::build("GET", b"\\", None).unwrap();
+        req.insert_header("foo", "bar").unwrap();
+        req.insert_header("FoO", "Bar").unwrap();
+        let mut buf: Vec<u8> = vec![];
+        req.header_to_h1_wire(&mut buf);
+        assert_eq!(buf, b"FoO: Bar\r\n");
+
+        let mut resp = ResponseHeader::new(None);
+        req.insert_header("foo", "bar").unwrap();
+        resp.insert_header("FoO", "Bar").unwrap();
+        let mut buf: Vec<u8> = vec![];
+        resp.header_to_h1_wire(&mut buf);
+        assert_eq!(buf, b"FoO: Bar\r\n");
+    }
+
+    #[test]
+    fn test_single_header_no_case() {
+        let mut req = RequestHeader::new_no_case(None);
+        req.insert_header("foo", "bar").unwrap();
+        req.insert_header("FoO", "Bar").unwrap();
+        let mut buf: Vec<u8> = vec![];
+        req.header_to_h1_wire(&mut buf);
+        assert_eq!(buf, b"foo: 
Bar\r\n"); + + let mut resp = ResponseHeader::new_no_case(None); + req.insert_header("foo", "bar").unwrap(); + resp.insert_header("FoO", "Bar").unwrap(); + let mut buf: Vec<u8> = vec![]; + resp.header_to_h1_wire(&mut buf); + assert_eq!(buf, b"foo: Bar\r\n"); + } + + #[test] + fn test_multiple_header() { + let mut req = RequestHeader::build("GET", b"\\", None).unwrap(); + req.append_header("FoO", "Bar").unwrap(); + req.append_header("fOO", "bar").unwrap(); + req.append_header("BAZ", "baR").unwrap(); + req.append_header(http::header::CONTENT_LENGTH, "0") + .unwrap(); + req.append_header("a", "b").unwrap(); + req.remove_header("a"); + let mut buf: Vec<u8> = vec![]; + req.header_to_h1_wire(&mut buf); + assert_eq!( + buf, + b"FoO: Bar\r\nfOO: bar\r\nBAZ: baR\r\nContent-Length: 0\r\n" + ); + + let mut resp = ResponseHeader::new(None); + resp.append_header("FoO", "Bar").unwrap(); + resp.append_header("fOO", "bar").unwrap(); + resp.append_header("BAZ", "baR").unwrap(); + resp.append_header(http::header::CONTENT_LENGTH, "0") + .unwrap(); + resp.append_header("a", "b").unwrap(); + resp.remove_header("a"); + let mut buf: Vec<u8> = vec![]; + resp.header_to_h1_wire(&mut buf); + assert_eq!( + buf, + b"FoO: Bar\r\nfOO: bar\r\nBAZ: baR\r\nContent-Length: 0\r\n" + ); + } + + #[cfg(feature = "patched_http1")] + #[test] + fn test_invalid_path() { + let raw_path = b"Hello\xF0\x90\x80World"; + let req = RequestHeader::build("GET", &raw_path[..], None).unwrap(); + assert_eq!("Hello�World", req.uri.path_and_query().unwrap()); + assert_eq!(raw_path, req.raw_path()); + } +} diff --git a/pingora-ketama/Cargo.toml b/pingora-ketama/Cargo.toml new file mode 100644 index 0000000..1e467b5 --- /dev/null +++ b/pingora-ketama/Cargo.toml @@ -0,0 +1,32 @@ +[package] +name = "pingora-ketama" +version = "0.1.0" +description = "Rust port of the nginx consistent hash function" +authors = ["Pingora Team <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["caching", "algorithms"] +keywords = ["hash", "hashing", "consistent", "pingora"] + +[dependencies] +crc32fast = "1.3" + +[dev-dependencies] +criterion = "0.4" +csv = "1.2" +dhat = "0.3" +env_logger = "0.9" +log = { workspace = true } +rand = "0.8" + +[[bench]] +name = "simple" +harness = false + +[[bench]] +name = "memory" +harness = false + +[features] +heap-prof = [] diff --git a/pingora-ketama/LICENSE b/pingora-ketama/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-ketama/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. 
+ + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/pingora-ketama/benches/memory.rs b/pingora-ketama/benches/memory.rs new file mode 100644 index 0000000..ac12de4 --- /dev/null +++ b/pingora-ketama/benches/memory.rs @@ -0,0 +1,22 @@ +use pingora_ketama::{Bucket, Continuum}; + +#[global_allocator] +static ALLOC: dhat::Alloc = dhat::Alloc; + +fn buckets() -> Vec<Bucket> { + let mut b = Vec::new(); + + for i in 1..254 { + b.push(Bucket::new( + format!("127.0.0.{i}:6443").parse().unwrap(), + 10, + )); + } + + b +} + +pub fn main() { + let _profiler = dhat::Profiler::new_heap(); + let _c = Continuum::new(&buckets()); +} diff --git a/pingora-ketama/benches/simple.rs b/pingora-ketama/benches/simple.rs new file mode 100644 index 0000000..253cf33 --- /dev/null +++ b/pingora-ketama/benches/simple.rs @@ -0,0 +1,45 @@ +use pingora_ketama::{Bucket, Continuum}; + +use criterion::{criterion_group, criterion_main, Criterion}; +use rand::distributions::Alphanumeric; +use rand::{thread_rng, Rng}; + +#[cfg(feature = "heap-prof")] +#[global_allocator] +static ALLOC: dhat::Alloc = dhat::Alloc; + +fn buckets() -> Vec<Bucket> { + let mut b = Vec::new(); + + for i in 1..101 { + b.push(Bucket::new(format!("127.0.0.{i}:6443").parse().unwrap(), 1)); + } + + b +} + +fn random_string() -> String { + thread_rng() + .sample_iter(&Alphanumeric) + .take(30) + .map(char::from) + .collect() +} + +pub fn criterion_benchmark(c: &mut Criterion) { + #[cfg(feature = "heap-prof")] + let _profiler = dhat::Profiler::new_heap(); + + c.bench_function("create_continuum", |b| { + b.iter(|| Continuum::new(&buckets())) + }); + + c.bench_function("continuum_hash", |b| { + let continuum = Continuum::new(&buckets()); + + b.iter(|| continuum.node(random_string().as_bytes())) + }); +} + +criterion_group!(benches, criterion_benchmark); +criterion_main!(benches); diff --git a/pingora-ketama/examples/health_aware_selector.rs b/pingora-ketama/examples/health_aware_selector.rs new file mode 100644 index 0000000..938c7d7 --- /dev/null +++ b/pingora-ketama/examples/health_aware_selector.rs @@ -0,0 +1,94 @@ +use log::info; +use pingora_ketama::{Bucket, Continuum}; +use std::collections::HashMap; +use std::net::SocketAddr; + +// A repository for node healthiness, emulating a health checker. +struct NodeHealthRepository { + nodes: HashMap<SocketAddr, bool>, +} + +impl NodeHealthRepository { + fn new() -> Self { + NodeHealthRepository { + nodes: HashMap::new(), + } + } + + fn set_node_health(&mut self, node: SocketAddr, is_healthy: bool) { + self.nodes.insert(node, is_healthy); + } + + fn node_is_healthy(&self, node: &SocketAddr) -> bool { + self.nodes.get(node).cloned().unwrap_or(false) + } +} + +// A health-aware node selector, which relies on the above health repository. +struct HealthAwareNodeSelector<'a> { + ring: Continuum, + max_tries: usize, + node_health_repo: &'a NodeHealthRepository, +} + +impl<'a> HealthAwareNodeSelector<'a> { + fn new(r: Continuum, tries: usize, nhr: &NodeHealthRepository) -> HealthAwareNodeSelector { + HealthAwareNodeSelector { + ring: r, + max_tries: tries, + node_health_repo: nhr, + } + } + + // Try to select a node within <max_tries> attempts. 
+ fn try_select(&self, key: &str) -> Option<SocketAddr> { + let node_iter = self.ring.node_iter(key.as_bytes()); + + for (tries, node) in node_iter.enumerate() { + if tries >= self.max_tries { + break; + } + + if self.node_health_repo.node_is_healthy(node) { + return Some(*node); + } + } + + None + } +} + +// RUST_LOG=INFO cargo run --example health_aware_selector +fn main() { + env_logger::init(); + + // Set up some nodes. + let buckets: Vec<_> = (1..=10) + .map(|i| Bucket::new(format!("127.0.0.{i}:6443").parse().unwrap(), 1)) + .collect(); + + // Mark the 1-5th nodes healthy, the 6-10th nodes unhealthy. + let mut health_repo = NodeHealthRepository::new(); + (1..=10) + .map(|i| (i, format!("127.0.0.{i}:6443").parse().unwrap())) + .for_each(|(i, n)| { + health_repo.set_node_health(n, i < 6); + }); + + // Create a health-aware selector with up to 3 tries. + let health_aware_selector = + HealthAwareNodeSelector::new(Continuum::new(&buckets), 3, &health_repo); + + // Let's try the selector on a few keys. + for i in 0..5 { + let key = format!("key_{i}"); + match health_aware_selector.try_select(&key) { + Some(node) => { + info!("{key}: {}:{}", node.ip(), node.port()); + } + None => { + info!("{key}: no healthy node found!"); + } + } + } +} diff --git a/pingora-ketama/src/lib.rs b/pingora-ketama/src/lib.rs new file mode 100644 index 0000000..0917056 --- /dev/null +++ b/pingora-ketama/src/lib.rs @@ -0,0 +1,447 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! # pingora-ketama +//! A Rust port of the nginx consistent hashing algorithm. +//! +//! This crate provides a consistent hashing algorithm which is identical in +//! behavior to [nginx consistent hashing](https://www.nginx.com/resources/wiki/modules/consistent_hash/). +//! +//! Using a consistent hash strategy like this is useful when one wants to +//! minimize the amount of requests that need to be rehashed to different nodes +//! when a node is added or removed. +//! +//! Here's a simple example of how one might use it: +//! +//! ``` +//! use pingora_ketama::{Bucket, Continuum}; +//! +//! # #[allow(clippy::needless_doctest_main)] +//! fn main() { +//! // Set up a continuum with a few nodes of various weight. +//! let mut buckets = vec![]; +//! buckets.push(Bucket::new("127.0.0.1:12345".parse().unwrap(), 1)); +//! buckets.push(Bucket::new("127.0.0.2:12345".parse().unwrap(), 2)); +//! buckets.push(Bucket::new("127.0.0.3:12345".parse().unwrap(), 3)); +//! let ring = Continuum::new(&buckets); +//! +//! // Let's see what the result is for a few keys: +//! for key in &["some_key", "another_key", "last_key"] { +//! let node = ring.node(key.as_bytes()).unwrap(); +//! println!("{}: {}:{}", key, node.ip(), node.port()); +//! } +//! } +//! ``` +//! +//! ```bash +//! # Output: +//! some_key: 127.0.0.3:12345 +//! another_key: 127.0.0.3:12345 +//! last_key: 127.0.0.2:12345 +//! ``` +//! +//! We've provided a health-aware example in +//! `pingora-ketama/examples/health_aware_selector.rs`. +//! +//! 
For a carefully crafted real-world example, see the pingora-load-balancer +//! crate. + +use std::cmp::Ordering; +use std::io::Write; +use std::net::SocketAddr; +use std::usize; + +use crc32fast::Hasher; + +/// A [Bucket] represents a server for consistent hashing +/// +/// A [Bucket] contains a [SocketAddr] to the server and a weight associated with it. +#[derive(Clone, Debug, Eq, PartialEq, PartialOrd)] +pub struct Bucket { + // The node name. + // TODO: UDS + node: SocketAddr, + + // The weight associated with a node. A higher weight indicates that this node should + // receive more requests. + weight: u32, +} + +impl Bucket { + /// Return a new bucket with the given node and weight. + /// + /// The chance that a [Bucket] is selected is proportional to the relative weight of all [Bucket]s. + /// + /// # Panics + /// + /// This will panic if the weight is zero. + pub fn new(node: SocketAddr, weight: u32) -> Self { + assert!(weight != 0, "weight must be at least one"); + + Bucket { node, weight } + } +} + +// A point on the continuum. +#[derive(Clone, Debug, Eq, PartialEq)] +struct Point { + // the index to the actual address + node: u32, + hash: u32, +} + +// We only want to compare the hash when sorting so we implement these traits by hand. +impl Ord for Point { + fn cmp(&self, other: &Self) -> Ordering { + self.hash.cmp(&other.hash) + } +} + +impl PartialOrd for Point { + fn partial_cmp(&self, other: &Self) -> Option<Ordering> { + Some(self.cmp(other)) + } +} + +impl Point { + fn new(node: u32, hash: u32) -> Self { + Point { node, hash } + } +} + +/// The consistent hashing ring +/// +/// A [Continuum] represents a ring of buckets where a node is associated with various points on +/// the ring. +pub struct Continuum { + ring: Box<[Point]>, + addrs: Box<[SocketAddr]>, +} + +impl Continuum { + /// Create a new [Continuum] with the given list of buckets. + pub fn new(buckets: &[Bucket]) -> Self { + // This constant is copied from nginx. It will create 160 points per weight unit. For + // example, a weight of 2 will create 320 points on the ring. + const POINT_MULTIPLE: u32 = 160; + + if buckets.is_empty() { + return Continuum { + ring: Box::new([]), + addrs: Box::new([]), + }; + } + + // The total weight is multiplied by the factor of points to create many points per node. + let total_weight: u32 = buckets.iter().fold(0, |sum, b| sum + b.weight); + let mut ring = Vec::with_capacity((total_weight * POINT_MULTIPLE) as usize); + let mut addrs = Vec::with_capacity(buckets.len()); + + for bucket in buckets { + let mut hasher = Hasher::new(); + + // We only do the following for backwards compatibility with nginx/memcache: + // - Convert SocketAddr to string + // - The hash input is as follows "HOST EMPTY PORT PREVIOUS_HASH". Spaces are only added + // for readability. + // TODO: remove this logic and hash the literal SocketAddr once we no longer + // need backwards compatibility + + // with_capacity = max_len(ipv6)(39) + len(null)(1) + max_len(port)(5) + let mut hash_bytes = Vec::with_capacity(39 + 1 + 5); + write!(&mut hash_bytes, "{}", bucket.node.ip()).unwrap(); + write!(&mut hash_bytes, "\0").unwrap(); + write!(&mut hash_bytes, "{}", bucket.node.port()).unwrap(); + hasher.update(hash_bytes.as_ref()); + + // A higher weight will add more points for this node. + let num_points = bucket.weight * POINT_MULTIPLE; + + // This is appended to the crc32 hash for each point. 
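+            // In other words, hash_0 = crc32("ip\0port" + le_bytes(0)) and
+            // hash_n = crc32("ip\0port" + le_bytes(hash_{n-1})), mirroring nginx's scheme.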
+ let mut prev_hash: u32 = 0; + addrs.push(bucket.node); + let node = addrs.len() - 1; + for _ in 0..num_points { + let mut hasher = hasher.clone(); + hasher.update(&prev_hash.to_le_bytes()); + + let hash = hasher.finalize(); + ring.push(Point::new(node as u32, hash)); + prev_hash = hash; + } + } + + // Sort and remove any duplicates. + ring.sort(); + ring.dedup_by(|a, b| a.hash == b.hash); + + Continuum { + ring: ring.into_boxed_slice(), + addrs: addrs.into_boxed_slice(), + } + } + + /// Find the associated index for the given input. + pub fn node_idx(&self, input: &[u8]) -> usize { + let hash = crc32fast::hash(input); + + // The `Result` returned here is either a match or the error variant returns where the + // value would be inserted. + match self.ring.binary_search_by(|p| p.hash.cmp(&hash)) { + Ok(i) => i, + Err(i) => { + // We wrap around to the front if this value would be inserted at the end. + if i == self.ring.len() { + 0 + } else { + i + } + } + } + } + + /// Hash the given `hash_key` to the server address. + pub fn node(&self, hash_key: &[u8]) -> Option<SocketAddr> { + self.ring + .get(self.node_idx(hash_key)) // should we unwrap here? + .map(|p| self.addrs[p.node as usize]) + } + + /// Get an iterator of nodes starting at the original hashed node of the `hash_key`. + /// + /// This function is useful to find failover servers if the original ones are offline, which is + /// cheaper than rebuilding the entire hash ring. + pub fn node_iter(&self, hash_key: &[u8]) -> NodeIterator { + NodeIterator { + idx: self.node_idx(hash_key), + continuum: self, + } + } + + pub fn get_addr(&self, idx: &mut usize) -> Option<&SocketAddr> { + let point = self.ring.get(*idx); + if point.is_some() { + // only update idx for non-empty ring otherwise we will panic on modulo 0 + *idx = (*idx + 1) % self.ring.len(); + } + point.map(|p| &self.addrs[p.node as usize]) + } +} + +/// Iterator over a Continuum +pub struct NodeIterator<'a> { + idx: usize, + continuum: &'a Continuum, +} + +impl<'a> Iterator for NodeIterator<'a> { + type Item = &'a SocketAddr; + + fn next(&mut self) -> Option<Self::Item> { + self.continuum.get_addr(&mut self.idx) + } +} + +#[cfg(test)] +mod tests { + use std::net::SocketAddr; + use std::path::Path; + + use super::{Bucket, Continuum}; + + fn get_sockaddr(ip: &str) -> SocketAddr { + ip.parse().unwrap() + } + + #[test] + fn consistency_after_adding_host() { + fn assert_hosts(c: &Continuum) { + assert_eq!(c.node(b"a"), Some(get_sockaddr("127.0.0.10:6443"))); + assert_eq!(c.node(b"b"), Some(get_sockaddr("127.0.0.5:6443"))); + } + + let buckets: Vec<_> = (1..11) + .map(|u| Bucket::new(get_sockaddr(&format!("127.0.0.{u}:6443")), 1)) + .collect(); + let c = Continuum::new(&buckets); + assert_hosts(&c); + + // Now add a new host and ensure that the hosts don't get shuffled. 
+        let buckets: Vec<_> = (1..12)
+            .map(|u| Bucket::new(get_sockaddr(&format!("127.0.0.{u}:6443")), 1))
+            .collect();
+
+        let c = Continuum::new(&buckets);
+        assert_hosts(&c);
+    }
+
+    #[test]
+    fn matches_nginx_sample() {
+        let upstream_hosts = ["127.0.0.1:7777", "127.0.0.1:7778"];
+        let upstream_hosts = upstream_hosts.iter().map(|i| get_sockaddr(i));
+
+        let mut buckets = Vec::new();
+        for upstream in upstream_hosts {
+            buckets.push(Bucket::new(upstream, 1));
+        }
+
+        let c = Continuum::new(&buckets);
+
+        assert_eq!(c.node(b"/some/path"), Some(get_sockaddr("127.0.0.1:7778")));
+        assert_eq!(
+            c.node(b"/some/longer/path"),
+            Some(get_sockaddr("127.0.0.1:7777"))
+        );
+        assert_eq!(
+            c.node(b"/sad/zaidoon"),
+            Some(get_sockaddr("127.0.0.1:7778"))
+        );
+        assert_eq!(c.node(b"/g"), Some(get_sockaddr("127.0.0.1:7777")));
+        assert_eq!(
+            c.node(b"/pingora/team/is/cool/and/this/is/a/long/uri"),
+            Some(get_sockaddr("127.0.0.1:7778"))
+        );
+        assert_eq!(
+            c.node(b"/i/am/not/confident/in/this/code"),
+            Some(get_sockaddr("127.0.0.1:7777"))
+        );
+    }
+
+    #[test]
+    fn matches_nginx_sample_data() {
+        let upstream_hosts = [
+            "10.0.0.1:443",
+            "10.0.0.2:443",
+            "10.0.0.3:443",
+            "10.0.0.4:443",
+            "10.0.0.5:443",
+            "10.0.0.6:443",
+            "10.0.0.7:443",
+            "10.0.0.8:443",
+            "10.0.0.9:443",
+        ];
+        let upstream_hosts = upstream_hosts.iter().map(|i| get_sockaddr(i));
+
+        let mut buckets = Vec::new();
+        for upstream in upstream_hosts {
+            buckets.push(Bucket::new(upstream, 100));
+        }
+
+        let c = Continuum::new(&buckets);
+
+        let path = Path::new(env!("CARGO_MANIFEST_DIR"))
+            .join("test-data")
+            .join("sample-nginx-upstream.csv");
+
+        let mut rdr = csv::ReaderBuilder::new()
+            .has_headers(false)
+            .from_path(path)
+            .unwrap();
+
+        for pair in rdr.records() {
+            let pair = pair.unwrap();
+            let uri = pair.get(0).unwrap();
+            let upstream = pair.get(1).unwrap();
+
+            let got = c.node(uri.as_bytes()).unwrap();
+            assert_eq!(got, get_sockaddr(upstream));
+        }
+    }
+
+    #[test]
+    fn node_iter() {
+        let upstream_hosts = ["127.0.0.1:7777", "127.0.0.1:7778", "127.0.0.1:7779"];
+        let upstream_hosts = upstream_hosts.iter().map(|i| get_sockaddr(i));
+
+        let mut buckets = Vec::new();
+        for upstream in upstream_hosts {
+            buckets.push(Bucket::new(upstream, 1));
+        }
+
+        let c = Continuum::new(&buckets);
+        let mut iter = c.node_iter(b"doghash");
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7778")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7779")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7779")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7777")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7777")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7778")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7778")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7779")));
+
+        // drop 127.0.0.1:7778
+        let upstream_hosts = ["127.0.0.1:7777", "127.0.0.1:7779"];
+        let upstream_hosts = upstream_hosts.iter().map(|i| get_sockaddr(i));
+
+        let mut buckets = Vec::new();
+        for upstream in upstream_hosts {
+            buckets.push(Bucket::new(upstream, 1));
+        }
+
+        let c = Continuum::new(&buckets);
+        let mut iter = c.node_iter(b"doghash");
+        // 127.0.0.1:7778 nodes are gone now
+        // assert_eq!(iter.next(), Some("127.0.0.1:7778"));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7779")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7779")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7777")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7777")));
+        // assert_eq!(iter.next(), Some("127.0.0.1:7778"));
+        // assert_eq!(iter.next(), Some("127.0.0.1:7778"));
+        assert_eq!(iter.next(), Some(&get_sockaddr("127.0.0.1:7779")));
+
+        // assert infinite cycle
+        let c = Continuum::new(&[Bucket::new(get_sockaddr("127.0.0.1:7777"), 1)]);
+        let mut iter = c.node_iter(b"doghash");
+
+        let start_idx = iter.idx;
+        for _ in 0..c.ring.len() {
+            assert!(iter.next().is_some());
+        }
+        // assert wrap around
+        assert_eq!(start_idx, iter.idx);
+    }
+
+    #[test]
+    fn test_empty() {
+        let c = Continuum::new(&[]);
+        assert!(c.node(b"doghash").is_none());
+
+        let mut iter = c.node_iter(b"doghash");
+        assert!(iter.next().is_none());
+        assert!(iter.next().is_none());
+        assert!(iter.next().is_none());
+    }
+
+    #[test]
+    fn test_ipv6_ring() {
+        let upstream_hosts = ["[::1]:7777", "[::1]:7778", "[::1]:7779"];
+        let upstream_hosts = upstream_hosts.iter().map(|i| get_sockaddr(i));
+
+        let mut buckets = Vec::new();
+        for upstream in upstream_hosts {
+            buckets.push(Bucket::new(upstream, 1));
+        }
+
+        let c = Continuum::new(&buckets);
+        let mut iter = c.node_iter(b"doghash");
+        assert_eq!(iter.next(), Some(&get_sockaddr("[::1]:7777")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("[::1]:7778")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("[::1]:7777")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("[::1]:7778")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("[::1]:7778")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("[::1]:7777")));
+        assert_eq!(iter.next(), Some(&get_sockaddr("[::1]:7779")));
+    }
+}
diff --git a/pingora-ketama/test-data/README.md b/pingora-ketama/test-data/README.md
new file mode 100644
index 0000000..f44bac8
--- /dev/null
+++ b/pingora-ketama/test-data/README.md
@@ -0,0 +1,18 @@
+# Steps to generate nginx upstream ketama hash logs
+
+1. Prepare the nginx conf and start nginx
+```
+mkdir -p /tmp/nginx-ketama/logs
+cp nginx.conf /tmp/nginx-ketama
+nginx -c nginx.conf -p /tmp/nginx-ketama
+```
+
+2. Generate the trace
+```
+./trace.sh
+```
+
+3. Collect the trace
+```
+cp /tmp/nginx-ketama/logs/access.log ./sample-nginx-upstream.csv
+```
\ No newline at end of file
diff --git a/pingora-ketama/test-data/nginx.conf b/pingora-ketama/test-data/nginx.conf
new file mode 100644
index 0000000..7b30a3a
--- /dev/null
+++ b/pingora-ketama/test-data/nginx.conf
@@ -0,0 +1,29 @@
+events {}
+http {
+    log_format upper '$request_uri,$upstream_addr';
+
+    upstream uppers {
+        hash $request_uri consistent;
+
+        server 10.0.0.1:443 weight=100 max_fails=0;
+        server 10.0.0.2:443 weight=100 max_fails=0;
+        server 10.0.0.3:443 weight=100 max_fails=0;
+        server 10.0.0.4:443 weight=100 max_fails=0;
+        server 10.0.0.5:443 weight=100 max_fails=0;
+        server 10.0.0.6:443 weight=100 max_fails=0;
+        server 10.0.0.7:443 weight=100 max_fails=0;
+        server 10.0.0.8:443 weight=100 max_fails=0;
+        server 10.0.0.9:443 weight=100 max_fails=0;
+    }
+
+    server {
+        listen 127.0.0.1:8080;
+
+        location / {
+            access_log /tmp/nginx-ketama/logs/access.log upper;
+            proxy_connect_timeout 5ms;
+            proxy_next_upstream off;
+            proxy_pass http://uppers;
+        }
+    }
+}
\ No newline at end of file diff --git a/pingora-ketama/test-data/sample-nginx-upstream.csv b/pingora-ketama/test-data/sample-nginx-upstream.csv new file mode 100644 index 0000000..1b0c113 --- /dev/null +++ b/pingora-ketama/test-data/sample-nginx-upstream.csv @@ -0,0 +1,1001 @@ +/81fa1d251d605775d647b5b55565e71526d4cef6,10.0.0.7:443 +/2fec328e6ccdda6a7edf329f9f780e546ea183b4,10.0.0.5:443 +/19fb835d90883a6263ec4279c6da184e3f1a79b2,10.0.0.4:443 +/da7a88e542f7aaddc074f988164b9df7e5f7fea6,10.0.0.4:443 +/8f87cfd8005306643b6528b3d4125cf005139a7e,10.0.0.5:443 +/26d2769eab098458bc3e4e641a4b7d8abffd0aea,10.0.0.6:443 +/aa5b5323980f2d3e21246212ebd820c3949c1e88,10.0.0.7:443 +/d9c4bc3cc4517c629e8f4c911c2fd8baf260ae65,10.0.0.1:443 +/28c1c069a2904bb3b3e0f9731b1ff8de9ab7a76d,10.0.0.4:443 +/fe5199bdfeee5cd431ae7e9f77f178164f9995a0,10.0.0.9:443 +/43992eee187920c5e8695332f71ca6e23ef6ac4b,10.0.0.3:443 +/38528aab753a6f32de86b5a7acdbb0c885137a81,10.0.0.9:443 +/12d4b9155ff599c0ac554226796b58a2278b450f,10.0.0.7:443 +/9c34c9a4f9009997dd29c6e6a627b0aca7beb6e5,10.0.0.5:443 +/eb5a2ab55796afd673874fd7560f1329be5540bd,10.0.0.9:443 +/ad7b5395766b77098c3f212043650a805b622ffe,10.0.0.3:443 +/c72fedf4177499635302849496898fe4f3409cc1,10.0.0.9:443 +/77766138aaf0c016bdd1f6b996177fc8ca1d2204,10.0.0.8:443 +/860c86b94e04f2648fb164c87fd6166707fd08ff,10.0.0.6:443 +/1b419454e4eb63ef915e8e06cc11110a3ccd607e,10.0.0.7:443 +/a8762dc488e1a1af31e53af8ddb887d4f3cca990,10.0.0.8:443 +/2e8e8e8fdeada0bbd33ba57d20209b4d9343f965,10.0.0.4:443 +/0220fa8b9a256e7fcf823097759aa3c44e6390e3,10.0.0.6:443 +/418c1c554186b78c11de89227fbc24ef128bce54,10.0.0.8:443 +/bc86e565b76f8e6f560064b02ab26529b6064571,10.0.0.3:443 +/5c6a9b50df69956bd2b937ce7871ba6d67678db6,10.0.0.5:443 +/5726f95dd0b1b145ad1a06755580f42fea41ac2a,10.0.0.9:443 +/db601a7f7e24504b820e5ef5276b2653ec6c17d9,10.0.0.4:443 +/f428a38a0d3dbbb12d475aa8f5be917147175eaf,10.0.0.6:443 +/b815ca5871d52098946eded8a3382d086747818f,10.0.0.1:443 +/fc61e21e21c6c0a9e03807a2cad7c1e79a104786,10.0.0.1:443 +/8278c52b97c1e805c1c7c1a62123ca0a87e2ea2a,10.0.0.8:443 +/668fd6d99bfb50b85b0928a8915761be2ca19089,10.0.0.2:443 +/fefbfb22035c938b44d305dbb71b11d531257af8,10.0.0.2:443 +/c30b287269464a75cf76a603145a7e44b83c8bde,10.0.0.5:443 +/7584dbc60619230cb5a315cfdd3760fe2e2980c3,10.0.0.9:443 +/399b3bdce88319bdba1b6b310cfcbd9db9cec234,10.0.0.6:443 +/5edc91979f6f38dbbe00544d97d617b92b3df93d,10.0.0.9:443 +/ac740e2450803d9b6042a3a98e5fe16eaad536e6,10.0.0.1:443 +/46013f26dbbde9c25de5fcbb92ff331d5614bae8,10.0.0.5:443 +/f109862c7c78e8ce087aeff9f2368d54d91fd3be,10.0.0.5:443 +/fdc13a7011bbcf36b232adde4c610f0f35e9147e,10.0.0.3:443 +/8387a3c076e525cae448c6a3b22988a2f37a98fc,10.0.0.1:443 +/b4739e36d8e7eba1a400925c928caf0741b1a92a,10.0.0.1:443 +/d92612bb3f678d8b181fa176e0af7227bf5f7e42,10.0.0.9:443 +/89ec56b1d8d72c888b044e8cd7fa51b9ac726a41,10.0.0.2:443 +/7cf921d8181af6912676f20c3d961d3f2ffbad20,10.0.0.3:443 +/9181876c839cf16fd7c8c858b7afdc0178fb9500,10.0.0.3:443 +/1034a4394566c826888f813af75c396fe8082b43,10.0.0.3:443 +/81ac831667e89c2c6b3c6098b598d99eb1ce2b20,10.0.0.2:443 +/d9dbae8a03a430b8d9cbffcf622b4e379bc89bf6,10.0.0.7:443 +/c67776793fdcf7553fe0cb6414bb9dafe0216911,10.0.0.6:443 +/1ee25559aa4aaa11ec1b3d2cc8645ed05ec001b3,10.0.0.9:443 +/580180a2b85efff1a393ea2449ae271148ca2770,10.0.0.2:443 +/84e1a1904a52e43ace344346032daca4e1bb69d6,10.0.0.8:443 +/9cd06ffa608a252a30d935d2ebf10eceda06ba2e,10.0.0.6:443 +/cf85a0000f38ac5346ddddd8cc0c28a054bbe60c,10.0.0.5:443 +/c31f22b05514e380dd4430086486dc3ba4e36ed4,10.0.0.6:443 
+/336fdd336fde2bde2e0132d4be65088953036175,10.0.0.7:443 +/cb1e7e2c425607defdd725e81ca3121340dbc8bb,10.0.0.8:443 +/7bd85bb6826eeb30a67a999bfdeb6f6368954a3d,10.0.0.5:443 +/bb542ca4f154437b0fa394b3be8d45350efc4955,10.0.0.8:443 +/53e425848829e3aeb1c6991512e1951145b2ce46,10.0.0.6:443 +/a6ad65c1bcacb876b76165e741f35c98a09cbbf3,10.0.0.3:443 +/1fca16e96a89623e2ef7a93fccd767c4ef2a7905,10.0.0.9:443 +/b9ad129954c11aa1491552845199c2fb4bbff25e,10.0.0.2:443 +/9c0380f918aeb44664929447077ee992894cb754,10.0.0.9:443 +/a9aeb4e3fb0b2358f70a6d9c2ad62409a7c24574,10.0.0.5:443 +/8d563416df0c167343d295f889d64dd9ff213a9e,10.0.0.7:443 +/71ddc6cc8f25f63ad7df9ad963beb9a14ca6b76f,10.0.0.2:443 +/1dd61ea19da5970147129b0ba635338bc93c7aba,10.0.0.7:443 +/2c019dd0aebfdf9d94fb1201b25f443c91c034f8,10.0.0.8:443 +/636b620e6d548492a0fac32e895fa64ab48fa70d,10.0.0.1:443 +/e26420a446174c0bcbc008f3d8ce97570d55619e,10.0.0.7:443 +/2522d660a63527ab2f74c7a167366bbb0bc46cb1,10.0.0.6:443 +/6e585c3e88aeb95554f5c00730c70d71189a12c6,10.0.0.1:443 +/0bc50da77b7cf3959612950d97564e91e5a0f3fa,10.0.0.9:443 +/167872e2688593c6544c0855b76a99fd0f96bb69,10.0.0.8:443 +/7842aa002d2416c4587d779bbea40f5983883a9d,10.0.0.1:443 +/b3cdb310440af5a8a9788534e2a44e1df75fc0aa,10.0.0.2:443 +/7c17fc177496c13dd1207388087ae1979603c886,10.0.0.5:443 +/28865c3daa92ec1e3784c51e9aa70c78b902dfa6,10.0.0.3:443 +/4b990fc439195c5e05cfea65a2453f23fc5bbf1a,10.0.0.5:443 +/7261021a69a6478b0620315c231c4aa26fda2638,10.0.0.2:443 +/d5caa3e251ad2dd28ba82c3dcb99bff6d368e2a0,10.0.0.1:443 +/a8606508d178e519aa53f989ef60db8a0f3a2c2c,10.0.0.2:443 +/eb797fcf3e5954c884b78360247e38566f7f674a,10.0.0.9:443 +/289ced7bea19beee166cf4b07d31c8461975d4e4,10.0.0.6:443 +/e563ce7e72b68097a6432f68f86ed6f40d040ac3,10.0.0.3:443 +/ba22b6f2657746d3b8f802ab2303ffd4b040a73f,10.0.0.7:443 +/5dbda23f45eb02ecc74e57905b9dc6eab6d9770c,10.0.0.9:443 +/637691e12da247452c3a614f560001e263a9f85e,10.0.0.5:443 +/b2e491e1528813c17dfc888c5039c9e3f40f9040,10.0.0.8:443 +/a4575d09e2fcb4d42e214c33be25c2f1c10e8323,10.0.0.5:443 +/d655e051b4f82c459b20afbd2ccca058e16ad3fa,10.0.0.2:443 +/cdca39ce5deb7022702e18e0c6b61010ba931e54,10.0.0.9:443 +/58b31129208a29d2435258dc9f24a6b851ed1ac0,10.0.0.6:443 +/019930f0699b20a72a091c1042dfe33ac568b190,10.0.0.5:443 +/f00117302e2daca8c81e68cb33cf445b72c45895,10.0.0.9:443 +/da90cf74593ee181693910a40142bc79479c354e,10.0.0.5:443 +/87654ba6f96f359e4418b3368ae2256a3c2dad51,10.0.0.2:443 +/e85d0e6a90433b5a64257469c2cb4e441f39d07c,10.0.0.3:443 +/8527e42c8677b3f8264a2a647c00eb3acc5d0207,10.0.0.1:443 +/3adbb76ad6ae8a5342a5458e5f41ac4bdddb45fb,10.0.0.5:443 +/96e7ecedc6c60f0b52869a98f9d192af1e72d329,10.0.0.9:443 +/430095d6c47a7d2a8073e73df1c694fc9065e8f3,10.0.0.4:443 +/475ce23ca92e83ebfbc781aa337063c6b034bfb6,10.0.0.3:443 +/3a2cd1836406244cf08a552f60734872cfabfa1d,10.0.0.4:443 +/47372a5cf6b640c32681f094dd588fa204839637,10.0.0.1:443 +/74d7ecd706817756952727e82a5933549d582f68,10.0.0.4:443 +/0c1ab68f17265ddc9a58577f2a3443b523508d2a,10.0.0.3:443 +/e72871b3b2e08e87443995810c8fc542ec0c3b88,10.0.0.7:443 +/20ffdb8b43d521aee3c81cbb668b94828bf3f86d,10.0.0.9:443 +/b9a4b7d390a4fb62ea6252287351954ce6935fd2,10.0.0.9:443 +/71f52570d9fa32e2df99088e44850fa9097804ec,10.0.0.6:443 +/9533af016368e423dc90b4e249002233fa3fcd06,10.0.0.8:443 +/23992435c60a48db0188097fb2f15826d99be05f,10.0.0.1:443 +/bc351d376bcd7338aca33255199bfa3ced51d66b,10.0.0.5:443 +/bc5a14bccb994f346069886be05ba91dc4cefacd,10.0.0.4:443 +/6a29ff380492b77fe69f9ec0851cbbf7228d62f3,10.0.0.8:443 +/99bbb0675c38e292e979110ac88fc7711edc92a2,10.0.0.7:443 
+/786105dc60dfffc8e2ea58679a14fd4428570d10,10.0.0.4:443 +/d983235f5af78dc9b13a5d177a44c6a76c8fbb2c,10.0.0.8:443 +/55163e01bc026cab4cf6985c8c2583876680aa80,10.0.0.2:443 +/eb68e3145c8a531198ea2a60e7a4fe6cb1a2b78f,10.0.0.6:443 +/7996a420a8e08545583a8ca0941c1a0c9ddc875c,10.0.0.9:443 +/d8d3509e8df61eff246be4faa6630d5f11b81172,10.0.0.4:443 +/ecd74f84dadcbb5e7ab90430ba424a996a5ec50f,10.0.0.7:443 +/566ca8a48b0875bdf60d224188b0d952da6c8dc7,10.0.0.5:443 +/0497f891fd6d35ffc0ed28dd3ba17eeba1301fa0,10.0.0.2:443 +/6a406d220cbda7fad4facc04632fd0c12dc6d998,10.0.0.4:443 +/3a54c0bfc41cd0942d0e479430cdbc551e33fb99,10.0.0.9:443 +/a7a224cf1e0d9b4e5493b2f61fa53ad72de58b94,10.0.0.6:443 +/4121200fe9e4e7c2126c5d71d108e5119f37783a,10.0.0.4:443 +/caf4c4b46875bbfa63b9ab35a4bce5646ebd55b4,10.0.0.3:443 +/90ad2be0a253536ab7c3e961443a91ded0e66e61,10.0.0.1:443 +/caf569f41f3556f588fefc887d6ec0d454bfef8c,10.0.0.9:443 +/0e3c3e157ffefdfa94e785d4a55f4eb6fca4dc70,10.0.0.2:443 +/b0b8ba29e45725715f7982a05edac1ff999a7899,10.0.0.3:443 +/cc5430ac1220fe146e68e9cf6f174269d403224d,10.0.0.7:443 +/508445e1be7b2b4495f2eb5907530bb095e98ea7,10.0.0.5:443 +/d6169d6f2495da4842a67163dcc0e5f31acb1a0c,10.0.0.3:443 +/8d85ea8d983c0e35836b8a203660c6c919da645d,10.0.0.8:443 +/ee5128bf7f95196d6569af52c9d99c4d60f132c6,10.0.0.7:443 +/461d5e76ae9d26244e546eed7038efe6cf7d9bbd,10.0.0.2:443 +/9f97615d8e9dea23c4c4e841838404fcd8698d8e,10.0.0.6:443 +/c01e055c153b1d34d51c6598e2e1c3fc362d812e,10.0.0.8:443 +/7c087772081d068f5fd86960e4d89901f3c06afe,10.0.0.2:443 +/37e6e5c96c2661d244cbd243151f9c90119d5f4a,10.0.0.4:443 +/663e532894288bb97751dda93f151d85f6c16813,10.0.0.7:443 +/2b3904fd38fc96f184226c842f0643cd0596d865,10.0.0.3:443 +/14cb69e56f7f17a26f0bdfce16dec5baf539dba0,10.0.0.8:443 +/adbe42c7ca6dd63d976f49262cf3d1a27a5f7bb0,10.0.0.2:443 +/70b58e27d6eb735c3c82d9aec1f6608f2f32195f,10.0.0.3:443 +/e7d3683cca1dcc45d8e3fdfb54eddc9b34141d65,10.0.0.3:443 +/407e3958ae8b94172af71487050ef5dc0aeab2ac,10.0.0.6:443 +/4c5af9e573fc3e0120d322a950fcbb792074d670,10.0.0.7:443 +/fe92a691ba1d11d6f49e5144be9baee390cc27e6,10.0.0.9:443 +/298835604d35f371a68e93047c699a7c41375f97,10.0.0.6:443 +/2155470425069f357851ba81346b879a8193aebb,10.0.0.3:443 +/f55d45d265ec44be7ded0db1252281348fab75f0,10.0.0.4:443 +/798f665aa334e5eb9a49669785e94da933d81f32,10.0.0.8:443 +/ad8bf2624e7fc687b0130b61fdee9db2a2d865fd,10.0.0.7:443 +/d2002a4943563ca4c4fc66b4ad65aac4e1410b2e,10.0.0.2:443 +/a025e91fc9b3fcdc0491d0e4b4b0f09e322e53eb,10.0.0.6:443 +/b4a46e8f0ca5698b4f6dd201b87e88125b153ece,10.0.0.4:443 +/ff2a4976667b127ca1e3bb5027e8a836e56fd358,10.0.0.2:443 +/307086130cdefaa3d899fca3dd9e77047fff1cf7,10.0.0.5:443 +/558d5eeb99c6f1cfd6367fb101392072e5140c44,10.0.0.7:443 +/a1a3799079c1ef01be067c4c6a1db5b7fe6515b1,10.0.0.4:443 +/5b66932db9324bb9f8d6fc1f7be819c1c1ff43bd,10.0.0.5:443 +/1d69b12d308183c0d6432fb4cb8bacbc86193830,10.0.0.8:443 +/eef4c8b2ded3656c9d6174a72ffc487f0c769492,10.0.0.2:443 +/eb439a2cd0e4c9fdd95d8c0f657a81ce20f96a0e,10.0.0.2:443 +/b6f64c4a87c0d38417ce3dcc7a553a185df7f384,10.0.0.8:443 +/393d62711ecc6309a19a96ea73cffae546922f64,10.0.0.8:443 +/aa18663a595f369e048e33505f82d21ebbfe354d,10.0.0.9:443 +/759754a69ee3e4449bacd21a5866b8434b743cfe,10.0.0.1:443 +/c01e96c10fd69b430cf67edcc3fd2fec7ba30097,10.0.0.4:443 +/284e0c7dbb8e7da2a1fd7180f8d542fbf2410767,10.0.0.3:443 +/6f360332b72940cc117999224b5be35551a1790a,10.0.0.5:443 +/a83eee32d7132975d5d2d2848bc7881345e63735,10.0.0.6:443 +/9d8bfc97428dee1b1495d2568e5ac68b8ec7973d,10.0.0.1:443 +/9e09d80d5653ac55445b42c091ada230ed96cf67,10.0.0.4:443 
+/6ca8d4fd764a20ca1b766f9d2a14b81011d80da4,10.0.0.5:443 +/fb89be9d12828716f95a60d092f2a028c876259a,10.0.0.1:443 +/29ffb1d20ace9afed20ce8613a2b636dae70638f,10.0.0.6:443 +/b569fa1c31949a8ab05a60939d44b1132534556d,10.0.0.7:443 +/71a89db0bb322607a2557b089a5d160fa574fc7d,10.0.0.1:443 +/4449e3e6404cecdc9a36ecff54babedc84619b1c,10.0.0.2:443 +/b26294352e342bd6e953264f9e14393413bb371d,10.0.0.2:443 +/a72621f8691cf08ffdc5884556d5512a5ecd1f6e,10.0.0.4:443 +/dc4732cfa991632b719def815b228ded96abaa1e,10.0.0.5:443 +/b908128cca7c859493155441660eaaa09b2fae80,10.0.0.1:443 +/d93c9304c07c8f1d2b6f6c89c882fc2cfad3fefe,10.0.0.4:443 +/8a0db29dc8df0b7845a9ab213d4bd8ac59a121e8,10.0.0.7:443 +/49559040bdef5e1a5dc8ee89f897b79115ef1bfe,10.0.0.7:443 +/23428c6b465b7c43629bc28fa1a7431c6e541778,10.0.0.9:443 +/9db1610e40a3197a5b8c2d0dee2b2ccfe4cabb92,10.0.0.3:443 +/1c6cf23cac024d126066771bae7af48ba141dfd9,10.0.0.1:443 +/5e89a982f7f165b47fef959e10c32afa1e01783e,10.0.0.1:443 +/52644098601b604c9e9e5e3d1150f13e81240fc8,10.0.0.8:443 +/1771afea4cf491711aa3b608fbd8b470306d7bc9,10.0.0.4:443 +/825cb4d51b986eef44d3cba31dd87c4ce3d9c159,10.0.0.4:443 +/83a6211a968db8d62e17525ce593c144ed7fbb4c,10.0.0.4:443 +/6a9abd46a919eed40be39b9d53bd73cb74acf540,10.0.0.6:443 +/12db006d907a255f8d61e5070d1a41defdae27ba,10.0.0.2:443 +/0cf51c79b9d115d7be8fcc104e2f51fee1a3caa6,10.0.0.5:443 +/6bbab5e098876a84c403ef8cbe9864c21f9bb0aa,10.0.0.4:443 +/5fc725bf869cf190f8ce82814d5e8e749030c8cf,10.0.0.1:443 +/859d96b17c00e528c07fe1696fc7ddfdb34c4875,10.0.0.7:443 +/a55638df8b2ceca37d24bb78826833deb633c79d,10.0.0.2:443 +/70ed2f73f55d4d00f9cf694a7f669c3ba11f89ed,10.0.0.3:443 +/b5c910057d813197f8353c31d233de719212455a,10.0.0.5:443 +/b602d274d7d8ff89505fb3ba364b6ccbeeb561ab,10.0.0.6:443 +/50ba092d17178b78c2643e798138ff5514d2d0a2,10.0.0.1:443 +/bf3244d6cec5c60aa29ccca799415354607b7803,10.0.0.9:443 +/7f4ddcc20818c0db3cdd8b440c269e33ef22a7c7,10.0.0.4:443 +/9dc2eaaf3539a7c0a5b97be1f722f544539c6257,10.0.0.2:443 +/c5359e50f3c202f5cd5c096bd15d757ba659e815,10.0.0.5:443 +/038366c13ffa60a0d9ef4bef212e6e7354a6bbfa,10.0.0.8:443 +/9e40dac2f57fe43878519a83af3b75fc2e590217,10.0.0.6:443 +/9b2c05c1d561f86cf9682673628dfef2160650a8,10.0.0.5:443 +/78a2ea21a979d1d0c8e07f0185f358fe58393c12,10.0.0.5:443 +/83d46e2ff9cd7bb557c1b00533a0e4f1733df84b,10.0.0.1:443 +/29bf196e578a83824c55b0f78ceab36b1eb9c82b,10.0.0.4:443 +/61249cd3d39f4dae802db5f0a875a5a4a8ad191d,10.0.0.1:443 +/c7c7dfdf8e9e68d5540aae13b2cbb5fe86c1b965,10.0.0.6:443 +/4be4e8d7897f7d9dfa210bd236e9bb45454fea20,10.0.0.9:443 +/cb5ed875dedef2013fab5b051a8636d10fef56dc,10.0.0.6:443 +/e12ec1f2b657ad0f7988db38254652e153525ad9,10.0.0.7:443 +/9ec5a64e415451efcc8aa7648b284774361e03eb,10.0.0.7:443 +/3a6afe9c8e8f041a59695055cb7733ae254632bd,10.0.0.7:443 +/e3393950cb37481a7b00cbefc3298d14aeda0807,10.0.0.3:443 +/7c6e41537748edb49cfc56ee505256f40935a99e,10.0.0.3:443 +/6bbc445ff57bc9c54407f31616f1b23bf5ee27ce,10.0.0.5:443 +/99ba1e8f21532dab31caf0731f1c5edc8455550b,10.0.0.5:443 +/725fbb619d38c436bb88e28d5219e720989ab6db,10.0.0.4:443 +/7b519ba8928f440bf01ac1d6b98611fb59bb1c89,10.0.0.8:443 +/2ff8d8dd2a37ff1cb34692a00c7fb7d1c155b419,10.0.0.3:443 +/f76abffc1a71b95e7969cceaad57429672beaf68,10.0.0.3:443 +/fe58d58e116026db4cf106ef57732e1b629caade,10.0.0.6:443 +/45549ca0d7c95e97c299b58b03ecf1939e140c9c,10.0.0.1:443 +/93695453157442d799a007d1710f7dbf968be8f1,10.0.0.9:443 +/ebe69b2ea9db3e66a2157021a17f852695eab8be,10.0.0.4:443 +/a885aecaaf297eaac5c98ed708fe6a73fc9273b8,10.0.0.2:443 +/2859256b987358b8d2ee0c81b5494cde3a98d602,10.0.0.1:443 
+/d19ae90e456730d2db6b36c1ed1a45335b368fc1,10.0.0.1:443 +/f16f2e87bee62b1523dbb5824b5dfe338ec67704,10.0.0.8:443 +/fcd5f91888014decb190a9dac5fe9fca7ed8d70f,10.0.0.9:443 +/3ee610b32554b5f7c27d40a52bb982378ceb4fb6,10.0.0.7:443 +/21cc5cb90ba59b6b743bc437f0f93c45d21aaea9,10.0.0.7:443 +/8d2bffec729dd863e6dcdaeaefca22d6e29403bb,10.0.0.2:443 +/2ce6b015ea081b69a3867f7b09b753f83fbd4b77,10.0.0.9:443 +/64fcc9606275d6a259a084696318ab704a81932b,10.0.0.5:443 +/0984409349566b9bda3f5ff3b0dae93c6979969c,10.0.0.6:443 +/2b3775815cd0064c1603ec6dfe62b9ff54180638,10.0.0.5:443 +/563ff0fa8762400c92ccb700adb6ea6a7bfb0d33,10.0.0.6:443 +/901f7c9eca3f038ecf6a684a2c46b827c24e8ee6,10.0.0.5:443 +/3dbd852fb7f851fda48f742488e51dfd8d4a472e,10.0.0.9:443 +/a50ef8903707c1c5d7158a851d636ef65e198e7a,10.0.0.7:443 +/92603aec7e7f7a5847f523c336bd80d786667d6f,10.0.0.4:443 +/a941b070f313629549a2874fef17b29b25069214,10.0.0.1:443 +/9a80624738b37b3a3d6b0749feae2bb82d0672c0,10.0.0.5:443 +/f863b682f5f260f4762a14831d949c5dc9bd5f28,10.0.0.7:443 +/d41f6919aa10ee037b4a69df874de03ccfc6432a,10.0.0.4:443 +/e995303d36162db8650a2802ce0d52263c29ec0c,10.0.0.1:443 +/7823ceab6e649edbb4f99d62282fe00edbe3acca,10.0.0.2:443 +/bfd84f41dfe1d4470730d0aa41eb73b9d7461503,10.0.0.1:443 +/53f7534ee600e63d0b32bbc1f2f9e4794373c4bb,10.0.0.7:443 +/26f4c39897fdec0b453bc15860a45137064c4ef8,10.0.0.8:443 +/7345179e10fa47e31faf60e165e7802f31315c56,10.0.0.8:443 +/d47e4a2590ff8d5dd916d826adc3c20b9224a3de,10.0.0.8:443 +/8ebb8b58c53468143f882b186fb64ef14e962c0a,10.0.0.4:443 +/7fa7b9821ce360682b88b07fa27158af8d4b10bd,10.0.0.8:443 +/7d3b908d960f61cf4944ac52164eaf9890c17c47,10.0.0.3:443 +/3900dbeff282a20a6dc0b450581ae27f44230f75,10.0.0.7:443 +/327a041d0576f11ba4c0fc677a8b1fa7cdd5b215,10.0.0.6:443 +/20450e190c6b829846d1a67e43b2e57cf7e5b472,10.0.0.4:443 +/d6d97ddf81c5a8f4f11b87198a3f8e75814d09ae,10.0.0.9:443 +/48a468d706a7cc4b07c0e74695a9c2f64012b02a,10.0.0.2:443 +/35903e2f79bf054b45d9f342642d488b85ec086f,10.0.0.2:443 +/4198c731ac8a3638a955ae891498ea4071b2be10,10.0.0.7:443 +/575be0ba8f57b2650f53499ab19fcf10aca1a467,10.0.0.9:443 +/c211460d038ae3aeb286e759dbe99b9084c56fc1,10.0.0.6:443 +/7d5071d6ed21ce66d8887ee6f88bf8b3145d417d,10.0.0.4:443 +/77435459761c415127dac0d314fe73b728e93816,10.0.0.2:443 +/16a401100431531a7cd8528d6ea8f957df584e4f,10.0.0.4:443 +/9b9af306b3fb801bc4cb127118aee22f4678c6d0,10.0.0.2:443 +/4902696d40151e903ec5bf810f2b82af7bf92799,10.0.0.3:443 +/3207830ce45f38a326cba44a2bbed7ea7009e7f1,10.0.0.8:443 +/002655dd3e576dd2be046915f365ee7947c77553,10.0.0.7:443 +/8a316dc9861784929ae9283ff9edf50fcf2abb77,10.0.0.7:443 +/8b2639c2cf4f75723ae219f9c8a60779e93b3a50,10.0.0.2:443 +/d135a3f32a0eaec83386f9b8167c8b351fa0f9cd,10.0.0.5:443 +/3cb5d50669030262c50b916b5e5f0ff112a23f87,10.0.0.5:443 +/791d86b7b2c2860da849c6e20006b3f5f92714a1,10.0.0.3:443 +/de08fde7bd93bfd844407842d09bc163675fbcbb,10.0.0.2:443 +/0576ce89f317cf54673e20eb664bb8992c975a71,10.0.0.9:443 +/fd7244f5203e2985e6c65ee07686cbd2a489e21c,10.0.0.6:443 +/233de62d4ed3f6e6a8d847500ed8be500970bd0e,10.0.0.8:443 +/8b8ec68415a7a9cbc426c23ba98ad165a434fa1c,10.0.0.6:443 +/ec4230ec3e8fa6600907e777c94f2e59382b4542,10.0.0.7:443 +/bd220769eedf9c7efa641de459810048891e3dc6,10.0.0.3:443 +/9165254e59f4fad93b93a02210b25dbcaac4e0be,10.0.0.1:443 +/0adb6ec07cfcd61534a065db496c7042e97391fe,10.0.0.5:443 +/39a5c89484e21a243c7061d39dbd236c80d4ede3,10.0.0.3:443 +/dc560955b3b817db3e79e37255cd18bd66a39a22,10.0.0.9:443 +/9d433be2cca7907dae1b8c24900edc5adb6065bf,10.0.0.2:443 +/2531e51eda6b68cc2faa7a09ad032387b2676523,10.0.0.9:443 
+/d591b928b7f89b00458ea30ef6f4fd20cd7e41c2,10.0.0.7:443 +/9720475f8d148f70245ade435243bfba5a1ba559,10.0.0.4:443 +/2544d73a3c1b0f04829284a5b425607f4f61ced7,10.0.0.2:443 +/e3af59332ba621011d98fbf2a38c8a0b69b9ca79,10.0.0.5:443 +/d8f4c58c0db28d7368f453d41abd59f6999b3ccc,10.0.0.8:443 +/8a6180a589aec21a274a3f47781ccb1311b0833f,10.0.0.6:443 +/83aca0a94c4883adb8e7ff795c1008ed59052691,10.0.0.7:443 +/adc1ab7741effd4ece0e832c41d1fe69f5e1805c,10.0.0.6:443 +/35a50236b60e680d3d968ad3857525a8649fd6a7,10.0.0.1:443 +/30495b101ac5318458d74b3a286527e164efac53,10.0.0.9:443 +/e2067038a82745a65406516f15817b63e328a825,10.0.0.2:443 +/754c3e717cf1640d11ee1ca113571fd0ae55a0c2,10.0.0.9:443 +/c4462289d891b8b0c0783041044908bd347a27a7,10.0.0.1:443 +/68129869b04bc2255d2a17ce01afb14f1be73032,10.0.0.3:443 +/683a4f8e369c5c3eeb85f0779aced10809bbdbb8,10.0.0.8:443 +/8da89d7686976482713413835c889a7f289174a7,10.0.0.3:443 +/511bef26cd42422c6f0c9bd33714a07b06dbb3e1,10.0.0.1:443 +/b871c4e41b3eababead2aa4dbea87fef7161affa,10.0.0.8:443 +/808da0dc7d4025a0858eec92ac72e9ffbff233c4,10.0.0.2:443 +/6fee636398f916c4ba0074fc327f7b3dbf683a8b,10.0.0.2:443 +/03ce09b1a7c7ae719f66f489841d0ff11635ffc5,10.0.0.3:443 +/8653f568fca5173d6d274060692912676709981f,10.0.0.9:443 +/8f8402b8bba56124ec6de0552c1844bd76bf72ea,10.0.0.4:443 +/2c89ed1b71a52c9b0a9fa7909dc87d4d06237216,10.0.0.1:443 +/5279a73f9dcfa562f13180791932598ed8a067f1,10.0.0.9:443 +/456482cc45669a59fe8af5e49648a8079bc35c06,10.0.0.4:443 +/90350e060dff6a507e69bd80c38629a2d9bf12b9,10.0.0.1:443 +/2d5c624c50ba3ae06782861bba176e9b2f45f529,10.0.0.8:443 +/67077af97e65ae301e88c6cd0e87c7ddb68fa9ef,10.0.0.6:443 +/f017e954321efbfe4046942be0a1122d9be81d52,10.0.0.9:443 +/0b74a181ce4b4f43023e4bc0acd7770f2867572f,10.0.0.5:443 +/f40f8509ac9f73516224825e88a220ca02db2d81,10.0.0.7:443 +/c2f12ede0ced03c9357a4fc5e05e9af5652433c4,10.0.0.9:443 +/d3ec86f1dde7e9c416c88ddbabf854e21decec2d,10.0.0.7:443 +/ea29487fa7e1c9e79ff0f257bfd8241736ddab9f,10.0.0.7:443 +/b8f4ec5dee59a8693710cb95e1734900d7b6b076,10.0.0.4:443 +/d0be164540802b86de0762ea266e03c8859ff70d,10.0.0.8:443 +/cab9ed312a56db577bc36e4c2f52e84f8abb09ce,10.0.0.7:443 +/62dfa34389964b03792842c09adce33e7decc837,10.0.0.5:443 +/f653827f1289ae68efd5a0d057fbc172f8352842,10.0.0.8:443 +/dbd9b9cf5affa501ddcd1a19eefa4240e311f94a,10.0.0.5:443 +/3e74e167ef6393b6544b4e75da97f30f6c2e6477,10.0.0.6:443 +/d004b31247c668c439fc8e491f71a69dfd35a55b,10.0.0.6:443 +/17e79540d401ae73e7d666444feececf64602d23,10.0.0.8:443 +/06c9cb78908d623842c4a7c7baae3d55009ffc43,10.0.0.1:443 +/64427a2e50196a34670b9de8a4aebe44cbb26cc5,10.0.0.2:443 +/802741e276f10186ee9b63d47af006e8bc3de516,10.0.0.1:443 +/10cea480539356be3f2a2f14c05f057a60ef9b10,10.0.0.5:443 +/17d5b6820d78727b781be06cbc7cd2a9be650794,10.0.0.8:443 +/1c0f7ec3d8919ffd2ccb3312fb7d6d2e15cd3133,10.0.0.2:443 +/ef3afb81312d46b826f033d9adf0c730996e7992,10.0.0.7:443 +/080e45e7955e797bdc906af2fabeb8fbf2ac48e1,10.0.0.1:443 +/c043b1a590f09716da25328fe0573c8e2e9c0bdc,10.0.0.9:443 +/df604b478f31f11b4cb291b1a393749ce4e72ef3,10.0.0.7:443 +/32a8a7a0678c834e2cc7ec0584cd193fd1fd91e5,10.0.0.5:443 +/8303fbffee5f38f8eb4a51f3c1255de830abce34,10.0.0.5:443 +/24c16585ac0791c7aa1d8a16bea2fae9e7008cc5,10.0.0.5:443 +/0059d4d899d1e961f249a060d91e932a89bc9b4e,10.0.0.4:443 +/f1edc74510b17b10cdff8801776d4eaf72a1cf0c,10.0.0.2:443 +/faaec2162ff441c490fd2dc0640bf2c941438995,10.0.0.3:443 +/1159fc0cd3e683f92fc649aad3a4bcc34564c3a3,10.0.0.3:443 +/2ade445523d26c3b1519e699590939011036217c,10.0.0.7:443 +/cc6b2c1b20ef1a63adf2afaf1207ecf446fb5719,10.0.0.6:443 
+/8cb65496c9ebd858fb1101d85cf35467c4a0be17,10.0.0.8:443 +/38a5bad44dfc1ee47585be27cbd9598959ea6caf,10.0.0.2:443 +/c651109ffc0f2270596065009642fe3fe0529e60,10.0.0.5:443 +/b71ae7d9c9d4b67677fec2e630313cd01cd130e5,10.0.0.2:443 +/088d49f180d9a2211708c95d5e9d6986705c8d7e,10.0.0.4:443 +/8e44a5bd8519aa7b6d85f01e7b01e1a2ca236b6c,10.0.0.6:443 +/2526eec4890f03477261584646a0ed6def65f8f4,10.0.0.8:443 +/3a70dff8627a20ee109fa8241d7da762a02e02b3,10.0.0.4:443 +/eb999a0198002b52abca1fd424a44013989e1403,10.0.0.4:443 +/22a12a10268929d77ef8dcdea96f8aa69ab92d8a,10.0.0.7:443 +/b467e43c88f0e69b52e3d341fcee52198f90cf77,10.0.0.1:443 +/c5245921b3f6a21adafd6598046957086edfbeea,10.0.0.7:443 +/6229b444f64a7529dd956877c24b2d149a1debf7,10.0.0.8:443 +/4f8b5505d9e817b39e0b68e196c62240acd07306,10.0.0.7:443 +/8699dc12aa122266013234df69eb5e14d6282174,10.0.0.5:443 +/094dc8a3111d132d294030b358c23e63ff2ad680,10.0.0.5:443 +/3ecdf7b29e567a94e1548f0884f414af6539f974,10.0.0.4:443 +/abb4e33382452c2f5c5d3d0b89f11cc2d497a3a9,10.0.0.9:443 +/696adbeedd54670bdcca6356597b580dd9c4c42c,10.0.0.8:443 +/940b2a895c5540f5dd079f7ba930962980bf4e77,10.0.0.7:443 +/e3ad3f730ef0f82530373e966fc35e7521ba0fae,10.0.0.5:443 +/e68f238fe1b9f3ebffb102e98ee2d958194f0c3e,10.0.0.2:443 +/9e09afd63284f764392213c9f282e59c859ccc2a,10.0.0.3:443 +/40d1f0a5f73460d07f777d16ce0edb5215e0af5b,10.0.0.5:443 +/e641c93505fff2131a1bd6a2bcdeccc7ef56e108,10.0.0.5:443 +/51f5ef58442e9176bf4ab0e3d2a31e3919314467,10.0.0.1:443 +/5c752e882a791057babbc7c9d0fcd6f98249a90c,10.0.0.3:443 +/1a93e7ac8192b94e62372c526a858203ab55f82c,10.0.0.9:443 +/4f00d996e102527cfd906afe596b1fd20b9587e2,10.0.0.7:443 +/cb33e35ff6809c1d9e87af168853c8779949df28,10.0.0.1:443 +/8238f15b1556e4c5a2cfd3024ca22ca0cc38cf75,10.0.0.7:443 +/4d9af33ca789f5675e0b3b7db89277d9f07fe487,10.0.0.9:443 +/31f11a12c1729fbf56e7924ddeaef596d9246ffb,10.0.0.2:443 +/dc5b466b915a7eb13aa3c28a6df11201defc4776,10.0.0.1:443 +/a83dcc91951b9b9bfc86dd414ab2378dcb74bdb8,10.0.0.8:443 +/68dcee7d8e5b5bf81c1acdec7dd97891ce3aca1c,10.0.0.3:443 +/661f218fa06f2b92f22e0ab81731ac4029f4adc2,10.0.0.7:443 +/cd29f2138dc83eeb5d4f16d7b13d0c81483959a2,10.0.0.9:443 +/d34d0b880b1e7499c2133f06e9373f5ea4e841ce,10.0.0.6:443 +/1fb666f29371b67fef8e44a9322c9e387261fa18,10.0.0.2:443 +/808b776dc196d1759afc93646bbfca2c3e8074b3,10.0.0.2:443 +/20888fbf1def43e3123b0a9e4eba7fd7d5f2a410,10.0.0.4:443 +/d7cd6d38cbd5b94c432e15cd2f7502dbb306d757,10.0.0.9:443 +/fc018aa6e4835271d8cd024ec3943115cf4e94b0,10.0.0.5:443 +/27dc43e54f5d904746a36b482123346f44293b51,10.0.0.5:443 +/ed60ccb18540ad211073c281aee0dfc31ccc942e,10.0.0.4:443 +/5fe5f6c1b4d13ba7200d6a89c456f60eb36690f3,10.0.0.2:443 +/e2e2680aff762da702475c5307a31779cab5595a,10.0.0.4:443 +/b9a3d570d0f60b98af06bb2bb8f20bfefa8bc4f6,10.0.0.2:443 +/6b476c8bb82dcf0a32e3999826a788c48fe83e7d,10.0.0.2:443 +/80a371eb8a4a18395199455f8bde883ee548e4ba,10.0.0.3:443 +/de87fa6ef77299db61ac048fc87ce6e3e39934bc,10.0.0.3:443 +/52c42c3136fef68070d143e62366e47b5de255cf,10.0.0.4:443 +/ce7ea948d35ff4f7409eb507cfd3fc6e3b7cf30a,10.0.0.6:443 +/53e772cf7a78bda717c98a08b9081a40a03392da,10.0.0.7:443 +/9bcb6eeb362e0f6c8c04325848cd93f1a4fb75b0,10.0.0.4:443 +/2ef946b7546ba83f528e4af5f60a585fa22cd5e9,10.0.0.8:443 +/a964b047adfe079bd4f39c5d79daac5317611bb2,10.0.0.2:443 +/2cfad7113f585441f4fd054df57e08aa3f7d3441,10.0.0.8:443 +/f5352d8e5f1e080bc70c241a8e65cb48690bf44d,10.0.0.8:443 +/7108e1b4b2a8d73f6a892bf394d6db68eab4b06b,10.0.0.8:443 +/5ce086abaaa907ea06ad20e029375d0a469f6688,10.0.0.3:443 +/e8c9dabb0ae5ff95db4dc58d5e5ad8f49dbe1467,10.0.0.8:443 
+/3ea868d97d0f366875936dee5944d30f133a11ee,10.0.0.3:443 +/4c34d28c12494292d3eb30c70e2ee5158ae45bcf,10.0.0.3:443 +/b1367bfef3363e82dce33834cccd1b6178dd4e01,10.0.0.5:443 +/e563665cf03942d218f963ac225b9f65f6f47e44,10.0.0.4:443 +/ae4a64bd0638678f1f19b31f1610e800510308a8,10.0.0.3:443 +/581ee1ca9a6cad9786fe37cba35c4809410a902f,10.0.0.2:443 +/d8065f52a7d18810daaeef95f724c63a843f9354,10.0.0.7:443 +/461f1d57aab6fa3c1bf72f5aa724ed264647c4d3,10.0.0.9:443 +/70335780553fc4499c3d7b903b7e6fb6f06bc47b,10.0.0.5:443 +/9733a49859bbd9b971475bd40e4a2bf7d9bfd203,10.0.0.3:443 +/d3d04d2aa72354af7b52b344a07dcd22dca462b8,10.0.0.2:443 +/c5ea5378e9a319cc2a455d6e296dd3ca5aa12477,10.0.0.6:443 +/f20e309d5d7d97489cc0d5ad3737af32008968cb,10.0.0.6:443 +/33d387521125b7825c8480755efce6e298e936a1,10.0.0.7:443 +/a99b72b250bfb58ddf9d6fa9f805e2873ce0c229,10.0.0.8:443 +/c8997a1855b1b4d8c6fd0a9dc21c01ffd7a1aa0d,10.0.0.4:443 +/43b0d060a125208528829cbace2759c05655c8b2,10.0.0.5:443 +/e609d51a78b4e83d2b778fef69d00f07a66e586c,10.0.0.4:443 +/e62b9cead57079d3290ac764079b6466f7340c9f,10.0.0.2:443 +/d31c487a361be7abb28a0b872d87fb41a907dd2b,10.0.0.9:443 +/b17261bb106af0ee0cc6c710439eb26135dd5f5d,10.0.0.6:443 +/14d92585f8702ef9ff0fd000907f00ca1f6a458c,10.0.0.6:443 +/fe79953554bb5aaf34adc6928060af206e8fa993,10.0.0.2:443 +/cefd63669258fb86a920a65d6fd1d511c508d954,10.0.0.6:443 +/143c0259d1759575c859b902f4e56ef478624989,10.0.0.3:443 +/50315f90fc6f82d29d925e3947ba7c32f19b5611,10.0.0.2:443 +/95428f6c4dcfbb2a0c17a5de4861de01bd3f325f,10.0.0.7:443 +/5ae459d770c6446c4d33104734b7a0293b48ce32,10.0.0.1:443 +/7078c3416e5b78c65c318b3916b5be6d92896c88,10.0.0.4:443 +/a489b81fdc1b4020b4bd3af8b759466c9023cbe2,10.0.0.4:443 +/6a6b490f616c8b52c4548b3c7f46d527662bcb5e,10.0.0.4:443 +/a607e4a8ef834a9ba7575729460fb697f5247cf8,10.0.0.2:443 +/0ea9ba29578fd164f5942d037366c3c6768bdda2,10.0.0.4:443 +/d9ba5006e81ac5128c20823155f4cfe991598179,10.0.0.4:443 +/b691e1e01b2f64c9c584bcec928345657b95d293,10.0.0.7:443 +/44710040aaa84b18986e3f16bc18e3c6ae63169b,10.0.0.7:443 +/9ffc3d11bf5b67e6e316646e64f1b703dd79bd94,10.0.0.1:443 +/1e2905288bed5c9b4c3f03cd6daeede56c2ffada,10.0.0.1:443 +/1d54d79aa491a886dfda64a7780b04161d1481a0,10.0.0.4:443 +/318d2d5e7b7347704e8a0552cbcb01adaef758a9,10.0.0.7:443 +/22b7965fa39aba2195e1fdc1e845e01f378c3d51,10.0.0.8:443 +/3cb017f3b5edf03f68ba84c5e8671e959273dd69,10.0.0.5:443 +/60c48c28f0ffdbe1225b02ef2fabd26364126d73,10.0.0.9:443 +/50330c984bb2167761149c43868e27401c678dd8,10.0.0.7:443 +/ee359ac7b7e76cbc38f5a70cfb4a7323161e8519,10.0.0.9:443 +/6dc489dc87cb4c097812bcbffa666f1d3a06920b,10.0.0.6:443 +/b084378cffd7ba654dd33d13ae00915abddb5acb,10.0.0.8:443 +/de8d58cf5b06da64685d35ea3d85692dd98443ba,10.0.0.2:443 +/926ad3a9d1411e8f959f8a99270e534c2fcc60e7,10.0.0.1:443 +/6f5efb290d2fc1f49af390123a8d45242535ba6c,10.0.0.1:443 +/be0651f76c2d6b3fea26840eb42ff84603fdeb91,10.0.0.2:443 +/664dfe003d4a68cd9502992896f6698ac2930b2f,10.0.0.2:443 +/d5c1076ec326bbbf73e06d110ddec314bab4901e,10.0.0.6:443 +/12bb31b94377990b52936138e755d30e1df5c4ad,10.0.0.9:443 +/e4e3d4a7766b7ef0038e05c72d7c60c853902d78,10.0.0.6:443 +/f1ace48a0bd8b225c5b05a4f310ad6146caab520,10.0.0.4:443 +/d6f4e19fb0dd901454337ea0a62a671f7e731b1f,10.0.0.8:443 +/e7e149ae5cc3b8d3432a74d2a5eff884304f149a,10.0.0.8:443 +/55f9309e65ad34333635e2392170a2d9e8e80f62,10.0.0.4:443 +/90cc256a42fe5c65d3b1c261b6713aac2d668405,10.0.0.1:443 +/b9530da73b2c2287f52955290ec2dbc0e2cd197d,10.0.0.3:443 +/45a8ce0efc2197c63f1b0feddd1d0ea3fb62cb62,10.0.0.3:443 +/af5eb18f6287ead6b0d8d9d4f740916eef27b921,10.0.0.9:443 
+/7d4c22a4d647cfc95bc563ce3f0d56217623ea40,10.0.0.8:443 +/6f6d1b8ccb3515ea7a4dfbba821b7e24f0af890e,10.0.0.8:443 +/448153aa0816ea4c11fd433e4011279a9e911319,10.0.0.9:443 +/e1db3b77b5891866a1bb0f136e9a3b6f1a8fd7ac,10.0.0.4:443 +/8e5ddebd1bc756119aaa55d408583bf07c6947f4,10.0.0.1:443 +/cb7c13c7fd7c12a0226dca64d01f154380b26ad5,10.0.0.1:443 +/5b5987f50249480e48251ab6f848580d4dc69372,10.0.0.6:443 +/a81f2d7b7e3640837acc6d0532aa52974371215d,10.0.0.6:443 +/4fb00e1ae798e403dee221592d1b63d8a23ec7ee,10.0.0.4:443 +/06ecb7c80ee19fc3dcdde75659cbf0fc1d03d5f7,10.0.0.1:443 +/4c3644f13f8b64a3919d4c8a53387f102f8efbf1,10.0.0.1:443 +/82d78d0be6134b4da73a81a442ce4c5277d23d8c,10.0.0.3:443 +/968765803dd3e6461c0ff78f95342c0c113e4991,10.0.0.5:443 +/b49bd6e5e66970eb7378c37dde945cf542601636,10.0.0.6:443 +/7789c5109794ebfb0fe72e3bdc2fc29baa58e537,10.0.0.8:443 +/29857840ad23d55c82c93cbf7971b416e118985b,10.0.0.7:443 +/9aab0c35922eadd6e1cf5512af5272b20ff87638,10.0.0.6:443 +/094cd8c29e8621928b163ba64c52752f9dfe5ce2,10.0.0.5:443 +/f56b7ae1b06eff812326334a68cee675f3bf390b,10.0.0.5:443 +/a5a029d6c0f758aab2eb8fb56594a0e2e118a8a4,10.0.0.4:443 +/fe56c1fc75e419f220ea6bf06af6f5321c895f6d,10.0.0.6:443 +/0d5b4bab87051120251a86b92db52e12582ffdd1,10.0.0.1:443 +/34f43611722818ef7baa47b8d07067658765aab7,10.0.0.4:443 +/e70ae66289144a1adae07ea70f507767d57e08d2,10.0.0.9:443 +/3f10a40788eb0581f47d1f7ba7c57cdd6beef74f,10.0.0.5:443 +/2d58723eabb6c5adfb1fa239e34c30da32f117a3,10.0.0.6:443 +/efc167a6366a437ba465c1fc71ec28341082ce94,10.0.0.3:443 +/0bfb80585b1a0fa49c2c863ab39dbe06599681b0,10.0.0.5:443 +/57febded047c2a5455d7662d4967ac0666fb58bf,10.0.0.8:443 +/f06e91822c1eb12e9c560453b2947d45c961ebd5,10.0.0.5:443 +/2292f8823bd4364769bfdd6ef106c6186082a345,10.0.0.1:443 +/dd37a086d19829dfff28195b3b63a197193f154c,10.0.0.1:443 +/74f336292ef253b07abdce03a0f0d47ac6482062,10.0.0.8:443 +/22eb9772d6538887bfc57bad65ef8a72d7ede1d6,10.0.0.7:443 +/a1df865f8909812c4a83e29efe67460c6756bc17,10.0.0.5:443 +/82f35bed949dc360e05c3a60f8d30970f9f30fcd,10.0.0.7:443 +/e6d5dc89bb8d677be8c2dc6cc1f164a254e4436d,10.0.0.3:443 +/dc64f2b2db6cbfb3129d83d0a5ca4d18d45ac35b,10.0.0.9:443 +/d7b8e1a5f574730502d6328885ffbe76a93726cb,10.0.0.5:443 +/e98e4eec7b294b9be385e3f87f5a5a86c4b202fd,10.0.0.7:443 +/7c78c7e886c0a735c2043ceffd442e7a3df6de18,10.0.0.7:443 +/53fcdac505c2e9d61d3df43336fbed32d4d52274,10.0.0.1:443 +/e4e48c52272aeb71e75533c0447168a17dd0a8ac,10.0.0.2:443 +/ea3361f8b5124bc65fe4273df0c03177066e8f1a,10.0.0.1:443 +/7db8dcb86856a592128f019df1696d1bd8f0729f,10.0.0.9:443 +/b8b33e4984ab558303db0b2481d8b6ab52bbeba6,10.0.0.1:443 +/a61755fc46e10c87e457c3fe06a46ca261624b33,10.0.0.1:443 +/74e9e54ea37b089a100561e9c89852a47d738657,10.0.0.1:443 +/7daddba4e4b97331ab3e47976295aa522c7622bf,10.0.0.6:443 +/9a4a68b8ccea854a22b1064b2b647270161df2b3,10.0.0.2:443 +/5b42f6b8ac65a644c8203a0e1f44d452653d24c4,10.0.0.3:443 +/2ae53235a7dd1ee0ae2e0187969e33daae6466e1,10.0.0.5:443 +/cedffeffa4fedb95411a03e7abd3e7771982e5b8,10.0.0.6:443 +/56a5eb3af8c0de47c03da5850dd5a41bca391bf4,10.0.0.9:443 +/8770f96efc90e2c044835618c1fbc21f784148fb,10.0.0.4:443 +/77b4bb2780a93a397dc469fd804b05a3fe6443a3,10.0.0.3:443 +/666b5de9ea6b7768dc6638f1426b5292465de061,10.0.0.1:443 +/793233793175f196f861bbd781cffedc1dfe1649,10.0.0.8:443 +/6e6c4ca7b99531e8ed151f023d3f54dbc1aa72b9,10.0.0.1:443 +/a144b632b672fed3d6d3cdc98d4b664993ae2c3d,10.0.0.7:443 +/074d306725d881f0acd57ed9f08ab85976adef84,10.0.0.3:443 +/c9c04880224d2977fa60a0eef06d1c795fe6a423,10.0.0.9:443 +/d20ffc4abee9257d5dc1fca42c53f6db63a2a664,10.0.0.6:443 
+/316bdd0350db3c95c148e66a291f23d382251f92,10.0.0.3:443 +/a91ec91455f9c64ff666f271a7697c41b6777c40,10.0.0.5:443 +/ebaa896e268efc129dfb87e51c06b69c0d3d10b5,10.0.0.9:443 +/baa5d67516c1a6346ab05f10821c7ad25d297cdf,10.0.0.7:443 +/e8a049830e420d8502891a00eb55982ba0e9fd8a,10.0.0.2:443 +/2885644f1735ca86b002e623d2ea6824977b3386,10.0.0.7:443 +/85a0370ec94aca8895951954ca810d3b06b1f7af,10.0.0.4:443 +/4eda15a4a13e529b2aea970a1b3591e537b8c906,10.0.0.3:443 +/bfa3f4f77e6f65a87448d260eb7647b6e02cc8ca,10.0.0.5:443 +/599ca6d3012a3aa53854fb5e0d25c303d622e8d3,10.0.0.6:443 +/2332508d304442b6a6b4cebe557be963e31aa07b,10.0.0.3:443 +/0c32f97e9ca195ab9eba167702fb7af9f9809bc2,10.0.0.6:443 +/0c4ad0a80bdb18e6a82233a8ebf2103e72cda899,10.0.0.1:443 +/56e0aaba9e15a92b4ef584d3e8ebe091c52e2f60,10.0.0.4:443 +/0adf617e084382c2a23249aa985cd8b8acdaa65b,10.0.0.3:443 +/ed18addfd26398806fca2a8a2fe14bbe090fe0e0,10.0.0.5:443 +/d1a2ba53f3b93e000e644d5b1ad8c6854ea9f65d,10.0.0.6:443 +/7107620d6a0be698aac43ffc25f86eae6f53a2e1,10.0.0.8:443 +/3e4ae6d93c0251c47b243273c3cfa6a8e75561f1,10.0.0.8:443 +/d4c0a3bd89b4dddf0a6ea0b0527290ea09eb5a9c,10.0.0.1:443 +/152e9ed59081b05a44434e96acb10841a6ac4913,10.0.0.9:443 +/37165697fc6ef78a2f77de34e332378708c645b8,10.0.0.6:443 +/a4e2739e0c203344e3243ecd1085ae55c3e80e7a,10.0.0.8:443 +/23795bea08d22a82aeb2fdde5cf561d439358287,10.0.0.8:443 +/dfec4908e467ca09908796e4adaa92aa9483b5c3,10.0.0.3:443 +/ee0b2b4442cf36ae3c29d01c815bfa6cfe48fdfc,10.0.0.1:443 +/cb78173624dc4cb41a7b0237d17f7af6a083ea18,10.0.0.2:443 +/d739f678a0da873018a2c748a92ee4719fce3a0f,10.0.0.4:443 +/2323b27ff2ca01083ac32f731fbcd4d100e93e19,10.0.0.4:443 +/d499e2ec08b34a0bf998c14c887b383265034c9f,10.0.0.9:443 +/143084dfd65a8bdcd57fb64651f96008b040087b,10.0.0.6:443 +/97eaf99e3029d498617ea5154d783a7eb5c70c48,10.0.0.4:443 +/33a3ea00c963b4f990846722d410b76a95b8de58,10.0.0.7:443 +/74102b50e78c7af16e679839dcbda2e0372b6007,10.0.0.7:443 +/98e74ce3d20c7897ba902ddb716d77ccc12d7184,10.0.0.5:443 +/04675a8dc4351a5aefbd2ca04b6fc110d396a968,10.0.0.6:443 +/fc9d0c65157c42a6e4faaaaeeed021e97eba2829,10.0.0.1:443 +/e79898b6d8400872a5e147b49e9a639756cbe950,10.0.0.4:443 +/383d327d7129f3b712ff73766f8ce42c74e53b15,10.0.0.3:443 +/210ecfe8a674dc8a5ce6e0ff2cecca5f78769a23,10.0.0.8:443 +/0ceddf293af83e1c0b20a2f951c249dae970dac7,10.0.0.1:443 +/35f0c2c9af248df96020a501f0cd01c35750cbc0,10.0.0.8:443 +/52bc55a205fa328543f03294d9ed5417820fa601,10.0.0.6:443 +/3e4431ffbb010af04dd776032e7ac6f59a3b104c,10.0.0.8:443 +/4d9b2f4f469f2b2b5f3962306f97e75859c8d936,10.0.0.8:443 +/56f073a5a58f509bd491f8884fb340a935f40308,10.0.0.4:443 +/3c2b58f0da1000a8a78a3277b7fb5b1cfcd849a6,10.0.0.8:443 +/928720a69173cbfedccbcfc04b198414b5164f00,10.0.0.6:443 +/d87b50fec3d063f2a71d17c82d042c2abdcb7017,10.0.0.1:443 +/3f56709086bde9189226f4c98d6debd284176644,10.0.0.2:443 +/cdb098b71941c131bdafe7123c569453e08875b4,10.0.0.4:443 +/75dc829b5562bf143582d8ebed2766892438fdf6,10.0.0.5:443 +/e92003125b6bf378ea6279b52c744bbfa947c7ff,10.0.0.9:443 +/43601fdc2b42ea10e6eddf5ad078d9959591b054,10.0.0.1:443 +/4de1baf0df9a5653c30ba33552d107c8050ba65a,10.0.0.4:443 +/d8d7b30cd8c473d9dfccf65626bdcaf093524fab,10.0.0.9:443 +/44486486ce00b0c97d44bdaa8e9ad0cd4cef2f24,10.0.0.4:443 +/2f2b76cd6ceb9d1942d5e9ef0f887e2447426de1,10.0.0.3:443 +/0f4b330a57a82f31e00455ef74555b31f5bed5a5,10.0.0.3:443 +/1ac2f42face0e1b71f40775e804816f562f5046b,10.0.0.4:443 +/09d09a595f2bf13257e07133ac78186f35555584,10.0.0.5:443 +/0110e2fe94b7def844f20606e2c8456d3502bccf,10.0.0.3:443 +/e3b4ef341806cc37ca73c853f7fb946e1f5f6e8e,10.0.0.7:443 
+/38e28374b1a9f41bc73cdfce491eb2bb0bce6fbd,10.0.0.3:443 +/d1713989a361ca1eed0f5f68fee5c50f91854b5f,10.0.0.3:443 +/f23af6f472102edef6de9491f9a844070ba3029d,10.0.0.9:443 +/f196a713839baae0bc51f4b1436927d36f2ff7d3,10.0.0.1:443 +/7598e8238744d04076326a8e3597578f9ecc8c09,10.0.0.1:443 +/59a02561ca435d78ab8aa7422feabf8773dec410,10.0.0.7:443 +/ca62f7298625dc574326a50ad33a4538559fc47a,10.0.0.5:443 +/04bb8594fabb6dab24d9913b98cf38d0563f83b7,10.0.0.6:443 +/8841711058feec5187329165955a9c60407f83fc,10.0.0.4:443 +/4d884355dd150df67499ef203c01635a229ba6c7,10.0.0.2:443 +/071866a73e91187306c1e407f0115a4d4151aeb6,10.0.0.2:443 +/afd57a92e88c0328fb968edd941b4100c6c9fb61,10.0.0.5:443 +/531c345d706cd574388c37ad61ba205c406b1d59,10.0.0.3:443 +/4502d8a61fb1a981fab9999a469dbac6e25e7f05,10.0.0.1:443 +/e230fc4ffa3bc256581c8eadd173e1cbc73b0cfe,10.0.0.1:443 +/05f427fdba3ff5f8d3039d3630772f161b8c9f14,10.0.0.3:443 +/911e4045fe7a9e356c9a4a83edb7675844e71e1d,10.0.0.1:443 +/6e23a5865d2058af3619068245ead5e51618e0c7,10.0.0.4:443 +/d1a0e473ffe13801ea65171346ad99201c498078,10.0.0.3:443 +/815a0974398d76788a4b3da7d0cca31f126d7de4,10.0.0.9:443 +/4b5e04ef7002bbde1ab4076a838dec1de1d26f39,10.0.0.1:443 +/524cccc30582e8a63d8312af6635e22baae5cc6c,10.0.0.6:443 +/96b6e633b7128d91860de1ac281c3080fc3abafa,10.0.0.9:443 +/bd9d0ab3293f97f2d0723f68fd28f15bd536c156,10.0.0.8:443 +/47e8509e8d7eb07744a7bbb065e51a64af82fa8e,10.0.0.3:443 +/7186ce2f19770360dfb9ecd97b89e16efac316d1,10.0.0.9:443 +/eed6c4e6883b62cdc3a7b196d99273557b071dd2,10.0.0.2:443 +/b96d9c4be72a2b9492c2d7317b95e9e8126b5c22,10.0.0.3:443 +/548f0aba642aad540d75ba5179b4c4e0769d3f1c,10.0.0.9:443 +/26584bcf23d47c9d064c9f80f8e428b694aca25d,10.0.0.9:443 +/e2037c8f1d790b8c725d9b2935056043ecbd8e93,10.0.0.2:443 +/7eba0adbc91b2e122b190de42e4560a2157b112d,10.0.0.8:443 +/9146d10cd7ab98d06cd0f4b4197602778d6e4140,10.0.0.8:443 +/9fa77d555984524b0361fb3072036e91cc6ea3f3,10.0.0.8:443 +/52f48dffdc8736b0c0e71df9a3c0f80e88770725,10.0.0.3:443 +/2265098f5d88d6d99acdb58cdcc1259d928625cc,10.0.0.9:443 +/d6192927efab74c222d087ad8dc2e820dcb7f6ea,10.0.0.4:443 +/18d1d3a2d5ed8d726aef5302787d369b79781866,10.0.0.4:443 +/19c735d6358604129199dcd752ff41f4221fa57e,10.0.0.5:443 +/b351359d0d09616a3fac0e66ae77e26357781753,10.0.0.6:443 +/6dd6111dd3ca9ad06569393e50484d3347035fa0,10.0.0.6:443 +/93e3738859d55cbde6df33ff6cb633a9f80dc0fa,10.0.0.9:443 +/eedde9fe113eb083a5d3c2508057360ddc18686b,10.0.0.7:443 +/62a3e2d15c1c66eb816f3dc13b5827984b9f699b,10.0.0.9:443 +/f1b4e2ea07bb5ed2f47433b75d1f7c8ef5bbf7d8,10.0.0.9:443 +/fa62a18b60e7385df8024532e7aeef87cc1df239,10.0.0.4:443 +/7d1aa3efc0114d39a9f010b900e1db11151e639d,10.0.0.6:443 +/f0468f9ae142e357977988238dd0688c49773aa0,10.0.0.4:443 +/d8b5c9789b2aab6444a2ef1d8cef7ae774458ab8,10.0.0.4:443 +/0a85a262365cc25072bfafe6136a7ba369566112,10.0.0.6:443 +/4dded059c42653a7c1beaa541e63645bf551ad08,10.0.0.2:443 +/49bd739af1ec870702c3847e967e8c6b435c5329,10.0.0.2:443 +/8d95d2e2928f6c92b660ec1c1fa7df8ff813aec6,10.0.0.4:443 +/93bf75656856b0a55da0773071dc39089dad6c98,10.0.0.3:443 +/e3390ff69032d41cc118b525049af3a5b802fae7,10.0.0.2:443 +/3150a053fd0dbbb2d248eebe951137ca8078f1d2,10.0.0.2:443 +/3f9d225fb79822a7b27f99d8d201ca1a45bb2ab6,10.0.0.2:443 +/452402b171258fcd3a5b107c1c939518102d2470,10.0.0.4:443 +/08317e0decede1f945dc9e1fe173bb4f6fb8cfd6,10.0.0.3:443 +/2cb847f4fbc26879a7b38289fb9add3bdc92a61b,10.0.0.4:443 +/42ba7f638d09315bd314f1ecb6cbaba154074a68,10.0.0.5:443 +/c0370eebe06344737bd403540f6adb13609b884f,10.0.0.8:443 +/9c71526d82a2a81ac13581c5f93125202f95aff2,10.0.0.3:443 
+/e334fd8fafc2cb8809436c3e95e285d8c6079a94,10.0.0.7:443 +/c396727f862d04377b9b67299231d1d64d3d208a,10.0.0.3:443 +/6e51035a8d1f3bdd7f1732fd749ef13732a95944,10.0.0.8:443 +/785266cb2e5abbf8e8d68ed6944c44bc8784e83d,10.0.0.7:443 +/212094a923c2cc9abba620361a83ed32ffc89562,10.0.0.4:443 +/139a4d3ff8c15c8bbe453dc8b51a4369a9761129,10.0.0.8:443 +/990a8cbd6640ba4c5002b6e139b1ddc0784f687d,10.0.0.9:443 +/a6fb42bfd3197c44719ed46faf9d90cb06548ad8,10.0.0.7:443 +/000c6423f358a2c3d3f495b13d62c0d2c39f50bf,10.0.0.1:443 +/1cec05fac01fdaacba7ce488edd944661542c006,10.0.0.3:443 +/271f90733f7434e025ebc1f28267e53433857e1a,10.0.0.7:443 +/39e38084ed794b098342c8e73c50eb6e8a473451,10.0.0.3:443 +/391aed0b4c3552fe6045ee9e9fa5b1d749dad8f8,10.0.0.5:443 +/6f08ffaed414be3b69721e7f6d32cdbf708af3b3,10.0.0.9:443 +/e70ea65e0acabc146426f773ee43d27693b291cb,10.0.0.3:443 +/b4e2cf556861071282564ee2f08e188fac4c5253,10.0.0.6:443 +/6e68bc22256a8e097ae14c5309b968eac3ff115b,10.0.0.9:443 +/47960ff89fcae7481243ffa5164fd694546f238a,10.0.0.3:443 +/f574bef3f84ad106c19f59e9e9e02b531545144b,10.0.0.2:443 +/4c3e44af33a9121771dfc8ec0f9315ac44133350,10.0.0.9:443 +/b15cfc1da87f56a87bd209364583cb5ab32236dc,10.0.0.4:443 +/40fa370749b2451bea54a014f1424acf8f26689f,10.0.0.5:443 +/2b427b399d5020c4422e0795378c785ddb7294c2,10.0.0.1:443 +/bc6adfe886d10fe6eafb996229582b4bfc8d222a,10.0.0.8:443 +/f7b4e0538be13ab1bf2d07e6811a503a9017d9f3,10.0.0.2:443 +/5e89704df5e29e7cf5d66780e8c21ee43976a437,10.0.0.4:443 +/7b83fa2b9257a4c2d3130dca607ec04fb042a867,10.0.0.6:443 +/a3b86bc98a2d6f3ffa6572d699127b0bddedfed8,10.0.0.8:443 +/e4d0162d0642d1b8c48d405df85e19a38f0feeb3,10.0.0.7:443 +/4d485cce05cbc8a437c1c0935d481a7bea42fca5,10.0.0.2:443 +/845b67ac4a0134e07b0a024cc4e23712b6e6aa9b,10.0.0.2:443 +/b79bb4986238b94afdd2cb060f5552eef7ff54e6,10.0.0.3:443 +/af07fafd8a759a1cf5a609d578639be03caea26d,10.0.0.2:443 +/81af00d61c1ef5b0613ad4beb2dd25f89efc6182,10.0.0.8:443 +/593136c41899f34390c6d37d16edbbb9cfff7abf,10.0.0.7:443 +/cf81d9fb40a3fc64d11770126f3a699d70eb0feb,10.0.0.7:443 +/63c546d9b325c23c3861555b1e605cc203079992,10.0.0.7:443 +/4ca3fa4d44be023ccfa77e4f77f68fa0b8ea7710,10.0.0.5:443 +/e701444e1b1d9f079e9c879d8f999d88b56d68e4,10.0.0.5:443 +/f6c473391fa230f190cdda902f793977ce7de8ad,10.0.0.4:443 +/7bc73bd36bbd732b25e146e2f1b01003eb159775,10.0.0.6:443 +/5f287e6957a09c1be550752505a5c1420ec8d3bc,10.0.0.1:443 +/bebd964d7ca6eddee7de35715ebd96afa29fc8e2,10.0.0.2:443 +/db7a5038c8f2d930653563a2e523683c0a93bc88,10.0.0.1:443 +/47a642c6b0895911ca70f994f506c415505d75e1,10.0.0.9:443 +/8c5554e66fa04d50fb84d5f49ace6de0852a34da,10.0.0.6:443 +/89923f03ca432e5969fd0cdf5c15bb70b98efe2c,10.0.0.3:443 +/7a0aa569021d8b2b2ce9bfcb07310c1ebfad5151,10.0.0.1:443 +/0937e417478b8173a2bb86dc7b8b9454e059de4d,10.0.0.3:443 +/188c99ab8d7267ee33a9a5acc8de76bd58898b45,10.0.0.2:443 +/c4a8f4fb2483266c7166285df63f3640fa2319ae,10.0.0.8:443 +/4b0d34f033113d93ecf921c6a0feb0b2d45fb126,10.0.0.9:443 +/fb42de2cd7c51cacd89d7575bccd67f7ec25e451,10.0.0.4:443 +/028a4c7b547864a379e75c95c587ce64202181dc,10.0.0.8:443 +/cc5a526f4da916ea0a13b85e0c6e98b3d7437cc2,10.0.0.3:443 +/9113f453c2c434907b4ff773fe5b86f21e382416,10.0.0.1:443 +/830245306057d4c72845188ee1bcd3faec480fa2,10.0.0.1:443 +/81d4b64d141924bb939f5379c79f68c8f73f6861,10.0.0.5:443 +/be650948e3ec4fb93c66085f4ccf5b03ab52da7d,10.0.0.3:443 +/dbf7fb2f6d1c71680d904a7982c255d38d9cb66d,10.0.0.6:443 +/df2fd0964b001d47e7d7d8f704997f8b2c188e2b,10.0.0.3:443 +/52599e6c628d3713b5a86776e7bf4e8eba08096b,10.0.0.8:443 +/9a8dfa9f83cddf2f78d3dfaf2977d373250d08bc,10.0.0.6:443 
+/f6fff2fc64e81812c1fd6f44b17c0fca8f4d9b21,10.0.0.7:443 +/c74b67ef81c0f0159ebf55bdb7cec7edbb49ef01,10.0.0.7:443 +/1d489c78f83a538058f88a86d5d3b784a4c8c15b,10.0.0.3:443 +/c19e2d12a05580a32ed36959def7a94542f3e4b2,10.0.0.2:443 +/0ee2827f11b4ec40f5b9461cafbafc67c9bb19a4,10.0.0.6:443 +/6fb7e55aa135073aafdaee252c0d41711f0012cf,10.0.0.2:443 +/c31f4f1035ecf919c2a8ad9aaba8f34787b6dbfc,10.0.0.5:443 +/da600775aba95b86e525802936ee1080854c9171,10.0.0.7:443 +/713544fa205d177ae8892dcf355b3d9099d36346,10.0.0.9:443 +/7c838304cffcd5141ea13dc98ebe74d3c9fb0b68,10.0.0.5:443 +/1dfa23ab97268feb6b3fd019ff226fbcd628995c,10.0.0.2:443 +/525a23553d91d72be5ae26127029e9e3bdfdb6d7,10.0.0.9:443 +/963f18f8446ec2ab1921e0579c7af87477ef3592,10.0.0.3:443 +/64b104d6efe390c5e8a008312e16772cc76632e8,10.0.0.2:443 +/de3125db331f2c903e96f273a649f782abde3250,10.0.0.7:443 +/fb2452ca4688e9fa8177919d72032717ac19d692,10.0.0.4:443 +/76e8bbc123c7346c3398619f12d9888fee50811e,10.0.0.8:443 +/d6f9d8096fe75dd35f317b7d1ce0bc02306f6121,10.0.0.3:443 +/1e543c13b6af55fc60e9ba216790bccda9c2c3d1,10.0.0.2:443 +/5f5831dbaeba200adb52eefeb26d792cf996a01f,10.0.0.3:443 +/60ac99dd1fe31198399c6b447438e8ac62a35036,10.0.0.4:443 +/579077bb8a893f47c1b1eca7d8cd2d62c8a3b774,10.0.0.2:443 +/86590bffd69757efb277e2b6db69385bbb6ebba3,10.0.0.7:443 +/f829ea6c46b59cde7ba3c6745b58c3ea170bc73b,10.0.0.3:443 +/27f54b7138336175705207561c6a0a123049ab66,10.0.0.7:443 +/fab8ba44a38c657d3b8f9066daac9f3222e3fd50,10.0.0.3:443 +/03540d372c8e6621cf57ecb910cf872164f9c53a,10.0.0.9:443 +/be320aecabb73286f782df249b53ef4f7c816b72,10.0.0.1:443 +/7e383ecf09c026dec3eb9c77ca35274f209748a3,10.0.0.2:443 +/749dd7577081e9d47a3bf9ef091b6e5b8daa5786,10.0.0.3:443 +/9507f34faa73988fb0d35f65141b2e1b0f407cd7,10.0.0.3:443 +/a2188b80856ef974da6eac0238a35afef2e1c30c,10.0.0.6:443 +/1ca972ed159b7e8c838752ce60186262a923d057,10.0.0.4:443 +/57e7abe6d05fe0443800929dcf46315bc5135519,10.0.0.4:443 +/075a1e1426a5d210fd5501249809b1e20c793c92,10.0.0.2:443 +/36c50cfcf56f7e15f91844b9f3201efa4cc26fc8,10.0.0.5:443 +/f824965edace119738da671b1e3cba0e44be224d,10.0.0.5:443 +/ba49bc86400cd3970ea82550b493ac21a5bf750c,10.0.0.6:443 +/2dbc4c595e56af88a175811e715d908d04223c2b,10.0.0.1:443 +/e322f439382552e76fb5672448c2fc6beb56e32f,10.0.0.8:443 +/e86b62c9f240c97f1503be26f97fbd412b846e2d,10.0.0.3:443 +/e15c58a856824298802fed4151234e2a4812697a,10.0.0.7:443 +/e7b5cbc5701beadb3c8dc04361cdd1de8f8b0d1d,10.0.0.3:443 +/bf317994d21c8731f4afdebb4d77e8eb997a962b,10.0.0.5:443 +/cabb2f8ffcd78c654c2eb0bbecd32a4a5523c165,10.0.0.8:443 +/17e54d3b00f5b989798846b7729789343df86d56,10.0.0.2:443 +/334fcdca67212b01973f946729a2cb569adec510,10.0.0.3:443 +/a5d301755197c85985ee48ec53b4c3a1b6bfd9fe,10.0.0.1:443 +/f2b98e19ef44d81d16fce5e9e3b51c3be2a8c713,10.0.0.2:443 +/6a69c385697c7ed6b85ced626d993b02216c2186,10.0.0.6:443 +/7d269fcf2722875d9d75eb2e3a7e0fae6ae69a98,10.0.0.5:443 +/126e39e079fa7f2b254146466e9ab79dc4347be6,10.0.0.6:443 +/8a4aa1d8a6d6a2938ecf7a42814723e282a90af7,10.0.0.1:443 +/b9d9fabb46dbe2db5f5e0164c98002114c96006e,10.0.0.2:443 +/d50477758d2f8e99392ae9d51ecef57484c97c5c,10.0.0.1:443 +/09ac262b5b37a5b19fa989e5ac75722fdb8e17fa,10.0.0.1:443 +/e1a322b3fc40fbc296f9ed8dc86d6e91c6dc8edb,10.0.0.7:443 +/1c08ea2a0c42f17d99acbcb2b0ffd33ba1fb34be,10.0.0.8:443 +/6aeefd631b7ddc01936b7fc11f1e8aa5a892bec0,10.0.0.4:443 +/f579e4e7a050029dd9f21a38104c1f1b65e201e5,10.0.0.3:443 +/9764ed5aff70cac5a5859415251c0b78dfd37909,10.0.0.7:443 +/42d5cfb85dba89806546ac382f61fd909cdd40e9,10.0.0.2:443 +/4cf6e01e948d9cd30032b5ce77d2d8b2daaf21c8,10.0.0.2:443 
+/e0367d6d11e81aaed084aceac24b327abf053c0f,10.0.0.2:443 +/28b88a8f6ce2d2b9bd5867f89555f2e43dc96c96,10.0.0.8:443 +/56b207c041ede2daa065d56b03173663eb6a1f2b,10.0.0.9:443 +/6aa844c78eb0fb44bd506a920360377163eb7150,10.0.0.9:443 +/327dabc969eb616dc10e4ba33f45d0cc97bf79d9,10.0.0.4:443 +/7352ac0a60c2f2bedb0d80b0249db5aa59e0c876,10.0.0.4:443 +/70a2d83045053301afcd56deb30bf9095a9ad9d6,10.0.0.6:443 +/cbd05a3e43fb4b1aa98283c8d460571c43144391,10.0.0.9:443 +/0c903f04b7f6d4a59a97c4ae212a41ad579e5a1f,10.0.0.1:443 +/3dc7749f4a1f30ab1da8b4e2722d3228b0e8b33d,10.0.0.7:443 +/b502f7505431f9b895c9fe04de74250f27f96f7f,10.0.0.7:443 +/14aa1d4b3ddf14d7f5cba9de1f08213a8a104131,10.0.0.9:443 +/cce74a90b51c414ffbb28ad26794bf589c1eaf82,10.0.0.2:443 +/a86f00b80c039be1e4fad0b5508bd1710ba08ae1,10.0.0.4:443 +/055b3843846da3e34606dc209c9b30192e6119da,10.0.0.4:443 +/f05c831872443fb183c6c19d1a9dbe45315f0205,10.0.0.4:443 +/d974f0525fca48f3fae6d6cbd0b394e4324ad422,10.0.0.3:443 +/00ef799f7c152055a4163c41bb9bcd37fa375f77,10.0.0.8:443 +/7d876337b6f86b9873ccc49f7efaae91d180a06d,10.0.0.4:443 +/93e9efe1dba728a9a51d68024eddf764903cac1f,10.0.0.3:443 +/891d1f6116aec61df9dce652743ae159861e9160,10.0.0.5:443 +/eec92486c171c84f0dcf3db001ce9b32615166f3,10.0.0.2:443 +/363f23c9bea6cac01f85f7ec7142110363110c83,10.0.0.7:443 +/51c699024eca637817d2bdad512b9c1d19fafec4,10.0.0.6:443 +/b7aaa2cd36505f55bb047799649acadc5e20e499,10.0.0.8:443 +/b17b2a0be4fa445a0f762a92dee8b8d83e992eae,10.0.0.8:443 +/e4c7719fd87bd86bf5c818a7aa38af4d50691d51,10.0.0.6:443 +/edf7c0e6f7c1bbdc7bc67912607a32ff5ea786fc,10.0.0.7:443 +/62d1fb98dbf4825ae9db6f919233f32391ce1e63,10.0.0.8:443 +/e0214b5cf9fb6464dfd37a84fd41a759e571a531,10.0.0.7:443 +/bf7e606b9858081dd5f7f2346f70e314238809e3,10.0.0.9:443 +/f4907d05ffb94d07ea2a1eb8dc187268cd386f0e,10.0.0.2:443 +/cdfc8d53ae78926694e5d1ebe872d30f7e57b324,10.0.0.5:443 +/0de8c668d5445e80ca5d0ece0e653fc664b494e8,10.0.0.3:443 +/fafc8ecbe6fc726999a249b9a2bfcb52371dec5b,10.0.0.2:443 +/ec395c49cff03d52796f59fabdabd045eac000c2,10.0.0.3:443 +/0bfbcb5addb0f60e28504de36a10abb80d253011,10.0.0.3:443 +/6bfbff0669887fbad3bc871771c62f9c5d723740,10.0.0.3:443 +/d0b0e5ce39dbfd7a3b46542c1088e0231295be11,10.0.0.9:443 +/79dc5f7b75fb82ed6dc0be947434d924ade41bc0,10.0.0.6:443 +/f729ac665760539349f1c808bee81b70d58537d2,10.0.0.8:443 +/fc97f5ede259d60ba9f28888ac3b2155b5490df6,10.0.0.1:443 +/0c5cb1e0dfed034af7c893de52841686a762f6aa,10.0.0.2:443 +/accc74f350aa74842630517d6abfed728d5da084,10.0.0.8:443 +/1c2e760244036ee1f0539288d446830a0fa38457,10.0.0.7:443 +/0f60c86359a38087105591e609f90b962b02b74d,10.0.0.4:443 +/9e26cafc4562d086fadce9cdf56c049e9cfd93b8,10.0.0.5:443 +/bce221dd780f255f84e159f1097ada3461ff5ab6,10.0.0.2:443 +/c918871122a23989009073c879ff5c6d3add13c9,10.0.0.2:443 +/50735adf2192b44d672b99cacf66e650582f9088,10.0.0.1:443 +/fb763e843db495284a217b2ec5027d7698bd49ca,10.0.0.1:443 +/ae068dab9d1784aa098085b9f62e168c79bb75cc,10.0.0.9:443 +/6b989c5c0af97c2acd640cb2f05dc4b4fe7c9323,10.0.0.4:443 +/fd81185d074101b22059f022b174d058b8d84a6e,10.0.0.8:443 +/20277b4366206f795ed97f4631baa755636ddc1e,10.0.0.6:443 +/c9199750bcb6c7b916958bee64f562547d6b1209,10.0.0.3:443 +/9ea2e79813eb3a6d7b710affe1b15f7da29c1800,10.0.0.2:443 +/2de1269fec587f989b911e4780eb056c310d2f38,10.0.0.3:443 +/6869b9dc65fbfb56fbe828c0ac57da2bc7c0f460,10.0.0.6:443 +/aab69be5e6c02f972e8a5e0e4c9716b84d91e667,10.0.0.6:443 +/f8bb7d358e7c5e58b5d68a29a2d1ca4cb1a4af61,10.0.0.6:443 +/441f8eab7ac90bb4022d35aca197b5765ecb7dbd,10.0.0.6:443 +/36bad018c5b41a8f3731267f364cbe74cee32a81,10.0.0.4:443 
+/8d31a4ecc03b0d8680ccef099ed528f07a5e8aa0,10.0.0.8:443 +/2e8ee9b09986f2e9435dc2b3b05d9353a8edfd3b,10.0.0.2:443 +/950dedb075f2e3b2bc485659e0158fa0c26511ee,10.0.0.7:443 +/2bff636db028f497c8d26cce0c138a5049e69db7,10.0.0.4:443 +/33353d31c296876c1267ae7793d0c03caea2a387,10.0.0.6:443 +/bf5e822a05f799f153c648ccad9987396c9350cd,10.0.0.8:443 +/92a14cc304c5fce82d3e4999fb1a819c92cfe2fb,10.0.0.9:443 +/06c1a1908f37b4cdacd7098b7304ced839f8412b,10.0.0.3:443 +/de42c498d10e019fad348d4e2e512afd8981a7b9,10.0.0.3:443 +/42fda236e32757240063ecb99681a30121629ab3,10.0.0.8:443 +/b93d832939ca288351cb707d8332ff7aade5c6d0,10.0.0.1:443 +/2fac7e24ec806b0b5b9f7bf99587e50357d9db04,10.0.0.9:443 +/8750e04a953b8d6a5922ccd6172d59052671207a,10.0.0.5:443 +/c5128cc73e48a0270a27e6e2b89f563dd756f987,10.0.0.6:443 +/4f798881963c3638cd89168511eccdc8f6ecc62f,10.0.0.9:443 +/330116fb139c1313bc78bf314a211d98dad8093e,10.0.0.1:443 +/d2ff19c61e6be34733295d7c2f81b18c81a687bb,10.0.0.5:443 +/5ff80f927eb571aa8cc006e0b7c237d8f571ffce,10.0.0.6:443 +/b2ec40a643fd05a9f0d52b180893c9837a824060,10.0.0.8:443 +/9445a32fc9ea94e27622826699123272ca5bb38a,10.0.0.8:443 +/7b12433171a2a61d85b95d8ffd9ddd7e8ba1f0d2,10.0.0.5:443 +/8f1cdae5a092253771e8d07f66ef45d6cc540c5e,10.0.0.6:443 +/7b70d96ff0b14c53aaf134e16cb0ac8394b9ec93,10.0.0.7:443 +/1c91cb4395964c049653b422be092ecc6957ccb1,10.0.0.3:443 +/1680013d16baaadb8cffb6838dcaacfa9ddf815b,10.0.0.5:443 +/a6cd614c3f4c92608f176ecbbd24d4d3e980c18e,10.0.0.5:443 +/5f1ad23d97b5effe56b07770e1b1e3eb9302ce32,10.0.0.9:443 +/4014ea7ccfafddaa137d536e55a4fcd04f4acd23,10.0.0.7:443 +/4fe5c55f2b57b57de6773d70b788f2950f98aedb,10.0.0.4:443 +/a5e42b24f4d2b631529dc5e693f0f44bec25d917,10.0.0.4:443 +/605b3c16dde75dde0ed96aa420121d491043b868,10.0.0.5:443 +/a44dfbc27a6071c7fc05ec6c9bcf08a80059d9fa,10.0.0.1:443 +/376db7678f97b6614c81a967ac480d2af1ec0494,10.0.0.9:443 +/7ab905b17ac0f3dabc9e73654f7108a60382b4bc,10.0.0.9:443 +/7e15729b3688eac3a344b412fb66660812e51f5a,10.0.0.9:443 +/fbe2e0d0d220635295397028926abcab95eae6f3,10.0.0.9:443 +/68c054186128694496e7e34307333f0ee0fcdd83,10.0.0.5:443 +/1539ed321544c684ffec1d533551c8ca3b814bed,10.0.0.2:443 +/52239e8766c814c60a588320ff4e55dc9fed5345,10.0.0.9:443 +/7dbfc596b9cf0c5fada4c6521d67d7e8d42f6fea,10.0.0.2:443 +/d04621fb5f44305c3532899899c6f53b6afd5a0e,10.0.0.9:443 +/581c7d2014ef9aa2a30e54ac748164ba66a534d1,10.0.0.8:443 +/d6c7154f9d5d3dadd5e890c8f7a1b178c4992eea,10.0.0.2:443 +/bbf232fd8d3a7f781b378f160e3fb629ce09ad59,10.0.0.5:443 +/5290410ff6812f651bac427d10fae8ec7a40af1f,10.0.0.9:443 +/41cfbbdf14643164ccb783e93eb123f23f2f32bc,10.0.0.8:443 +/9259ff15fbb6d028166e31434d4e2dc335c390c1,10.0.0.2:443 +/20258376b3bf0e3221118d0b5af2c4b8fa99fef6,10.0.0.6:443 +/7e34ebb4b42eafe0d92fb748745416bdcdb67bb6,10.0.0.2:443 +/2fe398814da4f25bf0f9f6982b43d955b93c9c4b,10.0.0.1:443 +/6bcc489d911d7e972b2decd5827d4eb628111a84,10.0.0.4:443 +/87c70ef625901e6ba2c9dd6951efa9964bebe528,10.0.0.1:443 +/3b79cc56f47bc4487a12e940caff6f4c294b633d,10.0.0.1:443 +/bc5e3f9cfd2ca035467321a7280dd2bf1250259c,10.0.0.4:443 +/260555bb77d941f8761b6a8b81267b6884da79d0,10.0.0.5:443 +/e77e4992d06c6b662602f32bf8a52b1567eb2f48,10.0.0.6:443 +/0923bd0601bf7c074d173268cbf94cff7b39045b,10.0.0.6:443 +/d7d8f74a97ab8b8f0df34c447e2ba2bf19a8a142,10.0.0.2:443 +/1151eda28cedf93b701b4ae0f47b35c488d493af,10.0.0.3:443 +/e4b07856919718e9c7cd00154fbb12c343bf5b55,10.0.0.6:443 +/d5751616b55b115bb0260fee64eef350e83cf080,10.0.0.1:443 +/83bcc3183e99dcba798f17ac6340722b1431ed0d,10.0.0.5:443 +/2e84b809ca1744c2abfe7d8ec18dc80adbddc0bc,10.0.0.2:443 
+/782631660d3fd29586a00922d9e1885c30615bbf,10.0.0.2:443 +/0b43b69669114bc2ef38ad7e03fae7a82ac22f29,10.0.0.5:443 +/59bb4679fc26333daec10a9a06ebe185ab252628,10.0.0.4:443 +/26b7bd3eae9d48dae1677893dc7efb7d7d9386ac,10.0.0.6:443 +/1842eded9ad6f8e951f4e788bbedd8ad9b5d2289,10.0.0.8:443 +/c0b818cdd2005b05932a6369a80d47993cf1982b,10.0.0.9:443 +/6f3b0ed59323086c2557c6ba6aeb433bd82509bb,10.0.0.1:443 +/bbe9eabb406b05b3b13ce634caf88123d095de39,10.0.0.3:443 +/fad30adc6c95657b9be1a490b338798fa08504c4,10.0.0.3:443 +/56a66dd5081af7ed9d9307374606245a751cceb7,10.0.0.4:443 +/356beada6689aaba95fafc526101a982acb007ae,10.0.0.9:443 +/c9917bb9de0eea13e66c71c88c35a03f3567bae2,10.0.0.1:443 +/495d1913174caaaae68eda040e329cf9ab6b807c,10.0.0.2:443 +/f89a11728d36429756ea52ed934a853ae2cf62a5,10.0.0.2:443 +/3263ba8199377065aafe566132716c2bf4fa300d,10.0.0.9:443 +/cf44e7fe43a615a8b58cf5c3b7486e95acf95904,10.0.0.5:443 +/2a055e911545b3eb8ca31202c16a916a061eb35c,10.0.0.7:443 +/120600f104f1305c96642a7382c7074386dc9f08,10.0.0.3:443 +/05690a98a3b475aed24b300bd1d8303c5f0ed93c,10.0.0.8:443 +/648371cafaf32c90a7c48ec2c6d57f84d4f81346,10.0.0.1:443 +/c346dd5aae927ff513fe5a8ece05ec7c2d5674d1,10.0.0.9:443 +/531b748b3a7ba01687e3eb6bff2df8951daaf1f4,10.0.0.5:443 +/c0935bb5ad0fad0b3f9ff4b924c01971116eac7a,10.0.0.7:443 +/2bf19d2be50af6207c977fa502caa8a6dd6722da,10.0.0.8:443 +/afc63845c577fb67edb01f2f4faa19415f36420d,10.0.0.2:443 +/efaa738ada05c233a8f0d57b3064d6ccd86b1a14,10.0.0.5:443 +/b1d708f56428e8c5e9ae6f8a2a492c75271b8a45,10.0.0.5:443 +/5b9abc512a9b5ee9083d0feec592002f5f31dad3,10.0.0.7:443 +/55289a755e9560c48aa164f357598a47603badbd,10.0.0.2:443 +/472ecb69a8ca3b590d1dd3153928d7bb09f43799,10.0.0.4:443 +/072d33b67d2137d961ff8ee5a5c0d024196e4c91,10.0.0.7:443 +/ca285c1755e98e6a8080cbe45f54e91f7d466c2a,10.0.0.2:443 +/13661ad2ef4c48f7e99825bce7efc3833d0bb1d4,10.0.0.3:443 +/b093d7e4f7588296fb1b02b38f20effb4c193571,10.0.0.1:443 +/dc82252b61b41f0760eb44be138b6bca8a22b804,10.0.0.9:443 +/1cddee843b564a0ef0acb8ca9d7dff0d7df3dcfd,10.0.0.2:443 +/560e227e16cb133b6fe840bf9ef03e8aee0dad66,10.0.0.8:443 +/55b1bd74ae0aea31731589550b7610ed00c1e616,10.0.0.9:443 +/87f0d8ea3eaa758b41763fd06d2fa52d2fcff7a7,10.0.0.7:443 +/e669bc52f256862c9ae2e0a7748ee3786a2abcf8,10.0.0.9:443 +/3fca3907c98117eb1920bd03179a35aee3cf1939,10.0.0.3:443 +/3d2582fcdf65814ef32ceb3ca30a38aba02020d2,10.0.0.7:443 +/2628fbf7ed1e6f3f2152a57e1b7c440895088534,10.0.0.6:443 +/37f57ab1100409e62e4ca9532dd245a2e84eab7a,10.0.0.8:443 +/1558c2abe0ba2f4d83172d32ee9489a5f2469174,10.0.0.2:443 +/8e2dc4336eb4fb16deebee7c3157b2a4ddc040d8,10.0.0.5:443 +/016a870bfd0d09832a50ca14d666e7ef21978f5a,10.0.0.3:443 +/f3d2b5c0cad02f1507f1a5c21b4cd8704f32ac11,10.0.0.6:443 +/1ded786d6cc703eda6a20bb7388114a6b74cb17b,10.0.0.1:443 +/de0132cb6cdc9aeba721296519d0b64bdb752855,10.0.0.6:443 +/30f11551b9d666c96604d01526eb001c9cd7524c,10.0.0.9:443 +/eb8bc43a527ac9f8088010954c78fff7a75f1bca,10.0.0.9:443 +/30803c02cdc19bccba32ad9fc2e77a4d20eac091,10.0.0.5:443 +/7281a18bbbb2dfb089b3039f516792e6b0c67714,10.0.0.2:443 +/8787a462e8890da32babea287e4874d3da54246e,10.0.0.6:443 +/2b76b52f43b204a6353a1f999ff40a221602199b,10.0.0.3:443 +/56e71d678f5482e4dfe45ae83993e9e1c9fe2731,10.0.0.1:443 +/08d820998361440ed076fa803e05d3c983383920,10.0.0.2:443 +/cc449cf6cee8ce83f646097ac3d3287becacfe41,10.0.0.4:443 +/2da713a30a3333b7895b3d65a320fe6c20bf670a,10.0.0.8:443 +/658e0cefd2b9a97b2dd840c2a243afc0dbf06cf1,10.0.0.9:443
\ No newline at end of file diff --git a/pingora-ketama/test-data/trace.sh b/pingora-ketama/test-data/trace.sh new file mode 100755 index 0000000..1c309bb --- /dev/null +++ b/pingora-ketama/test-data/trace.sh @@ -0,0 +1,6 @@ +#!/bin/bash +set -eu +for i in {0..1000}; do + URI=$(openssl rand -hex 20) + curl http://localhost:8080/$URI -so /dev/null || true +done
\ No newline at end of file diff --git a/pingora-limits/Cargo.toml b/pingora-limits/Cargo.toml index 88ee11d..0a0ef99 100644 --- a/pingora-limits/Cargo.toml +++ b/pingora-limits/Cargo.toml @@ -4,14 +4,18 @@ version = "0.1.0" authors = ["Yuchen Wu <[email protected]>"] license = "Apache-2.0" description = "A library for rate limiting and event frequency estimation" -edition = "2018" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["algorithms"] +keywords = ["rate-limit", "pingora"] +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [lib] name = "pingora_limits" path = "src/lib.rs" [dependencies] -ahash = "0" +ahash = { workspace = true } [dev-dependencies] rand = "0" diff --git a/pingora-limits/benches/benchmark.rs b/pingora-limits/benches/benchmark.rs index bf84a10..1cc4b96 100644 --- a/pingora-limits/benches/benchmark.rs +++ b/pingora-limits/benches/benchmark.rs @@ -1,4 +1,4 @@ -// Copyright 2023 Cloudflare, Inc. +// Copyright 2024 Cloudflare, Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. diff --git a/pingora-limits/src/estimator.rs b/pingora-limits/src/estimator.rs index 45f1592..2e98ad4 100644 --- a/pingora-limits/src/estimator.rs +++ b/pingora-limits/src/estimator.rs @@ -1,4 +1,4 @@ -// Copyright 2023 Cloudflare, Inc. +// Copyright 2024 Cloudflare, Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. diff --git a/pingora-limits/src/inflight.rs b/pingora-limits/src/inflight.rs index 68d78d2..9c8814d 100644 --- a/pingora-limits/src/inflight.rs +++ b/pingora-limits/src/inflight.rs @@ -1,4 +1,4 @@ -// Copyright 2023 Cloudflare, Inc. +// Copyright 2024 Cloudflare, Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. diff --git a/pingora-limits/src/lib.rs b/pingora-limits/src/lib.rs index 5a396e0..33012c4 100644 --- a/pingora-limits/src/lib.rs +++ b/pingora-limits/src/lib.rs @@ -1,4 +1,4 @@ -// Copyright 2023 Cloudflare, Inc. +// Copyright 2024 Cloudflare, Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. @@ -24,11 +24,9 @@ pub mod inflight; pub mod rate; use ahash::RandomState; -use std::hash::{BuildHasher, Hash, Hasher}; +use std::hash::Hash; #[inline] fn hash<T: Hash>(key: T, hasher: &RandomState) -> u64 { - let mut hasher = hasher.build_hasher(); - key.hash(&mut hasher); - hasher.finish() + hasher.hash_one(key) } diff --git a/pingora-limits/src/rate.rs b/pingora-limits/src/rate.rs index dd05cec..40f7605 100644 --- a/pingora-limits/src/rate.rs +++ b/pingora-limits/src/rate.rs @@ -1,4 +1,4 @@ -// Copyright 2023 Cloudflare, Inc. +// Copyright 2024 Cloudflare, Inc. // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. 
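[Editor's note] The pingora-limits hunk above replaces the manual `build_hasher`/`hash`/`finish` sequence with `BuildHasher::hash_one`, stabilized in Rust 1.71. A standalone sketch of the equivalence, assuming `ahash` as a dependency (not code from this commit):

```rust
use ahash::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// The old form: build a hasher, feed the key, then finish.
fn hash_old<T: Hash>(key: T, state: &RandomState) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

// The new form: `hash_one` performs the same three steps internally.
fn hash_new<T: Hash>(key: T, state: &RandomState) -> u64 {
    state.hash_one(key)
}

fn main() {
    let state = RandomState::new();
    // The same state and key produce the same digest either way.
    assert_eq!(hash_old("pingora", &state), hash_new("pingora", &state));
}
```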
diff --git a/pingora-load-balancing/Cargo.toml b/pingora-load-balancing/Cargo.toml new file mode 100644 index 0000000..7d2cc62 --- /dev/null +++ b/pingora-load-balancing/Cargo.toml @@ -0,0 +1,33 @@ +[package] +name = "pingora-load-balancing" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["network-programming"] +keywords = ["proxy", "pingora"] +description = """ +Common load balancing features for Pingora proxy. +""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_load_balancing" +path = "src/lib.rs" + +[dependencies] +async-trait = { workspace = true } +pingora-http = { version = "0.1.0", path = "../pingora-http" } +pingora-error = { version = "0.1.0", path = "../pingora-error" } +pingora-core = { version = "0.1.0", path = "../pingora-core" } +pingora-ketama = { version = "0.1.0", path = "../pingora-ketama" } +pingora-runtime = { version = "0.1.0", path = "../pingora-runtime" } +arc-swap = "1" +fnv = "1" +rand = "0" +tokio = { workspace = true } +futures = "0" +log = { workspace = true } + +[dev-dependencies] diff --git a/pingora-load-balancing/LICENSE b/pingora-load-balancing/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-load-balancing/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. 
For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora-load-balancing/src/background.rs b/pingora-load-balancing/src/background.rs new file mode 100644 index 0000000..3d3a1f7 --- /dev/null +++ b/pingora-load-balancing/src/background.rs @@ -0,0 +1,61 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
Implement [BackgroundService] for [LoadBalancer]
+
+use std::time::{Duration, Instant};
+
+use super::{BackendIter, BackendSelection, LoadBalancer};
+use async_trait::async_trait;
+use pingora_core::services::background::BackgroundService;
+
+#[async_trait]
+impl<S: Send + Sync + BackendSelection + 'static> BackgroundService for LoadBalancer<S>
+where
+    S::Iter: BackendIter,
+{
+    async fn start(&self, shutdown: pingora_core::server::ShutdownWatch) -> () {
+        // 136 years
+        const NEVER: Duration = Duration::from_secs(u32::MAX as u64);
+        let mut now = Instant::now();
+        // run update and health check once
+        let mut next_update = now;
+        let mut next_health_check = now;
+        loop {
+            if *shutdown.borrow() {
+                return;
+            }
+
+            if next_update <= now {
+                // TODO: log err
+                let _ = self.update().await;
+                next_update = now + self.update_frequency.unwrap_or(NEVER);
+            }
+
+            if next_health_check <= now {
+                self.backends
+                    .run_health_check(self.parallel_health_check)
+                    .await;
+                next_health_check = now + self.health_check_frequency.unwrap_or(NEVER);
+            }
+
+            if self.update_frequency.is_none() && self.health_check_frequency.is_none() {
+                return;
+            }
+            let to_wake = std::cmp::min(next_update, next_health_check);
+            tokio::time::sleep_until(to_wake.into()).await;
+            now = Instant::now();
+        }
+    }
+}
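[Editor's note] This background loop only ticks while a server drives it. A hedged sketch of wiring a [LoadBalancer] into a Pingora server, assuming the `background_service` helper from `pingora_core::services::background` and placeholder upstream addresses (not code from this commit):

```rust
use std::time::Duration;

use pingora_core::server::Server;
use pingora_core::services::background::background_service;
use pingora_load_balancing::{health_check::TcpHealthCheck, selection::RoundRobin, LoadBalancer};

fn main() {
    let mut server = Server::new(None).unwrap();
    server.bootstrap();

    // Two static upstreams; the selection algorithm is the type parameter.
    let mut upstreams: LoadBalancer<RoundRobin> =
        LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443"]).unwrap();

    // Without a frequency, update and health check run only once (the NEVER case above).
    upstreams.set_health_check(TcpHealthCheck::new());
    upstreams.health_check_frequency = Some(Duration::from_secs(1));

    // Assumed API: background_service(name, task) wraps the LoadBalancer so the
    // server runtime calls `start` above; `task()` returns an Arc handle for a proxy.
    let background = background_service("health check", upstreams);
    let _upstreams = background.task();

    server.add_service(background);
    server.run_forever();
}
```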
diff --git a/pingora-load-balancing/src/discovery.rs b/pingora-load-balancing/src/discovery.rs
new file mode 100644
index 0000000..5a38c2f
--- /dev/null
+++ b/pingora-load-balancing/src/discovery.rs
@@ -0,0 +1,107 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Service discovery interface and implementations
+
+use arc_swap::ArcSwap;
+use async_trait::async_trait;
+use pingora_core::protocols::l4::socket::SocketAddr;
+use pingora_error::Result;
+use std::io::Result as IoResult;
+use std::net::ToSocketAddrs;
+use std::{
+    collections::{BTreeSet, HashMap},
+    sync::Arc,
+};
+
+use crate::Backend;
+
+/// [ServiceDiscovery] is the interface to discover [Backend]s.
+#[async_trait]
+pub trait ServiceDiscovery {
+    /// Return the discovered collection of backends, and *optionally* whether these backends
+    /// are enabled to serve or not in a `HashMap`. Any backend that is not explicitly in the
+    /// map is considered enabled.
+    async fn discover(&self) -> Result<(BTreeSet<Backend>, HashMap<u64, bool>)>;
+}
+
+// TODO: add DNS-based discovery
+
+/// A static collection of [Backend]s for service discovery.
+#[derive(Default)]
+pub struct Static {
+    backends: ArcSwap<BTreeSet<Backend>>,
+}
+
+impl Static {
+    /// Create a new boxed [Static] service discovery with the given backends.
+    pub fn new(backends: BTreeSet<Backend>) -> Box<Self> {
+        Box::new(Static {
+            backends: ArcSwap::new(Arc::new(backends)),
+        })
+    }
+
+    /// Create a new boxed [Static] from a given iterator of items that implement [ToSocketAddrs].
+    pub fn try_from_iter<A, T: IntoIterator<Item = A>>(iter: T) -> IoResult<Box<Self>>
+    where
+        A: ToSocketAddrs,
+    {
+        let mut upstreams = BTreeSet::new();
+        for addrs in iter.into_iter() {
+            let addrs = addrs.to_socket_addrs()?.map(|addr| Backend {
+                addr: SocketAddr::Inet(addr),
+                weight: 1,
+            });
+            upstreams.extend(addrs);
+        }
+        Ok(Self::new(upstreams))
+    }
+
+    /// Return the collection of backends.
+    pub fn get(&self) -> BTreeSet<Backend> {
+        BTreeSet::clone(&self.backends.load())
+    }
+
+    // Concurrent set/add/remove might race with each other
+    // TODO: use a queue to avoid racing
+
+    // TODO: take an impl iter
+    #[allow(dead_code)]
+    pub(crate) fn set(&self, backends: BTreeSet<Backend>) {
+        self.backends.store(backends.into())
+    }
+
+    #[allow(dead_code)]
+    pub(crate) fn add(&self, backend: Backend) {
+        let mut new = self.get();
+        new.insert(backend);
+        self.set(new)
+    }
+
+    #[allow(dead_code)]
+    pub(crate) fn remove(&self, backend: &Backend) {
+        let mut new = self.get();
+        new.remove(backend);
+        self.set(new)
+    }
+}
+
+#[async_trait]
+impl ServiceDiscovery for Static {
+    async fn discover(&self) -> Result<(BTreeSet<Backend>, HashMap<u64, bool>)> {
+        // no readiness
+        let health = HashMap::new();
+        Ok((self.get(), health))
+    }
+}
diff --git a/pingora-load-balancing/src/health_check.rs b/pingora-load-balancing/src/health_check.rs
new file mode 100644
index 0000000..adb0c4f
--- /dev/null
+++ b/pingora-load-balancing/src/health_check.rs
@@ -0,0 +1,384 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Health Check interface and methods.
+
+use crate::Backend;
+use arc_swap::ArcSwap;
+use async_trait::async_trait;
+use pingora_core::connectors::{http::Connector as HttpConnector, TransportConnector};
+use pingora_core::upstreams::peer::{BasicPeer, HttpPeer, Peer};
+use pingora_error::{Error, ErrorType::CustomCode, Result};
+use pingora_http::{RequestHeader, ResponseHeader};
+use std::sync::Arc;
+use std::time::Duration;
+
+/// [HealthCheck] is the interface to implement health check for backends
+#[async_trait]
+pub trait HealthCheck {
+    /// Check the given backend.
+    ///
+    /// `Ok(())` if the check passes, otherwise the check fails.
+    async fn check(&self, target: &Backend) -> Result<()>;
+    /// This function defines how many *consecutive* checks should flip the health of a backend.
+    ///
+    /// For example, with `success: true`, this function should return the number of
+    /// consecutive successful checks needed to flip from unhealthy to healthy.
+    fn health_threshold(&self, success: bool) -> usize;
+}
+
+/// TCP health check
+///
+/// This health check checks if a TCP (or TLS) connection can be established to a given backend.
+pub struct TcpHealthCheck {
+    /// Number of successful checks to flip from unhealthy to healthy.
+    pub consecutive_success: usize,
+    /// Number of failed checks to flip from healthy to unhealthy.
+    pub consecutive_failure: usize,
+    /// How to connect to the backend.
+    ///
+    /// This field defines settings like the connect timeout and src IP to bind.
+    /// The SocketAddr of `peer_template` is just a placeholder which will be replaced by the
+    /// actual address of the backend when the health check runs.
+    ///
+    /// By default this check will try to establish a TCP connection. When the `sni` field is
+    /// set, it will also try to establish a TLS connection on top of the TCP connection.
+    pub peer_template: BasicPeer,
+    connector: TransportConnector,
+}
+
+impl Default for TcpHealthCheck {
+    fn default() -> Self {
+        let mut peer_template = BasicPeer::new("0.0.0.0:1");
+        peer_template.options.connection_timeout = Some(Duration::from_secs(1));
+        TcpHealthCheck {
+            consecutive_success: 1,
+            consecutive_failure: 1,
+            peer_template,
+            connector: TransportConnector::new(None),
+        }
+    }
+}
+
+impl TcpHealthCheck {
+    /// Create a new [TcpHealthCheck] with the following default values
+    /// * connect timeout: 1 second
+    /// * consecutive_success: 1
+    /// * consecutive_failure: 1
+    pub fn new() -> Box<Self> {
+        Box::<TcpHealthCheck>::default()
+    }
+
+    /// Create a new [TcpHealthCheck] that tries to establish a TLS connection.
+    ///
+    /// The default values are the same as [Self::new()].
+    pub fn new_tls(sni: &str) -> Box<Self> {
+        let mut new = Self::default();
+        new.peer_template.sni = sni.into();
+        Box::new(new)
+    }
+
+    /// Replace the internal tcp connector with the given [TransportConnector]
+    pub fn set_connector(&mut self, connector: TransportConnector) {
+        self.connector = connector;
+    }
+}
+
+#[async_trait]
+impl HealthCheck for TcpHealthCheck {
+    fn health_threshold(&self, success: bool) -> usize {
+        if success {
+            self.consecutive_success
+        } else {
+            self.consecutive_failure
+        }
+    }
+
+    async fn check(&self, target: &Backend) -> Result<()> {
+        let mut peer = self.peer_template.clone();
+        peer._address = target.addr.clone();
+        self.connector.get_stream(&peer).await.map(|_| {})
+    }
+}
+
+type Validator = Box<dyn Fn(&ResponseHeader) -> Result<()> + Send + Sync>;
+
+/// HTTP health check
+///
+/// This health check checks if it can receive the expected HTTP(s) response from the given backend.
+pub struct HttpHealthCheck {
+    /// Number of successful checks to flip from unhealthy to healthy.
+    pub consecutive_success: usize,
+    /// Number of failed checks to flip from healthy to unhealthy.
+    pub consecutive_failure: usize,
+    /// How to connect to the backend.
+    ///
+    /// This field defines settings like the connect timeout and src IP to bind.
+    /// The SocketAddr of `peer_template` is just a placeholder which will be replaced by the
+    /// actual address of the backend when the health check runs.
+    ///
+    /// Set the `scheme` field to use HTTPS.
+    pub peer_template: HttpPeer,
+    /// Whether the underlying TCP/TLS connection can be reused across checks.
+    ///
+    /// * `false` will make sure that every health check goes through TCP (and TLS) handshakes.
+    ///   Established connections can sometimes mask issues with firewalls and L4 load balancers.
+    /// * `true` will try to reuse connections across checks; this is the more efficient and
+    ///   faster way to perform health checks.
+    pub reuse_connection: bool,
+    /// The request header to send to the backend
+    pub req: RequestHeader,
+    connector: HttpConnector,
+    /// Optional field to define how to validate the response from the server.
+    ///
+    /// If not set, any response with a `200 OK` is considered a successful check.
+    pub validator: Option<Validator>,
+    /// Sometimes the health check endpoint lives on a different port than the actual backend.
+    /// Setting this option allows the health check to run against the given port of the backend IP.
+    pub port_override: Option<u16>,
+}
+
+impl HttpHealthCheck {
+    /// Create a new [HttpHealthCheck] with the following default settings
+    /// * connect timeout: 1 second
+    /// * read timeout: 1 second
+    /// * req: a GET to the `/` of the given host name
+    /// * consecutive_success: 1
+    /// * consecutive_failure: 1
+    /// * reuse_connection: false
+    /// * validator: `None`, any 200 response is considered successful
+    pub fn new(host: &str, tls: bool) -> Self {
+        let mut req = RequestHeader::build("GET", b"/", None).unwrap();
+        req.append_header("Host", host).unwrap();
+        let sni = if tls { host.into() } else { String::new() };
+        let mut peer_template = HttpPeer::new("0.0.0.0:1", tls, sni);
+        peer_template.options.connection_timeout = Some(Duration::from_secs(1));
+        peer_template.options.read_timeout = Some(Duration::from_secs(1));
+        HttpHealthCheck {
+            consecutive_success: 1,
+            consecutive_failure: 1,
+            peer_template,
+            connector: HttpConnector::new(None),
+            reuse_connection: false,
+            req,
+            validator: None,
+            port_override: None,
+        }
+    }
+
+    /// Replace the internal http connector with the given [HttpConnector]
+    pub fn set_connector(&mut self, connector: HttpConnector) {
+        self.connector = connector;
+    }
+}
+
+#[async_trait]
+impl HealthCheck for HttpHealthCheck {
+    fn health_threshold(&self, success: bool) -> usize {
+        if success {
+            self.consecutive_success
+        } else {
+            self.consecutive_failure
+        }
+    }
+
+    async fn check(&self, target: &Backend) -> Result<()> {
+        let mut peer = self.peer_template.clone();
+        peer._address = target.addr.clone();
+        if let Some(port) = self.port_override {
+            peer._address.set_port(port);
+        }
+        let session = self.connector.get_http_session(&peer).await?;
+
+        let mut session = session.0;
+        let req = Box::new(self.req.clone());
+        session.write_request_header(req).await?;
+
+        if let Some(read_timeout) = peer.options.read_timeout {
+            session.set_read_timeout(read_timeout);
+        }
+
+        session.read_response_header().await?;
+
+        let resp = session.response_header().expect("just read");
+
+        if let Some(validator) = self.validator.as_ref() {
+            validator(resp)?;
+        } else if resp.status != 200 {
+            return Error::e_explain(
+                CustomCode("non 200 code", resp.status.as_u16()),
+                "during http healthcheck",
+            );
+        };
+
+        while session.read_response_body().await?.is_some() {
+            // drain the body if any
+        }
+
+        if self.reuse_connection {
+            let idle_timeout = peer.idle_timeout();
+            self.connector
+                .release_http_session(session, &peer, idle_timeout)
+                .await;
+        }
+
+        Ok(())
+    }
+}
+
+#[derive(Clone)]
+struct HealthInner {
+    /// Whether the endpoint is healthy to serve traffic
+    healthy: bool,
+    /// Whether the endpoint is allowed to serve traffic independent of its health
+    enabled: bool,
+    /// The counter for stateful transition between healthy and unhealthy.
+    /// When [healthy] is true, this counts the number of consecutive health check failures
+    /// so that the caller can flip the health when a certain threshold is met, and vice versa.
+ consecutive_counter: usize, +} + +/// Health of backends that can be updated atomically +pub(crate) struct Health(ArcSwap<HealthInner>); + +impl Default for Health { + fn default() -> Self { + Health(ArcSwap::new(Arc::new(HealthInner { + healthy: true, // TODO: allow to start with unhealthy + enabled: true, + consecutive_counter: 0, + }))) + } +} + +impl Clone for Health { + fn clone(&self) -> Self { + let inner = self.0.load_full(); + Health(ArcSwap::new(inner)) + } +} + +impl Health { + pub fn ready(&self) -> bool { + let h = self.0.load(); + h.healthy && h.enabled + } + + pub fn enable(&self, enabled: bool) { + let h = self.0.load(); + if h.enabled != enabled { + // clone the inner + let mut new_health = (**h).clone(); + new_health.enabled = enabled; + self.0.store(Arc::new(new_health)); + }; + } + + // return true when the health is flipped + pub fn observe_health(&self, health: bool, flip_threshold: usize) -> bool { + let h = self.0.load(); + let mut flipped = false; + if h.healthy != health { + // opposite health observed, ready to increase the counter + // clone the inner + let mut new_health = (**h).clone(); + new_health.consecutive_counter += 1; + if new_health.consecutive_counter >= flip_threshold { + new_health.healthy = health; + new_health.consecutive_counter = 0; + flipped = true; + } + self.0.store(Arc::new(new_health)); + } else if h.consecutive_counter > 0 { + // observing the same health as the current state. + // reset the counter, if it is non-zero, because it is no longer consecutive + let mut new_health = (**h).clone(); + new_health.consecutive_counter = 0; + self.0.store(Arc::new(new_health)); + } + flipped + } +} + +#[cfg(test)] +mod test { + use super::*; + use crate::SocketAddr; + + #[tokio::test] + async fn test_tcp_check() { + let tcp_check = TcpHealthCheck::default(); + + let backend = Backend { + addr: SocketAddr::Inet("1.1.1.1:80".parse().unwrap()), + weight: 1, + }; + + assert!(tcp_check.check(&backend).await.is_ok()); + + let backend = Backend { + addr: SocketAddr::Inet("1.1.1.1:79".parse().unwrap()), + weight: 1, + }; + + assert!(tcp_check.check(&backend).await.is_err()); + } + + #[tokio::test] + async fn test_tls_check() { + let tls_check = TcpHealthCheck::new_tls("one.one.one.one"); + let backend = Backend { + addr: SocketAddr::Inet("1.1.1.1:443".parse().unwrap()), + weight: 1, + }; + + assert!(tls_check.check(&backend).await.is_ok()); + } + + #[tokio::test] + async fn test_https_check() { + let https_check = HttpHealthCheck::new("one.one.one.one", true); + + let backend = Backend { + addr: SocketAddr::Inet("1.1.1.1:443".parse().unwrap()), + weight: 1, + }; + + assert!(https_check.check(&backend).await.is_ok()); + } + + #[tokio::test] + async fn test_http_custom_check() { + let mut http_check = HttpHealthCheck::new("one.one.one.one", false); + http_check.validator = Some(Box::new(|resp: &ResponseHeader| { + if resp.status == 301 { + Ok(()) + } else { + Error::e_explain( + CustomCode("non 301 code", resp.status.as_u16()), + "during http healthcheck", + ) + } + })); + + let backend = Backend { + addr: SocketAddr::Inet("1.1.1.1:80".parse().unwrap()), + weight: 1, + }; + + http_check.check(&backend).await.unwrap(); + + assert!(http_check.check(&backend).await.is_ok()); + } +} diff --git a/pingora-load-balancing/src/lib.rs b/pingora-load-balancing/src/lib.rs new file mode 100644 index 0000000..60e30ed --- /dev/null +++ b/pingora-load-balancing/src/lib.rs @@ -0,0 +1,487 @@ +// Copyright 2024 Cloudflare, Inc. 
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! # Pingora Load Balancing utilities
+//! This crate provides common service discovery, health check and load balancing
+//! algorithms for proxies to use.
+
+use arc_swap::ArcSwap;
+use futures::FutureExt;
+use pingora_core::protocols::l4::socket::SocketAddr;
+use pingora_error::{ErrorType, OrErr, Result};
+use std::collections::hash_map::DefaultHasher;
+use std::collections::{BTreeSet, HashMap};
+use std::hash::{Hash, Hasher};
+use std::io::Result as IoResult;
+use std::net::ToSocketAddrs;
+use std::sync::Arc;
+use std::time::Duration;
+
+mod background;
+pub mod discovery;
+pub mod health_check;
+pub mod selection;
+
+use discovery::ServiceDiscovery;
+use health_check::Health;
+use selection::UniqueIterator;
+use selection::{BackendIter, BackendSelection};
+
+pub mod prelude {
+    pub use crate::health_check::TcpHealthCheck;
+    pub use crate::selection::RoundRobin;
+    pub use crate::LoadBalancer;
+}
+
+/// [Backend] represents a server to proxy or connect to.
+#[derive(Clone, Hash, PartialEq, Eq, PartialOrd, Ord, Debug)]
+pub struct Backend {
+    /// The address to the backend server.
+    pub addr: SocketAddr,
+    /// The relative weight of the server. Load balancing algorithms will
+    /// proportionally distribute traffic according to this value.
+    pub weight: usize,
+}
+
+impl Backend {
+    /// Create a new [Backend] with `weight` 1. The function will try to parse
+    /// `addr` into a [std::net::SocketAddr].
+    pub fn new(addr: &str) -> Result<Self> {
+        let addr = addr
+            .parse()
+            .or_err(ErrorType::InternalError, "invalid socket addr")?;
+        Ok(Backend {
+            addr: SocketAddr::Inet(addr),
+            weight: 1,
+        })
+        // TODO: UDS
+    }
+
+    pub(crate) fn hash_key(&self) -> u64 {
+        let mut hasher = DefaultHasher::new();
+        self.hash(&mut hasher);
+        hasher.finish()
+    }
+}
+
+impl std::ops::Deref for Backend {
+    type Target = SocketAddr;
+
+    fn deref(&self) -> &Self::Target {
+        &self.addr
+    }
+}
+
+impl std::ops::DerefMut for Backend {
+    fn deref_mut(&mut self) -> &mut Self::Target {
+        &mut self.addr
+    }
+}
+
+impl std::net::ToSocketAddrs for Backend {
+    type Iter = std::iter::Once<std::net::SocketAddr>;
+
+    fn to_socket_addrs(&self) -> std::io::Result<Self::Iter> {
+        self.addr.to_socket_addrs()
+    }
+}
+
+/// [Backends] is a collection of [Backend]s.
+///
+/// It includes a service discovery method (static or dynamic) to discover all
+/// the available backends as well as an optional health check method to probe the liveness
+/// of each backend.
+pub struct Backends {
+    discovery: Box<dyn ServiceDiscovery + Send + Sync + 'static>,
+    health_check: Option<Arc<dyn health_check::HealthCheck + Send + Sync + 'static>>,
+    backends: ArcSwap<BTreeSet<Backend>>,
+    health: ArcSwap<HashMap<u64, Health>>,
+}
+
+impl Backends {
+    /// Create a new [Backends] with the given [ServiceDiscovery] implementation.
+    ///
+    /// The health check method is by default empty.
+    pub fn new(discovery: Box<dyn ServiceDiscovery + Send + Sync + 'static>) -> Self {
+        Self {
+            discovery,
+            health_check: None,
+            backends: Default::default(),
+            health: Default::default(),
+        }
+    }
+
+    /// Set the health check method. See [health_check] for the methods provided.
+    pub fn set_health_check(
+        &mut self,
+        hc: Box<dyn health_check::HealthCheck + Send + Sync + 'static>,
+    ) {
+        self.health_check = Some(hc.into())
+    }
+
+    /// Return true when the new set is different from the current set of backends
+    fn do_update(&self, new_backends: BTreeSet<Backend>, enablement: HashMap<u64, bool>) -> bool {
+        if (**self.backends.load()) != new_backends {
+            let old_health = self.health.load();
+            let mut health = HashMap::with_capacity(new_backends.len());
+            for backend in new_backends.iter() {
+                let hash_key = backend.hash_key();
+                // use the default health if the backend is new
+                let backend_health = old_health.get(&hash_key).cloned().unwrap_or_default();
+
+                // override enablement
+                if let Some(backend_enabled) = enablement.get(&hash_key) {
+                    backend_health.enable(*backend_enabled);
+                }
+                health.insert(hash_key, backend_health);
+            }
+
+            // TODO: put backend and health under 1 ArcSwap so that this update is atomic
+            self.backends.store(Arc::new(new_backends));
+            self.health.store(Arc::new(health));
+            true
+        } else {
+            // no backend change, just check enablement
+            for (hash_key, backend_enabled) in enablement.iter() {
+                // override enablement if set
+                // this get should always be Some(_) because we already populate `health` for all known backends
+                if let Some(backend_health) = self.health.load().get(hash_key) {
+                    backend_health.enable(*backend_enabled);
+                }
+            }
+            false
+        }
+    }
+
+    /// Whether a certain [Backend] is ready to serve traffic.
+    ///
+    /// This function returns true when the backend is both healthy and enabled.
+    /// This function returns true when the health check is unset but the backend is enabled.
+    /// When the health check is set, this function will return false for any `backend` it
+    /// doesn't know about.
+    pub fn ready(&self, backend: &Backend) -> bool {
+        self.health
+            .load()
+            .get(&backend.hash_key())
+            // Racing: return `None` when this function is called between the
+            // backend store and the health store
+            .map_or(self.health_check.is_none(), |h| h.ready())
+    }
+
+    /// Manually set if a [Backend] is ready to serve traffic.
+    ///
+    /// This method does not override the health of the backend. It is meant to be used
+    /// to stop a backend from accepting traffic when it is still healthy.
+    ///
+    /// This method is a no-op when the given backend doesn't exist in the service discovery.
+    pub fn set_enable(&self, backend: &Backend, enabled: bool) {
+        // this should always be Some(_) because health is always populated during update
+        if let Some(h) = self.health.load().get(&backend.hash_key()) {
+            h.enable(enabled)
+        };
+    }
+
+    /// Return the collection of the backends.
+    pub fn get_backend(&self) -> Arc<BTreeSet<Backend>> {
+        self.backends.load_full()
+    }
+
+    /// Call the service discovery method to update the collection of backends.
+    ///
+    /// Return `true` when the new collection is different from the current set of backends.
+    /// This return value is useful to tell the caller when to rebuild things that are expensive to
+    /// update, such as consistent hashing rings.
+    pub async fn update(&self) -> Result<bool> {
+        let (new_backends, enablement) = self.discovery.discover().await?;
+        Ok(self.do_update(new_backends, enablement))
+    }
+
+    /// Run the health check on all backends, if a health check is set.
+    ///
+    /// When `parallel: true`, all the backends are checked in parallel instead of sequentially.
+    pub async fn run_health_check(&self, parallel: bool) {
+        use crate::health_check::HealthCheck;
+        use log::{info, warn};
+        use pingora_runtime::current_handle;
+
+        async fn check_and_report(
+            backend: &Backend,
+            check: &Arc<dyn HealthCheck + Send + Sync>,
+            health_table: &HashMap<u64, Health>,
+        ) {
+            let errored = check.check(backend).await.err();
+            if let Some(h) = health_table.get(&backend.hash_key()) {
+                let flipped =
+                    h.observe_health(errored.is_none(), check.health_threshold(errored.is_none()));
+                if flipped {
+                    if let Some(e) = errored {
+                        warn!("{backend:?} becomes unhealthy, {e}");
+                    } else {
+                        info!("{backend:?} becomes healthy");
+                    }
+                }
+            }
+        }
+
+        let Some(health_check) = self.health_check.as_ref() else {
+            return;
+        };
+
+        let backends = self.backends.load();
+        if parallel {
+            let health_table = self.health.load_full();
+            let runtime = current_handle();
+            let jobs = backends.iter().map(|backend| {
+                let backend = backend.clone();
+                let check = health_check.clone();
+                let ht = health_table.clone();
+                runtime.spawn(async move {
+                    check_and_report(&backend, &check, &ht).await;
+                })
+            });
+
+            futures::future::join_all(jobs).await;
+        } else {
+            for backend in backends.iter() {
+                check_and_report(backend, health_check, &self.health.load()).await;
+            }
+        }
+    }
+}
+
+/// A [LoadBalancer] instance contains the service discovery, health check and backend selection
+/// all together.
+///
+/// In order to run service discovery and health check at the designated frequencies, the [LoadBalancer]
+/// needs to be run as a [pingora_core::services::background::BackgroundService].
+pub struct LoadBalancer<S> {
+    backends: Backends,
+    selector: ArcSwap<S>,
+    /// How frequently the health check logic (if set) should run.
+    ///
+    /// If `None`, the health check logic will only run once at the beginning.
+    pub health_check_frequency: Option<Duration>,
+    /// How frequently the service discovery should run.
+    ///
+    /// If `None`, the service discovery will only run once at the beginning.
+    pub update_frequency: Option<Duration>,
+    /// Whether to run health checks on all backends in parallel. Default is false.
+    pub parallel_health_check: bool,
+}
+
+impl<S> LoadBalancer<S>
+where
+    S: BackendSelection + 'static,
+    S::Iter: BackendIter,
+{
+    /// Build a [LoadBalancer] with static backends created from the iter.
+    ///
+    /// Note: [ToSocketAddrs] will invoke blocking network IO for DNS lookup if
+    /// the input cannot be directly parsed as [SocketAddr].
+    pub fn try_from_iter<A, T: IntoIterator<Item = A>>(iter: T) -> IoResult<Self>
+    where
+        A: ToSocketAddrs,
+    {
+        let discovery = discovery::Static::try_from_iter(iter)?;
+        let backends = Backends::new(discovery);
+        let lb = Self::from_backends(backends);
+        lb.update()
+            .now_or_never()
+            .expect("static should not block")
+            .expect("static should not error");
+        Ok(lb)
+    }
+
+    /// Build a [LoadBalancer] with the given [Backends].
+    pub fn from_backends(backends: Backends) -> Self {
+        let selector = ArcSwap::new(Arc::new(S::build(&backends.get_backend())));
+        LoadBalancer {
+            backends,
+            selector,
+            health_check_frequency: None,
+            update_frequency: None,
+            parallel_health_check: false,
+        }
+    }
+
+    /// Run the service discovery and update the selection algorithm.
+    ///
+    /// This function will be called every `update_frequency` if this [LoadBalancer] instance
+    /// is running as a background service.
+    pub async fn update(&self) -> Result<()> {
+        if self.backends.update().await? {
+            self.selector
+                .store(Arc::new(S::build(&self.backends.get_backend())))
+        }
+        Ok(())
+    }
+
+    /// Return the first healthy [Backend] according to the selection algorithm and the
+    /// health check results.
+    ///
+    /// The `key` is used for hash based selection and is ignored if the selection is random or
+    /// round robin.
+    ///
+    /// The `max_iterations` bounds the search time for the next Backend. In certain
+    /// algorithms, like Ketama hashing, the search for the next backend is linear and could
+    /// take a lot of steps.
+    // TODO: consider removing `max_iterations` as users have no idea how to set it.
+    pub fn select(&self, key: &[u8], max_iterations: usize) -> Option<Backend> {
+        self.select_with(key, max_iterations, |_, health| health)
+    }
+
+    /// Similar to [Self::select], return the first healthy [Backend] according to the selection algorithm
+    /// and the user defined `accept` function.
+    ///
+    /// The `accept` function takes two inputs: the backend being selected and the internal health of that
+    /// backend. The function can do things like ignoring the internal health checks or skipping this backend
+    /// because it failed before. The `accept` function is called multiple times, iterating over backends
+    /// until it returns `true`.
+    pub fn select_with<F>(&self, key: &[u8], max_iterations: usize, accept: F) -> Option<Backend>
+    where
+        F: Fn(&Backend, bool) -> bool,
+    {
+        let selection = self.selector.load();
+        let mut iter = UniqueIterator::new(selection.iter(key), max_iterations);
+        while let Some(b) = iter.get_next() {
+            if accept(&b, self.backends.ready(&b)) {
+                return Some(b);
+            }
+        }
+        None
+    }
+
+    /// Set the health check method. See [health_check].
+ pub fn set_health_check( + &mut self, + hc: Box<dyn health_check::HealthCheck + Send + Sync + 'static>, + ) { + self.backends.set_health_check(hc); + } + + /// Access the [Backends] of this [LoadBalancer] + pub fn backends(&self) -> &Backends { + &self.backends + } +} + +#[cfg(test)] +mod test { + use super::*; + use async_trait::async_trait; + + #[tokio::test] + async fn test_static_backends() { + let backends: LoadBalancer<selection::RoundRobin> = + LoadBalancer::try_from_iter(["1.1.1.1:80", "1.0.0.1:80"]).unwrap(); + + let backend1 = Backend::new("1.1.1.1:80").unwrap(); + let backend2 = Backend::new("1.0.0.1:80").unwrap(); + let backend = backends.backends().get_backend(); + assert!(backend.contains(&backend1)); + assert!(backend.contains(&backend2)); + } + + #[tokio::test] + async fn test_backends() { + let discovery = discovery::Static::default(); + let good1 = Backend::new("1.1.1.1:80").unwrap(); + discovery.add(good1.clone()); + let good2 = Backend::new("1.0.0.1:80").unwrap(); + discovery.add(good2.clone()); + let bad = Backend::new("127.0.0.1:79").unwrap(); + discovery.add(bad.clone()); + + let mut backends = Backends::new(Box::new(discovery)); + let check = health_check::TcpHealthCheck::new(); + backends.set_health_check(check); + + // true: new backend discovered + assert!(backends.update().await.unwrap()); + + // false: no new backend discovered + assert!(!backends.update().await.unwrap()); + + backends.run_health_check(false).await; + + let backend = backends.get_backend(); + assert!(backend.contains(&good1)); + assert!(backend.contains(&good2)); + assert!(backend.contains(&bad)); + + assert!(backends.ready(&good1)); + assert!(backends.ready(&good2)); + assert!(!backends.ready(&bad)); + } + + #[tokio::test] + async fn test_discovery_readiness() { + use discovery::Static; + + struct TestDiscovery(Static); + #[async_trait] + impl ServiceDiscovery for TestDiscovery { + async fn discover(&self) -> Result<(BTreeSet<Backend>, HashMap<u64, bool>)> { + let bad = Backend::new("127.0.0.1:79").unwrap(); + let (backends, mut readiness) = self.0.discover().await?; + readiness.insert(bad.hash_key(), false); + Ok((backends, readiness)) + } + } + let discovery = Static::default(); + let good1 = Backend::new("1.1.1.1:80").unwrap(); + discovery.add(good1.clone()); + let good2 = Backend::new("1.0.0.1:80").unwrap(); + discovery.add(good2.clone()); + let bad = Backend::new("127.0.0.1:79").unwrap(); + discovery.add(bad.clone()); + let discovery = TestDiscovery(discovery); + + let backends = Backends::new(Box::new(discovery)); + assert!(backends.update().await.unwrap()); + + let backend = backends.get_backend(); + assert!(backend.contains(&good1)); + assert!(backend.contains(&good2)); + assert!(backend.contains(&bad)); + + assert!(backends.ready(&good1)); + assert!(backends.ready(&good2)); + assert!(!backends.ready(&bad)); + } + + #[tokio::test] + async fn test_parallel_health_check() { + let discovery = discovery::Static::default(); + let good1 = Backend::new("1.1.1.1:80").unwrap(); + discovery.add(good1.clone()); + let good2 = Backend::new("1.0.0.1:80").unwrap(); + discovery.add(good2.clone()); + let bad = Backend::new("127.0.0.1:79").unwrap(); + discovery.add(bad.clone()); + + let mut backends = Backends::new(Box::new(discovery)); + let check = health_check::TcpHealthCheck::new(); + backends.set_health_check(check); + + // true: new backend discovered + assert!(backends.update().await.unwrap()); + + backends.run_health_check(true).await; + + assert!(backends.ready(&good1)); + 
assert!(backends.ready(&good2)); + assert!(!backends.ready(&bad)); + } +} diff --git a/pingora-load-balancing/src/selection/algorithms.rs b/pingora-load-balancing/src/selection/algorithms.rs new file mode 100644 index 0000000..b17d973 --- /dev/null +++ b/pingora-load-balancing/src/selection/algorithms.rs @@ -0,0 +1,61 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Implementation of algorithms for weighted selection +//! +//! All [std::hash::Hasher] + [Default] can be used directly as a selection algorithm. + +use super::*; +use std::hash::Hasher; +use std::sync::atomic::{AtomicUsize, Ordering}; + +impl<H> SelectionAlgorithm for H +where + H: Default + Hasher, +{ + fn new() -> Self { + H::default() + } + fn next(&self, key: &[u8]) -> u64 { + let mut hasher = H::default(); + hasher.write(key); + hasher.finish() + } +} + +/// Round Robin selection +pub struct RoundRobin(AtomicUsize); + +impl SelectionAlgorithm for RoundRobin { + fn new() -> Self { + Self(AtomicUsize::new(0)) + } + fn next(&self, _key: &[u8]) -> u64 { + self.0.fetch_add(1, Ordering::Relaxed) as u64 + } +} + +/// Random selection +pub struct Random; + +impl SelectionAlgorithm for Random { + fn new() -> Self { + Self + } + fn next(&self, _key: &[u8]) -> u64 { + use rand::Rng; + let mut rng = rand::thread_rng(); + rng.gen() + } +} diff --git a/pingora-load-balancing/src/selection/consistent.rs b/pingora-load-balancing/src/selection/consistent.rs new file mode 100644 index 0000000..60c7b9f --- /dev/null +++ b/pingora-load-balancing/src/selection/consistent.rs @@ -0,0 +1,135 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
Consistent Hashing + +use super::*; +use pingora_core::protocols::l4::socket::SocketAddr; +use pingora_ketama::{Bucket, Continuum}; +use std::collections::HashMap; +use std::sync::Arc; + +/// Weighted Ketama consistent hashing +pub struct KetamaHashing { + ring: Continuum, + // TODO: update Ketama to just store this + backends: HashMap<SocketAddr, Backend>, +} + +impl BackendSelection for KetamaHashing { + type Iter = OwnedNodeIterator; + + fn build(backends: &BTreeSet<Backend>) -> Self { + let buckets: Vec<_> = backends + .iter() + .filter_map(|b| { + // FIXME: ketama only supports Inet addr, UDS addrs are ignored here + if let SocketAddr::Inet(addr) = b.addr { + Some(Bucket::new(addr, b.weight as u32)) + } else { + None + } + }) + .collect(); + let new_backends = backends + .iter() + .map(|b| (b.addr.clone(), b.clone())) + .collect(); + KetamaHashing { + ring: Continuum::new(&buckets), + backends: new_backends, + } + } + + fn iter(self: &Arc<Self>, key: &[u8]) -> Self::Iter { + OwnedNodeIterator { + idx: self.ring.node_idx(key), + ring: self.clone(), + } + } +} + +/// Iterator over a Continuum +pub struct OwnedNodeIterator { + idx: usize, + ring: Arc<KetamaHashing>, +} + +impl BackendIter for OwnedNodeIterator { + fn next(&mut self) -> Option<&Backend> { + self.ring.ring.get_addr(&mut self.idx).and_then(|addr| { + let addr = SocketAddr::Inet(*addr); + self.ring.backends.get(&addr) + }) + } +} + +#[cfg(test)] +mod test { + use super::*; + + #[test] + fn test_ketama() { + let b1 = Backend::new("1.1.1.1:80").unwrap(); + let b2 = Backend::new("1.0.0.1:80").unwrap(); + let b3 = Backend::new("1.0.0.255:80").unwrap(); + let backends = BTreeSet::from_iter([b1.clone(), b2.clone(), b3.clone()]); + let hash = Arc::new(KetamaHashing::build(&backends)); + + let mut iter = hash.iter(b"test0"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test2"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test3"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test4"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test5"); + assert_eq!(iter.next(), Some(&b3)); + let mut iter = hash.iter(b"test6"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test7"); + assert_eq!(iter.next(), Some(&b3)); + let mut iter = hash.iter(b"test8"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test9"); + assert_eq!(iter.next(), Some(&b2)); + + // remove b3 + let backends = BTreeSet::from_iter([b1.clone(), b2.clone()]); + let hash = Arc::new(KetamaHashing::build(&backends)); + let mut iter = hash.iter(b"test0"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test2"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test3"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test4"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test5"); + assert_eq!(iter.next(), Some(&b2)); // changed + let mut iter = hash.iter(b"test6"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test7"); + assert_eq!(iter.next(), Some(&b1)); // changed + let mut iter = hash.iter(b"test8"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test9"); + assert_eq!(iter.next(), Some(&b2)); + } +} diff --git a/pingora-load-balancing/src/selection/mod.rs 
b/pingora-load-balancing/src/selection/mod.rs new file mode 100644 index 0000000..6320a8e --- /dev/null +++ b/pingora-load-balancing/src/selection/mod.rs @@ -0,0 +1,171 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Backend selection interfaces and algorithms + +pub mod algorithms; +pub mod consistent; +pub mod weighted; + +use super::Backend; +use std::collections::{BTreeSet, HashSet}; +use std::sync::Arc; +use weighted::Weighted; + +/// [BackendSelection] is the interface to implement backend selection mechanisms. +pub trait BackendSelection { + /// The [BackendIter] returned from iter() below. + type Iter; + /// The function to create a [BackendSelection] implementation. + fn build(backends: &BTreeSet<Backend>) -> Self; + /// Select backends for a given key. + /// + /// A [BackendIter] should be returned. The first item in the iter is the first-choice + /// backend. The user should continue iterating over it if the first backend + /// cannot be used due to its health or other reasons. + fn iter(self: &Arc<Self>, key: &[u8]) -> Self::Iter + where + Self::Iter: BackendIter; +} + +/// An iterator to find the suitable backend +/// +/// Similar to [Iterator] but allows self-referencing. +pub trait BackendIter { + /// Return `Some(&Backend)` when there are more backends left to choose from. + fn next(&mut self) -> Option<&Backend>; +} + +/// [SelectionAlgorithm] is the interface to implement selection algorithms. +/// +/// Any [std::hash::Hasher] + [Default] can be used directly as a selection algorithm. +pub trait SelectionAlgorithm { + /// Create a new implementation + fn new() -> Self; + /// Return the next index of the backend. The caller should perform modulo to get + /// the valid index of the backend. + fn next(&self, key: &[u8]) -> u64; +} + +/// [FNV](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) hashing +/// on weighted backends +pub type FVNHash = Weighted<fnv::FnvHasher>; +/// Random selection on weighted backends +pub type Random = Weighted<algorithms::Random>; +/// Round robin selection on weighted backends +pub type RoundRobin = Weighted<algorithms::RoundRobin>; +/// Consistent Ketama hashing on weighted backends +pub type Consistent = consistent::KetamaHashing; + +// TODO: least conn + +/// An iterator which wraps another iterator and yields unique items. It takes a max +/// number of iterations as a guard in case the wrapped iterator never ends. +pub struct UniqueIterator<I> +where + I: BackendIter, +{ + iter: I, + seen: HashSet<u64>, + max_iterations: usize, + steps: usize, +} + +impl<I> UniqueIterator<I> +where + I: BackendIter, +{ + /// Wrap a new iterator and specify the maximum number of times we want to iterate.
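+ /// + /// `max_iterations` caps the total number of items examined across all `get_next` calls, + /// since some wrapped iterators (for example [weighted::WeightedIterator]) never end on their own.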
+ pub fn new(iter: I, max_iterations: usize) -> Self { + Self { + iter, + max_iterations, + seen: HashSet::new(), + steps: 0, + } + } + + pub fn get_next(&mut self) -> Option<Backend> { + while let Some(item) = self.iter.next() { + if self.steps >= self.max_iterations { + return None; + } + self.steps += 1; + + let hash_key = item.hash_key(); + if !self.seen.contains(&hash_key) { + self.seen.insert(hash_key); + return Some(item.clone()); + } + } + + None + } +} + +#[cfg(test)] +mod tests { + use super::*; + + struct TestIter { + seq: Vec<Backend>, + idx: usize, + } + impl TestIter { + fn new(input: &[&Backend]) -> Self { + Self { + seq: input.iter().cloned().cloned().collect(), + idx: 0, + } + } + } + impl BackendIter for TestIter { + fn next(&mut self) -> Option<&Backend> { + let idx = self.idx; + self.idx += 1; + self.seq.get(idx) + } + } + + #[test] + fn unique_iter_max_iterations_is_correct() { + let b1 = Backend::new("1.1.1.1:80").unwrap(); + let b2 = Backend::new("1.0.0.1:80").unwrap(); + let b3 = Backend::new("1.0.0.255:80").unwrap(); + let items = [&b1, &b2, &b3]; + + let mut all = UniqueIterator::new(TestIter::new(&items), 3); + assert_eq!(all.get_next(), Some(b1.clone())); + assert_eq!(all.get_next(), Some(b2.clone())); + assert_eq!(all.get_next(), Some(b3.clone())); + assert_eq!(all.get_next(), None); + + let mut stop = UniqueIterator::new(TestIter::new(&items), 1); + assert_eq!(stop.get_next(), Some(b1)); + assert_eq!(stop.get_next(), None); + } + + #[test] + fn unique_iter_duplicate_items_are_filtered() { + let b1 = Backend::new("1.1.1.1:80").unwrap(); + let b2 = Backend::new("1.0.0.1:80").unwrap(); + let b3 = Backend::new("1.0.0.255:80").unwrap(); + let items = [&b1, &b1, &b2, &b2, &b2, &b3]; + + let mut uniq = UniqueIterator::new(TestIter::new(&items), 10); + assert_eq!(uniq.get_next(), Some(b1)); + assert_eq!(uniq.get_next(), Some(b2)); + assert_eq!(uniq.get_next(), Some(b3)); + } +} diff --git a/pingora-load-balancing/src/selection/weighted.rs b/pingora-load-balancing/src/selection/weighted.rs new file mode 100644 index 0000000..3f37de6 --- /dev/null +++ b/pingora-load-balancing/src/selection/weighted.rs @@ -0,0 +1,208 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Weighted Selection + +use super::{Backend, BackendIter, BackendSelection, SelectionAlgorithm}; +use fnv::FnvHasher; +use std::collections::BTreeSet; +use std::sync::Arc; + +/// Weighted selection with a given selection algorithm +/// +/// The default algorithm is [FnvHasher]. See [super::algorithms] for more choices. 
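+/// +/// # Example +/// +/// A minimal sketch, mirroring the tests at the bottom of this file: +/// +/// ```ignore +/// use std::collections::BTreeSet; +/// use std::sync::Arc; +/// +/// let mut b1 = Backend::new("1.1.1.1:80").unwrap(); +/// b1.weight = 10; // 10x more likely to be picked first +/// let b2 = Backend::new("1.0.0.1:80").unwrap(); +/// let backends = BTreeSet::from_iter([b1, b2]); +/// let selector: Arc<Weighted> = Arc::new(Weighted::build(&backends)); +/// let mut iter = selector.iter(b"request key"); +/// let first_choice = iter.next(); // call next() again for a fallback candidate +/// ```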
+pub struct Weighted<H = FnvHasher> { + backends: Box<[Backend]>, + // each item is an index to the `backends`, use u16 to save memory, support up to 2^16 backends + weighted: Box<[u16]>, + algorithm: H, +} + +impl<H: SelectionAlgorithm> BackendSelection for Weighted<H> { + type Iter = WeightedIterator<H>; + + fn build(backends: &BTreeSet<Backend>) -> Self { + assert!( + backends.len() <= u16::MAX as usize, + "support up to 2^16 backends" + ); + let backends = Vec::from_iter(backends.iter().cloned()).into_boxed_slice(); + let mut weighted = Vec::with_capacity(backends.len()); + for (index, b) in backends.iter().enumerate() { + for _ in 0..b.weight { + weighted.push(index as u16); + } + } + Weighted { + backends, + weighted: weighted.into_boxed_slice(), + algorithm: H::new(), + } + } + + fn iter(self: &Arc<Self>, key: &[u8]) -> Self::Iter { + WeightedIterator::new(key, self.clone()) + } +} + +/// An iterator over the backends of a [Weighted] selection. +/// +/// See [super::BackendSelection] for more information. +pub struct WeightedIterator<H> { + // the unbounded index seed + index: u64, + backend: Arc<Weighted<H>>, + first: bool, +} + +impl<H: SelectionAlgorithm> WeightedIterator<H> { + /// Constructs a new [WeightedIterator]. + fn new(input: &[u8], backend: Arc<Weighted<H>>) -> Self { + Self { + index: backend.algorithm.next(input), + backend, + first: true, + } + } +} + +impl<H: SelectionAlgorithm> BackendIter for WeightedIterator<H> { + fn next(&mut self) -> Option<&Backend> { + if self.backend.backends.is_empty() { + // short circuit if empty + return None; + } + + if self.first { + // initial hash, select from the weighted list + self.first = false; + let len = self.backend.weighted.len(); + let index = self.backend.weighted[self.index as usize % len]; + Some(&self.backend.backends[index as usize]) + } else { + // fallback, select from the unique list + // deterministically select the next item + self.index = self.backend.algorithm.next(&self.index.to_le_bytes()); + let len = self.backend.backends.len(); + Some(&self.backend.backends[self.index as usize % len]) + } + } +} + +#[cfg(test)] +mod test { + use super::super::algorithms::*; + use super::*; + use std::collections::HashMap; + + #[test] + fn test_fnv() { + let b1 = Backend::new("1.1.1.1:80").unwrap(); + let mut b2 = Backend::new("1.0.0.1:80").unwrap(); + b2.weight = 10; // 10x than the rest + let b3 = Backend::new("1.0.0.255:80").unwrap(); + let backends = BTreeSet::from_iter([b1.clone(), b2.clone(), b3.clone()]); + let hash: Arc<Weighted> = Arc::new(Weighted::build(&backends)); + + // same hash iter over + let mut iter = hash.iter(b"test"); + // first, should be weighted + assert_eq!(iter.next(), Some(&b2)); + // fallbacks, should be uniform, not weighted + assert_eq!(iter.next(), Some(&b2)); + assert_eq!(iter.next(), Some(&b2)); + assert_eq!(iter.next(), Some(&b1)); + assert_eq!(iter.next(), Some(&b3)); + assert_eq!(iter.next(), Some(&b2)); + assert_eq!(iter.next(), Some(&b2)); + assert_eq!(iter.next(), Some(&b1)); + assert_eq!(iter.next(), Some(&b2)); + assert_eq!(iter.next(), Some(&b3)); + assert_eq!(iter.next(), Some(&b1)); + + // different hashes, the first selection should be weighted + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test2"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test3"); + assert_eq!(iter.next(), Some(&b3)); + let mut iter = hash.iter(b"test4"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = 
hash.iter(b"test5"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test6"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test7"); + assert_eq!(iter.next(), Some(&b2)); + } + + #[test] + fn test_round_robin() { + let b1 = Backend::new("1.1.1.1:80").unwrap(); + let mut b2 = Backend::new("1.0.0.1:80").unwrap(); + b2.weight = 8; // 8x than the rest + let b3 = Backend::new("1.0.0.255:80").unwrap(); + let backends = BTreeSet::from_iter([b1.clone(), b2.clone(), b3.clone()]); + let hash: Arc<Weighted<RoundRobin>> = Arc::new(Weighted::build(&backends)); + + // same hash iter over + let mut iter = hash.iter(b"test"); + // first, should be weighted + assert_eq!(iter.next(), Some(&b2)); + // fallbacks, should be round robin + assert_eq!(iter.next(), Some(&b3)); + assert_eq!(iter.next(), Some(&b1)); + assert_eq!(iter.next(), Some(&b2)); + assert_eq!(iter.next(), Some(&b3)); + + // round robin, ignoring the hash key + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b3)); + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b1)); + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b2)); + let mut iter = hash.iter(b"test1"); + assert_eq!(iter.next(), Some(&b2)); + } + + #[test] + fn test_random() { + let b1 = Backend::new("1.1.1.1:80").unwrap(); + let mut b2 = Backend::new("1.0.0.1:80").unwrap(); + b2.weight = 8; // 8x than the rest + let b3 = Backend::new("1.0.0.255:80").unwrap(); + let backends = BTreeSet::from_iter([b1.clone(), b2.clone(), b3.clone()]); + let hash: Arc<Weighted<Random>> = Arc::new(Weighted::build(&backends)); + + let mut count = HashMap::new(); + count.insert(b1.clone(), 0); + count.insert(b2.clone(), 0); + count.insert(b3.clone(), 0); + + for _ in 0..100 { + let mut iter = hash.iter(b"test"); + *count.get_mut(iter.next().unwrap()).unwrap() += 1; + } + let b2_count = *count.get(&b2).unwrap(); + assert!((70..=90).contains(&b2_count)); + } +} diff --git a/pingora-lru/Cargo.toml b/pingora-lru/Cargo.toml new file mode 100644 index 0000000..69851c3 --- /dev/null +++ b/pingora-lru/Cargo.toml @@ -0,0 +1,34 @@ +[package] +name = "pingora-lru" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["algorithms", "caching"] +keywords = ["lru", "cache", "pingora"] +description = """ +LRU cache that focuses on memory efficiency, concurrency and persistence. +""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_lru" +path = "src/lib.rs" + +[dependencies] +hashbrown = "0" +parking_lot = "0" +arrayvec = "0" +rand = "0" + +[dev-dependencies] +lru = { workspace = true } + +[[bench]] +name = "bench_linked_list" +harness = false + +[[bench]] +name = "bench_lru" +harness = false diff --git a/pingora-lru/LICENSE b/pingora-lru/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-lru/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. 
+ + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
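The two benchmarks that follow exercise the pingora-lru primitives added later in this patch. As a rough sketch of the public API they drive (signatures as defined in pingora-lru/src/lib.rs further down):

    use pingora_lru::Lru;

    fn main() {
        // 10 shards, an overall weight limit of 1000, and a per-shard capacity hint of 100
        let lru = Lru::<(), 10>::with_capacity(1000, 100);
        lru.admit(42, (), 1); // insert key 42 with weight 1
        assert!(lru.promote(42)); // move it to the head of its shard
        assert!(lru.promote_top_n(42, 10)); // only write-locks if the key is outside the top 10
    }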
diff --git a/pingora-lru/benches/bench_linked_list.rs b/pingora-lru/benches/bench_linked_list.rs new file mode 100644 index 0000000..7d90da9 --- /dev/null +++ b/pingora-lru/benches/bench_linked_list.rs @@ -0,0 +1,144 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use std::time::Instant; + +fn main() { + const ITEMS: usize = 5_000_000; + + // push bench + + let mut std_list = std::collections::LinkedList::<u64>::new(); + let before = Instant::now(); + for _ in 0..ITEMS { + std_list.push_front(0); + } + let elapsed = before.elapsed(); + println!( + "std linked list push_front total {elapsed:?}, {:?} avg per operation", + elapsed / ITEMS as u32 + ); + + let mut list = pingora_lru::linked_list::LinkedList::with_capacity(ITEMS); + let before = Instant::now(); + for _ in 0..ITEMS { + list.push_head(0); + } + let elapsed = before.elapsed(); + println!( + "pingora linked list push_head total {elapsed:?}, {:?} avg per operation", + elapsed / ITEMS as u32 + ); + + // iter bench + + let mut count = 0; + let before = Instant::now(); + for _ in std_list.iter() { + count += 1; + } + let elapsed = before.elapsed(); + println!( + "std linked list iter total {count} {elapsed:?}, {:?} avg per operation", + elapsed / count as u32 + ); + + let mut count = 0; + let before = Instant::now(); + for _ in list.iter() { + count += 1; + } + let elapsed = before.elapsed(); + println!( + "pingora linked list iter total {count} {elapsed:?}, {:?} avg per operation", + elapsed / count as u32 + ); + + // search bench + + let before = Instant::now(); + for _ in 0..ITEMS { + assert!(!std_list.iter().take(10).any(|v| *v == 1)); + } + let elapsed = before.elapsed(); + println!( + "std linked search first 10 items total {elapsed:?}, {:?} avg per operation", + elapsed / ITEMS as u32 + ); + + let before = Instant::now(); + for _ in 0..ITEMS { + assert!(!list.iter().take(10).any(|v| *v == 1)); + } + let elapsed = before.elapsed(); + println!( + "pingora linked search first 10 items total {elapsed:?}, {:?} avg per operation", + elapsed / ITEMS as u32 + ); + + let before = Instant::now(); + for _ in 0..ITEMS { + assert!(!list.exist_near_head(1, 10)); + } + let elapsed = before.elapsed(); + println!( + "pingora linked optimized search first 10 items total {elapsed:?}, {:?} avg per operation", + elapsed / ITEMS as u32 + ); + + // move node bench + let before = Instant::now(); + for _ in 0..ITEMS { + let value = std_list.pop_back().unwrap(); + std_list.push_front(value); + } + let elapsed = before.elapsed(); + println!( + "std linked list move back to front total {elapsed:?}, {:?} avg per operation", + elapsed / ITEMS as u32 + ); + + let before = Instant::now(); + for _ in 0..ITEMS { + let index = list.tail().unwrap(); + list.promote(index); + } + let elapsed = before.elapsed(); + println!( + "pingora linked list move tail to head total {elapsed:?}, {:?} avg per operation", + elapsed / ITEMS as u32 + ); + + // pop bench + + let before = Instant::now(); + for _ in 0..ITEMS { + 
std_list.pop_back(); + } + let elapsed = before.elapsed(); + println!( + "std linked list pop_back {elapsed:?}, {:?} avg per operation", + elapsed / ITEMS as u32 + ); + + let before = Instant::now(); + for _ in 0..ITEMS { + list.pop_tail(); + } + let elapsed = before.elapsed(); + println!( + "pingora linked list pop_tail total {elapsed:?}, {:?} avg per operation", + elapsed / ITEMS as u32 + ); +} diff --git a/pingora-lru/benches/bench_lru.rs b/pingora-lru/benches/bench_lru.rs new file mode 100644 index 0000000..25d8bbb --- /dev/null +++ b/pingora-lru/benches/bench_lru.rs @@ -0,0 +1,148 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use rand::distributions::WeightedIndex; +use rand::prelude::*; +use std::sync::Arc; +use std::thread; +use std::time::Instant; + +// Non-uniform distributions, 100 items, 10 of them are 100x more likely to appear +const WEIGHTS: &[usize] = &[ + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 100, 100, 100, + 100, 100, 100, 100, 100, 100, 100, +]; + +const ITERATIONS: usize = 5_000_000; +const THREADS: usize = 8; + +fn main() { + let lru = parking_lot::Mutex::new(lru::LruCache::<u64, ()>::unbounded()); + + let plru = pingora_lru::Lru::<(), 10>::with_capacity(1000, 100); + // populate first, then we bench access/promotion + for i in 0..WEIGHTS.len() { + lru.lock().put(i as u64, ()); + } + for i in 0..WEIGHTS.len() { + plru.admit(i as u64, (), 1); + } + + // single thread + let mut rng = thread_rng(); + let dist = WeightedIndex::new(WEIGHTS).unwrap(); + + let before = Instant::now(); + for _ in 0..ITERATIONS { + lru.lock().get(&(dist.sample(&mut rng) as u64)); + } + let elapsed = before.elapsed(); + println!( + "lru promote total {elapsed:?}, {:?} avg per operation", + elapsed / ITERATIONS as u32 + ); + + let before = Instant::now(); + for _ in 0..ITERATIONS { + plru.promote(dist.sample(&mut rng) as u64); + } + let elapsed = before.elapsed(); + println!( + "pingora lru promote total {elapsed:?}, {:?} avg per operation", + elapsed / ITERATIONS as u32 + ); + + let before = Instant::now(); + for _ in 0..ITERATIONS { + plru.promote_top_n(dist.sample(&mut rng) as u64, 10); + } + let elapsed = before.elapsed(); + println!( + "pingora lru promote_top_10 total {elapsed:?}, {:?} avg per operation", + elapsed / ITERATIONS as u32 + ); + + // concurrent + + let lru = Arc::new(lru); + let mut handlers = vec![]; + for i in 0..THREADS { + let lru = lru.clone(); + let handler = thread::spawn(move || { + let mut rng = thread_rng(); + let dist = WeightedIndex::new(WEIGHTS).unwrap(); + let before = Instant::now(); + for _ in 0..ITERATIONS { + lru.lock().get(&(dist.sample(&mut rng) as u64)); + } + let elapsed = before.elapsed(); + println!( + "lru promote total {elapsed:?}, {:?} avg per operation thread {i}", + 
elapsed / ITERATIONS as u32 + ); + }); + handlers.push(handler); + } + for thread in handlers { + thread.join().unwrap(); + } + + let plru = Arc::new(plru); + + let mut handlers = vec![]; + for i in 0..THREADS { + let plru = plru.clone(); + let handler = thread::spawn(move || { + let mut rng = thread_rng(); + let dist = WeightedIndex::new(WEIGHTS).unwrap(); + let before = Instant::now(); + for _ in 0..ITERATIONS { + plru.promote(dist.sample(&mut rng) as u64); + } + let elapsed = before.elapsed(); + println!( + "pingora lru promote total {elapsed:?}, {:?} avg per operation thread {i}", + elapsed / ITERATIONS as u32 + ); + }); + handlers.push(handler); + } + for thread in handlers { + thread.join().unwrap(); + } + + let mut handlers = vec![]; + for i in 0..THREADS { + let plru = plru.clone(); + let handler = thread::spawn(move || { + let mut rng = thread_rng(); + let dist = WeightedIndex::new(WEIGHTS).unwrap(); + let before = Instant::now(); + for _ in 0..ITERATIONS { + plru.promote_top_n(dist.sample(&mut rng) as u64, 10); + } + let elapsed = before.elapsed(); + println!( + "pingora lru promote_top_10 total {elapsed:?}, {:?} avg per operation thread {i}", + elapsed / ITERATIONS as u32 + ); + }); + handlers.push(handler); + } + for thread in handlers { + thread.join().unwrap(); + } +} diff --git a/pingora-lru/src/lib.rs b/pingora-lru/src/lib.rs new file mode 100644 index 0000000..a2ddf40 --- /dev/null +++ b/pingora-lru/src/lib.rs @@ -0,0 +1,661 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! An implementation of a LRU that focuses on memory efficiency, concurrency and persistence +//! +//! Features +//! - keys can have different sizes +//! - LRUs are sharded to avoid global locks. +//! - Memory layout and usage are optimized: small and no memory fragmentation + +pub mod linked_list; + +use linked_list::{LinkedList, LinkedListIter}; + +use hashbrown::HashMap; +use parking_lot::RwLock; +use std::sync::atomic::{AtomicUsize, Ordering}; + +/// The LRU with `N` shards +pub struct Lru<T, const N: usize> { + units: [RwLock<LruUnit<T>>; N], + weight: AtomicUsize, + weight_limit: usize, + len: AtomicUsize, + evicted_weight: AtomicUsize, + evicted_len: AtomicUsize, +} + +impl<T, const N: usize> Lru<T, N> { + /// Create an [Lru] with the given weight limit and predicted capacity. + /// + /// The capacity is per shard (for simplicity). 
So the total capacity = capacity * N + pub fn with_capacity(weight_limit: usize, capacity: usize) -> Self { + // use the unsafe code from ArrayVec just to init the array + let mut units = arrayvec::ArrayVec::<_, N>::new(); + for _ in 0..N { + units.push(RwLock::new(LruUnit::with_capacity(capacity))); + } + Lru { + // we did init all N elements so it is safe to unwrap + // map_err is needed because unwrap() would require LruUnit to impl Debug (TODO) + units: units.into_inner().map_err(|_| "").unwrap(), + weight: AtomicUsize::new(0), + weight_limit, + len: AtomicUsize::new(0), + evicted_weight: AtomicUsize::new(0), + evicted_len: AtomicUsize::new(0), + } + } + + /// Admit the key value to the [Lru] + /// + /// Return the shard index to which the asset is added + pub fn admit(&self, key: u64, data: T, weight: usize) -> usize { + let shard = get_shard(key, N); + let unit = &mut self.units[shard].write(); + + // Make sure weight is positive otherwise eviction won't work + // TODO: Probably should use NonZeroUsize instead + let weight = if weight == 0 { 1 } else { weight }; + + let old_weight = unit.admit(key, data, weight); + if old_weight != weight { + self.weight.fetch_add(weight, Ordering::Relaxed); + if old_weight > 0 { + self.weight.fetch_sub(old_weight, Ordering::Relaxed); + } else { + // Assume old_weight == 0 means a new item is admitted + self.len.fetch_add(1, Ordering::Relaxed); + } + } + shard + } + + /// Promote the key to the head of the LRU + /// + /// Return `true` if the key exists. + pub fn promote(&self, key: u64) -> bool { + self.units[get_shard(key, N)].write().access(key) + } + + /// Promote to the top n of the LRU + /// + /// This function reduces lock contention: it acquires a write lock only if the key is + /// outside the top n, and only a read lock when the key is already in the top n. + /// + /// Return `false` if the item doesn't exist + pub fn promote_top_n(&self, key: u64, top: usize) -> bool { + let unit = &self.units[get_shard(key, N)]; + if !unit.read().need_promote(key, top) { + return true; + } + unit.write().access(key) + } + + /// Evict at most one item from the given shard + /// + /// Return the evicted asset and its size if there is anything to evict + pub fn evict_shard(&self, shard: u64) -> Option<(T, usize)> { + let evicted = self.units[get_shard(shard, N)].write().evict(); + if let Some((_, weight)) = evicted.as_ref() { + self.weight.fetch_sub(*weight, Ordering::Relaxed); + self.len.fetch_sub(1, Ordering::Relaxed); + self.evicted_weight.fetch_add(*weight, Ordering::Relaxed); + self.evicted_len.fetch_add(1, Ordering::Relaxed); + } + evicted + } + + /// Evict the [Lru] until the overall weight is below the limit. + /// + /// Return a list of evicted items. + /// + /// The evicted items are randomly selected from all the shards.
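+ /// + /// A minimal sketch, following the tests below: + /// + /// ```ignore + /// let lru = Lru::<u32, 2>::with_capacity(4, 10); // weight limit of 4 + /// lru.admit(1, 1, 2); + /// lru.admit(2, 2, 2); + /// lru.admit(3, 3, 2); // total weight is now 6 > 4 + /// let evicted = lru.evict_to_limit(); // evicts until the weight is back within the limit + /// assert_eq!(evicted.len(), 1); + /// ```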
+ pub fn evict_to_limit(&self) -> Vec<(T, usize)> { + let mut evicted = vec![]; + let mut initial_weight = self.weight(); + let mut shard_seed = rand::random(); // start from a random shard + let mut empty_shard = 0; + + // Entries can be admitted to or removed from the LRU by others while the loop below runs + // Track initial_weight so that we don't over-evict because of entries admitted after the loop starts + // self.weight() is also checked so that we don't over-evict when entries are removed by others + while initial_weight > self.weight_limit + && self.weight() > self.weight_limit + && empty_shard < N + { + if let Some(i) = self.evict_shard(shard_seed) { + initial_weight -= i.1; + evicted.push(i) + } else { + empty_shard += 1; + } + // move on to the next shard + shard_seed += 1; + } + evicted + } + + /// Remove the given asset + pub fn remove(&self, key: u64) -> Option<(T, usize)> { + let removed = self.units[get_shard(key, N)].write().remove(key); + if let Some((_, weight)) = removed.as_ref() { + self.weight.fetch_sub(*weight, Ordering::Relaxed); + self.len.fetch_sub(1, Ordering::Relaxed); + } + removed + } + + /// Insert the item to the tail of this LRU + /// + /// Useful to recreate an LRU in most-to-least recently used order + pub fn insert_tail(&self, key: u64, data: T, weight: usize) -> bool { + if self.units[get_shard(key, N)] + .write() + .insert_tail(key, data, weight) + { + self.weight.fetch_add(weight, Ordering::Relaxed); + self.len.fetch_add(1, Ordering::Relaxed); + true + } else { + false + } + } + + /// Check the existence of a key without changing its order in the LRU + pub fn peek(&self, key: u64) -> bool { + self.units[get_shard(key, N)].read().peek(key).is_some() + } + + /// Return the current total weight + pub fn weight(&self) -> usize { + self.weight.load(Ordering::Relaxed) + } + + /// Return the total weight of items evicted from this [Lru]. + pub fn evicted_weight(&self) -> usize { + self.evicted_weight.load(Ordering::Relaxed) + } + + /// Return the total count of items evicted from this [Lru]. + pub fn evicted_len(&self) -> usize { + self.evicted_len.load(Ordering::Relaxed) + } + + /// The number of items inside this [Lru].
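+ /// + /// The counter is maintained with relaxed atomic operations, so the value is only + /// approximate while concurrent updates are in flight.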
+ #[allow(clippy::len_without_is_empty)] + pub fn len(&self) -> usize { + self.len.load(Ordering::Relaxed) + } + + /// Scan a shard with the given function F + pub fn iter_for_each<F>(&self, shard: usize, f: F) + where + F: FnMut((&T, usize)), + { + assert!(shard < N); + self.units[shard].read().iter().for_each(f); + } + + /// Get the total number of shards + pub const fn shards(&self) -> usize { + N + } + + /// Get the number of items inside a shard + pub fn shard_len(&self, shard: usize) -> usize { + self.units[shard].read().len() + } +} + +#[inline] +fn get_shard(key: u64, n_shards: usize) -> usize { + (key % n_shards as u64) as usize +} + +struct LruNode<T> { + data: T, + list_index: usize, + weight: usize, +} + +struct LruUnit<T> { + lookup_table: HashMap<u64, Box<LruNode<T>>>, + order: LinkedList, + used_weight: usize, +} + +impl<T> LruUnit<T> { + fn with_capacity(capacity: usize) -> Self { + LruUnit { + lookup_table: HashMap::with_capacity(capacity), + order: LinkedList::with_capacity(capacity), + used_weight: 0, + } + } + + pub fn peek(&self, key: u64) -> Option<&T> { + self.lookup_table.get(&key).map(|n| &n.data) + } + + // admit into the LRU, return the old weight if there was any + pub fn admit(&mut self, key: u64, data: T, weight: usize) -> usize { + if let Some(node) = self.lookup_table.get_mut(&key) { + let old_weight = node.weight; + if weight != old_weight { + self.used_weight += weight; + self.used_weight -= old_weight; + node.weight = weight; + } + node.data = data; + self.order.promote(node.list_index); + return old_weight; + } + self.used_weight += weight; + let list_index = self.order.push_head(key); + let node = Box::new(LruNode { + data, + list_index, + weight, + }); + self.lookup_table.insert(key, node); + 0 + } + + pub fn access(&mut self, key: u64) -> bool { + if let Some(node) = self.lookup_table.get(&key) { + self.order.promote(node.list_index); + true + } else { + false + } + } + + // Check if a key is already in the top n most recently used nodes.
+ // this is a heuristic to reduce promotion writes, which require exclusive locks, + // especially on very popular nodes + // NOTE: O(n) search here so limit needs to be small + pub fn need_promote(&self, key: u64, limit: usize) -> bool { + !self.order.exist_near_head(key, limit) + } + + // try to evict 1 node + pub fn evict(&mut self) -> Option<(T, usize)> { + self.order.pop_tail().map(|key| { + // unwrap is safe because we always insert in both the hashtable and the list + let node = self.lookup_table.remove(&key).unwrap(); + self.used_weight -= node.weight; + (node.data, node.weight) + }) + } + // TODO: scan the tail up to K elements to decide which ones to evict + + pub fn remove(&mut self, key: u64) -> Option<(T, usize)> { + self.lookup_table.remove(&key).map(|node| { + let list_key = self.order.remove(node.list_index); + assert_eq!(key, list_key); + (node.data, node.weight) + }) + } + + pub fn insert_tail(&mut self, key: u64, data: T, weight: usize) -> bool { + if self.lookup_table.contains_key(&key) { + return false; + } + let list_index = self.order.push_tail(key); + let node = Box::new(LruNode { + data, + list_index, + weight, + }); + self.lookup_table.insert(key, node); + true + } + + pub fn len(&self) -> usize { + assert_eq!(self.lookup_table.len(), self.order.len()); + self.lookup_table.len() + } + + #[cfg(test)] + pub fn used_weight(&self) -> usize { + self.used_weight + } + + pub fn iter(&self) -> LruUnitIter<'_, T> { + LruUnitIter { + unit: self, + iter: self.order.iter(), + } + } +} + +struct LruUnitIter<'a, T> { + unit: &'a LruUnit<T>, + iter: LinkedListIter<'a>, +} + +impl<'a, T> Iterator for LruUnitIter<'a, T> { + type Item = (&'a T, usize); + + fn next(&mut self) -> Option<Self::Item> { + self.iter.next().map(|key| { + // safe because items in the table and the list are always 1:1 + let node = self.unit.lookup_table.get(key).unwrap(); + (&node.data, node.weight) + }) + } + + fn size_hint(&self) -> (usize, Option<usize>) { + self.iter.size_hint() + } +} + +impl<'a, T> DoubleEndedIterator for LruUnitIter<'a, T> { + fn next_back(&mut self) -> Option<Self::Item> { + self.iter.next_back().map(|key| { + // safe because items in the table and the list are always 1:1 + let node = self.unit.lookup_table.get(key).unwrap(); + (&node.data, node.weight) + }) + } +} + +#[cfg(test)] +mod test_lru { + use super::*; + + fn assert_lru<T: Copy + PartialEq + std::fmt::Debug, const N: usize>( + lru: &Lru<T, N>, + values: &[T], + shard: usize, + ) { + let mut list_values = vec![]; + lru.iter_for_each(shard, |(v, _)| list_values.push(*v)); + assert_eq!(values, &list_values) + } + + #[test] + fn test_admit() { + let lru = Lru::<_, 2>::with_capacity(30, 10); + assert_eq!(lru.len(), 0); + + lru.admit(2, 2, 3); + assert_eq!(lru.len(), 1); + assert_eq!(lru.weight(), 3); + + lru.admit(2, 2, 1); + assert_eq!(lru.len(), 1); + assert_eq!(lru.weight(), 1); + + lru.admit(2, 2, 2); // admit again with different weight + assert_eq!(lru.len(), 1); + assert_eq!(lru.weight(), 2); + + lru.admit(3, 3, 3); + lru.admit(4, 4, 4); + + assert_eq!(lru.weight(), 2 + 3 + 4); + assert_eq!(lru.len(), 3); + } + + #[test] + fn test_promote() { + let lru = Lru::<_, 2>::with_capacity(30, 10); + + lru.admit(2, 2, 2); + lru.admit(3, 3, 3); + lru.admit(4, 4, 4); + lru.admit(5, 5, 5); + lru.admit(6, 6, 6); + assert_lru(&lru, &[6, 4, 2], 0); + assert_lru(&lru, &[5, 3], 1); + + assert!(lru.promote(3)); + assert_lru(&lru, &[3, 5], 1); + assert!(lru.promote(3)); + assert_lru(&lru, &[3, 5], 1); + +
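// key 2 maps to shard 0 (get_shard is key % N), so promoting it only reorders shard 0 +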
assert!(lru.promote(2)); + assert_lru(&lru, &[2, 6, 4], 0); + + assert!(!lru.promote(7)); // 7 doesn't exist + assert_lru(&lru, &[2, 6, 4], 0); + assert_lru(&lru, &[3, 5], 1); + + // promote 2 to top 1, already there + assert!(lru.promote_top_n(2, 1)); + assert_lru(&lru, &[2, 6, 4], 0); + + // promote 4 to top 3, already there + assert!(lru.promote_top_n(4, 3)); + assert_lru(&lru, &[2, 6, 4], 0); + + // promote 4 to top 2 + assert!(lru.promote_top_n(4, 2)); + assert_lru(&lru, &[4, 2, 6], 0); + + // promote 2 to top 1 + assert!(lru.promote_top_n(2, 1)); + assert_lru(&lru, &[2, 4, 6], 0); + + assert!(!lru.promote_top_n(7, 1)); // 7 doesn't exist + } + + #[test] + fn test_evict() { + let lru = Lru::<_, 2>::with_capacity(14, 10); + + // same weight to make the random eviction less random + lru.admit(2, 2, 2); + lru.admit(3, 3, 2); + lru.admit(4, 4, 4); + lru.admit(5, 5, 4); + lru.admit(6, 6, 2); + lru.admit(7, 7, 2); + + assert_lru(&lru, &[6, 4, 2], 0); + assert_lru(&lru, &[7, 5, 3], 1); + + assert_eq!(lru.weight(), 16); + assert_eq!(lru.len(), 6); + + let evicted = lru.evict_to_limit(); + assert_eq!(lru.weight(), 14); + assert_eq!(lru.len(), 5); + assert_eq!(lru.evicted_weight(), 2); + assert_eq!(lru.evicted_len(), 1); + assert_eq!(evicted.len(), 1); + assert_eq!(evicted[0].1, 2); //weight + assert!(evicted[0].0 == 2 || evicted[0].0 == 3); //either 2 or 3 are evicted + + let lru = Lru::<_, 2>::with_capacity(6, 10); + + // same weight random eviction less random + lru.admit(2, 2, 2); + lru.admit(3, 3, 2); + lru.admit(4, 4, 2); + lru.admit(5, 5, 2); + lru.admit(6, 6, 2); + lru.admit(7, 7, 2); + assert_eq!(lru.weight(), 12); + assert_eq!(lru.len(), 6); + + let evicted = lru.evict_to_limit(); + // NOTE: there is a low chance this test would fail see the TODO in evict_to_limit + assert_eq!(lru.weight(), 6); + assert_eq!(lru.len(), 3); + assert_eq!(lru.evicted_weight(), 6); + assert_eq!(lru.evicted_len(), 3); + assert_eq!(evicted.len(), 3); + } + + #[test] + fn test_remove() { + let lru = Lru::<_, 2>::with_capacity(30, 10); + lru.admit(2, 2, 2); + lru.admit(3, 3, 3); + lru.admit(4, 4, 4); + lru.admit(5, 5, 5); + lru.admit(6, 6, 6); + + assert_eq!(lru.weight(), 2 + 3 + 4 + 5 + 6); + assert_eq!(lru.len(), 5); + assert_lru(&lru, &[6, 4, 2], 0); + assert_lru(&lru, &[5, 3], 1); + + let node = lru.remove(6).unwrap(); + assert_eq!(node.0, 6); // data + assert_eq!(node.1, 6); // weight + assert_eq!(lru.weight(), 2 + 3 + 4 + 5); + assert_eq!(lru.len(), 4); + assert_lru(&lru, &[4, 2], 0); + + let node = lru.remove(3).unwrap(); + assert_eq!(node.0, 3); // data + assert_eq!(node.1, 3); // weight + assert_eq!(lru.weight(), 2 + 4 + 5); + assert_eq!(lru.len(), 3); + assert_lru(&lru, &[5], 1); + + assert!(lru.remove(7).is_none()); + } + + #[test] + fn test_peek() { + let lru = Lru::<_, 2>::with_capacity(30, 10); + lru.admit(2, 2, 2); + lru.admit(3, 3, 3); + lru.admit(4, 4, 4); + + assert!(lru.peek(4)); + assert!(lru.peek(3)); + assert!(lru.peek(2)); + + assert_lru(&lru, &[4, 2], 0); + assert_lru(&lru, &[3], 1); + } + + #[test] + fn test_insert_tail() { + let lru = Lru::<_, 2>::with_capacity(30, 10); + lru.admit(2, 2, 2); + lru.admit(3, 3, 3); + lru.admit(4, 4, 4); + lru.admit(5, 5, 5); + lru.admit(6, 6, 6); + + assert_eq!(lru.weight(), 2 + 3 + 4 + 5 + 6); + assert_eq!(lru.len(), 5); + assert_lru(&lru, &[6, 4, 2], 0); + assert_lru(&lru, &[5, 3], 1); + + assert!(lru.insert_tail(7, 7, 7)); + assert_eq!(lru.weight(), 2 + 3 + 4 + 5 + 6 + 7); + assert_eq!(lru.len(), 6); + assert_lru(&lru, &[5, 3, 7], 1); + + // ignore 
existing ones + assert!(!lru.insert_tail(6, 6, 7)); + } +} + +#[cfg(test)] +mod test_lru_unit { + use super::*; + + fn assert_lru<T: Copy + PartialEq + std::fmt::Debug>(lru: &LruUnit<T>, values: &[T]) { + let list_values: Vec<_> = lru.iter().map(|(v, _)| *v).collect(); + assert_eq!(values, &list_values) + } + + #[test] + fn test_admit() { + let mut lru = LruUnit::with_capacity(10); + assert_eq!(lru.len(), 0); + assert!(lru.peek(0).is_none()); + + lru.admit(2, 2, 1); + assert_eq!(lru.len(), 1); + assert_eq!(lru.peek(2).unwrap(), &2); + assert_eq!(lru.used_weight(), 1); + + lru.admit(2, 2, 2); // admit again with different weight + assert_eq!(lru.used_weight(), 2); + + lru.admit(3, 3, 3); + lru.admit(4, 4, 4); + + assert_eq!(lru.used_weight(), 2 + 3 + 4); + assert_lru(&lru, &[4, 3, 2]); + } + + #[test] + fn test_access() { + let mut lru = LruUnit::with_capacity(10); + + lru.admit(2, 2, 2); + lru.admit(3, 3, 3); + lru.admit(4, 4, 4); + assert_lru(&lru, &[4, 3, 2]); + + assert!(lru.access(3)); + assert_lru(&lru, &[3, 4, 2]); + assert!(lru.access(3)); + assert_lru(&lru, &[3, 4, 2]); + assert!(lru.access(2)); + assert_lru(&lru, &[2, 3, 4]); + + assert!(!lru.access(5)); // 5 doesn't exist + assert_lru(&lru, &[2, 3, 4]); + + assert!(!lru.need_promote(2, 1)); + assert!(lru.need_promote(3, 1)); + assert!(!lru.need_promote(4, 9999)); + } + + #[test] + fn test_evict() { + let mut lru = LruUnit::with_capacity(10); + + lru.admit(2, 2, 2); + lru.admit(3, 3, 3); + lru.admit(4, 4, 4); + assert_lru(&lru, &[4, 3, 2]); + + assert!(lru.access(3)); + assert!(lru.access(3)); + assert!(lru.access(2)); + assert_lru(&lru, &[2, 3, 4]); + + assert_eq!(lru.used_weight(), 2 + 3 + 4); + assert_eq!(lru.evict(), Some((4, 4))); + assert_eq!(lru.used_weight(), 2 + 3); + assert_lru(&lru, &[2, 3]); + + assert_eq!(lru.evict(), Some((3, 3))); + assert_eq!(lru.used_weight(), 2); + assert_lru(&lru, &[2]); + + assert_eq!(lru.evict(), Some((2, 2))); + assert_eq!(lru.used_weight(), 0); + assert_lru(&lru, &[]); + + assert_eq!(lru.evict(), None); + assert_eq!(lru.used_weight(), 0); + assert_lru(&lru, &[]); + } +} diff --git a/pingora-lru/src/linked_list.rs b/pingora-lru/src/linked_list.rs new file mode 100644 index 0000000..7664aaf --- /dev/null +++ b/pingora-lru/src/linked_list.rs @@ -0,0 +1,439 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Can't tell people you know Rust until you write a (doubly) linked list + +//! Doubly linked list +//! +//! Features +//! - Preallocate consecutive memory, no memory fragmentation. +//! - No shrink function: for Lru cache that grows to a certain size but never shrink. +//! - Relatively fast and efficient. + +// inspired by clru::FixedSizeList (Élie!) 
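+// +// A sketch of the layout used below: indices 0 and 1 are reserved for the head and tail +// sentinel nodes, while data nodes live in a preallocated Vec, addressed as index - OFFSET. +// The logical order comes from the prev/next links, not from the position in the Vec: +// +// HEAD (0) <-> node <-> node <-> ... <-> TAIL (1)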
+ +use std::mem::replace; + +type Index = usize; +const NULL: Index = usize::MAX; +const HEAD: Index = 0; +const TAIL: Index = 1; +const OFFSET: usize = 2; + +#[derive(Debug)] +struct Node { + pub(crate) prev: Index, + pub(crate) next: Index, + pub(crate) data: u64, +} + +// Functionally the same as vec![head, tail, data_nodes...] where head & tail are fixed and +// the rest of the data nodes can expand. Both head and tail can be accessed faster than by index +struct Nodes { + // we use these sentinel nodes to guard the head and tail of the list so that list + // manipulation is simpler (fewer if-else) + head: Node, + tail: Node, + data_nodes: Vec<Node>, +} + +impl Nodes { + fn with_capacity(capacity: usize) -> Self { + Nodes { + head: Node { + prev: NULL, + next: TAIL, + data: 0, + }, + tail: Node { + prev: HEAD, + next: NULL, + data: 0, + }, + data_nodes: Vec::with_capacity(capacity), + } + } + + fn new_node(&mut self, data: u64) -> Index { + const VEC_EXP_GROWTH_CAP: usize = 65536; + let node = Node { + prev: NULL, + next: NULL, + data, + }; + // Constrain the growth of the vec: a vec always doubles its capacity when it needs to grow. + // It could waste too much memory when it is already very large. + // Here we limit the memory waste to 10% once it grows beyond the cap. + // The amortized growth cost is O(n) beyond the max of the initial reserved capacity and + // the cap. But this list is for a limited-size LRU and we recycle released nodes, so + // hopefully insertions are rare beyond certain sizes + if self.data_nodes.capacity() > VEC_EXP_GROWTH_CAP + && self.data_nodes.capacity() - self.data_nodes.len() < 2 + { + self.data_nodes + .reserve_exact(self.data_nodes.capacity() / 10) + } + self.data_nodes.push(node); + self.data_nodes.len() - 1 + OFFSET + } + + fn len(&self) -> usize { + self.data_nodes.len() + } + + fn head(&self) -> &Node { + &self.head + } + + fn tail(&self) -> &Node { + &self.tail + } +} + +impl std::ops::Index<usize> for Nodes { + type Output = Node; + + fn index(&self, index: usize) -> &Self::Output { + match index { + HEAD => &self.head, + TAIL => &self.tail, + _ => &self.data_nodes[index - OFFSET], + } + } +} + +impl std::ops::IndexMut<usize> for Nodes { + fn index_mut(&mut self, index: usize) -> &mut Self::Output { + match index { + HEAD => &mut self.head, + TAIL => &mut self.tail, + _ => &mut self.data_nodes[index - OFFSET], + } + } +} + +/// Doubly linked list +pub struct LinkedList { + nodes: Nodes, + free: Vec<Index>, // to keep track of freed nodes to be used again +} +// Panics when indices used as parameters are invalid +// Indices returned by push_* are always valid. +impl LinkedList { + /// Create a [LinkedList] with the given predicted capacity.
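+ /// + /// The capacity is only a preallocation hint; the list can still grow past it.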
+ pub fn with_capacity(capacity: usize) -> Self { + LinkedList { + nodes: Nodes::with_capacity(capacity), + free: vec![], + } + } + + // Allocate a new node and return its index + // NOTE: this node is leaked if not used by the caller + fn new_node(&mut self, data: u64) -> Index { + if let Some(index) = self.free.pop() { + // have a free node, update its payload and return its index + self.nodes[index].data = data; + index + } else { + // create a new node + self.nodes.new_node(data) + } + } + + /// How many nodes in the list + #[allow(clippy::len_without_is_empty)] + pub fn len(&self) -> usize { + // exclude the 2 sentinels + self.nodes.len() - self.free.len() + } + + fn valid_index(&self, index: Index) -> bool { + index != HEAD && index != TAIL && index < self.nodes.len() + OFFSET + // TODO: check node prev/next not NULL + // TODO: debug_check index not in self.free + } + + fn node(&self, index: Index) -> Option<&Node> { + if self.valid_index(index) { + Some(&self.nodes[index]) + } else { + None + } + } + + /// Peek into the list + pub fn peek(&self, index: Index) -> Option<u64> { + self.node(index).map(|n| n.data) + } + + // safe because index still needs to be in the range of the vec + fn peek_unchecked(&self, index: Index) -> &u64 { + &self.nodes[index].data + } + + /// Whether the value exists close to the head of the list (within the first search_limit nodes) + // It can be done via iter().take().find() but this is cheaper + pub fn exist_near_head(&self, value: u64, search_limit: usize) -> bool { + let mut current_node = HEAD; + for _ in 0..search_limit { + current_node = self.nodes[current_node].next; + if current_node == TAIL { + return false; + } + if self.nodes[current_node].data == value { + return true; + } + } + false + } + + // put a node right after the node at `at` + fn insert_after(&mut self, node_index: Index, at: Index) { + assert!(at != TAIL && at != node_index); // can't insert after the tail or after itself + + let next = replace(&mut self.nodes[at].next, node_index); + + let node = &mut self.nodes[node_index]; + node.next = next; + node.prev = at; + + self.nodes[next].prev = node_index; + } + + /// Put the data at the head of the list. + pub fn push_head(&mut self, data: u64) -> Index { + let new_node_index = self.new_node(data); + self.insert_after(new_node_index, HEAD); + new_node_index + } + + /// Put the data at the tail of the list.
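+ /// Returns the index of the new node, like [Self::push_head]; the index stays valid + /// until the node is removed.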
+ pub fn push_tail(&mut self, data: u64) -> Index { + let new_node_index = self.new_node(data); + self.insert_after(new_node_index, self.nodes.tail().prev); + new_node_index + } + + // lift the node out of the linked list, to either delete it or insert to another place + // NOTE: the node is leaked if not used by the caller + fn lift(&mut self, index: Index) -> u64 { + // can't touch the sentinels + assert!(index != HEAD && index != TAIL); + + let node = &mut self.nodes[index]; + + // zero out the pointers, useful in case we try to access a freed node + let prev = replace(&mut node.prev, NULL); + let next = replace(&mut node.next, NULL); + let data = node.data; + + // make sure we are accessing a node in the list, not freed already + assert!(prev != NULL && next != NULL); + + self.nodes[prev].next = next; + self.nodes[next].prev = prev; + + data + } + + /// Remove the node at the index, and return the value + pub fn remove(&mut self, index: Index) -> u64 { + self.free.push(index); + self.lift(index) + } + + /// Remove the tail of the list + pub fn pop_tail(&mut self) -> Option<u64> { + let data_tail = self.nodes.tail().prev; + if data_tail == HEAD { + None // empty list + } else { + Some(self.remove(data_tail)) + } + } + + /// Put the node at the index to the head + pub fn promote(&mut self, index: Index) { + if self.nodes.head().next == index { + return; // already head + } + self.lift(index); + self.insert_after(index, HEAD); + } + + fn next(&self, index: Index) -> Index { + self.nodes[index].next + } + + fn prev(&self, index: Index) -> Index { + self.nodes[index].prev + } + + /// Get the head of the list + pub fn head(&self) -> Option<Index> { + let data_head = self.nodes.head().next; + if data_head == TAIL { + None + } else { + Some(data_head) + } + } + + /// Get the tail of the list + pub fn tail(&self) -> Option<Index> { + let data_tail = self.nodes.tail().prev; + if data_tail == HEAD { + None + } else { + Some(data_tail) + } + } + + /// Iterate over the list + pub fn iter(&self) -> LinkedListIter<'_> { + LinkedListIter { + list: self, + head: HEAD, + tail: TAIL, + len: self.len(), + } + } +} + +/// The iter over the list +pub struct LinkedListIter<'a> { + list: &'a LinkedList, + head: Index, + tail: Index, + len: usize, +} + +impl<'a> Iterator for LinkedListIter<'a> { + type Item = &'a u64; + + fn next(&mut self) -> Option<Self::Item> { + let next_index = self.list.next(self.head); + if next_index == TAIL || next_index == NULL { + None + } else { + self.head = next_index; + self.len -= 1; + Some(self.list.peek_unchecked(next_index)) + } + } + + fn size_hint(&self) -> (usize, Option<usize>) { + (self.len, Some(self.len)) + } +} + +impl<'a> DoubleEndedIterator for LinkedListIter<'a> { + fn next_back(&mut self) -> Option<Self::Item> { + let prev_index = self.list.prev(self.tail); + if prev_index == HEAD || prev_index == NULL { + None + } else { + self.tail = prev_index; + self.len -= 1; + Some(self.list.peek_unchecked(prev_index)) + } + } +} + +#[cfg(test)] +mod test { + use super::*; + + // assert the list is the same as `values` + fn assert_list(list: &LinkedList, values: &[u64]) { + let list_values: Vec<_> = list.iter().copied().collect(); + assert_eq!(values, &list_values) + } + + fn assert_list_reverse(list: &LinkedList, values: &[u64]) { + let list_values: Vec<_> = list.iter().rev().copied().collect(); + assert_eq!(values, &list_values) + } + + #[test] + fn test_insert() { + let mut list = LinkedList::with_capacity(10); + assert_eq!(list.len(), 0); + 
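// index 2 would hold the first data node (indices 0 and 1 are the sentinels) +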
assert!(list.node(2).is_none()); + assert_eq!(list.head(), None); + assert_eq!(list.tail(), None); + + let index1 = list.push_head(2); + assert_eq!(list.len(), 1); + assert_eq!(list.peek(index1).unwrap(), 2); + + let index2 = list.push_head(3); + assert_eq!(list.head(), Some(index2)); + assert_eq!(list.tail(), Some(index1)); + + let index3 = list.push_tail(4); + assert_eq!(list.head(), Some(index2)); + assert_eq!(list.tail(), Some(index3)); + + assert_list(&list, &[3, 2, 4]); + assert_list_reverse(&list, &[4, 2, 3]); + } + + #[test] + fn test_pop() { + let mut list = LinkedList::with_capacity(10); + list.push_head(2); + list.push_head(3); + list.push_tail(4); + assert_list(&list, &[3, 2, 4]); + assert_eq!(list.pop_tail(), Some(4)); + assert_eq!(list.pop_tail(), Some(2)); + assert_eq!(list.pop_tail(), Some(3)); + assert_eq!(list.pop_tail(), None); + } + + #[test] + fn test_promote() { + let mut list = LinkedList::with_capacity(10); + let index2 = list.push_head(2); + let index3 = list.push_head(3); + let index4 = list.push_tail(4); + assert_list(&list, &[3, 2, 4]); + + list.promote(index3); + assert_list(&list, &[3, 2, 4]); + + list.promote(index2); + assert_list(&list, &[2, 3, 4]); + + list.promote(index4); + assert_list(&list, &[4, 2, 3]); + } + + #[test] + fn test_exist_near_head() { + let mut list = LinkedList::with_capacity(10); + list.push_head(2); + list.push_head(3); + list.push_tail(4); + assert_list(&list, &[3, 2, 4]); + + assert!(!list.exist_near_head(4, 1)); + assert!(!list.exist_near_head(4, 2)); + assert!(list.exist_near_head(4, 3)); + assert!(list.exist_near_head(4, 4)); + assert!(list.exist_near_head(4, 99999)); + } +} diff --git a/pingora-memory-cache/Cargo.toml b/pingora-memory-cache/Cargo.toml new file mode 100644 index 0000000..d51268b --- /dev/null +++ b/pingora-memory-cache/Cargo.toml @@ -0,0 +1,27 @@ +[package] +name = "pingora-memory-cache" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["algorithms", "caching"] +keywords = ["async", "cache", "pingora"] +description = """ +An async in-memory cache with cache stampede protection. +""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_memory_cache" +path = "src/lib.rs" + +[dependencies] +TinyUFO = { version = "0.1.0", path = "../tinyufo" } +ahash = { workspace = true } +tokio = { workspace = true, features = ["sync"] } +async-trait = { workspace = true } +pingora-error = { version = "0.1.0", path = "../pingora-error" } +log = { workspace = true } +parking_lot = "0" +pingora-timeout = { version = "0.1.0", path = "../pingora-timeout" } diff --git a/pingora-memory-cache/LICENSE b/pingora-memory-cache/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-memory-cache/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/pingora-memory-cache/src/lib.rs b/pingora-memory-cache/src/lib.rs
new file mode 100644
index 0000000..f5c037c
--- /dev/null
+++ b/pingora-memory-cache/src/lib.rs
@@ -0,0 +1,249 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use ahash::RandomState;
+use std::hash::Hash;
+use std::marker::PhantomData;
+use std::time::{Duration, Instant};
+
+use tinyufo::TinyUfo;
+
+mod read_through;
+pub use read_through::{Lookup, MultiLookup, RTCache};
+
+#[derive(Debug, PartialEq, Eq)]
+/// [CacheStatus] indicates the response type for a query.
+pub enum CacheStatus {
+    /// The key was found in the cache.
+    Hit,
+    /// The key was not found.
+    Miss,
+    /// The key was found but it was expired.
+    Expired,
+    /// The key was not initially found but was found after awaiting a lock.
+    LockHit,
+}
+
+impl CacheStatus {
+    /// Return the string representation for [CacheStatus].
+    pub fn as_str(&self) -> &str {
+        match self {
+            Self::Hit => "hit",
+            Self::Miss => "miss",
+            Self::Expired => "expired",
+            Self::LockHit => "lock_hit",
+        }
+    }
+}
+
+#[derive(Debug, Clone)]
+struct Node<T: Clone> {
+    pub value: T,
+    expire_on: Option<Instant>,
+}
+
+impl<T: Clone> Node<T> {
+    fn new(value: T, ttl: Option<Duration>) -> Self {
+        let expire_on = match ttl {
+            Some(t) => Instant::now().checked_add(t),
+            None => None,
+        };
+        Node { value, expire_on }
+    }
+
+    fn will_expire_at(&self, time: &Instant) -> bool {
+        match self.expire_on.as_ref() {
+            Some(t) => t <= time,
+            None => false,
+        }
+    }
+
+    fn is_expired(&self) -> bool {
+        self.will_expire_at(&Instant::now())
+    }
+}
+
+/// A high-performance in-memory cache with S3-FIFO + TinyLFU
+pub struct MemoryCache<K: Hash, T: Clone> {
+    store: TinyUfo<u64, Node<T>>,
+    _key_type: PhantomData<K>,
+    pub(crate) hasher: RandomState,
+}
+
+impl<K: Hash, T: Clone + Send + Sync> MemoryCache<K, T> {
+    /// Create a new [MemoryCache] with the given size.
+    pub fn new(size: usize) -> Self {
+        MemoryCache {
+            store: TinyUfo::new(size, size),
+            _key_type: PhantomData,
+            hasher: RandomState::new(),
+        }
+    }
+
+    /// Fetch the key and return its value in addition to a [CacheStatus].
+    pub fn get(&self, key: &K) -> (Option<T>, CacheStatus) {
+        let hashed_key = self.hasher.hash_one(key);
+
+        if let Some(n) = self.store.get(&hashed_key) {
+            if !n.is_expired() {
+                (Some(n.value), CacheStatus::Hit)
+            } else {
+                // TODO: consider returning the stale value
+                (None, CacheStatus::Expired)
+            }
+        } else {
+            (None, CacheStatus::Miss)
+        }
+    }
+
+    /// Insert a key and value pair with an optional TTL into the cache.
+    ///
+    /// An item with a TTL of zero is not inserted.
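+    ///
+    /// An illustrative sketch of the put/get round trip (the `&str` key type is an
+    /// assumption for the example; any `Hash` key type works):
+    ///
+    /// ```ignore
+    /// let cache: MemoryCache<&str, i32> = MemoryCache::new(100);
+    /// cache.put(&"answer", 42, Some(Duration::from_secs(10)));
+    /// let (value, status) = cache.get(&"answer");
+    /// assert_eq!(value, Some(42));
+    /// assert_eq!(status, CacheStatus::Hit);
+    /// ```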
+    pub fn put(&self, key: &K, value: T, ttl: Option<Duration>) {
+        if let Some(t) = ttl {
+            if t.is_zero() {
+                return;
+            }
+        }
+        let hashed_key = self.hasher.hash_one(key);
+        let node = Node::new(value, ttl);
+        // weight is always 1 for now
+        self.store.put(hashed_key, node, 1);
+    }
+
+    pub(crate) fn force_put(&self, key: &K, value: T, ttl: Option<Duration>) {
+        if let Some(t) = ttl {
+            if t.is_zero() {
+                return;
+            }
+        }
+        let hashed_key = self.hasher.hash_one(key);
+        let node = Node::new(value, ttl);
+        // weight is always 1 for now
+        self.store.force_put(hashed_key, node, 1);
+    }
+
+    /// This is equivalent to [MemoryCache::get] but for an arbitrary number of keys.
+    pub fn multi_get<'a, I>(&self, keys: I) -> Vec<(Option<T>, CacheStatus)>
+    where
+        I: Iterator<Item = &'a K>,
+        K: 'a,
+    {
+        let mut resp = Vec::with_capacity(keys.size_hint().0);
+        for key in keys {
+            resp.push(self.get(key));
+        }
+        resp
+    }
+
+    /// Same as [MemoryCache::multi_get] but also returns the keys that are missing from the cache.
+    pub fn multi_get_with_miss<'a, I>(&self, keys: I) -> (Vec<(Option<T>, CacheStatus)>, Vec<&'a K>)
+    where
+        I: Iterator<Item = &'a K>,
+        K: 'a,
+    {
+        let mut resp = Vec::with_capacity(keys.size_hint().0);
+        let mut missed = Vec::with_capacity(keys.size_hint().0 / 2);
+        for key in keys {
+            let (lookup, cache_status) = self.get(key);
+            if lookup.is_none() {
+                missed.push(key);
+            }
+            resp.push((lookup, cache_status));
+        }
+        (resp, missed)
+    }
+
+    // TODO: evict expired first
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::thread::sleep;
+
+    #[test]
+    fn test_get() {
+        let cache: MemoryCache<i32, ()> = MemoryCache::new(10);
+        let (res, hit) = cache.get(&1);
+        assert_eq!(res, None);
+        assert_eq!(hit, CacheStatus::Miss);
+    }
+
+    #[test]
+    fn test_put_get() {
+        let cache: MemoryCache<i32, i32> = MemoryCache::new(10);
+        let (res, hit) = cache.get(&1);
+        assert_eq!(res, None);
+        assert_eq!(hit, CacheStatus::Miss);
+        cache.put(&1, 2, None);
+        let (res, hit) = cache.get(&1);
+        assert_eq!(res.unwrap(), 2);
+        assert_eq!(hit, CacheStatus::Hit);
+    }
+
+    #[test]
+    fn test_get_expired() {
+        let cache: MemoryCache<i32, i32> = MemoryCache::new(10);
+        let (res, hit) = cache.get(&1);
+        assert_eq!(res, None);
+        assert_eq!(hit, CacheStatus::Miss);
+        cache.put(&1, 2, Some(Duration::from_secs(1)));
+        sleep(Duration::from_millis(1100));
+        let (res, hit) = cache.get(&1);
+        assert_eq!(res, None);
+        assert_eq!(hit, CacheStatus::Expired);
+    }
+
+    #[test]
+    fn test_eviction() {
+        let cache: MemoryCache<i32, i32> = MemoryCache::new(2);
+        cache.put(&1, 2, None);
+        cache.put(&2, 4, None);
+        cache.put(&3, 6, None);
+        let (res, hit) = cache.get(&1);
+        assert_eq!(res, None);
+        assert_eq!(hit, CacheStatus::Miss);
+        let (res, hit) = cache.get(&2);
+        assert_eq!(res.unwrap(), 4);
+        assert_eq!(hit, CacheStatus::Hit);
+        let (res, hit) = cache.get(&3);
+        assert_eq!(res.unwrap(), 6);
+        assert_eq!(hit, CacheStatus::Hit);
+    }
+
+    #[test]
+    fn test_multi_get() {
+        let cache: MemoryCache<i32, i32> = MemoryCache::new(10);
+        cache.put(&2, -2, None);
+        let keys: Vec<i32> = vec![1, 2, 3];
+        let resp = cache.multi_get(keys.iter());
+        assert_eq!(resp[0].0, None);
+        assert_eq!(resp[0].1, CacheStatus::Miss);
+        assert_eq!(resp[1].0.unwrap(), -2);
+        assert_eq!(resp[1].1, CacheStatus::Hit);
+        assert_eq!(resp[2].0, None);
+        assert_eq!(resp[2].1, CacheStatus::Miss);
+
+        let (resp, missed) = cache.multi_get_with_miss(keys.iter());
+        assert_eq!(resp[0].0, None);
+        assert_eq!(resp[0].1, CacheStatus::Miss);
+        assert_eq!(resp[1].0.unwrap(), -2);
+        assert_eq!(resp[1].1, CacheStatus::Hit);
+        assert_eq!(resp[2].0, None);
+        assert_eq!(resp[2].1, CacheStatus::Miss);
+        assert_eq!(missed[0], &1);
+        assert_eq!(missed[1], &3);
+    }
+}
diff --git a/pingora-memory-cache/src/read_through.rs b/pingora-memory-cache/src/read_through.rs
new file mode 100644
index 0000000..05a8d89
--- /dev/null
+++ b/pingora-memory-cache/src/read_through.rs
@@ -0,0 +1,689 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! An async read-through cache where cache misses are populated via the provided
+//! async callback.
+
+use super::{CacheStatus, MemoryCache};
+
+use async_trait::async_trait;
+use log::warn;
+use parking_lot::RwLock;
+use pingora_error::{Error, ErrorTrait};
+use std::collections::HashMap;
+use std::hash::Hash;
+use std::marker::PhantomData;
+use std::sync::Arc;
+use std::time::{Duration, Instant};
+use tokio::sync::Semaphore;
+
+struct CacheLock {
+    pub lock_start: Instant,
+    pub lock: Semaphore,
+}
+
+impl CacheLock {
+    pub fn new_arc() -> Arc<Self> {
+        Arc::new(CacheLock {
+            lock: Semaphore::new(0),
+            lock_start: Instant::now(),
+        })
+    }
+
+    pub fn too_old(&self, age: Option<&Duration>) -> bool {
+        match age {
+            Some(t) => Instant::now() - self.lock_start > *t,
+            None => false,
+        }
+    }
+}
+
+#[async_trait]
+/// [Lookup] defines the caching behavior that the implementor needs. The `extra` field can be used
+/// to define any additional metadata that the implementor uses to determine cache eligibility.
+///
+/// # Examples
+///
+/// ```ignore
+/// use async_trait::async_trait;
+/// use pingora_error::{ErrorTrait, Result};
+/// use std::time::Duration;
+///
+/// struct MyLookup;
+///
+/// #[async_trait]
+/// impl Lookup<usize, usize, ()> for MyLookup {
+///     async fn lookup(
+///         _key: &usize,
+///         _extra: Option<&()>,
+///     ) -> Result<(usize, Option<Duration>), Box<dyn ErrorTrait + Send + Sync>> {
+///         // Define your business logic here.
+///         Ok((1, None))
+///     }
+/// }
+/// ```
+pub trait Lookup<K, T, S> {
+    /// Return a value and an optional TTL for the given key.
+    async fn lookup(
+        key: &K,
+        extra: Option<&S>,
+    ) -> Result<(T, Option<Duration>), Box<dyn ErrorTrait + Send + Sync>>
+    where
+        K: 'async_trait,
+        S: 'async_trait;
+}
+
+#[async_trait]
+/// [MultiLookup] is similar to [Lookup]. Implement this trait if the system being queried supports
+/// looking up multiple keys in a single API call.
+pub trait MultiLookup<K, T, S> {
+    /// Like [Lookup::lookup] but for an arbitrary number of keys.
+    async fn multi_lookup(
+        keys: &[&K],
+        extra: Option<&S>,
+    ) -> Result<Vec<(T, Option<Duration>)>, Box<dyn ErrorTrait + Send + Sync>>
+    where
+        K: 'async_trait,
+        S: 'async_trait;
+}
+
+const LOOKUP_ERR_MSG: &str = "RTCache: lookup error";
+
+/// A read-through in-memory cache on top of [MemoryCache]
+///
+/// Instead of providing a `put` function, [RTCache] requires a type which implements [Lookup] to
+/// be automatically called during a cache miss to populate the cache. This is useful when trying
+/// to cache queries to external systems such as DNS or databases.
+///
+/// Lookup coalescing is provided so that multiple concurrent lookups for the same key result in
+/// only one lookup callback.
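+///
+/// An illustrative sketch (`MyLookup` is the hypothetical implementor from the [Lookup]
+/// example above, not part of this crate):
+///
+/// ```ignore
+/// let cache: RTCache<usize, usize, MyLookup, ()> = RTCache::new(100, None, None);
+/// // Concurrent get() calls for the same key trigger at most one MyLookup::lookup().
+/// let (value, status) = cache.get(&1, None, None).await;
+/// ```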
+pub struct RTCache<K, T, CB, S>
+where
+    K: Hash + Send,
+    T: Clone + Send,
+{
+    inner: MemoryCache<K, T>,
+    _callback: PhantomData<CB>,
+    lockers: RwLock<HashMap<u64, Arc<CacheLock>>>,
+    lock_age: Option<Duration>,
+    lock_timeout: Option<Duration>,
+    phantom: PhantomData<S>,
+}
+
+impl<K, T, CB, S> RTCache<K, T, CB, S>
+where
+    K: Hash + Send,
+    T: Clone + Send + Sync,
+{
+    /// Create a new [RTCache] of the given size. `lock_age` defines how long a lock is valid for.
+    /// `lock_timeout` is used to stop a lookup from holding on to the key for too long.
+    pub fn new(size: usize, lock_age: Option<Duration>, lock_timeout: Option<Duration>) -> Self {
+        RTCache {
+            inner: MemoryCache::new(size),
+            lockers: RwLock::new(HashMap::new()),
+            _callback: PhantomData,
+            lock_age,
+            lock_timeout,
+            phantom: PhantomData,
+        }
+    }
+}
+
+impl<K, T, CB, S> RTCache<K, T, CB, S>
+where
+    K: Hash + Send,
+    T: Clone + Send + Sync,
+    CB: Lookup<K, T, S>,
+{
+    /// Query the cache for a given key. On a cache miss, the value is populated via the [Lookup]
+    /// callback; if the lookup does not return a TTL, the given `ttl` value is used.
+    pub async fn get(
+        &self,
+        key: &K,
+        ttl: Option<Duration>,
+        extra: Option<&S>,
+    ) -> (Result<T, Box<Error>>, CacheStatus) {
+        let (result, cache_state) = self.inner.get(key);
+        if let Some(result) = result {
+            /* cache hit */
+            return (Ok(result), cache_state);
+        }
+
+        let hashed_key = self.inner.hasher.hash_one(key);
+
+        /* Cache miss, try to lock the lookup. Check if there is already a lookup */
+        let my_lock = {
+            let lockers = self.lockers.read();
+            /* clone the Arc */
+            lockers.get(&hashed_key).cloned()
+        }; // read lock dropped
+
+        /* try to insert a cache lock into the locker */
+        let (my_write, my_read) = match my_lock {
+            // TODO: use a union
+            Some(lock) => {
+                /* There is an ongoing lookup to the same key */
+                if lock.too_old(self.lock_age.as_ref()) {
+                    (None, None)
+                } else {
+                    (None, Some(lock))
+                }
+            }
+            None => {
+                let mut lockers = self.lockers.write();
+                match lockers.get(&hashed_key) {
+                    Some(lock) => {
+                        /* another lookup to the same key got the write lock to the locker first */
+                        if lock.too_old(self.lock_age.as_ref()) {
+                            (None, None)
+                        } else {
+                            (None, Some(lock.clone()))
+                        }
+                    }
+                    None => {
+                        let new_lock = CacheLock::new_arc();
+                        let new_lock2 = new_lock.clone();
+                        lockers.insert(hashed_key, new_lock2);
+                        (Some(new_lock), None)
+                    }
+                } // write lock dropped
+            }
+        };
+
+        if my_read.is_some() {
+            /* another task will do the lookup */
+
+            let my_lock = my_read.unwrap();
+            /* if available_permits > 0, the writer is done */
+            if my_lock.lock.available_permits() == 0 {
+                /* block here to wait for the writer to finish the lookup */
+                let lock_fut = my_lock.lock.acquire();
+                let timed_out = match self.lock_timeout {
+                    Some(t) => pingora_timeout::timeout(t, lock_fut).await.is_err(),
+                    None => {
+                        let _ = lock_fut.await;
+                        false
+                    }
+                };
+                if timed_out {
+                    let value = CB::lookup(key, extra).await;
+                    return match value {
+                        Ok((v, _ttl)) => (Ok(v), cache_state),
+                        Err(e) => {
+                            let mut err = Error::new_str(LOOKUP_ERR_MSG);
+                            err.set_cause(e);
+                            (Err(err), cache_state)
+                        }
+                    };
+                }
+            } // permit returned here
+
+            let (result, cache_state) = self.inner.get(key);
+            if let Some(result) = result {
+                /* cache lock hit, slow as a miss */
+                (Ok(result), CacheStatus::LockHit)
+            } else {
+                /* an error probably happened during the actual lookup */
+                warn!(
+                    "RTCache: no result after read lock, cache status: {:?}",
+                    cache_state
+                );
+                match CB::lookup(key, extra).await {
+                    Ok((v, new_ttl)) => {
+                        self.inner.force_put(key, v.clone(), new_ttl.or(ttl));
+                        (Ok(v), cache_state)
+                    }
+                    Err(e) => {
+                        let mut err = Error::new_str(LOOKUP_ERR_MSG);
+                        err.set_cause(e);
+                        (Err(err), cache_state)
+                    }
+                }
+            }
+        } else {
+            /* this one will do the lookup, either because it got the write lock or because the
+             * lock age was reached */
+            let value = CB::lookup(key, extra).await;
+            let ret = match value {
+                Ok((v, new_ttl)) => {
+                    /* Don't put() if the lock age is too old, to avoid too many concurrent writes */
+                    if my_write.is_some() {
+                        self.inner.force_put(key, v.clone(), new_ttl.or(ttl));
+                    }
+                    (Ok(v), cache_state) // the original cache_state: Miss or Expired
+                }
+                Err(e) => {
+                    let mut err = Error::new_str(LOOKUP_ERR_MSG);
+                    err.set_cause(e);
+                    (Err(err), cache_state)
+                }
+            };
+            if my_write.is_some() {
+                /* add permits so that the readers can start. Any number of permits will do,
+                 * since readers will return permits right away. */
+                my_write.unwrap().lock.add_permits(10);
+
+                {
+                    // remove the lock from the locker
+                    let mut lockers = self.lockers.write();
+                    lockers.remove(&hashed_key);
+                } // write lock dropped here
+            }
+
+            ret
+        }
+    }
+}
+
+impl<K, T, CB, S> RTCache<K, T, CB, S>
+where
+    K: Hash + Send,
+    T: Clone + Send + Sync,
+    CB: MultiLookup<K, T, S>,
+{
+    /// Same behavior as [RTCache::get] but for an arbitrary number of keys.
+    ///
+    /// If there are keys that are missing from the cache, `multi_lookup` is invoked to populate
+    /// the cache before returning the final results. This is useful if your type supports batch
+    /// queries.
+    ///
+    /// To avoid deadlocks for the same key across concurrent `multi_get` calls,
+    /// this function does not provide lookup coalescing.
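+    ///
+    /// An illustrative sketch (`MyBatchLookup` is a hypothetical type implementing
+    /// [MultiLookup], not part of this crate):
+    ///
+    /// ```ignore
+    /// let cache: RTCache<i32, i32, MyBatchLookup, ()> = RTCache::new(100, None, None);
+    /// // any keys missing from the cache are fetched in a single multi_lookup() call
+    /// let results = cache.multi_get([1, 2, 3].iter(), None, None).await?;
+    /// ```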
+    pub async fn multi_get<'a, I>(
+        &self,
+        keys: I,
+        ttl: Option<Duration>,
+        extra: Option<&S>,
+    ) -> Result<Vec<(T, CacheStatus)>, Box<Error>>
+    where
+        I: Iterator<Item = &'a K>,
+        K: 'a,
+    {
+        let size = keys.size_hint().0;
+        let (hits, misses) = self.inner.multi_get_with_miss(keys);
+        let mut final_results = Vec::with_capacity(size);
+        let miss_results = if !misses.is_empty() {
+            match CB::multi_lookup(&misses, extra).await {
+                Ok(miss_results) => {
+                    // assert! here to prevent an index panic when building the results:
+                    // final_results has the full list of misses but miss_results might not
+                    assert!(
+                        miss_results.len() == misses.len(),
+                        "multi_lookup() failed to return the matching number of results"
+                    );
+                    /* put the misses into the cache */
+                    for item in misses.iter().zip(miss_results.iter()) {
+                        self.inner
+                            .force_put(item.0, (item.1).0.clone(), (item.1).1.or(ttl));
+                    }
+                    miss_results
+                }
+                Err(e) => {
+                    /* NOTE: we give up the hits when we encounter a lookup error */
+                    let mut err = Error::new_str(LOOKUP_ERR_MSG);
+                    err.set_cause(e);
+                    return Err(err);
+                }
+            }
+        } else {
+            vec![] // to keep the rest of the code simple, allocating one unused empty vec should be fine
+        };
+        /* fill in final_results */
+        let mut n_miss = 0;
+        for item in hits {
+            match item.0 {
+                Some(v) => final_results.push((v, item.1)),
+                None => {
+                    final_results // miss_results.len() == the number of None results (asserted above)
+                        .push((miss_results[n_miss].0.clone(), CacheStatus::Miss));
+                    n_miss += 1;
+                }
+            }
+        }
+        Ok(final_results)
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use atomic::AtomicI32;
+    use std::sync::atomic;
+
+    #[derive(Clone, Debug)]
+    struct ExtraOpt {
+        error: bool,
+        empty: bool,
+        delay_for: Option<Duration>,
+        used: Arc<AtomicI32>,
+    }
+
+    struct TestCB();
+
+    #[async_trait]
+    impl Lookup<i32, i32, ExtraOpt> for TestCB {
+        async fn lookup(
+            _key: &i32,
+            extra: Option<&ExtraOpt>,
+        ) -> Result<(i32, Option<Duration>), Box<dyn ErrorTrait + Send + Sync>> {
+            // this function returns the number of times lookup() has been called
+            let mut used = 0;
+            if let Some(e) = extra {
+                used = e.used.fetch_add(1, atomic::Ordering::Relaxed) + 1;
+                if e.error {
+                    return Err(Error::new_str("test error"));
+                }
+                if let Some(delay_for) = e.delay_for {
+                    tokio::time::sleep(delay_for).await;
+                }
+            }
+            Ok((used, None))
+        }
+    }
+
+    #[async_trait]
+    impl MultiLookup<i32, i32, ExtraOpt> for TestCB {
+        async fn multi_lookup(
+            keys: &[&i32],
+            extra: Option<&ExtraOpt>,
+        ) -> Result<Vec<(i32, Option<Duration>)>, Box<dyn ErrorTrait + Send + Sync>> {
+            let mut resp = vec![];
+            if let Some(extra) = extra {
+                if extra.empty {
+                    return Ok(resp);
+                }
+            }
+            for key in keys {
+                resp.push((**key, None));
+            }
+            Ok(resp)
+        }
+    }
+
+    #[tokio::test]
+    async fn test_basic_get() {
+        let cache: RTCache<i32, i32, TestCB, ExtraOpt> = RTCache::new(10, None, None);
+        let opt = Some(ExtraOpt {
+            error: false,
+            empty: false,
+            delay_for: None,
+            used: Arc::new(AtomicI32::new(0)),
+        });
+        let (res, hit) = cache.get(&1, None, opt.as_ref()).await;
+        assert_eq!(res.unwrap(), 1);
+        assert_eq!(hit, CacheStatus::Miss);
+        let (res, hit) = cache.get(&1, None, opt.as_ref()).await;
+        assert_eq!(res.unwrap(), 1);
+        assert_eq!(hit, CacheStatus::Hit);
+    }
+
+    #[tokio::test]
+    async fn test_basic_get_error() {
+        let cache: RTCache<i32, i32, TestCB, ExtraOpt> = RTCache::new(10, None, None);
+        let opt1 = Some(ExtraOpt {
+            error: true,
+            empty: false,
+            delay_for: None,
+            used: Arc::new(AtomicI32::new(0)),
+        });
+        let (res, hit) = cache.get(&-1, None, opt1.as_ref()).await;
+        assert!(res.is_err());
+        assert_eq!(hit, CacheStatus::Miss);
+    }
+
+    #[tokio::test]
+    async fn test_concurrent_get() {
+        let cache: RTCache<i32, i32, TestCB, ExtraOpt> = RTCache::new(10, None, None);
+        let cache = Arc::new(cache);
+        let opt = Some(ExtraOpt {
+            error: false,
+            empty: false,
+            delay_for: None,
+            used: Arc::new(AtomicI32::new(0)),
+        });
+        let cache_c = cache.clone();
+        let opt1 = opt.clone();
+        // concurrent gets, only 1 will call the callback
+        let t1 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt1.as_ref()).await;
+            res.unwrap()
+        });
+        let cache_c = cache.clone();
+        let opt2 = opt.clone();
+        let t2 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt2.as_ref()).await;
+            res.unwrap()
+        });
+        let opt3 = opt.clone();
+        let cache_c = cache.clone();
+        let t3 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt3.as_ref()).await;
+            res.unwrap()
+        });
+        let (r1, r2, r3) = tokio::join!(t1, t2, t3);
+        assert_eq!(r1.unwrap(), 1);
+        assert_eq!(r2.unwrap(), 1);
+        assert_eq!(r3.unwrap(), 1);
+    }
+
+    #[tokio::test]
+    async fn test_concurrent_get_error() {
+        let cache: RTCache<i32, i32, TestCB, ExtraOpt> = RTCache::new(10, None, None);
+        let cache = Arc::new(cache);
+        let cache_c = cache.clone();
+        let opt1 = Some(ExtraOpt {
+            error: true,
+            empty: false,
+            delay_for: None,
+            used: Arc::new(AtomicI32::new(0)),
+        });
+        let opt2 = opt1.clone();
+        let opt3 = opt1.clone();
+        // concurrent gets, only 1 will call the callback
+        let t1 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&-1, None, opt1.as_ref()).await;
+            res.is_err()
+        });
+        let cache_c = cache.clone();
+        let t2 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&-1, None, opt2.as_ref()).await;
+            res.is_err()
+        });
+        let cache_c = cache.clone();
+        let t3 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&-1, None, opt3.as_ref()).await;
+            res.is_err()
+        });
+        let (r1, r2, r3) = tokio::join!(t1, t2, t3);
+        assert!(r1.unwrap());
+        assert!(r2.unwrap());
+        assert!(r3.unwrap());
+    }
+
+    #[tokio::test]
+    async fn test_concurrent_get_different_value() {
+        let cache: RTCache<i32, i32, TestCB, ExtraOpt> = RTCache::new(10, None, None);
+        let cache = Arc::new(cache);
+        let opt1 = Some(ExtraOpt {
+            error: false,
+            empty: false,
+            delay_for: None,
+            used: Arc::new(AtomicI32::new(0)),
+        });
+        let opt2 = opt1.clone();
+        let opt3 = opt1.clone();
+        let cache_c = cache.clone();
+        // concurrent gets to different keys, no locks, all will call the cb
+        let t1 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt1.as_ref()).await;
+            res.unwrap()
+        });
+        let cache_c = cache.clone();
+        let t2 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&3, None, opt2.as_ref()).await;
+            res.unwrap()
+        });
+        let cache_c = cache.clone();
+        let t3 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&5, None, opt3.as_ref()).await;
+            res.unwrap()
+        });
+        let (r1, r2, r3) = tokio::join!(t1, t2, t3);
+        // 1 lookup + 2 lookups + 3 lookups; the order does not matter
+        assert_eq!(r1.unwrap() + r2.unwrap() + r3.unwrap(), 6);
+    }
+
+    #[tokio::test]
+    async fn test_get_lock_age() {
+        // 1 sec lock age
+        let cache: RTCache<i32, i32, TestCB, ExtraOpt> =
+            RTCache::new(10, Some(Duration::from_secs(1)), None);
+        let cache = Arc::new(cache);
+        let counter = Arc::new(AtomicI32::new(0));
+        let opt1 = Some(ExtraOpt {
+            error: false,
+            empty: false,
+            delay_for: Some(Duration::from_secs(2)),
+            used: counter.clone(),
+        });
+
+        let opt2 = Some(ExtraOpt {
+            error: false,
+            empty: false,
+            delay_for: None,
+            used: counter.clone(),
+        });
+        let opt3 = opt2.clone();
+        let cache_c = cache.clone();
+        // t1 will be delayed for 2 sec
+        let t1 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt1.as_ref()).await;
+            res.unwrap()
+        });
+        // start t2 and t3 1.5 seconds later; since the lock age is 1 sec, there will be no lock
+        tokio::time::sleep(Duration::from_secs_f32(1.5)).await;
+        let cache_c = cache.clone();
+        let t2 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt2.as_ref()).await;
+            res.unwrap()
+        });
+        let cache_c = cache.clone();
+        let t3 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt3.as_ref()).await;
+            res.unwrap()
+        });
+        let (r1, r2, r3) = tokio::join!(t1, t2, t3);
+        // 1 lookup + 2 lookups + 3 lookups; the order does not matter
+        assert_eq!(r1.unwrap() + r2.unwrap() + r3.unwrap(), 6);
+    }
+
+    #[tokio::test]
+    async fn test_get_lock_timeout() {
+        // 1 sec lock timeout
+        let cache: RTCache<i32, i32, TestCB, ExtraOpt> =
+            RTCache::new(10, None, Some(Duration::from_secs(1)));
+        let cache = Arc::new(cache);
+        let counter = Arc::new(AtomicI32::new(0));
+        let opt1 = Some(ExtraOpt {
+            error: false,
+            empty: false,
+            delay_for: Some(Duration::from_secs(2)),
+            used: counter.clone(),
+        });
+        let opt2 = Some(ExtraOpt {
+            error: false,
+            empty: false,
+            delay_for: None,
+            used: counter.clone(),
+        });
+        let opt3 = opt2.clone();
+        let cache_c = cache.clone();
+        // t1 will be delayed for 2 sec
+        let t1 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt1.as_ref()).await;
+            res.unwrap()
+        });
+        // since the lock timeout is 1 sec, t2 and t3 will do their own lookup after 1 sec
+        let cache_c = cache.clone();
+        let t2 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt2.as_ref()).await;
+            res.unwrap()
+        });
+        let cache_c = cache.clone();
+        let t3 = tokio::spawn(async move {
+            let (res, _hit) = cache_c.get(&1, None, opt3.as_ref()).await;
+            res.unwrap()
+        });
+        let (r1, r2, r3) = tokio::join!(t1, t2, t3);
+        // 1 lookup + 2 lookups + 3 lookups; the order does not matter
+        assert_eq!(r1.unwrap() + r2.unwrap() + r3.unwrap(), 6);
+    }
+
+    #[tokio::test]
+    async fn test_multi_get() {
+        let cache: RTCache<i32, i32, TestCB, ExtraOpt> = RTCache::new(10, None, None);
+        let counter = Arc::new(AtomicI32::new(0));
+        let opt1 = Some(ExtraOpt {
+            error: false,
+            empty: false,
+            delay_for: Some(Duration::from_secs(2)),
+            used: counter.clone(),
+        });
+        // make 1 a hit first
+        let (res, hit) = cache.get(&1, None, opt1.as_ref()).await;
+        assert_eq!(res.unwrap(), 1);
+        assert_eq!(hit, CacheStatus::Miss);
+        let (res, hit) = cache.get(&1, None, opt1.as_ref()).await;
+        assert_eq!(res.unwrap(), 1);
+        assert_eq!(hit, CacheStatus::Hit);
+        // 1 hit, 2 miss, 3 miss
+        let resp = cache
+            .multi_get([1, 2, 3].iter(), None, opt1.as_ref())
+            .await
+            .unwrap();
+        assert_eq!(resp[0].0, 1);
+        assert_eq!(resp[0].1, CacheStatus::Hit);
+        assert_eq!(resp[1].0, 2);
+        assert_eq!(resp[1].1, CacheStatus::Miss);
+        assert_eq!(resp[2].0, 3);
+        assert_eq!(resp[2].1, CacheStatus::Miss);
+        // all hit after a fetch
+        let resp = cache
+            .multi_get([1, 2, 3].iter(), None, opt1.as_ref())
+            .await
+            .unwrap();
+        assert_eq!(resp[0].0, 1);
+        assert_eq!(resp[0].1, CacheStatus::Hit);
+        assert_eq!(resp[1].0, 2);
+        assert_eq!(resp[1].1, CacheStatus::Hit);
+        assert_eq!(resp[2].0, 3);
+        assert_eq!(resp[2].1, CacheStatus::Hit);
+    }
+
+    #[tokio::test]
+    #[should_panic(expected = "multi_lookup() failed to return the matching number of results")]
+    async fn test_inconsistent_miss_results() {
+        // force an empty result
+        let opt1 = Some(ExtraOpt {
+            error: false,
+            empty: true,
+            delay_for: None,
+            used: Arc::new(AtomicI32::new(0)),
+        });
+        let cache: RTCache<i32, i32, TestCB, ExtraOpt> = RTCache::new(10, None, None);
+        cache
+            .multi_get([4, 5, 6].iter(), None, opt1.as_ref())
+            .await
+            .unwrap();
+    }
+}
diff --git a/pingora-openssl/Cargo.toml b/pingora-openssl/Cargo.toml
new file mode 100644
index 0000000..19b3349
--- /dev/null
+++ 
b/pingora-openssl/Cargo.toml @@ -0,0 +1,29 @@ +[package] +name = "pingora-openssl" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["asynchronous", "network-programming"] +keywords = ["async", "tls", "ssl", "pingora"] +description = """ +OpenSSL async APIs for Pingora. +""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_openssl" +path = "src/lib.rs" + +[dependencies] +openssl-sys = "0.9" +openssl = { version = "0.10", features = ["vendored"] } +openssl-src = { version = "300", features = ["weak-crypto"] } +tokio-openssl = { version = "0.6" } +libc = "0.2.70" +foreign-types = { version = "0.3"} + +[dev-dependencies] +tokio-test = "0.4" +tokio = { workspace = true, features = ["full"] } diff --git a/pingora-openssl/LICENSE b/pingora-openssl/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-openssl/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora-openssl/src/ext.rs b/pingora-openssl/src/ext.rs new file mode 100644 index 0000000..f8cebb2 --- /dev/null +++ b/pingora-openssl/src/ext.rs @@ -0,0 +1,209 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+
+use foreign_types::ForeignTypeRef;
+use libc::*;
+use openssl::error::ErrorStack;
+use openssl::pkey::{HasPrivate, PKeyRef};
+use openssl::ssl::{Ssl, SslAcceptor, SslRef};
+use openssl::x509::store::X509StoreRef;
+use openssl::x509::verify::X509VerifyParamRef;
+use openssl::x509::X509Ref;
+use openssl_sys::{
+    SSL_ctrl, EVP_PKEY, SSL, SSL_CTRL_SET_GROUPS_LIST, SSL_CTRL_SET_VERIFY_CERT_STORE, X509,
+    X509_VERIFY_PARAM,
+};
+use std::ffi::CString;
+use std::os::raw;
+
+fn cvt(r: c_int) -> Result<c_int, ErrorStack> {
+    if r != 1 {
+        Err(ErrorStack::get())
+    } else {
+        Ok(r)
+    }
+}
+
+extern "C" {
+    pub fn X509_VERIFY_PARAM_add1_host(
+        param: *mut X509_VERIFY_PARAM,
+        name: *const c_char,
+        namelen: size_t,
+    ) -> c_int;
+
+    pub fn SSL_use_certificate(ssl: *const SSL, cert: *mut X509) -> c_int;
+    pub fn SSL_use_PrivateKey(ctx: *const SSL, key: *mut EVP_PKEY) -> c_int;
+
+    pub fn SSL_set_cert_cb(
+        ssl: *mut SSL,
+        cb: ::std::option::Option<
+            unsafe extern "C" fn(ssl: *mut SSL, arg: *mut raw::c_void) -> raw::c_int,
+        >,
+        arg: *mut raw::c_void,
+    );
+}
+
+/// Add name as an additional reference identifier that can match the peer's certificate
+///
+/// See [X509_VERIFY_PARAM_set1_host](https://www.openssl.org/docs/man3.1/man3/X509_VERIFY_PARAM_set1_host.html).
+pub fn add_host(verify_param: &mut X509VerifyParamRef, host: &str) -> Result<(), ErrorStack> {
+    if host.is_empty() {
+        return Ok(());
+    }
+    unsafe {
+        cvt(X509_VERIFY_PARAM_add1_host(
+            verify_param.as_ptr(),
+            host.as_ptr() as *const _,
+            host.len(),
+        ))
+        .map(|_| ())
+    }
+}
+
+/// Set the verify cert store of `ssl`
+///
+/// See [SSL_set1_verify_cert_store](https://www.openssl.org/docs/man1.1.1/man3/SSL_set1_verify_cert_store.html).
+pub fn ssl_set_verify_cert_store(
+    ssl: &mut SslRef,
+    cert_store: &X509StoreRef,
+) -> Result<(), ErrorStack> {
+    unsafe {
+        cvt(SSL_ctrl(
+            ssl.as_ptr(),
+            SSL_CTRL_SET_VERIFY_CERT_STORE,
+            1, // increase the ref count of X509Store so that ssl_ctx can outlive X509StoreRef
+            cert_store.as_ptr() as *mut c_void,
+        ) as i32)?;
+    }
+    Ok(())
+}
+
+/// Load the certificate into `ssl`
+///
+/// See [SSL_use_certificate](https://www.openssl.org/docs/man1.1.1/man3/SSL_use_certificate.html).
+pub fn ssl_use_certificate(ssl: &mut SslRef, cert: &X509Ref) -> Result<(), ErrorStack> {
+    unsafe {
+        cvt(SSL_use_certificate(ssl.as_ptr(), cert.as_ptr()))?;
+    }
+    Ok(())
+}
+
+/// Load the private key into `ssl`
+///
+/// See [SSL_use_PrivateKey](https://www.openssl.org/docs/man1.1.1/man3/SSL_use_PrivateKey.html).
+pub fn ssl_use_private_key<T>(ssl: &mut SslRef, key: &PKeyRef<T>) -> Result<(), ErrorStack>
+where
+    T: HasPrivate,
+{
+    unsafe {
+        cvt(SSL_use_PrivateKey(ssl.as_ptr(), key.as_ptr()))?;
+    }
+    Ok(())
+}
+
+/// Add the certificate into the cert chain of `ssl`
+///
+/// See [SSL_add1_chain_cert](https://www.openssl.org/docs/man1.1.1/man3/SSL_add1_chain_cert.html)
+pub fn ssl_add_chain_cert(ssl: &mut SslRef, cert: &X509Ref) -> Result<(), ErrorStack> {
+    const SSL_CTRL_CHAIN_CERT: i32 = 89;
+    unsafe {
+        cvt(SSL_ctrl(
+            ssl.as_ptr(),
+            SSL_CTRL_CHAIN_CERT,
+            1, // increase the ref count of X509 so that ssl can outlive X509Ref
+            cert.as_ptr() as *mut c_void,
+        ) as i32)?;
+    }
+    Ok(())
+}
+
+/// Set renegotiation
+///
+/// This function is specific to BoringSSL; it is a no-op for OpenSSL.
+pub fn ssl_set_renegotiate_mode_freely(_ssl: &mut SslRef) {}
+
+/// Set the curves/groups of `ssl`
+///
+/// See [set_groups_list](https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set1_curves.html).
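+///
+/// An illustrative note: OpenSSL expects a colon-separated list of group names here,
+/// so a call might look like (hypothetical group values):
+///
+/// ```ignore
+/// ssl_set_groups_list(&mut ssl, "X25519:P-256:P-384")?;
+/// ```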
+pub fn ssl_set_groups_list(ssl: &mut SslRef, groups: &str) -> Result<(), ErrorStack> {
+    let groups = CString::new(groups).unwrap();
+    unsafe {
+        cvt(SSL_ctrl(
+            ssl.as_ptr(),
+            SSL_CTRL_SET_GROUPS_LIST,
+            0,
+            groups.as_ptr() as *mut c_void,
+        ) as i32)?;
+    }
+    Ok(())
+}
+
+/// Set whether a second key share is sent in the client hello when PQ is used.
+///
+/// This function is specific to BoringSSL; it is a no-op for OpenSSL.
+pub fn ssl_use_second_key_share(_ssl: &mut SslRef, _enabled: bool) {}
+
+/// Clear the error stack
+///
+/// SSL calls should check and clear the OpenSSL error stack, but some calls fail to do so.
+/// This causes the next unrelated SSL call to fail due to the leftover errors. This function
+/// allows the caller to clear the error stack before performing SSL calls to avoid this issue.
+pub fn clear_error_stack() {
+    let _ = ErrorStack::get();
+}
+
+/// Create a new [Ssl] from &[SslAcceptor]
+///
+/// This function unifies the interface between this crate and `pingora-boringssl`.
+pub fn ssl_from_acceptor(acceptor: &SslAcceptor) -> Result<Ssl, ErrorStack> {
+    Ssl::new(acceptor.context())
+}
+
+/// Suspend the TLS handshake when a certificate is needed.
+///
+/// This function causes the TLS handshake to pause and return the error SSL_ERROR_WANT_X509_LOOKUP.
+/// The caller should set the certificate and then call [unblock_ssl_cert()] before continuing the
+/// handshake on the TLS connection.
+pub fn suspend_when_need_ssl_cert(ssl: &mut SslRef) {
+    unsafe {
+        SSL_set_cert_cb(ssl.as_ptr(), Some(raw_cert_block), std::ptr::null_mut());
+    }
+}
+
+/// Unblock a TLS handshake after the certificate is set.
+///
+/// The user should continue the TLS handshake after this function is called.
+pub fn unblock_ssl_cert(ssl: &mut SslRef) {
+    unsafe {
+        SSL_set_cert_cb(ssl.as_ptr(), None, std::ptr::null_mut());
+    }
+}
+
+// Just block the handshake
+extern "C" fn raw_cert_block(_ssl: *mut openssl_sys::SSL, _arg: *mut c_void) -> c_int {
+    -1
+}
+
+/// Whether the TLS error is SSL_ERROR_WANT_X509_LOOKUP
+pub fn is_suspended_for_cert(error: &openssl::ssl::Error) -> bool {
+    error.code().as_raw() == openssl_sys::SSL_ERROR_WANT_X509_LOOKUP
+}
+
+#[allow(clippy::mut_from_ref)]
+/// Get a mutable SslRef out of an SslRef, which is missing functionality even when holding &mut SslStream
+/// # Safety
+/// the caller needs to make sure that they hold a &mut SslStream (or another mutable ref to the Ssl)
+pub unsafe fn ssl_mut(ssl: &SslRef) -> &mut SslRef {
+    SslRef::from_ptr_mut(ssl.as_ptr())
+}
diff --git a/pingora-openssl/src/lib.rs b/pingora-openssl/src/lib.rs
new file mode 100644
index 0000000..b12cee1
--- /dev/null
+++ b/pingora-openssl/src/lib.rs
@@ -0,0 +1,33 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! The OpenSSL API compatibility layer.
+//!
+//! This crate aims at making [openssl] APIs interchangeable with [boring](https://docs.rs/boring/latest/boring/).
+//! 
In other words, this crate and `pingora-boringssl` expose identical rust APIs. + +#![warn(clippy::all)] + +use openssl as ssl_lib; +pub use openssl_sys as ssl_sys; +pub use tokio_openssl as tokio_ssl; +pub mod ext; + +// export commonly used libs +pub use ssl_lib::error; +pub use ssl_lib::hash; +pub use ssl_lib::nid; +pub use ssl_lib::pkey; +pub use ssl_lib::ssl; +pub use ssl_lib::x509; diff --git a/pingora-pool/Cargo.toml b/pingora-pool/Cargo.toml new file mode 100644 index 0000000..170e497 --- /dev/null +++ b/pingora-pool/Cargo.toml @@ -0,0 +1,29 @@ +[package] +name = "pingora-pool" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["network-programming"] +keywords = ["async", "pooling", "pingora"] +description = """ +A connection pool system for connection reuse. +""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_pool" +path = "src/lib.rs" + +[dependencies] +tokio = { workspace = true, features = ["sync", "io-util"] } +thread_local = "1.0" +lru = { workspace = true } +log = { workspace = true } +parking_lot = "0.12" +crossbeam-queue = "0.3" +pingora-timeout = { version = "0.1.0", path = "../pingora-timeout" } + +[dev-dependencies] +tokio-test = "0.4" diff --git a/pingora-pool/LICENSE b/pingora-pool/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-pool/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. 
For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora-pool/src/connection.rs b/pingora-pool/src/connection.rs new file mode 100644 index 0000000..c8a5e33 --- /dev/null +++ b/pingora-pool/src/connection.rs @@ -0,0 +1,530 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
Generic connection pooling + +use log::{debug, warn}; +use parking_lot::{Mutex, RwLock}; +use pingora_timeout::{sleep, timeout}; +use std::collections::HashMap; +use std::io; +use std::sync::Arc; +use std::time::Duration; +use tokio::io::{AsyncRead, AsyncReadExt}; +use tokio::sync::{oneshot, watch, Notify, OwnedMutexGuard}; + +use super::lru::Lru; + +type GroupKey = u64; +type ID = i32; + +/// the metadata of a connection +#[derive(Clone, Debug)] +pub struct ConnectionMeta { + /// The group key. All connections under the same key are considered the same for connection reuse. + pub key: GroupKey, + /// The unique ID of a connection. + pub id: ID, +} + +impl ConnectionMeta { + /// Create a new [ConnectionMeta] + pub fn new(key: GroupKey, id: ID) -> Self { + ConnectionMeta { key, id } + } +} + +struct PoolConnection<S> { + pub notify_use: oneshot::Sender<bool>, + pub connection: S, +} + +impl<S> PoolConnection<S> { + pub fn new(notify_use: oneshot::Sender<bool>, connection: S) -> Self { + PoolConnection { + notify_use, + connection, + } + } + + pub fn release(self) -> S { + // notify the idle watcher to release the connection + let _ = self.notify_use.send(true); + // wait for the watcher to release + self.connection + } +} + +use crossbeam_queue::ArrayQueue; + +/// A pool of exchangeable items +pub struct PoolNode<T> { + connections: Mutex<HashMap<ID, T>>, + // a small lock free queue to avoid lock contention + hot_queue: ArrayQueue<(ID, T)>, + // to avoid race between 2 evictions on the queue + hot_queue_remove_lock: Mutex<()>, + // TODO: store the GroupKey to avoid hash collision? +} + +// Keep the queue size small because eviction is O(n) in the queue +const HOT_QUEUE_SIZE: usize = 16; + +impl<T> PoolNode<T> { + /// Create a new [PoolNode] + pub fn new() -> Self { + PoolNode { + connections: Mutex::new(HashMap::new()), + hot_queue: ArrayQueue::new(HOT_QUEUE_SIZE), + hot_queue_remove_lock: Mutex::new(()), + } + } + + /// Get any item from the pool + pub fn get_any(&self) -> Option<(ID, T)> { + let hot_conn = self.hot_queue.pop(); + if hot_conn.is_some() { + return hot_conn; + } + let mut connections = self.connections.lock(); + // find one connection, any connection will do + let id = match connections.iter().next() { + Some((k, _)) => *k, // OK to copy i32 + None => return None, + }; + // unwrap is safe since we just found it + let connection = connections.remove(&id).unwrap(); + /* NOTE: we don't resize or drop empty connections hashmap + * We may want to do it if they consume too much memory + * maybe we should use trees to save memory */ + Some((id, connection)) + // connections.lock released here + } + + /// Insert an item with the given unique ID into the pool + pub fn insert(&self, id: ID, conn: T) { + if let Err(node) = self.hot_queue.push((id, conn)) { + // hot queue is full + let mut connections = self.connections.lock(); + connections.insert(node.0, node.1); // TODO: check dup + } + } + + // This function acquires 2 locks and iterates over the entire hot queue + // But it should be fine because remove() rarely happens on a busy PoolNode + /// Remove the item associated with the id from the pool. The item is returned + /// if it is found and removed. 
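+    ///
+    /// A minimal usage sketch (illustrative only, not from the original docs;
+    /// the id and value are arbitrary):
+    /// ```ignore
+    /// let node: PoolNode<String> = PoolNode::new();
+    /// node.insert(1, "reusable connection".to_string());
+    /// assert_eq!(node.remove(1), Some("reusable connection".to_string()));
+    /// assert!(node.remove(1).is_none()); // already taken
+    /// ```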
+    pub fn remove(&self, id: ID) -> Option<T> {
+        // check the table first as the least recently used ones are likely there
+        let removed = self.connections.lock().remove(&id);
+        if removed.is_some() {
+            return removed;
+        } // lock drops here
+
+        let _queue_lock = self.hot_queue_remove_lock.lock();
+        // check the hot queue, note that the queue can be accessed in parallel by insert and get
+        let max_len = self.hot_queue.len();
+        for _ in 0..max_len {
+            if let Some((conn_id, conn)) = self.hot_queue.pop() {
+                if conn_id == id {
+                    // this is the item, it is already popped
+                    return Some(conn);
+                } else {
+                    // not this item, put it back into the hot queue, which could also be full
+                    self.insert(conn_id, conn);
+                }
+            } else {
+                // other threads grabbed all the connections
+                return None;
+            }
+        }
+        None
+        // _queue_lock drops here
+    }
+}
+
+/// Connection pool
+///
+/// [ConnectionPool] holds reusable connections. A reusable connection is released to this pool to
+/// be picked up by another user/request.
+pub struct ConnectionPool<S> {
+    // TODO: n-way pools to reduce lock contention
+    pool: RwLock<HashMap<GroupKey, Arc<PoolNode<PoolConnection<S>>>>>,
+    lru: Lru<ID, ConnectionMeta>,
+}
+
+impl<S> ConnectionPool<S> {
+    /// Create a new [ConnectionPool] with a size limit.
+    ///
+    /// When a connection is released to a full pool, the least recently used connection will be dropped.
+    pub fn new(size: usize) -> Self {
+        ConnectionPool {
+            pool: RwLock::new(HashMap::with_capacity(size)), // this is oversized since some connections will have the same key
+            lru: Lru::new(size),
+        }
+    }
+
+    /* get or create and insert a pool node for the hash key */
+    fn get_pool_node(&self, key: GroupKey) -> Arc<PoolNode<PoolConnection<S>>> {
+        {
+            let pool = self.pool.read();
+            if let Some(v) = pool.get(&key) {
+                return (*v).clone();
+            }
+        } // read lock released here
+
+        {
+            // write lock section
+            let mut pool = self.pool.write();
+            // check again since another task might have already added it
+            if let Some(v) = pool.get(&key) {
+                return (*v).clone();
+            }
+            let node = Arc::new(PoolNode::new());
+            let node_ret = node.clone();
+            pool.insert(key, node); // TODO: check dup
+            node_ret
+        }
+    }
+
+    // only remove from the pool because the lru already removed it
+    fn pop_evicted(&self, meta: &ConnectionMeta) {
+        let pool_node = {
+            let pool = self.pool.read();
+            match pool.get(&meta.key) {
+                Some(v) => (*v).clone(),
+                None => {
+                    warn!("Failed to get pool node for {:?}", meta);
+                    return;
+                } // nothing to pop, should return error?
+            }
+        }; // read lock released here
+
+        pool_node.remove(meta.id);
+        debug!("evict fd: {} from key {}", meta.id, meta.key);
+    }
+
+    fn pop_closed(&self, meta: &ConnectionMeta) {
+        // NOTE: which of these should be done first?
+        self.pop_evicted(meta);
+        self.lru.pop(&meta.id);
+    }
+
+    /// Get a connection from this pool under the same group key
+    pub fn get(&self, key: &GroupKey) -> Option<S> {
+        let pool_node = {
+            let pool = self.pool.read();
+            match pool.get(key) {
+                Some(v) => (*v).clone(),
+                None => return None,
+            }
+        }; // read lock released here
+
+        if let Some((id, connection)) = pool_node.get_any() {
+            self.lru.pop(&id); // the notifier is not needed anymore
+            Some(connection.release())
+        } else {
+            None
+        }
+    }
+
+    /// Release a connection to this pool for reuse
+    ///
+    /// - The returned [`Arc<Notify>`] will notify any listener when the connection is evicted from the pool.
+    /// - The returned [`oneshot::Receiver<bool>`] will notify when the connection is being picked up by [Self::get()].
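+    ///
+    /// A minimal usage sketch (illustrative only; the key and id values are arbitrary):
+    /// ```ignore
+    /// let pool: ConnectionPool<String> = ConnectionPool::new(128);
+    /// let meta = ConnectionMeta::new(101, 1);
+    /// let (_evicted, _picked_up) = pool.put(&meta, "reusable connection".to_string());
+    /// // later, a request under the same group key can reuse the connection
+    /// assert_eq!(pool.get(&meta.key), Some("reusable connection".to_string()));
+    /// ```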
+    pub fn put(
+        &self,
+        meta: &ConnectionMeta,
+        connection: S,
+    ) -> (Arc<Notify>, oneshot::Receiver<bool>) {
+        let (notify_close, replaced) = self.lru.add(meta.id, meta.clone());
+        if let Some(meta) = replaced {
+            self.pop_evicted(&meta);
+        };
+        let pool_node = self.get_pool_node(meta.key);
+        let (notify_use, watch_use) = oneshot::channel();
+        let connection = PoolConnection::new(notify_use, connection);
+        pool_node.insert(meta.id, connection);
+        (notify_close, watch_use)
+    }
+
+    /// Actively monitor the health of a connection that is already released to this pool
+    ///
+    /// When the connection breaks, or the optional `timeout` is reached, this function will
+    /// remove it from the pool and drop the connection.
+    ///
+    /// If the connection is reused via [Self::get()] or evicted, this function will just exit.
+    pub async fn idle_poll<Stream>(
+        &self,
+        connection: OwnedMutexGuard<Stream>,
+        meta: &ConnectionMeta,
+        timeout: Option<Duration>,
+        notify_evicted: Arc<Notify>,
+        watch_use: oneshot::Receiver<bool>,
+    ) where
+        Stream: AsyncRead + Unpin + Send,
+    {
+        let read_result = tokio::select! {
+            biased;
+            _ = watch_use => {
+                debug!("idle connection is being picked up");
+                return
+            },
+            _ = notify_evicted.notified() => {
+                debug!("idle connection is being evicted");
+                // TODO: gracefully close the connection?
+                return
+            }
+            read_result = read_with_timeout(connection, timeout) => read_result
+        };
+
+        match read_result {
+            Ok(n) => {
+                if n > 0 {
+                    warn!("Data received on idle client connection, close it")
+                } else {
+                    debug!("Peer closed the idle connection or timeout")
+                }
+            }
+
+            Err(e) => {
+                debug!("error with the idle connection, close it {:?}", e);
+            }
+        }
+        // connection terminated by either the peer or the timer
+        self.pop_closed(meta);
+    }
+
+    /// Passively wait to close the connection after the timeout
+    ///
+    /// If this connection is not picked up or evicted before the timeout is reached, this
+    /// function will remove it from the pool and close the connection.
+    pub async fn idle_timeout(
+        &self,
+        meta: &ConnectionMeta,
+        timeout: Duration,
+        notify_evicted: Arc<Notify>,
+        mut notify_closed: watch::Receiver<bool>,
+        watch_use: oneshot::Receiver<bool>,
+    ) {
+        tokio::select! {
+            biased;
+            _ = watch_use => {
+                debug!("idle connection is being picked up");
+            },
+            _ = notify_evicted.notified() => {
+                debug!("idle connection is being evicted");
+                // TODO: gracefully close the connection?
+            }
+            _ = notify_closed.changed() => {
+                // assume it always changes from false to true
+                debug!("idle connection is being closed");
+                self.pop_closed(meta);
+            }
+            _ = sleep(timeout) => {
+                debug!("idle connection is being closed after timeout");
+                self.pop_closed(meta);
+            }
+        };
+    }
+}
+
+async fn read_with_timeout<S>(
+    mut connection: OwnedMutexGuard<S>,
+    timeout_duration: Option<Duration>,
+) -> io::Result<usize>
+where
+    S: AsyncRead + Unpin + Send,
+{
+    let mut buf = [0; 1];
+    let read_event = connection.read(&mut buf[..]);
+    match timeout_duration {
+        Some(d) => match timeout(d, read_event).await {
+            Ok(res) => res,
+            Err(e) => {
+                debug!("keepalive timeout {:?} reached, {:?}", d, e);
+                Ok(0)
+            }
+        },
+        _ => read_event.await,
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use log::debug;
+    use tokio::sync::Mutex as AsyncMutex;
+    use tokio_test::io::{Builder, Mock};
+
+    #[tokio::test]
+    async fn test_lookup() {
+        let meta1 = ConnectionMeta::new(101, 1);
+        let value1 = "v1".to_string();
+        let meta2 = ConnectionMeta::new(102, 2);
+        let value2 = "v2".to_string();
+        let meta3 = ConnectionMeta::new(101, 3);
+        let value3 = "v3".to_string();
+        let cp: ConnectionPool<String> = ConnectionPool::new(3); //#CP3
+        cp.put(&meta1, value1.clone());
+        cp.put(&meta2, value2.clone());
+        cp.put(&meta3, value3.clone());
+
+        let found_b = cp.get(&meta2.key).unwrap();
+        assert_eq!(found_b, value2);
+
+        let found_a1 = cp.get(&meta1.key).unwrap();
+        let found_a2 = cp.get(&meta1.key).unwrap();
+
+        assert!(
+            found_a1 == value1 && found_a2 == value3 || found_a2 == value1 && found_a1 == value3
+        );
+    }
+
+    #[tokio::test]
+    async fn test_pop() {
+        let meta1 = ConnectionMeta::new(101, 1);
+        let value1 = "v1".to_string();
+        let meta2 = ConnectionMeta::new(102, 2);
+        let value2 = "v2".to_string();
+        let meta3 = ConnectionMeta::new(101, 3);
+        let value3 = "v3".to_string();
+        let cp: ConnectionPool<String> = ConnectionPool::new(3); //#CP3
+        cp.put(&meta1, value1);
+        cp.put(&meta2, value2);
+        cp.put(&meta3, value3.clone());
+
+        cp.pop_closed(&meta1);
+
+        let found_a1 = cp.get(&meta1.key).unwrap();
+        assert_eq!(found_a1, value3);
+
+        cp.pop_closed(&meta1);
+        assert!(cp.get(&meta1.key).is_none())
+    }
+
+    #[tokio::test]
+    async fn test_eviction() {
+        let meta1 = ConnectionMeta::new(101, 1);
+        let value1 = "v1".to_string();
+        let meta2 = ConnectionMeta::new(102, 2);
+        let value2 = "v2".to_string();
+        let meta3 = ConnectionMeta::new(101, 3);
+        let value3 = "v3".to_string();
+        let cp: ConnectionPool<String> = ConnectionPool::new(2);
+        let (notify_close1, _) = cp.put(&meta1, value1.clone());
+        let (notify_close2, _) = cp.put(&meta2, value2.clone());
+        let (notify_close3, _) = cp.put(&meta3, value3.clone()); // meta 1 should be evicted
+
+        let closed_item = tokio::select!
{ + _ = notify_close1.notified() => {debug!("notifier1"); 1}, + _ = notify_close2.notified() => {debug!("notifier2"); 2}, + _ = notify_close3.notified() => {debug!("notifier3"); 3}, + }; + assert_eq!(closed_item, 1); + + let found_a1 = cp.get(&meta1.key).unwrap(); + assert_eq!(found_a1, value3); + assert_eq!(cp.get(&meta1.key), None) + } + + #[tokio::test] + #[should_panic(expected = "There is still data left to read.")] + async fn test_read_close() { + let meta1 = ConnectionMeta::new(101, 1); + let mock_io1 = Arc::new(AsyncMutex::new(Builder::new().read(b"garbage").build())); + let meta2 = ConnectionMeta::new(102, 2); + let mock_io2 = Arc::new(AsyncMutex::new( + Builder::new().wait(Duration::from_secs(99)).build(), + )); + let meta3 = ConnectionMeta::new(101, 3); + let mock_io3 = Arc::new(AsyncMutex::new( + Builder::new().wait(Duration::from_secs(99)).build(), + )); + let cp: ConnectionPool<Arc<AsyncMutex<Mock>>> = ConnectionPool::new(3); + let (c1, u1) = cp.put(&meta1, mock_io1.clone()); + let (c2, u2) = cp.put(&meta2, mock_io2.clone()); + let (c3, u3) = cp.put(&meta3, mock_io3.clone()); + + let closed_item = tokio::select! { + _ = cp.idle_poll(mock_io1.try_lock_owned().unwrap(), &meta1, None, c1, u1) => {debug!("notifier1"); 1}, + _ = cp.idle_poll(mock_io2.try_lock_owned().unwrap(), &meta1, None, c2, u2) => {debug!("notifier2"); 2}, + _ = cp.idle_poll(mock_io3.try_lock_owned().unwrap(), &meta1, None, c3, u3) => {debug!("notifier3"); 3}, + }; + assert_eq!(closed_item, 1); + + let _ = cp.get(&meta1.key).unwrap(); // mock_io3 should be selected + assert!(cp.get(&meta1.key).is_none()) // mock_io1 should already be removed by idle_poll + } + + #[tokio::test] + async fn test_read_timeout() { + let meta1 = ConnectionMeta::new(101, 1); + let mock_io1 = Arc::new(AsyncMutex::new( + Builder::new().wait(Duration::from_secs(99)).build(), + )); + let meta2 = ConnectionMeta::new(102, 2); + let mock_io2 = Arc::new(AsyncMutex::new( + Builder::new().wait(Duration::from_secs(99)).build(), + )); + let meta3 = ConnectionMeta::new(101, 3); + let mock_io3 = Arc::new(AsyncMutex::new( + Builder::new().wait(Duration::from_secs(99)).build(), + )); + let cp: ConnectionPool<Arc<AsyncMutex<Mock>>> = ConnectionPool::new(3); + let (c1, u1) = cp.put(&meta1, mock_io1.clone()); + let (c2, u2) = cp.put(&meta2, mock_io2.clone()); + let (c3, u3) = cp.put(&meta3, mock_io3.clone()); + + let closed_item = tokio::select! 
{ + _ = cp.idle_poll(mock_io1.try_lock_owned().unwrap(), &meta1, Some(Duration::from_secs(1)), c1, u1) => {debug!("notifier1"); 1}, + _ = cp.idle_poll(mock_io2.try_lock_owned().unwrap(), &meta1, Some(Duration::from_secs(2)), c2, u2) => {debug!("notifier2"); 2}, + _ = cp.idle_poll(mock_io3.try_lock_owned().unwrap(), &meta1, Some(Duration::from_secs(3)), c3, u3) => {debug!("notifier3"); 3}, + }; + assert_eq!(closed_item, 1); + + let _ = cp.get(&meta1.key).unwrap(); // mock_io3 should be selected + assert!(cp.get(&meta1.key).is_none()) // mock_io1 should already be removed by idle_poll + } + + #[tokio::test] + async fn test_evict_poll() { + let meta1 = ConnectionMeta::new(101, 1); + let mock_io1 = Arc::new(AsyncMutex::new( + Builder::new().wait(Duration::from_secs(99)).build(), + )); + let meta2 = ConnectionMeta::new(102, 2); + let mock_io2 = Arc::new(AsyncMutex::new( + Builder::new().wait(Duration::from_secs(99)).build(), + )); + let meta3 = ConnectionMeta::new(101, 3); + let mock_io3 = Arc::new(AsyncMutex::new( + Builder::new().wait(Duration::from_secs(99)).build(), + )); + let cp: ConnectionPool<Arc<AsyncMutex<Mock>>> = ConnectionPool::new(2); + let (c1, u1) = cp.put(&meta1, mock_io1.clone()); + let (c2, u2) = cp.put(&meta2, mock_io2.clone()); + let (c3, u3) = cp.put(&meta3, mock_io3.clone()); // 1 should be evicted at this point + + let closed_item = tokio::select! { + _ = cp.idle_poll(mock_io1.try_lock_owned().unwrap(), &meta1, None, c1, u1) => {debug!("notifier1"); 1}, + _ = cp.idle_poll(mock_io2.try_lock_owned().unwrap(), &meta1, None, c2, u2) => {debug!("notifier2"); 2}, + _ = cp.idle_poll(mock_io3.try_lock_owned().unwrap(), &meta1, None, c3, u3) => {debug!("notifier3"); 3}, + }; + assert_eq!(closed_item, 1); + + let _ = cp.get(&meta1.key).unwrap(); // mock_io3 should be selected + assert!(cp.get(&meta1.key).is_none()) // mock_io1 should already be removed by idle_poll + } +} diff --git a/pingora-pool/src/lib.rs b/pingora-pool/src/lib.rs new file mode 100644 index 0000000..6cbd99e --- /dev/null +++ b/pingora-pool/src/lib.rs @@ -0,0 +1,28 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! Generic connection pooling +//! +//! The pool is optimized for high concurrency, high RPS use cases. Each connection group has a +//! lock free hot pool to reduce the lock contention when some connections are reused and released +//! very frequently. + +#![warn(clippy::all)] +#![allow(clippy::new_without_default)] +#![allow(clippy::type_complexity)] + +mod connection; +mod lru; + +pub use connection::{ConnectionMeta, ConnectionPool, PoolNode}; diff --git a/pingora-pool/src/lru.rs b/pingora-pool/src/lru.rs new file mode 100644 index 0000000..a370cfb --- /dev/null +++ b/pingora-pool/src/lru.rs @@ -0,0 +1,177 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use core::hash::Hash;
+use lru::LruCache;
+use parking_lot::RwLock;
+use std::cell::RefCell;
+use std::sync::atomic::{AtomicBool, Ordering::Relaxed};
+use std::sync::Arc;
+use thread_local::ThreadLocal;
+use tokio::sync::Notify;
+
+pub struct Node<T> {
+    pub close_notifier: Arc<Notify>,
+    pub meta: T,
+}
+
+impl<T> Node<T> {
+    pub fn new(meta: T) -> Self {
+        Node {
+            close_notifier: Arc::new(Notify::new()),
+            meta,
+        }
+    }
+
+    pub fn notify_close(&self) {
+        self.close_notifier.notify_one();
+    }
+}
+
+pub struct Lru<K, T>
+where
+    K: Send,
+    T: Send,
+{
+    lru: RwLock<ThreadLocal<RefCell<LruCache<K, Node<T>>>>>,
+    size: usize,
+    drain: AtomicBool,
+}
+
+impl<K, T> Lru<K, T>
+where
+    K: Hash + Eq + Send,
+    T: Send,
+{
+    pub fn new(size: usize) -> Self {
+        Lru {
+            lru: RwLock::new(ThreadLocal::new()),
+            size,
+            drain: AtomicBool::new(false),
+        }
+    }
+
+    // Put a node in and return the meta of the replaced node
+    pub fn put(&self, key: K, value: Node<T>) -> Option<T> {
+        if self.drain.load(Relaxed) {
+            value.notify_close(); // sort of a hack to simulate being evicted right away
+            return None;
+        }
+        let lru = self.lru.read(); /* read lock */
+        let lru_cache = &mut *(lru
+            .get_or(|| RefCell::new(LruCache::unbounded()))
+            .borrow_mut());
+        lru_cache.put(key, value);
+        if lru_cache.len() > self.size {
+            match lru_cache.pop_lru() {
+                Some((_, v)) => {
+                    // TODO: drop the lock here?
+                    v.notify_close();
+                    return Some(v.meta);
+                }
+                None => return None,
+            }
+        }
+        None
+        /* read lock dropped */
+    }
+
+    pub fn add(&self, key: K, meta: T) -> (Arc<Notify>, Option<T>) {
+        let node = Node::new(meta);
+        let notifier = node.close_notifier.clone();
+        // TODO: check if the key is already in it
+        (notifier, self.put(key, node))
+    }
+
+    pub fn pop(&self, key: &K) -> Option<Node<T>> {
+        let lru = self.lru.read(); /* read lock */
+        let lru_cache = &mut *(lru
+            .get_or(|| RefCell::new(LruCache::unbounded()))
+            .borrow_mut());
+        lru_cache.pop(key)
+        /* read lock dropped */
+    }
+
+    #[allow(dead_code)]
+    pub fn drain(&self) {
+        self.drain.store(true, Relaxed);
+
+        /* drain needs to go through all the local lru cache objects,
+         * so acquire an exclusive write lock to make it safe */
+        let mut lru = self.lru.write(); /* write lock */
+        let lru_cache_iter = lru.iter_mut();
+        for lru_cache_rc in lru_cache_iter {
+            let mut lru_cache = lru_cache_rc.borrow_mut();
+            for (_, item) in lru_cache.iter() {
+                item.notify_close();
+            }
+            lru_cache.clear();
+        }
+        /* write lock dropped */
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use log::debug;
+
+    #[tokio::test]
+    async fn test_evict_close() {
+        let pool: Lru<i32, ()> = Lru::new(2);
+        let (notifier1, _) = pool.add(1, ());
+        let (notifier2, _) = pool.add(2, ());
+        let (notifier3, _) = pool.add(3, ());
+        let closed_item = tokio::select!
{ + _ = notifier1.notified() => {debug!("notifier1"); 1}, + _ = notifier2.notified() => {debug!("notifier2"); 2}, + _ = notifier3.notified() => {debug!("notifier3"); 3}, + }; + assert_eq!(closed_item, 1); + } + + #[tokio::test] + async fn test_evict_close_with_pop() { + let pool: Lru<i32, ()> = Lru::new(2); + let (notifier1, _) = pool.add(1, ()); + let (notifier2, _) = pool.add(2, ()); + pool.pop(&1); + let (notifier3, _) = pool.add(3, ()); + let (notifier4, _) = pool.add(4, ()); + let closed_item = tokio::select! { + _ = notifier1.notified() => {debug!("notifier1"); 1}, + _ = notifier2.notified() => {debug!("notifier2"); 2}, + _ = notifier3.notified() => {debug!("notifier3"); 3}, + _ = notifier4.notified() => {debug!("notifier4"); 4}, + }; + assert_eq!(closed_item, 2); + } + + #[tokio::test] + async fn test_drain() { + let pool: Lru<i32, ()> = Lru::new(4); + let (notifier1, _) = pool.add(1, ()); + let (notifier2, _) = pool.add(2, ()); + let (notifier3, _) = pool.add(3, ()); + pool.drain(); + let (notifier4, _) = pool.add(4, ()); + + tokio::join!( + notifier1.notified(), + notifier2.notified(), + notifier3.notified(), + notifier4.notified() + ); + } +} diff --git a/pingora-proxy/Cargo.toml b/pingora-proxy/Cargo.toml new file mode 100644 index 0000000..d76ab48 --- /dev/null +++ b/pingora-proxy/Cargo.toml @@ -0,0 +1,49 @@ +[package] +name = "pingora-proxy" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["asynchronous", "network-programming"] +keywords = ["async", "http", "proxy", "pingora"] +exclude = ["tests/*"] +description = """ +Pingora HTTP proxy APIs and traits. +""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_proxy" +path = "src/lib.rs" + +[dependencies] +pingora-error = { version = "0.1.0", path = "../pingora-error" } +pingora-core = { version = "0.1.0", path = "../pingora-core" } +pingora-timeout = { version = "0.1.0", path = "../pingora-timeout" } +pingora-cache = { version = "0.1.0", path = "../pingora-cache" } +tokio = { workspace = true, features = ["macros", "net"] } +pingora-http = { version = "0.1.0", path = "../pingora-http" } +http = { workspace = true } +futures = "0.3" +bytes = { workspace = true } +async-trait = { workspace = true } +log = { workspace = true } +h2 = { workspace = true } +once_cell = { workspace = true } +structopt = "0.3" +regex = "1" + +[dev-dependencies] +reqwest = { version = "0.11", features = [ + "gzip", + "rustls", +], default-features = false } +tokio-test = "0.4" +env_logger = "0.9" +hyperlocal = "0.8" +hyper = "0.14" +tokio-tungstenite = "0.20.1" +pingora-load-balancing = { version = "0.1.0", path = "../pingora-load-balancing" } +prometheus = "0" +futures-util = "0.3" diff --git a/pingora-proxy/LICENSE b/pingora-proxy/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-proxy/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. 
+ + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/pingora-proxy/examples/ctx.rs b/pingora-proxy/examples/ctx.rs
new file mode 100644
index 0000000..36169e2
--- /dev/null
+++ b/pingora-proxy/examples/ctx.rs
@@ -0,0 +1,99 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use async_trait::async_trait;
+use log::info;
+use std::sync::Mutex;
+use structopt::StructOpt;
+
+use pingora_core::server::configuration::Opt;
+use pingora_core::server::Server;
+use pingora_core::upstreams::peer::HttpPeer;
+use pingora_core::Result;
+use pingora_proxy::{ProxyHttp, Session};
+
+// global counter
+static REQ_COUNTER: Mutex<usize> = Mutex::new(0);
+
+pub struct MyProxy {
+    // counter for the service
+    beta_counter: Mutex<usize>, // AtomicUsize works too
+}
+
+pub struct MyCtx {
+    beta_user: bool,
+}
+
+fn check_beta_user(req: &pingora_http::RequestHeader) -> bool {
+    // some simple logic to check if the user is a beta user
+    req.headers.get("beta-flag").is_some()
+}
+
+#[async_trait]
+impl ProxyHttp for MyProxy {
+    type CTX = MyCtx;
+    fn new_ctx(&self) -> Self::CTX {
+        MyCtx { beta_user: false }
+    }
+
+    async fn request_filter(&self, session: &mut Session, ctx: &mut Self::CTX) -> Result<bool> {
+        ctx.beta_user = check_beta_user(session.req_header());
+        Ok(false)
+    }
+
+    async fn upstream_peer(
+        &self,
+        _session: &mut Session,
+        ctx: &mut Self::CTX,
+    ) -> Result<Box<HttpPeer>> {
+        let mut req_counter = REQ_COUNTER.lock().unwrap();
+        *req_counter += 1;
+
+        let addr = if ctx.beta_user {
+            let mut beta_count = self.beta_counter.lock().unwrap();
+            *beta_count += 1;
+            info!("I'm a beta user #{beta_count}");
+            ("1.0.0.1", 443)
+        } else {
+            info!("I'm a user #{req_counter}");
+            ("1.1.1.1", 443)
+        };
+
+        let peer = Box::new(HttpPeer::new(addr, true, "one.one.one.one".to_string()));
+        Ok(peer)
+    }
+}
+
+// RUST_LOG=INFO cargo run --example ctx
+// curl 127.0.0.1:6190 -H "Host: one.one.one.one"
+// curl 127.0.0.1:6190 -H "Host: one.one.one.one" -H "beta-flag: 1"
+fn main() {
+    env_logger::init();
+
+    // read command line arguments
+    let opt = Opt::from_args();
+    let mut my_server = Server::new(Some(opt)).unwrap();
+    my_server.bootstrap();
+
+    let mut my_proxy = pingora_proxy::http_proxy_service(
+        &my_server.configuration,
+        MyProxy {
+            beta_counter: Mutex::new(0),
+        },
+    );
+    my_proxy.add_tcp("0.0.0.0:6190");
+
+    my_server.add_service(my_proxy);
+    my_server.run_forever();
+}
diff --git a/pingora-proxy/examples/gateway.rs b/pingora-proxy/examples/gateway.rs
new file mode 100644
index 0000000..27c5020
--- /dev/null
+++ b/pingora-proxy/examples/gateway.rs
@@ -0,0 +1,136 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use async_trait::async_trait;
+use log::info;
+use prometheus::register_int_counter;
+use structopt::StructOpt;
+
+use pingora_core::server::configuration::Opt;
+use pingora_core::server::Server;
+use pingora_core::upstreams::peer::HttpPeer;
+use pingora_core::Result;
+use pingora_http::ResponseHeader;
+use pingora_proxy::{ProxyHttp, Session};
+
+fn check_login(req: &pingora_http::RequestHeader) -> bool {
+    // implement your login check logic here
+    req.headers.get("Authorization").map(|v| v.as_bytes()) == Some(b"password")
+}
+
+pub struct MyGateway {
+    req_metric: prometheus::IntCounter,
+}
+
+#[async_trait]
+impl ProxyHttp for MyGateway {
+    type CTX = ();
+    fn new_ctx(&self) -> Self::CTX {}
+
+    async fn request_filter(&self, session: &mut Session, _ctx: &mut Self::CTX) -> Result<bool> {
+        if session.req_header().uri.path().starts_with("/login")
+            && !check_login(session.req_header())
+        {
+            let _ = session.respond_error(403).await;
+            // true: early return as the response is already written
+            return Ok(true);
+        }
+        Ok(false)
+    }
+
+    async fn upstream_peer(
+        &self,
+        session: &mut Session,
+        _ctx: &mut Self::CTX,
+    ) -> Result<Box<HttpPeer>> {
+        let addr = if session.req_header().uri.path().starts_with("/family") {
+            ("1.0.0.1", 443)
+        } else {
+            ("1.1.1.1", 443)
+        };
+
+        info!("connecting to {addr:?}");
+
+        let peer = Box::new(HttpPeer::new(addr, true, "one.one.one.one".to_string()));
+        Ok(peer)
+    }
+
+    async fn response_filter(
+        &self,
+        _session: &mut Session,
+        upstream_response: &mut ResponseHeader,
+        _ctx: &mut Self::CTX,
+    ) -> Result<()>
+    where
+        Self::CTX: Send + Sync,
+    {
+        // replace the existing header if any
+        upstream_response
+            .insert_header("Server", "MyGateway")
+            .unwrap();
+        // because we don't support h3
+        upstream_response.remove_header("alt-svc");
+
+        Ok(())
+    }
+
+    async fn logging(
+        &self,
+        session: &mut Session,
+        _e: Option<&pingora_core::Error>,
+        ctx: &mut Self::CTX,
+    ) {
+        let response_code = session
+            .response_written()
+            .map_or(0, |resp| resp.status.as_u16());
+        info!(
+            "{} response code: {response_code}",
+            self.request_summary(session, ctx)
+        );
+
+        self.req_metric.inc();
+    }
+}
+
+// RUST_LOG=INFO cargo run --example gateway
+// curl 127.0.0.1:6191 -H "Host: one.one.one.one"
+// curl 127.0.0.1:6191/family/ -H "Host: one.one.one.one"
+// curl 127.0.0.1:6191/login/ -H "Host: one.one.one.one" -I -H "Authorization: password"
+// curl 127.0.0.1:6191/login/ -H "Host: one.one.one.one" -I -H "Authorization: bad"
+// For metrics
+// curl 127.0.0.1:6192/
+fn main() {
+    env_logger::init();
+
+    // read command line arguments
+    let opt = Opt::from_args();
+    let mut my_server = Server::new(Some(opt)).unwrap();
+    my_server.bootstrap();
+
+    let mut my_proxy = pingora_proxy::http_proxy_service(
+        &my_server.configuration,
+        MyGateway {
+            req_metric: register_int_counter!("req_counter", "Number of requests").unwrap(),
+        },
+    );
+    my_proxy.add_tcp("0.0.0.0:6191");
+    my_server.add_service(my_proxy);
+
+    let mut prometheus_service_http =
+        pingora_core::services::listening::Service::prometheus_http_service();
+
prometheus_service_http.add_tcp("127.0.0.1:6192");
+    my_server.add_service(prometheus_service_http);
+
+    my_server.run_forever();
+}
diff --git a/pingora-proxy/examples/load_balancer.rs b/pingora-proxy/examples/load_balancer.rs
new file mode 100644
index 0000000..425b6ec
--- /dev/null
+++ b/pingora-proxy/examples/load_balancer.rs
@@ -0,0 +1,96 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use async_trait::async_trait;
+use log::info;
+use pingora_core::services::background::background_service;
+use std::{sync::Arc, time::Duration};
+use structopt::StructOpt;
+
+use pingora_core::server::configuration::Opt;
+use pingora_core::server::Server;
+use pingora_core::upstreams::peer::HttpPeer;
+use pingora_core::Result;
+use pingora_load_balancing::{health_check, selection::RoundRobin, LoadBalancer};
+use pingora_proxy::{ProxyHttp, Session};
+
+pub struct LB(Arc<LoadBalancer<RoundRobin>>);
+
+#[async_trait]
+impl ProxyHttp for LB {
+    type CTX = ();
+    fn new_ctx(&self) -> Self::CTX {}
+
+    async fn upstream_peer(&self, _session: &mut Session, _ctx: &mut ()) -> Result<Box<HttpPeer>> {
+        let upstream = self
+            .0
+            .select(b"", 256) // hash doesn't matter
+            .unwrap();
+
+        info!("upstream peer is: {:?}", upstream);
+
+        let peer = Box::new(HttpPeer::new(upstream, true, "one.one.one.one".to_string()));
+        Ok(peer)
+    }
+
+    async fn upstream_request_filter(
+        &self,
+        _session: &mut Session,
+        upstream_request: &mut pingora_http::RequestHeader,
+        _ctx: &mut Self::CTX,
+    ) -> Result<()> {
+        upstream_request
+            .insert_header("Host", "one.one.one.one")
+            .unwrap();
+        Ok(())
+    }
+}
+
+// RUST_LOG=INFO cargo run --example load_balancer
+fn main() {
+    env_logger::init();
+
+    // read command line arguments
+    let opt = Opt::from_args();
+    let mut my_server = Server::new(Some(opt)).unwrap();
+    my_server.bootstrap();
+
+    // "127.0.0.1:343" is just a bad server
+    let mut upstreams =
+        LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443", "127.0.0.1:343"]).unwrap();
+
+    // We add a health check in the background so that the bad server is never selected.
+    let hc = health_check::TcpHealthCheck::new();
+    upstreams.set_health_check(hc);
+    upstreams.health_check_frequency = Some(Duration::from_secs(1));
+
+    let background = background_service("health check", upstreams);
+
+    let upstreams = background.task();
+
+    let mut lb = pingora_proxy::http_proxy_service(&my_server.configuration, LB(upstreams));
+    lb.add_tcp("0.0.0.0:6188");
+
+    let cert_path = format!("{}/tests/keys/server.crt", env!("CARGO_MANIFEST_DIR"));
+    let key_path = format!("{}/tests/keys/key.pem", env!("CARGO_MANIFEST_DIR"));
+
+    let mut tls_settings =
+        pingora_core::listeners::TlsSettings::intermediate(&cert_path, &key_path).unwrap();
+    tls_settings.enable_h2();
+    lb.add_tls_with_settings("0.0.0.0:6189", None, tls_settings);
+
+    my_server.add_service(lb);
+    my_server.add_service(background);
+    my_server.run_forever();
+}
diff --git a/pingora-proxy/src/lib.rs b/pingora-proxy/src/lib.rs
new file mode 100644
index 0000000..d1aa399
--- /dev/null
+++ b/pingora-proxy/src/lib.rs
@@ -0,0 +1,628 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! # pingora-proxy
+//!
+//! Programmable HTTP proxy built on top of [pingora_core].
+//!
+//! # Features
+//! - HTTP/1.x and HTTP/2 for both downstream and upstream
+//! - Connection pooling
+//! - TLSv1.3, mutual TLS, customizable CA
+//! - Request/Response scanning, modification or rejection
+//! - Dynamic upstream selection
+//! - Configurable retry and failover
+//! - Fully programmable and customizable at any stage of an HTTP request
+//!
+//! # How to use
+//!
+//! Users of this crate define their proxy by implementing the [ProxyHttp] trait, which contains
+//! the callbacks to be invoked at each stage of an HTTP request.
+//!
+//! Then the service can be passed into [`http_proxy_service()`] for a [pingora_core::server::Server] to
+//! run it.
+//!
+//! See `examples/load_balancer.rs` for a detailed example.
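+//!
+//! A minimal sketch of that shape (the upstream address, SNI and listen port below are
+//! placeholders, not defaults of this crate):
+//!
+//! ```ignore
+//! use async_trait::async_trait;
+//! use pingora_core::{server::Server, upstreams::peer::HttpPeer, Result};
+//! use pingora_proxy::{http_proxy_service, ProxyHttp, Session};
+//!
+//! struct MyProxy;
+//!
+//! #[async_trait]
+//! impl ProxyHttp for MyProxy {
+//!     type CTX = ();
+//!     fn new_ctx(&self) -> Self::CTX {}
+//!
+//!     // decide where each request goes; the other callbacks have default implementations
+//!     async fn upstream_peer(&self, _s: &mut Session, _c: &mut ()) -> Result<Box<HttpPeer>> {
+//!         Ok(Box::new(HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".into())))
+//!     }
+//! }
+//!
+//! fn main() {
+//!     let mut server = Server::new(None).unwrap();
+//!     server.bootstrap();
+//!     let mut proxy = http_proxy_service(&server.configuration, MyProxy);
+//!     proxy.add_tcp("0.0.0.0:6188");
+//!     server.add_service(proxy);
+//!     server.run_forever();
+//! }
+//! ```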
+
+// enable nightly feature async trait so that the docs are cleaner
+#![cfg_attr(doc_async_trait, feature(async_fn_in_trait))]
+
+use async_trait::async_trait;
+use bytes::Bytes;
+use futures::future::FutureExt;
+use http::{header, version::Version};
+use log::{debug, error, trace, warn};
+use once_cell::sync::Lazy;
+use pingora_http::{RequestHeader, ResponseHeader};
+use std::fmt::Debug;
+use std::str;
+use std::sync::Arc;
+use tokio::sync::{mpsc, Notify};
+use tokio::time;
+
+use pingora_cache::NoCacheReason;
+use pingora_core::apps::HttpServerApp;
+use pingora_core::connectors::{http::Connector, ConnectorOptions};
+use pingora_core::protocols::http::client::HttpSession as ClientSession;
+use pingora_core::protocols::http::v1::client::HttpSession as HttpSessionV1;
+use pingora_core::protocols::http::HttpTask;
+use pingora_core::protocols::http::ServerSession as HttpSession;
+use pingora_core::protocols::http::SERVER_NAME;
+use pingora_core::protocols::Stream;
+use pingora_core::protocols::{Digest, UniqueID};
+use pingora_core::server::configuration::ServerConf;
+use pingora_core::server::ShutdownWatch;
+use pingora_core::upstreams::peer::{HttpPeer, Peer};
+use pingora_error::{Error, ErrorSource, ErrorType::*, OrErr, Result};
+
+const MAX_RETRIES: usize = 16;
+const TASK_BUFFER_SIZE: usize = 4;
+
+mod proxy_cache;
+mod proxy_common;
+mod proxy_h1;
+mod proxy_h2;
+mod proxy_purge;
+mod proxy_trait;
+mod subrequest;
+
+use subrequest::Ctx as SubReqCtx;
+
+pub use proxy_trait::ProxyHttp;
+
+pub mod prelude {
+    pub use crate::{http_proxy_service, ProxyHttp, Session};
+}
+
+/// The concrete type that holds the user defined HTTP proxy.
+///
+/// Users don't need to interact with this object directly.
+pub struct HttpProxy<SV> {
+    inner: SV, // TODO: name it better than inner
+    client_upstream: Connector,
+    shutdown: Notify,
+}
+
+impl<SV> HttpProxy<SV> {
+    fn new(inner: SV, conf: Arc<ServerConf>) -> Arc<Self> {
+        Arc::new(HttpProxy {
+            inner,
+            client_upstream: Connector::new(Some(ConnectorOptions::from_server_conf(&conf))),
+            shutdown: Notify::new(),
+        })
+    }
+
+    async fn handle_new_request(
+        &self,
+        mut downstream_session: Box<HttpSession>,
+    ) -> Option<Box<HttpSession>>
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        // phase 1 read request header
+
+        let res = tokio::select! {
+            biased; // biased select is cheaper and we don't want to drop already buffered requests
+            res = downstream_session.read_request() => { res }
+            _ = self.shutdown.notified() => {
+                // service shutting down, dropping the connection to stop more requests from coming in
+                return None;
+            }
+        };
+        match res {
+            Ok(true) => {
+                // TODO: check n==0
+                debug!("Successfully got a new request");
+            }
+            Ok(false) => {
+                return None; // TODO: close connection?
+ } + Err(mut e) => { + e.as_down(); + error!("Fail to proxy: {}", e); + if matches!(e.etype, InvalidHTTPHeader) { + downstream_session.respond_error(400).await; + } // otherwise the connection must be broken, no need to send anything + downstream_session.shutdown().await; + return None; + } + } + trace!( + "Request header: {:?}", + downstream_session.req_header().as_ref() + ); + Some(downstream_session) + } + + // return bool: server_session can be reused, and error if any + async fn proxy_to_upstream( + &self, + session: &mut Session, + ctx: &mut SV::CTX, + ) -> (bool, Option<Box<Error>>) + where + SV: ProxyHttp + Send + Sync, + SV::CTX: Send + Sync, + { + let peer = match self.inner.upstream_peer(session, ctx).await { + Ok(p) => p, + Err(e) => return (false, Some(e)), + }; + + let client_session = self.client_upstream.get_http_session(&*peer).await; + match client_session { + Ok((client_session, client_reused)) => { + let (server_reused, error) = match client_session { + ClientSession::H1(mut h1) => { + let (server_reused, client_reuse, error) = self + .proxy_to_h1_upstream(session, &mut h1, client_reused, &peer, ctx) + .await; + if client_reuse { + let session = ClientSession::H1(h1); + self.client_upstream + .release_http_session(session, &*peer, peer.idle_timeout()) + .await; + } + (server_reused, error) + } + ClientSession::H2(mut h2) => { + let (server_reused, mut error) = self + .proxy_to_h2_upstream(session, &mut h2, client_reused, &peer, ctx) + .await; + let session = ClientSession::H2(h2); + self.client_upstream + .release_http_session(session, &*peer, peer.idle_timeout()) + .await; + + if let Some(e) = error.as_mut() { + // try to downgrade if A. origin says so or B. origin sends an invalid + // response, which usually means origin h2 is not production ready + if matches!(e.etype, H2Downgrade | InvalidH2) { + if peer + .get_alpn() + .map_or(true, |alpn| alpn.get_min_http_version() == 1) + { + // Add the peer to prefer h1 so that all following requests + // will use h1 + self.client_upstream.prefer_h1(&*peer); + } else { + // the peer doesn't allow downgrading to h1 (e.g. gRPC) + e.retry = false.into(); + } + } + } + + (server_reused, error) + } + }; + ( + server_reused, + error.map(|e| { + self.inner + .error_while_proxy(&peer, session, e, ctx, client_reused) + }), + ) + } + Err(e) => { + let new_err = self.inner.fail_to_connect(session, &peer, ctx, e); + (false, Some(new_err.into_up())) + } + } + } + + fn upstream_filter(&self, session: &mut Session, task: &mut HttpTask, ctx: &mut SV::CTX) + where + SV: ProxyHttp, + { + match task { + HttpTask::Header(header, _eos) => { + self.inner.upstream_response_filter(session, header, ctx) + } + HttpTask::Body(data, eos) => self + .inner + .upstream_response_body_filter(session, data, *eos, ctx), + _ => { + // TODO: add other upstream filter traits + } + } + } + + async fn finish( + &self, + mut session: Session, + ctx: &mut SV::CTX, + reuse: bool, + error: Option<&Error>, + ) -> Option<Stream> + where + SV: ProxyHttp + Send + Sync, + SV::CTX: Send + Sync, + { + self.inner.logging(&mut session, error, ctx).await; + + if reuse { + // TODO: log error + session.downstream_session.finish().await.ok().flatten() + } else { + None + } + } +} + +use pingora_cache::HttpCache; +use pingora_core::protocols::http::compression::ResponseCompressionCtx; + +/// The established HTTP session +/// +/// This object is what users interact with in order to access the request itself or change the proxy +/// behavior. 
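+///
+/// [Session] also derefs to the downstream [HttpSession], so downstream accessors such as
+/// `req_header()` and `respond_error()` can be called on the session directly.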
+pub struct Session {
+    /// the HTTP session to downstream (the client)
+    pub downstream_session: Box<HttpSession>,
+    /// The interface to control HTTP caching
+    pub cache: HttpCache,
+    /// (de)compress responses coming into the proxy (from upstream)
+    pub upstream_compression: ResponseCompressionCtx,
+    /// (de)compress responses leaving the proxy (to downstream)
+    pub downstream_compression: ResponseCompressionCtx,
+    /// ignore downstream range (skip downstream range filters)
+    pub ignore_downstream_range: bool,
+    // the context from the parent request
+    subrequest_ctx: Option<Box<SubReqCtx>>,
+}
+
+impl Session {
+    fn new(downstream_session: impl Into<Box<HttpSession>>) -> Self {
+        Session {
+            downstream_session: downstream_session.into(),
+            cache: HttpCache::new(),
+            upstream_compression: ResponseCompressionCtx::new(0, false), // disable both
+            downstream_compression: ResponseCompressionCtx::new(0, false), // disable both
+            ignore_downstream_range: false,
+            subrequest_ctx: None,
+        }
+    }
+
+    /// Create a new [Session] from the given [Stream]
+    ///
+    /// This function is mostly used for testing and mocking.
+    pub fn new_h1(stream: Stream) -> Self {
+        Self::new(Box::new(HttpSession::new_http1(stream)))
+    }
+
+    pub fn as_downstream_mut(&mut self) -> &mut HttpSession {
+        &mut self.downstream_session
+    }
+
+    pub fn as_downstream(&self) -> &HttpSession {
+        &self.downstream_session
+    }
+}
+
+impl Session {
+    async fn write_response_tasks(&mut self, mut tasks: Vec<HttpTask>) -> Result<bool> {
+        // all built-in downstream response filters go here
+        // NOTE: if downstream_session is written directly (error page), the filters will be
+        // bypassed.
+        tasks
+            .iter_mut()
+            .for_each(|t| self.downstream_compression.response_filter(t));
+        self.downstream_session.response_duplex_vec(tasks).await
+    }
+}
+
+impl AsRef<HttpSession> for Session {
+    fn as_ref(&self) -> &HttpSession {
+        &self.downstream_session
+    }
+}
+
+impl AsMut<HttpSession> for Session {
+    fn as_mut(&mut self) -> &mut HttpSession {
+        &mut self.downstream_session
+    }
+}
+
+use std::ops::{Deref, DerefMut};
+
+impl Deref for Session {
+    type Target = HttpSession;
+
+    fn deref(&self) -> &Self::Target {
+        &self.downstream_session
+    }
+}
+
+impl DerefMut for Session {
+    fn deref_mut(&mut self) -> &mut Self::Target {
+        &mut self.downstream_session
+    }
+}
+
+// generic HTTP 502 response sent when proxy_upstream_filter refuses to connect to upstream
+static BAD_GATEWAY: Lazy<ResponseHeader> = Lazy::new(|| {
+    let mut resp = ResponseHeader::build(http::StatusCode::BAD_GATEWAY, Some(3)).unwrap();
+    resp.insert_header(header::SERVER, &SERVER_NAME[..])
+        .unwrap();
+    resp.insert_header(header::CONTENT_LENGTH, 0).unwrap();
+    resp.insert_header(header::CACHE_CONTROL, "private, no-store")
+        .unwrap();
+
+    resp
+});
+
+impl<SV> HttpProxy<SV> {
+    async fn process_request(
+        self: &Arc<Self>,
+        mut session: Session,
+        mut ctx: <SV as ProxyHttp>::CTX,
+    ) -> Option<Stream>
+    where
+        SV: ProxyHttp + Send + Sync + 'static,
+        <SV as ProxyHttp>::CTX: Send + Sync,
+    {
+        match self.inner.request_filter(&mut session, &mut ctx).await {
+            Ok(response_sent) => {
+                if response_sent {
+                    // TODO: log error
+                    self.inner.logging(&mut session, None, &mut ctx).await;
+                    return session.downstream_session.finish().await.ok().flatten();
+                }
+                /* else continue */
+            }
+            Err(e) => {
+                if !self.inner.suppress_error_log(&session, &ctx, &e) {
+                    error!(
+                        "Fail to filter request: {}, {}",
+                        e,
+                        self.inner.request_summary(&session, &ctx)
+                    );
+                }
+                self.inner.fail_to_proxy(&mut session,
&e, &mut ctx).await; + self.inner.logging(&mut session, Some(&e), &mut ctx).await; + return None; + } + } + + // all built-in downstream request filters go below + + session + .downstream_compression + .request_filter(session.downstream_session.req_header()); + + if let Some((reuse, err)) = self.proxy_cache(&mut session, &mut ctx).await { + // cache hit + return self.finish(session, &mut ctx, reuse, err.as_deref()).await; + } + // either uncacheable, or cache miss + + // decide if the request is allowed to go to upstream + match self + .inner + .proxy_upstream_filter(&mut session, &mut ctx) + .await + { + Ok(proxy_to_upstream) => { + if !proxy_to_upstream { + // The hook can choose to write its own response, but if it doesn't we respond + // with a generic 502 + if session.response_written().is_none() { + match session.write_response_header_ref(&BAD_GATEWAY).await { + Ok(()) => {} + Err(e) => { + if !self.inner.suppress_error_log(&session, &ctx, &e) { + error!( + "Error responding with Bad Gateway: {}, {}", + e, + self.inner.request_summary(&session, &ctx) + ); + } + self.inner.fail_to_proxy(&mut session, &e, &mut ctx).await; + self.inner.logging(&mut session, Some(&e), &mut ctx).await; + return None; + } + } + } + + return self.finish(session, &mut ctx, false, None).await; + } + /* else continue */ + } + Err(e) => { + if !self.inner.suppress_error_log(&session, &ctx, &e) { + error!( + "Error deciding if we should proxy to upstream: {}, {}", + e, + self.inner.request_summary(&session, &ctx) + ); + } + self.inner.fail_to_proxy(&mut session, &e, &mut ctx).await; + self.inner.logging(&mut session, Some(&e), &mut ctx).await; + return None; + } + } + + let mut retries: usize = 0; + + let mut server_reuse = false; + let mut proxy_error: Option<Box<Error>> = None; + + while retries < MAX_RETRIES { + retries += 1; + + let (reuse, e) = self.proxy_to_upstream(&mut session, &mut ctx).await; + server_reuse = reuse; + + match e { + Some(error) => { + let retry = error.retry(); + proxy_error = Some(error); + if !retry { + break; + } + // only log error that will be retried here, the final error will be logged below + warn!( + "Fail to proxy: {}, tries: {}, retry: {}, {}", + proxy_error.as_ref().unwrap(), + retries, + retry, + self.inner.request_summary(&session, &ctx) + ); + } + None => { + proxy_error = None; + break; + } + }; + } + + // serve stale if error + // check both error and cache before calling the function because await is not cheap + let serve_stale_result = if proxy_error.is_some() && session.cache.can_serve_stale_error() { + self.handle_stale_if_error(&mut session, &mut ctx, proxy_error.as_ref().unwrap()) + .await + } else { + None + }; + + let final_error = if let Some((reuse, stale_cache_error)) = serve_stale_result { + // don't reuse server conn if serve stale polluted it + server_reuse = server_reuse && reuse; + stale_cache_error + } else { + proxy_error + }; + + if let Some(e) = final_error.as_ref() { + let status = self.inner.fail_to_proxy(&mut session, e, &mut ctx).await; + + // final error will have > 0 status unless downstream connection is dead + if !self.inner.suppress_error_log(&session, &ctx, e) { + error!( + "Fail to proxy: {}, status: {}, tries: {}, retry: {}, {}", + final_error.as_ref().unwrap(), + status, + retries, + false, // we never retry here + self.inner.request_summary(&session, &ctx) + ); + } + } + + // logging() will be called in finish() + self.finish(session, &mut ctx, server_reuse, final_error.as_deref()) + .await + } +} + +/* Make process_subrequest() a trait 
to workaround https://github.com/rust-lang/rust/issues/78649
+   if process_subrequest() is implemented as a member of HttpProxy, rust complains
+
+error[E0391]: cycle detected when computing type of `proxy_cache::<impl at pingora-proxy/src/proxy_cache.rs:7:1: 7:23>::proxy_cache::{opaque#0}`
+  --> pingora-proxy/src/proxy_cache.rs:13:10
+   |
+13 | ) -> Option<(bool, Option<Box<Error>>)>
+
+*/
+#[async_trait]
+trait Subrequest {
+    async fn process_subrequest(
+        self: &Arc<Self>,
+        session: Box<HttpSession>,
+        sub_req_ctx: Box<SubReqCtx>,
+    );
+}
+
+#[async_trait]
+impl<SV> Subrequest for HttpProxy<SV>
+where
+    SV: ProxyHttp + Send + Sync + 'static,
+    <SV as ProxyHttp>::CTX: Send + Sync,
+{
+    async fn process_subrequest(
+        self: &Arc<Self>,
+        session: Box<HttpSession>,
+        sub_req_ctx: Box<SubReqCtx>,
+    ) {
+        debug!("starting subrequest");
+        let mut session = match self.handle_new_request(session).await {
+            Some(downstream_session) => Session::new(downstream_session),
+            None => return, // bad request
+        };
+
+        // no real downstream to keepalive but it doesn't matter what is set here because at the end
+        // of this fn the dummy connection will be dropped
+        session.set_keepalive(None);
+
+        session.subrequest_ctx.replace(sub_req_ctx);
+        trace!("processing subrequest");
+        let ctx = self.inner.new_ctx();
+        self.process_request(session, ctx).await;
+        trace!("subrequest done");
+    }
+}
+
+#[async_trait]
+impl<SV> HttpServerApp for HttpProxy<SV>
+where
+    SV: ProxyHttp + Send + Sync + 'static,
+    <SV as ProxyHttp>::CTX: Send + Sync,
+{
+    async fn process_new_http(
+        self: &Arc<Self>,
+        session: HttpSession,
+        shutdown: &ShutdownWatch,
+    ) -> Option<Stream> {
+        let session = Box::new(session);
+
+        // TODO: keepalive pool, use stack
+        let mut session = match self.handle_new_request(session).await {
+            Some(downstream_session) => Session::new(downstream_session),
+            None => return None, // bad request
+        };
+
+        if *shutdown.borrow() {
+            // stop downstream from reusing the connection if this service is shutting down soon
+            session.set_keepalive(None);
+        } else {
+            // default 60s
+            session.set_keepalive(Some(60));
+        }
+
+        let ctx = self.inner.new_ctx();
+        self.process_request(session, ctx).await
+    }
+
+    fn http_cleanup(&self) {
+        // Notify all keepalived requests blocking on read_request() to abort
+        self.shutdown.notify_waiters();
+
+        // TODO: impl shutting down flag so that we don't need to read stack.is_shutting_down()
+    }
+
+    // TODO implement h2_options
+}
+
+use pingora_core::services::listening::Service;
+
+/// Create a [Service] from the user implemented [ProxyHttp].
+///
+/// The returned [Service] can be hosted by a [pingora_core::server::Server] directly.
+pub fn http_proxy_service<SV>(conf: &Arc<ServerConf>, inner: SV) -> Service<HttpProxy<SV>> {
+    Service::new(
+        "Pingora HTTP Proxy Service".into(),
+        HttpProxy::new(inner, conf.clone()),
+    )
+}
diff --git a/pingora-proxy/src/proxy_cache.rs b/pingora-proxy/src/proxy_cache.rs
new file mode 100644
index 0000000..02bc378
--- /dev/null
+++ b/pingora-proxy/src/proxy_cache.rs
@@ -0,0 +1,1203 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use super::*;
+use http::StatusCode;
+use pingora_cache::key::CacheHashKey;
+use pingora_cache::lock::LockStatus;
+use pingora_cache::max_file_size::ERR_RESPONSE_TOO_LARGE;
+use pingora_cache::{HitStatus, RespCacheable::*};
+use pingora_core::protocols::http::v1::common::header_value_content_length;
+use pingora_core::ErrorType;
+
+impl<SV> HttpProxy<SV> {
+    // return bool: server_session can be reused, and error if any
+    pub(crate) async fn proxy_cache(
+        self: &Arc<Self>,
+        session: &mut Session,
+        ctx: &mut SV::CTX,
+    ) -> Option<(bool, Option<Box<Error>>)>
+    // None: continue to proxy, Some: return
+    where
+        SV: ProxyHttp + Send + Sync + 'static,
+        SV::CTX: Send + Sync,
+    {
+        // Cache logic request phase
+        if let Err(e) = self.inner.request_cache_filter(session, ctx) {
+            // TODO: handle this error
+            warn!(
+                "Fail to request_cache_filter: {e}, {}",
+                self.inner.request_summary(session, ctx)
+            );
+        }
+
+        // cache key logic, should this be part of request_cache_filter?
+        if session.cache.enabled() {
+            match self.inner.cache_key_callback(session, ctx) {
+                Ok(key) => {
+                    session.cache.set_cache_key(key);
+                }
+                Err(e) => {
+                    // TODO: handle this error
+                    session.cache.disable(NoCacheReason::StorageError);
+                    warn!(
+                        "Fail to cache_key_callback: {e}, {}",
+                        self.inner.request_summary(session, ctx)
+                    );
+                }
+            }
+        }
+
+        // cache purge logic: PURGE short-circuits the rest of the request
+        if self.inner.is_purge(session, ctx) {
+            if session.cache.enabled() {
+                return self.proxy_purge(session, ctx).await;
+            } else {
+                return Some(proxy_purge::write_no_purge_response(session).await);
+            }
+        }
+
+        // bypass cache lookup if we predict to be uncacheable
+        if session.cache.enabled() && !session.cache.cacheable_prediction() {
+            session.cache.bypass();
+        }
+
+        if !session.cache.enabled() {
+            return None;
+        }
+
+        // cache lookup logic
+        loop {
+            // for cache lock, TODO: cap the max number of loops
+            match session.cache.cache_lookup().await {
+                Ok(res) => {
+                    if let Some((mut meta, handler)) = res {
+                        // vary logic
+                        // because this branch can be called multiple times in a loop, and we only
+                        // need to update the vary once, check if variance is already set to
+                        // prevent unnecessary vary lookups
+                        let cache_key = session.cache.cache_key();
+                        if let Some(variance) = cache_key.variance_bin() {
+                            // ad hoc double check that the variance found is the variance we want
+                            if Some(variance) != meta.variance() {
+                                warn!("Cache variance mismatch, {variance:?}, {cache_key:?}");
+                                session.cache.disable(NoCacheReason::InternalError);
+                                break None;
+                            }
+                        } else {
+                            let req_header = session.req_header();
+                            let variance = self.inner.cache_vary_filter(&meta, ctx, req_header);
+                            if let Some(variance) = variance {
+                                if !session.cache.cache_vary_lookup(variance, &meta) {
+                                    // cache key variance updated, need to lookup again
+                                    continue;
+                                }
+                            } // else: vary is not in use
+                        }
+
+                        // either no variance or the current handler is the variance
+
+                        // hit
+                        // TODO: maybe round and/or cache now()
+                        let hit_status = if meta.is_fresh(std::time::SystemTime::now()) {
+                            // check if we should force expire
+                            // (this is a soft purge which tries to revalidate,
+                            // vs. a hard purge which forces a miss)
+                            // TODO: allow hard purge
+                            match self
+                                .inner
+                                .cache_hit_filter(&meta, ctx, session.req_header())
+                                .await
+                            {
+                                Err(e) => {
+                                    error!(
+                                        "Failed to filter cache hit: {e}, {}",
+                                        self.inner.request_summary(session, ctx)
+                                    );
+                                    // this return value will cause us to fetch from upstream
+                                    HitStatus::FailedHitFilter
+                                }
+                                Ok(expired) => {
+                                    // a force expired asset should not be served as stale
+                                    // because force expire is usually meant to remove data
+                                    if expired {
+                                        meta.disable_serve_stale();
+                                        HitStatus::ForceExpired
+                                    } else {
+                                        HitStatus::Fresh
+                                    }
+                                }
+                            }
+                        } else {
+                            HitStatus::Expired
+                        };
+                        // init cache for hit / stale
+                        session.cache.cache_found(meta, handler, hit_status);
+
+                        if !hit_status.is_fresh() {
+                            // expired or force expired asset
+                            if session.cache.is_cache_locked() {
+                                // first check if this is the subrequest for the background cache update
+                                if let Some(write_lock) = session
+                                    .subrequest_ctx
+                                    .as_mut()
+                                    .and_then(|ctx| ctx.write_lock.take())
+                                {
+                                    // Put the write lock in the request
+                                    session.cache.set_write_lock(write_lock);
+                                    // and then let it go to upstream
+                                    break None;
+                                }
+                                let will_serve_stale = session.cache.can_serve_stale_updating()
+                                    && self.inner.should_serve_stale(session, ctx, None);
+                                if !will_serve_stale {
+                                    let lock_status = session.cache.cache_lock_wait().await;
+                                    if self.handle_lock_status(session, ctx, lock_status) {
+                                        continue;
+                                    } else {
+                                        break None;
+                                    }
+                                } // else continue to serve stale
+                            } else if session.cache.is_cache_lock_writer() {
+                                // stale while revalidate logic for the writer
+                                let will_serve_stale = session.cache.can_serve_stale_updating()
+                                    && self.inner.should_serve_stale(session, ctx, None);
+                                if will_serve_stale {
+                                    // create a background task to do the actual update
+                                    let subrequest =
+                                        Box::new(crate::subrequest::create_dummy_session(session));
+                                    let new_app = self.clone(); // Clone the Arc
+                                    let sub_req_ctx = Box::new(SubReqCtx {
+                                        write_lock: Some(session.cache.take_write_lock()),
+                                    });
+                                    tokio::spawn(async move {
+                                        new_app.process_subrequest(subrequest, sub_req_ctx).await;
+                                    });
+                                    // continue to serve stale for this request
+                                } else {
+                                    // return to fetch from upstream
+                                    break None;
+                                }
+                            } else {
+                                // return to fetch from upstream
+                                break None;
+                            }
+                        }
+                        let (reuse, err) = self.proxy_cache_hit(session, ctx).await;
+                        if let Some(e) = err.as_ref() {
+                            error!(
+                                "Fail to serve cache: {e}, {}",
+                                self.inner.request_summary(session, ctx)
+                            );
+                        }
+                        // the response is served from cache, exit
+                        break Some((reuse, err));
+                    } else {
+                        // cache miss
+                        if session.cache.is_cache_locked() {
+                            let lock_status = session.cache.cache_lock_wait().await;
+                            if self.handle_lock_status(session, ctx, lock_status) {
+                                continue;
+                            } else {
+                                break None;
+                            }
+                        } else {
+                            self.inner.cache_miss(session, ctx);
+                            break None;
+                        }
+                    }
+                }
+                Err(e) => {
+                    // Allow a cache miss to fill the cache even if the cache lookup errors,
+                    // this is mostly to support backward-incompatible metadata updates
+                    // TODO: check error types
+                    // session.cache.disable();
+                    self.inner.cache_miss(session, ctx);
+                    warn!(
+                        "Fail to cache lookup: {e}, {}",
+                        self.inner.request_summary(session, ctx)
+                    );
+                    break None;
+                }
+            }
+        }
+    }
+
+    // return bool: server_session can be reused, and error if any
+    pub(crate) async fn proxy_cache_hit(
+        &self,
+        session: &mut Session,
+        ctx: &mut SV::CTX,
+    ) -> (bool, Option<Box<Error>>)
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        use range_filter::*;
+
+        let seekable = session.cache.hit_handler().can_seek();
+        let mut header = cache_hit_header(&session.cache);
+
+        let req = session.req_header();
+
+        let header_only = conditional_filter::not_modified_filter(req, &mut header)
+            || req.method == http::method::Method::HEAD;
+
+        // process range header if the cache storage supports seek
+        let range_type = if seekable && !session.ignore_downstream_range {
+            range_header_filter(req, &mut header)
+        } else {
+            RangeType::None
+        };
+
+        // return a 416 with an empty body for simplicity
+        let header_only = header_only || matches!(range_type, RangeType::Invalid);
+
+        // TODO: use ProxyUseCache to replace the logic below
+        match self.inner.response_filter(session, &mut header, ctx).await {
+            Ok(_) => {
+                if let Err(e) = session
+                    .as_mut()
+                    .write_response_header(header)
+                    .await
+                    .map_err(|e| e.into_down())
+                {
+                    // downstream connection is bad already
+                    return (false, Some(e));
+                }
+            }
+            Err(e) => {
+                // TODO: more logging and error handling
+                session.as_mut().respond_error(500).await;
+                // we have not written anything dirty to downstream, it is still reusable
+                return (true, Some(e));
+            }
+        }
+        debug!("finished sending cached header to downstream");
+
+        if !header_only {
+            if let RangeType::Single(r) = range_type {
+                if let Err(e) = session.cache.hit_handler().seek(r.start, Some(r.end)) {
+                    return (false, Some(e));
+                }
+            }
+            loop {
+                match session.cache.hit_handler().read_body().await {
+                    Ok(body) => {
+                        if let Some(b) = body {
+                            // write to downstream
+                            if let Err(e) = session
+                                .as_mut()
+                                .write_response_body(b)
+                                .await
+                                .map_err(|e| e.into_down())
+                            {
+                                return (false, Some(e));
+                            }
+                        } else {
+                            break;
+                        }
+                    }
+                    Err(e) => return (false, Some(e)),
+                }
+            }
+        }
+
+        if let Err(e) = session.cache.finish_hit_handler().await {
+            warn!("Error during finish_hit_handler: {}", e);
+        }
+
+        match session.as_mut().finish_body().await {
+            Ok(_) => {
+                debug!("finished sending cached body to downstream");
+                (true, None)
+            }
+            Err(e) => (false, Some(e)),
+        }
+    }
+
+    // TODO: cache upstream header filter to add/remove headers
+
+    pub(crate) async fn cache_http_task(
+        &self,
+        session: &mut Session,
+        task: &HttpTask,
+        ctx: &mut SV::CTX,
+        serve_from_cache: &mut ServeFromCache,
+    ) -> Result<()>
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        if !session.cache.enabled() && !session.cache.bypassing() {
+            return Ok(());
+        }
+
+        match task {
+            HttpTask::Header(header, end_stream) => {
+                // decide if cacheable and create cache meta
+                // for now, skip 1xxs (should not affect response cache decisions)
+                // However 101 is an exception because it is the final response header
+                if header.status.is_informational()
+                    && header.status != StatusCode::SWITCHING_PROTOCOLS
+                {
+                    return Ok(());
+                }
+                match self.inner.response_cache_filter(session, header, ctx)? {
+                    Cacheable(meta) => {
+                        let mut fill_cache = true;
+                        if session.cache.bypassing() {
+                            // The cache might have been bypassed because the response exceeded the
+                            // maximum cacheable asset size. If that looks like the case (there
+                            // is a maximum file size configured and we don't know the content
+                            // length up front), attempting to re-enable the cache now would cause
+                            // the request to fail when the chunked response exceeds the maximum
+                            // file size again.
+ if session.cache.max_file_size_bytes().is_some() + && !header.headers.contains_key(header::CONTENT_LENGTH) + { + session.cache.disable(NoCacheReason::ResponseTooLarge); + return Ok(()); + } + + session.cache.response_became_cacheable(); + + if header.status == StatusCode::OK { + self.inner.cache_miss(session, ctx); + } else { + // we've allowed caching on the next request, + // but do not cache _this_ request if bypassed and not 200 + // (We didn't run upstream request cache filters to strip range or condition headers, + // so this could be an uncacheable response e.g. 206 or 304. + // Exclude all non-200 for simplicity, may expand allowable codes in the future.) + fill_cache = false; + session.cache.disable(NoCacheReason::Deferred); + } + } + + // If the Content-Length is known, and a maximum asset size has been configured + // on the cache, validate that the response does not exceed the maximum asset size. + if session.cache.enabled() { + if let Some(max_file_size) = session.cache.max_file_size_bytes() { + let content_length_hdr = header.headers.get(header::CONTENT_LENGTH); + if let Some(content_length) = + header_value_content_length(content_length_hdr) + { + if content_length > max_file_size { + fill_cache = false; + session.cache.response_became_uncacheable( + NoCacheReason::ResponseTooLarge, + ); + session.cache.disable(NoCacheReason::ResponseTooLarge); + } + } + // if the content-length header is not specified, the miss handler + // will count the response size on the fly, aborting the request + // mid-transfer if the max file size is exceeded + } + } + if fill_cache { + let req_header = session.req_header(); + // Update the variance in the meta via the same callback, + // cache_vary_filter(), used in cache lookup for consistency. + // Future cache lookups need a matching variance in the meta + // with the cache key to pick up the correct variance + let variance = self.inner.cache_vary_filter(&meta, ctx, req_header); + session.cache.set_cache_meta(meta); + session.cache.update_variance(variance); + // this sends the meta and header + session.cache.set_miss_handler().await?; + if session.cache.miss_body_reader().is_some() { + serve_from_cache.enable_miss(); + } + if *end_stream { + session + .cache + .miss_handler() + .unwrap() // safe, it is set above + .write_body(Bytes::new(), true) + .await?; + session.cache.finish_miss_handler().await?; + } + } + } + Uncacheable(reason) => { + if !session.cache.bypassing() { + // mark as uncacheable, so we bypass cache next time + session.cache.response_became_uncacheable(reason); + } + session.cache.disable(reason); + } + } + } + HttpTask::Body(data, end_stream) => match data { + Some(d) => { + if session.cache.enabled() { + // this will panic if more data is sent after we see end_stream + // but should be impossible in real world + let miss_handler = session.cache.miss_handler().unwrap(); + // TODO: do this async + let res = miss_handler.write_body(d.clone(), *end_stream).await; + if let Err(err) = res { + if err.etype == ERR_RESPONSE_TOO_LARGE { + debug!("chunked response exceeded max cache size, remembering that it is uncacheable"); + session + .cache + .response_became_uncacheable(NoCacheReason::ResponseTooLarge); + } + + return Err(err); + } + if *end_stream { + session.cache.finish_miss_handler().await?; + } + } + } + None => { + if session.cache.enabled() && *end_stream { + session.cache.finish_miss_handler().await?; + } + } + }, + HttpTask::Trailer(_) => {} // h1 trailer is not supported yet + HttpTask::Done => { + if 
session.cache.enabled() {
+                    session.cache.finish_miss_handler().await?;
+                }
+            }
+            HttpTask::Failed(_) => {
+                // TODO: handle this failure: delete the temp files?
+            }
+        }
+        Ok(())
+    }
+
+    // Decide if the local cache can be used according to the upstream http header
+    // 1. when upstream returns 304, the local cache is refreshed and served fresh
+    // 2. when upstream returns certain HTTP error statuses, the local cache is served stale
+    // Return true if the local cache should be used, false otherwise
+    pub(crate) async fn revalidate_or_stale(
+        &self,
+        session: &mut Session,
+        task: &mut HttpTask,
+        ctx: &mut SV::CTX,
+    ) -> bool
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        if !session.cache.enabled() {
+            return false;
+        }
+
+        match task {
+            HttpTask::Header(resp, _eos) => {
+                if resp.status == StatusCode::NOT_MODIFIED {
+                    if session.cache.maybe_cache_meta().is_some() {
+                        // a 304 doesn't contain all the headers, merge the 304 into the cached 200 header
+                        // in order for response_cache_filter to run correctly
+                        let merged_header = session.cache.revalidate_merge_header(resp);
+                        match self
+                            .inner
+                            .response_cache_filter(session, &merged_header, ctx)
+                        {
+                            Ok(Cacheable(mut meta)) => {
+                                // For simplicity, ignore changes to variance over 304 for now.
+                                // Note this means upstream can only update variance via 2xx
+                                // (expired response).
+                                //
+                                // TODO: if we choose to respect changing Vary / variance over 304,
+                                // then there are a few cases to consider. See `update_variance` in
+                                // the `pingora-cache` module.
+                                let old_meta = session.cache.maybe_cache_meta().unwrap(); // safe, checked above
+                                if let Some(old_variance) = old_meta.variance() {
+                                    meta.set_variance(old_variance);
+                                }
+                                if let Err(e) = session.cache.revalidate_cache_meta(meta).await {
+                                    warn!("revalidate_cache_meta failed {:?}", e);
+                                }
+                                // We can continue to use the revalidated asset even if the meta
+                                // failed to be written to storage
+                                true
+                            }
+                            Ok(Uncacheable(reason)) => {
+                                // This response was once cacheable, and upstream tells us it has not changed
+                                // but now we decided it is uncacheable!
+                                // RFC 9111: still allowed to reuse the stored response this time because
+                                // it was "successfully validated"
+                                // https://www.rfc-editor.org/rfc/rfc9111#constructing.responses.from.caches
+                                // Serve the response, but do not update the cache
+
+                                // We also want to avoid poisoning downstream's cache with an unsolicited 304
+                                // if we did not receive a conditional request from downstream
+                                // (downstream may have a different cacheability assessment and could cache the 304)
+
+                                // TODO: log more
+                                warn!("Uncacheable {:?} 304 received", reason);
+                                session.cache.response_became_uncacheable(reason);
+                                session.cache.revalidate_uncacheable(merged_header, reason);
+                                true
+                            }
+                            Err(e) => {
+                                warn!("Error {:?} response_cache_filter during revalidation, disable caching", e);
+                                session.cache.disable(NoCacheReason::InternalError);
+                                false
+                            }
+                        }
+                    } else {
+                        // TODO: log more
+                        warn!("304 received without cached asset, disable caching");
+                        let reason = NoCacheReason::Custom("304 on miss");
+                        session.cache.response_became_uncacheable(reason);
+                        session.cache.disable(reason);
+                        false
+                    }
+                } else if resp.status.is_server_error() {
+                    // stale if error logic, 5xx only for now
+
+                    // this is the response header filter, response_written should always be None?
+                    if !session.cache.can_serve_stale_error()
+                        || session.response_written().is_some()
+                    {
+                        return false;
+                    }
+
+                    // create an error to encode the http status code
+                    let http_status_error = Error::create(
+                        ErrorType::HTTPStatus(resp.status.as_u16()),
+                        ErrorSource::Upstream,
+                        None,
+                        None,
+                    );
+                    self.inner
+                        .should_serve_stale(session, ctx, Some(&http_status_error))
+                } else {
+                    false // not 304, not a stale-if-error status code
+                }
+            }
+            _ => false, // not header
+        }
+    }
+
+    // None: no stale asset is used, Some(_): a stale asset is sent to downstream
+    // bool: can the downstream connection be reused
+    pub(crate) async fn handle_stale_if_error(
+        &self,
+        session: &mut Session,
+        ctx: &mut SV::CTX,
+        error: &Error,
+    ) -> Option<(bool, Option<Box<Error>>)>
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        // the caller might have already checked this as an optimization
+        if !session.cache.can_serve_stale_error() {
+            return None;
+        }
+
+        // the error happened halfway through a regular response to downstream,
+        // we can't resend the response
+        if session.response_written().is_some() {
+            return None;
+        }
+
+        // check error types
+        if !self.inner.should_serve_stale(session, ctx, Some(error)) {
+            return None;
+        }
+
+        // log the original error
+        warn!(
+            "Fail to proxy: {}, serving stale, {}",
+            error,
+            self.inner.request_summary(session, ctx)
+        );
+
+        Some(self.proxy_cache_hit(session, ctx).await)
+    }
+
+    // helper function to check when to continue to retry the lock (true) or give up (false)
+    fn handle_lock_status(
+        &self,
+        session: &mut Session,
+        ctx: &SV::CTX,
+        lock_status: LockStatus,
+    ) -> bool
+    where
+        SV: ProxyHttp,
+    {
+        debug!("cache unlocked {lock_status:?}");
+        match lock_status {
+            // should lookup the cached asset again
+            LockStatus::Done => true,
+            // should compete to be a new writer
+            LockStatus::TransientError => true,
+            // the request is uncacheable, go ahead to fetch from the origin
+            LockStatus::GiveUp => {
+                // TODO: It will be nice for the writer to propagate the real reason
+                session.cache.disable(NoCacheReason::CacheLockGiveUp);
+                // not cacheable, just go to the origin.
+                false
+            }
+            // treat this the same as TransientError
+            LockStatus::Dangling => {
+                // software bug, but the request can recover from this
+                warn!(
+                    "Dangling cache lock, {}",
+                    self.inner.request_summary(session, ctx)
+                );
+                true
+            }
+            /* We have 3 options when a lock is held too long
+             * 1. release the lock and let every request compete for it again
+             * 2. let every request cache miss
+             * 3. let every request through while disabling cache
+             * #1 could repeat the situation but protect the origin from load
+             * #2 could amplify disk writes and storage for temp files
+             * #3 is the simplest option for now */
+            LockStatus::Timeout => {
+                warn!(
+                    "Cache lock timeout, {}",
+                    self.inner.request_summary(session, ctx)
+                );
+                session.cache.disable(NoCacheReason::CacheLockTimeout);
+                // not cacheable, just go to the origin.
+                false
+            }
+            // software bug, this status should be impossible to reach
+            LockStatus::Waiting => panic!("impossible LockStatus::Waiting"),
+        }
+    }
+}
+
+fn cache_hit_header(cache: &HttpCache) -> Box<ResponseHeader> {
+    let mut header = Box::new(cache.cache_meta().response_header_copy());
+    // convert cache response
+
+    // these status codes / methods cannot have a body, so no need to add chunked encoding
+    let no_body = matches!(header.status.as_u16(), 204 | 304);
+
+    // https://www.rfc-editor.org/rfc/rfc9111#section-4:
+    // When a stored response is used to satisfy a request without validation, a cache
+    // MUST generate an Age header field
+    if !cache.upstream_used() {
+        let age = cache.cache_meta().age().as_secs();
+        header.insert_header(http::header::AGE, age).unwrap();
+    }
+
+    /* Add chunked header to tell downstream to use chunked encoding
+     * in the absence of content-length in h2 */
+    if !no_body
+        && !header.status.is_informational()
+        && header.headers.get(http::header::CONTENT_LENGTH).is_none()
+    {
+        header
+            .insert_header(http::header::TRANSFER_ENCODING, "chunked")
+            .unwrap();
+    }
+    header
+}
+
+// https://datatracker.ietf.org/doc/html/rfc7233#section-3
+pub(crate) mod range_filter {
+    use super::*;
+    use http::header::*;
+    use std::ops::Range;
+
+    // parse bytes into usize, ignoring parse errors
+    fn parse_number(input: &[u8]) -> Option<usize> {
+        str::from_utf8(input).ok()?.parse().ok()
+    }
+
+    fn parse_range_header(range: &[u8], content_length: usize) -> RangeType {
+        use regex::Regex;
+
+        // single byte range only for now
+        // https://datatracker.ietf.org/doc/html/rfc7233#section-2.1
+        // https://datatracker.ietf.org/doc/html/rfc7233#appendix-C: case insensitive
+        static RE_SINGLE_RANGE: Lazy<Regex> =
+            Lazy::new(|| Regex::new(r"(?i)bytes=(?P<start>\d*)-(?P<end>\d*)").unwrap());
+
+        // ignore invalid range header
+        let Ok(range_str) = str::from_utf8(range) else {
+            return RangeType::None;
+        };
+
+        let Some(captured) = RE_SINGLE_RANGE.captures(range_str) else {
+            return RangeType::None;
+        };
+        let maybe_start = captured
+            .name("start")
+            .and_then(|s| s.as_str().parse::<usize>().ok());
+        let end = captured
+            .name("end")
+            .and_then(|s| s.as_str().parse::<usize>().ok());
+
+        if let Some(start) = maybe_start {
+            if start >= content_length {
+                RangeType::Invalid
+            } else {
+                // an open-ended range should end at the last byte
+                // an oversized end is allowed but ignored
+                // range end is inclusive
+                let end = std::cmp::min(end.unwrap_or(content_length - 1), content_length - 1) + 1;
+                if end <= start {
+                    RangeType::Invalid
+                } else {
+                    RangeType::new_single(start, end)
+                }
+            }
+        } else {
+            // start is empty, this changes the meaning of the value of `end`
+            // Now it means to read the last `end` bytes
+            if let Some(end) = end {
+                if content_length >= end {
+                    RangeType::new_single(content_length - end, content_length)
+                } else {
+                    // an oversized end is allowed but ignored
+                    RangeType::new_single(0, content_length)
+                }
+            } else {
+                // both empty/invalid
+                RangeType::Invalid
+            }
+        }
+    }
+    #[test]
+    fn test_parse_range() {
+        assert_eq!(
+            parse_range_header(b"bytes=0-1", 10),
+            RangeType::new_single(0, 2)
+        );
+        assert_eq!(
+            parse_range_header(b"bYTes=0-9", 10),
+            RangeType::new_single(0, 10)
+        );
+        assert_eq!(
+            parse_range_header(b"bytes=0-12", 10),
+            RangeType::new_single(0, 10)
+        );
+        assert_eq!(
+            parse_range_header(b"bytes=0-", 10),
+            RangeType::new_single(0, 10)
+        );
+        assert_eq!(parse_range_header(b"bytes=2-1", 10), RangeType::Invalid);
assert_eq!(parse_range_header(b"bytes=10-11", 10), RangeType::Invalid); + assert_eq!( + parse_range_header(b"bytes=-2", 10), + RangeType::new_single(8, 10) + ); + assert_eq!( + parse_range_header(b"bytes=-12", 10), + RangeType::new_single(0, 10) + ); + assert_eq!(parse_range_header(b"bytes=-", 10), RangeType::Invalid); + assert_eq!(parse_range_header(b"bytes=", 10), RangeType::None); + } + + #[derive(Debug, Eq, PartialEq, Clone)] + pub enum RangeType { + None, + Single(Range<usize>), + // TODO: multi-range + Invalid, + } + + impl RangeType { + fn new_single(start: usize, end: usize) -> Self { + RangeType::Single(Range { start, end }) + } + } + + // TODO: if-range + + // single range for now + pub fn range_header_filter(req: &RequestHeader, resp: &mut ResponseHeader) -> RangeType { + // The Range header field is evaluated after evaluating the precondition + // header fields defined in [RFC7232], and only if the result in absence + // of the Range header field would be a 200 (OK) response + if resp.status != StatusCode::OK { + return RangeType::None; + } + + // "A server MUST ignore a Range header field received with a request method other than GET." + if req.method != http::Method::GET && req.method != http::Method::HEAD { + return RangeType::None; + } + + let Some(range_header) = req.headers.get(RANGE) else { + return RangeType::None; + }; + + // Content-Length is not required by RFC but it is what nginx does and easier to implement + // with this header present. + let Some(content_length_bytes) = resp.headers.get(CONTENT_LENGTH) else { + return RangeType::None; + }; + // bail on invalid content length + let Some(content_length) = parse_number(content_length_bytes.as_bytes()) else { + return RangeType::None; + }; + + // TODO: we can also check Accept-Range header from resp. 
+
+        let range_type = parse_range_header(range_header.as_bytes(), content_length);
+
+        match &range_type {
+            RangeType::None => { /* nothing to do */ }
+            RangeType::Single(r) => {
+                // 206 response
+                resp.set_status(StatusCode::PARTIAL_CONTENT).unwrap();
+                resp.insert_header(&CONTENT_LENGTH, r.end - r.start)
+                    .unwrap();
+                resp.insert_header(
+                    &CONTENT_RANGE,
+                    format!("bytes {}-{}/{content_length}", r.start, r.end - 1), // range end is inclusive
+                )
+                .unwrap()
+            }
+            RangeType::Invalid => {
+                // 416 response
+                resp.set_status(StatusCode::RANGE_NOT_SATISFIABLE).unwrap();
+                // empty body for simplicity
+                resp.insert_header(&CONTENT_LENGTH, HeaderValue::from_static("0"))
+                    .unwrap();
+                // TODO: remove other headers like content-encoding
+                resp.remove_header(&CONTENT_TYPE);
+                resp.insert_header(&CONTENT_RANGE, format!("bytes */{content_length}"))
+                    .unwrap()
+            }
+        }
+
+        range_type
+    }
+
+    #[test]
+    fn test_range_filter() {
+        fn gen_req() -> RequestHeader {
+            RequestHeader::build(http::Method::GET, b"/", Some(1)).unwrap()
+        }
+        fn gen_resp() -> ResponseHeader {
+            let mut resp = ResponseHeader::build(200, Some(1)).unwrap();
+            resp.append_header("Content-Length", "10").unwrap();
+            resp
+        }
+
+        // no range
+        let req = gen_req();
+        let mut resp = gen_resp();
+        assert_eq!(RangeType::None, range_header_filter(&req, &mut resp));
+        assert_eq!(resp.status.as_u16(), 200);
+
+        // regular range
+        let mut req = gen_req();
+        req.insert_header("Range", "bytes=0-1").unwrap();
+        let mut resp = gen_resp();
+        assert_eq!(
+            RangeType::new_single(0, 2),
+            range_header_filter(&req, &mut resp)
+        );
+        assert_eq!(resp.status.as_u16(), 206);
+        assert_eq!(resp.headers.get("content-length").unwrap().as_bytes(), b"2");
+        assert_eq!(
+            resp.headers.get("content-range").unwrap().as_bytes(),
+            b"bytes 0-1/10"
+        );
+
+        // bad range
+        let mut req = gen_req();
+        req.insert_header("Range", "bytes=1-0").unwrap();
+        let mut resp = gen_resp();
+        assert_eq!(RangeType::Invalid, range_header_filter(&req, &mut resp));
+        assert_eq!(resp.status.as_u16(), 416);
+        assert_eq!(resp.headers.get("content-length").unwrap().as_bytes(), b"0");
+        assert_eq!(
+            resp.headers.get("content-range").unwrap().as_bytes(),
+            b"bytes */10"
+        );
+    }
+
+    pub struct RangeBodyFilter {
+        range: RangeType,
+        current: usize,
+    }
+
+    impl RangeBodyFilter {
+        pub fn new() -> Self {
+            RangeBodyFilter {
+                range: RangeType::None,
+                current: 0,
+            }
+        }
+
+        pub fn set(&mut self, range: RangeType) {
+            self.range = range;
+        }
+
+        pub fn filter_body(&mut self, data: Option<Bytes>) -> Option<Bytes> {
+            match &self.range {
+                RangeType::None => data,
+                RangeType::Invalid => None,
+                RangeType::Single(r) => {
+                    let current = self.current;
+                    self.current += data.as_ref().map_or(0, |d| d.len());
+                    data.and_then(|d| Self::filter_range_data(r.start, r.end, current, d))
+                }
+            }
+        }
+
+        fn filter_range_data(
+            start: usize,
+            end: usize,
+            current: usize,
+            data: Bytes,
+        ) -> Option<Bytes> {
+            if current + data.len() < start || current >= end {
+                // if the current data is outside the desired range, just drop the data
+                None
+            } else if current >= start && current + data.len() <= end {
+                // all data is within the slice
+                Some(data)
+            } else {
+                // data:  current........current+data.len()
+                // range: start...........end
+                let slice_start = start.saturating_sub(current);
+                let slice_end = std::cmp::min(data.len(), end - current);
+                Some(data.slice(slice_start..slice_end))
+            }
+        }
+    }
+
+    #[test]
+    fn test_range_body_filter() {
+        let mut body_filter = RangeBodyFilter::new();
+        assert_eq!(body_filter.filter_body(Some("123".into())).unwrap(), "123");
+
+        let mut body_filter = RangeBodyFilter::new();
+        body_filter.set(RangeType::Invalid);
+        assert!(body_filter.filter_body(Some("123".into())).is_none());
+
+        let mut body_filter = RangeBodyFilter::new();
+        body_filter.set(RangeType::new_single(0, 1));
+        assert_eq!(body_filter.filter_body(Some("012".into())).unwrap(), "0");
+        assert!(body_filter.filter_body(Some("345".into())).is_none());
+
+        let mut body_filter = RangeBodyFilter::new();
+        body_filter.set(RangeType::new_single(4, 6));
+        assert!(body_filter.filter_body(Some("012".into())).is_none());
+        assert_eq!(body_filter.filter_body(Some("345".into())).unwrap(), "45");
+        assert!(body_filter.filter_body(Some("678".into())).is_none());
+
+        let mut body_filter = RangeBodyFilter::new();
+        body_filter.set(RangeType::new_single(1, 7));
+        assert_eq!(body_filter.filter_body(Some("012".into())).unwrap(), "12");
+        assert_eq!(body_filter.filter_body(Some("345".into())).unwrap(), "345");
+        assert_eq!(body_filter.filter_body(Some("678".into())).unwrap(), "6");
+    }
+}
+
+// https://datatracker.ietf.org/doc/html/rfc7232
+// Strictly speaking this module is also usable for a web server, not just a proxy
+mod conditional_filter {
+    use super::*;
+    use http::header::*;
+
+    // returns whether a 304 was applied to the response
+    pub fn not_modified_filter(req: &RequestHeader, resp: &mut ResponseHeader) -> bool {
+        // https://datatracker.ietf.org/doc/html/rfc7232#section-4.1
+        // 304 can only validate 200
+        if resp.status != StatusCode::OK {
+            return false;
+        }
+
+        // TODO: If-Match and If-Unmodified-Since
+
+        // https://datatracker.ietf.org/doc/html/rfc7232#section-6
+
+        if let Some(inm) = req.headers.get(IF_NONE_MATCH) {
+            if let Some(etag) = resp.headers.get(ETAG) {
+                if validate_etag(inm.as_bytes(), etag.as_bytes()) {
+                    to_304(resp);
+                    return true;
+                }
+            }
+            // MUST ignore If-Modified-Since if the request contains an If-None-Match header
+            return false;
+        }
+
+        // TODO: GET/HEAD only https://datatracker.ietf.org/doc/html/rfc7232#section-3.3
+        if let Some(since) = req.headers.get(IF_MODIFIED_SINCE) {
+            if let Some(last) = resp.headers.get(LAST_MODIFIED) {
+                if test_not_modified(since.as_bytes(), last.as_bytes()) {
+                    to_304(resp);
+                    return true;
+                }
+            }
+        }
+        false
+    }
+
+    fn validate_etag(input_etag: &[u8], target_etag: &[u8]) -> bool {
+        // https://datatracker.ietf.org/doc/html/rfc7232#section-3.2 unsafe methods only
+        if input_etag == b"*" {
+            return true;
+        }
+        // TODO: etag validation: https://datatracker.ietf.org/doc/html/rfc7232#section-2.3.2
+        input_etag == target_etag
+    }
+
+    fn test_not_modified(input_time: &[u8], last_modified_time: &[u8]) -> bool {
+        // TODO: http-date comparison: https://datatracker.ietf.org/doc/html/rfc7232#section-2.2.2
+        input_time == last_modified_time
+    }
+
+    fn to_304(resp: &mut ResponseHeader) {
+        // https://datatracker.ietf.org/doc/html/rfc7232#section-4.1
+        // XXX: https://datatracker.ietf.org/doc/html/rfc7230#section-3.3.2
+        // "A server may send content-length in 304", but no common web server does it
+        // So we drop both content-length and content-type for consistency/less surprise
+        resp.set_status(StatusCode::NOT_MODIFIED).unwrap();
+        resp.remove_header(&CONTENT_LENGTH);
+        resp.remove_header(&CONTENT_TYPE);
+    }
+}
+
+// a state machine for the proxy logic to tell when to use cache in the case of
+// miss/revalidation/error.
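+//
+// Typical transitions, driven by next_http_task() below:
+// hit: CacheHeader -> CacheBody -> Done;
+// admitted miss: CacheHeaderMiss -> CacheBodyMiss -> Done.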
+#[derive(Debug)]
+pub(crate) enum ServeFromCache {
+    Off,             // not using cache
+    CacheHeader,     // should serve the cache header
+    CacheHeaderOnly, // should serve the cache header only, no body
+    CacheBody,       // should serve the cache body
+    CacheHeaderMiss, // should serve the cache header but the upstream response should be admitted to cache
+    CacheBodyMiss,   // should serve the cache body but the upstream response should be admitted to cache
+    Done,            // finished serving from cache
+}
+
+impl ServeFromCache {
+    pub fn new() -> Self {
+        Self::Off
+    }
+
+    pub fn is_on(&self) -> bool {
+        !matches!(self, Self::Off)
+    }
+
+    pub fn is_miss(&self) -> bool {
+        matches!(self, Self::CacheHeaderMiss | Self::CacheBodyMiss)
+    }
+
+    pub fn is_miss_header(&self) -> bool {
+        matches!(self, Self::CacheHeaderMiss)
+    }
+
+    pub fn is_miss_body(&self) -> bool {
+        matches!(self, Self::CacheBodyMiss)
+    }
+
+    pub fn should_discard_upstream(&self) -> bool {
+        self.is_on() && !self.is_miss()
+    }
+
+    pub fn should_send_to_downstream(&self) -> bool {
+        !self.is_on()
+    }
+
+    pub fn enable(&mut self) {
+        *self = Self::CacheHeader;
+    }
+
+    pub fn enable_miss(&mut self) {
+        if !self.is_on() {
+            *self = Self::CacheHeaderMiss;
+        }
+    }
+
+    pub fn enable_header_only(&mut self) {
+        match self {
+            Self::CacheBody => *self = Self::Done, // TODO: make sure no body is read yet
+            _ => *self = Self::CacheHeaderOnly,
+        }
+    }
+
+    // This function is (best effort) cancel-safe to be used in select
+    pub async fn next_http_task(&mut self, cache: &mut HttpCache) -> Result<HttpTask> {
+        if !cache.enabled() {
+            // Cache is disabled due to an internal error
+            // TODO: if nothing is sent to the eyeball yet, figure out a way to recover by
+            // fetching from upstream
+            return Error::e_explain(InternalError, "Cache disabled");
+        }
+        match self {
+            Self::Off => panic!("ProxyUseCache not enabled"),
+            Self::CacheHeader => {
+                *self = Self::CacheBody;
+                Ok(HttpTask::Header(cache_hit_header(cache), false)) // false for now
+            }
+            Self::CacheHeaderMiss => {
+                *self = Self::CacheBodyMiss;
+                Ok(HttpTask::Header(cache_hit_header(cache), false)) // false for now
+            }
+            Self::CacheHeaderOnly => {
+                *self = Self::Done;
+                Ok(HttpTask::Header(cache_hit_header(cache), true))
+            }
+            Self::CacheBody => {
+                if let Some(b) = cache.hit_handler().read_body().await? {
+                    Ok(HttpTask::Body(Some(b), false)) // false for now
+                } else {
+                    *self = Self::Done;
+                    Ok(HttpTask::Done)
+                }
+            }
+            Self::CacheBodyMiss => {
+                // safety: callers of enable_miss() only call it if the async_body_reader exists
+                if let Some(b) = cache.miss_body_reader().unwrap().read_body().await? {
+                    Ok(HttpTask::Body(Some(b), false)) // false for now
+                } else {
+                    *self = Self::Done;
+                    Ok(HttpTask::Done)
+                }
+            }
+            Self::Done => Ok(HttpTask::Done),
+        }
+    }
+}
+
+/* Downstream revalidation, only needed when cache is on because otherwise the origin
+ * will handle it */
+pub(crate) fn downstream_response_conditional_filter(
+    use_cache: &mut ServeFromCache,
+    req: &RequestHeader,
+    resp: &mut ResponseHeader,
+) {
+    // TODO: range
+    let header_only = conditional_filter::not_modified_filter(req, resp)
+        || req.method == http::method::Method::HEAD;
+    if header_only {
+        if use_cache.is_on() {
+            // tell cache to stop after yielding the header
+            use_cache.enable_header_only();
+        } else {
+            // headers only during cache miss, upstream should continue sending the
+            // body to cache; `session` will ignore the body automatically because
+            // of the signature of the `header` (304)
+            // TODO: we should drop the body before/within this filter so that body
+            // filters only run on data downstream sees
+        }
+    }
+}
diff --git a/pingora-proxy/src/proxy_common.rs b/pingora-proxy/src/proxy_common.rs
new file mode 100644
index 0000000..d7d97b3
--- /dev/null
+++ b/pingora-proxy/src/proxy_common.rs
@@ -0,0 +1,93 @@
+/// Possible downstream states during request multiplexing
+#[derive(Debug, Clone, Copy)]
+pub(crate) enum DownstreamStateMachine {
+    /// more request (body) to read
+    Reading,
+    /// no more data to read
+    ReadingFinished,
+    /// downstream is already errored or closed
+    Errored,
+}
+
+#[allow(clippy::wrong_self_convention)]
+impl DownstreamStateMachine {
+    pub fn new(finished: bool) -> Self {
+        if finished {
+            Self::ReadingFinished
+        } else {
+            Self::Reading
+        }
+    }
+
+    // Can call read() to read more data or wait on closing
+    pub fn can_poll(&self) -> bool {
+        !matches!(self, Self::Errored)
+    }
+
+    pub fn is_reading(&self) -> bool {
+        matches!(self, Self::Reading)
+    }
+
+    pub fn is_done(&self) -> bool {
+        !matches!(self, Self::Reading)
+    }
+
+    pub fn is_errored(&self) -> bool {
+        matches!(self, Self::Errored)
+    }
+
+    /// Move the state machine to the ReadingFinished state if `set` is true
+    pub fn maybe_finished(&mut self, set: bool) {
+        if set {
+            *self = Self::ReadingFinished
+        }
+    }
+
+    pub fn to_errored(&mut self) {
+        *self = Self::Errored
+    }
+}
+
+/// Possible upstream states during request multiplexing
+#[derive(Debug, Clone, Copy)]
+pub(crate) struct ResponseStateMachine {
+    upstream_response_done: bool,
+    cached_response_done: bool,
+}
+
+impl ResponseStateMachine {
+    pub fn new() -> Self {
+        ResponseStateMachine {
+            upstream_response_done: false,
+            cached_response_done: true, // no cached response by default
+        }
+    }
+
+    pub fn is_done(&self) -> bool {
+        self.upstream_response_done && self.cached_response_done
+    }
+
+    pub fn upstream_done(&self) -> bool {
+        self.upstream_response_done
+    }
+
+    pub fn cached_done(&self) -> bool {
+        self.cached_response_done
+    }
+
+    pub fn enable_cached_response(&mut self) {
+        self.cached_response_done = false;
+    }
+
+    pub fn maybe_set_upstream_done(&mut self, done: bool) {
+        if done {
+            self.upstream_response_done = true;
+        }
+    }
+
+    pub fn maybe_set_cache_done(&mut self, done: bool) {
+        if done {
+            self.cached_response_done = true;
+        }
+    }
+}
diff --git a/pingora-proxy/src/proxy_h1.rs b/pingora-proxy/src/proxy_h1.rs
new file mode 100644
index 0000000..77bc50b
--- /dev/null
+++ b/pingora-proxy/src/proxy_h1.rs
@@ -0,0 +1,596 @@
+// Copyright 2024 Cloudflare, Inc.
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use super::*; +use crate::proxy_cache::{range_filter::RangeBodyFilter, ServeFromCache}; +use crate::proxy_common::*; +use http::Version; + +impl<SV> HttpProxy<SV> { + pub(crate) async fn proxy_1to1( + &self, + session: &mut Session, + client_session: &mut HttpSessionV1, + peer: &HttpPeer, + ctx: &mut SV::CTX, + ) -> (bool, bool, Option<Box<Error>>) + where + SV: ProxyHttp + Send + Sync, + SV::CTX: Send + Sync, + { + client_session.read_timeout = peer.options.read_timeout; + client_session.write_timeout = peer.options.write_timeout; + + // phase 2 send to upstream + + let mut req = session.req_header().clone(); + + // Convert HTTP2 headers to H1 + if req.version == Version::HTTP_2 { + req.set_version(Version::HTTP_11); + // if client has body but has no content length, add chunked encoding + // https://datatracker.ietf.org/doc/html/rfc9112#name-message-body + // "The presence of a message body in a request is signaled by a Content-Length or Transfer-Encoding header field." + if !session.is_body_empty() && session.get_header(header::CONTENT_LENGTH).is_none() { + req.insert_header(header::TRANSFER_ENCODING, "chunked") + .unwrap(); + } + if session.get_header(header::HOST).is_none() { + // H2 is required to set :authority, but no necessarily header + // most H1 server expect host header, so convert + let host = req.uri.authority().map_or("", |a| a.as_str()).to_owned(); + req.insert_header(header::HOST, host).unwrap(); + } + // TODO: Add keepalive header for connection reuse, but this is not required per RFC + } + + if session.cache.enabled() { + if let Err(e) = pingora_cache::filters::upstream::request_filter( + &mut req, + session.cache.maybe_cache_meta(), + ) { + session.cache.disable(NoCacheReason::InternalError); + warn!("cache upstream filter error {}, disabling cache", e); + } + } + + match self + .inner + .upstream_request_filter(session, &mut req, ctx) + .await + { + Ok(_) => { /* continue */ } + Err(e) => { + return (false, true, Some(e)); + } + } + + session.upstream_compression.request_filter(&req); + + debug!("Sending header to upstream {:?}", req); + + match client_session.write_request_header(Box::new(req)).await { + Ok(_) => { /* Continue */ } + Err(e) => { + return (false, false, Some(e.into_up())); + } + } + + let (tx_upstream, rx_upstream) = mpsc::channel::<HttpTask>(TASK_BUFFER_SIZE); + let (tx_downstream, rx_downstream) = mpsc::channel::<HttpTask>(TASK_BUFFER_SIZE); + + session.as_mut().enable_retry_buffering(); + + // start bi-directional streaming + let ret = tokio::try_join!( + self.proxy_handle_downstream(session, tx_downstream, rx_upstream, ctx), + self.proxy_handle_upstream(client_session, tx_upstream, rx_downstream), + ); + + match ret { + Ok((_first, _second)) => { + client_session.respect_keepalive(); + (true, true, None) + } + Err(e) => (false, false, Some(e)), + } + } + + pub(crate) async fn proxy_to_h1_upstream( + &self, + session: &mut Session, + client_session: &mut HttpSessionV1, + reused: bool, + 
+        peer: &HttpPeer,
+        ctx: &mut SV::CTX,
+    ) -> (bool, bool, Option<Box<Error>>)
+    // (reuse_server, reuse_client, error)
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        if let Err(e) = self
+            .inner
+            .connected_to_upstream(
+                session,
+                reused,
+                peer,
+                client_session.id(),
+                Some(client_session.digest()),
+                ctx,
+            )
+            .await
+        {
+            return (false, false, Some(e));
+        }
+
+        let (server_session_reuse, client_session_reuse, error) =
+            self.proxy_1to1(session, client_session, peer, ctx).await;
+
+        (server_session_reuse, client_session_reuse, error)
+    }
+
+    async fn proxy_handle_upstream(
+        &self,
+        client_session: &mut HttpSessionV1,
+        tx: mpsc::Sender<HttpTask>,
+        mut rx: mpsc::Receiver<HttpTask>,
+    ) -> Result<()>
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        let mut request_done = false;
+        let mut response_done = false;
+
+        /* duplex mode, wait for either to complete */
+        while !request_done || !response_done {
+            tokio::select! {
+                res = client_session.read_response_task(), if !response_done => {
+                    match res {
+                        Ok(task) => {
+                            response_done = task.is_end();
+                            let result = tx.send(task)
+                                .await.or_err(
+                                    InternalError,
+                                    "Failed to send upstream header to pipe");
+                            // If the request is upgraded, the downstream pipe can exit early
+                            // when the downstream connection is closed.
+                            // In that case, this function should ignore that the pipe is closed,
+                            // so that it can read the remaining events from rx, including the
+                            // closure, and then exit.
+                            if result.is_err() && !client_session.is_upgrade_req() {
+                                return result;
+                            }
+                        },
+                        Err(e) => {
+                            // Push the error to downstream and then quit
+                            // Don't care if send fails: downstream already gone
+                            let _ = tx.send(HttpTask::Failed(e.into_up())).await;
+                            // Downstream should consume all remaining data and handle the error
+                            return Ok(())
+                        }
+                    }
+                },
+
+                body = rx.recv(), if !request_done => {
+                    request_done = send_body_to1(client_session, body).await?;
+                    // An upgraded request is terminated when either side is done
+                    if request_done && client_session.is_upgrade_req() {
+                        response_done = true;
+                    }
+                },
+
+                else => {
+                    // this shouldn't be reached as the while loop would already exit
+                    break;
+                }
+            }
+        }
+
+        Ok(())
+    }
+
+    // TODO: use this function to replace bidirection_1to2()
+    async fn proxy_handle_downstream(
+        &self,
+        session: &mut Session,
+        tx: mpsc::Sender<HttpTask>,
+        mut rx: mpsc::Receiver<HttpTask>,
+        ctx: &mut SV::CTX,
+    ) -> Result<()>
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        let mut downstream_state = DownstreamStateMachine::new(session.as_mut().is_body_done());
+
+        let buffer = session.as_ref().get_retry_buffer();
+
+        // for retry: send the buffer if it exists, or signal body end if the body is empty
+        if buffer.is_some() || session.as_mut().is_body_empty() {
+            let send_permit = tx
+                .reserve()
+                .await
+                .or_err(InternalError, "reserving body pipe")?;
+            send_body_to_pipe(buffer, downstream_state.is_done(), send_permit).await;
+        }
+
+        let mut response_state = ResponseStateMachine::new();
+
+        // these two below can be wrapped into an internal ctx
+        // use cache when upstream revalidates (or TODO: error)
+        let mut serve_from_cache = proxy_cache::ServeFromCache::new();
+        let mut range_body_filter = proxy_cache::range_filter::RangeBodyFilter::new();
+
+        /* Duplex mode without caching:
+         * Read the body from downstream while reading the response from upstream.
+         * If the response is done, only read the body from downstream.
+         * If the request is done, read the response from upstream while idling downstream (to close quickly).
+         * If both are done, quit the loop.
+         *
+         * With caching but without partial read support:
+         * Similar to the above; cache admission writes happen when the data is written to downstream.
+         *
+         * With caching + partial read support:
+         * A. Read the upstream response and write it to cache.
+         * B. Read data from cache and send it to downstream.
+         * If B fails (usually downstream closed), continue A.
+         * If A fails, exit with error.
+         * If both are done, quit the loop.
+         * Usually there is no request body to read for a cacheable request.
+         */
+        while !downstream_state.is_done() || !response_state.is_done() {
+            // reserve tx capacity ahead to avoid deadlock, see below
+
+            let send_permit = tx
+                .try_reserve()
+                .or_err(InternalError, "try_reserve() body pipe for upstream");
+
+            tokio::select! {
+                // only try to send to the pipe if there is capacity, to avoid deadlock.
+                // Otherwise deadlock could happen if both upstream and downstream are blocked
+                // on sending to their corresponding pipes which are both full.
+                body = session.downstream_session.read_body_or_idle(downstream_state.is_done()),
+                    if downstream_state.can_poll() && send_permit.is_ok() => {
+
+                    debug!("downstream event");
+                    let body = match body {
+                        Ok(b) => b,
+                        Err(e) => {
+                            if serve_from_cache.is_miss() {
+                                // ignore the downstream error so that upstream can continue to write to cache
+                                downstream_state.to_errored();
+                                warn!(
+                                    "Downstream Error ignored during caching: {}, {}",
+                                    e,
+                                    self.inner.request_summary(session, ctx)
+                                );
+                                continue;
+                            } else {
+                                return Err(e.into_down());
+                            }
+                        }
+                    };
+                    // If the request is websocket, a `None` body means the request is closed.
+                    // Set the response to be done as well so that the request completes normally.
+                    if body.is_none() && session.is_upgrade_req() {
+                        response_state.maybe_set_upstream_done(true);
+                    }
+                    // TODO: consider just draining this if serve_from_cache is set
+                    let request_done = send_body_to_pipe(
+                        body,
+                        session.is_body_done(),
+                        send_permit.unwrap(), // safe because we checked is_ok()
+                    )
+                    .await;
+                    downstream_state.maybe_finished(request_done);
+                },
+
+                _ = tx.reserve(), if downstream_state.is_reading() && send_permit.is_err() => {
+                    debug!("waiting for permit {send_permit:?}");
+                    /* No permit, wait on more capacity to avoid starving.
+                     * Otherwise this select only blocks on rx, which might send no data
+                     * before the entire body is uploaded.
+                     * Once more capacity arrives we just loop back.
+                     */
+                },
+
+                task = rx.recv(), if !response_state.upstream_done() => {
+                    debug!("upstream event: {:?}", task);
+                    if let Some(t) = task {
+                        if serve_from_cache.should_discard_upstream() {
+                            // just drain; do we need to do anything else?
+                            continue;
+                        }
+                        // pull as many tasks as we can
+                        let mut tasks = Vec::with_capacity(TASK_BUFFER_SIZE);
+                        tasks.push(t);
+                        while let Some(maybe_task) = rx.recv().now_or_never() {
+                            debug!("upstream event now: {:?}", maybe_task);
+                            if let Some(t) = maybe_task {
+                                tasks.push(t);
+                            } else {
+                                break; // upstream closed
+                            }
+                        }
+
+                        /* run filters before sending to downstream */
+                        let mut filtered_tasks = Vec::with_capacity(TASK_BUFFER_SIZE);
+                        for mut t in tasks {
+                            if self.revalidate_or_stale(session, &mut t, ctx).await {
+                                serve_from_cache.enable();
+                                response_state.enable_cached_response();
+                                // skip downstream filtering entirely as the 304 will not be sent
+                                break;
+                            }
+                            session.upstream_compression.response_filter(&mut t);
+                            let task = self.h1_response_filter(session, t, ctx,
+                                &mut serve_from_cache,
+                                &mut range_body_filter, false).await?;
+                            if serve_from_cache.is_miss_header() {
+                                response_state.enable_cached_response();
+                            }
+                            // check the error and abort here;
+                            // otherwise the error is surfaced via write_response_tasks()
+                            if !serve_from_cache.should_send_to_downstream() {
+                                if let HttpTask::Failed(e) = task {
+                                    return Err(e);
+                                }
+                            }
+                            filtered_tasks.push(task);
+                        }
+
+                        if !serve_from_cache.should_send_to_downstream() {
+                            // TODO: need to derive response_done from filtered_tasks in case downstream failed already
+                            continue;
+                        }
+
+                        // send to downstream
+                        let response_done = session.write_response_tasks(filtered_tasks).await?;
+                        response_state.maybe_set_upstream_done(response_done);
+                        // an unsuccessful upgrade response may force the request done
+                        downstream_state.maybe_finished(session.is_body_done());
+                    } else {
+                        debug!("empty upstream event");
+                        response_state.maybe_set_upstream_done(true);
+                    }
+                },
+
+                task = serve_from_cache.next_http_task(&mut session.cache),
+                    if !response_state.cached_done() && !downstream_state.is_errored() && serve_from_cache.is_on() => {
+
+                    let task = self.h1_response_filter(session, task?, ctx,
+                        &mut serve_from_cache,
+                        &mut range_body_filter, true).await?;
+                    debug!("serve_from_cache task {task:?}");
+
+                    match session.write_response_tasks(vec![task]).await {
+                        Ok(b) => response_state.maybe_set_cache_done(b),
+                        Err(e) => if serve_from_cache.is_miss() {
+                            // give up writing to downstream but wait for the upstream cache write to finish
+                            downstream_state.to_errored();
+                            response_state.maybe_set_cache_done(true);
+                            warn!(
+                                "Downstream Error ignored during caching: {}, {}",
+                                e,
+                                self.inner.request_summary(session, ctx)
+                            );
+                            continue;
+                        } else {
+                            return Err(e);
+                        }
+                    }
+                    if response_state.cached_done() {
+                        if let Err(e) = session.cache.finish_hit_handler().await {
+                            warn!("Error during finish_hit_handler: {}", e);
+                        }
+                    }
+                }
+
+                else => {
+                    break;
+                }
+            }
+        }
+
+        match session.as_mut().finish_body().await {
+            Ok(_) => {
+                debug!("finished sending body to downstream");
+            }
+            Err(e) => {
+                error!("Error finishing sending body to downstream: {}", e);
+                // TODO: don't do downstream keepalive
+            }
+        }
+        Ok(())
+    }
+
+    async fn h1_response_filter(
+        &self,
+        session: &mut Session,
+        mut task: HttpTask,
+        ctx: &mut SV::CTX,
+        serve_from_cache: &mut ServeFromCache,
+        range_body_filter: &mut RangeBodyFilter,
+        from_cache: bool, // whether the task is from cache already
+    ) -> Result<HttpTask>
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        // skip caching if already served from cache
+        if !from_cache {
+            self.upstream_filter(session, &mut task, ctx);
+
+            // cache the original response before any downstream transformation;
+            // requests that bypassed cache still need
+            // to run filters to see if the response has become cacheable
+            if session.cache.enabled() || session.cache.bypassing() {
+                if let Err(e) = self
+                    .cache_http_task(session, &task, ctx, serve_from_cache)
+                    .await
+                {
+                    session.cache.disable(NoCacheReason::StorageError);
+                    if serve_from_cache.is_miss_body() {
+                        // if the response body is being streamed into cache during a miss but the
+                        // write fails, the entire request has to be given up
+                        return Err(e);
+                    } else {
+                        // otherwise, continue processing the response
+                        warn!(
+                            "Failed to cache response: {}, {}",
+                            e,
+                            self.inner.request_summary(session, ctx)
+                        );
+                    }
+                }
+            }
+
+            if !serve_from_cache.should_send_to_downstream() {
+                return Ok(task);
+            }
+        } // else: cached/local response, no need to trigger upstream filters and caching
+
+        match task {
+            HttpTask::Header(mut header, end) => {
+                let req = session.req_header();
+
+                /* Downstream revalidation/range, only needed when cache is on because otherwise
+                 * origin will handle it */
+                // TODO: if cache is disabled during the response phase, we should still do the filter
+                if session.cache.enabled() {
+                    proxy_cache::downstream_response_conditional_filter(
+                        serve_from_cache,
+                        req,
+                        &mut header,
+                    );
+                    if !session.ignore_downstream_range {
+                        let range_type =
+                            proxy_cache::range_filter::range_header_filter(req, &mut header);
+                        range_body_filter.set(range_type);
+                    }
+                }
+
+                /* Convert an HTTP 1.0 style response to chunked encoding so that we don't
+                 * have to close the downstream connection */
+                // these status codes / methods cannot have a body, so no need to add chunked encoding
+                let no_body = req.method == http::method::Method::HEAD
+                    || matches!(header.status.as_u16(), 204 | 304);
+                if !no_body
+                    && !header.status.is_informational()
+                    && header
+                        .headers
+                        .get(http::header::TRANSFER_ENCODING)
+                        .is_none()
+                    && header.headers.get(http::header::CONTENT_LENGTH).is_none()
+                    && !end
+                {
+                    header.insert_header(http::header::TRANSFER_ENCODING, "chunked")?;
+                }
+
+                match self.inner.response_filter(session, &mut header, ctx).await {
+                    Ok(_) => Ok(HttpTask::Header(header, end)),
+                    Err(e) => Err(e),
+                }
+            }
+            HttpTask::Body(data, end) => {
+                let data = range_body_filter.filter_body(data);
+                if let Some(duration) = self.inner.response_body_filter(session, &data, ctx)? {
+                    trace!("delaying response for {:?}", duration);
+                    time::sleep(duration).await;
+                }
+                Ok(HttpTask::Body(data, end))
+            }
+            HttpTask::Trailer(h) => Ok(HttpTask::Trailer(h)), // no h1 trailer filter yet
+            HttpTask::Done => Ok(task),
+            HttpTask::Failed(_) => Ok(task), // do nothing, just pass the error down
+        }
+    }
+}
+
+// TODO: use this function to replace send_body_to2
+pub(crate) async fn send_body_to_pipe(
+    data: Option<Bytes>,
+    end_of_body: bool,
+    tx: mpsc::Permit<'_, HttpTask>,
+) -> bool {
+    match data {
+        Some(data) => {
+            debug!("Read {} bytes body from downstream", data.len());
+            if data.is_empty() && !end_of_body {
+                /* it is normal to get 0 bytes because of multi-chunk parsing;
+                 * don't write 0 bytes to downstream since it will be
+                 * misread as the terminating chunk */
+                return false;
+            }
+            tx.send(HttpTask::Body(Some(data), end_of_body));
+            end_of_body
+        }
+        None => {
+            tx.send(HttpTask::Body(None, true));
+            true
+        }
+    }
+}
+
+pub(crate) async fn send_body_to1(
+    client_session: &mut HttpSessionV1,
+    recv_task: Option<HttpTask>,
+) -> Result<bool> {
+    let body_done;
+
+    if let Some(task) = recv_task {
+        match task {
+            HttpTask::Body(data, end) => {
+                body_done = end;
+                if let Some(d) = data {
+                    let m = client_session.write_body(&d).await;
+                    match m {
+                        Ok(m) => match m {
+                            Some(n) => {
+                                debug!("Wrote {} bytes body to upstream", n);
+                            }
+                            None => {
+                                warn!("Upstream body is already finished. Nothing to write");
+                            }
+                        },
+                        Err(e) => {
+                            return e.into_up().into_err();
+                        }
+                    }
+                }
+            }
+            _ => {
+                // should never happen, the sender only sends body tasks
+                warn!("Unexpected task sent to upstream");
+                body_done = true;
+            }
+        }
+    } else {
+        // sender dropped
+        body_done = true;
+    }
+
+    if body_done {
+        match client_session.finish_body().await {
+            Ok(_) => {
+                debug!("finished sending body to upstream");
+                Ok(true)
+            }
+            Err(e) => e.into_up().into_err(),
+        }
+    } else {
+        Ok(false)
+    }
+}
diff --git a/pingora-proxy/src/proxy_h2.rs b/pingora-proxy/src/proxy_h2.rs
new file mode 100644
index 0000000..87bb895
--- /dev/null
+++ b/pingora-proxy/src/proxy_h2.rs
@@ -0,0 +1,616 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
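+//
+// This module mirrors proxy_h1.rs for HTTP/2 upstreams: it converts the
+// downstream request into an h2 stream, then drives the same duplex loop of
+// downstream body reads and upstream response tasks (see bidirection_1to2()
+// below), including the serve-from-cache and range filter handling.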
+
+use super::*;
+use crate::proxy_cache::{range_filter::RangeBodyFilter, ServeFromCache};
+use crate::proxy_common::*;
+use pingora_core::protocols::http::v2::client::{write_body, Http2Session};
+
+// add scheme and authority as required by the h2 lib
+fn update_h2_scheme_authority(header: &mut http::request::Parts, raw_host: &[u8]) -> Result<()> {
+    let authority = if let Ok(s) = std::str::from_utf8(raw_host) {
+        if s.starts_with('[') {
+            // don't mess with ipv6 host
+            s
+        } else if let Some(colon) = s.find(':') {
+            if s.len() == colon + 1 {
+                // colon is the last char, ignore
+                s
+            } else if let Some(another_colon) = s[colon + 1..].find(':') {
+                // try to get rid of extra port numbers
+                &s[..colon + 1 + another_colon]
+            } else {
+                s
+            }
+        } else {
+            s
+        }
+    } else {
+        return Error::e_explain(
+            InvalidHTTPHeader,
+            format!("invalid authority from host {:?}", raw_host),
+        );
+    };
+
+    let uri = http::uri::Builder::new()
+        .scheme("https")
+        .authority(authority)
+        .path_and_query(header.uri.path_and_query().as_ref().unwrap().as_str())
+        .build();
+    match uri {
+        Ok(uri) => {
+            header.uri = uri;
+            Ok(())
+        }
+        Err(_) => Error::e_explain(
+            InvalidHTTPHeader,
+            format!("invalid authority from host {}", authority),
+        ),
+    }
+}
+
+impl<SV> HttpProxy<SV> {
+    pub(crate) async fn proxy_1to2(
+        &self,
+        session: &mut Session,
+        client_session: &mut Http2Session,
+        peer: &HttpPeer,
+        ctx: &mut SV::CTX,
+    ) -> (bool, Option<Box<Error>>)
+    // (reuse_server, error)
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        let mut req = session.req_header().clone();
+
+        if req.version != Version::HTTP_2 {
+            /* remove H1 specific headers */
+            // https://github.com/hyperium/h2/blob/d3b9f1e36aadc1a7a6804e2f8e86d3fe4a244b4f/src/proto/streams/send.rs#L72
+            req.remove_header(&http::header::TRANSFER_ENCODING);
+            req.remove_header(&http::header::CONNECTION);
+            req.remove_header(&http::header::UPGRADE);
+            req.remove_header("keep-alive");
+            req.remove_header("proxy-connection");
+        }
+
+        /* turn it into h2 */
+        req.set_version(Version::HTTP_2);
+
+        if session.cache.enabled() {
+            if let Err(e) = pingora_cache::filters::upstream::request_filter(
+                &mut req,
+                session.cache.maybe_cache_meta(),
+            ) {
+                session.cache.disable(NoCacheReason::InternalError);
+                warn!("cache upstream filter error {}, disabling cache", e);
+            }
+        }
+
+        match self
+            .inner
+            .upstream_request_filter(session, &mut req, ctx)
+            .await
+        {
+            Ok(_) => { /* continue */ }
+            Err(e) => {
+                return (false, Some(e));
+            }
+        }
+
+        // Remove the H1 `Host` header and save it in order to add it to :authority.
+        // We do this because certain H2 servers expect the request not to have a Host header.
+        // The `Host` is removed after the upstream filters above for 2 reasons:
+        // 1. there is no API to change the :authority header
+        // 2. otherwise the filter code would need to be aware of the Host vs :authority mapping
+        //    across HTTP versions
+        let host = req.remove_header(&http::header::HOST);
+
+        session.upstream_compression.request_filter(&req);
+        let body_empty = session.as_mut().is_body_empty();
+
+        let mut req: http::request::Parts = req.into();
+
+        // H2 requires authority to be set, so copy that from the H1 host if that is set
+        if let Some(host) = host {
+            if let Err(e) = update_h2_scheme_authority(&mut req, host.as_bytes()) {
+                return (false, Some(e));
+            }
+        }
+
+        debug!("Request to h2: {:?}", req);
+
+        // don't send END_STREAM on HEADERS for no_header_eos
+        let send_header_eos = !peer.options.no_header_eos && body_empty;
+
+        let req = Box::new(RequestHeader::from(req));
+        match client_session.write_request_header(req, send_header_eos) {
+            Ok(v) => v,
+            Err(e) => {
+                return (false, Some(e.into_up()));
+            }
+        };
+
+        // send END_STREAM on an empty DATA frame for no_header_eos
+        if peer.options.no_header_eos && body_empty {
+            match client_session.write_request_body(Bytes::new(), true) {
+                Ok(()) => debug!("sent empty DATA frame to h2"),
+                Err(e) => {
+                    return (false, Some(e.into_up()));
+                }
+            };
+        }
+
+        client_session.read_timeout = peer.options.read_timeout;
+
+        // take the body writer out of the client for easy duplex
+        let mut client_body = client_session
+            .take_request_body_writer()
+            .expect("already sent request header");
+
+        let (tx, rx) = mpsc::channel::<HttpTask>(TASK_BUFFER_SIZE);
+
+        session.as_mut().enable_retry_buffering();
+
+        /* read the downstream body and the upstream response at the same time */
+        let ret = tokio::try_join!(
+            self.bidirection_1to2(session, &mut client_body, rx, ctx),
+            pipe_2to1_response(client_session, tx)
+        );
+
+        match ret {
+            Ok((_first, _second)) => (true, None),
+            Err(e) => (false, Some(e)),
+        }
+    }
+
+    pub(crate) async fn proxy_to_h2_upstream(
+        &self,
+        session: &mut Session,
+        client_session: &mut Http2Session,
+        reused: bool,
+        peer: &HttpPeer,
+        ctx: &mut SV::CTX,
+    ) -> (bool, Option<Box<Error>>)
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        if let Err(e) = self
+            .inner
+            .connected_to_upstream(
+                session,
+                reused,
+                peer,
+                client_session.fd(),
+                client_session.digest(),
+                ctx,
+            )
+            .await
+        {
+            return (false, Some(e));
+        }
+
+        let (server_session_reuse, error) =
+            self.proxy_1to2(session, client_session, peer, ctx).await;
+
+        (server_session_reuse, error)
+    }
+
+    async fn bidirection_1to2(
+        &self,
+        session: &mut Session,
+        client_body: &mut h2::SendStream<bytes::Bytes>,
+        mut rx: mpsc::Receiver<HttpTask>,
+        ctx: &mut SV::CTX,
+    ) -> Result<()>
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        let mut downstream_state = DownstreamStateMachine::new(session.as_mut().is_body_done());
+
+        // for retry: send the buffer if it exists
+        if let Some(buffer) = session.as_mut().get_retry_buffer() {
+            send_body_to2(Ok(Some(buffer)), downstream_state.is_done(), client_body)?;
+        }
+
+        let mut response_state = ResponseStateMachine::new();
+
+        // these two below can be wrapped into an internal ctx
+        // use cache when upstream revalidates (or TODO: error)
+        let mut serve_from_cache = ServeFromCache::new();
+        let mut range_body_filter = proxy_cache::range_filter::RangeBodyFilter::new();
+
+        /* duplex mode,
+         * see the same function for h1 for more comments
+         */
+        while !downstream_state.is_done() || !response_state.is_done() {
+            // Similar logic in h1 needs to reserve capacity first to avoid deadlock,
+            // but we don't need to do the same here because the h2 client_body pipe is
+            // unbounded (it never blocks)
+            tokio::select! {
+                // NOTE: cannot avoid this copy since h2 owns the buf
+                body = session.downstream_session.read_body_or_idle(downstream_state.is_done()), if downstream_state.can_poll() => {
+                    debug!("downstream event");
+                    let body = match body {
+                        Ok(b) => b,
+                        Err(e) => {
+                            if serve_from_cache.is_miss() {
+                                // ignore the downstream error so that upstream can continue to write to cache
+                                downstream_state.to_errored();
+                                warn!(
+                                    "Downstream Error ignored during caching: {}, {}",
+                                    e,
+                                    self.inner.request_summary(session, ctx)
+                                );
+                                continue;
+                            } else {
+                                return Err(e.into_down());
+                            }
+                        }
+                    };
+                    let request_done = send_body_to2(Ok(body), session.is_body_done(), client_body)?;
+                    downstream_state.maybe_finished(request_done);
+                },
+
+                task = rx.recv(), if !response_state.upstream_done() => {
+                    if let Some(t) = task {
+                        debug!("upstream event: {:?}", t);
+                        if serve_from_cache.should_discard_upstream() {
+                            // just drain; do we need to do anything else?
+                            continue;
+                        }
+                        // pull as many tasks as we can
+                        let mut tasks = Vec::with_capacity(TASK_BUFFER_SIZE);
+                        tasks.push(t);
+                        while let Some(maybe_task) = rx.recv().now_or_never() {
+                            if let Some(t) = maybe_task {
+                                tasks.push(t);
+                            } else {
+                                break
+                            }
+                        }
+
+                        /* run filters before sending to downstream */
+                        let mut filtered_tasks = Vec::with_capacity(TASK_BUFFER_SIZE);
+                        for mut t in tasks {
+                            if self.revalidate_or_stale(session, &mut t, ctx).await {
+                                serve_from_cache.enable();
+                                response_state.enable_cached_response();
+                                // skip downstream filtering entirely as the 304 will not be sent
+                                break;
+                            }
+                            session.upstream_compression.response_filter(&mut t);
+                            // check the error and abort here;
+                            // otherwise the error is surfaced via write_response_tasks()
+                            if !serve_from_cache.should_send_to_downstream() {
+                                if let HttpTask::Failed(e) = t {
+                                    return Err(e);
+                                }
+                            }
+                            filtered_tasks.push(
+                                self.h2_response_filter(session, t, ctx,
+                                    &mut serve_from_cache,
+                                    &mut range_body_filter, false).await?);
+                            if serve_from_cache.is_miss_header() {
+                                response_state.enable_cached_response();
+                            }
+                        }
+
+                        if !serve_from_cache.should_send_to_downstream() {
+                            // TODO: need to derive response_done from filtered_tasks in case downstream failed already
+                            continue;
+                        }
+
+                        let response_done = session.write_response_tasks(filtered_tasks).await?;
+                        response_state.maybe_set_upstream_done(response_done);
+                    } else {
+                        debug!("empty upstream event");
+                        response_state.maybe_set_upstream_done(true);
+                    }
+                }
+
+                task = serve_from_cache.next_http_task(&mut session.cache),
+                    if !response_state.cached_done() && !downstream_state.is_errored() && serve_from_cache.is_on() => {
+                    let task = self.h2_response_filter(session, task?, ctx,
+                        &mut serve_from_cache,
+                        &mut range_body_filter, true).await?;
+                    match session.write_response_tasks(vec![task]).await {
+                        Ok(b) => response_state.maybe_set_cache_done(b),
+                        Err(e) => if serve_from_cache.is_miss() {
+                            // give up writing to downstream but wait for the upstream cache write to finish
+                            downstream_state.to_errored();
+                            response_state.maybe_set_cache_done(true);
+                            warn!(
+                                "Downstream Error ignored during caching: {}, {}",
+                                e,
+                                self.inner.request_summary(session, ctx)
+                            );
+                            continue;
+                        } else {
+                            return Err(e);
+                        }
+                    }
+                    if response_state.cached_done() {
+                        if let Err(e) = session.cache.finish_hit_handler().await {
+                            warn!("Error during finish_hit_handler: {}", e);
+                        }
+                    }
+                }
+
+                else => {
+                    break;
+                }
+            }
+        }
+
+        match session.as_mut().finish_body().await {
+            Ok(_) => {
+                debug!("finished sending body to downstream");
+            }
+            Err(e) => {
+                error!("Error finishing sending body to downstream: {}", e);
+                // TODO: don't do downstream keepalive
+            }
+        }
+        Ok(())
+    }
+
+    async fn h2_response_filter(
+        &self,
+        session: &mut Session,
+        mut task: HttpTask,
+        ctx: &mut SV::CTX,
+        serve_from_cache: &mut ServeFromCache,
+        range_body_filter: &mut RangeBodyFilter,
+        from_cache: bool, // whether the task is from cache already
+    ) -> Result<HttpTask>
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        if !from_cache {
+            self.upstream_filter(session, &mut task, ctx);
+
+            // cache the original response before any downstream transformation;
+            // requests that bypassed cache still need to run filters to see if the
+            // response has become cacheable
+            if session.cache.enabled() || session.cache.bypassing() {
+                if let Err(e) = self
+                    .cache_http_task(session, &task, ctx, serve_from_cache)
+                    .await
+                {
+                    if serve_from_cache.is_miss_body() {
+                        // if the response body is being streamed into cache during a miss but the
+                        // write fails, the entire request has to be given up
+                        return Err(e);
+                    } else {
+                        // otherwise, continue processing the response
+                        warn!(
+                            "Failed to cache response: {}, {}",
+                            e,
+                            self.inner.request_summary(session, ctx)
+                        );
+                    }
+                }
+            }
+            // skip the downstream filtering if these tasks are just for cache admission
+            if !serve_from_cache.should_send_to_downstream() {
+                return Ok(task);
+            }
+        } // else: cached/local response, no need to trigger upstream filters and caching
+
+        match task {
+            HttpTask::Header(mut header, eos) => {
+                let req = session.req_header();
+
+                /* Downstream revalidation, only needed when cache is on because otherwise
+                 * origin will handle it */
+                // TODO: if cache is disabled during the response phase, we should still do the filter
+                if session.cache.enabled() {
+                    proxy_cache::downstream_response_conditional_filter(
+                        serve_from_cache,
+                        req,
+                        &mut header,
+                    );
+                    if !session.ignore_downstream_range {
+                        let range_type =
+                            proxy_cache::range_filter::range_header_filter(req, &mut header);
+                        range_body_filter.set(range_type);
+                    }
+                }
+
+                self.inner
+                    .response_filter(session, &mut header, ctx)
+                    .await?;
+                /* Downgrade the version so that write_response_header won't panic */
+                header.set_version(Version::HTTP_11);
+
+                // these status codes / methods cannot have a body, so no need to add chunked encoding
+                let no_body = session.req_header().method == "HEAD"
+                    || matches!(header.status.as_u16(), 204 | 304);
+
+                /* Add a chunked header to tell downstream to use chunked encoding
+                 * in the absence of content-length in h2 */
+                if !no_body
+                    && !header.status.is_informational()
+                    && header.headers.get(http::header::CONTENT_LENGTH).is_none()
+                {
+                    header.insert_header(http::header::TRANSFER_ENCODING, "chunked")?;
+                }
+                Ok(HttpTask::Header(header, eos))
+            }
+            HttpTask::Body(data, eos) => {
+                let data = range_body_filter.filter_body(data);
+                if let Some(duration) = self.inner.response_body_filter(session, &data, ctx)?
{ + trace!("delaying response for {:?}", duration); + time::sleep(duration).await; + } + Ok(HttpTask::Body(data, eos)) + } + HttpTask::Trailer(header_map) => { + let trailer_buffer = match header_map { + Some(mut trailer_map) => { + debug!("Parsing response trailers.."); + match self + .inner + .response_trailer_filter(session, &mut trailer_map, ctx) + .await + { + Ok(buf) => buf, + Err(e) => { + error!( + "Encountered error while filtering upstream trailers {:?}", + e + ); + None + } + } + } + _ => None, + }; + // if we have a trailer buffer write it to the downstream response body + if let Some(buffer) = trailer_buffer { + // write_body will not write additional bytes after reaching the content-length + // for gRPC H2 -> H1 this is not a problem but may be a problem for non gRPC code + // https://http2.github.io/http2-spec/#malformed + Ok(HttpTask::Body(Some(buffer), true)) + } else { + Ok(HttpTask::Done) + } + } + HttpTask::Done => Ok(task), + HttpTask::Failed(_) => Ok(task), // Do nothing just pass the error down + } + } +} + +pub(crate) fn send_body_to2( + data: Result<Option<Bytes>>, + end_of_body: bool, + client_body: &mut h2::SendStream<bytes::Bytes>, +) -> Result<bool> { + match data { + Ok(res) => match res { + Some(data) => { + let data_len = data.len(); + debug!( + "Read {} bytes body from downstream, body end: {}", + data_len, end_of_body + ); + if data_len == 0 && !end_of_body { + /* it is normal to get 0 bytes because of multi-chunk parsing */ + return Ok(false); + } + write_body(client_body, data, end_of_body).map_err(|e| e.into_up())?; + debug!("Write {} bytes body to h2 upstream", data_len); + Ok(end_of_body) + } + None => { + debug!("Read downstream body done"); + /* send a standalone END_STREAM flag */ + write_body(client_body, Bytes::new(), true).map_err(|e| e.into_up())?; + debug!("Write END_STREAM to h2 upstream"); + Ok(true) + } + }, + Err(e) => e.into_down().into_err(), + } +} + +/* Read response header, body and trailer from h2 upstream and send them to tx */ +pub(crate) async fn pipe_2to1_response( + client: &mut Http2Session, + tx: mpsc::Sender<HttpTask>, +) -> Result<()> { + client + .read_response_header() + .await + .map_err(|e| e.into_up())?; // should we send the error as an HttpTask? 
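+    // From here on, the tasks written to tx follow the sequence the downstream
+    // half expects: one Header task, zero or more Body tasks, an optional
+    // Trailer task, then Done (or Failed to surface an upstream error mid-stream).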
+ + let resp_header = Box::new(client.response_header().expect("just read").clone()); + + tx.send(HttpTask::Header(resp_header, client.response_finished())) + .await + .or_err(InternalError, "sending h2 headers to pipe")?; + + while let Some(chunk) = client + .read_response_body() + .await + .map_err(|e| e.into_up()) + .transpose() + { + let data = match chunk { + Ok(d) => d, + Err(e) => { + // Push the error to downstream and then quit + // Don't care if send fails: downstream already gone + let _ = tx.send(HttpTask::Failed(e.into_up())).await; + // Downstream should consume all remaining data and handle the error + return Ok(()); + } + }; + if data.is_empty() && !client.response_finished() { + /* it is normal to get 0 bytes because of multi-chunk + * don't write 0 bytes to downstream since it will be + * misread as the terminating chunk */ + continue; + } + tx.send(HttpTask::Body(Some(data), client.response_finished())) + .await + .or_err(InternalError, "sending h2 body to pipe")?; + } + + // attempt to get trailers + let trailers = match client.read_trailers().await { + Ok(t) => t, + Err(e) => { + // Similar to above, push the error to downstream and then quit + let _ = tx.send(HttpTask::Failed(e.into_up())).await; + return Ok(()); + } + }; + + let trailers = trailers.map(Box::new); + + if trailers.is_some() { + tx.send(HttpTask::Trailer(trailers)) + .await + .or_err(InternalError, "sending h2 trailer to pipe")?; + } + + tx.send(HttpTask::Done) + .await + .unwrap_or_else(|_| debug!("h2 to h1 channel closed!")); + + Ok(()) +} + +#[test] +fn test_update_authority() { + let mut parts = http::request::Builder::new() + .body(()) + .unwrap() + .into_parts() + .0; + update_h2_scheme_authority(&mut parts, b"example.com").unwrap(); + assert_eq!("example.com", parts.uri.authority().unwrap()); + update_h2_scheme_authority(&mut parts, b"example.com:456").unwrap(); + assert_eq!("example.com:456", parts.uri.authority().unwrap()); + update_h2_scheme_authority(&mut parts, b"example.com:").unwrap(); + assert_eq!("example.com:", parts.uri.authority().unwrap()); + update_h2_scheme_authority(&mut parts, b"example.com:123:345").unwrap(); + assert_eq!("example.com:123", parts.uri.authority().unwrap()); + update_h2_scheme_authority(&mut parts, b"[::1]").unwrap(); + assert_eq!("[::1]", parts.uri.authority().unwrap()); +} diff --git a/pingora-proxy/src/proxy_purge.rs b/pingora-proxy/src/proxy_purge.rs new file mode 100644 index 0000000..16796ba --- /dev/null +++ b/pingora-proxy/src/proxy_purge.rs @@ -0,0 +1,90 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
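+//
+// This module writes canned responses for cache purge requests: 200 when the
+// asset was found and invalidated, 404 when it was not in cache, and 405 when
+// the request targets something that is not purgeable.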
+
+use super::*;
+
+use once_cell::sync::Lazy;
+use pingora_core::protocols::http::SERVER_NAME;
+
+fn gen_purge_response(code: u16) -> ResponseHeader {
+    let mut resp = ResponseHeader::build(code, Some(3)).unwrap();
+    resp.insert_header(header::SERVER, &SERVER_NAME[..])
+        .unwrap();
+    resp.insert_header(header::CONTENT_LENGTH, 0).unwrap();
+    resp.insert_header(header::CACHE_CONTROL, "private, no-store")
+        .unwrap();
+    // TODO: more headers?
+    resp
+}
+
+async fn write_purge_response(
+    session: &mut Session,
+    resp: &ResponseHeader,
+) -> (bool, Option<Box<Error>>) {
+    match session.as_mut().write_response_header_ref(resp).await {
+        Ok(_) => (true, None),
+        // dirty, not reusable
+        Err(e) => (false, Some(e.into_down())),
+    }
+}
+
+/// Write a response for a rejected cache purge request
+pub async fn write_no_purge_response(session: &mut Session) -> (bool, Option<Box<Error>>) {
+    // TODO: log send error
+    write_purge_response(session, &NOT_PURGEABLE).await
+}
+
+static OK: Lazy<ResponseHeader> = Lazy::new(|| gen_purge_response(200));
+static NOT_FOUND: Lazy<ResponseHeader> = Lazy::new(|| gen_purge_response(404));
+// for when purge is sent to uncacheable assets
+static NOT_PURGEABLE: Lazy<ResponseHeader> = Lazy::new(|| gen_purge_response(405));
+
+impl<SV> HttpProxy<SV> {
+    pub(crate) async fn proxy_purge(
+        &self,
+        session: &mut Session,
+        ctx: &mut SV::CTX,
+    ) -> Option<(bool, Option<Box<Error>>)>
+    where
+        SV: ProxyHttp + Send + Sync,
+        SV::CTX: Send + Sync,
+    {
+        match session.cache.purge().await {
+            Ok(found) => {
+                // canned PURGE response based on whether we found the asset or not
+                let resp = if found { &*OK } else { &*NOT_FOUND };
+                let (reuse, err) = write_purge_response(session, resp).await;
+                if let Some(e) = err.as_ref() {
+                    error!(
+                        "Failed to send purge response: {}, {}",
+                        e,
+                        self.inner.request_summary(session, ctx)
+                    )
+                }
+                Some((reuse, err))
+            }
+            Err(e) => {
+                session.cache.disable(NoCacheReason::StorageError);
+                warn!(
+                    "Failed to purge cache: {}, {}",
+                    e,
+                    self.inner.request_summary(session, ctx)
+                );
+                session.downstream_session.respond_error(500).await;
+                // still reusable
+                Some((true, Some(e)))
+            }
+        }
+    }
+}
diff --git a/pingora-proxy/src/proxy_trait.rs b/pingora-proxy/src/proxy_trait.rs
new file mode 100644
index 0000000..c4fa2ef
--- /dev/null
+++ b/pingora-proxy/src/proxy_trait.rs
@@ -0,0 +1,365 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use super::*;
+use pingora_cache::{
+    key::HashBinary, CacheKey, CacheMeta, NoCacheReason, RespCacheable, RespCacheable::*,
+};
+
+/// The interface to control the HTTP proxy
+///
+/// The methods in [ProxyHttp] are filters/callbacks which will be performed on all requests at their
+/// particular stage (if applicable).
+///
+/// If any of the filters returns [Result::Err], the request will fail, and the error will be logged.
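+///
+/// A minimal implementation sketch (illustrative only; `MyGateway` and the
+/// chosen upstream address/SNI are assumptions, not part of this crate):
+///
+/// ```ignore
+/// struct MyGateway;
+///
+/// #[async_trait]
+/// impl ProxyHttp for MyGateway {
+///     type CTX = ();
+///     fn new_ctx(&self) -> Self::CTX {}
+///
+///     async fn upstream_peer(
+///         &self,
+///         _session: &mut Session,
+///         _ctx: &mut Self::CTX,
+///     ) -> Result<Box<HttpPeer>> {
+///         // send every request to one TLS upstream, using this SNI
+///         Ok(Box::new(HttpPeer::new(("1.1.1.1", 443), true, "one.one.one.one".to_string())))
+///     }
+/// }
+/// ```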
+#[cfg_attr(not(doc_async_trait), async_trait)]
+pub trait ProxyHttp {
+    /// The per request object to share state across the different filters
+    type CTX;
+
+    /// Define how the `ctx` should be created.
+    fn new_ctx(&self) -> Self::CTX;
+
+    /// Define where the proxy should send the request to.
+    ///
+    /// The returned [HttpPeer] contains the information regarding where and how this request should
+    /// be forwarded to.
+    async fn upstream_peer(
+        &self,
+        session: &mut Session,
+        ctx: &mut Self::CTX,
+    ) -> Result<Box<HttpPeer>>;
+
+    /// Handle the incoming request.
+    ///
+    /// In this phase, users can parse, validate, rate limit, perform access control and/or
+    /// return a response for this request.
+    ///
+    /// If the user already sent a response to this request, an `Ok(true)` should be returned so that
+    /// the proxy would exit. The proxy continues to the next phases when `Ok(false)` is returned.
+    ///
+    /// By default this filter does nothing and returns `Ok(false)`.
+    async fn request_filter(&self, _session: &mut Session, _ctx: &mut Self::CTX) -> Result<bool>
+    where
+        Self::CTX: Send + Sync,
+    {
+        Ok(false)
+    }
+
+    /// This filter decides if the request is cacheable and what cache backend to use
+    ///
+    /// The caller can interact with `Session.cache` to enable caching.
+    ///
+    /// By default this filter does nothing, which effectively disables caching.
+    // Ideally only session.cache should be modified, TODO: reflect that in this interface
+    fn request_cache_filter(&self, _session: &mut Session, _ctx: &mut Self::CTX) -> Result<()> {
+        Ok(())
+    }
+
+    /// This callback generates the cache key
+    ///
+    /// This callback is called only when cache is enabled for this request
+    ///
+    /// By default this callback returns a default cache key generated from the request.
+    fn cache_key_callback(&self, session: &Session, _ctx: &mut Self::CTX) -> Result<CacheKey> {
+        let req_header = session.req_header();
+        Ok(CacheKey::default(req_header))
+    }
+
+    /// This callback is invoked when a cacheable response is ready to be admitted to cache
+    fn cache_miss(&self, session: &mut Session, _ctx: &mut Self::CTX) {
+        session.cache.cache_miss();
+    }
+
+    /// This filter is called after a successful cache lookup and before the cache asset is ready to
+    /// be used.
+    ///
+    /// This filter allows the user to log or force expire the asset.
+    // flex purge, other filtering; returns whether the asset should be force expired or not
+    async fn cache_hit_filter(
+        &self,
+        _meta: &CacheMeta,
+        _ctx: &mut Self::CTX,
+        _req: &RequestHeader,
+    ) -> Result<bool>
+    where
+        Self::CTX: Send + Sync,
+    {
+        Ok(false)
+    }
+
+    /// Decide if a request should continue to upstream after not being served from cache.
+    ///
+    /// returns: Ok(true) if the request should continue, Ok(false) if a response was written by the
+    /// callback and the session should be finished, or an error
+    ///
+    /// This filter can be used for deferring checks like rate limiting or access control to when they
+    /// are actually needed, after a cache miss.
+    async fn proxy_upstream_filter(
+        &self,
+        _session: &mut Session,
+        _ctx: &mut Self::CTX,
+    ) -> Result<bool>
+    where
+        Self::CTX: Send + Sync,
+    {
+        Ok(true)
+    }
+
+    /// Decide if the response is cacheable
+    fn response_cache_filter(
+        &self,
+        _session: &Session,
+        _resp: &ResponseHeader,
+        _ctx: &mut Self::CTX,
+    ) -> Result<RespCacheable> {
+        Ok(Uncacheable(NoCacheReason::Custom("default")))
+    }
+
+    /// Decide how to generate the cache vary key from both the request and the response
+    ///
+    /// None means no variance is needed.
+    fn cache_vary_filter(
+        &self,
+        _meta: &CacheMeta,
+        _ctx: &mut Self::CTX,
+        _req: &RequestHeader,
+    ) -> Option<HashBinary> {
+        // default to None for now to disable the vary feature
+        None
+    }
+
+    /// Modify the request before it is sent to the upstream
+    ///
+    /// Unlike [Self::request_filter()], this filter allows changing the request headers sent
+    /// to the upstream.
+    async fn upstream_request_filter(
+        &self,
+        _session: &mut Session,
+        _upstream_request: &mut RequestHeader,
+        _ctx: &mut Self::CTX,
+    ) -> Result<()>
+    where
+        Self::CTX: Send + Sync,
+    {
+        Ok(())
+    }
+
+    /// Modify the response header from the upstream
+    ///
+    /// The modification is before caching, so any change here will be stored in cache if enabled.
+    ///
+    /// Responses served from cache won't trigger this filter.
+    fn upstream_response_filter(
+        &self,
+        _session: &mut Session,
+        _upstream_response: &mut ResponseHeader,
+        _ctx: &mut Self::CTX,
+    ) {
+    }
+
+    /// Modify the response header before it is sent to the downstream
+    ///
+    /// The modification is after caching. This filter is called for all responses, including
+    /// responses served from cache.
+    async fn response_filter(
+        &self,
+        _session: &mut Session,
+        _upstream_response: &mut ResponseHeader,
+        _ctx: &mut Self::CTX,
+    ) -> Result<()>
+    where
+        Self::CTX: Send + Sync,
+    {
+        Ok(())
+    }
+
+    /// Similar to [Self::upstream_response_filter()] but for the response body
+    ///
+    /// This function will be called every time a piece of the response body is received. The `body` is
+    /// **not the entire response body**.
+    fn upstream_response_body_filter(
+        &self,
+        _session: &mut Session,
+        _body: &Option<Bytes>,
+        _end_of_stream: bool,
+        _ctx: &mut Self::CTX,
+    ) {
+    }
+
+    /// Similar to [Self::response_filter()] but for response body chunks
+    fn response_body_filter(
+        &self,
+        _session: &mut Session,
+        _body: &Option<Bytes>,
+        _ctx: &mut Self::CTX,
+    ) -> Result<Option<std::time::Duration>>
+    where
+        Self::CTX: Send + Sync,
+    {
+        Ok(None)
+    }
+
+    /// When a trailer is received.
+    async fn response_trailer_filter(
+        &self,
+        _session: &mut Session,
+        _upstream_trailers: &mut header::HeaderMap,
+        _ctx: &mut Self::CTX,
+    ) -> Result<Option<Bytes>>
+    where
+        Self::CTX: Send + Sync,
+    {
+        Ok(None)
+    }
+
+    /// This filter is called when the entire response is sent to the downstream successfully, or
+    /// when there is a fatal error that terminates the request.
+    ///
+    /// An error log is already emitted if there is any error. This phase is used for collecting
+    /// metrics and sending access logs.
+    async fn logging(&self, _session: &mut Session, _e: Option<&Error>, _ctx: &mut Self::CTX)
+    where
+        Self::CTX: Send + Sync,
+    {
+    }
+
+    /// A value of true means that the log message will be suppressed. The default value is false.
+    fn suppress_error_log(&self, _session: &Session, _ctx: &Self::CTX, _error: &Error) -> bool {
+        false
+    }
+
+    /// This filter is called when there is an error **after** a connection is established (or reused)
+    /// to the upstream.
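+    ///
+    /// The default implementation adds the peer to the error context and marks the error as
+    /// retry-able only if the client connection was reused and the request retry buffer was not
+    /// truncated (i.e. the request body can still be replayed).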
+    fn error_while_proxy(
+        &self,
+        peer: &HttpPeer,
+        session: &mut Session,
+        e: Box<Error>,
+        _ctx: &mut Self::CTX,
+        client_reused: bool,
+    ) -> Box<Error> {
+        let mut e = e.more_context(format!("Peer: {}", peer));
+        // only reused client connections where the retry buffer is not truncated
+        e.retry
+            .decide_reuse(client_reused && !session.as_ref().retry_buffer_truncated());
+        e
+    }
+
+    /// This filter is called when there is an error in the process of establishing a connection
+    /// to the upstream.
+    ///
+    /// In this filter the user can decide whether the error is retry-able by marking the error `e`.
+    ///
+    /// If the error can be retried, [Self::upstream_peer()] will be called again so that the user
+    /// can decide whether to send the request to the same upstream or another upstream that is
+    /// possibly available.
+    fn fail_to_connect(
+        &self,
+        _session: &mut Session,
+        _peer: &HttpPeer,
+        _ctx: &mut Self::CTX,
+        e: Box<Error>,
+    ) -> Box<Error> {
+        e
+    }
+
+    /// This filter is called when the request encounters a fatal error.
+    ///
+    /// Users may write an error response to the downstream if the downstream is still writable.
+    ///
+    /// The response status code of the error response may be returned for logging purposes.
+    async fn fail_to_proxy(&self, session: &mut Session, e: &Error, _ctx: &mut Self::CTX) -> u16
+    where
+        Self::CTX: Send + Sync,
+    {
+        let server_session = session.as_mut();
+        let code = match e.etype() {
+            HTTPStatus(code) => *code,
+            _ => {
+                match e.esource() {
+                    ErrorSource::Upstream => 502,
+                    ErrorSource::Downstream => {
+                        match e.etype() {
+                            WriteError | ReadError | ConnectionClosed => {
+                                /* conn already dead */
+                                0
+                            }
+                            _ => 400,
+                        }
+                    }
+                    ErrorSource::Internal | ErrorSource::Unset => 500,
+                }
+            }
+        };
+        if code > 0 {
+            server_session.respond_error(code).await
+        }
+        code
+    }
+
+    /// Decide whether to serve stale when encountering an error or during revalidation
+    ///
+    /// An implementation should follow
+    /// <https://datatracker.ietf.org/doc/html/rfc9111#section-4.2.4>
+    /// <https://www.rfc-editor.org/rfc/rfc5861#section-4>
+    ///
+    /// This filter is only called if cache is enabled.
+    // 5xx HTTP status will be encoded as ErrorType::HTTPStatus(code)
+    fn should_serve_stale(
+        &self,
+        _session: &mut Session,
+        _ctx: &mut Self::CTX,
+        error: Option<&Error>, // None when it is called during stale while revalidate
+    ) -> bool {
+        // A cache MUST NOT generate a stale response unless
+        // it is disconnected
+        // or doing so is explicitly permitted by the client or origin server
+        // (e.g. headers or an out-of-band contract)
+        error.map_or(false, |e| e.esource() == &ErrorSource::Upstream)
+    }
+
+    /// This filter is called when the request has just established or reused a connection to the upstream
+    ///
+    /// This filter allows the user to log timing and connection related info.
+    async fn connected_to_upstream(
+        &self,
+        _session: &mut Session,
+        _reused: bool,
+        _peer: &HttpPeer,
+        _fd: std::os::unix::io::RawFd,
+        _digest: Option<&Digest>,
+        _ctx: &mut Self::CTX,
+    ) -> Result<()>
+    where
+        Self::CTX: Send + Sync,
+    {
+        Ok(())
+    }
+
+    /// This callback is invoked every time a request related error log needs to be generated
+    ///
+    /// Users can define what is important to be written about this request via the returned string.
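+    ///
+    /// The default implementation delegates to the session's built-in summary. An override could,
+    /// for example, append a request ID taken from `ctx` to the returned string (illustrative, not
+    /// a built-in feature).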
+    fn request_summary(&self, session: &Session, _ctx: &Self::CTX) -> String {
+        session.as_ref().request_summary()
+    }
+
+    /// Whether the request should be used to invalidate (delete) the HTTP cache
+    ///
+    /// - `true`: this request will be used to invalidate the cache.
+    /// - `false`: this request is treated as a normal request
+    fn is_purge(&self, _session: &Session, _ctx: &Self::CTX) -> bool {
+        false
+    }
+}
diff --git a/pingora-proxy/src/subrequest.rs b/pingora-proxy/src/subrequest.rs
new file mode 100644
index 0000000..9490a40
--- /dev/null
+++ b/pingora-proxy/src/subrequest.rs
@@ -0,0 +1,134 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use async_trait::async_trait;
+use core::pin::Pin;
+use core::task::{Context, Poll};
+use pingora_cache::lock::WritePermit;
+use pingora_core::protocols::raw_connect::ProxyDigest;
+use pingora_core::protocols::{GetProxyDigest, GetTimingDigest, Ssl, TimingDigest, UniqueID};
+use std::io::Cursor;
+use std::sync::Arc;
+use tokio::io::{AsyncRead, AsyncWrite, Error, ReadBuf};
+
+// An async IO stream that returns the request when being read from and dumps the data to the void
+// when being written to
+#[derive(Debug)]
+pub(crate) struct DummyIO(Cursor<Vec<u8>>);
+
+impl DummyIO {
+    pub fn new(read_bytes: &[u8]) -> Self {
+        DummyIO(Cursor::new(Vec::from(read_bytes)))
+    }
+}
+
+impl AsyncRead for DummyIO {
+    fn poll_read(
+        mut self: Pin<&mut Self>,
+        cx: &mut Context<'_>,
+        buf: &mut ReadBuf<'_>,
+    ) -> Poll<Result<(), Error>> {
+        if self.0.position() < self.0.get_ref().len() as u64 {
+            Pin::new(&mut self.0).poll_read(cx, buf)
+        } else {
+            // all data is read; pending forever, otherwise the stream is considered closed
+            Poll::Pending
+        }
+    }
+}
+
+impl AsyncWrite for DummyIO {
+    fn poll_write(
+        self: Pin<&mut Self>,
+        _cx: &mut Context<'_>,
+        buf: &[u8],
+    ) -> Poll<Result<usize, Error>> {
+        Poll::Ready(Ok(buf.len()))
+    }
+
+    fn poll_flush(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Result<(), Error>> {
+        Poll::Ready(Ok(()))
+    }
+
+    fn poll_shutdown(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Result<(), Error>> {
+        Poll::Ready(Ok(()))
+    }
+}
+
+impl UniqueID for DummyIO {
+    fn id(&self) -> i32 {
+        0 // placeholder
+    }
+}
+
+impl Ssl for DummyIO {}
+
+impl GetTimingDigest for DummyIO {
+    fn get_timing_digest(&self) -> Vec<Option<TimingDigest>> {
+        vec![]
+    }
+}
+
+impl GetProxyDigest for DummyIO {
+    fn get_proxy_digest(&self) -> Option<Arc<ProxyDigest>> {
+        None
+    }
+}
+
+#[async_trait]
+impl pingora_core::protocols::Shutdown for DummyIO {
+    async fn shutdown(&mut self) -> () {}
+}
+
+#[tokio::test]
+async fn test_dummy_io() {
+    use futures::FutureExt;
+    use tokio::io::{AsyncReadExt, AsyncWriteExt};
+
+    let mut dummy = DummyIO::new(&[1, 2]);
+    let res = dummy.read_u8().await;
+    assert_eq!(res.unwrap(), 1);
+    let res = dummy.read_u8().await;
+    assert_eq!(res.unwrap(), 2);
+    let res = dummy.read_u8().now_or_never();
+    assert!(res.is_none()); // pending forever
+    let res =
dummy.write_u8(0).await; + assert!(res.is_ok()); +} + +// To share state across the parent req and the sub req +pub(crate) struct Ctx { + pub(crate) write_lock: Option<WritePermit>, +} + +use crate::HttpSession; + +pub(crate) fn create_dummy_session(parsed_session: &HttpSession) -> HttpSession { + // TODO: check if there is req body, we don't capture the body for now + HttpSession::new_http1(Box::new(DummyIO::new(&parsed_session.to_h1_raw()))) +} + +#[tokio::test] +async fn test_dummy_request() { + use tokio_test::io::Builder; + + let input = b"GET / HTTP/1.1\r\n\r\n"; + let mock_io = Builder::new().read(&input[..]).build(); + let mut req = HttpSession::new_http1(Box::new(mock_io)); + req.read_request().await.unwrap(); + assert_eq!(input.as_slice(), req.to_h1_raw()); + + let mut dummy_req = create_dummy_session(&req); + dummy_req.read_request().await.unwrap(); + assert_eq!(input.as_slice(), req.to_h1_raw()); +} diff --git a/pingora-proxy/tests/headers.dict b/pingora-proxy/tests/headers.dict Binary files differnew file mode 100644 index 0000000..a88fa03 --- /dev/null +++ b/pingora-proxy/tests/headers.dict diff --git a/pingora-proxy/tests/keys/key.pem b/pingora-proxy/tests/keys/key.pem new file mode 100644 index 0000000..0fe68f2 --- /dev/null +++ b/pingora-proxy/tests/keys/key.pem @@ -0,0 +1,5 @@ +-----BEGIN EC PRIVATE KEY----- +MHcCAQEEIN5lAOvtlKwtc/LR8/U77dohJmZS30OuezU9gL6vmm6DoAoGCCqGSM49 +AwEHoUQDQgAE2f/1Fm1HjySdokPq2T0F1xxol9nSEYQ+foFINeaWYk+FxMGpriJT +Bb8AGka87cWklw1ZqytfaT6pkureDbTkwg== +-----END EC PRIVATE KEY----- diff --git a/pingora-proxy/tests/keys/public.pem b/pingora-proxy/tests/keys/public.pem new file mode 100644 index 0000000..0866a04 --- /dev/null +++ b/pingora-proxy/tests/keys/public.pem @@ -0,0 +1,4 @@ +-----BEGIN PUBLIC KEY----- +MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE2f/1Fm1HjySdokPq2T0F1xxol9nS +EYQ+foFINeaWYk+FxMGpriJTBb8AGka87cWklw1ZqytfaT6pkureDbTkwg== +-----END PUBLIC KEY----- diff --git a/pingora-proxy/tests/keys/server.crt b/pingora-proxy/tests/keys/server.crt new file mode 100644 index 0000000..afb2d1e --- /dev/null +++ b/pingora-proxy/tests/keys/server.crt @@ -0,0 +1,13 @@ +-----BEGIN CERTIFICATE----- +MIIB9zCCAZ2gAwIBAgIUMI7aLvTxyRFCHhw57hGt4U6yupcwCgYIKoZIzj0EAwIw +ZDELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQHDA1TYW4gRnJhbmNp +c2NvMRgwFgYDVQQKDA9DbG91ZGZsYXJlLCBJbmMxFjAUBgNVBAMMDW9wZW5ydXN0 +eS5vcmcwHhcNMjIwNDExMjExMzEzWhcNMzIwNDA4MjExMzEzWjBkMQswCQYDVQQG +EwJVUzELMAkGA1UECAwCQ0ExFjAUBgNVBAcMDVNhbiBGcmFuY2lzY28xGDAWBgNV +BAoMD0Nsb3VkZmxhcmUsIEluYzEWMBQGA1UEAwwNb3BlbnJ1c3R5Lm9yZzBZMBMG +ByqGSM49AgEGCCqGSM49AwEHA0IABNn/9RZtR48knaJD6tk9BdccaJfZ0hGEPn6B +SDXmlmJPhcTBqa4iUwW/ABpGvO3FpJcNWasrX2k+qZLq3g205MKjLTArMCkGA1Ud +EQQiMCCCDyoub3BlbnJ1c3R5Lm9yZ4INb3BlbnJ1c3R5Lm9yZzAKBggqhkjOPQQD +AgNIADBFAiAjISZ9aEKmobKGlT76idO740J6jPaX/hOrm41MLeg69AIhAJqKrSyz +wD/AAF5fR6tXmBqlnpQOmtxfdy13wDr4MT3h +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/keys/server.csr b/pingora-proxy/tests/keys/server.csr new file mode 100644 index 0000000..ca75dce --- /dev/null +++ b/pingora-proxy/tests/keys/server.csr @@ -0,0 +1,9 @@ +-----BEGIN CERTIFICATE REQUEST----- +MIIBJzCBzgIBADBsMQswCQYDVQQGEwJVUzETMBEGA1UECAwKQ2FsaWZvcm5pYTEW +MBQGA1UEBwwNU2FuIEZyYW5jaXNjbzEYMBYGA1UECgwPQ2xvdWRmbGFyZSwgSW5j +MRYwFAYDVQQDDA1vcGVucnVzdHkub3JnMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcD +QgAE2f/1Fm1HjySdokPq2T0F1xxol9nSEYQ+foFINeaWYk+FxMGpriJTBb8AGka8 +7cWklw1ZqytfaT6pkureDbTkwqAAMAoGCCqGSM49BAMCA0gAMEUCIFyDN8eamnoY +XydKn2oI7qImigxahyCftzjxkIEV5IKbAiEAo5l72X4U+YTVYmyPPnJIj2v5nA1R +RuUfMh5sXzwlwuM= +-----END 
CERTIFICATE REQUEST----- diff --git a/pingora-proxy/tests/pingora_conf.yaml b/pingora-proxy/tests/pingora_conf.yaml new file mode 100644 index 0000000..c21ae15 --- /dev/null +++ b/pingora-proxy/tests/pingora_conf.yaml @@ -0,0 +1,5 @@ +--- +version: 1 +client_bind_to_ipv4: + - 127.0.0.2 +ca_file: tests/keys/server.crt
\ No newline at end of file diff --git a/pingora-proxy/tests/test_basic.rs b/pingora-proxy/tests/test_basic.rs new file mode 100644 index 0000000..a4730bf --- /dev/null +++ b/pingora-proxy/tests/test_basic.rs @@ -0,0 +1,736 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +mod utils; + +use hyper::Client; +use hyperlocal::{UnixClientExt, Uri}; +use reqwest::{header, StatusCode}; + +use utils::server_utils::init; + +#[tokio::test] +async fn test_origin_alive() { + init(); + let res = reqwest::get("http://127.0.0.1:8000/").await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers[header::CONTENT_LENGTH], "13"); + let body = res.text().await.unwrap(); + assert_eq!(body, "Hello World!\n"); +} + +#[tokio::test] +async fn test_simple_proxy() { + init(); + let res = reqwest::get("http://127.0.0.1:6147").await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers[header::CONTENT_LENGTH], "13"); + let body = res.text().await.unwrap(); + assert_eq!(body, "Hello World!\n"); +} + +#[tokio::test] +async fn test_h2_to_h1() { + init(); + let client = reqwest::Client::builder() + .danger_accept_invalid_certs(true) + .build() + .unwrap(); + + let res = client.get("https://127.0.0.1:6150").send().await.unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); + assert_eq!(res.version(), reqwest::Version::HTTP_2); + let headers = res.headers(); + assert_eq!(headers[header::CONTENT_LENGTH], "13"); + let body = res.text().await.unwrap(); + assert_eq!(body, "Hello World!\n"); +} + +#[tokio::test] +async fn test_h2_to_h2() { + init(); + let client = reqwest::Client::builder() + .danger_accept_invalid_certs(true) + .build() + .unwrap(); + + let res = client + .get("https://127.0.0.1:6150") + .header("x-h2", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); + assert_eq!(res.version(), reqwest::Version::HTTP_2); + let headers = res.headers(); + assert_eq!(headers[header::CONTENT_LENGTH], "13"); + let body = res.text().await.unwrap(); + assert_eq!(body, "Hello World!\n"); +} + +#[tokio::test] +async fn test_h2_to_h2_host_override() { + init(); + let client = reqwest::Client::builder() + .danger_accept_invalid_certs(true) + .build() + .unwrap(); + + let res = client + .get("https://127.0.0.1:6150") + .header("x-h2", "true") + .header("host-override", "test.com") + .send() + .await + .unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); + assert_eq!(res.version(), reqwest::Version::HTTP_2); + let headers = res.headers(); + assert_eq!(headers[header::CONTENT_LENGTH], "13"); + let body = res.text().await.unwrap(); + assert_eq!(body, "Hello World!\n"); +} + +#[tokio::test] +async fn test_h2_to_h2_upload() { + init(); + let client = reqwest::Client::builder() + .danger_accept_invalid_certs(true) + .build() + .unwrap(); + + let payload = "test upload"; + + let res = client + .get("https://127.0.0.1:6150/echo") + 
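+ // the x-h2 header below asks the test proxy to also use h2 to the upstream (cf. test_h2_to_h1_upload, which omits it)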
.header("x-h2", "true") + .body(payload) + .send() + .await + .unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); + assert_eq!(res.version(), reqwest::Version::HTTP_2); + let body = res.text().await.unwrap(); + assert_eq!(body, payload); +} + +#[tokio::test] +async fn test_h2_to_h1_upload() { + init(); + let client = reqwest::Client::builder() + .danger_accept_invalid_certs(true) + .build() + .unwrap(); + + let payload = "test upload"; + + let res = client + .get("https://127.0.0.1:6150/echo") + .body(payload) + .send() + .await + .unwrap(); + assert_eq!(res.status(), reqwest::StatusCode::OK); + assert_eq!(res.version(), reqwest::Version::HTTP_2); + let body = res.text().await.unwrap(); + assert_eq!(body, payload); +} + +#[tokio::test] +async fn test_simple_proxy_uds() { + init(); + let url = Uri::new("/tmp/pingora_proxy.sock", "/").into(); + let client = Client::unix(); + + let res = client.get(url).await.unwrap(); + + assert_eq!(res.status(), reqwest::StatusCode::OK); + let (resp, body) = res.into_parts(); + assert_eq!(resp.headers[header::CONTENT_LENGTH], "13"); + let body = hyper::body::to_bytes(body).await.unwrap(); + assert_eq!(body.as_ref(), b"Hello World!\n"); +} + +#[tokio::test] +async fn test_simple_proxy_uds_peer() { + init(); + let client = reqwest::Client::new(); + let res = client + .get("http://127.0.0.1:6147") + .header("x-uds-peer", "1") // force upstream peer to be UDS + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers[header::CONTENT_LENGTH], "13"); + let body = res.text().await.unwrap(); + assert_eq!(body, "Hello World!\n"); +} + +async fn test_dropped_conn_get() { + init(); + let client = reqwest::Client::new(); + let port = "8001"; // special port to avoid unexpected connection reuse from other tests + + for _ in 1..3 { + // load conns into pool + let res = client + .get("http://127.0.0.1:6147") + .header("x-port", port) + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + } + + let res = client + .get("http://127.0.0.1:6147/bad_lb") + .header("x-port", port) + .send() + .await + .unwrap(); + + // retry gives 200 + assert_eq!(res.status(), StatusCode::OK); + let body = res.text().await.unwrap(); + assert_eq!(body, "dog!\n"); +} + +async fn test_dropped_conn_post_empty_body() { + init(); + let client = reqwest::Client::new(); + let port = "8001"; // special port to avoid unexpected connection reuse from other tests + + for _ in 1..3 { + // load conn into pool + let res = client + .get("http://127.0.0.1:6147") + .header("x-port", port) + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + } + + let res = client + .post("http://127.0.0.1:6147/bad_lb") + .header("x-port", port) + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::OK); + let body = res.text().await.unwrap(); + assert_eq!(body, "dog!\n"); +} + +async fn test_dropped_conn_post_body() { + init(); + let client = reqwest::Client::new(); + let port = "8001"; // special port to avoid unexpected connection reuse from other tests + + for _ in 1..3 { + // load conn into pool + let res = client + .get("http://127.0.0.1:6147") + .header("x-port", port) + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + } + + let res = client + .post("http://127.0.0.1:6147/bad_lb") + .header("x-port", port) + .body("cat!") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::OK); + let body = res.text().await.unwrap(); + 
assert_eq!(body, "cat!\n"); +} + +async fn test_dropped_conn_post_body_over() { + init(); + let client = reqwest::Client::new(); + let port = "8001"; // special port to avoid unexpected connection reuse from other tests + let large_body = String::from_utf8(vec![b'e'; 1024 * 64 + 1]).unwrap(); + + for _ in 1..3 { + // load conn into pool + let res = client + .get("http://127.0.0.1:6147") + .header("x-port", port) + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + } + + let res = client + .post("http://127.0.0.1:6147/bad_lb") + .header("x-port", port) + .body(large_body) + .send() + .await + .unwrap(); + + // 502, body larger than buffer limit + assert_eq!(res.status(), StatusCode::from_u16(502).unwrap()); +} + +#[tokio::test] +async fn test_dropped_conn() { + // These tests can race with each other + // So force run them sequentially + test_dropped_conn_get().await; + test_dropped_conn_post_empty_body().await; + test_dropped_conn_post_body().await; + test_dropped_conn_post_body_over().await; +} + +#[tokio::test] +async fn test_tls_no_verify() { + init(); + let client = reqwest::Client::new(); + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::OK); +} + +#[tokio::test] +async fn test_tls_verify_sni_not_host() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("sni", "openrusty.org") + .header("verify", "1") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::OK); +} + +#[tokio::test] +async fn test_tls_none_verify_host() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("verify", "1") + .header("verify_host", "1") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::OK); +} + +#[tokio::test] +async fn test_tls_verify_sni_host() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("sni", "openrusty.org") + .header("verify", "1") + .header("verify_host", "1") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::OK); +} + +#[tokio::test] +async fn test_tls_underscore_sub_sni_verify_host() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("sni", "d_g.openrusty.org") + .header("verify", "1") + .header("verify_host", "1") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::OK); +} + +#[tokio::test] +async fn test_tls_underscore_non_sub_sni_verify_host() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("sni", "open_rusty.org") + .header("verify", "1") + .header("verify_host", "1") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::BAD_GATEWAY); + let headers = res.headers(); + assert_eq!(headers[header::CONNECTION], "close"); +} + +#[tokio::test] +async fn test_tls_alt_verify_host() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("sni", "open_rusty.org") + .header("alt", "openrusty.org") + .header("verify", "1") + .header("verify_host", "1") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::OK); +} + +#[tokio::test] +async fn test_tls_underscore_sub_alt_verify_host() { + init(); + let client = reqwest::Client::new(); + + let 
res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("sni", "open_rusty.org") + .header("alt", "d_g.openrusty.org") + .header("verify", "1") + .header("verify_host", "1") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::OK); +} + +#[tokio::test] +async fn test_tls_underscore_non_sub_alt_verify_host() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("sni", "open_rusty.org") + .header("alt", "open_rusty.org") + .header("verify", "1") + .header("verify_host", "1") + .send() + .await + .unwrap(); + + assert_eq!(res.status(), StatusCode::BAD_GATEWAY); +} + +#[tokio::test] +async fn test_upstream_compression() { + init(); + + // disable reqwest gzip support to check compression headers and body + // otherwise reqwest will decompress and strip the headers + let client = reqwest::ClientBuilder::new().gzip(false).build().unwrap(); + let res = client + .get("http://127.0.0.1:6147/no_compression") + .header("accept-encoding", "gzip") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + assert_eq!(res.headers().get("Content-Encoding").unwrap(), "gzip"); + let body = res.bytes().await.unwrap(); + assert!(body.len() < 32); + + // Next let reqwest decompress to validate the data + let client = reqwest::ClientBuilder::new().gzip(true).build().unwrap(); + let res = client + .get("http://127.0.0.1:6147/no_compression") + .header("accept-encoding", "gzip") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let body = res.bytes().await.unwrap(); + assert_eq!(body.as_ref(), &[b'B'; 32]); +} + +#[tokio::test] +async fn test_downstream_compression() { + init(); + + // disable reqwest gzip support to check compression headers and body + // otherwise reqwest will decompress and strip the headers + let client = reqwest::ClientBuilder::new().gzip(false).build().unwrap(); + let res = client + .get("http://127.0.0.1:6147/no_compression") + // tell the test proxy to use downstream compression module instead of upstream + .header("x-downstream-compression", "1") + .header("accept-encoding", "gzip") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + assert_eq!(res.headers().get("Content-Encoding").unwrap(), "gzip"); + let body = res.bytes().await.unwrap(); + assert!(body.len() < 32); + + // Next let reqwest decompress to validate the data + let client = reqwest::ClientBuilder::new().gzip(true).build().unwrap(); + let res = client + .get("http://127.0.0.1:6147/no_compression") + .header("accept-encoding", "gzip") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let body = res.bytes().await.unwrap(); + assert_eq!(body.as_ref(), &[b'B'; 32]); +} + +#[tokio::test] +async fn test_connect_close() { + init(); + + // default keep-alive + let client = reqwest::ClientBuilder::new().build().unwrap(); + let res = client.get("http://127.0.0.1:6147").send().await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers[header::CONTENT_LENGTH], "13"); + assert_eq!(headers[header::CONNECTION], "keep-alive"); + let body = res.text().await.unwrap(); + assert_eq!(body, "Hello World!\n"); + + // close + let client = reqwest::ClientBuilder::new().build().unwrap(); + let res = client + .get("http://127.0.0.1:6147") + .header("connection", "close") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + 
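+ // the proxy should honor the downstream "connection: close" and say so in the response headers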
assert_eq!(headers[header::CONTENT_LENGTH], "13"); + assert_eq!(headers[header::CONNECTION], "close"); + let body = res.text().await.unwrap(); + assert_eq!(body, "Hello World!\n"); +} + +#[tokio::test] +async fn test_mtls_no_client_cert() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("x-port", "8444") + .header("sni", "openrusty.org") + .header("verify", "1") + .header("verify_host", "1") + .send() + .await + .unwrap(); + + // 400: because no cert + assert_eq!(res.status(), StatusCode::BAD_REQUEST); +} + +#[tokio::test] +async fn test_mtls_no_intermediate_cert() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/tls_verify") + .header("x-port", "8444") + .header("sni", "openrusty.org") + .header("verify", "1") + .header("verify_host", "1") + .header("client_cert", "1") + .send() + .await + .unwrap(); + + // 400: because no intermediate cert + assert_eq!(res.status(), StatusCode::BAD_REQUEST); +} + +#[tokio::test] +async fn test_mtls() { + init(); + let client = reqwest::Client::new(); + + let res = client + .get("http://127.0.0.1:6149/") + .header("x-port", "8444") + .header("sni", "openrusty.org") + .header("verify", "1") + .header("verify_host", "1") + .header("client_cert", "1") + .header("client_intermediate", "1") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); +} + +async fn assert_reuse(req: reqwest::RequestBuilder) { + req.try_clone().unwrap().send().await.unwrap(); + let res = req.send().await.unwrap(); + let headers = res.headers(); + assert!(headers.get("x-conn-reuse").is_some()); +} + +#[tokio::test] +async fn test_mtls_diff_cert_no_reuse() { + init(); + let client = reqwest::Client::new(); + + let req = client + .get("http://127.0.0.1:6149/") + .header("x-port", "8444") + .header("sni", "openrusty.org") + .header("verify", "1") + .header("verify_host", "1") + .header("client_cert", "1") + .header("client_intermediate", "1"); + + // pre check re-use + assert_reuse(req).await; + + // different cert no re-use + let res = client + .get("http://127.0.0.1:6149/") + .header("x-port", "8444") + .header("sni", "openrusty.org") + .header("verify", "1") + .header("verify_host", "1") + .header("client_cert", "2") + .header("client_intermediate", "1") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert!(headers.get("x-conn-reuse").is_none()); +} + +#[tokio::test] +async fn test_tls_diff_verify_no_reuse() { + init(); + let client = reqwest::Client::new(); + + let req = client + .get("http://127.0.0.1:6149/") + .header("sni", "dog.openrusty.org") + .header("verify", "1"); + + // pre check re-use + assert_reuse(req).await; + + // disable 'verify' no re-use + let res = client + .get("http://127.0.0.1:6149/") + .header("sni", "dog.openrusty.org") + .header("verify", "0") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert!(headers.get("x-conn-reuse").is_none()); +} + +#[tokio::test] +async fn test_tls_diff_verify_host_no_reuse() { + init(); + let client = reqwest::Client::new(); + + let req = client + .get("http://127.0.0.1:6149/") + .header("sni", "cat.openrusty.org") + .header("verify", "1") + .header("verify_host", "1"); + + // pre check re-use + assert_reuse(req).await; + + // disable 'verify_host' no re-use + let res = client + .get("http://127.0.0.1:6149/") + .header("sni", "cat.openrusty.org") + 
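+ // verify and verify_host below are part of what identifies a reusable upstream TLS connection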
.header("verify", "1") + .header("verify_host", "0") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert!(headers.get("x-conn-reuse").is_none()); +} + +#[tokio::test] +async fn test_tls_diff_alt_cnt_no_reuse() { + init(); + let client = reqwest::Client::new(); + + let req = client + .get("http://127.0.0.1:6149/") + .header("sni", "openrusty.org") + .header("alt", "cat.com") + .header("verify", "1") + .header("verify_host", "1"); + + // pre check re-use + assert_reuse(req).await; + + // use alt-cn no reuse + let res = client + .get("http://127.0.0.1:6149/") + .header("sni", "openrusty.org") + .header("alt", "dog.com") + .header("verify", "1") + .header("verify_host", "1") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert!(headers.get("x-conn-reuse").is_none()); +} diff --git a/pingora-proxy/tests/test_upstream.rs b/pingora-proxy/tests/test_upstream.rs new file mode 100644 index 0000000..0456b94 --- /dev/null +++ b/pingora-proxy/tests/test_upstream.rs @@ -0,0 +1,1625 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +mod utils; + +use utils::server_utils::init; +use utils::websocket::WS_ECHO; + +use futures::{SinkExt, StreamExt}; +use reqwest::header::HeaderValue; +use reqwest::StatusCode; +use std::time::Duration; +use tokio_tungstenite::tungstenite::{client::IntoClientRequest, Message}; + +#[tokio::test] +async fn test_ip_binding() { + init(); + let res = reqwest::get("http://127.0.0.1:6147/client_ip") + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-client-ip"], "127.0.0.2"); +} + +#[tokio::test] +async fn test_duplex() { + init(); + // NOTE: this doesn't really verify that we are in full duplex mode as reqwest + // won't allow us control when req body is sent + let client = reqwest::Client::new(); + let res = client + .post("http://127.0.0.1:6147/duplex/") + .body("b".repeat(1024 * 1024)) // 1 MB upload + .timeout(Duration::from_secs(5)) + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let body = res.text().await.unwrap(); + assert_eq!(body.len(), 64 * 5); +} + +#[tokio::test] +async fn test_connection_die() { + init(); + let res = reqwest::get("http://127.0.0.1:6147/connection_die") + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let body = res.text().await; + // reqwest doesn't allow us to inspect the partial body + assert!(body.is_err()); +} + +#[tokio::test] +async fn test_upload() { + init(); + let client = reqwest::Client::new(); + let res = client + .post("http://127.0.0.1:6147/upload/") + .body("b".repeat(15 * 1024 * 1024)) // 15 MB upload + .timeout(Duration::from_secs(5)) + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let body = res.text().await.unwrap(); + assert_eq!(body.len(), 64 * 5); +} + +#[tokio::test] +async fn test_ws_server_ends_conn() { + 
init(); + let _ = *WS_ECHO; + + // server gracefully closes connection + + let mut req = "ws://127.0.0.1:6147".into_client_request().unwrap(); + req.headers_mut() + .insert("x-port", HeaderValue::from_static("9283")); + + let (mut ws_stream, _) = tokio_tungstenite::connect_async(req).await.unwrap(); + // gracefully close connection + ws_stream.send("test".into()).await.unwrap(); + ws_stream.next().await.unwrap().unwrap(); + ws_stream.send("graceful".into()).await.unwrap(); + let msg = ws_stream.next().await.unwrap().unwrap(); + // assert graceful close + assert!(matches!(msg, Message::Close(None))); + // test may hang here if downstream doesn't close when upstream does + assert!(ws_stream.next().await.is_none()); + + // server abruptly closes connection + + let mut req = "ws://127.0.0.1:6147".into_client_request().unwrap(); + req.headers_mut() + .insert("x-port", HeaderValue::from_static("9283")); + + let (mut ws_stream, _) = tokio_tungstenite::connect_async(req).await.unwrap(); + // abrupt close connection + ws_stream.send("close".into()).await.unwrap(); + // test will hang here if downstream doesn't close when upstream does + assert!(ws_stream.next().await.unwrap().is_err()); + + // client gracefully closes connection + + let mut req = "ws://127.0.0.1:6147".into_client_request().unwrap(); + req.headers_mut() + .insert("x-port", HeaderValue::from_static("9283")); + + let (mut ws_stream, _) = tokio_tungstenite::connect_async(req).await.unwrap(); + ws_stream.send("test".into()).await.unwrap(); + // sender initiates close + ws_stream.close(None).await.unwrap(); + let msg = ws_stream.next().await.unwrap().unwrap(); + // assert echo + assert_eq!("test", msg.into_text().unwrap()); + let msg = ws_stream.next().await.unwrap().unwrap(); + // assert graceful close + assert!(matches!(msg, Message::Close(None))); + assert!(ws_stream.next().await.is_none()); +} + +mod test_cache { + use super::*; + use tokio::time::{sleep, Duration}; + + #[tokio::test] + async fn test_basic_caching() { + init(); + let url = "http://127.0.0.1:6148/unique/test_basic_caching/now"; + + let res = reqwest::get(url).await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_miss_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + let res = reqwest::get(url).await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_hit_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + assert_eq!(cache_miss_epoch, cache_hit_epoch); + + sleep(Duration::from_millis(1100)).await; // ttl is 1 + + let res = reqwest::get(url).await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_expired_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "expired"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + assert!(cache_expired_epoch > cache_hit_epoch); + } + + #[tokio::test] + async fn test_purge() { + init(); + let res = reqwest::get("http://127.0.0.1:6148/unique/test_purge/test2") + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + let res = 
reqwest::get("http://127.0.0.1:6148/unique/test_purge/test2") + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + let res = reqwest::Client::builder() + .build() + .unwrap() + .request( + reqwest::Method::from_bytes(b"PURGE").unwrap(), + "http://127.0.0.1:6148/unique/test_purge/test2", + ) + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + assert_eq!(res.text().await.unwrap(), ""); + + let res = reqwest::Client::builder() + .build() + .unwrap() + .request( + reqwest::Method::from_bytes(b"PURGE").unwrap(), + "http://127.0.0.1:6148/unique/test_purge/test2", + ) + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::NOT_FOUND); + assert_eq!(res.text().await.unwrap(), ""); + + let res = reqwest::get("http://127.0.0.1:6148/unique/test_purge/test2") + .await + .unwrap(); + let headers = res.headers(); + assert_eq!(res.status(), StatusCode::OK); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + } + + #[tokio::test] + async fn test_cache_miss_convert() { + init(); + + // test if-* header is stripped + let client = reqwest::Client::new(); + let res = client + .get("http://127.0.0.1:6148/unique/test_cache_miss_convert/no_if_headers") + .header("if-modified-since", "Wed, 19 Jan 2022 18:39:12 GMT") + .send() + .await + .unwrap(); + // 200 because last-modified not returned from upstream + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "no if headers detected\n"); + + // test range header is stripped + let client = reqwest::Client::new(); + let res = client + .get("http://127.0.0.1:6148/unique/test_cache_miss_convert2/no_if_headers") + .header("Range", "bytes=0-1") + .send() + .await + .unwrap(); + // we have not implemented downstream range yet, it should be 206 once we have it + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "no if headers detected\n"); + } + + #[tokio::test] + async fn test_network_error_mid_response() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_network_error_mid_response.txt"; + + let res = reqwest::Client::new() + .get(url) + .header("x-set-sleep", "0") // no need to sleep + .header("x-set-body-sleep", "0.1") // pause the body a bit before abort + .header("x-abort-body", "true") // this will tell origin to kill the conn right away + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + // the connection dies + assert!(res.text().await.is_err()); + + let res = reqwest::Client::new() + .get(url) + .header("x-set-sleep", "0") // no need to sleep + .header("x-set-body-sleep", "0.1") // pause the body a bit before abort + .header("x-abort-body", "true") // this will tell origin to kill the conn right away + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + // the connection dies + 
assert!(res.text().await.is_err()); + } + + #[tokio::test] + async fn test_cache_upstream_revalidation() { + init(); + let url = "http://127.0.0.1:6148/unique/test_upstream_revalidation/revalidate_now"; + + let res = reqwest::get(url).await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_miss_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + let res = reqwest::get(url).await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_hit_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + assert_eq!(cache_miss_epoch, cache_hit_epoch); + + sleep(Duration::from_millis(1100)).await; // ttl is 1 + + let res = reqwest::get(url).await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_expired_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "revalidated"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + // still the old object + assert_eq!(cache_expired_epoch, cache_hit_epoch); + } + + #[tokio::test] + async fn test_cache_downstream_revalidation() { + init(); + let url = "http://127.0.0.1:6148/unique/test_downstream_revalidation/revalidate_now"; + let client = reqwest::Client::new(); + + // MISS + 304 + let res = client + .get(url) + .header("If-None-Match", "\"abcd\"") // the fixed etag of this endpoint + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::NOT_MODIFIED); + let headers = res.headers(); + let cache_miss_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), ""); // 304 no body + + // HIT + 304 + let res = client + .get(url) + .header("If-None-Match", "\"abcd\"") // the fixed etag of this endpoint + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::NOT_MODIFIED); + let headers = res.headers(); + let cache_hit_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), ""); // 304 no body + + assert_eq!(cache_miss_epoch, cache_hit_epoch); + + sleep(Duration::from_millis(1100)).await; // ttl is 1 + + // revalidated + 304 + let res = client + .get(url) + .header("If-None-Match", "\"abcd\"") // the fixed etag of this endpoint + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::NOT_MODIFIED); + let headers = res.headers(); + let cache_expired_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "revalidated"); + assert_eq!(res.text().await.unwrap(), ""); // 304 no body + + // still the old object + assert_eq!(cache_expired_epoch, cache_hit_epoch); + } + + #[tokio::test] + async fn test_cache_downstream_head() { + init(); + let url = "http://127.0.0.1:6148/unique/test_downstream_head/revalidate_now"; + let client = reqwest::Client::new(); + + // MISS + HEAD + let res = client.head(url).send().await.unwrap(); + + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_miss_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "miss"); + 
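+ // the HEAD miss still fills the cache; the later HEADs below hit and then revalidate the same object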
assert_eq!(res.text().await.unwrap(), ""); // HEAD no body + + // HIT + HEAD + let res = client.head(url).send().await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_hit_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), ""); // HEAD no body + + assert_eq!(cache_miss_epoch, cache_hit_epoch); + + sleep(Duration::from_millis(1100)).await; // ttl is 1 + + // revalidated + HEAD + let res = client.head(url).send().await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_expired_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "revalidated"); + assert_eq!(res.text().await.unwrap(), ""); // HEAD no body + + // still the old object + assert_eq!(cache_expired_epoch, cache_hit_epoch); + } + + #[tokio::test] + async fn test_purge_reject() { + init(); + + let res = reqwest::Client::builder() + .build() + .unwrap() + .request( + reqwest::Method::from_bytes(b"PURGE").unwrap(), + "http://127.0.0.1:6148/", + ) + .header("x-bypass-cache", "1") // not to cache this one + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::METHOD_NOT_ALLOWED); + assert_eq!(res.text().await.unwrap(), ""); + } + + #[tokio::test] + async fn test_1xx_caching() { + // 1xx shouldn't interfere with HTTP caching + + // set up a one-off mock server + // (warp / hyper don't have custom 1xx sending capabilities yet) + async fn mock_1xx_server(port: u16, cc_header: &str) { + use tokio::io::AsyncWriteExt; + + let listener = tokio::net::TcpListener::bind(format!("127.0.0.1:{}", port)) + .await + .unwrap(); + if let Ok((mut stream, _addr)) = listener.accept().await { + stream.write_all(b"HTTP/1.1 103 Early Hints\r\nLink: <https://foo.bar>; rel=preconnect\r\n\r\n").await.unwrap(); + // wait a bit so that the client can read + sleep(Duration::from_millis(100)).await; + stream.write_all(format!("HTTP/1.1 200 OK\r\nContent-Length: 5\r\nCache-Control: {}\r\n\r\nhello", cc_header).as_bytes()).await.unwrap(); + sleep(Duration::from_millis(100)).await; + } + } + + init(); + + let url = "http://127.0.0.1:6148/unique/test_1xx_caching"; + + tokio::spawn(async { + mock_1xx_server(6151, "max-age=5").await; + }); + // wait for server to start + sleep(Duration::from_millis(100)).await; + + let client = reqwest::Client::new(); + let res = client + .get(url) + .header("x-port", "6151") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello"); + + let res = client + .get(url) + .header("x-port", "6151") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello"); + + // 1xx shouldn't interfere with bypass + let url = "http://127.0.0.1:6148/unique/test_1xx_bypass"; + + tokio::spawn(async { + mock_1xx_server(6152, "private, no-store").await; + }); + // wait for server to start + sleep(Duration::from_millis(100)).await; + + let res = client + .get(url) + .header("x-port", "6152") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello"); + + // 
restart the one-off server - still uncacheable + sleep(Duration::from_millis(100)).await; + tokio::spawn(async { + mock_1xx_server(6152, "private, no-store").await; + }); + // wait for server to start + sleep(Duration::from_millis(100)).await; + + let res = client + .get(url) + .header("x-port", "6152") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello"); + } + + #[tokio::test] + async fn test_bypassed_became_cacheable() { + init(); + + let url = "http://127.0.0.1:6148/unique/test_bypassed/cache_control"; + + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "private, max-age=0") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cc = headers.get("Cache-Control").unwrap(); + assert_eq!(cc, "private, max-age=0"); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + // request should bypass cache, but became cacheable (cache fill) + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "public, max-age=10") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + // HIT + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "public, max-age=10") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + } + + #[tokio::test] + async fn test_bypassed_304() { + init(); + + let url = "http://127.0.0.1:6148/unique/test_bypassed_304/cache_control"; + + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "private, max-age=0") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cc = headers.get("Cache-Control").unwrap(); + assert_eq!(cc, "private, max-age=0"); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + // cacheable without private cache-control + // note this will be a 304 and not a 200, we will cache on _next_ request + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "public, max-age=10") + .header("set-revalidated", "1") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::NOT_MODIFIED); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "deferred"); + + // should be cache fill + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "public, max-age=10") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + // HIT + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "public, max-age=10") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + } + + #[tokio::test] + async fn test_bypassed_uncacheable_304() { + init(); + + let url = 
"http://127.0.0.1:6148/unique/test_bypassed_private_304/cache_control"; + + // cache fill + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "public, max-age=0") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cc = headers.get("Cache-Control").unwrap(); + assert_eq!(cc, "public, max-age=0"); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + // cache stale + // upstream returns 304, but response became uncacheable + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "private") + .header("set-revalidated", "1") + .send() + .await + .unwrap(); + // should see the response body because we didn't send conditional headers + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "revalidated"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + // we bypass cache for this next request + let res = reqwest::Client::new() + .get(url) + .header("set-cache-control", "public, max-age=10") + .header("set-revalidated", "1") // non-200 status to get bypass phase + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::NOT_MODIFIED); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "deferred"); + } + + #[tokio::test] + async fn test_eviction() { + init(); + let url = "http://127.0.0.1:6148/file_maker/test_eviction".to_owned(); + + // admit asset 1 + let res = reqwest::Client::new() + .get(url.clone() + "1") + .header("x-set-size", "3000") + .header("x-eviction", "1") // tell test proxy to use eviction manager + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap().len(), 3000); + + // admit asset 2 + let res = reqwest::Client::new() + .get(url.clone() + "2") + .header("x-set-size", "3000") + .header("x-eviction", "1") // tell test proxy to use eviction manager + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap().len(), 3000); + + // touch asset 2 + let res = reqwest::Client::new() + .get(url.clone() + "2") + .header("x-set-size", "3000") + .header("x-eviction", "1") // tell test proxy to use eviction manager + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap().len(), 3000); + + // touch asset 1 + let res = reqwest::Client::new() + .get(url.clone() + "1") + .header("x-set-size", "3000") + .header("x-eviction", "1") // tell test proxy to use eviction manager + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap().len(), 3000); + + // admit asset 3 + let res = reqwest::Client::new() + .get(url.clone() + "3") + .header("x-set-size", "6000") + .header("x-eviction", "1") // tell test proxy to use eviction manager + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap().len(), 6000); + + // check asset 2, it should be evicted already because admitting 
asset 3 made it full + let res = reqwest::Client::new() + .get(url.clone() + "2") + .header("x-set-size", "3000") + .header("x-eviction", "1") // tell test proxy to use eviction manager + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); // evicted + assert_eq!(res.text().await.unwrap().len(), 3000); + } + + #[tokio::test] + async fn test_cache_lock_miss_hit() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_cache_lock_miss_hit.txt"; + + // no lock, parallel fetches to a slow origin are all misses + tokio::spawn(async move { + let res = reqwest::Client::new().get(url).send().await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + + tokio::spawn(async move { + let res = reqwest::Client::new().get(url).send().await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + + tokio::spawn(async move { + let res = reqwest::Client::new().get(url).send().await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }) + .await + .unwrap(); // wait for at least one of them to finish + + let res = reqwest::Client::new().get(url).send().await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + // try with lock + let url = "http://127.0.0.1:6148/sleep/test_cache_lock_miss_hit2.txt"; + let task1 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + let task2 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + let lock_time_ms: u32 = headers["x-cache-lock-time-ms"] + .to_str() + .unwrap() + .parse() + .unwrap(); + assert!(lock_time_ms > 900 && lock_time_ms < 1000); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + let task3 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + let lock_time_ms: u32 = headers["x-cache-lock-time-ms"] + .to_str() + .unwrap() + .parse() + .unwrap(); + assert!(lock_time_ms > 900 && lock_time_ms < 1000); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + + task1.await.unwrap(); + task2.await.unwrap(); + task3.await.unwrap(); + } + + #[tokio::test] + async fn test_cache_lock_expired() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_cache_lock_expired.txt"; + + // cache one + let res = 
reqwest::Client::new() + .get(url) + .header("x-no-stale-revalidate", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + // let it stale + sleep(Duration::from_secs(1)).await; + + let task1 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .header("x-no-stale-revalidate", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "expired"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + let task2 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .header("x-no-stale-revalidate", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + let task3 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .header("x-no-stale-revalidate", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + + task1.await.unwrap(); + task2.await.unwrap(); + task3.await.unwrap(); + } + + #[tokio::test] + async fn test_cache_lock_network_error() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_cache_lock_network_error.txt"; + + // FIXME: Dangling lock happens in this test because the first request aborted without + // properly releasing the lock.
This is a bug + + let task1 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .header("x-set-sleep", "0.3") // sometimes we hit the retry logic which is x3 slow + .header("x-abort", "true") // this will tell origin to kill the conn right away + .send() + .await + .unwrap(); + assert_eq!(res.status(), 502); // error happened + }); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + + let task2 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let status = headers["x-cache-status"].to_owned(); + assert_eq!(res.text().await.unwrap(), "hello world"); + status + }); + let task3 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let status = headers["x-cache-status"].to_owned(); + assert_eq!(res.text().await.unwrap(), "hello world"); + status + }); + + task1.await.unwrap(); + let status2 = task2.await.unwrap(); + let status3 = task3.await.unwrap(); + + let mut count_miss = 0; + if status2 == "miss" { + count_miss += 1; + } + if status3 == "miss" { + count_miss += 1; + } + assert_eq!(count_miss, 1); + } + + #[tokio::test] + async fn test_cache_lock_uncacheable() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_cache_lock_uncacheable.txt"; + + let task1 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .header("x-no-store", "true") // tell origin to return CC: no-store + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + + let task2 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + let task3 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + + task1.await.unwrap(); + task2.await.unwrap(); + task3.await.unwrap(); + } + + #[tokio::test] + async fn test_cache_lock_timeout() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_cache_lock_timeout.txt"; + + let task1 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .header("x-set-sleep", "3") // we have a 2 second cache lock timeout + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + + let task2 = tokio::spawn(async move 
{ + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .header("x-set-sleep", "0.1") // tell origin to return faster + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + + // send the 3rd request after the 2 second cache lock timeout where the + // first request still holds the lock (3s delay in origin) + sleep(Duration::from_millis(2000)).await; + let task3 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .header("x-set-sleep", "0.1") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + + task1.await.unwrap(); + task2.await.unwrap(); + task3.await.unwrap(); + + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); // the first request cached it + assert_eq!(res.text().await.unwrap(), "hello world"); + } + + #[tokio::test] + async fn test_cache_serve_stale_network_error() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_cache_serve_stale_network_error.txt"; + + let res = reqwest::Client::new() + .get(url) + .header("x-set-sleep", "0") // no need to sleep we just reuse this endpoint + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + sleep(Duration::from_millis(1100)).await; // ttl is 1 + + let res = reqwest::Client::new() + .get(url) + .header("x-set-sleep", "0") // no need to sleep we just reuse this endpoint + .header("x-abort", "true") // this will tell origin to kill the conn right away + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "stale"); + assert_eq!(res.text().await.unwrap(), "hello world"); + } + + #[tokio::test] + async fn test_cache_serve_stale_network_error_mid_response() { + init(); + let url = + "http://127.0.0.1:6148/sleep/test_cache_serve_stale_network_error_mid_response.txt"; + + let res = reqwest::Client::new() + .get(url) + .header("x-set-sleep", "0") // no need to sleep we just reuse this endpoint + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + sleep(Duration::from_millis(1100)).await; // ttl is 1 + + let res = reqwest::Client::new() + .get(url) + .header("x-set-sleep", "0") // no need to sleep we just reuse this endpoint + .header("x-set-body-sleep", "0.1") // pause the body a bit before abort + .header("x-abort-body", "true") // this will tell origin to kill the conn right away + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "expired"); + // the connection dies + assert!(res.text().await.is_err()); + } + + #[tokio::test] + async fn test_cache_serve_stale_on_500() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_cache_serve_stale_on_500.txt"; + + let res = reqwest::Client::new() + .get(url) + 
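+ // prime the cache first; once the object goes stale, this copy is what should be served when the origin starts returning 500s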
.header("x-set-sleep", "0") // no need to sleep we just reuse this endpoint + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + sleep(Duration::from_millis(1100)).await; // ttl is 1 + + let res = reqwest::Client::new() + .get(url) + .header("x-set-sleep", "0") // no need to sleep we just reuse this endpoint + .header("x-error-header", "true") // this will tell origin to return 500 + .send() + .await + .unwrap(); + assert_eq!(res.status(), 200); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "stale"); + assert_eq!(res.text().await.unwrap(), "hello world"); + } + + #[tokio::test] + async fn test_stale_while_revalidate_many_readers() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_stale_while_revalidate_many_readers.txt"; + + // cache one + let res = reqwest::Client::new().get(url).send().await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + // let it stale + sleep(Duration::from_secs(1)).await; + + let task1 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "stale"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + let task2 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "stale"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + let task3 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "stale"); + assert_eq!(res.text().await.unwrap(), "hello world"); + }); + + task1.await.unwrap(); + task2.await.unwrap(); + task3.await.unwrap(); + } + + #[tokio::test] + async fn test_stale_while_revalidate_single_request() { + init(); + let url = "http://127.0.0.1:6148/sleep/test_stale_while_revalidate_single_request.txt"; + + // cache one + let res = reqwest::Client::new() + .get(url) + .header("x-set-sleep", "0") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + // let it stale + sleep(Duration::from_secs(1)).await; + + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .header("x-set-sleep", "0") // by default /sleep endpoint will sleep 1s + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "stale"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + // wait for the background request to finish + sleep(Duration::from_millis(100)).await; + + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + 
assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); // fresh + assert_eq!(res.text().await.unwrap(), "hello world"); + } + + #[tokio::test] + async fn test_cache_streaming_partial_body() { + init(); + let url = "http://127.0.0.1:6148/slow_body/test_cache_streaming_partial_body.txt"; + let task1 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world!"); + }); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + + let task2 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + let lock_time_ms: u32 = headers["x-cache-lock-time-ms"] + .to_str() + .unwrap() + .parse() + .unwrap(); + // the entire body needs 2 extra seconds to arrive; the lock time shows that + // only the header is under the cache lock while the body is streamed + assert!(lock_time_ms > 900 && lock_time_ms < 1000); + assert_eq!(res.text().await.unwrap(), "hello world!"); + }); + let task3 = tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + let lock_time_ms: u32 = headers["x-cache-lock-time-ms"] + .to_str() + .unwrap() + .parse() + .unwrap(); + // the entire body needs 2 extra seconds to arrive; the lock time shows that + // only the header is under the cache lock while the body is streamed + assert!(lock_time_ms > 900 && lock_time_ms < 1000); + assert_eq!(res.text().await.unwrap(), "hello world!"); + }); + + task1.await.unwrap(); + task2.await.unwrap(); + task3.await.unwrap(); + } + + #[tokio::test] + async fn test_range_request() { + init(); + let url = "http://127.0.0.1:6148/unique/test_range_request/now"; + + let res = reqwest::Client::new() + .get(url) + .header("Range", "bytes=0-1") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::PARTIAL_CONTENT); + let headers = res.headers(); + let cache_miss_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "he"); + + // full body is cached + let res = reqwest::get(url).await.unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_hit_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + assert_eq!(cache_miss_epoch, cache_hit_epoch); + + let res = reqwest::Client::new() + .get(url) + .header("Range", "bytes=0-1") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::PARTIAL_CONTENT); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "he"); + + let res = reqwest::Client::new() + .get(url) + .header("Range", "bytes=1-0") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::RANGE_NOT_SATISFIABLE); + let headers = 
res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), ""); + + let res = reqwest::Client::new() + .head(url) + .header("Range", "bytes=0-1") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::PARTIAL_CONTENT); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), ""); + + sleep(Duration::from_millis(1100)).await; // ttl is 1 + + let res = reqwest::Client::new() + .get(url) + .header("Range", "bytes=0-1") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::PARTIAL_CONTENT); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "expired"); + assert_eq!(res.text().await.unwrap(), "he"); + } + + #[tokio::test] + async fn test_caching_when_downstream_bails() { + init(); + let url = "http://127.0.0.1:6148/slow_body/test_caching_when_downstream_bails/"; + + tokio::spawn(async move { + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + // exit without res.text().await so that we bail early + }); + // sleep just a little to make sure the req above gets the cache lock + sleep(Duration::from_millis(50)).await; + + let res = reqwest::Client::new() + .get(url) + .header("x-lock", "true") + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + let lock_time_ms: u32 = headers["x-cache-lock-time-ms"] + .to_str() + .unwrap() + .parse() + .unwrap(); + // the entire body needs 2 extra seconds to arrive; the lock time shows that + // only the header is under the cache lock while the body is streamed + assert!(lock_time_ms > 900 && lock_time_ms < 1000); + assert_eq!(res.text().await.unwrap(), "hello world!"); + } + + async fn send_vary_req(url: &str, vary: &str) -> reqwest::Response { + reqwest::Client::new() + .get(url) + .header("x-vary-me", vary) + .send() + .await + .unwrap() + } + + #[tokio::test] + async fn test_vary_caching() { + init(); + let url = "http://127.0.0.1:6148/unique/test_vary_caching/now"; + + let res = send_vary_req(url, "a").await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_a_miss_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + let res = send_vary_req(url, "a").await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_hit_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + assert_eq!(cache_a_miss_epoch, cache_hit_epoch); + + let res = send_vary_req(url, "b").await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_b_miss_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + let res = send_vary_req(url, "b").await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + let cache_hit_epoch = headers["x-epoch"].to_str().unwrap().parse::<f64>().unwrap(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), 
"hello world"); + + assert_eq!(cache_b_miss_epoch, cache_hit_epoch); + assert!(cache_a_miss_epoch != cache_b_miss_epoch); + } + + #[tokio::test] + async fn test_vary_purge() { + init(); + let url = "http://127.0.0.1:6148/unique/test_vary_purge/now"; + + send_vary_req(url, "a").await; + let res = send_vary_req(url, "a").await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + + send_vary_req(url, "b").await; + let res = send_vary_req(url, "b").await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + + //both variances are cached + + let res = reqwest::Client::builder() + .build() + .unwrap() + .request(reqwest::Method::from_bytes(b"PURGE").unwrap(), url) + .send() + .await + .unwrap(); + assert_eq!(res.status(), StatusCode::OK); + assert_eq!(res.text().await.unwrap(), ""); + + //both should be miss + + let res = send_vary_req(url, "a").await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + + let res = send_vary_req(url, "b").await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + } + + async fn send_max_file_size_req(url: &str, max_file_size_bytes: usize) -> reqwest::Response { + reqwest::Client::new() + .get(url) + .header( + "x-cache-max-file-size-bytes", + max_file_size_bytes.to_string(), + ) + .send() + .await + .unwrap() + } + + #[tokio::test] + async fn test_cache_max_file_size() { + init(); + let url = "http://127.0.0.1:6148/unique/test_cache_max_file_size_100/now"; + + let res = send_max_file_size_req(url, 100).await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "miss"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + let res = send_max_file_size_req(url, 100).await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "hit"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + let url = "http://127.0.0.1:6148/unique/test_cache_max_file_size_1/now"; + let res = send_max_file_size_req(url, 1).await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello world"); + + let res = send_max_file_size_req(url, 1).await; + assert_eq!(res.status(), StatusCode::OK); + let headers = res.headers(); + assert_eq!(headers["x-cache-status"], "no-cache"); + assert_eq!(res.text().await.unwrap(), "hello world"); + } +} diff --git a/pingora-proxy/tests/utils/cert.rs b/pingora-proxy/tests/utils/cert.rs new file mode 100644 index 0000000..674a3ac --- /dev/null +++ b/pingora-proxy/tests/utils/cert.rs @@ -0,0 +1,47 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +use once_cell::sync::Lazy; +use pingora_core::tls::pkey::{PKey, Private}; +use pingora_core::tls::x509::X509; +use std::fs; + +pub static ROOT_CERT: Lazy<X509> = Lazy::new(|| load_cert("keys/root.crt")); +pub static ROOT_KEY: Lazy<PKey<Private>> = Lazy::new(|| load_key("keys/root.key")); +pub static INTERMEDIATE_CERT: Lazy<X509> = Lazy::new(|| load_cert("keys/intermediate.crt")); +pub static INTERMEDIATE_KEY: Lazy<PKey<Private>> = Lazy::new(|| load_key("keys/intermediate.key")); +pub static LEAF_CERT: Lazy<X509> = Lazy::new(|| load_cert("keys/leaf.crt")); +pub static LEAF2_CERT: Lazy<X509> = Lazy::new(|| load_cert("keys/leaf2.crt")); +pub static LEAF_KEY: Lazy<PKey<Private>> = Lazy::new(|| load_key("keys/leaf.key")); +pub static LEAF2_KEY: Lazy<PKey<Private>> = Lazy::new(|| load_key("keys/leaf2.key")); +pub static SERVER_CERT: Lazy<X509> = Lazy::new(|| load_cert("keys/server.crt")); +pub static SERVER_KEY: Lazy<PKey<Private>> = Lazy::new(|| load_key("keys/key.pem")); +pub static CURVE_521_TEST_KEY: Lazy<PKey<Private>> = + Lazy::new(|| load_key("keys/curve_test.521.key.pem")); +pub static CURVE_521_TEST_CERT: Lazy<X509> = Lazy::new(|| load_cert("keys/curve_test.521.crt")); +pub static CURVE_384_TEST_KEY: Lazy<PKey<Private>> = + Lazy::new(|| load_key("keys/curve_test.384.key.pem")); +pub static CURVE_384_TEST_CERT: Lazy<X509> = Lazy::new(|| load_cert("keys/curve_test.384.crt")); + +fn load_cert(path: &str) -> X509 { + let path = format!("{}/{path}", super::conf_dir()); + let cert_bytes = fs::read(path).unwrap(); + X509::from_pem(&cert_bytes).unwrap() +} + +fn load_key(path: &str) -> PKey<Private> { + let path = format!("{}/{path}", super::conf_dir()); + let key_bytes = fs::read(path).unwrap(); + PKey::private_key_from_pem(&key_bytes).unwrap() +} diff --git a/pingora-proxy/tests/utils/conf/keys/README.md b/pingora-proxy/tests/utils/conf/keys/README.md new file mode 100644 index 0000000..13965cd --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/README.md @@ -0,0 +1,18 @@ +Some test certificates. The CA is specified in your package directory (grep for ca_file). + +Some handy commands: +``` +# Describe a pkey +openssl [ec|rsa|...] 
-in key.pem -noout -text +# Describe a cert +openssl x509 -in some_cert.crt -noout -text + +# Generate a self-signed cert +openssl ecparam -genkey -name secp256r1 -noout -out test_key.pem +openssl req -new -x509 -key test_key.pem -out test.crt -days 3650 -sha256 -subj '/CN=openrusty.org' + +# Generate a cert signed by another cert +openssl ecparam -genkey -name secp256r1 -noout -out test_key.pem +openssl req -new -key test_key.pem -out test.csr +openssl x509 -req -in test.csr -CA server.crt -CAkey key.pem -CAcreateserial -CAserial test.srl -out test.crt -days 3650 -sha256 +```
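+ +To sanity-check that a signed cert actually chains back to its CA, one option is `openssl verify` against a CA bundle (for example, the test chain committed in this directory): +``` +openssl verify -CAfile ca_chain.cert leaf.crt +``` diff --git a/pingora-proxy/tests/utils/conf/keys/ca1.crt b/pingora-proxy/tests/utils/conf/keys/ca1.crt new file mode 100644 index 0000000..021c59d --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/ca1.crt @@ -0,0 +1,32 @@ +-----BEGIN CERTIFICATE----- +MIIFbTCCA1UCFDsRVhSk+Asz9Q9BwsvZucCbYA5/MA0GCSqGSIb3DQEBCwUAMHMx +CzAJBgNVBAYTAlVTMRMwEQYDVQQIDApDYWxpZm9ybmlhMRYwFAYDVQQHDA1TYW4g +RnJhbmNpc2NvMR4wHAYDVQQKDBVDbG91ZGZsYXJlIFRlc3QgU3VpdGUxFzAVBgNV +BAMMDnNlbGZzaWduZWQuY29tMB4XDTIwMDkxODIwMzk1MloXDTMwMDkxNjIwMzk1 +MlowczELMAkGA1UEBhMCVVMxEzARBgNVBAgMCkNhbGlmb3JuaWExFjAUBgNVBAcM +DVNhbiBGcmFuY2lzY28xHjAcBgNVBAoMFUNsb3VkZmxhcmUgVGVzdCBTdWl0ZTEX +MBUGA1UEAwwOc2VsZnNpZ25lZC5jb20wggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAw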
+lUNXph6P+1ayxbSvqycyP2+w9K+hipP4d0nqsZoDdtnxT3GxDNnlCIm1L5ggeDi6 +KFpAbHkng6+cDGe7vRMZYxi8sCh5zkEO3ZVVFnjeXJ3L/clNOX1hP8ITxDq4tZLM +jXWK+pIyGNTWCpl30WimkffyQlZIWY9B/FSOkKZKlRyoCOVDHhOXRzGG8Sb9JZhv +sOlzdWXsOw2UU4nZQWugjsYAnkMncUUzkQgj2EFWABmOV4mLZlixprEIwtUCAwEA +AQKCAgBP/nVX4dQnSH+rOsNk1fRcqZn6x9aw4TwfxkPizf8QjZma3scEkrYyJKwB +p7SE0WCRyyGY2jBlbIiIh97EqlNdE4LHap76B9MRMN8TPnuNuBkViKWGQDxlnkHp +/jzs6GJFMQOYWkHKr/04AWMs4mShYn/YnqjWzorPVhAknK3ujO04dPlZg2+wHj/3 +7qdvo+J/tgccfytAPUulN79Z7Ekw4HGf7ya4WtDXZ4Z7GT8SKP2VwAe1wpQapXcl +xESK8/S1UW5IK8tYiiaGYkhieo+NwWP0kSEzxHrWAy90E8UwNWjlKYxHSwFvn2oH +yhVPuxSfNhDO16B6rmbwwqTdUR+0pepF9IcgWuGO/AAMPlo6tKKqo7oW8xUqX0EW +vSCdISLlOITe2GBFv0q1xcUG9xZM5/Hde4NPU6OpghFcM/Okl3MoGqvqH4Fcd2Lm +HsjHxE6/8pDvxy8wGMeHEYTcDnKdTGPQgyEHHTZBsoHOzrM7CXGgpGIj9DPxrJO+ +VZFHqoILRbhiU3LTnyb5J8X8zyPv064LOoZOu2JoY99E2j1PtI4ym1fAzhd5ScU7 +X2CJTXAA57e0ezZCuPh/isgHmhx3bFHUvluWPKyspchLy/Pk28382jgnM+/vdbZh +wObGpeLpIEylxMmMROxZSDiDFhwG/rrp08vmhJRjgCb6XRAiZQKCAQEA1dnTbqve +aMioUOt70U6kuokJGpibmp5CXD+yGev4QZiVuoZvURfqqwRoMyJ2jaALAYRJpRMc +tbBvi3iq+uhB4FFiSCegK+F3qYPpgPvC2kHDrqgm4Fdjhz0/CfkkihzyVae5BHU9 +nm9xS39vmHKtPdM4Yt8n/vGXZy06pKRo2gxA4K5UswtJ3GGgKY+/dgRgXGS7eIaw +2b1uLvIZ8p2XGzMbjAtaTEykAQXMX7mPanpizT8LguvxCAFo2QyzCMJyuUii8pQS +H/ewKGVd3zZVN3KgWnGWoYpnRaY/eG6O60APV625yRgI0k4CZucWK8wuNU4TGpy7 +YCnJSX3q/nIh9wKCAQEA0GVwvHjqWTOCpeTw5+QJq/0dvTVmaJzUe+EfBYpgaqd3 +K+Lcj3TuNr+/v8sJ6ZhvflBxLI9Yk9kmfHMVsCrTXPr2mkmZMeucbM6Lot9tcYAN +FX+LKFIz9uDMXhMZfnycEurEdwlo1A1c4dpYEaOJ+OCmzglg7Bfxq7ol32MlVg8e +06VyjfFVR2fNzlRUFX/DZrI8mjgsVone/eJNGLYPUhXMZ905vfQFefP9DijTtecZ +AcPkhMMCXaldtuZ9WE9SRnV0HRpggDFdA+7AJnqp9umc3S1yv1YQvSFomAH+Aszs +LKuwS4VPwZWNiMHqRlQrZ6lKa+rMWSowHiJCgIpOkwKCAQEAyiSeLIX/tXK/T8ZY +gxBgvAae+Wn55Fzmg4aeFsysHW1bUzaScMg3xbJjwLo58EOxQ5zFdGmtgL0no2HL +1WLIKn8jdOsoB3KYBz+u8IKKvH7ftvAx12wjo4msVgQQmxEjrP3e8SzVszbKlEAA +v8zen4tSSHuCtgWuRRRG06yphDuC9B815wyro8sQd1ju9WLLp2p8n0BKWXgrd+rX +xjNay5Yy2t08XNUxTdoqRu4Dd/X6AOMwQXA/pX6XmlvbvFL52NSlWsHGpDsgY/71 +jfIw+Tm8A+JNLaPDXN36Lx/qrssd9ZY9AK5cYFbnBFg55+qYX0DO5B/1KsA1Cegh +wqUmHwKCAQBw/r/NAccXzM1HREa3hbcU0W7hm+XGTVsNPHiEmY5D5j/AxQaQpndP +qlK/HMloJqY1mEp1PdhqejDbA8+7sMzgOpeh+swc/ELZ4HhoPLtr8mGlyX1bxI62 +ixdk3vhQ1CIQQ8l5PdngOMqnD6v3DHSQRMdNKlqqSSVZ1toYMPsamaI+YhQmELgL +uqYl/SWGbrs1oOkpOdIYrjMB+EWTY4wVFwq5OoPHkluxz3Djz5FTrVWq1lu+/Ln4 +rQ/KT1mhm4jh+WeXLCks+RcVPcxkUNh9sBfE+ZKhWnpDAq1i1pmzTQe2BPXXTRZ8 +wal3gKWVsqfCUlGvCCX7JtvmSu9CITwPAoIBAEQO6PQh3nD/tJSFZxgtPVp7r3Px ++QEnE68Y0B0veq9g5SBovg4KADTcHbIbRymOBw+9skB65pxdoV3EFGmEXpMm5+5b +HC/DTXf2hEuKb49VO52NbbthiZg+xsnitEv4ZBfSVBRw+nL3Dx5c30M9wG/3OdGX +OWPYFoIJZDlyy3ynZtiGrjHgNqi/coHdsYVLfMkc+/hidApzhoApDkFGusVB6GHB +fTSeyuGfh39120LVnhFjDr+SpfyIXNJIiCwizLJtc1WliTtQzd/Fh1M62qO6ye4/ +3M24xoaVCDgzNrSibELkiLTmqEA4cZwtN5BqhfnQa+Prujd5ElmABZSqDz8= +-----END RSA PRIVATE KEY----- diff --git a/pingora-proxy/tests/utils/conf/keys/ca2.crt b/pingora-proxy/tests/utils/conf/keys/ca2.crt new file mode 100644 index 0000000..021c59d --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/ca2.crt @@ -0,0 +1,32 @@ +-----BEGIN CERTIFICATE----- +MIIFbTCCA1UCFDsRVhSk+Asz9Q9BwsvZucCbYA5/MA0GCSqGSIb3DQEBCwUAMHMx +CzAJBgNVBAYTAlVTMRMwEQYDVQQIDApDYWxpZm9ybmlhMRYwFAYDVQQHDA1TYW4g +RnJhbmNpc2NvMR4wHAYDVQQKDBVDbG91ZGZsYXJlIFRlc3QgU3VpdGUxFzAVBgNV +BAMMDnNlbGZzaWduZWQuY29tMB4XDTIwMDkxODIwMzk1MloXDTMwMDkxNjIwMzk1 +MlowczELMAkGA1UEBhMCVVMxEzARBgNVBAgMCkNhbGlmb3JuaWExFjAUBgNVBAcM +DVNhbiBGcmFuY2lzY28xHjAcBgNVBAoMFUNsb3VkZmxhcmUgVGVzdCBTdWl0ZTEX +MBUGA1UEAwwOc2VsZnNpZ25lZC5jb20wggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAw 
+ggIKAoICAQCuFbjnE8gTFMrcCXmiP4t1wrK0uW5JSvWpxZAfTHroka/o8wBcKa1c +7dXOGSEzKkTdsmrAkvi2KXMEAd08iwnY52xQ3vpaQDCiBhJhLUGaG2nJ5iH6A3CX +VfsoHccFTp3N4/iiCjxyxnUoQZW1fuun5A9cow6F8xNa7EPtPMJsK7nUYDW2PLj4 +881aphUM483gS/Ph5IpaZs6bRP0HyscdSC8hoIZxkOfIgp8a9BvgnaK8cPhoNGFl +HNu4hU+0cxjke/iz9iKRHtdcyuKnRMv8kt+acTpdgWl5E4nmvwXFloPeUuUAEgcc +qcp9Uai2dp9XKfxAGW2wEQPpZseDH7mZ7+NwqxJ2z4R55fdIn8jmALJdz+npvpRr +QHHc6k9jv0iYv9XwZOqT1crlzwcCo3x8A7oD+sJrat5oY1zBXjNzLpb9DKyVQ1em +Ho/7VrLFtK+rJzI/b7D0r6GKk/h3SeqxmgN22fFPcbEM2eUIibUvmCB4OLooWkBs +eSeDr5wMZ7u9ExljGLywKHnOQQ7dlVUWeN5cncv9yU05fWE/whPEOri1ksyNdYr6 +kjIY1NYKmXfRaKaR9/JCVkhZj0H8VI6QpkqVHKgI5UMeE5dHMYbxJv0lmG+w6XN1 +Zew7DZRTidlBa6COxgCeQydxRTORCCPYQVYAGY5XiYtmWLGmsQjC1QIDAQABMA0G +CSqGSIb3DQEBCwUAA4ICAQAgGv+gvw5X9ftkGu/FEFK15dLHlFZ25tKHJw3LhJEf +xlDOCFI/zR+h2PFdVzks14LLrf4sSkRfJVkk2Qe5uRhHLcgnPIkCkJpGlpFMx2+V +O6azhJlnLEYeVXuzNiQHC+9LJH8i3NK37O8Z1z2EGsAz9kR09OBEvgDjSXFxCN0J +KLAMe4wfAhjUUt9/0bm9u7FYWyj0D5dUVeAul9X3Vo1HfffNovq2cuUlL1AG5Ku+ +nPkxGckBo/Lc7jZQRcoZ2+mtvsfyMH5l9OW6JRrnC/Rf5P9bEjUcAskMh5WRdHSL +j98oCkosxg2ndTXke091lToqr7sZ1kiGA+Bj4cPlVXckQn3WU7GiUSSRqotZtn8g +EMT2iqHH3/iJOgtDe8XPWdBYNDeDFRVNpOtgCuYLXdz/Vli0Cecm3escbW/+GZ9P +vgZoNUej8/WTWHNy732N1cHvSbT3kLN6uONP4wNelh+UnfmiG10O54x7iaM3grt9 +YvQ1I1G60NCj1tF9KvrCYCK/wnXnTWhlNZ4y+XbILFqE+k8zqiNzGZV9a8FAzht2 +APsm2JzzZz6Ph6Zw8fVOS/LX7WgF/kNe5nIzVLqyFXtFxgomXaoxbADUTe16TVb3 +6sV8p7nlq2r7Dr0+uROm7ZEg1F23SiieDoRvw5fUbRhZCU93fv7Nt7hWlKP+UqJj +Zg== +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/ca_chain.cert b/pingora-proxy/tests/utils/conf/keys/ca_chain.cert new file mode 100644 index 0000000..2cb2caf --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/ca_chain.cert @@ -0,0 +1,60 @@ +-----BEGIN CERTIFICATE----- +MIIEjjCCAnagAwIBAgIUHIB/tqjZJaKIgeWwvXRt03C0yIMwDQYJKoZIhvcNAQEL +BQAwXzELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQHDA1TYW4gRnJh +bmNpc2NvMRAwDgYDVQQKDAdSb290IENBMRkwFwYDVQQDDBByb290LnBpbmdvcmEu +b3JnMB4XDTIyMTExMDE5MzI0M1oXDTI1MDgwNjE5MzI0M1owTjELMAkGA1UEBhMC +VVMxCzAJBgNVBAgMAkNBMRgwFgYDVQQKDA9JbnRlcm1lZGlhdGUgQ0ExGDAWBgNV +BAMMD2ludC5waW5nb3JhLm9yZzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC +ggEBAL4klMT1Bc4vYWN7zF+x7x34s3L51Sve3AVydGGtzej2hC3m4CictVfKfkC6 +jMNRo3mpUsnAJSbyRh91fec8nnOT8MEYnmm05Lbf5DG4RULrKSg52zge4SFTLO2n +2eCa4SYwRpj+MQmFrCQ++s9gJ/5weN95z23XAS1EL2GK50Z/fKQfRCo+aZTRB6dU +KK2cUwuDAHTkVSePVAX8KGcZu2Qm/jTBlcDIfn7OmTu2g/n5YSRJg3MWKeJlAbVo +VNxmaRYQOs2X7y4WwcSAfEncyVXRzqFxEfSDnq2A2+pp/sKoCjTgE6n94SzyqyFm +yJ8FmvV79qCDHSaeIhR5qQEIlO8CAwEAAaNTMFEwHQYDVR0OBBYEFP5ivTJr/S6Z +VpOI4+JykGPID8s3MB8GA1UdIwQYMBaAFJ5hR0odQYOtYsY3P18WIC2byI1oMA8G +A1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggIBAM337XP2Hm7LE3KLW+nn +khyj82ahj2k2+H/OdwGqzfqHCObTP+ydNJhOQVD+r255qQC9eAvd6ivF/h1tJOvv +Ed8vQMfLCO9VDFy6KCmlZQV6djRU1QXJIR/jf7TNqrFOcuKPGv5Vs6JwDdaHc0ae +ug7CGppnu5cxf/04sa7pWOdCFbhDRtfooo9fgGN2jcTFqfGyzocBwx7dgqEmZkae +yJAH0x4ldpKM9aO44h0Uy36c5RaWmdyFIh88QW62NoHamfwZoaVyycn82wcP4fFG +PRHm/AaDkYFGiQy22y7DD+MeZNUgCcAJpDYxfe87Cm4dw9NweMF6Jpo/8Ib1oLPq +E3miiFjWQwpMhxSQxpjqR92FPs9+/ktvYqbbMlyu/tju0rK17DXUi1zSIHoydPt0 +ymwWMxg7Jxpmg0x+eyWr5CP/ULM+F2Tk9W7x0B5DnpDJeCk+1ydUhII9AnTOCUWs +0VRlqTgFKahkHfiLBjPaLCgA0D3dz06EfEq5tmC8t0MDAqw9M4bDdow29K0aN6K8 +Gax7S5EK9aK09+HJ+7T5uxkUC+iIzfk53RhAfQiXdyKPpkbndRP67OiaAwk+hIXm +U1d1GsC854KYQs2GtHHvBcTGEADfU36TF/w2oJYQIrBjd23ZCf9jFK/WQ5GBFitT +ljoURxQQQy3LGjcH8W18JdRE +-----END CERTIFICATE----- +-----BEGIN CERTIFICATE----- +MIIFnzCCA4egAwIBAgIUE5kg5Z26V4swShJoSwfNVsJkHbYwDQYJKoZIhvcNAQEL 
+BQAwXzELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQHDA1TYW4gRnJh +bmNpc2NvMRAwDgYDVQQKDAdSb290IENBMRkwFwYDVQQDDBByb290LnBpbmdvcmEu +b3JnMB4XDTIyMTExMDE5MjY1MFoXDTQyMTExMDE5MjY1MFowXzELMAkGA1UEBhMC +VVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQHDA1TYW4gRnJhbmNpc2NvMRAwDgYDVQQK +DAdSb290IENBMRkwFwYDVQQDDBByb290LnBpbmdvcmEub3JnMIICIjANBgkqhkiG +9w0BAQEFAAOCAg8AMIICCgKCAgEA4s1XxwZruaRwuDX1IkM2oxdSdjg7FeUp8lsN +Uix4NdXz8IoQWRzCfFuRBKFHptahutSO6Bbewm9XmU2hHG7aoCqaZqEVQ/3KRLZ4 +mzaNBCzDNgPTmDkz/DZKzOVuyVvbmTOsLn53yxKnFP9MEDIEemqGiM80MmFfCm/o +0vLkjwkRpreMsWPUhrq3igTWRctUYMJAeDsEaaXB1k5ovWICrEylMzslgSNfoBed +NmBpurz+yQddKNMTb/SLYxa7B1uZKDRSIXwwOZPdBDyUdlStUPodNG/OzprN+bRC +oFRB9EFG1m5oPJXQIalePj0dwhXl/bkV4uRxCSZmBZK3fbtLMF+Wkg2voTrn51Yv +lKkzUQoEX6WWtUameZZbUB8TbW2lmANuvGBmvBbj3+4ztmtJPXfJBkckCeUC6bwC +4CKrgB587ElY357Vqv/HmRRC9kxdzpOS9s5CtcqJ3Dg1TmLajyRQkf8wMqk0fhh7 +V+VrPXB030MGABXh5+B2HOsF307vF030v7z+Xp5VRLGBqmDwK0Reo2h8cg9PkMDS +5Qc2zOJVslkJ+QYdkea1ajVpCsFbaC1JPmRWihTllboUqsk9oSS3jcIZ8vW3QKMg +ZbKtVbtVHr3mNGWuVs96iDN5Us3SJ6KGS8sanrAYAAB/NKd1Wl3I0aVtcb6eOONd +edf9+b0CAwEAAaNTMFEwHQYDVR0OBBYEFJ5hR0odQYOtYsY3P18WIC2byI1oMB8G +A1UdIwQYMBaAFJ5hR0odQYOtYsY3P18WIC2byI1oMA8GA1UdEwEB/wQFMAMBAf8w +DQYJKoZIhvcNAQELBQADggIBAIrpAsrPre3R4RY0JmnvomgH+tCSMHb6dW52YrEl +JkEG4cVc5MKs5QfPp8l2d1DngqiOUnOf0MWwWNDidHQZKrWs59j67L8qKN91VQKe +cSNEX3iMFvE59Hr0Ner6Kr09wZLHVVNGcy0FdhWpJdDUGDoQjfL7n7usJyCUqWSq +/pa1I9Is3ZfeQ5f7Ztrdz35vVPj+0BlHXbZM5AZi8Dwf3vXFBlPty3fITpE65cty +cYnbpGto+wDoZj9fkKImjK21QsJdmHwaWRgmXX3WbdFBAbScTjDOc5Mls2VY8rSh ++xLI1KMB0FHSJqrGoFN3uE+G1vJX/hgn98KZKob23yJr2TWr9LHI56sMfN5xdd5A +iOHxYODSrIAi1k+bSlDz6WfEtufoqwBwHiog4nFOXrlHpGO6eUB1QjaQJZwKn2zE +3BjqJOoqbuBMg5XZRjihHcVVuZdU39/zQDwqliNpx3km4FzOiEoBABGzLP+Qt0Ch +cJFS1Yc8ffv616yP4A9qkyogk9YBBvNbDLB7WV8h8p1s4JP3f5aDUlxtAD+E+3aJ +8mrb3P7/0A2QyxlgX4qQOdj++b7GzXDxxLgOimJ4pLo0fdY8KWMeHvZPiMryHkMx +3GSZCHeleSVBCPB2pPCzUqkkKADbjBX3SYJsAMF9uXQAR4U7wojjvAmbt6vJEh6j +TEUG +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/ca_chain.srl b/pingora-proxy/tests/utils/conf/keys/ca_chain.srl new file mode 100644 index 0000000..5c31864 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/ca_chain.srl @@ -0,0 +1 @@ +764CA822243398735D12CB8F1295AEDF38869BA7 diff --git a/pingora-proxy/tests/utils/conf/keys/cert_chain.crt b/pingora-proxy/tests/utils/conf/keys/cert_chain.crt new file mode 100644 index 0000000..6b19e08 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/cert_chain.crt @@ -0,0 +1,40 @@ +-----BEGIN CERTIFICATE----- +MIICtzCCAl2gAwIBAgIUC8kzFXZNRqjR158InTieHg1VrWowCgYIKoZIzj0EAwIw +gY4xCzAJBgNVBAYTAlVTMRMwEQYDVQQIEwpDYWxpZm9ybmlhMRYwFAYDVQQHEw1T +YW4gRnJhbmNpc2NvMRgwFgYDVQQKEw9IYXBweUNlcnQsIEluYy4xHzAdBgNVBAsT +FkhhcHB5Q2VydCBJbnRlcm1lZGlhdGUxFzAVBgNVBAMTDihkZXYgdXNlIG9ubHkp +MCAXDTE5MTIwOTE5NDgwMFoYDzIxMTkxMTE1MTk0ODAwWjCBgTELMAkGA1UEBhMC +VVMxEzARBgNVBAgTCkNhbGlmb3JuaWExFjAUBgNVBAcTDVNhbiBGcmFuY2lzY28x +GTAXBgNVBAoTEERFUiBpcyBGdW4sIEluYy4xETAPBgNVBAsTCEVuY29kaW5nMRcw +FQYDVQQDEw4oZGV2IHVzZSBvbmx5KTBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IA +BJSBMLYEVPgjmd2vWgMpN9LupZa56T7Ds1+wAlyMphLDN56PWuphsrNsEwiIIeNv +MtRTPRuoiBkfvMiWON6nkGWjgaEwgZ4wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQM +MAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHQYDVR0OBBYEFOYNuOCrYKnTFIEV +ck5845y/yZHkMB8GA1UdIwQYMBaAFFZRXwepqUwm9Kh+repV7LkBDnEHMCkGA1Ud +EQQiMCCCCWRlcmlzLmZ1boITd2VsbGtub3duLmRlcmlzLmZ1bjAKBggqhkjOPQQD +AgNIADBFAiEA9XAQ1Xi4Lav8LKzXZMSOHHj21ycqf3grnUfKJ6iwRvkCIDevfipo +qIuR/Dnt1bBoXxFKv0w/LpH/89jIohUQwVSc +-----END CERTIFICATE----- +-----BEGIN CERTIFICATE----- 
+MIIDwzCCAqugAwIBAgIJAN0mCzwZkgZKMA0GCSqGSIb3DQEBCwUAMHgxCzAJBgNV +BAYTAlVTMRMwEQYDVQQIDApDYWxpZm9ybmlhMRYwFAYDVQQHDA1TYW4gRnJhbmNp +c2NvMRgwFgYDVQQKDA9DbG91ZEZsYXJlLCBJbmMxDDAKBgNVBAsMA1ImRDEUMBIG +A1UEAwwLZXhhbXBsZS5jb20wHhcNMTYwNjMwMTY1NTM5WhcNMzYwNjI1MTY1NTM5 +WjB4MQswCQYDVQQGEwJVUzETMBEGA1UECAwKQ2FsaWZvcm5pYTEWMBQGA1UEBwwN +U2FuIEZyYW5jaXNjbzEYMBYGA1UECgwPQ2xvdWRGbGFyZSwgSW5jMQwwCgYDVQQL +DANSJkQxFDASBgNVBAMMC2V4YW1wbGUuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOC +AQ8AMIIBCgKCAQEA7y+v+9Eh2LjFoZbUetrJc+IVPb92PBNNY5AM+Nxukzj/9hth +tu7UPFnO+USrh+nFtR/rFfC6UwUqCtPaQ4EkSVJslR8f34GoOlc8zz7+dq9sGGu0 +hUPCLiptfBdIu73l0XqMd+xdGprl8hMdpH0CyKhAqTpv/00cmFobFwm1Fbf146hb +YAhyP6rIzDlrhvYFe3sFwAIjXQ0qyN+ffm/Ot1iFdYER24sl63XfwBPS97DwO70p +4jtbea8zlN58CFmTTK899J1f4MGbzvMyttdHG+WjhLNplB7fhtBdiHes2EdQws2S +TKbK5D/69OYXSVCwimcOnlklcJ1NpQJFFaWeKQIDAQABo1AwTjAdBgNVHQ4EFgQU +cu65A8EdrKWjFy9PZSRvSu8+4G0wHwYDVR0jBBgwFoAUcu65A8EdrKWjFy9PZSRv +Su8+4G0wDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAl3lAgKb+3NQ/ ++a+ooaML1Ndmh7h+UWl4bXx1TXwaLAi0iIujrAa3AM86vqKxFeCwZC9bAPEyGQrH +AF8JQbWAa2SckSDPSxM1ETV7EtJS4plaSfWxzX/m8jtd7D5RbzyE/qUH5JsXvCta +rKOMJPNvSfTuxQMX/Qyp0cHZUr/3ylUhdLWYsNwTAlQgx0OK8w+zWx6ESCM52Cz4 +Gqjpgcq6qylE2RoNmY0L+xb1B0YS+fslcjSXJZ/Z1j9mVrUM4wuekgcIxJfUrfhv +/957d4I04iMp6F/XgrrKUewCGiifcDi87nwoqHJwSIWG33LTb4e8mSe4Y83Fh8L2 +KWQDqcnYug== +-----END CERTIFICATE-----
\ No newline at end of file diff --git a/pingora-proxy/tests/utils/conf/keys/curve_test.384.crt b/pingora-proxy/tests/utils/conf/keys/curve_test.384.crt new file mode 100644 index 0000000..b44accd --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/curve_test.384.crt @@ -0,0 +1,11 @@ +-----BEGIN CERTIFICATE----- +MIIBnDCCASOgAwIBAgIJAJ8dDVMCYWE3MAoGCCqGSM49BAMDMBsxGTAXBgNVBAMM +EG9wZW5ydXN0eTM4NC5vcmcwHhcNMjMwNDA3MTY0NzEyWhcNMzMwNDA0MTY0NzEy +WjAbMRkwFwYDVQQDDBBvcGVucnVzdHkzODQub3JnMHYwEAYHKoZIzj0CAQYFK4EE +ACIDYgAENKtL8ciBDxA9G2auTbtbteNu8DI7gp0039+J6Z29laQpHLMw8MH7Wegx +HTv9RTXcf1sTCBloZh8qTvZTDh1yi7kjhZ2yLdVEVoakC5HBKvWzo1ewjSkOfBX7 +LF4p/8ULozMwMTAvBgNVHREEKDAmghIqLm9wZW5ydXN0eTM4NC5vcmeCEG9wZW5y +dXN0eTM4NC5vcmcwCgYIKoZIzj0EAwMDZwAwZAIwL8ad/dyrC62bFC7gGZkRzaTm +r2XlaMk6LB02IbVJgQytu+p50pnAgELVXISLP8LIAjBAjQ71pDbCjfg8Ts6iOnWH +p4R+Z2BjbTZu+Kmn1x8nyo2OJcchRYTRAKMS7YWstIk= +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/curve_test.384.key.pem b/pingora-proxy/tests/utils/conf/keys/curve_test.384.key.pem new file mode 100644 index 0000000..391da5b --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/curve_test.384.key.pem @@ -0,0 +1,6 @@ +-----BEGIN EC PRIVATE KEY----- +MIGkAgEBBDCWPID9PlALCL+dNPdlEBw2fP4cU56akYDeV08fpY+DkhaJicPxAilY +2T68Epv7nh6gBwYFK4EEACKhZANiAAQ0q0vxyIEPED0bZq5Nu1u1427wMjuCnTTf +34npnb2VpCkcszDwwftZ6DEdO/1FNdx/WxMIGWhmHypO9lMOHXKLuSOFnbIt1URW +hqQLkcEq9bOjV7CNKQ58FfssXin/xQs= +-----END EC PRIVATE KEY----- diff --git a/pingora-proxy/tests/utils/conf/keys/curve_test.521.crt b/pingora-proxy/tests/utils/conf/keys/curve_test.521.crt new file mode 100644 index 0000000..b4a24d4 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/curve_test.521.crt @@ -0,0 +1,13 @@ +-----BEGIN CERTIFICATE----- +MIIB5zCCAUmgAwIBAgIJALxqm9BrQU12MAoGCCqGSM49BAMEMBsxGTAXBgNVBAMM +EG9wZW5ydXN0eTUyMS5vcmcwHhcNMjMwNDA3MTY0NjU4WhcNMzMwNDA0MTY0NjU4 +WjAbMRkwFwYDVQQDDBBvcGVucnVzdHk1MjEub3JnMIGbMBAGByqGSM49AgEGBSuB +BAAjA4GGAAQA9LXDr66Cx/DZYnSacGu0FxlSx/e7xTm49g2QGU7TkO8TEyaOkErl +IaqJE7YxQp+CUMfelVVkUJmVlJ4Fkrl3nR4A3YLDjEYihXnuLZajbwkjC7vzKO8A +O2ln8R5JSzClUoTu7s2nok7tw/6dP4i08YPk4Pkxm5NHIok0uFmoaJpdkq6jMzAx +MC8GA1UdEQQoMCaCEioub3BlbnJ1c3R5NTIxLm9yZ4IQb3BlbnJ1c3R5NTIxLm9y +ZzAKBggqhkjOPQQDBAOBiwAwgYcCQgCdVxTjVAPCIouh1HH4haJDpS1/g30jcTj6 +FGvyxofIX4Q6fO3Ig8DlJa+SrDq2f75/f8RSC71NB6peNjP8IARCOAJBKEMcXjK5 +btvZxg+puzyxuMNRtUUk/Re/pzzLJbi7o6MWVNgLQJ3d9kUVHzbQEXNiUe82vbYK +uairSMDS6Dl1j/A= +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/curve_test.521.key.pem b/pingora-proxy/tests/utils/conf/keys/curve_test.521.key.pem new file mode 100644 index 0000000..739b862 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/curve_test.521.key.pem @@ -0,0 +1,7 @@ +-----BEGIN EC PRIVATE KEY----- +MIHbAgEBBEFiMUgbEqjcf3K4Ba+CFUv20+ryJq9REjWUkoi9AgkpGuEAqLQza3CM +kSGSiPdm9gWmpeLlCExPVJRbcTmAhoZUcKAHBgUrgQQAI6GBiQOBhgAEAPS1w6+u +gsfw2WJ0mnBrtBcZUsf3u8U5uPYNkBlO05DvExMmjpBK5SGqiRO2MUKfglDH3pVV +ZFCZlZSeBZK5d50eAN2Cw4xGIoV57i2Wo28JIwu78yjvADtpZ/EeSUswpVKE7u7N +p6JO7cP+nT+ItPGD5OD5MZuTRyKJNLhZqGiaXZKu +-----END EC PRIVATE KEY----- diff --git a/pingora-proxy/tests/utils/conf/keys/ex1.crt b/pingora-proxy/tests/utils/conf/keys/ex1.crt new file mode 100644 index 0000000..2846e9f --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/ex1.crt @@ -0,0 +1,17 @@ +-----BEGIN CERTIFICATE----- +MIICtzCCAl2gAwIBAgIUC8kzFXZNRqjR158InTieHg1VrWowCgYIKoZIzj0EAwIw +gY4xCzAJBgNVBAYTAlVTMRMwEQYDVQQIEwpDYWxpZm9ybmlhMRYwFAYDVQQHEw1T 
+YW4gRnJhbmNpc2NvMRgwFgYDVQQKEw9IYXBweUNlcnQsIEluYy4xHzAdBgNVBAsT +FkhhcHB5Q2VydCBJbnRlcm1lZGlhdGUxFzAVBgNVBAMTDihkZXYgdXNlIG9ubHkp +MCAXDTE5MTIwOTE5NDgwMFoYDzIxMTkxMTE1MTk0ODAwWjCBgTELMAkGA1UEBhMC +VVMxEzARBgNVBAgTCkNhbGlmb3JuaWExFjAUBgNVBAcTDVNhbiBGcmFuY2lzY28x +GTAXBgNVBAoTEERFUiBpcyBGdW4sIEluYy4xETAPBgNVBAsTCEVuY29kaW5nMRcw +FQYDVQQDEw4oZGV2IHVzZSBvbmx5KTBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IA +BJSBMLYEVPgjmd2vWgMpN9LupZa56T7Ds1+wAlyMphLDN56PWuphsrNsEwiIIeNv +MtRTPRuoiBkfvMiWON6nkGWjgaEwgZ4wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQM +MAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHQYDVR0OBBYEFOYNuOCrYKnTFIEV +ck5845y/yZHkMB8GA1UdIwQYMBaAFFZRXwepqUwm9Kh+repV7LkBDnEHMCkGA1Ud +EQQiMCCCCWRlcmlzLmZ1boITd2VsbGtub3duLmRlcmlzLmZ1bjAKBggqhkjOPQQD +AgNIADBFAiEA9XAQ1Xi4Lav8LKzXZMSOHHj21ycqf3grnUfKJ6iwRvkCIDevfipo +qIuR/Dnt1bBoXxFKv0w/LpH/89jIohUQwVSc +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/ex1.key.b64 b/pingora-proxy/tests/utils/conf/keys/ex1.key.b64 new file mode 100644 index 0000000..a3d18c5 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/ex1.key.b64 @@ -0,0 +1 @@ +AAEAAJIx7XoAAAABAAAAIAAAAAAAAAACAAH//wAAABAAAAAAW66OYKnvlI3LQETZc85HajCUyhsAAAAAAAAAAAAAAAD+EOoVAAAAAQAAAKAAAAAAAAAAAv////8AAACQAAAAB018vkpfL1Bmrc2c9A5NcT3M3EdG+ZQfTZGN4BHUIpzOXK85cESryj5aFHIOh37fuRZlcCO8i9G44x+xNE45M9nw7tI2D4Sf1zraq9titAqMj3I+I3CZW2LX61CHyMYlfdxG/F7OR7dz1kbUcJeP73l+v65cPIEwek6gzvTZOIz2W8AnFdc0jW3iZFcgAhPmJzkBs4EAAAABAAAAMAAAAAAAAAACAAAAAAAAACAAAAAM0IInmYQDB4EBkHw182qCs6LncTgAAAAAAAAAAAAAAAA=
\ No newline at end of file diff --git a/pingora-proxy/tests/utils/conf/keys/intermediate.crt b/pingora-proxy/tests/utils/conf/keys/intermediate.crt new file mode 100644 index 0000000..7d34a41 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/intermediate.crt @@ -0,0 +1,27 @@ +-----BEGIN CERTIFICATE----- +MIIEjjCCAnagAwIBAgIUHIB/tqjZJaKIgeWwvXRt03C0yIMwDQYJKoZIhvcNAQEL +BQAwXzELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQHDA1TYW4gRnJh +bmNpc2NvMRAwDgYDVQQKDAdSb290IENBMRkwFwYDVQQDDBByb290LnBpbmdvcmEu +b3JnMB4XDTIyMTExMDE5MzI0M1oXDTI1MDgwNjE5MzI0M1owTjELMAkGA1UEBhMC +VVMxCzAJBgNVBAgMAkNBMRgwFgYDVQQKDA9JbnRlcm1lZGlhdGUgQ0ExGDAWBgNV +BAMMD2ludC5waW5nb3JhLm9yZzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoC +ggEBAL4klMT1Bc4vYWN7zF+x7x34s3L51Sve3AVydGGtzej2hC3m4CictVfKfkC6 +jMNRo3mpUsnAJSbyRh91fec8nnOT8MEYnmm05Lbf5DG4RULrKSg52zge4SFTLO2n +2eCa4SYwRpj+MQmFrCQ++s9gJ/5weN95z23XAS1EL2GK50Z/fKQfRCo+aZTRB6dU +KK2cUwuDAHTkVSePVAX8KGcZu2Qm/jTBlcDIfn7OmTu2g/n5YSRJg3MWKeJlAbVo +VNxmaRYQOs2X7y4WwcSAfEncyVXRzqFxEfSDnq2A2+pp/sKoCjTgE6n94SzyqyFm +yJ8FmvV79qCDHSaeIhR5qQEIlO8CAwEAAaNTMFEwHQYDVR0OBBYEFP5ivTJr/S6Z +VpOI4+JykGPID8s3MB8GA1UdIwQYMBaAFJ5hR0odQYOtYsY3P18WIC2byI1oMA8G +A1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggIBAM337XP2Hm7LE3KLW+nn +khyj82ahj2k2+H/OdwGqzfqHCObTP+ydNJhOQVD+r255qQC9eAvd6ivF/h1tJOvv +Ed8vQMfLCO9VDFy6KCmlZQV6djRU1QXJIR/jf7TNqrFOcuKPGv5Vs6JwDdaHc0ae +ug7CGppnu5cxf/04sa7pWOdCFbhDRtfooo9fgGN2jcTFqfGyzocBwx7dgqEmZkae +yJAH0x4ldpKM9aO44h0Uy36c5RaWmdyFIh88QW62NoHamfwZoaVyycn82wcP4fFG +PRHm/AaDkYFGiQy22y7DD+MeZNUgCcAJpDYxfe87Cm4dw9NweMF6Jpo/8Ib1oLPq +E3miiFjWQwpMhxSQxpjqR92FPs9+/ktvYqbbMlyu/tju0rK17DXUi1zSIHoydPt0 +ymwWMxg7Jxpmg0x+eyWr5CP/ULM+F2Tk9W7x0B5DnpDJeCk+1ydUhII9AnTOCUWs +0VRlqTgFKahkHfiLBjPaLCgA0D3dz06EfEq5tmC8t0MDAqw9M4bDdow29K0aN6K8 +Gax7S5EK9aK09+HJ+7T5uxkUC+iIzfk53RhAfQiXdyKPpkbndRP67OiaAwk+hIXm +U1d1GsC854KYQs2GtHHvBcTGEADfU36TF/w2oJYQIrBjd23ZCf9jFK/WQ5GBFitT +ljoURxQQQy3LGjcH8W18JdRE +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/intermediate.csr b/pingora-proxy/tests/utils/conf/keys/intermediate.csr new file mode 100644 index 0000000..8035e28 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/intermediate.csr @@ -0,0 +1,16 @@ +-----BEGIN CERTIFICATE REQUEST----- +MIICkzCCAXsCAQAwTjELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRgwFgYDVQQK +DA9JbnRlcm1lZGlhdGUgQ0ExGDAWBgNVBAMMD2ludC5waW5nb3JhLm9yZzCCASIw +DQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAL4klMT1Bc4vYWN7zF+x7x34s3L5 +1Sve3AVydGGtzej2hC3m4CictVfKfkC6jMNRo3mpUsnAJSbyRh91fec8nnOT8MEY +nmm05Lbf5DG4RULrKSg52zge4SFTLO2n2eCa4SYwRpj+MQmFrCQ++s9gJ/5weN95 +z23XAS1EL2GK50Z/fKQfRCo+aZTRB6dUKK2cUwuDAHTkVSePVAX8KGcZu2Qm/jTB +lcDIfn7OmTu2g/n5YSRJg3MWKeJlAbVoVNxmaRYQOs2X7y4WwcSAfEncyVXRzqFx +EfSDnq2A2+pp/sKoCjTgE6n94SzyqyFmyJ8FmvV79qCDHSaeIhR5qQEIlO8CAwEA +AaAAMA0GCSqGSIb3DQEBCwUAA4IBAQAb7MY4eggHzheSS0wA2CtY1Q1YCU44XIjU +CuR0ht02jEZ5sXAvIvtrBfQdTZ3pWWbUwxfFmGcLUqS2aafQsVR5EHDl1YjAxy0Z +htIecE8Rb89p/O44pVdivLYPj4SvHUk0Hq7hkgyPL55Va0thMLSjjCyRouK4dX2d +gKnGfdFg5vkOjY7HWVJuWHioQTb5gjvF3dOTWbmJl3m5so5QJxjOjfA+iKvH+kOb +4/OnwmDYxJ/d+4sReiFeiOXWlwuubYpzAbi1TxsDcQGlp8PO6JjFA9PJqZuYZqhL +KzBwFqxujS12v6ccyqg2zLLveH9M9HgliAqDlSxsGvhyu19RVwtG +-----END CERTIFICATE REQUEST----- diff --git a/pingora-proxy/tests/utils/conf/keys/intermediate.key b/pingora-proxy/tests/utils/conf/keys/intermediate.key new file mode 100644 index 0000000..31ac627 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/intermediate.key @@ -0,0 +1,28 @@ +-----BEGIN PRIVATE KEY----- +MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQC+JJTE9QXOL2Fj 
+e8xfse8d+LNy+dUr3twFcnRhrc3o9oQt5uAonLVXyn5AuozDUaN5qVLJwCUm8kYf +dX3nPJ5zk/DBGJ5ptOS23+QxuEVC6ykoOds4HuEhUyztp9ngmuEmMEaY/jEJhawk +PvrPYCf+cHjfec9t1wEtRC9hiudGf3ykH0QqPmmU0QenVCitnFMLgwB05FUnj1QF +/ChnGbtkJv40wZXAyH5+zpk7toP5+WEkSYNzFiniZQG1aFTcZmkWEDrNl+8uFsHE +gHxJ3MlV0c6hcRH0g56tgNvqaf7CqAo04BOp/eEs8qshZsifBZr1e/aggx0mniIU +eakBCJTvAgMBAAECggEAHukXfkVO4kv1ixSvDseAVeD+WyyeKPmbzw7iOJbmqH6a +0lN8EV4YZOM4TxGEnKQC7V5HZSDlaUVtfOO+yf6iy6s7MkjsR8buf4Q6NpL8P3q3 +QCDXsHHkq2Q4I5Jr6wWCoJCsiWaZVjDy4RmT8G5zUfu6yqmkvPh86nzxLuxD2MPM +nu3eR7hbiCisk+Of6oIknRU8vhsnjOOkEZ5PyDAcQbxwbfXuARHayxNOsyM2bMKM +LvS01JRQHbDEm/vpjvMmoAuGk1EKyTnGxMmooEYKDYHTJQYYEZTRwJtD/YZD90aY +D7tg1YMAn80z6l2f1Mg1PN8X9yIhUhHL4uZfhGa1kQKBgQD5A8R3uwlTBCivYYM3 +WdICFJVhsKIS1M8mJgepNmZtggkaeGHrGiOWfXX8kr5Jqkw4QwvJyz1QFMVvathM +WbuY+1TCzvN7820n0luWFe8iGHqyuXBksroeW+d8p+qD6B1Y1YOtdb2/gmeYrWIH +pwTlalTsE0gVfh363/ow5GOHrQKBgQDDegqcPjjOd5EdlN19ga7x4NNWT/aT07UO +FC8Xwml6VvmN8cGn4z9dbzPB8Q6uUlgfwsyzO7DyXDxHQZA8iRb/5HjlEGJ3HYbC +lq5nkvgfM3GGVQ/7EbVt7TtDFE8ZQsObvS6KCIXDT5NL8JfypmajQYVnupgbBTlR +bjVOnrrSiwKBgC9KNd9/F7A6U/eqjx7N4gIfIpdg0ga9f3GBO2c5O46EaXIrdn0N +g8CqpuOGgri+rKbqpKx3+nbg2vXj1pv5VpUg9eHhJ4BcpFgxrM7972IMQBD9Aok9 +H/dwALA9u129kQUz10Pz3ksmWsI1+y303AstfF8w8jmSr+La8kqitPwpAoGBAJxV +uLKo2MnXuomMC3BbDU2JX7xCC5TC1qTB47/+zlj3wnKRjS32gzD4xM4xOmqUlMIi +C5C1Bpluxw6+Ets3UNurID0i030sciCiXi2bzzE09XBYC4Xi7dVSy/ij/3bWfJbL +wLLIiiJgPA+aBgwcpS2gM094XjoN/X9wwtV0ATptAoGBAIZ7jEYoz4G4wkM7qyeW +bA87hAlTcBY4kB9FueUHsPw3VYXQUVu6MB5YzgirRFMLTcGN+ltuZ9bRYxJSFiCi +R6eaBkMIBO6vrp5UDjrIuSAW6IpIFsgaVqFOZxkk4mkISef8d3b4n5JMpKNGrbFk +tlasWnqSVlC53w9u6yL+s07f +-----END PRIVATE KEY----- diff --git a/pingora-proxy/tests/utils/conf/keys/intermediate.srl b/pingora-proxy/tests/utils/conf/keys/intermediate.srl new file mode 100644 index 0000000..8eb8f1d --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/intermediate.srl @@ -0,0 +1 @@ +199D7F7B72FA2892E58A80EC205EE63A20543BE0 diff --git a/pingora-proxy/tests/utils/conf/keys/key.pem b/pingora-proxy/tests/utils/conf/keys/key.pem new file mode 100644 index 0000000..0fe68f2 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/key.pem @@ -0,0 +1,5 @@ +-----BEGIN EC PRIVATE KEY----- +MHcCAQEEIN5lAOvtlKwtc/LR8/U77dohJmZS30OuezU9gL6vmm6DoAoGCCqGSM49 +AwEHoUQDQgAE2f/1Fm1HjySdokPq2T0F1xxol9nSEYQ+foFINeaWYk+FxMGpriJT +Bb8AGka87cWklw1ZqytfaT6pkureDbTkwg== +-----END EC PRIVATE KEY----- diff --git a/pingora-proxy/tests/utils/conf/keys/leaf.crt b/pingora-proxy/tests/utils/conf/keys/leaf.crt new file mode 100644 index 0000000..8ebd98c --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/leaf.crt @@ -0,0 +1,20 @@ +-----BEGIN CERTIFICATE----- +MIIDQDCCAigCFHZMqCIkM5hzXRLLjxKVrt84hpunMA0GCSqGSIb3DQEBCwUAME4x +CzAJBgNVBAYTAlVTMQswCQYDVQQIDAJDQTEYMBYGA1UECgwPSW50ZXJtZWRpYXRl +IENBMRgwFgYDVQQDDA9pbnQucGluZ29yYS5vcmcwHhcNMjIxMTEwMTg1NzE0WhcN +MzIxMTA3MTg1NzE0WjBrMQswCQYDVQQGEwJVUzELMAkGA1UECAwCQ0ExFjAUBgNV +BAcMDVNhbiBGcmFuY2lzY28xITAfBgNVBAoMGEludGVybmV0IFdpZGdpdHMgUHR5 +IEx0ZDEUMBIGA1UEAwwLcGluZ29yYS5vcmcwggEiMA0GCSqGSIb3DQEBAQUAA4IB +DwAwggEKAoIBAQCTvo3hkSRrrJfrfZ1LiujaffSuErWbkiHkqOqAMofsqmkt+S4K +BAbwcJN8g/HN7Jxr43lFo7kZeFQZ6utg6uywe4yBxppqAt4r/Th1tUBJ982Vcs9K +3sMyjWO9UgSyoQdRjjXKlUYI316SBPYgFiac1M2UocPycEavxIlYrpS7d1i1PCSj +ByMiBbalSxrwEv97FOlSW0f0COiLoV36SXuq8jNyrFzk4zZXCYz5WjgZSkm/iFJL +abbX5nTmrzLnfm7BSbpnRMdQtYUqYubR+rlBuiGZsDM9FRsT+H6uOQwgIKqGz6I+ +diBK3oIHeD4F5Lma6Evt66AGwrwDkNhSyQV1AgMBAAEwDQYJKoZIhvcNAQELBQAD +ggEBADn5HmEwQUn/Tbb+Lqh6Zp2K/RrOH7lEz4IE1N90mRPF2Aa8oOwE7dwWfsUr 
+dJqzkrARiiYMy1wL6P8xhBsStLJPf0RM9uIpfxIaq7fF5RhJPuc3rVfkDsnZeo+Q +zdXtBal8BlfGjLvZgZzIei6IlGZ/j8yHDcEVP8IpQoSLtrQpSWe4CwGoSXfx/JqA +SD2ZS46mEVQIaQ4QEZecVLEQQTeEYMX50HkD+ea9GsuSQF5cOfY/lrHuFa0tW0SX +zYWtq9XTwEc+nPPLL0UMQWFWlsMb7pS2vtQS93wm00G6rpFHVEyq1ePbmDxRsjV4 +cgEH6QwqLWOmGHx4xpw2ZESwnUY= +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/leaf.csr b/pingora-proxy/tests/utils/conf/keys/leaf.csr new file mode 100644 index 0000000..7c3a206 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/leaf.csr @@ -0,0 +1,17 @@ +-----BEGIN CERTIFICATE REQUEST----- +MIICsDCCAZgCAQAwazELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQH +DA1TYW4gRnJhbmNpc2NvMSEwHwYDVQQKDBhJbnRlcm5ldCBXaWRnaXRzIFB0eSBM +dGQxFDASBgNVBAMMC3BpbmdvcmEub3JnMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A +MIIBCgKCAQEAk76N4ZEka6yX632dS4ro2n30rhK1m5Ih5KjqgDKH7KppLfkuCgQG +8HCTfIPxzeyca+N5RaO5GXhUGerrYOrssHuMgcaaagLeK/04dbVASffNlXLPSt7D +Mo1jvVIEsqEHUY41ypVGCN9ekgT2IBYmnNTNlKHD8nBGr8SJWK6Uu3dYtTwkowcj +IgW2pUsa8BL/exTpUltH9Ajoi6Fd+kl7qvIzcqxc5OM2VwmM+Vo4GUpJv4hSS2m2 +1+Z05q8y535uwUm6Z0THULWFKmLm0fq5QbohmbAzPRUbE/h+rjkMICCqhs+iPnYg +St6CB3g+BeS5muhL7eugBsK8A5DYUskFdQIDAQABoAAwDQYJKoZIhvcNAQELBQAD +ggEBABu/Xes3RKEGof0N6jGbP9UgPeTm5ljIfqY1/xJT1uQTKNti7qn8OCzEBFGf +IvlXnqN3bjSK7wExan9hNZqJO2R2ye+Jliil39LsUellU9BL3TWayKFgu7h4eoCs +J1Yty2qibhxowzld3qhBIxJO1Qf2MAxM/O4KmBmiKLbPLRodRckGH+22JYqM3NNt +WXDyncHOBD1DpoMHfHgvdmdPPXuBoDTNbS9Wtyf/EmXck5Uj1rinBzlIJZJZxtkN +qlOW9HcBDBUeIJq4qt92niSia/9kfndrcnOSonylQHF0ALw6S6MHBVkok3Vro0Gc +CxMlJO8IQSxn11Xeg4WSCtTLipk= +-----END CERTIFICATE REQUEST----- diff --git a/pingora-proxy/tests/utils/conf/keys/leaf.key b/pingora-proxy/tests/utils/conf/keys/leaf.key new file mode 100644 index 0000000..d58e9f3 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/leaf.key @@ -0,0 +1,28 @@ +-----BEGIN PRIVATE KEY----- +MIIEuwIBADANBgkqhkiG9w0BAQEFAASCBKUwggShAgEAAoIBAQCTvo3hkSRrrJfr +fZ1LiujaffSuErWbkiHkqOqAMofsqmkt+S4KBAbwcJN8g/HN7Jxr43lFo7kZeFQZ +6utg6uywe4yBxppqAt4r/Th1tUBJ982Vcs9K3sMyjWO9UgSyoQdRjjXKlUYI316S +BPYgFiac1M2UocPycEavxIlYrpS7d1i1PCSjByMiBbalSxrwEv97FOlSW0f0COiL +oV36SXuq8jNyrFzk4zZXCYz5WjgZSkm/iFJLabbX5nTmrzLnfm7BSbpnRMdQtYUq +YubR+rlBuiGZsDM9FRsT+H6uOQwgIKqGz6I+diBK3oIHeD4F5Lma6Evt66AGwrwD +kNhSyQV1AgMBAAECgf9K7dlcYhVBMRyF0gR0INRMpenxs+C8MDYAQaqsWZ7rPYHE ++cWKTtXgxeIGxDleC8yeQD9rkh0jTbiuwaBJBtwDT/rH1nF5p6Va/zwjIPP55N3e +wttenUYMh/3i24sxDM8pYsuPx8+SWwvGAmjQ3RW4Ht95rJDejmf1vIyWQp7Wc8C/ +i5/8YIznfP8TVBA5aggUKJ+aqsHlpBxuZ2+Lnv+Dn67cAiBR1uB7Pvtu25MhrocW +z8ZCgUhwcimmi0+ZoxJSd8wVA7bPJqg4gBeELCbWiR4Bnn2vLu55/UJ4jtoVRU3l +vzlrRZTBRVElocojX7u+0sdovWtD0SIEzN+7sWcCgYEAxXsy5w+HSefIBrtsIod4 +GtpXfGusgUOqR9c57Tq8boOkT+5vkTAovk+BRXQGWcaU3a2wEqbPQ6UQhWOoBiTu ++eiwhv7dUiczDte1xABkyqHYeaE+VMiGh+8orR/AxQWHtYQW01Fa1CSayGW9GEIY +94BhmmvQuMUklM9gYDDO4RsCgYEAv4ZYsIimjhMO47X3A86GA4hgBoI1IO9WZWjK +SmYNdWPcH7A5iamfdmk4frSjDVZ5oHHKtx9JhgpKucFjRCLzWVJw34JohExICHxu +FKeoVSKiUcmke5kE+wDRpz38vCUzThDAdI4kK5ALt35XCp0GLahhgral7C0bO5L3 +K9NNrK8CgYEAhsvHPQzObdX2JRI9h3wssTekS9s7TqifTJZOhe13vX/B4oWARfw3 +c1/Vf1DpHNJ9vqrV6oVOKIA9PK9/e2IudQsto5fH/lGfelwnR/h01BORLcSwRTLz +EUpf23w3GsThkzbsVaXMd83ckTlQz4QegqJw/PTm7ZgzlhfPUxk9vU0CgYAKn3X9 +3KZ4TOBPiwE3adYPDhpdYg12VTASkdxeggiPEUndTBX0576bf7yNcpF0pO48EvOu +coLOd3Wrlelelx2uP59ZFk+bvutj5Rrp9F6m0jP0m12PKW6YSXRXdV22Rc3xr4Yt +MNEaxXOQ6uYDBbCZCbTW3jCXx+yxwjYJbT/qqwKBgGoGcZtZPVSQebxAWUXXIRT4 +kCen7eaJFgkD8OKILfq02ks2WYs3IK+XIv/+DEuYZJKP1dyTPnKL1Qc9B4kcrgWj +dR0yEiGnjG7ULzJWHhx/lSXot8VcDSOr9oZ+hVIjwKMsO1WwhPD5Hp4r9a0jIquI +A8xdbpCsmC60hkhJk2B1 +-----END PRIVATE KEY----- diff --git 
a/pingora-proxy/tests/utils/conf/keys/leaf2.crt b/pingora-proxy/tests/utils/conf/keys/leaf2.crt new file mode 100644 index 0000000..0e58371 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/leaf2.crt @@ -0,0 +1,25 @@ +-----BEGIN CERTIFICATE----- +MIIEQDCCAygCFBmdf3ty+iiS5YqA7CBe5jogVDvhMA0GCSqGSIb3DQEBCwUAME4x +CzAJBgNVBAYTAlVTMQswCQYDVQQIDAJDQTEYMBYGA1UECgwPSW50ZXJtZWRpYXRl +IENBMRgwFgYDVQQDDA9pbnQucGluZ29yYS5vcmcwHhcNMjIxMjIyMjIxNzIyWhcN +MzIxMjE5MjIxNzIyWjBrMQswCQYDVQQGEwJVUzELMAkGA1UECAwCQ0ExFjAUBgNV +BAcMDVNhbiBGcmFuY2lzY28xITAfBgNVBAoMGEludGVybmV0IFdpZGdpdHMgUHR5 +IEx0ZDEUMBIGA1UEAwwLcGluZ29yYS5vcmcwggIiMA0GCSqGSIb3DQEBAQUAA4IC +DwAwggIKAoICAQD1HQzvibMFxG/OdudzEpiFHl5JkHj/ZzhR8a47dmWcIjEaYd1z +/hMZ8Zdpc/Ho3IKzwSQ5+5UyKcFjNmERYIje6pdH6NG8407Syv3Cxr1oR7t8kWoW +lIsbC1A9Ikhh7pHZntoYrUUjGslgHH8KQFtNPYmOJxwx1EYha/7pdr3/mc2MvidW +IRcxokkww39G3YP5UxV1IWM7OJZ8nWASRthwerfhCRrAX+OilVB+Ei8p08+BJnvS +gyROC/vUU9RXggg63qgRKNraamUlW4fhBY9Qxr8vkuFFoXNxZllKxUlZW2YtQKmk +QQCs4u1cF42ugGBeVqGooFvmezYPRwOxwL3R71UDDdEd/PQEg7skvu/Tyn+s6st1 +zcyBO+CT4Ogo2qbT7BaD9K/umElSDEIkW4JED+WtMihAZSeoAO4vsrh3ZGK5i3zv +VLFTbbbgE0vxoqF78ryxrzQuPJEIA5j1TycWjxTNl6IDy3J3QUjNzuVHZB5NK+N2 +Xx/rPhxh96GpY31tOCVC2L/YgkdnQB0e5ICet+LMGDcaNbXTFJoEEvq1patLJ23P +tyXgigl19OgLLFW9U5eExQ99QbdQhMORh4M7IN+UAmIiokHi4ZaH76VKaqKPzZ7r +MEsAeYryTfN5SdF4XFTDojR7rYT3kwPl7au66rDNdS3nNUTSHja6RxWqzwIDAQAB +MA0GCSqGSIb3DQEBCwUAA4IBAQCpyWaCksa8DSofS3ttjh5fRjUkth7O6nEDDZC3 +jOSNmwK0rZIK7pPLl7ogPVGpgu+dyTGQ9Jb3w5Xm3N26u/fLbVk7t7BCYbDMr14o +bJrSswz04GN/+e+JEVVTd6vU7weQGLbXrSMSsovzRJDhJe7qeV+u3RsxOLFyQntr +OqWB1x4bU/OghDOUSlRENwUCFursFHO3QWeD/ECPPSe1Q9J5Tkk/wd3TGTyyRUkW +hIgXrfIrZjEApa+nQma7+gUUQ6gwJxB1wEeQOOkSNizrOj0kdSKBCpSEeJCcbJpl +29FigdShOhBUqIZH0Y487VpaxfqBB4Kq4vlIQhfas/f6h6hS +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/leaf2.csr b/pingora-proxy/tests/utils/conf/keys/leaf2.csr new file mode 100644 index 0000000..fe84667 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/leaf2.csr @@ -0,0 +1,28 @@ +-----BEGIN CERTIFICATE REQUEST----- +MIIEsDCCApgCAQAwazELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQH +DA1TYW4gRnJhbmNpc2NvMSEwHwYDVQQKDBhJbnRlcm5ldCBXaWRnaXRzIFB0eSBM +dGQxFDASBgNVBAMMC3BpbmdvcmEub3JnMIICIjANBgkqhkiG9w0BAQEFAAOCAg8A +MIICCgKCAgEA9R0M74mzBcRvznbncxKYhR5eSZB4/2c4UfGuO3ZlnCIxGmHdc/4T +GfGXaXPx6NyCs8EkOfuVMinBYzZhEWCI3uqXR+jRvONO0sr9wsa9aEe7fJFqFpSL +GwtQPSJIYe6R2Z7aGK1FIxrJYBx/CkBbTT2JjiccMdRGIWv+6Xa9/5nNjL4nViEX +MaJJMMN/Rt2D+VMVdSFjOziWfJ1gEkbYcHq34QkawF/jopVQfhIvKdPPgSZ70oMk +Tgv71FPUV4IIOt6oESja2mplJVuH4QWPUMa/L5LhRaFzcWZZSsVJWVtmLUCppEEA +rOLtXBeNroBgXlahqKBb5ns2D0cDscC90e9VAw3RHfz0BIO7JL7v08p/rOrLdc3M +gTvgk+DoKNqm0+wWg/Sv7phJUgxCJFuCRA/lrTIoQGUnqADuL7K4d2RiuYt871Sx +U2224BNL8aKhe/K8sa80LjyRCAOY9U8nFo8UzZeiA8tyd0FIzc7lR2QeTSvjdl8f +6z4cYfehqWN9bTglQti/2IJHZ0AdHuSAnrfizBg3GjW10xSaBBL6taWrSydtz7cl +4IoJdfToCyxVvVOXhMUPfUG3UITDkYeDOyDflAJiIqJB4uGWh++lSmqij82e6zBL +AHmK8k3zeUnReFxUw6I0e62E95MD5e2ruuqwzXUt5zVE0h42ukcVqs8CAwEAAaAA +MA0GCSqGSIb3DQEBCwUAA4ICAQDn0ccCSKBk85dj4EeEzlBhRSlyPh+I6MamrIeM +OnKf0MPnetjAWsbHCqsbXakxC27u5MhiNu9g7zStuisG9oYE3cZVhGPK0QBf0N6C +wlclbFj6tTMhtxpwP2D8vNxEXbBqfCcQHI36qVfFOPm7wUlNGVKeinaW+d3azLtw +oUi683poEBWXdFu/xeE27DPT829/J0OHMm0O++lbKuzJO/9081vIMi95RUiz4uFX ++PyIXXYL+5C+L5PLmCtL/i5pVYo5ldHtqV1Q5XNlklmrF4YTHI6skovp6wkPCgpb +46W1mpHVt3sCpb52HGrcQWvyzBnkgekIdew0ed7DOMYV1dnGA95AzvwYQF2+thHp +yb7PINgrnby1h2y15EY7fczwfw5QffTc1zHaWQSXueBMQz0UkZS6h8bBON3tqoFd +Rhf9tdzIxp7l61Up0LJA2upRLW38Y81eqtpKVZHQY/qk2nLkKZ+Cbt1+TwtpfNll 
+aFcR+8epD4ojyAxwPJsSBL1VD9BMBX+Yep80qxNvj4O18dkrFSPOcsDW+IigMi26 +AWcOqPzQKUt5dGiGFJbvn5huCyUlX2Al7pXUTGBmXuO+4P0EFWo/oeUCLJ3nkrvQ +2tK39+RHwuZ9MQpZl0FSJFpXEECuRfI5l3Ci1fQD7F9T+T6mIlDGnO0cPVSaQ6dO +OdpRdQ== +-----END CERTIFICATE REQUEST----- diff --git a/pingora-proxy/tests/utils/conf/keys/leaf2.key b/pingora-proxy/tests/utils/conf/keys/leaf2.key new file mode 100644 index 0000000..47e4c40 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/leaf2.key @@ -0,0 +1,51 @@ +-----BEGIN RSA PRIVATE KEY----- +MIIJKAIBAAKCAgEA9R0M74mzBcRvznbncxKYhR5eSZB4/2c4UfGuO3ZlnCIxGmHd +c/4TGfGXaXPx6NyCs8EkOfuVMinBYzZhEWCI3uqXR+jRvONO0sr9wsa9aEe7fJFq +FpSLGwtQPSJIYe6R2Z7aGK1FIxrJYBx/CkBbTT2JjiccMdRGIWv+6Xa9/5nNjL4n +ViEXMaJJMMN/Rt2D+VMVdSFjOziWfJ1gEkbYcHq34QkawF/jopVQfhIvKdPPgSZ7 +0oMkTgv71FPUV4IIOt6oESja2mplJVuH4QWPUMa/L5LhRaFzcWZZSsVJWVtmLUCp +pEEArOLtXBeNroBgXlahqKBb5ns2D0cDscC90e9VAw3RHfz0BIO7JL7v08p/rOrL +dc3MgTvgk+DoKNqm0+wWg/Sv7phJUgxCJFuCRA/lrTIoQGUnqADuL7K4d2RiuYt8 +71SxU2224BNL8aKhe/K8sa80LjyRCAOY9U8nFo8UzZeiA8tyd0FIzc7lR2QeTSvj +dl8f6z4cYfehqWN9bTglQti/2IJHZ0AdHuSAnrfizBg3GjW10xSaBBL6taWrSydt +z7cl4IoJdfToCyxVvVOXhMUPfUG3UITDkYeDOyDflAJiIqJB4uGWh++lSmqij82e +6zBLAHmK8k3zeUnReFxUw6I0e62E95MD5e2ruuqwzXUt5zVE0h42ukcVqs8CAwEA +AQKCAgBI6Lk+TzFHF+VB/rBd1Dw17JCTRTwYjHV+OmtfGJqk1K7ScCXVKNA5uVkW +bvyYDW97VIoYDTOV1kHF5xj8eEB+Pj19kE1C6EI8BVFyLHeOmzezl/V8ffbatoTJ +incJWlNb7hplmLSl+oPH6PII9Jez5AgUlqGWWNP7gQo0G7PsYa14nd9JiVJC20j2 +DlC/nYhyEzqguquvo+dvbchz50reOkKT14dzjZJCfDOTLImG4ZAplG7kcUnNRVdF +EyJoXS9hg3VulT50FY28jPtf/a1hk5yu4/vKIHocUxtgWEq3H67G6yMKzqMKyf1c +lUz5iQohRZeUdw6fAitUZAU/TFupj4cblFish6BsJBZRBdxKAh5hzbDGeVFPtJi0 +k2fgSPBJNzOz8yrowPyaTF0dzoowdFGQ7FNoNM4Xokk7D9s+dZda8D6bW6hMlj7j +yV2MmCHVDq3zuT+HeuygJSzsrF94p9p8xFrKBFl7saltuDv/GQ5yVIAz449nbwh/ +b5C2LQ+ZqG6sHtcR6zDImxDyfV6Rjq2qTqOFWSOpMyj+fiyzw8aOFWw1D26Q9stA +PqP/XjqhjzJcK3emW1Z5bFQ4YJ9JnFlx11MjeCf/XyQc1Bt3SXEh5NtOB9iECpN6 +VfILGDQcUFHEsObrCZI3GLHppOf387qgry0m8B5G/8CbfFqKUQKCAQEA+w0AvX0U +c6W4GoZRyaVTF/NLHgECHHTCP14kliXJchRqpmfHZ2BnUt7YhSX/cLZfROEPzkN0 +dWyZNRMnyTqbNqQ8nec1m9RUfNY9aNRg2dn9eWB032qkyv4hH40tAB+SysNP+Vwk +O3SfFv0+xEWV6Btg3lfaXUxaIdIqpyO31GH5+Q4/+7ykCrKXXRRCGwsF6+nQ2heM +mfjFpBfXvbzCHfVtcofklM/JFZgflmpbBh8UuCoXkiD2s47QHHUVXtrJmlmuj3dX +cOu4n9gCwmDJ7GQHHVrM/UmazOxOA/Jm2hZBO4lsKnWKziYw2pxPtkNNKKlQKEve +HQrA1QlEuHzNrQKCAQEA+fIVViEgCoO/8RfCzZ1KRaaRmbLj+OPaif2tuKTa/5yM +GwsRPlkrIv3aqt0Ut2xUmCS4UA9JHDgEu2cU7MwE4K+/JWcjMmTA+dB0wJBwReXQ +XCrNbzj3si7faHUhngE5V5g2e6LKzG7H+KP5INIusx7MOLXql/arXoekfr3mqYhB +KN2kbDDeco4466e+ADT5WtyTmVeTU0mrCrslOelGfk2z8kZAFAbs3Lx3aUhRiqYg +vkFWOEQLi7RnDTnbq/VlNlLco+M4GDVd7i1oyDKtIX+0cgDlPiJpwdkEB91/CXEJ +oeIZhvwBs5+Wl4IcBfT8KwA1cXi2ojsTwRs93Rnx6wKCAQAYtyr/fLTqvcHmOpsK +sw//J6CZj5fZnVUST/5iGc4/QOtO/qCO+NqzOeUvFpKTUiEG8vFPaSyp8ssSgpRE +J1ToiDq/gOeyM7EtqRnanC38xI1Dyc83v5QBuAsixA9OF82n0Jqq/ftDLzQKW1w2 +jnM3qppayWNiFAY7lilE0yth6VNmxZRfAC9WLkbgjwIDD47Brv80uWTKM8ehZAeF +UnP55xOjVuWWEO7HBXb2o/naHG05xEsVw9EF1GWAp7Y25Gs8mt+omCMvpsVCV03O +PSEj+KUKqsnLldd7nTgBA3hEuDQr3Fedxnyn1vKwUvs2AmIyQpj1nqJ7UXeygXsW +fpLxAoIBAQDgIj8R4liKNUUdHLKakY710H3Gd03JdgIWNf7fki20hBx7b7xBzdJJ +6Zx6FhCqvyFI4bzKRjrIbE+KAdEY24cQOWlOUCOW4BTQsCbSO3QCqifjTpq0P0CX +b0L1t/uyZeSW8S8CRaRYGIuIIvqXfQNVqqt1u2Qoa5GXDknrQb2jj0TnMYJtZpFD +5teSMvTF2Ls2yJAvNQIu8OPJlrK2MML/JgzUmDyD+QXUl8j5B1nf3EOGeK6pfBNi +bx7uFFEx7beaNEoZSPuXcdvOZrgMtqzcWllk1fq8cj2mEEZ2CyENRWle2pMLodag +zd5L9OfOS7cJlIFYROh5qEJ5q0UZjVeLAoIBACjbUlgxojC5e4zVMtSuudnkzNo1 +sMr5sb0UHa9SgVvi8gPaoNnwYorKgjBH6Kuv5reOpc9yuc58zl6kFuBUdZDV70yM +CODpVao3Zr8V4P+aGL8PtW4CEShpx8RhoOquTvnp1/cyMaWG27zc49QMf3bcCqWZ 
+IxZv1U+6AcohtljRuoI2sIzt3Nr5LWUqQxeaR6gVQNgJHLUNcBc7OEFWJOfQp0iv ++ibpIuGczevkNjQf+h4XpK/BTAQLo0w0u3Q05hPVpIcb44/9Br8wDyIbSz8YnpRe +DAC/IedXRyfpH6ySOI1et+CQ14PPktR44dc/e5RolW/OW24+p9l6D4iNp4g= +-----END RSA PRIVATE KEY----- diff --git a/pingora-proxy/tests/utils/conf/keys/public.pem b/pingora-proxy/tests/utils/conf/keys/public.pem new file mode 100644 index 0000000..0866a04 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/public.pem @@ -0,0 +1,4 @@ +-----BEGIN PUBLIC KEY----- +MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE2f/1Fm1HjySdokPq2T0F1xxol9nS +EYQ+foFINeaWYk+FxMGpriJTBb8AGka87cWklw1ZqytfaT6pkureDbTkwg== +-----END PUBLIC KEY----- diff --git a/pingora-proxy/tests/utils/conf/keys/root.crt b/pingora-proxy/tests/utils/conf/keys/root.crt new file mode 100644 index 0000000..5d6507e --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/root.crt @@ -0,0 +1,33 @@ +-----BEGIN CERTIFICATE----- +MIIFnzCCA4egAwIBAgIUE5kg5Z26V4swShJoSwfNVsJkHbYwDQYJKoZIhvcNAQEL +BQAwXzELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQHDA1TYW4gRnJh +bmNpc2NvMRAwDgYDVQQKDAdSb290IENBMRkwFwYDVQQDDBByb290LnBpbmdvcmEu +b3JnMB4XDTIyMTExMDE5MjY1MFoXDTQyMTExMDE5MjY1MFowXzELMAkGA1UEBhMC +VVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQHDA1TYW4gRnJhbmNpc2NvMRAwDgYDVQQK +DAdSb290IENBMRkwFwYDVQQDDBByb290LnBpbmdvcmEub3JnMIICIjANBgkqhkiG +9w0BAQEFAAOCAg8AMIICCgKCAgEA4s1XxwZruaRwuDX1IkM2oxdSdjg7FeUp8lsN +Uix4NdXz8IoQWRzCfFuRBKFHptahutSO6Bbewm9XmU2hHG7aoCqaZqEVQ/3KRLZ4 +mzaNBCzDNgPTmDkz/DZKzOVuyVvbmTOsLn53yxKnFP9MEDIEemqGiM80MmFfCm/o +0vLkjwkRpreMsWPUhrq3igTWRctUYMJAeDsEaaXB1k5ovWICrEylMzslgSNfoBed +NmBpurz+yQddKNMTb/SLYxa7B1uZKDRSIXwwOZPdBDyUdlStUPodNG/OzprN+bRC +oFRB9EFG1m5oPJXQIalePj0dwhXl/bkV4uRxCSZmBZK3fbtLMF+Wkg2voTrn51Yv +lKkzUQoEX6WWtUameZZbUB8TbW2lmANuvGBmvBbj3+4ztmtJPXfJBkckCeUC6bwC +4CKrgB587ElY357Vqv/HmRRC9kxdzpOS9s5CtcqJ3Dg1TmLajyRQkf8wMqk0fhh7 +V+VrPXB030MGABXh5+B2HOsF307vF030v7z+Xp5VRLGBqmDwK0Reo2h8cg9PkMDS +5Qc2zOJVslkJ+QYdkea1ajVpCsFbaC1JPmRWihTllboUqsk9oSS3jcIZ8vW3QKMg +ZbKtVbtVHr3mNGWuVs96iDN5Us3SJ6KGS8sanrAYAAB/NKd1Wl3I0aVtcb6eOONd +edf9+b0CAwEAAaNTMFEwHQYDVR0OBBYEFJ5hR0odQYOtYsY3P18WIC2byI1oMB8G +A1UdIwQYMBaAFJ5hR0odQYOtYsY3P18WIC2byI1oMA8GA1UdEwEB/wQFMAMBAf8w +DQYJKoZIhvcNAQELBQADggIBAIrpAsrPre3R4RY0JmnvomgH+tCSMHb6dW52YrEl +JkEG4cVc5MKs5QfPp8l2d1DngqiOUnOf0MWwWNDidHQZKrWs59j67L8qKN91VQKe +cSNEX3iMFvE59Hr0Ner6Kr09wZLHVVNGcy0FdhWpJdDUGDoQjfL7n7usJyCUqWSq +/pa1I9Is3ZfeQ5f7Ztrdz35vVPj+0BlHXbZM5AZi8Dwf3vXFBlPty3fITpE65cty +cYnbpGto+wDoZj9fkKImjK21QsJdmHwaWRgmXX3WbdFBAbScTjDOc5Mls2VY8rSh ++xLI1KMB0FHSJqrGoFN3uE+G1vJX/hgn98KZKob23yJr2TWr9LHI56sMfN5xdd5A +iOHxYODSrIAi1k+bSlDz6WfEtufoqwBwHiog4nFOXrlHpGO6eUB1QjaQJZwKn2zE +3BjqJOoqbuBMg5XZRjihHcVVuZdU39/zQDwqliNpx3km4FzOiEoBABGzLP+Qt0Ch +cJFS1Yc8ffv616yP4A9qkyogk9YBBvNbDLB7WV8h8p1s4JP3f5aDUlxtAD+E+3aJ +8mrb3P7/0A2QyxlgX4qQOdj++b7GzXDxxLgOimJ4pLo0fdY8KWMeHvZPiMryHkMx +3GSZCHeleSVBCPB2pPCzUqkkKADbjBX3SYJsAMF9uXQAR4U7wojjvAmbt6vJEh6j +TEUG +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/root.key b/pingora-proxy/tests/utils/conf/keys/root.key new file mode 100644 index 0000000..ce4f21e --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/root.key @@ -0,0 +1,52 @@ +-----BEGIN PRIVATE KEY----- +MIIJQgIBADANBgkqhkiG9w0BAQEFAASCCSwwggkoAgEAAoICAQDizVfHBmu5pHC4 +NfUiQzajF1J2ODsV5SnyWw1SLHg11fPwihBZHMJ8W5EEoUem1qG61I7oFt7Cb1eZ +TaEcbtqgKppmoRVD/cpEtnibNo0ELMM2A9OYOTP8NkrM5W7JW9uZM6wufnfLEqcU +/0wQMgR6aoaIzzQyYV8Kb+jS8uSPCRGmt4yxY9SGureKBNZFy1RgwkB4OwRppcHW +Tmi9YgKsTKUzOyWBI1+gF502YGm6vP7JB10o0xNv9ItjFrsHW5koNFIhfDA5k90E 
+PJR2VK1Q+h00b87Oms35tEKgVEH0QUbWbmg8ldAhqV4+PR3CFeX9uRXi5HEJJmYF +krd9u0swX5aSDa+hOufnVi+UqTNRCgRfpZa1RqZ5lltQHxNtbaWYA268YGa8FuPf +7jO2a0k9d8kGRyQJ5QLpvALgIquAHnzsSVjfntWq/8eZFEL2TF3Ok5L2zkK1yonc +ODVOYtqPJFCR/zAyqTR+GHtX5Ws9cHTfQwYAFeHn4HYc6wXfTu8XTfS/vP5enlVE +sYGqYPArRF6jaHxyD0+QwNLlBzbM4lWyWQn5Bh2R5rVqNWkKwVtoLUk+ZFaKFOWV +uhSqyT2hJLeNwhny9bdAoyBlsq1Vu1UeveY0Za5Wz3qIM3lSzdInooZLyxqesBgA +AH80p3VaXcjRpW1xvp4441151/35vQIDAQABAoICABm7ytXeOKLbsZ51INc+YRio +MMcRIkMduWCyTBSizxDssbz9LVWvGbIagZ3Q3txjRf54164lyiitkXbng/xB57R8 +oQA8DrmkNisNuSmDSwTKP2wFiyCefPOFBX+yGJvoPEZpwoOT/eugtix/uxWrVy68 +n38uY3HD8pCwme41eRFxqfsMoH4QIbEXxnN2kQliRLSl1cLOj3WdRR0X0HKMiFkc +aTIi5+J7LQJxK3lb/yMdBpuwpjVXncD6MkaP8bCoB/yz0w3RlXcy+8TbSs0SVof1 +mRK2DPUMQ4qtlVGzvbgFIBB8fn9BUFhBa1wMey/mZC4hrgYMfXbYUIMZXpB5i9I+ +kLz4IuTYlKL46IWa+f1WritsC2F/Oog7zuejo2MNGmma+ITReCx2hxB1+H+yl3As +HmXDjp4wDrnTIR38MgIfZmrtSqqvm5zUYsjEBFSleasH/K7uDddwqgYQ6TwUaqVY +eiDsyWELZQY+0JozP9zeE9J2X0HbOvid+fwwns1TPXyTjnPsLdSOCFuBZoWcYfiu +XnFXCEjT3HDjx9ZmzAujm7is86QSkKDZHJB34DTd0eVs8EZyxNqsB748vfigc7ag +1F/quaKYihBY7BKG8dDyJ6m7hyG2j4jHy5zZgG4mEs84n4ETvUSWK1g+vpVgb3vB +MXcK6N8M/vAl+GT3LJOBAoIBAQD44nPNIYK3X1ZWj5zea4i/LucNHRwU7ViuXZJW +c4WxeT2uo/24zVcUZlvgaor5QTlsw2Ab38gc6OxBwRv0v7GRlxufi+xpG/wJDJs3 +ZSAMa4P5l/C06sOIpOq9p0X0Y+amVliAFcQtYQBTBK/APD3HIhm03hW9U1pT2jKV +JnkKaA/eMZPj55wtKEHDuvUcYll7bF5xmp9+/ECSnobxFSE0sFbXWss8CkEVJBdr +OFOlWNUJcGtBJwQi3P/OeOqotfo0BCxZ4Rt51/GFLqWjZC81lfvcVbcC4Ba8LXkI +AlLYI1uPI0ohxIMFd27i6Q92Ih042LzTWfl1MwotBSBM8CNVAoIBAQDpSUW+kCao +HOTPTn7mv8jR8Vp/uosyIqG4guynm65udI55n+y3881v/BrPGG6tsFaLsOTCUdR8 +mxiK0X7d6alSE94H8DREhMnRJjoVJsyvjF6mYleqdjDUFxzkwImu0TWsZz3NhIqv +8kgSEa58JPEinufoKHVYh0J3LLXHYQ3J3sFx3IcO32Afe7pLwuLjEh7j1GWM7auW +V0fpDMUjri/j7NF/4hiBnd7fs/i2nMp03+XxYxrqnInolhJkXxyVbsIwFLb0flbK +EWeGudwMYc3W1f/uV2+OjdNPDY2ve7GntPMRFu7SSvFFjTRdqUhXlBfNUDGWugeT +tng3onk7IUzJAoIBAQDd6PubkR995LGUqKQT5QmefXFhzey19BI4FhJeps4zuYh3 +6JxXZC8ab1HIPPcA21kaUvGkqNlCfaP51PbaOPlYeMUWcqot5dfJMcZLlA0JRev8 +Za8ngJMriPAMfdLv3wtOkHqEaePrGiwx2WHjI1Np9Eu7arEzh9hoH4suVYli7/oG +AWp9sIsd8GEC5fWag06Jr8xduqIvlTb2BAcJee+LjRdBGSFQvUveT7nZzfU23ofE +zMm049baRvaG4GVKXEdkjbwFv6LB9vrP5xGlJ7S4MKzKflqZY7ihvGHH9FptgMko +TSzSAudXvm/OPkOc7zni780dHYJBL2sJTSLJtuupAoIBAHhoS0k6Wdl3YFnnp/Qt +lNdXfWBjxiiQW2xClydDYVq9ajQ4aRPhEG32b1fowmd/lovvN4NcfRH7c0VjL9oW +GkC05GqwfinHZ+s9kckNB6SsDMZQB/OBoV42t8ER536FmPBtMSb8fCCoKq641ZhZ +8OPvpL7c8wRIe/PK7eAEpftFsA62xjbU8GYPlG46HqUY2zy4idmdamzki8crwizS +YQGBX/hjmEZ+V2SbHYoTjyOX1LUsc94YAc48dy27MaOnUS9D4dJ7ywvsw8Rz9bGm +YXm7Zqd8FaY8aY5p7nFepKls6fAuKAH+kF1XrmmRUDdzxn1AIPgs+HAzRAVjJLNy +UpECggEAJLxoXdw6VbOCri1Q8wlA78ngcUEE09yL7qPVGtckCD1OdpJjkcskWoOO +CkMsVtFjJOmQL0Xj/MR4/Zyk7qB3bm3oUWev4sFzdpWswN8fzOA7K4hTVC5BeSoS +0uCiJ9/Up0Yte5Q0sHtO8U5xtnrSPYx5mHjPoh1ZbLem3OeGy1ifm/K8/0697bjX +1UI5OSG/bZUU+bO7oBoZPIXoyMUYvnPBPqfdVI6E+mz1zFILOh9Vl7017Gi9UT9z +hDb8K7IfTDTSgvqS+H7U0X9T8cfoSSWNxRo2DyaJ0aNt36qZzkJNhunvaif5W8f/ +74xuCrejGJzwfA5Uel7mb6rqB/1law== +-----END PRIVATE KEY----- diff --git a/pingora-proxy/tests/utils/conf/keys/root.srl b/pingora-proxy/tests/utils/conf/keys/root.srl new file mode 100644 index 0000000..7c228f6 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/root.srl @@ -0,0 +1 @@ +1C807FB6A8D925A28881E5B0BD746DD370B4C883 diff --git a/pingora-proxy/tests/utils/conf/keys/server.crt b/pingora-proxy/tests/utils/conf/keys/server.crt new file mode 100644 index 0000000..afb2d1e --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/server.crt @@ -0,0 +1,13 @@ +-----BEGIN CERTIFICATE----- 
+MIIB9zCCAZ2gAwIBAgIUMI7aLvTxyRFCHhw57hGt4U6yupcwCgYIKoZIzj0EAwIw +ZDELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMRYwFAYDVQQHDA1TYW4gRnJhbmNp +c2NvMRgwFgYDVQQKDA9DbG91ZGZsYXJlLCBJbmMxFjAUBgNVBAMMDW9wZW5ydXN0 +eS5vcmcwHhcNMjIwNDExMjExMzEzWhcNMzIwNDA4MjExMzEzWjBkMQswCQYDVQQG +EwJVUzELMAkGA1UECAwCQ0ExFjAUBgNVBAcMDVNhbiBGcmFuY2lzY28xGDAWBgNV +BAoMD0Nsb3VkZmxhcmUsIEluYzEWMBQGA1UEAwwNb3BlbnJ1c3R5Lm9yZzBZMBMG +ByqGSM49AgEGCCqGSM49AwEHA0IABNn/9RZtR48knaJD6tk9BdccaJfZ0hGEPn6B +SDXmlmJPhcTBqa4iUwW/ABpGvO3FpJcNWasrX2k+qZLq3g205MKjLTArMCkGA1Ud +EQQiMCCCDyoub3BlbnJ1c3R5Lm9yZ4INb3BlbnJ1c3R5Lm9yZzAKBggqhkjOPQQD +AgNIADBFAiAjISZ9aEKmobKGlT76idO740J6jPaX/hOrm41MLeg69AIhAJqKrSyz +wD/AAF5fR6tXmBqlnpQOmtxfdy13wDr4MT3h +-----END CERTIFICATE----- diff --git a/pingora-proxy/tests/utils/conf/keys/server.csr b/pingora-proxy/tests/utils/conf/keys/server.csr new file mode 100644 index 0000000..ca75dce --- /dev/null +++ b/pingora-proxy/tests/utils/conf/keys/server.csr @@ -0,0 +1,9 @@ +-----BEGIN CERTIFICATE REQUEST----- +MIIBJzCBzgIBADBsMQswCQYDVQQGEwJVUzETMBEGA1UECAwKQ2FsaWZvcm5pYTEW +MBQGA1UEBwwNU2FuIEZyYW5jaXNjbzEYMBYGA1UECgwPQ2xvdWRmbGFyZSwgSW5j +MRYwFAYDVQQDDA1vcGVucnVzdHkub3JnMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcD +QgAE2f/1Fm1HjySdokPq2T0F1xxol9nSEYQ+foFINeaWYk+FxMGpriJTBb8AGka8 +7cWklw1ZqytfaT6pkureDbTkwqAAMAoGCCqGSM49BAMCA0gAMEUCIFyDN8eamnoY +XydKn2oI7qImigxahyCftzjxkIEV5IKbAiEAo5l72X4U+YTVYmyPPnJIj2v5nA1R +RuUfMh5sXzwlwuM= +-----END CERTIFICATE REQUEST----- diff --git a/pingora-proxy/tests/utils/conf/origin/.gitignore b/pingora-proxy/tests/utils/conf/origin/.gitignore new file mode 100644 index 0000000..eed5f2a --- /dev/null +++ b/pingora-proxy/tests/utils/conf/origin/.gitignore @@ -0,0 +1,6 @@ +** +!html +!html/** +!conf +!conf/** +!.gitignore diff --git a/pingora-proxy/tests/utils/conf/origin/conf/keys b/pingora-proxy/tests/utils/conf/origin/conf/keys new file mode 120000 index 0000000..f86df4c --- /dev/null +++ b/pingora-proxy/tests/utils/conf/origin/conf/keys @@ -0,0 +1 @@ +../../keys
\ No newline at end of file diff --git a/pingora-proxy/tests/utils/conf/origin/conf/nginx.conf b/pingora-proxy/tests/utils/conf/origin/conf/nginx.conf new file mode 100644 index 0000000..b0ab281 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/origin/conf/nginx.conf @@ -0,0 +1,432 @@ + +#user nobody; +worker_processes 1; + +error_log /dev/stdout; +#error_log logs/error.log notice; +#error_log logs/error.log info; + +pid /tmp/mock_origin.pid; +master_process off; +daemon off; + +events { + worker_connections 4096; +} + + +http { + #include mime.types; + #default_type application/octet-stream; + + #log_format main '$remote_addr - $remote_user [$time_local] "$request" ' + # '$status $body_bytes_sent "$http_referer" ' + # '"$http_user_agent" "$http_x_forwarded_for"'; + + access_log off; + + sendfile on; + #tcp_nopush on; + + keepalive_timeout 60; + keepalive_requests 99999; + + lua_shared_dict hit_counter 10m; + + #gzip on; + + # mTLS endpoint + server { + listen 8444 ssl http2; + ssl_certificate keys/server.crt; + ssl_certificate_key keys/key.pem; + ssl_protocols TLSv1.2; + ssl_ciphers TLS-AES-128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256; + ssl_client_certificate keys/root.crt; + ssl_verify_client on; + ssl_verify_depth 4; + + location / { + return 200 "hello world"; + } + } + + # secp384r1 endpoint (ECDH and ECDSA) + server { + listen 8445 ssl http2; + ssl_protocols TLSv1.2; + ssl_ciphers TLS-AES-128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA512; + ssl_certificate keys/curve_test.384.crt; + ssl_certificate_key keys/curve_test.384.key.pem; + ssl_ecdh_curve secp384r1; + + location /384 { + return 200 "Happy Friday!"; + } + } + + # secp521r1 endpoint (ECDH and ECDSA) + server { + listen 8446 ssl http2; + ssl_protocols TLSv1.2; + ssl_ciphers TLS-AES-128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA512; + ssl_certificate keys/curve_test.521.crt; + ssl_certificate_key keys/curve_test.521.key.pem; + ssl_ecdh_curve secp521r1; + + location /521 { + return 200 "Happy Monday!"; + } + } + + server { + listen 8000; + # 8001 is used for bad_lb test only to avoid unexpected connection reuse + listen 8001; + listen [::]:8000; + #listen 8443 ssl; + listen unix:/tmp/nginx-test.sock; + listen 8443 ssl http2; + server_name localhost; + + ssl_certificate keys/server.crt; + ssl_certificate_key keys/key.pem; + ssl_protocols TLSv1.2; + ssl_ciphers TLS-AES-128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256; + + # for benchmark + http2_max_requests 999999; + + #charset koi8-r; + + #access_log logs/host.access.log main; + + add_header Origin-Http2 $http2; + + location / { + root ./html; + index index.html index.htm; + } + + # this allows an arbitrary prefix to be included in URLs, so + # that tests can control caching. + location ~ ^/unique/[^/]+(/.*)$ { + rewrite ^/unique/[^/]+(/.*)$ $1 last; + } + + # this serves as an origin hit counter for an arbitrary prefix, which + # then redirects to the rest of the URL like our unique/... endpoint. 
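+ # e.g. (illustrative ID): GET /hitcounted/abc123/test2 bumps the counter
+ # for "abc123" and then internally serves /test2; GET /read_hit_count/abc123/x
+ # returns the current count for "abc123".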
+ location ~ ^/hitcounted/[^/]+(/.*)$ {
+ rewrite_by_lua_block {
+ -- Extract specified ID
+ local _, _, id = string.find(ngx.var.request_uri, "[^/]+/([^/]+)")
+
+ -- Incr hit counter
+ local hits = ngx.shared.hit_counter
+ if not hits:get(id) then
+ hits:safe_set(id, 0, nil)
+ end
+ local value = hits:incr(id, 1)
+
+ -- Rewrite URI to the requested destination
+ local destStartIndex = string.find(ngx.var.request_uri, id) + string.len(id)
+ local dest = string.sub(ngx.var.request_uri, destStartIndex)
+ ngx.req.set_uri(dest, true)
+ }
+ }
+
+ # this serves the hit count from the hitcounted endpoint
+ location ~ ^/read_hit_count/[^/]+(/.*)$ {
+ content_by_lua_block {
+ -- Find the hit count for the given ID and return it.
+ local _, _, id = string.find(ngx.var.request_uri, "[^/]+/([^/]+)")
+ local hits = ngx.shared.hit_counter
+ ngx.print(hits:get(id) or 0)
+ }
+ }
+
+ location /test {
+ return 200;
+ }
+ location /test2 {
+ return 200 "hello world";
+ }
+ location /test3 {
+ #return 200;
+ content_by_lua_block {
+ ngx.print("hello world")
+ }
+ }
+
+ location /test4 {
+ rewrite_by_lua_block {
+ ngx.exit(200)
+ }
+ #return 201;
+
+ }
+
+ location /now {
+ header_filter_by_lua_block {
+ ngx.header["x-epoch"] = ngx.now()
+ }
+ return 200 "hello world";
+ }
+
+ location /brotli {
+ header_filter_by_lua_block {
+ local ae = ngx.req.get_headers()["Accept-Encoding"]
+ if ae and ae:find("br") then
+ ngx.header["Content-Encoding"] = "br"
+ else
+ return ngx.exit(400)
+ end
+ }
+ content_by_lua_block {
+ -- brotli compressed 'hello'.
+ ngx.print("\x0f\x02\x80hello\x03")
+ }
+ }
+
+ location /cache_control {
+ header_filter_by_lua_block {
+ local h = ngx.req.get_headers()
+ if h["set-cache-control"] then
+ ngx.header["Cache-Control"] = h["set-cache-control"]
+ end
+ if h["set-cache-tag"] then
+ ngx.header["Cache-Tag"] = h["set-cache-tag"]
+ end
+ if h["set-revalidated"] then
+ return ngx.exit(304)
+ end
+ }
+ return 200 "hello world";
+ }
+
+ location /revalidate_now {
+ header_filter_by_lua_block {
+ ngx.header["x-epoch"] = ngx.now()
+ ngx.header["Last-Modified"] = "Tue, 03 May 2022 01:04:39 GMT"
+ ngx.header["Etag"] = '"abcd"'
+ local h = ngx.req.get_headers()
+ if h["if-modified-since"] or h["if-none-match"] then
+ -- just assume they match
+ return ngx.exit(304)
+ end
+ }
+ return 200 "hello world";
+ }
+
+ location /revalidate_vary {
+ header_filter_by_lua_block {
+ ngx.header["Last-Modified"] = "Tue, 03 May 2022 01:04:39 GMT"
+ ngx.header["Etag"] = '"abcd"'
+ local h = ngx.req.get_headers()
+ if h["set-vary"] then
+ ngx.header["Vary"] = h["set-vary"]
+ end
+ if h["set-no-vary"] then
+ -- expects the proxy to force no variance in the response with this
+ ngx.header["Vary"] = "x-no-vary"
+ end
+ if not h["x-no-revalidate"] and (h["if-modified-since"] or h["if-none-match"]) then
+ -- just assume they match
+ return ngx.exit(304)
+ end
+ }
+ return 200 "hello world";
+ }
+
+ location /no_if_headers {
+ content_by_lua_block {
+ local h = ngx.req.get_headers()
+ if h["if-modified-since"] or h["if-none-match"] or h["range"] then
+ return ngx.exit(400)
+ end
+ ngx.say("no if headers detected")
+ }
+ }
+
+ location /client_ip {
+ add_header x-client-ip $remote_addr;
+ return 200;
+ }
+
+ # 1. An origin load balancer that rejects reused connections.
+ # This simulates the common problem where a customer's LB silently
+ # drops a connection after it has been kept alive for a while.
+ # 2. A middlebox might drop the connection if the origin takes too long
+ # to respond. We should not retry in this case.
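+ # e.g. (illustrative): send two requests over the same keep-alive
+ # connection; the second one hits connection_requests > 1 and is
+ # dropped via ngx.exit(444) below.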
+ location /bad_lb {
+ rewrite_by_lua_block {
+ ngx.sleep(1)
+ if tonumber(ngx.var.connection_requests) > 1 then
+ -- force drop the request and close the connection
+ ngx.exit(444)
+ end
+ ngx.req.read_body()
+ local data = ngx.req.get_body_data()
+ if data then
+ ngx.say(data)
+ else
+ ngx.say("dog!")
+ end
+ }
+ }
+
+ location /duplex/ {
+ client_max_body_size 1G;
+ content_by_lua_block {
+ ngx.print(string.rep("A", 64))
+ ngx.print(string.rep("A", 64))
+ ngx.print(string.rep("A", 64))
+ ngx.print(string.rep("A", 64))
+ ngx.print(string.rep("A", 64))
+ -- without ngx.req.read_body(), the body will return without waiting for the req body
+ }
+ }
+
+ location /upload/ {
+ client_max_body_size 1G;
+ content_by_lua_block {
+ ngx.req.read_body()
+ ngx.print(string.rep("A", 64))
+ ngx.print(string.rep("A", 64))
+ ngx.print(string.rep("A", 64))
+ ngx.print(string.rep("A", 64))
+ ngx.print(string.rep("A", 64))
+ }
+ }
+
+ location /tls_verify {
+ keepalive_timeout 0;
+ return 200;
+ }
+
+ location /noreuse {
+ keepalive_timeout 0;
+ return 200 "hello world";
+ }
+
+ location /set_cookie {
+ add_header Set-Cookie "chocolate chip";
+ return 200 "hello world";
+ }
+
+ location /chunked {
+ content_by_lua_block {
+ ngx.req.read_body()
+ ngx.print(string.rep("A", 64))
+ }
+ }
+
+ location /echo {
+ content_by_lua_block {
+ ngx.req.read_body()
+ local data = ngx.req.get_body_data()
+ if data then
+ ngx.print(data)
+ end
+ }
+ }
+
+ location /low_ttl {
+ add_header Cache-Control "public, max-age=0";
+ return 200 "low ttl";
+ }
+
+ location /connection_die {
+ content_by_lua_block {
+ ngx.print(string.rep("A", 5))
+ ngx.flush()
+ ngx.exit(444) -- 444 kills the connection right away
+ }
+ }
+
+ location /no_compression {
+ gzip off; # avoid accidentally turning it on at the server block
+ content_by_lua_block {
+ ngx.print(string.rep("B", 32))
+ }
+ }
+
+ location /file_maker {
+ gzip off; # fixed content size
+ content_by_lua_block {
+ local size = tonumber(ngx.var.http_x_set_size) or 1024
+ ngx.print(string.rep("A", size))
+ }
+ }
+
+ location /sleep {
+ rewrite_by_lua_block {
+ local sleep_sec = tonumber(ngx.var.http_x_set_sleep) or 1
+ ngx.sleep(sleep_sec)
+ if ngx.var.http_x_abort then
+ -- force drop the request and close the connection
+ ngx.exit(444)
+ end
+ }
+ content_by_lua_block {
+ if ngx.var.http_x_error_header then
+ ngx.status = 500
+ ngx.exit(0)
+ return
+ end
+ ngx.print("hello ")
+ ngx.flush()
+ local sleep_sec = tonumber(ngx.var.http_x_set_body_sleep) or 0
+ ngx.sleep(sleep_sec)
+ if ngx.var.http_x_abort_body then
+ ngx.flush()
+ -- force drop the request and close the connection
+ ngx.exit(444)
+ return
+ end
+ ngx.print("world")
+ }
+ header_filter_by_lua_block {
+ if ngx.var.http_x_no_store then
+ ngx.header["Cache-control"] = "no-store"
+ end
+ if ngx.var.http_x_no_stale_revalidate then
+ ngx.header["Cache-control"] = "stale-while-revalidate=0"
+ end
+ if ngx.var.http_x_set_content_length then
+ ngx.header["Content-Length"] = "11" -- based on "hello world"
+ end
+ }
+ }
+
+ location /slow_body {
+ content_by_lua_block {
+ local sleep_sec = tonumber(ngx.var.http_x_set_sleep) or 1
+ ngx.flush()
+ ngx.sleep(sleep_sec)
+ ngx.print("hello ")
+ ngx.flush()
+ ngx.sleep(sleep_sec)
+ ngx.print("world")
+ ngx.sleep(sleep_sec)
+ ngx.print("!")
+ }
+ }
+
+ location /content_type {
+ header_filter_by_lua_block {
+ ngx.header["Content-Type"] = ngx.var.http_set_content_type
+ }
+ return 200 "hello world";
+ }
+
+ #error_page 404 /404.html;
+
+ # redirect server error pages to the static page /50x.html
+ #
+
error_page 500 502 503 504 /50x.html; + location = /50x.html { + root html; + } + } +} diff --git a/pingora-proxy/tests/utils/conf/origin/html/index.html b/pingora-proxy/tests/utils/conf/origin/html/index.html new file mode 100644 index 0000000..980a0d5 --- /dev/null +++ b/pingora-proxy/tests/utils/conf/origin/html/index.html @@ -0,0 +1 @@ +Hello World! diff --git a/pingora-proxy/tests/utils/mock_origin.rs b/pingora-proxy/tests/utils/mock_origin.rs new file mode 100644 index 0000000..db84f8d --- /dev/null +++ b/pingora-proxy/tests/utils/mock_origin.rs @@ -0,0 +1,36 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use once_cell::sync::Lazy; +use std::process; +use std::{thread, time}; + +pub static MOCK_ORIGIN: Lazy<bool> = Lazy::new(init); + +fn init() -> bool { + // TODO: figure out a way to kill openresty when exiting + process::Command::new("pkill") + .args(["-F", "/tmp/mock_origin.pid"]) + .spawn() + .unwrap(); + let _origin = thread::spawn(|| { + process::Command::new("openresty") + .args(["-p", &format!("{}/origin", super::conf_dir())]) + .output() + .unwrap(); + }); + // wait until the server is up + thread::sleep(time::Duration::from_secs(2)); + true +} diff --git a/pingora-proxy/tests/utils/mod.rs b/pingora-proxy/tests/utils/mod.rs new file mode 100644 index 0000000..6a5a1c9 --- /dev/null +++ b/pingora-proxy/tests/utils/mod.rs @@ -0,0 +1,32 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#![allow(unused)] + +pub mod cert; +pub mod mock_origin; +pub mod server_utils; +pub mod websocket; + +use once_cell::sync::Lazy; +use tokio::runtime::{Builder, Runtime}; + +// for tests with a static connection pool, if we use tokio::test the reactor +// will no longer be associated with the backing pool fds since it's dropped per test +pub static GLOBAL_RUNTIME: Lazy<Runtime> = + Lazy::new(|| Builder::new_multi_thread().enable_all().build().unwrap()); + +pub fn conf_dir() -> String { + format!("{}/tests/utils/conf", env!("CARGO_MANIFEST_DIR")) +} diff --git a/pingora-proxy/tests/utils/server_utils.rs b/pingora-proxy/tests/utils/server_utils.rs new file mode 100644 index 0000000..6862912 --- /dev/null +++ b/pingora-proxy/tests/utils/server_utils.rs @@ -0,0 +1,399 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use super::cert; +use async_trait::async_trait; +use once_cell::sync::Lazy; +use pingora_cache::cache_control::CacheControl; +use pingora_cache::key::HashBinary; +use pingora_cache::VarianceBuilder; +use pingora_cache::{ + eviction::simple_lru::Manager, filters::resp_cacheable, lock::CacheLock, predictor::Predictor, + set_compression_dict_path, CacheMeta, CacheMetaDefaults, CachePhase, MemCache, NoCacheReason, + RespCacheable, +}; +use pingora_core::protocols::Digest; +use pingora_core::server::configuration::Opt; +use pingora_core::services::Service; +use pingora_core::upstreams::peer::HttpPeer; +use pingora_core::utils::CertKey; +use pingora_error::{Error, ErrorSource, Result}; +use pingora_http::{RequestHeader, ResponseHeader}; +use pingora_proxy::{ProxyHttp, Session}; +use std::sync::Arc; +use std::thread; +use structopt::StructOpt; + +pub struct ExampleProxyHttps {} + +#[allow(clippy::upper_case_acronyms)] +pub struct CTX { + conn_reused: bool, +} + +#[async_trait] +impl ProxyHttp for ExampleProxyHttps { + type CTX = CTX; + fn new_ctx(&self) -> Self::CTX { + CTX { conn_reused: false } + } + + async fn upstream_peer( + &self, + session: &mut Session, + _ctx: &mut Self::CTX, + ) -> Result<Box<HttpPeer>> { + let session = session.as_downstream(); + let req = session.req_header(); + + let port = req + .headers + .get("x-port") + .map_or("8443", |v| v.to_str().unwrap()); + let sni = req.headers.get("sni").map_or("", |v| v.to_str().unwrap()); + let alt = req.headers.get("alt").map_or("", |v| v.to_str().unwrap()); + + let client_cert = session.get_header_bytes("client_cert"); + + let mut peer = Box::new(HttpPeer::new( + format!("127.0.0.1:{port}"), + true, + sni.to_string(), + )); + peer.options.alternative_cn = Some(alt.to_string()); + + let verify = session.get_header_bytes("verify") == b"1"; + peer.options.verify_cert = verify; + + let verify_host = session.get_header_bytes("verify_host") == b"1"; + peer.options.verify_hostname = verify_host; + + if matches!(client_cert, b"1" | b"2") { + let (mut certs, key) = if client_cert == b"1" { + (vec![cert::LEAF_CERT.clone()], cert::LEAF_KEY.clone()) + } else { + (vec![cert::LEAF2_CERT.clone()], cert::LEAF2_KEY.clone()) + }; + if session.get_header_bytes("client_intermediate") == b"1" { + certs.push(cert::INTERMEDIATE_CERT.clone()); + } + peer.client_cert_key = Some(Arc::new(CertKey::new(certs, key))); + } + + if session.get_header_bytes("x-h2") == b"true" { + // default is 1, 1 + peer.options.set_http_version(2, 2); + } + + Ok(peer) + } + + async fn response_filter( + &self, + _session: &mut Session, + upstream_response: &mut ResponseHeader, + ctx: &mut Self::CTX, + ) -> Result<()> + where + Self::CTX: Send + Sync, + { + if ctx.conn_reused { + upstream_response.insert_header("x-conn-reuse", "1")?; + } + Ok(()) + } + + async fn upstream_request_filter( + &self, + session: &mut Session, + req: &mut RequestHeader, + _ctx: &mut Self::CTX, + ) -> Result<()> + where + Self::CTX: Send + Sync, + { + let host = session.get_header_bytes("host-override"); + if host != b"" { + req.insert_header("host", host)?; + } + Ok(()) + } + + async 
fn connected_to_upstream(
+ &self,
+ _http_session: &mut Session,
+ reused: bool,
+ _peer: &HttpPeer,
+ _fd: std::os::unix::io::RawFd,
+ _digest: Option<&Digest>,
+ ctx: &mut CTX,
+ ) -> Result<()> {
+ ctx.conn_reused = reused;
+ Ok(())
+ }
+}
+
+pub struct ExampleProxyHttp {}
+
+#[async_trait]
+impl ProxyHttp for ExampleProxyHttp {
+ type CTX = ();
+ fn new_ctx(&self) -> Self::CTX {}
+
+ async fn request_filter(&self, session: &mut Session, _ctx: &mut Self::CTX) -> Result<bool> {
+ let req = session.req_header();
+ let downstream_compression = req.headers.get("x-downstream-compression").is_some();
+ if downstream_compression {
+ session.downstream_compression.adjust_level(6);
+ } else {
+ // enable upstream compression for all requests by default
+ session.upstream_compression.adjust_level(6);
+ }
+
+ Ok(false)
+ }
+
+ async fn upstream_peer(
+ &self,
+ session: &mut Session,
+ _ctx: &mut Self::CTX,
+ ) -> Result<Box<HttpPeer>> {
+ let req = session.req_header();
+ if req.headers.contains_key("x-uds-peer") {
+ return Ok(Box::new(HttpPeer::new_uds(
+ "/tmp/nginx-test.sock",
+ false,
+ "".to_string(),
+ )));
+ }
+ let port = req
+ .headers
+ .get("x-port")
+ .map_or("8000", |v| v.to_str().unwrap());
+ let peer = Box::new(HttpPeer::new(
+ format!("127.0.0.1:{}", port),
+ false,
+ "".to_string(),
+ ));
+ Ok(peer)
+ }
+}
+
+static CACHE_BACKEND: Lazy<MemCache> = Lazy::new(MemCache::new);
+const CACHE_DEFAULT: CacheMetaDefaults = CacheMetaDefaults::new(|_| Some(1), 1, 1);
+static CACHE_PREDICTOR: Lazy<Predictor<32>> = Lazy::new(|| Predictor::new(5, None));
+static EVICTION_MANAGER: Lazy<Manager> = Lazy::new(|| Manager::new(8192)); // 8192 bytes
+static CACHE_LOCK: Lazy<CacheLock> =
+ Lazy::new(|| CacheLock::new(std::time::Duration::from_secs(2)));
+
+pub struct ExampleProxyCache {}
+
+#[async_trait]
+impl ProxyHttp for ExampleProxyCache {
+ type CTX = ();
+ fn new_ctx(&self) -> Self::CTX {}
+
+ async fn upstream_peer(
+ &self,
+ session: &mut Session,
+ _ctx: &mut Self::CTX,
+ ) -> Result<Box<HttpPeer>> {
+ let req = session.req_header();
+ let port = req
+ .headers
+ .get("x-port")
+ .map_or("8000", |v| v.to_str().unwrap());
+ let peer = Box::new(HttpPeer::new(
+ format!("127.0.0.1:{}", port),
+ false,
+ "".to_string(),
+ ));
+ Ok(peer)
+ }
+
+ fn request_cache_filter(&self, session: &mut Session, _ctx: &mut Self::CTX) -> Result<()> {
+ // TODO: only allow GET & HEAD
+
+ if session.get_header_bytes("x-bypass-cache") != b"" {
+ return Ok(());
+ }
+
+ // turn on eviction only for some requests to avoid interference across tests
+ let eviction = session.req_header().headers.get("x-eviction").map(|_| {
+ &*EVICTION_MANAGER as &'static (dyn pingora_cache::eviction::EvictionManager + Sync)
+ });
+ let lock = session
+ .req_header()
+ .headers
+ .get("x-lock")
+ .map(|_| &*CACHE_LOCK);
+ session
+ .cache
+ .enable(&*CACHE_BACKEND, eviction, Some(&*CACHE_PREDICTOR), lock);
+
+ if let Some(max_file_size_hdr) = session
+ .req_header()
+ .headers
+ .get("x-cache-max-file-size-bytes")
+ {
+ let bytes = max_file_size_hdr
+ .to_str()
+ .unwrap()
+ .parse::<usize>()
+ .unwrap();
+ session.cache.set_max_file_size_bytes(bytes);
+ }
+
+ Ok(())
+ }
+
+ fn cache_vary_filter(
+ &self,
+ _meta: &CacheMeta,
+ _ctx: &mut Self::CTX,
+ req: &RequestHeader,
+ ) -> Option<HashBinary> {
+ // Here the response always varies on the request header "x-vary-me" if it exists;
+ // in the real world, this callback should check the Vary response header to decide
+ let vary_me = req.headers.get("x-vary-me")?;
+ let mut key =
VarianceBuilder::new(); + key.add_value("headers.x-vary-me", vary_me); + key.finalize() + } + + fn response_cache_filter( + &self, + _session: &Session, + resp: &ResponseHeader, + _ctx: &mut Self::CTX, + ) -> Result<RespCacheable> { + let cc = CacheControl::from_resp_headers(resp); + Ok(resp_cacheable(cc.as_ref(), resp, false, &CACHE_DEFAULT)) + } + + async fn response_filter( + &self, + session: &mut Session, + upstream_response: &mut ResponseHeader, + _ctx: &mut Self::CTX, + ) -> Result<()> + where + Self::CTX: Send + Sync, + { + if session.cache.enabled() { + match session.cache.phase() { + CachePhase::Hit => upstream_response.insert_header("x-cache-status", "hit")?, + CachePhase::Miss => upstream_response.insert_header("x-cache-status", "miss")?, + CachePhase::Stale => upstream_response.insert_header("x-cache-status", "stale")?, + CachePhase::Expired => { + upstream_response.insert_header("x-cache-status", "expired")? + } + CachePhase::Revalidated | CachePhase::RevalidatedNoCache(_) => { + upstream_response.insert_header("x-cache-status", "revalidated")? + } + _ => upstream_response.insert_header("x-cache-status", "invalid")?, + } + } else { + match session.cache.phase() { + CachePhase::Disabled(NoCacheReason::Deferred) => { + upstream_response.insert_header("x-cache-status", "deferred")?; + } + _ => upstream_response.insert_header("x-cache-status", "no-cache")?, + } + } + if let Some(d) = session.cache.lock_duration() { + upstream_response.insert_header("x-cache-lock-time-ms", format!("{}", d.as_millis()))? + } + Ok(()) + } + + fn should_serve_stale( + &self, + _session: &mut Session, + _ctx: &mut Self::CTX, + error: Option<&Error>, // None when it is called during stale while revalidate + ) -> bool { + // enable serve stale while updating + error.map_or(true, |e| e.esource() == &ErrorSource::Upstream) + } + + fn is_purge(&self, session: &Session, _ctx: &Self::CTX) -> bool { + session.req_header().method == "PURGE" + } +} + +fn test_main() { + env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init(); + + let opts: Vec<String> = vec![ + "pingora-proxy".into(), + "-c".into(), + "tests/pingora_conf.yaml".into(), + ]; + let mut my_server = pingora_core::server::Server::new(Some(Opt::from_iter(opts))).unwrap(); + my_server.bootstrap(); + + let mut proxy_service_http = + pingora_proxy::http_proxy_service(&my_server.configuration, ExampleProxyHttp {}); + proxy_service_http.add_tcp("0.0.0.0:6147"); + proxy_service_http.add_uds("/tmp/pingora_proxy.sock", None); + + let mut proxy_service_https = + pingora_proxy::http_proxy_service(&my_server.configuration, ExampleProxyHttps {}); + proxy_service_https.add_tcp("0.0.0.0:6149"); + let cert_path = format!("{}/tests/keys/server.crt", env!("CARGO_MANIFEST_DIR")); + let key_path = format!("{}/tests/keys/key.pem", env!("CARGO_MANIFEST_DIR")); + let mut tls_settings = + pingora_core::listeners::TlsSettings::intermediate(&cert_path, &key_path).unwrap(); + tls_settings.enable_h2(); + proxy_service_https.add_tls_with_settings("0.0.0.0:6150", None, tls_settings); + + let mut proxy_service_cache = + pingora_proxy::http_proxy_service(&my_server.configuration, ExampleProxyCache {}); + proxy_service_cache.add_tcp("0.0.0.0:6148"); + + let services: Vec<Box<dyn Service>> = vec![ + Box::new(proxy_service_http), + Box::new(proxy_service_https), + Box::new(proxy_service_cache), + ]; + + set_compression_dict_path("tests/headers.dict"); + my_server.add_services(services); + my_server.run_forever(); +} + +pub struct Server { + pub 
handle: thread::JoinHandle<()>,
+}
+
+impl Server {
+ pub fn start() -> Self {
+ let server_handle = thread::spawn(|| {
+ test_main();
+ });
+ Server {
+ handle: server_handle,
+ }
+ }
+}
+
+// FIXME: this still allows multiple servers to spawn across integration tests
+pub static TEST_SERVER: Lazy<Server> = Lazy::new(Server::start);
+use super::mock_origin::MOCK_ORIGIN;
+
+pub fn init() {
+ let _ = *TEST_SERVER;
+ let _ = *MOCK_ORIGIN;
+} diff --git a/pingora-proxy/tests/utils/websocket.rs b/pingora-proxy/tests/utils/websocket.rs new file mode 100644 index 0000000..92b35e9 --- /dev/null +++ b/pingora-proxy/tests/utils/websocket.rs @@ -0,0 +1,58 @@ +use std::{io::Error, thread, time::Duration};
+
+use futures_util::{SinkExt, StreamExt};
+use log::debug;
+use once_cell::sync::Lazy;
+use tokio::{
+ net::{TcpListener, TcpStream},
+ runtime::Builder,
+};
+
+pub static WS_ECHO: Lazy<bool> = Lazy::new(init);
+
+fn init() -> bool {
+ thread::spawn(move || {
+ let runtime = Builder::new_current_thread()
+ .thread_name("websocket echo")
+ .enable_all()
+ .build()
+ .unwrap();
+ runtime.block_on(async move {
+ server("127.0.0.1:9283").await.unwrap();
+ })
+ });
+ thread::sleep(Duration::from_millis(200));
+ true
+}
+
+async fn server(addr: &str) -> Result<(), Error> {
+ let listener = TcpListener::bind(&addr).await.unwrap();
+ while let Ok((stream, _)) = listener.accept().await {
+ tokio::spawn(handle_connection(stream));
+ }
+ Ok(())
+}
+
+async fn handle_connection(stream: TcpStream) {
+ let mut ws_stream = tokio_tungstenite::accept_async(stream).await.unwrap();
+
+ while let Some(msg) = ws_stream.next().await {
+ let msg = msg.unwrap();
+ let echo = msg.clone();
+ if msg.is_text() {
+ let data = msg.into_text().unwrap();
+ if data.contains("close") {
+ // abruptly close the stream without a WS close frame
+ debug!("abrupt close");
+ return;
+ } else if data.contains("graceful") {
+ debug!("graceful close");
+ ws_stream.close(None).await.unwrap();
+ // close() only sends the close frame
+ return;
+ } else {
+ ws_stream.send(echo).await.unwrap();
+ }
+ }
+ }
+} diff --git a/pingora-runtime/Cargo.toml b/pingora-runtime/Cargo.toml new file mode 100644 index 0000000..7305129 --- /dev/null +++ b/pingora-runtime/Cargo.toml @@ -0,0 +1,30 @@ +[package]
+name = "pingora-runtime"
+version = "0.1.0"
+authors = ["Yuchen Wu <[email protected]>"]
+license = "Apache-2.0"
+edition = "2021"
+repository = "https://github.com/cloudflare/pingora"
+categories = ["asynchronous", "network-programming"]
+keywords = ["async", "non-blocking", "pingora"]
+description = """
+Multithreaded Tokio runtime with the option of disabling work stealing.
+"""
+
+# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+[lib]
+name = "pingora_runtime"
+path = "src/lib.rs"
+
+[dependencies]
+rand = "0.8"
+tokio = { workspace = true, features = ["rt-multi-thread", "sync", "time"] }
+once_cell = { workspace = true }
+thread_local = "1"
+
+[dev-dependencies]
+tokio = { workspace = true, features = ["io-util", "net"] }
+
+[[bench]]
+name = "hello"
+harness = false diff --git a/pingora-runtime/LICENSE b/pingora-runtime/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-runtime/LICENSE @@ -0,0 +1,202 @@ +
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+ + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/pingora-runtime/benches/hello.rs b/pingora-runtime/benches/hello.rs new file mode 100644 index 0000000..ef715b3 --- /dev/null +++ b/pingora-runtime/benches/hello.rs @@ -0,0 +1,106 @@ +// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Pingora tokio runtime.
+//!
+//! Tokio runtime comes in two flavors: a single-threaded runtime
+//! and a multi-threaded one which provides work stealing.
+//! Benchmarks show that, compared to the single-threaded runtime, the multi-threaded one
+//! has some overhead due to its more sophisticated work-stealing scheduler.
+//!
+//! This crate provides a third flavor: a multi-threaded runtime without work stealing.
+//! This flavor is as efficient as the single-threaded runtime while allowing the async
+//! program to use multiple cores.
+
+use pingora_runtime::{current_handle, Runtime};
+use std::error::Error;
+use std::{thread, time};
+use tokio::io::{AsyncReadExt, AsyncWriteExt};
+use tokio::net::TcpListener;
+
+async fn hello_server(port: usize) -> Result<(), Box<dyn Error + Send>> {
+ let addr = format!("127.0.0.1:{port}");
+ let listener = TcpListener::bind(&addr).await.unwrap();
+ println!("Listening on: {}", addr);
+
+ loop {
+ let (mut socket, _) = listener.accept().await.unwrap();
+ socket.set_nodelay(true).unwrap();
+ let rt = current_handle();
+ rt.spawn(async move {
+ loop {
+ let mut buf = [0; 1024];
+ let res = socket.read(&mut buf).await;
+
+ let n = match res {
+ Ok(n) => n,
+ Err(_) => return,
+ };
+
+ if n == 0 {
+ return;
+ }
+
+ let _ = socket
+ .write_all(
+ b"HTTP/1.1 200 OK\r\ncontent-length: 12\r\nconnection: keep-alive\r\n\r\nHello world!",
+ )
+ .await;
+ }
+ });
+ }
+}
+
+/* On M1 macbook pro
+wrk -t40 -c1000 -d10 http://127.0.0.1:3001 --latency
+Running 10s test @ http://127.0.0.1:3001
+ 40 threads and 1000 connections
+ Thread Stats Avg Stdev Max +/- Stdev
+ Latency 3.53ms 0.87ms 17.12ms 84.99%
+ Req/Sec 7.09k 1.29k 33.11k 93.30%
+ Latency Distribution
+ 50% 3.56ms
+ 75% 3.95ms
+ 90% 4.37ms
+ 99% 5.38ms
+ 2844034 requests in 10.10s, 203.42MB read
+Requests/sec: 281689.27
+Transfer/sec: 20.15MB
+
+wrk -t40 -c1000 -d10 http://127.0.0.1:3000 --latency
+Running 10s test @ http://127.0.0.1:3000
+ 40 threads and 1000 connections
+ Thread Stats Avg Stdev Max +/- Stdev
+ Latency 12.16ms 16.29ms 112.29ms 83.40%
+ Req/Sec 5.47k 2.01k 48.85k 83.67%
+ Latency Distribution
+ 50% 2.09ms
+ 75% 20.23ms
+ 90% 37.11ms
+ 99% 65.16ms
+ 2190869 requests in 10.10s, 156.70MB read
+Requests/sec: 216918.71
+Transfer/sec: 15.52MB
+*/
+
+fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
+ let rt = Runtime::new_steal(2, "");
+ let handle = rt.get_handle();
+ handle.spawn(hello_server(3000));
+ let rt2 = Runtime::new_no_steal(2, "");
+ let handle = rt2.get_handle();
+ handle.spawn(hello_server(3001));
+ thread::sleep(time::Duration::from_secs(999999999));
+ Ok(())
+} diff --git a/pingora-runtime/src/lib.rs b/pingora-runtime/src/lib.rs new file mode 100644 index 0000000..b07ee72 ---
/dev/null +++ b/pingora-runtime/src/lib.rs @@ -0,0 +1,265 @@ +// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Pingora tokio runtime.
+//!
+//! Tokio runtime comes in two flavors: a single-threaded runtime
+//! and a multi-threaded one which provides work stealing.
+//! Benchmarks show that, compared to the single-threaded runtime, the multi-threaded one
+//! has some overhead due to its more sophisticated work-stealing scheduler.
+//!
+//! This crate provides a third flavor: a multi-threaded runtime without work stealing.
+//! This flavor is as efficient as the single-threaded runtime while allowing the async
+//! program to use multiple cores.
+
+use once_cell::sync::{Lazy, OnceCell};
+use rand::Rng;
+use std::sync::Arc;
+use std::thread::JoinHandle;
+use std::time::Duration;
+use thread_local::ThreadLocal;
+use tokio::runtime::{Builder, Handle};
+use tokio::sync::oneshot::{channel, Sender};
+
+/// Pingora async multi-threaded runtime
+///
+/// The `Steal` flavor is effectively the tokio multi-threaded runtime.
+///
+/// The `NoSteal` flavor is backed by multiple tokio single-threaded runtimes.
+pub enum Runtime {
+ Steal(tokio::runtime::Runtime),
+ NoSteal(NoStealRuntime),
+}
+
+impl Runtime {
+ /// Create a `Steal` flavor runtime. This is just a regular tokio runtime
+ pub fn new_steal(threads: usize, name: &str) -> Self {
+ Self::Steal(
+ Builder::new_multi_thread()
+ .enable_all()
+ .worker_threads(threads)
+ .thread_name(name)
+ .build()
+ .unwrap(),
+ )
+ }
+
+ /// Create a `NoSteal` flavor runtime. This is backed by multiple tokio current-thread runtimes
+ pub fn new_no_steal(threads: usize, name: &str) -> Self {
+ Self::NoSteal(NoStealRuntime::new(threads, name))
+ }
+
+ /// Return the &[Handle] of the [Runtime].
+ /// For the `Steal` flavor, it will just return the &[Handle].
+ /// For the `NoSteal` flavor, it will return the &[Handle] of a random thread in its pool.
+ /// So if we want tasks to spawn on all the threads, call this function to get a fresh [Handle]
+ /// for each async task.
+ pub fn get_handle(&self) -> &Handle {
+ match self {
+ Self::Steal(r) => r.handle(),
+ Self::NoSteal(r) => r.get_runtime(),
+ }
+ }
+
+ /// Call tokio's `shutdown_timeout` of all the runtimes. This function is blocking until
+ /// all runtimes exit.
+ pub fn shutdown_timeout(self, timeout: Duration) {
+ match self {
+ Self::Steal(r) => r.shutdown_timeout(timeout),
+ Self::NoSteal(r) => r.shutdown_timeout(timeout),
+ }
+ }
+}
+
+// only NoStealRuntime sets the pools in its worker threads
+static CURRENT_HANDLE: Lazy<ThreadLocal<Pools>> = Lazy::new(ThreadLocal::new);
+
+/// Return the [Handle] of the current runtime.
+/// If the current thread is under a `Steal` runtime, the current [Handle] is returned.
+/// If the current thread is under a `NoSteal` runtime, the [Handle] of a random thread
+/// under this runtime is returned. This function will panic if called outside any runtime.
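+///
+/// A minimal usage sketch (illustrative only; it mirrors the tests at the bottom of this file):
+/// ```ignore
+/// let rt = Runtime::new_no_steal(2, "doc");
+/// let handle = rt.get_handle();
+/// handle.block_on(async {
+///     // within the runtime, pick a handle and spawn on it
+///     let h = current_handle();
+///     h.spawn(async { /* this task may run on any worker thread */ });
+/// });
+/// ```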
+pub fn current_handle() -> Handle {
+ if let Some(pools) = CURRENT_HANDLE.get() {
+ // safety: the CURRENT_HANDLE is set when the pool is being initialized in init_pools()
+ let pools = pools.get().unwrap();
+ let mut rng = rand::thread_rng();
+ let index = rng.gen_range(0..pools.len());
+ pools[index].clone()
+ } else {
+ // not NoStealRuntime, just check the current tokio runtime
+ Handle::current()
+ }
+}
+
+type Control = (Sender<Duration>, JoinHandle<()>);
+type Pools = Arc<OnceCell<Box<[Handle]>>>;
+
+/// Multi-threaded runtime backed by a pool of single-threaded tokio runtimes
+pub struct NoStealRuntime {
+ threads: usize,
+ name: String,
+ // Lazily init the runtimes so that they are created after pingora
+ // daemonizes itself. Otherwise the runtime threads are lost.
+ pools: Arc<OnceCell<Box<[Handle]>>>,
+ controls: OnceCell<Vec<Control>>,
+}
+
+impl NoStealRuntime {
+ /// Create a new [NoStealRuntime]. Panics if `threads` is 0
+ pub fn new(threads: usize, name: &str) -> Self {
+ assert!(threads != 0);
+ NoStealRuntime {
+ threads,
+ name: name.to_string(),
+ pools: Arc::new(OnceCell::new()),
+ controls: OnceCell::new(),
+ }
+ }
+
+ fn init_pools(&self) -> (Box<[Handle]>, Vec<Control>) {
+ let mut pools = Vec::with_capacity(self.threads);
+ let mut controls = Vec::with_capacity(self.threads);
+ for _ in 0..self.threads {
+ let rt = Builder::new_current_thread().enable_all().build().unwrap();
+ let handler = rt.handle().clone();
+ let (tx, rx) = channel::<Duration>();
+ let pools_ref = self.pools.clone();
+ let join = std::thread::Builder::new()
+ .name(self.name.clone())
+ .spawn(move || {
+ CURRENT_HANDLE.get_or(|| pools_ref);
+ if let Ok(timeout) = rt.block_on(rx) {
+ rt.shutdown_timeout(timeout);
+ } // else Err(_): tx is dropped, just exit
+ })
+ .unwrap();
+ pools.push(handler);
+ controls.push((tx, join));
+ }
+
+ (pools.into_boxed_slice(), controls)
+ }
+
+ /// Return the &[Handle] of a random thread of this runtime
+ pub fn get_runtime(&self) -> &Handle {
+ let mut rng = rand::thread_rng();
+
+ let index = rng.gen_range(0..self.threads);
+ self.get_runtime_at(index)
+ }
+
+ /// Return the number of threads of this runtime
+ pub fn threads(&self) -> usize {
+ self.threads
+ }
+
+ fn get_pools(&self) -> &[Handle] {
+ if let Some(p) = self.pools.get() {
+ p
+ } else {
+ // TODO: use a mutex to avoid creating a lot of threads only to drop them
+ let (pools, controls) = self.init_pools();
+ // there could be another thread racing with this one to init the pools
+ match self.pools.try_insert(pools) {
+ Ok(p) => {
+ // unwrap to make sure that this is the one that inits both pools and controls
+ self.controls.set(controls).unwrap();
+ p
+ }
+ // another thread already set it, just return it
+ Err((p, _my_pools)) => p,
+ }
+ }
+ }
+
+ /// Return the &[Handle] of a given thread of this runtime
+ pub fn get_runtime_at(&self, index: usize) -> &Handle {
+ let pools = self.get_pools();
+ &pools[index]
+ }
+
+ /// Call tokio's `shutdown_timeout` of all the runtimes. This function is blocking until
+ /// all runtimes exit.
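+ ///
+ /// e.g. `rt.shutdown_timeout(Duration::from_secs(1));` blocks until every
+ /// worker runtime has exited (see `test_no_steal_shutdown` below).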
+ pub fn shutdown_timeout(mut self, timeout: Duration) { + if let Some(controls) = self.controls.take() { + let (txs, joins): (Vec<Sender<_>>, Vec<JoinHandle<()>>) = controls.into_iter().unzip(); + for tx in txs { + let _ = tx.send(timeout); // Err() when rx is dropped + } + for join in joins { + let _ = join.join(); // ignore thread error + } + } // else, the controls and the runtimes are not even init yet, just return; + } + + // TODO: runtime metrics +} + +#[test] +fn test_steal_runtime() { + use tokio::time::{sleep, Duration}; + + let rt = Runtime::new_steal(2, "test"); + let handle = rt.get_handle(); + let ret = handle.block_on(async { + sleep(Duration::from_secs(1)).await; + let handle = current_handle(); + let join = handle.spawn(async { + sleep(Duration::from_secs(1)).await; + }); + join.await.unwrap(); + 1 + }); + + assert_eq!(ret, 1); +} + +#[test] +fn test_no_steal_runtime() { + use tokio::time::{sleep, Duration}; + + let rt = Runtime::new_no_steal(2, "test"); + let handle = rt.get_handle(); + let ret = handle.block_on(async { + sleep(Duration::from_secs(1)).await; + let handle = current_handle(); + let join = handle.spawn(async { + sleep(Duration::from_secs(1)).await; + }); + join.await.unwrap(); + 1 + }); + + assert_eq!(ret, 1); +} + +#[test] +fn test_no_steal_shutdown() { + use tokio::time::{sleep, Duration}; + + let rt = Runtime::new_no_steal(2, "test"); + let handle = rt.get_handle(); + let ret = handle.block_on(async { + sleep(Duration::from_secs(1)).await; + let handle = current_handle(); + let join = handle.spawn(async { + sleep(Duration::from_secs(1)).await; + }); + join.await.unwrap(); + 1 + }); + assert_eq!(ret, 1); + + rt.shutdown_timeout(Duration::from_secs(1)); +} diff --git a/pingora-timeout/Cargo.toml b/pingora-timeout/Cargo.toml new file mode 100644 index 0000000..1b271e4 --- /dev/null +++ b/pingora-timeout/Cargo.toml @@ -0,0 +1,37 @@ +[package] +name = "pingora-timeout" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +license = "Apache-2.0" +edition = "2021" +repository = "https://github.com/cloudflare/pingora" +categories = ["asynchronous"] +keywords = ["async", "non-blocking", "pingora"] +description = """ +Highly efficient async timer and timeout system for Tokio runtimes. +""" + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html +[lib] +name = "pingora_timeout" +path = "src/lib.rs" + +[dependencies] +tokio = { workspace = true, features = [ + "time", + "rt-multi-thread", + "macros", + "sync", +] } +pin-project-lite = "0.2" +futures = "0.3" +once_cell = { workspace = true } +parking_lot = "0.12" +thread_local = "1.0" + +[dev-dependencies] +bencher = "0.1.5" + +[[bench]] +name = "benchmark" +harness = false diff --git a/pingora-timeout/LICENSE b/pingora-timeout/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora-timeout/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/pingora-timeout/benches/benchmark.rs b/pingora-timeout/benches/benchmark.rs new file mode 100644 index 0000000..cd6635d --- /dev/null +++ b/pingora-timeout/benches/benchmark.rs @@ -0,0 +1,169 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use pingora_timeout::*; +use std::time::{Duration, Instant}; +use tokio::time::sleep; +use tokio::time::timeout as tokio_timeout; + +const LOOP_SIZE: u32 = 100000; + +async fn bench_timeout() -> u32 { + let mut n = 0; + for _ in 0..LOOP_SIZE { + let fut = async { 1 }; + let to = timeout(Duration::from_secs(1), fut); + n += to.await.unwrap(); + } + n +} + +async fn bench_tokio_timeout() -> u32 { + let mut n = 0; + for _ in 0..LOOP_SIZE { + let fut = async { 1 }; + let to = tokio_timeout(Duration::from_secs(1), fut); + n += to.await.unwrap(); + } + n +} + +async fn bench_fast_timeout() -> u32 { + let mut n = 0; + for _ in 0..LOOP_SIZE { + let fut = async { 1 }; + let to = fast_timeout::fast_timeout(Duration::from_secs(1), fut); + n += to.await.unwrap(); + } + n +} + +fn bench_tokio_timer() { + let mut list = Vec::with_capacity(LOOP_SIZE as usize); + let before = Instant::now(); + for _ in 0..LOOP_SIZE { + list.push(sleep(Duration::from_secs(1))); + } + let elapsed = before.elapsed(); + println!( + "tokio timer create {:?} total, {:?} avg per iteration", + elapsed, + elapsed / LOOP_SIZE + ); + + let before = Instant::now(); + drop(list); + let elapsed = before.elapsed(); + println!( + "tokio timer drop {:?} total, {:?} avg per iteration", + elapsed, + elapsed / LOOP_SIZE + ); +} + +async fn bench_multi_thread_tokio_timer(threads: usize) { + let mut handlers = vec![]; + for _ in 0..threads { + let handler = tokio::spawn(async { + bench_tokio_timer(); + }); + handlers.push(handler); + } + for thread in handlers { + thread.await.unwrap(); + } +} + +use std::sync::Arc; + +async fn bench_multi_thread_timer(threads: usize, tm: Arc<TimerManager>) { + let mut handlers = vec![]; + for _ in 0..threads { + let tm_ref = tm.clone(); + let handler = tokio::spawn(async move { + bench_timer(&tm_ref); + }); + handlers.push(handler); + } + for thread in handlers { + thread.await.unwrap(); + } +} + +use pingora_timeout::timer::TimerManager; + +fn bench_timer(tm: &TimerManager) { + let mut list = Vec::with_capacity(LOOP_SIZE as usize); + let before = Instant::now(); + for _ in 0..LOOP_SIZE { + list.push(tm.register_timer(Duration::from_secs(1))); + } + let elapsed = before.elapsed(); + println!( + "pingora timer create {:?} total, {:?} avg per iteration", + elapsed, + elapsed / LOOP_SIZE + ); + + let before = Instant::now(); + drop(list); + let elapsed = before.elapsed(); + println!( + "pingora timer drop {:?} total, {:?} avg per iteration", + elapsed, + elapsed / LOOP_SIZE + ); +} + +#[tokio::main(worker_threads = 4)] +async fn main() { + let before = Instant::now(); + bench_timeout().await; + let elapsed = before.elapsed(); + println!( + "pingora timeout {:?} total, {:?} avg per iteration", + elapsed, + elapsed / 
LOOP_SIZE
+    );
+
+    let before = Instant::now();
+    bench_fast_timeout().await;
+    let elapsed = before.elapsed();
+    println!(
+        "pingora fast timeout {:?} total, {:?} avg per iteration",
+        elapsed,
+        elapsed / LOOP_SIZE
+    );
+
+    let before = Instant::now();
+    bench_tokio_timeout().await;
+    let elapsed = before.elapsed();
+    println!(
+        "tokio timeout {:?} total, {:?} avg per iteration",
+        elapsed,
+        elapsed / LOOP_SIZE
+    );
+
+    println!("===========================");
+
+    let tm = pingora_timeout::timer::TimerManager::new();
+    bench_timer(&tm);
+    bench_tokio_timer();
+
+    println!("===========================");
+
+    let tm = Arc::new(tm);
+    bench_multi_thread_timer(4, tm).await;
+    bench_multi_thread_tokio_timer(4).await;
+}
diff --git a/pingora-timeout/src/fast_timeout.rs b/pingora-timeout/src/fast_timeout.rs
new file mode 100644
index 0000000..5fa7a3d
--- /dev/null
+++ b/pingora-timeout/src/fast_timeout.rs
@@ -0,0 +1,132 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! The fast and more complicated version of pingora-timeout
+//!
+//! The following optimizations are applied
+//! - The timeouts lazily initialize their timer when the Future is pending for the first time.
+//! - There is no global lock for creating and cancelling timeouts.
+//! - Timeout timers are rounded to the next 10ms tick and timers are shared across all timeouts with the same deadline.
+//!
+//! In order for this to work, a standalone thread is created to arm the timers, which has its
+//! own overhead. As a general rule, the benefits of this don't outweigh the overhead unless
+//! there are more than about 100 timeout() calls per second in the system. Use the regular tokio
+//! timeout or [super::tokio_timeout] in the low-usage case.
+
+use super::timer::*;
+use super::*;
+use once_cell::sync::Lazy;
+use std::sync::Arc;
+
+static TIMER_MANAGER: Lazy<Arc<TimerManager>> = Lazy::new(|| {
+    let tm = Arc::new(TimerManager::new());
+    check_clock_thread(&tm);
+    tm
+});
+
+fn check_clock_thread(tm: &Arc<TimerManager>) {
+    if tm.should_i_start_clock() {
+        std::thread::Builder::new()
+            .name("Timer thread".into())
+            .spawn(|| TIMER_MANAGER.clock_thread())
+            .unwrap();
+    }
+}
+
+/// The timeout generated by [fast_timeout()].
+///
+/// Users don't need to interact with this object.
+pub struct FastTimeout(Duration);
+
+impl ToTimeout for FastTimeout {
+    fn timeout(&self) -> BoxFuture<'static, ()> {
+        Box::pin(TIMER_MANAGER.register_timer(self.0).poll())
+    }
+
+    fn create(d: Duration) -> Self {
+        FastTimeout(d)
+    }
+}
+
+/// Similar to [tokio::time::timeout] but more efficient.
+pub fn fast_timeout<T>(duration: Duration, future: T) -> Timeout<T, FastTimeout>
+where
+    T: Future,
+{
+    check_clock_thread(&TIMER_MANAGER);
+    Timeout::new_with_delay(future, duration)
+}
+
+/// Similar to [tokio::time::sleep] but more efficient.
+
+pub async fn fast_sleep(duration: Duration) {
+    check_clock_thread(&TIMER_MANAGER);
+    TIMER_MANAGER.register_timer(duration).poll().await
+}
+
+/// Pause the timer for fork()
+///
+/// Because RwLock across fork() is undefined behavior, this function makes sure that no one
+/// holds any locks.
+///
+/// This function should be called right before fork().
+pub fn pause_for_fork() {
+    TIMER_MANAGER.pause_for_fork();
+}
+
+/// Unpause the timer after fork()
+///
+/// This function should be called right after fork().
+pub fn unpause() {
+    TIMER_MANAGER.unpause();
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::time::Duration;
+
+    #[tokio::test]
+    async fn test_timeout() {
+        let fut = tokio_sleep(Duration::from_secs(1000));
+        let to = fast_timeout(Duration::from_secs(1), fut);
+        assert!(to.await.is_err())
+    }
+
+    #[tokio::test]
+    async fn test_instantly_return() {
+        let fut = async { 1 };
+        let to = fast_timeout(Duration::from_secs(1), fut);
+        assert_eq!(to.await.unwrap(), 1)
+    }
+
+    #[tokio::test]
+    async fn test_delayed_return() {
+        let fut = async {
+            tokio_sleep(Duration::from_secs(1)).await;
+            1
+        };
+        let to = fast_timeout(Duration::from_secs(1000), fut);
+        assert_eq!(to.await.unwrap(), 1)
+    }
+
+    #[tokio::test]
+    async fn test_sleep() {
+        let fut = async {
+            fast_sleep(Duration::from_secs(1)).await;
+            1
+        };
+        let to = fast_timeout(Duration::from_secs(1000), fut);
+        assert_eq!(to.await.unwrap(), 1)
+    }
+}
diff --git a/pingora-timeout/src/lib.rs b/pingora-timeout/src/lib.rs
new file mode 100644
index 0000000..f3a33dd
--- /dev/null
+++ b/pingora-timeout/src/lib.rs
@@ -0,0 +1,175 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#![warn(clippy::all)]
+
+//! A drop-in replacement for [tokio::time::timeout] that is much more efficient.
+//!
+//! Similar to [tokio::time::timeout] but more efficient on busy concurrent IOs where timeouts are
+//! created and canceled very frequently.
+//!
+//! This crate provides the following optimizations
+//! - The timeouts lazily initialize their timer when the Future is pending for the first time.
+//! - There is no global lock for creating and cancelling timeouts.
+//! - Timeout timers are rounded to the next 10ms tick and timers are shared across all timeouts with the same deadline.
+//!
+//! Benchmark:
+//!
+//! 438.302µs total, 4ns avg per iteration
+//!
+//! v.s. Tokio timeout():
+//!
+//! 10.716192ms total, 107ns avg per iteration
+//!
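For a sense of how the drop-in replacement described in the crate docs above is used, here is a minimal sketch (not part of this diff) that swaps `tokio::time::timeout` for `pingora_timeout::timeout`; it assumes tokio's `rt-multi-thread` and `macros` features, which this crate's Cargo.toml already enables:

```rust
// A minimal sketch of the drop-in usage described in the crate docs above.
// `pingora_timeout::timeout` takes the same (Duration, Future) arguments as
// tokio::time::timeout and returns Result<_, Elapsed>.
use pingora_timeout::timeout;
use std::time::Duration;

#[tokio::main]
async fn main() {
    let slow_io = async {
        // stands in for a read that may or may not complete in time
        tokio::time::sleep(Duration::from_secs(10)).await;
        42
    };
    // The underlying timer is only armed if `slow_io` is actually pending.
    match timeout(Duration::from_secs(1), slow_io).await {
        Ok(v) => println!("finished: {v}"),
        Err(_) => println!("timed out"),
    }
}
```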
+
+pub mod fast_timeout;
+pub mod timer;
+
+pub use fast_timeout::fast_sleep as sleep;
+pub use fast_timeout::fast_timeout as timeout;
+
+use futures::future::BoxFuture;
+use pin_project_lite::pin_project;
+use std::future::Future;
+use std::pin::Pin;
+use std::task::{self, Poll};
+use tokio::time::{sleep as tokio_sleep, Duration};
+
+/// The interface to start a timeout
+///
+/// Users don't need to interact with this trait
+pub trait ToTimeout {
+    fn timeout(&self) -> BoxFuture<'static, ()>;
+    fn create(d: Duration) -> Self;
+}
+
+/// The timeout generated by [tokio_timeout()].
+///
+/// Users don't need to interact with this object.
+pub struct TokioTimeout(Duration);
+
+impl ToTimeout for TokioTimeout {
+    fn timeout(&self) -> BoxFuture<'static, ()> {
+        Box::pin(tokio_sleep(self.0))
+    }
+
+    fn create(d: Duration) -> Self {
+        TokioTimeout(d)
+    }
+}
+
+/// The error type returned when the timeout is reached.
+#[derive(Debug)]
+pub struct Elapsed;
+
+impl std::fmt::Display for Elapsed {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "Timeout Elapsed")
+    }
+}
+
+impl std::error::Error for Elapsed {}
+
+/// Like [tokio::time::timeout] but with lazy timer initialization.
+///
+/// The timer is created the first time the `future` is pending. This avoids unnecessary timer
+/// creation and cancellation on busy IOs that have a good chance of being ready already (e.g.,
+/// reading from a TCP stream whose recv buffer already has a lot of data to read right away).
+pub fn tokio_timeout<T>(duration: Duration, future: T) -> Timeout<T, TokioTimeout>
+where
+    T: Future,
+{
+    Timeout::<T, TokioTimeout>::new_with_delay(future, duration)
+}
+
+pin_project! {
+    /// The timeout future returned by the timeout functions
+    #[must_use = "futures do nothing unless you `.await` or poll them"]
+    pub struct Timeout<T, F> {
+        #[pin]
+        value: T,
+        #[pin]
+        delay: Option<BoxFuture<'static, ()>>,
+        callback: F, // callback to create the timer
+    }
+}
+
+impl<T, F> Timeout<T, F>
+where
+    F: ToTimeout,
+{
+    pub(crate) fn new_with_delay(value: T, d: Duration) -> Timeout<T, F> {
+        Timeout {
+            value,
+            delay: None,
+            callback: F::create(d),
+        }
+    }
+}
+
+impl<T, F> Future for Timeout<T, F>
+where
+    T: Future,
+    F: ToTimeout,
+{
+    type Output = Result<T::Output, Elapsed>;
+
+    fn poll(self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> {
+        let mut me = self.project();
+
+        // First, try polling the future
+        if let Poll::Ready(v) = me.value.poll(cx) {
+            return Poll::Ready(Ok(v));
+        }
+
+        let delay = me
+            .delay
+            .get_or_insert_with(|| Box::pin(me.callback.timeout()));
+
+        match delay.as_mut().poll(cx) {
+            Poll::Pending => Poll::Pending,
+            Poll::Ready(()) => Poll::Ready(Err(Elapsed {})),
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::time::Duration;
+
+    #[tokio::test]
+    async fn test_timeout() {
+        let fut = tokio_sleep(Duration::from_secs(1000));
+        let to = timeout(Duration::from_secs(1), fut);
+        assert!(to.await.is_err())
+    }
+
+    #[tokio::test]
+    async fn test_instantly_return() {
+        let fut = async { 1 };
+        let to = timeout(Duration::from_secs(1), fut);
+        assert_eq!(to.await.unwrap(), 1)
+    }
+
+    #[tokio::test]
+    async fn test_delayed_return() {
+        let fut = async {
+            tokio_sleep(Duration::from_secs(1)).await;
+            1
+        };
+        let to = timeout(Duration::from_secs(1000), fut);
+        assert_eq!(to.await.unwrap(), 1)
+    }
+}
diff --git a/pingora-timeout/src/timer.rs b/pingora-timeout/src/timer.rs
new file mode 100644
index 0000000..6e25c24
--- /dev/null
+++
b/pingora-timeout/src/timer.rs
@@ -0,0 +1,328 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! A lightweight timer for systems with a high rate of operations that have
+//! timeouts attached to them
+//!
+//! Users don't need to interact with this module.
+//!
+//! The idea is to bucket timers into finite time slots so that operations that
+//! start and end quickly don't have to create their own timers all the time
+//!
+//! Benchmark:
+//! - create 7.809622ms total, 78ns avg per iteration
+//! - drop: 1.348552ms total, 13ns avg per iteration
+//!
+//! tokio timer:
+//! - create 34.317439ms total, 343ns avg per iteration
+//! - drop: 10.694154ms total, 106ns avg per iteration
+
+use parking_lot::RwLock;
+use std::collections::BTreeMap;
+use std::sync::atomic::{AtomicBool, AtomicI64, Ordering};
+use std::sync::Arc;
+use std::time::{Duration, Instant};
+use thread_local::ThreadLocal;
+use tokio::sync::Notify;
+
+const RESOLUTION_MS: u64 = 10;
+const RESOLUTION_DURATION: Duration = Duration::from_millis(RESOLUTION_MS);
+
+// round up to the NEXT timestamp that is a multiple of the resolution
+#[inline]
+fn round_to(raw: u128, resolution: u128) -> u128 {
+    raw - 1 + resolution - (raw - 1) % resolution
+}
+// millisecond resolution at most
+#[derive(PartialEq, PartialOrd, Eq, Ord, Clone, Copy, Debug)]
+struct Time(u128);
+
+impl From<u128> for Time {
+    fn from(raw_ms: u128) -> Self {
+        Time(round_to(raw_ms, RESOLUTION_MS as u128))
+    }
+}
+
+impl From<Duration> for Time {
+    fn from(d: Duration) -> Self {
+        Time(round_to(d.as_millis(), RESOLUTION_MS as u128))
+    }
+}
+
+impl Time {
+    pub fn not_after(&self, ts: u128) -> bool {
+        self.0 <= ts
+    }
+}
+
+/// The stub for waiting for a timer to expire.
+pub struct TimerStub(Arc<Notify>, Arc<AtomicBool>);
+
+impl TimerStub {
+    /// Wait for the timer to expire.
+    pub async fn poll(self) {
+        if self.1.load(Ordering::SeqCst) {
+            return;
+        }
+        self.0.notified().await;
+    }
+}
+
+struct Timer(Arc<Notify>, Arc<AtomicBool>);
+
+impl Timer {
+    pub fn new() -> Self {
+        Timer(Arc::new(Notify::new()), Arc::new(AtomicBool::new(false)))
+    }
+
+    pub fn fire(&self) {
+        self.1.store(true, Ordering::SeqCst);
+        self.0.notify_waiters();
+    }
+
+    pub fn subscribe(&self) -> TimerStub {
+        TimerStub(self.0.clone(), self.1.clone())
+    }
+}
+
+/// The object that holds all the timers registered to it.
+pub struct TimerManager {
+    // each thread inserts into its own local timer tree to avoid lock contention
+    timers: ThreadLocal<RwLock<BTreeMap<Time, Timer>>>,
+    zero: Instant, // the reference zero point for timestamps
+    // Start a new clock thread if this is -1 or stale.
The clock thread should keep updating this
+    clock_watchdog: AtomicI64,
+    paused: AtomicBool,
+}
+
+// Consider the clock thread dead after it fails to update the watchdog within DELAYS_SEC seconds
+const DELAYS_SEC: i64 = 2; // TODO: make sure this value is larger than RESOLUTION_DURATION
+
+impl Default for TimerManager {
+    fn default() -> Self {
+        TimerManager {
+            timers: ThreadLocal::new(),
+            zero: Instant::now(),
+            clock_watchdog: AtomicI64::new(-DELAYS_SEC),
+            paused: AtomicBool::new(false),
+        }
+    }
+}
+
+impl TimerManager {
+    /// Create a new [TimerManager]
+    pub fn new() -> Self {
+        Self::default()
+    }
+
+    // this thread sleeps for one resolution interval and fires all Timers that are due
+    pub(crate) fn clock_thread(&self) {
+        loop {
+            std::thread::sleep(RESOLUTION_DURATION);
+            let now = Instant::now() - self.zero;
+            self.clock_watchdog
+                .store(now.as_secs() as i64, Ordering::Relaxed);
+            if self.is_paused_for_fork() {
+                // just stop acquiring the locks, waiting for fork to happen
+                continue;
+            }
+            let now = now.as_millis();
+            // iterate through the timer tree for all threads
+            for thread_timer in self.timers.iter() {
+                let mut timers = thread_timer.write();
+                // Fire all timers due by now
+                loop {
+                    let key_to_remove = timers.iter().next().and_then(|(k, _)| {
+                        if k.not_after(now) {
+                            Some(*k)
+                        } else {
+                            None
+                        }
+                    });
+                    if let Some(k) = key_to_remove {
+                        let timer = timers.remove(&k);
+                        // safe to unwrap, the key is from iter().next()
+                        timer.unwrap().fire();
+                    } else {
+                        break;
+                    }
+                }
+                // write lock drops here
+            }
+        }
+    }
+
+    // False if the clock is already started.
+    // If true, the caller must start the clock thread next.
+    pub(crate) fn should_i_start_clock(&self) -> bool {
+        let Err(prev) = self.is_clock_running() else {
+            return false;
+        };
+        let now = Instant::now().duration_since(self.zero).as_secs() as i64;
+        let res =
+            self.clock_watchdog
+                .compare_exchange(prev, now, Ordering::SeqCst, Ordering::SeqCst);
+        res.is_ok()
+    }
+
+    // Ok(()) if the clock is running (the watchdog is within DELAYS_SEC of now)
+    // Err(time) if the watchdog stopped at `time`
+    pub(crate) fn is_clock_running(&self) -> Result<(), i64> {
+        let now = Instant::now().duration_since(self.zero).as_secs() as i64;
+        let prev = self.clock_watchdog.load(Ordering::SeqCst);
+        if now < prev + DELAYS_SEC {
+            Ok(())
+        } else {
+            Err(prev)
+        }
+    }
+
+    /// Register a timer.
+    ///
+    /// When the timer expires, the [TimerStub] will be notified.
+    pub fn register_timer(&self, duration: Duration) -> TimerStub {
+        if self.is_paused_for_fork() {
+            // Return a dummy TimerStub that will trigger right away.
+            // This is fine assuming pause_for_fork() is called right before fork().
+            // The only possible register_timer() is from another thread, which will
+            // be entirely lost after fork()
+            // TODO: buffer these register calls instead (without a lock)
+            let timer = Timer::new();
+            timer.fire();
+            return timer.subscribe();
+        }
+        let now: Time = (Instant::now() + duration - self.zero).into();
+        {
+            let timers = self.timers.get_or(|| RwLock::new(BTreeMap::new())).read();
+            if let Some(t) = timers.get(&now) {
+                return t.subscribe();
+            }
+        } // drop read lock
+
+        let timer = Timer::new();
+        let mut timers = self.timers.get_or(|| RwLock::new(BTreeMap::new())).write();
+        // Usually we would check whether another thread has inserted the same node before we
+        // get the write lock, but because only this thread inserts into its own local timer
+        // tree, there is no possible race here.
The only other thread is the clock thread, which
+        // only removes timers from the tree
+        let stub = timer.subscribe();
+        timers.insert(now, timer);
+        stub
+    }
+
+    fn is_paused_for_fork(&self) -> bool {
+        self.paused.load(Ordering::SeqCst)
+    }
+
+    /// Pause the timer for fork()
+    ///
+    /// Because RwLock across fork() is undefined behavior, this function makes sure that no one
+    /// holds any locks.
+    ///
+    /// This function should be called right before fork().
+    pub fn pause_for_fork(&self) {
+        self.paused.store(true, Ordering::SeqCst);
+        // wait for everything to get out of their locks
+        std::thread::sleep(RESOLUTION_DURATION * 2);
+    }
+
+    /// Unpause the timer after fork()
+    ///
+    /// This function should be called right after fork().
+    pub fn unpause(&self) {
+        self.paused.store(false, Ordering::SeqCst)
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::sync::Arc;
+
+    #[test]
+    fn test_round() {
+        assert_eq!(round_to(30, 10), 30);
+        assert_eq!(round_to(31, 10), 40);
+        assert_eq!(round_to(29, 10), 30);
+    }
+
+    #[test]
+    fn test_time() {
+        let t: Time = 128.into(); // t will round to 130
+        assert_eq!(t, Duration::from_millis(130).into());
+        assert!(!t.not_after(128));
+        assert!(!t.not_after(129));
+        assert!(t.not_after(130));
+        assert!(t.not_after(131));
+    }
+
+    #[tokio::test]
+    async fn test_timer_manager() {
+        let tm_a = Arc::new(TimerManager::new());
+        let tm = tm_a.clone();
+        std::thread::spawn(move || tm_a.clock_thread());
+
+        let now = Instant::now();
+        let t1 = tm.register_timer(Duration::from_secs(1));
+        let t2 = tm.register_timer(Duration::from_secs(1));
+        t1.poll().await;
+        assert_eq!(now.elapsed().as_secs(), 1);
+        let now = Instant::now();
+        t2.poll().await;
+        // t2 fired along with t1, so no extra wait time
+        assert_eq!(now.elapsed().as_secs(), 0);
+    }
+
+    #[test]
+    fn test_timer_manager_start_check() {
+        let tm = Arc::new(TimerManager::new());
+        assert!(tm.should_i_start_clock());
+        assert!(!tm.should_i_start_clock());
+        assert!(tm.is_clock_running().is_ok());
+    }
+
+    #[test]
+    fn test_timer_manager_watchdog() {
+        let tm = Arc::new(TimerManager::new());
+        assert!(tm.should_i_start_clock());
+        assert!(!tm.should_i_start_clock());
+
+        // we don't actually start the clock thread; sleep until the watchdog expires
+        std::thread::sleep(Duration::from_secs(DELAYS_SEC as u64 + 1));
+        assert!(tm.is_clock_running().is_err());
+        assert!(tm.should_i_start_clock());
+    }
+
+    #[tokio::test]
+    async fn test_timer_manager_pause() {
+        let tm_a = Arc::new(TimerManager::new());
+        let tm = tm_a.clone();
+        std::thread::spawn(move || tm_a.clock_thread());
+
+        let now = Instant::now();
+        let t1 = tm.register_timer(Duration::from_secs(2));
+        tm.pause_for_fork();
+        // no actual fork happens; we just test that pause and unpause work
+
+        // any timer registered in this critical section times out right away
+        let t2 = tm.register_timer(Duration::from_secs(2));
+        t2.poll().await;
+        assert_eq!(now.elapsed().as_secs(), 0);
+
+        std::thread::sleep(Duration::from_secs(1));
+        tm.unpause();
+        t1.poll().await;
+        assert_eq!(now.elapsed().as_secs(), 2);
+    }
+}
diff --git a/pingora/Cargo.toml b/pingora/Cargo.toml
new file mode 100644
index 0000000..c269199
--- /dev/null
+++ b/pingora/Cargo.toml
@@ -0,0 +1,50 @@
+[package]
+name = "pingora"
+version = "0.1.0"
+authors = ["Yuchen Wu <[email protected]>"]
+license = "Apache-2.0"
+edition = "2021"
+repository = "https://github.com/cloudflare/pingora"
+description = """
+A framework to build fast, reliable and programmable networked systems at Internet scale.
+""" +categories = ["asynchronous", "network-programming"] +keywords = ["async", "proxy", "http", "pingora"] + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html + +[lib] +name = "pingora" +path = "src/lib.rs" + +[dependencies] +pingora-core = { version = "0.1.0", path = "../pingora-core" } +pingora-http = { version = "0.1.0", path = "../pingora-http" } +pingora-timeout = { version = "0.1.0", path = "../pingora-timeout" } +pingora-load-balancing = { version = "0.1.0", path = "../pingora-load-balancing", optional = true } +pingora-proxy = { version = "0.1.0", path = "../pingora-proxy", optional = true } +pingora-cache = { version = "0.1.0", path = "../pingora-cache", optional = true } + +[dev-dependencies] +structopt = "0.3" +tokio = { workspace = true, features = ["rt-multi-thread", "signal"] } +matches = "0.1" +env_logger = "0.9" +reqwest = { version = "0.11", features = ["rustls"], default-features = false } +hyperlocal = "0.8" +hyper = "0.14" +jemallocator = "0.5" +async-trait = { workspace = true } +http = { workspace = true } +log = { workspace = true } +prometheus = "0.13" +once_cell = { workspace = true } +bytes = { workspace = true } + +[features] +default = ["openssl"] +openssl = ["pingora-core/openssl"] +boringssl = ["pingora-core/boringssl"] +proxy = ["pingora-proxy"] +lb = ["pingora-load-balancing", "proxy"] +cache = ["pingora-cache"] diff --git a/pingora/LICENSE b/pingora/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/pingora/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. 
For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/pingora/examples/app/echo.rs b/pingora/examples/app/echo.rs new file mode 100644 index 0000000..61f94e5 --- /dev/null +++ b/pingora/examples/app/echo.rs @@ -0,0 +1,98 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +use async_trait::async_trait; +use bytes::Bytes; +use http::{Response, StatusCode}; +use log::debug; +use once_cell::sync::Lazy; +use pingora_timeout::timeout; +use prometheus::{register_int_counter, IntCounter}; +use std::sync::Arc; +use std::time::Duration; +use tokio::io::{AsyncReadExt, AsyncWriteExt}; + +use pingora::apps::http_app::ServeHttp; +use pingora::apps::ServerApp; +use pingora::protocols::http::ServerSession; +use pingora::protocols::Stream; +use pingora::server::ShutdownWatch; + +static REQ_COUNTER: Lazy<IntCounter> = + Lazy::new(|| register_int_counter!("reg_counter", "Number of requests").unwrap()); + +#[derive(Clone)] +pub struct EchoApp; + +#[async_trait] +impl ServerApp for EchoApp { + async fn process_new( + self: &Arc<Self>, + mut io: Stream, + _shutdown: &ShutdownWatch, + ) -> Option<Stream> { + let mut buf = [0; 1024]; + loop { + let n = io.read(&mut buf).await.unwrap(); + if n == 0 { + debug!("session closing"); + return None; + } + io.write_all(&buf[0..n]).await.unwrap(); + io.flush().await.unwrap(); + } + } +} + +pub struct HttpEchoApp; + +#[async_trait] +impl ServeHttp for HttpEchoApp { + async fn response(&self, http_stream: &mut ServerSession) -> Response<Vec<u8>> { + REQ_COUNTER.inc(); + // read timeout of 2s + let read_timeout = 2000; + let body = match timeout( + Duration::from_millis(read_timeout), + http_stream.read_request_body(), + ) + .await + { + Ok(res) => match res.unwrap() { + Some(bytes) => bytes, + None => Bytes::from("no body!"), + }, + Err(_) => { + panic!("Timed out after {:?}ms", read_timeout); + } + }; + + Response::builder() + .status(StatusCode::OK) + .header(http::header::CONTENT_TYPE, "text/html") + .header(http::header::CONTENT_LENGTH, body.len()) + .body(body.to_vec()) + .unwrap() + } +} + +impl EchoApp { + pub fn new() -> Arc<Self> { + Arc::new(EchoApp {}) + } +} + +pub fn new_http_echo_app() -> Arc<HttpEchoApp> { + Arc::new(HttpEchoApp {}) +} diff --git a/pingora/examples/app/mod.rs b/pingora/examples/app/mod.rs new file mode 100644 index 0000000..7395897 --- /dev/null +++ b/pingora/examples/app/mod.rs @@ -0,0 +1,16 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +pub mod echo; +pub mod proxy; diff --git a/pingora/examples/app/proxy.rs b/pingora/examples/app/proxy.rs new file mode 100644 index 0000000..4ac0aae --- /dev/null +++ b/pingora/examples/app/proxy.rs @@ -0,0 +1,104 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +use async_trait::async_trait; +use log::debug; + +use std::sync::Arc; +use tokio::io::{AsyncReadExt, AsyncWriteExt}; +use tokio::select; + +use pingora::apps::ServerApp; +use pingora::connectors::TransportConnector; +use pingora::protocols::Stream; +use pingora::server::ShutdownWatch; +use pingora::upstreams::peer::BasicPeer; + +pub struct ProxyApp { + client_connector: TransportConnector, + proxy_to: BasicPeer, +} + +enum DuplexEvent { + DownstreamRead(usize), + UpstreamRead(usize), +} + +impl ProxyApp { + pub fn new(proxy_to: BasicPeer) -> Self { + ProxyApp { + client_connector: TransportConnector::new(None), + proxy_to, + } + } + + async fn duplex(&self, mut server_session: Stream, mut client_session: Stream) { + let mut upstream_buf = [0; 1024]; + let mut downstream_buf = [0; 1024]; + loop { + let downstream_read = server_session.read(&mut upstream_buf); + let upstream_read = client_session.read(&mut downstream_buf); + let event: DuplexEvent; + select! { + n = downstream_read => event + = DuplexEvent::DownstreamRead(n.unwrap()), + n = upstream_read => event + = DuplexEvent::UpstreamRead(n.unwrap()), + } + match event { + DuplexEvent::DownstreamRead(0) => { + debug!("downstream session closing"); + return; + } + DuplexEvent::UpstreamRead(0) => { + debug!("upstream session closing"); + return; + } + DuplexEvent::DownstreamRead(n) => { + client_session.write_all(&upstream_buf[0..n]).await.unwrap(); + client_session.flush().await.unwrap(); + } + DuplexEvent::UpstreamRead(n) => { + server_session + .write_all(&downstream_buf[0..n]) + .await + .unwrap(); + server_session.flush().await.unwrap(); + } + } + } + } +} + +#[async_trait] +impl ServerApp for ProxyApp { + async fn process_new( + self: &Arc<Self>, + io: Stream, + _shutdown: &ShutdownWatch, + ) -> Option<Stream> { + let client_session = self.client_connector.new_stream(&self.proxy_to).await; + + match client_session { + Ok(client_session) => { + self.duplex(io, client_session).await; + None + } + Err(e) => { + debug!("Failed to create client session: {}", e); + None + } + } + } +} diff --git a/pingora/examples/client.rs b/pingora/examples/client.rs new file mode 100644 index 0000000..c235dd1 --- /dev/null +++ b/pingora/examples/client.rs @@ -0,0 +1,40 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use pingora::connectors::http::Connector; +use pingora::upstreams::peer::HttpPeer; +use pingora_http::RequestHeader; + +#[tokio::main] +async fn main() { + let connector = Connector::new(None); + + let mut peer = HttpPeer::new("1.1.1.1:443", true, "one.one.one.one".into()); + peer.options.set_http_version(2, 1); + let (mut http, _reused) = connector.get_http_session(&peer).await.unwrap(); + + let mut new_request = RequestHeader::build("GET", b"/", None).unwrap(); + new_request + .insert_header("Host", "one.one.one.one") + .unwrap(); + http.write_request_header(Box::new(new_request)) + .await + .unwrap(); + // Servers usually don't respond until the full request body is read. 
+ http.finish_request_body().await.unwrap(); + http.read_response_header().await.unwrap(); + println!("{:#?}", http.response_header().unwrap()); + // TODO: continue reading the body + // TODO: return the connection back to the `connector` (or discard it) +} diff --git a/pingora/examples/server.rs b/pingora/examples/server.rs new file mode 100644 index 0000000..be60b3a --- /dev/null +++ b/pingora/examples/server.rs @@ -0,0 +1,163 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#[global_allocator] +static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc; + +use pingora::server::configuration::Opt; +use pingora::server::{Server, ShutdownWatch}; +use pingora::services::background::{background_service, BackgroundService}; +use pingora::services::{listening::Service as ListeningService, Service}; + +use async_trait::async_trait; +use structopt::StructOpt; +use tokio::time::interval; + +use std::time::Duration; +use std::vec::Vec; + +mod app; +mod service; + +pub struct ExampleBackgroundService; +#[async_trait] +impl BackgroundService for ExampleBackgroundService { + async fn start(&self, mut shutdown: ShutdownWatch) { + let mut period = interval(Duration::from_secs(1)); + loop { + tokio::select! { + _ = shutdown.changed() => { + // shutdown + break; + } + _ = period.tick() => { + // do some work + // ... 
+                }
+            }
+        }
+    }
+}
+
+use pingora::tls::pkey::{PKey, Private};
+use pingora::tls::x509::X509;
+struct DynamicCert {
+    cert: X509,
+    key: PKey<Private>,
+}
+
+impl DynamicCert {
+    fn new(cert: &str, key: &str) -> Box<Self> {
+        let cert_bytes = std::fs::read(cert).unwrap();
+        let cert = X509::from_pem(&cert_bytes).unwrap();
+
+        let key_bytes = std::fs::read(key).unwrap();
+        let key = PKey::private_key_from_pem(&key_bytes).unwrap();
+        Box::new(DynamicCert { cert, key })
+    }
+}
+
+#[async_trait]
+impl pingora::listeners::TlsAccept for DynamicCert {
+    async fn certificate_callback(&self, ssl: &mut pingora::tls::ssl::SslRef) {
+        use pingora::tls::ext;
+        ext::ssl_use_certificate(ssl, &self.cert).unwrap();
+        ext::ssl_use_private_key(ssl, &self.key).unwrap();
+    }
+}
+
+const USAGE: &str = r#"
+Usage
+port 6142: TCP echo server
+nc 127.0.0.1 6142
+
+port 6143: TLS echo server
+openssl s_client -connect 127.0.0.1:6143
+
+port 6145: HTTP echo server
+curl http://127.0.0.1:6145 -v -d 'hello'
+
+port 6148: HTTPS echo server
+curl https://127.0.0.1:6148 -vk -d 'hello'
+
+port 6141: TCP proxy
+curl http://127.0.0.1:6141 -v -H 'host: 1.1.1.1'
+
+port 6144: TLS proxy
+curl https://127.0.0.1:6144 -vk -H 'host: one.one.one.one' -o /dev/null
+
+port 6150: metrics endpoint
+curl http://127.0.0.1:6150
+"#;
+
+pub fn main() {
+    env_logger::init();
+
+    print!("{USAGE}");
+
+    let opt = Some(Opt::from_args());
+    let mut my_server = Server::new(opt).unwrap();
+    my_server.bootstrap();
+
+    let cert_path = format!("{}/tests/keys/server.crt", env!("CARGO_MANIFEST_DIR"));
+    let key_path = format!("{}/tests/keys/key.pem", env!("CARGO_MANIFEST_DIR"));
+
+    let mut echo_service = service::echo::echo_service();
+    echo_service.add_tcp("127.0.0.1:6142");
+    echo_service
+        .add_tls("0.0.0.0:6143", &cert_path, &key_path)
+        .unwrap();
+
+    let mut echo_service_http = service::echo::echo_service_http();
+    echo_service_http.add_tcp("0.0.0.0:6145");
+    echo_service_http.add_uds("/tmp/echo.sock", None);
+
+    let dynamic_cert = DynamicCert::new(&cert_path, &key_path);
+    let mut tls_settings = pingora::listeners::TlsSettings::with_callbacks(dynamic_cert).unwrap();
+    // By default the intermediate setting supports both TLS 1.2 and 1.3.
+    // We force TLS 1.2 just for the demo.
+    tls_settings
+        .set_max_proto_version(Some(pingora::tls::ssl::SslVersion::TLS1_2))
+        .unwrap();
+    tls_settings.enable_h2();
+    echo_service_http.add_tls_with_settings("0.0.0.0:6148", None, tls_settings);
+
+    let proxy_service = service::proxy::proxy_service(
+        "0.0.0.0:6141", // listen
+        "1.1.1.1:80",   // proxy to
+    );
+
+    let proxy_service_ssl = service::proxy::proxy_service_tls(
+        "0.0.0.0:6144",    // listen
+        "1.1.1.1:443",     // proxy to
+        "one.one.one.one", // SNI
+        &cert_path,
+        &key_path,
+    );
+
+    let mut prometheus_service_http = ListeningService::prometheus_http_service();
+    prometheus_service_http.add_tcp("127.0.0.1:6150");
+
+    let background_service = background_service("example", ExampleBackgroundService {});
+
+    let services: Vec<Box<dyn Service>> = vec![
+        Box::new(echo_service),
+        Box::new(echo_service_http),
+        Box::new(proxy_service),
+        Box::new(proxy_service_ssl),
+        Box::new(prometheus_service_http),
+        Box::new(background_service),
+    ];
+    my_server.add_services(services);
+    my_server.run_forever();
+}
diff --git a/pingora/examples/service/echo.rs b/pingora/examples/service/echo.rs
new file mode 100644
index 0000000..8126088
--- /dev/null
+++ b/pingora/examples/service/echo.rs
@@ -0,0 +1,24 @@
+// Copyright 2024 Cloudflare, Inc.
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use crate::app::echo::{new_http_echo_app, EchoApp, HttpEchoApp}; +use pingora::services::listening::Service; + +pub fn echo_service() -> Service<EchoApp> { + Service::new("Echo Service".to_string(), EchoApp::new()) +} + +pub fn echo_service_http() -> Service<HttpEchoApp> { + Service::new("Echo Service HTTP".to_string(), new_http_echo_app()) +} diff --git a/pingora/examples/service/mod.rs b/pingora/examples/service/mod.rs new file mode 100644 index 0000000..7395897 --- /dev/null +++ b/pingora/examples/service/mod.rs @@ -0,0 +1,16 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +pub mod echo; +pub mod proxy; diff --git a/pingora/examples/service/proxy.rs b/pingora/examples/service/proxy.rs new file mode 100644 index 0000000..a103b47 --- /dev/null +++ b/pingora/examples/service/proxy.rs @@ -0,0 +1,46 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+
+use crate::app::proxy::ProxyApp;
+use pingora_core::listeners::Listeners;
+use pingora_core::services::listening::Service;
+use pingora_core::upstreams::peer::BasicPeer;
+use std::sync::Arc;
+
+pub fn proxy_service(addr: &str, proxy_addr: &str) -> Service<ProxyApp> {
+    let proxy_to = BasicPeer::new(proxy_addr);
+
+    Service::with_listeners(
+        "Proxy Service".to_string(),
+        Listeners::tcp(addr),
+        Arc::new(ProxyApp::new(proxy_to)),
+    )
+}
+
+pub fn proxy_service_tls(
+    addr: &str,
+    proxy_addr: &str,
+    proxy_sni: &str,
+    cert_path: &str,
+    key_path: &str,
+) -> Service<ProxyApp> {
+    let mut proxy_to = BasicPeer::new(proxy_addr);
+    // set SNI to enable TLS
+    proxy_to.sni = proxy_sni.into();
+    Service::with_listeners(
+        "Proxy Service TLS".to_string(),
+        Listeners::tls(addr, cert_path, key_path).unwrap(),
+        Arc::new(ProxyApp::new(proxy_to)),
+    )
+}
diff --git a/pingora/src/lib.rs b/pingora/src/lib.rs
new file mode 100644
index 0000000..c714ca6
--- /dev/null
+++ b/pingora/src/lib.rs
@@ -0,0 +1,102 @@
+// Copyright 2024 Cloudflare, Inc.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#![warn(clippy::all)]
+#![allow(clippy::new_without_default)]
+#![allow(clippy::type_complexity)]
+#![allow(clippy::match_wild_err_arm)]
+#![allow(clippy::missing_safety_doc)]
+#![allow(clippy::upper_case_acronyms)]
+// This enables the feature that labels modules that are only available with
+// certain pingora features
+#![cfg_attr(docsrs, feature(doc_cfg))]
+
+//! # Pingora
+//!
+//! Pingora is a framework to build fast, reliable and programmable networked systems at Internet scale.
+//!
+//! # Features
+//! - HTTP 1.x and HTTP 2
+//! - Modern TLS with OpenSSL or BoringSSL (FIPS compatible)
+//! - Zero downtime upgrade
+//!
+//! # Usage
+//! This crate provides low level service and protocol implementation and abstraction.
+//!
+//! If you are looking to build a (reverse) proxy, see the [`pingora-proxy`](https://docs.rs/pingora-proxy) crate.
+//!
+//! # Crate features
+//! * `openssl`: Use OpenSSL as the internal TLS backend. This feature is on by default.
+//! * `boringssl`: Switch the internal TLS library from OpenSSL to BoringSSL. This feature disables `openssl`.
+//! * `proxy`: Include and export `pingora_proxy::prelude::*`.
+//! * `lb`: Include and export `pingora_load_balancing::prelude::*`.
+//! * `cache`: Include and export `pingora_cache::prelude::*`.
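+//!
+//! As a quick illustration (a sketch, not part of the original file; the version
+//! number is hypothetical), the optional APIs above are enabled via Cargo features:
+//! ```toml
+//! [dependencies]
+//! pingora = { version = "0.1", features = ["proxy", "lb"] }
+//! ```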
+
+pub use pingora_core::*;
+
+/// HTTP header objects that preserve HTTP header cases
+pub mod http {
+    pub use pingora_http::*;
+}
+
+#[cfg(feature = "cache")]
+#[cfg_attr(docsrs, doc(cfg(feature = "cache")))]
+/// Caching services and tooling
+pub mod cache {
+    pub use pingora_cache::*;
+}
+
+#[cfg(feature = "lb")]
+#[cfg_attr(docsrs, doc(cfg(feature = "lb")))]
+/// Load balancing recipes
+pub mod lb {
+    pub use pingora_load_balancing::*;
+}
+
+#[cfg(feature = "proxy")]
+#[cfg_attr(docsrs, doc(cfg(feature = "proxy")))]
+/// Proxying recipes
+pub mod proxy {
+    pub use pingora_proxy::*;
+}
+
+#[cfg(feature = "time")]
+#[cfg_attr(docsrs, doc(cfg(feature = "time")))]
+/// Timeouts and other useful time utilities
+pub mod time {
+    pub use pingora_timeout::*;
+}
+
+/// A useful set of types for getting started
+pub mod prelude {
+    pub use pingora_core::prelude::*;
+    pub use pingora_http::prelude::*;
+    pub use pingora_timeout::*;
+
+    #[cfg(feature = "cache")]
+    #[cfg_attr(docsrs, doc(cfg(feature = "cache")))]
+    pub use pingora_cache::prelude::*;
+
+    #[cfg(feature = "lb")]
+    #[cfg_attr(docsrs, doc(cfg(feature = "lb")))]
+    pub use pingora_load_balancing::prelude::*;
+
+    #[cfg(feature = "proxy")]
+    #[cfg_attr(docsrs, doc(cfg(feature = "proxy")))]
+    pub use pingora_proxy::prelude::*;
+
+    #[cfg(feature = "time")]
+    #[cfg_attr(docsrs, doc(cfg(feature = "time")))]
+    pub use pingora_timeout::*;
+}
diff --git a/pingora/tests/pingora_conf.yaml b/pingora/tests/pingora_conf.yaml
new file mode 100644
index 0000000..c21ae15
--- /dev/null
+++ b/pingora/tests/pingora_conf.yaml
@@ -0,0 +1,5 @@
+---
+version: 1
+client_bind_to_ipv4:
+    - 127.0.0.2
+ca_file: tests/keys/server.crt
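+# Illustrative notes (assumptions, not part of the original file):
+# - client_bind_to_ipv4 binds upstream client connections to the listed local IPv4s
+# - ca_file points at the CA certificate the tests trust for upstream TLS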
\ No newline at end of file diff --git a/tinyufo/Cargo.toml b/tinyufo/Cargo.toml new file mode 100644 index 0000000..a726715 --- /dev/null +++ b/tinyufo/Cargo.toml @@ -0,0 +1,41 @@ +[package] +name = "TinyUFO" +version = "0.1.0" +authors = ["Yuchen Wu <[email protected]>"] +edition = "2021" +license = "Apache-2.0" +description = "In-memory cache implementation with TinyLFU as the admission policy and S3-FIFO as the eviction policy" +repository = "https://github.com/cloudflare/pingora" +categories = ["algorithms", "caching"] +keywords = ["cache", "pingora"] + +# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html + +[lib] +name = "tinyufo" +path = "src/lib.rs" + +[dependencies] +ahash = { workspace = true } +flurry = "<0.5.0" # Try not to require Rust 1.71 +parking_lot = "0" +crossbeam-queue = "0" + +[dev-dependencies] +rand = "0" +lru = "0" +zipf = "7" +moka = { version = "0", features = ["sync"] } +dhat = "0" + +[[bench]] +name = "bench_perf" +harness = false + +[[bench]] +name = "bench_hit_ratio" +harness = false + +[[bench]] +name = "bench_memory" +harness = false diff --git a/tinyufo/LICENSE b/tinyufo/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/tinyufo/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/tinyufo/README.md b/tinyufo/README.md
new file mode 100644
index 0000000..50e2dd3
--- /dev/null
+++ b/tinyufo/README.md
@@ -0,0 +1,49 @@
+# TinyUFO
+
+TinyUFO is a fast and efficient in-memory cache. It adopts the state-of-the-art [S3-FIFO](https://s3fifo.com/) as well as [TinyLFU](https://arxiv.org/abs/1512.00727) algorithms to achieve high throughput and a high hit ratio at the same time.
+
+## Usage
+
+See docs
+
+## Performance Comparison
+We compare TinyUFO with [lru](https://crates.io/crates/lru), the most commonly used cache algorithm, and [moka](https://crates.io/crates/moka), another [great](https://github.com/rust-lang/crates.io/pull/3999) cache library that implements TinyLFU.
+
+### Hit Ratio
+
+The table below shows the cache hit ratios of the compared algorithms under different cache sizes, zipf=1.
+
+|cache size / total assets | TinyUFO | TinyUFO - LRU | TinyUFO - moka (TinyLFU) |
+| -------- | ------- | ------- | ------ |
+| 0.5% | 45.26% | +14.21pp | -0.33pp |
+| 1% | 52.35% | +13.19pp | +1.69pp |
+| 5% | 68.89% | +10.14pp | +1.91pp |
+| 10% | 75.98% | +8.39pp | +1.59pp |
+| 25% | 85.34% | +5.39pp | +0.95pp |
+
+Both TinyUFO and moka greatly improve the hit ratio over lru, and TinyUFO does better in this workload.
+[This paper](https://dl.acm.org/doi/pdf/10.1145/3600006.3613147) contains more thorough cache performance
+evaluations of S3-FIFO, from which TinyUFO varies, against many caching algorithms under a variety of workloads.
+
+### Speed
+
+The table below shows the number of operations performed per second for each cache library. The tests are performed using 8 threads on an x64 Linux desktop.
+
+| Setup | TinyUFO | LRU | moka |
+| -------- | ------- | ------- | ------ |
+| Pure read | 148.7 million ops | 7.0 million ops | 14.1 million ops |
+| Mixed read/write | 80.9 million ops | 6.8 million ops | 16.6 million ops |
+
+Because of TinyUFO's lock-free design, it greatly outperforms the others.
+
+### Memory overhead
+
+The table below shows the memory allocation (in bytes) of the compared cache libraries under certain workloads storing zero-sized assets.
+
+| cache size | TinyUFO | LRU | moka |
+| -------- | ------- | ------- | ------ |
+| 100 | 39,409 | 9,408 | 354,376 |
+| 1000 | 236,053 | 128,512 | 535,888 |
+| 10000 | 2,290,635 | 1,075,648 | 2,489,088 |
+
+Whether these overheads matter depends on the actual sizes and volume of the assets. The more advanced algorithms are likely to be less memory efficient than the simple LRU.
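+
+As a supplement to the Usage section above, here is a minimal usage sketch based on the API
+added in this change (`TinyUfo::new(total_weight_limit, estimated_size)`, `put` and `get`);
+the key/value types and the capacity numbers below are only illustrative:
+
+```rust
+use tinyufo::TinyUfo;
+
+fn main() {
+    // Total weight limit of 100, estimator sized for roughly 100 assets.
+    let cache: TinyUfo<&str, &str> = TinyUfo::new(100, 100);
+
+    // `put` returns the evicted key-value pairs, if any. Nothing is evicted
+    // here because the cache is far below its weight limit.
+    let evicted = cache.put("key", "value", 1); // weight 1
+    assert!(evicted.is_empty());
+
+    assert_eq!(cache.get(&"key"), Some("value"));
+}
+```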
\ No newline at end of file diff --git a/tinyufo/benches/bench_hit_ratio.rs b/tinyufo/benches/bench_hit_ratio.rs new file mode 100644 index 0000000..72dacd5 --- /dev/null +++ b/tinyufo/benches/bench_hit_ratio.rs @@ -0,0 +1,100 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +use rand::prelude::*; +use std::num::NonZeroUsize; + +const ITEMS: usize = 10_000; +const ITERATIONS: usize = 5_000_000; + +fn bench_one(zip_exp: f64, cache_size_percent: f32) { + print!("{zip_exp:.2}, {cache_size_percent:4}\t\t\t"); + let cache_size = (cache_size_percent * ITEMS as f32).round() as usize; + let mut lru = lru::LruCache::<u64, ()>::new(NonZeroUsize::new(cache_size).unwrap()); + let moka = moka::sync::Cache::new(cache_size as u64); + let tinyufo = tinyufo::TinyUfo::new(cache_size, cache_size); + + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(ITEMS, zip_exp).unwrap(); + + let mut lru_hit = 0; + let mut moka_hit = 0; + let mut tinyufo_hit = 0; + + for _ in 0..ITERATIONS { + let key = zipf.sample(&mut rng) as u64; + + if lru.get(&key).is_some() { + lru_hit += 1; + } else { + lru.push(key, ()); + } + + if moka.get(&key).is_some() { + moka_hit += 1; + } else { + moka.insert(key, ()); + } + + if tinyufo.get(&key).is_some() { + tinyufo_hit += 1; + } else { + tinyufo.put(key, (), 1); + } + } + + print!("{:.2}%\t\t", lru_hit as f32 / ITERATIONS as f32 * 100.0); + print!("{:.2}%\t\t", moka_hit as f32 / ITERATIONS as f32 * 100.0); + println!("{:.2}%", tinyufo_hit as f32 / ITERATIONS as f32 * 100.0); +} + +/* +cargo bench --bench bench_hit_ratio + +zipf & cache size lru moka TinyUFO +0.90, 0.005 19.23% 33.46% 33.35% +0.90, 0.01 26.21% 37.88% 40.10% +0.90, 0.05 45.59% 55.34% 57.81% +0.90, 0.1 55.73% 64.22% 66.34% +0.90, 0.25 71.18% 77.15% 78.53% +1.00, 0.005 31.09% 45.65% 45.13% +1.00, 0.01 39.17% 50.69% 52.23% +1.00, 0.05 58.73% 66.95% 68.81% +1.00, 0.1 67.57% 74.35% 75.93% +1.00, 0.25 79.91% 84.34% 85.27% +1.05, 0.005 37.68% 51.77% 51.26% +1.05, 0.01 46.11% 57.07% 58.41% +1.05, 0.05 65.04% 72.33% 73.91% +1.05, 0.1 73.11% 78.96% 80.22% +1.05, 0.25 83.77% 87.45% 88.16% +1.10, 0.005 44.48% 57.86% 57.25% +1.10, 0.01 52.97% 63.18% 64.23% +1.10, 0.05 70.94% 77.27% 78.57% +1.10, 0.1 78.11% 83.05% 84.06% +1.10, 0.25 87.08% 90.06% 90.62% +1.50, 0.005 85.25% 89.89% 89.68% +1.50, 0.01 89.88% 92.79% 92.94% +1.50, 0.05 96.04% 97.09% 97.25% +1.50, 0.1 97.52% 98.17% 98.26% +1.50, 0.25 98.81% 99.09% 99.10% + */ + +fn main() { + println!("zipf & cache size\t\tlru\t\tmoka\t\tTinyUFO",); + for zif_exp in [0.9, 1.0, 1.05, 1.1, 1.5] { + for cache_capacity in [0.005, 0.01, 0.05, 0.1, 0.25] { + bench_one(zif_exp, cache_capacity); + } + } +} diff --git a/tinyufo/benches/bench_memory.rs b/tinyufo/benches/bench_memory.rs new file mode 100644 index 0000000..e55a561 --- /dev/null +++ b/tinyufo/benches/bench_memory.rs @@ -0,0 +1,120 @@ +// Copyright 2024 Cloudflare, Inc. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#[global_allocator] +static ALLOC: dhat::Alloc = dhat::Alloc; + +use rand::prelude::*; +use std::num::NonZeroUsize; + +const ITERATIONS: usize = 5_000_000; + +fn bench_lru(zip_exp: f64, items: usize, cache_size_percent: f32) { + let cache_size = (cache_size_percent * items as f32).round() as usize; + let mut lru = lru::LruCache::<u64, ()>::new(NonZeroUsize::new(cache_size).unwrap()); + + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(items, zip_exp).unwrap(); + + for _ in 0..ITERATIONS { + let key = zipf.sample(&mut rng) as u64; + + if lru.get(&key).is_none() { + lru.push(key, ()); + } + } +} + +fn bench_moka(zip_exp: f64, items: usize, cache_size_percent: f32) { + let cache_size = (cache_size_percent * items as f32).round() as usize; + let moka = moka::sync::Cache::new(cache_size as u64); + + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(items, zip_exp).unwrap(); + + for _ in 0..ITERATIONS { + let key = zipf.sample(&mut rng) as u64; + + if moka.get(&key).is_none() { + moka.insert(key, ()); + } + } +} + +fn bench_tinyufo(zip_exp: f64, items: usize, cache_size_percent: f32) { + let cache_size = (cache_size_percent * items as f32).round() as usize; + let tinyufo = tinyufo::TinyUfo::new(cache_size, (cache_size as f32 * 1.0) as usize); + + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(items, zip_exp).unwrap(); + + for _ in 0..ITERATIONS { + let key = zipf.sample(&mut rng) as u64; + + if tinyufo.get(&key).is_none() { + tinyufo.put(key, (), 1); + } + } +} + +/* +cargo bench --bench bench_memory + +total items 1000, cache size 10% +lru +dhat: At t-gmax: 9,408 bytes in 106 blocks +moka +dhat: At t-gmax: 354,232 bytes in 1,581 blocks +TinyUFO +dhat: At t-gmax: 37,337 bytes in 351 blocks + +total items 10000, cache size 10% +lru +dhat: At t-gmax: 128,512 bytes in 1,004 blocks +moka +dhat: At t-gmax: 535,320 bytes in 7,278 blocks +TinyUFO +dhat: At t-gmax: 236,053 bytes in 2,182 blocks + +total items 100000, cache size 10% +lru +dhat: At t-gmax: 1,075,648 bytes in 10,004 blocks +moka +dhat: At t-gmax: 2,489,088 bytes in 62,374 blocks +TinyUFO +dhat: At t-gmax: 2,290,635 bytes in 20,467 blocks +*/ + +fn main() { + for items in [1000, 10_000, 100_000] { + println!("\ntotal items {items}, cache size 10%"); + { + let _profiler = dhat::Profiler::new_heap(); + bench_lru(1.05, items, 0.1); + println!("lru"); + } + + { + let _profiler = dhat::Profiler::new_heap(); + bench_moka(1.05, items, 0.1); + println!("\nmoka"); + } + + { + let _profiler = dhat::Profiler::new_heap(); + bench_tinyufo(1.05, items, 0.1); + println!("\nTinyUFO"); + } + } +} diff --git a/tinyufo/benches/bench_perf.rs b/tinyufo/benches/bench_perf.rs new file mode 100644 index 0000000..1295fb2 --- /dev/null +++ b/tinyufo/benches/bench_perf.rs @@ -0,0 +1,290 @@ +// Copyright 2024 Cloudflare, Inc. 
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use rand::prelude::*;
+use std::num::NonZeroUsize;
+use std::sync::Mutex;
+use std::thread;
+use std::time::Instant;
+
+const ITEMS: usize = 100;
+
+const ITERATIONS: usize = 5_000_000;
+const THREADS: usize = 8;
+
+/*
+cargo bench --bench bench_perf
+
+Note: the performance numbers vary a lot across platforms, CPUs and CPU archs.
+Below is from Linux + Ryzen 5 7600 CPU.
+
+lru read total 150.423567ms, 30ns avg per operation, 33239472 ops per second
+moka read total 462.133322ms, 92ns avg per operation, 10819389 ops per second
+tinyufo read total 199.007359ms, 39ns avg per operation, 25124698 ops per second
+
+lru read total 5.402631847s, 1.08µs avg per operation, 925474 ops per second
+...
+total 6960329 ops per second
+
+moka read total 2.742258211s, 548ns avg per operation, 1823314 ops per second
+...
+total 14072430 ops per second
+
+tinyufo read total 208.346855ms, 41ns avg per operation, 23998444 ops per second
+...
+total 148691408 ops per second
+
+lru mixed read/write 5.500309876s, 1.1µs avg per operation, 909039 ops per second, 407431 misses
+...
+total 6846743 ops per second
+
+moka mixed read/write 2.368500882s, 473ns avg per operation, 2111040 ops per second 279324 misses
+...
+total 16557962 ops per second
+
+tinyufo mixed read/write 456.134531ms, 91ns avg per operation, 10961678 ops per second, 294977 misses
+...
+total 80865792 ops per second +*/ + +fn main() { + // we don't bench eviction here so make the caches large enough to hold all + let lru = Mutex::new(lru::LruCache::<u64, ()>::unbounded()); + let moka = moka::sync::Cache::new(ITEMS as u64 + 10); + let tinyufo = tinyufo::TinyUfo::new(ITEMS + 10, 10); + + // populate first, then we bench access/promotion + for i in 0..ITEMS { + lru.lock().unwrap().put(i as u64, ()); + moka.insert(i as u64, ()); + tinyufo.put(i as u64, (), 1); + } + + // single thread + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(ITEMS, 1.03).unwrap(); + + let before = Instant::now(); + for _ in 0..ITERATIONS { + lru.lock().unwrap().get(&(zipf.sample(&mut rng) as u64)); + } + let elapsed = before.elapsed(); + println!( + "lru read total {elapsed:?}, {:?} avg per operation, {} ops per second", + elapsed / ITERATIONS as u32, + (ITERATIONS as f32 / elapsed.as_secs_f32()) as u32 + ); + + let before = Instant::now(); + for _ in 0..ITERATIONS { + moka.get(&(zipf.sample(&mut rng) as u64)); + } + let elapsed = before.elapsed(); + println!( + "moka read total {elapsed:?}, {:?} avg per operation, {} ops per second", + elapsed / ITERATIONS as u32, + (ITERATIONS as f32 / elapsed.as_secs_f32()) as u32 + ); + + let before = Instant::now(); + for _ in 0..ITERATIONS { + tinyufo.get(&(zipf.sample(&mut rng) as u64)); + } + let elapsed = before.elapsed(); + println!( + "tinyufo read total {elapsed:?}, {:?} avg per operation, {} ops per second", + elapsed / ITERATIONS as u32, + (ITERATIONS as f32 / elapsed.as_secs_f32()) as u32 + ); + + // concurrent + + let before = Instant::now(); + thread::scope(|s| { + for _ in 0..THREADS { + s.spawn(|| { + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(ITEMS, 1.03).unwrap(); + let before = Instant::now(); + for _ in 0..ITERATIONS { + lru.lock().unwrap().get(&(zipf.sample(&mut rng) as u64)); + } + let elapsed = before.elapsed(); + println!( + "lru read total {elapsed:?}, {:?} avg per operation, {} ops per second", + elapsed / ITERATIONS as u32, + (ITERATIONS as f32 / elapsed.as_secs_f32()) as u32 + ); + }); + } + }); + let elapsed = before.elapsed(); + println!( + "total {} ops per second", + (ITERATIONS as f32 * THREADS as f32 / elapsed.as_secs_f32()) as u32 + ); + + let before = Instant::now(); + thread::scope(|s| { + for _ in 0..THREADS { + s.spawn(|| { + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(ITEMS, 1.03).unwrap(); + let before = Instant::now(); + for _ in 0..ITERATIONS { + moka.get(&(zipf.sample(&mut rng) as u64)); + } + let elapsed = before.elapsed(); + println!( + "moka read total {elapsed:?}, {:?} avg per operation, {} ops per second", + elapsed / ITERATIONS as u32, + (ITERATIONS as f32 / elapsed.as_secs_f32()) as u32 + ); + }); + } + }); + let elapsed = before.elapsed(); + println!( + "total {} ops per second", + (ITERATIONS as f32 * THREADS as f32 / elapsed.as_secs_f32()) as u32 + ); + + let before = Instant::now(); + thread::scope(|s| { + for _ in 0..THREADS { + s.spawn(|| { + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(ITEMS, 1.03).unwrap(); + let before = Instant::now(); + for _ in 0..ITERATIONS { + tinyufo.get(&(zipf.sample(&mut rng) as u64)); + } + let elapsed = before.elapsed(); + println!( + "tinyufo read total {elapsed:?}, {:?} avg per operation, {} ops per second", + elapsed / ITERATIONS as u32, + (ITERATIONS as f32 / elapsed.as_secs_f32()) as u32 + ); + }); + } + }); + let elapsed = before.elapsed(); + println!( + "total {} ops per 
second", + (ITERATIONS as f32 * THREADS as f32 / elapsed.as_secs_f32()) as u32 + ); + + ///// bench mixed read and write ///// + const CACHE_SIZE: usize = 1000; + let items: usize = 10000; + const ZIPF_EXP: f64 = 1.3; + + let lru = Mutex::new(lru::LruCache::<u64, ()>::new( + NonZeroUsize::new(CACHE_SIZE).unwrap(), + )); + let before = Instant::now(); + thread::scope(|s| { + for _ in 0..THREADS { + s.spawn(|| { + let mut miss_count = 0; + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(items, ZIPF_EXP).unwrap(); + let before = Instant::now(); + for _ in 0..ITERATIONS { + let key = zipf.sample(&mut rng) as u64; + let mut lru = lru.lock().unwrap(); + if lru.get(&key).is_none() { + lru.put(key, ()); + miss_count += 1; + } + } + let elapsed = before.elapsed(); + println!( + "lru mixed read/write {elapsed:?}, {:?} avg per operation, {} ops per second, {miss_count} misses", + elapsed / ITERATIONS as u32, + (ITERATIONS as f32 / elapsed.as_secs_f32()) as u32 + ); + }); + } + }); + let elapsed = before.elapsed(); + println!( + "total {} ops per second", + (ITERATIONS as f32 * THREADS as f32 / elapsed.as_secs_f32()) as u32 + ); + + let moka = moka::sync::Cache::new(CACHE_SIZE as u64); + + let before = Instant::now(); + thread::scope(|s| { + for _ in 0..THREADS { + s.spawn(|| { + let mut miss_count = 0; + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(items, ZIPF_EXP).unwrap(); + let before = Instant::now(); + for _ in 0..ITERATIONS { + let key = zipf.sample(&mut rng) as u64; + if moka.get(&key).is_none() { + moka.insert(key, ()); + miss_count += 1; + } + } + let elapsed = before.elapsed(); + println!( + "moka mixed read/write {elapsed:?}, {:?} avg per operation, {} ops per second {miss_count} misses", + elapsed / ITERATIONS as u32, + (ITERATIONS as f32 / elapsed.as_secs_f32()) as u32 + ); + }); + } + }); + let elapsed = before.elapsed(); + println!( + "total {} ops per second", + (ITERATIONS as f32 * THREADS as f32 / elapsed.as_secs_f32()) as u32 + ); + + let tinyufo = tinyufo::TinyUfo::new(CACHE_SIZE, CACHE_SIZE); + let before = Instant::now(); + thread::scope(|s| { + for _ in 0..THREADS { + s.spawn(|| { + let mut miss_count = 0; + let mut rng = thread_rng(); + let zipf = zipf::ZipfDistribution::new(items, ZIPF_EXP).unwrap(); + let before = Instant::now(); + for _ in 0..ITERATIONS { + let key = zipf.sample(&mut rng) as u64; + if tinyufo.get(&key).is_none() { + tinyufo.put(key, (), 1); + miss_count +=1; + } + } + let elapsed = before.elapsed(); + println!( + "tinyufo mixed read/write {elapsed:?}, {:?} avg per operation, {} ops per second, {miss_count} misses", + elapsed / ITERATIONS as u32, + (ITERATIONS as f32 / elapsed.as_secs_f32()) as u32, + ); + }); + } + }); + + let elapsed = before.elapsed(); + println!( + "total {} ops per second", + (ITERATIONS as f32 * THREADS as f32 / elapsed.as_secs_f32()) as u32 + ); +} diff --git a/tinyufo/src/estimation.rs b/tinyufo/src/estimation.rs new file mode 100644 index 0000000..19d84d4 --- /dev/null +++ b/tinyufo/src/estimation.rs @@ -0,0 +1,188 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+use ahash::RandomState;
+use std::hash::Hash;
+use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};
+
+struct Estimator {
+    estimator: Box<[(Box<[AtomicU8]>, RandomState)]>,
+}
+
+impl Estimator {
+    fn optimal_paras(items: usize) -> (usize, usize) {
+        use std::cmp::max;
+        // derived from https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch
+        // width = ceil(e / ε)
+        // depth = ceil(ln(1/δ) / ln(2))
+        // Here we set ε = δ = 1 / items. For example, items = 1_000_000 gives
+        // width = ceil(e * 1_000_000) = 2_718_282 and depth = ceil(ln(1_000_000) / ln(2)) = 20.
+        let error_range = 1.0 / (items as f64);
+        let failure_probability = 1.0 / (items as f64);
+        (
+            max((std::f64::consts::E / error_range).ceil() as usize, 16),
+            max((failure_probability.ln() / 0.5f64.ln()).ceil() as usize, 2),
+        )
+    }
+
+    fn optimal(items: usize) -> Self {
+        let (slots, hashes) = Self::optimal_paras(items);
+        Self::new(hashes, slots)
+    }
+
+    /// Create a new `Estimator` with the given number of hashes and columns (slots).
+    pub fn new(hashes: usize, slots: usize) -> Self {
+        let mut estimator = Vec::with_capacity(hashes);
+        for _ in 0..hashes {
+            let mut slot = Vec::with_capacity(slots);
+            for _ in 0..slots {
+                slot.push(AtomicU8::new(0));
+            }
+            estimator.push((slot.into_boxed_slice(), RandomState::new()));
+        }
+
+        Estimator {
+            estimator: estimator.into_boxed_slice(),
+        }
+    }
+
+    /// Increment the counters of `key` and return the new estimated frequency.
+    pub fn incr<T: Hash>(&self, key: T) -> u8 {
+        let mut min = u8::MAX;
+        for (slot, hasher) in self.estimator.iter() {
+            let hash = hasher.hash_one(&key) as usize;
+            let counter = &slot[hash % slot.len()];
+            let (_current, new) = incr_no_overflow(counter);
+            min = std::cmp::min(min, new);
+        }
+        min
+    }
+
+    /// Get the estimated frequency of `key`.
+    pub fn get<T: Hash>(&self, key: T) -> u8 {
+        let mut min = u8::MAX;
+        for (slot, hasher) in self.estimator.iter() {
+            let hash = hasher.hash_one(&key) as usize;
+            let counter = &slot[hash % slot.len()];
+            let current = counter.load(Ordering::Relaxed);
+            min = std::cmp::min(min, current);
+        }
+        min
+    }
+
+    /// Right shift all values inside this `Estimator`.
+ pub fn age(&self, shift: u8) { + for (slot, _) in self.estimator.iter() { + for counter in slot.iter() { + // we don't CAS because the only update between the load and store + // is fetch_add(1), which should be fine to miss/ignore + let c = counter.load(Ordering::Relaxed); + counter.store(c >> shift, Ordering::Relaxed); + } + } + } +} + +fn incr_no_overflow(var: &AtomicU8) -> (u8, u8) { + loop { + let current = var.load(Ordering::Relaxed); + if current == u8::MAX { + return (current, current); + } + let new = if current == u8::MAX - 1 { + u8::MAX + } else { + current + 1 + }; + if let Err(new) = var.compare_exchange(current, new, Ordering::Acquire, Ordering::Relaxed) { + // someone else beat us to it + if new == u8::MAX { + // already max + return (current, new); + } // else, try again + } else { + return (current, new); + } + } +} + +// bare-minimum TinyLfu with CM-Sketch, no doorkeeper for now +pub(crate) struct TinyLfu { + estimator: Estimator, + window_counter: AtomicUsize, + window_limit: usize, +} + +impl TinyLfu { + pub fn get<T: Hash>(&self, key: T) -> u8 { + self.estimator.get(key) + } + + pub fn incr<T: Hash>(&self, key: T) -> u8 { + let window_size = self.window_counter.fetch_add(1, Ordering::Relaxed); + // When window_size concurrently increases, only one resets the window and age the estimator. + // > self.window_limit * 2 is a safety net in case for whatever reason window_size grows + // out of control + if window_size == self.window_limit || window_size > self.window_limit * 2 { + self.window_counter.store(0, Ordering::Relaxed); + self.estimator.age(1); // right shift 1 bit + } + self.estimator.incr(key) + } + + // because we use 8-bits counters, window size can be 256 * the cache size + pub fn new(cache_size: usize) -> Self { + Self { + estimator: Estimator::optimal(cache_size), + window_counter: Default::default(), + // 8x: just a heuristic to balance the memory usage and accuracy + window_limit: cache_size * 8, + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_cmk_paras() { + let (slots, hashes) = Estimator::optimal_paras(1_000_000); + // just smoke check some standard input + assert_eq!(slots, 2718282); + assert_eq!(hashes, 20); + } + + #[test] + fn test_tiny_lfu() { + let tiny = TinyLfu::new(1); + assert_eq!(tiny.get(1), 0); + assert_eq!(tiny.incr(1), 1); + assert_eq!(tiny.incr(1), 2); + assert_eq!(tiny.get(1), 2); + + assert_eq!(tiny.get(2), 0); + assert_eq!(tiny.incr(2), 1); + assert_eq!(tiny.incr(2), 2); + assert_eq!(tiny.get(2), 2); + + assert_eq!(tiny.incr(3), 1); + assert_eq!(tiny.incr(3), 2); + assert_eq!(tiny.incr(3), 3); + assert_eq!(tiny.incr(3), 4); + + // 8 incr(), now reset + + assert_eq!(tiny.incr(3), 3); + assert_eq!(tiny.incr(1), 2); + assert_eq!(tiny.incr(2), 2); + } +} diff --git a/tinyufo/src/lib.rs b/tinyufo/src/lib.rs new file mode 100644 index 0000000..879e373 --- /dev/null +++ b/tinyufo/src/lib.rs @@ -0,0 +1,632 @@ +// Copyright 2024 Cloudflare, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +//! 
An in-memory cache implementation with TinyLFU as the admission policy and [S3-FIFO](https://s3fifo.com/) as the eviction policy.
+//!
+//! TinyUFO improves the cache hit ratio noticeably compared to LRU.
+//!
+//! TinyUFO is lock-free. It is very fast in systems with many concurrent reads and/or writes.
+
+use ahash::RandomState;
+use crossbeam_queue::SegQueue;
+use flurry::HashMap;
+use std::marker::PhantomData;
+use std::sync::atomic::AtomicUsize;
+use std::sync::atomic::{
+    AtomicBool, AtomicU8,
+    Ordering::{Acquire, Relaxed, SeqCst},
+};
+mod estimation;
+use estimation::TinyLfu;
+use std::hash::Hash;
+
+const SMALL: bool = false;
+const MAIN: bool = true;
+
+// Indicates which queue an item is located in
+#[derive(Debug, Default)]
+struct Location(AtomicBool);
+
+impl Location {
+    fn new_small() -> Self {
+        Self(AtomicBool::new(SMALL))
+    }
+
+    fn value(&self) -> bool {
+        self.0.load(Relaxed)
+    }
+
+    fn is_main(&self) -> bool {
+        self.value()
+    }
+
+    fn move_to_main(&self) {
+        self.0.store(true, Relaxed);
+    }
+}
+
+// We have 8 bits to spare but we still cap at 3. This is to make sure that the main queue
+// in the worst case can find something to evict quickly
+const USES_CAP: u8 = 3;
+
+#[derive(Debug, Default)]
+struct Uses(AtomicU8);
+
+impl Uses {
+    pub fn inc_uses(&self) {
+        loop {
+            let uses = self.uses();
+            if uses >= USES_CAP {
+                return;
+            }
+            if let Err(new) = self.0.compare_exchange(uses, uses + 1, Acquire, Relaxed) {
+                // someone else beat us to it
+                if new >= USES_CAP {
+                    // already above cap
+                    return;
+                } // else, try again
+            } else {
+                return;
+            }
+        }
+    }
+
+    // decrease uses, return the previous value
+    pub fn decr_uses(&self) -> u8 {
+        loop {
+            let uses = self.uses();
+            if uses == 0 {
+                return 0;
+            }
+            if let Err(new) = self.0.compare_exchange(uses, uses - 1, Acquire, Relaxed) {
+                // someone else beat us to it
+                if new == 0 {
+                    return 0;
+                } // else, try again
+            } else {
+                return uses;
+            }
+        }
+    }
+
+    pub fn uses(&self) -> u8 {
+        self.0.load(Relaxed)
+    }
+}
+
+type Key = u64;
+type Weight = u16;
+
+/// The key-value pair returned from cache eviction
+#[derive(Clone)]
+pub struct KV<T> {
+    /// NOTE: we currently don't store the actual key in the cache. This returned value
+    /// is just the hash of it.
+    pub key: Key,
+    pub data: T,
+    pub weight: Weight,
+}
+
+// the data and its metadata
+struct Bucket<T> {
+    uses: Uses,
+    queue: Location,
+    weight: Weight,
+    data: T,
+}
+
+impl<T: Clone> Bucket<T> {
+    fn update_bucket(&self, main_queue: bool, data: T, weight: Weight) -> Self {
+        Self {
+            uses: Uses(self.uses.uses().into()),
+            queue: Location(main_queue.into()),
+            weight,
+            data,
+        }
+    }
+}
+
+const SMALL_QUEUE_PERCENTAGE: f32 = 0.1;
+
+struct FiFoQueues<T> {
+    total_weight_limit: usize,
+
+    small: SegQueue<Key>,
+    small_weight: AtomicUsize,
+
+    main: SegQueue<Key>,
+    main_weight: AtomicUsize,
+
+    // this replaces the ghost queue of S3-FIFO with a similar goal: tracking the evicted assets
+    estimator: TinyLfu,
+
+    _t: PhantomData<T>,
+}
+
+type Buckets<T> = HashMap<Key, Bucket<T>, RandomState>;
+
+impl<T: Clone + Send + Sync> FiFoQueues<T> {
+    fn admit(
+        &self,
+        key: Key,
+        data: T,
+        weight: u16,
+        ignore_lfu: bool,
+        buckets: &Buckets<T>,
+    ) -> Vec<KV<T>> {
+        // Note that we only use TinyLFU during cache admission but not cache read.
+        // So effectively we mostly sketch the popularity of less popular assets.
+        // In this way the sketch is a bit more accurate on these assets.
+ // Also we don't need another separated window cache to address the sparse burst issue as + // this sketch doesn't favor very popular assets much. + let new_freq = self.estimator.incr(key); + + assert!(weight > 0); + let new_bucket = { + let pinned_buckets = buckets.pin(); + let bucket = pinned_buckets.get(&key); + let Some(bucket) = bucket else { + let mut evicted = self.evict_to_limit(weight, buckets); + // TODO: figure out the right way to compare frequencies of different weights across + // many evicted assets. For now TinyLFU is only used when only evicting 1 item. + let (key, data, weight) = if !ignore_lfu && evicted.len() == 1 { + // Apply the admission algorithm of TinyLFU: compare the incoming new item + // and the evicted one. The more popular one is admitted to cache + let evicted_first = &evicted[0]; + let evicted_freq = self.estimator.get(evicted_first.key); + if evicted_freq > new_freq { + // put it back + let first = evicted.pop().expect("just check non-empty"); + // return the put value + evicted.push(KV { key, data, weight }); + (first.key, first.data, first.weight) + } else { + (key, data, weight) + } + } else { + (key, data, weight) + }; + + let bucket = Bucket { + queue: Location::new_small(), + weight, + uses: Default::default(), // 0 + data, + }; + let old = pinned_buckets.insert(key, bucket); + if old.is_none() { + // Always push key first before updating weight + // If doing the other order, another concurrent thread might not + // find things to evict + self.small.push(key); + self.small_weight.fetch_add(weight as usize, SeqCst); + } // else: two threads are racing adding the item + // TODO: compare old.weight and update accordingly + return evicted; + }; + + // the item exists, in case weight changes + let old_weight = bucket.weight; + bucket.uses.inc_uses(); + + fn update_atomic(weight: &AtomicUsize, old: u16, new: u16) { + if old == new { + return; + } + if old > new { + weight.fetch_sub((old - new) as usize, SeqCst); + } else { + weight.fetch_add((new - old) as usize, SeqCst); + } + } + if bucket.queue.is_main() { + update_atomic(&self.main_weight, old_weight, weight); + bucket.update_bucket(MAIN, data, weight) + } else { + update_atomic(&self.small_weight, old_weight, weight); + bucket.update_bucket(SMALL, data, weight) + } + }; + + // replace the existing one + buckets.pin().insert(key, new_bucket); + + // NOTE: there is a chance that the item itself is evicted if it happens to be the one selected + // by the algorithm. We could avoid this by checking if the item is in the returned evicted items, + // and then add it back. But to keep the code simple we just allow it to happen. + self.evict_to_limit(0, buckets) + } + + // the `extra_weight` is to essentially tell the cache to reserve that amount of weight for + // admission. It is used when calling `evict_to_limit` before admitting the asset itself. 
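+    //
+    // Illustrative walk-through (numbers are hypothetical): with total_weight_limit = 5,
+    // small_weight + main_weight = 5 and an incoming asset of weight 2, the loop below
+    // keeps calling evict_one() until small_weight + main_weight <= 3, so that admitting
+    // the new asset cannot push the total weight above the limit.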
+ fn evict_to_limit(&self, extra_weight: Weight, buckets: &Buckets<T>) -> Vec<KV<T>> { + let mut evicted = if self.total_weight_limit + < self.small_weight.load(SeqCst) + self.main_weight.load(SeqCst) + extra_weight as usize + { + Vec::with_capacity(1) + } else { + vec![] + }; + while self.total_weight_limit + < self.small_weight.load(SeqCst) + self.main_weight.load(SeqCst) + extra_weight as usize + { + if let Some(evicted_item) = self.evict_one(buckets) { + evicted.push(evicted_item); + } else { + break; + } + } + + evicted + } + + fn evict_one(&self, buckets: &Buckets<T>) -> Option<KV<T>> { + let evict_small = self.small_weight_limit() <= self.small_weight.load(SeqCst); + + if evict_small { + let evicted = self.evict_one_from_small(buckets); + // evict_one_from_small could just promote everything to main without evicting any + // so need to evict_one_from_main if nothing evicted + if evicted.is_some() { + return evicted; + } + } + self.evict_one_from_main(buckets) + } + + fn small_weight_limit(&self) -> usize { + (self.total_weight_limit as f32 * SMALL_QUEUE_PERCENTAGE).floor() as usize + 1 + } + + fn evict_one_from_small(&self, buckets: &Buckets<T>) -> Option<KV<T>> { + loop { + let Some(to_evict) = self.small.pop() else { + // empty queue, this is caught between another pop() and fetch_sub() + return None; + }; + let pinned_buckets = buckets.pin(); + let maybe_bucket = pinned_buckets.get(&to_evict); + + let Some(bucket) = maybe_bucket.as_ref() else { + //key in queue but not bucket, shouldn't happen, but ignore + continue; + }; + + let weight = bucket.weight; + self.small_weight.fetch_sub(weight as usize, SeqCst); + + if bucket.uses.uses() > 1 { + // move to main + bucket.queue.move_to_main(); + self.main.push(to_evict); + self.main_weight.fetch_add(weight as usize, SeqCst); + // continue until find one to evict + continue; + } + // move to ghost + + let data = bucket.data.clone(); + let weight = bucket.weight; + pinned_buckets.remove(&to_evict); + return Some(KV { + key: to_evict, + data, + weight, + }); + } + } + + fn evict_one_from_main(&self, buckets: &Buckets<T>) -> Option<KV<T>> { + loop { + let Some(to_evict) = self.main.pop() else { + return None; + }; + let buckets = buckets.pin(); + let maybe_bucket = buckets.get(&to_evict); + if let Some(bucket) = maybe_bucket.as_ref() { + if bucket.uses.decr_uses() > 0 { + // put it back + self.main.push(to_evict); + // continue the loop + } else { + // evict + let weight = bucket.weight; + self.main_weight.fetch_sub(weight as usize, SeqCst); + let data = bucket.data.clone(); + buckets.remove(&to_evict); + return Some(KV { + key: to_evict, + data, + weight, + }); + } + } // else: key in queue but not bucket, shouldn't happen + } + } +} + +/// [TinyUfo] cache +pub struct TinyUfo<K, T> { + queues: FiFoQueues<T>, + buckets: HashMap<Key, Bucket<T>, RandomState>, + random_status: RandomState, + _k: PhantomData<K>, +} + +impl<K: Hash, T: Clone + Send + Sync> TinyUfo<K, T> { + /// Create a new TinyUfo cache with the given weight limit and the given + /// size limit of the ghost queue. 
+    pub fn new(total_weight_limit: usize, estimated_size: usize) -> Self {
+        let queues = FiFoQueues {
+            small: SegQueue::new(),
+            small_weight: 0.into(),
+            main: SegQueue::new(),
+            main_weight: 0.into(),
+            total_weight_limit,
+            estimator: TinyLfu::new(estimated_size),
+            _t: PhantomData,
+        };
+        TinyUfo {
+            queues,
+            buckets: HashMap::with_capacity_and_hasher(estimated_size, RandomState::new()),
+            random_status: RandomState::new(),
+            _k: PhantomData,
+        }
+    }
+
+    // TODO: with_capacity()
+
+    /// Read the given key
+    ///
+    /// Return Some(T) if the key exists
+    pub fn get(&self, key: &K) -> Option<T> {
+        let key = self.random_status.hash_one(key);
+        let buckets = self.buckets.pin();
+        buckets.get(&key).map(|p| {
+            p.uses.inc_uses();
+            p.data.clone()
+        })
+    }
+
+    /// Put the key value to the [TinyUfo]
+    ///
+    /// Return a list of [KV] of key and `T` that are evicted
+    pub fn put(&self, key: K, data: T, weight: Weight) -> Vec<KV<T>> {
+        let key = self.random_status.hash_one(key);
+        self.queues.admit(key, data, weight, false, &self.buckets)
+    }
+
+    /// Always put the key value to the [TinyUfo]
+    ///
+    /// Return a list of [KV] of key and `T` that are evicted
+    ///
+    /// Similar to [Self::put] but guarantees the insertion of the asset.
+    /// In [Self::put], the TinyLFU check may reject putting the current asset if it is less
+    /// popular than the one being evicted.
+    ///
+    /// In some real world use cases, a few reads to the same asset may be waiting for the put
+    /// action to finish so that they can read the asset from cache. Neither of the above
+    /// behaviors is ideal for this use case.
+    ///
+    /// Compared to [Self::put], the hit ratio when using this function is reduced by about
+    /// 0.5pp or less under zipf workloads.
+    pub fn force_put(&self, key: K, data: T, weight: Weight) -> Vec<KV<T>> {
+        let key = self.random_status.hash_one(key);
+        self.queues.admit(key, data, weight, true, &self.buckets)
+    }
+
+    #[cfg(test)]
+    fn peek_queue(&self, key: K) -> Option<bool> {
+        let key = self.random_status.hash_one(key);
+        self.buckets.pin().get(&key).map(|p| p.queue.value())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_uses() {
+        let uses: Uses = Default::default();
+        assert_eq!(uses.uses(), 0);
+        uses.inc_uses();
+        assert_eq!(uses.uses(), 1);
+        for _ in 0..USES_CAP {
+            uses.inc_uses();
+        }
+        assert_eq!(uses.uses(), USES_CAP);
+
+        for _ in 0..USES_CAP + 2 {
+            uses.decr_uses();
+        }
+        assert_eq!(uses.uses(), 0);
+    }
+
+    #[test]
+    fn test_evict_from_small() {
+        let cache = TinyUfo::new(5, 5);
+
+        cache.put(1, 1, 1);
+        cache.put(2, 2, 2);
+        cache.put(3, 3, 2);
+        // cache full now
+
+        assert_eq!(cache.peek_queue(1), Some(SMALL));
+        assert_eq!(cache.peek_queue(2), Some(SMALL));
+        assert_eq!(cache.peek_queue(3), Some(SMALL));
+
+        let evicted = cache.put(4, 4, 3);
+        assert_eq!(evicted.len(), 2);
+        assert_eq!(evicted[0].data, 1);
+        assert_eq!(evicted[1].data, 2);
+
+        assert_eq!(cache.peek_queue(1), None);
+        assert_eq!(cache.peek_queue(2), None);
+        assert_eq!(cache.peek_queue(3), Some(SMALL));
+    }
+
+    #[test]
+    fn test_evict_from_small_to_main() {
+        let cache = TinyUfo::new(5, 5);
+
+        cache.put(1, 1, 1);
+        cache.put(2, 2, 2);
+        cache.put(3, 3, 2);
+        // cache full now
+
+        cache.get(&1);
+        cache.get(&1); // 1 will be moved to main during next eviction
+
+        assert_eq!(cache.peek_queue(1), Some(SMALL));
+        assert_eq!(cache.peek_queue(2), Some(SMALL));
+        assert_eq!(cache.peek_queue(3), Some(SMALL));
+
+        let evicted = cache.put(4, 4, 1);
+        assert_eq!(evicted.len(), 1);
+
assert_eq!(evicted[0].data, 2); + + assert_eq!(cache.peek_queue(1), Some(MAIN)); + // 2 is evicted because 1 is in main + assert_eq!(cache.peek_queue(2), None); + assert_eq!(cache.peek_queue(3), Some(SMALL)); + assert_eq!(cache.peek_queue(4), Some(SMALL)); + } + + #[test] + fn test_evict_reentry() { + let cache = TinyUfo::new(5, 5); + + cache.put(1, 1, 1); + cache.put(2, 2, 2); + cache.put(3, 3, 2); + // cache full now + + assert_eq!(cache.peek_queue(1), Some(SMALL)); + assert_eq!(cache.peek_queue(2), Some(SMALL)); + assert_eq!(cache.peek_queue(3), Some(SMALL)); + + let evicted = cache.put(4, 4, 1); + assert_eq!(evicted.len(), 1); + assert_eq!(evicted[0].data, 1); + + assert_eq!(cache.peek_queue(1), None); + assert_eq!(cache.peek_queue(2), Some(SMALL)); + assert_eq!(cache.peek_queue(3), Some(SMALL)); + assert_eq!(cache.peek_queue(4), Some(SMALL)); + + let evicted = cache.put(1, 1, 1); + assert_eq!(evicted.len(), 1); + assert_eq!(evicted[0].data, 2); + + assert_eq!(cache.peek_queue(1), Some(SMALL)); + assert_eq!(cache.peek_queue(2), None); + assert_eq!(cache.peek_queue(3), Some(SMALL)); + assert_eq!(cache.peek_queue(4), Some(SMALL)); + } + + #[test] + fn test_evict_entry_denied() { + let cache = TinyUfo::new(5, 5); + + cache.put(1, 1, 1); + cache.put(2, 2, 2); + cache.put(3, 3, 2); + // cache full now + + assert_eq!(cache.peek_queue(1), Some(SMALL)); + assert_eq!(cache.peek_queue(2), Some(SMALL)); + assert_eq!(cache.peek_queue(3), Some(SMALL)); + + // trick: put a few times to bump their frequencies + cache.put(1, 1, 1); + cache.put(2, 2, 2); + cache.put(3, 3, 2); + + let evicted = cache.put(4, 4, 1); + assert_eq!(evicted.len(), 1); + assert_eq!(evicted[0].data, 4); // 4 is returned + + assert_eq!(cache.peek_queue(1), Some(SMALL)); + assert_eq!(cache.peek_queue(2), Some(SMALL)); + assert_eq!(cache.peek_queue(3), Some(SMALL)); + assert_eq!(cache.peek_queue(4), None); + } + + #[test] + fn test_force_put() { + let cache = TinyUfo::new(5, 5); + + cache.put(1, 1, 1); + cache.put(2, 2, 2); + cache.put(3, 3, 2); + // cache full now + + assert_eq!(cache.peek_queue(1), Some(SMALL)); + assert_eq!(cache.peek_queue(2), Some(SMALL)); + assert_eq!(cache.peek_queue(3), Some(SMALL)); + + // trick: put a few times to bump their frequencies + cache.put(1, 1, 1); + cache.put(2, 2, 2); + cache.put(3, 3, 2); + + // force put will replace 1 with 4 even through 1 is more popular + let evicted = cache.force_put(4, 4, 1); + assert_eq!(evicted.len(), 1); + assert_eq!(evicted[0].data, 1); // 1 is returned + + assert_eq!(cache.peek_queue(1), None); + assert_eq!(cache.peek_queue(2), Some(SMALL)); + assert_eq!(cache.peek_queue(3), Some(SMALL)); + assert_eq!(cache.peek_queue(4), Some(SMALL)); + } + + #[test] + fn test_evict_from_main() { + let cache = TinyUfo::new(5, 5); + + cache.put(1, 1, 1); + cache.put(2, 2, 2); + cache.put(3, 3, 2); + // cache full now + + // all 3 will qualify to main + cache.get(&1); + cache.get(&1); + cache.get(&2); + cache.get(&2); + cache.get(&3); + cache.get(&3); + + let evicted = cache.put(4, 4, 1); + assert_eq!(evicted.len(), 1); + assert_eq!(evicted[0].data, 1); + + // 1 kicked from main + assert_eq!(cache.peek_queue(1), None); + assert_eq!(cache.peek_queue(2), Some(MAIN)); + assert_eq!(cache.peek_queue(3), Some(MAIN)); + assert_eq!(cache.peek_queue(4), Some(SMALL)); + + let evicted = cache.put(1, 1, 1); + assert_eq!(evicted.len(), 1); + assert_eq!(evicted[0].data, 4); + + assert_eq!(cache.peek_queue(1), Some(SMALL)); + assert_eq!(cache.peek_queue(2), Some(MAIN)); + 
assert_eq!(cache.peek_queue(3), Some(MAIN));
+        assert_eq!(cache.peek_queue(4), None);
+    }
+}