How Does Plug.Cowboy Work?

A deep dive into how the https://github.com/elixir-plug/plug_cowboy library works. This tour is not aimed at extensively covering every single aspect of the Plug.Cowboy library, but rather at providing a good understanding of the main mechanisms operating under the hood.

tiagonbotelho

Overview

Today we'll look at how an HTTP request is handled by Plug and all the different systems involved in making that happen. By the end of this code tour, you'll understand the full lifecycle of an HTTP request, from a socket being opened to your response being sent back.

Architecture Overview

The overall architecture of Plug.Cowboy relies on four main components in order to work:

Plug

Plug is a specification to help you build web endpoints. It gives you the tools to handle HTTP requests, set status codes and send responses back. Here's a very simple example:
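
A minimal sketch of such a plug, assuming a module named Example.HelloWorldPlug (the name referenced later in this tour); the original example's exact body may differ:

```elixir
defmodule Example.HelloWorldPlug do
  import Plug.Conn

  # init/1 receives the options the plug is mounted with and may transform
  # them; here we simply pass them through.
  def init(opts), do: opts

  # call/2 receives the %Plug.Conn{} for the current request plus the options
  # returned by init/1, and must return a (possibly modified) conn.
  def call(conn, _opts) do
    conn
    |> put_resp_content_type("text/plain")
    |> send_resp(200, "Hello, world!")
  end
end
```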

However, Plug by itself isn't capable of handling any HTTP requests, and this is precisely where Cowboy comes in.

Cowboy

Cowboy is the actual web server, written in Erlang, that parses and processes every incoming request and outgoing response. Cowboy works in tandem with Ranch, which we'll briefly cover below.

Plug.Cowboy

Plug can use a multitude of different web servers via different adapters. Plug.Cowboy is a very slim Plug adapter that specifically "glues" Cowboy to Plug.

Ranch

Cowboy understands how to handle the HTTP protocol, but it still does not know how to handle incoming socket connections or manage the TCP layer, and for this it depends on Ranch to do the work.

Ranch is a socket acceptor pool for TCP protocols, written in Erlang, that is widely used by web servers in both the Elixir and Erlang ecosystems.

Receiving a request

In this section you'll get a better understanding of how the lifecycle of a request is managed. To illustrate the flow, we will assume that our client is sending a simple GET request with no additional headers or body (e.g. `GET http://localhost:4000`) to our Example Application.

Before we begin, it is important to know that Ranch, the underlying socket acceptor used by Cowboy and therefore by Plug.Cowboy, has two main moving pieces:

* Ranch acceptors - a fleet of processes actively accepting connections from external clients
* Ranch connections - once an acceptor accepts a new connection, it delegates to one of these processes to actually begin processing the incoming request

Everything begins with the fleet of acceptors initialised by the Ranch acceptors supervisor. This fleet works at the "Transport" layer (TCP in our case, which translates into the ranch_tcp module being used), waiting indefinitely for incoming connections to accept. Most socket operations are handled by `ranch_tcp`, which is for the most part a thin wrapper around OTP's `gen_tcp`, the native interface to TCP/IP sockets in Erlang.

As soon as a connection gets accepted, a connection socket is returned, which is then passed to ranch_tcp:controlling_process. According to the documentation (https://erlang.org/doc/man/gen_tcp.html#controlling_process-2), this delegates the handling of the socket to the provided process (our connection's process in this case), as sketched below.
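
To make the accept-then-hand-off pattern concrete, here's a minimal, hypothetical sketch in Elixir using :gen_tcp directly; this is not Ranch's actual code, just the same idea in miniature:

```elixir
# A toy acceptor, not Ranch's implementation: accept a connection, spawn a
# dedicated process for it, and hand over socket ownership with
# controlling_process/2 so that process receives the socket's messages.
defmodule AcceptorSketch do
  def accept_loop(listen_socket) do
    {:ok, socket} = :gen_tcp.accept(listen_socket)

    pid = spawn(fn -> connection_loop() end)
    :ok = :gen_tcp.controlling_process(socket, pid)
    send(pid, {:socket, socket})

    # Immediately go back to waiting for the next connection.
    accept_loop(listen_socket)
  end

  defp connection_loop do
    receive do
      {:socket, socket} ->
        # Ask for a single packet, then handle whatever the socket delivers.
        :inet.setopts(socket, active: :once)

        receive do
          {:tcp, ^socket, data} -> IO.inspect(data, label: "received")
          {:tcp_closed, ^socket} -> :ok
        end
    end
  end
end
```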

Once the connection is bound to its controlling process, that process will receive socket events, such as data being pushed through or a disconnect happening. The call finishes up by invoking ranch_conns_sup:start_protocol/2, which we'll now dig deeper into.

ranch_conns_sup:start_protocol/2

We begin by sending a start_protocol message to the connections supervisor process.

The connections supervisor process, on the other hand, waits for any incoming messages sent to it and, as illustrated below, we're interested in the start_protocol message that was sent by the code snippet above.

117 receive
118 {?MODULE, start_protocol, To, Socket} ->
119 try Protocol:start_link(Ref, Socket, Transport, Opts) of
The cowboy_clear (Protocol) process gets initialised, returning its Pid, which we will use below
120 {ok, Pid} ->
121 handshake(State, CurConns, NbChildren, Sleepers, To, Socket, Pid, Pid);
A handshake routine between the newly initialised cowboy_clear process and Ranch begins to take place
225 handshake(State=#state{ref=Ref, transport=Transport, handshake_timeout=HandshakeTimeout,
226 max_conns=MaxConns}, CurConns, NbChildren, Sleepers, To, Socket, SupPid, ProtocolPid) ->
227 case Transport:controlling_process(Socket, ProtocolPid) of
The protocol process (cowboy_clear) will now be assigned to receive messages from the provided socket
228 ok ->
229 ProtocolPid ! {handshake, Ref, Transport, Socket, HandshakeTimeout},
Emit the handshake message along with the Socket so that cowboy_clear becomes aware of it
230 put(SupPid, active),
231 CurConns2 = CurConns + 1,
232 if CurConns2 < MaxConns ->
If we're still below the maximum number of connections, we can process the request straight away
233 To ! self(),
We send a message back to the acceptor that handed us the socket, flagging that the connection is now being handled, which, in turn, unblocks the acceptor so it can accept additional connections
234 loop(State, CurConns2, NbChildren + 1, Sleepers);

The cowboy_clear start_link function starts its own process, which invokes the connection_process routine.

The connection_process/4 function synchronises with Ranch's conns_sup through the ranch:handshake/1 function and delegates the processing of the request to the configured protocol (cowboy_http in our case).

The relevant part of Ranch's handshake can be seen below, showing the format of the expected message (which is emitted by the connections supervisor's handshake routine we saw above).

241 handshake(Ref, Opts) ->
242 receive {handshake, Ref, Transport, CSocket, HandshakeTimeout} ->
243 case Transport:handshake(CSocket, Opts, HandshakeTimeout) of
In our case the Transport handshake simply returns the provided CSocket back to us (things would be different if we were exploring the SSL transport rather than TCP)
244 OK = {ok, _} ->
245 OK;

Let's dive deeper into what happens when the protocol, which in our case is Cowboy HTTP, gets initialised.

Cowboy HTTP

With the connection established and acknowledged, we can look at how the request itself gets processed. This began in none other than the Cowboy Clear module we explored above, which, after going through a handshake routine with Ranch and receiving the connection socket in return, finally delegates the work of processing that socket to the chosen protocol module, which in our case is Cowboy HTTP. Given that we're exploring the simplest flow possible, this code becomes a lot simpler, since we can skip most of the SSL logic in it.

The loop function handles a multitude of scenarios which we won't get to explore in this code tour; we will instead focus on the simplest scenario possible: parsing the incoming GET request.

The State passed to parsing always gets initialised as #ps_request_line{empty_lines=0}, which causes the function below to be our first match.

From reading HTTP's RFC7230, we learn that the request line always follows this format: `request-line = method SP request-target SP HTTP-version CRLF`

Therefore, our first step towards parsing the request would be the method.

To do so, we sequentially process each character until we find the first space character.

We then proceed to the second part of the request line: parsing the request-target, or URI. For brevity we'll only explore the plain-HTTP case, since that's the one used in our example.
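
To make the character-by-character approach concrete, here is a small, hypothetical Elixir sketch of the same idea; cowboy_http's real parser is far stricter about validation and keeps its position in a state record:

```elixir
# Simplified illustration of request-line parsing, not cowboy_http's code.
defmodule RequestLineSketch do
  # RequestLineSketch.parse("GET / HTTP/1.1\r\n") => {"GET", "/", "HTTP/1.1"}
  def parse(line) do
    {method, rest} = until_space(line, <<>>)
    {target, rest} = until_space(rest, <<>>)
    {version, _rest} = until_crlf(rest, <<>>)
    {method, target, version}
  end

  # Accumulate bytes one at a time until the first space, mirroring how
  # cowboy_http walks the binary.
  defp until_space(<<?\s, rest::binary>>, acc), do: {acc, rest}
  defp until_space(<<c, rest::binary>>, acc), do: until_space(rest, <<acc::binary, c>>)

  defp until_crlf(<<?\r, ?\n, rest::binary>>, acc), do: {acc, rest}
  defp until_crlf(<<c, rest::binary>>, acc), do: until_crlf(rest, <<acc::binary, c>>)
end
```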

I won't continue much further into the parsing, as it is pretty much the same pattern applied until we reach the end of the line. Once we're done with the request line we move on to the headers, which follow the exact same pattern explored above.

With the request now parsed, we can shift our attention to the function that receives the result of parse_request/3: the after_parse/1 function. This function simply delegates its work to the Cowboy Stream module, which we will explore next.

Cowboy Stream

The Cowboy Stream module can be summarised as taking a list of handlers and going through them one by one until the request has been fully processed.

These handlers can be configured, but for the sake of simplicity we'll just explore the default one for now (cowboy_stream_h).
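
Under the hood the chain is simply Cowboy's stream_handlers protocol option. As a rough illustration (the listener name below is made up, and the dispatch mirrors the shape we'll meet later in this tour), starting Cowboy directly with the default chain spelled out would look something like this:

```elixir
# Illustration only: cowboy_stream_h is already the default; a custom stream
# handler module would be prepended to this list.
dispatch =
  :cowboy_router.compile([
    {:_, [{:_, Plug.Cowboy.Handler, {Example.HelloWorldPlug, []}}]}
  ])

{:ok, _pid} =
  :cowboy.start_clear(:example_http_listener, [port: 4000], %{
    env: %{dispatch: dispatch},
    stream_handlers: [:cowboy_stream_h]
  })
```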

cowboy_stream_h

We're getting closer to the edge of our example application, but before we reach it we still have to go through a sequence of middlewares, which get passed to a request_process function. We'll be using the defaults (cowboy_router and cowboy_handler) in this tour, but these too can be configured.

Each of the provided middlewares gets executed in sequence; following the default sequence, we'd first go through cowboy_router and then cowboy_handler.

Let's then explore each middleware in closer detail.

Cowboy Router

The whole point of Cowboy Router is quite simple: it "traverses" the dispatch rules provided by Plug.Cowboy, breaks them apart, and joins the result together into a Request structure and an Environment structure.

These values ultimately get passed to the next stage of the Middleware pipeline, which turns out to be Cowboy Handler.

Cowboy Handler

We're almost at our final destination! The Cowboy Handler simply takes the Request/Environment pair produced by the Cowboy Router and invokes the handler it points at. As it happens, the handler contained within the Environment structure is none other than Plug.Cowboy.Handler, and its options contain the HelloWorldPlug we've implemented.

Plug.Cowboy.Handler

The handler will, in turn, call the plug we defined at the beginning of this tour to apply our own logic to the request, and finally emit a response back to the client through the maybe_send function.
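
Schematically, the handler does something along these lines; this is a simplified sketch rather than the module's actual source:

```elixir
# A simplified sketch of what Plug.Cowboy.Handler does, not its real code.
defmodule HandlerSketch do
  @connection Plug.Cowboy.Conn

  # Cowboy calls init/2 with the cowboy_req map and the {plug, opts} tuple
  # that Plug.Cowboy stored in the dispatch rules.
  def init(req, {plug, opts}) do
    # Build a %Plug.Conn{} backed by the Plug.Cowboy.Conn adapter...
    conn = @connection.conn(req)

    # ...run it through our plug (Example.HelloWorldPlug in this tour)...
    conn = plug.call(conn, opts)

    # ...and make sure a response goes out if the plug only set one up.
    %Plug.Conn{adapter: {@connection, req}} = maybe_send(conn, plug)

    {:ok, req, {plug, opts}}
  end

  defp maybe_send(%Plug.Conn{state: :unset}, plug),
    do: raise("#{inspect(plug)} did not send a response")

  defp maybe_send(%Plug.Conn{state: :set} = conn, _plug),
    do: Plug.Conn.send_resp(conn)

  defp maybe_send(%Plug.Conn{} = conn, _plug), do: conn
end
```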

Sending a response

We've explored how the request gets processed, all the way from being emitted to finally being handled by our own application logic. Let's now explore sending a response back to the client. It all begins with the maybe_send function mentioned above delegating the task of emitting the response to an "adapter", which in our case is Plug.Cowboy.Conn.

The actual job of sending the response back is, in turn, passed to cowboy_req:reply/4.
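
For reference, a call to that function has roughly this shape (status, headers and body are example values here; the real ones come from the %Plug.Conn{} our plug returned):

```elixir
# Example values only; Plug.Cowboy.Conn passes through whatever the plug set.
defmodule ReplySketch do
  def send_response(req) do
    :cowboy_req.reply(
      200,
      %{"content-type" => "text/plain"},
      "Hello, world!",
      req
    )
  end
end
```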

That reply function ends up calling the do_reply function to build the actual response, which in turn defers to cast/2.

cast/2 is quite simple: all it does is send a message to the PID handling the connection (which we now know is the Cowboy Clear process) with the contents of the response.

The Cowboy Clear process, in turn, will be waiting for a message in this exact format to begin processing it.

info/3 itself calls the commands function which, as the name suggests, executes a series of commands, one of them being sending the response back to the client, as we see below.

And that is it! We now understand how Plug.Cowboy both receives requests and sends back responses over a TCP connection.

Starting Plug.Cowboy

Below, we initialise all the options to start a Plug.Cowboy process under our application supervision tree, namely:

* scheme - can be either http or https (depending on whether you'd like to have TLS enabled)
* plug - your plug, which will process incoming requests and return a response
* options - all options available to Plug.Cowboy (which will impact the underlying Cowboy and Ranch configuration). In this case we only make use of `port` to specify which port we want to open for listening to requests (4000), but a full list of all the options is available here: https://hexdocs.pm/plug_cowboy/Plug.Cowboy.html#module-options
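
A sketch of what that usually looks like; the module names mirror the example application referenced throughout this tour:

```elixir
defmodule Example.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # scheme: :http or :https, plug: the module handling requests,
      # options: everything else (here, just the port to listen on).
      {Plug.Cowboy,
       scheme: :http,
       plug: Example.HelloWorldPlug,
       options: [port: 4000]}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: Example.Supervisor)
  end
end
```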

As you can see in our Example Application, the most common way of starting Plug.Cowboy is to specify it under your application's supervision tree with a series of options (such as the port number), which ends up invoking the code below.

This child_spec function is invoked through the Supervisor.start_link/2 call as a means of initialising every child under the supervision tree with the provided options that we covered in the "Example App" section.

Plug.Cowboy.child_spec/2

There are two main aspects of Plug.Cowboy.child_spec/2 that we need to understand. The first one is the dispatcher which, unless explicitly configured, is built from the provided plug (in this case our HelloWorldPlug) and produces the dispatch rules that, in the end, will result in our plug being called to handle the incoming request.
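
Roughly, that default dispatch looks like the sketch below: a catch-all host and path entry pointing every request at Plug.Cowboy.Handler, with our plug and its options as the handler's initial state.

```elixir
# Approximate shape of the dispatch built when you don't pass :dispatch yourself.
dispatch = [
  {:_, [
    {:_, Plug.Cowboy.Handler, {Example.HelloWorldPlug, []}}
  ]}
]
```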

The second one is the fact that Plug.Cowboy actually starts a Ranch process underneath, which we'll dive into next.

Ranch supervisor

Building on what was mentioned above, if we take a closer look at the start field of the returned child spec, we notice the following format:

{ranch_listener_sup, start_link, [Ref, Transport, TransOpts, Protocol, ProtoOpts]}

This is again an MFA, and it will initialise the Ranch listener process under our application's supervision tree, which we will begin to explore in the next section. If we look deeper into the options being passed to the Ranch listener process, we have:

* Ref - the listener name, in our case "Example.HelloWorldPlug.HTTP", built by Plug.Cowboy.build_ref/2 which combines the scheme (http) with the plug name (Example.HelloWorldPlug)
* Transport - since we are using HTTP (and not HTTPS), the module used will be "ranch_tcp"
* TransportOpts0 - the options that will be provided to the Ranch transport module
* Protocol - provided through the "cowboy_protocol" variable seen above; it too is inferred from the scheme being used, which in this case results in "cowboy_clear" being used
* ProtoOpts - as with TransportOpts0, these are the options that will be provided to the Ranch protocol module

Afterwards, all these values get returned to Plug.Cowboy, which in turn passes them to the application's supervisor to be initialised, resulting in ranch_listener_sup:start_link/5 being called. Now the magic begins to happen.

Through the start_link function we begin by invoking set_new_listener_opts, which stores all our listener options (such as the maximum number of connections) in an ETS table, as seen below; these are then retrieved at multiple stages while handling a request.

Afterwards, we end up calling the init/1 function through supervisor:start_link/2, self-referencing ranch_listener_sup, which initialises the two most important processes of this whole tour:

* ranch_conns_sup - the connections supervisor
* ranch_acceptors_sup - the supervisor in charge of managing our whole fleet of acceptors for incoming connections

Ranch Acceptors Supervisor

The processes under this supervisor are responsible for starting the server on the correct address and beginning to accept connections, which are then delegated to the Ranch connections supervisor.

The ranch_acceptors_sup:init function is tasked with both setting up a socket to begin listening on the configured port number (4000 in our case)

and initialising a "fleet" of acceptors under its supervision (100 in our case), each one its own process, which will accept any incoming connection established with our server on the configured port.
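
A toy model of those two responsibilities (not Ranch's code): open a single listening socket, then start a fleet of processes that all block on accept against it.

```elixir
defmodule AcceptorFleetSketch do
  # Open one listening socket and start `acceptors` processes that each sit in
  # an accept loop; concurrent accepts on the same listen socket are queued by
  # the runtime, so whichever acceptor is free picks up the next connection.
  def start(port \\ 4000, acceptors \\ 100) do
    {:ok, listen_socket} =
      :gen_tcp.listen(port, [:binary, active: false, reuseaddr: true])

    for _ <- 1..acceptors do
      spawn_link(fn -> accept_loop(listen_socket) end)
    end

    listen_socket
  end

  defp accept_loop(listen_socket) do
    {:ok, socket} = :gen_tcp.accept(listen_socket)
    # In Ranch this is the point where the connection is handed off to
    # ranch_conns_sup; here we just close it to keep the sketch tiny.
    :gen_tcp.close(socket)
    accept_loop(listen_socket)
  end
end
```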

Ranch Conns Supervisor

The Ranch connections supervisor runs in an indefinite loop, waiting for messages asking it to start handling new connections.

As we've seen from the acceptor, its task is to send a message to the connections supervisor requesting it to start the protocol for the accepted connection, as we see below.

After this, our entire application is fired up and ready to begin accepting incoming requests from our clients.

References

* Plug - https://github.com/elixir-plug/plug
* Plug.Cowboy - https://github.com/elixir-plug/plug_cowboy
* Cowboy - https://github.com/ninenines/cowboy
* Ranch - https://github.com/ninenines/ranch
* gen_tcp - https://erlang.org/doc/man/gen_tcp.html
* HTTP's RFC7230 - https://tools.ietf.org/html/rfc7230#section-3.1.1
