internals.rst revision 83eb86cb
1############
2Library Guts
3############
4
5.. highlight:: c
6
7Introduction
8************
9
10
11lsquic inception dates back to the fall of 2016. Since that time, lsquic
12underwent several major changes. Some of those had to do with making the
13library more performant; others were needed to add important new
14functionality (for example, IETF QUIC and HTTP/3). Throughout this time,
15one of the main principles we embraced is that **performance trumps
16everything else**, including code readability and maintainability. This
17focus drove code design decisions again and again and it explains some
18of the hairiness that we will come across in this document.
19
20Code Version
21============
22
23The code version under discussion is v2.29.6.
24
25Coding Style
26************
27
28Spacing and Cuddling
29====================
30
31lsquic follows the LiteSpeed spacing and cuddling conventions:
32
33-  Two empty lines between function definitions
34
35-  Four-space indentation
36
37-  Ifs and elses are not cuddled
38
39Function Name Alignment
40=======================
41
42In function definitions, the name is always left-aligned, for example:
43
44::
45
46    static void
47    check_flush_threshold (lsquic_stream_t *stream)
48
49
50Naming Conventions
51==================
52
53-  Struct members usually have prefixes derived from the struct name.
54   For example members of ``struct qpack_dec_hdl`` begin with ``qdh_``,
55   members of ``struct cid_update_batch`` begin with ``cub_``, and so on.
56   This is done to reduce the need to memorize struct member names as
57   vim's autocomplete (Ctrl-P) functionality makes it easy to fill in
58   the needed identifier.
59
60-  Non-static functions all begin with ``lsquic_``.
61
62-  Functions usually begin with a module name (e.g. ``lsquic_engine_`` or
63   ``stream_``) which is then followed by a verb (e.g.
64   ``lsquic_engine_connect`` or ``stream_activate_hq_frame``). If a
65   function does not begin with a module name, it begins with a verb
66   (e.g. ``check_flush_threshold`` or ``maybe_remove_from_write_q``).
67
68-  Underscores are used to separate words (as opposed to, for example,
69   theCamelCase).
70
71Typedefs
72========
73
74Outside of user-facing API, structs, unions, and enums are not
75typedefed. On the other hand, some integral types are typedefed.
76
77List of Common Terms
78********************
79
80-  **gQUIC** Google QUIC. The original lsquic supported only Google
81   QUIC. gQUIC is going to become obsolete. (Hopefully soon).
82
83-  **HQ** This stands for "HTTP-over-QUIC", the original name of HTTP/3.
84   The code predates the official renaming to HTTP/3 and thus there
85   are many types and names with some variation of ``HQ`` in them.
86
87-  **iQUIC** This stands for IETF QUIC. To differentiate between gQUIC
88   and IETF QUIC, we use ``iquic`` in some names and types.
89
90-  **Public Reset** In the IETF QUIC parlance, this is called the *stateless*
91   reset.  Because gQUIC was first to be implemented, this name is still
92   used in the code, even when the IETF QUIC stateless reset is meant.
93   You will see names that contain strings like "prst" and "pubres".
94
95High-Level Structure
96********************
97
98At a high level, the lsquic library can be used to instantiate an engine
99(or several engines). An engine manages connections; each connection has
100streams. Engine, connection, and stream objects are exposed to the user
101who interacts with them using the API (see :doc:`apiref`). All other data
102structures are internal and are hanging off, in one way or another, from
103the engine, connection, or stream objects.
104
105Engine
106******
107
108*Files: lsquic_engine.c, lsquic_engine_public.h, lsquic.h*
109
110Data Structures
111===============
112
113out_batch
114---------
115
116::
117
118    /* The batch of outgoing packets grows and shrinks dynamically */
119    /* Batch sizes do not have to be powers of two */
120    #define MAX_OUT_BATCH_SIZE 1024
121    #define MIN_OUT_BATCH_SIZE 4
122    #define INITIAL_OUT_BATCH_SIZE 32
123
124    struct out_batch
125    {
126        lsquic_conn_t           *conns  [MAX_OUT_BATCH_SIZE];
127        struct lsquic_out_spec   outs   [MAX_OUT_BATCH_SIZE];
128        unsigned                 pack_off[MAX_OUT_BATCH_SIZE];
129        lsquic_packet_out_t     *packets[MAX_OUT_BATCH_SIZE * 2];
130        struct iovec             iov    [MAX_OUT_BATCH_SIZE * 2];
131    };
132
133The array of struct lsquic_out_specs -- outs above -- is what gets
134passed to the user callback ``ea_packets_out()``. ``conns`` array corresponds
135to the spec elements one to one.
136
137``pack_off`` records which packet in ``packets`` corresponds to which
138connection in ``conns``. Because of coalescing, an element in ``outs`` can
139correspond (logically) to more than one packet. (See how the batch is
140constructed in `Batching packets`_.) On the
141other hand, ``packets`` and ``iov`` arrays have one-to-one correspondence.
142
143There is one instance of this structure per engine: the whole thing is
144allocated as part of `struct lsquic_engine <#lsquic-engine>`__.
145
146
147cid_update_batch
148----------------
149
150::
151
152    struct cid_update_batch
153    {
154        lsquic_cids_update_f    cub_update_cids;
155        void                   *cub_update_ctx;
156        unsigned                cub_count;
157        lsquic_cid_t            cub_cids[20];
158        void                   *cub_peer_ctxs[20];
159    };
160
161This struct is used to batch CID updates.
162
163There are three user-defined CID liveness callbacks: ``ea_new_scids``,
164``ea_live_scids``, and ``ea_old_scids``. These functions all have the same
165signature, ``lsquic_cids_update_f``. When the batch reaches the count of
16620 (kept in ``cub_count``), the callback is called.
167
168The new SCIDs batch is kept in `struct
169lsquic_engine <#lsquic-engine>`__. Other batches are allocated on the
170stack in different functions as necessary.
171
17220 is an arbitrary number.
173
174lsquic_engine_public
175--------------------
176
177This struct, defined in lsquic_engine_public.h, is the "public"
178interface to the engine. ("Public" here means accessible by other
179modules inside lsquic, not that it's a public interface like the
180:doc:`apiref`.) Because there are many things in the engine object that
181are accessed by other modules, this struct is used to expose those
182(``public``) parts of the engine.
183
184``lsquic_engine_struct`` is the first member of
185`lsquic_engine <#lsquic-engine>`__. The functions declared in
186lsquic_engine_public.h take a pointer to lsquic_engine_public as the
187first argument, which is then case to lsquic_engine.
188
189This is somewhat ugly, but it's not too bad, as long as one remembers
190that the two pointers are interchangeable.
191
192lsquic_engine
193-------------
194
195This is the central data structure. The engine instance is the root of
196all other data structures. It contains:
197
198-  Pointers to connections in several lists and hashes (see `Connection Management <#connection-management>`__)
199
200-  Memory manager
201
202-  Engine settings
203
204-  Token generator
205
206-  CID Purgatory
207
208-  Server certificate cache
209
210-  Transport parameter cache
211
212-  Packet request queue
213
214-  `Outgoing packet batch <#out-batch>`__
215
216-  And several other things
217
218Some of the members above are stored in the ``pub`` member of type
219`lsquic_engine_public <#lsquic-engine-public>`__. These are accessed
220directly from other parts of lsquic.
221
222The engine is instantiated via ``lsquic_engine_new()`` and destroyed via
223``lsquic_engine_destroy()``
224
225Connection Management
226=====================
227
228Lifetime
229--------
230
231There are several `connection types`_. All types of
232connections begin their life inside the engine module, where their
233constructors are called. They all also end their life here as well: this
234is where the destructors are called.
235
236The connection constructors are all different function calls:
237
238-  lsquic_ietf_full_conn_client_new
239
240-  lsquic_gquic_full_conn_client_new
241
242-  lsquic_ietf_full_conn_server_new
243
244-  lsquic_gquic_full_conn_server_new
245
246-  lsquic_mini_conn_ietf_new
247
248-  lsquic_mini_conn_new
249
250-  lsquic_prq_new_req
251
252-  lsquic_prq_new_req_ext
253
254(See `Evanescent Connection`_ for information about the last two.)
255
256After a connection is instantiated, all further interactions with it,
257including destruction, are done via the `Common Connection Interface`_.
258
259Refcounting Model
260-----------------
261
262Each connection is referenced by at least one of the following data
263structures:
264
2651. CID-to-connection hash. This hash is used to find connections in
266   order to dispatch an incoming packet. Connections can be hashed by
267   CIDs or by address. In the former case, each connection has one or
268   more mappings in the hash table. IETF QUIC connections have up to
269   eight (in our implementation) source CIDs and each of those would
270   have a mapping. In client mode, depending on QUIC versions and
271   options selected, it is may be necessary to hash connections by
272   address, in which case incoming packets are delivered to
273   connections based on the address.
274
2752. Outgoing queue. This queue holds connections that have packets to
276   send.
277
2783. `Tickable Queue`_. This queue holds connections
279   that `can be ticked now <#tickability>`__.
280
2814. `Advisory Tick Time Queue`_.
282
2835. Closing connections queue. This is a transient queue -- it only
284   exists for the duration of
285   `process_connections() <#processing-connections>`__ function call.
286
2876. Ticked connections queue. Another transient queue, similar to the
288   above.
289
290The idea is to destroy the connection when it is no longer referenced.
291For example, a connection tick may return TICK_SEND|TICK_CLOSE. In that
292case, the connection is referenced from two places: (2) and (5). After
293its packets are sent, it is only referenced in (5), and at the end of
294the function call, when it is removed from (5), reference count goes to
295zero and the connection is destroyed. (See function ``destroy_conn``.) If
296not all packets can be sent, at the end of the function call, the
297connection is referenced by (2) and will only be removed once all
298outgoing packets have been sent.
299
300.. image:: lsquic-engine-conns.png
301
302In the diagram above, you can see that the CID-to-connection hash has
303several links to the same connection. This is because an IETF QUIC
304connection has more than one Source Connection IDs (SCIDs), any of which
305can be included by the peer into the packet. See ``insert_conn_into_hash``
306for more details.
307
308References from each of these data structures are tracked inside the
309connection object by bit flags:
310
311::
312
313    #define CONN_REF_FLAGS  (LSCONN_HASHED          \
314                            |LSCONN_HAS_OUTGOING    \
315                            |LSCONN_TICKABLE        \
316                            |LSCONN_TICKED          \
317                            |LSCONN_CLOSING         \
318                            |LSCONN_ATTQ)
319
320Functions ``engine_incref_conn`` and ``engine_decref_conn`` manage setting
321and unsetting of these flags.
322
323
324Notable Code
325============
326
327Handling incoming packets
328-------------------------
329
330Incoming UDP datagrams are handed off to the lsquic library using the
331function ``lsquic_engine_packet_in``. Depending on the engine mode --
332client or server -- the appropriate `packet
333parsing <#parsing-packets>`__ function is selected.
334
335Because a UDP datagram can contain more than one QUIC packet, the
336parsing is done in a loop. If the first part of packet parsing is
337successful, the internal ``process_packet_in`` function is called.
338
339There, most complexity is contained in ``find_or_create_conn``, which gets
340called for the server side. Here, parsing of the packet is finished, now
341via the version-specific call to ``pf_parse_packet_in_finish``. If
342connection is not found, it may need to be created. Before that, the
343following steps are performed:
344
345-  Check that engine is not in the cooldown mode
346
347-  Check that the maximum number of mini connections is not exceeded
348
349-  Check that the (D)CID specified in the packet is not in the `CID Purgatory`_
350
351-  Check that the packet can be used to create a mini conn: it contains
352   version information and the version is supported
353
354-  Depending on QUIC version, perform token verification, if necessary
355
356Only then does the mini connection constructor is called and the
357connection is inserted into appropriate structures.
358
359Processing connections
360----------------------
361
362Connections are processed in the internal function
363``process_connections``. There is the main connection processing loop and
364logic.
365
366All connections that the iterator passed to this function returns are
367processed in the first while loop. The ``ci_tick()`` call is what causes
368the underlying connection to do all it needs to (most importantly,
369dispatch user events and generate outgoing packets). The return value
370dictates what lists -- global and local to the function -- the
371connection will be placed upon.
372
373Note that mini connection promotion happens inside this loop. Newly
374created full connections are processed inside the same while loop. For a
375short time, a mini and a full connection object exist that are
376associated with the same logical connection.
377
378After all connections are ticked, outgoing packets, if there are any,
379`are sent out <#batching-packets>`__.
380
381Then, connections that were closed by the first while loop above are
382finally closed.
383
384Connections that were ticked (and not closed) are either:
385
386-  Put back onto the ``tickable`` queue;
387
388-  Added to the `Advisory Tick Time Queue`_; or
389
390-  Left unqueued. This can happen when both idle and ping timer are
391   turned off. (This should not happen for the connections that we
392   expect to process, though.)
393
394And lastly, CID liveness updates are reported to the user via the
395optional SCIDs callbacks: ``ea_new_scids`` etc.
396
397Tickable Queue Cycle
398--------------------
399
400When a connection is ticked, it is removed from the `Tickable
401Queue <#tickable-queue>`__ and placed onto the transient Ticked Queue.
402After outgoing packets are sent and some connections are closed, the
403Ticked Queue is examined: the engine queries each remaining connection
404again whether it's tickable. If it is, back onto the Tickable Queue it
405goes. This should not happen often, however. It may occur when RTT is
406low and there are many connections to process. In that case, once all
407connections have been processed, the pacer now allows to send another
408packet because some time has passed.
409
410Batching packets
411----------------
412
413Packet-sending entry point is the function ``send_packets_out``. The main
414idea here is as follows:
415
416Iterate over connections that have packets to send (those are on the
417Outgoing queue in the engine). For each connection, ask it for the next
418outgoing packet, encrypt it, and place it into the batch. When the batch
419is full, `send the batch <#sending-a-batch>`__.
420
421The outgoing packets from all connections are interleaved. For example,
422if connections A, B, and C are on the Outgoing queue, the batch will
423contain packets A1, B1, C1, A2, B2, C2, A3, B3, C3, ��� and so on. This is
424done to ensure fairness. When a connection runs out of packets to send,
425it returns NULL and is removed from the iterator.
426
427The idea is simple, but the devil is in the details. The code may be
428difficult to read. There are several things going on:
429
430Conns Out Iterator
431^^^^^^^^^^^^^^^^^^
432
433This iterator, ``conns_out_iter``, sends packets from connections on the
434Outgoing queue and packets on the Packet Request queue. (The latter
435masquerade as `Evanescent Connections <#evanescent-connection>`__ so that they
436are simple to use.) First, the Outgoing queue (which is a min-heap) is
437drained. Then, packets from the Packet Request queue are sent, if there
438are any. Then, remaining connections from the first pass are returned in
439the round-robin fashion.
440
441After sending is completed, the connections that still have outgoing
442packets to send are placed back onto the Outgoing queue.
443
444
445Packet Coalescing
446^^^^^^^^^^^^^^^^^
447
448Some IETF QUIC packets can be coalesced. This reduces the number of UDP
449datagrams that need to be sent during the handshake. To support this, if
450a packet matches some parameters, the same connection is queried for
451another packet, which, if it returns, is added to the current batch
452slot's iov.
453
454::
455
456    if ((conn->cn_flags & LSCONN_IETF)
457        && ((1 << packet_out->po_header_type)
458          & ((1 << HETY_INITIAL)|(1 << HETY_HANDSHAKE)|(1 << HETY_0RTT)))
459        && (engine->flags & ENG_COALESCE)
460        && iov < batch->iov + sizeof(batch->iov) / sizeof(batch->iov[0]))
461    {
462        const struct to_coal to_coal = {
463            .prev_packet = packet_out,
464            .prev_sz_sum = iov_size(packet_iov, iov),
465        };
466        packet_out = conn->cn_if->ci_next_packet_to_send(conn, &to_coal);
467        if (packet_out)
468            goto next_coa;
469    }
470    batch->outs   [n].iovlen = iov - packet_iov;
471
472*With some debug code removed for simplicity*
473
474Also see the description of the batch in `out_batch`_.
475
476Note that packet coalescing is only done during the handshake of an IETF
477QUIC connection. Non-handshake and gQUIC packets cannot be coalesced.
478
479Sending and Refilling the Batch
480^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
481
482When the batch is sent inside the while loop, and the whole batch was
483sent successfully, the batch pointers are reset, the batch potentially
484grows larger, and the while loop continues.
485
486Batch Resizing
487^^^^^^^^^^^^^^
488
489When all datagrams in the batch are sent successfully, the batch may
490grow -- up to the hardcoded maximum value of ``MAX_OUT_BATCH_SIZE``. When
491not all datagrams are sent, the batch shrinks. The batch size survives
492the call into the library: when packets are sent again, the same batch
493size is used to begin the sending.
494
495Deadline Checking
496^^^^^^^^^^^^^^^^^
497
498This is a rather old safety check dating back to the summer of 2017,
499when we first shipped QUIC support. The way we send packets has changed
500since then -- there is high possibility that this code can be removed
501with no ill effect.
502
503Sending a batch
504---------------
505
506When the batch is filled, it is handed off to the function ``send_batch``,
507which calls the user-supplied callback to send packets out. The
508high-level logic is as follows:
509
510-  Update each packet's ``sent`` time
511
512-  Call the "send packets out" callback
513
514-  For packets that were sent successfully, call ``ci_packet_sent``
515
516-  For packets that were not sent, call ``ci_packet_not_sent``. This is
517   important: all packets returned by ``ci_next_packet_to_send`` must
518   be returned to the connection via either these two calls above or
519   via ``ci_packet_too_large`` (see below).
520
521-  Return the number of packets sent
522
523Because of support for coalescing, we have to map from outgoing spec to
524packets via ``batch->pack_off``. This is done in several places in this
525function.
526
527To handle the case when a PMTU probe is too large (stuff happens!), the
528code checks for EMSGSIZE and returns the packet back to the connection
529via ``ci_packet_too_large``. Because this error is of our own making, this
530does not count as inability to send. The too-large packet is skipped and
531sending of the datagrams in the batch continues.
532
533Growing min-heaps
534-----------------
535
536The Outgoing and Tickable connection queues are actually min-heaps. The
537number of elements in these min-heaps never exceeds the number of
538connections. As optimization, allocation of the underlying arrays is
539done not in the min-heap module itself but in the engine module in the
540function ``maybe_grow_conn_heaps``. The engine knows how many connections
541there are and it grows the arrays as necessary.
542
543As an additional optimization, the two arrays use a single memory region
544which is allocated once.
545
546The min-heap arrays are never shrunk.
547
548Connection
549**********
550
551*Files: lsquic_conn.h, lsquic_conn.c -- others are covered in dedicated
552chapters*
553
554The connection represents the QUIC connection. Connections are `managed
555by the engine <#connection-management>`__. A connection, in turn,
556manages `streams <#stream>`__.
557
558Connection Types
559================
560
561lsquic supports two different QUIC protocols: Google QUIC and IETF QUIC.
562Each of these has a separate implementation, which includes connection
563logic, parsing/generating mechanism, and encryption.
564
565Each of the QUIC connection types on the server begin their life as a
566``mini`` connection. This connection type is used while handshake is
567proceeding. Once the handshake has completed, the mini connection is
568``promoted`` to a ``full`` connection. (See `Mini vs Full
569Connection <#mini-vs-full-connections>`__ for more.)
570
571In addition to the above, an "evanescent" connection type is used to
572manage replies to incoming packets that do not result in connection
573creation. These include version negotiation, stateless retry, and
574stateless reset packets.
575
576Each of the five connection types above are covered in their own
577dedicated chapters elsewhere in this document:
578
579-  `Mini gQUIC Connection <#mini-gquic-connection>`__
580
581-  `Full gQUIC Connection <#full-gquic-connection>`__
582
583-  `Mini IETF QUIC Connection <#mini-ietf-connection>`__
584
585-  `Full IETF QUIC Connection <#full-ietf-connection>`__
586
587-  `Evanescent Connection <#evanescent-connection>`__
588
589lsquic_conn
590===========
591
592All connection types expose the same connection interface via a pointer
593to ``struct lsquic_conn``. (This is the same type pointer to which is
594exposed to the user, but the user can only treat the connection as an
595opaque pointer.)
596
597This structure contains the following elements:
598
599Pointers to Crypto Implementation
600---------------------------------
601
602The crypto session pointer, ``cn_enc_session``, points to a type-specific
603(gQUIC or iQUIC) instance of the encryption session. This session
604survives `connection promotion <#connection-promotion>`__.
605
606The two types of crypto session have a set of common functionality; it
607is pointed to by ``cn_esf_c`` (where ``c`` stands for ``common``). Each of
608them also has its own, type-specific functionality, which is pointed to
609by ``cn_esf.g`` and ``cn_esf.i``
610
611Pointer to Common Connection Interface
612--------------------------------------
613
614``cn_if`` points to the set of functions that implement the Common
615Connection Interface (`see below <#common-connection-interface>`__).
616
617Pointer to Parsing Interface
618----------------------------
619
620The parsing interface is version-specific. It is pointed to by ``cn_pf``.
621
622Various list and heap connectors
623--------------------------------
624
625A connection may be pointed to by one or several queues and heaps (see
626"\ `Connection Management <#connection-management>`__\ "). There are
627several struct members that make it possible: \*TAILQ_ENTRYs,
628``cn_attq_elem``, and ``cn_cert_susp_head``.
629
630Version
631-------
632
633``cn_version`` is used to make some decisions in several parts of the
634code.
635
636Flags
637-----
638
639The flags in ``cn_flags`` specify which lists the connection is on and
640some other properties of the connection which need to be accessible by
641other modules.
642
643Stats
644-----
645
646``cn_last_sent`` and ``cn_last_ticked`` are used to determine the
647connection's place on the outgoing queue (see `Batching
648Packets <#batching-packets>`__) and on the `Advisory Tick Time
649Queue <#alarm-set>`__.
650
651List of SCIDs
652-------------
653
654IETF QUIC connections have one or more SCIDs (Source Connection IDs),
655any one of which can be used by the peer as the DCID (Destination CID)
656in the packets it sends. Each of the SCIDs is used to hash the
657connection so it can be found. ``cn_cces`` points to an array of size
658``cn_n_cces`` which is allocated internally inside each connection type.
659
660Google QUIC connections use only one CID (same for source and
661destination). In order not to modify old code, the macro ``cn_cid`` is
662used.
663
664Common Connection Interface
665===========================
666
667The struct ``conn_iface`` defines the common connection interface. All
668connection types implement all or some of these functions.
669
670Some of these functions are used by the engine; others by other modules
671(for example, to abort a connection); yet others are for use by the
672user, e.g. ``lsquic_conn_close`` and others in lsquic.h. In that case,
673these calls are wrapped in lsquic_conn.c.
674
675Tickability
676===========
677
678A connection is processed when it is tickable. More precisely, the
679connection is placed onto the `Tickable Queue <#tickable-queue>`__,
680which is iterated over when `connections are
681processed <#processing-connections>`__. A connection reports its own
682tickability via the ``ci_is_tickable`` method.
683
684In general, a connection is tickable if it has productive user callbacks
685to dispatch (that is, user wants to read and there is data to read or
686user wants to write and writing is possible), if there are packets to
687send or generate, or if its advisory tick time is in the past. (The
688latter is handled in ``lsquic_engine_process_conns()`` when expired
689connections from the `Advisory Tick Time Queue`_ are added
690to the Tickable Queue.)
691
692Stream
693******
694
695*Files: lsquic_stream.h, lsquic_stream.c*
696
697Overview
698========
699
700The lsquic stream is the conduit for data. This object is accessible by
701the user via any of the ``lsquic_stream_*`` functions declared in
702lsquic.h. The stream is bidirectional; in our user code, it represents
703the HTTP request and response. The client writes its request to the
704stream and the server reads the request in its corresponding instance of
705the stream. The server sends its response using the same stream, which
706the client reads from the stream.
707
708Besides streams exposed to the application, connections use streams
709internally:
710
711-  gQUIC has the HANDSHAKE and HEADERS streams
712
713-  IETF QUIC has up to four HANDSHAKE streams
714
715-  HTTP/3 has at least three unidirectional streams:
716
717   -  Settings stream
718
719   -  QPACK encoder stream
720
721   -  QPACK decoder stream
722
723In addition, HTTP/3 push promises use unidirectional streams. In the
724code, we make a unidirectional stream simply by closing one end in the
725constructor.
726
727All of the use cases above are handled by the single module,
728lsquic_stream. The differences in behavior -- gQUIC vs IETF QUIC, HTTP
729vs non-HTTP -- are handled either by explicit conditionals or via
730function pointers.
731
732The streams hang off full connections via stream ID-to-stream hashes and
733in various queues. This is similar to the way the connections hang off
734the engine.
735
736Streams are only used in the full connections; mini connections use
737their own, minimalistic, code to handle streams.
738
739.. _data-structures-1:
740
741Data Structures
742===============
743
744stream_hq_frame
745---------------
746
747This structure is used to keep information about an HTTP/3 frame that is
748being, or is about to be, written. In our implementation, frame headers
749can be two or three bytes long: one byte is HTTP/3 frame type and the
750frame length is encoded in 1 or 2 bytes, giving us the maximum payload
751size of 2\ :sup:`14` - 1 bytes. You will find literal ``2`` or ``3`` values
752in code that deals with writing HQ frames.
753
754If the HQ frame's size is known in advance (SHF_FIXED_SIZE) -- which is
755the case for HEADERS and PUSH_PROMISE frames -- then the HQ header
756contents are written immediately. Otherwise, ``shf_frame_ptr`` points to
757the bytes in the packet where the HQ header was written, to be filled in
758later.
759
760See `Writing HTTP/3 Streams`_ for more information.
761
762hq_filter
763---------
764
765This structure is used to read HTTP/3 streams. A single instance of it
766is stored in the stream in ``sm_hq_filter``. The framing is removed
767transparently (see `Reading HTTP/3 Streams`_).
768
769Frame type and length are read into ``hqfi_vint2_state``. Due to greasing,
770the reader must be able to support arbitrary frame types and so the code
771is pretty generic: varints of any size are supported.
772
773``hqfi_flags`` and ``hqfi_state`` contain information needed to resume
774parsing the frame header, as only partial data may have arrived.
775
776``hqfi_hist_buf`` and ``hqfi_hist_idx`` are used to record the last few
777incoming headers. This information is used to check for validity, as
778some sequences of HTTP/3 frames are invalid.
779
780stream_filter_if
781----------------
782
783This struct is used to specify functionality required to strip arbitrary
784framing when reading from the stream. At the moment (and for the
785foreseeable future) only one mechanism is used: that to strip the HTTP/3
786framing. At the time the code was written, however, the idea was to
787future-proof it in case we needed to support more than one framing format
788at a time.
789
790lsquic_stream
791-------------
792
793This struct is the stream object. It contains many members that deal
794with
795
796-  Reading data
797
798-  Writing data
799
800-  Maintaining stream list memberships
801
802-  Enforcing flow control
803
804-  Dispatching read and write events
805
806-  Calling various user callbacks
807
808-  Interacting with HEADERS streams
809
810The stream has an ID (``id``). It is used to hash the stream.
811
812A stream can be on one or more lists: see ``next_send_stream``,
813``next_read_stream``, and so on.
814
815Incoming data is stored in ``data_in``. Outgoing data is packetized
816immediately or buffered in ``sm_buf``.
817
818HTTP/3 frames that are being actively written are on the ``sm_hq_frames``
819list.
820
821A note on naming: newer members of the stream begin with ``sm_`` for
822simplicity. Originally, the structure members lacked a prefix.
823
824progress
825--------
826
827This structure is used to determine whether the user callback has made
828any progress during an ``on_write`` or ``on_read`` event loop. If progress
829is not made for a number of calls, the callback is interrupted, breaking
830out of a suspected infinite loop. (See ``es_progress_check`` setting.)
831
832
833frame_gen_ctx
834-------------
835
836This structure holds function pointers to get user data and write it to
837packets. ``fgc_size``, ``fgc_fin``, and ``fgc_read`` are set based on framing
838requirements. This is a nice abstraction that gets passed to several
839packetization functions and allows them not to care about how or whether
840framing is performed.
841
842pwritev_ctx
843-----------
844
845Used to aid ``lsquic_stream_pwritev``. ``hq_arr`` is used to roll back
846HTTP/3 framing if necessary. (The rollback is the most complicated part
847of the ``pwritev`` functionality).
848
849Event Dispatch
850==============
851
852The "on stream read" and "on stream write" callbacks are part of the
853lsquic API. These callbacks are called when the user has registered
854interest in reading from or writing to the stream and reading or writing
855is possible.
856
857Calling ``lsquic_stream_wantwrite`` and ``lsquic_stream_wantread`` places
858the stream on the corresponding "want to write" and "want to read" list.
859These lists are processed by a connection when it's ticked. For each
860stream on the list, the internal function
861``lsquic_stream_dispatch_read_events`` or
862``lsquic_stream_dispatch_write_events``, whichever may be the case.
863
864Dispatching read events is simple. When ``es_rw_once`` is set, the "on
865stream read" callback is called once -- if the stream is readable.
866Otherwise, the callback is called in a loop as long as:
867
868-  The stream is readable;
869
870-  The user wants to read from it; and
871
872-  Progress is being made
873
874Dispatching write events is more complicated due to the following
875factors:
876
877-  In addition to calling the "on stream write" callback, the flushing
878   mechanism also works by using the "want to write" list.
879
880-  When writing occurs, the stream's position on the list may change
881
882STREAM frames in
883================
884
885The data gets in from the transport into the stream via
886``lsquic_stream_frame_in`` function. The connection calls this function
887after parsing a STREAM frame.
888
889The data from the STREAM frame is stored in one of the two "data in"
890modules: ``di_nocopy`` and ``di_hash``. The two are abstracted out behind
891``stream->data_in``.
892
893The "data in" module is used to store incoming stream data. The data is
894read from this module using the ``di_get_frame`` function. See the next
895section.
896
897Reading Data
898============
899
900There are three user-facing stream-reading functions; two of them are
901just wrappers around ``"lsquic_stream_readf``. This function performs some
902checks (we will cover HTTP mode separately) and calls
903``lsquic_stream_readf``, which also performs some checks and calls
904``read_data_frames``. This is the only function in the stream module where
905data is actually read from the "data in" module.
906
907Writing Data
908============
909
910There are four user-facing functions to write to stream, and all of them
911are wrappers around ``stream_write``. (``lsquic_stream_pwritev`` is a bit
912more involved than the other three, but it's pretty well-commented --
913and the complexity is in the rollback, not writing itself.)
914
915Small writes get buffered. If the write size plus whatever is buffered
916already exceeds the threshold -- which is the size of the largest STREAM
917frame that could be fit into a single outgoing packet -- the data is
918packetized instead by calling ``stream_write_to_packets``. See the next
919section.
920
921Packetization
922=============
923
924``stream_write_to_packets`` is the only function through which user data
925makes it into outgoing packets. There are three ways to write STREAM
926frames:
927
9281. ``stream_write_to_packet_hsk``
929
9302. ``stream_write_to_packet_std``
931
9323. ``stream_write_to_packet_crypto``
933
934The particular function is selected based on connection and stream type
935when the stream is first created.
936
937stream_write_to_packets
938-----------------------
939
940Depending on the need to frame data, a reader is selected. The job of
941the reader is to copy user data into the outgoing STREAM frame. In
942HTTP/3 mode, HTTP/3 framing is added transparently -- see `Writing
943HTTP/3 Streams`_ for more information.
944
945The while loop is entered if there is user data to be copied or if the
946end of the stream has been reached and FIN needs to be written. Note the
947threshold check: when writing data from a user call, the threshold is
948set and frames smaller than the full packet are not generated. This is
949to allow for usage like "write 8KB", "write 8KB", "write 8KB" not to
950produce jagged STREAM frames. This way, we utilize the bandwidth most
951effectively. When flushing data, the threshold is not set, so even a
9521-byte data gets packetized.
953
954The call ``stream->sm_write_to_packet`` writes data to a single packet.
955This packet is allocated by the `Send Controller <#send-controller>`__.
956(Depending on when writing is performed, the returned packet may be
957placed onto the scheduled queue immediately or it may be a "buffered"
958packet. The stream code is oblivious to that.) If the send controller
959does not give us a packet, STOP is returned and the while loop exits. An
960ERROR should never happen -- this indicates a bug or maybe failure to
961allocate memory -- and so the connection is aborted in that case. If
962everything is OK, the while loop goes on.
963
964The ``seen_ok`` check is used to place the connection on the tickable list
965on the first successfully packetized STREAM frame. This is so that if
966the packet is buffered (meaning that the writing is occurring outside of
967the callback mechanism), the connection will be processed (ticked) and
968the packets will be scheduled and sent out.
969
970After the while loop, we conditionally close an outstanding HTTP/3
971frame, save any leftover data, schedule STREAM_BLOCKED or BLOCKED frames
972to be sent out if needed, and return the number of user-provided bytes
973that were copied into the outgoing packets and into the internal stream
974buffer (leftovers).
975
976Write a single STREAM frame
977---------------------------
978
979We will examine ``stream_write_to_packet_std`` as it is the most
980complicated of these three functions.
981
982First, we flush the headers stream if necessary -- this is because we
983want the HTTP (gQUIC or HTTP/3) headers to be sent before the payload.
984
985Then, the number of bytes needed to generate a STREAM frame is
986calculated. This value depends on the QUIC version, whether we need to
987generate HTTP/3 framing, and whether the data to write exists (or we
988just need to write an empty STREAM frame with the FIN bit set).
989
990(Note that the framing check is made to overshoot the estimate for
991simplicity. For one, we might not need 3 bytes for the DATA frame, but
992only 2. Secondly, there may already be an open HTTP/3 frame in one of
993the previous packets and so we don't need to write it at all.)
994
995Then, a packet is allocated and ``write_stream_frame`` is called. It is in
996this function that we finally make the call to generate the STREAM frame
997and to copy the data from the user. The function ``pf_gen_stream_frame``
998returns the number of bytes actually written to the packet: this
999includes both the STREAM frame header and the payload (which may also
1000include HTTP/3 frame).
1001
1002The fact that this frame type has been written is added to
1003``po_frame_types`` and the STREAM frame location, type, and size are
1004recorded. This information is necessary to be able to elide the frame
1005from the packet in case the stream is reset.
1006
1007``PO_STREAM_END`` is set if the STREAM frame extends to the end of the
1008packet. This is done to prevent this packet from being used again to
1009append frames to it (after, for example, some preceding frames are
1010elided from it). This is because both in gQUIC and IETF QUIC the STREAM
1011frame header is likely to omit the ``length`` field and instead use the
1012"extends to the end of the packet" field. If frames are shifted, the
1013packet cannot be appended to because it will lead to data loss and
1014corruption.
1015
1016Writing HTTP/3 Streams
1017======================
1018
1019HTTP/3 streams use framing. In most cases, a single HEADERS frame is
1020followed by zero or more DATA frames. The user code does not know this:
1021both gQUIC and IETF QUIC streams appear to behave in exactly the same
1022manner. This makes lsquic simple to use.
1023
1024The drawback is internal complexity. To make the code both easy to use
1025and performant, HTTP/3 framing is generated on-the-fly, as data is being
1026written to packets (as opposed to being buffered and then written). (OK,
1027*mostly* on-the-fly: the HEADERS frame payload is generated and then
1028copied.)
1029
1030On the high level, the way it works is as follows:
1031
1032-  When a write call is made, a variable-size (that is, unknown size;
1033   it's called variable-size because the size of the DATA header may
1034   be 2 or 3 bytes; it's not the best name in the world) frame is
1035   opened/activated.
1036
1037-  When data is written to stream, the DATA header placeholder bytes are
1038   written to the stream transparently and a pointer is saved to this
1039   location.
1040
1041-  The active frame header is closed when
1042
1043   -  It reaches its maximum size; or
1044
1045   -  The data we are writing runs out.
1046
1047-  When the header is closed, the number of bytes that follows is now
1048   written to the location we saved when the header was activated.
1049
1050This mechanism allows us to create a DATA frame that spans several
1051packets before we know how many packets there will be in a single write.
1052(As outgoing packet allocation is governed by the `Send Controller`_.)
1053This is done to minimize the goodput overhead incurred by the DATA frame header.
1054
1055.. image:: stream-http3-framing.png
1056
1057There are a couple of things that do not fit into this model:
1058
10591. The HEADERS frame is fixed size [1]_. It is generated separately
1060   (written by QPACK encoder into a buffer on the stack) and later
1061   copied into the stream. (See the ``send_headers_ietf`` function.) It
1062   can happen that the whole buffer cannot be written. In that case,
1063   a rather complicated dance of buffering the unwritten HEADERS
1064   frame bytes is performed. Here, the "on stream write" callback is
1065   replaced with an internal callback (see the ``select_on_write``
1066   function) and user interaction is prohibited until the whole of
1067   the HEADERS frame is written to the stream.
1068
10692. Push promise streams are even weirder. In addition to the HEADERS
1070   handling above, the push promise stream must begin with a
1071   variable-integer Push ID. To make this fit into the framed stream
1072   model, the code makes up the concept of a "phantom" HTTP/3 frame.
1073   This type of frame's header is not written. This allows us to
1074   treat the Push ID as the payload of a regular HTTP/3 frame.
1075
1076The framing code has had its share of bugs. Because of that, there is a
1077dedicated unit test program just for the framing code,
1078*tests/test_h3_framing.c*. In addition to manually-written tests, the
1079program has a "fuzzer driver" mode, in which the American Fuzzy Lop
1080fuzzer drives the testing of the HTTP/3 framing mechanism. The advantage
1081of this approach is that AFL tries to explore all the code paths.
1082
1083
1084Debates regarding DATA framing raged in 2018 on the QUIC mailing list.
1085Some of the discussion is quite interesting: for example, the debate about
1086"optimizing" DATA frames and `calculations of the header
1087cost <https://lists.w3.org/Archives/Public/ietf-http-wg/2018OctDec/0236.html>`__.
1088
1089Reading HTTP/3 Streams
1090======================
1091
1092HTTP/3 frame headers are stripped out transparently -- they are never
1093seen by the user. From the user's perspective, the lsquic stream
1094represents the payload of HTTP message; a dedicated call is made first
1095to get at the HTTP headers.
1096
1097To accomplish this, the stream implements a generic deframing mechanism.
1098The `stream_filter_if`_ interface allows one to
1099specify functions to a) check whether the stream is readable, b) strip
1100header bytes from a data frame fetched from "data in" module; and c)
1101update byte count in the filter once bytes have been read:
1102
1103hq_filter_readable
1104------------------
1105
1106This function tests for availability of non-frame-header data, stripping
1107frame headers from the stream transparently. Note how it calls
1108``read_data_frames`` with its own callback, ``hq_read``. It is inside this
1109callback that the HEADERS frame is fed to the QPACK decoder.
1110
1111hq_filter_df
1112------------
1113
1114This function's job is to strip framing from data frames returned by the
1115"data in" module inside the ``read_data_frames`` function. It, too, calls
1116the ``hq_read`` function. This allows the two functions that read from
1117stream (this one) and the readability-checking function
1118(``hq_filter_readable``) to share the same state. This is crucial:
1119Otherwise this approach is not likely to work well.
1120
1121hq_decr_left
1122------------
1123
1124This function is needed to update the filter state. Once all payload
1125bytes from the frame have been consumed, the filter is readied to strip
1126the next frame header again.
1127
1128.. _notable-code-1:
1129
1130Notable Code
1131============
1132
1133frame_hq_gen_read
1134-----------------
1135
1136This is where HTTP/3 frame headers are generated. Note the use of
1137``shf_frame_ptr`` to record the memory location to which the correct frame
1138size will be written by a different function.
1139
1140Parsing
1141*******
1142
1143*Files: lsquic_parse.h, lsquic_parse_ietf_v1.c, lsquic_parse_Q050.c, lsquic_parse_Q046.c,
1144lsquic_parse_gquic_be.c, lsquic_parse_common.c, and others*
1145
1146Overview
1147========
1148
1149The two types of QUIC -- gQUIC and IETF QUIC -- have different packet and
1150frame formats.  In addition, different gQUIC version are different among
1151themselves.  Functions to parse and generate packets and frames of each
1152type are abstracted out behind the rather large ``struct parse_funcs``.
1153When a connection is created, its ``cn_pf`` member is set to point to
1154the correct set of function pointers via the ``select_pf_by_ver()`` macro.
1155
1156Parsing Packets
1157===============
1158
1159Before settling on a particular set of parsing function for a connection,
1160the server needs to determine the connection's version.  It does so using
1161the function ``lsquic_parse_packet_in_server_begin()``.
1162
1163This function figures out whether the packet has a long or a short header,
1164and which QUIC version it is.  Because the server deals with fewer packet
1165types than the client (no version negotiation or stateless retry packets),
1166it can determine the necessary parsing function from the first byte of the
1167incoming packet.
1168
1169The "begin" in the name of the function refers to the fact that packet
1170parsing is a two-step process [3]_.  In the first step, the packet version,
1171CID, and some other parameters are parsed out; in the second step,
1172version-specific ``pf_parse_packet_in_finish()`` is called to parse out
1173the packet number.  Between the two calls, the state is saved in
1174``struct packin_parse_state``.
1175
1176Generating Packets
1177==================
1178
1179Packets are generated during encryption using the ``pf_gen_reg_pkt_header()``
1180function.  The generated header is encrypted together with the `packet payload`_
1181and this becomes the QUIC packet that is sent out.  (Most of the time, the
1182QUIC packet corresponds to the UDP datagram, but sometimes packets are
1183`coalesced <#packet-coalescing>`__.
1184
1185Parsing Frames
1186==============
1187
1188There is a parsing function for each frame type.  These function generally
1189have names that begin with "pf_parse\_" and follow a similar pattern:
1190
1191-   The first argument is the buffer to be parsed;
1192
1193-   The second argument is its size;
1194
1195-   Any additional arguments are outputs: the parsed out values from the frame;
1196
1197-   Number of bytes consumed is returned or a negative value is returned
1198    if a parsing error occurred.
1199
1200For example:
1201
1202::
1203
1204    int
1205    (*pf_parse_stream_frame) (const unsigned char *buf, size_t rem_packet_sz,
1206                                                    struct stream_frame *);
1207
1208    int
1209    (*pf_parse_max_data) (const unsigned char *, size_t, uint64_t *);
1210
1211Generating Frames
1212=================
1213
1214Functions that generate frames begin with "pf_gen\_" and also follow a
1215pattern:
1216
1217-   First argument is the buffer to be written to;
1218
1219-   The second argument is the buffer size;
1220
1221-   Any additional arguments specify the values to include in the frame;
1222
1223-   The size of the resulting frame is returned or a negative value if
1224    an error occurred.
1225
1226For example:
1227
1228::
1229
1230    int
1231    (*pf_gen_path_chal_frame) (unsigned char *, size_t, uint64_t chal);
1232
1233    int
1234    (*pf_gen_stream_frame) (unsigned char *buf, size_t bufsz,
1235                            lsquic_stream_id_t stream_id, uint64_t offset,
1236                            int fin, size_t size, gsf_read_f, void *stream);
1237
1238Frame Types
1239===========
1240
1241Frame types are listed in ``enum quic_frame_type``.  When frames are parsed,
1242the on-the-wire frame type is translated to the enum value; when frames are
1243generated, the enum is converted to the on-the-wire format.  This indirection
1244is convenient, as it limits the range of possible QUIC frame values, making
1245it possible to store a list of frame types as a bitmask.  Examples include
1246``po_frame_types`` and ``sc_retx_frames``.
1247
1248Some frame types, such as ACK and STREAM, are common to both Google and IETF
1249QUIC.  Others, such as STOP_WAITING and RETIRE_CONNECTION_ID, are only used
1250in one of the protocols.  The third type is frames that are used by IETF
1251QUIC extensions, such as TIMESTAMP and ACK_FREQUENCY.
1252
1253Parsing IETF QUIC Frame Types
1254-----------------------------
1255
1256Most IETF frame types are encoded as a single by on the wire (and all Google
1257QUIC frames are).  Some of them are encoded using multiple bytes.  This is
1258because, like the vast majority of all integral values in IETF QUIC, the frame
1259type is encoded as a varint.  Unlike the other integral values, however, the
1260frame type has the unique property is that it must be encoded using the
1261*minimal representation*: that is, the encoding must use the minimum number
1262of bytes possible.  For example, encoding the value 200 must use the two-byte
1263varint, not four- or eight-byte version.  This makes it possible to parse
1264frame types once without having to reparse the frame type again in individual
1265frame-parsing routines.
1266
1267Frame type is parsed out in ``ietf_v1_parse_frame_type()``.  Because of the
1268minimal encoding requirement, the corresponding frame-parsing functions know
1269the number of bytes to skip for type, for example:
1270
1271
1272::
1273
1274    static int
1275    ietf_v1_parse_frame_with_varints (const unsigned char *buf, size_t len,
1276                const uint64_t frame_type, unsigned count, uint64_t *vals[])
1277    {
1278        /* --- 8< --- code removed */
1279        vbits = vint_val2bits(frame_type);
1280        p += 1 << vbits;                    // <=== SKIP FRAME TYPE
1281        /* --- 8< --- code removed */
1282    }
1283
1284    static int
1285    ietf_v1_parse_timestamp_frame (const unsigned char *buf,
1286                                    size_t buf_len, uint64_t *timestamp)
1287    {
1288        return ietf_v1_parse_frame_with_varints(buf, buf_len,
1289                FRAME_TYPE_TIMESTAMP, 1, (uint64_t *[]) { timestamp });
1290    }
1291
1292
1293Mini vs Full Connections
1294************************
1295
1296Mini Purpose
1297============
1298
1299The reason for having a mini connection is to conserve resources: a mini
1300connection allocates a much smaller amount of memory. This protects the
1301server from a potential DoS attack. The mini connection's job is to get
1302the handshake to succeed, after which the connection is
1303`promoted <#connection-promotion>`__.
1304
1305Mini/Full Differences
1306=====================
1307
1308Besides their size, the two connection types differ in the following
1309ways:
1310
1311Mini connections' lifespan is limited. If the handshake does not succeed
1312within 10 seconds (configurable), the mini connection is destroyed.
1313
1314A mini connection is only `tickable <#tickability>`__ if it has unsent
1315packets.
1316
1317Mini connections do not process packets that carry application (as
1318opposed to handshake) data. The 0-RTT packet processing is deferred;
1319these packets are stashed and handed over to the full connection during
1320promotion.
1321
1322Connection Promotion
1323====================
1324
1325A mini connection is promoted when the handshake succeeds. The mini
1326connection reports this via the return status of ``ci_tick`` by setting
1327the ``TICK_PROMOTE`` bit. The engine creates a new connection object and
1328calls the corresponding server constructor. The latter copies all the
1329relevant state information from mini to full connection.
1330
1331For a time, two connection objects -- one mini and one full -- exist at
1332the same state. Most of the time, the mini connection is destroyed
1333within the same function call to ``process_connections()``. If, however,
1334the mini connection has unsent packets, it will remain live until those
1335packets are sent successfully. Because the mini connection is by then
1336removed from the CID-to-connection hash (``engine->conns_hash``), it will
1337not receive any more incoming packets.
1338
1339Also see `Connection Processing <#processing-connections>`__.
1340
1341Mini gQUIC Connection
1342*********************
1343
1344*Files: lsquic_mini_conn.h, lsquic_mini_conn.c*
1345
1346.. _overview-1:
1347
1348Overview
1349========
1350
1351The original version of ``struct mini_conn`` fit into paltry 128 bytes.
1352The desire to fit into 128 bytes [2]_ led to, for example,
1353``mc_largest_recv`` -- in effect, a 3-byte integer! Since that time,
1354the mini conn has grown to over 512 bytes.
1355
1356Looking at the struct, we can see that a lot of other data structures
1357are squeezed into small fields:
1358
1359Received and sent packet history is each packed into a 64-bit integer,
1360``mc_received_packnos`` and ``mc_sent_packnos``, respectively. The HEADERS
1361stream offsets are handled by the two two-byte integers ``mc_read_off``
1362and ``mc_write_off``.
1363
1364.. _notable-code-2:
1365
1366Notable Code
1367============
1368
1369continue_handshake
1370------------------
1371
1372This function constructs a contiguous buffer with all the HANDSHAKE
1373stream chunks in order and passes it to ``esf_handle_chlo()``. This is
1374done because the gQUIC crypto module does not buffer anything: it's all
1375or nothing.
1376
1377The code has been written in a generic way, so that even
1378many small packets can be reconstructed into a CHLO. The lsquic client
1379can be made to split the CHLO by setting the max packet size
1380sufficiently low.
1381
1382sent/unsent packets
1383-------------------
1384
1385To conserve space, only a single outgoing packet header exists in the
1386mini connection struct, ``mc_packets_out``. To differentiate between
1387packets that are to be sent and those that have already been sent, the
1388``PO_SENT`` flag is used.
1389
1390Mini IETF Connection
1391********************
1392
1393*Files: lsquic_mini_conn_ietf.h, lsquic_mini_conn_ietf.c*
1394
1395.. _overview-2:
1396
1397Overview
1398========
1399
1400The IETF QUIC mini connection has the same idea as the gQUIC mini
1401connection: use as little memory as possible. This is more difficult to
1402do with the IETF QUIC, however, as there are more moving parts in this
1403version of the protocol.
1404
1405.. _data-structures-2:
1406
1407Data Structures
1408===============
1409
1410mini_crypto_stream
1411------------------
1412
1413This structure is a minimal representation of a stream. The IETF QUIC
1414protocol uses up to four HANDSHAKE streams (one for each encryption
1415level) during the handshake and we need to keep track of them. Even a
1416basic event dispatch mechanism is supported.
1417
1418packno_set_t
1419------------
1420
1421This bitmask is used to keep track of sent, received, and acknowledged
1422packet numbers. It can support up to 64 packet numbers: 0 through 63. We
1423assume that the server will not need to send more than 64 packets to
1424complete the handshake.
1425
1426imc_recvd_packnos
1427-----------------
1428
1429Because the client is allowed to start its packet number sequence with
1430any number in the [0, 2\ :sup:`32`-1] range, the received packet history
1431must be able to accommodate numbers larger than 63. To do that, the
1432receive history is a union. If all received packet numbers are 63 or
1433smaller, the packno_set_t bitmask is used. Otherwise, the receive
1434history is kept in `Tiny Receive History <#tiny-receive-history>`__
1435(trechist). The flag ``IMC_TRECHIST`` indicates which data structure is
1436used.
1437
1438ietf_mini_conn
1439--------------
1440
1441This structure is similar to the gQUIC mini conn. It is larger, though,
1442as it needs to keep track of several instances of things based on
1443encryption level or packet number space.
1444
1445``imc_cces`` can hold up to three SCIDs: one for the original DCID from
1446the client, one for SCID generated by the server, and one for when
1447preferred address transport parameter is used. (The preferred address
1448functionality is not compiled by default.)
1449
1450ietf_mini_rechist
1451-----------------
1452
1453The receive history is in the header file because, in addition to
1454generating the ACK frames in the IETF mini conn, it is used to migrate
1455the receive history during promotion.
1456
1457.. _notable-code-3:
1458
1459Notable Code
1460============
1461
1462Switching to trechist
1463---------------------
1464
1465The switch to the Tiny Receive History happens when the incoming packet
1466number does not fit into the bitmask anymore -- see
1467``imico_switch_to_trechist()``. To keep the trechist code exercised, about
1468one in every 16 mini connection uses trechist unconditionally -- see
1469``lsquic_mini_conn_ietf_new()``.
1470
1471crypto_stream_if
1472----------------
1473
1474A set of functions to drive reading and writing CRYPTO frames to move
1475the handshake along is specified. It is passed to the crypto session.
1476After promotion, the full connection installs its own function pointers.
1477
1478imico_read_chlo_size
1479--------------------
1480
1481This function reads the first few bytes of the first CRYPTO frame on the
1482first HANDSHAKE stream to figure out the size of ClientHello. The
1483transport parameters will not be read until the full ClientHello is
1484available.
1485
1486
1487Duplicated Code
1488---------------
1489
1490Some code has been copied from gQUIC mini connection. This was done on
1491purpose, with the expectation that gQUIC is going away.
1492
1493ECN Blackhole Detection
1494-----------------------
1495
1496ECN blackhole at the beginning of connection is guessed at when none of
1497packets sent in the initial batch were acknowledged. This is done by
1498``imico_get_ecn()``. ``lsquic_mini_conn_ietf_ecn_ok()`` is also used during
1499promotion to check whether to use ECN.
1500
1501Connection Public Interface
1502***************************
1503
1504*Files: lsquic_conn_public.h*
1505
1506TODO
1507
1508Full gQUIC Connection
1509*********************
1510
1511*Files: lsquic_full_conn.h, lsquic_full_conn.c*
1512
1513.. _overview-3:
1514
1515Overview
1516========
1517
1518The full gQUIC connection implements the Google QUIC protocol, both
1519server and client side. This is where a large part of the gQUIC protocol
1520logic is contained and where everything -- engine, streams, sending,
1521event dispatch -- is tied together.
1522
1523Components
1524==========
1525
1526In this section, each member of the ``full_conn`` structure is documented.
1527
1528fc_conn
1529-------
1530
1531The first member of the struct is the common connection object,
1532`lsquic_conn`_.
1533
1534It must be first in the struct because the two pointer are cast to each
1535other, depending on circumstances.
1536
1537fc_cces
1538-------
1539
1540This array holds two connection CID elements.
1541
1542The reason for having two elements in this array instead of one (even
1543though gQUIC only uses one CID) is for the benefit of the client: In
1544some circumstances, the client connections are hashed by the port
1545number, in which case the second element is used to hash the port value.
1546The relevant code is in lsquic_engine.c
1547
1548fc_rechist
1549----------
1550
1551This member holds the `packet receive history <#receive-history>`__. It
1552is used to generate ACK frames.
1553
1554fc_stream_ifs
1555-------------
1556
1557This three-element array holds pointers to stream callbacks and the
1558stream callback contexts.
1559
1560From the perspective of lsquic, Google QUIC has three stream types:
1561
15621. HANDSHAKE stream;
1563
15642. HEADERS stream; and
1565
15663. Regular (message, or request/response) streams.
1567
1568The user provides stream callbacks and the context for the regular
1569streams (3) in ``ea_stream_if`` and ``ea_stream_if_ctx``.
1570
1571The other two stream types are internal. The full connection specifies
1572internal callbacks for those streams. One set handles the handshake and
1573the other handles reading and writing of HTTP/2 frames: SETTINGS,
1574HEADERS, and so on.
1575
1576fc_send_ctl
1577-----------
1578
1579This is the `Send Controller <#send-controller>`__. It is used to
1580allocate outgoing packets, control sending rate, and process
1581acknowledgements.
1582
1583fc_pub
1584------
1585
1586This member holds the `Connection Public
1587Interface <#connection-public-interface>`__.
1588
1589fc_alset
1590--------
1591
1592This is the `Alarm Set <#alarm-set>`__. It is used to set various timers
1593in the connection and the send controller.
1594
1595fc_closed_stream_ids
1596--------------------
1597
1598The two sets in this array hold the IDs of closed streams.
1599
1600There are two of them because of the uneven distribution of stream IDs.
1601It is more efficient to hold even and odd stream IDs in separate
1602structures.
1603
1604fc_settings
1605-----------
1606
1607Pointer to the engine settings.
1608
1609This member is superfluous -- the settings can be fetched from
1610``fc_enpub->enp_settings``.
1611
1612fc_enpub
1613--------
1614
1615This points to the `engine's public interface <#lsquic-engine-public>`__.
1616
1617fc_max_ack_packno
1618-----------------
1619
1620Recording the maximum packet number that contained an ACK allows us to
1621ignore old ACKs.
1622
1623fc_max_swf_packno
1624-----------------
1625
1626This is the maximum packet number that contained a STOP_WAITING frame.
1627It is used to ignore old STOP_WAITING frames.
1628
1629fc_mem_logged_last
1630------------------
1631
1632This timestamp is used to limit logging the amount of memory used to
1633most once per second.
1634
1635fc_cfg
1636------
1637
1638This structure holds a few important configuration parameters. (Looks
1639like ``max_conn_send`` is no longer used���)
1640
1641fc_flags
1642--------
1643
1644The flags hold various boolean indicators associated with the full
1645connections. Some of them, such as ``FC_SERVER``, never change, while
1646others change all the time.
1647
1648fc_n_slack_akbl
1649---------------
1650
1651This is the number of ackable (or, in the new parlance, *ack-eliciting*)
1652packets received since the last ACK was sent.
1653
1654This counter is used to decide whether an ACK should be sent (or, more
1655precisely, queued to be sent) immediately or whether to wait.
1656
1657fc_n_delayed_streams
1658--------------------
1659
1660Count how many streams have been delayed.
1661
1662When ``lsquic_conn_make_stream()`` is called, a stream may not be created
1663immediately. It is delayed if creating a stream would go over the
1664maximum number of stream allowed by peer.
1665
1666fc_n_cons_unretx
1667----------------
1668
1669Counts how many consecutive unretransmittable packets have been sent.
1670
1671
1672fc_last_stream_id
1673-----------------
1674
1675ID of the last created stream.
1676
1677Used to assign ID to streams created by this side of the connection.
1678Clients create odd-numbered streams, while servers initiate
1679even-numbered streams (push promises).
1680
1681fc_max_peer_stream_id
1682---------------------
1683
1684Maximum value of stream ID created by peer.
1685
1686fc_goaway_stream_id
1687-------------------
1688
1689Stream ID received in the GOAWAY frame.
1690
1691This ID is used to reset locally-initiated streams with ID larger than
1692this.
1693
1694fc_ver_neg
1695----------
1696
1697This structure holds the version negotiation state.
1698
1699This is used by the client to negotiate with the server.
1700
1701
1702With gQUIC going away, it is probably not very important anymore.
1703
1704fc_hsk_ctx
1705----------
1706
1707Handshake context for the HANDSHAKE stream.
1708
1709Client and server have different HANDSHAKE stream handlers -- and
1710therefore different contexts.
1711
1712fc_stats
1713--------
1714
1715Connection stats
1716
1717fc_last_stats
1718-------------
1719
1720Snapshot of connection stats
1721
1722This is used to log the changes in counters between calls to
1723``ci_log_stats()``. The calculation is straightforward in
1724``lsquic_conn_stats_diff()``.
1725
1726fc_stream_histories and fc_stream_hist_idx
1727------------------------------------------
1728
1729Rolling log of histories of closed streams
1730
1731
1732fc_errmsg
1733---------
1734
1735Error message associated with connection termination
1736
1737This is set when the connection is aborted for some reason. This error
1738message is only set once. It is used only to set the error message in
1739the call to ``ci_status()``
1740
1741fc_recent_packets
1742-----------------
1743
1744Dual ring-buffer log of packet history
1745
1746The first element is for incoming packets, the second is for outgoing
1747packets. Each entry holds received or sent time and frame information.
1748
1749This can be used for debugging. It is only compiled into debug builds.
1750
1751fc_stream_ids_to_reset
1752----------------------
1753
1754List of stream ID to send STREAM_RESET for
1755
1756These STREAM_RESET frames are associated with streams that are not
1757allowed to be created because we sent a GOAWAY frame. (There is a period
1758when GOAWAY is in transit, but the peer keeps on creating streams). To
1759queue the reset frames for such a stream, an element is added to this
1760list.
1761
1762fc_saved_ack_received
1763---------------------
1764
1765Timestamp of the last received ACK.
1766
1767This is used for `ACK merging <#ack-merging>`__.
1768
1769fc_path
1770-------
1771
1772The network path -- Google QUIC only has one network path.
1773
1774fc_orig_versions
1775----------------
1776
1777List (as bitmask) of original versions supplied to the client
1778constructor.
1779
1780Used for version negotiation. See `fc_ver_neg`_ for more
1781coverage of this topic.
1782
1783fc_crypto_enc_level
1784-------------------
1785
1786Latest crypto level
1787
1788This is for Q050 only, which does away with the HANDSHAKE stream and
1789uses CRYPTO frames instead. (This was part of Google's plan to move
1790Google QUIC protocol closer to IETF QUIC.)
1791
1792fc_ack
1793------
1794
1795Saved ACK -- latest or merged
1796
1797This ACK structure is used in `ACK merging <#ack-merging>`__.
1798
1799Instantiation
1800=============
1801
1802The largest difference between the server and client mode of the full
1803connection is in the way it is created. The client creates a brand-new
1804connection, performs version negotiation, and runs the handshake before
1805dispatching user events. The server connection, on the other hand, gets
1806created from a mini connection during `connection
1807promotion <#connection-promotion>`__. By that time, both version
1808negotiation and handshake have already completed.
1809
1810Common Initialization
1811---------------------
1812
1813The ``new_conn_common()`` function contains initialization common to both
1814server and client. Most full connection's internal data structures are
1815initialized or allocated here, among them `Send
1816Controller <#send-controller>`__, `Receive
1817History <#receive-history>`__, and `Alarm Set <#alarm-set>`__.
1818
1819The HEADERS stream is created here, if necessary. (Throughout the code,
1820you can see checks whether the connection is in HTTP mode or not. Even
1821though gQUIC means that HTTP is used, our library supports a non-HTTP
1822mode, in which there is no HEADERS stream. This was done for testing
1823purposes and made possible the echo and md5 client and server programs.)
1824
1825Server
1826------
1827
1828After initializing the common structures in ``new_conn_common()``,
1829server-specific initialization continues in
1830``lsquic_gquic_full_conn_server_new()``.
1831
1832The HANDSHAKE stream is created. The handler (see
1833``lsquic_server_hsk_stream_if``) simply throws out data that it reads from
1834the client.
1835
1836Outgoing packets are inherited -- they will be sent during the next tick
1837-- and deferred incoming packets are processed.
1838
1839Client
1840------
1841
1842The client's initialization takes place in
1843``lsquic_gquic_full_conn_client_new()``. Crypto session is created and the
1844HANDSHAKE stream is initialized. The handlers in
1845``lsquic_client_hsk_stream_if`` drive the handshake process.
1846
1847Incoming Packets
1848================
1849
1850The entry point for incoming packets is ``ci_packet_in()``, which is
1851implemented by ``full_conn_ci_packet_in``. Receiving a packet restarts the
1852idle timer.
1853
1854The function ``process_incoming_packet`` contains some client-only logic
1855for processing version negotiation and stateless retry packets. In the
1856normal case, ``process_regular_packet()`` is called. This is where the
1857incoming packet is decrypted, the `Receive
1858History <#receive-history>`__ is updated, ``parse_regular_packet()`` is
1859called, and some post-processing takes place (most importantly,
1860scheduling an ACK to be sent).
1861
1862The function ``parse_regular_packet`` is simple: It iterates over the
1863whole decrypted payload of the incoming packet and parses out frames one
1864by one. An error aborts the connection.
1865
1866ACK Merging
1867===========
1868
1869Processing ACKs is `expensive <#handling-acks>`__. When sending data, a
1870batch of incoming packets is likely to contain an ACK frame each. The
1871ACK frame handler, ``process_ack_frame()``, merges consecutive ACK frames
1872and stores the result in `fc_ack`_. The ACK is processed
1873during the `next tick <#ticking>`__. If the two ACK cannot be merged
1874(which is unlikely), the cached ACK is processed immediately and the new
1875ACK is cached.
1876
1877Caching an ACK has a non-trivial memory cost: the 4KB-plus data
1878structure ``ack_info`` accounts for more than half of the size of the
1879``full_conn`` struct. Nevertheless, the tradeoff is well worth it. ACK
1880merging reduces the number of calls to ``lsquic_send_ctl_got_ack()`` by a
1881factor of 10 or 20 in some high-throughput scenarios.
1882
1883Ticking
1884=======
1885
1886When a `connection is processed by the
1887engine <#processing-connections>`__, the engine calls the connection's
1888``ci_tick()`` method. This is where most of the connection logic is
1889exercised. In the full gQUIC connection, this method is implemented by
1890``full_conn_ci_tick()``.
1891
1892The following steps are performed:
1893
1894-  A cached ACK, if it exists, is processed
1895
1896-  Expired alarms are rung
1897
1898-  Stream read events are dispatched
1899
1900-  An ACK frame is generated if necessary
1901
1902-  Other control frames are generated if necessary
1903
1904-  Lost packets are rescheduled
1905
1906-  More control frames and stream resets are generated if necessary
1907
1908-  HEADERS stream is flushed
1909
1910-  Outgoing packets that carry stream data are scheduled in four steps:
1911
1912   a. High-priority `buffered packets <#buffered-queue>`__ are scheduled
1913
1914   b. Write events are dispatched for high-priority streams
1915
1916   c. Non-high-priority buffered packets are scheduled
1917
1918   d. Write events are dispatched for non-high-priority streams
1919
1920-  Connection close or PING frames are generated if necessary
1921
1922-  Streams are serviced (closed, freed, created)
1923
1924Full IETF Connection
1925********************
1926
1927*Files: lsquic_full_conn_ietf.h, lsquic_full_conn_ietf.c*
1928
1929Overview
1930========
1931
1932This module implements IETF QUIC
1933`Transport <https://tools.ietf.org/html/draft-ietf-quic-transport-34>`_
1934and
1935`HTTP/3 <https://tools.ietf.org/html/draft-ietf-quic-http-34>`_ logic,
1936plus several QUIC extensions.  To attain an overall grasp of the code,
1937at least some familiarity with these protocols is required.  To understand
1938the code in detail, especially *why* some things are done, a closer reading
1939of the specification may be in order.
1940
1941In some places, the code contains comments with references to the
1942specification, e.g.
1943
1944::
1945
1946    if (conn->ifc_flags & IFC_SERVER)
1947    {   /* [draft-ietf-quic-transport-34] Section 19.7 */
1948        ABORT_QUIETLY(0, TEC_PROTOCOL_VIOLATION,
1949                            "received unexpected NEW_TOKEN frame");
1950        return 0;
1951    }
1952
1953(A search for "[draft-ietf" will reveal over one hundred places in the
1954code thus commented.)
1955
1956The Full IETF Connection module is similar in structure to the `Full gQUIC
1957Connection`_ module, from which it originated.  Some code is quite similar
1958as well, including logic for `ACK Merging`_ and `Ticking`_.
1959
1960Components
1961==========
1962
1963In this section, each member of ``ietf_full_conn`` is documented.
1964
1965ifc_conn
1966--------
1967
1968The first member of the struct is the common connection object,
1969`lsquic_conn`_.
1970
1971It must be first in the struct because the two pointer are cast to each
1972other, depending on circumstances.
1973
1974ifc_cces
1975--------
1976
1977This array holds eight connection CID elements.
1978See `Managing SCIDs`_.
1979
1980ifc_rechist
1981-----------
1982
1983This member holds the `packet receive history <#receive-history>`__.
1984The receive history is used to generate ACK frames.
1985
1986ifc_max_ackable_packno_in
1987-------------------------
1988
1989This value is used to detect holes in incoming packet number sequence.
1990This information is used to queue ACK frames.
1991
1992ifc_send_ctl
1993------------
1994
1995This is the `Send Controller`_. It is used to
1996allocate outgoing packets, control sending rate, and process
1997acknowledgements.
1998
1999ifc_pub
2000-------
2001
2002This member holds the `Connection Public Interface`_
2003
2004ifc_alset
2005---------
2006
2007This is the `Alarm Set`_. It is used to set various timers
2008in the connection and the send controller.
2009
2010ifc_closed_stream_ids
2011---------------------
2012
2013The two sets in this array hold the IDs of closed streams.
2014
2015There are two of them because of the uneven distribution of stream IDs.
2016The set data structure is meant to hold sequences without gaps.
2017It is more efficient to hold stream IDs for each stream type in
2018separate structures.
2019
2020ifc_n_created_streams
2021---------------------
2022
2023Counters for locally initiated streams.  Used to generate next
2024stream ID.
2025
2026ifc_max_allowed_stream_id
2027-------------------------
2028
2029Maximum allowed stream ID for each of the four (``N_SITS``) stream types.
2030This is used all over the place.
2031
2032ifc_closed_peer_streams
2033-----------------------
2034
2035Counts how many remotely-initiated streams have been closed.  Because the
2036protocol mandates that the stream IDs be assigned in order, this allows us
2037to make some logical inferences in the code.
2038
2039ifc_max_streams_in
2040------------------
2041
2042Maximum number of open streams the peer is allowed to initiate.
2043
2044ifc_max_stream_data_uni
2045-----------------------
2046
2047Initial value of the maximum amount of data locally-initiated unidirectional
2048stream is allowed to send.
2049
2050ifc_flags
2051---------
2052
2053All kinds of flags.
2054
2055ifc_mflags
2056----------
2057
2058More flags!
2059
2060ifc_send_flags
2061--------------
2062
2063The send flags keep track of which control frames are queued to be sent.
2064
2065ifc_delayed_send
2066----------------
2067
2068Some send flags are delayed.
2069
2070We stop issuing streams credits if peer stops opening QPACK decoder window.
2071This addresses a potential attack whereby client can cause the server to keep
2072allocating memory.  See `Security Considerations in the QPACK Internet-Draft
2073<https://tools.ietf.org/html/draft-ietf-quic-qpack-21#section-7.3>`__.
2074
2075ifc_send
2076--------
2077
2078This is the `Send Controller`_. It is used to allocate outgoing packets,
2079control sending rate, and process acknowledgements.
2080
2081ifc_error
2082---------
2083
2084This struct records which type of error has occurred (transport or application)'
2085and the error code.
2086
2087ifc_n_delayed_streams
2088---------------------
2089
2090Count how many streams have been delayed.
2091
2092When ``lsquic_conn_make_stream()`` is called, a stream may not be created
2093immediately. It is delayed if creating a stream would go over the
2094maximum number of stream allowed by peer.
2095
2096ifc_n_cons_unretx
2097-----------------
2098
2099Counts how many consecutive unretransmittable packets have been sent.
2100
2101Enough unretransittable sent packets in a row causes a PING frame to
2102be sent.  This forces the peer to send an ACK.
2103
2104ifc_pii
2105-------
2106
2107Points to the selected priority iterator.
2108
2109The IETF Full Connection supports two priority mechanisms: the original
2110Google QUIC priority mechanism and the `HTTP/3 Extensible Priorities
2111<https://tools.ietf.org/html/draft-ietf-httpbis-priority-03>`__.
2112
2113ifc_errmsg
2114----------
2115
2116Holds dynamically generated error message string.
2117
2118Once set, the error string does not change until the connection is
2119destroyed.
2120
2121ifc_enpub
2122---------
2123
2124This points to the `engine's public interface <#lsquic-engine-public>`__.
2125
2126ifc_settings
2127------------
2128
2129Pointer to the engine settings.
2130
2131This member is superfluous -- the settings can be fetched from
2132``ifc_enpub->enp_settings``.
2133
2134ifc_stream_ids_to_ss
2135--------------------
2136
2137Holds a queue of STOP_SENDING frames to send as response to remotely
2138initiated streams that came in after we sent a GOAWAY frame.
2139
2140ifc_created
2141-----------
2142
2143Time when the connection was created.  This is used for the Timestamp
2144and Delayed ACKs extensions.
2145
2146ifc_saved_ack_received
2147----------------------
2148
2149Time when cached ACK frame was received.  See `ACK Merging`_.
2150
2151ifc_max_ack_packno
2152------------------
2153
2154Holding the maximum packet number containing an ACK frame allows us
2155to ignore old ACK frames.  One value per Packet Number Space is kept.
2156
2157ifc_max_non_probing
2158-------------------
2159
2160Maximum packet number of a received non-probing packets.  This is used
2161for path migration.
2162
2163ifc_cfg
2164-------
2165
2166Local copy of a couple of transport parameters.  We could get at them
2167with a function call, but these are used often enough to optimize
2168fetching them.
2169
2170ifc_process_incoming_packet
2171---------------------------
2172
2173The client goes through version negotiation and the switches to the
2174fast function.  The server begins to use the fast function immediately.
2175
2176ifc_n_slack_akbl
2177----------------
2178
2179Number ackable packets received since last ACK was sent.  A count is
2180kept for each Packet Number Space.
2181
2182ifc_n_slack_all
2183---------------
2184
2185Count of all packets received since last ACK was sent.  This is only
2186used in the Application PNS (Packet Number Space).  (This is regular
2187PNS after the handshake completes).
2188
2189ifc_max_retx_since_last_ack
2190---------------------------
2191
2192This number is the maximum number of ack-eliciting packets to receive
2193before an ACK must be sent.
2194
2195The default value is 2.  When the Delayed ACKs extension is used, this
2196value gets modified by peer's ACK_FREQUENCY frames.
2197
2198ifc_max_ack_delay
2199-----------------
2200
2201Maximum amount of allowed after before an ACK is sent if the threshold
2202defined by ifc_max_retx_since_last_ack_ has not yet been reached.
2203
2204The default value is 25 ms.  When the Delayed ACKs extension is used, this
2205value gets modified by peer's ACK_FREQUENCY frames.
2206
2207ifc_ecn_counts_in
2208-----------------
2209
2210Incoming ECN counts in each of the Packet Number Spaces.  These counts
2211are used to generate ACK frames.
2212
2213ifc_max_req_id
2214--------------
2215
2216Keeps track of the maximum ID of bidirectional stream ID initiated by the
2217peers.  It is used to construct the GOAWAY frame.
2218
2219ifc_hcso
2220--------
2221
2222State for outgoing HTTP/3 control stream.
2223
2224ifc_hcsi
2225--------
2226
2227State for incoming HTTP/3 control stream.
2228
2229ifc_qeh
2230-------
2231
2232QPACK encoder streams handler.
2233
2234The handler owns two unidirectional streams: a) locally-initiated QPACK
2235encoder stream, to which it writes; and b) peer-initiated QPACK decoder
2236stream, from which it reads.
2237
2238ifc_qdh
2239-------
2240
2241QPACK decoder streams handler.
2242
2243The handler owns two unidirectional streams: a) peer-initiated QPACK
2244encoder stream, from which it reads; and b) locally-initiated QPACK
2245decoder stream, to which it writes.
2246
2247ifc_peer_hq_settings
2248--------------------
2249
2250Peer's HTTP/3 settings.
2251
2252ifc_dces
2253--------
2254
2255List of destination connection ID elements (DCEs).  Each holds a DCID
2256and the associated stateless reset token.  When lsquic uses a DCID, it
2257inserts the stateless reset token into a hash so that stateless resets
2258can be found.
2259
2260Outside of the initial migration, the lsquic client code does not switch
2261DCIDs.  One idea (suggested in the drafts somewhere) is to switch DCIDs
2262after a period of inactivity.
2263
2264ifc_to_retire
2265-------------
2266
2267List of DCIDs to retire.
2268
2269ifc_scid_seqno
2270--------------
2271
2272Sequence generator for SCIDs generated by the endpoint.
2273
2274ifc_scid_timestamp
2275------------------
2276
2277List of timestamps for the generated SCIDs.
2278
2279This list is used in the SCID rate-limiting mechanism.
2280
2281
2282ifc_incoming_ecn
2283----------------
2284
2285History indicating presence of ECN markings on most recent incoming packets.
2286
2287ifc_cur_path_id
2288---------------
2289
2290Current path ID -- indexes `ifc_paths`_.
2291
2292ifc_used_paths
2293--------------
2294
2295Bitmask of which paths in `ifc_paths`_ are being used.
2296
2297ifc_mig_path_id
2298---------------
2299
2300Path ID of the path being migrated to.
2301
2302ifc_active_cids_limit
2303---------------------
2304
2305This is the maximum number of CIDs at any one time this
2306endpoint is allowed to issue to peer.  If the TP value exceeds ``cn_n_cces``,
2307it is reduced to it.
2308
2309ifc_active_cids_count
2310---------------------
2311
2312This value tracks how many CIDs have been issued.  It is decremented
2313each time a CID is retired.
2314
2315ifc_first_active_cid_seqno
2316--------------------------
2317
2318Another piece of the SCID rate-limiting mechanism.
2319
2320ifc_ping_unretx_thresh
2321----------------------
2322
2323Once the number consecutively sent non-ack-elicing packets
2324(`ifc_n_cons_unretx`_) exceeds this value, this endpoint will send
2325a PING frame to force the peer to respond with an ACK.
2326
2327The threshold begins at 20 and then made to fluctuate randomly between
232812 and 27.
2329
2330ifc_last_retire_prior_to
2331------------------------
2332
2333Records the maximum value of ``Retire Prior To`` value of the
2334`NEW_CONNECTION_ID frame
2335<https://tools.ietf.org/html/draft-ietf-quic-transport-34#section-19.15>`_.
2336
2337ifc_ack_freq_seqno
2338------------------
2339
2340Sequence number generator for ACK_FREQUENCY frames generated by this
2341endpoint.
2342
2343ifc_last_pack_tol
2344-----------------
2345
2346Last value of the ``Packet Tolerance`` field sent in the last
2347``ACK_FREQUENCY`` frame generated by this endpoint.
2348
2349ifc_last_calc_pack_tol
2350----------------------
2351
2352Last *calculated* value of the ``Packet Tolerance`` field.
2353
2354ifc_min_pack_tol_sent
2355---------------------
2356
2357Minimum value of the ``Packet Tolerance`` field sent.  Only used for
2358statistics display.
2359
2360ifc_max_pack_tol_sent
2361---------------------
2362
2363Maximum value of the ``Packet Tolerance`` field sent.  Only used for
2364statistics display.
2365
2366ifc_max_ack_freq_seqno
2367----------------------
2368
2369Maximum seen sequence number of incoming ``ACK_FREQUENCY`` frame.  Used
2370to discard old frames.
2371
2372ifc_max_udp_payload
2373-------------------
2374
2375Maximum UDP payload.  This is the cached value of the transport parameter.
2376
2377ifc_last_live_update
2378--------------------
2379
2380Last time ``ea_live_scids()`` was called.
2381
2382ifc_paths
2383---------
2384
2385Array of connection paths.  Most of the time, only one path is used; more
2386are used during `migration <#path-migration>`__.  The array has four
2387elements as a safe upper limit.
2388
2389The elements are of type ``struct conn_path``.  Besides the network path,
2390which stores socket addresses and is associated with each outgoing packet
2391(via ``po_path``), the connection path keeps track of the following
2392information:
2393
2394-   Outgoing path challenges.  See `Sending Path Challenges`_.
2395
2396-   Incoming path challenge.
2397
2398-   Spin bit (``cop_max_packno``, ``cop_spin_bit``, and ``COP_SPIN_BIT``).
2399
2400-   DPLPMTUD state.
2401
2402ifc_u.cli
2403---------
2404
2405Client-specific state.  This is where pointers to "crypto streams" are
2406stored; they are not in the ``ifc_pub.all_streams`` hash.
2407
2408ifc_u.ser
2409---------
2410
2411The server-specific state is only about push promises.
2412
2413ifc_idle_to
2414-----------
2415
2416Idle timeout.
2417
2418ifc_ping_period
2419---------------
2420
2421Ping period.
2422
2423ifc_bpus
2424--------
2425
2426A hash of buffered priority updates.  It is used when a priority update
2427(part of the Extensible HTTP Priorities extension) arrives before the
2428stream it is prioritizing.
2429
2430ifc_last_max_data_off_sent
2431--------------------------
2432
2433Value of the last MAX_DATA frame sent.  This is used to limit the number
2434of times we send the MAX_DATA frame in response to a DATA_BLOCKED frame.
2435
2436ifc_min_dg_sz
2437-------------
2438
2439Minimum size of the DATAGRAM frame.  Used by the eponymous extension.
2440
2441ifc_max_dg_sz
2442-------------
2443
2444Maximum size of the DATAGRAM frame.  Used by the eponymous extension.
2445
2446ifc_pts
2447-------
2448
2449PTS stands for "Packet Tolerance Stats".  Information collected here
2450is used to calculate updates to the packet tolerance advertised to the
2451peer via ACK_FREQUENCY frames.  Part of the Delayed ACKs extension.
2452
2453ifc_stats
2454---------
2455
2456Cumulative connection stats.
2457
2458ifc_last_stats
2459--------------
2460
2461Copy of `ifc_stats`_ last time ``ci_log_stats()`` was called.  Used
2462to calculate the difference.
2463
2464ifc_ack
2465-------
2466
2467One or more cached incoming ACK frames.  Used for `ACK merging`_.
2468
2469Managing SCIDs
2470==============
2471
2472Source Connection IDs -- or SCIDs for short -- are stored in the `ifc_cces`_
2473array.
2474
2475Each of ``struct conn_cid_elem`` contains the CID itself, the CID's port or
2476sequence number, and flags:
2477
2478-   ``CCE_USED`` means that this Connection ID has been used by the peer.  This
2479    information is used to check whether the peer's incoming packet is using
2480    a new DCID or reusing an old one when the packet's DCID does not match
2481    this path's current DCID.
2482
2483-   ``CCE_REG`` signifies that the CID has been registered with the user-defined
2484    ``ea_new_scids()`` callback.
2485
2486-   ``CCE_SEQNO`` means that the connection has been issued by this endpoint
2487    and ``cce_seqno`` contains a valid value.  Most of SCIDs are issued by
2488    either endpoint, with one exception:  The DCID included in the first few
2489    packets sent by the client becomes an interim SCID for the server and it
2490    does not have a sequence number.  This "original" SCID gets retired 2
2491    seconds after the handshake succeeds, see the ``AL_RET_CIDS`` alarm.
2492
2493-   ``CCE_PORT`` is used to mark the special case of hashing connections by
2494    port number.  In client mode, the lsquic engine may, under some circumstances,
2495    hash the connections by local port number instead of connection ID.
2496    In that case, ``cce_port`` contains the port number used to hash the
2497    connection.
2498
2499Each CIDs is hashed in the of the "CID-to-connection" mapping that the engine
2500maintains.  If it is not in the hash, incoming packets that use this CID as
2501DCID will not be dispatched to the connection (because the connection will not
2502be found).
2503
2504Path Migration
2505==============
2506
2507What follows assumes familiarity with `Section 9
2508<https://tools.ietf.org/html/draft-ietf-quic-transport-34#section-9>`__
2509of the Transport I-D.
2510
2511Server
2512------
2513
2514The server handles two types of path migration.  In the first type, the
2515client performs probing by sending path challenges; in the second type,
2516the migration is due to a NAT rebinding.
2517
2518The connection keeps track of different paths in `ifc_paths`_.  Path
2519objects are allocated out of the ``ifc_paths`` array.  They are of type
2520``struct conn_path``; one of the members is ``cop_path``, which is the
2521network path object used to send packets (via ``po_path``).
2522
2523Each incoming packet is fed to the engine using the
2524``lsquic_engine_packet_in()`` function.  Along with the UDP datagram,
2525the local and peer socket addresses are passed to it.  These addresses are
2526eventually passed to the connection via the ``ci_record_addrs()`` call.
2527The first of these calls -- for the first incoming packet -- determines the
2528*current path*.  When the address pair, which is a four-tuple of local
2529and remote IP addresses and port numbers, does not match that of the
2530current path, a new path object is created, triggering migration logic.
2531
2532``ci_record_addrs()`` returns a *path ID*, which is simply the index of
2533the corresponding element in the ``ifc_paths`` array.  The current path
2534ID is stored in ``ifc_cur_path_id``.  The engine assigns this value to
2535the newly created incoming packet (in ``pi_path_id``).  The packet is
2536then passed to ``ci_packet_in()``.
2537
2538The first part of the path-switching logic is in ``process_regular_packet()``:
2539
2540::
2541
2542    case REC_ST_OK:
2543        /* --- 8< --- some code elided... */
2544        saved_path_id = conn->ifc_cur_path_id;
2545        parse_regular_packet(conn, packet_in);
2546        if (saved_path_id == conn->ifc_cur_path_id)
2547        {
2548            if (conn->ifc_cur_path_id != packet_in->pi_path_id)
2549            {
2550                if (0 != on_new_or_unconfirmed_path(conn, packet_in))
2551                {
2552                    LSQ_DEBUG("path %hhu invalid, cancel any path response "
2553                        "on it", packet_in->pi_path_id);
2554                    conn->ifc_send_flags &= ~(SF_SEND_PATH_RESP
2555                                                    << packet_in->pi_path_id);
2556                }
2557
2558The above means: if the current path has not changed after the packet
2559was processed, but the packet came in on a different path, then invoke
2560the "on new or unconfirmed path" logic.  This is done this way because
2561the current path may be have been already changed if the packet contained
2562a PATH_RESPONSE frame.
2563
2564First time a packet is received on a new path, a PATH_CHALLENGE frame is
2565scheduled.
2566
2567If more than one packet received on the new path contain non-probing frames,
2568the current path is switched: it is assumed that the path change is due to
2569NAT rebinding.
2570
2571Client
2572------
2573
2574Path migration is controlled by the client.  When the client receives
2575a packet from an unknown server address, it drops the packet on the
2576floor (per spec).  This code is in ``process_regular_packet()``.
2577
2578The client can migrate if ``es_allow_migration`` is on (it is in the default
2579configuration) and the server provides the "preferred_address" transport
2580parameter.  The migration process begins once the handshake is confirmed;
2581see the ``maybe_start_migration()`` function.  The SCID provided by the
2582server as part of the "preferred_address" transport parameter is used as the
2583destination CID and path #1 is picked:
2584
2585
2586::
2587
2588    copath = &conn->ifc_paths[1];
2589    migra_begin(conn, copath, dce, (struct sockaddr *) &sockaddr, params);
2590    return BM_MIGRATING;
2591
2592In ``migra_begin``, migration state is initiated and sending of a
2593PATH_CHALLENGE frame is scheduled:
2594
2595::
2596
2597    conn->ifc_mig_path_id = copath - conn->ifc_paths;
2598    conn->ifc_used_paths |= 1 << conn->ifc_mig_path_id;
2599    conn->ifc_send_flags |= SF_SEND_PATH_CHAL << conn->ifc_mig_path_id;
2600    LSQ_DEBUG("Schedule migration to path %hhu: will send PATH_CHALLENGE",
2601        conn->ifc_mig_path_id);
2602
2603Sending Path Challenges
2604-----------------------
2605
2606To send a path challenge, a packet is allocated to be sent on that path,
2607a new challenge is generated, the PATH_CHALLENGE is written to the
2608packet, and the packet is scheduled.  All this happens in the
2609``generate_path_chal_frame()`` function.
2610
2611::
2612
2613    need = conn->ifc_conn.cn_pf->pf_path_chal_frame_size();
2614    packet_out = get_writeable_packet_on_path(conn, need, &copath->cop_path, 1);
2615    /* --- 8< --- some code elided... */
2616    w = conn->ifc_conn.cn_pf->pf_gen_path_chal_frame(
2617            packet_out->po_data + packet_out->po_data_sz,
2618            lsquic_packet_out_avail(packet_out),
2619            copath->cop_path_chals[copath->cop_n_chals]);
2620    /* --- 8< --- some code elided... */
2621    lsquic_alarmset_set(&conn->ifc_alset, AL_PATH_CHAL + path_id,
2622                    now + (INITIAL_CHAL_TIMEOUT << (copath->cop_n_chals - 1)));
2623
2624If the path response is not received before a timeout, another path challenge
2625is sent, up to the number of elements in ``cop_path_chals``.  The timeout
2626uses exponential back-off; it is not based on RTT, because the RTT of the
2627new path is unknown.
2628
2629Receiving Path Responses
2630------------------------
2631
2632When a PATH_RESPONSE frame is received, the path on which the corresponding
2633challenge was sent may become the new current path.  See
2634``process_path_response_frame()``.
2635
2636Note that the path ID of the incoming packet with the PATH_RESPONSE frame is
2637not taken into account.  This is by design: see
2638`Section 8.2.2 of the Transport I-D
2639<https://tools.ietf.org/html/draft-ietf-quic-transport-34#section-8.2.2>`__.
2640
2641Stream Priority Iterators
2642=========================
2643
2644Creating Streams on the Server
2645==============================
2646
2647Calculating Packet Tolerance
2648============================
2649
2650When the Delayed ACKs extension is used, we advertise our ``Packet Tolerance``
2651to peer.  This is the number of packets the peer can receive before having to
2652send an acknowledgement.  By default -- without the extension -- the packet
2653tolerance is 2.
2654
2655Because we `merge ACKs <#ack-merging>`__, receiving more than one ACK between
2656ticks is wasteful.  Another consideration is that a packet is not declared
2657lost until at least one RTT passes -- the time to send a packet and receive
2658the acknowledgement from peer.
2659
2660To calculate the packet tolerance, we use a feedback mechanism: when number
2661of ACKs per RTT is too high, we increase packet tolerance; when number of
2662ACKs per RTT is too low, we decrease packet tolerance.  The feedback is
2663implemented with a `PID Controller <https://en.wikipedia.org/wiki/PID_controller>`__:
2664the target is the number of ACKs per RTT, normalized to 1.0.
2665
2666See the function ``packet_tolerance_alarm_expired()`` as well as comments
2667in ``lsquic.h`` that explain the normalization as well as the knobs available
2668for tuning.
2669
2670The pre-normalized target is a function of RTT.  It was obtained
2671empirically using netem.  This function together with the default
2672PID controller parameters give good performance in the lab and in
2673some limited interop testing.
2674
2675Anatomy of Outgoing Packet
2676**************************
2677
2678Overview
2679========
2680
2681The outgoing packet is represented by ``struct lsquic_packet_out``.  An
2682outgoing packet always lives on one -- and only one -- of the
2683`Send Controller`_'s `Packet Queues`_.  For that, ``po_next`` is used.
2684
2685Beyond the packet number, stored in ``po_packno``, the packet has several
2686properties: sent time (``po_sent``), frame information, encryption
2687level, network path, and others.  Several properties are encoded into
2688one or more bits in the bitmasks ``po_flags`` and ``po_lflags``.
2689Multibit properties are usually accessed and modified by a special
2690macro.
2691
2692The packet has a pointer to the packetized data in ``po_data``.
2693If the packet has been encrypted but not yet sent, the encrypted
2694buffer is pointed to ``po_enc_data``.
2695
2696Packet Payload
2697==============
2698
2699The payload consists of the various frames -- STREAM, ACK, and others --
2700written, one after another, to ``po_data``.  The header, consisting of
2701the type byte, (optional) connection ID, and the packet number is constructed
2702when the packet is just about to be sent, during encryption.  This
2703buffer -- header and the encrypted payload are stored in a buffer
2704pointed to by ``po_enc_data``.
2705
2706Because stream data is written directly to the outgoing packet, the
2707packet is not destroyed when it is declared lost by the `loss detection
2708logic <#loss-detection-and-retransmission>`__.  Instead, it is repackaged
2709and sent out again as a new packet.  Besides assigning the packet a
2710new number, packet retransmission involves removing non-retransmittable
2711frames from the packet.  (See ``lsquic_packet_out_chop_regen()``.)
2712
2713Historically, some places in the code assumed that the frames to be
2714dropped are always located at the beginning of the ``po_data`` buffer.
2715(This was before a `frame record <#frame-records>`__ was created for
2716each frame).  The cumulative size of the frames to be removed is in
2717``po_regen_sz``; this size can be zero.  Code that generates
2718non-retransmittable frames still writes them only to the beginning
2719of the packet.
2720
2721The goal is to drop ``po_regen_sz`` and to begin to write ACK and
2722other non-retransmittable frames anywhere.  This should be possible
2723to do now (see ``lsquic_packet_out_chop_regen()``, which can support
2724such use after removing the assertion), but we haven't pulled the
2725trigger on it yet.  Making this change will allow other code to become
2726simpler: for example, the opportunistic ACKs logic.
2727
2728Frame Records
2729=============
2730
2731Each frame written to ``po_data`` has an associated *frame record* stored
2732in ``po_frecs``:
2733
2734::
2735
2736    struct frame_rec {
2737        union {
2738            struct lsquic_stream   *stream;
2739            uintptr_t               data;
2740        }                        fe_u;
2741        unsigned short           fe_off,
2742                                 fe_len;
2743        enum quic_frame_type     fe_frame_type;
2744    };
2745
2746Frame records are primarily used to keep track of the number of unacknowledged
2747stream frames for a stream.  When a packet is acknowledged, the frame records
2748are iterated over and ``lsquic_stream_acked()`` is called.  The second purpose
2749is to speed up packet resizing, as frame records record the type, position,
2750and size of a frame.
2751
2752Most of the time, a packet will contain a single frame: STREAM on the sender
2753of data and ACK on the receiver.  This use case is optimized: ``po_frecs`` is
2754a union and when there is only one frame per packets, the frame record is
2755stored in the packet struct directly.
2756
2757Evanescent Connection
2758*********************
2759
2760*Files: lsquic_pr_queue.h, lsquic_pr_queue.c*
2761
2762"PR Queue" stands for "Packet Request Queue."  This and the Evanescent
2763Connection object types are explaned below in this section.
2764
2765Overview
2766========
2767
2768Some packets need to be replied to outside of context of existing
2769mini or full connections:
2770
27711. A version negotiation packet needs to be sent when a packet
2772   arrives that specifies QUIC version that we do not support.
2773
27742. A stateless reset packet needs to be sent when we receive a
2775   packet that does not belong to a known QUIC connection.
2776
2777
2778The replies cannot be sent immediately.  They share outgoing
2779socket with existing connections and must be scheduled according
2780to prioritization rules.
2781
2782The information needed to generate reply packet  -- connection ID,
2783connection context, and the peer address -- is saved in the Packet
2784Request Queue.
2785
2786When it is time to send packets, the connection iterator knows to
2787call prq_next_conn() when appropriate.  What is returned is an
2788evanescent connection object that disappears as soon as the reply
2789packet is successfully sent out.
2790
2791There are two limits associated with Packet Request Queue:
2792
27931. Maximum number of packet requests that are allowed to be
2794   pending at any one time.  This is simply to prevent memory
2795   blowout.
2796
27972. Maximum verneg connection objects to be allocated at any one
2798   time.  This number is the same as the maximum batch size in
2799   the engine, because the packet (and, therefore, the connection)
2800   is returned to the Packet Request Queue when it could not be
2801   sent.
2802
2803We call this a "request" queue because it describes what we do with
2804QUIC packets whose version we do not support or those packets that
2805do not belong to an existing connection: we send a reply for each of
2806these packets, which effectively makes them "requests."
2807
2808Packet Requests
2809===============
2810
2811When an incoming packet requires a non-connection response, it is added
2812to the Packet Request Queue.  There is a single ``struct pr_queue`` per
2813engine -- it is instantiated if the engine is in the server mode.
2814
2815The packet request is recorded in ``struct packet_req``, which are kept
2816inside a hash in the PR Queue.  The reason for keeping the requests in
2817a hash is to minimize duplicate responses:  If a client hello message
2818is spread over several incoming packets, only one response carrying the
2819version negotiation packet (for example) will be sent.
2820
2821::
2822
2823    struct packet_req
2824    {
2825        struct lsquic_hash_elem     pr_hash_el;
2826        lsquic_cid_t                pr_scid;
2827        lsquic_cid_t                pr_dcid;
2828        enum packet_req_type        pr_type;
2829        enum pr_flags {
2830            PR_GQUIC    = 1 << 0,
2831        }                           pr_flags;
2832        enum lsquic_version         pr_version;
2833        unsigned                    pr_rst_sz;
2834        struct network_path         pr_path;
2835    };
2836
2837Responses are created on demand.  Until that time, everything that is
2838necessary to generate the response is stored in ``packet_req``.
2839
2840Sending Responses
2841=================
2842
2843To make these packets fit into the usual packet-sending loop,
2844each response is made to resemble a packet
2845sent by a connecteion.  For that, the PR Queue creates a connection
2846object that only lives for the duration of batching of the packet.
2847(Hence the connection's name: *evanescent* connection.)  This connection
2848is returned by the ``lsquic_prq_next_conn()`` by the connection iterator
2849during the `batching process <#batching-packets>`__
2850
2851For simplicity, the response packet is generated in this function as well.
2852The call to ``ci_next_packet_to_send()`` only returns the pointer to it.
2853
2854Send Controller
2855***************
2856
2857*Files: lsquic_send_ctl.h, lsquic_send_ctl.c*
2858
2859.. _overview-4:
2860
2861Overview
2862========
2863
2864The Send Controller manages outgoing packets and the sending rate:
2865
2866-  It decides whether packets can be sent
2867
2868-  It figures out what the congestion window is
2869
2870-  It processes acknowledgements and detects packet losses
2871
2872-  It allocates packets
2873
2874-  It maintains sent packet history
2875
2876The controller allocates, manages, splits, coalesces, and destroys
2877outgoing packets. It owns these packets.
2878
2879The send controller services two modules:
2880
2881-  Full connection. gQUIC and IETF full connections use the send
2882   controller to allocate packets and delegate packet-sending
2883   decisions to it.
2884
2885-  Stream. The stream uses the stream controller as the source of
2886   outgoing packets to write STREAM frames to.
2887
2888Packet Life Cycle
2889=================
2890
2891A new outgoing packet is allocated and returned to the connection or the
2892stream. Around this time (just before or just after, depending on the
2893particular function call to get the packet), the packet is placed on the
2894Scheduled Queue.
2895
2896When the engine is creating a batch of packets to send, it calls
2897``ci_next_packet_to_send()``. The connection removes the next packet from
2898its Scheduled Queue and returns it. The engine now owns the outgoing
2899packet, but only while the batch is being sent. The engine *always
2900returns the packet* after it tries to send it.
2901
2902If the packet was sent successfully, it is returned via the
2903``ci_packet_sent`` call, after which it is appended to the Unacked Queue.
2904If the packet could not be sent, ``ci_packet_not_sent()`` is called, at
2905which point it is prepended back to the Schedule Queue to be tried
2906later.
2907
2908There are two ways to get off the Unacked Queue: being acknowledged or
2909being lost. When a packet is acknowledged, it is destroyed. On the other
2910hand, if it is deemed lost, it is placed onto the Lost Queue, where it
2911will await being rescheduled.
2912
2913Packet Queues
2914=============
2915
2916.. image:: lsquic-packet-queues.png
2917
2918Buffered Queue
2919--------------
2920
2921The Buffered Queue is a special case. When writing to the stream occurs
2922outside of the event dispatch loop, new outgoing packets begin their
2923life in the Buffered Queue. They get scheduled during a connection tick,
2924making their way onto the Scheduled Queue.
2925
2926There are two buffered queues: one for packets holding STREAM frames
2927from the highest-priority streams and one for packets for streams with
2928lower priority.
2929
2930Scheduled Queue
2931---------------
2932
2933Packets on the Scheduled Queue have packet numbers assigned to them. In
2934rare cases, packets may be removed from this queue before being sent
2935out. (For example, a stream may be cancelled, in which case packets that
2936carry its STREAM frames may end up empty.) In that case, they are marked
2937with a special flag to generate the packet number just before they are
2938sent.
2939
2940Unacked Queue
2941-------------
2942
2943This queue holds packets that have been sent but are yet to be
2944acknowledged. When a packet on this queue is acknowledged, it is
2945destroyed.
2946
2947The loss detection code runs on this queue when ACKs are received or
2948when the retransmission timer expires.
2949
2950This queue is actually three queues: one for each of the IETF QUIC's
2951Packet Number Spaces, or PNSs. The PNS_APP queue is what is used by
2952gQUIC and IETF QUIC server code. PNS_INIT and PNS_HSK are only used by
2953the IETF QUIC client. (IETF QUIC server handles those packet number
2954spaces in its mini conn module.)
2955
2956In addition to regular packets, the Unacked Queue holds `loss
2957records <#loss-records>`__ and `poisoned packets <#poisoned-packets>`__.
2958
2959Lost Queue
2960----------
2961
2962This queue holds lost packets. These packets are removed from the
2963Unacked Queue when it is decided that they have been lost. Packets on
2964this queue get rescheduled after connection schedules a packet with
2965control frames, as those have higher priority.
2966
29670-RTT Stash Queue
2968-----------------
2969
2970This queue is used by the client to retransmit packets that carry 0-RTT
2971data.
2972
2973Handling ACKs
2974=============
2975
2976Acknowledgements are processed in the function
2977``lsquic_send_ctl_got_ack``.
2978
2979One of the first things that is done is ACK validation. We confirm that
2980the ACK does not contain any packet numbers that we did not send.
2981Because of the way we `generate packet numbers <#packet-numbers>`__,
2982this check is a simple comparison.
2983
2984The nested loops work as follows. The outer loop iterates over the
2985packets in the Unacked Queue in order -- meaning packet numbers
2986increase. In other words, older packets are examined first. The inner
2987loop processes ACK ranges in the ACK *backwards*, meaning that both
2988loops follow packets in increasing packet number order. It is done this
2989way as an optimization. The (previous) alternative design of looking
2990up a packet number in the ACK frame, even if using binary search, is
2991slower.
2992
2993The code is optimized: the inner loop has a minimum possible number of
2994branches. These optimizations predate the more-recent, higher-level
2995optimization. The latest ACK-handling optimization added to the code
2996combines incoming ACKs into a single ACK (at the connection level), thus
2997reducing the number of times this loop needs to be used by a lot,
2998sometimes by a significant factor (when lots of data is being sent).
2999This makes some of the code-level optimizations, such as the use of
3000``__builtin_prefetch``, an overkill.
3001
3002Loss Records
3003============
3004
3005A loss record is a special type of outgoing packet. It marks a place in
3006the Unacked Queue where a lost packet had been -- the lost packet itself
3007having since moved on to the Lost Queue or further. The loss record and
3008the lost packet form a circular linked list called the "loss chain."
3009This list contains one real packet and zero or more loss records. The
3010real packet can move from the Unacked Queue to the Lost Queue to the
3011Scheduled Queue and back to the Unacked Queue; its loss records live
3012only on the Unacked Queue.
3013
3014We need loss records to be able to handle late acknowledgements -- those
3015that acknowledge a packet *after* it has been deemed lost. When an
3016acknowledgment for any of the packet numbers associated with this packet
3017comes in, the packet is acknowledged and the whole loss chain is
3018destroyed.
3019
3020Poisoned Packets
3021================
3022
3023A poisoned packet is used to thwart opportunistic ACK attacks. The
3024opportunistic ACK attack works as follows:
3025
3026-  The client requests a large resource
3027
3028-  The server begins sending the response
3029
3030-  The client sends ACKs for packet number before it sees these packets,
3031   tricking the server into sending packets faster than it would
3032   otherwise
3033
3034The poisoned packet is placed onto the Unacked Queue. If the peer lies
3035about packet numbers it received, it will acknowledge the poisoned
3036packet, in which case it will be discovered during ACK processing.
3037
3038Poisoned packets cycle in and out of the Unacked Queue. A maximum of one
3039poisoned packet is outstanding at any one time for simplicity. (And we
3040don't need more).
3041
3042Packet Numbers
3043==============
3044
3045The Send Controller aims to send out packets without any gaps in the
3046packet number sequence. (The only exception to this rule is the handling
3047of poisoned packets, where the gap is what we want.) Not having gaps in
3048the packet number sequence is beneficial:
3049
3050-  ACK verification is cheap
3051
3052-  Send history updates are fast
3053
3054-  Send history uses very little memory
3055
3056The downside is code complexity and having to renumber packets when they
3057are removed from the Scheduled Queue (due to, for example, STREAM frame
3058elision or loss chain destruction) or resized (due to a path or MTU
3059change, for instance).
3060
3061Some scenarios when gaps can be produced inadvertently are difficult to
3062test or foresee. To cope with that, a special warning in the send
3063history code is added when the next packet produces a gap. This warning
3064is limited to once per connection. Having a gap does not break
3065functionality other than ACK verification, but that's minor. On the
3066other hand, we want to fix such bugs when they crop up -- that's why the
3067warning is there.
3068
3069Loss Detection and Retransmission
3070=================================
3071
3072The loss detection and retransmission logic in the Send Controller was
3073taken from the Chromium code in the fall of 2016, in the beginning of
3074the lsquic project. This logic has not changed much since then -- only
3075some bugs have been fixed here and there. The code bears no resemblance
3076to what is described in the QUIC Recovery Internet Draft. Instead, `the
3077much earlier
3078document <https://tools.ietf.org/html/draft-iyengar-quic-loss-recovery-01>`__,
3079describing gQUIC, could be looked to for reference.
3080
3081Congestions Controllers
3082=======================
3083
3084The Send Controller has a choice of two congestion controllers: Cubic
3085and BBRv1. The latter was translated from Chromium into C. BBRv1 does
3086not work well for very small RTTs.
3087
3088To cope with that, lsquic puts the Send Controller into the "adaptive CC"
3089mode by default. The CC is selected after RTT is determined: below a
3090certain threshold (configurable; 1.5 ms by default), Cubic is used.
3091Until Cubic or BBRv1 is selected, *both* CC controllers are used --
3092because we won't have the necessary state to instantiate a controller
3093when the decision is made.
3094
3095Buffered Packet Handling
3096========================
3097
3098Buffered packets require quite a bit of special handling. Because they
3099are created outside of the regular event dispatch, a lot of things are
3100unknown:
3101
3102-  Congestion window
3103
3104-  Whether more incoming packets will arrive before the next tick
3105
3106-  The optimal packet number size
3107
3108The Send Controller tries its best to accommodate the buffered packets
3109usage scenario.
3110
3111ACKs
3112----
3113
3114When buffered packets are created, we want to generate an ACK, if
3115possible. This can be seen in ``send_ctl_get_buffered_packet``, which
3116calls ``ci_write_ack()``
3117
3118This ACK should be in the first buffered packet to be scheduled. Because
3119the Send Controller does not dictate the order of buffered packet
3120creation -- high-priority versus low-priority -- it may need to move (or
3121steal) the ACK frame from a packet on the low-priority queue to a packet
3122on the high-priority queue.
3123
3124When buffered packets are finally scheduled, we have to remove ACKs from
3125them if another ACK has already been sent. This is because Chrome errors
3126out if out-of-order ACKs come in.
3127
3128Flushing QPACK Decoder
3129----------------------
3130
3131The priority-based write events dispatch is emulated when the first
3132buffered packet is allocated: the QPACK decoder is flushed. Without it,
3133QPACK updates are delayed, which may negatively affect compression
3134ratio.
3135
3136Snapshot and Rollback
3137=====================
3138
3139The Send Controller snapshot and rollback functionality was implemented
3140exclusively for the benefit of the optimized ``lsquic_stream_pwritev``
3141call.
3142
3143Complexity Woes
3144===============
3145
3146The Send Controller is complicated. Because we write stream data to
3147packets directly and packets need to be resized, a lot of complexity
3148resides in the code to resize packets, be it due to repathing, STREAM
3149frame elision, or MTU changes. This is the price to be paid for
3150efficiency in the normal case.
3151
3152
3153Alarm Set
3154*********
3155
3156*Files: lsquic_alarmset.h, lsquic_alarmset.c, test_alarmset.c*
3157
3158The alarm set, ``struct lsquic_alarmset``, is an array of callbacks and
3159expiry times.  To speed up operations, setting and unsetting alarms is
3160done via macros.
3161
3162The functions to ring [4]_ the alarms and to calculate the next alarm
3163time use a loop.  It would be possible to maintain a different data
3164structure, such as a min-heap, to keep the alarm, and that would obviate
3165the need to loop in ``lsquic_alarmset_mintime()``.  It is not worth it:
3166the function is not called often and a speed win here would be offset
3167by the necessity to maintain the min-heap ordering.
3168
3169Tickable Queue
3170**************
3171
3172*Files: lsquic_engine.c, lsquic_min_heap.h, lsquic_min_heap.c*
3173
3174The Tickable Queue is a min-heap used as a priority queue. Connections
3175on this queue are in line to be processed. Connections that were last
3176ticked a longer time ago have higher priority than those ticked
3177recently. (``cn_last_ticked`` is used for ordering.) This is intended to
3178prevent starvation as multiple connections vye for the ability to send
3179packets.
3180
3181The way the min-heap grows is described in `Growing
3182Min-Heaps <#growing-min-heaps>`__.
3183
3184Advisory Tick Time Queue
3185************************
3186
3187*Files: lsquic_attq.h, lsquic_attq.c*
3188
3189This data structure is a mini-heap. Connections are ordered by the value
3190of the next time they should be processed (ticked). (Because this is not
3191a hard limit, this value is advisory -- hence its name.)
3192
3193This particular min-heap implementation has two properties worth
3194highlighting:
3195
3196Removal of Arbitrary Elements
3197=============================
3198
3199When a connection's next tick time is updated (or when the connection is
3200destroyed), the connection is removed from the ATTQ. At that time, it
3201may be at any position in the min-heap. The position is recorded in the
3202heap element, ``attq_elem->ae_heap_idx`` and is updated when elements are
3203swapped. This makes it unnecessary to search for the entry in the
3204min-heap.
3205
3206Swapping Speed
3207==============
3208
3209To make swapping faster, the array that underlies the min-heap is an
3210array of *pointers* to ``attq_elem``. This makes it unnecessary to update
3211each connection's ``cn_attq_elem`` as array elements are swapped: the
3212memory that stores ``attq_elem`` stays put. This is why there are both
3213``aq_elem_malo`` and ``aq_heap``.
3214
3215CID Purgatory
3216*************
3217
3218*Files: lsquic_purga.h, lsquic_purga.c*
3219
3220Overview
3221========
3222
3223This module keeps a set of CIDs that should be ignored for a period
3224of time.  It is used when a connection is closed: this way, late
3225packets will not create a new connection.
3226
3227A connection may have been deleted, retired, or closed.  In the latter
3228case, it enters the `Draining State <https://tools.ietf.org/html/draft-ietf-quic-transport-34#section-10.2.2>`__.
3229In this state, the connection is to ignore incoming packets.
3230
3231Structure
3232=========
3233
3234The purgatory keeps a list of 16-KB pages.  A page looks like this:
3235
3236::
3237
3238    #define PURGA_ELS_PER_PAGE 273
3239
3240    struct purga_page
3241    {
3242        TAILQ_ENTRY(purga_page)     pupa_next;
3243        lsquic_time_t               pupa_last;
3244        unsigned                    pupa_count;
3245        bloom_mask_el_t             pupa_mask[BLOOM_N_MASK_ELS];
3246        lsquic_cid_t                pupa_cids[PURGA_ELS_PER_PAGE];
3247        void *                      pupa_peer_ctx[PURGA_ELS_PER_PAGE];
3248        struct purga_el             pupa_els[PURGA_ELS_PER_PAGE];
3249    };
3250
3251The reason for having CIDs and peer contexts in separate arrays is to be
3252able to call the ``ea_old_scids()`` callback when a page expires.  A page
3253is expired when it is full and the last added element is more than
3254``pur_min_life`` microseconds ago.  The minimum CID life is hardcoded as
325530 seconds in lsquic_engine.c (see the ``lsquic_purga_new()`` call).
3256
3257To avoid scannig the whole array of CIDs in ``lsquic_purga_contains()``,
3258we use a Bloom filter.
3259
3260The Bloom filter is constructed using a 8192-bit bit field and 6 hash
3261functions.  With 273 elements per page, this gives us 0.004% possibility
3262of a false positive.  In other words, when we do have to search a page
3263for a particular CID, the chance of finding the CID is 99.99%.
3264
3265Quick calculation:
3266
3267.. code-block:: text
3268
3269    perl -E '$k=6;$m=1<<13;$n=273;printf("%f\n", (1-exp(1)**-($k*$n/$m))**$k)'
3270
3271To extract 6 13-bit values from a 64-bit integer, they are overlapped:
3272
3273.. code-block:: text
3274
3275     0         10        20        30        40        50        60
3276    +----------------------------------------------------------------+
3277    |                                                                |
3278    +----------------------------------------------------------------+
3279     1111111111111
3280               2222222222222
3281                         3333333333333
3282                                   4444444444444
3283                                             5555555555555
3284                                                       6666666666666
3285
3286This is not 100% kosher, but having 6 functions gives a better guarantee
3287and it happens to work in practice.
3288
3289Memory Manager
3290**************
3291
3292*Files: lsquic_mm.h, lsquic_mm.c*
3293
3294The memory manager allocates several types of objects that are used by
3295different parts of the library:
3296
3297-   Incoming packet objects and associated buffers
3298
3299-   Outgoing packet objects and associated buffers
3300
3301-   Stream frames
3302
3303-   Frame records
3304
3305-   Mini connections, both Google and IETF QUIC
3306
3307-   DCID elements
3308
3309-   HTTP/3 (a.k.a. "HQ") frames
3310
3311-   Four- and sixteen-kilobyte pages
3312
3313These objects are either stored on linked list or in `malo <#malo-allocator>`__
3314pools and are shared among all connections.  (Full connections allocate outgoing
3315packets from per-connection malo allocators: this is done to speed up `ACK
3316processing <#handling-acks>`__.)
3317
3318The list of cached outgoing packet buffers is shrunk once in a while (see
3319the "poolst\_*" functions).  Other object types are kept in the cache
3320until the engine is destroyed.  One Memory Manager object is allocated per
3321engine instance.
3322
3323Malo Allocator
3324**************
3325
3326*Files: lsquic_malo.h, lsquic_malo.c*
3327
3328Overview
3329========
3330
3331The malo allocator is a pool of objects of fixed size.  It tries to
3332allocate and deallocate objects as fast as possible.  To do so, it
3333does the following:
3334
33351. Allocations occur 4 KB at a time.
3336
33372. No division or multiplication operations are performed for
3338   appropriately sized objects.  (More on this below.)
3339
3340(In recent testing, malo was about 2.7 times faster than malloc for
334164-byte objects.)
3342
3343Besides speed, the allocator provides a convenient API:
3344To free (put) an object, one does not need a pointer to the malo
3345object.
3346
3347To gain all these advantages, there are trade-offs:
3348
33491. There are two memory penalties:
3350
3351   a. Per object overhead.  If an object is at least ROUNDUP_THRESH in
3352      size as the next power of two, the allocator uses that power of
3353      two value as the object size.  This is done to avoid using
3354      division and multiplication.  For example, a 104-byte object
3355      will have a 24-byte overhead.
3356
3357   b. Per page overhead.  Page links occupy some bytes in the
3358      page.  To keep things fast, at least one slot per page is
3359      always occupied, independent of object size.  Thus, for a
3360      1 KB object size, 25% of the page is used for the page
3361      header.
3362
33632. 4 KB pages are not freed until the malo allocator is destroyed.
3364   This is something to keep in mind.
3365
3366Internal Structure
3367==================
3368
3369The malo allocator allocates objects out of 4 KB pages.  Each page is
3370aligned on a 4-KB memory boundary.  This makes it possible for the
3371``lsquic_malo_put()`` function only to take on argument -- the object
3372to free -- and to find the malo allocator object itself.
3373
3374Each page begins with a header followed by a number of slots -- up to
3375the 4-KB limit.  Two lists of pages are maintained: all pages and free
3376pages.  A "free" page is a page with at least one free slot in it.
3377
3378The malo allocator (``struct malo``) stores itself in the first page,
3379occupying some slots.
3380
3381Receive History
3382***************
3383
3384*Files: lsquic_rechist.h, lsquic_rechist.c, test_rechist.c*
3385
3386.. _overview-5:
3387
3388Overview
3389========
3390
3391The reason for keeping the history of received packets is to generate
3392ACK frames. The Receive History module provides functionality to add
3393packet numbers, truncate history, and iterate over the received packet
3394number ranges.
3395
3396.. _data-structures-3:
3397
3398Data Structures
3399===============
3400
3401.. _overview-6:
3402
3403Overview
3404--------
3405
3406The receive history is a singly-linked list of packet number ranges,
3407ordered from high to low:
3408
3409.. image:: rechist-linked-list.png
3410
3411The ordering is maintained as an invariant with each addition to the
3412list and each truncation. This makes it trivial to iterate over the
3413ranges.
3414
3415To limit the amount of memory this data structure can allocate, the
3416maximum number of elements is specified when Receive History is
3417initialized. In the unlikely case that that number is reached, new
3418elements will push out the elements at the tail of the linked list.
3419
3420Memory Layout
3421-------------
3422
3423In memory, the linked list elements are stored in an array. Placing them
3424into contiguous memory achieves three goals:
3425
3426-  Receive history manipulation is fast because the elements are all
3427   close together.
3428
3429-  Memory usage is reduced because each element does not use pointers to
3430   other memory locations.
3431
3432-  Memory fragmentation is reduced.
3433
3434The array grows as necessary as the number of elements increases.
3435
3436The elements are allocated from and returned to the array with the aid
3437of an auxiliary data structure. An array of bitmasks is kept where each
3438bit corresponds to an array element. A set bit means that the element is
3439allocated; a cleared bit indicates that the corresponding element is
3440free.
3441
3442To take memory savings and speed further, the element array and the
3443array of bitmasks are allocated in a single span of memory.
3444
3445.. image:: rechist-memory-layout.png
3446
3447rechist_elem
3448------------
3449
3450``re_low`` and ``re_count`` define the packet range. To save memory, we
3451assume that the range will not contain more than 4 billion entries and
3452use a four-byte integer instead of a second ``lsquic_packno_t``.
3453
3454``re_next`` is the index of the next element. Again, we assume that there
3455will be no more than 4 billion elements. The NULL pointer is represented
3456by ``UINT_MAX``.
3457
3458This struct is just 16 bytes in size, which is a nice number.
3459
3460lsquic_rechist
3461--------------
3462
3463``rh_elems`` and ``rh_masks`` are the element array and the bitmask array,
3464respectively, as described above. The two use the same memory chunk.
3465
3466``rh_head`` is the index of the first element of the linked list.
3467
3468The iterator state, ``rh_iter``, is embedded into the main object itself,
3469as there is no expectation that more than one iterations will need to be
3470active at once.
3471
3472.. _notable-code-5:
3473
3474Notable Code
3475============
3476
3477Inserting Elements
3478------------------
3479
3480Elements may be inserted into the list when a new packet number is added
3481to history via ``lsquic_rechist_received()``. If the new packet number
3482requires a new range (e.g. it does not expand one of the existing
3483ranges), a new element is allocated and linked.
3484
3485There are four cases to consider:
3486
34871. Inserting the new element at the head of the list, with it becoming
3488   the new head. (This is the most common scenario.) The code that
3489   does it is labeled ``first_elem``.
3490
34912. Appending the new element to the list, with it becoming the new tail.
3492   This code is located right after the ``while`` loop.
3493
34943. Inserting the new element between two existing elements. This code is
3495   labeled ``insert_before``.
3496
34974. Like (3), but when the insertion is between the last and the
3498   next-to-last elements and the maximum number of elements has been
3499   reached. In this case, the last element's packet number
3500   information can simply be replaced. This code is labeled
3501   ``replace_last_el``.
3502
3503Growing the Array
3504-----------------
3505
3506When all allocated elements in ``rh_elems`` are in use
3507(``rh_n_used >= rh_n_alloced``), the element array needs to be expanded.
3508This is handled by the function ``rechist_grow``.
3509
3510Note how, after realloc, the bitmask array is moved to its new location
3511on the right side of the array.
3512
3513Handling Element Overflow
3514-------------------------
3515
3516When the array has grown to its maximum allowed size, allocating a new
3517element occurs via reusing the last element on the list, effectively
3518pushing it out. This happens in ``rechist_reuse_last_elem``.
3519
3520The first loop finds the last linked list element: that's the element
3521whose ``re_next`` is equal to ``UINT_MAX``.
3522
3523Then, the second loop finds the element that points to the last element.
3524This is the next-to-last (penultimate) element. This element's next
3525pointer will now be set to NULL, effectively dropping the last element,
3526which can now be reused.
3527
3528Iterating Over Ranges
3529---------------------
3530
3531Iteration is performed by the ``lsquic_rechist_first`` and
3532``lsquic_rechist_next`` pair of functions. The former resets the internal
3533iterator. Only one iteration at a time is possible.
3534
3535These functions have a specific signature: they and the pointer to the
3536receive history are passed to the ``pf_gen_ack_frame`` function, which
3537generates an ACK frame.
3538
3539Clone Functionality
3540-------------------
3541
3542The Receive History can be initialized from another instance of a
3543receive history. This is done by ``lsquic_rechist_copy_ranges``. This
3544functionality is used during connection promotion, when `Tiny Receive
3545History <#tiny-receive-history>`__ that is used by the `IETF mini
3546connection <#mini-ietf-connection>`__ is converted to Receive History.
3547
3548Tiny Receive History
3549********************
3550
3551*Files: lsquic_trechist.h, lsquic_trechist.c, test_trechist.c*
3552
3553.. _overview-7:
3554
3555Overview
3556========
3557
3558The Tiny Receive History is similar to `Receive
3559History <#receive-history>`__, but it is even more frugal with memory.
3560It is used in the `IETF mini connection <#mini-ietf-connection>`__ as a
3561more costly `alternative to using bare bitmasks <#imc-recvd-packnos>`__.
3562
3563Because it is so similar to Receive History, only differences are
3564covered in this section.
3565
3566Less Memory
3567===========
3568
3569No Trechist Type
3570----------------
3571
3572There is no ``lsquic_trechist``. The history is just a single 32-bit
3573bitmask and a pointer to the array of elements. The bitmask and the
3574pointer are passed to all ``lsquic_trechist_*`` functions.
3575
3576This gives the user of Tiny Receive History some flexibility and saves
3577memory.
3578
3579Element
3580-------
3581
3582The linked list element, ``trechist_elem``, is just 6 bytes in size. The
3583assumptions are:
3584
3585-  No packet number is larger than 2\ :sup:`32` - 1
3586
3587-  No packet range contains more than 255 packets
3588
3589-  Linked list is limited to 256 elements
3590
3591Head Does Not Move
3592==================
3593
3594Because of memory optimizations described above, the head element is
3595always at index 0. The NULL pointer ``te_next`` is indicated by the value
35960 (because nothing points to the first element).
3597
3598Array Does Not Grow
3599===================
3600
3601The size of the element array is limited by the 32-bit bitmask. As a
3602further optimization, the number of ranges is limited to 16 via the
3603``TRECHIST_MAX_RANGES`` macro.
3604
3605Insertion Range Check
3606=====================
3607
3608A packet range spanning more than 255 (UCHAR_MAX) packets cannot be
3609represented. This will cause a failure, as it is checked for in the
3610code.
3611
3612This many packets are unlikely to even be required to complete the
3613handshake. If this limit is hit, it is perhaps good to abort the mini
3614connection.
3615
3616Set64
3617*****
3618
3619*Files: lsquic_set.h, lsquic_set.h, test_set.c*
3620
3621This data structure (along with *Set32*, which is not currently used
3622anywhere in the code) is meant to keep track of a set of numbers that
3623are always increasing and are not expected to contain many gaps.
3624Stream IDs fit that description, and ``lsquic_set64`` is used in both
3625gQUIC and IETF QUIC full connections.
3626
3627Because one or two low bits in stream IDs contain stream type, the
3628stream IDs of different types are stored in different set structures;
3629otherwise, there would be gaps.  For example, see the
3630``conn_is_stream_closed()`` functions (there is one in each gQUIC and
3631IETF QUIC full connection code).
3632
3633Appendix A: List of Data Structures
3634***********************************
3635
3636The majority of data structures employed by lsquic are linked lists and,
3637to a lesser extent, arrays. This makes the code simple and fast
3638(assuming a smart memory layout).
3639
3640Nevertheless, a few places in the code called for more involved and, at
3641times, customized data structures. This appendix catalogues them.
3642
3643This is the list of non-trivial data structures implemented in lsquic:
3644
3645Ring Buffer Linked Lists
3646========================
3647
3648-  `Receive History <#receive-history>`__
3649
3650-  `Tiny Receive History <#tiny-receive-history>`__
3651
3652Hashes
3653======
3654
3655-  lsquic_hash
3656
3657-  hash_data_in
3658
3659Min-heaps
3660=========
3661
3662-  `Advisory Tick Time Queue <#advisory-tick-time-queue>`__
3663
3664-  lsquic_min_heap
3665
3666Bloom Filters
3667=============
3668
3669-  CID Purgatory
3670
3671
3672Appendix B: Obsolete and Defunct Code
3673*************************************
3674
3675Mem Used
3676========
3677
3678Engine History
3679==============
3680
3681QLOG
3682====
3683
3684
3685.. [1]
3686   This is due to the limitation of the QPACK library: the decoder can
3687   read input one byte at a time, but the encoder cannot write output
3688   one byte at a time. It could be made to do that, but the effort is
3689   not worth it.
3690
3691.. [2]
3692   Mini conn structs are allocated out of the `Malo
3693   Allocator <#malo-allocator>`__, which used to be limited to objects
3694   whose size is a power of two, so it was either fitting it into 128
3695   bytes or effectively doubling the mini conn size.
3696
3697.. [3]
3698   This two-step packet parsing mechanism is left over from the
3699   little-endian to big-endian switch in gQUIC several years ago:
3700   Before parsing out the packet number, it was necessary to know
3701   whether it is little- or big-endian.  It should be possible to
3702   do away with this, especially once gQUIC is gone.
3703
3704.. [4]
3705   This term was picked consciously: alarms *ring*, while timers do
3706   other things, such as "fire" and so on.
3707