internals.rst revision 293df8d6
1############ 2Library Guts 3############ 4 5.. highlight:: c 6 7Introduction 8************ 9 10 11lsquic inception dates back to the fall of 2016. Since that time, lsquic 12underwent several major changes. Some of those had to do with making the 13library more performant; others were needed to add important new 14functionality (for example, IETF QUIC and HTTP/3). Throughout this time, 15one of the main principles we embraced is that **performance trumps 16everything else**, including code readability and maintainability. This 17focus drove code design decisions again and again and it explains some 18of the hairiness that we will come across in this document. 19 20Code Version 21============ 22 23The code version under discussion is v2.29.6. 24 25Coding Style 26************ 27 28Spacing and Cuddling 29==================== 30 31lsquic follows the LiteSpeed spacing and cuddling conventions: 32 33- Two empty lines between function definitions 34 35- Four-space indentation 36 37- Ifs and elses are not cuddled 38 39Function Name Alignment 40======================= 41 42In function definitions, the name is always left-aligned, for example: 43 44:: 45 46 static void 47 check_flush_threshold (lsquic_stream_t *stream) 48 49 50Naming Conventions 51================== 52 53- Struct members usually have prefixes derived from the struct name. 54 For example members of ``struct qpack_dec_hdl`` begin with ``qdh_``, 55 members of ``struct cid_update_batch`` begin with ``cub_``, and so on. 56 This is done to reduce the need to memorize struct member names as 57 vim's autocomplete (Ctrl-P) functionality makes it easy to fill in 58 the needed identifier. 59 60- Non-static functions all begin with ``lsquic_``. 61 62- Functions usually begin with a module name (e.g. ``lsquic_engine_`` or 63 ``stream_``) which is then followed by a verb (e.g. 64 ``lsquic_engine_connect`` or ``stream_activate_hq_frame``). If a 65 function does not begin with a module name, it begins with a verb 66 (e.g. ``check_flush_threshold`` or ``maybe_remove_from_write_q``). 67 68- Underscores are used to separate words (as opposed to, for example, 69 theCamelCase). 70 71Typedefs 72======== 73 74Outside of user-facing API, structs, unions, and enums are not 75typedefed. On the other hand, some integral types are typedefed. 76 77List of Common Terms 78******************** 79 80- **gQUIC** Google QUIC. The original lsquic supported only Google 81 QUIC. gQUIC is going to become obsolete. (Hopefully soon). 82 83- **HQ** This stands for "HTTP-over-QUIC", the original name of HTTP/3. 84 The code predates the official renaming to HTTP/3 and thus there 85 are many types and names with some variation of ``HQ`` in them. 86 87- **iQUIC** This stands for IETF QUIC. To differentiate between gQUIC 88 and IETF QUIC, we use ``iquic`` in some names and types. 89 90- **Public Reset** In the IETF QUIC parlance, this is called the *stateless* 91 reset. Because gQUIC was first to be implemented, this name is still 92 used in the code, even when the IETF QUIC stateless reset is meant. 93 You will see names that contain strings like "prst" and "pubres". 94 95High-Level Structure 96******************** 97 98At a high level, the lsquic library can be used to instantiate an engine 99(or several engines). An engine manages connections; each connection has 100streams. Engine, connection, and stream objects are exposed to the user 101who interacts with them using the API (see :doc:`apiref`). All other data 102structures are internal and are hanging off, in one way or another, from 103the engine, connection, or stream objects. 104 105Engine 106****** 107 108*Files: lsquic_engine.c, lsquic_engine_public.h, lsquic.h* 109 110Data Structures 111=============== 112 113out_batch 114--------- 115 116:: 117 118 /* The batch of outgoing packets grows and shrinks dynamically */ 119 /* Batch sizes do not have to be powers of two */ 120 #define MAX_OUT_BATCH_SIZE 1024 121 #define MIN_OUT_BATCH_SIZE 4 122 #define INITIAL_OUT_BATCH_SIZE 32 123 124 struct out_batch 125 { 126 lsquic_conn_t *conns [MAX_OUT_BATCH_SIZE]; 127 struct lsquic_out_spec outs [MAX_OUT_BATCH_SIZE]; 128 unsigned pack_off[MAX_OUT_BATCH_SIZE]; 129 lsquic_packet_out_t *packets[MAX_OUT_BATCH_SIZE * 2]; 130 struct iovec iov [MAX_OUT_BATCH_SIZE * 2]; 131 }; 132 133The array of struct lsquic_out_specs -- outs above -- is what gets 134passed to the user callback ``ea_packets_out()``. ``conns`` array corresponds 135to the spec elements one to one. 136 137``pack_off`` records which packet in ``packets`` corresponds to which 138connection in ``conns``. Because of coalescing, an element in ``outs`` can 139correspond (logically) to more than one packet. (See how the batch is 140constructed in `Batching packets`_.) On the 141other hand, ``packets`` and ``iov`` arrays have one-to-one correspondence. 142 143There is one instance of this structure per engine: the whole thing is 144allocated as part of `struct lsquic_engine <#lsquic-engine>`__. 145 146 147cid_update_batch 148---------------- 149 150:: 151 152 struct cid_update_batch 153 { 154 lsquic_cids_update_f cub_update_cids; 155 void *cub_update_ctx; 156 unsigned cub_count; 157 lsquic_cid_t cub_cids[20]; 158 void *cub_peer_ctxs[20]; 159 }; 160 161This struct is used to batch CID updates. 162 163There are three user-defined CID liveness callbacks: ``ea_new_scids``, 164``ea_live_scids``, and ``ea_old_scids``. These functions all have the same 165signature, ``lsquic_cids_update_f``. When the batch reaches the count of 16620 (kept in ``cub_count``), the callback is called. 167 168The new SCIDs batch is kept in `struct 169lsquic_engine <#lsquic-engine>`__. Other batches are allocated on the 170stack in different functions as necessary. 171 17220 is an arbitrary number. 173 174lsquic_engine_public 175-------------------- 176 177This struct, defined in lsquic_engine_public.h, is the "public" 178interface to the engine. ("Public" here means accessible by other 179modules inside lsquic, not that it's a public interface like the 180:doc:`apiref`.) Because there are many things in the engine object that 181are accessed by other modules, this struct is used to expose those 182(``public``) parts of the engine. 183 184``lsquic_engine_struct`` is the first member of 185`lsquic_engine <#lsquic-engine>`__. The functions declared in 186lsquic_engine_public.h take a pointer to lsquic_engine_public as the 187first argument, which is then case to lsquic_engine. 188 189This is somewhat ugly, but it's not too bad, as long as one remembers 190that the two pointers are interchangeable. 191 192lsquic_engine 193------------- 194 195This is the central data structure. The engine instance is the root of 196all other data structures. It contains: 197 198- Pointers to connections in several lists and hashes (see `Connection Management <#connection-management>`__) 199 200- Memory manager 201 202- Engine settings 203 204- Token generator 205 206- CID Purgatory 207 208- Server certificate cache 209 210- Transport parameter cache 211 212- Packet request queue 213 214- `Outgoing packet batch <#out-batch>`__ 215 216- And several other things 217 218Some of the members above are stored in the ``pub`` member of type 219`lsquic_engine_public <#lsquic-engine-public>`__. These are accessed 220directly from other parts of lsquic. 221 222The engine is instantiated via ``lsquic_engine_new()`` and destroyed via 223``lsquic_engine_destroy()`` 224 225Connection Management 226===================== 227 228Lifetime 229-------- 230 231There are several `connection types`_. All types of 232connections begin their life inside the engine module, where their 233constructors are called. They all also end their life here as well: this 234is where the destructors are called. 235 236The connection constructors are all different function calls: 237 238- lsquic_ietf_full_conn_client_new 239 240- lsquic_gquic_full_conn_client_new 241 242- lsquic_ietf_full_conn_server_new 243 244- lsquic_gquic_full_conn_server_new 245 246- lsquic_mini_conn_ietf_new 247 248- lsquic_mini_conn_new 249 250- lsquic_prq_new_req 251 252- lsquic_prq_new_req_ext 253 254(See `Evanescent Connection`_ for information about the last two.) 255 256After a connection is instantiated, all further interactions with it, 257including destruction, are done via the `Common Connection Interface`_. 258 259Refcounting Model 260----------------- 261 262Each connection is referenced by at least one of the following data 263structures: 264 2651. CID-to-connection hash. This hash is used to find connections in 266 order to dispatch an incoming packet. Connections can be hashed by 267 CIDs or by address. In the former case, each connection has one or 268 more mappings in the hash table. IETF QUIC connections have up to 269 eight (in our implementation) source CIDs and each of those would 270 have a mapping. In client mode, depending on QUIC versions and 271 options selected, it is may be necessary to hash connections by 272 address, in which case incoming packets are delivered to 273 connections based on the address. 274 2752. Outgoing queue. This queue holds connections that have packets to 276 send. 277 2783. `Tickable Queue`_. This queue holds connections 279 that `can be ticked now <#tickability>`__. 280 2814. `Advisory Tick Time Queue`_. 282 2835. Closing connections queue. This is a transient queue -- it only 284 exists for the duration of 285 `process_connections() <#processing-connections>`__ function call. 286 2876. Ticked connections queue. Another transient queue, similar to the 288 above. 289 290The idea is to destroy the connection when it is no longer referenced. 291For example, a connection tick may return TICK_SEND|TICK_CLOSE. In that 292case, the connection is referenced from two places: (2) and (5). After 293its packets are sent, it is only referenced in (5), and at the end of 294the function call, when it is removed from (5), reference count goes to 295zero and the connection is destroyed. (See function ``destroy_conn``.) If 296not all packets can be sent, at the end of the function call, the 297connection is referenced by (2) and will only be removed once all 298outgoing packets have been sent. 299 300.. image:: lsquic-engine-conns.png 301 302In the diagram above, you can see that the CID-to-connection hash has 303several links to the same connection. This is because an IETF QUIC 304connection has more than one Source Connection IDs (SCIDs), any of which 305can be included by the peer into the packet. See ``insert_conn_into_hash`` 306for more details. 307 308References from each of these data structures are tracked inside the 309connection object by bit flags: 310 311:: 312 313 #define CONN_REF_FLAGS (LSCONN_HASHED \ 314 |LSCONN_HAS_OUTGOING \ 315 |LSCONN_TICKABLE \ 316 |LSCONN_TICKED \ 317 |LSCONN_CLOSING \ 318 |LSCONN_ATTQ) 319 320Functions ``engine_incref_conn`` and ``engine_decref_conn`` manage setting 321and unsetting of these flags. 322 323 324Notable Code 325============ 326 327Handling incoming packets 328------------------------- 329 330Incoming UDP datagrams are handed off to the lsquic library using the 331function ``lsquic_engine_packet_in``. Depending on the engine mode -- 332client or server -- the appropriate `packet 333parsing <#parsing-packets>`__ function is selected. 334 335Because a UDP datagram can contain more than one QUIC packet, the 336parsing is done in a loop. If the first part of packet parsing is 337successful, the internal ``process_packet_in`` function is called. 338 339There, most complexity is contained in ``find_or_create_conn``, which gets 340called for the server side. Here, parsing of the packet is finished, now 341via the version-specific call to ``pf_parse_packet_in_finish``. If 342connection is not found, it may need to be created. Before that, the 343following steps are performed: 344 345- Check that engine is not in the cooldown mode 346 347- Check that the maximum number of mini connections is not exceeded 348 349- Check that the (D)CID specified in the packet is not in the `CID Purgatory`_ 350 351- Check that the packet can be used to create a mini conn: it contains 352 version information and the version is supported 353 354- Depending on QUIC version, perform token verification, if necessary 355 356Only then does the mini connection constructor is called and the 357connection is inserted into appropriate structures. 358 359Processing connections 360---------------------- 361 362Connections are processed in the internal function 363``process_connections``. There is the main connection processing loop and 364logic. 365 366All connections that the iterator passed to this function returns are 367processed in the first while loop. The ``ci_tick()`` call is what causes 368the underlying connection to do all it needs to (most importantly, 369dispatch user events and generate outgoing packets). The return value 370dictates what lists -- global and local to the function -- the 371connection will be placed upon. 372 373Note that mini connection promotion happens inside this loop. Newly 374created full connections are processed inside the same while loop. For a 375short time, a mini and a full connection object exist that are 376associated with the same logical connection. 377 378After all connections are ticked, outgoing packets, if there are any, 379`are sent out <#batching-packets>`__. 380 381Then, connections that were closed by the first while loop above are 382finally closed. 383 384Connections that were ticked (and not closed) are either: 385 386- Put back onto the ``tickable`` queue; 387 388- Added to the `Advisory Tick Time Queue`_; or 389 390- Left unqueued. This can happen when both idle and ping timer are 391 turned off. (This should not happen for the connections that we 392 expect to process, though.) 393 394And lastly, CID liveness updates are reported to the user via the 395optional SCIDs callbacks: ``ea_new_scids`` etc. 396 397Tickable Queue Cycle 398-------------------- 399 400When a connection is ticked, it is removed from the `Tickable 401Queue <#tickable-queue>`__ and placed onto the transient Ticked Queue. 402After outgoing packets are sent and some connections are closed, the 403Ticked Queue is examined: the engine queries each remaining connection 404again whether it's tickable. If it is, back onto the Tickable Queue it 405goes. This should not happen often, however. It may occur when RTT is 406low and there are many connections to process. In that case, once all 407connections have been processed, the pacer now allows to send another 408packet because some time has passed. 409 410Batching packets 411---------------- 412 413Packet-sending entry point is the function ``send_packets_out``. The main 414idea here is as follows: 415 416Iterate over connections that have packets to send (those are on the 417Outgoing queue in the engine). For each connection, ask it for the next 418outgoing packet, encrypt it, and place it into the batch. When the batch 419is full, `send the batch <#sending-a-batch>`__. 420 421The outgoing packets from all connections are interleaved. For example, 422if connections A, B, and C are on the Outgoing queue, the batch will 423contain packets A1, B1, C1, A2, B2, C2, A3, B3, C3, ��� and so on. This is 424done to ensure fairness. When a connection runs out of packets to send, 425it returns NULL and is removed from the iterator. 426 427The idea is simple, but the devil is in the details. The code may be 428difficult to read. There are several things going on: 429 430Conns Out Iterator 431^^^^^^^^^^^^^^^^^^ 432 433This iterator, ``conns_out_iter``, sends packets from connections on the 434Outgoing queue and packets on the Packet Request queue. (The latter 435masquerade as `Evanescent Connections <#evanescent-connection>`__ so that they 436are simple to use.) First, the Outgoing queue (which is a min-heap) is 437drained. Then, packets from the Packet Request queue are sent, if there 438are any. Then, remaining connections from the first pass are returned in 439the round-robin fashion. 440 441After sending is completed, the connections that still have outgoing 442packets to send are placed back onto the Outgoing queue. 443 444 445Packet Coalescing 446^^^^^^^^^^^^^^^^^ 447 448Some IETF QUIC packets can be coalesced. This reduces the number of UDP 449datagrams that need to be sent during the handshake. To support this, if 450a packet matches some parameters, the same connection is queried for 451another packet, which, if it returns, is added to the current batch 452slot's iov. 453 454:: 455 456 if ((conn->cn_flags & LSCONN_IETF) 457 && ((1 << packet_out->po_header_type) 458 & ((1 << HETY_INITIAL)|(1 << HETY_HANDSHAKE)|(1 << HETY_0RTT))) 459 && (engine->flags & ENG_COALESCE) 460 && iov < batch->iov + sizeof(batch->iov) / sizeof(batch->iov[0])) 461 { 462 const struct to_coal to_coal = { 463 .prev_packet = packet_out, 464 .prev_sz_sum = iov_size(packet_iov, iov), 465 }; 466 packet_out = conn->cn_if->ci_next_packet_to_send(conn, &to_coal); 467 if (packet_out) 468 goto next_coa; 469 } 470 batch->outs [n].iovlen = iov - packet_iov; 471 472*With some debug code removed for simplicity* 473 474Also see the description of the batch in `out_batch`_. 475 476Note that packet coalescing is only done during the handshake of an IETF 477QUIC connection. Non-handshake and gQUIC packets cannot be coalesced. 478 479Sending and Refilling the Batch 480^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 481 482When the batch is sent inside the while loop, and the whole batch was 483sent successfully, the batch pointers are reset, the batch potentially 484grows larger, and the while loop continues. 485 486Batch Resizing 487^^^^^^^^^^^^^^ 488 489When all datagrams in the batch are sent successfully, the batch may 490grow -- up to the hardcoded maximum value of ``MAX_OUT_BATCH_SIZE``. When 491not all datagrams are sent, the batch shrinks. The batch size survives 492the call into the library: when packets are sent again, the same batch 493size is used to begin the sending. 494 495Deadline Checking 496^^^^^^^^^^^^^^^^^ 497 498This is a rather old safety check dating back to the summer of 2017, 499when we first shipped QUIC support. The way we send packets has changed 500since then -- there is high possibility that this code can be removed 501with no ill effect. 502 503Sending a batch 504--------------- 505 506When the batch is filled, it is handed off to the function ``send_batch``, 507which calls the user-supplied callback to send packets out. The 508high-level logic is as follows: 509 510- Update each packet's ``sent`` time 511 512- Call the "send packets out" callback 513 514- For packets that were sent successfully, call ``ci_packet_sent`` 515 516- For packets that were not sent, call ``ci_packet_not_sent``. This is 517 important: all packets returned by ``ci_next_packet_to_send`` must 518 be returned to the connection via either these two calls above or 519 via ``ci_packet_too_large`` (see below). 520 521- Return the number of packets sent 522 523Because of support for coalescing, we have to map from outgoing spec to 524packets via ``batch->pack_off``. This is done in several places in this 525function. 526 527To handle the case when a PMTU probe is too large (stuff happens!), the 528code checks for EMSGSIZE and returns the packet back to the connection 529via ``ci_packet_too_large``. Because this error is of our own making, this 530does not count as inability to send. The too-large packet is skipped and 531sending of the datagrams in the batch continues. 532 533Growing min-heaps 534----------------- 535 536The Outgoing and Tickable connection queues are actually min-heaps. The 537number of elements in these min-heaps never exceeds the number of 538connections. As optimization, allocation of the underlying arrays is 539done not in the min-heap module itself but in the engine module in the 540function ``maybe_grow_conn_heaps``. The engine knows how many connections 541there are and it grows the arrays as necessary. 542 543As an additional optimization, the two arrays use a single memory region 544which is allocated once. 545 546The min-heap arrays are never shrunk. 547 548Connection 549********** 550 551*Files: lsquic_conn.h, lsquic_conn.c -- others are covered in dedicated 552chapters* 553 554The connection represents the QUIC connection. Connections are `managed 555by the engine <#connection-management>`__. A connection, in turn, 556manages `streams <#stream>`__. 557 558Connection Types 559================ 560 561lsquic supports two different QUIC protocols: Google QUIC and IETF QUIC. 562Each of these has a separate implementation, which includes connection 563logic, parsing/generating mechanism, and encryption. 564 565Each of the QUIC connection types on the server begin their life as a 566``mini`` connection. This connection type is used while handshake is 567proceeding. Once the handshake has completed, the mini connection is 568``promoted`` to a ``full`` connection. (See `Mini vs Full 569Connection <#mini-vs-full-connections>`__ for more.) 570 571In addition to the above, an "evanescent" connection type is used to 572manage replies to incoming packets that do not result in connection 573creation. These include version negotiation, stateless retry, and 574stateless reset packets. 575 576Each of the five connection types above are covered in their own 577dedicated chapters elsewhere in this document: 578 579- `Mini gQUIC Connection <#mini-gquic-connection>`__ 580 581- `Full gQUIC Connection <#full-gquic-connection>`__ 582 583- `Mini IETF QUIC Connection <#mini-ietf-connection>`__ 584 585- `Full IETF QUIC Connection <#full-ietf-connection>`__ 586 587- `Evanescent Connection <#evanescent-connection>`__ 588 589lsquic_conn 590=========== 591 592All connection types expose the same connection interface via a pointer 593to ``struct lsquic_conn``. (This is the same type pointer to which is 594exposed to the user, but the user can only treat the connection as an 595opaque pointer.) 596 597This structure contains the following elements: 598 599Pointers to Crypto Implementation 600--------------------------------- 601 602The crypto session pointer, ``cn_enc_session``, points to a type-specific 603(gQUIC or iQUIC) instance of the encryption session. This session 604survives `connection promotion <#connection-promotion>`__. 605 606The two types of crypto session have a set of common functionality; it 607is pointed to by ``cn_esf_c`` (where ``c`` stands for ``common``). Each of 608them also has its own, type-specific functionality, which is pointed to 609by ``cn_esf.g`` and ``cn_esf.i`` 610 611Pointer to Common Connection Interface 612-------------------------------------- 613 614``cn_if`` points to the set of functions that implement the Common 615Connection Interface (`see below <#common-connection-interface>`__). 616 617Pointer to Parsing Interface 618---------------------------- 619 620The parsing interface is version-specific. It is pointed to by ``cn_pf``. 621 622Various list and heap connectors 623-------------------------------- 624 625A connection may be pointed to by one or several queues and heaps (see 626"\ `Connection Management <#connection-management>`__\ "). There are 627several struct members that make it possible: \*TAILQ_ENTRYs, 628``cn_attq_elem``, and ``cn_cert_susp_head``. 629 630Version 631------- 632 633``cn_version`` is used to make some decisions in several parts of the 634code. 635 636Flags 637----- 638 639The flags in ``cn_flags`` specify which lists the connection is on and 640some other properties of the connection which need to be accessible by 641other modules. 642 643Stats 644----- 645 646``cn_last_sent`` and ``cn_last_ticked`` are used to determine the 647connection's place on the outgoing queue (see `Batching 648Packets <#batching-packets>`__) and on the `Advisory Tick Time 649Queue <#alarm-set>`__. 650 651List of SCIDs 652------------- 653 654IETF QUIC connections have one or more SCIDs (Source Connection IDs), 655any one of which can be used by the peer as the DCID (Destination CID) 656in the packets it sends. Each of the SCIDs is used to hash the 657connection so it can be found. ``cn_cces`` points to an array of size 658``cn_n_cces`` which is allocated internally inside each connection type. 659 660Google QUIC connections use only one CID (same for source and 661destination). In order not to modify old code, the macro ``cn_cid`` is 662used. 663 664Common Connection Interface 665=========================== 666 667The struct ``conn_iface`` defines the common connection interface. All 668connection types implement all or some of these functions. 669 670Some of these functions are used by the engine; others by other modules 671(for example, to abort a connection); yet others are for use by the 672user, e.g. ``lsquic_conn_close`` and others in lsquic.h. In that case, 673these calls are wrapped in lsquic_conn.c. 674 675Tickability 676=========== 677 678A connection is processed when it is tickable. More precisely, the 679connection is placed onto the `Tickable Queue <#tickable-queue>`__, 680which is iterated over when `connections are 681processed <#processing-connections>`__. A connection reports its own 682tickability via the ``ci_is_tickable`` method. 683 684In general, a connection is tickable if it has productive user callbacks 685to dispatch (that is, user wants to read and there is data to read or 686user wants to write and writing is possible), if there are packets to 687send or generate, or if its advisory tick time is in the past. (The 688latter is handled in ``lsquic_engine_process_conns()`` when expired 689connections from the `Advisory Tick Time Queue`_ are added 690to the Tickable Queue.) 691 692Stream 693****** 694 695*Files: lsquic_stream.h, lsquic_stream.c* 696 697Overview 698======== 699 700The lsquic stream is the conduit for data. This object is accessible by 701the user via any of the ``lsquic_stream_*`` functions declared in 702lsquic.h. The stream is bidirectional; in our user code, it represents 703the HTTP request and response. The client writes its request to the 704stream and the server reads the request in its corresponding instance of 705the stream. The server sends its response using the same stream, which 706the client reads from the stream. 707 708Besides streams exposed to the application, connections use streams 709internally: 710 711- gQUIC has the HANDSHAKE and HEADERS streams 712 713- IETF QUIC has up to four HANDSHAKE streams 714 715- HTTP/3 has at least three unidirectional streams: 716 717 - Settings stream 718 719 - QPACK encoder stream 720 721 - QPACK decoder stream 722 723In addition, HTTP/3 push promises use unidirectional streams. In the 724code, we make a unidirectional stream simply by closing one end in the 725constructor. 726 727All of the use cases above are handled by the single module, 728lsquic_stream. The differences in behavior -- gQUIC vs IETF QUIC, HTTP 729vs non-HTTP -- are handled either by explicit conditionals or via 730function pointers. 731 732The streams hang off full connections via stream ID-to-stream hashes and 733in various queues. This is similar to the way the connections hang off 734the engine. 735 736Streams are only used in the full connections; mini connections use 737their own, minimalistic, code to handle streams. 738 739.. _data-structures-1: 740 741Data Structures 742=============== 743 744stream_hq_frame 745--------------- 746 747This structure is used to keep information about an HTTP/3 frame that is 748being, or is about to be, written. In our implementation, frame headers 749can be two or three bytes long: one byte is HTTP/3 frame type and the 750frame length is encoded in 1 or 2 bytes, giving us the maximum payload 751size of 2\ :sup:`14` - 1 bytes. You will find literal ``2`` or ``3`` values 752in code that deals with writing HQ frames. 753 754If the HQ frame's size is known in advance (SHF_FIXED_SIZE) -- which is 755the case for HEADERS and PUSH_PROMISE frames -- then the HQ header 756contents are written immediately. Otherwise, ``shf_frame_ptr`` points to 757the bytes in the packet where the HQ header was written, to be filled in 758later. 759 760See `Writing HTTP/3 Streams`_ for more information. 761 762hq_filter 763--------- 764 765This structure is used to read HTTP/3 streams. A single instance of it 766is stored in the stream in ``sm_hq_filter``. The framing is removed 767transparently (see `Reading HTTP/3 Streams`_). 768 769Frame type and length are read into ``hqfi_vint2_state``. Due to greasing, 770the reader must be able to support arbitrary frame types and so the code 771is pretty generic: varints of any size are supported. 772 773``hqfi_flags`` and ``hqfi_state`` contain information needed to resume 774parsing the frame header, as only partial data may have arrived. 775 776``hqfi_hist_buf`` and ``hqfi_hist_idx`` are used to record the last few 777incoming headers. This information is used to check for validity, as 778some sequences of HTTP/3 frames are invalid. 779 780stream_filter_if 781---------------- 782 783This struct is used to specify functionality required to strip arbitrary 784framing when reading from the stream. At the moment (and for the 785foreseeable future) only one mechanism is used: that to strip the HTTP/3 786framing. At the time the code was written, however, the idea was to 787future-proof it in case we needed to support more than one framing format 788at a time. 789 790lsquic_stream 791------------- 792 793This struct is the stream object. It contains many members that deal 794with 795 796- Reading data 797 798- Writing data 799 800- Maintaining stream list memberships 801 802- Enforcing flow control 803 804- Dispatching read and write events 805 806- Calling various user callbacks 807 808- Interacting with HEADERS streams 809 810The stream has an ID (``id``). It is used to hash the stream. 811 812A stream can be on one or more lists: see ``next_send_stream``, 813``next_read_stream``, and so on. 814 815Incoming data is stored in ``data_in``. Outgoing data is packetized 816immediately or buffered in ``sm_buf``. 817 818HTTP/3 frames that are being actively written are on the ``sm_hq_frames`` 819list. 820 821A note on naming: newer members of the stream begin with ``sm_`` for 822simplicity. Originally, the structure members lacked a prefix. 823 824progress 825-------- 826 827This structure is used to determine whether the user callback has made 828any progress during an ``on_write`` or ``on_read`` event loop. If progress 829is not made for a number of calls, the callback is interrupted, breaking 830out of a suspected infinite loop. (See ``es_progress_check`` setting.) 831 832 833frame_gen_ctx 834------------- 835 836This structure holds function pointers to get user data and write it to 837packets. ``fgc_size``, ``fgc_fin``, and ``fgc_read`` are set based on framing 838requirements. This is a nice abstraction that gets passed to several 839packetization functions and allows them not to care about how or whether 840framing is performed. 841 842pwritev_ctx 843----------- 844 845Used to aid ``lsquic_stream_pwritev``. ``hq_arr`` is used to roll back 846HTTP/3 framing if necessary. (The rollback is the most complicated part 847of the ``pwritev`` functionality). 848 849Event Dispatch 850============== 851 852The "on stream read" and "on stream write" callbacks are part of the 853lsquic API. These callbacks are called when the user has registered 854interest in reading from or writing to the stream and reading or writing 855is possible. 856 857Calling ``lsquic_stream_wantwrite`` and ``lsquic_stream_wantread`` places 858the stream on the corresponding "want to write" and "want to read" list. 859These lists are processed by a connection when it's ticked. For each 860stream on the list, the internal function 861``lsquic_stream_dispatch_read_events`` or 862``lsquic_stream_dispatch_write_events``, whichever may be the case. 863 864Dispatching read events is simple. When ``es_rw_once`` is set, the "on 865stream read" callback is called once -- if the stream is readable. 866Otherwise, the callback is called in a loop as long as: 867 868- The stream is readable; 869 870- The user wants to read from it; and 871 872- Progress is being made 873 874Dispatching write events is more complicated due to the following 875factors: 876 877- In addition to calling the "on stream write" callback, the flushing 878 mechanism also works by using the "want to write" list. 879 880- When writing occurs, the stream's position on the list may change 881 882STREAM frames in 883================ 884 885The data gets in from the transport into the stream via 886``lsquic_stream_frame_in`` function. The connection calls this function 887after parsing a STREAM frame. 888 889The data from the STREAM frame is stored in one of the two "data in" 890modules: ``di_nocopy`` and ``di_hash``. The two are abstracted out behind 891``stream->data_in``. 892 893The "data in" module is used to store incoming stream data. The data is 894read from this module using the ``di_get_frame`` function. See the next 895section. 896 897Reading Data 898============ 899 900There are three user-facing stream-reading functions; two of them are 901just wrappers around ``"lsquic_stream_readf``. This function performs some 902checks (we will cover HTTP mode separately) and calls 903``lsquic_stream_readf``, which also performs some checks and calls 904``read_data_frames``. This is the only function in the stream module where 905data is actually read from the "data in" module. 906 907Writing Data 908============ 909 910There are four user-facing functions to write to stream, and all of them 911are wrappers around ``stream_write``. (``lsquic_stream_pwritev`` is a bit 912more involved than the other three, but it's pretty well-commented -- 913and the complexity is in the rollback, not writing itself.) 914 915Small writes get buffered. If the write size plus whatever is buffered 916already exceeds the threshold -- which is the size of the largest STREAM 917frame that could be fit into a single outgoing packet -- the data is 918packetized instead by calling ``stream_write_to_packets``. See the next 919section. 920 921Packetization 922============= 923 924``stream_write_to_packets`` is the only function through which user data 925makes it into outgoing packets. There are three ways to write STREAM 926frames: 927 9281. ``stream_write_to_packet_hsk`` 929 9302. ``stream_write_to_packet_std`` 931 9323. ``stream_write_to_packet_crypto`` 933 934The particular function is selected based on connection and stream type 935when the stream is first created. 936 937stream_write_to_packets 938----------------------- 939 940Depending on the need to frame data, a reader is selected. The job of 941the reader is to copy user data into the outgoing STREAM frame. In 942HTTP/3 mode, HTTP/3 framing is added transparently -- see `Writing 943HTTP/3 Streams`_ for more information. 944 945The while loop is entered if there is user data to be copied or if the 946end of the stream has been reached and FIN needs to be written. Note the 947threshold check: when writing data from a user call, the threshold is 948set and frames smaller than the full packet are not generated. This is 949to allow for usage like "write 8KB", "write 8KB", "write 8KB" not to 950produce jagged STREAM frames. This way, we utilize the bandwidth most 951effectively. When flushing data, the threshold is not set, so even a 9521-byte data gets packetized. 953 954The call ``stream->sm_write_to_packet`` writes data to a single packet. 955This packet is allocated by the `Send Controller <#send-controller>`__. 956(Depending on when writing is performed, the returned packet may be 957placed onto the scheduled queue immediately or it may be a "buffered" 958packet. The stream code is oblivious to that.) If the send controller 959does not give us a packet, STOP is returned and the while loop exits. An 960ERROR should never happen -- this indicates a bug or maybe failure to 961allocate memory -- and so the connection is aborted in that case. If 962everything is OK, the while loop goes on. 963 964The ``seen_ok`` check is used to place the connection on the tickable list 965on the first successfully packetized STREAM frame. This is so that if 966the packet is buffered (meaning that the writing is occurring outside of 967the callback mechanism), the connection will be processed (ticked) and 968the packets will be scheduled and sent out. 969 970After the while loop, we conditionally close an outstanding HTTP/3 971frame, save any leftover data, schedule STREAM_BLOCKED or BLOCKED frames 972to be sent out if needed, and return the number of user-provided bytes 973that were copied into the outgoing packets and into the internal stream 974buffer (leftovers). 975 976Write a single STREAM frame 977--------------------------- 978 979We will examine ``stream_write_to_packet_std`` as it is the most 980complicated of these three functions. 981 982First, we flush the headers stream if necessary -- this is because we 983want the HTTP (gQUIC or HTTP/3) headers to be sent before the payload. 984 985Then, the number of bytes needed to generate a STREAM frame is 986calculated. This value depends on the QUIC version, whether we need to 987generate HTTP/3 framing, and whether the data to write exists (or we 988just need to write an empty STREAM frame with the FIN bit set). 989 990(Note that the framing check is made to overshoot the estimate for 991simplicity. For one, we might not need 3 bytes for the DATA frame, but 992only 2. Secondly, there may already be an open HTTP/3 frame in one of 993the previous packets and so we don't need to write it at all.) 994 995Then, a packet is allocated and ``write_stream_frame`` is called. It is in 996this function that we finally make the call to generate the STREAM frame 997and to copy the data from the user. The function ``pf_gen_stream_frame`` 998returns the number of bytes actually written to the packet: this 999includes both the STREAM frame header and the payload (which may also 1000include HTTP/3 frame). 1001 1002The fact that this frame type has been written is added to 1003``po_frame_types`` and the STREAM frame location, type, and size are 1004recorded. This information is necessary to be able to elide the frame 1005from the packet in case the stream is reset. 1006 1007``PO_STREAM_END`` is set if the STREAM frame extends to the end of the 1008packet. This is done to prevent this packet from being used again to 1009append frames to it (after, for example, some preceding frames are 1010elided from it). This is because both in gQUIC and IETF QUIC the STREAM 1011frame header is likely to omit the ``length`` field and instead use the 1012"extends to the end of the packet" field. If frames are shifted, the 1013packet cannot be appended to because it will lead to data loss and 1014corruption. 1015 1016Writing HTTP/3 Streams 1017====================== 1018 1019HTTP/3 streams use framing. In most cases, a single HEADERS frame is 1020followed by zero or more DATA frames. The user code does not know this: 1021both gQUIC and IETF QUIC streams appear to behave in exactly the same 1022manner. This makes lsquic simple to use. 1023 1024The drawback is internal complexity. To make the code both easy to use 1025and performant, HTTP/3 framing is generated on-the-fly, as data is being 1026written to packets (as opposed to being buffered and then written). (OK, 1027*mostly* on-the-fly: the HEADERS frame payload is generated and then 1028copied.) 1029 1030On the high level, the way it works is as follows: 1031 1032- When a write call is made, a variable-size (that is, unknown size; 1033 it's called variable-size because the size of the DATA header may 1034 be 2 or 3 bytes; it's not the best name in the world) frame is 1035 opened/activated. 1036 1037- When data is written to stream, the DATA header placeholder bytes are 1038 written to the stream transparently and a pointer is saved to this 1039 location. 1040 1041- The active frame header is closed when 1042 1043 - It reaches its maximum size; or 1044 1045 - The data we are writing runs out. 1046 1047- When the header is closed, the number of bytes that follows is now 1048 written to the location we saved when the header was activated. 1049 1050This mechanism allows us to create a DATA frame that spans several 1051packets before we know how many packets there will be in a single write. 1052(As outgoing packet allocation is governed by the `Send Controller`_.) 1053This is done to minimize the goodput overhead incurred by the DATA frame header. 1054 1055.. image:: stream-http3-framing.png 1056 1057There are a couple of things that do not fit into this model: 1058 10591. The HEADERS frame is fixed size [1]_. It is generated separately 1060 (written by QPACK encoder into a buffer on the stack) and later 1061 copied into the stream. (See the ``send_headers_ietf`` function.) It 1062 can happen that the whole buffer cannot be written. In that case, 1063 a rather complicated dance of buffering the unwritten HEADERS 1064 frame bytes is performed. Here, the "on stream write" callback is 1065 replaced with an internal callback (see the ``select_on_write`` 1066 function) and user interaction is prohibited until the whole of 1067 the HEADERS frame is written to the stream. 1068 10692. Push promise streams are even weirder. In addition to the HEADERS 1070 handling above, the push promise stream must begin with a 1071 variable-integer Push ID. To make this fit into the framed stream 1072 model, the code makes up the concept of a "phantom" HTTP/3 frame. 1073 This type of frame's header is not written. This allows us to 1074 treat the Push ID as the payload of a regular HTTP/3 frame. 1075 1076The framing code has had its share of bugs. Because of that, there is a 1077dedicated unit test program just for the framing code, 1078*tests/test_h3_framing.c*. In addition to manually-written tests, the 1079program has a "fuzzer driver" mode, in which the American Fuzzy Lop 1080fuzzer drives the testing of the HTTP/3 framing mechanism. The advantage 1081of this approach is that AFL tries to explore all the code paths. 1082 1083 1084Debates regarding DATA framing raged in 2018 on the QUIC mailing list. 1085Some of the discussion is quite interesting: for example, the debate about 1086"optimizing" DATA frames and `calculations of the header 1087cost <https://lists.w3.org/Archives/Public/ietf-http-wg/2018OctDec/0236.html>`__. 1088 1089Reading HTTP/3 Streams 1090====================== 1091 1092HTTP/3 frame headers are stripped out transparently -- they are never 1093seen by the user. From the user's perspective, the lsquic stream 1094represents the payload of HTTP message; a dedicated call is made first 1095to get at the HTTP headers. 1096 1097To accomplish this, the stream implements a generic deframing mechanism. 1098The `stream_filter_if`_ interface allows one to 1099specify functions to a) check whether the stream is readable, b) strip 1100header bytes from a data frame fetched from "data in" module; and c) 1101update byte count in the filter once bytes have been read: 1102 1103hq_filter_readable 1104------------------ 1105 1106This function tests for availability of non-frame-header data, stripping 1107frame headers from the stream transparently. Note how it calls 1108``read_data_frames`` with its own callback, ``hq_read``. It is inside this 1109callback that the HEADERS frame is fed to the QPACK decoder. 1110 1111hq_filter_df 1112------------ 1113 1114This function's job is to strip framing from data frames returned by the 1115"data in" module inside the ``read_data_frames`` function. It, too, calls 1116the ``hq_read`` function. This allows the two functions that read from 1117stream (this one) and the readability-checking function 1118(``hq_filter_readable``) to share the same state. This is crucial: 1119Otherwise this approach is not likely to work well. 1120 1121hq_decr_left 1122------------ 1123 1124This function is needed to update the filter state. Once all payload 1125bytes from the frame have been consumed, the filter is readied to strip 1126the next frame header again. 1127 1128.. _notable-code-1: 1129 1130Notable Code 1131============ 1132 1133frame_hq_gen_read 1134----------------- 1135 1136This is where HTTP/3 frame headers are generated. Note the use of 1137``shf_frame_ptr`` to record the memory location to which the correct frame 1138size will be written by a different function. 1139 1140Parsing 1141******* 1142 1143*Files: lsquic_parse.h, lsquic_parse_ietf_v1.c, lsquic_parse_Q050.c, lsquic_parse_Q046.c, 1144lsquic_parse_gquic_be.c, lsquic_parse_common.c, and others* 1145 1146Overview 1147======== 1148 1149The two types of QUIC -- gQUIC and IETF QUIC -- have different packet and 1150frame formats. In addition, different gQUIC version are different among 1151themselves. Functions to parse and generate packets and frames of each 1152type are abstracted out behind the rather large ``struct parse_funcs``. 1153When a connection is created, its ``cn_pf`` member is set to point to 1154the correct set of function pointers via the ``select_pf_by_ver()`` macro. 1155 1156Parsing Packets 1157=============== 1158 1159Before settling on a particular set of parsing function for a connection, 1160the server needs to determine the connection's version. It does so using 1161the function ``lsquic_parse_packet_in_server_begin()``. 1162 1163This function figures out whether the packet has a long or a short header, 1164and which QUIC version it is. Because the server deals with fewer packet 1165types than the client (no version negotiation or stateless retry packets), 1166it can determine the necessary parsing function from the first byte of the 1167incoming packet. 1168 1169The "begin" in the name of the function refers to the fact that packet 1170parsing is a two-step process [3]_. In the first step, the packet version, 1171CID, and some other parameters are parsed out; in the second step, 1172version-specific ``pf_parse_packet_in_finish()`` is called to parse out 1173the packet number. Between the two calls, the state is saved in 1174``struct packin_parse_state``. 1175 1176Generating Packets 1177================== 1178 1179Packets are generated during encryption using the ``pf_gen_reg_pkt_header()`` 1180function. The generated header is encrypted together with the `packet payload`_ 1181and this becomes the QUIC packet that is sent out. (Most of the time, the 1182QUIC packet corresponds to the UDP datagram, but sometimes packets are 1183`coalesced <#packet-coalescing>`__. 1184 1185Parsing Frames 1186============== 1187 1188There is a parsing function for each frame type. These function generally 1189have names that begin with "pf_parse\_" and follow a similar pattern: 1190 1191- The first argument is the buffer to be parsed; 1192 1193- The second argument is its size; 1194 1195- Any additional arguments are outputs: the parsed out values from the frame; 1196 1197- Number of bytes consumed is returned or a negative value is returned 1198 if a parsing error occurred. 1199 1200For example: 1201 1202:: 1203 1204 int 1205 (*pf_parse_stream_frame) (const unsigned char *buf, size_t rem_packet_sz, 1206 struct stream_frame *); 1207 1208 int 1209 (*pf_parse_max_data) (const unsigned char *, size_t, uint64_t *); 1210 1211Generating Frames 1212================= 1213 1214Functions that generate frames begin with "pf_gen\_" and also follow a 1215pattern: 1216 1217- First argument is the buffer to be written to; 1218 1219- The second argument is the buffer size; 1220 1221- Any additional arguments specify the values to include in the frame; 1222 1223- The size of the resulting frame is returned or a negative value if 1224 an error occurred. 1225 1226For example: 1227 1228:: 1229 1230 int 1231 (*pf_gen_path_chal_frame) (unsigned char *, size_t, uint64_t chal); 1232 1233 int 1234 (*pf_gen_stream_frame) (unsigned char *buf, size_t bufsz, 1235 lsquic_stream_id_t stream_id, uint64_t offset, 1236 int fin, size_t size, gsf_read_f, void *stream); 1237 1238Frame Types 1239=========== 1240 1241Frame types are listed in ``enum quic_frame_type``. When frames are parsed, 1242the on-the-wire frame type is translated to the enum value; when frames are 1243generated, the enum is converted to the on-the-wire format. This indirection 1244is convenient, as it limits the range of possible QUIC frame values, making 1245it possible to store a list of frame types as a bitmask. Examples include 1246``po_frame_types`` and ``sc_retx_frames``. 1247 1248Some frame types, such as ACK and STREAM, are common to both Google and IETF 1249QUIC. Others, such as STOP_WAITING and RETIRE_CONNECTION_ID, are only used 1250in one of the protocols. The third type is frames that are used by IETF 1251QUIC extensions, such as TIMESTAMP and ACK_FREQUENCY. 1252 1253Parsing IETF QUIC Frame Types 1254----------------------------- 1255 1256Most IETF frame types are encoded as a single by on the wire (and all Google 1257QUIC frames are). Some of them are encoded using multiple bytes. This is 1258because, like the vast majority of all integral values in IETF QUIC, the frame 1259type is encoded as a varint. Unlike the other integral values, however, the 1260frame type has the unique property is that it must be encoded using the 1261*minimal representation*: that is, the encoding must use the minimum number 1262of bytes possible. For example, encoding the value 200 must use the two-byte 1263varint, not four- or eight-byte version. This makes it possible to parse 1264frame types once without having to reparse the frame type again in individual 1265frame-parsing routines. 1266 1267Frame type is parsed out in ``ietf_v1_parse_frame_type()``. Because of the 1268minimal encoding requirement, the corresponding frame-parsing functions know 1269the number of bytes to skip for type, for example: 1270 1271 1272:: 1273 1274 static int 1275 ietf_v1_parse_frame_with_varints (const unsigned char *buf, size_t len, 1276 const uint64_t frame_type, unsigned count, uint64_t *vals[]) 1277 { 1278 /* --- 8< --- code removed */ 1279 vbits = vint_val2bits(frame_type); 1280 p += 1 << vbits; // <=== SKIP FRAME TYPE 1281 /* --- 8< --- code removed */ 1282 } 1283 1284 static int 1285 ietf_v1_parse_timestamp_frame (const unsigned char *buf, 1286 size_t buf_len, uint64_t *timestamp) 1287 { 1288 return ietf_v1_parse_frame_with_varints(buf, buf_len, 1289 FRAME_TYPE_TIMESTAMP, 1, (uint64_t *[]) { timestamp }); 1290 } 1291 1292 1293Mini vs Full Connections 1294************************ 1295 1296Mini Purpose 1297============ 1298 1299The reason for having a mini connection is to conserve resources: a mini 1300connection allocates a much smaller amount of memory. This protects the 1301server from a potential DoS attack. The mini connection's job is to get 1302the handshake to succeed, after which the connection is 1303`promoted <#connection-promotion>`__. 1304 1305Mini/Full Differences 1306===================== 1307 1308Besides their size, the two connection types differ in the following 1309ways: 1310 1311Mini connections' lifespan is limited. If the handshake does not succeed 1312within 10 seconds (configurable), the mini connection is destroyed. 1313 1314A mini connection is only `tickable <#tickability>`__ if it has unsent 1315packets. 1316 1317Mini connections do not process packets that carry application (as 1318opposed to handshake) data. The 0-RTT packet processing is deferred; 1319these packets are stashed and handed over to the full connection during 1320promotion. 1321 1322Connection Promotion 1323==================== 1324 1325A mini connection is promoted when the handshake succeeds. The mini 1326connection reports this via the return status of ``ci_tick`` by setting 1327the ``TICK_PROMOTE`` bit. The engine creates a new connection object and 1328calls the corresponding server constructor. The latter copies all the 1329relevant state information from mini to full connection. 1330 1331For a time, two connection objects -- one mini and one full -- exist at 1332the same state. Most of the time, the mini connection is destroyed 1333within the same function call to ``process_connections()``. If, however, 1334the mini connection has unsent packets, it will remain live until those 1335packets are sent successfully. Because the mini connection is by then 1336removed from the CID-to-connection hash (``engine->conns_hash``), it will 1337not receive any more incoming packets. 1338 1339Also see `Connection Processing <#processing-connections>`__. 1340 1341Mini gQUIC Connection 1342********************* 1343 1344*Files: lsquic_mini_conn.h, lsquic_mini_conn.c* 1345 1346.. _overview-1: 1347 1348Overview 1349======== 1350 1351The original version of ``struct mini_conn`` fit into paltry 128 bytes. 1352The desire to fit into 128 bytes [2]_ led to, for example, 1353``mc_largest_recv`` -- in effect, a 3-byte integer! Since that time, 1354the mini conn has grown to over 512 bytes. 1355 1356Looking at the struct, we can see that a lot of other data structures 1357are squeezed into small fields: 1358 1359Received and sent packet history is each packed into a 64-bit integer, 1360``mc_received_packnos`` and ``mc_sent_packnos``, respectively. The HEADERS 1361stream offsets are handled by the two two-byte integers ``mc_read_off`` 1362and ``mc_write_off``. 1363 1364.. _notable-code-2: 1365 1366Notable Code 1367============ 1368 1369continue_handshake 1370------------------ 1371 1372This function constructs a contiguous buffer with all the HANDSHAKE 1373stream chunks in order and passes it to ``esf_handle_chlo()``. This is 1374done because the gQUIC crypto module does not buffer anything: it's all 1375or nothing. 1376 1377The code has been written in a generic way, so that even 1378many small packets can be reconstructed into a CHLO. The lsquic client 1379can be made to split the CHLO by setting the max packet size 1380sufficiently low. 1381 1382sent/unsent packets 1383------------------- 1384 1385To conserve space, only a single outgoing packet header exists in the 1386mini connection struct, ``mc_packets_out``. To differentiate between 1387packets that are to be sent and those that have already been sent, the 1388``PO_SENT`` flag is used. 1389 1390Mini IETF Connection 1391******************** 1392 1393*Files: lsquic_mini_conn_ietf.h, lsquic_mini_conn_ietf.c* 1394 1395.. _overview-2: 1396 1397Overview 1398======== 1399 1400The IETF QUIC mini connection has the same idea as the gQUIC mini 1401connection: use as little memory as possible. This is more difficult to 1402do with the IETF QUIC, however, as there are more moving parts in this 1403version of the protocol. 1404 1405.. _data-structures-2: 1406 1407Data Structures 1408=============== 1409 1410mini_crypto_stream 1411------------------ 1412 1413This structure is a minimal representation of a stream. The IETF QUIC 1414protocol uses up to four HANDSHAKE streams (one for each encryption 1415level) during the handshake and we need to keep track of them. Even a 1416basic event dispatch mechanism is supported. 1417 1418packno_set_t 1419------------ 1420 1421This bitmask is used to keep track of sent, received, and acknowledged 1422packet numbers. It can support up to 64 packet numbers: 0 through 63. We 1423assume that the server will not need to send more than 64 packets to 1424complete the handshake. 1425 1426imc_recvd_packnos 1427----------------- 1428 1429Because the client is allowed to start its packet number sequence with 1430any number in the [0, 2\ :sup:`32`-1] range, the received packet history 1431must be able to accommodate numbers larger than 63. To do that, the 1432receive history is a union. If all received packet numbers are 63 or 1433smaller, the packno_set_t bitmask is used. Otherwise, the receive 1434history is kept in `Tiny Receive History <#tiny-receive-history>`__ 1435(trechist). The flag ``IMC_TRECHIST`` indicates which data structure is 1436used. 1437 1438ietf_mini_conn 1439-------------- 1440 1441This structure is similar to the gQUIC mini conn. It is larger, though, 1442as it needs to keep track of several instances of things based on 1443encryption level or packet number space. 1444 1445``imc_cces`` can hold up to three SCIDs: one for the original DCID from 1446the client, one for SCID generated by the server, and one for when 1447preferred address transport parameter is used. (The preferred address 1448functionality is not compiled by default.) 1449 1450ietf_mini_rechist 1451----------------- 1452 1453The receive history is in the header file because, in addition to 1454generating the ACK frames in the IETF mini conn, it is used to migrate 1455the receive history during promotion. 1456 1457.. _notable-code-3: 1458 1459Notable Code 1460============ 1461 1462Switching to trechist 1463--------------------- 1464 1465The switch to the Tiny Receive History happens when the incoming packet 1466number does not fit into the bitmask anymore -- see 1467``imico_switch_to_trechist()``. To keep the trechist code exercised, about 1468one in every 16 mini connection uses trechist unconditionally -- see 1469``lsquic_mini_conn_ietf_new()``. 1470 1471crypto_stream_if 1472---------------- 1473 1474A set of functions to drive reading and writing CRYPTO frames to move 1475the handshake along is specified. It is passed to the crypto session. 1476After promotion, the full connection installs its own function pointers. 1477 1478imico_read_chlo_size 1479-------------------- 1480 1481This function reads the first few bytes of the first CRYPTO frame on the 1482first HANDSHAKE stream to figure out the size of ClientHello. The 1483transport parameters will not be read until the full ClientHello is 1484available. 1485 1486 1487Duplicated Code 1488--------------- 1489 1490Some code has been copied from gQUIC mini connection. This was done on 1491purpose, with the expectation that gQUIC is going away. 1492 1493ECN Blackhole Detection 1494----------------------- 1495 1496ECN blackhole at the beginning of connection is guessed at when none of 1497packets sent in the initial batch were acknowledged. This is done by 1498``imico_get_ecn()``. ``lsquic_mini_conn_ietf_ecn_ok()`` is also used during 1499promotion to check whether to use ECN. 1500 1501Connection Public Interface 1502*************************** 1503 1504*Files: lsquic_conn_public.h* 1505 1506TODO 1507 1508Full gQUIC Connection 1509********************* 1510 1511*Files: lsquic_full_conn.h, lsquic_full_conn.c* 1512 1513.. _overview-3: 1514 1515Overview 1516======== 1517 1518The full gQUIC connection implements the Google QUIC protocol, both 1519server and client side. This is where a large part of the gQUIC protocol 1520logic is contained and where everything -- engine, streams, sending, 1521event dispatch -- is tied together. 1522 1523Components 1524========== 1525 1526In this section, each member of the ``full_conn`` structure is documented. 1527 1528fc_conn 1529------- 1530 1531The first member of the struct is the common connection object, 1532`lsquic_conn`_. 1533 1534It must be first in the struct because the two pointer are cast to each 1535other, depending on circumstances. 1536 1537fc_cces 1538------- 1539 1540This array holds two connection CID elements. 1541 1542The reason for having two elements in this array instead of one (even 1543though gQUIC only uses one CID) is for the benefit of the client: In 1544some circumstances, the client connections are hashed by the port 1545number, in which case the second element is used to hash the port value. 1546The relevant code is in lsquic_engine.c 1547 1548fc_rechist 1549---------- 1550 1551This member holds the `packet receive history <#receive-history>`__. It 1552is used to generate ACK frames. 1553 1554fc_stream_ifs 1555------------- 1556 1557This three-element array holds pointers to stream callbacks and the 1558stream callback contexts. 1559 1560From the perspective of lsquic, Google QUIC has three stream types: 1561 15621. HANDSHAKE stream; 1563 15642. HEADERS stream; and 1565 15663. Regular (message, or request/response) streams. 1567 1568The user provides stream callbacks and the context for the regular 1569streams (3) in ``ea_stream_if`` and ``ea_stream_if_ctx``. 1570 1571The other two stream types are internal. The full connection specifies 1572internal callbacks for those streams. One set handles the handshake and 1573the other handles reading and writing of HTTP/2 frames: SETTINGS, 1574HEADERS, and so on. 1575 1576fc_send_ctl 1577----------- 1578 1579This is the `Send Controller <#send-controller>`__. It is used to 1580allocate outgoing packets, control sending rate, and process 1581acknowledgements. 1582 1583fc_pub 1584------ 1585 1586This member holds the `Connection Public 1587Interface <#connection-public-interface>`__. 1588 1589fc_alset 1590-------- 1591 1592This is the `Alarm Set <#alarm-set>`__. It is used to set various timers 1593in the connection and the send controller. 1594 1595fc_closed_stream_ids 1596-------------------- 1597 1598The two sets in this array hold the IDs of closed streams. 1599 1600There are two of them because of the uneven distribution of stream IDs. 1601It is more efficient to hold even and odd stream IDs in separate 1602structures. 1603 1604fc_settings 1605----------- 1606 1607Pointer to the engine settings. 1608 1609This member is superfluous -- the settings can be fetched from 1610``fc_enpub->enp_settings``. 1611 1612fc_enpub 1613-------- 1614 1615This points to the `engine's public interface <#lsquic-engine-public>`__. 1616 1617fc_max_ack_packno 1618----------------- 1619 1620Recording the maximum packet number that contained an ACK allows us to 1621ignore old ACKs. 1622 1623fc_max_swf_packno 1624----------------- 1625 1626This is the maximum packet number that contained a STOP_WAITING frame. 1627It is used to ignore old STOP_WAITING frames. 1628 1629fc_mem_logged_last 1630------------------ 1631 1632This timestamp is used to limit logging the amount of memory used to 1633most once per second. 1634 1635fc_cfg 1636------ 1637 1638This structure holds a few important configuration parameters. (Looks 1639like ``max_conn_send`` is no longer used���) 1640 1641fc_flags 1642-------- 1643 1644The flags hold various boolean indicators associated with the full 1645connections. Some of them, such as ``FC_SERVER``, never change, while 1646others change all the time. 1647 1648fc_n_slack_akbl 1649--------------- 1650 1651This is the number of ackable (or, in the new parlance, *ack-eliciting*) 1652packets received since the last ACK was sent. 1653 1654This counter is used to decide whether an ACK should be sent (or, more 1655precisely, queued to be sent) immediately or whether to wait. 1656 1657fc_n_delayed_streams 1658-------------------- 1659 1660Count how many streams have been delayed. 1661 1662When ``lsquic_conn_make_stream()`` is called, a stream may not be created 1663immediately. It is delayed if creating a stream would go over the 1664maximum number of stream allowed by peer. 1665 1666fc_n_cons_unretx 1667---------------- 1668 1669Counts how many consecutive unretransmittable packets have been sent. 1670 1671 1672fc_last_stream_id 1673----------------- 1674 1675ID of the last created stream. 1676 1677Used to assign ID to streams created by this side of the connection. 1678Clients create odd-numbered streams, while servers initiate 1679even-numbered streams (push promises). 1680 1681fc_max_peer_stream_id 1682--------------------- 1683 1684Maximum value of stream ID created by peer. 1685 1686fc_goaway_stream_id 1687------------------- 1688 1689Stream ID received in the GOAWAY frame. 1690 1691This ID is used to reset locally-initiated streams with ID larger than 1692this. 1693 1694fc_ver_neg 1695---------- 1696 1697This structure holds the version negotiation state. 1698 1699This is used by the client to negotiate with the server. 1700 1701 1702With gQUIC going away, it is probably not very important anymore. 1703 1704fc_hsk_ctx 1705---------- 1706 1707Handshake context for the HANDSHAKE stream. 1708 1709Client and server have different HANDSHAKE stream handlers -- and 1710therefore different contexts. 1711 1712fc_stats 1713-------- 1714 1715Connection stats 1716 1717fc_last_stats 1718------------- 1719 1720Snapshot of connection stats 1721 1722This is used to log the changes in counters between calls to 1723``ci_log_stats()``. The calculation is straightforward in 1724``lsquic_conn_stats_diff()``. 1725 1726fc_stream_histories and fc_stream_hist_idx 1727------------------------------------------ 1728 1729Rolling log of histories of closed streams 1730 1731 1732fc_errmsg 1733--------- 1734 1735Error message associated with connection termination 1736 1737This is set when the connection is aborted for some reason. This error 1738message is only set once. It is used only to set the error message in 1739the call to ``ci_status()`` 1740 1741fc_recent_packets 1742----------------- 1743 1744Dual ring-buffer log of packet history 1745 1746The first element is for incoming packets, the second is for outgoing 1747packets. Each entry holds received or sent time and frame information. 1748 1749This can be used for debugging. It is only compiled into debug builds. 1750 1751fc_stream_ids_to_reset 1752---------------------- 1753 1754List of stream ID to send STREAM_RESET for 1755 1756These STREAM_RESET frames are associated with streams that are not 1757allowed to be created because we sent a GOAWAY frame. (There is a period 1758when GOAWAY is in transit, but the peer keeps on creating streams). To 1759queue the reset frames for such a stream, an element is added to this 1760list. 1761 1762fc_saved_ack_received 1763--------------------- 1764 1765Timestamp of the last received ACK. 1766 1767This is used for `ACK merging <#ack-merging>`__. 1768 1769fc_path 1770------- 1771 1772The network path -- Google QUIC only has one network path. 1773 1774fc_orig_versions 1775---------------- 1776 1777List (as bitmask) of original versions supplied to the client 1778constructor. 1779 1780Used for version negotiation. See `fc_ver_neg`_ for more 1781coverage of this topic. 1782 1783fc_crypto_enc_level 1784------------------- 1785 1786Latest crypto level 1787 1788This is for Q050 only, which does away with the HANDSHAKE stream and 1789uses CRYPTO frames instead. (This was part of Google's plan to move 1790Google QUIC protocol closer to IETF QUIC.) 1791 1792fc_ack 1793------ 1794 1795Saved ACK -- latest or merged 1796 1797This ACK structure is used in `ACK merging <#ack-merging>`__. 1798 1799Instantiation 1800============= 1801 1802The largest difference between the server and client mode of the full 1803connection is in the way it is created. The client creates a brand-new 1804connection, performs version negotiation, and runs the handshake before 1805dispatching user events. The server connection, on the other hand, gets 1806created from a mini connection during `connection 1807promotion <#connection-promotion>`__. By that time, both version 1808negotiation and handshake have already completed. 1809 1810Common Initialization 1811--------------------- 1812 1813The ``new_conn_common()`` function contains initialization common to both 1814server and client. Most full connection's internal data structures are 1815initialized or allocated here, among them `Send 1816Controller <#send-controller>`__, `Receive 1817History <#receive-history>`__, and `Alarm Set <#alarm-set>`__. 1818 1819The HEADERS stream is created here, if necessary. (Throughout the code, 1820you can see checks whether the connection is in HTTP mode or not. Even 1821though gQUIC means that HTTP is used, our library supports a non-HTTP 1822mode, in which there is no HEADERS stream. This was done for testing 1823purposes and made possible the echo and md5 client and server programs.) 1824 1825Server 1826------ 1827 1828After initializing the common structures in ``new_conn_common()``, 1829server-specific initialization continues in 1830``lsquic_gquic_full_conn_server_new()``. 1831 1832The HANDSHAKE stream is created. The handler (see 1833``lsquic_server_hsk_stream_if``) simply throws out data that it reads from 1834the client. 1835 1836Outgoing packets are inherited -- they will be sent during the next tick 1837-- and deferred incoming packets are processed. 1838 1839Client 1840------ 1841 1842The client's initialization takes place in 1843``lsquic_gquic_full_conn_client_new()``. Crypto session is created and the 1844HANDSHAKE stream is initialized. The handlers in 1845``lsquic_client_hsk_stream_if`` drive the handshake process. 1846 1847Incoming Packets 1848================ 1849 1850The entry point for incoming packets is ``ci_packet_in()``, which is 1851implemented by ``full_conn_ci_packet_in``. Receiving a packet restarts the 1852idle timer. 1853 1854The function ``process_incoming_packet`` contains some client-only logic 1855for processing version negotiation and stateless retry packets. In the 1856normal case, ``process_regular_packet()`` is called. This is where the 1857incoming process is decrypted, the `Receive 1858History <#receive-history>`__ is updated, ``parse_regular_packet()`` is 1859called, and some post-processing takes place (most importantly, 1860scheduling an ACK to be sent). 1861 1862The function ``parse_regular_packet`` is simple: It iterates over the 1863whole decrypted payload of the incoming packet and parses out frames one 1864by one. An error aborts the connection. 1865 1866ACK Merging 1867=========== 1868 1869Processing ACKs is `expensive <#handling-acks>`__. When sending data, a 1870batch of incoming packets is likely to contain an ACK frame each. The 1871ACK frame handler, ``process_ack_frame()``, merges consecutive ACK frames 1872and stores the result in `fc_ack`_. The ACK is processed 1873during the `next tick <#ticking>`__. If the two ACK cannot be merged 1874(which is unlikely), the cached ACK is processed immediately and the new 1875ACK is cached. 1876 1877Caching an ACK has a non-trivial memory cost: the 4KB-plus data 1878structure ``ack_info`` accounts for more than half of the size of the 1879``full_conn`` struct. Nevertheless, the tradeoff is well worth it. ACK 1880merging reduces the number of calls to ``lsquic_send_ctl_got_ack()`` by a 1881factor of 10 or 20 in some high-throughput scenarios. 1882 1883Ticking 1884======= 1885 1886When a `connection is processed by the 1887engine <#processing-connections>`__, the engine calls the connection's 1888``ci_tick()`` method. This is where most of the connection logic is 1889exercised. In the full gQUIC connection, this method is implemented by 1890``full_conn_ci_tick()``. 1891 1892The following steps are performed: 1893 1894- A cached ACK, if it exists, is processed 1895 1896- Expired alarms are rung 1897 1898- Stream read events are dispatched 1899 1900- An ACK frame is generated if necessary 1901 1902- Other control frames are generated if necessary 1903 1904- Lost packets are rescheduled 1905 1906- More control frames and stream resets are generated if necessary 1907 1908- HEADERS stream is flushed 1909 1910- Outgoing packets that carry stream data are scheduled in four steps: 1911 1912 a. High-priority `buffered packets <#buffered-queue>`__ are scheduled 1913 1914 b. Write events are dispatched for high-priority streams 1915 1916 c. Non-high-priority buffered packets are scheduled 1917 1918 d. Write events are dispatched for non-high-priority streams 1919 1920- Connection close or PING frames are generated if necessary 1921 1922- Streams are serviced (closed, freed, created) 1923 1924Full IETF Connection 1925******************** 1926 1927*Files: lsquic_full_conn_ietf.h, lsquic_full_conn_ietf.c* 1928 1929Overview 1930======== 1931 1932This module implements IETF QUIC 1933`Transport <https://tools.ietf.org/html/draft-ietf-quic-transport-34>`_ 1934and 1935`HTTP/3 <https://tools.ietf.org/html/draft-ietf-quic-http-34>`_ logic, 1936plus several QUIC extensions. To attain an overall grasp of the code, 1937at least some familiarity with these protocols is required. To understand 1938the code in detail, especially *why* some things are done, a closer reading 1939of the specification may be in order. 1940 1941In some places, the code contains comments with references to the 1942specification, e.g. 1943 1944:: 1945 1946 if (conn->ifc_flags & IFC_SERVER) 1947 { /* [draft-ietf-quic-transport-34] Section 19.7 */ 1948 ABORT_QUIETLY(0, TEC_PROTOCOL_VIOLATION, 1949 "received unexpected NEW_TOKEN frame"); 1950 return 0; 1951 } 1952 1953(A search for "[draft-ietf" will reveal over one hundred places in the 1954code thus commented.) 1955 1956The Full IETF Connection module is similar in structure to the `Full gQUIC 1957Connection`_ module, from which it originated. Some code is quite similar 1958as well, including logic for `ACK Merging`_ and `Ticking`_. 1959 1960Components 1961========== 1962 1963In this section, each member of ``ietf_full_conn`` is documented. 1964 1965ifc_conn 1966-------- 1967 1968The first member of the struct is the common connection object, 1969`lsquic_conn`_. 1970 1971It must be first in the struct because the two pointer are cast to each 1972other, depending on circumstances. 1973 1974ifc_cces 1975-------- 1976 1977This array holds eight connection CID elements. 1978See `Managing SCIDs`_. 1979 1980ifc_rechist 1981----------- 1982 1983This member holds the `packet receive history <#receive-history>`__. 1984The receive history is used to generate ACK frames. 1985 1986ifc_max_ackable_packno_in 1987------------------------- 1988 1989This value is used to detect holes in incoming packet number sequence. 1990This information is used to queue ACK frames. 1991 1992ifc_send_ctl 1993------------ 1994 1995This is the `Send Controller`_. It is used to 1996allocate outgoing packets, control sending rate, and process 1997acknowledgements. 1998 1999ifc_pub 2000------- 2001 2002This member holds the `Connection Public Interface`_ 2003 2004ifc_alset 2005--------- 2006 2007This is the `Alarm Set`_. It is used to set various timers 2008in the connection and the send controller. 2009 2010ifc_closed_stream_ids 2011--------------------- 2012 2013The two sets in this array hold the IDs of closed streams. 2014 2015There are two of them because of the uneven distribution of stream IDs. 2016The set data structure is meant to hold sequences without gaps. 2017It is more efficient to hold stream IDs for each stream type in 2018separate structures. 2019 2020ifc_n_created_streams 2021--------------------- 2022 2023Counters for locally initiated streams. Used to generate next 2024stream ID. 2025 2026ifc_max_allowed_stream_id 2027------------------------- 2028 2029Maximum allowed stream ID for each of the four (``N_SITS``) stream types. 2030This is used all over the place. 2031 2032ifc_closed_peer_streams 2033----------------------- 2034 2035Counts how many remotely-initiated streams have been closed. Because the 2036protocol mandates that the stream IDs be assigned in order, this allows us 2037to make some logical inferences in the code. 2038 2039ifc_max_streams_in 2040------------------ 2041 2042Maximum number of open streams the peer is allowed to initiate. 2043 2044ifc_max_stream_data_uni 2045----------------------- 2046 2047Initial value of the maximum amount of data locally-initiated unidirectional 2048stream is allowed to send. 2049 2050ifc_flags 2051--------- 2052 2053All kinds of flags. 2054 2055ifc_mflags 2056---------- 2057 2058More flags! 2059 2060ifc_send_flags 2061-------------- 2062 2063The send flags keep track of which control frames are queued to be sent. 2064 2065ifc_delayed_send 2066---------------- 2067 2068Some send flags are delayed. 2069 2070We stop issuing streams credits if peer stops opening QPACK decoder window. 2071This addresses a potential attack whereby client can cause the server to keep 2072allocating memory. See `Security Considerations in the QPACK Internet-Draft 2073<https://tools.ietf.org/html/draft-ietf-quic-qpack-21#section-7.3>`__. 2074 2075ifc_send 2076-------- 2077 2078This is the `Send Controller`_. It is used to allocate outgoing packets, 2079control sending rate, and process acknowledgements. 2080 2081ifc_error 2082--------- 2083 2084This struct records which type of error has occurred (transport or application)' 2085and the error code. 2086 2087ifc_n_delayed_streams 2088--------------------- 2089 2090Count how many streams have been delayed. 2091 2092When ``lsquic_conn_make_stream()`` is called, a stream may not be created 2093immediately. It is delayed if creating a stream would go over the 2094maximum number of stream allowed by peer. 2095 2096ifc_n_cons_unretx 2097----------------- 2098 2099Counts how many consecutive unretransmittable packets have been sent. 2100 2101Enough unretransittable sent packets in a row causes a PING frame to 2102be sent. This forces the peer to send an ACK. 2103 2104ifc_pii 2105------- 2106 2107Points to the selected priority iterator. 2108 2109The IETF Full Connection supports two priority mechanisms: the original 2110Google QUIC priority mechanism and the `HTTP/3 Extensible Priorities 2111<https://tools.ietf.org/html/draft-ietf-httpbis-priority-03>`__. 2112 2113ifc_errmsg 2114---------- 2115 2116Holds dynamically generated error message string. 2117 2118Once set, the error string does not change until the connection is 2119destroyed. 2120 2121ifc_enpub 2122--------- 2123 2124This points to the `engine's public interface <#lsquic-engine-public>`__. 2125 2126ifc_settings 2127------------ 2128 2129Pointer to the engine settings. 2130 2131This member is superfluous -- the settings can be fetched from 2132``ifc_enpub->enp_settings``. 2133 2134ifc_stream_ids_to_ss 2135-------------------- 2136 2137Holds a queue of STOP_SENDING frames to send as response to remotely 2138initiated streams that came in after we sent a GOAWAY frame. 2139 2140ifc_created 2141----------- 2142 2143Time when the connection was created. This is used for the Timestamp 2144and Delayed ACKs extensions. 2145 2146ifc_saved_ack_received 2147---------------------- 2148 2149Time when cached ACK frame was received. See `ACK Merging`_. 2150 2151ifc_max_ack_packno 2152------------------ 2153 2154Holding the maximum packet number containing an ACK frame allows us 2155to ignore old ACK frames. One value per Packet Number Space is kept. 2156 2157ifc_max_non_probing 2158------------------- 2159 2160Maximum packet number of a received non-probing packets. This is used 2161for path migration. 2162 2163ifc_cfg 2164------- 2165 2166Local copy of a couple of transport parameters. We could get at them 2167with a function call, but these are used often enough to optimize 2168fetching them. 2169 2170ifc_process_incoming_packet 2171--------------------------- 2172 2173The client goes through version negotiation and the switches to the 2174fast function. The server begins to use the fast function immediately. 2175 2176ifc_n_slack_akbl 2177---------------- 2178 2179Number ackable packets received since last ACK was sent. A count is 2180kept for each Packet Number Space. 2181 2182ifc_n_slack_all 2183--------------- 2184 2185Count of all packets received since last ACK was sent. This is only 2186used in the Application PNS (Packet Number Space). (This is regular 2187PNS after the handshake completes). 2188 2189ifc_max_retx_since_last_ack 2190--------------------------- 2191 2192This number is the maximum number of ack-eliciting packets to receive 2193before an ACK must be sent. 2194 2195The default value is 2. When the Delayed ACKs extension is used, this 2196value gets modified by peer's ACK_FREQUENCY frames. 2197 2198ifc_max_ack_delay 2199----------------- 2200 2201Maximum amount of allowed after before an ACK is sent if the threshold 2202defined by ifc_max_retx_since_last_ack_ has not yet been reached. 2203 2204The default value is 25 ms. When the Delayed ACKs extension is used, this 2205value gets modified by peer's ACK_FREQUENCY frames. 2206 2207ifc_ecn_counts_in 2208----------------- 2209 2210Incoming ECN counts in each of the Packet Number Spaces. These counts 2211are used to generate ACK frames. 2212 2213ifc_max_req_id 2214-------------- 2215 2216Keeps track of the maximum ID of bidirectional stream ID initiated by the 2217peers. It is used to construct the GOAWAY frame. 2218 2219ifc_hcso 2220-------- 2221 2222State for outgoing HTTP/3 control stream. 2223 2224ifc_hcsi 2225-------- 2226 2227State for incoming HTTP/3 control stream. 2228 2229ifc_qeh 2230------- 2231 2232QPACK encoder streams handler. 2233 2234The handler owns two unidirectional streams: a) locally-initiated QPACK 2235encoder stream, to which it writes; and b) peer-initiated QPACK decoder 2236stream, from which it reads. 2237 2238ifc_qdh 2239------- 2240 2241QPACK decoder streams handler. 2242 2243The handler owns two unidirectional streams: a) peer-initiated QPACK 2244encoder stream, from which it reads; and b) locally-initiated QPACK 2245decoder stream, to which it writes. 2246 2247ifc_peer_hq_settings 2248-------------------- 2249 2250Peer's HTTP/3 settings. 2251 2252ifc_dces 2253-------- 2254 2255List of destination connection ID elements (DCEs). Each holds a DCID 2256and the associated stateless reset token. When lsquic uses a DCID, it 2257inserts the stateless reset token into a hash so that stateless resets 2258can be found. 2259 2260Outside of the initial migration, the lsquic client code does not switch 2261DCIDs. One idea (suggested in the drafts somewhere) is to switch DCIDs 2262after a period of inactivity. 2263 2264ifc_to_retire 2265------------- 2266 2267List of DCIDs to retire. 2268 2269ifc_scid_seqno 2270-------------- 2271 2272Sequence generator for SCIDs generated by the endpoint. 2273 2274ifc_scid_timestamp 2275------------------ 2276 2277List of timestamps for the generated SCIDs. 2278 2279This list is used in the SCID rate-limiting mechanism. 2280 2281 2282ifc_incoming_ecn 2283---------------- 2284 2285History indicating presence of ECN markings on most recent incoming packets. 2286 2287ifc_cur_path_id 2288--------------- 2289 2290Current path ID -- indexes `ifc_paths`_. 2291 2292ifc_used_paths 2293-------------- 2294 2295Bitmask of which paths in `ifc_paths`_ are being used. 2296 2297ifc_mig_path_id 2298--------------- 2299 2300Path ID of the path being migrated to. 2301 2302ifc_active_cids_limit 2303--------------------- 2304 2305This is the maximum number of CIDs at any one time this 2306endpoint is allowed to issue to peer. If the TP value exceeds ``cn_n_cces``, 2307it is reduced to it. 2308 2309ifc_active_cids_count 2310--------------------- 2311 2312This value tracks how many CIDs have been issued. It is decremented 2313each time a CID is retired. 2314 2315ifc_first_active_cid_seqno 2316-------------------------- 2317 2318Another piece of the SCID rate-limiting mechanism. 2319 2320ifc_ping_unretx_thresh 2321---------------------- 2322 2323Once the number consecutively sent non-ack-elicing packets 2324(`ifc_n_cons_unretx`_) exceeds this value, this endpoint will send 2325a PING frame to force the peer to respond with an ACK. 2326 2327The threshold begins at 20 and then made to fluctuate randomly between 232812 and 27. 2329 2330ifc_last_retire_prior_to 2331------------------------ 2332 2333Records the maximum value of ``Retire Prior To`` value of the 2334`NEW_CONNECTION_ID frame 2335<https://tools.ietf.org/html/draft-ietf-quic-transport-34#section-19.15>`_. 2336 2337ifc_ack_freq_seqno 2338------------------ 2339 2340Sequence number generator for ACK_FREQUENCY frames generated by this 2341endpoint. 2342 2343ifc_last_pack_tol 2344----------------- 2345 2346Last value of the ``Packet Tolerance`` field sent in the last 2347``ACK_FREQUENCY`` frame generated by this endpoint. 2348 2349ifc_last_calc_pack_tol 2350---------------------- 2351 2352Last *calculated* value of the ``Packet Tolerance`` field. 2353 2354ifc_min_pack_tol_sent 2355--------------------- 2356 2357Minimum value of the ``Packet Tolerance`` field sent. Only used for 2358statistics display. 2359 2360ifc_max_pack_tol_sent 2361--------------------- 2362 2363Maximum value of the ``Packet Tolerance`` field sent. Only used for 2364statistics display. 2365 2366ifc_max_ack_freq_seqno 2367---------------------- 2368 2369Maximum seen sequence number of incoming ``ACK_FREQUENCY`` frame. Used 2370to discard old frames. 2371 2372ifc_max_udp_payload 2373------------------- 2374 2375Maximum UDP payload. This is the cached value of the transport parameter. 2376 2377ifc_last_live_update 2378-------------------- 2379 2380Last time ``ea_live_scids()`` was called. 2381 2382ifc_paths 2383--------- 2384 2385Array of connection paths. Most of the time, only one path is used; more 2386are used during `migration <#path-migration>`__. The array has four 2387elements as a safe upper limit. 2388 2389The elements are of type ``struct conn_path``. Besides the network path, 2390which stores socket addresses and is associated with each outgoing packet 2391(via ``po_path``), the connection path keeps track of the following 2392information: 2393 2394- Outgoing path challenges. See `Sending Path Challenges`_. 2395 2396- Incoming path challenge. 2397 2398- Spin bit (``cop_max_packno``, ``cop_spin_bit``, and ``COP_SPIN_BIT``). 2399 2400- DPLPMTUD state. 2401 2402ifc_u.cli 2403--------- 2404 2405Client-specific state. This is where pointers to "crypto streams" are 2406stored; they are not in the ``ifc_pub.all_streams`` hash. 2407 2408ifc_u.ser 2409--------- 2410 2411The server-specific state is only about push promises. 2412 2413ifc_idle_to 2414----------- 2415 2416Idle timeout. 2417 2418ifc_ping_period 2419--------------- 2420 2421Ping period. 2422 2423ifc_bpus 2424-------- 2425 2426A hash of buffered priority updates. It is used when a priority update 2427(part of the Extensible HTTP Priorities extension) arrives before the 2428stream it is prioritizing. 2429 2430ifc_last_max_data_off_sent 2431-------------------------- 2432 2433Value of the last MAX_DATA frame sent. This is used to limit the number 2434of times we send the MAX_DATA frame in response to a DATA_BLOCKED frame. 2435 2436ifc_min_dg_sz 2437------------- 2438 2439Minimum size of the DATAGRAM frame. Used by the eponymous extension. 2440 2441ifc_max_dg_sz 2442------------- 2443 2444Maximum size of the DATAGRAM frame. Used by the eponymous extension. 2445 2446ifc_pts 2447------- 2448 2449PTS stands for "Packet Tolerance Stats". Information collected here 2450is used to calculate updates to the packet tolerance advertised to the 2451peer via ACK_FREQUENCY frames. Part of the Delayed ACKs extension. 2452 2453ifc_stats 2454--------- 2455 2456Cumulative connection stats. 2457 2458ifc_last_stats 2459-------------- 2460 2461Copy of `ifc_stats`_ last time ``ci_log_stats()`` was called. Used 2462to calculate the difference. 2463 2464ifc_ack 2465------- 2466 2467One or more cached incoming ACK frames. Used for `ACK merging`_. 2468 2469Managing SCIDs 2470============== 2471 2472Source Connection IDs -- or SCIDs for short -- are stored in the `ifc_cces`_ 2473array. 2474 2475Each of ``struct conn_cid_elem`` contains the CID itself, the CID's port or 2476sequence number, and flags: 2477 2478- ``CCE_USED`` means that this Connection ID has been used by the peer. This 2479 information is used to check whether the peer's incoming packet is using 2480 a new DCID or reusing an old one when the packet's DCID does not match 2481 this path's current DCID. 2482 2483- ``CCE_REG`` signifies that the CID has been registered with the user-defined 2484 ``ea_new_scids()`` callback. 2485 2486- ``CCE_SEQNO`` means that the connection has been issued by this endpoint 2487 and ``cce_seqno`` contains a valid value. Most of SCIDs are issued by 2488 either endpoint, with one exception: The DCID included in the first few 2489 packets sent by the client becomes an interim SCID for the server and it 2490 does not have a sequence number. This "original" SCID gets retired 2 2491 seconds after the handshake succeeds, see the ``AL_RET_CIDS`` alarm. 2492 2493- ``CCE_PORT`` is used to mark the special case of hashing connections by 2494 port number. In client mode, the lsquic engine may, under some circumstances, 2495 hash the connections by local port number instead of connection ID. 2496 In that case, ``cce_port`` contains the port number used to hash the 2497 connection. 2498 2499Each CIDs is hashed in the of the "CID-to-connection" mapping that the engine 2500maintains. If it is not in the hash, incoming packets that use this CID as 2501DCID will not be dispatched to the connection (because the connection will not 2502be found). 2503 2504Path Migration 2505============== 2506 2507What follows assumes familiarity with `Section 9 2508<https://tools.ietf.org/html/draft-ietf-quic-transport-34#section-9>`__ 2509of the Transport I-D. 2510 2511Server 2512------ 2513 2514The server handles two types of path migration. In the first type, the 2515client performs probing by sending path challenges; in the second type, 2516the migration is due to a NAT rebinding. 2517 2518The connection keeps track of different paths in `ifc_paths`_. Path 2519objects are allocated out of the ``ifc_paths`` array. They are of type 2520``struct conn_path``; one of the members is ``cop_path``, which is the 2521network path object used to send packets (via ``po_path``). 2522 2523Each incoming packet is fed to the engine using the 2524``lsquic_engine_packet_in()`` function. Along with the UDP datagram, 2525the local and peer socket addresses are passed to it. These addresses are 2526eventually passed to the connection via the ``ci_record_addrs()`` call. 2527The first of these calls -- for the first incoming packet -- determines the 2528*current path*. When the address pair, which is a four-tuple of local 2529and remote IP addresses and port numbers, does not match that of the 2530current path, a new path object is created, triggering migration logic. 2531 2532``ci_record_addrs()`` returns a *path ID*, which is simply the index of 2533the corresponding element in the ``ifc_paths`` array. The current path 2534ID is stored in ``ifc_cur_path_id``. The engine assigns this value to 2535the newly created incoming packet (in ``pi_path_id``). The packet is 2536then passed to ``ci_packet_in()``. 2537 2538The first part of the path-switching logic is in ``process_regular_packet()``: 2539 2540:: 2541 2542 case REC_ST_OK: 2543 /* --- 8< --- some code elided... */ 2544 saved_path_id = conn->ifc_cur_path_id; 2545 parse_regular_packet(conn, packet_in); 2546 if (saved_path_id == conn->ifc_cur_path_id) 2547 { 2548 if (conn->ifc_cur_path_id != packet_in->pi_path_id) 2549 { 2550 if (0 != on_new_or_unconfirmed_path(conn, packet_in)) 2551 { 2552 LSQ_DEBUG("path %hhu invalid, cancel any path response " 2553 "on it", packet_in->pi_path_id); 2554 conn->ifc_send_flags &= ~(SF_SEND_PATH_RESP 2555 << packet_in->pi_path_id); 2556 } 2557 2558The above means: if the current path has not changed after the packet 2559was processed, but the packet came in on a different path, then invoke 2560the "on new or unconfirmed path" logic. This is done this way because 2561the current path may be have been already changed if the packet contained 2562a PATH_RESPONSE frame. 2563 2564First time a packet is received on a new path, a PATH_CHALLENGE frame is 2565scheduled. 2566 2567If more than one packet received on the new path contain non-probing frames, 2568the current path is switched: it is assumed that the path change is due to 2569NAT rebinding. 2570 2571Client 2572------ 2573 2574Path migration is controlled by the client. When the client receives 2575a packet from an unknown server address, it drops the packet on the 2576floor (per spec). This code is in ``process_regular_packet()``. 2577 2578The client can migrate if ``es_allow_migration`` is on (it is in the default 2579configuration) and the server provides the "preferred_address" transport 2580parameter. The migration process begins once the handshake is confirmed; 2581see the ``maybe_start_migration()`` function. The SCID provided by the 2582server as part of the "preferred_address" transport parameter is used as the 2583destination CID and path #1 is picked: 2584 2585 2586:: 2587 2588 copath = &conn->ifc_paths[1]; 2589 migra_begin(conn, copath, dce, (struct sockaddr *) &sockaddr, params); 2590 return BM_MIGRATING; 2591 2592In ``migra_begin``, migration state is initiated and sending of a 2593PATH_CHALLENGE frame is scheduled: 2594 2595:: 2596 2597 conn->ifc_mig_path_id = copath - conn->ifc_paths; 2598 conn->ifc_used_paths |= 1 << conn->ifc_mig_path_id; 2599 conn->ifc_send_flags |= SF_SEND_PATH_CHAL << conn->ifc_mig_path_id; 2600 LSQ_DEBUG("Schedule migration to path %hhu: will send PATH_CHALLENGE", 2601 conn->ifc_mig_path_id); 2602 2603Sending Path Challenges 2604----------------------- 2605 2606To send a path challenge, a packet is allocated to be sent on that path, 2607a new challenge is generated, the PATH_CHALLENGE is written to the 2608packet, and the packet is scheduled. All this happens in the 2609``generate_path_chal_frame()`` function. 2610 2611:: 2612 2613 need = conn->ifc_conn.cn_pf->pf_path_chal_frame_size(); 2614 packet_out = get_writeable_packet_on_path(conn, need, &copath->cop_path, 1); 2615 /* --- 8< --- some code elided... */ 2616 w = conn->ifc_conn.cn_pf->pf_gen_path_chal_frame( 2617 packet_out->po_data + packet_out->po_data_sz, 2618 lsquic_packet_out_avail(packet_out), 2619 copath->cop_path_chals[copath->cop_n_chals]); 2620 /* --- 8< --- some code elided... */ 2621 lsquic_alarmset_set(&conn->ifc_alset, AL_PATH_CHAL + path_id, 2622 now + (INITIAL_CHAL_TIMEOUT << (copath->cop_n_chals - 1))); 2623 2624If the path response is not received before a timeout, another path challenge 2625is sent, up to the number of elements in ``cop_path_chals``. The timeout 2626uses exponential back-off; it is not based on RTT, because the RTT of the 2627new path is unknown. 2628 2629Receiving Path Responses 2630------------------------ 2631 2632When a PATH_RESPONSE frame is received, the path on which the corresponding 2633challenge was sent may become the new current path. See 2634``process_path_response_frame()``. 2635 2636Note that the path ID of the incoming packet with the PATH_RESPONSE frame is 2637not taken into account. This is by design: see 2638`Section 8.2.2 of the Transport I-D 2639<https://tools.ietf.org/html/draft-ietf-quic-transport-34#section-8.2.2>`__. 2640 2641Stream Priority Iterators 2642========================= 2643 2644Creating Streams on the Server 2645============================== 2646 2647Calculating Packet Tolerance 2648============================ 2649 2650When the Delayed ACKs extension is used, we advertise our ``Packet Tolerance`` 2651to peer. This is the number of packets the peer can receive before having to 2652send an acknowledgement. By default -- without the extension -- the packet 2653tolerance is 2. 2654 2655Because we `merge ACKs <#ack-merging>`__, receiving more than one ACK between 2656ticks is wasteful. Another consideration is that a packet is not declared 2657lost until at least one RTT passes -- the time to send a packet and receive 2658the acknowledgement from peer. 2659 2660To calculate the packet tolerance, we use a feedback mechanism: when number 2661of ACKs per RTT is too high, we increase packet tolerance; when number of 2662ACKs per RTT is too low, we decrease packet tolerance. The feedback is 2663implemented with a `PID Controller <https://en.wikipedia.org/wiki/PID_controller>`__: 2664the target is the number of ACKs per RTT, normalized to 1.0. 2665 2666See the function ``packet_tolerance_alarm_expired()`` as well as comments 2667in ``lsquic.h`` that explain the normalization as well as the knobs available 2668for tuning. 2669 2670The pre-normalized target is a function of RTT. It was obtained 2671empirically using netem. This function together with the default 2672PID controller parameters give good performance in the lab and in 2673some limited interop testing. 2674 2675Anatomy of Outgoing Packet 2676************************** 2677 2678Overview 2679======== 2680 2681The outgoing packet is represented by ``struct lsquic_packet_out``. An 2682outgoing packet always lives on one -- and only one -- of the 2683`Send Controller`_'s `Packet Queues`_. For that, ``po_next`` is used. 2684 2685Beyond the packet number, stored in ``po_packno``, the packet has several 2686properties: sent time (``po_sent``), frame information, encryption 2687level, network path, and others. Several properties are encoded into 2688one or more bits in the bitmasks ``po_flags`` and ``po_lflags``. 2689Multibit properties are usually accessed and modified by a special 2690macro. 2691 2692The packet has a pointer to the packetized data in ``po_data``. 2693If the packet has been encrypted but not yet sent, the encrypted 2694buffer is pointed to ``po_enc_data``. 2695 2696Packet Payload 2697============== 2698 2699The payload consists of the various frames -- STREAM, ACK, and others -- 2700written, one after another, to ``po_data``. The header, consisting of 2701the type byte, (optional) connection ID, and the packet number is constructed 2702when the packet is just about to be sent, during encryption. This 2703buffer -- header and the encrypted payload are stored in a buffer 2704pointed to by ``po_enc_data``. 2705 2706Because stream data is written directly to the outgoing packet, the 2707packet is not destroyed when it is declared lost by the `loss detection 2708logic <#loss-detection-and-retransmission>`__. Instead, it is repackaged 2709and sent out again as a new packet. Besides assigning the packet a 2710new number, packet retransmission involves removing non-retransmittable 2711frames from the packet. (See ``lsquic_packet_out_chop_regen()``.) 2712 2713Historically, some places in the code assumed that the frames to be 2714dropped are always located at the beginning of the ``po_data`` buffer. 2715(This was before a `frame record <#frame-records>`__ was created for 2716each frame). The cumulative size of the frames to be removed is in 2717``po_regen_sz``; this size can be zero. Code that generates 2718non-retransmittable frames still writes them only to the beginning 2719of the packet. 2720 2721The goal is to drop ``po_regen_sz`` and to begin to write ACK and 2722other non-retransmittable frames anywhere. This should be possible 2723to do now (see ``lsquic_packet_out_chop_regen()``, which can support 2724such use after removing the assertion), but we haven't pulled the 2725trigger on it yet. Making this change will allow other code to become 2726simpler: for example, the opportunistic ACKs logic. 2727 2728Frame Records 2729============= 2730 2731Each frame written to ``po_data`` has an associated *frame record* stored 2732in ``po_frecs``: 2733 2734:: 2735 2736 struct frame_rec { 2737 union { 2738 struct lsquic_stream *stream; 2739 uintptr_t data; 2740 } fe_u; 2741 unsigned short fe_off, 2742 fe_len; 2743 enum quic_frame_type fe_frame_type; 2744 }; 2745 2746Frame records are primarily used to keep track of the number of unacknowledged 2747stream frames for a stream. When a packet is acknowledged, the frame records 2748are iterated over and ``lsquic_stream_acked()`` is called. The second purpose 2749is to speed up packet resizing, as frame records record the type, position, 2750and size of a frame. 2751 2752Most of the time, a packet will contain a single frame: STREAM on the sender 2753of data and ACK on the receiver. This use case is optimized: ``po_frecs`` is 2754a union and when there is only one frame per packets, the frame record is 2755stored in the packet struct directly. 2756 2757Evanescent Connection 2758********************* 2759 2760*Files: lsquic_pr_queue.h, lsquic_pr_queue.c* 2761 2762"PR Queue" stands for "Packet Request Queue." This and the Evanescent 2763Connection object types are explaned below in this section. 2764 2765Overview 2766======== 2767 2768Some packets need to be replied to outside of context of existing 2769mini or full connections: 2770 27711. A version negotiation packet needs to be sent when a packet 2772 arrives that specifies QUIC version that we do not support. 2773 27742. A stateless reset packet needs to be sent when we receive a 2775 packet that does not belong to a known QUIC connection. 2776 2777 2778The replies cannot be sent immediately. They share outgoing 2779socket with existing connections and must be scheduled according 2780to prioritization rules. 2781 2782The information needed to generate reply packet -- connection ID, 2783connection context, and the peer address -- is saved in the Packet 2784Request Queue. 2785 2786When it is time to send packets, the connection iterator knows to 2787call prq_next_conn() when appropriate. What is returned is an 2788evanescent connection object that disappears as soon as the reply 2789packet is successfully sent out. 2790 2791There are two limits associated with Packet Request Queue: 2792 27931. Maximum number of packet requests that are allowed to be 2794 pending at any one time. This is simply to prevent memory 2795 blowout. 2796 27972. Maximum verneg connection objects to be allocated at any one 2798 time. This number is the same as the maximum batch size in 2799 the engine, because the packet (and, therefore, the connection) 2800 is returned to the Packet Request Queue when it could not be 2801 sent. 2802 2803We call this a "request" queue because it describes what we do with 2804QUIC packets whose version we do not support or those packets that 2805do not belong to an existing connection: we send a reply for each of 2806these packets, which effectively makes them "requests." 2807 2808Packet Requests 2809=============== 2810 2811When an incoming packet requires a non-connection response, it is added 2812to the Packet Request Queue. There is a single ``struct pr_queue`` per 2813engine -- it is instantiated if the engine is in the server mode. 2814 2815The packet request is recorded in ``struct packet_req``, which are kept 2816inside a hash in the PR Queue. The reason for keeping the requests in 2817a hash is to minimize duplicate responses: If a client hello message 2818is spread over several incoming packets, only one response carrying the 2819version negotiation packet (for example) will be sent. 2820 2821:: 2822 2823 struct packet_req 2824 { 2825 struct lsquic_hash_elem pr_hash_el; 2826 lsquic_cid_t pr_scid; 2827 lsquic_cid_t pr_dcid; 2828 enum packet_req_type pr_type; 2829 enum pr_flags { 2830 PR_GQUIC = 1 << 0, 2831 } pr_flags; 2832 enum lsquic_version pr_version; 2833 unsigned pr_rst_sz; 2834 struct network_path pr_path; 2835 }; 2836 2837Responses are created on demand. Until that time, everything that is 2838necessary to generate the response is stored in ``packet_req``. 2839 2840Sending Responses 2841================= 2842 2843To make these packets fit into the usual packet-sending loop, 2844each response is made to resemble a packet 2845sent by a connecteion. For that, the PR Queue creates a connection 2846object that only lives for the duration of batching of the packet. 2847(Hence the connection's name: *evanescent* connection.) This connection 2848is returned by the ``lsquic_prq_next_conn()`` by the connection iterator 2849during the `batching process <#batching-packets>`__ 2850 2851For simplicity, the response packet is generated in this function as well. 2852The call to ``ci_next_packet_to_send()`` only returns the pointer to it. 2853 2854Send Controller 2855*************** 2856 2857*Files: lsquic_send_ctl.h, lsquic_send_ctl.c* 2858 2859.. _overview-4: 2860 2861Overview 2862======== 2863 2864The Send Controller manages outgoing packets and the sending rate: 2865 2866- It decides whether packets can be sent 2867 2868- It figures out what the congestion window is 2869 2870- It processes acknowledgements and detects packet losses 2871 2872- It allocates packets 2873 2874- It maintains sent packet history 2875 2876The controller allocates, manages, splits, coalesces, and destroys 2877outgoing packets. It owns these packets. 2878 2879The send controller services two modules: 2880 2881- Full connection. gQUIC and IETF full connections use the send 2882 controller to allocate packets and delegate packet-sending 2883 decisions to it. 2884 2885- Stream. The stream uses the stream controller as the source of 2886 outgoing packets to write STREAM frames to. 2887 2888Packet Life Cycle 2889================= 2890 2891A new outgoing packet is allocated and returned to the connection or the 2892stream. Around this time (just before or just after, depending on the 2893particular function call to get the packet), the packet is placed on the 2894Scheduled Queue. 2895 2896When the engine is creating a batch of packets to send, it calls 2897``ci_next_packet_to_send()``. The connection removes the next packet from 2898its Scheduled Queue and returns it. The engine now owns the outgoing 2899packet, but only while the batch is being sent. The engine *always 2900returns the packet* after it tries to send it. 2901 2902If the packet was sent successfully, it is returned via the 2903``ci_packet_sent`` call, after which it is appended to the Unacked Queue. 2904If the packet could not be sent, ``ci_packet_not_sent()`` is called, at 2905which point it is prepended back to the Schedule Queue to be tried 2906later. 2907 2908There are two ways to get off the Unacked Queue: being acknowledged or 2909being lost. When a packet is acknowledged, it is destroyed. On the other 2910hand, if it is deemed lost, it is placed onto the Lost Queue, where it 2911will await being rescheduled. 2912 2913Packet Queues 2914============= 2915 2916.. image:: lsquic-packet-queues.png 2917 2918Buffered Queue 2919-------------- 2920 2921The Buffered Queue is a special case. When writing to the stream occurs 2922outside of the event dispatch loop, new outgoing packets begin their 2923life in the Buffered Queue. They get scheduled during a connection tick, 2924making their way onto the Scheduled Queue. 2925 2926There are two buffered queues: one for packets holding STREAM frames 2927from the highest-priority streams and one for packets for streams with 2928lower priority. 2929 2930Scheduled Queue 2931--------------- 2932 2933Packets on the Scheduled Queue have packet numbers assigned to them. In 2934rare cases, packets may be removed from this queue before being sent 2935out. (For example, a stream may be cancelled, in which case packets that 2936carry its STREAM frames may end up empty.) In that case, they are marked 2937with a special flag to generate the packet number just before they are 2938sent. 2939 2940Unacked Queue 2941------------- 2942 2943This queue holds packets that have been sent but are yet to be 2944acknowledged. When a packet on this queue is acknowledged, it is 2945destroyed. 2946 2947The loss detection code runs on this queue when ACKs are received or 2948when the retransmission timer expires. 2949 2950This queue is actually three queues: one for each of the IETF QUIC's 2951Packet Number Spaces, or PNSs. The PNS_APP queue is what is used by 2952gQUIC and IETF QUIC server code. PNS_INIT and PNS_HSK are only used by 2953the IETF QUIC client. (IETF QUIC server handles those packet number 2954spaces in its mini conn module.) 2955 2956In addition to regular packets, the Unacked Queue holds `loss 2957records <#loss-records>`__ and `poisoned packets <#poisoned-packets>`__. 2958 2959Lost Queue 2960---------- 2961 2962This queue holds lost packets. These packets are removed from the 2963Unacked Queue when it is decided that they have been lost. Packets on 2964this queue get rescheduled after connection schedules a packet with 2965control frames, as those have higher priority. 2966 29670-RTT Stash Queue 2968----------------- 2969 2970This queue is used by the client to retransmit packets that carry 0-RTT 2971data. 2972 2973Handling ACKs 2974============= 2975 2976Acknowledgements are processed in the function 2977``lsquic_send_ctl_got_ack``. 2978 2979One of the first things that is done is ACK validation. We confirm that 2980the ACK does not contain any packet numbers that we did not send. 2981Because of the way we `generate packet numbers <#packet-numbers>`__, 2982this check is a simple comparison. 2983 2984The nested loops work as follows. The outer loop iterates over the 2985packets in the Unacked Queue in order -- meaning packet numbers 2986increase. In other words, older packets are examined first. The inner 2987loop processes ACK ranges in the ACK *backwards*, meaning that both 2988loops follow packets in increasing packet number order. It is done this 2989way as an optimization. The (previous) alternative design of looking 2990up a packet number in the ACK frame, even if using binary search, is 2991slower. 2992 2993The code is optimized: the inner loop has a minimum possible number of 2994branches. These optimizations predate the more-recent, higher-level 2995optimization. The latest ACK-handling optimization added to the code 2996combines incoming ACKs into a single ACK (at the connection level), thus 2997reducing the number of times this loop needs to be used by a lot, 2998sometimes by a significant factor (when lots of data is being sent). 2999This makes some of the code-level optimizations, such as the use of 3000``__builtin_prefetch``, an overkill. 3001 3002Loss Records 3003============ 3004 3005A loss record is a special type of outgoing packet. It marks a place in 3006the Unacked Queue where a lost packet had been -- the lost packet itself 3007having since moved on to the Lost Queue or further. The loss record and 3008the lost packet form a circular linked list called the "loss chain." 3009This list contains one real packet and zero or more loss records. The 3010real packet can move from the Unacked Queue to the Lost Queue to the 3011Scheduled Queue and back to the Unacked Queue; its loss records live 3012only on the Unacked Queue. 3013 3014We need loss records to be able to handle late acknowledgements -- those 3015that acknowledge a packet *after* it has been deemed lost. When an 3016acknowledgment for any of the packet numbers associated with this packet 3017comes in, the packet is acknowledged and the whole loss chain is 3018destroyed. 3019 3020Poisoned Packets 3021================ 3022 3023A poisoned packet is used to thwart opportunistic ACK attacks. The 3024opportunistic ACK attack works as follows: 3025 3026- The client requests a large resource 3027 3028- The server begins sending the response 3029 3030- The client sends ACKs for packet number before it sees these packets, 3031 tricking the server into sending packets faster than it would 3032 otherwise 3033 3034The poisoned packet is placed onto the Unacked Queue. If the peer lies 3035about packet numbers it received, it will acknowledge the poisoned 3036packet, in which case it will be discovered during ACK processing. 3037 3038Poisoned packets cycle in and out of the Unacked Queue. A maximum of one 3039poisoned packet is outstanding at any one time for simplicity. (And we 3040don't need more). 3041 3042Packet Numbers 3043============== 3044 3045The Send Controller aims to send out packets without any gaps in the 3046packet number sequence. (The only exception to this rule is the handling 3047of poisoned packets, where the gap is what we want.) Not having gaps in 3048the packet number sequence is beneficial: 3049 3050- ACK verification is cheap 3051 3052- Send history updates are fast 3053 3054- Send history uses very little memory 3055 3056The downside is code complexity and having to renumber packets when they 3057are removed from the Scheduled Queue (due to, for example, STREAM frame 3058elision or loss chain destruction) or resized (due to a path or MTU 3059change, for instance). 3060 3061Some scenarios when gaps can be produced inadvertently are difficult to 3062test or foresee. To cope with that, a special warning in the send 3063history code is added when the next packet produces a gap. This warning 3064is limited to once per connection. Having a gap does not break 3065functionality other than ACK verification, but that's minor. On the 3066other hand, we want to fix such bugs when they crop up -- that's why the 3067warning is there. 3068 3069Loss Detection and Retransmission 3070================================= 3071 3072The loss detection and retransmission logic in the Send Controller was 3073taken from the Chromium code in the fall of 2016, in the beginning of 3074the lsquic project. This logic has not changed much since then -- only 3075some bugs have been fixed here and there. The code bears no resemblance 3076to what is described in the QUIC Recovery Internet Draft. Instead, `the 3077much earlier 3078document <https://tools.ietf.org/html/draft-iyengar-quic-loss-recovery-01>`__, 3079describing gQUIC, could be looked to for reference. 3080 3081Congestions Controllers 3082======================= 3083 3084The Send Controller has a choice of two congestion controllers: Cubic 3085and BBRv1. The latter was translated from Chromium into C. BBRv1 does 3086not work well for very small RTTs. 3087 3088To cope with that, lsquic puts the Send Controller into the "adaptive CC" 3089mode by default. The CC is selected after RTT is determined: below a 3090certain threshold (configurable; 1.5 ms by default), Cubic is used. 3091Until Cubic or BBRv1 is selected, *both* CC controllers are used -- 3092because we won't have the necessary state to instantiate a controller 3093when the decision is made. 3094 3095Buffered Packet Handling 3096======================== 3097 3098Buffered packets require quite a bit of special handling. Because they 3099are created outside of the regular event dispatch, a lot of things are 3100unknown: 3101 3102- Congestion window 3103 3104- Whether more incoming packets will arrive before the next tick 3105 3106- The optimal packet number size 3107 3108The Send Controller tries its best to accommodate the buffered packets 3109usage scenario. 3110 3111ACKs 3112---- 3113 3114When buffered packets are created, we want to generate an ACK, if 3115possible. This can be seen in ``send_ctl_get_buffered_packet``, which 3116calls ``ci_write_ack()`` 3117 3118This ACK should be in the first buffered packet to be scheduled. Because 3119the Send Controller does not dictate the order of buffered packet 3120creation -- high-priority versus low-priority -- it may need to move (or 3121steal) the ACK frame from a packet on the low-priority queue to a packet 3122on the high-priority queue. 3123 3124When buffered packets are finally scheduled, we have to remove ACKs from 3125them if another ACK has already been sent. This is because Chrome errors 3126out if out-of-order ACKs come in. 3127 3128Flushing QPACK Decoder 3129---------------------- 3130 3131The priority-based write events dispatch is emulated when the first 3132buffered packet is allocated: the QPACK decoder is flushed. Without it, 3133QPACK updates are delayed, which may negatively affect compression 3134ratio. 3135 3136Snapshot and Rollback 3137===================== 3138 3139The Send Controller snapshot and rollback functionality was implemented 3140exclusively for the benefit of the optimized ``lsquic_stream_pwritev`` 3141call. 3142 3143Complexity Woes 3144=============== 3145 3146The Send Controller is complicated. Because we write stream data to 3147packets directly and packets need to be resized, a lot of complexity 3148resides in the code to resize packets, be it due to repathing, STREAM 3149frame elision, or MTU changes. This is the price to be paid for 3150efficiency in the normal case. 3151 3152 3153Alarm Set 3154********* 3155 3156*Files: lsquic_alarmset.h, lsquic_alarmset.c, test_alarmset.c* 3157 3158The alarm set, ``struct lsquic_alarmset``, is an array of callbacks and 3159expiry times. To speed up operations, setting and unsetting alarms is 3160done via macros. 3161 3162The functions to ring [4]_ the alarms and to calculate the next alarm 3163time use a loop. It would be possible to maintain a different data 3164structure, such as a min-heap, to keep the alarm, and that would obviate 3165the need to loop in ``lsquic_alarmset_mintime()``. It is not worth it: 3166the function is not called often and a speed win here would be offset 3167by the necessity to maintain the min-heap ordering. 3168 3169Tickable Queue 3170************** 3171 3172*Files: lsquic_engine.c, lsquic_min_heap.h, lsquic_min_heap.c* 3173 3174The Tickable Queue is a min-heap used as a priority queue. Connections 3175on this queue are in line to be processed. Connections that were last 3176ticked a longer time ago have higher priority than those ticked 3177recently. (``cn_last_ticked`` is used for ordering.) This is intended to 3178prevent starvation as multiple connections vye for the ability to send 3179packets. 3180 3181The way the min-heap grows is described in `Growing 3182Min-Heaps <#growing-min-heaps>`__. 3183 3184Advisory Tick Time Queue 3185************************ 3186 3187*Files: lsquic_attq.h, lsquic_attq.c* 3188 3189This data structure is a mini-heap. Connections are ordered by the value 3190of the next time they should be processed (ticked). (Because this is not 3191a hard limit, this value is advisory -- hence its name.) 3192 3193This particular min-heap implementation has two properties worth 3194highlighting: 3195 3196Removal of Arbitrary Elements 3197============================= 3198 3199When a connection's next tick time is updated (or when the connection is 3200destroyed), the connection is removed from the ATTQ. At that time, it 3201may be at any position in the min-heap. The position is recorded in the 3202heap element, ``attq_elem->ae_heap_idx`` and is updated when elements are 3203swapped. This makes it unnecessary to search for the entry in the 3204min-heap. 3205 3206Swapping Speed 3207============== 3208 3209To make swapping faster, the array that underlies the min-heap is an 3210array of *pointers* to ``attq_elem``. This makes it unnecessary to update 3211each connection's ``cn_attq_elem`` as array elements are swapped: the 3212memory that stores ``attq_elem`` stays put. This is why there are both 3213``aq_elem_malo`` and ``aq_heap``. 3214 3215CID Purgatory 3216************* 3217 3218*Files: lsquic_purga.h, lsquic_purga.c* 3219 3220Overview 3221======== 3222 3223This module keeps a set of CIDs that should be ignored for a period 3224of time. It is used when a connection is closed: this way, late 3225packets will not create a new connection. 3226 3227A connection may have been deleted, retired, or closed. In the latter 3228case, it enters the `Draining State <https://tools.ietf.org/html/draft-ietf-quic-transport-34#section-10.2.2>`__. 3229In this state, the connection is to ignore incoming packets. 3230 3231Structure 3232========= 3233 3234The purgatory keeps a list of 16-KB pages. A page looks like this: 3235 3236:: 3237 3238 #define PURGA_ELS_PER_PAGE 273 3239 3240 struct purga_page 3241 { 3242 TAILQ_ENTRY(purga_page) pupa_next; 3243 lsquic_time_t pupa_last; 3244 unsigned pupa_count; 3245 bloom_mask_el_t pupa_mask[BLOOM_N_MASK_ELS]; 3246 lsquic_cid_t pupa_cids[PURGA_ELS_PER_PAGE]; 3247 void * pupa_peer_ctx[PURGA_ELS_PER_PAGE]; 3248 struct purga_el pupa_els[PURGA_ELS_PER_PAGE]; 3249 }; 3250 3251The reason for having CIDs and peer contexts in separate arrays is to be 3252able to call the ``ea_old_scids()`` callback when a page expires. A page 3253is expired when it is full and the last added element is more than 3254``pur_min_life`` microseconds ago. The minimum CID life is hardcoded as 325530 seconds in lsquic_engine.c (see the ``lsquic_purga_new()`` call). 3256 3257To avoid scannig the whole array of CIDs in ``lsquic_purga_contains()``, 3258we use a Bloom filter. 3259 3260The Bloom filter is constructed using a 8192-bit bit field and 6 hash 3261functions. With 273 elements per page, this gives us 0.004% possibility 3262of a false positive. In other words, when we do have to search a page 3263for a particular CID, the chance of finding the CID is 99.99%. 3264 3265Quick calculation: 3266 3267.. code-block:: text 3268 3269 perl -E '$k=6;$m=1<<13;$n=273;printf("%f\n", (1-exp(1)**-($k*$n/$m))**$k)' 3270 3271To extract 6 13-bit values from a 64-bit integer, they are overlapped: 3272 3273.. code-block:: text 3274 3275 0 10 20 30 40 50 60 3276 +----------------------------------------------------------------+ 3277 | | 3278 +----------------------------------------------------------------+ 3279 1111111111111 3280 2222222222222 3281 3333333333333 3282 4444444444444 3283 5555555555555 3284 6666666666666 3285 3286This is not 100% kosher, but having 6 functions gives a better guarantee 3287and it happens to work in practice. 3288 3289Memory Manager 3290************** 3291 3292*Files: lsquic_mm.h, lsquic_mm.c* 3293 3294The memory manager allocates several types of objects that are used by 3295different parts of the library: 3296 3297- Incoming packet objects and associated buffers 3298 3299- Outgoing packet objects and associated buffers 3300 3301- Stream frames 3302 3303- Frame records 3304 3305- Mini connections, both Google and IETF QUIC 3306 3307- DCID elements 3308 3309- HTTP/3 (a.k.a. "HQ") frames 3310 3311- Four- and sixteen-kilobyte pages 3312 3313These objects are either stored on linked list or in `malo <#malo-allocator>`__ 3314pools and are shared among all connections. (Full connections allocate outgoing 3315packets from per-connection malo allocators: this is done to speed up `ACK 3316processing <#handling-acks>`__.) 3317 3318The list of cached outgoing packet buffers is shrunk once in a while (see 3319the "poolst\_*" functions). Other object types are kept in the cache 3320until the engine is destroyed. One Memory Manager object is allocated per 3321engine instance. 3322 3323Malo Allocator 3324************** 3325 3326*Files: lsquic_malo.h, lsquic_malo.c* 3327 3328Overview 3329======== 3330 3331The malo allocator is a pool of objects of fixed size. It tries to 3332allocate and deallocate objects as fast as possible. To do so, it 3333does the following: 3334 33351. Allocations occur 4 KB at a time. 3336 33372. No division or multiplication operations are performed for 3338 appropriately sized objects. (More on this below.) 3339 3340(In recent testing, malo was about 2.7 times faster than malloc for 334164-byte objects.) 3342 3343Besides speed, the allocator provides a convenient API: 3344To free (put) an object, one does not need a pointer to the malo 3345object. 3346 3347To gain all these advantages, there are trade-offs: 3348 33491. There are two memory penalties: 3350 3351 a. Per object overhead. If an object is at least ROUNDUP_THRESH in 3352 size as the next power of two, the allocator uses that power of 3353 two value as the object size. This is done to avoid using 3354 division and multiplication. For example, a 104-byte object 3355 will have a 24-byte overhead. 3356 3357 b. Per page overhead. Page links occupy some bytes in the 3358 page. To keep things fast, at least one slot per page is 3359 always occupied, independent of object size. Thus, for a 3360 1 KB object size, 25% of the page is used for the page 3361 header. 3362 33632. 4 KB pages are not freed until the malo allocator is destroyed. 3364 This is something to keep in mind. 3365 3366Internal Structure 3367================== 3368 3369The malo allocator allocates objects out of 4 KB pages. Each page is 3370aligned on a 4-KB memory boundary. This makes it possible for the 3371``lsquic_malo_put()`` function only to take on argument -- the object 3372to free -- and to find the malo allocator object itself. 3373 3374Each page begins with a header followed by a number of slots -- up to 3375the 4-KB limit. Two lists of pages are maintained: all pages and free 3376pages. A "free" page is a page with at least one free slot in it. 3377 3378The malo allocator (``struct malo``) stores itself in the first page, 3379occupying some slots. 3380 3381Receive History 3382*************** 3383 3384*Files: lsquic_rechist.h, lsquic_rechist.c, test_rechist.c* 3385 3386.. _overview-5: 3387 3388Overview 3389======== 3390 3391The reason for keeping the history of received packets is to generate 3392ACK frames. The Receive History module provides functionality to add 3393packet numbers, truncate history, and iterate over the received packet 3394number ranges. 3395 3396.. _data-structures-3: 3397 3398Data Structures 3399=============== 3400 3401.. _overview-6: 3402 3403Overview 3404-------- 3405 3406The receive history is a singly-linked list of packet number ranges, 3407ordered from high to low: 3408 3409.. image:: rechist-linked-list.png 3410 3411The ordering is maintained as an invariant with each addition to the 3412list and each truncation. This makes it trivial to iterate over the 3413ranges. 3414 3415To limit the amount of memory this data structure can allocate, the 3416maximum number of elements is specified when Receive History is 3417initialized. In the unlikely case that that number is reached, new 3418elements will push out the elements at the tail of the linked list. 3419 3420Memory Layout 3421------------- 3422 3423In memory, the linked list elements are stored in an array. Placing them 3424into contiguous memory achieves three goals: 3425 3426- Receive history manipulation is fast because the elements are all 3427 close together. 3428 3429- Memory usage is reduced because each element does not use pointers to 3430 other memory locations. 3431 3432- Memory fragmentation is reduced. 3433 3434The array grows as necessary as the number of elements increases. 3435 3436The elements are allocated from and returned to the array with the aid 3437of an auxiliary data structure. An array of bitmasks is kept where each 3438bit corresponds to an array element. A set bit means that the element is 3439allocated; a cleared bit indicates that the corresponding element is 3440free. 3441 3442To take memory savings and speed further, the element array and the 3443array of bitmasks are allocated in a single span of memory. 3444 3445.. image:: rechist-memory-layout.png 3446 3447rechist_elem 3448------------ 3449 3450``re_low`` and ``re_count`` define the packet range. To save memory, we 3451assume that the range will not contain more than 4 billion entries and 3452use a four-byte integer instead of a second ``lsquic_packno_t``. 3453 3454``re_next`` is the index of the next element. Again, we assume that there 3455will be no more than 4 billion elements. The NULL pointer is represented 3456by ``UINT_MAX``. 3457 3458This struct is just 16 bytes in size, which is a nice number. 3459 3460lsquic_rechist 3461-------------- 3462 3463``rh_elems`` and ``rh_masks`` are the element array and the bitmask array, 3464respectively, as described above. The two use the same memory chunk. 3465 3466``rh_head`` is the index of the first element of the linked list. 3467 3468The iterator state, ``rh_iter``, is embedded into the main object itself, 3469as there is no expectation that more than one iterations will need to be 3470active at once. 3471 3472.. _notable-code-5: 3473 3474Notable Code 3475============ 3476 3477Inserting Elements 3478------------------ 3479 3480Elements may be inserted into the list when a new packet number is added 3481to history via ``lsquic_rechist_received()``. If the new packet number 3482requires a new range (e.g. it does not expand one of the existing 3483ranges), a new element is allocated and linked. 3484 3485There are four cases to consider: 3486 34871. Inserting the new element at the head of the list, with it becoming 3488 the new head. (This is the most common scenario.) The code that 3489 does it is labeled ``first_elem``. 3490 34912. Appending the new element to the list, with it becoming the new tail. 3492 This code is located right after the ``while`` loop. 3493 34943. Inserting the new element between two existing elements. This code is 3495 labeled ``insert_before``. 3496 34974. Like (3), but when the insertion is between the last and the 3498 next-to-last elements and the maximum number of elements has been 3499 reached. In this case, the last element's packet number 3500 information can simply be replaced. This code is labeled 3501 ``replace_last_el``. 3502 3503Growing the Array 3504----------------- 3505 3506When all allocated elements in ``rh_elems`` are in use 3507(``rh_n_used >= rh_n_alloced``), the element array needs to be expanded. 3508This is handled by the function ``rechist_grow``. 3509 3510Note how, after realloc, the bitmask array is moved to its new location 3511on the right side of the array. 3512 3513Handling Element Overflow 3514------------------------- 3515 3516When the array has grown to its maximum allowed size, allocating a new 3517element occurs via reusing the last element on the list, effectively 3518pushing it out. This happens in ``rechist_reuse_last_elem``. 3519 3520The first loop finds the last linked list element: that's the element 3521whose ``re_next`` is equal to ``UINT_MAX``. 3522 3523Then, the second loop finds the element that points to the last element. 3524This is the next-to-last (penultimate) element. This element's next 3525pointer will now be set to NULL, effectively dropping the last element, 3526which can now be reused. 3527 3528Iterating Over Ranges 3529--------------------- 3530 3531Iteration is performed by the ``lsquic_rechist_first`` and 3532``lsquic_rechist_next`` pair of functions. The former resets the internal 3533iterator. Only one iteration at a time is possible. 3534 3535These functions have a specific signature: they and the pointer to the 3536receive history are passed to the ``pf_gen_ack_frame`` function, which 3537generates an ACK frame. 3538 3539Clone Functionality 3540------------------- 3541 3542The Receive History can be initialized from another instance of a 3543receive history. This is done by ``lsquic_rechist_copy_ranges``. This 3544functionality is used during connection promotion, when `Tiny Receive 3545History <#tiny-receive-history>`__ that is used by the `IETF mini 3546connection <#mini-ietf-connection>`__ is converted to Receive History. 3547 3548Tiny Receive History 3549******************** 3550 3551*Files: lsquic_trechist.h, lsquic_trechist.c, test_trechist.c* 3552 3553.. _overview-7: 3554 3555Overview 3556======== 3557 3558The Tiny Receive History is similar to `Receive 3559History <#receive-history>`__, but it is even more frugal with memory. 3560It is used in the `IETF mini connection <#mini-ietf-connection>`__ as a 3561more costly `alternative to using bare bitmasks <#imc-recvd-packnos>`__. 3562 3563Because it is so similar to Receive History, only differences are 3564covered in this section. 3565 3566Less Memory 3567=========== 3568 3569No Trechist Type 3570---------------- 3571 3572There is no ``lsquic_trechist``. The history is just a single 32-bit 3573bitmask and a pointer to the array of elements. The bitmask and the 3574pointer are passed to all ``lsquic_trechist_*`` functions. 3575 3576This gives the user of Tiny Receive History some flexibility and saves 3577memory. 3578 3579Element 3580------- 3581 3582The linked list element, ``trechist_elem``, is just 6 bytes in size. The 3583assumptions are: 3584 3585- No packet number is larger than 2\ :sup:`32` - 1 3586 3587- No packet range contains more than 255 packets 3588 3589- Linked list is limited to 256 elements 3590 3591Head Does Not Move 3592================== 3593 3594Because of memory optimizations described above, the head element is 3595always at index 0. The NULL pointer ``te_next`` is indicated by the value 35960 (because nothing points to the first element). 3597 3598Array Does Not Grow 3599=================== 3600 3601The size of the element array is limited by the 32-bit bitmask. As a 3602further optimization, the number of ranges is limited to 16 via the 3603``TRECHIST_MAX_RANGES`` macro. 3604 3605Insertion Range Check 3606===================== 3607 3608A packet range spanning more than 255 (UCHAR_MAX) packets cannot be 3609represented. This will cause a failure, as it is checked for in the 3610code. 3611 3612This many packets are unlikely to even be required to complete the 3613handshake. If this limit is hit, it is perhaps good to abort the mini 3614connection. 3615 3616Set64 3617***** 3618 3619*Files: lsquic_set.h, lsquic_set.h, test_set.c* 3620 3621This data structure (along with *Set32*, which is not currently used 3622anywhere in the code) is meant to keep track of a set of numbers that 3623are always increasing and are not expected to contain many gaps. 3624Stream IDs fit that description, and ``lsquic_set64`` is used in both 3625gQUIC and IETF QUIC full connections. 3626 3627Because one or two low bits in stream IDs contain stream type, the 3628stream IDs of different types are stored in different set structures; 3629otherwise, there would be gaps. For example, see the 3630``conn_is_stream_closed()`` functions (there is one in each gQUIC and 3631IETF QUIC full connection code). 3632 3633Appendix A: List of Data Structures 3634*********************************** 3635 3636The majority of data structures employed by lsquic are linked lists and, 3637to a lesser extent, arrays. This makes the code simple and fast 3638(assuming a smart memory layout). 3639 3640Nevertheless, a few places in the code called for more involved and, at 3641times, customized data structures. This appendix catalogues them. 3642 3643This is the list of non-trivial data structures implemented in lsquic: 3644 3645Ring Buffer Linked Lists 3646======================== 3647 3648- `Receive History <#receive-history>`__ 3649 3650- `Tiny Receive History <#tiny-receive-history>`__ 3651 3652Hashes 3653====== 3654 3655- lsquic_hash 3656 3657- hash_data_in 3658 3659Min-heaps 3660========= 3661 3662- `Advisory Tick Time Queue <#advisory-tick-time-queue>`__ 3663 3664- lsquic_min_heap 3665 3666Bloom Filters 3667============= 3668 3669- CID Purgatory 3670 3671 3672Appendix B: Obsolete and Defunct Code 3673************************************* 3674 3675Mem Used 3676======== 3677 3678Engine History 3679============== 3680 3681QLOG 3682==== 3683 3684 3685.. [1] 3686 This is due to the limitation of the QPACK library: the decoder can 3687 read input one byte at a time, but the encoder cannot write output 3688 one byte at a time. It could be made to do that, but the effort is 3689 not worth it. 3690 3691.. [2] 3692 Mini conn structs are allocated out of the `Malo 3693 Allocator <#malo-allocator>`__, which used to be limited to objects 3694 whose size is a power of two, so it was either fitting it into 128 3695 bytes or effectively doubling the mini conn size. 3696 3697.. [3] 3698 This two-step packet parsing mechanism is left over from the 3699 little-endian to big-endian switch in gQUIC several years ago: 3700 Before parsing out the packet number, it was necessary to know 3701 whether it is little- or big-endian. It should be possible to 3702 do away with this, especially once gQUIC is gone. 3703 3704.. [4] 3705 This term was picked consciously: alarms *ring*, while timers do 3706 other things, such as "fire" and so on. 3707