diff --git a/man7/packet.7 b/man7/packet.7
index 9f3f8b93c..e0993d208 100644
--- a/man7/packet.7
+++ b/man7/packet.7
@@ -177,17 +177,22 @@ and
.I sll_ifindex
are used.
.SS Socket options
+Packet socket options are configured by calling
+.BR setsockopt (2)
+with level
+.BR SOL_PACKET .
+.TP
+.BR PACKET_ADD_MEMBERSHIP
+.PD 0
+.TP
+.BR PACKET_DROP_MEMBERSHIP
+.PD
Packet sockets can be used to configure physical layer
multicasting and promiscuous mode.
-It works by calling
-.BR setsockopt (2)
-on a packet socket for
-.B SOL_PACKET
-and one of the options
.B PACKET_ADD_MEMBERSHIP
-to add a binding or
+adds a binding and
.B PACKET_DROP_MEMBERSHIP
-to drop it.
+drops it.
They both expect a
.B packet_mreq
structure as argument:
@@ -227,11 +232,207 @@ In addition the traditional ioctls
.BR SIOCADDMULTI ,
.B SIOCDELMULTI
can be used for the same purpose.
+.TP
+.BR PACKET_AUXDATA " (since Linux 2.6.21)"
+.\" commit 8dc4194474159660d7f37c495e3fc3f10d0db8cc
+If this binary option is enabled, the packet socket passes a metadata
+structure along with each packet in the
+.BR recvmsg (2)
+control field.
+The structure can be read with
+.BR cmsg (3).
+It is defined as

+.in +4n
+.nf
+struct tpacket_auxdata {
+ __u32 tp_status;
+ __u32 tp_len; /* packet length */
+ __u32 tp_snaplen; /* captured length */
+ __u16 tp_mac;
+ __u16 tp_net;
+ __u16 tp_vlan_tci;
+ __u16 tp_padding;
+};
+.fi
+.in
+.TP
+.BR PACKET_FANOUT " (since Linux 3.1)"
+.\" commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
+To scale processing across threads, packet sockets can form a fanout
+group.
+In this mode, each matching packet is enqueued onto only one
+socket in the group.
+A socket joins a fanout group by calling
+.BR setsockopt (2)
+with level
+.B SOL_PACKET
+and option
+.BR PACKET_FANOUT .
+Each network namespace can have up to 65536 independent groups.
+A socket selects a group by encoding the group ID in the low-order 16 bits
+of the integer option value.
+The first packet socket to join a group implicitly creates it.
+To successfully join an existing group, subsequent packet sockets
+must have the same protocol, device settings, and fanout mode and
+flags (see below).
+Packet sockets can leave a fanout group only by closing the socket.
+The group is deleted when the last socket is closed.

+Fanout supports multiple algorithms to spread traffic between sockets.
+The default mode,
+.BR PACKET_FANOUT_HASH ,
+sends packets from the same flow to the same socket to maintain
+per-flow ordering.
+For each packet, it chooses a socket by taking the packet flow hash
+modulo the number of sockets in the group, where a flow hash is a hash
+over network-layer address and optional transport-layer port fields.
+The load-balance mode
+.BR PACKET_FANOUT_LB
+implements a round-robin algorithm.
+.BR PACKET_FANOUT_CPU
+selects the socket based on the CPU that the packet arrived on.
+.BR PACKET_FANOUT_ROLLOVER
+processes all data on a single socket, moving to the next socket when
+the current one becomes backlogged.
+.BR PACKET_FANOUT_RND
+selects the socket using a pseudo-random number generator.

+Fanout modes can take additional options.
+IP fragmentation causes packets from the same flow to have different
+flow hashes.
+The flag
+.BR PACKET_FANOUT_FLAG_DEFRAG ,
+if set, causes packets to be defragmented before fanout is applied, to
+preserve order even in this case.
+Fanout mode and options are communicated in the high-order 16 bits of
+the integer option value.
+The flag
+.BR PACKET_FANOUT_FLAG_ROLLOVER
+enables the rollover mechanism as a backup strategy: if the
+original fanout algorithm selects a backlogged socket, the packet
+rolls over to the next available one.
+.TP
+.BR PACKET_LOSS " (with PACKET_TX_RING)"
+If set, do not silently drop a packet on transmission error, but
+return it with status set to
+.BR TP_STATUS_WRONG_FORMAT .
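The PACKET_FANOUT option value packs two 16-bit halves into a single int: the group ID and the mode plus flags. The C sketch below is illustrative only. The helper names fanout_arg() and join_hash_fanout() are hypothetical. It assumes that the PACKET_FANOUT_* constants come from <linux/if_packet.h> (Linux 3.1 or later headers) and that SOL_PACKET comes from the C library's <sys/socket.h>. Creating the packet socket itself requires CAP_NET_RAW and is not shown.

```c
#include <linux/if_packet.h>  /* PACKET_FANOUT, PACKET_FANOUT_* (assumed: Linux >= 3.1 headers) */
#include <stdint.h>
#include <sys/socket.h>       /* setsockopt(), SOL_PACKET */

/* Hypothetical helper: pack a fanout group ID (low-order 16 bits) and a
   mode plus flags (high-order 16 bits) into one PACKET_FANOUT value.
   Converting a value >= 2^31 to int is implementation-defined in C;
   GCC and Clang keep the bit pattern, which is what the kernel expects. */
static int fanout_arg(uint16_t group_id, uint16_t mode_and_flags)
{
    uint32_t v = (uint32_t)group_id | ((uint32_t)mode_and_flags << 16);
    return (int)v;
}

/* Hypothetical helper: join (or implicitly create) fanout group group_id
   on packet socket fd, using hash mode with IP defragmentation.
   Returns 0 on success, or -1 with errno set. */
static int join_hash_fanout(int fd, uint16_t group_id)
{
    int arg = fanout_arg(group_id,
                         PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG);
    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

A program would typically call join_hash_fanout() with the same group ID on each worker thread's socket. All of those sockets must use the same protocol and device settings, or the later joins fail.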
+.TP
+.BR PACKET_RESERVE " (with PACKET_RX_RING)"
+By default, a packet receive ring writes packets immediately following the
+metadata structure and alignment padding.
+This integer option reserves additional headroom.
+.TP
+.BR PACKET_RX_RING
+Create a memory-mapped ring buffer for asynchronous packet reception.
+The packet socket reserves a contiguous region of application address
+space, lays it out into an array of packet slots, and copies packets
+(up to
+.IR tp_snaplen )
+into subsequent slots.
+Each packet is preceded by a metadata structure similar to
+.IR tpacket_auxdata .
+The fields
+.I tp_mac
+and
+.I tp_net
+encode offsets to the packet data from the start of the metadata header.
+.I tp_net
+stores the offset to the network layer.
+If the packet socket is of type
+.BR SOCK_DGRAM ,
+then
+.I tp_mac
+is the same.
+If it is of type
+.BR SOCK_RAW ,
+then that field stores the offset to the link-layer frame.
+The packet socket and the application communicate the head and tail of
+the ring through the
+.I tp_status
+field.
+The packet socket owns all slots with status
+.BR TP_STATUS_KERNEL .
+After filling a slot, it changes the status of the slot to transfer
+ownership to the application.
+During normal operation, the new status is
+.BR TP_STATUS_USER ,
+to signal that a correctly received packet has been stored.
+When the application has finished processing a packet, it transfers
+ownership of the slot back to the socket by setting the status to
+.BR TP_STATUS_KERNEL .
+Packet sockets implement multiple variants of the packet ring.
+The implementation details are described in
+.IR Documentation/networking/packet_mmap.txt
+in the Linux kernel source tree.
+.TP
+.BR PACKET_STATISTICS
+Retrieve packet socket statistics, using
+.BR getsockopt (2),
+in the form of a structure

+.in +4n
+.nf
+struct tpacket_stats {
+ unsigned int tp_packets; /* total packet count */
+ unsigned int tp_drops; /* dropped packet count */
+};
+.fi
+.in

+Reading the statistics resets the internal counters.
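Below is a minimal sketch of reading these counters. It assumes a socket without a ring, or with a ring of variant TPACKET_V1 or TPACKET_V2, because TPACKET_V3 uses a different structure. The helper names read_packet_stats() and drop_ratio() are hypothetical. Each successful read resets the kernel counters, so a caller that wants running totals has to accumulate the values itself.

```c
#include <linux/if_packet.h>  /* struct tpacket_stats, PACKET_STATISTICS */
#include <sys/socket.h>       /* getsockopt(), SOL_PACKET, socklen_t */

/* Hypothetical helper: copy the counters of packet socket fd into *st.
   The kernel resets its counters as part of this call.
   Returns 0 on success, or -1 with errno set. */
static int read_packet_stats(int fd, struct tpacket_stats *st)
{
    socklen_t len = sizeof(*st);
    return getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, st, &len);
}

/* Hypothetical helper: fraction of packets dropped in the interval covered
   by *st, treating tp_packets as the total count as described above. */
static double drop_ratio(const struct tpacket_stats *st)
{
    if (st->tp_packets == 0)
        return 0.0;
    return (double)st->tp_drops / (double)st->tp_packets;
}
```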
+The statistics structure differs when using a ring of variant
+.BR TPACKET_V3 .
+.TP
+.BR PACKET_TIMESTAMP " (with PACKET_RX_RING)"
+.\" commit 614f60fa9d73a9e8fdff3df83381907fea7c5649
+The packet receive ring always stores a timestamp in the metadata header.
+By default, this is a software timestamp generated when the
+packet is copied into the ring.
+This integer option selects the type of timestamp.
+Besides the default, it supports the two hardware formats described in
+.IR Documentation/networking/timestamping.txt
+in the Linux kernel source tree.
+.TP
+.BR PACKET_TX_RING " (since Linux 2.6.31)"
+.\" commit 69e3c75f4d541a6eb151b3ef91f34033cb3ad6e1
+Create a memory-mapped ring buffer for packet transmission.
+This option is similar to
+.BR PACKET_RX_RING
+and takes the same arguments.
+The application writes packets into slots with status
+.BR TP_STATUS_AVAILABLE
+and schedules them for transmission by changing the status to
+.BR TP_STATUS_SEND_REQUEST .
+When packets are ready to be transmitted, the application calls
+.BR send (2)
+or a variant thereof.
+The
+.I buf
+and
+.I len
+arguments of this call are ignored.
+If an address is passed using
+.BR sendto (2)
+or
+.BR sendmsg (2) ,
+then that overrides the socket default.
+On successful transmission, the socket resets the slot to
+.BR TP_STATUS_AVAILABLE .
+It discards packets silently on error unless
+.BR PACKET_LOSS
+is set.
+.TP
+.BR PACKET_VERSION " (with PACKET_RX_RING)"
+.\" commit bbd6ef87c544d88c30e4b762b1b61ef267a7d279
+By default,
+.BR PACKET_RX_RING
+creates a packet receive ring of variant
+.BR TPACKET_V1 .
+To create another variant, set this integer option to the desired
+variant before creating the ring.

.SS Ioctls
.B SIOCGSTAMP
can be used to receive the timestamp of the last received packet.
Argument is a
-.I struct timeval.
+.I struct timeval
+variable.
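Below is a sketch of the SIOCGSTAMP call. It assumes the constant is taken from <linux/sockios.h>, where newer kernel headers define it. last_packet_time() is a hypothetical name. The ioctl is expected to fail if the socket has not received a packet yet, so callers should check the return value.

```c
#include <sys/ioctl.h>        /* ioctl() */
#include <sys/time.h>         /* struct timeval (needed before linux/sockios.h) */
#include <linux/sockios.h>    /* SIOCGSTAMP (assumed location in newer headers) */

/* Hypothetical helper: store the receive timestamp of the last packet
   read from socket fd in *tv.
   Returns 0 on success, or -1 with errno set. */
static int last_packet_time(int fd, struct timeval *tv)
{
    return ioctl(fd, SIOCGSTAMP, tv);
}
```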
.\" FIXME Document SIOCGSTAMPNS
In addition all standard ioctls defined in
@@ -318,7 +519,7 @@ header to get a fully conforming packet.
Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
fields; instead they are supplied to the user as protocol
.B ETH_P_802_2
-with the LLC header prepended.
+with the LLC header prefixed.
It is thus not possible to bind to
.BR ETH_P_802_3 ;
bind to