diff --git a/man2/seccomp.2 b/man2/seccomp.2
index b596fb87e..266c7216d 100644
--- a/man2/seccomp.2
+++ b/man2/seccomp.2
@@ -251,7 +251,7 @@ struct seccomp_data {
 .in
 
 Because the numbers of system calls vary between architectures and
-some architectures (e.g. X86-64) allow user-space code to use
+some architectures (e.g., X86-64) allow user-space code to use
 the calling conventions of multiple architectures, it is usually
 necessary to verify the value of the
 .IR arch
@@ -260,19 +260,20 @@ field.
 It is strongly recommended to use a whitelisting approach whenever
 possible because such an approach is more robust and
 simple. A blacklist will have to be updated whenever a potentially
-dangerous syscall is added (or a dangerous flag or option if those
+dangerous system call is added (or a dangerous flag or option if those
 are blacklisted), and it is often possible to alter the
 representation of a value without altering its meaning, leading to
 a blacklist bypass.
 
 The
 .IR arch
-field is not unique for all calling conventions. The X86-64 ABI and
-the X32 ABI both use
+field is not unique for all calling conventions.
+The X86-64 ABI and the X32 ABI both use
 .BR AUDIT_ARCH_X86_64
 as
 .IR arch ,
-and they run on the same processors. Instead, the mask
+and they run on the same processors.
+Instead, the mask
 .BR __X32_SYSCALL_BIT
 is used on the system call number to tell the two ABIs apart.
 This means that in order to create a seccomp-based
@@ -281,7 +282,7 @@ it is necessary to not only check that
 .IR arch
 equals
 .BR AUDIT_ARCH_X86_64 ,
-but also to explicitly reject all syscalls that contain
+but also to explicitly reject all system calls that contain
 .BR __X32_SYSCALL_BIT
 in
 .IR nr .
@@ -289,15 +290,16 @@ in
 When checking values from
 .IR args
 against a blacklist, keep in mind that arguments are often
-silently truncated before being processed, but after the seccomp
-check. For example, this happens if the i386 ABI is used on an
+silently truncated before being processed, but after the seccomp check.
+For example, this happens if the i386 ABI is used on an
 X86-64 kernel: Although the kernel will normally not look beyond
 the 32 lowest bits of the arguments, the values of the full
-64-bit registers will be present in the seccomp data. A less
-surprising example is that if the X86-64 ABI is used to perform
-a syscall that takes an argument of type int, the
-more-significant half of the argument register is ignored by
-the syscall, but visible in the seccomp data.
+64-bit registers will be present in the seccomp data.
+A less surprising example is that if the X86-64 ABI is used to perform
+a system call that takes an argument of type
+.IR int ,
+the more-significant half of the argument register is ignored by
+the system call, but visible in the seccomp data.
 
 A seccomp filter returns a 32-bit value consisting of two parts:
 the most significant 16 bits
@@ -691,7 +693,7 @@ install_filter(int syscall_nr, int t_arch, int f_errno)
                  (offsetof(struct seccomp_data, nr))),
 
         /* [3] Check ABI - only needed for X86-64 in blacklist usecases.
-               Use JGT instead of checking against the bitmask to avoid
+               Use JGT instead of checking against the bit mask to avoid
                having to reload the syscall number. */
         BPF_JUMP(BPF_JMP | BPF_JGT | BPF_K, upper_nr_limit, 3, 0),
 
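The three checks the patched text describes (verifying arch, rejecting X32 system call numbers that share AUDIT_ARCH_X86_64, and the argument-truncation pitfall) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel or BPF code. It evaluates the same decisions a classic BPF filter would make on a hypothetical model_seccomp_data struct that mirrors the relevant fields of struct seccomp_data. The MODEL_* names and all functions are hypothetical; only the two constant values are copied from the kernel's AUDIT_ARCH_X86_64 and __X32_SYSCALL_BIT.

```c
#include <stdint.h>

/* Hypothetical MODEL_* constants. Their values are those of the kernel's
   AUDIT_ARCH_X86_64 (<linux/audit.h>) and __X32_SYSCALL_BIT (<asm/unistd.h>),
   hard-coded so that this sketch also builds on non-Linux hosts. */
#define MODEL_AUDIT_ARCH_X86_64 0xC000003EU
#define MODEL_X32_SYSCALL_BIT   0x40000000U

/* Hypothetical mirror of the struct seccomp_data fields used below. */
struct model_seccomp_data {
    int      nr;       /* system call number; X32 calls have bit 30 set */
    uint32_t arch;     /* AUDIT_ARCH_* value of the calling convention */
    uint64_t args[6];  /* full 64-bit argument registers */
};

enum model_verdict { MODEL_ALLOW, MODEL_ERRNO, MODEL_KILL };

/* Blacklist one system call number for the X86-64 ABI only. */
static enum model_verdict
model_blacklist(const struct model_seccomp_data *d, int blocked_nr)
{
    /* System call numbers are only meaningful for a known ABI. */
    if (d->arch != MODEL_AUDIT_ARCH_X86_64)
        return MODEL_KILL;

    /* X32 also reports AUDIT_ARCH_X86_64, so reject every nr that is
       (unsigned) greater than __X32_SYSCALL_BIT - 1; this mirrors the
       unsigned BPF_JGT comparison used in install_filter(). */
    if ((uint32_t)d->nr > MODEL_X32_SYSCALL_BIT - 1)
        return MODEL_KILL;

    return d->nr == blocked_nr ? MODEL_ERRNO : MODEL_ALLOW;
}

/* The value a system call taking an int argument actually uses: only the
   low 32 bits of the register (two's complement assumed, as on X86-64). */
static int model_int_arg(uint64_t reg)
{
    return (int)(uint32_t)reg;
}

/* Naive check: compares all 64 bits, so a caller can set the upper half
   to slip past it while the system call still sees the blocked value. */
static int naive_arg_matches(const struct model_seccomp_data *d, uint32_t value)
{
    return d->args[0] == value;
}

/* Robust check for an int argument: compare only the low 32 bits. */
static int low_half_matches(const struct model_seccomp_data *d, uint32_t value)
{
    return (uint32_t)d->args[0] == value;
}
```

With arch set to the X86-64 value, nr 62 passed as blocked_nr maps to MODEL_ERRNO, while the X32 encoding of the same number (62 | __X32_SYSCALL_BIT) is killed even though arch matches. With args[0] set to 0xFFFFFFFF00000001, model_int_arg() yields 1: low_half_matches(&d, 1) catches that call, but naive_arg_matches(&d, 1) misses it. A real filter would express the same steps as BPF_JUMP instructions, as install_filter() does.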