SDTs optimised into tail-calls panic #476

@FelixMcFelix

Noted while working on #462/#475 -- this is a tracking issue to understand this as its own problem. Today we convert each InnerFlowId into a flow_id_sdt_arg struct, which is moderately costly because it happens many times per packet. Each conversion creates one or two stack-local variables (the current flow, or the flows before and after), which are referenced without issue.

Removing this conversion and instead passing either a *const InnerFlowId or a uintptr_t (as we do with our other args) leads to panics in two known locations so far. From some dumps I've captured:
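For anyone following along, here is a minimal userspace sketch of the two argument-passing styles being compared. Everything in it is a stand-in: the field layout of `InnerFlowId` and `FlowIdSdtArg`, and the `fake_probe*` functions, are assumptions for illustration and not the real opte types or SDT machinery. It does not reproduce the panic; it only shows what "copy to a stack-local struct" versus "pass the pointer directly" means.

```rust
// Hypothetical stand-in: the real InnerFlowId has different fields.
#[derive(Clone, Copy, Debug, PartialEq)]
struct InnerFlowId {
    proto: u8,
    src_port: u16,
    dst_port: u16,
}

// Hypothetical stand-in for flow_id_sdt_arg: a C-layout copy of the flow.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct FlowIdSdtArg {
    proto: u8,
    src_port: u16,
    dst_port: u16,
}

impl From<&InnerFlowId> for FlowIdSdtArg {
    fn from(f: &InnerFlowId) -> Self {
        FlowIdSdtArg { proto: f.proto, src_port: f.src_port, dst_port: f.dst_port }
    }
}

// Stand-in for a probe that receives an untyped, pointer-sized argument
// and interprets it as a FlowIdSdtArg.
fn fake_probe(arg: usize) -> u16 {
    // SAFETY: in this sketch the caller always passes a live FlowIdSdtArg.
    let arg = unsafe { &*(arg as *const FlowIdSdtArg) };
    arg.src_port
}

// Stand-in for a probe that interprets its argument as an InnerFlowId.
fn fake_probe_direct(arg: usize) -> u16 {
    // SAFETY: in this sketch the caller always passes a live InnerFlowId.
    let flow = unsafe { &*(arg as *const InnerFlowId) };
    flow.src_port
}

// Style 1 (what happens today): convert into a stack-local struct, then
// hand the probe a pointer to that copy.
fn probe_with_copy(flow: &InnerFlowId) -> u16 {
    let arg = FlowIdSdtArg::from(flow);
    fake_probe(&arg as *const FlowIdSdtArg as usize)
}

// Style 2 (the cheaper variant): skip the copy and pass the original
// pointer as a pointer-sized integer.
fn probe_direct(flow: &InnerFlowId) -> u16 {
    fake_probe_direct(flow as *const InnerFlowId as usize)
}

fn main() {
    let flow = InnerFlowId { proto: 17, src_port: 38231, dst_port: 10000 };
    println!("with copy: {}", probe_with_copy(&flow));
    println!("direct:    {}", probe_direct(&flow));
}
```

In this sketch both styles read back the same port, as expected. The issue is that the in-kernel equivalent of style 2 panics at the callsites shown in the dumps.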

Periodic flow expiry
panic[cpu14]/thread=fffffe009270ac20:
BAD TRAP: type=e (#pf Page fault) rp=fffffe009270a090 addr=18 occurred in module "xde" due to a NULL pointer dereference


sched:
#pf Page fault
Bad kernel fault at addr=0x18
pid=0, pc=0xfffffffff44db7be, sp=0xfffffe009270a180, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 3406f8<smap,smep,osxsav,xmme,fxsr,pge,mce,pae,pse,de>
cr2: 18
cr3: 1a800000
cr8: 0

        rdi: fffffe69f79cacf0 rsi:                0 rdx:           110000
        rcx:                0  r8:                2  r9: fffffe009270a278
        rax:                0 rbx: fffffe69f79cacf0 rbp: fffffe009270a1d0
        r10:      8a2ba1a7901 r11:                6 r12:                4
        r13:                1 r14:                4 r15:                0
        fsb: fffffc7fef2d2a40 gsb: fffffe69deb02000  ds:                0
         es:                0  fs:                0  gs:                0
        trp:                e err:                0 rip: fffffffff44db7be
         cs:               30 rfl:            10246 rsp: fffffe009270a180
         ss:               38

fffffe0092709fa0 unix:die+c0 ()
fffffe009270a080 unix:trap+999 ()
fffffe009270a090 unix:cmntrap+e9 ()
fffffe009270a1d0 xde:_ZN4core3fmt9Formatter12pad_integral17hdace542c09befd8aE+18e ()
fffffe009270a260 xde:_ZN4core3fmt3num53_$LT$impl$u20$core..fmt..LowerHex$u20$for$u20$u16$GT$3fmt17hba211b57c0906999E+7a ()
fffffe009270a2f0 xde:_ZN4core3fmt5write17h5e760e4f19caf97dE+1b3 ()
fffffe009270a3b0 xde:_ZN67_$LT$smoltcp..wire..ipv6..Address$u20$as$u20$core..fmt..Display$GT$3fmt17h6e9c915dfd6e131fE+195 ()
fffffe009270a440 xde:_ZN4core3fmt5write17h5e760e4f19caf97dE+1b3 ()
fffffe009270a4a0 xde:_ZN44_$LT$$RF$T$u20$as$u20$core..fmt..Display$GT$3fmt17h9128c724cf7b20c1E+61 ()
fffffe009270a530 xde:_ZN4core3fmt5write17h5e760e4f19caf97dE+1b3 ()
fffffe009270a590 xde:_ZN59_$LT$opte_api..ip..IpAddr$u20$as$u20$core..fmt..Display$GT$3fmt17haa9991ea1a942307E+74 ()
fffffe009270a620 xde:_ZN4core3fmt5write17h5e760e4f19caf97dE+1b3 ()
fffffe009270a6c0 xde:_ZN69_$LT$opte..engine..nat..OutboundNat$u20$as$u20$core..fmt..Display$GT$3fmt17hdcf3ff64a1fb60dcE+5b ()
fffffe009270a800 xde:_ZN121_$LT$alloc..collections..btree..map..ExtractIf$LT$K$C$V$C$F$C$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17h6364d8716d0dcc4eE+1fc ()
fffffe009270a980 xde:_ZN4opte6engine10flow_table18FlowTable$LT$S$GT$12expire_flows17hb96f1e0513b25d84E+17f ()
fffffe009270ab10 xde:expire_periodic+91 ()
fffffe009270ab50 genunix:periodic_execute+f5 ()
fffffe009270ac00 genunix:taskq_thread+2a6 ()
fffffe009270ac10 unix:thread_start+b ()

dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel

Update TCP state
panic[cpu3]/thread=fffffe00934eac20:
BAD TRAP: type=e (#pf Page fault) rp=fffffe00934e9aa0 addr=c occurred in module "xde" due to a NULL pointer dereference


sched:
#pf Page fault
Bad kernel fault at addr=0xc
pid=0, pc=0xfffffffff44ee31b, sp=0xfffffe00934e9b90, eflags=0x10286
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 3406f8<smap,smep,osxsav,xmme,fxsr,pge,mce,pae,pse,de>
cr2: c
cr3: 1a800000
cr8: 0

        rdi:                2 rsi: fffffe6ba8fb8228 rdx: fffffe00934ea0e8
        rcx:               17  r8:          a000000  r9:                4
        rax: fffffe69e320cac0 rbx:                2 rbp: fffffe00934e9d50
        r10:               ff r11:                5 r12: fffffe69e8260e10
        r13: fffffe00934ea0e8 r14:               17 r15:                2
        fsb: fffffc7fef2d2a40 gsb: fffffe69de4bf000  ds:                0
         es:                0  fs:                0  gs:                0
        trp:                e err:                0 rip: fffffffff44ee31b
         cs:               30 rfl:            10286 rsp: fffffe00934e9b90
         ss:               38

fffffe00934e99b0 unix:die+c0 ()
fffffe00934e9a90 unix:trap+999 ()
fffffe00934e9aa0 unix:cmntrap+e9 ()
fffffe00934e9d50 xde:_ZN4opte6engine4port13Port$LT$N$GT$16update_tcp_entry17h116fa2f846532633E+3b ()
fffffe00934e9f20 xde:_ZN4opte6engine4port13Port$LT$N$GT$16update_tcp_entry17h116fa2f846532633E+25c ()
fffffe00934ea2e0 xde:_ZN4opte6engine4port13Port$LT$N$GT$7process17h132132694f056ce9E+3bc ()
fffffe00934ea950 xde:xde_rx+43c ()
fffffe00934ea9a0 mac:mac_promisc_dispatch_one+60 ()
fffffe00934eaa20 mac:mac_promisc_dispatch+83 ()
fffffe00934eaa80 mac:mac_rx_common+47 ()
fffffe00934eaae0 mac:mac_rx+c6 ()
fffffe00934eab20 mac:mac_rx_ring+2b ()
fffffe00934eab60 igb:igb_intr_rx_work+5c ()
fffffe00934eab80 igb:igb_intr_rx+15 ()
fffffe00934eabd0 apix:apix_dispatch_by_vector+8c ()
fffffe00934eac00 apix:apix_dispatch_lowlevel+29 ()
fffffe0093499a40 unix:switch_sp_and_call+15 ()
fffffe0093499aa0 apix:apix_do_interrupt+f3 ()
fffffe0093499ab0 unix:cmnint+c3 ()
fffffe0093499ba0 unix:i86_mwait+12 ()
fffffe0093499bd0 unix:cpu_idle_mwait+14b ()
fffffe0093499be0 unix:cpu_idle_adaptive+19 ()
fffffe0093499c00 unix:idle+a8 ()
fffffe0093499c10 unix:thread_start+b ()

Both occur some distance from the actual SDT: a format statement on the supposedly valid &InnerFlowId, and a match on a &InnerFlowId, respectively, each before the probe occurs. Removing the probe makes these callsites compile and behave correctly. Another SDT, layer-process-return, shows a different variation:

NAME         DIR EPOCH    FLOW BEFORE                                 FLOW AFTER                                  LEN   RESULT
opte0        OUT 3        UDP,10.0.0.2:38231,10.0.0.1:10000           ,0.254.255.255:29,16.0.0.0:1980             133   Modified

             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 11 a1 02 00 0a 00 00 02 0a 00 00 01 00 00 00 00  ................
        10: 00 00 00 00 02 00 00 03 02 11 00 0e cb 1f a1 fb  ................
        20: ff ff ff ff 57 95 10 27                          ....W..'

             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 90 69 1b 93 00 fe ff ff 10 00 00 00 00 00 00 00  .i..............
        10: c0 db 43 42 6a fe ff ff f4 33 a1 fb ff ff ff ff  ..CBj....3......
        20: 57 95 10 27 1d 00 bc 07                          W..'....

flow_before is obviously valid, while flow_after appears to point elsewhere. The only obvious difference I'm aware of is that flow_after is obtained directly via Packet::flow, while flow_before is obtained and explicitly copied out before pkt is modified.
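As a quick sanity check on "points elsewhere", the sketch below reads the first 0x20 bytes of the flow_after dump as little-endian 64-bit words. The helper name `le_words` and the constant `FLOW_AFTER` are introduced here for illustration only; the bytes are copied from the second dump above.

```rust
// First 0x20 bytes of the flow_after dump, copied from the probe output.
const FLOW_AFTER: [u8; 32] = [
    0x90, 0x69, 0x1b, 0x93, 0x00, 0xfe, 0xff, 0xff,
    0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xc0, 0xdb, 0x43, 0x42, 0x6a, 0xfe, 0xff, 0xff,
    0xf4, 0x33, 0xa1, 0xfb, 0xff, 0xff, 0xff, 0xff,
];

/// Reads consecutive little-endian u64 words from `bytes`, ignoring any
/// trailing partial word.
fn le_words(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut word = [0u8; 8];
            word.copy_from_slice(chunk);
            u64::from_le_bytes(word)
        })
        .collect()
}

fn main() {
    for word in le_words(&FLOW_AFTER) {
        // Width 18 = "0x" prefix plus 16 hex digits.
        println!("{word:#018x}");
    }
}
```

The first word decodes to 0xfffffe00931b6990, which falls in the same range as the stack addresses in the panic dumps (for example sp=0xfffffe00934e9b90). That is at least consistent with the probe having been handed a pointer to something other than a flow ID, rather than corrupted flow data.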

EDIT: The last case is caused by our uintptr_t untyped args making it easy to pass the wrong thing. The actual kernel panics still stand even with accurate types, however.
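One way to make the "wrong thing passed as uintptr_t" mistake harder is to do the cast in exactly one place, behind a typed wrapper. The sketch below is only an illustration of that idea: `FlowId`, `Packet`, `raw_probe`, and `layer_process_return_probe` are hypothetical names and layouts, not the actual opte definitions, and `raw_probe` merely echoes its arguments so the sketch can run in userspace.

```rust
// Hypothetical stand-ins for the real types.
#[allow(dead_code)]
struct FlowId {
    proto: u8,
    src_port: u16,
    dst_port: u16,
}

#[allow(dead_code)]
struct Packet {
    flow: FlowId,
    len: usize,
}

impl Packet {
    fn flow(&self) -> &FlowId {
        &self.flow
    }
}

// Stand-in for an untyped probe entry point: any pointer-sized value is
// accepted, so nothing stops a caller passing a pointer to the wrong type.
fn raw_probe(before: usize, after: usize) -> (usize, usize) {
    (before, after)
}

// Typed wrapper: callers must supply &FlowId, and the cast to usize
// happens only here.
fn layer_process_return_probe(before: &FlowId, after: &FlowId) -> (usize, usize) {
    raw_probe(
        before as *const FlowId as usize,
        after as *const FlowId as usize,
    )
}

fn main() {
    let before = FlowId { proto: 17, src_port: 38231, dst_port: 10000 };
    let pkt = Packet {
        flow: FlowId { proto: 17, src_port: 38231, dst_port: 10000 },
        len: 133,
    };

    let (b, a) = layer_process_return_probe(&before, pkt.flow());
    println!("before={b:#x} after={a:#x}");

    // The following would not compile, because &Packet is not &FlowId:
    // layer_process_return_probe(&before, &pkt);
}
```

As the EDIT says, this only addresses the garbage flow_after output. It does nothing for the two kernel panics, which still occur with accurate types.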
