5.7. Unexpected restarts of the optimization service

There are two ways the optimization service gets aborted:

In both situations, the Steelhead appliance will generate a process dump. When this happens, open a case with Riverbed TAC and provide them with a system dump. The process dump might be requested later by them.

5.7.1. Clean restart of the optimization service

When the optimization service gets restarted, either via the CLI or via the GUI, the logs will show that the optimization service gets cleanly shutdown:

Figure 5.39. Proper restart of the optimization service

SH pm[11256]: [pm.NOTICE]: Terminating process sport 
SH sport[19125]: [signal.NOTICE] - {- -} /opt/rbt/bin/sport (pid 19125) received signal 2  \
    (SIGINT) 
SH pm[11256]: [pm.NOTICE]: Output from sport: /opt/rbt/bin/sport (pid 19125) received sign \
    al 2 (SIGINT) marking bit 
[...]
SH sport[19125]: [sport.NOTICE] - {- -} Sport is shutting down. 
SH sport[19125]: [sport.NOTICE] - {- -} Sport has cleanly shutdown. 
SH pm[11256]: [pm.NOTICE]: Launched sport with pid 19132 
SH sport[19132]: [sport.NOTICE] - {- -} Sport initializing... (main thread pid 19132)
SH sport[19132]: [sport.NOTICE] - {- -} AsyncLogger initialized, backlog 1000, shutdown ti \
    meout 30 
SH kernel:  <5>[intercept.NOTICE] Intercept paused since watchdog failed.
SH statsd[14594]: [statsd.NOTICE]: Alarm triggered for rising error for event bypass 
[...]
SH sport[19132]: [sport.NOTICE] - {- -} Sport ready. 
SH kernel: [intercept.NOTICE] Intercept unpaused , watchdog is normal.
SH wdt[14896]: [wdt.NOTICE]: Hardware watchdog change event.  Hardware Watchdog is now : e \
    nabled 
SH statsd[14594]: [statsd.NOTICE]: Alarm triggered for rising clear for event bypass 

5.7.2. Restart of the optimization service because of an assertion failure

An assertion failure happens when the optimization service experiences a pre-condition which wasn't met. This is put in the code by the developers because they know this is a situation which shouldn't happen and which will cause problems later on.

It can be seen in the logs by the string received signal SIGABRT in the log files together with the string ASSERTION FAILED. After that a stack trace is captured and the optimization service is restarted.

Figure 5.40. Optimization service restart because of an assertion failure

SH sport[20486]: [assert.CRIT] - {- -} ASSERTION FAILED (true == cur_req_->req_.done()) at \
     ../protocol/http/http.cc:3064 
SH pm[11899]: [pm.ERR]: Output from sport: /opt/rbt/bin/sport (pid 20486) received signal  \
    6 (SIGABRT) dumping core 
SH pm[11899]: [pm.ERR]: Output from sport: /opt/rbt/bin/sport STACK TRACE: 0x2a959c75b0:/l \
    ib64/tls/libpthread.so.0:@0x2a959bb000+0x0000c5b0 0x2a961a026d:/lib64/tls/libc.so.6:gs \
    ignal@0x2a96172000+0x0002e26d 0x2a961a1a6e:/lib64/tls/libc.so.6:abort@0x2a96172000+0x0 \
    002fa6e 0x0084b448:/opt/rbt/bin/sport:@0x400000+0x0044b448 0x011c8682:/opt/rbt/bin/spo \
    rt:_ZN10HttpServer8proc_reqEP7DataBufj13HttpTransType@0x400000+0x00dc8682 0x0117bac7:/ \
    opt/rbt/bin/sport:_ZN8HttpUnit19proc_http_turbo_msgEP7DataBufj@0x400000+0x00d7bac7 0x0 \
    11bfd55:/opt/rbt/bin/sport:_ZN10HttpServer9proc_dataEP7DataBufjb@0x400000+0x00dbfd55 0 \
    x0117b8cf:/opt/rbt/bin/sport:_ZN12HttpConsumer7consumeEP7DataBufj@0x400000+0x00d7b8cf  \
    0x009244b5:/opt/rbt/bin/sport:_ZN7Decoder5flushEv@0x400000+0x005244b5 0x00923a31:/opt/ \
    rbt/bin/sport:_ZN7Decoder25handle_refs_disk_callbackEv@0x400000+0x00523a31 0x00923eaa: \
    /opt/rbt/bin/sport:_ZN7Decoder12handle_eventE11EventSource9EventTypePvS2_@0x400000+0x0 \
    0523eaa 0x009d6e7a:/opt/rbt/bin/sport:_ZN20sp_lookup_b
SH pm[11899]: [pm.WARNING]: Output from sport:  (pid 20486) Force Flushing AsyncLogger, ti \
    mestamps may be incorrect 
SH statsd[15655]: [statsd.NOTICE]: statsd/provider handle_session_close 68 
SH pm[11899]: [pm.NOTICE]: Process sport (pid 20486) terminated from signal 6 (SIGABRT) 
SH pm[11899]: [pm.WARNING]: Exit with code 1 from /sbin/fuser 
SH pm[11899]: [pm.WARNING]: Core file found in sport's working directory 
SH pm[11899]: [pm.NOTICE]: Waiting 1 seconds before relaunching sport 
SH kernel: [intercept.NOTICE] Intercept paused since watchdog failed.
SH statsd[15655]: [statsd.NOTICE]: Alarm triggered for rising error for event bypass 
SH sport[21850]: [sport.NOTICE] - {- -} Sport initializing... (main thread pid 21850)
SH pm[11899]: [pm.NOTICE]: Launched sport with pid 21850 

5.7.3. Restart of the optimization service because of a process failure

A process failure happens when the optimization service experiences a condition which causes the operation system to terminate the optimization service.

It can be seen in the logs by the string received signal SIGxxx but without the string ASSERTION FAILED. After that a stack trace is captured and the optimization service is restarted.

Figure 5.41. Optimization service restart because of a process failure

SH pm[5996]: [pm.ERR]: Output from sport: /opt/rbt/bin/sport (pid 14917) received signal 1 \
    1 (SIGSEGV) dumping core 
SH pm[5996]: [pm.ERR]: Output from sport: /opt/rbt/bin/sport STACK TRACE: 0x2a959c75b0:/li \
    b64/tls/libpthread.so.0:@0x2a959bb000+0x0000c5b0 0x012c36df:/opt/rbt/bin/sport:_ZN4Smb \
    212ClientParser22process_CreateResponseEPNS_14CreateResponseE@0x400000+0x00ec36df 0x01 \
    2aadce:/opt/rbt/bin/sport:_ZN4Smb215CompoundRequest17process_responsesEPNS_8Smb2ListI3 \
    PtrINS_12ResponseBaseEEEE@0x400000+0x00eaadce 0x0126b568:/opt/rbt/bin/sport:_ZN4Smb26P \
    arser17process_responsesERKNS_8Smb2ListI3PtrINS_12ResponseBaseEEEE@0x400000+0x00e6b568 \
     0x0126bb1f:/opt/rbt/bin/sport:_ZN4Smb212ClientParser18server_consume_smbEP7DataBufj@0 \
    x400000+0x00e6bb1f 0x00c9e5e9:/opt/rbt/bin/sport:_ZN4Cifs12CifsConsumer18process_data_ \
    queueEv@0x400000+0x0089e5e9 0x00c9e6ef:/opt/rbt/bin/sport:_ZN4Cifs12CifsConsumer15proc \
    ess_consumeEP7DataBufj@0x400000+0x0089e6ef 0x00c9d457:/opt/rbt/bin/sport:_ZNK5boost9fu \
    nction0IvEclEv@0x400000+0x0089d457 0x00c9be93:/opt/rbt/bin/sport:_ZN4Cifs10CifsParser9 \
    add_eventERKN5boost9function0IvEE@0x400000+0x0089be93 
SH pm[5996]: [pm.WARNING]: Output from sport:  (pid 14917) Force Flushing AsyncLogger, tim \
    estamps may be incorrect 
SH statsd[9264]: [statsd.NOTICE]: statsd/provider handle_session_close 4 
SH pm[5996]: [pm.NOTICE]: Process sport (pid 14917) terminated from signal 11 (SIGSEGV) 
SH pm[5996]: [pm.WARNING]: Exit with code 1 from /sbin/fuser 
SH pm[5996]: [pm.WARNING]: Core file found in sport's working directory 
SH pm[5996]: [pm.NOTICE]: Waiting 1 seconds before relaunching sport 
SH mgmtd[5997]: [mgmtd.NOTICE]: Sending email to event recipients 
SH kernel: [intercept.NOTICE] Intercept paused since watchdog failed.
SH statsd[9264]: [statsd.NOTICE]: Alarm triggered for rising error for event bypass 
SH mgmtd[5997]: [mgmtd.NOTICE]: Sending email to event recipients 
SH sport[11613]: [sport.NOTICE] - {- -} Sport initializing... (main thread pid 11613)