CRASH(8V)                                                            CRASH(8V)



NAME
       crash - what happens when the system crashes

DESCRIPTION
       This  section explains what happens when the system crashes and how you
       can analyze crash dumps.

       When the system crashes voluntarily it prints a message of the form

              panic: why i gave up the ghost

       on the console, takes a dump on a mass  storage  peripheral,  and  then
       invokes  an  automatic reboot procedure as described in reboot(8).  (If
       auto-reboot is disabled on the front panel of the  machine  the  system
       will  simply halt at this point.)  Unless some unexpected inconsistency
       is encountered in the state of the file  systems  due  to  hardware  or
       software failure the system will then resume multi-user operations.

       The system has a large number of internal consistency checks; if one of
       these fails, then it will panic with a very  short  message  indicating
       which one failed.

       The most common cause of system failures is hardware failure, which can
       reflect itself in different ways.  Here are the messages which you  are
       likely  to  encounter,  with some hints as to causes.  Left unstated in
       all cases is the possibility that hardware or software  error  produced
       the message in some unexpected way.

       IO err in push
       hard IO err in swap
              The  system  encountered  an error trying to write to the paging
              device or an error in reading critical information from  a  disk
              drive.  You should fix your disk if it is broken or unreliable.

       timeout table overflow
              This  really  shouldn't be a panic, but until we fix up the data
              structure involved, running out of entries causes a  crash.   If
              this happens, you should make the timeout table bigger.

       KSP not valid
       SBI fault
       CHM? in kernel
              These  indicate  either  a  serious  bug  in the system or, more
              often, a glitch or failing hardware.  If SBI faults recur, check
              out  the  hardware  or  call field service.  If the other faults
              recur, there is likely a bug somewhere in the  system,  although
              a  flaky  processor  can  also cause them.  Run the processor
              microdiagnostics.

       machine check %x: description
          machine dependent machine-check information
              We should describe machine checks, and will someday.   For  now,
              ask someone who knows (like your friendly field service people).

       trap type %d, code=%d, pc=%x
              An  unexpected  trap  has  occurred  within the system; the trap
              types are:

              0    reserved addressing fault
              1    privileged instruction fault
              2    reserved operand fault
              3    bpt instruction fault
              4    xfc instruction fault
              5    system call trap
              6    arithmetic trap
              7    ast delivery trap
              8    segmentation fault
              9    protection fault
              10   trace trap
              11   compatibility mode fault
              12   page fault
              13   page table fault

              The favorite trap types in system crashes are trap types  8  and
              9,  indicating  a  wild  reference.   The code is the referenced
              address, and the pc at the time of the fault is printed.   These
              problems  tend  to be easy to track down if they are kernel bugs
              since the processor stops cold, but random  flakiness  seems  to
              cause this sometimes.
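
              As a rough illustration only, the sketch below decodes a console
              line of the form shown above using the trap type  table.   It is
              not part of the system; the function name decode_trap,  the  use
              of Python, and the sample line in the usage  note  are  all  as-
              sumptions made for this example.  It assumes the code  field  is
              printed in decimal and the pc in hexadecimal, as the format  in-
              dicates.

```python
import re

# Trap type numbers and names, copied from the table in this manual page.
TRAP_TYPES = {
    0: "reserved addressing fault",
    1: "privileged instruction fault",
    2: "reserved operand fault",
    3: "bpt instruction fault",
    4: "xfc instruction fault",
    5: "system call trap",
    6: "arithmetic trap",
    7: "ast delivery trap",
    8: "segmentation fault",
    9: "protection fault",
    10: "trace trap",
    11: "compatibility mode fault",
    12: "page fault",
    13: "page table fault",
}

# Matches the documented format "trap type %d, code=%d, pc=%x".
TRAP_RE = re.compile(r"trap type (\d+), code=(-?\d+), pc=([0-9a-fA-F]+)")


def decode_trap(line):
    """Return a dict describing the trap in `line`, or None if no match.

    Hypothetical helper for reading saved console output.
    """
    m = TRAP_RE.search(line)
    if m is None:
        return None
    trap_type = int(m.group(1))
    return {
        "type": trap_type,
        "name": TRAP_TYPES.get(trap_type, "unknown trap type"),
        "code": int(m.group(2)),       # referenced address for types 8 and 9
        "pc": int(m.group(3), 16),     # pc at the time of the fault
        "wild_reference": trap_type in (8, 9),
    }
```

              For example, given the made-up  line  ``trap  type  9,  code=24,
              pc=80001a3c'',  decode_trap  reports  a protection fault and
              flags it as a probable wild reference.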

       init died
              The system initialization process has exited.  This is bad news,
              as no new users will then be able to log in.  Rebooting  is  the
              only fix, so the system just does it right away.

       That completes the list of panic types you are likely to see.
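
       When reading saved console output, the list above can be treated  as  a
       simple  lookup  table.   The  following  is  a minimal sketch under the
       assumption that console lines are available as text; the names
       PANIC_HINTS and panic_hint are hypothetical, and the hints only
       paraphrase the advice given in this page.

```python
# Message fragments from this manual page, paired with a short paraphrase
# of the hint given for each. Hypothetical table, for illustration only.
PANIC_HINTS = [
    ("IO err in push", "paging device or disk error; check the disk"),
    ("hard IO err in swap", "paging device or disk error; check the disk"),
    ("timeout table overflow", "make the timeout table bigger"),
    ("KSP not valid", "system bug or flaky processor; run diagnostics"),
    ("SBI fault", "check the hardware or call field service"),
    ("CHM? in kernel", "system bug or flaky processor; run diagnostics"),
    ("machine check", "machine-dependent; ask field service"),
    ("trap type", "unexpected trap; types 8 and 9 mean a wild reference"),
    ("init died", "initialization process exited; system reboots"),
]


def panic_hint(console_line):
    """Return the hint for the first known message found in the line.

    Returns None when the line contains none of the listed messages.
    """
    for fragment, hint in PANIC_HINTS:
        if fragment in console_line:
            return hint
    return None
```

       A line that matches nothing in the table returns None, which is consis-
       tent with the caveat that hardware or software  error  can  produce  a
       message in some unexpected way.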

       When  the  system  crashes it writes (or at least attempts to write) an
       image of memory into the back end of the primary swap area.  After  the
       system  is  rebooted, the program savecore(8) runs and preserves a copy
       of this core image and the current system in a specified directory  for
       later perusal.  See savecore(8) for details.

       To  analyze  a dump you should begin by running adb(1) with the -k flag
       on the core dump.  Normally the command ``*(intstack-4)$c''  will  pro-
       vide  a stack trace from the point of the crash and this will provide a
       clue as to what went wrong.   A  more  complete  discussion  of  system
       debugging  is  impossible here.  See, however, ``Using ADB to Debug the
       UNIX Kernel''.

SEE ALSO
       adb(1), analyze(8), reboot(8)
       VAX 11/780 System Maintenance Guide for more information about  machine
       checks.
       Using ADB to Debug the UNIX Kernel



4th Berkeley Distribution      1 September 1981                      CRASH(8V)