VNODE(9)                 BSD Kernel Developer's Manual                VNODE(9)

     vnode, vcount, vref, VREF, vrele, vget, vput, vhold, VHOLD, holdrele,
     HOLDRELE, getnewvnode, ungetnewvnode, vrecycle, vgone, vgonel, vflush,
     vaccess, checkalias, bdevvp, cdevvp, vfinddev, vdevgone, vwakeup,
     vflushbuf, vinvalbuf, vtruncbuf, vprint -- kernel representation of a
     file or directory

     #include <sys/param.h>
     #include <sys/vnode.h>

     vcount(struct vnode *vp);

     vref(struct vnode *vp);

     VREF(struct vnode *vp);

     vrele(struct vnode *vp);

     vget(struct vnode *vp, int lockflag);

     vput(struct vnode *vp);

     vhold(struct vnode *vp);

     VHOLD(struct vnode *vp);

     holdrele(struct vnode *vp);

     HOLDRELE(struct vnode *vp);

     getnewvnode(enum vtagtype tag, struct mount *mp, int (**vops)(void *),
         struct vnode **vpp);

     ungetnewvnode(struct vnode *vp);

     vrecycle(struct vnode *vp, struct simplelock *inter_lkp, struct proc *p);

     vgone(struct vnode *vp);

     vgonel(struct vnode *vp);

     vflush(struct mount *mp, struct vnode *skipvp, int flags);

     vaccess(enum vtype type, mode_t file_mode, uid_t uid, gid_t gid,
         mode_t acc_mode, struct ucred *cred);

     struct vnode *
     checkalias(struct vnode *vp, dev_t nvp_rdev, struct mount *mp);

     bdevvp(dev_t dev, struct vnode **vpp);

     cdevvp(dev_t dev, struct vnode **vpp);

     vfinddev(dev_t dev, enum vtype, struct vnode **vpp);

     vdevgone(int maj, int minl, int minh, enum vtype type);

     vwakeup(struct buf *bp);

     vflushbuf(struct vnode *vp, int sync);

     vinvalbuf(struct vnode *vp, int flags, struct ucred *cred,
         struct proc *p, int slpflag, int slptimeo);

     vtruncbuf(struct vnode *vp, daddr_t lbn, int slpflag, int slptimeo);

     vprint(char *label, struct vnode *vp);

     The vnode is the focus of all file activity in NetBSD.  There is a unique
     vnode allocated for each active file, directory, mounted-on file, fifo,
     domain socket, symbolic link and device.  The kernel has no concept of a
     file's underlying structure and so it relies on the information stored in
     the vnode to describe the file.  Thus, the vnode associated with a file
     holds all the administration information pertaining to it.

     When a process requests an operation on a file, the vfs(9) interface
     passes control to a file system type dependent function to carry out the
     operation.  If the file system type dependent function finds that a vnode
     representing the file is not in main memory, it dynamically allocates a
     new vnode from the system main memory pool.  Once allocated, the vnode is
     attached to the data structure pointer associated with the cause of the
     vnode allocation and it remains resident in the main memory until the
     system decides that it is no longer needed and can be recycled.

     The vnode has the following structure:

     struct vnode {
             struct uvm_object v_uobj;               /* uvm object */
     #define v_usecount      v_uobj.uo_refs
     #define v_interlock     v_uobj.vmobjlock
             voff_t          v_size;                 /* size of file */
             int             v_flag;                 /* flags */
             int             v_numoutput;            /* num pending writes */
             long            v_writecount;           /* ref count of writers */
             long            v_holdcnt;              /* page & buffer refs */
             daddr_t         v_lastr;                /* last read */
             u_long          v_id;                   /* capability id */
             struct mount    *v_mount;               /* ptr to vfs we are in */
             int             (**v_op)(void *);       /* vnode ops vector */
             TAILQ_ENTRY(vnode) v_freelist;          /* vnode freelist */
             LIST_ENTRY(vnode) v_mntvnodes;          /* vnodes for mount pt */
             struct buflists v_cleanblkhd;           /* clean blocklist head */
             struct buflists v_dirtyblkhd;           /* dirty blocklist head */
             LIST_ENTRY(vnode) v_synclist;           /* dirty vnodes */
             union {
                     struct mount    *vu_mountedhere;/* ptr to mounted vfs */
                     struct socket   *vu_socket;     /* unix ipc (VSOCK) */
                     struct specinfo *vu_specinfo;   /* device (VCHR, VBLK) */
                     struct fifoinfo *vu_fifoinfo;   /* fifo (VFIFO) */
             } v_un;
     #define v_mountedhere   v_un.vu_mountedhere
     #define v_socket        v_un.vu_socket
     #define v_specinfo      v_un.vu_specinfo
     #define v_fifoinfo      v_un.vu_fifoinfo
             struct nqlease  *v_lease;               /* Soft ref to lease */
             enum vtype      v_type;                 /* vnode type */
             enum vtagtype   v_tag;                  /* underlying data type */
             struct lock     v_lock;                 /* lock for this vnode */
             struct lock     *v_vnlock;              /* ptr to vnode lock */
             void            *v_data;                /* private data for fs */
     };

     Most members of the vnode structure should be treated as opaque and only
     manipulated using the proper functions.  There are some rather common
     exceptions detailed throughout this page.

     Files and file systems are inextricably linked with the virtual memory
     system, and v_uobj contains the data maintained by the virtual memory
     system.  For compatibility with code written before the integration of
     uvm(9) into NetBSD, C-preprocessor directives are used to alias the
     members of v_uobj.

     Vnode flags are recorded by v_flag.  Valid flags are:

           VROOT       This vnode is the root of its file system.
           VTEXT       This vnode is a pure text prototype.
           VEXECMAP    This vnode has executable mappings.
           VSYSTEM     This vnode is being used by the kernel; only used to
                       skip quota files in vflush().
           VISTTY      This vnode represents a tty; used when reading dead
                       vnodes.
           VXLOCK      This vnode is currently locked to change its underlying
                       type.
           VXWANT      A process is waiting for this vnode.
           VBWAIT      Waiting for output associated with this vnode to
                       complete.
           VALIASED    This vnode has an alias.
           VDIROP      This vnode is involved in a directory operation.  This
                       flag is used exclusively by LFS.
           VLAYER      This vnode is on a layered file system.
           VONWORKLST  This vnode is on syncer work-list.
           VDIRTY      This vnode possibly has dirty pages.

     The VXLOCK flag is used to prevent multiple processes from entering the
     vnode reclamation code.  It is also used as a flag to indicate that
     reclamation is in progress.  The VXWANT flag is set by threads that wish
     to be awakened when reclamation is finished.  Before v_flag can be
     modified, the v_interlock simplelock must be acquired.  See lock(9) for
     details on the kernel locking API.

     Each vnode has three reference counts: v_usecount, v_writecount and
     v_holdcnt.  The first is the number of active references within the ker-
     nel to the vnode.  This count is maintained by vref(), vrele(), and
     vput().  The second is the number of active references within the kernel
     to the vnode performing write access to the file.  It is maintained by
     the open(2) and close(2) system calls.  The third is the number of refer-
     ences within the kernel requiring the vnode to remain active and not be
     recycled.  This count is maintained by vhold() and holdrele().  When both
     the v_usecount and v_holdcnt reach zero, the vnode is recycled to the
     freelist and may be reused for another file.  The transition to and from
     the freelist is handled by getnewvnode(), ungetnewvnode() and vrecycle().
     Access to v_usecount, v_writecount and v_holdcnt is also protected by the
     v_interlock simplelock.
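
     The interaction of v_usecount and v_holdcnt described above can be
     illustrated with a small userspace model.  This is only a sketch under
     stated assumptions: struct refs_vnode, refs_placement(), refs_vref() and
     refs_vrele() are hypothetical names invented for this example, not kernel
     interfaces, and the v_interlock protection is omitted.

```c
#include <assert.h>

/* Hypothetical userspace model; these are not the kernel's definitions. */
enum refs_list { REFS_ACTIVE, REFS_HOLDLIST, REFS_FREELIST };

struct refs_vnode {
	long usecount;	/* models v_usecount */
	long holdcnt;	/* models v_holdcnt */
};

/* Report where a vnode with the given counts would reside. */
enum refs_list
refs_placement(const struct refs_vnode *vp)
{
	assert(vp->usecount >= 0 && vp->holdcnt >= 0);
	if (vp->usecount > 0)
		return REFS_ACTIVE;	/* still referenced, on neither list */
	if (vp->holdcnt > 0)
		return REFS_HOLDLIST;	/* unused, but pages or buffers remain */
	return REFS_FREELIST;		/* both counts zero: may be reused */
}

/* Model of vref(): take an active reference. */
void
refs_vref(struct refs_vnode *vp)
{
	vp->usecount++;
}

/* Model of vrele(): drop an active reference, report the new placement. */
enum refs_list
refs_vrele(struct refs_vnode *vp)
{
	assert(vp->usecount > 0);
	vp->usecount--;
	return refs_placement(vp);
}
```

     In this model a vnode only reaches the freelist once both counts are
     zero; a remaining hold keeps it on the holdlist instead.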

     The number of pending synchronous and asynchronous writes on the vnode
     is recorded in v_numoutput.  It is used by fsync(2) to wait for all
     writes to complete before returning to the user.  Its value must only be
     modified at splbio (see spl(9)).  It does not track the number of dirty
     buffers attached to the vnode.

     Every time a vnode is reassigned to a new file, the vnode capability
     identifier v_id is changed.  It is used to maintain the name lookup cache
     consistency by providing a unique <vnode *,v_id> tuple without requiring
     the cache to hold a reference.  The name lookup cache can later compare
     the vnode's capability identifier to its copy and see if the vnode still
     points to the same file.  See namecache(9) for details on the name lookup
     cache.
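
     The capability check can be sketched in userspace as follows.  The types
     struct nc_vnode and struct nc_entry and the functions nc_enter(),
     nc_lookup() and nc_reassign() are hypothetical and exist only for this
     illustration; the real cache is described in namecache(9).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; only the capability id is modeled. */
struct nc_vnode {
	unsigned long v_id;	/* changed every time the vnode is reassigned */
};

/* Hypothetical cache entry: a vnode pointer plus the id seen at entry. */
struct nc_entry {
	struct nc_vnode *vp;
	unsigned long vpid;
};

/* Record the vnode and its current id; no reference is taken. */
void
nc_enter(struct nc_entry *nc, struct nc_vnode *vp)
{
	assert(nc != NULL && vp != NULL);
	nc->vp = vp;
	nc->vpid = vp->v_id;
}

/* Return the cached vnode, or NULL if it has since been reassigned. */
struct nc_vnode *
nc_lookup(const struct nc_entry *nc)
{
	if (nc->vp == NULL || nc->vp->v_id != nc->vpid)
		return NULL;
	return nc->vp;
}

/* Model of reassigning the vnode to a new file: the id changes. */
void
nc_reassign(struct nc_vnode *vp)
{
	vp->v_id++;
}
```

     Because the entry compares ids instead of holding a reference, a stale
     entry is detected at lookup time and simply treated as a miss.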

     The link to the file system which owns the vnode is recorded by v_mount.
     See vfsops(9) for further information on file system mount status.

     The v_op pointer points to its vnode operations vector.  This vector
     describes what operations can be done to the file associated with the
     vnode.  The system maintains one vnode operations vector for each file
     system type configured into the kernel.  The vnode operations vector con-
     tains a pointer to a function for each operation supported by the file
     system.  See vnodeops(9) for a description of vnode operations.
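
     The idea of a per-file-system operations vector can be sketched as a
     table of function pointers.  Everything below is a simplified userspace
     illustration: the slot names OPV_OPEN and OPV_CLOSE, struct opv_vnode,
     the toyfs_*() functions and opv_call() are hypothetical, and the real
     vector layout and calling convention are those of vnodeops(9).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation slots for this sketch. */
enum { OPV_OPEN, OPV_CLOSE, OPV_NOPS };

typedef int (*opv_fn)(void *);

struct opv_vnode {
	opv_fn *v_op;	/* points at the owning file system's vector */
	int opens;	/* toy per-vnode state touched by the operations */
};

int
toyfs_open(void *arg)
{
	struct opv_vnode *vp = arg;

	vp->opens++;
	return 0;
}

int
toyfs_close(void *arg)
{
	struct opv_vnode *vp = arg;

	if (vp->opens == 0)
		return EBADF;
	vp->opens--;
	return 0;
}

/* One vector per file system type, shared by all of its vnodes. */
opv_fn toyfs_vnodeops[OPV_NOPS] = {
	[OPV_OPEN] = toyfs_open,
	[OPV_CLOSE] = toyfs_close,
};

/* Dispatch an operation through the vnode's vector. */
int
opv_call(struct opv_vnode *vp, int op)
{
	assert(vp != NULL && vp->v_op != NULL);
	if (op < 0 || op >= OPV_NOPS || vp->v_op[op] == NULL)
		return EOPNOTSUPP;
	return vp->v_op[op](vp);
}
```

     Generic code only ever calls through the vector, so it needs no
     knowledge of which file system implements the operation.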

     When not in use, vnodes are kept on the freelist through v_freelist.  The
     vnodes still reference valid files but may be reused to refer to a new
     file at any time.  Often, these vnodes are also held in caches in the
     system, such as the name lookup cache.  When a valid vnode which is on
     the freelist is used again, the user must call vget() to increment the
     reference count and retrieve it from the freelist.  When a user wants a
     new vnode for another file, getnewvnode() is invoked to remove a vnode
     from the freelist and initialize it for the new file.

     The type of object the vnode represents is recorded by v_type.  It is
     used by generic code to perform checks to ensure operations are performed
     on valid file system objects.  Valid types are:

           VNON   The vnode has no type.
           VREG   The vnode represents a regular file.
           VDIR   The vnode represents a directory.
           VBLK   The vnode represents a block special device.
           VCHR   The vnode represents a character special device.
           VLNK   The vnode represents a symbolic link.
           VSOCK  The vnode represents a socket.
           VFIFO  The vnode represents a pipe.
           VBAD   The vnode represents a bad file (not currently used).
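
     The kind of type check that generic code performs can be sketched as
     below.  The enum is a local copy of the names listed above so that the
     example compiles in userspace, and ty_check_lookup_dir() and
     ty_is_device() are hypothetical helpers, not functions provided by the
     kernel.

```c
#include <assert.h>
#include <errno.h>

/* Local copy of the type names listed above, for illustration only. */
enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK, VFIFO, VBAD };

/* Hypothetical check: a name lookup may only start from a directory. */
int
ty_check_lookup_dir(enum vtype type)
{
	assert(type <= VBAD);
	return (type == VDIR) ? 0 : ENOTDIR;
}

/* Hypothetical check: only VBLK and VCHR vnodes describe devices. */
int
ty_is_device(enum vtype type)
{
	return type == VBLK || type == VCHR;
}
```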

     Vnode tag types are used by external programs only (e.g., pstat(8)), and
     should never be inspected by the kernel.  Their use is deprecated since
     new v_tag values cannot be defined for loadable file systems.  The v_tag
     member is read-only.  Valid tag types are:

           VT_NON        non file system
           VT_UFS        universal file system
           VT_NFS        network file system
           VT_MFS        memory file system
           VT_MSDOSFS    FAT file system
           VT_LFS        log-structured file system
           VT_LOFS       loopback file system
           VT_FDESC      file descriptor file system
           VT_PORTAL     portal daemon
           VT_NULL       null file system layer
           VT_UMAP       uid/gid remapping file system layer
           VT_KERNFS     kernel interface file system
           VT_PROCFS     process interface file system
           VT_AFS        AFS file system
           VT_ISOFS      ISO 9660 file system(s)
           VT_UNION      union file system
           VT_ADOSFS     Amiga file system
           VT_EXT2FS     Linux's EXT2 file system
           VT_CODA       Coda file system
           VT_FILECORE   filecore file system
           VT_NTFS       Microsoft NT's file system
           VT_VFS        virtual file system
           VT_OVERLAY    overlay file system
           VT_SMBFS      SMB file system

     All vnode locking operations use v_vnlock.  This lock is acquired by
     calling vn_lock(9) and released by calling VOP_UNLOCK(9).  The reason for
     this asymmetry is that vn_lock(9) is a wrapper for VOP_LOCK(9) with extra
     checks, while the unlocking step usually does not need additional checks
     and thus has no wrapper.

     The vnode locking operation is complicated because it is used for many
     purposes.  Sometimes it is used to bundle a series of vnode operations
     (see vnodeops(9)) into an atomic group.  Many file systems rely on it to
     prevent race conditions in updating file system type specific data
     structures rather than using their own private locks.  The vnode lock can
     operate as a multiple-reader (shared-access) lock or a single-writer
     (exclusive-access) lock; however, many current file system
     implementations were written assuming only single-writer locking.
     Multiple-reader locking functions equivalently only in the presence of
     big-lock SMP locking or a uni-processor machine.  The lock may be held
     while sleeping.  While v_vnlock is held, the holder is guaranteed that
     the vnode will not be reclaimed or invalidated.  Most file system
     functions require that you hold the vnode lock on entry.  See lock(9) for
     details on the kernel locking API.

     For leaf file systems (such as ffs, lfs, msdosfs, etc.), v_vnlock will
     point to v_lock.  For stacked file systems, v_vnlock will generally point
     to v_vnlock of the lowest file system.  Additionally, the implementation
     of the vnode lock is the responsibility of the individual file systems
     and v_vnlock may also be NULL indicating that a leaf node does not export
     a lock for vnode locking.  In this case, stacked file systems (such as
     nullfs) must call the underlying file system directly for locking.

     Each file system underlying a vnode allocates its own private area and
     hangs it from v_data.

     Most functions discussed in this page that operate on vnodes cannot be
     called from interrupt context.  The members v_numoutput, v_holdcnt,
     v_dirtyblkhd, v_cleanblkhd, v_freelist, and v_synclist are modified in
     interrupt context and must be protected by splbio(9) unless it is certain
     that there is no chance an interrupt handler will modify them.  The vnode
     lock must not be acquired within interrupt context.

     vcount(vp)
              Calculate the total number of references to the special device
              with vnode vp.

     vref(vp)
              Increment v_usecount of the vnode vp.  Any kernel thread or
              subsystem that uses a vnode (e.g., during the operation of some
              algorithm or to store in a data structure) should call vref().

     VREF(vp)
              This function is an alias for vref().

     vrele(vp)
              Decrement v_usecount of unlocked vnode vp.  Any code in the
              system which is using a vnode should call vrele() when it is
              finished with the vnode.  If v_usecount of the vnode reaches
              zero and v_holdcnt is greater than zero, the vnode is placed on
              the holdlist.  If both v_usecount and v_holdcnt are zero, the
              vnode is placed on the freelist.

     vget(vp, lockflag)
              Reclaim vnode vp from the freelist, increment its reference
              count and lock it.  The argument lockflag specifies the
              lockmgr(9) flags used to lock the vnode.  If the VXLOCK is set
              in vp's v_flag, vnode vp is being recycled in vgone() and the
              calling thread sleeps until the transition is complete.  When it
              is awakened, an error is returned to indicate that the vnode is
              no longer usable (possibly having been recycled to a new file
              system type).

     vput(vp)
              Unlock vnode vp and decrement its v_usecount.  Depending on the
              reference counts, move the vnode to the holdlist or the
              freelist.  This operation is functionally equivalent to calling
              VOP_UNLOCK(9) followed by vrele().

     vhold(vp)
              Mark the vnode vp as active by incrementing vp->v_holdcnt and
              moving the vnode from the freelist to the holdlist.  Once on the
              holdlist, the vnode will not be recycled until it is released
              with holdrele().

     VHOLD(vp)
              This function is an alias for vhold().

     holdrele(vp)
              Mark the vnode vp as inactive by decrementing vp->v_holdcnt and
              moving the vnode from the holdlist to the freelist.

     HOLDRELE(vp)
              This function is an alias for holdrele().

     getnewvnode(tag, mp, vops, vpp)
              Retrieve the next vnode from the freelist.  getnewvnode() must
              choose whether to allocate a new vnode or recycle an existing
              one.  The criterion for allocating a new one is that the total
              number of vnodes is less than the number desired or there are no
              vnodes on either free list.  Generally only vnodes that have no
              buffers associated with them are recycled and the next vnode
              from the freelist is retrieved.  If the freelist is empty,
              vnodes on the holdlist are considered.  The new vnode is
              returned in the address specified by vpp.

              The argument mp is the mount point for the file system that
              requested the new vnode.  Before retrieving the new vnode, the
              file system is checked to see whether it is busy (such as
              currently unmounting).  An error is returned if the file system
              is unmounted.

              The argument tag is the vnode tag assigned to (*vpp)->v_tag.
              The argument vops is the vnode operations vector of the file
              system requesting the new vnode.  If a vnode is successfully
              retrieved, zero is returned; otherwise an appropriate error code
              is returned.
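
              The allocate-or-recycle criterion described for getnewvnode()
              can be written as a small decision function.  This is a
              userspace sketch of the rule as stated above, not the kernel
              code: gn_choose(), enum gn_source and the parameter names are
              hypothetical, and the check for associated buffers is left out.

```c
#include <assert.h>

/* Hypothetical outcome of the decision, for illustration only. */
enum gn_source { GN_ALLOCATE, GN_FROM_FREELIST, GN_FROM_HOLDLIST };

/*
 * Decide where the next vnode comes from, given the current number of
 * vnodes, the desired number, and the lengths of the two free lists.
 */
enum gn_source
gn_choose(long numvnodes, long desiredvnodes, long nfree, long nhold)
{
	assert(numvnodes >= 0 && nfree >= 0 && nhold >= 0);

	/* Allocate while below the target, or when nothing can be reused. */
	if (numvnodes < desiredvnodes || (nfree == 0 && nhold == 0))
		return GN_ALLOCATE;

	/* Prefer the freelist; fall back to the holdlist when it is empty. */
	if (nfree > 0)
		return GN_FROM_FREELIST;
	return GN_FROM_HOLDLIST;
}
```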

     ungetnewvnode(vp)
              Undo the operation of getnewvnode().  The argument vp is the
              vnode to return to the freelist.  This function is needed for
              VFS_VGET(9) which may need to push back a vnode in case of a
              locking race condition.

     vrecycle(vp, inter_lkp, p)
              Recycle the unused vnode vp to the front of the freelist.
              vrecycle() is a null operation if the reference count is greater
              than zero.

     vgone(vp)
              Eliminate all activity associated with the unlocked vnode vp in
              preparation for recycling.

     vgonel(vp)
              Eliminate all activity associated with the locked vnode vp in
              preparation for recycling.

     vflush(mp, skipvp, flags)
              Remove any vnodes in the vnode table belonging to mount point
              mp.  If skipvp is not NULL it is exempt from being flushed.  The
              argument flags is a set of flags modifying the operation of
              vflush().  If FORCECLOSE is not specified, there should not be
              any active vnodes and the error EBUSY is returned if any are
              found (this is a user error, not a system error).  If FORCECLOSE
              is specified, active vnodes that are found are detached.  If
              WRITECLOSE is set, only flush out regular file vnodes open for
              writing.  SKIPSYSTEM causes any vnodes marked VSYSTEM to be
              skipped.

     vaccess(type, file_mode, uid, gid, acc_mode, cred)
              Do access checking by comparing the file's permissions to the
              caller's desired access type acc_mode and credentials cred.
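
              The comparison of permissions against credentials can be
              sketched with the classic owner, group, other selection.  This
              is a simplified userspace model, not the kernel's vaccess():
              acc_check(), struct acc_cred and the ACC_* bits are hypothetical
              names, the credential carries a single group, and both the type
              argument and any superuser handling are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Requested-access bits for this sketch (hypothetical names). */
#define ACC_READ	S_IRUSR
#define ACC_WRITE	S_IWUSR
#define ACC_EXEC	S_IXUSR

/* Hypothetical, reduced credential: one user id and one group id. */
struct acc_cred {
	uid_t uid;
	gid_t gid;
};

/* Return 0 if every bit in acc_mode is granted, EACCES otherwise. */
int
acc_check(mode_t file_mode, uid_t uid, gid_t gid, mode_t acc_mode,
    const struct acc_cred *cred)
{
	mode_t granted;

	assert((acc_mode & ~(mode_t)(ACC_READ | ACC_WRITE | ACC_EXEC)) == 0);

	/* Exactly one permission class applies; owner is checked first. */
	if (cred->uid == uid)
		granted = file_mode & S_IRWXU;
	else if (cred->gid == gid)
		granted = (file_mode & S_IRWXG) << 3;	/* align to owner bits */
	else
		granted = (file_mode & S_IRWXO) << 6;	/* align to owner bits */

	return ((granted & acc_mode) == acc_mode) ? 0 : EACCES;
}
```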

     checkalias(vp, nvp_rdev, mp)
              Check to see if the new vnode vp represents a special device for
              which another vnode represents the same device.  If such an
              alias exists, the existing contents and the aliased vnode are
              deallocated.  The caller is responsible for filling the new
              vnode with its new contents.

     bdevvp(dev, vpp)
              Create a vnode for a block device.  bdevvp() is used for root
              file systems, swap areas and for memory file system special
              devices.

     cdevvp(dev, vpp)
              Create a vnode for a character device.  cdevvp() is used for the
              console and kernfs special devices.

     vfinddev(dev, vtype, vpp)
              Lookup a vnode by device number.  The vnode is returned in the
              address specified by vpp.

     vdevgone(maj, minl, minh, type)
              Reclaim all vnodes that correspond to the specified minor number
              range minl to minh (endpoints inclusive) of the specified major
              maj.

     vwakeup(bp)
              Update outstanding I/O count vp->v_numoutput for the vnode
              bp->b_vp and do a wakeup if requested and vp->v_flag has VBWAIT
              set.

     vflushbuf(vp, sync)
              Flush all dirty buffers to disk for the file with the locked
              vnode vp.  The argument sync specifies whether the I/O should be
              synchronous; if so, vflushbuf() will sleep until
              vp->v_numoutput is zero and vp->v_dirtyblkhd is empty.

     vinvalbuf(vp, flags, cred, p, slpflag, slptimeo)
              Flush out and invalidate all buffers associated with locked
              vnode vp.  The arguments p and cred specify the calling process
              and its credentials.  The ltsleep(9) flag and timeout are
              specified by the arguments slpflag and slptimeo respectively.
              If the operation is successful, zero is returned; otherwise an
              appropriate error code is returned.

     vtruncbuf(vp, lbn, slpflag, slptimeo)
              Destroy any in-core buffers past the file truncation length for
              the locked vnode vp.  The truncation length is specified by lbn.
              vtruncbuf() will sleep while the I/O is performed.  The
              ltsleep(9) flag and timeout are specified by the arguments
              slpflag and slptimeo respectively.  If the operation is
              successful, zero is returned; otherwise an appropriate error
              code is returned.

     vprint(label, vp)
              This function is used by the kernel to dump vnode information
              during a panic.  It is only used if the kernel option DIAGNOSTIC
              is compiled into the kernel.  The argument label is a string to
              prefix the information dump of vnode vp.

     This section describes places within the NetBSD source tree where actual
     code implementing or using the vnode framework can be found.  All path-
     names are relative to /usr/src.

     The vnode framework is implemented within the file sys/kern/vfs_subr.c.

     intro(9), lock(9), namecache(9), namei(9), uvm(9), vattr(9), vfs(9),
     vfsops(9), vnodeops(9), vnsubr(9)

     The locking protocol is inconsistent.  Many vnode operations are passed
     locked vnodes on entry but release the lock before they exit.  The lock-
     ing protocol is used in some places to attempt to make a series of opera-
     tions atomic (e.g., access check then operation).  This does not work for
     non-local file systems that do not support locking (e.g., NFS).  The
     vnode interface would benefit from a simpler locking protocol.

BSD                           September 22, 2001                           BSD