UVM(9) BSD Kernel Developer's Manual UVM(9)
NAME
uvm -- virtual memory system external interface
DESCRIPTION
The UVM virtual memory system manages access to the computer's memory
resources. User processes and the kernel access these resources through
UVM's external interface. UVM's external interface includes functions that:
- initialise UVM subsystems
- manage virtual address spaces
- resolve page faults
- memory map files and devices
- perform uio-based I/O to virtual memory
- allocate and free kernel virtual memory
- allocate and free physical memory
In addition to exporting these services, UVM has two kernel-level
processes: pagedaemon and swapper. The pagedaemon process sleeps until
physical memory becomes scarce. When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not been
recently used. The swapper process swaps in runnable processes that are
currently swapped out, if there is room.
UVM has a machine-independent and a machine-dependent layer. See pmap(9)
for the machine-dependent layer.
INITIALISATION
void uvm_init(void)
void uvm_init_limits(struct proc *p)
void uvm_setpagesize(void)
void uvm_swap_init(void)
The uvm_init() function sets up the UVM system at system boot time, after
the copyright has been printed. It initialises global state, the page,
map, kernel virtual memory state, machine-dependent physical map, kernel
memory allocator, pager and anonymous memory subsystems, and then enables
paging of kernel objects. uvm_init() must be called after machine-dependent
code has registered some free RAM with the uvm_page_physload() function.
The uvm_init_limits() function initialises process limits for the named
process. This is for use by the system startup for process zero, before
any other processes are created.
The uvm_setpagesize() function initialises the uvmexp members pagesize
(if not already done by machine-dependent code), pageshift and pagemask.
It should be called by machine-dependent code early in the pmap_init(9)
call.
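The relationship between the three page constants can be sketched in user-space C. This is only an illustration: struct page_consts and page_consts_init() are hypothetical names, and the rule that pagemask is pagesize - 1 and pageshift is the base-2 logarithm of pagesize is an assumption based on the usual meaning of those terms, not a copy of the kernel code.

```c
/* Hypothetical stand-in for the three uvmexp members named above. */
struct page_consts {
	int pagesize;	/* bytes per page; must be a power of two */
	int pagemask;	/* assumed: pagesize - 1 */
	int pageshift;	/* assumed: log2(pagesize) */
};

/*
 * Derive pagemask and pageshift from pagesize.
 * Returns 0 on success, -1 if pagesize is not a positive power of two.
 */
static int
page_consts_init(struct page_consts *pc, int pagesize)
{
	int shift = 0;

	if (pagesize <= 0 || (pagesize & (pagesize - 1)) != 0)
		return -1;
	while ((1 << shift) != pagesize)
		shift++;
	pc->pagesize = pagesize;
	pc->pagemask = pagesize - 1;
	pc->pageshift = shift;
	return 0;
}
```

With these definitions, an address can be rounded down to a page boundary with addr & ~pagemask and converted to a page number with addr >> pageshift.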
The uvm_swap_init() function initialises the swap subsystem.
VIRTUAL ADDRESS SPACE MANAGEMENT
int uvm_map(vm_map_t map, vaddr_t *startp, vsize_t size,
    struct uvm_object *uobj, voff_t uoffset, vsize_t alignment,
    uvm_flag_t flags)
int uvm_map_pageable(vm_map_t map, vaddr_t start, vaddr_t end,
    boolean_t new_pageable, int lockflags)
int uvm_map_pageable_all(vm_map_t map, int flags, vsize_t limit)
boolean_t uvm_map_checkprot(vm_map_t map, vaddr_t start, vaddr_t end,
    vm_prot_t protection)
int uvm_map_protect(vm_map_t map, vaddr_t start, vaddr_t end,
    vm_prot_t new_prot, boolean_t set_max)
int uvm_deallocate(vm_map_t map, vaddr_t start, vsize_t size)
struct vmspace * uvmspace_alloc(vaddr_t min, vaddr_t max,
    boolean_t pageable, boolean_t remove_holes)
void uvmspace_exec(struct proc *p, vaddr_t start, vaddr_t end)
struct vmspace * uvmspace_fork(struct process *pr)
void uvmspace_free(struct vmspace *vm)
struct vmspace * uvmspace_share(struct process *pr)
vaddr_t uvm_uarea_alloc(void)
void uvm_uarea_free(struct proc *p)
int UVM_MAPFLAG(vm_prot_t prot, vm_prot_t maxprot, vm_inherit_t inh,
    int advice, int flags)
The uvm_map() function establishes a valid mapping in map map, which must
be unlocked. The new mapping has size size, which must be a multiple of
PAGE_SIZE. If alignment is non-zero, it describes the required alignment
of the mapping, in power-of-two notation. The uobj and uoffset arguments
can have four meanings:
- When uobj is NULL and uoffset is UVM_UNKNOWN_OFFSET, uvm_map() does not
  use the machine-dependent PMAP_PREFER function.
- When uobj is NULL and uoffset is any other value, it is used as the hint
  to PMAP_PREFER.
- When uobj is not NULL and uoffset is UVM_UNKNOWN_OFFSET, uvm_map() finds
  the offset based upon the virtual address, passed as startp.
- When uobj is not NULL and uoffset is any other value, uvm_map() performs
  a normal mapping at this offset.
The start address of the new mapping is returned in startp.
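The four combinations of uobj and uoffset can be summarised as a small decision function. This is a user-space sketch for illustration only: the enum, classify_mapping() and the SKETCH_UNKNOWN_OFFSET sentinel are hypothetical, and the sentinel value is an assumption rather than the kernel's definition of UVM_UNKNOWN_OFFSET.

```c
#include <stddef.h>

/* Hypothetical sentinel standing in for UVM_UNKNOWN_OFFSET. */
#define SKETCH_UNKNOWN_OFFSET	(-1LL)

enum map_case {
	CASE_NO_PREFER,		/* no object, unknown offset: PMAP_PREFER unused */
	CASE_PREFER_HINT,	/* no object, offset given: hint to PMAP_PREFER */
	CASE_OFFSET_FROM_VA,	/* object, unknown offset: derive from address */
	CASE_NORMAL		/* object, offset given: map at that offset */
};

/* Pick the case described in the text for a given (uobj, uoffset) pair. */
static enum map_case
classify_mapping(const void *uobj, long long uoffset)
{
	if (uobj == NULL)
		return uoffset == SKETCH_UNKNOWN_OFFSET ?
		    CASE_NO_PREFER : CASE_PREFER_HINT;
	return uoffset == SKETCH_UNKNOWN_OFFSET ?
	    CASE_OFFSET_FROM_VA : CASE_NORMAL;
}
```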
The flags passed to uvm_map() are typically created using the
UVM_MAPFLAG() macro, which uses the following values. The prot and maxprot
arguments can take a combination of the following values:
#define PROT_MASK 0x07 /* protection mask */
#define PROT_NONE 0x00 /* protection none */
#define PROT_READ 0x01 /* read */
#define PROT_WRITE 0x02 /* write */
#define PROT_EXEC 0x04 /* exec */
The values that inh can take are:
#define MAP_INHERIT_MASK 0x30 /* inherit mask */
#define MAP_INHERIT_SHARE 0x00 /* "share" */
#define MAP_INHERIT_COPY 0x10 /* "copy" */
#define MAP_INHERIT_NONE 0x20 /* "none" */
#define MAP_INHERIT_ZERO 0x30 /* "zero" */
The values that advice can take are:
#define MADV_NORMAL 0x0 /* 'normal' */
#define MADV_RANDOM 0x1 /* 'random' */
#define MADV_SEQUENTIAL 0x2 /* 'sequential' */
#define MADV_MASK 0x7 /* mask */
The values that flags can take are:
#define UVM_FLAG_FIXED 0x010000 /* find space */
#define UVM_FLAG_OVERLAY 0x020000 /* establish overlay */
#define UVM_FLAG_NOMERGE 0x040000 /* don't merge map entries */
#define UVM_FLAG_COPYONW 0x080000 /* set copy_on_write flag */
#define UVM_FLAG_AMAPPAD 0x100000 /* bss: pad amap to reduce malloc() */
#define UVM_FLAG_TRYLOCK 0x200000 /* fail if we can not lock map */
#define UVM_FLAG_HOLE 0x400000 /* no backend */
The UVM_MAPFLAG macro arguments can be combined with the bitwise OR operator.
There are also some additional macros to extract bits from the flags.
The UVM_PROTECTION, UVM_INHERIT, UVM_MAXPROTECTION and UVM_ADVICE macros
return the protection, inheritance, maximum protection and advice,
respectively. uvm_map() returns a standard errno.
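The following user-space sketch shows how five fields could be packed into a single flags word and extracted again. The constant values are copied from the lists above; the SK_ macros and the two shift amounts are hypothetical and only chosen so that the fields do not overlap. They are not taken from the UVM headers, so the real UVM_MAPFLAG() layout may differ.

```c
/* Values copied from the lists above. */
#define PROT_MASK		0x07
#define PROT_READ		0x01
#define PROT_WRITE		0x02
#define PROT_EXEC		0x04
#define MAP_INHERIT_MASK	0x30
#define MAP_INHERIT_COPY	0x10
#define MADV_MASK		0x7
#define MADV_RANDOM		0x1
#define UVM_FLAG_COPYONW	0x080000

/* Assumed bit positions for this sketch only. */
#define SK_MAXPROT_SHIFT	8
#define SK_ADVICE_SHIFT		12

/* Pack protection, maximum protection, inheritance, advice and flags. */
#define SK_MAPFLAG(prot, maxprot, inh, advice, flags)		\
	((prot) | (inh) | ((maxprot) << SK_MAXPROT_SHIFT) |	\
	 ((advice) << SK_ADVICE_SHIFT) | (flags))

/* Extract the individual fields again. */
#define SK_PROTECTION(x)	((x) & PROT_MASK)
#define SK_INHERIT(x)		((x) & MAP_INHERIT_MASK)
#define SK_MAXPROTECTION(x)	(((x) >> SK_MAXPROT_SHIFT) & PROT_MASK)
#define SK_ADVICE(x)		(((x) >> SK_ADVICE_SHIFT) & MADV_MASK)
```

Kernel code would use UVM_MAPFLAG() and the UVM_PROTECTION, UVM_INHERIT, UVM_MAXPROTECTION and UVM_ADVICE macros directly rather than relying on any particular bit layout.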
The uvm_map_pageable() function changes the pageability of the pages in
the range from start to end in map map to new_pageable. The
uvm_map_pageable_all() function changes the pageability of all mapped
regions. If limit is non-zero and pmap_wired_count() is implemented,
ENOMEM is returned if the number of wired pages exceeds limit. The map is
locked on entry if lockflags contains UVM_LK_ENTER, and locked on exit if
lockflags contains UVM_LK_EXIT. uvm_map_pageable() and
uvm_map_pageable_all() return a standard errno.
The uvm_map_checkprot() function checks the protection of the range from
start to end in map map against protection. This returns either TRUE or
FALSE.
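One way to picture a protection check over a range is the user-space sketch below. It assumes the map is a sorted array of non-overlapping entries and that a range passes only if every byte is covered by an entry granting all requested bits. struct sk_entry and sk_checkprot() are hypothetical, and the treatment of unmapped holes is an assumption, not a statement about the kernel implementation.

```c
#include <stddef.h>

#define PROT_READ	0x01
#define PROT_WRITE	0x02

/* Hypothetical map entry covering [start, end) with protection prot. */
struct sk_entry {
	unsigned long start;
	unsigned long end;
	int prot;
};

/*
 * Return 1 if [start, end) is fully covered by entries that each grant
 * every bit in wanted, 0 otherwise.  Entries must be sorted by address
 * and must not overlap.
 */
static int
sk_checkprot(const struct sk_entry *e, size_t n, unsigned long start,
    unsigned long end, int wanted)
{
	size_t i;

	for (i = 0; i < n && start < end; i++) {
		if (e[i].end <= start)
			continue;		/* entry lies below the range */
		if (e[i].start > start)
			return 0;		/* unmapped hole in the range */
		if ((e[i].prot & wanted) != wanted)
			return 0;		/* a requested bit is missing */
		start = e[i].end;		/* covered up to here */
	}
	return start >= end;
}
```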
The uvm_map_protect() function changes the protection of the range from
start to end in map map to new_prot, also setting the maximum protection
of the region to new_prot if set_max is non-zero. This function returns a
standard errno.
The uvm_deallocate() function deallocates kernel memory in map map from
address start to start + size.
The uvmspace_alloc() function allocates and returns a new address space,
with ranges from min to max, setting the pageability of the address space
to pageable. If remove_holes is non-zero, hardware 'holes' in the virtual
address space will be removed from the newly allocated address space.
The uvmspace_exec() function either reuses the address space of process p
if there are no other references to it, or creates a new one with
uvmspace_alloc(). The range of valid addresses in the address space is
reset to start through end.
The uvmspace_fork() function creates and returns a new address space
based upon the address space of process pr and is typically used when
allocating an address space for a child process.
The uvmspace_free() function lowers the reference count on the address
space vm, freeing the data structures if there are no other references.
The uvmspace_share() function returns a reference to the address space of
process pr, increasing its reference count.
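The reference counting described for uvmspace_share() and uvmspace_free() follows a common pattern, sketched here in user-space C. struct sk_vmspace and the sk_ functions are hypothetical stand-ins. The sketch leaves out the locking or atomic operations a kernel would need and does not model the contents of an address space.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an address space with a reference count. */
struct sk_vmspace {
	int refcnt;
};

/* Create an address space holding a single reference. */
static struct sk_vmspace *
sk_vmspace_alloc(void)
{
	struct sk_vmspace *vm = calloc(1, sizeof(*vm));

	if (vm != NULL)
		vm->refcnt = 1;
	return vm;
}

/* Take an additional reference and return the same address space. */
static struct sk_vmspace *
sk_vmspace_share(struct sk_vmspace *vm)
{
	vm->refcnt++;
	return vm;
}

/* Drop a reference; returns 1 if the structure was freed, else 0. */
static int
sk_vmspace_free(struct sk_vmspace *vm)
{
	if (--vm->refcnt > 0)
		return 0;
	free(vm);
	return 1;
}
```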
The uvm_uarea_alloc() function allocates a thread's 'uarea', the memory
where its kernel stack and PCB are stored. The uvm_uarea_free() function
frees the uarea for thread p, which must no longer be running.
PAGE FAULT HANDLING
int uvm_fault(vm_map_t orig_map, vaddr_t vaddr, vm_fault_t fault_type,
    vm_prot_t access_type)
The uvm_fault() function is the main entry point for faults. It takes
orig_map as the map the fault originated in, vaddr as the offset into the
map at which the fault occurred, fault_type describing the type of fault,
and access_type describing the type of access requested. uvm_fault() returns
a standard errno.
MEMORY MAPPING FILES AND DEVICES
struct uvm_object * uvn_attach(struct vnode *vp, vm_prot_t accessprot)
void uvm_vnp_setsize(struct vnode *vp, voff_t newsize)
void uvm_vnp_sync(struct mount *mp)
void uvm_vnp_terminate(struct vnode *vp)
boolean_t uvm_vnp_uncache(struct vnode *vp)
The uvn_attach() function attaches a UVM object to vnode vp, creating the
object if necessary. The object is returned.
The uvm_vnp_setsize() function sets the size of vnode vp to newsize.
Caller must hold a reference to the vnode. If the vnode shrinks, pages
no longer used are discarded. This function will be removed when the
file system and VM buffer caches are merged.
The uvm_vnp_sync() function flushes dirty vnodes from either the mount
point passed in mp, or all dirty vnodes if mp is NULL. This function
will be removed when the file system and VM buffer caches are merged.
The uvm_vnp_terminate() function frees all VM resources allocated to
vnode vp. If the vnode still has references, it will not be destroyed;
however all future operations using this vnode will fail. This function
will be removed when the file system and VM buffer caches are merged.
The uvm_vnp_uncache() function prevents vnode vp from persisting once all
references are freed. This function will be removed when the file system
and UVM caches are unified. It returns TRUE if there is no active vnode.
VIRTUAL MEMORY I/O
int uvm_io(vm_map_t map, struct uio *uio)
The uvm_io() function performs the I/O described in uio on the memory
described in map.
ALLOCATION OF KERNEL MEMORY
vaddr_t uvm_km_alloc(vm_map_t map, vsize_t size)
vaddr_t uvm_km_zalloc(vm_map_t map, vsize_t size)
vaddr_t uvm_km_alloc1(vm_map_t map, vsize_t size, vsize_t align,
    boolean_t zeroit)
vaddr_t uvm_km_kmemalloc(vm_map_t map, struct uvm_object *obj,
    vsize_t size, int flags)
vaddr_t uvm_km_valloc(vm_map_t map, vsize_t size)
vaddr_t uvm_km_valloc_wait(vm_map_t map, vsize_t size)
struct vm_map * uvm_km_suballoc(vm_map_t map, vaddr_t *min, vaddr_t *max,
    vsize_t size, int flags, boolean_t fixed, vm_map_t submap)
void uvm_km_free(vm_map_t map, vaddr_t addr, vsize_t size)
void uvm_km_free_wakeup(vm_map_t map, vaddr_t addr, vsize_t size)
The uvm_km_alloc() and uvm_km_zalloc() functions allocate size bytes of
wired kernel memory in map map. In addition to allocation,
uvm_km_zalloc() zeros the memory. Both of these functions are defined as
macros in terms of uvm_km_alloc1(), and should almost always be used in
preference to uvm_km_alloc1().
The uvm_km_alloc1() function allocates and returns size bytes of wired
memory in the kernel map aligned to the align boundary, zeroing the
memory if the zeroit argument is non-zero.
The uvm_km_kmemalloc() function allocates and returns size bytes of wired
kernel memory into obj. The flags can be any of:
#define UVM_KMF_NOWAIT 0x1 /* matches M_NOWAIT */
#define UVM_KMF_VALLOC 0x2 /* allocate VA only */
#define UVM_KMF_TRYLOCK UVM_FLAG_TRYLOCK /* try locking only */
The UVM_KMF_NOWAIT flag causes uvm_km_kmemalloc() to return immediately
if no memory is available. UVM_KMF_VALLOC causes no pages to be
allocated, only a virtual address. UVM_KMF_TRYLOCK causes
uvm_km_kmemalloc() to try to lock maps without sleeping.
The uvm_km_valloc() and uvm_km_valloc_wait() functions return a newly
allocated zero-filled address in the kernel map of size size.
uvm_km_valloc_wait() will also wait for kernel memory to become
available, if there is a memory shortage.
The uvm_km_suballoc() function allocates submap (with the specified
flags, as described above) from map, creating a new map if submap is
NULL. The addresses of the submap can be specified exactly by setting
the fixed argument to non-zero, which causes the min argument to specify
the start address of the submap. If fixed is zero, any address range of
size size will be allocated from map and the start and end addresses
returned in min and max.
The uvm_km_free() and uvm_km_free_wakeup() functions free size bytes of
memory in the kernel map, starting at address addr. uvm_km_free_wakeup()
calls wakeup() on the map before unlocking the map.
ALLOCATION OF PHYSICAL MEMORY
struct vm_page * uvm_pagealloc(struct uvm_object *uobj, voff_t off,
    struct vm_anon *anon, int flags)
void uvm_pagerealloc(struct vm_page *pg, struct uvm_object *newobj,
    voff_t newoff)
void uvm_pagefree(struct vm_page *pg)
int uvm_pglistalloc(psize_t size, paddr_t low, paddr_t high,
    paddr_t alignment, paddr_t boundary, struct pglist *rlist, int nsegs,
    int flags)
void uvm_pglistfree(struct pglist *list)
void uvm_page_physload(paddr_t start, paddr_t end, paddr_t avail_start,
    paddr_t avail_end, int free_list)
The uvm_pagealloc() function allocates a page of memory at offset off in
either the object uobj or the anonymous memory anon, or returns NULL if no
pages are free. Only one of anon and uobj can be non-NULL. The flags can
be any of:
#define UVM_PGA_USERESERVE 0x0001 /* ok to use reserve pages */
#define UVM_PGA_ZERO 0x0002 /* returned page must be zeroed */
The UVM_PGA_USERESERVE flag means to allocate a page even if that will
result in the number of free pages being lower than
uvmexp.reserve_pagedaemon (if the current thread is the pagedaemon) or
uvmexp.reserve_kernel (if the current thread is not the pagedaemon). The
UVM_PGA_ZERO flag causes the returned page to be filled with zeroes,
either by allocating it from a pool of pre-zeroed pages or by zeroing it
in-line as necessary.
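The effect of UVM_PGA_USERESERVE can be modelled with a small predicate. This is a simplified user-space reading of the paragraph above, not the kernel's allocation logic: sk_may_alloc() is hypothetical, and the rule that a request without the flag is refused whenever it would push the free count below the applicable reserve is an assumption.

```c
#define UVM_PGA_USERESERVE	0x0001	/* value copied from the list above */

/*
 * Decide whether a single page may be allocated.
 * free is the current number of free pages; the two reserve arguments
 * mirror uvmexp.reserve_pagedaemon and uvmexp.reserve_kernel.
 * Returns 1 if the allocation may proceed, 0 otherwise.
 */
static int
sk_may_alloc(int free, int reserve_pagedaemon, int reserve_kernel,
    int is_pagedaemon, int flags)
{
	int reserve = is_pagedaemon ? reserve_pagedaemon : reserve_kernel;

	if (free <= 0)
		return 0;		/* nothing left to hand out */
	if (flags & UVM_PGA_USERESERVE)
		return 1;		/* allowed to dip into the reserve */
	return free - 1 >= reserve;	/* must leave the reserve intact */
}
```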
The uvm_pagerealloc() function reallocates page pg to a new object
newobj, at a new offset newoff.
The uvm_pagefree() function frees the physical page pg.
The uvm_pglistalloc() function allocates a list of pages for size bytes
under various constraints. low and high describe the lowest and highest
addresses acceptable for the list. If alignment is non-zero, it describes
the required alignment of the list, in power-of-two notation. If boundary
is non-zero, no segment of the list may cross this power-of-two boundary,
relative to zero. nsegs is the maximum number of physically contiguous
segments. The allocated memory is returned in the rlist list. The flags
can be any of:
#define UVM_PLA_WAITOK 0x0001 /* may sleep */
#define UVM_PLA_NOWAIT 0x0002 /* can't sleep */
#define UVM_PLA_ZERO 0x0004 /* zero all pages before returning */
The UVM_PLA_WAITOK flag means that the function may sleep while trying to
allocate the list of pages (this is currently ignored). Conversely, the
UVM_PLA_NOWAIT flag signifies that the function may not sleep while
allocating. It is an error not to provide one of the above flags.
Optionally, one may also specify the UVM_PLA_ZERO flag to receive zeroed
memory in the page list.
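The range, alignment and boundary constraints of uvm_pglistalloc() can be expressed as a check on one candidate segment. The sketch below is user-space C for illustration: sk_segment_ok() is hypothetical, high is assumed to be an inclusive upper bound, and alignment and boundary are assumed to be either zero or a power of two.

```c
#include <stdint.h>

/*
 * Return 1 if the physical segment [start, start + size) satisfies the
 * constraints, 0 otherwise.  A zero alignment or boundary means that
 * the corresponding constraint is not applied.
 */
static int
sk_segment_ok(uint64_t start, uint64_t size, uint64_t low, uint64_t high,
    uint64_t alignment, uint64_t boundary)
{
	uint64_t last;

	if (size == 0)
		return 0;
	last = start + size - 1;

	/* The whole segment must lie within [low, high]. */
	if (start < low || last > high)
		return 0;

	/* The start must be a multiple of the alignment. */
	if (alignment != 0 && (start & (alignment - 1)) != 0)
		return 0;

	/* First and last byte must fall in the same boundary-sized block. */
	if (boundary != 0 &&
	    (start & ~(boundary - 1)) != (last & ~(boundary - 1)))
		return 0;

	return 1;
}
```

For example, a two-page segment starting at 0xf000 fails a boundary of 0x10000 because it straddles the address 0x10000, while the same segment starting at 0x10000 passes.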
The uvm_pglistfree() function frees the list of pages pointed to by list.
The uvm_page_physload() function loads physical memory segments into VM
space on the specified free_list. uvm_page_physload() must be called at
system boot time to set up physical memory management pages. The
arguments describe the start and end of the physical addresses of the
segment, and the available start and end addresses of pages not already in
use.
PROCESSES
void uvm_pageout(void *arg)
The uvm_pageout() function is the main loop for the page daemon. The arg
argument is ignored.
MISCELLANEOUS FUNCTIONS
struct uvm_object * uao_create(vsize_t size, int flags)
void uao_detach(struct uvm_object *uobj)
void uao_reference(struct uvm_object *uobj)
boolean_t uvm_chgkprot(caddr_t addr, size_t len, int rw)
void uvm_kernacc(caddr_t addr, size_t len, int rw)
void uvm_vslock(struct proc *p, caddr_t addr, size_t len,
    vm_prot_t access_type)
void uvm_vsunlock(struct proc *p, caddr_t addr, size_t len)
void uvm_meter()
int uvm_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp,
    void *newp, size_t newlen, struct proc *p)
int uvm_grow(struct proc *p, vaddr_t sp)
int uvm_coredump(struct proc *p, struct vnode *vp, struct ucred *cred,
    struct core *chdr)
The uao_create(), uao_detach() and uao_reference() functions operate on
anonymous memory objects, such as those used to support System V shared
memory. uao_create() returns an object of size size with flags:
#define UAO_FLAG_KERNOBJ 0x1 /* create kernel object */
#define UAO_FLAG_KERNSWAP 0x2 /* enable kernel swap */
which can only be used once each at system boot time. uao_reference()
creates an additional reference to the named anonymous memory object.
uao_detach() removes a reference from the named anonymous memory object,
destroying it if removing the last reference.
The uvm_chgkprot() function changes the protection of kernel memory from
addr to addr + len to the value of rw. This is primarily useful for
debuggers, for setting breakpoints. This function is only available with
The uvm_kernacc() function checks the access at address addr to addr +
len for rw access, in the kernel address space.
The uvm_vslock() and uvm_vsunlock() functions control the wiring and
unwiring of pages for process p from addr to addr + len. The access_type
argument of uvm_vslock() is passed to uvm_fault(). These functions are
normally used to wire memory for I/O.
The uvm_meter() function calculates the load average and wakes up the
swapper if necessary.
The uvm_sysctl() function provides support for the CTL_VM domain of the
sysctl(3) hierarchy. uvm_sysctl() handles the VM_LOADAVG, VM_METER and
VM_UVMEXP calls, which return the current load averages, calculate
current VM totals, and return the uvmexp structure, respectively. The load
averages are accessed from userland using the getloadavg(3) function.
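From userland, the load averages can be read with getloadavg(3) as in the sketch below. The helper name print_loadavg() is hypothetical, and the _DEFAULT_SOURCE definition is only an assumption about what some C libraries need in order to declare getloadavg(); it should be harmless on BSD systems.

```c
#define _DEFAULT_SOURCE		/* assumed to be needed on some libcs */
#include <stdio.h>
#include <stdlib.h>

/*
 * Print the 1, 5 and 15 minute load averages on one line.
 * Returns the number of samples retrieved, or -1 on failure.
 */
static int
print_loadavg(void)
{
	double avg[3];
	int i, n;

	n = getloadavg(avg, 3);
	if (n == -1) {
		fprintf(stderr, "getloadavg: load averages unavailable\n");
		return -1;
	}
	for (i = 0; i < n; i++)
		printf("%.2f%s", avg[i], i + 1 < n ? " " : "\n");
	return n;
}
```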
The uvmexp structure has all global state of the UVM system, and has the
following members:
/* vm_page constants */
int pagesize; /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask; /* page mask */
int pageshift; /* page shift */
/* vm_page counters */
int npages; /* number of pages we manage */
int free; /* number of free pages */
int active; /* number of active pages */
int inactive; /* number of pages that we free'd but may want back */
int paging; /* number of pages in the process of being paged out */
int wired; /* number of wired pages */
int zeropages; /* number of zero'd pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel; /* number of pages reserved for kernel */
int anonpages; /* number of pages used by anon pagers */
int vnodepages; /* number of pages used by vnode page cache */
int vtextpages; /* number of pages used by vtext vnodes */
/* pageout params */
int freemin; /* min number of free pages */
int freetarg; /* target number of free pages */
int inactarg; /* target number of inactive pages */
int wiredmax; /* max number of wired pages */
int anonmin; /* min threshold for anon pages */
int vtextmin; /* min threshold for vtext pages */
int vnodemin; /* min threshold for vnode pages */
int anonminpct; /* min percent anon pages */
int vtextminpct;/* min percent vtext pages */
int vnodeminpct;/* min percent vnode pages */
/* swap */
int nswapdev; /* number of configured swap devices in system */
int swpages; /* number of PAGE_SIZE'ed swap pages */
int swpginuse; /* number of swap pages in use */
int swpgonly; /* number of swap pages in use, not also in RAM */
int nswget; /* number of times fault calls uvm_swap_get() */
int nanon; /* number total of anon's in system */
int nanonneeded;/* number of anons currently needed */
int nfreeanon; /* number of free anon's */
/* stat counters */
int faults; /* page fault count */
int traps; /* trap count */
int intrs; /* interrupt count */
int swtch; /* context switch count */
int softs; /* software interrupt count */
int syscalls; /* system calls */
int pageins; /* pagein operation count */
/* pageouts are in pdpageouts below */
int swapins; /* swapins */
int swapouts; /* swapouts */
int pgswapin; /* pages swapped in */
int pgswapout; /* pages swapped out */
int forks; /* forks */
int forks_ppwait; /* forks where parent waits */
int forks_sharevm; /* forks where vmspace is shared */
int pga_zerohit; /* pagealloc where zero wanted and zero
was available */
int pga_zeromiss; /* pagealloc where zero wanted and zero
not available */
int zeroaborts; /* number of times page zeroing was aborted */
/* fault subcounters */
int fltnoram; /* number of times fault was out of ram */
int fltnoanon; /* number of times fault was out of anons */
int fltpgwait; /* number of times fault had to wait on a page */
int fltpgrele; /* number of times fault found a released page */
int fltrelck; /* number of times fault relock called */
int fltrelckok; /* number of times fault relock is a success */
int fltanget; /* number of times fault gets anon page */
int fltanretry; /* number of times fault retries an anon get */
int fltamcopy; /* number of times fault clears "needs copy" */
int fltnamap; /* number of times fault maps a neighbor anon page */
int fltnomap; /* number of times fault maps a neighbor obj page */
int fltlget; /* number of times fault does a locked pgo_get */
int fltget; /* number of times fault does an unlocked get */
int flt_anon; /* number of times fault anon (case 1a) */
int flt_acow; /* number of times fault anon cow (case 1b) */
int flt_obj; /* number of times fault is on object page (2a) */
int flt_prcopy; /* number of times fault promotes with copy (2b) */
int flt_przero; /* number of times fault promotes with zerofill (2b) */
/* daemon counters */
int pdwoke; /* number of times daemon woke up */
int pdrevs; /* number of times daemon rev'd clock hand */
int pdswout; /* number of times daemon called for swapout */
int pdfreed; /* number of pages daemon freed since boot */
int pdscans; /* number of pages daemon scanned since boot */
int pdanscan; /* number of anonymous pages scanned by daemon */
int pdobscan; /* number of object pages scanned by daemon */
int pdreact; /* number of pages daemon reactivated since boot */
int pdbusy; /* number of times daemon found a busy page */
int pdpageouts; /* number of times daemon started a pageout */
int pdpending; /* number of times daemon got a pending pageout */
int pddeact; /* number of pages daemon deactivates */
int pdreanon; /* anon pages reactivated due to min threshold */
int pdrevnode; /* vnode pages reactivated due to min threshold */
int pdrevtext; /* vtext pages reactivated due to min threshold */
int fpswtch; /* FPU context switches */
int kmapent; /* number of kernel map entries */
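As an illustration of how the page constants and counters combine, the user-space sketch below derives two figures from a few members. struct sk_uvmexp is a hypothetical subset of the members listed above; a real consumer would obtain the full structure through the VM_UVMEXP sysctl rather than fill it in by hand.

```c
#include <stdint.h>

/* Hypothetical subset of the uvmexp members listed above. */
struct sk_uvmexp {
	int pagesize;	/* size of a page in bytes */
	int pageshift;	/* log2 of pagesize */
	int npages;	/* number of pages managed */
	int free;	/* number of free pages */
};

/* Free physical memory in bytes: the free page count scaled by page size. */
static uint64_t
sk_free_bytes(const struct sk_uvmexp *u)
{
	return (uint64_t)u->free << u->pageshift;
}

/* Integer percentage of managed pages that are currently free. */
static int
sk_free_percent(const struct sk_uvmexp *u)
{
	if (u->npages <= 0)
		return 0;
	return (int)((int64_t)u->free * 100 / u->npages);
}
```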
The uvm_grow() function increases the stack segment of process p to
include sp.
The uvm_coredump() function generates a coredump on vnode vp for process
p with credentials cred and core header description in chdr.
NOTES
The structure and types whose names begin with ``vm_'' were named so UVM
could coexist with BSD VM during the early development stages.
SEE ALSO
getloadavg(3), kvm(3), sysctl(3), ddb(4), options(4), pmap(9)
Charles D. Cranor, Design and Implementation of the UVM Virtual Memory
System, D.Sc. dissertation, Department of Computer Science, Sever
Institute of Technology, Washington University, St. Louis, Missouri,
HISTORY
The UVM virtual memory system was developed at Washington University in
St. Louis. UVM's roots lie partly in the Mach-based 4.4BSD VM system,
the FreeBSD VM system, and the SunOS4 VM system. UVM's basic structure
is based on the 4.4BSD VM system. UVM's new anonymous memory system is
based on the anonymous memory system found in the SunOS4 VM (as described
in papers published by Sun Microsystems, Inc.). UVM also includes a
number of features new to BSD including page loanout, map entry passing,
simplified copy-on-write, and clustered anonymous memory pageout.
UVM appeared in OpenBSD 2.9.
AUTHORS
Charles D. Cranor <email@example.com> designed and implemented UVM.
Matthew Green <firstname.lastname@example.org> wrote the swap-space management code.
Chuck Silvers <email@example.com> implemented the aobj pager, thus allowing
UVM to support System V shared memory and process swapping.
Artur Grabowski <firstname.lastname@example.org> handled the logistical issues involved
with merging UVM into the OpenBSD source tree.
BSD January 15, 2015 BSD