You can use the kernel debugger’s !vad command to view the VADs for a given process. First find the address of the root of the VAD tree with the !process command. Then specify that address to the !vad command, as shown in the following example of the VAD tree for a process running Notepad.exe:

    lkd> !process 0 1 notepad.exe
    PROCESS 8718ed90 SessionId: 1 Cid: 1ea68 Peb: 7ffdf000 ParentCid: 0680
        DirBase: ce2aa880 ObjectTable: ee6e01b0 HandleCount: 48.
        Image: notepad.exe
        VadRoot 865f10e0 Vads 51 Clone 0 Private 210. Modified 0. Locked 0.

    lkd> !vad 865f10e0
    VAD      level  start  end  commit
    8a05bf88 ( 6)      10   1f       0 Mapped  READWRITE
    88390ad8 ( 5)      20   20       1 Private READWRITE
    87333740 ( 6)      30   33       0 Mapped  READONLY
    86d09d10 ( 4)      40   41       0 Mapped  READONLY
    882b49a0 ( 6)      50   50       1 Private READWRITE
    ...
    Total VADs: 51 average level: 5 maximum depth: 6
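To make the start and end columns easier to read, the following Python sketch converts one row of !vad output into a byte range. It rests on two assumptions that the text above does not state: that the start and end columns are hexadecimal virtual page numbers, and that the page size is 4 KB (as on x86). The helper name vad_range is hypothetical and is not part of any debugger API.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages, as on x86


def vad_range(start_vpn_hex, end_vpn_hex, page_size=PAGE_SIZE):
    """Convert a VAD's start/end page numbers (hex strings) to a byte range.

    Returns (first_address, last_address, size_in_bytes). The end page is
    treated as inclusive, so a VAD with start == end covers exactly one page.
    """
    start_vpn = int(start_vpn_hex, 16)
    end_vpn = int(end_vpn_hex, 16)
    first_address = start_vpn * page_size
    last_address = (end_vpn + 1) * page_size - 1
    size_in_bytes = (end_vpn - start_vpn + 1) * page_size
    return first_address, last_address, size_in_bytes


# First row of the example output: start 10, end 1f.
first, last, size = vad_range("10", "1f")
print(hex(first), hex(last), size)
```

Under these assumptions, the first VAD in the example (start 10, end 1f) spans 16 pages, and the single-page private VADs (start 20, end 20) span 4 KB each.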

Rotate VADs

A video card driver must typically copy data from the user-mode graphics application to other regions of system memory, including the video card memory and the AGP port’s memory, both of which have different caching attributes as well as different addresses. To allow these different views of memory to be mapped quickly into a process, and to support the different cache attributes, the memory manager implements rotate VADs, which allow video drivers to transfer data directly by using the GPU and to rotate unneeded memory in and out of the process view pages on demand. Figure 10-33 shows an example of how the same virtual address can rotate between video RAM and virtual memory.

Figure 10-33. Rotate virtual address descriptors

NUMA

Each new release of Windows enhances the memory manager to make better use of Non-Uniform Memory Architecture (NUMA) machines, such as large server systems (but also Intel i7 and AMD Opteron SMP workstations). The NUMA support in the memory manager tracks node information such as location, topology, and access costs so that applications and drivers can take advantage of NUMA capabilities while the underlying hardware details remain abstracted.

When the memory manager is initializing, it calls the MiComputeNumaCosts function to perform various page and cache operations on different nodes and then measures the time those operations took to complete. Based on this information, it builds a node graph of access costs (the distance between a node and any other node on the system). When the system requires pages for a given operation, it consults the graph to choose the optimal node (that is, the closest). If no memory is available on that node, it chooses the next closest node, and so on.
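The closest-node-first policy described above can be illustrated with a small Python model. This is only a sketch of the idea, not the memory manager's implementation: the cost matrix, the free-page counters, and the function names are all hypothetical, and ties between equally distant nodes are broken by node number purely for determinism.

```python
def preference_order(cost, node):
    """Return every node index sorted by access cost from `node`, closest first.

    `cost[a][b]` is a hypothetical measured access cost from node a to node b.
    """
    return sorted(range(len(cost)), key=lambda other: (cost[node][other], other))


def allocate_page(cost, free_pages, ideal_node):
    """Take one page from the closest node to `ideal_node` that has free memory.

    Returns the node the page came from, or None if every node is exhausted.
    `free_pages` is a mutable list of per-node free-page counts.
    """
    for node in preference_order(cost, ideal_node):
        if free_pages[node] > 0:
            free_pages[node] -= 1
            return node
    return None


# Hypothetical three-node system: a node is cheapest to itself.
cost = [
    [10, 20, 40],
    [20, 10, 30],
    [40, 30, 10],
]
free_pages = [0, 1, 1]  # node 0 is already out of memory

print(allocate_page(cost, free_pages, ideal_node=0))
print(allocate_page(cost, free_pages, ideal_node=0))
```

In this model, a request whose ideal node is 0 is served from node 1 first (the next closest) and only then from node 2, mirroring the fallback order the text describes.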

Although the memory manager ensures that, whenever possible, memory allocations come from the ideal processor’s node (the ideal node) of the thread making the allocation, it also provides functions that allow applications to choose their own node, such as the VirtualAllocExNuma, CreateFileMappingNuma, MapViewOfFileExNuma, and AllocateUserPhysicalPagesNuma APIs.

The ideal node isn’t used only when applications allocate memory but also during kernel operation and page faults. For example, when a thread is running on a nonideal processor and takes a page fault, the memory manager won’t use the current node but will instead allocate memory from the thread’s ideal node. Although this might result in slower access time while the thread is still running on this CPU, overall memory access will be optimized as the thread migrates back to its ideal node. In any case, if the ideal node is out of resources, the closest node to the ideal node is chosen and not a random other node. Just like user-mode applications, however, drivers can specify their own node when using APIs such as MmAllocatePagesForMdlEx or MmAllocateContiguousMemorySpecifyCacheNode.

Various memory manager pools and data structures are also optimized to take advantage of NUMA nodes. The memory manager tries to use physical memory evenly from all the nodes on the system to hold the nonpaged pool. When a nonpaged pool allocation is made, the memory manager looks at the ideal node and uses it as an index to choose a virtual memory address range inside nonpaged pool that corresponds to physical memory belonging to this node. In addition, per-NUMA-node pool free lists are created to use these types of memory configurations efficiently. Apart from nonpaged pool, the system cache, system PTEs, and the memory manager’s look-aside lists are similarly allocated across all nodes.
