Capturing Process Memory Usage Under Linux

July 2009

Contents

Introduction

Pages and Mappings

PageMap Reports

Processes and Components

Process Memory Metrics

PageMap Tools

Attachments


Introduction

Recently we here at EQware had occasion to want detailed information about process memory usage on an embedded Linux system. Naturally we turned to the web to discover what tools and techniques the community might have to offer. What we found surprised us: apparently it's considered hard to get good per-process memory information under Linux! We read any number of discussions of why ps and top give misleading results (they report RSS memory mappings), and why /proc/pid/maps and /proc/pid/smaps aren't the answer either (they report virtual memory mappings).

Just as we were scratching our heads and trying to decide what to do next, we bumped into "ELC: How much memory are applications really using?", a discussion of a set of patches written by Matt Mackall which provided precisely what we needed: a way of relating virtual mappings to physical pages. His proposed method uses the /proc filing system to expose some new windows into Linux memory management:

 /proc/kpagecount — A binary array of 64-bit words, one for each page of physical RAM, containing the current count of mappings for that page.

 /proc/kpageflags — A binary array of 64-bit words, one for each page of physical RAM, containing a set of flag bits for that page.

 /proc/pid/pagemap — A binary array of 64-bit words, one for each page in process pid's virtual address space, containing the physical page frame number of the mapped page together with flag bits indicating whether the page is present in RAM or swapped out.
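
As a rough illustration of how these binary arrays can be read, the sketch below decodes a single 64-bit pagemap word in Python. The bit positions (bit 63 for "present", bit 62 for "swapped", bits 0-54 for the page frame number) are taken from the kernel's pagemap documentation and have varied slightly between kernel versions, so treat them as an assumption to check against your own kernel. The function names are ours, invented for this example; they are not part of any tool described here.

```python
import struct

PRESENT_BIT = 1 << 63        # page is resident in RAM
SWAPPED_BIT = 1 << 62        # page has been swapped out
PFN_MASK = (1 << 55) - 1     # bits 0-54: page frame number (assumed layout)
ENTRY_SIZE = 8               # every entry in these files is one 64-bit word


def decode_pagemap_entry(raw):
    """Decode one 8-byte pagemap entry into (present, swapped, pfn)."""
    (value,) = struct.unpack("=Q", raw)   # native byte order, 64-bit unsigned
    present = bool(value & PRESENT_BIT)
    swapped = bool(value & SWAPPED_BIT)
    # The PFN field is only meaningful when the page is present.
    pfn = (value & PFN_MASK) if present else None
    return present, swapped, pfn


def kpage_offset(pfn):
    """Byte offset of a page's word in /proc/kpagecount or /proc/kpageflags."""
    return pfn * ENTRY_SIZE
```

Given a page frame number from pagemap, kpage_offset() says where to seek in /proc/kpagecount to find how many times that physical page is mapped. On recent kernels the page frame number is reported as zero to unprivileged readers, so a collector of this kind generally has to run as root.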

Using these new resources in conjunction with /proc/pid/maps, it becomes possible to pin down exactly which processes are mapping which physical pages. Better still, this feature incurs no overhead – all the author has done is make visible some state information which was already being kept by the kernel. The method does have one drawback: taking a memory “snapshot” requires the reading of a lot of files. Depending on the size of RAM, the number and complexity of processes running, and the CPU load, that can take a long time – up to 30 seconds or more on our target platform (though 5-10 seconds was more typical).
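
To make the relationship between /proc/pid/maps and /proc/pid/pagemap concrete, here is a minimal Python sketch of the lookup for one region. It is not the code of our tools; the function names and the default 4096-byte page size are assumptions made for the example. Each maps line gives a virtual address range, and the pagemap entries for that range sit at a byte offset equal to the virtual page number times eight.

```python
import struct

PAGEMAP_ENTRY_SIZE = 8  # each pagemap entry is one 64-bit word


def parse_maps_line(line):
    """Split one /proc/pid/maps line into (start, end, permissions, path)."""
    fields = line.split(None, 5)
    start_text, end_text = fields[0].split("-")
    # The sixth field (the path) is absent for anonymous regions.
    path = fields[5].strip() if len(fields) > 5 else ""
    return int(start_text, 16), int(end_text, 16), fields[1], path


def pagemap_span(start, end, page_size=4096):
    """Return (byte offset, entry count) of a region's entries in pagemap."""
    return (start // page_size) * PAGEMAP_ENTRY_SIZE, (end - start) // page_size


def read_region_entries(pid, start, end, page_size=4096):
    """Read the raw 64-bit pagemap words covering [start, end) for one process."""
    offset, count = pagemap_span(start, end, page_size)
    with open("/proc/%d/pagemap" % pid, "rb") as pagemap:
        pagemap.seek(offset)
        data = pagemap.read(count * PAGEMAP_ENTRY_SIZE)
    usable = len(data) - len(data) % PAGEMAP_ENTRY_SIZE  # tolerate short reads
    return list(struct.unpack("=%dQ" % (usable // PAGEMAP_ENTRY_SIZE), data[:usable]))
```

Repeating this for every region of every process is what makes a snapshot slow: the number of reads grows with the total number of mapped regions on the system.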

It's hard to call a data set collected over so long an interval a “snapshot”, all the more so since any time-skew in data collection might result in inconsistencies as the page mappings change. Even so, Mr. Mackall's patch, which we shall refer to as the PageMap feature, provides a vastly better picture of process memory usage under Linux than has ever been available before.

Good news: as of Linux kernel version 2.6.25, the PageMap feature has been incorporated into the kernel. Bad news: up until kernel version 2.6.28.8, it contained two egregious bugs. The bugs are easy to fix (the patch is given below), and it's rather surprising they survived so long. It seems the good news about Linux process memory measurement has been slow to get around. Once we understood the capabilities the PageMap feature offered, we rushed back to the web expecting to find any number of tools and scripts which made use of it. What we found was... nothing. Maybe we didn't search hard enough, but Mr. Mackall's original Python scripts, which we had already judged didn't meet our needs, were pretty much it in the tools department.

So we did what any good Linux programmer would do: we wrote our own. It is these PageMap tools, and the results they provide, that are the topics of this article.

Pages and Mappings

All fundamental memory management in Linux is carried out in pages, fixed-size contiguous blocks of physical memory. On i386 systems the page size is typically 4 KB, but you can enter "getconf PAGE_SIZE" at the command line to find out the page size on your platform. Most of the memory left unused after the kernel loads itself is turned over to the kernel's page allocator, which parcels it out in page-sized chunks, back to the kernel or to user processes.
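
The same page size can be queried from a program. The short Python sketch below does so and converts a page count into kilobytes, which is handy when reading reports that are expressed in pages; the helper names are ours, chosen for this example.

```python
import os


def page_size_bytes():
    """Return the platform's page size, as 'getconf PAGE_SIZE' would report it."""
    return os.sysconf("SC_PAGE_SIZE")


def pages_to_kib(pages, page_size=None):
    """Convert a page count to kilobytes (1024-byte units)."""
    if page_size is None:
        page_size = page_size_bytes()
    return pages * page_size / 1024
```

For instance, with 4096-byte pages, pages_to_kib(2644, 4096) gives 10576.0, a little over 10 MB.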

Processes allocate memory under several circumstances. When an application is loaded, a new process is created and pages are allocated for the application image itself, plus pages for any libraries it loads, plus more pages for any static data the application has declared. Once the application is running it will reserve stack pages, and it may call malloc() or new, allocating yet more pages.

User programs run in virtual memory, so allocating a page involves creating a mapping from a user virtual address to the page's physical address. Given the ability to see the page mappings on a per-process basis (which we have, via /proc/pid/maps) it might seem obvious to simply count the mapped pages and thereby tell how much memory the process is using. A variant of this is how top and ps and several other standard Linux utilities do in fact report memory usage.

But page mappings can be made without actually allocating memory. The means by which libraries are shared is by allowing multiple mappings of the library pages. Not all of the pages of a shared library need be mapped by any process at all, and different applications may map different parts of the same library. Finally, during normal operation some number of pages which were originally mapped, allocated and loaded will get swapped out. The mapping and allocation remain, but the page is not loaded in memory. These facts make a simple count of mappings a very unreliable indicator of per-process memory usage.
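
The gap between "mapped" and "actually in memory" can be seen by sorting the pages of a region into three groups. The Python sketch below takes the 64-bit pagemap words for one region and counts how many pages are resident, how many are swapped out, and how many have a mapping but no page behind it. The bit positions are assumed from the kernel's pagemap documentation, and the function name is hypothetical.

```python
PRESENT_BIT = 1 << 63   # page is resident in RAM (assumed bit position)
SWAPPED_BIT = 1 << 62   # page is swapped out (assumed bit position)


def classify_region(entries):
    """Count resident, swapped and never-populated pages in one region.

    'entries' is a sequence of 64-bit pagemap words, one per virtual page.
    """
    counts = {"resident": 0, "swapped": 0, "untouched": 0}
    for value in entries:
        if value & PRESENT_BIT:
            counts["resident"] += 1
        elif value & SWAPPED_BIT:
            counts["swapped"] += 1
        else:
            counts["untouched"] += 1   # mapped, but no physical page behind it
    return counts
```

A region of five virtual pages might well come back as two resident, one swapped and two untouched, which is why counting mappings alone overstates memory use.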

PageMap Reports

Below is a fragment, the upper-left corner, of a human-readable Process/Component Matrix report, the most inclusive of the reports generated by the PageMap tools:

                                                        1     2     3     4
 PID - PROCESS NAME        :   USS      PSS   RSS :  heap libwe  ANON Xfbde
---- - ------------------- - ----- -------- ----- - ----- ----- ----- -----
2124 - app                 :  2644  3128.21  4120 :   793   760    46     0
2074 - /usr/bin/Xfbdev     :   528   567.46   736 :   120     0    86   195
2127 - mb-applet-volume    :   343   660.23  1426 :   256     0    12     0
2113 - matchbox-panel      :   173   217.60   591 :   117     0     8     0
2130 - mb-applet-menu-laun :   155   196.25   565 :   102     0     9     0
---- - ------------------- - ----- -------- ----- - ----- ----- ----- -----
2125 - mb-applet-clock     :   149   215.82   718 :    86     0    10     0
2107 - matchbox-window-man :   147   190.13   543 :    68     0     9     0
2121 - /usr/bin/appinitd   :   122   202.40   604 :    48     0    12     0
1970 - avahi-daemon: runni :   116   133.91   328 :    13     0     8     0
2123 - /usr/libexec/gconfd :   104   143.09   440 :    48     0     7     0

All numeric values (except PIDs) are page-counts. Each running process occupies one row, and rows are arranged in decreasing USS order. For each process, the first five columns give the process ID and name, and the USS, PSS and RSS of the process.

The remaining columns give the count of pages mapped by the process from each component, with the components arranged in decreasing RSS order. Remember that the table above is only a fragment of a real Process/Component Matrix report, which might well have 200 or more columns. Column names are limited to 5 characters, but a separate table following the matrix gives an indexed, ordered pairing of these shortened names and the original, full names. (In the alternate "comma-separated" output format, column names are not truncated.)

Processes and Components

A process, in this context, is an ordinary Linux process, the kind you'd see on a ps report. Processes represent some body of code – an application or a daemon – which is active in memory and has resources allocated to it. Note that the kernel itself and anything linked to it, such as device drivers, are not processes, and will not appear on this list.

A component is a set of one or more memory regions attributable to the same source. Multiple regions associated with a single source typically have different characteristics – one region may be read-only, another may be read-write, a third may be execute-only.

Most components are “file-backed” – their content is attributable to (and indeed directly mapped to) a file in the filing system. For example, the first process in the example above is app. Somewhere out on the right in the header row will be the name of the file containing the app executable's binary image, /usr/bin/app – that's one component of the app process, and it will account for some portion of the process's total page-count. Other components include stack and heap pages (which don't have any file backing, and are hence anonymous), and many, many shared library pages. Each of these has a separate column – every “provider” of mapped memory pages is represented by a column.
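
One plausible way to group regions into components is by the path field of /proc/pid/maps, as in the Python sketch below. This is an illustration only: the naming rule (bracketed names such as [heap] lose their brackets, regions with no path become ANON) is an assumption modelled on the column names in the report above, not a description of how our analyzer is written.

```python
from collections import defaultdict


def component_name(path):
    """Map the path field of a maps line to a component name (assumed rule)."""
    if not path:
        return "ANON"                      # no backing file and no label
    if path.startswith("[") and path.endswith("]"):
        return path[1:-1]                  # "[heap]" -> "heap", "[stack]" -> "stack"
    return path                            # file-backed: executable or library


def pages_per_component(regions):
    """Sum resident page counts per component.

    'regions' is an iterable of (path, resident_pages) pairs, one per region.
    """
    totals = defaultdict(int)
    for path, resident_pages in regions:
        totals[component_name(path)] += resident_pages
    return dict(totals)
```

Several regions with the same path (for example the read-only and read-write parts of one library) are folded into a single component, which matches the definition of a component given above.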

Process Memory Metrics

For each process, 3 memory metrics are given in addition to the per-component page mapping counts. These are:

 USS (Unique Set Size) — Indicates the number of pages a process has mapped which are mapped only by that process – which are not mapped by any other process. This is the amount of memory which would be recovered if this process were to be unloaded.

 PSS (Proportional Set Size) — The count of all pages mapped uniquely by the process, plus a fraction of each shared page, said fraction being the reciprocal of the number of processes which have mapped the page. For example, if a process mapped 1 page that was not mapped by any other process (USS=1), and it mapped a second page which was also mapped by one other process, the PSS value reported would be 1.5. If there were three users of the shared page, the reported PSS would be 1.33.

 RSS (Resident Set Size) — This is currently the most common – but much disparaged – measure of per-process memory under Linux, reported by utilities such as ps and top. It indicates the total count of pages mapped by the specified process, including 100% of all shared pages. For each process shown in a Process/Component Matrix, the sum of the page counts in all of the columns in its row is equal to the RSS value for the process.
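
The three definitions above can be sketched in a few lines of Python. The function below is a simplified model, not our analyzer's code: it assumes each process is described by the set of physical page frame numbers it has resident, and it counts sharers per process rather than per mapping. The name memory_metrics is made up for this example.

```python
from collections import Counter


def memory_metrics(pfns_by_pid):
    """Compute (USS, PSS, RSS) in pages for each process.

    'pfns_by_pid' maps a PID to a collection of resident page frame numbers.
    """
    page_sets = {pid: set(pfns) for pid, pfns in pfns_by_pid.items()}

    # How many processes have each physical page mapped.
    sharers = Counter()
    for pages in page_sets.values():
        sharers.update(pages)

    metrics = {}
    for pid, pages in page_sets.items():
        uss = sum(1 for pfn in pages if sharers[pfn] == 1)   # pages only we map
        pss = sum(1.0 / sharers[pfn] for pfn in pages)       # 1/n of each page
        rss = len(pages)                                     # every page, in full
        metrics[pid] = (uss, pss, rss)
    return metrics
```

Feeding in the worked example from the PSS definition, memory_metrics({1: [100, 200], 2: [200]}) yields USS 1, PSS 1.5 and RSS 2 for process 1; adding a third process that also maps page 200 lowers its PSS to about 1.33.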

PageMap Tools

The PageMap tools consist of two separate command-line utilities:

 page-collect.c — Collects the memory “snapshot”; runs on the target platform.

 page-analyze.cpp — Analyzes the memory “snapshot” and generates reports; runs on any platform.

page-collect is a very simple utility. It would have been even simpler to have written it in shell script, but that would have slowed it down, and taking a snapshot already takes longer than is really desirable. page-collect offers very few options. Here is the result of entering page-collect --help:

page-collect -- collect a snapshot each of of the /proc/pid/maps files,
    with each VM region interleaved with a list of physical addresses
    which make up the virtual region.

usage: page-collect {switches}
switches:
 -o out-file -- Output file name (def=./page-collect.out)

page-analyze takes a data file as output by page-collect and generates reports. Here is the result of entering page-analyze --help:

page-analyze -- analyze a snapshot file created by page-collect and
    generate specified reports.

usage: page-analyze {switches}
switches:
 -c          -- Generate component report.
 -i in-file  -- Input file name (def=./page-collect.dat)
 -m          -- Generate process/component matrix.
 -mcsv       -- Generate matrix in CSV format.
 -Mb         -- Report in Mbytes (def=pages)
 -o out-file -- Output file name (def=./page-analyze.dat)
 -p          -- Generate process report

Process/Component Matrix reports are discussed above; Process and Component Reports are simpler, less detailed reports, giving USS, PSS and RSS on a per-process or per-component basis. All of the information that appears in the Process and Component Reports is also present in a Process/Component Matrix, which adds a detailed breakdown of the page mappings between processes and components. On the other hand, the Process and Component Reports are much shorter than the Process/Component Matrix.

The Process/Component Matrix can be written in either of two ways: human-readable text as shown above (‑m) or comma-separated text, suitable for import into a spreadsheet (‑mcsv).

Attachments

Attached below are files containing sections of code which implement the tools described above, together with examples of some reports.

 fs/proc/page.c.diff

Contains a diff between the original (2.6.25) release of the PageMap feature (which appears in fs/proc/proc_misc.c), and the fixed version (2.6.28.8). Note that only the changes needed to fix the two bugs are given; other differences are not shown.

 page-collect.c  (HTML)

Contains the PageMapTools data collection utility, which must be compiled for and run on the target platform. Known to build successfully with gcc V4.1.2.

 page-analyze.cpp  (HTML)

Contains the PageMapTools report generator utility, which converts the output of a prior page-collect run into one or more human-readable formats, and which may be compiled and run on any platform. Known to build successfully with g++ V4.3.3.

 Makefile

Contains a mindlessly-simple makefile for building the above two utilities using the GNU toolchain.

 matrix.txt

Contains an example of the most inclusive PageMapTools report type, the Process/Component Matrix, in human-readable format.

 matrix.csv

Contains a duplicate of the previous example, but this time in comma-separated format, suitable for import into a spreadsheet.


Copyright ©2009 by EQWARE Engineering Inc.