编程获取Linux的内存占用和CPU使用率
2020-01-30 ProgrammingLinux平台上有很多系统监控工具,自己用过的就有top、htop、glances,可以在上面看到系统平均负载,内存占用,CPU使用率等信息。除此之外还有uptime,lsblk,df,du,free,iftop等针对性查看系统资源的命令。
准备在自己的程序中获取相关系统资源的信息,解析命令输出显得不伦不类,于是就找了找其它方法。
内存占用
很久之前写Shell脚本时看过screenfetch
的相关代码,还顺便提了一个bug。当时screenfetch
的计算方法是读取/proc/meminfo
,然后按照MemUsed = Memtotal + Shmem - MemFree - Buffers - Cached - SReclaimable
的公式进行计算。
(有意思的是screenfetch
计算内存占用的代码在注释里提到了neofetch
的一次pull request,点进去发现neofetch
的代码又在注释里提到了screenfetch
的一条issue,可以算是2016年的套娃了)
现在再去看最新的代码,screenfetch
中旧的计算方式已经在一次commit中被注释掉了,改成了解析free
命令的输出来计算内存占用,而neofetch
中的代码还保持原样。
常用的free
命令并不会直接显示系统内存一共占用了多少,而是提供了一项“可用内存”,screenfetch
中的内存占用就是通过“总内存”减去这项“可用内存”得到的。用pacman -Qo
查了一下,free
命令是在procps-ng
包里,源码托管在GitLab上的procps仓库。
在free.c
源文件的这里可以看到“可用内存”对应着代码中的kb_main_available
,而计算kb_main_available
的代码则在proc/sysinfo.c
文件的meminfo()
函数中。该函数首先会读取/proc/meminfo
文件,然后将其中MemAvailable
的值记作kb_main_available
,如果在/proc/meminfo
文件中没有找到MemAvailable
,则执行:
/* zero? might need fallback for 2.6.27 <= kernel <? 3.14 */
if (!kb_main_available) {
#ifdef __linux__
if (linux_version_code < LINUX_VERSION(2, 6, 27))
kb_main_available = kb_main_free;
else {
FILE_TO_BUF(VM_MIN_FREE_FILE, vm_min_free_fd); /* /proc/sys/vm/min_free_kbytes */
kb_min_free = (unsigned long) strtoull(buf,&tail,10);
watermark_low = kb_min_free * 5 / 4; /* should be equal to sum of all 'low' fields in /proc/zoneinfo */
mem_available = (signed long)kb_main_free - watermark_low
+ kb_inactive_file + kb_active_file - MIN((kb_inactive_file + kb_active_file) / 2, watermark_low)
+ kb_slab_reclaimable - MIN(kb_slab_reclaimable / 2, watermark_low);
if (mem_available < 0) mem_available = 0;
kb_main_available = (unsigned long)mem_available;
}
#else
kb_main_available = kb_main_free;
#endif /* linux */
即当Linux内核版本小于2.6.27时,会将MemFree
记作kb_main_available
,否则会采用另一套较复杂的计算方法。
使用man proc
命令查看Linux文档,找到/proc/meminfo
部分,这里对MemAvailable
的介绍如下:
MemAvailable %lu (since Linux 3.14)
An estimate of how much memory is available for starting new applications, without swapping.
所以free
命令在内核版本低于3.14的Linux系统上会获取不到MemAvailable
的值,转而通过上面的备用方法来计算kb_main_available
,MemAvailable
的值也确实是启动新程序时的真实可用内存。
除了读取/proc/meminfo
,在搜索的过程中也发现了Linux的系统调用int sysinfo(struct sysinfo *info)
,通过man sysinfo
可以查看相关结构体的介绍:
struct sysinfo {
long uptime; /* Seconds since boot */
unsigned long loads[3]; /* 1, 5, and 15 minute load averages */
unsigned long totalram; /* Total usable main memory size */
unsigned long freeram; /* Available memory size */
unsigned long sharedram; /* Amount of shared memory */
unsigned long bufferram; /* Memory used by buffers */
unsigned long totalswap; /* Total swap space size */
unsigned long freeswap; /* Swap space still available */
unsigned short procs; /* Number of current processes */
unsigned long totalhigh; /* Total high memory size */
unsigned long freehigh; /* Available high memory size */
unsigned int mem_unit; /* Memory unit size in bytes */
char _f[20-2*sizeof(long)-sizeof(int)];
/* Padding to 64 bytes */
};
其中涉及到内存的有totalram
,freeram
,sharedram
,bufferram
几项,同时下面也注明了:
NOTES
All of the information provided by this system call is also available via /proc/meminfo and /proc/loadavg.
所以该系统调用获取的数据只是/proc
伪文件系统提供数据的一个子集,仅通过该系统调用并不能得到真实的可用内存,相比之下还是读取/proc/meminfo
文件中的MemAvailable
更为合适。
在维基百科的Linux kernel version history页面上查了一下,Linux内核的3.14版本在2014年3月份发布,还在维护的最旧的版本是3.16,所以准备在自己的程序里直接读取/proc/meminfo
,然后系统整体的内存占用通过MemUsed = Memtotal - MemAvailable
进行计算。
示例代码如下:
#include<stdio.h>
#include<string.h>
#include<unistd.h>
#include<fcntl.h>
#define ul unsigned long
ul mem_total; // MemTotal
ul mem_free; // MemFree
ul mem_shared; // Shmem
ul mem_buffers; // Buffers
ul mem_available; // MemAvailable
ul page_cache; // Cached
ul slab_reclaimable;// SReclaimable
ul mem_cached; // = page_cache + slab_reclaimable
ul mem_used; // = mem_total - mem_free - mem_cached - mem_buffers
int main(void)
{
int fd;
if((fd = open("/proc/meminfo", O_RDONLY)) == -1)
return -1;
int cnt = 0;
char buf[8192];
char line[1024];
while(1)
{
lseek(fd, 0, SEEK_SET);
if((cnt = read(fd, buf, sizeof(buf)-1)) < 0)
return -1;
buf[cnt] = '\0';
int pos = 0;
char name[32];
ul value;
for(int i = 0;i <= cnt;i++)
{
line[pos++] = buf[i];
if(buf[i] != '\n' && buf[i] != '\0')
continue;
line[pos] = '\0';
pos = 0;
sscanf(line, "%s%lu", name, &value);
if(!strcmp(name, "MemTotal:"))
mem_total = value;
else if(!strcmp(name, "MemFree:"))
mem_free = value;
else if(!strcmp(name, "Shmem:"))
mem_shared = value;
else if(!strcmp(name, "Buffers:"))
mem_buffers = value;
else if(!strcmp(name, "MemAvailable:"))
mem_available = value;
else if(!strcmp(name, "Cached:"))
page_cache = value;
else if(!strcmp(name, "SReclaimable:"))
slab_reclaimable = value;
}
mem_cached = page_cache + slab_reclaimable;
mem_used = mem_total - mem_free - mem_cached - mem_buffers;
// 总计 已用 空闲 共享 缓冲/缓存 可用
printf("%lu %lu %lu %lu %lu %lu\n",
mem_total,
mem_used,
mem_free,
mem_shared,
mem_cached + mem_buffers,
mem_available);
// 内存占用
printf("mem_usage: %lu(%.1f%%)\n",
mem_total - mem_available,
(mem_total - mem_available) * 100.0 / mem_total);
sleep(5);
}
return 0;
}
CPU 使用率
top
命令中默认显示的是各状态的CPU使用率,man top
在2b章节可以看到各状态的含义:
us, user : time running un-niced user processes
sy, system : time running kernel processes
ni, nice : time running niced user processes
id, idle : time spent in the kernel idle handler
wa, IO-wait : time waiting for I/O completion
hi : time spent servicing hardware interrupts
si : time spent servicing software interrupts
st : time stolen from this vm by the hypervisor
下面提到了t
命令可以切换CPU使用率的显示方式,4b章节有更详细的命令介绍。在top
命令的主界面输入t
切换后就可以在条形图旁边看到整体的CPU使用率。
top
命令也在procps-ng
包里,源文件同样在procps仓库中。summary_show()
函数负责显示概要,可以看到是先调用了cpus_refresh()
刷新CPU相关信息,再调用summary_hlp()
进行输出。
函数summary_hlp()
中与显示的相关代码是:
show_special(0, fmtmk("%%%s ~3%#5.1f~2/%-#5.1f~3 %3.0f[~1%-*s]~1\n"
, pfx, pct_user, pct_syst, pct_user + pct_syst, Graph_len +4, dual));
所以整体的CPU使用率由pct_user
和pct_syst
相加得到,查看前面的相关代码,整理后可以得到:
total = (cur.u - sav.u) + (cur.s - sav.s)
+ (cur.n - sav.n) + (cur.i - sav.i)
+ (cur.w - sav.w) + (cur.x - sav.x)
+ (cur.y - sav.y) + (cur.z - sav.z);
pct_user = ((cur.u - sav.u) + (cur.n - sav.n)) * 100 / total;
pct_syst = ((cur.s - sav.s) + (cur.x - sav.x)+ (cur.y - sav.y)) * 100 / total;
其中cur
和sav
来自cpus_refresh()
,分别是本次和上次读取/proc/stat
保存的值,对应:
u --> user
n --> nice
s --> system
i --> idle
w --> iowait
x --> irq
y --> softirq
z --> steal
命令htop
默认显示的是各个CPU的平均使用率,按F2进入设置界面后可以在Meters
栏添加CPU average
的条形图,显示的就是整体的CPU使用率。
htop
的源码托管在GitHub的hishamhm/htop仓库中,顶层目录下的CPUMeter.c
就是与CPU使用率相关的源文件,代码中的CPUMeter_updateValues()
函数调用了Platform_setCPUValues()
来获取CPU的使用率,Linux系统对应的Platform_setCPUValues()
函数在linux/Platform.c
源文件中:
double total = (double) ( cpuData->totalPeriod == 0 ? 1 : cpuData->totalPeriod);
double percent;
double* v = this->values;
v[CPU_METER_NICE] = cpuData->nicePeriod / total * 100.0;
v[CPU_METER_NORMAL] = cpuData->userPeriod / total * 100.0;
if (this->pl->settings->detailedCPUTime) {
v[CPU_METER_KERNEL] = cpuData->systemPeriod / total * 100.0;
v[CPU_METER_IRQ] = cpuData->irqPeriod / total * 100.0;
v[CPU_METER_SOFTIRQ] = cpuData->softIrqPeriod / total * 100.0;
v[CPU_METER_STEAL] = cpuData->stealPeriod / total * 100.0;
v[CPU_METER_GUEST] = cpuData->guestPeriod / total * 100.0;
v[CPU_METER_IOWAIT] = cpuData->ioWaitPeriod / total * 100.0;
Meter_setItems(this, 8);
if (this->pl->settings->accountGuestInCPUMeter) {
percent = v[0]+v[1]+v[2]+v[3]+v[4]+v[5]+v[6];
} else {
percent = v[0]+v[1]+v[2]+v[3]+v[4];
}
} else {
v[2] = cpuData->systemAllPeriod / total * 100.0;
v[3] = (cpuData->stealPeriod + cpuData->guestPeriod) / total * 100.0;
Meter_setItems(this, 4);
percent = v[0]+v[1]+v[2]+v[3];
}
根据cat .config/htop/htoprc
的输出,可以得到detailed_cpu_time=0
和account_guest_in_cpu_meter=0
,所以在自己的笔记本上是percent = v[0]+v[1]+v[2]+v[3]
,在CPUMeter.h
中可以看到:
typedef enum {
CPU_METER_NICE = 0,
CPU_METER_NORMAL = 1,
CPU_METER_KERNEL = 2,
CPU_METER_IRQ = 3,
CPU_METER_SOFTIRQ = 4,
CPU_METER_STEAL = 5,
CPU_METER_GUEST = 6,
CPU_METER_IOWAIT = 7,
CPU_METER_ITEMCOUNT = 8, // number of entries in this enum
} CPUMeterValues;
所以CPU使用率计算公式为percent = (cpuData->nicePeriod + cpuData->userPeriod + cpuData->systemAllPeriod + cpuData->stealPeriod + cpuData->guestPeriod) * 100 / cpuData->totalPeriod
。有关cpuData
的计算可以在linux/LinuxProcessList.c
源文件中找到具体代码,对应函数为LinuxProcessList_scanCPUTime()
,再次整理可以得到:
userPeriod = (cur.user - cur.guest) - (sav.user - sav.guest);
nicePeriod = (cur.nice - cur.guestnice) - (sav.nice - sav.guestnice);
systemAllPeriod = (cur.system + cur.irq + cur.softirq)
- (sav.system + sav.irq + sav.softirq);
stealPeriod = cur.steal - sav.steal;
guestPeriod = (cur.guest + cur.guestnice) - (sav.guest + sav.guestnice);
totalPeriod = (cur.user + cur.nice + cur.system
+ cur.irq + cur.softirq + cur.idle + cur.iowait
+ cur.steal + cur.guest + cur.guestnice)
- (sav.user + sav.nice + sav.system
+ sav.irq + sav.softirq + sav.idle + sav.iowait
+ sav.steal + sav.guest + sav.guestnice);
两者对比的话htop
的计算应该是更加全面一些,所以准备采取htop
的计算方法,示例代码如下:
#include<stdio.h>
#include<string.h>
#include<unistd.h>
#include<fcntl.h>
#define ull unsigned long long
// man proc
typedef struct CPUData{
ull user;
ull nice;
ull system;
ull idle;
ull iowait;
ull irq;
ull softirq;
ull steal;
ull guest;
ull guestnice;
ull total;
}CPUData;
int main(void)
{
int fd;
if((fd = open("/proc/stat", O_RDONLY)) == -1)
return -1;
CPUData cur, sav;
memset(&sav, 0, sizeof(sav));
int cnt = 0;
char buf[1024];
while(1)
{
lseek(fd, 0, SEEK_SET);
if((cnt = read(fd, buf, sizeof(buf)-1)) < 0)
return -1;
buf[cnt] = '\0';
memset(&cur, 0, sizeof(cur));
sscanf(buf, "cpu%llu%llu%llu%llu%llu%llu%llu%llu%llu%llu",
&cur.user, &cur.nice, &cur.system,
&cur.idle, &cur.iowait, &cur.irq, &cur.softirq,
&cur.steal, &cur.guest, &cur.guestnice);
cur.total = cur.user + cur.nice + cur.system + cur.idle
+ cur.iowait + cur.irq + cur.softirq + cur.steal;
ull userPeriod = (cur.user - cur.guest) - (sav.user - sav.guest);
ull nicePeriod = (cur.nice - cur.guestnice) - (sav.nice - sav.guestnice);
ull systemAllPeriod = (cur.system + cur.irq + cur.softirq)
- (sav.system + sav.irq + sav.softirq);
ull stealPeriod = cur.steal - sav.steal;
ull guestPeriod = (cur.guest + cur.guestnice) - (sav.guest + sav.guestnice);
ull totalPeriod = cur.total - sav.total;
float percent = (nicePeriod + userPeriod + systemAllPeriod
+ stealPeriod + guestPeriod) * 100.0 / totalPeriod;
if(sav.total)
printf("cpu_usage: %.1f%%\n", percent);
sav = cur;
sleep(5);
}
return 0;
}