I. The mmap system call
A virtual memory area (VMA) is described by a start address and an end address, and the VMA list is sorted by ascending start address. The difference between the two system calls: mmap takes its offset in bytes, while mmap2 takes its offset in pages. The arm64 architecture implements the mmap system call.
II. The munmap system call
The munmap system call removes a memory mapping. It takes two arguments: a start address and a length. Its main work is delegated to the function do_munmap in the kernel source file mm/mmap.c.
vm_munmap --> do_munmap
  --> vma = find_vma(mm, start)
  --> error = __split_vma(mm, vma, start, 0)
  --> last = find_vma(mm, end)
  --> error = __split_vma(mm, last, end, 1)
  --> munlock_vma_pages_all
  --> detach_vmas_to_be_unmapped
  --> unmap_region
  --> arch_unmap
  --> remove_vma_list
vma = find_vma(mm, start);             // find the first VMA to delete, based on the start address
error = __split_vma(mm, vma, start, 0);  // if only part of vma is being deleted, split vma
last = find_vma(mm, end);              // find the last VMA to delete, based on the end address
error = __split_vma(mm, last, end, 1);   // if only part of last is being deleted, split last
munlock_vma_pages_all;                 // for every deletion target locked in memory (not allowed to be swapped out), unlock it
detach_vmas_to_be_unmapped;            // remove all deletion targets from the process's VMA list and tree, and collect them on a temporary list
unmap_region;                          // for all deletion targets, remove the mappings from the process page tables and flush them from the processor's TLB
arch_unmap;                            // perform architecture-specific unmap handling
remove_vma_list;                       // delete all targets
III. Physical memory organization
1. System architectures
Multiprocessor systems today come in two architectures:
Non-Uniform Memory Access (NUMA): memory is divided into multiple memory nodes, and the time it takes to access a memory node depends on the distance between the processor and that node.
Symmetric Multi-Processing (SMP), i.e. Uniform Memory Access (UMA): every processor takes the same time to access memory.
2. Memory models
A memory model is the layout of physical memory as seen from the processor's point of view, and the kernel manages different memory models in different ways. The memory management subsystem supports three memory models:
Flat memory: the physical address space of memory is contiguous, with no holes.
Discontiguous memory: the physical address space contains holes; this model can handle the holes efficiently.
Sparse memory: the physical address space contains holes; if memory hot-plug must be supported, sparse memory is the only choice.
3. The three-level structure
The memory management subsystem describes physical memory with a three-level structure: node, zone, and page.
3.1 Memory nodes (two cases)
On a NUMA system, memory nodes are divided according to the distance between processors and memory.
On a NUMA system with discontiguous memory, a node is a memory region one level above a zone: each block of physically contiguous memory is one memory node.
A memory node's layout is described by the structure pglist_data. The kernel source is as follows:
typedef struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];          /* array of memory zones */
    struct zonelist node_zonelists[MAX_ZONELISTS]; /* fallback zone lists */
    int nr_zones;                                  /* number of zones in this node */
#ifdef CONFIG_FLAT_NODE_MEM_MAP    /* means !SPARSEMEM */    /* all models except sparse memory */
    struct page *node_mem_map;                     /* page descriptor array */
#ifdef CONFIG_PAGE_EXTENSION
    struct page_ext *node_page_ext;                /* extended page attributes */
#endif
#endif
#ifndef CONFIG_NO_BOOTMEM
    struct bootmem_data *bdata;
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
    /*
     * Must be held any time you expect node_start_pfn, node_present_pages
     * or node_spanned_pages stay constant.  Holding this will also
     * guarantee that any pfn_valid() stays that way.
     *
     * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
     * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG.
     *
     * Nests above zone->lock and zone->span_seqlock
     */
    spinlock_t node_size_lock;
#endif
    unsigned long node_start_pfn;      /* starting physical page number of this node */
    unsigned long node_present_pages;  /* total number of physical pages */
    unsigned long node_spanned_pages;  /* total span of the physical page range, including holes */
    int node_id;                       /* node identifier */
    wait_queue_head_t kswapd_wait;
    wait_queue_head_t pfmemalloc_wait;
    struct task_struct *kswapd;        /* protected by mem_hotplug_begin/end() */
    int kswapd_max_order;
    enum zone_type classzone_idx;
#ifdef CONFIG_NUMA_BALANCING
    /* Lock serializing the migrate rate limiting window */
    spinlock_t numabalancing_migrate_lock;
    /* Rate limiting time interval */
    unsigned long numabalancing_migrate_next_window;
    /* Number of pages migrated during the rate limiting time interval */
    unsigned long numabalancing_migrate_nr_pages;
#endif
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
    /*
     * If memory initialisation on large machines is deferred then this
     * is the first PFN that needs to be initialised.
     */
    unsigned long first_deferred_pfn;
#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
} pg_data_t;
The node_mem_map member points to the page descriptor array; each physical page has one page descriptor.
node is the top-level structure of memory management. Under NUMA, the CPUs are divided among multiple nodes, and each node has its own memory controller and memory slots. A CPU accesses memory on its own node quickly, and memory attached to other CPUs' nodes slowly. A UMA machine is treated as a NUMA system with a single node.
3.2 Memory zones (zone)
A memory node is divided into memory zones. Kernel source: include/linux/mmzone.h
enum zone_type {
#ifdef CONFIG_ZONE_DMA
    /*
     * ZONE_DMA is used when there are devices that are not able
     * to do DMA to all of addressable memory (ZONE_NORMAL). Then we
     * carve out the portion of memory that is needed for these devices.
     * The range is arch specific.
     *
     * Some examples:
     *
     * Architecture         Limit
     * ---------------------------
     * parisc, ia64, sparc  <4G
     * s390                 <2G
     * arm                  Various
     * alpha                Unlimited or 0-16MB.
     *
     * i386, x86_64 and multiple other arches
     *                      <16M.
     */
    ZONE_DMA,
#endif
    /* ... (other zone types omitted) ... */
};

struct zone {
    /* ... */
    unsigned long zone_start_pfn;   /* starting physical page number of this zone */
    /*
     * spanned_pages is the total pages spanned by the zone, including
     * holes, which is calculated as:
     *      spanned_pages = zone_end_pfn - zone_start_pfn;
     *
     * present_pages is physical pages existing within the zone, which
     * is calculated as:
     *      present_pages = spanned_pages - absent_pages(pages in holes);
     *
     * managed_pages is present pages managed by the buddy system, which
     * is calculated as (reserved_pages includes pages allocated by the
     * bootmem allocator):
     *      managed_pages = present_pages - reserved_pages;
     *
     * So present_pages may be used by memory hotplug or memory power
     * management logic to figure out unmanaged pages by checking
     * (present_pages - managed_pages). And managed_pages should be used
     * by page allocator and vm scanner to calculate all kinds of watermarks
     * and thresholds.
     *
     * Locking rules:
     *
     * zone_start_pfn and spanned_pages are protected by span_seqlock.
     * It is a seqlock because it has to be read outside of zone->lock,
     * and it is done in the main allocator path.  But, it is written
     * quite infrequently.
     *
     * The span_seq lock is declared along with zone->lock because it is
     * frequently read in proximity to zone->lock.  It's good to
     * give them a chance of being in the same cacheline.
     *
     * Write access to present_pages at runtime should be protected by
     * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
     * present_pages should get_online_mems() to get a stable value.
     *
     * Read access to managed_pages should be safe because it's unsigned
     * long. Write access to zone->managed_pages and totalram_pages are
     * protected by managed_page_count_lock at runtime. Idealy only
     * adjust_managed_page_count() should be used instead of directly
     * touching zone->managed_pages and totalram_pages.
     */
    unsigned long managed_pages;   /* physical pages managed by the buddy allocator */
    unsigned long spanned_pages;   /* total pages spanned by this zone, including holes */
    unsigned long present_pages;   /* physical pages existing in this zone, excluding holes */
    const char *name;              /* zone name */
#ifdef CONFIG_MEMORY_ISOLATION
    /*
     * Number of isolated pageblock. It is used to solve incorrect
     * freepage counting problem due to racy retrieving migratetype
     * of pageblock. Protected by zone->lock.
     */
    unsigned long nr_isolate_pageblock;
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
    /* see spanned/present_pages for more description */
    seqlock_t span_seqlock;
#endif
    /*
     * wait_table                    -- the array holding the hash table
     * wait_table_hash_nr_entries    -- the size of the hash table array
     * wait_table_bits               -- wait_table_size == (1 << wait_table_bits)
     *
     * The purpose of all these is to keep track of the people
     * waiting for a page to become available and make them
     * runnable again when possible. The trouble is that this
     * consumes a lot of space, especially when so few things
     * wait on pages at a given time. So instead of using
     * per-page waitqueues, we use a waitqueue hash table.
     *
     * The bucket discipline is to sleep on the same queue when
     * colliding and wake all in that wait queue when removing.
     * When something wakes, it must check to be sure its page is
     * truly available, a la thundering herd. The cost of a
     * collision is great, but given the expected load of the
     * table, they should be so rare as to be outweighed by the
     * benefits from the saved space.
     *
     * __wait_on_page_locked() and unlock_page() in mm/filemap.c, are the
     * primary users of these fields, and in mm/page_alloc.c
     * free_area_init_core() performs the initialization of them.
     */
    wait_queue_head_t *wait_table;
    unsigned long wait_table_hash_nr_entries;
    unsigned long wait_table_bits;

    ZONE_PADDING(_pad1_)
    /* free areas of different sizes */
    struct free_area free_area[MAX_ORDER];  /* free areas of different orders */
    /* zone flags, see below */
    unsigned long flags;
    /* Write-intensive fields used from the page allocator */
    spinlock_t lock;

    ZONE_PADDING(_pad2_)
    /* Write-intensive fields used by page reclaim */
    /* Fields commonly accessed by the page reclaim scanner */
    spinlock_t lru_lock;
    struct lruvec lruvec;
    /* Evictions & activations on the inactive file list */
    atomic_long_t inactive_age;
    /*
     * When free pages are below this point, additional steps are taken
     * when reading the number of free pages to avoid per-cpu counter
     * drift allowing watermarks to be breached
     */
    unsigned long percpu_drift_mark;
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
    /* pfn where compaction free scanner should start */
    unsigned long compact_cached_free_pfn;
    /* pfn where async and sync compaction migration scanner should start */
    unsigned long compact_cached_migrate_pfn[2];
#endif
#ifdef CONFIG_COMPACTION
    /*
     * On compaction failure, 1<<compact_defer_shift compactions
     * are skipped before trying again.
     */
    unsigned int compact_considered;
    unsigned int compact_defer_shift;
#endif
    /* ... (remaining fields omitted) ... */
};
3.3 Physical pages
A page is the smallest unit of memory management, and the memory within a page is physically contiguous. Each physical page is described by a struct page; to save memory, struct page makes heavy use of unions.
A page is also called a page frame. In the kernel, the MMU (the hardware responsible for translating virtual addresses to physical addresses) treats the physical page as the basic unit of memory management. Different architectures support different page sizes (for example, 32-bit architectures commonly use 4KB pages, some 64-bit architectures use 8KB pages, and MIPS64 supports 16KB pages).