This is a short dialog in which Ralf explained to me (or rather confirmed and clarified my understanding of) how Linux page tables are structured. I edited it slightly to remove unrelated remarks by other channel members. The discussion took place on Nov 08, 2002 and pertains to Linux 2.5.46 specifically, though it also applies to the 2.4 series and most likely to many future versions. You can find a detailed analysis of Linux page tables as of version 2.4 in Alessandro Rubini's book Linux Device Drivers. It is a great book with lots of interesting details, at least enough for those who want to write device drivers, though not quite enough for those who are looking for bugs in the Linux mm.
* Bacchus thinks iluxa will still need to learn a lot about mm :)
<iluxa> Definitely :)
<Bacchus> iluxa: I'm splitting pgtable.h in 2.4 also, just like in 2.5.
<iluxa> And how was it split in 2.5?
<Bacchus> The cache stuff went to <asm/cacheflush.h>
<iluxa> ic
<iluxa> Bacchus: what does 'pfn' stand for?
<Bacchus> page frame number.
<iluxa> And what is 'frame' in this context?
<Bacchus> Good question, next question :)
<iluxa> Can you explain page tables structure in popular way? :)
<Bacchus> Basically it's just the page number.
<Bacchus> 3 level tree.
<iluxa> Well, that I know :)
<iluxa> OK, let me try to express my understanding of it, and then you correct me
<iluxa> There is pgd (page global directory?) that is just an array of pointers to arrays of pmd's (page middle directory?) that hold pointers to arrays of pte's
<iluxa> Am I completely off track, or am I close to truth?
<Bacchus> Perfectly right.
<iluxa> OK, then only question is what is used for index in each of these arrays
<Bacchus> It's actually simpler on 64-bit than on 32-bit.
<Bacchus> The virtual address.
<Bacchus> The lowest PAGE_SHIFT bits will not be translated through the page table.
<iluxa> you mean some part of virtaddress>>PAGE_SHIFT?
<Bacchus> No.
<iluxa> like top x bits for pgd, next y bits for pmd, then finally last z bits for pte?
<iluxa> no?
<Bacchus> Ah, yes.
<Bacchus> But virtaddress>>page_shift is a bit too trivial :)
<iluxa> Yeah, I understand that part
<Bacchus> That would be the index for a one level page table or so.
<iluxa> And way too large table too :)
<Bacchus> Yep :)
* iluxa is amazed by the fact he understood it all correctly
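The index split agreed on above (top bits for the pgd, next bits for the pmd, next bits for the pte, lowest PAGE_SHIFT bits untranslated) can be sketched in user-space C. This is only an illustration, not kernel code: every TOY_/toy_ name is hypothetical, the bit widths (12-bit page offset, 9 bits per level) are assumed for the example and differ between architectures, and the toy pte layout (frame number shifted up, plus a present bit) is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed, illustrative widths; real values are per-architecture. */
#define TOY_PAGE_SHIFT   12                              /* 4 KiB pages */
#define TOY_PTE_BITS     9                               /* 512 ptes per table */
#define TOY_PMD_BITS     9
#define TOY_PGD_BITS     9

#define TOY_PMD_SHIFT    (TOY_PAGE_SHIFT + TOY_PTE_BITS) /* 21 */
#define TOY_PGDIR_SHIFT  (TOY_PMD_SHIFT + TOY_PMD_BITS)  /* 30 */

#define TOY_OFFSET_MASK  ((1ull << TOY_PAGE_SHIFT) - 1)
#define TOY_PTE_PRESENT  1ull                            /* made-up present bit */

/* Toy tables: pgd entries point to pmd arrays, pmd entries to pte arrays. */
typedef uint64_t toy_pte_t;
typedef struct { toy_pte_t *pte_table; } toy_pmd_t;
typedef struct { toy_pmd_t *pmd_table; } toy_pgd_t;

/* Each level is indexed by its own slice of the virtual address. */
static unsigned toy_pgd_index(uint64_t va)
{
    return (unsigned)((va >> TOY_PGDIR_SHIFT) & ((1u << TOY_PGD_BITS) - 1));
}

static unsigned toy_pmd_index(uint64_t va)
{
    return (unsigned)((va >> TOY_PMD_SHIFT) & ((1u << TOY_PMD_BITS) - 1));
}

static unsigned toy_pte_index(uint64_t va)
{
    return (unsigned)((va >> TOY_PAGE_SHIFT) & ((1u << TOY_PTE_BITS) - 1));
}

/*
 * Walk the three levels. Returns 0 and stores the physical address in *pa,
 * or -1 if any level is missing. The low TOY_PAGE_SHIFT bits of va are
 * copied through untranslated.
 */
static int toy_translate(const toy_pgd_t *pgd, uint64_t va, uint64_t *pa)
{
    const toy_pmd_t *pmd = pgd[toy_pgd_index(va)].pmd_table;
    if (pmd == NULL)
        return -1;

    const toy_pte_t *pte = pmd[toy_pmd_index(va)].pte_table;
    if (pte == NULL)
        return -1;

    toy_pte_t entry = pte[toy_pte_index(va)];
    if (!(entry & TOY_PTE_PRESENT))
        return -1;

    *pa = (entry & ~TOY_OFFSET_MASK) | (va & TOY_OFFSET_MASK);
    return 0;
}
```

With these assumed widths, a one-level table indexed by virtaddress>>PAGE_SHIFT would need 2^27 entries to cover the same 39 bits of address space, while the tree only allocates the 512-entry tables that are actually in use.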
<iluxa> Then only question remaining is where does struct page come into play, and whether pfn's have something to do with it
<iluxa> What I mean, is how do we locate struct page*, when given pte?
<Bacchus> Not at all.
<iluxa> Bacchus: does any given struct page always correspond to page in physical memory (or at least physical address), or could it be holding information on, let's say, swapped out page
<Bacchus> BS...
<Bacchus> I was thinking of something else.
<Bacchus> Basically what you do is:
<Bacchus> mem_map + (physical_address >> PAGE_OFFSET)
<wli> ow!!
<wli> pfn_to_page(physical_address >> PAGE_SHIFT) please =)
<Bacchus> ;)
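The relationship between a physical address, its pfn and its struct page can be sketched the same way. This is a user-space toy, not the kernel's implementation: it assumes a single flat array of descriptors starting at pfn 0, and toy_mem_map, struct toy_page and the toy_ helpers are hypothetical stand-ins. Note that the shift is by the page shift, as in wli's version. On a flat layout like this one the lookup reduces to plain array indexing; as far as I understand, the reason to go through pfn_to_page() in real code is that the mapping is not necessarily a single array on every configuration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12          /* assumed 4 KiB pages */
#define TOY_NR_PAGES   1024        /* assumed size of the toy physical memory */

/* Hypothetical stand-in for struct page: one descriptor per page frame. */
struct toy_page {
    unsigned long flags;
    int count;
};

/* Flat array of descriptors, indexed by page frame number. */
static struct toy_page toy_mem_map[TOY_NR_PAGES];

/* pfn -> descriptor; NULL if the pfn is outside the toy memory. */
static struct toy_page *toy_pfn_to_page(uint64_t pfn)
{
    return pfn < TOY_NR_PAGES ? &toy_mem_map[pfn] : NULL;
}

/* descriptor -> pfn, the inverse of the lookup above. */
static uint64_t toy_page_to_pfn(const struct toy_page *page)
{
    return (uint64_t)(page - toy_mem_map);
}

/* physical address -> descriptor: drop the in-page offset, then look up. */
static struct toy_page *toy_phys_to_page(uint64_t paddr)
{
    return toy_pfn_to_page(paddr >> TOY_PAGE_SHIFT);
}
```

Every address inside the same page maps to the same descriptor, which is why the pfn ("basically just the page number") is the natural key.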
<iluxa> What is intended difference between pmd_page and pmd_page_kernel?
<wli> pmd_page_kernel() gives the kvaddr of the actual contents of the page, pmd_page() returns a page descriptor
<wli> pmd_page_kernel() can obviously only be used for pmd's whose contents are guaranteed to be low memory.
<iluxa> What is page descriptor?
<Bacchus> iluxa: struct page *
<wli> struct page
<iluxa> struct page?
<iluxa> lol