The Translation Lookaside Buffer - TLB
The Translation Lookaside Buffer (TLB) is a hardware cache that stores recent translations of virtual memory addresses to physical memory addresses.
It is a crucial component in modern computer architectures that use virtual memory, as it helps improve the performance of memory accesses by
reducing the number of expensive page table lookups required.
Here's how the TLB works:
Virtual to Physical Address Translation:
When a CPU needs to access a memory location, it first looks up the virtual address in the TLB. The TLB stores recent translations of virtual addresses to their corresponding physical addresses.
TLB Hit:
If the virtual address is found in the TLB, the physical address is retrieved directly from the TLB, which is a very fast operation. This is called a "TLB hit".
TLB Miss:
If the virtual address is not found in the TLB, the CPU needs to perform a page table lookup to translate the virtual address to a physical address. This is a much slower operation, called a "TLB miss".
Page Table Lookup:
When a TLB miss occurs, the CPU looks up the virtual address in the page tables, which are data structures stored in main memory that map virtual addresses to physical addresses. This lookup can be a multi-level process, depending on the page table structure.
TLB Update:
Once the physical address is obtained from the page tables, the TLB is updated with the new virtual-to-physical address translation, so that future accesses to the same virtual address can be served quickly.
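The hit, miss, and update steps above can be sketched as a small software model. This is only an illustration, not how any particular CPU implements its TLB: the class name SimpleTLB, the flat dictionary page table, and the 4 KiB page size are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class SimpleTLB:
    """Toy model: a dict stands in for the hardware lookup structure."""

    def __init__(self, page_table):
        # page_table maps virtual page number -> physical frame number
        self.page_table = page_table
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        # Translations are cached per page, so split off the page offset first.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1                  # TLB hit: fast path
            pfn = self.entries[vpn]
        else:
            self.misses += 1                # TLB miss: slow path
            pfn = self.page_table[vpn]      # page table lookup
            self.entries[vpn] = pfn         # TLB update
        return pfn * PAGE_SIZE + offset


page_table = {0: 7, 1: 3}
tlb = SimpleTLB(page_table)
first = tlb.translate(0x1234)    # miss: page 1 is not cached yet
second = tlb.translate(0x1238)   # hit: same page as the previous access
```

The second access hits even though the address differs, because both addresses fall in the same virtual page.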
TLB Organization:
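The TLB is typically organised as a fully associative or set-associative cache. In a set-associative TLB, each virtual page number maps to exactly one set, and each set contains several "ways", each of which can hold one virtual-to-physical address translation.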
The number of ways and the total size of the TLB vary widely between CPU architectures. First-level TLBs are small, usually tens of entries and often fully associative, while second-level TLBs commonly hold hundreds to a few thousand entries organised in 4, 8, 12, or 16 ways.
The replacement policy for the TLB (i.e., which entry to evict when a new translation needs to be added) is usually a variation of the Least Recently Used (LRU) policy.
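A set-associative organisation with LRU replacement might be modelled as follows. This is a minimal sketch under stated assumptions: the class SetAssociativeTLB is hypothetical, the set index is simply the virtual page number modulo the number of sets, and true LRU is used where real hardware usually uses a cheaper approximation.

```python
from collections import OrderedDict


class SetAssociativeTLB:
    """Toy set-associative TLB with true LRU replacement inside each set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set; order tracks recency (oldest entry first).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def lookup(self, vpn):
        entries = self.sets[vpn % self.num_sets]
        if vpn in entries:
            entries.move_to_end(vpn)       # mark as most recently used
            return entries[vpn]
        return None                        # miss

    def insert(self, vpn, pfn):
        entries = self.sets[vpn % self.num_sets]
        if vpn in entries:
            entries.move_to_end(vpn)
        elif len(entries) >= self.ways:
            entries.popitem(last=False)    # evict the least recently used way
        entries[vpn] = pfn


tlb = SetAssociativeTLB(num_sets=4, ways=2)
tlb.insert(0, 100)   # pages 0, 4 and 8 all map to set 0
tlb.insert(4, 104)
tlb.lookup(0)        # touching page 0 leaves page 4 as the LRU entry
tlb.insert(8, 108)   # set 0 is full, so page 4 is evicted
```

Only entries that share a set compete with each other, which is why page 8 displaces page 4 here even though the other three sets are empty.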
TLB Lookup Process:
When the CPU needs to access a memory location, it first checks the TLB for the corresponding virtual-to-physical address translation.
The TLB lookup is performed first; the page table lookup, which is a more complex and time-consuming operation, is started only if the TLB lookup misses.
If the virtual address is found in the TLB (a TLB hit), the physical address is immediately available, and the memory access can proceed without delay.
If the virtual address is not found in the TLB (a TLB miss), the CPU must perform a page table lookup to translate the virtual address to a physical address.
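The page table lookup on a miss is often a multi-level walk. The sketch below assumes an x86-64-style layout (48-bit virtual addresses, four levels of 9 index bits each, and a 12-bit page offset) and uses nested dictionaries as stand-ins for page table pages; the function names are hypothetical.

```python
LEVELS = 4        # assumed four-level page table
INDEX_BITS = 9    # assumed 512 entries per table
OFFSET_BITS = 12  # assumed 4 KiB pages


def split_vaddr(vaddr):
    """Split a virtual address into per-level table indices and a page offset."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    vpn = vaddr >> OFFSET_BITS
    indices = []
    for level in range(LEVELS):
        shift = INDEX_BITS * (LEVELS - 1 - level)   # top level first
        indices.append((vpn >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


def walk(root, vaddr):
    """Follow one table entry per level; a KeyError models an unmapped page."""
    indices, offset = split_vaddr(vaddr)
    node = root
    for index in indices:
        node = node[index]          # one memory access per level in hardware
    return (node << OFFSET_BITS) | offset   # the leaf holds the frame number


# A single mapping: indices 1, 2, 3, 4 lead to physical frame 0x55.
root = {1: {2: {3: {4: 0x55}}}}
vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0xABC
paddr = walk(root, vaddr)
```

Each loop iteration corresponds to a dependent memory access, which is why a miss is so much slower than a hit.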
TLB Management:
The TLB is managed by the Memory Management Unit (MMU), which is a hardware component responsible for handling virtual-to-physical address translations.
The operating system plays a crucial role in managing the TLB, as it is responsible for updating the page tables and invalidating TLB entries when necessary (e.g., when a page is swapped out or the memory mapping changes).
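The need for invalidation can be illustrated with a toy model in which the operating system changes a mapping and then removes the stale cached translation. The class and function names here (TLB, remap) are hypothetical and only mimic the idea; real systems use architecture-specific instructions for this.

```python
class TLB:
    """Toy TLB keyed by virtual page number."""

    def __init__(self):
        self.entries = {}

    def fill(self, vpn, pfn):
        self.entries[vpn] = pfn

    def lookup(self, vpn):
        return self.entries.get(vpn)    # None models a miss

    def invalidate(self, vpn):
        self.entries.pop(vpn, None)     # drop one translation if present

    def flush(self):
        self.entries.clear()            # drop every translation


def remap(page_table, tlb, vpn, new_pfn):
    """Change a mapping, then invalidate the cached copy so it cannot go stale."""
    page_table[vpn] = new_pfn
    tlb.invalidate(vpn)


page_table = {5: 40}
tlb = TLB()
tlb.fill(5, page_table[5])        # translation for page 5 is now cached
remap(page_table, tlb, 5, 41)     # the OS moves page 5 to frame 41
```

Without the invalidate call, a later lookup of page 5 would still return frame 40, which no longer backs that page.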
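TLB misses can be reduced by using larger page sizes, so that each entry covers more memory, and by improving the locality of memory accesses. The page table structure (for example, multi-level or inverted page tables) does not change how often misses occur, but it determines how expensive each miss is.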
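The benefit of larger pages can be quantified as "TLB reach", the amount of memory the TLB can map at once. The following is a small sketch; the 64-entry count and the 4 KiB and 2 MiB page sizes are example values, not figures for any particular processor.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of memory covered when every TLB entry maps a distinct page."""
    return entries * page_size


reach_4k = tlb_reach(64, 4 * KIB)   # 64 entries of 4 KiB pages
reach_2m = tlb_reach(64, 2 * MIB)   # the same 64 entries with 2 MiB pages
```

With these example values the reach grows from 256 KiB to 128 MiB, so a working set that thrashes the TLB with small pages may fit comfortably with large ones.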
TLB Performance Impact:
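The TLB has a large effect on system performance, because every TLB miss adds the latency of a page table lookup to the memory access.
A TLB miss is not the same as a page fault: a miss only means that the translation was not cached, whereas a page fault occurs when the page table lookup finds no valid mapping and the operating system has to intervene. Even without any page faults, a high TLB miss rate can severely degrade performance, because each miss costs several additional memory accesses.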
Improving TLB performance is an important design goal in modern CPU architectures, and techniques like increasing the TLB size, implementing multi-level TLBs, and using specialised TLB management algorithms are often employed.
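The cost of misses is often summarised as an effective access time. The sketch below assumes a simple model in which a hit costs one TLB lookup plus one memory access, and a miss additionally pays one memory access per page table level, with no caching of page table entries. The latencies are illustrative, not measurements.

```python
def effective_access_time(hit_rate, tlb_ns, mem_ns, walk_accesses):
    """Average time per memory access under a simple hit/miss model."""
    hit_cost = tlb_ns + mem_ns
    miss_cost = tlb_ns + walk_accesses * mem_ns + mem_ns
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


# Illustrative numbers: 1 ns TLB lookup, 100 ns memory, four-level page walk.
eat_99 = effective_access_time(0.99, 1.0, 100.0, 4)
eat_90 = effective_access_time(0.90, 1.0, 100.0, 4)
```

Under these assumptions, dropping the hit rate from 99% to 90% raises the average access time from about 105 ns to about 141 ns.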
TLB Virtualisation:
In virtualised environments, the hypervisor must manage the TLB to ensure that virtual machines (VMs) can access their own virtual memory spaces efficiently.
This can involve techniques like nested page tables, where the guest OS's page tables translate guest virtual addresses to guest physical addresses, and the hypervisor maintains a second level of page tables that translates guest physical addresses to the host's physical addresses.
Efficient TLB management is critical in virtualised environments, as TLB misses can have an even greater impact on performance due to the additional virtualisation overhead.
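The extra cost of a miss under nested page tables can be estimated by counting memory references. The sketch below assumes a worst case with no TLB or page-walk cache hits: every guest page table pointer is a guest physical address that itself needs a full host walk before the guest entry can be read, and the final guest physical address needs one more host walk. The function name is hypothetical.

```python
def nested_walk_refs(guest_levels, host_levels):
    """Worst-case memory references for one two-dimensional page walk."""
    # Each guest level: one host walk to locate the guest table, plus one
    # read of the guest entry itself.
    per_guest_level = host_levels + 1
    # The final guest physical address needs one more host walk.
    return guest_levels * per_guest_level + host_levels


native_refs = 4                          # a plain four-level walk
nested_refs = nested_walk_refs(4, 4)     # four-level guest on four-level host
```

With four levels on each side the walk grows from 4 references to 24, which is why TLB hit rates matter even more for virtual machines.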
TLB sizes depend on the architecture and the target application. The figures below are approximate first-level TLB capacities for 4KB pages; they differ between microarchitecture generations within a product family, so check the vendor's documentation for a specific part:
x86 Processors:
Intel Core i7/i9 processors: 64-entry instruction TLB and 64-entry data TLB for 4KB pages.
AMD Ryzen processors: 48-entry instruction TLB and 32-entry data TLB for 4KB pages.
ARM Processors:
Arm Cortex-A72: 48-entry instruction TLB and 32-entry data TLB for 4KB pages.
Arm Cortex-A53: 10-entry instruction micro TLB and 10-entry data micro TLB, backed by a 512-entry unified main TLB.
PowerPC Processors:
IBM POWER8: 128-entry instruction TLB and 128-entry data TLB for 4KB pages.
IBM POWER9: 128-entry instruction TLB and 128-entry data TLB for 4KB pages.
RISC-V Processors:
SiFive U74-MC Core Complex: 32-entry instruction TLB and 32-entry data TLB for 4KB pages.
Microchip PolarFire RISC-V SoC: 16-entry instruction TLB and 16-entry data TLB for 4KB pages.

