---
title: A Primer on Memory Management
description: Every programming language has variables, but have you ever wondered where they are stored in memory and how that memory is managed? And what are the memory safety issues and memory leaks that keep getting mentioned? Let's take a deep dive into these concepts.

---

## Back to Basics

OK, so what is a variable?

A variable stores some data. More precisely, it is an abstraction used to refer
to some data that is stored in memory. This data can be anything: an integer, a
character, a floating-point number, a string or even a more complex structure.
A variable is composed of two attributes:

1. Data: the actual value or content stored in memory.

2. Data type: information about what kind of data the variable holds and how it
   should be handled.

### Where are variables stored?

*[RAM]: Random Access Memory

Some variables are stored in CPU registers and some exist only at compile time.
But CPU registers are limited in number and small in size, and compile-time
variables are constants that the compiler can optimise out. The rest of the
variables are stored in the device's primary memory (RAM).

Primary memory can be imagined as a massive array of bytes. On a 32-bit device
the array would be 4 GiB, and on 64-bit systems it is unrealistically big. The
type of a variable gives us two attributes to start with: size and alignment.

#### Size

Size is how much space the variable uses in memory. In our array analogy, it is
the number of contiguous array elements the variable will consume.

#### Alignment

*[ISA]: Instruction Set Architecture

On some ISAs, such as x86_64, this barely matters, but on others, such as MIPS,
it does. So what is alignment? Alignment is the constraint that the memory
address of a variable must be a multiple of some power of two, typically its
size (its natural alignment). For primitive types, the variable is placed so
that its memory address is divisible by its size. For example, a 16-bit `int`
is aligned to 2 bytes and can therefore only start at a memory address divisible
by 2, such as 0x100 or 0x102, but not 0x101 or 0x103. This has some interesting
ramifications when we deal with composite types such as structs or tuples. Since
alignment is defined for primitive types such as `int`, `char` or `double`, a
struct aligns each member according to that member's alignment and therefore may
need to insert padding between members.

```c
struct data {
    char character;
    uint64_t id;
};
```

Offset | Content
-------|-----------------------
0x0000 | char character (1B)
0x0001 | padding (7B)
0x0008 | uint64_t id (8B)

The above example is extreme but shows how padding works. Some compilers support
packed structs, which disable padding, but this is not portable as not all CPUs
can handle such code (most mainstream ones can). Padding also means you may want
to be careful about the order of struct fields so you do not waste space on
padding.
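To get a feel for how much field order matters, here is a small sketch (the
struct names are made up for illustration). On a typical x86_64 target the first
layout takes 24 bytes while the reordered one fits in 16:

``` c
#include <stdint.h>
#include <stdio.h>

// 1B char + 7B padding + 8B uint64_t + 1B char + 7B tail padding = 24 bytes
struct wasteful {
    char tag;
    uint64_t id;
    char flag;
};

// 8B uint64_t first, then the two chars share a single padded region = 16 bytes
struct reordered {
    uint64_t id;
    char tag;
    char flag;
};

int main(void) {
    printf("wasteful:  %zu bytes\n", sizeof(struct wasteful));  // typically 24
    printf("reordered: %zu bytes\n", sizeof(struct reordered)); // typically 16
    return 0;
}
```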
### Assignment

Another important operation that every variable supports (ignoring whether it
should) is assignment. It can be thought of as a universal operation. In the
case of assignment, the data is simply copied. C's `memcpy` is basically the
assignment operator in its rawest, lowest-level form (for Plain Old Data).

There are caveats to this, which we will bring up later in this post when we
talk about lifetimes. The copying is especially obvious with big objects, where
some compilers (like GCC or Clang) just turn the assignment into a call to
`memcpy`.

``` c
// A 64 KiB struct, too big for a register
struct a {
    char a[1<<16];
};

static struct a b;

void a(struct a a) {
    // assignment
    b = a;
}
```

The assembly output on x86_64:

``` nasm
a:
    push rbp
    mov rbp, rsp
    lea rsi, [rbp + 16]
    lea rdi, [rip + b]
    mov edx, 65536 ;; mov is also assignment, for register-sized data
    call memcpy@PLT ;; notice the memcpy
    pop rbp
    ret
```

## Refer and Point

As mentioned above, the data is copied on every assignment. This means that in
places like function calls, the function gets a copy of the original and its
modifications are not visible outside. So there is a special kind of variable
called a pointer. Pointers are simply variables that reference another variable.
In most cases this is done by storing the memory address, which is in a way an
index into the array from the earlier analogy. This is not fully accurate since
modern OSes use virtual memory, but the virtual address space seen by the
program is the same one seen by the assembly code, so the abstraction does not
affect low-level reasoning.

So a pointer is simply a variable whose data refers to another variable, and
that data is just a memory address, which is an unsigned integer. This means
there is a set of operations you can do on it.

### Load and Store operations

Since the pointer references another variable, reading or writing the referenced
variable is done with a load or a store. A load reads from memory and a store
writes to it. To make life easier, both are expressed through a single operator,
the dereferencing operator. You can see the usage in C and Zig below.

In C:

``` c
*x = 10; // Store
y = *x; // Load
```

In Zig:

``` zig
x.* = 10; // Store
y = x.*; // Load
```

### Pointer Arithmetic

As a pointer is effectively an integer, addition and subtraction work on it,
with some caveats. When adding an integer, the pointer is treated as if it
points into an array of its type and we move to the next index, so size and
alignment are respected. Subtracting an integer works the same way in the other
direction. When two pointers are subtracted from each other, we get the distance
between them in elements. Note the caveat mentioned above: this is only well
defined within a single object (such as an array), not in general.

This can be seen in C's arrays: indexing is simply syntactic sugar for pointer
arithmetic followed by a dereference.

``` c
a[1] = 10;
// is equivalent to
*(a + 1) = 10;
```

### References

A reference is a more abstract concept than a pointer. References refer to
another object just like pointers do, but they do not necessarily contain a
memory address. They support load and store but not pointer arithmetic. Some
languages, such as C++, use them as a safer alternative to pointers. Later I
will also mention reference counting and handles, which are common ways several
languages implement references.

### Why do you need references??

To understand why, we need to remember how the assignment operator works: it
simply copies data. In fact, the parameters of a function are assignments. So
when you pass a parameter, the original variable is not modified.
``` c
#include <assert.h>
#include <stdlib.h>

struct foo {
    char var;
};

void bar(struct foo self) {
    self.var = 'a'; // Will only modify the local copy
}

int main(void) {
    auto foo = (struct foo){ .var = 'z' };
    bar(foo); // send a copy
    assert(foo.var == 'a'); // Will fail
    return EXIT_SUCCESS;
}
```

Now if you do the same with a reference, you are not copying the variable but
just the reference. For a pointer, that means only the memory address is copied.
Since only the reference is copied, this has an interesting implication for
stores: when you dereference the pointer and write through it, you update the
original variable, even inside functions.

So let's repeat the example above with a reference instead.

``` c
#include <assert.h>
#include <stdlib.h>

struct foo {
    char var;
};

void bar(struct foo* self) { // passing a pointer
    (*self).var = 'a'; // Will modify the original
}

int main(void) {
    auto foo = (struct foo){ .var = 'z' };
    bar(&foo); // send a reference
    assert(foo.var == 'a'); // Will pass
    return EXIT_SUCCESS;
}
```

Also, if you are working on certain algorithms or data structures, references
allow you to store a reference for later processing. One point to add is that
assigning pointers is often called a shallow copy, as the pointer is copied but
not the data.

Another thing to note is the cost of copying. If the variable is no bigger than
the word size of the CPU, in other words its biggest register, there is not much
of a performance penalty. But if you are transferring a bigger object, such as a
large data structure, there is a visible cost to the copy. This is negligible
unless you are copying a multi-kilobyte struct, but it is something to keep in
mind.

*[CISC]: Complex Instruction Set Computer

This also leads to an interesting bit of trivia. Most variables are not
allocated to registers, yet most CPU operations must work on registers. This
means that in machine code a variable is almost always reached through a
pointer, and every operation is surrounded by loads and stores. CISC
architectures hide this in addressing modes and microcode, but look at any MIPS
assembly output and the code will be littered with `lw` and `sw`, as shown below
for adding two variables.

``` asm
; int c = a + b;
    lw $1, 12($fp) ; load
    lw $2, 8($fp) ; load
    addu $1, $1, $2 ; the actual action
    sw $1, 4($fp) ; store
```

Functions in compiled code are also invoked via their memory address, which is
what C-like languages expose as function pointers.

Another interesting thing, which is possible although unsafe, is that if you
have a pointer to a member of a struct and you know the member's offset within
the struct, you can obtain a pointer to the entire struct. This has several use
cases, such as implementing subtyping polymorphism, but that is off topic for
now.
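Here is a minimal sketch of that trick using the standard `offsetof` macro. The
`container_of` macro below mirrors the well-known Linux kernel idiom, and the
struct is made up for illustration:

``` c
#include <stddef.h>
#include <stdio.h>

struct widget {
    int id;
    char name[16];
};

// Given a pointer to a member, subtract the member's offset inside the struct
// to recover a pointer to the whole struct.
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

int main(void) {
    struct widget w = { .id = 42, .name = "knob" };
    char *name_ptr = w.name; // a pointer to just one member
    struct widget *whole = container_of(name_ptr, struct widget, name);
    printf("%d\n", whole->id); // prints 42
    return 0;
}
```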
## Advanced Hardware Memory Models

The model above is a simplified view. There are a few memory models in the wild,
and some nuances that tend to be overlooked but are important to note.

### Basic Units of Memory

This seems simple at first, as most CPUs have few variations here, but it is
still important to note.

A **bit** represents a single base-2 digit (0 or 1).

A **byte** is the basic addressable unit of memory on a system. On almost all
modern systems it is 8 bits; in fact this is mandated by POSIX and Win32, so the
exceptions are legacy or embedded systems. Most languages also make this
assumption (C does not, but most C libraries make it anyway).

A **word** is the unit of data that a CPU processes natively in one operation.
It usually corresponds to the register size or the size of a pointer, which is
roughly `uintptr_t` in C. 32-bit systems have a 32-bit word size while 64-bit
systems have a 64-bit word size.

### Memory Models

The flat-array model of memory is an oversimplification that is good enough for
most use cases. But it is still good to know what goes on underneath the hood.

#### Von Neumann Architecture

The Von Neumann architecture is one where code and data use the same memory bus
and therefore the same address space. This is basically the model described
earlier. Most modern non-embedded CPUs use this architecture, combined with
virtual memory. The biggest limitation of the model is the bottleneck caused by
code and data being fetched over the same bus (not in parallel).

#### Harvard Architecture

*[JIT]: Just In Time Compilation
*[ROM]: Read Only Memory

Here data and code use different memory buses and therefore different address
spaces. This is common in embedded systems where the code lives in ROM.
Self-modifying code and JIT compilation are hard or impossible to implement.
Probably the most famous example is the AVR family used by Arduino.

#### Segmented Memory

Mostly obsolete, but it was popular in the past. Here memory is divided into
segments, and each segment has its own addressable memory space (which is what
the pointer indexes into). A way to think of it is that RAM becomes a 2D array,
with the segment as the extra dimension. It was used as a hack to deal with
small word sizes in the 16-bit era.

#### Virtual Memory

*[MMU]: Memory Management Unit

This is an abstraction found on modern systems. Userspace gets its own unique
view of memory (normally per process). This is managed by the MMU, which the
kernel controls. RAM is divided into pages, and each page is mapped into a
process. This effectively sandboxes processes so they cannot access each other's
memory.

#### Cache

Modern CPUs need to do a lot of loads and stores, so they have their own
internal memory called the cache, usually in multiple layers. It is mostly
transparent, but there are caveats, especially around performance. The cache
holds copies of recently used sections of memory, which means memory locality
matters: objects that are close together in RAM are faster to access than ones
far apart (fragmented memory). This is relevant for the heap mentioned below.

## Lifetimes and Memory Storage

A variable does not (usually) exist from the start of the application until its
end. Each variable has a lifetime. One thing must be clear: memory needs to be
allocated at the start of a variable's lifetime and deallocated when it is no
longer needed. Accessing memory through a pointer after it has been deallocated
is invalid; such a pointer is called a dangling pointer. The memory being
referenced may already be reused for something else, so the behaviour is
undefined.

### Static Memory

Global variables have a lifetime equal to the entire program, so they can be
allocated when the program starts and deallocated when it ends. As long as their
size is known up front, such variables can (if the compiler allows it) be placed
in static memory. In compiled languages this is stored in the binary itself
(sometimes just as a reservation that is allocated automatically at load time).
+ +``` c + +#include <assert.h> +#include <stdlib.h> + +struct foo { + char var; +}; + +static struct foo foo; // Global variable that is allocated in static + +void bar(){ + foo.var = 'a'; // Will modify the global variable +} + +int main(void){ + foo = (struct foo){.var='z'}; + bar(); // Call the function. + assert(foo.var == 'a'); // Will pass as foo is global is shared in all functions + return EXIT_SUCCESS; +} +``` + +There may be copies per thread which have a lifetime equivalent to threads. + +### Stack Memory + +*[LIFO]: Last in First Out *[OS]: Operating System + +Now you have local variables to a particular programming block such as function +or in some case constructs like if-else or loops. As these can be nested or +repeated due to recursion you need multiple copies of them unlike static +variables. But the lifetimes of these variables are equal to the runtime of the +block. To make life easier, the OS allocates a stack which can be used for this. +There are some important things to know about the stack. The stack is LIFO +structure and its top is stored as the stack pointer position. So once +deallocation occurs, the stack point reset to the value before the block. In a +sense the following pseudocode gives an idea how it is done. + +``` cpp +static void *sp = &stack; // The entire stack can be though as a giant static memory + +T* alloc<typename T>(){ + align(&sp, alignof T); // Make sure the new pointer are in the correct alignment + T* new = sp; // set the new pointer to current stack pointer + sp += sizeof T; // Update to stack pointer to length of the pointer + return new; +} + +void dealloc<typename T>(T* old){ + sp = old; // reset +} +``` + +Observe the dealloc function or method. Its really fast but it frees any thing +allocated after it. This means that while it's good for local variables, it also +implies that the return variable of a function must be at the top of the stack +after the function returns. This means returning pointer to local variable is +impossible. And any variable with lifetime that that is unknown and if variable +has an unknown size at compile time can not use stack. + +Another major limitation is stack has limit size of most OSes in the range of +1-16 MiB. If you exceed this, you will face a stack overfull and can not store +more data in the stack (safely without crashing or ending up somewhere you +should not). + +### The Heap + +The last place is the heap. Now when static and stack are unsuitable, there is +the heap. Now heap unlike in stack or static, is not given freely to the user by +the environment but the user must manually allocate the OS from memory and +deallocate. This has few major advantages, mainly as the user is control over +size and lifetimes. In fact the object size is unknown unless the language does +the allocation for the use. + +This is especially used for when lifetimes are very different and custom. +Example are network sockets or widgets. + +``` c +#include <assert.h> +#include <stdlib.h> + +[[nodiscard]] char *bar(){ + char* a = malloc(10); // request 10 bytes of heap from runtime + if (!a){ + return NULL; // malloc fails discuss later + } + strcpy(a, "hello"); + return a; // Will not free a +} + +int main(void){ + auto str = bar(); // Call the function. + if (!str){ + return EXIT_FAILURE; + } + puts(str); // print the string + free(str); // release the memory + return EXIT_SUCCESS; +} +``` + +## Memory Safety + +Memory safety is about avoiding bugs where you violate type or lifetime +guarantees. 
## Memory Safety

Memory safety is about avoiding bugs where you violate type or lifetime
guarantees. A few common issues:

### Unsafe Pointer Arithmetic

Doing raw arithmetic can take you out of bounds, either into another variable or
into unmapped memory, leading to corruption or segmentation faults.

``` c
char a[16] = {0};

// Can read data from another variable or cause a segmentation fault
char const b = *(a + 17); // out of bounds, past the end
char const c = *(a - 1);  // out of bounds, before the start
```

### Missing Metadata

In low-level languages like C, heap allocations don't retain type metadata,
which makes safety checks impossible without external information. A good
example is the size of an allocation, whose absence is a common source of
out-of-bounds errors. Most modern languages handle this by having some form of
fat pointer (a pointer with metadata attached to it), like slices, `std::span`
in C++, or proper arrays.

If that is not available, the solution is generally to carry the metadata out of
band, either by making your own fat-pointer type or by requiring it as a
parameter in your functions.

``` c
// example pattern
void memzero(void* data, size_t size);
```

### Null Pointer

This is a generic invalid pointer found in many languages. It cannot be
dereferenced, and attempting to do so is a runtime error in most languages (a
segmentation fault in C-like languages, an exception or panic in languages that
have them).

The usual semantic is that the pointer/reference is not pointing at any
variable. If it is possible to receive NULL (that is, the type system provides
no way to rule it out at compile time), check for it before any form of
dereferencing. A lot of languages have nullish operators (`?.`) which are useful
shortcuts when dealing with null pointers.

### Out of Memory Errors

Most programs assume `malloc` will not fail, which is mostly true as operating
systems overcommit memory. But allocation can still fail, whether because
overcommit is disabled and memory runs out, or for other reasons (such as
fragmentation). This must be handled (`malloc` returns NULL; other languages
throw an exception). Be very careful and never assume `malloc` will always
return memory.

### Lifetime Errors

There are three types of lifetime errors:

- **Use after free**: using memory after it has been deallocated. Can corrupt
future allocations.

- **Use before initialised**: using memory before it has been initialised. You
can get garbage data.

- **Memory leak**: memory is never deallocated after use, slowly consuming RAM.

In C, manual memory management means you have to:

- track lifetimes

- track ownership

- store sizes

- be careful with pointer arithmetic.

## Memory Management Approaches

Languages and runtimes use different strategies to manage memory safely and
efficiently.

### Garbage Collection

Used in Python, JavaScript, Java, etc. The runtime tracks which variables are
"still in use" and frees unreferenced ones. This avoids most manual bugs but
comes with costs:

- Garbage collection adds runtime overhead and pauses.

- It may not detect cycles without extra work (some garbage collectors leak
cyclic data).

- Performance is unpredictable in tight loops or latency-sensitive systems.

But it is automatic, which can simplify code, especially since in some languages
(dynamic and prototype-based) a type or object is more complicated in memory
than the simple size-plus-data model mentioned earlier.

A few languages like Go have escape analysis, which lets the compiler determine
whether a variable escapes the current function's scope. If it does not, it can
safely be allocated on the stack even when the syntax appears heap-bound.
+ +### Reference Counting + +A simpler alternative to Garbage Collection (Sort of the simplest and naive +possible implementation). Each object tracks how many references point to it. +When reference count hits zero, it's freed. + +- Easier to reason about. + +- Handles shared ownership. + +- Doesn't clean up cycles (unless using weak references or hybrid collectors). + +A classic example is C++ smart pointers or Rust's Arc. + +The bellow pseudocode shows this + +``` cpp +struct ref<typename T>{ + T* ptr; + unsigned ref_count = 1; +} + +void assign(ref<T> *self){ + self->ref_count++; +} + +void drop(ref<T> *self){ + self->ref_count--; + if(0 == self->ref_count){ + free(self->ptr); // Lifetime of the pointer is over + } +} +``` + +### Ownership + Lifetimes in Type System + +Seen in Rust. The compiler encodes lifetimes and ownership rules directly in +types. + +- Compiler ensures memory safety statically. + +- You can't use freed memory or double-free. + +- But this comes at a cost: increased mental overhead, especially in complex +cases (e.g. double-linked lists, graphs). + +### Arenas / Region Allocators + +*[AST]: Abstract Syntax Tree + +Multiple objects are allocated in a chunk and freed together. Great for when +lifetimes can be grouped (e.g. AST nodes, temporary buffers). + +- Reduces malloc/free churn. + +- Simplifies lifetime logic. + +- But causes waste if lifetimes are mixed (over-retention). + +This is a great technique and is popular in modern C and Zig. + +A method which I personally use in the case of trees is to create an object pool +(which is similar to arena but only for a single type). This simplifies clean up +of tree when done. + +*[RAII]: Resource acquisition is initialization *[OOP]: Object Oriented +Programming + +### RAII + +RAII is a method that can be considered a very simplified of automatic version +of resource management(which heap is a resource). There are 2 variants of this, +defer and destructor. + +In some OOP languages, destructor is called when the pointer in the stack is +freed. This allows cleaning other stuff including the memory holded by the +object. This is a common method in c++ and rust especially when combined with +reference counting. + +Similar pattern for defer but this works by inserting code hooks to the end of +scope so you can call cleanup after you are done. This is in Zig. + +``` zig var list = ArrayList(u8).init(allocator); defer list.deinit(); ``` + +Now this method is not memory management specific and ownership and lifetimes +are technically also for system resources too like files, devices, windows, etc. +So other language have these mechanism although not relevant in this case. + +Although C lacks this, defer can be emulated by goto/longjump and is probably +one the few legitimate uses of that keyword. + +### Free on Exit + +This is a valid method although not recommended. In many runtimes and OS, +exiting the process will free all heap too. So you can avoid freeing the heap. +This can be used if your program is short. Personally this is an anti-pattern +unless you are exiting due to an unrecoverable error/exception. The reason it is +an anti-pattern is that you risk creating a memory leak once the code expand +(the same logic against global variables or singletons). + +### Handles + +This is a mix of Arenas and the abstract reference before. The idea is instead +of providing pointer to an object, pass a handle(simplest is an id). The actual +object is stored in a subsystem's internal array or hash map. 
This is powerful +as the ownership belong to the subsystem. And depending on the mechanism, use +after free of the handle is detectable if the correct metadata is handled. + +This is powerful but is more complicated and works with more long term systems. + +## Implicit References in High-Level Constructs + +Just in case you are wondering in the scenario when your language does not have +pointers, how it is working? + +Well most language implicitly use pointers for stuff you may not notice except +the pass by reference at times. I will explain why. + +Couple of language features need pointer for implicitly carrying context where +the context of the function is smuggled in. The common features are: + +- Subtyping Polymorphism: In the form of inheritance or interfaces, the method +needs to get the correct type (this or self in many languages). This usual works +with methods having a virtual table and a mechanism (either a pointer in the +virtual table or via some pointer arithmetic) to get the actual type. In case of +prototypical inheritance (in JavaScript and Lua), the object are hash maps that +need to store in RAM due to the nature of mutable hash maps. + +``` go +func sum(x int) func(b int) int { + return func (b int){ + return x+b // x is capture and needs to be stored somewhere + } +} +``` + +- Lambdas: Lambdas or nested function need to capture the local variable. But as +the function need a copy of all the variables since it is often passed as +callbacks or returned. + +- Coroutines: The backbone of any asynchronous system. The coroutine must be +able to suspend it self and therefore save its stack somewhere else while the +rest of the program runs. This is again stored in the heap. + +## Closing Thoughts + +A few general principles hold across all strategies: + +- Prefer **single ownership** if possible. It's easiest to reason about. + +- Use **reference counting** for shared ownership. + +- **Encode data size** somewhere, either via the type system or with fat +pointers. This helps with bounds checks and memory layout. + +- **Respect lifetimes**. Use stack, RAII, or arenas when you can. + +*[GUI]: Graphical User Interface + +- If you're not working on a performance critical system (e.g. simple GUI or +script), garbage collectors are fine. The mental overhead saved is often worth +it. + +Even in high-level systems, understanding the low-level trade-offs helps avoid +performance cliffs and debug nasty bugs faster. |