T.TAO

C++ Programming #2 Struct and Union

This note covers how to use struct, and the similar union, in C++, along with their memory layouts. Both are heterogeneous data structures: a single object groups members of different types.

Struct

Declaration

Declaring a struct is straightforward. The syntax for defining a structure named StructName is as follows.

struct StructName {
    DataType1 member1;
    DataType2 member2;
    // ...
};

To declare a struct variable:

StructName myStruct;

Here, struct members can contain different types of data, such as int, float, char, other structs, arrays, or pointers.

Use the dot operator (.) to access struct members, for example myStruct.member1.

Initialization

You can initialize at declaration time:

StructName myStruct = {value1, value2};

You can also declare a pointer to a struct. Then use the arrow operator (->) to access the members of the struct pointed to by the pointer.

StructName myStruct = {value1, value2};
StructName *myStructPtr = &myStruct;
std::cout << myStructPtr->member1; // same as (*myStructPtr).member1, not myStructPtr.member1

Memory Alignment

For structs, memory alignment determines how members are laid out in memory. Aligned members can be accessed quickly (on some architectures a misaligned access is not allowed at all), at the cost of some wasted space.

The basic principles of memory alignment are:

  1. Struct start address alignment. A struct's start address is aligned to the strictest alignment requirement among its members, which is not necessarily that of the largest member. For example, if the most strictly aligned member is a uint32_t (4 bytes), the struct's start address will typically sit on a 4-byte boundary.
  2. Struct member alignment. Each member is placed at an offset from the struct's start that is a multiple of that member's natural alignment. For example, a 4-byte int member usually sits at an offset that is a multiple of 4.
  3. Struct total size alignment. The total size of a struct is padded up to a multiple of the struct's own alignment. This means the struct's size may be larger than the sum of all member sizes.

To satisfy these alignment rules, the compiler may add extra unused memory between struct members or at the end of the struct, called padding. This memory padding helps ensure each member is at an appropriate memory address.

Example

Suppose we have the following struct definition:

struct SomeExample {
	char a; // 1 byte
	int b;  // 4 bytes
	char c; // 1 byte
};

According to the alignment rules, the memory layout of this struct might look like:

  • The member with the strictest alignment is int, which occupies 4 bytes and is aligned to 4 bytes, so the struct aligns on 4-byte boundaries.
  • char a: occupies 1 byte. Padding of 3 bytes.
  • int b: occupies 4 bytes.
  • char c: occupies 1 byte. Padding of 3 bytes.

In this case, the total struct size is 12 bytes. Because 12 is a multiple of 4, the address right after the struct's end is also a multiple of 4 whenever its start address is, so every element in an array of SomeExample stays aligned.

In a struct, the compiler typically lays out members in declaration order and adds the necessary padding based on each member's alignment requirements. So if we reorder the struct declaration:

struct SomeExample {
	char a; // 1 byte
	char c; // 1 byte
	int b;  // 4 bytes
};

Then, according to the memory alignment rules, the memory layout might look like:

  • char a: occupies 1 byte. No padding.
  • char c: immediately after a, occupies 1 byte. Padding of 2 bytes.
  • int b: occupies 4 bytes. No padding.

In this case, the total struct size is only 8 bytes, one-third smaller than before, and still satisfies memory alignment.

In some cases, programmers choose a layout with pointer arithmetic in mind, so the second layout is not always better than the first. In the first layout every member starts at a multiple of 4 (offsets 0, 4, and 8), so code that steps through the struct in int-sized units from the start lands on each member. In the second layout the offsets are 0, 1, and 4, so the step between members varies. In the end, struct declaration order should be chosen based on project needs.

Memory Layout

In C++, objects are stored in different memory regions depending on how they are declared and on their lifetime. Class and struct objects follow the same rules.

The main memory regions are:

  1. Text Segment: Stores the program's executable code (machine instructions). Usually read-only to prevent accidental modification. The OS maps the text segment as read-only when loading the program.
  2. Data Segment: Stores initialized global and static variables. Its initial values are loaded from the executable at program start, and it is released when the program ends. It exists for the entire program lifetime.
  3. Read-only Data Segment: Stores read-only data such as string literals and constants. Usually read-only to prevent accidental modification.
  4. BSS Segment (Block Started by Symbol): Stores uninitialized (and typically also zero-initialized) global and static variables. The BSS segment is zero-filled when the program starts. Unlike the data segment, it does not occupy space in the executable file; the OS allocates and zeros it when loading the program.
  5. Heap: Stores dynamically allocated memory. Heap memory is managed by the programmer and can be allocated and freed at runtime. The heap is typically larger than the stack but must be freed manually; otherwise memory leaks can occur.
  6. Stack: Stores local variables, function parameters, return addresses, etc. The OS provides the stack region, and compiler-generated code manages it automatically in last-in-first-out (LIFO) order. A stack frame is allocated on each function call and freed when the function returns.

Below are common kinds of variables and where they are stored.

Local Variables (Stack)

Local variables are usually stored on the stack. Their lifetime is within the function scope. When a function is called, a stack frame is allocated; when it returns, the frame is freed.

For example,

void foo() {
	int x = 10; // x is a local variable in foo(), stored on the stack
}

Global and Static Variables (Data Segment)

Global and static variables are stored in the data segment (or in the BSS segment when they are uninitialized or zero-initialized). They are allocated at program start and freed when the program ends.

int globalVar = 10; // global variable, stored in the data segment

void foo() {
	static int staticVar = 20; // static variable, in the data segment
}

Dynamically Allocated Variables (Heap)

Variables allocated with new or malloc are stored on the heap. Their lifetime is controlled by the programmer, and each allocation must be released with its matching call: delete for new, delete[] for new[], and free for malloc.

void foo() {
	int* p = new int(10); // memory pointed to by p is on the heap
	delete p; // manually free heap memory
}

Constants (Text Segment or Read-only Data Segment)

String literals and other constants are usually stored in the text segment or read-only data segment.

const char* str = "Hello World!"; // string literal stored in the read-only data segment.

From the above, it follows that where struct and class objects are stored in C++ also depends on their declaration and allocation. Here are common cases:

Local Variables (Stack)

If a struct or class object is declared as a local variable in a function, it is usually allocated on the stack. For example,

void foo() {
	struct MyStruct {
		int a;
		int b;
	};

	MyStruct s; // s is stored on the stack
}

Global or Static Variables (Data Segment)

If a struct or class object is declared as a global or static variable, it is allocated in the data segment (or in BSS when it has no nonzero initializer) and lives for the whole program run. For example,

struct MyStruct {
	int a;
	int b;
};

MyStruct globalStruct; // static storage; no initializer, so zero-filled and typically in BSS

void foo() {
	static MyStruct staticStruct;  // static storage as well; persists across calls to foo()
}

Dynamic Allocation (Heap)

If a struct or class object is created via dynamic allocation (e.g., with new), it is allocated on the heap. For example,

struct MyStruct {
	int a;
	int b;
};

void foo() {
	MyStruct *p = new MyStruct(); // object pointed to by p is on the heap
	delete p; // manually free memory
}

Class Members

If a class object contains other classes or structs as members, those members are stored wherever the containing object is stored. For example,

struct InnerStruct {
	int a;
	int b;
};

struct OuterStruct {
	InnerStruct inner;
};

void foo() {
	OuterStruct outer; // outer and outer.inner are both on the stack
	OuterStruct *p = new OuterStruct(); // *p and p->inner are both on the heap
	delete p;
}

These rules apply in most C++ situations. In special cases, such as with custom allocators, storage may differ.