0000000100008008 d __dyld_private
0000000100000000 T __mh_execute_header
0000000100008010 D _global_init_a
0000000100008014 D _global_init_b
0000000100008028 S _global_unin_a
000000010000802c S _global_unin_b
0000000100003f4c T _main
0000000100008018 d _main.local_stat_init_a
000000010000801c d _main.local_stat_init_b
0000000100008020 b _main.local_stat_unin_a
0000000100008024 b _main.local_stat_unin_b
                 U _sleep
                 U dyld_stub_binder
Conclusion:
Comparing the addresses and symbol types listed above, we can conclude that the initialized global variables and local static variables are in the __data section, the uninitialized global variables are in the __common section, and the uninitialized local static variables are in the __bss section.
We can modify this program to initialize uninitialized variables to 0:
#include <unistd.h>

int global_init_a = 1;
char global_init_b = 'a';
Check the symbol table after compiling (for convenience, we use the objdump -t command this time):
SYMBOL TABLE:
0000000100008008 l     O __DATA,__data __dyld_private
0000000100008018 l     O __DATA,__data _main.local_stat_init_a
000000010000801c l     O __DATA,__data _main.local_stat_init_b
0000000100008028 l     O __DATA,__bss _main.local_stat_unin_a
000000010000802c l     O __DATA,__bss _main.local_stat_unin_b
0000000100000000 g     F __TEXT,__text __mh_execute_header
0000000100008010 g     O __DATA,__data _global_init_a
0000000100008014 g     O __DATA,__data _global_init_b
0000000100008020 g     O __DATA,__common _global_unin_a
0000000100008024 g     O __DATA,__common _global_unin_b
0000000100003f4c g     F __TEXT,__text _main
0000000000000000         *UND* _sleep
0000000000000000         *UND* dyld_stub_binder
Surprisingly, initializing these variables to 0 is equivalent to not initializing them at all: they end up in the same sections as before. **So we generally do not need to initialize global variables or local static variables to 0, because it is unnecessary.**
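The observation above can also be checked from inside the program. The following is a minimal sketch (the variable and function names are made up for this illustration and are not part of the test program): C guarantees that objects with static storage duration start out as zero when no initializer is given, so the explicit `= 0` changes nothing about the value that is read.

```c
/* Hypothetical names, for illustration only. */

/* File-scope variables: one with an explicit zero, one left to the default. */
int explicit_zero_global = 0;
int implicit_zero_global;

/* Local static with an explicit zero initializer. */
int read_explicit_zero_static(void)
{
    static int value = 0;
    return value;
}

/* Local static without an initializer: also starts at zero. */
int read_implicit_zero_static(void)
{
    static int value;
    return value;
}
```

All four objects read as 0 at program start. Which section each one lands in is a toolchain decision and may differ on other platforms.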
What is the difference between the __common section and the __bss section?
Observing the address information above, it can be found that the __common section and the __bss section are adjacent to each other.
__common section: The __common section stores common symbols, that is, uninitialized global variables. The linker treats them much like weak symbols, so that duplicate definitions of the same variable in different object files can easily be merged into one. On many platforms they are folded into the bss area at link time; in this executable they stay in their own zero-filled __common section, right next to __bss.
__bss section: BSS is the abbreviation of Block Started by Symbol, a memory area that stores uninitialized global variables and global variables initialized to 0 in the program. The BSS area is statically allocated memory.
Why doesn't the __bss section occupy space in the executable file?
We can take a look directly to find out. Using the command otool -d, you can see:
It can be seen from SYMBOL TABLE that there are a total of 5 values in the (__DATA,__data) section, which are (in hexadecimal): 0, 1, 61, 2, 62.
0 is __dyld_private, skip this for now;
1 is global_init_a;
61 is global_init_b (0x61 is 'a');
2 is local_stat_init_a;
62 is local_stat_init_b (0x62 is 'b').
This means that the __data section holds the values of initialized global variables and local static variables.
Let's take a look at the contents of the __bss section using the objdump -s command:
Contents of section __common:
<skipping contents of bss section at [100008020, 100008025)>
Contents of section __bss:
<skipping contents of bss section at [100008028, 10000802d)>
It turns out that there is nothing inside. This is actually not difficult to understand. The __bss section stores uninitialized global variables and local static variables. Since there is no initialization, there is no need to record the value of the variable in the executable file (there is no value to record). The only thing needed is to give these variables a definite memory address (like variables in the __data section).
There are two ways to do this. First, write some initial values at the corresponding position, as the __data section does, to hold the place; the executable file can then be mapped directly when it is loaded, exactly like the __data section. Second, do not let the __bss section occupy any place in the executable file, and instead set up the corresponding area in memory from the __bss section information at load time; that is, the placeholder is postponed from the executable file to load time.
The toolchain chooses the second method. The advantage is a smaller executable file: with approach two, a 40KB uninitialized array takes up hardly any executable space, while approach one would increase the executable size by at least 40KB.
Strictly speaking, the __bss section still occupies some executable file space. For example, the section itself is described in the section headers, and the variables in it are described in the symbol table. But that is a different matter from storing the contents of the variables.
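The second approach can be modelled in a few lines. This is only a toy sketch: `toy_section` and `toy_load` are hypothetical names invented here, and the struct is not the real Mach-O section layout. It shows a section that stores some bytes in the file (like __data) and one that stores none and only declares how much zeroed memory it needs (like __bss).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy description of a section (hypothetical, not the Mach-O format). */
typedef struct {
    const unsigned char *file_bytes; /* bytes stored in the file, or NULL  */
    size_t file_size;                /* number of bytes stored in the file */
    size_t mem_size;                 /* number of bytes needed once loaded */
} toy_section;

/* "Load" a section: reserve zero-filled memory, then copy in whatever the
 * file actually stores. A zero-fill section has file_size == 0, so nothing
 * is copied and the memory simply stays zero.
 * Returns NULL on inconsistent input or allocation failure; caller frees. */
unsigned char *toy_load(const toy_section *s)
{
    if (s->mem_size == 0 || s->file_size > s->mem_size)
        return NULL;
    if (s->file_size > 0 && s->file_bytes == NULL)
        return NULL;

    unsigned char *mem = calloc(s->mem_size, 1);
    if (mem != NULL && s->file_size > 0)
        memcpy(mem, s->file_bytes, s->file_size);
    return mem;
}
```

A data-like section would be described with `file_size` equal to `mem_size`, while a bss-like section would use `file_bytes = NULL` and `file_size = 0`, so its cost in the file is only this small description.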
How is the size of the bss segment and the data segment calculated?
Simple addition? Let’s see. Type objdump -h to see:
Sections:
Idx Name            Size     VMA              Type
  0 __text          0000003c 0000000100003f4c TEXT
  1 __stubs         0000000c 0000000100003f88 TEXT
  2 __stub_helper   00000024 0000000100003f94 TEXT
  3 __unwind_info   00000048 0000000100003fb8 DATA
  4 __got           00000008 0000000100004000 DATA
  5 __la_symbol_ptr 00000008 0000000100008000 DATA
  6 __data          00000015 0000000100008008 DATA
  7 __common        00000005 0000000100008020 BSS
  8 __bss           00000005 0000000100008028 BSS
It can be seen that the size of the __data section is 0x15 (the Size column is hexadecimal, so this is 21 bytes), but we know that there are only 5 variables in the data section. Compare this with the symbol table:
SYMBOL TABLE:
0000000100008008 l     O __DATA,__data __dyld_private
0000000100008018 l     O __DATA,__data _main.local_stat_init_a
000000010000801c l     O __DATA,__data _main.local_stat_init_b
0000000100008028 l     O __DATA,__bss _main.local_stat_unin_a
000000010000802c l     O __DATA,__bss _main.local_stat_unin_b
0000000100000000 g     F __TEXT,__text __mh_execute_header
0000000100008010 g     O __DATA,__data _global_init_a
0000000100008014 g     O __DATA,__data _global_init_b
0000000100008020 g     O __DATA,__common _global_unin_a
0000000100008024 g     O __DATA,__common _global_unin_b
0000000100003f4c g     F __TEXT,__text _main
0000000000000000         *UND* _sleep
0000000000000000         *UND* dyld_stub_binder
Going by the addresses, the section runs from __dyld_private at 0x100008008 to the single byte of local_stat_init_b at 0x10000801c, which is exactly 0x15 bytes (21 in decimal). However, if we simply add up the sizes of the five variables, 8 + 4 + 1 + 4 + 1 (taking __dyld_private as 8 bytes, as the distance to the next symbol suggests), we get only 18 bytes. Why? It turns out that there is a gap of 3 bytes between global_init_b and local_stat_init_a, which is the usual "alignment" of data in memory. Therefore, the size of the data section or the bss section is not necessarily the sum of the sizes of its variables; in general it is greater than or equal to that sum. This is caused by memory alignment.
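The arithmetic can be replayed with a small sketch. Everything here is an assumption made for illustration: `toy_var`, `layout_size`, `raw_size` and `data_vars` are hypothetical names, the sizes and alignments are read off the symbol addresses (8 bytes for __dyld_private, 4 for each int, 1 for each char), and the Size column of objdump -h is taken as hexadecimal, so 0x15 is 21 bytes.

```c
#include <stddef.h>

/* Size and required alignment of one variable (hypothetical model). */
typedef struct {
    size_t size;
    size_t align; /* must be non-zero */
} toy_var;

/* Round offset up to the next multiple of align. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Size of a section when each variable is placed at its aligned offset. */
size_t layout_size(const toy_var *vars, size_t n)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        offset = align_up(offset, vars[i].align);
        offset += vars[i].size;
    }
    return offset;
}

/* Plain sum of the variable sizes, ignoring alignment. */
size_t raw_size(const toy_var *vars, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += vars[i].size;
    return total;
}

/* Assumed contents of __data, in address order:
 * __dyld_private, global_init_a, global_init_b,
 * local_stat_init_a, local_stat_init_b. */
static const toy_var data_vars[] = {
    {8, 8}, {4, 4}, {1, 1}, {4, 4}, {1, 1}
};
```

Under these assumptions `layout_size(data_vars, 5)` gives 21 and `raw_size(data_vars, 5)` gives 18; the difference of 3 is the padding inserted before local_stat_init_a.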
It can be seen that sp is getting smaller and smaller. We know that the stack grows toward small memory addresses. According to the disassembly code, it is not difficult to see that local variables are stored in the stack.
What are the default values of global variables, static variables and local variables?
I learned the answer to this question when I first started learning C: the default value of global variables and static variables is 0, and the default value of local variables is indeterminate. Here's a closer look at why this is the case.
Uninitialized global variables and local static variables are stored in the __bss section and the __common section. The default value only needs to be analyzed section by section. Here we take the __bss section as an example. The information of __bss is as follows:
Note that the type S_ZEROFILL requires the entire contents of this section to be filled with 0. Why is the default value of global variables and static variables 0, while that of local variables is indeterminate? It's actually easy to understand. Global variables and static variables are stored in the __bss section and the __common section. Their memory addresses are known at load time, the overhead of filling them with 0 is small, and because these variables live for the entire lifetime of the program, initializing them is worthwhile. Local variables are the opposite: there are a large number of them, their lifetimes are short, they are stored on the stack, and their memory addresses cannot be determined in advance. Filling local variables with 0 every time they are created would consume resources for little benefit, so it is not worth the cost.
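The same rule covers arrays, structs and pointers with static storage duration. The sketch below uses hypothetical names (`struct point`, `origin`, `table`, and the two functions are invented for this illustration). It relies only on the C guarantee that static objects without an initializer start out zeroed, and it initializes the local array explicitly because a local variable has no such default.

```c
#include <stddef.h>

struct point {
    int x;
    double y;
    const char *label;
};

/* Static storage duration, no initializers: all members start at zero/NULL. */
static struct point origin;
static int table[16];

/* Returns 1 if every static object above still holds its zero default. */
int static_storage_is_zero(void)
{
    if (origin.x != 0 || origin.y != 0.0 || origin.label != NULL)
        return 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i] != 0)
            return 0;
    }
    return 1;
}

/* A local array lives on the stack and has no default value, so it is
 * initialized explicitly before being read. */
int sum_local_buffer(void)
{
    int buffer[4] = {0};
    int sum = 0;
    for (size_t i = 0; i < sizeof buffer / sizeof buffer[0]; i++)
        sum += buffer[i];
    return sum;
}
```

Dropping the `= {0}` in `sum_local_buffer` would make the reads indeterminate, which is exactly the difference between the two storage classes described above.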
Will variables modified by const have different locations in memory?
In order to study this problem, we modify the code again:
SYMBOL TABLE:
0000000100003fa8 l     O __TEXT,__const _main.local_con_stat_init_a
0000000100003fac l     O __TEXT,__const _main.local_con_stat_init_b
0000000100003fb0 l     O __TEXT,__const _main.local_con_stat_unin_a
0000000100003fb4 l     O __TEXT,__const _main.local_con_stat_unin_b
0000000100008008 l     O __DATA,__data __dyld_private
0000000100008018 l     O __DATA,__data _main.local_stat_init_a
000000010000801c l     O __DATA,__data _main.local_stat_init_b
0000000100008020 l     O __DATA,__bss _main.local_stat_unin_a
0000000100008024 l     O __DATA,__bss _main.local_stat_unin_b
0000000100000000 g     F __TEXT,__text __mh_execute_header
0000000100003fa0 g     O __TEXT,__const _global_con_init_a
0000000100003fa4 g     O __TEXT,__const _global_con_init_b
0000000100008028 g     O __DATA,__common _global_con_unin_a
000000010000802c g     O __DATA,__common _global_con_unin_b
0000000100008010 g     O __DATA,__data _global_init_a
0000000100008014 g     O __DATA,__data _global_init_b
0000000100008030 g     O __DATA,__common _global_unin_a
0000000100008034 g     O __DATA,__common _global_unin_b
0000000100003f14 g     F __TEXT,__text _main
0000000000000000         *UND* _sleep
0000000000000000         *UND* dyld_stub_binder
Finally, look at the contents of the __const section:
Contents of section __const:
 100003fa0 02000000 62000000 04000000 64000000 ....b.......d...
 100003fb0 00000000 00
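Reading the dump back against the symbol table suggests what the const declarations might look like. The following is a reconstruction, not the original source: the variable names come from the symbol table, the values 2, 'b', 4 and 'd' come from the bytes 02, 62, 04 and 64 in the dump, and the int/char types are assumptions based on the symbol spacing. The helper function `const_statics_sum` is hypothetical and exists only so that the local statics can be inspected.

```c
/* Assumed declarations matching the symbols in __TEXT,__const. */
const int  global_con_init_a = 2;
const char global_con_init_b = 'b';

/* Const globals without an initializer: tentative definitions in C. */
const int  global_con_unin_a;
const char global_con_unin_b;

/* Hypothetical helper; in the article these locals live inside main. */
int const_statics_sum(void)
{
    static const int  local_con_stat_init_a = 4;
    static const char local_con_stat_init_b = 'd';
    static const int  local_con_stat_unin_a; /* zero by default */
    static const char local_con_stat_unin_b; /* zero by default */

    return local_con_stat_init_a + local_con_stat_init_b
         + local_con_stat_unin_a + local_con_stat_unin_b;
}
```

In the symbol table the two uninitialized const globals appear in __DATA,__common rather than in __TEXT,__const, while the uninitialized const local statics appear in __TEXT,__const as the zero bytes starting at 0x100003fb0.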
It can be found that the initialized global variables and all the local static variables decorated with const are placed in the read-only __const section. The one exception visible in the symbol table is the pair of uninitialized const globals (global_con_unin_a and global_con_unin_b), which still sit in __DATA,__common. Compared with global variables and local static variables without const, there are mainly the following differences:
Exposed (external) global variables and internal (non-external) local static variables share the same __const section;
Both initialized and uninitialized local static variables are in the __const section;
The __const section does not save executable file space the way the __bss section does.
That is, even the uninitialized const local static variables take up executable space: they are the zero bytes at 0x100003fb0 in the dump above.
Let's look at local variables again, this time through the disassembly:
Following the method used above, we can see that for initialized local variables, the presence or absence of const has no effect on where the variables are placed: they are all laid out on the stack in order.
Conclusions:
Initialized global variables and local static variables qualified with const are placed in the read-only __const section (the uninitialized const globals in this test stayed in __common); for local variables, const has no effect on the location of the variable, and they are all on the stack.
Note that there is a problem here: the read-only __const section is protected by the operating system, so the values inside it cannot be modified. But there is no such protection for local variables qualified with const, because they are placed on the stack like ordinary local variables. In that case, the only way for C to enforce const is to intervene at compile time. This question is left for later consideration.
Afterword
Good tools can help us get to the bottom of a problem, and this article makes use of some great ones. Of course, we should not rely on a single tool. In this article, we used nm, otool and objdump together to analyze the various parts of a C program.