C Programming


C is one of the most widely used systems programming languages in the world. Your OS probably has a good chunk of C in it. After some years of C++ programming I wanted to explore the more minimal core of C99 and C11, and I much prefer it to any of the other systems programming languages.

Picture of a communist style propaganda pamphlet promoting the K&R’s C programming language book.

Style guide

This is my personal style guide for writing C code. Some more or less objective rules of thumb and my own biased preferences. Everyone has their own preferences, and one should try to respect the style guide if working with other people on different projects. Consistency is key.

Resources

Here are some style guides that I’ve read from other people. I agree with some parts of these and disagree with others. Ultimately some choices are philosophical and others are more or less practical, so pick your own style and try to stick to it. If working with other people, try to agree on some guidelines, use the project’s existing style, or use an autoformatter to avoid pointless discussions.

Notes on the language

The static keyword

The inline keyword

The inline keyword can be thought of as a hint to the compiler that increases the likelihood of a function being inlined. For that to work reliably it needs to be used together with static; otherwise, an inline definition acts as an alternative implementation of an existing function. For example, if we have an int fun() in translation unit fun.c and an inline int fun() in bar.c, and we call fun() from bar.c, the inline version will be used instead of the one in fun.c whenever the call gets inlined. This can be a big problem if the two functions don’t behave exactly the same, and it adds the burden of maintaining two different versions.

Inline functions should be both declared and defined in header files unless they have internal linkage. In C, if we also want a non-inline version of the same function, we add a declaration without inline (or with extern) in a single .c file, which makes that file provide the external definition.

//
// max_val.h
//

inline int
max_val(int a, int b) {
    return a > b ? a : b;
}

//
// max_val.c
//

#include "max_val.h"

int
max_val(int a, int b);

// or

extern int
max_val(int a, int b);

// or

extern inline int
max_val(int a, int b);

// this doesn't work here!
inline int
max_val(int a, int b);

We can also force inlining with a compiler-specific attribute, e.g. __attribute__((always_inline)) inline int fun() on GCC and Clang, but in general we can trust the compiler to inline functions as needed.
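For small helpers, the static inline combination mentioned above is usually the least surprising option. The following is only a sketch: the header name and the clamp_int function are made up for illustration.

```c
//
// clamp.h (hypothetical header)
//

#ifndef CLAMP_H
#define CLAMP_H

// Internal linkage: every .c file that includes this header gets its own
// private copy, so no external definition is needed anywhere else.
static inline int
clamp_int(int value, int lo, int hi) {
    if (value < lo) {
        return lo;
    }
    if (value > hi) {
        return hi;
    }
    return value;
}

#endif // CLAMP_H
```

The price is that a non-inlined copy may be emitted in each translation unit that uses it, which is normally irrelevant for functions this small.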

The <stdarg.h> header

Contains macros for accessing the arguments of variadic functions. Use the following example as a guide:

#include <stdarg.h>

int
sum_numbers(int num, ...) {
    va_list args;
    va_start(args, num);
    int sum = 0;
    for (int i = 0; i < num; ++i) {
        sum += va_arg(args, int);
    }
    va_end(args);
    return sum;
}
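Another common use of this header is forwarding a va_list to one of the v* functions in <stdio.h>. This is a minimal sketch, and format_into is a hypothetical helper name rather than anything standard:

```c
#include <stdarg.h>
#include <stdio.h>

// Hypothetical helper: formats into buf and returns what vsnprintf returns,
// i.e. the length the full string would have, or a negative value on error.
int
format_into(char *buf, size_t buf_size, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt); // The last named parameter goes here.
    int written = vsnprintf(buf, buf_size, fmt, args);
    va_end(args);
    return written;
}
```

This is how printf-like wrappers (loggers, error reporters) are usually built: the variadic function only collects the arguments and a va_list taking function does the real work.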

Strings with <string.h>

// Split file into words.
char *token = webster;
char **all_tokens = NULL;
for (size_t i = 0; i < n_words; ++i) { // n_words limits the number of words.
    // The first call takes the buffer, the following ones take NULL.
    token = strtok(i == 0 ? token : NULL, " \n\t\r\v\f");
    if (token == NULL) {
        break;
    }
    dyn_push(all_tokens, token); // Dynamic array from my own library.
}

Reference

char *strtok(char *str, const char *delim);
char *strtok_r(char *str, const char *delim, char **saveptr);
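Both strtok and strtok_r overwrite the delimiters in the input, and strtok also keeps hidden static state. When the input must stay untouched, a different technique is to walk the string with strspn and strcspn, which are plain ISO C. A rough sketch, where count_tokens is a made-up helper:

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: counts the tokens in text without modifying it.
size_t
count_tokens(const char *text, const char *delims) {
    size_t count = 0;
    const char *cursor = text;
    for (;;) {
        cursor += strspn(cursor, delims);        // Skip leading delimiters.
        size_t length = strcspn(cursor, delims); // Length of the next token.
        if (length == 0) {
            break; // Only the terminator is left.
        }
        ++count;
        cursor += length;
    }
    return count;
}
```

Inside the loop, cursor and length describe each token as a slice of the original buffer, so they could be stored instead of counted.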

About const and pointers

Here is an example of how to use const with pointers depending on what we want to make const, taken from Correct usage of const with pointers:

// Neither the data nor the pointer are const
char* ptr = "just a string";

// Constant data, non-constant pointer
const char* ptr = "just a string";

// Constant pointer, non-constant data
char* const ptr = "just a string";

// Constant pointer, constant data
const char* const ptr = "just a string";

IO with <stdio.h>

Read an entire file into memory

char *
read_file(char *file_name) {
    FILE *file = fopen(file_name, "r");
    if (!file) {
        fprintf(stderr, "couldn't open file: %s\n", file_name);
        exit(-1);
    }

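    fseek(file, 0, SEEK_END);     // Set file pointer to the end of file.
    long file_size = ftell(file); // Store the current position in file (in bytes).
    fseek(file, 0, SEEK_SET);     // Set file pointer to the beginning.
    if (file_size < 0) {
        fprintf(stderr, "couldn't get the size of file: %s\n", file_name);
        exit(-1);
    }
    char *str = malloc((size_t)file_size + 1); // Allocate memory for file and terminator.
    if (!str) {
        fprintf(stderr, "couldn't allocate memory for file: %s\n", file_name);
        exit(-1);
    }
    size_t n_read = fread(str, 1, (size_t)file_size, file); // Copy up to file_size bytes into str.
    str[n_read] = 0; // Set the null terminator after the bytes actually read.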

    fclose(file);
    return str; // NOTE: We are returning a pointer, caller must free it!
}

Copy the contents of a file into another

void
copy_file(FILE *in, FILE *out) {
    char buf[COPY_BUF_SIZE];
    size_t n_read = 0; // fread returns a size_t, not an int.
    while ((n_read = fread(buf, 1, sizeof(buf), in)) > 0) {
        fwrite(buf, 1, n_read, out);
    }
}

Extra printf/scanf options

// Print at most 10 characters
printf("%.10s", string);

// Print at most n characters (n must be an int)
printf("%.*s", n, string);

// Format a numeric value as hex, zero-padded to at least 4 digits.
printf("0x%04x\n", number); // Lowercase
printf("0x%04X\n", number); // Uppercase

// Read at most 10 non-whitespace characters (input needs room for 11 bytes)
scanf("%10s", input);

// Read at most 10 vowels using a scanset (no trailing s after the brackets)
scanf("%10[aeiou]", input);
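The same conversion specifiers work with sscanf and snprintf, which makes them easy to try on strings instead of the terminal. A small sketch follows. Both helper names are invented for this example.

```c
#include <stdio.h>

// Hypothetical helper: parses "<vowels> <hex number>" and returns the number
// of fields that sscanf converted (2 on success).
int
parse_vowels_and_hex(const char *line, char vowels[11], unsigned *number) {
    // %10[aeiou] stores at most 10 characters plus the terminator.
    return sscanf(line, "%10[aeiou] %x", vowels, number);
}

// Hypothetical helper: writes at most n characters of s, a space, and number
// as hex zero-padded to 4 digits. Returns what snprintf returns.
int
format_prefix_and_hex(char *out, size_t out_size, const char *s, int n,
                      unsigned number) {
    return snprintf(out, out_size, "%.*s 0x%04x", n, s, number);
}
```

For instance, format_prefix_and_hex(out, sizeof(out), "hello world", 5, 0xab) leaves "hello 0x00ab" in out.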

Read characters until end of file

int c; // Must be an int: EOF has to be distinguishable from every byte value.
while ((c = getchar()) != EOF) {
    // do stuff...
}
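getchar returns an int so that EOF can be told apart from every possible byte, and storing the result in an 8-bit type throws that distinction away. The sketch below makes the problem visible. It assumes the usual platform where EOF is -1 and narrowing 0xFF to int8_t gives -1, and the function name is made up.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: returns 1 if ch (a value as returned by getchar)
// would be mistaken for EOF after being stored in an int8_t.
int
narrowed_looks_like_eof(int ch) {
    // Narrowing is implementation-defined for values above 127.
    // On common platforms 0xFF becomes -1.
    int8_t narrowed = (int8_t)ch;
    return narrowed == EOF;
}
```

With an 8-bit variable, a file containing the byte 0xFF would end the loop early, even though 0xFF and EOF are different values as ints.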

Binary shifts

We can do a circular shift (rotation) by M bits of an N-bit unsigned integer, with 0 < M < N, as follows:

uintN_t x = 1231215198;
uintN_t shift_M_left = (x << M) | (x >> (N - M));
uintN_t shift_M_right = (x >> M) | (x << (N - M));
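As a concrete 32-bit instance, here is a sketch that also accepts a count of 0 or more than 31 by masking it. Without the mask, a count of 0 would shift by the full width, which is undefined behaviour. The function names are my own.

```c
#include <stdint.h>

// Rotate x left by m bits. m is reduced to the range 0..31 first.
uint32_t
rotate_left_32(uint32_t x, unsigned m) {
    m &= 31;
    return (x << m) | (x >> ((32 - m) & 31));
}

// Rotate x right by m bits. m is reduced to the range 0..31 first.
uint32_t
rotate_right_32(uint32_t x, unsigned m) {
    m &= 31;
    return (x >> m) | (x << ((32 - m) & 31));
}
```

Modern compilers usually recognise this pattern and emit a single rotate instruction where the target has one.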

Defer style macros

These can be useful for profiling, UIs, releasing file handles and more:

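#define concat_(a, b) a##b
#define concat(a, b) concat_(a, b) // Extra level so __LINE__ expands before pasting.
#define macro_var(name) concat(name, __LINE__)
#define defer(start, end) for (      \
    int macro_var(_i_) = (start, 0); \
    !macro_var(_i_);                 \
    (macro_var(_i_) += 1), end)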

defer(begin(), end()) {
    // ...
}

#define profile defer(profile_begin(), profile_end())
profile {
    // ...
}

#define gui defer(gui_begin(), gui_end())
gui {
    // ...
}

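// One possible definition of scope: like defer, but without a start expression.
#define scope(end) for (        \
    int macro_var(_i_) = 0;     \
    !macro_var(_i_);            \
    (macro_var(_i_) += 1), end)

file_handle file = file_open(filename, file_mode_read);
scope(file_close(file)) {
    // ...
    // File will be closed when the scope ends.
}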
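To see the order in which things run, here is a self-contained sketch that records each step in a small trace buffer. The concat helpers, the trace buffer, record and run_deferred are all assumptions added for the example.

```c
#include <stddef.h>

#define concat_(a, b) a##b
#define concat(a, b) concat_(a, b) // Two levels so __LINE__ expands first.
#define macro_var(name) concat(name, __LINE__)
#define defer(start, end) for (        \
    int macro_var(_i_) = ((start), 0); \
    !macro_var(_i_);                   \
    (macro_var(_i_) += 1), (end))

// Hypothetical trace buffer recording the order in which things run.
static char trace[8];
static size_t trace_len = 0;

static void
record(char c) {
    if (trace_len + 1 < sizeof(trace)) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

const char *
run_deferred(void) {
    trace_len = 0;
    trace[0] = '\0';
    defer(record('B'), record('E')) {
        record('x'); // Runs between the start and end expressions.
    }
    return trace; // "BxE"
}
```

One caveat: the end expression lives in the loop's increment step, so leaving the block with break, return or goto skips it.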

Assertions/Tests with <assert.h>

We can quickly create a minimal test suite using asserts. If we want to display a message about the test, we can do so with:

assert(1 == 0 && "Message goes here");

Assertions are removed when compiling with -DNDEBUG (they expand to nothing). When an assertion fails, it prints a diagnostic and calls abort(), which will dump core if possible for further debugging.

Complex numbers <complex.h>

We have support for complex numbers with the _Complex types, but this header adds some nicer names such as complex and I. Arithmetic, equality, assignment and compound assignment work with complex numbers.

double complex a = 2;         // 2 + 0i
double complex b = 2 * I;     // 0 + 2i
double complex c = 6 * I;     // 0 + 6i
double complex d = a + b * c; // 2 + (2i * 6i) = -10 + 0i
printf("%g + %gi\n", creal(d), cimag(d));

Designated initializers

One of the main reasons for me to use C99. They give a lot of utility for initializing structures and arrays. You can even use these to pass pointers to structs without creating a temporary. You can use this also for arrays by specifying the desired position.

struct Foo {
    int a;
    int b;
    int c;
};

// Direct structure initialization. Fields that are not given will be zero initialized.
struct Foo foo = (struct Foo){.a = 1, .b = 2};

// Zero initialization except for index 3 and 7
int arr[10] = {[3] = 1, [7] = 2};

// Take address of temporary.
// void add_vectors(struct Vec3 *, struct Vec3 *)
add_vectors(&(struct Vec3){.x = 2, .y = 3}, &(struct Vec3){.x = 1, .y = 2});
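Two patterns I find handy are lookup tables indexed by an enum and passing compound literals by pointer. This is a sketch with made-up names (Color, color_names, vec3_dot, dot_example), not code from any particular project:

```c
enum Color { COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT };

// Lookup table indexed by enum value. Entries can be listed in any order.
static const char *const color_names[COLOR_COUNT] = {
    [COLOR_BLUE]  = "blue",
    [COLOR_RED]   = "red",
    [COLOR_GREEN] = "green",
};

struct Vec3 {
    int x;
    int y;
    int z;
};

// Hypothetical helper taking its operands by pointer.
int
vec3_dot(const struct Vec3 *a, const struct Vec3 *b) {
    return a->x * b->x + a->y * b->y + a->z * b->z;
}

int
dot_example(void) {
    // .z is omitted in both literals, so it is zero in both.
    return vec3_dot(&(struct Vec3){.x = 2, .y = 3},
                    &(struct Vec3){.x = 1, .y = 2}); // 2*1 + 3*2 + 0*0 = 8
}
```

The table stays correct even if the enum values are reordered, which is not true of a plain positional initializer.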

Flexible Array Members

The last field in a structure can be an array of unspecified size (a flexible array member). For example, if we have a packet structure such as this:

struct Packet {
    header h;
    data d[];
};

We can allocate the memory for it as follows:

struct Packet *p = malloc(sizeof(struct Packet) + n * sizeof(data));

This solves potential issues with padding: sizeof returns the size of the structure up to, but not including, the flexible array member, with any trailing padding included.
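Put together, allocation and use could look like the sketch below. The field names and packet_new are assumptions for illustration. Here the header is reduced to a single length field.

```c
#include <stdint.h>
#include <stdlib.h>

struct Packet {
    uint32_t length;   // Number of payload bytes that follow.
    uint8_t payload[]; // Flexible array member, must be the last field.
};

// Hypothetical constructor: one allocation holds header and payload.
// Returns NULL on allocation failure. The caller frees it with free().
struct Packet *
packet_new(uint32_t length) {
    struct Packet *p = malloc(sizeof(*p) + length * sizeof(p->payload[0]));
    if (p == NULL) {
        return NULL;
    }
    p->length = length;
    for (uint32_t i = 0; i < length; ++i) {
        p->payload[i] = 0;
    }
    return p;
}
```

Because everything lives in one block, a single free(p) releases both the header and the payload.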

Transform endianness

We don’t normally care about the endianness of our processor. What we may actually want is to read the bytes from a big/little endian encoded number from a stream. We can use these functions (or equivalent) to achieve this regardless of the endianness of our processor:

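u32
big_endian_read(u8 *data) {
    // Cast before shifting: a u8 promotes to int, and shifting a set bit
    // into the sign bit is undefined behaviour.
    return ((u32)data[3] <<  0) |
           ((u32)data[2] <<  8) |
           ((u32)data[1] << 16) |
           ((u32)data[0] << 24);
}

u32
little_endian_read(u8 *data) {
    return ((u32)data[0] <<  0) |
           ((u32)data[1] <<  8) |
           ((u32)data[2] << 16) |
           ((u32)data[3] << 24);
}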

You can read more about this in “the byte order fallacy” article, by Rob Pike.
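The write direction follows the same idea: produce the bytes with shifts instead of inspecting how the value is laid out in memory. A sketch, assuming u8 and u32 are typedefs for the <stdint.h> types (the function names are mine):

```c
#include <stdint.h>

typedef uint8_t u8;   // Assumed to match the u8 used above.
typedef uint32_t u32; // Assumed to match the u32 used above.

// Store value into data[0..3], most significant byte first.
void
big_endian_write(u8 *data, u32 value) {
    data[0] = (u8)(value >> 24);
    data[1] = (u8)(value >> 16);
    data[2] = (u8)(value >>  8);
    data[3] = (u8)(value >>  0);
}

// Store value into data[0..3], least significant byte first.
void
little_endian_write(u8 *data, u32 value) {
    data[0] = (u8)(value >>  0);
    data[1] = (u8)(value >>  8);
    data[2] = (u8)(value >> 16);
    data[3] = (u8)(value >> 24);
}
```

Neither function contains an #ifdef for the host byte order, which is the whole point of the approach.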

Minimalist unit testing in C

Unit testing in C can be done without a large framework:

// testlib.h
#define mu_assert(message, test) do { if (!(test)) return message; } while (0)
#define mu_run_test(test) do { char *message = test(); tests_run++; \
                               if (message) return message; } while (0)
extern int tests_run;

// tests/example.c
#include <stdio.h>
#include "testlib.h"

int tests_run = 0;

int foo = 7;
int bar = 4;

static char * test_foo() {
    mu_assert("error, foo != 7", foo == 7);
    return 0;
}

static char * test_bar() {
    mu_assert("error, bar != 5", bar == 5);
    return 0;
}

static char * all_tests() {
    mu_run_test(test_foo);
    mu_run_test(test_bar);
    return 0;
}

int main(int argc, char **argv) {
    char *result = all_tests();
    if (result != 0) {
        printf("%s\n", result);
    }
    else {
        printf("ALL TESTS PASSED\n");
    }
    printf("Tests run: %d\n", tests_run);

    return result != 0;
}

Source: JTN002 - MinUnit – a minimal unit testing framework for C

How to embed binary data in our programs

It is common to want to embed data and assets in our programs, particularly when we are working with embedded systems without a filesystem. Having a good asset pipeline is pretty essential and something I neglected for a long time.

My previous workflow consisted of grabbing binary files and converting them to a C source file that looked like this:

static const u32 font[] = {
    0x00000000, 0x00000000, 0x00002400, 0x423c0000,
    0x00002400, 0x3c420000, 0x0000363e, 0x3e1c0800,
    0x00081c3e, 0x3e1c0800, 0x001c1c3e, 0x363e081c,
    0x00081c3e, 0x3e3e081c, 0x00000018, 0x18000000,
    0x7e7e7e66, 0x667e7e7e, 0x00001824, 0x24180000,
    ...
};

I wrote some hacky tools to deal with this (and yes, I know there are thousands of these kinds of utilities available, I just like to cook my own food ok?), but this is far from an optimal solution.

I compile pretty much everything these days using the ever-present Makefiles, and if you are compiling C code, you probably already have all the tools you need for this purpose installed. We can use objcopy to generate object files that can then be linked into our executable, for example:

objcopy -I binary -O elf32-little source.bin source.o

Note that depending on your architecture you may need to change this slightly. For example I need to do the following to make it work on my ARM systems:

arm-none-eabi-objcopy -I binary -O elf32-littlearm source.bin source.o

It’s also possible you may want to put the embedded data in an ELF section other than .data, in which case you can use the --rename-section argument:

objcopy -I binary -O elf32-little --rename-section .data=.rodata,alloc,load,readonly source.bin source.o

Once you have your object files, it’s only a matter of linking them together with the rest of your code:

gcc -o my_program my_program.c source.o

If we put it all together in a Makefile, we can have an assets folder where we put everything that should be linked together:

.POSIX:
.SUFFIXES:
.PHONY: main run clean

# Source code location and files to watch for changes.
SRC_DIR     := src
BUILD_DIR   := build
ASSETS_DIR  := assets
SRC_OBJ     := $(wildcard $(SRC_DIR)/*.c)
ASSETS_SRC  := $(wildcard $(ASSETS_DIR)/*.bin)
OBJECTS     := $(patsubst $(SRC_DIR)/%.c, $(BUILD_DIR)/%.o, $(SRC_OBJ))
ASSETS      := $(patsubst $(ASSETS_DIR)/%.bin, $(BUILD_DIR)/%.o, $(ASSETS_SRC))
INC_DIRS    := $(shell find $(SRC_DIR) -type d)
INC_FLAGS   := $(addprefix -I,$(INC_DIRS))

# Output names and executables.
TARGET := hello-world
BIN    := $(BUILD_DIR)/$(TARGET)

# Main compilation tool paths.
CC       := gcc
LD       := ld
AS       := as
OBJDUMP  := objdump
OBJCOPY  := objcopy

# Compiler and linker configuration.
CFLAGS         := -Wall -Wextra -pedantic
CFLAGS         += $(INC_FLAGS)
LDFLAGS        :=
LDLIBS         :=
RELEASE_CFLAGS := -O2 -DNDEBUG
DEBUG_CFLAGS   := -O0 -DDEBUG -g

# Setup debug/release builds.
DEBUG ?= 0
ifeq ($(DEBUG), 1)
    CFLAGS += $(DEBUG_CFLAGS)
else
    CFLAGS += $(RELEASE_CFLAGS)
endif

main: $(BUILD_DIR) $(OBJECTS) $(ASSETS) $(BIN)

$(BIN): $(OBJECTS) $(ASSETS) | $(BUILD_DIR)
    $(CC) $(CFLAGS) $(LDFLAGS) -o $(BIN) $(OBJECTS) $(LDLIBS) $(ASSETS)

# Remove build directory.
clean:
    rm -rf $(BUILD_DIR)

# Create the build directory.
$(BUILD_DIR):
    mkdir -p $(BUILD_DIR)

# Inference rules for C files.
$(BUILD_DIR)/%.o: $(SRC_DIR)/%.c | $(BUILD_DIR)
    $(CC) $(CFLAGS) -c $< -o $@

# Inference rules for binary file embedding.
$(BUILD_DIR)/%.o: $(ASSETS_DIR)/%.bin | $(BUILD_DIR)
    $(OBJCOPY) -I binary -O elf32-little $< $@

Ok you have the assets and your code bundled together, but how do you use it? Well, we have to take advantage of the extern keyword. If we have embedded a file located at assets/source.bin, your variable declarations should look something like this:

extern char _binary_assets_source_bin_start[];
extern char _binary_assets_source_bin_size[];

The size is encoded as the address of the _size symbol, so to use it you have to cast it to a size_t value:

char *source_data = _binary_assets_source_bin_start;
size_t source_size = (size_t)_binary_assets_source_bin_size;

If you can’t find the symbol name that was generated by objcopy, you can use readelf or objdump to explore the symbol table:

> readelf -Ws build/source.o

Symbol table '.symtab' contains 4 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 NOTYPE  GLOBAL DEFAULT    1 _binary_assets_source_bin_start
     2: 00000150     0 NOTYPE  GLOBAL DEFAULT    1 _binary_assets_source_bin_end
     3: 00000150     0 NOTYPE  GLOBAL DEFAULT  ABS _binary_assets_source_bin_size

The names can get a bit hairy to use, so using macros for more descriptive names is not a terrible idea:

#define SOURCE_DATA _binary_assets_source_bin_start
#define SOURCE_SIZE _binary_assets_source_bin_size

extern s8 SOURCE_DATA[];
extern u8 SOURCE_SIZE[];

Lately I’ve moved to using these cursed macros to include raw binary files when possible. They can be used outside functions and already set up the extern variables for me. Nifty! I didn’t come up with this idea, I just adapted it to suit my needs. Check the sources below for more info.

#define BINARY(sect, file, sym) \
    asm (\
        ".section " #sect "\n"\
        ".balign 4\n"\
        ".global " #sym "\n"\
        #sym ":\n"\
        ".incbin \"" file "\"\n"\
        ".global " #sym "_size\n"\
        ".set " #sym "_size, . - " #sym "\n"\
        ".balign 4\n"\
        ".section \".text\"\n"\
    );\
    extern const char sym[], sym##_size[]

#define BINARY_PARTIAL(sect, file, offset, size, sym) \
    asm (\
        ".section " #sect "\n"\
        ".balign 4\n"\
        ".global " #sym "\n"\
        #sym ":\n"\
        ".incbin \"" file "\"," #offset "," #size "\n"\
        ".global " #sym "_size\n"\
        ".set " #sym "_size, . - " #sym "\n"\
        ".balign 4\n"\
        ".section \".text\"\n"\
    );\
    extern const char sym[], sym##_size[]

How to explore assembly code

There are tools like Godbolt/Compiler Explorer that allow us to look at the generated assembly code for our files, but on UNIX you probably already have everything you need if you have installed your build tools. This Makefile will generate object files (.o), partially compiled assembly files (.s) and disassembled code from objdump (.dump). Just create a src directory, put your .c source files there and run make.

.POSIX:
.SUFFIXES:
.PHONY: main run clean

# Source code location and files to watch for changes.
SRC_DIR     := src
BUILD_DIR   := build
SRC_OBJ     := $(wildcard $(SRC_DIR)/*.c)
OBJECTS     := $(patsubst $(SRC_DIR)/%.c, $(BUILD_DIR)/%.o, $(SRC_OBJ))
ASM_FILES   := $(patsubst $(SRC_DIR)/%.c, $(BUILD_DIR)/%.s, $(SRC_OBJ))
DUMP_FILES  := $(patsubst $(SRC_DIR)/%.c, $(BUILD_DIR)/%.dump, $(SRC_OBJ))
WATCH_SRC   := $(shell find $(SRC_DIR) -name "*.c" -or -name "*.s" -or -name "*.h")
INC_DIRS    := $(shell find $(SRC_DIR) -type d)
INC_FLAGS   := $(addprefix -I,$(INC_DIRS))

# Output names and executables.
TARGET := compiler-explorer
BIN    := $(BUILD_DIR)/$(TARGET)

# Main compilation tool paths.
CC       := gcc
LD       := ld
AS       := as
OBJDUMP  := objdump

# Compiler and linker configuration.
CFLAGS         := -Wall -Wextra -pedantic
CFLAGS         += $(INC_FLAGS)
LDFLAGS        :=
LDLIBS         :=
RELEASE_CFLAGS := -O2 -DNDEBUG
DEBUG_CFLAGS   := -O0 -DDEBUG -g

# Setup debug/release builds.
DEBUG ?= 1
ifeq ($(DEBUG), 1)
    CFLAGS += $(DEBUG_CFLAGS)
else
    CFLAGS += $(RELEASE_CFLAGS)
endif

main: $(ASM_FILES) $(OBJECTS) $(DUMP_FILES)

# Remove build directory.
clean:
    rm -rf $(BUILD_DIR)

# Create the build directory.
$(BUILD_DIR):
    mkdir -p $(BUILD_DIR)

# Inference rules for C files.
$(BUILD_DIR)/%.o: $(SRC_DIR)/%.c | $(BUILD_DIR)
    $(CC) $(CFLAGS) -c $< -o $@

# Inference rules for partially compiled assembly.
$(BUILD_DIR)/%.s: $(SRC_DIR)/%.c | $(BUILD_DIR)
    $(CC) $(CFLAGS) -S $< -o $@

$(BUILD_DIR)/%.dump: $(BUILD_DIR)/%.o | $(BUILD_DIR)
    $(OBJDUMP) -Mintel -d $< > $@

For example the following file named src/math.c:

// src/math.c
int square(int num) {
    return num * num;
}

Will generate the dump file build/math.dump:

build/math.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <square>:
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   89 7d fc                mov    DWORD PTR [rbp-0x4],edi
   7:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
   a:   0f af c0                imul   eax,eax
   d:   5d                      pop    rbp
   e:   c3                      ret

And partially compiled assembly file build/math.s:

    .file   "math.c"
    .text
.Ltext0:
    .file 0 "/home/badd10de/microbolt" "src/math.c"
    .globl  square
    .type   square, @function
square:
.LFB0:
    .file 1 "src/math.c"
    .loc 1 1 21
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    %edi, -4(%rbp)
    .loc 1 2 16
    movl    -4(%rbp), %eax
    imull   %eax, %eax
    .loc 1 3 1
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   square, .-square
.Letext0:
    .section    .debug_info,"",@progbits
.Ldebug_info0:
    .long   0x64
    .value  0x5
    .byte   0x1
    .byte   0x8
    .long   .Ldebug_abbrev0
    .uleb128 0x1
    .long   .LASF2
    .byte   0x1d
    .long   .LASF0
    .long   .LASF1
    .quad   .Ltext0
    .quad   .Letext0-.Ltext0
    .long   .Ldebug_line0
    .uleb128 0x2
    .long   .LASF3
    .byte   0x1
    .byte   0x1
    .byte   0x5
    .long   0x60
    .quad   .LFB0
    .quad   .LFE0-.LFB0
    .uleb128 0x1
    .byte   0x9c
    .long   0x60
    .uleb128 0x3
    .string "num"
    .byte   0x1
    .byte   0x1
    .byte   0x10
    .long   0x60
    .uleb128 0x2
    .byte   0x91
    .sleb128 -20
    .byte   0
    .uleb128 0x4
    .byte   0x4
    .byte   0x5
    .string "int"
    .byte   0
    .section    .debug_abbrev,"",@progbits
.Ldebug_abbrev0:
    .uleb128 0x1
    .uleb128 0x11
    .byte   0x1
    .uleb128 0x25
    .uleb128 0xe
    .uleb128 0x13
    .uleb128 0xb
    .uleb128 0x3
    .uleb128 0x1f
    .uleb128 0x1b
    .uleb128 0x1f
    .uleb128 0x11
    .uleb128 0x1
    .uleb128 0x12
    .uleb128 0x7
    .uleb128 0x10
    .uleb128 0x17
    .byte   0
    .byte   0
    .uleb128 0x2
    .uleb128 0x2e
    .byte   0x1
    .uleb128 0x3f
    .uleb128 0x19
    .uleb128 0x3
    .uleb128 0xe
    .uleb128 0x3a
    .uleb128 0xb
    .uleb128 0x3b
    .uleb128 0xb
    .uleb128 0x39
    .uleb128 0xb
    .uleb128 0x27
    .uleb128 0x19
    .uleb128 0x49
    .uleb128 0x13
    .uleb128 0x11
    .uleb128 0x1
    .uleb128 0x12
    .uleb128 0x7
    .uleb128 0x40
    .uleb128 0x18
    .uleb128 0x7a
    .uleb128 0x19
    .uleb128 0x1
    .uleb128 0x13
    .byte   0
    .byte   0
    .uleb128 0x3
    .uleb128 0x5
    .byte   0
    .uleb128 0x3
    .uleb128 0x8
    .uleb128 0x3a
    .uleb128 0xb
    .uleb128 0x3b
    .uleb128 0xb
    .uleb128 0x39
    .uleb128 0xb
    .uleb128 0x49
    .uleb128 0x13
    .uleb128 0x2
    .uleb128 0x18
    .byte   0
    .byte   0
    .uleb128 0x4
    .uleb128 0x24
    .byte   0
    .uleb128 0xb
    .uleb128 0xb
    .uleb128 0x3e
    .uleb128 0xb
    .uleb128 0x3
    .uleb128 0x8
    .byte   0
    .byte   0
    .byte   0
    .section    .debug_aranges,"",@progbits
    .long   0x2c
    .value  0x2
    .long   .Ldebug_info0
    .byte   0x8
    .byte   0
    .value  0
    .value  0
    .quad   .Ltext0
    .quad   .Letext0-.Ltext0
    .quad   0
    .quad   0
    .section    .debug_line,"",@progbits
.Ldebug_line0:
    .section    .debug_str,"MS",@progbits,1
.LASF2:
    .string "GNU C17 13.1.1 20230429 -mtune=generic -march=x86-64 -g -O0"
.LASF3:
    .string "square"
    .section    .debug_line_str,"MS",@progbits,1
.LASF0:
    .string "src/math.c"
.LASF1:
    .string "/home/badd10de/microbolt"
    .ident  "GCC: (GNU) 13.1.1 20230429"
    .section    .note.GNU-stack,"",@progbits

Feel free to tweak the CFLAGS parameters or enable optimizations by modifying the Makefile’s RELEASE_CFLAGS and DEBUG_CFLAGS variables and switching between DEBUG ?= 1 and DEBUG ?= 0.

To get the size of each section in an .elf file we can do:

size -A -d my-file.elf

And the sizes of the different functions and structures therein:

nm --print-size --size-sort --radix=d my-file.elf

Resources

Parallelism/Concurrency

Memory and OS resources

Libraries/Tools

Debugging/profiling

Networking

UI

Other

Books

Talks