1Learning Outcomes¶
Understand why generic pointers (
void *) cannot be dereferencedUse
memcpyandmemmoveto access memory in generic functions
In this section, we focus on implementing generic swap. Recall our goals for generic functions from the introduction:
Generics should work regardless of argument type.
Generics should work by accessing blocks of memory, regardless of the data type stored in those blocks.
We translate these high-level ideas into pseudocode for generic swap:
void swap(void *ptr1, void *ptr2) {
// 1. store a copy of data1 in temporary storage
// 2. copy data2 to location of data1
// 3. copy data in temporary storage to location of data2
}The above parameters approximately achieve the first of our goals for generic functions. By declaring ptr1 and ptr2 as generic pointers with void *, we effectively assume there is some data at the locations that ptr1 and ptr2 point to that need to be swapped. (We will see later that some adjustments to this function signature are necessary.)
What about the function body? Our swap_int function function from earlier uses the dereference operator (*) to read and write data to and from the memory pointed to by ptr1 and ptr2. We will see in this section that dereferencing generic pointers will not work, and we must use explore other operations for accessing memory.
2You cannot dereference void *¶
It is natural to want to write a generic swap function following the pattern of swap_int, swap_short, and swap_string from earlier. However, the following code will not work:
1 2 3 4 5 6/* produces compiler errors */ void swap_faulty(void *ptr1, void *ptr2) { void temp = *ptr1; *ptr1 = *ptr2; *ptr2 = temp; }
When we attempt to compile with gcc, the compiler is not happy:
$ gcc -c -o swap.o swap.c -Wall -std=c99 -g
swap.c: In function ‘swap_faulty’:
swap.c:10:8: error: variable or field ‘temp’ declared void
10 | void temp = *ptr1;
| ^~~~
swap.c:10:15: warning: dereferencing ‘void *’ pointer
10 | void temp = *ptr1;
| ^~~~~
swap.c:10:15: error: void value not ignored as it ought to be
10 | void temp = *ptr1;
| ^
swap.c:11:3: warning: dereferencing ‘void *’ pointer
11 | *ptr1 = *ptr2;
| ^~~~~
swap.c:11:11: warning: dereferencing ‘void *’ pointer
11 | *ptr1 = *ptr2;
| ^~~~~
swap.c:11:9: error: invalid use of void expression
11 | *ptr1 = *ptr2;
| ^
swap.c:12:3: warning: dereferencing ‘void *’ pointer
12 | *ptr2 = temp;
| ^~~~~
swap.c:12:9: error: invalid use of void expression
12 | *ptr2 = temp;
| ^There are two main reasons this code won’t work. First, we cannot declare untyped variables, so a declaration like void temp; errors. Second, dereferencing void * pointers does not yield anything usable.
Consider what it means to dereference a pointer, say, declared and initialized as int32_t *p = ...; pointer means. This means we know that the p is an address of an int32_t value, and dereferencing with *p accesses those 4 bytes of memory. This allows the compiler to translate later statements like (*p) + 4 into integer arithmetic, _because it knows that *p is an int32_t-typed value.
3Generic memory access with memcpy, memmove¶
Generics cannot use the dereference operator (*) because the type of the data pointed to by generic pointers is unknown. Instead, generics use two functions from the C standard library: memcpy and memmove.
void *memcpy(void *dest, const void *src, size_t n);
void *memmove(void *dest, const void *src, size_t n); These string.h generic functions copy n bytes from memory area src to memory area dest. In other words, they allow read/write access to data in memory using a void * pointer!
4Generic Swap¶
Let’s consider how the generic memcpy will replace dereferencing in one of the statements in our swap_int function function:
// in swap_int:
// ptr1, ptr2 are both declared int *
*ptr1 = *ptr2; This statement is simply a memcpy—it copies the bytes of the int at ptr2 into the corresponding bytes at ptr1!
// equivalent to swap_int's *ptr1 = *ptr2;
memcpy(ptr1, ptr2, sizeof(int));memcpy requires knowing how many bytes to copy from a source to a destination. Recall from an earlier chapter that with pointer parameters, functions will not know how much data is pointed to. We must therefore update our pseudocode to add an additional size parameter:
void swap(void *ptr1, void *ptr2, size_t nbytes) {
// 1. store a copy of data1 in temporary storage
// 2. copy data2 to location of data1
// 3. copy data in temporary storage to location of data2
}4.1Implementation¶
Finally, we are ready to implement our generic swap function, swap!
Generic swap function
1 2 3 4 5 6 7 8 9 10 11void swap(void *ptr1, void *ptr2, size_t nbytes) { // 1. store a copy of data1 in temporary storage char temp[nbytes]; memcpy(temp, ptr1, nbytes); // 2. copy data2 to location of data1 memcpy(ptr1, ptr2, nbytes); // 3. copy data in temporary storage to location of data2 memcpy(ptr2, temp, nbytes); }
The swap function above uses memcpy and therefore assumes that ptr1 points to nbytes of memory and ptr2 points to nbytes of memory, and that there is no overlap in these two memory areas.
Now, let’s use this new generic swap to swap two ints:
1 2 3int data1 = 22; int data2 = 61; swap(&data1, &data2, sizeof(data1));
The function call looks near-identical to our non-generic call to swap_int! Just like before, we pass in the locations of the two values we want to swap. The only modification is that we now additionally pass in the size of the value(s), which we assume to be identical.
The below slidedeck traces through swap in action using a toy example. Initially, data1 stores 22 at address 0x100, and data2 stores 61 at address 0x104. Assume sizeof(int) is 4.
5Application: swap_ends¶
Finally, let’s consider generics that operate on a generic array of values. We will need to do pointer arithmetic!
We would like to use the swap function to write a function swap_ends, that swaps the first and last elements in an array.
The code below will update the array arr as shown in Figure 1:
int main() {
...
int32_t arr[] = {1, 2, 3, 4, 5};
int32_t n = sizeof(arr)/sizeof(arr[0]);
swap_ends(arr, n, sizeof(arr[0]); // to implement
...
}
Figure 1:swap_ends swaps the elements 1 and 5 in the arr array.
The function swap_ends has the following inputs and output:
void swap_ends(void *arr, size_t nelems, size_t nbytes);arr: A generic pointer to a memory block (recall that all “array” arguments decay to pointers)size_t nelems: Number of elements in the arraysize_t nbytes: Size of each element, in bytes
The last two parameters specify (1) how large the block of memory (“array”) at arr is and (2) how to access sequential elements within the block. Combined, nelems and nbytes should help us access the last element in the array.
Answer: Let’s consider what swap does. It takes two pointers and swaps nbytes at those positions. To swap the ends of the array in Figure 1, we would like to pass in 0x100 and 0x104 to swap.
Option C does this explicitly:
(char *) arr: Cast genericarrto achar *to perform bytewise arithmetic.Add
(nelems - 1) * nbytesto the(char *) arrpointer.Here, because the pointer is to a
char, pointer arithmetic is effectively bytewise.To point to the last element, we jump to the end of the array (in bytes), then go back by one element’s length, in bytes.
The final implementation of swap_ends:
void swap_ends(void *arr, size_t nelems, size_t nbytes) {
swap(arr, (char *) arr + (nelems - 1) * nbytes, nbytes);
}6More Generics in string.h¶
The string.h header in the C standard library contains string functions and functions that handle memory, like memcpy, memmove, and memset.
At first glance, you may then find the header name a bit misleading. However, as we have seen, memory handling functions operate byte-by-byte. Furthermore, by definition strings are byte arrays that happen to be null-terminated! Because byte arrays can be accessed with char * pointers, many implementations of string functions simply assume char * operands.
For example, the glibc implementation of strncpy uses memset and memcpy. From the Linux man pages:
... at most n bytes of src are copied. Warning: If there is no null byte among
the first n bytes of src, the string placed in dest will not be
null-terminated. If the length of src is less than n, strncpy() writes
additional null bytes to dest to ensure that a total of n bytes are written.
...[returns] a pointer to the destination string dest.Here is a simplified version of the full implementation in glibc:
1 2 3 4 5 6char *strncpy(char *dest, const char *src, size_t n) { size_t size = strnlen(src, n); // max(strlen(src), n) if (size != n) memset(dest + size, '\0', n – size); return memcpy(dest, src, size); }
Explanation
Each line, explained:
Function declaration.
sizeis set tostrlen(src),n. From the Linuxmanpages: “Thestrnlen()function returnsstrlen(s), if that is less thanmaxlen, ormaxlenif there is no null terminating ('\0') among the firstmaxlencharacters pointed to bys.”Recall that conditional statements without curly braces treat the next statement as the singular statement of the conditional body (here, Line 4).
Write in null terminators beyond the length of
src. Ifnis at leaststrlen(src) + 1bytes, this line null-terminates the result.Copy
sizebytes fromsrctodest.
Some implementations of memmove actually employ temporary storage (like in C99), which risks running out of memory. Read more on StackOverflow.