July 15th, 2024

I'm Not a Fan of Strlcpy(3)

The article questions strlcpy's efficiency compared with strcpy and strncpy, and suggests memccpy instead of strlcpy or strncpy where performance matters. For other string operations it prefers dynamic allocation or the mem* functions.


strlcpy(3), which originated in OpenBSD, is often presented as a safer alternative to strcpy(3) and strncpy(3). Ulrich Drepper, however, rejected its inclusion in glibc on the grounds that it is inefficient. The core issue is how to copy null-terminated strings efficiently: strlcpy returns strlen(src), so it must scan the entire source even after the destination is full, and strncpy zero-pads whatever is left of the destination.

When truncation is irrelevant, both strlcpy and strncpy do needless work, and the article suggests memccpy(3) instead. When truncation matters, it recommends dynamic allocation, or measuring the string with strlen(3) and copying it with memcpy(3), over strlcpy.

The article also argues that, contrary to a common misconception, the mem* functions are perfectly suitable for string operations. It concludes that strlcpy is rarely the best choice and that memccpy, memcpy, or strdup usually serve better.
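A minimal sketch of the two patterns summarized above, assuming a fixed-size destination with capacity cap > 0. The function names copy_truncating and copy_checked are hypothetical, not taken from the article; memccpy is POSIX (XSI) and was added to ISO C in C23.

```c
#include <string.h>

/* Truncation irrelevant: copy src into dst, silently truncating.
 * memccpy copies up to and including the first '\0' and returns a pointer
 * just past it, or NULL if no '\0' appeared within cap bytes. NULL means
 * the copy was truncated and dst is not yet terminated. */
static void copy_truncating(char *dst, const char *src, size_t cap)
{
    if (memccpy(dst, src, '\0', cap) == NULL)
        dst[cap - 1] = '\0';
}

/* Truncation matters: measure once, then copy the exact length
 * (terminator included) with memcpy.
 * Returns 0 on success, -1 if src does not fit. */
static int copy_checked(char *dst, const char *src, size_t cap)
{
    size_t len = strlen(src);
    if (len >= cap)
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}
```

copy_truncating never reads more than cap bytes of src, which is the efficiency argument against strlcpy. copy_checked needs the full length anyway, so strlen followed by memcpy does no wasted work.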

Related

How much memory does a call to 'malloc' allocate?


The malloc function in C allocates memory on the heap. Allocating even 1 byte incurs an 8-byte overhead, and alignment may raise the actual footprint to 16-24 bytes. For efficiency, avoid many small allocations and consider realloc when a buffer needs to grow.
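One rough way to observe this on a given system is malloc_usable_size, a non-standard extension provided by glibc and musl and declared in <malloc.h>. It reports the usable bytes in the block, not the allocator's hidden header, so it only partly reveals the overhead. The helper names usable_for and report are hypothetical, and the numbers printed depend entirely on the allocator.

```c
#include <malloc.h> /* malloc_usable_size: glibc/musl extension, not ISO C */
#include <stdio.h>
#include <stdlib.h>

/* Return how many bytes are actually usable in the block malloc hands
 * back for a request of `request` bytes (0 if allocation failed). */
static size_t usable_for(size_t request)
{
    void *p = malloc(request);
    if (p == NULL)
        return 0;
    size_t usable = malloc_usable_size(p);
    free(p);
    return usable;
}

/* Print the usable size for a few small request sizes. */
static void report(void)
{
    for (size_t n = 1; n <= 64; n *= 2)
        printf("malloc(%zu) -> %zu usable bytes\n", n, usable_for(n));
}
```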

How much memory does a call to 'malloc' allocates? – Daniel Lemire's blog


The malloc function in C allocates memory on the heap. Allocating 1 byte may result in 16-24 bytes due to overhead. Avoid small allocations and focus on broader concepts for efficient memory management.

Some Tricks from the Scrapscript Compiler


The Scrapscript compiler implements optimization tricks such as immediate objects, small strings, and variants for better performance. It also introduces immediate variants and a const heap to improve efficiency without adding complexity, and invites suggestions for future improvements.

Designing a Better Strcpy


Saagar Jha explores the challenges of improving strcpy in C and proposes strxcpy, which copies null-terminated strings efficiently and indicates when the destination overflowed. Comparing the existing strcpy variants, he finds strscpy the most capable but notes that it is not standardized. He also notes the original bug and the complexities of C string handling, emphasizing efficiency, safety, and standardization as goals for any strcpy successor.

Malloc() and free() are a bad API (2022)


The post examines the limitations of malloc() and free() in C and proposes a new interface built from allocate(), deallocate(), and try_expand(). It also discusses the improvements C++ makes and emphasizes the importance of a robust allocation API.
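A hypothetical C sketch of an interface in the spirit of the one summarized above, layered on C11 aligned_alloc. The function names follow the summary, but the memory_block struct and the exact signatures are assumptions, not necessarily the post's. Because malloc offers no portable way to grow a block in place, try_expand here only succeeds when the existing block is already large enough.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Every allocation reports the size it actually got. */
typedef struct {
    void *ptr;
    size_t size; /* usable size, may exceed the request */
} memory_block;

/* Round the request up to a multiple of `alignment` (a power of two),
 * as C11 aligned_alloc requires, and report the rounded size. */
static memory_block allocate(size_t size, size_t alignment)
{
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    memory_block b = { aligned_alloc(alignment, rounded), 0 };
    if (b.ptr != NULL)
        b.size = rounded;
    return b;
}

/* The caller passes the block (and alignment) back; free() ignores the
 * extra information, but a custom allocator could use it. */
static void deallocate(memory_block block, size_t alignment)
{
    (void)alignment;
    free(block.ptr);
}

/* Grow in place if possible. Without allocator support, this sketch can
 * only report success when the block is already big enough. */
static bool try_expand(memory_block block, size_t new_size)
{
    return new_size <= block.size;
}
```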

31 comments
By @nine_k - 4 months
I'd say that, in general, any variable-size format that does not state its length before the variable part invites buffer-overflow bugs. Such formats should therefore be avoided in any kind of binary interop.

This holds even if the total length of the data in an interaction is not known ahead of time. E.g. an audio stream can be of indeterminate length, not known when the first byte is sent over the network, but each UDP packet has a well-determined length given in the header.

The length field itself can be made variable-size in a rather foolproof way [1], so that both tiny and huge sizes are represented economically.

(Zip files, WAD files, etc have that info at the very end, but this is because a file has a well-defined end before you start appending to it; fseek(fp, 0, SEEK_END) can't miss.)

[1]: http://personal.kent.edu/~sbirch/Music_Production/MP-II/MIDI...
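Assuming [1] refers to the MIDI variable-length quantity (the truncated URL points at MIDI material), here is a sketch of that encoding: 7 payload bits per byte, most significant group first, with the high bit set on every byte except the last. The MIDI file format caps these at four bytes (values up to 0x0FFFFFFF); this hypothetical vlq_encode/vlq_decode pair allows five so that any uint32_t fits.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode value into out; returns the number of bytes written (1 to 5). */
static size_t vlq_encode(uint32_t value, uint8_t out[5])
{
    uint8_t tmp[5];
    size_t n = 0;
    do {
        tmp[n++] = value & 0x7F; /* collect 7-bit groups, least significant first */
        value >>= 7;
    } while (value != 0);
    /* Emit most significant group first; set the continuation bit on all
     * but the final byte. */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i] | (i + 1 < n ? 0x80 : 0x00);
    return n;
}

/* Decode one quantity from buf (len bytes). Returns the number of bytes
 * consumed, or 0 if the input ends mid-quantity or exceeds 5 bytes. */
static size_t vlq_decode(const uint8_t *buf, size_t len, uint32_t *value)
{
    uint32_t v = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        v = (v << 7) | (buf[i] & 0x7F);
        if ((buf[i] & 0x80) == 0) { /* high bit clear: last byte */
            *value = v;
            return i + 1;
        }
    }
    return 0;
}
```

Because a decoder always knows where the length ends, it can check the announced size against its buffer before reading the payload, which is the property the comment is after.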

By @EPWN3D - 4 months
The point of strlcpy(3) is not to be the world's best and most efficient string copier. It's to provide a drop-in replacement to previous, memory-unsafe string copy routines in constrained environments where you have to have bounds on stuff and might not have an allocator.

If there are bugs with truncation in the resulting buffer, those are the program's bugs, and they existed before strlcpy(3) came into the picture.
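To make the drop-in idea concrete, here is a portable stand-in with the strlcpy(3) contract as OpenBSD documents it: copy at most size - 1 bytes, always terminate when size > 0, and return strlen(src). The name my_strlcpy is hypothetical, chosen to avoid clashing with a libc that already ships strlcpy. The return value is also why the function always scans all of src.

```c
#include <string.h>

/* strlcpy-style copy: bounded, always NUL-terminated (if size > 0),
 * returns the length of src so callers can detect truncation. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src); /* scans all of src, even if it won't fit */
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

A caller detects truncation with a check such as my_strlcpy(buf, input, sizeof buf) >= sizeof buf.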

By @kazinator - 4 months
I'm not a fan of Unix man page section numbers in parentheses.

strlcpy is a stopgap, whack-a-mole solution for buffer overflows. It is justified by the argument that, while it does not make the program any less wrong, it (probably) makes it more secure.

When truncation matters and you have a fixed-size buffer, that buffer should be large enough that anyone who overflows it can justifiably be said to be misusing the application, perhaps a tester trying to break it.

Nobody’s surname needs 128+ bytes. No reasonable URL for a firmware update download needs 4096 bytes.

If truncation matters, no, it does not always make sense to accept a gig of data and be ready for more. You can impose a limit. A violation of the limit is an error, treated like a case of bad input.
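A sketch of the "impose a limit" approach using the comment's surname example. SURNAME_MAX, struct person, and set_surname are hypothetical; strnlen is POSIX.1-2008. An over-long input is rejected as bad input rather than truncated.

```c
#include <string.h>

#define SURNAME_MAX 128 /* hypothetical limit, after the comment's example */

struct person {
    char surname[SURNAME_MAX];
};

/* Returns 0 on success, -1 if input is too long (treated as bad input).
 * strnlen stops at the limit, so even a huge input costs at most
 * SURNAME_MAX bytes of scanning. */
static int set_surname(struct person *p, const char *input)
{
    size_t len = strnlen(input, sizeof p->surname);
    if (len == sizeof p->surname)
        return -1; /* no terminator within the limit: reject */
    memcpy(p->surname, input, len + 1);
    return 0;
}
```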

By @ktpsns - 4 months
Given how nice the systems programming languages we have these days are, I refrain from letting classic null-terminated C strings enter my programs. Even in embedded programming we opt for std::string (over Arduino's String). I am happy to save our time in exchange for code that is some percentage less optimal.
By @mikewarot - 4 months
If someone could port the Free Pascal string library to C, it would solve a lot of problems with new C code. It reference counts and does all the management of strings. You never have to allocate or free them, and they can store gigabytes of text. You can delete from the middle of a string too!

They're counted, zero terminated, ASCII or Unicode, and magic as far as I'm concerned.

Oh... And a string copy is an O(1) operation, since the actual copy is deferred until one of the strings is modified (copy-on-write).

Edit: corrected to O(1), thanks mort96
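Free Pascal's actual implementation is not shown here. This is a hypothetical C sketch of the two properties the comment describes, reference counting and copy-on-write, which together make copying O(1). All names (rcstr, rcstr_copy, and so on) are illustrative, and the sketch is not thread-safe because the count is a plain size_t.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t refs;
    size_t len;
    char data[]; /* kept NUL-terminated so it can be passed to C APIs */
} rcstr_buf;

typedef struct { rcstr_buf *buf; } rcstr;

static rcstr rcstr_from(const char *s)
{
    size_t len = strlen(s);
    rcstr r = { malloc(sizeof(rcstr_buf) + len + 1) };
    if (r.buf) {
        r.buf->refs = 1;
        r.buf->len = len;
        memcpy(r.buf->data, s, len + 1);
    }
    return r;
}

/* O(1) copy: share the buffer and bump the count. */
static rcstr rcstr_copy(rcstr s)
{
    s.buf->refs++;
    return s;
}

static void rcstr_release(rcstr *s)
{
    if (s->buf && --s->buf->refs == 0)
        free(s->buf);
    s->buf = NULL;
}

/* Before mutating, detach from a shared buffer (copy-on-write).
 * Returns 0 on success, -1 if the private copy could not be allocated. */
static int rcstr_make_unique(rcstr *s)
{
    if (s->buf->refs == 1)
        return 0;
    size_t total = sizeof(rcstr_buf) + s->buf->len + 1;
    rcstr_buf *copy = malloc(total);
    if (copy == NULL)
        return -1;
    memcpy(copy, s->buf, total);
    copy->refs = 1;
    s->buf->refs--;
    s->buf = copy;
    return 0;
}

static int rcstr_set_char(rcstr *s, size_t i, char c)
{
    if (i >= s->buf->len || rcstr_make_unique(s) != 0)
        return -1;
    s->buf->data[i] = c;
    return 0;
}
```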

By @Retr0id - 4 months
TIL of memccpy() https://www.man7.org/linux/man-pages/man3/memccpy.3.html

To be honest, every time I need to deal with strings in C I feel like I'm banging rocks together, regardless of approach. I try to avoid it at all costs.

By @VyseofArcadia - 4 months
It is a long, time-honored tradition to attempt to improve on flawed standard library functions with equally flawed functions.
By @Arch-TK - 4 months
Also relevant: https://nullprogram.com/blog/2021/07/30/ (it references this blog post and offers good solutions)
By @giomasce - 4 months
It's not very clear to me why, in the paragraph "Truncation matters", it is claimed that the strlen variant is necessarily better than the strlcpy variant. The strlcpy variant scans the source and destination strings only once in the fast case (no reallocation needed), while the strlen variant needs to scan the source string at least twice. I guess in the common case you have to enlarge the destination a few times, and once it's big enough you don't have to enlarge it anymore and always hit the fast case, so it makes sense to optimize for that.

It might also be that in some programs with different access patterns that doesn't happen and it makes sense to optimize for the slow case, sure, but the author should acknowledge that variability instead of being adamant about what's better, even to the point of calling the solution they don't understand "schizo". In my experience, optimizing the fast path makes a lot of sense.

BTW, the strlcpy/"schizo" variant could stand some improvement: realloc() already preserves the part of the string that fit in the original buffer, so you can resume copying at that point. Also, once you know that the destination is big enough to receive a full copy of the source, you can use good old strcpy(). Cargo cult and random "linters"/"static checkers" will tell you that you shouldn't, but it's a perfectly fine function to call once you've ensured that its prerequisites are satisfied.
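A sketch of the two improvements suggested above, assuming a heap buffer with known capacity greater than zero. my_strlcpy is a hypothetical stand-in for strlcpy(3), included so the example does not depend on the libc providing it; copy_grow is likewise an illustrative name.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal strlcpy-style helper: bounded copy, returns strlen(src). */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Copy src into the growable buffer *buf of capacity *cap (> 0).
 * Fast path: a single strlcpy pass. Slow path: realloc keeps the
 * *cap - 1 bytes already copied, so only the remainder is copied, and
 * since the buffer is now known to be large enough, strcpy is safe.
 * Returns 0 on success, -1 on allocation failure (*buf is unchanged). */
static int copy_grow(char **buf, size_t *cap, const char *src)
{
    size_t len = my_strlcpy(*buf, src, *cap);
    if (len < *cap)
        return 0; /* fast path: it fit */
    char *p = realloc(*buf, len + 1);
    if (p == NULL)
        return -1;
    size_t done = *cap - 1; /* bytes already in place */
    strcpy(p + done, src + done);
    *buf = p;
    *cap = len + 1;
    return 0;
}
```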

By @pornel - 4 months
C can add a whole alphabet of str?cpy functions, and they will all have issues, because the language lacks the expressive power to build a safe abstraction. It's all ad-hoc juggling of buffers without a reliably tracked size.
By @pton_xd - 4 months
The main issue, which the article covers, is that there are really two different operations you want when copying C strings.

Do you want to copy and truncate, or just copy?

Within that, do you want to manage your own allocation, or do you want that abstracted?

There are too many decision points and tradeoffs to neatly hide behind a single "one true function" for copying C strings.

By @forrestthewoods - 4 months
C's string handling is an abomination. Null-terminated strings are and always have been a colossal mistake. The C standard and operating systems need to be updated so that null-terminated strings are deprecated and all APIs take a string_view/slice/whatever struct.
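A minimal illustration of the kind of pointer-plus-length struct the comment has in mind. The type str and its helpers are hypothetical, not a proposed standard API. The view does not own its bytes and need not be NUL-terminated, so slicing never copies; printf's %.*s precision is standard C and prints an unterminated slice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *ptr;
    size_t len;
} str;

static str str_from_cstr(const char *s) { return (str){ s, strlen(s) }; }

/* Sub-slice [start, start + n), clamped to the view's bounds. */
static str str_slice(str s, size_t start, size_t n)
{
    if (start > s.len)
        start = s.len;
    if (n > s.len - start)
        n = s.len - start;
    return (str){ s.ptr + start, n };
}

static bool str_eq(str a, str b)
{
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}

/* Print without needing a terminator; the precision argument is an int. */
static void str_print(str s) { printf("%.*s\n", (int)s.len, s.ptr); }
```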
By @saagarjha - 4 months
I’m always glad to see more people coming around to the fact that memccpy is the function they actually want, not these nonstandard garbage functions that are needlessly inefficient but that everyone flocks to anyway for “security”.
By @commandersaki - 4 months
strlcpy() is now part of POSIX: https://sortix.org/blog/posix-2024/.
By @ashvardanian - 4 months
Aside from the NULL-termination requirement, there is arguably another big design issue with libc strings: I believe interfaces that may allocate memory must give you an opportunity to override the allocator. That, along with SIMD implementation quality and throughput on Arm, was one of the key reasons to start a new library: https://github.com/ashvardanian/StringZilla/blob/91d0a1a02fa...

Also not a huge fan of locale controls and wchar APIs :)
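A sketch of what an allocator-override hook can look like in C: the caller passes a function pointer plus a context pointer, and the string routine allocates through it. The allocator and arena types and str_dup_with are hypothetical and do not reflect StringZilla's actual API. The bump arena ignores alignment, which is fine for char data only.

```c
#include <stdlib.h>
#include <string.h>

/* Caller-supplied allocation hook plus its user context. */
typedef struct {
    void *(*alloc)(void *ctx, size_t size);
    void *ctx;
} allocator;

/* strdup that allocates through the supplied allocator. */
static char *str_dup_with(const allocator *a, const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = a->alloc(a->ctx, n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Example allocator: a fixed-size bump arena that never frees. */
typedef struct {
    char *base;
    size_t cap, used;
} arena;

static void *arena_alloc(void *ctx, size_t size)
{
    arena *ar = ctx;
    if (size > ar->cap - ar->used)
        return NULL; /* out of space */
    void *p = ar->base + ar->used;
    ar->used += size;
    return p;
}
```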

By @ezekiel68 - 4 months
> In other words, there's nothing "improper", "bad practice", "code-smell" or whatever with using the mem* family of functions for operating on strings, because strings are just an array of (null-terminated) characters afterall.

So refreshing to see a common-sense take in a world of shrill low-level programming alarmists.

By @uecker - 4 months
Here is my attempt at strings in C (and other stuff).

https://github.com/uecker/noplate

(attention: this is experimental and incomplete for trying ideas and is subject to change.)

By @asveikau - 4 months
I feel like this person went through an unnecessary and false tangent about how people are "afraid" of memcpy due to inexperience and missed the much more important criticism that arbitrary, naive truncation on a byte level doesn't play well with Unicode.
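A sketch of truncation that at least keeps UTF-8 code points whole, assuming src is valid UTF-8. Continuation bytes have the bit pattern 10xxxxxx, so if the first byte dropped by the cut is a continuation byte, the cut must move back to the start of that sequence. The name copy_truncate_utf8 is hypothetical, and grapheme clusters (for example a letter followed by a combining accent) can still be split.

```c
#include <string.h>

/* Copy src into dst (capacity cap > 0), truncating if necessary but
 * never splitting a UTF-8 multi-byte sequence. */
static void copy_truncate_utf8(char *dst, const char *src, size_t cap)
{
    size_t len = strnlen(src, cap);
    if (len < cap) { /* fits, terminator included */
        memcpy(dst, src, len + 1);
        return;
    }
    size_t n = cap - 1; /* most bytes we can keep; src[n] is the first dropped */
    /* While the first dropped byte is a continuation byte, the cut lands
     * inside a sequence: back up to that sequence's lead byte. */
    while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
        n--;
    memcpy(dst, src, n);
    dst[n] = '\0';
}
```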
By @parasti - 4 months
Over time, for my needs, I've gravitated back to fixed-size buffers. There are many apps where it really doesn't matter that they handle any string ever without truncation. A string is too long and won't fit? Whoops, just use a shorter one.
By @BobbyTables2 - 4 months
In cross platform environments, it gets horrible when one does something like:

#define strlcpy strncpy
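The macro is dangerous because the two functions have different contracts: strncpy returns dst rather than strlen(src), does not NUL-terminate when the source is too long, and zero-pads the rest of the buffer when the source is short. The hypothetical helpers below make the termination and padding behaviours observable; a truncation check written for strlcpy, such as strlcpy(buf, s, sizeof buf) >= sizeof buf, no longer means anything once the macro swaps in strncpy.

```c
#include <string.h>

/* True if buf[0..cap) contains a terminating '\0'. */
static int has_terminator(const char *buf, size_t cap)
{
    return memchr(buf, '\0', cap) != NULL;
}

/* Source too long: strncpy writes exactly 4 bytes ("hell") and no '\0',
 * whereas strlcpy would store "hel" followed by '\0'. */
static void strncpy_too_long(char buf[4])
{
    strncpy(buf, "hello", 4);
}

/* Source short: after copying "hi" and its '\0', strncpy keeps writing
 * zeros until all 16 bytes are filled; strlcpy does not pad. */
static void strncpy_pads(char buf[16])
{
    strncpy(buf, "hi", 16);
}
```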

By @ComputerGuru - 4 months
I’m late to the party, but this mostly rehashes already voiced concerns with all the existing “updated” strcpy functions. What I was surprised to learn, though, is that strdup wasn’t part of the C standard until C23!
By @stephencanon - 4 months
strlcpy is the worst C string routine except for all the others.
By @KerrAvon - 4 months
TFA misses the entire point of strlcpy, which is to improve security by making your code less prone to common C programmer errors that are known causes of common exploits. The author’s suggested remedies reintroduce the potential for those vulnerabilities.
By @coding123 - 4 months
Real men use memcpy
By @snitty - 4 months
"I like all my string copy's equally."

CUT TO:

"I'm not a fan of strlcpy(3)"

By @Pesthuf - 4 months
And that's why I'm not a fan of C for writing any kind of program: Every time you need to touch a string, which is an effortless task in any semi-modern language, you aim a trillion footguns at yourself and all your users.

It's also pretty telling that every article that tries to explain how to safely copy or concat strings in C, like this one, only ever works with ASCII, no attempt whatsoever to handle UTF-8 and keep code points together, let alone grapheme clusters. No wonder almost all C software has problems with non-English strings...

By @hgs3 - 4 months
There is also strscpy [1], which behaves like the author's use of memccpy, except that it doesn’t require manually passing the null terminator as an argument.

[1] https://manpages.debian.org/testing/linux-manual-4.8/strscpy...

By @voidUpdate - 4 months
While the article itself is interesting, could they not have picked a less... offensive word in "I'd like to point out just how schizo this entire logic is"? Like strange, or weird, or unusual