Giving C++ std::regex a C makeover
A C interface for C++ `std::regex` is proposed to simplify usage from C, with an arena handling memory management. The wrapper can improve performance, particularly with MSVC, but it lacks Unicode support and inherits the underlying slowness of `std::regex`.
This article discusses the process of creating a C interface for the C++ standard library's regex functionality, specifically `std::regex`. The author highlights the challenges of using C++ libraries in a C environment and proposes a solution that wraps `std::regex` in a C-friendly interface. The new interface, defined in `regex.h`, utilizes structures for string and memory management, allowing for regex operations without exposing C++ complexities. The implementation avoids memory allocation issues by using an arena for memory management, which simplifies the cleanup process. The article provides example usage of the new interface, demonstrating how to create regex objects and match strings. The author notes that while this approach can improve performance, especially in MSVC, it has limitations, such as lack of Unicode support and the inherent slowness of `std::regex`. The article concludes by acknowledging the trade-offs involved in using this method, including the size of the resulting DLL and the potential need for alternative regex libraries.
- A C interface for C++ `std::regex` is proposed to simplify usage in C environments.
- The implementation uses an arena for memory management, avoiding individual memory deallocations.
- Example code demonstrates how to create regex objects and perform string matching.
- The approach can lead to performance improvements, particularly in MSVC.
- Limitations include lack of Unicode support and the overall slowness of `std::regex`.
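To make the arena idea concrete, here is a minimal sketch of a bump allocator in C. The names `Arena`, `arena_alloc`, `arena_strdup`, and `arena_demo` are hypothetical and chosen for illustration; they are not taken from the article's `regex.h`, whose actual types may differ. The point is only that allocations have no matching free call and are all released together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical arena: a bump allocator over a caller-supplied buffer. */
typedef struct {
    char *beg;  /* next free byte */
    char *end;  /* one past the last usable byte */
} Arena;

/* Returns zeroed storage, or NULL if the arena is exhausted.
 * align must be a power of two. */
static void *arena_alloc(Arena *a, size_t size, size_t align)
{
    size_t pad   = (size_t)(-(uintptr_t)a->beg & (align - 1));
    size_t avail = (size_t)(a->end - a->beg);
    if (pad > avail || size > avail - pad) {
        return NULL;
    }
    char *p = a->beg + pad;
    a->beg = p + size;
    return memset(p, 0, size);
}

/* Copies s into the arena; there is no matching "free" call. */
static char *arena_strdup(Arena *a, const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = arena_alloc(a, len, 1);
    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Example: allocate freely, then release everything by resetting. */
static void arena_demo(void)
{
    char buf[256];
    Arena a = { buf, buf + sizeof buf };

    char *word = arena_strdup(&a, "regex");
    int  *nums = arena_alloc(&a, 8 * sizeof(int), sizeof(int));
    assert(word && nums);

    a.beg = buf;  /* one assignment frees both allocations */
}
```

Calling `arena_demo()` from a `main` function exercises the sketch. A wrapper built this way would presumably place compiled patterns and match results in the arena, so the caller cleans up by resetting or discarding the arena instead of tracking individual objects.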
Related
Malloc() and free() are a bad API (2022)
The post delves into malloc() and free() limitations in C, proposing a new interface with allocate(), deallocate(), and try_expand(). It discusses C++ improvements and emphasizes the significance of a robust API.
`noexcept` affects libstdc++'s `unordered_set`
The article examines the impact of the `noexcept` specifier on `std::unordered_set` performance in libstdc++, highlighting optimization opportunities and advocating for improvements to handle hash function efficiency better.
Safer C++
Alex Gaynor advocates transitioning from C/C++ to memory-safe languages in security-critical contexts, proposing improvements in C++ safety while acknowledging challenges and recommending a dual strategy for enhancement and migration.
Small Strings in Rust: smolstr vs. smartstring
The article explores Rust's small string libraries `smolstr` and `smartstring`, demonstrating JSON parsing, a custom memory allocator, and a reporting subcommand for analyzing memory usage and allocations.
Regex Crossword
A regex crossword puzzle challenges participants to fill a grid using simplified regex syntax. The author encourages learning regex basics and promotes their book on enhancing Python skills.
- Many commenters express concerns about the performance of `std::regex`, with some suggesting that it is too slow for practical use.
- There is a discussion about memory management, with some praising the arena allocation approach while others find it confusing for C programmers.
- Several users mention the desire for a simpler, full C implementation without C++ complexities.
- Some commenters appreciate the author's innovative approach, while others feel he should have addressed the limitations of existing C regex libraries more openly.
- Overall, there is a shared interest in improving regex functionality in C, but skepticism about the proposed solution's effectiveness.
This is a pretty cool hack. Makes me want to write a regex library again.
This comprehensive article goes over the problems of memory allocation, how programmers and educators have been trained to think about the problem wrongly, and how the concept of arenas solves it.
As someone who spends most of his time in garbage collected languages, this was wildly fascinating to me.
As it stands, std::regex should come with a warning label. It’s fine for occasional use. As part of a parser, it’s not. Slow is better than broken, until slow is broken.
Problematic macro in the header, custom string type compatible with nothing else in C, and I have no idea where the arena type comes from.
Having it magically deallocate memory is nice, but will confuse C programmers reading the caller.
Honestly, adding -lre to the linker is just much easier, and that library comes with docs too.
I guess the entire post could be seen as an exercise in wrapping C++ to C with nice memory-handling properties and so on, but it would also be fine to be open and upfront about that, in my opinion.
my_audio_sdk_init(&arena, sizeof(arena)); // char arena[65536]; // or something like this
I do something quite different. I design the API so any data returned by the library function is allocated by the caller. This means the caller has full control over what style of memory management works best.
For example, you can then choose to use stack allocation, RAII, malloc/free, the GC, static allocation, etc.
For a primitive example, snprintf.
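The caller-allocates style from the comment above can be sketched with `snprintf` itself. In this illustration, `format_point` and `caller_allocates_demo` are hypothetical names standing in for a library function and its user; the only real API used is the standard `snprintf`, which reports the length the full output needs even when the destination is too small (or is `NULL` with a size of 0).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library function in the style of snprintf: writes into
 * caller-provided storage and returns the length the full result needs
 * (excluding the terminating NUL), even when the output was truncated. */
static int format_point(char *dst, size_t cap, int x, int y)
{
    return snprintf(dst, cap, "(%d, %d)", x, y);
}

static void caller_allocates_demo(void)
{
    /* Option 1: a fixed stack buffer. */
    char stack_buf[32];
    format_point(stack_buf, sizeof stack_buf, 3, -4);

    /* Option 2: measure first, then heap-allocate exactly enough. */
    int need = format_point(NULL, 0, 3, -4);
    char *heap_buf = malloc((size_t)need + 1);
    if (heap_buf) {
        format_point(heap_buf, (size_t)need + 1, 3, -4);
        assert(strcmp(heap_buf, stack_buf) == 0);
        free(heap_buf);
    }
}
```

The library never allocates, so the same function serves stack, heap, static, or arena-backed storage. The cost is that the caller may need two calls, one to measure and one to fill.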