Parsing arguments in Rust with no dependencies
The author developed a command-line argument parser in Rust without external dependencies, focusing on simplicity and minimalism, supporting various argument types, and inviting contributions under MIT and Apache licenses.
In a recent blog post, the author discusses the process of creating a command-line argument parser in Rust without using any external dependencies. The motivation for this project arose from a previous experience of implementing a similar feature in C++. The author aimed to keep the project lightweight and manageable, avoiding the addition of numerous dependencies that could complicate the project and increase compile times. The parser is designed to handle both positional and named arguments, with a straightforward structure that includes options for short and long flags, help text, and required or optional arguments. The implementation involves defining data structures for options and values, as well as a parsing method that processes command-line arguments. The author emphasizes the benefits of building from scratch, such as gaining a deeper understanding of the problem space and reducing the potential for bugs associated with external libraries. While the parser is functional, the author notes that there are still features to be added, such as help text and validation. The project is open for contributions and is licensed under MIT and Apache licenses, reflecting the author's belief in the value of creating simple, dependency-free tools.
- The author created a command-line argument parser in Rust without external dependencies.
- The project aims to keep dependencies minimal to reduce complexity and improve compile times.
- The parser supports both positional and named arguments with various options.
- The author encourages building tools from scratch for better understanding and reduced bug risks.
- The project is open for contributions and is licensed under MIT and Apache licenses.
Related
Rust's Ugly Syntax (2023)
The blog post addresses complaints about Rust's syntax, attributing them to misunderstandings of its semantics. It suggests simplifying semantics for readability while maintaining performance and safety features.
Pantheon: Parsing command line arguments
The "sheshat" library in Rust enables command line argument parsing without the standard library, supporting various argument types and simplifying structure definitions through macros while ensuring effective error handling.
Rust needs a web framework for lazy developers
The author advocates for a simplified Rust web framework, "newt," to ease development for non-commercial projects, addressing current complexities and promoting a more integrated ecosystem with essential features.
Command Line Tools I Like (2022)
The article highlights command line tools favored by an iOS developer, including neovim, fzf, bat, exa, and others, appreciated for their speed, usability, and modern features over traditional commands.
Lessons learned from a successful Rust rewrite
The transition from C++ to Rust improved code performance and safety but revealed challenges like undefined behavior, memory management issues, and tooling limitations, highlighting the need for a stable ABI.
I did a very similar thing in Rust, but to solve a different problem. I wanted to be able to easily make nice rich command line arguments for bash scripts without cluttering up the script too much. My minimal demo is:
$ cat demo.sh
eval "$(argparse-sh --string text -- "$@")";
echo "$TEXT"
$ ./demo.sh "Hello world"
Hello world
I always think it's valuable to build these things for yourself; maybe not always valuable to ship them. It de-magic-ifies the libraries you use and can only help you grow as a programmer.
I'm still learning as a Rust developer, and I'm sure there are terrible things in my code. The hard part for me is finding ways to make things more idiomatic in isolation. I don't have a code review feedback loop I can use to speed up improvement.
Regarding the CLI parser
> This will take in our arguments (from something like std::env::args) and return our matches, or an error.
`std::env::args` will panic on non-UTF8 content, like a file path. You could instead error on non-UTF8 content. Until recently, you had to pull in a dependency or reinvent some non-trivial stuff to properly deal with `OsStr`s. There are now `unsafe` functions for dealing with them. I'd like to extend things further to have a proper "pattern" API for `OsStr` which would allow almost everything a CLI parser needs to deal with `OsStr` without a dependency and without `unsafe`.
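To illustrate the "error on non-UTF8 content" option mentioned above, here is a minimal sketch that uses only the standard library. The function name `utf8_args` and the exit code are my own choices for illustration, not anything from the article's parser. It reads arguments through `std::env::args_os`, which does not panic, and turns the first non-UTF-8 argument into an error.

```rust
use std::ffi::OsString;

/// Convert raw OS arguments to `String`s, returning an error
/// (instead of panicking) on the first argument that is not valid UTF-8.
/// `utf8_args` is a hypothetical helper name used for this sketch.
fn utf8_args<I>(raw: I) -> Result<Vec<String>, String>
where
    I: IntoIterator<Item = OsString>,
{
    raw.into_iter()
        .map(|arg| {
            // `into_string` hands back the original `OsString` on failure.
            arg.into_string()
                .map_err(|bad| format!("argument is not valid UTF-8: {:?}", bad))
        })
        .collect()
}

fn main() {
    // `args_os` never panics; `skip(1)` drops the program name.
    match utf8_args(std::env::args_os().skip(1)) {
        Ok(args) => println!("{:?}", args),
        Err(msg) => {
            eprintln!("error: {msg}");
            std::process::exit(2);
        }
    }
}
```

This still rejects non-UTF-8 file paths; it only replaces the panic with an ordinary error the program can report.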
---
Regarding the discussion on dependencies, I think there are reasonable and valid situations to be careful of adding dependencies (see https://tweedegolf.nl/en/blog/119/sudo-rs-depencencies-when-... and the follow up https://www.reddit.com/r/rust/comments/1b92j0k/sudors_depend...) but the reasoning here focuses on the wrong things imo.
> That would add 23 dependencies to my little project, if you count transitive dependencies. This can go up higher if you turn on a few features: derive, env, unicode, and wrap_help bring you up to 38 dependencies!
People overly focus on dependency counts. Yes, the author later mentions that dependency counts aren't a meaningful metric, but the lack of nuance here suggests they've not internalized that, including talking about the impact of optional dependencies when they advocate for optional dependencies later.
Clap can be trimmed down to just 4 dependencies. One of those exists for build performance. One might be able to be removed but is very lightweight. The last is functionality that would exist either way, whether in its own crate or another.
> More concretely, by having no external dependencies you reduce your bug surface area. Sure, you own all the bugs now—but you won't get leftpad-ed, and you won't get dependabot alerts for third-removed transitive dependencies that now you've gotta patch.
crates.io is leftpad safe in all but the most extreme cases (law enforcement forces the deletion of a crate).
As I point out at the beginning, you already have a bug in this trivial code, one that is often hit when people think a CLI parser is trivial and they don't need dependencies.
> On the other hand, you miss out on nice things.
I think this is an understatement. imo one of the reasons we are seeing a lot of high quality CLIs out there is because it's so easy to build on the work of others.
You also get very inconsistent results which makes the user experience much worse. Take the CLI parser shown here: it doesn't handle many conventions people expect, like multiple short flags (`-zxvf`). Having to deal with each CLI parser's quirks or only living with a subset of them all is not great.
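For readers unfamiliar with the convention, a rough sketch of what handling `-zxvf` involves: split the cluster into one flag per character before the rest of the parser sees it. The helper name `expand_short_flags` is hypothetical, and the sketch assumes none of the clustered flags takes a value, which a real parser cannot assume.

```rust
/// Expand a cluster such as `-zxvf` into `-z -x -v -f`.
/// Long options (`--name`), a lone `-`, single flags, and positional
/// arguments pass through unchanged.
/// Simplification: assumes no flag in a cluster takes a value, so an
/// argument like `-10` would also be split.
fn expand_short_flags(args: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for arg in args {
        let is_cluster =
            arg.starts_with('-') && !arg.starts_with("--") && arg.chars().count() > 2;
        if is_cluster {
            // Skip the leading '-' and emit one flag per character.
            out.extend(arg.chars().skip(1).map(|c| format!("-{c}")));
        } else {
            out.push(arg.to_string());
        }
    }
    out
}

fn main() {
    let expanded = expand_short_flags(&["-zxvf", "--verbose", "file.tar"]);
    // prints ["-z", "-x", "-v", "-f", "--verbose", "file.tar"]
    println!("{:?}", expanded);
}
```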
> I think more things should be built from scratch and, ideally, without dependencies. You get to know the problem space better, and most things don't need the big sophisticated solution—but you pay for the whole dependency you pull in.
In creating a "product", the problem space of CLI parsing is not core. Same with a lot of what other dependencies provide. Instead of reinventing the wheel, you can better focus on the core of what you are trying to provide.
As for big sophisticated solutions, let's take the CLI space. There are many CLI parsers that you can pick from to adapt to the needs of your specific problem (https://github.com/rosetta-rs/argparse-rosetta-rs) but do you want to go into discovery mode for every dependency for every project, pivot between them as requirements change, or deal with bouncing between APIs for non-core parts of your projects? I don't.
But, there are a few issues with this argument parser.
First and foremost, while there's no problem with forcing your option names to be str/String, you should still process OsStr/OsString unless none of your arguments are ever planning to be OS paths. The reason for this is that making your programs accept all the valid Unix path names (which might not be valid UTF-8) is just the right thing to do; the alternative is an arbitrary restriction on your end users. It's about as annoying to run into these kinds of issues as it is to run into applications which don't handle spaces in filenames.
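A small sketch of that split, assuming a made-up `Parsed` struct with a single `-v`/`--verbose` flag (neither is taken from the article): option names are compared as UTF-8 text, while operands are kept as OS strings and stored as `PathBuf`, so non-UTF-8 paths are accepted.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical parse result: one flag plus any number of path operands.
#[derive(Debug, Default, PartialEq)]
struct Parsed {
    verbose: bool,
    paths: Vec<PathBuf>,
}

fn parse<I>(raw: I) -> Result<Parsed, String>
where
    I: IntoIterator<Item = OsString>,
{
    let mut parsed = Parsed::default();
    for arg in raw {
        // Option names are matched as UTF-8. An argument that is not
        // valid UTF-8 cannot be one of our option names.
        match arg.to_str() {
            Some("-v") | Some("--verbose") => {
                parsed.verbose = true;
                continue;
            }
            Some(s) if s.starts_with('-') && s.len() > 1 => {
                return Err(format!("unknown option: {s}"));
            }
            _ => {}
        }
        // Operands stay as OS strings so non-UTF-8 paths are preserved.
        parsed.paths.push(PathBuf::from(arg));
    }
    Ok(parsed)
}

fn main() {
    // `args_os` yields `OsString`s and does not panic on non-UTF-8 input.
    match parse(std::env::args_os().skip(1)) {
        Ok(parsed) => println!("{:?}", parsed),
        Err(msg) => eprintln!("error: {msg}"),
    }
}
```

Note that this sketch treats a non-UTF-8 argument starting with `-` as a path rather than as an unknown option; whether that is the right call depends on the tool.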
Next, there's the inability to handle multiple short options combined.
Also there's the lack of proper handling for options which require arguments vs options with optional arguments (-ovalue, -o value, --opt=value and --opt value should all work for the former case, but for the latter case it only makes sense to accept -ovalue and --opt=value due to the implications in the alternative case). That said, this isn't that important and generally confuses people anyway, so maybe it should be avoided.
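As a rough sketch of the required-argument case, the function below accepts all four spellings for a single option. The name `find_required_value` and its signature are invented for this example. It ignores everything else a parser must do (other options, `--`, clusters), and it does not cover optional-argument options, which per the above would only accept the attached forms.

```rust
/// Look for one value-taking option (short `-o`, long `--opt`) in `args`.
/// Accepts `-ovalue`, `-o value`, `--opt=value`, and `--opt value`.
/// Returns Ok(None) if the option is absent, Err if its value is missing.
fn find_required_value(
    args: &[&str],
    short: char,
    long: &str,
) -> Result<Option<String>, String> {
    let short_flag = format!("-{short}");
    let long_flag = format!("--{long}");
    let long_eq = format!("--{long}=");
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if *arg == short_flag || *arg == long_flag {
            // Detached form: the value is the next argument.
            return iter
                .next()
                .map(|v| Some(v.to_string()))
                .ok_or_else(|| format!("option {arg} requires a value"));
        }
        if let Some(v) = arg.strip_prefix(long_eq.as_str()) {
            return Ok(Some(v.to_string())); // --opt=value
        }
        if let Some(v) = arg.strip_prefix(short_flag.as_str()) {
            return Ok(Some(v.to_string())); // -ovalue
        }
    }
    Ok(None)
}

fn main() {
    for args in [
        vec!["-ovalue"],
        vec!["-o", "value"],
        vec!["--opt=value"],
        vec!["--opt", "value"],
    ] {
        println!("{:?} -> {:?}", args, find_required_value(&args, 'o', "opt"));
    }
}
```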
Last (in this list, but no guarantee it's exhaustive), there's no handling for `--` to end option parsing. This can have security implications.
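A minimal sketch of the `--` convention, with a hypothetical helper named `split_at_terminator`: everything before the first `--` may be parsed as options, and everything after it is positional even if it begins with `-`. The usual motivating case is an untrusted file name such as `-rf` being passed through to a tool.

```rust
/// Split arguments at the first `--`.
/// The first slice may contain options; the second is purely positional,
/// even when an element starts with `-`. The `--` itself is dropped.
fn split_at_terminator<'a, 'b>(args: &'a [&'b str]) -> (&'a [&'b str], &'a [&'b str]) {
    match args.iter().position(|a| *a == "--") {
        Some(i) => (&args[..i], &args[i + 1..]),
        // No terminator: everything is subject to option parsing.
        None => (args, &args[args.len()..]),
    }
}

fn main() {
    let args = ["-v", "--", "-rf", "notes.txt"];
    let (opts, positional) = split_at_terminator(&args);
    // prints options: ["-v"], positional: ["-rf", "notes.txt"]
    println!("options: {:?}, positional: {:?}", opts, positional);
}
```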
It's a bit of a shame there isn't a zero dependency direct clone of Python's argparse. Or something like that even in the standard library. argparse is relatively easy to use, not necessarily designed to be low overhead or fast (god help you if you're in a situation where option parsing is your bottleneck, but I can also appreciate the desire for not wasting cycles where there's no reason to waste them).
I think it's a good idea that people are writing their own low-dependency programs. But it's important that you understand the subject matter in detail if you plan on doing something like this for anything you're hoping to be used by anyone other than yourself.
While clap deviates a lot from the expectations of an option parser (I think part of the deviation is that the people behind clap want to do things "better" than they've been done in the past, but the problem with this motivation is that at some point better isn't important if it is at odds with interface design which has been around for a long time), it does for the most part handle most of these things in the expected way.
For me personally, I would reach for getargs (specifically, my own fork of getargs which does the handling of ArgsOs in a way I find to be optimal), which can handle all of the things outlined above correctly. There's also lexopt which looked promising when I last looked at it.
This seems like a pointless exercise.