Show HN: I generated 70k audiobooks with OpenAI Text-to-Speech
Project Gutenberg Audiobooks library by Listenly offers 70,000+ public domain books with titles like "Frankenstein," "Pride and Prejudice," and "Moby Dick." It includes works by Shakespeare, Austen, and more, spanning various genres.
Project Gutenberg Audiobooks library by Listenly offers over 70,000 public domain books that can be listened to using OpenAI's latest Text-to-Speech model. Notable titles include "Frankenstein" by Mary Shelley, "Pride and Prejudice" by Jane Austen, and "Moby Dick" by Herman Melville. The collection also features works by William Shakespeare, George Eliot, Louisa May Alcott, Charles Dickens, and many more. Users can browse bookshelves such as Best Books Ever, Harvard Classics, Gothic Fiction, and Science Fiction.
Related
500k Books Have Been Deleted from the Internet Archive's Lending Library
500,000 books removed from Internet Archive's Open Library due to publishers' lawsuit. Legal battle restricts eBook lending, aiming to control distribution and pricing, challenging libraries' role in providing access to information.
Internet Archive forced to remove 500k books after publishers' court win
The Internet Archive removed 500,000 books due to a court ruling favoring publishers. The organization is appealing, arguing for fair use. Supporters stress the impact on education and access to information.
Much Ado About First Folios — the world's largest Shakespeare collection reopens
The Folger Shakespeare Library in Washington, D.C., completes a four-year renovation, introducing new museum spaces and leadership. It features 82 "First Folio" copies and hosts diverse cultural events, aiming to expand its audience and cultural significance.
Graham Essays: Full Collection of PG Essays in ePub, PDF and Markdown
The GitHub repository offers 200+ essays by Paul Graham in EPUB and Markdown formats. Regularly updated, users can download the complete set and explore the current list. Instructions for downloading and contributing are provided.
Shadow Library
Shadow libraries, like Anna's Archive and Library Genesis, offer free access to academic content behind paywalls. Despite legal concerns, they persist, aiding the Open Access movement and sparking debates among academics.
Check out their voice samples: https://rhasspy.github.io/piper-samples/ (or make your own).
Happy to help you set it up locally...
Here is Pride and Prejudice, and up the thread you can see another web novel example:
https://twitter.com/HarrisonJackson/status/18109373574214537...
ElevenLabs has so many great voice models but is super expensive. I want to experiment with some OSS voice models and even train my own, but I'm not sure of a good starting point for that. Play.ht has some good voices, too.
Seeing some of the results here with the OpenAI TTS, I will probably switch at least the narrator to one of these voices to save some money.
As a rabid audiobook consumer, I do have a couple of suggestions.
An easy one - currently you only use the Onyx voice from OpenAI. I'd recommend that at the very least you match the gender of the voice to the gender of the author. I find this is pretty common with published audiobooks, and I find it helps bring out the tone of the author more.
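A minimal sketch of that voice-matching idea, assuming the author's gender is available as metadata. The mapping below is hypothetical: `nova` and `onyx` are OpenAI voice names, but which voice suits which author is a subjective call, and `pick_voice` is an illustrative helper, not part of the project.

```python
# Hypothetical mapping from author gender to an OpenAI TTS voice name.
# Which voice "matches" an author is a judgment call, not an API feature.
VOICE_BY_AUTHOR_GENDER = {"female": "nova", "male": "onyx"}
DEFAULT_VOICE = "onyx"  # the voice the project reportedly uses today


def pick_voice(author_gender):
    """Return a voice for the author, falling back to the default."""
    if author_gender is None:
        return DEFAULT_VOICE
    key = author_gender.strip().lower()
    return VOICE_BY_AUTHOR_GENDER.get(key, DEFAULT_VOICE)
```

Unknown or missing metadata falls back to the current default, so books without author data would sound the same as before.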
A harder one - most great audiobook narrators change their voice depending on the character speaking. If you really wanted to go in depth here, parsing the text by character and matching them to a voice would go a long way in making these more listenable. It would be fairly straightforward (albeit more expensive) to parse these books with an LLM and ask it to add inline markdown for the right voice options for each speaking character.
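A rough sketch of the second half of that pipeline, assuming an LLM has already annotated the text. The `[voice:NAME]` tag format is made up for illustration (the comment only says "inline markdown"), and `split_by_voice` is a hypothetical helper that turns tagged text into (voice, text) segments ready to be sent to a TTS engine one at a time.

```python
import re

# Hypothetical inline tag an LLM might be asked to insert, e.g. [voice:elizabeth].
VOICE_TAG = re.compile(r"\[voice:([A-Za-z0-9_-]+)\]")


def split_by_voice(tagged_text, default_voice="narrator"):
    """Split tagged text into (voice, text) segments in reading order."""
    segments = []
    voice = default_voice
    pos = 0
    for match in VOICE_TAG.finditer(tagged_text):
        chunk = tagged_text[pos:match.start()].strip()
        if chunk:
            segments.append((voice, chunk))
        voice = match.group(1)  # the tag applies to the text that follows it
        pos = match.end()
    tail = tagged_text[pos:].strip()
    if tail:
        segments.append((voice, tail))
    return segments
```

Text before the first tag is assigned to the default voice, so untagged books still produce a single narrator segment.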
Cos if so - cool, that’s a lovely model. And you should make more of it. There’s a definite feel good factor associated with this. You could probably also charge a bit more - $5 for a thing I get alone vs $10 for a thing that I get but everyone else gets for free too seems a no brainer incentive to me.
FWIW I find Omnivore[0] to be really compellingly realistic TTS. I don’t know what they use but it’s pretty great imo.
https://marhamilresearch4.blob.core.windows.net/gutenberg-pu...
A sample of the first chapter is available here:
https://fairpublishing.org/index.php/ebooks/sample-audiobook...
The voice quality and pronunciation are excellent. However, the system struggles with acting, so the tone and emotional expression are often wrong during dialogues. Additionally, I have to fragment the text into short paragraphs, making it challenging to set appropriate break durations, resulting in an unnatural rhythm.
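One way to make the fragmenting and pause problem more manageable, sketched under assumptions: split on sentence boundaries rather than arbitrary positions, and attach a longer pause at paragraph ends than between chunks within a paragraph. The function name, the size limit, and the pause values are all illustrative defaults, not values taken from any TTS service.

```python
import re


def chunk_with_pauses(text, max_chars=400, sentence_pause=0.3, paragraph_pause=0.9):
    """Return (chunk, pause_seconds) pairs, splitting only at sentence ends.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    result = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for paragraph in paragraphs:
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        chunks, current = [], ""
        for sentence in sentences:
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)  # close the chunk before it overflows
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
        for i, chunk in enumerate(chunks):
            is_last = i == len(chunks) - 1
            result.append((chunk, paragraph_pause if is_last else sentence_pause))
    return result
```

The pause values would still need tuning by ear, but deriving them from the text structure at least keeps the rhythm consistent across a book.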
Despite the technical quality and my appreciation for the reading voice, I won't continue in this direction.
ElevenLabs is quite expensive, but it would be worth it if the final result were good enough for listeners to purchase the audiobook.
I don't know if using OpenAI's API in English would yield better results. However, OpenAI's performance in non-English languages is not satisfactory.
I sadly found an AI audio project I don't support: This person was instead summarizing popular books into 10 minutes of audio. Basically trying to SEO better than the author and I know the authors aren't compensated. That just left me feeling sad. (I know book summaries for busy people have been a thing for a while, but this just all felt so opportunistic.)
As I search podcasts these days, I'm finding more and more of these low-effort, "doesn't take more than a few minutes to set up, why not" type AI-generated spam cannons. Been hard for a while but it's about to get REALLY hard to separate the wheat from the chaff.
It seems like you did a lot of good technical work, but I find this project entirely useless and a waste of resources.
Have you done any attempts at multiple narrators telling a story?
Microsoft's Azure has a great tool for doing this but it's time consuming as you have to take all the text & match it to the narrator by hand. OpenAI's last big demo kind of showed using voice chat to change narrator voices on the fly.
I think it would be awesome if you could submit a book, have a simple tool parse through & find all the speakers. Then let you sample how each one sounds with a brief description of what the person is like. Basically you get to have each voice do an audition & you pick your favorites. Then it goes through page by page generating audio based on the voices selected.
I'm not suggesting this feature for the app. I'm just throwing out this idea as one I've been thinking about. There have been a lot of books I've wanted to listen to but don't have time to sit down & read.
Pricing: maybe try a mobile app with monthly subscription? Something for recurring revenue.
Features: can you generate at 1.5x speed? Might be more natural than the playback speed up options and be a nice differentiator.
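On the speed question: as far as I know, OpenAI's speech endpoint accepts a `speed` parameter between 0.25 and 4.0 (default 1.0), so generating at 1.5x looks like a one-argument change. The sketch below only builds the request arguments and does not call the API; `speech_request` is a hypothetical helper and the model name is an assumption.

```python
def speech_request(text, voice="onyx", speed=1.5, model="tts-1"):
    """Build keyword arguments for a text-to-speech request at a given speed."""
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": model, "voice": voice, "input": text, "speed": speed}


# With the official Python SDK this would be passed along roughly as:
#   client.audio.speech.create(**speech_request("Call me Ishmael."))
```

Whether generated 1.5x audio sounds more natural than sped-up playback is something to test by listening.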
I wish the OP well, and the project is nicely designed. But AI simply isn't there for this yet, not without a lot of individual hand holding and extra work.
https://docs.lemonsqueezy.com/help/checkout/payment-methods#...
The best books should already exist in audio and you can already show examples of the quality.
Has no one used this yet? Do you not store the generated result?
I mean it's fine to make money but you state it differently.
Nonetheless I like the project, I'm impressed with the examples and I also like the approach
I expect your costs to come down over time, which is nice.