July 14th, 2024

Show HN: I generated 70k audiobooks with OpenAI Text-to-Speech

Project Gutenberg Audiobooks library by Listenly offers 70,000+ public domain books with titles like "Frankenstein," "Pride and Prejudice," and "Moby Dick." It includes works by Shakespeare, Austen, and more, spanning various genres.

Project Gutenberg Audiobooks library by Listenly offers over 70,000 public domain books that can be listened to using the latest Text-to-Speech model from OpenAI. Notable titles include "Frankenstein" by Mary Shelley, "Pride and Prejudice" by Jane Austen, and "Moby Dick" by Herman Melville. The collection also features works by William Shakespeare, George Eliot, Louisa May Alcott, Charles Dickens, and many more. Users can browse bookshelves such as Best Books Ever, Harvard Classics, Gothic Fiction, and Science Fiction.

28 comments
By @dv35z - 4 months
If you're interested in further text to speech missions, I just got Piper (open-source text-to-speech engine) running happily in a Docker container on my Mac. Effectively "free", high quality, fast-generating text-to-speech.

Check out their voice samples: https://rhasspy.github.io/piper-samples/ (or make your own).

Happy to help you set it up locally...

https://github.com/rhasspy/piper

By @harrisonjackson - 4 months
Ah, nice! I've been doing something similar to convert web novels --> epub --> mp3/m4b --> sorta a graphic novel --> sorta a video / slide show

Here is Pride and Prejudice; up the thread you can see another web novel example:

https://twitter.com/HarrisonJackson/status/18109373574214537...

ElevenLabs has so many great voice models but is super expensive. I want to experiment with some oss voice models and even train my own but not sure on a great starting point with that. Play.ht has some good voices, too.

Seeing some of the results here with the OpenAI TTS, I will probably switch at least the narrator to one of these voices to save some money.

By @jjcm - 4 months
Very cool, and nice work on this! I used to record wikipedia's articles in audio format to help those who had trouble reading, so I'm a huge fan of anything that makes public domain work more accessible.

As a rabid audiobook consumer, I do have a couple of suggestions.

An easy one - currently you only use the Onyx voice from OpenAI. I'd recommend that at the very least you match the gender of the voice to the gender of the author. I find this is pretty common with published audiobooks, and I find it helps bring out the tone of the author more.

A harder one - most great audiobook narrators change their voice depending on the character speaking. If you really wanted to go in depth here, parsing the text by character and matching them to a voice would go a long way in making these more listenable. It would be fairly straightforward (albeit more expensive) to parse these books with an LLM and ask it to add inline markdown for the right voice options for each speaking character.
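A minimal sketch of the parsing half of this idea, assuming an LLM has already annotated the text with a made-up inline tag of the form `[[Speaker]]` before each passage. The tag syntax, the `CAST` mapping, and the function name are hypothetical; only the voice names (`onyx`, `nova`, `fable`) are real OpenAI TTS voices.

```python
import re

# Hypothetical tag format: [[Speaker]] marks who speaks the text that follows.
TAG = re.compile(r"\[\[([^\]]+)\]\]")

# Hypothetical cast list mapping speakers to OpenAI TTS voice names.
CAST = {"Narrator": "onyx", "Elizabeth": "nova", "Darcy": "fable"}
DEFAULT_VOICE = "onyx"


def split_by_voice(annotated: str) -> list[tuple[str, str]]:
    """Return (voice, text) pairs in reading order."""
    # re.split with one capture group yields:
    # [text_before_first_tag, speaker1, text1, speaker2, text2, ...]
    parts = TAG.split(annotated)
    segments = []
    lead = parts[0].strip()
    if lead:
        # Untagged opening text falls back to the default voice.
        segments.append((DEFAULT_VOICE, lead))
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            # Unknown speakers also fall back to the default voice.
            segments.append((CAST.get(speaker.strip(), DEFAULT_VOICE), text))
    return segments
```

Each resulting pair could then be sent to the TTS service as its own request and the audio clips concatenated in order.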

By @dmje - 4 months
So the model is - “first person pays, rest of community gets that audio for free”, have I understood that right?

Cos if so - cool, that’s a lovely model. And you should make more of it. There’s a definite feel good factor associated with this. You could probably also charge a bit more - $5 for a thing I get alone vs $10 for a thing that I get but everyone else gets for free too seems a no brainer incentive to me.
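The pay-once, share-afterwards model described here amounts to a generate-on-first-request cache. A minimal sketch, assuming a hypothetical `synthesize` callable that stands in for the paid TTS call and a local cache directory; nothing here reflects how Listenly is actually built.

```python
from pathlib import Path
from typing import Callable


def get_audio(book_id: str, voice: str, cache_dir: Path,
              synthesize: Callable[[str, str], bytes]) -> tuple[bytes, bool]:
    """Return (audio_bytes, was_generated).

    The first request for a (book, voice) pair pays for synthesis;
    every later request is served from the cache.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{book_id}-{voice}.mp3"
    if path.exists():
        return path.read_bytes(), False
    audio = synthesize(book_id, voice)  # hypothetical paid TTS call
    path.write_bytes(audio)
    return audio, True
```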

FWIW I find Omnivore[0] to be really compellingly realistic TTS. I don’t know what they use but it’s pretty great imo.

[0] https://omnivore.app/

By @42lux - 4 months
Did you know that Microsoft did basically the same thing for free last year?

https://marhamilresearch4.blob.core.windows.net/gutenberg-pu...

By @scandox - 4 months
How much listening have you done to the results? How do you feel about the results? Just interested because I've listened to quite a few AI readings (sometimes without knowing ahead of time) and I'm still sort of processing my reactions.
By @frankohn - 4 months
I created a similar project for the book Madame Bovary, but in French using the ElevenLabs API.

A sample of the first chapter is available here:

https://fairpublishing.org/index.php/ebooks/sample-audiobook...

The voice quality and pronunciation are excellent. However, the system struggles with acting, so the tone and emotional expression are often wrong during dialogues. Additionally, I have to fragment the text into short paragraphs, making it challenging to set appropriate break durations, resulting in an unnatural rhythm.
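One way to approach the fragmentation problem is to split at paragraph boundaries first and only fall back to sentence boundaries when a paragraph is too long, recording a different pause for each kind of break. A minimal sketch: the pause values are arbitrary assumptions, and the default `max_chars` is chosen to match the 4096-character input cap documented for OpenAI's speech endpoint (other services differ).

```python
import re

PARAGRAPH_PAUSE_S = 0.8  # assumed pause after a paragraph ends
SENTENCE_PAUSE_S = 0.3   # assumed pause where a paragraph had to be split


def fragment(text: str, max_chars: int = 4096) -> list[tuple[str, float]]:
    """Split text into (chunk, pause_after_seconds) pairs.

    A single sentence longer than max_chars is passed through unsplit.
    """
    out = []
    for para in re.split(r"\n\s*\n", text.strip()):
        para = " ".join(para.split())  # collapse hard line wraps
        if not para:
            continue
        # Split on sentence-ending punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", para)
        pieces = []
        chunk = ""
        for s in sentences:
            if chunk and len(chunk) + 1 + len(s) > max_chars:
                pieces.append(chunk)
                chunk = s
            else:
                chunk = f"{chunk} {s}" if chunk else s
        pieces.append(chunk)
        for i, piece in enumerate(pieces):
            is_last = i == len(pieces) - 1
            pause = PARAGRAPH_PAUSE_S if is_last else SENTENCE_PAUSE_S
            out.append((piece, pause))
    return out
```

The pause values would still need tuning by ear; the sketch only keeps the two kinds of break distinguishable when stitching the audio back together.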

Despite the technical quality and my appreciation for the reading voice, I won't continue in this direction.

ElevenLabs is quite expensive, but it would be worth it if the final result were good enough for listeners to purchase the audiobook.

I don't know if using OpenAI's API in English would yield better results. However, OpenAI's performance in non-English languages is not satisfactory.

By @toddmorey - 4 months
I definitely support your goal: take all the public domain e-books and create audio versions for them. I think the "on-demand" approach is kinda brilliant. Once a book is requested, how long does it take to generate the audio file? Does it happen in one shot?

I sadly found an AI audio project I don't support: This person was instead summarizing popular books into 10 minutes of audio. Basically trying to SEO better than the author and I know the authors aren't compensated. That just left me feeling sad. (I know book summaries for busy people have been a thing for a while, but this just all felt so opportunistic.)

As I search podcasts these days, I'm finding more and more of these low-effort, "doesn't take more than a few minutes to set up, why not" type AI-generated spam cannons. Been hard for a while but it's about to get REALLY hard to separate the wheat from the chaff.

By @saberience - 4 months
I guess you didn't hear about Librivox? Which allows anyone to provide voiceovers for Project Gutenberg books. Much better than AI generated voice in my experience.
By @astromd - 4 months
This is one of the worst use cases for AI. You have no way to verify the quality of the output. Many of these texts are going to have pronunciations that will be difficult for today’s TTS systems. Plus, many of these are already available from good voice actors, many of them free, and they do the proper service to these texts.

It seems like you did a lot of good technical work, but I find this project entirely useless and a waste of resources.

By @ilaksh - 4 months
I think it's sad that lying in a title is now accepted marketing practice.
By @mattferderer - 4 months
I love how much better listening to books with AI has become.

Have you done any attempts at multiple narrators telling a story?

Microsoft's Azure has a great tool for doing this, but it's time-consuming as you have to take all the text & match it to the narrator by hand. OpenAI's last big demo kind of showed using voice chat to change narrator voices on the fly.

I think it would be awesome if you could submit a book, have a simple tool parse through & find all the speakers. Then let you sample how each one sounds with a brief description of what the person is like. Basically you get to have each voice do an audition & you pick your favorites. Then it goes through page by page generating audio based on the voices selected.
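The "find all the speakers" step could start from something as simple as counting dialogue attributions. A rough sketch, assuming English prose and a small hand-picked verb list; the regex heuristic, `VERBS`, and `PRONOUNS` are illustrative assumptions, and a real tool would more likely hand this job to an LLM or an NLP library.

```python
import re
from collections import Counter

# Assumed attribution verbs; real novels use many more.
VERBS = r"(?:said|replied|cried|asked)"
# Matches 'said Darcy' or 'Darcy said', capturing a capitalised name.
ATTRIBUTION = re.compile(
    rf"\b{VERBS}\s+([A-Z][a-z]+)\b|\b([A-Z][a-z]+)\s+{VERBS}\b"
)
# Capitalised words that are not character names.
PRONOUNS = {"He", "She", "They", "We", "You", "It"}


def find_speakers(text: str) -> list[tuple[str, int]]:
    """Return (name, attribution_count) pairs, most frequent first."""
    counts = Counter()
    for after_verb, before_verb in ATTRIBUTION.findall(text):
        name = after_verb or before_verb
        if name not in PRONOUNS:
            counts[name] += 1
    return counts.most_common()
```

The resulting list is what a voice "audition" screen would iterate over, one candidate voice sample per detected speaker.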

I'm not suggesting this feature for the app. I'm just throwing out this idea as one I've been thinking about. There have been a lot of books I've wanted to listen to but don't have time to sit down & read.

By @scosman - 4 months
Great project.

Pricing: maybe try a mobile app with monthly subscription? Something for recurring revenue.

Features: can you generate at 1.5x speed? Might be more natural than the playback speed up options and be a nice differentiator.
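OpenAI's speech endpoint does accept a `speed` parameter (documented range 0.25 to 4.0, default 1.0), so generating at 1.5x is possible in principle. A minimal sketch of building such a request; the helper name is hypothetical, and whether the result sounds more natural than player-side speed-up is untested here.

```python
def speech_request(text: str, voice: str = "onyx", speed: float = 1.5) -> dict:
    """Build keyword arguments for OpenAI's speech endpoint."""
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": "tts-1", "voice": voice, "input": text, "speed": speed}
```

With the official `openai` Python package, these arguments can be passed as `client.audio.speech.create(**speech_request(chapter_text))`, where `chapter_text` is a placeholder for the text to narrate.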

By @jkbbwr - 4 months
Honestly? The quality of the output is as expected. I wondered how it would manage something like Shakespeare, which depends so heavily on iambic pentameter. Instead, the AI does what it usually does: drone on at a slightly too fast speed, with no natural pauses and no delivery. As with most things, you would be better off paying for a human performance than relying on this.

I wish the OP well, and the project is nicely designed. But AI simply isn't there for this yet, not without a lot of individual hand holding and extra work.

By @laurent_du - 4 months
I think there may be an issue with data collection. I tried listening to some of pg's articles but they were cut off right at the beginning; see e.g. 005 Lisp for Web Applications.
By @eplatzek - 4 months
I did some spot checks and the cadence and intonation of their speech feels so natural. The sentences flow. It's the best I've ever heard. Thanks for doing this.
By @notsure357 - 4 months
Are there any books among Project Gutenberg books that haven't already been performed as an audiobook? Assuming that all of the popular books in Project Gutenberg have an audiobook available to purchase read by a human which is probably better quality or at least more likely to be better quality, why would I want to pay money for this instead? I don't see the value proposition here.
By @outcoldman - 4 months
Does anybody have a recommendation for apps/scripts (mac/windows/ios/android) that will allow me to generate an audiobook from epub (or txt) using my own OpenAI API key?
By @gooseyman - 4 months
Once generated (i.e., a user pays for the audio to be generated), does it become available to the public? If so, very cool!
By @j45 - 4 months
Is there any open source text to speech library that's starting to be half close or decent for something like this?
By @fritzo - 4 months
Feature request: I'd love to be able to choose among different voices.
By @lobito14 - 4 months
Why only a Google account to log in? Also, why only a dark theme when so many users have difficulty reading on dark backgrounds?
By @shhsdydywhwhb - 4 months
Cool but it seems you are not transparent about regenerating books?

The best books should already exist in audio and you can already show examples of the quality.

Has no one used this yet? Do you not store the generated result?

I mean it's fine to make money but you state it differently.

Nonetheless I like the project, I'm impressed with the examples and I also like the approach

By @cranberryturkey - 4 months
Is your code on GitHub somewhere?
By @qup - 4 months
Nice job Ivan.

I expect your costs to go down over time, which is nice.

By @saberience - 4 months
Also, I find it quite unethical that you're charging for public domain books. It's frankly gross, in my opinion.