April 24th, 2025

Show HN: Lemon Slice Live, a real-time video-audio AI model

Lemon Slice has developed a real-time avatar generation tool using a custom DiT model, enabling users to create talking avatars from a single image with 25fps video streaming and low latency.

ExcitementCuriosityAdmiration

Show HN: Lemon Slice Live, a real-time video-audio AI model

Lemon Slice, co-founded by Lina, Andrew, and Sidney, has developed a custom diffusion transformer (DiT) model that enables real-time video streaming at 25 frames per second (fps). This technology allows users to create talking avatars from a single photo without the need for pre-training or rigging. The demo can be accessed at their website, where users can upload images of any style to generate custom characters for video calls. Key advancements include a fast DiT model that synchronizes lip movements and facial expressions with audio, a technique to maintain visual coherence across long video sequences, and a complex streaming architecture that minimizes latency. The system currently achieves a 3-6 second latency from user input to avatar response, with a goal of under 2 seconds. Future improvements aim to incorporate whole-body motions, enhance resolution, and enable avatars to engage in more natural conversations. The founders envision a future where generative video transforms media consumption, blending interactive and passive experiences in entertainment.

- Lemon Slice has created a real-time avatar generation tool using a custom DiT model.

- Users can generate talking avatars from a single image without prior training or rigging.

- The technology achieves video streaming at 25fps and aims for under 2 seconds of latency.

- Future developments will focus on improving avatar interactions and visual quality.

- The founders predict a shift in media consumption towards more interactive experiences.

Show HN: Infinity – Realistic AI characters that can speak

Infinity AI has developed a groundbreaking video model that generates expressive characters from audio input, trained for 11 GPU years at a cost of $500,000, addressing limitations of existing tools.

A New System for Temporally Consistent Stable Diffusion Video Characters

Alibaba Group's MIMO system improves full-body avatar generation with Stable Diffusion, addressing temporal stability issues and utilizing three encodings for character, scene, and occlusion, demonstrating flexibility in video synthesis.

Show HN: A real time AI video agent with under 1 second of latency

Tavus, co-founded by Hassaan and Quinn, develops AI video models for realistic conversations, achieving under 1 second latency with the Phoenix-2 model, attracting clients like Delphi for digital twin technology.

Perceptually lossless (talking head) video compression at 22kbit/s

The LivePortrait model achieves perceptually lossless video compression at 22kbit/s by animating still images, focusing on facial keypoints. It has potential in video conferencing and social media, despite some limitations.

A16Z: AI Avatars

AI avatars are evolving to create realistic, interactive characters for various sectors, enhancing storytelling and engagement. They utilize advanced technology for speech and movement synchronization, benefiting businesses and consumers alike.

AI: What people are saying

The comments on the real-time avatar generation tool reveal a mix of excitement and critique regarding its capabilities and potential applications.

Users are impressed by the technology and its ability to create engaging avatars, noting the potential for future improvements.
Some commenters express concerns about the realism of the avatars, particularly in terms of voice and facial expressions.
There are inquiries about the model's architecture and input capabilities, with curiosity about non-photo inputs like artwork.
Suggestions for applications include using the technology for virtual interactions with deceased loved ones and enhancing customer service experiences.
Several users mention the need for a more accessible trial experience, expressing reluctance to sign up for new services without assurance of quality.

23 comments

By @djaychela - about 6 hours

Just talked with Max Headroom and Michael Scott - my wife is an office fan so knows the references, and I know enough Max to ask the right things.

Overall, a fun experience. I think that MH was better than Scott. Max was missing the glitches and moving background but I'd imagine both of those are technically challenging to achieve.

Michael Scott's mouth seemed a bit wrong - I was thinking Michael J Fox but my wife then corrected that with Jason Bateman - which is much more like it. He knew Office references alright, but wasn't quite Steve Carell enough.

The default while it was listening could do with some work, I think - that was the least convincing bit; for Max he would have just glitched or even been completely still I would think. Michael Scott seemed too synthetic at this point.

Don't get me wrong, this was pretty clever and I enjoyed it, just trying to say what I found lacking without trying to sound like I could do better (which I couldn't!).

By @mentalgear - about 1 hour

So basically the old open-source live-portrait hooked up with audio output. Was very glitchy and low res on my side. btw: Wondering if it's legal to use characters you don't have rights to. (how do you justify possible IP infringement)

By @o_____________o - 34 minutes

Are you going to offer a web embeddable version of the Live offering?

By @bsenftner - about 3 hours

This is fantastic. I was the founder of the 3D Avatar Store, a company that was doing similar things 15 years ago with 3D reconstructions of people. Your platform is what I was trying to build back then, but at the time nobody thought such tech was possible, or they seriously wanted to make porn, and we refused. I'll try reaching out through channels to connect with your team. I come from a feature film VFX, Academy Award quality work, so it would be interesting to discuss. Plus, I've not been idle since the 3D Avatar Store, not at all...

By @zebomon - about 15 hours

This is impressive. The video chat works well. It is just a hair away from a very comfortable conversation. I'm excited to see where you have it a year from now, if it turns out to be financially viable. Good luck!

By @dang - about 21 hours

https://lemonslice.com/api/videos/video-XzDwIcW6QCvSIj1vX1Hu...

By @srameshc - about 21 hours

I am very much fascinated by this virtual avatar talking thing. I tried video-retalking https://github.com/OpenTalker/video-retalking just to see how far I can make it work to make a talking avatar but it is tremendously difficult. But this holds tremendous possibilities and I hope it can be eventually cheaper to run such models. I know this is far superior and probably a lot different but I hope to find open source solutions like Lemon Slice someday that I can experiment with.

By @inhumantsar - about 2 hours

love the demo video with Andrew. showing the potential as well as the delays and awkwardness of AI is refreshing compared to the heavily edited hype reels that are so common

By @lostmsu - about 21 hours

This is very impressive. Any details about model architecture and size? Input and output representation?

How does voice work? You mentioned Deepgram. Does it mean you do Speech-to-Text-to-Speech?

By @NoScopeNinja - about 7 hours

Hey, this looks really cool! I'm wondering - what happens if you feed it something totally different like a Van Gogh painting or anime character? Have you tested any non-photo inputs?

By @gitroom - about 20 hours

honestly this feels kinda huge - stuff like this is moving so fast, it's insane seeing it go real-time

By @ashishact - about 12 hours

This is just brilliant. Hope you succeed, so that eventually I get an API to play with.

By @wouterjanl - about 9 hours

Really cool stuff. It felt strangely real. Impressive!

By @elternal_love - about 18 hours

Hmm, plug this together with a app which collects photos and chats with a deceased love one and you have a working Malachim. Might be worth a shot.

Impressive technology - impressive demo! Sadly, the conversation seems to be a little bit overplayed. Might be worth plugging ChatGPT or some better LLM in the logic section.,

By @benob - about 19 hours

Very nice. Are you planning a paper?

By @sid-the-kid - about 20 hours

The system just crashed. Sorry! Working on getting things live again as fast as we can!

By @aorloff - about 18 hours

Max Headroom lives !

By @bigyabai - about 22 hours

> reducing delays and improving resolution (purpose-built ASICs will help)

How can you be sure? Investing in an ASIC seems like one of the most expensive and complicated solutions.

By @andrewstuart - about 15 hours

A really compelling experience.

It seems clumsy to use copyrighted characters in your demos.

Seems to me this will be a standard way to interact with LLMs and even companies - like a receptionist/customer service/salesperson.

Obviously games could use this.

By @tetris11 - about 22 hours

If you could lower the email signup for a few hours, that'd be nice. I'm not going to sign up for yet another service I'm unsure about.

By @doublerabbit - about 22 hours

"Try it now live" and then request me to enter my email.

I'll pass thanks.

Show HN: Infinity – Realistic AI characters that can speak

Infinity AI has developed a groundbreaking video model that generates expressive characters from audio input, trained for 11 GPU years at a cost of $500,000, addressing limitations of existing tools.

Show HN: Lemon Slice Live, a real-time video-audio AI model

Related

Show HN: Infinity – Realistic AI characters that can speak

A New System for Temporally Consistent Stable Diffusion Video Characters

Show HN: A real time AI video agent with under 1 second of latency

Perceptually lossless (talking head) video compression at 22kbit/s

A16Z: AI Avatars

Related

Show HN: Infinity – Realistic AI characters that can speak

A New System for Temporally Consistent Stable Diffusion Video Characters

Show HN: A real time AI video agent with under 1 second of latency

Perceptually lossless (talking head) video compression at 22kbit/s

A16Z: AI Avatars