Perceptually lossless (talking head) video compression at 22kbit/s
The LivePortrait model achieves perceptually lossless video compression at 22kbit/s by animating still images, focusing on facial keypoints. It has potential in video conferencing and social media, despite some limitations.
The article discusses the advancements in perceptually lossless video compression using the LivePortrait model, which operates at a remarkably low bitrate of 22kbit/s. This model leverages deepfake technology to animate still images, focusing on facial expressions and movements rather than transmitting full video frames. By sharing a source image and only sending changes in facial keypoints, the model achieves high-quality reconstructions with minimal data. While the results are promising, there are limitations, such as potential inaccuracies in eye gaze and facial features during animation. The LivePortrait model improves upon previous methods like facevid2vid by utilizing a larger training dataset and enhanced loss functions, allowing for better control over avatar movements. Despite its computational demands, requiring an RTX 4090 for real-time processing, the technology holds potential for applications in video conferencing and social media. Future developments may lead to even more efficient models that could operate on less powerful hardware, making this technology more accessible.
- LivePortrait achieves perceptually lossless video compression at 22kbit/s by animating still images.
- The model focuses on transmitting changes in facial keypoints rather than full video frames.
- Limitations include inaccuracies in eye gaze and facial features during animation.
- The technology has potential applications in video conferencing and social media.
- Future advancements may enable operation on less powerful hardware, increasing accessibility.
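To make the bandwidth claim concrete, here is a minimal back-of-the-envelope sketch in Python. The keypoint count, quantization, and pose channels are illustrative assumptions, not LivePortrait's actual wire format; the point is only that a "send the source image once, then stream keypoint deltas" scheme plausibly lands near 22 kbit/s.

```python
# Back-of-the-envelope bitrate for a keypoint-delta stream.
# All numbers below are illustrative assumptions, not LivePortrait's
# actual wire format: keypoint count, quantization, and pose channels
# are guesses chosen to show the order of magnitude.

KEYPOINTS = 21        # assumed number of tracked facial keypoints
COORDS = 3            # x, y, z per keypoint
BITS_PER_COORD = 10   # assumed quantization of each per-frame delta
POSE_BITS = 6 * 10    # assumed head pose (rotation + translation), 10 bits each
FPS = 30              # frames per second

bits_per_frame = KEYPOINTS * COORDS * BITS_PER_COORD + POSE_BITS
bits_per_second = bits_per_frame * FPS

print(f"{bits_per_frame} bits/frame -> {bits_per_second / 1000:.1f} kbit/s")
# ~690 bits/frame -> ~20.7 kbit/s before any entropy coding, the same
# ballpark as the 22 kbit/s figure quoted in the article. The source
# image itself is sent once up front and amortized over the call.
```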
Related
Deep Live Cam: Real-Time Face Swapping and One-Click Video Deepfake Tool
Deep Live Cam is an AI tool for real-time face swapping and video deepfakes, featuring one-click generation, ethical safeguards, multi-platform support, and open-source accessibility, praised for its efficiency and user-friendliness.
Real time face swap and one-click video deepfake
Deep-Live-Cam is a GitHub project for AI-generated media, emphasizing ethical use. It requires specific software for installation and features a GUI for face swapping and webcam functionality.
Show HN: Infinity – Realistic AI characters that can speak
Infinity AI has developed a groundbreaking video model that generates expressive characters from audio input, trained for 11 GPU years at a cost of $500,000, addressing limitations of existing tools.
A New System for Temporally Consistent Stable Diffusion Video Characters
Alibaba Group's MIMO system improves full-body avatar generation with Stable Diffusion, addressing temporal stability issues and utilizing three encodings for character, scene, and occlusion, demonstrating flexibility in video synthesis.
Self-Occluded Avatar Recovery from a Single Video in the Wild
The Self-Occluded Avatar Recovery (SOAR) framework reconstructs human avatars from occluded videos, outperforming existing methods in accuracy, realism, and detail, while enabling novel view rendering and animation.
Traditional codecs have always focused on trade-offs among encode complexity, decode complexity, and latency, where complexity means compute. If every target device ran a 4090 at full power, we could go far below 22kbps with traditional codec techniques for content like this. 22kbps isn't particularly impressive given these compute constraints.
This is my field, and trust me, we (MPEG committees, AOM) look at "AI"-based models, including GANs, constantly. They don't yet look promising compared to traditional methods.
Oh and benchmarking against a video compression standard that's over twenty years old isn't doing a lot either for the plausibility of these methods.
This is interesting tech, and the considerations in the introduction are particularly noteworthy. I never considered the possibility of animating 2D avatars with no 3D pipeline at all.
> On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity. Indeed, the full LivePortrait model has 130m parameters compared to DCVC’s 20 million. While that’s tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run it in real time (in addition to parameters, a large culprit is using expensive warping operations). That means deploying to edge runtimes such as Apple Neural Engine is still quite a ways ahead.
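For a sense of scale, here is a rough calculation of what the quoted parameter counts mean in raw weight memory. The fp16 byte width is an illustrative assumption, not a figure from the article:

```python
# Rough weight-storage comparison at fp16 (2 bytes per parameter).
# Parameter counts are the ones quoted above; the byte width is an
# illustrative assumption, and activations / warping buffers are ignored.

def weights_mb(params: int, bytes_per_param: int = 2) -> float:
    """Raw weight storage in megabytes."""
    return params * bytes_per_param / 1e6

print(f"LivePortrait: ~{weights_mb(130_000_000):.0f} MB of weights")  # ~260 MB
print(f"DCVC:         ~{weights_mb(20_000_000):.0f} MB of weights")   # ~40 MB
# Weight storage alone is modest for a desktop GPU; per the quote, the
# real-time bottleneck is the per-frame compute, notably the warping ops.
```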
It’s very cool that this is possible, but the compression use case is indeed a bit far-fetched. An insanely large model requiring the most expensive consumer GPU on both ends, while at the same time being limited to 22kbps of bandwidth, is a _very_ limited scenario.
However, it does raise an interesting possibility: if you are on the spectrum or have ADHD, you only need one headshot of yourself staring directly at the camera, and then the capture software can stop you from looking at your taskbar or off into space.
Reminds me of the video chat in Metal Gear Solid 1 https://youtu.be/59ialBNj4lE?t=21
Maybe there is a custom web filter in there somewhere that could block particular people and images of them.
Does anyone else remember the weirder (for lack of a better term) features of MPEG-4 part 2, like face and body animation? It did something like that, but as far as I know nearly no one used that feature for anything.
https://en.wikipedia.org/wiki/Face_Animation_Parameter
and in the worst case, trust on the internet will be heavily undermined
...as long as the model doesn't include data to put a shoe on one's head.