August 17th, 2024

DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model

The paper introduces DifuzCam, a lensless camera design using a diffusion model for image reconstruction, enhancing quality and fidelity, with potential applications in various imaging systems.

The paper titled "DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model" presents a novel approach to camera design by eliminating the traditional lens in favor of a flat lensless configuration. This design significantly reduces the size and weight of the camera. Instead of a lens, an optical element is used to manipulate incoming light, and images are reconstructed from raw sensor data using advanced algorithms. However, previous methods resulted in subpar image quality. To address this, the authors propose a reconstruction technique that employs a pre-trained diffusion model, enhanced by a control network and a learned separable transformation. This method not only improves the quality of the reconstructed images but also allows for the integration of textual descriptions of the scene to further enhance the imaging process. The prototype developed demonstrates state-of-the-art results in both image quality and perceptual fidelity, suggesting that this reconstruction approach could be beneficial for various imaging systems.

- The DifuzCam design replaces traditional camera lenses with a flat lensless configuration.

- A pre-trained diffusion model is utilized for improved image reconstruction.

- The method allows for the incorporation of textual scene descriptions to enhance image quality.

- The prototype shows significant advancements in both quality and perceptual fidelity.

- The reconstruction technique has potential applications in other imaging systems.
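The "learned separable transformation" mentioned above is a standard trick in lensless imaging: instead of one huge dense matrix mapping the raw sensor readout to an image, two much smaller learned matrices act on the rows and columns separately. A minimal sketch of that idea, with all shapes and variable names being illustrative assumptions rather than the paper's actual parameters:

```python
import numpy as np

# Hedged sketch of a separable reconstruction layer for lensless imaging:
#     X_hat = W_left @ Y @ W_right.T
# Y is the raw sensor measurement; W_left and W_right would be learned
# during training. Sizes below are arbitrary stand-ins.

H, W = 256, 256      # sensor resolution (assumed)
Hi, Wi = 128, 128    # intermediate image resolution (assumed)

rng = np.random.default_rng(0)
Y = rng.standard_normal((H, W))                 # raw lensless measurement (stand-in)
W_left = rng.standard_normal((Hi, H)) * 0.01    # learned row operator (random here)
W_right = rng.standard_normal((Wi, W)) * 0.01   # learned column operator (random here)

# Intermediate estimate; in DifuzCam this would condition the diffusion model.
X_hat = W_left @ Y @ W_right.T
print(X_hat.shape)   # (128, 128)
```

The separable form needs only `Hi*H + Wi*W` parameters instead of `Hi*Wi*H*W` for a full dense mapping, which is why it is popular for this problem.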

15 comments
By @dekhn - 8 months
For those interested in various approaches to lens-free imaging, Laura Waller at Berkeley has been pursuing this area for some time.

https://waller-lab.github.io/DiffuserCam/
https://waller-lab.github.io/DiffuserCam/tutorial.html (includes instructions and code to build your own)
https://ieeexplore.ieee.org/abstract/document/8747341
https://ieeexplore.ieee.org/document/7492880

By @karmakaze - 8 months
This is not a 'camera' per se. It's more like a human vision system that samples light and hallucinates a plausible image based on context. The image is constructed from the data more than it is reconstructed. And, like human vision, it can be correct often enough to be useful.
By @Dibby053 - 8 months
This would be impressive if the examples weren't taken from the same dataset (LAION-5B) that was used to train the Stable Diffusion model it uses.
By @teruakohatu - 8 months
It is quite amazing that using a diffuser rather than a lens, and then a diffusion model, can reconstruct an image so well.

The downside of this is that it relies heavily on the model to construct the image. Much like those colorisation models applied to old monochrome photos, the results will probably always look a little off, skewed by the training data. I could imagine taking a photo of some weird art installation and the camera getting confused.

You can see examples of this when the model invented fabric texture on the fabric examples and converted solar panels to walls.

By @Thomashuet - 8 months
I don't understand the use of a textual description. In which scenario do you not have enough space for a lens and yet have a textual description of the scene?
By @Snoozus - 8 months
It's not as crazy as it seems, a pinhole camera doesn't have any lenses either and works just fine. The hole size is a tradeoff between brightness and detail. This one has many holes and uses software to puzzle their images back together.
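The "many holes" intuition above can be made concrete: under a shift-invariant approximation, each pinhole in the mask projects its own shifted copy of the scene, so the sensor records (roughly) the scene convolved with the mask pattern. If the mask is known, a regularized deconvolution puzzles the copies back together. A toy sketch, with all sizes and the mask density being illustrative assumptions:

```python
import numpy as np

# Toy coded-aperture model: sensor = scene (*) mask, recovered by
# Wiener-style deconvolution in the frequency domain. Noiseless and
# circular for simplicity; real lensless cameras are messier.

n = 64
rng = np.random.default_rng(1)
scene = np.zeros((n, n))
scene[20:40, 20:40] = 1.0                          # toy scene: a bright square
mask = (rng.random((n, n)) < 0.05).astype(float)   # sparse random "pinholes"

F = np.fft.fft2
sensor = np.real(np.fft.ifft2(F(scene) * F(mask)))  # overlapped shifted copies

# Wiener-style deconvolution: divide in frequency, regularized by eps
eps = 1e-3
Hf = F(mask)
est = np.real(np.fft.ifft2(F(sensor) * np.conj(Hf) / (np.abs(Hf) ** 2 + eps)))

err = np.linalg.norm(est - scene) / np.linalg.norm(scene)
print(err)   # small relative error, since the mask is known exactly and there is no noise
```

With sensor noise or an imperfectly known mask the plain division fails badly, which is exactly the gap that learned reconstructions (like the diffusion model here) try to close.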
By @albert_e - 8 months
so this is like the use (in a different species) of light-sensitive patches of skin instead of the eyeballs (lenses) that most animals on earth evolved?

interesting.

even if this does not immediately replace traditional cameras and lenses... I am wondering if this can add a complementary set of capabilities to a traditional camera say next to a phone's camera bump/island/cluster...so that we can drive some enhanced use cases

maybe store the wider context in raw format alongside the EXIF data ...so that future photo manipulation models can use that ambient data to do more realistic edits / in painting / out painting etc?

I am thinking this will benefit 3D photography and videography a lot if you can capture more of the ambient data, not strictly channeled through the lenses

By @mjburgess - 8 months
Does a camera without a lens make any physics sense? I cannot see how the scene geometry could be recoverable. Rays of light travelling from the scene arrive in all directions.

Intuitively, imagine moving your eye at every point along some square inch. Each position of the eye is a different image. Now all those images overlap on the sensor.

If you look at the images in the paper, everything except their most macro geometry and colour palette is clearly generated -- since it changes depending on the prompt.

So at a guess, the lensless sensor gets this massive overlap of all possible photos at that location and so is able, at least, to capture minimal macro geometry and colour. This isn't going to be a useful amount of information for almost any application.

By @xg15 - 8 months
Oh great, waiting for the first media piece where pictures from this "camera" are presented as evidence. (Or the inverse, where actual photographic evidence is disputed because who knows if the camera didn't have AI stuff built in)
By @pvillano - 8 months
I wonder how it "reacts" to optical illusions? The ones we're familiar with are optimized for probing the limits of the human visual system, but there might be some overlap
By @a1o - 8 months
Oh god, we are going to make lens a premium feature now aren't we?
By @6gvONxR4sf7o - 8 months
Re: is this a camera or not, I recently realized that my fancy mirrorless camera is closer to this than i’d previously thought.

The sensor has a zillion pixels but each one only measures one color. for example, the pixel at index (145, 2832) might only measure green, while its neighbor at (145, 2833) only measures red. So we use models to fill in the blanks. We didn’t measure redness at (145, 2832) so we guess based on the redness nearby.

This kind of guessing is exactly what modern CV is so good at. So the line of what is a camera and what isn’t is a bit blurry to begin with.
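The "fill in the blanks" step described above is demosaicing: each pixel of a Bayer sensor measures one colour, and the other two channels are interpolated from measured neighbours. A minimal sketch, assuming an RGGB pattern and simple neighbour averaging (real camera pipelines use far more sophisticated interpolation):

```python
import numpy as np

# Sketch of demosaicing an RGGB Bayer mosaic: each pixel measured only one
# channel; missing channels are guessed as the mean of measured same-channel
# neighbours in a 3x3 window (with wrap-around, for brevity).

n = 4
rng = np.random.default_rng(2)
raw = rng.random((n, n))   # one measured value per pixel

# Which channel each pixel actually measured (RGGB tiling)
r = np.zeros((n, n), bool); r[0::2, 0::2] = True
b = np.zeros((n, n), bool); b[1::2, 1::2] = True
g = ~(r | b)

def fill(channel_mask):
    """Average the measured same-channel values in each 3x3 neighbourhood."""
    meas = np.where(channel_mask, raw, 0.0)
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    num = sum(np.roll(np.roll(meas, dy, 0), dx, 1) for dy, dx in shifts)
    den = sum(np.roll(np.roll(channel_mask.astype(float), dy, 0), dx, 1)
              for dy, dx in shifts)
    return num / np.maximum(den, 1.0)

rgb = np.stack([fill(r), fill(g), fill(b)], axis=-1)  # full 3-channel guess
print(rgb.shape)   # (4, 4, 3)
```

Two thirds of every consumer photo's colour values are interpolated this way, which is the commenter's point: even a "normal" camera already guesses most of what you see.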

By @valine - 8 months
I get the feeling that lens-free cameras are the future. Obviously the results here are nowhere near good enough, but given the rapid improvement of diffusion models lately, the trajectory seems clear.

Would love to lose the camera bump on the back of my phone.

By @ziofill - 8 months
+1 for the Thor labs candy box