DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model
The paper introduces DifuzCam, a lensless camera design using a diffusion model for image reconstruction, enhancing quality and fidelity, with potential applications in various imaging systems.
The paper titled "DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model" presents a novel approach to camera design by eliminating the traditional lens in favor of a flat lensless configuration. This design significantly reduces the size and weight of the camera. Instead of a lens, an optical element is used to manipulate incoming light, and images are reconstructed from raw sensor data using advanced algorithms. However, previous methods resulted in subpar image quality. To address this, the authors propose a reconstruction technique that employs a pre-trained diffusion model, enhanced by a control network and a learned separable transformation. This method not only improves the quality of the reconstructed images but also allows for the integration of textual descriptions of the scene to further enhance the imaging process. The prototype developed demonstrates state-of-the-art results in both image quality and perceptual fidelity, suggesting that this reconstruction approach could be beneficial for various imaging systems.
- The DifuzCam design replaces traditional camera lenses with a flat lensless configuration.
- A pre-trained diffusion model is utilized for improved image reconstruction.
- The method allows for the incorporation of textual scene descriptions to enhance image quality.
- The prototype shows significant advancements in both quality and perceptual fidelity.
- The reconstruction technique has potential applications in other imaging systems.
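To make the "learned separable transformation" in the summary concrete, here is a minimal PyTorch sketch of the general separable-reconstruction idea. This is an illustration, not the authors' code; the class name, shapes, and initialization are assumptions.

    import torch
    import torch.nn as nn

    class SeparableReconstruction(nn.Module):
        """Maps a raw multiplexed sensor readout to a coarse image estimate
        via two learnable matrices applied along rows and columns:
        x_hat = W_left @ y @ W_right."""
        def __init__(self, sensor_h, sensor_w, img_h, img_w):
            super().__init__()
            # Assumption: random init; in the paper's setting these would be
            # trained end to end with the diffusion model's control network.
            self.w_left = nn.Parameter(0.01 * torch.randn(img_h, sensor_h))
            self.w_right = nn.Parameter(0.01 * torch.randn(sensor_w, img_w))

        def forward(self, y):
            # y: (batch, sensor_h, sensor_w) raw sensor measurement.
            return self.w_left @ y @ self.w_right

    model = SeparableReconstruction(sensor_h=256, sensor_w=256, img_h=128, img_w=128)
    y = torch.randn(1, 256, 256)   # stand-in for a raw lensless capture
    x_hat = model(y)               # coarse estimate for the diffusion model to refine
    print(x_hat.shape)             # torch.Size([1, 128, 128])

The separable form is attractive because it needs only img_h*sensor_h + sensor_w*img_w parameters, versus (sensor_h*sensor_w) * (img_h*img_w) for a full linear map from sensor to image.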
Related
Subwavelength imaging using a solid-immersion diffractive optical processor
Researchers developed a solid-immersion diffractive optical processor for subwavelength imaging using deep learning. The system magnifies images, reveals subwavelength features, and operates across the electromagnetic spectrum for various applications.
Diffusion Training from Scratch on a Micro-Budget
The paper presents a cost-effective method for training text-to-image generative models by masking image patches and using synthetic images, achieving competitive performance at significantly lower costs.
Speedrunning 30yrs of lithography technology [video]
The author is constructing a photolithography machine for one-micron features, utilizing a UV LED and a Digital Micromirror Device, in collaboration with Carnegie Mellon's Hacker Fab group, emphasizing open-source lithography.
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
VFusion3D is a project developing scalable 3D generative models using video diffusion, to be presented at ECCV 2024. It offers pretrained models and a Gradio application for user interaction.
Gaussian Splatting SLAM [CVPR 2024]
The "Gaussian Splatting SLAM" paper presents a real-time 3D reconstruction method using 3D Gaussians, achieving high-quality results for small and transparent objects with support from Dyson Technology Ltd.
https://waller-lab.github.io/DiffuserCam/
https://waller-lab.github.io/DiffuserCam/tutorial.html includes instructions and code to build your own.
https://ieeexplore.ieee.org/abstract/document/8747341
https://ieeexplore.ieee.org/document/7492880
The downside of this is that it relies heavily on the model to construct the image. Much like those colorisation models applied to old monochrome photos, the results will probably always look a little off, depending on the training data. I could imagine taking a photo of some weird art installation and the camera getting confused.
You can see examples of this when the model invented fabric texture on the fabric examples and converted solar panels to walls.
Interesting.
Even if this does not immediately replace traditional cameras and lenses... I am wondering if it could add a complementary set of capabilities next to a traditional camera, say in a phone's camera bump/island/cluster, so that we can drive some enhanced use cases.
Maybe store the wider context in raw format alongside the EXIF data, so that future photo-manipulation models can use that ambient data to do more realistic edits / inpainting / outpainting, etc.?
I am thinking this would benefit 3D photography and videography a lot if you can capture more of the ambient data, not strictly channeled through the lenses.
Intuitively, imagine moving your eye to every point across some square inch. Each position of the eye gives a different image. Now all those images overlap on the sensor.
If you look at the images in the paper, everything except their most macro geometry and colour palette is clearly generated -- since it changes depending on the prompt.
So at a guess, the lensless sensor gets this massive overlap of all possible photos at that location and so is able, at least, to capture minimal macro geometry and colour. This isn't going to be a useful amount of information for almost any application.
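That intuition matches the standard convolutional model used for mask-based lensless cameras such as DiffuserCam (linked above): each scene point casts a shifted copy of the mask pattern onto the sensor, and the sensor sums them all. Here is a toy NumPy/SciPy sketch of that forward model, with a random binary mask standing in for the real optic (an illustration, not the paper's actual calibration):

    import numpy as np
    from scipy.signal import fftconvolve

    rng = np.random.default_rng(0)
    scene = rng.random((128, 128))                        # stand-in scene
    psf = (rng.random((128, 128)) > 0.5).astype(float)    # random binary mask PSF
    psf /= psf.sum()                                      # normalize total transmission

    # Each scene point projects a shifted copy of the mask onto the sensor;
    # summing all of those shifted copies is a 2-D convolution.
    measurement = fftconvolve(scene, psf, mode="same")

    # The raw measurement looks nothing like the scene: reconstruction has to
    # undo this massive multiplexing, which is where the learned prior
    # (here, a diffusion model) carries most of the load.
    print(measurement.shape)    # (128, 128)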
The sensor has a zillion pixels, but each one only measures one color. For example, the pixel at index (145, 2832) might only measure green, while its neighbor at (145, 2833) only measures red. So we use models to fill in the blanks: we didn't measure redness at (145, 2832), so we guess based on the redness nearby.
This kind of guessing is exactly what modern CV is so good at. So the line of what is a camera and what isn’t is a bit blurry to begin with.
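For reference, here is what that filling-in (demosaicing) looks like in its simplest form: bilinear interpolation of the green channel from a Bayer mosaic. This assumes an RGGB pattern and is purely illustrative; real ISPs use far more elaborate, often learned, interpolation.

    import numpy as np
    from scipy.ndimage import convolve

    rng = np.random.default_rng(0)
    raw = rng.random((8, 8))   # fake raw readout, one colour sample per pixel

    # In an RGGB Bayer pattern, green photosites sit where row + col is odd.
    green_sites = (np.indices(raw.shape).sum(axis=0) % 2) == 1
    green = np.where(green_sites, raw, 0.0)

    # At each non-green site, average its four measured green neighbours.
    kernel = np.array([[0.0, 0.25, 0.0],
                       [0.25, 0.0, 0.25],
                       [0.0, 0.25, 0.0]])
    interpolated = convolve(green, kernel, mode="mirror")

    # Keep measured values where we have them, interpolated guesses elsewhere.
    green_full = np.where(green_sites, raw, interpolated)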
Would love to lose the camera bump on the back of my phone.