Median filtering: naive algorithm, histogram-based, binary tree, and more (2022)
The blog post explains median filtering in image analysis, covering techniques like percentile filters. It explores algorithms for computing median filters and compares their efficiency based on kernel size and complexity.
Read original article

The blog post discusses median filtering in image analysis. It explains how the median filter works by replacing each pixel with the median value of its neighborhood in the input image, and it covers related percentile and rank filters and their applications in denoising. Several algorithms for computing the median filter are explored, including a naive algorithm, a histogram-based algorithm, and a binary tree algorithm, and their efficiency is compared as a function of kernel size and complexity. Additionally, it introduces a constant-time algorithm proposed by Perreault and Hébert for 8-bit images and square kernels. The post concludes with timing comparisons of the different implementations, showcasing the computational efficiency of each method. Overall, the blog provides a comprehensive overview of median filtering techniques and their computational implications in image processing.
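The histogram-based algorithm mentioned above (Huang's classic sliding-histogram approach) can be sketched as follows. This is an illustrative version, not the post's own code; the function names `median_filter_histogram` and `_median_from_hist` are invented here, and the sketch assumes an 8-bit grayscale image:

```python
import numpy as np

def median_filter_histogram(img, k):
    """Sliding-histogram median filter for 8-bit images.

    img: 2D uint8 array; k: odd kernel side length.
    As the kernel slides one pixel to the right, the 256-bin
    histogram is updated by subtracting the leaving column and
    adding the entering one, instead of being rebuilt from scratch.
    """
    r = k // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    out = np.empty_like(img)
    half = (k * k) // 2  # 0-indexed rank of the median element

    for y in range(h):
        # Build the histogram for the first window of this row.
        hist = np.zeros(256, dtype=np.int32)
        for v in padded[y:y + k, 0:k].ravel():
            hist[v] += 1
        out[y, 0] = _median_from_hist(hist, half)
        for x in range(1, w):
            # Slide right: remove the leaving column, add the entering one.
            for v in padded[y:y + k, x - 1]:
                hist[v] -= 1
            for v in padded[y:y + k, x + k - 1]:
                hist[v] += 1
            out[y, x] = _median_from_hist(hist, half)
    return out

def _median_from_hist(hist, half):
    # Walk the histogram until the cumulative count passes the median rank.
    acc = 0
    for value in range(256):
        acc += hist[value]
        if acc > half:
            return value
    return 255
```

Each horizontal step costs O(k) histogram updates plus an O(256) scan, versus O(k² log k) for the naive sort-the-window approach.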
Related
Beating NumPy's matrix multiplication in 150 lines of C code
Aman Salykov's blog delves into high-performance matrix multiplication in C, surpassing NumPy with OpenBLAS on AMD Ryzen 7700 CPU. Scalable, portable code with OpenMP, targeting Intel Core and AMD Zen CPUs. Discusses BLAS, CPU performance limits, and hints at GPU optimization.
Beating NumPy's matrix multiplication in 150 lines of C code
Aman Salykov's blog explores high-performance matrix multiplication in C, surpassing NumPy with OpenBLAS on AMD Ryzen 7700 CPU. Scalable, portable code optimized for modern CPUs with FMA3 and AVX instructions, parallelized with OpenMP for scalability and performance. Discusses matrix multiplication's significance in neural networks, BLAS libraries' role, CPU performance limits, and optimizing implementations without low-level assembly. Mentions fast matrix multiplication tutorials and upcoming GPU optimization post.
Beating NumPy matrix multiplication in 150 lines of C
Aman Salykov's blog explores high-performance matrix multiplication in C, surpassing NumPy with OpenBLAS on AMD Ryzen 7700 CPU. Scalable, portable code optimized for modern CPUs with OpenMP directives for parallelization. Discusses BLAS libraries, CPU performance limits, and matrix multiplication optimization.
C++ Design Patterns for Low-Latency Applications
The article delves into C++ design patterns for low-latency applications, emphasizing optimizations for high-frequency trading. Techniques include cache prewarming, constexpr usage, loop unrolling, and hotpath/coldpath separation. It also covers comparisons, datatypes, lock-free programming, and memory access optimizations. Importance of code optimization is underscored.
Memory and ILP handling in 2D convolutions
A 2D convolution operation extracts image features using filters, converting signals to tensors. Cross-correlation is used for symmetric signals. Memory optimization and SIMD instructions enhance efficiency in processing MNIST images.
Never saw the binary tree approach. And this article, written in the summer of 2022, missed the 2D wavelet approach published later that year.
https://cgenglab.github.io/en/publication/sigga22_wmatrix_me....
There is also no way to split up the median computation
What does this mean here? It seems like we could have a rolling window by adding and subtracting pixels along the way. I've coded this before, although it's not O(1) like the algorithm described at the end.
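The rolling-window idea the comment describes, pushed one level further, is essentially what the constant-time Perreault–Hébert algorithm does: keep a 256-bin histogram per image column and update the kernel histogram by subtracting the leaving column's histogram and adding the entering one, so the per-pixel cost no longer grows with k. A rough, unoptimized sketch under the assumption of an 8-bit input (the function name is invented here):

```python
import numpy as np

def median_filter_const_time(img, k):
    """Sketch of the column-histogram (Perreault–Hébert style) idea
    for 8-bit images. Per pixel: two column-histogram updates plus a
    fixed 256-bin median scan, independent of kernel size k.
    """
    r = k // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    pw = padded.shape[1]
    half = (k * k) // 2  # 0-indexed rank of the median element
    out = np.empty_like(img)

    # One histogram per padded column, covering the first k rows.
    col_hist = np.zeros((pw, 256), dtype=np.int32)
    for x in range(pw):
        for v in padded[0:k, x]:
            col_hist[x, v] += 1

    for y in range(h):
        if y > 0:
            # Slide every column histogram down by one row.
            for x in range(pw):
                col_hist[x, padded[y - 1, x]] -= 1
                col_hist[x, padded[y + k - 1, x]] += 1
        # Kernel histogram for the first window of this row.
        hist = col_hist[0:k].sum(axis=0)
        out[y, 0] = np.searchsorted(np.cumsum(hist), half + 1)
        for x in range(1, w):
            # O(1) in k: swap one whole column histogram for another.
            hist -= col_hist[x - 1]
            hist += col_hist[x + k - 1]
            out[y, x] = np.searchsorted(np.cumsum(hist), half + 1)
    return out
```

The 256-bin scan is a constant factor, which is why the original paper restricts itself to 8-bit images; higher bit depths would need tiered histograms.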