June 30th, 2024

Convolutions, Fast Fourier Transform and Polynomials

Alvaro Revuelta explains efficient polynomial multiplication using convolutions, the Fast Fourier Transform (FFT), and Python. The FFT reduces the complexity from $O(n^2)$ to $O(n \log n)$, a significant efficiency gain for high-degree polynomials.

Read original article

Alvaro Revuelta explains how to multiply polynomials efficiently using convolutions and the Fast Fourier Transform (FFT). The traditional method taught in high school has a complexity of $O(n^2)$, but by leveraging the FFT the complexity can be reduced to $O(n \log n)$. Polynomials are represented as coefficient vectors, so operations like addition and subtraction are straightforward; multiplication, however, amounts to a convolution of the coefficient vectors, and this is where the FFT comes into play. The FFT converts a time-domain signal into the frequency domain, where convolution becomes simple element-wise multiplication. By converting both polynomials to the frequency domain, multiplying element-wise, and converting back to the time domain, the overall complexity drops to $O(n \log n)$. Alvaro provides Python code snippets showing the difference in efficiency between the traditional method and the FFT approach, which becomes especially evident for high-degree polynomials.
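
The core of the approach fits in a few lines of numpy (a sketch with illustrative names, not the article's exact code):

```python
import numpy as np

def multiply_fft(p, q):
    """Multiply two polynomials given as coefficient vectors
    (lowest degree first) via the convolution theorem."""
    n = len(p) + len(q) - 1            # length of the product's coefficient vector
    # Evaluate at roots of unity (FFT), multiply pointwise in the
    # frequency domain, then interpolate back (inverse FFT).
    return np.fft.ifft(np.fft.fft(p, n) * np.fft.fft(q, n)).real

# (1 + 2x) * (3 + 4x) = 3 + 10x + 8x^2
print(np.round(multiply_fft([1, 2], [3, 4])))  # [ 3. 10.  8.]
```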

Related

How does a computer/calculator compute logarithms?

How does a computer/calculator compute logarithms?

Computers and calculators compute logarithms using geometric series and polynomial equations from calculus. The natural logarithm's relation to $\frac{1}{1+x}$ is explained through areas under the curve. Polynomial series derived from integrating the geometric series compute natural logarithms, but converge quickly only for $|x| < \frac{1}{2}$.
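
The series in question is $\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots$; a minimal sketch of it in Python:

```python
import math

def ln1p_series(x, terms=60):
    """ln(1+x) from the integrated geometric series; converges for
    |x| < 1, but quickly only when |x| is small."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))

print(ln1p_series(0.25), math.log(1.25))  # both ~0.22314
```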

Why do we still teach people to calculate?

Why do we still teach people to calculate?

Conrad Wolfram promotes modernizing math education by integrating computers, problem-solving emphasis, and adapting to technological advancements. His innovative approach challenges traditional methods to better prepare students for real-world challenges.

A brief introduction to interval arithmetic

A brief introduction to interval arithmetic

Interval arithmetic deals with uncertainties by using intervals instead of exact numbers. It offers guaranteed bounds in calculations but faces challenges like added complexity and overestimation. Despite these challenges, it continues to evolve and finds applications in diverse fields.
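
A toy sketch of the overestimation problem (no outward rounding here, just the classic dependency issue):

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Subtraction pairs each endpoint with the other's far endpoint.
        return Interval(self.lo - other.hi, self.hi - other.lo)

x = Interval(1.0, 2.0)
# The arithmetic can't know both operands are the same variable,
# so x - x widens to [-1, 1] instead of collapsing to [0, 0].
print(x - x)  # Interval(lo=-1.0, hi=1.0)
```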

Show HN: UNet diffusion model in pure CUDA

Show HN: UNet diffusion model in pure CUDA

The GitHub content details optimizing a UNet diffusion model in C++/CUDA to match PyTorch's performance. It covers custom convolution kernels, forward pass improvements, backward pass challenges, and future optimization plans.

A simplified Python simulation of diffusion

A simplified Python simulation of diffusion

Physicist Richard Feynman's teaching style is explored through Python simulations of diffusion. The tutorial simplifies physics concepts, emphasizing coding and optimization with Python modules like turtle, NumPy, and Matplotlib.
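
The heart of such a simulation is only a few NumPy lines, e.g. checking that random walkers spread like $\sqrt{t}$ (a sketch of the idea, not the tutorial's code):

```python
import numpy as np

rng = np.random.default_rng(0)
steps = rng.choice([-1, 1], size=(1000, 500))  # 1000 walkers, 500 unit steps
paths = steps.cumsum(axis=1)
rms = np.sqrt((paths ** 2).mean(axis=0))       # RMS displacement over time
# Diffusive scaling: after t steps the RMS displacement is ~sqrt(t).
print(rms[99], np.sqrt(100))  # both roughly 10
```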

17 comments
By @programjames - 4 months
Something that always bothers me about these explanations is they usually forget about numerical errors. You can't just abstract away multiplying coefficients as "constant time". You may as well abstract away the entire multiplication to begin with! If you take into account numerical precision, it's closer to O(n (log n)^3) [1].

[1]: http://numbers.computation.free.fr/Constants/Algorithms/fft....
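
A quick sketch of the point: with 40-bit coefficients the exact product coefficients need around 90 bits, well beyond the 53-bit double mantissa, so rounding the floating-point FFT output no longer recovers them:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1024
a = [int(v) for v in rng.integers(0, 1 << 40, n)]
b = [int(v) for v in rng.integers(0, 1 << 40, n)]

def convolve_exact(p, q):
    """Schoolbook convolution with Python's exact big integers."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

exact = convolve_exact(a, b)
prod = np.fft.ifft(np.fft.fft(a, 2 * n) * np.fft.fft(b, 2 * n)).real
approx = [int(round(c)) for c in prod[: 2 * n - 1]]

# The worst coefficient is off by far more than 1, so rounding fails.
print(max(abs(e - x) for e, x in zip(exact, approx)))
```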

By @nickcw - 4 months
You can use this method to multiply long numbers together. The key insight is that polynomial multiplication is the same as ordinary long multiplication without the carrying step.

So suppose you have a 1000-digit number. You take each digit and use it as a coefficient of a 1000-element polynomial. You can then multiply these polynomials together using the FFT method described in the article. To convert the result back to a number you need to do the carries: if an element is 10 or more, you carry the excess to the next digit. The resulting coefficients are then the digits of the product.

That's the basic idea. There is some subtlety I've glossed over about the precision needed for the carries and to be sure that rounding the FFT result to the nearest integer is correct. That is the way big-number multiplication is done in GMP, which is the leading library for this sort of thing.
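
A minimal sketch of the scheme (digits as coefficients, FFT convolution, then a carry pass; the precision subtleties above are ignored, so this only works for modest sizes):

```python
import numpy as np

def bigmul(x: int, y: int) -> int:
    """Multiply integers by convolving their digit vectors with the FFT,
    then propagating carries."""
    a = [int(d) for d in str(x)][::-1]   # least significant digit first
    b = [int(d) for d in str(y)][::-1]
    n = len(a) + len(b) - 1
    coeffs = np.fft.ifft(np.fft.fft(a, n) * np.fft.fft(b, n)).real
    digits = [int(round(c)) for c in coeffs]
    carry, out = 0, []
    for d in digits:                     # carry pass: push excess upward
        carry, r = divmod(d + carry, 10)
        out.append(r)
    while carry:
        carry, r = divmod(carry, 10)
        out.append(r)
    return int("".join(map(str, reversed(out))))

print(bigmul(123456789, 987654321) == 123456789 * 987654321)  # True
```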

By @tverbeure - 4 months
If you haven’t seen it yet, you should watch this video: https://youtu.be/h7apO7q16V0?si=bmgUEMTQSqU3flIv

It derives the FFT algorithm from polynomial multiplication and it’s fantastic.

I rewatch it every 6 months or so.

By @rjeli - 4 months
Note that the FFT has this property of “convolution is pointwise multiplication” for any cyclic multiplicative group, see https://www.sciencedirect.com/science/article/pii/S002200007... for a more algebraic derivation.

Some call this a “harmonic” fft, and there are also non-harmonic FFTs:

- the “additive NTT” of [LCH14] on GF(2^n)

- the circle fft on the unit circle X^2+Y^2=1 of a finite field [HLP24]

- the ecfft on a sequence of elliptic curve isogenies [BCKL21]

[LCH14]: https://arxiv.org/abs/1404.3458

[HLP24]: https://eprint.iacr.org/2024/278

[BCKL21]: https://arxiv.org/pdf/2107.08473

By @sigil - 4 months
Who was the first person to propose FFTs for faster polynomial multiplication?

Got curious about this recently. I’m not great at citation tracing, but did make it back to this 1995 paper by David Eppstein [0] where he uses it to efficiently solve Subset Sum after an incremental update. Surely Knuth’s TAOCP had it even earlier?

The fact that FFT polynomial multiplication also lets you solve Exact Subset Sum with Repetition in sub-exponential time came as a real shock to me. [1] Crucially, this algo is O(N log N) where N = the maximum element, not N = the set size, so it isn’t a P ≠ NP counterexample or anything.

[0] https://escholarship.org/content/qt6sd695gn/qt6sd695gn.pdf

[1] https://x.com/festivitymn/status/1788362552998580473?s=46&t=...
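
To make [1] concrete: each element contributes a factor (1 + x^a), each product is one FFT convolution, and reachable sums appear as nonzero coefficients. A sketch assuming small positive integers no larger than the limit (not Eppstein's actual algorithm):

```python
import numpy as np

def attainable_sums(elements, limit):
    """Which subset sums <= limit are reachable (each element used
    at most once)? Sum s is reachable iff the coefficient of x^s in
    the product of (1 + x^a) over all elements a is nonzero."""
    poly = np.zeros(limit + 1)
    poly[0] = 1.0
    n = 2 * (limit + 1)                  # room for the linear convolution
    for a in elements:
        term = np.zeros(limit + 1)
        term[0], term[a] = 1.0, 1.0
        conv = np.fft.ifft(np.fft.fft(poly, n) * np.fft.fft(term, n)).real
        poly = (conv[: limit + 1] > 0.5).astype(float)  # keep 0/1 flags
    return [s for s in range(limit + 1) if poly[s] > 0.5]

print(attainable_sums([3, 5, 7], 15))  # [0, 3, 5, 7, 8, 10, 12, 15]
```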

By @adamnemecek - 4 months
I think that all machine learning is solving a convolutional equation.

This paper talks about it in the context of RL https://arxiv.org/abs/1712.06115 but most approaches fit within that paradigm.

By @beryilma - 4 months
Interesting. I've just implemented an algorithm (matrix profile) that makes use of FFT to compute a big set of dot products of time series subsequences, where the length n of the time series can be in the 100s of millions. The fast convolution computation using FFT reduces the computation from O(n·m) for the naive loop to O(n log n), with awesome speed gains at this scale. Throw in a GPU and the speed goes up even further, like processing 10 million data points in 0.1 seconds on a laptop.
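
The FFT trick there is the sliding dot product: correlate the query against the whole series with one FFT pair instead of looping over windows (a sketch, not the actual matrix profile code):

```python
import numpy as np

def sliding_dot_products(query, series):
    """Dot products of `query` against every window of `series` in
    O(n log n), versus O(n*m) for the naive loop."""
    m, n = len(query), len(series)
    # Correlation is convolution with the reversed query.
    conv = np.fft.ifft(np.fft.fft(series, 2 * n) *
                       np.fft.fft(query[::-1], 2 * n)).real
    return conv[m - 1 : n]               # one value per full window

q = np.array([1.0, 2.0])
t = np.array([1.0, 0.0, 2.0, 3.0])
print(sliding_dot_products(q, t))        # [1. 4. 8.]
```
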
By @eranation - 4 months
The key “trick” of the operation seems to be this revelation:

> In other words, performing the convolution of two signals in time domain is equivalent to multiplying them in the frequency domain.

Great article, as it breaks down a complex idea into much smaller steps that even my math challenged mind can somehow grasp, but did I miss a step? Or is it left as an exercise to the reader to look up? I was already stretching my math ability to that point, but it felt a little bit like - “and then, draw the rest of the F-ing owl” to me. Is it just me?

Great writing otherwise.
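
The step taken on faith is the convolution theorem itself. It can at least be verified numerically, zero-padding so the circular convolution matches the linear one:

```python
import numpy as np

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])
n = len(a) + len(b) - 1                  # full (linear) convolution length

time_domain = np.convolve(a, b)          # convolve in the time domain
freq_domain = np.fft.ifft(np.fft.fft(a, n) * np.fft.fft(b, n)).real

print(np.allclose(time_domain, freq_domain))  # True
```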

By @keepamovin - 4 months
So integer factoring is a discrete deconvolution? I wonder if juxtaposing the FFT representation (inverse pointwise multiplication) and the tableaux (regular long multiplication / carry add) could break the symmetry and get enough information for a fast algorithm?

By @joe_the_user - 4 months
Sure, the naive method of multiplying polynomials is slow in the degree of the polynomial. But when does one have to deal with two degree 100 polynomials?

My impression is that this sort of method isn't used by computer algebra systems for this reason.

By @froh - 4 months
I'd like the article even more if it used numpy.convolve in the benchmark.

Comparing pure Python against numpy's FFT doesn't feel right to me.

The result will be similar, sure, but the effect might then only show for much larger inputs.
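
Something like this would level the playing field (a sketch; the crossover size depends on the machine):

```python
import numpy as np
import timeit

p, q = np.random.rand(4096), np.random.rand(4096)
n = len(p) + len(q) - 1

# Same O(n^2) algorithm as a naive Python loop, but vectorized in C.
t_conv = timeit.timeit(lambda: np.convolve(p, q), number=10)
# O(n log n) frequency-domain multiplication.
t_fft = timeit.timeit(
    lambda: np.fft.ifft(np.fft.fft(p, n) * np.fft.fft(q, n)).real,
    number=10)

print(f"np.convolve: {t_conv:.4f}s   FFT: {t_fft:.4f}s")
```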

By @bobmcnamara - 4 months
Another FFT trick applies well here: the middle vector multiplication can be folded into the last stage of the FFT and the first stage of the IFFT. This saves writing those two stages out to memory.

By @bjornsing - 4 months
Funny feeling: looking at the title I felt a bit puzzled. Then, scrolling through the article, the concepts and their (well-known) connections came back to me.

By @sva_ - 4 months
Good article, but I think multiply_naive should've used numpy instead for a fairer benchmark comparison.