August 28th, 2024

What is the longest known sequence that repeats in Pi? (homelab)

The article explores repeating sequences in Pi's decimal representation, detailing the transition from Python to C for efficiency, successful identification of sequences, and plans for further extensive searches.

Read original articleLink Icon
What is the longest known sequence that repeats in Pi? (homelab)

The article discusses the exploration of repeating sequences in the decimal representation of Pi, focusing on computational number theory. It highlights the discovery of various repeating sequences, such as single digits and longer sequences, and references the Online Encyclopedia of Integer Sequences (OEIS) for known sequences. The author describes the challenges of searching for these sequences, particularly as the length of the sequences increases, necessitating the use of extensive computational resources. A Python algorithm is initially employed to identify these sequences, but due to memory limitations, the author transitions to a C implementation for better efficiency. The C program utilizes a hash table to store candidate numbers and their positions in Pi, allowing for faster searches. The author successfully finds several sequences, including the 19th and 20th entries, and discusses the need for further optimization and memory management. The search for the 21st entry is also initiated, with plans to explore up to 200 billion digits of Pi. The article emphasizes the complexity and resource demands of computational searches in number theory, particularly when dealing with large datasets like the digits of Pi.

- The article explores repeating sequences in the decimal representation of Pi.

- Initial searches were conducted using Python, later optimized with a C implementation.

- A hash table structure is used to efficiently store and search for candidate numbers.

- The author successfully identifies several repeating sequences, including the 19th and 20th entries.

- Future searches aim to find longer sequences, requiring extensive computational resources.

Link Icon 14 comments
By @0points - 5 months
On a side note (it's mentioned in the article), it has been a while since I last visited OEIS (Online Encyclopedia of Integer Sequences), which has been of use over the years.

Very much appreciate the amazing effort that is OEIS!

I thought I should mention it to raise awareness: https://oeis.org/

The sequence in the article: https://oeis.org/A197123

By @kstrauser - 5 months
I wonder if you could do something with Bloom filters similar to:

  filters = {}
  candidates = []
  for i in range(len(pi)):
      block = pi[i:i+10]
      hash_prefix = block[:4]
      try:
          bloom = filters[hash_prefix]
      except KeyError:
          filters[hash_prefix] = BloomFilter().add(block)
          candidates.append((i, block))
          continue
      if block in bloom:
          candidates.append((i, block))
      else:
          bloom.add(block)
Basically, make one pass through to build a list of statistically likely candidates to evaluate later. Then use full string matching on those candidates to find the real matching sequences.

That might be helpful here because Bloom filters can say "we might have seen this string before" or "we definitely have not seen it before". That may greatly reduce the amount of storage you need.

By @nobrains - 5 months
OK cool. But...

Finding random numbers repeating is a simple brute force problem. It is cool, but slightly boring.

Related to this, what I cannot wrap my head across, is the Infinite monkey theorem. Is it not possible that we keep expanding the numbers and never reach a complex enough set of values?

By @amelius - 5 months
The binary representation of Pi contains both the MIT license and the GPL. I'm confused.
By @lifthrasiir - 5 months
Extending that to all known digits would be a fun technical challenge. I once wrote a code to calculate positions where k subsequent decimal digits equal to the position index itself [1], but I had no storage to spare so I instead used a stream of digits from the public GCP dataset [2]. You would eventually want to try this approach if you are like me (i.e. don't want to pay for two-digit-TB storage).

[1] https://github.com/lifthrasiir/remote-pi-reader/

[2] https://storage.googleapis.com/pi100t/index.html (used to be `pi50t` back then)

By @qingcharles - 5 months
By @mihaic - 5 months
This can easily be optimized to use less memory, since pi is thankfully random.

Do N passes over the data, and at each pass only put in your hashmap the values that mod N equal the current iteration. If you take N~100, the runtime would probably be in the same ballpark, since the only thing that increase is streaming all the data N times. With a fast SSD, that's not that much.

By @amingilani - 5 months
Is there a research paper is this?

Seriously, I find the most interesting works self-published in people’s personal blogs and I often wonder if there’s a paper in there and why it didn’t make it into a paper.

Maybe there’s a paper in finding the largest repeating sequence in Pi. There have certainly been more niche papers than that.

By @34679 - 5 months
Assuming that pi has an infinite number of digits, then the answer to "What is the longest known sequence that repeats in Pi?" is "The longest known sequence of Pi".
By @csense - 5 months
Breaking news: As of this writing, OP is 23 hours old. OEIS has already been updated with the new terms OP discovered.
By @cheeze - 5 months
Can you do something fancy with FFTs and convolution here?

I _think_ you can but my brain doesn't have the bandwidth to think past that.

By @ZiiS - 5 months
Isn't the entire known sequence of pi also known to repeat due to its nature?
By @cperciva - 5 months
The right algorithm for this problem starts with a linear time suffix sort.
By @Beijinger - 5 months