November 4th, 2024

Adventures in Probability

The author reflects on their limited exposure to statistics, emphasizing the importance of probability, the exponential distribution in queueing theory, and the CoDel algorithm for effective queue management.

Read original articleLink Icon
FrustrationCuriosityAppreciation
Adventures in Probability

The article discusses the author's reflections on their limited exposure to statistics and probability during school, particularly highlighting a frustrating experience with a left-handed professor. The author emphasizes the importance of understanding probability and statistics, especially in relation to managing uncertainty. They delve into the concept of the exponential distribution, which frequently appears in various fields such as queueing theory and performance modeling. The author explains how the exponential distribution is foundational for simulating events, particularly in queue management systems. They introduce the CoDel algorithm for queue management and describe how to simulate Poisson point processes (PPPs) for modeling event arrivals and completions. The author notes the unique memoryless property of the exponential distribution, which allows for certain simplifications in modeling. They express a desire to communicate the significance of these concepts more effectively, reflecting on their own learning journey and the challenges faced in grasping these statistical principles.

- The author regrets not taking more statistics classes in school.

- The exponential distribution is crucial in various fields, including queueing theory.

- The CoDel algorithm is introduced as a method for queue management.

- The memoryless property of the exponential distribution is highlighted as unique and significant.

- The author reflects on their learning process and the challenges of understanding these concepts.

Related

Everyday is a Birthday A Journey to a classic problem through Math and Rust

Everyday is a Birthday A Journey to a classic problem through Math and Rust

This article explores the classic probability problem of ensuring every day of the year is represented by a birthday in a group of people, using mathematical concepts like the Coupon Collector Problem and Inclusion-Exclusion Principle. It calculates needing around 2364 students, adjusting to 2669 with leap days, and discusses the gamma function for probability calculation.

Do not confuse a random variable with its distribution

Do not confuse a random variable with its distribution

In probability theory, a random variable and its distribution are distinct concepts. The random variable depends on random outcomes, while the distribution shows variation patterns. Changing variables or probabilities alters distributions. Examples clarify shared distributions with different outcomes, crucial for interpreting probabilities and simulations.

Visualizing Algorithms

Visualizing Algorithms

Visualization enhances understanding of algorithms, with sampling being crucial in graphics. Bridson’s Poisson-disc sampling improves distribution, while the Fisher-Yates shuffle efficiently rearranges elements, aiding education and comprehension.

How networking affects distributed systems

How networking affects distributed systems

The article outlines challenges in distributed systems, including network unreliability, inconsistency, probabilistic outcomes, and bandwidth limitations, emphasizing the need for robust design and failure management strategies.

Sampling with SQL

Sampling with SQL

Tom Moertel's blog post discusses SQL sampling algorithms, particularly the A-ES algorithm for weighted sampling without replacement, emphasizing numerical stability and performance optimizations in column-oriented storage formats.

AI: What people are saying
The comments reflect a mix of experiences and insights related to the teaching and application of statistics and queueing theory concepts.
  • Many commenters express frustration with traditional teaching methods in statistics, noting a lack of intuitive understanding.
  • There are discussions about the complexities of modeling arrival and completion rates in Poisson processes, highlighting dependencies that challenge the validity of certain models.
  • Some commenters suggest that neural networks could potentially outperform traditional algorithms by capturing subtle patterns in real-world data.
  • Several individuals share personal experiences with research or projects related to probabilistic models and queue management.
  • Overall, there is a call for more engaging and intuitive approaches to teaching these mathematical concepts.
Link Icon 10 comments
By @alexpotato - 3 months
>my professor projected video of himself writing on a piece of paper before a very large auditorium, and that guy was left-handed, and so his hand would cover his notes for like the entire time and it was impossible to see what he was writing. I only figured out that this was why it was so unpleasant like halfway through the class.

So many of my college math classes had some version of this professor who took a fascinating subject like linear algebra, statistics or algorithms and made it into a slog. The fact that most stats is taught by getting students to just memorize random ideas rather than building up a holistic and intuitive view really is a travesty.

Also makes sense why so many people, even though they took stats in college, hav e such a poor understanding of probability.

By @exmadscientist - 3 months
> I think if I were in charge of presenting this material to students I'd do it by introducing the concept of memorylessness and by showing how good memorylessness is, how many wonderful things you can do with it. And then one day I'd be like, "well, it sure would be nice if we had any distributions like that!" and then whirl around with my piece of chalk to deliver the exciting news that we do. Exactly one, in fact.

Incidentally, this also goes for the determinant of a matrix. It's got a lot of neat and desirable properties, and it turns out to be the only thing that does. When it was finally taught to me this way, those weird algorithms we use to compute this seemingly-arbitrary number finally made sense. (And, in fact, this is the easiest way to prove that all those algorithms have to be computing the same seemingly-arbitrary number. Because the algorithms preserve the properties that define The Determinant, and The Determinant is the unique thing that preserves all of those properties, so must those algorithms all be computing The Determinant, no matter how different they might look.)

So I can vouch that this style of explanation really does work, at least for people like me.

By @shiandow - 3 months
Poisson processes are neat, they always end up working nicely in ways that many other distributions/processes very much don't.

Splitting a Poisson process into two lower rate processes is a neat trick. Even better is that you can do the same to convert a Poisson process into one with a variable rate, provided that rate is lower than the original (original may be variable as well).

And the fact that the partial sums of a bunch of exponential distributions results in the same distribution of values as picking Poisson(lambda * time) values uniformly at random is pure magic.

By @vdvsvwvwvwvwv - 3 months
This was a lot of fun. By skipping over the formulaic details and proof and explaining the lay of the land, it makes a good starting point to explore further.

Either mathematically or just some python dice rolls.

Really good. Same ethos as fast.ai courses.

Finally that you can add tbe /3 and /5 to get a /8 distribution makes intuitive semsw to me. / means lambda.

This is because if you have people arriving to a train station you could split them by eye colour and there is no reason a particular eye colour cause there to be dependencies. (Assuming spherical cows: families arriving excepted. Assume it is downtown rush hour)

By @2-3-7-43-1807 - 3 months
I doubt that this model of a queue and the processing of its items by overlaying two independent poisson processes is statistically valid (one for items arriving, the other for processing those items). The processing starts only after the respective item arrived in the queue - so it's not independent - and this needs to be modeled accordingly or it requires a proof that this is equivalent to the suggested overlaying approach - that wouldn't be obvious or trivial.
By @asah - 3 months
I can't help but wonder if real systems have additional (perhaps subtle) signals, which can be provided to a neural network, which then outperforms these simple algorithms.

For example, customers arrive at the grocery store in clusters due to traffic lights, schools getting out, etc. Even without direct signals, a NN could potentially pickup on these "rules" given other inputs, e.g. time of day, weather, etc.

?

By @sriram_malhar - 3 months
You can't combine rate of arriving with rate of leaving, can you? Leaving is dependent on arriving, so the latter distribution is dependent on the former.
By @sidcool - 3 months
How does one know the application of such a Math concept for a particular software problem? I couldn't guess in a million years.
By @manvillej - 3 months
hey this might be a stupid detail in implementation, but if the poisson simulation of arrival and completion are started cold, then random assignment of events could assign a completion before an arrival, or there could be more completions than arrivals? Arrivals should always be greater or equal to completions?

Edit: I remembered by stats class. merging poisson distributions requires Independent processes. since start and completion are dependent, its not mathematically valid.

but I am an engineer and it works, so whatever

By @dariosalvi78 - 3 months
my Msc thesis in 2004 was about this: a probabilistic model for last-recently used queues, based on poisson processes, for network packets flows. The model worked OK on a couple of datasets I could get at that time. I tried to publish the work but got rejected a couple of times, then I gave up. If anyone wants to read it (even try to publish) I am happy to share it.