Show HN: I created a Neural Network from scratch, in scratch
The article discusses implementing a 1-layer feed-forward network in Scratch for image classification. The author faced challenges handling multi-dimensional data, but achieved promising results training on a limited subset of the MNIST dataset.
This article describes the author's experience implementing a 1-layer feed-forward network from scratch to classify images from the MNIST dataset. The model is built and trained in the Scratch programming language, which is limited in handling multi-dimensional data and lacks functions and variable scope. The implementation covers initializing weights with the Xavier initialization method, matrix multiplication, the ReLU activation function, Softmax for producing a probability distribution, and the cross-entropy loss calculation. The backward pass for gradient descent computes gradients for the trainable parameters using the chain rule. The author ran into incorrect assumptions about memory order, which forced a rewrite of parts of the code. Despite these setbacks, the model showed promising results when tested on dummy data points. Due to Scratch's limitations, only a subset of the MNIST dataset was used for training. The article highlights the complexity of implementing deep learning concepts in a constrained programming environment like Scratch.
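As a rough illustration of the pipeline the article describes (Xavier initialization, matrix multiplication, ReLU, Softmax, cross-entropy, and a chain-rule backward pass), here is a minimal NumPy sketch of a feed-forward classifier with one hidden layer. The hidden-layer size, learning rate, and variable names are illustrative assumptions, not details taken from the author's Scratch project.

# Minimal NumPy sketch of the classifier described above.
# Hyperparameters and names are illustrative assumptions, not the
# author's Scratch implementation.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 784, 64, 10   # MNIST: 28*28 pixels, 10 digit classes

# Xavier (Glorot) initialization: variance scaled by fan-in and fan-out.
W1 = rng.normal(0, np.sqrt(2.0 / (n_in + n_hidden)), (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, np.sqrt(2.0 / (n_hidden + n_out)), (n_hidden, n_out))
b2 = np.zeros(n_out)

def forward(x):
    """Forward pass: matmul -> ReLU -> matmul -> Softmax."""
    z1 = x @ W1 + b1
    h = np.maximum(z1, 0.0)               # ReLU
    z2 = h @ W2 + b2
    z2 -= z2.max(axis=1, keepdims=True)   # stabilize Softmax
    p = np.exp(z2) / np.exp(z2).sum(axis=1, keepdims=True)
    return z1, h, p

def train_step(x, y, lr=0.1):
    """One gradient-descent step on a mini-batch (y holds integer labels)."""
    global W1, b1, W2, b2
    n = x.shape[0]
    z1, h, p = forward(x)

    # Cross-entropy loss averaged over the batch.
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()

    # Backward pass via the chain rule.
    dz2 = p.copy()
    dz2[np.arange(n), y] -= 1.0            # gradient of Softmax + cross-entropy
    dz2 /= n
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dh = dz2 @ W2.T
    dz1 = dh * (z1 > 0)                    # ReLU gradient
    dW1 = x.T @ dz1
    db1 = dz1.sum(axis=0)

    # Gradient-descent update of the trainable parameters.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return loss

Training would then loop train_step over mini-batches drawn from the (subset of the) MNIST images; in the article, every one of these array operations had to be expressed with Scratch's flat lists and blocks instead of array primitives.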
Related
ML from Scratch, Part 3: Backpropagation (2019)
The article explains backpropagation in neural networks, detailing the equations, matrix operations, and activation functions involved. It emphasizes linear algebra and calculus, model fitting, parameter optimization, and binary cross-entropy for minimizing loss, stressing the iterative calculation of gradients and deltas.
Show HN: UNet diffusion model in pure CUDA
The GitHub repository details optimizing a UNet diffusion model in C++/CUDA to match PyTorch's performance. It covers custom convolution kernels, forward pass improvements, backward pass challenges, and future optimization plans.
An Analog Network of Resistors Promises Machine Learning Without a Processor
Researchers at the University of Pennsylvania created an analog resistor network for machine learning, offering energy efficiency and enhanced computational capabilities. The network, supervised by an Arduino Due, shows promise across diverse tasks.
My Python code is a neural network
The article explores using neural networks to identify program code in engineering messages. It discusses manual rules and a Python classifier, and suggests a recurrent neural network for automated detection.
Transformer Layers as Painters
The study "Transformer Layers as Painters" by Qi Sun et al. delves into transformer models, showcasing layer impact variations and potential for model optimization through strategic layer adjustments.