July 25th, 2024

Memory Mapping an FPGA from an STM32

The article details integrating an FPGA with an STM32 microcontroller using a memory-mapped interface, emphasizing simplicity, security, and performance for embedded projects through a flexible architecture and efficient data handling.


The article discusses integrating an FPGA with an STM32 microcontroller (MCU) in embedded projects through a memory-mapped interface. The author prefers this two-chip solution over SoC FPGAs because it is simpler to program, offers sufficient on-chip memory, and makes it possible to enforce security boundaries between the FPGA and the MCU. The Flexible Memory Controller (FMC) serves as the bridge between the STM32's AXI interface and the FPGA's internal interconnect, and it supports a range of memory types and configurations.

The design includes a test board featuring an STM32H735 MCU and a Xilinx Spartan-7 FPGA, connected via FMC, OCTOSPI, and RMII interfaces. The FPGA design incorporates a tri-speed Ethernet MAC, GPIO ports, and health-monitoring blocks, with a 32-bit APB interconnect for control. The FMC bridge converts transactions between the STM32 and the FPGA interconnect and manages their latency. The author demonstrates the interface's performance with a benchmark application that shows how efficiently the system moves data. Overall, the architecture aims to keep the design straightforward while leaving flexibility in the choice of components, making it suitable for a variety of embedded applications.
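As a rough illustration of what "memory-mapped" means on the MCU side, here is a minimal C sketch of register access through the FMC window. It assumes the FPGA sits at the start of FMC bank 1 (0x60000000 on STM32H7) and that the region is configured through the MPU as Device memory so the Cortex-M7 does not cache or merge the accesses. The register offsets (`FPGA_REG_GPIO_OUT` and so on) and the helper names are hypothetical, not the article's actual register map.

```c
#include <stdint.h>

/* Assumption: FPGA window at the start of FMC bank 1 (0x60000000 on STM32H7).
   Check your memory map, FMC timing setup, and MPU attributes. */
#define FPGA_FMC_BASE 0x60000000u

/* Hypothetical register map (byte offsets) for peripherals behind the
   FPGA's APB interconnect; not the article's real layout. */
#define FPGA_REG_GPIO_OUT  0x0000u
#define FPGA_REG_GPIO_IN   0x0004u
#define FPGA_REG_MAC_CTRL  0x1000u

typedef struct {
    volatile uint32_t *base; /* start of the memory-mapped FPGA window */
} fpga_dev;

/* Bind a handle to a base address; on target, pass FPGA_FMC_BASE. */
static inline fpga_dev fpga_open(uintptr_t base_addr) {
    fpga_dev dev = { (volatile uint32_t *)base_addr };
    return dev;
}

/* 32-bit store to a register. volatile stops the compiler from caching,
   merging, or eliding the access; the FMC then issues however many bus
   cycles its configured data width requires. */
static inline void fpga_write32(fpga_dev dev, uint32_t byte_off, uint32_t value) {
    dev.base[byte_off / 4u] = value;
}

/* 32-bit load from a register. */
static inline uint32_t fpga_read32(fpga_dev dev, uint32_t byte_off) {
    return dev.base[byte_off / 4u];
}

/* Read-modify-write: replace the bits selected by mask with those of value. */
static inline void fpga_modify32(fpga_dev dev, uint32_t byte_off,
                                 uint32_t mask, uint32_t value) {
    uint32_t v = fpga_read32(dev, byte_off);
    v = (v & ~mask) | (value & mask);
    fpga_write32(dev, byte_off, v);
}
```

On target you would call `fpga_open(FPGA_FMC_BASE)`; on a host, passing the address of an ordinary array lets the same helpers be exercised without hardware. Keeping the base address in a handle rather than hard-coding it is what makes that substitution possible.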

Debugging hardware is hard

Debugging hardware can be complex. A case study of communication problems between an STM32 MCU and an ESP32 Wi-Fi chip in the Pickup device revealed an unexpected glitch in the STM32's auto-calibration feature that affected UART communication. Disabling the feature resolved the issue, underscoring the need for thorough analysis of both hardware and software.

Hardware FPGA DPS-8M Mainframe and FNP Project

A new project led by Dean S. Anderson aims to implement the DPS‑8/M mainframe architecture on FPGAs in order to run the Multics OS. Progress so far includes implementing the FNP component and gradually transitioning the software. Development updates are ongoing.

C++ Design Patterns for Low-Latency Applications

The article delves into C++ design patterns for low-latency applications, with an emphasis on optimizations for high-frequency trading. Techniques include cache prewarming, constexpr usage, loop unrolling, and hotpath/coldpath separation. It also covers comparisons, data types, lock-free programming, and memory-access optimizations, and it underscores the importance of code optimization.
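Hotpath/coldpath separation carries over to C as well. A minimal sketch, assuming GCC or Clang (`__builtin_expect` and `__attribute__((cold, noinline))` are compiler extensions, not standard C) and a hypothetical price-summing workload that is not taken from the article:

```c
#include <stdint.h>
#include <stdio.h>

/* Branch hints (GCC/Clang extension): tell the compiler which way a
   condition usually goes so the common path is laid out as fall-through. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Cold path: rare error reporting kept out of line, so the hot loop stays
   compact in the instruction cache. */
__attribute__((cold, noinline))
static void report_bad_price(size_t index, int64_t price) {
    fprintf(stderr, "rejected price %lld at index %zu\n", (long long)price, index);
}

/* Hot path: sum positive prices; the rare invalid entry is handed off
   to the cold function and counted. */
static int64_t sum_valid_prices(const int64_t *prices, size_t n, size_t *rejected) {
    int64_t total = 0;
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        if (UNLIKELY(prices[i] <= 0)) {
            report_bad_price(i, prices[i]);
            ++bad;
            continue;
        }
        total += prices[i];
    }
    *rejected = bad;
    return total;
}
```

The `cold` attribute lets the compiler optimize the error function for size and move it away from hot code, while `noinline` keeps its body (and the `fprintf` call setup) out of the loop.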

Inside an IBM/Motorola mainframe controller chip from 1981

The IBM 3274 Control Unit chip from 1981, the SC81150R, was examined, revealing a collaboration between IBM and Motorola. It featured a 16x16 memory block, PLAs, and a 16-bit bus. The chip specialized in data handling, with no ROM or microcode, and had an unusual memory-buffer design. The analysis highlighted the complexity of vintage mainframe technology.

Don't snipe me in space-intentional flash corruption for STM32 microcontrollers

The MOVE student club at the Technical University of Munich is developing a reliable Rust-based bootloader for STM32 microcontrollers used in satellites. Rigorous testing, including intentional flash corruption, checks the bootloader's resilience to flash errors and helps ensure mission continuity.

7 comments

By @dmitrygr - 8 months

Be veeeery careful. The STM32H QSPI peripheral is FULL OF very nasty bugs, especially the second version (supports writes) that you find in STM32H0B chips. You are currently avoiding them by having QSPI mapped as device memory, but the minute you attempt to use it with cache or run code from it, or (god help you) put your stack, heap, and/or vector table on a QSPI device, you are in for a world of poorly-debuggable 1:1,000,000 failures. STM knows but refuses to publicly acknowledge it, even if they privately admit some other customers have "hit similar issues". Issues I've found, demonstrated to them, and written reliable replications of:

* about 1 in a million non-4-byte-sized writes is randomly lost if QSPI is writeable and not cached

* about 1 in a million non-4-byte-sized writes is randomly rounded up in size to 2 or 4 bytes with garbage, overwriting nearby data, if QSPI is writeable and cached

* when PC, SP, and VTOR all point to QSPI memory, any interrupt has about a 1/million chance of reading garbage instead of the proper vector from the vector table if it interrupts a LDM/STM instruction targeting the QSPI memory and it is cached and misses the cache

Some of these have workarounds that I found (contact me). I am refusing to disclose them to STM until they acknowledge the bugs publicly.

I recommend NOT using STM32H7 chips in any product where you want QSPI memory to work properly.
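The workarounds mentioned above are undisclosed, so the following is not them. As a purely hypothetical illustration of one obvious mitigation for the first two symptoms (making software issue only aligned, full 32-bit accesses so no sub-word writes reach the peripheral), a C helper could look like this. `write_bytes_word_only` is an invented name, whether it actually sidesteps the silicon bugs is unverified, and it does nothing for the vector-table issue.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src into a memory-mapped region
   at byte offset dst_off using only aligned 32-bit loads and stores.
   Partial words at either end are handled with read-modify-write.
   Assumption: this avoids sub-word bus writes; unverified against errata. */
static void write_bytes_word_only(volatile uint32_t *region, size_t dst_off,
                                  const uint8_t *src, size_t len) {
    while (len > 0) {
        size_t word_idx = dst_off / 4u;
        size_t byte_in_word = dst_off % 4u;
        size_t chunk = 4u - byte_in_word;
        if (chunk > len) {
            chunk = len;
        }

        uint32_t word;
        if (chunk == 4u) {
            /* Whole word: build it locally without reading the region. */
            memcpy(&word, src, 4u);
        } else {
            /* Partial word: fetch the current word, patch only the new bytes. */
            word = region[word_idx];
            memcpy((uint8_t *)&word + byte_in_word, src, chunk);
        }
        region[word_idx] = word; /* always one full 32-bit store */

        dst_off += chunk;
        src += chunk;
        len -= chunk;
    }
}
```

Patching bytes in a local copy and then storing the whole word keeps the compiler from emitting byte or halfword stores to the region, which a plain `memcpy` into mapped memory could do. Because the patch happens in the local's memory representation, the helper behaves the same on little- and big-endian targets.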

By @15155 - 8 months

I recommend checking out SpinalHDL generally - I do a ton of this very same kind of work with these same chips (7 series, US+) and would never look back to Verilog!

AXI (and all memory-mapped bus protocol schemes) becomes very very pleasant. SV interfaces get you 5% of the way there, though!

Also - I was under the impression that S1000-2M is a higher-end material, not cost-optimized? (But not Rogers, of course.)

By @rkangel - 8 months

While we're talking about this sort of architecture, I'd like to plug Elixir.

For some development hardware, we had Elixir running on the ARM of a Zynq Ultrascale, running in tandem with some digital logic. It required one C code "port" that integrated with UIO to expose the registers to our application and then we had a great programming environment.

Elixir for embedded doesn't get talked about that much, but that is actually the origin story of Erlang (the software component of telephony hardware). Basic language features like binary pattern matching work very well, and the concurrency approach makes it very easy to write clean, performant real-time software. We had a lot of functionality implemented in digital logic, with the stateful parts in software, and it worked very well.

Plus, I could then do things like trivially spin up a web UI (Phoenix LiveView) with a live-updating graphical display of all the register state, and be confident that running it wasn't going to interfere with the real-time stuff.

We did this using Nerves, which is a Linux platform set up to boot the BEAM and nothing else (e.g. no init system, just a special PID 1 binary that boots the BEAM and lets it handle all other processes). It had some plus points, like making firmware upgrades trivial and simplifying the system, but not being a "normal" Linux platform was a bit irritating sometimes. You could equally well just run Elixir as a normal application.
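The comment does not show the C "port" itself. As a hypothetical illustration of its register-mapping half, the standard Linux UIO convention is to `open()` the device node and `mmap()` it, where mapping N is selected by an offset of N times the page size. The names `uio_regs`, `uio_regs_open`, and `uio_regs_close` are invented here, and the Erlang port protocol side is omitted.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical handle for one UIO register mapping. */
typedef struct {
    int fd;
    size_t size;
    volatile uint32_t *regs;
} uio_regs;

/* Map `size` bytes of register space from a UIO node such as /dev/uio0.
   UIO selects mapping N through an mmap offset of N * page size.
   Returns 0 on success, -1 on failure. */
static int uio_regs_open(uio_regs *u, const char *path, size_t size,
                         unsigned map_index) {
    u->fd = open(path, O_RDWR);
    if (u->fd < 0) {
        return -1;
    }
    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, u->fd,
                   (off_t)map_index * (off_t)page);
    if (p == MAP_FAILED) {
        close(u->fd);
        u->fd = -1;
        return -1;
    }
    u->size = size;
    u->regs = (volatile uint32_t *)p;
    return 0;
}

/* Unmap the registers and close the device node. */
static void uio_regs_close(uio_regs *u) {
    munmap((void *)u->regs, u->size);
    close(u->fd);
    u->fd = -1;
    u->regs = NULL;
}
```

After opening, `u.regs[i]` reads or writes the i-th 32-bit register directly from user space. A blocking `read()` of 4 bytes on the same file descriptor is how UIO delivers interrupt counts, which a port could forward to the BEAM as messages.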

By @throwawayabcdef - 8 months

This is dope. I work with Zynq/Versal quite a bit and respect and understand (conceptually) the decisions you have made!

You get to own every aspect of your toolchain and with that will come a lot of power.

Are you familiar with:

https://github.com/corundum/corundum

Perhaps you can build a support package for your platform.

By @buescher - 8 months

This is really crisp work and nice to see. Before the Zynq era I worked with some designs that used a DSP or StrongARM along with a medium-sized FPGA, where the FPGA would be both the glue logic for RAM as well as custom peripherals, but I've been out of that world for a while. It would be fun to find an application for a big FPGA and a modern microcontroller.

By @chillingeffect - 8 months

Neat! I love that H7 chip and its gargantuan instruction manual... ...and you didn't even mention its 2nd core :)

By @Already__Taken - 8 months

Real quick, high-level question, sorry: what are most of your embedded projects going forward using MCU+FPGA to do? I thought of a custom router, but 284 Mbps isn't nearly fast enough for a network.
