August 13th, 2024

A C/C++ library for audio and music analysis

audioFlux is a tool library for audio and music analysis and feature extraction, aimed at deep learning workflows. Version 0.1.8 adds new pitch algorithms. It supports Python 3.6+ and is organized into modules for transforms, features, and music information retrieval.

audioFlux is a tool library for audio and music analysis and feature extraction, aimed at deep learning workflows. It supports multiple time-frequency analysis methods and targets tasks such as classification, separation, music information retrieval (MIR), and automatic speech recognition (ASR). The core is implemented in C and uses hardware acceleration for FFT, which makes it suited to large-scale data processing.

The latest version, v0.1.8, introduces several pitch algorithms (YIN, CEP, PEF, NCF, HPS, LHS, STFT, and FFP) along with new algorithms for pitch shifting and time stretching.

audioFlux is compatible with Python 3.6 and above and can be installed via PyPI or Anaconda. The main modules are Transform, which offers time-frequency representation algorithms; Feature, which provides spectral features and cepstral coefficients; and MIR, which covers pitch and onset detection. Documentation is available, and contributions are encouraged. The project is licensed under the MIT License.
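To make the pitch-estimation idea concrete, here is a minimal pure-Python sketch of a normalized-correlation pitch estimator, loosely in the spirit of the NCF entry among the pitch algorithms named above. It is not audioFlux's implementation or API: the function names, the 150-500 Hz search range, and the synthetic 200 Hz test tone are all assumptions chosen for illustration.

```python
import math


def sine_wave(freq_hz, sample_rate, num_samples):
    """Generate a pure sine tone as a list of floats (hypothetical helper)."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def estimate_pitch_ncf(frame, sample_rate, fmin=150.0, fmax=500.0):
    """Return the pitch in Hz whose lag maximizes the normalized correlation.

    Illustrative sketch only; fmin/fmax are assumed search bounds.
    """
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_score = 0, -1.0
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        if n <= 0:
            break
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        num = sum(frame[i] * frame[i + lag] for i in range(n))
        energy_a = sum(frame[i] ** 2 for i in range(n))
        energy_b = sum(frame[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(energy_a * energy_b)
        score = num / denom if denom > 0.0 else 0.0
        if score > best_score:
            best_score, best_lag = score, lag
    return sample_rate / best_lag if best_lag > 0 else 0.0


sr = 8000                          # assumed sample rate in Hz
tone = sine_wave(200.0, sr, 1024)  # synthetic 200 Hz test tone
pitch = estimate_pitch_ncf(tone, sr)
print(f"estimated pitch: {pitch:.1f} Hz")
```

The narrow search range is deliberate: a correlation score is just as high at two or three periods as at one, so a wider range would need extra logic to avoid octave errors.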

- audioFlux is a tool library for audio analysis and feature extraction, aimed at deep learning workflows.

- Version 0.1.8 adds new pitch algorithms, plus algorithms for pitch shifting and time stretching.

- The library can be installed via PyPI or Anaconda for Python 3.6+.

- Main modules cover time-frequency transformations, spectral features, and MIR tasks.

- Contributions to the project are welcome, and it is licensed under the MIT License.
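As an illustration of the kind of spectral feature the Feature module is described as providing, the sketch below computes a spectral centroid from a naive DFT in pure Python. Everything here is an assumption for demonstration (function names, frame size, test tone); audioFlux itself is described as relying on C and accelerated FFTs rather than anything like this O(N²) loop.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive O(N^2) DFT; returns magnitudes for bins 0..N/2 (sketch only)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = dft_magnitudes(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful centroid
    n = len(frame)
    # Bin k corresponds to frequency k * sample_rate / n.
    weighted = sum((k * sample_rate / n) * m for k, m in enumerate(mags))
    return weighted / total


sr = 8000         # assumed sample rate in Hz
frame_len = 256   # assumed frame size; 1000 Hz falls exactly on bin 32
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(frame_len)]
centroid = spectral_centroid(tone, sr)
print(f"spectral centroid: {centroid:.1f} Hz")
```

For a pure tone that sits exactly on a DFT bin, the centroid should land very close to the tone's frequency; real signals spread energy across many bins, and the centroid then acts as a rough "brightness" measure.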

Related

FileFlows: Execute actions against files in a tree flow structure

FileFlows is a versatile tool for processing various file types like text, images, audio, and video. It supports transcoding, converting, and optimizing files, offering detailed reporting and customization options for users.

Black Forest Labs – FLUX.1 open weights SOTA text to image model

Black Forest Labs has launched to develop generative deep learning models for media, securing $31 million in funding. Their FLUX.1 suite includes three model variants, outperforming competitors in image synthesis.

The open weight Flux text to image model is next level

Black Forest Labs has launched Flux, the largest open-source text-to-image model with 12 billion parameters, available in three versions. It features enhanced image quality and speed, alongside the release of AuraSR V2.

FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention

FlexAttention is a new PyTorch API that enhances flexibility and performance in attention mechanisms, allowing users to implement various attention variants efficiently while leveraging existing infrastructure and improving performance through sparsity.

Forget Midjourney – Flux is the new king of AI image generation

Flux, an open-source AI image generator by Black Forest Labs, competes with Midjourney and Stable Diffusion, offering three versions and a developing text-to-video model for enhanced media production.

AI: What people are saying

The comments on the audioFlux article mostly ask how the library compares with existing tools and what it can do.

- One commenter asks for a comparison with existing C++ music information retrieval libraries.

- One commenter asks whether a library aimed at deep learning should implement its transforms as GPU-accelerated torch functions.

- One commenter questions the "C/C++" label, and another clarifies that the library is C and Python, not C++.

- Others ask about specific capabilities, such as audio fingerprinting, and about feature parity with librosa.

- One commenter jokingly asks for a version in safe Rust, citing pointer/length handling in C.

11 comments

By @jcelerier - 9 months

It would be nice to have a comparison with any of the many C++ MIR (music information retrieval) libraries in the wild:

- https://essentia.upf.edu/

- https://github.com/marsyas/marsyas

- https://github.com/ircam-ismm/pipo

- https://github.com/flucoma/flucoma-core/tree/main/include/al...

By @dsego - 9 months

It's also for Python, I just discovered it a few days ago. This is the website https://audioflux.top/

By @bravura - 9 months

If this is supposed to be used for deep-learning, shouldn't all the transforms be GPU-accelerated torch functions?

By @nesarkvechnep - 9 months

What's this C/C++ language?

By @dekken_ - 9 months

It's C and Python, not C++

By @BrannonKing - 9 months

So are they going for feature parity with librosa? I think that would be great.

By @gosub100 - 9 months

Can this be used for audio fingerprinting?

By @zombot - 9 months

How can a Python library support iOS?

By @morning-coffee - 9 months

Do you have one in safe Rust? See, we've only just met, and I don't know how you handle your ptr/len arguments in C just yet. ;)
