July 29th, 2024

TreeSeg: Hierarchical Topic Segmentation of Large Transcripts

Augmend is creating a platform to automate tribal knowledge for development teams, featuring the TreeSeg algorithm, which segments session data into chapters by analyzing audio transcriptions and semantic actions.

Augmend is developing a platform aimed at capturing and automating tribal knowledge for development teams. The platform allows users to record sessions or upload videos, which are then processed to extract knowledge and generate structured data. A key feature of this system is the TreeSeg algorithm, which segments session data into chapters, enhancing organization and context. TreeSeg operates by analyzing audio transcriptions and semantic actions from shared screens, creating a timeline of events that narrate the session. The algorithm employs a method similar to TextTiling, calculating similarities between text chunks to identify topic shifts. Unlike previous methods, TreeSeg focuses on determining when a topic shift occurs within a segment, allowing for a more nuanced segmentation process.
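The TextTiling-style step described above, comparing neighbouring text chunks and looking for a drop in similarity, can be sketched roughly as follows. This is a minimal illustration, not TreeSeg's actual implementation: it assumes simple bag-of-words vectors and cosine similarity, and the helper names (bag_of_words, adjacent_similarities, likely_boundary) and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a stand-in for whatever chunk representation is really used."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjacent_similarities(chunks):
    """Similarity between each chunk and the one that follows it."""
    vecs = [bag_of_words(c) for c in chunks]
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]


def likely_boundary(chunks):
    """Index of the chunk that starts the new topic: the point of lowest adjacent similarity."""
    sims = adjacent_similarities(chunks)
    return min(range(len(sims)), key=sims.__getitem__) + 1


# Hypothetical transcript chunks: two about a login bug, two about a budget.
chunks = [
    "we need to fix the login bug in the auth service",
    "the auth service login bug comes from the token check",
    "next the quarterly budget for cloud hosting",
    "cloud hosting budget is over the quarterly plan",
]

print(likely_boundary(chunks))  # 2: the topic shifts at the third chunk
```

A real system would likely swap the bag-of-words vectors for sentence embeddings; the similarity-dip idea stays the same.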

The algorithm recursively builds a binary tree of segments. At each step it evaluates the candidate split points of a segment and picks the one that minimizes a temporal clustering loss. It stops splitting once a segment reaches a minimum viable size, so that sub-segments remain meaningful. The authors report that TreeSeg outperforms other segmentation methods, which they attribute to its use of global information. The official implementation of TreeSeg is open-source and available on GitHub; it includes tools for parsing datasets and for adapting existing baselines to hierarchical segmentation tasks. Future work will explore how to convert segments into titled chapters and how to use the segment tree for processing large transcripts.
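A rough sketch of this recursive splitting is shown below. It is an illustration under stated assumptions, not the official TreeSeg code: the loss used here is the sum of squared distances of each segment's embeddings to the segment mean (one plausible reading of a "temporal clustering loss"), the only stopping rule is the minimum segment size, and the names Node, sse, build_tree and leaves, along with the toy 2-D embeddings, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def sse(vectors: List[List[float]]) -> float:
    """Sum of squared distances from each vector to the segment mean."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2 for v in vectors for d in range(dim))


@dataclass
class Node:
    start: int  # index of the first utterance in the segment
    end: int    # exclusive end index
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(emb: List[List[float]], start: int, end: int, min_size: int) -> Node:
    """Recursively split emb[start:end] into a binary tree of contiguous segments."""
    node = Node(start, end)
    # Stop when the segment cannot yield two children of at least min_size (min_size >= 1).
    if end - start < 2 * min_size:
        return node
    # Try every split that keeps both halves viable; keep the lowest-loss one.
    candidates = range(start + min_size, end - min_size + 1)
    best = min(candidates, key=lambda s: sse(emb[start:s]) + sse(emb[s:end]))
    node.left = build_tree(emb, start, best, min_size)
    node.right = build_tree(emb, best, end, min_size)
    return node


def leaves(node: Node) -> List[Tuple[int, int]]:
    """Leaf segments in temporal order as (start, end) pairs."""
    if node.left is None or node.right is None:
        return [(node.start, node.end)]
    return leaves(node.left) + leaves(node.right)


# Toy embeddings: four utterances near (0, 0), then four near (5, 5).
emb = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]

tree = build_tree(emb, 0, len(emb), min_size=2)
print(tree.left.end)  # 4: the first split falls between the two clusters
print(leaves(tree))   # [(0, 2), (2, 4), (4, 6), (6, 8)]
```

Because the sketch splits all the way down to the minimum size, the top-level split is the most informative one here; a practical version would also cap the number of segments or require a minimum loss reduction before splitting.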

4 comments
By @blackkettle - 4 months
This is quite interesting, but I have to ask, have you experimented much with larger LLMs as a mechanism to basically automate the entire process?

I'm doing something pretty similar right now for internal meetings and I use a process like: transcribe meeting with utterance timestamps, extract keyframes from video along with timestamps, request segmented summary from LLM along with rough timestamps for transitions, add keyframe analysis (mainly for slides).

gpt-4o, claude sonnet 3.5, llama 3.1 405b instruct, llama 3.1 70b instruct all do a pretty stunning job of this IMO. Each department still reviews and edits the final result before sending it out, but I'm so far quite impressed with what we get from the default output even for 1-2hr conversations.

I'd argue the key feature for us is also still providing a simple, intuitive UI for non technical users to manage the final result, edit, polish and send it out.

By @Terretta - 4 months
The recent StackOverflow developer survey noted that Microsoft Teams has a prevalence (mislabeled as popularity) of over 50% as a collaboration tool among groups of devs, higher than Slack's.

For devs using Teams, particularly remote teams: trial Teams Premium, switch on recording and enable transcripts, then switch on the Microsoft "Meet" app for Teams. (If you are colocated, Teams has a mode where each dev can join with their own device in the same room, and it uses that to enhance speaker detection.)

After a meeting, you may be surprised, stunned even, at the usefulness of the “Meet” app experience for understanding the meeting conversation flow, participant by participant, the quality of the transcript, the quality of the OpenAI backed summary, and the utility of the follow-ups extracted.

This material also becomes searchable, and assuming you leverage Microsoft Stream and retain the meets and recordings, usable as training material as well.

While Augmend takes this idea to the next level, if you are using Teams* and aren't using Meet, you are missing out.

---

Overview:

https://support.microsoft.com/en-us/office/stay-on-top-of-me...

However, this doesn't show the timeline of speakers and more importantly timeline of topics, which is the most valuable part for review. For a double-click on that, see:

Meeting recap in Microsoft Teams > Topics and Chapters:

https://support.microsoft.com/en-us/office/meeting-recap-in-...

* The meeting recap by AI is in Teams Premium

By @gklezd - 4 months
Here is a link to the preprint for more details: https://www.arxiv.org/abs/2407.12028

By @potatoman22 - 4 months
Reminds me of this site, VideoGist. They do a similar thing, breaking down transcripts into chapters and providing summaries for each chapter.

https://news.ycombinator.com/item?id=38555629