TreeSeg: Hierarchical Topic Segmentation of Large Transcripts
Augmend is creating a platform to automate tribal knowledge for development teams, featuring the TreeSeg algorithm, which segments session data into chapters by analyzing audio transcriptions and semantic actions.
Augmend is developing a platform aimed at capturing and automating tribal knowledge for development teams. The platform lets users record sessions or upload videos, which are then processed to extract knowledge and generate structured data. A key feature of this system is the TreeSeg algorithm, which segments session data into chapters, improving organization and context. TreeSeg works by analyzing audio transcriptions and semantic actions from shared screens, creating a timeline of events that narrates the session. The algorithm employs a method similar to TextTiling, calculating similarities between text chunks to identify topic shifts. Unlike previous methods, TreeSeg focuses on determining where a topic shift occurs within a segment, allowing for a more nuanced segmentation process.
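To make the TextTiling-style step concrete, here is a minimal Python sketch, not Augmend's actual code: it embeds adjacent transcript chunks and scores each candidate boundary by the similarity of its neighbors, where low scores ("valleys") suggest topic shifts. The toy hashed bag-of-words embedding is a stand-in for whatever embedding model the real system uses.

```python
# Minimal TextTiling-style boundary scoring (sketch, not Augmend's code).
import numpy as np

def embed(chunk: str, dim: int = 512) -> np.ndarray:
    """Toy hashed bag-of-words embedding (stand-in for a real model)."""
    v = np.zeros(dim)
    for tok in chunk.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def boundary_scores(chunks: list[str]) -> list[float]:
    """Similarity of each adjacent chunk pair; local minima mark
    candidate topic boundaries, as in TextTiling."""
    vecs = [embed(c) for c in chunks]
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
```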
The algorithm uses a recursive approach to create a binary tree of segments, optimizing the segmentation by evaluating potential splits based on temporal clustering loss. It stops splitting when segments reach a minimum viable size, ensuring that sub-segments remain meaningful. TreeSeg has been shown to outperform other segmentation methods, leveraging global information to enhance accuracy. The official implementation of TreeSeg is open-source, available on GitHub, and includes tools for parsing datasets and adapting existing baselines for hierarchical segmentation tasks. Future developments will explore how to convert segments into titled chapters and utilize the segment tree for processing large transcripts.
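Below is a hedged sketch of that recursive splitting loop, assuming the temporal clustering loss can be approximated by within-segment scatter of chunk embeddings and that the minimum viable size is measured in chunks; the open-source implementation on GitHub is the authoritative reference.

```python
# Recursive binary segmentation (sketch, not the official TreeSeg code):
# try every temporal split point, score it with a clustering loss (here:
# within-segment scatter, an assumption), and recurse until segments
# fall below a minimum viable size.
import numpy as np

MIN_SEGMENT = 4  # assumed minimum viable segment size, in chunks

def scatter(vecs: np.ndarray) -> float:
    """Sum of squared distances to the segment centroid."""
    return float(((vecs - vecs.mean(axis=0)) ** 2).sum()) if len(vecs) else 0.0

def best_split(vecs: np.ndarray) -> int:
    """Temporal split index minimizing the combined loss of both halves."""
    candidates = range(MIN_SEGMENT, len(vecs) - MIN_SEGMENT + 1)
    return min(candidates, key=lambda k: scatter(vecs[:k]) + scatter(vecs[k:]))

def build_tree(vecs: np.ndarray, lo: int = 0) -> dict:
    """Recursively build a binary segment tree over chunk embeddings."""
    if len(vecs) < 2 * MIN_SEGMENT:
        return {"span": (lo, lo + len(vecs))}  # leaf: too small to split
    k = best_split(vecs)
    return {
        "span": (lo, lo + len(vecs)),
        "left": build_tree(vecs[:k], lo),
        "right": build_tree(vecs[k:], lo + k),
    }
```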
Related
I'm doing something pretty similar right now for internal meetings and I use a process like: transcribe meeting with utterance timestamps, extract keyframes from video along with timestamps, request segmented summary from LLM along with rough timestamps for transitions, add keyframe analysis (mainly for slides).
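For the "segmented summary from LLM" step, a rough sketch of what such a call could look like (my reconstruction, not the commenter's actual code), using the OpenAI Python SDK; the prompt wording and utterance format are assumptions.

```python
# Hypothetical segmented-summary step: feed timestamped utterances to an
# LLM and ask for topic segments with rough transition timestamps.
from openai import OpenAI

client = OpenAI()

def segmented_summary(utterances: list[tuple[float, str, str]]) -> str:
    """utterances: (start_seconds, speaker, text) tuples."""
    transcript = "\n".join(f"[{t:7.1f}s] {spk}: {txt}" for t, spk, txt in utterances)
    resp = client.chat.completions.create(
        model="gpt-4o",  # the comment reports similar quality from Claude and Llama
        messages=[
            {"role": "system", "content": (
                "Segment this meeting transcript into topics. For each topic, "
                "give a title, a rough start timestamp, and a short summary."
            )},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content
```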
GPT-4o, Claude 3.5 Sonnet, Llama 3.1 405B Instruct, and Llama 3.1 70B Instruct all do a pretty stunning job of this, IMO. Each department still reviews and edits the final result before sending it out, but so far I'm quite impressed with the default output, even for 1-2hr conversations.
I'd argue the key feature for us is still a simple, intuitive UI that lets non-technical users manage, edit, polish, and send out the final result.
For devs using Teams, particularly remote teams: trial Teams Premium, switch on recording and enable transcripts, then switch on the Microsoft "Meet" app for Teams. (If you are colocated, Teams has a mode where each dev joins from their own device in the same room, which it uses to enhance speaker detection.)
After a meeting, you may be surprised, stunned even, at how useful the "Meet" app experience is for understanding the conversation flow participant by participant, the quality of the transcript, the quality of the OpenAI-backed summary, and the utility of the extracted follow-ups.
This material also becomes searchable and, assuming you leverage Microsoft Stream and retain the meetings and recordings, usable as training material as well.
While Augmend takes this idea to the next level, if you are using Teams* and aren't using Meet, you are missing out.
---
Overview:
https://support.microsoft.com/en-us/office/stay-on-top-of-me...
However, this doesn't show the timeline of speakers and, more importantly, the timeline of topics, which is the most valuable part for review. For a double-click on that, see:
Meeting recap in Microsoft Teams > Topics and Chapters:
https://support.microsoft.com/en-us/office/meeting-recap-in-...
* The AI meeting recap is a Teams Premium feature