Most AI notetakers share the same catch: to summarize your meeting, they upload your audio to their cloud. For a regulated team (healthcare, finance, anyone under data-residency rules), that's a non-starter. Your private calls quietly become someone else's data.
This is exactly why we built the summary engine inside Olivasal to run entirely on infrastructure you control. In this post I'll show you the core of it: take a recorded meeting, produce a clean transcript, and generate a structured summary with action items — all on your own box, nothing leaving the network.
What you'll build
A three-stage pipeline: recording → transcript → AI summary. Stage 1 uses your meeting recording (from LiveKit egress, or any file). Stage 2 transcribes with faster-whisper. Stage 3 summarizes with a local Qwen model served by Ollama. No external API is ever called.
Prerequisites
```bash
# System: ffmpeg for audio, plus Python 3.10+
# A GPU makes transcription much faster, but CPU works too
pip install faster-whisper openai

# Ollama for the local LLM (https://ollama.com)
ollama pull qwen2.5
```
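Before running the pipeline, it can help to confirm that Ollama is up and the model has actually been pulled. The sketch below is a minimal preflight check. It assumes Ollama's default address (http://localhost:11434) and uses its `GET /api/tags` endpoint, which lists locally pulled models. The helper names are hypothetical:

```python
import json
import urllib.request

# Assumption: Ollama's default address; change it if you bound Ollama elsewhere
OLLAMA_URL = "http://localhost:11434"


def model_available(tags: dict, wanted: str) -> bool:
    """Return True if `wanted` is listed in an /api/tags response.

    Ollama reports pulled models with an explicit tag (e.g. "qwen2.5:latest"),
    so a bare name like "qwen2.5" is treated as "qwen2.5:latest".
    """
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    names = {m.get("name", "") for m in tags.get("models", [])}
    return wanted in names


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/tags, which lists the models pulled on this Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Calling `model_available(fetch_tags(), "qwen2.5")` should return True once `ollama pull qwen2.5` has finished. The parsing is kept separate from the HTTP call so it can be checked without a running server.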
Step 1 — Get the audio
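If you're running a conference stack, LiveKit's egress writes a recording when the call ends. Composite egress gives you one mixed file; per-track egress gives you a separate file for each participant's audio track (we'll use that later for speaker labels). Either way, extract a 16 kHz mono WAV. Whisper resamples all input to 16 kHz mono internally, so converting up front hands the transcriber exactly the format it works in. faster-whisper can also decode most containers itself via PyAV, but a normalized file is easier to inspect and reuse: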
```bash
ffmpeg -i meeting.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
```
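If you want to catch a bad conversion before spending GPU time on it, Python's standard-library `wave` module can read the header back. The small sketch below checks the file against the parameters in the ffmpeg command above. The function name is hypothetical:

```python
import wave


def check_whisper_wav(path: str) -> list[str]:
    """Return a list of problems with `path`.

    An empty list means the file is 16 kHz, mono, 16-bit PCM, matching
    ffmpeg's -ar 16000 -ac 1 -c:a pcm_s16le output.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {wav.getsampwidth()} bytes, expected 2 (s16le)")
        if wav.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

For example, `check_whisper_wav("audio.wav")` should return `[]` for a correctly converted file.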
Step 2 — Transcribe locally with faster-whisper
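faster-whisper is a reimplementation of OpenAI's Whisper on the CTranslate2 inference engine. Its maintainers report that it runs up to 4x faster than the original at the same accuracy and uses less memory. The vad_filter flag runs a voice-activity detector (Silero VAD) over the audio and skips stretches without speech, so you don't waste compute on dead air.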
```python
from faster_whisper import WhisperModel

# Use device="cpu", compute_type="int8" if you have no GPU
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "audio.wav",
    beam_size=5,
    vad_filter=True,  # skip non-speech stretches before decoding
)

# `segments` is a lazy generator: transcription actually runs here, as it is consumed
transcript = " ".join(seg.text.strip() for seg in segments)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
print(transcript)
```
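That's your full transcript. Apart from the one-time download of the model weights (faster-whisper fetches large-v3 from the Hugging Face Hub the first time you load it), not a single byte leaves the machine.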
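Joining everything into one string discards the timing information that faster-whisper returns. If you want timestamps in the stored transcript, the sketch below formats each segment's start time. It relies only on the `start`, `end`, and `text` attributes of faster-whisper's segments, and the helper names are hypothetical. Because `segments` is a generator that can be consumed only once, call this instead of the `" ".join(...)` line, not after it.

```python
from typing import Iterable, Protocol


class SegmentLike(Protocol):
    # faster-whisper's Segment objects expose these attributes
    start: float
    end: float
    text: str


def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def timestamped_transcript(segments: Iterable[SegmentLike]) -> str:
    """One line per non-empty segment: "[HH:MM:SS] text"."""
    lines = []
    for seg in segments:
        text = seg.text.strip()
        if text:  # skip segments that are only whitespace
            lines.append(f"[{fmt_ts(seg.start)}] {text}")
    return "\n".join(lines)
```

For example, `transcript = timestamped_transcript(segments)` yields lines like `[00:04:12] Let's move to the budget.`. The timestamps give the LLM a little extra context and make the stored notes easier to cross-check against the recording.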
Step 3 — Summarize with a local LLM
Ollama exposes an OpenAI-compatible endpoint, so you can use the standard openai client and just point it at localhost. We ask for structured JSON so the output is easy to store and render.
```python
import json

from openai import OpenAI

# api_key is required by the client but ignored by Ollama
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

prompt = f"""You are a meeting assistant. From the transcript below, return JSON with:
- "summary": a 3-sentence overview
- "decisions": a list of key decisions made
- "action_items": a list of objects with "task" and "owner"

Transcript:
{transcript}"""

resp = client.chat.completions.create(
    model="qwen2.5",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # JSON mode: the reply must be a JSON object
    temperature=0.2,  # low temperature for consistent, factual notes
)

notes = json.loads(resp.choices[0].message.content)
print(json.dumps(notes, indent=2))
```
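JSON mode constrains the reply to be JSON, but it does not enforce the shape you asked for. A local model can still omit a key, return an owner as null, or list an action item as a bare string. The sketch below is a defensive parser that assumes the three keys requested in the prompt above. It also tolerates a Markdown code fence around the reply, which models tend to add if you ever drop `response_format`. The `"unassigned"` default and the function name are illustrative choices, not part of any library:

```python
import json
import re

# Matches a reply wrapped in a Markdown code fence, optionally tagged "json"
FENCE_RE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_notes(raw: str) -> dict:
    """Parse the model's reply into a dict that always has all three keys."""
    text = raw.strip()
    fenced = FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    decisions = data.get("decisions") or []
    if isinstance(decisions, str):  # a single decision returned as a string
        decisions = [decisions]

    items = []
    for item in data.get("action_items") or []:
        if isinstance(item, dict):
            task = str(item.get("task") or "").strip()
            owner = str(item.get("owner") or "unassigned").strip()
        else:  # e.g. a bare string instead of {"task": ..., "owner": ...}
            task, owner = str(item).strip(), "unassigned"
        if task:
            items.append({"task": task, "owner": owner})

    return {
        "summary": str(data.get("summary") or "").strip(),
        "decisions": [str(d).strip() for d in decisions],
        "action_items": items,
    }
```

To use it, swap it in for the `json.loads` call: `notes = parse_notes(resp.choices[0].message.content)`.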
You now have clean, structured meeting notes — summary, decisions, and owner-tagged action items — generated entirely on your own hardware.
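Long meetings need some extra planning. Ollama has historically run models with a default context window of only a few thousand tokens, so an hour-long transcript can be truncated before the model sees all of it. One option is to raise `num_ctx` (for example with a `PARAMETER num_ctx` line in a Modelfile). Another is to split the transcript and summarize it in passes. The sketch below covers the splitting step. It uses word count as a rough stand-in for real tokens, and the helper name, word limit, and overlap are hypothetical starting points, not tuned values:

```python
def chunk_words(text: str, max_words: int = 1500, overlap: int = 100) -> list[str]:
    """Split `text` into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words, so context near a boundary
    appears in both. Word count is only a rough proxy for model tokens.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the transcript
    return chunks
```

You can send each chunk through the same `chat.completions.create` call to get partial notes, then use a final call to merge them. How to merge partial summaries is a design choice this sketch leaves open.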
Step 4 — Add speaker labels (optional)
This is where per-track egress earns its keep. Instead of one mixed file, you get a separate audio track per participant. Transcribe each track on its own and tag every segment with that participant's identity. Then merge all segments back together in timestamp order. If participants joined at different times, first shift each track's timestamps by the moment that track started. The result is a speaker-attributed transcript ("Priya: … / Arun: …"), which makes the LLM's action-item ownership far more accurate because it can see who actually committed to what.
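The merge step described above can be sketched as follows. It assumes each participant's track has already been transcribed into `(start, end, text)` tuples, for example from faster-whisper's segments. It also assumes that each track's start offset relative to the meeting is known; this is the hypothetical `offsets` mapping, and how you obtain those numbers depends on your egress setup. The `Utterance` type and function names are illustrative:

```python
from collections.abc import Iterable, Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Utterance:
    start: float  # seconds from the start of the meeting
    end: float
    speaker: str
    text: str


def merge_tracks(
    tracks: Mapping[str, Iterable[tuple[float, float, str]]],
    offsets: Optional[Mapping[str, float]] = None,
) -> list[Utterance]:
    """Tag each track's segments with its speaker and merge them by start time.

    `tracks` maps a participant identity to (start, end, text) segments whose
    times are relative to that track's own file; `offsets` (hypothetical) maps
    the identity to when that track started, in seconds from the meeting start.
    """
    offsets = offsets or {}
    merged = []
    for speaker, segments in tracks.items():
        shift = offsets.get(speaker, 0.0)
        for start, end, text in segments:
            if text.strip():  # drop empty segments
                merged.append(Utterance(start + shift, end + shift, speaker, text.strip()))
    # Order by meeting time; the speaker name breaks ties deterministically
    merged.sort(key=lambda u: (u.start, u.speaker))
    return merged


def render(utterances: Iterable[Utterance]) -> str:
    """Render "Speaker: text" lines, joining consecutive lines by one speaker."""
    lines: list[str] = []
    last_speaker = None
    for u in utterances:
        if u.speaker == last_speaker:
            lines[-1] += " " + u.text
        else:
            lines.append(f"{u.speaker}: {u.text}")
            last_speaker = u.speaker
    return "\n".join(lines)
```

To use it, pass the output of `render(merge_tracks(...))` into the summary prompt in place of the plain transcript. That gives the model the speaker context it needs to attribute each action item to the right owner.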
Why self-hosted matters here
The convenience of AI summaries usually comes at the cost of handing your conversations to a vendor. This pipeline gives you the convenience and keeps every stage — media, transcript, and model — inside your own walls. For regulated teams, that's not a nice-to-have; it's the whole requirement.
This is the engine behind Olivasal, our self-hosted video conferencing with built-in AI summaries — and the same core powers the clinical scribe in our telehealth platform, Saffron. If you'd rather have this running out of the box than wire it up yourself, that's exactly what we built → olivasal.com