How many times have you seen a clickbait article that you actually wanted to check out, but you didn't want to click through a million ads, or expose yourself to a myriad of trackers? Alternatively, how many times have you read something you wanted to share or remember, copied it down or bookmarked the page, then immediately forgotten about it?
Well, if you're like me, this is a regular occurrence in your life. I have been wanting to create something like kb-web for a long time.
This is the story of kb-web—version 4 of a web-importer application that I have built. There are three key factors that made this latest iteration a massive success:
- Gemma (via Ollama): This local AI model is powerful enough, smart enough, and has low enough latency on my production hardware that the speed of generation is no longer the bottleneck it once was.
- Advancement in My Personal Coding Skills: I have a lot more development experience under my belt than I did when I first set out to re-work this application.
- An Agentic Development Loop: I set up an agentic coding workflow that works exactly the way I want (using Google Gemini and the Antigravity IDE) to automate test suites, run standard packaging pipelines, and maintain strict, agent-facing documentation. (Look out for a future article where I walk through how to set up this workflow environment!)
The Development Timeline: Version 1 to Version 4
Version 1: The CLI Script (Circa 2023)
- What it did: A simple script using httpx to fetch a webpage as HTML and save it in a folder named for the URL's basename. A second script then took that HTML and converted it into a raw Markdown document in the same folder.
- The Workflow: Targeted the folder as an Obsidian Markdown vault to view and search the results.
- The Pain Points:
- Markdowns contained all the noisy navigation code and footer links from the raw HTML.
- There was no descriptive title for the pages.
- I still had to search pages manually or use Obsidian's basic search to find what I needed.
Version 2: The Database Shift (Circa 2024)
- What it did: Moved the HTML and Markdown content into a structured database.
- The Workflow: Robust error-handling was introduced in the script to filter out malformed pages during CLI ingestion. Added an exporter to compile database entries back into markdown files inside the vault folder so I could still use the Obsidian UI.
Version 3: The Template Precursor (Early 2026)
- What it did: This was the direct precursor to kb-web. It introduced the HTMLPage Pydantic model and mapped it as a 1:1 schema representation with database records, serving as the boilerplate for a web interface.
The Deep Dive: Building Version 4
This is where things got really interesting. Over the course of a single week, I sat down to turn that raw boilerplate into a full-fledged local-first, self-hosted web portal and CLI bridge. Here is how that journey unfolded step-by-step.
Part 1: Establishing the Foundation (Late May 2026)
I started with a simple, clear goal: build a local web portal that could scrape webpages and parse their core readability text into markdown.
- FastAPI & Pydantic: I set up a lightweight web server with FastAPI. Pydantic models handled validation, making sure the incoming page data was clean before parsing.
from typing import Optional

from pydantic import BaseModel


class HTMLPage(BaseModel):
    """One scraped page; mirrors a database record 1:1."""
    url: str
    title: Optional[str] = None
    html_content: str
    md_content: str
    links: list[str]
    html_content_hash: str
    md_content_hash: str
    fetched_at: str
    description: Optional[str] = None
    tags: Optional[list[str]] = None
- sqlite-utils: For data storage, I went with sqlite-utils. Since this is a local-first application, SQLite is a perfect fit. I set up the schema to map URLs as the primary key and record the page title, raw HTML, parsed markdown, and timestamps.
import sqlite_utils


def init_db(db: sqlite_utils.Database) -> None:
    # WAL mode lets readers keep working while a write is in progress.
    db.enable_wal()
    if "fetched_pages" not in db.table_names():
        # The URL is the primary key; links and tags are stored as strings.
        db["fetched_pages"].create({
            "url": str,
            "title": str,
            "html_content": str,
            "md_content": str,
            "links": str,
            "html_content_hash": str,
            "md_content_hash": str,
            "fetched_at": str,
            "description": str,
            "tags": str,
        }, pk="url")
- Readability Scraper: I built an automated scraper using httpx and BeautifulSoup. The parser strips out noisy headers, nav bars, and ads, keeping only the body text and converting the layout to clean Markdown via HTML2Text.
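To make the stripping step concrete, here is a minimal sketch of the idea using only the standard library's html.parser as a stand-in for BeautifulSoup. The tag list and the names MainTextExtractor and extract_main_text are assumptions for illustration, not the actual kb-web parser, and the Markdown conversion step is left out.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as noise (an assumption; tune per site).
NOISY_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any noisy tag."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a noisy element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISY_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISY_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the page text with navigation, headers, and footers dropped."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

A real page needs more heuristics than a fixed tag list (ads rarely announce themselves with a semantic tag), which is where a library like BeautifulSoup earns its keep.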
Part 2: Moving to Production, Agentic Pipelines, & the systemd Daemon (May 31, 2026)
After the initial port of the application into the agentic coding environment I had set up on my dev machine, the application was running on the production server from an SSH session on my dev PC, and the AI wiki generation was working.
I had cautiously allowed an AI agent to touch my code for the first time. Because I had spent more than a day laying out my workflows, coding standards, and artifact creation/documentation process—so that every portion of every step is documented in markdown as code and added to a documentation folder in the outer workspace—I felt slightly less nervous that the whole thing was going to fall apart.
- Daemonizing with systemd: I wrote a systemd service file (kb-web.service) to run the FastAPI app as a background system service.
[Unit]
Description=Knowledge Base Web Service Ingestion and Dashboard
After=network.target
[Service]
Type=simple
User=will
WorkingDirectory=/srv/kb-web
EnvironmentFile=-/srv/kb-web/.env
ExecStart=/srv/kb-web/.venv/bin/kb-web serve --host 0.0.0.0 --port 8050
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
- Automated Installer: I wrote a service installer script to automate the setup process: creating the virtual environment, syncing dependencies with uv sync, and managing service restarts.
- Isolated Virtual Environment: I pointed the systemd daemon to execute commands directly using the virtual environment's binaries (/srv/kb-web/.venv/bin/kb-web). This completely isolated the application's dependencies from the host system's packages.
Part 3: Real-world Security, Gunicorn Clustering, & Admin Dashboard Observability (June 1, 2026)
I felt like I had succeeded and this was all I actually wanted/needed. But then I realized that my production site, which was out on the open internet, allowed any user to make my server scrape any address they wanted. That was a recipe for disaster.
So, I decided I needed to add an admin portal. Since I was mucking with the code—and starting to muck with some of the agentic coding tools at the same time—I decided to start really trying to add features (Saturday/Sunday 5/30-5/31 late night into the early morning).
- Solving the Reboot Interruptions: Until then, the service had been running inside an SSH session from my dev computer to my prod server, so whenever the dev computer rebooted to apply Windows updates, the session dropped and took the active server process with it. I solved those reboot downtime issues once and for all by creating the service configurations and letting systemd manage the process.
- Auditing the AI plans: I actually found myself enjoying the process of auditing the implementation plan artifacts that the agents create. I made a point to add an instruction that all of them are automatically stored in our logs before any changes are made for proper documentation.
- Multi-process Gunicorn Worker Pools: I integrated Gunicorn to spawn multi-process worker pools. This allowed the app to scale across multiple CPU cores on my server.
from importlib import import_module
from typing import Optional

import gunicorn.app.base


class StandaloneApplication(gunicorn.app.base.BaseApplication):
    def __init__(self, app_uri: str, options: Optional[dict] = None):
        self.options = options or {}
        self.app_uri = app_uri
        super().__init__()

    def load_config(self):
        # Only pass through options that Gunicorn recognizes and that are set.
        config = {key: value for key, value in self.options.items()
                  if key in self.cfg.settings and value is not None}
        for key, value in config.items():
            self.cfg.set(key.lower(), value)

    def load(self):
        # Resolve a "module:attribute" URI to the application object.
        module_path, app_name = self.app_uri.split(":")
        module = import_module(module_path)
        return getattr(module, app_name)
- Observability Configurations: I realized that the Gotify notifications had never worked on the prod machine. I had forgotten or neglected to add the user environment variables configured on my dev machine to my prod machine, and hadn't set up a .env file in production. I built these missing values into the admin settings portal so I could control things on a server level, including Gotify configs and the generation prompt itself. That shouldn't be hardcoded in my opinion!
- Gemini's Title Generation Hack: I didn't like how raw titles of scraped sites were displayed. I asked Gemini to generate clean titles. This was the moment I was really surprised, not because the solution was good, but because it was something I would have never thought to try. Instead of writing a separate prompt and incurring another round-trip call to generate a title, Gemini just asked the model to include a title in the wiki description block and parsed it out! It was simple, clever, and effective.
- Backfilling in the Background: I created the process to generate tags and the wiki for the articles as they came in, and added a button to the admin portal to trigger it, backfilling the records without descriptions using a loop. At first, I thought this process was broken and was ready to tell the agent off for not following the instructions properly. But when I went to Ctrl+C to kill it, I saw it print "waiting on background tasks to complete..." and exit cleanly. I was surprised that everything was implemented exactly to the spec we had laid out in our back-and-forth. It was a clear example of the agents implementing code exactly as described, and I was really enjoying feeling like I was onto something with this agentic coding process.
- Admin Wiki Page Customization: After this, I was really on a roll and wanted to add more functionality to the wiki pages if I was logged in as the admin. I wanted the ability to:
- Regenerate wiki entries and titles to adhere to the new standards.
- Re-fetch the source page to update content if the source article had changed.
- Version the content so I wouldn't lose historical snapshots.
- Edit/regenerate tags for an entry (adding my own or deleting generated ones).
- Delete entries altogether.
I added all of this to one prompt to a new agent, and in a matter of minutes, it had generated a page with all of the edit functionality that I wanted.
- Bedtime Scrolling: With everything working as I wanted, I lay in bed that morning next to my partner, scrolling the previous posts I had ingested and regenerating tags and wikis to adhere to the new standards I had created. I was just really enjoying the fact that I had this production application that I had built with AI running on the open internet that I could now use to make my life easier and more organized.
For about a week the service was running on production and accessible to me anywhere, and I was really enjoying the fact that it was there and everything was working just fine.
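The title trick from the list above (asking the model for a title inside the same generated block and parsing it out) might look roughly like the sketch below. The "Title: ..." line convention and the function name split_title_and_description are assumptions for illustration; the article does not show the actual prompt or output format.

```python
import re
from typing import Optional

# Hypothetical convention: the model is asked to put "Title: ..." on its own line.
_TITLE_RE = re.compile(r"^[ \t]*title:[ \t]*(.+?)[ \t]*$", re.IGNORECASE | re.MULTILINE)


def split_title_and_description(generated: str) -> tuple[Optional[str], str]:
    """Return (title, description); title is None if the model omitted it."""
    match = _TITLE_RE.search(generated)
    if match is None:
        return None, generated.strip()
    # Drop stray markdown emphasis or quotes the model may wrap around the title.
    title = match.group(1).strip("#*\"' ")
    # Everything except the matched title line is kept as the description.
    description = (generated[:match.start()] + generated[match.end():]).strip()
    return title, description
```

The appeal is that one generation call yields both fields, with a graceful fallback when the model ignores the instruction.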
Part 4: Building the AI Feature Stack & Scraping YouTube Transcripts (June 4, 2026)
Then a funny thing happened: my Google News feed started filling up with more and more YouTube videos. I'm sure this is a coincidence and it has nothing to do with me copying and scraping links from their feed without clicking them on my phone or sharing them with my friends (which would expand their knowledge base on me and my interests and connections)... This is not about that though.
The main point is that this made me wonder if I could have Gemini add support for scraping YouTube transcripts as HTML pages and generating "wikis" for them, allowing me to ingest the YouTube links that I was now getting so many of.
1. Frictionless Share Triggers (PWA & Bookmarklets)
- PWA Share Target: I registered the application as a Web Share Target. Now, on mobile or desktop, I can save articles to my wiki straight from the system share sheet.
- Login Parameter Preservation: I refactored the authentication guard. If I share a link while logged out, the app redirects me to login and then forwards me straight to the target import page with all shared URL parameters preserved.
- Bookmarklet: I made a simple bookmarklet that I dragged to my browser bookmarks bar. Clicking it packages the current tab's link and info and posts it to the ingestion API in one click.
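The login parameter preservation described above can be sketched with the standard library alone. This is an illustration under assumptions, not the kb-web code: the /login and /import paths, the next parameter name, and both function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

LOGIN_PATH = "/login"  # hypothetical route name


def build_login_redirect(requested_path: str, query: dict[str, str]) -> str:
    """Build a login URL that remembers the originally requested target."""
    target = requested_path
    if query:
        target = f"{requested_path}?{urlencode(query)}"
    # The whole target (path plus query) is encoded into a single "next" value.
    return f"{LOGIN_PATH}?{urlencode({'next': target})}"


def resolve_next(login_url: str, default: str = "/") -> str:
    """After login, recover the target; reject anything that is not a local path."""
    values = parse_qs(urlsplit(login_url).query).get("next", [])
    target = values[0] if values else default
    # Only allow same-site relative paths to avoid open redirects.
    if not target.startswith("/") or target.startswith("//"):
        return default
    return target
```

Encoding the full target into one parameter means the shared URL and title survive the detour through the login form without the login handler needing to know which parameters exist.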
2. Messy Ingestion Parsing
- Copy-pasting text from chat apps or notes often includes extra conversational noise. I added regex parsing to automatically extract the first URL from any messy copy-pasted blocks, removing prefix garbage like "source: some website http://..." from the entry.
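A minimal sketch of that kind of extraction is below. The pattern and the name extract_first_url are assumptions, since the article does not show the regex it actually uses.

```python
import re
from typing import Optional

# Match http(s) URLs up to the first whitespace, quote, or angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_first_url(text: str) -> Optional[str]:
    """Return the first URL found in a messy pasted block, or None."""
    match = URL_RE.search(text)
    if match is None:
        return None
    # Trim punctuation that usually belongs to the sentence, not the URL.
    # Note: this also trims a legitimate trailing ")" in some URLs.
    return match.group(0).rstrip(".,;:!?)]}")


if __name__ == "__main__":
    print(extract_first_url("source: some website https://example.com/post?id=1, check it out"))
```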
3. YouTube Media Ingestion
- I integrated yt-dlp and youtube-transcript-api to process YouTube links.
- The backend pulls the timestamped subtitle tracks and formats them into a clean, readable transcript file, creating a searchable text index of the entire video.
# Formatting timestamps into transcripts
transcript_lines = []
for entry in transcript_list:
    start_sec = int(entry.start) if hasattr(entry, "start") else int(entry.get("start", 0))
    text_content = entry.text if hasattr(entry, "text") else entry.get("text", "")
    minutes = start_sec // 60
    seconds = start_sec % 60
    timestamp = f"[{minutes:02d}:{seconds:02d}]"
    transcript_lines.append(f"{timestamp} {text_content}")
transcript = "\n".join(transcript_lines)
4. Semantic Search & Vector Embeddings
- I hooked up local embeddings using Ollama's embeddings API, running nomic-embed-text by default.
- Whenever a page is saved or updated, the backend calculates the vector embeddings for the title, summary, and tags, storing them in an article_embeddings table.
# Calculate description and tags vector and store in sqlite
text_to_embed = f"Tags: {', '.join(tags)}\n\nDescription: {description}"
response = client.embeddings(model="nomic-embed-text", prompt=text_to_embed[:4000])
embedding = response["embedding"]
db["article_embeddings"].upsert({
    "url": url,
    "embedding": json.dumps(embedding),
    "updated_at": datetime.now().isoformat(),
}, pk="url")
- The detail view calculates cosine similarity over the cached vector space to suggest related articles—giving me topic recommendations even if they don't share a single keyword!
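For readers who want to see the ranking step, here is a small pure-Python sketch of cosine similarity over JSON-encoded vectors shaped like the article_embeddings rows. The function names and the top_k parameter are assumptions for illustration, and the toy two-dimensional vectors stand in for real embeddings.

```python
import json
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_articles(url: str, rows: list[dict], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank other articles by similarity to `url`.

    `rows` mimic article_embeddings records: {"url": ..., "embedding": "<json list>"}.
    """
    vectors = {row["url"]: json.loads(row["embedding"]) for row in rows}
    target = vectors[url]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != url
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A linear scan like this is usually fine for a personal knowledge base; a dedicated vector index only starts to matter at much larger scale.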
5. Model Context Protocol (MCP) Server
- I built an MCP stdio server using FastMCP. Now, external AI agents (like Claude Desktop) can connect directly to my kb-web database to search, list, and fetch summaries of what I've ingested.
mcp = FastMCP("KB Web")


@mcp.tool()
def search_articles(query: str) -> str:
    """Searches for articles in the Knowledge Base matching the query string."""
    db = _get_db()
    rows = list(db.execute_returning_dicts(
        "SELECT * FROM fetched_pages WHERE title LIKE ? OR description LIKE ? OR tags LIKE ?",
        [f"%{query}%", f"%{query}%", f"%{query}%"]
    ))
    return json.dumps(rows, indent=2)
Part 5: Overcoming the Concurrency Bottleneck (June 5, 2026)
Everything was working beautifully—until I deployed the new feature stack under my Gunicorn multi-process cluster. Suddenly, I started getting a classic, painful database error:
sqlite3.OperationalError: database is locked
The Diagnostic: "Write-on-Read" Contention
To track user sessions across different Gunicorn worker processes, I had stored active logins in an SQLite table called active_sessions. Every request to a public page checked this table and ran a cleanup query to delete expired tokens:
db.execute("DELETE FROM active_sessions WHERE expiry < ?", [current_time])
Even if a user was only reading a page, the server was executing a DELETE query. SQLite treats this as a write transaction, locking the database file. With Gunicorn running multiple workers, they concurrently tried to write to active_sessions. Workers blocked each other, hung for the full SQLite timeout (30 seconds), and crashed with database-locked exceptions.
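The contention is easy to reproduce in isolation. The sketch below is an illustration with the standard sqlite3 module, not the kb-web code: two connections in one process stand in for two Gunicorn workers, a 0.1-second timeout stands in for the 30-second wait, and the function name and table columns are assumptions.

```python
import os
import sqlite3
import tempfile


def second_writer_error(db_path: str) -> str:
    """Hold a write lock on one connection and try to write from another."""
    # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
    writer = sqlite3.connect(db_path, isolation_level=None)
    contender = sqlite3.connect(db_path, timeout=0.1, isolation_level=None)
    try:
        writer.execute(
            "CREATE TABLE IF NOT EXISTS active_sessions (token TEXT, expiry REAL)"
        )
        writer.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
        writer.execute("INSERT INTO active_sessions VALUES ('abc', 0)")
        try:
            # The same cleanup query a read-only page request used to run.
            contender.execute("DELETE FROM active_sessions WHERE expiry < ?", [1.0])
        except sqlite3.OperationalError as exc:
            return str(exc)
        return "no error"
    finally:
        if writer.in_transaction:
            writer.execute("ROLLBACK")
        writer.close()
        contender.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(second_writer_error(os.path.join(tmp, "kb.db")))
```

The second connection waits out its timeout and then fails, which is the same failure mode the workers hit, only with a much shorter wait.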
The Fix: Stateless Cookies & Thread-Local Caching
To fix database locking once and for all, I did a major refactor of the authentication system:
1. Stateless HMAC-SHA256 Cookies: I threw out the database session table entirely and switched to cryptographically signed cookies. When logging in, the server signs a session expiration timestamp using a secret key derived from my admin password.
import hashlib
import hmac


def generate_session_token(expiry_time: float) -> str:
    # The payload is the expiry in whole seconds; the signature proves the server issued it.
    payload = str(int(expiry_time))
    secret = hashlib.sha256(config.admin_password.encode("utf-8")).digest()
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"
Verification is now a CPU-only signature check (hmac.compare_digest). GET requests are now completely lock-free and execute zero write queries on the database.
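The verification side is not shown in the article, so the following is a self-contained sketch of what such a check could look like, assuming the same payload.signature token format. ADMIN_PASSWORD is a placeholder for the configured admin password, and sign_expiry and verify_session_token are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional

ADMIN_PASSWORD = "change-me"  # placeholder; a real app would read this from config


def _secret() -> bytes:
    return hashlib.sha256(ADMIN_PASSWORD.encode("utf-8")).digest()


def sign_expiry(expiry_time: float) -> str:
    """Produce a "payload.signature" token for the given expiry timestamp."""
    payload = str(int(expiry_time))
    signature = hmac.new(_secret(), payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_session_token(token: str, now: Optional[float] = None) -> bool:
    """Check the signature and the expiry without touching the database."""
    payload, sep, signature = token.partition(".")
    if not sep or not (payload.isascii() and payload.isdigit()):
        return False
    expected = hmac.new(_secret(), payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return False
    current = time.time() if now is None else now
    return int(payload) > current
```

One consequence of the design worth knowing: because the key is derived from the admin password, changing the password invalidates every outstanding session, and there is no way to revoke a single token before it expires.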
2. Thread-Local Connection Cache: I refactored _get_db() to cache database handles using thread-local storage (threading.local()). Each thread now owns its database connection. I also wrapped database schema checks inside a threading lock to prevent race conditions during concurrent server starts.
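import sqlite3
import threading

import sqlite_utils

_local = threading.local()
_init_lock = threading.Lock()


def _get_db():
    db_path = config.db_path
    # Reuse this thread's handle if it already opened one.
    db = getattr(_local, "db", None)
    if db is None:
        conn = sqlite3.connect(db_path, timeout=30.0, check_same_thread=False)
        db = sqlite_utils.Database(conn)
        # Serialize schema checks so concurrent startups do not race.
        with _init_lock:
            init_db(db)
        _local.db = db
    return db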
3. Automated Concurrency Testing: I added unit tests to make sure this would never slip past me again:
- test_get_requests_are_write_free intercepts database execution during public GET requests and asserts that no write commands (INSERT, UPDATE, DELETE) are run.
- test_concurrent_reads_no_lock fires concurrent parallel GET requests from a thread pool to verify the connection cache scales safely under load.
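The article does not show the test bodies, so here is one way a write-free check could be sketched with the standard library: sqlite3's set_trace_callback records every statement a handler runs. The two handler functions are hypothetical stand-ins for the request path before and after the refactor, and the table columns are assumptions.

```python
import sqlite3

WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "REPLACE")


def statements_run(conn: sqlite3.Connection, action) -> list[str]:
    """Run action(conn) and return every SQL statement SQLite executed."""
    seen: list[str] = []
    conn.set_trace_callback(seen.append)
    try:
        action(conn)
    finally:
        conn.set_trace_callback(None)  # stop tracing
    return seen


def is_write(statement: str) -> bool:
    return statement.lstrip().upper().startswith(WRITE_KEYWORDS)


def make_test_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fetched_pages (url TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE active_sessions (token TEXT, expiry REAL)")
    conn.commit()
    return conn


# Hypothetical stand-ins for a public GET handler before and after the refactor.
def old_public_get(conn: sqlite3.Connection):
    conn.execute("DELETE FROM active_sessions WHERE expiry < ?", [0])
    return conn.execute("SELECT title FROM fetched_pages").fetchall()


def new_public_get(conn: sqlite3.Connection):
    return conn.execute("SELECT title FROM fetched_pages").fetchall()
```

In a real suite the action would be a test-client request against the app rather than a bare function, but the assertion is the same: no traced statement may start with a write keyword.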
Retrospective
By combining stateless cryptographic sessions, thread-local database caching, and local AI model integrations, kb-web has become a fast, secure, and resilient local-first tool. Catching concurrency bugs early and locking them down with automated test suites ensures the system remains robust as the knowledge ecosystem continues to grow.
About the Author
Will Morris is a software developer focused on local-first tools, AI integration pipelines, and robust private software architectures. He builds applications to organize personal knowledge and catalog digital history.
Follow the development of the kb stack on GitHub.