Related: see Smart Connections

Cursor recently made some changes that removed the old @codebase-style agent, which could quickly search across files. Agent mode tends to just look at files one by one. In addition, the lack of good language features and structure makes it difficult for the agent to traverse or operate on an Obsidian vault of, say, 5k individual files.

This can be overcome by creating an index. For example, an efficient index is an NDJSON file (one JSON object per line) that contains the path, filename, frontmatter, and a limited amount of text from each file.
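A minimal sketch of such an index builder, using only the standard library. The function names, the `excerpt` field name, and the 500-character default are assumptions for illustration; the frontmatter is kept as raw text rather than parsed, to avoid depending on a YAML library.

```python
import json
from pathlib import Path


def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (raw_frontmatter, body); frontmatter is the block between leading '---' lines."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # no (closed) frontmatter block found


def build_index(vault: Path, out_file: Path, max_chars: int = 500) -> int:
    """Write one JSON object per markdown note to out_file; return the note count."""
    count = 0
    with out_file.open("w", encoding="utf-8") as out:
        for note in sorted(vault.rglob("*.md")):
            text = note.read_text(encoding="utf-8", errors="replace")
            frontmatter, body = split_frontmatter(text)
            record = {
                "path": note.relative_to(vault).parent.as_posix(),
                "filename": note.name,
                "frontmatter": frontmatter,          # raw YAML text, unparsed
                "excerpt": body.strip()[:max_chars],  # limited amount of note text
            }
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling something like `build_index(Path("my-vault"), Path("index.ndjson"))` would produce a single file that an agent can grep or load in one pass instead of opening notes one by one.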

However, just the first n characters of a file may not be that valuable. Instead, preprocessing each note and expanding its YAML frontmatter with an LLM can create a highly sophisticated index that supports further processing, such as folder organization, identifying high-quality files, or answering questions.

One possible solution is to process the vault, or parts of it, with something similar to Karan Sharma’s approach: basically, have an LLM traverse and process or clean up each file. Since I have an M2 Max MacBook Pro, it should be possible to traverse 5k files in an estimated 30-60 minutes.
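For scale, 5k files in 30-60 minutes works out to roughly 0.36-0.72 seconds per note. The traversal itself could be sketched as below. This is not Karan Sharma's implementation; `process_notes` and `per_file_budget` are hypothetical names, and the model call is passed in as a plain `generate` function so that no particular local-model API is assumed.

```python
from pathlib import Path
from typing import Callable, Optional


def per_file_budget(total_minutes: float, n_files: int) -> float:
    """Seconds available per file if n_files must finish within total_minutes."""
    return total_minutes * 60 / n_files


def process_notes(
    vault: Path,
    generate: Callable[[str], str],
    prompt: str,
    limit: Optional[int] = None,
) -> dict[str, str]:
    """Send each markdown note (prefixed by the prompt) to `generate`.

    `generate` is a placeholder for whatever calls the local model; it takes
    the full prompt text and returns the model's reply. `limit` caps the
    number of notes, which is handy for a trial run on a small subset.
    """
    results: dict[str, str] = {}
    for i, note in enumerate(sorted(vault.rglob("*.md"))):
        if limit is not None and i >= limit:
            break
        text = note.read_text(encoding="utf-8", errors="replace")
        key = note.relative_to(vault).as_posix()
        results[key] = generate(f"{prompt}\n\n---\n\n{text}")
    return results
```

Keeping the model call behind a function argument also makes it easy to swap models or to dry-run the loop with a stub before committing to a full pass over the vault.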

I have expanded on this idea, pairing with ChatGPT 4o to build the following prompt as a replacement:

full chat here.

GPT cites the benefits as the following:

  • Includes your real category tree
  • Adds dual summaries: LLM-focused and human-readable
  • Gives you semantic tags for Cursor, Obsidian, NDJSON, filtering
  • Includes relevance + time-awareness
  • Smart enough to handle personal, creative, and work notes

Jamie’s Smart Obsidian Prompt (v1)

You are assisting Jamie — a senior front-end developer, UI/UX designer, creative technologist, artist, musician, and parent — in organizing a large Obsidian vault of notes. The vault contains content spanning technical work, personal thoughts, creative experiments, and imported notes from Evernote and Apple Notes.

Jamie blends React, TypeScript, functional programming, design systems, modular synths, ambient music, parenting reflections, and MFA-level conceptual art. Notes may be old, messy, LLM-generated, or deeply insightful. Your task is to triage each note with precise metadata.

---

🔧 FOLDER STRUCTURE FOR CATEGORIES:

**PUBLIC/Tech/**
- Development (Next.js, React, Ruby, TypeScript)
- AI
- Web
- Tools (Development-Tools, IDE, Terminal)
- Best-Practices
- CSS
- Programming
- Prompts
- Raspberry PI
- Storybook

**KISKO/**
- Valomotion (Meetings)
- Remontic (Debug)
- Professio (Bugs, Documentation)
- Projects (Netflix, Professio, Sentry, Voima)
- HR, Lab Notes, Podcast, zzz Admin (Expenses, Invoices, Meetings)

**Personal/**
- Notes, Language (YKI), Health (Medical), Shopping, Podcast (Reviews)
- Finance, Family (Daycare, Kids), CV, Travel, Projects

**Imported/**
- Evernote, Apple Notes, Web clips

---

🏷️ USE THE FOLLOWING TAGGING SYSTEM (ALL lowercase, hyphenated):

🎯 **Relevance Tags:**
- `#relevant-2025`
- `#retro-but-gold`
- `#stale`
- `#review-later`

🧠 **Document Type Tags:**
- `#likely-original`
- `#likely-llm-generated`
- `#likely-copied`
- `#simple-note`

🗂️ **Status Tags:**
- `#inbox`, `#in-progress`, `#complete`

🚦 **Priority Tags:**
- `#high-priority`, `#medium-priority`, `#low-priority`

🔧 **Utility Tags:**
- `#public-candidate`
- `#bookmark`
- `#needs-cleanup`
- `#imported`
- `#vintage`

---

🎯 OUTPUT FRONTMATTER IN THIS FORMAT:

Return only the YAML frontmatter (no triple backticks, no explanation). Omit fields if clearly not applicable. Use lowercase, hyphenated tags. If content is clearly old but interesting, consider adding both `#retro-but-gold` and `#vintage`.

```yaml
title: <clear, meaningful title>
category: <best-fit category path from folder list>
tags:
  - tag1
  - tag2
  - tag3
llm_summary: <short keyword-rich summary for AI>
description: <1–2 sentence summary for humans>
```

🕒 If the note is old and no longer modern, add:

relevance_2025: retro | stale | relevant
stale_since: <year>   # optional, only for stale or retro
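Since the model's reply is meant to go straight into frontmatter, it may be worth checking the returned tags against the tagging system above before writing anything back. A minimal sketch, assuming the tags have already been extracted into a list of strings; `check_tags` and the "unknown" bucket are illustrative, and whether tags outside the fixed list should be rejected or merely flagged is a judgment call.

```python
import re

# Tags listed in the prompt's tagging system, stored without the leading '#'.
ALLOWED_TAGS = {
    "relevant-2025", "retro-but-gold", "stale", "review-later",
    "likely-original", "likely-llm-generated", "likely-copied", "simple-note",
    "inbox", "in-progress", "complete",
    "high-priority", "medium-priority", "low-priority",
    "public-candidate", "bookmark", "needs-cleanup", "imported", "vintage",
}

# Lowercase words joined by single hyphens, e.g. "needs-cleanup".
TAG_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_tags(tags: list[str]) -> dict[str, list[str]]:
    """Sort tags into 'malformed' (not lowercase-hyphenated) and 'unknown'
    (well-formed but not in the prompt's list). Valid listed tags are omitted."""
    normalized = [t.lstrip("#") for t in tags]
    malformed = [t for t in normalized if not TAG_PATTERN.match(t)]
    unknown = [
        t for t in normalized
        if TAG_PATTERN.match(t) and t not in ALLOWED_TAGS
    ]
    return {"malformed": malformed, "unknown": unknown}
```

For example, `check_tags(["#stale", "Needs Cleanup", "react"])` would flag `Needs Cleanup` as malformed and `react` as unknown, while `stale` passes silently.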

next-steps

  • install and configure a local model
  • run the Python script against a small subset, like the Obsidian inbox
  • build tooling to quickly generate an index as an NDJSON file
  • ask Cursor to do some organizing via the move-files Python script
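For the last step, a dry run that only proposes moves is probably the safest starting point. The sketch below is not the move-files script mentioned above. It assumes, hypothetically, that each NDJSON record carries `path`, `filename`, and an LLM-assigned `category` field, and it returns a list of (source, destination) pairs without touching any files.

```python
import json
from pathlib import Path


def plan_moves(index_file: Path) -> list[tuple[str, str]]:
    """Read an NDJSON index and propose (src, dst) moves based on `category`.

    Nothing is moved; records without a category, or already in the right
    folder, are skipped. The field names are assumptions for illustration.
    """
    moves: list[tuple[str, str]] = []
    with index_file.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            category = rec.get("category")
            if not category:
                continue
            src = (Path(rec["path"]) / rec["filename"]).as_posix()
            dst = (Path(category) / rec["filename"]).as_posix()
            if src != dst:
                moves.append((src, dst))
    return moves
```

The resulting list can be reviewed by hand, or handed to Cursor as a concrete plan, before any file is actually moved.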