Agent skill
knowledge-ingest
Ingest URLs, documents, and text into the memory system as structured knowledge
Install this agent skill to your Project
npx add-skill https://github.com/QuixiAI/Hexis/tree/main/skills/installed/knowledge-ingest
SKILL.md
Knowledge Base Ingestion
Transform external content -- web pages, documents, raw text -- into structured semantic memories that persist in the knowledge graph.
When to Use
- When the user shares a URL and says "learn this" or "remember this article"
- When a research workflow finds valuable sources that should be retained long-term
- When the user pastes raw text (notes, transcripts, outlines) to be ingested
- During heartbeats when a goal involves building knowledge on a specific topic
- When importing reference material for a project or domain
Step-by-Step Methodology
- Assess the source: Before ingesting, determine what kind of content it is (article, documentation, transcript, raw notes). This guides how aggressively to summarize.
- Fetch and parse: For URLs, use `ingest_url` which handles fetching, HTML-to-text conversion, and chunking. For raw text, use `ingest_text` directly.
- Check for duplicates: Use `recall` with the URL or a key phrase from the content to see if it has already been ingested. Avoid storing the same source twice.
- Chunk intelligently: Long content is automatically chunked by the ingestion pipeline. Each chunk becomes a separate semantic memory linked by source metadata. Trust the pipeline's chunking; do not manually split content unless it is clearly failing.
- Add context: When storing via `remember`, include metadata about the source: URL, author, date published, and why it was ingested (which goal or topic it serves).
- Verify ingestion: After ingestion completes, run a quick `recall` on a key concept from the content to confirm it is retrievable.
- Connect to goals: If the ingested content relates to an active goal, note the connection so future heartbeats can leverage it.
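The duplicate check, provenance metadata, and verification steps above can be sketched as follows. This is a toy illustration only: `InMemoryStore`, `Memory`, and `ingest_once` are hypothetical names, the `recall` and `remember` methods are naive in-memory stand-ins rather than the real tools, and the metadata keys are assumed, not taken from the skill.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Memory:
    text: str
    metadata: dict


@dataclass
class InMemoryStore:
    """Toy stand-in for the memory system; NOT the real tool API."""
    memories: list = field(default_factory=list)

    def recall(self, query: str) -> list:
        # Naive substring match over text and source URL.
        # The real recall tool may rank results quite differently.
        q = query.lower()
        return [
            m for m in self.memories
            if q in m.text.lower() or q in m.metadata.get("source_url", "").lower()
        ]

    def remember(self, text: str, metadata: dict) -> None:
        self.memories.append(Memory(text, metadata))


def ingest_once(store, url, text, *, reason, key_phrase, author=None, published=None):
    """Hypothetical helper: skip duplicates, store with provenance, verify retrieval."""
    # Check for duplicates: has this source URL been stored before?
    if store.recall(url):
        return "skipped: already ingested"
    # Add context: record where the content came from and why it was kept.
    store.remember(text, {
        "source_url": url,
        "author": author,
        "published": published,
        "ingested_on": date.today().isoformat(),
        "reason": reason,
    })
    # Verify ingestion: confirm a key concept is retrievable.
    if not store.recall(key_phrase):
        return "warning: stored but not retrievable"
    return "ingested"


store = InMemoryStore()
url = "https://example.com/vector-search"  # placeholder URL
text = "HNSW builds a layered graph index."
print(ingest_once(store, url, text, reason="goal: learn vector search", key_phrase="HNSW"))
print(ingest_once(store, url, text, reason="goal: learn vector search", key_phrase="HNSW"))
```

The first call stores the text and prints `ingested`; the second finds the URL already present and prints `skipped: already ingested`. Recording the reason alongside the URL is what lets a later heartbeat tie the memory back to its goal.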
Quality Guidelines
- Prefer ingesting authoritative, primary sources over summaries or aggregators.
- Do not ingest entire websites. Be selective -- ingest the specific pages that contain the needed information.
- When ingesting long documents, let the chunking pipeline do its job. Each chunk retains a reference to the parent source.
- Always record the source URL or origin. Memories without provenance are harder to evaluate and update later.
- Respect rate limits and robots.txt when fetching URLs. If a fetch fails, note the failure and move on rather than retrying aggressively.
- For sensitive or private content (internal docs, personal notes), ensure the user understands that ingested content persists in the local database.
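As a rough illustration of the fetching guidelines above, the sketch below checks a robots.txt policy with Python's standard `urllib.robotparser` and attempts a fetch exactly once, recording any failure instead of retrying. The helper names `allowed_by_robots` and `fetch_once`, the `knowledge-ingest` user-agent string, and the injected `fetch` callable are all assumptions made for the example; the actual `ingest_url` tool may handle this differently.

```python
from urllib import robotparser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_once(url, fetch, failures):
    """Try a single fetch; on a network-style error, note it and move on."""
    try:
        return fetch(url)
    except OSError as exc:  # urllib.error.URLError and TimeoutError are OSError subclasses
        failures.append((url, str(exc)))
        return None


def flaky_fetch(url):
    # Hypothetical fetcher that always fails, to show the failure path.
    raise TimeoutError("timed out")


robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(robots, "knowledge-ingest", "https://example.com/docs/intro"))
print(allowed_by_robots(robots, "knowledge-ingest", "https://example.com/private/notes"))

failures = []
print(fetch_once("https://example.com/docs/intro", flaky_fetch, failures))
print(failures)
```

Passing the fetcher in as a callable keeps the sketch runnable offline; in practice it could wrap something like `urllib.request.urlopen` with a timeout. The failures list gives the agent something concrete to report back rather than silently dropping the source.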