How to give agents context that does not fall apart

Why filesystem-first context architectures work, where they break down, and how the same idea can be implemented with structured object spaces using small, composable primitives instead of bespoke tools.

Many engineering teams still take pride in the sophisticated retrieval pipelines they build for their agents.

But recent work suggests a simpler approach often works better. For example, Ashka Stephen shows how LLM agents can explore context exposed as a filesystem using familiar Unix commands like ls, grep, cat, and awk.

This makes sense. Models have been trained on massive amounts of code. They already know how to navigate directories, inspect files, search for patterns, and assemble context incrementally. By exposing data as files and letting the agent explore, you avoid brittle retrieval pipelines and overly opinionated tooling.

ONE DOES NOT SIMPLY ASK AN AGENT TO PICK CONTEXT WITHOUT GIVING IT NAVIGABLE STRUCTURE

But you don't necessarily have to implement this in the form of filesystems or bash.

§The idea behind the filesystem-first context architecture

If you abstract away from the filesystem-first context architecture, the argument becomes:

  1. Let the model decide what context it needs. Do not precompute relevance or force everything through a retrieval layer that encodes your assumptions.

  2. Expose raw structure rather than curated views. The agent should navigate the same primitives a human would, not a hand-picked abstraction.

  3. Use operations the model already knows deeply. Do not invent a new DSL (Domain-Specific Language) or a bespoke API if you can reuse concepts that LLMs have already seen extensively during training.

  4. Keep execution minimal and inspectable. You should be able to see exactly what the agent touched and why.

Filesystems and bash satisfy these properties well because they encode hierarchy and locality, they support precise selection, and they are extremely familiar to models (and to many humans too). But none of these properties are inherently tied to POSIX, the Unix-style model of files and directories, or to setting up an actual filesystem in a sandbox (as opposed to any other explicit, navigable representation).

§The same idea, but without a filesystem

Instead of "filesystem + bash", we can do "object graph + structural query operators":

  • Rather than directories and files, you expose immutable structured objects with stable identifiers.
  • Instead of shell commands, you give the agent a small set of general operators over object collections.

Conceptually, the mapping looks like this:

Filesystem path        -> object identity or collection
File contents          -> structured object
ls                     -> list(collection)
cat                    -> get(object)
grep                   -> grep(collection, pattern, fields?)
awk                    -> map, group, reduce

This is the same mental model, just without expressing structure primarily through a filesystem, text files, and text-oriented operations.

§Why not just use SQLite?

Yes, SQLite can be a perfectly reasonable backing store for structured data.

The distinction I am trying to make is less about persistence and more about the interaction model. If agents primarily explore context through SQL queries, you end up flattening rich objects and relationships and then reconstructing meaning through joins and predicates.

In those cases, it is often simpler to keep the data in a structured object space and expose a small set of general, filesystem-like operators to the agent directly over that structure. This keeps identities, relationships, and references stable, and makes the agent’s traversal easier to audit. It also avoids projecting the data into tables and reconstructing structure through joins, where identity and relationship integrity can be lost.

§Why structured object spaces sometimes work better

For many technical domains, such as engineering content, working directly with structured objects offers advantages. In these domains, meaning is often carried by structure rather than prose, through identities, parameters, thresholds, states, and relationships.

Making that structure explicit avoids heuristic reconstruction and turns identity, relationships, and constraints into something agents can reason about directly.

This shows up in a few ways:

Explicit identity and traceability. You can encode identity in filenames or inside files. But that identity remains conventional rather than enforced. With structured objects, identity is explicit and stable by construction. This makes traceability across views, transformations, and time reliable rather than dependent on naming discipline.

Static validation and unambiguous semantics. Once structure is explicit, it can be validated mechanically. Unused fields, dangling references, inconsistent types, and missing relationships can be detected with linters and schema checks. Presence, absence, and multiplicity are represented directly, avoiding ambiguity that arises when meaning has to be inferred from empty files, missing files, or comments.
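As a sketch of what mechanical validation can look like, here is a dangling-reference check over a toy object space. The sensor/alarm schema and all field names are illustrative, not a real data model:

```python
# Toy object space: collections are dicts from stable id to object.
sensors = {"s-1": {"type": "pressure"}}
alarms = {
    "a-1": {"threshold": 3.5, "sensor": "s-1"},
    "a-2": {"threshold": 1.0, "sensor": "s-9"},  # dangling reference
}

def dangling_refs(collection, ref_field, target):
    """Ids of objects whose `ref_field` points at an id missing from `target`."""
    return sorted(
        obj_id for obj_id, obj in collection.items()
        if obj.get(ref_field) not in target
    )
```

With a filesystem, the same check would depend on naming conventions; here it is a trivial query over explicit structure.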

First class relationships and auditable, scale aware navigation. Folders work well as long as relationships are hierarchical. But once relationships cross folder boundaries or form general graphs, structure becomes implicit and convention based. With structured objects, relationships are explicit and directly traversable. As a result, agents can reason about cardinality, scope, and sampling before loading content; context selection can be audited at the level of object identity rather than inferred from file access. This keeps prompts smaller, more predictable, and less failure prone.

This does not invalidate the filesystem approach. The filesystem-first context architecture works so well because it exposes structure in a simple, navigable form and gives agents general operators they already understand. It starts to break down once objects carry rich structure that is hard or impossible to maintain explicitly as files and folders. In those cases, especially when that structure needs to be auditable or relied on over time, keeping it explicit in an object space turns it from a convention into something you can actively use.

§What not to do

Do not fall into the "bespoke trap".

The operators you give to the model must feel as native as grep or awk. If they are replaced with a bespoke query language or dozens of narrow tools, the advantage disappears. This is an easy backdoor to recreate exactly the kind of bespoke tooling the filesystem approach avoids, because an in-memory object space is unconstrained. It requires discipline to keep the operators small, general, and familiar.

In practice, this means:

  • Do not create dozens of specialized tools like find_items_by_property or get_related_things.
  • Do not encode business logic inside tools instead of exposing raw structure.
  • Do not invent a custom query language that no LLM has ever seen.
  • Do not hide cardinality, uncertainty, or partial results behind abstractions.

All of these recreate the brittleness the filesystem approach is trying to escape.

§What a good in-memory setup looks like

A good in-memory setup should feel as boring and general as a filesystem. The point is not raw expressiveness, but composability built from a small set of general primitives.

Instead of files and directories, you maintain a set of structured objects. Each object has a stable identifier and typed fields. Objects are grouped into collections, such as documents, records, events, or artifacts. The agent never mutates this state. It only explores it.
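One minimal way to represent such a space in Python. The frozen dataclass and read-only mapping are an illustration of "stable ids, typed fields, no mutation", not a prescribed implementation:

```python
from dataclasses import dataclass
from types import MappingProxyType

@dataclass(frozen=True)  # objects are immutable once created
class Obj:
    id: str      # stable identifier
    fields: dict # typed fields in a real system; a plain dict here

# A collection is a read-only mapping from stable id to object,
# so the agent can explore it but never mutate it.
events = MappingProxyType({
    "e-1": Obj("e-1", {"kind": "alarm", "level": "high"}),
    "e-2": Obj("e-2", {"kind": "alarm", "level": "low"}),
})
```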

The agent is given a small set of general operators that mirror what it would do in a filesystem.

For example:

list(collection, where?) -> ids

Return the identifiers of objects in a collection, optionally filtered by simple conditions. This is the equivalent of ls (list a directory) or a constrained find.
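A sketch, assuming a collection is a dict from id to object and `where` is a set of exact-match field conditions (richer predicates are possible, but equality already covers most narrowing):

```python
records = {
    "r-1": {"kind": "event", "severity": "high"},
    "r-2": {"kind": "event", "severity": "low"},
    "r-3": {"kind": "note",  "severity": "low"},
}

def list_(collection, where=None):
    """Return ids in a collection, optionally filtered by field equality."""
    where = where or {}
    return sorted(
        obj_id for obj_id, obj in collection.items()
        if all(obj.get(k) == v for k, v in where.items())
    )
```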

grep(collection, pattern, fields?) -> ids

Return the identifiers of objects in a collection where one or more selected fields match a regex (or other familiar "pattern" syntax). This is the equivalent of grep: quick, cheap text search that lets the agent narrow down what to inspect next without loading full objects. The optional fields? argument keeps this predictable and efficient on structured objects: you can search just title / summary / body, for example, instead of scanning every field. If omitted, one reasonable default is a small set of "text-like" fields.
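A sketch of that behavior, with an assumed default set of text-like fields (the field names and data are illustrative):

```python
import re

docs = {
    "d-1": {"title": "Relief valve sizing", "body": "calc notes", "rev": 3},
    "d-2": {"title": "Pump curves", "body": "valve trim remarks", "rev": 1},
}

TEXT_FIELDS = ("title", "summary", "body")  # an assumed "text-like" default

def grep(collection, pattern, fields=None):
    """Ids of objects where any selected field matches the regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    fields = fields or TEXT_FIELDS
    return sorted(
        obj_id for obj_id, obj in collection.items()
        if any(rx.search(str(obj.get(f, ""))) for f in fields)
    )
```

Note that only ids come back; the agent still decides which objects to actually load.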

get(id, fields?) -> object

Fetch a single object, or a projection of it. This plays the role of cat, except that the agent can ask for only the fields it needs.
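A sketch of the projection behavior (data is illustrative):

```python
store = {
    "d-1": {"title": "Pump spec", "body": "long text ...", "rev": 3},
}

def get(collection, obj_id, fields=None):
    """Fetch one object, or only the requested fields of it."""
    obj = collection[obj_id]
    if fields is None:
        return obj
    return {f: obj[f] for f in fields if f in obj}
```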

group(ids, key)

Partition a set of objects by a shared attribute. This mirrors the way an engineer might use awk to group and aggregate records based on selected fields in their content.
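A sketch, assuming the operator also receives the collection so it can resolve ids to objects (the event data is illustrative):

```python
events = {
    "e-1": {"source": "pump-a", "level": "warn"},
    "e-2": {"source": "pump-a", "level": "error"},
    "e-3": {"source": "pump-b", "level": "warn"},
}

def group(collection, ids, key):
    """Partition a set of ids by the value of a shared attribute."""
    buckets = {}
    for obj_id in ids:
        buckets.setdefault(collection[obj_id].get(key), []).append(obj_id)
    return buckets
```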

count(ids)

Return cardinality without loading the objects themselves. This is analogous to using wc in a Unix pipeline to understand scale before deciding what to inspect.

sample(ids, n)

Inspect a small, representative subset before deciding what to explore further. This is analogous to using head, tail, or shuf in a Unix pipeline to understand the shape of data before processing it fully.
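Both operators exist so the agent can understand scale before paying for content. A sketch; the seed parameter is an addition for reproducibility and is not part of the signatures above:

```python
import random

def count(ids):
    """Cardinality without loading any objects."""
    return len(list(ids))

def sample(ids, n, seed=None):
    """A small random subset of ids, like shuf -n in a pipeline."""
    ids = list(ids)
    return random.Random(seed).sample(ids, min(n, len(ids)))
```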

join(left_ids, right_collection, on)

Traverse relationships explicitly instead of inferring them from filenames, directory structure, or text patterns like grep.
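A sketch, again assuming the operator receives the left collection so it can resolve ids, and following a hypothetical reference field. A broken link is surfaced rather than silently resolved:

```python
alarms = {
    "a-1": {"limit": 3.5, "sensor": "s-1"},
    "a-2": {"limit": 1.0, "sensor": "s-2"},
    "a-3": {"limit": 9.9, "sensor": "s-404"},  # broken link, dropped from result
}
sensors = {"s-1": {"unit": "bar"}, "s-2": {"unit": "degC"}}

def join(left_collection, left_ids, right_collection, on):
    """Resolve an explicit reference field to ids in the right collection."""
    return {
        lid: left_collection[lid][on]
        for lid in left_ids
        if left_collection[lid].get(on) in right_collection
    }
```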

Individually, these operators are trivial. Together, they let the agent explore structure incrementally and pull in only the context it needs.
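Composed, the same kind of toy operators support exactly that incremental narrowing: cheap pattern search, then partitioning, then loading only the fields of the objects that matter. All data, field names, and the compact operator definitions here are illustrative:

```python
import re

incidents = {
    "i-1": {"system": "cooling", "severity": "high", "summary": "pump cavitation"},
    "i-2": {"system": "cooling", "severity": "low",  "summary": "minor pump noise"},
    "i-3": {"system": "power",   "severity": "high", "summary": "breaker trip"},
}

def grep(coll, pattern, fields):
    rx = re.compile(pattern)
    return sorted(i for i, o in coll.items()
                  if any(rx.search(str(o.get(f, ""))) for f in fields))

def group(coll, ids, key):
    buckets = {}
    for i in ids:
        buckets.setdefault(coll[i][key], []).append(i)
    return buckets

def get(coll, i, fields):
    return {f: coll[i][f] for f in fields}

# 1. Narrow cheaply by pattern, 2. understand the partition,
# 3. load only the fields of the objects that matter.
hits = grep(incidents, "pump", fields=("summary",))
by_severity = group(incidents, hits, "severity")
loaded = [get(incidents, i, ("system", "summary"))
          for i in by_severity.get("high", [])]
```

The full objects for low-severity hits are never loaded; the agent saw their ids and cardinality and chose not to spend context on them.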

§The architectural takeaway

The question is not whether agents should use filesystems or JSON.

The question is what minimal set of general primitives lets a model explore structure autonomously, without being told in advance what matters.

Filesystem-first context architectures are one answer. Structured object spaces are another. Both work for the same reason: they expose structure directly and give the model familiar, composable operations to navigate it.

The difference is not the idea, but the substrate. Filesystem-first context architectures work when structure maps cleanly to hierarchy and text. Structured object spaces become a better fit once structure is richer, not strictly hierarchical, and needs to remain explicit, auditable, and usable over time.