Agents Need More Than Models: Context Infrastructure Is Taking Shape

Looking at recent GitHub projects, one pattern is becoming harder to ignore:

The central question for agents is shifting from “can I call a stronger model?” to “can I manage context reliably, continuously, and at a reasonable cost?”

This is not a single news event. Several projects are pointing in the same direction.

OpenViking describes itself as a context database for AI agents, using a filesystem-like model to manage memory, resources, and skills. Mirage tries to mount external services such as S3, Google Drive, Slack, Gmail, and Redis into one virtual file tree, so agents can use familiar Unix-like operations across different backends. SkillOpt goes one layer further and treats natural-language skills as external state that can be trained, validated, and improved instead of being a one-off prompt.

These projects solve different problems, but they are all answering a larger question:

When an agent starts running long tasks, calling tools, and accumulating experience, where should its working context live? How should it be read, edited, debugged, and reused?

My current view is simple: context infrastructure will become one of the most important layers in agent applications.

Meet The Three Projects

This article is not trying to blend three projects into one vague trend. They sit at different parts of the agent infrastructure stack: OpenViking is closer to a context database, Mirage is closer to a virtual filesystem for tools and data sources, and SkillOpt is closer to a skill optimizer.

In one table:

Project	One-line Positioning	Main Problem Layer
OpenViking	A context database for AI agents	How to manage memory, resources, and skills together
Mirage	A unified virtual filesystem for AI agents	How agents access different tools and data sources through one interface
SkillOpt	A natural-language optimizer for agent skills	How skills move from handwritten experience to evaluated, iterative assets

These projects are not just “another agent framework.” They are closer to missing infrastructure for agents that need to work over longer horizons.

OpenViking: A Context Database For Agents

OpenViking positions itself as “The Context Database for AI Agents.”

The problem it targets is fragmentation. An agent’s useful context may include project documentation, user preferences, task logs, skill descriptions, external resources, and historical experience. If these materials are scattered across prompts, chat history, vector databases, and temporary files, the agent has a hard time reusing them consistently.

OpenViking puts memory, resources, and skills into one context database and organizes them with a filesystem paradigm. It is not just about similarity search. Its README emphasizes hierarchy, recursive directory retrieval, layered context loading, and visualized retrieval traces.

You can think of it as a context workspace for agents:

memory stores preferences, experience, and historical state;
resources mount documents, code, references, and external knowledge;
skills describe reusable ways to perform tasks;
layered loading controls which context enters the current task;
retrieval traces help developers understand why the agent read certain information.

This is useful for agent applications that care about long-term memory, project-level context, and multi-skill collaboration. Coding assistants, enterprise knowledge assistants, and personal workflow agents can all run into this problem.
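
To make the workspace idea concrete, here is a minimal Python sketch of a hierarchical store with recursive directory retrieval and layered loading. It is my own illustration, not OpenViking's API: ContextNode, ContextStore, and the example paths are all hypothetical names chosen for this article.

```python
from dataclasses import dataclass, field


@dataclass
class ContextNode:
    """One entry in a hypothetical context tree (not OpenViking's real schema)."""
    path: str       # e.g. "memory/preferences.md"
    summary: str    # short abstract, cheap to load
    body: str       # full content, loaded only on demand


@dataclass
class ContextStore:
    nodes: dict = field(default_factory=dict)   # path -> ContextNode
    trace: list = field(default_factory=list)   # (layer, path) pairs

    def add(self, path, summary, body):
        self.nodes[path] = ContextNode(path, summary, body)

    def list_dir(self, prefix):
        """Recursive listing: every path under the given directory prefix."""
        prefix = prefix.rstrip("/") + "/"
        return sorted(p for p in self.nodes if p.startswith(prefix))

    def load(self, prefix, full=False):
        """Layered loading: summaries by default, full bodies only when asked."""
        loaded = {}
        for path in self.list_dir(prefix):
            node = self.nodes[path]
            loaded[path] = node.body if full else node.summary
            self.trace.append(("full" if full else "summary", path))
        return loaded


store = ContextStore()
store.add("memory/preferences.md", "User prefers concise answers.", "Full preference notes...")
store.add("resources/docs/api.md", "API overview.", "Full API reference text...")
store.add("skills/review/checklist.md", "Code review steps.", "1. Read the diff. 2. Run tests.")

overview = store.load("resources")               # cheap first pass
detail = store.load("skills/review", full=True)  # drill down where needed
```

The point of the two-layer load is cost control: the agent scans summaries first and pulls full bodies only for the directories that matter to the current task, while the trace records which layer was read for each path.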

The boundary is also clear. OpenViking helps organize and retrieve context, but it does not replace permissions, security, task planning, or business approvals. If a context database touches sensitive information, access control and audit logging still need to be designed separately.

Mirage: Mounting The Outside World As A File Tree

Mirage positions itself as “A Unified Virtual Filesystem For AI Agents.”

It focuses on a different practical issue: agents need to reach a growing number of tools. A single agent may need to read GitHub, inspect S3, search Slack, open Gmail, query Redis, and process local files. Each system has its own API, permission model, pagination behavior, error codes, and data shape. The more tools an agent uses, the more likely it is to fail at tool selection or parameter construction.

Mirage’s approach is to mount external data sources into one virtual filesystem. The repository examples show resources like /s3, /gdrive, /slack, /gmail, and /redis under one tree, so agents can explore, read, and combine data through file-like operations.

That design has two practical benefits.

First, it turns tool use into environment exploration. ls, cat, grep, directories, files, and paths are more stable abstractions than a large set of heterogeneous API schemas.

Second, it gives complex tasks an intermediate workspace. An agent can read data from different sources, organize it into files, produce intermediate outputs, and pass those outputs to the next step.
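
The mounting idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Mirage's implementation: VirtualFS and DictBackend are hypothetical names, and the in-memory backend stands in for real S3 or Slack connectors.

```python
class DictBackend:
    """Stand-in backend holding files in memory; a real one would call an API."""

    def __init__(self, files):
        self.files = files  # relative path -> text content

    def ls(self, rel):
        prefix = rel.strip("/")
        prefix = prefix + "/" if prefix else ""
        # Immediate children only: take the first path segment after the prefix.
        children = {p[len(prefix):].split("/")[0] for p in self.files if p.startswith(prefix)}
        return sorted(children)

    def cat(self, rel):
        return self.files[rel.strip("/")]


class VirtualFS:
    def __init__(self):
        self.mounts = {}  # mount point such as "/s3" -> backend

    def mount(self, point, backend):
        self.mounts[point.rstrip("/")] = backend

    def _resolve(self, path):
        # Longest mount point first so nested mounts would win over parents.
        for point in sorted(self.mounts, key=len, reverse=True):
            if path == point or path.startswith(point + "/"):
                return self.mounts[point], path[len(point):]
        raise FileNotFoundError(path)

    def ls(self, path):
        if path == "/":
            return sorted(point.lstrip("/") for point in self.mounts)
        backend, rel = self._resolve(path)
        return backend.ls(rel)

    def cat(self, path):
        backend, rel = self._resolve(path)
        return backend.cat(rel)


vfs = VirtualFS()
vfs.mount("/s3", DictBackend({"reports/q1.csv": "region,revenue\nemea,10"}))
vfs.mount("/slack", DictBackend({"general/today.txt": "deploy finished"}))

print(vfs.ls("/"))                             # mounted sources
print(vfs.cat("/slack/general/today.txt"))     # same call shape for every backend
```

From the agent's side, every backend answers the same two calls. The per-service differences (authentication, pagination, error handling) would live inside each backend class instead of in the prompt.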

Mirage is a good fit for agent scenarios where there are many data sources, but most operations are reading, searching, organizing, and combining information. Research assistants, data cleanup agents, operations assistants, and internal knowledge tools all fit this pattern.

But Mirage should not be treated as a universal replacement for APIs. Operations with side effects, such as transfers, deletions, approvals, or sending messages, still need stricter permissions, confirmations, and audit trails. A filesystem abstraction can reduce reading and organization complexity, but it should not blur the risk boundary of business actions.

SkillOpt: Making Skills Trainable And Testable

SkillOpt comes from Microsoft Research and describes itself as “Executive Strategy for Self-Evolving Agent Skills.”

It targets the problem of reusable agent experience. Many agent systems now have skills, prompts, playbooks, or SOPs. Most of them are still written by humans and tuned through manual trial and error. Whether a skill got better or worse is often judged by a few ad hoc tests.

SkillOpt takes a different approach: it does not update model weights, but instead optimizes external natural-language skills. It uses task trajectories, feedback, and validation gates to repeatedly edit skills, accepting changes only when validation improves. The output is a deployable artifact such as best_skill.md.

This is interesting because it upgrades a skill from “prompt experience” into a trainable asset.

For development teams, this suggests that skills may be maintained more like code:

prepare fixed evaluation sets for common task types;
record the impact of each skill change;
feed failure cases back into the optimization loop;
require human review for high-risk skills;
release stable skills into real agent workflows.
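
The validation gate is the part of this loop that is easiest to show in code. The sketch below is not SkillOpt's algorithm; it only illustrates the accept-when-validation-improves rule. The functions propose_edit and evaluate are hypothetical stand-ins (in practice an LLM-driven editor and a fixed evaluation set), and the toy "skill" is just a list of instructions.

```python
def optimize_skill(skill, propose_edit, evaluate, rounds=5):
    """Keep a candidate edit only when its validation score strictly improves."""
    best_skill = skill
    best_score = evaluate(best_skill)
    history = [(best_skill, best_score, "baseline")]
    for _ in range(rounds):
        candidate = propose_edit(best_skill)
        score = evaluate(candidate)
        accepted = score > best_score
        history.append((candidate, score, "accepted" if accepted else "rejected"))
        if accepted:
            best_skill, best_score = candidate, score
    return best_skill, best_score, history


# Toy stand-ins so the sketch runs: validation counts how many
# required steps the skill covers.
REQUIRED = {"read the diff", "run tests", "summarize risks"}
PROPOSALS = iter([
    "run tests",
    "use more emojis",      # does not help validation, so it is rejected
    "summarize risks",
])


def propose_edit(skill):
    return skill + [next(PROPOSALS)]


def evaluate(skill):
    return len(REQUIRED & set(skill)) / len(REQUIRED)


best, score, history = optimize_skill(["read the diff"], propose_edit, evaluate, rounds=3)
for candidate, value, status in history:
    print(status, round(value, 2), candidate)
```

Keeping the full history, including rejected candidates, is what lets a team record the impact of each change and feed failures back into the next round.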

SkillOpt is closer to research and methodology than a drop-in product for every agent platform. But the direction matters: as agents depend more on skills, skills themselves need versioning, evaluation, optimization, and release processes.

Why This Matters Now

Early AI applications were mostly chat.

A user provided input, and the model returned output. Context management mostly meant putting conversation history into the prompt, truncating it when it got too long, summarizing it, or adding a simple RAG layer.

Agents are different.

A real working agent constantly produces new state:

which files it read;
which tools it called;
which commands failed;
which user preferences were confirmed;
which lessons should be reused next time;
which external resources should be loaded only when needed;
which skills worked for this project and which did not.

If this information is scattered across chat history, vector stores, temporary files, and tool logs, the agent becomes difficult to control.

The issue is not only that it forgets. The deeper issue is that even when it remembers, it may not know how to use what it remembered.

Traditional RAG helps with part of the retrieval problem, but it often flattens information into chunks and similarity scores. For long-running agents, that is not enough. Agents need hierarchy, provenance, freshness, permissions, and task relationships. When something goes wrong, developers also need to understand why the agent retrieved the context it used.

That is why projects like OpenViking, Mirage, and SkillOpt are worth watching. They are not simply expanding the context window. They are rethinking how agents organize external state.
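
One way to picture the gap between a bare chunk and the metadata listed above is a record that carries hierarchy, provenance, freshness, permissions, and task relationship alongside the text. The sketch below is a hypothetical schema of my own; the field names and sample data are assumptions, not taken from any of the three projects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ContextItem:
    path: str             # hierarchy, e.g. "projects/billing/notes.md"
    text: str
    source: str           # provenance: where the content came from
    updated_at: datetime  # freshness
    readers: frozenset    # permissions: roles allowed to read this item
    task: str             # task relationship


def select_context(items, *, role, task, now, max_age):
    """Return items the role may read, tied to the task, and recent enough."""
    return [
        item for item in items
        if role in item.readers
        and item.task == task
        and now - item.updated_at <= max_age
    ]


now = datetime(2025, 1, 10)
items = [
    ContextItem("projects/billing/notes.md", "Invoices are sent monthly.",
                "wiki", datetime(2025, 1, 8), frozenset({"support", "finance"}), "billing"),
    ContextItem("projects/billing/old.md", "Invoices are sent weekly.",
                "chat-log", datetime(2024, 6, 1), frozenset({"support"}), "billing"),
    ContextItem("projects/billing/salaries.md", "Confidential.",
                "hr-system", datetime(2025, 1, 9), frozenset({"finance"}), "billing"),
]

usable = select_context(items, role="support", task="billing",
                        now=now, max_age=timedelta(days=30))
for item in usable:
    print(item.path, "from", item.source)
```

A pure similarity search could easily return all three items for a billing question. The metadata filter removes the stale note and the item the support role is not allowed to see before ranking even starts.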

Direction One: Context Becomes More Like A Filesystem

OpenViking’s core idea is straightforward: organize the memory, resources, and skills that agents need through a filesystem-like structure.

It emphasizes directories, files, layered loading, recursive retrieval, visualized retrieval traces, and automatic session management. The key point is not to vectorize everything. The key point is to give developers a structured and debuggable way to manage context.

The engineering judgment behind this is important:

An agent’s context should not be a black-box knowledge store. It should be an organized, observable, and debuggable workspace.

If an agent makes a wrong decision because of bad context, developers should not only see “the model answered incorrectly.” They should be able to inspect:

which directory it read;
which memory it used;
which resources it ignored;
why it selected a certain skill;
in which task loop the wrong assumption appeared.

This moves context management from retrieval augmentation toward context engineering.
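
The inspection questions above map naturally onto a small trace log. The following Python sketch assumes a hypothetical RetrievalTrace structure with made-up event kinds and paths; it shows the kind of record a developer would want, not how OpenViking stores its traces.

```python
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    step: int      # task loop iteration
    kind: str      # "read", "skip", or "skill"
    target: str    # path or skill name
    reason: str    # why the agent (or retriever) made this choice


@dataclass
class RetrievalTrace:
    events: list = field(default_factory=list)

    def record(self, step, kind, target, reason):
        self.events.append(TraceEvent(step, kind, target, reason))

    def by_kind(self, kind):
        return [e.target for e in self.events if e.kind == kind]

    def first_step_touching(self, target):
        """Earliest loop iteration in which a target appears in the trace."""
        steps = [e.step for e in self.events if e.target == target]
        return min(steps) if steps else None


trace = RetrievalTrace()
trace.record(1, "read", "resources/docs/deploy.md", "matched task keyword 'deploy'")
trace.record(1, "skip", "memory/2023-incident.md", "older than freshness limit")
trace.record(2, "skill", "skills/rollback", "previous command failed")
trace.record(3, "read", "memory/2023-incident.md", "fallback after rollback failed")

print(trace.by_kind("read"))
print(trace.first_step_touching("memory/2023-incident.md"))
```

Recording skips with a reason is the detail that pays off in debugging: it distinguishes "the agent never saw this" from "the agent saw it and chose not to load it."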

Direction Two: External Tools Become A Virtual File Tree

Mirage is more about the tool access layer.

It mounts multiple services and data sources into one virtual filesystem. Instead of learning a separate API interface for every backend, an agent can use familiar files, directories, paths, pipes, and commands to explore its environment.

The value lies in reducing the cognitive overhead of tool use.

Many agent tool integrations look powerful, but they are costly in practice. Every MCP server, SaaS API, and database has its own parameters, permissions, pagination behavior, error codes, and data shape. The more tools an agent has, the more likely it is to fail while choosing tools or constructing arguments.

If backend capabilities are exposed through one filesystem abstraction, the agent can at least rely on a stable mental model:

ls to see available resources;
cat to read content;
grep to search;
cp or mv to organize intermediate results;
snapshots or versions to manage a task workspace.

This does not mean filesystems can solve every problem. Complex business actions still require clear APIs, permissions, and transaction boundaries. But for the high-frequency agent actions of reading, searching, organizing, and combining, a virtual filesystem is a natural abstraction.
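
As a rough illustration of that mental model, the sketch below implements ls, cat, grep, and cp over a plain dictionary that stands in for a mounted tree. The paths and file contents are invented for the example, and none of the function signatures come from Mirage.

```python
import re

# A flat dict standing in for a mounted tree: absolute path -> text content.
tree = {
    "/slack/ops/2024-05-01.txt": "deploy started\ndeploy failed: timeout",
    "/gmail/inbox/alert.txt": "Subject: timeout on checkout service",
    "/s3/logs/app.log": "INFO ok\nERROR timeout contacting db",
}


def ls(tree, directory):
    """Immediate children of a directory."""
    prefix = directory.rstrip("/") + "/"
    return sorted({p[len(prefix):].split("/")[0] for p in tree if p.startswith(prefix)})


def cat(tree, path):
    return tree[path]


def grep(tree, pattern, directory="/"):
    """Yield (path, line_number, line) for every matching line under directory."""
    prefix = directory.rstrip("/") + "/"
    regex = re.compile(pattern)
    for path in sorted(tree):
        if not path.startswith(prefix):
            continue
        for number, line in enumerate(tree[path].splitlines(), start=1):
            if regex.search(line):
                yield path, number, line


def cp(tree, src, dst):
    tree[dst] = tree[src]


# One search across every backend, then results organized into a workspace.
hits = list(grep(tree, r"timeout"))
cp(tree, "/s3/logs/app.log", "/workspace/app.log")
tree["/workspace/timeout-report.txt"] = "\n".join(
    f"{path}:{number}: {line}" for path, number, line in hits
)

print(ls(tree, "/"))
print(cat(tree, "/workspace/timeout-report.txt"))
```

Everything here is a read or a write into the agent's own workspace. Nothing in the sketch sends a message or deletes a record, which is the class of action that still deserves an explicit API.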

Direction Three: Skills Become Optimizable Assets

SkillOpt focuses on another layer: how agents reuse experience.

Many agent platforms now have skills, instructions, playbooks, or prompt libraries. The problem is that most of them are hand-written and manually tuned. Whether a skill works is often judged by trying it a few times.

SkillOpt treats natural-language skills as trainable state outside the frozen model. It does not change model weights. It iterates skill documents through task trajectories, feedback, and validation gates, producing a deployable artifact such as best_skill.md.

That is interesting because it turns skills from prompt snippets into engineering assets.

A mature team may not only maintain code, tests, and documentation. It may also maintain:

skills for specific task types;
evaluation sets for each skill;
before-and-after success rates for skill changes;
rules for which tasks can be optimized automatically;
review gates before high-risk skills enter production.

This resembles testing, CI, and staged rollout in software engineering. The difference is that the optimization target is a natural-language strategy rather than code.
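
A release gate of this kind could look like the following Python sketch. The SkillChange record, the decision labels, and the sample results are all hypothetical; the point is only to show before-and-after success rates and a review gate for high-risk skills working together.

```python
from dataclasses import dataclass


@dataclass
class SkillChange:
    name: str
    high_risk: bool
    baseline_results: list   # True/False per evaluation task, before the change
    candidate_results: list  # True/False per evaluation task, after the change
    human_approved: bool = False


def success_rate(results):
    return sum(results) / len(results)


def release_decision(change, min_gain=0.0):
    """Return (decision, before, after) for a proposed skill change."""
    before = success_rate(change.baseline_results)
    after = success_rate(change.candidate_results)
    if after - before <= min_gain:
        return "reject", before, after
    if change.high_risk and not change.human_approved:
        return "needs_review", before, after
    return "release", before, after


summarize = SkillChange("summarize-ticket", False,
                        [True, False, True, False], [True, True, True, False])
refund = SkillChange("issue-refund", True,
                     [True, False, True, False], [True, True, True, True])
regressed = SkillChange("triage", False,
                        [True, True, True, False], [True, False, True, False])

for change in (summarize, refund, regressed):
    print(change.name, release_decision(change))
```

Note that the refund skill improves the most and still does not ship automatically. Measured gain and permission to release are deliberately separate checks.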

Why This Is Bigger Than Any Single Project

Looking at these projects together, the agent application stack is starting to separate into layers:

Layer	Previous Approach	What Is Emerging
Model layer	Pick GPT, Claude, Gemini, or DeepSeek	Model routing, cost control, edge models
Tool layer	Write API wrappers or MCP servers	Tool gateways, permissions, sandboxes, virtual filesystems
Context layer	Chat history, RAG, handwritten memory	Context databases, layered loading, retrieval traces
Experience layer	Prompts, system instructions, human SOPs	Skills, evaluation, automatic optimization
Operations layer	Watch tokens and logs	Cost, permissions, audit, replay, failure analysis

This suggests that competition among agent products will increasingly be decided by engineering, not only by model choice.

In the future, the quality of an agent product will not only depend on which model it uses. It will also depend on whether it has a stable context system: whether it remembers the right things, forgets noisy information, loads context on demand, explains what context it used, and turns successful workflows into reusable skills.

Practical Takeaways For Developers

If you are building an agent application, you do not necessarily need to adopt these projects immediately. But you should start treating context as a first-class design problem.

I would check the following questions first:

Is context structured?
Do not dump everything into the prompt. At least separate user preferences, project documentation, task logs, tool outputs, long-term memory, and temporary drafts.
Does context have provenance?
Every memory, resource, and skill should trace back to its source. Otherwise, when an agent acts on bad information, debugging becomes painful.
Does context have a lifecycle?
Some information is only useful for the current task. Some should be retained long term. Some must expire. Long-term memory is not better just because it is larger.
Is context observable?
At minimum, record what the agent read, what it skipped, which skill it used, and where it failed.
Are skills evaluated?
If a skill affects real task outcomes, do not change it purely by feel. Prepare fixed tasks and judge changes by success rate, latency, cost, and error patterns.
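
Of these checks, lifecycle is the one I see skipped most often, so here is a small Python sketch of it. The Entry fields and the prune rules are assumptions made for illustration: task-scoped entries disappear when their task finishes, and any entry with an expiry is dropped once that time has passed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Entry:
    key: str
    value: str
    scope: str                             # "task" or "long_term"
    task_id: Optional[str] = None          # set for task-scoped entries
    expires_at: Optional[datetime] = None  # hard expiry, if any


def prune(entries, *, now, finished_tasks):
    """Drop entries whose task has finished or whose expiry has passed."""
    kept = []
    for entry in entries:
        if entry.scope == "task" and entry.task_id in finished_tasks:
            continue
        if entry.expires_at is not None and entry.expires_at <= now:
            continue
        kept.append(entry)
    return kept


now = datetime(2025, 3, 1)
entries = [
    Entry("draft", "temporary outline", "task", task_id="t-1"),
    Entry("tone", "user prefers short replies", "long_term"),
    Entry("api-token-hint", "rotates monthly", "long_term",
          expires_at=now - timedelta(days=1)),
    Entry("scratch", "notes for running task", "task", task_id="t-2"),
]

remaining = prune(entries, now=now, finished_tasks={"t-1"})
print([entry.key for entry in remaining])
```

Running a pass like this at the end of every task keeps long-term memory from silently filling up with drafts and expired facts.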

Boundaries

This kind of infrastructure should not be over-romanticized.

First, a context system does not replace a permission system. What an agent can see, call, and write still needs independent control.

Second, a virtual filesystem should not turn every business operation into a file operation. Actions such as transactions, approvals, deletions, and external messages need explicit confirmation and auditability.

Third, automatic skill optimization is not magic. If the evaluation set is weak, the optimized skill may overfit a few tasks and become more fragile in real use.

Fourth, these open-source projects are moving quickly. OpenViking, Mirage, and SkillOpt are worth watching, but before production use, teams still need to evaluate licenses, data boundaries, dependency complexity, and maintenance pace.

Conclusion

The next stage of agents will not be driven by stronger models alone.

Models still matter, but they only solve part of the reasoning problem. What makes agents work over longer horizons is another layer: how context is organized, how tools are exposed, how experience is captured, how failures are replayed, and how permissions are narrowed.

The shared signal from OpenViking, Mirage, and SkillOpt is this: agents need their own infrastructure layer.

If the last year was about making models use tools, the next question is more operational:

When tools multiply, tasks get longer, and context becomes more complex, what keeps an agent working reliably?

The answer is probably not a longer prompt. It is a manageable, observable, and iterative context system.