# Memory Space

## Overview

Memory Space is autobotAI's native RAG (Retrieval-Augmented Generation) engine. It acts as dedicated long-term memory for your AI Agents, allowing them to store and retrieve information across bot executions and sessions.

Unlike standard LLM context windows, which are ephemeral, Memory Spaces use a vector store (Amazon DocumentDB) to index information semantically. This lets your agents "think" with your private documentation, previous incident reports, or real-time infrastructure data.
## Key Capabilities
- Semantic Search: Find information based on meaning, not just keywords.
- Automated Ingestion: Feed bot outputs (like API results) directly into memory.
- Advanced Indexing: AI-generated Embedding Strategies for structured data.
- Data Sovereignty: All vectors are stored within your own AWS environment.
## Architecture & Concepts

### 1. Memory Space

The top-level container. Each space is tied to a specific AI Integration, which provides the embedding model used to index data.
Embedding Models (set automatically based on integration type):

- OpenAI: `text-embedding-3-small`
- AWS Bedrock: `amazon.titan-embed-text-v2:0`
### 2. Memory Artifacts

The individual "knowledge units" within a space. An artifact can be:

- A Document (`origin: FILE`): Uploaded files processed and chunked into the vector store.
- A Data Stream (`origin: NODE_RESULT`): Dynamic output from a workflow node (e.g., a live list of EC2 instances).
### 3. Embedding Strategy

For `NODE_RESULT` artifacts, autobotAI uses AI to analyze the node output and auto-generate an `EmbeddingStrategy`. This controls how structured data is converted into searchable vectors.

Note: The embedding strategy is immutable — it cannot be changed after the artifact is created.
| Field | Type | Default |
|---|---|---|
| `strategy_type` | string | — |
| `chunk_size` | integer | `1024` |
| `chunk_overlap` | integer | `50` |
| `fields_to_embed` | list[string] | `null` |
| `metadata_fields` | list[string] | `null` |
| `primary_key` | string | `null` |
| `deduplicate_by_primary_key` | boolean | `false` |
| `custom_instruction` | string | `null` |
Field descriptions:
- `strategy_type` (required) — The indexing approach: `"semantic"` (meaning-based), `"keyword"` (exact match), or `"hybrid"` (both). Use `"hybrid"` for data that mixes descriptive text with technical codes or IDs.
- `chunk_size` — Maximum token size per text chunk sent to the embedding model. Default is `1024`.
- `chunk_overlap` — Overlapping tokens between adjacent chunks to prevent context loss at boundaries. Default is `50`.
- `fields_to_embed` — The most impactful field. Specifies which fields from the node output are converted into vectors for semantic search. If `null`, the entire JSON object is stringified as one block — functional, but it degrades search quality for records with many non-descriptive fields (IDs, timestamps, etc.).
- `metadata_fields` — Fields stored as filterable metadata alongside the vector but not used for semantic search. Use for structured attributes like `"region"`, `"status"`, or `"severity"` so agents can filter results precisely.
- `primary_key` — A unique field name per record (e.g., `"instance_id"`). Enables upsert logic — see Data Management. Without it, new documents are appended on every bot run.
- `deduplicate_by_primary_key` — When `true`, records with a `null` or duplicate `primary_key` within the same batch are dropped (first occurrence kept). Default is `false`.
- `custom_instruction` — Optional preprocessing note for the AI, e.g., `"Concatenate name and description before embedding"`.
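The `deduplicate_by_primary_key` behavior can be sketched in a few lines (a minimal illustration of the documented semantics, not the actual implementation):

```python
def deduplicate(records, primary_key):
    """Drop records whose primary key is null or already seen in this batch.

    Sketch of deduplicate_by_primary_key=true: the first occurrence
    of each key is kept, later duplicates and null keys are dropped.
    """
    seen = set()
    kept = []
    for record in records:
        key = record.get(primary_key)
        if key is None or key in seen:
            continue  # null or duplicate key: drop the record
        seen.add(key)
        kept.append(record)
    return kept

batch = [
    {"instance_id": "i-0abc123", "state": "running"},
    {"instance_id": "i-0abc123", "state": "stopped"},  # duplicate -> dropped
    {"instance_id": None, "state": "running"},          # null key -> dropped
    {"instance_id": "i-0def456", "state": "running"},
]
result = deduplicate(batch, "instance_id")
# result keeps only the first i-0abc123 record and i-0def456
```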
Example — given this node output:

```json
{
  "instance_id": "i-0abc123",
  "region": "us-east-1",
  "state": "running",
  "title": "Publicly Exposed EC2 Instance",
  "description": "Instance has a public IP reachable from 0.0.0.0/0",
  "recommendation": "Restrict the security group to known CIDRs"
}
```
A well-configured strategy:

```json
{
  "strategy_type": "hybrid",
  "fields_to_embed": ["title", "description", "recommendation"],
  "metadata_fields": ["region", "state"],
  "primary_key": "instance_id"
}
```
Only `title`, `description`, and `recommendation` are semantically indexed. `region` and `state` are available for filtering. `instance_id` ensures records are upserted rather than duplicated on each bot run.
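How `chunk_size` and `chunk_overlap` interact can be illustrated with a simplified sliding-window splitter (a sketch only: it operates on characters, whereas the real pipeline counts tokens):

```python
def chunk_text(text, chunk_size=1024, chunk_overlap=50):
    """Split text into overlapping windows.

    Each chunk starts (chunk_size - chunk_overlap) units after the
    previous one, so adjacent chunks share chunk_overlap units of
    context and no boundary information is lost.
    """
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# 300-character input, 100-char chunks, 20-char overlap -> 4 chunks
chunks = chunk_text("abcdefghij" * 30, chunk_size=100, chunk_overlap=20)
```

Each consecutive pair of chunks shares 20 characters, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.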
## Setting Up Memory Space
- Navigate to Memory Spaces in the navigation bar.
- Click Create Memory Space.
- Fill in the required fields:
  - Name: A descriptive name (minimum 5 characters).
  - Description: Purpose of this memory space (minimum 20 characters).
  - Integration Type: The AI provider (e.g., OpenAI or AWS Bedrock).
  - Account ID: The specific integration account to use for embeddings.
- Click Create.
## Creating and Managing Artifacts

### 1. Manual Ingestion (File Upload)
For static knowledge like playbooks, compliance documents, or reference data.
- Open your Memory Space and click Add Artifact.
- Provide a Name and Description.
- Upload a file. Supported formats: `.pdf`, `.docx`, `.pptx`, `.ppt`, `.pptm`, `.txt`, `.md`, `.csv`, `.json`, `.yaml`, `.yml`, `.xls`, `.xlsx`, `.epub`, `.ipynb`.
- The system chunks and embeds the file automatically.
### 2. Automated Capture (Node Output)
For building a dynamic knowledge base from workflow automation.
1. In a bot workflow, select an action node (e.g., REST API or Python). Important: the node must have run at least once successfully so the system can analyze its output structure.
2. In the node's Advanced Settings, enable Save Output to Memory Space Artifact.
3. Select your Memory Space and click Create New Artifact.
4. The system analyzes the last execution output and auto-generates an Embedding Strategy.
5. Give the artifact a name and description, then save.
On every subsequent successful bot run, the node output is automatically synced to the artifact — only changed records are re-embedded (see Data Management).
## Using Memory in AI Agents

### 1. Connecting the Space

In the AI Agent Node, select your Memory Space in the Memory Space dropdown. This automatically equips the agent with the `query_memory_space` tool, which it uses to search artifact data for context-aware responses.
### 2. Episodic Memory (Conversation Memory)

In the Advanced section of the AI Agent Node, the Use Memory toggle enables Episodic/Conversation Memory — summarizing past turns to maintain context within a session. This adds the `load_memory_tool` and is distinct from the long-term knowledge retrieval provided by a Memory Space.
### 3. How the Agent Queries
The agent autonomously decides when to search, reformulating the user's request into a standalone search query.
- Top K: Number of relevant chunks to retrieve (default 25).
- Artifact Filtering: The agent can scope its search to a specific `artifact_id` when relevant.
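Conceptually, the retrieval behind `query_memory_space` is a filtered top-k similarity search. A toy sketch follows (illustrative only: the real search runs against the DocumentDB vector index, not over in-memory Python lists):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def query_memory_space(query_vector, chunks, top_k=25, artifact_id=None):
    """Toy top-k search: optionally scope to one artifact, then rank by similarity."""
    candidates = [
        c for c in chunks
        if artifact_id is None or c["artifact_id"] == artifact_id
    ]
    candidates.sort(key=lambda c: cosine(query_vector, c["vector"]), reverse=True)
    return candidates[:top_k]

# Tiny 2-D "embeddings" for demonstration
chunks = [
    {"artifact_id": "a1", "vector": [1.0, 0.0], "text": "public EC2 instance"},
    {"artifact_id": "a1", "vector": [0.0, 1.0], "text": "compliance playbook"},
    {"artifact_id": "a2", "vector": [0.9, 0.1], "text": "exposed security group"},
]
hits = query_memory_space([1.0, 0.0], chunks, top_k=2)
```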
## Data Management & Logic

### Upsert and Incremental Updates
Memory Space never blindly re-embeds all data on every bot run. Instead it performs an intelligent sync:
| Event | Behavior |
|---|---|
| Unchanged record | Content is MD5-hashed and compared. If hash matches, the record is skipped entirely. |
| New record | Inserted into the vector store. |
| Modified record | Old vector entry deleted; new one generated from updated content. |
| Deleted record | If the node output contains `modification_type: "delete"` on a record, that entry is removed from the vector store. |
The `primary_key` field in the Embedding Strategy is what makes this upsert logic work — without it, new documents are always appended and duplicates accumulate over time.
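The per-record sync decision described in the table can be sketched as follows (the helper names and the exact hashing canonicalization are assumptions for illustration; the real logic runs server-side):

```python
import hashlib
import json

def record_hash(record):
    """MD5 over the canonicalized record content, used to detect changes."""
    return hashlib.md5(json.dumps(record, sort_keys=True).encode()).hexdigest()

def classify(record, stored_hashes, primary_key="instance_id"):
    """Decide the sync action for one record.

    stored_hashes maps primary-key value -> MD5 hash from the last run.
    """
    if record.get("modification_type") == "delete":
        return "delete"            # explicit deletion marker
    key = record.get(primary_key)
    if key not in stored_hashes:
        return "insert"            # new record
    if stored_hashes[key] == record_hash(record):
        return "skip"              # unchanged: no re-embedding
    return "replace"               # modified: delete old vector, embed new one

previous = {"i-0abc123": record_hash({"instance_id": "i-0abc123", "state": "running"})}
print(classify({"instance_id": "i-0abc123", "state": "running"}, previous))  # skip
print(classify({"instance_id": "i-0abc123", "state": "stopped"}, previous))  # replace
print(classify({"instance_id": "i-0new", "state": "running"}, previous))     # insert
```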
### Artifact Sync Status
Each artifact tracks its last sync state, visible on the artifact detail page:
| Status | Meaning |
|---|---|
| `IN_PROGRESS` | Embedding is actively being generated. Do not re-trigger while in this state. |
| `SUCCESS` | Data was successfully embedded. Chunk Count shows how many vector chunks are stored. |
| `FAILED` | Last sync failed. The `last_sync_error` field on the artifact detail page contains the exact error. |
### Re-Syncing an Artifact
If an artifact was not synced after a successful bot run (e.g., during initial setup or after the strategy was misconfigured), re-trigger the sync by re-running the bot or the specific node from the execution page. The sync runs automatically once the node completes successfully.
### Strategy Validation
Before each embedding run, the system validates the strategy against the actual node output:
- `fields_to_embed`: Hard fails if none of the listed fields are present; warns if only some are missing.
- `metadata_fields`: Warning only — missing fields are skipped.
- `primary_key`: Hard fails if the field does not exist in the data.
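These rules can be expressed as a small check against one sample record (a sketch of the documented behavior, not the service's own code):

```python
def validate_strategy(strategy, sample_record):
    """Apply the documented validation rules to one sample record."""
    errors, warnings = [], []

    embed = strategy.get("fields_to_embed") or []
    present = [f for f in embed if f in sample_record]
    if embed and not present:
        errors.append("fields_to_embed: none of the listed fields are present")
    elif embed and len(present) < len(embed):
        warnings.append("fields_to_embed: some listed fields are missing")

    for f in strategy.get("metadata_fields") or []:
        if f not in sample_record:
            warnings.append(f"metadata_fields: '{f}' missing, will be skipped")

    pk = strategy.get("primary_key")
    if pk and pk not in sample_record:
        errors.append(f"primary_key: '{pk}' does not exist in the data")

    return {"valid": not errors, "errors": errors, "warnings": warnings}

record = {"instance_id": "i-0abc123", "title": "Exposed EC2", "region": "us-east-1"}
strategy = {
    "strategy_type": "hybrid",
    "fields_to_embed": ["title", "description"],   # "description" missing -> warning
    "metadata_fields": ["region", "severity"],     # "severity" missing -> warning
    "primary_key": "instance_id",
}
result = validate_strategy(strategy, record)
```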
Use the Validate Embedding Strategy endpoint to pre-check before enabling on a scheduled bot:
```
POST /memory_spaces/{memory_space_id}/artifacts/validate_embedding_strategy
```

```json
{
  "artifact_id": "<artifact_id>",
  "bot_id": "<bot_id>",
  "execution_id": "<bot_execution_id>",
  "node_id": "<node_name>"
}
```

Returns `{"valid": true}` on success, or `{"valid": false, "error": "<reason>"}` on failure.
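A pre-check from a script might look like the following. The base URL, bearer-token auth scheme, and all example IDs are placeholders (consult your autobotAI API credentials); only the endpoint path and body fields come from the section above.

```python
import json
import urllib.request

# Placeholders: replace with your autobotAI API base URL and token.
BASE_URL = "https://autobotai.example.com/api"
API_TOKEN = "<your_api_token>"

def build_validation_request(memory_space_id, artifact_id, bot_id, execution_id, node_id):
    """Build the validate_embedding_strategy POST request (URL + JSON body)."""
    url = f"{BASE_URL}/memory_spaces/{memory_space_id}/artifacts/validate_embedding_strategy"
    body = {
        "artifact_id": artifact_id,
        "bot_id": bot_id,
        "execution_id": execution_id,
        "node_id": node_id,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",  # assumed auth scheme
        },
        method="POST",
    )

req = build_validation_request("ms_123", "art_456", "bot_789", "exec_001", "fetch_ec2")
# To send: resp = json.load(urllib.request.urlopen(req)); check resp["valid"]
# before enabling the bot on a schedule.
```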
### Vector Store Isolation
Each Memory Space gets its own isolated collection in DocumentDB, named:
```
{model_provider}_memory_space_{sanitized_root_user_id}_{memory_space_id}
```
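For example (all three values below are illustrative, not real identifiers), an OpenAI-backed space maps to a collection name like this:

```python
def collection_name(model_provider, sanitized_root_user_id, memory_space_id):
    """Compose the per-space DocumentDB collection name from its parts."""
    return f"{model_provider}_memory_space_{sanitized_root_user_id}_{memory_space_id}"

name = collection_name("openai", "user_42", "ms_123")
# -> "openai_memory_space_user_42_ms_123"
```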
## Best Practices
- Set `fields_to_embed` for node output artifacts: Without this, the entire JSON object is embedded as one block, diluting vector quality with non-descriptive fields (IDs, timestamps, etc.).
- Use `metadata_fields` for filtering attributes: Fields like `region`, `status`, or `severity` belong here — not in `fields_to_embed` — so agents can filter results without polluting the semantic vector.
- Always set `primary_key` for recurring bots: Without a primary key, every run appends new documents. With one, only changed records are re-embedded.
- Validate before scheduling: Use the validate endpoint to confirm strategy compatibility with node output before enabling on a recurring schedule.
- Use descriptive artifact names: Clear names (e.g., `EC2_Exposure_Findings`, `FY25_Compliance_Audit`) help the agent identify what it's searching.
- Keep node output schema consistent: If the output schema changes across runs, the existing strategy may miss fields — validate after any schema change.