Overview

When documents are fed to Vespa, they pass through an indexing pipeline that transforms and processes them before storage. This page covers:
  • Indexing Language - Declarative expressions for field transformations
  • Document Processors - Custom Java components for complex processing
  • Indexing Pipeline - The complete flow from ingestion to storage

Indexing Language

The indexing language is a domain-specific language for transforming document fields during indexing.

Basic Syntax

Define indexing statements in your schema:
schema music {
    document music {
        field title type string {
            indexing: summary | index
        }
        
        field artist type string {
            indexing: summary | attribute
        }
        
        field year type int {
            indexing: summary | attribute
        }
    }
}

Indexing Expressions

The indexing language supports various expressions for field manipulation:

Input Expression

Read a field value from the document:
field my_field type string {
    indexing: input title | lowercase | index
}
Reference: indexinglanguage/src/main/java/com/yahoo/vespa/indexinglanguage/expressions/InputExpression.java:19

Output Expressions

Specify where to store the processed value:
Store in memory index for full-text search:
field title type string {
    indexing: input title | lowercase | index
}
Store in in-memory attribute for fast access, filtering, and sorting:
field year type int {
    indexing: attribute
}
Store in document summary for retrieval:
field description type string {
    indexing: summary
}
Reference: indexinglanguage/src/main/java/com/yahoo/vespa/indexinglanguage/expressions/IndexExpression.java:7

Transformation Expressions

The indexing language provides 87+ built-in expressions for data transformation:
field normalized_title type string {
    indexing: input title | lowercase | trim | normalize | index
}

field tokens type string {
    indexing: input text | tokenize | index
}

Common Expressions

| Expression | Description | Example |
|------------|-------------|---------|
| input | Read field value | input title |
| lowercase | Convert to lowercase | lowercase |
| tokenize | Split into tokens | tokenize |
| normalize | Unicode normalization | normalize |
| trim | Remove whitespace | trim |
| index | Store in index | index |
| attribute | Store as attribute | attribute |
| summary | Include in summary | summary |
| embed | Generate embeddings | embed embedder_name |
| flatten | Flatten nested structures | flatten |
| for_each | Process array elements | for_each { ... } |
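
As a rough mental model, a chain of expressions joined by | behaves like function composition: each stage receives the previous stage's output. The sketch below is plain Java, not Vespa code; PipeSketch, pipe, and lowercaseThenTrim are hypothetical names, and Vespa's own lowercase and trim may treat Unicode differently from the Java string methods used here.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Illustrative model only: mimics "input title | lowercase | trim" with plain Java.
class PipeSketch {

    // Feed the value through each stage in order, like expressions joined by |.
    static String pipe(String input, List<UnaryOperator<String>> stages) {
        String value = input;
        for (UnaryOperator<String> stage : stages) {
            value = stage.apply(value);
        }
        return value;
    }

    // Rough equivalent of: lowercase | trim
    static String lowercaseThenTrim(String title) {
        List<UnaryOperator<String>> stages = List.of(
                s -> s.toLowerCase(Locale.ROOT),
                String::trim);
        return pipe(title, stages);
    }

    public static void main(String[] args) {
        // prints "[kind of blue]"
        System.out.println("[" + lowercaseThenTrim("  Kind Of Blue ") + "]");
    }
}
```

Because stages run left to right, reordering them can change the result when one stage depends on another's output.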

Control Flow

Choice Expression

Conditional processing based on field presence:
field display_title type string {
    indexing: (input title || input name || "Untitled") | summary
}
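
The fallback behavior of the choice expression can be pictured as "take the first alternative that yields a value". The following plain-Java sketch assumes an alternative is skipped when its field is not set; ChoiceSketch and displayTitle are hypothetical names, and the document is modeled as a simple map rather than a Vespa Document.

```java
import java.util.Map;
import java.util.Objects;
import java.util.stream.Stream;

// Illustrative model only: mimics (input title || input name || "Untitled").
class ChoiceSketch {

    // Return the first field that is present, else the literal fallback.
    static String displayTitle(Map<String, String> doc) {
        return Stream.of(doc.get("title"), doc.get("name"))
                .filter(Objects::nonNull)
                .findFirst()
                .orElse("Untitled");
    }

    public static void main(String[] args) {
        System.out.println(displayTitle(Map.of("name", "Blue Train"))); // prints "Blue Train"
        System.out.println(displayTitle(Map.of()));                     // prints "Untitled"
    }
}
```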

ForEach Expression

Process array elements:
field normalized_tags type array<string> {
    indexing: input tags | for_each { lowercase | trim } | index
}
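
Conceptually, for_each applies the inner expression chain to every element and keeps the results in the same order. A minimal plain-Java sketch of that idea follows; ForEachSketch and normalizeTags are hypothetical names, and the Java string methods only approximate Vespa's lowercase and trim.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Illustrative model only: mimics "input tags | for_each { lowercase | trim }".
class ForEachSketch {

    // Apply lowercase, then trim, to each element; element order is preserved.
    static List<String> normalizeTags(List<String> tags) {
        return tags.stream()
                .map(tag -> tag.toLowerCase(Locale.ROOT).trim())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints "[jazz, bebop]"
        System.out.println(normalizeTags(List.of(" Jazz", "BeBop ")));
    }
}
```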

Script Expressions

Chain multiple operations in one statement, separated by |:
field processed_text type string {
    indexing: input raw_text | lowercase | trim | tokenize | normalize | index | summary
}

Embedding Generation

Generate embeddings during indexing:
schema doc {
    document doc {
        field text type string {
            indexing: summary | index
        }
    }
    
    field embedding type tensor<float>(x[384]) {
        indexing: input text | embed embedder | attribute
    }
}
The embed expression requires configuring an embedder in your services.xml.

Document Processors

Document processors are Java components that perform custom processing on documents before they’re indexed.

Creating a Document Processor

Extend DocumentProcessor and implement the process method:
import com.yahoo.docproc.DocumentProcessor;
import com.yahoo.docproc.Processing;
import com.yahoo.document.Document;
import com.yahoo.document.DocumentOperation;
import com.yahoo.document.DocumentPut;
import com.yahoo.document.datatypes.FieldValue;
import com.yahoo.document.datatypes.StringFieldValue;

public class MusicEnricherProcessor extends DocumentProcessor {

    @Override
    public Progress process(Processing processing) {
        for (DocumentOperation op : processing.getDocumentOperations()) {
            // Only puts carry a full document to enrich
            if (op instanceof DocumentPut) {
                DocumentPut put = (DocumentPut) op;
                Document doc = put.getDocument();

                // Enrich document
                enrichDocument(doc);
            }
        }
        return Progress.DONE;
    }

    private void enrichDocument(Document doc) {
        // getFieldValue returns a FieldValue wrapper, not a String
        FieldValue value = doc.getFieldValue("artist");
        if (value instanceof StringFieldValue) {
            String artist = ((StringFieldValue) value).getString();
            // Add normalized artist field (the field must be declared in the schema)
            doc.setFieldValue("artist_normalized",
                artist.toLowerCase().trim());
        }
    }
}
Reference: docproc/src/main/java/com/yahoo/docproc/DocumentProcessor.java:45

Processing Return Values

Document processors return a Progress value indicating the outcome:
// Processing completed successfully
return Progress.DONE;
Reference: docproc/src/main/java/com/yahoo/docproc/DocumentProcessor.java:108-150

Accessing Document Operations

The Processing object contains all document operations:
import com.yahoo.docproc.Processing;
import com.yahoo.document.DocumentOperation;
import com.yahoo.document.DocumentPut;
import com.yahoo.document.DocumentUpdate;
import com.yahoo.document.DocumentRemove;

@Override
public Progress process(Processing processing) {
    for (DocumentOperation op : processing.getDocumentOperations()) {
        if (op instanceof DocumentPut) {
            DocumentPut put = (DocumentPut) op;
            processPut(put.getDocument());
        } else if (op instanceof DocumentUpdate) {
            DocumentUpdate update = (DocumentUpdate) op;
            processUpdate(update);
        } else if (op instanceof DocumentRemove) {
            DocumentRemove remove = (DocumentRemove) op;
            processRemove(remove.getId());
        }
    }
    return Progress.DONE;
}
Reference: docproc/src/main/java/com/yahoo/docproc/Processing.java:204-207

Context Variables

Store and retrieve context data across processors:
@Override
public Progress process(Processing processing) {
    // Set context variable
    processing.setVariable("start_time", System.currentTimeMillis());
    
    // Get context variable
    Long startTime = (Long) processing.getVariable("start_time");
    
    // Check if variable exists
    if (processing.hasVariable("user_id")) {
        String userId = (String) processing.getVariable("user_id");
    }
    
    return Progress.DONE;
}
Reference: docproc/src/main/java/com/yahoo/docproc/Processing.java:140-176

Asynchronous Processing

For operations requiring external calls:
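import com.yahoo.docproc.DocumentProcessor;
import com.yahoo.docproc.Processing;
import com.yahoo.document.Document;
import com.yahoo.document.DocumentOperation;
import com.yahoo.document.DocumentPut;
import com.yahoo.document.datatypes.FieldValue;
import java.util.concurrent.CompletableFuture;

public class AsyncEnricherProcessor extends DocumentProcessor {

    @Override
    @SuppressWarnings("unchecked")
    public Progress process(Processing processing) {
        for (DocumentOperation op : processing.getDocumentOperations()) {
            if (!(op instanceof DocumentPut)) {
                continue;
            }
            Document doc = ((DocumentPut) op).getDocument();
            String key = "enrichment_" + doc.getId();

            // First call for this document: start the lookup and remember the future
            CompletableFuture<ArtistInfo> future =
                (CompletableFuture<ArtistInfo>) processing.getVariable(key);
            if (future == null) {
                FieldValue artist = doc.getFieldValue("artist");
                if (artist == null) {
                    continue;
                }
                future = fetchArtistInfo(artist.toString());
                processing.setVariable(key, future);
            }

            // Lookup still running: return LATER to be called again
            if (!future.isDone()) {
                return Progress.LATER;
            }

            // Lookup finished: apply the result on the processing thread
            if (!future.isCompletedExceptionally()) {
                doc.setFieldValue("genre", future.join().getGenre());
            }
        }
        return Progress.DONE;
    }

    // ArtistInfo and fetchArtistInfo(String) are application-specific and not shown here
}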
When returning Progress.LATER, the processor will be called again. Ensure you track state to avoid infinite loops.
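
To see why state tracking matters, the re-invocation cycle can be modeled in plain Java. This sketch is not Vespa code: the Progress enum, the context map (a stand-in for context variables), and the driver loop in runToCompletion are all hypothetical, and Vespa's actual scheduling of LATER is not described here. The fake operation pretends to need two extra polls before its result is ready.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: a processor that returns LATER until its work is ready.
class LaterSketch {

    enum Progress { DONE, LATER }

    // The context map stands in for per-processing context variables.
    static Progress process(Map<String, Object> context) {
        if (!context.containsKey("async_started")) {
            // First call: start the (pretend) async work exactly once.
            context.put("async_started", true);
            context.put("polls_remaining", 2);
            return Progress.LATER;
        }
        int remaining = (Integer) context.get("polls_remaining");
        if (remaining > 0) {
            // Work not finished yet: ask to be called again.
            context.put("polls_remaining", remaining - 1);
            return Progress.LATER;
        }
        return Progress.DONE;
    }

    // Call process until it stops returning LATER; returns the number of calls made.
    static int runToCompletion(Map<String, Object> context, int maxCalls) {
        int calls = 0;
        Progress progress;
        do {
            progress = process(context);
            calls++;
        } while (progress == Progress.LATER && calls < maxCalls);
        return calls;
    }

    public static void main(String[] args) {
        // prints "4": one call to start, two polls, one call that finishes
        System.out.println(runToCompletion(new HashMap<>(), 10));
    }
}
```

Without the async_started check, every re-invocation would start the work again and the processor would never reach DONE.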

Configuring Document Processors

Define processors in services.xml:
<services version="1.0">
    <container version="1.0" id="default">
        <document-processing>
            <chain id="default" inherits="indexing">
                <documentprocessor id="com.example.MusicEnricherProcessor"/>
                <documentprocessor id="com.example.ValidationProcessor"/>
            </chain>
        </document-processing>
        
        <nodes>
            <node hostalias="node1"/>
        </nodes>
    </container>
</services>

Multiple Processing Chains

Create different chains for different document types:
<document-processing>
    <chain id="music-chain" inherits="indexing">
        <documentprocessor id="com.example.MusicEnricherProcessor"/>
    </chain>
    
    <chain id="user-chain" inherits="indexing">
        <documentprocessor id="com.example.UserValidationProcessor"/>
    </chain>
</document-processing>

Indexing Pipeline

The complete indexing flow:
1. Document Reception
   Vespa receives the document via feed client or HTTP API.

2. Document Processing
   Document processors in the chain execute sequentially:
   Document → Processor 1 → Processor 2 → ... → Processor N

3. Indexing Language Execution
   Field-level transformations defined in the schema are applied.

4. Storage
   Processed document is stored:
  • Fields marked index go to memory index
  • Fields marked attribute go to attribute storage
  • Fields marked summary go to document summary
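
The sequential hand-off between processors can be sketched in plain Java as a list of steps applied in order to the same document. This is a simplified model, not the Vespa API: ChainSketch, runChain, and the artist_length field are hypothetical, and the document is represented as a map of strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative model only: processors run one after another on the same document.
class ChainSketch {

    // Apply each processor in order to a mutable copy of the document.
    static Map<String, String> runChain(Map<String, String> doc,
                                        List<Consumer<Map<String, String>>> processors) {
        Map<String, String> working = new HashMap<>(doc);
        for (Consumer<Map<String, String>> processor : processors) {
            processor.accept(working);
        }
        return working;
    }

    static Map<String, String> example() {
        List<Consumer<Map<String, String>>> chain = List.of(
                // Processor 1: derive a normalized field
                d -> d.put("artist_normalized", d.get("artist").toLowerCase(Locale.ROOT).trim()),
                // Processor 2: relies on the field written by processor 1
                d -> d.put("artist_length", String.valueOf(d.get("artist_normalized").length())));
        return runChain(Map.of("artist", " Miles Davis "), chain);
    }

    public static void main(String[] args) {
        System.out.println(example().get("artist_normalized")); // prints "miles davis"
        System.out.println(example().get("artist_length"));     // prints "11"
    }
}
```

The second step only works because it runs after the first, which is why the order of processors in a chain matters.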
Error Handling

Handle errors in document processors:
@Override
public Progress process(Processing processing) {
    try {
        for (DocumentOperation op : processing.getDocumentOperations()) {
            // validateOperation and ValidationException are application-defined
            validateOperation(op);
        }
        return Progress.DONE;
    } catch (ValidationException e) {
        // log is assumed to be a java.util.logging.Logger field of the processor
        log.warning("Validation failed: " + e.getMessage());
        return Progress.FAILED.withReason(e.getMessage());
    } catch (Exception e) {
        log.severe("Unexpected error: " + e.getMessage());
        // PERMANENT_FAILURE disables the processor; reserve it for unrecoverable errors
        return Progress.PERMANENT_FAILURE;
    }
}

Timeouts

Monitor and enforce timeouts:
import java.time.Duration;

@Override
public Progress process(Processing processing) {
    Duration timeLeft = processing.timeLeft();

    if (timeLeft.toMillis() < 1000) {
        log.warning("Processing timeout approaching");
        // Not enough time left: fail this operation with an explanatory reason
        return Progress.FAILED.withReason("Processing timeout approaching");
    }

    // Process with remaining time
    return Progress.DONE;
}

Reference: docproc/src/main/java/com/yahoo/docproc/Processing.java:232-237
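
The check above amounts to comparing a remaining time budget against the time a step is expected to need. The plain-Java sketch below shows that arithmetic with java.time; DeadlineSketch, timeLeft, and enoughTimeFor are hypothetical helpers, and how Vespa derives the value returned by processing.timeLeft() is not covered here.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative model only: budget arithmetic behind a "time left" check.
class DeadlineSketch {

    // Remaining time until the deadline, never negative.
    static Duration timeLeft(Instant deadline, Instant now) {
        Duration left = Duration.between(now, deadline);
        return left.isNegative() ? Duration.ZERO : left;
    }

    // True when the remaining budget covers the time a step is expected to need.
    static boolean enoughTimeFor(Duration needed, Instant deadline, Instant now) {
        return timeLeft(deadline, now).compareTo(needed) >= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:00:00Z");
        // prints "5000"
        System.out.println(timeLeft(now.plusSeconds(5), now).toMillis());
        // prints "false": only 500 ms remain but the step needs one second
        System.out.println(enoughTimeFor(Duration.ofSeconds(1), now.plusMillis(500), now));
    }
}
```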

Best Practices

1. Keep Indexing Expressions Simple
   Use the indexing language for simple transformations. Move complex logic to document processors:
# Good: Simple transformation
field title type string {
    indexing: input title | lowercase | index
}

# Complex logic → Use document processor instead

2. Make Processors Stateless
   Document processors must be thread-safe. Avoid mutable instance variables:
public class SafeProcessor extends DocumentProcessor {
    // Good: Immutable configuration, assigned once in the constructor
    private final String apiEndpoint;

    // Bad: Mutable state
    // private int counter;

    public SafeProcessor(String apiEndpoint) {
        this.apiEndpoint = apiEndpoint;
    }

    @Override
    public Progress process(Processing processing) {
        // Use local variables for state
        int localCounter = 0;
        return Progress.DONE;
    }
}

3. Handle Async Operations Properly
   Track async operation state to avoid reprocessing:
if (!processing.hasVariable("async_started")) {
    // Start async operation
    startAsyncOperation();
    processing.setVariable("async_started", true);
    return Progress.LATER;
}

4. Use Appropriate Progress Codes
   Return the correct progress code:
  • DONE - Processing complete
  • LATER - Need more time (async operation)
  • FAILED - This document failed (temporary)
  • PERMANENT_FAILURE - Critical error (disables processor)

See Also