# Indexing

Curio uses a local index store to manage the mapping of multihashes to content pieces. This system allows efficient retrieval of content across multiple nodes in the Curio cluster.

The index store does not interact with IPNI directly, but Curio may use index data to create and serve advertisements for the IPNI network when enabled.

## Key Components

* **Local Index Store**: A Cassandra-based local store that maps multihashes → piece CIDs + offsets/sizes.
* **Concurrency & Batching**: Configurable concurrency and batch sizing for index inserts.
* **Integration with Retrievals**: Retrieval performance depends on correct, timely indexing.
* **Task-Based Indexing**: Indexing work is performed asynchronously via tasks.

## Index Store

Curio’s indexing mechanism relies on a Cassandra-based storage system. The local copy of the indexes allows efficient lookups for retrieval operations.

The IndexStore component handles all interactions with the underlying Cassandra database, including creating, adding, and removing index entries, and querying pieces by multihashes.

### Concurrency and Batching

The IndexStore allows configurable concurrency and batching for operations. It uses multiple workers to process index entries, enabling parallel indexing to scale with the size of the dataset.

### Configuration

The `IndexingConfig` struct defines parameters that control how indexing operations are performed:

* **InsertBatchSize**: number of records per batch insert.
* **InsertConcurrency**: number of concurrent insert workers.

## Task-Based Indexing

Indexing tasks are created and executed asynchronously, allowing Curio to manage indexing without blocking other subsystems.

Tasks are assigned to nodes within the Curio cluster, and each node executes indexing tasks based on the available system resources and the availability of the local copy of unsealed sector containing the deal.

***

## How “CheckIndex” works (and why it fails)

Operators frequently see errors on a task called **CheckIndex** and aren’t sure what it does.

What it does (high level):

* Periodically scans for indexing and announcement work that should exist (or be retried).
* Schedules follow-up tasks when it finds missing/failed items.

Code reference:

* `tasks/indexing/task_check_indexes.go` implements the `CheckIndex` task.

### Common failure mode: DB health/latency

If Yugabyte is slow/unhealthy, downstream tasks (including market/indexing workflows) can appear broken.

Before chasing indexing logic:

* run the DB connectivity check from the Curio host
* check tserver logs for FATALs

See: [Yugabyte troubleshooting](/administration/yugabyte-troubleshooting.md)

### Common failure mode: duplicate key (SQLSTATE 23505)

You may see:

* `duplicate key value violates unique constraint ... (SQLSTATE 23505)`

This often indicates concurrency/retries inserting the same identity row.

What to do:

* Determine whether the pipeline is still progressing.
* If tasks are stuck, collect deal UUID + piece CID + failing task IDs and consult:
  * [Curio Market troubleshooting](/curio-market/troubleshooting.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.curiostorage.org/curio-market/indexing.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.