Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
Andy Asp ac515c3689
Compactor scheduler: shard bbolt databases (#14768)
#### What this PR does

The compactor scheduler persists its state in a bbolt database, but bbolt
only supports a single write transaction at a time. This PR shards tenants
across multiple bbolt databases in the scheduler to improve write
throughput. The configuration option that used to be the database path is
now a directory in which these shards are located.

A consistent hash is used to assign tenants to shards. Changing the
number of shards through configuration is supported, and tenant data is
migrated between the shards on startup if the shard count changed. The
stability of the hash is not a requirement because `RecoverAll` binds
tenants to a `JobPersister` through a full scan. `Drop()` was added to
`JobPersister` so that the associated shard never has to be rediscovered.

The migration procedure upon shard change was the most complex part of
this change. To ensure crash safety, the operations are ordered so that
all copies are performed before any deletions. The modifications to each
database are batched at each stage. In order to know when a migration
procedure has completed (and to detect if a shard disappears later on), a
metadata object is written to the first shard to persist the intended
number of shards.
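
The copy-before-delete ordering can be sketched with in-memory maps standing in for the bbolt databases. Everything here is illustrative: the `store`, `shard`, and `migrate` names are hypothetical, the placeholder `byteSumShard` function is not a consistent hash, and recording the intended shard count as the last step is an assumption about when the metadata is written.

```go
package main

import "fmt"

// shard is an in-memory stand-in for one bbolt database file:
// tenant ID -> opaque tenant bucket contents.
type shard map[string][]byte

// store stands in for the shard directory plus the metadata object.
type store struct {
	shards        []shard
	intendedCount int // stand-in for the metadata kept in the first shard
}

type move struct {
	tenant string
	from   int
}

// migrate re-homes tenants for newCount shards (newCount >= 1).
// assign is any function mapping a tenant to a shard index in [0, n).
func migrate(s *store, newCount int, assign func(tenant string, n int) int) {
	// Make sure every destination shard exists before copying.
	for len(s.shards) < newCount {
		s.shards = append(s.shards, shard{})
	}
	// Stage 1: copy every misplaced tenant to its new shard.
	var moves []move
	for i, sh := range s.shards {
		for tenant, data := range sh {
			if dst := assign(tenant, newCount); dst != i {
				s.shards[dst][tenant] = append([]byte(nil), data...)
				moves = append(moves, move{tenant: tenant, from: i})
			}
		}
	}
	// Stage 2: only after all copies exist, delete the old placements.
	// A crash before this point leaves duplicates, never missing data.
	for _, m := range moves {
		delete(s.shards[m.from], m.tenant)
	}
	// Stage 3: drop shards that are no longer used and record the new count.
	s.shards = s.shards[:newCount]
	s.intendedCount = newCount
}

// byteSumShard is a deliberately simple placeholder assignment function.
func byteSumShard(tenant string, n int) int {
	sum := 0
	for _, c := range []byte(tenant) {
		sum += int(c)
	}
	return sum % n
}

func main() {
	s := &store{shards: []shard{shard{}, shard{}}, intendedCount: 2}
	for _, t := range []string{"team-a", "team-b", "team-c", "team-d"} {
		s.shards[byteSumShard(t, 2)][t] = []byte("jobs for " + t)
	}
	migrate(s, 3, byteSumShard)
	for i, sh := range s.shards {
		fmt.Printf("shard %d holds %d tenant(s)\n", i, len(sh))
	}
}
```

Because rerunning `migrate` after an interrupted run simply repeats the copies and then performs the deletions, the procedure in this sketch is idempotent.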

As a note, I separated bbolt specific logic from `persistence.go` into
`persistence_bbolt.go` in order to make the interfaces easier to see.

#### Checklist

- [x] Tests updated.
- [ ] Documentation added.
- [ ] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Introduces multi-file bbolt sharding and on-startup migration logic,
which impacts durability and recovery paths; failures could lead to lost
or duplicated persisted jobs if edge cases exist.
> 
> **Overview**
> The scheduler’s bbolt persistence is reworked to **shard tenants
across multiple bbolt database files** (directory-based storage) to
improve write throughput, using consistent hashing to select the shard
per tenant.
> 
> Startup now **prepares/migrates shards** when `shard_count` changes by
copying tenant buckets to their new shard, deleting old placements, and
persisting shard layout in a new `PersistenceMetadata` proto stored in a
reserved metadata bucket.
> 
> Configuration is updated from `BboltPath` to a structured
`BboltConfig` (`dir`, `shard_count`), `JobPersister` gains `Drop()` for
shard-aware tenant deletion, and tests are moved/expanded to cover
sharding and scale up/down migration behavior.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
a997ea7157. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-03-20 16:11:33 -04:00
.config/go Remove Go telemetry from git repository, ignore it. (#9122) 2024-08-28 07:50:45 +02:00
.github MQE: move expensive tests to their own non-race packages (#14745) 2026-03-18 11:12:59 -04:00
cmd Query-frontend: Add time range filtering to blocked queries (#14609) 2026-03-19 10:12:53 +10:00
development Grafana Alertmanager: Remove /full_state endpoint (#14746) 2026-03-19 10:48:45 -04:00
docs Mixin: Add usage-tracker snapshot upload/download alerts (#14778) 2026-03-20 13:50:30 +01:00
images Update logo image to be less jagged (#1484) 2022-03-15 14:05:49 +00:00
integration Speed up linter in CI (#14721) 2026-03-18 08:39:32 +01:00
mimir-build-image chore(deps): update dependency mvdan/sh to v3.13.0 (main) (#14660) 2026-03-17 09:22:59 +01:00
operations fix(helm): remove selector that throws errors in argocd (#14684) 2026-03-20 15:05:21 +00:00
packaging chore(deps): update debian:13 docker digest to 3615a74 (main) (#14526) 2026-03-02 07:26:37 +01:00
pkg Compactor scheduler: shard bbolt databases (#14768) 2026-03-20 16:11:33 -04:00
tools Speed up linter in CI (#14721) 2026-03-18 08:39:32 +01:00
vendor fix(deps): update module github.com/go-openapi/strfmt to v0.26.1 (main) (#14759) 2026-03-20 11:09:52 +01:00
.gitattributes Exclude vendor/ from PR size (#2229) 2020-03-09 12:00:15 +01:00
.gitconfig Rename prometheus-private to mimir-prometheus (#843) 2022-01-21 15:55:07 +00:00
.gitignore kafkatool: add create-topic and list-topics commands (#14639) 2026-03-11 21:13:02 +01:00
.golangci.yml Speed up linter in CI (#14721) 2026-03-18 08:39:32 +01:00
.lintignore Remove old website (#1135) 2022-02-09 15:44:34 +01:00
.prettierignore Move the mimir-distributed helm chart into the mimir repository (#1925) 2022-05-30 11:02:02 +02:00
ADOPTERS.md Update ADOPTERS.md (#9620) 2024-10-15 09:31:57 +02:00
AGENTS.md Reconcile agent instructions (#14207) 2026-01-30 17:21:55 +01:00
CHANGELOG.md Mixin: Add usage-tracker snapshot upload/download alerts (#14778) 2026-03-20 13:50:30 +01:00
CLAUDE.md Reconcile agent instructions (#14207) 2026-01-30 17:21:55 +01:00
CODE_OF_CONDUCT.md Fix code of conduct (#922) 2022-01-27 15:36:57 +01:00
CODEOWNERS Remove @tacole02 from CODEOWNERS (#14770) 2026-03-19 14:51:15 -04:00
CONTRIBUTING.md fixes link (#1476) 2022-03-14 17:06:57 +01:00
go.mod fix(deps): update module github.com/go-openapi/strfmt to v0.26.1 (main) (#14759) 2026-03-20 11:09:52 +01:00
go.sum fix(deps): update module github.com/go-openapi/strfmt to v0.26.1 (main) (#14759) 2026-03-20 11:09:52 +01:00
GOVERNANCE.md Add myself to Maintainers and Team (#14421) 2026-02-20 14:50:58 +10:00
LICENSE Apply standard Grafana Labs governance and license (#22) 2021-08-05 14:40:22 +02:00
LICENSING.md Change license for operations folder to Apache2. (#5753) 2023-08-22 12:27:49 +02:00
MAINTAINERS.md Add myself to Maintainers and Team (#14421) 2026-02-20 14:50:58 +10:00
Makefile MQE: move expensive tests to their own non-race packages (#14745) 2026-03-18 11:12:59 -04:00
Makefile.local.example Push all images to Docker Hub (#1204) 2022-02-16 16:27:05 +00:00
README.md Fix broken link in readme, and add redirect to docs. (#6115) 2023-09-25 09:51:02 +02:00
RELEASE.md Remove GEM mentions from release documentation (#13255) 2025-10-30 12:43:44 +01:00
renovate.json5 renovate: add Makefile to auto-rebase matchers (#14708) 2026-03-17 15:46:53 +01:00
VERSION Mimir 3.0.4 version update and changelog (#14587) 2026-03-11 13:32:51 +08:00

Grafana Mimir

Grafana Mimir is an open source software project that provides scalable long-term storage for Prometheus. Some of the core strengths of Grafana Mimir include:

  • Easy to install and maintain: Grafana Mimir's extensive documentation, tutorials, and deployment tooling make it quick to get started. Using its monolithic mode, you can get Grafana Mimir up and running with just one binary and no additional dependencies. Once deployed, the best-practice dashboards, alerts, and runbooks packaged with Grafana Mimir make it easy to monitor the health of the system.
  • Massive scalability: You can run Grafana Mimir's horizontally-scalable architecture across multiple machines, resulting in the ability to process orders of magnitude more time series than a single Prometheus instance. Internal testing shows that Grafana Mimir handles up to 1 billion active time series.
  • Global view of metrics: Grafana Mimir enables you to run queries that aggregate series from multiple Prometheus instances, giving you a global view of your systems. Its query engine extensively parallelizes query execution, so that even the highest-cardinality queries complete with blazing speed.
  • Cheap, durable metric storage: Grafana Mimir uses object storage for long-term data storage, allowing it to take advantage of this ubiquitous, cost-effective, high-durability technology. It is compatible with multiple object store implementations, including AWS S3, Google Cloud Storage, Azure Blob Storage, OpenStack Swift, as well as any S3-compatible object storage.
  • High availability: Grafana Mimir replicates incoming metrics, ensuring that no data is lost in the event of machine failure. Its horizontally scalable architecture also means that it can be restarted, upgraded, or downgraded with zero downtime, which means no interruptions to metrics ingestion or querying.
  • Natively multi-tenant: Grafana Mimir's multi-tenant architecture enables you to isolate data and queries from independent teams or business units, making it possible for these groups to share the same cluster. Advanced limits and quality-of-service controls ensure that capacity is shared fairly among tenants.

Migrating to Grafana Mimir

If you're migrating to Grafana Mimir, refer to the following documents:

Deploying Grafana Mimir

For information about how to deploy Grafana Mimir, refer to Deploy Grafana Mimir.

Getting started

If you're new to Grafana Mimir, read the Get started guide.

Before deploying Grafana Mimir in a production environment, read:

  1. An overview of Grafana Mimir's architecture
  2. Configure Grafana Mimir
  3. Run Grafana Mimir in production

Documentation

Refer to the following links to access Grafana Mimir documentation:

Contributing

To contribute to Grafana Mimir, refer to Contributing to Grafana Mimir.

Join the Grafana Mimir discussion

If you have any questions or feedback regarding Grafana Mimir, join the Grafana Mimir Discussion. Alternatively, consider joining the monthly Grafana Mimir Community Call.

Your feedback is always welcome, and you can also share it via the #mimir Slack channel.

License

Grafana Mimir is distributed under AGPL-3.0-only.