mirror of https://github.com/grafana/mimir.git synced 2026-03-21 14:51:10 +00:00

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.

metrics observability opentelemetry otlp prometheus tsdb


Andy Asp ac515c3689 Compactor scheduler: shard bbolt databases (#14768)	2026-03-20 16:11:33 -04:00

#### What this PR does

The compactor scheduler persists using a bbolt database, but bbolt only supports a single write at a time. This introduces sharding tenants across multiple bbolt databases in the scheduler to improve write throughput. The configuration that used to be the database path has been changed to a directory where these shards will be located.

A consistent hash is used to assign tenants to shards. Changing the number of shards through configuration is supported and tenant data is migrated between the shards on startup if the shards changed. The stability of the hash is not a requirement because `RecoverAll` binds tenants to a `JobPersister` through a full scan. `Drop()` was added to `JobPersister` to never have to rediscover the associated shard.

The migration procedure upon shard change was the most complex part of this change. To ensure crash safety the operations are ordered to perform copies before any deletions. The modifications to each database are batched at each stage. In order to know when a migration procedure was completed (and to detect if a shard disappeared later on) a metadata object is written to the first shard to persist the intended number of shards.

As a note, I separated bbolt specific logic from `persistence.go` into `persistence_bbolt.go` in order to make the interfaces easier to see.

#### Checklist

- [x] Tests updated.
- [ ] Documentation added.
- [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR.
- [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features.

---

> [!NOTE]
> Medium Risk
> Introduces multi-file bbolt sharding and on-startup migration logic, which impacts durability and recovery paths; failures could lead to lost or duplicated persisted jobs if edge cases exist.
>
> Overview
> The scheduler’s bbolt persistence is reworked to shard tenants across multiple bbolt database files (directory-based storage) to improve write throughput, using consistent hashing to select the shard per tenant.
>
> Startup now prepares/migrates shards when `shard_count` changes by copying tenant buckets to their new shard, deleting old placements, and persisting shard layout in a new `PersistenceMetadata` proto stored in a reserved metadata bucket.
>
> Configuration is updated from `BboltPath` to a structured `BboltConfig` (`dir`, `shard_count`), `JobPersister` gains `Drop()` for shard-aware tenant deletion, and tests are moved/expanded to cover sharding and scale up/down migration behavior.
>
> Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `a997ea7157`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).
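To make the sharding and migration ordering described in this commit message easier to picture, here is a minimal in-memory sketch. It is not the code from the PR: plain Go maps stand in for the bbolt database files, jump consistent hashing is one possible consistent hash (the commit message does not say which one is used), and the names `shard`, `shardFor`, `jumpHash`, and `reshard` are hypothetical. The sketch only illustrates the "copy everything first, delete afterwards" ordering; batching, the metadata object, and crash recovery are left out.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shard is a stand-in for one bbolt database file: tenant -> persisted jobs.
// (Hypothetical type; the real persistence layer is not shown in the source.)
type shard map[string][]string

// jumpHash is the jump consistent hash algorithm. It maps a 64-bit key to a
// bucket in [0, numBuckets). Chosen here as an assumption for illustration.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

// shardFor picks the shard index for a tenant under the given shard count.
func shardFor(tenant string, shardCount int) int {
	h := fnv.New64a()
	h.Write([]byte(tenant))
	return jumpHash(h.Sum64(), shardCount)
}

type move struct {
	tenant   string
	from, to int
}

// reshard moves tenants to the shard they belong to under newCount.
// All copies happen before any deletion, so an interruption between the two
// stages can leave duplicates but never drops a tenant's data.
func reshard(shards []shard, newCount int) []shard {
	// Scaling up: create the missing (empty) shards first.
	for len(shards) < newCount {
		shards = append(shards, shard{})
	}

	// Work out which tenants sit in the wrong shard.
	var moves []move
	for i, s := range shards {
		for tenant := range s {
			if to := shardFor(tenant, newCount); to != i {
				moves = append(moves, move{tenant: tenant, from: i, to: to})
			}
		}
	}

	// Stage 1: copy tenant data to its new shard.
	for _, m := range moves {
		shards[m.to][m.tenant] = append([]string(nil), shards[m.from][m.tenant]...)
	}
	// Stage 2: only now delete the old placements.
	for _, m := range moves {
		delete(shards[m.from], m.tenant)
	}

	// Scaling down: shards past newCount are empty by now and can be dropped.
	return shards[:newCount]
}

func main() {
	shards := []shard{{"tenant-a": {"job-1"}, "tenant-b": {"job-2"}, "tenant-c": {"job-3"}}}
	shards = reshard(shards, 4)
	for i, s := range shards {
		fmt.Printf("shard %d holds %d tenant(s)\n", i, len(s))
	}
}
```

In this sketch each tenant ends up in exactly one shard after `reshard` returns, and calling it again with a smaller count folds the tenants back into the remaining shards. A real implementation would also need a durable marker of the intended shard count, which the commit message says is written to the first shard.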
.config/go	Remove Go telemetry from git repository, ignore it. (#9122)	2024-08-28 07:50:45 +02:00
.github	MQE: move expensive tests to their own non-race packages (#14745)	2026-03-18 11:12:59 -04:00
cmd	Query-frontend: Add time range filtering to blocked queries (#14609)	2026-03-19 10:12:53 +10:00
development	Grafana Alertmanager: Remove /full_state endpoint (#14746)	2026-03-19 10:48:45 -04:00
docs	Mixin: Add usage-tracker snapshot upload/download alerts (#14778)	2026-03-20 13:50:30 +01:00
images	Update logo image to be less jagged (#1484)	2022-03-15 14:05:49 +00:00
integration	Speed up linter in CI (#14721)	2026-03-18 08:39:32 +01:00
mimir-build-image	chore(deps): update dependency mvdan/sh to v3.13.0 (main) (#14660)	2026-03-17 09:22:59 +01:00
operations	fix(helm): remove selector that throws errors in argocd (#14684)	2026-03-20 15:05:21 +00:00
packaging	chore(deps): update debian:13 docker digest to 3615a74 (main) (#14526)	2026-03-02 07:26:37 +01:00
pkg	Compactor scheduler: shard bbolt databases (#14768)	2026-03-20 16:11:33 -04:00
tools	Speed up linter in CI (#14721)	2026-03-18 08:39:32 +01:00
vendor	fix(deps): update module github.com/go-openapi/strfmt to v0.26.1 (main) (#14759)	2026-03-20 11:09:52 +01:00
.gitattributes	Exclude vendor/ from PR size (#2229)	2020-03-09 12:00:15 +01:00
.gitconfig	Rename prometheus-private to mimir-prometheus (#843)	2022-01-21 15:55:07 +00:00
.gitignore	kafkatool: add create-topic and list-topics commands (#14639)	2026-03-11 21:13:02 +01:00
.golangci.yml	Speed up linter in CI (#14721)	2026-03-18 08:39:32 +01:00
.lintignore	Remove old website (#1135)	2022-02-09 15:44:34 +01:00
.prettierignore	Move the mimir-distributed helm chart into the mimir repository (#1925)	2022-05-30 11:02:02 +02:00
ADOPTERS.md	Update ADOPTERS.md (#9620)	2024-10-15 09:31:57 +02:00
AGENTS.md	Reconcile agent instructions (#14207)	2026-01-30 17:21:55 +01:00
CHANGELOG.md	Mixin: Add usage-tracker snapshot upload/download alerts (#14778)	2026-03-20 13:50:30 +01:00
CLAUDE.md	Reconcile agent instructions (#14207)	2026-01-30 17:21:55 +01:00
CODE_OF_CONDUCT.md	Fix code of conduct (#922)	2022-01-27 15:36:57 +01:00
CODEOWNERS	Remove @tacole02 from CODEOWNERS (#14770)	2026-03-19 14:51:15 -04:00
CONTRIBUTING.md	fixes link (#1476)	2022-03-14 17:06:57 +01:00
go.mod	fix(deps): update module github.com/go-openapi/strfmt to v0.26.1 (main) (#14759)	2026-03-20 11:09:52 +01:00
go.sum	fix(deps): update module github.com/go-openapi/strfmt to v0.26.1 (main) (#14759)	2026-03-20 11:09:52 +01:00
GOVERNANCE.md	Add myself to Maintainers and Team (#14421)	2026-02-20 14:50:58 +10:00
LICENSE	Apply standard Grafana Labs governance and license (#22)	2021-08-05 14:40:22 +02:00
LICENSING.md	Change license for operations folder to Apache2. (#5753)	2023-08-22 12:27:49 +02:00
MAINTAINERS.md	Add myself to Maintainers and Team (#14421)	2026-02-20 14:50:58 +10:00
Makefile	MQE: move expensive tests to their own non-race packages (#14745)	2026-03-18 11:12:59 -04:00
Makefile.local.example	Push all images to Docker Hub (#1204)	2022-02-16 16:27:05 +00:00
README.md	Fix broken link in readme, and add redirect to docs. (#6115)	2023-09-25 09:51:02 +02:00
RELEASE.md	Remove GEM mentions from release documentation (#13255)	2025-10-30 12:43:44 +01:00
renovate.json5	renovate: add Makefile to auto-rebase matchers (#14708)	2026-03-17 15:46:53 +01:00
VERSION	Mimir 3.0.4 version update and changelog (#14587)	2026-03-11 13:32:51 +08:00

README.md

Grafana Mimir

Grafana Mimir is an open source software project that provides scalable long-term storage for Prometheus. Some of the core strengths of Grafana Mimir include:

Easy to install and maintain: Grafana Mimir’s extensive documentation, tutorials, and deployment tooling make it quick to get started. Using its monolithic mode, you can get Grafana Mimir up and running with just one binary and no additional dependencies. Once deployed, the best-practice dashboards, alerts, and runbooks packaged with Grafana Mimir make it easy to monitor the health of the system.
Massive scalability: You can run Grafana Mimir's horizontally-scalable architecture across multiple machines, resulting in the ability to process orders of magnitude more time series than a single Prometheus instance. Internal testing shows that Grafana Mimir handles up to 1 billion active time series.
Global view of metrics: Grafana Mimir enables you to run queries that aggregate series from multiple Prometheus instances, giving you a global view of your systems. Its query engine extensively parallelizes query execution, so that even the highest-cardinality queries complete with blazing speed.
Cheap, durable metric storage: Grafana Mimir uses object storage for long-term data storage, allowing it to take advantage of this ubiquitous, cost-effective, high-durability technology. It is compatible with multiple object store implementations, including AWS S3, Google Cloud Storage, Azure Blob Storage, and OpenStack Swift, as well as any S3-compatible object storage.
High availability: Grafana Mimir replicates incoming metrics, ensuring that no data is lost in the event of machine failure. Its horizontally scalable architecture also means that it can be restarted, upgraded, or downgraded with zero downtime, so metrics ingestion and querying are not interrupted.
Natively multi-tenant: Grafana Mimir’s multi-tenant architecture enables you to isolate data and queries from independent teams or business units, making it possible for these groups to share the same cluster. Advanced limits and quality-of-service controls ensure that capacity is shared fairly among tenants.

Migrating to Grafana Mimir

If you're migrating to Grafana Mimir, refer to the following documents:

Deploying Grafana Mimir

For information about how to deploy Grafana Mimir, refer to Deploy Grafana Mimir.

Getting started

If you’re new to Grafana Mimir, read the Get started guide.

Before deploying Grafana Mimir in a production environment, read:

Documentation

Refer to the following links to access Grafana Mimir documentation:

Latest release
Upcoming release, at the tip of the main branch

Contributing

To contribute to Grafana Mimir, refer to Contributing to Grafana Mimir.

Join the Grafana Mimir discussion

If you have any questions or feedback regarding Grafana Mimir, join the Grafana Mimir Discussion. Alternatively, consider joining the monthly Grafana Mimir Community Call.

Your feedback is always welcome, and you can also share it via the #mimir Slack channel.

License

Grafana Mimir is distributed under AGPL-3.0-only.
