rp-storage-tool

Purpose

This low-level tool is for offline use by Redpanda Engineering when diagnosing faults.

This tool is not for everyday use on live clusters.

Quickstart

# Get a rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Compile and run
cargo run --release -- --backend=<aws|gcp|azure> scan-metadata --source=<bucket name>

Installation

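To run the tool outside a built tree, copy the release binary to a directory on your PATH. A default build still links dynamically against the system libc; see "Building a portable binary" below for a fully static build.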

# Compile
cargo build --release

# Grab the binary
cp target/release/rp-storage-tool /usr/local/bin

Usage

Typical usage

The following examples assume you are running on an AWS instance with IAM roles configured for access to your bucket. If you need to specify static keys, see the "Working with local object stores" section below.

# Scan the bucket, store metadata in a file, and report any metadata anomalies
$ rp-storage-tool --backend aws scan-metadata --source=my-bucket --meta-file my-bucket.json

# Scan the data within a topic of interest, and report any anomalies.  Use the metadata
# file from the last step to avoid re-scanning all objects in the bucket.
$ rp-storage-tool --filter 'kafka/my-topic/*_*' --backend aws scan-data --source=my-bucket --meta-file my-bucket.json

# Having identified an issue, extract the data from a partition for further analysis offline
$ mkdir my-data-dump
$ rp-storage-tool --filter kafka/my-topic/13_63 --backend aws extract --source=my-bucket --meta-file my-bucket.json --sink ./my-data-dump/
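
The three steps above could be chained in a small wrapper. This is a minimal sketch, not part of the tool: the function name diagnose_partition and the RP_TOOL override are hypothetical, and it assumes each step exits non-zero only on a hard failure (how reported anomalies affect the exit status is not stated here).

```shell
# Hypothetical wrapper: run scan-metadata, scan-data and extract in order
# for a single filter, stopping at the first step that exits non-zero.
# RP_TOOL may be overridden (e.g. RP_TOOL=echo) to print the command
# lines instead of running them; the sink directory is still created.
diagnose_partition() {
    local bucket="$1"   # bucket name, e.g. my-bucket
    local filter="$2"   # partition filter, e.g. kafka/my-topic/13_63
    local sink="$3"     # local directory that receives the extracted objects
    local tool="${RP_TOOL:-rp-storage-tool}"
    local meta="${bucket}.json"

    mkdir -p "$sink" || return 1
    "$tool" --backend aws scan-metadata --source="$bucket" --meta-file "$meta" || return 1
    "$tool" --filter "$filter" --backend aws scan-data --source="$bucket" --meta-file "$meta" || return 1
    "$tool" --filter "$filter" --backend aws extract --source="$bucket" --meta-file "$meta" --sink "$sink"
}
```

Calling it with RP_TOOL=echo is a cheap way to review the arguments before pointing the tool at a real bucket.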

Generic parameters

  • The --backend argument selects the cloud storage backend. It defaults to AWS for convenience when working with commands that don't use cloud storage, but you should normally specify it explicitly.
  • The --filter argument controls which topics and partitions are examined. A metadata scan always walks every object in the bucket, but only stores and analyzes metadata for partitions matching the filter.
  • If you are running on a node with authentication already set up (e.g. IAM Roles on AWS), no further configuration is needed. Otherwise, set the access key and secret environment variables for the cloud platform you are connecting to.
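
When static keys are needed, it can help to confirm they are actually exported before starting a long scan. The sketch below is an illustration only: require_env is a hypothetical helper, and the variable names in the usage note are the AWS ones used elsewhere in this README; other backends use their own names.

```shell
# Hypothetical helper: check that each named variable is exported and
# non-empty, since only exported variables are visible to the tool.
# Prints the missing names to stderr and returns 1 if any are missing.
require_env() {
    local missing=0 name
    for name in "$@"; do
        # printenv prints the value only if the variable is in the environment
        if [ -z "$(printenv "$name")" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY && rp-storage-tool ...` runs the tool only when both keys are present.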

Subcommands

scan-metadata

This walks the objects in a bucket used by Redpanda Tiered Storage and reports any inconsistencies. Not every issue it reports is harmful: for example, segments_outside_manifest may list objects that were harmlessly left behind when Redpanda was restarted during an upload.

scan-data

This walks the data within segments to check that it is readable and that the metadata in the partition manifest is consistent with the data. You will probably want to use --filter to restrict the scan to segments for certain NTPs (namespace/topic/partition).

extract

For NTPs matching the filter, copy all metadata and data objects into a local folder (the directory given by --sink).

Working with local object stores

This tool uses environment variables for any special configuration of storage backends required for working outside of real cloud environments.

For example, to use the tool with a bucket called data in a MinIO cluster at aplite:9000:

AWS_ALLOW_HTTP=1 AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin AWS_REGION=us-east-1 \
  AWS_ENDPOINT=http://aplite:9000 cargo run --release -- --backend aws scan-metadata --source data
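
If you run several commands against the same local store, exporting the variables once may be more convenient than prefixing every command. A minimal sketch, assuming the same MinIO defaults as the example above; use_local_object_store and its optional endpoint argument are hypothetical names.

```shell
# Hypothetical helper: export the settings from the example above into the
# current shell so that later invocations of the tool inherit them.
use_local_object_store() {
    # $1 (optional): endpoint URL; defaults to the MinIO address used above
    export AWS_ALLOW_HTTP=1                  # assumed to permit plain-HTTP endpoints
    export AWS_ACCESS_KEY_ID=minioadmin      # MinIO default credentials
    export AWS_SECRET_ACCESS_KEY=minioadmin
    export AWS_REGION=us-east-1
    export AWS_ENDPOINT="${1:-http://aplite:9000}"
}
```

The exports last for the current shell session only, so open a fresh shell (or unset the variables) before working against a real cloud bucket.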

Building a portable binary

If you are building on a workstation and copying the binary to a remote machine that may not run the same Linux distribution, it is useful to build a statically linked binary that will work on any distribution.

# A dependency requires this to build: adjust as needed if not on an RPM distro
sudo dnf install -y musl-gcc

# Install the Rust toolchain for musl (statically linked libc)
rustup target add x86_64-unknown-linux-musl

# Build with the musl toolchain
cargo build --release --target=x86_64-unknown-linux-musl

# Output in target/x86_64-unknown-linux-musl/release/rp-storage-tool
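
When an explicit --target is passed, cargo places artifacts under a per-target subdirectory of the target directory rather than directly under target/release. The sketch below only computes that path; target_binary_path is a hypothetical helper and assumes the default target directory has not been overridden.

```shell
# Hypothetical helper: print where cargo puts a release binary that was
# built with --target=<triple>, relative to the crate root.
target_binary_path() {
    local triple="$1"                    # e.g. x86_64-unknown-linux-musl
    local name="${2:-rp-storage-tool}"   # binary name
    printf 'target/%s/release/%s\n' "$triple" "$name"
}
```

Running `file "$(target_binary_path x86_64-unknown-linux-musl)"` after the build is one way to check the result; for a musl build it would be expected to describe the binary as statically linked.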
