rosedblabs / wal
Write Ahead Log for LSM or bitcask storage, designed to optimize random I/O workloads.
License: Apache License 2.0
Calling all WAL (Write Ahead Log) users!
Share a brief description of your project/use case, how WAL benefits you, and any challenges you encountered.
Your insights will strengthen our community and inspire others.
Comment below with:
This is just my OCD kicking in: there is no actual access to the segment's functions from the WAL struct, since the segment attribute is itself private. But should its functions be private as well?
Use case:
We are planning to build snapshots of a service on top of the WAL project. The service needs to stop all write events and record the latest chunk position in the WAL instance. When the service experiences an outage or downtime, it can rebuild its state from the snapshot file and replay the events since that latest chunk position. I was wondering if we could add a LastChunkPosition function to the WAL struct?
something like:
func (wal *WAL) LastChunkPosition() (*ChunkPosition, error)
Line 225 in 8de9190
After the first clamp, chunkSize <= leftSize always holds, so end <= dataSize and the second if branch is unreachable:
if chunkSize > leftSize {
chunkSize = leftSize
}
var end = dataSize - leftSize + chunkSize
if end > dataSize {
end = dataSize
}
Hi, I'm thinking of using this to increase the rate of instrument transactions we can process: with a local WAL I can increase throughput by handing the requests to the database off to a worker.
So reading this:
// Sync is whether to synchronize writes through os buffer cache and down onto the actual disk.
// Setting sync is required for durability of a single write operation, but also results in slower writes.
//
// If false, and the machine crashes, then some recent writes may be lost.
// Note that if it is just the process that crashes (machine does not) then no writes will be lost.
//
// In other words, Sync being false has the same semantics as a write
// system call. Sync being true means write followed by fsync.
Sync bool
I'm a little bit confused: if there's a fatal crash in the process, how are writes not lost? If they're stored in an in-memory buffer before fsync, how are those writes recovered?
Second question: if I'm simultaneously writing to, reading from, and then deleting from the WAL on different threads:
I use:
w.WAL.Write(b)
to write, and:
reader := w.WAL.NewReader()
for {
val, pos, err := reader.Next()
if err == io.EOF {
break
}
fmt.Println(string(val))
fmt.Println(pos) // get position of the data for next read
w.ch <- val
}
to read. Does reader := w.WAL.NewReader()
return all the segments up to the point in time the function is called? I think it does, looking at:
if segId == 0 || wal.activeSegment.id <= segId {
reader := wal.activeSegment.NewReader()
segmentReaders = append(segmentReaders, reader)
}
and then:
func (seg *segment) NewReader() *segmentReader {
return &segmentReader{
segment: seg,
blockNumber: 0,
chunkOffset: 0,
}
}
The new reader is created with blockNumber 0 and chunkOffset 0, so does that mean it starts with no chunks and therefore won't process any messages?
Also, what's the safest way to delete so that I never reprocess a message twice? (It isn't the end of the world if I do, since it's chronological; it just costs time.)
I can work it out with sufficient testing, but I figured it was worth asking here.
Thank you in advance 🧡
If the data is larger than blockSize (32 KB), it will call Write multiple times, which has a cost (one system call each); we could buffer it in a buffer pool and write it all at once.
Introduce the CLOCK-Pro caching algorithm to manage block reads and decompression.
The aim is to avoid allocating new byte slices every time while reading or decompressing.
It would be nice to support batch writes.
I want to be able to buffer all of my DB modifications that are appended to the WAL during a batch insert, and only flush them to the underlying file once the batch insert is complete.
Here SegmentSize is converted to uint32, which means the SegmentSize parameter value cannot exceed 4 GB. Is that intended?
// Open opens a WAL with the given options.
// It will create the directory if not exists, and open all segment files in the directory.
// If there is no segment file in the directory, it will create a new one.
func Open(options Options) (*WAL, error) {
if !strings.HasPrefix(options.SegmentFileExt, ".") {
return nil, fmt.Errorf("segment file extension must start with '.'")
}
if options.BlockCache > uint32(options.SegmentSize) {
return nil, fmt.Errorf("BlockCache must be smaller than SegmentSize")
}
....
}
Reproduce:
func TestSegment_Write_LargeSize(t *testing.T) {
	t.Run("32KB-7000", func(t *testing.T) {
		testSegmentReaderLargeSize(t, 32*blockSize, 7000)
	})
}
func testSegmentReaderLargeSize(t *testing.T, size int, count int) {
dir, _ := os.MkdirTemp("", "seg-test-reader-ManyChunks_large_size")
os.MkdirAll(dir, os.ModePerm)
cache, _ := lru.New[uint64, []byte](5)
seg, err := openSegmentFile(dir, ".SEG", 1, cache)
assert.Nil(t, err)
defer func() {
_ = seg.Remove()
}()
positions := make([]*ChunkPosition, 0)
bytes1 := []byte(strings.Repeat("W", size))
for i := 1; i <= count; i++ {
pos, err := seg.Write(bytes1)
assert.Nil(t, err)
positions = append(positions, pos)
}
for i, pos := range positions {
val, err := seg.Read(pos.BlockNumber, pos.ChunkOffset)
assert.Nil(t, err)
if !bytes.Equal(bytes1, val) {
t.Log(i)
t.Log(len(val))
break
}
}
}
First, change the segment Size function like this to avoid another problem:
func (seg *segment) Size() int64 {
	size := int64(seg.currentBlockNumber) * int64(blockSize)
	return size + int64(seg.currentBlockSize)
}
but the bug still exists.