Commit cfadf32

Goshawk committed
Update README.md with dataset structures
1 parent 1b942de commit cfadf32

1 file changed, +32 -3 lines changed

README.md

Lines changed: 32 additions & 3 deletions
@@ -1,17 +1,46 @@
 # datalake-worker
-Data lake implementation integrated with S3
+Data lake implementation integrated with AWS S3
 
 # Supported features
 
-- Async download a chunk from S3
+- Async-Download chunks from AWS S3
 - Persist on-disk in a lock-less manner
 - List all persisted chunks by ID from a cache
 - Find and lock a chunk - Once locked, chunk cannot be deleted
 - Scheduled deletion - Scheduled for deletion, a chunk will be removed once it is no longer in use.
 
-- Backend-agnostic datamanager. The RocksDB backend can be substituted with any in-process NoSQL or SQL storage engine.
+- Backend-agnostic datamanager. The RocksDB backend can be substituted with any in-process NoSQL or SQL storage engine.g
 
 
 # Design
 
 ![image info](./design.png)
+
+# Datasource structure
+
+### Cache - in-memory map of chunks IDs to a lock permit
+
+| Chunk_ID | Permit |
+| -------- | ------- |
+| 0x0A0B | 0 |
+| 0x0A0C | 1 |
+| 0x0A0C | 0 |
+
+### OnDisk Tables and Indexes
+
+| Chunk_ID | Encoded Chunk Data |
+| -------- | ------- |
+| 0x0A0B | 0x.. |
+| 0x0A0C | 0x... |
+| 0x0A0C | 0x |
+
+| DatasetID_BlockNum | Chunk_ID |
+| -------- | ------- |
+| 100_0 | 0x0A0B |
+| 100_1 | 0x0A0B |
+| 100_2 | 0x0A0B |
+
+| Metadata | Value |
+| ---------| -------- |
+0x1 (Size_Key) | 2000000 |
+
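
The tables added in this commit can be read as one in-memory map plus three on-disk key-value tables. The following is a minimal Python sketch of how they might fit together with the "find and lock" and "scheduled deletion" features. It is not the project's implementation. The class `ChunkStore` and all of its method names are hypothetical, plain dictionaries stand in for the RocksDB backend, the permit is treated as a count of current holders, and refusing new locks on a chunk that is already scheduled for deletion is an assumption.

```python
from dataclasses import dataclass, field

SIZE_KEY = 0x1  # metadata key holding the total stored size, as in the Metadata table


@dataclass
class ChunkStore:
    """Toy model of the README tables; dicts stand in for the storage backend."""
    permits: dict = field(default_factory=dict)      # cache: Chunk_ID -> permit count
    chunks: dict = field(default_factory=dict)       # on-disk: Chunk_ID -> encoded chunk data
    block_index: dict = field(default_factory=dict)  # on-disk: "DatasetID_BlockNum" -> Chunk_ID
    metadata: dict = field(default_factory=lambda: {SIZE_KEY: 0})
    pending_delete: set = field(default_factory=set)

    def persist(self, chunk_id, dataset_id, blocks, data):
        # Adjust the size counter by the difference, so overwriting is not double-counted.
        self.metadata[SIZE_KEY] += len(data) - len(self.chunks.get(chunk_id, b""))
        self.chunks[chunk_id] = data
        self.permits.setdefault(chunk_id, 0)
        for block in blocks:
            self.block_index[f"{dataset_id}_{block}"] = chunk_id

    def list_chunks(self):
        # Listing reads only the in-memory cache, never the on-disk tables.
        return sorted(self.permits)

    def find_and_lock(self, dataset_id, block):
        chunk_id = self.block_index.get(f"{dataset_id}_{block}")
        if chunk_id is None or chunk_id in self.pending_delete:
            return None
        self.permits[chunk_id] += 1  # a locked chunk cannot be deleted
        return chunk_id

    def release(self, chunk_id):
        self.permits[chunk_id] -= 1
        if self.permits[chunk_id] == 0 and chunk_id in self.pending_delete:
            self._remove(chunk_id)

    def schedule_delete(self, chunk_id):
        if chunk_id not in self.chunks:
            return
        self.pending_delete.add(chunk_id)
        if self.permits[chunk_id] == 0:  # not in use, so remove right away
            self._remove(chunk_id)

    def _remove(self, chunk_id):
        data = self.chunks.pop(chunk_id)
        self.permits.pop(chunk_id)
        self.pending_delete.discard(chunk_id)
        self.block_index = {k: v for k, v in self.block_index.items() if v != chunk_id}
        self.metadata[SIZE_KEY] -= len(data)


store = ChunkStore()
store.persist(0x0A0B, 100, range(3), b"\x00" * 16)  # fills 100_0, 100_1, 100_2
held = store.find_and_lock(100, 1)   # permit for 0x0A0B goes from 0 to 1
store.schedule_delete(held)          # still locked, so the chunk is only marked
still_there = held in store.chunks
store.release(held)                  # last holder is gone, so the chunk is removed
```

Keeping the permits in memory means that listing and locking never touch the disk, while the block index lets a lookup by dataset and block number resolve to a chunk ID before the encoded data is read.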
