What
A Percolator worker, a Bigtable tablet server, and a GFS chunkserver.
Timestamp Oracle (TSO)
- Strictly Monotonically Increasing
Why
requirement to run at massive scales and the lack of a requirement for extremely low latency.
How
Using 2PC
Writing
1. Pre-write Phase
start_ts
from TSO- lock all the cells being written (
lock
col), designate one lock as primary, secondary lock contains the location of the primary lock. - write the value to
data
col - reads metadata to check for conflicts, rollback
- write after
start_ts
(newer version) - lock detected at any time
- write after
- write lock and data to each cell at the
start_ts
2. Commit Phase
commit_ts
from TSO- replace the primary lock with write record (
write
col) withcommit_ts
- if primary lock is missing, the commit fails
- write record would suggest that the data is visible to the reader (aka finished)
- remove all secondary locks
Reading
- get
ts
- check if
lock
col contains lock with ts in[0, ts]
- if so, it was locked by earlier txn → unsafe → backoff
- get the latest record row where
write
col, whosecommit_ts
is in range[0, ts]
Roll Forward
- Primary Lock missing → finished
- Remove relavant locks
BigTable Integration
c:lock
- Shows primary / where primary is
c:write
- committed entry, pointer to the data
c:data
- stores data, referenced by pointer
c:notify
- mark modified cell to be dirty
c:ack_O
- do operations when data in observed col changed
- by scanning
notify
col
In TiKV
- Pre-condition: atomic to read and write a single user key
- 3 Cols:
CF_DEFAULT
,CF_LOCK
andCF_WRITE
, which corresponds to Percolator’sdata
column,lock
column andwrite
column respectively.CF_DEFAULT
:(key, start_ts)
→value
CF_LOCK
:key
→lock_info
- No ts As only one lock can be held at a time
CF_WRITE
:(key, commit_ts)
→write_info