What
Each machine in the cluster runs three binaries: a Percolator worker, a Bigtable tablet server, and a GFS chunkserver.
Timestamp Oracle (TSO)
- Hands out strictly monotonically increasing timestamps
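A minimal sketch of what "strictly monotonically increasing" means in practice. This is a toy in-memory allocator, not Percolator's actual oracle; the class name `TimestampOracle` and the method `get_ts` are made up for illustration.

```python
import threading


class TimestampOracle:
    """Toy TSO: every call returns an integer larger than all earlier ones."""

    def __init__(self, start: int = 0) -> None:
        self._last = start
        self._mutex = threading.Lock()

    def get_ts(self) -> int:
        # The mutex makes "increment and read" one indivisible step, so two
        # callers can never be handed the same timestamp.
        with self._mutex:
            self._last += 1
            return self._last


tso = TimestampOracle()
start_ts = tso.get_ts()   # 1
commit_ts = tso.get_ts()  # 2, always greater than start_ts
```

A transaction asks for two timestamps, one when it starts and one when it commits, so `commit_ts > start_ts` always holds.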
Why
The design is shaped by two things: the requirement to run at massive scale, and the lack of a requirement for extremely low latency.
How
Using two-phase commit (2PC)
Writing
1. Pre-write Phase
   - get `start_ts` from TSO
   - lock all the cells being written (`lock` col); designate one lock as the primary, and each secondary lock stores the location of the primary lock
   - write the value to the `data` col
   - read metadata to check for conflicts; roll back if either is found:
     - a write committed after `start_ts` (a newer version)
     - a lock detected at any timestamp
   - if there is no conflict, write the lock and the data to each cell at `start_ts`
2. Commit Phase
   - get `commit_ts` from TSO
   - replace the primary lock with a write record (`write` col) at `commit_ts`
     - if the primary lock is missing, the commit fails
     - the write record makes the data visible to readers (i.e. the txn is finished)
   - write the write records for the secondaries and remove all secondary locks
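The two phases above can be sketched over an in-memory dict. This is a simplified illustration under stated assumptions, not Percolator's code: `Cell`, `prewrite`, `commit`, `transact` and `get_ts` are hypothetical names, a counter stands in for the TSO, and the per-cell steps are assumed to be atomic (in Percolator they run inside Bigtable single-row transactions).

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cell:
    data: dict = field(default_factory=dict)   # start_ts  -> value
    lock: dict = field(default_factory=dict)   # start_ts  -> primary key
    write: dict = field(default_factory=dict)  # commit_ts -> start_ts


class ConflictError(Exception):
    pass


_clock = count(1)  # stand-in for the TSO (assumption)


def get_ts() -> int:
    return next(_clock)


def prewrite(store, key, value, primary, start_ts):
    cell = store.setdefault(key, Cell())
    # Conflict 1: another txn committed a write after our start_ts.
    if any(commit_ts >= start_ts for commit_ts in cell.write):
        raise ConflictError(f"newer write on {key!r}")
    # Conflict 2: a lock on the cell at any timestamp.
    if cell.lock:
        raise ConflictError(f"{key!r} is locked")
    cell.lock[start_ts] = primary  # every lock points at the primary
    cell.data[start_ts] = value


def commit(store, keys, primary, start_ts):
    commit_ts = get_ts()
    p = store[primary]
    if start_ts not in p.lock:
        raise ConflictError("primary lock is missing, commit fails")
    # Commit point: the primary's write record replaces its lock.
    p.write[commit_ts] = start_ts
    del p.lock[start_ts]
    # Secondaries: add write records and drop their locks.
    for key in keys:
        if key != primary:
            store[key].write[commit_ts] = start_ts
            store[key].lock.pop(start_ts, None)
    return commit_ts


def transact(store, writes):
    start_ts = get_ts()
    keys = list(writes)
    primary = keys[0]  # arbitrary choice of primary
    locked = []
    try:
        for key in keys:
            prewrite(store, key, writes[key], primary, start_ts)
            locked.append(key)
    except ConflictError:
        for key in locked:  # roll back what we already wrote
            store[key].lock.pop(start_ts, None)
            store[key].data.pop(start_ts, None)
        raise
    return start_ts, commit(store, keys, primary, start_ts)


store = {}
start_ts, commit_ts = transact(store, {"Bob": 7, "Joe": 3})
# store["Bob"].write == {2: 1}: the record at commit_ts points at start_ts
```

Once the primary's write record exists the transaction counts as committed, even if the loop over the secondaries has not finished.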
Reading
- get `ts`
- check if the `lock` col contains a lock with a timestamp in `[0, ts]`
  - if so, the cell is locked by an earlier txn → unsafe → back off
- get the latest record in the `write` col whose `commit_ts` is in range `[0, ts]`, and read the data it points to
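A sketch of the read path over a single cell. The dict layout and the names `read` and `LockedError` are assumptions for illustration, and a real reader would back off and retry (or clean up the lock) instead of just raising.

```python
class LockedError(Exception):
    """A lock with a timestamp in [0, ts] is pending; the reader must back off."""


def read(cell, ts):
    # An earlier txn may still commit below ts, so the read is unsafe.
    if any(lock_ts <= ts for lock_ts in cell["lock"]):
        raise LockedError
    # Latest write record with commit_ts in [0, ts].
    visible = [commit_ts for commit_ts in cell["write"] if commit_ts <= ts]
    if not visible:
        return None  # nothing committed at or before ts
    start_ts = cell["write"][max(visible)]  # write record points at the data
    return cell["data"][start_ts]


cell = {
    "data": {5: "a", 8: "b"},   # start_ts  -> value
    "lock": {},                 # start_ts  -> primary key
    "write": {6: 5, 9: 8},      # commit_ts -> start_ts
}
print(read(cell, 7))   # a
print(read(cell, 10))  # b
print(read(cell, 3))   # None
```

A lock whose timestamp is greater than `ts` does not block the read, because that transaction can only commit after `ts`.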
Roll Forward
- Primary lock missing (replaced by a write record) → the txn is finished
- Remove the relevant secondary locks, replacing them with write records
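A sketch of how a leftover lock could be resolved by looking at its primary. The function `resolve_lock` and the dict layout are hypothetical, and the sketch leaves out the liveness check: Percolator only cleans up a lock once it believes the owning worker is dead or stuck.

```python
def resolve_lock(store, key, start_ts):
    """Resolve a lock left on `key` by the txn that started at `start_ts`."""
    cell = store[key]
    primary = store[cell["lock"][start_ts]]  # the lock stores the primary's key

    if start_ts in primary["lock"]:
        # Primary still locked: the txn never reached its commit point.
        # Erase the primary first, then this cell (roll back).
        for c in (primary, cell):
            c["lock"].pop(start_ts, None)
            c["data"].pop(start_ts, None)
        return "rolled back"

    # Primary lock is gone: look for the write record it was replaced with.
    commit_ts = next(
        (c_ts for c_ts, s_ts in primary["write"].items() if s_ts == start_ts),
        None,
    )
    del cell["lock"][start_ts]
    if commit_ts is None:
        cell["data"].pop(start_ts, None)  # primary was rolled back earlier
        return "rolled back"
    cell["write"][commit_ts] = start_ts   # finish the commit for this cell
    return "rolled forward"


store = {
    "Bob": {"data": {7: 3}, "lock": {}, "write": {8: 7}},       # primary, committed
    "Joe": {"data": {7: 9}, "lock": {7: "Bob"}, "write": {}},   # stranded secondary
}
print(resolve_lock(store, "Joe", 7))  # rolled forward
print(store["Joe"]["write"])          # {8: 7}
```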
BigTable Integration
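- `c:lock`: shows that the cell is the primary, or where the primary is
- `c:write`: committed entry, a pointer to the data
- `c:data`: stores the data, referenced by the pointer
- `c:notify`: marks a modified cell as dirty
- `c:ack_O`: acknowledgment for observer O, which runs its operations when data in an observed col changes
  - changes are found by scanning the `notify` col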
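A rough sketch of the notify scan. It is simplified on purpose: the row layout and `run_observers` are hypothetical, and the ack here stores the latest commit timestamp that was handled, which is only one way to make sure an observer runs at most once per change.

```python
def run_observers(rows, observer, name="O"):
    """Scan the notify col and run `observer` on rows changed since its last ack."""
    ran = []
    for key, row in rows.items():
        if not row.get("notify"):
            continue  # not dirty, skip
        latest_commit = max(row["write"], default=0)
        last_ack = row.get(f"ack_{name}", 0)
        if latest_commit > last_ack:
            observer(key, row)
            row[f"ack_{name}"] = latest_commit  # remember what was handled
            ran.append(key)
        row["notify"] = False  # clear the dirty mark
    return ran


rows = {
    "doc1": {"write": {5: 4}, "notify": True},
    "doc2": {"write": {3: 2}, "notify": False},
}
seen = []
first = run_observers(rows, lambda key, row: seen.append(key))
second = run_observers(rows, lambda key, row: seen.append(key))
# first == ["doc1"], second == []
```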
In TiKV
- Pre-condition: reading and writing a single user key must be atomic
- 3 column families (CFs): `CF_DEFAULT`, `CF_LOCK` and `CF_WRITE`, which correspond to Percolator's `data` column, `lock` column and `write` column respectively.
  - `CF_DEFAULT`: `(key, start_ts)` → `value`
  - `CF_LOCK`: `key` → `lock_info`
    - no ts, as only one lock can be held on a key at a time
  - `CF_WRITE`: `(key, commit_ts)` → `write_info`
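The three mappings can be modelled with plain dicts to see how one key moves through them. This is only an illustration of the layout: the contents of `lock_info` and `write_info` and the helper names are assumptions, and TiKV's real key encoding is not shown.

```python
cf_default = {}  # (key, start_ts)  -> value
cf_lock = {}     # key              -> lock_info (no ts: one lock per key)
cf_write = {}    # (key, commit_ts) -> write_info


def prewrite(key, value, primary, start_ts):
    if key in cf_lock:
        raise RuntimeError("key is locked")
    if any(k == key and c_ts >= start_ts for (k, c_ts) in cf_write):
        raise RuntimeError("newer write exists")
    cf_lock[key] = {"primary": primary, "start_ts": start_ts}
    cf_default[(key, start_ts)] = value


def commit(key, start_ts, commit_ts):
    lock = cf_lock.get(key)
    if lock is None or lock["start_ts"] != start_ts:
        raise RuntimeError("lock is missing")
    cf_write[(key, commit_ts)] = {"start_ts": start_ts}  # points at CF_DEFAULT
    del cf_lock[key]


def get(key, ts):
    lock = cf_lock.get(key)
    if lock is not None and lock["start_ts"] <= ts:
        raise RuntimeError("key is locked by an earlier txn")
    versions = [c_ts for (k, c_ts) in cf_write if k == key and c_ts <= ts]
    if not versions:
        return None
    start_ts = cf_write[(key, max(versions))]["start_ts"]
    return cf_default[(key, start_ts)]


prewrite(b"k1", b"v1", primary=b"k1", start_ts=10)
commit(b"k1", start_ts=10, commit_ts=11)
print(get(b"k1", 12))  # b'v1'
print(get(b"k1", 10))  # None
```

Because `CF_LOCK` is keyed by `key` alone, a second `prewrite` on the same key fails until the first lock is committed or removed.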