- B+ Tree
- Application: File System (Ext4)
Introduction
B-trees are balanced search trees designed to work well on disk drives or other direct-access secondary storage devices. B-trees are similar to red-black trees (Chapter 13), but they are better at minimizing the number of operations that access disks. Many database systems use B-trees, or variants of B-trees, to store information. —CLRS
It only read one node into memory, so it is useful for databases which mostly have larger table size than memory capacity.
Properties
- Every node
x
has the following attributes.x.n
: the number of keysx.keys
: in monotonically increasing orderx.leaf
: boolean value
- Each node x contains
x.n + 1
pointers to its children. Leaf nodes have no children. - The keys
x.key_i
separate the ranges of keys stored in each subtree. Just like a 2-3-4 Tree - All leaves have the same depth, which is the height
h
- Nodes have a lower and upper bounds on the number of keys they can contain (
t
is an arbitrary constant representing the lower and upper bounds).- Every node other than the root must have at least
t-1
keys. Thus nodes have at leastt
children. - Every node may contain at most
2t - 1
keys. Thus nodes may have at most2t
children. Which we call it full.
- Every node other than the root must have at least
Operations
- The root of the B-tree is always in main memory, so that no procedure ever needs to perform a
DISK-READ
on the root. If any changes to the root node occur, however, thenDISK-WRITE
must be called on the root. - Any nodes that are passed as parameters must already have had a
DISK-READ
operation performed on them
Search
Search makes a multiway branching decision according to the number of the node’s children. More precisely, at each internal node x , the search makes an (x.n + 1)
-way branching decision.
B-Tree-Search(x, k) // x is the node, k for key
i = 1
while i <= x.n and k > x.key_i // find the right interval
i += 1
if i <= x.n and k == x.key_i
return (x, i)
elseif x.leaf
return nil
else DISK-READ(x.c_i)
return B-Tree-Search(x.c_i, k)
Insertion
Create Empty Tree
ALLOCATE-NODE
: allocates one disk block to be used as a new node in O(1) time
B-Tree-Create(T)
x = ALLOCATE-NODE()
x.leaf = True
x.n = 0
DISK-WRITE(x)
T.root = x
Splitting a Node in a B-tree
Insert the new key into an existing leaf node. Since you cannot insert a key into a leaf node that is full, you need an operation that splits a full node y (having 2t - 1
keys) around its median key y.key_t
into two nodes having only t - 1 keys each.
To avoid having to go back up the tree, just split every full node you encounter as you go down the tree. In this way, whenever you need to split a full node, you are assured that its parent is not full. Inserting a key into a B-tree then requires only a single pass down the tree from the root to a leaf.
To split a full root, you first need to make the root a child of a new empty root node, so that you can use B-TREE-SPLIT-CHILD
. The tree thus grows in height by 1: splitting is the only means by which the tree grows taller.
B-Tree-Split-Child(x, i)
y = x.c_i // y(x's ith child) is the full node to split
z = ALLOCATE-NODE() // z will take half of y
z.leaf = y.leaf
z.n = t - 1
for j = 1 to t-1
z.key_j = y.key_j+t // z gets y's greater parts
if not y.leaf
for j = 1 to t
z.c_j = y.c_j+t // z also get the splited node's children
y.n = t - 1 // y keeps t - 1 keys
for j = x.n + 1 downto i+1 // shift parent's(x's) children to the right
x.c_j+1 = x.c_j
x.c_i+1 = z // make z a new child
for j = x.n downto i
x.key_j+1 = x.key_j // shift the keys in x
x.key_i = y.key_i // insert y's median key
x.n = x.n + 1 // x has a new child
DISK-WRITE(y)
DISK-WRITE(z)
DISK-WRITE(x)
Split Root
B-Tree-Split-Root(T)
s = ALLOCATE-NODE()
s.leaf = FALSE
s.n = 0
s.c_1 = T.root
T.root = s
B-Tree-Split-Schild(s, 1)
return s
Insert Procedure
Inserting a key k
into a B-tree T of height h
requires just a single pass down the tree and O(h) disk accesses. The CPU time required is O(th) = O(t log t n).
The B-TREE-INSERT procedure uses B-TREE-SPLIT-CHILD to guarantee that the recursion never descends to a full node. If the root is full, B-TREE-I NSERT splits it by calling the procedure B-TREE-SPLIT-ROOT on the facing page
B-Tree-Insert(T, k)
r = T.root
if r.n == 2t - 1
s = B-Tree-Split-Root(T)
B-Tree-Insert-Nonfull(s, k)
else
B-Tree-Insert-Nonfull(r, k)
B-Tree-Insert-Nonfull(x, k)
i = x.n // search from right to left
if x.leaf // inserting into a leaf
while i >= 1 and k < x.key_i // right shift keys in x to make room for k
x.key_i+1 = x.key_i
i -= 1
x.key_i+1 = k // insert key k in x
x.n += 1 // have 1 more key
DISK-WRITE(x)
else // find the child where k belongs
while i >= 1 and k < x.key_i
i -= 1
i += 1 // position located
DISK-READ(x.c_i)
if x.c_i.n == 2t-1 // split the child if it's full
B-Tree-Split-Child(x, i)
if k > x.key_i // k go into x.c_i ro x.c_i+1
i += 1
B-Tree-Insert-Nonfull(x.c_i, k)
- Complexity:
Deletion
- Case 1: Leaf node x, delete
x.k
if found - Case 2: Internal node x contain
k
, depending on the number of keys in x.c_i, we have- Case 2a:
x.c_i
has at least t keys. Recursively find the substitute keyk'
in the subtree rooted at x.c_i and move it up. - Case 2b:
x.c_i
hast - 1
keys andx.c_i+1
has at least t keys. Symmetric to case 2a. Findk'
in subtree rooted atx.c_i+1
- Case 2c: Both
x.c_i
andx.c_i+1
havet - 1
keys. Mergek
and all ofx.c_i+1
intox.c_i
so that x now have2t - 1
keys. Then recursively delete k fromx.c_i
- Case 2a:
- Case 3: Hold the properties! If
x.c_i
has onlyt - 1
keys, execute case 3a or 3b to guarantee descending to a node containing at leastt
keys.- Case 3a:
x.c_i
has onlyt - 1
keys but has an immediate sibling with at leastt
keys.x
give a key tox.c_i
and sibling give a key tox
- Case 3b:
x.c_i
and each(1 or 2) immediate siblings have onlyt - 1
keys. Mergex.c_i
with one sibling and move a key fromx
down into the newly merged node to become the median key for that node.
- Case 3a:
- t = 3