Flexible FTL-based Address Mappings for Flash
May 6, 2014
Flash is growing in popularity
Fundamental physical differences (erase cycles, wear-out)
Software must adapt to hardware's characteristics
New software techniques can exploit flash in useful ways
But existing software stack makes flash look just like a disk!
Standard block read/write interface over SATA/SAS (sometimes PCIe)
Flash translation layer (FTL) manages hardware's requirements
On-device firmware or host-based software
Log-structured writes to avoid read-erase-rewrite cycles
Remap logical blocks to physical locations
Space management: garbage collection
FTL keeps validity bitmap
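For concreteness, a minimal C sketch of that bookkeeping: a logical-to-physical map, a per-page validity bitmap, and log-structured writes. All names and sizes are illustrative, and flash_program() is a stand-in for the device's NAND program operation, not any real firmware API.

    #include <stdint.h>

    #define NUM_LPAGES  (1u << 20)          /* logical pages exposed to the host */
    #define NUM_PPAGES  (1u << 21)          /* physical flash pages */
    #define INVALID_PPA UINT32_MAX

    static uint32_t l2p[NUM_LPAGES];        /* logical-to-physical map; assumed
                                               initialized to INVALID_PPA at format */
    static uint8_t  valid[NUM_PPAGES / 8];  /* validity bitmap: 1 bit per physical page */
    static uint32_t write_head;             /* next free page in the log */

    void flash_program(uint32_t ppa, const void *data);  /* hypothetical NAND program op */

    /* Log-structured write: always append at the write head and remap,
     * never read-erase-rewrite in place. */
    static void ftl_write(uint32_t lpa, const void *data)
    {
        uint32_t old_ppa = l2p[lpa];
        uint32_t new_ppa = write_head++;    /* allocation policy elided */

        flash_program(new_ppa, data);

        if (old_ppa != INVALID_PPA)
            valid[old_ppa / 8] &= ~(1u << (old_ppa % 8));  /* stale copy becomes garbage */
        valid[new_ppa / 8] |= 1u << (new_ppa % 8);
        l2p[lpa] = new_ppa;
    }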
Existing software works, but isn't ideal.
What we want:
Greater efficiency
New features
New software offering different interfaces can better exploit existing hardware to achieve this.
Stackable block storage layer:
FTL-like in many ways (log-structured, CoW oriented)
Presents large, sparse address space mapped to smaller underlying storage
Allows applications to explicitly manipulate address map
My suggestion:
FuBAR (vetoed)
But in the absence of anything better...
Remains backward-compatible with traditional read/write block IO interface
Additionally offers new operations, range move and clone
Also available in vectored atomic flavors
Allows efficient implementation of new features and opportunities for improvements in existing systems
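A hypothetical C rendering of what such an interface could look like; these declarations are a sketch of the shape of the operations, not the actual API.

    #include <stddef.h>
    #include <stdint.h>

    enum range_op_type { RANGE_MOVE, RANGE_CLONE };

    struct range_op {
        enum range_op_type type;   /* move or clone */
        uint64_t src;              /* source offset */
        uint64_t dst;              /* destination offset */
        uint64_t len;              /* length of the range */
    };

    /* Remap [src, src+len) to [dst, dst+len); the source mapping is removed. */
    int dev_range_move(int fd, uint64_t src, uint64_t dst, uint64_t len);

    /* Map [dst, dst+len) to the same data as [src, src+len); both mappings
     * stay valid, with copy-on-write on later overwrites. */
    int dev_range_clone(int fd, uint64_t src, uint64_t dst, uint64_t len);

    /* Apply a batch of moves/clones atomically: all take effect or none do. */
    int dev_range_vec_atomic(int fd, const struct range_op *ops, size_t n);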
Range-move [0,2) to [4,6)
Clone [2,4) to [5,7)
CoW semantics: when one mapping is overwritten, the cloned data remains intact at the other mappings.
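A toy C model of the assumed map semantics, using the example ranges above; the flat 8-entry array stands in for the real (much larger, sparse) address map.

    #include <stdint.h>
    #include <stdio.h>

    #define UNMAPPED UINT32_MAX
    static uint32_t map[8];                 /* toy logical-to-physical map */

    static void range_move(uint32_t src, uint32_t dst, uint32_t len)
    {
        for (uint32_t i = 0; i < len; i++) {
            map[dst + i] = map[src + i];    /* destination now points at the data */
            map[src + i] = UNMAPPED;        /* source mapping is dropped */
        }
    }

    static void range_clone(uint32_t src, uint32_t dst, uint32_t len)
    {
        for (uint32_t i = 0; i < len; i++)
            map[dst + i] = map[src + i];    /* both map to the same physical blocks;
                                               CoW applies on a later overwrite */
    }

    int main(void)
    {
        for (uint32_t i = 0; i < 8; i++)
            map[i] = 100 + i;               /* fake physical block addresses */

        range_move(0, 4, 2);                /* range-move [0,2) to [4,6) */
        range_clone(2, 5, 2);               /* clone [2,4) to [5,7) */

        for (uint32_t i = 0; i < 8; i++) {
            if (map[i] == UNMAPPED)
                printf("L%u -> unmapped\n", i);
            else
                printf("L%u -> P%u\n", i, map[i]);
        }
        return 0;
    }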
Volume snapshots for backups, auditing, etc.
Simply clone the entire address space of a volume (one-call sketch below)
Sparse address space allows many volumes within a single device
Atomic, time- and space-efficient
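Assuming the hypothetical dev_range_clone() sketched above, a whole-volume snapshot reduces to a single call; VOL_BASE, VOL_SIZE, and SNAP_BASE are placeholder values for the volume's region and an unused region of the sparse space.

    #define VOL_BASE  0x000000000ULL   /* placeholder: volume's logical base */
    #define VOL_SIZE  0x040000000ULL   /* placeholder: 1 GiB volume */
    #define SNAP_BASE 0x800000000ULL   /* placeholder: unused sparse region */

    /* Whole-volume snapshot: one clone over the volume's logical range.
     * Atomic, and no data is copied on the device. */
    int snapshot_volume(int devfd)
    {
        return dev_range_clone(devfd, VOL_BASE, SNAP_BASE, VOL_SIZE);
    }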
Smaller clones allow easy implementation of advanced FS features:
Zero-copy/CoW file snapshots (as in ZFS, BTRFS) easily added to conventional filesystems
FS need only allocate a block range and issue a clone operation
Could also provide back-end mechanism for a cp in O(1) time and space
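A sketch of how a filesystem might wire a file snapshot (or reflink-style cp) to the clone primitive; fs_alloc_extent() is a hypothetical allocator, and dev_range_clone() is the illustrative call from the interface sketch above.

    #include <stddef.h>
    #include <stdint.h>

    struct extent { uint64_t start; uint64_t len; };

    int fs_alloc_extent(uint64_t len, struct extent *out);  /* hypothetical FS allocator */

    int fs_snapshot_file(int devfd, const struct extent *src, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            struct extent dst;
            if (fs_alloc_extent(src[i].len, &dst) != 0)
                return -1;
            /* No data moves: the new file's blocks map to the same physical
             * locations until either copy is overwritten. */
            if (dev_range_clone(devfd, src[i].start, dst.start, src[i].len) != 0)
                return -1;
        }
        return 0;
    }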
Range moves can improve efficiency of existing systems (especially with vectored atomics)
Write-ahead logging (RDBMS, journaling FS):
Conventional approach uses double-write (once to log/journal, then again to "home")
Can instead write to scratch location, then atomically move data to home location (see the commit sketch below)
80% TPC-C improvement with MySQL
Reduced write traffic also lengthens device lifespan
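A sketch of the single-write commit path this enables, assuming the hypothetical dev_range_vec_atomic() and struct range_op from the interface sketch, plus a made-up dev_write() helper; SCRATCH_BASE is an arbitrary scratch-region offset.

    #include <stddef.h>
    #include <stdint.h>

    #define SCRATCH_BASE 0x400000000ULL   /* placeholder scratch-region offset */

    int dev_write(int fd, uint64_t off, const void *buf, uint64_t len);  /* hypothetical */

    int txn_commit(int devfd, const uint64_t *home, const void *const *data,
                   const uint64_t *len, size_t n)
    {
        struct range_op ops[n];            /* C99 VLA; fine for a sketch */
        uint64_t scratch = SCRATCH_BASE;

        /* 1. Write each block's new contents exactly once, into scratch space. */
        for (size_t i = 0; i < n; i++) {
            dev_write(devfd, scratch, data[i], len[i]);
            ops[i] = (struct range_op){ .type = RANGE_MOVE,
                                        .src  = scratch,
                                        .dst  = home[i],
                                        .len  = len[i] };
            scratch += len[i];
        }

        /* 2. One vectored atomic range-move publishes everything at its home
         *    address: the whole transaction becomes visible, or none of it. */
        return dev_range_vec_atomic(devfd, ops, n);
    }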
Garbage collection becomes much more complicated.
Address map is many-to-one (M:1), not 1:1, so simple validity bitmaps no longer work.
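One natural replacement (an assumption here, not necessarily what the system does): per-physical-block reference counts, decremented as mappings are removed or overwritten. The l2p_lookup/l2p_update/mark_garbage helpers are hypothetical.

    #include <stdint.h>

    #define NUM_PBLOCKS (1u << 22)
    #define NO_MAPPING  UINT64_MAX

    static uint16_t refcount[NUM_PBLOCKS];  /* 0 means the physical block is garbage */

    uint64_t l2p_lookup(uint64_t lba);      /* hypothetical map lookup */
    void     l2p_update(uint64_t lba, uint64_t pba);
    void     mark_garbage(uint64_t pba);    /* hand block to the garbage collector */

    /* Install (or replace) a single logical-to-physical mapping. */
    static void map_set(uint64_t lba, uint64_t pba)
    {
        uint64_t old = l2p_lookup(lba);
        if (old != NO_MAPPING && --refcount[old] == 0)
            mark_garbage(old);              /* no logical address references it now */
        refcount[pba]++;
        l2p_update(lba, pba);
    }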
Metadata persistence:
Can't acknowledge a write until both data and metadata have been safely stored
Tricky to do efficiently with only block-granularity storage
Thanks!
Questions?