Introduction to UFS
The UFS (Universal Flash Storage) protocol is a communication interface protocol developed by JEDEC for mobile storage devices. Mobile storage products based on the UFS protocol are often referred to as UFS devices. UFS devices are widely used in smartphones, tablets, VR (virtual reality) devices, AR (augmented reality) devices, UAVs, 3D games, monitoring systems, PDAs, digital recorders, MP3 players, electronic toys and other fields.
As a replacement for eMMC, UFS provides higher performance and power-efficiency. Figure 1 shows the performance data comparison between eMMC and each generation of UFS.
(Figure 1: Maximum bandwidth of eMMC and UFS)
As shown in Figure 1, the maximum bandwidth of the latest UFS 4.0 can reach more than 4GB/s, which is more than 10 times the maximum bandwidth of eMMC. At present, eMMC has stopped evolving and is slowly being replaced by UFS.
UFS has gone through several iterations, and the latest version is UFS 4.0 (released in August 2022). As shown in Figure 1, each UFS generation doubles the performance of the previous one, and UFS 4.0 is no exception: it doubles the performance of UFS 3.0/3.1. On top of that, UFS 4.0 introduces some new features beyond UFS 3.0/3.1, such as Barrier instructions, advanced RPMB, FBO, etc. Today we will focus on the FBO feature.
FBO, short for File Based Optimization, refers to performance optimization carried out on a per-file basis. Before we go further into FBO, let's first cover some background and explain what the logical and physical fragmentation of a file are.
Logical and physical fragmentation of files
For a file, the file system will allocate several logical blocks (addressed with LBA) for storing file data. When allocating logical blocks to a file, the file system will try to allocate contiguous logical blocks to it. But if the required contiguous logical blocks cannot be allocated, discontiguous blocks will be allocated.
(Figure 2: Two scenarios where a file is allocated logical blocks)
Here, we refer to a file whose LBAs are contiguous as "logically contiguous" for short. In Figure 2, scenario 1 is "logically contiguous", and scenario 2 is "logically discontiguous". If the LBAs of a file are "logically discontiguous", the file is logically fragmented - the more scattered the LBAs of the file, the more logically fragmented it is.
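The degree of logical fragmentation can be made concrete by counting how many contiguous LBA runs (extents) a file occupies. The following is a minimal sketch; the function name and the LBA values are hypothetical and only mimic the two scenarios of Figure 2.

```python
def logical_extents(lbas):
    """Group a file's LBAs (given in file order) into contiguous runs.

    Returns a list of (start_lba, length) tuples. One extent means the
    file is logically contiguous; more extents mean more fragmentation.
    """
    extents = []
    for lba in lbas:
        if extents and lba == extents[-1][0] + extents[-1][1]:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # extend the current run
        else:
            extents.append((lba, 1))  # start a new run
    return extents


# Hypothetical LBA layouts for a 12-block file.
scenario_1 = list(range(100, 112))
scenario_2 = list(range(100, 104)) + list(range(200, 204)) + list(range(50, 54))

print(logical_extents(scenario_1))  # one extent
print(logical_extents(scenario_2))  # three extents
```

Each extent corresponds to one contiguous LBA range, so the number of extents is a simple measure of how logically fragmented the file is.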
File data will eventually be stored on a storage device, that is, these logical blocks (regardless of whether they are contiguous or discontiguous) must be written to the physical blocks of the flash memory of the storage device. On the device side, if no other write command is inserted, the storage device will write the above file data to contiguous flash memory space.
As shown in Figure 3:
(Figure 3: Storage device writes file data to contiguous flash space)
The scenario where the data of the file is contiguously written in the flash space is known as "physically contiguous".
However, while the host is writing the file, other writes may be interleaved with it, such as writes of the file's metadata or of other files' data. The device receives the write data in the order of the write commands it receives. Because these different kinds of data are interleaved, the data of the file may not be written contiguously to the flash memory space.
As shown in Figure 4:
(Figure 4: Storage device writes file data to discontiguous flash space)
The scenario where the file data is not contiguously written in the flash space is referred to as "physically discontiguous", which means that the file is physically fragmented - the more scattered the data of a file is stored in the flash space, the more physically fragmented it is.
It is worth mentioning that even if the file written to the flash memory is initially contiguous, due to some subsequent operations inside the storage device, such as garbage collection, the final memory location where the file data is stored in the flash memory space may not be contiguous.
Impact of file fragmentation on performance
For storage devices (such as UFS devices), since the data of an LBA may be stored in any physical location of the flash memory, the storage device needs to maintain an L2P mapping table, in which the logical block addresses are mapped to the physical addresses. The L2P mapping table is a large array, with the index being the LBA and the content being the physical address of that LBA in the flash memory (hereinafter referred to as PBA). When the storage device needs to read data, it first searches the L2P mapping table to obtain the PBA corresponding to the LBA, and then reads the LBA data based on the PBA. The size of the L2P mapping table is generally about 1/1024 of the capacity of the storage device (for example, when each 4KB logical block is mapped by a 4-byte entry). For a 256GB UFS device, the size of the L2P mapping table is therefore 256MB. Generally, there is no DRAM (dynamic random access memory) in consumer-grade storage devices. Therefore, the L2P mapping table is stored in the flash memory most of the time, and the storage device firmware loads parts of the L2P mapping into a small-capacity SRAM (static random access memory) on demand.
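The 1/1024 ratio can be checked with a little arithmetic. The sketch below assumes a 4KB mapping granularity and a 4-byte table entry; both values are assumptions chosen because they reproduce the ratio quoted above, and real devices may differ.

```python
LOGICAL_BLOCK_SIZE = 4 * 1024  # assumed mapping granularity: 4KB per LBA
L2P_ENTRY_SIZE = 4             # assumed entry size: 4 bytes per PBA


def l2p_table_bytes(capacity_bytes):
    """Size of a flat L2P array with one entry per logical block."""
    entries = capacity_bytes // LOGICAL_BLOCK_SIZE
    return entries * L2P_ENTRY_SIZE


GIB = 2**30
MIB = 2**20

capacity = 256 * GIB
print(l2p_table_bytes(capacity) // MIB, "MB")  # 256 MB for a 256GB device
```

With these assumptions, the table is far larger than the SRAM of a typical DRAM-less device, which is why only parts of it can be cached at any time.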
When a file is accessed, contiguous LBAs help in two ways. First, the host needs to send fewer commands to the storage device. Taking scenario 1 in Figure 2 as an example, the host only needs to send one read command to read the file. For scenario 2 in Figure 2 (where the file is logically fragmented), the host needs to send three read commands. The larger number of I/O commands adds overhead for both the host-side software and the device-side firmware. Second, if the file's LBAs are contiguous, a single 4KB load of L2P mapping data from the flash memory is enough to serve 4MB of LBA data. Conversely, if the LBAs are discontiguous, in the worst case the storage device has to load a 4KB piece of the mapping table from the flash memory for every LBA it reads, and such frequent L2P loading seriously degrades read performance.
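The L2P loading cost can be illustrated with a toy model. The sketch below assumes 4-byte entries (so a 4KB piece of the table holds 1024 entries and covers 4MB of data) and an SRAM cache that holds a single 4KB piece at a time; the cache size and the function name are hypothetical simplifications.

```python
from collections import OrderedDict

L2P_SEGMENT_BYTES = 4 * 1024                    # one load from flash: 4KB of table
ENTRIES_PER_SEGMENT = L2P_SEGMENT_BYTES // 4    # assumed 4-byte entries -> 1024


def l2p_segment_loads(lbas, cached_segments=1):
    """Count how many 4KB L2P segments must be loaded from flash.

    A tiny LRU cache stands in for the device SRAM (an assumption).
    """
    cache = OrderedDict()
    loads = 0
    for lba in lbas:
        segment = lba // ENTRIES_PER_SEGMENT
        if segment in cache:
            cache.move_to_end(segment)      # hit: no flash access needed
            continue
        loads += 1                          # miss: load the segment from flash
        cache[segment] = True
        if len(cache) > cached_segments:
            cache.popitem(last=False)       # evict the least recently used
    return loads


contiguous = range(0, 1024)                     # 4MB of logically contiguous data
scattered = [i * 1024 for i in range(1024)]     # worst case: one LBA per segment

print(l2p_segment_loads(contiguous))  # 1
print(l2p_segment_loads(scattered))   # 1024
```

In this model the same 4MB of data costs one mapping load when the LBAs are contiguous and 1024 loads in the worst scattered case.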
In short, both the host side and the device side prefer "logically contiguous" to logical fragmentation within a file.
What about "physically contiguous"? Storage devices prefer this as well. The reason is that if the data to be read is stored together, a single read can use a multi-plane operation. For example, with 4-plane flash memory, one read can obtain 64KB of data (for example, four 16KB pages). But if this 64KB of data is not physically contiguous and is scattered across different places in the flash memory, in the worst case the flash memory has to be read 16 times (only 4KB is obtained per read).
The conclusion we can draw so far is that a file has the best read performance when it is both "logically contiguous" and "physically contiguous". If either condition is not satisfied, the file read performance will be affected, which may eventually lead to a "lagging" problem on the phone. Therefore, the way to optimize file read performance is to avoid or reduce logical fragmentation and physical fragmentation within a file.
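The 1-versus-16 comparison can be reproduced with a simple count. This sketch assumes that physical addresses are expressed in 4KB units and that one multi-plane read returns an aligned 64KB window (16 such units); real flash addressing has further constraints that are ignored here.

```python
UNIT_KB = 4                       # data is tracked in 4KB units
MULTIPLANE_READ_KB = 64           # assumed: 4 planes x 16KB page per read
UNITS_PER_READ = MULTIPLANE_READ_KB // UNIT_KB  # 16


def flash_reads(pbas):
    """Number of multi-plane reads needed to fetch the given 4KB units.

    Units that fall into the same aligned 64KB window share one read.
    """
    return len({pba // UNITS_PER_READ for pba in pbas})


contiguous = range(0, 16)                    # 64KB stored in one window
scattered = [i * 16 for i in range(16)]      # each 4KB unit in its own window

print(flash_reads(contiguous))  # 1
print(flash_reads(scattered))   # 16
```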
Now we come back to the FBO feature. As an extension protocol of UFS 4.0, FBO can be summarized as follows: the host and the device cooperate to convert file data from "physically discontiguous" to "physically contiguous" in order to improve file read performance.
To be specific, when the system is idle (for example, late at night), the host informs the storage device of the LBA ranges of one or more files that need performance optimization. The device looks up the mapping of these LBAs, uses their physical addresses to analyze whether the file is physically contiguous in the flash memory space and how fragmented it is, and returns this information to the host. Based on this feedback, the host instructs the storage device on the next action: if the file is scattered across the flash memory space, the host instructs the storage device to move the discontiguous data blocks to a contiguous location. On receiving this instruction, the storage device defragments the data by rewriting the discontiguous data together into new, contiguous flash memory locations. In this way, through cooperation between the host and the device, FBO addresses the fragmentation of file data in the storage space and improves file read performance.
(Figure 5: FBO defragments file data, turning it from physically discontiguous to physically contiguous)
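The host-device exchange described above can be sketched as a toy simulation. This is not the actual FBO command set: the class, method names, threshold and addresses below are all hypothetical, and the fragmentation measure (number of physically contiguous runs) is only one possible choice.

```python
class ToyDevice:
    """A toy storage device holding an L2P map (hypothetical model)."""

    def __init__(self, l2p, next_free_pba):
        self.l2p = dict(l2p)
        self.next_free_pba = next_free_pba  # start of a free contiguous area
        self.extra_writes = 0               # relocation writes caused by defrag

    def analyze(self, lbas):
        """Report how many physically contiguous runs the LBAs occupy."""
        runs, prev_pba = 0, None
        for lba in lbas:
            pba = self.l2p[lba]
            if prev_pba is None or pba != prev_pba + 1:
                runs += 1
            prev_pba = pba
        return runs

    def defragment(self, lbas):
        """Rewrite the LBAs, in order, into contiguous free space."""
        for lba in lbas:
            self.l2p[lba] = self.next_free_pba
            self.next_free_pba += 1
            self.extra_writes += 1


def host_optimize(device, lbas, max_runs=1):
    """Host side: ask for the fragmentation level, then decide."""
    before = device.analyze(lbas)
    if before > max_runs:
        device.defragment(lbas)
    return before, device.analyze(lbas)


# A 6-block file whose data sits in three separate places in flash.
device = ToyDevice({10: 500, 11: 501, 12: 730, 13: 731, 14: 640, 15: 641},
                   next_free_pba=1000)
result = host_optimize(device, list(range(10, 16)))
print(result)                # (3, 1): three runs before, one run after
print(device.extra_writes)   # 6
```

The `extra_writes` counter makes the cost of the operation visible: every relocated block is one additional flash write.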
FBO is designed to solve the physical fragmentation of files.
Problems that cannot be solved through FBO
FBO solves the problem of physical fragmentation of files, that is, it turns "physically discontiguous" into "physically contiguous", but it does not solve the logical fragmentation of files. Data shows that file performance is affected more by logical fragmentation than by physical fragmentation, so solving the logical fragmentation of files is the more important task.
The logical fragmentation problem of files has persisted from the HDD era into the solid-state storage era, and the industry has made many efforts to solve it.
The first is the emergence of log-structured file systems, represented by F2FS (Flash Friendly File System). F2FS is a file system designed specifically for flash-based storage devices and is one of the two most commonly used file systems in mobile phones (the other is EXT4). When allocating logical blocks to files, F2FS generally allocates them sequentially, in an append-only fashion. Only when the logical space of the storage device is full does it fall back to an allocation method called threaded logging, which may allocate scattered logical blocks to a file. As a result, F2FS has greatly reduced the logical fragmentation of files.
The second is ZNS (Zoned Namespace) technology in SSDs, which divides the entire storage space into several zones and enforces sequential writing within each zone. This benefits storage devices: the L2P mapping table can be made small (by using a larger mapping granularity), which allows it to stay resident in memory. When the firmware processes a read command, it can quickly obtain the physical address of the LBA, which improves read performance. While the UFS 4.0 standard was being formulated, many companies in the industry suggested applying the concept of Zoned Storage to UFS. In the end, this suggestion was not adopted in UFS 4.0, but it is foreseeable that a technology similar to ZNS in SSDs will appear in a future UFS version.
Comments on FBO
Although it does not solve the logical fragmentation of files, FBO does solve their physical fragmentation, which can improve file read performance to a certain extent. With a file system such as F2FS, in which the LBAs of most files are contiguous, and now with the addition of FBO, the read performance of large files on mobile phones will improve, and a solution to the "lagging" issue of mobile phones "may have been" found. (Why "may have been"? According to the previous analysis, the biggest factor affecting file system performance is the logical fragmentation of files. FBO will not work so well if the logical fragmentation problem is not addressed.)
File defragmentation requires reading data from one flash memory block and then rewriting it together into another flash memory block. These extra writes caused by FBO lead to write amplification, which affects the life of the storage device.
In addition, FBO is a remedy, meaning that files are defragmented only after physical fragmentation has occurred. If the storage device is designed to prevent physical fragmentation of files from occurring in the first place, FBO loses its purpose.
There are two main reasons for the physical fragmentation of files:
1. Due to the mixed writing of various data, the LBA data of a certain file may be written to discontiguous flash memory space at the beginning;
2. Also, some storage devices were designed without recognizing the importance of storing file data contiguously in the flash memory space. Some internal operations, such as garbage collection, may cause file data that was initially written contiguously to be scattered across the flash space, making it "physically discontiguous".
If we know the cause of physical fragmentation, storage devices can be designed accordingly.
For example, a storage device can apply physical isolation in its algorithms, writing large-size data to one flash memory block and small-size data to another, so as to avoid the scenario where large-size data is written discontiguously in the physical space because small-size metadata writes are interleaved with it.
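A minimal sketch of this kind of isolation is shown below. The size threshold, stream names and write pattern are hypothetical; the point is only that routing writes by size keeps interleaved small writes out of the large-data stream.

```python
LARGE_WRITE_THRESHOLD = 8  # hypothetical cutoff, in 4KB units


def place_writes(writes, isolate):
    """Lay out writes in arrival order.

    writes: list of (owner, length_in_4KB_units).
    Returns {stream_name: [owner of each 4KB unit, in physical order]}.
    """
    streams = {"large": [], "small": []} if isolate else {"mixed": []}
    for owner, length in writes:
        if not isolate:
            stream = "mixed"
        elif length >= LARGE_WRITE_THRESHOLD:
            stream = "large"
        else:
            stream = "small"
        streams[stream].extend([owner] * length)
    return streams


def physical_runs(layout, owner):
    """Number of separate contiguous runs that `owner` occupies."""
    runs, prev = 0, None
    for unit_owner in layout:
        if unit_owner == owner and prev != owner:
            runs += 1
        prev = unit_owner
    return runs


# File data written in three chunks, with metadata updates in between.
writes = [("file", 8), ("meta", 1), ("file", 8), ("meta", 1), ("file", 8)]

mixed = place_writes(writes, isolate=False)
isolated = place_writes(writes, isolate=True)

print(physical_runs(mixed["mixed"], "file"))     # 3
print(physical_runs(isolated["large"], "file"))  # 1
```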
Alternatively, the file data may initially be written to the flash memory block discontiguously, but when that flash memory block is garbage collected, the data of contiguous LBAs on it is written together to a new contiguous physical location, which achieves an effect similar to that of FBO.
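The idea can be sketched as a garbage collector that orders valid pages by LBA before relocating them. This is a generic illustration under simplifying assumptions (a single victim block, LBA order used as a stand-in for file order), not any vendor's actual algorithm; all names and values are hypothetical.

```python
def garbage_collect(victim_pages, sort_by_lba):
    """Relocate the valid pages of a victim block.

    victim_pages: list of (lba, is_valid) in physical order.
    Returns the LBAs in the order they are written to the new block.
    """
    valid = [lba for lba, is_valid in victim_pages if is_valid]
    return sorted(valid) if sort_by_lba else valid


def physical_runs(lbas_in_physical_order):
    """Count runs in which consecutive LBAs are also physically adjacent."""
    runs, prev = 0, None
    for lba in lbas_in_physical_order:
        if prev is None or lba != prev + 1:
            runs += 1
        prev = lba
    return runs


# Valid data for LBAs 20-23 is mixed with stale pages and out of order.
victim = [(20, True), (7, False), (22, True), (21, True), (8, False), (23, True)]

plain = garbage_collect(victim, sort_by_lba=False)
ordered = garbage_collect(victim, sort_by_lba=True)

print(plain, physical_runs(plain))      # [20, 22, 21, 23] 4
print(ordered, physical_runs(ordered))  # [20, 21, 22, 23] 1
```

Because the valid pages have to be rewritten during garbage collection anyway, ordering them at that point adds no writes beyond those the collection itself requires.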
Longsys Smart GC Technology
Longsys UFS 3.1 took the problem of physical fragmentation of files into account from the beginning of its design. When designing the garbage collection algorithm, the R&D team considered not only reclaiming flash memory blocks through garbage collection, but also converting data from "physically discontiguous" to "physically contiguous" while garbage collection is performed. We call this kind of "all-in-one" garbage collection technology Smart GC.
(Figure 6: Smart GC: Device is physically defragmented while performing garbage collection)
This innovative technology of Smart GC not only solves the problem of physical fragmentation of files and improves the read performance of large files, but also avoids the adverse impact of additional defragmentation on the life of storage devices.