| Nie Xuejun(聂雪军), Qin Leihua, Zhou Jingli.[J]. High Technology Letters (高技术通讯, English edition), 2012, 18(1): 45~50 |
|
| A content aware chunking scheme for data de-duplication in archival storage systems |
| |
| Keywords: data de-duplication, content aware chunking (CAC), candidate anchor histogram (CAH) |
| Author Name | Affiliation |
| Nie Xuejun(聂雪军) | |
| Qin Leihua | |
| Zhou Jingli | |
|
| Abstract: |
| Building on variable-sized chunking, this paper proposes a content aware chunking scheme, called CAC, that does not assume file contents are fully random but instead considers the characteristics of each file type. CAC uses a candidate anchor histogram (CAH) and file-type-specific knowledge to refine how anchors are determined when de-duplicating file data, and it enforces the selected average chunk size. As a result, CAC finds more chunks, which in turn yields smaller average chunks and better data reduction. We present a detailed evaluation of CAC. The experimental results show that the scheme improves the compression ratio of chunking for file types whose bytes are not randomly distributed (from 11.3% to 16.7%, depending on the dataset) and improves write throughput by 9.7% on average. |
|