Technical White Paper Enterprise Server Edition v17: A technical overview of the Backup Client
Contents
Introduction
1. Caching method
2. Hashing method
3. Indexing engine
4. Encryption technology on the client
5. Encryption technology on the Storage Platform
6. Data integrity
7. Various backup performance implications
8. Various restore performance implications
9. VSS robustness
10. Disk usage and mount points
11. Disk usage of ESE accounts on the SP
12. Backup Operator mode
13. General Backup Client features
Benefits of ESE
ESE Backup Client
• Improved and steady memory usage allows several million files to be backed up within hours
• Dramatically improved cache access speed
• Incremental Streaming backups due to its speed benefits and seamless resumption mid-file
• Interrupted restores are resumed and not cancelled
• Faster file scanning and patching through better hashing and indexing techniques
• Faster restore selections
• Directory structures are only traversed once
• Multiple volumes are processed simultaneously during backup and restore
• Data is encrypted once using 256-bit AES (GCM)
• Perform Full System Backups and Recoveries
• Local copies can be kept for faster restores
• Corrupted and missing files and patches are detected instantly and retransmitted for storage
• Better Snapshot backups and restores with a single data container
• Single data format for Snapshot and HSM backups – HSM data can be restored directly by the Backup Client
• VSS snapshots occur only for volumes included in the backup selection thereby saving disk activity
• Automatic program updates
Storage Platform
• SSL authentication for data transfers
• Encrypted data storage using 256-bit AES (GCM)
• Data integrity is confirmed before data is stored
• Faster roll-ups – only occasional decompression and decryption required
• Efficient long-term storage of roll-ups
• Data blocks are transferred in sequence during restore – less decryption and decompression disk- and CPU overheads
Introduction
The Enterprise Server Edition (ESE) Backup Client is designed to function in an enterprise environment where millions of files must be backed up. It provides functionality similar to the existing Server Edition (SE) Backup Client but with re-engineered underlying architecture. This document explains these architectural improvements in ESE and any noteworthy features not previously provided in SE.
In order to gain context of the information that follows, take note of the backup and restore processes:
2 Copyright © Redstor Limited. All rights reserved
1. Caching method
In ESE, a new cache (.cache file) is created in the Data folder for each backup. As in SE, the backup cache on the client machine contains the footprint files used for delta block patching. The difference is that ESE stores them in a single file, or cache container, which dramatically improves cache access speed.
1.1. Fast streaming during backup
ESE performs only incremental Streaming backups because of their speed benefits (unlike SE, which can also perform “Staged” backups that are better suited to low-bandwidth environments). During backup, the ESE client maintains a rolling buffer of the data transmitted to the Storage Platform. When the connection drops and the client reconnects to the StorageServer, it can return to the exact position of the interruption and seamlessly resume the streaming backup, rather than restarting at the beginning of the file as in SE. This saves considerable resources and backup time when large files are transferred (see also “7.3. Resume and retry interrupted backups” below).
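The rolling buffer idea can be illustrated with a minimal Python sketch. It is not the ESE implementation: the class name, the chunk limit and the notion that the server reports an acknowledged byte offset on reconnect are all assumptions made for illustration.

```python
from collections import deque


class RollingSendBuffer:
    """Keeps the most recently sent chunks so a stream can resume mid-file."""

    def __init__(self, max_chunks: int = 64) -> None:
        # Each entry is (start offset, data); the oldest entries are evicted
        # automatically once max_chunks is reached (hypothetical limit).
        self._chunks = deque(maxlen=max_chunks)
        self.sent_bytes = 0

    def record(self, data: bytes) -> None:
        """Remember a chunk that has just been transmitted."""
        self._chunks.append((self.sent_bytes, data))
        self.sent_bytes += len(data)

    def replay_from(self, acked_offset: int) -> list[bytes]:
        """Return the data sent after acked_offset (assumed server-reported)."""
        if acked_offset == self.sent_bytes:
            return []  # nothing was lost
        if not self._chunks or acked_offset < self._chunks[0][0]:
            raise LookupError("offset no longer buffered; restart the file")
        pending = []
        for offset, data in self._chunks:
            if offset + len(data) > acked_offset:
                # Trim the part of the chunk the server already holds.
                pending.append(data[max(0, acked_offset - offset):])
        return pending
```

After a reconnect, a client built this way would resend only `replay_from(offset)` and then continue streaming, falling back to a full resend of the file only when the offset has already left the buffer.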
1.2. Cache usage of Snapshots
In ESE, the cache of any Snapshot backup is retained on the client machine together with the cache of the last known backup on the Storage Platform. At the start of the next backup, the Client checks with the StorageServer whether the snapshot was imported. If the Backup Account was re-enabled without importing the snapshot, ESE patches against the last backup. If the snapshot was imported, ESE patches against the snapshot cache as normal. In SE, a backup done without importing the snapshot results in many patch failures and forces a full backup thereafter.
2. Hashing method
When SE processes a file, it generates the footprint file (containing a hash for each block) and also a hash of the whole file; the latter is a computationally expensive operation. In ESE, the whole-file hash is calculated over the block hashes instead of over the blocks themselves, so the file is processed faster.
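The following Python sketch illustrates the hash-of-hashes approach. The choice of SHA-256 and the 64 KB block size are assumptions for illustration only; this document does not state which hash function or block size ESE uses.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed block size, for illustration only


def footprint(data: bytes, block_size: int = BLOCK_SIZE) -> tuple[list[bytes], bytes]:
    """Return (per-block hashes, file hash derived from those hashes)."""
    block_hashes = [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]
    # The file-level hash reads only the short digests, never the data again.
    file_hash = hashlib.sha256(b"".join(block_hashes)).digest()
    return block_hashes, file_hash
```

For a 1 GB file with 64 KB blocks, the second hash in this sketch covers roughly 16,000 digests of 32 bytes each (about 512 KB) rather than the full 1 GB.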
3. Indexing engine
The backup index is used to identify files that are new or modified since the previous backup. Whereas SE stored the backup index in the install folder (C:\Program Files), ESE creates a backup index in the Data folder (C:\ProgramData) for each backup. An improved indexing method allows faster scanning during the backup process. Note that no processing is done when no files have been selected for backup.
3.1. Faster file scanning
File scanning and processing are now combined into a single process: there is no separate file scanning pass before compression and patching start. The directory structure is therefore traversed only once, not twice as in SE.
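A single-pass scan can be sketched in Python as follows. The function name, the index layout and the use of file size plus modification time to detect changes are assumptions for illustration; the actual ESE index format is not described in this document.

```python
import os
from typing import Callable


def scan_and_process(root: str,
                     previous_index: dict[str, tuple[int, int]],
                     process: Callable[[str], None]) -> dict[str, tuple[int, int]]:
    """Walk the tree once; hand new or modified files to `process` as found."""
    new_index = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            signature = (info.st_size, info.st_mtime_ns)
            if previous_index.get(path) != signature:
                process(path)  # compress and patch immediately, no second pass
            new_index[path] = signature
    return new_index
```

The returned index serves as `previous_index` for the next run, so unchanged files are recorded but never handed to `process` again.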
3.2. Process multiple volumes simultaneously
Indexes are stored on the client machine for each mount point included in the backup selection. This allows multiple volumes to be processed simultaneously during backup and restore. (For more information, see “10. Disk usage and mount points” below.)
4. Encryption technology on the client
Data is encrypted on the Backup Client with the 256-bit AES Galois/Counter Mode (GCM) encryption algorithm before it is transferred to the Storage Platform. No extra encryption is required on the Storage Platform after each backup.
Note: For ESE to make use of the performance benefits of the AES-GCM encryption algorithm, the CPU of the Backup Client’s machine must support AES-NI.
4.1. Snapshots
The format of the Snapshot backup data of the ESE Backup Client is compatible with the Snapshot export performed by the Storage Platform – both are stored in encrypted form using 256-bit AES (GCM). This means that a Snapshot backup can be used to restore data on a client machine without first being imported into and exported from the Storage Platform. For example:
1. Perform Snapshot backup using ESE.
2. Save Snapshot on mobile storage device.
3. Restore data directly on client machine from mobile storage device using ESE.
4.2. Local Backups
Local backup copies are accumulated as incremental backups on the local disk, based on the retention setting. When local copies are kept, the data is encrypted in the same way as on the Storage Platform, so no data is accessible without the encryption key. A local copy must have a companion backup on the Storage Platform taken at the same time. This ensures that the Storage Platform remains the final authority for secure backups, e.g. in the case of disk corruption on the Client machine. After each backup is taken, previous local copies that fall outside the retention setting are consolidated.
Note: Large backups combined with a high retention setting can have a significant impact on local disk usage.
4.3. HSM
The format of an HSM (Hierarchical Storage Management) backup is compatible with the Snapshot export performed by the Storage Platform. This means that an archived HSM backup can also be used to restore data on a client machine using ESE by importing it as a “Restore Snapshot”.
5. Encryption technology on the Storage Platform
SSL is used to authenticate the data transfer and to create a secure session between the ESE Backup Client and the Storage Platform. Backup data on the Storage Platform is stored in encrypted form using 256-bit AES (GCM). Processing performance on the Storage Platform has increased (since the V8 Peregrine release) because all encryption is done by the Backup Client and no data has to be decrypted before storage.
5.1. Snapshots
In the event that a Snapshot backup was performed on the client machine but never imported on the Storage Platform, the Backup Client will require a full backup as explained in “1.2 Cache usage of Snapshots” above. It is, however, still possible to successfully import such a Snapshot after subsequent backups have been performed without affecting future backups. (Also take note of the impact on cache storage on the client machine mentioned in “1.2”.)
5.2. HSM
(See “4.3 HSM” above)
5.3. Roll-ups
When a roll-up cycle is performed, most of the data is consolidated without being decrypted or decompressed. (Decryption and decompression are needed only when patching is not aligned to 64k boundaries.) This results in much faster roll-ups than in previous versions of the Backup Pro Storage Platform.
5.4. Restores
In SE, a restore of a long patch chain requires the decryption and decompression of the base file and all its patches, the replaying of the patches, and the compression of the final file for transmission. In ESE, very little decompression or decryption is done – in most cases, only the relevant blocks from the base file or patches are transferred in the correct sequence, resulting in significantly less disk and CPU activity. This technique is used in both DynamicRestore and the ESE Client.
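The block-selection idea behind ESE restores can be sketched in Python. This is a simplified model under stated assumptions: blocks are fixed-size and aligned, the file does not change length, and each patch is represented only by the set of block numbers it rewrote. The function name and data layout are hypothetical.

```python
def plan_restore(base_blocks: int, patches: list[set[int]]) -> list[tuple[int, str]]:
    """For each block, name the newest source that holds it.

    `patches` is ordered oldest to newest; each entry is the set of block
    numbers rewritten by that patch.
    """
    plan = []
    for block in range(base_blocks):
        source = "base"
        for number, changed in enumerate(patches, start=1):
            if block in changed:
                source = f"patch{number}"  # a newer patch overrides older ones
        plan.append((block, source))
    return plan
```

With such a plan, each stored block can be sent to the client as is and in file order, so no patch has to be replayed on the server side.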
6. Data integrity
The ESE Backup Client, in conjunction with the Storage Platform, detects when files were not received correctly by the Storage Platform. By making use of the 256-bit AES (GCM) encryption algorithm, the integrity of each block of data is verified before being stored on the Storage Platform. In the unlikely event of backup data being corrupted in transit, it will not be accepted by the Storage Platform. Data corruption is detected early without requiring the original unencrypted copy and without the need for running full integrity checks. (Integrity checks can, however, still be run from the Storage Platform Console to detect corruption on Storage Platform disks.)
For more on data integrity, see the Technical Overview document titled Enterprise Server Edition v17: Data Integrity from End to End (available on the Redstor Partner Portal).
6.1. Full System Backups
If enabled, the ESE Backup Client can perform System State backups, which allow an entire server to be recovered (a Full System Recovery), for example after a disaster.
Note: See Redstor Knowledge Base article 923 on how to perform a Full System Recovery with the InstantData application.
7. Various backup performance implications
7.1. Steady memory usage
The ESE Backup Client does not require additional memory as the number of files to be backed up increases. A default of 512 MB for the Windows service and 1 GB for the user interface is recommended. Because of its efficient memory and disk usage, the ESE Backup Client can process several million files within hours.
7.2. No Missing files
Files removed from the Storage Platform (due to disk corruption, for example) are identified at the start of each backup, in which case they are immediately processed and resent from the ESE Backup Client in the same backup. In SE, they are identified only at the end of the backup, which means they are retransmitted only with the next backup.
7.3. Resume and retry interrupted backups
Because ESE uses a transfer buffer (as mentioned in “1.1. Fast streaming during backup”), interrupted Streaming backups can be resumed. In addition, if a backup is interrupted unexpectedly, ESE resets the retry count as soon as the backup has been successfully resumed. This means that infrequent connection drops do not use up retries as long as the backup can be resumed.
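The retry behaviour can be illustrated with a short Python sketch. The function and parameter names are hypothetical, and the default of three retries is an assumption; the actual retry setting is configured in the Backup Client.

```python
from typing import Callable


def run_with_retries(transfer_step: Callable[[], bool],
                     is_done: Callable[[], bool],
                     max_retries: int = 3) -> bool:
    """Run transfer steps until done; reset the retry count after each success."""
    retries_left = max_retries
    while not is_done():
        if transfer_step():
            retries_left = max_retries  # resumed successfully: budget restored
        else:
            if retries_left == 0:
                return False  # consecutive failures exhausted the budget
            retries_left -= 1
    return True
```

In this model only consecutive failures count against the limit, so a long transfer with occasional, isolated connection drops can still complete.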
7.4. Fast Snapshot backups
Because all data of a Snapshot backup is stored in a single file container, Snapshot backups complete much faster and are also quicker and easier to copy.
7.5. Partial backups are purged
Partial backups can exist on the Storage Platform when Streaming backups are cancelled by the user. Because a cancelled Streaming backup cannot be resumed, these partial backups only take up space unnecessarily. To correct this, the ESE Backup Client sends a “purge” instruction to the Storage Platform whenever a backup is cancelled; this removes the partial backup and frees up disk space.
8. Various restore performance implications
8.1. Resume and retry interrupted restores
If the restore process is interrupted unexpectedly, ESE makes a configured number of attempts (retries) to resume it. As with backups, the retry count is reset once the restore is resumed. Also, files that cannot be restored from the Storage Platform do not abort the restore process.
8.2. Fast Snapshot restores
After a Snapshot backup has been performed, the Backup Account on the Storage Platform is disabled for backups only; restores can still be performed on the client machine. Because all data of the Snapshot to be restored resides in a single container, Snapshot restores are faster.
8.3. Browsing for backed up files
Due to improved indexing compared to SE, the ESE Backup Client displays files from previous backups in the Restore tree much faster when browsing for files to restore.
9. VSS robustness
During backup, ESE takes VSS snapshots per volume. No VSS snapshots are taken of volumes that do not contain any files in the backup selection (in contrast to SE, which creates VSS snapshots for all existing volumes before transferring only the relevant files to the Storage Platform). This saves CPU, disk and memory resources. Additional robustness has been introduced:
• Files previously backed up using VSS are merely skipped if the whole volume or network storage they reside on is unavailable, instead of being removed from backup storage.
• In SE, the VSS snapshot is only taken after file scanning. If a file is deleted or added just before the VSS snapshot is taken, this may lead to errors during the rest of the backup. In ESE, the VSS snapshot is taken before scanning and processing start, ensuring a consistent view of the files and no “missing file” errors.
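Selecting which volumes need a VSS snapshot can be sketched in Python as follows. This is an illustrative simplification that looks only at drive letters; it assumes the selection holds absolute Windows paths and ignores mount points and network locations. The function name is hypothetical.

```python
from pathlib import PureWindowsPath


def volumes_to_snapshot(selection: list[str]) -> list[str]:
    """Return the drive letters that hold at least one selected path."""
    drives = {PureWindowsPath(path).drive.upper() for path in selection}
    drives.discard("")  # ignore relative paths that carry no drive
    return sorted(drives)
```

A selection that touches only C: and E: yields just those two volumes, so a D: volume on the same machine would not be snapshotted.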
10. Disk usage and mount points
10.1. Disk usage of the cache
Historical cache files are removed at the end of each backup process. Only the last backup’s cache is retained, similar to SE (with the exception of Snapshots, as explained in “1.2” above).
10.2. Disk usage of the index
• Because an index is kept per backup, there is a slight increase in disk usage compared to that of the SE Backup Client. However, no indexes have to be downloaded from the Storage Platform for restore purposes.
• Total index size per backup is estimated at 150 MB per one million files. As the number of files to be backed up grows, so will the disk usage of the index.
• As the Storage Platform performs roll-ups of older backups, the indexes for these rolled-up backups are also removed from the client machine.
10.3. Disk usage and mount points
ESE contains several disk usage improvements over SE, as mentioned above. The ESE Backup Client’s simultaneous processing of multiple volumes and its mount point support are further explained below.
The Backup Client uses the backup selection and profiling settings to determine which files and folders will be included in the backup selection. In order for files to be backed up in parallel, the processing sequence is restricted by both their logical and physical location on storage media. The following terms are relevant:
• Storage device: In context of Backup Pro data backups, a physical disk or a UNC network storage location.
• Volume: A single accessible storage area resident on one or more partitions of one or more physical disks (a.k.a. logical drive or a “mount”). A single volume can therefore span multiple physical disks and a single physical disk can contain multiple volumes.
• Drive letter: As one of the methods of making the files on a volume accessible, the Windows operating system assigns a drive letter to each volume e.g. C:, D:, E:, etc.
• Mount point: A specific path in an existing volume that points to another volume.
The dynamics between storage types are as follows:
1. During the backup process, up to four volumes on four different storage devices can be backed up in parallel.
2. However, when two or more volumes share the same storage device, these volumes will be processed consecutively.
3. The same principle applies to volumes mounted within volumes. Where mount points refer to items on volumes that reside on the same storage device, the volumes will also be processed one after the other.
Example scenario
• Volumes C: and E: will be processed simultaneously because they do not share the same physical disks. Therefore D: and the “\\fileserver\share” network share will be processed thereafter.
• Volume “MyDrive” has no drive letter assigned to it but is indirectly included in the backup selection via a mount point from E: and shares the same physical disk. Volumes D: and MyDrive will be backed up simultaneously but E: will be backed up separately from MyDrive.
Note: The physical disk performance depends on the read pattern. A sequential read pattern tends to be faster than a random read pattern. If two volumes on the same disk are processed simultaneously, the read pattern will be more random. This is why ESE processes these volumes sequentially instead.
Also take note of the following limitations:
• Mounted drive images like ISO files are treated as removable disks and will not be backed up.
• Progress bars in the Backup Client are displayed per volume being backed up.
• Cyclical references of mount points between volumes are not valid if included in the backup selection and will result in a failed backup e.g. C: points to D: (via mount point) which points back to C: (via mount point).
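The rules for parallel and consecutive volume processing in this section can be sketched in Python. This is a simplified model, not the ESE scheduler: it groups volumes into rounds, whereas a real scheduler would probably start the next volume on a device as soon as the previous one finishes. The function name and the volume-to-device mapping are hypothetical.

```python
from collections import defaultdict

MAX_PARALLEL = 4  # up to four volumes on four different storage devices


def schedule(volume_to_device: dict[str, str]) -> list[list[str]]:
    """Group volumes into rounds; each round holds at most one volume per device."""
    queues = defaultdict(list)
    for volume, device in volume_to_device.items():
        queues[device].append(volume)

    rounds = []
    while any(queues.values()):
        # Pick up to MAX_PARALLEL devices that still have a volume waiting.
        busy = [dev for dev, vols in queues.items() if vols][:MAX_PARALLEL]
        rounds.append([queues[dev].pop(0) for dev in busy])
    return rounds
```

Assuming, for illustration, that C: and D: share one disk and that E: and MyDrive share another, `schedule({"C:": "disk0", "D:": "disk0", "E:": "disk1", "MyDrive": "disk1"})` places C: and E: in the first round and D: and MyDrive in the second.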
10.4. A note on reparse points/tags
Some items will not be backed up by pre-v16.8 ESE Clients because the files contain unsupported reparse points, identified by a "reparse tag", such as those in symbolic links and offline files.
Note: For more information and a full list of supported reparse tags, see the Redstor Knowledge Base article 585.
Deduplication by Windows: A known limitation
Currently, files with reparse points in your backup selection are not backed up if those files have been deduplicated by the Windows operating system. The reason is that only the reparse tags are backed up, along with the zero-filled content that the deduplication service presents virtually to give the illusion of the correct file size; the chunk containers that hold the actual data are not. However, these chunk containers are backed up when doing a Full System Backup. This requires the setting to be enabled on the ESE Client and the recovery to be done using InstantData’s Full System Recovery method. (See “6.1. Full System Backups” in the “Data integrity” section earlier in this document for more information.)
11. Disk usage of ESE accounts on the SP
11.1. Deduplication & Pre-allocation
• Deduplication: Due to the different storage formats used by Backup Accounts created with ESE (type 2) and with other Backup Clients (SE & DL, type 1), deduplication/Single Instance Storage (SIS) cannot occur across type 1 and type 2 accounts. Deduplication is, however, still performed across Backup Accounts of the same type.
• Pre-allocation: As type 2 data is written to disk on the Storage Platform, data blocks are pre-allocated during backups, roll-ups and snapshot imports to help reduce file fragmentation.
11.2. Long-term storage & Roll-ups
Roll-ups on the Storage Platform enforce data retention and keep disk usage in check. This is particularly useful in regulated environments and where ESE backup accounts have large data sets. (Also see “5.3. Roll-ups” in the section “Encryption technology on the Storage Platform” above.)
Note: Roll-ups differ from the Storage Platform’s current HSM capability in that data is not tiered to lower cost or offsite storage. However, roll-ups can still be configured to adhere to the HSM policy.
How it works
A series of incremental backups is consolidated (rolled up) into a single data set (a roll-up) at scheduled intervals, e.g. monthly. This creates a synthetic full version of each file in the backup, which frees up disk space. The roll-up is then used by the Storage Platform as the base reference for subsequent backups and restores (it remains online). With each roll-up cycle, as newer roll-ups accumulate, older ones are deleted, depending on the retention settings. A sequence of roll-ups and the available online backups per cycle are illustrated below:
Roll-up settings on the Storage Platform Console
The default roll-up retention setting on the Storage Platform is a calendar month; however, the Storage Platform Administrator can alter this retention as needed using the Storage Platform Console application. Retention can be handled in a variety of ways:
a) Per backup: Roll-ups can be set to occur after a certain number of backups have been run (e.g. 5).
b) Per roll-up: The number of roll-ups kept on the Storage Platform can be altered. For example, the previous month’s roll-ups can be retained on the SP rather than deleted from the StorageServer; however, this has storage space implications.
Disk usage projections
Although roll-ups have always been part of the Backup Pro solution, since v16.11 roll-ups for ESE Backup Accounts are stored more efficiently by preventing duplicate data between roll-ups: as with incremental backups, only the differences between roll-ups are kept. The difference in disk usage can be illustrated as follows:
Note: Actual disk usage of roll-ups is difficult to predict because the size of each depends on the nature of the files and the changes they contain. Standard Storage Platform efficiencies such as compression, encryption and deduplication also affect the final size on disk.
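The consolidation and retention behaviour described in section 11.2 can be sketched in Python at file level. This is a deliberately simplified model: each backup is represented as a mapping from file path to a version label, whereas the Storage Platform works on encrypted blocks and patches. The function names are hypothetical.

```python
def roll_up(backups: list[dict[str, str]]) -> dict[str, str]:
    """Fold backups (oldest first) into one set holding the newest version of each file."""
    full = {}
    for backup in backups:
        full.update(backup)  # later backups override earlier versions
    return full


def prune(roll_ups: list[str], keep: int) -> list[str]:
    """Keep only the newest `keep` roll-ups (the list is ordered oldest first)."""
    return roll_ups[-keep:] if keep > 0 else []
```

For example, rolling up a first backup of files a and b with a later incremental that changes only a yields the new version of a alongside the original b, which corresponds to the synthetic full described under "How it works".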
12. Backup Operator mode
Server Edition users will be familiar with the "Backup Operator" mode provided with the Backup Client (which is set when deploying the MSI from the Storage Platform Console). ESE, however, makes use of the appropriate Windows API for performing backups. The Backup Pro service therefore runs as the LocalSystem user (by default), which is within the Backup Operators group and has the permissions needed to access all data required for backups.
13. General Backup Client features
13.1. Automatic updates
The ESE Backup Client stays up to date by sending a scheduled update request to the Storage Platform. This check occurs daily, after the Backup Client Windows service has been restarted, and after each backup.
13.2. Folder structures on the client
ESE follows a Windows-standardised approach to folder structures used by the Backup Client:
• The location of user data i.e. ProgramData. This contains, for example, the Backup Client working folder and the log folders.
• Installation of the application is made to the Program Files folder.
13.3. Useful logging and logging folders
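Due to the large number of files expected to be backed up with ESE, the default logging option is set to “Suppress detail” (as opposed to “All information”). This prevents log files from being unnecessarily inflated. Important logs are contained in the following files:
File | File location | Purpose
Backup.log | [log folder as per Backup Client Options]\ | User-relevant backup information
Backup.log_summary* | [log folder as per Backup Client Options]\ | User-relevant backup information summary*
Restore.log | [log folder as per Backup Client Options]\ | User-relevant restore information
Restore.log_summary* | [log folder as per Backup Client Options]\ | User-relevant restore information summary*
Update.log | [log folder as per Backup Client Options]\ | Auto-update information
backupservice.log | Pre-v17.4 - C:\Program Files\Redstor Backup Pro\Backup Client ESE\ Pre-v17.4 - C:\ProgramData\Redstor Backup Pro\Backup Client ESE\ | All information reported by the Backup Client Windows service
backupservice_errors.log | Pre-v17.4 - C:\Program Files\Redstor Backup Pro\Backup Client ESE\ Pre-v17.4 - C:\ProgramData\Redstor Backup Pro\Backup Client ESE\ | Only errors and warnings
backup-ui.log | Pre-v17.4 - C:\Program Files\Redstor Backup Pro\Backup Client ESE\ Pre-v17.4 - C:\ProgramData\Redstor Backup Pro\Backup Client ESE\ | Technical user-interface information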
*These files are created and updated when clicking the “Summary” button in the Backup Client.
13.4. User notifications
Notifications are presented in the Windows System Tray. Compared to those of the SE and DL Backup Clients, they have been simplified and clarified to avoid ambiguity.