Vinchin CEO Dr. Hu Speech | Trends of the Data Backup Technology in Hybrid Cloud
2020 INSEC WORLD · Chengdu came to a successful close on November 27 at the Western China International Expo City. The information security conference, based in Chengdu and connecting the world, was organized by Informa Markets, the world's largest exhibition organizer, and jointly undertaken by Informa Markets China (Chengdu) Co., Ltd. and Sichuan Tianfu International Conference & Exhibition Co., Ltd. The four-day conference comprised two days of advanced training, two days of keynotes, six tracks, and technology exhibitions, and attracted more than 2,000 attendees, a global online audience of 700,000, more than 40 exhibitors, and coverage from nearly 100 media outlets.
Vinchin CEO Dr. Hu, a professor at Sichuan University, delivered a speech titled "Trends of the Data Backup Technology in Hybrid Cloud" at the Data Security and Cloud Security forum of 2020 INSEC WORLD · Chengdu. As cloud computing continues to develop and spread, data backup technology and the cloud are becoming ever more closely combined. New scenarios keep emerging, and cloud-based backup technology must keep developing. Dr. Hu presented Vinchin's archive backup, disaster backup and recovery, and other technologies, which aim to improve security and ease of use and to bring more value to users.
The speech covered four parts: "the challenge and state of the hybrid cloud", "the data backup technology under the hybrid cloud", "the typical application scenarios" and "Vinchin application and customer story".
The challenge and state of the hybrid cloud
According to an IDC report, as of the first quarter of 2020 the public cloud market had reached about 3.9 billion dollars, a year-on-year growth of 60%, which is a very promising growth rate. The figure is estimated to reach 120 billion dollars by the end of the year.
The trend shows that public cloud adoption has made great progress, even under the serious impact of COVID-19.
Among China's domestic cloud vendors, Alibaba, Tencent and Huawei remain the top three. Although Huawei Cloud started very late, its strong relationships with government and enterprise customers have helped it grow quickly. Huawei Cloud has also used hybrid cloud to great advantage to expand its public cloud market, which is another reason it has developed so fast.
In 2019, private cloud users accounted for only 38% of the large-business and traditional government-enterprise market. According to the Development Research Center, the figure will rise to over 60% by 2023.
Another survey by RightScale shows that in 2019, 58% of enterprise customers were planning to deploy hybrid cloud services, up 7% from 2018.
In fact, as the public cloud has come into wide use this year, a series of data security issues has inevitably surfaced.
For example, on February 1st, 2017, GitLab, a code-hosting company, lost about 300GB of production data that was accidentally deleted during maintenance, causing serious losses.
On September 29th, 2019, a Microsoft Azure data center suffered a 7-hour service interruption because its fire suppression system was accidentally triggered, an incident that became well known in the industry.
On July 18th, 2018, AWS suffered a serious technical failure that lasted 6 hours.
Back in China, on March 2nd, 2019, Alibaba Cloud's Beijing data center also suffered about 6 hours of downtime caused by an IO error, and had to compensate its customers for breaching its SLA, which promises 99.99% availability per year with a maximum of 4 hours of downtime. The outage affected network services across the whole of northern China.
During the COVID-19 outbreak, a maintenance engineer at Weimob, a company focused mainly on WeChat-based business with a large number of users, deliberately deleted its database, and the company had to spend several days recovering its data and services. As a result, 2 billion yuan of market value was lost.
So, it’s clear that public cloud data safety is in fact not as perfect as we think.
In practice, ransomware is another serious threat. Many large companies around the world, including some in China, have been affected by it. It launches indiscriminate attacks to extort real money, such as USD, RMB or Bitcoin, and you can be hit at any time.
Therefore, data backup and disaster recovery are the last line of defense for data security and play a life-saving role.
But before setting out to protect data, you must first figure out where it is stored. For data in private clouds, we know perfectly well that it is in our own data center.
But what about public clouds?
If we purchase cloud resources in Beijing on Alibaba Cloud, which rack or server is our data on?
We don't know, let alone whether it will later be moved to another row. All we can say is that it is somewhere with the cloud service provider.
In fact, this mode of operation carries some security risks. Therefore, under the basic requirements of the 2nd and 3rd levels in Cyber Security Level Protection 2.0 (*), users are required to keep a local copy of the data they store in the public cloud.
Many people may ask what kinds of users fall under the 2nd and 3rd levels. Based on the current understanding of laws and regulations, most institutions in industries that bear on the national economy and people's livelihood, as well as radio and television, education, and medical care, will be covered.
Therefore, if we use the public cloud, we must back up the data to a local site. Such regulations thus bring some new application scenarios.
(*) Information security technology - Baseline for classified protection of cybersecurity
The data backup technology under the hybrid cloud
This is the ideal application scenario of hybrid cloud.
The picture lists some private clouds on the left and public clouds on the right. In private clouds, we deploy sensitive or confidential data and applications; in public ones, we deploy applications that need fast-growing computing power or high elasticity. In that way, the advantages of both clouds can be brought into play.
But another problem arises: how do we migrate and synchronize data back and forth between the public and private clouds in a timely manner? As a result, the application scenarios of data management and backup become more complicated.
The first step is to put the application on the cloud. If you want to run a brand-new system, just purchase a VPS and you can easily install the program on the cloud. In reality, however, many companies and institutions already have existing data, which makes the task more challenging. One option is to replace the old operating system with a new one and then migrate the data.
Alternatively, we can migrate the operating system and data to the cloud together. In that case, we need to check many compatibility issues, so cloud migration becomes a complex matter.
If you want the cloud migration to be completed in a very short time, professional continuous data protection technology for databases and files must be applied.
Once all data and applications have been migrated, in an ideal state the public and private clouds can switch services smoothly and synchronize data with each other. A disaster recovery center is then placed in the same city as the private cloud to form, together with the public cloud, a typical 2-location-3-center structure.
In this way, the data and applications in both the private and public clouds are protected and can work with each other. In addition, the data of both clouds is backed up to the disaster recovery center to achieve data-level disaster recovery, which is quite an ideal scenario.
What technologies are needed?
The first is private cloud agentless backup. The virtual machine can be fully backed up to the backup system directly through the hypervisor, with no need to install any agent in the operating system, which reduces maintenance and system overhead. This technology is already widely used in private clouds.
The second is public cloud agentless backup. Through the Cloud OS interface, public cloud data is backed up completely, either offline or to the backup system. To do so, we add a backup appliance to every public cloud host, which automatically connects to the public cloud through the Cloud OS. The backup data is cached in the backup appliance and eventually stored in the backup system. Data recovery is simply the reverse process.
With these two agentless backup technologies, the operating system and its complete data are captured together, and the recovery process becomes faster.
Since we focus on virtualization backup, the amount of data in each backup is relatively large because operating system data is included, so it is difficult to run a traditional full backup once a week. To solve this, we use forever-incremental technology: a full backup is taken only once, and all subsequent restore points are incremental, which reduces the storage occupied by backup data.
During the process, it is also very important to run automatic data merging, which merges old incremental data into the full backup restore point and thereby frees up backup storage.
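As an illustration only, the forever-incremental chain and the merging step can be sketched in a few lines of Python. The names RestorePoint, restore and merge_oldest are hypothetical, and blocks are modeled as a simple in-memory dictionary; this is a minimal sketch under those assumptions, not a description of how any particular product stores its restore points.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RestorePoint:
    """One backup in the chain (hypothetical model): block index -> block contents."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)


def restore(chain: list[RestorePoint], upto: int) -> dict[int, bytes]:
    """Rebuild the disk image as of chain[upto] by layering increments on the full backup."""
    image: dict[int, bytes] = {}
    for point in chain[: upto + 1]:
        image.update(point.blocks)  # newer blocks overwrite older ones
    return image


def merge_oldest(chain: list[RestorePoint]) -> None:
    """Fold the oldest incremental into the full backup, shortening the chain by one."""
    if len(chain) < 2:
        return
    full = chain[0]
    oldest_inc = chain.pop(1)
    full.blocks.update(oldest_inc.blocks)  # superseded blocks in the full backup are dropped
    full.name = oldest_inc.name  # the full backup now represents that later point in time


# One full backup followed by two incrementals that each changed a single block.
chain = [
    RestorePoint("full-day0", {0: b"A0", 1: b"B0", 2: b"C0"}),
    RestorePoint("inc-day1", {1: b"B1"}),
    RestorePoint("inc-day2", {2: b"C2"}),
]
latest_before = restore(chain, len(chain) - 1)
merge_oldest(chain)
latest_after = restore(chain, len(chain) - 1)
```

After the merge the chain holds two restore points instead of three, the superseded block B0 no longer occupies storage, and the latest restore point still rebuilds to the same image.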
We also need to exclude data that has no backup value, such as swap files, partition gaps, and even deleted files, like a deleted movie, whose data still sits on the disk.
When we run a virtualization backup, we locate such data through the partition table and the file system mapping and exclude it, thereby improving backup speed.
Since we are backing up data in a hybrid cloud scenario, some users do not have enough bandwidth, so we need to accelerate transfers over the WAN.
A technology similar to deduplication retrieves or computes the hash value of the target data when a backup runs and compares it with the data already on the backup system. If they are identical, the data is not sent over the WAN, which can increase backup speed by dozens of times, greatly improving backup efficiency and reducing bandwidth pressure.
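The hash comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions: the BackupStore class and the backup and restore_data functions are hypothetical, SHA-256 is an assumed choice (the speech only says "hash value"), and the 4-byte block size is chosen purely to keep the example readable.

```python
import hashlib


class BackupStore:
    """Stand-in for the remote backup system (hypothetical), indexed by block hash."""

    def __init__(self):
        self.blocks = {}

    def has(self, digest):
        return digest in self.blocks

    def put(self, digest, block):
        self.blocks[digest] = block


def backup(data, store, block_size=4):
    """Send only blocks whose hash is unknown to the store.

    Returns (recipe, bytes_sent): the ordered list of block hashes needed to
    rebuild the data, and how many payload bytes actually crossed the WAN.
    """
    recipe = []
    bytes_sent = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if not store.has(digest):  # identical data already exists: skip the upload
            store.put(digest, block)
            bytes_sent += len(block)
        recipe.append(digest)
    return recipe, bytes_sent


def restore_data(recipe, store):
    """Rebuild the original data from its ordered list of block hashes."""
    return b"".join(store.blocks[digest] for digest in recipe)


store = BackupStore()
first = b"AAAABBBBCCCC"
second = b"AAAABBBBDDDD"  # only the last block differs from the first backup
recipe1, sent1 = backup(first, store)
recipe2, sent2 = backup(second, store)
```

In this toy run the first backup sends all 12 bytes, while the second sends only the 4 bytes of the changed block, since the other two blocks are already on the backup system.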
Having introduced some backup technologies, let's move on to instant recovery, a technology that is only available for clouds.
This recovery technology abandons the traditional method of copying the backup data back to the production system. Instead, it runs the virtual machine directly by sharing the backup data with the production system. Starting the virtual machine generally takes a little over 10 seconds, and within about 1 minute it can take over the whole service system. We do not have to consider whether the restore point is a full or an incremental backup, which greatly reduces recovery time and improves efficiency.
With traditional backup solutions, we have always wondered whether the backup data is usable or damaged. This is a problem that worries all of us. Now, through automatic verification, we can restore the virtual machine into a logically isolated network and let the applications in the system start running again.
With the help of scripts, MD5 calculations and various other methods, we can detect whether the data in the system is complete and whether the application can run. In this way, we know clearly whether the backup data can be used for a final recovery. If it is confirmed to be damaged, we send an SMS and an email to the user to report the potential risk.
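The MD5 part of such a check could look roughly like the sketch below. The function names and the in-memory "files" are assumptions made for illustration; a real verification job would read from the restored virtual machine and would also test whether the application starts. MD5 serves here only to detect accidental corruption, not as a security measure.

```python
from __future__ import annotations

import hashlib


def md5_of(data: bytes, chunk_size: int = 1 << 20) -> str:
    """Compute an MD5 checksum in chunks, as one would for a large file."""
    digest = hashlib.md5()
    for start in range(0, len(data), chunk_size):
        digest.update(data[start:start + chunk_size])
    return digest.hexdigest()


def verify_restore(recorded: dict[str, str], restored: dict[str, bytes]) -> list[str]:
    """Return the names of files that are missing or whose checksum has changed."""
    damaged = []
    for name, expected in recorded.items():
        content = restored.get(name)
        if content is None or md5_of(content) != expected:
            damaged.append(name)
    return damaged


# Checksums recorded at backup time (hypothetical file names and contents).
source_files = {"db.dat": b"customer records", "app.cfg": b"port=8080"}
recorded = {name: md5_of(content) for name, content in source_files.items()}

# One clean restore and one in which db.dat came back corrupted.
restored_ok = dict(source_files)
restored_bad = {"db.dat": b"customer rec\x00rds", "app.cfg": b"port=8080"}

clean_report = verify_restore(recorded, restored_ok)
damage_report = verify_restore(recorded, restored_bad)
```

A non-empty report is the point at which a verification job would raise the SMS or email alert mentioned above.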
Sometimes the virtual machine is still running even though its file system has been corrupted, and the error appears only on restart. Automatic verification can readily detect such problems.
In addition to backup and recovery technology, we are also concerned with disaster recovery, where we hope to recover data and applications in a very short time.
Let's look at the diagram: on the left is a production system, and on the right is a set of backup systems and some applications. We capture changes from the hypervisor quickly and write them into the mirrored volume in real time. At the same time, all changes are recorded in order in the log volume, like data blocks 1 to 8 shown in the example.
When the production system is physically damaged by a fire or flood, the data in the mirrored volume can be used. If a logical error occurs, you can use the log volume to roll the data back to the point in time you need.
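The interplay of the mirrored volume and the log volume can be modeled with a small Python sketch. The ReplicatedVolume class is hypothetical and assumes the volume starts empty, so that replaying the log from the beginning reproduces any earlier state; it is a toy model of the idea, not of an actual replication engine.

```python
class ReplicatedVolume:
    """Toy model (hypothetical): a mirror holding the latest state plus an ordered write log."""

    def __init__(self):
        self.mirror = {}  # block index -> most recent data
        self.log = []     # ordered entries of (sequence number, block index, data)

    def write(self, block, data):
        """Apply a captured change to the mirror and append it to the log."""
        seq = len(self.log) + 1
        self.log.append((seq, block, data))
        self.mirror[block] = data
        return seq

    def latest(self):
        """State to use after physical damage: the mirror as it stands now."""
        return dict(self.mirror)

    def rollback_to(self, seq):
        """State to use after a logical error: replay the log up to and including seq."""
        image = {}
        for entry_seq, block, data in self.log:
            if entry_seq > seq:
                break
            image[block] = data
        return image


volume = ReplicatedVolume()
for i in range(1, 9):  # eight writes, echoing data blocks 1 to 8 in the diagram
    volume.write(i % 3, f"v{i}".encode())

current_state = volume.latest()
state_after_write_5 = volume.rollback_to(5)
```

If write 6 turned out to be a logical error, rolling back to sequence 5 yields the data as it was just before that write, while the mirror still reflects all eight writes.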
This is the process of virtualization disaster recovery, which is meaningful for the development of cloud platforms.
Now, let me show you the conversion of workloads between different clouds.
Cloud architecture varies from platform to platform. For example, if we want to run AWS-based applications on Huawei Cloud, is there any way to achieve it? For reasons such as budgets and policies, applications sometimes must be migrated, and V2V (virtual-to-virtual) conversion is now used to convert the virtual machine format and replace the drivers so that the machine can run on the new cloud platform.
This kind of migration has therefore become very important.
With so many complicated technologies and applications involved, management becomes crucial as well. That is why the backup system uses web-based management, which allows IT administrators to complete a full operation within ten minutes. The backup system also has built-in multi-language support, including Chinese, English, German, Czech, Slovak and more.
We provide a big-screen module that clearly shows the status of the system. It helps system maintainers quickly find possible risks, and it lets supervisors see at a glance the results of their investment in disaster recovery and backup.
The typical application scenarios
A backup example of private cloud
This shows a backup example of private cloud, which is widely used in industrial enterprises, hospitals and schools. On cloud platforms such as VMware or Huawei Cloud, the virtual machines and applications are backed up centrally.
A backup example of the public cloud
This is a backup example of the public cloud. We back up the workloads on the public cloud to a local site through legacy database backup, virtual machine backup or database CDP (continuous data protection).
This method complies with the mandatory requirements of Cyber Security Level Protection 2.0 (*) and also keeps a local copy of the data. The aim is for the local site to take over application operation quickly, so that the two data centers together achieve business continuity, which is a relatively ideal scheme.
Back up local data and applications to the public cloud
Another solution is to back up local data and applications to the public cloud. This packages the entire virtual machine and database in the private cloud, sends them to Alibaba Cloud or another cloud, and lets the application run there.
Archive the backup data to Cloud
Users with limited budgets who still want offsite backups can archive their backup data to Alibaba Cloud or AWS. Archiving is very cheap: 1GB costs only about one or two dimes a month, so this is also a low-cost solution.
Applications on government clouds
There are also applications on government clouds. A typical government cloud uses thousands of virtual machines. How can backup be implemented in such a large environment?
We first deploy multiple backup nodes in the environment so that all critical government data can be distributed and backed up to different storage through those nodes. All nodes are centrally managed from a single web console on the backup server, which makes management more effective.
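The speech does not say how virtual machines are divided among the backup nodes. One plausible approach, shown here purely as an assumption, is a greedy least-loaded assignment: the largest virtual machines are placed first, each on whichever node currently carries the least data. The function name, VM names and sizes are all hypothetical.

```python
from __future__ import annotations

import heapq


def assign_vms(vm_sizes_gb: dict[str, int], nodes: list[str]) -> dict[str, list[str]]:
    """Greedy least-loaded assignment of virtual machines to backup nodes (assumed policy)."""
    heap = [(0, node) for node in nodes]  # entries of (assigned GB, node name)
    heapq.heapify(heap)
    plan: dict[str, list[str]] = {node: [] for node in nodes}
    # Place the largest VMs first so that smaller ones can even out the load.
    for vm, size in sorted(vm_sizes_gb.items(), key=lambda item: item[1], reverse=True):
        load, node = heapq.heappop(heap)  # node with the least data so far
        plan[node].append(vm)
        heapq.heappush(heap, (load + size, node))
    return plan


vms = {"vm-a": 500, "vm-b": 300, "vm-c": 200, "vm-d": 100}
plan = assign_vms(vms, ["node-1", "node-2"])
```

With these sample sizes, node-1 receives vm-a and vm-d (600GB) and node-2 receives vm-b and vm-c (500GB). A central console would then only need to hold the plan and each node's status.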
Vinchin partner and customer story
We have done a series of adaptations for mainstream cloud vendors such as VMware, Citrix and Red Hat.
In China, we have also completed mutual certifications with domestic cloud vendors including Huawei, Huashan, Inspur, and Sangfor.
Our products also have mutual certifications with some cutting-edge cloud vendors such as XSKY, ZStack and AWCloud.
From this perspective, our product ecosystem is relatively complete.
We also have a range of customer stories, including large government clouds such as Shandong Provincial Government Cloud, Tianjin Municipal Affairs Cloud and Sichuan Suining Municipal Affairs Cloud; energy companies such as PetroChina and Yunnan Energy Investment; first-class universities included in Project 211 and Project 985 in the education industry; and enterprises such as Hanjiang Group and Kelun Group.
We have also done a lot of work overseas, serving thousands of customers, including Leag, the second-largest energy group in Germany; Evora University in Portugal, which needed data backup support for its COVID-19 research; Enbi, a printing manufacturer headquartered in the United States with factories in China and Singapore; Yatrim, a Turkish investment banking arm; Expleo, a VPS service provider operating across Austria, the Czech Republic and Slovakia; and many more.
As today's circumstances accelerate cloud adoption, make sure you don't overlook Intelligent Data Management as a key component of your cloud strategy. Lock your data down with Vinchin.
More information and video from the event will follow in upcoming news.