It is no secret that AWS, Azure, and Google, like most cloud providers, charge nothing for data ingress. Putting your data into the cloud costs nothing, but the charges soon add up for data egress, or getting your data back out of the cloud. Unfortunately, when companies build a business case comparing on-premises infrastructure with the cloud, they often forget or underestimate this cost.
These charges have long been one of the biggest reasons some companies are reluctant to move their data into and out of the cloud. Their complexity makes it very difficult to calculate in advance how much money needs to be allocated to manage data effectively. Cloud vendors are transparent about their egress fees, but it is still hard to get a full view of them, and the costs tend to be high.
The spiralling cost of data egress
According to an internal NASA audit carried out in March 2020, the volume of Earth observation data that NASA will need to archive is expected to increase from 32 to 247 petabytes over the next six years. One petabyte of storage is roughly equivalent to 1.5 million CD-ROM discs. The growth is attributed to several high-data-volume missions coming online, such as the NASA-Indian Space Research Organization Synthetic Aperture Radar (NISAR) and the Surface Water and Ocean Topography (SWOT) mission.
The data collected from these missions is being migrated from NASA's 12 Distributed Active Archive Centers (DAACs) to a project known as Earthdata Cloud. In the new paradigm, data storage will move to the cloud, and the DAAC-provided tools and services built on top of the data will be colocated in Earthdata Cloud. This new cloud service is operated by Amazon Web Services (AWS), and the data remains openly available to all scientists. Currently, when end users access and egress data through a DAAC, there is no additional cost to NASA beyond maintaining the existing infrastructure. However, when end users download data from Earthdata Cloud, the agency, not the user, will be charged every time data is egressed.
Transferring the full 247 PB once, at an estimated cost of $0.05/GB, would cost $12.35 million. But data is not transferred just once: it is downloaded repeatedly over time, which may contribute to the audit's finding that the related cloud budget is projected to grow exponentially in the coming years, reaching approximately $30 million annually by 2025. The audit also found that current cost projections may fall short of what future expenses will require, and that cloud adoption may become more expensive and difficult to manage. This creates a risk that scientific data could become less available to researchers if NASA limits data egress to control costs. NASA is still identifying which data sets will migrate to Earthdata Cloud, which suggests it will continue to struggle with its cloud usage projections.
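The arithmetic behind the $12.35 million figure is simple, but writing it out also shows why repeated downloads matter. This is a minimal sketch, assuming decimal units (1 PB = 1,000,000 GB) and a single flat rate of $0.05/GB; real cloud egress pricing is tiered and varies by region, and the `times_downloaded` multiplier is a hypothetical parameter, not a figure from the audit.

```python
# Rough egress-cost arithmetic for the NASA example.
# Assumptions: decimal units and one flat per-GB rate (real pricing is tiered).

GB_PER_PB = 1_000_000  # decimal (SI) petabyte


def egress_cost_usd(petabytes: float, usd_per_gb: float, times_downloaded: int = 1) -> float:
    """Cost of egressing `petabytes` of data `times_downloaded` times at a flat rate."""
    return petabytes * GB_PER_PB * usd_per_gb * times_downloaded


one_off = egress_cost_usd(247, 0.05)
print(f"One full transfer: ${one_off / 1e6:,.2f} million")

# Hypothetical: if every byte were downloaded three times in a year,
# the bill for that year would be three times the one-off figure.
repeated = egress_cost_usd(247, 0.05, times_downloaded=3)
print(f"Three full transfers: ${repeated / 1e6:,.2f} million")
```

Because the cost scales linearly with the number of times the data leaves the cloud, a data set that is popular with researchers can cost several times its one-off transfer price every year.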
This dilemma is not unique to NASA, nor is it unique to large enterprises. In NASA's case, scientists worldwide download the data regularly to improve their modelling, so the costs could mount very quickly. But in this age of advanced analytics, digital twins, and the growing use of artificial intelligence and machine learning, any organisation could find itself in a similar position.
Even if you’re not using petabytes of data, you still need to be careful about data egress costs when planning to expand your business or launch new projects. Data egress costs can quickly get out of control if:
- Your employees work in many locations (different offices or from home) and download data to work on locally
- You’re planning to launch or expand data analytics, or any other data-heavy project
- Your databases or software applications need regular updates that require frequent downloads
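A rough monthly estimate for scenarios like these can be built by multiplying the size of each download by how often it happens. This is a sketch only: every figure in it (50 staff, 2 GB per person per day, and so on) is hypothetical, the $0.05/GB rate is borrowed from the NASA estimate, and the `EgressScenario` class and its fields are illustrative names rather than part of any provider's tooling.

```python
from dataclasses import dataclass


@dataclass
class EgressScenario:
    """One recurring source of egress (hypothetical illustration)."""
    name: str
    gb_per_download: float
    downloads_per_month: float

    def monthly_gb(self) -> float:
        return self.gb_per_download * self.downloads_per_month


def monthly_egress(scenarios: list[EgressScenario], usd_per_gb: float) -> tuple[float, float]:
    """Return (total GB per month, total cost in USD per month) at a flat rate."""
    total_gb = sum(s.monthly_gb() for s in scenarios)
    return total_gb, total_gb * usd_per_gb


# Hypothetical inputs, one per bullet above.
scenarios = [
    # 50 staff each downloading 2 GB a day for 21 working days
    EgressScenario("Distributed workers", 2.0, 50 * 21),
    # A 200 GB analytics extract refreshed roughly weekly
    EgressScenario("Data analytics", 200.0, 4),
    # A 5 GB update downloaded by 30 sites roughly weekly
    EgressScenario("Software and database updates", 5.0, 30 * 4),
]

total_gb, cost = monthly_egress(scenarios, usd_per_gb=0.05)
for s in scenarios:
    print(f"{s.name}: {s.monthly_gb():,.0f} GB/month")
print(f"Total: {total_gb:,.0f} GB/month, about ${cost:,.2f}/month")
```

Each line of the estimate grows in proportion to its inputs, so doubling the number of remote workers or the analytics refresh rate doubles that part of the bill.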
Options for lowering egress costs
Simply cutting back on data egress may look like the obvious way to reduce these fees, but it is not a realistic option. Restricting egress risks disrupting people’s ability to work and stifling productivity if they can’t access the data they need. Instead, efforts to reduce egress costs need to focus on optimising processes and network infrastructure.
There are several ways to lower egress costs on Azure, AWS, and Google Cloud. One is to connect their platforms directly to your private network, as Azure, AWS, and Google Cloud Platform all offer discounts for data egressed over a direct connection. The challenge, however, is how to establish these connections. To connect directly with a hyperscaler, you have two routes you can take.
The first is to install a private Multiprotocol Label Switching (MPLS) or Ethernet connection from your premises to the hyperscaler's on-ramps. This is expensive and time-consuming and, depending on distance, could add latency. The other route is to work with a data centre that interconnects your hardware with its network, which in turn connects directly with the cloud.
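Whether a dedicated connection pays for itself depends on volume: the per-GB discount has to outweigh the fixed monthly cost of the circuit or interconnect. The sketch below finds that break-even point. The rates ($0.08/GB over the internet, $0.03/GB over a direct connection) and the $1,000 monthly fixed cost are hypothetical placeholders, not published prices for Azure, AWS, or Google Cloud; substitute your own provider's figures.

```python
def internet_cost(gb: float, internet_rate: float) -> float:
    """Monthly cost of egressing `gb` over the public internet."""
    return gb * internet_rate


def direct_cost(gb: float, direct_rate: float, fixed_monthly: float) -> float:
    """Monthly cost over a dedicated connection: fixed fee plus a discounted per-GB rate."""
    return fixed_monthly + gb * direct_rate


def break_even_gb(internet_rate: float, direct_rate: float, fixed_monthly: float) -> float:
    """Monthly volume above which the direct connection becomes cheaper."""
    if direct_rate >= internet_rate:
        raise ValueError("direct_rate must be lower than internet_rate")
    return fixed_monthly / (internet_rate - direct_rate)


# Hypothetical prices for illustration only.
INTERNET_RATE = 0.08   # USD per GB
DIRECT_RATE = 0.03     # USD per GB
FIXED_MONTHLY = 1_000  # USD per month for the connection

print(f"Break-even: {break_even_gb(INTERNET_RATE, DIRECT_RATE, FIXED_MONTHLY):,.0f} GB/month")
for gb in (5_000, 10_000, 50_000):
    via_internet = internet_cost(gb, INTERNET_RATE)
    via_direct = direct_cost(gb, DIRECT_RATE, FIXED_MONTHLY)
    cheaper = "direct" if via_direct < via_internet else "internet"
    print(f"{gb:>6,} GB: internet ${via_internet:,.0f}, direct ${via_direct:,.0f} -> {cheaper}")
```

Below the break-even volume the fixed cost dominates and the public internet stays cheaper; above it, the discount wins. The same comparison applies to either route, although the fixed costs of a private MPLS line and a data centre interconnect will differ.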
Making the most of your data
Don’t let the cost of data egress stifle innovation by discouraging people from downloading your data. It may be tempting to leave data in cloud storage to avoid egress charges, but that is a short-sighted strategy. You have collected your data to deliver vital business insights, so you need to use it to its full potential. This is especially important as innovative technologies such as artificial intelligence and machine learning turn data into business value in new and unexpected ways.
According to the Hybrid Cloud Storage Maturity Model 2021 report from S&P Global Market Intelligence, 34 per cent of enterprises said egress costs had affected their use of cloud storage. As a result, some of these companies have repatriated data to their on-premises data centres. At the same time, more have placed data with a private cloud partner offering advanced interconnection services such as Cloud Connect.
Whether you are cloud-first, cloud-always, or cloud where it works, you need easy connectivity that lets you move data from one platform to another without disruption or delay. And when your workflows are spread across multiple clouds, the connections between those cloud environments need to be fast, reliable, and cost-effective. With 4D's Cloud Connect services, you can draw on the strength of our network and the expertise of our Cisco-certified engineers to achieve fast, seamless connectivity. If you’re interested in launching a hybrid cloud solution, or want some advice, get in touch to talk to one of our experts.