This is the first of two blog posts about Ericsson Cloud Storage, written for a broad audience. It is a collective effort by the Accessibility team within the Cloud Product Team: Hans Haenlein, Johan Carlsson, and myself. We started by asking: what is Big Data? It is one of those terms we believe we understand, but we need to look closely to really do so.
Big Data is changing our lives
Over the last ten years, cloud-based data storage services have exploded. Data volumes are growing by up to 60% per year. By 2020, the total data stored and archived worldwide will accumulate to a staggering 40,000 EB. This mind-boggling volume of data – called Big Data – forces us to reconsider the way we design computer storage.
| Storage Units for Big Data | Symbol | Size |
| --- | --- | --- |
| Terabyte | TB | 1 TB = 1,000 GB |
| Petabyte | PB | 1 PB = 1,000 TB |
| Exabyte | EB | 1 EB = 1,000 PB |
| Zettabyte | ZB | 1 ZB = 1,000 EB |
| Yottabyte | YB | 1 YB = 1,000 ZB |
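The decimal scaling in the table can be checked with a few lines of Python. Each unit is a factor of 1,000 larger than the previous one, so the 40,000 EB figure quoted above is the same as 40 ZB:

```python
# Decimal (SI) storage units from the table above, in bytes.
TB = 10**12
PB = 10**3 * TB
EB = 10**3 * PB
ZB = 10**3 * EB
YB = 10**3 * ZB

# The 40,000 EB projected for 2020 equals 40 ZB.
assert 40_000 * EB == 40 * ZB

# 1 YB expressed in 1 TB disks: one trillion drives.
print(YB // TB)  # 1000000000000
```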
There is no longer such a thing as small data. In a 2014 article in InformationWeek, Cleversafe founder Chris Gladwin is quoted as saying:
“A decade ago (in 2004), only 60-70 companies in the world acquired a petabyte or more of new storage each year. There was maybe one organization at that time that was at a 100-petabyte scale, and that was it. Now, 10 years later (in 2014), the number of companies in the world that are deploying a petabyte or more of storage every year is around 7,000.”
A decade from now (in 2024), “when you look at the capacity-optimized segment of enterprise storage – which is the big enterprise storage systems – we’re projecting that zero percent of the market will be systems that are a petabyte or less.”
Here are some metaphors to visualize these mind-boggling volumes of data:
- 1 EB of storage could contain 50,000 years’ worth of DVD-quality video. If we traveled 50,000 years back in time, we would see the Sahara desert as wet and fertile land. At that time, the Later Stone Age had just begun in Africa…
- To store 1 EB, let’s assume 4 million consumer-grade 250 GB hard drives. In a research paper by Google, the Annual Failure Rate (AFR) of consumer-grade disk drives is 8% after 3 years. That is 877 drives failing per day, or 37 drives failing per hour.
- Wikipedia visualizes 1 YB stored on 1 TB disks as “one million city block size data centers, as big as the states of Delaware and Rhode Island combined.”
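The drive-failure arithmetic in the 1 EB example above can be verified directly; the 250 GB capacity and the 8% annual failure rate are the figures quoted in the text, not new measurements:

```python
DRIVE_GB = 250            # consumer-grade drive capacity from the text
EB_IN_GB = 10**9          # 1 EB = 1,000,000,000 GB (decimal units)
AFR = 0.08                # 8% annual failure rate, as quoted

drives = EB_IN_GB // DRIVE_GB   # drives needed to store 1 EB
per_year = drives * AFR         # expected drive failures per year
per_day = per_year / 365
per_hour = per_day / 24

print(drives)            # 4000000
print(round(per_day))    # 877
print(round(per_hour))   # 37
```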
From now on, legacy storage won’t work, because its architecture can’t scale. Relying on replication will drive costs to unmanageable levels. Storage objects will be 1,000x larger than today. Web scale requires a new storage architecture.
Structured data can be easily organized and placed in databases. It accounts for only 20% of the data available worldwide.
Examples of machine-generated structured data include sensory data (GPS readings, manufacturing sensors, medical devices) and point-of-sale data (credit and debit card transactions).
Seamus Keane, Head of Marketing – Converged Cloud at Ericsson, makes a fine point on the similarities and differences between structured and transactional data:
“While transactional data can be structured, this is not necessarily always so. And even if we were to take all transactional data as being structured, there is a much larger set of data out there that is non-transactional. We can’t use the terms interchangeably.”
Although much smaller in volume, structured data functions as a solid foundation for business-critical insights. Without structured data, it is difficult to know where to find the treasured discoveries buried in unstructured data sets.
Unstructured data refers to information that does not have a pre-defined data model or is not organized according to one.
Human-generated unstructured data includes email and text messages, documents, pictures, videos, and slideware, which often deal with opinions or aesthetic judgments; different people will hold different opinions. Social networks and content-driven services are the dominant sources of human-generated unstructured data.
The hundreds of millions of people texting, emailing, or leaving voice mails from their mobile phones create massive sets of unstructured information never available before.
Machine-generated unstructured data originates from man-made machines that produce nonstop streams of data: computer logs, satellite telemetry (espionage or science), industrial sensors, video from security cameras, and medical, seismic, and geophysical sensors.
As Derek Collison, founder and CTO of Apcera, predicts in his blog post Takeaways from TED 2015:
“…massive amounts of data exist in forms that appear static, but now we’re leveraging massive computing resources and the cloud to turn this data into amazing information that enables not only humans – but machines – to learn. In fact, thanks to this data explosion, machine learning will advance so rapidly that in just five years what we’re doing now to analyze and leverage data could look as antiquated as a rotary dial telephone.”
Questions and Wishes
Every company wants to be self-sufficient in the Big Data business. But where do they start? Here are some questions and wishes that our accessibility group inferred from interviews and Internet research:
“How do I store all this data? What data do I keep active, and what do I archive?” (They call this “triage.”) “How do I move data around without breaking the law? How do we keep multiple clouds compliant, under multiple jurisdictions, in different countries, with ever-changing compliance rules?”
“I tried the public cloud storage services from third-party vendors; they do not have geographical fencing or Governance, Risk-Management and Compliance (GRC). Also, we are not sure about their long-term pricing.”
“Why are we feeding their margin, and not our company’s margin?”
Enterprises want to deploy storage at will and be able to scale it up to at least 100 PB. They want built-in governance, risk management and compliance (GRC), automated as part of the system. They need to audit in-house to avoid surprises from external auditors.
Active content should always be on and accessible in 100 milliseconds or less. We hear open questions like “What is secure enough?”, “Comply with what?”, and “At what cost?”
Enterprises want disruptive economics and savings. They want to make money, not spend unnecessarily. Above all, they simply want to own and operate cloud storage and to control in-house the tools used to speed up management decisions.
Ericsson calls ease of use Accessibility.
No matter how powerful a cloud storage solution is, if it ties up the customer’s smartest people and consumes inordinate amounts of money, it will fail. Enterprise UX is a catch-all term for work done on internal tools – software used by employees, not consumers. It is just as important for a company’s employees to be able to use their software tools as it is for external users; otherwise, productivity will not improve. When an employee uses software because “she has to,” not because “she likes it,” there is demotivation. Research shows that 60% of our decision-making is emotional.
Many traditional big organizations have been defined by engineering and business thinking. Any design was either incidental or unintentional. Now, those companies are waking up to the value of solid design. They shed excess and are building better, leaner, and more human organizations.
The time has come to emulate (in any suitable enterprise) the cloud “giants.” Google, Facebook, and Amazon have been playing a different game for over a decade. By increasing both operational and asset efficiency, it is possible for a company to own cloud and storage solutions as “the giants” have, but on a smaller scale.
Stay tuned for Ericsson Cloud Storage part 2…
Cloud Product Team