July 1, 2019 By IBM Blog 6 min read

Exploring the relationship of serverless technology and big data and analytics.

Big data has been around for the past few decades in the traditional form factor of big data systems such as data warehouses. Starting around the year 2000, however, Hadoop helped expand these integrated systems to be more open in terms of the data and analytics that could be supported.

In this lightboarding video, I trace the evolution of big data and analytics and explain how a new trend around serverless technology has made a big impact.

Big Data Explained

Video Transcript

Big data analytics and serverless technology

Hello, this is Torsten Steinbach, Architect here at IBM for Data and Analytics in the Cloud, and today, I’m going to talk to you about serverless technology and how it is applied to big data analytics.


Data warehouses

When we look at big data in the past decades, we can see that there is a traditional form factor of big data systems that has been used for many decades already, and this is the form factor of a data warehouse.


So, this is a highly integrated system—highly optimized for handling big data queries, big data analytics in a very efficient manner.

Hadoop

Nevertheless, around the year 2000, we had Hadoop coming up, being adopted very rapidly, and gaining a lot of popularity across the industry.


So why is it that Hadoop came up, even though there was already big data analytics? This is because it brought, in addition to this integrated system, more openness to the table: more openness in terms of the type of data that it could handle (data formats, bring-your-own data formats) and the types of analytics, analytics libraries, and languages that can be supported. And also flexibility in terms of the hardware and the deployment options that you can have. You can bring your custom hardware, even heterogeneous hardware.

So, that’s why Hadoop basically gained a lot of traction and is now widely adopted.

The rise of cloud and big data analytics

Today, however, we are seeing a trend that results in yet another form factor of doing big data analytics, and this trend is driven by one thing that is happening: the rise of cloud.


Consumer behavior and the sharing economy

And another thing that actually goes hand in hand a little bit with the rise of cloud is the consumption behavior of many people, of end users, being more oriented toward the sharing economy. So, people are using more and more ride shares instead of renting a car, not to speak of buying a car, just to get around. Or they are just going with Airbnb to sleep a night somewhere.

Serverless as the sharing economy for IT

So, this consumer behavior is also applied now to IT. And this term serverless is actually explained as this: serverless is, in fact, the sharing economy for IT. And it is enabled by cloud.

And serverless is, in fact, the most consistent usage model of cloud.


Functions-as-a-Service (FaaS)

Many of you have heard the term serverless, and probably most of you will associate a thing called Functions-as-a-Service with serverless. Many of you may think the two are synonymous, which is not exactly true, but that is basically what many people think of.


Functions-as-a-Service is: I have my code that I need to run (my business logic), but I don’t provision dedicated systems, dedicated hardware, or even dedicated software; I’m just sending it to the server and saying, please run it for me. Run it for me maybe that many times.

So, how to scale out is figured out for me, and it’s all done ad hoc. It’s basically hiding the fact that there are servers. That’s why it’s called serverless.
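
As a rough illustration of the idea described above, here is a minimal Python sketch. The handler name `main`, its `params` dictionary, and the helper `invoke_many` are assumptions made for this example, not any specific provider's API; a local thread pool merely stands in for the platform that would decide how to scale out.

```python
from concurrent.futures import ThreadPoolExecutor


def main(params):
    """Stateless business logic: everything it needs arrives in params."""
    name = params.get("name", "world")
    return {"greeting": f"Hello, {name}!"}


def invoke_many(handler, payloads, max_workers=4):
    """Hypothetical stand-in for the platform: run the handler once per payload.

    A real service would choose where and how widely to run the code;
    a local thread pool only imitates that behavior here.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handler, payloads))


results = invoke_many(main, [{"name": "Ada"}, {"name": "Grace"}, {}])
print(results)
```

The point of the sketch is that the caller hands over only code and inputs. Nothing in `main` refers to a machine, a process, or a number of instances.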

Big data and analytics

Now, as I said, this is what many people think of when they hear the term serverless, but serverless is more than just Functions-as-a-Service. Especially when we now look back again at our domain here, which is data: big data and analytics.

The problem with big data analytics is that we are talking about state. State has to be kept: my data has to be kept safely, durably, and reliably. I need to be able to access it anytime I want. And that’s what these systems provide.

Data storage in the cloud

But now in the cloud, we have new options. We can actually abstract the storage of data itself as a cloud service on its own.

That’s also what’s happening on the cloud: there is, basically, cloud-native storage in the form of object storage.


Object storage is, basically, serverless storage because you do not provision disk volumes, you do not configure disk volumes—you just bring your data and the system figures out how to store it and how to distribute it to make it highly available and so on.

It’s highly abstracted—you just have a REST API where you upload and download your data. You can come with kilobytes of data, going up to terabytes of data, in the same organizational unit.

Pay-as-you-go consumption model

And to think about why it is serverless: it also has a pay-as-you-go consumption model. You don’t just use it as you go, you also pay as you go, which means you’re paying only for the gigabytes that you’re storing right now. And if you store less, you will be paying less, in a completely seamless, elastic way.
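
The pay-as-you-go model can be sketched as simple arithmetic. The price constant below is a made-up number chosen only for illustration, and the function name is hypothetical; actual object storage pricing varies by provider and tier.

```python
HYPOTHETICAL_PRICE_PER_GB_MONTH = 0.02  # made-up rate, for illustration only


def monthly_storage_cost(gb_stored, price_per_gb_month=HYPOTHETICAL_PRICE_PER_GB_MONTH):
    """Cost is proportional to what is stored; nothing is provisioned up front."""
    if gb_stored < 0:
        raise ValueError("gb_stored must be non-negative")
    return gb_stored * price_per_gb_month


# Storing less means paying less, down to zero when nothing is stored.
for gb in (500, 200, 0):
    print(f"{gb} GB stored -> {monthly_storage_cost(gb):.2f} per month")
```

There is no fixed term in the formula for a provisioned disk volume, which is what distinguishes this model from paying for capacity reserved in advance.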


Analyzing and processing data

Now, when we talk about big data analytics, it’s not just about storage of data but also about how we can analyze and process this data. And that’s exactly what we are now seeing as well, driven by cloud: additional services are being made available around object storage, such as SQL-as-a-Service. It allows you to run SQL on the data in object storage and be billed just for this one SQL query, depending on how big it was in terms of the data it had to scan. You do not pay for a database that is provisioned and standing around; you pay for just a single SQL query and that’s it.
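
Billing by data scanned can be sketched in the same way. The rate below is a made-up number, and the function names are hypothetical; the sketch only shows the shape of the model, where each query is charged on its own and no charge accrues between queries.

```python
HYPOTHETICAL_PRICE_PER_TB_SCANNED = 5.0  # made-up rate, for illustration only
BYTES_PER_TB = 1024 ** 4


def query_cost(bytes_scanned, price_per_tb=HYPOTHETICAL_PRICE_PER_TB_SCANNED):
    """Charge for one query, based only on how much data it had to scan."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def total_cost(bytes_scanned_per_query):
    """Sum of per-query charges; idle time between queries costs nothing."""
    return sum(query_cost(b) for b in bytes_scanned_per_query)


# Two queries: one scanning 2 TB, one scanning half a terabyte.
scans = [2 * BYTES_PER_TB, BYTES_PER_TB // 2]
print(f"total: {total_cost(scans):.2f}")
```

Under this model, a query that scans less data (for example, because it reads fewer objects or columns) costs less, and a period with no queries costs nothing.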

And there are other things that play in, for instance Messaging-as-a-Service (Kafka-as-a-Service), where you are just paying by the number of messages being processed and then eventually stored to the object storage.


Complementary big data and analytics form factors

So there’s a series of these services coming up, and, in combination, they provide this new form factor of a big data and analytics system that augments and complements the existing form factors. Even though those are more established and older, there is still a point in using them. They have their sweet spots in terms of their performance characteristics and response time guarantees, but, on the other side, there may be cost-effectiveness benefits here.

So, depending on your business model and requirements, you may use this or this or the combination of those things.

So, I hope this helps to put in perspective how serverless plays into big data analytics and how it basically generates a whole new form factor for big data and analytics systems.
