The Worlds of Data Science & Software Development Are Converging


Extracting value from data is a massive undertaking that requires full buy-in and collaboration across teams – and companies are responding by hiring data professionals en masse. According to an IBM report, annual demand for the roles of data scientists, data developers, and data engineers will reach nearly 700,000 openings by 2020.

But the hiring figures alone don’t tell the full story of companies embracing a data-first approach. The rise of data has also reshaped existing roles across industries and continues to do so. Data scientists and developers are working more closely together than ever before – a reflection of the fact that the worlds of software production and data science are converging.

Thanks to emerging tools for working with data delivered through the cloud, data scientists no longer need to be outright experts in software and programming. Using the cloud as a shared platform delivers the right data to the right people at the right time, without the inefficiency of workflows that pass through many hand-overs.

Using cloud services to build a user-friendly and productive collaboration platform means that data scientists can focus on their core skills to derive insights and useful knowledge from data. Using these new tools, data scientists can begin to think of their role as one running parallel to their developer colleagues – both working to deliver business value to end users keen to take advantage of the amount of data being produced.

A marriage between technique and technology

The Technique

Data scientists and developers work on different parts of the same workflow. Data scientists explore the data for new insights, and developers use these insights to automate the workflow and create apps. However, they are both working toward the common goal of delivering well-constructed apps, and that goal is best realised through structured, close collaboration.

The process of app creation involves elements of experimentation. For example, to create an application, data scientists work with raw data, building analytic models to draw useful and applicable insights from it. These insights are fed back to the development team, which translates the resulting data models into functionality for the end user in the programming language best suited to the app. This is a continual process, aimed at producing the most functional app possible.

However, closer collaboration is vital to fully capitalise on the potential of the vast quantities of data now available and to make app creation as efficient as possible. First, this collaboration can be achieved by employing agile working methodologies and using a cloud platform to share data scientist and developer project workspaces. This gives both sides visibility into work in progress and early results, and allows quicker turnaround on feedback and co-creation of the project as it unfolds. Secondly, communication is key. Whilst having the tools to share information effectively is critical, there has to be a concerted effort to open lines of communication and share information and feedback as regularly as possible.

This still leaves the question of how data scientists and developers actually digest complex data using cloud services to ensure end users, whether in business or on a consumer app, receive accurate data as fast as possible.

The Technology

A cloud-ready tool encouraging the quicker delivery of data through collaboration is the Jupyter notebook. Notebooks allow users to write and share code in different languages such as Python, R, Scala and Node.js in one place. Data can be loaded from and saved to any cloud database, cleaned and processed to be used for prediction with machine learning models, and finally the results can be published directly from a notebook as visualisations and APIs. Data sets can be worked with simultaneously – saving time on the traditional feedback loop which requires translation of code into different programming languages, and passing findings back and forth.
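
The load, clean, predict and publish steps described above can be sketched in a single notebook cell. This is only a minimal illustration under stated assumptions: the data set (rainfall against journey time), the column names and every function name are hypothetical, the CSV text stands in for a cloud database, and a hand-written least-squares line stands in for a machine learning model.

```python
import csv
import io
import json

# Hypothetical raw data; in a real notebook this would be read from a cloud database.
RAW_CSV = """rain_mm,journey_min
0,20
2,24
,30
4,28
6,32
8,
"""

def load_rows(text):
    """Load: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows):
    """Clean: drop rows with a missing value and convert the rest to floats."""
    cleaned = []
    for row in rows:
        if all(value.strip() for value in row.values()):
            cleaned.append((float(row["rain_mm"]), float(row["journey_min"])))
    return cleaned

def fit_line(points):
    """Model: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_payload(model, rain_mm):
    """Publish: return the prediction as a JSON string an API could serve."""
    slope, intercept = model
    prediction = round(slope * rain_mm + intercept, 1)
    return json.dumps({"rain_mm": rain_mm, "predicted_journey_min": prediction})

points = clean_rows(load_rows(RAW_CSV))
model = fit_line(points)
print(predict_payload(model, 5))
```

Because every step lives in one shared place, a developer can pick up the JSON payload format directly while the data scientist keeps refining the model behind it.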

PixieDust is the magic open-source ingredient to add to notebooks to speed up the exploration of data. PixieDust allows both data scientists and developers to quickly create data visualisations without any code and publish these as standalone web apps. This means that data becomes accessible to even non-technical end-users. Data presented visually, as opposed to in code and numbers, lends itself more readily to the identification of business opportunities.

Harnessing data to inform business decisions

With the skills of both data scientists and developers combined, and increasingly sophisticated tools such as Jupyter notebooks and PixieDust, the potential for innovation is significantly enhanced. Let’s take one example – weather data.

Weather data can be combined and analysed with many other data sets gathered from a range of sources to inform business decisions. For example, weather influences traffic, so it can be used to build a system that predicts the likelihood of traffic congestion and collisions. Historic weather data can be related to traffic flow, collision and road quality data to build a predictive machine learning model. That model can be published as an API and combined with weather forecast data to build a road safety app that gives authorities insight into how to improve safety on the roads.
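
As a rough sketch of that idea, the snippet below joins historic weather with collision counts by date and then applies what it learned to a forecast. All records, names and numbers are made up for illustration, and a simple per-condition average stands in for the predictive machine learning model; a real system would use far more data and a proper model.

```python
from collections import defaultdict

# Hypothetical historic data, keyed by date.
HISTORIC_WEATHER = {
    "2017-01-01": "rain",
    "2017-01-02": "dry",
    "2017-01-03": "rain",
    "2017-01-04": "snow",
    "2017-01-05": "dry",
    "2017-01-06": "dry",
}
HISTORIC_COLLISIONS = {
    "2017-01-01": 4,
    "2017-01-02": 1,
    "2017-01-03": 6,
    "2017-01-04": 9,
    "2017-01-05": 2,
    "2017-01-06": 0,
}

def train(weather, collisions):
    """Join the two data sets by date and average collisions per weather condition."""
    totals = defaultdict(int)
    days = defaultdict(int)
    for date, condition in weather.items():
        if date in collisions:  # keep only dates present in both data sets
            totals[condition] += collisions[date]
            days[condition] += 1
    return {condition: totals[condition] / days[condition] for condition in days}

def predict(model, forecast):
    """Map each forecast day to its expected collisions (None if the condition is unseen)."""
    return {date: model.get(condition) for date, condition in forecast.items()}

model = train(HISTORIC_WEATHER, HISTORIC_COLLISIONS)
forecast = {"2017-01-07": "snow", "2017-01-08": "dry", "2017-01-09": "fog"}
print(predict(model, forecast))
```

The `predict` function is the piece a developer would wrap in an API for a road safety app, while the data scientist is free to replace `train` with a richer model without changing that interface.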

It is easy to see the potential for creativity and innovation when we make it easy to gain insights and collaborate on ways to use them, exploring the power of data for both consumer and business insights.

Collaboration results in innovation

Cloud adoption continues to rise, and with it comes the power to explore more data and extract value faster. This cloud-facilitated potential is bringing the roles of data scientists and developers closer together. Historically, the two have operated with a significant degree of separation, driven largely by their use of different tools and programming languages. This no longer poses an issue: tools are now available that streamline working processes and encourage more agile working. With this new efficiency, data scientists and developers can deliver creative, task-focused products faster.

Do you have questions?

Come and join me on November 15th at the Budapest BI Forum in Hungary to discuss weather and climate data further. I’ll be demonstrating how Jupyter notebooks and PixieDust can be used. The session will provide a brief overview of the science behind the weather data example above and give you the tools to get started, even if you aren’t a meteorologist. Learn how to connect weather data to other data sources and how to visualize weather and climate data in an interactive weather dashboard embedded in a Python notebook. It will also include examples using weather APIs, maps and machine learning.

Developer Advocate & Data Scientist, IBM
