AI at Scale - IBM Blog

Accelerating AI integration across your enterprise can generate positive business growth, yet 90% of corporate AI initiatives struggle to move beyond the test stage. Organizations are maturing in data science, but many still fail to integrate and scale advanced analytics and AI/ML into everyday, real-time decision making, and so they cannot reap the value of AI. The new world of remote work requires an accelerated digital transformation, and AI/ML can be leveraged to achieve it more quickly. Successful AI initiatives result in more efficient business operations, more compelling customer experiences and more insightful decision-making. Enterprises can capture significant gains across the value chain with AI, but they have to do it right from the very beginning or run the risk of accruing fines, penalties, errors, corrupted results and general distrust from their business users and the market.

Companies that are strategically scaling AI report nearly 3X the return from AI investments compared to companies pursuing siloed proofs of concept.

Scaling AI within the Enterprise:

IBM’s methodology

IBM Services has end-to-end capabilities to drive value from AI. Drive sustainable, enterprise-wide innovation with AI/ML models that are environmentally friendly, actionable, reusable and scalable, rather than one-off science experiments. IBM Services for AI at Scale aims to scale current AI engagements and applications toward an enterprise setup. It consists of multiple pillars that together make up the overall offering:

Vision

We start with a vision to establish and scale trustworthy AI and data as key business strategy components for competitive advantage. We base it on a measurement framework to generate genuine AI value that you and your clients can trust.

Operating Model

We advise and work collaboratively with your team to build a tailored operating model. We understand that each organization is different, and what works for one won’t work for another; for example, one organization may need a federated model where another needs a non-federated one. We then work side by side with you to develop a pipeline of initiatives that produces measurable business value through the harvesting of AI assets by scalable and connected teams.

Data and Platform

We guide your data and technology direction for AI with the ability to migrate and build new AI and ML data-driven applications on a data platform that’s flexible enough to gather, integrate and manage data for multiple use cases, platforms and clouds.

Engineering and Operations

We position AI operations as a key component and critical part of rolling out data science and AI models repeatably, consistently and at scale with four main objectives: engineer, deploy, monitor and trust.

Change Management

We help develop change management for increasing AI adoption rates with minimal risk by establishing active, enterprise-level change management. This approach can identify and address blockers to the ways in which AI can create value for your enterprise.

People and Enablement

We help choose the right skill sets, roles and team setup for the AI organization, all of which are essential to achieving maturity and scalability.

IBM’s approach to AI at Scale implementation

An important detail of IBM Services for AI at Scale is that you don’t have to start over. IBM works with your existing environment: your intelligent automation, your governance and your data management. The client can gain full visibility and control over their workloads, wherever they run, generating real business value. With the goal of minimizing time to deploy, time to value and risk, IBM’s process follows a four-phased approach to AI at scale implementation.

Assess phase (4-6 weeks): Short-term audit, assessment, and planning – to identify gaps in the existing Process, Methods and Tools. Work with the client in a joint collaboration to execute first solutions on the new platform.
Design and Establish (4-6 weeks): Collaboratively build a common framework for building, scaling, and maintaining AI. Set up a framework of scalability with the client based on the existing environment.
Adopt (3-4 months): Co-work to deliver first projects. Pilot 3-5 MVPs on framework to hone it; finalize and set up architecture, processes, program. Work with the client in a joint collaboration to execute first solutions on the new platform. IBM Garage: Co-Create, Co-Execute, Co-Operate.
Scale (ongoing): Set up Scaling Team, manage Machine Learning in production. Provide client with fully managed AI as a service throughout the organization, so the client can focus on the business challenges.

RAD-ML

RAD-ML is IBM’s framework for rapidly accelerating the time to production of data science applications via automation. Supported by the Rapid Asset Development – Machine Learning (RAD-ML) methodology and other IBM assets and accelerators, IBM Services for AI at Scale provides responsible, consistent, yet innovative frameworks to address and harness data science to build repeatable, reusable, scalable, and actionable AI/ML models. IBM’s offering radically reduces the development time of those models and establishes pipelines to accelerate deployment into production, while increasing the efficiency of the client’s data scientists, allowing them to focus on achieving expected business results and on what they do best and enjoy most.

IBM Services for AI at Scale is a “consult-to-operate” service that provides a means to consistently integrate and scale AI/ML PoCs into production, as well as run and manage those AI / ML models over time. Assets developed using the RAD-ML method guidelines can be more easily deployed on scalable machine learning architecture.

RAD-ML is a proven framework for developing scalable ML assets, defining asset readiness across functional and strategic dimensions, and can be used as a starting point for any AI/ML solution if the client doesn’t have any common framework. It can be leveraged for developing standalone data science assets or modules on top of existing solutions. It empowers the creation of machine learning assets that respect the three capabilities (actionable, reusable and scalable) using the following key concepts:

▪ Machine learning assets should be integrated in business processes with proven ROI

▪ Machine learning assets should be flexible to different data contexts and technology investments

▪ Machine learning assets should be based on a robust technology and ops design that can be scaled up

Tooling Considerations for RAD-ML

Each RAD-ML project should be integrated into the preexisting client environment. It should also add the free, open-source RAD-ML accelerators: Brainstem, dash-blocks, and architecture and document templates. To define a suitable and standardized ML Ops architecture, a detailed target component overview needs to be established. The target components will be aligned with the internal infrastructure and tooling setup.

AWS agnostic architecture machine learning pipeline

IBM can implement this common framework on essentially any cloud, including a hybrid multicloud. The following is an example of how IBM can use AWS tools to create a machine learning pipeline.

CodeCommit

AWS CodeCommit takes the place of a conventional Git repository; it is the central place where all of a project’s code is stored.

CodeDeploy/CodeBuild

CodeBuild will run all unit and integration tests and build a tarball from the specified Python sources, which can be deployed into a Docker container later. CodeDeploy will execute a specified deployment scenario, which will, for example, build the Docker container, push it to a Docker image repository and finally load the image in a production setting.
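The tarball step described above can be illustrated with a minimal sketch using only the Python standard library. This is an assumption about what such a build step might look like, not IBM’s or CodeBuild’s actual build script; the function name `build_source_tarball` and the file layout are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def build_source_tarball(source_dir: Path, output_path: Path) -> list[str]:
    """Pack every .py file under source_dir into a gzip-compressed tarball.

    Returns the archive member names, relative to source_dir.
    """
    members = []
    with tarfile.open(output_path, "w:gz") as archive:
        # Sort for a stable member order across builds.
        for path in sorted(source_dir.rglob("*.py")):
            arcname = path.relative_to(source_dir).as_posix()
            archive.add(path, arcname=arcname)
            members.append(arcname)
    return members


if __name__ == "__main__":
    # Hypothetical project layout, created in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        (src / "model").mkdir(parents=True)
        (src / "train.py").write_text("print('training')\n")
        (src / "model" / "net.py").write_text("LAYERS = 3\n")
        print(build_source_tarball(src, Path(tmp) / "sources.tar.gz"))
```

In a real pipeline, a step like this would typically run after the tests pass, and the resulting archive would be copied into the Docker image during the container build.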

AWS ECR

AWS ECR functions as the repository for all docker containers, which are built in the above-mentioned pipeline. It acts as repository for containers just as CodeCommit acts as a repository for config files and source code. This is the point where AWS SageMaker will look for a specified docker image, when a training job is triggered with the respective parameters from the outside.

AWS SageMaker

AWS SageMaker acts as the runtime environment for all training jobs. It can be triggered via an API or Python binding. The user specifies what kind of model is to be run and where the respective input and output data is located. AWS SageMaker will accept Docker images with a predefined entry point containing the training code. However, it is also possible to run a TensorFlow/MXNet/ONNX-defined job there. SageMaker offers a user interface for administration and, as a managed service, can be scaled elastically, so the user can choose from a wide variety of machine types on which to train a specific model. AWS SageMaker can also be used to perform hyperparameter tuning, which can be triggered via the API as well. The tool will automatically select the best-performing combination of hyperparameters. The results from a run can be written directly to S3 or even DynamoDB.

AWS S3

AWS S3 acts as the basic file system for input and output files. Usually S3 is used to store large training data files and can also be used to store serialized models. AWS S3 seamlessly integrates with SageMaker.

AWS DynamoDB

AWS DynamoDB is a key-value based NoSQL database, which is completely managed by AWS and can be scaled on demand. The database can be used to hold the KPIs from a model run, for example to track model performance over time. It is also leveraged to integrate runtime information and performance metadata for a model run. AWS DynamoDB can be seamlessly integrated with QuickSight, which is a data visualization tool offered by AWS.
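A minimal sketch of how the KPIs from a model run might be shaped into a DynamoDB item is shown below. The attribute names (`model_name`, `run_id`, `finished_at`), the KPI names and the table name mentioned afterwards are hypothetical; the conversion to `Decimal` reflects the fact that the boto3 DynamoDB resource API does not accept Python floats.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Attribute names reserved for run identification (hypothetical schema).
RESERVED_KEYS = {"model_name", "run_id", "finished_at"}


def build_kpi_item(model_name: str, run_id: str, kpis: dict, finished_at: datetime) -> dict:
    """Turn the KPIs of one model run into a DynamoDB-ready item."""
    clash = RESERVED_KEYS & kpis.keys()
    if clash:
        raise ValueError(f"KPI names clash with reserved attributes: {sorted(clash)}")
    item = {
        "model_name": model_name,                  # assumed partition key
        "run_id": run_id,                          # assumed sort key
        "finished_at": finished_at.isoformat(),
    }
    for name, value in kpis.items():
        # Go through str() so that 0.93 becomes Decimal("0.93"),
        # not the float's full binary expansion.
        item[name] = Decimal(str(value))
    return item


if __name__ == "__main__":
    item = build_kpi_item(
        model_name="example-model",
        run_id="run-0001",
        kpis={"accuracy": 0.93, "training_seconds": 412},
        finished_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    )
    print(item)
```

With credentials configured, such an item could be written with `boto3.resource("dynamodb").Table("model_runs").put_item(Item=item)`, where `model_runs` is an assumed table name. Querying by `model_name` would then return the KPI history of a model over time.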

AWS Elastic Inference

AWS Elastic Inference attaches GPU-powered inference acceleration to an EC2 or SageMaker instance. Models trained in AWS SageMaker can be hosted on an EI-accelerated instance for prediction. The underlying machine(s) can be scaled on demand.

Developing trustworthy AI

The ethics question is not just a modelling problem but a business problem. 60% of companies see compliance as a barrier to achieving success in applying AI, in part due to a lack of trust and understanding of the system. IBM designed a three-pronged approach to nurture trust, transparency and fairness, so that clients can consistently run, maintain and scale AI while maintaining trust and reducing brand and reputation risk. IBM can assist the client with the culture they need to adopt and safely scale AI, with AI engineering through forensic tools to see inside black-box algorithms, and with the governance to make sure the engineering sticks to the culture. At the center of trustworthy AI is telemetry and forensic tooling, an area where IBM is active in the open-source community and the Linux® Foundation.

IBM Services for AI at Scale is framed around the IBM Research open-source toolkit, AI Fairness 360, and fact sheets. Developers are able to share and receive state-of-the-art code and data sets related to AI bias detection and mitigation. These IBM Research efforts also led us to integrate IBM Watson® OpenScale™, a commercial offering designed to build AI-based solutions for enterprises to detect, manage and mitigate AI bias.
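To make bias detection more concrete, the sketch below computes two common group-fairness measures, disparate impact and statistical parity difference, in plain Python. It is an illustration of the kind of metric that toolkits such as AI Fairness 360 report and does not use the toolkit’s API; the function names and the sample data are invented for the example.

```python
def favorable_rate(outcomes: list[int]) -> float:
    """Share of favorable (1) outcomes in a list of 0/1 decisions."""
    if not outcomes:
        raise ValueError("group has no members")
    return sum(outcomes) / len(outcomes)


def split_by_group(outcomes: list[int], groups: list[str], privileged: str):
    """Separate outcomes into (unprivileged, privileged) lists."""
    priv = [o for o, g in zip(outcomes, groups) if g == privileged]
    unpriv = [o for o, g in zip(outcomes, groups) if g != privileged]
    return unpriv, priv


def disparate_impact(outcomes: list[int], groups: list[str], privileged: str) -> float:
    """Ratio of favorable rates, unprivileged over privileged; 1.0 means parity."""
    unpriv, priv = split_by_group(outcomes, groups, privileged)
    priv_rate = favorable_rate(priv)
    if priv_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return favorable_rate(unpriv) / priv_rate


def statistical_parity_difference(outcomes: list[int], groups: list[str], privileged: str) -> float:
    """Difference of favorable rates, unprivileged minus privileged; 0.0 means parity."""
    unpriv, priv = split_by_group(outcomes, groups, privileged)
    return favorable_rate(unpriv) - favorable_rate(priv)


if __name__ == "__main__":
    # Invented model decisions: 1 = favorable, 0 = unfavorable.
    decisions = [1, 1, 1, 0, 1, 0, 0, 0]
    group_of = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(disparate_impact(decisions, group_of, privileged="a"))
    print(statistical_parity_difference(decisions, group_of, privileged="a"))
```

In this invented sample, group "a" receives a favorable decision three times out of four and group "b" once out of four, so both measures sit well away from parity. A monitoring job could track such values per model run and flag a model for review.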

IBM’s Value Proposition


Rebecca Carroll
