The outcome? IBM's Cloud Pak for Data and its Watson Knowledge Catalog solution proved to be the most scalable. “In particular, most of the solutions we reviewed had problems: some were limited to handling metadata only, some required us to purchase additional products for data lineage, and with others the data needed processing and could not be used immediately. Only IBM's Cloud Pak for Data had none of these issues,” said Sanghee Han, project lead, Samsung Electro-Mechanics.
An added advantage was the ability to run and scale SPSS® and visualize data on dashboards through Jupyter notebooks with IBM's Cloud Pak for Data. The team at Samsung Electro-Mechanics also built a portal on the front end of the solution to create, add, complement, and control the functions they needed, such as constructing trees from data, drawing data maps, and adding security features. The ability to customize the platform to the company's needs and to use it easily alongside its own systems was another major benefit.
Connections to Impala, SAP HANA, Oracle, MS SQL Server, and other database servers were successful despite initial technical challenges with importing assets and using certain functions. Proactive technical support from IBM Korea helped the Samsung Electro-Mechanics team resolve these issues. Development work began in October 2021, and the first phase opened for testing in December. The project took about six months to complete, with the official rollout on 22 April 2022.
Today, the platform based on IBM Cloud Pak for Data organizes data generated from multiple sources into systematic assets that can be easily shared, searched, and utilized through APIs across the organization. Personal and sensitive information is managed according to Samsung Electro-Mechanics' strict governance principles, and the data assets are continuously updated with newly generated data. The number of these data assets has grown from 500 to more than 2,500 and is still growing.
The greatest benefit of building a data platform with IBM's Cloud Pak for Data is that it gives users self-service functions. Users can quickly find the data they need, process it into the form they want, and use it immediately, instead of asking developers to pull the data, which could take days. As a result, tasks that typically took 30 days can now be completed in 10 days or less.
“Some colleagues said that what used to take a week can now be done within a day. While the time taken depends on the data type, overall employee productivity has improved, and employees are able to derive useful insights from the data. When customers request specific data, our team can find, analyze, and deliver it quickly and efficiently, which has increased the satisfaction of not only our employees but also our customers.
“To top it off, the amount of data from the raw material system that can be used in the data lake has increased significantly, and that system accounts for the largest part of Samsung Electro-Mechanics' data corpus. Through the connected portal, it is now possible to search and use not only structured data and files but also images and unstructured data, which is another significant result of this project,” said Sanghee Han, project lead, Samsung Electro-Mechanics.
Samsung Electro-Mechanics plans to continue expanding the data platform based on the successful outcome of the project. The number of data systems connected to the solution has grown from five or six at the start to more than 20 today. While the team currently uses the platform mainly for DataOps, in cooperation with data scientists, it plans to extend its use to MLOps.
“It was harder than I thought to create it from scratch and set up the process, because this kind of innovation is not something we do every day. But I feel very rewarded, because the users are very satisfied and are using it more effectively than I first expected,” concluded Sanghee Han, project lead, Samsung Electro-Mechanics.