Oracle Announces Oracle Cloud Data Science Platform
Seven new services, including new data catalog, to discover, find, organize, enrich and trace data assets; new big data service delivers a full Cloudera Hadoop implementation; new service provides SQL access to HDFS; new fully managed service to run Apache Spark applications
Oracle recently announced the availability of the Oracle Cloud Data Science Platform. At the core is Oracle Cloud Infrastructure Data Science, helping enterprises to collaboratively build, train, manage and deploy machine learning models to increase the success of data science projects. Unlike other data science products that focus on individual data scientists, Oracle Cloud Infrastructure Data Science helps improve the effectiveness of data science teams with capabilities like shared projects, model catalogs, team security policies, reproducibility and auditability. Oracle Cloud Infrastructure Data Science automatically selects optimal training datasets through AutoML algorithm selection and tuning, model evaluation and model explanation.
Today, organizations realize only a fraction of the enormous transformational potential of data because data science teams don’t have easy access to the right data and tools to build and deploy effective machine learning models. The net result is that models take too long to develop, don’t always meet enterprise requirements for accuracy and robustness and too frequently never make it into production.
“Effective machine learning models are the foundation of successful data science projects, but the volume and variety of data facing enterprises can stall these initiatives before they ever get off the ground,” said Greg Pavlik, senior vice president product development, Oracle Data and AI Services. “With Oracle Cloud Infrastructure Data Science, we’re improving the productivity of individual data scientists by automating their entire workflow and adding strong team support for collaboration to help ensure that data science projects deliver real value to businesses.”
Designed for Data Science Teams and Scientists
Oracle Cloud Infrastructure Data Science includes an automated data science workflow, saving time and reducing errors with the following capabilities:
- AutoML automated algorithm selection and tuning automates the process of running tests against multiple algorithms and hyperparameter configurations. It checks results for accuracy and confirms that the optimal model and configuration are selected for use. This saves significant time for data scientists and, more importantly, is designed to allow every data scientist to achieve the same results as the most experienced practitioners.
- Automated predictive feature selection simplifies feature engineering by automatically identifying key predictive features from larger datasets.
- Model evaluation generates a comprehensive suite of evaluation metrics and suitable visualizations to measure model performance against new data and can rank models over time to enable optimal behavior in production. Model evaluation goes beyond raw performance to take into account expected baseline behavior and uses a cost model so that the different impacts of false positives and false negatives can be fully incorporated.
- Model explanation: Oracle Cloud Infrastructure Data Science provides automated explanation of the relative weighting and importance of the factors that go into generating a prediction. Oracle Cloud Infrastructure Data Science offers the first commercial implementation of model-agnostic explanation. With a fraud detection model, for example, a data scientist can explain which factors are the biggest drivers of fraud so the business can modify processes or implement safeguards.
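As a rough illustration of the ideas in the list above, the following toy sketch tries several candidate algorithms and hyperparameter settings, scores each with a cost model that penalizes false negatives more heavily than false positives, and compares the winner against a naive baseline. It is not Oracle's implementation and uses no Oracle API; the data, the two model families, the cost values and every function name are hypothetical and chosen only to keep the example self-contained.

```python
# Toy data: (transaction amount, is_fraud). Values are invented for illustration.
TRAIN = [(12, 0), (40, 0), (55, 0), (35, 0), (80, 0), (70, 1), (90, 1), (150, 1)]
VALIDATION = [(20, 0), (60, 0), (45, 0), (75, 1), (95, 1), (130, 1)]


def expected_cost(y_true, y_pred, fp_cost=1.0, fn_cost=5.0):
    """Total error cost; a missed fraud (FN) is assumed to cost more than a false alarm (FP)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * fp_cost + fn * fn_cost


def fit_threshold(train, threshold):
    """Family 1: flag any amount at or above a fixed threshold (ignores training data)."""
    return lambda x: int(x >= threshold)


def fit_centroid(train, bias):
    """Family 2: compare distances to the class means; bias > 1 favours flagging fraud."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    return lambda x: int(abs(x - mean_pos) < bias * abs(x - mean_neg))


# Each family is paired with the hyperparameter values to try.
SEARCH_SPACE = {
    "threshold": (fit_threshold, [50, 65, 100]),
    "centroid": (fit_centroid, [0.5, 1.0, 2.0]),
}


def select_model(train, validation):
    """Evaluate every (family, hyperparameter) pair and return the cheapest one."""
    xs = [x for x, _ in validation]
    ys = [y for _, y in validation]
    results = []
    for name, (fit, params) in SEARCH_SPACE.items():
        for param in params:
            model = fit(train, param)
            cost = expected_cost(ys, [model(x) for x in xs])
            results.append((cost, name, param))
    # Baseline behaviour: never flag anything as fraud.
    baseline = expected_cost(ys, [0] * len(ys))
    best = min(results)  # lowest cost; ties broken by family name, then parameter
    return best, baseline, results


best, baseline, results = select_model(TRAIN, VALIDATION)
print("best configuration:", best)
print("baseline cost:", baseline)
```

A real service would search a far larger space and use proper cross-validation; the point here is only that comparing candidates by cost, and against a baseline, can rank models differently than raw accuracy would.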
Getting effective machine learning models successfully into production needs more than just dedicated individuals. It requires teams of data scientists working together collaboratively. Oracle Cloud Infrastructure Data Science delivers powerful team capabilities including:
- Shared projects help users organize, enable version control and reliably share a team’s work including data and notebook sessions.
- Model catalogs enable team members to reliably share already-built models and the artifacts necessary to modify and deploy them.
- Team-based security policies, which are fully integrated with Oracle Cloud Infrastructure Identity and Access Management, allow users to control access to models, code and data.
- Reproducibility and auditability functionalities enable the enterprise to keep track of all relevant assets, so that all models can be reproduced and audited, even if team members leave.
With Oracle Cloud Infrastructure Data Science, organizations can accelerate successful model deployment and produce enterprise-grade results and performance for predictive analytics to drive positive business outcomes.
Comprehensive Data and Machine Learning Services
The Oracle Cloud Data Science Platform includes seven new services that deliver a comprehensive end-to-end experience designed to accelerate and improve data science results:
- Oracle Cloud Infrastructure Data Science: Enables users to build, train and manage new machine learning models on Oracle Cloud using Python and other open-source tools and libraries including TensorFlow, Keras and Jupyter.
- Powerful New Machine Learning Capabilities in Oracle Autonomous Database: Machine learning algorithms are tightly integrated in Oracle Autonomous Database with new support for Python and automated machine learning. Upcoming integration with Oracle Cloud Infrastructure Data Science will enable data scientists to develop models using both open source and scalable in-database algorithms. Uniquely, bringing algorithms to the data in Oracle Database speeds time to results by reducing data preparation and movement.
- Oracle Cloud Infrastructure Data Catalog: Allows users to discover, find, organize, enrich and trace data assets on Oracle Cloud. Oracle Cloud Infrastructure Data Catalog has a built-in business glossary making it easy to curate and discover the right, trusted data.
- Oracle Big Data Service: Offers a full Cloudera Hadoop implementation, with dramatically simpler management than other Hadoop offerings, including just one click to make a cluster highly available and to implement security. Oracle Big Data Service also includes machine learning for Spark allowing organizations to run Spark machine learning in memory with one product and with minimal data movement.
- Oracle Cloud SQL: Enables SQL queries on data in HDFS, Hive, Kafka, NoSQL and Object Storage. Only Oracle Cloud SQL enables any user, application or analytics tool that can talk to Oracle databases to transparently work with data in other data stores, with the benefit of push-down, scale-out processing to minimize data movement.
- Oracle Cloud Infrastructure Data Flow: A fully managed Big Data service that allows users to run Apache Spark applications with no infrastructure to deploy or manage. It enables enterprises to deliver Big Data and AI applications faster. Unlike competing Hadoop and Spark services, Oracle Cloud Infrastructure Data Flow includes a single window to track all Spark jobs, making it simple to identify expensive tasks or troubleshoot problems.
- Oracle Cloud Infrastructure Virtual Machines for Data Science: Preconfigured GPU-based environments with common IDEs, notebooks and frameworks that can be up and running in under 15 minutes, for $30 a day.
Source: Oracle Newsroom