Introduction to BigQuery ML Beta
This is a beta release of BigQuery ML. This product might be changed in backward-incompatible ways and is not subject to any SLA or deprecation policy.

Overview
BigQuery ML enables users to create and execute machine learning models in BigQuery using standard SQL queries. BigQuery ML democratizes machine learning by enabling SQL practitioners to build models using existing SQL tools and skills. BigQuery ML increases development speed by eliminating the need to move data.
BigQuery ML currently supports the following types of models:
- Linear regression: These models can be used for predicting a numerical value.
- Binary logistic regression: These models can be used for predicting one of two classes (such as identifying whether an email is spam).
- Multiclass logistic regression for classification: These models can be used to predict more than two classes, such as whether an input is "low-value", "medium-value", or "high-value".
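Models of these types are created with a CREATE MODEL statement. As a minimal sketch, the Python helper below only assembles the SQL text and does not contact BigQuery. It assumes that `linear_reg` and `logistic_reg` are the values accepted by the `model_type` option, and that binary and multiclass logistic regression share the `logistic_reg` value. The dataset, model, table, and column names are hypothetical.

```python
"""Sketch: assemble a BigQuery ML CREATE MODEL statement as plain text."""

# Model type names as they are understood to appear in the model_type option.
# Assumption: binary and multiclass logistic regression share one value.
MODEL_TYPES = {
    "linear regression": "linear_reg",
    "binary logistic regression": "logistic_reg",
    "multiclass logistic regression": "logistic_reg",
}


def build_create_model_sql(dataset, model, kind, label_column, training_query):
    """Return a CREATE MODEL statement for one of the supported model kinds.

    All identifiers are caller-supplied; nothing is validated against
    BigQuery, and no query is executed here.
    """
    if kind not in MODEL_TYPES:
        raise ValueError(f"unsupported model kind: {kind!r}")
    model_type = MODEL_TYPES[kind]
    return (
        f"CREATE MODEL `{dataset}.{model}`\n"
        f"OPTIONS(model_type='{model_type}', "
        f"input_label_cols=['{label_column}']) AS\n"
        f"{training_query.strip()}"
    )


# Hypothetical dataset, model, table, and column names, for illustration only.
EXAMPLE_SQL = build_create_model_sql(
    dataset="my_dataset",
    model="spam_model",
    kind="binary logistic regression",
    label_column="is_spam",
    training_query=(
        "SELECT is_spam, subject_length, sender_domain "
        "FROM `my_dataset.emails`"
    ),
)
```

Printing `EXAMPLE_SQL` gives a statement that could be pasted into any of the BigQuery interfaces. Keeping the statement as text keeps the model definition in SQL, which is the point of BigQuery ML.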
BigQuery ML functionality is available by using:
- The BigQuery web UI
- The bq command-line tool
- The BigQuery REST API
- An external tool such as a Jupyter notebook or business intelligence platform
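From an external tool such as a Jupyter notebook, a model would typically be queried by sending SQL through a client library. The sketch below assumes the google-cloud-bigquery Python package, where `Client.query(sql).result()` yields the result rows. The helper accepts any object with that shape, so it can be exercised without credentials. The `ML.PREDICT` query follows the form used by BigQuery ML; the helper names and all dataset, model, and table names are hypothetical.

```python
"""Sketch: run a BigQuery ML prediction query from Python (e.g. a notebook)."""


def build_predict_sql(dataset, model, input_query):
    """Return an ML.PREDICT query that scores the rows of input_query."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{dataset}.{model}`,\n"
        f"  ({input_query.strip()}))"
    )


def run_query(client, sql):
    """Submit sql through client and return the rows as a list of dicts.

    client is assumed to behave like google.cloud.bigquery.Client:
    client.query(sql) returns a job whose result() yields row objects
    that expose items().
    """
    job = client.query(sql)
    return [dict(row.items()) for row in job.result()]


def make_bigquery_client():
    """Create a real client; needs google-cloud-bigquery and credentials."""
    # Imported lazily so the sketch loads even when the package is absent.
    from google.cloud import bigquery

    return bigquery.Client()
```

In a notebook, a call might look like `run_query(make_bigquery_client(), build_predict_sql("my_dataset", "spam_model", "SELECT * FROM `my_dataset.new_emails`"))`, with the names replaced by real ones. The same SQL text could equally be submitted through the web UI, the bq tool, or the REST API.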
Machine learning on large data sets requires extensive programming and knowledge of ML frameworks. These requirements restrict solution development to a very small set of people within each company, and they exclude data analysts who understand the data but have limited machine learning knowledge and programming expertise.
BigQuery ML empowers data analysts to use machine learning through existing SQL tools and skills. Analysts can use BigQuery ML to build and evaluate ML models in BigQuery. Analysts no longer need to export small amounts of data to spreadsheets or other applications, and they no longer need to wait for limited resources from a data science team.

Advantages of BigQuery ML
BigQuery ML has the following advantages over other approaches to using ML with a cloud-based data warehouse:
- BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets. This enables business decision making through predictive analytics across the organization.
- There is no need to program an ML solution using Python or Java. Models are trained and accessed in BigQuery using SQL, a language data analysts know.
- BigQuery ML increases the speed of model development and innovation by removing the need to export data from the data warehouse. Instead, BigQuery ML brings ML to the data. Exporting and reformatting the data:
  - Increases complexity: Multiple tools are required.
  - Reduces speed: Moving and formatting large amounts of data for Python-based ML frameworks takes longer than model training in BigQuery.
  - Requires multiple steps to export data from the warehouse, restricting the ability to experiment on your data.
  - Can be prevented by legal restrictions (such as HIPAA guidelines).
Like BigQuery, BigQuery ML is a multi-regional resource. BigQuery ML supports the same regions as BigQuery.
Data locality is specified when you create a dataset to store your BigQuery ML models and training data. BigQuery ML processes and stages data in the same location as the target dataset.
Note: For more information on data storage, see the Service Specific Terms.

Quotas
In addition to BigQuery ML-specific limits, queries that use BigQuery ML functions and CREATE MODEL statements are subject to the quotas and limits on Query jobs.
For more information on all quotas and limits, see Quotas and Limits.

Pricing
BigQuery ML models are stored in BigQuery datasets like tables and views. When you create and use models in BigQuery ML, your charges are based on how much data is used to train a model and on the queries you run against the data.
For information on BigQuery ML pricing, see BigQuery ML pricing. For information on storage pricing, see Storage pricing. For information on query pricing, see Query pricing.

Resources
To learn more about machine learning and BigQuery ML, see the following resources:
- Applying machine learning to your data with GCP course at Coursera
- Data and machine learning training program
- Machine learning crash course
- Machine learning glossary
To get started using BigQuery ML, see Getting started with BigQuery ML for data analysts or Getting started with BigQuery ML for data scientists.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated February 4, 2019.