🔗 View code on GitHub

Description

The Google Analytics Sample Ecommerce dataset contains millions of Google Analytics records for the Google Merchandise Store, loaded into BigQuery. In this lab, I used this data to run typical queries that answer questions businesses ask about their customers' purchasing habits. Each row within a table corresponds to a session in Analytics 360.

Objectives:

In this lab, the goal is to predict which new visitors will come back and purchase, through the following tasks:

  • Use BigQuery to find public datasets
  • Query and explore the ecommerce dataset
  • Create a training and evaluation dataset to be used for batch prediction
  • Create a classification (logistic regression) model in BigQuery ML
  • Evaluate the performance of the machine learning model
  • Predict and rank the probability that a visitor will make a purchase
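
The model-related tasks above can be sketched as three BigQuery ML statements (CREATE MODEL, ML.EVALUATE, ML.PREDICT). The following is a minimal sketch, not the lab's actual code: it only builds the SQL text in Python, and the model name `ecommerce.classification_model`, the label column `will_buy_on_return_visit`, and the table names in the placeholder queries are all assumptions made for illustration.

```python
# Hypothetical names, chosen for illustration only (not taken from the lab).
MODEL = "ecommerce.classification_model"
LABEL = "will_buy_on_return_visit"


def create_model_sql(model: str, label: str, training_query: str) -> str:
    """Build a CREATE MODEL statement for a logistic regression classifier."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS(model_type='logistic_reg', input_label_cols=['{label}']) AS\n"
        f"{training_query}"
    )


def evaluate_sql(model: str, eval_query: str) -> str:
    """Build an ML.EVALUATE statement that reports roc_auc on held-out data."""
    return f"SELECT roc_auc FROM ML.EVALUATE(MODEL `{model}`, (\n{eval_query}\n))"


def predict_sql(model: str, new_data_query: str) -> str:
    """Build an ML.PREDICT statement that scores new visitors."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, (\n{new_data_query}\n))"


# Placeholder queries: the tables `ecommerce.training_data`,
# `ecommerce.eval_data`, and `ecommerce.new_visitors` are hypothetical.
train_stmt = create_model_sql(MODEL, LABEL, "SELECT * FROM `ecommerce.training_data`")
eval_stmt = evaluate_sql(MODEL, "SELECT * FROM `ecommerce.eval_data`")
pred_stmt = predict_sql(MODEL, "SELECT * FROM `ecommerce.new_visitors`")

print(train_stmt)
print(eval_stmt)
print(pred_stmt)
```

The resulting statements could be pasted into the BigQuery console or submitted through a client library; that step is left out so the sketch has no external dependencies.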

Outcome

After evaluating the model, I got a roc_auc of 0.72, which shows that the model does not have great predictive power. I added some new features and created a second machine learning model. A key new feature added to the training dataset query is the maximum checkout progress each visitor reached in their session. With this new model, I got a roc_auc of 0.91, which is significantly better than the first model.
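
To make the roc_auc figures easier to interpret: the metric can be read as the probability that a randomly chosen purchaser receives a higher predicted score than a randomly chosen non-purchaser, so 0.5 is chance level and 1.0 is a perfect ranking. The sketch below computes it by direct pairwise comparison on a tiny made-up sample. It illustrates the definition only and is not necessarily how BigQuery ML computes the value internally; the labels and scores are invented for the example.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Invented toy data: 1 = visitor purchased on a return visit, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

# Three of the four (purchaser, non-purchaser) pairs are ordered correctly.
print(roc_auc(toy_labels, toy_scores))  # 0.75
```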

Conclusion

BigQuery ML (BigQuery machine learning) is a feature in BigQuery where data analysts can create, train, evaluate, and predict with machine learning models with minimal coding.