Data Preparation and Feature Engineering in ML  |  Machine Learning  |  Google for Developers (2024)

Machine learning helps us find patterns in data: patterns we then use to make predictions about new data points. To get those predictions right, we must construct the data set and transform the data correctly. This course covers these two key steps. We'll also see how training/serving considerations play into these steps.

Prerequisites

This course assumes you have:

  • Completed Machine Learning Crash Course.

Why Learn About Data Preparation and Feature Engineering?

You can think of feature engineering as helping the model to understand the data set in the same way you do. Learners often come to a machine learning course focused on model building, but end up spending much more time focusing on data.

For the following question, consider each option, then read its explanation:

If you had to prioritize improving one of the areas below in your machine learning project, which would have the most impact?

The quality and size of your data

Data trumps all. It's true that updating your learning algorithm or model architecture will let you learn different types of patterns, but if your data is bad, you will end up building functions that fit the wrong thing. The quality and size of the data set matter much more than which shiny algorithm you use.

Using the latest optimization algorithm

You could definitely see some gains from switching to a newer optimizer, but the impact on your model wouldn't be as significant as that of another item in this list.

A deeper network

While a deeper network may improve your model, the impact won't be as significant as another item in this list.

A more clever loss function

Close! A better loss function can give you a big win, but it's still second to another item in this list.

Why is Collecting a Good Data Set Important?

Google Translate

"...one of our most impactful quality advances since neural machine translation has been in identifying the best subset of our training data to use" - Software Engineer, Google Translate

The Google Translate team has more training data than they can use. Rather than tuning their model, the team has earned bigger wins by selecting the best subset of their training data.

"...most of the times when I tried to manually debug interesting-looking errors they could be traced back to issues with the training data." - Software Engineer, Google Translate

"Interesting-looking" errors are typically caused by the data. Faulty data may cause your model to learn the wrong patterns, regardless of what modeling techniques you try.

Brain's Diabetic Retinopathy Project

Google Brain's diabetic retinopathy project employed a neural network architecture, known as Inception, to detect disease by classifying images. The team didn't tweak models. Instead, they succeeded by creating a data set of 120,000 examples labeled by ophthalmologists. (Learn more at https://research.google.com/pubs/pub43022.html.)

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2023-08-23 UTC.
