Representation: Feature Engineering | Machine Learning

In traditional programming, the focus is on code. In machine learning projects, the focus shifts to representation. That is, one way developers hone a model is by adding and improving its features.

Mapping Raw Data to Features

The left side of Figure 1 illustrates raw data from an input data source; the right side illustrates a feature vector, which is the set of floating-point values that represents a single example in your data set. Feature engineering means transforming raw data into a feature vector. Expect to spend significant time doing feature engineering.

Many machine learning models must represent features as real-valued vectors, because the feature values must be multiplied by the model weights.

Figure 1. Feature engineering maps raw data to ML features.

Mapping numeric values

Integer and floating-point data don't need a special encoding because they can be multiplied by a numeric weight. As suggested in Figure 2, converting the raw integer value 6 to the feature value 6.0 is trivial:

Figure 2. Mapping integer values to floating-point values.
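As a minimal sketch of this step (the raw values, weights, and bias below are hypothetical, chosen only for illustration), numeric data can be cast directly to floating point and then multiplied by the weights of a simple linear model:

```python
# Hypothetical raw numeric values read from an input data source.
raw_values = [6, 3, 2]

# Integers need no special encoding: casting each one to float is enough.
feature_vector = [float(v) for v in raw_values]

# Hypothetical learned weights and bias for a linear model.
weights = [0.5, -1.0, 2.0]
bias = 1.0

# A linear model multiplies each feature value by its weight and sums the results.
prediction = bias + sum(w * x for w, x in zip(weights, feature_vector))

print(feature_vector)  # [6.0, 3.0, 2.0]
print(prediction)      # 5.0
```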

Mapping categorical values

Categorical features have a discrete set of possible values. For example, there might be a feature called street_name with options that include:

{'Charleston Road', 'North Shoreline Boulevard', 'Shorebird Way', 'Rengstorff Avenue'}

Since models cannot multiply strings by the learned weights, we use feature engineering to convert strings to numeric values.

We can accomplish this by defining a mapping from the feature values, which we'll refer to as the vocabulary of possible values, to integers. Since not every street in the world will appear in our data set, we can group all other streets into a catch-all "other" category, known as an OOV (out-of-vocabulary) bucket.

Using this approach, here's how we can map our street names to numbers:

map Charleston Road to 0
map North Shoreline Boulevard to 1
map Shorebird Way to 2
map Rengstorff Avenue to 3
map everything else (OOV) to 4
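The mapping above can be sketched in Python as a dictionary lookup with a fallback. This is a minimal illustration, not a library API: the names VOCABULARY, STREET_TO_INDEX, OOV_INDEX, and street_to_index are hypothetical, and "Main Street" simply stands in for any street missing from the vocabulary.

```python
# Vocabulary of known street names, in the order used by the mapping above.
VOCABULARY = [
    "Charleston Road",
    "North Shoreline Boulevard",
    "Shorebird Way",
    "Rengstorff Avenue",
]

# Map each vocabulary entry to its position in the list.
STREET_TO_INDEX = {name: i for i, name in enumerate(VOCABULARY)}

# Every street not in the vocabulary shares this single OOV index (4 here).
OOV_INDEX = len(VOCABULARY)


def street_to_index(street_name: str) -> int:
    """Return the index for a known street, or the OOV index otherwise."""
    return STREET_TO_INDEX.get(street_name, OOV_INDEX)


print(street_to_index("Shorebird Way"))  # 2
print(street_to_index("Main Street"))    # 4 (OOV)
```

Keeping the vocabulary as an ordered list makes the index assignment explicit and reproducible, which matters once these indices are used to look up weights.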

However, if we incorporate these index numbers directly into our model, it will impose some constraints that might be problematic:

We'll be learning a single weight that applies to all streets. For example, if we learn a weight of 6 for street_name, then we will multiply it by 0 for Charleston Road, by 1 for North Shoreline Boulevard, by 2 for Shorebird Way, and so on. Consider a model that predicts house prices using street_name as a feature. It is unlikely that price varies linearly with a street's index, and such an encoding would only make sense if the streets had been ordered by their average house price. Our model needs the flexibility to learn a different weight for each street, which is added to the price estimated from the other features.
We aren't accounting for cases where street_name may take multiple values. For example, many houses are located at the corner of two streets, and there's no way to encode that information in the street_name value if it contains a single index.
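To make the first constraint concrete, the following sketch feeds the integer indices directly into a single shared weight of 6. The names are hypothetical, and the OOV bucket is omitted for brevity:

```python
# Integer indices for the known streets (OOV bucket omitted for brevity).
STREET_TO_INDEX = {
    "Charleston Road": 0,
    "North Shoreline Boulevard": 1,
    "Shorebird Way": 2,
    "Rengstorff Avenue": 3,
}

# A single learned weight shared by every street (the value 6 from the text).
street_weight = 6.0

# Each street's contribution is index * weight, so it grows linearly
# with an index that was assigned arbitrarily.
contributions = {
    street: index * street_weight for street, index in STREET_TO_INDEX.items()
}

for street, value in contributions.items():
    print(f"{street}: {value}")
# Charleston Road: 0.0
# North Shoreline Boulevard: 6.0
# Shorebird Way: 12.0
# Rengstorff Avenue: 18.0
```

The contributions rise in fixed steps of 6 regardless of what the streets are actually worth, and a single integer per example leaves no room to represent a corner house on two streets.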

To remove both these constraints, we can instead create a binary vector for each categorical feature in our model that represents values as follows:

For values that apply to the example, set corresponding vector elements to 1.
Set all other elements to 0.

The length of this vector is equal to the number of elements in the vocabulary. This representation is called a one-hot encoding when a single value is 1, and a multi-hot encoding when multiple values are 1.
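A one-hot or multi-hot encoder can be sketched as follows. The helper encode_streets is hypothetical, and the sketch assumes the OOV bucket counts as one extra vocabulary element, so the vector has five slots rather than four:

```python
VOCABULARY = [
    "Charleston Road",
    "North Shoreline Boulevard",
    "Shorebird Way",
    "Rengstorff Avenue",
]
STREET_TO_INDEX = {name: i for i, name in enumerate(VOCABULARY)}
OOV_INDEX = len(VOCABULARY)

# Assumption: the OOV bucket gets its own slot at the end of the vector.
VECTOR_LENGTH = len(VOCABULARY) + 1


def encode_streets(streets: list[str]) -> list[float]:
    """Binary vector with 1.0 for every street that applies and 0.0 elsewhere."""
    vector = [0.0] * VECTOR_LENGTH
    for street in streets:
        vector[STREET_TO_INDEX.get(street, OOV_INDEX)] = 1.0
    return vector


# One-hot: a house on Shorebird Way.
print(encode_streets(["Shorebird Way"]))
# [0.0, 0.0, 1.0, 0.0, 0.0]

# Multi-hot: a house at the corner of Charleston Road and Rengstorff Avenue.
print(encode_streets(["Charleston Road", "Rengstorff Avenue"]))
# [1.0, 0.0, 0.0, 1.0, 0.0]
```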

Figure 3 illustrates a one-hot encoding of a particular street: Shorebird Way. The element in the binary vector for Shorebird Way has a value of 1, while the elements for all other streets have values of 0.

Figure 3. Mapping street address via one-hot encoding.

This approach effectively creates a Boolean variable for every feature value (e.g., street name). Here, if a house is on Shorebird Way, then the binary value is 1 only for Shorebird Way. Thus, the model uses only the weight for Shorebird Way.

Similarly, if a house is at the corner of two streets, then two binary values are set to 1, and the model uses both their respective weights.
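The weight-selection behavior described here is just a dot product between the binary vector and the per-street weights. In this sketch the weights are made-up values, the last slot is assumed to be the OOV bucket, and the binary vectors are written out by hand using the street order Charleston Road, North Shoreline Boulevard, Shorebird Way, Rengstorff Avenue, OOV:

```python
# Hypothetical learned weights, one per street; the last entry is the OOV bucket.
street_weights = [3.0, -1.5, 4.0, 2.5, 0.5]


def street_contribution(encoded: list[float]) -> float:
    """Dot product of a binary street vector with the per-street weights."""
    return sum(w * x for w, x in zip(street_weights, encoded))


one_hot = [0.0, 0.0, 1.0, 0.0, 0.0]    # Shorebird Way
multi_hot = [1.0, 0.0, 0.0, 1.0, 0.0]  # corner of Charleston Road and Rengstorff Avenue

print(street_contribution(one_hot))    # 4.0: only Shorebird Way's weight is used
print(street_contribution(multi_hot))  # 5.5: Charleston Road (3.0) + Rengstorff Avenue (2.5)
```

Every zero entry cancels its weight, so each street effectively gets its own independent adjustment to the prediction.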

Sparse Representation

Suppose that you had 1,000,000 different street names in your data set that you wanted to include as values for street_name. Explicitly creating a binary vector of 1,000,000 elements where only 1 or 2 elements are true is a very inefficient representation, in terms of both storage and computation time, when processing these vectors. In this situation, a common approach is to use a sparse representation in which only the nonzero values (and their positions) are stored. In sparse representations, an independent model weight is still learned for each feature value, as described above.
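A sparse input can be sketched as a mapping from position to value, so a dot product only visits the stored entries. The vocabulary size comes from the text; the two active indices and their weights are hypothetical. The weight list stays dense because each feature value still has its own weight:

```python
# Vocabulary size from the text: one million distinct street names.
VOCABULARY_SIZE = 1_000_000

# One independent weight per street. Most are left at 0.0 for brevity;
# the two nonzero values are hypothetical learned weights.
weights = [0.0] * VOCABULARY_SIZE
weights[17] = 2.0
weights[523_001] = -0.75

# Sparse encoding of a corner house: only the two nonzero positions are stored,
# instead of a 1,000,000-element binary vector.
corner_house = {17: 1.0, 523_001: 1.0}


def sparse_dot(sparse_features: dict[int, float]) -> float:
    """Same result as a dense dot product, but touches only the stored entries."""
    return sum(weights[i] * value for i, value in sparse_features.items())


print(sparse_dot(corner_house))  # 1.25
```

Storing index/value pairs also covers non-binary features; for purely binary vectors, a list of active indices would be enough.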

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2022-07-18 UTC.
