In traditional programming, the focus is on code. In machine learning projects, the focus shifts to representation. That is, one way developers hone a model is by adding and improving its features.
Mapping Raw Data to Features
The left side of Figure 1 illustrates raw data from an input data source; the right side illustrates a feature vector, which is the set of floating-point values comprising the examples in your data set. Feature engineering means transforming raw data into a feature vector. Expect to spend significant time doing feature engineering.
Many machine learning models must represent the features as real-numbered vectors since the feature values must be multiplied by the model weights.
Figure 1. Feature engineering maps raw data to ML features.
Mapping numeric values
Integer and floating-point data don't need a special encoding because they can be multiplied by a numeric weight. As suggested in Figure 2, converting the raw integer value 6 to the feature value 6.0 is trivial:
Figure 2. Mapping integer values to floating-point values.
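As a minimal sketch of this conversion, the snippet below casts numeric raw values to floats in a fixed feature order. The record and its field names (num_rooms, num_bedrooms, avg_rooms_per_person) are hypothetical and chosen only for illustration.

```python
# Hypothetical raw record; the field names are illustrative assumptions.
raw_example = {"num_rooms": 6, "num_bedrooms": 3, "avg_rooms_per_person": 1.5}

# A fixed ordering so that each position in the vector always means the same feature.
feature_names = ["num_rooms", "num_bedrooms", "avg_rooms_per_person"]


def to_feature_vector(record, names):
    """Cast each numeric raw value to float, in the order given by names."""
    return [float(record[name]) for name in names]


feature_vector = to_feature_vector(raw_example, feature_names)
print(feature_vector)  # [6.0, 3.0, 1.5]
```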
Mapping categorical values
Categorical features have a discrete set of possible values. For example, there might be a feature called street_name with options that include:
{'Charleston Road', 'North Shoreline Boulevard', 'Shorebird Way', 'Rengstorff Avenue'}
Since models cannot multiply strings by the learned weights, we use feature engineering to convert strings to numeric values.
We can accomplish this by defining a mapping from the feature values, which we'll refer to as the vocabulary of possible values, to integers. Since not every street in the world will appear in our dataset, we can group all other streets into a catch-all "other" category, known as an OOV (out-of-vocabulary) bucket.
Using this approach, here's how we can map our street names to numbers:
- map Charleston Road to 0
- map North Shoreline Boulevard to 1
- map Shorebird Way to 2
- map Rengstorff Avenue to 3
- map everything else (OOV) to 4
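The mapping above can be sketched in a few lines of Python. The helper name street_to_index is an assumption made for this example; the vocabulary and index values are the ones listed above.

```python
# Vocabulary in the same order as the mapping listed in the text.
VOCABULARY = [
    "Charleston Road",
    "North Shoreline Boulevard",
    "Shorebird Way",
    "Rengstorff Avenue",
]

# Each known street maps to its position in the vocabulary.
STREET_TO_INDEX = {name: index for index, name in enumerate(VOCABULARY)}

# Every street outside the vocabulary shares one OOV bucket (index 4 here).
OOV_INDEX = len(VOCABULARY)


def street_to_index(street_name):
    """Return the vocabulary index for a street, or the OOV index if unknown."""
    return STREET_TO_INDEX.get(street_name, OOV_INDEX)


print(street_to_index("Shorebird Way"))  # 2
print(street_to_index("Main Street"))    # 4 (OOV)
```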
However, if we incorporate these index numbers directly into our model, it will impose some constraints that might be problematic:
- We'll be learning a single weight that applies to all streets. For example, if we learn a weight of 6 for street_name, then we will multiply it by 0 for Charleston Road, by 1 for North Shoreline Boulevard, 2 for Shorebird Way and so on. Consider a model that predicts house prices using street_name as a feature. It is unlikely that there is a linear adjustment of price based on the street name, and furthermore this would assume you have ordered the streets based on their average house price. Our model needs the flexibility of learning different weights for each street that will be added to the price estimated using the other features.
- We aren't accounting for cases where street_name may take multiple values. For example, many houses are located at the corner of two streets, and there's no way to encode that information in the street_name value if it contains a single index.
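To make the first constraint concrete, the sketch below multiplies the single weight of 6 from the example by each street's index. The variable names are assumptions for illustration; the point is that the price adjustment grows in equal steps purely because of the arbitrary order in which indices were assigned.

```python
# Index assignment from the text, including the OOV bucket.
street_index = {
    "Charleston Road": 0,
    "North Shoreline Boulevard": 1,
    "Shorebird Way": 2,
    "Rengstorff Avenue": 3,
    "OOV": 4,
}

# The single learned weight used in the example above.
weight = 6.0

# With a raw index as the feature, each street's contribution is weight * index.
contributions = {street: weight * index for street, index in street_index.items()}

for street, value in contributions.items():
    print(f"{street}: {value}")
# Charleston Road: 0.0
# North Shoreline Boulevard: 6.0
# Shorebird Way: 12.0
# Rengstorff Avenue: 18.0
# OOV: 24.0
```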
To remove both these constraints, we can instead create a binary vector for each categorical feature in our model that represents values as follows:
- For values that apply to the example, set corresponding vector elements to 1.
- Set all other elements to 0.
The length of this vector is equal to the number of elements in the vocabulary. This representation is called a one-hot encoding when a single value is 1, and a multi-hot encoding when multiple values are 1.
Figure 3 illustrates a one-hot encoding of a particular street: Shorebird Way. The element in the binary vector for Shorebird Way has a value of 1, while the elements for all other streets have values of 0.
Figure 3. Mapping street address via one-hot encoding.
This approach effectively creates a Boolean variable for every feature value (e.g., street name). Here, if a house is on Shorebird Way then the binary value is 1 only for Shorebird Way. Thus, the model uses only the weight for Shorebird Way.
Similarly, if a house is at the corner of two streets, then two binary values are set to 1, and the model uses both their respective weights.
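The following sketch shows one way to build one-hot and multi-hot vectors and apply a separate weight per street. The function names and the weight values are hypothetical, and OOV handling is left out for brevity (a street outside the vocabulary raises a KeyError here).

```python
VOCABULARY = [
    "Charleston Road",
    "North Shoreline Boulevard",
    "Shorebird Way",
    "Rengstorff Avenue",
]
STREET_TO_INDEX = {name: index for index, name in enumerate(VOCABULARY)}


def encode_streets(streets):
    """Return a binary vector with 1.0 at the position of every given street."""
    vector = [0.0] * len(VOCABULARY)
    for street in streets:
        vector[STREET_TO_INDEX[street]] = 1.0
    return vector


def weighted_sum(vector, weights):
    """Sum of weight * value; only positions set to 1.0 contribute."""
    return sum(w * x for w, x in zip(weights, vector))


# Hypothetical learned weights, one per street (illustrative values only).
weights = [10.0, 20.0, 30.0, 40.0]

one_hot = encode_streets(["Shorebird Way"])
multi_hot = encode_streets(["Shorebird Way", "Rengstorff Avenue"])

print(one_hot)                           # [0.0, 0.0, 1.0, 0.0]
print(weighted_sum(one_hot, weights))    # 30.0
print(multi_hot)                         # [0.0, 0.0, 1.0, 1.0]
print(weighted_sum(multi_hot, weights))  # 70.0
```

Unlike the single-index approach, each street's contribution now depends only on its own weight, and a corner house draws on both weights.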
Sparse Representation
Suppose that you had 1,000,000 different street names in your data set that you wanted to include as values for street_name. Explicitly creating a binary vector of 1,000,000 elements where only 1 or 2 elements are true is a very inefficient representation in terms of both storage and computation time when processing these vectors. In this situation, a common approach is to use a sparse representation in which only nonzero values are stored. In sparse representations, an independent model weight is still learned for each feature value, as described above.
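A rough sketch of the idea, assuming the binary vector is stored as a list of the indices whose value is 1 (one of several possible sparse layouts). The function names, the chosen indices, and the weight values are hypothetical. Both forms give the same weighted sum, but the sparse form touches only the nonzero entries.

```python
VOCAB_SIZE = 1_000_000


def to_sparse(dense_vector):
    """Keep only the positions of nonzero elements."""
    return [index for index, value in enumerate(dense_vector) if value != 0]


def sparse_weighted_sum(active_indices, weights):
    """Add up the weights of the active positions; zeros are never visited."""
    return sum(weights[index] for index in active_indices)


# One independent weight per feature value (illustrative values only).
weights = [0.0] * VOCAB_SIZE
weights[2] = 30.0
weights[7] = 12.5

# Dense form: a million elements, only two of which are 1.
dense = [0] * VOCAB_SIZE
dense[2] = 1
dense[7] = 1

# Sparse form: just the two active indices.
sparse = to_sparse(dense)

dense_sum = sum(w * x for w, x in zip(weights, dense))
sparse_sum = sparse_weighted_sum(sparse, weights)

print(sparse)      # [2, 7]
print(dense_sum)   # 42.5
print(sparse_sum)  # 42.5
```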
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-07-18 UTC.