
Features for Machine Learning

[snip]

Contents

  • Basic Level
    1. How to transform numerical data into features for machine learning?
       4.3.1 Standardization of the dataset
         4.3.1.1 Scaling features to a range
         4.3.1.2 Scaling sparse data
       4.3.2 Non-linear transformation
       4.3.3 Normalization
       4.3.4 Binarization
         4.3.4.1 Feature binarization: the process of thresholding numerical features to get boolean values
       4.3.5 Encoding categorical features
       4.3.6 Imputation of missing values
       (a minimal scikit-learn sketch of these transformations follows this list)
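
The transformations listed above map directly onto scikit-learn's preprocessing tools. The following is a minimal sketch, assuming scikit-learn and NumPy are available; the toy arrays, the colour categories and the missing-value example are made up purely for illustration.

    # Minimal sketch of preprocessing steps 4.3.1-4.3.6 with scikit-learn.
    # The toy data below is illustrative only.
    import numpy as np
    from sklearn.preprocessing import (StandardScaler, MinMaxScaler, MaxAbsScaler,
                                       QuantileTransformer, Normalizer, Binarizer,
                                       OneHotEncoder)
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, -2.0], [2.0, 0.0], [0.0, 6.0]])

    # 4.3.1 Standardization: zero mean and unit variance per feature
    X_std = StandardScaler().fit_transform(X)

    # 4.3.1.1 Scaling features to a range, e.g. [0, 1]
    X_minmax = MinMaxScaler().fit_transform(X)

    # 4.3.1.2 Scaling sparse data: MaxAbsScaler scales without centering,
    # so sparsity would be preserved
    X_maxabs = MaxAbsScaler().fit_transform(X)

    # 4.3.2 Non-linear transformation: map each feature to a uniform distribution
    X_quantile = QuantileTransformer(n_quantiles=3).fit_transform(X)

    # 4.3.3 Normalization: scale each sample (row) to unit L2 norm
    X_norm = Normalizer(norm="l2").fit_transform(X)

    # 4.3.4 / 4.3.4.1 Binarization: threshold numerical features to 0/1
    X_bin = Binarizer(threshold=0.5).fit_transform(X)

    # 4.3.5 Encoding categorical features as one-hot vectors
    colors = np.array([["Red"], ["Blue"], ["Unknown"]])
    colors_onehot = OneHotEncoder().fit_transform(colors).toarray()

    # 4.3.6 Imputation of missing values with the per-column mean
    X_missing = np.array([[1.0, np.nan], [2.0, 0.0], [np.nan, 6.0]])
    X_imputed = SimpleImputer(strategy="mean").fit_transform(X_missing)
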
2. Importance of Feature Engineering - why do we need good features in the first place?
- Better features mean flexibility
- Better features mean simpler models
- Better features mean better results
- What is Feature Engineering?
Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.
" Feature engineering is manually designing what the input x’s should be" - Tomasz Malisiewicz.
- Feature Engineering is a Representation Problem
- Feature: An attribute useful for your modelling task
- Feature Extraction: The automatic construction of new features from raw data.
- Feature Construction: The manual construction of new features from raw data.
In order to manually create features, you have to take time with actual data and think about the underlying problem, structures in the data and how to best expose them to predictive modeling algorithms.
- Process of feature engineering
- Process of Machine Learning: problem definition, data selection, preparation, model preparation, evaluation, results.
Select Data: integrate data, de-normalize it into a dataset, collect it together.
Preprocess Data: format it, clean it, sample it so you can work with it.
Transform Data: feature engineering happens HERE
Model Data: create models, evaluate them and tune them ….
“Transforming data” from its raw state and feature engineering are synonyms.
- Iterative Process of Feature Engineering: feature engineering is an iterative process that interplays with data selection and model evaluation.
1. Brainstorm features: Really get into the problem, look at a lot of data, study feature engineering on other problems and see what you can steal.
2. Devise features: Depending on your own problem, you may use automatic feature extraction, manual feature construction, or a mixture of the two.
3. Select features
4. Evaluate models (a minimal sketch of steps 3 and 4 follows below)
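
Steps 3 and 4 are where feature selection and model evaluation meet. The following is a minimal sketch, assuming scikit-learn; the bundled breast-cancer dataset, the choice of k=10 features and the logistic-regression model are illustrative assumptions, not part of the original notes.

    # Minimal sketch of "select features" and "evaluate models" with scikit-learn.
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    # Step 3: keep the 10 features with the strongest univariate relation to the target.
    # Step 4: evaluate the resulting model with 5-fold cross-validation.
    pipeline = make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_classif, k=10),
        LogisticRegression(max_iter=1000),
    )
    scores = cross_val_score(pipeline, X, y, cv=5)
    print(f"mean accuracy with 10 selected features: {scores.mean():.3f}")

In practice the loop then goes back to step 1: brainstorm new features, devise them, select again and re-evaluate.
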
- Decompose Categorical Attributes
For example: you have a categorical attribute, like “item_color” that can be Red, Blue or Unknown. You could create a new binary feature called “has_color” and assign the value of “1” when an item has a color and “0” when the color is unknown.
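
A minimal pandas sketch of this decomposition, assuming a hypothetical items table; the column names mirror the example above.

    # Minimal sketch: decompose the categorical "item_color" attribute.
    # The DataFrame is made up for illustration.
    import pandas as pd

    items = pd.DataFrame({"item_color": ["Red", "Blue", "Unknown", "Red"]})

    # New binary feature: 1 when the item has a known color, 0 when it is Unknown
    items["has_color"] = (items["item_color"] != "Unknown").astype(int)

    # One-hot decomposition of the individual color values is another common step
    items = pd.concat([items, pd.get_dummies(items["item_color"], prefix="color")], axis=1)
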
- Reframe Numerical Quantities
For example: you may have Item_Weight in grams, with a value like 6289. You could create a new feature with this quantity in kilograms, as 6.289, or in rounded kilograms, as 6. If the domain is shipping data, perhaps kilograms is a sufficient, or even more useful (less noisy), precision for Item_Weight. The Item_Weight could also be split into two features: Item_Weight_Kilograms and Item_Weight_Remainder_Grams, with example values of 6 and 289 respectively.
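
A minimal pandas sketch of this reframing, assuming a hypothetical shipping table; the 6289-gram value is the worked example from the text, the other rows are made up.

    # Minimal sketch: reframe Item_Weight (grams) into coarser and split features.
    import pandas as pd

    shipments = pd.DataFrame({"Item_Weight": [6289, 1500, 250]})  # weight in grams

    # Same quantity at a coarser precision
    shipments["Item_Weight_Kg"] = shipments["Item_Weight"] / 1000.0                        # e.g. 6.289
    shipments["Item_Weight_Kg_Rounded"] = (shipments["Item_Weight"] / 1000).round().astype(int)  # e.g. 6

    # Split into two features: whole kilograms and the remaining grams
    shipments["Item_Weight_Kilograms"] = shipments["Item_Weight"] // 1000        # 6
    shipments["Item_Weight_Remainder_Grams"] = shipments["Item_Weight"] % 1000   # 289
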

[snip]