Recently I've become interested in data analysis, so I researched how to do a machine-learning project and tried one myself.
I learned that scaling is important when handling features, so I scaled every feature while using tree models such as Decision Tree and LightGBM.
But the results were worse when I scaled.
I searched the Internet, and all I found is that tree and ensemble algorithms are not sensitive to the variance of the data.
I also bought the book "Hands-On Machine Learning" from O'Reilly, but I couldn't find a sufficient explanation there.
Can I get a more detailed explanation of this?
Why don't tree- and ensemble-based algorithms need feature scaling?
651 views · Asked by yoon-seul
There are 2 answers
MikkiPython
Do not confuse trees with ensembles (which may consist of models that do need scaling). Trees do not need scaled features, because at each node the entire set of observations is divided by the value of a single feature: roughly speaking, everything less than a certain threshold goes to the left, and everything greater goes to the right. What difference does the chosen scale make, then?
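To see this concretely, here is a minimal sketch (my own illustration, not from the answer): a single tree split chooses the threshold that best separates the labels, and any monotonic rescaling of a feature (such as min-max scaling) moves the threshold but leaves the resulting left/right partition of the samples unchanged.

```python
# Illustration: a single best-Gini split gives the same partition on
# raw and on min-max-scaled data, because scaling preserves the order
# of the feature values.

def best_split_partition(xs, ys):
    """Return the set of sample indices sent left by the best Gini split."""
    def gini(labels):
        n = len(labels)
        if n == 0:
            return 0.0
        p = sum(labels) / n          # fraction of class 1
        return 2 * p * (1 - p)

    best_score, best_left = float("inf"), frozenset()
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    # Candidate thresholds: midpoints between consecutive sorted values.
    for k in range(1, len(xs)):
        t = (xs[order[k - 1]] + xs[order[k]]) / 2
        left = frozenset(i for i in range(len(xs)) if xs[i] <= t)
        right = frozenset(range(len(xs))) - left
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right]))
        if score < best_score:
            best_score, best_left = score, left
    return best_left

feature = [2.0, 35.0, 40.0, 180.0, 210.0, 500.0]
labels = [0, 0, 0, 1, 1, 1]

raw = best_split_partition(feature, labels)
# Min-max scale the same feature into [0, 1] and split again.
lo, hi = min(feature), max(feature)
scaled = best_split_partition([(x - lo) / (hi - lo) for x in feature], labels)

print(raw == scaled)  # True: the split partitions the samples identically
```

The threshold itself differs (110.0 raw vs. its scaled counterpart), but the samples sent left and right are the same, so the fitted tree, and therefore its predictions, cannot change.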
Though I don't know the exact notation and equations, the answer has to do with Big O notation for these algorithms.
Big O notation is a way of expressing the theoretical worst-case time for an algorithm to complete over extremely large data sets. For example, a simple loop that goes over every item in a one-dimensional array of size n has O(n) run time, which is to say its running time always grows in proportion to the size of the array.
Say you have a two-dimensional array of X,Y coordinates and you loop across every potential combination of x/y locations, where x has size n and y has size m; your Big O would be O(mn), and so on. Big O is used to compare the relative speed of different algorithms in the abstract, so that you can try to determine which one is better to use.
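The two loop examples above can be made concrete with a toy sketch (my own illustration): counting the basic operations each loop performs shows the O(n) and O(mn) growth directly.

```python
# Count the basic operations performed by the loops described above.

def count_ops_1d(arr):
    ops = 0
    for _ in arr:          # visits every item exactly once -> O(n)
        ops += 1
    return ops

def count_ops_2d(xs, ys):
    ops = 0
    for _ in xs:           # every x/y combination -> O(m * n)
        for _ in ys:
            ops += 1
    return ops

print(count_ops_1d(range(100)))            # 100: grows linearly with n
print(count_ops_2d(range(20), range(30)))  # 600 = 20 * 30
```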
If you graph O(n) over different potential sizes of n, you end up with a straight line.
As you get into more complex algorithms you can end up with O(n^2), O(log n), or worse. Generally, though, most algorithms fall into O(n), O(n^k) for some exponent k, O(log n), or O(sqrt(n)); there are obviously others, but most fall into these classes, with some coefficient in front that shifts where they sit on the graph. If you graph each of those curves, you'll see very quickly which ones are better for extremely large data sets.
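A quick numerical sketch (my own illustration, not from the answer) makes the comparison between these classes concrete:

```python
import math

# How each common complexity class grows as n increases by factors of 10.
for n in (10, 1_000, 100_000):
    print(f"n={n:>7}  log n={math.log2(n):8.1f}  "
          f"n log n={n * math.log2(n):14.0f}  n^2={n * n:>14}")
```

Even at n = 100,000, log n is still under 17, while n^2 has already reached ten billion.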
It would depend entirely on how well your algorithm is coded, but it might look something like this (don't trust me on this math; I started working it out and then just googled it): fitting a decision tree of depth ‘m’ on n samples with d features is commonly quoted as roughly O(m · d · n log n) with a naive split search at every level.
And a log n graph? Well, it pretty much doesn't change at all, even for very large values of n, does it?
So it doesn't matter how big your data set is: these algorithms are very efficient at what they do, and their cost barely grows as the data does, because of the nature of a log curve. The worst increase in cost per additional sample comes at the very beginning; after that the curve levels off, with only extremely minor increases in time as n keeps growing.
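That flattening can be checked numerically; here is a short sketch (my own illustration) of the marginal cost of one extra item under O(log n):

```python
import math

# Under O(log n), the cost of handling one *additional* item,
# log2(n+1) - log2(n), shrinks rapidly as n grows: the curve is
# steep at first and then flattens out.
for n in (2, 10, 100, 10_000, 1_000_000):
    marginal = math.log2(n + 1) - math.log2(n)
    print(f"n={n:>9}  extra cost of item n+1 = {marginal:.8f}")
```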