sklearn RandomForestRegressor: fewer samples per tree than expected with default max_samples
I am using sklearn's RandomForestRegressor as a model. My input data contains 58 data points. I left the parameter ‘max_samples’ at its default and set ‘bootstrap’ to True, so according to the documentation (‘If None (default), then draw X.shape[0] samples.’) each tree should be trained on 58 data points drawn with replacement, which means the same data point can appear more than once. However, when I use sklearn.tree.plot_tree to draw a tree, the plot shows that only about 34 samples were used to train it. I am confused. Does this mean that training ignores the duplicated data points, or that it simply does not draw 58 data points? The first explanation makes more sense to me, but if it is true, what is the point of bootstrapping if the model does not use the duplicated data at all? Can someone explain? Thank you.
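To sanity-check the first explanation, here is a minimal sketch using only Python's standard library, not sklearn's own sampling code. It assumes that a bootstrap sample is simply 58 indices drawn uniformly with replacement, and counts how many distinct points such a sample contains. The function names `expected_unique` and `simulate_unique_counts` are made up for this illustration.

```python
import random


def expected_unique(n):
    """Expected number of distinct items when drawing n times from n items with replacement."""
    return n * (1 - (1 - 1 / n) ** n)


def simulate_unique_counts(n, n_draws=2000, seed=0):
    """Draw n_draws bootstrap samples of size n and count the distinct indices in each."""
    rng = random.Random(seed)
    return [len(set(rng.choices(range(n), k=n))) for _ in range(n_draws)]


n = 58  # number of data points in my dataset
counts = simulate_unique_counts(n)
mean_unique = sum(counts) / len(counts)

print(f"expected distinct points: {expected_unique(n):.1f}")
print(f"simulated mean: {mean_unique:.1f}, min: {min(counts)}, max: {max(counts)}")
```

For n = 58 the expected number of distinct points is roughly 37 (about 63% of the data), so a tree reporting around 34 samples would be consistent with a full 58-point bootstrap draw in which only the distinct points are counted.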
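import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=200, oob_score=True, n_jobs=-1, bootstrap=True)
rf.fit(X, y)  # X, y (placeholder names): my 58-point training data; the fit step is not shown here

# Plot a single tree from the fitted forest and save it to disk
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(4, 4), dpi=1000)
tree.plot_tree(rf.estimators_[100], filled=True)
fig.savefig('rfr_individualtree0.png')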
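A possible way to check this on the fitted forest itself is to compare two attributes of each tree's root node: `tree_.n_node_samples` and `tree_.weighted_n_node_samples`. The sketch below uses synthetic data as a stand-in for my real dataset (an assumption), and it relies on my understanding that the forest passes the bootstrap draw to each tree as per-sample weights, so that the unweighted count reflects only distinct points while the weighted count includes the repeats.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real data (assumption): 58 points, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(58, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=58)

rf = RandomForestRegressor(n_estimators=200, bootstrap=True, random_state=0)
rf.fit(X, y)

# Index 0 is the root node of each fitted tree.
unique_counts = [int(est.tree_.n_node_samples[0]) for est in rf.estimators_]
weighted_counts = [float(est.tree_.weighted_n_node_samples[0]) for est in rf.estimators_]

print("distinct points at root, first 5 trees:", unique_counts[:5])
print("weighted count at root, first 5 trees:", weighted_counts[:5])
print("mean distinct points per tree:", np.mean(unique_counts))
```

If the weighted count at the root is 58 for every tree while the unweighted count is lower, the duplicates are being used during training (as weights) and the plot is only reporting the number of distinct points.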