XGBoost in Python not giving consistent model performance
I have been having the same problem with multiple datasets. I trained XGBoost models in Python on different datasets and tuned hyperparameters with random search. The problem is that if I change the random seed, the model's out-of-sample behavior changes; sometimes recall shifts from 40% to 60% on the same dataset. One complication is that my datasets are always small (fewer than a few thousand records). I have tried cross-validation with different numbers of folds (5 to 20) and a wide range for the parameters I tune, but the problem persists.
I wonder what other options there are to get a more consistent model. My dataset is pretty balanced, so I don't think undersampling/oversampling would help. I also tried a grid search rather than random search, but then I had to use a smaller range for the parameters so that training wouldn't take too long.
Thanks for the help