Question: Are you sure that your test data is comparable to your training data?

Adversarial validation has been developed just for this purpose. It is a technique allowing you to easily estimate the degree of difference between your training and test data.

The idea is simple:

Take your training data, remove the target, concatenate it with your test data, and create a new binary classification target where the positive label is assigned to the test data. Then train a machine learning classifier on this new target and evaluate it with the ROC-AUC metric.
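The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the book's code: it assumes NumPy and scikit-learn are installed, that the target column has already been dropped from the training features, and that a random forest scored on out-of-fold predictions is an acceptable classifier. The function name adversarial_auc and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def adversarial_auc(X_train, X_test, seed=0):
    """Cross-validated ROC-AUC of a classifier that separates train from test.

    X_train is assumed to hold features only (original target already removed).
    """
    # Stack the two sets into one feature matrix.
    X = np.vstack([X_train, X_test])
    # New binary target: 0 = training row, 1 = test row.
    is_test = np.concatenate([
        np.zeros(len(X_train), dtype=int),
        np.ones(len(X_test), dtype=int),
    ])
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Out-of-fold probabilities, so the score reflects held-out rows only.
    proba = cross_val_predict(model, X, is_test, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(is_test, proba)


# Synthetic data (hypothetical): one test set drawn like the training set,
# and one with every feature shifted by 2.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
X_same = rng.normal(size=(500, 3))
X_shifted = rng.normal(size=(500, 3)) + 2.0

auc_same = adversarial_auc(X_train, X_same)
auc_shifted = adversarial_auc(X_train, X_shifted)
print(f"same distribution:    {auc_same:.3f}")
print(f"shifted distribution: {auc_shifted:.3f}")
```

Any classifier that outputs probabilities would do here; the random forest is just a convenient default that needs no feature scaling.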

If your ROC-AUC is around 0.5, the classifier does no better than random guessing: the training and test data are not easily distinguishable and apparently come from the same distribution.

ROC-AUC values higher than 0.5 and nearing 1.0 signal that it is easy for the algorithm to figure out what is from the training set and what is from the test set: in such a case, don't expect to be able to easily generalize to the test set because it clearly comes from a different distribution.
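One way to see why 0.5 and 1.0 are the two reference points: ROC-AUC equals the probability that a randomly chosen positive row (here, a test row) receives a higher score than a randomly chosen negative row (a training row), with ties counted as half. The small standard-library sketch below computes it that way. The function name roc_auc and the toy scores are made up for illustration, and the quadratic pairwise loop is only meant for tiny inputs.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # test rows
    neg = [s for y, s in zip(labels, scores) if y == 0]  # training rows
    # A pair counts 1 if the positive scores higher, 0.5 on a tie, 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 0, 1, 1, 1]

# Every test row scores above every training row: perfectly distinguishable.
separated = roc_auc(labels, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
# The classifier gives every row the same score: it cannot tell them apart.
uninformative = roc_auc(labels, [0.5, 0.5, 0.5, 0.5, 0.5, 0.5])

print(separated)      # 1.0
print(uninformative)  # 0.5
```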

AUC

ROC

Code

A fresh example can be found in The Kaggle Book, page 302 (38%).