To perform competitively, you need more sophisticated feature engineering. A good place to start is creating features from each row, considered separately:
- Compute the mean, median, sum, standard deviation, minimum, or maximum of the numeric values (or of a subset of them)
- Count the missing values
- Count how often common values occur in the row (for instance, count the positive values among the binary features)
- Assign each row to a cluster derived from a cluster analysis such as k-means
These meta-features (so called because each one summarizes a set of individual features) help your algorithm distinguish the different kinds of samples in your data by pointing out specific groups of samples.
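As a minimal sketch of the row-wise meta-features listed above, the hypothetical helper add_row_meta_features appends one column per statistic. It assumes a pandas DataFrame in which the values to summarize are either all numeric columns or a chosen subset passed as stat_cols, and in which binary features are coded 0/1. The column names and toy data are invented for illustration, and the k-means cluster assignment is left out to keep the example dependency-free.

```python
from typing import Optional, Sequence

import numpy as np
import pandas as pd


def add_row_meta_features(
    df: pd.DataFrame,
    stat_cols: Optional[Sequence[str]] = None,
    binary_cols: Optional[Sequence[str]] = None,
) -> pd.DataFrame:
    """Return a copy of df with row-wise meta-features appended (hypothetical helper)."""
    out = df.copy()
    # Summarize either the given subset or all numeric columns; NaNs are skipped.
    values = df[list(stat_cols)] if stat_cols else df.select_dtypes(include="number")
    out["row_mean"] = values.mean(axis=1)
    out["row_median"] = values.median(axis=1)
    out["row_sum"] = values.sum(axis=1)
    out["row_std"] = values.std(axis=1)  # sample standard deviation (ddof=1)
    out["row_min"] = values.min(axis=1)
    out["row_max"] = values.max(axis=1)
    # Number of missing values per row, over all original columns.
    out["row_n_missing"] = df.isna().sum(axis=1)
    # Count of positive (== 1) entries among the binary columns, if given.
    if binary_cols:
        out["row_n_positive"] = (df[list(binary_cols)] == 1).sum(axis=1)
    return out


# Toy data (invented for illustration): three continuous features and two 0/1 flags.
df = pd.DataFrame(
    {
        "a": [1.0, 4.0, np.nan],
        "b": [3.0, 2.0, 5.0],
        "c": [2.0, np.nan, 8.0],
        "flag1": [1, 0, 1],
        "flag2": [1, 1, 0],
    }
)
feats = add_row_meta_features(df, stat_cols=["a", "b", "c"], binary_cols=["flag1", "flag2"])
print(feats[["row_mean", "row_std", "row_n_missing", "row_n_positive"]])
```

Passing stat_cols restricts the statistics to a subset of related features (here the continuous ones), so that the 0/1 flags do not distort the mean and standard deviation.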
Meta-features can also be built from columns. Here, aggregation and summarization operations on single features aim to provide further information about the values of numeric and categorical features: is this value common or rare? A model cannot grasp this by itself, because it cannot count how many times each category occurs in a feature.
As meta-features, you can use any kind of column statistic (such as mode, mean, median, sum, standard deviation, min, max, and also skewness and kurtosis for numerical features). For column-wise meta-features, you can proceed in a few different ways:
- Frequency encoding: Simply count the frequency of the values in a categorical feature and then create a new feature where you replace those values with their frequency. You can also apply frequency encoding to numeric features when there are frequently recurring values.
- Frequencies and column statistics computed with respect to a relevant group: In this case, you can create new features from the values of both numeric and categorical features because you are considering distinct groups in the data. A group could be a cluster you compute by cluster analysis, or a group you define using a feature (for instance, age may produce age groups, locality may provide areas, and so on). The meta-features describing each group are then assigned to each sample according to its group. For instance, using the pandas groupby method, you can compute your meta-features and then merge them with the original data on the grouping variable. The trickiest part of this technique is finding meaningful groups in the data to compute the features on.
- Further column frequencies and statistics can be derived by combining several grouping variables (for instance, area and age group together).
- ...and many more: be creative!
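The column-wise techniques above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the book's code: frequency_encode and add_group_stats are hypothetical helper names, the toy columns (city, rooms, age, income) and the age bins are invented, and the groups come from binning a feature. A cluster label from k-means could serve as the grouping column in the same way.

```python
from typing import Sequence

import pandas as pd


def frequency_encode(df: pd.DataFrame, col: str) -> pd.Series:
    """Replace each value of `col` with how many times it occurs in that column."""
    counts = df[col].value_counts()  # NaN is not counted, so missing values map to NaN
    return df[col].map(counts)


def add_group_stats(
    df: pd.DataFrame,
    group_cols: Sequence[str],
    value_col: str,
    stats: Sequence[str] = ("mean", "std", "min", "max"),
) -> pd.DataFrame:
    """Compute `stats` of `value_col` per group and merge them back onto each row."""
    group_cols = list(group_cols)
    prefix = "_".join(group_cols + [value_col])
    agg = df.groupby(group_cols)[value_col].agg(list(stats))
    agg.columns = [f"{prefix}_{s}" for s in stats]
    # A left merge keeps every original row, in its original order.
    return df.merge(agg.reset_index(), on=group_cols, how="left")


# Toy data (invented for illustration).
df = pd.DataFrame(
    {
        "city": ["Rome", "Rome", "Milan", "Rome", "Turin", "Milan"],
        "rooms": [2, 3, 2, 2, 4, 3],
        "age": [23, 35, 41, 52, 29, 38],
        "income": [30, 40, 50, 60, 35, 45],
    }
)
# Frequency encoding of a categorical feature and of a numeric one with recurring values.
df["city_freq"] = frequency_encode(df, "city")
df["rooms_freq"] = frequency_encode(df, "rooms")
# A group defined from a feature: bin age into age groups.
df["age_group"] = pd.cut(
    df["age"], bins=[0, 30, 45, 120], labels=["young", "middle", "senior"]
).astype(str)
# Statistics per single group, then per combination of groups.
out = add_group_stats(df, ["age_group"], "income", stats=("mean", "std"))
out = add_group_stats(out, ["city", "age_group"], "income", stats=("mean",))
print(out)
```

A group containing a single row (here the only "senior") gets NaN for its standard deviation, which you may want to fill or drop before training.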
See the example in The Kaggle Book (including the explanation on p. 354 of 826) and here.