## Feature Selection and Dimensionality Reduction

### Does this feature spark joy? Using Scikit-Learn and Pandas

It is common to have hundreds to thousands of features available for a given machine learning problem. As data scientists and machine learning practitioners, the first step we have to take after defining the problem is to select the optimal subset of features. Or, in another sense, we will...
[Read More]

## Evaluation of Clustering Algorithms for Information Retrieval

### Using F-measure to evaluate clustering over pairs of points

A common question in clustering is: once we cluster documents (e.g., articles, images, etc.) together, how do we determine how good the clustering results are, given ground-truth clusters?
[Read More]

## Detecting Election Irregularities

### Using Benford's Law for irregularity detection in natural numbers

As the 2020 General Election draws to a conclusion, the losing side is, as usual, raising questions about potential election fraud. So we believe it would be interesting to see if we can use Benford's law to detect irregularities.
[Read More]

## Reformer Presentation at Weights and Biases Deep Learning Salon

### Weights and Biases is awesome

Recently I had the opportunity to give a talk at Weights and Biases' Deep Learning Salon. I find Reformer to be a very interesting paper that combines a lot of computer science techniques with deep neural networks. The talk has been recorded and published on YouTube. Please enjoy the...
[Read More]