Feature Selection and Dimensionality Reduction
Does this feature spark joy using Scikit Learn and Pandas
It is common to have hundreds to thousands of features available for a given machine learning problem. As data scientists and machine learning practitioners, the first step we have to take after defining the problem is to select the optimal subset of features. Or, in another sense, we will...
[Read More]
Evaluation of Clustering Algorithms for Information Retrieval
Using F-measure to evaluate clustering over pairs of points
A common question in clustering is: once we cluster documents (e.g., articles, images, etc.) together, how do we determine how good the clustering results are, given ground-truth clusters?
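As a rough illustration of the pairwise idea, here is a minimal sketch, assuming each item has one predicted cluster label and one ground-truth label. The function name `pairwise_f_measure` and its `beta` parameter are hypothetical choices for this example and may differ from what the full post uses. Every pair of points is treated as a binary decision: "same cluster" or "different cluster".

```python
from itertools import combinations


def pairwise_f_measure(predicted, truth, beta=1.0):
    """Pairwise F-measure between predicted and ground-truth cluster labels.

    A pair is a true positive when both labelings put the two points in the
    same cluster, a false positive when only the prediction does, and a
    false negative when only the ground truth does.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Because only pair membership matters, the score does not depend on how the cluster labels are named, so a prediction that uses different label IDs for the same grouping still scores as a perfect match.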
[Read More]
Detecting Election Irregularities
Using Benford's Law for irregularity detection in natural numbers
As the 2020 General Election draws to a conclusion, the losing side is, as usual, raising questions about potential election fraud. So we believe it would be interesting to see if we can use Benford’s law to detect irregularities.
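For readers unfamiliar with Benford's law, it predicts that the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d). The sketch below is only an illustration under assumptions: the helper names are hypothetical, and the chi-square statistic is one common way to measure deviation, not necessarily the one used in the full post.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading decimal digit of a non-zero integer."""
    n = abs(int(n))
    while n >= 10:
        n //= 10
    return n


def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford's law.

    Zero values are skipped because they have no leading digit in 1..9.
    Larger results mean a larger deviation from the expected distribution.
    """
    digits = [first_digit(v) for v in values if int(v) != 0]
    total = len(digits)
    if total == 0:
        raise ValueError("no non-zero values to analyze")

    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in benford_expected().items()
    )
```

A high statistic only signals that the leading digits deviate from the Benford distribution. Data spanning a narrow range of magnitudes, for example, can deviate for entirely ordinary reasons.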
[Read More]
Reformer Presentation at Weights and Biases Deep Learning Salon
Weights and Biases is awesome
Recently I had the opportunity to give a talk at Weights and Biases’ Deep Learning Salon. I find Reformer to be a very interesting paper that combines a lot of computer science techniques with deep neural networks. The talk has been recorded and published on YouTube. Please enjoy the...
[Read More]