Recently, I gained some insight into structuring Machine Learning projects. How I wish I had this insight when we ran some ML experiments in the not-so-distant past. Anyway, I wouldn't want anybody else to trip over the same stones, so below is the crux of what I think I have understood.
Feel free to share your inputs if you are not totally on board or have a different insight.
[8 mins read]
A typical ML project pathway follows this to-do list:
- Fit Training set well.
- Fit Dev Set well.
- Fit Test Set well.
- Perform well in real world.
Here is a list of guidelines that could help a team work efficiently. I know these are very subtle inputs, but when working iteratively, they can have a significant impact.
For Overall Model:
- Define a single evaluation metric to know which model is better. E.g. instead of tracking Precision and Recall as two separate values, use the F1-score as a single metric.
- Identify the right split between your train/dev/test sets. A 60-20-20 split used to be standard practice, but if you have a large dataset (on the order of ~100K examples), you can do well with a 98-1-1 split (yes, you read that right). The objective of a dev set is to have a representative sample that gives you confidence in your model's performance. If 10K examples can do that, why go for overkill?
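To make these two points concrete, here is a minimal sketch in plain Python. The precision/recall numbers and the dataset size are made up for illustration, and the split ratios are just one reasonable choice:

```python
from random import Random

# Single evaluation metric: F1 folds precision and recall into one number,
# so two candidate models can be compared directly.
def f1_score(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 98-1-1 split for a large dataset: 1% of 100K is still 1,000 dev examples,
# which may be representative enough to trust the dev metric.
def split_dataset(examples, dev_frac=0.01, test_frac=0.01, seed=42):
    shuffled = list(examples)
    Random(seed).shuffle(shuffled)  # shuffle before slicing to avoid ordering bias
    n = len(shuffled)
    n_dev, n_test = int(n * dev_frac), int(n * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test

train, dev, test = split_dataset(range(100_000))
print(len(train), len(dev), len(test))  # 98000 1000 1000
```

In practice you would use a library utility (e.g. scikit-learn's `train_test_split`) rather than hand-rolling this, but the ratios are the point.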
For Dev/Test set:
- Draw the dev set and test set from the same distribution. Spending a few hours getting this right helps a lot, because you can then rely on your dev set metric. Failing to do so could lead you to spend effort tuning a product the client did not ask for.
- Make sure the data in these sets reflects the data you expect to get in the future and consider important to do well on. E.g. if you care most about HD images, it is NOT a good idea to optimize your model on random-quality images downloaded from the web. These are the cracks that can collapse your foundation and cause huge rework later.
Orthogonality in Execution: This is by far one of the most important thought processes. It seems obvious, yet only after learning how to execute it did I realize we had missed it.
Often when working on an ML project, we think about too many things at once, and everybody ends up working on all the problems. Yes, it is as bad as it sounds.
For example, identifying a single evaluation metric and improving it are totally different problems. Similarly, getting a bigger dataset and extracting appropriate features from it are totally different problems. Worry about them separately.
A good thing to do when doing error analysis would be to:
- Get the misclassified dev set examples (where your model got the classification wrong).
- Get the distribution of failures (i.e., categorize them by why each one could have been misclassified, and count the categories).
- Tag each category with the complexity of fixing it against the expected gain in model performance.
Now, you have a better sense of what to pursue.
As a bonus, you could split teams to fix those issues orthogonally.
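The steps above can be sketched in a few lines. The failure categories and effort scores below are entirely hypothetical; the point is the bookkeeping:

```python
from collections import Counter

# Step 1: misclassified dev set examples, each tagged with a suspected
# failure category (categories here are made up for illustration).
misclassified = [
    {"id": 1, "category": "blurry image"},
    {"id": 2, "category": "mislabeled ground truth"},
    {"id": 3, "category": "blurry image"},
    {"id": 4, "category": "rare breed"},
    {"id": 5, "category": "blurry image"},
]

# Step 2: distribution of failures.
distribution = Counter(e["category"] for e in misclassified)

# Step 3: weigh potential gain (failure count) against effort to fix
# (effort scores are invented placeholders).
effort = {"blurry image": 3, "mislabeled ground truth": 2, "rare breed": 5}
ranked = sorted(distribution, key=lambda c: distribution[c] / effort[c], reverse=True)
print(ranked[0])  # the category with the best gain-per-effort ratio
```

Once `ranked` exists, each category at the top of the list is a candidate for a separate, orthogonal workstream.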
Another important thing to be aware of at all times: if you alter the dev set, keep the dev set and test set distributions the same. For example, if you add more HD images to your dev set, update your test set to reflect the same distribution.
Human level performance:
If you are struggling to achieve human-level accuracy, you can draw on the intuitions of the folks around you. It might be obvious, but I still mention these, as one might miss them with too many moving parts:
- Why did the human get it right?
- Better analysis of Bias/Variance
- Get Labelled data from humans.
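On the bias/variance point: human-level error can serve as a rough proxy for the best achievable error, and comparing it to train and dev error tells you where to focus. A minimal sketch, with illustrative error numbers rather than results from any real model:

```python
# Rough bias/variance diagnosis using human-level error as a proxy for the
# best achievable (Bayes) error. All numbers below are illustrative.
def diagnose(human_error, train_error, dev_error):
    avoidable_bias = train_error - human_error  # gap to human-level on training data
    variance = dev_error - train_error          # gap between train and dev
    return "focus on bias" if avoidable_bias > variance else "focus on variance"

# Train error far above human-level: the model underfits, work on bias.
print(diagnose(human_error=0.01, train_error=0.08, dev_error=0.10))  # focus on bias

# Train error near human-level but dev error high: work on variance.
print(diagnose(human_error=0.01, train_error=0.02, dev_error=0.10))  # focus on variance
```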
It’s a good idea to revisit your metrics after a certain period of time and revalidate them. E.g. if doing well on the dev set + single evaluation metric does not translate into doing well on your product, REVISIT and CHANGE the metric, as it no longer serves its purpose.
That’s it, folks! I am not sure if I did justice to whatever intuition I gained, and I won’t know unless you share feedback.
Together we learn, share and discover.