Over the past year, I have seen quite a number of folks get started with Data Science. And there are plenty of articles mapping the surface area of the entire domain. Many start, but few continue. Here, I try to list some traps that could stall an aspiring Data Scientist’s progress. As always, feel free to share any feedback you have.
A typical ML journey looks something like this:
- Get fascinated by all the hype and aim to become a data scientist.
- Get started with Andrew Ng’s ML course.
- Don’t understand what’s going on for 3 weeks, and wonder when the ‘actual’ deep learning will start.
- Switch to the fast.ai course.
- Feel like you’re learning, but deep down still not understand how any of it works.
- Feel hollow and abandon all hope.
My journey was a little different, but let me share what I’ve learned over the past year. It’s the same bullet-point structure, just a little more verbose.
- Don’t be in a hurry to finish things up. I learned this lesson the hard way, and it cost me ~3 months of effort: I completed courses and solved the assignments, yet soon forgot most of it. Before moving to the next part, pause for a moment and ask yourself, ‘Did I understand what was done?’ If the answer is ‘YES’, pause again and ask, ‘Can I explain WHY?’ If you hiccup there, don’t hesitate to go back and re-validate your understanding.
- Don’t just know the ‘HOW’ and ‘WHAT’ of it; focus more on the ‘WHY’. Do learn the steps of logistic regression, but also understand the rationale behind why each step is needed. Trust me, it will come back to haunt you in the not-so-distant future if you haven’t understood it. If you need to go back and look up variance and standard deviation, go and look them up.
- Don’t panic if your progress is slow. ‘Getting there right’ is far more important than ‘getting there fast’.
- Do give Kaggle competitions a shot. No amount of coursework comes close to the exposure a Kaggle competition gives you.
- Know that you will have to understand your data to solve an ML problem. It’s not optional. You cannot just try models and hope that a Neural Network will magically work (well, it will, to some extent). But for it to be of any practical use, you will have to understand the nitty-gritty details.
- You will have to care about numerical variables, categorical variables, measurement scales, and so on. Feature engineering is an integral part of an ML solution.
- Know that ML is Applied Statistics. And that is what Mr. Andrew Ng was trying to teach you all along through Octave.
- Understand visualizations. By that I mean: understand when you would plot a certain kind of graph, rather than just getting the snsplot or ggplot call right (that too is important, but understand the intent first). For example, if you need to check for an association between two variables, a scatter plot makes sense, whereas if you only need to see one variable’s distribution, a box plot should suffice. The ‘HOW’ can always follow once you have the ‘WHY’.
- Lastly, ‘gut’ feeling is definitely important, but visualizations will help you develop and direct your intuition. Don’t shy away from them. Knowing your data through plots beats skimming through a large CSV.
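On the point about going back to look up variance and standard deviation: here is a minimal sketch using Python’s standard `statistics` module. The data is a toy example I made up, not anything from a real dataset.

```python
# Variance and standard deviation with Python's stdlib.
# The data below is purely illustrative.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)       # average of the values
var = statistics.pvariance(data)   # population variance: mean squared deviation from the mean
std = statistics.pstdev(data)      # population standard deviation: square root of the variance

print(mean, var, std)  # → 5 4 2.0
```

Being able to compute (and explain) these by hand is exactly the kind of ‘WHY’ understanding that pays off later, e.g. when feature scaling or reading a model’s error metrics.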
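As a taste of the feature-engineering work mentioned above, here is a hand-rolled one-hot encoding of a categorical variable. In a real project you would reach for `pandas.get_dummies` or scikit-learn’s `OneHotEncoder`; the column and category values here are invented for illustration.

```python
# One-hot encode a categorical feature by hand.
# "colors" is a made-up raw feature column.
colors = ["red", "green", "red", "blue"]
categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Each value becomes a 0/1 vector with a single 1 at its category's index.
encoded = [[1 if c == cat else 0 for cat in categories] for c in colors]

print(encoded)
# "red" maps to [0, 0, 1] because 'red' is the third category in sorted order
```

Doing it once by hand makes it obvious why a model can’t consume the string `"red"` directly, and what the library helpers are actually producing for you.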
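The scatter-plot-versus-box-plot advice can be sketched in a few lines of matplotlib. The variables and toy numbers below are my own illustration; the point is that the *intent* (association vs. distribution) picks the plot, and the API call follows.

```python
# Choosing a plot by intent, sketched with matplotlib.
# Toy data, purely illustrative.
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs headless
import matplotlib.pyplot as plt

heights = [150, 160, 165, 170, 175, 180]
weights = [50, 58, 63, 68, 74, 80]

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Intent: check association between two numeric variables -> scatter plot.
ax_scatter.scatter(heights, weights)
ax_scatter.set(title="Association: scatter", xlabel="height", ylabel="weight")

# Intent: summarise a single variable's distribution -> box plot.
ax_box.boxplot(weights)
ax_box.set(title="Distribution: box plot")

fig.savefig("plots.png")
```

Once the intent is clear, translating the same idea to seaborn or ggplot is mostly a syntax exercise.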
That’s it, folks. Happy Machine Learning!