How I Became a Self-Taught Data Scientist

I became a Data Scientist by quitting my job and competing in Kaggle competitions.

This is my Kaggle profile. And yes, that’s a picture of me holding a catfish!

[Image: Kaggle profile]

After I graduated from college in early 2014, I knew that I wanted to become a Data Scientist, but I didn’t have the skillset yet. I thought I had a pretty good idea of what I needed to learn to get my foot in the door, but I was sick of being in school, didn’t have much money, and didn’t want to borrow $60,000 to get a Master’s degree.

I had a job lined up at Amazon in Seattle after I graduated, since I had interned on the Prime team as a financial analyst during my junior year. I decided to go back because I knew that I’d learn valuable skills like Excel and SQL, and that I’d be constantly working with data at a rapidly growing tech company. I would save up some money while figuring out what my next step was going to be.

I had discovered Kaggle in late 2013, and was obsessed with the potential of machine learning and data science to transform business and the world, but every time I tried to work with a dataset other than Titanic I felt lost. It didn’t help that whenever I attempted to train a machine learning algorithm on a Kaggle dataset it would take hours to finish (my laptop at the time only had 4 GB of RAM; it was 2014 and I didn’t have much money). I also routinely worked 60 hours a week, so doing Kaggle competitions or taking online courses in the evenings seemed unmanageable. Every day, it felt nearly impossible to study data science in my spare time, let alone take care of my other, albeit few, responsibilities.

Luckily for me, I had learned a bit of R while taking courses for my economics major in college. I made sure my managers at Amazon knew this, and told them to “take advantage of this skillset whenever possible”. I am glad I did: in my first year I made some histograms for an ad-hoc analysis of the Kindle Unlimited program that ended up in a report that went directly to Jeff Bezos. More importantly, I was tasked with building regression models to estimate the efficiency of Kindle marketing spend, which I did in R (doing this in Excel would have been miserable). This latter project was an invaluable learning experience.

Still, I knew that I truly, madly, deeply wanted to be a Data Scientist, and so I quit my job after working about a year and a half at Amazon.

So, what did I do?

1. I made a personal data science self-study program. The Open Source Data Science Masters was a big inspiration for me.

2. I completed the Coursera Data Science Specialization (R), and the Udacity Machine Learning Engineer Nanodegree (Python, scikit-learn, pandas, numpy, etc).

3. After #1 and #2, I felt like I could finally do some damage. This is when I transitioned to competing on Kaggle full-time. I competed in these competitions before getting a full-time job offer.

After finishing the Bosch Production Line Performance competition, it felt like it was time to start applying to Data Scientist jobs. I had followed my fiancé to North Carolina, and was depleting my Amazon savings — I was also itching to apply everything I’d learned to generate real business value.

I applied to and interviewed with several employers, and I know for a fact that my Kaggle performance was instrumental in proving that I could do the work of a Data Scientist. After a year of self-study, it felt like I got the role I wanted, at the company I wanted. So far, working as a Data Scientist has been a thrilling, breathtaking experience. It was totally worth it!

Anyone who has the desire to become a Data Scientist can become one. Don’t listen to the haters on LinkedIn who say you need a PhD in Mathematics to become a “real Data Scientist”; that’s b******t.

I don’t think there is a perfect way to become a Data Scientist, and not everyone can do what I did, but here are some recommendations that worked for me:

  • Just get started. Find a dataset you are curious about (Kaggle is a great place to start) and explore it with whatever tool you are comfortable with at first. It doesn’t matter if it’s Excel, SQL, R, Python, or something else. Don’t worry about “not knowing enough”. Your natural passion and curiosity will drive you to organically learn whatever tools and techniques you need (probably through Stack Overflow questions and Kaggle forums). This will increase your energy and make you excited to learn more.

  • Do Kaggle competitions. If you are anything like me, your natural competitive juices will push you to learn more. Don’t worry about doing poorly. There are plenty of competitions where I just downloaded the dataset and made a submission or two, scoring in the bottom half of competitors. This is normal, even for the Grandmasters who place in the top 10 out of thousands of teams.

  • Don’t spend a lot of money. There are so many online resources it isn’t even funny. Udacity and Coursera are both inexpensive options to learn the fundamentals of data science and machine learning.

  • Follow Data Scientists on Twitter. Jeremy Howard is my personal favorite. He is one of the top Data Scientists in the world, and has a very inclusive mindset, which I find honorable.

Thomas Hepner