Thursday, September 19, 2019

What is a clear study/work plan to learn data science?

Start by learning statistics. It’s the first step to becoming a data scientist. Let me first give you an overview of a data scientist’s responsibilities, so you have a clear idea of what you should learn and why.
A data scientist has the following responsibilities:
  1. Gather and clean data: A data scientist collects data from different sources, and this data can come in multiple forms, such as Excel sheets, Word files and more. We spend more than 90% of our time collecting this data. You can think of this as data preparation for further work.
  2. Data analysis: This requires you to know statistics. You use different statistical techniques to analyse data, so you need to be familiar with both descriptive and inferential statistics. You also need to know the relevant R and Python libraries, because you implement the statistical techniques in these programming languages.
  3. Data visualisation: You present the analysed data in a format that makes sense to other people. For this, you need to know tools like Tableau. This, again, requires you to know R and Python.
  4. Predictive analytics: This is the final step. You need to know machine learning algorithms for building predictive models.
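
As a rough illustration of the first responsibility, here is a minimal Python sketch of cleaning a small data set using only the standard library. The data, the column names and the `clean_rows` helper are all made up for this example; real projects usually involve messier sources and dedicated libraries.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, a missing value, a duplicate row.
RAW_CSV = """name,age,city
 Asha ,29,Pune
Ravi,,Delhi
 Asha ,29,Pune
Meera,41, Mumbai
"""

def clean_rows(raw_text):
    """Strip whitespace, drop rows with missing fields, remove exact duplicates."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        row = {key: value.strip() for key, value in row.items()}
        if any(value == "" for value in row.values()):
            continue  # skip incomplete records
        fingerprint = tuple(row.values())
        if fingerprint in seen:
            continue  # skip rows we have already kept
        seen.add(fingerprint)
        row["age"] = int(row["age"])  # convert the numeric column
        cleaned.append(row)
    return cleaned

rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Dropping incomplete rows is only one possible policy. Depending on the project, you might fill in missing values instead.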

The following picture describes the work of a data scientist in detail, from data preparation to building predictive models. It might look a little complicated, but if you pay close attention you will be able to figure it out.
In order to perform all the above tasks, you need the following skills:
  1. Statistics — descriptive and inferential
  2. R and Python
  3. Machine learning
To begin learning, you should start with statistics. It’s not very simple, and the amount of material might feel overwhelming. However, be patient and keep learning. The following are some of the statistical topics you should focus on:
  • Descriptive statistics
    1. Types of data variables
    2. Central tendency measures
    3. Spread of data, skew of data
    4. Measures of dispersion
  • Inferential statistics
    1. Population and sample (sampling methods are optional, but read about them: simple random sampling and stratified random sampling)
    2. Random variables, probability distributions: normal, Poisson
    3. Estimation and hypothesis testing
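
To make the descriptive topics above concrete, here is a small sketch using Python’s built-in `statistics` module. The `orders` data is invented for illustration, and the `skewness` helper is my own simple moment-based definition, not a library function.

```python
import statistics

# Hypothetical sample: daily order counts, with one unusually busy day.
orders = [12, 15, 11, 14, 13, 15, 40]

# Central tendency
mean = statistics.mean(orders)
median = statistics.median(orders)
mode = statistics.mode(orders)

# Dispersion
data_range = max(orders) - min(orders)
stdev = statistics.stdev(orders)  # sample standard deviation

def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mu = statistics.mean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

skew = skewness(orders)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} stdev={stdev:.2f} skewness={skew:.2f}")
```

The single large value pulls the mean above the median and gives a positive skewness, which is a quick way to see why you should look at more than one measure.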
Once you have a good command of these, you can proceed to learn how to apply them to data sets. For this you will need data. You can use public data sets, or platforms like KDnuggets and Kaggle, to work on projects.
Next, you move on to learning R and Python programming. In fact, it’s best to learn how to implement statistical techniques in these languages at the same time as you learn the statistics. If you have programming experience, this is the best way to go. If you don’t, move to R and Python programming gradually.
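
As one example of implementing a statistical technique in Python, the sketch below draws a simple random sample from a simulated population and runs a two-sided one-sample z-test. Everything here is an assumption made for illustration: the population is generated, the population standard deviation is treated as known, and `z_test` is a hypothetical helper name.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 delivery times in minutes, roughly normal.
population = [random.gauss(32, 5) for _ in range(10_000)]

# Simple random sample of 100 observations.
sample = random.sample(population, 100)

def z_test(sample_values, null_mean, sigma):
    """Two-sided one-sample z-test, assuming the population sigma is known."""
    n = len(sample_values)
    standard_error = sigma / math.sqrt(n)
    z = (mean(sample_values) - null_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# H0: the average delivery time is 30 minutes.
z, p = z_test(sample, null_mean=30, sigma=5)
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("Reject H0 at the 5% level" if p < 0.05 else "Fail to reject H0")
```

With real data you rarely know the population standard deviation, so a t-test is usually the more appropriate choice. The z-test is used here only because it needs nothing beyond the standard library.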
Learning this much should get you on the path to becoming a data scientist. In fact, if you know this much, you would be able to get entry-level roles in data science.
Next, to build your skills further, you can start learning machine learning and try to build predictive models. The following are some of the frequently used machine learning algorithms:
  1. Linear Regression
  2. Logistic Regression
  3. Decision Tree
  4. SVM
  5. Naive Bayes
  6. KNN
  7. K-Means
  8. Random Forest
  9. Dimensionality Reduction Algorithms
  10. Gradient Boosting algorithms
    -GBM
    -XGBoost
    -LightGBM
    -CatBoost
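
To give a feel for the first algorithm in the list, here is a minimal from-scratch sketch of simple linear regression (one feature, ordinary least squares). The data and the `fit_line` and `predict` names are invented for this example. In practice you would normally use a library such as scikit-learn rather than writing this by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x

# Hypothetical data: years of experience vs. salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [35, 42, 50, 55, 63]

slope, intercept = fit_line(experience, salary)
print(f"salary = {intercept:.1f} + {slope:.1f} * experience")
print(f"predicted salary at 6 years: {predict(6, slope, intercept):.1f}")
```

Working through a small case like this by hand is a good way to check that you understand what a library call is doing for you.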
Since you’re just beginning to learn data science, I want to point out that simply learning all these skills is not enough. You should be working on projects. In fact, when it comes to data science, applying these skills is far more important. No matter which stage you are at, you will find good projects to do on Kaggle.
However, since you’re a complete beginner, I would recommend that you use edWisor. There you can learn data science end to end while working on projects. In fact, a lot of analytics companies in India hire freshers for data science roles from there, based on the projects people complete. Additionally, you can try DataCamp, datascience.com, etc. However, these platforms are limited in terms of the projects one can do.
Overall, to become a data scientist, follow this approach—
  1. Learn statistics
  2. Learn R and Python programming
  3. Learn visualisation and finally predictive modelling
PS: I have known a lot of data scientists, and almost every one of us, myself included, has had a random learning path. This is precisely because earlier there was no particular path one could follow to learn data science. However, the present generation has an opportunity here, as there are a lot of resources available on the internet that can be easily accessed to learn data science and become a data scientist sooner. So follow the path I mentioned above and you will become a data scientist sooner.