First task as a data scientist?
Do you feel nervous?
Well, let me help you out here by breaking down the lifecycle of a data science project a little.
A non-technical client, someone from management, or another stakeholder approaches you with a problem.
The problem could be anything, ranging from ‘trying to reduce the number of paying customers the business is losing’ to ‘making sure a bus doesn’t shut down mid-ride’.
Most of the time, you are approached with a pretty ambitious task description. So the first thing you need to do as a data scientist is translate the task into a concrete problem statement.
Getting from defining the problem to presenting a solution back to all of your stakeholders involves several steps, which we will discuss below!
These are the key steps:
- Frame the problem: It is extremely important to understand why you are working on this particular problem, who the end user is, and what they require. As these requests are often ambiguous, it is important to convert them into a concrete, well-defined problem statement.
- Collect the raw data needed to solve the problem: At this point you have a good understanding of what you are trying to achieve and what type of data you need to start with. So, has the data been made available to you, or do you have to collect it? If the data is available, is it a huge pile of unusable data? What can be used out of the provided information, and what more needs to be collected? How much time and money would you need to get the data? Does the project have any other particular requirements that may add to the budget or infrastructure? These are the questions you ask yourself in this step.
- Process the data (data wrangling): I am not sure anyone who works in data science is lucky enough to get beautiful, clean raw data. The data we receive is often full of anomalies, errors, missing values, and many other challenges. Hence, you first have to clean the data and convert it to a form that you can analyze further.
- Explore the data: So yay, now you have cleaned and organised the data, and you are ready to go. What now? Well, this is an extremely important step where you try to understand, at a high level, the information your data contains. Do you notice any patterns, trends or correlations in your data? Is there something in the data that speaks to you? Something that stands out?
- Perform in-depth analysis (machine learning, statistical models, algorithms): This is where you build on your high-level understanding of the information the data contains. Once you have that, you apply all the cutting-edge machinery of data analysis to unearth high-value insights and predictions.
- Operationalise: It is important to deliver the final results along with any technical documents. Make sure your code is neat and well commented so that others can clearly understand what is going on in it. Implement a pilot project in a real-world environment and look for performance constraints, if any.
- Communicate the results of the analysis: This final step involves identifying all the key findings and communicating them to management and stakeholders. Here you explain the model to non-technical colleagues. Another key goal of this step is to work with domain experts to determine whether the results are a success or a failure based on the pre-defined criteria.
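To make the wrangling and exploration steps above a little more concrete, here is a minimal Python sketch that uses only the standard library. Everything in it is made up for illustration: the customer records, the field names (`monthly_spend`, `support_tickets`) and the helper functions `to_float`, `clean_records` and `pearson`. It also assumes that dropping unusable rows is acceptable, which will not always be true in a real project.

```python
from statistics import mean, median

# Hypothetical raw records (illustrative only): note the empty value,
# the text in a numeric field and the impossible negative spend.
raw_records = [
    {"customer": "a", "monthly_spend": "120.5", "support_tickets": "1"},
    {"customer": "b", "monthly_spend": "", "support_tickets": "4"},
    {"customer": "c", "monthly_spend": "80", "support_tickets": "3"},
    {"customer": "d", "monthly_spend": "n/a", "support_tickets": "2"},
    {"customer": "e", "monthly_spend": "-15", "support_tickets": "0"},
    {"customer": "f", "monthly_spend": "40", "support_tickets": "6"},
]


def to_float(value):
    """Return value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_records(records):
    """Wrangling: parse the numeric fields and drop unusable rows."""
    cleaned = []
    for row in records:
        spend = to_float(row["monthly_spend"])
        tickets = to_float(row["support_tickets"])
        # Assumption: rows with missing or negative values are dropped.
        if spend is None or tickets is None or spend < 0:
            continue
        cleaned.append(
            {
                "customer": row["customer"],
                "monthly_spend": spend,
                "support_tickets": tickets,
            }
        )
    return cleaned


def pearson(xs, ys):
    """Exploration: Pearson correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


cleaned = clean_records(raw_records)
spend = [r["monthly_spend"] for r in cleaned]
tickets = [r["support_tickets"] for r in cleaned]

print(f"kept {len(cleaned)} of {len(raw_records)} rows")
print(f"median spend: {median(spend)}")
print(f"spend vs tickets correlation: {pearson(spend, tickets):.2f}")
```

In practice you would probably reach for a library such as pandas, and you might fill in missing values instead of dropping rows. The point is only that cleaning comes first, and the first questions you ask of the cleaned data are simple ones about summaries and correlations.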