What Data Science Means to a Digital Marketer
- Posted: 22nd April 2017
- Written by: Rowan Meade @StrtgPrjctMgtRM
Data Science has been a growing area of interest over the last five years, and we are quickly approaching a new plateau where its methodology, tools and resources will be accessible to more people than ever. In this article, I’m going to outline a standard process for applying Data Science and give some examples of where it applies to Digital Marketers. Although many tools are coming onto the market to help define and execute this process, it’s worth noting that a healthy enthusiasm for statistics and mathematics (and, if possible, SQL and Python) is a big advantage for those starting out.
Before I begin, I should acknowledge that there are still many misconceptions about what Data Science is and, more importantly, how it can be used in the real world. Let’s start with a basic, high-level overview of the process that a Digital Marketer can use to start wrapping their mind around it:
- Asking an interesting question – This is where it all starts: define the burning question that you and your company want answered. Just like any other scientific endeavor, you need to create your “hypothesis” and:
- Define the scientific goal
- Define what you will do when you have all the data
- Define what you want to predict or estimate
- Ask the question ‘why?’
- After you ask why, identify any variables that might limit your findings.
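To make this step concrete, here is a small Python sketch that records the hypothesis in one place. It assumes a hypothetical email campaign question (does a new subject line lift the open rate from 10% to 12%?), and the `Hypothesis` class and its fields are illustrative names, not part of any library. As one example of deciding up front what you will do with the data, it also estimates how many recipients each group would need, using the standard normal-approximation formula for a two-sided test comparing two proportions (Python 3.8+ for `statistics.NormalDist`).

```python
from dataclasses import dataclass, field
from math import ceil
from statistics import NormalDist


@dataclass
class Hypothesis:
    """Illustrative record of the questions listed above (not a library class)."""
    question: str                # the burning question
    goal: str                    # the scientific goal
    target_metric: str           # what you want to predict or estimate
    baseline_rate: float         # current value of the metric (assumed known)
    expected_rate: float         # value you expect if the hypothesis is true
    limiting_variables: list = field(default_factory=list)  # things that might limit findings

    def sample_size_per_group(self, alpha: float = 0.05, power: float = 0.8) -> int:
        """Approximate recipients needed per group for a two-sided
        two-proportion z-test (normal approximation)."""
        p1, p2 = self.baseline_rate, self.expected_rate
        if p1 == p2:
            raise ValueError("expected_rate must differ from baseline_rate")
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
        z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.8
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


subject_line_test = Hypothesis(
    question="Does the new subject line increase our email open rate?",
    goal="Choose the subject line for the next campaign",
    target_metric="open_rate",
    baseline_rate=0.10,
    expected_rate=0.12,
    limiting_variables=["send time", "list segment", "seasonality"],
)
print(subject_line_test.sample_size_per_group())
```

If the required sample is larger than your mailing list, that is worth knowing before the campaign goes out, not after.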
- Get the data – This step is crucial. Many companies have large amounts of data, but the key to cultivating a superior dataset is knowing how to structure your data cleanly. If your company has clean, ready-to-use datasets, you are in a great position. But don’t worry if you are not as fortunate; data cleaning is tedious but most certainly doable. You should:
- Determine how the data was sampled
- Determine which data is relevant
- Confirm whether there are any privacy issues
NOTE: Save a backup copy of your data (archive process/rollback)
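As a rough illustration of these checks, the Python sketch below uses only the standard library’s `csv` module. The column names (`session_id`, `channel`, `converted`, `email`) are hypothetical, and the rules (trim whitespace, drop incomplete rows, drop duplicate sessions, strip a personal-data column) are examples of the kind of cleaning you might do, not a complete recipe. The `backup` helper makes a `.bak` copy of the raw file before anything is changed, in the spirit of the note above.

```python
import csv
import io
import shutil
from pathlib import Path

REQUIRED_COLUMNS = ("session_id", "channel", "converted")  # hypothetical schema
PII_COLUMNS = ("email",)                                     # dropped for privacy


def backup(path: Path) -> Path:
    """Copy the raw file to <name>.bak so you can roll back later."""
    backup_path = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup_path)
    return backup_path


def clean_rows(reader):
    """Yield rows that are trimmed, complete, de-duplicated and free of PII columns."""
    seen_sessions = set()
    for raw in reader:
        # csv.DictReader stores surplus fields under a None key and fills
        # missing fields with None; skip the former and treat the latter as blank.
        row = {k.strip(): (v or "").strip() for k, v in raw.items() if k is not None}
        if any(not row.get(col) for col in REQUIRED_COLUMNS):
            continue  # incomplete record
        if row["session_id"] in seen_sessions:
            continue  # duplicate session
        seen_sessions.add(row["session_id"])
        row["channel"] = row["channel"].lower()  # normalize category labels
        for col in PII_COLUMNS:
            row.pop(col, None)
        yield row


sample_export = io.StringIO(
    "session_id,channel,converted,email\n"
    "1, Email ,1,a@example.com\n"
    "1,email,1,a@example.com\n"
    "2,Social,,b@example.com\n"
    "3,Search,0,c@example.com\n"
)
cleaned = list(clean_rows(csv.DictReader(sample_export)))
print(cleaned)  # two rows survive: session 1 (email) and session 3 (search)
```

Keeping the cleaning rules in one function makes them easy to review with whoever owns the data, and easy to rerun on the backup if something goes wrong.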
- Explore the data – Once you have your data, you can start getting into the meat of your analysis. Ultimately, you need to create a framework that serves your goal. Luckily, there are lots of resources out there, created by many extremely intelligent folks who have (in their infinite wisdom) made them free to all. Make sure you understand the difference between descriptive and inferential statistics. Descriptive statistics summarize the data you have, through numerical calculations, graphs or tables. Inferential statistics use a sample of data taken from a population to make inferences and predictions about that population. Once you have settled on a framework that suits your needs, you need to:
- Plot the data
- Identify any anomalies
- Determine if there are any patterns
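The sketch below shows the descriptive/inferential distinction with Python’s standard `statistics` module (Python 3.8+), on a hypothetical series of daily site sessions. `describe` summarizes the sample itself, `mean_confidence_interval` makes an inference about the wider population (using a normal approximation; a t-distribution would be more accurate for a sample this small), and `iqr_outliers` flags anomalies with Tukey’s 1.5 × IQR rule, which is one common choice among several. The plotting step would normally use a charting tool and is left out here.

```python
from statistics import NormalDist, mean, median, quantiles, stdev


def describe(values):
    """Descriptive statistics: summarize the data you actually have."""
    return {"n": len(values), "mean": mean(values),
            "median": median(values), "stdev": stdev(values)}


def mean_confidence_interval(values, level=0.95):
    """Inferential statistics: approximate interval for the population mean,
    using a normal approximation (a t-distribution suits small samples better)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = stdev(values) / len(values) ** 0.5
    m = mean(values)
    return m - z * se, m + z * se


def iqr_outliers(values, k=1.5):
    """Flag points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


daily_sessions = [120, 132, 128, 125, 131, 127, 410, 129, 124, 130]  # hypothetical
print(describe(daily_sessions))
print(mean_confidence_interval(daily_sessions))
print(iqr_outliers(daily_sessions))  # [410]
```

Note how the single spike at 410 pulls the mean (155.6) well above the median (128.5). Whether that spike is a tracking glitch or a real event is exactly the kind of anomaly worth investigating before you move on to modeling.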
- Model the data – A data model organizes data elements and standardizes how they relate to one another. Because data elements document real-life people, places and things and the events between them, the data model represents reality (e.g., a house has many windows, or a cat has two eyes). Once you have tried a few models and made your choice, you would next:
- Build a model
- Fit the model
- Validate the model
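The post doesn’t name a particular model, so the Python sketch below uses a simple linear regression (ordinary least squares) as a stand-in, predicting weekly conversions from ad spend. The data is synthetic, generated inside the example, and the helper names are illustrative. It walks through the three sub-steps: build (choose the form conversions = intercept + slope × spend), fit (estimate the coefficients on a training set), and validate (measure mean absolute error on held-out rows the model never saw).

```python
import random


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def train_test_split(pairs, test_fraction=0.25, seed=42):
    """Shuffle reproducibly and hold out a fraction of rows for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Build: synthetic data where conversions rise roughly 0.3 per unit of spend, plus noise.
rng = random.Random(0)
spend = [float(s) for s in range(100, 2100, 100)]  # 20 hypothetical weekly budgets
conversions = [5 + 0.3 * s + rng.gauss(0, 10) for s in spend]

# Fit on the training rows only.
train, test = train_test_split(zip(spend, conversions))
train_x, train_y = zip(*train)
intercept, slope = fit_line(train_x, train_y)

# Validate on rows the model never saw.
predicted = [intercept + slope * x for x, _ in test]
mae = mean_absolute_error([y for _, y in test], predicted)
print(f"slope={slope:.3f}, intercept={intercept:.1f}, holdout MAE={mae:.1f}")
```

Holding out data is the important design choice here: a model can fit its own training rows very well and still predict poorly, and only the held-out error tells you which.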
- Communicate and visualize the results – By this point, you will likely have put quite a few hours into digging through data and constructing your solution. Depending on your audience, a simple Microsoft PowerPoint presentation may be enough to convey your findings; for the more adventurous, you can hop over to D3js.org. Finally, you should be able to:
- Determine what you learned
- Determine whether the results make sense
- Identify any other underlying variables, and what else might help the visualization process or data gathering
- Determine whether you can tell a story with the results
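For a first pass at communicating results, even a plain-text chart can help you check whether there is a story to tell. The Python sketch below renders hypothetical conversion rates by channel as a text bar chart and serializes the same figures as a JSON array of `{label, value}` objects, a simple shape that a D3 chart or most other visualization tools can load. The figures and function names are illustrative.

```python
import json


def text_bar_chart(results, width=40):
    """Render {label: rate} as a plain-text bar chart, largest value first."""
    top = max(results.values())
    label_width = max(len(label) for label in results)
    lines = []
    for label, value in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / top) if top else ""
        lines.append(f"{label:<{label_width}} | {bar} {value:.1%}")
    return "\n".join(lines)


def to_chart_records(results):
    """Serialize results as a JSON array of {"label", "value"} objects."""
    return json.dumps([{"label": k, "value": v} for k, v in results.items()], indent=2)


conversion_by_channel = {"email": 0.042, "search": 0.031, "social": 0.018}  # hypothetical
chart = text_bar_chart(conversion_by_channel)
print(chart)
print(to_chart_records(conversion_by_channel))
```

The text chart is for your own sanity check; the JSON output is what you would hand to a richer visualization for your audience.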
The above should start you on your path, and there’s even more good news: there is no shortage of resources for exploring every part of this process. I’ve provided some of them below for those of you who want to learn more, and I intend to follow this post up with another soon!
In the meantime, if you have any questions or comments, please feel free to contact me at firstname.lastname@example.org.
- http://datascience.ibm.com/ – IBM’s data science tool
- https://www.dataquest.io/mission/97/introduction-to-pandas – Great for learning
- https://www.tableau.com/about/blog/2015/6/tableau-mongodb-visual-analytics-json-speed-thought-39557 – Valuable for visualization
- https://bl.ocks.org/kerryrodden/7090426 – Another visualization example
- https://cloud.google.com/blog/big-data/2017/01/learn-tensorflow-and-deep-learning-without-a-phd – Google’s TensorFlow
- http://flowingdata.com/ – Good resource
- http://simplystatistics.org/ – Good resource
- http://blog.echen.me/ – Good resource