Acheron Analytics

A Guide to Designing a Data Science Project

8/15/2017

Recently, our team of data consultants had an awesome opportunity to present to a class of future data scientists at Galvanize Seattle. One student who came to hear our talk was Rebecca Njeri. Below, she shares tips on how to design a Data Science project.

To Begin, Brainstorm Data Project Ideas

To begin your data science project, you will need an idea to work on. To get started, brainstorm possible ideas that might interest you. During this process, go as wide and as crazy as you can; don’t censor yourself. Once you have a few ideas, you can narrow them down to the most feasible and interesting one. You could brainstorm ideas around these prompts:

Questions To Help You Think Of Your Next Data Science Projects
  • What excites you enough that you would enjoy working on it for 2-3 weeks? Is it basketball, baseball or The Voice? What kind of data exists in that space? What kinds of apps could be built around these passions?
  • Do you have any pain points associated with the products and services you use each day, and could they be improved?
  • Where do you want to work? What kind of app would showcase your skills to your potential employer? (I found out that Chrome hired a few developers because they kept hacking Chrome. I think that is brilliant, although I am not entirely sure how that translates to Data Science.)
  • Are there products that you could design for social good? For example, is there a way to ease homelessness in Seattle using the data available?
  • What kind of tools would you like to learn/practice? What kind of projects could you work on that enable you to use these tools? Do you want to improve your SQL, AWS, Python, image processing, TensorFlow, Business Intelligence, Tableau, or Power BI?
If you are unable to think of something, or just need a place to get started, check out this post by Analytics Vidhya.

Write a proposal:

Write a proposal that follows the Cross Industry Standard Process for Data Mining (CRISP-DM), which has the following steps:

Business Understanding

What are the business needs you are trying to address? What are the objectives of the Data Science project? For example, if you are at a telecommunications company that needs to retain its customers, can you build a model that predicts churn? Maybe you are interested in using live data to better predict which coupons to offer which customers at the grocery store.

Data Understanding

What kind of data is available to you? Is it stored in a relational or NoSQL database? How large is your data? Can it be stored and processed on your hard drive or will you need cloud services? Are there any confidentiality issues or NDAs involved if you are working in partnership with a company or organization? Can you find a new data set online that you could merge with yours to increase your insights?

Data Preparation

This stage involves doing a little Exploratory Data Analysis and thinking about how your data will fit into the model that you have. Is the data in data types that are compatible with the model? Are there missing values or outliers? Are these naturally occurring discrepancies or errors that should be corrected before fitting the data into a model? Do you need to create dummy variables for categorical variables? Will you need all the variables in the data set, or are some dependent on each other?
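
To make a couple of these questions concrete, here is a minimal sketch in plain Python of filling in missing values and creating dummy variables. The customer records, the column names, and the two helper functions are all made up for illustration; in a real project you would more likely reach for pandas (for example `DataFrame.fillna` and `pandas.get_dummies`).

```python
from statistics import median

# Hypothetical customer records; None marks a missing value.
rows = [
    {"plan": "basic", "monthly_spend": 20.0},
    {"plan": "premium", "monthly_spend": None},
    {"plan": "basic", "monthly_spend": 30.0},
    {"plan": "family", "monthly_spend": 400.0},
]


def impute_median(rows, column):
    """Return copies of the rows with missing values in `column` set to the median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with 0/1 dummy columns.

    The first category (alphabetically) is dropped so that the dummy
    columns are not perfectly dependent on each other.
    """
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories[1:]:
            new_row[f"{column}_{category}"] = int(r[column] == category)
        encoded.append(new_row)
    return encoded


prepared = one_hot(impute_median(rows, "monthly_spend"), "plan")
for row in prepared:
    print(row)
```

The median is used here instead of the mean because one record (400.0) looks like an outlier and would pull the mean upward. Whether that record is an error or a real big spender is exactly the kind of question this stage is meant to answer.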

Modeling

Choose a model and tune its hyperparameters while fitting it to your training set of data. Python’s scikit-learn library is a good place to get model algorithms. With larger data, consider using Spark ML.
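
As a rough illustration of what tuning means, the sketch below tries several values of k for a tiny hand-written k-nearest-neighbors classifier and keeps the one that scores best on a validation split. The synthetic data, the candidate values of k, and the helper functions are assumptions made for this example. It is written in plain Python so that it runs anywhere; scikit-learn offers ready-made tools for the same job, such as `GridSearchCV`.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: abs(item[0] - point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


# Synthetic one-feature data (an assumption for illustration):
# the label is 1 when x > 5, and 10% of the labels are flipped as noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    label = int(x > 5)
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

# Hold out part of the data for comparing hyperparameter values.
train, validation = data[:150], data[150:]

# Odd values of k avoid tied votes with two classes.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5, 9, 15)}
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

The validation split used for tuning should be separate from the final test set, so that the reported performance is not flattered by the tuning itself.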

Evaluation

Withhold a test set of data to evaluate the model performance. Data Science Central has a great post on different metrics that can be used to measure model performance. A confusion matrix can help with considering the cost-benefit implications of the model’s performance.
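
Here is a small sketch of how a confusion matrix can feed a cost-benefit estimate, using the churn example from earlier. The labels, the predictions, and especially the dollar payoffs are invented for illustration; in practice they would come from your held-out test set and from the business.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = churn)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts


# Hypothetical held-out test labels and model predictions.
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

cm = confusion_matrix(actual, predicted)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])

# Assumed payoff per outcome, in dollars: a retained churner is worth 50,
# a wasted retention offer costs 10, and a missed churner costs 100.
payoff = {"tp": 50, "fp": -10, "fn": -100, "tn": 0}
expected_value = sum(cm[outcome] * payoff[outcome] for outcome in cm)

print(cm)                  # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 5}
print(precision, recall)   # 0.75 0.75
print(expected_value)      # 40
```

Two models with the same accuracy can have very different expected values once the costs of false positives and false negatives are unequal, which is why the matrix is worth looking at alongside a single summary metric.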

Deployment/Prototyping

Deployment and implementation are some of the key components of any data-driven project. You have to get past the theory and algorithms and actually integrate your data science solution into the larger environment.

Flask and Bootstrap are great tools to help you deploy your data science project to the world.

Planning Your Data Science Projects

Keep a timeline with To Do, In Progress, Completed and Parking sections. Have a self-scrum (lol) each morning to see what you accomplished the previous day and set a goal for the new day. It could also help to find a friend to scrum with, who can help you keep track of your metrics. Goals and metrics can help you hold yourself accountable and ensure that you actually follow through and get your project done.

Track Your Progress

Create a GitHub repo for your project. Your proposal can be incorporated as the README. Commit your work at whatever frequency makes you comfortable, and keep track of how much progress you are making on your metrics. A repo will also make it easier to show your code to friends/mentors for a code review.

Knowing When to Stop Your Project

It may be good to work on your project with a minimum viable product in mind. You may not get all the things on your To Do list accomplished, but having an MVP can help you know when to stop. When you have learned as much as you can from a project, even if you don’t have the perfect classification algorithm, it may be more worthwhile to invest in a new project.

Some Examples Of Data Driven Projects

Below are some links to GitHub repos of some Data Science Capstones:
  • Mememoji
  • Predicting Change in Rental Price Units in NYC
  • Bass Generator

All the best with your new Data Science project! Feel free to reach out if you need someone to help you plan your new project.

Want to be further inspired for your next data-driven project?

Check out some of our other data science and machine learning articles. You never know what might inspire you.

Practical Data Science Tips

Creatively Classify Your Data

25 Tips To Gain New Customer

How To Grow Your Data Science Or Analytics Practice

