Start-ups focused on data science, machine learning, and deep learning have sprung up in cities like Seattle, San Francisco, and New York, and just about everywhere else. They are reaching every field: finance, health, education, customer service, and beyond. Yet some companies are still working out the kinks, still trying to figure out how to turn their data and machine learning talent into fiscal gain.
Is machine learning just another fad? Isn’t business intelligence enough? Is there any need to learn languages and tools like R, TensorFlow, and Hadoop?
The truth is, not every problem can be solved with a deep learning algorithm or an automated chatbot. So how do you know when you are facing a problem that calls for a data science solution?
Whether you run a service-based business or a widget factory, your company naturally produces data as a byproduct of operations. You may even take the time to track this data in a well-organized set of Excel sheets or databases. That data is worth looking back at! Here are some projects and techniques, built on basic data science and predictive analytics, that can reduce costs and increase revenue without huge investments in new hardware or expensive talent.
Predictive Analytics (a.k.a. Forecasting)
One basic step a small company can take is setting up a decent Excel sheet that predicts future demand and optimizes pricing. This isn’t data science per se; forecasting has been done for business since the beginning of business. Some call it ‘predictive analytics’ to give it a fancier-sounding name, but it is just basic forecasting. If you aren’t doing this already, it is a valuable place to start. Using your day-to-day data, you can build a simple application that spots trends, such as which days are the best days to be open for business. In one real-life example, a small restaurant discovered through forecasting that it had been consistently losing money on specific days of the year, and that it could save money by closing on those days. The owners began giving those days off to their staff instead. Profits went up, and the staff, who would otherwise put in 45+ hour weeks, were generally happier.
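As a minimal sketch of the restaurant example (all the numbers, day labels, and field choices below are invented for illustration), a day-of-week profit summary takes only a few lines of Python:

```python
from collections import defaultdict

# Hypothetical daily records: (day_of_week, revenue, cost).
daily_records = [
    ("Mon", 400, 550), ("Tue", 900, 600), ("Wed", 1100, 650),
    ("Mon", 380, 540), ("Tue", 950, 610), ("Wed", 1200, 660),
]

def average_profit_by_day(records):
    # Average (revenue - cost) per weekday; a negative value means
    # that weekday consistently loses money.
    profits = defaultdict(list)
    for day, revenue, cost in records:
        profits[day].append(revenue - cost)
    return {day: sum(p) / len(p) for day, p in profits.items()}

profit = average_profit_by_day(daily_records)
losing_days = [day for day, p in profit.items() if p < 0]
```

With these made-up numbers, Mondays average a loss, which is exactly the kind of signal the restaurant acted on.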
If this lies outside your skill set, it is best to hire a team specialized in predictive analytics and forecasting. When set up correctly, the application should be easy to maintain with minimal technical skills.
Multi-Armed Bandit
For those of you familiar with A/B testing, the concept of the multi-armed bandit may sound similar, but there are distinct differences. The multi-armed bandit refers to a theoretical problem in which a gambler walks into a casino and must decide how best to play multiple slot machines (i.e., one-armed bandits) to maximize the payout, based on the winning probability of each machine. The same idea has been applied to website testing, ad-placement optimization, story recommendations, and so on: essentially, any situation where a computer can test which type of content an end-user is most likely to click on. This requires tracking your end-users’ actions and setting up either a dynamic or time-delayed algorithm that figures out the most effective piece of content to show an end-user to get them to interact. From there, once enough data has been gathered, you could even test more complex neural networks to see which features in the content itself make it more attractive.
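As a hedged sketch of the idea (the variant names and click counts below are invented), epsilon-greedy is one of the simplest bandit strategies: most of the time show the best-performing content so far, and occasionally show a random variant so newer options still get explored.

```python
import random

def epsilon_greedy(click_history, epsilon=0.1):
    """Pick a content variant to show next.

    click_history maps variant name -> (clicks, impressions).
    With probability epsilon we explore (random variant); otherwise
    we exploit the variant with the best observed click-through rate.
    """
    if random.random() < epsilon:
        return random.choice(list(click_history))
    return max(
        click_history,
        key=lambda v: click_history[v][0] / max(click_history[v][1], 1),
    )

# Invented running totals for two headline variants.
history = {"headline_a": (30, 1000), "headline_b": (55, 1000)}
chosen = epsilon_greedy(history)
```

Unlike a fixed-length A/B test, the allocation shifts toward the winner as the click history accumulates.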
If you already do A/B testing, there are some caveats if you plan to switch to the multi-armed bandit method. VWO.com has a great read on some of the drawbacks. It is good to know the pros and cons!
More complex statistics can be derived by melding multiple data sources. Combining your everyday operational data with customer data from other areas, such as social media or credit card records, could lead to a treasure trove of new revenue streams. It could help your company find new verticals to expand into, or pick the right neighborhood for your next store.
If you have an online store, you have even more possible data sources, and the use cases expand as the types of data increase. This will require some pruning and feature selection, depending on the style of machine learning or data science you are looking to apply. If you have enough data in the right formats, you might be able to get into deep learning. Realistically, though, if your company is small or medium sized, solid data science techniques will probably work best.
Let’s say you are a charity with a backlog of donors who consistently donate every year. What if you could find more people like them? Some organizations call thousands of people just to find the few hundred who will donate. If you have their email addresses, or better yet their social media handles, you can use social media services to find those few hundred people far more effectively. Companies like Architech Social and Facebook provide services that help you grow your business by finding ‘look-alikes’: essentially, they help you advertise to plausible future customers who behave like your current ones.
How, you may ask? They use a combination of clustering, classification, and neural networks. Clustering and classification both rely on statistical techniques to estimate the probability that one customer is similar to another. This can be based on end-user behavior, product preferences, social factors, purchase history, and even comments and posts on social media. At the end of the day, this combines most of the techniques mentioned above, and it is only one example of using complex statistical analysis to find value in data.
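As a rough, hedged sketch of the similarity idea (the donor names, features, and numbers below are all invented; real look-alike systems are far more elaborate), cosine similarity is one simple way to score how alike two customer profiles are:

```python
import math

def cosine_similarity(a, b):
    # Angle-based similarity between two feature vectors:
    # 1.0 means the profiles point in exactly the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Invented features: (donations per year, avg gift size, social engagement).
known_donors = {
    "alice": (4, 50, 0.9),   # few large gifts, highly engaged
    "bob": (10, 5, 0.1),     # many tiny gifts, rarely engaged
}
prospect = (3, 45, 0.8)

# The prospect most resembles the donor with the highest similarity score.
best_match = max(
    known_donors,
    key=lambda name: cosine_similarity(known_donors[name], prospect),
)
```

A production look-alike service would cluster millions of such vectors rather than compare them pairwise, but the underlying question, "who behaves like my existing customers?", is the same.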
These services don’t come cheap. Putting a one-time solution in place can help reduce the monthly bill Facebook and other digital advertisers charge.
These are just a few examples of where data science, machine learning, and analytics can be useful. If you are not sure whether you have a project worth pursuing, feel free to give us a ping. We would be happy to help you figure out whether a project is worth spending time on.
Are you looking for great tools for machine learning, data science, or data visualization?
Currently, there is an overwhelming number of options. Do you pick TensorFlow or Theano? Tableau or Qlik? MongoDB or Hadoop...oh dear. How do you know which data tool to use? Many of them are good; some, not so good. In the end, you might need an algorithm just to pick which data science tools are best for your team.
We just wanted to go over a few technologies we personally love to work with. This is by no means all of them, but these are definitely some of the best options out there. Every use case is different, though, so give us a call if you are trying to decide which tools would work best for your company or project! We would love to help.
Libraries and Languages
TensorFlow itself is not a language. It is built on C++ and Nvidia’s CUDA, and the library is typically used from Python. However, the computation is not actually executed in Python; Python just lets the end-user design the dataflow graph that will be run by the much faster lower-level languages. Some people even drop down to the raw C++ level to further optimize their run times.
Even the overall design style of TensorFlow can feel a little wonky if you are a Python programmer. Compared to plain Python, you might feel like you are writing in a more model- or math-based language like Mathematica or MATLAB. You declare several sets of variables before ever actually setting one, which can be a little jarring for Python purists. Overall, though, the basics are pretty easy to pick up.
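To illustrate that define-then-run style (note: this is a toy in plain Python, not actual TensorFlow code), the pattern is to first wire up a graph of symbolic nodes, then feed in concrete values later, much like building a TensorFlow graph and calling `Session.run`:

```python
# Toy "dataflow graph": declaring nodes computes nothing yet.

class Placeholder:
    # Stands in for an input whose value arrives only at run time.
    def __mul__(self, other):
        return Multiply(self, other)

class Multiply:
    # A graph node describing a multiplication, not performing it.
    def __init__(self, left, right):
        self.left, self.right = left, right

def run(node, feed):
    # Walk the graph, substituting fed values, roughly what a
    # session/executor does for a real dataflow framework.
    if isinstance(node, Placeholder):
        return feed[node]
    return run(node.left, feed) * run(node.right, feed)

a, b = Placeholder(), Placeholder()
c = a * b                            # still just a description of work
result = run(c, {a: 3.0, b: 4.0})    # computation happens only here
```

The jarring part for Python purists is exactly this: `c = a * b` produces a graph node, not the number 12.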
The difficulty comes in actually understanding how to properly set up a neural network, convolutional network, and so on. A lot of questions come into play: which type of model? What kind of data regularization is best? What level of dropout or robustness do you want? And will you purchase GPUs from Nvidia or try to make it work on CPUs? (Depending on your data size, you will most likely have to purchase hardware or pay for AI-as-a-service from Google.)
You can’t get too far in the data science world without finding a few programmers who enjoy R. It is a very well-developed language that is great for statisticians and CS majors alike. It may not have the cool factor of Python and other more ‘modern’ machine learning languages, but it is a great workhorse, tried and true.
It can, however, give newbies a false sense of understanding data science. Libraries for ensembling and boosting algorithms require minimal knowledge of the algorithms themselves. This is great if you know why you are picking each algorithm, but the ease of use can create an illusion of understanding.
As a side note, SQL Server recently added the ability to run some R functions inside a query. We are not 100% sure about the performance, but it would be pretty cool to run a lot of the data analysis before the data even leaves the system.
The first time I was exposed to Caffe for deep learning was back when I took a computational neuroscience class. We were programming in MATLAB, but one of the other students showed me the work he was doing in his lab, all of it using the Caffe framework. Most of the work involves modifying plain-text configuration files: you can change the types of network layers, the number of neurons, the dropout rate, and so on. For those accustomed to working on the command line, it runs pretty smoothly, and it works on both GPUs and CPUs.
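For reference, Caffe declares its networks in a plain-text prototxt format; a purely illustrative fully connected layer (the layer and blob names here are made up) looks roughly like this:

```
layer {
  name: "fc1"              # illustrative layer name
  type: "InnerProduct"     # Caffe's fully connected layer type
  bottom: "data"           # input blob
  top: "fc1"               # output blob
  inner_product_param {
    num_output: 128        # number of neurons in this layer
  }
}
```

Changing the architecture means editing stanzas like this one rather than writing imperative code, which is exactly the configuration-driven workflow described above.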
Data Visualization Tools
Tableau is arguably one of the most popular data visualization tools for analysts, regardless of their proficiency with technology. Whether you are an engineer who develops complex neural networks or a business analyst who prefers to model in Excel, Tableau is a friendly and easy-to-use data visualization tool. It lets the end-user develop visually appealing, interactive reports that help executives make decisions quickly.
On top of that, if your company has its own Tableau Server, it also allows quick and easy data sharing through beautiful and effective reports. If you just want the data itself, Tableau lets you download CSVs, screenshots, PDFs, and more. You can even have reports emailed to you on a specific cadence.
This tool was built with the end-user in mind. One of my favorite features is Tableau Public. It allows you to share your reports publicly. Obviously, you can’t do this with company data. However, there are plenty of fun open data sets that can be used to make some beautiful and effective data reports. Check it out!
(Tableau visualization credit: https://public.tableau.com/profile/gabe.dewitt#!/)
One of our employees was first introduced to D3.js in college, when Professor Jeff Heer came and gave his class a one-hour lecture on the library. He was instantly sold. The power D3 had to display data and let the end user drill into specific facts was amazing; it was his first exposure to data visualization done this way.
Sure, he had seen MATLAB charts and Excel graphs before, but nothing like this. He found he could use D3 to create graphs that were appealing, informative, and interactive, with many benefits for the end user. Plus, unlike Tableau and other data viz tools, D3.js allowed almost unlimited customization.
We use it, along with some other JavaScript libraries, when customers are trying to avoid the steep costs of Tableau. There are limits, though: D3 runs on the client side, so it cannot handle the sheer magnitude of data that Tableau can.
Domo has some similarities to Tableau. It gives the end-user very pretty graphs and is used for KPI reports. It quickly integrates with over 1,000 data sources and manages large amounts of data quite easily. From there, it quickly melds data and creates pre-formatted KPI reports that can be shared across its internal platform. This is great, especially if your team doesn’t have the resources to develop highly effective reports. Within minutes, your team can have standardized reports from tools like Salesforce, Concur, and others. If integrated properly, your company may also be able to retire some of its other reporting tools, reducing maintenance, development, and design costs.
There is some room for customization, but it is limited compared to most other data visualization tools, which will drive typical developers crazy. We love being able to get down to the system level and modify what each small component does, rather than being limited by buttons. However, if your team can’t afford an extra person to build reports, this tool will save a large amount of resources.
Data Storage Tools
You can’t say “big data” without at least one person bringing up Hadoop. Hadoop is not for the faint of heart. It does an amazing job of distributing the storage of very large data sets over computer clusters. However, unless you are comfortable with Java and command-line environments, it is not an easy beast to wrangle: it requires a heavy amount of configuration and tuning to work optimally on your company’s systems. The reward at the end, though, is well worth it. The ability to access data quickly, even through hardware failures, is pretty hard to beat. For small companies, however, the cost of maintaining the system and employing a Hadoop specialist would probably be too large.
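To give a flavor of the programming model (a hedged toy, with invented sample lines; a real job would stream over much larger inputs), Hadoop work is usually expressed as MapReduce. The classic word-count mapper emits a `(word, 1)` pair per word, and Hadoop shuffles and sums the pairs per key:

```python
def map_words(lines):
    # The "map" step of MapReduce word count: emit one (word, 1)
    # pair per word; the framework groups pairs by key and a
    # reducer sums the counts.
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

# In a real Hadoop Streaming job this would iterate over sys.stdin and
# print tab-separated pairs; here we run it over an in-memory sample.
sample = ["Hadoop stores data", "Hadoop distributes data"]
pairs = list(map_words(sample))
```

The appeal of the model is that this same tiny function can be fanned out across a whole cluster, which is also why the surrounding configuration burden exists.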
There was a time when data storage was very expensive and databases had to be finely tuned to manage every byte and bit perfectly. Under those constraints, the relational model emerged as one of the best options for data storage.
Now it is 2017, and thanks to both hardware and software advances, data storage has become much cheaper. Suddenly, storing large masses of unstructured data is feasible and can be beneficial when designed well. MongoDB is a document-store database: instead of storing rows across related tables, it stores an entire document in one place. This means you no longer have to join several tables just to get two related data points; they all live in the same document. (This used to be considered bad practice because it meant a lot of duplicated data; now the trade-off often favors the dramatic increase in read speed.) There are still plenty of pitfalls with MongoDB, including security, storage overhead, and the fact that it is not ACID-compliant (click here to read about ACID).
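As a toy illustration of the difference (plain Python dictionaries, not actual pymongo calls; the customer and order data are invented), compare fetching related facts in the two styles:

```python
# Relational style: related facts live in separate "tables",
# linked by keys, so reads need a join-like lookup.
customers = [{"id": 1, "name": "Ada"}]
orders = [
    {"customer_id": 1, "item": "widget"},
    {"customer_id": 1, "item": "gadget"},
]

def items_for(customer_id):
    # Cross-table lookup: scan orders for matching foreign keys.
    return [o["item"] for o in orders if o["customer_id"] == customer_id]

# Document style (as in MongoDB): the customer and their orders are
# stored and retrieved together as one document, so no join is needed.
customer_doc = {
    "_id": 1,
    "name": "Ada",
    "orders": [{"item": "widget"}, {"item": "gadget"}],
}
doc_items = [o["item"] for o in customer_doc["orders"]]
```

The document shape duplicates data if the same order must appear elsewhere, which is exactly the trade-off discussed above: redundancy in exchange for single-read access.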
Oracle could stand in for most other RDBMSs here, but I find it one of the best databases for managing big data, for many reasons: everything from the underlying architecture of the database itself to the ability to manage and tune the configuration of the objects inside it is better refined in Oracle. SQL Server is great for beginners, but it just doesn’t do what Oracle can.
We are a team of data scientists and network engineers who want to help your functional teams reach their full potential!