One of the most common projects companies take on today is a migration. That does not just mean moving from one database system to another, as in a recent project where we migrated Oracle databases to MS SQL Server, or Oracle to PostgreSQL, or MS SQL Server to MySQL. It also means migrating servers from local hardware to cloud-based systems like AWS and Azure.
The goal of these projects is simple:
Reduce costs and increase the ability to spin up servers and databases on a whim.
Databases are not cheap to license when you use products like Oracle or MS SQL Server. Add on top of that Oracle's and Microsoft's one-off costs for every tool and doohickey a company can add on, and the price starts to become overwhelming.
Databases no longer have to cost an arm and a leg. A company can reduce costs further by not just converting its RDBMS (relational database management system) but also migrating to AWS.
In addition, the ease with which a general user can spin a server up and down on AWS allows for much more agile project development. This makes it easier to go from prototype to final product without dealing with as much bureaucracy, or with the question of whether the company even has enough space on its servers.
Back when companies had to manage a lot more of their own servers, a company that ran out of space on its current server racks had to go through the process of buying a new server.
That meant getting approval, putting in a PO, waiting for the server, configuring it, securing it, and then putting it online. This was not only expensive; it could take weeks, months, maybe even a year or two depending on the pace of the company. (Let’s not even get started on whether or not the server was needed. Oftentimes, there was plenty of space on a server somewhere, but no one knew it existed.)
Now, if a new server is needed, depending on the internal processes, it could be a quick approval away from being spun up. A new database is just a statement away.
Both database conversions and cloud migrations offer significant savings.
However, database migrations and conversions are technologically complicated and intense projects! They require experts in database management and security, as well as project managers, to ensure the end result is secure and behaves exactly the same as the previous set of objects.
So why would a company do it?
Over the next few articles, we will be discussing how to convert various databases like Oracle and MS SQL Server to other options that can be free and just as effective.
We will also be listing out the benefits of switching to a cloud system.
At the end of the day, all of this will help reduce an IT department's cost substantially.
If you need help with any services such as data migrations or converting one RDBMS to another, our team would be happy to help! We have many members who have done all forms of data conversions and migrations.
If you want to read more about databases, data science and how to manage great data teams, then check out the articles below.
Should Our Team Invest In A Data Warehouse?
How To Survive Corporate Politics As A Data Scientist
8 Great Libraries For Machine Learning
Creating A Better Algorithm With Boosting and Bagging
Guest Written By Rebecca Njeri
Last Thursday, I attended the machinery.ai conference in Seattle, WA, and got to listen to talks by Machine Learning experts that ranged from Machine Thinking to Integrating Data Science into Legacy Products. After about 1.5 years of learning and practising data science, this conference reminded me of the things that intrigued me when I first started learning data science, and I thought that I should write a post explaining the three different groups of machine learning algorithms.
Machine Learning can be defined as the science of getting computers to act without being explicitly programmed. It can be further divided into three broad categories: supervised learning, unsupervised learning, and reinforcement learning. A machine learning model should be chosen depending on the nature of the data available as will be illustrated below.
Asish Bansal opened his talk, Machine Thinking, by stating that not all business problems need a machine learning or deep learning solution. He argued that most business problems have a software engineering solution, and later, if need be, a machine learning or deep learning solution can be developed. To illustrate his point, he used the “FizzBuzz in TensorFlow interview” example where Joel Grus codes, as a joke, a TensorFlow solution to the FizzBuzz problem.
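For contrast with the TensorFlow joke, here is a minimal sketch of the plain software engineering solution to FizzBuzz. The function name `fizzbuzz` is my own choice for illustration, and it assumes the usual rules of the problem (multiples of 3 print Fizz, multiples of 5 print Buzz, multiples of both print FizzBuzz).

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:          # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


for i in range(1, 16):
    print(fizzbuzz(i))
```

A handful of conditionals solves the problem completely, with no training data and no model, which is the point of the joke.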
Bansal’s talk reminded me of the importance of the business understanding and data understanding parts of the CRISP-DM process. Understanding the kind of data available (numbers, words, images, or voice data; labelled versus unlabelled) will determine what kind of machine learning algorithm, if any, is the appropriate solution.
The main goal of supervised learning is to learn a model from labeled training data that allows us to make predictions about unseen or future data (Python Machine Learning, 3). Supervised learning can be divided into two categories depending on the outcome. If the outcome is a continuous value, we have a regression model, and if the outcome is a discrete class label, we have a classification model. There can be both binary classification models and multi-class classification models.
The simplest example of a regression problem is y = mx + c, where a single independent variable x is correlated with a dependent variable y. An equation can be fit to known values and then used to predict unknown values of y given x. Another example of a regression problem, to once more borrow from the machinery.ai talks, is how long a person’s commute will take, given a labelled training set that has weather information and time of day as the independent variables, and commute times as the associated response variable.
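As a rough sketch of what fitting y = mx + c looks like in practice, the snippet below estimates m and c by ordinary least squares using only the Python standard library. The helper name `fit_line` and the data points are made up for illustration; the points were chosen to lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c with a single feature x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Made-up known values that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, c = fit_line(xs, ys)
predicted = m * 10.0 + c  # predict y for an unseen x
print(m, c, predicted)
```

With real data the points will not sit perfectly on a line, and a library such as sklearn would normally do the fitting, but the idea is the same: learn m and c from labelled examples, then reuse them on new values of x.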
Commonly occurring examples of binary classification problems in business analytics include: whether a customer will churn or not, whether a lead will convert or not, whether a transaction is fraudulent, and whether an email is spam or not, among others.
Multi-class classification problems are similar to binary classification problems except there are more than two class labels. An example of this can be a classification of the different demographics of people who frequent a bookstore, where labels can include: children under five, teens, young adults, adults, etc. Clearly segmenting the shoppers can facilitate more efficient marketing campaigns and help the store’s bottom line.
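To make the bookstore example concrete, here is a small sketch of a nearest-centroid classifier, one of the simplest classification techniques. It is not necessarily what a real store would use. The features (age and books bought per visit), the data, and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict


def fit_centroids(samples, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=dist)


# Made-up labelled shoppers: (age in years, books bought per visit).
samples = [(3, 1), (4, 2), (15, 2), (16, 3), (22, 4), (24, 5), (40, 2), (45, 3)]
labels = ["children under five", "children under five", "teens", "teens",
          "young adults", "young adults", "adults", "adults"]

centroids = fit_centroids(samples, labels)
print(predict(centroids, (14, 2)))   # a new, unlabelled shopper
print(predict(centroids, (42, 1)))
```

The same code covers the binary case: with only two distinct labels (say, churn and no churn) it becomes a binary classifier.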
In reinforcement learning, the goal is to develop a system that improves its performance based on interactions with the environment. The term is borrowed from psychology, where reinforcement refers to any “stimulus which strengthens or increases the probability of a specific response. For example, if you want your dog to sit on command you may give him a treat every time he sits for you.”
For a machine learning example, when a self-driving car takes a sharp turn too fast and moves outside its lane, it learns to adjust its speed the next time it takes that turn to ensure it stays within its lane. A reinforcement learning model improves its performance because it learns as it interacts with its environment.
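The sketch below is a heavily simplified version of the sharp-turn example, written as an epsilon-greedy bandit, which is one of the simplest reinforcement learning setups and nowhere near a real self-driving system. The candidate speeds, the reward rule in `drive_turn`, and all names are hypothetical choices made for illustration.

```python
import random

SPEEDS = [20, 30, 40, 50]  # candidate speeds for the turn (hypothetical units)


def drive_turn(speed):
    """Toy environment: above 30 the car leaves its lane and is penalised."""
    if speed > 30:
        return -1.0           # left the lane
    return speed / 50.0       # stayed in lane; faster is slightly better


def learn(episodes=500, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    value = {s: 0.0 for s in SPEEDS}   # running estimate of reward per speed
    count = {s: 0 for s in SPEEDS}
    for _ in range(episodes):
        if rng.random() < epsilon:
            speed = rng.choice(SPEEDS)            # explore a random speed
        else:
            speed = max(SPEEDS, key=value.get)    # exploit the best estimate so far
        reward = drive_turn(speed)
        count[speed] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        value[speed] += (reward - value[speed]) / count[speed]
    return value


value = learn()
best_speed = max(SPEEDS, key=value.get)
print(best_speed, value)
```

Nobody tells the agent which speed is correct. It tries speeds, receives a reward or a penalty from the environment, and gradually prefers the speed that keeps it in its lane.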
Unsupervised learning is machine learning applied to unlabeled data or data of unknown structure. Examples of unsupervised learning techniques include clustering and dimensionality reduction methods such as Principal Component Analysis. The model tries to learn patterns and correlations within the data on its own. Without an associated response variable Y, the goal is to “discover interesting things about the measurements: is there an informative way to visualize the data? Can we discover subgroups among the variables or among the observations?”
If the bookstore problem were presented without the class labels of the shoppers, a clustering algorithm could be fit to the data to separate the shoppers into different groups.
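As a sketch of that idea, here is a bare-bones k-means clustering routine applied to made-up, unlabelled shopper data. The features (age and books bought per visit), the choice of three clusters, and the naive initialisation from the first three points are assumptions for illustration; in practice a library implementation such as the one in sklearn would be the usual choice.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (squared distance).
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up shoppers with no labels: (age in years, books bought per visit).
shoppers = [(3, 1), (4, 2), (15, 2), (16, 3), (40, 2), (45, 3)]

# Naive initialisation: use the first three points as starting centroids.
centroids, clusters = kmeans(shoppers, shoppers[:3])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The algorithm never sees a label such as "teens" or "adults". It only groups shoppers that look similar, and it is up to the analyst to interpret what each group means.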
Almost every data science talk I have listened to underlines the fact that the majority of data science work is data mining and data cleaning before any machine learning models can be built. The algorithms themselves are readily available: most supervised and unsupervised learning algorithms can be found in Python’s sklearn library, in R packages (for example through RStudio), or in some other form of open source software. Ultimately, an intimate understanding of the data that is available, and the implementation of the different machine learning algorithms, is necessary to leverage the power of supervised, unsupervised, and reinforcement learning.
Andrew Ng’s Machine Learning Class on Coursera
Just for gags: Alexa And Google Home Are Scheming Against Apple's HomePod
Read More Data Science and Machine Learning Blog Posts
Creating A Better Algorithm With Boosting and Bagging
How To Survive Corporate Politics As A Data Scientist
Statistics Review For Data Scientists
A Guide To Starting A New Data Science Project
How To Grow A Data Science Team
We are a team of data scientists and network engineers who want to help your functional teams reach their full potential!