Acheron Analytics
  • Home
  • Who We Are
  • Services
    • All Data Science Services
    • Fraud and Anomaly Detection
    • Data Engineering And Automation
    • Healthcare Policy/Program ROI Engine
    • Data Analytics As A Service
    • Data Science Trainings >
      • Python, SQL and R Trainings
      • ARIMA And Predictive Model Forecasting
  • Contact
  • Acheron Blog
  • Partners

Why Migrate Databases To AWS Or Other Cloud Providers?

11/26/2017

One of the most common projects companies take on today is a migration. Not just from one database system to another, like recent projects where we migrated Oracle databases to MS SQL Server, Oracle to PostgreSQL, or MS SQL Server to MySQL, but also from local hardware to cloud-based systems like AWS and Azure.

The goal of these projects is simple.

Reduce costs, and increase the ability to spin up servers and databases on a whim.

Databases are not cheap to license when you use products like Oracle or MS SQL Server. Add on top of that all of Oracle's and Microsoft's one-off costs for every tool and doohickey a company can add on, and the price starts to become overwhelming.

Databases no longer have to cost an arm and a leg. Costs can be reduced further by not just converting from one RDBMS (relational database management system) to another, but also by migrating to AWS.
In addition, the ease with which a general user can spin a server up and down on AWS allows for much more agile project development. This makes it easier to go from prototype to final product without dealing with as much bureaucracy. That is, if the company even had enough space on their servers.

Back when companies had to manage a lot more of their own servers, running out of space on the current server racks meant going through the process of buying a new server.


That meant getting approval, putting in a PO, waiting for the server, configuring it, securing it, and then putting it online. This was not only expensive. It could take weeks, months, maybe even a year or two depending on the pace of the company (let’s not even get started on whether or not the server was needed; oftentimes there was plenty of space on a server somewhere, just no one knew it existed).
Now, if a new server is needed, depending on the internal processes, it could be a quick approval away from being spun up. A new database is just a statement away.

Both offer significant advantages in savings.

However, database migrations and conversions are technically complicated and intense projects! They require experts in database management and security, as well as project managers, to ensure the end result is secure and behaves exactly the same as the previous set of objects.
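As a rough illustration of checking that a migrated table matches its source, here is a minimal sketch. It uses Python's built-in sqlite3 with in-memory databases as stand-ins for the real source and target systems, and the `customers` table, its rows, and the helper names are all hypothetical. A real migration would need far more than this (types, constraints, stored procedures, permissions), so treat it only as a starting point.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table):
    """Return (row_count, checksum) for a table, independent of row order."""
    # The table name is interpolated directly, so only pass trusted names.
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    digest = hashlib.sha256()
    for row_text in sorted(repr(row) for row in rows):
        digest.update(row_text.encode("utf-8"))
    return len(rows), digest.hexdigest()


def make_db(rows):
    """Build a throwaway in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


source = make_db([(1, "Ada"), (2, "Grace"), (3, "Edsger")])
target = make_db([(3, "Edsger"), (1, "Ada"), (2, "Grace")])  # same rows, new order
broken = make_db([(1, "Ada"), (2, "Grace")])                 # one row lost in transit

matches = table_fingerprint(source, "customers") == table_fingerprint(target, "customers")
mismatch = table_fingerprint(source, "customers") != table_fingerprint(broken, "customers")
print(matches, mismatch)
```

Comparing a row count plus an order-independent checksum catches dropped or altered rows cheaply, though it says nothing about whether views, triggers, or procedures behave the same.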

So why would a company do it?

Over the next few articles we will be discussing how to convert various databases like Oracle and MS SQL to other options that can be free and just as effective.

We will also be listing out the benefits of switching to a cloud system.

At the end of the day, all of this will help reduce an IT department's cost substantially.

If you need help with any services such as data migrations or converting one RDBMS to another, our team would be happy to help! We have many members who have done all forms of data conversions and migrations.

If you want to read more about databases, data science and how to manage great data teams, then check out the articles below.

Should Our Team Invest In A Data Warehouse?

How To Survive Corporate Politics As A Data Scientist

8 Great Libraries For Machine Learning

Creating A Better Algorithm With Boosting and Bagging

Three Kinds of Machine Learning

11/8/2017

Guest Written By Rebecca Njeri
Last Thursday, I attended the machinery.ai conference in Seattle, WA, and got to listen to talks by Machine Learning experts that ranged from Machine Thinking to Integrating Data Science into Legacy Products. After about 1.5 years of learning and practising data science, this conference reminded me of the things that intrigued me when I first started learning data science, and I thought that I should write a post explaining the three different groups of machine learning algorithms.

Machine Learning can be defined as the science of getting computers to act without being explicitly programmed. It can be further divided into three broad categories: supervised learning, unsupervised learning, and reinforcement learning. A machine learning model should be chosen depending on the nature of the data available as will be illustrated below.

Asish Bansal premised his talk, Machine Thinking, by stating that not all business problems need a machine learning or deep learning solution. He argued that most business problems have a software engineering solution, and later, if need be, a machine learning or deep learning solution can be developed. To illustrate his point, he used the “FizzBuzz in TensorFlow interview” example where Joel Grus codes, as a joke, a TensorFlow solution to the fizzbuzz problem.  
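For context, the fizzbuzz problem asks for the numbers 1 to n, with multiples of 3 replaced by "Fizz", multiples of 5 by "Buzz", and multiples of both by "FizzBuzz". The plain software engineering solution is only a few lines, which is the point of the joke. This is a generic sketch, not code from the talk.

```python
def fizzbuzz(n):
    """Return the fizzbuzz string for a single positive integer."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


first_fifteen = [fizzbuzz(i) for i in range(1, 16)]
print(" ".join(first_fifteen))
```

No training data, no model, and no tuning are needed, so a neural network adds cost without adding anything the rules do not already provide.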

Bansal’s talk reminded me of the importance of the business understanding and data understanding parts of the CRISP-DM process. Understanding the kind of data available (numbers, words, images, or voice data; labelled versus unlabelled) will determine what kind of machine learning algorithm, if any, is the appropriate solution.

Supervised Learning

The main goal of supervised learning is to learn a model from labeled training data that allows us to make predictions about unseen or future data (Python Machine Learning, 3). Supervised learning can be divided into two categories depending on the outcome. If the outcome is a continuous value, we have a regression model, and if the outcome is a discrete class label, we have a classification model. Classification models can be binary or multi-class.

The simplest example of a regression problem is y = mx + c, where a single independent variable x is correlated with a dependent variable y, and an equation can be fit to known values and used to predict unknown values of y given x. Another example of a regression problem, to once more borrow from the machinery.ai talks, is predicting how long a person’s commute will take, given a labelled training set that has weather information and time of day as the independent variables and commute times as the associated response variable.
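To make the y = mx + c case concrete, here is a minimal least-squares sketch in plain Python. The data points are made up for illustration and lie exactly on y = 2x + 3, and the function name `fit_line` is just a label chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Made-up labelled training data that follows y = 2x + 3 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 7.0, 9.0, 11.0]

m, c = fit_line(xs, ys)
predicted = m * 5.0 + c  # predict y for an x the model has not seen
print(m, c, predicted)
```

The known (x, y) pairs play the role of the labelled training set, and the fitted m and c are then used to predict y for a new x.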

Commonly occurring examples of binary classification problems in business analytics include: whether a customer will churn or not, whether a lead will convert or not, whether a transaction is fraudulent or not, and whether an email is spam or not, among others.

Multi-class classification problems are similar to binary classification problems except there are more than two class labels. An example of this can be a classification of the different demographics of people who frequent a bookstore where labels can include: children under five, teens, young adults, adults, etc. Clearly segregating the shoppers can facilitate more efficient marketing campaigns and help the store’s bottom line.
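A multi-class classifier for the bookstore example could be sketched with a k-nearest-neighbours vote, shown here in plain Python. The features (hour of visit, basket total in dollars), the labelled shoppers, and the choice of k-nearest neighbours are all assumptions made for illustration; any classification algorithm could stand in its place.

```python
import math
from collections import Counter


def knn_predict(train, point, k=3):
    """Classify point by majority vote among its k nearest labelled neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labelled data: ((hour_of_visit, basket_total), demographic label)
train = [
    ((15.5, 12.0), "teens"),
    ((16.0, 9.0), "teens"),
    ((16.5, 15.0), "teens"),
    ((19.0, 30.0), "young adults"),
    ((20.0, 35.0), "young adults"),
    ((19.5, 28.0), "young adults"),
    ((11.0, 60.0), "adults"),
    ((12.0, 75.0), "adults"),
    ((10.5, 66.0), "adults"),
]

afternoon_shopper = knn_predict(train, (16.2, 11.0))
morning_shopper = knn_predict(train, (11.5, 70.0))
print(afternoon_shopper, morning_shopper)
```

The features are left unscaled to keep the sketch short; with real data, features on different scales would normally be standardised before computing distances.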
Reinforcement Learning

In reinforcement learning, the goal is to develop a system that improves its performance based on interactions with the environment. The term is borrowed from psychology, where reinforcement refers to any “stimulus which strengthens or increases the probability of a specific response. For example, if you want your dog to sit on command you may give him a treat every time he sits for you.”

For a machine learning example, when a self-driving car takes a sharp turn too fast and moves outside its lane, it learns to adjust its speed the next time it takes that turn to ensure it stays within its lane. A reinforcement learning model improves its performance because it learns as it interacts with its environment.
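The turn-speed example can be sketched as a tiny trial-and-error learner. This is a heavily simplified toy, an epsilon-greedy bandit with a single situation rather than a full reinforcement learning system with states, and the speeds, safe limit, and reward values are invented for illustration.

```python
import random

SPEEDS = [20, 30, 40, 50]  # candidate speeds for the turn (hypothetical)
SAFE_LIMIT = 40            # above this the car leaves its lane (assumed)


def take_turn(speed):
    """Toy environment: reward faster safe turns, penalise leaving the lane."""
    return speed / 10 if speed <= SAFE_LIMIT else -10.0


def train(episodes=1000, epsilon=0.2, alpha=0.5, seed=0):
    """Learn an estimated reward for each speed by repeatedly taking the turn."""
    rng = random.Random(seed)
    q = {speed: 0.0 for speed in SPEEDS}
    for _ in range(episodes):
        if rng.random() < epsilon:
            speed = rng.choice(SPEEDS)   # explore: try a random speed
        else:
            speed = max(q, key=q.get)    # exploit: use the best speed so far
        reward = take_turn(speed)
        # Move the estimate for this speed part of the way toward the reward.
        q[speed] += alpha * (reward - q[speed])
    return q


q_values = train()
best_speed = max(q_values, key=q_values.get)
print(best_speed, q_values)
```

Nobody tells the agent the safe limit. It discovers that the fastest speed is penalised and settles on the fastest speed that keeps it in its lane, purely from the rewards it receives.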

Unsupervised Learning

Unsupervised learning is machine learning where there is unlabeled data or data of unknown structure. Examples of unsupervised learning algorithms include clustering and dimensionality reduction such as Principal Component Analysis. The model tries to learn patterns and correlations within the data on its own. Without an associated response variable Y, the goal is to “discover interesting things about the measurements: is there an informative way to visualize the data? Can we discover subgroups among the variables or among the observations?”

If the bookstore problem were presented without the class labels of the shoppers, a clustering algorithm could be fit to the data to separate the shoppers into different groups.
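Here is a minimal sketch of that idea using k-means clustering in plain Python. The shopper data (hour of visit, basket total) is made up, and the centroids are naively initialised from the first k points to keep the example deterministic; library implementations usually pick starting centroids more carefully.

```python
import math


def kmeans(points, k, max_iter=100):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Made-up unlabelled shoppers: (hour_of_visit, basket_total)
shoppers = [
    (15.5, 12.0), (19.0, 30.0), (11.0, 60.0),
    (16.0, 9.0), (20.0, 35.0), (12.0, 75.0),
    (16.5, 15.0), (19.5, 28.0), (10.5, 66.0),
]

labels, centroids = kmeans(shoppers, k=3)
print(labels)
print(centroids)
```

The algorithm only returns group numbers. Deciding what each group means, for instance that one cluster looks like after-school teens, is still up to the analyst.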

Conclusion

Almost every data science talk I have listened to underlines the fact that the majority of data science work is data mining and data cleaning before any machine learning models can be built. The algorithms themselves are readily available: most supervised and unsupervised learning algorithms are implemented in Python’s sklearn library, in R (for example, through RStudio), or in some other form of open source software. Ultimately, an intimate understanding of the data that is available, and of how the different machine learning algorithms are implemented, is necessary to leverage the power of supervised, unsupervised, and reinforcement learning.

Additional Resources

Andrew Ng’s Machine Learning Class on Coursera

Just for gags: Alexa And Google Home Are Scheming Against Apple's HomePod

Read More Data Science and Machine Learning Blog Posts

Creating A Better Algorithm With Boosting and Bagging

How To Survive Corporate Politics As A Data Scientist

Statistics Review For Data Scientists

A Guide To Starting A New Data Science Project

How To Grow A Data Science Team