Data engineering has many facets. One of the most common projects a data engineer takes on is developing an ETL pipeline from an operational DB to a data warehouse. Our team wanted to cover the overarching design of an ETL.
What are the principal components, stages, and considerations?
We started by writing Creating An ETL Part 1 (more to come), and we have now made a video, below, that walks through the process. We wanted to discuss why each stage is important, what happens when data moves from raw to stage, why we need a raw database, and so on.
Data engineering is a complex discipline that combines automation, programming, system design, databases, and analytics to ensure that analysts, data scientists, and end-users have access to clean data.
This all starts with the basic ETL design.
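To make the raw-to-stage idea concrete, here is a minimal sketch in Python and SQL. It is not the design from the video: the table names (raw_orders, stage_orders, fact_daily_sales), the columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for both the operational DB extract and the warehouse. The raw layer keeps an untouched copy of the extract so that cleaning rules can be changed and re-run later; the stage layer applies types, trimming, and de-duplication; the final table is what analysts would query.

```python
import sqlite3

# Hypothetical extract from an operational DB: everything arrives as text,
# with stray whitespace, one duplicate row, and one missing amount.
SAMPLE_ROWS = [
    ("1", "20.00", "2024-01-01"),
    ("2", " 5.50 ", "2024-01-01"),
    ("2", " 5.50 ", "2024-01-01"),   # duplicate of order 2
    ("3", "", "2024-01-02"),         # missing amount
    ("4", "12.50", "2024-01-02"),
]


def run_pipeline(source_rows):
    """Load rows into raw, clean them into stage, then build a warehouse table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Raw layer: an as-is, untyped copy of the extract (nothing is rejected).
    cur.execute(
        "CREATE TABLE raw_orders (order_id TEXT, amount TEXT, ordered_at TEXT)"
    )
    cur.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_rows)

    # Stage layer: typed and trimmed; rows with missing values are skipped,
    # and the primary key plus OR IGNORE drops duplicate order ids.
    cur.execute(
        "CREATE TABLE stage_orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL, ordered_at TEXT)"
    )
    cur.execute(
        """
        INSERT OR IGNORE INTO stage_orders (order_id, amount, ordered_at)
        SELECT CAST(TRIM(order_id) AS INTEGER),
               CAST(TRIM(amount) AS REAL),
               TRIM(ordered_at)
        FROM raw_orders
        WHERE order_id IS NOT NULL AND TRIM(order_id) <> ''
          AND amount IS NOT NULL AND TRIM(amount) <> ''
        """
    )

    # Warehouse layer: a small daily summary built only from staged data.
    cur.execute(
        "CREATE TABLE fact_daily_sales ("
        "order_date TEXT PRIMARY KEY, order_count INTEGER, total_amount REAL)"
    )
    cur.execute(
        """
        INSERT INTO fact_daily_sales (order_date, order_count, total_amount)
        SELECT ordered_at, COUNT(*), SUM(amount)
        FROM stage_orders
        GROUP BY ordered_at
        """
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = run_pipeline(SAMPLE_ROWS)
    for row in conn.execute("SELECT * FROM fact_daily_sales ORDER BY order_date"):
        print(row)
```

Because the raw table still holds all five extracted rows, the rejected and duplicate records remain available for auditing even though only three rows reach the stage table.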
We are practicing for a webinar we will be hosting on ETL development with Python and SQL. The webinar itself will be much more technical and will dive much deeper into each component described in the video. However, we wanted to see what using the whiteboard was like in case we need it.
We will be hosting the free webinar on February 23rd at 10 AM PT. Feel free to sign up here! If you have other questions, please contact us.
Read more about data science, healthcare analytics and automation below:
How Algorithms Can Become Unethical and Biased
How To Develop Robust Algorithms
4 Must Have Skills For Data Scientists
Time Series Model Development: Basic Models
How To Measure The Accuracy Of Predictive Models
What is A Decision Tree
We are a team of data scientists and network engineers who want to help your functional teams reach their full potential!