How To Load Multiple Files Into Your Database With Python And SQL

2/24/2019

Loading files into databases is not exactly exciting work. It can be tedious and boring, and it is often constrained by tools like SSIS that freak out when a small change is made.

This is one of the core roles of BI and data engineering teams: loading files from operational databases into data warehouses through pipelines that process and transform the data.

One problem we see is that teams have multiple file types, data spec versions, and data providers, each of which requires a different pipeline just to load the data into their raw database.

This forces teams to manage 10, 20, maybe even 50 smaller pipelines, all of which are subject to changes made by the data providers. In addition, each new data provider requires an entirely new pipeline, along with the infrastructure around each pipeline to log and track pipeline statuses. This can quickly become very difficult to manage and tedious to build.

Instead of attempting to manage multiple pipelines, another solution is to develop a central system that manages the various insert statements required to load data into each different raw table, much like we explain in the video below.
The system essentially acts as a lookup that matches flat files to their insert queries. This removes the need for separate, redundant pipelines that would each carry their own insert statement.
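To make this concrete, here is a minimal sketch in Python of what such a lookup system might look like. The file patterns, table names, and insert statements are hypothetical examples of our own, and sqlite3 simply stands in for whatever database driver you use:

import csv
import sqlite3  # stands in for your production database driver

# Hypothetical lookup that matches a file name pattern to the insert
# statement for the corresponding raw table.
INSERT_LOOKUP = {
    "claims": "INSERT INTO raw_claims (claim_id, member_id, amount) VALUES (?, ?, ?)",
    "providers": "INSERT INTO raw_providers (provider_id, name, state) VALUES (?, ?, ?)",
}

def match_insert_statement(file_name):
    # Return the insert statement whose pattern appears in the file name.
    for pattern, statement in INSERT_LOOKUP.items():
        if pattern in file_name.lower():
            return statement
    raise ValueError("No insert statement registered for " + file_name)

def load_file(conn, file_path):
    # Look up the right insert statement for a flat file and load it.
    statement = match_insert_statement(file_path)
    with open(file_path, newline="") as f:
        rows = csv.reader(f)
        next(rows)  # skip the header row
        conn.executemany(statement, rows)
    conn.commit()

With a single entry point like this, adding a new data provider often means registering one more lookup entry rather than building one more pipeline.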

Instead, one central system now loads all the various tables. Theoretically, you can also put version numbers on each file that comes in. This could be set up by tracking each version in the metadata table, which then allows you to tie a raw file to a specific version of its insert statement.
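As a rough sketch of that idea (the table name, columns, and queries here are our own assumptions, not a fixed design), the metadata table could key each insert statement by file type and spec version:

# Hypothetical metadata table keyed by file type and spec version.
CREATE_METADATA_TABLE = """
CREATE TABLE IF NOT EXISTS insert_metadata (
    file_type    TEXT    NOT NULL,
    spec_version INTEGER NOT NULL,
    insert_query TEXT    NOT NULL,
    PRIMARY KEY (file_type, spec_version)
)
"""

def get_insert_statement(conn, file_type, spec_version):
    # Fetch the insert statement registered for this file type and version.
    row = conn.execute(
        "SELECT insert_query FROM insert_metadata"
        " WHERE file_type = ? AND spec_version = ?",
        (file_type, spec_version),
    ).fetchone()
    if row is None:
        raise LookupError("No insert registered for %s v%s" % (file_type, spec_version))
    return row[0]

When a provider changes their spec, you register a new row with a new version number rather than rewriting a pipeline.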

Loading multiple file types through individual pipelines requires a lot of tedious development. Whether it is built in SSIS or another system, the constant development quickly weighs on data engineers: instead of spending time developing analytical pipelines that drive clear value for your company, they will be stuck maintaining and developing operational pipelines.

This reduces the team's impact and its overall value.

Thank you so much for reading! We hope you enjoyed the video. Please feel free to send us any questions. We would love to help you design your next automation system.
Also, if you liked this, feel free to check out our video on using bulk insert!

For further reading and videos on data science, SQL and Python:
How To Develop Robust Algorithms
Dynamically Bulk Inserting CSV Data Into A SQL Server
SQL Best Practices - Designing An ETL - Part 1
How Algorithms Can Become Unethical and Biased
4 Must Have Skills For Data Scientists
What is A Decision Tree
