Airbyte has added an open source Python library, dubbed PyAirbyte, to its data migration platform. The library makes it simpler for IT teams to move data without first deploying a connector to each platform involved.
Instead, IT teams can use PyAirbyte to programmatically invoke more than 250 data connectors that Airbyte has already built.
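For example, reading data programmatically requires only a few lines of Python. The following is a minimal sketch based on the pattern in the project's published quickstart; the "source-faker" connector and its configuration are illustrative, and exact method names may vary across PyAirbyte versions:

```python
# Minimal PyAirbyte read sketch; "source-faker" and its config are
# illustrative examples, not a recommendation.
import airbyte as ab

# Fetch a connector from Airbyte's registry; PyAirbyte installs it
# into a local virtual environment on first use.
source = ab.get_source(
    "source-faker",
    config={"count": 1000},
    install_if_missing=True,
)

# Validate the configuration and connectivity before reading.
source.check()

# Select every stream the connector exposes, then read them into
# PyAirbyte's default local cache.
source.select_all_streams()
result = source.read()

# Each cached stream can then be materialized, e.g. as a pandas DataFrame.
for name, records in result.streams.items():
    print(name, len(records.to_pandas()))
```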
Airbyte COO John Lafleur said as IT continues to evolve, it's clear organizations need to move ever-increasing amounts of data between platforms to enable a wide range of digital business processes.
At the same time, organizations are discovering a need to migrate data into vector databases so that the large language models (LLMs) at the core of generative artificial intelligence (AI) platforms can be exposed to new data via retrieval-augmented generation (RAG) techniques that extend their capabilities. PyAirbyte is also compatible with orchestration tools such as Airflow and Dagster, as well as LangChain, a framework for connecting LLMs to the external data they need.
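To illustrate how that LangChain fit might look, the sketch below wraps records read by PyAirbyte as LangChain documents ready to be chunked, embedded and loaded into a vector database for RAG. The "products" stream and the field mapping are assumptions made for the example:

```python
# Hypothetical sketch: preparing PyAirbyte output for a RAG pipeline.
import airbyte as ab
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Read from an illustrative connector (assumption: "source-faker"
# exposes a "products" stream).
source = ab.get_source(
    "source-faker",
    config={"count": 100},
    install_if_missing=True,
)
source.select_all_streams()
result = source.read()

# Wrap each record as a LangChain Document so it can be chunked and
# later embedded into a vector store.
docs = [
    Document(page_content=str(record), metadata={"stream": "products"})
    for record in result["products"]
]

# Split documents into chunks sized for embedding models.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
# `chunks` would then be embedded and written to a vector database.
```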
The challenge is that there are not enough data engineers available to handle these migrations, so there is now a greater need for tools that automate the process, noted Lafleur.
Airbyte already makes available an application programming interface (API) and a Terraform Provider that enable IT teams to programmatically invoke its platform. However, as Python becomes a de facto standard for automating IT service management (ITSM), the need for a library that is simpler to invoke has become apparent, said Lafleur.
Historically, IT teams have used extract, transform and load (ETL) tools to migrate data. Airbyte provides a tool that accomplishes the same task at a higher level of abstraction, thereby eliminating the need for data engineering expertise to migrate large amounts of data.
Both the amount of data being moved and the frequency with which migrations occur will naturally vary from one organization to another, but there is little doubt that what was once a comparatively rare event is now an almost daily occurrence within many organizations. As a result, IT administrators are increasingly being called upon to manage these tasks.
In the meantime, not only is the volume of data that needs to be managed increasing, but so are the types of data being processed, analyzed and stored. Many organizations are embracing data lakes to centralize the management of all that data, but, as always, there needs to be a mechanism for moving data from the point where it is created to those repositories.
Regardless of the tools selected, organizations of all sizes will revisit their data management strategies in the age of AI. Much of their data is conflicting or erroneous, so there is a pressing need to improve data quality. The first step in that effort is to centralize the repositories used to store that data, using tools such as PyAirbyte.
Unfortunately, the longer organizations wait to address that challenge, the more data there will be to manage once they finally decide to confront the inevitable.