Data transformation and storage

The fundamental tools and methods for data processing
Modern analytical solutions can process data in both on-premises and cloud environments, within a single data center or across different continents. Reports and visualizations can be generated quickly, even for large volumes of data.

A growing number of apps and devices generate data that needs processing. The traditional approach, in which data is prepared and transferred via centralized storage using a single platform, no longer meets business needs. Modern scalable solutions require the use of a wide range of tools and ecosystems.

ETL & CDC
The traditional data transformation procedure, ETL (Extract, Transform, Load), includes stages such as data loading, cleansing, mapping, consolidation, and export to a target application. Change Data Capture (CDC) technology is now gaining ground rapidly: instead of reprocessing entire data sets, it captures only the changes made at the source, which makes it possible to process large data sets in real time. Combining CDC with automated application of those changes keeps the information in target applications synchronized with the source.
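The synchronization idea behind CDC can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the change-event format (`op`, `key`, `row`) and the `apply_changes` helper are hypothetical and do not correspond to any particular CDC product, and the target is a plain in-memory dictionary rather than a real application.

```python
from typing import Any

# Hypothetical change-event format (an assumption for this sketch):
# each event names an operation, a primary key, and, for inserts and
# updates, the new column values.
ChangeEvent = dict[str, Any]


def apply_changes(target: dict[int, dict[str, Any]], events: list[ChangeEvent]) -> None:
    """Replay captured changes, in order, against an in-memory target table."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: merge the new values into the existing row, if any.
            target[key] = {**target.get(key, {}), **event["row"]}
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")


target_table = {1: {"name": "Alice", "city": "Oslo"}}
change_log = [
    {"op": "update", "key": 1, "row": {"city": "Bergen"}},
    {"op": "insert", "key": 2, "row": {"name": "Bob", "city": "Turku"}},
    {"op": "delete", "key": 1},
]
apply_changes(target_table, change_log)
print(target_table)  # {2: {'name': 'Bob', 'city': 'Turku'}}
```

Only the three change events are processed, not the whole source table, which is what keeps this approach cheap as data volumes grow.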
Python and frameworks
Its simplicity and the availability of a wide range of open-source libraries have made Python one of the most commonly used data analysis tools. With Python, you can easily create scripts for data loading and transformation (ETL) that extend the built-in features of standard BI platforms. Open-source libraries and frameworks, such as Apache Airflow, help automate data management and ETL tasks.
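As a rough illustration of such a script, the sketch below runs a tiny extract, transform, and load sequence using only the Python standard library. The sample data, the `orders` table, and the function names are assumptions made for this example; a production pipeline would read from real sources and would typically be scheduled by an orchestrator.

```python
import csv
import io
import sqlite3

# Made-up sample input standing in for a source file.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,Carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and normalize names and types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleansing step: skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 17.75
```

Keeping the three stages as separate functions makes each one easy to test on its own and easy to wrap as an individual task in a scheduler later.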
Fast database management systems
Transactional DBMSs cannot return data quickly in response to an analytical query because they were designed for a different usage scenario. Their task is to write a stream of small portions of data to disk and ensure its integrity (the OLTP scenario). Analytical queries (the OLAP scenario) require a DBMS in which information is stored differently and can be retrieved much faster. This is made possible by column-oriented storage, in-memory computing, and other techniques that optimize the execution of analytical queries.
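The difference between the two storage layouts can be shown with a toy example. The sketch below is not how a real DBMS is implemented; it only contrasts a row-oriented layout, where an aggregate has to visit every complete record, with a column-oriented layout, where the same aggregate reads a single contiguous list. The sample data and function names are assumptions for illustration.

```python
# Row-oriented layout: each record is stored together (typical for OLTP).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Column-oriented layout: all values of one attribute are stored together
# (typical for analytical DBMSs).
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_from_rows(records):
    """Aggregate by touching every full record, even unused fields."""
    return sum(record["amount"] for record in records)


def total_from_columns(cols):
    """Aggregate by reading only the one column the query needs."""
    return sum(cols["amount"])


print(total_from_rows(rows))        # 250.0
print(total_from_columns(columns))  # 250.0
```

Both functions return the same result; the point is that the column layout lets an analytical query skip the attributes it does not need, which matters when tables have many columns and billions of rows.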

Alternative solutions

Hadoop ecosystem
Hadoop, a framework that scales data processing and storage across clusters of computers, has its own ecosystem of projects and technologies. Open-source products within this ecosystem solve a variety of data transformation and storage tasks typical for businesses. Today, the Hadoop ecosystem is supported by the majority of cloud providers and adjacent solutions, including Amazon, Microsoft, Qlik, and Tableau.
Virtualization and microservice architecture
The execution of analytical applications and services within containers helps to dramatically reduce the time required to transfer them from a test environment into production. As a result, less time is required to integrate business applications, implement new indicators, and connect new data sources. Managing system resources for containers makes it easier to scale solutions. The use of containers opens up new ways to embed data processing into existing business processes.
Layered transformation framework (LTF)
Data is stored separately at each stage of transformation. The created indicators are passed from model to model, making it possible to filter, enrich, cleanse, and perform other transformations in each layer independently. Data handling procedures become more transparent, while unified storage rules within a single layer simplify development, because the ETL processes of each layer can be controlled separately. Solutions built this way scale well, and new data sources can be connected without reworking the existing layers.
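The layered idea can be sketched as follows. The layer names used here (`raw`, `cleansed`, `mart`) and the two transformation functions are assumptions chosen for illustration, not a prescribed set of layers; the point is only that each layer's output is stored separately and each transformation depends on nothing but the layer directly below it.

```python
def to_cleansed(raw_records):
    """Cleansed layer: drop incomplete records and normalize values."""
    return [
        {"region": rec["region"].strip().upper(), "amount": float(rec["amount"])}
        for rec in raw_records
        if rec.get("region") and rec.get("amount") not in (None, "")
    ]


def to_mart(cleansed_records):
    """Mart layer: aggregate the cleansed data into a reporting indicator."""
    totals = {}
    for rec in cleansed_records:
        totals[rec["region"]] = totals.get(rec["region"], 0.0) + rec["amount"]
    return totals


# Each layer is kept as its own data set, so it can be inspected,
# rebuilt, or tested without touching the others.
layers = {}
layers["raw"] = [
    {"region": " eu ", "amount": "100"},
    {"region": "US", "amount": "40.5"},
    {"region": "", "amount": "5"},  # incomplete record, removed on cleansing
    {"region": "eu", "amount": "20"},
]
layers["cleansed"] = to_cleansed(layers["raw"])
layers["mart"] = to_mart(layers["cleansed"])

print(layers["mart"])  # {'EU': 120.0, 'US': 40.5}
```

Because the raw layer is left untouched, a change to the cleansing rules only requires rebuilding the layers above it, not re-extracting data from the sources.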

Why are these tools and technologies chosen?

  • The technology has reached the maturity required for use in corporate settings.
  • Skilled professionals are available on the labor market.
  • The technology can meet objectives specific to both mid-sized companies and large enterprises.
  • Ready-to-use practices and cases are available.
  • There is a clear technology or ecosystem development roadmap.
  • Solutions based on this technology are easy to develop, implement and support.
  • They enable you to create a competitive edge.