GlueSync - RDBMS and NoSQL data replication

GlueSync is a software product for real-time, event-based data replication from RDBMS to NoSQL databases and vice versa, as well as between NoSQL databases.

This means that you can replicate data to and from relational and non-relational databases, and between non-relational databases, in real time, using native technologies officially supported and maintained by each database vendor. GlueSync can be deployed in any cloud, virtual or containerized environment, as well as on-prem, even on bare-metal servers.

Support for relational-to-relational database replication is set to become available with our upcoming major release.

You can read more about GlueSync’s native approach in our blog article The GlueSync Journey.

In this documentation you can find the installation and configuration steps necessary to set up GlueSync in your infrastructure and connect an RDBMS instance with a NoSQL database and vice versa, or establish a replication between NoSQL databases. But before jumping into the details, let’s talk about a few core concepts.

Core concept behind GlueSync’s architecture

One of the main considerations we made when designing the GlueSync architecture was its native ability to be scaled up and deployed with ease, just as you are used to doing with your container-based applications: pull-config-deploy-enjoy. That’s the motto. You gain full control of what happens under the hood, without being forced to work through GUIs that would not let you harness the full potential of this data replication suite.

GlueSync is shipped as containers. If you haven’t read about containers yet, you can have a look at the official Docker homepage for a reference on a common container environment.

This doesn’t mean that you can’t run GlueSync if you are unable to run a containerized environment in your infrastructure. On the contrary! You can ask our team to provide you with the package for your specific destination platform in order to run it on-prem, even on bare-metal servers.

The following diagram gives an architectural overview of a GlueSync environment.

a diagram illustrating the architectural overview of GlueSync

Design

The design was driven by the purpose of each core functionality provided by the suite: the ability to replicate data from a relational database to a non-relational database, and vice versa. These two functionalities are called "ways" or "directions".

So, rather than building a single monolithic piece of general-purpose software, we decoupled its functionalities into a self-contained, highly resilient and specialized service for each "direction". Each service is capable of replicating, logging, alerting and being monitored by itself, without the need for a central master authority that would only have increased complexity and introduced a single point of failure into the overall architecture.

That said, the outcome for our users is the ability to decide what to deploy for each use case: you can choose to deploy only the module that replicates data from MS SQL Server to MongoDB, or only the one for the opposite direction, depending on your specific use case. In that way you get the fine-grained control over permissions, security and performance that you deserve from a product made for real-world production use cases.

Understanding CDC vs GDC

When sourcing changes from a relational database there are just a few ways to audit the writes, updates and deletions performed at field-row level. The most common approach, but also the most challenging, is reading the database transaction logs, which luckily nowadays are wrapped in an API layer called CDC (Change Data Capture). CDC gives application developers and DBAs a way to read through the logs and understand the entire history of the changes made within a certain time frame.

We used the term "challenging" because every database vendor has implemented its own way of exposing these logs, so building a tool compatible with all of them is a challenge in itself. In addition, specific vendors or database versions (especially older ones) sometimes provide neither CDC nor low-level APIs to read the transaction logs.

In order to provide wider compatibility in capturing real-time change streams from the vast majority of relational databases out there, we decided to develop a fine-tuned set of UDFs (user-defined functions) that, together with a set of triggers, helps GlueSync broaden its compatibility with relational databases while maintaining a safe, fast and secure approach whose entire lifecycle is managed by the GlueSync engine itself. We called this feature GDC (GlueSync Data Capture). It is used for database brands, versions or editions that do not currently support the native CDC technique (or do not support it at all), and for those that we decided to make compatible through this feature first, with CDC provided out of the box in the future.
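To make the idea of reading a change history more concrete, here is a minimal Python sketch of how a consumer might apply an ordered stream of row-level changes to a document-style target. The `ChangeEvent` shape, the `apply_changes` function and the checkpoint handling are assumptions made for illustration only; they do not describe GlueSync's internals or any vendor's CDC API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change, in a hypothetical shape a CDC feed might expose."""
    position: int                  # monotonically increasing log position
    op: str                        # "insert", "update" or "delete"
    key: str                       # primary key of the affected row
    row: Optional[Dict[str, Any]]  # changed column values; None for deletes


def apply_changes(events: Iterable[ChangeEvent],
                  target: Dict[str, Dict[str, Any]],
                  checkpoint: int = 0) -> int:
    """Apply events newer than `checkpoint` to `target`, in log order.

    Returns the new checkpoint, so a restarted reader can resume
    where it stopped instead of replaying the whole history.
    """
    for event in sorted(events, key=lambda e: e.position):
        if event.position <= checkpoint:
            continue  # already applied in a previous run
        if event.op == "delete":
            target.pop(event.key, None)
        elif event.op in ("insert", "update"):
            # Merge, so that partial updates keep the untouched fields.
            target[event.key] = {**target.get(event.key, {}), **(event.row or {})}
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
        checkpoint = event.position
    return checkpoint


log = [
    ChangeEvent(1, "insert", "42", {"name": "Ada", "city": "London"}),
    ChangeEvent(2, "update", "42", {"city": "Turin"}),
    ChangeEvent(3, "insert", "43", {"name": "Linus"}),
    ChangeEvent(4, "delete", "43", None),
]
documents: Dict[str, Dict[str, Any]] = {}
last_position = apply_changes(log, documents)
```

Keeping the last applied position is what lets a reader of this kind stop and resume without losing or duplicating changes.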
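As a rough illustration of trigger-based change capture in general, the following Python sketch uses the standard `sqlite3` module with an in-memory database. The table names, trigger names and the `read_changes` helper are hypothetical and chosen only for this example; the actual UDFs and triggers that GlueSync Data Capture installs are not shown in this documentation and may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical change table: one row per captured change.
CREATE TABLE customers_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op  TEXT NOT NULL,
    id  INTEGER NOT NULL
);

CREATE TRIGGER customers_ai AFTER INSERT ON customers
BEGIN
    INSERT INTO customers_changes (op, id) VALUES ('I', NEW.id);
END;

CREATE TRIGGER customers_au AFTER UPDATE ON customers
BEGIN
    INSERT INTO customers_changes (op, id) VALUES ('U', NEW.id);
END;

CREATE TRIGGER customers_ad AFTER DELETE ON customers
BEGIN
    INSERT INTO customers_changes (op, id) VALUES ('D', OLD.id);
END;
""")

# Ordinary application writes: the triggers record each one.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")
conn.commit()


def read_changes(connection, after_seq=0):
    """Return captured changes with a sequence number above `after_seq`."""
    return connection.execute(
        "SELECT seq, op, id FROM customers_changes WHERE seq > ? ORDER BY seq",
        (after_seq,),
    ).fetchall()


changes = read_changes(conn)
```

A replication process built this way would poll the change table, remember the highest `seq` it has processed, and ask only for newer rows on the next pass.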

Naming conventions

To nickname each GlueSync direction, we adopted the following nomenclature for each replication component:

  • from relational (RDBMS) to non-relational databases (NoSQL) has been labeled SQL to NoSQL

  • from non-relational (NoSQL) to relational databases (RDBMS) has been labeled NoSQL to SQL

  • from non-relational (NoSQL) to non-relational databases (NoSQL) has been labeled NoSQL to NoSQL

NoSQL to NoSQL data replication is a newly added feature in this version.

What you’re going to need

Before proceeding, for each GlueSync instance that you plan to deploy, please check that you have the following information:

  • RDBMS connection details, like:

    • Username

    • Password

    • Connection string (IP address / port)

    • Table names

  • NoSQL database connection details, like:

    • Destination, either bucket or database

    • Connection string (IP address / port)

    • Username

    • Password
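The checklist above can be turned into a small pre-flight check. The sketch below is only an illustration in Python: the `RdbmsDetails` and `NoSqlDetails` classes, their field names and the `missing_fields` helper are hypothetical and are not GlueSync's configuration format.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class RdbmsDetails:
    """Hypothetical holder for the RDBMS details listed above."""
    username: str = ""
    password: str = ""
    host: str = ""          # IP address part of the connection string
    port: int = 0
    table_names: List[str] = field(default_factory=list)


@dataclass
class NoSqlDetails:
    """Hypothetical holder for the NoSQL details listed above."""
    destination: str = ""   # bucket or database name
    host: str = ""
    port: int = 0
    username: str = ""
    password: str = ""


def missing_fields(details) -> List[str]:
    """Return the names of the fields that are still empty."""
    return [f.name for f in fields(details) if not getattr(details, f.name)]


# Example values are placeholders, not defaults of any real deployment.
source = RdbmsDetails(username="gluesync", password="secret",
                      host="10.0.0.5", port=1433)
still_needed = missing_fields(source)
```

In this example `still_needed` contains only `table_names`, which signals that the list of tables to replicate has not been collected yet.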

If you’re involved in the implementation of GlueSync, or just trying it out via our trial program, we highly suggest having a visual query editor tool at hand in order to bootstrap multiple datasource connections and easily import, edit and display data. The tools tested by the GlueSync product and QA teams are:

As MOLO17 we neither provide support for these specific tools nor advertise them: you are free to use the toolset you prefer to connect to and query your database(s).

Also, do not miss our section dedicated to tutorials and use cases of GlueSync.

Compatibility matrix

Non-relational databases (NoSQL)

| Vendor / edition / version | GlueSync compatibility | Technology used |
|---|---|---|
| Aerospike | ✅ from GlueSync v1.3.4, starting from version 5.X and above, all editions | Compatible as a target through the official SDK; ability to read the CDC stream planned for Q1 2023 |
| Azure CosmosDB | ⏱ support coming soon | - |
| Cassandra | ⏱ support coming soon | - |
| Couchbase | ✅ from GlueSync v1.0, starting from version 5.5 and above, all editions | Native CDC via Eventing service, also available via Sync Gateway; works via official SDK |
| CouchDB | ⏱ support coming soon | - |
| DynamoDB | ✅ from GlueSync v1.4 | Native CDC via DynamoDB Streams; works via official SDK |
| MongoDB | ✅ from GlueSync v1.3, starting from version 3.6 and above, all editions | Native CDC via Change Streams; works via official SDK |
| RavenDB | ⏱ support coming by Q1 2023 | - |
| Redis | ⏱ support coming by Q1 2023 | - |
| ScyllaDB | ⏱ support coming by Q1 2023 | - |

BigData store

| Vendor / edition / version | GlueSync compatibility | Technology used |
|---|---|---|
| Amazon AWS S3 | ✅ from GlueSync v1.3.4 | Write-to performed through official SDK |
| Apache HBase | ✅ from GlueSync v1.4 | Native CDC using HBase Java SDK |
| Azure data lake | ⏱ write-to support coming soon | Writes will be performed through official SDK |
| Databricks | ⏱ write-to support coming soon | Writes will be performed through official SDK |
| Google Cloud Storage | ✅ from GlueSync v1.4 | Write-to performed through official SDK |
| Snowflake | ⏱ write-to support coming soon | Writes will be performed through official SDK |

Relational databases (RDBMS)

| Vendor / edition / version | GlueSync compatibility | Technology used |
|---|---|---|
| DB2 for series i (AS/400), DB2 for z/OS, DB2 LUW | ✅ from GlueSync v1.4.1 on DB2 LUW (tested from version 11.5); support for z/OS and i series to be added soon | On LUW, via GlueSync Data Capture (GDC) |
| MariaDB | ✅ from GlueSync v1.3.3, tested from version 10.0 and above | Via GlueSync Data Capture (GDC) |
| Microsoft SQL Server and Microsoft SQL Azure | ✅ from GlueSync v1.0, all editions | Native CDC via Change Tracking from version 2016, or via GlueSync Data Capture (GDC) for older versions |
| MySQL | ✅ from GlueSync v1.3.3, tested from version 8.0 and above | Via GlueSync Data Capture (GDC) |
| Oracle Database | ✅ from GlueSync v1.2, all editions | Native CDC via XStream APIs starting from 11.2g (11.2.0.4), or via GlueSync Data Capture (GDC) for older versions |
| PostgreSQL | ✅ from GlueSync v1.3.3, tested from version 9.0 and above | Via GlueSync Data Capture (GDC) |
| SAP HANA | ⏱ CDC support coming soon | - |
| SingleStore | ⏱ write-to support planned, CDC support coming soon | - |
| Sybase SQL (Adaptive Server Enterprise, ASE, SAP Sybase) | ✅ from GlueSync v1.3.3, tested from version ASE 15.7 and above | Via GlueSync Data Capture (GDC) |
| Sybase SQL Anywhere | ✅ from GlueSync v1.3.3, to be tested | Via GlueSync Data Capture (GDC) |
| YugabyteDB | ⏱ CDC support coming soon | - |

Tested means that each version from the mentioned tag upwards is covered by the integration test suite and by the performance benchmarks that run on every commit, in order to ensure no regressions and the best quality outcome. Versions older than those included in our test suites might work, but are not currently battle-tested for production use. If you would like to test a specific database version that has not yet been made compatible, you are more than welcome to join our beta program: in that case, drop us a line at this email address telling us that you would like to be part of the beta program for a specific db version tag.
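The "tested from version X and above" rule can be expressed as a simple version comparison. The following Python sketch copies a few minimum versions from the relational compatibility matrix above; the `MIN_TESTED` table and the `is_tested` helper are hypothetical names, and the parser assumes purely numeric, dot-separated version strings (a string such as an ASE service-pack label would need extra handling).

```python
from typing import Dict, Tuple

# Minimum tested versions, copied from the compatibility matrix above.
MIN_TESTED: Dict[str, Tuple[int, ...]] = {
    "DB2 LUW": (11, 5),
    "MariaDB": (10, 0),
    "MySQL": (8, 0),
    "PostgreSQL": (9, 0),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '8.0.36' into (8, 0, 36); assumes numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def _pad(version: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad with zeros so that '10' compares equal to '10.0'."""
    return version + (0,) * (width - len(version))


def is_tested(vendor: str, version: str) -> bool:
    """True if `version` is at or above the minimum tested version."""
    minimum = MIN_TESTED[vendor]
    actual = parse_version(version)
    width = max(len(minimum), len(actual))
    return _pad(actual, width) >= _pad(minimum, width)
```

For instance, `is_tested("MySQL", "8.0.36")` is true, while `is_tested("MySQL", "5.7")` is false: such a version might still work, but falls outside the tested range described above.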

Minimum system requirements

  • a containerized environment is suggested (Docker, K8s, OpenShift, …);

  • 2 vCPUs and 2 GB of RAM;

  • 1 GB free disk space, used for logging redaction.
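If you want to check a host against these minimums before deploying, a sketch along the following lines could help. It is written in Python with the standard library only; the function names are hypothetical, "GB" is interpreted as GiB, and the RAM detection relies on `os.sysconf`, which is not available on every platform (the check is skipped when it cannot be determined). Note that inside a container the reported CPU count may reflect the host rather than the container's limits.

```python
import os
import shutil
from typing import List, Optional

MIN_CPUS = 2
MIN_RAM_BYTES = 2 * 1024**3        # 2 GB of RAM, read here as GiB
MIN_FREE_DISK_BYTES = 1 * 1024**3  # 1 GB of free disk space


def unmet_requirements(cpus: int,
                       ram_bytes: Optional[int],
                       free_disk_bytes: int) -> List[str]:
    """Return a human-readable list of requirements that are not met."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"need at least {MIN_CPUS} vCPUs, found {cpus}")
    if ram_bytes is not None and ram_bytes < MIN_RAM_BYTES:
        problems.append("need at least 2 GB of RAM")
    if free_disk_bytes < MIN_FREE_DISK_BYTES:
        problems.append("need at least 1 GB of free disk space")
    return problems


def total_ram_bytes() -> Optional[int]:
    """Best-effort RAM detection; None where os.sysconf cannot provide it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


report = unmet_requirements(
    cpus=os.cpu_count() or 1,
    ram_bytes=total_ram_bytes(),
    free_disk_bytes=shutil.disk_usage(".").free,
)
```

An empty `report` means the machine running the script meets all three minimums listed above.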