Data Movement from Oracle to MongoDB Made Easy with Apache Kafka

Change Data Capture (CDC) features have existed for many years in the database world. CDC makes it possible to listen for changes to a database, such as inserts, updates, and deletes, and send those events to other database systems in scenarios like ETL, replication, and database migration. By leveraging Apache Kafka, the Confluent Oracle CDC Connector, and the MongoDB Connector for Apache Kafka, you can easily stream database changes from Oracle to MongoDB. In this post we will move data from Oracle to MongoDB, providing a step-by-step configuration that you can easily reuse, tweak, and explore.

[Architecture diagram: Oracle → Confluent Oracle CDC Connector → Apache Kafka (with KSQL) → MongoDB Connector for Apache Kafka → MongoDB]

At a high level, we will configure the architecture shown above in a self-contained Docker Compose environment that consists of the following:

  • Oracle Database
  • MongoDB
  • Apache Kafka
  • Confluent KSQL

These containers all run on a bridged local Docker network, so you can experiment with them from your local Mac or PC. Check out the GitHub repository to download the complete example.

Preparing the Oracle Docker image

If you have an existing Oracle database, remove the “database” section from the docker-compose file. If you do not already have an Oracle database, you can pull Oracle Database Enterprise Edition from Docker Hub. You will need to accept the Oracle terms and conditions, log in to your Docker account via docker login, and then run docker pull store/oracle/database-enterprise:12.2.0.1-slim to download the image locally.
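
For reference, the pull sequence looks like this (run after accepting the image terms on Docker Hub):

# Log in with the Docker account that accepted the Oracle terms
docker login

# Pull the Oracle Database 12.2.0.1 slim image used by the compose file
docker pull store/oracle/database-enterprise:12.2.0.1-slim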

Launching the docker environment

The docker-compose file will launch the following:

  • Apache Kafka including Zookeeper, REST API, Schema Registry, KSQL
  • Apache Kafka Connect
  • MongoDB Connector for Apache Kafka
  • Confluent Oracle CDC Connector
  • Oracle Database Enterprise

The complete sample code is available from a GitHub repository.

To launch the environment, make sure your Oracle environment is ready, git clone the repository, and then build and start the containers:

docker-compose up -d --build


Once the compose file finishes starting, you will need to configure your Oracle environment for use by the Confluent CDC Connector.
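
Before continuing, you can optionally confirm that all of the containers came up:

docker-compose ps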

Step 1: Connect to your Oracle instance

If you are running Oracle within the docker environment, you can use docker exec as follows:

docker exec -it oracle bash -c "source /home/oracle/.bashrc; sqlplus /nolog "

connect / as sysdba


Step 2: Configure Oracle for CDC Connector

First, check if the database is in archive log mode.

select log_mode from v$database;


If the mode is not “ARCHIVELOG”, perform the following:

SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;


Verify the archive mode:

select log_mode from v$database;


The LOG_MODE should now be “ARCHIVELOG”.

Next, enable supplemental logging for all columns:

ALTER SESSION SET CONTAINER=cdb$root;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;


The following should be run on the Oracle CDB:

CREATE ROLE C##CDC_PRIVS;
GRANT CREATE SESSION,
EXECUTE_CATALOG_ROLE,
SELECT ANY TRANSACTION,
SELECT ANY DICTIONARY TO C##CDC_PRIVS;
GRANT SELECT ON SYSTEM.LOGMNR_COL$ TO C##CDC_PRIVS;
GRANT SELECT ON SYSTEM.LOGMNR_OBJ$ TO C##CDC_PRIVS;
GRANT SELECT ON SYSTEM.LOGMNR_USER$ TO C##CDC_PRIVS;
GRANT SELECT ON SYSTEM.LOGMNR_UID$ TO C##CDC_PRIVS;
 
CREATE USER C##myuser IDENTIFIED BY password CONTAINER=ALL;
GRANT C##CDC_PRIVS TO C##myuser CONTAINER=ALL;
ALTER USER C##myuser QUOTA UNLIMITED ON sysaux;
ALTER USER C##myuser SET CONTAINER_DATA = (CDB$ROOT, ORCLPDB1) CONTAINER=CURRENT;
 
ALTER SESSION SET CONTAINER=CDB$ROOT;
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER, LOGMINING, EXECUTE_CATALOG_ROLE TO C##myuser CONTAINER=ALL;
GRANT SELECT ON GV_$DATABASE TO C##myuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##myuser CONTAINER=ALL;
GRANT SELECT ON GV_$ARCHIVED_LOG TO C##myuser CONTAINER=ALL;
GRANT CONNECT TO C##myuser CONTAINER=ALL;
GRANT CREATE TABLE TO C##myuser CONTAINER=ALL;
GRANT CREATE SEQUENCE TO C##myuser CONTAINER=ALL;
GRANT CREATE TRIGGER TO C##myuser CONTAINER=ALL;
 
ALTER SESSION SET CONTAINER=cdb$root;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
 
GRANT FLASHBACK ANY TABLE TO C##myuser;
GRANT FLASHBACK ANY TABLE TO C##myuser container=all;


Next, create some objects and insert sample data:

CREATE TABLE C##MYUSER.emp
(
   i INTEGER GENERATED BY DEFAULT AS IDENTITY,
   name VARCHAR2(100),
   lastname VARCHAR2(100),
   PRIMARY KEY (i)
) tablespace sysaux;
  
insert into C##MYUSER.emp (name, lastname) values ('Bob', 'Perez');
insert into C##MYUSER.emp (name, lastname) values ('Jane','Revuelta');
insert into C##MYUSER.emp (name, lastname) values ('Mary','Kristmas');
insert into C##MYUSER.emp (name, lastname) values ('Alice','Cambio');
commit;
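
If you want to double-check the inserted rows from your host machine, a minimal sketch (reusing the sysdba connection from Step 1) looks like this:

# Run a quick ad hoc query inside the Oracle container, non-interactively
docker exec oracle bash -c "source /home/oracle/.bashrc; sqlplus -S / as sysdba <<EOF
SELECT name, lastname FROM C##MYUSER.emp;
EXIT;
EOF"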


Step 3: Create Kafka Topic

Open a new terminal/shell and connect to your Kafka broker as follows:

docker exec -it broker /bin/bash


Once connected, create the Kafka topic:

kafka-topics --create --topic SimpleOracleCDC-ORCLCDB-redo-log \
--bootstrap-server broker:9092 --replication-factor 1 \
--partitions 1 --config cleanup.policy=delete \
--config retention.ms=120960000
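
Optionally, confirm the topic was created before moving on:

kafka-topics --describe --topic SimpleOracleCDC-ORCLCDB-redo-log \
--bootstrap-server broker:9092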


Step 4: Configure the Oracle CDC Connector

The oracle-cdc-source.json file in the repository contains the configuration for the Confluent Oracle CDC Connector. To register the connector, simply execute:

curl -X POST -H "Content-Type: application/json" -d @oracle-cdc-source.json  http://localhost:8083/connectors
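
You can verify the connector was registered and is running by querying the Kafka Connect REST API; the status endpoint below uses a placeholder, so substitute the "name" value from oracle-cdc-source.json:

# List all registered connectors
curl http://localhost:8083/connectors

# Check the status of the CDC source connector
curl http://localhost:8083/connectors/<connector-name>/status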


Step 5: Set up KSQL data flows within Kafka

As Oracle CRUD events arrive in the Kafka topic, we will use KSQL to stream these events into a new topic for consumption by the MongoDB Connector for Apache Kafka.

docker exec -it ksql-server /bin/bash

ksql http://127.0.0.1:8088


Enter the following commands:

CREATE STREAM CDCORACLE (I DECIMAL(20,0), NAME varchar, LASTNAME varchar, op_type VARCHAR) WITH ( kafka_topic='ORCLCDB-EMP', PARTITIONS=1, REPLICAS=1, value_format='AVRO');

CREATE STREAM WRITEOP AS
  SELECT CAST(I AS BIGINT) as "_id",  NAME ,  LASTNAME , OP_TYPE  from CDCORACLE WHERE OP_TYPE!='D' EMIT CHANGES;

CREATE STREAM DELETEOP AS
  SELECT CAST(I AS BIGINT) as "_id",  NAME ,  LASTNAME , OP_TYPE  from CDCORACLE WHERE OP_TYPE='D' EMIT CHANGES;


To verify the streams were created:

SHOW STREAMS;

This command will show the following:

Stream Name | Kafka Topic | Format 
------------------------------------
 CDCORACLE   | ORCLCDB-EMP | AVRO   
 DELETEOP    | DELETEOP    | AVRO   
 WRITEOP     | WRITEOP     | AVRO   
 ------------------------------------
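
As shown in the table, the WRITEOP and DELETEOP streams are backed by Kafka topics of the same name; you can also confirm those topics exist from the broker:

docker exec -it broker kafka-topics --list --bootstrap-server broker:9092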


Step 6: Configure MongoDB Sink

The following is the configuration for the MongoDB Connector for Apache Kafka:

{
  "name": "Oracle",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "WRITEOP",
    "connection.uri": "mongodb://mongo1",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneBusinessKeyTimestampStrategy",
    "database": "kafka",
    "collection": "oracle",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.overwrite.existing": "true",
    "document.id.strategy.partial.value.projection.type": "allowlist",
    "document.id.strategy.partial.value.projection.list": "_id",
    "errors.log.include.messages": true,
    "errors.deadletterqueue.context.headers.enable": true,
    "value.converter":"io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url":"http://schema-registry:8081",
    "key.converter":"org.apache.kafka.connect.storage.StringConverter"

  }
}


In this example, the sink process consumes records from the WRITEOP topic and saves the data to MongoDB. The write model, UpdateOneBusinessKeyTimestampStrategy, performs an upsert operation using the filter defined by the PartialValueStrategy property, which in this example is the "_id" field. For your convenience, this configuration is provided in the mongodb-sink.json file in the repository. To configure the sink, execute:

curl -X POST -H "Content-Type: application/json" -d @mongodb-sink.json  http://localhost:8083/connectors


Delete events are published to the DELETEOP topic and applied to MongoDB with the following sink configuration:

{
  "name": "Oracle-Delete",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "DELETEOP",
    "connection.uri": "mongodb://mongo1”,
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.DeleteOneBusinessKeyStrategy",
    "database": "kafka",
    "collection": "oracle",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.overwrite.existing": "true",
    "document.id.strategy.partial.value.projection.type": "allowlist",
    "document.id.strategy.partial.value.projection.list": "_id",
    "errors.log.include.messages": true,
    "errors.deadletterqueue.context.headers.enable": true,
    "value.converter":"io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url":"http://schema-registry:8081"

  }
}


This configuration is provided in the mongodb-sink-delete.json file in the repository. To configure the sink, execute:

curl -X POST -H "Content-Type: application/json" -d @mongodb-sink-delete.json  http://localhost:8083/connectors


This sink process uses the DeleteOneBusinessKeyStrategy write model strategy. In this configuration, the sink reads from the DELETEOP topic and deletes documents in MongoDB based on the filter defined by the PartialValueStrategy property, which in this example is the "_id" field.
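
With both sink connectors registered, you can confirm they are running via the Kafka Connect REST API, using the connector names from the two configurations above:

curl http://localhost:8083/connectors/Oracle/status
curl http://localhost:8083/connectors/Oracle-Delete/status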

Step 7: Write data to Oracle

Now that your environment is set up and configured, return to the Oracle database and insert the following data:

insert into C##MYUSER.emp (name, lastname) values ('Juan','Soto');
insert into C##MYUSER.emp (name, lastname) values ('Robert','Walters');
insert into C##MYUSER.emp (name, lastname) values ('Ruben','Trigo');
commit;


Next, watch the data arrive in MongoDB by accessing the MongoDB shell:

docker exec -it mongo1 /bin/mongo


The inserted data will now be available in MongoDB.
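
For example, a quick, non-interactive way to inspect the target collection (the kafka database and oracle collection names come from the sink configuration above):

# Print the documents in the kafka.oracle collection
docker exec mongo1 /bin/mongo kafka --eval 'db.oracle.find().pretty()'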


If we update the data in Oracle, e.g.:

UPDATE C##MYUSER.emp SET name='Rob' WHERE name='Robert';
COMMIT;


The document will be updated in MongoDB as:

{
        "_id" : NumberLong(11),
        "LASTNAME" : "Walters",
        "NAME" : "Rob",
        "OP_TYPE" : "U",
        "_insertedTS" : ISODate("2021-07-27T10:25:08.867Z"),
        "_modifiedTS" : ISODate("2021-07-27T10:25:08.867Z")
}


If we delete the data in Oracle, e.g.:

DELETE FROM C##MYUSER.emp WHERE name='Rob';
COMMIT;


The documents with name='Rob' will no longer be in MongoDB.
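
You can verify the delete propagated with a quick count against the same collection (the NAME field matches the document shown above):

# Count remaining documents whose NAME field is "Rob" (should return 0)
docker exec mongo1 /bin/mongo kafka --eval 'db.oracle.count({ NAME: "Rob" })'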

Note that it may take a few seconds for changes to propagate from Oracle to MongoDB.

Many possibilities

In this post we performed a basic setup moving data from Oracle to MongoDB via Apache Kafka, the Confluent Oracle CDC Connector, and the MongoDB Connector for Apache Kafka. While this example is fairly simple, you can add more complex transformations using KSQL and integrate other data sources within your Kafka environment, building a production-ready ETL or streaming environment with best-of-breed solutions.
