Source code is available here
If you’re reading this, you probably already know what PostgreSQL, Elasticsearch, and Docker are. I hope so 😎 because I won’t be focusing on that.
The aim of this article is to show you how easily you can sync data between PostgreSQL and Elasticsearch. By syncing I mean that every time you create, update, or delete data in the database, those changes are immediately mirrored in Elasticsearch.
So, let’s get started!
Create Elastic and Kibana services
Kibana is an open-source data visualization plugin for Elasticsearch. It provides a variety of tools for data manipulation, and it also gives us a console for making queries directly against the Elasticsearch instance. It’s something similar to GraphQL Playground (if you’ve ever worked with that).
So, our docker-compose could look like this:
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.4
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - 'ES_JAVA_OPTS=-Xms512m -Xmx512m'
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - '9200:9200'
  kibana:
    image: docker.elastic.co/kibana/kibana:7.13.4
    depends_on:
      - elasticsearch
    ports:
      - '5601:5601'
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
Of course, you could run just Elasticsearch on its own. But with the help of Kibana you can easily make requests to Elasticsearch and view (and also create) data via a UI.
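Once you eventually run docker-compose up, a quick way to check that Elasticsearch is reachable is to query its root endpoint (the port matches the mapping above); Kibana’s UI will be at http://localhost:5601:

curl http://localhost:9200
# returns a JSON object with the cluster name and version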
PostgreSQL service
We’ll need to customize the image a bit later, so there’s no point using the base image directly in the docker-compose file. Create a PostgresDockerfile and paste this:
FROM postgres:13
Then add this to the docker-compose file under services:
  database:
    build:
      context: .
      dockerfile: PostgresDockerfile
    container_name: database
    ports:
      - '5432:5432'
    environment:
      - POSTGRES_PASSWORD=postgres
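For the examples later in this article, I’ll assume a simple, hypothetical users table; the name and columns are purely illustrative. Note that the table has to exist before PGSync (introduced next) bootstraps, so create it once the database container is up:

docker exec -it database psql -U postgres -c \
  "CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT, email TEXT);"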
PGSync
PGSync is a tool that’ll help us reach our goal! You can read more about it here. All we need is Redis (because PGSync uses a pub-sub mechanism) and a schema file. So, add these lines to the docker-compose file:
  redis:
    image: 'redis:alpine'
    container_name: redis
    ports:
      - '6379:6379'
This tool is written in Python, so we’ll need to create a Python container. As we can see from the docs, we should also change some settings in postgresql.conf.
So, create a pgsync directory and a PgsyncDockerfile at the root of the project.
FROM python:3.7-slim
WORKDIR /usr/src/app
COPY ./pgsync ./
RUN chmod +x ./entrypoint.sh
RUN pip install pgsync==2.1.1
# moreutils (for sponge), jq, and wait-for-it are all used in entrypoint.sh
RUN apt update \
  && apt install -y moreutils jq wait-for-it
ENTRYPOINT ["bash", "./entrypoint.sh"]
To apply additional configuration to the Postgres container, we can create as many SQL files as we want and copy them into the /docker-entrypoint-initdb.d directory of the container. Nice! They will be executed at container initialization.
So, let’s create conf.sql with the following content:
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET max_replication_slots = 1;
-- set this value to the number of tables you want to load into elastic
And change PostgresDockerfile to this:
FROM postgres:13
COPY ./pgsync/conf.sql /docker-entrypoint-initdb.d/
RUN chmod a+r /docker-entrypoint-initdb.d/conf.sql
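ALTER SYSTEM writes these settings to postgresql.auto.conf, and since init scripts run against a temporary server that is restarted once initialization finishes, the new values are active by the time the container is ready. If you want to double-check (a quick manual test, using the container name from the compose file):

docker exec -it database psql -U postgres -c 'SHOW wal_level;'
# should print: logical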
OK, as you saw, we should also have an entrypoint.sh in our pgsync directory. This is the main file, and it will be executed when we run the pgsync container.
#!/usr/bin/env bash

# wait until Postgres, Redis, and Elasticsearch actually accept connections
wait-for-it $PG_HOST:5432 -t 60
wait-for-it $REDIS_HOST:6379 -t 60
wait-for-it $ELASTICSEARCH_HOST:9200 -t 60

# inject the database name from $PG_DATABASE into schema.json in place
jq '.[].database = env.PG_DATABASE' schema.json | sponge schema.json

bootstrap --config ./schema.json
pgsync --config ./schema.json -d
We use the jq utility here to set the database name in our schema file dynamically, based on the PG_DATABASE environment variable, when the container starts. The sponge utility (from moreutils) soaks up jq’s output before writing it back to the same file, which a plain redirect can’t do safely.
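To see what the filter does, you can run it by hand against the schema file we’ll define below (assuming PG_DATABASE is set in your shell):

export PG_DATABASE=postgres
jq '.[].database = env.PG_DATABASE' schema.json
# prints the array with "database": "postgres" in every entry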
We also use the wait-for-it script to wait for our services to accept connections. Just writing depends_on in the docker-compose file won’t always work, believe me! It only waits for the containers to start, not for the services inside them to be ready.
Schema file
Create schema.json inside the pgsync directory.
[
  {
    "database": "db_name",
    "index": "elastic-index-name",
    "nodes": {
      "table": "table-name",
      "schema": "public",
      "columns": []
    }
  }
]
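For example, a filled-in schema for the hypothetical users table created earlier might look like this (the database value is just a placeholder, since entrypoint.sh overwrites it with PG_DATABASE):

[
  {
    "database": "postgres",
    "index": "users",
    "nodes": {
      "table": "users",
      "schema": "public",
      "columns": ["id", "name", "email"]
    }
  }
]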
More info about schema definition is available here.
Service
Add this to the docker-compose file:
  pgsync:
    build:
      context: .
      dockerfile: PgsyncDockerfile
    container_name: pgsync
    environment:
      - PG_USER=postgres
      - PG_PASSWORD=postgres
      - PG_DATABASE=postgres
      - PG_HOST=database
      - ELASTICSEARCH_HOST=elasticsearch
      - REDIS_HOST=redis
Run it!
That’s it. Just run everything using docker-compose up -d. Now, if you CRUD some data in the database table you’re syncing, the changes will be mirrored to Elasticsearch almost immediately.
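For example, with the hypothetical users table and schema from above:

# insert a row in Postgres
docker exec -it database psql -U postgres -c \
  "INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');"

# a moment later the document shows up in Elasticsearch
curl 'http://localhost:9200/users/_search?pretty'

# if nothing appears, the pgsync logs are the first place to look
docker logs -f pgsync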
Of course, there are other techniques for syncing data, but this one is the easiest in my opinion. Maybe we’ll talk about them next time 🤙
You can find the complete example in my GitHub repository.