OctoSQL is a free and open source software to join data from multiple databases and files using SQL. It allows you to fetch data from sources like MySQL, CSV, Redis, PostgreSQL and shows that to you. You can use different type of joins using SQL to get the data and then save it. This is a command line tool and all you have to do is just execute a SQL query and get the data. Simple as that. To read local or remote database instance, it uses a configuration file that you can setup easily. If you know SQL Joins then you can easily use it to get data from multiple sources in any shape.
There are many database tools out there which let you connect to a database and do various things. But the problem is that they can’t do that for different type of databases. That’s where OctoSQL comes in handy. Here using this tool, you can connect to different type of databases and then you can simply get the data you want. You just need to have SQL knowledge. The tool is cross-platform and you can run it directly from MAC, Linux and, Windows command line. The default output format is ASCII table like but you can opt to get output in CSV, TSV, and JSON.
How to Join Data from Multiple Databases and Files using SQL?
For this post, I will be using OctoSQL on Windows to join data from a MySQL database and a CSV file. The first thing you have to do is to download OctoSQL Windows binary and then put it somewhere on your PC. For simplicity, you can rename it to “octosql.exe” and then you will have to create a configuration file. Create a “octosql.yaml” file in the same folder where you have saved the EXE file of OctoSQL.
Now, it is time to add database connection details in the configuration file. It uses a specific syntax for the database configuration in which you have to give a connection name, destination path, user name, password, etc. in my case, you can see the configuration as below. For other type of databases, you can see configuration parameters here.
dataSources:
- name: cats
type: csv
config:
path: "~/Documents/cats.csv"
- name: people
type: redis
config:
address: "localhost:6379"
password: ""
databaseIndex: 0
databaseKeyName: "id"
Now, you are ready to execute queries from command line and join data from multiple sources. The basic syntax of joining data from multiple databases is given below. Here you need to have knowledge of SQL Joins. Also, if you want then you can use a few SQL aggregate functions to refine the final data.
octosql "select c.naam, p.email from cust c JOIN people p ON p.id=c.ied"
In the above command, you can see the data it generated in my case. The default format is tabular but if you want to get the final data in other format then you will have to add special switch in the command and specify the format in which you want the data. See the example command below and its output in the screenshot.
octosql "select c.naam, p.email from cust c JOIN people p ON p.id=c.ied" -o OutputFormat
This way, you can easily execute different SQL commands to join data from multiple sources. This tool is very powerful and can even handle complex queries which include aggregate and other advanced SQL functions. By default, you will get the output data on the CLI but you can save that in a file as well. Just add “> abc.csv” to save the output in “abc” CSV file which will be created in current directory.
Closing thoughts
If you work with data in different data sources then OctoSQL is the tool for you. Here you just have to use SQL to get data from multiple sources and then save it in any format you want. You can use any JOIN in SQL and combine data from multiple tables and do whatever you want. So, if you are looking for a free database tool to get data from multiple sources then this post will be useful.