dsq is a free command line tool to run SQL queries against JSON, CSV, Excel, and Parquet files. Here this is an open source program and one of the built-in utilities in Datastation. You can install this standalone tool on your PC and use it to query different Excel compatible files. You can run simple as well complex SQL statements against a file and generate/extract data in the way you want.
This command line tool can read a file directly, or you can make it read data from standard input. And you can later pipe the output to a file in order to save the result of a certain SQL query. This is a cross-platform tool as well, but you have to manually do a build-install as there are no binaries available for now. But I hope there will be in the later updates.
Free tool to run SQL queries against JSON, CSV, Excel, Parquet Files
In order to install dsq for now, you need to have Golang installed on your PC. You can download Golang for your PC from here. After that, you can simply use it in the way you want. So, to install it, you can just the command below. Also, you have to make sure that you have Git installed. If you are on Linux or MAC then you probably have all these tools already installed. On Windows, however, you will have to install them in order to get started.
go install github.com/multiprocessio/dsq@latest
After dsq has been installed successfully, you can start using it right away. Here is the very basic command to read a CSV file and print it on the console.
./dsq cities.csv
To run a SQL query on a CSV file, use the following syntax of the dsq command. It will print the output of the SQL query on the command window, and then you can either save it or do whatever you want.
./dsq cities.csv "select City from {}"
Another example with SQL functions is below. Remember that it uses a SQLite database underneath, so all the SQLite commands are supported.
./dsq cities.json "select Min(LonD) from {}"
./dsq cities.json "select Max(LonD) from {}"
Apart from CSV, you can run it on other type of files as well, such JSON. The process is very similar to what you follow for CSV files. Just specify the query and it will generate the output as JSON as well. See this in the screenshot below.
./dsq cities.json "select City from {}"
In this way, you can use this simple and powerful command line tool to run SQL queries on CSV, JSON, and Parquet files. I didn’t have any Parquet files, so couldn’t test it on them. But I am confident that it will work on those files as well without any problem. If you want to see some technical details about it, then you can check out its GitHub repository.
Final thoughts:
If you often work with CSV or JSON files, then I think you should have a tool like this on your PC. It can be helpful to quickly extract some data from large Excel CSV or JSON files. Just run a SQL command and use its output wherever you want. This is as simple as that.