What is a PARQUET file?
A PARQUET file is a dataset saved in the Apache Parquet format. It contains column-based data split into row groups. Projects that use Apache Hadoop software utilities often process data saved in the PARQUET format.
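The Python sketch below is a toy in-memory model, not Parquet's real on-disk encoding. It only shows how row-oriented records can be regrouped so that each row group stores all of a column's values together. The function name `split_into_row_groups` and its `group_size` parameter are hypothetical. Real Parquet writers also divide column chunks into pages and apply encoding and compression, none of which appears here.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
# In this toy model a row group maps each column name to that column's values.
RowGroup = Dict[str, List[Any]]


def split_into_row_groups(rows: List[Row], columns: List[str], group_size: int) -> List[RowGroup]:
    """Pivot row-oriented records into column-oriented groups of at most group_size rows."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    groups: List[RowGroup] = []
    for start in range(0, len(rows), group_size):
        batch = rows[start:start + group_size]
        # Within one group, every value of a given column is stored contiguously.
        groups.append({col: [row.get(col) for row in batch] for col in columns})
    return groups


rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lima"},
    {"id": 3, "city": "Pune"},
]
for group in split_into_row_groups(rows, ["id", "city"], group_size=2):
    print(group)
# {'id': [1, 2], 'city': ['Oslo', 'Lima']}
# {'id': [3], 'city': ['Pune']}
```

Storing a column's values side by side is what lets a reader load only the columns a query needs.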
Apache Hadoop is an open-source software library used to process large data sets within a distributed computing system. Projects built with Hadoop typically tackle complex problems that require analyzing very large amounts of data.
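Any project that uses Hadoop can store column-based data in PARQUET files. Each PARQUET file begins and ends with the 4-byte magic number PAR1. Between these markers are the row groups, each holding one column chunk per column, followed by a footer that contains the file metadata, such as the schema and the location of every column chunk. The 4 bytes just before the closing PAR1 record the footer's length, so a reader can locate the metadata by reading from the end of the file.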
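A minimal Python sketch, assuming a local file path, that checks the PAR1 framing and extracts the raw footer bytes. In the Parquet layout the file ends with a 4-byte little-endian footer length followed by a second PAR1, which is what the code relies on. The function name `read_raw_footer` is hypothetical. The sketch stops at the undecoded bytes; the Parquet specification encodes the footer with Apache Thrift's compact protocol, so a real reader would pass these bytes to a Thrift decoder or use an existing Parquet library.

```python
import os
import struct

MAGIC = b"PAR1"


def read_raw_footer(path: str) -> bytes:
    """Validate the PAR1 framing and return the undecoded footer (file metadata) bytes."""
    size = os.path.getsize(path)
    # Smallest possible frame: leading magic (4) + footer length (4) + trailing magic (4).
    if size < 12:
        raise ValueError("file too small to be a Parquet file")
    with open(path, "rb") as f:
        if f.read(4) != MAGIC:
            raise ValueError("missing leading PAR1 magic number")
        # The last 8 bytes are the footer length followed by the closing magic number.
        f.seek(size - 8)
        tail = f.read(8)
        if tail[4:] != MAGIC:
            raise ValueError("missing trailing PAR1 magic number")
        # Footer length is stored as a 4-byte little-endian integer.
        (footer_len,) = struct.unpack("<I", tail[:4])
        if footer_len > size - 12:
            raise ValueError("footer length exceeds file size")
        # The footer sits immediately before the length field.
        f.seek(size - 8 - footer_len)
        return f.read(footer_len)
```

Reading from the end with `seek` avoids loading the row groups into memory, which matters for the large files Hadoop projects usually handle.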
How to open a PARQUET file
You can read and manipulate PARQUET files with Parquet-tools (multiplatform), which is available both as a standalone tool and as part of Apache Hadoop.