Parquet schema

Hackolade imports the schema from .parquet files located on your local file system or on a shared directory, and represents it as an Entity Relationship Diagram and schema structure. When multiple files are selected, you can either combine the schemas of the selected files (default) or create a separate schema in the model for each selected file.

 

Parquet Cloud Selection - combine schemas
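The difference between the two multi-file options can be sketched as follows. This is a minimal illustration, not Hackolade's implementation: schemas are modeled as plain column-to-type dictionaries, the file names and columns are hypothetical, and the "first type seen wins" rule for columns that appear in several files is an assumption made only to keep the example short.

```python
# Illustrative only: each schema is a {column_name: type_name} dict.
# This is NOT Hackolade's algorithm, just a sketch of the two options.

def combine_schemas(schemas):
    """Merge several file schemas into one, keeping first-seen column order."""
    combined = {}
    for schema in schemas:
        for column, type_name in schema.items():
            # Assumption: keep the first type seen for a repeated column.
            combined.setdefault(column, type_name)
    return combined


def separate_schemas(named_schemas):
    """Keep one independent schema per file, keyed by file name."""
    return {name: dict(schema) for name, schema in named_schemas.items()}


# Hypothetical schemas read from two .parquet files.
files = {
    "orders_2023.parquet": {"order_id": "int64", "amount": "double"},
    "orders_2024.parquet": {"order_id": "int64", "currency": "string"},
}

combined = combine_schemas(files.values())  # one schema with three columns
separate = separate_schemas(files)          # two schemas, one per file
```

With the combined option, the model receives a single schema containing the union of the columns; with the separate option, it receives one schema per selected file.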

 

Parquet files can also be reverse-engineered from AWS S3, Azure Blob Storage, and Google Cloud Storage.

 

 

1) AWS S3

Give the connection a meaningful name so you can identify it later, and provide the URI of your S3 bucket and, optionally, a folder path.

 

Parquet Cloud Storage - AWS S3 connection
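To show how a bucket URI with an optional folder path breaks down, here is a small sketch. It assumes the common `s3://bucket/folder` form; the exact URI format accepted by the connection dialog is not stated on this page, and the bucket and folder names are hypothetical.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, folder_path); folder may be empty."""
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket = parts.netloc
    # Drop leading and trailing slashes so "" means "no folder given".
    folder = parts.path.strip("/")
    return bucket, folder


# Hypothetical bucket and folder names.
bucket, folder = split_s3_uri("s3://example-bucket/landing/parquet/")
bucket_only, no_folder = split_s3_uri("s3://example-bucket")
```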

 

 

If the S3 bucket is private, you must also provide authentication parameters (Access Key ID and Secret Access Key):

Parquet Cloud Storage - AWS S3 authentication

 

If you prefer to handle AWS authentication through the credentials file, you may leave the Access Key ID and Secret Access Key fields blank. Hackolade then supplies credentials following the recommendations described here.
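For reference, the AWS shared credentials file (typically `~/.aws/credentials`) is an INI-style file with one section per profile. The sketch below parses a sample of that layout with the Python standard library, only to show which keys the file carries. The key values are placeholders, and the `read_profile` helper is hypothetical; it does not describe how Hackolade itself reads the file.

```python
import configparser

# Sample content in the layout of the AWS shared credentials file.
# Both values are placeholders, not real keys.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = exampleSecretKey
"""


def read_profile(text, profile="default"):
    """Return (access_key_id, secret_access_key) for the given profile."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]


key_id, secret = read_profile(SAMPLE_CREDENTIALS)
```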

 

 

2) Azure Blob Storage

Give the connection a meaningful name so you can identify it later, and provide the Container name and Storage account name.

 

Parquet Cloud Storage - Azure connection

 

If the storage account is private, you must also provide your Storage access key:

Parquet Cloud Storage - Azure authentication

 

 

If you wish to filter files, you may enter a file name prefix:

Parquet Cloud Storage - Azure prefix
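The effect of a file name prefix can be pictured with a short sketch. It assumes a plain, case-sensitive "starts with" match against the blob name; the page does not state the exact matching rules, and the blob names below are hypothetical.

```python
def filter_by_prefix(blob_names, prefix):
    """Keep only the blob names that start with the given prefix."""
    return [name for name in blob_names if name.startswith(prefix)]


# Hypothetical blob names in a container.
blobs = [
    "sales/2024/jan.parquet",
    "sales/2024/feb.parquet",
    "hr/staff.parquet",
]

selected = filter_by_prefix(blobs, "sales/2024/")
```

Under these assumptions, only the two blobs under `sales/2024/` remain candidates for reverse-engineering.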

 

3) Google Cloud Storage

Give the connection a meaningful name so you can identify it later, and provide the URI of your GCS bucket and, optionally, a folder path.

Parquet Cloud Storage - Google connection

 

If the bucket is private, you must also provide access to the Private key:

Parquet Cloud Storage - Google authentication