Prepfile
Introducing Prepfile
If you have worked with tools such as Docker, you will have noticed that they let the user declare the operations to perform in simple files such as Dockerfile or compose.yml.
Preprocess CLI implements a similar idea with the Prepfile.toml.
The Prepfile.toml contains three sections:
- Data section [data]
- Preprocess section [preprocess]
- Postprocess section [postprocess]
In the data section you provide all the information needed to read the dataset on which the operations will be applied. The preprocess section lists the operations to apply to the dataset. The postprocess section describes what to do once the preprocessing steps have run.
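As a rough sketch of the overall layout only, the three sections map to three TOML tables. The comments are placeholders, and this skeleton is not a working Prepfile on its own because the mandatory parameters are omitted:

```toml
# Layout sketch only: each table is filled in as described in "Prepfile tags".

[data]
# how to read the dataset

[preprocess]
# operations applied to the dataset

[postprocess]
# what to do after the preprocessing steps
```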
Prepfile tags
[data]
Parameter | Default Value | Description |
---|---|---|
filename | N/A | The path to the dataset. This is a mandatory parameter. |
csv_separator | N/A | The separator used when reading the CSV file. Accepts any single character, for example ";", "\t" or " ". |
decimal_separator | "." | The decimal separator used when reading the dataset. Typical values are "," and ".". |
encoding | "utf-8" | The character set encoding. Possible values are "utf-8" and "latin-1". |
missing_identifier | "" | How missing values are represented in the dataset. For most data this is the empty string "", but some datasets explicitly fill missing values with "NA", "N/A", … |
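For illustration, a [data] section that sets every parameter from the table above might look like the following. The file path and the chosen values are hypothetical and describe a semicolon-separated file that uses decimal commas:

```toml
[data]
filename = "indicators.csv"   # mandatory; hypothetical path to the dataset
csv_separator = ";"           # single-character field separator
decimal_separator = ","       # overrides the default "."
encoding = "latin-1"          # overrides the default "utf-8"
missing_identifier = "NA"     # overrides the default ""
```

Parameters that have a default value can presumably be left out when the default already matches the dataset.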
[preprocess]
[preprocess.numerics]
[preprocess.texts]
The following syntax
[preprocess.texts]
operations = [
    {op = "fillna", method = "mean"}
]
is equivalent to:
[[preprocess.texts.operations]]
op = "fillna"
method = "mean"
However, the two styles cannot be mixed. Choose one and use it throughout the Prepfile.
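One case where mixing fails comes from TOML itself rather than from Preprocess CLI: once a key has been defined as an inline array, the TOML specification does not allow appending to it with a `[[...]]` header. The sketch below keeps the conflicting lines commented out so that the fragment stays valid. Whether Preprocess CLI adds further restrictions on top of this is not stated here.

```toml
[preprocess.texts]
operations = [
    {op = "fillna", method = "mean"}
]

# Uncommenting the lines below makes the file invalid TOML, because
# "operations" is already defined above as a static (inline) array:
#
# [[preprocess.texts.operations]]
# op = "fillna"
# method = "mean"
```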
[[preprocess.columns]]
[postprocess]
[postprocess]
dropcolumns = ["BALANCE","SURFACE_AREA","EXPORTS_GOOD_SERVICES"]
format = 'csv'
sortdataset = {descending = false}
filename = 'indicators_numerics_cleaned.csv'
Last updated 31 May 2025, 21:32 +0200 .