API Documentation

Below you will find details about the functions and modules of our project.

Consolidator module

Module that extracts the data needed to consolidate the opening data.

extract_from_excel(path)

Read the Excel (.xlsx) files in a folder such as data/input and return a list of DataFrames, one per file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to the folder containing the Excel files | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| list | List[DataFrame] | List of DataFrames |

Source code in app\ETL\extract.py
def extract_from_excel(path: str) -> List[pd.DataFrame]:
    """Read the Excel files in a folder such as ``data/input`` and return a list of DataFrames.

    Args:
        path (str): Path to the folder containing the ``.xlsx`` files.

    Returns:
        list: One DataFrame per Excel file found.

    Raises:
        ValueError: If the folder contains no ``.xlsx`` files.
    """
    all_files = glob.glob(os.path.join(path, '*.xlsx'))
    if not all_files:
        raise ValueError('No Excel files found in the specified folder')

    data_frame_list = []
    for file in all_files:
        data_frame_list.append(pd.read_excel(file))

    return data_frame_list
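
The file discovery step above relies on `glob.glob` with the `*.xlsx` pattern. The minimal sketch below shows what that pattern selects, using a temporary folder with empty placeholder files. The helper `list_excel_files` is hypothetical and not part of the project. It also sorts the matches, which the original function does not do, because `glob.glob` makes no ordering guarantee.

```python
import glob
import os
import tempfile
from typing import List


def list_excel_files(folder: str) -> List[str]:
    """Hypothetical helper: return the .xlsx paths directly inside `folder`."""
    # Same pattern as extract_from_excel; sorted() is an addition so that
    # the order of the resulting list is reproducible across runs.
    return sorted(glob.glob(os.path.join(folder, '*.xlsx')))


with tempfile.TemporaryDirectory() as folder:
    # Create empty placeholder files; only the names matter for matching.
    for name in ('b.xlsx', 'a.xlsx', 'notes.txt'):
        open(os.path.join(folder, name), 'w').close()
    found = [os.path.basename(p) for p in list_excel_files(folder)]

print(found)  # ['a.xlsx', 'b.xlsx']
```

The pattern is not recursive, so workbooks in subfolders are not picked up, and files with other extensions are ignored.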

Module with the transformations needed to consolidate the opening data.

concat_data_frames(data_frame_list)

Concatenate a list of DataFrames into a single DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data_frame_list | List[DataFrame] | List of DataFrames to concatenate | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A single DataFrame containing the rows of every input DataFrame |

Source code in app\ETL\transform.py
def concat_data_frames(data_frame_list: List[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate a list of DataFrames into a single DataFrame.

    Args:
        data_frame_list (List[pd.DataFrame]): List of DataFrames to concatenate.

    Returns:
        pd.DataFrame: A single DataFrame with a fresh 0..n-1 index.

    Raises:
        ValueError: If the list is empty.
    """
    if not data_frame_list:
        raise ValueError('No data to transform')
    return pd.concat(data_frame_list, ignore_index=True)
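
To make the effect of `ignore_index=True` concrete, here is a small sketch with two made-up DataFrames. The column names and values are illustrative assumptions, not project data.

```python
import pandas as pd

# Two illustrative frames standing in for two input workbooks.
df_a = pd.DataFrame({'store': ['A', 'B'], 'sales': [10, 20]})
df_b = pd.DataFrame({'store': ['C'], 'sales': [30]})

# Without ignore_index each frame keeps its own row labels (0, 1, 0).
kept = pd.concat([df_a, df_b])

# With ignore_index=True, as in concat_data_frames, rows are relabelled 0..n-1.
combined = pd.concat([df_a, df_b], ignore_index=True)

print(kept.index.tolist())      # [0, 1, 0]
print(combined.index.tolist())  # [0, 1, 2]
```

Relabelling avoids duplicate index values in the consolidated result, which matters if later steps select rows by label.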

Module that saves the consolidated opening data.

load_excel(data_frame, output_path, file_name)

Save a DataFrame as an Excel file.

Parameters:

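| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data_frame | DataFrame | DataFrame to be saved as an Excel file | required |
| output_path | str | Folder where the file will be saved (created if it does not exist) | required |
| file_name | str | Name of the file to be saved | required |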

Returns:

| Name | Type | Description |
| --- | --- | --- |
| None | None | |

Source code in app\ETL\load.py
def load_excel(
    data_frame: pd.DataFrame, output_path: str, file_name: str
) -> None:
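    """Save a DataFrame as an Excel file.

    Args:
        data_frame (pd.DataFrame): DataFrame to be saved as an Excel file.
        output_path (str): Folder where the file will be saved.
        file_name (str): Name of the file to be saved.

    Returns:
        None:

    Raises:
        TypeError: If ``data_frame`` is not a pandas DataFrame.
        ValueError: If ``file_name`` is empty.
        PermissionError: If the file cannot be written for lack of permission.
        RuntimeError: If saving fails for any other reason.
    """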
    if not isinstance(data_frame, pd.DataFrame):
        raise TypeError('`data_frame` must be a pandas DataFrame.')
    if not file_name:
        raise ValueError('`file_name` cannot be empty.')
    # Create the output folder if needed; exist_ok avoids a separate existence check.
    os.makedirs(output_path, exist_ok=True)
    try:
        data_frame.to_excel(os.path.join(output_path, file_name), index=False)
        print('File saved successfully!')
    except PermissionError as e:
        raise PermissionError(
            errno.EACCES, f'Permission denied: {output_path}'
        ) from e
    except Exception as e:
        raise RuntimeError(
            f'Failed to save DataFrame as Excel file: {e}'
        ) from e
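
The error handling above wraps low-level failures and chains them with `raise ... from e`. The sketch below isolates that pattern so it can be run without writing a real file. `save_with_wrapped_errors` and the two failing writers are hypothetical stand-ins for the `to_excel` call, not project code.

```python
import errno


def save_with_wrapped_errors(write, output_path: str) -> None:
    """Hypothetical stand-in mirroring the try/except structure of load_excel."""
    try:
        write()
    except PermissionError as e:
        # The more specific handler must come first: PermissionError is a
        # subclass of Exception and would otherwise be caught below.
        raise PermissionError(
            errno.EACCES, f'Permission denied: {output_path}'
        ) from e
    except Exception as e:
        raise RuntimeError(
            f'Failed to save DataFrame as Excel file: {e}'
        ) from e


def denied():
    raise PermissionError('file is open in another program')


def disk_full():
    raise OSError('disk full')


caught = []
for write in (denied, disk_full):
    try:
        save_with_wrapped_errors(write, 'data/output')
    except (PermissionError, RuntimeError) as err:
        caught.append(err)

# Each wrapped error keeps the original exception in __cause__.
for err in caught:
    print(type(err).__name__, '<-', type(err.__cause__).__name__)
```

Because of the chaining, a traceback shows both the wrapper message and the underlying cause, which helps when diagnosing why a save failed.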

This module contains functions for the ETL process.

pipeline_completa(input_folder, output_folder, output_file_name)

ETL function: extract, transform, and load data from Excel files.

Parameters: input_folder (str), output_folder (str) and output_file_name (str).

Source code in app\ETL\pipeline.py
def pipeline_completa(input_folder, output_folder, output_file_name):
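    """ETL function: extract, transform, and load data from Excel files.

    Args:
        input_folder (str): Folder containing the Excel files to read.
        output_folder (str): Folder where the consolidated file is saved.
        output_file_name (str): Name of the consolidated Excel file.
    """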
    data = extract_from_excel(input_folder)
    consolidated_df = concat_data_frames(data)
    load_excel(consolidated_df, output_folder, output_file_name)
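
The pipeline runs its three steps strictly in sequence and handles no exceptions itself, so an error in an early step stops the run before anything is written. The sketch below illustrates only that control flow. It assumes hypothetical stand-in steps that work on plain lists instead of DataFrames, and `run_pipeline` is not part of the project.

```python
from typing import Callable, List


def run_pipeline(
    extract: Callable[[], List[list]],
    transform: Callable[[List[list]], list],
    load: Callable[[list], None],
) -> None:
    """Hypothetical stand-in with the same extract -> transform -> load order."""
    data = extract()
    consolidated = transform(data)
    load(consolidated)


saved = []  # records what the stand-in load step received


def extract_ok() -> List[list]:
    return [[1, 2], [3]]


def extract_empty() -> List[list]:
    # Mirrors the error extract_from_excel raises for a folder without .xlsx files.
    raise ValueError('No Excel files found in the specified folder')


def flatten(chunks: List[list]) -> list:
    return [item for chunk in chunks for item in chunk]


run_pipeline(extract_ok, flatten, saved.append)

try:
    run_pipeline(extract_empty, flatten, saved.append)
    error = None
except ValueError as e:
    error = str(e)

print(saved)  # [[1, 2, 3]]
print(error)  # No Excel files found in the specified folder
```

A caller of `pipeline_completa` that needs to continue after a failed run would catch these exceptions at the call site in the same way.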