from_parquet

classmethod DataFrame.from_parquet(fname, name, roles=None, ignore=False, dry=False)

Create a DataFrame from a parquet file.

This is one of the fastest ways to get data into the getML engine.

Args:

fname (str):

The path of the parquet file to be read.

name (str):

Name of the data frame to be created.

roles (dict[str, List[str]] or Roles, optional):

Maps the roles to the column names (see colnames()).

The roles dictionary is expected to have the following format:

roles = {getml.data.roles.numeric: ["colname1", "colname2"],
         getml.data.roles.target: ["colname3"]}

Alternatively, you can pass an instance of the Roles class.
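The roles dictionary above is simply a mapping from a role to a list of column names. As a plain-Python sketch (not part of the getML API), the hypothetical helper `invert_roles` below flips that mapping into one role per column, which is a quick way to sanity-check a hand-written dictionary before passing it in. It assumes that the role constants are plain strings such as "numeric" and "target", and that a column should be listed under at most one role.

```python
from typing import Dict, List

# Assumption: getml.data.roles.numeric and getml.data.roles.target are plain
# strings; these hypothetical stand-ins avoid needing a running engine.
NUMERIC = "numeric"
TARGET = "target"


def invert_roles(roles: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical helper: map each column name to its single role."""
    col_to_role: Dict[str, str] = {}
    for role, colnames in roles.items():
        for col in colnames:
            # Assumption: a column may appear under at most one role.
            if col in col_to_role:
                raise ValueError(
                    f"column {col!r} is listed under both "
                    f"{col_to_role[col]!r} and {role!r}"
                )
            col_to_role[col] = role
    return col_to_role


roles = {NUMERIC: ["colname1", "colname2"], TARGET: ["colname3"]}
print(invert_roles(roles))
# {'colname1': 'numeric', 'colname2': 'numeric', 'colname3': 'target'}
```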

ignore (bool, optional):

Only relevant when roles is not None. Determines what happens to columns that are not mentioned in roles: if True, they are ignored; if False, they are read in as unused columns.
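The effect of ignore can be modelled in a few lines of plain Python. The hypothetical function `assign_roles` below is only a sketch of the documented behavior, not getML's implementation: given the column names found in the file and a roles dictionary, it returns the role each column would end up with. The single placeholder label `UNUSED` is an assumption standing in for whichever unused role the engine actually assigns.

```python
from typing import Dict, List

# Hypothetical placeholder for "read in as an unused column".
UNUSED = "unused"


def assign_roles(
    colnames: List[str], roles: Dict[str, List[str]], ignore: bool
) -> Dict[str, str]:
    """Sketch of the documented ignore semantics (hypothetical helper)."""
    # Look up the role of every column mentioned in the roles dictionary.
    mentioned = {col: role for role, cols in roles.items() for col in cols}
    result: Dict[str, str] = {}
    for col in colnames:  # keep the column order of the file
        if col in mentioned:
            result[col] = mentioned[col]
        elif not ignore:
            result[col] = UNUSED
        # With ignore=True, columns not mentioned in roles are skipped.
    return result


file_columns = ["id", "colname1", "colname3", "comment"]
roles = {"numeric": ["colname1"], "target": ["colname3"]}

print(assign_roles(file_columns, roles, ignore=True))
# {'colname1': 'numeric', 'colname3': 'target'}
print(assign_roles(file_columns, roles, ignore=False))
# {'id': 'unused', 'colname1': 'numeric', 'colname3': 'target', 'comment': 'unused'}
```

The only difference between the two calls is whether "id" and "comment" survive: with ignore=True they are dropped, with ignore=False they are kept under the placeholder unused label.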

dry (bool, optional):

If set to True, then the data will not actually be read. Instead, the method will only return the roles it would have used. This can be used to hard-code roles when setting up a pipeline.
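One way to hard-code the result of a dry run is to render it as a literal that can be pasted into the pipeline code. The sketch below assumes the dry run returns a plain dictionary in the same format as roles (depending on the version it may instead return a Roles object); `roles_to_source` is a hypothetical helper, and the role and column names are made up. `pprint.pformat` sorts dictionary keys, so the generated text does not depend on the order in which the roles were returned.

```python
import pprint
from typing import Dict, List


def roles_to_source(roles: Dict[str, List[str]], var: str = "roles") -> str:
    """Hypothetical helper: render a roles mapping as a Python assignment."""
    # pformat sorts dictionary keys, so the output is stable across runs.
    return f"{var} = {pprint.pformat(roles)}"


# In practice this would come from something like
#   DataFrame.from_parquet(fname, name, dry=True)
# here it is a made-up example in the format documented above.
dry_run_roles = {"target": ["colname3"], "numeric": ["colname1", "colname2"]}

source = roles_to_source(dry_run_roles)
print(source)
# roles = {'numeric': ['colname1', 'colname2'], 'target': ['colname3']}
```

The printed line can then be committed alongside the pipeline and passed back as `roles=roles` on later calls, so the column roles no longer depend on what the engine infers at load time.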