read_s3
DataFrame.read_s3(bucket, keys, region, append=False, sep=',', num_lines_read=0, skip=0, colnames=None, time_formats=None)
Read CSV files from an S3 bucket.
Note: S3 is not supported on Windows.
Unless colnames is passed, the first line of each CSV file is assumed to be a header containing the column names.
- Args:
- bucket (str):
The bucket from which to read the files.
- keys (List[str]):
The list of keys (files in the bucket) to be read.
- region (str):
The region in which the bucket is located.
- append (bool, optional):
If a data frame object with the same name already exists in the getML engine, should the content of the CSV files in keys be appended to it, or should it replace the existing data?
- sep (str, optional):
The separator used for separating fields.
- num_lines_read (int, optional):
Number of lines read from each file. Set to 0 to read in the entire file.
- skip (int, optional):
Number of lines to skip at the beginning of each file.
- colnames (List[str] or None, optional):
The first line of a CSV file usually contains the column names. When this is not the case, you need to explicitly pass them.
- time_formats (List[str], optional):
The list of formats tried when parsing time stamps.
The formats are allowed to contain the following special characters:
%w - abbreviated weekday (Mon, Tue, …)
%W - full weekday (Monday, Tuesday, …)
%b - abbreviated month (Jan, Feb, …)
%B - full month (January, February, …)
%d - zero-padded day of month (01 .. 31)
%e - day of month (1 .. 31)
%f - space-padded day of month ( 1 .. 31)
%m - zero-padded month (01 .. 12)
%n - month (1 .. 12)
%o - space-padded month ( 1 .. 12)
%y - year without century (70)
%Y - year with century (1970)
%H - hour (00 .. 23)
%h - hour (00 .. 12)
%a - am/pm
%A - AM/PM
%M - minute (00 .. 59)
%S - second (00 .. 59)
%s - seconds and microseconds (equivalent to %S.%F)
%i - millisecond (000 .. 999)
%c - centisecond (0 .. 9)
%F - fractional seconds/microseconds (000000 - 999999)
%z - time zone differential in ISO 8601 format (Z or +NN.NN)
%Z - time zone differential in RFC format (GMT or +NNNN)
%% - percent sign
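The specifiers above differ from Python's own strptime directives in several places (for example, %w is an abbreviated weekday here, and %a is am/pm). The following is a minimal sketch for checking a candidate format against a sample value locally before passing it as time_formats. The mapping table and both helper functions are hypothetical and cover only a subset of the specifiers; they are not part of getML, the sketch assumes the formats are tried in list order, and the engine's actual parser may behave differently in edge cases.

```python
from datetime import datetime

# Assumed equivalents between a subset of the specifiers listed above
# and Python's strptime directives. This table is illustrative only.
TO_STRPTIME = {
    "w": "%a", "W": "%A", "b": "%b", "B": "%B",
    "d": "%d", "e": "%d", "m": "%m", "n": "%m",
    "y": "%y", "Y": "%Y", "H": "%H", "h": "%I",
    "a": "%p", "A": "%p", "M": "%M", "S": "%S",
    "F": "%f", "%": "%%",
}


def to_strptime(fmt):
    """Translate a format string written with the specifiers above."""
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%":
            if i + 1 >= len(fmt):
                raise ValueError("dangling '%' at end of format")
            spec = fmt[i + 1]
            if spec not in TO_STRPTIME:
                raise ValueError("no equivalent assumed for %" + spec)
            out.append(TO_STRPTIME[spec])
            i += 2
        else:
            # Literal characters such as '-' or ':' are copied unchanged.
            out.append(fmt[i])
            i += 1
    return "".join(out)


def first_matching_format(value, formats):
    """Return the first format in the list that parses value, else None."""
    for fmt in formats:
        translated = to_strptime(fmt)  # raises on unsupported specifiers
        try:
            datetime.strptime(value, translated)
        except ValueError:
            continue
        return fmt
    return None


candidates = ["%Y/%m/%d", "%Y-%m-%d %H:%M:%S"]
matched = first_matching_format("2019-04-03 12:30:00", candidates)
```

A result of None from first_matching_format suggests that none of the candidate formats fits the sample, so another entry may need to be added to the list.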
- Returns:
DataFrame: Handler of the underlying data.
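A call might be put together as sketched below. The bucket name, keys, and region are placeholders, and load_from_s3 is a hypothetical wrapper, not part of getML. The sketch assumes data_frame is an existing getML DataFrame handle; creating that handle, starting the engine, and configuring S3 credentials are outside its scope.

```python
# Placeholder values; replace them with your own bucket, keys and region.
read_s3_args = dict(
    bucket="my-example-bucket",
    keys=["data/part_1.csv", "data/part_2.csv"],
    region="us-east-2",
    append=False,        # replace any existing data of the same name
    sep=",",
    num_lines_read=0,    # 0 reads each file in full
    skip=0,
    colnames=None,       # take the column names from each file's header
    time_formats=["%Y-%m-%d %H:%M:%S"],
)


def load_from_s3(data_frame, args):
    """Call read_s3 on a DataFrame handle and return what it returns."""
    return data_frame.read_s3(**args)
```

Keeping the arguments in one dict makes it easy to reuse the same settings for several data frames or to log them alongside a run.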