Convert a pandas.DataFrame to a pyarrow.Table to create a Dataset. The column types in the resulting Arrow Table are inferred from the dtypes of the pandas.Series in the DataFrame. In the case of non-object Series, the NumPy dtype is translated to its Arrow equivalent.

preserve_index (bool, optional): Whether to store the index as an additional column in the resulting Dataset. The default of None stores the index as a column, except for RangeIndex, which is stored as metadata only. Use preserve_index=True to force it to be stored as a column.
info (DatasetInfo, optional): Dataset information, like description, citation, etc.

**config_kwargs (additional keyword arguments): Keyword arguments to be passed to the BuilderConfig and used in the DatasetBuilder.

DatasetInfo documents datasets, including its name, version, and features. See the constructor arguments and properties for a full list. Not all fields are known on construction and may be updated later.

homepage (str): A URL to the official homepage for the dataset.
license (str): The name of the license or a paragraph containing the terms of the license.
features (Features, optional): The features used to specify the dataset's column types.
post_processed (PostProcessedInfo, optional): Information regarding the resources of a possible post-processing of a dataset. For example, it can contain the information of an index.
supervised_keys (SupervisedKeysData, optional): Specifies the input feature and the label for supervised learning if applicable for the dataset (legacy from TFDS).
builder_name (str, optional): The name of the GeneratorBasedBuilder subclass used to create the dataset. Usually matched to the corresponding script name. It is also the snake_case version of the dataset builder class name.
config_name (str, optional): The name of the configuration derived from BuilderConfig.
splits (dict, optional): The mapping between split name and metadata.
download_checksums (dict, optional): The mapping between the URL to download the dataset's checksums and corresponding metadata.
download_size (int, optional): The size of the files to download to generate the dataset, in bytes.
post_processing_size (int, optional): Size of the dataset in bytes after post-processing, if any.
dataset_size (int, optional): The combined size in bytes of the Arrow tables for all splits.
size_in_bytes (int, optional): The combined size in bytes of all files associated with the dataset (downloaded files + Arrow files).
task_templates (List[TaskTemplate], optional): The task templates to prepare the dataset for during training and evaluation. Each template casts the dataset's Features to standardized column names and types as detailed in datasets.tasks.