上一篇链接:
Python pandas有好几百个库函数,你都用过吗(5)_Hann Yang的博客-CSDN博客
DataFrame 类方法(211个,其中包含18个子类、2个子模块)
>>> import pandas as pd
>>> funcs = [_ for _ in dir(pd.DataFrame) if 'a'<=_[0]<='z']
>>> len(funcs)
211
>>> for i,f in enumerate(funcs,1):
print(f'{f:18}',end='' if i%5 else '\n')
abs add add_prefix add_suffix agg
aggregate align all any append
apply applymap asfreq asof assign
astype at at_time attrs axes
backfill between_time bfill bool boxplot
clip columns combine combine_first compare
convert_dtypes copy corr corrwith count
cov cummax cummin cumprod cumsum
describe diff div divide dot
drop drop_duplicates droplevel dropna dtypes
duplicated empty eq equals eval
ewm expanding explode ffill fillna
filter first first_valid_index flags floordiv
from_dict from_records ge get groupby
gt head hist iat idxmax
idxmin iloc index infer_objects info
insert interpolate isin isna isnull
items iteritems iterrows itertuples join
keys kurt kurtosis last last_valid_index
le loc lookup lt mad
mask max mean median melt
memory_usage merge min mod mode
mul multiply ndim ne nlargest
notna notnull nsmallest nunique pad
pct_change pipe pivot pivot_table plot
pop pow prod product quantile
query radd rank rdiv reindex
reindex_like rename rename_axis reorder_levels replace
resample reset_index rfloordiv rmod rmul
rolling round rpow rsub rtruediv
sample select_dtypes sem set_axis set_flags
set_index shape shift size skew
slice_shift sort_index sort_values sparse squeeze
stack std style sub subtract
sum swapaxes swaplevel tail take
to_clipboard to_csv to_dict to_excel to_feather
to_gbq to_hdf to_html to_json to_latex
to_markdown to_numpy to_parquet to_period to_pickle
to_records to_sql to_stata to_string to_timestamp
to_xarray to_xml transform transpose truediv
truncate tshift tz_convert tz_localize unstack
update value_counts values var where
xs
Series 类方法刚好也有211个:
>>> funcs = [_ for _ in dir(pd.Series) if 'a'<=_[0]<='z']
>>> len(funcs)
211
>>> for i,f in enumerate(funcs,1):
print(f'{f:18}',end='' if i%5 else '\n')
abs add add_prefix add_suffix agg
aggregate align all any append
apply argmax argmin argsort array
asfreq asof astype at at_time
attrs autocorr axes backfill between
between_time bfill bool cat clip
combine combine_first compare convert_dtypes copy
corr count cov cummax cummin
cumprod cumsum describe diff div
divide divmod dot drop drop_duplicates
droplevel dropna dt dtype dtypes
duplicated empty eq equals ewm
expanding explode factorize ffill fillna
filter first first_valid_index flags floordiv
ge get groupby gt hasnans
head hist iat idxmax idxmin
iloc index infer_objects interpolate is_monotonic
is_monotonic_decreasingis_monotonic_increasingis_unique isin isna
isnull item items iteritems keys
kurt kurtosis last last_valid_index le
loc lt mad map mask
max mean median memory_usage min
mod mode mul multiply name
nbytes ndim ne nlargest notna
notnull nsmallest nunique pad pct_change
pipe plot pop pow prod
product quantile radd rank ravel
rdiv rdivmod reindex reindex_like rename
rename_axis reorder_levels repeat replace resample
reset_index rfloordiv rmod rmul rolling
round rpow rsub rtruediv sample
searchsorted sem set_axis set_flags shape
shift size skew slice_shift sort_index
sort_values sparse squeeze std str
sub subtract sum swapaxes swaplevel
tail take to_clipboard to_csv to_dict
to_excel to_frame to_hdf to_json to_latex
to_list to_markdown to_numpy to_period to_pickle
to_sql to_string to_timestamp to_xarray tolist
transform transpose truediv truncate tshift
tz_convert tz_localize unique unstack update
value_counts values var view where
xs
两者同名的方法有181个,另各有30个不同名的:
>>> A,B = [_ for _ in dir(pd.DataFrame) if 'a'<=_[0]<='z'],[_ for _ in dir(pd.Series) if 'a'<=_[0]<='z']
>>> len(set(A)&set(B))
181
>>> len(set(A)|set(B))
241
>>> len(set(A)-set(B))
30
>>> len(set(B)-set(A))
30
>>> for i,f in enumerate(set(A)-set(B),1):
print(f'{f:18}',end='' if i%5 else '\n')
boxplot to_html from_dict to_xml info
corrwith eval to_parquet to_records join
stack columns melt iterrows to_feather
applymap to_stata style pivot set_index
assign itertuples lookup query select_dtypes
from_records insert merge to_gbq pivot_table
>>>
>>> for i,f in enumerate(set(B)-set(A),1):
print(f'{f:18}',end='' if i%5 else '\n')
factorize nbytes between to_list str
argsort rdivmod argmax tolist item
is_monotonic_increasingdt autocorr is_monotonic_decreasingview
repeat name array map dtype
divmod to_frame unique ravel searchsorted
hasnans is_unique is_monotonic cat argmin
>>>
>>> for i,f in enumerate(set(A)&set(B),1):
print(f'{f:18}',end='' if i%5 else '\n')
lt get reorder_levels reindex_like rfloordiv
rtruediv gt diff index update
add_prefix swapaxes reset_index mod reindex
product apply set_flags to_numpy cumprod
min transpose kurtosis to_latex median
eq last_valid_index rename pow all
loc to_pickle squeeze divide duplicated
to_json sort_values astype resample shape
to_xarray to_period kurt ffill idxmax
plot to_clipboard cumsum nlargest var
add abs any tshift nunique
count combine keys values set_axis
isnull sparse first_valid_index combine_first ewm
notnull empty mask truncate to_csv
bool at clip radd to_markdown
value_counts first isna between_time replace
sample idxmin div iloc add_suffix
pipe to_sql items max rsub
flags sem to_string to_excel prod
fillna backfill align pct_change expanding
nsmallest append attrs rmod bfill
ndim rank floordiv unstack groupby
skew quantile copy ne describe
sort_index truediv mode dropna drop
compare tz_convert cov equals memory_usage
sub pad rename_axis ge mean
last cummin notna agg convert_dtypes
round transform asof isin asfreq
slice_shift xs mad infer_objects rpow
drop_duplicates mul cummax corr droplevel
dtypes subtract rdiv filter multiply
to_dict le dot aggregate pop
rolling where interpolate head tail
size iteritems rmul take iat
to_hdf to_timestamp shift hist std
sum at_time tz_localize axes swaplevel
explode
所有函数帮助已上传本站资源版块,欢迎下载:
https://download.csdn.net/download/boysoft2002/87343363https://download.csdn.net/download/boysoft2002/87343363
to_系列函数:22个
Function01
to_clipboard(self, excel: 'bool_t' = True, sep: 'str | None' = None, **kwargs) -> 'None'
Copy object to the system clipboard.
Help on function to_clipboard in module pandas.core.generic:
to_clipboard(self, excel: 'bool_t' = True, sep: 'str | None' = None, **kwargs) -> 'None'
Copy object to the system clipboard.
Write a text representation of object to the system clipboard.
This can be pasted into Excel, for example.
Parameters
----------
excel : bool, default True
Produce output in a csv format for easy pasting into excel.
- True, use the provided separator for csv pasting.
- False, write a string representation of the object to the clipboard.
sep : str, default ``'\t'``
Field delimiter.
**kwargs
These parameters will be passed to DataFrame.to_csv.
See Also
--------
DataFrame.to_csv : Write a DataFrame to a comma-separated values
(csv) file.
read_clipboard : Read text from clipboard and pass to read_table.
Notes
-----
Requirements for your platform.
- Linux : `xclip`, or `xsel` (with `PyQt4` modules)
- Windows : none
- OS X : none
Examples
--------
Copy the contents of a DataFrame to the clipboard.
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
>>> df.to_clipboard(sep=',') # doctest: +SKIP
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6
We can omit the index by passing the keyword `index` and setting
it to false.
>>> df.to_clipboard(sep=',', index=False) # doctest: +SKIP
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6
Function02
to_csv(self, path_or_buf: 'FilePathOrBuffer[AnyStr] | None' = None, sep: 'str' = ',', na_rep: 'str' = '', float_format: 'str | None' = None, columns: 'Sequence[Hashable] | None' = None, header: 'bool_t | list[str]' = True, index: 'bool_t' = True, index_label: 'IndexLabel | None' = None, mode: 'str' = 'w', encoding: 'str | None' = None, compression: 'CompressionOptions' = 'infer', quoting: 'int | None' = None, quotechar: 'str' = '"', line_terminator: 'str | None' = None, chunksize: 'int | None' = None, date_format: 'str | None' = None, doublequote: 'bool_t' = True, escapechar: 'str | None' = None, decimal: 'str' = '.', errors: 'str' = 'strict', storage_options: 'StorageOptions' = None) -> 'str | None'
Help on function to_csv in module pandas.core.generic:
to_csv(self, path_or_buf: 'FilePathOrBuffer[AnyStr] | None' = None, sep: 'str' = ',', na_rep: 'str' = '', float_format: 'str | None' = None, columns: 'Sequence[Hashable] | None' = None, header: 'bool_t | list[str]' = True, index: 'bool_t' = True, index_label: 'IndexLabel | None' = None, mode: 'str' = 'w', encoding: 'str | None' = None, compression: 'CompressionOptions' = 'infer', quoting: 'int | None' = None, quotechar: 'str' = '"', line_terminator: 'str | None' = None, chunksize: 'int | None' = None, date_format: 'str | None' = None, doublequote: 'bool_t' = True, escapechar: 'str | None' = None, decimal: 'str' = '.', errors: 'str' = 'strict', storage_options: 'StorageOptions' = None) -> 'str | None'
Write object to a comma-separated values (csv) file.
Parameters
----------
path_or_buf : str or file handle, default None
File path or object, if None is provided the result is returned as
a string. If a non-binary file object is passed, it should be opened
with `newline=''`, disabling universal newlines. If a binary
file object is passed, `mode` might need to contain a `'b'`.
.. versionchanged:: 1.2.0
Support for binary file objects was introduced.
sep : str, default ','
String of length 1. Field delimiter for the output file.
na_rep : str, default ''
Missing data representation.
float_format : str, default None
Format string for floating point numbers.
columns : sequence, optional
Columns to write.
header : bool or list of str, default True
Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
index_label : str or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and
`header` and `index` are True, then the index names are used. A
sequence should be given if the object uses MultiIndex. If
False do not print fields for index names. Use index_label=False
for easier importing in R.
mode : str
Python write mode, default 'w'.
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'. `encoding` is not supported if `path_or_buf`
is a non-binary file object.
compression : str or dict, default 'infer'
If str, represents compression mode. If dict, value at 'method' is
the compression mode. Compression mode may be any of the following
possible values: {'infer', 'gzip', 'bz2', 'zip', 'xz', None}. If
compression mode is 'infer' and `path_or_buf` is path-like, then
detect compression mode from the following extensions: '.gz',
'.bz2', '.zip' or '.xz'. (otherwise no compression). If dict given
and mode is one of {'zip', 'gzip', 'bz2'}, or inferred as
one of the above, other entries passed as
additional compression options.
.. versionchanged:: 1.0.0
May now be a dict with key 'method' as compression mode
and other entries as additional compression options if
compression mode is 'zip'.
.. versionchanged:: 1.1.0
Passing compression options as keys in dict is
supported for compression modes 'gzip' and 'bz2'
as well as 'zip'.
.. versionchanged:: 1.2.0
Compression is supported for binary file objects.
.. versionchanged:: 1.2.0
Previous versions forwarded dict entries for 'gzip' to
`gzip.open` instead of `gzip.GzipFile` which prevented
setting `mtime`.
quoting : optional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a `float_format`
then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric.
quotechar : str, default '\"'
String of length 1. Character used to quote fields.
line_terminator : str, optional
The newline character or character sequence to use in the output
file. Defaults to `os.linesep`, which depends on the OS in which
this method is called ('\\n' for linux, '\\r\\n' for Windows, i.e.).
chunksize : int or None
Rows to write at a time.
date_format : str, default None
Format string for datetime objects.
doublequote : bool, default True
Control quoting of `quotechar` inside a field.
escapechar : str, default None
String of length 1. Character used to escape `sep` and `quotechar`
when appropriate.
decimal : str, default '.'
Character recognized as decimal separator. E.g. use ',' for
European data.
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
of options.
.. versionadded:: 1.1.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
Returns
-------
None or str
If path_or_buf is None, returns the resulting csv format as a
string. Otherwise returns None.
See Also
--------
read_csv : Load a CSV file into a DataFrame.
to_excel : Write DataFrame to an Excel file.
Examples
--------
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
... 'mask': ['red', 'purple'],
... 'weapon': ['sai', 'bo staff']})
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
Create 'out.zip' containing 'out.csv'
>>> compression_opts = dict(method='zip',
... archive_name='out.csv') # doctest: +SKIP
>>> df.to_csv('out.zip', index=False,
... compression=compression_opts) # doctest: +SKIP
Function03
to_dict(self, orient: 'str' = 'dict', into=<class 'dict'>)
Help on function to_dict in module pandas.core.frame:
to_dict(self, orient: 'str' = 'dict', into=<class 'dict'>)
Convert the DataFrame to a dictionary.
The type of the key-value pairs can be customized with the parameters
(see below).
Parameters
----------
orient : str {'dict', 'list', 'series', 'split', 'records', 'index'}
Determines the type of the values of the dictionary.
- 'dict' (default) : dict like {column -> {index -> value}}
- 'list' : dict like {column -> [values]}
- 'series' : dict like {column -> Series(values)}
- 'split' : dict like
{'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
- 'records' : list like
[{column -> value}, ... , {column -> value}]
- 'index' : dict like {index -> {column -> value}}
Abbreviations are allowed. `s` indicates `series` and `sp`
indicates `split`.
into : class, default dict
The collections.abc.Mapping subclass used for all Mappings
in the return value. Can be the actual class or an empty
instance of the mapping type you want. If you want a
collections.defaultdict, you must pass it initialized.
Returns
-------
dict, list or collections.abc.Mapping
Return a collections.abc.Mapping object representing the DataFrame.
The resulting transformation depends on the `orient` parameter.
See Also
--------
DataFrame.from_dict: Create a DataFrame from a dictionary.
DataFrame.to_json: Convert a DataFrame to JSON format.
Examples
--------
>>> df = pd.DataFrame({'col1': [1, 2],
... 'col2': [0.5, 0.75]},
... index=['row1', 'row2'])
>>> df
col1 col2
row1 1 0.50
row2 2 0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
You can specify the return orientation.
>>> df.to_dict('series')
{'col1': row1 1
row2 2
Name: col1, dtype: int64,
'col2': row1 0.50
row2 0.75
Name: col2, dtype: float64}
>>> df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
'data': [[1, 0.5], [2, 0.75]]}
>>> df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
>>> df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
You can also specify the mapping type.
>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
If you want a `defaultdict`, you need to initialize it:
>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)
[defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),
defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]
Function04
to_excel(self, excel_writer, sheet_name: 'str' = 'Sheet1', na_rep: 'str' = '', float_format: 'str | None' = None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options: 'StorageOptions' = None) -> 'None'
Help on function to_excel in module pandas.core.generic:
to_excel(self, excel_writer, sheet_name: 'str' = 'Sheet1', na_rep: 'str' = '', float_format: 'str | None' = None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options: 'StorageOptions' = None) -> 'None'
Write object to an Excel sheet.
To write a single object to an Excel .xlsx file it is only necessary to
specify a target file name. To write to multiple sheets it is necessary to
create an `ExcelWriter` object with a target file name, and specify a sheet
in the file to write to.
Multiple sheets may be written to by specifying unique `sheet_name`.
With all data written to the file it is necessary to save the changes.
Note that creating an `ExcelWriter` object with a file name that already
exists will result in the contents of the existing file being erased.
Parameters
----------
excel_writer : path-like, file-like, or ExcelWriter object
File path or existing ExcelWriter.
sheet_name : str, default 'Sheet1'
Name of sheet which will contain DataFrame.
na_rep : str, default ''
Missing data representation.
float_format : str, optional
Format string for floating point numbers. For example
``float_format="%.2f"`` will format 0.1234 to 0.12.
columns : sequence or list of str, optional
Columns to write.
header : bool or list of str, default True
Write out the column names. If a list of string is given it is
assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
index_label : str or sequence, optional
Column label for index column(s) if desired. If not specified, and
`header` and `index` are True, then the index names are used. A
sequence should be given if the DataFrame uses MultiIndex.
startrow : int, default 0
Upper left cell row to dump data frame.
startcol : int, default 0
Upper left cell column to dump data frame.
engine : str, optional
Write engine to use, 'openpyxl' or 'xlsxwriter'. You can also set this
via the options ``io.excel.xlsx.writer``, ``io.excel.xls.writer``, and
``io.excel.xlsm.writer``.
.. deprecated:: 1.2.0
As the `xlwt <https://pypi.org/project/xlwt/>`__ package is no longer
maintained, the ``xlwt`` engine will be removed in a future version
of pandas.
merge_cells : bool, default True
Write MultiIndex and Hierarchical Rows as merged cells.
encoding : str, optional
Encoding of the resulting excel file. Only necessary for xlwt,
other writers support unicode natively.
inf_rep : str, default 'inf'
Representation for infinity (there is no native representation for
infinity in Excel).
verbose : bool, default True
Display more information in the error logs.
freeze_panes : tuple of int (length 2), optional
Specifies the one-based bottommost row and rightmost column that
is to be frozen.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
See Also
--------
to_csv : Write DataFrame to a comma-separated values (csv) file.
ExcelWriter : Class for writing DataFrame objects into excel sheets.
read_excel : Read an Excel file into a pandas DataFrame.
read_csv : Read a comma-separated values (csv) file into DataFrame.
Notes
-----
For compatibility with :meth:`~DataFrame.to_csv`,
to_excel serializes lists and dicts to strings before writing.
Once a workbook has been saved it is not possible to write further
data without rewriting the whole workbook.
Examples
--------
Create, write to and save a workbook:
>>> df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
... index=['row 1', 'row 2'],
... columns=['col 1', 'col 2'])
>>> df1.to_excel("output.xlsx") # doctest: +SKIP
To specify the sheet name:
>>> df1.to_excel("output.xlsx",
... sheet_name='Sheet_name_1') # doctest: +SKIP
If you wish to write to more than one sheet in the workbook, it is
necessary to specify an ExcelWriter object:
>>> df2 = df1.copy()
>>> with pd.ExcelWriter('output.xlsx') as writer: # doctest: +SKIP
... df1.to_excel(writer, sheet_name='Sheet_name_1')
... df2.to_excel(writer, sheet_name='Sheet_name_2')
ExcelWriter can also be used to append to an existing Excel file:
>>> with pd.ExcelWriter('output.xlsx',
... mode='a') as writer: # doctest: +SKIP
... df.to_excel(writer, sheet_name='Sheet_name_3')
To set the library that is used to write the Excel file,
you can pass the `engine` keyword (the default engine is
automatically chosen depending on the file extension):
>>> df1.to_excel('output1.xlsx', engine='xlsxwriter') # doctest: +SKIP
Function05
to_feather(self, path: 'FilePathOrBuffer[AnyStr]', **kwargs) -> 'None'
Help on function to_feather in module pandas.core.frame:
to_feather(self, path: 'FilePathOrBuffer[AnyStr]', **kwargs) -> 'None'
Write a DataFrame to the binary Feather format.
Parameters
----------
path : str or file-like object
If a string, it will be used as Root Directory path.
**kwargs :
Additional keywords passed to :func:`pyarrow.feather.write_feather`.
Starting with pyarrow 0.17, this includes the `compression`,
`compression_level`, `chunksize` and `version` keywords.
.. versionadded:: 1.1.0
Function06
to_gbq(self, destination_table: 'str', project_id: 'str | None' = None, chunksize: 'int | None' = None, reauth: 'bool' = False, if_exists: 'str' = 'fail', auth_local_webserver: 'bool' = False, table_schema: 'list[dict[str, str]] | None' = None, location: 'str | None' = None, progress_bar: 'bool' = True, credentials=None) -> 'None'
Help on function to_gbq in module pandas.core.frame:
to_gbq(self, destination_table: 'str', project_id: 'str | None' = None, chunksize: 'int | None' = None, reauth: 'bool' = False, if_exists: 'str' = 'fail', auth_local_webserver: 'bool' = False, table_schema: 'list[dict[str, str]] | None' = None, location: 'str | None' = None, progress_bar: 'bool' = True, credentials=None) -> 'None'
Write a DataFrame to a Google BigQuery table.
This function requires the `pandas-gbq package
<https://pandas-gbq.readthedocs.io>`__.
See the `How to authenticate with Google BigQuery
<https://pandas-gbq.readthedocs.io/en/latest/howto/authentication.html>`__
guide for authentication instructions.
Parameters
----------
destination_table : str
Name of table to be written, in the form ``dataset.tablename``.
project_id : str, optional
Google BigQuery Account project ID. Optional when available from
the environment.
chunksize : int, optional
Number of rows to be inserted in each chunk from the dataframe.
Set to ``None`` to load the whole dataframe at once.
reauth : bool, default False
Force Google BigQuery to re-authenticate the user. This is useful
if multiple accounts are used.
if_exists : str, default 'fail'
Behavior when the destination table exists. Value can be one of:
``'fail'``
If table exists raise pandas_gbq.gbq.TableCreationError.
``'replace'``
If table exists, drop it, recreate it, and insert data.
``'append'``
If table exists, insert data. Create if does not exist.
auth_local_webserver : bool, default False
Use the `local webserver flow`_ instead of the `console flow`_
when getting user credentials.
.. _local webserver flow:
https://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_local_server
.. _console flow:
https://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_console
*New in version 0.2.0 of pandas-gbq*.
table_schema : list of dicts, optional
List of BigQuery table fields to which according DataFrame
columns conform to, e.g. ``[{'name': 'col1', 'type':
'STRING'},...]``. If schema is not provided, it will be
generated according to dtypes of DataFrame columns. See
BigQuery API documentation on available names of a field.
*New in version 0.3.1 of pandas-gbq*.
location : str, optional
Location where the load job should run. See the `BigQuery locations
documentation
<https://cloud.google.com/bigquery/docs/dataset-locations>`__ for a
list of available locations. The location must match that of the
target dataset.
*New in version 0.5.0 of pandas-gbq*.
progress_bar : bool, default True
Use the library `tqdm` to show the progress bar for the upload,
chunk by chunk.
*New in version 0.5.0 of pandas-gbq*.
credentials : google.auth.credentials.Credentials, optional
Credentials for accessing Google APIs. Use this parameter to
override default credentials, such as to use Compute Engine
:class:`google.auth.compute_engine.Credentials` or Service
Account :class:`google.oauth2.service_account.Credentials`
directly.
*New in version 0.8.0 of pandas-gbq*.
See Also
--------
pandas_gbq.to_gbq : This function in the pandas-gbq library.
read_gbq : Read a DataFrame from Google BigQuery.
Function07
to_hdf(self, path_or_buf, key: 'str', mode: 'str' = 'a', complevel: 'int | None' = None, complib: 'str | None' = None, append: 'bool_t' = False, format: 'str | None' = None, index: 'bool_t' = True, min_itemsize: 'int | dict[str, int] | None' = None, nan_rep=None, dropna: 'bool_t | None' = None, data_columns: 'bool_t | list[str] | None' = None, errors: 'str' = 'strict', encoding: 'str' = 'UTF-8') -> 'None'
Help on function to_hdf in module pandas.core.generic:
to_hdf(self, path_or_buf, key: 'str', mode: 'str' = 'a', complevel: 'int | None' = None, complib: 'str | None' = None, append: 'bool_t' = False, format: 'str | None' = None, index: 'bool_t' = True, min_itemsize: 'int | dict[str, int] | None' = None, nan_rep=None, dropna: 'bool_t | None' = None, data_columns: 'bool_t | list[str] | None' = None, errors: 'str' = 'strict', encoding: 'str' = 'UTF-8') -> 'None'
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an
application to interpret the structure and contents of a file with
no outside information. One HDF file can hold a mix of related objects
which can be accessed as a group or as individual objects.
In order to add another DataFrame or Series to an existing HDF file
please use append mode and a different a key.
.. warning::
One can store a subclass of ``DataFrame`` or ``Series`` to HDF5,
but the type of the subclass is lost upon storing.
For more information see the :ref:`user guide <io.hdf5>`.
Parameters
----------
path_or_buf : str or pandas.HDFStore
File path or HDFStore object.
key : str
Identifier for the group in the store.
mode : {'a', 'w', 'r+'}, default 'a'
Mode to open file:
- 'w': write, a new file is created (an existing file with
the same name would be deleted).
- 'a': append, an existing file is opened for reading and
writing, and if the file does not exist it is created.
- 'r+': similar to 'a', but the file must already exist.
complevel : {0-9}, optional
Specifies a compression level for data.
A value of 0 disables compression.
complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
Specifies the compression library to be used.
As of v0.20.2 these additional compressors for Blosc are supported
(default if no compressor specified: 'blosc:blosclz'):
{'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
'blosc:zlib', 'blosc:zstd'}.
Specifying a compression library which is not available issues
a ValueError.
append : bool, default False
For Table formats, append the input data to the existing.
format : {'fixed', 'table', None}, default 'fixed'
Possible values:
- 'fixed': Fixed format. Fast writing/reading. Not-appendable,
nor searchable.
- 'table': Table format. Write as a PyTables Table structure
which may perform worse but allow more flexible operations
like searching / selecting subsets of the data.
- If None, pd.get_option('io.hdf.default_format') is checked,
followed by fallback to "fixed"
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
of options.
encoding : str, default "UTF-8"
min_itemsize : dict or int, optional
Map column names to minimum string sizes for columns.
nan_rep : Any, optional
How to represent null values as str.
Not allowed with append=True.
data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk
queries, or True to use all columns. By default only the axes
of the object are indexed. See :ref:`io.hdf5-query-data-columns`.
Applicable only to format='table'.
See Also
--------
read_hdf : Read from HDF file.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
DataFrame.to_sql : Write to a SQL table.
DataFrame.to_feather : Write out feather-format for DataFrames.
DataFrame.to_csv : Write out to a csv file.
Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
... index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')
We can add another object to the same file:
>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_hdf('data.h5', key='s')
Reading from HDF file:
>>> pd.read_hdf('data.h5', 'df')
A B
a 1 4
b 2 5
c 3 6
>>> pd.read_hdf('data.h5', 's')
0 1
1 2
2 3
3 4
dtype: int64
Deleting file with data:
>>> import os
>>> os.remove('data.h5')
Function08
to_html(self, buf: 'FilePathOrBuffer[str] | None' = None, columns: 'Sequence[str] | None' = None, col_space: 'ColspaceArgType | None' = None, header: 'bool | Sequence[str]' = True, index: 'bool' = True, na_rep: 'str' = 'NaN', formatters: 'FormattersType | None' = None, float_format: 'FloatFormatType | None' = None, sparsify: 'bool | None' = None, index_names: 'bool' = True, justify: 'str | None' = None, max_rows: 'int | None' = None, max_cols: 'int | None' = None, show_dimensions: 'bool | str' = False, decimal: 'str' = '.', bold_rows: 'bool' = True, classes: 'str | list | tuple | None' = None, escape: 'bool' = True, notebook: 'bool' = False, border: 'int | None' = None, table_id: 'str | None' = None, render_links: 'bool' = False, encoding: 'str | None' = None)
Help on function to_html in module pandas.core.frame:
to_html(self, buf: 'FilePathOrBuffer[str] | None' = None, columns: 'Sequence[str] | None' = None, col_space: 'ColspaceArgType | None' = None, header: 'bool | Sequence[str]' = True, index: 'bool' = True, na_rep: 'str' = 'NaN', formatters: 'FormattersType | None' = None, float_format: 'FloatFormatType | None' = None, sparsify: 'bool | None' = None, index_names: 'bool' = True, justify: 'str | None' = None, max_rows: 'int | None' = None, max_cols: 'int | None' = None, show_dimensions: 'bool | str' = False, decimal: 'str' = '.', bold_rows: 'bool' = True, classes: 'str | list | tuple | None' = None, escape: 'bool' = True, notebook: 'bool' = False, border: 'int | None' = None, table_id: 'str | None' = None, render_links: 'bool' = False, encoding: 'str | None' = None)
Render a DataFrame as an HTML table.
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : sequence, optional, default None
The subset of columns to write. Writes all columns by default.
col_space : str or int, list or dict of int or str, optional
The minimum width of each column in CSS length units. An int is assumed to be px units.
.. versionadded:: 0.25.0
Ability to use str.
header : bool, optional
Whether to print column labels, default True.
index : bool, optional, default True
Whether to print index (row) labels.
na_rep : str, optional, default 'NaN'
String representation of ``NaN`` to use.
formatters : list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns' elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
float_format : one-parameter function, optional, default None
Formatter function to apply to columns' elements if they are
floats. This function must return a unicode string and will be
applied only to the non-``NaN`` elements, with ``NaN`` being
handled by ``na_rep``.
.. versionchanged:: 1.2.0
sparsify : bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names : bool, optional, default True
Prints the names of the indexes.
justify : str, default None
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), 'right' out
of the box. Valid values are
* left
* right
* center
* justify
* justify-all
* start
* end
* inherit
* match-parent
* initial
* unset.
max_rows : int, optional
Maximum number of rows to display in the console.
min_rows : int, optional
The number of rows to display in the console in a truncated repr
(when number of rows is above `max_rows`).
max_cols : int, optional
Maximum number of columns to display in the console.
show_dimensions : bool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
bold_rows : bool, default True
Make the row labels bold in the output.
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table.
escape : bool, default True
Convert the characters <, >, and & to HTML-safe sequences.
notebook : {True, False}, default False
Whether the generated HTML is for IPython Notebook.
border : int
A ``border=border`` attribute is included in the opening
`<table>` tag. Default ``pd.options.display.html.border``.
encoding : str, default "utf-8"
Set character encoding.
.. versionadded:: 1.0
table_id : str, optional
A css id is included in the opening `<table>` tag if specified.
render_links : bool, default False
Convert URLs to HTML links.
Returns
-------
str or None
If buf is None, returns the result as a string. Otherwise returns
None.
See Also
--------
to_string : Convert DataFrame to a string.
Function09
to_json(self, path_or_buf: 'FilePathOrBuffer | None' = None, orient: 'str | None' = None, date_format: 'str | None' = None, double_precision: 'int' = 10, force_ascii: 'bool_t' = True, date_unit: 'str' = 'ms', default_handler: 'Callable[[Any], JSONSerializable] | None' = None, lines: 'bool_t' = False, compression: 'CompressionOptions' = 'infer', index: 'bool_t' = True, indent: 'int | None' = None, storage_options: 'StorageOptions' = None) -> 'str | None'
Help on function to_json in module pandas.core.generic:
to_json(self, path_or_buf: 'FilePathOrBuffer | None' = None, orient: 'str | None' = None, date_format: 'str | None' = None, double_precision: 'int' = 10, force_ascii: 'bool_t' = True, date_unit: 'str' = 'ms', default_handler: 'Callable[[Any], JSONSerializable] | None' = None, lines: 'bool_t' = False, compression: 'CompressionOptions' = 'infer', index: 'bool_t' = True, indent: 'int | None' = None, storage_options: 'StorageOptions' = None) -> 'str | None'
Convert the object to a JSON string.
Note NaN's and None will be converted to null and datetime objects
will be converted to UNIX timestamps.
Parameters
----------
path_or_buf : str or file handle, optional
File path or object. If not specified, the result is returned as
a string.
orient : str
Indication of expected JSON string format.
* Series:
- default is 'index'
- allowed values are: {'split', 'records', 'index', 'table'}.
* DataFrame:
- default is 'columns'
- allowed values are: {'split', 'records', 'index', 'columns',
'values', 'table'}.
* The format of the JSON string:
- 'split' : dict like {'index' -> [index], 'columns' -> [columns],
'data' -> [values]}
- 'records' : list like [{column -> value}, ... , {column -> value}]
- 'index' : dict like {index -> {column -> value}}
- 'columns' : dict like {column -> {index -> value}}
- 'values' : just the values array
- 'table' : dict like {'schema': {schema}, 'data': {data}}
Describing the data, where data component is like ``orient='records'``.
date_format : {None, 'epoch', 'iso'}
Type of date conversion. 'epoch' = epoch milliseconds,
'iso' = ISO8601. The default depends on the `orient`. For
``orient='table'``, the default is 'iso'. For all other orients,
the default is 'epoch'.
double_precision : int, default 10
The number of decimal places to use when encoding
floating point values.
force_ascii : bool, default True
Force encoded string to be ASCII.
date_unit : str, default 'ms' (milliseconds)
The time unit to encode to, governs timestamp and ISO8601
precision. One of 's', 'ms', 'us', 'ns' for second, millisecond,
microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a
suitable format for JSON. Should receive a single argument which is
the object to convert and return a serialisable object.
lines : bool, default False
If 'orient' is 'records' write out line-delimited json format. Will
throw ValueError if incorrect 'orient' since others are not
list-like.
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}
A string representing the compression to use in the output file,
only used when the first argument is a filename. By default, the
compression is inferred from the filename.
index : bool, default True
Whether to include the index values in the JSON string. Not
including the index (``index=False``) is only supported when
orient is 'split' or 'table'.
indent : int, optional
Length of whitespace used to indent each record.
.. versionadded:: 1.0.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
Returns
-------
None or str
If path_or_buf is None, returns the resulting json format as a
string. Otherwise returns None.
See Also
--------
read_json : Convert a JSON string to pandas object.
Notes
-----
The behavior of ``indent=0`` varies from the stdlib, which does not
indent the output but does insert newlines. Currently, ``indent=0``
and the default ``indent=None`` are equivalent in pandas, though this
may change in a future release.
``orient='table'`` contains a 'pandas_version' field under 'schema'.
This stores the version of `pandas` used in the latest revision of the
schema.
Examples
--------
>>> import json
>>> df = pd.DataFrame(
... [["a", "b"], ["c", "d"]],
... index=["row 1", "row 2"],
... columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"columns": [
"col 1",
"col 2"
],
"index": [
"row 1",
"row 2"
],
"data": [
[
"a",
"b"
],
[
"c",
"d"
]
]
}
Encoding/decoding a Dataframe using ``'records'`` formatted JSON.
Note that index labels are not preserved with this encoding.
>>> result = df.to_json(orient="records")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
[
{
"col 1": "a",
"col 2": "b"
},
{
"col 1": "c",
"col 2": "d"
}
]
Encoding/decoding a Dataframe using ``'index'`` formatted JSON:
>>> result = df.to_json(orient="index")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"row 1": {
"col 1": "a",
"col 2": "b"
},
"row 2": {
"col 1": "c",
"col 2": "d"
}
}
Encoding/decoding a Dataframe using ``'columns'`` formatted JSON:
>>> result = df.to_json(orient="columns")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"col 1": {
"row 1": "a",
"row 2": "c"
},
"col 2": {
"row 1": "b",
"row 2": "d"
}
}
Encoding/decoding a Dataframe using ``'values'`` formatted JSON:
>>> result = df.to_json(orient="values")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
[
[
"a",
"b"
],
[
"c",
"d"
]
]
Encoding with Table Schema:
>>> result = df.to_json(orient="table")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"schema": {
"fields": [
{
"name": "index",
"type": "string"
},
{
"name": "col 1",
"type": "string"
},
{
"name": "col 2",
"type": "string"
}
],
"primaryKey": [
"index"
],
"pandas_version": "0.20.0"
},
"data": [
{
"index": "row 1",
"col 1": "a",
"col 2": "b"
},
{
"index": "row 2",
"col 1": "c",
"col 2": "d"
}
]
}
Function10
to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=False, column_format=None, longtable=None, escape=None, encoding=None, decimal='.', multicolumn=None, multicolumn_format=None, multirow=None, caption=None, label=None, position=None)
Help on function to_latex in module pandas.core.generic:
to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=False, column_format=None, longtable=None, escape=None, encoding=None, decimal='.', multicolumn=None, multicolumn_format=None, multirow=None, caption=None, label=None, position=None)
Render object to a LaTeX tabular, longtable, or nested table/tabular.
Requires ``\usepackage{booktabs}``. The output can be copy/pasted
into a main LaTeX document or read from an external file
with ``\input{table.tex}``.
.. versionchanged:: 1.0.0
Added caption and label arguments.
.. versionchanged:: 1.2.0
Added position argument, changed meaning of caption argument.
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : list of label, optional
The subset of columns to write. Writes all columns by default.
col_space : int, optional
The minimum width of each column.
header : bool or list of str, default True
Write out the column names. If a list of strings is given,
it is assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
na_rep : str, default 'NaN'
Missing data representation.
formatters : list of functions or dict of {str: function}, optional
Formatter functions to apply to columns' elements by position or
name. The result of each function must be a unicode string.
List must be of length equal to the number of columns.
float_format : one-parameter function or str, optional, default None
Formatter for floating point numbers. For example
``float_format="%.2f"`` and ``float_format="{:0.2f}".format`` will
both result in 0.1234 being formatted as 0.12.
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row. By default, the value will be
read from the config module.
index_names : bool, default True
Prints the names of the indexes.
bold_rows : bool, default False
Make the row labels bold in the output.
column_format : str, optional
The columns format as specified in `LaTeX table format
<https://en.wikibooks.org/wiki/LaTeX/Tables>`__ e.g. 'rcl' for 3
columns. By default, 'l' will be used for all columns except
columns of numbers, which default to 'r'.
longtable : bool, optional
By default, the value will be read from the pandas config
module. Use a longtable environment instead of tabular. Requires
adding a \usepackage{longtable} to your LaTeX preamble.
escape : bool, optional
By default, the value will be read from the pandas config
module. When set to False prevents from escaping latex special
characters in column names.
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'.
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
multicolumn : bool, default True
Use \multicolumn to enhance MultiIndex columns.
The default will be read from the config module.
multicolumn_format : str, default 'l'
The alignment for multicolumns, similar to `column_format`
The default will be read from the config module.
multirow : bool, default False
Use \multirow to enhance MultiIndex rows. Requires adding a
\usepackage{multirow} to your LaTeX preamble. Will print
centered labels (instead of top-aligned) across the contained
rows, separating groups via clines. The default will be read
from the pandas config module.
caption : str or tuple, optional
Tuple (full_caption, short_caption),
which results in ``\caption[short_caption]{full_caption}``;
if a single string is passed, no short caption will be set.
.. versionadded:: 1.0.0
.. versionchanged:: 1.2.0
Optionally allow caption to be a tuple ``(full_caption, short_caption)``.
label : str, optional
The LaTeX label to be placed inside ``\label{}`` in the output.
This is used with ``\ref{}`` in the main ``.tex`` file.
.. versionadded:: 1.0.0
position : str, optional
The LaTeX positional argument for tables, to be placed after
``\begin{}`` in the output.
.. versionadded:: 1.2.0
Returns
-------
str or None
If buf is None, returns the result as a string. Otherwise returns
None.
See Also
--------
DataFrame.to_string : Render a DataFrame to a console-friendly
tabular output.
DataFrame.to_html : Render a DataFrame as an HTML table.
Examples
--------
>>> df = pd.DataFrame(dict(name=['Raphael', 'Donatello'],
... mask=['red', 'purple'],
... weapon=['sai', 'bo staff']))
>>> print(df.to_latex(index=False)) # doctest: +NORMALIZE_WHITESPACE
\begin{tabular}{lll}
\toprule
name & mask & weapon \\
\midrule
Raphael & red & sai \\
Donatello & purple & bo staff \\
\bottomrule
\end{tabular}
Function11
to_markdown(self, buf: 'IO[str] | str | None' = None, mode: 'str' = 'wt', index: 'bool' = True, storage_options: 'StorageOptions' = None, **kwargs) -> 'str | None'
Help on function to_markdown in module pandas.core.frame:
to_markdown(self, buf: 'IO[str] | str | None' = None, mode: 'str' = 'wt', index: 'bool' = True, storage_options: 'StorageOptions' = None, **kwargs) -> 'str | None'
Print DataFrame in Markdown-friendly format.
.. versionadded:: 1.0.0
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
mode : str, optional
Mode in which file is opened, "wt" by default.
index : bool, optional, default True
Add index (row) labels.
.. versionadded:: 1.1.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
**kwargs
These parameters will be passed to `tabulate <https://pypi.org/project/tabulate>`_.
Returns
-------
str
DataFrame in Markdown-friendly format.
Notes
-----
Requires the `tabulate <https://pypi.org/project/tabulate>`_ package.
Examples
--------
>>> s = pd.Series(["elk", "pig", "dog", "quetzal"], name="animal")
>>> print(s.to_markdown())
| | animal |
|---:|:---------|
| 0 | elk |
| 1 | pig |
| 2 | dog |
| 3 | quetzal |
Output markdown with a tabulate option.
>>> print(s.to_markdown(tablefmt="grid"))
+----+----------+
| | animal |
+====+==========+
| 0 | elk |
+----+----------+
| 1 | pig |
+----+----------+
| 2 | dog |
+----+----------+
| 3 | quetzal |
+----+----------+
Function12
to_numpy(self, dtype: 'NpDtype | None' = None, copy: 'bool' = False, na_value=<no_default>) -> 'np.ndarray'
Help on function to_numpy in module pandas.core.frame:
to_numpy(self, dtype: 'NpDtype | None' = None, copy: 'bool' = False, na_value=<no_default>) -> 'np.ndarray'
Convert the DataFrame to a NumPy array.
By default, the dtype of the returned array will be the common NumPy
dtype of all types in the DataFrame. For example, if the dtypes are
``float16`` and ``float32``, the results dtype will be ``float32``.
This may require copying data and coercing values, which may be
expensive.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the dtypes of the DataFrame columns.
.. versionadded:: 1.1.0
Returns
-------
numpy.ndarray
See Also
--------
Series.to_numpy : Similar method for Series.
Examples
--------
>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
[2, 4]])
With heterogeneous data, the lowest common type will have to
be used.
>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
[2. , 4.5]])
For a mix of numeric and non-numeric types, the output array will
have object dtype.
>>> df['C'] = pd.date_range('2000', periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
[2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)
Function13
to_parquet(self, path: 'FilePathOrBuffer | None' = None, engine: 'str' = 'auto', compression: 'str | None' = 'snappy', index: 'bool | None' = None, partition_cols: 'list[str] | None' = None, storage_options: 'StorageOptions' = None, **kwargs) -> 'bytes | None'
Help on function to_parquet in module pandas.core.frame:
to_parquet(self, path: 'FilePathOrBuffer | None' = None, engine: 'str' = 'auto', compression: 'str | None' = 'snappy', index: 'bool | None' = None, partition_cols: 'list[str] | None' = None, storage_options: 'StorageOptions' = None, **kwargs) -> 'bytes | None'
Write a DataFrame to the binary parquet format.
This function writes the dataframe as a `parquet file
<https://parquet.apache.org/>`_. You can choose different parquet
backends, and have the option of compression. See
:ref:`the user guide <io.parquet>` for more details.
Parameters
----------
path : str or file-like object, default None
If a string, it will be used as Root Directory path
when writing a partitioned dataset. By file-like object,
we refer to objects with a write() method, such as a file handle
(e.g. via builtin open function) or io.BytesIO. The engine
fastparquet does not accept file-like objects. If path is None,
a bytes object is returned.
.. versionchanged:: 1.2.0
Previously this was "fname"
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
Parquet library to use. If 'auto', then the option
``io.parquet.engine`` is used. The default ``io.parquet.engine``
behavior is to try 'pyarrow', falling back to 'fastparquet' if
'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
Name of the compression to use. Use ``None`` for no compression.
index : bool, default None
If ``True``, include the dataframe's index(es) in the file output.
If ``False``, they will not be written to the file.
If ``None``, similar to ``True`` the dataframe's index(es)
will be saved. However, instead of being saved as values,
the RangeIndex will be stored as a range in the metadata so it
doesn't require much space and is faster. Other indexes will
be included as columns in the file output.
partition_cols : list, optional, default None
Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
Must be None if path is not a string.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
**kwargs
Additional arguments passed to the parquet library. See
:ref:`pandas io <io.parquet>` for more details.
Returns
-------
bytes if no path argument is provided else None
See Also
--------
read_parquet : Read a parquet file.
DataFrame.to_csv : Write a csv file.
DataFrame.to_sql : Write to a sql table.
DataFrame.to_hdf : Write to hdf.
Notes
-----
This function requires either the `fastparquet
<https://pypi.org/project/fastparquet>`_ or `pyarrow
<https://arrow.apache.org/docs/python/>`_ library.
Examples
--------
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
... compression='gzip') # doctest: +SKIP
>>> pd.read_parquet('df.parquet.gzip') # doctest: +SKIP
col1 col2
0 1 3
1 2 4
If you want to get a buffer to the parquet content you can use a io.BytesIO
object, as long as you don't use partition_cols, which creates multiple files.
>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()
Function14
to_period(self, freq: 'Frequency | None' = None, axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'
Help on function to_period in module pandas.core.frame:
to_period(self, freq: 'Frequency | None' = None, axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'
Convert DataFrame from DatetimeIndex to PeriodIndex.
Convert DataFrame from DatetimeIndex to PeriodIndex with desired
frequency (inferred from index if not passed).
Parameters
----------
freq : str, default
Frequency of the PeriodIndex.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to convert (the index by default).
copy : bool, default True
If False then underlying input data is not copied.
Returns
-------
DataFrame with PeriodIndex
Function15
to_pickle(self, path, compression: 'CompressionOptions' = 'infer', protocol: 'int' = 5, storage_options: 'StorageOptions' = None) -> 'None'
Help on function to_pickle in module pandas.core.generic:
to_pickle(self, path, compression: 'CompressionOptions' = 'infer', protocol: 'int' = 5, storage_options: 'StorageOptions' = None) -> 'None'
Pickle (serialize) object to file.
Parameters
----------
path : str
File path where the pickled object will be stored.
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
A string representing the compression to use in the output file. By
default, infers from the file extension in specified path.
Compression mode may be any of the following possible
values: {¡®infer¡¯, ¡®gzip¡¯, ¡®bz2¡¯, ¡®zip¡¯, ¡®xz¡¯, None}. If compression
mode is ¡®infer¡¯ and path_or_buf is path-like, then detect
compression mode from the following extensions:
¡®.gz¡¯, ¡®.bz2¡¯, ¡®.zip¡¯ or ¡®.xz¡¯. (otherwise no compression).
If dict given and mode is ¡®zip¡¯ or inferred as ¡®zip¡¯, other entries
passed as additional compression options.
protocol : int
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1]_ paragraph 12.1.2). The possible
values are 0, 1, 2, 3, 4, 5. A negative value for the protocol
parameter is equivalent to setting its value to HIGHEST_PROTOCOL.
.. [1] https://docs.python.org/3/library/pickle.html.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
See Also
--------
read_pickle : Load pickled pandas object (or any object) from file.
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_sql : Write DataFrame to a SQL database.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> original_df.to_pickle("./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> import os
>>> os.remove("./dummy.pkl")
Function16
to_records(self, index=True, column_dtypes=None, index_dtypes=None) -> 'np.recarray'
Help on function to_records in module pandas.core.frame:
to_records(self, index=True, column_dtypes=None, index_dtypes=None) -> 'np.recarray'
Convert DataFrame to a NumPy record array.
Index will be included as the first field of the record array if
requested.
Parameters
----------
index : bool, default True
Include index in resulting record array, stored in 'index'
field or using the index label, if set.
column_dtypes : str, type, dict, default None
If a string or type, the data type to store all columns. If
a dictionary, a mapping of column names and indices (zero-indexed)
to specific data types.
index_dtypes : str, type, dict, default None
If a string or type, the data type to store all index levels. If
a dictionary, a mapping of index level names and indices
(zero-indexed) to specific data types.
This mapping is applied only if `index=True`.
Returns
-------
numpy.recarray
NumPy ndarray with the DataFrame labels as fields and each row
of the DataFrame as entries.
See Also
--------
DataFrame.from_records: Convert structured or record ndarray
to DataFrame.
numpy.recarray: An ndarray that allows field access using
attributes, analogous to typed columns in a
spreadsheet.
Examples
--------
>>> df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
... index=['a', 'b'])
>>> df
A B
a 1 0.50
b 2 0.75
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])
If the DataFrame index has no label then the recarray field name
is set to 'index'. If the index has a label then this is used as the
field name:
>>> df.index = df.index.rename("I")
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])
The index can be excluded from the record array:
>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
dtype=[('A', '<i8'), ('B', '<f8')])
Data types can be specified for the columns:
>>> df.to_records(column_dtypes={"A": "int32"})
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])
As well as for the index:
>>> df.to_records(index_dtypes="<S2")
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])
>>> index_dtypes = f"<S{df.index.str.len().max()}"
>>> df.to_records(index_dtypes=index_dtypes)
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
dtype=[('I', 'S1'), ('A', '<i8'), ('B', '<f8')])
Function17
to_sql(self, name: 'str', con, schema=None, if_exists: 'str' = 'fail', index: 'bool_t' = True, index_label=None, chunksize=None, dtype: 'DtypeArg | None' = None, method=None) -> 'None'
Help on function to_sql in module pandas.core.generic:
to_sql(self, name: 'str', con, schema=None, if_exists: 'str' = 'fail', index: 'bool_t' = True, index_label=None, chunksize=None, dtype: 'DtypeArg | None' = None, method=None) -> 'None'
Write records stored in a DataFrame to a SQL database.
Databases supported by SQLAlchemy [1]_ are supported. Tables can be
newly created, appended to, or overwritten.
Parameters
----------
name : str
Name of SQL table.
con : sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_.
schema : str, optional
Specify the schema (if database flavor supports this). If None, use
default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
How to behave if the table already exists.
* fail: Raise a ValueError.
* replace: Drop the table before inserting new values.
* append: Insert new values to the existing table.
index : bool, default True
Write DataFrame index as a column. Uses `index_label` as the column
name in the table.
index_label : str or sequence, default None
Column label for index column(s). If None is given (default) and
`index` is True, then the index names are used.
A sequence should be given if the DataFrame uses MultiIndex.
chunksize : int, optional
Specify the number of rows in each batch to be written at a time.
By default, all rows will be written at once.
dtype : dict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, the
keys should be the column names and the values should be the
SQLAlchemy types or strings for the sqlite3 legacy mode. If a
scalar is provided, it will be applied to all columns.
method : {None, 'multi', callable}, optional
Controls the SQL insertion clause used:
* None : Uses standard SQL ``INSERT`` clause (one per row).
* 'multi': Pass multiple values in a single ``INSERT`` clause.
* callable with signature ``(pd_table, conn, keys, data_iter)``.
Details and a sample callable implementation can be found in the
section :ref:`insert method <io.sql.method>`.
Raises
------
ValueError
When the table already exists and `if_exists` is 'fail' (the
default).
See Also
--------
read_sql : Read a DataFrame from a table.
Notes
-----
Timezone aware datetime columns will be written as
``Timestamp with timezone`` type with SQLAlchemy if supported by the
database. Otherwise, the datetimes will be stored as timezone unaware
timestamps local to the original timezone.
References
----------
.. [1] https://docs.sqlalchemy.org
.. [2] https://www.python.org/dev/peps/pep-0249/
Examples
--------
Create an in-memory SQLite database.
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False)
Create a table from scratch with 3 rows.
>>> df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
>>> df
name
0 User 1
1 User 2
2 User 3
>>> df.to_sql('users', con=engine)
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]
An `sqlalchemy.engine.Connection` can also be passed to `con`:
>>> with engine.begin() as connection:
... df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
... df1.to_sql('users', con=connection, if_exists='append')
This is allowed to support operations that require that the same
DBAPI connection is used for the entire operation.
>>> df2 = pd.DataFrame({'name' : ['User 6', 'User 7']})
>>> df2.to_sql('users', con=engine, if_exists='append')
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3'),
(0, 'User 4'), (1, 'User 5'), (0, 'User 6'),
(1, 'User 7')]
Overwrite the table with just ``df2``.
>>> df2.to_sql('users', con=engine, if_exists='replace',
... index_label='id')
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 6'), (1, 'User 7')]
Specify the dtype (especially useful for integers with missing values).
Notice that while pandas is forced to store the data as floating point,
the database supports nullable integers. When fetching the data with
Python, we get back integer scalars.
>>> df = pd.DataFrame({"A": [1, None, 2]})
>>> df
A
0 1.0
1 NaN
2 2.0
>>> from sqlalchemy.types import Integer
>>> df.to_sql('integers', con=engine, index=False,
... dtype={"A": Integer()})
>>> engine.execute("SELECT * FROM integers").fetchall()
[(1,), (None,), (2,)]
Function18
to_stata(self, path: 'FilePathOrBuffer', convert_dates: 'dict[Hashable, str] | None' = None, write_index: 'bool' = True, byteorder: 'str | None' = None, time_stamp: 'datetime.datetime | None' = None, data_label: 'str | None' = None, variable_labels: 'dict[Hashable, str] | None' = None, version: 'int | None' = 114, convert_strl: 'Sequence[Hashable] | None' = None, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None) -> 'None'
Help on function to_stata in module pandas.core.frame:
to_stata(self, path: 'FilePathOrBuffer', convert_dates: 'dict[Hashable, str] | None' = None, write_index: 'bool' = True, byteorder: 'str | None' = None, time_stamp: 'datetime.datetime | None' = None, data_label: 'str | None' = None, variable_labels: 'dict[Hashable, str] | None' = None, version: 'int | None' = 114, convert_strl: 'Sequence[Hashable] | None' = None, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None) -> 'None'
Export DataFrame object to Stata dta format.
Writes the DataFrame to a Stata dataset file.
"dta" files contain a Stata dataset.
Parameters
----------
path : str, buffer or path object
String, path object (pathlib.Path or py._path.local.LocalPath) or
object implementing a binary write() function. If using a buffer
then the buffer will not be automatically closed after the file
data has been written.
.. versionchanged:: 1.0.0
Previously this was "fname"
convert_dates : dict
Dictionary mapping columns containing datetime types to stata
internal format to use when writing the dates. Options are 'tc',
'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer
or a name. Datetime columns that do not have a conversion type
specified will be converted to 'tc'. Raises NotImplementedError if
a datetime column has timezone information.
write_index : bool
Write the index to Stata dataset.
byteorder : str
Can be ">", "<", "little", or "big". default is `sys.byteorder`.
time_stamp : datetime
A datetime to use as file creation date. Default is the current
time.
data_label : str, optional
A label for the data set. Must be 80 characters or smaller.
variable_labels : dict
Dictionary containing columns as keys and variable labels as
values. Each label must be 80 characters or smaller.
version : {114, 117, 118, 119, None}, default 114
Version to use in the output dta file. Set to None to let pandas
decide between 118 or 119 formats depending on the number of
columns in the frame. Version 114 can be read by Stata 10 and
later. Version 117 can be read by Stata 13 or later. Version 118
is supported in Stata 14 and later. Version 119 is supported in
Stata 15 and later. Version 114 limits string variables to 244
characters or fewer while versions 117 and later allow strings
with lengths up to 2,000,000 characters. Versions 118 and 119
support Unicode characters, and version 119 supports more than
32,767 variables.
Version 119 should usually only be used when the number of
variables exceeds the capacity of dta format 118. Exporting
smaller datasets in format 119 may have unintended consequences,
and, as of November 2020, Stata SE cannot read version 119 files.
.. versionchanged:: 1.0.0
Added support for formats 118 and 119.
convert_strl : list, optional
List of column names to convert to string columns to Stata StrL
format. Only available if version is 117. Storing strings in the
StrL format can produce smaller dta files if strings have more than
8 characters and values are repeated.
compression : str or dict, default 'infer'
For on-the-fly compression of the output dta. If string, specifies
compression mode. If dict, value at key 'method' specifies
compression mode. Compression mode must be one of {'infer', 'gzip',
'bz2', 'zip', 'xz', None}. If compression mode is 'infer' and
`fname` is path-like, then detect compression from the following
extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no
compression). If dict and compression mode is one of {'zip',
'gzip', 'bz2'}, or inferred as one of the above, other entries
passed as additional compression options.
.. versionadded:: 1.1.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
Raises
------
NotImplementedError
* If datetimes contain timezone information
* Column dtype is not representable in Stata
ValueError
* Columns listed in convert_dates are neither datetime64[ns]
or datetime.datetime
* Column listed in convert_dates is not in DataFrame
* Categorical label contains more than 32,000 characters
See Also
--------
read_stata : Import Stata data files.
io.stata.StataWriter : Low-level writer for Stata data files.
io.stata.StataWriter117 : Low-level writer for version 117 files.
Examples
--------
>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon',
... 'parrot'],
... 'speed': [350, 18, 361, 15]})
>>> df.to_stata('animals.dta') # doctest: +SKIP
Function19
to_string(self, buf: 'FilePathOrBuffer[str] | None' = None, columns: 'Sequence[str] | None' = None, col_space: 'int | None' = None, header: 'bool | Sequence[str]' = True, index: 'bool' = True, na_rep: 'str' = 'NaN', formatters: 'fmt.FormattersType | None' = None, float_format: 'fmt.FloatFormatType | None' = None, sparsify: 'bool | None' = None, index_names: 'bool' = True, justify: 'str | None' = None, max_rows: 'int | None' = None, min_rows: 'int | None' = None, max_cols: 'int | None' = None, show_dimensions: 'bool' = False, decimal: 'str' = '.', line_width: 'int | None' = None, max_colwidth: 'int | None' = None, encoding: 'str | None' = None) -> 'str | None'
Help on function to_string in module pandas.core.frame:
to_string(self, buf: 'FilePathOrBuffer[str] | None' = None, columns: 'Sequence[str] | None' = None, col_space: 'int | None' = None, header: 'bool | Sequence[str]' = True, index: 'bool' = True, na_rep: 'str' = 'NaN', formatters: 'fmt.FormattersType | None' = None, float_format: 'fmt.FloatFormatType | None' = None, sparsify: 'bool | None' = None, index_names: 'bool' = True, justify: 'str | None' = None, max_rows: 'int | None' = None, min_rows: 'int | None' = None, max_cols: 'int | None' = None, show_dimensions: 'bool' = False, decimal: 'str' = '.', line_width: 'int | None' = None, max_colwidth: 'int | None' = None, encoding: 'str | None' = None) -> 'str | None'
Render a DataFrame to a console-friendly tabular output.
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : sequence, optional, default None
The subset of columns to write. Writes all columns by default.
col_space : int, list or dict of int, optional
The minimum width of each column.
header : bool or sequence, optional
Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.
index : bool, optional, default True
Whether to print index (row) labels.
na_rep : str, optional, default 'NaN'
String representation of ``NaN`` to use.
formatters : list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns' elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
float_format : one-parameter function, optional, default None
Formatter function to apply to columns' elements if they are
floats. This function must return a unicode string and will be
applied only to the non-``NaN`` elements, with ``NaN`` being
handled by ``na_rep``.
.. versionchanged:: 1.2.0
sparsify : bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names : bool, optional, default True
Prints the names of the indexes.
justify : str, default None
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), 'right' out
of the box. Valid values are
* left
* right
* center
* justify
* justify-all
* start
* end
* inherit
* match-parent
* initial
* unset.
max_rows : int, optional
Maximum number of rows to display in the console.
min_rows : int, optional
The number of rows to display in the console in a truncated repr
(when number of rows is above `max_rows`).
max_cols : int, optional
Maximum number of columns to display in the console.
show_dimensions : bool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
line_width : int, optional
Width to wrap a line in characters.
max_colwidth : int, optional
Max width to truncate each column in characters. By default, no limit.
.. versionadded:: 1.0.0
encoding : str, default "utf-8"
Set character encoding.
.. versionadded:: 1.0
Returns
-------
str or None
If buf is None, returns the result as a string. Otherwise returns
None.
See Also
--------
to_html : Convert DataFrame to HTML.
Examples
--------
>>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
>>> df = pd.DataFrame(d)
>>> print(df.to_string())
col1 col2
0 1 4
1 2 5
2 3 6
Function20
to_timestamp(self, freq: 'Frequency | None' = None, how: 'str' = 'start', axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'
Help on function to_timestamp in module pandas.core.frame:
to_timestamp(self, freq: 'Frequency | None' = None, how: 'str' = 'start', axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'
Cast to DatetimeIndex of timestamps, at *beginning* of period.
Parameters
----------
freq : str, default frequency of PeriodIndex
Desired frequency.
how : {'s', 'e', 'start', 'end'}
Convention for converting period to timestamp; start of period
vs. end.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to convert (the index by default).
copy : bool, default True
If False then underlying input data is not copied.
Returns
-------
DataFrame with DatetimeIndex
Function21
to_xarray(self)
Help on function to_xarray in module pandas.core.generic:
to_xarray(self)
Return an xarray object from the pandas object.
Returns
-------
xarray.DataArray or xarray.Dataset
Data in the pandas structure converted to Dataset if the object is
a DataFrame, or a DataArray if the object is a Series.
See Also
--------
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
Notes
-----
See the `xarray docs <https://xarray.pydata.org/en/stable/>`__
Examples
--------
>>> df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
... ('parrot', 'bird', 24.0, 2),
... ('lion', 'mammal', 80.5, 4),
... ('monkey', 'mammal', np.nan, 4)],
... columns=['name', 'class', 'max_speed',
... 'num_legs'])
>>> df
name class max_speed num_legs
0 falcon bird 389.0 2
1 parrot bird 24.0 2
2 lion mammal 80.5 4
3 monkey mammal NaN 4
>>> df.to_xarray()
<xarray.Dataset>
Dimensions: (index: 4)
Coordinates:
* index (index) int64 0 1 2 3
Data variables:
name (index) object 'falcon' 'parrot' 'lion' 'monkey'
class (index) object 'bird' 'bird' 'mammal' 'mammal'
max_speed (index) float64 389.0 24.0 80.5 nan
num_legs (index) int64 2 2 4 4
>>> df['max_speed'].to_xarray()
<xarray.DataArray 'max_speed' (index: 4)>
array([389. , 24. , 80.5, nan])
Coordinates:
* index (index) int64 0 1 2 3
>>> dates = pd.to_datetime(['2018-01-01', '2018-01-01',
... '2018-01-02', '2018-01-02'])
>>> df_multiindex = pd.DataFrame({'date': dates,
... 'animal': ['falcon', 'parrot',
... 'falcon', 'parrot'],
... 'speed': [350, 18, 361, 15]})
>>> df_multiindex = df_multiindex.set_index(['date', 'animal'])
>>> df_multiindex
speed
date animal
2018-01-01 falcon 350
parrot 18
2018-01-02 falcon 361
parrot 15
>>> df_multiindex.to_xarray()
<xarray.Dataset>
Dimensions: (animal: 2, date: 2)
Coordinates:
* date (date) datetime64[ns] 2018-01-01 2018-01-02
* animal (animal) object 'falcon' 'parrot'
Data variables:
speed (date, animal) int64 350 18 361 15
Function22
to_xml(self, path_or_buffer: 'FilePathOrBuffer | None' = None, index: 'bool' = True, root_name: 'str | None' = 'data', row_name: 'str | None' = 'row', na_rep: 'str | None' = None, attr_cols: 'str | list[str] | None' = None, elem_cols: 'str | list[str] | None' = None, namespaces: 'dict[str | None, str] | None' = None, prefix: 'str | None' = None, encoding: 'str' = 'utf-8', xml_declaration: 'bool | None' = True, pretty_print: 'bool | None' = True, parser: 'str | None' = 'lxml', stylesheet: 'FilePathOrBuffer | None' = None, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None) -> 'str | None'
Help on function to_xml in module pandas.core.frame:
to_xml(self, path_or_buffer: 'FilePathOrBuffer | None' = None, index: 'bool' = True, root_name: 'str | None' = 'data', row_name: 'str | None' = 'row', na_rep: 'str | None' = None, attr_cols: 'str | list[str] | None' = None, elem_cols: 'str | list[str] | None' = None, namespaces: 'dict[str | None, str] | None' = None, prefix: 'str | None' = None, encoding: 'str' = 'utf-8', xml_declaration: 'bool | None' = True, pretty_print: 'bool | None' = True, parser: 'str | None' = 'lxml', stylesheet: 'FilePathOrBuffer | None' = None, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None) -> 'str | None'
Render a DataFrame to an XML document.
.. versionadded:: 1.3.0
Parameters
----------
path_or_buffer : str, path object or file-like object, optional
File to write output to. If None, the output is returned as a
string.
index : bool, default True
Whether to include index in XML document.
root_name : str, default 'data'
The name of root element in XML document.
row_name : str, default 'row'
The name of row element in XML document.
na_rep : str, optional
Missing data representation.
attr_cols : list-like, optional
List of columns to write as attributes in row element.
Hierarchical columns will be flattened with underscore
delimiting the different levels.
elem_cols : list-like, optional
List of columns to write as children in row element. By default,
all columns output as children of row element. Hierarchical
columns will be flattened with underscore delimiting the
different levels.
namespaces : dict, optional
All namespaces to be defined in root element. Keys of dict
should be prefix names and values of dict corresponding URIs.
Default namespaces should be given empty string key. For
example, ::
namespaces = {"": "https://example.com"}
prefix : str, optional
Namespace prefix to be used for every element and/or attribute
in document. This should be one of the keys in ``namespaces``
dict.
encoding : str, default 'utf-8'
Encoding of the resulting document.
xml_declaration : bool, default True
Whether to include the XML declaration at start of document.
pretty_print : bool, default True
Whether output should be pretty printed with indentation and
line breaks.
parser : {'lxml','etree'}, default 'lxml'
Parser module to use for building of tree. Only 'lxml' and
'etree' are supported. With 'lxml', the ability to use XSLT
stylesheet is supported.
stylesheet : str, path object or file-like object, optional
A URL, file-like object, or a raw string containing an XSLT
script used to transform the raw XML output. Script should use
layout of elements and attributes from original output. This
argument requires ``lxml`` to be installed. Only XSLT 1.0
scripts and not later versions is currently supported.
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use
gzip, bz2, zip or xz if path_or_buffer is a string ending in
'.gz', '.bz2', '.zip', or 'xz', respectively, and no decompression
otherwise. If using 'zip', the ZIP file must contain only one data
file to be read in. Set to None for no decompression.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
Returns
-------
None or str
If ``io`` is None, returns the resulting XML format as a
string. Otherwise returns None.
See Also
--------
to_json : Convert the pandas object to a JSON string.
to_html : Convert DataFrame to a html.
Examples
--------
>>> df = pd.DataFrame({'shape': ['square', 'circle', 'triangle'],
... 'degrees': [360, 360, 180],
... 'sides': [4, np.nan, 3]})
>>> df.to_xml() # doctest: +SKIP
<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<index>0</index>
<shape>square</shape>
<degrees>360</degrees>
<sides>4.0</sides>
</row>
<row>
<index>1</index>
<shape>circle</shape>
<degrees>360</degrees>
<sides/>
</row>
<row>
<index>2</index>
<shape>triangle</shape>
<degrees>180</degrees>
<sides>3.0</sides>
</row>
</data>
>>> df.to_xml(attr_cols=[
... 'index', 'shape', 'degrees', 'sides'
... ]) # doctest: +SKIP
<?xml version='1.0' encoding='utf-8'?>
<data>
<row index="0" shape="square" degrees="360" sides="4.0"/>
<row index="1" shape="circle" degrees="360"/>
<row index="2" shape="triangle" degrees="180" sides="3.0"/>
</data>
>>> df.to_xml(namespaces={"doc": "https://example.com"},
... prefix="doc") # doctest: +SKIP
<?xml version='1.0' encoding='utf-8'?>
<doc:data xmlns:doc="https://example.com">
<doc:row>
<doc:index>0</doc:index>
<doc:shape>square</doc:shape>
<doc:degrees>360</doc:degrees>
<doc:sides>4.0</doc:sides>
</doc:row>
<doc:row>
<doc:index>1</doc:index>
<doc:shape>circle</doc:shape>
<doc:degrees>360</doc:degrees>
<doc:sides/>
</doc:row>
<doc:row>
<doc:index>2</doc:index>
<doc:shape>triangle</doc:shape>
<doc:degrees>180</doc:degrees>
<doc:sides>3.0</doc:sides>
</doc:row>
</doc:data>