Link to the previous installment:
Python pandas库|任凭弱水三千,我只取一瓢饮(2)_Hann Yang的博客-CSDN博客
I ~ Q: Function10 ~ Function25
>>> Types['Function'][9:25]
['infer_freq', 'interval_range', 'isna', 'isnull', 'json_normalize', 'lreshape', 'melt', 'merge', 'merge_asof', 'merge_ordered', 'notna', 'notnull', 'period_range', 'pivot', 'pivot_table', 'qcut']
Function10
infer_freq(index, warn: 'bool' = True) -> 'str | None'
Help on function infer_freq in module pandas.tseries.frequencies:
infer_freq(index, warn: 'bool' = True) -> 'str | None'
Infer the most likely frequency given the input index. If the frequency is
uncertain, a warning will be printed.
Parameters
----------
index : DatetimeIndex or TimedeltaIndex
If a Series is passed, its values are used (NOT THE INDEX).
warn : bool, default True
Returns
-------
str or None
None if no discernible frequency.
Raises
------
TypeError
If the index is not datetime-like.
ValueError
If there are fewer than three values.
Examples
--------
>>> idx = pd.date_range(start='2020/12/01', end='2020/12/30', periods=30)
>>> pd.infer_freq(idx)
'D'
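If the spacing is irregular there is no discernible frequency, so None comes back. A quick supplementary check (behavior sketched for recent pandas versions):
>>> idx = pd.DatetimeIndex(['2020-12-01', '2020-12-02', '2020-12-05'])
>>> pd.infer_freq(idx) is None
True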
Function11
interval_range(start=None, end=None, periods=None, freq=None, name: 'Hashable' = None, closed='right') -> 'IntervalIndex'
Help on function interval_range in module pandas.core.indexes.interval:
interval_range(start=None, end=None, periods=None, freq=None, name: 'Hashable' = None, closed='right') -> 'IntervalIndex'
Return a fixed frequency IntervalIndex.
Parameters
----------
start : numeric or datetime-like, default None
Left bound for generating intervals.
end : numeric or datetime-like, default None
Right bound for generating intervals.
periods : int, default None
Number of periods to generate.
freq : numeric, str, or DateOffset, default None
The length of each interval. Must be consistent with the type of start
and end, e.g. 2 for numeric, or '5H' for datetime-like. Default is 1
for numeric and 'D' for datetime-like.
name : str, default None
Name of the resulting IntervalIndex.
closed : {'left', 'right', 'both', 'neither'}, default 'right'
Whether the intervals are closed on the left-side, right-side, both
or neither.
Returns
-------
IntervalIndex
See Also
--------
IntervalIndex : An Index of intervals that are all closed on the same side.
Notes
-----
Of the four parameters ``start``, ``end``, ``periods``, and ``freq``,
exactly three must be specified. If ``freq`` is omitted, the resulting
``IntervalIndex`` will have ``periods`` linearly spaced elements between
``start`` and ``end``, inclusively.
To learn more about datetime-like frequency strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__.
Examples
--------
Numeric ``start`` and ``end`` is supported.
>>> pd.interval_range(start=0, end=5)
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]],
dtype='interval[int64, right]')
Additionally, datetime-like input is also supported.
>>> pd.interval_range(start=pd.Timestamp('2017-01-01'),
... end=pd.Timestamp('2017-01-04'))
IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03],
(2017-01-03, 2017-01-04]],
dtype='interval[datetime64[ns], right]')
The ``freq`` parameter specifies the frequency between the left and right
endpoints of the individual intervals within the ``IntervalIndex``. For
numeric ``start`` and ``end``, the frequency must also be numeric.
>>> pd.interval_range(start=0, periods=4, freq=1.5)
IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]],
dtype='interval[float64, right]')
Similarly, for datetime-like ``start`` and ``end``, the frequency must be
convertible to a DateOffset.
>>> pd.interval_range(start=pd.Timestamp('2017-01-01'),
... periods=3, freq='MS')
IntervalIndex([(2017-01-01, 2017-02-01], (2017-02-01, 2017-03-01],
(2017-03-01, 2017-04-01]],
dtype='interval[datetime64[ns], right]')
Specify ``start``, ``end``, and ``periods``; the frequency is generated
automatically (linearly spaced).
>>> pd.interval_range(start=0, end=6, periods=4)
IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]],
dtype='interval[float64, right]')
The ``closed`` parameter specifies which endpoints of the individual
intervals within the ``IntervalIndex`` are closed.
>>> pd.interval_range(end=5, periods=4, closed='both')
IntervalIndex([[1, 2], [2, 3], [3, 4], [4, 5]],
dtype='interval[int64, both]')
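One common use, sketched here as a supplement to the help text: the resulting IntervalIndex can be passed to pd.cut as precomputed bins.
>>> bins = pd.interval_range(start=0, end=4)
>>> pd.cut([0.5, 2.5, 3.1], bins)
[(0, 1], (2, 3], (3, 4]]
Categories (4, interval[int64, right]): [(0, 1] < (1, 2] < (2, 3] < (3, 4]]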
Function12, 13
isna(obj) / isnull(obj): identical in usage, two names for the same function.
Help on function isna in module pandas.core.dtypes.missing:
isna(obj)
Detect missing values for an array-like object.
This function takes a scalar or array-like object and indicates
whether values are missing (``NaN`` in numeric arrays, ``None`` or ``NaN``
in object arrays, ``NaT`` in datetimelike).
Parameters
----------
obj : scalar or array-like
Object to check for null or missing values.
Returns
-------
bool or array-like of bool
For scalar input, returns a scalar boolean.
For array input, returns an array of boolean indicating whether each
corresponding element is missing.
See Also
--------
notna : Boolean inverse of pandas.isna.
Series.isna : Detect missing values in a Series.
DataFrame.isna : Detect missing values in a DataFrame.
Index.isna : Detect missing values in an Index.
Examples
--------
Scalar arguments (including strings) result in a scalar boolean.
>>> pd.isna('dog')
False
>>> pd.isna(pd.NA)
True
>>> pd.isna(np.nan)
True
ndarrays result in an ndarray of booleans.
>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan, 3.],
[ 4., 5., nan]])
>>> pd.isna(array)
array([[False, True, False],
[False, False, True]])
For indexes, an ndarray of booleans is returned.
>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None,
... "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
dtype='datetime64[ns]', freq=None)
>>> pd.isna(index)
array([False, False, True, False])
For Series and DataFrame, the same type is returned, containing booleans.
>>> df = pd.DataFrame([['ant', 'bee', 'cat'], ['dog', None, 'fly']])
>>> df
0 1 2
0 ant bee cat
1 dog None fly
>>> pd.isna(df)
0 1 2
0 False False False
1 False True False
>>> pd.isna(df[1])
0 False
1 True
Name: 1, dtype: bool
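The alias relationship noted above can be verified directly, since both names are bound to the same function object:
>>> pd.isnull is pd.isna
True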
Function14
json_normalize(data: 'dict | list[dict]', record_path: 'str | list | None' = None, meta: 'str | list[str | list[str]] | None' = None, meta_prefix: 'str | None' = None, record_prefix: 'str | None' = None, errors: 'str' = 'raise', sep: 'str' = '.', max_level: 'int | None' = None) -> 'DataFrame'
Help on function _json_normalize in module pandas.io.json._normalize:
_json_normalize(data: 'dict | list[dict]', record_path: 'str | list | None' = None, meta: 'str | list[str | list[str]] | None' = None, meta_prefix: 'str | None' = None, record_prefix: 'str | None' = None, errors: 'str' = 'raise', sep: 'str' = '.', max_level: 'int | None' = None) -> 'DataFrame'
Normalize semi-structured JSON data into a flat table.
Parameters
----------
data : dict or list of dicts
Unserialized JSON objects.
record_path : str or list of str, default None
Path in each object to list of records. If not passed, data will be
assumed to be an array of records.
meta : list of paths (str or list of str), default None
Fields to use as metadata for each record in resulting table.
meta_prefix : str, default None
If provided, prefix for the metadata columns' dotted paths, e.g.
foo.bar.field if meta is ['foo', 'bar'].
record_prefix : str, default None
If provided, prefix for the record columns' dotted paths, e.g.
foo.bar.field if the path to records is ['foo', 'bar'].
errors : {'raise', 'ignore'}, default 'raise'
Configures error handling.
* 'ignore' : will ignore KeyError if keys listed in meta are not
always present.
* 'raise' : will raise KeyError if keys listed in meta are not
always present.
sep : str, default '.'
Nested records will generate names separated by sep.
e.g., for sep='.', {'foo': {'bar': 0}} -> foo.bar.
max_level : int, default None
Max number of levels (depth of dict) to normalize.
If None, normalizes all levels.
.. versionadded:: 0.25.0
Returns
-------
frame : DataFrame
Normalize semi-structured JSON data into a flat table.
Examples
--------
>>> data = [
... {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
... {"name": {"given": "Mark", "family": "Regner"}},
... {"id": 2, "name": "Faye Raker"},
... ]
>>> pd.json_normalize(data)
id name.first name.last name.given name.family name
0 1.0 Coleen Volk NaN NaN NaN
1 NaN NaN NaN Mark Regner NaN
2 2.0 NaN NaN NaN NaN Faye Raker
>>> data = [
... {
... "id": 1,
... "name": "Cole Volk",
... "fitness": {"height": 130, "weight": 60},
... },
... {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
... {
... "id": 2,
... "name": "Faye Raker",
... "fitness": {"height": 130, "weight": 60},
... },
... ]
>>> pd.json_normalize(data, max_level=0)
id name fitness
0 1.0 Cole Volk {'height': 130, 'weight': 60}
1 NaN Mark Reg {'height': 130, 'weight': 60}
2 2.0 Faye Raker {'height': 130, 'weight': 60}
Normalizes nested data up to level 1.
>>> data = [
... {
... "id": 1,
... "name": "Cole Volk",
... "fitness": {"height": 130, "weight": 60},
... },
... {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
... {
... "id": 2,
... "name": "Faye Raker",
... "fitness": {"height": 130, "weight": 60},
... },
... ]
>>> pd.json_normalize(data, max_level=1)
id name fitness.height fitness.weight
0 1.0 Cole Volk 130 60
1 NaN Mark Reg 130 60
2 2.0 Faye Raker 130 60
>>> data = [
... {
... "state": "Florida",
... "shortname": "FL",
... "info": {"governor": "Rick Scott"},
... "counties": [
... {"name": "Dade", "population": 12345},
... {"name": "Broward", "population": 40000},
... {"name": "Palm Beach", "population": 60000},
... ],
... },
... {
... "state": "Ohio",
... "shortname": "OH",
... "info": {"governor": "John Kasich"},
... "counties": [
... {"name": "Summit", "population": 1234},
... {"name": "Cuyahoga", "population": 1337},
... ],
... },
... ]
>>> result = pd.json_normalize(
... data, "counties", ["state", "shortname", ["info", "governor"]]
... )
>>> result
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
>>> data = {"A": [1, 2]}
>>> pd.json_normalize(data, "A", record_prefix="Prefix.")
Prefix.0
0 1
1 2
Returns normalized data with columns prefixed with the given string.
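A short supplementary sketch of the sep parameter: nested keys are joined with the given separator instead of the default '.'.
>>> pd.json_normalize({"a": {"b": 1, "c": 2}}, sep="_")
   a_b  a_c
0    1    2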
Function15
lreshape(data: 'DataFrame', groups, dropna: 'bool' = True, label=None) -> 'DataFrame'
Help on function lreshape in module pandas.core.reshape.melt:
lreshape(data: 'DataFrame', groups, dropna: 'bool' = True, label=None) -> 'DataFrame'
Reshape wide-format data to long. Generalized inverse of DataFrame.pivot.
Accepts a dictionary, ``groups``, in which each key is a new column name
and each value is a list of old column names that will be "melted" under
the new column name as part of the reshape.
Parameters
----------
data : DataFrame
The wide-format DataFrame.
groups : dict
{new_name : list_of_columns}.
dropna : bool, default True
Do not include columns whose entries are all NaN.
label : None
Not used.
.. deprecated:: 1.0.0
Returns
-------
DataFrame
Reshaped DataFrame.
See Also
--------
melt : Unpivot a DataFrame from wide to long format, optionally leaving
identifiers set.
pivot : Create a spreadsheet-style pivot table as a DataFrame.
DataFrame.pivot : Pivot without aggregation that can handle
non-numeric data.
DataFrame.pivot_table : Generalization of pivot that can handle
duplicate values for one index/column pair.
DataFrame.unstack : Pivot based on the index values instead of a
column.
wide_to_long : Wide panel to long format. Less flexible but more
user-friendly than melt.
Examples
--------
>>> data = pd.DataFrame({'hr1': [514, 573], 'hr2': [545, 526],
... 'team': ['Red Sox', 'Yankees'],
... 'year1': [2007, 2007], 'year2': [2008, 2008]})
>>> data
hr1 hr2 team year1 year2
0 514 545 Red Sox 2007 2008
1 573 526 Yankees 2007 2008
>>> pd.lreshape(data, {'year': ['year1', 'year2'], 'hr': ['hr1', 'hr2']})
team year hr
0 Red Sox 2007 514
1 Yankees 2007 573
2 Red Sox 2008 545
3 Yankees 2008 526
Function16
melt(frame: 'DataFrame', id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index: 'bool' = True) -> 'DataFrame'
Help on function melt in module pandas.core.reshape.melt:
melt(frame: 'DataFrame', id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index: 'bool' = True) -> 'DataFrame'
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
Parameters
----------
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or str, optional
If columns are a MultiIndex then use this level to melt.
ignore_index : bool, default True
If True, original index is ignored. If False, the original index is retained.
Index labels will be repeated as necessary.
.. versionadded:: 1.1.0
Returns
-------
DataFrame
Unpivoted DataFrame.
See Also
--------
DataFrame.melt : Identical method.
pivot_table : Create a spreadsheet-style pivot table as a DataFrame.
DataFrame.pivot : Return reshaped DataFrame organized
by given index / column values.
DataFrame.explode : Explode a DataFrame from list-like
columns to long format.
Examples
--------
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
>>> df
A B C
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, id_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
A variable value
0 a B 1
1 b B 3
2 c B 5
3 a C 2
4 b C 4
5 c C 6
The names of 'variable' and 'value' columns can be customized:
>>> pd.melt(df, id_vars=['A'], value_vars=['B'],
... var_name='myVarname', value_name='myValname')
A myVarname myValname
0 a B 1
1 b B 3
2 c B 5
Original index values can be kept around:
>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)
A variable value
0 a B 1
1 b B 3
2 c B 5
0 a C 2
1 b C 4
2 c C 6
If you have multi-index columns:
>>> df.columns = [list('ABC'), list('DEF')]
>>> df
A B C
D E F
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
(A, D) variable_0 variable_1 value
0 a B E 1
1 b B E 3
2 c B E 5
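A small supplementary sketch of the var_name fallback described above: when the columns axis is named and var_name is None, that name is reused for the melted 'variable' column.
>>> df = pd.DataFrame({'A': ['a', 'b'], 'B': [1, 3]})
>>> df.columns.name = 'col'
>>> pd.melt(df, id_vars=['A'])
   A col  value
0  a   B      1
1  b   B      3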
Function17
merge(left: 'DataFrame | Series', right: 'DataFrame | Series', how: 'str' = 'inner', on: 'IndexLabel | None' = None, left_on: 'IndexLabel | None' = None, right_on: 'IndexLabel | None' = None, left_index: 'bool' = False, right_index: 'bool' = False, sort: 'bool' = False, suffixes: 'Suffixes' = ('_x', '_y'), copy: 'bool' = True, indicator: 'bool' = False, validate: 'str | None' = None) -> 'DataFrame'
Help on function merge in module pandas.core.reshape.merge:
merge(left: 'DataFrame | Series', right: 'DataFrame | Series', how: 'str' = 'inner', on: 'IndexLabel | None' = None, left_on: 'IndexLabel | None' = None, right_on: 'IndexLabel | None' = None, left_index: 'bool' = False, right_index: 'bool' = False, sort: 'bool' = False, suffixes: 'Suffixes' = ('_x', '_y'), copy: 'bool' = True, indicator: 'bool' = False, validate: 'str | None' = None) -> 'DataFrame'
Merge DataFrame or named Series objects with a database-style join.
A named Series object is treated as a DataFrame with a single named column.
The join is done on columns or indexes. If joining columns on
columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes
on indexes or indexes on a column or columns, the index will be passed on.
When performing a cross merge, no column specifications to merge on are
allowed.
Parameters
----------
left : DataFrame
right : DataFrame or named Series
Object to merge with.
how : {'left', 'right', 'outer', 'inner', 'cross'}, default 'inner'
Type of merge to be performed.
* left: use only keys from left frame, similar to a SQL left outer join;
preserve key order.
* right: use only keys from right frame, similar to a SQL right outer join;
preserve key order.
* outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically.
* inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys.
* cross: creates the cartesian product from both frames, preserves the order
of the left keys.
.. versionadded:: 1.2.0
on : label or list
Column or index level names to join on. These must be found in both
DataFrames. If `on` is None and not merging on indexes then this defaults
to the intersection of the columns in both DataFrames.
left_on : label or list, or array-like
Column or index level names to join on in the left DataFrame. Can also
be an array or list of arrays of the length of the left DataFrame.
These arrays are treated as if they are columns.
right_on : label or list, or array-like
Column or index level names to join on in the right DataFrame. Can also
be an array or list of arrays of the length of the right DataFrame.
These arrays are treated as if they are columns.
left_index : bool, default False
Use the index from the left DataFrame as the join key(s). If it is a
MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels.
right_index : bool, default False
Use the index from the right DataFrame as the join key. Same caveats as
left_index.
sort : bool, default False
Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword).
suffixes : list-like, default is ("_x", "_y")
A length-2 sequence where each element is optionally a string
indicating the suffix to add to overlapping column names in
`left` and `right` respectively. Pass a value of `None` instead
of a string to indicate that the column name from `left` or
`right` should be left as-is, with no suffix. At least one of the
values must not be None.
copy : bool, default True
If False, avoid copy if possible.
indicator : bool or str, default False
If True, adds a column to the output DataFrame called "_merge" with
information on the source of each row. The column can be given a different
name by providing a string argument. The column will have a Categorical
type with the value of "left_only" for observations whose merge key only
appears in the left DataFrame, "right_only" for observations
whose merge key only appears in the right DataFrame, and "both"
if the observation's merge key is found in both DataFrames.
validate : str, optional
If specified, checks if merge is of specified type.
* "one_to_one" or "1:1": check if merge keys are unique in both
left and right datasets.
* "one_to_many" or "1:m": check if merge keys are unique in left
dataset.
* "many_to_one" or "m:1": check if merge keys are unique in right
dataset.
* "many_to_many" or "m:m": allowed, but does not result in checks.
Returns
-------
DataFrame
A DataFrame of the two merged objects.
See Also
--------
merge_ordered : Merge with optional filling/interpolation.
merge_asof : Merge on nearest keys.
DataFrame.join : Similar method using indices.
Notes
-----
Support for specifying index levels as the `on`, `left_on`, and
`right_on` parameters was added in version 0.23.0
Support for merging named Series objects was added in version 0.24.0
Examples
--------
>>> df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
... 'value': [1, 2, 3, 5]})
>>> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
... 'value': [5, 6, 7, 8]})
>>> df1
lkey value
0 foo 1
1 bar 2
2 baz 3
3 foo 5
>>> df2
rkey value
0 foo 5
1 bar 6
2 baz 7
3 foo 8
Merge df1 and df2 on the lkey and rkey columns. The value columns have
the default suffixes, _x and _y, appended.
>>> df1.merge(df2, left_on='lkey', right_on='rkey')
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
Merge DataFrames df1 and df2 with specified left and right suffixes
appended to any overlapping columns.
>>> df1.merge(df2, left_on='lkey', right_on='rkey',
... suffixes=('_left', '_right'))
lkey value_left rkey value_right
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
Merge DataFrames df1 and df2, but raise an exception if the DataFrames have
any overlapping columns.
>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
Index(['value'], dtype='object')
>>> df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df1
a b
0 foo 1
1 bar 2
>>> df2
a c
0 foo 3
1 baz 4
>>> df1.merge(df2, how='inner', on='a')
a b c
0 foo 1 3
>>> df1.merge(df2, how='left', on='a')
a b c
0 foo 1 3.0
1 bar 2 NaN
>>> df1 = pd.DataFrame({'left': ['foo', 'bar']})
>>> df2 = pd.DataFrame({'right': [7, 8]})
>>> df1
left
0 foo
1 bar
>>> df2
right
0 7
1 8
>>> df1.merge(df2, how='cross')
left right
0 foo 7
1 foo 8
2 bar 7
3 bar 8
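A supplementary sketch of the indicator option (frames redefined here to match the small df1/df2 above): the added _merge column records where each row's key was found.
>>> df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df1.merge(df2, how='outer', on='a', indicator=True)
     a    b    c      _merge
0  foo  1.0  3.0        both
1  bar  2.0  NaN   left_only
2  baz  NaN  4.0  right_only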
Function18
merge_asof(left: 'DataFrame | Series', right: 'DataFrame | Series', on: 'IndexLabel | None' = None, left_on: 'IndexLabel | None' = None, right_on: 'IndexLabel | None' = None, left_index: 'bool' = False, right_index: 'bool' = False, by=None, left_by=None, right_by=None, suffixes: 'Suffixes' = ('_x', '_y'), tolerance=None, allow_exact_matches: 'bool' = True, direction: 'str' = 'backward') -> 'DataFrame'
Help on function merge_asof in module pandas.core.reshape.merge:
merge_asof(left: 'DataFrame | Series', right: 'DataFrame | Series', on: 'IndexLabel | None' = None, left_on: 'IndexLabel | None' = None, right_on: 'IndexLabel | None' = None, left_index: 'bool' = False, right_index: 'bool' = False, by=None, left_by=None, right_by=None, suffixes: 'Suffixes' = ('_x', '_y'), tolerance=None, allow_exact_matches: 'bool' = True, direction: 'str' = 'backward') -> 'DataFrame'
Perform an asof merge.
This is similar to a left-join except that we match on nearest
key rather than equal keys. Both DataFrames must be sorted by the key.
For each row in the left DataFrame:
- A "backward" search selects the last row in the right DataFrame whose
'on' key is less than or equal to the left's key.
- A "forward" search selects the first row in the right DataFrame whose
'on' key is greater than or equal to the left's key.
- A "nearest" search selects the row in the right DataFrame whose 'on'
key is closest in absolute distance to the left's key.
The default is "backward" and is compatible in versions below 0.20.0.
The direction parameter was added in version 0.20.0 and introduces
"forward" and "nearest".
Optionally match on equivalent keys with 'by' before searching with 'on'.
Parameters
----------
left : DataFrame or named Series
right : DataFrame or named Series
on : label
Field name to join on. Must be found in both DataFrames.
The data MUST be ordered. Furthermore this must be a numeric column,
such as datetimelike, integer, or float. On or left_on/right_on
must be given.
left_on : label
Field name to join on in left DataFrame.
right_on : label
Field name to join on in right DataFrame.
left_index : bool
Use the index of the left DataFrame as the join key.
right_index : bool
Use the index of the right DataFrame as the join key.
by : column name or list of column names
Match on these columns before performing merge operation.
left_by : column name
Field names to match on in the left DataFrame.
right_by : column name
Field names to match on in the right DataFrame.
suffixes : 2-length sequence (tuple, list, ...)
Suffix to apply to overlapping column names in the left and right
side, respectively.
tolerance : int or Timedelta, optional, default None
Select asof tolerance within this range; must be compatible
with the merge index.
allow_exact_matches : bool, default True
- If True, allow matching with the same 'on' value
(i.e. less-than-or-equal-to / greater-than-or-equal-to)
- If False, don't match the same 'on' value
(i.e., strictly less-than / strictly greater-than).
direction : 'backward' (default), 'forward', or 'nearest'
Whether to search for prior, subsequent, or closest matches.
Returns
-------
merged : DataFrame
See Also
--------
merge : Merge with a database-style join.
merge_ordered : Merge with optional filling/interpolation.
Examples
--------
>>> left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
>>> left
a left_val
0 1 a
1 5 b
2 10 c
>>> right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})
>>> right
a right_val
0 1 1
1 2 2
2 3 3
3 6 6
4 7 7
>>> pd.merge_asof(left, right, on="a")
a left_val right_val
0 1 a 1
1 5 b 3
2 10 c 7
>>> pd.merge_asof(left, right, on="a", allow_exact_matches=False)
a left_val right_val
0 1 a NaN
1 5 b 3.0
2 10 c 7.0
>>> pd.merge_asof(left, right, on="a", direction="forward")
a left_val right_val
0 1 a 1.0
1 5 b 6.0
2 10 c NaN
>>> pd.merge_asof(left, right, on="a", direction="nearest")
a left_val right_val
0 1 a 1
1 5 b 6
2 10 c 7
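tolerance also works with plain integer keys, as a supplementary sketch using the same left/right frames; matches farther apart than the tolerance are dropped.
>>> pd.merge_asof(left, right, on="a", tolerance=2)
    a left_val  right_val
0   1        a        1.0
1   5        b        3.0
2  10        c        NaN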
We can use indexed DataFrames as well.
>>> left = pd.DataFrame({"left_val": ["a", "b", "c"]}, index=[1, 5, 10])
>>> left
left_val
1 a
5 b
10 c
>>> right = pd.DataFrame({"right_val": [1, 2, 3, 6, 7]}, index=[1, 2, 3, 6, 7])
>>> right
right_val
1 1
2 2
3 3
6 6
7 7
>>> pd.merge_asof(left, right, left_index=True, right_index=True)
left_val right_val
1 a 1
5 b 3
10 c 7
Here is a real-world time-series example.
>>> quotes = pd.DataFrame(
... {
... "time": [
... pd.Timestamp("2016-05-25 13:30:00.023"),
... pd.Timestamp("2016-05-25 13:30:00.023"),
... pd.Timestamp("2016-05-25 13:30:00.030"),
... pd.Timestamp("2016-05-25 13:30:00.041"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... pd.Timestamp("2016-05-25 13:30:00.049"),
... pd.Timestamp("2016-05-25 13:30:00.072"),
... pd.Timestamp("2016-05-25 13:30:00.075")
... ],
... "ticker": [
... "GOOG",
... "MSFT",
... "MSFT",
... "MSFT",
... "GOOG",
... "AAPL",
... "GOOG",
... "MSFT"
... ],
... "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
... "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
... }
... )
>>> quotes
time ticker bid ask
0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
7 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>> trades = pd.DataFrame(
... {
... "time": [
... pd.Timestamp("2016-05-25 13:30:00.023"),
... pd.Timestamp("2016-05-25 13:30:00.038"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... pd.Timestamp("2016-05-25 13:30:00.048")
... ],
... "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
... "price": [51.95, 51.95, 720.77, 720.92, 98.0],
... "quantity": [75, 155, 100, 100, 100]
... }
... )
>>> trades
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 GOOG 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
By default we are taking the asof of the quotes
>>> pd.merge_asof(trades, quotes, on="time", by="ticker")
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only asof within 2ms between the quote time and the trade time
>>> pd.merge_asof(
... trades, quotes, on="time", by="ticker", tolerance=pd.Timedelta("2ms")
... )
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 NaN NaN
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only asof within 10ms between the quote time and the trade time
and we exclude exact matches on time. However *prior* data will
propagate forward
>>> pd.merge_asof(
... trades,
... quotes,
... on="time",
... by="ticker",
... tolerance=pd.Timedelta("10ms"),
... allow_exact_matches=False
... )
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 NaN NaN
3 2016-05-25 13:30:00.048 GOOG 720.92 100 NaN NaN
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
Function19
merge_ordered(left: 'DataFrame', right: 'DataFrame', on: 'IndexLabel | None' = None, left_on: 'IndexLabel | None' = None, right_on: 'IndexLabel | None' = None, left_by=None, right_by=None, fill_method: 'str | None' = None, suffixes: 'Suffixes' = ('_x', '_y'), how: 'str' = 'outer') -> 'DataFrame'
Help on function merge_ordered in module pandas.core.reshape.merge:
merge_ordered(left: 'DataFrame', right: 'DataFrame', on: 'IndexLabel | None' = None, left_on: 'IndexLabel | None' = None, right_on: 'IndexLabel | None' = None, left_by=None, right_by=None, fill_method: 'str | None' = None, suffixes: 'Suffixes' = ('_x', '_y'), how: 'str' = 'outer') -> 'DataFrame'
Perform merge with optional filling/interpolation.
Designed for ordered data like time series data. Optionally
perform group-wise merge (see examples).
Parameters
----------
left : DataFrame
right : DataFrame
on : label or list
Field names to join on. Must be found in both DataFrames.
left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of
vectors of the length of the DataFrame to use a particular vector as
the join key instead of columns.
right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per
left_on docs.
left_by : column name or list of column names
Group left DataFrame by group columns and merge piece by piece with
right DataFrame.
right_by : column name or list of column names
Group right DataFrame by group columns and merge piece by piece with
left DataFrame.
fill_method : {'ffill', None}, default None
Interpolation method for data.
suffixes : list-like, default is ("_x", "_y")
A length-2 sequence where each element is optionally a string
indicating the suffix to add to overlapping column names in
`left` and `right` respectively. Pass a value of `None` instead
of a string to indicate that the column name from `left` or
`right` should be left as-is, with no suffix. At least one of the
values must not be None.
.. versionchanged:: 0.25.0
how : {'left', 'right', 'outer', 'inner'}, default 'outer'
* left: use only keys from left frame (SQL: left outer join)
* right: use only keys from right frame (SQL: right outer join)
* outer: use union of keys from both frames (SQL: full outer join)
* inner: use intersection of keys from both frames (SQL: inner join).
Returns
-------
DataFrame
The merged DataFrame output type will be the same as
'left', if it is a subclass of DataFrame.
See Also
--------
merge : Merge with a database-style join.
merge_asof : Merge on nearest keys.
Examples
--------
>>> df1 = pd.DataFrame(
... {
... "key": ["a", "c", "e", "a", "c", "e"],
... "lvalue": [1, 2, 3, 1, 2, 3],
... "group": ["a", "a", "a", "b", "b", "b"]
... }
... )
>>> df1
key lvalue group
0 a 1 a
1 c 2 a
2 e 3 a
3 a 1 b
4 c 2 b
5 e 3 b
>>> df2 = pd.DataFrame({"key": ["b", "c", "d"], "rvalue": [1, 2, 3]})
>>> df2
key rvalue
0 b 1
1 c 2
2 d 3
>>> pd.merge_ordered(df1, df2, fill_method="ffill", left_by="group")
key lvalue group rvalue
0 a 1 a NaN
1 b 1 a 1.0
2 c 2 a 2.0
3 d 2 a 3.0
4 e 3 a 3.0
5 a 1 b NaN
6 b 1 b 1.0
7 c 2 b 2.0
8 d 2 b 3.0
9 e 3 b 3.0
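For contrast, a supplementary sketch without fill_method (output sketched): the gaps introduced by the ordered union of keys simply stay NaN.
>>> pd.merge_ordered(df1, df2, left_by="group")
  key  lvalue group  rvalue
0   a     1.0     a     NaN
1   b     NaN     a     1.0
2   c     2.0     a     2.0
3   d     NaN     a     3.0
4   e     3.0     a     NaN
5   a     1.0     b     NaN
6   b     NaN     b     1.0
7   c     2.0     b     2.0
8   d     NaN     b     3.0
9   e     3.0     b     NaN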
Function20, 21
notna(obj) / notnull(obj): likewise two names for the same function.
Help on function notna in module pandas.core.dtypes.missing:
notna(obj)
Detect non-missing values for an array-like object.
This function takes a scalar or array-like object and indicates
whether values are valid (not missing, which is ``NaN`` in numeric
arrays, ``None`` or ``NaN`` in object arrays, ``NaT`` in datetimelike).
Parameters
----------
obj : array-like or object value
Object to check for *not* null or *non*-missing values.
Returns
-------
bool or array-like of bool
For scalar input, returns a scalar boolean.
For array input, returns an array of boolean indicating whether each
corresponding element is valid.
See Also
--------
isna : Boolean inverse of pandas.notna.
Series.notna : Detect valid values in a Series.
DataFrame.notna : Detect valid values in a DataFrame.
Index.notna : Detect valid values in an Index.
Examples
--------
Scalar arguments (including strings) result in a scalar boolean.
>>> pd.notna('dog')
True
>>> pd.notna(pd.NA)
False
>>> pd.notna(np.nan)
False
ndarrays result in an ndarray of booleans.
>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan, 3.],
[ 4., 5., nan]])
>>> pd.notna(array)
array([[ True, False, True],
[ True, True, False]])
For indexes, an ndarray of booleans is returned.
>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None,
... "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
dtype='datetime64[ns]', freq=None)
>>> pd.notna(index)
array([ True, True, False, True])
For Series and DataFrame, the same type is returned, containing booleans.
>>> df = pd.DataFrame([['ant', 'bee', 'cat'], ['dog', None, 'fly']])
>>> df
0 1 2
0 ant bee cat
1 dog None fly
>>> pd.notna(df)
0 1 2
0 True True True
1 True False True
>>> pd.notna(df[1])
0 True
1 False
Name: 1, dtype: bool
Function22
period_range(start=None, end=None, periods: 'int | None' = None, freq=None, name=None) -> 'PeriodIndex'
Help on function period_range in module pandas.core.indexes.period:
period_range(start=None, end=None, periods: 'int | None' = None, freq=None, name=None) -> 'PeriodIndex'
Return a fixed frequency PeriodIndex.
The day (calendar) is the default frequency.
Parameters
----------
start : str or period-like, default None
Left bound for generating periods.
end : str or period-like, default None
Right bound for generating periods.
periods : int, default None
Number of periods to generate.
freq : str or DateOffset, optional
Frequency alias. By default the freq is taken from `start` or `end`
if those are Period objects. Otherwise, the default is ``"D"`` for
daily frequency.
name : str, default None
Name of the resulting PeriodIndex.
Returns
-------
PeriodIndex
Notes
-----
Of the three parameters: ``start``, ``end``, and ``periods``, exactly two
must be specified.
To learn more about the frequency strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__.
Examples
--------
>>> pd.period_range(start='2017-01-01', end='2018-01-01', freq='M')
PeriodIndex(['2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06',
'2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12',
'2018-01'],
dtype='period[M]')
If ``start`` or ``end`` are ``Period`` objects, they will be used as anchor
endpoints for a ``PeriodIndex`` with frequency matching that of the
``period_range`` constructor.
>>> pd.period_range(start=pd.Period('2017Q1', freq='Q'),
... end=pd.Period('2017Q2', freq='Q'), freq='M')
PeriodIndex(['2017-03', '2017-04', '2017-05', '2017-06'],
dtype='period[M]')
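As with date_range, specifying start plus periods works too (a small supplementary sketch):
>>> pd.period_range(start='2017-01-01', periods=3, freq='M')
PeriodIndex(['2017-01', '2017-02', '2017-03'], dtype='period[M]')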
Function23
pivot(data: 'DataFrame', index: 'IndexLabel | None' = None, columns: 'IndexLabel | None' = None, values: 'IndexLabel | None' = None) -> 'DataFrame'
Help on function pivot in module pandas.core.reshape.pivot:
pivot(data: 'DataFrame', index: 'IndexLabel | None' = None, columns: 'IndexLabel | None' = None, values: 'IndexLabel | None' = None) -> 'DataFrame'
Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a "pivot" table) based on column values. Uses
unique values from specified `index` / `columns` to form axes of the
resulting DataFrame. This function does not support data
aggregation, multiple values will result in a MultiIndex in the
columns. See the :ref:`User Guide <reshaping>` for more on reshaping.
Parameters
----------
data : DataFrame
index : str or object or a list of str, optional
Column to use to make new frame's index. If None, uses
existing index.
.. versionchanged:: 1.1.0
Also accept list of index names.
columns : str or object or a list of str
Column to use to make new frame's columns.
.. versionchanged:: 1.1.0
Also accept list of columns names.
values : str, object or a list of the previous, optional
Column(s) to use for populating new frame's values. If not
specified, all remaining columns will be used and the result will
have hierarchically indexed columns.
Returns
-------
DataFrame
Returns reshaped DataFrame.
Raises
------
ValueError:
When there are any `index`, `columns` combinations with multiple
values. Use `DataFrame.pivot_table` when you need to aggregate.
See Also
--------
DataFrame.pivot_table : Generalization of pivot that can handle
duplicate values for one index/column pair.
DataFrame.unstack : Pivot based on the index values instead of a
column.
wide_to_long : Wide panel to long format. Less flexible but more
user-friendly than melt.
Notes
-----
For finer-tuned control, see hierarchical indexing documentation along
with the related stack/unstack methods.
Examples
--------
>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
... 'two'],
... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
... 'baz': [1, 2, 3, 4, 5, 6],
... 'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar A B C
foo
one 1 2 3
two 4 5 6
>>> df.pivot(index='foo', columns='bar')['baz']
bar A B C
foo
one 1 2 3
two 4 5 6
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
baz zoo
bar A B C A B C
foo
one 1 2 3 x y z
two 4 5 6 q w t
You could also assign a list of column names or a list of index names.
>>> df = pd.DataFrame({
... "lev1": [1, 1, 1, 2, 2, 2],
... "lev2": [1, 1, 2, 1, 1, 2],
... "lev3": [1, 2, 1, 2, 1, 2],
... "lev4": [1, 2, 3, 4, 5, 6],
... "values": [0, 1, 2, 3, 4, 5]})
>>> df
lev1 lev2 lev3 lev4 values
0 1 1 1 1 0
1 1 1 2 2 1
2 1 2 1 3 2
3 2 1 2 4 3
4 2 1 1 5 4
5 2 2 2 6 5
>>> df.pivot(index="lev1", columns=["lev2", "lev3"],values="values")
lev2 1 2
lev3 1 2 1 2
lev1
1 0.0 1.0 2.0 NaN
2 4.0 3.0 NaN 5.0
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"],values="values")
lev3 1 2
lev1 lev2
1 1 0.0 1.0
2 2.0 NaN
2 1 4.0 3.0
2 NaN 5.0
A ValueError is raised if there are any duplicates.
>>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
... "bar": ['A', 'A', 'B', 'C'],
... "baz": [1, 2, 3, 4]})
>>> df
foo bar baz
0 one A 1
1 one A 2
2 two B 3
3 two C 4
Notice that the first two rows are the same for our `index`
and `columns` arguments.
>>> df.pivot(index='foo', columns='bar', values='baz')
Traceback (most recent call last):
...
ValueError: Index contains duplicate entries, cannot reshape
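When such duplicates are expected, pivot_table with an aggregation is the usual way out; a supplementary sketch with mean as the assumed aggregator:
>>> df.pivot_table(index='foo', columns='bar', values='baz', aggfunc='mean')
bar    A    B    C
foo
one  1.5  NaN  NaN
two  NaN  3.0  4.0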
Function24
pivot_table(data: 'DataFrame', values=None, index=None, columns=None, aggfunc: 'AggFuncType' = 'mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True) -> 'DataFrame'
Help on function pivot_table in module pandas.core.reshape.pivot:
pivot_table(data: 'DataFrame', values=None, index=None, columns=None, aggfunc: 'AggFuncType' = 'mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True) -> 'DataFrame'
Create a spreadsheet-style pivot table as a DataFrame.
The levels in the pivot table will be stored in MultiIndex objects
(hierarchical indexes) on the index and columns of the result DataFrame.
Parameters
----------
data : DataFrame
values : column to aggregate, optional
index : column, Grouper, array, or list of the previous
If an array is passed, it must be the same length as the data. The
list can contain any of the other types (except list).
Keys to group by on the pivot table index. If an array is passed,
it is used in the same manner as column values.
columns : column, Grouper, array, or list of the previous
If an array is passed, it must be the same length as the data. The
list can contain any of the other types (except list).
Keys to group by on the pivot table column. If an array is passed,
it is used in the same manner as column values.
aggfunc : function, list of functions, dict, default numpy.mean
If list of functions passed, the resulting pivot table will have
hierarchical columns whose top level are the function names
(inferred from the function objects themselves)
If dict is passed, the key is column to aggregate and value
is function or list of functions.
fill_value : scalar, default None
Value to replace missing values with (in the resulting pivot table,
after aggregation).
margins : bool, default False
Add all row / columns (e.g. for subtotal / grand totals).
dropna : bool, default True
Do not include columns whose entries are all NaN.
margins_name : str, default 'All'
Name of the row / column that will contain the totals
when margins is True.
observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
.. versionchanged:: 0.25.0
sort : bool, default True
Specifies if the result should be sorted.
.. versionadded:: 1.3.0
Returns
-------
DataFrame
An Excel style pivot table.
See Also
--------
DataFrame.pivot : Pivot without aggregation that can handle
non-numeric data.
DataFrame.melt: Unpivot a DataFrame from wide to long format,
optionally leaving identifiers set.
wide_to_long : Wide panel to long format. Less flexible but more
user-friendly than melt.
Examples
--------
>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
... "bar", "bar", "bar", "bar"],
... "B": ["one", "one", "one", "two", "two",
... "one", "one", "two", "two"],
... "C": ["small", "large", "large", "small",
... "small", "large", "small", "small",
... "large"],
... "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
... "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df
A B C D E
0 foo one small 1 2
1 foo one large 2 4
2 foo one large 2 5
3 foo two small 3 5
4 foo two small 3 6
5 bar one large 4 6
6 bar one small 5 8
7 bar two small 6 9
8 bar two large 7 9
This first example aggregates values by taking the sum.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
... columns=['C'], aggfunc=np.sum)
>>> table
C large small
A B
bar one 4.0 5.0
two 7.0 6.0
foo one 4.0 1.0
two NaN 6.0
We can also fill missing values using the `fill_value` parameter.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
... columns=['C'], aggfunc=np.sum, fill_value=0)
>>> table
C large small
A B
bar one 4 5
two 7 6
foo one 4 1
two 0 6
The next example aggregates by taking the mean across multiple columns.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
... aggfunc={'D': np.mean,
... 'E': np.mean})
>>> table
D E
A C
bar large 5.500000 7.500000
small 5.500000 8.500000
foo large 2.000000 4.500000
small 2.333333 4.333333
We can also calculate multiple types of aggregations for any given
value column.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
... aggfunc={'D': np.mean,
... 'E': [min, max, np.mean]})
>>> table
D E
mean max mean min
A C
bar large 5.500000 9.0 7.500000 6.0
small 5.500000 9.0 8.500000 8.0
foo large 2.000000 5.0 4.500000 4.0
small 2.333333 6.0 4.333333 2.0
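A supplementary sketch of margins=True, which appends 'All' subtotal rows and columns to an aggregated table (output sketched):
>>> pd.pivot_table(df, values='D', index=['A'], columns=['C'],
...                aggfunc=np.sum, margins=True)
C    large  small  All
A
bar     11     11   22
foo      4      7   11
All     15     18   33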
Function25
qcut(x, q, labels=None, retbins: bool = False, precision: int = 3, duplicates: str = 'raise')
Help on function qcut in module pandas.core.reshape.tile:
qcut(x, q, labels=None, retbins: bool = False, precision: int = 3, duplicates: str = 'raise')
Quantile-based discretization function.
Discretize variable into equal-sized buckets based on rank or based
on sample quantiles. For example 1000 values for 10 quantiles would
produce a Categorical object indicating quantile membership for each data point.
Parameters
----------
x : 1d ndarray or Series
q : int or list-like of float
Number of quantiles. 10 for deciles, 4 for quartiles, etc. Alternately
array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles.
labels : array or False, default None
Used as labels for the resulting bins. Must be of the same length as
the resulting bins. If False, return only integer indicators of the
bins. If True, raises an error.
retbins : bool, optional
Whether to return the (bins, labels) or not. Can be useful if bins
is given as a scalar.
precision : int, optional
The precision at which to store and display the bins labels.
duplicates : {default 'raise', 'drop'}, optional
If bin edges are not unique, raise ValueError or drop non-uniques.
Returns
-------
out : Categorical or Series or array of integers if labels is False
The return type (Categorical or Series) depends on the input: a Series
of type category if input is a Series else Categorical. Bins are
represented as categories when categorical data is returned.
bins : ndarray of floats
Returned only if `retbins` is True.
Notes
-----
Out of bounds values will be NA in the resulting Categorical object
Examples
--------
>>> pd.qcut(range(5), 4)
... # doctest: +ELLIPSIS
[(-0.001, 1.0], (-0.001, 1.0], (1.0, 2.0], (2.0, 3.0], (3.0, 4.0]]
Categories (4, interval[float64, right]): [(-0.001, 1.0] < (1.0, 2.0] ...
>>> pd.qcut(range(5), 3, labels=["good", "medium", "bad"])
... # doctest: +SKIP
[good, good, medium, bad, bad]
Categories (3, object): [good < medium < bad]
>>> pd.qcut(range(5), 4, labels=False)
array([0, 0, 1, 2, 3])
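With heavily tied data the computed quantile edges can coincide; duplicates='drop' collapses them instead of raising. A supplementary sketch:
>>> pd.qcut([0, 0, 0, 0, 1], 2, labels=False, duplicates='drop')
array([0, 0, 0, 0, 0])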
To be continued...
Link to the next installment:
https://blog.csdn.net/boysoft2002/article/details/128428569