Python pandas has several hundred library functions. Have you used them all? (4)

Link to the previous installment:

https://blog.csdn.net/boysoft2002/article/details/128428569

S~W:  Function46~56

Types['Function'][45:]
['set_eng_float_format', 'show_versions', 'test', 'timedelta_range', 'to_datetime', 'to_numeric', 'to_pickle', 'to_timedelta', 'unique', 'value_counts', 'wide_to_long']
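
For context, Types is the classification dict built in the earlier installments of this series. A minimal sketch of how such a mapping could be rebuilt (the exact construction used in the original parts may differ):

import inspect
import pandas as pd

# Bucket every public top-level pandas name by the kind of object it
# refers to; plain functions land under the 'Function' key.
Types = {}
for name in dir(pd):
    if name.startswith('_'):
        continue
    obj = getattr(pd, name)
    key = 'Function' if inspect.isfunction(obj) else type(obj).__name__
    Types.setdefault(key, []).append(name)

print(Types['Function'][45:])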

Function46

set_eng_float_format(accuracy: 'int' = 3, use_eng_prefix: 'bool' = False) -> 'None'

Help on function set_eng_float_format in module pandas.io.formats.format:

set_eng_float_format(accuracy: 'int' = 3, use_eng_prefix: 'bool' = False) -> 'None'
    Alter the default behavior of how floats are formatted in a DataFrame.
    Floats are shown in engineering format; `accuracy` is the number of
    decimal digits after the decimal point.
    
    See also EngFormatter.
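
The docstring above ships no example, so here is a minimal sketch of the effect (the exact repr spacing may vary by pandas version):

import pandas as pd

s = pd.Series([1234567.0, 0.000123])
pd.set_eng_float_format(accuracy=2, use_eng_prefix=True)
print(s)
# 0      1.23M
# 1    123.00u
# dtype: float64

pd.reset_option("display.float_format")  # restore the default float formatting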

Function47

show_versions(as_json: 'str | bool' = False) -> 'None'

Help on function show_versions in module pandas.util._print_versions:

show_versions(as_json: 'str | bool' = False) -> 'None'
    Provide useful information, important for bug reports.
    
    It comprises info about the hosting operating system, the pandas version,
    and the versions of other installed related packages.
    
    Parameters
    ----------
    as_json : str or bool, default False
        * If False, outputs info in a human readable form to the console.
        * If str, it will be considered as a path to a file.
          Info will be written to that file in JSON format.
        * If True, outputs info in JSON format to the console.
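
No example is given above; usage is straightforward (the file path below is arbitrary):

import pandas as pd

pd.show_versions()                          # human-readable report to the console
pd.show_versions(as_json=True)              # JSON to the console
pd.show_versions(as_json="versions.json")   # JSON written to a file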

Function48

test(extra_args=None)

Help on function test in module pandas.util._tester:

test(extra_args=None)
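
This build ships test without a docstring. To my knowledge it runs pandas' bundled test suite through pytest, so pytest must be installed. A hedged sketch, assuming extra_args is a list of pytest command-line arguments:

import pandas as pd

# Run the bundled test suite via pytest (pytest must be installed).
# Assumption: extra_args replaces the default pytest argument selection,
# e.g. ['-x'] stops at the first failure.
pd.test(extra_args=['-x'])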

Function49

timedelta_range(start=None, end=None, periods: 'Optional[int]' = None, freq=None, name=None, closed=None) -> 'TimedeltaIndex'

Help on function timedelta_range in module pandas.core.indexes.timedeltas:

timedelta_range(start=None, end=None, periods: 'Optional[int]' = None, freq=None, name=None, closed=None) -> 'TimedeltaIndex'
    Return a fixed frequency TimedeltaIndex, with day as the default
    frequency.
    
    Parameters
    ----------
    start : str or timedelta-like, default None
        Left bound for generating timedeltas.
    end : str or timedelta-like, default None
        Right bound for generating timedeltas.
    periods : int, default None
        Number of periods to generate.
    freq : str or DateOffset, default 'D'
        Frequency strings can have multiples, e.g. '5H'.
    name : str, default None
        Name of the resulting TimedeltaIndex.
    closed : str, default None
        Make the interval closed with respect to the given frequency to
        the 'left', 'right', or both sides (None).
    
    Returns
    -------
    TimedeltaIndex
    
    Notes
    -----
    Of the four parameters ``start``, ``end``, ``periods``, and ``freq``,
    exactly three must be specified. If ``freq`` is omitted, the resulting
    ``TimedeltaIndex`` will have ``periods`` linearly spaced elements between
    ``start`` and ``end`` (closed on both sides).
    
    To learn more about the frequency strings, please see `this link
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__.
    
    Examples
    --------
    >>> pd.timedelta_range(start='1 day', periods=4)
    TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'],
                   dtype='timedelta64[ns]', freq='D')
    
    The ``closed`` parameter specifies which endpoint is included.  The default
    behavior is to include both endpoints.
    
    >>> pd.timedelta_range(start='1 day', periods=4, closed='right')
    TimedeltaIndex(['2 days', '3 days', '4 days'],
                   dtype='timedelta64[ns]', freq='D')
    
    The ``freq`` parameter specifies the frequency of the TimedeltaIndex.
    Only fixed frequencies can be passed, non-fixed frequencies such as
    'M' (month end) will raise.
    
    >>> pd.timedelta_range(start='1 day', end='2 days', freq='6H')
    TimedeltaIndex(['1 days 00:00:00', '1 days 06:00:00', '1 days 12:00:00',
                    '1 days 18:00:00', '2 days 00:00:00'],
                   dtype='timedelta64[ns]', freq='6H')
    
    Specify ``start``, ``end``, and ``periods``; the frequency is generated
    automatically (linearly spaced).
    
    >>> pd.timedelta_range(start='1 day', end='5 days', periods=4)
    TimedeltaIndex(['1 days 00:00:00', '2 days 08:00:00', '3 days 16:00:00',
                    '5 days 00:00:00'],
                   dtype='timedelta64[ns]', freq=None)

Function50

to_datetime(arg: 'DatetimeScalarOrArrayConvertible', errors: 'str' = 'raise', dayfirst: 'bool' = False, yearfirst: 'bool' = False, utc: 'bool | None' = None, format: 'str | None' = None, exact: 'bool' = True, unit: 'str | None' = None, infer_datetime_format: 'bool' = False, origin='unix', cache: 'bool' = True) -> 'DatetimeIndex | Series | DatetimeScalar | NaTType | None'

Help on function to_datetime in module pandas.core.tools.datetimes:

to_datetime(arg: 'DatetimeScalarOrArrayConvertible', errors: 'str' = 'raise', dayfirst: 'bool' = False, yearfirst: 'bool' = False, utc: 'bool | None' = None, format: 'str | None' = None, exact: 'bool' = True, unit: 'str | None' = None, infer_datetime_format: 'bool' = False, origin='unix', cache: 'bool' = True) -> 'DatetimeIndex | Series | DatetimeScalar | NaTType | None'
    Convert argument to datetime.
    
    Parameters
    ----------
    arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
        The object to convert to a datetime.
    errors : {'ignore', 'raise', 'coerce'}, default 'raise'
        - If 'raise', then invalid parsing will raise an exception.
        - If 'coerce', then invalid parsing will be set as NaT.
        - If 'ignore', then invalid parsing will return the input.
    dayfirst : bool, default False
        Specify a date parse order if `arg` is str or its list-likes.
        If True, parses dates with the day first, eg 10/11/12 is parsed as
        2012-11-10.
        Warning: dayfirst=True is not strict, but will prefer to parse
        with day first (this is a known bug, based on dateutil behavior).
    yearfirst : bool, default False
        Specify a date parse order if `arg` is str or its list-likes.
    
        - If True parses dates with the year first, eg 10/11/12 is parsed as
          2010-11-12.
        - If both dayfirst and yearfirst are True, yearfirst takes precedence
          (same as in dateutil).
    
        Warning: yearfirst=True is not strict, but will prefer to parse
        with year first (this is a known bug, based on dateutil behavior).
    utc : bool, default None
        Return UTC DatetimeIndex if True (converting any tz-aware
        datetime.datetime objects as well).
    format : str, default None
        The strftime to parse time, eg "%d/%m/%Y", note that "%f" will parse
        all the way up to nanoseconds.
        See strftime documentation for more information on choices:
        https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.
    exact : bool, True by default
        Behaves as:
        - If True, require an exact format match.
        - If False, allow the format to match anywhere in the target string.
    
    unit : str, default 'ns'
        The unit of the arg (D, s, ms, us, ns) for numeric input; the arg
        must be an integer or float. The value is interpreted relative to
        `origin`. For example, with unit='ms' and origin='unix' (the
        default), the arg is taken as the number of milliseconds since the
        unix epoch.
    infer_datetime_format : bool, default False
        If True and no `format` is given, attempt to infer the format of the
        datetime strings based on the first non-NaN element,
        and if it can be inferred, switch to a faster method of parsing them.
        In some cases this can increase the parsing speed by ~5-10x.
    origin : scalar, default 'unix'
        Define the reference date. The numeric values would be parsed as number
        of units (defined by `unit`) since this reference date.
    
        - If 'unix' (or POSIX) time; origin is set to 1970-01-01.
        - If 'julian', unit must be 'D', and origin is set to beginning of
          Julian Calendar. Julian day number 0 is assigned to the day starting
          at noon on January 1, 4713 BC.
        - If Timestamp convertible, origin is set to Timestamp identified by
          origin.
    cache : bool, default True
        If True, use a cache of unique, converted dates to apply the datetime
        conversion. May produce significant speed-up when parsing duplicate
        date strings, especially ones with timezone offsets. The cache is only
        used when there are at least 50 values. The presence of out-of-bounds
        values will render the cache unusable and may slow down parsing.
    
        .. versionchanged:: 0.25.0
            - changed default value from False to True.
    
    Returns
    -------
    datetime
        If parsing succeeded.
        Return type depends on input:
    
        - list-like: DatetimeIndex
        - Series: Series of datetime64 dtype
        - scalar: Timestamp
    
        In case when it is not possible to return designated types (e.g. when
        any element of input is before Timestamp.min or after Timestamp.max)
        return will have datetime.datetime type (or corresponding
        array/Series).
    
    See Also
    --------
    DataFrame.astype : Cast argument to a specified dtype.
    to_timedelta : Convert argument to timedelta.
    convert_dtypes : Convert dtypes.
    
    Examples
    --------
    Assembling a datetime from multiple columns of a DataFrame. The keys can be
    common abbreviations like ['year', 'month', 'day', 'minute', 'second',
    'ms', 'us', 'ns'] or plurals of the same.
    
    >>> df = pd.DataFrame({'year': [2015, 2016],
    ...                    'month': [2, 3],
    ...                    'day': [4, 5]})
    >>> pd.to_datetime(df)
    0   2015-02-04
    1   2016-03-05
    dtype: datetime64[ns]
    
    If a date does not meet the `timestamp limitations
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
    #timeseries-timestamp-limits>`_, passing errors='ignore'
    will return the original input instead of raising any exception.
    
    Passing errors='coerce' will force an out-of-bounds date to NaT,
    in addition to forcing non-dates (or non-parseable dates) to NaT.
    
    >>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
    datetime.datetime(1300, 1, 1, 0, 0)
    >>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
    NaT
    
    Passing infer_datetime_format=True can often speed up parsing when the
    strings are not exactly in ISO 8601 format but follow a regular pattern.
    
    >>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
    >>> s.head()
    0    3/11/2000
    1    3/12/2000
    2    3/13/2000
    3    3/11/2000
    4    3/12/2000
    dtype: object
    
    >>> %timeit pd.to_datetime(s, infer_datetime_format=True)  # doctest: +SKIP
    100 loops, best of 3: 10.4 ms per loop
    
    >>> %timeit pd.to_datetime(s, infer_datetime_format=False)  # doctest: +SKIP
    1 loop, best of 3: 471 ms per loop
    
    Using a unix epoch time
    
    >>> pd.to_datetime(1490195805, unit='s')
    Timestamp('2017-03-22 15:16:45')
    >>> pd.to_datetime(1490195805433502912, unit='ns')
    Timestamp('2017-03-22 15:16:45.433502912')
    
    .. warning:: For float arg, precision rounding might happen. To prevent
        unexpected behavior use a fixed-width exact type.
    
    Using a non-unix epoch origin
    
    >>> pd.to_datetime([1, 2, 3], unit='D',
    ...                origin=pd.Timestamp('1960-01-01'))
    DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
                  dtype='datetime64[ns]', freq=None)
    
    If the input is list-like and its elements have mixed timezones, the
    return will be an object-dtype Index when utc=False.
    
    >>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'])
    Index([2018-10-26 12:00:00-05:30, 2018-10-26 12:00:00-05:00], dtype='object')
    
    >>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
    ...                utc=True)
    DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
                  dtype='datetime64[ns, UTC]', freq=None)

Function51

to_numeric(arg, errors='raise', downcast=None)

Help on function to_numeric in module pandas.core.tools.numeric:

to_numeric(arg, errors='raise', downcast=None)
    Convert argument to a numeric type.
    
    The default return dtype is `float64` or `int64`
    depending on the data supplied. Use the `downcast` parameter
    to obtain other dtypes.
    
    Please note that precision loss may occur if really large numbers
    are passed in. Due to the internal limitations of `ndarray`, if
    numbers smaller than `-9223372036854775808` (np.iinfo(np.int64).min)
    or larger than `18446744073709551615` (np.iinfo(np.uint64).max) are
    passed in, it is very likely they will be converted to float so that
    they can be stored in an `ndarray`. These warnings apply similarly to
    `Series` since it internally leverages `ndarray`.
    
    Parameters
    ----------
    arg : scalar, list, tuple, 1-d array, or Series
        Argument to be converted.
    errors : {'ignore', 'raise', 'coerce'}, default 'raise'
        - If 'raise', then invalid parsing will raise an exception.
        - If 'coerce', then invalid parsing will be set as NaN.
        - If 'ignore', then invalid parsing will return the input.
    downcast : {'integer', 'signed', 'unsigned', 'float'}, default None
        If not None, and if the data has been successfully cast to a
        numerical dtype (or if the data was numeric to begin with),
        downcast that resulting data to the smallest numerical dtype
        possible according to the following rules:
    
        - 'integer' or 'signed': smallest signed int dtype (min.: np.int8)
        - 'unsigned': smallest unsigned int dtype (min.: np.uint8)
        - 'float': smallest float dtype (min.: np.float32)
    
        As this behaviour is separate from the core conversion to
        numeric values, any errors raised during the downcasting
        will be surfaced regardless of the value of the 'errors' input.
    
        In addition, downcasting will only occur if the size
        of the resulting data's dtype is strictly larger than
        the dtype it is to be cast to, so if none of the dtypes
        checked satisfy that specification, no downcasting will be
        performed on the data.
    
    Returns
    -------
    ret
        Numeric if parsing succeeded.
        Return type depends on input.  Series if Series, otherwise ndarray.
    
    See Also
    --------
    DataFrame.astype : Cast argument to a specified dtype.
    to_datetime : Convert argument to datetime.
    to_timedelta : Convert argument to timedelta.
    numpy.ndarray.astype : Cast a numpy array to a specified type.
    DataFrame.convert_dtypes : Convert dtypes.
    
    Examples
    --------
    Take separate series and convert to numeric, coercing when told to
    
    >>> s = pd.Series(['1.0', '2', -3])
    >>> pd.to_numeric(s)
    0    1.0
    1    2.0
    2   -3.0
    dtype: float64
    >>> pd.to_numeric(s, downcast='float')
    0    1.0
    1    2.0
    2   -3.0
    dtype: float32
    >>> pd.to_numeric(s, downcast='signed')
    0    1
    1    2
    2   -3
    dtype: int8
    >>> s = pd.Series(['apple', '1.0', '2', -3])
    >>> pd.to_numeric(s, errors='ignore')
    0    apple
    1      1.0
    2        2
    3       -3
    dtype: object
    >>> pd.to_numeric(s, errors='coerce')
    0    NaN
    1    1.0
    2    2.0
    3   -3.0
    dtype: float64
    
    Downcasting of nullable integer and floating dtypes is supported:
    
    >>> s = pd.Series([1, 2, 3], dtype="Int64")
    >>> pd.to_numeric(s, downcast="integer")
    0    1
    1    2
    2    3
    dtype: Int8
    >>> s = pd.Series([1.0, 2.1, 3.0], dtype="Float64")
    >>> pd.to_numeric(s, downcast="float")
    0    1.0
    1    2.1
    2    3.0
    dtype: Float32

Function52

to_pickle(obj: Any, filepath_or_buffer: Union[ForwardRef('PathLike[str]'), str, IO[~AnyStr], io.RawIOBase, io.BufferedIOBase, io.TextIOBase, _io.TextIOWrapper, mmap.mmap], compression: Union[str, Dict[str, Any], NoneType] = 'infer', protocol: int = 5, storage_options: Union[Dict[str, Any], NoneType] = None)

Help on function to_pickle in module pandas.io.pickle:

to_pickle(obj: Any, filepath_or_buffer: Union[ForwardRef('PathLike[str]'), str, IO[~AnyStr], io.RawIOBase, io.BufferedIOBase, io.TextIOBase, _io.TextIOWrapper, mmap.mmap], compression: Union[str, Dict[str, Any], NoneType] = 'infer', protocol: int = 5, storage_options: Union[Dict[str, Any], NoneType] = None)
    Pickle (serialize) object to file.
    
    Parameters
    ----------
    obj : any object
        Any python object.
    filepath_or_buffer : str, path object or file-like object
        File path, URL, or buffer where the pickled object will be stored.
    
        .. versionchanged:: 1.0.0
           Accept URL. URL has to be of S3 or GCS.
    
    compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
        If 'infer' and `filepath_or_buffer` is path-like, then detect
        compression from the following extensions: '.gz', '.bz2', '.zip', or
        '.xz' (otherwise no compression). If 'infer' and `filepath_or_buffer`
        is not path-like, then use None (= no compression).
    protocol : int
        Int which indicates which protocol should be used by the pickler,
        default HIGHEST_PROTOCOL (see [1], paragraph 12.1.2). The possible
        values for this parameter depend on the version of Python. For Python
        2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a valid value.
        For Python >= 3.4, 4 is a valid value. A negative value for the
        protocol parameter is equivalent to setting its value to
        HIGHEST_PROTOCOL.
    
    storage_options : dict, optional
        Extra options that make sense for a particular storage connection, e.g.
        host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
        are forwarded to ``urllib`` as header options. For other URLs (e.g.
        starting with "s3://", and "gcs://") the key-value pairs are forwarded to
        ``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
    
        .. versionadded:: 1.2.0
    
        .. [1] https://docs.python.org/3/library/pickle.html
    
    See Also
    --------
    read_pickle : Load pickled pandas object (or any object) from file.
    DataFrame.to_hdf : Write DataFrame to an HDF5 file.
    DataFrame.to_sql : Write DataFrame to a SQL database.
    DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
    
    Examples
    --------
    >>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
    >>> original_df
       foo  bar
    0    0    5
    1    1    6
    2    2    7
    3    3    8
    4    4    9
    >>> pd.to_pickle(original_df, "./dummy.pkl")
    
    >>> unpickled_df = pd.read_pickle("./dummy.pkl")
    >>> unpickled_df
       foo  bar
    0    0    5
    1    1    6
    2    2    7
    3    3    8
    4    4    9
    
    >>> import os
    >>> os.remove("./dummy.pkl")

Function53

to_timedelta(arg, unit=None, errors='raise')

Help on function to_timedelta in module pandas.core.tools.timedeltas:

to_timedelta(arg, unit=None, errors='raise')
    Convert argument to timedelta.
    
    Timedeltas are absolute differences in times, expressed in different
    units (e.g. days, hours, minutes, seconds). This method converts
    an argument from a recognized timedelta format / value into
    a Timedelta type.
    
    Parameters
    ----------
    arg : str, timedelta, list-like or Series
        The data to be converted to timedelta.
    
        .. deprecated:: 1.2
            Strings with units 'M', 'Y' and 'y' do not represent
            unambiguous timedelta values and will be removed in a future version.
    
    unit : str, optional
        Denotes the unit of the arg for numeric `arg`. Defaults to ``"ns"``.
    
        Possible values:
    
        * 'W'
        * 'D' / 'days' / 'day'
        * 'hours' / 'hour' / 'hr' / 'h'
        * 'm' / 'minute' / 'min' / 'minutes' / 'T'
        * 'S' / 'seconds' / 'sec' / 'second'
        * 'ms' / 'milliseconds' / 'millisecond' / 'milli' / 'millis' / 'L'
        * 'us' / 'microseconds' / 'microsecond' / 'micro' / 'micros' / 'U'
        * 'ns' / 'nanoseconds' / 'nano' / 'nanos' / 'nanosecond' / 'N'
    
        .. versionchanged:: 1.1.0
    
           Must not be specified when `arg` contains strings and
           ``errors="raise"``.
    
    errors : {'ignore', 'raise', 'coerce'}, default 'raise'
        - If 'raise', then invalid parsing will raise an exception.
        - If 'coerce', then invalid parsing will be set as NaT.
        - If 'ignore', then invalid parsing will return the input.
    
    Returns
    -------
    timedelta64 or numpy.array of timedelta64
        Output type returned if parsing succeeded.
    
    See Also
    --------
    DataFrame.astype : Cast argument to a specified dtype.
    to_datetime : Convert argument to datetime.
    convert_dtypes : Convert dtypes.
    
    Notes
    -----
    If the precision is higher than nanoseconds, the precision of the duration is
    truncated to nanoseconds for string inputs.
    
    Examples
    --------
    Parsing a single string to a Timedelta:
    
    >>> pd.to_timedelta('1 days 06:05:01.00003')
    Timedelta('1 days 06:05:01.000030')
    >>> pd.to_timedelta('15.5us')
    Timedelta('0 days 00:00:00.000015500')
    
    Parsing a list or array of strings:
    
    >>> pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
    TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT],
                   dtype='timedelta64[ns]', freq=None)
    
    Converting numbers by specifying the `unit` keyword argument:
    
    >>> pd.to_timedelta(np.arange(5), unit='s')
    TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02',
                    '0 days 00:00:03', '0 days 00:00:04'],
                   dtype='timedelta64[ns]', freq=None)
    >>> pd.to_timedelta(np.arange(5), unit='d')
    TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'],
                   dtype='timedelta64[ns]', freq=None)

Function54

unique(values)

Help on function unique in module pandas.core.algorithms:

unique(values)
    Hash table-based unique. Uniques are returned in order
    of appearance. This does NOT sort.
    
    Significantly faster than numpy.unique for long enough sequences.
    Includes NA values.
    
    Parameters
    ----------
    values : 1d array-like
    
    Returns
    -------
    numpy.ndarray or ExtensionArray
    
        The return can be:
    
        * Index : when the input is an Index
        * Categorical : when the input is a Categorical dtype
        * ndarray : when the input is a Series/ndarray
    
        Return numpy.ndarray or ExtensionArray.
    
    See Also
    --------
    Index.unique : Return unique values from an Index.
    Series.unique : Return unique values of Series object.
    
    Examples
    --------
    >>> pd.unique(pd.Series([2, 1, 3, 3]))
    array([2, 1, 3])
    
    >>> pd.unique(pd.Series([2] + [1] * 5))
    array([2, 1])
    
    >>> pd.unique(pd.Series([pd.Timestamp("20160101"), pd.Timestamp("20160101")]))
    array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
    
    >>> pd.unique(
    ...     pd.Series(
    ...         [
    ...             pd.Timestamp("20160101", tz="US/Eastern"),
    ...             pd.Timestamp("20160101", tz="US/Eastern"),
    ...         ]
    ...     )
    ... )
    <DatetimeArray>
    ['2016-01-01 00:00:00-05:00']
    Length: 1, dtype: datetime64[ns, US/Eastern]
    
    >>> pd.unique(
    ...     pd.Index(
    ...         [
    ...             pd.Timestamp("20160101", tz="US/Eastern"),
    ...             pd.Timestamp("20160101", tz="US/Eastern"),
    ...         ]
    ...     )
    ... )
    DatetimeIndex(['2016-01-01 00:00:00-05:00'],
            dtype='datetime64[ns, US/Eastern]',
            freq=None)
    
    >>> pd.unique(list("baabc"))
    array(['b', 'a', 'c'], dtype=object)
    
    An unordered Categorical will return categories in the
    order of appearance.
    
    >>> pd.unique(pd.Series(pd.Categorical(list("baabc"))))
    ['b', 'a', 'c']
    Categories (3, object): ['a', 'b', 'c']
    
    >>> pd.unique(pd.Series(pd.Categorical(list("baabc"), categories=list("abc"))))
    ['b', 'a', 'c']
    Categories (3, object): ['a', 'b', 'c']
    
    An ordered Categorical preserves the category ordering.
    
    >>> pd.unique(
    ...     pd.Series(
    ...         pd.Categorical(list("baabc"), categories=list("abc"), ordered=True)
    ...     )
    ... )
    ['b', 'a', 'c']
    Categories (3, object): ['a' < 'b' < 'c']
    
    An array of tuples
    
    >>> pd.unique([("a", "b"), ("b", "a"), ("a", "c"), ("b", "a")])
    array([('a', 'b'), ('b', 'a'), ('a', 'c')], dtype=object)

Function55

value_counts(values, sort: 'bool' = True, ascending: 'bool' = False, normalize: 'bool' = False, bins=None, dropna: 'bool' = True) -> 'Series'

Help on function value_counts in module pandas.core.algorithms:

value_counts(values, sort: 'bool' = True, ascending: 'bool' = False, normalize: 'bool' = False, bins=None, dropna: 'bool' = True) -> 'Series'
    Compute a histogram of the counts of non-null values.
    
    Parameters
    ----------
    values : ndarray (1-d)
    sort : bool, default True
        Sort by counts.
    ascending : bool, default False
        Sort in ascending order.
    normalize : bool, default False
        If True, compute a relative histogram (frequencies summing to 1).
    bins : integer, optional
        Rather than counting values, group them into half-open bins
        (a convenience for pd.cut); only works with numeric data.
    dropna : bool, default True
        Don't include counts of NaN.
    
    Returns
    -------
    Series
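
The docstring ends without an example, so here is a minimal sketch (tie order among equal counts may vary; day-to-day code usually reaches the same functionality through Series.value_counts):

import numpy as np
import pandas as pd

values = np.array([3.0, 1.0, 3.0, 2.0, 3.0, np.nan])

print(pd.value_counts(values))                  # NaN dropped, sorted by count
# 3.0    3
# 1.0    1
# 2.0    1
# dtype: int64

print(pd.value_counts(values, normalize=True))  # relative frequencies instead
print(pd.value_counts(values, dropna=False))    # keep the NaN count
print(pd.value_counts(values, bins=2))          # bin numeric data via pd.cut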

Function56

wide_to_long(df: 'DataFrame', stubnames, i, j, sep: 'str' = '', suffix: 'str' = '\\d+') -> 'DataFrame'

Help on function wide_to_long in module pandas.core.reshape.melt:

wide_to_long(df: 'DataFrame', stubnames, i, j, sep: 'str' = '', suffix: 'str' = '\\d+') -> 'DataFrame'
    Wide panel to long format. Less flexible but more user-friendly than melt.
    
    With stubnames ['A', 'B'], this function expects to find one or more
    groups of columns with the format
    A-suffix1, A-suffix2, ..., B-suffix1, B-suffix2, ...
    You specify what you want to call this suffix in the resulting long format
    with `j` (for example `j='year'`).
    
    Each row of these wide variables is assumed to be uniquely identified by
    `i` (which can be a single column name or a list of column names).
    
    All remaining variables in the data frame are left intact.
    
    Parameters
    ----------
    df : DataFrame
        The wide-format DataFrame.
    stubnames : str or list-like
        The stub name(s). The wide format variables are assumed to
        start with the stub names.
    i : str or list-like
        Column(s) to use as id variable(s).
    j : str
        The name of the sub-observation variable. What you wish to name your
        suffix in the long format.
    sep : str, default ""
        A character indicating the separation of the variable names
        in the wide format, to be stripped from the names in the long format.
        For example, if your column names are A-suffix1, A-suffix2, you
        can strip the hyphen by specifying `sep='-'`.
    suffix : str, default '\\d+'
        A regular expression capturing the wanted suffixes. '\\d+' captures
        numeric suffixes. Suffixes with no numbers could be specified with the
        negated character class '\\D+'. You can also further disambiguate
        suffixes, for example, if your wide variables are of the form A-one,
        B-two, ..., and you have an unrelated column A-rating, you can ignore
        the last one by specifying `suffix='(one|two)'`. When all suffixes are
        numeric, they are cast to int64/float64.
    
    Returns
    -------
    DataFrame
        A DataFrame that contains each stub name as a variable, with new index
        (i, j).
    
    See Also
    --------
    melt : Unpivot a DataFrame from wide to long format, optionally leaving
        identifiers set.
    pivot : Create a spreadsheet-style pivot table as a DataFrame.
    DataFrame.pivot : Pivot without aggregation that can handle
        non-numeric data.
    DataFrame.pivot_table : Generalization of pivot that can handle
        duplicate values for one index/column pair.
    DataFrame.unstack : Pivot based on the index values instead of a
        column.
    
    Notes
    -----
    All extra variables are left untouched. This simply uses
    `pandas.melt` under the hood, but is hard-coded to "do the right thing"
    in a typical case.
    
    Examples
    --------
    >>> np.random.seed(123)
    >>> df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"},
    ...                    "A1980" : {0 : "d", 1 : "e", 2 : "f"},
    ...                    "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
    ...                    "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
    ...                    "X"     : dict(zip(range(3), np.random.randn(3)))
    ...                   })
    >>> df["id"] = df.index
    >>> df
      A1970 A1980  B1970  B1980         X  id
    0     a     d    2.5    3.2 -1.085631   0
    1     b     e    1.2    1.3  0.997345   1
    2     c     f    0.7    0.1  0.282978   2
    >>> pd.wide_to_long(df, ["A", "B"], i="id", j="year")
    ... # doctest: +NORMALIZE_WHITESPACE
                    X  A    B
    id year
    0  1970 -1.085631  a  2.5
    1  1970  0.997345  b  1.2
    2  1970  0.282978  c  0.7
    0  1980 -1.085631  d  3.2
    1  1980  0.997345  e  1.3
    2  1980  0.282978  f  0.1
    
    With multiple id columns
    
    >>> df = pd.DataFrame({
    ...     'famid': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    ...     'birth': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    ...     'ht1': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
    ...     'ht2': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9]
    ... })
    >>> df
       famid  birth  ht1  ht2
    0      1      1  2.8  3.4
    1      1      2  2.9  3.8
    2      1      3  2.2  2.9
    3      2      1  2.0  3.2
    4      2      2  1.8  2.8
    5      2      3  1.9  2.4
    6      3      1  2.2  3.3
    7      3      2  2.3  3.4
    8      3      3  2.1  2.9
    >>> l = pd.wide_to_long(df, stubnames='ht', i=['famid', 'birth'], j='age')
    >>> l
    ... # doctest: +NORMALIZE_WHITESPACE
                      ht
    famid birth age
    1     1     1    2.8
                2    3.4
          2     1    2.9
                2    3.8
          3     1    2.2
                2    2.9
    2     1     1    2.0
                2    3.2
          2     1    1.8
                2    2.8
          3     1    1.9
                2    2.4
    3     1     1    2.2
                2    3.3
          2     1    2.3
                2    3.4
          3     1    2.1
                2    2.9
    
    Going from long back to wide just takes some creative use of `unstack`
    
    >>> w = l.unstack()
    >>> w.columns = w.columns.map('{0[0]}{0[1]}'.format)
    >>> w.reset_index()
       famid  birth  ht1  ht2
    0      1      1  2.8  3.4
    1      1      2  2.9  3.8
    2      1      3  2.2  2.9
    3      2      1  2.0  3.2
    4      2      2  1.8  2.8
    5      2      3  1.9  2.4
    6      3      1  2.2  3.3
    7      3      2  2.3  3.4
    8      3      3  2.1  2.9
    
    Less wieldy column names are also handled
    
    >>> np.random.seed(0)
    >>> df = pd.DataFrame({'A(weekly)-2010': np.random.rand(3),
    ...                    'A(weekly)-2011': np.random.rand(3),
    ...                    'B(weekly)-2010': np.random.rand(3),
    ...                    'B(weekly)-2011': np.random.rand(3),
    ...                    'X' : np.random.randint(3, size=3)})
    >>> df['id'] = df.index
    >>> df # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
       A(weekly)-2010  A(weekly)-2011  B(weekly)-2010  B(weekly)-2011  X  id
    0        0.548814        0.544883        0.437587        0.383442  0   0
    1        0.715189        0.423655        0.891773        0.791725  1   1
    2        0.602763        0.645894        0.963663        0.528895  1   2
    
    >>> pd.wide_to_long(df, ['A(weekly)', 'B(weekly)'], i='id',
    ...                 j='year', sep='-')
    ... # doctest: +NORMALIZE_WHITESPACE
             X  A(weekly)  B(weekly)
    id year
    0  2010  0   0.548814   0.437587
    1  2010  1   0.715189   0.891773
    2  2010  1   0.602763   0.963663
    0  2011  0   0.544883   0.383442
    1  2011  1   0.423655   0.791725
    2  2011  1   0.645894   0.528895
    
    If we have many columns, we could also use a regex to find our
    stubnames and pass that list on to wide_to_long
    
    >>> stubnames = sorted(
    ...     set([match[0] for match in df.columns.str.findall(
    ...         r'[A-B]\(.*\)').values if match != []])
    ... )
    >>> list(stubnames)
    ['A(weekly)', 'B(weekly)']
    
    All of the above examples have integers as suffixes. It is possible to
    have non-integers as suffixes.
    
    >>> df = pd.DataFrame({
    ...     'famid': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    ...     'birth': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    ...     'ht_one': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
    ...     'ht_two': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9]
    ... })
    >>> df
       famid  birth  ht_one  ht_two
    0      1      1     2.8     3.4
    1      1      2     2.9     3.8
    2      1      3     2.2     2.9
    3      2      1     2.0     3.2
    4      2      2     1.8     2.8
    5      2      3     1.9     2.4
    6      3      1     2.2     3.3
    7      3      2     2.3     3.4
    8      3      3     2.1     2.9
    
    >>> l = pd.wide_to_long(df, stubnames='ht', i=['famid', 'birth'], j='age',
    ...                     sep='_', suffix=r'\w+')
    >>> l
    ... # doctest: +NORMALIZE_WHITESPACE
                      ht
    famid birth age
    1     1     one  2.8
                two  3.4
          2     one  2.9
                two  3.8
          3     one  2.2
                two  2.9
    2     1     one  2.0
                two  3.2
          2     one  1.8
                two  2.8
          3     one  1.9
                two  2.4
    3     1     one  2.2
                two  3.3
          2     one  2.3
                two  3.4
          3     one  2.1
                two  2.9

To be continued...

Link to the next installment:

https://hannyang.blog.csdn.net/article/details/128428995
