Pandas - Series

Series: a one-dimensional labeled array; each "column" in a DataFrame is a Series

Create a Series from a list with the default index (0, 1, 2, ...):

>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> s
0    1
1    2
2    3
dtype: int64

With an explicit index array:

>>> s = pd.Series([1., 2., 3.], index=['a', 'b', 'c'])
>>> s
a    1.0
b    2.0
c    3.0
dtype: float64

Check the type:

>>> type(s)
<class 'pandas.core.series.Series'>

Mixed types (note that the dtype becomes object):

>>> s = pd.Series([1, 2., "asdf"])
>>> s
0       1
1     2.0
2    asdf
dtype: object
>>> s[1]
2.0
>>> s[2]
'asdf'

Nested list:

>>> s = pd.Series([[1,2],[3,4]])
>>> s
0    [1, 2]
1    [3, 4]
dtype: object
>>> s.index
RangeIndex(start=0, stop=2, step=1)

From a scalar value:

>>> s = pd.Series(1)
>>> s
0    1
dtype: int64
>>> pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

Create from a dict (the keys become the index):

>>> s = pd.Series({"key": "value"})
>>> s
key    value
dtype: object
>>> s.index
Index(['key'], dtype='object')

ix

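.ix (deprecated in pandas 0.20.0 and removed in 1.0.0; use .loc for labels and .iloc for positions):

  • if the index is integer-based, an integer key is treated as a label, same as loc
  • if the index is not integer-based, an integer key is treated as a position, same as iloc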
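
Because .ix no longer exists in pandas 1.0 and later, the two lookups it mixed have to be written explicitly. The following is a minimal sketch of label-based versus position-based access; the Series contents and variable names are made up for illustration.

```python
import pandas as pd

# Integer index: the labels are 10, 20, 30 (illustrative values).
s_int = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])
by_label = s_int.loc[10]       # look up the label 10
by_position = s_int.iloc[0]    # look up the first position

# Non-integer index: labels are strings, positions are still 0, 1, 2.
s_str = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])
label_value = s_str.loc['y']   # look up the label 'y'
pos_value = s_str.iloc[1]      # look up the second position

# .loc never falls back to positions: 0 is not a label in s_int.
try:
    s_int.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(by_label, by_position, label_value, pos_value, missing_label_raises)
```

Choosing .loc or .iloc explicitly removes the ambiguity that .ix had when the index itself contained integers.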

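Count frequencies of values

Use pandas.Series.value_counts, which returns the number of occurrences of each unique value, sorted by count in descending order by default.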
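
A short sketch of value_counts on a small Series; the sample data is invented for illustration, and normalize=True is shown as one optional way to get relative frequencies instead of raw counts.

```python
import pandas as pd

# Illustrative data: 'a' appears 3 times, 'b' twice, 'c' once.
s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# Raw counts, most frequent value first.
counts = s.value_counts()

# Relative frequencies that sum to 1.
shares = s.value_counts(normalize=True)

print(counts)
print(shares)
```

The result is itself a Series whose index holds the unique values, so individual counts can be read with counts['a'].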

Pandas Series vs NumPy Array

Pandas Series reductions such as mean() skip NaN by default; the same NumPy array methods return nan if any element is NaN:

>>> import pandas as pd
>>> import numpy as np

>>> pd.Series([1, 2, 3, np.nan]).mean()
2.0
>>> np.array([1, 2, 3, np.nan]).mean()
nan
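
Both libraries can be made to behave the other way. The sketch below, using the same values as above, shows the skipna parameter of Series.mean and the np.nanmean function; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, np.nan]

# pandas skips NaN by default; skipna=False propagates it instead.
pandas_default = pd.Series(values).mean()
pandas_strict = pd.Series(values).mean(skipna=False)

# NumPy propagates NaN by default; np.nanmean ignores it instead.
numpy_default = np.array(values).mean()
numpy_skip = np.nanmean(np.array(values))

print(pandas_default, pandas_strict, numpy_default, numpy_skip)
```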