Pandas - Series
Series: a "column" in a DataFrame
Create a Series from a list with the default index (0, 1, 2, ...):
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> s
0    1
1    2
2    3
dtype: int64
With index array:
>>> s = pd.Series([1., 2., 3.], ['a', 'b', 'c'])
>>> s
a    1.0
b    2.0
c    3.0
dtype: float64
Check the type:
>>> type(s)
<class 'pandas.core.series.Series'>
Mixed types (notice the dtype is object):
>>> s = pd.Series([1, 2., "asdf"])
>>> s
0       1
1     2.0
2    asdf
dtype: object
>>> s[1]
2.0
>>> s[2]
'asdf'
Nested list:
>>> s = pd.Series([[1,2],[3,4]])
>>> s
0    [1, 2]
1    [3, 4]
dtype: object
>>> s.index
RangeIndex(start=0, stop=2, step=1)
From a scalar value:
>>> s = pd.Series(1)
>>> s
0    1
dtype: int64
With an index, the scalar is repeated for every label:
>>> pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64
Create from a dict:
>>> s = pd.Series({"key": "value"})
>>> s
key    value
dtype: object
>>> s.index
Index(['key'], dtype='object')
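When a dict is combined with an explicit index, the index decides both the order and which keys are kept. A minimal sketch; the dict contents and the extra label 'z' are made up for illustration:

```python
import pandas as pd

d = {'a': 1, 'b': 2}

# Values are looked up by key in the order given by the index.
# 'z' is not a key in d, so its value becomes NaN and the dtype is float64.
s = pd.Series(d, index=['b', 'a', 'z'])
print(s)
```

Keys that are absent from the index are dropped, and index labels that are absent from the dict become NaN.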
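.ix (deprecated in pandas 0.20.0 and removed in 1.0; use .loc or .iloc instead):
- if the index is integer-based, an integer key is treated as a label, same as loc
- if the index is not integer-based, an integer key is treated as a position, same as iloc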
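Because .ix is not available in current pandas, the two cases above can be written explicitly with .loc (labels) and .iloc (positions). A minimal sketch; the values and index labels are made up for illustration:

```python
import pandas as pd

# Integer index that is not 0, 1, 2, so labels and positions differ.
s_int = pd.Series([10, 20, 30], index=[2, 1, 0])
by_label = s_int.loc[0]       # label 0 is the last element
by_position = s_int.iloc[0]   # position 0 is the first element

# Non-integer index: positions are requested through iloc.
s_str = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
first = s_str.iloc[0]         # same element as s_str.loc['a']

print(by_label, by_position, first)
```

Spelling out loc or iloc removes the ambiguity that made .ix behave differently depending on the index type.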
Count the frequency of each value with pandas.Series.value_counts.
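A short sketch of value_counts on a made-up Series of strings. By default the result is a Series indexed by the distinct values, sorted from most to least frequent; normalize=True returns relative frequencies instead of counts:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

counts = s.value_counts()                # absolute counts, most frequent first
shares = s.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```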
Pandas Series vs NumPy array
Pandas Series reductions such as mean() skip NaN by default; the same NumPy reduction returns nan if any element is NaN.
>>> import pandas as pd
>>> import numpy as np
>>> pd.Series([1, 2, 3, np.nan]).mean()
2.0
>>> np.array([1, 2, 3, np.nan]).mean()
nan
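Either library can be made to behave like the other. A minimal sketch using the same values as above: skipna=False makes pandas propagate NaN, and np.nanmean makes NumPy skip it:

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, np.nan]

pandas_default = pd.Series(values).mean()             # NaN skipped
pandas_strict = pd.Series(values).mean(skipna=False)  # NaN propagates
numpy_skipping = np.nanmean(np.array(values))         # NaN skipped

print(pandas_default, pandas_strict, numpy_skipping)
```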