API

First things first

To use uncertain_panda, add the following line at the beginning of your script or notebook:

from uncertain_panda import pandas as pd

This loads pandas as usual and adds the uncertainty calculations described below. It is essentially equivalent to running

import pandas as pd

and then adding the extra features of uncertain_panda by hand.

Calculating a quantity with uncertainty

In the following, let's suppose you have a pandas object (a pandas.Series, a pandas.DataFrame, or even the result of a groupby operation), which we call df. You want to calculate a function f on it, and normally you would call

df.f()

or with arguments

df.f(some_arg=value)

To do the same calculation with uncertainties, just insert unc in between:

df.unc.f()
df.unc.f(some_arg=value)
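To illustrate the idea behind inserting unc, here is a minimal, stdlib-only sketch of the forwarding pattern. It is not uncertain_panda's code: Sample and UncertaintyProxy are hypothetical names, and the assumption (suggested by the BootstrapResult class described further down) is that the uncertainty comes from bootstrap resampling. Unlike the real package, which works on pandas objects and returns numbers with uncertainties, this sketch returns a plain (value, error) tuple.

```python
import random
import statistics


class Sample:
    """Hypothetical minimal container with two pandas-like methods."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return statistics.fmean(self.values)

    def quantile(self, q=0.5):
        # Simple nearest-rank quantile, good enough for the illustration.
        ordered = sorted(self.values)
        return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

    @property
    def unc(self):
        return UncertaintyProxy(self)


class UncertaintyProxy:
    """Hypothetical stand-in for the unc accessor.

    Looking up any method name returns a wrapper that runs the same method on
    the original data and on bootstrap resamples, then returns
    (nominal value, standard deviation of the resampled results).
    """

    def __init__(self, obj, n_resamples=200, seed=0):
        self._obj = obj
        self._n_resamples = n_resamples
        self._rng = random.Random(seed)

    def __getattr__(self, name):
        method = getattr(self._obj, name)  # fail early for unknown methods

        def wrapper(*args, **kwargs):
            nominal = method(*args, **kwargs)
            values = self._obj.values
            results = []
            for _ in range(self._n_resamples):
                # Resample with replacement and re-run the same method.
                resample = Sample(self._rng.choices(values, k=len(values)))
                results.append(getattr(resample, name)(*args, **kwargs))
            return nominal, statistics.stdev(results)

        return wrapper


data = Sample([1.0, 2.0, 3.0, 4.0, 5.0])
value, error = data.unc.mean()  # same call as data.mean(), plus an error
q25, q25_error = data.unc.quantile(q=0.25)  # arguments are passed through
```

Because the proxy resolves method names dynamically, any method of the wrapped object gets an uncertainty version for free, which matches the "any pandas function" promise in spirit.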

The return value is a number, series, or data frame (whatever f normally returns) with uncertainties. Thanks to the uncertainties package (make sure to star this great package), these results behave just like normal numbers, and errors are even propagated correctly through any further calculation. Remember, df can be any pandas object and f can be any pandas function, so you can do things like
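The uncertainties package performs first-order (linear) error propagation. As a minimal sketch of what that means for two independent values, here are hypothetical add and mul helpers working on (value, sigma) pairs. The uncertainties package additionally tracks correlations between variables, which this sketch ignores by assuming independence.

```python
import math


def add(a, b):
    """Sum of two independent (value, sigma) pairs: sigma = sqrt(sa^2 + sb^2)."""
    return a[0] + b[0], math.hypot(a[1], b[1])


def mul(a, b):
    """Product of two independent (value, sigma) pairs, first-order propagation.

    d(ab)/da = b and d(ab)/db = a, so sigma^2 = (b * sa)^2 + (a * sb)^2.
    """
    return a[0] * b[0], math.hypot(b[0] * a[1], a[0] * b[1])


x = (3.0, 0.4)
y = (4.0, 0.3)
total = add(x, y)    # value 7.0, sigma sqrt(0.4^2 + 0.3^2) = 0.5
product = mul(x, y)  # value 12.0, sigma sqrt(1.6^2 + 0.9^2)
```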

df.groupby("group").unc.median()
df[df.x > 3].y.unc.quantile(0.25)
(df + 2).loc["A"].unc.sum()

The results behave like those of the normal function call, just with the uncertainties added!

Advanced functionality

In fact, the return values of the uncertainty calculations are more than “bare numbers with uncertainties”: they are instances of BootstrapResult and come with a bunch of additional functionality:

class uncertain_panda.BootstrapResult(nominal_value, bootstrap)

Result of any calculation performed with the unc wrapper. It is an instance of uncertainties.core.Variable, so it behaves like a normal number with uncertainties.
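The class name and the constructor arguments (nominal_value, bootstrap) suggest that the uncertainty comes from bootstrap resampling. Under that assumption, here is a minimal stdlib sketch of how such a pair could be produced. The bootstrap helper is hypothetical, and using the standard deviation of the samples as the uncertainty is an assumption not confirmed by this page.

```python
import random
import statistics


def bootstrap(values, statistic, n_resamples=1000, seed=None):
    """Return (nominal_value, bootstrap_samples) for statistic on values.

    The nominal value is the statistic on the original data; each bootstrap
    sample is the statistic on a resample drawn with replacement.
    """
    rng = random.Random(seed)
    nominal = statistic(values)
    samples = [
        statistic(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return nominal, samples


data = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4]
nominal, samples = bootstrap(data, statistics.median, seed=1)
uncertainty = statistics.stdev(samples)  # spread of the bootstrap distribution
```

The same recipe works for any statistic (here the median), which is why bootstrapping suits a wrapper that accepts arbitrary functions.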

bs()

Return the full data sample of bootstrapped results. Usually used for visualisations, such as:

df["var"].unc.mean().bs().plot(kind="hist")

ci(a=0.682689, b=None)

Return the confidence interval defined by the fractions a and b. This is the pair of values [left, right] such that a fraction a of the bootstrapped results lies left of left and a fraction b lies right of right. If you only give one parameter, the symmetric interval with b = a is returned.

Returns: a pd.Series with the entries value, left and right.
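The exact convention used by ci is not spelled out beyond the description above. As a minimal sketch, assuming the common central percentile interval (a fraction coverage of the samples lies between left and right, with the rest split equally between both tails), the bounds can be computed from bootstrap samples as follows. The helpers percentile and central_interval are hypothetical; the default 0.682689 is the fraction of a normal distribution within one standard deviation of its mean.

```python
import math


def percentile(sorted_samples, q):
    """Linearly interpolated quantile of already sorted data, 0 <= q <= 1."""
    pos = q * (len(sorted_samples) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_samples[lo] + (sorted_samples[hi] - sorted_samples[lo]) * frac


def central_interval(samples, coverage=0.682689):
    """Assumed convention: coverage inside [left, right], equal tails outside."""
    s = sorted(samples)
    tail = (1.0 - coverage) / 2.0
    return percentile(s, tail), percentile(s, 1.0 - tail)


samples = [float(i) for i in range(101)]
left, right = central_interval(samples, coverage=0.9)  # about (5.0, 95.0)
left_1sigma, right_1sigma = central_interval(samples)
```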

compare_ge(rhs)

How many of the bootstrapped values are >= rhs?

compare_gt(rhs)

How many of the bootstrapped values are > rhs?

compare_le(rhs)

How many of the bootstrapped values are <= rhs?

compare_lt(rhs)

How many of the bootstrapped values are < rhs?

prob(value)

Return the probability of obtaining a result equal to or greater than value.

If the bootstrapped results are interpreted as samples from a probability density function, this is equivalent to the p-value.
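The compare_* methods and prob both boil down to counting bootstrap samples on one side of a threshold. This page does not say whether the compare_* methods return a count or a fraction, so the minimal sketch below assumes fractions (multiply by the number of samples to get a count). The helpers fraction_where and tail_probability are hypothetical, and treating prob as the fraction of samples >= value is an assumption based on its description.

```python
import operator

# Hypothetical mapping mirroring compare_ge / compare_gt / compare_le / compare_lt.
_OPERATORS = {"ge": operator.ge, "gt": operator.gt, "le": operator.le, "lt": operator.lt}


def fraction_where(samples, op, rhs):
    """Fraction of bootstrap samples s for which `s <op> rhs` holds."""
    check = _OPERATORS[op]
    return sum(1 for s in samples if check(s, rhs)) / len(samples)


def tail_probability(samples, value):
    """Estimated probability of a result >= value (an upper-tail p-value)."""
    return fraction_where(samples, "ge", value)


samples = [0.8, 1.1, 1.3, 0.9, 1.6, 1.2, 0.7, 1.4]
p = tail_probability(samples, 1.0)  # 5 of 8 samples are >= 1.0
```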

strip()

A BootstrapResult still includes the full sample of bootstrapped results, so it can be quite heavy in terms of memory. This function returns a number with uncertainty without the bootstrapped sample.
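To illustrate the memory trade-off, here is a minimal sketch with two hypothetical classes: one that keeps every bootstrap sample and one that keeps only the nominal value and a standard deviation. Retaining the standard deviation of the samples as the uncertainty is an assumption; the real strip returns an uncertainties number rather than these classes.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class FullResult:
    """Hypothetical heavy result: nominal value plus all bootstrap samples."""
    nominal_value: float
    bootstrap: list = field(repr=False)

    def strip(self):
        # Keep only two floats; the (possibly huge) sample list is dropped.
        return StrippedResult(self.nominal_value, statistics.stdev(self.bootstrap))


@dataclass(frozen=True)
class StrippedResult:
    """Hypothetical light result: nominal value and standard deviation only."""
    nominal_value: float
    std_dev: float


heavy = FullResult(2.0, [1.0, 2.0, 3.0])
light = heavy.strip()
```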

Plotting

The package also adds a method to the basic pandas objects (Series, DataFrame) for plotting values with uncertainties correctly. You can call it with

df.plot_with_uncertainty()

It mirrors the normal call to plot one to one, so you can pass the same arguments:

df.plot_with_uncertainty(kind="bar", label="Something", figsize=(20, 10))
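Plotting values with uncertainties usually means drawing the nominal values with error bars. Assuming plot_with_uncertainty does something along those lines, the preparatory step is splitting each result into its nominal value and its error. The sketch below does this for a hypothetical dict of (nominal_value, std_dev) pairs; the resulting lists could be passed as yerr to matplotlib's Axes.bar or Axes.errorbar, as shown in the comment.

```python
def split_nominal_and_error(results):
    """Split {label: (nominal_value, std_dev)} into three parallel lists."""
    labels = list(results)
    nominal_values = [results[label][0] for label in labels]
    errors = [results[label][1] for label in labels]
    return labels, nominal_values, errors


labels, heights, yerr = split_nominal_and_error({"A": (3.2, 0.4), "B": (2.7, 0.3)})
# With matplotlib installed, bars with error bars could then be drawn with:
#   import matplotlib.pyplot as plt
#   fig, ax = plt.subplots()
#   ax.bar(labels, heights, yerr=yerr)
```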