Here is a quick rundown of ffn’s capabilities. For a more complete guide, read the source, or check out the API docs.
import ffn
#%pylab inline
Data Retrieval
The main method for data retrieval is the get function. The get function uses a data provider to download data from an external service and packs that data into a pandas DataFrame for further manipulation.
data = ffn.get('agg,hyg,spy,eem,efa', start='2010-01-01', end='2014-01-01')
print(data.head())
By default, the data is downloaded from Yahoo! Finance and the Adjusted Close is used as the security’s price. Other data sources are also available, and you may select other fields as well. Fields are specified using the format {ticker}:{field}. So, to get the Open, High, Low and Close for aapl, we would request aapl:Open,aapl:High,aapl:Low,aapl:Close.
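To make the {ticker}:{field} convention concrete, here is a minimal sketch of how such a spec string can be built and split into (ticker, field) pairs. The helper parse_ticker_spec and the "Adj Close" default label are hypothetical illustrations, not ffn’s internal parser. The commented ffn.get call at the end is the only ffn API referenced.

```python
# Hypothetical helper (not part of ffn): split a ticker spec written in the
# {ticker}:{field} format into (ticker, field) pairs.
DEFAULT_FIELD = "Adj Close"  # assumption: label for the field used when none is given


def parse_ticker_spec(spec, default_field=DEFAULT_FIELD):
    """Return a list of (ticker, field) tuples from a comma-separated spec."""
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        ticker, sep, field = item.partition(":")
        pairs.append((ticker.strip(), field.strip() if sep else default_field))
    return pairs


# Build the request for aapl's Open, High, Low and Close fields.
spec = ",".join(f"aapl:{field}" for field in ["Open", "High", "Low", "Close"])
print(spec)  # aapl:Open,aapl:High,aapl:Low,aapl:Close
print(parse_ticker_spec(spec))

# With ffn installed and network access, the download itself would be e.g.:
# ffn.get(spec, start='2010-01-01', end='2014-01-01')
```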
The default data provider is ffn.data.web(). This is basically just a thin wrapper around pandas’ pandas.io.data provider. Please refer to the appropriate docs for more info (data sources, etc.). The ffn.data.csv() provider is also available for loading data from a local file; to use it, we pass it to get through the provider argument. We also want to merge this new data with the data we downloaded earlier, so we pass the data object as the existing argument, and the new data will be merged into the existing DataFrame.
data = ffn.get('dbc', provider=ffn.data.csv, path='test_data.csv', existing=data)
print(data.head())
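Loading a local CSV file amounts to building a DataFrame indexed by date. The sketch below approximates that step with plain pandas, assuming (hypothetically) a file whose first column holds dates and whose remaining columns hold one price series per ticker. The values are made up, and ffn’s csv provider may apply extra handling of its own.

```python
import io

import pandas as pd

# Hypothetical contents of a file like test_data.csv (made-up values):
# a Date column followed by one price column per ticker.
csv_text = """Date,dbc
2010-01-04,10.00
2010-01-05,10.50
2010-01-06,10.25
"""

# Use the first column as the index and parse it as dates, so the result
# can line up with the DatetimeIndex of data downloaded earlier.
dbc = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(dbc)
print(type(dbc.index).__name__)  # DatetimeIndex
```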
As we can see above, the dbc column was added to the DataFrame. Internally, get uses the function ffn.merge, which is useful when you want to merge Series and DataFrames together. We plan on adding many more data sources over time. If you know your way around Python and would like to contribute a data provider, please feel free to submit a pull request - contributions are always welcome!
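Merging time series comes down to aligning rows on the date index. A rough plain-pandas approximation (not ffn’s own code) is column-wise concatenation, which performs an outer join so that no dates are lost. The prices below are made up.

```python
import pandas as pd

# Made-up prices: a DataFrame covering three days and a named Series
# covering only the last two of them.
dates = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"])
existing = pd.DataFrame({"spy": [100.0, 101.0, 102.0]}, index=dates)
dbc = pd.Series([10.0, 10.5], index=dates[1:], name="dbc")

# Column-wise concat aligns rows on the date index (outer join by default);
# the Series name becomes the new column name, and dates missing from one
# input show up as NaN in that column.
merged = pd.concat([existing, dbc], axis=1)
print(merged)
```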
Data Manipulation
Now that we have some data, let’s start manipulating it. In quantitative finance, we are often interested in the returns of a given time series. Let’s calculate the returns by simply calling the to_returns or to_log_returns extension methods.
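For reference, the textbook definitions are p_t / p_{t-1} - 1 for simple returns and ln(p_t / p_{t-1}) for log returns. The sketch below computes both with plain pandas and numpy on made-up prices. It assumes to_returns and to_log_returns follow these standard definitions, which you can confirm against the ffn source.

```python
import numpy as np
import pandas as pd

# Made-up daily prices for one asset.
prices = pd.Series([100.0, 110.0, 99.0], name="asset")

# Simple returns: p_t / p_{t-1} - 1
simple = prices / prices.shift(1) - 1
# Log returns: ln(p_t / p_{t-1})
log_ret = np.log(prices / prices.shift(1))

# The first value is NaN because there is no previous price, hence dropna().
print(simple.dropna().round(4).tolist())  # [0.1, -0.1]
print(log_ret.dropna().round(4).tolist())
```

The two are linked by simple = exp(log) - 1, and log returns add up across periods, which is one reason they are popular for analysis.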
Let’s look at the different distributions to see how they look.
returns = data.to_log_returns().dropna()
ax = returns.hist(figsize=(12, 5))
We can also use the numerous functions packed into numpy, pandas and the like to further analyze the returns. For example, we can use the corr function to get the pairwise correlations between assets.
returns.corr().as_format('.2f')
       agg    hyg    spy    eem    efa    dbc
agg   1.00  -0.11  -0.33  -0.23  -0.29  -0.18
hyg  -0.11   1.00   0.77   0.75   0.76   0.49
spy  -0.33   0.77   1.00   0.88   0.92   0.59
eem  -0.23   0.75   0.88   1.00   0.90   0.62
efa  -0.29   0.76   0.92   0.90   1.00   0.61
dbc  -0.18   0.49   0.59   0.62   0.61   1.00
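DataFrame.corr computes Pearson correlation by default. As a sanity check on what the matrix above contains, the sketch below recomputes a Pearson correlation matrix by hand on seeded synthetic returns and compares it with the pandas result. The data and column names are made up.

```python
import numpy as np
import pandas as pd

# Made-up returns for three assets (seeded so the run is reproducible):
# "a" and "b" share a common driver, "c" is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=250)
rets = pd.DataFrame({
    "a": base + 0.5 * rng.normal(size=250),
    "b": base + 0.5 * rng.normal(size=250),
    "c": rng.normal(size=250),
})

# Pearson correlation by hand: demean each column, form the sample
# covariance matrix, then divide by the product of standard deviations.
x = rets.to_numpy()
x = x - x.mean(axis=0)
cov = x.T @ x / (len(x) - 1)
std = np.sqrt(np.diag(cov))
manual = cov / np.outer(std, std)

print(np.allclose(manual, rets.corr().to_numpy()))  # True
```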
Here we used the convenience method as_format to get prettier output. We could also plot a heatmap to better visualize the results.
Let’s start looking at how all these securities performed over the period. To achieve this, we will plot rebased time series so that we can see how they each performed relative to each other.
ax = data.rebase().plot(figsize=(12,5))
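Rebasing divides each series by its first value and multiplies by a common starting level, so every line on the chart starts at the same point. A minimal sketch, assuming a starting level of 100 (a common convention) and using a hypothetical helper rebase_to rather than ffn’s own rebase:

```python
import pandas as pd

# Made-up prices for two assets trading at very different levels.
prices = pd.DataFrame({"x": [50.0, 55.0, 45.0], "y": [200.0, 210.0, 230.0]})


def rebase_to(prices, value=100.0):
    """Hypothetical helper: scale each column so its first value equals `value`."""
    return prices / prices.iloc[0] * value


print(rebase_to(prices))
```

After rebasing, x runs 100, 110, 90 and y runs 100, 105, 115, so their relative performance can be read directly off one chart.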
Performance Measurement
For a more complete view of each asset’s performance over the period, we can use the ffn.core.calc_stats() method which will create a ffn.core.GroupStats object. A GroupStats object wraps a bunch of ffn.core.PerformanceStats objects in a dict with some added convenience methods.
perf = data.calc_stats()
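The container design described above, a dict keyed by asset name whose values are per-asset stats objects plus group-level convenience methods, can be sketched as follows. SimpleStats and SimpleGroupStats are hypothetical stand-ins that track a single statistic; they are not ffn’s GroupStats and PerformanceStats classes.

```python
import pandas as pd


class SimpleStats:
    """Hypothetical stand-in for a per-asset stats object (not ffn's class)."""

    def __init__(self, prices):
        self.prices = prices.dropna()
        self.total_return = self.prices.iloc[-1] / self.prices.iloc[0] - 1


class SimpleGroupStats(dict):
    """Hypothetical container: maps each column name to a SimpleStats."""

    def __init__(self, prices):
        super().__init__({name: SimpleStats(prices[name]) for name in prices.columns})

    def total_returns(self):
        # Convenience method: gather one statistic across all assets.
        return pd.Series({name: s.total_return for name, s in self.items()})


# Made-up prices for two assets.
prices = pd.DataFrame({"x": [100.0, 120.0], "y": [50.0, 40.0]})
group = SimpleGroupStats(prices)
print(group["x"].total_return)
print(group.total_returns())
```

Subclassing dict keeps per-asset lookups such as group["x"] familiar, while group-level methods collect the same statistic across every asset.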
Now that we have our GroupStats object, we can analyze the performance in greater detail. For example, the plot method yields a graph similar to the one above.
perf.plot()
We can also display a wide array of statistics that are all contained in the PerformanceStats object. This will probably look crappy in the docs, but do try it out in a Notebook. We are also actively trying to improve the way we display this wide array of stats.
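Several headline statistics have simple textbook definitions. The sketch below computes total return, compound annual growth rate (CAGR) and maximum drawdown on made-up yearly prices. The function names and the 365.25-day year are assumptions for illustration, and ffn’s PerformanceStats may use different conventions.

```python
import pandas as pd

# Made-up yearly prices for a single asset.
dates = pd.to_datetime(["2010-01-01", "2011-01-01", "2012-01-01"])
prices = pd.Series([100.0, 121.0, 110.0], index=dates)


def total_return(p):
    return p.iloc[-1] / p.iloc[0] - 1


def cagr(p):
    # Compound annual growth rate; years counted as calendar days / 365.25.
    years = (p.index[-1] - p.index[0]).days / 365.25
    return (p.iloc[-1] / p.iloc[0]) ** (1 / years) - 1


def max_drawdown(p):
    # Largest peak-to-trough decline, as a (negative) fraction of the running peak.
    return (p / p.cummax() - 1).min()


print(round(total_return(prices), 4))  # 0.1
print(round(cagr(prices), 4))
print(round(max_drawdown(prices), 4))
```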
ffn also provides commonly used numerical routines and plans to add many more in the future. For example, the ffn.core.calc_mean_var_weights() function determines portfolio weights using a mean-variance approach.
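As an illustration of the mean-variance idea, the sketch below computes the textbook unconstrained tangency (maximum Sharpe ratio) portfolio, with weights proportional to inv(cov) @ (mu - rf) and rescaled to sum to 1. This is a swapped-in closed-form technique, not necessarily what calc_mean_var_weights does; a practical routine would typically add weight bounds and a more robust covariance estimate. The function name tangency_weights and the inputs are hypothetical.

```python
import numpy as np


def tangency_weights(mu, cov, rf=0.0):
    """Unconstrained max-Sharpe weights: w proportional to inv(cov) @ (mu - rf),
    rescaled to sum to 1. Allows short positions; no weight bounds."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    raw = np.linalg.solve(cov, mu - rf)  # solves cov @ raw = mu - rf without an explicit inverse
    return raw / raw.sum()


# Two assets with the same expected return; the second has a quarter of the variance.
mu = [0.10, 0.10]
cov = [[0.04, 0.0], [0.0, 0.01]]
print(np.round(tangency_weights(mu, cov), 4))  # [0.2 0.8]
```

Because there are no bounds, weights can be negative, and the rescaling step breaks down if the raw weights sum to zero or a negative number (for example, when every expected excess return is negative).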
Other interesting functions include the clustering routines, such as a Python implementation of David Varadi’s Fast Threshold Clustering Algorithm (FTCA).
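For a sense of how FTCA works, here is a sketch following the algorithm’s published description: among the unassigned assets, find the one with the highest (HC) and the lowest (LC) average correlation. If corr(HC, LC) exceeds the threshold, form one cluster from both plus every unassigned asset whose average correlation to the pair exceeds the threshold; otherwise form one cluster around HC and another around LC. Repeat until every asset is assigned. The function ftca is hypothetical, not ffn’s implementation, and details such as tie-breaking may differ. The example feeds it the correlation matrix shown earlier.

```python
import pandas as pd


def ftca(corr, threshold=0.5):
    """Sketch of the Fast Threshold Clustering Algorithm on a correlation
    DataFrame. Returns a list of clusters (lists of asset names)."""
    remaining = list(corr.index)
    clusters = []
    while remaining:
        if len(remaining) == 1:
            clusters.append(remaining)  # a lone leftover asset is its own cluster
            break
        # Average correlation of each unassigned asset to the unassigned set.
        avg = corr.loc[remaining, remaining].mean().sort_values()
        low, high = avg.index[0], avg.index[-1]
        remaining = [a for a in remaining if a not in (low, high)]
        if corr.loc[high, low] > threshold:
            # HC and LC are close: one cluster holding both, plus every asset
            # whose average correlation to the pair exceeds the threshold.
            members = [high, low] + [
                a for a in remaining
                if (corr.loc[a, high] + corr.loc[a, low]) / 2 > threshold
            ]
            clusters.append(members)
        else:
            # Otherwise, build a cluster around HC first, then one around LC.
            members_high = [high] + [a for a in remaining if corr.loc[a, high] > threshold]
            remaining = [a for a in remaining if a not in members_high]
            members_low = [low] + [a for a in remaining if corr.loc[a, low] > threshold]
            clusters.extend([members_high, members_low])
            members = members_high + members_low
        remaining = [a for a in remaining if a not in members]
    return clusters


# The correlation matrix from the returns example above.
names = ["agg", "hyg", "spy", "eem", "efa", "dbc"]
values = [
    [1.00, -0.11, -0.33, -0.23, -0.29, -0.18],
    [-0.11, 1.00, 0.77, 0.75, 0.76, 0.49],
    [-0.33, 0.77, 1.00, 0.88, 0.92, 0.59],
    [-0.23, 0.75, 0.88, 1.00, 0.90, 0.62],
    [-0.29, 0.76, 0.92, 0.90, 1.00, 0.61],
    [-0.18, 0.49, 0.59, 0.62, 0.61, 1.00],
]
corr = pd.DataFrame(values, index=names, columns=names)
print(ftca(corr, threshold=0.5))  # [['eem', 'hyg', 'spy', 'efa', 'dbc'], ['agg']]
```

With a threshold of 0.5, the bond fund agg ends up on its own while the risk assets cluster around eem; raising the threshold splits the clusters further.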