Facebook open sourced Prophet, its forecasting tool for time series data. Although forecasting is not a trivial task, the library is very easy to use and produces good results quickly. In this basic blog post, I am going to forecast this blog's visitor statistics based on the historical data I collected with Piwik.
Install and initialize a new virtual Python environment
# Install the virtual environments package
sudo pip3 install virtualenv
# Create a new folder for the project
mkdir python-projects
cd python-projects/
# Create a new virtual environment
virtualenv -p python3 py
Install Prophet and its Dependencies
Within your new Python virtual environment, install the required dependencies first and then Prophet:
# Linux dependencies
sudo apt-get install python3-tk
# Python dependencies
./py/bin/pip3 install cython numpy
# Prophet
./py/bin/pip3 install fbprophet
Get the Data from your Piwik Database
We aggregate the data from the piwik_log_visit table per day and store the result in a CSV file. In the case of this blog, I started collecting visitor traffic data in early 2013. Prophet can not only display trends and seasonality but also forecast into the future.
SELECT DATE_FORMAT(visit_first_action_time, '%Y-%m-%d'),
       SUM(visitor_count_visits)
FROM db_piwik.piwik_log_visit
GROUP BY 1
INTO OUTFILE '/tmp/visits.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
Usually MySQL runs with a security setting that restricts where the server may write files (for a good reason). Check the secure_file_priv system variable to find the directory you can use for exporting.
The data now looks similar to this:
~/python-projects $ head visits.csv
2013-11-05,3
2014-01-11,4
2014-01-14,2
2014-01-15,10
2014-01-16,8
2014-01-17,6
2014-01-18,1
2014-01-19,1
2014-01-20,1
2014-01-21,6
This is essentially the format Prophet expects: a date column and a numeric value column, which the script below names ds and y.
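Before handing the file to Prophet, a quick sanity check can catch malformed rows early. The following is a minimal sketch using only the Python standard library; the function name `check_visits_csv` and the inline sample are illustrative assumptions and not part of Prophet or Piwik.

```python
import csv
import io
from datetime import datetime


def check_visits_csv(handle):
    """Return (date, count) pairs, raising ValueError on a malformed row."""
    rows = []
    for line_number, record in enumerate(csv.reader(handle), start=1):
        if len(record) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 fields, got {len(record)}"
            )
        day_text, count_text = record
        # strptime raises ValueError if the date is not in YYYY-MM-DD form.
        day = datetime.strptime(day_text, "%Y-%m-%d").date()
        # int() raises ValueError if the count is not a whole number.
        count = int(count_text)
        # The forecasting script takes the logarithm of the count,
        # so zero or negative values would cause trouble later.
        if count <= 0:
            raise ValueError(f"line {line_number}: count must be positive")
        rows.append((day, count))
    return rows


# The first lines of the export shown above, used here as sample input.
sample = io.StringIO("2013-11-05,3\n2014-01-11,4\n2014-01-14,2\n")
rows = check_visits_csv(sample)
print(len(rows), rows[0])
```

For the real export, the same function could be called on an opened file, for example `with open('visits.csv', newline='') as f: check_visits_csv(f)`.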
Forecasting with Prophet
The short but nice tutorial covers everything needed. The script below is essentially the same as the one in the tutorial:
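import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt

# The export has no header row, so name the columns explicitly;
# Prophet expects the date in 'ds' and the value in 'y'
df = pd.read_csv('visits.csv', header=None, names=['ds', 'y'])
# Log-transform the daily counts, as in the tutorial
df['y'] = np.log(df['y'])
df.head()

# Fit the model on the historical data
m = Prophet()
m.fit(df)

# Extend the time frame by 365 days and predict
future = m.make_future_dataframe(periods=365)
future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

# Plot the forecast and its components and save both figures
figure_forecast = m.plot(forecast)
plt.savefig('forcast.png')
m.plot_components(forecast)
plt.savefig('forcast_component.png')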
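Because the script fits Prophet on the logarithm of the daily counts, the values in yhat, yhat_lower and yhat_upper are on a log scale as well. To read them as visit counts, the transform has to be undone with the exponential function. The sketch below shows the idea with plain Python dictionaries standing in for forecast rows; the helper name `to_visit_counts` and the sample values are hypothetical.

```python
import math

# Columns of the forecast that hold log-scale values.
LOG_COLUMNS = ("yhat", "yhat_lower", "yhat_upper")


def to_visit_counts(rows):
    """Return copies of the rows with the natural-log transform undone."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        for column in LOG_COLUMNS:
            new_row[column] = math.exp(row[column])
        converted.append(new_row)
    return converted


# Hypothetical log-scale values, standing in for one row of the forecast.
sample = [
    {
        "ds": "2018-01-01",
        "yhat": math.log(10),
        "yhat_lower": math.log(5),
        "yhat_upper": math.log(20),
    }
]
visits = to_visit_counts(sample)
print(visits[0])
```

On the pandas DataFrame from the script, the same effect should be achievable with `np.exp(forecast[['yhat', 'yhat_lower', 'yhat_upper']])`.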
The script produces two graphs: the forecast itself and its components. Prophet incorporates seasonal variations, holidays and trends derived from the historical data.