Week 3
Overview
- Was very happy to be introduced to NumPy
- There are, however, better resources for learning NumPy
- Really enjoyed playing with QSTK -- getting a taste of what a quant framework is like
1.1
- NumPy tutorial
- Easier to go through the tutorial via the wiki
- A real eye-opener. Seriously, one of the best things I've ever done. How else was I ever going to be introduced to NumPy?
1.2
- Slicing arrays
- Indexing using an array of indices
- Operations on arrays. Work out how many elements in the array are greater than the average.
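A minimal sketch of the three ideas above, using made-up sample values rather than anything from the lecture. The names `values`, `middle`, `picked` and `count_above` are only for illustration.

```python
import numpy as np

# Hypothetical sample data, not taken from the course material.
values = np.array([3, 8, 1, 9, 4, 7])

# Slicing: positions 1, 2 and 3 (the stop index 4 is excluded).
middle = values[1:4]

# Indexing with an array of indices: pick positions 0, 2 and 5.
picked = values[np.array([0, 2, 5])]

# Operations on arrays: compare every element with the mean, which
# gives a boolean mask, then sum the mask to count the True entries.
above_average = values > values.mean()
count_above = int(above_average.sum())

print(middle)       # [8 1 9]
print(picked)       # [3 1 7]
print(count_above)  # 3 (the mean is about 5.33, so 8, 9 and 7 qualify)
```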
1.3
- Performing basic operations on matrices
In [1]: squareArray * 2
Out[1]:
array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])
In [1]: matA = np.array( [[1,2], [3,4] ] )
In [1]: matB = np.array( [[5,6], [7,8] ] )
In [1]: matA * matB
Out[1]:
array([[ 5, 12],
       [21, 32]])
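Worth noting: `*` on NumPy arrays multiplies element by element, it is not a matrix product. The sketch below contrasts the two. `squareArray` is never defined in these notes, so the 3x3 array of 1..9 here is an assumption that happens to reproduce the output shown above.

```python
import numpy as np

# Assumed definition (not in the notes): a 3x3 array holding 1..9.
square_array = np.arange(1, 10).reshape(3, 3)
doubled = square_array * 2          # scalar applied to every element

mat_a = np.array([[1, 2], [3, 4]])
mat_b = np.array([[5, 6], [7, 8]])

# Element-wise product: each entry multiplied by its counterpart.
elementwise = mat_a * mat_b

# True matrix product: rows of mat_a dotted with columns of mat_b.
matrix_product = np.dot(mat_a, mat_b)

print(elementwise)     # [[ 5 12] [21 32]]
print(matrix_product)  # [[19 22] [43 50]]
```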
2.1
- QSTK overview
- Set a start and end date
In [2]: import QSTK.qstkutil.qsdateutil as du
In [3]: import QSTK.qstkutil.DataAccess as da
In [5]: import datetime as dt
In [6]: import matplotlib.pyplot as plt
In [8]: ls_symbols = ['AAPL', 'GLD', 'GOOG', '$SPX', 'XOM']
In [9]: dt_start = dt.datetime(2010, 1, 1)
In [10]: dt_end = dt.datetime(2010, 1, 15)
In [11]: dt_timeofday = dt.timedelta(hours=16)
In [12]: ldt_timestamps = du.getNYSEdays(dt_start, dt_end, dt_timeofday)
In [13]: ldt_timestamps
Out[13]:
[Timestamp('2010-01-04 16:00:00', tz=None),
 Timestamp('2010-01-05 16:00:00', tz=None),
 Timestamp('2010-01-06 16:00:00', tz=None),
 Timestamp('2010-01-07 16:00:00', tz=None),
 Timestamp('2010-01-08 16:00:00', tz=None),
 Timestamp('2010-01-11 16:00:00', tz=None),
 Timestamp('2010-01-12 16:00:00', tz=None),
 Timestamp('2010-01-13 16:00:00', tz=None),
 Timestamp('2010-01-14 16:00:00', tz=None)]
- "I wish I could edit this out and start over again." You kinda can, dude. :)
- Get data object
In [14]: c_dataobj = da.DataAccess('Yahoo')
In [15]: c_dataobj
Out[15]: <QSTK.qstkutil.DataAccess.DataAccess at 0xa3d14cc>
In [16]: ls_keys = ['open', 'high', 'low', 'close', 'volume', 'actual_close']
In [17]: ldf_data = c_dataobj.get_data(ldt_timestamps, ls_symbols, ls_keys)
In [20]: d_data = dict(zip(ls_keys, ldf_data))
In [22]: d_data['close']
Out[22]:
                       AAPL     GLD    GOOG     $SPX    XOM
2010-01-04 16:00:00  213.10  109.80  626.75  1132.99  64.55
2010-01-05 16:00:00  213.46  109.70  623.99  1136.52  64.80
2010-01-06 16:00:00  210.07  111.51  608.26  1137.14  65.36
2010-01-07 16:00:00  209.68  110.82  594.10  1141.69  65.15
2010-01-08 16:00:00  211.07  111.37  602.02  1144.98  64.89
2010-01-11 16:00:00  209.21  112.85  601.11  1146.98  65.62
2010-01-12 16:00:00  206.83  110.49  590.48  1136.22  65.29
2010-01-13 16:00:00  209.75  111.54  587.09  1145.68  65.03
2010-01-14 16:00:00  208.53  112.03  589.85  1148.46  65.04
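The `dict(zip(ls_keys, ldf_data))` step is what lets the data be looked up by name, as in `d_data['close']`. A small sketch of the pattern on its own, assuming `get_data` returns one item per key in the same order as `ls_keys`. The strings here are hypothetical stand-ins for the real DataFrames.

```python
ls_keys = ['open', 'high', 'low', 'close', 'volume', 'actual_close']

# Hypothetical placeholders standing in for the list of DataFrames
# that get_data returns (one per key, in the same order).
ldf_data = ['df_' + key for key in ls_keys]

# zip pairs each key with the item at the same position;
# dict turns those pairs into a name -> data lookup.
d_data = dict(zip(ls_keys, ldf_data))

print(d_data['close'])  # df_close
```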
In [16]: na_price = d_data['close'].values
In [17]: na_price
Out[17]:
array([[  213.1 ,   109.8 ,   626.75,  1132.99,    64.55],
       [  213.46,   109.7 ,   623.99,  1136.52,    64.8 ],
       [  210.07,   111.51,   608.26,  1137.14,    65.36],
       [  209.68,   110.82,   594.1 ,  1141.69,    65.15],
       [  211.07,   111.37,   602.02,  1144.98,    64.89],
       [  209.21,   112.85,   601.11,  1146.98,    65.62],
       [  206.83,   110.49,   590.48,  1136.22,    65.29],
       [  209.75,   111.54,   587.09,  1145.68,    65.03],
       [  208.53,   112.03,   589.85,  1148.46,    65.04]])
In [18]: plt.clf()
In [19]: plt.plot(ldt_timestamps, na_price)
Out[19]:
[<matplotlib.lines.Line2D at 0xaf4abac>,
 <matplotlib.lines.Line2D at 0xb25632c>,
 <matplotlib.lines.Line2D at 0xb2564ac>,
 <matplotlib.lines.Line2D at 0xb25662c>,
 <matplotlib.lines.Line2D at 0xb2567ac>]
In [20]: plt.legend(ls_symbols)
Out[20]: <matplotlib.legend.Legend at 0xb25a56c>
In [21]: plt.ylabel('Adjusted Close')
Out[21]: <matplotlib.text.Text at 0xaf52eec>
In [22]: plt.xlabel('Date')
Out[22]: <matplotlib.text.Text at 0xaf4a70c>
In [23]: plt.show()
2.2
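- Normalising the data by dividing every price by the first day's price for that symbol, so each series starts at 1.0 and the symbols can be compared on one scale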
In [26]: na_normalized_price = na_price / na_price[0,:]
In [27]: na_normalized_price
Out[27]:
array([[ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ],
       [ 1.00168935,  0.99908925,  0.99559633,  1.00311565,  1.00387297],
       [ 0.98578132,  1.01557377,  0.9704986 ,  1.00366287,  1.01254841],
       [ 0.9839512 ,  1.00928962,  0.94790586,  1.0076788 ,  1.00929512],
       [ 0.99047396,  1.01429872,  0.96054248,  1.01058262,  1.00526723],
       [ 0.98174566,  1.02777778,  0.95909055,  1.01234786,  1.0165763 ],
       [ 0.97057719,  1.00628415,  0.94213004,  1.00285086,  1.01146398],
       [ 0.98427968,  1.01584699,  0.93672118,  1.01120045,  1.0074361 ],
       [ 0.97855467,  1.02030965,  0.94112485,  1.01365414,  1.00759101]])
In [28]: plt.plot(ldt_timestamps, na_normalized_price)
Out[28]:
[<matplotlib.lines.Line2D at 0xb49b8ac>,
 <matplotlib.lines.Line2D at 0xb5197cc>,
 <matplotlib.lines.Line2D at 0xb51994c>,
 <matplotlib.lines.Line2D at 0xb519acc>,
 <matplotlib.lines.Line2D at 0xb519c4c>]
In [29]: plt.show()
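The normalisation works because of NumPy broadcasting: `na_price[0, :]` is a single row, and dividing the whole 2-D array by it divides each row by that first row, column by column. A self-contained sketch using just the first three rows of the close prices above (no QSTK needed). `cumulative_return` is an extra name added here for illustration, not something from the lecture.

```python
import numpy as np

# First three rows of the close prices shown earlier
# (columns: AAPL, GLD, GOOG, $SPX, XOM).
na_price = np.array([
    [213.10, 109.80, 626.75, 1132.99, 64.55],
    [213.46, 109.70, 623.99, 1136.52, 64.80],
    [210.07, 111.51, 608.26, 1137.14, 65.36],
])

# Shape (3, 5) divided by shape (5,): the first row is broadcast
# across every row, so each column is scaled by its own first price.
na_normalized_price = na_price / na_price[0, :]

# Illustrative extra: change relative to day one, per symbol.
cumulative_return = na_normalized_price[-1, :] - 1.0

print(na_normalized_price[0])  # [1. 1. 1. 1. 1.]
print(na_normalized_price.shape)
```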