Thursday, August 6, 2015

SAP HANA - Getting my data ready part II

After thinking about yesterday's data load overnight, I came to the conclusion that the data I put together wasn't really all that interesting. Sure, I get to look at historical data from a few years back and run some analysis on it, but it's just not going to be much fun until the data is something I can use for myself.

I'm a value investor. I spend my time watching for stocks that are selling below a fair market value. One of the measures I review is the price to book ratio. Yahoo lists this metric on their finance website as shown briefly here:


In general, a low price to book ratio is a sign of a company selling at a discount, and I want to view companies with low ratios within certain segments of the market (mid-cap stocks, say, or tech companies).
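As a quick reminder of what the metric measures, here is a minimal sketch of the calculation: the share price divided by the book value per share. The function name and the numbers are made up for illustration; they are not taken from Yahoo.

```python
def price_to_book(price_per_share, book_value_per_share):
    """Return the price to book ratio for one stock."""
    if book_value_per_share == 0:
        raise ValueError("book value per share must be non-zero")
    return price_per_share / book_value_per_share


# Hypothetical numbers: a $30 share backed by $40 of book value per share.
ratio = price_to_book(30.0, 40.0)
print(ratio)  # 0.75
```

A ratio below 1 means the market is pricing the company at less than the book value of its assets, which is the kind of discount I'm looking for.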

For my own personal purposes, I would like to view this metric across a small list of sample companies. To do this I wrote another Python script (related to the one from yesterday) that extracts the Price/Book ratio for a list of companies that I provide.

As you can see below, this Python script uses a MySQL library to store the information and the BeautifulSoup library to pull it out of web pages.

Here, the script is connecting to a database and pulling a list of stocks.
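For readers who can't see the screenshot, this step looks roughly like the sketch below. It is not my actual script: it uses an in-memory SQLite database from the standard library as a stand-in for the MySQL/MariaDB connection so it runs anywhere, and the table name `stocks` and column name `symbol` are assumptions made for the example.

```python
import sqlite3


def load_symbols(conn):
    """Return every ticker symbol stored in the (hypothetical) stocks table."""
    cursor = conn.cursor()
    cursor.execute("SELECT symbol FROM stocks ORDER BY symbol")
    return [row[0] for row in cursor.fetchall()]


# Stand-in database: in-memory SQLite instead of a MySQL/MariaDB server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (symbol TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO stocks (symbol) VALUES (?)",
    [("MSFT",), ("AAPL",), ("IBM",)],
)

symbols = load_symbols(conn)
print(symbols)  # ['AAPL', 'IBM', 'MSFT']
```

With a real MySQL driver, only the connect call and its credentials would change; the cursor, execute and fetchall pattern is the same.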

Further down, a URL is built for each symbol and opened using urllib.
Then two lists are created from the content of the HTML page. The first is a list of 'headers' (the first column shown above) and the second is a list of values (the second column shown above). The script then steps through those lists and builds cleaned copies with all of the HTML markup removed.
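The sketch below shows the idea of this step. To keep it runnable without extra packages it swaps BeautifulSoup for the standard library's `html.parser`, and it parses a small sample string instead of fetching a live page. The base URL, the `label` and `value` CSS class names, and the sample numbers are all placeholders invented for the example, not the real page layout.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = "https://example.com/key-statistics"  # placeholder, not the real page


def build_url(symbol):
    """Build the statistics URL for one ticker symbol."""
    return BASE_URL + "?" + urlencode({"s": symbol})


class StatsParser(HTMLParser):
    """Collect the text of 'label' cells and 'value' cells into two lists."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.values = []
        self._target = None  # list the current <td> feeds, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            css_class = dict(attrs).get("class")
            if css_class == "label":
                self._target = self.headers
            elif css_class == "value":
                self._target = self.values
            else:
                self._target = None
            self._text = []

    def handle_data(self, data):
        # Only text reaches this method, so nested markup is dropped.
        if self._target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._target is not None:
            self._target.append("".join(self._text).strip())
            self._target = None


# Made-up sample standing in for the downloaded page.
sample = """
<table>
  <tr><td class="label">Price/Book <font size="-1">(mrq)</font>:</td>
      <td class="value">1.85</td></tr>
  <tr><td class="label">Price/Sales (ttm):</td>
      <td class="value"><b>2.10</b></td></tr>
</table>
"""

parser = StatsParser()
parser.feed(sample)
print(build_url("IBM"))
print(parser.headers)
print(parser.values)
```

In a live run on Python 3, the sample string would be replaced by the page body read with `urllib.request.urlopen(build_url(symbol))`.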


Finally, at the bottom, the script takes matching items from each list and creates an insert statement so the data can be stored in a MariaDB table.
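A minimal sketch of that pairing-and-insert step follows. Again, SQLite stands in for MariaDB so the example is self-contained, and the table name `key_stats` and its columns `stat_name` and `stat_value` are guesses for illustration. It also uses a parameterized insert rather than building the SQL text by hand, which is a substitution on my part and avoids quoting problems.

```python
import sqlite3

# Cleaned lists, as they might look after the HTML has been stripped
# (made-up sample values).
headers = ["Price/Book (mrq):", "Price/Sales (ttm):"]
values = ["1.85", "2.10"]

# Stand-in database: in-memory SQLite instead of MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE key_stats (stat_name TEXT, stat_value TEXT)")

# zip() pairs the n-th header with the n-th value.
rows = list(zip(headers, values))
conn.executemany(
    "INSERT INTO key_stats (stat_name, stat_value) VALUES (?, ?)",
    rows,
)
conn.commit()

stored = conn.execute(
    "SELECT stat_name, stat_value FROM key_stats ORDER BY rowid"
).fetchall()
print(stored)
```

SQLite uses `?` as the placeholder; the common MySQL/MariaDB drivers for Python use `%s` instead, but the shape of the call is the same.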


The structure of the table is very simple. I've placed the results into just two fields.


So I'm now done gathering my data and will use it for a learning exercise on HANA. It has been fun working with Python (it takes me back to the days when I worked almost exclusively with Perl), but I need to move on and see how HANA works.
