A cool guy named Chad Orzel, who writes an interesting blog about physics, has written two cool books about teaching physics to your dog. He mentioned that he wanted a system to track the book's sales rank at Amazon.com. Here's a very "quick n' dirty" way to do that. Please note that Amazon changed their page format a while back, and it took me a few months to get the tracker working again. Sorry for the gap in the data.
Update: (2012-02-22) Just added the sales-rank trackers for the new book, How to Teach Relativity to Your Dog.
Update: (2010-04-24) After Amazon changed their page format, the old and ugly bash script no longer worked. I replaced it with a nice bit of Python that works much better and should be more reliable in the future. The script grabs a copy of each book's page at Amazon.com and parses out the sales rank number. These values are appended to a text file and plotted with gnuplot. Times are in the Pacific time zone.
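The parsing step can be tried on its own, without fetching anything. The sketch below assumes the sales-rank text looks something like `#12,345 in Books`; the function name `parse_rank` and the sample strings are hypothetical, and the regular expression is a slight variant of the one in the script (it matches the first `#` and returns an integer, or `None` if no rank is found).

```python
import re


def parse_rank(raw):
    """Extract a sales rank such as '#1,234 in Books' as an int (hypothetical helper)."""
    # Capture the run of digits and commas that follows the '#' sign.
    match = re.search(r"#([\d,]+)", raw)
    if match is None:
        return None  # no rank found, e.g. the page layout changed again
    # Drop the thousands separators before converting: "1,234" -> 1234
    return int(match.group(1).replace(",", ""))


print(parse_rank(" #12,345 in Books (See Top 100 in Books)"))  # prints 12345
```

Returning `None` instead of raising makes a layout change show up as a missing value rather than a crashed cron job.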
```python
import os
import datetime
import re
import subprocess

from lxml import etree


def get_pagerank(url):
    """Fetch an Amazon product page and return its sales rank as a string of digits."""
    parser = etree.HTMLParser()
    tree = etree.parse(url, parser)
    # Text nodes directly inside <li id="SalesRank">; the script uses the second one,
    # which holds the "#1,234 in ..." part
    raw = tree.xpath('//li[@id="SalesRank"]/text()')[1]
    # Capture the digits and commas following the last '#' in that text
    rank_commas = re.search(r".*#([\d,]*).*", raw)
    # Strip the thousands separators, e.g. "1,234" -> "1234"
    rank = rank_commas.group(1).replace(",", "")
    return rank


# Work in the web directory so the data files and plots land next to the page
os.chdir("/home/mbeckler/mbeckler.org/dog_physics/")

url_book = "http://www.amazon.com/How-Teach-Physics-Your-Dog/dp/1416572287"
url_kindle = "http://www.amazon.com/How-Teach-Physics-Your-ebook/dp/B002ZJCQT2"
url_relativity = "http://www.amazon.com/How-Teach-Relativity-Your-Dog/dp/0465023312"
url_relativity_kindle = "http://www.amazon.com/How-Teach-Relativity-Your-ebook/dp/B0072HV11G/"

#pagerank_book = None
#pagerank_kindle = None
# TODO put this in a while loop with try blocks on the get_pagerank() calls
# or maybe while/try blocks on each page?
# or maybe if not pagerank_book:
#while not pagerank_book and not pagerank_kindle:
pagerank_book = get_pagerank(url_book)
pagerank_kindle = get_pagerank(url_kindle)
pagerank_relativity = get_pagerank(url_relativity)
pagerank_relativity_kindle = get_pagerank(url_relativity_kindle)

datetimestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M")

# Append one tab-separated row per run to the history file
with open("dog_physics_data.txt", "a") as fid:
    fid.write("%s\t%s\t%s\t%s\t%s\n" % (datetimestamp, pagerank_book, pagerank_kindle,
                                        pagerank_relativity, pagerank_relativity_kindle))

# Write the current values to small files that are included in the webpage
current_values = [
    ("dog_physics_data_current_book.txt", pagerank_book),
    ("dog_physics_data_current_kindle.txt", pagerank_kindle),
    ("dog_physics_data_current_relativity.txt", pagerank_relativity),
    ("dog_physics_data_current_relativity_kindle.txt", pagerank_relativity_kindle),
]
for filename, value in current_values:
    with open(filename, "w") as fid:
        fid.write("%s" % value)

# Regenerate the plots from the updated data file
subprocess.call(["gnuplot", "dog_physics.gnuplot"])
subprocess.call(["gnuplot", "dog_relativity.gnuplot"])
```
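The TODO comments in the script suggest retrying a page fetch when it fails. One possible shape for that is a small wrapper that calls any fetch function several times before giving up. This is a sketch, not part of the running script; the name `fetch_with_retry` and its `attempts`/`delay` parameters are hypothetical.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, delay=5.0):
    """Call fetch(url), retrying on any exception; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as err:  # network error, missing <li>, regex mismatch, ...
            print("attempt %d/%d for %s failed: %s" % (attempt, attempts, url, err))
            if attempt < attempts:
                time.sleep(delay)  # wait a bit before hitting the page again
    return None
```

In the script this would read `pagerank_book = fetch_with_retry(get_pagerank, url_book)`. If all attempts fail, the `%s` formatting writes the text `None` into that column of the data file, so one bad page no longer stops the other three from being recorded.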
The script is run by cron every hour to update the data. I'm not sure how often Amazon updates their values, so ideally this interval should match their update frequency. Nothing fancy is done with the plotting, so within a few weeks there will probably be too much data to make a nice plot this naively. You can download the raw data below to make your own plots.
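One way to keep a plot readable as the hourly samples pile up is to reduce them to one point per day before plotting. The sketch below is only an illustration of that idea, not what the gnuplot scripts do; the function name `daily_median` and the choice of the median (less sensitive to a single odd reading than the mean) are assumptions.

```python
import datetime
import statistics
from collections import defaultdict


def daily_median(samples):
    """Collapse (datetime, rank) samples into one (date, median rank) point per day."""
    by_day = defaultdict(list)
    for stamp, rank in samples:
        if rank is not None:  # skip hours where the scrape failed
            by_day[stamp.date()].append(rank)
    # Sort by date so the output can be plotted directly as a line
    return [(day, statistics.median(ranks)) for day, ranks in sorted(by_day.items())]


example = [
    (datetime.datetime(2010, 4, 24, 9, 0), 100),
    (datetime.datetime(2010, 4, 24, 10, 0), 300),
    (datetime.datetime(2010, 4, 25, 9, 0), 50),
]
print(daily_median(example))
```

With hourly sampling this cuts the number of plotted points by roughly a factor of 24.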
Download combined data table (txt)
Download gnuplot script (physics)
Download gnuplot script (relativity)
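If you want to work with the combined data table in Python instead of gnuplot, a loader might look like the sketch below. It assumes the download has the same layout the script writes to `dog_physics_data.txt`: a `%Y%m%d_%H%M` timestamp followed by tab-separated ranks for the physics book, physics Kindle edition, relativity book, and relativity Kindle edition. Rows from before the relativity trackers were added may have fewer columns (an assumption), so missing or non-numeric fields become `None`. The names `parse_line`, `load_table`, and the column labels are hypothetical.

```python
import datetime

COLUMNS = ["physics_book", "physics_kindle", "relativity_book", "relativity_kindle"]


def parse_line(line):
    """Parse one tab-separated row: timestamp followed by up to four sales ranks."""
    fields = line.rstrip("\n").split("\t")
    stamp = datetime.datetime.strptime(fields[0], "%Y%m%d_%H%M")
    ranks = {}
    for i, name in enumerate(COLUMNS, start=1):
        value = fields[i] if i < len(fields) else ""
        # Blank fields, or "None" from a failed scrape, become None
        ranks[name] = int(value) if value.isdigit() else None
    return stamp, ranks


def load_table(path):
    """Read every non-empty row of the data file into (datetime, ranks) pairs."""
    with open(path) as fid:
        return [parse_line(line) for line in fid if line.strip()]
```

For example, `rows = load_table("dog_physics_data.txt")` gives a list that can be filtered, resampled, or handed to a plotting library.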
Copyright © 2004 - 2024, Matthew L. Beckler, CC BY-SA 3.0
Last modified: 2013-02-21 04:14:26 PM (EST)