Matthew Beckler's Home Page


Tracking Amazon.com Sales Rank for
"How to Teach Physics to Your Dog"

A cool guy named Chad Orzel with an interesting blog about physics has written two cool books about teaching physics to your dog. He mentioned that he wanted a system to track the book's sales rank at Amazon.com. Here's a very "quick n' dirty" way to do that. Please note that Amazon changed their page format a while back, and it took me a few months to get the tracker working again. Sorry for the gap in data.

Current sales rank (book / kindle):

Plot of sales rank

Plot of sales rank

How it works

Update: (2012-02-22) Just added the salesrank trackers for the new book, How to Teach Relativity to Your Dog.

Update: (2010-04-24) After Amazon changed their page format, the old and ugly bash script no longer worked. I replaced it with a nice bit of Python that works much better, and should be more reliable in the future. The script grabs a copy of each book's page at Amazon.com and parses out the sales rank number. These values are appended to a text file and plotted with gnuplot. Times are in the Pacific time zone.

#!/usr/bin/python

import os, sys, datetime, re
import subprocess
from lxml import etree

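def get_pagerank(url):
    # Download the product page and parse it with lxml's lenient HTML parser
    parser = etree.HTMLParser()
    tree = etree.parse(url, parser)

    # The rank sits in the second text node of the <li id="SalesRank"> element
    raw = tree.xpath("//li[@id=\"SalesRank\"]/text()")[1]
    # Raw string so \d reaches the regex engine unchanged; "#" needs no escaping
    rank_commas = re.search(r"#([\d,]+)", raw)
    # Drop the thousands separators, e.g. "12,345" becomes "12345"
    rank = rank_commas.group(1).replace(",", "")
    return rank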

os.chdir("/home/mbeckler/mbeckler.org/dog_physics/")

url_book = "http://www.amazon.com/How-Teach-Physics-Your-Dog/dp/1416572287"
url_kindle = "http://www.amazon.com/How-Teach-Physics-Your-ebook/dp/B002ZJCQT2"
url_relativity = "http://www.amazon.com/How-Teach-Relativity-Your-Dog/dp/0465023312"
url_relativity_kindle = "http://www.amazon.com/How-Teach-Relativity-Your-ebook/dp/B0072HV11G/"

#pagerank_book = None
#pagerank_kindle = None
# TODO put this in a while loop with try blocks on the get_pagerank() calls
# or maybe while/try blocks on each page?
# or maybe if not pagerank_book:
#while not pagerank_book and not pagerank_kindle:
pagerank_book = get_pagerank(url_book)
pagerank_kindle = get_pagerank(url_kindle)
pagerank_relativity = get_pagerank(url_relativity)
pagerank_relativity_kindle = get_pagerank(url_relativity_kindle)

datetimestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M")

fid = open("dog_physics_data.txt", "a")
fid.write("%s\t%s\t%s\t%s\t%s\n" % (datetimestamp, pagerank_book, pagerank_kindle, pagerank_relativity, pagerank_relativity_kindle))
fid.close()

# write the current values to files that are included in the webpage
current_values = [
    ("book", pagerank_book),
    ("kindle", pagerank_kindle),
    ("relativity", pagerank_relativity),
    ("relativity_kindle", pagerank_relativity_kindle),
]
for name, value in current_values:
    with open("dog_physics_data_current_%s.txt" % name, "w") as fid:
        fid.write("%s" % value)

subprocess.call(["gnuplot", "dog_physics.gnuplot"])
subprocess.call(["gnuplot", "dog_relativity.gnuplot"])
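
The TODO comment in the script suggests wrapping the get_pagerank() calls in a loop with try blocks, so that one failed download or one changed page does not abort the whole run. A minimal sketch of that idea follows. It is not part of the script above: fetch_with_retries, attempts, delay_seconds and the flaky_fetch stand-in are all hypothetical names made up for illustration.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, delay_seconds=0.0):
    """Call fetch(url) up to `attempts` times; return None if every try fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # e.g. a failed download or a changed page layout
            print("attempt %d for %s failed: %r" % (attempt, url, exc))
            if attempt < attempts:
                time.sleep(delay_seconds)
    return None


# Hypothetical stand-in for get_pagerank() so the sketch runs without network
# access: it fails twice, then returns a made-up rank.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise IndexError("SalesRank element not found")
    return "12345"


rank = fetch_with_retries(flaky_fetch, "http://example.invalid/book")
print(rank)
```

In the real script, get_pagerank would be passed in place of flaky_fetch, probably with a non-zero delay_seconds. A None result could then be written to the data file as-is, which keeps one row per run even when a page could not be read.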

The script gets called by cron every hour to update the data. I'm not sure how often Amazon updates their values; ideally the polling frequency would match their update frequency. There's nothing fancy being done with plotting, so in a few weeks there will probably be too much data to make a nice plot in such a naive way. You can download the raw data below to make your own plots.
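
One way to deal with the growing pile of hourly samples is to thin the data before plotting. The sketch below is not what this site does. It assumes the tab-separated layout the script writes (a %Y%m%d_%H%M timestamp followed by one rank per column) and keeps the best (lowest) rank seen on each day. The names parse_line and daily_best are hypothetical, the sample rows are made up, and picking the daily minimum is an arbitrary choice; a daily median would work just as well.

```python
import datetime
from collections import OrderedDict


def parse_line(line):
    """Split one tab-separated row into (datetime, [rank or None, ...])."""
    fields = line.rstrip("\n").split("\t")
    stamp = datetime.datetime.strptime(fields[0], "%Y%m%d_%H%M")
    # Anything that is not a plain number (e.g. "None" or "") becomes None
    ranks = [int(f) if f.isdigit() else None for f in fields[1:]]
    return stamp, ranks


def daily_best(lines):
    """Return an ordered mapping of date -> lowest rank per column that day."""
    days = OrderedDict()
    for line in lines:
        if not line.strip():
            continue
        stamp, ranks = parse_line(line)
        best = days.setdefault(stamp.date(), [])
        # Rows may gain columns over time (e.g. when a new book is added)
        while len(best) < len(ranks):
            best.append(None)
        for i, rank in enumerate(ranks):
            if rank is not None and (best[i] is None or rank < best[i]):
                best[i] = rank
    return days


# Made-up sample rows in the same layout as dog_physics_data.txt
sample_rows = [
    "20120222_0900\t5012\t3100\t800\t950\n",
    "20120222_1000\t4980\t3150\t790\t990\n",
    "20120223_0900\t5100\tNone\t805\t1000\n",
]

summary = daily_best(sample_rows)
for day, best in summary.items():
    print("%s\t%s" % (day.isoformat(), "\t".join(str(b) for b in best)))
```

To use it on the real data, pass an open file instead of sample_rows, for example daily_best(open("dog_physics_data.txt")), and write the printed rows to a new file for gnuplot to read.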

Some of the related posts at Uncertain Principles

Downloads

Download combined data table (txt)

Download gnuplot script (physics)

Download gnuplot script (relativity)


Copyright © 2014, Matthew L. Beckler, CC BY-SA 3.0
Last modified: 2013-02-21 04:14:26 PM (EST)