Downloading binary files with Clojure

In addition to the FRED data we downloaded and parsed in the prior tutorial, we now need the commodities data too. It can be found on the USGS homepage.

Let’s begin by scanning the start page for the names of all the Excel files:

(defn get-commodity-xls-files []
  (re-seq #"[a-z0-9-]+\.xls" (slurp "http://minerals.usgs.gov/ds/2005/140/")))
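To see what re-seq pulls out of a page without hitting the network, here is a small sketch run against a made-up HTML fragment. The file names and the xls-names helper are assumptions for illustration only, not the real USGS listing. The distinct call is my addition: a name that shows up more than once in the markup would otherwise be returned more than once.

```clojure
;; Hypothetical HTML fragment; the file names are invented for this example.
(def sample-html
  (str "<a href=\"ds140-alumi.xls\">Aluminum</a> "
       "<a href=\"ds140-coppe.xls\">Copper</a> "
       "<a href=\"ds140-alumi.xls\">Aluminum again</a>"))

(defn xls-names
  "Returns every distinct *.xls file name found in the string html."
  [html]
  (distinct (re-seq #"[a-z0-9-]+\.xls" html)))

(xls-names sample-html)
;; => ("ds140-alumi.xls" "ds140-coppe.xls")
```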

We then feed that list into download-commodities:

(defn download-commodities []
  (doseq [fname (get-commodity-xls-files)]
    (when-not (file-exists (str "commodities-xls/" fname))
      (download-binary (str "commodities-xls/" fname)
                       (str "http://minerals.usgs.gov/ds/2005/140/" fname)))))
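The file-exists helper is not shown in this post. A minimal sketch, assuming it only needs to check whether a path is already on disk, could look like the following. The ensure-dir helper is a hypothetical extra, included because the target directory has to exist before anything can be written into it.

```clojure
(import 'java.io.File)

(defn file-exists
  "Assumed behaviour: true when path names an existing file or directory."
  [^String path]
  (.exists (File. path)))

(defn ensure-dir
  "Hypothetical helper: creates dir and any missing parents.
  Does nothing when the directory is already there."
  [^String dir]
  (.mkdirs (File. dir)))
```

Calling (ensure-dir "commodities-xls") once before download-commodities would make sure the output directory is in place.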

Update: Jürgen Hötzel just emailed me this (I haven’t tried it yet), which will make download-binary much shorter once I implement it:

Clojure 1.2 merged clojure.contrib.io into clojure.java.io, and clojure.java.io also provides a copy multimethod to dispatch on various IO sources:

(use '[clojure.java.io :as io])
(io/copy (input-stream "http://….") (output-stream "/tmp/out.dat"))
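Here is a minimal sketch of how download-binary might look with that suggestion. I haven't run it against the USGS site. The name download-binary-with-copy is made up for this example, and it assumes Clojure 1.2 or later so that clojure.java.io is available. Wrapping both streams in with-open closes them even if the copy fails halfway.

```clojure
(require '[clojure.java.io :as io])

(defn download-binary-with-copy
  "Hypothetical variant of download-binary: streams the bytes at from
  (a URL, a URL string or a file) into the local file to."
  [to from]
  (with-open [in  (io/input-stream from)
              out (io/output-stream (io/file to))]
    ;; io/copy moves the data in chunks, so the whole file
    ;; never has to sit in memory at once.
    (io/copy in out)))
```

The argument order (to first, then from) matches download-binary, so it could be swapped into download-commodities without further changes.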

What is that download-binary, you might ask? As it turns out, when we are dealing with binary files we can’t just slurp and spit things: both work on text and push the bytes through a character encoding, which can mangle binary data. No, we need to get a lot more complicated than that. Nothing can ever be easy in the Java world, otherwise all those enterprise consultants would be out of work.

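(defn download-binary [to from]
  ;; Open both streams in with-open so the input stream is closed too.
  (with-open [in  (io/input-stream (io/as-url from))
              out (FileOutputStream. to)]
    ;; to-byte-array reads the whole stream into memory before we write it.
    (.write out (io/to-byte-array in))))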

There you have it: you first need to import FileOutputStream and require the contrib io library just to download a stupid file. My ns section now looks like this:

(ns fred.core
  (:require
   [clojure.contrib.duck-streams :as ds]
   [clojure.contrib.io :as io]
   [clojure.xml :as xml]
   [clojure.contrib.str-utils2 :as string]
   [clojure.contrib.sql :as sql]
   [incanter.core :as icore]
   [incanter.stats :as istats]
   [incanter.charts :as icharts]
   [incanter.excel :as iexcel])
  (:import [java.io File FileOutputStream]))

