File uploading in RoR

Update: here is the file upload equivalent using form_for:

<% form_for(@bonus_offer, :html => {:multipart => true}) do |f| %>
  <%= f.error_messages %>
  <p>
    <%= f.label :picture %><br />
    <%= f.file_field :picture %>
  </p>
  <p>
    <%= f.submit "Create" %>
  </p>
<% end %>

And in the controller:

def create
  @bonus_offer = BonusOffer.new(params[:bonus_offer])

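  pic = params[:bonus_offer][:picture]

  # Store only the file name in the model; File.basename drops any directory part
  filename = File.basename(pic.original_filename)
  @bonus_offer.picture = filename

  # Write the upload out via read, which works for any IO-like upload object
  File.open(File.join(RAILS_ROOT, "public", "images", filename), "wb") { |f| f.write(pic.read) }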

  respond_to do |format|
    if @bonus_offer.save
      flash[:notice] = 'BonusOffer was successfully created.'
      format.html { redirect_to(@bonus_offer) }
      format.xml  { render :xml => @bonus_offer, :status => :created, :location => @bonus_offer }
    else
      format.html { render :action => "new" }
      format.xml  { render :xml => @bonus_offer.errors, :status => :unprocessable_entity }
    end
  end
end

- - - - - - - - -


Finally we’re getting to the uploading part. Handling uploads (at least small ones) is no biggie in Ruby on Rails. In fact, most of the code in the listing below has to do with inserting the data we’ve uploaded.

As noted in the previous article, we will upload a zip containing a bunch of HTML documents. The document name becomes the title of the inserted article, the document contents become the article body, and the article language is chosen in the upload form (by way of the country code field).

With the help of the chosen language we will retrieve all blogs in that same language and portion out the articles evenly but randomly between them, more on that below. We will also use a predefined set of words (tags and category words), which are likewise retrieved based on the chosen language. The article bodies are searched for these words, the matches are counted, and the words that occur most often become the tags and categories of the future WordPress posts.
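To make the "evenly but randomly" idea concrete before we get to the real controller code, here is a minimal sketch in plain Ruby, outside of Rails. The helper name assign_evenly and the sample data are made up for illustration; plain arrays stand in for the articles and blogs.

```ruby
# Hypothetical helper: spread articles over sites evenly, in random order.
# `articles` and `sites` are plain arrays here; the names are illustrative.
def assign_evenly(articles, sites, rng = Random.new)
  raise ArgumentError, "no sites to assign to" if sites.empty?

  pool = []
  articles.map do |article|
    # Refill the pool once every site has received an article this round.
    pool = sites.dup if pool.empty?
    # Pick a random site and remove it so it cannot be picked again this round.
    site = pool.delete_at(rng.rand(pool.length))
    [article, site]
  end
end

articles = (1..7).map { |i| "article-#{i}" }
sites    = %w[blog-a blog-b blog-c]

assignments = assign_evenly(articles, sites)
per_site = assignments.group_by { |_, site| site }
per_site.sort.each { |site, pairs| puts "#{site}: #{pairs.length}" }
```

Which blog gets which article changes from run to run, but no blog ever ends up more than one article ahead of any other.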

Let’s look at the upload form first:

<h1>Upload ZIP with articles</h1>

<% form_tag('upload', :multipart => true) do %>
  <p>
    File:<br />
    <%= file_field_tag 'zip_file' %><br />
  </p>
  <p>
    Country Code:<br />
   <%= text_field_tag 'country_code' %><br />
  </p>
  <p>
    <%= submit_tag "Upload" %>
  </p>
<% end %>

<%= link_to 'Back', pages_path %>

Not much to add here; note the :multipart => true option and the file_field_tag.

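  # Returns the (at most) ltresh most frequent names from els as a comma-separated string
  def get_cats_or_tags(els, filtered, ltresh)
    tmp = els.collect{ |el|
      str = el.name.downcase
      # Escape the name so characters like "+" or "." are matched literally
      cnt = filtered.chars.scan(Regexp.new(Regexp.escape(str))).length
      [cnt, str]
    }.find_all{|el| el[0] != 0 }

    cut_length = tmp.length > ltresh ? ltresh : tmp.length
    tmp.sort.slice(-cut_length, cut_length).collect{|el| el[1] }.join(",")
  end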

  def upload
    require 'find'
    require 'fileutils'

    @msg = ""

    if params[:country_code].to_s.length == 2
      zipf = params[:zip_file]
      country_code = params[:country_code]
      
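      directory = "/tmp/uploads"
      # Use FileUtils instead of shelling out, so the paths never pass through a shell
      FileUtils.rm_rf(directory)
      FileUtils.mkdir_p(directory)

      # File.basename drops any directory part from the submitted file name
      path = File.join(directory, File.basename(zipf.original_filename))
      File.open(path, "wb") { |f| f.write(zipf.read) }

      # The multi-argument form of system also bypasses the shell
      system("unzip", path, "-d", directory)
      FileUtils.rm(path)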

      # Bind country_code as a parameter rather than interpolating it into the SQL
      sql_cond = ["country_code = ? OR country_code = ''", country_code]
      all_tags = WpTag.find(:all, :conditions => sql_cond)
      all_cats = WpCategory.find(:all, :conditions => sql_cond)
      all_sites = WpSite.find(:all, :conditions => ["country_code = ?", country_code])
      cur_sites = all_sites.clone

      Find.find( directory ) do |fpath|
        if FileTest.file?( fpath )

          if cur_sites.empty?
            cur_sites = all_sites.clone
          end

          file_name = File.basename(fpath).chars.gsub(/\.\w+$/, '').gsub(/[_-]/, ' ').capitalize
          content = File.read(fpath)
          filtered = content.chars.downcase

          tags = self.get_cats_or_tags(all_tags, filtered, 5)
          cats = self.get_cats_or_tags(all_cats, filtered, 2)

          cur_site = cur_sites.delete_at(rand(cur_sites.length))

          article = WpArticle.new(
            :content      => content.chars.tidy_bytes,
            :country_code => country_code,
            :subject      => file_name.chars.tidy_bytes,
            :categories   => cats,
            :tags         => tags,
            :wp_site_id   => cur_site.id)

          article.save
          
          @msg += "Article: <b>#{article.subject}</b> with tags: <b>#{article.tags}</b> and categories: <b>#{article.categories}</b> has been assigned to site: <b>#{cur_site.name}</b><br>"
        end
      end
    else
      @msg = "Country Code missing."
    end
  end

Update: Note that I’m using str.chars here and there, for instance when creating a new article with tidy_bytes. This is to make sure that I don’t have any issues with UTF-8 characters.

Big method here, no doubt purists would cry “Shorten it damn you!”. I would, I really would, if parts of it were used elsewhere…

So this is the method that gets run after the form has been submitted.

1.) First of all, note that we require find and fileutils. That might or might not be overkill, but I’ve become accustomed to them after creating that file renamer thing.

2.) If we have a proper country code we proceed with assigning the contents of the zip_file parameter to zipf. We’re working with an ActionController::UploadedFile here, and its original_filename method gives us the name of the uploaded file.

3.) We then copy the file to /tmp/uploads, unpack it there and delete the zip afterwards; end of story for the upload itself. We now have a lot of HTML documents in /tmp/uploads.

4.) We get all tags, categories and blogs associated with the selected country code by way of Active Record’s find.

5.) The file reading can begin by looping through /tmp/uploads. Note that we begin with a check to see if cur_sites is empty. The logic here is that if we left the article-to-blog assignment completely up to chance, we might end up with blog A getting 10 articles and blog B getting none. That won’t do, so we assign one article to a random blog in cur_sites, after which we delete that blog from the cur_sites array. When there are no blogs left in cur_sites we populate it with all blogs again, which allows for an even (but random) spread.

6.) Next we strip the file name of its extension, replace all underscores and hyphens with spaces and capitalize the result to create the article title.

7.) Next we downcase the whole article body and pass it to get_cats_or_tags, which loops through the predefined words. Note how we count the occurrences of each word using a regex with scan; don’t try the count string method. I made that mistake before reading up on it more closely: count treats its argument as a set of characters and counts those, which is not what we want here. We finish off by getting rid of all words that don’t occur in the content and by making sure we don’t try to slice by a bigger length than the array we’re slicing.

8.) Finally we create a new article and save it + output some feedback to the SEO guy.
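If you want to play with steps 6 and 7 without a Rails app around them, something along these lines should do. It is only a sketch: title_from and pick_words are names I made up for illustration, plain strings stand in for the tag and category records, and the sample text is invented.

```ruby
# Turn a file name into a title: drop the extension, replace "_" and "-"
# with spaces, then capitalize (first letter up, everything else down).
def title_from(path)
  File.basename(path).sub(/\.\w+\z/, "").gsub(/[_-]/, " ").capitalize
end

# Count how often each candidate word occurs in the text and return the
# `limit` most frequent ones, dropping words that never occur.
def pick_words(words, text, limit)
  haystack = text.downcase
  counted = words.map do |word|
    needle = word.downcase
    # scan with an escaped pattern counts literal substring occurrences.
    [haystack.scan(Regexp.new(Regexp.escape(needle))).length, needle]
  end
  counted.reject { |count, _| count.zero? }
         .sort
         .last(limit)
         .map { |_, word| word }
end

body = "Poker tips: poker odds, casino bonus and more poker."

puts title_from("/tmp/uploads/my_first-article.html")
puts pick_words(%w[poker casino bingo], body, 2).inspect

# String#count treats its argument as a set of characters, so this prints
# how many p, o, k, e and r characters there are, not the "poker" matches.
puts body.count("poker")
puts body.scan(/poker/i).length
```

Like the original method, pick_words returns the words sorted from least to most frequent, so the most common word ends up last.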
