So, the web site for Informl, the open-source wiki project my buddy David and I have been working on, is now being crawled by Google. Looking at the logs, it was pretty silly to see that Googlebot is crawling, indexing and caching the page edit forms (all the same content as the wiki pages, but wrapped in annoying HTML form textareas).

So, we wanted to keep Googlebot from returning the page edit forms for search queries. We looked at using robots.txt to exclude these pages. For Google this can work, but only because Googlebot supports extensions to the standard robots.txt rules.

The URLs we wanted to exclude look like:

http://informl.folklogic.net/pages/1234;edit

It’s the part on the end – ;edit – that indicates that the URL is for editing the page. Normally, robots.txt only pays attention to the initial part of the URL to make exclusion decisions, but Googlebot allows wild cards, so we put the following in the file:

User-Agent: Googlebot
Disallow: /*;edit

We used the Google Webmaster Tools to see how Googlebot would interpret the URLs in relation to the above rule, which appears to do the trick.
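
To make the wildcard behaviour a bit more concrete, here is a minimal Ruby sketch of how a crawler that supports * might test a URL path against a Disallow pattern. The method name robots_pattern_match? is made up for illustration, and this is only my reading of the documented behaviour, not Googlebot's actual code:

```ruby
# Hypothetical helper: test a URL path against a robots.txt Disallow
# pattern, treating '*' as "any run of characters". A pattern without
# '*' stays a plain prefix match, which is the standard behaviour.
def robots_pattern_match?(pattern, path)
  # Escape each literal piece, then glue the pieces together with ".*"
  regex_src = pattern.split("*", -1).map { |part| Regexp.escape(part) }.join(".*")
  Regexp.new("\\A" + regex_src).match?(path)
end

puts robots_pattern_match?("/*;edit", "/pages/1234;edit")  # the edit form
puts robots_pattern_match?("/*;edit", "/pages/1234")       # the normal page
```

A crawler that only does prefix matching would read /*;edit literally, and no real URL starts with those characters, which is why the rule does nothing for it.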

Now, we started thinking about other search engines (Yahoo, Microsoft, etc.) and realized that they wouldn’t generally handle the wildcards that Googlebot will. So, searching for another solution we found the robot meta tags.

So we wanted to add the following to our edit HTML pages:

<meta name="robots" content="noindex,nofollow">

between the start and end head tags.

Now, pretty much all the HTML content in the project uses the same Rails layout. The layout contains the head tags, and we wanted to add content to that part of the HTML, but only on the edit pages.

We came up with the following to do this. First, in the layout we added:

<%= @extra_header_content %>

This is pretty safe, because if @extra_header_content is undefined, the line will simply be blank.

Next, we added the following to the edit view:

<%- @extra_header_content = """
  <meta name="robots" content="noindex,nofollow">    
""" -%>

The important thing to note here is that the view is evaluated before the layout is applied, so an instance variable set in the view is available by the time the layout is rendered.
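
The evaluation order can be tried outside Rails with plain ERB. This is only a sketch of the idea: render_page, EDIT_VIEW and LAYOUT_SRC are names invented for the example, and a bare Object stands in for whatever context Rails really uses to render views and layouts:

```ruby
require "erb"

# Stand-in layout: reads an instance variable that a view may or may not set.
LAYOUT_SRC = <<~'LAYOUT'
  <html><head><%= @extra_header_content %></head>
  <body><%= @content %></body></html>
LAYOUT

# Stand-in edit view: sets the instance variable as a side effect.
EDIT_VIEW = <<~'VIEW'
  <% @extra_header_content = '<meta name="robots" content="noindex,nofollow">' %>
  <form>...edit form...</form>
VIEW

# Render the view first, then the layout, against the same object so
# that both templates share instance variables.
def render_page(view_src)
  context = Object.new
  shared  = context.instance_eval { binding }
  context.instance_variable_set(:@content, ERB.new(view_src).result(shared))
  ERB.new(LAYOUT_SRC).result(shared)
end

puts render_page(EDIT_VIEW)            # head contains the robots meta tag
puts render_page("<p>wiki page</p>")   # head stays empty
```

If the layout were rendered first, the variable would still be nil when the head was written and the tag would never appear.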

So, that’s how we got our edit URLs to be ignored by (most) web crawlers using robots.txt and the <meta name="robots" content="noindex,nofollow"> tag, and how we got the meta tag into the HTML header using Rails views and layouts.


So I’ve moved from Grenoble to Aix en Provence (that’s in France for those not following my every move). What a great town! It’s taken a while to get myself settled in with an apartment and such, but I’m still smiling ear-to-ear every time I walk out the door of my building smack-dab in the historic center of town. The weather is great and there are lots of great restaurants and shops. The French provincial countryside surrounds Aix, the Mediterranean is 25 minutes by car and downtown Marseilles is about the same by train.

Having sung the praises of Aix, I have to admit that, being here on my own, I hardly know anyone in town. To try and jump-start things, I’ve started an Expat Meetup for Aix. It will be great to get together with people in a similar situation and chew the fat. So if you live near Aix, please check out the meetup on the 16th of June!

I’m currently vexed about where to park my car. Parking in the city lots is secure, well located and not badly priced if you are just coming into town for the day (€12 per day), but pretty silly for long-term parking. I’m looking for a spot, but they are not easy to find. In fact, I’m off to talk to an agent about one this morning. Wish me luck.

I wanted to take a hash with a bunch of arrays, strings and other hashes and convert all the “word”-like strings in all of them to Ruby symbols. I started to write a gory method that took an object and checked its class, but then I thought better of it, extended the base classes and used the #collect() method to pull everything together.

So, at first I created a method like the following:

def symify(obj)
  case obj
  when Hash
    o2 = {}
    obj.each {|k,v| o2[symify(k)] = symify(v)}
    o2    # return the converted copy, not the original hash
  when Array
    o2 = []
    obj.each {|v| o2.push(symify(v))}
    o2
  when String
    obj =~ /\A\w+\z/ ? obj.to_sym : obj
  else obj
  end
end

I was actually happy to find that the Ruby case statement handled the class comparison “out of the box” (it compares with ===, which for a class tests whether the object is an instance of it), but, really, this code is pretty ugly, since the class-specific behaviour of several different classes is all collected together in one method.

One way to deal with this is to sub-class String, Array and Hash, but there are a few problems with that:

  • since these are pretty basic types, you can’t use natural syntax for the data you want to create (lots of stuff like my_string = MyString.new("mystring") instead of my_string = "mystring"); and
  • there’s no really clean way of dealing with the default cases for all the other object classes.

Ruby’s ability to extend existing classes – even the basics like String and Array – can simplify this kind of thing a lot. You can even extend the base class Object to deal with the default case.

So, here’s my final code for my #symify() method (now implemented on individual objects):

#!/usr/bin/env ruby

class Object
  def symify; self; end
end

class String
  def symify; match(/\A\w+\z/) ? to_sym : self; end
end

class Hash
  def symify
    inject({}) do |obj, pair|
      k, v = pair   # each Hash element arrives as a [key, value] array
      obj[k.symify] = v.symify
      obj
    end
  end
end

class Array
  def symify; map { |v| v.symify }; end
end

Here’s some sample input and output:

"word".symify                   # -> :word
    
['a', 'b', 'c', 'x x'].symify   # -> [:a, :b, :c, 'x x']
      
{'a' => 'x', 'b' => 'y', 'c' => 'a b', 'x x' => 'c'}.symify
                                # -> {:a => :x, :b => :y, :c => 'a b', 'x x' => :c}
{'a' => 'x', 'b' => [ 'c', 'd', 'x x']}.symify
                                # ->{:a => :x, :b => [ :c, :d, 'x x']}
['a' , { 'b' => 'c', 'd' => 'x x'}].symify
                                # -> [:a , { :b => :c, :d => 'x x'}]

I had to fiddle quite a bit with the Hash#inject() method to figure out how it worked. It wasn’t immediately obvious to me that the key/value pair would be supplied as an array. In any case, the code here works and demonstrates how inject works with a Hash.
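
For anyone else puzzling over the same thing, here is a small standalone sketch of what inject hands to the block when the receiver is a Hash. The variable names are just for illustration:

```ruby
# Each element of a Hash is yielded as a two-element [key, value] array.
pairs = { "a" => 1, "b" => 2 }.inject([]) { |seen, pair| seen << pair }
p pairs

# Destructuring the pair in the block parameters avoids unpacking it by hand.
doubled = { "a" => 1, "b" => 2 }.inject({}) do |acc, (key, value)|
  acc[key] = value * 2
  acc   # the block's return value becomes the accumulator for the next pair
end
p doubled
```

Forgetting that last bare acc is the classic mistake: the assignment returns the value, not the hash, so the next iteration would receive an Integer as its accumulator.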

So, I ran into a situation in Ruby where I needed to expand tabs in a string and replace them with spaces. I rolled my own, but found that I really just needed to improve my googling skills and to RTFM.

Now I didn’t want to simply replace each tab character with 4 or 8 spaces, but instead to replace tabs with enough spaces to put the content following it on the next tab-stop column.
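
The arithmetic behind “the next tab-stop column” is small enough to show on its own. This is a sketch with a made-up helper name, spaces_to_next_stop, counting columns from zero:

```ruby
# Number of spaces a tab expands to when the cursor sits at zero-based
# column `col` and tab stops fall every `width` columns.
def spaces_to_next_stop(col, width = 4)
  width - (col % width)
end

(0..4).each do |col|
  puts "tab at column #{col} -> #{spaces_to_next_stop(col)} spaces"
end
```

A tab that already sits on a tab stop still moves a full width forward, so the result is never zero.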

I googled around and found some stuff on the web 1,2, but it didn’t really do what I wanted. So I wrote my own:

class String
  def expand(width=4)

    tr = self
    tc = ""
    while 0 != tr.length
      case tr
      when /\A([^\t\n]{#{width}})/
        tc += $1
        tr = $'
      when /\A([^\t\n]{0,#{width-1}})\t/
        tc += $1 + " "*(width-$1.length)
        tr = $'
      when /\A([^\t\n]{0,#{width-1}}\n)/
        tc += $1
        tr = $'
      else
        tc += tr
        tr = ""
      end
    end

    tc
  end
end

You’ll notice that I wrote this as an extension of the String class; so, you can do stuff like:

no_tab_string = "x\txx".expand      # -> "x   xx"
# ^^^^^^^0^^^^^^^^^0^^^^^^^^^0^^^^^^^^^0^^^^^^^^^0
#        1         2         3         4         5

I was pretty satisfied with this, but I did some additional googling for better regular expression methods, and lo and behold, I found the Ruby FAQ, which has the following suggestions for tab expansion:

  1 while a.sub!(/(^[^\t]*)\t(\t*)/){$1+' '*(8-$1.size%8+8*$2.size)}
#or
  1 while a.sub!(/\t(\t*)/){' '*(8-$~.begin(0)%8+8*$1.size)}
#or
  a.gsub!(/([^\t]{8})|([^\t]*)\t/n){[$+].pack("A8")}

Well, I guess I should have RTFM. All those are much better than what I came up with.
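
The trick in the third version is Array#pack with the "A" directive, which pads a string with spaces up to the given width; $+ is simply the last group that matched. A quick sketch of the padding on its own, using a helper name (pad_to_stop) that I made up for the example:

```ruby
# Pad (or truncate) text to exactly `width` characters using pack's
# "A" directive, which fills with spaces.
def pad_to_stop(text, width = 8)
  [text].pack("A#{width}")
end

p pad_to_stop("ab")        # short text is padded out to the tab stop
p pad_to_stop("abcdefgh")  # a full cell passes through unchanged
p pad_to_stop("")          # a tab at the start of a cell becomes all spaces
```

Since the regex only ever captures at most one cell's worth of characters, the truncating side of "A" never comes into play there.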

I particularly liked the 3rd one. I did have to make a minor change to it to make it work the way I wanted – I needed a way to handle tabs across multiple lines:

class String
  def expand(width=4)
    gsub(/(\n?)(?:([^\t\n]{#{width}})|([^\t\n]*)\t)/n) \
      {$1+[$+].pack("A#{width}")}
  end
end

So, through this exercise I learned a little about Ruby, but more about learning from others…

I’ve been working on a wiki parsing engine in Ruby for a while, and the heart of the engine uses a set of cascading regular expressions (REs) to parse the input and transform it to HTML. I’ve been mulling over better ways to manage the overall parsing task based on the expressions and have been envying the regular expression power of Perl et al. I was guessing that other Ruby hackers were in the same boat, so I started to look around for what was available beyond the base regex options in Ruby 1.8. Well, there are several, but using them for a published project wouldn’t be the easiest thing – at least not for a while.

The Ruby 1.8 regex-based REs are pretty good for what they do, and they handle most cases you generally run into in “casual” use, but they really pale in comparison to what’s available in Perl, Python and PHP. Perl has its own RE implementation (which has changed quite a bit between v5 and v6), which has been the driver for much of the RE feature advances in other languages and libraries. PHP, for instance, uses the pcre library, which provides Perl compatible REs (PCRE stands for “Perl [5] Compatible Regular Expressions”), while Python ships its own engine with a similar Perl-style feature set.

So, what are the alternatives for Ruby? Well, the ultimate target is the use of the Oniguruma library, which handles the generally desired Perl style features (named captures, lookbehind, etc. ), and it will be available in Ruby 2.0. (It’s available now in 1.9, but that’s not a “real” release FWICT.) If you don’t want to wait for 2.0 here are some other options:

If you need the features, one of these looks like the right way to go. I’ve currently coded around the need for the more advanced RE features, but I will probably want them in the near future in order to make my parser work efficiently.

So, I’ll try and hold out until some reasonable release of Ruby 2.0 comes along, and monitor to see if a reasonable 1.8 release comes out with Oniguruma, but otherwise I’ll probably use one of these two options.

A while ago my PowerBook started spacing out for minutes at a time while the rainbow cursor spun and I was left staring and feeling my blood pressure rise. After a while I realized that the problem only happened when I had iCal running. So for a while I just tried to avoid running iCal, but that wasn’t really working since I have my whole life in there. So, I’d start it – wait – check a date – wait – add an event – wait – switch to Thunderbird – wait – switch back to iCal – wait – quit iCal – wait – you get the idea.

In the meantime, I did notice that in /var/log/system.log I had the following enigmatic error message:

    doublewide kernel[0]: disk0s3: I/O error.

That scared me a bit. I started to worry that there was something wrong with my disk, which initiated a round of Disk Utility checks, but nothing turned up. Once I got over the initial fright of a fried hard-disk, I started to get very frustrated with the lack of information associated with the error. After all, it really gives no help at all in figuring out what the source of the problem is. What process/application caused it? What file was being accessed when the problem occurred?

So, finally I figured “I’m a programmer and I should be able to figure this out”, so I broke down and pulled out the big guns with ktrace.

I used the following:

    % ps auxww | grep '[i]Cal ' | awk '{print $2}'
    2592
    % ktrace -p 2592
    % kdump -l

or some variation thereof. I’d see lots of file errors, but there didn’t seem to be much correlation between the errors and the I/O error entries in the system.log file, nor with the stalls. Thrash, thrash, thrash, pull out hair, thrash. At some point I looked at the man page for ktrace and found the -i flag, which passes the tracing on to child processes so their activity shows up as well. Once I did that, I started to see the SyncServer process show up. Thrash, grep, thrash, scan voluminous kdump output, grep, thrash….

I finally started to see a pattern emerge. Most of the time, but not all the time, when the stalls occurred and the I/O error was reported in the system log, I’d see that there was an error in the kdump like:

    2311 SyncServer RET read -1 errno 5 Input/output error

It took me a while to figure out what SyncServer was doing when this happened, but scanning backward in the trace file, I saw the file descriptor of the file being accessed:

    2311 SyncServer CALL read(0xa,0x197fc30,0x1000)

and then I scanned back further and found the name of the file associated with the fd, and it was:

    2311 SyncServer NAMI  "/Users/larry/Library/Application Support/SyncServices/Local/.dat0907.004"
    2311 SyncServer RET   open 10/0xa

When I looked into the offending file, it appeared to be a database file, so I simply moved it aside so SyncServer wouldn’t find it and rebooted. When I did that all was good with the world. It was like having a new Mac.
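
In hindsight, the hunt through the trace could have been scripted. Here is a rough Ruby sketch that matches failed reads to file names. It assumes the kdump line shapes shown above (NAMI, RET open, CALL read, RET read), works forward through the trace rather than backward as I did by hand, and the helper name failed_read_paths is made up:

```ruby
# Hypothetical helper: scan kdump-style lines and report which file each
# read() that failed with errno 5 was operating on. It remembers the last
# NAMI path per process and the fd that the following open() returned.
def failed_read_paths(lines)
  last_path = {}   # pid => most recent NAMI path
  fd_path   = {}   # [pid, fd] => path the fd was opened on
  last_fd   = {}   # pid => fd passed to the most recent read()
  failures  = []

  lines.each do |line|
    pid = line[/\A\s*(\d+)/, 1]
    case line
    when /NAMI\s+"(.*)"/
      last_path[pid] = $1
    when /RET\s+open\s+(\d+)/          # failed opens return -1 and are skipped
      fd_path[[pid, $1.to_i]] = last_path[pid]
    when /CALL\s+read\((0x\h+)/
      last_fd[pid] = $1.hex
    when /RET\s+read\s+-1\s+errno\s+5\b/
      failures << fd_path[[pid, last_fd[pid]]]
    end
  end
  failures
end

trace = [
  '2311 SyncServer NAMI  "/Users/larry/Library/Application Support/SyncServices/Local/.dat0907.004"',
  '2311 SyncServer RET   open 10/0xa',
  '2311 SyncServer CALL  read(0xa,0x197fc30,0x1000)',
  '2311 SyncServer RET   read -1 errno 5 Input/output error',
]
puts failed_read_paths(trace)
```

Real kdump output has many more record types and interleaved processes, so treat this as a starting point rather than a finished tool.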

I can now use iCal without cringing and it seems like the overall performance of my laptop is better (but I suspect that is just an overgeneralisation from my elation at finding the problem).

So, that’s how I saved myself from computicide and switched back to the love side of the love/hate relationship with my PowerBook.

Now if I could just fix the one broken pixel on my 17” display…

My big toe's wet...

May 3rd, 2006

Starting out, I’ll give you some idea what *my* blog is all about. In the end it’s about me, right? In any case, I’ll try to stay away from navel pondering, and, instead, I’ll talk about the following:

  • design: both on the Internet and in the world at large. Can’t say I’m an expert, but I’m certainly interested.
  • photography: again I’m an amateur, but I try to keep my lens on things now and then.
  • telecoms: I’ve been working in the business as a programmer for quite a while and there are things I like and things that bug me.
  • programming: it’s a craft to me like carpentry or pottery are to other people. I’m always trying to get better.
  • start-ups: I’m in one now and about to start another one. I’ve got a particular slant being an American in France. I’ll be looking forward to your comments and feedback.

And here we go…