Posted by Larry Baltz
Thu, 24 May 2007 18:14:00 GMT
So, the web site for the open source wiki project my buddy, David, and I have been working on, Informl, is now being crawled by Google. Looking at the logs, it was pretty silly to see that Googlebot is crawling, indexing and caching the page edit forms (all the same content as the wiki pages, but wrapped in annoying HTML form textareas).
So, we wanted to keep Googlebot from returning the page edit forms for search queries. We looked at using robots.txt to exclude these pages. For Google this can work, but only because Googlebot supports an extension to the standard robots.txt syntax.
The URLs we wanted to exclude look like:
http://informl.folklogic.net/pages/1234;edit
It’s the part on the end – ;edit – that indicates that the URL is for editing the page. Normally, robots.txt only pays attention to the initial part of the URL to make exclusion decisions, but Googlebot allows wild cards, so we put the following in the file:
User-Agent: Googlebot
Disallow: /*;edit
We used the Google Webmaster Tools to see how Googlebot would interpret the URLs in relation to the above rule, which appears to do the trick.
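To make the difference between plain prefix matching and the wildcard rule concrete, here is a minimal sketch in Ruby. It only approximates the wildcard behavior described above (`*` matches any run of characters, and a trailing `$` anchors the end); the method names are made up for illustration and this is not how Googlebot itself is implemented.

```ruby
# Prefix-only interpretation: the Disallow value is compared literally
# against the start of the URL path.
def prefix_blocked?(path, rule)
  path.start_with?(rule)
end

# Wildcard interpretation (an approximation): "*" matches any sequence of
# characters and a trailing "$" anchors the match at the end of the path.
def wildcard_regexp(rule)
  anchored = rule.end_with?("$")
  body = rule.chomp("$").split("*", -1).map { |part| Regexp.escape(part) }.join(".*")
  Regexp.new("\\A" + body + (anchored ? "\\z" : ""))
end

def wildcard_blocked?(path, rule)
  !!(path =~ wildcard_regexp(rule))
end

rule = "/*;edit"
puts prefix_blocked?("/pages/1234;edit", rule)   # literal prefix never matches
puts wildcard_blocked?("/pages/1234;edit", rule) # edit URL is excluded
puts wildcard_blocked?("/pages/1234", rule)      # normal page stays crawlable
```

A crawler that only does prefix matching sees `/*;edit` as a literal path prefix, so the rule never applies to the edit URLs.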
Now, we started thinking about other search engines (Yahoo, Microsoft, etc.) and realized that they wouldn’t generally handle the wildcards that Googlebot will. So, searching for another solution, we found the robots meta tag.
So we wanted to add the following to our edit HTML pages:
<meta name="robots" content="noindex,nofollow">
between the start and end head tags.
Now, pretty much all the HTML content on the project uses the same Rails layout, and the layout contains the head tags. We wanted to add content to that part of the HTML, but only on the edit pages.
We came up with the following to do this. First, in the layout we added:
<%= @extra_header_content %>
This is pretty safe, because if @extra_header_content is undefined, the line will simply be blank.
Next, we added the following to the edit view:
<%- @extra_header_content = '<meta name="robots" content="noindex,nofollow">' -%>
The important thing to note here is that the view is evaluated before the layout is applied, so an instance variable set in the view is available when the layout is rendered.
So, that’s how we got our edit URLs to be ignored by (most) web crawlers using robots.txt and the <meta name="robots" content="noindex,nofollow"> tag, and how we got the meta tag into the HTML header using Rails views and layouts.
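The ordering can be illustrated outside of Rails with a small sketch that uses only Ruby's standard ERB library. The `PageContext` class, the `render_page` helper and the `@body` variable are made up for this example; they stand in for what Rails does internally and are not its actual API.

```ruby
require "erb"

# Hypothetical stand-in for the object that templates are rendered against.
# Instance variables set by one template are visible to the next.
class PageContext
  def render(template)
    ERB.new(template, trim_mode: "-").result(binding)
  end
end

EDIT_VIEW = <<~'VIEW'
  <%- @extra_header_content = '<meta name="robots" content="noindex,nofollow">' -%>
  <form>edit form</form>
VIEW

SHOW_VIEW = <<~'VIEW'
  <p>page content</p>
VIEW

LAYOUT = <<~'VIEW'
  <html><head><%= @extra_header_content %></head><body><%= @body %></body></html>
VIEW

# Render the view first, then the layout, sharing one context object.
def render_page(view, layout)
  context = PageContext.new
  body = context.render(view) # the view may set @extra_header_content
  context.instance_variable_set(:@body, body)
  context.render(layout)      # the layout reads whatever the view set
end

puts render_page(EDIT_VIEW, LAYOUT) # head contains the robots meta tag
puts render_page(SHOW_VIEW, LAYOUT) # head stays empty
```

Because the variable is simply nil on pages that never set it, the layout line renders as an empty string there.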
no comments | no trackbacks
Posted by Larry Baltz
Wed, 23 May 2007 07:42:00 GMT
So I’ve moved from Grenoble to Aix en Provence (that’s in France for those not following my every move). What a great town! It’s taken a while to get myself settled in with an apartment and such, but I’m still smiling ear-to-ear every time I walk out the door of my building smack-dab in the historic center of town. The weather is great and there are lots of great restaurants and shops. The French provincial countryside surrounds Aix, the Mediterranean is 25 min. by car and downtown Marseilles is about the same by train.
Having sung the praises of Aix, I have to admit that, being on my own, I hardly know anyone in town. To try and jump-start things, I’ve started an Expat Meetup for Aix. It will be great to get together with people in a similar situation and chew the fat. So if you live near Aix, please check out the meetup on the 16th of June!
I’m currently vexed about where to park my car. Parking in the city lots is secure, well located and not badly priced if you are just coming into town for the day (€12 per day), but pretty silly for long-term parking. I’m looking for a spot, but they are not easy to find. In fact, I’m off to talk to an agent about one this morning. Wish me luck.
no comments | no trackbacks
Posted by Larry Baltz
Sat, 17 Jun 2006 21:44:00 GMT
I wanted to take a hash with a bunch of arrays, strings and other hashes and convert all the “word”-like strings in all of them to Ruby symbols. I started to write a gory method that took an object and checked its class, but then I thought better of it, extended the base classes and used the #collect() method to pull everything together.
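As a rough sketch of that idea (not the code from the full post): each base class gets its own version of a method, here given the made-up name `symbolize_words`, and the containers use `collect` to recurse. It assumes a “word”-like string is one made up only of `\w` characters, and that hash keys are converted as well as values.

```ruby
class Object
  # Anything that is not a string or a container is returned unchanged.
  def symbolize_words
    self
  end
end

class String
  # Assumption: "word"-like means only letters, digits and underscores.
  def symbolize_words
    match?(/\A\w+\z/) ? to_sym : self
  end
end

class Array
  def symbolize_words
    collect { |item| item.symbolize_words }
  end
end

class Hash
  # Assumption: both keys and values are converted.
  def symbolize_words
    collect { |key, value| [key.symbolize_words, value.symbolize_words] }.to_h
  end
end

data = { "name" => "two words", "tags" => ["ruby", "wiki engine", 3] }
p data.symbolize_words
```

Reopening the core classes keeps the class checks out of the calling code, at the cost of adding a method to every object in the program.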
Read more...
Posted in programming, ruby | no comments | no trackbacks
Posted by Larry Baltz
Sat, 13 May 2006 10:18:00 GMT
I’ve been working on a wiki parsing engine in Ruby for a while, and the heart of the engine uses a set of cascading regular expressions (RE) to parse the input and transform it to HTML. I’ve been mulling over better ways to manage the overall parsing task based on the expressions and have been envying the regular expression power of Perl et al. I was guessing that other Ruby hackers were in the same boat, so I started to look around for what was available beyond the base regex options in Ruby 1.8. Well, there are several, but using them for a published project wouldn’t be the easiest thing, at least not for a while.
Read more...
Posted in programming, ruby | no comments | no trackbacks
Posted by Larry Baltz
Wed, 10 May 2006 12:02:00 GMT
So, I ran into a situation in Ruby where I needed to expand tabs in a string and replace them with spaces. I rolled my own, but found that I really just needed to improve my googling skills and to RTFM.
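For reference, one common way to do this in plain Ruby is a single `gsub` with a block. This is a sketch only, not necessarily the approach from the full post; the method name `expand_tabs` and the default tab stop of 8 are assumptions.

```ruby
# Replace each tab with enough spaces to reach the next tab stop.
# The regexp captures the text since the previous tab or line start,
# so the padding depends on how far into the current stop we are.
def expand_tabs(text, tab_stop = 8)
  text.gsub(/([^\t\n]*)\t/) do
    $1 + " " * (tab_stop - ($1.size % tab_stop))
  end
end

puts expand_tabs("name\tvalue")
puts expand_tabs("a\tb", 4)
```

Padding to the next tab stop, rather than substituting a fixed number of spaces, keeps columns aligned the way they look in an editor.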
Read more...
Posted in programming, ruby | no comments | no trackbacks
Posted by Larry Baltz
Fri, 05 May 2006 19:54:00 GMT
A while ago my PowerBook started spacing out for minutes at a time while the rainbow cursor spun and I was left staring and feeling my blood pressure rise. After a while I realized that the problem only happened when I had iCal running. So for a while I just tried to avoid running iCal, but that wasn’t really working since I have my whole life in there. So, I’d start it – wait – check a date – wait – add an event – wait – switch to Thunderbird – wait – switch back to iCal – wait – quit iCal – wait – you get the idea.
Read more...
Posted in osx | no comments | no trackbacks
Posted by Larry Baltz
Wed, 03 May 2006 18:18:00 GMT
Starting out, I’ll give you some idea what my blog is all about. In the end it’s about me, right? In any case, I’ll try to stay away from navel pondering, and, instead, I’ll talk about the following:
- design: both on the Internet and in the world at large. Can’t say I’m an expert, but I’m certainly interested.
- photography: again I’m an amateur, but I try to keep my lens on things now and then.
- telecoms: I’ve been working in the business as a programmer for quite a while and there are things I like and things that bug me.
- programming: it’s a craft to me like carpentry or pottery are to other people. I’m always trying to get better.
- start-ups: I’m in one now and about to start another one. I’ve got a particular slant being an American in France.
I’ll be looking forward to your comments and feedback.
And here we go…
Posted in general, non-technical | 1 comment | no trackbacks