Parsing Google Data XML with Nokogiri
I recently started working on a project that needs to consume Google's Shared Contacts API.
I decided to use Nokogiri to parse the XML feeds, but I ran into a perplexing problem when using #xpath to retrieve specific elements from the XML document. I wanted to retrieve all of the entry elements (there were five in the sample document) under the feed element. Searching for //feed/entry with #xpath failed, but searching for feed > entry with #css worked.
require "nokogiri"
doc = Nokogiri::XML(File.read("feed.xml")) # => #<Nokogiri::XML::Document:0x...>
doc.xpath('//feed/entry').size # => 0
doc.css('feed > entry').size # => 5
While experimenting to figure out the problem, I noticed that not all XPath searches failed. For example, searching for email addresses within the contact feed using //gd:email returned the correct number of elements. A bit of googling turned up a relevant question on Stack Overflow.
Commenter Pesto pointed out that, when using #xpath, you must qualify element names with their XML namespace prefix, i.e., //xmlns:feed/xmlns:entry. Nokogiri registers the document's default namespace under the prefix xmlns.
doc.xpath('//xmlns:feed/xmlns:entry').size # => 5
I didn't catch on at the time, but that's why //gd:email worked: it already included a namespace prefix.