class Loofah::Scrubber

A Scrubber wraps up a block (or method) that is run on an HTML node (element):

# change all <span> tags to <div> tags
span2div = Loofah::Scrubber.new do |node|
  node.name = "div" if node.name == "span"
end

Alternatively, this scrubber could have been implemented as:

class Span2Div < Loofah::Scrubber
  def scrub(node)
    node.name = "div" if node.name == "span"
  end
end
span2div = Span2Div.new

This can then be run on a document:

Loofah.fragment("<span>foo</span><p>bar</p>").scrub!(span2div).to_s
# => "<div>foo</div><p>bar</p>"

Scrubbers can be run on a document in either a top-down traversal (the default) or bottom-up. Top-down scrubbers can optionally return Scrubber::STOP to terminate the traversal of a subtree.
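The effect of returning STOP from a top-down scrubber can be sketched without Loofah itself. The example below is only an illustration: ToyNode, TOY_STOP, TOY_CONTINUE and walk_top_down are hypothetical stand-ins (not Loofah or Nokogiri names) that imitate the pruning behaviour described above.

```ruby
# Stand-in for a parsed node; not part of Loofah or Nokogiri.
ToyNode = Struct.new(:name, :children)

# Hypothetical sentinels playing the roles of Scrubber::STOP and Scrubber::CONTINUE.
TOY_STOP = Object.new.freeze
TOY_CONTINUE = Object.new.freeze

# Top-down walk: call the block on the node first, then descend into its
# children unless the block returned the stop sentinel.
def walk_top_down(node, &block)
  return if block.call(node) == TOY_STOP
  node.children.each { |child| walk_top_down(child, &block) }
end

tree = ToyNode.new("div", [
  ToyNode.new("script", [ToyNode.new("text", [])]),
  ToyNode.new("p", [ToyNode.new("b", [])]),
])

visited = []
walk_top_down(tree) do |node|
  visited << node.name
  # Prune everything underneath <script>; keep walking elsewhere.
  node.name == "script" ? TOY_STOP : TOY_CONTINUE
end

p visited # => ["div", "script", "p", "b"]
```

The "text" child of "script" is never visited, because the walk stops descending as soon as the block returns the stop sentinel for its parent.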

Constants

CONTINUE

Top-down Scrubbers may return CONTINUE to indicate that the subtree should be traversed.

STOP

Top-down Scrubbers may return STOP to indicate that the subtree should not be traversed.

Attributes

block[R]

When a scrubber is initialized, the optional block is saved and exposed through the block reader. If no block is passed, the scrubber is assumed to implement the scrub method.

direction[R]

When a scrubber is initialized, the :direction may be specified as :top_down (the default) or :bottom_up.

Public Class Methods

new(options = {}, &block)

Options may include

:direction => :top_down (the default)

or

:direction => :bottom_up

For top_down traversals, if the block returns Loofah::Scrubber::STOP, then the traversal will be terminated for the current node's subtree.

Alternatively, a Scrubber may inherit from Loofah::Scrubber, and implement scrub, which is slightly faster than using a block.

# File lib/loofah/scrubber.rb, line 64
def initialize(options = {}, &block)
  direction = options[:direction] || :top_down
  unless [:top_down, :bottom_up].include?(direction)
    raise ArgumentError, "direction #{direction} must be one of :top_down or :bottom_up" 
  end
  @direction, @block = direction, block
end

Public Instance Methods

scrub(node)

When new is not passed a block, the subclass must implement scrub, which is called for each node in the document. The default implementation raises ScrubberNotFound.
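A minimal sketch of this block-or-method fallback, assuming nothing beyond plain Ruby. ToyScrubber, its apply method and the Upcase subclass are hypothetical names invented for illustration; they copy only the dispatch idea and operate on strings rather than real nodes.

```ruby
# Hypothetical stand-in for Loofah::Scrubber; it copies only the
# "use the block if given, otherwise call #scrub" dispatch.
class ToyScrubber
  class ScrubberNotFound < RuntimeError; end

  def initialize(&block)
    @block = block
  end

  # Default implementation: a subclass created without a block must override this.
  def scrub(node)
    raise ScrubberNotFound, "No scrub method has been defined on #{self.class}"
  end

  # Prefer the block when one was given, otherwise fall back to #scrub.
  def apply(node)
    @block ? @block.call(node) : scrub(node)
  end
end

# A subclass that overrides #scrub instead of passing a block.
class Upcase < ToyScrubber
  def scrub(node)
    node.upcase
  end
end

from_block  = ToyScrubber.new { |node| node.reverse }.apply("span") # => "naps"
from_method = Upcase.new.apply("span")                              # => "SPAN"

# Neither a block nor an overridden #scrub: the default raises.
error = begin
  ToyScrubber.new.apply("span")
rescue ToyScrubber::ScrubberNotFound => e
  e.message
end
```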

# File lib/loofah/scrubber.rb, line 85
def scrub(node)
  raise ScrubberNotFound, "No scrub method has been defined on #{self.class.to_s}"
end

traverse(node)

Calling traverse walks the tree rooted at node, applying either the block passed to the initializer or the scrub method to each node, in the direction chosen when the scrubber was created.

# File lib/loofah/scrubber.rb, line 77
def traverse(node)
  direction == :bottom_up ? traverse_conditionally_bottom_up(node) : traverse_conditionally_top_down(node)
end
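To make the difference between the two directions concrete, the sketch below records the visit order for each one. SketchNode, visit_top_down, visit_bottom_up and visit are hypothetical helpers written for this illustration, not Loofah methods, and STOP handling is deliberately left out.

```ruby
# Stand-in node type for illustration; not a Nokogiri or Loofah class.
SketchNode = Struct.new(:name, :children)

# Parent first, then children (the top-down order, without STOP handling).
def visit_top_down(node, order)
  order << node.name
  node.children.each { |child| visit_top_down(child, order) }
  order
end

# Children first, then parent (the bottom-up order).
def visit_bottom_up(node, order)
  node.children.each { |child| visit_bottom_up(child, order) }
  order << node.name
  order
end

# Hypothetical dispatcher shaped like traverse: choose the walk from a direction symbol.
def visit(node, direction)
  direction == :bottom_up ? visit_bottom_up(node, []) : visit_top_down(node, [])
end

tree = SketchNode.new("div", [
  SketchNode.new("p", [SketchNode.new("b", [])]),
  SketchNode.new("span", []),
])

top_down  = visit(tree, :top_down)  # => ["div", "p", "b", "span"]
bottom_up = visit(tree, :bottom_up) # => ["b", "p", "span", "div"]
```

In the bottom-up order every node is handled only after all of its descendants, which is why pruning a subtree with STOP only makes sense for top-down scrubbers.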

Private Instance Methods

html5lib_sanitize(node)
# File lib/loofah/scrubber.rb, line 91
def html5lib_sanitize(node)
  case node.type
  when Nokogiri::XML::Node::ELEMENT_NODE
    if HTML5::Scrub.allowed_element? node.name
      HTML5::Scrub.scrub_attributes node
      return Scrubber::CONTINUE
    end
  when Nokogiri::XML::Node::TEXT_NODE, Nokogiri::XML::Node::CDATA_SECTION_NODE
    return Scrubber::CONTINUE
  end
  Scrubber::STOP
end

traverse_conditionally_bottom_up(node)
# File lib/loofah/scrubber.rb, line 113
def traverse_conditionally_bottom_up(node)
  node.children.each {|j| traverse_conditionally_bottom_up(j)}
  if block
    block.call(node)
  else
    scrub(node)
  end
end

traverse_conditionally_top_down(node)
# File lib/loofah/scrubber.rb, line 104
def traverse_conditionally_top_down(node)
  if block
    return if block.call(node) == STOP
  else
    return if scrub(node) == STOP
  end
  node.children.each {|j| traverse_conditionally_top_down(j)}
end