Scriptabulous!


CRIOSWEB_HTMLCleaner

This class can be used to remove unwanted tags and data from an HTML document. It takes a string containing the HTML document to clean and parses it assuming a given character set encoding. The class can perform several types of clean-up operations:

- Remove style definitions
- Remove tags or attributes based on whitelists or blacklists
- Use the HTML tidy extension to clean the document, format the output as XHTML, and drop proprietary attributes from Microsoft Word HTML documents
- Drop empty paragraphs
- Remove needless white space
- Fill empty table cells
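
The class's own interface is not described here, so the following is only a rough illustration of what several of these operations involve. It is a minimal Python sketch built on the standard library `html.parser`, not a port of CRIOSWEB_HTMLCleaner. The tag and attribute whitelists, the `WhitelistCleaner` class, the `clean_html` function and the regex post-processing steps are all hypothetical choices made for the example. It does not cover blacklists or the tidy/XHTML step.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists, chosen for illustration only.
ALLOWED_TAGS = {"p", "b", "i", "a", "table", "tr", "td", "ul", "li"}
ALLOWED_ATTRS = {"a": {"href"}, "td": {"colspan"}}
# Elements dropped together with their content (e.g. style definitions).
DROP_CONTENT = {"style", "script"}


class WhitelistCleaner(HTMLParser):
    """Re-emits only whitelisted tags/attributes; other tags are unwrapped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a <style> or <script> element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # non-whitelisted tag: drop the tag but keep its text
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in allowed  # drops class, style, onclick, etc.
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html):
    parser = WhitelistCleaner()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    # Collapse needless white space (Python's \s also matches non-breaking spaces).
    text = re.sub(r"\s+", " ", text)
    # Drop empty paragraphs.
    text = re.sub(r"<p>\s*</p>", "", text)
    # Fill empty table cells so they still render.
    text = re.sub(r"(<td[^>]*>)\s*(</td>)", r"\1&nbsp;\2", text)
    return text.strip()


SAMPLE = (
    "<style>p {color:red}</style>"
    '<p class="MsoNormal" style="x">Hello   <b>world</b></p><p>&nbsp;</p>'
    '<table><tr><td></td><td colspan="2">x</td></tr></table>'
    '<font face="Arial">text</font>'
)

if __name__ == "__main__":
    print(clean_html(SAMPLE))
```

Running it on the Word-style sample removes the style block, strips the `class` and `style` attributes, unwraps the `<font>` tag, drops the empty paragraph and fills the empty cell. A real cleaner would also need to handle void elements, blacklists and character set conversion, which this sketch leaves out.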
