API Reference

Extractor Class

class selectorlib.selectorlib.Extractor(config, formatters=None)[source]

Extracts structured data from HTML using the selectors defined in config.

extract(html: str, base_url: str = None)[source]
Args:
    html (str): HTML string to extract data from.
    base_url (str, optional): if specified, all extracted links are made absolute.

Returns:
    dict: data extracted from the given HTML string.

>>> response = requests.get(url)
>>> extractor.extract(response.text, base_url=response.url)
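
To illustrate what "absolute" means for the base_url argument, here is a minimal sketch using the standard library's urllib.parse.urljoin. It assumes selectorlib resolves links by the usual URL-joining rules, which this reference does not state. The URLs and href values are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page URL, standing in for response.url.
base_url = "https://example.com/products/index.html"

# Hypothetical href values as they might appear in the page's HTML.
hrefs = ["/about", "item-42.html", "https://other.example/x"]

# Resolve each href against the page URL.
# Hrefs that are already absolute pass through unchanged.
absolute_links = [urljoin(base_url, href) for href in hrefs]

for link in absolute_links:
    print(link)
```

Without a base_url, relative values such as "/about" would presumably be returned as they appear in the markup.
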
classmethod from_yaml_file(yaml_filename: str, formatters=None)[source]

Creates an Extractor object from a YAML file.

>>> extractor = Extractor.from_yaml_file('selectors.yaml')
classmethod from_yaml_string(yaml_string: str, formatters=None)[source]

Creates an Extractor object from a YAML string.

>>> yaml_string = '''
    title:
        css: "h1"
        type: Text
    '''
>>> extractor = Extractor.from_yaml_string(yaml_string)

Formatter Class

class selectorlib.formatter.Formatter[source]

Subclass this class and override the format method.

format(text: str)[source]

Override this method in a subclass. It should return the text after formatting.

classmethod get_all()[source]

Returns all subclasses of Formatter.

>>> formatters = Formatter.get_all()
>>> Extractor.from_yaml_file('a.yaml', formatters=formatters)
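
The subclass-and-override pattern described for Formatter can be sketched without installing selectorlib. The Formatter below is a hypothetical stand-in that mirrors only the documented interface (format and get_all). Building get_all on Python's __subclasses__() is an assumption, not a statement about the library's source. Price and Upper are made-up formatter names.

```python
class Formatter:
    """Hypothetical stand-in mirroring the documented interface."""

    def format(self, text: str):
        # Subclasses are expected to override this method.
        raise NotImplementedError

    @classmethod
    def get_all(cls):
        # Assumption: discover direct subclasses via __subclasses__().
        return list(cls.__subclasses__())


class Price(Formatter):
    def format(self, text: str):
        # Trim whitespace, then drop a leading dollar sign.
        return text.strip().lstrip("$")


class Upper(Formatter):
    def format(self, text: str):
        # Convert the extracted text to upper case.
        return text.upper()


formatters = Formatter.get_all()
names = [f.__name__ for f in formatters]
price_text = Price().format(" $19.99 ")

print(names)
print(price_text)
```

With the real library, the list returned by Formatter.get_all() is what gets passed as formatters= to Extractor.from_yaml_file, as in the doctest above.
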