diffbot-0.1: Simple client for the Diffbot API

Safe Haskell: Safe-Inferred

Diffbot

Examples

Download information about the primary article content of the submitted page:

 import Diffbot

 main = do
     let token = "11111111111111111111111111111111"
         url   = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
     resp <- diffbot token url defaultArticle
     print resp

You can control which fields are returned:

 main = do
     let token  = "11111111111111111111111111111111"
         url    = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
         fields = Just "meta,querystring,images(*)"
     resp <- diffbot token url $ setFields fields defaultArticle
     print resp

If your content is not publicly available (e.g., behind a firewall), you can POST markup for analysis directly:

 {-# LANGUAGE OverloadedStrings #-}
 import Diffbot

 main = do
     let token   = "11111111111111111111111111111111"
         url     = "http://www.diffbot.com/our-apis/article"
         content = Content TextPlain "Now is the time for all good robots to come to the aid of their -- oh never mind, run!"
     -- Please note that the 'url' is still required, and will be used
     -- to resolve any relative links contained in the markup.
     resp <- diffbot token url $ setContent (Just content) defaultArticle
     print resp

Perform a request

 diffbot :: Request a
         => String             -- ^ Developer token.
         -> String             -- ^ URL to process.
         -> a                  -- ^ API
         -> IO (Maybe Object)

The Object type contains JSON objects:

>>> let token = "11111111111111111111111111111111"
>>> let url = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
>>> Just resp <- diffbot token url defaultArticle
>>> resp
fromList [("author",String "John Davi"),("title",String "Diffbot\8217s New Product API Teaches Robots to Shop Online"),...

You can extract values from it with a parser, using parse, parseEither or, as in this example, parseMaybe from the aeson package:

 {-# LANGUAGE OverloadedStrings #-}
 import Data.Aeson (Object, (.:))
 import Data.Aeson.Types (parseMaybe)

 getInfo :: Object -> Maybe String
 getInfo resp = flip parseMaybe resp $ \obj -> do
     author <- obj .: "author"
     title  <- obj .: "title"
     return $ title ++ ", by " ++ author

>>> getInfo resp
Just "Diffbot\8217s New Product API Teaches Robots to Shop Online, by John Davi"
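
When a key may be missing, parseEither can report why the parser failed instead of collapsing the failure to Nothing. The following is a minimal sketch, not part of this package: it uses a hand-built Object in place of a real Diffbot response, and the names sampleResp, missingTitle, toObject and getInfoEither are hypothetical, introduced only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson (Object, Value (..), object, (.:), (.=))
import Data.Aeson.Types (Parser, parseEither)

-- Hypothetical helper: unwrap a JSON object, falling back to an empty one.
toObject :: Value -> Object
toObject (Object o) = o
toObject _          = mempty

-- Hypothetical stand-in for a response returned by 'diffbot'.
sampleResp :: Object
sampleResp = toObject $ object
    [ "author" .= ("John Davi" :: String)
    , "title"  .= ("Sample Title" :: String)
    ]

-- The same kind of object, but without the "title" key.
missingTitle :: Object
missingTitle = toObject $ object [ "author" .= ("John Davi" :: String) ]

-- The parser from the example above, given a name so it can be reused.
infoParser :: Object -> Parser String
infoParser obj = do
    author <- obj .: "author"
    title  <- obj .: "title"
    return $ title ++ ", by " ++ author

-- Left carries a description of the failure, Right the extracted text.
getInfoEither :: Object -> Either String String
getInfoEither = parseEither infoParser

main :: IO ()
main = do
    print (getInfoEither sampleResp)
    print (getInfoEither missingTitle)
```

For sampleResp this yields Right "Sample Title, by John Davi"; for missingTitle it yields a Left whose message names the failing key (the exact wording depends on the aeson version).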

API

Article

data Article

Used to extract clean article text from news articles, blog posts, and similar text-heavy web pages.

Constructors

Article 

Fields

articleContent :: Maybe Content

Content to analyze, POSTed directly instead of being fetched from the URL.

articleFields :: Maybe String

Used to control which fields are returned by the API.

articleTimeout :: Maybe Int

Set a value in milliseconds to terminate the response.
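
The fields above can be set with ordinary record update syntax. This is a sketch under an assumption the page does not state explicitly: that defaultArticle is a plain value of type Article. The name slowArticle and the 10000 ms value are hypothetical, chosen only for illustration.

```haskell
import Diffbot

-- Assumption: defaultArticle :: Article, so record update syntax applies.
-- 'slowArticle' is a hypothetical name; 10000 ms is an arbitrary example value.
slowArticle :: Article
slowArticle = defaultArticle { articleTimeout = Just 10000 }

main :: IO ()
main = do
    let token = "11111111111111111111111111111111"
        url   = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
    resp <- diffbot token url slowArticle
    -- 'diffbot' returns a Maybe, so handle the case where no object comes back.
    case resp of
        Nothing  -> putStrLn "No object returned"
        Just obj -> print obj
```

Matching on the result with case avoids the partial pattern `Just resp <- ...`, which fails at run time when the result is Nothing.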

Front Page

data FrontPage

Takes in a multifaceted "homepage" and returns individual page elements.

Constructors

FrontPage 

Fields

frontPageAll :: Bool

Returns all content from page, including navigation and similar links that the Diffbot visual processing engine considers less important/non-core.

frontPageContent :: Maybe Content
 
frontPageTimeout :: Maybe Int

Specify a value in milliseconds to override the default API timeout of 5000ms.

Image

data Image

Analyzes a web page and returns its primary image(s).

Constructors

Image 

Product

data Product

Analyzes a shopping or e-commerce product page and returns information on the product.

Page Classifier

Type classes

class Fields a where

Used to control which fields are returned by the API.

class Post a where

Datatypes

data Content

Constructors

Content 

Fields

contentType :: ContentType

Type of content.

contentData :: ByteString

Content to analyze.

data ContentType

Constructors

TextPlain 
TextHtml 
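
As a sketch of how the two types fit together, the following builds a Content value tagged as HTML and submits it with setContent, mirroring the plain-text example at the top of the page. The name htmlContent and the markup string are hypothetical; the literal relies on OverloadedStrings to become a ByteString.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Diffbot

-- Hypothetical markup to analyze; TextHtml marks it as HTML rather than plain text.
htmlContent :: Content
htmlContent = Content TextHtml "<html><body><p>Hello, robots.</p></body></html>"

main :: IO ()
main = do
    let token = "11111111111111111111111111111111"
        -- The URL is still required; it is used to resolve relative links.
        url   = "http://www.diffbot.com/our-apis/article"
    resp <- diffbot token url $ setContent (Just htmlContent) defaultArticle
    print resp
```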

Instances

Internal

class Request a where

All information on how to connect to Diffbot and what should be sent in the request.

Methods

toReq :: a -> Req

data Req

Constructors

Req 

Exceptions