Safe Haskell: Safe-Inferred
- diffbot :: Request a => String -> String -> a -> IO (Maybe Object)
- data Article = Article {}
- defArticle :: Article
- data FrontPage = FrontPage {}
- defFrontPage :: FrontPage
- data Image = Image {}
- defImage :: Image
- data Product = Product {}
- defProduct :: Product
- data Classifier = Classifier {}
- defClassifier :: Classifier
- class Fields a where
- class Post a where
- class Timeout a where
- data Content = Content {}
- data ContentType
- class Request a where
- data Req = Req {}
- data HttpException
- = StatusCodeException Status ResponseHeaders CookieJar
- | InvalidUrlException String String
- | TooManyRedirects [Response ByteString]
- | UnparseableRedirect (Response ByteString)
- | TooManyRetries
- | HttpParserException String
- | HandshakeFailed
- | OverlongHeaders
- | ResponseTimeout
- | FailedConnectionException String Int
- | ExpectedBlankAfter100Continue
- | InvalidStatusLine ByteString
- | InvalidHeader ByteString
- | InternalIOException IOException
- | ProxyConnectException ByteString Int (Either ByteString HttpException)
- | NoResponseDataReceived
- | TlsException SomeException
- | TlsNotSupported
- | ResponseBodyTooShort Word64 Word64
- | InvalidChunkHeaders
- | IncompleteHeaders
Examples
Just download information about the primary article content on the submitted page:

    import Diffbot

    main = do
        let token = "11111111111111111111111111111111"
            url   = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
        resp <- diffbot token url defArticle
        print resp
You can control which fields are returned:

    import Diffbot

    main = do
        let token  = "11111111111111111111111111111111"
            url    = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
            fields = Just "meta,querystring,images(*)"
        resp <- diffbot token url $ setFields fields defArticle
        print resp
If your content is not publicly available (e.g., behind a firewall), you can POST markup for analysis directly:

    {-# LANGUAGE OverloadedStrings #-}
    import Diffbot

    main = do
        let token   = "11111111111111111111111111111111"
            url     = "http://www.diffbot.com/our-apis/article"
            content = Content TextPlain "Now is the time for all good robots to come to the aid of their -- oh never mind, run!"
        -- Please note that the 'url' is still required, and will be used
        -- to resolve any relative links contained in the markup.
        resp <- diffbot token url $ setContent (Just content) defArticle
        print resp
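diffbot :: Request a => String -> String -> a -> IO (Maybe Object)

Perform a request. The first argument is the developer token, the second is the URL of the page to analyze, and the third selects the API and its options.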
The result is an Object, aeson's representation of a JSON object:

    >>> let token = "11111111111111111111111111111111"
    >>> let url = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
    >>> Just resp <- diffbot token url defArticle
    >>> resp
    fromList [("author",String "John Davi"),("title",String "Diffbot\8217s New Product API Teaches Robots to Shop Online"),...
You can extract values from it with a parser, using parse, parseEither or, as in this example, parseMaybe from the aeson package:
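
    {-# LANGUAGE OverloadedStrings #-}
    import Data.Aeson (Object, (.:))
    import Data.Aeson.Types (parseMaybe)

    getInfo :: Object -> Maybe String
    getInfo resp = flip parseMaybe resp $ \obj -> do
        author <- obj .: "author"
        title  <- obj .: "title"
        return $ title ++ ", by " ++ author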

    >>> getInfo resp
    Just "Diffbot\8217s New Product API Teaches Robots to Shop Online, by John Davi"
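parseEither is mentioned above but not shown. The sketch below assumes the same aeson functions as getInfo and keeps the parser as a standalone function, so it can be run with parseMaybe or parseEither; the Either variant carries aeson's error message when a key is missing or has the wrong type. The names infoParser and getInfoEither are hypothetical and not part of the Diffbot package.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Object, (.:))
import Data.Aeson.Types (Parser, parseEither)

-- Hypothetical parser (not part of the Diffbot package): the same logic
-- as getInfo, kept as a standalone Parser so that it can be run with
-- either parseMaybe or parseEither.
infoParser :: Object -> Parser String
infoParser obj = do
    author <- obj .: "author"
    title  <- obj .: "title"
    return (title ++ ", by " ++ author)

-- Left carries aeson's error message, e.g. for a missing "author" key.
getInfoEither :: Object -> Either String String
getInfoEither = parseEither infoParser
```

With the response from the session above, getInfoEither resp should give a Right containing the same string that getInfo wraps in Just.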
API
Article
Used to extract clean article text from news articles, blog posts and similar text-heavy web pages.
data Article
Front Page
Takes in a multifaceted "homepage" and returns individual page elements.
data FrontPage
Image
Analyzes a web page and returns its primary image(s).
data Image
Product
Analyzes a shopping or e-commerce product page and returns information on the product.
Page Classifier
data Classifier
Type classes
class Fields a where

Used to control which fields are returned by the API.
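The field selection in the examples is a single comma-separated string wrapped in Just. A minimal sketch of building that string from individual field names, assuming (as the examples suggest) that setFields takes a Maybe String and that Nothing leaves the API's default fields in place; fieldList is a hypothetical helper, not part of the Diffbot package.

```haskell
import Data.List (intercalate)

-- Hypothetical helper (not part of the Diffbot package): joins field
-- names into the comma-separated string passed to setFields, as in
-- "meta,querystring,images(*)". An empty list gives Nothing, which we
-- assume leaves the API's default field selection in place.
fieldList :: [String] -> Maybe String
fieldList [] = Nothing
fieldList fs = Just (intercalate "," fs)
```

For example, setFields (fieldList ["meta", "querystring", "images(*)"]) defArticle should request the same fields as the earlier example.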
Datatypes
data Content
Internal
All the information on how to connect to Diffbot and what to send in the request.
Exceptions
data HttpException
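A call to diffbot can fail in two ways: an exception (the section above suggests HttpException) or a Nothing result. The sketch below folds both into an Either using try from Control.Exception. It assumes that diffbot reports transport and HTTP errors by throwing HttpException and that Nothing means the response did not yield a JSON object; runRequest is a hypothetical helper, not part of the Diffbot package.

```haskell
import Control.Exception (Exception, try)

-- Hypothetical wrapper (not part of the Diffbot package). It runs a
-- request action such as `diffbot token url defArticle` and turns both
-- failure modes into a Left: an exception of the type handled by
-- `describe` (for example HttpException), or a Nothing result.
-- Exceptions of any other type still propagate.
runRequest :: Exception e => (e -> String) -> IO (Maybe a) -> IO (Either String a)
runRequest describe action = do
    result <- try action
    pure $ case result of
        Left err       -> Left (describe err)
        Right Nothing  -> Left "no JSON object in the response"
        Right (Just x) -> Right x
```

A call might then look like runRequest (\e -> show (e :: HttpException)) (diffbot token url defArticle). The exception type is fixed by the describe function, so callers choose which failures to catch.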