| Safe Haskell | Safe-Inferred |
|---|---|
Diffbot
Contents
- diffbot :: Request a => String -> String -> a -> IO (Maybe Object)
- data Article = Article {}
- defArticle :: Article
- data FrontPage = FrontPage {}
- defFrontPage :: FrontPage
- data Image = Image {}
- defImage :: Image
- data Product = Product {}
- defProduct :: Product
- data Classifier = Classifier {}
- defClassifier :: Classifier
- class Fields a where
- class Post a where
- class Timeout a where
- data Content = Content {}
- data ContentType
- class Request a where
- data Req = Req {}
- data HttpException
- = StatusCodeException Status ResponseHeaders CookieJar
- | InvalidUrlException String String
- | TooManyRedirects [Response ByteString]
- | UnparseableRedirect (Response ByteString)
- | TooManyRetries
- | HttpParserException String
- | HandshakeFailed
- | OverlongHeaders
- | ResponseTimeout
- | FailedConnectionException String Int
- | ExpectedBlankAfter100Continue
- | InvalidStatusLine ByteString
- | InvalidHeader ByteString
- | InternalIOException IOException
- | ProxyConnectException ByteString Int (Either ByteString HttpException)
- | NoResponseDataReceived
- | TlsException SomeException
- | TlsNotSupported
- | ResponseBodyTooShort Word64 Word64
- | InvalidChunkHeaders
- | IncompleteHeaders
Examples
Simply download information about the primary article content of the submitted page:
```haskell
import Diffbot

main :: IO ()
main = do
  let token = "11111111111111111111111111111111"
      url   = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
  resp <- diffbot token url defArticle
  print resp
```
You can control which fields are returned:
```haskell
import Diffbot

main :: IO ()
main = do
  let token  = "11111111111111111111111111111111"
      url    = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
      fields = Just "meta,querystring,images(*)"
  resp <- diffbot token url $ setFields fields defArticle
  print resp
```
If your content is not publicly available (e.g., behind a firewall), you can POST markup for analysis directly:
```haskell
{-# LANGUAGE OverloadedStrings #-}
import Diffbot

main :: IO ()
main = do
  let token   = "11111111111111111111111111111111"
      url     = "http://www.diffbot.com/our-apis/article"
      content = Content TextPlain "Now is the time for all good robots to come to the aid of their -- oh never mind, run!"
  -- Please note that the 'url' is still required, and will be used
  -- to resolve any relative links contained in the markup.
  resp <- diffbot token url $ setContent (Just content) defArticle
  print resp
```
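Because each setter returns a modified request value, setters can be chained. A minimal sketch combining the two setters used in the examples above, assuming `Article` is an instance of both the `Fields` and `Post` classes (each example applies one setter to `defArticle`, but they are never shown together); the name `request` is hypothetical:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Diffbot

-- Hypothetical request value: POST our own markup and ask only for a few fields.
request :: Article
request = setFields (Just "meta,querystring")
        $ setContent (Just (Content TextPlain "Some markup to analyze"))
          defArticle

main :: IO ()
main = do
  let token = "11111111111111111111111111111111"
      -- Still required: used to resolve relative links in the markup.
      url   = "http://www.diffbot.com/our-apis/article"
  resp <- diffbot token url request
  print resp
```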
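diffbot
Perform a request.
Arguments
| Type | Description |
|---|---|
| `Request a =>` | The API type `a` must be an instance of `Request`. |
| `String` | Developer token. |
| `String` | URL to process. |
| `a` | API to call, e.g. `defArticle`. |
| `IO (Maybe Object)` | |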
The `Object` type (from aeson) represents a decoded JSON object:
```haskell
>>> let token = "11111111111111111111111111111111"
>>> let url = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
>>> Just resp <- diffbot token url defArticle
>>> resp
fromList [("author",String "John Davi"),("title",String "Diffbot\8217s New Product API Teaches Robots to Shop Online"),...
```
You can extract values from it with a parser, using `parse`, `parseEither` or, as in this example, `parseMaybe` from the aeson package:
```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Object, (.:))
import Data.Aeson.Types (parseMaybe)

getInfo :: Object -> Maybe String
getInfo resp = flip parseMaybe resp $ \obj -> do
  author <- obj .: "author"
  title  <- obj .: "title"
  return $ title ++ ", by " ++ author
```
```haskell
>>> getInfo resp
Just "Diffbot\8217s New Product API Teaches Robots to Shop Online, by John Davi"
```
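`getInfo` returns `Nothing` as soon as either field is missing. The sketch below, which is not part of the Diffbot package, uses aeson's `.:?` so that a missing `"author"` still yields the title. It decodes hand-written JSON strings (stand-ins, not real API output), so it runs without network access or a token; `getTitleLine`, `withAuthor` and `withoutAuthor` are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Object, decode, (.:), (.:?))
import Data.Aeson.Types (parseMaybe)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical helper: like getInfo, but "author" is optional.
getTitleLine :: Object -> Maybe String
getTitleLine resp = flip parseMaybe resp $ \obj -> do
  title  <- obj .: "title"    -- required: parsing fails without it
  author <- obj .:? "author"  -- optional: Nothing if absent or null
  return $ case author of
    Just a  -> title ++ ", by " ++ a
    Nothing -> title

-- Hand-written stand-ins for Diffbot responses (not real API output).
withAuthor, withoutAuthor :: BL.ByteString
withAuthor    = "{\"title\": \"Hello\", \"author\": \"John Davi\"}"
withoutAuthor = "{\"title\": \"Hello\"}"

main :: IO ()
main = do
  print (decode withAuthor >>= getTitleLine)     -- Just "Hello, by John Davi"
  print (decode withoutAuthor >>= getTitleLine)  -- Just "Hello"
```

With a real response, `getTitleLine resp` is used exactly like `getInfo resp`.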
API
Article
Used to extract clean article text from news articles, blog posts and similar text-heavy web pages.
Constructors
- Article {}
Front Page
Takes in a multifaceted "homepage" and returns individual page elements.
Constructors
- FrontPage {}
Image
Analyzes a web page and returns its primary image(s).
Constructors
- Image {}
Product
Analyzes a shopping or e-commerce product page and returns information on the product.
Constructors
- Product {}
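Since `diffbot` accepts any instance of `Request` and always returns `IO (Maybe Object)`, one call shape covers every API above. A minimal sketch, assuming `defFrontPage`, `defImage` and `defProduct` are `Request` instances like `defArticle` (their listing under API suggests so, but the examples only exercise `defArticle`); `analyzeAll` is a hypothetical helper, not part of the package:

```haskell
import Data.Aeson (Object)
import Diffbot

-- Hypothetical helper: run the same URL through several Diffbot APIs.
-- Every call has type IO (Maybe Object), so the results fit in one list.
analyzeAll :: String -> String -> IO [(String, Maybe Object)]
analyzeAll token url = do
  article   <- diffbot token url defArticle
  frontPage <- diffbot token url defFrontPage
  image     <- diffbot token url defImage
  prod      <- diffbot token url defProduct
  return [ ("article", article), ("frontpage", frontPage)
         , ("image", image), ("product", prod) ]

main :: IO ()
main = do
  let token = "11111111111111111111111111111111"
      url   = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
  results <- analyzeAll token url
  mapM_ print results
```

This issues four separate HTTP requests, one per API, so in practice you would call only the APIs that match the kind of page you are processing.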
Page Classifier
Type classes
class Fields a
Used to control which fields are returned by the API.
Datatypes
Constructors
- Content {}
Internal
All information on how to connect to the Diffbot API and what should be sent in the request.
Exceptions
data HttpException
Constructors
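The exception type above is http-conduit's `HttpException`, re-exported by this module. A minimal sketch of handling it with `Control.Exception.try`, assuming `diffbot` lets these exceptions propagate rather than catching them itself; `describeFailure` is a hypothetical helper, and `statusCode` comes from the http-types package (a dependency of http-conduit). The meaning given to `Right Nothing` is likewise an assumption.

```haskell
import Control.Exception (try)
import Diffbot
import Network.HTTP.Types (statusCode)

-- Hypothetical helper: turn some HttpException constructors into a short message.
describeFailure :: HttpException -> String
describeFailure (StatusCodeException s _ _) = "HTTP status " ++ show (statusCode s)
describeFailure ResponseTimeout             = "request timed out"
describeFailure TooManyRetries              = "too many retries"
describeFailure other                       = "HTTP failure: " ++ show other

main :: IO ()
main = do
  let token = "11111111111111111111111111111111"
      url   = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
  -- 'try' catches only HttpException (inferred from describeFailure);
  -- any other exception still propagates.
  result <- try (diffbot token url defArticle)
  case result of
    Left err         -> putStrLn (describeFailure err)
    Right Nothing    -> putStrLn "no JSON object in the response (assumed meaning)"
    Right (Just obj) -> print obj
```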