How to use the minet.exceptions.RedirectError exception class in minet

To help you get started, we’ve selected a few minet examples, based on popular ways it is used in public projects.

GitHub: medialab / gazouilloire / bin / complete_links_resolving_v2.py
        links_to_save = []
        t = datetime.now().isoformat()
        print("  + [%s] %s urls to resolve" % (t, len(urls_to_clear)))
        try:
            for res in multithreaded_resolve(
              urls_to_clear,
              threads=min(50, batch_size),
              throttle=0.2,
              max_redirects=20,
              insecure=True,
              timeout=Timeout(connect=10, read=30),
              follow_meta_refresh=True
            ):
                source = res.url
                last = res.stack[-1]
                # Redirect errors (RedirectError and its subclasses) are tolerated:
                # the last URL reached is still recorded. Any other error skips the URL.
                if res.error and not isinstance(res.error, RedirectError):
                    print("ERROR on resolving %s: %s (last url: %s)" % (source, res.error, last.url), file=sys.stderr)
                    continue
                if verbose:
                    print("          ", last.status, "(%s)" % last.type, ":", source, "->", last.url, file=sys.stderr)
                if len(source) < 1024:
                    links_to_save.append({'_id': source, 'real': last.url})
                alreadydone[source] = last.url
                if source != last.url:
                    done += 1
        except Exception as e:
            print("CRASHED with %s (%s) while resolving batch, skipping it for now..." % (e, type(e)))
            print("CRASHED with %s (%s) while resolving %s" % (e, type(e), urls_to_clear), file=sys.stderr)
            skip += batch_size
            print("  + [%s] STORING %s REDIRECTIONS IN MONGO" % (t, len(links_to_save)))
            if links_to_save:
                try:

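The error filter in the excerpt above can be tried in isolation. The following is a minimal sketch, not code from gazouilloire or minet: the `RedirectError` class here is a local stand-in so the example runs without minet installed (with minet available, it would presumably be replaced by `from minet.exceptions import RedirectError`), and `TooManyHopsError`, `is_fatal` and `original_check` are hypothetical names introduced only for illustration. It shows that a single `isinstance` test gives the same answers as the combined `type(...) !=` and `issubclass(...)` test used in the excerpt.

```python
class RedirectError(Exception):
    """Local stand-in for minet.exceptions.RedirectError (assumption: only
    here so the sketch runs without minet installed)."""


class TooManyHopsError(RedirectError):
    """Hypothetical subclass used for illustration; not a minet class."""


def is_fatal(error):
    """Return True when the error should make the caller skip the URL.

    Redirect-related errors are tolerated, everything else is fatal.
    """
    return error is not None and not isinstance(error, RedirectError)


def original_check(error):
    """The condition as written in the excerpt, wrapped in a function."""
    return bool(
        error
        and type(error) != RedirectError
        and not issubclass(type(error), RedirectError)
    )


samples = [
    None,                           # no error at all
    RedirectError("redirect loop"),  # tolerated
    TooManyHopsError("21 hops"),     # tolerated, via the subclass relation
    TimeoutError("read timed out"),  # fatal
]

for err in samples:
    print(type(err).__name__, "fatal" if is_fatal(err) else "tolerated")
```

Because `issubclass(X, X)` is true for any class, the `type(...) !=` comparison in the excerpt adds nothing to the `issubclass` test, which is why one `isinstance` call is enough.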
minet

A webmining CLI tool & library for Python.

GPL-3.0