CJ Jackson

Modular Autoembed Script in Python
« on October 18th, 2011, 03:25 PM »
embedder/__init__.py
Code:
import re
from importlib import import_module
from _lib.shortcode import shortcode
from ._list import __embedlist


def execute(user, url):
    msg = ''
    # Try each pattern in order; the first module that loads and renders wins
    for entry in __embedlist:
        match = re.match(entry['re'], url, flags=re.I)
        if match:
            try:
                module = import_module('_lib.embedder.' + entry['module'])
                return module.execute(user, match, 640, 376)
            except Exception:
                msg = "\nModule Failed to load"
    # No embed available: fall back to a plain [url] BBCode link
    url = "[url]" + url + "[/url]" + msg
    code = shortcode(user=user, filter=['url'])
    return code.execute(url)
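The idea behind execute() is a table-driven dispatcher: walk a list of (pattern, module) pairs and hand the first match to its handler. Here's a rough, self-contained Python 3 sketch of that idea. To keep it runnable on its own, the handlers are plain functions in a hypothetical HANDLERS dict instead of modules loaded with import_module, the iframe markup is only illustrative, and the fallback is an escaped HTML link rather than the [url] shortcode.

```python
import re
from html import escape


# Hypothetical handler standing in for embedder/youtube.py: it gets the regex
# match object and returns embed HTML (the markup here is illustrative only).
def youtube(match, width, height):
    return ('<iframe src="https://www.youtube.com/embed/%s" width="%d" height="%d"></iframe>'
            % (match.group('youtubeid'), width, height))


# Module name -> handler; the forum code loads these with import_module instead.
HANDLERS = {'youtube': youtube}

# (pattern, module) table in the same shape as __embedlist.
EMBED_LIST = [
    {'re': r'^https?://(?:[a-z]+\.)?youtube\.com/watch\?(?:.*&)?v=(?P<youtubeid>[\w-]+)',
     'module': 'youtube'},
    {'re': r'^https?://youtu\.be/(?P<youtubeid>[\w-]+)', 'module': 'youtube'},
]


def embed(url, width=640, height=376):
    # The first entry whose pattern matches and whose handler exists wins.
    for entry in EMBED_LIST:
        match = re.match(entry['re'], url, flags=re.I)
        if match:
            handler = HANDLERS.get(entry['module'])
            if handler is not None:
                return handler(match, width, height)
    # Nothing matched: fall back to a plain, HTML-escaped link.
    safe = escape(url)
    return '<a href="%s">%s</a>' % (safe, safe)


print(embed('https://youtu.be/aBc-123_xyz'))
```

Whether handlers live in a dict or in separate files loaded by name, the lookup loop is the same; separate modules just keep each provider's code apart, which is the point of the modular layout.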

embedder/_list.py
Code:
# Each entry maps a URL pattern (named groups carry the IDs) to the module that
# renders it. Dots are escaped so they only match a literal '.', and raw strings
# keep the backslashes intact.
__embedlist = [
    {
        're': r'^http(s?)://([a-z]+\.)?youtube\.com/(watch|index)\?(.*)v=(?P<youtubeid>[a-zA-Z0-9_-]+)',
        'module': 'youtube'
    },
    {
        're': r'^http(s?)://([a-z]+\.)?youtube\.com/v/(?P<youtubeid>[a-zA-Z0-9_-]+)',
        'module': 'youtube'
    },
    {
        're': r'^http(s?)://youtu\.be/(?P<youtubeid>[a-zA-Z0-9_-]+)',
        'module': 'youtube'
    },
    {
        're': r'^http(s?)://www\.metacafe\.com/watch/(?P<metacafeid>\d+)/(?P<metacafename>[a-zA-Z0-9_-]+)(/?)',
        'module': 'metacafe'
    },
    {
        're': r'^http(s?)://www\.dailymotion\.com/(.*)video/(?P<dailymotionid>[^_]+)',
        'module': 'dailymotion'
    },
    {
        're': r'^http(s?)://www\.gametrailers\.com/video/(?P<gametrailersname>[a-zA-Z-]+)/(?P<gametrailersid>\d+)(/?)',
        'module': 'gametrailers'
    },
    {
        're': r'^http(s?)://(www\.)?collegehumor\.com/video/(?P<collegehumorid>\d+)(/?)',
        'module': 'collegehumor'
    },
    {
        're': r'^http(s?)://www\.funnyordie\.com/videos/(?P<funnyordieid>[a-zA-Z0-9]+)(/?)',
        'module': 'funnyordie'
    },
    {
        're': r'^http(s?)://revision3\.com/(?P<r3group>[a-zA-Z0-9]+)/(?P<r3title>[a-zA-Z0-9]+)(/?)',
        'module': 'revision3'
    },
    {
        're': r'^http(s?)://www\.break\.com/([a-zA-Z-]+)/(?P<breakname>[a-zA-Z-]+)-(?P<breakid>\d+)(/?)',
        'module': 'break'
    },
]

I worked out the URL formats from (and only from) http://embed.ly/providers  :eheh:  The great thing about re in Python is that it allows named groups, which is much better than counting the groups.
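A quick Python 3 illustration of named groups, plus why the dots in these patterns need escaping. The metacafe URL below is made up for the example.

```python
import re

# Named groups let you read captures by name instead of by position.
pattern = re.compile(
    r'^https?://(?:www\.)?metacafe\.com/watch/(?P<metacafeid>\d+)/(?P<metacafename>[\w-]+)/?',
    re.I)
m = pattern.match('http://www.metacafe.com/watch/12345/some_clip/')
print(m.group('metacafeid'))  # 12345
print(m.groupdict())          # {'metacafeid': '12345', 'metacafename': 'some_clip'}

# An unescaped '.' matches any character, so the loose pattern also
# accepts a host that is not youtu.be at all.
loose = re.compile(r'^https?://youtu.be/')
strict = re.compile(r'^https?://youtu\.be/')
print(bool(loose.match('http://youtuXbe/abc')))   # True
print(bool(strict.match('http://youtuXbe/abc')))  # False
```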

embedder/youtube.py
Code:
from django.template import Context, loader


def execute(user, match, width, height):
    youtubeid = match.group('youtubeid')

    template = loader.get_template('embedder/youtube.html')
    context = Context({
        'youtubeid': youtubeid, 'width': width, 'height': height
    })

    return template.render(context)

All of the modules were very easy to code because the URLs had IDs in them, except revision3, which required parsing the page into an HTML DOM just to get the ID from a meta tag.  I also had to use database caching for revision3 because the HTML DOM parsing is damn slow.  It took about two hours to get revision3 to work.
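One possibly lighter way to pull the ID out of the og:video meta tag is a streaming parser that only looks at meta tags instead of building a whole DOM. The sketch below swaps html5lib's DOM tree builder for the standard library's html.parser (Python 3); OgVideoParser and extract_r3id are hypothetical names and the sample HTML is invented. I haven't measured whether it is actually faster on revision3's pages, and html5lib is more faithful to how browsers handle broken markup.

```python
import re
from html.parser import HTMLParser


class OgVideoParser(HTMLParser):
    """Remembers the content of the first <meta property="og:video"> tag."""

    def __init__(self):
        super().__init__()
        self.og_video = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of pairs.
        # Self-closing <meta ... /> also lands here via handle_startendtag.
        if tag == 'meta' and self.og_video is None:
            attrs = dict(attrs)
            if attrs.get('property') == 'og:video':
                self.og_video = attrs.get('content')


def extract_r3id(html):
    parser = OgVideoParser()
    parser.feed(html)
    parser.close()
    if parser.og_video is None:
        return None
    match = re.match(r'https?://revision3\.com/player-v(?P<theid>[0-9]+)',
                     parser.og_video, flags=re.I)
    return match.group('theid') if match else None


sample = '<head><meta property="og:video" content="http://revision3.com/player-v12345/"></head>'
print(extract_r3id(sample))  # 12345
```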

embedder/revision3.py
Code:
from django.template import Context, loader
import hashlib
import json
import re
import urllib2

import html5lib
from html5lib import treebuilders

from embed_cache.models import embed_cache


def execute(user, match, width, height):
    r3group = match.group('r3group')
    r3title = match.group('r3title')

    # Cache key: RIPEMD-160 hex digest of "revision3/<group>/<title>"
    prehash = "revision3/" + r3group + "/" + r3title
    digest = hashlib.new('ripemd160')
    digest.update(prehash)
    key = digest.hexdigest()

    try:
        # Fast path: the player id was scraped before and cached in the database
        cached = embed_cache.objects.get(hash=key)
        r3id = json.loads(cached.data)['r3id']
    except embed_cache.DoesNotExist:
        # Slow path: download the page and build a DOM to read the og:video meta tag
        try:
            parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("dom"))
            dom = parser.parse(urllib2.urlopen(match.group(0), timeout=10).read())
        except Exception:
            return ""

        content = None
        for node in dom.getElementsByTagName('meta'):
            if node.getAttribute('property') == 'og:video':
                content = node.getAttribute('content')
                break
        if content is None:
            return ""

        match2 = re.match(r'http://revision3\.com/player-v(?P<theid>[0-9]+)', content, flags=re.I)
        if not match2:
            return ""
        r3id = match2.group('theid')

        # Remember the id so later requests skip the download and parse
        embed_cache.objects.create(hash=key, data=json.dumps({'r3id': r3id}))

    template = loader.get_template('embedder/revision3.html')
    context = Context({
        'r3id': r3id, 'width': width, 'height': height
    })

    return template.render(context)

As you can see, each module has its own template (or HTML code).  I can't believe how ridiculously easy most of it was in Python.
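The database cache in revision3.py is a cache-aside lookup: hash the URL parts into a key, return the stored ID on a hit, and only scrape the page on a miss. Here's a minimal Python 3 sketch of the same flow, using an in-memory sqlite3 table as a stand-in for the Django embed_cache model (the table schema and the get_r3id helper are assumptions). It uses sha1 instead of ripemd160 because hashlib only offers ripemd160 when the underlying OpenSSL build provides it.

```python
import hashlib
import json
import sqlite3

# In-memory table standing in for the embed_cache model
# (assumed schema: hash TEXT PRIMARY KEY, data TEXT holding JSON).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE embed_cache (hash TEXT PRIMARY KEY, data TEXT)')


def cache_key(group, title):
    # Same "revision3/<group>/<title>" prehash as the post, hashed with sha1.
    return hashlib.sha1(('revision3/%s/%s' % (group, title)).encode('utf-8')).hexdigest()


def get_r3id(group, title, fetch):
    key = cache_key(group, title)
    row = conn.execute('SELECT data FROM embed_cache WHERE hash = ?', (key,)).fetchone()
    if row is not None:
        return json.loads(row[0])['r3id']  # cache hit: no page download
    r3id = fetch(group, title)              # cache miss: the slow scrape
    if r3id is not None:
        conn.execute('INSERT INTO embed_cache (hash, data) VALUES (?, ?)',
                     (key, json.dumps({'r3id': r3id})))
    return r3id
```

Passing the scraper in as fetch keeps the cache logic testable without touching the network; failed scrapes (None) are not cached, so they are retried next time.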

Nao


CJ Jackson

Re: Modular Autoembed Script in Python
« Reply #2, on October 19th, 2011, 09:58 PM »
It's a media embedder script.  Just think of it as a compromise between your Aeva Media and oEmbed: the flexibility of oEmbed with the performance of Aeva Media.

I've got it running at RockForums.Co here and here; it works pretty nicely.  :)