Friday 1 May 2015

GrepSlash, A tech content curation platform


GrepSlash is a technical content curation platform, intended to let enthusiastic users discover community-reviewed projects and research work.
It also gives developers and academics a way to share their work for community feedback, apart from showcasing their recent endeavours.

On top of that, it aggregates top tech news from various websites, so you get to see all the recent tech happenings in a single place.


Built on Ruby on Rails. 

Thursday 21 August 2014

eventEmitter - Let's get that event emitted


eventEmitter is a clone of Node's own events.EventEmitter, now compatible with both Node and all major browsers.

Check it out on GitHub.

Sunday 3 August 2014

Google App Engine for Java with Eclipse, and a hello world

Installing stuff on Linux is no joke; well, apart from the addictive fun of solving the little intricacies the kernel puts in our way, sometimes you just can't get the kernel to do what you want!


Having said that, I tried getting the Google App Engine plugin working with Eclipse Indigo, which is the version available by default from the Ubuntu Software Center. Some hours and a few cusswords later I finally gave up and used Juno.

So let's begin.
1. Download Juno from link
2. Running it is pretty cool: just click the executable and voila, but for that voila moment you need to have Java installed.

sudo apt-get install openjdk-7-jre
and
sudo apt-get install openjdk-7-jdk
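
You can confirm Java is on your path with

java -version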

3. Move the extracted eclipse folder to /opt:
sudo mv eclipse /opt/

4. Create an eclipse.desktop file with the contents below and move it to /usr/share/applications (the move command follows the entry):

[Desktop Entry]
Type=Application
Name=Eclipse
Comment=Eclipse Integrated Development Environment
Icon=eclipse
Exec=eclipse
Terminal=false
Categories=Development;IDE;Java;
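
Assuming you created the file in your current directory, you can move it into place with

sudo mv eclipse.desktop /usr/share/applications/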

5. Make a symlink in /usr/local/bin:

cd /usr/local/bin
and
sudo ln -s /opt/eclipse/eclipse

6. We are done installing Juno; you can now search for Eclipse in the Dash or run it via the terminal.


Now let's get a hello world running.

In Eclipse choose Help -> Install New Software
and use https://dl.google.com/eclipse/plugin/4.2 to install
  Google Plugin for Eclipse (required)
  and
  SDKs -> Google App Engine Java SDK 1.9.7




Once that is done we need to download the GWT SDK from
download

Then go to Window -> Preferences -> Google -> Web Toolkit and
use the 'Add' button to browse to the location where you extracted the SDK, then submit.

We are done; you can create projects using
File -> New -> Project -> Google -> Web Application Project

Good luck


Wednesday 9 July 2014

A web crawler to detect 404s, using Scrapy in Python

Well, I was recently presented with the problem of mitigating the ironic 404 pages of a web domain.

What better, then, than a web crawler to crawl all the domain's pages and observe the responses!

We shall summon Scrapy spiders to do our bidding.

You can install it by

$ sudo apt-get install python-pip python-dev libffi-dev libxslt1-dev libxslt1.1 libxml2-dev libxml2 libssl-dev

$ sudo pip install Scrapy

$ sudo pip install service_identity
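
You can check that the install went through with

$ scrapy version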

Then start the project with:
$ scrapy startproject Project_name

This will create the directory structure:
Project_name/
    scrapy.cfg
    Project_name/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
           

Under spiders/, create a Python script with any name and compose the soul of the spider.

For the 404 detection you can use this:





import scrapy
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.item import Field

class PageItem(scrapy.Item):
    # One record per crawled page
    title = Field()
    link = Field()
    response = Field()
    refer = Field()

class MySpider(CrawlSpider):
    name = "AcrazySpiderofDoom"
    allowed_domains = ["www.domain.com"]
    start_urls = ["http://www.domain.com/"]

    # Let 404 responses reach the callback instead of being
    # dropped by the default HttpError middleware
    handle_httpstatus_list = [404]

    # Follow every link inside the allowed domain and hand each
    # response to parse_items
    rules = (
        Rule(SgmlLinkExtractor(allow=(), unique=True),
             callback="parse_items", follow=True),
    )

    def parse_items(self, response):
        item = PageItem()
        title = response.xpath('//title/text()').extract()
        item["title"] = title[0] if title else ''
        item["link"] = response.url
        item["response"] = response.status
        item["refer"] = response.request.headers.get('Referer')
        return item



That's it. Give it life with:

$ scrapy crawl AcrazySpiderofDoom


You can even have it write the data to a CSV with:
$ scrapy crawl AcrazySpiderofDoom -o items.csv
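
If you only care about the broken links, here is a minimal sketch, assuming the same spider and PageItem class as above: swap in a parse_items that returns an item only for 404 responses, so the CSV holds nothing but dead links and the pages that refer to them.

    def parse_items(self, response):
        # Skip everything that is not a 404
        if response.status != 404:
            return None
        item = PageItem()
        item["link"] = response.url
        item["response"] = response.status
        # The page that linked to the dead URL
        item["refer"] = response.request.headers.get('Referer')
        return item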

Example screen grab of running the crawler for partypoker.com




Once completed (I killed it here), it will also provide valuable crawl statistics.