[geeklog-devel] [geeklog-spam] Google Summer of Code 2009: Geeklog

saurabh gupta saurabhgupta1403 at gmail.com
Fri Mar 20 19:13:39 EDT 2009


Hi,

On Sat, Mar 21, 2009 at 1:02 AM, Dirk Haun <dirk at haun-online.de> wrote:
> saurabh gupta wrote:
>
>>All right. What I have understood from your point is that whenever a
>>user is a spammer, then the SWOT feed should pass this information
>>also. This can be a good option because in many cases spammers with
>>same user  name and account trolls on other sites.
>
> I'm beginning to wonder if that was Mike's original intention, but I
> would like to be able to have several SWOT feeds for different purposes.
>
> The way I see it: You can catch a lot of the usual spam with simple
> measures. Add the usual pill names and a couple of porn-related
> expressions to your personal blacklist. Add SLV to the mix, and you've
> already got rid of a lot of spam.
>
> But I'm beginning to see an increasing amount of more subtle spam, some
> of it manual. It won't be caught by SLV (or Akismet or similar services)
> because it's low-volume but still annoying. I want to have a way to put
> information about that sort of spam out there. SWOT may be the answer to that.
>
> Of course, other SWOT feeds would contain other information such as,
> say, about that spammer from the Ukraine who was so annoying that I had
> to block the entire 91.207.4.0/22 and 91.207.8.0/23 adress ranges[1] in
> my .htaccess.
>
> Of course, being able to selectively or semi-automatically put different
> information into different feeds would also require some changes in the
> way the Spam-X plugin currently works (on the user, i.e. the Admin's,
> side). So maybe that is the point where the two spam-related projects
> (SWOT and Spam-X overhaul) are beginning to merge into one ...

All right. The overall scenario is that a site encounters different
kinds of spammers and spam posts, but some spam is specific to a
particular type of site or business, as we discussed above. In my
opinion, the best way to implement new features is to keep them
simple, give them a comfortable user interface, and make the
application expandable. As we discussed, this can be handled by
categorizing the spam feeds. One part will be the general SWOT feeds,
which provide information about general spam; the other will be
improvements to the Spam-X engine and new features for it, which are
(a brief overview):

1. A new type of feed which is site-related and has a full
export/import facility through a graphical user interface. The idea
has been discussed, and I am thinking of storing the data in the form
of db (database) files, for which parsing and manipulation APIs are
available in Geeklog.

2. Spam queue implementation to deal with the false positives.
(already discussed)

3. Modification of current Spam-X features, such as removal of
duplicate entries, sorting of the lists, and storing all data in the
database so that it can be exported.

4. Features like highlighting or marking the spam words in comments
and stories that were detected as spam.

5. Other ideas indicated on the Geeklog wiki
(http://wiki.geeklog.net/index.php/SoC_spam-x_overhaul).
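
To make items 1, 3 and 4 a bit more concrete, here is a rough sketch
of the list handling I have in mind. It is written in Python only for
illustration (Geeklog itself is PHP). All function names are made up
and are not part of Spam-X, and the JSON layout is just an assumed
stand-in for whatever storage format we finally choose.

```python
import json
import re


def normalize_entries(entries):
    """Strip whitespace, drop empty and duplicate entries
    (case-insensitively), and return the result sorted."""
    seen = {}
    for entry in entries:
        cleaned = entry.strip()
        if cleaned:
            # Keep the first spelling seen for each lowercase key.
            seen.setdefault(cleaned.lower(), cleaned)
    return sorted(seen.values(), key=str.lower)


def export_entries(entries):
    """Serialize a cleaned list to a JSON string (assumed feed format)."""
    return json.dumps({"entries": normalize_entries(entries)}, indent=2)


def import_entries(existing, feed_text):
    """Merge an exported feed into an existing list without duplicates."""
    imported = json.loads(feed_text)["entries"]
    return normalize_entries(list(existing) + imported)


def highlight_terms(text, terms, marker="**"):
    """Wrap every listed term found in the text with a marker."""
    if not terms:
        return text
    # Longest terms first, so a longer term wins over a shorter prefix.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered),
                         re.IGNORECASE)
    return pattern.sub(lambda m: marker + m.group(0) + marker, text)
```

With something like this, importing a feed from another site simply
merges its entries into the local list, and the admin can see at a
glance which words caused a comment to be flagged.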


> Just thinking out loud. But having some brilliant idea here would
> probably increase your chances of being accepted :)
I will try my best to do enough research on this project and
contribute as much as I can.

Both the Spam-X and SWOT ideas can be merged to some extent this
summer, and the above features and some more can be added as time
permits. However, from my experience, I feel that making the site more
admin- and user-friendly is more important. At present, the
administrator has to add several lists of spammers to Spam-X to train
its spam-detection engine. If we can bring more intelligence to
Spam-X, it will make life a lot easier.

For example, suppose a comment is posted and the user considers it
spam. We can provide a separate "Mark as spam" button along with each
post. After clicking on it, the user gets to choose on what basis the
post is spam: its user name, its IP, or its origin. After this, the
user also gets the choice to add the spam info either to the SWOT
feed or to the Spam-X engine. This can, to some extent, remove the
headache of manually adding the IPs or user names of spammers to
Spam-X. I don't know how feasible this idea is, but it can also be a
part of this summer if time permits. I can see a huge scope for
improvement in Geeklog's spam-handling mechanism.
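
The "Mark as spam" flow could look roughly like the sketch below. It
is Python only for illustration, and every name in it (Comment,
SpamStore, mark_as_spam, the basis and target labels) is hypothetical;
none of it is existing Spam-X or SWOT code.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two choices the user makes.
BASES = ("username", "ip", "origin")
TARGETS = ("spamx", "swot")


@dataclass
class Comment:
    username: str
    ip: str
    origin: str


@dataclass
class SpamStore:
    # Sets, so marking the same thing twice does not create duplicates.
    spamx: set = field(default_factory=set)
    swot: set = field(default_factory=set)


def mark_as_spam(comment, basis, target, store):
    """Record the chosen attribute of a comment in the chosen list."""
    if basis not in BASES:
        raise ValueError(f"unknown basis: {basis}")
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target}")
    entry = (basis, getattr(comment, basis))
    getattr(store, target).add(entry)
    return entry
```

So one click plus two choices (basis and target) would replace typing
the IP or user name into the Spam-X admin pages by hand.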

Also, the import/export feature can itself be a big and challenging
piece of work, and it can bring a lot of relief to users or admins who
run more than one site using Geeklog. In fact, any new user can also
import the Spam-X feed of some well-known site which he considers to
be sufficiently spam-proof. If necessary, a security feature can be
added which will not allow anyone to import the Spam-X feed without
the permission of the site admin.
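
One possible shape for such a permission check, purely as an
assumption on my side: the admin of the exporting site hands out a
token, and the feed is only given to someone who presents a valid
one. The class and method names below are invented for this sketch
(Python for illustration only).

```python
import hmac
import secrets


class FeedExporter:
    """Hypothetical exporter that only serves its feed to token holders."""

    def __init__(self, entries):
        self._entries = list(entries)
        self._tokens = set()

    def grant_access(self):
        """Called by the site admin; returns a new random access token."""
        token = secrets.token_hex(16)
        self._tokens.add(token)
        return token

    def revoke_access(self, token):
        """Withdraw a previously granted token."""
        self._tokens.discard(token)

    def export(self, token):
        """Return a copy of the feed entries if the token is valid."""
        # compare_digest does a constant-time comparison of the tokens.
        allowed = any(hmac.compare_digest(token, t) for t in self._tokens)
        if not allowed:
            raise PermissionError("feed export not permitted for this token")
        return list(self._entries)
```

The admin stays in control this way, because revoking a token stops
any further imports by that site.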

As we can see, there is a lot of work to do, also because of the
merging of Spam-X and SWOT (to some extent), so I will try to start
working on it as soon as possible, irrespective of whether I get
accepted or not. These days I can't devote my full time to this
because my college's mid-semester exams are going on. They will be
over very soon.


-- 
Saurabh Gupta
Senior,
NSIT, New Delhi, India


