Tuesday 14 June 2011

On Google PageRank and SEO

Content aggregators are bad. They just rely on original work and using seo/sem techniques manage to push their content above the original ones.

A friend had a problem where his original post enjoyed a brief appearance on the first pages of google and then it got picked up by an aggregator/scraper. It went downhill from there.

https://www.sumo.gr/2011/06/01/ta-10-kalytera-nhsia-toy-kosmoy/

So here is some free (as in free beer) advice on SEO:

The hyperlinks of the articles should ideally have the form of
* https://www.sumo.gr/{date}/{article-title}[B].html[/B]
The html extension is important as it essentially tricks the google bot to think that you are serving a static page (vs a dynamic) and so it's boosted

* Even better, the link page and file title should be in its original language:
https://www.sumo.gr/2011/06/01/Τα-10-καλύτερα-νησιά-του-κόσμου.html
So, searching for "τα καλύτερα νησιά" there should be partial and direct correlation to an address

* IMG tags should always have an ALT text! Ideally it should be normal text so *that* can also be included in the index. Normally the filename it self should be human readable.

* The page title must not include the site's name in each and every page/article, only the article's title

* The <meta name="keywords" /> is very bad. So much, that i believe that it appears as spam to the bot. Let it have 10-15 keywords (even if for every language although the original language is the only important) and non-repeatable. Only the "hawaii" keyword appears 5 times! And the title appears something like 7-8 times

* The <meta name="description" /> is a little large. Keep it close to 140-150 characters (now its 167, google keeps 160, bing 150)

Here i should say that allegedly meta description/keyword are not used in the algorithm (since 2009) but that doesn't mean that are not getting flagged as spam

* Content-Encoding:gzip . 'nuff said. Generally the site suffers from a performance perspective and it lowers your pagespeed rank

* caching:
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: ...1981
Really? This does not influence pagerank directly but pagespeed. And *that* influences pagerank

In fact, the latest trend in google search is giving direct preference to pagespeed

* passing the w3c validator may or not may be important
http://validator.w3.org/check?uri=https%3A%2F%2Fwww.sumo.gr%2F2011%2F06%2F01%2Fta-10-kalytera-nhsia-toy-kosmoy%2F&charset=%28detect+automatically%29&doctype=Inline&group=0

Ignoring the various pseudo-attributes for facebook and meta data like rel (and which generally are not considered "braking" there are actual html errors. The most important i could detect is not cleaning double quotes. Take for example the text <<Το πρώτο νέο "σπίτι" της ανθρωπότητας>>. if this is placed in a title attribute (and it is) then the whole element brakes, disrupting the bot's work:
<a href="..." title="Το πρώτο νέο "σπίτι" της ανθρωπότητας" is read like
<a href="..." title="Το πρώτο νέο " σπίτι" της ανθρωπότητας" where <<σπίτι>> is a bad attribute.

also, <<Stray end tag a.>> not good, bots expect quality html

Generally the aggregator's w3c validation is much much better and this may boost them instead of you.

This definitely not an exhaustive list but just some things i saw in a short glance

Thursday 28 April 2011

NHibernate and the missing OperatorProjection

It's been a long while, unfortunately i can't seem to find time to tend to my blog :/

I've been using NHibernate for some years now and although i consider it to be great there are some things that bug me. I've been looking at some questions over at SO (like for example http://stackoverflow.com/questions/4243890/nhibernate-restriction-with-sum-of-2-columns and http://stackoverflow.com/questions/2936700/nhibernate-criteria-how-to-filter-on-combination-of-properties) and it seems that i am not the only one that has this problem.

I consider it odd that not all standard SQL-92 operators -mathematical and maybe bitwise / unary ?- are supported in an intuitive way (there, i said it). In fact there is specific implementation for each of the supported conditional operators via Restrictions (Eq, Lt, Le etc).

I consider the Criteria API the strongest query engine for NHibernate and if you sweeten it with some LINQ candy just to get compilation support you can get the most out of it.

So, i looked about to find out how hard it would be to make an OperatorProjection.

Apparently it wasn't that hard :-)

The solution was to create my own Projection that is responsible to generate the necessary SQL, which derives from NHibernate's own SimpleProjection.

Lets say, for the sake of argument, that i want to return a given property with an added number.

This is pretty easy in hql, its like writing sql:
select t.Id, t.Duration, t.Duration + 5 from WorkSpecification

In vanilla Criteria API its not that simple and it hides a couple of traps:
crit.SetProjection(Projections.Id(),
    Projections.Property("Duration"),
    Projections.SqlProjection("{alias}.DaysDuration + 5",
                              new[] {"DaysDuration"},
                              new[]
                                  {
                                      NHibernateUtil.Int32
                                  }));

Its not simple because you have to write SQL and that means that the "DaysDuration" reference inside the SqlProjection means directly to a Column named "DaysDuration" and that point the ORM looses its power and flexibility.

Secondly the {alias} part commands the Criteria engine to inject at that point the alias of the root entity. That distinctively destroys any hope to make joins and apply the desired projection to the joined row (you can make subqueries via DetachedCriteria but that's not the same).

With OperatorProjection it can be written as

crit.SetProjection(Projections.Id(),
               Projections.Property("Duration"),
               new ArithmeticOperatorProjection("+",
                                                NHibernateUtil.Int32,
                                                Projections.Property("Duration"), Projections.Constant(5))
);


which generates this piece of SQL
SELECT this_.Id as y0_, this_.DaysDuration as y1_, (this_.DaysDuration + 5) as y2_ FROM WorkSpecification

The ugly part is that you pass the operator symbol as a string (but we may be able to do something about it as you will see in part 2) but then again now you don't need to rely to SQL. The only thing required is knowledge of the domain.

And since the Restrictions API also includes IProjections as parameters now this is also feasable:

zcrit.Add(
    Restrictions.Eq(new ArithmeticOperatorProjection("+",
                                    NHibernateUtil.Int32,
                                    Projections.Property("Duration"), Projections.Constant(5)
                                    )
                                    , 6
         );

which generates this piece of SQL
SELECT this_.(…) FROM WorkSpecification this_ WHERE (this_.DaysDuration + @p0) = @p1; @p0 = 5, @p1 = 6

Interestingly and as a side benefit Microsoft's SQL Server also uses the + operator to concatenate strings. And the result of the experiment was...

zcrit.Add(
        Restrictions.Eq(new ArithmeticOperatorProjection("+",
                            NHibernateUtil.String,
                            Projections.Property("FirstName"), Projections.Constant(" "), Projections.Property("LastName")
                            )
                        , “foo bar”
                        )
    );

which generates this piece of SQL
SELECT this_.(…) FROM PersonSpecification this_ WHERE (this_.FirstName + @p0 + this_.LastName) = @p1;
@p0 = ' ', @p1 = 'foo bar'

So what makes all this happen? Check it out below:
public abstract class OperatorProjection : SimpleProjection
    {
        private readonly IProjection[] args;
        private readonly IType returnType;
        
        private string op;
        private string Op { get { return op; }
            set
            {
                var trimmed = value.Trim();
                if (System.Array.IndexOf(AllowedOperators, trimmed) == -1)
                    throw new ArgumentOutOfRangeException("value", trimmed, "Not allowed operator");
                op = " " + trimmed + " ";
            }
        }

        public abstract string[] AllowedOperators { get; }

        protected OperatorProjection(string op, IType returnType, params IProjection[] args)
        {
            this.Op = op;    
            this.returnType = returnType;
            this.args = args;
        }

        public override SqlString ToSqlString(ICriteria criteria, int position, ICriteriaQuery criteriaQuery, IDictionary<string, IFilter> enabledFilters)
        {
            SqlStringBuilder sb = new SqlStringBuilder();
            sb.Add("(");

            for (int i = 0; i < args.Length; i++)
            {
                int loc = (position + 1) * 1000 + i;
                SqlString projectArg = GetProjectionArgument(criteriaQuery, criteria, args[i], loc, enabledFilters);
                sb.Add(projectArg);

                if (i < args.Length - 1)
                    sb.Add(Op);
            }
            sb.Add(")");
            sb.Add(" as ");
            sb.Add(GetColumnAliases(position)[0]);
            return sb.ToSqlString();
        }

        private static SqlString GetProjectionArgument(ICriteriaQuery criteriaQuery, ICriteria criteria,
                                                       IProjection projection, int loc,
                                                       IDictionary<string, IFilter> enabledFilters)
        {
            SqlString sql = projection.ToSqlString(criteria, loc, criteriaQuery, enabledFilters);
            return StringHelper.RemoveAsAliasesFromSql(sql);
        }

        public override IType[] GetTypes(ICriteria criteria, ICriteriaQuery criteriaQuery)
        {
            return new IType[] { returnType };
        }

        public override bool IsAggregate
        {
            get { return false; }
        }

        public override bool IsGrouped
        {
            get
            {
                foreach (IProjection projection in args)
                {
                    if (projection.IsGrouped)
                    {
                        return true;
                    }
                }
                return false;
            }
        }

        public override SqlString ToGroupSqlString(ICriteria criteria, ICriteriaQuery criteriaQuery, IDictionary<string, IFilter> enabledFilters)
        {
            SqlStringBuilder buf = new SqlStringBuilder();
            foreach (IProjection projection in args)
            {
                if (projection.IsGrouped)
                {
                    buf.Add(projection.ToGroupSqlString(criteria, criteriaQuery, enabledFilters)).Add(", ");
                }
            }
            if (buf.Count >= 2)
            {
                buf.RemoveAt(buf.Count - 1);
            }
            return buf.ToSqlString();
        }
    }

Which is not really complicated and in fact is modified code from NHibernate's own Projections.SqlFunction(), and is a handy-dandy abstract class in case some one wants to implement some other operator. Note that the above class expects pairs of projections so it is not suitable for Unary Operators, although the only thing that has to change is the ToSqlString() method.

So the actual usable implementations look like this:

public class ArithmeticOperatorProjection : OperatorProjection
    {
        public ArithmeticOperatorProjection(string op, IType returnType, params IProjection[] args)
            : base(op, returnType, args)
        {
            if (args.Length < 2)
                throw new ArgumentOutOfRangeException("args", args.Length, "Requires at least 2 projections");
        }

        public override string[] AllowedOperators
        {
            get { return new[] { "+", "-", "*", "/", "%" }; }
        }
    }

    public class BitwiseOperatorProjection : OperatorProjection
    {
        public BitwiseOperatorProjection(string op, IType returnType, params IProjection[] args)
            : base(op, returnType, args)
        {
            if (args.Length < 2)
                throw new ArgumentOutOfRangeException("args", args.Length, "Requires at least 2 projections");
        }

        public override string[] AllowedOperators
        {
            get { return new[] { "&", "|", "^" }; }
        }
    }

You might notice that during sql generation i explicitly place the generated sql inside parenthesis () because, well they don't really hurt :-P.
Jokes aside, i haven't tested this in enough complicated scenarios to see if it behaves properly without parenthesis. Besides that, in the case of using multiple operators side by side, there is operator precedence that has to be taken into account, and at least for now there is no code to detect and/or fix that.

So it all boils down to the modular nature of NHibernate and the ability to write your own tools without having to necessarily modify the source. Can you say "win" ?