Classification Functions

Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-Based Approach.

Classification_functions.py - no_common_node_types

From the Web Information Retrieval project



We start by printing all the common nodes in the common_nodes list.

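    print(common_nodes)
    all_types = list()   # one {"type", "count", "score"} dict per distinct type
    not_in = False       # set to True when no candidate type maps to a target class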

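For each node in the list we extract its types and store them in a list; as usual, we first remove the “Thing” type if it is present. Then, for each remaining type of that node, we check whether it already appears in all_types. If it does not, we insert it with a count of 1 and the score of the current node.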

    for elem in common_nodes:
        types = list(elem[1])
        # list.remove raises ValueError when the item is missing, so check first.
        if "Thing" in types:
            types.remove("Thing")

        for t in types:

            there = next((item for item in all_types if item["type"] == t), False)

            if not there:
                to_insert = {"type": t, "count": 1, "score": elem[2]}
                all_types.append(to_insert)

If the type is already present, we increment its counter and update its score. How the score is updated depends on whether the type is one of those already tweeted by the author of the tweet.

            else:
                # Look up this type once among the (type, probability) pairs stored for the author.
                couple = next((item for item in p.get(author) if item[0] == t), False)
                if couple:
                    print("Type " + t + " is present with prob. : " + str(couple[1]))
                    to_change = all_types[all_types.index(there)]
                    to_change["count"] = to_change["count"] + 1
                    print(t,to_change["score"])

If the type appears in the data structure holding the author's probabilities, the increment is the score of the current node multiplied by (1 + 1000 * probability). The overall score therefore grows much faster, since this type is more likely to be the correct one for the tweet.

                    to_change["score"] = to_change["score"] + (elem[2]*(1 + (couple[1]*1000)))
                    print(t,to_change["score"])
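To make the effect of the multiplier concrete, here is a minimal sketch of the increment applied in this branch. The helper name boosted_increment and the sample numbers are illustrative assumptions, not part of the project code; only the formula is taken from the line above.

```python
def boosted_increment(node_score, author_prob):
    """Increment added to a type's score when the author has used that type.

    Mirrors elem[2] * (1 + couple[1] * 1000) from the snippet above;
    the function name and arguments are hypothetical.
    """
    return node_score * (1 + author_prob * 1000)


# Illustrative values only: a node score of 2.0 and an author probability of 0.25.
plain = 2.0                              # increment when the author never used the type
boosted = boosted_increment(2.0, 0.25)   # 2.0 * (1 + 250)
print(plain, boosted)
```

With a probability of 0 the boosted increment equals the plain node score, so the two branches agree at that boundary.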

If the type is not among those used by the author of the current tweet, we simply add the score of the current node in the common_nodes list.

                else:
                    to_change = all_types[all_types.index(there)]
                    to_change["count"] = to_change["count"] + 1
                    print(t, to_change["score"])

                    to_change["score"] = to_change["score"] + elem[2]
                    print(t, to_change["score"])
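The accumulation described so far can be sketched as a single self-contained function. This is only an approximation under stated assumptions: each element of common_nodes is taken to be a (node, types, score) tuple, the author's probabilities are passed as a plain dict (a hypothetical stand-in for p.get(author)), and a dict keyed by type replaces the linear search through all_types. The name aggregate_type_scores is invented for the example.

```python
def aggregate_type_scores(common_nodes, author_probs):
    """Accumulate a count and a score per type over all common nodes.

    common_nodes: iterable of (node, types, score) tuples (assumed layout).
    author_probs: dict mapping type -> probability for the tweet's author
                  (hypothetical stand-in for p.get(author)).
    """
    totals = {}
    for _node, types, score in common_nodes:
        for t in types:
            if t == "Thing":
                continue  # the generic root type is ignored
            if t not in totals:
                # First occurrence: store the node score as is.
                totals[t] = {"type": t, "count": 1, "score": score}
                continue
            entry = totals[t]
            entry["count"] += 1
            prob = author_probs.get(t)
            if prob:
                # Type already used by the author: boosted increment.
                entry["score"] += score * (1 + prob * 1000)
            else:
                entry["score"] += score
    # Highest score first.
    return sorted(totals.values(), key=lambda e: e["score"], reverse=True)


# Made-up sample data.
nodes = [
    ("n1", ["Thing", "Person", "Athlete"], 1.0),
    ("n2", ["Thing", "Person"], 2.0),
    ("n3", ["Athlete"], 1.0),
]
ranked = aggregate_type_scores(nodes, {"Athlete": 0.5})
print(ranked)
```

In this sample, "Athlete" overtakes "Person" even though both appear twice, because the second occurrence of "Athlete" receives the author-based boost.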

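The all_types list is sorted by score in descending order, so that the most likely type for the tweet comes first. Then we apply the same reasoning as in the one_node_type function: we walk down the ranking until a type maps to one of the target classes, and fall back to “Thing” if none does.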

    all_types = sorted(all_types, key=lambda i: i['score'], reverse=True)

    print(all_types)
    predicted_tag = ""

    # If all_types is empty, we fall back to the "Thing" class.
    if not all_types:
        predicted_tag = "Thing"

    else:
        select_type = all_types
        predicted_tag = "\\"

        while predicted_tag not in target_names:
            print(predicted_tag)
            if not select_type:
                not_in = True
                break

            predicted_tag = class_adjust(select_type[0].get('type'))
            select_type.pop(0)

        if not_in:
            # predicted_tag = all_types[0].get('type')
            predicted_tag = "Thing"

    return predicted_tag
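The selection step can also be written as a plain loop over the ranked types, which avoids the sentinel value and does not modify the list. The sketch below is an assumption-laden illustration: pick_tag is a made-up name, and the target_names list and the adjust function are hypothetical stand-ins, since the real target_names and class_adjust are not shown in this section.

```python
def pick_tag(ranked_types, target_names, class_adjust, fallback="Thing"):
    """Return the first adjusted type that is a known target class.

    ranked_types: list of {"type": ...} dicts, best candidate first.
    class_adjust: callable mapping a raw type to a candidate class name.
    """
    for entry in ranked_types:
        candidate = class_adjust(entry["type"])
        if candidate in target_names:
            return candidate
    # No candidate matched, or the list was empty.
    return fallback


# Hypothetical stand-ins for the project's target_names and class_adjust.
target_names = ["Person", "Place", "Thing"]


def adjust(t):
    return {"Athlete": "Person", "City": "Place"}.get(t, t)


ranked = [{"type": "Band"}, {"type": "Athlete"}, {"type": "City"}]
print(pick_tag(ranked, target_names, adjust))  # Person
print(pick_tag([], target_names, adjust))      # Thing
```

Both the empty list and a list with no matching type end in the same fallback, which corresponds to the two places where the original code assigns "Thing".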