Probabilities
Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-Based Approach.
AuthorAndClasses.py
From the Web Information Retrieval project
The navbar links to all the code files and their explanations.
This function computes, for each author, the probability of each type. It reads the tweets, counts how many tweets each
author wrote and how often each type occurs among them, and then stores the resulting probabilities p in a
dictionary indexed by the author's name, where p is:
For author a and type t:
$$p = \frac{\#\,\text{tweets of } a \text{ with type } t}{\#\,\text{tweets of } a}$$
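As a quick illustration of this ratio, the sketch below computes p for a single author. The type labels and the helper name `type_probabilities` are hypothetical and do not come from the project; only the formula does.

```python
from collections import Counter

# Hypothetical type labels of the tweets written by one author.
tags_of_author = ["Person", "Place", "Person", "Organization"]


def type_probabilities(tags):
    """Return {type: count / total} for one author's tweets."""
    total = len(tags)          # #tweets_of_a
    counts = Counter(tags)     # #tweets_of_a_with_type_t, for every t
    return {t: n / total for t, n in counts.items()}


probs = type_probabilities(tags_of_author)
print(probs)  # {'Person': 0.5, 'Place': 0.25, 'Organization': 0.25}
```

The probabilities of one author always sum to 1, because every tweet carries exactly one type.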
Get probabilities
def get_probabilities():
    dataset_paths = [
        "CSV/Sergio_one_label_data.csv",
        "CSV/Gianluca_one_label_data.csv",
        "CSV/Kai_one_label_data.csv",
    ]
    tweet_list = list()
    for path in dataset_paths:
        lst = readSingleLabeledCSV(path)
        tweet_list = tweet_list + lst
    authors = list()
    probabilities = dict()
    for t in tweet_list:
        a = t.get('screen_name')
        c = class_adjust(t.get('single_tag'))
        there = next((item for item in authors if item["author"] == a), False)
        # First, each author is stored in a list of dictionaries called "authors"
        # to count how many tweets author a wrote, together with each class
        # and its number of occurrences.
        if not there:
            classes = list()
            classes.append((c, 1))
            to_insert = {"author": a, "classes": classes, "count": 1}
            authors.append(to_insert)
        else:
            to_change = authors[authors.index(there)]
            to_change["count"] += 1
            to_update = next((item for item in to_change.get("classes") if item[0] == c), False)
            if not to_update:
                to_change.get("classes").append((c, 1))
            else:
                # Tuples are immutable, so replace the (class, count) pair.
                elem = to_change.get("classes")[to_change.get("classes").index(to_update)]
                to_change.get("classes").remove(elem)
                to_add = (elem[0], elem[1] + 1)
                to_change.get("classes").append(to_add)
    # The "authors" list is sorted by author name.
    authors = sorted(authors, key=lambda i: i["author"], reverse=False)
    # Here we create the dictionary indexed by the author's name.
    for elem in authors:
        a = elem.get("author")
        classes = elem.get("classes")
        d = elem.get("count")
        sequence = list()
        for c in classes:
            prob = c[1] / d
            sequence.append((c[0], prob))
        probabilities[a] = sequence
    return probabilities
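The same counting can be sketched more compactly with `collections.Counter`. This is an alternative sketch, not the project's code: the function name `probabilities_from_tweets` and the sample rows are hypothetical, the tweets are passed in as a list instead of being read from the CSV files, and the `class_adjust` step is left out because its behaviour is not shown here. The order of the (class, probability) pairs within an author may also differ from the list built above.

```python
from collections import Counter, defaultdict


def probabilities_from_tweets(tweets):
    """Map each author to a list of (class, probability) pairs.

    `tweets` is assumed to be a list of dicts with 'screen_name' and
    'single_tag' keys, like the rows used by get_probabilities.
    """
    # One Counter of classes per author.
    counts = defaultdict(Counter)
    for t in tweets:
        counts[t["screen_name"]][t["single_tag"]] += 1

    result = {}
    for author in sorted(counts):                  # authors in name order
        total = sum(counts[author].values())       # #tweets_of_a
        result[author] = [(c, n / total) for c, n in counts[author].items()]
    return result


# Hypothetical sample rows, for illustration only.
sample = [
    {"screen_name": "alice", "single_tag": "Person"},
    {"screen_name": "bob", "single_tag": "Place"},
    {"screen_name": "alice", "single_tag": "Place"},
    {"screen_name": "alice", "single_tag": "Person"},
]
result = probabilities_from_tweets(sample)
print(result["bob"])  # [('Place', 1.0)]
```

Keeping one counter per author avoids the linear search through the "authors" list on every tweet, which matters only when the number of authors is large.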