Let's take an example: if an article mentions Donald and Boris, and two other, separate articles mention Donald and Mike, we'll say that Donald is friends with Boris, and Donald is also friends with Mike, only twice as much. We'll therefore construct a social network in which Donald is linked to both Boris and Mike, with the link to Mike weighted twice as heavily.

Drawing this network gives us more than a pictorial representation of the friendship group; we can also start seeing hidden relationships: although Boris wasn't mentioned in the same article as Mike, we can guess with some certainty that the two are related (and that they are related via their mutual friend, Donald). We can also tell that Donald is the alpha male in the group, having influence over both Mike and Boris.

This is just a bare-bones example of three (very short) news articles. You can imagine the kinds of insights we can draw from hundreds of documents writing about thousands of people.

To achieve this, we'll use two techniques, which I'll briefly touch upon now.

Our first order of business is identifying individuals of interest within articles, which is not so easily done if we don't know whom we're looking for ahead of time (otherwise we could use a simple string search). Luckily for us, Named Entity Recognition exists. Named Entity Recognition is a Natural Language Processing task for extracting information from text. As the name suggests, it recognises entities, such as proper nouns, within unstructured data. It achieves this using statistical models trained on large corpora of documents: the model learns to recognise and categorise entities based on the context in which they appear as words. We'll use one of these models to list, for each article, the words it tags as persons and organisations.

Once we have our list of entities for each article, we'll organise them into a graph structure. Graphs are, by definition, a set of vertices and edges:

G = (V, E)

Where G is our graph, made up of a set of vertices V (or nodes) and a set of edges E (or links).

This article isn't meant as a primer on Graph Theory, but I do want to highlight a few important properties of graphs, which we'll find useful further along the exercise. Graphs can be directed or undirected: a directed graph's nodes are linked with a direction (unsurprisingly), whereas the direction of links is irrelevant for an undirected graph.
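To make the Donald, Boris and Mike example concrete, here is a minimal sketch of how the three articles could be turned into a weighted, undirected graph. It assumes the entity recognition step has already happened: the `articles` list and the `build_cooccurrence_graph` helper are hypothetical names made up for illustration, not part of any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical NER output: the set of people recognised in each article.
articles = [
    {"Donald", "Boris"},
    {"Donald", "Mike"},
    {"Donald", "Mike"},
]


def build_cooccurrence_graph(articles):
    """Return (V, E): a vertex set and a Counter mapping undirected edges to weights."""
    vertices = set()
    edges = Counter()
    for entities in articles:
        vertices.update(entities)
        # Sorting gives every unordered pair a single canonical key,
        # so (Donald, Mike) and (Mike, Donald) count as the same edge.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1  # one more shared article, one more unit of weight
    return vertices, edges


V, E = build_cooccurrence_graph(articles)
print(sorted(V))
for (a, b), weight in sorted(E.items()):
    print(f"{a} -- {b}: weight {weight}")
```

The edge between Donald and Mike ends up with weight 2 and the edge between Donald and Boris with weight 1, while Boris and Mike share no edge at all: their relationship only shows up through Donald.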