Graph Analytics

Mr. Polisetty Sairaghuram
Apr 22, 2023


Analyzing graphs

Graph analytics is the practice of analyzing and interpreting data represented as a graph or network. Graphs come in many kinds, including social networks, transportation networks, and biological networks. Graph analytics studies the structure of these graphs to uncover patterns and interactions between their nodes (or vertices) and edges (or links).

Problem statement:

Given a set of nodes (users) in a social network graph, the goal is to identify the influential (important) users and to estimate the likelihood of a future link (edge) between two nodes that are not connected in the current state of the graph.
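The problem statement has two parts: ranking users by importance and finding unconnected pairs whose future link we want to estimate. A minimal sketch of both on a toy graph follows. It assumes degree centrality (number of friends) as the importance measure, which the post does not specify, and the graph and the helper names `build_adjacency`, `top_influencers`, and `candidate_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build undirected adjacency sets from (u, v) edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def top_influencers(adj, k):
    """Rank users by degree (assumption: degree = importance), ties broken by id."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]


def candidate_pairs(adj):
    """List every pair of users with no edge between them.

    These are the pairs a link predictor would score. Isolated users
    (no edges at all) are not in `adj` and are skipped in this sketch.
    """
    nodes = sorted(adj)
    return [(u, v) for u, v in combinations(nodes, 2) if v not in adj[u]]


# Hypothetical toy friendship graph.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
adj = build_adjacency(edges)
print(top_influencers(adj, 2))  # [0, 1]
print(candidate_pairs(adj))     # [(0, 4), (1, 3), (1, 4), (2, 3), (2, 4)]
```

On a graph the size of the dataset, listing all unconnected pairs grows quadratically, so in practice one would score a sample of them or only pairs within a few hops.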

MOTIVATION:
The edges in the problem statement can represent friendship, collaboration, following, or mutual interests. Here, we analyze the graph and build our model specifically on Facebook’s social network. Applications include the following:
Friend recommendation for a specific user, the most general application.
Predicting hidden relationships in a social network organized by terrorists, and identifying its leaders or key influencers.
Targeted product marketing: marketing through highly influential people and discovering potential customers.
Suggesting possible contacts or collaborations within an organization that have not yet been recognized.
Discovering protein interactions in bioinformatics through link prediction.
The same approach can be extended or modified to meet the demands of other social networks such as Twitter and Google+.

Data set:

http://snap.stanford.edu/data/egonets-Gplus.html

This dataset is made up of Facebook’s ‘circles’ (or ‘friends lists’).

This anonymized dataset contains node profiles, circles, and ego networks.

The edges are undirected.

It contains 10 ego networks with 193 circles and 4,039 users.

Node characteristics are described using the format [Type]:[Subtype]:attributeName.

The graphic below depicts an example of the attribute and feature array generation technique.
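The feature-array step can be sketched in plain Python. This is a hypothetical illustration: the attribute strings are invented (the real values are anonymized), and it assumes each node's feature array is a binary vector indexed by a dictionary of every feature name seen in the data. The post mentions a dictionary of 230 features but does not show how it is built.

```python
def parse_feature(name):
    """Split a '[Type]:[Subtype]:attributeName' string into its three parts."""
    ftype, subtype, attribute = name.split(":", 2)
    return ftype, subtype, attribute


def build_dictionary(node_features):
    """Assign a column index to every distinct feature name on any node."""
    names = sorted({f for feats in node_features.values() for f in feats})
    return {name: i for i, name in enumerate(names)}


def feature_array(feats, dictionary):
    """Binary vector: 1 in the column of each feature the node has, else 0."""
    vec = [0] * len(dictionary)
    for f in feats:
        vec[dictionary[f]] = 1
    return vec


# Hypothetical node profiles.
node_features = {
    "u1": ["education:school:id_1", "work:employer:id_7"],
    "u2": ["education:school:id_1", "location:city:id_3"],
}
dictionary = build_dictionary(node_features)
print(parse_feature("work:employer:id_7"))             # ('work', 'employer', 'id_7')
print(feature_array(node_features["u1"], dictionary))  # [1, 0, 1]
print(feature_array(node_features["u2"], dictionary))  # [1, 1, 0]
```

Sorting the names makes the column order deterministic, so the same dictionary yields the same array layout across runs.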

LINK PREDICTION:

We applied the following machine learning approach, based on our survey [1], usability criteria, and experiments:

Support Vector Machine: The Support Vector Machine (SVM) is a supervised learning technique used for classification, regression, and outlier detection. It works by finding the optimal hyperplane separating two classes in a high-dimensional space. SVMs are widely used in pattern recognition and machine learning applications.

The core principle of SVM is to find the hyperplane that maximizes the margin between the two classes. The margin is the distance between the hyperplane and the nearest data point of each class, and the optimal hyperplane is the one that maximizes this distance. SVM can handle data that is not linearly separable by using a kernel function to map the input into a higher-dimensional space.
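In standard textbook notation (not taken from the post), with labels $y_i \in \{-1, +1\}$ and a hyperplane $\mathbf{w}^{\top}\mathbf{x} + b = 0$, the hard-margin SVM solves:

$$
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,\qquad i = 1,\dots,n
$$

The closest points satisfy $y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) = 1$ and lie at distance $1/\lVert\mathbf{w}\rVert$ from the hyperplane, so the margin between the two classes is $2/\lVert\mathbf{w}\rVert$, and minimizing $\lVert\mathbf{w}\rVert$ maximizes it. With a kernel $K$ the trained classifier predicts as below, shown with the RBF kernel as one common choice:

$$
f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{n} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big),
\qquad
K(\mathbf{x}, \mathbf{x}') = \exp\left(-\gamma\, \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)
$$

Here $\alpha_i$ are the learned dual coefficients, nonzero only for the support vectors. For this post's labels, 1 (linked) and 0 (unlinked) play the roles of $+1$ and $-1$.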

The basic idea is to separate the two classes using a hyperplane.

There are two classes here: linked and unlinked.
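Training examples for the two classes can be taken from the graph itself. A minimal sketch, assuming (the post does not say) that the unlinked class is a random sample of non-adjacent pairs equal in size to the set of existing edges, which keeps the classes balanced; `labeled_pairs` and the toy graph are hypothetical.

```python
import random
from itertools import combinations


def labeled_pairs(nodes, edges, seed=0):
    """Return [(u, v, label)]: label 1 for existing edges, 0 for sampled non-edges.

    Pairs are stored as (smaller, larger) because the graph is undirected.
    Assumption: negatives are sampled uniformly from all non-adjacent pairs.
    """
    linked = {tuple(sorted(e)) for e in edges}
    unlinked = [p for p in combinations(sorted(nodes), 2) if p not in linked]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    negatives = rng.sample(unlinked, min(len(linked), len(unlinked)))
    return [(u, v, 1) for u, v in sorted(linked)] + [(u, v, 0) for u, v in negatives]


# Hypothetical toy graph: 6 users, 4 friendships.
nodes = range(6)
edges = [(0, 1), (2, 0), (1, 2), (3, 4)]
pairs = labeled_pairs(nodes, edges)
print(pairs[:4])  # [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1)]
```

Each (u, v) pair would then be turned into a feature vector (for example from the two users' feature arrays) before training the classifier.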

Input: A graph dataset with labels attached (one for each ego network).
Features: A dictionary of 230 features.
Output: The predicted association between two nodes [x, y]: 0 if there is no association and 1 otherwise.
We split the dataset in a 2:1:1 ratio for train:validation:test.
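The 2:1:1 split and the SVM classifier can be sketched with NumPy and scikit-learn, both listed among the post's libraries. This is an illustration under stated assumptions, not the author's code: the data are synthetic, the pair features (one binary column per dictionary feature, meant to mimic "both users share this attribute") are hypothetical, and the linear kernel and the grid of `C` values are guesses because the post does not give its SVM settings. `split_2_1_1` is a hypothetical helper.

```python
import numpy as np
from sklearn.svm import SVC


def split_2_1_1(X, y, seed=0):
    """Shuffle the rows, then split them 2:1:1 into train, validation, and test."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_train, n_val = len(y) // 2, len(y) // 4
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_val]
    te = idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])


# Synthetic stand-in (hypothetical): one row per node pair, 230 binary columns.
# Linked pairs get a shared attribute with probability 0.4, unlinked pairs 0.1.
rng = np.random.default_rng(42)
n_per_class, n_features = 400, 230
linked = rng.random((n_per_class, n_features)) < 0.4
unlinked = rng.random((n_per_class, n_features)) < 0.1
X = np.vstack([linked, unlinked]).astype(float)
y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = linked, 0 = unlinked

(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = split_2_1_1(X, y)

# Choose the regularization strength C using the validation set only.
best_model, best_acc = None, -1.0
for C in (0.01, 0.1, 1.0):
    model = SVC(kernel="linear", C=C).fit(X_tr, y_tr)
    acc = model.score(X_va, y_va)
    if acc > best_acc:
        best_model, best_acc = model, acc

# The held-out test set is used once, after model selection.
predictions = best_model.predict(X_te)  # 0 = no association, 1 = link predicted
print("validation accuracy:", best_acc)
print("test accuracy:", best_model.score(X_te, y_te))
```

Keeping the validation set separate from the test set means the reported test accuracy is not inflated by the choice of `C`; swapping in `kernel="rbf"` would let the same loop handle non-linearly separable features.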

Python libraries used in the code:

Plotly: A graphing framework for creating interactive, publication-quality graphs online.

igraph: A suite of network analysis tools with a focus on efficiency, portability, and usability. igraph is open source and free.

NumPy: Adds support for large, multi-dimensional arrays and matrices, along with a wide set of high-level mathematical functions for working with these arrays. It is a scikit-learn dependency.

SciPy: An open-source Python library for scientific and technical computing. It is a scikit-learn dependency.

scikit-learn: A simple and efficient tool for data mining and data analysis, used to reduce dimensionality and to implement the machine learning methods.
