import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd  # needed for the node feature table built at the end of this section

Graph Featurization#

Raw graph structure alone is not always enough for machine learning. Featurization converts graph structure and node/edge properties into numeric vectors that models can consume.

Why Featurization?#

Graphs are inherently relational, but most ML models expect fixed-size numeric inputs. Featurization bridges this gap.

Types of Graph Features#

There are four main types:

| Type | What it captures | Examples |
|---|---|---|
| 1. Graph-level features | Properties of the whole graph | Diameter, density, spectral features |
| 2. Edge features | Properties of connections | Weight, transaction amount, timestamp |
| 3. Node features | Properties of individual nodes | Age, degree, label |
| 4. Structural features | Topological role of a node | Centrality, clustering coefficient, motif count |

In Graph Neural Networks, the model learns to aggregate these features across multi-hop neighborhoods automatically.
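
To make "aggregating features across multi-hop neighborhoods" concrete, here is a minimal sketch of a fixed mean-aggregation step. It is not a GNN: a real model would add learned weights and non-linearities. The toy graph `G_toy`, the feature matrix `X`, and the names `A_mean`, `H1`, and `H2` are assumptions made up for this illustration.

```python
import networkx as nx
import numpy as np

# Hypothetical toy graph: a path 0 - 1 - 2 - 3
G_toy = nx.path_graph(4)

# One scalar feature per node (here simply the node's degree)
X = np.array([[G_toy.degree(n)] for n in G_toy.nodes()], dtype=float)

# Adjacency matrix plus self-loops, so each node keeps its own feature
A = nx.to_numpy_array(G_toy) + np.eye(G_toy.number_of_nodes())

# Row-normalise: each row now averages over a node and its neighbours
A_mean = A / A.sum(axis=1, keepdims=True)

H1 = A_mean @ X   # each node sees its 1-hop neighbourhood
H2 = A_mean @ H1  # a second round reaches 2-hop neighbourhoods

print(H1.ravel())
print(H2.ravel())
```

Each extra multiplication by `A_mean` widens the neighborhood a node draws information from by one hop, which is the intuition behind stacking GNN layers.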

1. Graph-Level Features (Global View)#

Describe the entire graph as a whole.

What they capture:#

  • Overall connectivity

  • Global structure

  • Network complexity

Common Features:#

  • Diameter → longest shortest path in the graph

  • Average path length → average distance between nodes

  • Density → how many edges exist vs possible edges

  • Modularity → strength of community structure

Example:#

  • Social network: High density → many people are connected; Low diameter → information spreads quickly.

Used in: Graph classification (e.g., molecule type, network type).
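
The graph-level features listed above can be computed with NetworkX roughly as follows. This is a sketch on a hypothetical toy graph (`G_demo`, two triangles joined by a bridge), not on the graph used elsewhere in this notebook. Modularity is always defined relative to a partition, so the sketch assumes communities found by greedy modularity maximisation; another community-detection method could give a different value.

```python
import networkx as nx
from networkx.algorithms import community

# Hypothetical toy graph: two triangles joined by a single bridge edge
G_demo = nx.Graph()
G_demo.add_edges_from([
    ("A", "B"), ("B", "C"), ("A", "C"),   # first triangle
    ("D", "E"), ("E", "F"), ("D", "F"),   # second triangle
    ("C", "D"),                           # bridge
])

graph_features = {
    "diameter": nx.diameter(G_demo),  # requires a connected graph
    "avg_path_length": nx.average_shortest_path_length(G_demo),
    "density": nx.density(G_demo),
}

# Modularity needs a partition of the nodes into communities
communities = community.greedy_modularity_communities(G_demo)
graph_features["modularity"] = community.modularity(G_demo, communities)

for name, value in graph_features.items():
    print(f"{name}: {value:.3f}")
```

The resulting dictionary is one fixed-length feature vector per graph, which is the shape a graph classifier expects.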

2. Edge-Level Features (Relationship View)#

Describe connections between two nodes.

What they capture:#

  • Strength or nature of relationships

  • Likelihood of future connections

Common Features:#

  • Weight → distance, cost, interaction frequency

  • Common neighbors → shared connections

  • Jaccard similarity → similarity between neighborhoods

  • Edge betweenness → importance of an edge

Example:#

  • E-commerce:

    • Edge = user → product

    • Weight = number of purchases

  • Social network:

    • Two users with many common friends → high similarity

Used in: Link prediction, Recommendation systems, Fraud detection.
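
A minimal sketch of the edge-level features above, assuming a hypothetical weighted toy graph `G_demo` in which weights stand in for interaction counts. The pair `("A", "D")` is an arbitrary non-connected pair chosen to illustrate the link-prediction scores.

```python
import networkx as nx

# Hypothetical toy graph; weights stand in for interaction counts
G_demo = nx.Graph()
G_demo.add_weighted_edges_from([
    ("A", "B", 3), ("A", "C", 1), ("B", "C", 2),
    ("B", "D", 4), ("C", "D", 1), ("D", "E", 5),
])

u, v = "A", "D"  # a pair of nodes that is not (yet) connected

# Weight: stored as an edge attribute
weight_ab = G_demo["A"]["B"]["weight"]

# Common neighbours: nodes adjacent to both u and v
common = sorted(nx.common_neighbors(G_demo, u, v))

# Jaccard similarity: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
_, _, jaccard = next(iter(nx.jaccard_coefficient(G_demo, [(u, v)])))

# Edge betweenness: share of shortest paths that use each edge
# (weights are ignored here because no weight argument is passed)
edge_betw = nx.edge_betweenness_centrality(G_demo)
top_edge = max(edge_betw, key=edge_betw.get)

print("weight(A, B):", weight_ab)
print("common neighbours of A and D:", common)
print("Jaccard(A, D):", round(jaccard, 3))
print("edge with highest betweenness:", top_edge)
```

In a link-prediction setting, scores such as the common-neighbor count and Jaccard similarity would be computed for many candidate pairs and used as input features.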

3. Node-Level Features (Local View — Focus)#

Describe individual nodes. This section focuses on two families of node-level features: degree and centrality.

(a) Degree:#

The number of connections a node has. In a directed graph, degree splits into two types:

  • In-degree → incoming edges

  • Out-degree → outgoing edges

Example:#

  • Twitter:

    • High in-degree → many followers

    • High out-degree → follows many people

Interpretation: High degree → more connected → potentially more influence
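
The in-degree and out-degree distinction can be sketched on a small directed graph. The "follows" graph `D` and its user names are hypothetical; an edge `u -> v` is assumed to mean "u follows v".

```python
import networkx as nx

# Hypothetical "follows" graph: an edge u -> v means u follows v
D = nx.DiGraph()
D.add_edges_from([
    ("ann", "zoe"), ("bob", "zoe"), ("cat", "zoe"),
    ("zoe", "ann"), ("bob", "ann"),
])

in_deg = dict(D.in_degree())    # number of followers
out_deg = dict(D.out_degree())  # number of accounts followed

for user in D.nodes():
    print(f"{user}: followers={in_deg[user]}, following={out_deg[user]}")
```

For an undirected `nx.Graph`, only `G.degree()` is available, since edges have no direction.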

(b) Centrality (Covered in the Next Section)#

Centrality measures capture node importance in the network.

  • Degree Centrality: Measures importance based on the number of direct connections a node has.

  • Closeness Centrality: Measures how quickly a node can reach all other nodes in the network.

  • Betweenness Centrality: Measures how often a node lies on the shortest paths between other nodes (bridge nodes).

  • Eigenvector Centrality: Measures importance by considering not just connections, but how important those neighbors are.
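
As a quick preview, all four measures are available as NetworkX functions that return a `{node: score}` dictionary. The sketch below assumes a hypothetical toy graph `G_demo` (a triangle A-B-C with a tail C-D-E); it is not the graph used elsewhere in this notebook.

```python
import networkx as nx

# Hypothetical toy graph: triangle A-B-C with a tail C-D-E
G_demo = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])

centrality = {
    "degree": nx.degree_centrality(G_demo),            # degree / (n - 1)
    "closeness": nx.closeness_centrality(G_demo),      # inverse average distance
    "betweenness": nx.betweenness_centrality(G_demo),  # share of shortest paths through the node
    "eigenvector": nx.eigenvector_centrality(G_demo),  # weighted by neighbours' scores
}

for name, scores in centrality.items():
    top = max(scores, key=scores.get)
    print(f"{name:12s} top node: {top} ({scores[top]:.3f})")
```

In this toy graph node C ranks first on every measure; on larger graphs the measures often disagree, which is why they are typically used together as separate features.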

4. Structural Features (Topological Role)#

Describe how a node is positioned in the overall structure.

What they capture:#

  • Local neighborhood patterns

  • Role of node in graph structure

Common Features:#

  • Clustering coefficient → how connected neighbors are

  • Triangle count → number of triangles a node participates in

  • Motifs → recurring subgraph patterns

Example:#

  • Social network: High clustering → tight friend group

  • Fraud detection: Unusual patterns → suspicious behavior
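
A minimal sketch of the clustering coefficient and triangle count, again on a hypothetical toy graph `G_demo` (a triangle A-B-C with a tail C-D-E). Triangles are the simplest motif; counting larger motifs generally needs dedicated subgraph-matching code and is not shown here.

```python
import networkx as nx

# Hypothetical toy graph: triangle A-B-C with a tail C-D-E
G_demo = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])

# Clustering coefficient: fraction of a node's neighbour pairs that are connected
clustering = nx.clustering(G_demo)

# Triangle count: number of triangles each node takes part in
triangles = nx.triangles(G_demo)

# Every triangle is counted once at each of its three corners
total_triangles = sum(triangles.values()) // 3

for node in G_demo.nodes():
    print(f"{node}: clustering={clustering[node]:.3f}, triangles={triangles[node]}")
print("triangles in graph:", total_triangles)
```

Node A sits inside the tight group (clustering 1.0), while C bridges the group and the tail, so only one of its three neighbor pairs is connected.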

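# ── Build a feature table for each node ─────────────────────────────────────
# Assumes G is the undirected NetworkX graph built earlier in this notebook.
# Whole-graph measures are computed once here rather than inside the loop;
# calling nx.betweenness_centrality(G) per node would repeat the same
# all-pairs shortest-path computation on every iteration.
clustering  = nx.clustering(G)
closeness   = nx.closeness_centrality(G)
betweenness = nx.betweenness_centrality(G)

features = {}
for node in G.nodes():
    # Average degree of the node's direct neighbors (0 for isolated nodes)
    neigh_degrees = [G.degree(nb) for nb in G.neighbors(node)]
    avg_nb_deg = np.mean(neigh_degrees) if neigh_degrees else 0

    features[node] = {
        'Degree':                      G.degree(node),
        'Clustering Coeff':            round(clustering[node],  3),
        'Closeness Centrality':        round(closeness[node],   3),
        'Betweenness Centrality':      round(betweenness[node], 3),
        'Avg Neighbor Degree':         round(avg_nb_deg, 2),
    }

# Rows = nodes, columns = features
df_features = pd.DataFrame(features).T
df_features.index.name = 'Node'
print("=== Node Feature Matrix (structural features) ===")
print(df_features.to_string())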

# Heatmap
fig, ax = plt.subplots(figsize=(10, 4))
data_norm = df_features.apply(lambda col: (col - col.min()) / (col.max() - col.min() + 1e-9))
im = ax.imshow(data_norm.T.values, cmap='YlGn', aspect='auto', vmin=0, vmax=1)
ax.set_xticks(range(len(df_features)))
ax.set_yticks(range(len(df_features.columns)))
ax.set_xticklabels(df_features.index, fontsize=11)
ax.set_yticklabels(df_features.columns, fontsize=10)
for i, col in enumerate(df_features.columns):
    for j, node in enumerate(df_features.index):
        ax.text(j, i, df_features.loc[node, col], ha='center', va='center', fontsize=9)
ax.set_title('Node Feature Heatmap (darker = higher value)', fontsize=12, fontweight='bold', pad=12)
plt.colorbar(im, ax=ax, fraction=0.02)
plt.tight_layout()
plt.show()
=== Node Feature Matrix (structural features) ===
         Degree  Clustering Coeff  Closeness Centrality  Betweenness Centrality  Avg Neighbor Degree
Node                                                                                                
You         2.0             1.000                 0.556                    0.00                 3.00
Alice       3.0             0.333                 0.714                    0.30                 2.67
Bob         3.0             0.000                 0.714                    0.45                 2.00
Charlie     3.0             0.333                 0.625                    0.15                 2.33
David       1.0             0.000                 0.455                    0.00                 3.00
Eve         2.0             0.000                 0.625                    0.10                 3.00
[Figure: Node Feature Heatmap (darker = higher value); nodes on the x-axis, features on the y-axis]