Agent skill
building-role-mining-for-rbac-optimization
Apply bottom-up and top-down role mining techniques to discover optimal RBAC roles from existing user-permission assignments, reducing role explosion and enforcing least privilege.
Install this agent skill to your Project
npx add-skill https://github.com/mukul975/Anthropic-Cybersecurity-Skills/tree/main/skills/building-role-mining-for-rbac-optimization
SKILL.md
Building Role Mining for RBAC Optimization
Overview
Role mining is the process of analyzing existing user-permission assignments to discover optimal roles for a Role-Based Access Control (RBAC) system. Organizations accumulate excessive permissions over time through job changes, project assignments, and ad-hoc access grants, leading to "role explosion" where thousands of granular roles exist with significant overlap. Role mining uses data analysis -- including clustering algorithms, formal concept analysis, and graph-based methods -- to consolidate permissions into a minimal set of roles that accurately represent business functions while enforcing least privilege.
When to Use
- When deploying or configuring building role mining for rbac optimization capabilities in your environment
- When establishing security controls aligned to compliance requirements
- When building or improving security architecture for this domain
- When conducting security assessments that require this implementation
Prerequisites
- Export of current user-permission assignments (CSV/database)
- Identity governance platform or directory service access
- Python 3.9+ with pandas, scikit-learn, numpy
- Understanding of organizational structure and job functions
- Stakeholder access for role validation workshops
Core Concepts
Role Mining Approaches
| Approach | Description | Best For |
|---|---|---|
| Bottom-Up | Analyze existing permissions to discover common patterns | Large datasets with organic permission growth |
| Top-Down | Design roles from business requirements and job descriptions | Greenfield RBAC or organizational restructuring |
| Hybrid | Combine bottom-up analysis with top-down business validation | Most production environments |
Role Mining Algorithms
1. Permission Clustering: Group users with similar permission sets using k-means or hierarchical clustering. Users in the same cluster share a common role.
2. Formal Concept Analysis (FCA): Mathematical framework that identifies complete set of concepts (user groups sharing exact permission sets) from a binary user-permission matrix.
3. Graph-Based Mining: Model users and permissions as a bipartite graph, then find dense subgraphs representing candidate roles.
4. Boolean Matrix Decomposition: Decompose the user-permission matrix U into U ≈ R × P where R maps users to roles and P maps roles to permissions.
Role Mining Metrics
| Metric | Formula | Target |
|---|---|---|
| Role Count | Total distinct roles after mining | Minimize |
| Coverage | Permissions explained by mined roles / Total permissions | > 95% |
| Weighted Structural Complexity (WSC) | Sum of role-user + role-permission assignments | Minimize |
| Deviation | Extra permissions not covered by assigned roles | < 5% |
Workflow
Step 1: Extract User-Permission Data
Collect the current access state from all identity sources:
import pandas as pd
import numpy as np
# Load user-permission assignments
# Format: user_id, permission_id (one row per assignment)
assignments = pd.read_csv("user_permissions.csv")
# Create binary user-permission matrix (UPA matrix)
upa_matrix = assignments.pivot_table(
index="user_id",
columns="permission_id",
aggfunc="size",
fill_value=0
)
upa_matrix = (upa_matrix > 0).astype(int)
print(f"Users: {upa_matrix.shape[0]}")
print(f"Permissions: {upa_matrix.shape[1]}")
print(f"Assignments: {assignments.shape[0]}")
print(f"Density: {upa_matrix.values.sum() / upa_matrix.size:.2%}")
Step 2: Bottom-Up Role Discovery Using Clustering
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
def find_optimal_clusters(matrix, max_k=50):
"""Find optimal number of roles using silhouette analysis."""
scores = []
for k in range(2, min(max_k, matrix.shape[0])):
clustering = AgglomerativeClustering(
n_clusters=k, metric="jaccard", linkage="average"
)
labels = clustering.fit_predict(matrix)
score = silhouette_score(matrix, labels, metric="jaccard")
scores.append((k, score))
optimal_k = max(scores, key=lambda x: x[1])[0]
return optimal_k, scores
def mine_roles_clustering(upa_matrix, n_clusters):
"""Mine roles using hierarchical clustering on Jaccard distance."""
clustering = AgglomerativeClustering(
n_clusters=n_clusters, metric="jaccard", linkage="average"
)
user_matrix = upa_matrix.values
labels = clustering.fit_predict(user_matrix)
roles = {}
for cluster_id in range(n_clusters):
cluster_users = upa_matrix.index[labels == cluster_id]
cluster_permissions = upa_matrix.loc[cluster_users]
# Core role = permissions held by >80% of cluster members
permission_frequency = cluster_permissions.mean()
core_permissions = permission_frequency[permission_frequency >= 0.8].index.tolist()
roles[f"Role_{cluster_id}"] = {
"permissions": core_permissions,
"user_count": len(cluster_users),
"users": cluster_users.tolist(),
"coverage": permission_frequency[permission_frequency >= 0.8].mean()
}
return roles, labels
Step 3: Formal Concept Analysis
def mine_roles_fca(upa_matrix, min_support=3):
"""Mine roles using Formal Concept Analysis (frequent closed itemsets)."""
from itertools import combinations
users = upa_matrix.index.tolist()
permissions = upa_matrix.columns.tolist()
concepts = []
# Find all maximal permission sets shared by at least min_support users
for size in range(len(permissions), 0, -1):
for perm_combo in combinations(permissions, size):
perm_set = set(perm_combo)
# Find users who have ALL permissions in this set
matching_users = []
for user in users:
user_perms = set(upa_matrix.columns[upa_matrix.loc[user] == 1])
if perm_set.issubset(user_perms):
matching_users.append(user)
if len(matching_users) >= min_support:
# Check if this is a closed concept (no superset with same extent)
is_closed = True
for concept in concepts:
if set(matching_users) == set(concept["users"]) and \
perm_set.issubset(set(concept["permissions"])):
is_closed = False
break
if is_closed:
concepts.append({
"permissions": list(perm_set),
"users": matching_users,
"support": len(matching_users)
})
if len(concepts) > 100: # Limit for performance
break
return concepts
Step 4: Evaluate and Select Roles
def evaluate_role_set(roles, upa_matrix):
"""Evaluate the quality of a mined role set."""
total_assignments = upa_matrix.values.sum()
covered_assignments = 0
extra_assignments = 0
for role_name, role_data in roles.items():
role_perms = set(role_data["permissions"])
for user in role_data["users"]:
user_perms = set(upa_matrix.columns[upa_matrix.loc[user] == 1])
covered = role_perms.intersection(user_perms)
extra = role_perms - user_perms
covered_assignments += len(covered)
extra_assignments += len(extra)
metrics = {
"total_roles": len(roles),
"total_assignments": total_assignments,
"covered_assignments": covered_assignments,
"coverage_rate": covered_assignments / total_assignments if total_assignments else 0,
"extra_permissions": extra_assignments,
"deviation_rate": extra_assignments / (covered_assignments + extra_assignments) if (covered_assignments + extra_assignments) else 0,
"avg_role_size": np.mean([len(r["permissions"]) for r in roles.values()]),
"avg_users_per_role": np.mean([r["user_count"] for r in roles.values()]),
}
return metrics
Step 5: Business Validation
After mining candidate roles:
- Map mined roles to business functions (department, job title)
- Conduct workshops with business unit managers to validate role definitions
- Identify outlier permissions that indicate misconfiguration
- Refine roles based on feedback and re-evaluate metrics
- Document role definitions with business justification
Validation Checklist
- User-permission matrix extracted from all identity sources
- Multiple mining algorithms compared (clustering, FCA)
- Optimal role count determined via silhouette analysis or WSC
- Coverage rate exceeds 95% of existing assignments
- Deviation rate below 5% (minimal extra permissions)
- Mined roles validated with business stakeholders
- Role hierarchy defined (parent-child inheritance)
- Exception/outlier permissions documented
- Migration plan created for transitioning to new role model
- Ongoing role governance process defined
References
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
mapping-mitre-attack-techniques
Maps observed adversary behaviors, security alerts, and detection rules to MITRE ATT&CK techniques and sub-techniques to quantify detection coverage and guide control prioritization. Use when building an ATT&CK-based coverage heatmap, tagging SIEM alerts with technique IDs, aligning security controls to adversary playbooks, or reporting threat exposure to executives. Activates for requests involving ATT&CK Navigator, Sigma rules, MITRE D3FEND, or coverage gap analysis.
hunting-for-spearphishing-indicators
Hunt for spearphishing campaign indicators across email logs, endpoint telemetry, and network data to detect targeted email attacks.
analyzing-malicious-url-with-urlscan
URLScan.io is a free service for scanning and analyzing suspicious URLs. It captures screenshots, DOM content, HTTP transactions, JavaScript behavior, and network connections of web pages in an isolat
implementing-zero-standing-privilege-with-cyberark
Deploy CyberArk Secure Cloud Access to eliminate standing privileges in hybrid and multi-cloud environments using just-in-time access with time, entitlement, and approval controls.
implementing-pam-for-database-access
Deploy privileged access management for database systems including Oracle, SQL Server, PostgreSQL, and MySQL. Covers session proxy configuration, credential vaulting, query auditing, dynamic credentia
detecting-t1003-credential-dumping-with-edr
Detect OS credential dumping techniques targeting LSASS memory, SAM database, NTDS.dit, and cached credentials using EDR telemetry, Sysmon process access monitoring, and Windows security event correlation.
Didn't find tool you were looking for?