Agent skill

sap-hana-ml

SAP HANA Machine Learning Python Client (hana-ml) development skill. Use when: Building ML solutions with SAP HANA's in-database machine learning using Python hana-ml library for PAL/APL algorithms, DataFrame operations, AutoML, model persistence, and visualization. Keywords: hana-ml, SAP HANA, machine learning, PAL, APL, predictive analytics, HANA DataFrame, ConnectionContext, classification, regression, clustering, time series, ARIMA, gradient boosting, AutoML, SHAP, model storage

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/sap-hana-ml

Metadata

Additional technical details for this skill

version
1.1.0
last verified
1764201600
package version
2.22.241011

SKILL.md

SAP HANA ML Python Client (hana-ml)

Package Version: 2.22.241011
Last Verified: 2025-11-27

Table of Contents


Installation & Setup

bash
pip install hana-ml

Requirements: Python 3.8+, SAP HANA 2.0 SPS03+ or SAP HANA Cloud


Quick Start

Connection & DataFrame

python
from hana_ml import ConnectionContext

# Connect
conn = ConnectionContext(
    address='<hostname>',
    port=443,
    user='<username>',
    password='<password>',
    encrypt=True
)

# Create DataFrame
df = conn.table('MY_TABLE', schema='MY_SCHEMA')
print(f"Shape: {df.shape}")
df.head(10).collect()

PAL Classification

python
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

# Train model
clf = UnifiedClassification(func='RandomDecisionTree')
clf.fit(train_df, features=['F1', 'F2', 'F3'], label='TARGET')

# Predict & evaluate
predictions = clf.predict(test_df, features=['F1', 'F2', 'F3'])
score = clf.score(test_df, features=['F1', 'F2', 'F3'], label='TARGET')

APL AutoML

python
from hana_ml.algorithms.apl.classification import AutoClassifier

# Automated classification
auto_clf = AutoClassifier()
auto_clf.fit(train_df, label='TARGET')
predictions = auto_clf.predict(test_df)

Model Persistence

python
from hana_ml.model_storage import ModelStorage

ms = ModelStorage(conn)
clf.name = 'MY_CLASSIFIER'
ms.save_model(model=clf, if_exists='replace')

Core Libraries

PAL (Predictive Analysis Library)

  • 100+ algorithms executed in-database
  • Categories: Classification, Regression, Clustering, Time Series, Preprocessing
  • Key classes: UnifiedClassification, UnifiedRegression, KMeans, ARIMA
  • See: references/PAL_ALGORITHMS.md for complete list

APL (Automated Predictive Library)

  • AutoML capabilities with automatic feature engineering
  • Key classes: AutoClassifier, AutoRegressor, GradientBoostingClassifier
  • See: references/APL_ALGORITHMS.md for details

DataFrames

  • Lazy evaluation - builds SQL until collect() called
  • In-database processing for optimal performance
  • See: references/DATAFRAME_REFERENCE.md for complete API

Visualizers

  • EDA plots, model explanations, metrics
  • SHAP integration for model interpretability
  • See: references/VISUALIZERS.md for 14 visualization modules

Common Patterns

Train-Test Split

python
from hana_ml.algorithms.pal.partition import train_test_val_split

train, test, val = train_test_val_split(
    data=df,
    training_percentage=0.7,
    testing_percentage=0.2,
    validation_percentage=0.1
)

Feature Importance

python
# APL models
importance = auto_clf.get_feature_importances()

# PAL models
from hana_ml.algorithms.pal.preprocessing import FeatureSelection
fs = FeatureSelection()
fs.fit(train_df, features=features, label='TARGET')

Pipeline

python
from hana_ml.algorithms.pal.pipeline import Pipeline
from hana_ml.algorithms.pal.preprocessing import Imputer, FeatureNormalizer

pipeline = Pipeline([
    ('imputer', Imputer(strategy='mean')),
    ('normalizer', FeatureNormalizer()),
    ('classifier', UnifiedClassification(func='RandomDecisionTree'))
])

Best Practices

  1. Use lazy evaluation - Operations build SQL without execution until collect()
  2. Leverage in-database processing - Keep data in HANA for performance
  3. Use Unified interfaces - Consistent APIs across algorithms
  4. Save models - Use ModelStorage for persistence
  5. Explain predictions - Use SHAP explainers for interpretability
  6. Monitor AutoML - Use PipelineProgressStatusMonitor for long-running jobs

Bundled Resources

Reference Files

  • references/DATAFRAME_REFERENCE.md (479 lines)

    • ConnectionContext API, DataFrame operations, SQL generation
  • references/PAL_ALGORITHMS.md (869 lines)

    • Complete PAL algorithm reference (100+ algorithms)
    • Classification, Regression, Clustering, Time Series, Preprocessing
  • references/APL_ALGORITHMS.md (534 lines)

    • AutoML capabilities, automated feature engineering
    • AutoClassifier, AutoRegressor, GradientBoosting classes
  • references/VISUALIZERS.md (704 lines)

    • 14 visualization modules (EDA, SHAP, metrics, time series)
    • Plot types, configuration, export options
  • references/SUPPORTING_MODULES.md (626 lines)

    • Model storage, spatial analytics, graph algorithms
    • Text mining, statistics, error handling

Error Handling

python
from hana_ml.ml_exceptions import Error

try:
    clf.fit(train_df, features=features, label='TARGET')
except Error as e:
    print(f"HANA ML Error: {e}")

Documentation

Didn't find tool you were looking for?

Be as detailed as possible for better results