Files
pre-repos/workday_monthly_admin_activities_audit/README.md
2026-02-03 15:10:50 -05:00

4.8 KiB

Workday Monthly Admin Activities Audit

A Python tool for processing Workday security admin audit reports. It extracts person names from audit data using NLP, filters relevant records, and generates human-readable summaries using a local LLM.

Overview

This script automates the monthly review of Workday administrative activities by:

  1. Extracting person names from unstructured audit text using spaCy NER
  2. Filtering and deduplicating audit records
  3. Generating natural language summaries of each admin action using LM Studio

Requirements

Python Dependencies

pip install pandas spacy openai requests tqdm
python -m spacy download en_core_web_trf

External Services

  • LM Studio: A local LLM server running on port 1234. The script will prompt for the IP address if the default is unreachable.

Configuration

Edit the constants at the top of process.py:

Variable Default Description
DEFAULT_LM_IP 100.113.108.121 IP address of LM Studio server
LLM_MODEL openai/gpt-oss-20b Model name to use for summarization
INPUT_CSV test.csv Input Workday audit report
OUTPUT_CSV test_with_names.csv Processed CSV with extracted names
EVENT_LOG event_log.txt Plain text event summaries
EVENT_LOG_MD event_log.md Markdown formatted event summaries
FINAL_SNAPSHOT final.csv Pre-LLM snapshot of filtered data

Input CSV Format

The script expects a Workday audit export with these columns:

Column Description
Instance that Changed Text describing what changed (contains person names)
Added Text describing what was added (contains person names)
In Transaction Transaction identifier
By User The admin who performed the action
Entered On Timestamp of the action

Processing Pipeline

Step 1: LM Studio Connection Check

The script first verifies connectivity to LM Studio. If unreachable, it prompts for a new IP address.

Step 2: spaCy Model Loading

Loads the en_core_web_trf transformer model for Named Entity Recognition (NER). Only the NER component is loaded for efficiency.

Step 3: Name Extraction (NER)

Uses spaCy to extract PERSON entities from audit text:

  1. Process "Added" column first - Extracts names and populates both Added Applied to and Applied to columns
  2. Process "Instance that Changed" - Only for rows where Applied to is still empty

Names are cleaned by:

  • Removing job codes and fragments after dashes/slashes
  • Keeping only alphabetic characters, apostrophes, periods, and hyphens
  • Deduplicating within each cell

Step 4: Data Filtering

The script applies several filters:

  1. Remove "Automatic Complete" rows - Filters out system-generated entries
  2. Keep matching rows only - Retains rows where Applied to equals Added Applied to
  3. Self-action filter - Removes rows where the person acted on themselves (name appears in By User)
  4. Deduplication - Removes duplicate rows

Step 5: LLM Summarization

Groups remaining records by:

  • Applied to (the person affected)
  • By User (the admin)
  • Entered On (timestamp)

For each group, sends a prompt to LM Studio requesting a one-sentence compliance summary including:

  • Who performed the action
  • Who it applied to
  • What roles were assigned/added
  • The date

Step 6: Output Generation

Generates three output files:

  1. test_with_names.csv - Processed CSV with extracted names
  2. event_log.txt - Plain text summaries grouped by admin user
  3. event_log.md - Markdown table format for easier reading

Output Format

event_log.txt

=== 1000054 / Dennis Cregan ===
01/06 - Dennis Cregan assigned the Academic Advising role to Casey Spelman on 1/6/26.
01/07 - Dennis Cregan assigned the PSID role to PSID on 1/7/26.

=== 1003966 / Kristinn Bjarnason ===
01/07 - Kristinn Bjarnason assigned the Professor role to Mary Perrodin on 1/7/26.

event_log.md

# Security Admin Audit Event Log

## 1000054 / Dennis Cregan

| Date | Event Summary |
|------|---------------|
| 01/06 | Dennis Cregan assigned the Academic Advising role to Casey Spelman on 1/6/26. |

Usage

python process.py

Caching Behavior

If test_with_names.csv already exists, the script skips CSV processing and jumps directly to LLM summarization. Delete this file to force full reprocessing.

Error Handling

  • LM Studio unreachable: Prompts for new IP address
  • LLM context too long: Records [LLM ERROR] in the output and continues
  • Missing columns: Gracefully handles missing optional columns

Performance Notes

  • The spaCy transformer model requires significant memory (~1GB)
  • Progress bars show NER extraction status with live name counts
  • LLM calls are sequential (one per event group)