4.8 KiB
Workday Monthly Admin Activities Audit
A Python tool for processing Workday security admin audit reports. It extracts person names from audit data using NLP, filters relevant records, and generates human-readable summaries using a local LLM.
Overview
This script automates the monthly review of Workday administrative activities by:
- Extracting person names from unstructured audit text using spaCy NER
- Filtering and deduplicating audit records
- Generating natural language summaries of each admin action using LM Studio
Requirements
Python Dependencies
pip install pandas spacy openai requests tqdm
python -m spacy download en_core_web_trf
External Services
- LM Studio: A local LLM server running on port 1234. The script will prompt for the IP address if the default is unreachable.
Configuration
Edit the constants at the top of process.py:
| Variable | Default | Description |
|---|---|---|
DEFAULT_LM_IP |
100.113.108.121 |
IP address of LM Studio server |
LLM_MODEL |
openai/gpt-oss-20b |
Model name to use for summarization |
INPUT_CSV |
test.csv |
Input Workday audit report |
OUTPUT_CSV |
test_with_names.csv |
Processed CSV with extracted names |
EVENT_LOG |
event_log.txt |
Plain text event summaries |
EVENT_LOG_MD |
event_log.md |
Markdown formatted event summaries |
FINAL_SNAPSHOT |
final.csv |
Pre-LLM snapshot of filtered data |
Input CSV Format
The script expects a Workday audit export with these columns:
| Column | Description |
|---|---|
Instance that Changed |
Text describing what changed (contains person names) |
Added |
Text describing what was added (contains person names) |
In Transaction |
Transaction identifier |
By User |
The admin who performed the action |
Entered On |
Timestamp of the action |
Processing Pipeline
Step 1: LM Studio Connection Check
The script first verifies connectivity to LM Studio. If unreachable, it prompts for a new IP address.
Step 2: spaCy Model Loading
Loads the en_core_web_trf transformer model for Named Entity Recognition (NER). Only the NER component is loaded for efficiency.
Step 3: Name Extraction (NER)
Uses spaCy to extract PERSON entities from audit text:
- Process "Added" column first - Extracts names and populates both
Added Applied toandApplied tocolumns - Process "Instance that Changed" - Only for rows where
Applied tois still empty
Names are cleaned by:
- Removing job codes and fragments after dashes/slashes
- Keeping only alphabetic characters, apostrophes, periods, and hyphens
- Deduplicating within each cell
Step 4: Data Filtering
The script applies several filters:
- Remove "Automatic Complete" rows - Filters out system-generated entries
- Keep matching rows only - Retains rows where
Applied toequalsAdded Applied to - Self-action filter - Removes rows where the person acted on themselves (name appears in
By User) - Deduplication - Removes duplicate rows
Step 5: LLM Summarization
Groups remaining records by:
Applied to(the person affected)By User(the admin)Entered On(timestamp)
For each group, sends a prompt to LM Studio requesting a one-sentence compliance summary including:
- Who performed the action
- Who it applied to
- What roles were assigned/added
- The date
Step 6: Output Generation
Generates three output files:
test_with_names.csv- Processed CSV with extracted namesevent_log.txt- Plain text summaries grouped by admin userevent_log.md- Markdown table format for easier reading
Output Format
event_log.txt
=== 1000054 / Dennis Cregan ===
01/06 - Dennis Cregan assigned the Academic Advising role to Casey Spelman on 1/6/26.
01/07 - Dennis Cregan assigned the PSID role to PSID on 1/7/26.
=== 1003966 / Kristinn Bjarnason ===
01/07 - Kristinn Bjarnason assigned the Professor role to Mary Perrodin on 1/7/26.
event_log.md
# Security Admin Audit Event Log
## 1000054 / Dennis Cregan
| Date | Event Summary |
|------|---------------|
| 01/06 | Dennis Cregan assigned the Academic Advising role to Casey Spelman on 1/6/26. |
Usage
python process.py
Caching Behavior
If test_with_names.csv already exists, the script skips CSV processing and jumps directly to LLM summarization. Delete this file to force full reprocessing.
Error Handling
- LM Studio unreachable: Prompts for new IP address
- LLM context too long: Records
[LLM ERROR]in the output and continues - Missing columns: Gracefully handles missing optional columns
Performance Notes
- The spaCy transformer model requires significant memory (~1GB)
- Progress bars show NER extraction status with live name counts
- LLM calls are sequential (one per event group)