# Workday Monthly Admin Activities Audit A Python tool for processing Workday security admin audit reports. It extracts person names from audit data using NLP, filters relevant records, and generates human-readable summaries using a local LLM. ## Overview This script automates the monthly review of Workday administrative activities by: 1. Extracting person names from unstructured audit text using spaCy NER 2. Filtering and deduplicating audit records 3. Generating natural language summaries of each admin action using LM Studio ## Requirements ### Python Dependencies ```bash pip install pandas spacy openai requests tqdm python -m spacy download en_core_web_trf ``` ### External Services - **LM Studio**: A local LLM server running on port 1234. The script will prompt for the IP address if the default is unreachable. ## Configuration Edit the constants at the top of `process.py`: | Variable | Default | Description | |----------|---------|-------------| | `DEFAULT_LM_IP` | `100.113.108.121` | IP address of LM Studio server | | `LLM_MODEL` | `openai/gpt-oss-20b` | Model name to use for summarization | | `INPUT_CSV` | `test.csv` | Input Workday audit report | | `OUTPUT_CSV` | `test_with_names.csv` | Processed CSV with extracted names | | `EVENT_LOG` | `event_log.txt` | Plain text event summaries | | `EVENT_LOG_MD` | `event_log.md` | Markdown formatted event summaries | | `FINAL_SNAPSHOT` | `final.csv` | Pre-LLM snapshot of filtered data | ## Input CSV Format The script expects a Workday audit export with these columns: | Column | Description | |--------|-------------| | `Instance that Changed` | Text describing what changed (contains person names) | | `Added` | Text describing what was added (contains person names) | | `In Transaction` | Transaction identifier | | `By User` | The admin who performed the action | | `Entered On` | Timestamp of the action | ## Processing Pipeline ### Step 1: LM Studio Connection Check The script first verifies connectivity to LM Studio. If unreachable, it prompts for a new IP address. ### Step 2: spaCy Model Loading Loads the `en_core_web_trf` transformer model for Named Entity Recognition (NER). Only the NER component is loaded for efficiency. ### Step 3: Name Extraction (NER) Uses spaCy to extract PERSON entities from audit text: 1. **Process "Added" column first** - Extracts names and populates both `Added Applied to` and `Applied to` columns 2. **Process "Instance that Changed"** - Only for rows where `Applied to` is still empty Names are cleaned by: - Removing job codes and fragments after dashes/slashes - Keeping only alphabetic characters, apostrophes, periods, and hyphens - Deduplicating within each cell ### Step 4: Data Filtering The script applies several filters: 1. **Remove "Automatic Complete" rows** - Filters out system-generated entries 2. **Keep matching rows only** - Retains rows where `Applied to` equals `Added Applied to` 3. **Self-action filter** - Removes rows where the person acted on themselves (name appears in `By User`) 4. **Deduplication** - Removes duplicate rows ### Step 5: LLM Summarization Groups remaining records by: - `Applied to` (the person affected) - `By User` (the admin) - `Entered On` (timestamp) For each group, sends a prompt to LM Studio requesting a one-sentence compliance summary including: - Who performed the action - Who it applied to - What roles were assigned/added - The date ### Step 6: Output Generation Generates three output files: 1. **`test_with_names.csv`** - Processed CSV with extracted names 2. **`event_log.txt`** - Plain text summaries grouped by admin user 3. **`event_log.md`** - Markdown table format for easier reading ## Output Format ### event_log.txt ``` === 1000054 / Dennis Cregan === 01/06 - Dennis Cregan assigned the Academic Advising role to Casey Spelman on 1/6/26. 01/07 - Dennis Cregan assigned the PSID role to PSID on 1/7/26. === 1003966 / Kristinn Bjarnason === 01/07 - Kristinn Bjarnason assigned the Professor role to Mary Perrodin on 1/7/26. ``` ### event_log.md ```markdown # Security Admin Audit Event Log ## 1000054 / Dennis Cregan | Date | Event Summary | |------|---------------| | 01/06 | Dennis Cregan assigned the Academic Advising role to Casey Spelman on 1/6/26. | ``` ## Usage ```bash python process.py ``` ### Caching Behavior If `test_with_names.csv` already exists, the script skips CSV processing and jumps directly to LLM summarization. Delete this file to force full reprocessing. ## Error Handling - **LM Studio unreachable**: Prompts for new IP address - **LLM context too long**: Records `[LLM ERROR]` in the output and continues - **Missing columns**: Gracefully handles missing optional columns ## Performance Notes - The spaCy transformer model requires significant memory (~1GB) - Progress bars show NER extraction status with live name counts - LLM calls are sequential (one per event group)