150 lines
4.8 KiB
Markdown
150 lines
4.8 KiB
Markdown
# Workday Monthly Admin Activities Audit
|
|
|
|
A Python tool for processing Workday security admin audit reports. It extracts person names from audit data using NLP, filters relevant records, and generates human-readable summaries using a local LLM.
|
|
|
|
## Overview
|
|
|
|
This script automates the monthly review of Workday administrative activities by:
|
|
|
|
1. Extracting person names from unstructured audit text using spaCy NER
|
|
2. Filtering and deduplicating audit records
|
|
3. Generating natural language summaries of each admin action using LM Studio
|
|
|
|
## Requirements
|
|
|
|
### Python Dependencies
|
|
|
|
```bash
|
|
pip install pandas spacy openai requests tqdm
|
|
python -m spacy download en_core_web_trf
|
|
```
|
|
|
|
### External Services
|
|
|
|
- **LM Studio**: A local LLM server running on port 1234. The script will prompt for the IP address if the default is unreachable.
|
|
|
|
## Configuration
|
|
|
|
Edit the constants at the top of `process.py`:
|
|
|
|
| Variable | Default | Description |
|
|
|----------|---------|-------------|
|
|
| `DEFAULT_LM_IP` | `100.113.108.121` | IP address of LM Studio server |
|
|
| `LLM_MODEL` | `openai/gpt-oss-20b` | Model name to use for summarization |
|
|
| `INPUT_CSV` | `test.csv` | Input Workday audit report |
|
|
| `OUTPUT_CSV` | `test_with_names.csv` | Processed CSV with extracted names |
|
|
| `EVENT_LOG` | `event_log.txt` | Plain text event summaries |
|
|
| `EVENT_LOG_MD` | `event_log.md` | Markdown formatted event summaries |
|
|
| `FINAL_SNAPSHOT` | `final.csv` | Pre-LLM snapshot of filtered data |
|
|
|
|
## Input CSV Format
|
|
|
|
The script expects a Workday audit export with these columns:
|
|
|
|
| Column | Description |
|
|
|--------|-------------|
|
|
| `Instance that Changed` | Text describing what changed (contains person names) |
|
|
| `Added` | Text describing what was added (contains person names) |
|
|
| `In Transaction` | Transaction identifier |
|
|
| `By User` | The admin who performed the action |
|
|
| `Entered On` | Timestamp of the action |
|
|
|
|
## Processing Pipeline
|
|
|
|
### Step 1: LM Studio Connection Check
|
|
|
|
The script first verifies connectivity to LM Studio. If unreachable, it prompts for a new IP address.
|
|
|
|
### Step 2: spaCy Model Loading
|
|
|
|
Loads the `en_core_web_trf` transformer model for Named Entity Recognition (NER). Only the NER component is loaded for efficiency.
|
|
|
|
### Step 3: Name Extraction (NER)
|
|
|
|
Uses spaCy to extract PERSON entities from audit text:
|
|
|
|
1. **Process "Added" column first** - Extracts names and populates both `Added Applied to` and `Applied to` columns
|
|
2. **Process "Instance that Changed"** - Only for rows where `Applied to` is still empty
|
|
|
|
Names are cleaned by:
|
|
- Removing job codes and fragments after dashes/slashes
|
|
- Keeping only alphabetic characters, apostrophes, periods, and hyphens
|
|
- Deduplicating within each cell
|
|
|
|
### Step 4: Data Filtering
|
|
|
|
The script applies several filters:
|
|
|
|
1. **Remove "Automatic Complete" rows** - Filters out system-generated entries
|
|
2. **Keep matching rows only** - Retains rows where `Applied to` equals `Added Applied to`
|
|
3. **Self-action filter** - Removes rows where the person acted on themselves (name appears in `By User`)
|
|
4. **Deduplication** - Removes duplicate rows
|
|
|
|
### Step 5: LLM Summarization
|
|
|
|
Groups remaining records by:
|
|
- `Applied to` (the person affected)
|
|
- `By User` (the admin)
|
|
- `Entered On` (timestamp)
|
|
|
|
For each group, sends a prompt to LM Studio requesting a one-sentence compliance summary including:
|
|
- Who performed the action
|
|
- Who it applied to
|
|
- What roles were assigned/added
|
|
- The date
|
|
|
|
### Step 6: Output Generation
|
|
|
|
Generates three output files:
|
|
|
|
1. **`test_with_names.csv`** - Processed CSV with extracted names
|
|
2. **`event_log.txt`** - Plain text summaries grouped by admin user
|
|
3. **`event_log.md`** - Markdown table format for easier reading
|
|
|
|
## Output Format
|
|
|
|
### event_log.txt
|
|
|
|
```
|
|
=== 1000054 / Dennis Cregan ===
|
|
01/06 - Dennis Cregan assigned the Academic Advising role to Casey Spelman on 1/6/26.
|
|
01/07 - Dennis Cregan assigned the PSID role to PSID on 1/7/26.
|
|
|
|
=== 1003966 / Kristinn Bjarnason ===
|
|
01/07 - Kristinn Bjarnason assigned the Professor role to Mary Perrodin on 1/7/26.
|
|
```
|
|
|
|
### event_log.md
|
|
|
|
```markdown
|
|
# Security Admin Audit Event Log
|
|
|
|
## 1000054 / Dennis Cregan
|
|
|
|
| Date | Event Summary |
|
|
|------|---------------|
|
|
| 01/06 | Dennis Cregan assigned the Academic Advising role to Casey Spelman on 1/6/26. |
|
|
```
|
|
|
|
## Usage
|
|
|
|
```bash
|
|
python process.py
|
|
```
|
|
|
|
### Caching Behavior
|
|
|
|
If `test_with_names.csv` already exists, the script skips CSV processing and jumps directly to LLM summarization. Delete this file to force full reprocessing.
|
|
|
|
## Error Handling
|
|
|
|
- **LM Studio unreachable**: Prompts for new IP address
|
|
- **LLM context too long**: Records `[LLM ERROR]` in the output and continues
|
|
- **Missing columns**: Gracefully handles missing optional columns
|
|
|
|
## Performance Notes
|
|
|
|
- The spaCy transformer model requires significant memory (~1GB)
|
|
- Progress bars show NER extraction status with live name counts
|
|
- LLM calls are sequential (one per event group)
|