From Brushstroke to Meaning: The Fluid Art of Data Science
Published: April 15, 2025
By Amy Humke, Ph.D.
Founder, Critical Influence
When I’m deep in data modeling, it doesn’t feel like programming; it feels like art. Like painting a ballerina in mid-twirl, I’m capturing something dynamic, incomplete, and full of potential. There’s structure, yes, but there’s also freedom: room to explore, revise, and intuit. Data science may have “science” in the name, but the process is rarely linear. It’s creative. It’s interpretive. And just like with my artwork, I begin with a mess of motion and color, and work toward something that holds meaning, something that brings clarity, even beauty, to complexity.
The most creative and art-like aspect of data science, I believe, is feature identification and engineering. Other parts of the process require creativity too, but feature engineering is where you’re most often pushed to see beyond your own perspective. And just like in art, there is often a point of pure frustration when you feel you’ve hit a wall: the model isn’t hitting its target, and you can’t imagine what other data point (one you actually have access to) could make it work. If you hit that point, take a step back and follow these steps toward creativity.
Let’s Get Creative
Before diving into specifics, prepare yourself to adopt an open mind and banish the negative. Start from a point where even the outlandish could be possible. Nothing is off the table, and there are no stupid thoughts, suggestions, or questions… just unexplored possibilities.
- Invite collaboration or find ways to obtain diverse perspectives.
Tip: Read posts and articles from other domains, speak with stakeholders, or host a focus group. Often, simply explaining the situation to a colleague sparks new ideas.
- Change things up!
If you’re stuck in front of your computer, get out! Go for a walk. Travel somewhere new. A little movement and fresh air is often just the thing to inspire that ah-ha connection.
Creative Approaches to Feature Finding
In-Depth Exploration
This one isn’t flashy, but it’s foundational. Take time to reacquaint yourself with your data warehouse: skim the data dictionaries and table names, and explore what’s new. Don’t assume you already know what’s in there; schemas evolve.
- Stay open-minded. Just because you can’t immediately imagine a relationship doesn’t mean one isn’t there.
- Test plausible features before dismissing them.
- Watch for spurious correlations (Tyler Vigen’s classics, anyone?).
Capitalize on Domain Expertise
Sometimes, the best features aren’t found—they’re remembered by someone who understands the problem deeply.
Example: In enrollment modeling, an admissions manager might say, “students who reschedule coaching calls more than once rarely show up to the third attempt.” That insight leads to a call rescheduling volatility feature—something metadata alone wouldn’t reveal.
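As a rough sketch of how that insight might become a feature, assuming pandas and a hypothetical call-event log (the `student_id` and `status` columns and their values are invented for illustration):

```python
import pandas as pd

# Hypothetical call-event log; column names and values are illustrative only.
calls = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 3],
    "status": ["rescheduled", "rescheduled", "completed",
               "rescheduled", "completed", "completed"],
})

# Count reschedules per student.
reschedules = (
    calls["status"].eq("rescheduled")
    .groupby(calls["student_id"])
    .sum()
    .rename("reschedule_count")
)

features = reschedules.to_frame()
# Flag the pattern the admissions manager described: more than one reschedule.
features["rescheduled_more_than_once"] = features["reschedule_count"] > 1
print(features)
```

Whether a raw count, a flag, or some other measure of volatility works best is something to test against your own data.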
Visual Thinking
When in doubt, draw it out.
Graphs, mind maps, Sankey diagrams, and heatmaps aren’t just for presentations—they help uncover trends hidden in raw numbers.
Example: Plotting time-to-enrollment by marketing channel may reveal that one channel leads to slower but more consistent conversions, prompting exploration of interaction terms between time and channel.
Creative Techniques in Feature Engineering
Here’s a list of feature engineering strategies I’ve used in real-world projects. While the examples focus on structured behavioral and transactional data, the techniques apply broadly.
Simple Statistical Derived Features
Sometimes, the most valuable insights come from the simplest math:
- Count, sum, average, median, mode
- Max, min, standard deviation, variance, range
- Coefficient of variation
Example: From call log data, create features like average call duration or number of weekend calls.
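A minimal sketch of these aggregates with pandas, assuming a hypothetical call log with `customer_id`, `call_start`, and `duration_min` columns (all names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical call log; names and values are illustrative only.
calls = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "call_start": pd.to_datetime([
        "2025-03-01 10:00",  # Saturday
        "2025-03-03 09:30",  # Monday
        "2025-03-09 14:00",  # Sunday
        "2025-03-04 11:00",  # Tuesday
        "2025-03-05 16:00",  # Wednesday
    ]),
    "duration_min": [10.0, 20.0, 30.0, 5.0, 15.0],
})

# dayofweek runs Monday=0 through Sunday=6, so 5 and 6 are the weekend.
calls["is_weekend"] = calls["call_start"].dt.dayofweek >= 5

features = calls.groupby("customer_id").agg(
    call_count=("duration_min", "count"),
    avg_duration=("duration_min", "mean"),
    std_duration=("duration_min", "std"),
    weekend_calls=("is_weekend", "sum"),
)

# Coefficient of variation: standard deviation relative to the mean.
features["cv_duration"] = features["std_duration"] / features["avg_duration"]
print(features)
```

One row per customer comes out, ready to join onto a modeling table.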
Feature Combination and Interaction
Combining existing features across time, geography, or profile segments can reveal non-obvious relationships.
Example: Crossing a student’s program with their region may reveal unique patterns. For categorical data, try ethnicity × income band or channel × contact frequency.
- Also try grouping similar-performing categories to reduce noise (great for high-cardinality features).
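Crossing two categorical features can be as simple as concatenating them. The sketch below assumes pandas and a hypothetical student table; the `program` and `region` values are invented:

```python
import pandas as pd

# Hypothetical student table; values are illustrative only.
students = pd.DataFrame({
    "program": ["Nursing", "Nursing", "MBA", "MBA", "Education"],
    "region": ["West", "South", "West", "West", "South"],
})

# Cross two categorical features into a single interaction feature.
students["program_x_region"] = students["program"] + "_" + students["region"]

# Each distinct combination becomes its own category.
print(students["program_x_region"].value_counts())
```

Crossed features multiply cardinality quickly, which is one reason the grouping tip above matters.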
Feature Transformation
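When a continuous feature is skewed, apply a transformation:
- Logarithmic (for strong right skew)
- Square root (for milder right skew)
- Square or higher-order polynomials (for left skew, or to capture curvature)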
Example: Apply a log transformation to highly skewed purchase counts to stabilize variance.
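A small sketch of the log transform, assuming NumPy and pandas and a made-up series of purchase counts. It uses `np.log1p`, which computes log(1 + x), so zero counts do not cause problems:

```python
import numpy as np
import pandas as pd

# Made-up purchase counts with one extreme value creating right skew.
purchases = pd.Series([0, 1, 2, 3, 5, 8, 250], name="purchase_count")

# log1p computes log(1 + x), which is defined for zero counts.
log_purchases = np.log1p(purchases)

# Compare skewness before and after the transformation.
print("skew before:", purchases.skew())
print("skew after: ", log_purchases.skew())
```

Comparing skewness before and after is a quick check that the transform is doing what you wanted.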
Discretization (Binning)
Binning continuous variables can improve both model performance and interpretability.
- Tree-based binning: Use decision trees to suggest meaningful split points.
- Unsupervised binning: Try K-means, quantile, or equal-width binning.
Example: Create engagement bins like Low, Medium, and High for cleaner modeling.
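Here is one way the unsupervised options might look with pandas, assuming a hypothetical `logins_per_month` engagement measure. Tree-based binning is left out because it needs a modeling library and a target:

```python
import pandas as pd

# Hypothetical engagement measure; values are illustrative only.
engagement = pd.Series([1, 3, 4, 7, 12, 15, 22, 40, 65], name="logins_per_month")
labels = ["Low", "Medium", "High"]

# Quantile binning: roughly the same number of rows in each bin.
quantile_bins = pd.qcut(engagement, q=3, labels=labels)

# Equal-width binning: each bin spans the same range of values.
width_bins = pd.cut(engagement, bins=3, labels=labels)

print(pd.DataFrame({
    "value": engagement,
    "quantile": quantile_bins,
    "equal_width": width_bins,
}))
```

On skewed data like this, equal-width bins put most rows in the lowest bin, while quantile bins stay balanced, so the choice changes what "Low" means.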
Target Encoding
For high-cardinality categorical features, replace each category with the average target value for that group.
Handle carefully to avoid data leakage:
- Use out-of-fold encoding.
- Smooth with the global mean to reduce noise in low-frequency categories.
Recommended Read: "Target Encoding Done the Right Way" by Max Halford.
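The two safeguards above can be combined in one function. This is a minimal sketch, not a reference implementation: it assumes NumPy and pandas, random fold assignment, and additive smoothing with a weight `smoothing` that you would tune. The function name, `zip_code`, and `enrolled` are hypothetical.

```python
import numpy as np
import pandas as pd

def target_encode_oof(df, cat_col, target_col, n_splits=5, smoothing=10.0, seed=0):
    """Out-of-fold target encoding, smoothed toward the training-fold mean."""
    rng = np.random.default_rng(seed)
    # Randomly assign each row to one of n_splits folds of near-equal size.
    fold_ids = rng.permutation(len(df)) % n_splits
    encoded = pd.Series(np.nan, index=df.index, dtype=float)

    for fold in range(n_splits):
        holdout = fold_ids == fold
        train = df.loc[~holdout]
        prior = train[target_col].mean()  # mean target over the training folds
        stats = train.groupby(cat_col)[target_col].agg(["sum", "count"])
        # Smoothed category mean: (sum + m * prior) / (count + m)
        smoothed = (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)
        # Categories unseen in the training folds fall back to the prior.
        values = df.loc[holdout, cat_col].map(smoothed).fillna(prior)
        encoded.loc[holdout] = values.to_numpy()
    return encoded

# Hypothetical data: a categorical column and a binary target.
df = pd.DataFrame({
    "zip_code": ["A", "A", "A", "B", "B", "C", "A", "B", "C", "A"],
    "enrolled": [1, 0, 1, 0, 0, 1, 1, 0, 0, 1],
})
df["zip_code_te"] = target_encode_oof(df, "zip_code", "enrolled", smoothing=2.0)
print(df)
```

Because each row is encoded using only the other folds, its own target value never feeds into its own feature.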
Time Series Feature Generation
When working with temporal or sequential data, engineered time-based features are often essential.
- Lag features (e.g., last month’s value)
- Rolling statistics (moving averages, rolling std dev)
- Seasonality indicators (month, day of week, fiscal term)
- Time since last event (e.g., last login, last purchase)
These capture recency and periodic patterns, often critical in behavioral prediction.
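The features listed above might be built along these lines with pandas. The monthly `activity` table, the `events` log, and the snapshot date are all hypothetical:

```python
import pandas as pd

# Hypothetical monthly activity for one user.
activity = pd.DataFrame({
    "month": pd.to_datetime(
        ["2025-01-01", "2025-02-01", "2025-03-01", "2025-04-01", "2025-05-01"]
    ),
    "logins": [10, 12, 8, 15, 11],
}).sort_values("month")

# Lag feature: last month's value.
activity["logins_lag_1"] = activity["logins"].shift(1)

# Rolling statistics over the previous three months. Shifting first keeps
# the current month's value out of its own feature.
previous = activity["logins"].shift(1)
activity["logins_roll_mean_3"] = previous.rolling(window=3).mean()
activity["logins_roll_std_3"] = previous.rolling(window=3).std()

# Seasonality indicators.
activity["month_of_year"] = activity["month"].dt.month
activity["day_of_week"] = activity["month"].dt.dayofweek

# Time since last event, measured at a hypothetical snapshot date.
events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "login_at": pd.to_datetime(["2025-04-20", "2025-05-10", "2025-03-15"]),
})
snapshot = pd.Timestamp("2025-05-31")
last_login = events.groupby("user_id")["login_at"].max()
days_since_last_login = (snapshot - last_login).dt.days

print(activity)
print(days_since_last_login)
```

The first rows of the lag and rolling columns are empty by design, since there is no earlier history to draw on.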
Take Care
Feature engineering is powerful—but don’t overdo it.
- Avoid overfitting: Too many features can lead to memorizing noise.
- Use regularization: Lasso can shrink the coefficients of unhelpful features to exactly zero, while Ridge shrinks them toward zero without removing them.
- Practice rigorous validation: Cross-validation isn’t optional—it’s essential.
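To make regularization and cross-validation concrete, here is a small NumPy-only sketch on synthetic data. It uses Ridge because Ridge has a closed-form solution; Lasso needs an iterative solver and is usually run through a modeling library. The helpers `ridge_fit` and `cv_mse` are hypothetical names, and the model has no intercept for simplicity.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression without intercept: (X'X + lam*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def cv_mse(X, y, lam, n_splits=5, seed=0):
    """Mean squared error averaged over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    errors = []
    for holdout in folds:
        train = np.setdiff1d(np.arange(len(y)), holdout)
        w = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[holdout] @ w - y[holdout]) ** 2))
    return float(np.mean(errors))

# Synthetic data: 3 informative features plus 27 pure-noise features.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 30))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=60)

# Compare held-out error across penalty strengths (lam=0 is ordinary least squares).
for lam in [0.0, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:>5}: CV MSE = {cv_mse(X, y, lam):.3f}")
```

The held-out error, not the training error, is the number to watch when deciding how strong the penalty should be.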
Be especially cautious about data leakage. Ask yourself, “Would I have known this at the time the prediction was made?” If not, it doesn’t belong in the model.
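One way to put that question into practice is a point-in-time filter: only events dated on or before the prediction time feed the feature. The sketch below assumes pandas, and the tables and column names are hypothetical:

```python
import pandas as pd

# Hypothetical event log.
events = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2],
    "event_time": pd.to_datetime(
        ["2025-01-05", "2025-01-20", "2025-02-10", "2025-01-10", "2025-03-01"]
    ),
})

# Hypothetical moment at which each prediction is made.
predictions = pd.DataFrame({
    "student_id": [1, 2],
    "prediction_time": pd.to_datetime(["2025-02-01", "2025-02-01"]),
})

# Keep only events that had already happened when the prediction was made.
joined = events.merge(predictions, on="student_id")
known = joined[joined["event_time"] <= joined["prediction_time"]]

event_count = known.groupby("student_id").size().rename("events_before_prediction")
features = (
    predictions.set_index("student_id")
    .join(event_count)
    .fillna({"events_before_prediction": 0})
)
print(features)
```

Events after the prediction time are dropped before any aggregation, so they cannot leak into the feature.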
Conclusion
It’s easy to hit a wall in feature engineering when your model plateaus and you’re out of ideas. I hope these techniques help you push through, spark new ideas, and feel just a little more creative in the process.
I’ve often noticed how many people in data science also have artistic sides—musicians, painters, dancers, writers. Maybe it’s not a coincidence. I truly believe you need a spark of creativity to do this work well. After all, turning data into something meaningful isn’t just technical—it’s interpretive, fluid, and, yes, a little bit artistic.
What about you? Where do you see the art in your own data science practice?