
The New York Data Science & AI Conference Presented by Lander Analytics



Workshops: Monday, August 25
Conference: Tuesday, August 26 & Wednesday, August 27
Location: New York City



Immerse yourself in the evolving world of data science and AI at The New York Data Science & AI Conference Presented by Lander Analytics—an intimate, single-track conference designed to connect data professionals and showcase world-class speakers. It will take place August 26 & 27, with hands-on workshops on August 25.

For over a decade, the New York R Conference has been the go-to event for R enthusiasts and data professionals. Now, as the field evolves, so does our conference. We continue to bring together data professionals from diverse industries such as technology, finance, healthcare, sports, retail, and more—fostering a space for exceptional content and unparalleled networking.

Attend in New York City or virtually to explore the latest advancements, share insights, and shape the future of data science and AI.



Agenda

Monday, Aug 25

  • 08:30 AM - 09:15 AM

    Registration & Breakfast

  • 09:15 AM - 05:00 PM

    Workshop: Machine Learning in R

    Max Kuhn

    Scientist @ Posit

    Join Max Kuhn on a tour through Machine Learning in R, with emphasis on using the software as opposed to general explanations of model building. This workshop is an abbreviated introduction to the tidymodels framework for modeling.

    You'll learn about data preparation, model fitting, model assessment and predictions. The focus will be on data splitting and resampling, data pre-processing and feature engineering, model creation, evaluation, and tuning. This is not a deep learning course and will focus on tabular data.

    Prerequisites: some experience with modeling in R and the tidyverse (you don't need to be an expert); prior experience with lm is enough to get started and learn advanced modeling techniques. If participants can’t install the packages on their machines, RStudio Server Pro instances pre-loaded with the appropriate packages and GitHub repository will be available.

    (In-Person & Virtual Ticket Options Available)

  • 09:15 AM - 05:00 PM

    Workshop: Introduction of LLMs/AI

    Daniel Chen

    Post-Doc Research and Teaching Fellow & Data Science Educator @ University of British Columbia

    There's a lot of hype around LLMs, their use cases, and the amazing things they can do. This workshop aims to demystify LLMs and give you a practical understanding of how they work and how to use them beyond the desktop application.

    We will code with LLMs using an API and introduce two packages, chatlas (Python) and ellmer (R), that make it easier to interact with LLMs programmatically. We'll also see how we can use LLMs in Shiny dashboards to create a user interface for your own chatbots. We'll then expand on these basics to learn about RAG (retrieval-augmented generation) and tool calling to give our bots more context and the ability to work as "agents". Finally, we'll see how we can use LLMs to help us with our data science projects.

    (In-Person & Virtual Ticket Options Available)

Workshop tickets sold separately

Tuesday, Aug 26

  • 09:00 AM - 09:50 AM

    Registration & Breakfast

  • 09:50 AM - 10:00 AM

    Opening Remarks

  • 10:00 AM - 10:20 AM

    Generating New Data Through Simulating an NFL Game

    Ally Blake

    Senior Coordinator, Football Data & Analytics @ NFL

    A play-level game simulation model can precisely quantify the impact and any intended and unintended consequences of potential rules changes. The goal of this project is to mimic a real NFL game, review the results of simulated games, compare to what one expects from a real game, and evaluate the results based on points per game and plays per game.
  • 10:25 AM - 10:45 AM

    How We Built It: An Offseason of Development at NFL Next Gen Stats

    Mike Band

    Sr. Manager, Research & Analytics @ NFL Next Gen Stats

  • 10:45 AM - 11:15 AM

    Break

  • 11:15 AM - 11:35 AM

    From Hype to Value: Mastering Gen AI Outcomes Through Effective Evaluations

    Bill Gold

    Head of AI @ Citizens Bank

    Large Language Models (LLMs) can produce varied quality results, necessitating effective evaluations. Understanding quality drivers like lossy models and reinforcement learning from human feedback (RLHF) is crucial. The presentation reviews evaluation approaches, including benchmarks, LLM as a judge, crowdsourcing, and human experts. Each method has trade-offs regarding scale, cost, and alignment with specific use cases. Developing strong intuitions about LLM behavior is vital for discerning impactful applications. Better practices involve early evaluation, aligning approaches with use cases, and leveraging human experts for gold standards.
  • 11:40 AM - 12:00 PM

    How I Learned to Stop Worrying and Love Vibe Coding

    Jared P. Lander

    Chief Data Scientist @ Lander Analytics

  • 12:05 PM - 12:25 PM

    LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language

    Daniel Chen

    Post-Doc Research and Teaching Fellow & Data Science Educator @ University of British Columbia

  • 12:25 PM - 01:25 PM

    Lunch

  • 01:25 PM - 01:45 PM

    TBD

  • 01:50 PM - 02:30 PM

    What's Going On In There? Bayesian Tools for Understanding a Fitted Model

    Andrew Gelman

    Professor @ Department of Statistics and Department of Political Science, Columbia University

    A fitted model is a mapping from data (including information encoded in the model specification and the prior) to inferences. We present Bayesian tools for model understanding, generalizing existing analytical methods such as R-squared, graphical approaches such as influence plots, and workflow procedures such as sensitivity analysis. Our goal is to better understand the weaknesses of our fitted models and ultimately to learn more from data.
  • 02:30 PM - 03:10 PM

    Break

  • 03:10 PM - 03:30 PM

    Narratives in Data from the First Seven Months of Congestion Pricing

    Gayan Seneviratna

    Senior Data Scientist @ MTA Data and Analytics

    On January 5th, 2025, New York began an ambitious effort to reshape how we share our city streets. Under the Central Business District Tolling Program (CBDTP), drivers were charged on entry into southern Manhattan. By pricing the negative externality of driving, the program aimed to reduce vehicle entries, improve bus speeds in the CBD, and curtail polluting emissions. Did the program succeed in these goals? To answer that question, the MTA turned to its Data and Analytics team. My talk covers our work over the first months of CBDTP: the ingestion of tolling data with Apache Airflow, the building of a counterfactual for vehicle entries, and the storytelling needed for powerful, data-driven narratives.
  • 03:35 PM - 03:55 PM

    Using Quarto to Create Reports for Hospital Quality Improvement: Safely Lowering Cesarean Rates

    Xilin Chen

    Analytics Manager @ Michigan Medicine

    Cesarean births can be lifesaving, but when overused, they carry significant health risks and financial consequences. At the Obstetrics Initiative, we work with over 65 Michigan hospitals to reduce unnecessary cesareans and improve maternal care. As the analytics manager, I lead the development of performance reports that inform both individual hospitals and our internal clinical teams. These reports are based on near real-time data from each hospital's EHR system and are used to monitor trends, identify areas for improvement, and guide targeted support. We rely heavily on Quarto to generate these reports. Its flexibility and scalability allow us to create daily, monthly, and annual reporting pipelines, as well as custom one-off reports tailored to specific needs. By leveraging Quarto’s and R’s powerful analytics capabilities, we produce clear and impactful reports that our hospital partners rely on and love. In this talk, I’ll share how we’ve structured our reporting system with Quarto, the benefits it’s brought to our quality improvement work, and practical tips for building similar tools in your own healthcare or analytics environment.
  • 04:00 PM - 04:20 PM

    TBD

  • 04:20 PM - 04:30 PM

    Closing Remarks

Wednesday, Aug 27

  • 09:00 AM - 09:50 AM

    Registration & Breakfast

  • 09:50 AM - 10:00 AM

    Opening Remarks

  • 10:00 AM - 10:20 AM

    How to Use Free, Open-Source Text Embeddings to Accomplish Advanced Textual Analysis

    Andrew Wallender

    Data Editor @ Bloomberg Industry Group

    Have you ever found yourself overwhelmed by a mountain of documents to analyze? Discover how to easily find insights using a powerful, open-source text embedding model. Convert your text into meaningful numerical representations to go beyond keyword matching and uncover thematic clusters, find conceptually similar documents, and build semantic search applications. This session will show you how to leverage free tools to perform sophisticated textual analysis at a fraction of the cost of LLMs.
  • 10:25 AM - 10:45 AM

    Processing Document Collections with LLMs: A Practical Workflow

    Abigail Haddad

    Data Scientist/Machine Learning Engineer @ Freelance

    Every organization has stacks of similar documents - customer complaints, resumes, error logs - that need the same questions answered about each one. This talk walks through a systematic workflow for processing these document collections with LLMs, covering the full pipeline from messy input to polished results. I'll share real examples and the tools I built to automate the repetitive parts across different projects, including wrangling LLM outputs and creating modular display components.
  • 10:45 AM - 11:15 AM

    Break

  • 11:15 AM - 11:35 AM

    AI with ROI: How to Use ML to Cut Your Snowflake Bill in Half

    Ben Lerner

    CEO & Co-Founder @ Espresso AI

    Espresso AI uses two main techniques to run workloads substantially faster and cheaper on data warehouses: better job scheduling and automatically incrementalizing queries. This talk will dive into the technical details behind both approaches.
  • 11:40 AM - 12:00 PM

    Fine-Tuning LLMs to Automate Energy Savings

    Danya Murali

    Lead Data Scientist @ Arbor

    In this talk, I’ll show how we turned our operations team’s deep energy expertise into curated training data to fine-tune LLMs that extract key information from notoriously inconsistent electricity bills, whose formats, field names, and terminology shift across utilities, rate plans, and energy sources. This pipeline now powers Arbor’s ability to give customers a clear view of their electricity costs and automatically broker them onto lower-cost suppliers, cutting manual work by over 90% and enabling us to scale efficiently. Our entire system rests on meticulous data engineering combined with human judgment and context, which remain essential for a reliable AI-driven solution.
  • 12:05 PM - 12:25 PM

    TBD

  • 12:25 PM - 01:25 PM

    Lunch

  • 01:25 PM - 01:45 PM

    Rethinking A/B Tests for Connected Users and Teams

    Chiraag Kala

    Lead Data Scientist @ Airbnb

    Imagine running experiments that assume everyone acts independently—only to realize that, in practice, people collaborate. On platforms like Airbnb, for example, hosts can partner with other co-hosts, forming networks where behavior is interdependent. Traditional A/B tests that randomize individuals ignore these relationships, leading to biased results. In this talk, we introduce a new experimentation design that treats entire teams or networks as the unit of analysis. By accounting for collaboration and spillover effects, this approach yields more accurate results and leads to a better user experience. Whether you're building an e-commerce platform, a social network, or a financial product where users operate in groups, network-aware experiments will help you make smarter, more reliable decisions.
  • 01:50 PM - 02:10 PM

    Dealing with Duplicate Data (in R)

    Erin Grand

    Senior Data Scientist @ TRAILS to Wellness

    Maintaining high data quality is essential for accurate analyses and decision-making. Unfortunately, high data quality is often hard to come by (especially for non-profits). This talk will focus on some "how-tos" of cleaning data and removing duplicates to enhance data integrity. We'll go over common causes of duplicates, how to use the {janitor} package to identify and remove duplicates, and business practices that can help prevent these data issues from happening in the first place.
  • 02:15 PM - 02:35 PM

    Measuring LLM Effectiveness

    Max Kuhn

    Scientist @ Posit, PBC

    How can we quantify how accurately LLMs perform? In late 2024, Anthropic released a preprint of a manuscript about statistically analyzing model evaluations. The concepts are on target, but the statistical tactics have narrow applicability. A simpler statistical framework can be used to quantify LLM performance, and it applies to many more scenarios and experimental designs. We'll describe these methods and show an example.
  • 02:35 PM - 03:15 PM

    Break

  • 03:15 PM - 03:35 PM

    From Prediction to Foundation: Deep Learning Models for Patient Care Optimization

    Jon Sege & Vincent Pan

    White Plains Hospital

    Accurately predicting medical specialties and follow-up appointment needs from electronic medical records (EMR) can enhance personalized care and resource allocation. We present a neural network pipeline using LSTM architecture and diagnosis codes to predict specialties and flag follow-ups, addressing challenges like class imbalance and interpretability. We also introduce an approach to learning general-purpose medical code embeddings from EMR sequences, using Masked Code Modeling (MCM) and Graph Convolutional Transformers (GCT). Functioning as a clinical foundation model, these embeddings encode relationships among medical codes and can be leveraged across diverse downstream applications in healthcare analytics. Finally, we will discuss an application that leverages these models to provide actionable decision-points for our quality and coding teams.
  • 03:40 PM - 04:00 PM

    Understanding Artificial General Intelligence Futures: Toward a Shared Vocabulary for Policy Planning

    Swaptik Chowdhury

    Assistant Policy Researcher @ RAND Corporation

    The absence of shared terminology for describing artificial general intelligence (AGI) futures has created persistent misunderstandings in policy discussions. This talk presents a classification framework that makes explicit the assumptions underlying different AGI scenarios. It introduces six descriptive axes: locus of control, governance primacy, alignment level, takeoff speed, human-AGI relationship, and AGI volition. These axes enable policymakers and researchers to describe a comprehensive range of plausible AGI futures, clarify disagreements, and assess the robustness of policy responses across various scenarios. The framework supports more informed and structured dialogue in AI governance and foresight planning.
  • 04:05 PM - 04:25 PM

    Tiering Teams and Predicting Attendance with R

    Kelsey McDonald

    Senior Manager, Strategy & Business Intelligence @ New York Yankees

  • 04:25 PM - 04:35 PM

    Closing Remarks



Speakers

Andrew Gelman

Professor

Department of Statistics and Department of Political Science, Columbia University

@StatModeling

Talk: What's Going On In There? Bayesian Tools for Understanding a Fitted Model

Danya Murali

Lead Data Scientist

Arbor

@joinarbor

Talk: Fine-Tuning LLMs to Automate Energy Savings

Ben Lerner

CEO & Co-Founder

Espresso AI

@ben_lern

Talk: AI with ROI: How to Use ML to Cut Your Snowflake Bill in Half

Max Kuhn

Scientist

Posit, PBC

@topepo.bsky.social

Talk: Measuring LLM Effectiveness

Kelsey McDonald

Senior Manager, Strategy & Business Intelligence

New York Yankees

@Yankees

Talk: Tiering Teams and Predicting Attendance with R

Andrew Wallender

Data Editor

Bloomberg Industry Group

@BBGIndustry

Talk: How to Use Free, Open-Source Text Embeddings to Accomplish Advanced Textual Analysis

Gayan Seneviratna

Senior Data Scientist

MTA Data and Analytics

@MTA

Talk: Narratives in Data from the First Seven Months of Congestion Pricing

Ally Blake

Senior Coordinator, Football Data & Analytics

NFL

@Ally_Blake3

Talk: Generating New Data Through Simulating an NFL Game

Jared P. Lander

Chief Data Scientist

Lander Analytics

@jaredlander

Talk: How I Learned to Stop Worrying and Love Vibe Coding

Jon Sege

AVP, Data Management & Analytics

White Plains Hospital

@WPHospital

Talk: From Prediction to Foundation: Deep Learning Models for Patient Care Optimization (Joint talk with Vincent Pan)

Xilin Chen

Analytics Manager

Michigan Medicine

@xilinch

Talk: Using Quarto to Create Reports for Hospital Quality Improvement: Safely Lowering Cesarean Rates

Chiraag Kala

Lead Data Scientist

Airbnb

@Airbnb

Talk: Rethinking A/B Tests for Connected Users and Teams

Bill Gold

Head of AI

Citizens Bank

@BillCGold

Talk: From Hype to Value: Mastering Gen AI Outcomes Through Effective Evaluations

Abigail Haddad

Data Scientist/Machine Learning Engineer

Freelance

@presentofcoding

Talk: Processing Document Collections with LLMs: A Practical Workflow

Mike Band

Sr. Manager, Research & Analytics

NFL Next Gen Stats

@MBandNFL

Talk: How We Built It: An Offseason of Development at NFL Next Gen Stats

Daniel Chen

Post-Doc Research and Teaching Fellow & Data Science Educator

University of British Columbia

@chendaniely

Talk: LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language

Swaptik Chowdhury

Assistant Policy Researcher

RAND Corporation

@RANDCorporation

Talk: Understanding Artificial General Intelligence Futures: Toward a Shared Vocabulary for Policy Planning

Vincent Pan

Data Scientist

White Plains Hospital

@WPHospital

Talk: From Prediction to Foundation: Deep Learning Models for Patient Care Optimization (Joint talk with Jon Sege)

Erin Grand

Senior Data Scientist

TRAILS to Wellness

@astroeringrand

Talk: Dealing with Duplicate Data (in R)

More speakers coming soon…



Workshops

Machine Learning in R

Hosted by Max Kuhn
Monday, Aug 25 | 9:15am - 5:00pm


Introduction of LLMs/AI

Hosted by Daniel Chen
Monday, Aug 25 | 9:15am - 5:00pm




Sponsors

Gold

Microsoft

Silver

Espresso AI
R Consortium

Supporting

Pearson
Manning
Springer
Chapman & Hall/CRC, Taylor & Francis Group

© Lander Analytics 2025