@@ -0,0 +1,4 @@
Significance: patch
Type: other

Adds the subscriber list to post meta for debugging purposes.
projects/plugins/jetpack/modules/subscriptions.php (186 additions, 0 deletions)
@@ -16,6 +16,7 @@
// phpcs:disable Universal.Files.SeparateFunctionsFromOO.Mixed -- TODO: Move classes to appropriately-named class files.

use Automattic\Jetpack\Admin_UI\Admin_Menu;
use Automattic\Jetpack\Connection\Client;
use Automattic\Jetpack\Connection\Manager as Connection_Manager;
use Automattic\Jetpack\Connection\XMLRPC_Async_Call;
use Automattic\Jetpack\Redirect;
@@ -132,6 +133,8 @@ public function __construct() {

		add_filter( 'jetpack_published_post_flags', array( $this, 'set_post_flags' ), 10, 2 );

		add_action( 'jetpack_published_post', array( $this, 'store_subscribers_when_sent' ), 10, 3 );
Member

@Automattic/jetpack-vulcan I have a question about this. Do we need to sync the new _jetpack_newsletter_subscribers_when_sent post meta to ensure we have access to that data on WordPress.com? We will be using that post meta in a page on WordPress.com.

Thank you!

Contributor

Yes. With this PR checked out, after creating a new post with Newsletter enabled, _jetpack_newsletter_subscribers_when_sent exists as a meta key for the post on the remote site, but not on the cache site.

If the post meta should be available on WordPress.com, it would need to be whitelisted in the Sync package and then on WPcom in the jetpack mu-plugin (in $post_meta_whitelist within sync/class.jetpack-sync-defaults.php).

Contributor

@coder-karen, Nov 26, 2025

Separately, it looks like a Sync test needs updating for this PR: test_sends_publish_post_action. The assertion fails because another event is queued after the publish hook, so the most recent event is no longer jetpack_published_post but updated_option. Instead, we should assert that an event with action jetpack_published_post exists for the post ID, so that we test intent rather than ordering.
@Addison-Stavlo If it would help, I'd be happy to work on that in a separate PR so the test passes with the changes made here, since it looks like a change that should happen anyway.

Edit: I've started a PR for the Sync test here: #46105
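Here is a framework-free sketch of the difference between "the last queued event is X" and "an event X exists for this post". It is not the Sync test suite's code. The event shape (arrays with 'action' and 'args' keys, with the post ID as the first argument) and the helper name has_event_for_post() are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true if any queued event has the given action and its
 * first argument is the given post ID. Order is ignored, so events queued
 * after the publish hook (such as updated_option) do not affect the result.
 *
 * Assumed event shape: array( 'action' => string, 'args' => array ).
 */
function has_event_for_post( array $events, string $action, int $post_id ): bool {
	foreach ( $events as $event ) {
		if (
			( $event['action'] ?? null ) === $action
			&& ( $event['args'][0] ?? null ) === $post_id
		) {
			return true;
		}
	}
	return false;
}

// Example queue where another event lands after the publish event.
$queued_events = array(
	array( 'action' => 'jetpack_published_post', 'args' => array( 42, array(), null ) ),
	array( 'action' => 'updated_option', 'args' => array( 'some_option', 'old', 'new' ) ),
);

// Ordering-based check: the last event is no longer the publish event.
$last_event      = end( $queued_events );
$last_is_publish = 'jetpack_published_post' === $last_event['action'];

// Intent-based check: a publish event exists for post 42, wherever it sits in the queue.
$publish_event_exists = has_event_for_post( $queued_events, 'jetpack_published_post', 42 );
```

In a PHPUnit test, the intent-based result could be passed to assertTrue(). The ordering-based check is the kind of assertion that breaks when an unrelated event is queued later.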


		add_filter( 'post_updated_messages', array( $this, 'update_published_message' ), 18, 1 );

		// Set "social_notifications_subscribe" option during the first-time activation.
@@ -989,6 +992,189 @@ public function register_post_meta() {
		register_meta( 'post', '_jetpack_post_was_ever_published', $jetpack_post_was_ever_published );
	}

	/**
	 * Store the list of subscribers when a post is first emailed.
	 *
	 * This method is called when a post is published and emails are sent to subscribers.
	 * It stores the subscriber counts and the list of email subscribers in post meta for debugging purposes.
	 *
	 * @since $$next-version$$
	 *
	 * @param int $post_ID Post ID.
	 * @param array $flags Post flags including send_subscription.
	 * @param WP_Post $post Post object.
	 *
	 * @return void
	 */
	public function store_subscribers_when_sent( $post_ID, $flags, $post ) {
		// Only store if emails are being sent.
		if ( ! isset( $flags['send_subscription'] ) || ! $flags['send_subscription'] ) {
			return;
		}

		// Only store once - check if we've already stored subscribers for this post.
		$existing_subscribers = get_post_meta( $post_ID, '_jetpack_newsletter_subscribers_when_sent', true );
		if ( ! empty( $existing_subscribers ) ) {
			return;
		}

		// Only store for posts.
		if ( 'post' !== $post->post_type ) {
			return;
		}

		// Fetch subscriber data from WordPress.com API.
		$subscriber_data = $this->get_subscriber_data();

		// Store subscriber data with timestamp.
		$data_to_store = array(
			'timestamp' => current_time( 'mysql' ),
			'email_subscribers' => isset( $subscriber_data['email_subscribers'] ) ? (int) $subscriber_data['email_subscribers'] : 0,
			'paid_subscribers' => isset( $subscriber_data['paid_subscribers'] ) ? (int) $subscriber_data['paid_subscribers'] : 0,
			'all_subscribers' => isset( $subscriber_data['all_subscribers'] ) ? (int) $subscriber_data['all_subscribers'] : 0,
			'subscriber_list' => isset( $subscriber_data['subscriber_list'] ) && is_array( $subscriber_data['subscriber_list'] ) ? $subscriber_data['subscriber_list'] : array(),
Member

If a site has a lot of subscribers, this will store a huge amount of post meta for each post. That seems like it would be problematic.

Contributor Author

Yeah, this is a main point of concern. For most newsletters there should be no issue, and leaving this out of REST and API responses helps, but at the extremes of tens to hundreds of thousands of subscribers it becomes more concerning.

Two possible options:

  1. Support this only for sites with up to X subscribers: either don't add the subscriber list at all above the cutoff, or store it and mark it as truncated/incomplete once it reaches the cutoff.
  2. Scrap the idea of adding this post meta. For debugging, we could use the live subscriber list on the dashboard and filter out any subscribers who joined after the publish date. That does limit the debugger idea further, though, since any subscribers who unsubscribed, unsubscribed and resubscribed, or changed their paid tier status since the original publish wouldn't be reflected correctly in the tool.
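Both options can be sketched as pure PHP functions. Neither function is part of the PR. The cutoff value, the 'truncated' key, and the 'date_subscribed' field are assumptions for illustration. In particular, the real subscriber payload may name or format the subscription date differently.

```php
<?php
declare(strict_types=1);

/**
 * Option 1 (hypothetical helper): keep at most $max entries and record whether
 * the list was cut off, so the debugger can flag it as incomplete.
 */
function cap_subscriber_list( array $subscribers, int $max ): array {
	$truncated = count( $subscribers ) > $max;
	return array(
		'subscriber_list' => $truncated ? array_slice( $subscribers, 0, $max ) : $subscribers,
		'truncated'       => $truncated,
	);
}

/**
 * Option 2 (hypothetical helper): approximate the list at publish time by
 * dropping anyone who subscribed after the post was published. Assumes each
 * subscriber has a 'date_subscribed' string that strtotime() can parse; entries
 * without a parseable date are dropped. Unsubscribes and plan changes since
 * publishing are NOT reflected, which is the limitation noted in option 2.
 */
function subscribers_at_publish_time( array $subscribers, string $published_at ): array {
	$cutoff = strtotime( $published_at );
	return array_values(
		array_filter(
			$subscribers,
			static function ( array $subscriber ) use ( $cutoff ): bool {
				$subscribed = strtotime( $subscriber['date_subscribed'] ?? '' );
				return false !== $subscribed && $subscribed <= $cutoff;
			}
		)
	);
}
```

Option 1 bounds the size of the stored meta at the cost of completeness. Option 2 stores nothing extra but can only reconstruct who was subscribed, not who has since left.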

		);

		update_post_meta( $post_ID, '_jetpack_newsletter_subscribers_when_sent', $data_to_store );
	}

	/**
	 * Get subscriber data from WordPress.com API.
	 *
	 * @since $$next-version$$
	 *
	 * @return array Subscriber data including counts and email list.
	 */
	private function get_subscriber_data() {
		$subscriber_data = array(
			'email_subscribers' => 0,
			'paid_subscribers' => 0,
			'all_subscribers' => 0,
			'subscriber_list' => array(),
		);

		// Only fetch if Jetpack is connected.
		if ( ! Jetpack::is_connection_ready() ) {
			return $subscriber_data;
		}

		$site_id = Jetpack_Options::get_option( 'id' );

		// First, get subscriber counts from stats endpoint.
		$stats_path = sprintf( '/sites/%d/subscribers/stats', $site_id );
Comment on lines +1063 to +1064
Member

We already have logic to fetch from that endpoint in

if ( Jetpack::is_connection_ready() ) {
	$site_id = Jetpack_Options::get_option( 'id' );
	$api_path = sprintf( '/sites/%d/subscribers/stats', $site_id );
	$response = Client::wpcom_json_api_request_as_blog(
		$api_path,
		'2',
		array(),
		null,
		'wpcom'
	);

Maybe we can take this opportunity to consolidate this into one central method? I think that would be helpful, given that we already fetch subscribers in multiple places in the codebase.
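One possible shape for that consolidation is a single request method plus a shared parsing helper. The sketch covers only the parsing step, in plain PHP so it runs outside WordPress. The function name parse_subscriber_counts() is hypothetical. The sketch assumes the counts live under a 'counts' key, as the PR's code reads them.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical shared helper: turn the JSON body returned by
 * /sites/%d/subscribers/stats into integer counts, defaulting to 0 for
 * anything missing or malformed. Mirrors the parsing done in the PR.
 */
function parse_subscriber_counts( string $body ): array {
	$counts = array(
		'email_subscribers' => 0,
		'paid_subscribers'  => 0,
		'all_subscribers'   => 0,
	);

	$decoded = json_decode( $body, true );
	if ( ! is_array( $decoded ) || ! isset( $decoded['counts'] ) || ! is_array( $decoded['counts'] ) ) {
		return $counts;
	}

	// Copy over only the keys we know about, cast to int.
	foreach ( array_keys( $counts ) as $key ) {
		if ( isset( $decoded['counts'][ $key ] ) ) {
			$counts[ $key ] = (int) $decoded['counts'][ $key ];
		}
	}

	return $counts;
}
```

Each caller would then check the response code and pass wp_remote_retrieve_body( $response ) to the helper. The parsing rules would then live in one place instead of being repeated per call site.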

		$stats_response = Client::wpcom_json_api_request_as_blog(
Member

As with the other API call, I think we should save this data in a transient to avoid outgoing calls when possible.

			$stats_path,
			'2',
			array(),
			null,
			'wpcom'
		);

		if ( ! is_wp_error( $stats_response ) ) {
			$stats_code = wp_remote_retrieve_response_code( $stats_response );
			if ( 200 === $stats_code ) {
				$subscriber_counts = json_decode( wp_remote_retrieve_body( $stats_response ), true );
				if ( is_array( $subscriber_counts ) ) {
					if ( isset( $subscriber_counts['counts']['email_subscribers'] ) ) {
						$subscriber_data['email_subscribers'] = (int) $subscriber_counts['counts']['email_subscribers'];
					}
					if ( isset( $subscriber_counts['counts']['paid_subscribers'] ) ) {
						$subscriber_data['paid_subscribers'] = (int) $subscriber_counts['counts']['paid_subscribers'];
					}
					if ( isset( $subscriber_counts['counts']['all_subscribers'] ) ) {
						$subscriber_data['all_subscribers'] = (int) $subscriber_counts['counts']['all_subscribers'];
					}
				}
			}
		}

		// Fetch the actual subscriber list with emails.
		$subscriber_emails = $this->fetch_all_subscribers( $site_id );
		$subscriber_data['subscriber_list'] = $subscriber_emails;

		return $subscriber_data;
	}

	/**
	 * Fetch all subscribers from WordPress.com API with pagination.
	 *
	 * @since $$next-version$$
	 *
	 * @param int $site_id Site ID.
	 * @return array Array of subscriber data, each containing 'email' and 'is_paid' keys.
	 */
	private function fetch_all_subscribers( $site_id ) {
		$subscriber_emails = array();
		$page = 1;
		$per_page = 100; // Maximum per page to minimize requests.

		while ( true ) {
Member

Since we may be making multiple calls to WordPress.com to get all that data, I think this should be saved in a transient to avoid calling too frequently.

Contributor Author

While I generally agree we should limit calls, how long do we think that transient should last?

If we only run this when a post is published, and return early if the meta is already set, then I think any extra calls would only be triggered by a separate new post being published (or is there another path I'm not thinking of?). If the subscriber list changed between those two posts being published, the transient would have us storing stale data in the meta, which might not be very helpful for the debugger. If we set a much shorter expiration to avoid that, the transient is unlikely to have much effect, but it also seems completely fine to add as a safeguard.

Thoughts?
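The trade-off can be modeled with a short expiration. A second publish inside the window reuses the cached list and saves the calls, at the risk of slightly stale data. A publish after the window triggers a fresh fetch. The sketch below uses an in-memory store with an injectable clock so it runs outside WordPress. In the plugin, the same role would be played by get_transient() and set_transient(). The class name, the cache key format, and the 60-second default TTL are assumptions, not values from the PR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for transients: values expire $ttl seconds after
 * being stored. The clock is injected so expiry can be exercised in tests.
 */
final class Ttl_Cache {
	private array $items = array();
	private $now;

	public function __construct( callable $now ) {
		$this->now = $now;
	}

	public function get( string $key ) {
		if ( ! isset( $this->items[ $key ] ) ) {
			return false; // Same "miss" sentinel that get_transient() uses.
		}
		[ $value, $expires ] = $this->items[ $key ];
		if ( ( $this->now )() >= $expires ) {
			unset( $this->items[ $key ] );
			return false;
		}
		return $value;
	}

	public function set( string $key, $value, int $ttl ): void {
		$this->items[ $key ] = array( $value, ( $this->now )() + $ttl );
	}
}

/**
 * Return the cached subscriber list for a site, or fetch and cache it.
 * $fetch stands in for fetch_all_subscribers().
 */
function get_subscribers_cached( Ttl_Cache $cache, int $site_id, callable $fetch, int $ttl = 60 ): array {
	$key    = 'jetpack_newsletter_subscribers_' . $site_id; // Hypothetical key.
	$cached = $cache->get( $key );
	if ( false !== $cached ) {
		return $cached;
	}
	$fresh = $fetch( $site_id );
	$cache->set( $key, $fresh, $ttl );
	return $fresh;
}
```

The shorter the TTL, the less stale data can leak into a second post's meta, and the less often the cache actually saves a request.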

			$api_path = sprintf(
				'/sites/%d/subscribers/?page=%d&per_page=%d&filter=email_subscriber',
Member

If we're fetching all subscribers, do we need that extra email_subscriber filter? If we do, do you think you could explain why that's needed in the docblock for the method?

Since we already have other functions that fetch subscribers in the codebase, I think that if we add a new one we need to be extra clear about what it does and how it differs from the others.

Alternatively, maybe the filter should be a parameter passed to the method; that way, this method could become the one we use for everything.
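If the filter became a parameter, the path construction might look like the sketch below. The function name and the convention that null means "no filter" are assumptions. The query parameter names (page, per_page, filter) and the 'email_subscriber' value are the ones the PR already uses.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build the subscribers endpoint path. Passing null for
 * $filter omits the parameter, so the same method could fetch every
 * subscriber or only email subscribers ('email_subscriber', as in the PR).
 */
function build_subscribers_path( int $site_id, int $page, int $per_page, ?string $filter = 'email_subscriber' ): string {
	$query = array(
		'page'     => $page,
		'per_page' => $per_page,
	);
	if ( null !== $filter ) {
		$query['filter'] = $filter;
	}
	return sprintf( '/sites/%d/subscribers/?%s', $site_id, http_build_query( $query ) );
}
```

Using http_build_query() also URL-encodes the filter value, which matters once callers can pass arbitrary strings.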

Contributor Author

> If we're fetching all subscribers, do we need that extra email_subscriber filter? If we do, do you think you could explain why that's needed in the docblock for the method?

I think this limits the results to subscribers who would be sent emails, excluding people who are only subscribed via the Reader with no emails. So we don't strictly need to limit it here, but the other subscribers didn't seem very useful for our purposes.

				$site_id,
				$page,
				$per_page
			);

			$response = Client::wpcom_json_api_request_as_blog(
				$api_path,
				'2',
				array(),
				null,
				'wpcom'
			);

			if ( is_wp_error( $response ) ) {
				break;
			}

			$response_code = wp_remote_retrieve_response_code( $response );
			if ( 200 !== $response_code ) {
				break;
			}

			$response_body = json_decode( wp_remote_retrieve_body( $response ), true );
			if ( ! is_array( $response_body ) ) {
				break;
			}

			// Extract subscriber data from subscribers array.
			if ( isset( $response_body['subscribers'] ) && is_array( $response_body['subscribers'] ) ) {
				foreach ( $response_body['subscribers'] as $subscriber ) {
					if ( isset( $subscriber['email_address'] ) && is_email( $subscriber['email_address'] ) ) {
						// Determine if subscriber has an active paid plan.
						$is_paid = false;
						if ( isset( $subscriber['plans'] ) && is_array( $subscriber['plans'] ) ) {
							foreach ( $subscriber['plans'] as $plan ) {
								if ( isset( $plan['status'] ) && 'active' === $plan['status'] ) {
									$is_paid = true;
									break;
								}
							}
						}

						$subscriber_emails[] = array(
							'email' => sanitize_email( $subscriber['email_address'] ),
Comment on lines +1144 to +1157
Member

I wonder if we could consolidate the handling of $subscriber['email_address'] a bit. If we already check is_email, I would assume that sanitize_email wouldn't run into any issues afterwards?
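One way to consolidate would be to sanitize the address once, validate the sanitized value, and move the active-plan check into its own helper. The sketch uses PHP's filter_var() with FILTER_SANITIZE_EMAIL and FILTER_VALIDATE_EMAIL as a stand-in for WordPress's sanitize_email() and is_email() so it runs standalone. Those functions do not behave identically, so inside the plugin the WordPress functions should still be used. Both function names are hypothetical.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: true if any of the subscriber's plans has status 'active'. */
function has_active_plan( array $subscriber ): bool {
	$plans = isset( $subscriber['plans'] ) && is_array( $subscriber['plans'] ) ? $subscriber['plans'] : array();
	foreach ( $plans as $plan ) {
		if ( is_array( $plan ) && 'active' === ( $plan['status'] ?? null ) ) {
			return true;
		}
	}
	return false;
}

/**
 * Hypothetical helper: sanitize the address once, validate the sanitized value,
 * and return the entry to store in post meta, or null to skip the subscriber.
 * filter_var() stands in for sanitize_email()/is_email() here.
 */
function normalize_subscriber( array $subscriber ): ?array {
	if ( ! isset( $subscriber['email_address'] ) || ! is_string( $subscriber['email_address'] ) ) {
		return null;
	}
	$email = filter_var( trim( $subscriber['email_address'] ), FILTER_SANITIZE_EMAIL );
	if ( false === filter_var( $email, FILTER_VALIDATE_EMAIL ) ) {
		return null;
	}
	return array(
		'email'   => $email,
		'is_paid' => has_active_plan( $subscriber ),
	);
}
```

Validating the sanitized value means the string that gets stored is the same string that was checked. That addresses the question of whether sanitizing after validating can still change anything.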

							'is_paid' => $is_paid,
						);
					}
				}
			}

			// Check if there are more pages. Only compare against 'total' when the API returns it;
			// otherwise a missing value would end the loop after the first page.
			$total = isset( $response_body['total'] ) ? (int) $response_body['total'] : 0;
			$total_pages = isset( $response_body['total_pages'] ) ? (int) $response_body['total_pages'] : 1;

			if ( $page >= $total_pages || ( $total > 0 && count( $subscriber_emails ) >= $total ) ) {
				break;
			}

			++$page;
		}

		return $subscriber_emails;
	}
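The stopping conditions of the paging loop can be checked without the HTTP layer. In this sketch, a callable stands in for the Client::wpcom_json_api_request_as_blog() call and returns the decoded body, or null on failure. The 'subscribers', 'total', and 'total_pages' keys are the ones the PR reads. The function name and the $max_pages safety cap are assumptions added for illustration; the PR has no page cap.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, HTTP-free version of the paging loop. $fetch_page( $page )
 * returns the decoded response body as an array, or null on any error.
 * Stops on error, on the last page, once 'total' entries are collected
 * (only when 'total' is present), or after $max_pages as a safety cap.
 */
function collect_all_pages( callable $fetch_page, int $max_pages = 1000 ): array {
	$collected = array();

	for ( $page = 1; $page <= $max_pages; $page++ ) {
		$body = $fetch_page( $page );
		if ( ! is_array( $body ) ) {
			break; // Request failed or body was malformed: keep what we have.
		}

		$items = isset( $body['subscribers'] ) && is_array( $body['subscribers'] ) ? $body['subscribers'] : array();
		foreach ( $items as $item ) {
			$collected[] = $item;
		}

		$total       = isset( $body['total'] ) ? (int) $body['total'] : 0;
		$total_pages = isset( $body['total_pages'] ) ? (int) $body['total_pages'] : 1;

		if ( $page >= $total_pages || ( $total > 0 && count( $collected ) >= $total ) ) {
			break;
		}
	}

	return $collected;
}
```

Feeding it canned pages makes it easy to confirm that a failed request keeps the partial list, and that a missing 'total' does not stop the loop before 'total_pages' is reached.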

	/**
	 * Create a Subscribers menu displayed on self-hosted sites.
	 *