# Positive proportion of short intervals with a given number of primes

Remark 1 (Disclaimer) This article has been written for my own reference in preparation to my talk in occasion of the RNTA5 conference.

1. Introduction

The distribution of the prime numbers is very chaotic and we do not have efficient rules to exaclty find the ${n}$-th prime ${p_n}$ as a function of ${n}$, whereas, thanks to the prime number theorem, we can certainly approximate the ${n}$-th prime with a suitable function of ${n}$ (i.e., ${p_n\sim n\log n}$). Indeed, most of the cryptography is based on the difficulty to factorize a number into its prime factors.

In particular, the distribution of gaps between primes behave like a random object, especially if we look at small scales, i.e. at gaps between consecutive primes smaller than the average gap (which by the prime number theorem is ${\log n}$ for ${p_{n+1}-p_n}$).

For this reason a well-studied problem was the following: fix ${m\geq 1}$ a positive integer and ${\lambda> 0}$ a positive real number. How many short intervals of the form ${[n,n+\lambda\log n]}$ contain at least ${m}$ primes?

In his breakthrough paper on bounded gaps between primes, James Maynard proved that the answer is: infinitely many. In particular, if we set ${m=2}$, we can be more specific and state that there exist infinitely many intervals of length at most ${246}$ containing at least ${2}$ primes.

Afterwards, Maynard improved his result, allowing a certain rate of uniformity, showing a more precise estimate:

$\displaystyle |\{n\leq x: |[n,n+\lambda\log n]\cap \mathbb{P}|\geq m\}|\gg_{m}x, \ \ \ \ \ (1)$

if ${0<\lambda<\epsilon(m)}$. The method used is a variation of a well-known sieve method, namely the GPY’s sieve.

These statements do not preclude the possibility that there are choices of ${\lambda}$ and ${m}$ for which the intervals ${[n,n+\lambda\log n]}$ contain exactly ${m}$ primes, for at most finitely many ${n}$. Anyway, this was proven to be not the case by Freiberg, who showed the following more precise result. For any positive real number ${\lambda}$ and any non-negative integer ${m}$, we have

$\displaystyle |\lbrace n\leq x: |[n,n+\lambda\log n]\cap \mathbb{P}|= m\rbrace|\geq x^{1-\varepsilon(x)}, \ \ \ \ \ (2)$

if ${x}$ is sufficiently large in terms of ${\lambda}$ and ${m}$, and ${\varepsilon(x)}$ is a certain function that tends to zero as ${x}$ tends to infinity. We may take ${\varepsilon(x)=(\log\log\log\log x)^{2}/\log\log\log x}$.

Since we are working with very short intervals we expect that an application of the well-known Cramér model for the distribution of prime numbers would give the exact answer for the distribution of such intervals. Therefore, heuristically we expect the above cardinality to be

$\displaystyle |\lbrace n\leq x: |[n,n+\lambda\log n]\cap \mathbb{P}|= m\rbrace|\backsim \frac{\lambda^{m} e^{-\lambda}}{m!}x,$

for every ${\lambda}$ and ${m}$.

The idea behind Freiberg’s result is that we can make use of the Maynard’s sieve method to find clusters of consecutive primes inside particular sets and then construct short intervals of specific form around them. Indeed, the Maynard’s sieve method allows us to show that any subset of the primes, which is well distributed in arithmetic progressions (e.g., that verifies a variant of the prime number theorem and of the Bombieri-Vinogradov’s theorem), contains many elements that are close together. The work of Freiberg showed that the subset of primes which belongs to the image of certain admissible sets of linear functions, i.e.

$\displaystyle \mathbb{P}\cap \{gn+h_1,gn+h_2,\dots,gn+h_k\}$

is well distributed in arithmetic progressions and is suitable for the application of Maynard’s results. More specifically, when considering ${n\leq x}$, we ask for a suitable ${g}$ very large in terms of ${x}$ and an admissible set ${\mathcal{H}=\{h_1,h_2,\dots, h_k\}}$ with ${0\leq h_1. We recall that a set ${\mathcal{H}}$ is admissible if for any prime ${p}$ the set of all residue classes ${h_1,h_2,\dots,h_k}$ modulo ${p}$ do not cover all the classes mod ${p}$. Next, the Maynard’s result tells us that the number of integers ${n\leq x}$ for which there are at least ${m}$ primes among ${gn+h_1,gn+h_2,\dots,gn+h_k}$ is at least of order ${x/(\log x)^k}$. A combinatorial process is hence used to detect exactly ${m}$ primes among them that are contained in our selected set of intervals. The idea indeed, is to start with a suitable interval ${I_{j_{0}}}$ of the form ${I_j:=[gn+j,gn+j+\lambda\log(gn+j)]\subset I:=[gn,gn+5\lambda\log x]}$, which contains all the primes (at least ${m}$) inside ${I}$. Next, we note that there exists a suitable ${j_{1}\geq j_{0}}$ such that ${I_{j_{1}}\subset I}$ and ${I_{j_{1}}\cap \mathbb{P}=\emptyset}$. Considering the sequence ${\{I_{j}\}_{j_0\leq j\leq j_{1}}}$, it is easy to prove that for at least one ${\tilde{j}\in[j_{0},j_{1}]}$ we must have ${|I_{\tilde{j}}\cap \mathbb{P}|=m}$. Collecting the various intervals ${I_{\tilde{j}}}$ found in this way and using the estimates contained in the Maynard’s work, we recover the Freiberg’s result.

The major difference between the work of Freiberg and that one of the author is in the way the needed admissible set of linear forms is generated. In the former case, an Erdhos–Rankin type construction was considered, which allows us to lower bound the density related to each choice of ${\lambda}$ and ${m}$, choosing a suitable large value ${g}$ and an admissible set ${\{h_1,h_2,\dots,h_k\}}$. However, this freedom inevitably forces us to lose precision and obtain weak estimates. In the latter case, the set of linear forms was chosen implicitly by means of the Maynard’s sieve, producing in this way better information on the density, allowing to consider ${g}$ of order ${\log x}$, only for very small values of ${\lambda}$.

A better exploration of the last aforementioned approach, in which we insert a well-separated admissible set into the uniform Maynard’s estimates, leads us to generate a positive proportion of short intervals containing a prescribed number of primes. The key idea is that at the start of the process we need to select clusters of primes in which the elements are also well-spaced.

From now on, we will indicate with ${m}$ a non-negative integer, with ${k}$ the value ${k=C\exp(49m/C')}$, for certain suitable constants ${C,C'>0}$, and with ${\lambda}$ a positive real number smaller than ${\varepsilon=\varepsilon(k):=k^{-4}(\log k)^{-2}}$. The result is the following.

Theorem 1 (M.) We have

$\displaystyle |\lbrace n\leq x: |[n,n+\lambda\log n]\cap \mathbb{P}|= m\rbrace|\gg \lambda^{k+1}e^{-Dk^{4}\log k}x, \ \ \ \ \ (3)$

for a certain absolute constant ${D>0}$, if ${x}$ is sufficiently large in terms of ${m}$ and ${\lambda}$.

2. Finding many intervals with clusters of well-spaced primes

We define the double sum

$\displaystyle S=\sum^{*}_{\mathcal{H}}\sum_{x

where $S(\mathcal{H},n)$ equals to

$\displaystyle\bigg( \sum_{i=1}^{k} \textbf{1}_{\mathbb{P}}(gn+h_{i})-m-k\sum_{i=1}^{k}\sum_{\substack{p|gn+h_{i}\\ p\leq x^{\rho}, p\nmid B}}1 -k\sum_{\substack{ h\leq 5\lambda\log x\\ (h,g)=1\\h\not\in\mathcal{H}}} \textbf{1}_{S(\rho, B)}(gn+h)\bigg)w_{n}(\mathcal{H}). \ \ \ \ \ (5)$

Here ${\Sigma^{*}_{\mathcal{H}}}$ means that the sum is over all the admissible sets ${\mathcal{H}}$ such that ${0\leq h_{1} and ${|h_i-h_j|>\frac{\lambda\log x}{C_{0}}}$, for any ${1\leq i\neq j\leq k}$, where ${C_0=C_0(k)}$ a suitable constant. Moreover, we define

$\displaystyle S(\rho, B)=\lbrace n\in \mathbb{N}: p|n\Rightarrow (p>x^{\rho}\ \textrm{or}\ p|B)\rbrace.$

We consider ${\rho}$ verifying suitable growth conditions to be chosen later, ${B}$ a natural number under certain arithmetic conditions, and take ${w_n(\mathcal{H})}$ as the sieve weights used in the Maynard’s work.

We can prove that

$\displaystyle S\ll k(\log x)^{2k} \exp (O(k/\rho))|I(x)|, \ \ \ \ \ (6)$

where now the set ${I(x)}$ contains intervals of the form ${[gn,gn+5\lambda\log x]}$, for ${x, with the property that ${|[gn,gn+5\lambda\log x]\cap \mathbb{P}|=|\lbrace gn+h_{1},...,gn+h_{k}\rbrace\cap\mathbb{P}|\geq m+1}$, for a unique admissible set ${\mathcal{H}}$ such that ${0\leq h_{1}<... and ${|h_i-h_j|>\frac{\lambda\log x}{C_{0}}}$, for any ${1\leq i\neq j\leq k}$. We recall that the intervals in ${I(x)}$ are pairwise disjoint, if for instance ${\lambda<1/5}$. The idea is to bound the term in parenthesis with ${k}$ times the indicator function that such term is positive, using the arithmetic definitions of the sums present there to define the type of intervals contained in ${I(x)}$.

Choosing ${k}$ sufficiently large in terms of ${m}$, e.g. ${k:=C\exp(49m/C')}$, ${\lambda\leq\varepsilon}$, with ${\varepsilon=\varepsilon(k):=k^{-4}(\log k)^{-2}}$, and ${\rho=k^{-3}(\log k)^{-1},}$ and finally taking ${x}$ and ${C}$ suitably large, we may conclude that

$\displaystyle S\gg \lambda^{k} e^{-C_{1} k^{2}}x(\log x)^{2k}, \ \ \ \ \ (7)$

for a certain constant ${C_{1}>0}$.

By combining (6) with (7), we obtain

$\displaystyle |I(x)|\gg \lambda^{k}e^{-C_{2}k^{4}\log k}x, \ \ \ \ \ (8)$

with an absolute constant ${C_{2}>0}$.

3. Combinatorial process

Consider an interval ${I\in I(x)}$. There exist an integer ${x and an admissible set ${\mathcal{H}}$, with ${0\leq h_{1} and ${|h_i-h_j|>\lambda\log x/C_0,}$ for any ${1\leq i\neq j\leq k}$, such that ${I=[gn, gn+5\lambda\log x]}$ and

$\displaystyle |[gn,gn+5\lambda\log x]\cap \mathbb{P}|=|\lbrace gn+h_{1},...,gn+h_{k}\rbrace\cap\mathbb{P}|\geq m+1.$

In order to avoid having a trivial gap between the elements of ${\mathcal{H}}$ we ask for ${x}$ to be sufficiently large with respect to ${\lambda}$ and ${k}$. Let us define

$\displaystyle I_{j}=[N_{j}, N_{j}+\lambda \log N_{j}],\ \ N_{j}=gn+j, \ \ \ \ \ (9)$

for ${j=0,...,\lfloor \lambda\log N_{0}\rfloor}$. We recall here the following properties of the intervals ${I_j}$:
1) for any such ${j}$ we have ${I_{j}\subseteq I}$;
2) for the choice ${j=h_{1}}$ we find that ${I_{j}\cap \lbrace gn+h_{1},...,gn+h_{k}\rbrace=\lbrace gn+h_{1},...,gn+h_{k}\rbrace}$;
3) for the value ${j=\lfloor \lambda\log N_{0}\rfloor}$ we have ${I_{j}\cap\lbrace gn+h_{1},...,gn+h_{k}\rbrace=\emptyset}$;
4) if ${|I_{j}\cap \mathbb{P}|<|I_{j+1}\cap\mathbb{P}|}$, for a certain ${j}$, then ${|I_{j+1}\cap\mathbb{P}|=|I_{j}\cap\mathbb{P}|+1.}$
Now, let’s define

$\displaystyle \tilde{j}:=\max\lbrace 0\leq j\leq \lfloor \lambda\log N_{0}\rfloor: |I_j\cap \mathbb{P}|\geq m+1\rbrace.$

Note that we necessarily have ${N_{\tilde{j}}=gn+\tilde{j}}$ being prime. Consequently, this implies ${|I_{\tilde{j}+1}\cap \mathbb{P}|=m}$, but from our assumption on ${\mathcal{H}}$ it actually derives that

$\displaystyle |I_{\tilde{j}+l}\cap \mathbb{P}|=m,\ \textrm{for any}\ 1\leq l\leq \lfloor \lambda\log x/C_0\rfloor. \ \ \ \ \ (10)$

4. Final estimate

The information (10) is equivalent to say that we have found ${\lfloor \lambda\log x/C_0\rfloor}$-different intervals ${[N, N+\lambda\log N]}$ containing exactly ${m}$ primes, with ${N<5x\log x}$, if ${x}$ is sufficiently large. Together with the lower bound (8), we have obtained that for every ${m\geq 1}$ and for each ${0<\lambda\leq\varepsilon}$,

$\displaystyle |\lbrace N\leq 5x\log x: |[N, N+\lambda\log N]\cap\mathbb{P}|=m\rbrace|\gg \frac{\lambda^{k+1}}{k\log k}e^{-C_{2}k^{4}\log k}x\log x, \ \ \ \ \ (11)$

which in turn is equivalent to

$\displaystyle |\lbrace N\leq X: |[N, N+\lambda\log N]\cap\mathbb{P}|=m\rbrace|\gg \lambda^{k+1}e^{-C_{3}k^{4}\log k}X, \ \ \ \ \ (12)$

when ${X}$ is large enough in terms of ${\lambda}$ and ${k}$, for a certain constant ${C_3>0}$.

5. Concluding remarks

The strength of the Maynard’s sieve is the flexibility, that makes it applicable to counting primes in sparser subsets as well. In fact, the same proof that leads to Theorem 1 can be overall adapted to study a variety of different situations, in which for instance we restrict the primes to lye on an arithmetic progression or allow for uniformity of the parameters ${\lambda}$ and ${m}$. The results are as following.

Remark 2 Let ${x}$ be sufficiently large in terms of ${m}$ and ${\lambda}$. Suppose that ${q\leq f(x)}$ is a positive integer, with ${(\log x)/f(x)\rightarrow\infty}$, as ${x\rightarrow\infty}$. Take ${0\leq a with ${(a,q)=1}$. Then, we have

$\displaystyle |\lbrace n\leq x: |[n,n+\lambda\log n]\cap \mathbb{P}_{a,q}|= m\rbrace|\gg \frac{\lambda^{k+1}e^{-Dk^{4}\log k}}{q^{k+1}}x, \ \ \ \ \ (13)$

for a certain ${D>0}$, where ${\mathbb{P}_{a,q}}$ is the intersection of ${\mathbb{P}}$ with ${a\pmod{q}}$.

Remark 3 Fix ${\epsilon_1>0}$ a small parameter and ${0<\epsilon_{2}<1}$. Let ${x\geq x_{0}(\epsilon_1,\epsilon_2)}$, ${m\leq \epsilon_1\log\log x}$ and ${\lambda\geq (\log x)^{\epsilon_2-1}}$, obeying to the relations ${k^4(\log k)^2 \lambda\leq 1}$ and ${\lambda>k\log k (\log x)^{-1}}$. Then, the estimate (3) continues to hold.

Remark 4 Let ${\mathbb{K}/\mathbb{Q}}$ be a Galois extension of ${\mathbb{Q}}$ with discriminant ${\Delta_{\mathbb{K}}}$. There exist constants ${C_{\mathbb{K}},C_{\mathbb{K}}'>0}$ depending only on ${\mathbb{K}}$ such that the following holds. Let ${\mathcal{C} \subset Gal(\mathbb{K}/\mathbb{Q})}$ be a conjugacy class in the Galois group of ${\mathbb{K}/\mathbb{Q}}$, and let

$\displaystyle \mathcal{P} = \lbrace p\ \textrm{prime} : p\nmid \Delta_{\mathbb{K}}, \left[\frac{\mathbb{K}/\mathbb{Q}}{p}\right]=\mathcal{C}\rbrace,$

where ${\left[\frac{\mathbb{K}/\mathbb{Q}}{.}\right]}$ denotes the Artin symbol. Let ${m\in\mathbb{N}}$, ${k=C_{\mathbb{K}}'\exp(C_{\mathbb{K}} m)}$ and ${\lambda<\varepsilon}$. Then, we have

$\displaystyle |\lbrace n\leq x: |[n,n+\lambda\log n]\cap \mathcal{P}|= m\rbrace|\gg\lambda^{k+1}e^{-Dk^{4}\log k}x, \ \ \ \ \ (14)$

provided ${x \geq x_0(\mathbb{K},\lambda,m)}$.

If we consider values of ${\lambda}$ slightly bigger than ${k^{-4}(\log k)^{-2}}$, a little variation of the sieve method used to prove Theorem 1 leads to the following improvement on the Freiberg bound.

Remark 5 For every non-negative integer ${m}$ and positive real number ${\lambda}$ smaller than ${k^{-1}(\log k)^{-1}}$, with ${k}$ the value ${k=C\exp(49m/C')}$, for suitable constants ${C,C'>0}$, we have

for a certain ${D>0}$, if ${x}$ is sufficiently large in terms of ${m}$ and ${\lambda}$.

### Daniele Mastrostefano, Peter Mühlbacher

We are two PhD students at the University of Warwick.