

snuyanzin

This PR proposes using BitSets to track the positions of ReplaceOps and InsertBeforeOps in TokenStreamRewriter#reduceToSingleOperationPerIndex. It also tries to improve the situation described in #4886.
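The idea of per-kind tracking can be illustrated with a small standalone sketch. This is a hypothetical model, not the code in this PR: `OpKindIndex`, `Op`, `Kind` and `priorSlots` are illustrative names, and the real Java runtime works with `ReplaceOp`/`InsertBeforeOp` objects in a `List` whose deleted entries are set to `null`. As far as I know, the existing runtime finds "all earlier operations of kind K" by scanning every list entry before a given position (`getKindOfOps`). Here, each kind keeps a `BitSet` of the list slots it occupies, so such a query only visits live slots of that kind.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch: one BitSet per operation kind records which slots of the
 * operation list hold that kind, so "earlier ops of kind K" queries skip
 * non-matching and deleted entries instead of scanning and type-checking them.
 */
public class OpKindIndex {
    enum Kind { REPLACE, INSERT_BEFORE }

    // Hypothetical minimal stand-in for a rewrite operation.
    static final class Op {
        final Kind kind;
        final int index;   // token index the op applies to
        final String text;
        Op(Kind kind, int index, String text) { this.kind = kind; this.index = index; this.text = text; }
    }

    private final List<Op> ops = new ArrayList<>();
    private final BitSet replaceSlots = new BitSet();
    private final BitSet insertSlots = new BitSet();

    /** Appends an op and marks its list position in the BitSet for its kind. */
    int add(Op op) {
        int slot = ops.size();
        ops.add(op);
        bitsFor(op.kind).set(slot);
        return slot;
    }

    /** Deletes an op: nulls the list entry and clears its bit so later queries skip it. */
    void delete(int slot) {
        Op op = ops.get(slot);
        if (op == null) return;
        bitsFor(op.kind).clear(slot);
        ops.set(slot, null);
    }

    /** Returns list positions of live ops of the given kind that lie before {@code before}. */
    List<Integer> priorSlots(Kind kind, int before) {
        List<Integer> result = new ArrayList<>();
        BitSet bits = bitsFor(kind);
        // nextSetBit jumps directly to the next matching slot; -1 means none are left.
        for (int i = bits.nextSetBit(0); i >= 0 && i < before; i = bits.nextSetBit(i + 1)) {
            result.add(i);
        }
        return result;
    }

    Op get(int slot) { return ops.get(slot); }

    private BitSet bitsFor(Kind kind) {
        return kind == Kind.REPLACE ? replaceSlots : insertSlots;
    }

    public static void main(String[] args) {
        OpKindIndex idx = new OpKindIndex();
        idx.add(new Op(Kind.REPLACE, 0, "x0"));       // slot 0
        idx.add(new Op(Kind.INSERT_BEFORE, 0, "y0")); // slot 1
        idx.add(new Op(Kind.REPLACE, 0, "x1"));       // slot 2
        idx.add(new Op(Kind.INSERT_BEFORE, 1, "y1")); // slot 3
        idx.delete(1);                                 // e.g. insert folded into a replace
        System.out.println(idx.priorSlots(Kind.INSERT_BEFORE, 4)); // prints [3]
        System.out.println(idx.priorSlots(Kind.REPLACE, 2));       // prints [0]
    }
}
```

The design choice is that deleting an operation clears its bit. Once many operations have been collapsed, later lookups no longer pay for the accumulated `null` entries.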

I also ran some JMH measurements (throughput, higher is better).
Before this PR:

Benchmark                 Mode  Cnt      Score     Error   Units
MyBenchmark.tokens1      thrpt   10  25429.444 ± 422.468  ops/ms
MyBenchmark.tokens10     thrpt   10   2098.225 ±  14.184  ops/ms
MyBenchmark.tokens100    thrpt   10     26.714 ±   2.210  ops/ms
MyBenchmark.tokens1000   thrpt   10      0.254 ±   0.003  ops/ms
MyBenchmark.tokens10000  thrpt   10      0.002 ±   0.001  ops/ms

After this PR:

Benchmark                 Mode  Cnt      Score     Error   Units
MyBenchmark.tokens1      thrpt   10  23745.040 ± 218.017  ops/ms
MyBenchmark.tokens10     thrpt   10   4286.191 ±  32.264  ops/ms
MyBenchmark.tokens100    thrpt   10     84.262 ±   0.886  ops/ms
MyBenchmark.tokens1000   thrpt   10      0.929 ±   0.005  ops/ms
MyBenchmark.tokens10000  thrpt   10      0.009 ±   0.001  ops/ms
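To make the comparison explicit, the snippet below computes the after/before throughput ratio for each benchmark from the two tables. It only reuses the scores shown above. `SpeedupTable` is a hypothetical helper name.

```java
/** Hypothetical helper: ratio of post-PR to pre-PR throughput (> 1 means faster). */
public class SpeedupTable {
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        String[] names = {"tokens1", "tokens10", "tokens100", "tokens1000", "tokens10000"};
        // Scores in ops/ms, copied from the "before" and "after" tables.
        double[] before = {25429.444, 2098.225, 26.714, 0.254, 0.002};
        double[] after  = {23745.040, 4286.191, 84.262, 0.929, 0.009};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-12s %.2fx%n", names[i], speedup(before[i], after[i]));
        }
    }
}
```

This yields roughly 0.93x, 2.04x, 3.15x, 3.66x and 4.50x. The single-operation case is about 7% slower, and the gain grows with the number of rewrite operations. The tokens10000 ratio is only indicative, because both scores (0.002 ± 0.001 and 0.009 ± 0.001) are close to the measurement resolution.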
The benchmark code:
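import java.util.concurrent.TimeUnit;

// The ANTLR 4 tool parses grammars with the ANTLR 3 runtime, hence this exception type.
import org.antlr.runtime.RecognitionException;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.LexerInterpreter;
import org.antlr.v4.runtime.TokenStreamRewriter;
import org.antlr.v4.tool.LexerGrammar;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Throughput mode for all benchmarks, reported in ops/ms as in the tables above.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(value = 2, jvmArgs = {"-Xms2G", "-Xmx2G"})
public class MyBenchmark {
	public static void main(String[] args) throws RunnerException {

		Options opt = new OptionsBuilder()
			.include(MyBenchmark.class.getSimpleName())
			.build();

		new Runner(opt).run();
	}

	// Each getText() call runs reduceToSingleOperationPerIndex over the
	// rewriter's list of pending rewrite operations.
	@Benchmark
	public void tokens1(Blackhole blackhole) {
		blackhole.consume(TOKENS1.getText());
	}

	@Benchmark
	public void tokens10(Blackhole blackhole) {
		blackhole.consume(TOKENS10.getText());
	}

	@Benchmark
	public void tokens100(Blackhole blackhole) {
		blackhole.consume(TOKENS100.getText());
	}

	@Benchmark
	public void tokens1000(Blackhole blackhole) {
		blackhole.consume(TOKENS1000.getText());
	}

	@Benchmark
	public void tokens10000(Blackhole blackhole) {
		blackhole.consume(TOKENS10000.getText());
	}

	private static final TokenStreamRewriter TOKENS1 = getTokens(1);
	private static final TokenStreamRewriter TOKENS10 = getTokens(10);
	private static final TokenStreamRewriter TOKENS100 = getTokens(100);
	private static final TokenStreamRewriter TOKENS1000 = getTokens(1000);
	private static final TokenStreamRewriter TOKENS10000 = getTokens(10000);


	// Lexes "abc" into tokens a, b, c (indices 0..2), then queues 2 * size rewrite
	// operations: on each iteration a replace of tokens 0..2 and an insert before token i.
	private static TokenStreamRewriter getTokens(int size) {
		LexerGrammar g;
		try {
			g = new LexerGrammar(
				"lexer grammar T;\n"+
					"A : 'a';\n" +
					"B : 'b';\n" +
					"C : 'c';\n");
		} catch (RecognitionException e) {
			throw new RuntimeException(e);
		}
		String input = "abc";
		LexerInterpreter lexEngine = g.createLexerInterpreter(new ANTLRInputStream(input));
		CommonTokenStream stream = new CommonTokenStream(lexEngine);
		stream.fill();
		TokenStreamRewriter tokens = new TokenStreamRewriter(stream);
		for (int i = 0; i < size; i++) {
			tokens.replace(0, 2, "x" + i);
			tokens.insertBefore(i, "y" + i);
		}
		return tokens;
	}
}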

… large amount of rewrite operations

Signed-off-by: Sergey Nuyanzin <[email protected]>
snuyanzin changed the title from "Make TokenStreamRewriter#reduceToSingleOperationPerIndex faster for large amount of rewrite operations" to "JAVA: Make TokenStreamRewriter#reduceToSingleOperationPerIndex faster for large amount of rewrite operations" on Sep 14, 2025
snuyanzin (Author)

@parrt, @KvanTTT, sorry for the ping. Could you please take a look?
