How to efficiently find the index of a value in a System.Numerics.Vector<T>?
I am exploring System.Numerics.Vector with .NET Framework 4.7.2. The project I am working on cannot be migrated to .NET Core 3 yet, so I cannot use the new System.Runtime.Intrinsics namespace. The project processes very large CSV/TSV files, and we spend a lot of time looping through strings to find commas, quotes, etc., so I am trying to speed up that process.
So far, I have been able to use Vector to determine whether a string contains a given character (using the Vector.EqualsAny method). That's great, but I want to go a little further: I want to efficiently find the index of that character using Vector, and I do not know how. Below is the function I use to determine whether a string contains a comma.
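```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices; // MemoryMarshal (System.Memory package on .NET Framework)

public static class StringExtensions
{
    private static readonly char Comma = ',';

    // Every lane set to ','; compared lane by lane against each block of the string.
    private static readonly Vector<ushort> Commas = new Vector<ushort>((ushort)Comma);

    public static bool HasCommas(this string s)
    {
        if (s == null)
        {
            return false;
        }

        // Reinterpret the UTF-16 chars as ushort vectors. Trailing chars that
        // do not fill a whole vector are not included in the cast.
        ReadOnlySpan<char> charSpan = s.AsSpan();
        ReadOnlySpan<Vector<ushort>> charAsVectors = MemoryMarshal.Cast<char, Vector<ushort>>(charSpan);

        foreach (Vector<ushort> v in charAsVectors)
        {
            if (Vector.EqualsAny(v, Commas))
            {
                return true;
            }
        }

        // Scalar loop over the remaining tail.
        int numberOfCharactersProcessedSoFar = charAsVectors.Length * Vector<ushort>.Count;
        for (int i = numberOfCharactersProcessedSoFar; i < s.Length; i++)
        {
            if (s[i] == Comma)
            {
                return true;
            }
        }

        return false;
    }
}
```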
I understand that I could use the function above and, once EqualsAny reports a match, scan the elements of that Vector one by one, but that seems to defeat the purpose of using Vector. I have heard that the new hardware intrinsics in .NET Core 3 could help, but I cannot upgrade my project to .NET Core 3.
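For reference, here is a minimal sketch of the "scan the matching vector" approach, assuming the same System.Memory and System.Numerics.Vectors packages as the function above. The class name `CommaSearch` and method `IndexOfComma` are hypothetical. The point it illustrates is that the per-lane scan runs only for the single block in which EqualsAny first reports a match; every comma-free block is still skipped with one vector comparison, so the scalar work per call is bounded by one vector width.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class CommaSearch
{
    // Every lane set to ','.
    private static readonly Vector<ushort> Commas = new Vector<ushort>((ushort)',');

    // Returns the index of the first ',' in 's', or -1 if there is none.
    public static int IndexOfComma(string s)
    {
        if (s == null)
        {
            return -1;
        }

        ReadOnlySpan<Vector<ushort>> vectors =
            MemoryMarshal.Cast<char, Vector<ushort>>(s.AsSpan());
        int width = Vector<ushort>.Count;

        for (int block = 0; block < vectors.Length; block++)
        {
            // Fast path: blocks without a comma are skipped with one comparison.
            if (!Vector.EqualsAny(vectors[block], Commas))
            {
                continue;
            }

            // Slow path, taken at most once per call: scan only this block's chars.
            int start = block * width;
            for (int i = start; i < start + width; i++)
            {
                if (s[i] == ',')
                {
                    return i;
                }
            }
        }

        // Scalar loop over the chars that did not fill a whole vector.
        for (int i = vectors.Length * width; i < s.Length; i++)
        {
            if (s[i] == ',')
            {
                return i;
            }
        }

        return -1;
    }
}
```

Whether this is fast enough depends on how dense commas are in the data; for CSV lines with many short fields, the slow path runs often, and it is worth benchmarking against a plain `string.IndexOf(',')`.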
Given a Vector, how would you efficiently find the position of a character? Is there a clever trick that I am not aware of?
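One trick that works without System.Runtime.Intrinsics is to keep the full lane mask from Vector.Equals (matching lanes become 0xFFFF, others 0), reinterpret it with Vector.AsVectorUInt64, and look at it 64 bits (four chars) at a time, so finding the lane takes at most Vector<ulong>.Count checks plus a few shifts. The sketch below is a hedged illustration, not a benchmarked implementation: the names `CommaSearchMask`, `IndexOfComma`, and `FirstMatchingLane` are hypothetical, and it assumes a little-endian platform (as on x86/x64), where lane 0 occupies the low 16 bits of the first ulong.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class CommaSearchMask
{
    private static readonly Vector<ushort> Commas = new Vector<ushort>((ushort)',');

    // Returns the index of the first ',' in 's', or -1 if there is none.
    public static int IndexOfComma(string s)
    {
        if (s == null)
        {
            return -1;
        }

        ReadOnlySpan<Vector<ushort>> vectors =
            MemoryMarshal.Cast<char, Vector<ushort>>(s.AsSpan());
        int width = Vector<ushort>.Count;

        for (int block = 0; block < vectors.Length; block++)
        {
            // Matching lanes become 0xFFFF, all other lanes 0.
            Vector<ushort> match = Vector.Equals(vectors[block], Commas);
            if (match == Vector<ushort>.Zero)
            {
                continue;
            }
            return block * width + FirstMatchingLane(match);
        }

        // Scalar loop over the chars that did not fill a whole vector.
        for (int i = vectors.Length * width; i < s.Length; i++)
        {
            if (s[i] == ',')
            {
                return i;
            }
        }
        return -1;
    }

    // Finds the first non-zero ushort lane by examining the mask 64 bits
    // (four char lanes) at a time. Assumes little-endian lane layout.
    private static int FirstMatchingLane(Vector<ushort> match)
    {
        Vector<ulong> wide = Vector.AsVectorUInt64(match);
        for (int i = 0; i < Vector<ulong>.Count; i++)
        {
            ulong chunk = wide[i];
            if (chunk == 0)
            {
                continue;
            }
            // Locate the lowest non-zero 16-bit lane inside this 64-bit chunk.
            for (int j = 0; j < 4; j++)
            {
                if ((chunk & 0xFFFF) != 0)
                {
                    return i * 4 + j;
                }
                chunk >>= 16;
            }
        }
        return -1; // not reached when 'match' has a non-zero lane
    }
}
```

Compared with scanning the string, this reuses the comparison result that was already computed, so each char is compared only once; the remaining lane search is integer work on at most a few ulong values.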