Uses of Class
com.google.genai.proto.SentencepieceModel.TrainerSpec.Builder
Packages that use SentencepieceModel.TrainerSpec.Builder
com.google.genai.proto
Uses of SentencepieceModel.TrainerSpec.Builder in com.google.genai.proto
Methods in com.google.genai.proto that return SentencepieceModel.TrainerSpec.Builder
Method
    Description
SentencepieceModel.TrainerSpec.Builder.addAcceptLanguage(String value)
    List of the languages this model can accept.
SentencepieceModel.TrainerSpec.Builder.addAcceptLanguageBytes(com.google.protobuf.ByteString value)
    List of the languages this model can accept.
SentencepieceModel.TrainerSpec.Builder.addAllAcceptLanguage(Iterable<String> values)
    List of the languages this model can accept.
SentencepieceModel.TrainerSpec.Builder.addAllControlSymbols(Iterable<String> values)
    Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
SentencepieceModel.TrainerSpec.Builder.addAllInput(Iterable<String> values)
    General parameters. Input corpus files.
SentencepieceModel.TrainerSpec.Builder.addAllUserDefinedSymbols(Iterable<String> values)
    Defines user defined symbols.
SentencepieceModel.TrainerSpec.Builder.addControlSymbols(String value)
    Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
SentencepieceModel.TrainerSpec.Builder.addControlSymbolsBytes(com.google.protobuf.ByteString value)
    Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
SentencepieceModel.TrainerSpec.Builder.addExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, List<Type>> extension, Type value)
SentencepieceModel.TrainerSpec.Builder.addInput(String value)
    General parameters. Input corpus files.
SentencepieceModel.TrainerSpec.Builder.addInputBytes(com.google.protobuf.ByteString value)
    General parameters. Input corpus files.
SentencepieceModel.TrainerSpec.Builder.addRepeatedField(com.google.protobuf.Descriptors.FieldDescriptor field, Object value)
SentencepieceModel.TrainerSpec.Builder.addUserDefinedSymbols(String value)
    Defines user defined symbols.
SentencepieceModel.TrainerSpec.Builder.addUserDefinedSymbolsBytes(com.google.protobuf.ByteString value)
    Defines user defined symbols.
SentencepieceModel.TrainerSpec.Builder.clear()
SentencepieceModel.TrainerSpec.Builder.clearAcceptLanguage()
    List of the languages this model can accept.
SentencepieceModel.TrainerSpec.Builder.clearAllowWhitespaceOnlyPieces()
    Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
SentencepieceModel.TrainerSpec.Builder.clearBosId()
    <s>
SentencepieceModel.TrainerSpec.Builder.clearBosPiece()
    optional string bos_piece = 46 [default = "<s>"];
SentencepieceModel.TrainerSpec.Builder.clearByteFallback()
    Decomposes unknown pieces into UTF-8 bytes.
SentencepieceModel.TrainerSpec.Builder.clearCharacterCoverage()
    Training parameters.
SentencepieceModel.TrainerSpec.Builder.clearControlSymbols()
    Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
SentencepieceModel.TrainerSpec.Builder.clearDifferentialPrivacyClippingThreshold()
    Clipping threshold to apply after adding noise.
SentencepieceModel.TrainerSpec.Builder.clearDifferentialPrivacyNoiseLevel()
    Set these parameters if you need DP version of sentencepiece.
SentencepieceModel.TrainerSpec.Builder.clearEnableDifferentialPrivacy()
    Whether to use DP version of sentencepiece.
SentencepieceModel.TrainerSpec.Builder.clearEosId()
    </s>
SentencepieceModel.TrainerSpec.Builder.clearEosPiece()
    optional string eos_piece = 47 [default = "</s>"];
SentencepieceModel.TrainerSpec.Builder.clearExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, T> extension)
SentencepieceModel.TrainerSpec.Builder.clearField(com.google.protobuf.Descriptors.FieldDescriptor field)
SentencepieceModel.TrainerSpec.Builder.clearHardVocabLimit()
    `vocab_size` is treated as hard limit.
SentencepieceModel.TrainerSpec.Builder.clearInput()
    General parameters. Input corpus files.
SentencepieceModel.TrainerSpec.Builder.clearInputFormat()
    Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq
SentencepieceModel.TrainerSpec.Builder.clearInputSentenceSize()
    Maximum size of sentences the trainer loads from `input` parameter.
SentencepieceModel.TrainerSpec.Builder.clearMaxSentenceLength()
    The maximum sentence length in bytes.
SentencepieceModel.TrainerSpec.Builder.clearMaxSentencepieceLength()
    SentencePiece parameters which control the shapes of sentence piece.
SentencepieceModel.TrainerSpec.Builder.clearMiningSentenceSize()
    Deprecated.
SentencepieceModel.TrainerSpec.Builder.clearModelPrefix()
    Output model file prefix.
SentencepieceModel.TrainerSpec.Builder.clearModelType()
    optional .com.google.genai.proto.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
SentencepieceModel.TrainerSpec.Builder.clearNumSubIterations()
    Number of EM sub iterations.
SentencepieceModel.TrainerSpec.Builder.clearNumThreads()
    Number of threads in the training.
SentencepieceModel.TrainerSpec.Builder.clearOneof(com.google.protobuf.Descriptors.OneofDescriptor oneof)
SentencepieceModel.TrainerSpec.Builder.clearPadId()
    <pad> (padding)
SentencepieceModel.TrainerSpec.Builder.clearPadPiece()
    optional string pad_piece = 48 [default = "<pad>"];
SentencepieceModel.TrainerSpec.Builder.clearPretokenizationDelimiter()
    Defines the pre-tokenization delimiter.
SentencepieceModel.TrainerSpec.Builder.clearRequiredChars()
    Defines required characters.
SentencepieceModel.TrainerSpec.Builder.clearSeedSentencepiecesFile()
    Path to a seed sentencepieces file, with one tab-separated seed sentencepiece <tab> frequency per line.
SentencepieceModel.TrainerSpec.Builder.clearSeedSentencepieceSize()
    The size of seed sentencepieces.
SentencepieceModel.TrainerSpec.Builder.clearSelfTestSampleSize()
    Size of self-test samples, which are encoded in the model file.
SentencepieceModel.TrainerSpec.Builder.clearShrinkingFactor()
    In every EM sub-iteration, keeps top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
SentencepieceModel.TrainerSpec.Builder.clearShuffleInputSentence()
    optional bool shuffle_input_sentence = 19 [default = true];
SentencepieceModel.TrainerSpec.Builder.clearSplitByNumber()
    When `split_by_number` is true, put a boundary between number and non-number transition.
SentencepieceModel.TrainerSpec.Builder.clearSplitByUnicodeScript()
    Uses Unicode script to split sentence pieces.
SentencepieceModel.TrainerSpec.Builder.clearSplitByWhitespace()
    Use a white space to split sentence pieces.
SentencepieceModel.TrainerSpec.Builder.clearSplitDigits()
    Split all digits (0-9) into separate pieces.
SentencepieceModel.TrainerSpec.Builder.clearTrainExtremelyLargeCorpus()
    Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
SentencepieceModel.TrainerSpec.Builder.clearTrainingSentenceSize()
    Deprecated.
SentencepieceModel.TrainerSpec.Builder.clearTreatWhitespaceAsSuffix()
    Adds whitespace symbol (_) as a suffix instead of prefix.
SentencepieceModel.TrainerSpec.Builder.clearUnkId()
    Reserved special meta tokens.
SentencepieceModel.TrainerSpec.Builder.clearUnkPiece()
    optional string unk_piece = 45 [default = "<unk>"];
SentencepieceModel.TrainerSpec.Builder.clearUnkSurface()
    Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
SentencepieceModel.TrainerSpec.Builder.clearUseAllVocab()
    Use all symbols for vocab extraction.
SentencepieceModel.TrainerSpec.Builder.clearUserDefinedSymbols()
    Defines user defined symbols.
SentencepieceModel.TrainerSpec.Builder.clearVocabSize()
    Vocabulary size.
SentencepieceModel.TrainerSpec.Builder.clearVocabularyOutputPieceScore()
    When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
SentencepieceModel.TrainerSpec.Builder.clone()
SentencepieceModel.ModelProto.Builder.getTrainerSpecBuilder()
    Spec used to generate this model file.
SentencepieceModel.TrainerSpec.Builder.mergeFrom(SentencepieceModel.TrainerSpec other)
SentencepieceModel.TrainerSpec.Builder.mergeFrom(com.google.protobuf.CodedInputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry)
SentencepieceModel.TrainerSpec.Builder.mergeFrom(com.google.protobuf.Message other)
SentencepieceModel.TrainerSpec.Builder.mergeUnknownFields(com.google.protobuf.UnknownFieldSet unknownFields)
SentencepieceModel.TrainerSpec.newBuilder()
SentencepieceModel.TrainerSpec.newBuilder(SentencepieceModel.TrainerSpec prototype)
SentencepieceModel.TrainerSpec.newBuilderForType()
SentencepieceModel.TrainerSpec.Builder.setAcceptLanguage(int index, String value)
    List of the languages this model can accept.
SentencepieceModel.TrainerSpec.Builder.setAllowWhitespaceOnlyPieces(boolean value)
    Allows pieces that only contain whitespaces instead of appearing only as prefix or suffix of other pieces.
SentencepieceModel.TrainerSpec.Builder.setBosId(int value)
    <s>
SentencepieceModel.TrainerSpec.Builder.setBosPiece(String value)
    optional string bos_piece = 46 [default = "<s>"];
SentencepieceModel.TrainerSpec.Builder.setBosPieceBytes(com.google.protobuf.ByteString value)
    optional string bos_piece = 46 [default = "<s>"];
SentencepieceModel.TrainerSpec.Builder.setByteFallback(boolean value)
    Decomposes unknown pieces into UTF-8 bytes.
SentencepieceModel.TrainerSpec.Builder.setCharacterCoverage(float value)
    Training parameters.
SentencepieceModel.TrainerSpec.Builder.setControlSymbols(int index, String value)
    Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
SentencepieceModel.TrainerSpec.Builder.setDifferentialPrivacyClippingThreshold(long value)
    Clipping threshold to apply after adding noise.
SentencepieceModel.TrainerSpec.Builder.setDifferentialPrivacyNoiseLevel(float value)
    Set these parameters if you need DP version of sentencepiece.
SentencepieceModel.TrainerSpec.Builder.setEnableDifferentialPrivacy(boolean value)
    Whether to use DP version of sentencepiece.
SentencepieceModel.TrainerSpec.Builder.setEosId(int value)
    </s>
SentencepieceModel.TrainerSpec.Builder.setEosPiece(String value)
    optional string eos_piece = 47 [default = "</s>"];
SentencepieceModel.TrainerSpec.Builder.setEosPieceBytes(com.google.protobuf.ByteString value)
    optional string eos_piece = 47 [default = "</s>"];
SentencepieceModel.TrainerSpec.Builder.setExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, List<Type>> extension, int index, Type value)
SentencepieceModel.TrainerSpec.Builder.setExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, Type> extension, Type value)
SentencepieceModel.TrainerSpec.Builder.setField(com.google.protobuf.Descriptors.FieldDescriptor field, Object value)
SentencepieceModel.TrainerSpec.Builder.setHardVocabLimit(boolean value)
    `vocab_size` is treated as hard limit.
SentencepieceModel.TrainerSpec.Builder.setInput(int index, String value)
    General parameters. Input corpus files.
SentencepieceModel.TrainerSpec.Builder.setInputFormat(String value)
    Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq
SentencepieceModel.TrainerSpec.Builder.setInputFormatBytes(com.google.protobuf.ByteString value)
    Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq
SentencepieceModel.TrainerSpec.Builder.setInputSentenceSize(long value)
    Maximum size of sentences the trainer loads from `input` parameter.
SentencepieceModel.TrainerSpec.Builder.setMaxSentenceLength(int value)
    The maximum sentence length in bytes.
SentencepieceModel.TrainerSpec.Builder.setMaxSentencepieceLength(int value)
    SentencePiece parameters which control the shapes of sentence piece.
SentencepieceModel.TrainerSpec.Builder.setMiningSentenceSize(int value)
    Deprecated.
SentencepieceModel.TrainerSpec.Builder.setModelPrefix(String value)
    Output model file prefix.
SentencepieceModel.TrainerSpec.Builder.setModelPrefixBytes(com.google.protobuf.ByteString value)
    Output model file prefix.
SentencepieceModel.TrainerSpec.Builder.setModelType(SentencepieceModel.TrainerSpec.ModelType value)
    optional .com.google.genai.proto.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
SentencepieceModel.TrainerSpec.Builder.setNumSubIterations(int value)
    Number of EM sub iterations.
SentencepieceModel.TrainerSpec.Builder.setNumThreads(int value)
    Number of threads in the training.
SentencepieceModel.TrainerSpec.Builder.setPadId(int value)
    <pad> (padding)
SentencepieceModel.TrainerSpec.Builder.setPadPiece(String value)
    optional string pad_piece = 48 [default = "<pad>"];
SentencepieceModel.TrainerSpec.Builder.setPadPieceBytes(com.google.protobuf.ByteString value)
    optional string pad_piece = 48 [default = "<pad>"];
SentencepieceModel.TrainerSpec.Builder.setPretokenizationDelimiter(String value)
    Defines the pre-tokenization delimiter.
SentencepieceModel.TrainerSpec.Builder.setPretokenizationDelimiterBytes(com.google.protobuf.ByteString value)
    Defines the pre-tokenization delimiter.
SentencepieceModel.TrainerSpec.Builder.setRepeatedField(com.google.protobuf.Descriptors.FieldDescriptor field, int index, Object value)
SentencepieceModel.TrainerSpec.Builder.setRequiredChars(String value)
    Defines required characters.
SentencepieceModel.TrainerSpec.Builder.setRequiredCharsBytes(com.google.protobuf.ByteString value)
    Defines required characters.
SentencepieceModel.TrainerSpec.Builder.setSeedSentencepiecesFile(String value)
    Path to a seed sentencepieces file, with one tab-separated seed sentencepiece <tab> frequency per line.
SentencepieceModel.TrainerSpec.Builder.setSeedSentencepiecesFileBytes(com.google.protobuf.ByteString value)
    Path to a seed sentencepieces file, with one tab-separated seed sentencepiece <tab> frequency per line.
SentencepieceModel.TrainerSpec.Builder.setSeedSentencepieceSize(int value)
    The size of seed sentencepieces.
SentencepieceModel.TrainerSpec.Builder.setSelfTestSampleSize(int value)
    Size of self-test samples, which are encoded in the model file.
SentencepieceModel.TrainerSpec.Builder.setShrinkingFactor(float value)
    In every EM sub-iteration, keeps top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
SentencepieceModel.TrainerSpec.Builder.setShuffleInputSentence(boolean value)
    optional bool shuffle_input_sentence = 19 [default = true];
SentencepieceModel.TrainerSpec.Builder.setSplitByNumber(boolean value)
    When `split_by_number` is true, put a boundary between number and non-number transition.
SentencepieceModel.TrainerSpec.Builder.setSplitByUnicodeScript(boolean value)
    Uses Unicode script to split sentence pieces.
SentencepieceModel.TrainerSpec.Builder.setSplitByWhitespace(boolean value)
    Use a white space to split sentence pieces.
SentencepieceModel.TrainerSpec.Builder.setSplitDigits(boolean value)
    Split all digits (0-9) into separate pieces.
SentencepieceModel.TrainerSpec.Builder.setTrainExtremelyLargeCorpus(boolean value)
    Increase bit depth to allow unigram model training on large (>10M sentences) corpora.
SentencepieceModel.TrainerSpec.Builder.setTrainingSentenceSize(int value)
    Deprecated.
SentencepieceModel.TrainerSpec.Builder.setTreatWhitespaceAsSuffix(boolean value)
    Adds whitespace symbol (_) as a suffix instead of prefix.
SentencepieceModel.TrainerSpec.Builder.setUnkId(int value)
    Reserved special meta tokens.
SentencepieceModel.TrainerSpec.Builder.setUnknownFields(com.google.protobuf.UnknownFieldSet unknownFields)
SentencepieceModel.TrainerSpec.Builder.setUnkPiece(String value)
    optional string unk_piece = 45 [default = "<unk>"];
SentencepieceModel.TrainerSpec.Builder.setUnkPieceBytes(com.google.protobuf.ByteString value)
    optional string unk_piece = 45 [default = "<unk>"];
SentencepieceModel.TrainerSpec.Builder.setUnkSurface(String value)
    Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
SentencepieceModel.TrainerSpec.Builder.setUnkSurfaceBytes(com.google.protobuf.ByteString value)
    Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful both for user and developer.
SentencepieceModel.TrainerSpec.Builder.setUseAllVocab(boolean value)
    Use all symbols for vocab extraction.
SentencepieceModel.TrainerSpec.Builder.setUserDefinedSymbols(int index, String value)
    Defines user defined symbols.
SentencepieceModel.TrainerSpec.Builder.setVocabSize(int value)
    Vocabulary size.
SentencepieceModel.TrainerSpec.Builder.setVocabularyOutputPieceScore(boolean value)
    When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
SentencepieceModel.TrainerSpec.toBuilder()
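
Every method listed above returns a builder, which is what allows add, set, and clear calls to be chained in one expression. The following is a minimal sketch of that pattern only. SpecBuilder, describe, and demo are hypothetical stand-ins invented for illustration and are not part of com.google.genai.proto; the sketch also assumes that a clear call restores the declared default, as the bos_piece default of "<s>" suggests.

```java
import java.util.ArrayList;
import java.util.List;

public class FluentBuilderSketch {
    /** Hypothetical stand-in for a generated builder; NOT the real TrainerSpec.Builder. */
    static final class SpecBuilder {
        // Default taken from the listing: bos_piece = 46 [default = "<s>"].
        private static final String DEFAULT_BOS_PIECE = "<s>";
        private final List<String> acceptLanguage = new ArrayList<>();
        private String bosPiece = DEFAULT_BOS_PIECE;
        private int vocabSize; // starting value of 0 is an assumption of this sketch

        // Each mutator returns this, so calls can be chained.
        SpecBuilder addAcceptLanguage(String value) { acceptLanguage.add(value); return this; }
        SpecBuilder setAcceptLanguage(int index, String value) { acceptLanguage.set(index, value); return this; }
        SpecBuilder clearAcceptLanguage() { acceptLanguage.clear(); return this; }
        SpecBuilder setBosPiece(String value) { bosPiece = value; return this; }
        SpecBuilder clearBosPiece() { bosPiece = DEFAULT_BOS_PIECE; return this; } // assumed: back to default
        SpecBuilder setVocabSize(int value) { vocabSize = value; return this; }

        String describe() {
            return "acceptLanguage=" + acceptLanguage + ", bosPiece=" + bosPiece + ", vocabSize=" + vocabSize;
        }
    }

    static String demo() {
        return new SpecBuilder()
            .addAcceptLanguage("en")
            .addAcceptLanguage("fr")
            .setAcceptLanguage(1, "de") // replaces the element at index 1
            .setBosPiece("[BOS]")
            .clearBosPiece()            // undoes the previous set
            .setVocabSize(8000)
            .describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the real classes, the same chaining shape would start from one of the entry points listed above, such as SentencepieceModel.TrainerSpec.newBuilder() or toBuilder().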
Methods in com.google.genai.proto with parameters of type SentencepieceModel.TrainerSpec.Builder
Method
    Description
SentencepieceModel.ModelProto.Builder.setTrainerSpec(SentencepieceModel.TrainerSpec.Builder builderForValue)
    Spec used to generate this model file.
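
A setter that accepts a builder, like the builderForValue parameter above, is commonly implemented by building the value at call time, so later changes to the passed builder do not reach the stored value. The sketch below illustrates that behavior under that assumption. Spec, SpecBuilder, ModelBuilder, and demo are hypothetical stand-ins, not the generated classes.

```java
public class BuilderForValueSketch {
    /** Hypothetical immutable value; a stand-in, NOT the real TrainerSpec. */
    static final class Spec {
        final String modelPrefix;
        Spec(String modelPrefix) { this.modelPrefix = modelPrefix; }
    }

    /** Hypothetical builder for Spec. */
    static final class SpecBuilder {
        private String modelPrefix = "";
        SpecBuilder setModelPrefix(String value) { modelPrefix = value; return this; }
        Spec build() { return new Spec(modelPrefix); }
    }

    /** Hypothetical parent builder that stores a Spec. */
    static final class ModelBuilder {
        private Spec trainerSpec = new SpecBuilder().build();

        // Assumed behavior: snapshot the builder's current state via build().
        ModelBuilder setTrainerSpec(SpecBuilder builderForValue) {
            trainerSpec = builderForValue.build();
            return this;
        }

        Spec getTrainerSpec() { return trainerSpec; }
    }

    static String demo() {
        SpecBuilder spec = new SpecBuilder().setModelPrefix("first");
        ModelBuilder model = new ModelBuilder().setTrainerSpec(spec);
        spec.setModelPrefix("second"); // changes the builder, not the stored snapshot
        return model.getTrainerSpec().modelPrefix;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```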