Uses of Class
com.google.genai.proto.SentencepieceModel.TrainerSpec.Builder
Packages that use SentencepieceModel.TrainerSpec.Builder
- com.google.genai.proto
Uses of SentencepieceModel.TrainerSpec.Builder in com.google.genai.proto

Methods in com.google.genai.proto that return SentencepieceModel.TrainerSpec.Builder
- SentencepieceModel.TrainerSpec.Builder.addAcceptLanguage(String value): List of the languages this model can accept.
- SentencepieceModel.TrainerSpec.Builder.addAcceptLanguageBytes(com.google.protobuf.ByteString value): List of the languages this model can accept.
- SentencepieceModel.TrainerSpec.Builder.addAllAcceptLanguage(Iterable<String> values): List of the languages this model can accept.
- SentencepieceModel.TrainerSpec.Builder.addAllControlSymbols(Iterable<String> values): Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- SentencepieceModel.TrainerSpec.Builder.addAllInput(Iterable<String> values): General parameters. Input corpus files.
- SentencepieceModel.TrainerSpec.Builder.addAllUserDefinedSymbols(Iterable<String> values): Defines user-defined symbols.
- SentencepieceModel.TrainerSpec.Builder.addControlSymbols(String value): Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- SentencepieceModel.TrainerSpec.Builder.addControlSymbolsBytes(com.google.protobuf.ByteString value): Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- SentencepieceModel.TrainerSpec.Builder.addExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, List<Type>> extension, Type value)
- SentencepieceModel.TrainerSpec.Builder.addInput(String value): General parameters. Input corpus files.
- SentencepieceModel.TrainerSpec.Builder.addInputBytes(com.google.protobuf.ByteString value): General parameters. Input corpus files.
- SentencepieceModel.TrainerSpec.Builder.addRepeatedField(com.google.protobuf.Descriptors.FieldDescriptor field, Object value)
- SentencepieceModel.TrainerSpec.Builder.addUserDefinedSymbols(String value): Defines user-defined symbols.
- SentencepieceModel.TrainerSpec.Builder.addUserDefinedSymbolsBytes(com.google.protobuf.ByteString value): Defines user-defined symbols.
- SentencepieceModel.TrainerSpec.Builder.clear()
- SentencepieceModel.TrainerSpec.Builder.clearAcceptLanguage(): List of the languages this model can accept.
- SentencepieceModel.TrainerSpec.Builder.clearAllowWhitespaceOnlyPieces(): Allows pieces that contain only whitespace, instead of whitespace appearing only as a prefix or suffix of other pieces.
- SentencepieceModel.TrainerSpec.Builder.clearBosId(): <s>
- SentencepieceModel.TrainerSpec.Builder.clearBosPiece(): optional string bos_piece = 46 [default = "<s>"];
- SentencepieceModel.TrainerSpec.Builder.clearByteFallback(): Decomposes unknown pieces into UTF-8 bytes.
- SentencepieceModel.TrainerSpec.Builder.clearCharacterCoverage(): Training parameters.
- SentencepieceModel.TrainerSpec.Builder.clearControlSymbols(): Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- SentencepieceModel.TrainerSpec.Builder.clearDifferentialPrivacyClippingThreshold(): Clipping threshold to apply after adding noise.
- SentencepieceModel.TrainerSpec.Builder.clearDifferentialPrivacyNoiseLevel(): Set these parameters if you need the DP version of sentencepiece.
- SentencepieceModel.TrainerSpec.Builder.clearEnableDifferentialPrivacy(): Whether to use the DP version of sentencepiece.
- SentencepieceModel.TrainerSpec.Builder.clearEosId(): </s>
- SentencepieceModel.TrainerSpec.Builder.clearEosPiece(): optional string eos_piece = 47 [default = "</s>"];
- SentencepieceModel.TrainerSpec.Builder.clearExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, T> extension)
- SentencepieceModel.TrainerSpec.Builder.clearField(com.google.protobuf.Descriptors.FieldDescriptor field)
- SentencepieceModel.TrainerSpec.Builder.clearHardVocabLimit(): `vocab_size` is treated as a hard limit.
- SentencepieceModel.TrainerSpec.Builder.clearInput(): General parameters. Input corpus files.
- SentencepieceModel.TrainerSpec.Builder.clearInputFormat(): Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq.
- SentencepieceModel.TrainerSpec.Builder.clearInputSentenceSize(): Maximum size of sentences the trainer loads from the `input` parameter.
- SentencepieceModel.TrainerSpec.Builder.clearMaxSentenceLength(): The maximum sentence length in bytes.
- SentencepieceModel.TrainerSpec.Builder.clearMaxSentencepieceLength(): SentencePiece parameters that control the shape of sentence pieces.
- SentencepieceModel.TrainerSpec.Builder.clearMiningSentenceSize(): Deprecated.
- SentencepieceModel.TrainerSpec.Builder.clearModelPrefix(): Output model file prefix.
- SentencepieceModel.TrainerSpec.Builder.clearModelType(): optional .com.google.genai.proto.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- SentencepieceModel.TrainerSpec.Builder.clearNumSubIterations(): Number of EM sub-iterations.
- SentencepieceModel.TrainerSpec.Builder.clearNumThreads(): Number of threads used in training.
- SentencepieceModel.TrainerSpec.Builder.clearOneof(com.google.protobuf.Descriptors.OneofDescriptor oneof)
- SentencepieceModel.TrainerSpec.Builder.clearPadId(): <pad> (padding)
- SentencepieceModel.TrainerSpec.Builder.clearPadPiece(): optional string pad_piece = 48 [default = "<pad>"];
- SentencepieceModel.TrainerSpec.Builder.clearPretokenizationDelimiter(): Defines the pre-tokenization delimiter.
- SentencepieceModel.TrainerSpec.Builder.clearRequiredChars(): Defines required characters.
- SentencepieceModel.TrainerSpec.Builder.clearSeedSentencepiecesFile(): Path to a seed sentencepieces file, with one tab-separated seed sentencepiece <tab> frequency per line.
- SentencepieceModel.TrainerSpec.Builder.clearSeedSentencepieceSize(): The size of seed sentencepieces.
- SentencepieceModel.TrainerSpec.Builder.clearSelfTestSampleSize(): Size of self-test samples, which are encoded in the model file.
- SentencepieceModel.TrainerSpec.Builder.clearShrinkingFactor(): In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
- SentencepieceModel.TrainerSpec.Builder.clearShuffleInputSentence(): optional bool shuffle_input_sentence = 19 [default = true];
- SentencepieceModel.TrainerSpec.Builder.clearSplitByNumber(): When `split_by_number` is true, puts a boundary at each transition between number and non-number.
- SentencepieceModel.TrainerSpec.Builder.clearSplitByUnicodeScript(): Uses Unicode script to split sentence pieces.
- SentencepieceModel.TrainerSpec.Builder.clearSplitByWhitespace(): Uses whitespace to split sentence pieces.
- SentencepieceModel.TrainerSpec.Builder.clearSplitDigits(): Splits all digits (0-9) into separate pieces.
- SentencepieceModel.TrainerSpec.Builder.clearTrainExtremelyLargeCorpus(): Increases bit depth to allow unigram model training on large (>10M sentences) corpora.
- SentencepieceModel.TrainerSpec.Builder.clearTrainingSentenceSize(): Deprecated.
- SentencepieceModel.TrainerSpec.Builder.clearTreatWhitespaceAsSuffix(): Adds the whitespace symbol (_) as a suffix instead of a prefix.
- SentencepieceModel.TrainerSpec.Builder.clearUnkId(): Reserved special meta tokens.
- SentencepieceModel.TrainerSpec.Builder.clearUnkPiece(): optional string unk_piece = 45 [default = "<unk>"];
- SentencepieceModel.TrainerSpec.Builder.clearUnkSurface(): Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- SentencepieceModel.TrainerSpec.Builder.clearUseAllVocab(): Uses all symbols for vocab extraction.
- SentencepieceModel.TrainerSpec.Builder.clearUserDefinedSymbols(): Defines user-defined symbols.
- SentencepieceModel.TrainerSpec.Builder.clearVocabSize(): Vocabulary size.
- SentencepieceModel.TrainerSpec.Builder.clearVocabularyOutputPieceScore(): When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- SentencepieceModel.TrainerSpec.Builder.clone()
- SentencepieceModel.ModelProto.Builder.getTrainerSpecBuilder(): Spec used to generate this model file.
- SentencepieceModel.TrainerSpec.Builder.mergeFrom(SentencepieceModel.TrainerSpec other)
- SentencepieceModel.TrainerSpec.Builder.mergeFrom(com.google.protobuf.CodedInputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry)
- SentencepieceModel.TrainerSpec.Builder.mergeFrom(com.google.protobuf.Message other)
- SentencepieceModel.TrainerSpec.Builder.mergeUnknownFields(com.google.protobuf.UnknownFieldSet unknownFields)
- SentencepieceModel.TrainerSpec.newBuilder()
- SentencepieceModel.TrainerSpec.newBuilder(SentencepieceModel.TrainerSpec prototype)
- SentencepieceModel.TrainerSpec.newBuilderForType()
- SentencepieceModel.TrainerSpec.Builder.setAcceptLanguage(int index, String value): List of the languages this model can accept.
- SentencepieceModel.TrainerSpec.Builder.setAllowWhitespaceOnlyPieces(boolean value): Allows pieces that contain only whitespace, instead of whitespace appearing only as a prefix or suffix of other pieces.
- SentencepieceModel.TrainerSpec.Builder.setBosId(int value): <s>
- SentencepieceModel.TrainerSpec.Builder.setBosPiece(String value): optional string bos_piece = 46 [default = "<s>"];
- SentencepieceModel.TrainerSpec.Builder.setBosPieceBytes(com.google.protobuf.ByteString value): optional string bos_piece = 46 [default = "<s>"];
- SentencepieceModel.TrainerSpec.Builder.setByteFallback(boolean value): Decomposes unknown pieces into UTF-8 bytes.
- SentencepieceModel.TrainerSpec.Builder.setCharacterCoverage(float value): Training parameters.
- SentencepieceModel.TrainerSpec.Builder.setControlSymbols(int index, String value): Vocabulary management. Defines control symbols used as an indicator to change the behavior of the decoder.
- SentencepieceModel.TrainerSpec.Builder.setDifferentialPrivacyClippingThreshold(long value): Clipping threshold to apply after adding noise.
- SentencepieceModel.TrainerSpec.Builder.setDifferentialPrivacyNoiseLevel(float value): Set these parameters if you need the DP version of sentencepiece.
- SentencepieceModel.TrainerSpec.Builder.setEnableDifferentialPrivacy(boolean value): Whether to use the DP version of sentencepiece.
- SentencepieceModel.TrainerSpec.Builder.setEosId(int value): </s>
- SentencepieceModel.TrainerSpec.Builder.setEosPiece(String value): optional string eos_piece = 47 [default = "</s>"];
- SentencepieceModel.TrainerSpec.Builder.setEosPieceBytes(com.google.protobuf.ByteString value): optional string eos_piece = 47 [default = "</s>"];
- SentencepieceModel.TrainerSpec.Builder.setExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, List<Type>> extension, int index, Type value)
- SentencepieceModel.TrainerSpec.Builder.setExtension(com.google.protobuf.GeneratedMessage.GeneratedExtension<SentencepieceModel.TrainerSpec, Type> extension, Type value)
- SentencepieceModel.TrainerSpec.Builder.setField(com.google.protobuf.Descriptors.FieldDescriptor field, Object value)
- SentencepieceModel.TrainerSpec.Builder.setHardVocabLimit(boolean value): `vocab_size` is treated as a hard limit.
- SentencepieceModel.TrainerSpec.Builder.setInput(int index, String value): General parameters. Input corpus files.
- SentencepieceModel.TrainerSpec.Builder.setInputFormat(String value): Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq.
- SentencepieceModel.TrainerSpec.Builder.setInputFormatBytes(com.google.protobuf.ByteString value): Input corpus format: "text": one-sentence-per-line text format (default); "tsv": sentence <tab> freq.
- SentencepieceModel.TrainerSpec.Builder.setInputSentenceSize(long value): Maximum size of sentences the trainer loads from the `input` parameter.
- SentencepieceModel.TrainerSpec.Builder.setMaxSentenceLength(int value): The maximum sentence length in bytes.
- SentencepieceModel.TrainerSpec.Builder.setMaxSentencepieceLength(int value): SentencePiece parameters that control the shape of sentence pieces.
- SentencepieceModel.TrainerSpec.Builder.setMiningSentenceSize(int value): Deprecated.
- SentencepieceModel.TrainerSpec.Builder.setModelPrefix(String value): Output model file prefix.
- SentencepieceModel.TrainerSpec.Builder.setModelPrefixBytes(com.google.protobuf.ByteString value): Output model file prefix.
- SentencepieceModel.TrainerSpec.Builder.setModelType(SentencepieceModel.TrainerSpec.ModelType value): optional .com.google.genai.proto.TrainerSpec.ModelType model_type = 3 [default = UNIGRAM];
- SentencepieceModel.TrainerSpec.Builder.setNumSubIterations(int value): Number of EM sub-iterations.
- SentencepieceModel.TrainerSpec.Builder.setNumThreads(int value): Number of threads used in training.
- SentencepieceModel.TrainerSpec.Builder.setPadId(int value): <pad> (padding)
- SentencepieceModel.TrainerSpec.Builder.setPadPiece(String value): optional string pad_piece = 48 [default = "<pad>"];
- SentencepieceModel.TrainerSpec.Builder.setPadPieceBytes(com.google.protobuf.ByteString value): optional string pad_piece = 48 [default = "<pad>"];
- SentencepieceModel.TrainerSpec.Builder.setPretokenizationDelimiter(String value): Defines the pre-tokenization delimiter.
- SentencepieceModel.TrainerSpec.Builder.setPretokenizationDelimiterBytes(com.google.protobuf.ByteString value): Defines the pre-tokenization delimiter.
- SentencepieceModel.TrainerSpec.Builder.setRepeatedField(com.google.protobuf.Descriptors.FieldDescriptor field, int index, Object value)
- SentencepieceModel.TrainerSpec.Builder.setRequiredChars(String value): Defines required characters.
- SentencepieceModel.TrainerSpec.Builder.setRequiredCharsBytes(com.google.protobuf.ByteString value): Defines required characters.
- SentencepieceModel.TrainerSpec.Builder.setSeedSentencepiecesFile(String value): Path to a seed sentencepieces file, with one tab-separated seed sentencepiece <tab> frequency per line.
- SentencepieceModel.TrainerSpec.Builder.setSeedSentencepiecesFileBytes(com.google.protobuf.ByteString value): Path to a seed sentencepieces file, with one tab-separated seed sentencepiece <tab> frequency per line.
- SentencepieceModel.TrainerSpec.Builder.setSeedSentencepieceSize(int value): The size of seed sentencepieces.
- SentencepieceModel.TrainerSpec.Builder.setSelfTestSampleSize(int value): Size of self-test samples, which are encoded in the model file.
- SentencepieceModel.TrainerSpec.Builder.setShrinkingFactor(float value): In every EM sub-iteration, keeps the top `shrinking_factor` * `current sentencepieces size` with respect to the loss of the sentence piece.
- SentencepieceModel.TrainerSpec.Builder.setShuffleInputSentence(boolean value): optional bool shuffle_input_sentence = 19 [default = true];
- SentencepieceModel.TrainerSpec.Builder.setSplitByNumber(boolean value): When `split_by_number` is true, puts a boundary at each transition between number and non-number.
- SentencepieceModel.TrainerSpec.Builder.setSplitByUnicodeScript(boolean value): Uses Unicode script to split sentence pieces.
- SentencepieceModel.TrainerSpec.Builder.setSplitByWhitespace(boolean value): Uses whitespace to split sentence pieces.
- SentencepieceModel.TrainerSpec.Builder.setSplitDigits(boolean value): Splits all digits (0-9) into separate pieces.
- SentencepieceModel.TrainerSpec.Builder.setTrainExtremelyLargeCorpus(boolean value): Increases bit depth to allow unigram model training on large (>10M sentences) corpora.
- SentencepieceModel.TrainerSpec.Builder.setTrainingSentenceSize(int value): Deprecated.
- SentencepieceModel.TrainerSpec.Builder.setTreatWhitespaceAsSuffix(boolean value): Adds the whitespace symbol (_) as a suffix instead of a prefix.
- SentencepieceModel.TrainerSpec.Builder.setUnkId(int value): Reserved special meta tokens.
- SentencepieceModel.TrainerSpec.Builder.setUnknownFields(com.google.protobuf.UnknownFieldSet unknownFields)
- SentencepieceModel.TrainerSpec.Builder.setUnkPiece(String value): optional string unk_piece = 45 [default = "<unk>"];
- SentencepieceModel.TrainerSpec.Builder.setUnkPieceBytes(com.google.protobuf.ByteString value): optional string unk_piece = 45 [default = "<unk>"];
- SentencepieceModel.TrainerSpec.Builder.setUnkSurface(String value): Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- SentencepieceModel.TrainerSpec.Builder.setUnkSurfaceBytes(com.google.protobuf.ByteString value): Encodes <unk> into U+2047 (DOUBLE QUESTION MARK), since this character can be useful for both users and developers.
- SentencepieceModel.TrainerSpec.Builder.setUseAllVocab(boolean value): Uses all symbols for vocab extraction.
- SentencepieceModel.TrainerSpec.Builder.setUserDefinedSymbols(int index, String value): Defines user-defined symbols.
- SentencepieceModel.TrainerSpec.Builder.setVocabSize(int value): Vocabulary size.
- SentencepieceModel.TrainerSpec.Builder.setVocabularyOutputPieceScore(boolean value): When creating the vocabulary file, defines whether or not to additionally output the score for each piece.
- SentencepieceModel.TrainerSpec.toBuilder()

Methods in com.google.genai.proto with parameters of type SentencepieceModel.TrainerSpec.Builder
- SentencepieceModel.ModelProto.Builder.setTrainerSpec(SentencepieceModel.TrainerSpec.Builder builderForValue): Spec used to generate this model file.
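
Every add, set, and clear method listed above returns the builder itself, so calls can be chained. The sketch below illustrates that contract with a small stand-in class so that it runs without the protobuf or GenAI libraries on the classpath. `TrainerSpecBuilderSketch`, its nested `Builder`, and `describe()` are hypothetical names invented for this example; only the method names `addInput`, `setModelPrefix`, `setVocabSize`, `setBosPiece`, and `clearBosPiece` and the `"<s>"` default for `bos_piece` are taken from the listing. It is an illustration of the pattern, not the generated class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the generated builder; NOT the real com.google.genai.proto class. */
final class TrainerSpecBuilderSketch {

    /** Mirrors a handful of TrainerSpec fields; the selection is illustrative only. */
    static final class Builder {
        // Default value as declared in the listing: bos_piece = 46 [default = "<s>"].
        private static final String DEFAULT_BOS_PIECE = "<s>";

        private final List<String> input = new ArrayList<>();
        private String modelPrefix = "";
        private int vocabSize;
        private String bosPiece = DEFAULT_BOS_PIECE;

        // Each mutator returns `this`, which is what makes chaining possible.
        Builder addInput(String value) { input.add(value); return this; }
        Builder setModelPrefix(String value) { modelPrefix = value; return this; }
        Builder setVocabSize(int value) { vocabSize = value; return this; }
        Builder setBosPiece(String value) { bosPiece = value; return this; }

        // A clear method resets the field to its declared default.
        Builder clearBosPiece() { bosPiece = DEFAULT_BOS_PIECE; return this; }

        /** Hypothetical helper that renders the current builder state as text. */
        String describe() {
            return "input=" + input
                + " model_prefix=" + modelPrefix
                + " vocab_size=" + vocabSize
                + " bos_piece=" + bosPiece;
        }
    }

    /** Chains several calls, then clears bos_piece back to its default. */
    static String demo() {
        return new Builder()
            .addInput("corpus.txt")
            .setModelPrefix("demo")
            .setVocabSize(8000)
            .setBosPiece("[BOS]")
            .clearBosPiece()
            .describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the real generated class, the chain would presumably start from `SentencepieceModel.TrainerSpec.newBuilder()` and end with the standard protobuf `build()` call, and the resulting builder could be passed to `SentencepieceModel.ModelProto.Builder.setTrainerSpec(...)`. That usage is inferred from the listing and from general protobuf conventions and is not shown on this page.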