
Smartcat Multilingual Generative AI Launch

What to expect in this 60-minute session

➡️ Introductions and opening presentation - Andrew Federici, Smartcat

➡️ Panel Discussion: Current trends in AI - usage trends, security concerns, killer apps (David Kelly, Laszlo K Varga, Mike Kaput)

➡️ Introducing Smartcat AI - Igor Afanasyev & Jean-Luc Saillard, Smartcat

➡️ Q&A - Panelists + Smartcat


Q&A Part 1 - Panelist Discussion

 

Q: How should organizations deal with sensitive information, or information that may be subject to regulatory or compliance policies?

A: It’s a bit like the Wild Wild West right now when it comes to compliance or adherence. 
The biggest reason why we say this is, there are a lot of people who are playing with these open source or public tools, without a real understanding of what's going on behind the scenes. They type some stuff into a generative AI tool, some magic happens, and the new content appears. But they don't necessarily understand everything that's going on, and the ramifications as it relates to copyright and reputation and privacy.

Here are some suggestions to get started. Note that different companies are adopting very different approaches, so please take these as guidelines and work with your own in-house experts if you have any concerns.

  1. It's really important to get some basic policies around the use of generative AI in place sooner rather than later. Initially they don't have to be some big scary set of laws for or against using generative AI. But you do need to start giving yourself and your employees guidance on what these tools can do and what they should and shouldn't be used for, because there are a lot of pitfalls, especially if you're in a more regulated industry.
  2. If you are involved in conversations about how to deploy a solution like this, we recommend you include someone with a legal or compliance background in your AI council or informal group right from the start. This will help you avoid issues down the road, rather than coming up with a big plan on your own and then having it scuttled by legal or compliance issues.
  3. It gets more complicated if you’re working with partners or agencies. We've already seen this in L&D, where an agency is contracted to develop a new training program similar to prior works. The question came up when the new materials were developed using an open source LLM: as the curriculum owner, how does that affect my copyright for the new materials? How can I find out? There’s also the counter-concern. If I’m working with an agency, are THEY practicing good compliant behavior? Or is my content being used to train a public LLM? How can I validate that?
  4. So the best solution is to get some rules in place ASAP that give you, your organization and your partners a little bit more of a solid roadmap on how to use these tools in an effective and safe manner.

 

Q: Do you expect integration between CAT tools/MT Engines with GenAI for a customized translation/localization solution?

A: This is already happening. Many vendors, Smartcat included, already offer GPT-4 as an alternative translation engine, as well as prompt-based editing. So it’s not a question of “will this happen” but rather of how each vendor in the space will implement it.

 

Q: Have there been any attempts to create large language models with generative AI for endangered and low-resource languages?

A: While LLMs fundamentally require a large body of training text, there are efforts underway to capture endangered languages and build LLMs for them. Check out this recent research paper by the University of Maryland on revitalizing endangered languages using AI. The OECD (https://oecd.ai/en/wonk/language-models-policy-implications) is also actively monitoring and working on this from a global policy perspective.

 

Q: What are your expectations for AI's impact on languages with a limited online presence, e.g., Arabic, Farsi, Hebrew?

A: While LLMs are not likely to help with the translation itself for low-resource languages (or do no better than existing NMT engines), some of them can be helpful in augmenting the existing NMT solutions (with context, rephrasing, shortening, tone adjustments, etc), and we're already seeing this from some of the MT providers.

 

Q: In the near- to mid-term, what changes do you expect in the volume of intervention by humans (translator/editor/post-editor) in the translation/localization process?

A: We expect the amount of intervention for each piece of content to decrease as the tech gets better. At the same time, because it’s so easy to generate new content in additional languages, there's just going to be more content to edit or post-edit (either translated by MT or originated in multilingual form by an LLM).

 

Q: When talking in the context of AI, what's the difference between Natural Language Processing (NLP) and Machine Translation?

A: To clarify some of these terms: NLP includes machine translation (MT) and many other disciplines. NLP (as well as MT) can be programmatic (rule-based, statistical) or machine learning (AI) based. To date, most MT uses a rules-based approach, although platforms like Smartcat include a learning component that’s technically AI.
Nowadays AI NLP is all the rage, but that doesn't mean non-AI NLP has become obsolete. For instance, semantic search and analysis via knowledge graphs (non-AI NLP) is highly evolved and delivers a significant level of quality, accuracy and consistency. AI can also do this to a certain degree of accuracy. The best solution will come when we see these approaches working together, as in Retrieval-Augmented Generation (RAG).
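
To make the RAG idea concrete, here is a minimal sketch of the general pattern: a simple, non-AI retrieval step picks the most relevant documents, and those documents are then placed into the prompt that would be sent to a generative model. This is only an illustration under stated assumptions. The function names, the sample documents and the word-overlap scoring are all hypothetical, no real LLM is called, and it does not describe how Smartcat or any other vendor implements retrieval.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, document):
    """Toy relevance score: number of query tokens that also occur in the document."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    return sum(min(q[t], d[t]) for t in q)

def retrieve(query, documents, top_k=2):
    """Retrieval step: return up to top_k documents with a non-zero score, best first."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return [doc for doc in ranked[:top_k] if score(query, doc) > 0]

def build_prompt(query, documents, top_k=2):
    """Augmentation step: put the retrieved documents in front of the request."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Request: {query}"
    )

# Hypothetical sample documents, invented for this illustration.
DOCS = [
    "Our refund policy allows returns within 30 days.",
    "The onboarding course covers security training for new hires.",
    "Quarterly sales grew in the EMEA region.",
]

if __name__ == "__main__":
    # The resulting prompt would then be passed to a generative model (not shown).
    print(build_prompt("What is the refund policy?", DOCS, top_k=1))
```

In a real system the word-overlap score would typically be replaced by a search index, a knowledge graph lookup or embedding similarity; the shape of the pipeline (retrieve first, then generate from what was retrieved) stays the same.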

 

Q&A Part 2 - Product Demo

Q: Can the context (articles that you selected to help generate) come from our translated documents? For example, if we translate a lot of news articles, can I generate a new article based on the ones that we previously translated?

A: The way our system works, it analyzes your source documents. And we're working on analyzing targets as well. So you should be able to generate your content, not only based on source documents but also based on verified translations that you have.

 

Q: Do you have a process in place where content generated is verified? This has been an issue with some other generative AI tools.

A: It’s true that the output of generative AI tools typically requires cleanup and validation. We’ve taken steps in the latest Smartcat AI release to ensure you get the best results right out of the gate, but we have plans for more robust tools in the months ahead. 

  1. As we showed in the demo, when somebody enters a prompt to generate a new piece of content, we pull in the most relevant content for the user to build from. So by its nature, the quality of the content itself should be higher than what you could get from an open platform like ChatGPT. And of course you can use additional prompts to tweak the output.
  2. In the next development phase we will integrate the generative component more tightly into the Smartcat editing interface so you will be able to quickly create new content, and then edit it either manually or via the prompt interface.

 

Q: If I have a new client that needs content, but I have not yet done any translations for them, can they send me their previous files and content so that I can upload them and use them as a reference?

A: Yes, you just need to bring the context files into your workspace (or client workspace) in order to use them to generate new content.

 

Q: Do the results generated come only from what has been previously uploaded/indexed in previous Smartcat projects?

A: Yes, Smartcat generates new content only from your uploaded documents. This is part of our commitment to your privacy and security.

 

Q: Is this Smartcat AI assistant available in other languages?

A: Two-part answer:

  • The tool itself is language agnostic. So if you upload Spanish documents as a source, Smartcat will index them in Spanish, you can enter your prompt in Spanish, and the output will be in Spanish.
  • The Smartcat UI itself is available in multiple languages.

Q: Are there any plans to add voice-over translation and multilanguage voice-over generation for videos?

A: Yes, we're working on it. It's too early to give all the details but that's something that we consider a crucial part of our platform.

 

Q: How do I turn what was generated into a “classic” Smartcat project, instead of having it “only” in a chat format?

A: With the September 14 release there is no automated way of transforming the content directly into a project. You can copy and paste from the chat window into a document, and then bring that document into an existing or new Smartcat project.
We’re working on making this automatic in the future so that content flows seamlessly into a project.

Note that you can also translate the generated content directly within the chat window.

 

Q: Can you show the correlation between using AI and the regular translation process right in the interface?

A: (We didn’t get a chance to answer all questions live in the session, but most of this was covered live)

As noted above, the current chat interface/window creates a block of formatted text – just like other Gen AI tools – that you then need to save out into a separate document.

To switch from the Chat interface to the main Smartcat project dashboard, just click on the “Home” icon in the upper left corner of the screen.

 

Q: We are a part of a bigger organization with many people using Smartcat. Will all of the information and content be combined when I put in a prompt, or will the outcome just be based on my input?

A: It works as follows:

  1. Smartcat can have multiple workspaces of content. Each workspace has its own access controls – so you might have access to Marketing materials but not Learning & Development for example. By default, we index all of the content that lives in all those Smartcat workspaces. If you had access to the workspace initially, then the content should be available to you for new content generation.
  2. Workspace administrators can choose to opt out of indexing. So if a specific workspace contains sensitive or proprietary material, there's a way to opt out of indexing that for content generation, and it will never appear in the context.
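
The two rules above can be sketched as a small filter. This is a hypothetical model written only to illustrate the logic described in the answer, not Smartcat's actual API or data model: the `Workspace` class, its field names and the `context_for` function are all assumptions made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Workspace:
    """Hypothetical workspace record; field names are assumptions for illustration."""
    name: str
    documents: list
    members: set = field(default_factory=set)  # users with access to this workspace
    indexing_enabled: bool = True              # indexed by default; admins may opt out

def context_for(user, workspaces):
    """Return the documents a user may draw on when generating new content.

    A document qualifies only if (1) the user has access to its workspace and
    (2) the workspace administrator has not opted out of indexing.
    """
    return [
        doc
        for ws in workspaces
        if ws.indexing_enabled and user in ws.members
        for doc in ws.documents
    ]

# Invented sample data: three workspaces, one of them opted out of indexing.
WORKSPACES = [
    Workspace("Marketing", ["brand-guide.docx"], {"ana", "ben"}),
    Workspace("Learning & Development", ["course.docx"], {"ben"}),
    Workspace("Legal", ["contract.docx"], {"ana", "ben"}, indexing_enabled=False),
]

if __name__ == "__main__":
    print(context_for("ana", WORKSPACES))  # ['brand-guide.docx']
    print(context_for("ben", WORKSPACES))  # ['brand-guide.docx', 'course.docx']
```

In this sketch the Legal workspace never contributes to the context for any user, even those who are members, because indexing has been switched off for it.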